Control Systems: Classical, Neural, and Fuzzy

Control Systems: Classical, Neural, and Fuzzy
Oregon Graduate Institute
Lecture Notes - 1998
Eric A. Wan
1
.
Preface
This material corresponds to the consolidated lecture summaries and handouts for Control Systems: Classical, Neural, and Fuzzy. The class was motivated by a desire to educate students interested in neural and fuzzy control. Often students and practitioners studying these subjects lack the fundamentals. This class attempts to provide both a foundation and an appreciation for traditional control theory, while at the same time putting newer trends in neural and fuzzy control into perspective. Yes, this really is only a one-quarter class (though not everything was covered each term). Clearly, the material could be better covered over two quarters, and it is also not meant as a substitute for more formal courses in control theory.
The lecture summaries are just that. They are often terse on explanation and are not a substitute for attending lectures or reading the supplemental material. Many of the summaries were initially formatted in LaTeX by student "scribes"¹ who would try to decipher my handwritten notes after a lecture. In subsequent years I would try to make minor corrections. Included figures are often copied directly from other sources (without permission). Thus, these notes are not for general distribution.
This is the first draft of a working document - beware of typos!
Eric A. Wan
¹Student scribes: B. John, M. Kumashikar, Y. Liao, A. Nelson, M. Saffell, R. Sharma, T. Srinivasan, S. Tibrewala, X. Tu, S. Vuuren
Contents

Preface
Class Information Sheet

I Introduction
  0.1 Basic Structure
  0.2 Classical Control
  0.3 State-Space Control
  0.4 Advanced Topics
    0.4.1 Dynamic programming
    0.4.2 Adaptive Control
    0.4.3 Robust Control / H∞
  0.5 History of Feedback Control
  1 Neural Control
    1.0.1 Reinforcement Learning
    1.1 History of Neural Networks and Neural Control
  2 Fuzzy Logic
    2.1 History
    2.2 Fuzzy Control
    2.3 Summary Comments

II Basic Feedback Principles
  1 Dynamic Systems - "Equations of Motion"
  2 Linearization
    2.1 Feedback Linearization
    2.2 Small Signal Linearization
  3 Basic Concepts
    3.1 Laplace Transforms
      3.1.1 Basic Properties of Laplace Transforms
    3.2 Poles and Zeros
    3.3 Second Order Systems
      3.3.1 Step Response
    3.4 Additional Poles and Zeros
    3.5 Basic Feedback
    3.6 Sensitivity
    3.7 Generic System Tradeoffs
    3.8 Types of Control - PID
    3.9 Steady State Error and Tracking
  4 Appendix - Laplace Transform Tables

III Classical Control - Root Locus
  1 The Root Locus Design Method
    1.1 Introduction
    1.2 Definition of Root Locus
    1.3 Construction Steps for Sketching Root Loci
    1.4 Illustrative Root Loci
    1.5 Some Root Loci Construction Aspects
    1.6 Summary
  2 Root Locus - Compensation
    2.1 Lead Compensation
      2.1.1 Zero and Pole Selection
    2.2 Lag Compensation
      2.2.1 Illustration
    2.3 The "Stick on a Cart" example
    2.4 Extensions

IV Frequency Design Methods
  1 Frequency Response
  2 Bode Plots
    2.1 Stability Margins
    2.2 Compensation
      2.2.1 Bode's Gain-Phase Relationship
      2.2.2 Closed-loop frequency response
    2.3 Proportional Compensation
      2.3.1 Proportional/Differential Compensation
      2.3.2 Lead compensation
      2.3.3 Proportional/Integral Compensation
      2.3.4 Lag Compensation
  3 Nyquist Diagrams
    3.0.5 Nyquist Examples
    3.1 Stability Margins

V Digital Classical Control
  1 Discrete Control - Z-transform
    1.1 Continuous to Discrete Mapping
    1.2 ZOH
    1.3 Z-plane and dynamic response
  2 Root Locus control design
    2.1 Comments - Latency
  3 Frequency Design Methods
    3.1 Compensator design
    3.2 Direct Method (Ragazzini 1958)
  4 Z-Transform Tables

VI State-Space Control
  1 State-Space Representation
    1.1 Definition
    1.2 Continuous-time
    1.3 Linear Time Invariant Systems
    1.4 "Units" of F in physical terms
    1.5 Discrete-Time
      1.5.1 Example 2 - "analog computers"
    1.6 State Space vs. Classical Approach
    1.7 Linear Systems we won't study
    1.8 Linearization
    1.9 State-transformation
    1.10 Transfer Function
      1.10.1 Continuous System
      1.10.2 Discrete System
    1.11 Example - what transfer functions don't tell us
    1.12 Time-domain Solutions
    1.13 Poles and Zeros from the State-Space Description
    1.14 Discrete-Systems
      1.14.1 Transfer Function
      1.14.2 Relation between Continuous and Discrete
  2 Controllability and Observability
    2.1 Controller canonical form
    2.2 Duality of the controller and observer canonical forms
    2.3 Transformation of state space forms
    2.4 Discrete controllability or reachability
      2.4.1 Controllability to the origin for discrete systems
    2.5 Observability
    2.6 Things we won't prove
  3 Feedback control
    3.1 Zeros
    3.2 Reference Input
    3.3 Selection of Pole Locations
      3.3.1 Method 1: Dominant second-order poles
      3.3.2 Step Response Using State Feedback
      3.3.3 Gain/Phase Margin
      3.3.4 Method 2: Prototype Design
      3.3.5 Method 3: Optimal Control and Symmetric Root Locus
    3.4 Discrete Systems
  4 Estimator and Compensator Design
    4.1 Estimators
      4.1.1 Selection of Estimator Poles
    4.2 Compensators: Estimators plus Feedback Control
      4.2.1 Equivalent feedback compensation
      4.2.2 Bode/root locus
      4.2.3 Alternate reference input methods
    4.3 Discrete Estimators
      4.3.1 Predictive Estimator
      4.3.2 Current estimator
    4.4 Miscellaneous Topics
      4.4.1 Reduced order estimator
      4.4.2 Integral control
      4.4.3 Internal model principle
      4.4.4 Polynomial methods
  5 Kalman
  6 Appendix 1 - State-Space Control
  7 Appendix - Canonical Forms

VII Advanced Topics in Control
  1 Calculus of Variations - Optimal Control
    1.1 Basic Optimization
      1.1.1 Optimization without constraints
      1.1.2 Optimization with equality constraints
      1.1.3 Numerical Optimization
    1.2 Euler-Lagrange and Optimal control
      1.2.1 Optimization over time
      1.2.2 Miscellaneous items
    1.3 Linear system / ARE
      1.3.1 Another Example
    1.4 Symmetric root locus
      1.4.1 Discrete case
      1.4.2 Predictive control
      1.4.3 Other applied problems
  2 Dynamic Programming (DP) - Optimal Control
    2.1 Bellman
    2.2 Examples
      2.2.1 Routing
      2.2.2 City Walk
    2.3 General Formulation
    2.4 LQ Cost
    2.5 Hamilton-Jacobi-Bellman (HJB) Equations
      2.5.1 Example: LQR
      2.5.2 Comments
  3 Introduction to Stability Theory
    3.1 Comments on Non-Linear Control
      3.1.1 Design Methods
      3.1.2 Analysis Methods
    3.2 Describing Functions
    3.3 Equivalent gains and the Circle Theorem
    3.4 Lyapunov's Direct Method
      3.4.1 Example
      3.4.2 Lyapunov's Direct Method for a Linear System
      3.4.3 Comments
    3.5 Lyapunov's Indirect Method
      3.5.1 Example: Pendulum
    3.6 Other concepts worth mentioning
  4 Stable Adaptive Control
    4.1 Direct Versus Indirect Control
    4.2 Self-Tuning Regulators
    4.3 Model Reference Adaptive Control (MRAC)
      4.3.1 A General MRAC
  5 Robust Control
    5.1 MIMO systems
    5.2 Robust Stability
      5.2.1 Small gain theorem (linear version)
      5.2.2 Modeling Uncertainty
      5.2.3 Multiplicative Error
    5.3 2-port formulation for control problems
    5.4 H∞ control
    5.5 Other Terminology

VIII Neural Control
  1 Heuristic Neural Control
    1.1 Expert emulator
    1.2 Open loop / inverse control
    1.3 Feedback control - ignoring the feedback
    1.4 Examples
      1.4.1 Electric Arc furnace control
      1.4.2 Steel rolling mill
  2 Euler-Lagrange Formulation of Backpropagation
  3 Neural Feedback Control
    3.1 Recurrent neural networks
      3.1.1 Real Time Recurrent Learning (RTRL)
      3.1.2 Dynamic BP
    3.2 BPTT - Back Propagation through Time
      3.2.1 Derivation by Lagrange Multipliers
      3.2.2 Diagrammatic Derivation of Gradient Algorithms
      3.2.3 How do RTRL & BPTT Relate
  4 Training networks for Feedback Control
    4.1 Summary of methods for desired response
      4.1.1 Video demos of Broom balancing and Truck backing
      4.1.2 Other Misc. Topics
      4.1.3 Direct Stable Adaptive Control
      4.1.4 Comments on Controllability / Observability / Stability
  5 Reinforcement Learning
    5.1 Temporal Credit Assignment
      5.1.1 Adaptive Critic (Actor - Critic)
      5.1.2 TD algorithm
    5.2 Broom Balancing using Adaptive Critic
    5.3 Dynamic Programming Perspective
      5.3.1 Heuristic DP (HDP)
    5.4 Q-Learning
    5.5 Model Dependent Variations
    5.6 Relating BPTT and Reinforcement Learning
    5.7 LQR and Reinforcement Learning
  6 Reinforcement Learning II
    6.1 Review
    6.2 Immediate RL
      6.2.1 Finite Action Set
      6.2.2 Continuous Actions
    6.3 Delayed RL
      6.3.1 Finite States and Action Sets
    6.4 Methods of Estimating V (and Q)
      6.4.1 n-Step Truncated Return
      6.4.2 Corrected n-Step Truncated Return
      6.4.3 Which is Better: Corrected or Uncorrected?
      6.4.4 Temporal Difference Methods
    6.5 Relating Delayed-RL to DP Methods
    6.6 Value Iteration Methods
      6.6.1 Policy Iteration Methods
      6.6.2 Continuous Action Spaces
  7 Selected References on Neural Control
  8 Videos

IX Fuzzy Logic & Control
  1 Fuzzy Systems - Overview
  2 Sample Commercial Applications
  3 Regular Set Theory
  4 Fuzzy Logic
    4.1 Definitions
    4.2 Properties
    4.3 Fuzzy Relations
    4.4 Boolean Functions
    4.5 Other Definitions
    4.6 Hedges
  5 Fuzzy Systems
    5.1 A simple example of process control
    5.2 Some Strategies for Defuzzification
    5.3 Variations on Inference
    5.4 7 Rules for the Broom Balancer
    5.5 Summary Comments on Basic Fuzzy Systems
  6 Adaptive Fuzzy Systems
    6.1 Fuzzy Clustering
      6.1.1 Fuzzy logic viewed as a mapping
      6.1.2 Adaptive Product Space Clustering
    6.2 Additive Model with weights (Kosko)
    6.3 Example Ad-Hoc Adaptive Fuzzy
    6.4 Neuro-Fuzzy Systems
      6.4.1 Example: Sugeno Fuzzy Model
      6.4.2 Another Simple Example
  7 Neural & Fuzzy
    7.1 NN as design Tool
    7.2 NN for pre-processing
    7.3 NN corrective type
    7.4 NN for post-processing
    7.5 Fuzzy Control and NN system ID
  8 Fuzzy Control Examples
    8.1 Intelligent Cruise Control with Fuzzy Logic
    8.2 Fuzzy Logic Anti-Lock Brake System for a Limited Range Coefficient of Friction Surface
  9 Appendix - Fuzziness vs Randomness

X Exercises and Simulator
  HW 1 - Continuous Classical Control
  HW 2 - Digital Classical Control
  HW 3 - State-Space
  HW 4 - Double Pendulum Simulator - Linear SS Control
  Midterm
  Double Pendulum Simulator - Neural Control
  Double Pendulum Simulator Documents
    0.1 System description for double pendulum on a cart
    0.2 MATLAB files
ECE553 - Control Systems: Classical, Neural, and Fuzzy
Applications of modern control systems range from advanced aircraft, to process control in integrated circuit manufacturing, to fuzzy washing machines. The aim of this class is to integrate different trends in control theory.
Background and perspective are provided in the first half of the class through a review of basic classical techniques in feedback control (root locus, Bode, etc.) as well as state-space approaches (linear quadratic regulators, Kalman estimators, and an introduction to optimal control). We then turn to more recent movements at the forefront of control technology. Artificial neural network control is presented with emphasis on nonlinear dynamics, backpropagation-through-time, model reference control, and reinforcement learning. Finally, we cover fuzzy logic and fuzzy systems as a simple heuristic-based, yet often effective, alternative for many control problems.
Instructor:
Eric A. Wan
Room 581, Phone (503) 690-1164, Internet: ericwan@ece.ogi.edu.
Web Page:
http://www.ece.ogi.edu/~ericwan/ECE553/
Grading:
Homework-45%, Midterm-25%, Project-30%
Prerequisites:
Digital signal processing, statistics, MATLAB programming.
Required Text:
1. E. Wan. Control Systems: Classical, Neural, and Fuzzy, OGI Lecture Notes.
Recommended Texts:
(General Control)
1. G. Franklin, J. Powell, M. Workman. Digital Control of Dynamic Systems, 2nd ed. Addison-Wesley, 1990.
2. G. Franklin, J. Powell, A. Emami-Naeini. Feedback Control of Dynamic Systems, 3rd ed.
Addison-Wesley, 1994.
3. B. Friedland. Control Systems Design: An introduction to state-space methods, McGraw-Hill,
1986
4. C. Phillips, H. Nagle, Digital Control System analysis and design. Prentice-Hall, 1984.
5. B. Shahian, M. Hassul. Control Systems Design Using MATLAB. Prentice Hall, 1993.
6. T. Kailath. Linear Systems. Prentice Hall, 1980
(Optimal Control)
1. A. E. Bryson, Jr., Y. Ho. Applied Optimal Control: Optimization, Estimation, and Control. Hemisphere Pub. Corp., 1975.
2. E. Mosca. Optimal, Predictive, and Adaptive Control. Englewood Cliffs, NJ, Prentice Hall, 1995.
3. D. Kirk. Optimal Control Theory: An Introduction. Prentice Hall, 1970.
(Adaptive Control)
1. K. S. Narendra, A. M. Annaswamy. Stable adaptive systems. Prentice Hall, 1989.
2. G. Goodwin, K Sin, Adaptive Filtering, Prediction, and Control, Prentice-Hall, 1984.
3. S. Sastry, M. Bodson. Adaptive Control, Stability, Convergence, and Robustness. Prentice
Hall, 1989.
4. K. J. Åström, B. Wittenmark. Adaptive Control. Addison-Wesley, 1989.
(Robust / H∞ Control)
1. J. C. Doyle, B. A. Francis, A. R. Tannenbaum. Feedback Control Theory, Macmillan Pub. Co., 1992.
2. M. Green, D. Limebeer, Linear Robust Control, Prentice Hall, 1995.
(Neural and Fuzzy Control)
1. Handbook of intelligent control : fuzzy, neural, and adaptive approaches, edited by David
A. White, Donald A. Sofge, Van Nostrand Reinhold, 1992. (Strongly recommended but not
required.)
2. Neural Networks for Control, edited by W.T Miller, R. Sutton, P. Werbos. MIT Press, 1991.
3. Reinforcement Learning, R. Sutton, A. Barto, MIT Press, 1998.
4. Neuro-dynamic programming, Dimitri P. Bertsekas and John N. Tsitsiklis, Athena Scientic,
Belmont, 1997.
5. T. Hrycej. Neurocontrol: Towards an Industrial Control Methodology, Wiley, 1997.
6. B. Kosko, Neural Networks and Fuzzy Systems, Prentice Hall, 1992.
7. Essentials of Fuzzy Modeling and Control, R. Yager, D. Filev, John Wiley and Sons, 1994.
8. E. Cox, The Fuzzy Systems Handbook: A Practitioner's Guide to Building, Using, and Maintaining Fuzzy Systems, Academic Press, 1994.
9. Fuzzy Logic Technology & Applications, edited by R. J. Marks II, IEEE Press, 1994.
10. F. M. McNeill and E. Thro. Fuzzy Logic: A practical approach, Academic Press, 1994.
11. Fuzzy Sets and Applications, edited by R.R. Yager, John Wiley and Sons, New York, 1987.
Tentative Schedule
Week 1
{ Introduction, Basic Feedback Principles
Week 2
{ Bode Plots, Root Locus, Discrete Systems
Week 3
{ State Space Methods, Pole Placement, Controllability and Observability
Week 4
{ Optimal Control and LQR, Kalman filters and LQG
Week 5
{ Dynamic Programming, Adaptive Control or Intro to Robust Control
Week 6
{ Neural Networks, Neural Control Basics
Week 7
{ Backpropagation-through-time, Real-time recurrent learning, Relations to optimal control
Week 8
{ Reinforcement Learning, Case studies
Week 9
{ Fuzzy Logic, Fuzzy Control
Week 10
{ Neural-Fuzzy, Other
Finals Week
{ Class Presentations
Part I
Introduction
What is "control"? Make x do y.
Manual control: human-machine interface, e.g., driving a car.
Automatic control (our interest): machine-machine interface (thermostat, moon landing, satellites, aircraft control, robotics control, disk drives, process control, bioreactors).
0.1 Basic Structure
Feedback control:
This is the basic structure that we will be using. In the digital domain, we must add D/A and
A/D converters:
where ZOH (zero-order hold) is associated with the D/A converter, i.e., the value of the input is held constant until the next value is available.
0.2 Classical Control
What is "classical" control? From one perspective, it is any type of control that is non-neural or non-fuzzy. More conventionally, it is control based on the use of transfer functions, G(s) (continuous) or G(z) (discrete), to represent linear differential equations.
If G(z) = b(z)/a(z), then the roots of a(z) (referred to as poles) and the roots of b(z) (referred to as zeros) determine the open-loop dynamics. The closed-loop dynamic response of the system is:

y/r = DG / (1 + DG)    (1)
What is the controller, D(z )? Examples:
k - (proportional control)
ks - (differential control)
k/s - (integral control)
PID (combination of above)
lead/lag
Classical Techniques:
Root locus
Nyquist
Bode plots
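As a concrete illustration of the closed-loop relation in Eq. (1), the minimal MATLAB sketch below closes a unity-feedback loop around an arbitrarily chosen plant with a proportional controller D = k. The plant 1/(s(s+2)) and the gains are illustration choices only, and the Control System Toolbox functions tf, feedback, and step are assumed available.

```matlab
% Sketch: closed-loop response y/r = DG/(1+DG) with proportional control D = k
G = tf(1, [1 2 0]);          % hypothetical plant G(s) = 1/(s(s+2))
hold on
for k = [1 10 100]
    step(feedback(k*G, 1));  % unity-feedback closed loop for each gain
end
legend('k = 1', 'k = 10', 'k = 100')
```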
0.3 State-Space Control
State-space, or "modern", control returns to the use of differential equations:

ẋ = Ax + bu    (2)
y = Hx         (3)

Any set of linear differential equations can be put in this standard form, where x is the state vector.
Control law: given x, apply the control u = -kx + r
In large part, state-space control design involves finding the control law k. Common methods include: 1) selecting a desired set of closed-loop pole locations, or 2) optimal control, which involves the solution of some cost function. For example, the solution of the LQR (linear quadratic regulator) problem is expressed as the k which minimizes

∫₀^∞ (xᵀQx + uᵀRu) dt

and addresses the optimal trade-off between tracking performance and control effort.
We can also address issues of:
controllability
observability
stability
MIMO
How to find the state x? Build an estimator. With noise (stochastic disturbances), we will look at Kalman estimators, which lead to the LQG (Linear Quadratic Gaussian) formulation.
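A minimal MATLAB sketch of this design flow, assuming the Control System Toolbox and an arbitrarily chosen double-integrator plant (not an example from the notes): lqr returns the gain that minimizes the quadratic cost above.

```matlab
A = [0 1; 0 0];  B = [0; 1];     % hypothetical plant: double integrator
Q = diag([1 0.1]);  R = 1;       % weights on state and control effort
K = lqr(A, B, Q, R);             % minimizes the integral of x'Qx + u'Ru
cl = ss(A - B*K, B, eye(2), 0);  % closed loop under u = -Kx
initial(cl, [1; 0])              % regulator response from x(0) = [1; 0]
```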
0.4 Advanced Topics
0.4.1 Dynamic programming
Used to solve for general control law u which minimizes some cost function. This may be used, for
example, to solve the LQR problem, or some nonlinear u = f (x) where the system to be controlled
is nonlinear. Also used for learning trajectory control (terminal control).
Typical classes of control problems:
a) Regulators (fixed-point control) or tracking.
b) Terminal control is concerned with getting from point a to point b. Examples: a robotic manipulator, path planning, etc.
Terminal control is often used in conjunction with regulator control, where we first plan an optimal
path and then use other techniques to track that path.
Variations:
a) Model predictive control: This is similar to LQR (infinite horizon), but you use a finite horizon and re-solve using dynamic programming at each time step: ∫ₜ^(t_N) (u² + Qy²) dt. Also called the receding horizon problem. Popular in process control.
b) Time-optimal control: Getting from point a to point b in minimum time, e.g., solving the optimal trajectory for an airplane to get to a desired location. It is not a steady ascent: the optimal strategy is to climb to an intermediate elevation, level off, and then swoop to the desired height.
c) Bang-bang control: With control constraints, time-optimal control may lead to bang-bang control
(hard on / hard off).
0.4.2 Adaptive Control
Adaptive control may be used when the system to be controlled is changing with time. It may also be used with nonlinear systems when we still want to use linear controllers (for different operating regimes). In its basic form, adaptive control is simply gain scheduling: switching in pre-determined control parameters.
Methods include: self-tuning regulators and model reference control
0.4.3 Robust Control / H∞
Robust control deals with the ability of a system to work under uncertainty and encompasses advanced mathematical methods for dealing with it. More formally, it minimizes the maximum singular value of the discrepancy between the closed-loop transfer function matrix and the desired loop shape, subject to a closed-loop stability constraint. It is a return to transfer function methods for MIMO systems while still utilizing state-space techniques.
0.5 History of Feedback Control
Antiquity - Water clocks, level control for wine making, etc. (which have now become modern flush toilets)
1624, Drebble - Incubator (The sensor consisted of a riser filled with alcohol and mercury. As the fire heats up the box, the alcohol expands and the riser floats up, lowering the damper on the flue.)
1788, Watt - Flyball governor
1868, Maxwell - Flyball stability analysis (differential equations → linearization → roots of the "characteristic equation" must be negative). 2nd and 3rd order systems.
1877, Routh - General test for stability of higher order polynomials.
1890, Lyapunov - Stability of non-linear differential equations (introduced to state-space
control in 1958)
1910, Sperry - Gyroscope and autopilot control
1927, Black - Feedback amplifier; Bush - Differential analyzer (necessary for long-distance telephone communications)
1932, Nyquist stability Criterion
1938, Bode - Frequency response methods.
1936 - PID control methods
1942, Wiener - Optimal filter design (control plus stochastic processes)
1947 - Sampled data systems
1948, Evans - Root locus (developed for guidance control of aircraft)
1957, Bellman - Dynamic Programming
1960, Kalman - Optimal Estimation
1960, State-space or "modern" control (this was motivated by work on satellite control, and was a return to ODEs)
1960's - MIMO state-space control, adaptive control
1980's - Zames, Doyle - Robust Control
1 Neural Control
Neural control (as well as fuzzy control) was developed in part as a reaction by practitioners to the perceived excessive mathematical intensity and formalism of "classical" control. Although neural control techniques have been inappropriately used in the past, the field of neural control is now starting to mature.
Early systems were open loop: C ≈ P⁻¹ (the controller approximates the plant inverse).
Feedback control with neural networks:
Train by back-propagation-through-time (BPTT) or real-time-recurrent-learning (RTRL).
May have similar objectives and cost functions (as in LQR, minimum-time, model-reference, etc.) as classical control, but solutions are iterative and approximate.
"Approximately optimal control" - optimal control techniques and strategies constrained to
be approximate by nature of training and network architectural constraints.
Often little regard for "dynamics" - few theoretical results regarding stability, controllability,
etc.
Some advantages:
MIMO handled as readily as SISO
Can handle non-linear systems
Generates good results in many problems
1.0.1 Reinforcement Learning
A related area of neural network control is reinforcement learning ("approximate dynamic programming")
Dynamic programming allows us to do a stepwise cost-minimization in order to solve a more
complicated trajectory optimization. Bellman Optimality Condition: An optimal policy has the
property that whatever the initial state and the initial decision are, the remaining decisions must
constitute an optimal policy with regard to the state resulting from the first decision.
Short-term, or step-wise, optimization of J leads to the long-term optimal choice of u(k). (Unfortunately, the problem grows exponentially with the number of variables.)
To minimize some utility function along the entire path (to find the control u(k)), we need only minimize over individual segments:

J(x_{k-1}) = min_{u_k} [ r(x_k, u_k) + J(x_k) ]
In general, dynamic programming problems can be extremely difficult to solve. In reinforcement learning, an adaptive critic is used to get an approximation of J.
(These methods are more complicated than other neural control strategies, are hard to train, and have many unresolved issues, but they offer great promise for future performance.)
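The recursion above can be exercised directly on a small problem. The sketch below runs value iteration on a made-up four-node routing graph (all link costs invented for illustration); J converges to the optimal cost-to-go from each node.

```matlab
big = 1e6;                       % stands in for "no direct link"
C = [big   1    4  big;          % C(i,j): cost of the link from node i to j
     big  big   2    6;
     big  big  big   3;
     big  big  big  big];        % node 4 is the destination
J = [big big big 0];             % cost-to-go estimate, zero at the destination
for sweep = 1:4                  % a few sweeps suffice for this acyclic graph
    for x = 1:3
        J(x) = min(C(x,:) + J);  % Bellman update: minimize over successors
    end
end
disp(J)                          % optimal cost-to-go: [6 5 3 0]
```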
1.1 History of Neural Networks and Neural Control
1943, McCulloch & Pitts - Model of artificial neuron
1949, Hebb - simple learning rule
1957, Rosenblatt - Perceptron
1957, Bellman - Dynamic Programming (origins of reinforcement learning, 1960 - Samuel's
learning checker game)
1959, Widrow & Hoff - LMS and Adalines (1960 - "Broom Balancer" control)
1967-1982 "Dark Ages"
1983 Barto, Anderson, Sutton - Adaptive Heuristic Critics
1986 Rumelhart, McClelland, et al. - Backpropagation (1974 - Werbos, 1982 - Parker)
1989 Nguyen, Werbos, Jordan, etc. - Backpropagation-Through-Time for control
1990's Increased sophistication, applications, some theory; relation to "classical" control recognized.
2 Fuzzy Logic
2.1 History
1965, L. Zadeh - Fuzzy Sets
1974, Mamdani & Assilian - Steam engine control using fuzzy logic, other examples.
1980's Explosion of applications from Japan (fuzzy washing machines).
1990's Adaptive Fuzzy Logic, Neural-Fuzzy, etc.
2.2 Fuzzy Control
1. No regard for dynamics, stability, or mathematical modeling
2. Simple to design
3. Combines heuristic "linguistic" rule-based knowledge with smooth control
4. Elegant "gain scheduling" and interpolation
Fuzzy logic makes use of "membership functions" which introduce an element of ambiguity, e.g.,
the "degree" to which we may consider zero to be zero.
Because of its use of intuitive rules, fuzzy logic can be used appropriately in a wide range of control
problems where mathematical precision is not important, but it is also often misused. Newer
methods try to incorporate ideas from neural networks to adapt the fuzzy systems (neural-fuzzy).
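As a small illustration (the breakpoints are arbitrary choices, not from the notes), a triangular membership function for the fuzzy set "approximately zero" might be written in MATLAB as:

```matlab
mu_zero = @(x) max(0, 1 - abs(x));   % degree of membership: 1 at x = 0, falling to 0 at |x| = 1
x = -2:0.01:2;
plot(x, mu_zero(x)), xlabel('x'), ylabel('membership in "zero"')
```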
2.3 Summary Comments
Classical (and state-space) control is a mature field that utilizes rigorous mathematics and exact
modeling to exercise precise control over the dynamic response of a system.
Neural control overlaps much with classical control. It is less precise in its formulation yet may
yield better performance for certain applications.
Fuzzy control is less rigorous, but is a simple approach which generates adequate results for many
problems.
There is a place and appropriate use for all three methods.
Part II
Basic Feedback Principles
This handout briefly describes some fundamental concepts in feedback control.
1 Dynamic Systems - "Equations of Motion"

Where do system equations come from?

Mechanical Systems

Rotational Systems: T = Iα, i.e., (torque) = (moment of inertia) × (angular acceleration)
- Satellite
- Pendulum
- "Stick on cart" / Inverted Pendulum (linearized equations)
- Boeing Aircraft

Electrical Circuits

Electro-mechanical

Heat Flow

Incompressible Fluid Flow

Bio-reactor

[Figures: diagrams and equations of motion for each of the example systems above]
2 Linearization
Consider the pendulum system shown.

[Figure: pendulum of mass m and length l, with gravity component (mg)sin θ]

τ = ml²θ̈ + mgl sin θ    (4)

Two methods are typically used in the linear approximation of non-linear systems.
2.1 Feedback Linearization
τ = mgl sin θ + u    (5)

then

ml²θ̈ = u

Linear always!
This method of linearization is used in Robotics for manipulator control.
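A minimal simulation sketch of this idea for the pendulum of Eq. (4): the applied torque cancels the gravity term, so any linear law for the new input u acts on a purely linear system. The parameters and the PD gains below are arbitrary (plain MATLAB, using ode45).

```matlab
m = 1; g = 9.8; l = 1;                          % made-up parameters
u   = @(th, thd) -2*th - thd;                   % any linear law for the new input u (PD here)
tau = @(th, thd) m*g*l*sin(th) + u(th, thd);    % feedback-linearizing torque, Eq. (5)
f   = @(t, x) [x(2); (tau(x(1), x(2)) - m*g*l*sin(x(1))) / (m*l^2)];  % Eq. (4) solved for thetadd
[t, x] = ode45(f, [0 10], [pi/2; 0]);           % responds exactly like ml^2*thetadd = u
plot(t, x(:,1))
```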
2.2 Small Signal Linearization
θ̈ = −sin θ, for m = g = l = 1    (6)

θ̈ is a function of θ and θ̇: f(θ, θ̇). Using a Taylor expansion,

f(θ, θ̇) = f(0, 0) + (∂f/∂θ)|_(0,0) θ + (∂f/∂θ̇)|_(0,0) θ̇ + ... higher order terms    (7)

f(0, 0) = 0 at the equilibrium point. So

θ̈ = 0 + (−cos θ)|_(0,0) θ = −θ    (8)

and for an inverted pendulum (linearizing about the upright position)

θ̈ = 0 + (−cos θ)|_(π,0) θ = +θ

These linearized system models should then be tested on the original system to check the acceptability of the approximation.
Linearization does not always work, as can be seen from the systems below. Consider ẏ₁ = y₁³ versus ẏ₂ = −y₂³.

[Figure: y₁(t) diverges while y₂(t) decays to zero]

Linearizing both systems yields ẏᵢ = 0. However, this is not correct, since it would imply that both systems have similar responses.
3 Basic Concepts
3.1 Laplace Transforms
F(s) = ∫₀^∞ f(t) e^(−st) dt    (9)
The inverse is usually computed by factorization and transformation to the time domain.
f(t)              L[ f(t) ]
δ(t)              1
1(t)              1/s
t                 1/s²
e^(−at)           1/(s+a)
t e^(−at)         1/(s+a)²
sin(at)           a/(s²+a²)
cos(at)           s/(s²+a²)
e^(−at) cos(bt)   (s+a)/((s+a)²+b²)

Table 1: Some Standard Laplace Transforms
3.1.1 Basic Properties of Laplace Transforms
Convolution

y(t) = h(t) * u(t) = ∫_(−∞)^(∞) h(τ) u(t−τ) dτ    (10)

Y(s) = H(s) U(s)    (11)

Derivatives

ẏ ↔ sY(s) − y(0)    (12)

y^(n) + a_(n−1) y^(n−1) + ... + a₁ẏ + a₀y = b_m u^(m) + b_(m−1) u^(m−1) + ... + b₀u    (13)

Y(s) = [ b_m s^m + b_(m−1) s^(m−1) + ... + b₀ ] / [ s^n + a_(n−1) s^(n−1) + ... + a₀ ] U(s)    (14)

Y(s)/U(s) = H(s) = b(s)/a(s)    (15)

for y(0) = 0. H(s) is the transfer function.
Example:
A two-mass system represented by two second-order coupled equations:

ẍ + (b/m₁)(ẋ − ẏ) + (k/m₁)(x − y) = U/m₁
ÿ + (b/m₂)(ẏ − ẋ) + (k/m₂)(y − x) = 0

s²X(s) + (b/m₁)(sX(s) − sY(s)) + (k/m₁)(X(s) − Y(s)) = U(s)/m₁
s²Y(s) + (b/m₂)(sY(s) − sX(s)) + (k/m₂)(Y(s) − X(s)) = 0

After some algebra we find that

Y(s)/U(s) = (bs + k) / [ (m₁s² + bs + k)(m₂s² + bs + k) − (bs + k)² ]
3.2 Poles and Zeros
Consider the system

H(s) = (2s + 1)/(s² + 3s + 2)

b(s) = 2(s + 1/2) → zero: −1/2
a(s) = (s + 1)(s + 2) → poles: −1, −2

[Figure: pole-zero plot with poles at −1 and −2 and a zero at −1/2]

H(s) = −1/(s + 1) + 3/(s + 2),   h(t) = −e^(−t) + 3e^(−2t)
The response can be empirically estimated by an inspection of the pole-zero locations. For
stability, poles have to be in the Left half of the s-Plane (LHP). Control is achieved by manipulating
the pole-zero locations.
3.3 Second Order Systems
H(s) = ωn² / (s² + 2ζωn s + ωn²)    (16)

ζ = damping ratio
ωn = natural frequency

s = −σ ± jωd,   σ = ζωn,   ωd = ωn√(1 − ζ²)

ζ = 0 → oscillation
ζ = 1 → smooth damping to final level
ζ < 1 → some oscillation; can realize a faster response
3.3.1 Step Response
Step Response = H(s)/s    (17)

y(t) = 1 − e^(−σt) ( cos ωd t + (σ/ωd) sin ωd t )    (18)

Rise time:     tr ≈ 1.8/ωn    (19)
Overshoot:     Mp ≈ 1 − ζ/0.6,   for 0 ≤ ζ ≤ 0.6    (20)
Settling time: ts = 4.6/σ    (21)

Typical values are:
Mp = 0.16 for ζ = 0.5
Mp = 0.04 for ζ = 0.707

[Figures: second-order step responses and the time-domain specifications tr, Mp, ts]
Design Specifications

For specified tr, Mp, and ts:

ωn ≥ 1.8/tr    (22)
ζ ≥ 0.6(1 − Mp)    (23)
σ ≥ 4.6/ts    (24)

Pole Locations

[Figure: regions of the s-plane satisfying a given σ, a given damping ζ, and a given frequency ωn]
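A quick numerical check of these rules of thumb (Control System Toolbox assumed; the particular ζ and ωn are arbitrary):

```matlab
zeta = 0.5;  wn = 2;  sigma = zeta*wn;
tr_est = 1.8/wn                               % rise-time estimate, Eq. (19)
Mp     = exp(-pi*zeta/sqrt(1 - zeta^2))       % exact 2nd-order overshoot (about 0.16 here)
ts_est = 4.6/sigma                            % settling-time estimate, Eq. (21)
step(tf(wn^2, [1 2*zeta*wn wn^2]))            % compare the estimates against the response
```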
3.4 Additional Poles and Zeros
H₁(s) = 2/((s + 1)(s + 2)) = 2/(s + 1) − 2/(s + 2)

H₂(s) = 2(s + 1.1)/(1.1(s + 1)(s + 2)) = 0.18/(s + 1) + 1.64/(s + 2)

Here, a zero has been added near one of the poles. The 1.1 factor in the denominator adjusts the DC gain, as can be seen from the Final Value Theorem:

lim_(t→∞) y(t) = lim_(s→0) sY(s)    (25)

Note, the zero at −1.1 almost cancels the influence of the pole at −1. As s → 0, the DC gain is higher by a factor of 1.1, hence the factor in the denominator above.

- As a zero approaches the origin → increased overshoot.
- A zero in the RHP results in a non-minimum phase system and a direction reversal in the time response.
- Poles are dominated by those nearest the origin.
- Poles/zeros can "cancel" in the LHP relative to the step response, but may still affect initial conditions.
[Figure: step responses of (zs+1)/(s²+s+1) for several values of the added zero z]

[Figure: step responses of 1/((ps+1)(s²+s+1)) for several values of the added pole p]
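Plots of this kind can be regenerated with the short sketch below (Control System Toolbox assumed); the particular zero and pole values are arbitrary choices.

```matlab
figure, hold on
for z = [0 1 2 4]                                  % added zero at s = -1/z (z = 0: none)
    step(tf([z 1], [1 1 1]), 14)
end
title('(zs+1)/(s^2+s+1)')
figure, hold on
for p = [0 0.5 1 2]                                % added pole at s = -1/p (p = 0: none)
    step(tf(1, conv([p 1], [1 1 1])), 15)
end
title('1/((ps+1)(s^2+s+1))')
```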
3.5 Basic Feedback
[Figure: unity feedback loop - reference r, error e, gain k, plant H(s), output y]

E(s) = R(s) − Y(s)    (26)
Y(s) = k H(s) E(s)    (27)
Y(s) = k H(s) (R(s) − Y(s))    (28)
(1 + kH(s)) Y(s) = kH(s) R(s)    (29)

Y(s)/R(s) = kH(s)/(1 + kH(s)),   E(s)/R(s) = 1/(1 + kH(s))    (30)

With k large,   kH(s)/(1 + kH(s)) → 1    (31)
We need to consider the system dynamics. Large k may make the system unstable.

Example:

H(s) = 1/(s(s + 2))

The closed-loop characteristic equation: s² + 2s + k = 0

[Figure: closed-loop pole locations and step responses under proportional feedback, k = 1, 10, 100]

Example 2:

H(s) = 1/( s[(s + 4)² + 16] )

The closed-loop characteristic equation: s[(s + 4)² + 16] + k = 0

[Figure: closed-loop pole locations and step responses under proportional feedback, k = 1, 100, 275]
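The two examples can be reproduced with the following sketch (Control System Toolbox assumed). For the second plant, the Routh test on s³ + 8s² + 32s + k = 0 gives stability only for k < 256, so the k = 275 response grows.

```matlab
H1 = tf(1, [1 2 0]);                  % 1/(s(s+2))
H2 = tf(1, [1 8 32 0]);               % 1/(s[(s+4)^2+16]) = 1/(s^3+8s^2+32s)
figure, hold on
for k = [1 10 100],  step(feedback(k*H1, 1)), end
figure, hold on
for k = [1 100 275], step(feedback(k*H2, 1)), end   % k = 275 exceeds the Routh limit k = 256
```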
3.6 Sensitivity
S_X^Y ≜ (∂Y/Y)/(∂X/X) = (X/Y) ∂Y/∂X    (32)

S_X^Y is the "sensitivity of Y with respect to X". It denotes relative changes in Y due to relative changes in X.

H(s) = kG(s)/(1 + kG(s))    (33)

S_K^H = (K/H) ∂H/∂K = 1/(1 + KG) → 0 for |KG| large    (34)

Thus we see that feedback reduces the parameter sensitivity. Example:

[Figure: open-loop speed control - reference r, precompensator 1/k₀, motor k₀/(s+1), shaft speed Ω]

By the Final Value Theorem, the final speed is Ω = 1. But suppose we are off in the plant gain, k₀ → k₀ + δk₀; then

output = (1/k₀)(k₀ + δk₀) = 1 + δk₀/k₀    (35)

Thus, a 10% error in k₀ ⇒ 10% error in speed control.

[Figure: feedback speed control - gain k in series with the motor k₀/(s+1), unity feedback]
The closed-loop transfer function is

[ k k₀/(s+1) ] / [ 1 + k k₀/(s+1) ] = k k₀ / (s + 1 + k k₀)    (36)

Final value = k k₀/(1 + k k₀)    (37)

Set the reference prefilter k₂ = (1 + k k₀)/(k k₀) so that the nominal final value is 1.

[Figure: feedback loop with prefilter k₂, gain k, and plant k₀/(s+1)]

Sensitivity for k₀ → k₀ + δk₀:

Final value = k₂ · k(k₀ + δk₀)/(1 + k(k₀ + δk₀))    (38)
            = k₂ [ 1 − 1/(1 + k(k₀ + δk₀)) ]    (39)

Using 1/(1 + x) ≈ 1 − x,    (40)-(41)

Final value ≈ 1 + [ 1/(1 + k k₀) ] (δk₀/k₀)    (42)

so for k large → reduced sensitivity.
Example:
Integral Control

[Figure: feedback loop with integral controller k/s and plant k₀/(s+1)]

[ k k₀/(s(s+1)) ] / [ 1 + k k₀/(s(s+1)) ] = k k₀ / (s(s+1) + k k₀)    (43)

Final value:

lim_(s→0) s · (1/s) · k k₀/(s(s+1) + k k₀) = 1    (44)

Zero steady-state error - always!

Proportional Control

[Figure: root locus under proportional control - faster]

Increasing k increases the speed of response.

Integral Control

[Figure: root locus under integral control - slower response, overshoot]

Slower response, but more oscillations.
3.7 Generic System Tradeoffs

[Figure: feedback loop with controller D and plant G, a disturbance w entering at the plant input, and sensor noise γ added at the output]

For example, D(s) = k (proportional control).

Y(s) = [DG/(1 + DG)] R(s) + [G/(1 + DG)] W(s) − [DG/(1 + DG)] Γ(s)    (45)

E(s) = [1/(1 + DG)] R(s) − [G/(1 + DG)] W(s) − [1/(1 + DG)] Γ(s)    (46)

Disturbance rejection:     |DG| ≫ 1
Good tracking (E small):   |DG| ≫ 1
Noise immunity:            |DG| small
3.8 Types of Control - PID
Proportional

D(s) = k;   u(t) = k e(t)    (47)

Integral

D(s) = k/(T_I s);   u(t) = (k/T_I) ∫₀ᵗ e(t) dt    (48)

⇒ Zero steady-state error (may slow down the dynamics)

Derivative

D(s) = k T_D s    (49)

→ increases damping and improves stability. Example:

[Figure: mass M with damping]

G(s) = 1/(s² + as) = 1/(s(s + a))

[Figure: open-loop poles at 0 and −a]

With derivative control:

[Figure: pole-zero cancellation - the closed-loop pole is trapped near the cancelled pole]

But what about initial conditions?

ÿ + aẏ = u(t)

s²Y(s) − s y(0) + a(sY(s) − y(0)) = U(s)   (taking ẏ(0) = 0)

Y(s) = [(s + a)/(s(s + a))] y(0) + U(s)/(s(s + a))

[Figure: feedback loop with derivative control ks around 1/(s(s+a)), with the initial condition entering as (s+a)y₀ (or y₀/s)]

Relative to the initial value,

Y(s) = [ y(0)/s ] / ( 1 + k/(s + a) )

lim_(t→∞) y(t) = lim_(s→0) s · [ y(0)/s ] / ( 1 + k/(s + a) ) → y(0)/(1 + k/a) ≠ 0

Thus, the effect due to initial conditions is not negligible.

PID - Proportional-Integral-Derivative Control

kP + kD s + kI/s

Parameters are often tuned using "Ziegler-Nichols PID tuning".
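A minimal discrete-time PID loop written out directly (the gains, step size, and double-integrator plant are all arbitrary illustration choices, not Ziegler-Nichols values); this runs in plain MATLAB:

```matlab
kP = 4; kI = 0.5; kD = 3;  dt = 0.01;      % made-up gains and step size
x = [0; 0];  r = 1;  ei = 0;  ep = r - x(1);
for n = 1:2000
    e  = r - x(1);
    ei = ei + e*dt;                        % running integral of the error
    ed = (e - ep)/dt;  ep = e;             % finite-difference derivative of the error
    u  = kP*e + kI*ei + kD*ed;             % u = kP e + kI*int(e) + kD*de/dt
    x  = x + dt*[x(2); u];                 % Euler step of the plant yddot = u
    y(n) = x(1);
end
plot((1:2000)*dt, y)
```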
Figure 1: Unity feedback loop - reference r, error e, controller D, plant G, output y.
3.9 Steady State Error and Tracking
The reference input to a control system is often of the form:

r(t) = (t^k / k!) 1(t)    (50)

R(s) = 1/s^(k+1)    (51)
In most cases, the reference input will not be a constant but can be approximated as a linear
function of time for a time span long enough for the system to reach steady state. The error at
this point of time is called the steady-state error.
The type of input to the system depends on the value of k, as follows:
k=0 implies a step input (position)
k=1 implies a ramp input (velocity)
k=2 implies a parabolic input (acceleration)
The steady-state error of a feedback control system is defined as:

e∞ ≜ lim_(t→∞) e(t)    (52)

e∞ = lim_(s→0) s E(s)    (53)

where E(s) is the Laplace transform of the error signal and is defined as:

E(s) = [ 1/(1 + D(s)G(s)) ] R(s)    (54)

E(s) = [ 1/(1 + D(s)G(s)) ] (1/s^(k+1))    (55)

e∞ = lim_(s→0) (1/s^k) [ 1/(1 + D(s)G(s)) ] = 0, ∞, or a constant    (56)

Thus, the steady-state error depends on the reference input and the loop transfer function.

System Type

The system type is defined as the order k for which e∞ is a finite, nonzero constant. This also equals the number of open-loop poles at the origin. Example:

D(s)G(s) = k(1 + 0.5s)/(s(1 + s)(1 + 2s)) is of type 1    (57)

D(s)G(s) = k/s³ is of type 3    (58)
Steady-state error of a system with a step-function input (k = 0)

Type 0 system:

e∞ = 1/(1 + DG(0)) = 1/(1 + Kp)    (59)

where Kp is the DC gain of the loop transfer function (the step-error constant), defined as:

Kp = lim_(s→0) D(s)G(s)    (60)

Type 1 or higher system:

e∞ = 0    (61)

(i.e., DG(0) = ∞, due to the pole at the origin)

Steady-state error of a system with a ramp-function input (k = 1)

Type 0 system:

e∞ = ∞    (62)

Type 1 system:

e∞ = lim_(s→0) 1/( s[1 + DG(s)] ) = lim_(s→0) 1/( s DG(s) ) = 1/Kv    (63)

where Kv is called the velocity constant.

Type 2 or higher system:

e∞ = 0    (64)
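The error constants can be checked numerically (Control System Toolbox assumed); here the type-1 loop of Eq. (57) is used with an arbitrary gain k = 2.

```matlab
k  = 2;
DG = tf(k*[0.5 1], conv([1 1 0], [2 1]));    % k(1+0.5s)/(s(1+s)(1+2s)), Eq. (57)
Kp = dcgain(DG)                               % infinite, since the loop is type 1
Kv = dcgain(tf([1 0], 1) * DG)                % velocity constant: lim s*DG(s) = k
% steady-state errors: step -> 1/(1+Kp) = 0,  ramp -> 1/Kv = 1/k
```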
4 Appendix - Laplace Transform Tables
[Tables of Laplace transform pairs (copied figures)]
Part III
Classical Control - Root Locus
1 The Root Locus Design Method
1.1 Introduction
The poles of the closed-loop transfer function are the roots of the characteristic equation,
which determine the stability of the system.
Studying the behavior of the roots of the characteristic equation will reveal stability and
dynamics of the system, keeping in mind that the transient behavior of the system is also
governed by the zeros of the closed-loop transfer function.
An important study in linear control systems is the investigation of the trajectories of the
roots of the characteristic equation (or, simply, the root loci) when a certain system parameter varies.
The root locus is a plot of the closed-loop pole locations in the s-plane (or z-plane).
It provides an insightful method for understanding how changes in the gain of the system
feedback influence the closed-loop pole locations.
Figure 2: Unity feedback loop with controller KD(s) and plant G(s).
1.2 Definition of Root Locus

The root locus is defined as a plot of the solution of:

1 + kD(s)G(s) = 0    (65)

We can think of D(s)G(s) as having the following form:

D(s)G(s) = b(s)/a(s)    (66)

Then with feedback,

kD(s)G(s) / (1 + kD(s)G(s)) = k b(s) / (a(s) + k b(s))    (67)

The zeros of the open-loop system do not move.
The poles move as a function of k.
The root locus of D(s)G(s) may also be defined as the locus of points in the s-plane where the phase of D(s)G(s) is 180°. This is seen by noting that 1 + kD(s)G(s) = 0 requires kD(s)G(s) = −1, which implies that the phase of D(s)G(s) is 180°.
At any point on the s-plane,

∠G = ∠(due to zeros) − ∠(due to poles)    (68)
Example:
For example, consider the open-loop transfer function:

(s + 1) / ( s[(s + 2)² + 4](s + 5) )

The pole-zero plot is shown below:

[Figure: poles at 0, −2 ± 2j, −5 and a zero at −1, with vectors drawn to the test point s₀]

- Assign a test point s₀ = −1 + 2j.
- Draw vectors from the poles and zeros to the point s₀.
- If s₀ is indeed a point on the root locus, then Equation 68 must equal −180°.

∠G = ψ₁ − (φ₁ + φ₂ + φ₃ + φ₄)
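The phase test can also be carried out numerically. The sketch below evaluates ∠G at the test point above (plain MATLAB, no toolbox needed):

```matlab
s0 = -1 + 2j;                                        % test point
G  = (s0 + 1) / (s0 * ((s0 + 2)^2 + 4) * (s0 + 5));  % open-loop transfer function at s0
angle(G) * 180/pi    % equals -180 (mod 360) only if s0 lies on the root locus
```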
1.3 Construction Steps for Sketching Root Loci
Step #1: Mark the poles and zeros of the open-loop transfer function.
Step #2: Plot the root locus due to poles on the real axis.
- The root locus lies to the left of an odd number of real poles and real zeros.
- A single pole or a single zero on the real axis introduces a phase shift of 180°, and since the left half of the s-plane is taken, the phase shift becomes negative.
- Further, a second pole or a second zero on the real axis will introduce an additional phase shift of 180°, making the overall phase shift equal to 360°. Hence, the region is chosen accordingly.
- Here are a few illustrations:

[Figure: real-axis segments of the root locus for several pole-zero patterns]
Step #3: Plot the asymptotes of the root loci.
- Asymptotes give the behavior of the root loci as k → ∞.
- For 1/k + G(s) = 0, as k → ∞, G(s) must approach 0. Typically this happens at the zeros of G(s), but it can also take place if there are more poles than zeros. We know that b(s) has order m and a(s) has order n; i.e., let:

  b(s)/a(s) = (s^m + b₁s^(m−1) + ...) / (s^n + a₁s^(n−1) + ...)    (69)

  So if n > m, then as s → ∞, G(s) → 0. That is, as s becomes large, the poles and zeros approximately cancel each other. Thus

  1 + kG(s) ≈ 1 + k/(s − α)^(n−m)    (70)

  α = ( Σpᵢ − Σzᵢ ) / (n − m)    (71)

  where pᵢ = poles, zᵢ = zeros, and α is the centroid.
- There are n − m asymptotes, where n is the number of poles and m is the number of zeros, and since ∠G(s) = 180°, we have:

  (n − m) φₗ = 180° + (l) 360°    (72)

  For instance, n − m = 3 ⇒ φₗ = 60°, 180°, 300°.
- The φₗ give the angles along which the asymptotes go. Note that the angle of departure may be different and will be looked into shortly.
- The asymptotes of the root loci intersect the real axis at α, and this point of intersection is called the centroid. In other words, the asymptotes are centered at α.
- The following figures show a few typical cases involving asymptotes (which have the same appearance as multiple roots):

[Figure: typical asymptote patterns]
Step #4: Compute the angles of departure and the angles of arrival of the root loci. (optional - just use MATLAB)
- The angle of departure or arrival of a root locus at a pole or zero denotes the angle of the tangent to the locus near that point.
- The root locus begins at poles and goes either to zeros or to ∞, along the radial asymptotic lines.
- To compute the angle of departure, take a test point s₀ very near the pole and compute the angle of G(s₀), using Equation 72. This gives the angle of departure φ_dep of the locus. Also,

  φ_dep = Σψᵢ − Σφᵢ − 180° − 360° l    (73)

  where φᵢ = the angles to the remaining poles and ψᵢ = the angles to all the zeros.
- For a multiple pole of order q:

  q φ_dep = Σψᵢ − Σφᵢ − 180° − 360° l    (74)

  In this case, there will be q branches of the locus that depart from that multiple pole.
- The same process is used for the angle of arrival ψ_arr:

  q ψ_arr = Σφᵢ − Σψᵢ + 180° + 360° l    (75)

  where φᵢ = the angles to all poles, ψᵢ = the angles to the remaining zeros, and q is the order of the zero at the point of arrival.
Step #5: Determine the intersection of the root loci with the imaginary axis.
(optional)
- The points where the root loci intersect the imaginary axis of the s-plane (s = jω), and the corresponding values of k, may be determined by means of the Routh-Hurwitz criterion.
- A root of the characteristic equation in the RHP implies that the closed-loop system is unstable, as tested by the R-H criterion.
- Using the R-H criterion, we can locate those values of K for which an incremental change will cause the number of roots in the RHP to change. Such values correspond to a root locus crossing the imaginary axis.
Step #6: Determine the breakaway points or the saddle points on the root loci. (optional)
- Breakaway points or saddle points on the root loci of an equation correspond to multiple-order roots of the equation.

[Figure 3: breakaway point examples (a), (b), and (c)]

- Figure 3(a) illustrates a case in which two branches of the root loci meet at the breakaway point on the real axis and then depart from the axis in opposite directions. In this case, the breakaway point represents a double root of the equation when K is assigned the value corresponding to the point.
- Figure 3(b) shows another common situation, when two complex-conjugate root loci approach the real axis, meet at the breakaway point, and then depart in opposite directions along the real axis.
- In general, a breakaway point may involve more than two root loci.
- Figure 3(c) illustrates such a situation, when the breakaway point represents a fourth-order root.
- A root locus diagram can have more than one saddle point. They need not always be on the real axis, and due to the conjugate symmetry of root loci, the saddle points not on the real axis must occur in complex-conjugate pairs.
- All breakaway points must satisfy the following equations:

  d[G(s)D(s)]/ds = 0    (76)

  1 + K G(s)D(s) = 0    (77)

- The angles at which the root loci arrive at or depart from a saddle point depend on the number of loci that are involved at the point. For example, the root loci shown in Figures 3(a) and 3(b) all arrive and break away 180° apart, whereas in Figure 3(c) the four root loci arrive and depart with angles 90° apart.
- In general, n root loci arrive at or leave a breakaway point 180/n degrees apart.
- The following figure shows some typical situations involving saddle points.

[Figure: typical saddle-point configurations]
1.4 Illustrative Root Loci
More examples:
(Figure 4 sketches the root loci for several typical pole-zero configurations, including cases with a double pole, labeled "order 2".)
Figure 4:
A MATLAB example:
    G(s) = (s + 3)(s + 1 ± j3) / (s(s + 1)(s + 2)³(s + 4)(s + 5 ± 2j))                     (78)
Figure 5:
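This locus is easy to generate numerically. The following is a minimal MATLAB sketch, assuming the factored form reconstructed in equation 78 (in particular the triple pole at s = -2):

% Root locus for the plant of equation (78)
z = [-3; -1+3j; -1-3j];                        % open-loop zeros
p = [0; -1; -2; -2; -2; -4; -5+2j; -5-2j];     % open-loop poles
[num, den] = zp2tf(z, p, 1);                   % factored form -> polynomials
rlocus(num, den)                               % 180-degree locus vs. gain k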
1.5 Some Root Loci Construction Aspects
From the standpoint of designing a control system, it is often useful to learn the eects on the root
loci when poles and zeros of D(s)G(s) are added or moved around in the s-plane. Mentioned below
are a few brief properties pertaining to the above.
Eects of Adding Poles and Zeros to D(s)G(s)
Addition of Poles: In general, adding a pole to the function D(s)G(s) in the left half of the
s-plane has the eect of pushing the root loci toward the right-half plane.
Addition of Zeros: Adding left-half plane zeros to the function D(s)G(s) generally has the
eect of moving and bending the root loci toward the left-half s-plane.
Calculation of gain k from the root locus
Once the root loci have been constructed, the values of k at any point on the loci can be determined.
We know that 1 + kG(s) = 0, so
    kG(s) = −1                                                  (79)
    k = −1/G(s)                                                 (80)
    |k| = 1/|G(s)|                                              (81)
Graphically, at a point s₀ on the locus,
    k = (product of the distances from s₀ to the poles)/(product of the distances from s₀ to the zeros)        (82)
1.6 Summary
The Root Locus technique presents a graphical method of investigating the roots of the
characteristic equation of a linear time-invariant system when one or more parameters vary.
The steps of construction listed above should be adequate for making a reasonably adequate
plot of the root-locus diagram.
The characteristic-equation roots give exact indication on the absolute stability of the system.
The zeros of the closed-loop transfer function also govern the dynamic performance of the
system.
2 Root Locus - Compensation
If the process dynamics are of such a nature that a satisfactory design cannot be obtained by a gain adjustment alone, then some modification or compensation of the process dynamics is indicated. Two common methods for dynamic compensation are lead and lag compensation.
Compensation with a transfer function of the form
    D(s) = (s + zᵢ)/(s + pᵢ)                                    (83)
is called lead compensation if zᵢ < pᵢ and lag compensation if zᵢ > pᵢ.
Compensation is typically placed in series with the plant in the feedforward path as shown
in the following gure:
Figure 6:
It can also be placed in the feedback path and in that location, has the same eect on the
overall system poles.
46
2.1 Lead Compensation
Lead compensation approximates PD control.
It acts mainly to lower the rise time.
It decreases the transient overshoot, hence improving the system damping.
It raises the bandwidth.
It has the eect of moving the locus to the left.
Illustration: consider a second-order system with transfer function
    KG(s) = K/(s(s + 1))                                        (84)
G(s) has the root locus shown by the solid line in figure 7. Let D(s) = s + 2; this adds a zero at s = −2. The root locus produced by D(s)G(s) is shown by the dashed line; the modified locus is the circle.
Figure 7:
The effect of the zero is to move the root locus to the left, improving stability.
Also, by adding the zero, we can move the locus to a position having closed-loop roots with a damping ratio of 0.5.
We have "compensated" the dynamics by using D(s) = s + 2.
The trouble with choosing D(s) based on only a zero is that the physical realization would contain a differentiator, which would greatly amplify the high-frequency noise present in the sensor signal. Furthermore, it is impossible to build a pure differentiator.
Try adding a pole at a high frequency, say at s = −20, to give:
    D(s) = (s + 2)/(s + 20)                                     (85)
The following figure shows the resulting root loci when p = 10 and p = 20.
47
Figure 8:
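A small MATLAB sketch of this comparison (the plant and compensators are taken from equations 84-85; the p = 10 case is the alternative pole location mentioned above):

% Lead compensation: G(s) = 1/(s(s+1)),  D(s) = (s+2)/(s+p)
denG = [1 1 0];                                   % s^2 + s
subplot(311); rlocus(1, denG)                     % uncompensated
subplot(312); rlocus([1 2], conv(denG, [1 10]))   % lead pole at -10
subplot(313); rlocus([1 2], conv(denG, [1 20]))   % lead pole at -20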
2.1.1 Zero and Pole Selection
Selecting exact values of zᵢ and pᵢ is done by trial and error.
In general, the zero is placed in the neighborhood of the closed-loop natural frequency ωₙ.
The pole is located at 3 to 20 times the value of the zero location.
If the pole is too close to the zero, then the root locus moves back too far toward its uncompensated shape and the zero is not successful in doing its job.
If the pole is too far to the left, then high-frequency noise amplification results.
2.2 Lag Compensation
After obtaining satisfactory dynamic performance, perhaps by using one or more lead compensators, the low-frequency gain of the system may be found to be too low. Raising it calls for something approaching an integration at near-zero frequencies, which is achieved by lag compensation.
Lag compensation approximates PI control.
A pole is placed near s = 0 (low frequency), but usually a zero is included near the pole so that the pole-zero pair, called a dipole, does not significantly interfere with the dynamic response of the overall system.
Choose D(s) = (s + z)/(s + p), z > p, where the values of z and p are small (e.g., z = 0.1 and p = 0.01).
Since z > p, the phase is negative, corresponding to a phase lag.
It improves the steady-state error by increasing the low-frequency gain.
Lag compensation, however, tends to decrease stability.
2.2.1 Illustration
Again, consider the transfer function, as in equation 84.
Include the lead compensation D1(s) = (s + 2)/(s + 20) that produced the locus in figure 8.
Raise the gain until the closed-loop roots correspond to a damping ratio of ζ = 0.707. At this point, the root-locus gain is found to be 31.
The velocity constant is thus Kv = lim_(s→0) sKD(s)G(s) = 31/10 = 3.1.
Now, add a lag compensation of:
    D2(s) = (s + 0.1)/(s + 0.01)                                (86)
This increases the velocity constant by about 10 (since z/p = 10) and keeps the values of both z and p very small so that D2(s) has very little effect on the dynamics of the system. The resulting root locus is as shown in Figure 9.
Figure 9:
The very small circle near the origin is a result of the lag compensation.
A closed-loop root remains very near the lag compensation zero at −0.1, which corresponds to a very slowly decaying transient. The transient has a small magnitude because the zero will almost cancel the pole in the transfer function. However, the decay is so slow that this term may seriously influence the settling time.
It is thus important to place the lag pole-zero combination at as high a frequency as possible
without causing major shifts in the dominant root locations.
The transfer function from a plant noise to the system error will not have the zero, and thus,
disturbance transients can be very long in duration in a system with lag compensation.
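A sketch of this lead-lag design in MATLAB (the gain K = 31 comes from the text above; the step response makes the slow, small-amplitude settling tail visible):

% Lead-lag example: G = 1/(s(s+1)), D1 = (s+2)/(s+20), D2 = (s+0.1)/(s+0.01), K = 31
numDG = conv([1 2], [1 0.1]);                    % zeros of D1*D2*G
denDG = conv([1 20], conv([1 0.01], [1 1 0]));
rlocus(numDG, denDG)                             % compensated root locus vs. gain
[numcl, dencl] = cloop(31*numDG, denDG, -1);     % close the loop at K = 31
step(numcl, dencl)                               % note the long tail from the lag dipole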
2.3 The "Stick on a Cart" example
(Figure: a stick of mass m balanced on a cart, driven by the control input u.)
After normalization, we have:
    θ̈ − θ = u                                                   (87)
    ⇒  s²Θ(s) − Θ(s) = U(s)                                     (88)
    Θ(s)/U(s) = 1/(s² − 1)                                      (89)
This results in the model shown in Figure 10 (u → 1/(s² − 1) → θ). The root locus diagram, with open-loop poles at s = ±1 (Figure 11), is shown below; it indicates an unstable system, irrespective of the gain.
With lead compensation.
(Figure: unity-feedback loop with a lead compensator (s + α)/(s + β) in series with the plant 1/(s² − 1), and the resulting pole-zero pattern.)
The root loci of the lead-compensated system are now in the LHP, which indicates a stable
system.
A slight variation may result in a system as shown below, which may be more satisfactory.
This system may tend to be slower than the one considered above, but may have better
damping.
(Figure: an alternative pole-zero placement for the lead-compensated system.)
2.4 Extensions
Extensions to Root Locus include time-delays, zero-degree loci, nonlinear functions, etc...
51
Part IV
Frequency Design Methods
1 Frequency Response
Most of this information is covered in Chapter 5 of the Franklin text.
Frequency-domain methods remain popular in spite of other design methods such as root locus, state space, and optimal control. They can also provide a good design in the face of plant uncertainty in the model.
We start with an open-loop transfer function
    Y(s)/U(s) = G(s) → G(jω)                                    (90)
Assume
    u(t) = sin(ωt)                                              (91)
For a linear system,
    y(t) = A sin(ωt + φ)                                        (92)
The magnitude is given by
    A = |G(jω)| = |G(s)|_(s=jω)                                 (93)
and the phase by
    φ = arctan(Im G(jω)/Re G(jω)) = ∠G(jω)                      (94)
2 Bode Plots
We refer to two plots when we talk about Bode plots
Magnitude plot - log10 magnitude vs. log10 w
Phase plot - phase vs. log10 w
Given a transfer function in s,
    KG(s) = K (s + z₁)(s + z₂)(⋯) / ((s + p₁)(s + p₂)(⋯))                                    (95)
there is a corresponding frequency response in jω,
    KG(jω) = K₀ (jωτ₁ + 1)(jωτ₂ + 1)(⋯) / ((jω)ⁿ (jωτₐ + 1)(⋯))                              (96)
The magnitude plot then shows
    log₁₀|KG(jω)| = log₁₀ K₀ + log₁₀|jωτ₁ + 1| + ⋯ − n log₁₀|jω| − log₁₀|jωτₐ + 1| − ⋯        (97)
The phase plot shows
    ∠KG(jω) = ∠K₀ + ∠(jωτ₁ + 1) + ⋯ − n·90° − ∠(jωτₐ + 1) − ⋯                                 (98)
There are three different classes of terms to deal with in the previous equations:
    1. K₀(jω)ⁿ
    2. (jωτ + 1)^(±1)
    3. ((jω/ωₙ)² + 2ζ(jω/ωₙ) + 1)^(±1)
1. K₀(jω)ⁿ
    log₁₀|K₀(jω)ⁿ| = log₁₀ K₀ + n log₁₀|jω|                                                  (99)
This term adds a line with slope n passing through log₁₀ K₀ at ω = 1 on the magnitude plot, and adds a phase of n·90° to the phase plot. (see figure)
2. (jωτ + 1)^(±1)
When ωτ ≪ 1 the term looks like 1; when ωτ ≫ 1 the term looks like jωτ. This term adds a line with slope 0 for ωτ < 1 and a line with slope ±1 for ωτ > 1 to the magnitude plot, and adds ±90° of phase when ωτ > 1 to the phase plot. (see figures)
3. ((jω/ωₙ)² + 2ζ(jω/ωₙ) + 1)^(±1)
This term can add resonant peaking (overshoot) in the plots near ω = ωₙ, depending on ζ. (see figures)
53
54
See the figures for sample Bode plots for the transfer function
    G(s) = 2000(s + 0.5) / (s(s + 10)(s + 50))                                               (100)
Note that the command BODE in Matlab will create Bode plots given a transfer function.
(Figure: Bode magnitude and phase plots of equation 100, for frequencies from 10⁻² to 10³ rad/sec.)
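The plot above can be reproduced with a few lines of MATLAB (a sketch; the gain and corner frequencies come from equation 100):

% Bode plot of G(s) = 2000(s+0.5)/(s(s+10)(s+50))
num = 2000*[1 0.5];
den = conv([1 0], conv([1 10], [1 50]));
w = logspace(-2, 3, 200);        % rad/sec, matching the figure's range
bode(num, den, w)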
2.1 Stability Margins
We can look at the open-loop frequency response to determine the closed-loop stability characteristics. Recall that a point is on the root locus when |KG(s)| = 1 and ∠G(s) = 180°. There are two measures used to denote the stability characteristics of a system: the gain margin (GM) and the phase margin (PM).
The gain margin is defined to be the distance between |KG(jω)| and the magnitude = 1 line on the Bode magnitude plot at the frequency that satisfies ∠KG(jω) = −180°. See the figures for examples showing how to determine the GM from the Bode plots. The GM can also be determined from a root locus plot as
    GM = (|K| at the s = jω crossing) / (|K| at the current design point)                    (101)
The phase margin is defined as the amount by which the phase of G(jω) exceeds −180° at the frequency that satisfies |KG(jω)| = 1.
The damping ratio can be approximated from the PM as
    ζ ≈ PM/100                                                                               (102)
See the figures for examples showing how to determine the PM from the Bode plots. Also, there is a graph showing the relationship between the PM and the overshoot fraction, Mp. Note that if both the GM and the PM are positive, then the system is stable. The command MARGIN in MATLAB will display and calculate the gain and phase margins given a transfer function.
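As a quick illustration, here is a margin check for the lead-compensated plant used earlier (a sketch; the loop K·D(s)·G(s) with K = 31 is reused from the lag-compensation example):

% Gain and phase margins of K*D(s)*G(s), D = (s+2)/(s+20), G = 1/(s(s+1)), K = 31
num = 31*[1 2];
den = conv([1 20], [1 1 0]);
[mag, phase, w] = bode(num, den);
margin(mag, phase, w)            % Bode plot annotated with GM and PM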
56
57
2.2 Compensation
2.2.1 Bode's Gain-Phase Relationship
When designing controllers using Bode plots we should be aware of the Bode gain-phase relationship: for any stable, minimum-phase system, the phase of G(jω) is uniquely determined by the magnitude of G(jω) (see above). A fair approximation is
    ∠G(jω) ≈ n × 90°                                            (103)
where n is the slope of the magnitude plot (on log-log scales). As previously noted, we desire ∠G(jω) > −180° where |KG(jω)| = 1, so a rule of thumb is to adjust the magnitude response so that the slope at crossover (|KG(jω)| = 1) is approximately −1, which should give a good phase margin.
Consider the system
    G(s) = 1/s²                                                 (104)
Clearly the slope at crossover is −2. We can use a PD controller
    D(s) = K(T_D s + 1)                                         (105)
and pick a suitable K and T_D = 1/ω₁ to cause the slope at crossover to be −1. The figures below contain the open-loop and the compensated open-loop Bode plots.
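A sketch of that crossover-slope adjustment in MATLAB (the values of K and T_D below are illustrative assumptions, chosen only so that the breakpoint 1/T_D lies below crossover):

% G(s) = 1/s^2 compensated by D(s) = K(T_D s + 1)
K = 5;  TD = 1;                     % illustrative values (assumptions)
num = K*[TD 1];
den = [1 0 0];                      % s^2
w = logspace(-1, 2, 200);
[mag, phase] = bode(num, den, w);
margin(mag, phase, w)               % slope at crossover is now about -1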
2.2.2 Closed-loop frequency response
For the open-loop system, we will typically have
    |G(jω)| ≫ 1 for ω ≪ ω_c,                                    (106)
    |G(jω)| ≪ 1 for ω ≫ ω_c                                     (107)
where ω_c is the crossover frequency. We can then approximate the closed-loop frequency response by
    |F| = |G(jω)/(1 + G(jω))|  ≈  1 for ω ≪ ω_c,   ≈ |G| for ω ≫ ω_c                         (108)
For ω ≈ ω_c, |F| depends on the phase margin. Figure 5.44 shows |F| for several different values of the PM. For a PM of 90°, |F(jω_c)| = 0.7071. Also, if the PM = 90°, then the bandwidth is equal to ω_c. There is a tradeoff between PM and bandwidth.
59
2.3 Proportional Compensation
(Figures: Bode plots for proportional compensation; MATLAB reports Gm = 6.02 dB at 1 rad/sec and Pm = 21.4° at 0.682 rad/sec.)
2.3.1 Proportional/Dierential Compensation
A PD controller has the form
    D(s) = K(T_D s + 1)                                         (109)
and can be used to add phase lead at all frequencies above the breakpoint. If there is no change in the gain of the low-frequency asymptote, PD compensation will increase the crossover frequency and the speed of response. Increasing the frequency-response magnitude at the higher frequencies will increase sensitivity to noise. The figures below show the effect of a PD controller on the frequency response.
63
2.3.2 Lead compensation
A lead controller has the form
    D(s) = K (Ts + 1)/(αTs + 1)                                 (110)
where α is less than 1. The maximum phase lead then occurs at
    ω = 1/(√α · T)                                              (111)
Lead compensation adds phase lead over a frequency band between the two breakpoints, which are usually selected to bracket the crossover frequency. If there is no change in the gain of the low-frequency asymptote, lead compensation will increase the crossover frequency and the speed of response over the uncompensated system. If the gain of the low-frequency asymptote is reduced in order not to increase the crossover frequency, the steady-state errors of the system will increase. Lead compensation acts approximately like a PD compensator, but with less high-frequency amplification.
64
An example of lead control is given in the figures. Given an open-loop transfer function
    G(s) = 1/(s(s + 1)),                                        (112)
design a lead controller so that the system has a steady-state error of less than 10% to a ramp input and an overshoot (Mp) of less than 25%. The steady-state error is given by
    e(∞) = lim_(s→0) s [1/(1 + D(s)G(s))] R(s)                  (113)
where R(s) = 1/s² for a unit ramp, which reduces to
    e(∞) = lim_(s→0) { 1/(s + D(s)[1/(s + 1)]) } = 1/D(0)       (114)
so we can pick K = 10. Also, using the relationship between PM and Mp, we can see that we need a PM of 45° to meet the overshoot requirement. We then experiment to find T and α, and it turns out that the desired compensation is
    D(s) = 10 (s/2 + 1)/(s/10 + 1)                              (115)
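One way to check this design numerically (a sketch; the compensator is equation 115 and the 45-degree PM target comes from the overshoot requirement):

% Verify the PM of D(s)G(s) = 10(s/2+1)/(s/10+1) * 1/(s(s+1))
num = 10*[1/2 1];
den = conv([1/10 1], [1 1 0]);
w = logspace(-1, 2, 300);
[mag, phase] = bode(num, den, w);
margin(mag, phase, w)            % the reported Pm should be roughly 45 degrees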
65
2.3.3 Proportional/Integral Compensation
A PI controller has the form
    D(s) = (K/s)(s + 1/T_I)                                     (116)
PI control increases the frequency-response magnitude at frequencies below the breakpoint, thereby decreasing steady-state errors. It also contributes phase lag below the breakpoint, which must be kept at a low enough frequency that it does not degrade stability excessively. The figures show how PI compensation affects the frequency response.
66
2.3.4 Lag Compensation
A lag controller has the form
    D(s) = K (Ts + 1)/(αTs + 1)                                 (117)
where α is greater than 1. Lag compensation increases the frequency-response magnitude at frequencies below the two breakpoints, thereby decreasing steady-state errors. With suitable adjustments in loop gain, it can alternatively be used to decrease the frequency-response magnitude at frequencies above the two breakpoints so that crossover occurs at a frequency that yields an acceptable phase margin. It also contributes phase lag between the two breakpoints, which must be kept at low enough frequencies so that the phase decrease does not degrade stability excessively. The figures show the effect of lag compensation on the frequency response.
67
Question: Does increasing gain > 1 or < 1 at 180o cause stability or instability?
Answer: Usually instability, but in some cases the opposite holds. Use root locus or Nyquist
methods to see.
3 Nyquist Diagrams
A Nyquist plot is a polar plot of a transfer function: its magnitude and phase are displayed together as a curve in the complex plane. It can be useful when Bode plots are ambiguous with regard to stability. A Nyquist plot results from evaluating some transfer function H(s) for values of s defined by some contour (see figures). If there are any poles or zeros inside the contour, then the Nyquist plot will encircle the origin one or more times. The closed-loop transfer function has the form
    Y/R = KG(s)/(1 + KG(s))                                     (118)
The closed-loop poles are the roots of
    1 + KG(s) = 0                                               (119)
which is simply the open-loop response, KG(s), shifted to the right by 1. Thus 1 + KG(s) encircles the origin iff KG(s) encircles −1. We can define the contour to be the entire RHP (see figures). If there are net encirclements of −1 while evaluating our transfer function (and the open loop has no RHP poles), we know that the closed-loop system is unstable.
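A quick MATLAB sketch of such an evaluation, using the first example of the next subsection, G(s) = 1/(s + 1)²:

% Nyquist plot of G(s) = 1/(s+1)^2; the curve never encircles -1 for K > 0
num = 1;
den = conv([1 1], [1 1]);
nyquist(num, den)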
68
A clockwise encirclement of −1 indicates the presence of a zero of 1 + KG(s) in the RHP, while a counterclockwise encirclement of −1 indicates a pole in the RHP. The net number of clockwise encirclements is
    N = Z − P                                                   (120)
Alternatively, in order to determine whether an encirclement is due to a pole or a zero, we can write
    1 + KG(s) = 1 + K b(s)/a(s) = (a(s) + K b(s))/a(s)          (121)
So the poles of 1 + KG(s) are also the poles of G(s). Since the number of RHP poles of G(s) is known, we will assume that an encirclement of −1 indicates an unstable root of the closed-loop system. Thus the number of closed-loop RHP roots is
    Z = N + P                                                   (122)
3.0.5 Nyquist Examples
Several examples are given. The first example has the transfer function
    G(s) = 1/(s + 1)²                                           (123)
The root locus shows that the system is stable for all values of K. The Nyquist plot is shown for K = 1, and it does not encircle −1. Plots could be made for other values of K, but it should be noted that an encirclement of −1 by KG(s) is equivalent to an encirclement of −1/K by G(s). Since G(s) only crosses the negative real axis at G(s) = 0, it will never encircle −1/K for positive K.
70
71
The second example has the transfer function
    G(s) = 1/(s(s + 1)²)                                        (124)
and is stable for small values of K. As can be seen from the Nyquist plot, larger values of K will cause two encirclements. The large arc at infinity in the Nyquist plot is due to the pole at s = 0. Two poles at s = 0 would have resulted in a full 360° arc at infinity.
72
The third example has the transfer function
    G(s) = (s + 1)² / (s(s/10 − 1))                             (125)
For large values of K, there is one counterclockwise encirclement (see figures), so N = −1. But since P = 1 from the RHP pole of G(s), Z = 0 and there are no unstable closed-loop roots. When K is small, N = 1, which indicates that Z = 2: there are two unstable roots in the closed-loop system.
73
3.1 Stability Margins
Now we are interested in defining the gain and phase margins in terms of how far the system is from encircling the −1 point. The gain margin is defined as the inverse of |KG(jω)| where the plot crosses the negative real axis. The phase margin is defined as the difference between the phase of G(jω) and −180° where the plot crosses the unit circle. Their determination is shown graphically in the figures below. A problem with these definitions is that there may be several GMs and PMs indicated by a Nyquist plot. A proposed solution is the vector margin, defined as the distance from the −1 point to the closest approach of the Nyquist plot. The vector margin can be difficult to calculate, though.
Recall the definition of sensitivity,
    S = 1/(1 + GD)                                              (126)
The vector margin equals the minimum over ω of 1/|S(jω)|, i.e., the inverse of the peak sensitivity. A similar result holds for a MIMO system, where
    min over ω of 1/σ̄(S(jω)) = 1/‖S(jω)‖∞                       (127)
where σ̄ is the maximum singular value of the matrix and ‖·‖∞ is the infinity norm.
74
75
Part V
Digital Classical Control
1 Discrete Control - Z-transform
    X(z) = Σ_(k=−∞)^(∞) x(k) z^(−k)
Examples (pole locations sketched in the figures):
    Step: 1(k) → 1/(1 − z⁻¹)
    Exponential decay: rᵏ → z/(z − r)
    Sinusoid: [rᵏ cos(kθ)]·1(k) → z(z − r cos θ)/(z² − 2(r cos θ)z + r²)
For stability → all poles satisfy |pole| < 1.
Final value theorem:
    lim_(k→∞) x(k) = lim_(z→1) (1 − z⁻¹) X(z)
1.1 Continuous to Discrete Mapping
    G(s) → G(z)
1. Integration Methods
    s ← (z − 1)/T                   (forward method)
    s ← (z − 1)/(Tz)                (backward method)
    s ← (2/T)(z − 1)/(z + 1)        (trapezoidal/bilinear transformation method)
76
2. Pole-zero mapping
3. ZOH equivalent
u
y
D/A
A/D
G(s)
ZOH
sample
sampler (linear and time-varying)
(with anti-alias filter)
Gives an exact representation between u(k) and y(k) (it does not necessarily tell you what happens between samples).
1.2 ZOH
(Figure: the ZOH impulse response is a unit pulse of height 1 and width T.)
The sampler is linear but time-varying.
    Input = 1(t) − 1(t − T)
    Y(s) = (1 − e^(−Ts)) G(s)/s
    Y(z) = z-transform of the samples of y(t) = Z{y(kT)} = Z{L⁻¹{Y(s)}} ≜ Z{Y(s)}  (shorthand)
    G(z) = Z{(1 − e^(−Ts)) G(s)/s} = (1 − z⁻¹) Z{G(s)/s}
Example 1:
    G(s) = a/(s + a)
    G(s)/s = a/(s(s + a)) = 1/s − 1/(s + a)
    L⁻¹{G(s)/s} = 1(t) − e^(−at) 1(t)
    after sampling: 1(kT) − e^(−akT) 1(kT)
    Z-transform: 1/(1 − z⁻¹) − 1/(1 − e^(−aT) z⁻¹) = z(1 − e^(−aT)) / ((z − 1)(z − e^(−aT)))
    G(z) = (1 − e^(−aT)) / (z − e^(−aT))
77
pole at s = −a → pole at z = e^(−aT); no zero → no zero
Example 2:
    G(s) = 1/s²
    G(z) = (1 − z⁻¹) Z{1/s³} = T²(z + 1)/(2(z − 1)²)    (using table lookup)
    pole at 0 → (double) pole at e^(0·T) = 1
    zero at ∞ → zero at −1
In general, z = e^(sT) for poles; zeros cannot be mapped by the same relationship.
For 2nd-order systems, given parameters (Mp, tr, ts, ζ): the poles s₁, s₂ = −a ± jb map to z = r e^(±jθ), with r = e^(−aT) and θ = bT.
The smaller the value of T, the closer the poles are to the unit circle, i.e., the closer to z = 1.
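These ZOH equivalents can be checked with MATLAB's c2d (a sketch; T = 0.5 is an arbitrary illustrative sample period, and the 1/s² plant is written in state-space form first):

% ZOH discrete equivalent of G(s) = 1/s^2
T = 0.5;                               % illustrative sample period (assumption)
F = [0 1; 0 0];  G = [0; 1];  H = [1 0];  J = 0;
[PHI, GAM] = c2d(F, G, T);
[numd, dend] = ss2tf(PHI, GAM, H, J, 1)    % expect T^2(z+1)/(2(z-1)^2)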
Additional comments on ZOH
U (k) ! Y (k)
What we really want to control is the continuous output y(t).
Consider sinusoid ! sampled ! ZOH
78
As seen in the figure,
    the fundamental frequency is still the same,
    but a phase delay is introduced - reducing the phase margin (bad for control),
    and higher harmonics are added.
    (1 − e^(−sT))/s, evaluated at s = jω  ⇒  e^(−jωT/2) · T · sinc(ωT/2)
79
As seen in the gure, extra harmonics excite G(s), this may cause aliasing and other eects.
Therefore we need to add an anti-aliasing lter as shown
D/A
A/D
ZOH
G(s)
anti-alias filter here
1.3 Z-plane and dynamic response
Plot of z = esT for various s:
80
Typically in digital control we can place poles anywhere in the left-half plane, but because of
oscillation, it is preferred to work with positive real part only.
z-plane grid is given by "zgrid" in MATLAB
81
Higher order systems
as pole comes closer to z=1, the system slows down
as zero comes closer to z=1, causes overshoot
as pole and zero come close to each other, they tend to cancel each other
Dead-beat control. In digital domain we can do several things not possible in continuous domain,
e.g. pole-zero cancellation, or suppose we set all closed loop poles to z = 0.
Consider a closed-loop response
    (b₁z³ + b₂z² + b₃z + b₄)/z⁴ = b₁z⁻¹ + b₂z⁻² + b₃z⁻³ + b₄z⁻⁴
All poles are at z = 0. This is called dead-beat control.
2 Root Locus control design
2 methods:
1. Emulation
Transform D(s) ! D(z )
This is generally considered an older approach.
2. Direct design in the z-domain - The following steps can be followed:
(a)
ZOH + G(s) ! G(z )
(b) design specications
Mp; tr ; ts ! pole locations in z-domain
(c) Compensator design : Root locus or Bode analysis or Nyquist analysis in Z-domain
82
y
r
D(z)
G(z)
y = D(z)G(z )
r 1 + D(z )G(z)
Solution of the characteristic equation
1 + kD(z )G(z ) = 0
gives the root locus. The rules for plotting are exactly the same as the continuous case.
Ex. 1.
    G(s) = a/(s(s + a))
    G(z) with ZOH = k(z + 0.9)/((z − 1)(z − e^(−aT)))
(Figures: z-plane root loci for this example.)
Proportional feedback: the locus leaves the unit circle as the gain increases, i.e., the closed loop goes unstable (largely due to the phase lag introduced by the ZOH), and the damping ratio is not great as K increases.
A better lead design (better damping at higher K, and better Kv): add the compensator zero to cancel the plant pole.
83
2.1 Comments - Latency
u(k) is a function of y (k) (e.g. lead dierence equation).
computation time (in computer) introduces delay.
Worst case latency - to see the eect of a full 1 clock delay (z;1), add a pole at the origin i.e
1/z
D(z)
G(z)
To overcome this problem we may try to sample faster than required.
sample faster ! complex control
sample slower ! more latency
3 Frequency Design Methods
z = ej!
Rules for construction of Bode plot and Nyquist plots are the same as in continuous domain.
Denitions for GM and PM are the same. However, plotting by \hand" is not practical. (Use
dbode in MATLAB.)
3.1 Compensator design
Proportional k ! k
;1
Derivative ; ks ! k 1 ; z = k z ; 1
T
z
z
;
PD ! k
z
k
Integral s ! 1 ;kz ;1
PID D(z ) = kp (1 + zk;d z1 + kp(zz; 1) )
84
3.2 Direct Method (Ragazzini 1958)
Closed-Loop Response
    H(z) = D(z)G(z)/(1 + D(z)G(z))
    ⇒  D(z) = (1/G(z)) · H(z)/(1 − H(z))
Sometimes if H(z) is not chosen properly, D(z) may be non-causal, non-minimal phase or
unstable.
Causal D(z) (cannot have a pole at innity) implies that H(z) must have zero at innity of
the same order as zero of G(z) at innity.
For stability of D(z),
{ 1 - H(z) must contain as zeros, all poles of G(z) that are outside the unit circle
{ H(z) must contain as zeros all zeros of G(z) that are outside the unit circle
Adding all these constraints to H(z), gives rise to simultaneous equations which are tedious
to solve.
This method is somewhat similar to a method called Pole Placement, though it is not as easy as one might think. It does not necessarily yield good phase margin, gain margin, etc. There are more effective methods in state-space control.
4 Z-Transform Tables
85
86
Part VI
State-Space Control
1 State-Space Representation
Algebraic based method of doing control.
Developed in the 1960's.
Often called \Modern" Control Theory.
Example: a mass m driven by a force f.
    f = ma = m d²x/dt² = m ẍ
Let x₁ = x (position) and x₂ = ẋ (velocity):
    [ẋ₁; ẋ₂] = [0 1; 0 0][x₁; x₂] + [0; 1/m] u
    ẋ = Fx + Gu   → state equation (linear in x, u)
    x = [x₁; x₂]  → state
Any linear, finite-state, dynamic equation can be expressed in this form.
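As a concrete check, the force-driven mass can be simulated directly from its (F, G, H, J) matrices (a sketch; m = 1 and a unit step force are illustrative choices):

% Double-integrator mass: position response to a unit step force
m = 1;                                  % illustrative mass (assumption)
F = [0 1; 0 0];  G = [0; 1/m];
H = [1 0];       J = 0;                 % output = position x1
t = 0:0.01:5;
y = step(F, G, H, J, 1, t);             % y(t) = t^2/2 for a unit step force
plot(t, y)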
1.1 Denition
The state of a system at time t0 is the minimum amount of information at t0 that (together with
u(t); t t0) uniquely determines behavior of the system for all t t0 .
87
1.2 Continuous-time
First order ODE
x_ = f (x; u; t)
y = h(x; u; t)
1.3 Linear Time Invariant Systems
    ẋ = Fx + Gu
    y = Hx + Ju
    x(t) ∈ Rⁿ (state)
    u(t) ∈ R (input)
    y(t) ∈ R (output)
    F ∈ R^(n×n),  G ∈ R^(n×1),  H ∈ R^(1×n),  J ∈ R
Sometimes we will use A, B, C, D instead of the variables F, G, H, J. Note that the dimensions change for MIMO systems.
1.4 "Units" of F in physical terms
1 = freq
time
relates derivative of a term to the term
x_ = ax, large a ) faster the response.
1.5 Discrete-Time
xk+1 = Fxk + Guk
yk = Hxk + Juk
Sometimes we make use of the following variables,
F
G
H
J
!
!
!
!
88
; A
;; B
C
D
1.5.1 Example 2 - "analog computers"
    ÿ + 3ẏ − y = u
    Y(s)/U(s) = 1/(s² + 3s − 1)
Never use differentiation for implementation. Instead, use an integrator-based model.
(Figure: block diagram built from two integrators, with gains −3 and +1 fed back to the input summer and u added in; x₁ = ẏ and x₂ = y.)
    [ẋ₁; ẋ₂] = [−3 1; 1 0][x₁; x₂] + [1; 0]u
    y = [0 1][x₁; x₂]
Later we will show standard (canonical) forms for implementing ODE's or TF's.
1.6 State Space Vs. Classical Approach
SS is an internal description of the system whereas Classical approach is an external descrip
tion.
SS gives additional insight into the control problem and may tell us things about a system
that a TF approach misses (e.g. controllability, observability, internal stability).
Generalizes well to MIMO.
Helps in optimal control design / estimation.
Design
{ Classical - given specications are natural frequency, damping, rise time, GM, PM
Design a compensator
(Lead/Lag PID)
R.L /
Bode
89
find closed loop poles
which hopefully meet
specifications
{ SS - given specications are pole/zero locations or other constraints.
S.S description
Exact closed loop
poles
math (place poles)
iterate on specs to get good GM, PM, etc..
1.7 Linear Systems we won't study
1.
y(t) = u(t ; 1)
This simple continuous system of a 1 second delay is innite dimensional.
x(t) = [u(t)]tt;1 ( state [0; 1] ! R)
u
t-1
t
2. Cantilever beam
y = deflection
u
current
u
o
y o
voltage
transmission line
Innite dimensional, distributed system, PDE
90
1.8 Linearization
(Figure: a pendulum of mass m = 1, angle θ, with input torque u.)
    θ̈ = −sin θ + u,    x₁ = θ,  x₂ = θ̇
Non-linear state-space representation:
    ẋ = [ẋ₁; ẋ₂] = f(x, u) = [f₁; f₂] = [x₂; −sin x₁ + u]
Small-signal approximation (linearization):
    f(x, u) ≈ f(0, 0) + (∂f/∂x)(0, 0)·x + (∂f/∂u)(0, 0)·u + ⋯
(f(0, 0) = 0; assume 0 is an equilibrium point, i.e., ẋ(t) = 0)
    ∂f/∂x = [∂fᵢ/∂xⱼ]  (n×n Jacobian) = F,      ∂f/∂u = [∂fᵢ/∂u]  (n×1) = G
Output:
    y = h(x, u) ≈ (∂h/∂x)x + (∂h/∂u)u
For the pendulum,
    ẋ = [x₂; −sin x₁ + u]
    F = [0 1; −cos x₁ 0]   (with −cos x₁ = −1 or +1 at the equilibrium),    G = [0; 1]
Equilibrium at x = [0; 0] or [π; 0].
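A small numerical check of this linearization (a sketch; the Jacobians are approximated by central finite differences around the downward equilibrium):

% Finite-difference linearization of the pendulum xdot = [x2; -sin(x1) + u]
f = inline('[x(2); -sin(x(1)) + u]', 'x', 'u');
x0 = [0; 0];  u0 = 0;  d = 1e-6;
F = zeros(2);
for i = 1:2
    e = zeros(2,1);  e(i) = d;
    F(:,i) = (f(x0+e, u0) - f(x0-e, u0))/(2*d);   % columns of df/dx
end
G = (f(x0, u0+d) - f(x0, u0-d))/(2*d);            % df/du
F, G            % expect F = [0 1; -1 0], G = [0; 1]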
1.9 State-transformation
ODE → F, G, H, J. Is the state x(t) unique? NO.
Consider
    z = T⁻¹x,    x = Tz
Given x → z and z → x, x and z contain the same amount of information; thus z is a valid state.
    ẋ = Fx + Gu
    y = Hx + Ju
    ẋ = Tż = FTz + Gu
    ż = (T⁻¹FT)z + (T⁻¹G)u = Az + Bu
    y = (HT)z + Ju = Cz + Du
There are an infinite number of equivalent state-space realizations for a given system; the state need not represent physical quantities. For example,
    T⁻¹ = [1 −1; 1 1]   ⇒   z = [1 −1; 1 1][x₁; x₂] = [x₁ − x₂; x₁ + x₂]
A transformation T may yield an A, B, C, D with a "nice" structure even though z has no physical interpretation. We'll see how to find the transformation T later.
1.10 Transfer Function
1.10.1 Continuous System
X_ = AX + BU
Y = CX + DU :
Take Laplace-transform:
92
    sX(s) − X(0) = AX(s) + BU(s)
    Y(s) = CX(s) + DU(s)
    (sI − A)X(s) = X(0) + BU(s)
    X(s) = (sI − A)⁻¹X(0) + (sI − A)⁻¹BU(s)
    Y(s) = [C(sI − A)⁻¹B + D]U(s) + C(sI − A)⁻¹X(0⁻)
If X(0⁻) = 0,
    Y(s)/U(s) = G(s) = C(sI − A)⁻¹B + D
How to compute (sI − A)⁻¹:
    (sI − A)⁻¹ = ADJ(sI − A) / det(sI − A)
If det(sI − A) ≠ 0, (sI − A)⁻¹ exists, and the poles of the system are the eigenvalues of A.
Example:
For A = [−3 1; 1 0], B = [1; 0], C = [0 1], and D = 0,
    G(s) = C(sI − A)⁻¹B
         = [0 1] [s + 3  −1; −1  s]⁻¹ [1; 0]
         = [0 1] (1/(s(s + 3) − 1)) [s  1; 1  s + 3] [1; 0]
         = 1/(s² + 3s − 1)
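This conversion is exactly what ss2tf performs (a sketch using the matrices of the example above):

% Transfer function from state space: expect 1/(s^2 + 3s - 1)
A = [-3 1; 1 0];  B = [1; 0];  C = [0 1];  D = 0;
[num, den] = ss2tf(A, B, C, D, 1)
roots(den), eig(A)         % poles = eigenvalues of A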
1.10.2 Discrete System
    X_(k+1) = ΦX_k + ΓU_k
    Y_k = HX_k
Taking the Z-transform, we get
    X(z) = (zI − Φ)⁻¹ΓU(z) + (zI − Φ)⁻¹zX(0)
    Y(z) = H(zI − Φ)⁻¹ΓU(z) + H(zI − Φ)⁻¹zX(0)
93
1.11 Example - what transfer function don't tell us.
The purpose of this exercise is to show what transfer functions do not tell us. For the system in
Figures below,
u
s-1
1
s+1
s-1
.
X1
U
-2
.
X2
X1
1
s
The state-space equations of the system are as follows:
"
x_1
x_2
#
=
y =
Then we have
"
#
"
"
h
;1 0
#"
1 1
0 1
s
# "
#"
#
x1 + ;2 U
x2
1
i " x1 #
x2
X2
1
+
-1
y
:
# "
#
sx1 (s) ; x1 (0) = ;1 0 x1 (s) + ;2 u(s)
sx2 (s) ; x2 (0)
1 1
x2 (s)
1
x1(s) = s +1 1 x1(0) ; s +2 1 u(s)
x2(s) = s ;1 1 [x2(0) ; 12 x1(0)] + 2(s 1+ 1) x1 (0) + s +1 1 u(s) :
Take L;1 -transform,
x1(t) = e;t x1 (0) ; 2e;t u(t)
y (t) = x2 (t)
= et [x2(0) ; 1 x1(0)] + 1 e;t x1(0) + e;t u(t) :
2
2
When t ! 1, the term et [x2(0) ; 21 x1(0)] goes to 1 unless x2(0) = 21 x1(0). The system is uncontrollable.
Now consider the system below:
u
1
s-1
s-1
s+1
94
y
U
.
X1
1
s
X1
.
X2
-2
1
s
X2
Y
+
we have
"
x_1
x_2
#
=
y =
"
h
1 0
;2 ;1
1 1
#"
# " #
x1 + 1 u
x2
0
i " x1 #
x2
Take L;1 -transform,
x1(t) = etx1 (0) + et u(t)
x2(t) = e;t [x2(0) + x1(0)] ; etx1(0) + e;t u(t) ; et u(t)
y(t) = x1(t) + x2(t)
= e;t (x1(0) + x2 (0)) + e;t u(t)
The system state blows up, but the system output y (t) is stable. The system is unobservable. As
we will see, a simple inspection of A; B; C; D tells us observability or controllability.
1.12 Time-domain Solutions
For a system
X_ = AX + BU
Y = CX ;
X (s) = (sI ; A);1X (0) + (sI ; A);1BU (s)
Y (s) = CX (s) :
The term (sI ;A);1 X (0) corresponds to the \homogeneous solution", and the term (sI ;A);1BU (s)
is called \particular solution".
Define the "resolvent matrix" Φ(s) = (sI − A)⁻¹; then Φ(t) = L⁻¹{(sI − A)⁻¹}, and
    X(t) = Φ(t)X(0) + ∫₀ᵗ Φ(τ)BU(t − τ) dτ
Because
    Φ(s) = (sI − A)⁻¹ = (1/s)(I − A/s)⁻¹ = I/s + A/s² + A²/s³ + ⋯
we have
    Φ(t) = I + At + (At)²/2! + (At)³/3! + ⋯ = Σ_(k=0)^∞ (At)ᵏ/k! = e^(At)
Compare the solution of X_ = AX with the solution of x_ = ax,
X (t) = eAt X (0)
x = eat x(0)
we observed that they have similar format.
To check whether X (t) = eAt X (0) is correct, we calculate deAt =dt
P (At)k ) X
1 (At)k;1
deAt = d( 1
k=0 k! = A
= AeAt :
dt
dt
(
k
;
1)!
1
From above equation and X (t) = eAt X (0), we have
At
X_ = dX
dt = Ae X (0) = AX :
The matrix exponential eAt has the following properties:
eA(t +t ) = eAt eAt
e(A+B)t = eAt eBt; only if (AB = BA) :
1
2
1
2
The conclusion of this section:
    X(t) = e^(At)X(0) + ∫₀ᵗ e^(Aτ)BU(t − τ) dτ
    Y(t) = CX(t)
If the matrix A is diagonal, that is A = diag(λ₁, λ₂, ..., λₙ), then
    e^(At) = diag(e^(λ₁t), e^(λ₂t), ..., e^(λₙt))
If A is not diagonal, we use the transformation
    Λ = T⁻¹AT,
where the columns of T are eigenvectors of A. Then
    e^(At) = T e^(Λt) T⁻¹
Example: For the matrix A = [0 1; 0 0],  (sI − A)⁻¹ = [1/s  1/s²; 0  1/s], so
    e^(At) = L⁻¹{(sI − A)⁻¹} = [1 t; 0 1]
Calculating e^(At) using the power series:
    e^(At) = I + At + (At)²/2 + ⋯ = [1 0; 0 1] + [0 t; 0 0] + [0 0; 0 0] + ⋯ = [1 t; 0 1]
Note: there is no single best way to calculate a matrix exponential. A reference is "Nineteen Dubious Ways to Compute the Exponential of a Matrix" by Moler and Van Loan.
Consider e^(−100) = 1 − 100 + 100²/2! − 100³/3! + ⋯ (the truncated power series is numerically hopeless here).
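In MATLAB the built-in expm uses a scaled Pade approximation, which avoids the difficulty illustrated by the e^(−100) series (a sketch using the example matrix above):

% Matrix exponential of A = [0 1; 0 0] at t = 2: expect [1 2; 0 1]
A = [0 1; 0 0];  t = 2;
expm(A*t)
% naive truncated power series, for comparison (fine here, poor for stiff A)
S = eye(2);  term = eye(2);
for k = 1:10
    term = term*(A*t)/k;
    S = S + term;
end
S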
State-transition matrices
eFt eF (;t) = eF (t;t) = I
(eFt );1 = e;Ft :
Thus,
X (t0) = eFt X (0)
X (0) = e;Ft X (to )
X (t) = eFte;Ft X (t0)
= eF (t;t )X (t0) :
0
0
0
0
The eF (t;t ) is the state-transition matrix (t ; ). The last equation relates state at time t0 to
state at time t.
0
For a time variant system,
X _(t) = A(t)X (t) ;
X (t) = (t; t0)X (t0)
@X (t) = @(t; t0) X (t )
0
@t
@t
@(t; t0) = A(t)(t; ) :
@t
97
Because
(t; t) = I
(t3 ; t1 ) = (t3; t2)(t2 ; t1)
I = (t; )(; t) ;
we have
[(t; )];1 = (; t) :
1.13 Poles and Zeros from the State-Space Description
For a transfer function
    G(s) = Y(s)/U(s) = b(s)/a(s),
a pole of the transfer function is a "natural frequency", that is, a motion of Y that occurs for input U ≡ 0. In the state form we have
    Ẋ = AX,   X(0) = Vᵢ
Assume X(t) = e^(sᵢt)Vᵢ; then
    Ẋ(t) = sᵢ e^(sᵢt)Vᵢ = AX(t) = A e^(sᵢt)Vᵢ
So
    sᵢVᵢ = AVᵢ,
that is, sᵢ is an eigenvalue of A and Vᵢ is an eigenvector of A. We conclude poles = eig(A).
A zero of the system is a value s = sᵢ such that if the input is u = e^(sᵢt)U(0), then y ≡ 0. For a system in state form
    Ẋ = FX + GU
    Y = HX,
a zero is a value s = sᵢ such that
    U(t) = e^(sᵢt)U(0),   X(t) = e^(sᵢt)X(0),   Y(t) ≡ 0
Then we have
    Ẋ = sᵢ e^(sᵢt)X(0) = F e^(sᵢt)X(0) + G e^(sᵢt)U(0),   or   [sᵢI − F   −G][X(0); U(0)] = 0
We also have
    Y = HX + JU = 0,   or   [H   J][X(0); U(0)] = 0
Combining them together, we have
    [sI − F   −G; H   J][X(0); U(0)] = [0; 0]
A non-trivial solution exists where det[sI − F  −G; H  J] = 0. (For MIMO systems, look to where this matrix loses rank.)
The transfer function is
    G(s) = H(sI − F)⁻¹G = det[sI − F  −G; H  J] / det[sI − F]
Example: For the transfer function
    G(s) = (s + 1)/((s + 2)(s + 3)) = (s + 1)/(s² + 5s + 6),
the (controller-form) state-space realization is
    F = [−5 −6; 1 0],   G = [1; 0],   H = [1 1]
The poles of the system are found by solving det[sI − F] = 0, that is,
    det[sI − F] = det[s + 5  6; −1  s] = s(s + 5) + 6 = s² + 5s + 6 = 0
The zeros of the system are found by solving det[sI − F  −G; H  J] = 0, that is,
    det[s + 5  6  −1; −1  s  0; 1  1  0] = −1·(−1 − s) = s + 1
1.14 Discrete-Systems
1.14.1 Transfer Function
    X_(k+1) = ΦX_k + ΓU_k
    Y_k = HX_k
Then
    Y(z)/U(z) = H(zI − Φ)⁻¹Γ = det[zI − Φ  −Γ; H  0] / det[zI − Φ]
Example: calculating X_n from X₀:
    X₁ = ΦX₀ + ΓU₀
    X₂ = Φ(ΦX₀ + ΓU₀) + ΓU₁
    X₃ = Φ(Φ(ΦX₀ + ΓU₀) + ΓU₁) + ΓU₂
    X_n = ΦⁿX₀ + Σ_(i=1)^n Φ^(i−1) Γ U_(n−i)
1.14.2 Relation between Continuous and Discrete
ZOH equivalent (Figure 12): U(kT) → ZOH → G(s) → Y(kT), giving G(z).
    Ẋ = FX + GU
    Y = HX + JU
    X(t) = e^(F(t−t₀))X(t₀) + ∫_(t₀)^t e^(F(t−τ)) G U(τ) dτ
Let's solve over one sampling period, t = nT + T and t₀ = nT:
    X(nT + T) = e^(FT)X(nT) + ∫_(nT)^(nT+T) e^(F(nT+T−τ)) G U(τ) dτ
Because U(τ) = U(nT) = const for nT ≤ τ < nT + T,
    X(nT + T) = e^(FT)X(nT) + {∫_(nT)^(nT+T) e^(F(nT+T−τ)) dτ} G U(nT)
Let η = nT + T − τ; we have
    X_(n+1) = e^(FT)X_n + {∫₀ᵀ e^(Fη) dη} G U_n
So we have
    Φ = e^(FT),    Γ = {∫₀ᵀ e^(Fη) dη} G,    H ↔ H,   J ↔ J
2 Controllability and Observability
2.1 Controller canonical form
    Y(s)/U(s) = b(s)/a(s) = (b₁s² + b₂s + b₃)/(s³ + a₁s² + a₂s + a₃)
    Y(s) = b(s)·(U(s)/a(s)) = b(s)·ξ(s),    U(s) = a(s)·ξ(s)
    u = ξ⁽³⁾ + a₁ξ̈ + a₂ξ̇ + a₃ξ,    y = b₁ξ̈ + b₂ξ̇ + b₃ξ
(Figure 13: controller canonical form built from three integrators; the integrator outputs are the states x₁ = ξ̈, x₂ = ξ̇, x₃ = ξ, with feedback gains −a₁, −a₂, −a₃ into the input summer and output taps b₁, b₂, b₃.)
    [ẋ₁; ẋ₂; ẋ₃] = [−a₁ −a₂ −a₃; 1 0 0; 0 1 0][x₁; x₂; x₃] + [1; 0; 0]u        (A_c, B_c)
    C_c = [b₁ b₂ b₃],   D = 0
2.2 Duality of the controller and observer canonical forms
    A_c = A_oᵀ,    B_c = C_oᵀ
    G(s) = C_c(sI − A_c)⁻¹B_c
    G(s) = [C_c(sI − A_c)⁻¹B_c]ᵀ = B_cᵀ(sI − A_cᵀ)⁻¹C_cᵀ = C_o(sI − A_o)⁻¹B_o
Question: can an arbitrary realization (A, B, C) be transformed to controller form, and to observer form? (Figure 14.)
2.3 Transformation of state space forms
Given F, G, H, J, nd transformation matrix T, such that
2
3
;
a
1 ;a2 ;a3
A = 64 1 0 0 75
0
1
0
2 3
1
6
B = 4 0 75
0
Let
T ;1
2 3
t1
= 64 t2 75
t3
102
A = T ;1 FT
AT ;1 = T ;1 F
B = T ;1 G
2
32 3
2 3 2
3
;
a
t
t
t
1 ;a2 ;a3
1
1
1F
64 1 0 0 75 64 t2 75 = 64 t2 75 F = 64 t2F 75
0
1
0
t
t
t3 F
2 33
2 3 3
64 t1 75 = 64 tt12FF 75
t2
t3F
t1 = t2 F
t2 = t3 F
need to determine t3
B = T ;1 G
2 3 2
3
1
t
1G
64 0 75 = 64 t2G 75
t3 G
0
t1 G = 1 = t3 F 2 G
t2 G = 0 = t3 FG
t3 G = 0
h
i h
i
    t₃[G  FG  F²G] = [0 0 1] = e₃ᵀ
In general,
    tₙ[G  FG  F²G ⋯ F^(n−1)G] = eₙᵀ,
where C(F, G) = [G  FG ⋯ F^(n−1)G] is the controllability matrix and eₙ is a basis vector. If C is full rank, we can solve for tₙ:
    tₙ = eₙᵀ C⁻¹,    T⁻¹ = [tₙF^(n−1); ⋮; tₙF; tₙ]
Definition: A realization {F, G} can be converted to controller canonical form iff C⁻¹(F, G) exists ⟺ the system is controllable.
Corollary: C is full rank ⟺ the system is controllable.
Exercise: Show that the existence of C⁻¹ is unaffected by an invertible transformation T, x = Tz.
Bonus question: what is the condition for a system to be observable?
Given F, G, H, J, solve for A_o, B_o, C_o, D_o. By duality (A_c = A_oᵀ, B_c = C_oᵀ), this is possible iff C⁻¹(Fᵀ, Hᵀ) exists:
    [Hᵀ  FᵀHᵀ ⋯ (Fᵀ)^(n−1)Hᵀ]ᵀ = [H; HF; ⋮; HF^(n−1)] = O(F, H),
the observability matrix.
Exercise
(Figure 15: u drives (s − 1)/(s + 1) followed by 1/(s − 1), output y.)
    F = [−1 0; 1 1],   G = [−2; 1],   H = [0 1]
    C = [G  FG] = [−2  2; 1  −1],    det(C) = 2 − 2 = 0  ⟹  not controllable
    O = [H; HF] = [0 1; 1 1],        det(O) = −1  ⟹  observable
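This check is easy to automate (a sketch; ctrb and obsv build the controllability and observability matrices directly):

% Controllability / observability check for the exercise above
F = [-1 0; 1 1];  G = [-2; 1];  H = [0 1];
C = ctrb(F, G);   O = obsv(F, H);
rank(C)           % 1 < 2  =>  not controllable
rank(O)           % 2      =>  observable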
Exercise:
u
1
s-1
s-1
s+1
y
Answer: You will get the reverse conclusion to the above exercise.
Conclusion: A pole zero cancellation implies the system is either uncontrollable or unobservable, or both.
Exercise: Show controllability canonical form has C = I =) controllable
Comments
Controllability does not imply observability, observability does not imply controllability.
Given a transfer function,
G(s) = ab((ss))
We can choose any canonical form we like, for example either controller form or observer form. But once chosen, we cannot necessarily transform between the two.
Redundant states
x_ = Fx + Gu
y = Hx
let xn+1 = Lx
"
x
#
x= x
=) unobservability
n+1
Definition: A system (F, G) is "controllable" if there exists a control signal u that takes the state x from arbitrary initial conditions to some desired final state x_F in finite time ⟺ C⁻¹ exists.
Proof: Find u to set initial conditions
Suppose x(0; ) = 0 and u(t) = (t)
X (s) = (sI ; A);1 bU (s) = (sI ; A);1b
A ;1
;1
I:V:T: = x(0+) = slim
!1 sX (s) = slim
!1 s(sI ; A) b = slim
!1(I ; s ) b = b
Thus an impulse brings the state to b (from 0)
105
Now suppose u(t) = (k) (t)
U (s) = sk
2
X (s) = 1s (I ; As );1 bsk = 1s (I + As + As2 + : : :)bsk
k+1
k
= bsk;1 + Absk;2 + : : : + Ak;1 b + As b + A s2 b + : : :
x(t) = b(k;1) + Ab(k;2) + Ak;1 b due to impulse at t=0;
Ak b + Ak+1 b + : : :g = Ak b
x(0+) = slim
s
f
!1
s
s2
if u(t) = (k) (t) ) x(0+ ) = Ak b
Now let
u(t) = g1 (t) + g2_(t) + : : : + gn (n;1) (t), by linearity,
2
i
x(0+) = g1 b + g2Ab + : : : + gnAn;1 b = b Ab A2 b : : : An;1b 64
h
2
64
g1 3
g1 3
.. 75
.
gn
xdesired = x(0+)
.. 75 = C ;1 x(0+ ) if C ;1 exists
.
gn
and we can solve for gi to take x(0+) to any desired vector
Conclusion: We have proven that if a system is controllable, we can instantly move the states
from any given state to any other state, using impulsive inputs. (only if proof is harder )
2.4 Discrete controllability or reachability
A discrete system is reachable i 9 a control signal u that can take the states from arbitrary initial
conditions to some desired state xF , in nite time steps.
xk = Ak x0 +
xN
k
X
i=0
Ai;1 buk;1
2
3
u(N ; 1)
h
i 666 u(N ; 2) 777
N
2
N
;
1
= A x0 + b Ab A b : : : A b 6
4 ... 75
2 u(N ; 1) 3
64 ... 75 = C ;1(xd ; AN x(0))
u(0)
106
u(0)
2.4.1 Controllability to the origin for discrete systems
jCj =6 0 is a sucient condition
suppose u(n)=0, then x(n) = 0 = Anx(0) if A is the Nilpotent matrix ( all the eigenvalues
equal to zero ).
in general, if Anx0 2
<|{z}
[C ]
range of C
Advanced question: if a continuous system (A,B) is controllable, is the discrete system (,;)
also controllable?
Answer: not necessarily!
sampling slower
zero-pole cancellation
Figure 16:
2.5 Observability
A system (F,H) is observable () for any x(0), we can deduce x(0) by observation of y(t) and u(t)
in nite time () O(F; H );1 exists
For state space equation
x_ = Ax + Bu
y = Cx
y_ = C x_ = CAx + CBu
y = C x = CA(Ax + Bu) + CB u_
..
.
y(k) = CAk x + CAk;1 bu + : : : + Cbuk;1
2
3 2
3
2
32
3
y
(0)
C
0
0 0
u
(0)
64 y_ (0) 75 = 64 CA 75 x(0) + 64 CB 0 0 75 64 u_ (0) 75
CAB CB 0
u(0)
y(0)
CA2
| {z }
O(C;A)
|
107
{z
toeplitz matrix
}
82 y(0) 3 2 u(0) 39
>
>
<
=
6
7
6
7
.
.
;
1
x(0) = O (A; C ) >4 .. 5 ; T 4 .. 5>
: yn;1
un;1 (0) ;
if O;1 exists, we can deduce x(0)
Exercise: solve for discrete system
2
x(k) = O;1 64
y (k)
..
.
y(k + n ; 1)
3 2
75 + T 64
u(k)
..
.
u(k + n ; 1)
3
75
2.6 Things we won't prove
If A; B; C is observable and controllable, then it is minimal, i.e, no other realization has a
smaller dimension). This implies a(s) and b(s) are co-prime.
The unobservable space
N (O) = fx 2 RnjOx = 0g is the null space of O
solutions of x_ = Ax which start in unobservable space cannot be seen by looking at output
(i.e. we stay in unobservable space).
If x(0) 2 N (O), then y = Cx = 0 for all t.
The controllable space,
<(C ) = fx = Cvjv 2 Rng is the range space of C
If x(0) 2 <(C ) and x_ = Ax + Bu , then, x(t) 2 <(C )8 t for any u(t) )the control can only
move the state in <(C )
PBH(Popov-Belevitch-Hautus ) Rank Tests
1. A system f A, B g is controllable i rank [sI-A b]=N for all s.
2. A system f A, C g is observable i
"
#
C
rank sI ; A = N for all s.
3. These tests tell us which modes are uncontrollable or unobservable.
Grammian Tests
P (T; t) =
ZT
t
(T; )B()B0()0 (T; )d, where is the state transition matrix.
A system is controllable i P(T,t) is non singular for some T>t. For time invariant system,
P (T ) =
ZT
0
e;At BBT e;At dt
108
3 Feedback control
u
.
x
G
x
1
s
y
H
F
-K
    u = −Kx
    ẋ = Fx + Gu = (F − GK)x
Poles are the eigenvalues of (F − GK).
Suppose we are in controller canonical form:
    F − GK = [−a₁−k₁  −a₂−k₂  −a₃−k₃; 1 0 0; 0 1 0]
    det(sI − F + GK) = s³ + (a₁ + k₁)s² + (a₂ + k₂)s + (a₃ + k₃)
We can solve for K so that the closed loop has arbitrary characteristic roots. This is called the "pole placement problem". Simply select α₁, α₂, α₃, the coefficients of the desired characteristic equation
    s³ + α₁s² + α₂s + α₃ = α_c(s)
Since we know α_c(s) = det(sI − F + GK), we can solve for K by matching the coefficients of the two equations. Suppose now that {F, G} are not in controller canonical form. Find a transformation T such that x = Tz and
    ż = Az + Bu
is in controller canonical form. Then find K_z so that det(sI − A + BK_z) = α_c(s). The feedback is u = −K_z z = −K_z T⁻¹x; thus we get
    K_x = K_z T⁻¹
Conclusion: T⁻¹ can be found ⟺ C⁻¹ exists, which means the system is controllable. Putting all the steps together:
    (F, G)  →  T, (A, B)  →  match coefficients  →  K_z  →  K_x = K_z T⁻¹
Closed form - Ackermann's formula:
    K_x = eₙᵀ C⁻¹(F, G) α_c(F),    α_c(F) = Fⁿ + α₁F^(n−1) + ⋯ + αₙI
109
In MATLAB:
    K = acker(F, G, P)     (P is the vector of desired poles)
This is not good numerically, since it requires C⁻¹. Instead, use
    K = place(F, G, P)
which is based on a set of orthogonal transformations.
Summary:
    u = −Kx
    ẋ = Fx + Gu = (F − GK)x
Poles are eig(F − GK), the roots of α_c(s). We can choose an arbitrary α_c(s) given that (F, G) is controllable.
    K = place(F, G, P)
where P is the vector of poles.
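A small worked example of pole placement (a sketch; the double-integrator plant and the poles at −1 ± j are illustrative choices, not taken from the text):

% Place the poles of xdot = [0 1; 0 0]x + [0; 1]u at -1 +/- j
F = [0 1; 0 0];  G = [0; 1];
P = [-1+1j, -1-1j];          % desired closed-loop poles (assumption)
K = place(F, G, P)
eig(F - G*K)                 % should return the poles in P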
3.1 Zeros
    ẋ = Fx + Gu
    u = −Kx + N̄r
    ẋ = (F − GK)x + GN̄r
    y = Hx
Zeros from r → y are given by
    det[sI − F + GK   −GN̄; H   0] = 0
Multiply column two by K/N̄ and add it to the first column:
    det[sI − F   −GN̄; H   0] = 0    → same as before
→ Zeros are unaffected by state feedback (the same conclusion as in classical control).
110
Example :
G(s) = (ss3++2)(s s;;103)
Set
c (s) = (s + 1)(s + 2)(s ; 3)
C (sI ; A + GK );1B = s +1 1
! unobservable
Does it aect controllability ? No
F ; GK ! still in Controller Canonical Form.
Example
"
#
" #
x_ = 10 12 x + 10 u
det(sI ; A + BK ) = s ; 10+ k1 ;s1 ;+ 2k2
= (s ; 1 + k1)
= +2 check with PBH
Example
X
x_
u
x_
x(t)
u
=
=
=
=
=
;x + u
;Kx
;(1 + K )x
e;(1+K )t; x(0) = 1
;ke;(1+K)t ; x(0) = 1
111
(|s ;
{z 2)}
uncontrollable mode
larger K
As K ! 1 (large gain), u looks more and more like (t) which instantaneously moves x to
0. (recall proof for controllability)
3.2 Reference Input
x_ = Fx + Gu
y = Hx
r = ref
1. Assume r ! constant as t ! 1 (e.g. step)
2. Want y ! r as t ! 1
Let
    u = −K(x − x_ss) + u_ss,    x_ss = N_x r,    u_ss = N_u r
(Figures: the reference enters through the feedforward gains N_x and N_u; equivalently, a single gain N̄ = K N_x + N_u multiplies r.)
    N̄ = K N_x + N_u
At steady state, x → x_ss, ẋ → 0, u → u_ss:
    0 = F x_ss + G u_ss
    y_ss = H x_ss = r
    F N_x r + G N_u r = 0,    H N_x r = r
Dividing by r,
    [F  G; H  0][N_x; N_u] = [0; 1]
    [N_x; N_u] = [F  G; H  0]⁻¹[0; 1]
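In MATLAB this is a single linear solve (a sketch; n is the state dimension of whatever F, G, H, K are currently in the workspace):

% Reference-input gains from [F G; H 0][Nx; Nu] = [0; 1]
n = size(F, 1);
sol = [F G; H 0] \ [zeros(n,1); 1];
Nx = sol(1:n);
Nu = sol(n+1);
Nbar = K*Nx + Nu;            % single feedforward gain on r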
Example: Two mass system
X1
X2
M1
M2
We know that in steady state
    [x₁; ẋ₁; x₂; ẋ₂] → [1; 0; 1; 0]·r
Hence
    N_x = [1; 0; 1; 0]   and   N_u = 0
N_x and N_u are more robust - independent of M₁ and M₂. In contrast, N̄ = K N_x + N_u changes with K.
For SISO systems,
    y/r = H(sI − F + GK)⁻¹GN̄,    N̄ = 1/(H(−F + GK)⁻¹G)   ⟹   unity DC gain (closed-loop gain = 1 at s = 0)
3.3 Selection of Pole Locations
Suppose F, G are controllable, where should the poles be placed? There are three common methods
for placing the poles.
3.3.1 Method 1 : Dominant second-order poles
    y/r = ωₙ² / (s² + 2ζωₙs + ωₙ²)
Example (slinky / two-mass system): the open-loop poles include a lightly damped pair near −0.014 ± 1.0003j (see the pole map in the figure).
Choose 2nd-order dominant pole locations:
    ωₙ = 1,  ζ = 0.5   ⟹   α_c = s² + s + 1,   p_d = −1/2 ± (√3/2)j
Take the other two poles to be faster (ωₙ = 4, ζ = 0.5, i.e., 4·p_d):
    p = [−1/2 + (√3/2)j,  −1/2 − (√3/2)j,  −2 + 2√3 j,  −2 − 2√3 j]
MATLAB:
    K = place(F, G, P)
    K = [19.23   4.96   −1.6504   16.321]
Control effort is the price of moving the poles.
3.3.2 Step Response Using State Feedback
(Figure: reference r scaled by N̄, state feedback −Kx, plant (F, G, H).)
    ẋ = Fx + Gu
    u = −Kx + N̄r
    y = Hx
    ẋ = (F − GK)x + GN̄r
where N̄ is a scale factor.
MATLAB:
    y = step(F − G*K, G*N̄, H, 0, 1, t)
Example
% MATLAB .m file for Two Mass System.
State Feedback,
% Parameters
M = 1
m = .1
115
E. Wan
k = 0.091
b = 0.0036
%
F
G
H
J
State-Space Description
= [0 1 0 0; -k/M -b/M k/M b/M ;0 0 0 1; k/m b/m -k/m -b/m]
= [0 1/M 0 0]'
= [0 0 1 0]
= 0
% Desired pole locations
Pd = roots([1 1 1])
P = [Pd 4*Pd]
K = place(F,G,P)
% Closed loop pole locations
pc = eig(F-G*K)
% Reference Design
Nx = [1 0 1 0]';
Nb = K*Nx
% Step Response
t = 0:.004:15;
subplot(211)
[Y,X] = step(F-G*K,G*Nb,H,0,1,t);
plot(t,H*X','--');
subplot(212)
plot(t,-K*X'+Nb,'--');
title('2-Mass control ref. step')
% Standard Root Locus vs Plant Gain
%axis('square');
%rlocus(F,G,K,0)
%title('Plant/State-Feedback Root Locus')
%rcl = rlocus(F,G,K,0,1)
%hold
%plot(rcl,'*')
%axis([-4 4 -4 4]);
% Bode plot of loop transfer function
%margin(F,G,K,0)
116
(Figure: "2-Mass control ref. step" - the closed-loop reference step response settles to 1 within about 15 seconds.)
3.3.3 Gain/Phase Margin
(Figure: the loop is cut at the plant input; the open loop seen from the cut is K(sI − F)⁻¹G.)
MATLAB:
    bode(F, G, K, 0)
    margin(F, G, K, 0)
(Figure: Bode plot of the loop transfer function; MATLAB reports Gm = −51.45 dB at 1.006 rad/sec and Pm = 52.22° at 5.705 rad/sec.)
Root Locus
MATLAB:
rlocus(F; G; K; 0)
(Figure: plant/state-feedback root locus.)
Open loop: (F, G, H)
Closed loop: (F − GK, GN̄, H)
Open loop for stability check: (F, G, K), i.e., the loop transfer function K(sI − F)⁻¹G.
3.3.4 Method 2 : Prototype Design
Select prototype responses with desirable dynamics. There are several standard sets of these:
Butterworth:
    1/(1 + (ω/ω₀)^(2n))
Bessel
ITAE: minimizes the Integral of the Time multiplied by the Absolute value of the Error,
    J = ∫₀^∞ t|e(t)| dt
(tabulated some 35 years ago by twiddling and trial and error)
All of these methods set pole locations and ignore the influence of plant zeros.
3.3.5 Method 3 : Optimal Control and Symmetric Root Locus
Suppose we want to minimize
    J = ∫₀^∞ e² dt
The optimal control law for this is a combination of δ-functions and their derivatives, but the δ's are not practical.
Linear Quadratic Constraints:
    J = ∫₀^(t_f) [xᵀ(t)Qx(t) + uᵀ(t)Ru(t)] dt + xᵀ(t_f)Sx(t_f)
where Q, R and S are positive-definite weighting matrices. x → 0 is equivalent to e → 0.
It turns out that
    u(t) = −K(t)x(t),
which is a time-varying state feedback. The linear feedback K(t) may be solved through Dynamic Programming (next chapter).
A general "optimal" control has the form
    J = ∫₀^(t_f) L(x, u, t) dt + L₂(x(t_f), u(t_f)),
which may be difficult to solve. For the special case
    J = ∫₀^∞ [xᵀQx + uᵀRu] dt
we have the Linear Quadratic Regulator, and
    u = −Kx   ⟹   fixed (constant-gain) feedback control.
SRL: take
    Q = ρ²HᵀH,   R = I   ⟹   J = ∫₀^∞ [ρ²y² + u²] dt
The solution u = −Kx leads to the optimal value of K, which places the closed-loop poles at the stable roots of the equation
    1 + ρ²Gᵀ(−s)G(s) = 0,   where G(s) = Y(s)/U(s) (open loop).
This is called the Symmetric Root Locus: the locus in the LHP has a mirror image in the RHP, and the locus is symmetric with respect to the imaginary axis.
    ρ = 0: expensive control, since the control effort has to be minimized.
    ρ = ∞: cheap control, since y² has to be minimized; hence the actuators may approach delta functions.
Examples:
    G(s) = 1/s²,    G(−s) = 1/(−s)² = 1/s²
(Figure: symmetric root locus for G(s) = 1/s².)
Example: Two-mass system (SRL shown in the figure).
MATLAB:
    K = lqr(F, G, Q, R)
where
    Q = ρ²HᵀH,   R = I
% MATLAB .m file for Two Mass System.
LQR State Feedback, E. Wan
%
M
m
k
b
Parameters
= 1
= .1
= 0.091
= 0.0036
%
F
G
H
J
State-Space Description
= [0 1 0 0; -k/M -b/M k/M b/M ;0 0 0 1; k/m b/m -k/m -b/m]
= [0 1/M 0 0]'
= [0 0 1 0]
= 0
% LQR Control Law for p^2 = 4
K = lqr(F,G,H'*H*4,1)
% Closed loop pole locations
pc = eig(F-G*K)
121
% Check natural frequency
a_pc = abs(eig(F-G*K))
% Symmetric Root Locus for LQR design
%subplot(211)
%A = [F zeros(4); -H'*H -F'];
%B = [G ;0;0;0;0];
%C = [0 0 0 0 G'];
%rlocus(A,B,C,0)
%title('SRL for LQR')
%axis('square');
%subplot(212)
%rlocus(A,B,C,0)
%hold
%plot(pc,'*')
%axis([-3 3 -3 3]);
%axis('square');
% The poles pc are on the stable locus for p^2 = 4
% Standard Root Locus vs Plant Gain
%rlocus(-num,den)
%rlocus(F,G,K,0)
%title('Plant/State-Feedback Root Locus')
%rcl = rlocus(F,G,K,0,1)
%hold
%plot(rcl,'*')
%axis('square');
% Bode plot of loop transfer function
margin(F,G,K,0)
% Reference Design
Nx = [1 0 1 0]';
Nb = K*Nx
% Step Response
%t = 0:.004:15;
%[Y,X] = step(F-G*K,G*Nb,H,0,1,t);
%subplot(211);
%step(F-G*K,G*Nb,H,0,1,t);
%title('2-Mass control ref. step')
%subplot(212)
%plot(t,-K*X'+Nb)
%title('2-Mass Control effort ')
122
SRL for LQR
Imag Axis
50
0
−50
−50
0
Real Axis
50
Imag Axis
2
1
0
−1
−2
−3
−2
0
Real Axis
123
2
Plant/State−Feedback Root Locus
2
1.5
1
Imag Axis
0.5
0
−0.5
−1
−1.5
−2
−2
−1.5
−1
−0.5
0
0.5
Real Axis
1
1.5
2
Gm=−30.03 dB, (w= 1.02) Pm=60.1 deg. (w=2.512)
Gain dB
100
0
−100 −2
10
−1
0
10
10
1
10
Frequency (rad/sec)
0
Phase deg
−90
−180
−270
−360 −2
10
−1
0
10
10
Frequency (rad/sec)
124
1
10
2−Mass control ref. step
Amplitude
1.5
1
0.5
0
0
5
10
15
10
15
Time (secs)
2−Mass control ref. step
20
10
0
−10
0
5
Example
G(s) = (s + 1)(s1; 1 j )
X
X
X
X
X
X
X
X
X
choose these for p = 0
Choose these poles to stabilize and minimize the control eort.
125
Fact about LQR: the Nyquist plot of the LQR loop gain does not penetrate the unit disk centered at −1.
    ⟹  PM ≥ 60°
Example:
    H(s) = s/(s² − 1)
But
    1 + ρ²Gᵀ(−s)G(s) = 1 + ρ² (s/(s² − 1))(−s/(s² − 1)) = 1 − ρ² s²/(s² − 1)²
so instead we should plot the zero-degree root locus (see figure).
Rule: to determine the type of root locus (0° or 180°), choose the one that does not cross the imaginary axis.
3.4 Discrete Systems
    1 + ρ²Gᵀ(z⁻¹)G(z) = 0
    z = e^(sT)   ⟹   e^(−sT) = 1/z
    J = Σ_(k=0)^∞ [u²(k) + ρ²y²(k)]
Example:
    1/s²  →(ZOH)→  (z + 1)/(z − 1)²   (up to a gain of T²/2)
(Figure: the discrete SRL; one branch comes in from the zero at infinity.)
    G(z⁻¹) = (z⁻¹ + 1)/(z⁻¹ − 1)² = z(z + 1)/(z − 1)²
Example: Two-mass system.
127
% MATLAB .m file for Two Mass System. Discrete control
% Eric A. Wan
%
M
m
k
b
Parameters
= 1;
= .1;
= 0.091;
= 0.0036;
%
F
G
H
J
State-Space Description
= [0 1 0 0; -k/M -b/M k/M b/M ;0 0 0 1; k/m b/m -k/m -b/m];
= [0 1/M 0 0]';
= [0 0 1 0];
= 0;
% LQR Control Law for p^2 = 4
K = lqr(F,G,H'*H*4,1);
% Closed loop pole locations
pc = eig(F-G*K);
% Reference Design
Nx = [1 0 1 0]';
Nb = K*Nx;
% Step Response
%t = 0:.004:20;
%[Y,X] = step(F-G*K,G*Nb,H,0,1,t);
%subplot(211);
%u = -K*X';
%step(F-G*K,G*Nb,H,0,1,t);
%title('2-Mass reference step')
%plot(t,u+Nb)
%title('2-Mass control effort ')
%ZOH discrete equivalent
T = 1/2;
[PHI,GAM] = c2d(F,G,T);
pz = exp(pc*T);
Kz = acker(PHI,GAM,pz);
Nxz = Nx;
Nbz = Kz*Nx;
% Step Response
%axis([0 20 0 1.5])
%t = 0:.004:20;
%[Y,X] = step(F-G*K,G*Nb,H,0,1,t);
128
%subplot(211);
%step(F-G*K,G*Nb,H,0,1,t);
%hold
%title('Discrete: 2-Mass reference step, T = 1/2 sec')
%[Y,X] = dstep(PHI-GAM*Kz,GAM*Nbz,H,0,1,20/T);
%td = 0:T:20-T;
%plot(td,Y,'*')
%hold
%dstep(PHI-GAM*Kz,GAM*Nbz,-Kz,Nbz,1,20/T)
%title('ZOH control effort ')
% pole locations in s and z plane
%axis('square')
%axis([-2 2 -2 2]);
%%subplot(211)
%plot(pc,'*')
%title('s-plane pole locations')
%sgrid
%axis([-1 1 -1 1]);
%pz = exp(pc/4);
%plot(pz,'*')
%title('z-plane pole locations')
%zgrid
%hold
%T = 1/4:.01:5;
%pz = exp(pc*T);
%plot(pz,'.')
% Discrete Step Response with intersample response
%axis([0 20 0 1.5])
%subplot(211);
%[Y,X] = dstep(PHI-GAM*Kz,GAM*Nbz,H,0,1,20/T);
%td = 0:T:20-T;
%plot(td,Y,'*')
%title('Dead beat control');
%[u,X] = dstep(PHI-GAM*Kz,GAM*Nbz,-Kz,Nbz,1,20/T);
%uc = ones(20,1)*u';
%uc = uc(:);
%hold
%[PHI2,GAM2] = c2d(F,G,T/20);
%[y2,x2] = dlsim(PHI2,GAM2,H,0,uc);
%t2d = linspace(0,20,20*20/T);
%plot(t2d,y2)
%hold
%dstep(PHI-GAM*Kz,GAM*Nbz,-Kz,Nbz,1,20/T);
%title('control effort');
% Standard Root Locus vs Plant Gain
%axis([-1 1 -1 1]);
%axis('square');
%rlocus(PHI,GAM,Kz,0)
129
%title('Plant/State-Feedback Root Locus, T = 1/2')
%rcl = rlocus(PHI,GAM,Kz,0,1)
%zgrid
%hold
%plot(rcl,'*')
% Standard Root Locus vs Plant Gain
%axis([-3 3 -3 3]);
%axis('square');
%rlocus(F,G,K,0)
%title('Plant/State-Feedback Root Locus')
%rcl = rlocus(F,G,K,0,1)
%sgrid
%hold
%plot(rcl,'*')
% Bode plot of loop transfer function
%w = logspace(-1,1,200);
%bode(F,G,K,0,1,w);
%[mag,phase] = bode(F,G,K,0,1,w);
%title('Continuous Bode Plot')
%[gm,pm,wg,wc] = margin(mag,phase,w)
% Bode plot of loop transfer function
rad = logspace(-2,log10(pi),300);
dbode(PHI,GAM,Kz,0,T,1,rad/T);
[mag,phase] = dbode(PHI,GAM,Kz,0,T,1,rad/T);
title('Discrete Bode Plot, T = 1/2')
[gmd,pmd,wg,wc] = margin(mag,phase,rad/T)
130
(Figures: the discrete SRL and the plant/state-feedback root locus for T = 1/2, plotted against the unit circle.)
(Figures: LQR reference-step responses and control effort for sample periods T = 1/2, 1, and 2, and the dead-beat step response and control effort for T = 1/2.)
4 Estimator and Compensator Design
4.1 Estimators
The direct construction
    x(t) = O⁻¹{[y; ẏ; ÿ] − T[u; u̇; ü]}
is not very practical for real systems. Consider instead
    ẋ = Fx + Gu + G₁w
    y = Hx + v
where w is the process noise and v is the sensor noise. We want an estimate x̂ such that (x − x̂) is "small":
    x − x̂ → 0   if   w = v = 0.
(Figure: run a model (F, G, H) open loop, in parallel with the plant.)
    x̂̇ = F x̂ + Gu
Define the error x̃ ≜ x − x̂:
    x̃̇ = ẋ − x̂̇ = Fx + Gu + G₁w − (F x̂ + Gu) = F x̃ + G₁w
(open-loop error dynamics - usually not good)
(Figure: the estimator is driven by u and by the output error ỹ = y − ŷ.)
    x̂̇ = F x̂ + Gu + L(y − H x̂)
        (model)      (correction term)
How to choose L ?
134
Pole placement
1. Luenberger (observer): the dual of pole placement for K.
2. Kalman: choose L such that the estimate is the maximum-likelihood estimate assuming Gaussian noise statistics (i.e., choose L to minimize the estimation error).
Error Dynamics
The error dynamics are given by
    x̃̇ = Fx + Gu + G₁w − [F x̂ + Gu + L(y − H x̂)]
       = F x̃ + G₁w − LH x̃ − Lv
       = (F − LH)x̃ + G₁w − Lv
Characteristic equations:
    det(sI − F + LH) = α_e(s)        (estimator)
    det(sI − F + GK) = α_c(s)        (controller)
    α_e(s) = α_eᵀ(s) = det(sI − Fᵀ + HᵀLᵀ)
We need (Fᵀ, Hᵀ) controllable ⟺ (F, H) observable.
    Lᵀ = place(Fᵀ, Hᵀ, P_e)
4.1.1 Selection of Estimator Poles α_e
1. Dominant 2nd order
2. Prototype
3. SRL
    x̃̇ = (F − LH)x̃ + G₁w − Lv
We want L large to compensate for w (fast error dynamics), but L also sets the lowpass filtering of v, so we want L small to smooth the sensor noise. The SRL gives the optimal tradeoff:
(Figure: process noise w enters the plant through G₁; sensor noise v adds at the output.)
    G₁(s) = y/w
    1 + q² G₁ᵀ(−s) G₁(s) = 0,    q = (process noise energy)/(sensor noise energy)
⟹ the steady-state solution to the Kalman filter.
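By duality, the estimator gain can be computed with the same place command used for the controller (a sketch; the plant is a double integrator measuring position, and the estimator poles Pe are illustrative assumptions):

% Estimator design by duality: det(sI - F + LH) has roots Pe
F = [0 1; 0 0];  H = [1 0];
Pe = [-4+4j, -4-4j];               % illustrative estimator poles (assumption)
Lt = place(F', H', Pe);
L = Lt'                            % estimator gain
eig(F - L*H)                       % should match Pe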
4.2 Compensators: Estimators plus Feedback Control
Try the control law u = -K x̂.
With the state estimator in the feedback loop, the system is as shown below.
[Diagram: process ẋ = Fx + Gu, sensor y = Hx, estimator x̂̇ = F x̂ + G u + L(y - H x̂), control law u = -K x̂.]
What is the transfer function of this system?
The system is described by the following equations:
    ẋ = F x + G u + G1 w
    y = H x + v
Estimation:
    x̂̇ = F x̂ + G u + L (y - H x̂)
Control:
    u = -K x̂
Combining,
    [ẋ; x̂̇] = [F  -GK;  LH  F-GK-LH] [x; x̂] + [G1; 0] w + [0; L] v
Define x̃ = x - x̂ and apply the linear transformation
    [x; x̃] = [I  0;  I  -I] [x; x̂]
With this transformation the state equations are
    [ẋ; x̃̇] = [F-GK  GK;  0  F-LH] [x; x̃] + [G1; G1] w + [0; -L] v
The characteristic equation is
    det [sI-F+GK   -GK;   0   sI-F+LH] = α_c(s) α_e(s)
(no cross terms).
This is the Separation Theorem. In other words, the closed-loop poles of the system are the compensator (controller) poles plus the estimator poles, as if the two were designed independently.
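A quick numerical check of the separation result (a sketch only; the matrices and pole locations are hypothetical): the eigenvalues of the combined plant/estimator/controller system are the union of eig(F - GK) and eig(F - LH).

% Sketch: verify the separation theorem on a small hypothetical system.
F = [0 1; -1 -1];  G = [0; 1];  H = [1 0];
K = place(F, G, [-2 -3]);          % controller poles
L = place(F', H', [-6 -7])';       % estimator poles (dual problem)
Acl = [F -G*K; L*H F-G*K-L*H];     % combined closed-loop dynamics
sort(eig(Acl))                     % should equal {-2,-3} union {-6,-7}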
Now add a reference input:
    u = -K (x̂ - Nx r) + Nu r = -K x̂ + N̄ r
What is the effect on x̃ due to r?
    x̃̇ = ẋ - x̂̇ = (F - LH) x̃ + G1 w - L v
x̃ is independent of r, i.e. x̃ is uncontrollable from r. This implies that the estimator poles do not appear in the r→x or r→y transfer functions:
    y/r = N̄ α_e(s) b(s) / (α_e(s) α_c(s)) = N̄ b(s) / α_c(s)
This transfer function is exactly the same as without the estimator.
4.2.1 Equivalent feedback compensation
With a reference input the equivalent system is:
[Diagram: r → D2(s) → summing junction → plant P → y, with D1(s) feeding y back to the summing junction.]
D2(s) is the transfer function introduced by adding the reference input.
We have
    x̂̇ = F x̂ + G u + L (y - H x̂)
    u = -K x̂ + N̄ r
=>  x̂̇ = (F - GK - LH) x̂ + G N̄ r + L y
From the above equations (considering y as input and u as output for D1(s)),
    D1(s) = -K (sI - F + GK + LH)^{-1} L
(Considering r as an input and u as an output for D2(s)),
    D2(s) = -K (sI - F + GK + LH)^{-1} G N̄ + N̄
4.2.2 Bode/root locus
The root locus and Bode plots are found from the open-loop transfer function. The reference is set to zero and the feedback loop is broken as shown below.
[Diagram: r = 0 → D2(s) → u_in; loop broken between u_in and u_out; plant kG(s); compensator D1(s).]
i.e.
    D1(s) = K (sI - F + GK + LH)^{-1} L
    G(s) = H (sI - F)^{-1} G
    [ẋ; x̂̇] = [F  0;  LH  F-GK-LH] [x; x̂] + [G; 0] u_in        (A, B)
    u_out = [0  -K] [x; x̂]                                       (C)
The root locus can be obtained by rlocus(A,B,C,0) and the Bode plot by bode(A,B,C,0).
Estimator and controller pole locations are given by rlocus(A,B,C,0,1).
As an exercise, find the state-space representation for input (r, w, v) and output y.
% MATLAB .m file for Two Mass System.
%                 Feedback Control / Estimator
%
% Parameters
M = 1
m = .1
k = 0.091
b = 0.0036
% State-Space Description
F = [0 1 0 0; -k/M -b/M k/M b/M; 0 0 0 1; k/m b/m -k/m -b/m]
G = [0 1/M 0 0]'
H = [0 0 1 0]
J = 0
% LQR Control Law for p^2 = 4
K = lqr(F,G,H'*H*4,1)
K =
2.8730
2.4190
-0.8730
1.7076
% Closed loop pole locations
pc = eig(F-G*K)
pc =
  -0.3480 + 1.2817i
  -0.3480 - 1.2817i
  -0.8813 + 0.5051i
  -0.8813 - 0.5051i
% Symmetric Root Locus for LQR design
%axis([-3 3 -3 3]);
%axis('square');
%A = [F zeros(4); -H'*H -F'];
%B = [G ;0;0;0;0];
%C = [0 0 0 0 G'];
%rlocus(A,B,C,0)
%hold
%plot(pc,'*')
%title('SRL for LQR')
% The poles pc are on the stable locus for p^2 = 4
% Standard Root Locus vs Plant Gain
%axis([-3 3 -3 3]);
%axis('square');
%%rlocus(-num,den)
%rlocus(F,G,K,0)
%title('Plant/State-Feedback Root Locus')
%rcl = rlocus(F,G,K,0,1)
%hold
%plot(rcl,'*')
% Bode plot of loop transfer function
%w = logspace(-1,1,200);
%bode(F,G,K,0,1,w);
%[mag,phase] = bode(F,G,K,0,1,w);
%title('Bode Plot')
%[gm,pm,wg,wc] = margin(mag,phase,w)
% Reference Design
Nx = [1 0 1 0]';
Nb = K*Nx
Nb =
2.0000
% Step Response
%t = 0:.004:15;
%subplot(211);
%step(F-G*K,G*Nb,H,0,1,t);
%title('2-Mass control ref. step')
%subplot(212);
%step(F-G*K,G,H,0,1,t);
%title('2-Mass process noise step ')
% Now Lets design an Estimator
% Estimator Design for p^2 = 10000
L = lqe(F,G,H,10000,1)
L =
79.1232
96.9383
7.8252
30.6172
% Closed loop estimator poles
pe = eig(F-L*H)
pe =
  -1.1554 + 2.9310i
  -1.1554 - 2.9310i
  -2.7770 + 1.2067i
  -2.7770 - 1.2067i
% Check natural frequency
a_pe = abs(eig(F-L*H))
% Symmetric Root Locus for LQE design
%A = [F zeros(4); -H'*H -F'];
%B = [G ;0;0;0;0];
%C = [0 0 0 0 G'];
%rlocus(A,B,C,0)
%hold
%plot(pe,'*')
%title('SRL for LQE')
%axis([-3 3 -3 3]);
%axis('square');
% Compensator state description for r,y to u.  D1(s), D2(s)
A = [F-G*K-L*H];
B = [G*Nb L];
C = -K;
D = [Nb 0];
% Check poles and zeros of compensator
[Zr,Pr,Kr] = ss2zp(A,B,C,D,1);
[Zy,Py,Ky] = ss2zp(A,B,C,D,2);
[Zp,Pp,Kp] = ss2zp(F,G,H,0,1);
Zr =
  -1.1554 + 2.9310i
  -1.1554 - 2.9310i
  -2.7770 + 1.2067i
  -2.7770 - 1.2067i
Pr =
  -0.9163 + 3.9245i
  -0.9163 - 3.9245i
  -4.2256 + 2.0723i
  -4.2256 - 2.0723i
Zy =
-0.1677 + 0.9384i
-0.1677 - 0.9384i
-0.3948
Py =
  -0.9163 + 3.9245i
  -0.9163 - 3.9245i
  -4.2256 + 2.0723i
  -4.2256 - 2.0723i
Zp =
-25.2778
Pp =
0
-0.0198 + 1.0003i
-0.0198 - 1.0003i
0
% Combined Plant and Compensator state description (Open Loop) D(s)*G(s)
Fol = [F zeros(4) ; L*H F-L*H-G*K];
Gol = [G ; [0 0 0 0]'];
Hol = [0 0 0 0 K];
% An alternate derivation for D(s)*G(s)
[num1,den1] = ss2tf(A,B,C,D,2);
[num2,den2] = ss2tf(F,G,H,0,1);
num = conv(num1,num2);
den = conv(den1,den2);
% Plant/Compensator Root Locus
%rlocus(-num,den)
%rlocus(Fol,Gol,Hol,0)
%title('Plant/Compensator Root Locus')
%rcl = rlocus(num,den,-1)
%hold
%plot(rcl,'*')
%axis([-5 5 -5 5]);
%axis('square');
% Plant/Compensator Bode Plot
w = logspace(-1,1,200);
%bode(Fol,Gol,Hol,0,1,w)
[mag,phase] = bode(Fol,Gol,Hol,0,1,w);
title('Bode Plot')
[gm,pm,wg,wc] = margin(mag,phase,w)
gm =
1.5154
pm =
17.3683
% Closed loop Plant/Estimator/Control system for step response
Acl = [F -G*K ; L*H F-G*K-L*H];
Bcl = [G*Nb G [0 0 0 0]' ; G*Nb [0 0 0 0]' L];
Ccl = [H 0 0 0 0];
Dcl = [0 0 1];
t = 0:.01:15;
% Reference Step
%[Y,X] = step(Acl,Bcl,Ccl,Dcl,1,t);
%subplot(211)
%step(Acl,Bcl,Ccl,Dcl,1,t);
%title('2-Mass System Compensator Reference Step')
% Control Effort
%Xh = [X(:,5) X(:,6) X(:,7) X(:,8)];
%u = -K*Xh';
%subplot(212)
%plot(t,u+Nb)
%title('Control effort')
% Process Noise Step
%[Y,X] = step(Acl,Bcl,Ccl,Dcl,2,t);
subplot(211)
step(Acl,Bcl,Ccl,Dcl,2,t);
title('Process Noise Step')
% Sensor Noise Step
%[Y,X] = step(Acl,Bcl,Ccl,Dcl,3,t);
subplot(212)
step(Acl,Bcl,Ccl,Dcl,3,t);
title('Sensor Noise Step')
[Figures: SRL for LQE; Plant/Compensator Root Locus; 2-Mass System Compensator Reference Step and control effort; Process Noise Step; Sensor Noise Step.]
4.2.3 Alternate reference input methods
    x̂̇ = F x̂ + G u + L (y - H x̂)
    u = -K x̂
With an additional reference input the equations are:
    x̂̇ = (F - LH - GK) x̂ + L y + M r
    u = -K x̂ + N̄ r
Case 1:  M = G N̄. This is the same as adding the normal reference input.
Case 2:  N̄ = 0, M = -L
    =>  x̂̇ = (F - GK - LH) x̂ + L (y - r)
This is the same as a classical error-feedback term.
Case 3:  M and N̄ are used to place zeros.
Transfer function
    y/r = N̄ γ(s) b(s) / (α_e(s) α_c(s))
where
    γ(s) = det(sI - A + M̄ K),    M̄ = M / N̄
and N̄ is chosen for unity DC gain. Now the estimator poles are not cancelled by zeros; instead, the zeros are placed so as to get good tracking performance.
4.3 Discrete Estimators
4.3.1 Predictive Estimator
    x(k+1) = Φ x(k) + Γ u(k) + Γ1 w
    y = H x + v
Estimator equation:
    x̄(k+1) = Φ x̄(k) + Γ u(k) + Lp (y(k) - ȳ(k))
This is a predictive estimator: x̄(k+1) is obtained from the measurement at time k.
    x̃(k+1) = x(k+1) - x̄(k+1) = (Φ - Lp H) x̃(k) + Γ1 w - Lp v
    α_e(z) = det(zI - Φ + Lp H)
Need O(Φ, H) full rank to build the estimator.
4.3.2 Current estimator
Measurement update:
    x̂(k) = x̄(k) + Lc (y(k) - H x̄(k))
Time update:
    x̄(k+1) = Φ x̂(k) + Γ u(k)                                   (128)
These are the equations of a discrete Kalman filter (in steady state).
[Estimator block diagram: u enters through Γ, state delayed by z^{-1}, fed back through Φ, output error through Lc.]
Substitute x̂ into the time-update equation:
    x̄(k+1) = Φ [x̄(k) + Lc (y - H x̄(k))] + Γ u
            = Φ x̄(k) + Φ Lc (y - H x̄(k)) + Γ u
Comparing with the predictive estimator,
    Lp = Φ Lc
    x̃(k+1) = (Φ - Φ Lc H) x̃(k)
Now the observability matrix of (Φ, HΦ),
    Oc = [HΦ; HΦ^2; ... ; HΦ^n] = Op Φ,
must be full rank for this estimator. Since det Oc = det Op · det Φ,
=> the open-loop system cannot have poles at the origin.
    Lc = Φ^{-1} Lp
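The sketch below simulates the predictive and current estimators side by side on a hypothetical discrete plant, using the relation Lp = Φ·Lc from above; the plant, pole locations, and control law are illustrative assumptions only.

% Sketch: predictive vs. current estimator updates (hypothetical plant).
PHI = [1 0.1; 0 1];  GAM = [0.005; 0.1];  H = [1 0];
Lp  = place(PHI', H', [0.5 0.6])';   % predictive gain: eig(PHI - Lp*H) at 0.5, 0.6
Lc  = PHI \ Lp;                      % current-estimator gain, Lc = PHI^{-1} Lp
x = [1; 0];  xbar_p = [0;0];  xbar_c = [0;0];
for k = 1:50
    u = -0.5*xbar_c(1);                               % some control (assumed)
    y = H*x;
    xbar_p = PHI*xbar_p + GAM*u + Lp*(y - H*xbar_p);  % predictive estimator
    xhat   = xbar_c + Lc*(y - H*xbar_c);              % measurement update
    xbar_c = PHI*xhat + GAM*u;                        % time update
    x = PHI*x + GAM*u;                                % true plant
end
[xbar_p xhat x]                                       % compare final estimates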
4.4 Miscellaneous Topics
4.4.1 Reduced order estimator
If one of the states xi = y, the output, then there is no need to estimate that state.
Results in more complicated equations.
4.4.2 Integral control
[Diagram: r - y is integrated (ẋ_i = r - y); the integral state is fed back through -K_i and the plant states through -K into the plant (F, G, H).]
    ẋ = F x + G u
    y = H x
    ẋ_i = -y + r = -H x + r
=>
    [ẋ_i; ẋ] = [0  -H;  0  F] [x_i; x] + [0; G] u
    u = -[K_i  K] [x_i; x]
The gains can be calculated by the place command (a design sketch follows below):
    [K_i  K] = place([0 -H; 0 F], [0; G], P)
n+1 poles can be placed.
Controllable if (F, G) is controllable and (F, G, H) has no zeros at s = 0.
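A minimal sketch of the augmented-state design above, reusing the two-mass parameters from the earlier .m file as the plant; the chosen pole locations are hypothetical.

% Sketch: integral control by pole placement on the augmented system.
M = 1; m = .1; k = 0.091; b = 0.0036;
F = [0 1 0 0; -k/M -b/M k/M b/M; 0 0 0 1; k/m b/m -k/m -b/m];
G = [0 1/M 0 0]';  H = [0 0 1 0];
Fa = [0 -H; zeros(4,1) F];          % augmented dynamics [0 -H; 0 F]
Ga = [0; G];
P  = [-1 -1.2 -1.4 -1.6 -1.8];      % n+1 = 5 desired poles (assumed)
Ka = place(Fa, Ga, P);              % Ka = [Ki K]
Ki = Ka(1);  K = Ka(2:5);
eig(Fa - Ga*Ka)                     % check closed-loop poles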
4.4.3 Internal model principle
The output y(t) is made to track an arbitrary reference r(t) (e.g. sinusoidal tracking for a spinning disk drive).
Augment the state with the r dynamics.
Ex 1:
    ẋ = F x + G u
    y = H x
    ṙ = 0              → constant step
or
    r̈ + ω0^2 r = 0     → sinusoid
Define
    e = y - r
as a new state which is driven to zero.
4.4.4 Polynomial methods
[Diagram: plant b(s)/a(s) (equivalently F, G, H, J) with compensator K designed directly in polynomial form to realize N̄ b(s)/α_c(s).]
Direct transformation between input/output transfer functions.
Often advantageous for adaptive control.
5 Kalman
(Lecture)
6 Appendix 1 - State-Space Control
Why a Space? The definition of a vector or linear space is learned in the first weeks of a linear algebra class and then conveniently forgotten. For your amusement (i.e. you will not be held responsible for this on any homework, midterm, or final) I will refresh your memory.
A linear vector space is a set V, together with two operations:
+, called addition, such that for any two vectors x, y in V the sum x + y is also a vector in V;
·, called scalar multiplication, such that for a scalar c ∈ R and a vector x ∈ V the product c·x is in V.²
Also, the following axioms must hold:
(A1) x + y = y + x, for all x, y ∈ V (commutativity of addition)
(A2) x + (y + z) = (x + y) + z, for all x, y, z ∈ V (associativity of addition)
(A3) There is an element in V, denoted by 0_v (or 0 if clear from the context), such that x + 0_v = 0_v + x = x for all x ∈ V (existence of additive identity)
(A4) For each x ∈ V there exists an element, denoted by -x, such that x + (-x) = 0_v (existence of additive inverse)
(A5) For each r1, r2 in R and each x ∈ V, r1·(r2·x) = (r1 r2)·x
(A6) For each r in R and each x, y ∈ V, r·(x + y) = r·x + r·y
(A7) For each r1, r2 in R and each x ∈ V, (r1 + r2)·x = r1·x + r2·x
(A8) For each x ∈ V, 1·x = x
Example 1: n-tuples of scalars. (We will not prove any of the examples.)
    R^n = { [x1, x2, ..., xn]^T | xi ∈ R }
Example 2: The set of all continuous functions on a finite interval,
    C[a, b] = { f : [a, b] → R | f is continuous }.
² R is used to denote the field of real numbers. A vector space may also be defined over an arbitrary field.
A subspace of a vector space is a subset which is itself a vector space.
Example 1
    N(A) = { x ∈ R^q | Ax = 0 }, A ∈ R^{p×q}, is a subspace of R^q called the nullspace of A.
Example 2
    R(A) = { x ∈ R^p | there exists u ∈ R^q with Au = x }, A ∈ R^{p×q}, is a subspace of R^p called the range of A.
And finally a very important example for this class.
Example 3
    { x ∈ C^n[R+] | ẋ = Ax for t ≥ 0 }
is a subspace of C^n[R+] (continuous functions R+ → R^n). This is the set of all possible trajectories that a linear system can follow.
7 Appendix - Canonical Forms
Controller form
    Ac = [-a1 -a2 -a3; 1 0 0; 0 1 0],   Bc = [1; 0; 0],   Cc = [b1 b2 b3]
Ac is top companion.
Observer form
    Ao = [-a1 1 0; -a2 0 1; -a3 0 0],   Bo = [b1; b2; b3],   Co = [1 0 0]
Ao is left companion.
Controllability form
    Aco = [0 0 -a3; 1 0 -a2; 0 1 -a1],   Bco = [1; 0; 0],   Cco = [β1 β2 β3]
Aco is right companion.
Observability form
    Aob = [0 1 0; 0 0 1; -a3 -a2 -a1],   Bob = [β1; β2; β3],   Cob = [1 0 0]
Aob is bottom companion.
All realize the same transfer function
    G(s) = (b1 s^2 + b2 s + b3) / (s^3 + a1 s^2 + a2 s + a3),
with the βi the corresponding coefficients in the controllability/observability forms.
Part VII
Advanced Topics in Control
1 Calculus of Variations - Optimal Control
1.1 Basic Optimization
1.1.1 Optimization without constraints
    min_u L(u)    without constraints.
Necessary conditions:
    ∂L/∂u = 0,    ∂²L/∂u² ≥ 0
1.1.2 Optimization with equality constraints
    min_u L(x, u)    subject to    f(x, u) = 0
Define
    H = L(x, u) + λ^T f(x, u)        ("adjoin" the constraint)
Since u determines x, H = L on the constraint. Differential changes in H due to x and u:
    dH = (∂H/∂x) dx + (∂H/∂u) du
We want this to be 0, so start by selecting λ such that
    ∂H/∂x = ∂L/∂x + λ^T ∂f/∂x = 0
    =>  λ^T = -(∂L/∂x)(∂f/∂x)^{-1}
    =>  dL = dH = (∂H/∂u) du                                        (129)
For a minimum we need dL/du = 0, i.e.
    ∂H/∂u = ∂L/∂u + λ^T ∂f/∂u = 0                                   (130)
Summary
    min_u L(x, u) such that f(x, u) = 0
    ∂H/∂x = 0,   ∂H/∂u = 0,   where H = L(x, u) + λ^T f(x, u)
Example
    min_u L(x, u) = ½ (x²/a² + u²/b²)    such that    f(x, u) = x + m u - c = 0
    H = ½ (x²/a² + u²/b²) + λ (x + m u - c)
    ∂H/∂x = x/a² + λ = 0
    ∂H/∂u = u/b² + λ m = 0
    x + m u - c = 0
There are three equations and three unknowns x, u, λ:
    λ = -c / (a² + m² b²),
    x = a² c / (a² + m² b²),    u = m b² c / (a² + m² b²)
1.1.3 Numerical Optimization
Non-linear case:
    min_u H = min_u [ L(x, u) + λ^T f(x, u) ]    where f(x, u) = 0
Setting ∂H/∂x = 0 and ∂H/∂u = 0 leads to an algorithm for computing u:
1. Guess u; solve f(x, u) = 0 for x.
2. λ^T = -(∂L/∂x)(∂f/∂x)^{-1}.
3. ∂H/∂u = ∂L/∂u + λ^T ∂f/∂u; test for ∂H/∂u = 0.
4. u ← u - ε ∂H/∂u.
Therefore one does a gradient search (a small sketch on the earlier example is given below).
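A minimal sketch of the four-step algorithm applied to the example of Section 1.1.2 (quadratic cost with the linear constraint x + m u - c = 0); the constants and step size are hypothetical.

% Sketch: constrained gradient search, min 0.5*(x^2/a^2 + u^2/b^2) s.t. x + m*u = c.
a = 2; b = 1; m = 3; c = 1;  eps = 0.2;     % constants and step size (assumed)
u = 0;                                      % step 1: initial guess
for it = 1:200
    x      = c - m*u;                       % solve f(x,u) = 0 for x
    lambda = -(x/a^2);                      % lambda' = -dL/dx*(df/dx)^(-1), df/dx = 1
    dHdu   = u/b^2 + lambda*m;              % dH/du = dL/du + lambda*df/du
    u      = u - eps*dHdu;                  % gradient step
end
[x u lambda]                                % compare with the closed-form solution above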
Related optimization
Inequality constraints: L(x, u) subject to f(x, u) ≤ 0 requires the Kuhn-Tucker conditions.
If L and f are linear in x, u, then use Linear Programming.
Constrained optimization + calculus of variations → Euler-Lagrange equations.
1.2 Euler-Lagrange and Optimal Control
1.2.1 Optimization over time
Constraint:
    ẋ = f(x, u, t),    x(t0) given
with cost
    min_{u(t)} J = φ(x(tf), tf) + ∫_{t0}^{tf} L(x(t), u(t), t) dt
Adjoin the constraints:
    J = φ(x(tf), tf) + ∫_{t0}^{tf} [ L(x(t), u(t), t) + λ^T(t) { f(x, u, t) - ẋ } ] dt
where the Hamiltonian is defined as
    H = L(x(t), u(t), t) + λ^T(t) f(x, u, t).
Using integration by parts on the λ^T ẋ term,
    J = φ(x(tf), tf) - λ^T(tf) x(tf) + λ^T(t0) x(t0) + ∫_{t0}^{tf} [ H + λ̇^T x ] dt.
The variation in J due to variations in the control vector u(t), for fixed times t0 and tf, is
    δJ = [∂φ/∂x - λ^T] δx |_{t=tf} + λ^T δx |_{t=t0} + ∫_{t0}^{tf} [ (∂H/∂x + λ̇^T) δx + (∂H/∂u) δu ] dt
Choose λ(t) such that the δx coefficients vanish:
    λ̇^T = -∂H/∂x = -∂L/∂x - λ^T ∂f/∂x                               (131)
    λ^T(tf) = ∂φ/∂x |_{t=tf}                                          (132)
Then
    δJ = λ^T(t0) δx(t0) + ∫_{t0}^{tf} (∂H/∂u) δu dt
But we still want to minimize J for variations δu at the optimal u yielding the optimal J (i.e., we need ∂J/∂u = 0). Then
    ∂H/∂u = 0 = ∂L/∂u + λ^T ∂f/∂u                                     (133)
λ is called an `influence function' on J.
Equations 131, 132 and 133 define the Euler-Lagrange equations for a two-point boundary problem. Solution of these equations leads to Pontryagin's minimum principle (w/o proof):
    H(x*, u*, λ*, t) ≤ H(x*, u, λ*, t)
Summary: given the cost
    min_{u(t)} J = φ(x(tf), tf) + ∫_{t0}^{tf} L(x(t), u(t), t) dt
and constraint ẋ = f(x, u, t), with x(t0) given,
    λ̇^T = -∂L/∂x - λ^T ∂f/∂x = -∂H/∂x,
where u is determined by
    ∂H/∂u = 0, i.e. λ^T ∂f/∂u + ∂L/∂u = 0,
and
    λ^T(tf) = ∂φ/∂x |_{tf}
is the boundary condition. In general this leads to some optimal solution for u(t), which is determined by the initial value x(t0) and the final constraint φ(x(tf), tf), resulting in open-loop control since the control is fixed without feedback.
1.2.2 Miscellaneous items
The family of optimal paths for all initial conditions x(t0), 0 < t0 < tf, is called a `field of extremals'.
If we solve for all states such that u* = function(x, t), then this is the `optimal feedback control law'.
J* = J*(x, t) gives the optimal return function.
1.3 Linear system / ARE
Consider the time-varying linear system
    ẋ = F(t) x + G(t) u,    x(tf) → 0
with cost function
    J = ½ (x^T Sf x)|_{t=tf} + ½ ∫_{t0}^{tf} ( x^T Q x + u^T R u ) dt
with Q and R positive definite. Then, using the Euler-Lagrange equations, from Equation 133
    ∂H/∂u = 0 = R u + G^T λ
so that the control law is
    u = -R^{-1} G^T λ,
and from Equation 131
    λ̇ = -Q x - F^T λ
with boundary condition
    λ(tf) = Sf x(tf)
from Equation 132. Assembling into a linear differential equation:
    [ẋ; λ̇] = [F  -G R^{-1} G^T;  -Q  -F^T] [x; λ],    x(t0) given, λ(tf) = Sf x(tf)
Try a solution that converts the two-point boundary problem to a one-point problem. As a possible solution try λ(t) = S(t) x(t):
    Ṡ x + S ẋ = -Q x - F^T S x
so that
    (Ṡ + S F + F^T S - S G R^{-1} G^T S + Q) x = 0
resulting in the matrix Riccati equation
    Ṡ = -S F - F^T S + S G R^{-1} G^T S - Q,    with S(tf) = Sf.
Note that
    u = -R^{-1} G^T S(t) x(t) = -K(t) x(t)
where K(t) is a time-varying feedback gain.
Ṡ = 0 gives the steady-state solution of the Algebraic Riccati Equation (A.R.E.) where tf → ∞. Then K(t) = K, i.e. constant state feedback.
The A.R.E. may be solved
1. by integrating the differential Riccati equation to a fixed point
2. symbolically
3. by eigenvalue decomposition
4. with the Schur method
5. with a generalized eigenvalue formulation
6. in MATLAB (a sketch is given below)
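For item 6, a minimal MATLAB sketch (the system, Q, and R are hypothetical) that obtains the steady-state solution with lqr and checks the A.R.E. residual.

% Sketch: steady-state LQR via the ARE (hypothetical system).
F = [0 1; 0 -0.5];  G = [0; 1];
Q = eye(2);  R = 1;
[K, S] = lqr(F, G, Q, R);                   % K = R^{-1} G' S, S solves the ARE
residual = S*F + F'*S - S*G*(R\(G'*S)) + Q  % should be (numerically) zero
eig(F - G*K)                                % stable closed-loop poles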
1.3.1 Another Example
    ẋ = a x + u,    J = ∫_0^∞ ( u² + 2x² ) dt
with the open-loop pole at s = a. The steady-state A.R.E. gives
    s F + F^T s - s G R^{-1} G^T s + Q = 0
    a s + a s - s·1·1·1·s + 2 = 0
Therefore
    s² - 2 a s - 2 = 0
    s = (2a ± sqrt(4a² + 8)) / 2 = a ± sqrt(a² + 2)
and
    K = R^{-1} G^T s = s = a + sqrt(a² + 2)
with the closed-loop pole at s = a - K = -sqrt(a² + 2). See Figure 17.
1.4 Symmetric root locus
Take the Laplace transform of the Euler-Lagrange equations for the linear time-invariant case of Example 1.3:
    s x(s) = F x(s) + G u(s)
    s λ(s) = -Q x(s) - F^T λ(s)
    u(s) = -R^{-1} G^T λ(s)
with Q = H^T H ρ² and R = 1.
[Figure 17: Closed-loop pole locations; ρ = 0 leaves the pole near the open-loop pole, ρ large pushes it further into the left half plane; u = -kx.]
Then
    u(s) = ρ² G^T (sI + F^T)^{-1} H^T H (sI - F)^{-1} G u(s)
where
    -G^T(-s) = G^T (sI + F^T)^{-1} H^T    and    G(s) = H (sI - F)^{-1} G,
the open-loop transfer function. Then
    [ 1 + ρ² G^T(-s) G(s) ] u(s) = 0
giving the symmetric root locus (SRL).
It has 2N roots for the coupled system of x, λ of order N.
As tf → ∞, u(t) = -K∞ x(t) and it becomes an Nth-order system with N stable poles.
1.4.1 Discrete case
Given the discrete plant
    x(k+1) = Φ x(k) + Γ u(k),    u = -K x,
the task is to minimize the cost function
    J = ½ Σ_{k=0}^{N} [ x^T(k) Q x(k) + u^T(k) R u(k) ].
Using a Lagrange multiplier,
    J = ½ Σ_{k=0}^{N} [ x^T(k) Q x(k) + u^T(k) R u(k) + λ^T(k+1) ( -x(k+1) + Φ x(k) + Γ u(k) ) ]
leads to the Euler-Lagrange equations
    ∂H/∂u(k)   = u^T(k) R + λ^T(k+1) Γ = 0                              (134)
    ∂H/∂λ(k+1) = -x(k+1) + Φ x(k) + Γ u(k) = 0                          (135)
    ∂H/∂x(k)   = λ^T(k+1) Φ + x^T(k) Q - λ^T(k) = 0                     (136)
From the cost function clearly u(N) = 0, and from Equation 134 λ(N+1) = 0, so that from Equation 136
    λ(N) = Q x(N).
Again choosing
    λ(k) = S(k) x(k)
it follows that
    u(k) = -R^{-1} Γ^T λ(k+1)
with the Discrete Riccati Equation (non-linear in S)
    S(k) = Φ^T [ S(k+1) - S(k+1) Γ N^{-1} Γ^T S(k+1) ] Φ + Q            (137)
with the end condition
    S(N) = Q
and where
    N = R + Γ^T S(k+1) Γ.
The feedback gain then is
    K(k) = [ R + Γ^T S(k+1) Γ ]^{-1} Γ^T S(k+1) Φ.
Note that the minimum cost is
    J* = ½ x^T(0) S(0) x(0).
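A minimal sketch of the backward recursion above (Φ, Γ, Q, R and the horizon are hypothetical), sweeping S(k) back from S(N) = Q and forming the time-varying gain K(k).

% Sketch: backward sweep of the discrete Riccati equation.
PHI = [1 0.1; 0 1];  GAM = [0.005; 0.1];
Q = eye(2);  R = 1;  N = 50;
S = Q;                                            % end condition S(N) = Q
for k = N-1:-1:0
    Nk = R + GAM'*S*GAM;
    K  = Nk \ (GAM'*S*PHI);                       % K(k) = [R + G'SG]^{-1} G' S PHI
    S  = PHI'*(S - S*GAM*(Nk\(GAM'*S)))*PHI + Q;  % S(k)
end
K                                                 % approaches the steady-state LQR gain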
1.4.2 Predictive control
Here the cost is taken as
    J(x, u_{t→t+N}) = Σ_{k=t}^{t+N-1} L(k, x, u)
giving u*(t), ..., u*(t+N). Only u*(t) = K(t)x(t) is used, and the receding-horizon (RHC) problem is re-solved at time t+1 to get u(t+1). u(t) provides a stable control if N ≥ the order of the plant, and the plant is reachable and observable.
Guaranteed stabilizing if the prediction horizon length exceeds the order of the plant.
(SIORHC) = stabilizing input-output receding horizon control.
This is popular for slow linear and non-linear systems such as chemical batch processes.
See `Optimal predictive and adaptive control' - Mosca.
1.4.3 Other applied problems
Terminal constraint on x
Here x(tf) = xf, or more generally ψ(x(tf)) = 0. Adjoin ν^T ψ(x(tf)) to J. Then
    λ^T(tf) = [ ∂φ/∂x + ν^T ∂ψ/∂x ]_{t=tf}.
Examples include 1) robotics with a fixed endpoint at a fixed time, 2) transfer of orbit radius in a given time.
[Figure: orbit transfer under thrust from r(0), θ(0) to r(F), θ(F).]
Unspecified time
Want to move an arm from A to B, but do not know how long it will take. A fourth condition is required, together with the prior Euler-Lagrange equations, to minimize over the final time as well:
    [ ∂φ/∂t + λ^T f + L ]_{t=tf} = 0.
Some minimum-time problems.
* Missile trajectory.
    ẋ = f(x, u, t) = f(t) x + g(t) u
    J(u) = ∫_{t0}^{tf} 1 dt = tf - t0    =>  L = 1
such that |u(t)| ≤ 1 is a finite control. Solving,
    H = L + λ^T [ f(t) x + g(t) u ],    L = 1.
Clearly, to minimize H the resulting optimal u is `bang-bang' control:
    u = +1  if λ^T g < 0,    u = -1  if λ^T g > 0.
To solve for λ:
    λ̇^T = -∂H/∂x = -∂L/∂x - λ^T ∂f/∂x
so that λ is obtained as the solution to
    λ̇^T = -λ^T f(t)
and
    λ^T [ f x + g u ] = -1.
* Double integrator. Consider the system
    ẋ1 = x2
    ẋ2 = a x2 + u
[Figure 18: Double integrator - switching curve in the (x1, x2) plane, with u = +1 and u = -1 regions and a path from x(0) to the origin.]
* Other minimum-time problems.
  robotics (2 and 3 link arms)
  min-time orbits
  brachistochrone problem (solutions are cycloids)
  minimum control (e.g. fuel) problems s.t. -1 ≤ ui ≤ 1  =>  bang-off-bang control.
[Figure 19: Brachistochrone problem - a bead slides along a string from A to B in minimum time; what is the shape of the string?]
2 Dynamic Programming (DP) - Optimal Control
2.1 Bellman
Dynamic Programming represents an alternative approach for optimization.
b
J ab
J bc
c
a
Suppose that we need to determine an optimal cost to go from point a to point c.
If the optimal path from a to c includes the path from a to b, then the path from b to c is
the optimal path from b to c.
i.e., Jac = Jab + Jbc
Bellman: An optimal policy has the property that whatever the initial state and the initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
2.2 Examples:
2.2.1 Routing
c
*
J cf
J bc
*
J bd
d
b
f
*
J be
J df
J ef
e
Start out with paths bc or bd or be.
The optimal path from b to f is hence the minimum of Jbc + Jcf , Jbd + Jdf , Jbe + Jef
Jcf , Jdf and Jef are determined by back-tracking.
The viterbi algorithm for processing of speech is based on dynamic programming.
2.2.2 City Walk
[Figure: city-map grid with labelled block lengths (2, 3, 5, 8, 9, ...); nodes a through h; start at c, final point at h; compass directions N/E/S/W shown.]
Want to 'walk' from c to h. Which is the optimal path?
We have two choices: proceed along the path from c to d or from c to f.
    C_cdh = J_cd + J*_dh = 15
    C_cfh = J_cf + J*_fh = 8
    J*_ch = min{ C_cdh, C_cfh } = min{15, 8} = 8
As with the routing example, we need to know J*_dh and J*_fh initially, which implies a back-tracking problem.
2.3 General Formulation
    J*_{kN}(X(k)) = min_{U(k)} [ J_{k,k+1}(X(k), U(k)) + J*_{k+1,N}(X(k+1)) ]
=> We are minimizing only with respect to U(k), instead of over the U's for the entire path. Let k → N - k. So,
    J*_{N-k,N}(X(N-k)) = min_{U(N-k)} [ J_{N-k,N-k+1}(X(N-k), U(N-k)) + J*_{N-(k-1),N}(X(N-k+1)) ]
    =>  J*_{N-(k-1),N} → J*_{N-k,N}
This implies a backward recurrence. Start at the last time step, define J*_{NN} = J_{NN}, then solve for J*_{N-1,N} and work backwards.
Curse of Dimensionality: suppose X ∈ R³ and each Xi can take on 100 values. Then we need to have already found and stored 10² × 10² × 10² = 10⁶ values of J*_{N-(k-1),N} in order to solve for J*_{N-k,N}.
Dynamic programming is not practical unless we can intelligently restrict the search space.
2.4 LQ Cost
Given
    X(n+1) = A X(n) + B U(n),    Y(n) = H X(n)
we have the cost
    J_{0,N} = ½ Σ_{k=0}^{N-1} [ X^T(k) Q X(k) + U^T(k) R U(k) ] + ½ X^T(N) Q X(N).
We need to minimize J_{0,N}. We saw before that the solution involves a Riccati equation. By definition,
    J*_{0,N}(X0) = min_{U(0), U(1), ..., U(N)} J_{0,N}(X0).
Splitting off the first stage,
    J*_{0,N} = min_{U} { X0^T Q X0 + U^T R U  +  J*_{1,N}(A X0 + B U) }
               (initial cost to go from X to AX+BU)   (min. cost starting at AX+BU)
Next, express the cost at the final state as
    J*_{N,N} = ½ X^T(N) Q X(N).
Let S(N) = Q. Then
    J_{N-1,N} = ½ X^T(N-1) Q X(N-1) + ½ U^T(N-1) R U(N-1) + ½ X^T(N) S(N) X(N)      (138)
where X(N) = A X(N-1) + B U(N-1), and J*_{N-1,N} = min_{U(N-1)} J_{N-1,N}. Taking
    ∂J_{N-1,N}/∂U(N-1) = 0 = R U(N-1) + B^T S(N) [ A X(N-1) + B U(N-1) ]
(Exercise: show that ∂²J/∂U² is P.D.) gives
    U(N-1) = -[ R + B^T S(N) B ]^{-1} B^T S(N) A · X(N-1) = -K(N-1) X(N-1).
Substitute U(N-1) back into Equation 138:
    J*_{N-1,N} = ½ X^T(N-1) S(N-1) X(N-1)
where S(N-1) is the same Riccati recursion we derived using the Euler-Lagrange approach. Continue by induction.
Comments
J* is always quadratic in X. This makes it easy to solve.
As N → ∞, this becomes the LQR: S(k) → S, and J*_k = X_k^T S X_k.
2.5 Hamilton-Jacobi-Bellman (HJB) Equations
This is the generalization of DP to continuous systems.
    J = φ(X(tf), tf) + ∫_t^{tf} L(X(τ), U(τ), τ) dτ
such that ẋ = F(X, U, t). We need to minimize J using DP. By the principle of optimality,
    J*(X(t), t) = min_{U(τ), t ≤ τ ≤ t+Δt} [ ∫_t^{t+Δt} L dτ + J*(X(t+Δt), t+Δt) ].
Expand J*(X(t+Δt), t+Δt) in a Taylor series about (X(t), t):
    J*(X(t), t) = min_U [ L(X, U, t)Δt + J*(X(t), t) + (∂J*/∂t)(X(t), t) Δt + (∂J*/∂X)(X(t), t) ( X(t+Δt) - X(t) ) ]
where X(t+Δt) - X(t) = F(X, U, t)Δt + higher-order terms, and the terms J*(X(t), t) and (∂J*/∂t)Δt do not depend on U. Divide by Δt and take the limit as Δt → 0:
    0 = ∂J*/∂t (X(t), t) + min_U [ L(X, U, t) + (∂J*/∂X)(X(t), t) F(X, U, t) ]          (139)
Boundary condition: J*(X(tf), tf) = φ(X(tf), tf).
Define the Hamiltonian as
    H(X, U, ∂J*/∂X, t) = L(X, U, t) + (∂J*/∂X) F(X, U, t)
(recall from the Euler-Lagrange equations that λ^T(t) = ∂J*/∂X (X(t), t), the influence function). The optimal Hamiltonian is given by
    H(X, U*, ∂J*/∂X, t) = min_{U(t)} H(X(t), U(t), ∂J*/∂X, t).
Thus,
    0 = ∂J*/∂t + H(X, U*, ∂J*/∂X, t)                                                     (140)
Equations 139 and 140 are the HJB equations.
Exercise: derive the E-L equations from the H-J-B equation.
2.5.1 Example: LQR
    Ẋ = A(t) X + B(t) U
    J = ½ X^T(tf) Sf X(tf) + ½ ∫_t^{tf} ( X^T Q X + U^T R U ) dt
    H = ½ X^T Q X + ½ U^T R U + (∂J/∂X)[ A X + B U ]
So, to minimize H,
    ∂H/∂U = R U + B^T (∂J/∂X)^T = 0
(note that ∂²H/∂U² = R is P.D. → global minimum)
    =>  U = -R^{-1} B^T (∂J/∂X)^T
Thus,
    H* = ½ X^T Q X + ½ (∂J/∂X) B R^{-1} B^T (∂J/∂X)^T + (∂J/∂X) A X - (∂J/∂X) B R^{-1} B^T (∂J/∂X)^T
       = ½ X^T Q X - ½ (∂J/∂X) B R^{-1} B^T (∂J/∂X)^T + (∂J/∂X) A X
Thus, the H-J-B equation is
    0 = ∂J/∂t + ½ X^T Q X - ½ (∂J/∂X) B R^{-1} B^T (∂J/∂X)^T + (∂J/∂X) A X
with boundary condition J(X(tf), tf) = ½ X^T(tf) Sf X(tf).
Let's guess J* = ½ X^T S(t) X (quadratic). Then we get
    0 = ½ X^T Ṡ X + ½ X^T Q X - ½ X^T S B R^{-1} B^T S X + X^T S A X
Let SA = ½[SA + (SA)^T] + ½[SA - (SA)^T] (symmetric plus antisymmetric); also X^T(½[SA - (SA)^T])X = 0. Thus,
    0 = ½ X^T Ṡ X + ½ X^T Q X - ½ X^T S B R^{-1} B^T S X + ½ X^T S A X + ½ X^T A^T S X
Now, this must hold for all X,
    =>  0 = Ṡ + Q - S B R^{-1} B^T S + S A + A^T S
which is the Riccati equation we saw before, with boundary condition
    S(tf) = Sf.
2.5.2 Comments
Equation 140 is another form of Pontryagin's minimum principle.
It involves solving partial differential equations which may not lead to a closed-form solution.
For global optimality we need ∂²H/∂U² ≥ 0 (the Legendre-Clebsch condition).
J* satisfying the H-J-B equations is a necessary condition; it can also be proved to be sufficient.
Solution often requires numerical techniques or the method of characteristics.
The H-J-B equations relate dynamic programming to variational methods.
3 Introduction to Stability Theory
3.1 Comments on Non-Linear Control
3.1.1 Design Methods
Linearization of the plant followed by techniques like adaptive control.
Lots of "rules of thumb", which are not well-defined.
Not as easy as Bode, root-locus (i.e., classical linear control).
Methods include: DP, E-L, neural and fuzzy.
3.1.2 Analysis Methods
Many analysis methods exist for non-linear systems.
There are equivalent concepts of controllability and observability for non-linear systems, but they are very hard to prove.
Some common ones in control: describing functions, circle theorems, Lyapunov methods, small-gain theorem, passivity (the latter two are very advanced methods).
Most of the analysis methods are concerned with stability of a given system ẋ = F(X, U).
3.2 Describing Functions
This is an old method which is not always rigorous.
Consider systems of the following configuration:
[Figure: static nonlinearity NL in a feedback loop with a linear system G(z).]
where NL stands for simple non-linearities like any of the following:
[Figure: examples of simple static nonlinearities.]
Consider the following.
[Diagram: input → NL → output.]
Let the input be A sin ωt. Then the output is A0 + Σ_{n=1}^{∞} Yn sin(nωt + φn) (a sum of harmonics).
Describing function:
    D(A) = (Y1/A) ∠ φ1 = fundamental of output / input
Example: Consider X̄ to be a saturation function of the following form:
    X̄ = +m    for X > m/k
    X̄ = kX    for |X| ≤ m/k
    X̄ = -m    for X < -m/k
[Figure: saturation with slope k and output limits ±m.]
So,
    D(A) = k                                                                  for Ak/m ≤ 1
    D(A) = (2k/π) arcsin( m/(Ak) ) + (2m/(πA)) sqrt( 1 - (m/(Ak))² )           for Ak/m > 1
[Plot: D(A) versus Ak/m; equal to k below saturation and rolling off above.]
Note:
{ For certain regions below saturation, there is no distortion.
{ As the gain is increased, Y1 decreases.
Describing functions give us information regarding stability and limit cycles.
Steps:
1. Plot the Nyquist plot of G(z).
2. Plot -1/D(A) on the same graph.
[Figure: Nyquist plot of G(ω) together with the -1/D(A) locus; an intersection gives the amplitude and frequency of a possible limit cycle.]
If the two plots intersect, then there may exist instabilities (limit cycles).
3.3 Equivalent gains and the Circle Theorem
This is a rigorous way of formalizing the describing function D(A) (developed in the '60s).
Consider the following nonlinearity:
[Figure: a nonlinearity φ(x) bounded between two lines of slope k1 and k2.]
The nonlinearity is bounded by two lines of slope k1 and k2 such that
    k1 ≤ φ(X)/X ≤ k2.
Hence,
    φ(X) ∈ sector[k1, k2].
Example: quantizer.
Aizerman's Conjecture, 1949
Consider the following system:
[Figure: linear plant (F, G, H) with constant feedback gain k.]
If this system is stable for k1 ≤ k ≤ k2, then the system with the gain replaced by a memoryless, time-varying nonlinearity φ(·, t) ∈ sector[k1, k2] is also stable. This is actually not true.
Circle Theorem: (Sandberg, Zames 1965)
The system is stable if the Nyquist plot of G(z) does not intersect the disk through -1/k1 and -1/k2, for k1 > 0 and k2 < ∞, as in the following figure.
[Figure: Nyquist plot avoiding the disk with diameter from -1/k2 to -1/k1.]
Now, if k1 = 0 (so -1/k1 → -∞), the disk becomes the half-plane to the left of -1/k2:
[Figure: the Nyquist plot must not cross the vertical boundary through -1/k2.]
A related theorem is Popov's theorem, for time-independent φ(·).
3.4 Lyapunov's Direct Method
If ẋ = F(X, t) where F(0, t) = 0, is X = 0 a stable equilibrium? We proceed to define stability in a Lyapunov sense.
Find V(X) such that:
  V(X) has continuous derivatives;
  V(X) ≥ 0 (P.S.D. near 0)
  (V(X) ≥ α(||X||) for global asymptotic stability).
V(X) is the Lyapunov function. Then, if
    V̇(t, X) ≤ 0    for all t ≥ t0 and all X ∈ B_r
where B_r is a ball of radius r, then ẋ = F(X, t) is stable at 0.
The system is stable if
    |X0| < δ  =>  |X| < ε.
The system is asymptotically stable if
    |X0| < δ  =>  X → 0 as t → ∞.
The system is globally stable if the above "balls" extend to ∞.
[Figure: a stable trajectory stays inside the ε-ball; an asymptotically stable trajectory converges to 0.]
Think of V as an energy function, so V̇ represents the loss of energy. Thus,
    V̇(t, X) < 0 → asymptotically stable
    V̇(t, X) ≤ 0 → stable
Need La Salle's theorems to prove asymptotic stability for the ≤ 0 case.
A common Lyapunov function is the sum of the kinetic and the potential energies, as in the case of a simple pendulum.
3.4.1 Example
Let
    ẋ = -sin X.
Here X = 0 is an equilibrium point. So, let
    V(X) = X²
    =>  V̇ = 2X Ẋ = -2X sin X.
Note that the above V(X) is not a valid Lyapunov function for unlimited X, so we need to find a "ball" on which it is. But sin X ≥ ½X for 0 < X < C (C is a constant), so
    V̇ ≤ -X²    for |X| ≤ C.
So V̇ ≤ 0 within the "ball"
    =>  0 is an asymptotically stable equilibrium point.
[Figure: sin x and 0.5x on (-π, π); trajectories x(t) starting near 0 decay to 0 (asymptotically stable), trajectories starting far away may be unstable.]
3.4.2 Lyapunov's Direct Method for a Linear System
Let
    Ẋ = A X.
Therefore, postulate
    V(X) = X^T P X,    where P = P^T ≥ 0.
Consider
    J(X0) = ∫_0^∞ X^T Q X dt = X0^T [ ∫_0^∞ e^{A^T t} Q e^{A t} dt ] X0 = X0^T P X0 = V(X0)
    =>  V̇ = -X^T Q X < 0
where Q = -(A^T P + P A)  =>  the Lyapunov Equation.
If A is asymptotically stable, then for any Q = Q^T ≥ 0 there is a unique P = P^T ≥ 0 such that
    A^T P + P A + Q = 0.
Exercise: plug in u = -KX and show that we get the algebraic Riccati equation.
This approach is much harder for time-varying systems.
3.4.3 Comments
Lyapunov's theorem(s) provide many variations as far as the following are concerned:
  { Stability
  { Global stability
  { Asymptotic global stability, etc.
What about adaptive systems? An adaptive system can be written as a non-linear system of higher order, e.g.
    W(k+1) = W(k) + F(U, Y, C),    with Y = WU or Y = F(X, U, W)
=> use a Lyapunov function to determine stability. For example, a gradient update of the form
    Ẇ = -μ (d - F(X, U, W)) X
is a non-linear difference (or differential) equation for the weights; the Lyapunov function gives conditions of convergence for various learning rates μ.
MATLAB lyap solves the Lyapunov equation and dlyap gives the solution for the discrete case.
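A small sketch of that last point: solve A^T P + P A + Q = 0 with lyap (note that MATLAB's lyap(A,Q) solves A X + X A' + Q = 0, so we pass A') and check that P is positive definite for a stable A; the matrices are hypothetical.

% Sketch: Lyapunov equation check for a stable linear system.
A = [0 1; -2 -3];            % stable (eig = -1, -2)
Q = eye(2);
P = lyap(A', Q);             % solves A'*P + P*A + Q = 0
residual = A'*P + P*A + Q    % ~ zero
eig(P)                       % positive => V(x) = x'Px is a valid Lyapunov function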
3.5 Lyapunov's Indirect Method
Ẋ = F(X) is asymptotically stable near an equilibrium point X_e if the linearized system Ẋ_L = DF(X_e) X_L = A X_L is asymptotically stable (plus other conditions).
3.5.1 Example: Pendulum
[Figure: inverted pendulum with angle θ.]
    θ̈ = u + sin θ
A suitable lead controller is
    u(s) = -(4s + 4)/(s + 3) θ(s).
In state form, with X1 = θ, X2 = θ̇, and X3 the state of the controller,
    Ẋ3 = -3 X3 - 2 θ
    u  = -4 θ - 4 X3
So,
    [Ẋ1; Ẋ2; Ẋ3] = [ X2;  sin X1 - 4 X1 - 4 X3;  -3 X3 - 2 X1 ]
Linearize this at X = 0:
    A = [0 1 0; -3 0 -4; -2 0 -3]
So eig A = -1, -1, -1  =>  stable.
Hence, we have designed a lead-lag compensator which gives asymptotic stability for some region around zero. The non-linear system is stable as long as we start near enough to 0.
3.6 Other concepts worth mentioning
Small gain theorem.
[Figure: feedback interconnection of H1 and H2 with inputs U1, U2, errors e1, e2 and outputs Y1, Y2.]
A system with inputs U1 and U2 and outputs Y1 and Y2 is BIBO stable, provided H1 and H2 are BIBO stable and the product of the gains of H1 and H2 is less than unity. This is often useful for proving stability in adaptive control.
Passivity.
These are more advanced and abstract topics used for proving other stability theorems. They are also useful in adaptive systems.
4 Stable Adaptive Control
4.1 Direct Versus Indirect Control
Direct control
{ Adjust control parameters directly.
{ Model reference adaptive control is a direct control method.
Indirect control
{ Plant parameters are estimated on-line and control parameters are adjusted.
183
{ Gain scheduling is an open loop indirect control method where predetermined control is
switched based on current plant location.
[Figure: gain scheduling.]
4.2 Self-Tuning Regulators
[Figures: self-tuning regulator; a control structure.]
System identification is carried out by RLS, LMS, etc.
Issues: the inputs should persistently excite the plant so that system identification is properly done.
Control Design
{ Time varying parameters
{ PID
{ LQR
{ Kalman (If the system is linearized at each time point then it is called Extended Kalman)
{ Pole-placement
{ Predictive control
{ The design is often done directly in polynomial form rather than state space. This is
because system ID (e.g. using RLS) provides transfer functions directly.
Issues:
{ Stability
{ Robustness
{ Nonlinear issues
4.3 Model Reference Adaptive Control (MRAC)
[Figure: Model Reference Adaptive Control - an example. Reference model M = Km/(s + am) produces ym; plant P = Kp/(s + ap), with feedforward gain c0 and feedback gain d0, produces yp; e = yp - ym.]
    ẏp(t) = -ap yp(t) + Kp u(t)
    ẏm(t) = -am ym(t) + Km r(t)                                   (141)
Let
    u(t) = c0(t) r(t) + d0(t) yp(t)
and substitute this u(t). Note that if
    c0* = Km/Kp,    d0* = (ap - am)/Kp
then
    ẏp(t) = -am yp(t) + Km r(t).
Consider gradient descent:
    ∂θ/∂t = -γ ∂e0²(θ)/∂θ = -2γ e0 ∂e0(θ)/∂θ = -2γ e0 ∂yp/∂θ
[Figure: plant Kp/(s + ap) with adjustable gains c0 and d0.]
    yp = Kp/(s + ap) (d0 yp + c0 r)
where
    ∂yp/∂c0 = P̄ r,    ∂yp/∂d0 = P̄ yp,
and P̄ denotes the resulting (closed-loop) plant sensitivity filter.
If we don't know the plant:
  { Just use the proper sign of Kp and ignore P̄, i.e. set P̄ = 1, or
  { Use current estimates of Kp and ap (MIT Rule).
With P̄ = 1,
    ċ0 = -γ e r
    ḋ0 = -γ e yp
Is the system stable with this adaptation? Define the parameter errors
    φr = c0 - c0*,    φy = d0 - d0*
and the output error e0 = yp - ym. After some algebraic manipulation,
    ė0 = -am e0 + Kp (φr r + φy e0 + φy ym)
    φ̇r = -γ e0 r
    φ̇y = -γ e0 yp = -γ e0² - γ e0 ym
Define the Lyapunov function
    V(e0, φr, φy) = e0²/2 + (Kp/2γ)(φr² + φy²)
    V̇ = -am e0² + Kp φr e0 r + Kp φy e0² + Kp φy e0 ym - Kp φr e0 r - Kp φy e0² - Kp φy e0 ym
       = -am e0² ≤ 0
=> This is a stable system, since V is monotonically decreasing and bounded below;
lim V(t) as t → ∞ exists. Thus e0 → 0 as t → ∞.
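A minimal simulation sketch of the first-order MRAC above with P̄ = 1, using forward-Euler integration; the plant and model constants, learning rate, and reference signal are hypothetical.

% Sketch: first-order MRAC with gradient adaptation (P_bar = 1).
ap = 1; Kp = 2;          % plant (assumed)
am = 3; Km = 3;          % reference model (assumed)
gam = 0.5; dt = 0.01; T = 0:dt:40;
yp = 0; ym = 0; c0 = 0; d0 = 0;
for i = 1:length(T)
    r  = sign(sin(0.2*T(i)));            % square-wave reference (assumed)
    u  = c0*r + d0*yp;
    e  = yp - ym;
    c0 = c0 - gam*e*r*dt;                % gradient rule, plant filter ignored
    d0 = d0 - gam*e*yp*dt;
    yp = yp + dt*(-ap*yp + Kp*u);        % forward-Euler plant update
    ym = ym + dt*(-am*ym + Km*r);        % forward-Euler model update
end
[c0 d0; Km/Kp (ap-am)/Kp]                % compare with the ideal gains c0*, d0*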
Indirect methods can also be used:
  { Estimate K̂p, âp and design c0 and d0 from the estimates.
4.3.1 A General MRAC
Plant
    P̂(s) = Kp n̂p(s)/d̂p(s) = yp(s)/u(s)
Model
    M(s) = Km nm(s)/dm(s) = ŷm(s)/r(s)
[Figures: model reference adaptive control; controller structure; a control structure.]
Existence of solution
Adaptation rules
  { Many variations
"Heuristic" gradient descent
  { Try to prove stability and convergence
Or: start with a Lyapunov function, then find the adaptation rule.
Issues
  { Relative order of the systems
  { Partial knowledge of some parameters.
5 Robust Control
Suppose the plant P belongs to some family 𝒫; robust control provides stability for every plant in 𝒫 (worst-case design).
Nominal performance specifications:
    y(s) = [gk/(1 + gk)] r(s) + [1/(1 + gk)] d(s) - [gk/(1 + gk)] n(s)                  (142)
    L = gk = open-loop gain
    J = 1 + gk = return difference
    S = 1/(1 + gk) = sensitivity transfer function
    T = gk/(1 + gk) = complementary transfer function
    S + T = 1
Command tracking
  { Want gk large
  { S has to be small (low frequency)
Disturbance rejection
  { S has to be small (low frequency)
Noise suppression
  { T has to be small (high frequency)
[Figure: desired open-loop response - high gain at low frequency, low gain at high frequency.]
5.1 MIMO systems
    A = GK (I + GK)^{-1}
where A is the transfer-function matrix, and
    σi(A) = sqrt( λi(A*A) )  → singular values
    S = (I + GK)^{-1},    T = GK (I + GK)^{-1}
Singular value plots
  { Plots of σi(A) vs. frequency, which are generalizations of Bode plots.
5.2 Robust Stability
Suppose we design for G(s). The design is robust if the closed-loop system remains stable for the true plant G̃(s).
5.2.1 Small gain theorem (linear version)
    |G(s)K(s)| < 1  =>  stability
    |G(s)K(s)| ≤ |G(s)||K(s)|
    =>  stability is guaranteed if |G(s)||K(s)| < 1
[Figure: Nyquist plot of G(s)K(s); encirclement of -1 gives instability according to Nyquist.]
5.2.2 Modeling Uncertainty
We use G(s) for design whereas the actual plant will be G̃(s):
    G̃(s) = G(s) + δa(s)          → additive uncertainty
    G̃(s) = (1 + δm(s)) G(s)      → multiplicative uncertainty
5.2.3 Multiplicative Error
The multiplicative error δm is reflected through the loop by T(s) = G(s)K(s)/(1 + G(s)K(s)); the closed-loop system is stable if
    |δm| < 1 / |GK(1 + GK)^{-1}| = 1/|T|.
Bound the uncertainty: suppose |δm| < ℓ. This is stable if
    |T| < 1/ℓ.
Smallest destabilizing uncertainty (Multiplicative Stability Margin):
    MSM = 1/Mr,    Mr = sup_ω |T(jω)| = ||T||∞  (infinity norm)
This represents the distance to the -1 point on the Nyquist plot. For MIMO,
    Mr = sup_ω σ̄(T(jω)) = ||T||∞.
For additive uncertainty, the Additive Stability Margin is given by
    |δa| < 1/|KS|,    ASM = 1/||KS||∞.
Recall S + T = 1. If we want MSM large then T has to be small. This is good for noise suppression but conflicts with tracking and disturbance rejection.
Trade off: making ||T||∞ small and ||KS||∞ small cannot both be pushed arbitrarily, since T + S = 1.
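A small sketch computing S, T, and a grid estimate of ||T||∞ for a hypothetical loop gain gk, illustrating the S + T = 1 relation numerically.

% Sketch: sensitivity / complementary sensitivity trade-off for a sample loop.
w  = logspace(-2, 2, 400);               % frequency grid
s  = 1j*w;
gk = 10 ./ (s .* (s + 2));               % hypothetical open-loop gain g(s)k(s)
S  = 1 ./ (1 + gk);                      % sensitivity
T  = gk ./ (1 + gk);                     % complementary sensitivity
max(abs(T))                              % grid estimate of ||T||_inf (MSM = 1/this)
max(abs(S + T - 1))                      % S + T = 1 holds at every frequency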
5.3 2-port formulation for control problems
[Figure: standard 2-port (generalized plant) configuration with exogenous input w, control u, performance output z, and measurement y; the closed-loop transfer function from w to z is formed by closing the controller K around the lower port.]
5.4 H∞ control
Standard problem:
    Find K(s) stabilizing so that ||T_zw||∞ ≤ γ.
Optimal control:
    Find K(s) stabilizing to minimize ||T_zw||∞.
Typically one solves the standard problem for a given γ, then reduces γ until no solution is possible, to get the optimal control.
(LQG results from min ||T_zw||₂.)
Many assumptions (stabilizable, detectable, etc.) are needed for a solution to exist.
[Handout figures: two-port block diagram; transfer-function, state-space, and packed-matrix notation; H∞ controller, estimator, controller/estimator gains, H∞ constants; closed-loop system. Necessary conditions for the above solutions are the existence of solutions to the Riccati equations.]
5.5 Other Terminology
Generalization to finite-horizon problems:
    ||T_zw||∞ → ||T_zw||_[0,T]
where [0,T] denotes the induced norm on L2[0, T].
This is a much harder problem to solve as it is a 2-point boundary value problem.
Loop Shaping: design for desired characteristics of S and T.
  { Adjust open-loop gains for desired characteristics (possibly by some pre-compensators)
  { Determine γ_opt using the H∞ design and repeat if necessary.
  { May iterate (for example to reduce model order).
Part VIII
Neural Control
Neural control often provides an effective solution to hard control problems. This does not imply that more traditional methods are necessarily incapable of providing as good or even better solutions. The popularity of neural control can in part be attributed to the reaction by practitioners to the perceived excessive mathematical intensity and formalism of recent developments in modern control theory. Although neural control techniques have been inappropriately used in the past, the field of neural control is now reaching a level of maturity where it is becoming more widely accepted in the controls community.
Certain aspects of neural control are especially attractive, e.g., the ability to systematically deal with non-linear systems, and simple generalizations to multi-input multi-output problems. However, we must caution students that this field is still in its infancy. Little progress has been made in understanding such issues as controllability, observability, stability, and robustness.
1 Heuristic Neural Control
We start with some \heuristic" methods of control which utilize neural networks in conjunction with classical controllers: 1) control by emulation, in which we simply attempt to learn an existing controller; 2) inverse and open-loop control; 3) feedback control, where the feedback is ignored while designing the controller. These methods, while rather ad hoc, can be quite effective and are common in industrial applications today.
1.1 Expert emulator
If a controller exists we can try to emulate it as illustrated below
y
controller
r,x
e
+
y
G
There is no feedback and we train with standard BP. This is useful if the controller is a human and
we want to automate the process.
An example of a network trained in this manner is ALVINN at CMU. The input to the controller
was a 30 by 32 video input retina and the output was the steering direction. The neural controller
was trained by \watching" a human drive. Fifteen shifted images per actual input were used
during training to avoid learning to just drive straight. Two hundred road scenes from the past are stored and trained on to avoid \forgetting". Fifteen of the two hundred are replaced for each new input image. Two networks were trained, one for single lane roads, and another for double lane roads. Each network was also trained as an auto-associator. The network with the best L.S. reconstruction of the input image gave a confidence level to determine which network got to drive (*SHOW VIDEO).
1.2 Open loop / inverse control
Early systems were designed open loop making the neural controller approximately the inverse
of the plant: C P ;1 .
Not all systems have inverses, such as the robotic arm illustrated below. (To resolve this problem,
it is necessary to incorporate trajectory constraints.)
Direct inverse modeling: as illustrated in the figure below, one could reverse the order of plant and controller and use standard BP to train the net before copying the weights back to the actual controller. This is a standard approach for linear systems. However, this is a bit dubious for nonlinear systems since the plant and controller do not necessarily commute.
201
d
y
r
N
P
e
d
y
r
P
N
e
Back-popagate
Indirect learning: it often makes more sense to train the direct system. Consider the uncommuted plant and controller in the previous figure. Taking the gradient of the error with respect to the weights in the neural controller:
    ∂e²/∂w = 2e^T ∂y/∂w = 2e^T (∂y/∂u)(∂u/∂w) = 2e^T G_p G_w
where G_p and G_w are the Jacobians.
We could calculate G_p directly if we had the system equations.
We can also calculate the Jacobian by a perturbation method, Δy/Δu. If u is of dimension N: step 0, forward pass the nominal u; steps 1 through N, forward pass [u1, u2, ..., ui + Δ, ..., uN].
If P is a neural net, e^T G_p is found directly by backpropagation. To see this, we observe that
    ∂(e^T e)/∂x = 2e^T ∂y/∂x = 2e^T G_xy = 2 δ0
where the δ0 are found by backpropagating the error directly through the network, as can be argued from the following figure:
[Figure: backpropagating the error through the network (with a linear output function f(s) = s) yields the δ0 at the input layer.]
Thus we see backpropagation gives the product of the Jacobian and the error directly. The complexity, O(N·H1 + H1·H2 + H2·M), is less than calculating the terms separately and then multiplying. Note, we could get the full Jacobian estimate by successively backpropagating basis vectors through the model.
The conclusion is that it is useful to build a plant emulator { system ID.
We then train the controller as illustrated below:
This model is appropriate when we have a state representation of our plant. The state vector
x with the reference are the inputs to a static network. If we do not have full state information,
203
then we might use a non-linear ARMA model (NARMA).
yk = P^[u(k); u(k ; 1); : : :; u(k ; m1); y (k ; 1); : : :; y(k ; m2 )]
This may also be viewed as a simple generalization of linear zero/pole systems. With the
recursion, gradient descent training is more complicated. We will explore adaptation methods later
(Elman Nets just ignore the consequence of feedback).
We would then need to \backprop" through a dynamic system illustrated below. We will develop
methods to do this later.
Often, it makes more sense to train to some \model reference" rather than a direct inverse:
204
1.3 Feedback control - ignoring the feedback
These methods use neural nets heuristically to improve dynamics in conjunction with \classical"
methods. See Figure below. We will illustrate the approach through several examples.
1.4 Examples
1.4.1 Electric Arc furnace control
Figures 20 to 22 show the plant. Figure 23 shows icker removal and the need for predictive
control. Figure 25 shows the regulator emulator, furnace emulator and combined furnace/regulator.
Figure 24 shows the plant savings. (SHOW VIDEO)
Figure 20: Electric arc furnace
205
Figure 21: Furnace system
Figure 22: Three phase system
206
Figure 23: Flicker remover and predictive control
Figure 24: Plant saving
207
208
Figure 25: Regulator emulator, furnace emulator and combined furnace/regulator
1.4.2 Steel rolling mill
The steel rolling mill is characterized by the following parameters (see Figure 26)
Figure 26: Steel rolling mill
fw
rolling force
he strip thickness at input
ha strip thickness at output
le; la
sensor locations
sr
roll dimension
v1
input velocity
vw
output velocity
he , ha, v1, vw , fw are measured values. Goal: keep ha at haref (step command).
J=
Z t=tmax
t=0
(haref ; ha (t))2 dt
A linear PI controller is shown in Figure 27 and the system's closed-loop response in Figure 28.
Figure 29 shows the open-loop controller with a neural-net model of the plant's system inverse.
The neural net implements an RBF (radial basis function) network:
    y = Σ_{i=1}^{N} wi G(x, xi, σ).
The inverse plant model is reasonable since it is a smooth static SISO mapping. Figure 30 shows
the system response with this neural inverse open-loop controller. Note the poor steady-state error.
Figure ?? shows the closed-loop PI controller with a neural-net model of the plant's system inverse.
The PI controller sees an approximately linear system. Figure 31 shows the system response with
209
Figure 27: Feedforward and feedback linear controller
Figure 28: Closed-loop response of feedforward and feedback linear PI controller
Figure 29: Open-loop controller with neural model of the plant's system inverse
210
Figure 30: Response of the system with neural inverse as open-loop controller
Figure 31: Closed-loop PI controller with neural model of the plant's system inverse
Figure 32: Response of the system with PI controller and inverse neural net model
211
this neural inverse closed-loop PI controller. Note the improved steady-state error.
Internal model control (IMC): A control structure, popular for non-linear control. See Fig-
ure 33.
Feeding back error rather than output.
Given plant and controller which are input/output stable, and perfect model of plant then
closed loop is stable.
Advantage: uncertainty of model is fed back to system.
    G(s) = G1 Gp / (1 + G1 Gp - G1 Gm)
If G1 Gm = 1  =>  G(s) = 1, and zero steady-state error for the non-linear system. Conclusion: we need the inverse of the model to be perfect (not of the plant). (Filtering may be done additionally for high-frequency removal.)
Figure 34 shows the response of the system with IMC. Since G1 ≠ 1/Gm exactly, there is some steady-state error.
Figure 33: Internal model control structure
212
Figure 34: Response of system with IMC
2 Euler-Lagrange Formulation of Backpropagation
It is possible to derive backpropagation within the framework of optimal control using the Euler-Lagrange method. Referring back to the feed-forward neural network model as presented in class, we have the following formulation for the activation values
    A(l) = f( w(l) A(l-1) )                                            (143)
where f is the sigmoid function and w(l) is the weight matrix for layer l. We define A(0) to be the vector of inputs, x, and A(L) to be the vector of outputs, Y. (Think of the weights w as the control u, the activation A as the state, and the layer l as the time index k.)
Recall that we desire
    min_w J = Σ_{k=1}^{n} || Dk - Ak(L) ||²                            (144)
subject to certain constraints, which in this case are given in Equation 143. Adjoining Equation 143 to Equation 144 results in the following energy function:
    J = Σ_{k=1}^{n} { || Dk - Ak(L) ||² + Σ_{l=1}^{L} λk^T(l) ( Ak(l) - f( w(l) Ak(l-1) ) ) }
What is desired then is
    ∂J/∂Ak(l) = 0    and    ∂J/∂w(l) = 0.
Note that w is not a function of k. We consider the first condition in two cases:
    ∂J/∂Ak(L) = 0  =>  λk(L) = 2 (Dk - Ak(L))
    ∂J/∂Ak(l) = 0  =>  λk(l) = w^T(l+1) ∇f( w(l+1) Ak(l) ) λk(l+1)
and we make the definition
    δk(l) = ∇f( w(l) Ak(l-1) ) λk(l) = f'k(l) λk(l).
So we have
    δk(L) = 2 f'k(L) (Dk - Ak(L))    and    δk(l) = f'k(l) w^T(l+1) δk(l+1)
which we recognize from before as backpropagation. Now we need an update rule for the weights:
    ∂J/∂w(l) = 0 = Σ_{k=1}^{n} f'k(l) λk(l) Ak^T(l-1) = Σ_{k=1}^{n} δk(l) Ak^T(l-1)
which can be minimized using stochastic gradient descent. The update rule is then
    w(l) ← w(l) - μ δk(l) Ak^T(l-1).
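A minimal sketch of the resulting backpropagation equations for one hidden sigmoid layer with a linear output, one pattern per update; the network sizes, training data, and learning rate are hypothetical.

% Sketch: stochastic backprop for a 1-hidden-layer network (batch of 1).
f  = @(s) 1./(1 + exp(-s));              % sigmoid
Ni = 3; Nh = 5; No = 2; mu = 0.1;
w1 = 0.1*randn(Nh, Ni); w2 = 0.1*randn(No, Nh);
for it = 1:1000
    x = randn(Ni,1);  d = [sum(x); prod(x)];     % hypothetical training pair
    A1 = f(w1*x);                        % hidden activations, A(1)
    A2 = w2*A1;                          % linear output layer, A(2)
    delta2 = 2*(d - A2);                 % delta at the output (f' = 1 for a linear layer)
    delta1 = (A1.*(1-A1)) .* (w2'*delta2);   % delta(l) = f'(l) .* w(l+1)' delta(l+1)
    w2 = w2 + mu*delta2*A1';             % descent on J (dJ/dw = -delta*A')
    w1 = w1 + mu*delta1*x';
end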
3 Neural Feedback Control
Incorporating feedback, neural system can be trained with similar objectives and cost functions
(LQR, minimum-time, model-reference, etc.) as classical control but solution are iterative and approximate. \Optimal control" techniques can be applied which are constrained to be approximate
by nature of training and network architectural constraints.
A typical controller implementation for a model reference approach is shown below.
214
In order to understand how to adapt such structures, we must rst review some algorithms for
training networks with feedback.
3.1 Recurrent neural networks
We provide some review of the more relevant recurrent structures and algorithms for control.
r(k-1)
X(k)
X(k-1)
N
z
-1
The above gure represents a generic network with feedback expressed as:
x(k) = N (x(k ; 1); r(k ; 1); W )
The network may be a layered structure in which case we have output feedback. If there is only
one layer in the network then this structure is equivalent to a fully recurrent network.
If there is a desired response at each time step, then the easiest way to train this structure is to feed
back the desired response d instead of x at each iteration. This eectively removes the feedback
and standard BP can be used. Such an approach is referred to as \teacher forcing" in the neural
network literature or \equation error" adaptation in adaptive signal processing. While simple, this
may lead to a solution which is biased if there is noise in the output. Furthermore, in controls, often
a feedback connection represents some internal state which we may not have a desired response for.
In such case, we must use more elaborate training methods:
3.1.1 Real Time Recurrent Learning (RTRL)
RTRL was rst present by Williams and Zipser (1989) in the context of fully recurrent networks.
It is actually a simple extension of an adaptive algorithm for linear IIR lters developed by White
in 1975. We present a derivation here for the general architecture shown above.
We wish to minimize the cost function:
"X
#
K
1
T
J = 2E
e (k)e(k)
k=1
Where the expectation is over sequences of the form
(
(
(
(
(
(x(0); d(0));
(x(1); d(1));
(x2(0); d2(0)); (x2(1); d2(1));
(x3(0); d3(0)); : : :
..
.
(xP (0); dP (0)); : : :
215
: : :; (x(k); d(k))
)
: : :; (x2(k); d2(k)) )
(x3(k); d3(k)) )
..
.
)
(xP (k); dP (k)) )
Define
    e(k) = d(k) - x(k)   if d(k) exists,   0 otherwise
(some outputs may have desired responses; others are governed by internal states). The update rule is
    ΔW = -μ d(e^T(k)e(k))/dW = 2μ e^T(k) dx(k)/dW.
Then
    dx(k)/dW = [∂N(k)/∂x(k-1)] dx(k-1)/dW + [∂N(k)/∂r(k-1)] dr(k-1)/dW + [∂N(k)/∂W] dW/dW
where dr(k-1)/dW = 0 and dW/dW = I, so that
    Λ_rec(k) = G_x(k) Λ_rec(k-1) + Γ(k)
with
    Λ_rec(k) = dx(k)/dW,   G_x(k) = ∂N/∂x(k-1) ∈ R^{Nx×Nx},   Γ(k) = ∂N/∂W ∈ R^{Nx×NW}.
Then the update rule becomes
    ΔW = 2μ e^T(k) Λ_rec(k).
Note that this is O(N²M) per step for M weights and N outputs. For a fully recurrent network this is O(N⁴), which is not very efficient.
To calculate the necessary Jacobian terms, we observe that
    dJ(k)/dW = [∂J(k)/∂N(k)]^T dN(k)/dW = -e^T dN(k)/dW
    dJ(k)/dx = [∂J(k)/∂N(k)]^T dN(k)/dx(k) = -e^T dN(k)/dx(k)
These are found by a single backpropagation of the error. To get dN(k)/dW and dN(k)/dx(k) themselves, backprop the unit vectors e = [0 ... 0 1 0 ... 0]^T.
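A minimal RTRL sketch for a single fully recurrent tanh layer, x(k) = tanh(W [x(k-1); r(k-1); 1]); the sizes, learning rate, and delayed-sine teacher signal are hypothetical. Lam holds dx/dW, flattened column-major so that Lam is N_x × N_W.

% Sketch: RTRL for one fully recurrent tanh layer (hypothetical 1-D teacher task).
N = 4; M = 1; mu = 0.02; K = 2000;
W  = 0.1*randn(N, N+M+1);                 % weights: [recurrent, input, bias]
NW = numel(W);
x  = zeros(N,1);  Lam = zeros(N, NW);     % Lam = dx(k)/dW (flattened, column-major)
for k = 1:K
    r = sin(0.1*k);  d = sin(0.1*(k-5)); % input and delayed-sine teacher (assumed)
    z  = [x; r; 1];                       % uses x(k-1)
    x  = tanh(W*z);                       % new state x(k)
    fp = 1 - x.^2;                        % tanh derivative at the new state
    Gx = diag(fp) * W(:,1:N);             % dx(k)/dx(k-1)
    Gam = zeros(N, NW);                   % dx(k)/dW holding x(k-1) fixed
    for j = 1:N+M+1
        cols = (j-1)*N + (1:N);
        Gam(:,cols) = diag(fp * z(j));    % d x_i / d W(i,j) = fp_i * z_j
    end
    Lam = Gx*Lam + Gam;                   % RTRL recursion
    e = d - x(1);                         % error on the first state only
    W = W + reshape(2*mu*e*Lam(1,:), N, N+M+1);   % descent step (dJ/dW = -2 e dx1/dW)
end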
3.1.2 Dynamic BP
This method by Narendra is an extension of RTRL.
r(k-1)
X(k)
X(k-1)
N
W(z)
216
X (k ; 1) = W (z)X (k)
(145)
where W (z ) is a Linear Operator (we should really use the notation W (q )).
Example 1: Full Feedback:
2 ;1
66 (z0 ) (z0;1) 00
) 64 0
0 (z ;1 )
0
0
0
0
0
0
(z ;1 )
32
1
77 66 X
X
75 64 X23
X4
3
77
75
Example2: Tap delay line on X1.
2 ;1
32
(z ) + (z ;2 ) + (z ;3 ) + (z ;4 ) 0 : : : : : :
X1
66
7
6
0
0 : : : : : : 77 66 X2
) 64
0
0 : : : : : : 5 4 X3
0 ::: :::
0
X4
3
77
75
The equation for DBP is the following modification of RTRL:
    Λ_rec = dx(k+1)/dW                                               (146)
    Λ_rec(k) = G_x(k) W(z) Λ_rec(k) + Γ(k)                            (147)
(Note the mix of time-domain and "frequency"-domain operators.)
3.2 BPTT - Back Propagation through Time
(Werbos, Rumelhart et al, Jordan, Nguyen,...)
r(k-1)
X(k)
X(k-1)
N(X,r,W)
q
r(0)
r(1)
N
X(0)
-1
r(2)
N
X(1)
r(N-1)
N
X(2)
Just unravel in time to get one large network.
Backprop through the entire structure.
217
N
X(3)
X(N)
Error may be introduced at nal time step or at intermediate time steps
Store 4w for each stage ! average all of them together to get a single 4w for the entire
training trajectory.
3.2.1 Derivation by Lagrange Multipliers
We present a derivation of BPTT by Lagrange multipliers, which is a common tool for solving N-stage optimal control problems.
We want to
    min_W  ½ Σ_{k=1}^{N} (d_k - X_k)^T (d_k - X_k)
subject to
    X(k+1) = N(X(k), r(k), W).
Note that W acts like the control input. Next, simply adjoin the constraint to get the modified cost function:
    J = ½ Σ_{k=1}^{N} (d_k - X_k)^T (d_k - X_k) + Σ_{k=0}^{N-1} λ^T_{k+1} [ N(k) - X_{k+1} ]
The first optimality condition is
    ∂J/∂X(k) = 0.
For k = N, we get
    λ_N = -(d_N - X_N)^T    (terminal condition);
for k ≠ N, we get
    λ_k = -(d_k - X_k)^T + λ^T_{k+1} ∂N(k)/∂X(k)
which is the BPTT equation: combine the current error with the error propagated through the network at time k+1.
The second optimality condition,
    ∂J/∂W = Σ_{k=0}^{N-1} λ^T_{k+1} ∂N(k)/∂W = 0,
gives the weight update formula, and explicitly says to form ΔW by averaging over all time, as we argued before.
Variations
N-Stage BPTT: just backprop a fixed N stages back. This allows for an on-line algorithm. (If N is large enough, the gradient estimate will be sufficient for convergence.)
1-Stage BPTT: in other words, just ignore the feedback, as we saw last lecture.
3.2.2 Diagrammatic Derivation of Gradient Algorithms:
The adjoint system resulting from the Lagrange approach species an algorithm for gradient calculations. Both the adjoint system and algorithm can be described using ow diagrams. It is possible
to circumvent all tedious algebra and go directly from a diagram of the neural network of interest
to the adjoint system. This method provides a simple framework for deriving BP, BPTT, and a
variety of other algorithms. Reference: Wan and Beaufays. (SHOW SLIDES)
3.2.3 How do RTRL & BPTT Relate
Both minimize same cost function.
They dier by when weights are actually updated.
We can use ow graph methods to relate the algorithms directly and argue that they are
simply \transposes" of each other. Reference: Beaufays & Wan, 1994- Neural Computation.
Other examples like this include emitting and receiving antennas, controller and observer
canonical forms, decimation in time/decimation in frequency for the FFT.
Misc. RTRL and BPTT are gradient based methods. Many second-order methods also exist which
try to estimate the Hessian in some way. Most ignore the feedback when actually estimating the
Hessian. A popular method with controls used the extended decoupled Kalman lter (EDKF)
(Feldkamp).
4 Training networks for Feedback Control
Returning to feedback control, we can now apply RTRL (or DBP) or BPTT to train the control system. The indirect nature of the problem (a plant model is required) is illustrated in the figures.
For example, let's again assume availability of the state information x.
If we assume we know the Jacobian of the plant, or have trained a neural network emulator, the system may be thought of as one big recurrent network which may be trained using either RTRL or BPTT. For BPTT we visualize this as unfolding the network in time: we simply treat every block as a neural network and backprop all the errors.
If we have a model with the output as a function of the states, y = h(X ):
[Figure: controller G and plant F in feedback, with an output map h producing y(k), which is compared to d(k) to form the error e(k).]
For more complicated structures such as NARMA models we still apply either BPTT or RTRL. For example, Feldkamp's control structure uses a feedforward network with a tap-delay-line feedback from the output, along with locally recurrent connections at each layer.
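As a small illustration of the indirect step (hypothetical names and dimensions, not code from the notes), the sketch below pushes the output error back through an assumed plant Jacobian to obtain an error signal for the controller at a single time step. In practice the Jacobian comes from the known plant equations or a trained neural network emulator, and e_u is then used as the error for ordinary backprop through the controller.

% One-step "backprop through the plant" sketch (assumed linearized plant).
ny = 2; nu = 1;
dy_du = randn(ny, nu);            % plant Jacobian dy/du (from model or emulator)
y = randn(ny,1); d = zeros(ny,1); % plant output and desired output
e_y = y - d;                      % output error dJ/dy for J = 1/2*||y - d||^2
e_u = dy_du' * e_y;               % error seen at the controller output u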
4.1 Summary of methods for desired response
Regulators (steady-state controllers):
1. Model Reference (as seen before)
    J = min Σ_k ||d_k − X_k||^2

where d_k comes from some plausible model that we would like the system to emulate (e.g., a 2nd-order linear model).
2. "Linear" Quadratic Regulator

    J = min Σ_k ( X_k^T Q_k X_k  +  U_k^T R_k U_k )
              (weighting on states)  (constraint on control effort)

Effectively d_k = 0 or some constant, and we also weight the control effort. To modify either RTRL or BPTT, simply view U and X as outputs:
[Figure: controller network N driving plant P, with a unit delay q^-1 closing the state feedback loop; both u(k) and X(k) are treated as outputs.]
    ∂J/∂U_k = 2 U_k^T R  →  (error for U)
    ∂J/∂X_k = 2 X_k^T Q  →  (error for X)
[Figure: the corresponding adjoint network (delays become advances q^+1), with the errors 2X(k)^T Q and 2u(k)^T R injected and propagated backward through the transposed blocks.]
Final Time Problems (Terminal Control)
1. The constraint is only at the final time. Examples: robotic arm, truck backing.

    J = min ||d_N − X_N||^2

although one could add additional trajectory constraints (e.g., smoothness).
4.1.1 Video demos of Broom balancing and Truck backing
Comments:

    Network Dimensions
    Single broom:    4-1
    Double broom:    6-10-1
    Single Trailer:  4-25-1
    Double Trailer:  5-25-1
4.1.2 Other Misc. Topics
Obstacle avoidance and navigation. See slides and video.
TOBPTT (time optimal): In the above problems, the final time N is unknown (or hand-picked). We can also try to optimize over N:

    J = min_{W,N} E_{X_0} [ Φ(X, N) + Σ_{k=0}^{N} L(X(k), U(k)) ]

where Φ(X, N) is the cost of the final time. The basic idea is to iteratively forward sweep, followed by a backward error sweep, up to a time N which minimizes the above cost for the current W (Plumer, 1993).
Model Predictive control with neural networks also possible. (Pavillion uses this for process
control, also Mozer's energy control.)
4.1.3 Direct Stable Adaptive Control
Uses radial basis functions and a Lyapunov method to adapt the controller network without the need for a plant model (Sanner and Slotine, 1992). This is very good work, and should be expanded on in future notes.
4.1.4 Comments on Controllability / Observability / Stability
Controllability refers to the ability to drive the states of the system from arbitrary initial conditions to arbitrary final states using the control input. In linear systems, this implies state feedback is capable of arbitrarily modifying the dynamics.
Observability refers to the ability to determine the internal states of the system given only measurements of the control signal and the measured plant output. For a linear system, this implies we can estimate the states necessary for feedback control.
For nonlinear systems:
The system X(k+1) = f(X(k), U(k)) is controllable in some region of an equilibrium point if the linearized system is controllable.
The same holds for observability.
Global controllability and observability for nonlinear systems are extremely difficult to prove. Proofs do not currently exist that neural networks are capable of learning an optimal control scheme, assuming controllability, or are capable of implementing global observers.
General proofs of stability, robustness, etc. are still lacking for neural control schemes.
5 Reinforcement Learning
The name "reinforcement learning" comes from animal learning theory and refers to the "occurrence of an event, in proper relation to a response, that tends to increase the probability that the response will occur again in the same situation."
We will take two different cracks at presenting this material. The first takes an historical approach and then arrives at the dynamic programming perspective. Second, we will come back for a more detailed coverage, taking "approximate" dynamic programming as the starting place.
Learning can be classified as:
Supervised learning:
 - output evaluation provided
 - desired response encoded in the cost function.
Unsupervised learning:
 - no desired response given
 - no evaluation of the output provided.
Reinforcement learning:
 - no explicit desired response is given
 - output coarsely evaluated (reward / punish).
Examples requiring reinforcement:
 - Playing checkers (win/lose)
 - Broom balancing (crashed/balanced)
 - Vehicle navigation (ok/collision/docked).
Note we already showed how to solve some of these problems using BPTT based on the Euler-Lagrange approach. Reinforcement learning is just another approach (based on DP).
Exploration and Reinforcement
In reinforcement learning, the system must discover which outputs are best by trying different actions and observing the resulting sequence of rewards.
Exploration vs. Exploitation dilemma:
 - Exploitation: use current knowledge about the environment and perform the best action currently known.
 - Exploration: explore possible actions, hoping to find a new, more optimal strategy.
Exploration requires randomness in the chosen action.
Figure 35: Add randomness to a system - method 1 (noise is added at the summing junction, before the nonlinearity f that produces the output y).
Figure 36: Add randomness to a system - method 2 (a Bernoulli 0/1 random flip applied at the output, giving y_i = ±1; equivalent to adding noise before the threshold).
Adding randomness to a system (see Figures 35 and 36). For those figures: Pr{y_i = 1 | w_i, x_i} = P_i.
Method 3 - Ising model

    y_i = ±1,    s_i = Σ_j w_ij x_j,
    Pr{y_i = 1} = 1 / (1 + exp(−2β s_i)),
    E[y_i] = tanh(β s_i),

where β is the temperature parameter.
We will see that the current view of reinforcement relates to approximate dynamic programming. While true DP explicitly explores all possible states of the system, RL must use randomness to "explore" some region around the current location in state space. Sometimes strict adherence to this principle makes the job more difficult than it really is.
Basic reinforcement structure
Classes of Reinforcement Problems:
1. Unique credit assignment
 - one action output
 - reinforcement given at every step
 - reinforcement independent of earlier actions and states
 - state independent of earlier states and actions
2. Structural credit assignment
 - like (I) but involves multiple action units
 - how to distribute credit between multiple actions
3. Temporal credit assignment (game playing, control)
 - reinforcement not given on every step
 - state dependent on earlier states and actions
 - how to distribute credit across a sequence of actions
Classes (I) and (II) are easy. Class (III) is more challenging, and is what current research deals with.
Class (I) - Simple credit assignment. Example:
1. Adaptive Equalizers (see figure below)
[Figure: adaptive equalizer - input s through a tapped delay line (z^-1 elements) with weights w producing the output y_k.]
"Reward" every decision:

    Δw_i = η δ_k x(k − i),   with δ_k = sign(s_k) − s_k

This "reinforces" the response for the next time; if we start out with a fair response, this method continues to get better.
Reward-Penalty:
"Pseudo" desired output:

    d_i(t) =  y_i(t)   if r(t) = +1
             −y_i(t)   if r(t) = −1

Define error = y_i(t) − <y_i(t)>. Here <y_i(t)> is the expectation for stochastic units (P_i for Bernoulli units). This says to make the current output more like the average output. Then we have

    Δw = η r (y_i − <y_i(t)>) X.
2. Black Jack - Early example of A_RP (1973 Widrow, Gupta, Maitra, IEEE SMC-3(5):455-465)
 - Temporal problem, but the temporal credit problem is ignored
 - Also called "selective bootstrap"
 - Particular encoding makes the optimal strategy linearly separable
 - Punish or reward all decisions equally
 - Weak punish / strong reward -> better convergence
 - Works because wins tend to result from many good decisions.
Class (II) - Structural Credit Assignment.
 - Train a model to predict E[r] for a given (x, a)
 - Backprop through the environment model to get action errors
 - Like the plant model in control
Class (III) - Temporal Credit Assignment.
Adaptive Heuristic Critic
The critic synthesizes past reinforcements to provide a current heuristic reinforcement signal to the actor.
Samuel's checker programs:
 - Used a method to improve a heuristic evaluation function of the game of checkers during play.
 - Best evaluation for a given move -> win/lose sequence used to determine the next move.
 - Updated board evaluations by comparing an evaluation of the current board position with an evaluation of a board position likely to arise later in the game (make the current evaluation look like the final outcome - basic temporal credit assignment).
 - Improved to evaluate long-term consequences of moves (tries to become the optimal evaluation function).
 - In one version, the evaluation was a weighted sum of numerical features, updated based on the error between the current evaluation and the predicted board position.
5.1 Temporal Credit Assignment
5.1.1 Adaptive Critic (Actor - Critic)
The adaptive critic seeks to estimate a measure of discounted, cumulative future reinforcement.
Many problems only involve reinforcement after many decisions are made.
The critic provides a step-by-step heuristic reinforcement signal to the action network at each time step.
The actor is just the RL terminology for the controller.
5.1.2 TD algorithm
Re-popularized reinforcement learning. (1983 - Sutton, Barto, Anderson)
ACE (Adaptive Critic Element) and ASE (Associative Search Element); the ASE is the controller.
    V(t_0) = E{ Σ_{t=t_0}^∞ γ^(t−t_0) r(t) | w, x(t_0) }

This gives the overall utility of state x(t) given the current actor W, with action a = N(x, W), where N may be a neural network actor. (Notation: V = J in the chapter on DP.)
Critic
1. V(t) is unknown. The critic builds an estimate V̂(t) by interaction with the environment.
2. The critic also provides an immediate heuristic reinforcement to the actor:

    δ(t+1) = r(t+1) + γ V̂(t+1) − V̂(t)    (the temporal difference)

This is immediate reinforcement with temporal credit assignment.
 - Discounted immediate return obtained by taking action a.
 - If our action takes us to a better utility, then reward; otherwise punish.
 - (Along the optimal trajectory, δ(t+1) = 0, assuming no randomness.)
Actor Weight Update (controller)
Define a "weight eligibility" trace e(t) (not an error), analogous to (d − y)x in LMS (y is the actor output a). But we don't have d; use the error as before:

    e(t) = (a(t) − <a(t)>) x

Averaged eligibility trace:

    ē(t) = λ ē(t−1) + (1 − λ) e(t)
    ΔW_a(t+1) = η_a δ(t+1) ē(t)

(Similar form to the immediate reinforcement seen before.)
Comment
If using a net, we need to backprop terms through the network to maintain eligibility traces for each weight.
Where to get V̂ - train the critic
TD method

    V̂ = ACE(x(t), W_c(t))

where the ACE may be a neural net or a table lookup (box representation). It tries to predict V based only on the current state.
Consider finding W_c to minimize (V − V̂)^2:

    ΔW_c = η_c (V − V̂_t) ∇_w V̂    (∇_w V̂ is the BP term)

where V is the actual future utility. But we still need the true V.
The total weight update is

    ΔW = Σ_{t=1}^m ΔW_t = η Σ_{t=1}^m (V − V̂_t) ∇_w V̂_t

But, assuming V̂_{m+1} = V,

    V − V̂_t = Σ_{k=t}^m (V̂_{k+1} − V̂_k)

(a sum of individual differences). Then

    ΔW = η Σ_t Σ_{k=t}^m (V̂_{k+1} − V̂_k) ∇_w V̂_t
       = η Σ_{k=1}^m Σ_{t=1}^k (V̂_{k+1} − V̂_k) ∇_w V̂_t.

Rearranging terms and changing indices (k ↔ t),

    ΔW = η Σ_{t=1}^m (V̂_{t+1} − V̂_t) Σ_{k=1}^t ∇_w V̂_k,

where the inner sum is the sum of all previous gradient vectors. But this can be done incrementally:

    ΔW_t = η (V̂_{t+1} − V̂_t) Σ_{k=1}^t ∇_w V̂_k
Variations -
Discount previous gradient estimates (like RLS):

    Σ_{k=1}^t ∇_w V̂_k  →  Σ_{k=1}^t λ^(t−k) ∇_w V̂_k

implemented as

    ε_t = ∇_w V̂_t + λ ε_{t−1},    ε_0 = 0

(∇_w V̂ = x for a single neuron). This gives TD(λ).
Here is a second (easier) argument for TD(λ). Assume a discounted future sum

    V → V(t) = Σ_{k=0}^∞ γ^k r(t + k + 1)

and assume V(t+1) ≈ V̂(t+1). Then

    V(t) − V̂(t) = r_{t+1} + γ V(t+1) − V̂(t)
                ≈ r_{t+1} + γ V̂(t+1) − V̂(t) = δ(t+1)
Summarizing, the TD algorithm is expressed as:

    ΔW_t (critic) = η_c δ(t+1) Σ_{k=1}^t λ^(t−k) ∇_w V̂_k   →  ACE
    ΔW_t (actor)  = η_a δ(t+1) ē(t)                          →  ASE

Alternative argument: by Bellman's optimality condition,

    V(t) = max_{a(t)} { r(t+1) + γ V(t+1) }

Let V → V̂ and ignore the probability. The error for the critic with output V̂ is

    r(t+1) + γ V̂(t+1) − V̂(t) = δ(t+1)
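A minimal tabular sketch of these ACE/ASE updates is given below, on an assumed random-walk toy problem; the learning rates, discount, trace decay, and the problem itself are placeholders chosen for illustration, not values from the notes.

% Tabular TD(lambda) actor-critic sketch (toy chain with reward at the right end).
ns = 10;                          % number of discrete states
V  = zeros(ns,1);                 % critic (ACE) table V_hat(x)
W  = zeros(ns,1);                 % actor (ASE): P(a=+1|x) = 1/(1+exp(-W(x)))
gamma = 0.95; lam = 0.8;
eta_c = 0.1; eta_a = 0.05;

for trial = 1:500
    x  = ceil(ns/2);              % start in the middle
    zc = zeros(ns,1);             % critic eligibility trace
    za = zeros(ns,1);             % actor eligibility trace
    while true
        p  = 1/(1+exp(-W(x)));    % probability of action +1
        a  = 2*(rand < p) - 1;    % stochastic action in {-1,+1} (exploration)
        xn = min(max(x + a, 1), ns);
        r  = (xn == ns);          % reward of 1 only at the right end
        done  = (xn == ns) || (xn == 1);
        delta = r + gamma*V(xn)*(~done) - V(x);      % temporal difference
        zc = gamma*lam*zc;  zc(x) = zc(x) + 1;       % critic trace
        za = gamma*lam*za;  za(x) = za(x) + (a - (2*p-1));  % (a - <a>) trace
        V = V + eta_c*delta*zc;   % critic update (ACE)
        W = W + eta_a*delta*za;   % actor update (ASE)
        if done, break; end
        x = xn;
    end
end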
5.2 Broom Balancing using Adaptive Critic
1983 Barto, Sutton, Anderson.
Figures from: C.Anderson "Learning to Control an Inverted Pendulum using neural networks",
IEEE Control System Mag., April 1989.
Using an MLP net to produce a non-linear critic.
Quantizing the input space and using a 0/1 representation for each bin; the binary inputs are combined using a single layer.
    action a =  +1 : push left with 10 Newtons
                −1 : push right with 10 Newtons

    reward r =  −1 : punish if |θ| > 12 degrees or |h| > 2.4 m
                 0 : nothing otherwise

 - Output neurons are Bernoulli, initialized to P_i = 0.5 (W = 0).
 - "Backprop" is used to update the hidden units: it uses δ_i computed at the output based on pseudo-targets. The 1983 version used a single layer (1 neuron).
 - Trained on 10,000 balancing cycles.
 - Instead of always starting from θ = 0, h = 0, each run was initialized to a random position. This ensures a rich set of training positions.
 - M.C. has no temporal info.
 - A single layer is not sufficient.
5.3 Dynamic Programming Perspective
    V(t) = E{ Σ_{t=0}^∞ γ^t r(x(t), a(t)) | x_0 = x(t), W }

where W represents the current actor parameters.
Bellman's Optimality Equation from Dynamic Programming:

    V*(x) = max_{a(t)} [ r(x(t), a(t)) + γ Σ_{x(t+1)} Prob(x(t+1) | x(t), a(t)) V*(x(t+1)) ]

Note

    V*(x) = max_W V(x, W)

only if the model we choose to parameterize V by is powerful enough.
Aside: let V* → V̂, ignore the probability, and assume the actor chooses a "good" action; then

    r(t+1) + γ V̂(t+1) − V̂(t) = δ(t+1)

which is the immediate reinforcement signal for the controller. Alternatively, we can solve directly for the control action by:

    a = arg max_{a(t)} [ r(x_t, a_t) + γ Σ_{x_{t+1}} P(x_{t+1} | x_t, a_t) V(x_{t+1}) ]

The difficulty is that P as well as V must be known (we need some model to do this correctly).
We thus see that we don't really need a network for the controller. However, the role of the actor is to do exploration. Also, it may be advantageous during training for helping "smooth" the search, and for providing "reasonable" control constrained by the parameterization of the controller.
Policy Iteration - a parameterized controller is maintained.
Value Iteration - only the "value" function (critic) is learned, and the control is inferred from it.
5.3.1 Heuristic DP (HDP)
(Werbos and others)
[Figure: HDP structure ("first term in the TD algorithm", i.e. just the top branch). The critic maps x_k to V̂_k, which is compared against the desired value r_k + γ V̂_{k+1}; x_{k+1} is generated by passing x_k through the action network and the plant model, and a second copy of the critic evaluates V̂_{k+1}. Errors are backpropagated through this environment path as well.]
Note that we only use the critic to get dV̂/dx_k to actually update the networks.
Dual Heuristic Programming (DHP) updates an estimate

    λ̂(x, w) = dV̂/dx_k

directly. (In the figure above, the target V̂_{k+1} should be V*.)
5.4 Q-Learning
(Watkins)
Notation: V^W or Q^W, where W is the current critic's parameters. We showed

    V* = max_{a_t} [ r(x_t, a_t) + γ Σ_{x_{t+1}} Prob(x_{t+1} | x_t, a_t) V*(x_{t+1}) ]

Define

    Q^W(x_t, a_t) = r(x_t, a_t) + γ Σ_{x_{t+1}} P(x_{t+1} | x_t, a_t) V^W(x_{t+1})

so that V* = max_a [Q*(x, a)]. Bellman's equation in Q alone:

    Q*(x_t, a_t) = r(x_t, a_t) + γ Σ_{x_{t+1}} P(x_{t+1} | x_t, a_t) max_{a_{t+1}} Q*(x_{t+1}, a_{t+1})

Note - the max is only on the last term; thus, given Q*,

    a = arg max_a [Q*(x, a)]

without the need for a model!
Q-Learning Algorithm:

    Q̂(x_t, a_t) ← (1 − η_t) Q̂(x_t, a_t) + η_t [ r_t + γ max_{a_{t+1}} Q̂(x_{t+1}, a_{t+1}) ]

where

    0 < η_t < 1,    Σ_{t=1}^∞ η_t = ∞,    Σ_t η_t^2 < ∞

Q̂ → Q* with probability 1 if all pairs are tried infinitely often.
Look-up table - grows exponentially in size.
1. Choose action a (with some exploration); x_{t+1} is the resulting state.
2. Update Q̂.
3. Repeat.
The final controller action is arg max_a Q̂(x, a) or an approximation.
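A minimal tabular Q-learning sketch on an assumed one-dimensional chain (epsilon-greedy exploration, decreasing step size); the problem and parameters are placeholders for illustration, but the update line is the averaging rule above.

% Tabular Q-learning sketch on a toy chain.
nx = 10; na = 2;
Q = zeros(nx, na);
gamma = 0.95; eps_greedy = 0.1;
x = ceil(nx/2);
for t = 1:20000
    eta = 1/ceil(t/100);                  % decreasing step size
    if rand < eps_greedy
        a = randi(na);                    % exploration
    else
        [~, a] = max(Q(x,:));             % exploitation
    end
    step = 2*a - 3;                       % action 1 -> -1, action 2 -> +1
    xn = min(max(x + step, 1), nx);
    r = (xn == nx);                       % reward only at the right end
    Q(x,a) = (1-eta)*Q(x,a) + eta*(r + gamma*max(Q(xn,:)));  % Q update
    if xn == nx || xn == 1
        x = ceil(nx/2);                   % restart an episode
    else
        x = xn;
    end
end
% greedy policy: [~, a_opt] = max(Q, [], 2)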
A Neural Network may also be used to approximate the controller:
[Figure: a neural network maps the state x to the action a, which (together with x) feeds the Q̂ estimator.]

    Q̂_des = r_t + γ max_{a_{t+1}} Q̂(x_{t+1}, a_{t+1})
No proof for convergence.
A related algorithm is Action-Dependent HDP (ADHDP) (Werbos). It uses the action network output as the best a_{t+1} - just like HDP, but the critic also has a_t as an input.
5.5 Model-Dependent Variations
TD and Q-Learning are model-free, i.e. they don't use Prob(x_{t+1} | x_t, a_t). Several methods exist which explicitly use knowledge of a model. These are referred to as "Real-Time DP".
Example: Backgammon - Sutton's "Dyna" architecture (could also apply it to broom balancing).
Methods are based on the approximation:

    V_{t+1}(x) = max_a [ r(x_t, a_t) + γ Σ_{x_{t+1}} Prob(x(t+1) | x_t, a_t) V_t(x_{t+1}) ]

(with some reasonable restrictions based on the current states).
Vt is iteratively updated in this manner.
(TD and Q-Learning can be shown to be approximations to the RTDP methods).
5.6 Relating BPTT and Reinforcement Learning
We have seen two approaches to neural control problems which optimize a cost function over time: BPTT and RL. How are these related? BPTT is based on the calculus of variations (Euler-Lagrange approach), while RL is based on DP. EL and DP were our two main frameworks for solving optimal control problems. Recall also the relationship derived using the HJB equations: the influence function λ(k) = ∂J/∂u(k), or equivalently λ(k) = ∂V/∂u(k). In words, the delta terms found with backpropagation-through-time can be equated to the partial derivative of the value function.
5.7 LQR and Reinforcement Learning
RL has the reputation of often being excruciatingly slow. However, for some problems we can help the approach out by knowing something about the structure of the value function V. For example, in LQR we have shown a number of times that the optimal LQ cost is quadratic in the states. In fact the V function is simply the solution of a Riccati equation. However, rather than algebraically solving the Riccati equation, we can use RL approaches to learn the quadratic function. A nice illustration of this is given by Stefan Schaal at Georgia Tech., where learning pendulum balancing is achieved after a single demonstration (SHOW TAPE on Anthropomorphic Robot Experiments).
6 Reinforcement Learning II
(Alex T. Nelson)
6.1 Review
Immediate RL: r(x_t, a_t) is an evaluation (or reinforcement) of action a_t and state x_t, and is available at time t.
Delayed RL: a single value of r is available for an entire sequence of actions and states {(x_t, a_t)}.
Typical delayed RL methods maintain an approximation V̂ of the optimal value function V*, and use immediate RL methods as follows: if a : x ↦ y, then the difference V̂(y) − V̂(x) provides an immediate evaluation of the pair (x, a).
6.2 Immediate RL
We want to find parameters w such that:

    r(x, π̂(x, w)) = r*(x)    ∀x ∈ X,

where r*(x) = max_a r(x, a) is the reinforcement for the optimal action at state x, and where π̂(x, w) represents the action policy parameterized by w. In other words, we want to find a policy which will maximize the reinforcement.
Unlike supervised learning, we don't have directed error information:

    Δ = π̂(x, w) − π*(x)

So...
1. We have to obtain the weight change some other way.
2. Also, we don't know when we've converged to the correct mapping.
6.2.1 Finite Action Set
A = {a_1, a_2, a_3, ..., a_m}
Devise two functions, g(·) : X → R^m and M(·) : R^m → A.
 - g(·) produces a merit vector for the possible actions.
 - M(·) selects the action with the highest merit:

    π̂(x, w) = M(g(x, w))

Since M(·) is predefined, we only need to learn g(·).
Approximate the optimal reinforcement with r̂(x, v) ≈ r*(x), where we write the training rule for the parameters v in shorthand:

    r̂(x, v) := r̂(x, v) + α (r(x, a) − r̂(x, v))

Notice that r̂(x) will approximate E[r(x, a) | x], but that this value will converge to r*(x) if a is chosen with the max selector M(·).
The update for the merit function (kth action) is:

    g_k(x, w) := g_k(x, w) + β (r(x, a_k) − r̂(x, v)).
Problems:
1. at each x, need to evaluate all members of A
2. maximum selector M(.) doesn't allow for exploration
So:
1. use stochastic action policy
2. decrease stochasticity with time (simulated annealing)
Also:
1. decrease α as r̂ → r*
2. randomize initial conditions over many trials.
6.2.2 Continuous Actions
A ⊆ R
Merit vectors are no longer suitable, so we need to adapt the action policy directly. There are two basic approaches.
Stochastic: A perturbation is added for exploration:

    π̂(·) = h(·, w) + ε,    where ε ~ N(0, σ^2).

The learning rule for the function approximator is:

    h(x, w) := h(x, w) + α [r(x, a) − r̂(x, v)] ε

which increases the probability of choosing actions with optimal reinforcement.
Deterministic: If r is differentiable,

    π̂(x, w) := π̂(x, w) + α ∂r(x, a)/∂a

If r is unknown, use r̂.
The stochastic method does not suffer from local maxima problems, whereas the deterministic method is expected to be much faster because it searches systematically.
6.3 Delayed RL
6.3.1 Finite States and Action Sets
Assume Markovian and stationary stochastic dynamic systems; i.e.,

    P(x_{T+1} = y | {(x_t, a_t)}_0^T) = P(x_{T+1} = y | x_T, a_T),

which is just P_xy(a), the probability of moving to state y from state x via action a.
Value Functions: We begin by defining the value of a policy π at state x:

    V^π(x) = E[ Σ_{t=0}^∞ γ^t r(x_t, π(x_t)) | x_0 = x ],

which can be interpreted as:

    V^π(x) = lim_{N→∞} E[ Σ_{t=0}^{N−1} γ^t r(x_t, π(x_t)) | x_0 = x ].

γ ∈ [0, 1) is a discount factor on future rewards. We can now define the optimal value function in terms of V^π(x):

    V*(x) = max_π V^π(x)    ∀x

Optimal Policy: Define the optimal policy π* as:

    π*(x) = arg max_π V^π(x)    ∀x ∈ X,

which is to say,

    V^{π*}(x) = V*(x)    ∀x ∈ X.
Using Bellman's optimality equation,

    V*(x) = max_{a∈A(x)} [ r(x, a) + γ Σ_{y∈X} P_xy(a) V*(y) ],

and provided that V* is known, we have a mechanism for finding the optimal policy:

    π*(x) = arg max_{a∈A(x)} [ r(x, a) + γ Σ_{y∈X} P_xy(a) V*(y) ]

Unfortunately, this requires the system model P_xy(a) in addition to our approximation of V*.
Q Functions: The idea is to incorporate P_xy(a) into the approximation. Define

    Q^π(x, a) = r(x, a) + γ Σ_y P_xy(a) V^π(y)

so Q* = Q^{π*} ⟹ V*(x) = max_a [Q*(x, a)], and

    π*(x) = arg max_a [Q*(x, a)]
6.4 Methods of Estimating V* (and Q*)
6.4.1 n-Step Truncated Return
This is a simple approximation of the value function:

    V^[n](x) = Σ_{τ=0}^{n−1} γ^τ r_τ
    V̂(x, v) = E[ V^[n](x) ]

V̂(x, v) → V(x) uniformly as n → ∞, but note that the variance of V^[n](x) increases with n.
Problem: computing E[·].
Solutions:
1. Use V̂(x, v) = V^[n](x) (high variance for high n).
2. Use V̂(x, v) := V̂(x, v) + α [V^[n](x) − V̂(x, v)] and iterate over many trials.
6.4.2 Corrected n-Step Truncated Return
We can take advantage of having an approximation of V on hand by adding a second term to the n-step truncated return:

    V^(n)(x) = Σ_{τ=0}^{n−1} γ^τ r_τ + γ^n V̂(x_n, v_old)

Approximating E[·]:

    V̂(x, v_new) = E[ V^(n)(x) ]

1. Use V̂(x, v) := V̂(x, v) + α [V^(n)(x) − V̂(x, v)].
6.4.3 Which is Better: Corrected or Uncorrected?
Known Model Case, Using Expectation
1. Uncorrected convergence:

    max_x |V̂(x, v) − V(x)| ≤ γ^n r_max / (1 − γ)

2. Corrected convergence:

    max_x |V̂(x, v_new) − V(x)| ≤ γ^n max_x |V̂(x, v_old) − V(x)|

This is a tighter bound.
Model Unavailable, or Expectation Too Expensive
1. If V̂ is good, use Corrected and small n. Using Uncorrected would require large n, resulting in higher variance.
2. If V̂ is poor, both require large n, and the difference is negligible.
V^(n) is better.
We need a way of choosing the truncation length n dynamically, depending on the goodness of V̂.
6.4.4 Temporal Difference Methods
Consider a geometric average of the V^(n)'s, over all n:

    V^λ(x) = (1 − λ) Σ_{n=1}^∞ λ^(n−1) V^(n)(x)
           = (1 − λ) ( V^(1) + λ V^(2) + λ^2 V^(3) + ··· )

Using our definition of V^(n) and expanding gives:

    V^λ(x) = r_0 + γ(1 − λ) V̂(x_1, v) + γλ [ r_1 + γ(1 − λ) V̂(x_2, v) + γλ [ r_2 + γ(1 − λ) V̂(x_3, v) + ··· ] ]

which (using r_0 = r(x, π(x))) can be computed recursively as

    V^λ(x_t) = r(x_t, π(x_t)) + γ(1 − λ) V̂(x_{t+1}, v) + γλ V^λ(x_{t+1}),    x_t := x_{t+1}

Note that:

    λ = 0 ⟹ V^0(x) = r_0 + γ V̂(x_1, v) = V^(1)(x)
    λ = 1 ⟹ V^1(x) = r_0 + γ V^1(x_1)
          ⟹ V^1(x) = r_0 + γ (r_1 + γ V^1(x_2)) = V^(∞)(x)
          ⟹ V^1(x) = V(x)

Idea: vary the approximate truncation length n from 0 to ∞ by starting with λ = 1 and letting λ → 0 as V̂ improves.
Problem: V^λ takes forever to compute because of the infinite sum over n.
Solution: approximate V^λ on-line. Replace the corrected n-step estimate with:

    V̂(x, v) := V̂(x, v) + α [V^λ(x) − V̂(x, v)].    (148)
Now define the temporal difference as the difference between the new prediction of the value and the old prediction:

    δ(x) = [ r(x, π(x)) + γ V̂(x_1, v) ] − V̂(x, v)
             (new prediction)              (old prediction)

The second term in equation 148 can now be written as:

    α [V^λ(x) − V̂(x, v)] = α [ δ(x) + (γλ) δ(x_1) + (γλ)^2 δ(x_2) + ··· ],

and we can approximate equation 148 with the online update

    V̂(x, v) := V̂(x, v) + α (γλ)^t δ(x_t)    (149)

if α is small.
Note that equations 148 and 149 update V̂ for only a single state. We can update several states by introducing the notion of eligibility:

    e(x, t) = 0                   if x has never been visited
            = γλ e(x, t−1)        if x ≠ x_t (not currently visited)
            = 1 + γλ e(x, t−1)    if x = x_t (currently visited)

This results in an update equation for all states:

    V̂(x, v) := V̂(x, v) + α e(x, t) δ(x_t)    ∀x ∈ X    (150)

Notes:
1. Reset the e's before each trial.
2. Decrease λ during training.
3. Basically the same for the Q function.
6.5 Relating Delayed-RL to DP Methods
    Delayed Reinf. Learning      Dynamic Programming
    works on-line                works off-line
    uses a subset of X           uses the entire state-space X
    can be model-free            requires a model
    time-varying                 stationary
There are two popular DP methods: value iteration and policy iteration. Both of these have spawned various dRL counterparts. There are two categories of delayed RL methods: model-based and model-free. Model-based methods have direct links to DP methods. Model-free methods are essentially modifications of the model-based methods.
6.6 Value Iteration Methods
Original DP Method: The basic idea is to compute the optimal value function as the limit of finite-horizon optimal value functions:

    V*(x) = lim_{n→∞} V_n(x).

To this end, we can use the recursion

    V_{n+1}(x) = max_{a∈A} [ r(x, a) + γ Σ_y P_xy(a) V_n(y) ]    ∀x

to compute V_N for a large number N. The policy associated with any given value function (say, V_n after an arbitrary number of iterations) is:

    π(x) = arg max_{a∈A} [ r(x, a) + γ Σ_y P_xy(a) V_n(y) ]    ∀x
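A small value-iteration sketch for an assumed finite problem; the reward and transition tables below are random placeholders, used only to show the recursion and the associated greedy policy.

% Value-iteration sketch for a small finite MDP (nx states, na actions).
nx = 5; na = 2; gamma = 0.9;
r = rand(nx, na);                        % assumed reward table r(x,a)
P = rand(nx, nx, na);                    % assumed transition model P_xy(a)
P = P ./ sum(P, 2);                      % normalize each row into a distribution

V = zeros(nx, 1);
for n = 1:200
    Q = zeros(nx, na);
    for a = 1:na
        Q(:,a) = r(:,a) + gamma * P(:,:,a) * V;   % one Bellman backup per action
    end
    Vnew = max(Q, [], 2);
    if max(abs(Vnew - V)) < 1e-8, V = Vnew; break; end
    V = Vnew;
end
[~, policy] = max(Q, [], 2);             % greedy policy w.r.t. the final V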
Problem: X can be very large, making value iteration prohibitively expensive.
Solution: We can update V_n(x) on a subset B_n ⊆ X of the state-space.
Real-Time Dynamic Programming: RTDP is a model-based dRL method that does this on-line (n is the system time), and B_n includes the temporal neighbors of x_n.
Q-Learning: this is a model-free dRL approach to value iteration. As described in section 3, the idea is to incorporate the system model into the approximation. Again the update rule for the Q-function is:

    Q̂(x, a, v) := r(x, a) + γ Σ_y P_xy(a) max_{b∈A(y)} Q̂(y, b, v).    (151)

However, this update relies on a system model. We can avoid this by looking only at the Q-function of the state that the system actually transitions to (say, x_1):

    Q̂(x, a, v) := r(x, a) + γ max_{b∈A(x_1)} Q̂(x_1, b, v).    (152)

To approximate the averaging effect found in equation 151, we can use an averaging update rule:

    Q̂(x, a, v) := (1 − α) Q̂(x, a, v) + α [ r(x, a) + γ max_{b∈A(x_1)} Q̂(x_1, b, v) ].    (153)

The Q-learning algorithm consists of repeatedly (1) choosing an action a ∈ A, (2) applying equation 153, and (3) updating the state x := x_1.
The choice of action in step (1) can be greedy with respect to Q̂, or can be made according to some stochastic policy influenced by Q̂, to allow for better exploration.
6.6.1 Policy Iteration Methods
Whereas value iteration methods maintain an explicit representation of the value function, and implement a policy only in association with the value function, policy iteration methods maintain an explicit representation of the policy.
Original DP Method: The idea is that if the Q-function for a policy π is known, then we can improve on it in the following way:

    π'(x) = arg max_{a∈A(x)} Q^π(x, a)    ∀x,    (154)

so that the new policy π' is at least as good as the old policy π. This is because Q^π(x, a) gives the value of first choosing action a and then following policy π. The steps of policy iteration are:
1. Start with a random policy π and compute V^π.
2. Compute Q^π.
3. Find π' using equation 154.
4. Compute V^{π'}.
5. Set π := π' and V^π := V^{π'}.
6. Go to step 2 until V^{π'} = V^π in step 4.
Problem: It is very expensive to recompute V^π at each iteration. The following method gives an inexpensive approximation.
Actor-Critic: This is a model-free dRL method (there isn't a model-based dRL counterpart to policy iteration). The "actor" is a stochastic policy generator, and the "critic" is a value-function estimator. The algorithm is as follows:
1. Select action a stochastically according to the current policy, a = π(x).
2. δ(x) = r(x, a) + γ V̂(x_1, v) − V̂(x, v) serves as the immediate reinforcement.
3. Update the value estimator: V̂(x, v) := V̂(x, v) + α δ(x).
4. Update the deterministic part of the policy generator: g_k(x, w) := g_k(x, w) + β δ(x), where k (such that a = a_k) is the index of the action selected.
5. Repeat.
6.6.2 Continuous Action Spaces
So far the assumption has been that there are only a finite number of actions for the policy generator to choose from. If we now consider the case of continuous action spaces, the problem becomes much more difficult. For example, even value-iteration methods must maintain a policy generator because the max operation becomes nontrivial. The general approach is to update the policy generator by taking the partial derivative of the Q-function (or Q̂) with respect to the action a. This is a gradient ascent approach, which adjusts the policy in the direction of increasing utility; it is comparable to the immediate RL methods for continuous action spaces.
7 Selected References on Neural Control
(Books)
1. Reinforcement Learning, R. Sutton, A. Barto, MIT Press, 1998.
2. Neural Networks for Control, edited by W.T Miller, R. Sutton, P. Werbos. MIT Press, 1991.
3. Handbook of intelligent control : fuzzy, neural, and adaptive approaches, edited by David A.
White, Donald A. Sofge, Van Nostrand Reinhold, 1992.
4. Neuro-dynamic programming, Dimitri P. Bertsekas and John N. Tsitsiklis, Athena Scientific, Belmont, 1997.
(Articles)
1. A. Barto, R. Sutton, A. Anderson. Neuronlike elements that can solve difficult learning control problems. IEEE Transactions Systems, Man, and Cybernetics, 40:201-211, 1981.
2. S. Keerthi, B. Ravindran. A tutorial Survey of Reinforcement Learning. Available by request
to ravi@chanakya.csa.iisc.ernet.in.
3. Narendra, K., and Parthasarathy, K. 1990. Identification and control of dynamic systems using neural networks. IEEE Trans. on Neural Networks, vol. 1, no. 1, pages 4-27.
4. Narendra, K, and Mukhopadhyay, S. Adaptive Control of Nonlinear Multivariable Systems
Using Neural Networks. Neural Networks, Vol. 7, No. 5., 1994.
5. Nguyen, D., and Widrow, B. 1989. The truck backer-upper: an example of self-learning in
neural networks, In Proceedings of the International Joint Conference on Neural Networks,
II, Washington, DC, pages 357-363.
6. Plumer, E. 1993. Optimal Terminal Control Using Feedforward Neural Networks. Ph.D.
dissertation. Stanford University.
7. G. Puskorius and L. Feldkamp. Neural control of Nonlinear Dynamic Systems with the
Kalman Filter Trained Recurrent Networks. IEEE Transactions on Neural Networks, Vol. 5.,
No. 2, 1994.
8. R. Sanner, J. Slotine. Gaussian Networks for Direct Adaptive Control. IEEE Transaction on
Neural Networks, Vol. 3., No. 6., Nov. 1992.
9. Williams, R., and Zipser D. 1989. A learning algorithm for continually running fully recurrent
neural networks. Neural Computations, vol. 1, pages 270-280.
10. Wan, E., and Beaufays, F. 1994. Network Reciprocity: A Simple Approach to Derive Gradient Algorithms for Arbitrary Neural Network Structures. WCNN'94, San Diego, CA., vol
III, June 6-9, pages 87-93.
8 Videos
(List of Videos shown in class)
1. ALVINN - Dean Pomerleau, CMU
2. Intelligent Arc Furnace - Neural Applications Corp.
3. Trucks and Brooms - Bernard Widrow, Stanford University
4. Artificial Fishes - Demitri Terzopolous, Toronto
5. Anthropomorphic Robot Experiments - Stefan Schaal, Georgia Tech
6. Catching - Slotine
Part IX
Fuzzy Logic & Control
1 Fuzzy Systems - Overview
Fuzzy methods are firmly grounded in the mathematics of fuzzy logic developed by Lotfi Zadeh (1965).
Very popular in Japan (great buzz-word). Used in many products as decision-making elements and controllers.
Allows an engineer to quickly develop a smooth controller from an intuitive understanding of the problem and engineering heuristics.
Are different from and complementary to neural networks.
Possible over-zealous application to unsuitable problems.
Typically the intuitive understanding is in the form of inference rules based on linguistic variables.
Rule based
An example might be a temperature controller: If the brake temperature is HOT and the speed is NOT VERY FAST, then brake SLIGHTLY DECREASED.
A mathematical way to represent vagueness.
Different from probability theory.
An easy method to design an adequate control system.
When to use Fuzzy Logic:
{ When one or more of the control variables are continuous.
{ When an adequate mathematical model of the system does not exist or is too expensive to use.
{ When the goal cannot be expressed easily in terms of equations.
{ When a good, but not necessarily optimal, solution is needed.
{ When the public is still fascinated with it!
2 Sample Commercial Applications
Fujitec / Toshiba - Elevator: Evaluates passenger traffic, reduces wait time.
Maruman Golf - Diagnoses golf swing to choose best clubs.
Sanyo Fisher / Canon - Camcorder focus control: Chooses best lighting and focus with many
subjects.
Matsushita - Washing Machine: Senses quality/quantity of dirt, load size, and fabric to adjust the cycle. Vacuum Cleaner: Senses floor condition and dust quantity to adjust the motor. Hot water heater: Adjusts the heating element by sensing water temperature and usage rate.
Mitsubishi - Air conditioner: Determines the optimum operating level to avoid on/off cycles.
Sony - Television : Adjusts screen brightness, color, contrast. Computer : Handwritten OCR
of 3000 Japanese characters.
Subaru - Auto Transmission: Senses driving style and engine load to select best gear ratio. (
Continuum of possible gears!)
Saturn - Fuzzy Transmission.
Yamaichi Securities - Stocks: Manage stock portfolios (manages $350 million).
3 Regular Set Theory
We define a 'set' as a collection of elements. Example:
A = fx1; x2; x3g
B = fx3; x4; x5g
We can thus say that a certain thing is an element of a set, or is a member of a set.
x1 2 A ; x3 2 B
We can also speak of `intersection' and `union' of sets.
A [ B = fx1; x2; x3; x4; x5g
A \ B = fx3 g
"Fuzzy people" call the set of all possible "things" (all X) the universe of discourse.
The complement is Ā = {x_i | x_i ∉ A}.
"Membership" is a binary operation: either x ∈ A or x ∉ A.
[Figure: Venn diagram of sets A and B within the universe X.]
4 Fuzzy Logic
In fuzzy logic we allow membership to occur in varying degrees.
Instead of listing elements, we must also define their degree of membership.
A membership function is specified for each set:

    f_A(x) : X → [0, 1]    (degree of membership)

X → the range of possible members or input values, typically a subset of R.
If f_A(x) ∈ {0, 1} then we have regular set theory.
Typically we describe a set simply by drawing the membership (activation) function.
[Figure: two overlapping membership functions A and B over the real line, with f_A(1.0) = 0.4 and f_B(1.0) = 0.1.]
1.0 'belongs' to sets A and B in differing amounts.
4.1 Definitions
Fuzzy logic as developed by Zadeh shows that all the normal set operations can be given precise definitions such that the normal properties of set theory still hold.

    Empty set:     A = ∅ if f_A(x) = 0 ∀x ∈ X
    Equality:      A = B if f_A(x) = f_B(x) ∀x ∈ X
    Complement:    A^c has f_{A^c}(x) = 1 − f_A(x)
    Containment (subset):  A ⊆ B if f_A(x) ≤ f_B(x) ∀x ∈ X
    Union:         f_{A∪B}(x) = max[f_A(x), f_B(x)] ∀x ∈ X
                   (the smallest set containing both A and B)
    Intersection:  f_{A∩B}(x) = min[f_A(x), f_B(x)] ∀x ∈ X
                   (the largest set contained in both A and B)
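A small sketch of these definitions on a sampled universe of discourse; the triangular membership functions below are assumed placeholders, and the last line anticipates the "very" hedge of Section 4.6.

% Fuzzy set operations on a sampled universe of discourse.
x   = linspace(0, 10, 101);                               % universe of discourse
tri = @(x,a,b,c) max(min((x-a)/(b-a), (c-x)/(c-b)), 0);   % triangular MF

fA = tri(x, 2, 4, 6);            % membership function of A
fB = tri(x, 4, 6, 8);            % membership function of B

fUnion = max(fA, fB);            % f_{A union B} = max(fA, fB)
fInter = min(fA, fB);            % f_{A intersect B} = min(fA, fB)
fComp  = 1 - fA;                 % complement of A
isSubset = all(fA <= fB);        % containment test: A contained in B?
fVeryA = fA.^2;                  % "very A" hedge (concentration, n = 2)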
4.2 Properties
A number of properties (set theoretical and algebraic) have been shown for Fuzzy Sets. A
few are :
Associative
(A \ B ) \ C = A \ (B \ C )
(A [ B ) [ C = A [ (B [ C )
Distributive
C \ (A [ B) = (C \ A) [ (C \ B)
C [ (A \ B ) = (C [ A) \ (C [ B)
De Morgan
(A [ B )c = Ac \ B c
(A \ B )c = Ac [ B c
4.3 Fuzzy Relations
We can also have Fuzzy relations of multiple variables.
In regular set theory, a 'relation' is a set of ordered pairs, e.g.

    A = {(x, y) : x ≥ y} = {(5, 4), (2, 1), (3, 0), ... etc.}

We can have fuzzy relations by assigning a 'degree of correctness' to the logical statement and defining membership functions in terms of two variables:

    f_A(10, 9) = 0,   f_A(100, 10) = 0.7,   f_A(100, 1) = 1.0
4.4 Boolean Functions
In a real problem, x would represent a real variable (temperature) and each set a linguistic premise (room is cold).
For example, let t and P be variables representing actual temperature and pressure.
{ Let A be the premise "temperature is hot".
{ Let B be "pressure is normal".
{ Assume the following membership functions.
[Figure: membership functions f_A(t) over temperature and f_B(p) over pressure.]
"Temperature hot AND pressure normal" is a fuzzy relation given by the set

    {A AND B} = C_1 = {A ∩ B},    f_{C_1}(t, P) = min[f_A(t), f_B(P)]    ∀t, P

For OR,

    {A OR B} = C_2 = {A ∪ B},     f_{C_2}(t, P) = max[f_A(t), f_B(P)]    ∀t, P

For NOT,

    {NOT A} = C_3 = A^c,          f_{C_3}(t) = 1 − f_A(t)    ∀t

Note, the resulting fuzzy relation is also a fuzzy set defined by a membership function.
4.5 Other Definitions
Support, cross-over point, and singleton.
[Figure: a membership function showing its support, the cross-over point (membership 0.5), and a singleton at a single value of x.]
4.6 Hedges
Modifiers: not, very, extremely, about, generally, somewhat...
A hedge modifies a membership function, e.g. U_A ← U_A[X]^n with n > 1 for "very".
[Figure: membership functions for "tall", "very tall", "definitely tall", and "somewhat tall" plotted over heights from roughly 4 to 6 feet.]
5 Fuzzy Systems
Fuzzifier
{ Each sensor input is compared to a set of possible linguistic variables to determine membership.
{ Provides "crisp" values.
[Figure: a crisp reading of 22 degrees is fuzzified to membership 0.25 in Warm and 0.67 in Cool.]
Process Rules (Fuzzy Logic)
{ Each rule has an antecedent statement with an associated fuzzy relation. Degree of
membership for antecedent is computed. - Fuzzy Logic.
{ IF (ANTECEDENT) THEN (CONSEQUENCE).
Defuzzifier
{ Each rule has an associated action (consequence) which is also a fuzzy set or linguistic variable, such as 'turn right slightly'.
{ Defuzzification creates a combined action fuzzy set from the output sets of each rule, each weighted by the degree of membership of its antecedent.
{ Converts the output fuzzy set to a single control value.
{ This defuzzification is not part of the 'mathematical fuzzy logic' and various strategies are possible.
{ We introduce several defuzzification possibilities later.
Design Process - Control designer must do three things:
1. Decide on appropriate linguistic quantities for inputs and outputs.
2. Define a membership function (fuzzy set) for each linguistic quantity.
3. Define the inference rules.
5.1 A simple example of process control.
Sensor inputs: 1. Temperature, 2. Pressure.
Actuator outputs: 1. Valve setting.
We must first decide on linguistic variables and membership functions for the fuzzy sets.
Next, we choose some rules of inference:

    if P = H and t = C then V = G    (1)
    if P = H and t = W then V = L    (2)
    if P = N then V = Z              (3)

This is the same as

    if P = N and "1" then V = Z

"Karnaugh Map"
Fuzzify - Infer - Defuzzify
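A minimal end-to-end sketch of the fuzzify / infer / defuzzify steps for a two-input, one-output controller of this type. All membership functions, units, and rule consequents below are assumed placeholders, not values from the notes; min is used for AND, max for composition, and the centroid for defuzzification (as discussed in the next section).

% Two-input (pressure P, temperature t) fuzzy controller sketch.
tri = @(x,a,b,c) max(min((x-a)/(b-a), (c-x)/(c-b)), 0);

v = linspace(-1, 1, 201);            % output universe (valve change)
fG = tri(v,  0.2,  0.6,  1.0);       % assumed output set G (open more)
fZ = tri(v, -0.3,  0.0,  0.3);       % assumed output set Z (no change)
fL = tri(v, -1.0, -0.6, -0.2);       % assumed output set L (close a little)

P = 7.2; t = 35;                     % crisp sensor readings (assumed units)

% Fuzzify: degrees of membership of the inputs in each linguistic set
P_H = tri(P, 5, 10, 15);  P_N = tri(P, 0, 5, 10);
t_C = tri(t, 0, 20, 40);  t_W = tri(t, 20, 40, 60);

% Infer: antecedent degree (min for AND), clip each rule's output set
w1 = min(P_H, t_C);  out1 = min(w1, fG);     % rule 1: P=H and t=C -> V=G
w2 = min(P_H, t_W);  out2 = min(w2, fL);     % rule 2: P=H and t=W -> V=L
w3 = P_N;            out3 = min(w3, fZ);     % rule 3: P=N -> V=Z

% Compose (max) and defuzzify (center of mass)
fOut = max(max(out1, out2), out3);
v0 = sum(v .* fOut) / sum(fOut);             % crisp valve command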
5.2 Some Strategies for Defuzzification
Center of Mass
If B* is the composite output fuzzy set,

    y_0 = ∫ y f_{B*}(y) dy / ∫ f_{B*}(y) dy

This is probably the best solution, and the best motivated from fuzzy logic, but is hard to compute.
Peak Value

    y_0 = arg max_y f_{B*}(y)

This may not be unique.
Weighted Maximums
For each rule there is a B_i and an activation degree w_i for the antecedent:

    y_0 = Σ_i w_i max_y f_{B_i}(y) / Σ_i w_i

Can also have the outputs be different functions of the original inputs:

    y_0 = Σ_i w_i φ_i(x_1 ... x_n) / Σ_i w_i
5.3 Variations on Inference
Inference
{ Mamdani - minimum
{ Larson - product
Composition
{ Maximum (traditional method)
{ Sum (weighted and/or normalized)
5.4 7 Rules for the Broom Balancer
5.5 Summary Comments on Basic Fuzzy Systems
Easy to implement (if the number of inputs is small).
Based on rules which may relate directly to understanding of problem.
Produces smooth mapping automatically.
May not be optimal.
Based on designer's heuristic understanding of the solution.
Designer may have no idea what inference rules may be.
Extremely tedious for large inputs.
Often a conventional controller will work better.
Cautions: There is a lot of over-zealousness about fuzzy logic (buzz-word hype). There is nothing magical about these systems. If a conventional controller works well, don't toss it out.
6 Adaptive Fuzzy Systems
Adaptive fuzzy systems use various ad hoc techniques which attempt to:
Modify shape and location of fuzzy sets.
Combine or delete unnecessary sets and rules.
Use unsupervised clustering techniques.
Use \neural" techniques to adapt various parameters of the fuzzy system.
6.1 Fuzzy Clustering
6.1.1 Fuzzy logic viewed as a mapping
[Figure: fuzzy rules viewed as patches covering a mapping from X to Y.]
Example: If X is NS then Y is PS.
6.1.2 Adaptive Product Space Clustering
[Figure: data clusters in the X-Y product space, each covered by a membership-function patch.]
Vector quantization - unsupervised learning.
Widths defined by covariance.
Gaussian membership functions ⟹ a radial basis function network.
If X1 is A1i and X2 is A2i then Y is Bi (adds a 3rd dimension).
Kosko - optimal rules cover the extremes.
[Figure: rule patches along the diagonal of the X-Y space, with additional patches at the extremes.]
6.2 Additive Model with weights (Kosko)
[Figure: additive fuzzy model. Each fired rule contributes its output fuzzy set, scaled by a weight w_i; the contributions are summed (no max) into a composite fuzzy set B, which is then defuzzified.]
Can also set the weights w_i equal to w_i = k_i / k, where k_i is the cluster count in cell i, and k is the total count.
If we threshold this to w_i ∈ {0, 1}, it amounts to adding or removing rules.
Additive fuzzy models are "universal approximators".
6.3 Example Ad-Hoc Adaptive Fuzzy
Adaptive Weights
Weights are initially set to 1 and then adapted. The truth of the premise (0 to 1) is reduced as w → 0.
Adaptation can be supervised (like BP) or use some reinforcement method (punish/reward weights based on the frequency of rules that fire).
Adaptive membership function support
Example:
If e = y − d is positive, the system is over-reacting; thus narrow the regions of the rules that fired.
If e is negative, widen the regions.
(Restrict the amount of overlap to avoid instabilities.)
6.4 Neuro-Fuzzy Systems
Make fuzzy systems look like a neural net and then apply any of the learning algorithms (BP, RTBP, etc.).
6.4.1 Example: Sugeno Fuzzy Model
If X is A and Y is B then Z = f(x, y).
Layer 1 - MFs:

    MF_{A_i}(x) = 1 / ( 1 + [((x − c_i)/a_i)^2]^{b_i} )

⟹ a bell-shaped function parameterized by a, b, c.
Layer 2 - AND - product or soft min:

    soft-min(x, y) = (x e^{−kx} + y e^{−ky}) / (e^{−kx} + e^{−ky})

Layer 3 - weight normalization.
Layer 4 - w̄_i (p_i x + q_i y + r_i).
Layer 5 - Σ_i w̄_i f_i.
The network is parameterized by a_i, b_i, c_i, p_i, q_i, r_i. Train using gradient descent.
In general, replace hard max/min by soft max/min and use BP, RTRL, BPTT, etc.
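A sketch of one forward pass through this layered (ANFIS-style) structure for two rules; all parameter values are assumed placeholders, and a plain product is used for the AND in place of the soft-min.

% Sugeno fuzzy model forward pass: two inputs (x,y), two rules.
bell = @(x,a,b,c) 1 ./ (1 + ((x - c)./a).^(2*b));   % layer 1: bell MF

x = 0.3; y = -0.2;
A = [1.0 2 0.0; 1.0 2 1.0];      % assumed premise parameters [a b c] for x
B = [1.0 2 0.0; 1.0 2 1.0];      % assumed premise parameters [a b c] for y
C = [ 1.0  0.5 0.1;              % assumed consequent parameters [p q r]
     -0.8  0.2 0.3];

w = zeros(2,1);
for i = 1:2
    muA = bell(x, A(i,1), A(i,2), A(i,3));
    muB = bell(y, B(i,1), B(i,2), B(i,3));
    w(i) = muA * muB;                        % layer 2: product AND
end
wbar = w / sum(w);                           % layer 3: normalization
f = C(:,1)*x + C(:,2)*y + C(:,3);            % rule outputs f_i = p_i x + q_i y + r_i
z = sum(wbar .* f);                          % layers 4-5: weighted sum output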
6.4.2 Another Simple Example
[Figure: input x1 fuzzified by overlapping triangular membership functions with centers a_{11} < a_{12} < ... < a_{1m}.]
Membership functions:

    U_1i = (x_1 − a_{1(i−1)}) / (a_{1i} − a_{1(i−1)})    if a_{1(i−1)} < x_1 < a_{1i}
         = (a_{1(i+1)} − x_1) / (a_{1(i+1)} − a_{1i})    if a_{1i} ≤ x_1 ≤ a_{1(i+1)}
         = 0                                              otherwise       (155)

Rules: if U_1i AND U_2j then ....
Constraint: a_{11} < a_{12} < ··· < a_{1m}
AND - product rule.
Defuzzification - weighted average:

    Y = Σ_{ij} (U_{1i} U_{2j}) w_{ij}

Gradient descent on the error:

    Error = (1/2)(Y_k − d_k)^2
    ∂E/∂w_{ij} = Σ_k (Y_k − d_k)(U_{1i} U_{2j})
    ∂E/∂a_{1i} = Σ_j Σ_k (Y_k − d_k) Σ_{l=i−1}^{i+1} (∂U_{1l}/∂a_{1i}) w_{lj}
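A minimal sketch of the gradient-descent update for the rule weights w_ij above; the membership-function centers are held fixed here, and the target function and parameters are assumed placeholders.

% Gradient descent on the weights of the weighted-average defuzzifier.
a1 = 0:3; a2 = 0:3;                      % assumed MF centers (unit spacing)
m = numel(a1); n = numel(a2);
W = zeros(m, n);                         % rule weights w_ij
eta = 0.1;

for iter = 1:5000
    x1 = 3*rand; x2 = 3*rand;            % assumed training input
    d  = sin(x1) + 0.5*x2;               % assumed target function
    U1 = max(1 - abs(x1 - a1), 0);       % triangular memberships of x1
    U2 = max(1 - abs(x2 - a2), 0);       % triangular memberships of x2
    G  = U1' * U2;                       % firing strengths U1i * U2j
    Y  = sum(sum(G .* W));               % output Y = sum_ij (U1i U2j) w_ij
    W  = W - eta * (Y - d) * G;          % dE/dw_ij = (Y - d) U1i U2j
end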
7 Neural & Fuzzy
In 1991 Matsushita Electronics claimed to have 14 consumer products using both neural networks
and fuzzy logic.
7.1 NN as design Tool
Neural Net outputs center locations and width of membership functions.
7.2 NN for pre-processing
Output of Neural Net is "Predicted Mean Vote" - PMV, and is meant to be an index of
comfort.
This output is fed as control input to a fuzzy controller.
Example: The input to the NNet is an image. The output of the net is a classification of stop sign / no stop sign, which goes as input to the fuzzy logic.
7.3 NN corrective type
Example: Hitachi Washing Machine, Sanyo Microwave.
Fuzzy designed on original inputs.
NN added later to handle additional inputs and avoid redesigning with too many inputs and
membership functions.
7.4 NN for post-processing
[Block diagram: Input → Fuzzy Logic (high-level) → NN Control (low-level control).]
Fuzzy for higher level logic and NN for low level control.
Example: Fuzzy determines high-level flight commands, and the neural network implements them.
Example: Sanyo fan - Points to holder of controller. Fuzzy - 3 infrared sensors used to
compute distance. Neural - given distance & ratio of sensor outputs are used to command
direction.
7.5 Fuzzy Control and NN system ID
[Block diagram: Input → Fuzzy Logic controller → Plant, with an NN system-identification model in parallel providing the error signal.]
Use NNet for system ID which provides error (through BP) to adapt Fuzzy controller.
8 Fuzzy Control Examples
8.1 Intelligent Cruise Control with Fuzzy Logic
R. Muller, G. Nocker, Daimler-Benz AG.
8.2 Fuzzy Logic Anti-Lock Brake System for a Limited Range Coefficient of Friction Surface
D. Madau, F. Yuan, L. Davis, L. Feldkamp, Research Laboratory, Ford Motor Company.
9 Appendix - Fuzziness vs Randomness
(E. Plumer)
Fuzzy logic seems to have the same "flavor" as probability theory. Both attempt to describe uncertainty in certain events. Fundamentally, however, they are very different. For an excellent discussion on the differences and implications thereof, read (B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Learning, Prentice-Hall, 1992, chapter 7). It would seem that we can make the following identifications and use probability theory to describe fuzzy logic:
    Fuzzy Logic                   ?    Probability Theory
    fuzzy set                     <->  random variable
    membership function, f_A(x)   <->  probability density, p_A(x)
    element of fuzzy set          <->  value of random variable
As we shall see, this won't work. Before waxing philosophical on the differences, we note a few problems with using probability theory. Recalling the basics of probability theory (see M. Gray and L. Davisson, Random Processes: A Mathematical Approach for Engineers, Prentice-Hall, 1986), we define a probability space (Ω, F, P) as having three parts:
1. Sample Space, Ω - a collection of points or members. Also called "elementary outcomes."
2. Event Space, F - a nonempty collection of subsets of Ω. Also called a sigma algebra of Ω. This set must obey the following properties: (1) if F ∈ F, then F^c ∈ F, where F^c = {ω : ω ∈ Ω, ω ∉ F} is the complement of F; (2) for a finite (or countably infinite) number of sets F_i, we have ∪_i F_i ∈ F.
3. Probability Measure, P - an assignment of a real number P(F) to every member of the sigma algebra such that P obeys the following axioms of probability: (1) for all F ∈ F, P(F) ≥ 0; (2) for disjoint sets F_i (i.e., i ≠ j implies F_i ∩ F_j = ∅), we have P(∪_i F_i) = Σ_i P(F_i).
A random variable is simply a mapping from the sample space to the real numbers, X : Ω → R. These properties must hold in order to have a probability space and to be able to apply probability theory.
Thus, a probability space has at its root a set which obeys regular set theory. Built upon this set is the probability measure describing the "randomness." On the other hand, in fuzzy logic, the basic notion of a set and of all set operations and set properties is redefined. Furthermore, fuzzy sets and membership functions do not satisfy the axioms of probability theory. As an example, recall that for a probability density function p_X(x), we must have 0 ≤ p_X(x) and ∫_x p_X(x) dx = 1. However, for membership functions, although 0 ≤ f_A(x) ≤ 1, the value of ∫_x f_A(x) dx can be anything! Also, consider this very profound statement: A ∩ A^c = ∅, which states that a set and its complement are mutually exclusive. This certainly holds true for the event space underlying a probability space. However, in fuzzy logic this is no longer true! A ∩ A^c has membership function f_{A∩A^c}(x) = min[f_A(x), 1 − f_A(x)], which is clearly not identically zero. Moral: probability theory is not adequate to describe fuzzy logic.
We can also provide a few intuitive reasons why \randomness" and \fuzziness" are not the same
thing. To quote Kosko:
Fuzziness describes event ambiguity. It measures the degree to which an event occurs,
not whether it occurs. Randomness describes the uncertainty of event occurrence. An
event occurs or not, . . . The issue concerns the occurring event: Is it uncertain in any
way? Can we unambiguously distinguish the event from its opposite? [pp. 265].
Basically probability theory accounts for an event occurring or not occurring but doesn't permit
an event to partially occur.
Underlying probability theory is the abstract idea of some type of \coin toss experiment" which
chooses one of the elements of F as the event. The measure P describes the likelihood of any
particular event occurring and the random variable X maps the event to some output number x.
In fuzzy logic, there is no randomness, no notion of an underlying experimental coin toss, and no
uncertainty of the outcome.
A further contrasting can be made as follows: In fuzzy logic, each real value (such as T = 50 C )
is associated, in varying degrees, with one or more possible fuzzy sets or linguistic variables (such
as \T is hot" or \T is very hot"). The membership function measures the degree to which each
fuzzy set is an \attribute" of the real value. In probability theory, on the other hand, the roles are
reversed. A real value does not take on a random variable, rather, a random variable takes on one
of a number of possible real values depending on the outcome of the underlying experiment.
Beyond simply describing an approach to designing control systems, "fuzziness" proposes a completely different way of explaining uncertainty in the world. As Kosko points out in his book, for example:
Does quantum mechanics deal with the probability that an unambiguous electron
occupies spacetime points? Or does it deal with the degree to which an electron, or an
electron smear, occurs at spacetime points? [pp. 266].
More fun philosophical examples are given in the rest of chapter 7 of Kosko's book.
Part X
Exercises and Simulator
Home work 1 - Continuous Classical Control
Home work 2 - Digital Classical Control
Home work 3 - State-Space
Home work 4 - Double Pendulum Simulator - Linear SS Control
Midterm
Double Pendulum Simulator - Neural Control/BPTT
Double Pendulum Simulator Documents
0.1 System description for double pendulum on a cart
[Figure: double pendulum on a cart; cart position x, control force u, lower stick angle θ1, upper stick angle θ2.]
System parameters

    l1 = length of first stick               (156)
    l2 = length of second stick              (157)
    m1 = mass of first stick                 (158)
    m2 = mass of second stick                (159)
    M  = mass of cart                        (160)

States

    x    = position of the cart              (161)
    ẋ    = velocity of the cart              (162)
    θ1   = angle of first stick              (163)
    θ̇1   = angular velocity of first stick   (164)
    θ2   = angle of second stick             (165)
    θ̇2   = angular velocity of second stick  (166)

Control force

    u = force                                (167)
Dynamics
Using a Lagrange approach, the dynamic equations can be written as:

    (M + m1 + m2) ẍ − (m1 + 2 m2) l1 θ̈1 cos θ1 − m2 l2 θ̈2 cos θ2
        = u + (m1 + 2 m2) l1 θ̇1^2 sin θ1 + m2 l2 θ̇2^2 sin θ2

    −(m1 + 2 m2) l1 ẍ cos θ1 + 4 (m1/3 + m2) l1^2 θ̈1 + 2 m2 l1 l2 θ̈2 cos(θ2 − θ1)
        = (m1 + 2 m2) g l1 sin θ1 + 2 m2 l1 l2 θ̇2^2 sin(θ2 − θ1)

    −m2 ẍ l2 cos θ2 + 2 m2 l1 l2 θ̈1 cos(θ2 − θ1) + (4/3) m2 l2^2 θ̈2
        = m2 g l2 sin θ2 − 2 m2 l1 l2 θ̇1^2 sin(θ2 − θ1)

The above equations can be solved for ẍ, θ̈1, and θ̈2 to form a state-space representation.
0.2 MATLAB les
MATLAB scripts for the simulation of the double inverted pendulum
dbroom.m:
Main scripts that designs and graphically implements linear control of
the double pendulum. Just type "dbroom" from MATLAB to run it. The
script can be edited to change initial conditions and model
specifications. The script designs the control law for the current
model specifications. Functions from the controls toolbox are
necessary.
Functions called by dbroom.m
dbroom_figs.m - Plots responses of simulation
dbroom_init.m - Initializes graphics windows
dbroom_lin.m - Linearizes the nonlinear model for given parameters
dbroom_plot.m
- Graphic plot of system for current state
dbroom_sim.m
- Nonlinear state-space simulation of double pendulum
dbroom_view.m
- Graphic prompt for viewing figures
broom.ps : system definitions and dynamics.
Eric A. Wan
ericwan@ece.ogi
Oregon Graduate Institute
Snapshot of MATLAB window:
% Double Inverted Pendulum with Linear Controller
%
% Eric Wan, Mahesh Kumashikar, Ravi Sharma
% (ericwan@eeap.ogi.edu)
% PENDULUM MODEL SPECIFICATIONS
Length_bottom_pendulum  = .5;
Length_top_pendulum     = 0.75;
Mass_of_cart            = 1.5;
Mass_of_bottom_pendulum = 0.5;
Mass__top_pendulum      = 0.75;

pmodel = [Length_bottom_pendulum Length_top_pendulum Mass_of_cart ...
          Mass_of_bottom_pendulum Mass__top_pendulum];
% INITIAL STATE
Cart_position = 0;
Cart_velocity = 0;
Bottom_pendulum_angle    = 20;   % degrees
Bottom_pendulum_velocity = 0;
Top_pendulum_angle       = 20;   % degrees
Top_pendulum_velocity    = 0;
% +- 9 max

% Random starting angles for fun
%Bottom_pendulum_angle = 15*(2*(rand(1)-.5));
%Top_pendulum_angle    = 15*(2*(rand(1)-.5));

u = 0; % initial control force
deg = pi/180;
x0 = [Cart_position Cart_velocity Bottom_pendulum_angle*deg ...
      Bottom_pendulum_velocity*deg Top_pendulum_angle*deg Top_pendulum_velocity*deg];
% TIME SPECIFICATIONS
final_time = 10;
% seconds
dt= 0.01;
% Time for simulation step
DSample = 2;
% update control every Sample*dt seconds
T = dt:dt:final_time;
steps = length(T);
plotstep = 4;
% update graphics every plotstep time steps
%To record positions and control force.
X = zeros(6,steps);
U = zeros(1,steps);
% setup beginning of demo
dbroom_init
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% DESIGN LINEAR CONTROLLER
% Obtain a linear state model
[A,B,C,D] = dbroom_lin(pmodel,0.001);
% form discrete model
[Phi,Gam] = c2d(A,B,DSample*dt);
%Design LQR
Q = diag([50 1 100 20 120 20]);
K = dlqr(Phi,Gam,Q,1);
% Closed loop poles
% p = eig(Phi - Gam*K)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Simulate controlled pendulum
% Plots initial pendulum state
fall = dbroom_plot(x0,u,0,f1,1,pmodel);
drawnow;
fprintf('\n Hit Return to begin simulation \n')
pause
x = x0'; i = 1;
while (i<steps) & (fall ~= 1),
i = i+1;
% Update Control force
if rem(i,DSample) <= eps,
u = -K*x;
end
% Apply control once in DSample time steps.
% Linear control law
% UPDATE STATE
% state simulation based on Lagrange method
x = dbroom_sim(x,u,pmodel,dt);
% PLOT EVERY plotstep TIMESTEPS
if rem(i,plotstep) <= eps
fall = dbroom_plot(x,u,i*dt,f1,0,pmodel);
end
% SAVE VALUES
X(:,i) = x;
U(:,i) = u;
end
% graph responses
dbroom_view