Nicole Rogers
Known as the ‘law of disorder.’
Entropy is a measurement of uncertainty associated with a random variable.
Measures the ‘multiplicity’ associated with the state of objects.
Thermodynamic entropy is related to Shannon entropy by scaling it with the Boltzmann constant.
Shannon Entropy measures how undetermined the state of a system is.
The higher the Shannon Entropy, the more undetermined the system is.
Let’s use the example of a dog race.
Four dogs have various chances of winning the race.
If we apply the entropy equation:
$H = -\sum_i P_i \log_2 P_i$ (logarithms are base 2)

Racers      Chance to Win (P)   -log(P)   -P log(P)
Fido        0.08                3.64      0.29
Ms Fluff    0.17                2.56      0.43
Spike       0.25                2.00      0.50
Woofers     0.50                1.00      0.50
H = 0.29 + 0.43 + 0.50 + 0.50 = 1.72
The Shannon Entropy is 1.72.
If you add the chance of each dog to win, the total will be one.
This is because the chances are normalized: together they form a probability distribution over the four possible winners.
The more uncertain a situation, the higher the Shannon entropy.
This will be demonstrated in the next example.
Racers      Chance to Win (P)   -log(P)   -P log(P)
Fido        0.25                2.00      0.50
Ms Fluff    0.25                2.00      0.50
Spike       0.25                2.00      0.50
Woofers     0.25                2.00      0.50

$H = -\sum_i P_i \log_2 P_i = 0.50 + 0.50 + 0.50 + 0.50 = 2.00$
With every outcome completely uncertain, the Shannon Entropy will be 2.0.
Racers      Chance to Win (P)   -log(P)   -P log(P)
Fido        0.01                6.64      0.07
Ms Fluff    0.01                6.64      0.07
Spike       0.01                6.64      0.07
Woofers     0.97                0.0439    0.04

$H = -\sum_i P_i \log_2 P_i = 0.07 + 0.07 + 0.07 + 0.04 = 0.25$
With the situation fairly certain, the Shannon Entropy will be 0.25.
High Uncertainty: H = 2.00
Fair Uncertainty: H = 1.72
Low Uncertainty: H = 0.25
The more uncertain the situation, the higher the entropy; in this sense, entropy is a measurement of chaos.
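The three dog-race cases can be checked with a few lines of Python. This is a minimal sketch, assuming base-2 logarithms (which is what the -log(P) columns in the tables imply); the function name `shannon_entropy` and the dictionary labels are illustrative choices, not taken from the slides.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Win probabilities for Fido, Ms Fluff, Spike, and Woofers in each scenario.
races = {
    "high uncertainty": [0.25, 0.25, 0.25, 0.25],
    "fair uncertainty": [0.08, 0.17, 0.25, 0.50],
    "low uncertainty": [0.01, 0.01, 0.01, 0.97],
}

entropies = {label: shannon_entropy(probs) for label, probs in races.items()}
for label, h in entropies.items():
    print(f"{label}: H = {h:.4f}")
```

Without intermediate rounding the fair and low cases come out near 1.726 and 0.242. The 1.72 and 0.25 quoted above come from adding terms that were already rounded to two decimal places.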
The principle of maximum entropy states that, subject to precisely stated prior data (which must be a proposition that expresses testable information), the probability distribution that best represents the current state of knowledge is the one with the largest information-theoretic entropy.
In most practical cases, the stated prior data or testable information is given by a set of conserved quantities associated with the probability distribution in question.
We use the Lagrange method to help us solve this.
In mathematical optimization, the method of Lagrange multipliers provides a strategy for finding the local maxima and minima of a function subject to equality constraints.
Here the Lagrange method is used to find the probability distribution with maximum entropy.
The first of these equations is a normalization constraint: all of the probabilities must sum to 1.
The second equation is a general constraint. We will see more of what this is in the next example.
Since Lagrange Method assumes maximum entropy, we can say:
Maximizing L with respect to each of the p(A_i) is done by differentiating L with respect to one of the p(A_i) while keeping α, β, and all other p(A_i) constant. The result is:
Rearranging the equation, we can get:
Where f( β )=0 because . Using this method, we can solve equations with minimum constraints.
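For reference, the steps described above can be written out as follows. This is a sketch of the usual Lagrange-multiplier treatment with base-2 logarithms, not a transcription of the slides: $S$ (the entropy), $g(A_i)$ (the constrained quantity for state $A_i$), and $G$ (its required average) are assumed symbol names.

```latex
\begin{align*}
1 &= \sum_i p(A_i)
  && \text{(normalization constraint)} \\
G &= \sum_i p(A_i)\, g(A_i)
  && \text{(general constraint)} \\
S &= -\sum_i p(A_i) \log_2 p(A_i)
  && \text{(entropy to maximize)} \\
L &= S - (\alpha - \log_2 e)\Big(\sum_i p(A_i) - 1\Big)
       - \beta\Big(\sum_i g(A_i)\, p(A_i) - G\Big) \\
\frac{\partial L}{\partial p(A_i)} = 0
  \;&\Rightarrow\; \log_2 \frac{1}{p(A_i)} = \alpha + \beta\, g(A_i) \\
p(A_i) &= 2^{-\alpha}\, 2^{-\beta g(A_i)} \\
\alpha &= \log_2 \sum_i 2^{-\beta g(A_i)} \\
f(\beta) &= \sum_i \big(g(A_i) - G\big)\, 2^{-\beta (g(A_i) - G)} = 0
\end{align*}
```

Writing the normalization multiplier as $\alpha - \log_2 e$ is a convenience: the $\log_2 e$ term produced by differentiating $S$ then cancels, which leaves the compact form for $p(A_i)$.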
Food       Price
Burger     $1.00
Chicken    $2.00
Fish       $3.00
Tofu       $8.00
A fast food restaurant sells four types of product. They find that the average amount of money made for each purchase is $2.50. The products are chosen by the consumer based on price alone, and not preference. What is the probability of purchase for each of these four foods?
We know that:
Applying Lagrange Method:
Entropy is the largest, subject to the constraints, if
Where
A zero-finding program was used to find the variables in these equations. The results were:
Food       Probability of Purchase
Burger     0.3546
Chicken    0.2964
Fish       0.2477
Tofu       0.1011
0.3546+0.2964+0.2477+0.1011 = 0.9998
This rounds to one, and therefore is normalized.
The Lagrange method and maximum entropy can determine probabilities using only a small set of constraints. This answer makes sense because the resulting probabilities are consistent with the price constraint: they give an average purchase of about $2.50.
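The zero-finding step can be sketched in Python. This is an illustration under assumptions, not the program used for the slides: it takes the maximum-entropy solution to have the form p_i = 2^(-α) · 2^(-β·price_i), finds β by bisection on f(β) = Σ (price_i − average) · 2^(−β(price_i − average)), and the function name `max_entropy_probs` and the search bracket [0, 10] are hypothetical choices.

```python
import math

def max_entropy_probs(prices, average, lo=0.0, hi=10.0, tol=1e-12):
    """Return (alpha, beta, probs) for the maximum-entropy distribution
    whose expected price equals `average`. Bisection is an assumed method."""
    def f(beta):
        return sum((g - average) * 2.0 ** (-beta * (g - average)) for g in prices)

    if f(lo) * f(hi) > 0:
        raise ValueError("f(beta) does not change sign on the search bracket")

    # Halve the bracket until beta is pinned down to within `tol`.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)

    weights = [2.0 ** (-beta * g) for g in prices]
    total = sum(weights)
    alpha = math.log2(total)              # makes the probabilities sum to 1
    probs = [w / total for w in weights]
    return alpha, beta, probs

prices = [1.00, 2.00, 3.00, 8.00]         # burger, chicken, fish, tofu
alpha, beta, probs = max_entropy_probs(prices, average=2.50)
for name, p in zip(["Burger", "Chicken", "Fish", "Tofu"], probs):
    print(f"{name}: {p:.4f}")
```

The probabilities printed here agree with the table above to about three decimal places, and their price-weighted average reproduces the $2.50 constraint.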
Only by assuming maximum entropy are we able to evaluate these equations.
Since this example is evaluated on price alone, the burger is chosen most frequently because it is the cheapest. The probabilities are lower for the more expensive items, as the results indicate.
When the number of possible outcomes increases, so does the maximum possible entropy.
Because we only had four foods, the maximum entropy is lower than it would have been with five.
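A quick way to see this: with no constraint other than normalization, entropy is largest when all n outcomes are equally likely, which gives log2(n) bits. The snippet below is a small sketch of that upper bound; it ignores the price constraint from the example, and `max_entropy_bits` is an illustrative name.

```python
import math

def max_entropy_bits(n):
    """Entropy in bits of n equally likely outcomes, the largest value possible for n outcomes."""
    return math.log2(n)

four = max_entropy_bits(4)
five = max_entropy_bits(5)
print(f"4 outcomes: {four:.2f} bits, 5 outcomes: {five:.2f} bits")
```

Four outcomes give 2.00 bits, matching the equal-chances dog race, while five outcomes give about 2.32 bits.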
The Fourier transform is a mathematical operation with many applications in physics and engineering that expresses a mathematical function of time as a function of frequency. The function is represented as a combination of sine and cosine functions.
Fourier transforms and maximum entropy can both be utilized to find the specific frequencies of a sine/cosine wave.
x(i)=dsin(twopi*2.d0*t)
x(i)=x(i)+dsin(twopi*3.d0*t)
x(i)=5.d0+x(i)+dsin(twopi*3.2d0*t)
[Graphs: Num=30, Num=90, Num=150]
Since we were looking for 2.0π, 3.0π, and 3.2π in our sine and cosine waves, maximum entropy was consistently better at determining these numbers on the graphs.
Maximum entropy works better than the Fourier method over the range of 30 to 150 data points. This is because it calculates an average using a small amount of data.
If the amount of data were dramatically increased, the Fourier method would work better.
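The Fourier side of this comparison can be sketched in Python using the same test signal. The slides do not give the sampling interval, so the 10-second duration used here is an assumption, chosen so that 2.0, 3.0, and 3.2 fall exactly on frequency bins. The plain discrete Fourier transform below stands in for whatever routine produced the graphs, and the maximum entropy spectral estimate is not reproduced.

```python
import cmath
import math

def make_signal(num, duration):
    """Sample the test signal at `num` evenly spaced times in [0, duration)."""
    two_pi = 2.0 * math.pi
    samples = []
    for i in range(num):
        t = duration * i / num
        x = math.sin(two_pi * 2.0 * t)
        x = x + math.sin(two_pi * 3.0 * t)
        x = 5.0 + x + math.sin(two_pi * 3.2 * t)
        samples.append(x)
    return samples

def dft_magnitudes(samples):
    """Magnitudes of the discrete Fourier transform for bins 0 .. n/2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def peak_frequencies(samples, duration, count=3):
    """Frequencies of the `count` largest bins, skipping bin 0 (the constant offset)."""
    mags = dft_magnitudes(samples)
    bins = sorted(range(1, len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k / duration for k in bins)

signal = make_signal(num=150, duration=10.0)   # duration is an assumed value
peaks = peak_frequencies(signal, duration=10.0)
print(peaks)
```

With this sampling the three largest bins sit at 2.0, 3.0, and 3.2 cycles per unit time. A shorter record spaces the bins further apart, so 3.0 and 3.2 become harder to separate.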