
Lecture 2: Basic Math, Part 1

Mathematical background
Part 1
Dr. Alberto Testolin
Department of General Psychology and
Department of Mathematics
University of Padova
Lecture overview
• Why do psychologists need math?
• Notions of graph theory
• Notions of probability theory
• Notions of linear algebra
• Notions of calculus
• Non-linear systems: more power → more trouble
• What is a “machine learning” algorithm?
• Examples of algorithms and simulators
The advantage of understanding a method
Computational modeling is powerful because it uses research tools that are
precise, systematic, and as clearly defined as possible
• Scientific knowledge usually advances more
rapidly when a mathematical (formal) language is
adopted, because concepts are defined in an
explicit and rigorous way
• In the humanities this is often hard to apply
• However, regardless of whether you want to
promote or criticize this approach, you should
try to understand it, in order to appreciate both
its strengths and its limitations
The advantage of understanding a method
A deeper comprehension allows us to be more critical!
A few reasons why you should love math
(if you are interested in AI and cognitive modeling)
• Math is precise and rigorous
• Math is quantitative
• Math is a universal language
• Math is unambiguous
• Math is abstract
• …but it can be applied
• Math can (often) be proved
• Math can run on computers
BUT if you can’t do math, you just hate it.
Some examples of the mathematical formalism
that we will use during the course
Graph theory: Why ?
• Powerful mathematical formalism for describing and studying
systems composed of interacting elements:
• Atoms and molecules
• Protein and gene networks
• Immune systems
• Animal species in ecosystems
• Traffic flows
• Telecommunication networks (e.g., the Internet)
• Commercial trades
• Immigration flows
• Social relationships (e.g., Facebook friends’ activity)
• Of course, neuronal networks
• … and many others!
More on “The power of networks”: https://www.youtube.com/watch?v=nJmGrNdJ5Gw
The brain as a graph
Graph theory
• Building blocks:
– Nodes (or “vertices”, or “units”): they define the variables to be modeled
– Edges (or “arcs”, or “connections”): they define which variables directly
interact together
– Weights: they define the strength of interactions
[Figure: example graph with four nodes (A, B, C, D) connected by weighted edges; weights include 10, 5, 0.02, and -10]
More on “Graph Theory”: https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)
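To make these building blocks concrete, here is a minimal Python sketch (not part of the original slides; the node names and weights loosely follow the figure above and are otherwise illustrative) that stores a directed, weighted graph as an adjacency dictionary:

```python
# A directed, weighted graph stored as an adjacency dictionary:
# each node maps to the nodes it points to, together with the edge weights.
# The specific edges and weights below are illustrative.
weighted_graph = {
    "A": {"B": 10, "D": 5},   # A interacts with B (strength 10) and D (strength 5)
    "B": {"C": 0.02},         # a very weak interaction
    "C": {"A": -10},          # a negative weight: an inhibitory interaction
    "D": {},                  # D has no outgoing edges
}

def edge_weight(graph, source, target):
    """Return the strength of the interaction source -> target (None if absent)."""
    return graph.get(source, {}).get(target)

print(edge_weight(weighted_graph, "A", "B"))  # 10
print(edge_weight(weighted_graph, "B", "A"))  # None: the graph is directed
```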
Graph theory
• Other useful notions:
– Directed graphs: encode “parent of” relationships between variables
[Figure: directed graph over the variables “sunrise”, “air temperature”,
“rooster crows”, and “Bob happiness”, with arrows pointing from causes to effects]
This is often related to the notion of “causality”: the rooster crows
because of the sunrise, and not vice versa!
– Undirected graphs: encode only “degree of affinity” (e.g., correlation)
[Figure: undirected graph over the variables “Bob productivity”, “Bob happiness”,
“Alice happiness”, and “Italian GDP (Gross Domestic Product)”]
Here, we only know that the variables
mutually influence each other.
We cannot establish which one is the
cause, and which one is the effect.
NB: Undirected = Bidirectional and Symmetric
• When two variables interact through an undirected connection, it is like having
two symmetric (i.e., same-weight) directed connections:
[Figure: an undirected edge between A and B with weight 10 is equivalent to the
pair of directed edges A → B and B → A, each with weight 10]
• If we want to specify a bidirectional influence with different weights, then we
must use two separate, directed arrows (see the code sketch below):
[Figure: A → B with weight 10, together with B → A with weight -4]
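This equivalence can be written down directly in code. Below is a minimal Python sketch (the helper names are hypothetical, not from the slides): an undirected edge is stored as two directed edges with the same weight, whereas an asymmetric influence requires two separate directed entries:

```python
def add_undirected_edge(graph, a, b, weight):
    """Undirected = two symmetric directed edges with the same weight."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight

def add_directed_edge(graph, source, target, weight):
    """A single one-way influence from source to target."""
    graph.setdefault(source, {})[target] = weight

g = {}
add_undirected_edge(g, "A", "B", 10)  # A <-> B, symmetric weight 10

h = {}
add_directed_edge(h, "A", "B", 10)    # A -> B with weight 10 ...
add_directed_edge(h, "B", "A", -4)    # ... and B -> A with a different weight
```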
Graph theory
• Other useful notions:
– Topology of a graph: specifies the connectivity map on a 2D plane
[Figure: example topologies: ring, tree, fully-connected, bipartite]
As we will see later in the course, topology and directionality
define the architecture of a neural network.
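As a concrete illustration (a sketch, not from the slides), the connectivity maps of some of these topologies can be generated programmatically as adjacency lists:

```python
def ring_topology(n):
    """Each node connects to its successor, closing the loop at the end."""
    return {i: [(i + 1) % n] for i in range(n)}

def fully_connected_topology(n):
    """Every node connects to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

def bipartite_topology(n_left, n_right):
    """Two layers of nodes; every left node connects to every right node
    (the connectivity pattern of many neural network architectures)."""
    return {f"L{i}": [f"R{j}" for j in range(n_right)] for i in range(n_left)}

print(ring_topology(4))          # {0: [1], 1: [2], 2: [3], 3: [0]}
print(bipartite_topology(2, 3))  # {'L0': ['R0', 'R1', 'R2'], 'L1': [...]}
```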
Probability theory: Why ?
• Powerful mathematical formalism for describing and studying
stochastic systems. But why do we care about stochastic systems?
• Ignorance: sometimes we can only partially observe a system
(e.g., given some [observed] symptoms, diagnose the underlying [unobserved] disorder)
• Approximations and errors in the data
(e.g., measurements and sensations are usually noisy)
• Some systems are intrinsically stochastic
(e.g., quantum physics, complex systems…)
Probability theory
• Building blocks:
– Random variables: variables whose values are defined according to some
probabilistic function (e.g., Gaussian distribution)
– Conditional probability P(A|B) : probability of one or more random
variables, given the value of some other random variables
– Joint probability P(A, B) : probability of the intersection of two or more
random variables
P(A, B) = P(A|B) * P(B) = P(B|A) * P(A)
– Statistical independence: two variables are independent if they don’t
influence each other. This implies that their joint distribution is simply the
product of their individual distributions:
P(A, B) = P(A) * P(B)
It also implies that knowing the value
of B does not change our opinion on A:
P(A|B) = P(A)
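Here is a short numerical sketch of the product rule P(A, B) = P(A|B) * P(B); the weather/umbrella variables and all probability values are invented purely for illustration:

```python
# P(B): the weather (illustrative values)
P_B = {"rain": 0.3, "sun": 0.7}

# P(A | B): umbrella use conditioned on the weather (illustrative values)
P_A_given_B = {
    "rain": {"umbrella": 0.9, "no_umbrella": 0.1},
    "sun":  {"umbrella": 0.1, "no_umbrella": 0.9},
}

def joint(a, b):
    """Product rule: P(A=a, B=b) = P(A=a | B=b) * P(B=b)."""
    return P_A_given_B[b][a] * P_B[b]

print(joint("umbrella", "rain"))  # 0.9 * 0.3 = 0.27
# A and B are NOT independent here: P(A="umbrella") = 0.27 + 0.07 = 0.34,
# so P(A) * P(B) = 0.34 * 0.3 = 0.102, which differs from the joint 0.27.
```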
On the meaning of «random» variable
«A random variable (a.k.a. “aleatory variable” or “stochastic variable”) is a
measurable variable that can assume different values depending on some
random phenomenon, and whose behavior is described by a probability distribution».
NB: This does not necessarily mean that the variable cannot be predicted!
(«totally random behavior»)
[Figure: plots of a uniform distribution, a normal (Gaussian) distribution,
and an exponential distribution]
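The NB above can be illustrated with a short NumPy sketch (the distribution parameters are arbitrary): each individual draw is unpredictable, yet the aggregate behavior of the samples follows the distribution very closely:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

uniform_samples = rng.uniform(low=0.0, high=1.0, size=10_000)  # flat density
normal_samples = rng.normal(loc=0.0, scale=1.0, size=10_000)   # bell-shaped
exponential_samples = rng.exponential(scale=1.0, size=10_000)  # decaying tail

# No single draw can be predicted, but the statistics of the collection can:
print(normal_samples.mean())  # close to 0, the mean of the Gaussian
print(normal_samples.std())   # close to 1, its standard deviation
```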
Probability theory
Simple example: Rolling two dice
§ The two random variables have the same probability distribution, which is
uniform over all possible outcomes:
P(A=1) = P(A=2) = P(A=3) = … = P(B=1) = P(B=2) = … = P(B=6) = 1/6
§ The two dice are independent, so their joint probability for a particular
event is simply the product of their individual probabilities:
P(A=1, B=4) = 1/6 * 1/6 = 1/36
A more complex example: Eye and skin color
§ The two random variables have different probability distributions, which are
not uniform over all possible outcomes (e.g., blue eyes are less common
than brown eyes).
§ Moreover, the two variables are not independent (e.g., dark skin is usually
associated with dark eyes).
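The dice example can be checked by simulation. This is a minimal sketch (the number of trials is arbitrary): the empirical frequency of the joint event converges to the product of the individual probabilities, 1/36:

```python
import random

random.seed(0)  # for reproducibility
n_trials = 100_000
hits = 0
for _ in range(n_trials):
    a = random.randint(1, 6)  # die A: uniform over {1, ..., 6}
    b = random.randint(1, 6)  # die B: rolled independently of A
    if a == 1 and b == 4:
        hits += 1

print(hits / n_trials)   # empirical estimate of P(A=1, B=4)
print(1 / 6 * 1 / 6)     # theoretical value: 1/36 ≈ 0.0278
```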
Probability theory: the Bayesian perspective
Rather than being concerned about absolute probabilities, we
interpret the value of each random variable as a degree of belief
P(H|E) = P(E|H) × P(H) / P(E)
[Figure: Bayesian network in which evidence E supports competing hypotheses H,
such as “musician”, “dog”, “snake”, “face”]
• Inference: given some evidence (observed variables), we want to find the probability
distribution of the remaining variables (hypotheses)
→ this way, the degree of belief can be updated according to the current evidence
• Learning: we want to find the parameters that best describe a set of observations
(maximum likelihood)
→ we’ll talk more about this later in the course!
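A worked numerical sketch of Bayes’ rule, applied to the diagnosis scenario mentioned earlier (a disorder H inferred from a symptom E; all probability values are invented for illustration):

```python
def posterior(p_e_given_h, p_h, p_e):
    """Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

p_h = 0.01             # prior degree of belief: the disorder is rare
p_e_given_h = 0.9      # the symptom is very likely if the disorder is present
p_e_given_not_h = 0.1  # but it also occurs without the disorder

# Marginal probability of the evidence (law of total probability):
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Observing the symptom raises the belief from 1% to about 8.3%:
print(posterior(p_e_given_h, p_h, p_e))  # ~0.083
```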
Graph theory + Probability theory =
Probabilistic Graphical Models
• Basic idea: we can use graphs to represent joint probability distributions
• Each node is a random variable
• The topology of the graph defines conditional independences
• The joint probability distribution can thus be factorized using local
interactions between variables
[Figure: examples of probabilistic graphical models: a Bayesian network
(directed edges) and a Markov network (undirected edges)]
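As a minimal sketch of the factorization idea, here is the earlier sunrise/rooster network written as code (a two-node Bayesian network; the probability values are invented for illustration):

```python
# Graph: sunrise -> rooster_crows.
# The topology licenses the factorization
# P(sunrise, crows) = P(crows | sunrise) * P(sunrise).
P_sunrise = {True: 0.5, False: 0.5}
P_crows_given_sunrise = {
    True:  {True: 0.95, False: 0.05},  # the rooster almost always crows at sunrise
    False: {True: 0.01, False: 0.99},  # and almost never otherwise
}

def joint(sunrise, crows):
    """Joint probability obtained from the local factors of the graph."""
    return P_crows_given_sunrise[sunrise][crows] * P_sunrise[sunrise]

total = sum(joint(s, c) for s in (True, False) for c in (True, False))
print(total)  # 1.0: the factorization defines a valid joint distribution
```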