Mathematical background – Part 1
Dr. Alberto Testolin
Department of General Psychology and Department of Mathematics
University of Padova

Lecture overview
• Why do psychologists need math?
• Notions of graph theory
• Notions of probability theory
• Notions of linear algebra
• Notions of calculus
• Non-linear systems: more power --> more trouble
• What is a “machine learning” algorithm?
• Examples of algorithms and simulators

The advantage of understanding a method
Computational modeling is powerful because it uses research tools that are precise, systematic, and as clearly defined as possible
• A more rapid advancement of scientific knowledge is usually promoted by the adoption of a mathematical (formal) language, where concepts are defined in an explicit and rigorous way
• In the humanities this is often hard to apply
• However, regardless of whether you wish to promote or criticize this approach, you should try to understand it, in order to appreciate both its strengths and its limitations

The advantage of understanding a method
A deeper comprehension allows us to be more critical!

A few reasons why you should love math (if you are interested in AI and cognitive modeling)
• Math is precise and rigorous
• Math is quantitative
• Math is a universal language
• Math is unambiguous
• Math is abstract… but it can be applied
• Math can (often) be proved
• Math can run on computers
BUT if you can’t do math, you just hate it.

Some examples of the mathematical formalism that we will use during the course

Graph theory: Why?
• Powerful mathematical formalism for describing and studying systems composed of interacting elements:
  – Atoms and molecules
  – Protein and gene networks
  – Immune systems
  – Animal species in ecosystems
  – Traffic flows
  – Telecommunication networks (e.g., Internet)
  – Commercial trades
  – Immigration flows
  – Social relationships (e.g., Facebook friends’ activity)
  – Of course, neuronal networks
  – … and many others!
More on “The power of networks”: https://www.youtube.com/watch?v=nJmGrNdJ5Gw

The brain as a graph

Graph theory
• Building blocks:
  – Nodes (or “vertices”, or “units”): they define the variables to be modeled
  – Edges (or “arcs”, or “connections”): they define which variables directly interact together
  – Weights: they define the strength of interactions
[Figure: example graph with nodes A, B, C, D connected by edges with weights 10, 5, 0.02, -10]
More on “Graph Theory”: https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)

Graph theory
• Other useful notions:
  – Directed graphs: encode “parent of” relationships between variables
[Figure: directed graph with nodes “sunrise”, “air temperature”, “rooster crows”, “Bob happiness”]
This is often related to the notion of “causality”: the rooster crows because of the sunrise, and not vice versa!
  – Undirected graphs: encode only “degree of affinity” (e.g., correlation)
[Figure: undirected graph with nodes “Bob productivity”, “Bob happiness”, “Alice happiness”, “Italian GDP (Gross Domestic Product)”]
Here, we only know that the variables mutually influence each other. We cannot establish which one is the cause, and which one is the effect.
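To make the node/edge/weight terminology concrete, here is a minimal Python sketch that stores a weighted directed graph as a dictionary of dictionaries. The specific edges are illustrative assumptions: the slide’s diagram only lists the nodes A–D and the weight values 10, 5, 0.02 and -10, not the full wiring.

```python
# Minimal sketch: a weighted directed graph as a dictionary of dictionaries.
# Keys are nodes; each inner dictionary maps a target node to the edge weight.
# NOTE: the edges below are illustrative; the original slide only shows the
# nodes A-D and the weights 10, 5, 0.02, -10, not which node connects to which.
graph = {
    "A": {"B": 10, "D": 5},   # A -> B (weight 10), A -> D (weight 5)
    "B": {"C": 0.02},         # B -> C (weak interaction)
    "C": {"A": -10},          # C -> A (negative weight: inhibitory influence)
    "D": {},                  # D has no outgoing edges
}

# An undirected edge with weight w is equivalent to two directed edges
# of the same weight, one in each direction (see the NB slide below):
def add_undirected_edge(g, u, v, w):
    g.setdefault(u, {})[v] = w
    g.setdefault(v, {})[u] = w

add_undirected_edge(graph, "B", "D", 3)   # illustrative weight
print(graph["B"])   # {'C': 0.02, 'D': 3}
```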
NB: Undirected = Bidirectional and Symmetric
• When two variables interact through an undirected connection, it is like having two symmetric (i.e., same weight) directed connections:
[Figure: undirected edge A—B with weight 10, equivalent to two directed edges A→B and B→A, each with weight 10]
• If we want to specify a bidirectional influence with different weights, then we must use two separate, directed arrows:
[Figure: directed edge A→B with weight 10 and directed edge B→A with weight -4]

Graph theory
• Other useful notions:
  – Topology of a graph: specifies the connectivity map on a 2D plane
[Figure: example topologies – ring, tree, fully-connected, bipartite]
As we will see later in the course, topology and directionality define the architecture of a neural network.

Probability theory: Why?
• Powerful mathematical formalism for describing and studying stochastic systems. But why do we care about stochastic systems?
  – Ignorance: sometimes we can only partially observe a system (e.g., given some [observed] symptoms, diagnose the underlying [unobserved] disorder)
  – Approximations and errors in the data (e.g., measurements and sensations are usually noisy)
  – Some systems are intrinsically stochastic (e.g., quantum physics, complex systems…)

Probability theory
• Building blocks:
  – Random variables: variables whose values are defined according to some probabilistic function (e.g., Gaussian distribution)
  – Conditional probability P(A|B): probability of one or more random variables, given the value of some other random variables
  – Joint probability P(A, B): probability of the intersection of two or more random variables
    P(A, B) = P(A|B) * P(B) = P(B|A) * P(A)
  – Statistical independence: two variables are independent if they don’t influence each other. This implies that their joint distribution is simply the product of their individual distributions:
    P(A, B) = P(A) * P(B)
    It also implies that knowing the value of B does not change our opinion on A:
    P(A|B) = P(A)

On the meaning of «random» variable
«A random variable (a.k.a. “aleatory variable” or “stochastic variable”) is a measurable variable that can assume different values depending on some aleatory phenomenon, which is represented by a probability distribution».
NB: This does not necessarily mean that the variable cannot be predicted! («totally random behavior»)
[Figure: example distributions – uniform, normal (Gaussian), exponential]

Probability theory
Simple example: rolling two dice
§ The two random variables have the same probability distribution, which is uniform over all the possible outcomes:
  P(A=1) = P(A=2) = P(A=3) = … = P(B=1) = P(B=2) = … = P(B=6) = 1/6
§ The two dice are independent, so their joint probability for a particular event is simply the product of their individual probabilities:
  P(A=1, B=4) = 1/6 * 1/6 = 1/36

A more complex example: eye and skin color
§ The two random variables have different probability distributions, which are not uniform over all the possible outcomes (e.g., blue eyes are less common than brown eyes).
§ Moreover, the two variables are not independent (e.g., dark skin is usually associated with dark eyes).
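To check the dice example numerically, here is a minimal Python sketch that enumerates all 36 outcomes and verifies that P(A=1, B=4) = 1/36 = P(A=1) × P(B=4), i.e., that the joint probability of two independent dice factorizes into the product of the marginals.

```python
# Minimal sketch: joint probability of two fair dice by enumerating outcomes.
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))   # all 36 (A, B) pairs
p_each = Fraction(1, len(outcomes))               # uniform: 1/36 per pair

# Joint probability P(A=1, B=4): only one outcome matches.
p_joint = sum(p_each for a, b in outcomes if a == 1 and b == 4)

# Marginals P(A=1) and P(B=4): six matching outcomes each.
p_a = sum(p_each for a, b in outcomes if a == 1)
p_b = sum(p_each for a, b in outcomes if b == 4)

print(p_joint)      # 1/36
print(p_a * p_b)    # 1/36 -> independence: P(A, B) = P(A) * P(B)
```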
Probability theory: the Bayesian perspective
Rather than being concerned about absolute probabilities, we interpret the value of each random variable as a degree of belief

P(H | E) = P(E | H) × P(H) / P(E)

[Figure: Bayesian network with hypothesis nodes H (musician, dog, snake, face) and an evidence node E]

• Inference: given some evidence (observed variables), we want to find the probability distribution of the remaining variables (hypotheses) → this way, the degree of belief can be updated according to the current evidence
• Learning: we want to find the parameters that best describe a set of observations (maximum likelihood) → we’ll talk more about this later in the course!

Graph theory + Probability theory = Probabilistic Graphical Models
• Basic idea: we can use graphs to represent joint probability distributions
• Each node is a random variable
• The topology of the graph defines conditional independencies
• The joint probability distribution can thus be factorized using local interactions between variables
[Figure: examples of a Bayesian network (directed) and a Markov network (undirected)]
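As a minimal worked example of the Bayes’ rule shown above, here is a short Python sketch that applies it to the symptom/disorder scenario mentioned earlier. The prior and likelihood values are made up purely for illustration; they are not taken from the lecture.

```python
# Minimal sketch of Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E).
# H = "patient has the disorder", E = "symptom is observed".
# NOTE: all numbers below are invented for illustration only.
p_h = 0.01              # prior belief: P(disorder)
p_e_given_h = 0.90      # likelihood: P(symptom | disorder)
p_e_given_not_h = 0.05  # P(symptom | no disorder)

# Evidence term by marginalization: P(E) = P(E|H)P(H) + P(E|~H)P(~H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior: the degree of belief updated after observing the evidence.
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 3))   # ~0.154
```

Even with a very reliable symptom, the posterior stays modest here because the prior is low: this is exactly the kind of belief updating that inference in a probabilistic graphical model performs.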