Approximate Inference and Learning
Le Song
Machine Learning II: Advanced Topics
CSE 8803ML, Spring 2012
Why Sampling
Exact and variational inference tasks focus on obtaining the entire posterior distribution $P(X_i \mid e)$.
Often we want to take expectations:
Mean: $\mu_{X_i \mid e} = E[X_i \mid e] = \int X_i \, P(X_i \mid e) \, dX_i$
More generally, $E[f] = \int f(X) \, P(X \mid e) \, dX$, which can be difficult to compute analytically.
Sometimes we also want to see typical data points from a distribution.
Sampling
Samples: points from the domain of a distribution $P(X)$.
The higher $P(x)$ is, the more likely we are to see $x$ in the sample.
[Figure: a density $P(X)$ over $X$, with samples $x_1, \dots, x_6$ concentrated where $P(X)$ is high.]
Approximate the expectation by the sample average
$E[f] \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i)$
where $x_1, \dots, x_N \sim P(X \mid e)$ are independently and identically distributed.
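As a concrete sketch of the sample-average approximation (not on the slides), the Python snippet below estimates $E[f(X)]$ for $f(x) = x^2$ with $X \sim N(0,1)$, whose true value is 1.

```python
import numpy as np

# Estimate E[f(X)] by a sample average, with X ~ N(0, 1) and f(x) = x^2.
# The true value is Var(X) = 1, so the printed estimate should be close to 1.
rng = np.random.default_rng(0)

def f(x):
    return x ** 2

N = 100_000
x = rng.normal(loc=0.0, scale=1.0, size=N)  # x_1, ..., x_N ~ P(X)
estimate = np.mean(f(x))                    # (1/N) * sum_i f(x_i)
print(estimate)
```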
Generate Samples from Bayesian Networks
A BN describes a generative process for the observations.
First, sort the nodes in topological order.
Then, generate a sample using this order, according to the CPTs.
[Figure: the Allergy/Flu network, with nodes numbered in topological order: Allergy and Flu first, then Sinus, then Nose and Headache.]
Generate a set of samples for (A, F, S, N, H):
Sample $a_i \sim P(A)$
Sample $f_i \sim P(F)$
Sample $s_i \sim P(S \mid a_i, f_i)$
Sample $n_i \sim P(N \mid s_i)$
Sample $h_i \sim P(H \mid s_i)$
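A minimal sketch of this ancestral-sampling procedure for the (A, F, S, N, H) network; all CPT numbers below are made up for illustration, only the sampling order matters.

```python
import numpy as np

# Ancestral sampling for the (A, F, S, N, H) network: sample each node in
# topological order, conditioning on its already-sampled parents.
# All CPT numbers below are made up for illustration.
rng = np.random.default_rng(0)

def bern(p):
    return bool(rng.random() < p)

def sample_one():
    a = bern(0.1)                              # a ~ P(A)
    f = bern(0.2)                              # f ~ P(F)
    p_s = {(True, True): 0.9, (True, False): 0.7,
           (False, True): 0.8, (False, False): 0.05}
    s = bern(p_s[(f, a)])                      # s ~ P(S | a, f)
    n = bern(0.8 if s else 0.1)                # n ~ P(N | s)
    h = bern(0.7 if s else 0.05)               # h ~ P(H | s)
    return a, f, s, n, h

samples = [sample_one() for _ in range(1000)]
```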
Challenges in sampling
Not all distributions can be trivially sampled from, e.g.,
Loopy graphical models with many variables
Distributions with complicated shapes
[Figure: a multimodal density $P(X)$ with a complicated shape.]
Sampling Methods
Direct Sampling
Simple
Works only for easy distributions
Rejection Sampling
Creates samples like direct sampling
Only counts samples consistent with the given evidence
Importance Sampling
Creates samples like direct sampling
Assigns weights to samples
Gibbs Sampling
Often used for high-dimensional problems
Samples each variable conditioned on its Markov blanket
Rejection sampling
Sample $x \sim Q(X)$ and reject it with probability $1 - \frac{f(x)}{M\,Q(x)}$.
Equivalently: draw $x_1 \sim Q(X)$ and $u_1 \sim U[0,1]$, and accept $x_1$ if $u_1 \, M Q(x_1) \le f(x_1)$.
[Figure: the scaled proposal $M Q(X)$ envelopes the target $f(X)$; the region between the two curves is the rejection region.]
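A minimal sketch of rejection sampling, assuming an illustrative unnormalized bimodal target $f$, a normal proposal $Q$, and an envelope constant $M$ chosen so that $f(x) \le M\,Q(x)$; none of these particular choices come from the slides.

```python
import numpy as np

# Rejection sampling with an illustrative unnormalized bimodal target f, a
# normal proposal Q = N(0, 3^2), and an envelope constant M assumed to satisfy
# f(x) <= M * Q(x) for all x.
rng = np.random.default_rng(0)

def f(x):
    # Unnormalized bimodal target density.
    return np.exp(-0.5 * (x - 2.0) ** 2) + np.exp(-0.5 * (x + 2.0) ** 2)

def q_pdf(x, sigma=3.0):
    # Proposal density Q(X) = N(0, sigma^2).
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

M = 12.0  # envelope constant for this particular f and Q

def rejection_sample(n):
    accepted = []
    while len(accepted) < n:
        x = rng.normal(0.0, 3.0)         # x ~ Q(X)
        u = rng.random()                 # u ~ U[0, 1]
        if u * M * q_pdf(x) <= f(x):     # accept with probability f(x) / (M Q(x))
            accepted.append(x)
    return np.array(accepted)

samples = rejection_sample(1000)
```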
Importance Sampling
Instead of rejecting samples, reweight them.
Draw $x_i \sim Q(X)$ and assign each sample the weight $w_i = P(x_i)/Q(x_i)$.
[Figure: target $P(X)$ and proposal $Q(X)$, with two samples $x_1, x_2 \sim Q(X)$ and their weights $w_1 = P(x_1)/Q(x_1)$ and $w_2 = P(x_2)/Q(x_2)$.]
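A minimal sketch of (self-normalized) importance sampling using the same illustrative target and proposal as in the rejection example; the self-normalization step handles the unnormalized target and is an extra detail beyond the slide.

```python
import numpy as np

# Self-normalized importance sampling: estimate E_P[f(X)] using samples from Q.
rng = np.random.default_rng(0)

def p_unnorm(x):
    # Unnormalized target P(X): an equal mixture of N(2, 1) and N(-2, 1).
    return np.exp(-0.5 * (x - 2.0) ** 2) + np.exp(-0.5 * (x + 2.0) ** 2)

def q_pdf(x, sigma=3.0):
    # Proposal Q(X) = N(0, sigma^2).
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def f(x):
    return x ** 2

N = 100_000
x = rng.normal(0.0, 3.0, size=N)           # x_i ~ Q(X)
w = p_unnorm(x) / q_pdf(x)                 # w_i = P(x_i) / Q(x_i) (up to a constant)
estimate = np.sum(w * f(x)) / np.sum(w)    # self-normalized importance estimate
print(estimate)                            # E_P[X^2] = 5 for this mixture target
```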
Example: sample from MRF on grid
Use a tree distribution $Q$ as the proposal distribution: cut some edges of the grid to make a tree $T$.
$P(X_1, \dots, X_n) \propto \exp\left( \sum_{(ij) \in E} \theta_{ij} X_i X_j + \sum_{i \in V} \theta_i X_i \right)$
$Q(X_1, \dots, X_n) \propto \exp\left( \sum_{(ij) \in T} \theta_{ij} X_i X_j + \sum_{i \in V} \theta_i X_i \right)$, which has fewer terms.
Then use rejection sampling or importance sampling with proposal $Q$.
Gibbs Sampling
Neither rejection sampling nor importance sampling scales well to high dimensions.
Markov Chain Monte Carlo (MCMC) is an alternative.
Key idea: construct a Markov chain whose stationary distribution is the target distribution $P(X)$.
Sampling process: a random walk in the Markov chain.
Gibbs sampling is a very special and simple MCMC method.
Markov Chain Monte Carlo
Want to sample from $P(X)$: start with a random initial vector $X$.
$X^{(t)}$: the value of $X$ at time step $t$.
$X^{(t)}$ transitions to $X^{(t+1)}$ with probability $Q(X^{(t+1)} \mid X^{(t)}, \dots, X^{(1)}) = T(X^{(t+1)} \mid X^{(t)})$.
The stationary distribution of $T(X^{(t+1)} \mid X^{(t)})$ is our target $P(X)$.
Run for an initial $M$ samples (the burn-in time) until the chain converges/mixes/reaches the stationary distribution.
Then collect $N$ (correlated) samples as $x_i$.
Key issues: designing the transition kernel, and diagnosing convergence.
Gibbs Sampling
A very special transition kernel, which works nicely with the Markov blanket in GMs.
The procedure:
We have a set of variables $X = \{X_1, \dots, X_K\}$ in a GM.
At each step, one variable $X_i$ is selected (at random or in some fixed sequence); denote the remaining variables as $X_{-i}$ and their current values as $x_{-i}^{(t)}$.
Compute the conditional distribution $P(X_i \mid x_{-i}^{(t)})$.
A value $x_i^{(t)}$ is sampled from this distribution.
This sample $x_i^{(t)}$ replaces the previous sampled value of $X_i$ in $X$.
Gibbs Sampling in formula
Gibbs sampling:
$X = x^{(0)}$
For $t = 1$ to $N$:
$x_1^{(t)} \sim P(X_1 \mid x_2^{(t-1)}, \dots, x_K^{(t-1)})$
$x_2^{(t)} \sim P(X_2 \mid x_1^{(t)}, x_3^{(t-1)}, \dots, x_K^{(t-1)})$
…
$x_K^{(t)} \sim P(X_K \mid x_1^{(t)}, \dots, x_{K-1}^{(t)})$
For graphical models, we only need to condition on the variables in the Markov blanket.
[Figure: a small graphical model over $X_1, \dots, X_5$.]
Variants:
Randomly pick the variable to sample
Sample block by block
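A minimal sketch of a systematic-scan Gibbs sweep; `conditional(i, x)` is an assumed user-supplied function that returns $P(X_i \mid x_{-i})$ as a probability vector (in a graphical model it only needs to inspect the Markov blanket of $X_i$).

```python
import numpy as np

# Systematic-scan Gibbs sampler skeleton.
rng = np.random.default_rng(0)

def gibbs(x0, conditional, values, num_sweeps, burn_in):
    x = np.array(x0)
    kept = []
    for t in range(num_sweeps):
        for i in range(len(x)):
            probs = conditional(i, x)             # P(X_i | current values of the rest)
            x[i] = rng.choice(values, p=probs)    # resample X_i, overwriting it in place
        if t >= burn_in:
            kept.append(x.copy())                 # collect (correlated) samples after burn-in
    return np.array(kept)
```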
Gibbs Sampling: Image Segmentation
Noisy grayscale image
Label each pixel as on/off
Model using a pairwise MRF:
$P(X) = \frac{1}{Z} \prod_i \Psi(X_i) \prod_{(ij)} \Psi(X_i, X_j)$
$\Psi(x_i) = \exp\left( -\frac{(y_i - \mu_{x_i})^2}{2 \sigma_{x_i}^2} \right)$
$\Psi(x_i, x_j) = \exp\left( -\beta (x_i - x_j)^2 \right)$
[Figure: a 3×3 grid of hidden pixel labels $X_1, \dots, X_9$, each connected to its observed pixel value $Y_1, \dots, Y_9$.]
Gibbs Sampling: Image Segmentation
Need the conditional $P(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_k)$:
$P(x_i \mid x_{-i}) = \frac{P(x_1, \dots, x_k)}{P(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_k)} = \frac{\frac{1}{Z} \prod_i \Psi(x_i) \prod_{(ij)} \Psi(x_i, x_j)}{\frac{1}{Z} \sum_{x_i} \prod_i \Psi(x_i) \prod_{(ij)} \Psi(x_i, x_j)}$
Terms without $x_i$ cancel out; $x_i$ is summed out in the denominator, so
$P(x_i \mid x_{-i}) \propto \Psi(x_i) \prod_{j \in N(i)} \Psi(x_i, x_j)$
[Figure: the same 3×3 grid MRF with observations $Y_1, \dots, Y_9$.]
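A minimal sketch of this Gibbs sampler for the segmentation MRF, resampling each pixel from the conditional $\Psi(x_i) \prod_{j \in N(i)} \Psi(x_i, x_j)$ derived above; the values of $\mu$, $\sigma^2$, $\beta$, and the number of sweeps are illustrative, not taken from the slides.

```python
import numpy as np

# Gibbs sampling for the pairwise segmentation MRF.
rng = np.random.default_rng(0)

mu = np.array([0.2, 0.8])        # mu_x for labels 0 ("off") and 1 ("on")
sigma2 = np.array([0.05, 0.05])  # sigma_x^2 for the two labels
beta = 2.0                       # coupling strength of the pairwise potential

def gibbs_segment(y, num_sweeps=50):
    H, W = y.shape
    x = (y > y.mean()).astype(int)               # crude initialization of the labels
    for _ in range(num_sweeps):
        for i in range(H):
            for j in range(W):
                logp = np.zeros(2)
                for label in (0, 1):
                    # log Psi(x_i): how well the label explains the observed pixel
                    logp[label] = -(y[i, j] - mu[label]) ** 2 / (2 * sigma2[label])
                    # log Psi(x_i, x_j) over the grid neighbors N(i)
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < H and 0 <= nj < W:
                            logp[label] += -beta * (label - x[ni, nj]) ** 2
                p = np.exp(logp - logp.max())
                p /= p.sum()                     # P(x_i | x_{-i}) over the two labels
                x[i, j] = rng.choice(2, p=p)     # resample this pixel's label
    return x

# Usage: labels = gibbs_segment(noisy_image)  # noisy_image: 2-D float array in [0, 1]
```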
Gibbs Sampling: Image Segmentation
[Figure: segmentation results on the noisy image obtained with Gibbs sampling.]
MAP by Sampling
Generate a few samples from the posterior.
For each $X_i$, take the majority assignment across the samples as the MAP estimate (majority vote).
Convergence of Gibbs Sampling
Not all samples $x^{(0)}, \dots, x^{(T)}$ are independent.
Consider a particular marginal $P(x_i \mid e)$.
[Figure: the empirical estimate of $P(x_i \mid e)$ plotted against iteration $t$; it can oscillate between multiple modes early on and only later settles near the true value. Discard the initial burn-in portion and take samples from the converged part of the chain.]
Diagnose convergence
Good chain
[Figure: trace plot of sampled value vs. iteration number; the chain mixes well.]
Diagnose convergence
Bad chain
[Figure: trace plot of sampled value vs. iteration number; the chain mixes poorly.]
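One common way to make the good-chain/bad-chain comparison quantitative is the Gelman–Rubin statistic ($\hat{R}$) computed over several chains; this is a generic diagnostic sketch, not one prescribed by the slides.

```python
import numpy as np

# Gelman-Rubin potential scale reduction factor (R-hat) for a scalar quantity,
# computed from several independent chains started at different initializations.
def r_hat(chains):
    """chains: array of shape (num_chains, num_samples)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # average within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_hat / W)             # near 1.0 when the chains agree

# Usage: run a few Gibbs chains, stack one scalar marginal from each into a
# (num_chains, num_samples) array, and treat r_hat below ~1.1 as "good chain".
```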
Sampling Methods
Direct Sampling
Works only for easy distributions (multinomial, Gaussian, etc.)
Rejection Sampling
Creates samples like direct sampling
Only counts samples consistent with the given evidence
Importance Sampling
Creates samples like direct sampling
Assigns weights to samples
Gibbs Sampling
Often used for high-dimensional problems
Samples each variable conditioned on its Markov blanket
Learning Graphical Models
The goal: given a set of independent samples (assignments of the random variables), find the best (most likely) graphical model, both the structure and the parameters.
Data:
(A,F,S,N,H) = (T,F,F,T,F)
(A,F,S,N,H) = (T,F,T,T,F)
…
(A,F,S,N,H) = (F,T,T,T,T)
Structure learning: recover the graph over A, F, S, N, H from the data.
Parameter learning: given the structure, estimate the CPTs, e.g. $P(S \mid F, A)$:

         F,A = T,T   T,F   F,T   F,F
S = t        0.9     0.7   0.8   0.2
S = f        0.1     0.3   0.2   0.8
Learning for GMs
                        Known structure     Unknown structure
Fully observable data   Relatively easy     Hard
Missing data            Hard (EM)           Very hard

Estimation principles:
Maximum likelihood estimation
Bayesian estimation
Common features:
Make use of the factorization of the distribution
Make use of inference algorithms
Make use of regularization/priors
Example problem
Estimate the probability $\theta$ of landing heads using a biased coin.
Given a sequence of $N$ independently and identically distributed (iid) flips,
e.g., $D = \{x_1, x_2, \dots, x_N\} = \{1, 0, 1, \dots, 0\}$, $x_i \in \{0, 1\}$.
Model: $P(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$, i.e.,
$P(x \mid \theta) = 1 - \theta$ for $x = 0$, and $P(x \mid \theta) = \theta$ for $x = 1$.
Likelihood of a single observation $x_i$: $P(x_i \mid \theta) = \theta^{x_i} (1 - \theta)^{1 - x_i}$.
Bayesian Parameter Estimation
Bayesians treat the unknown parameter as a random variable, whose distribution can be inferred using Bayes' rule:
$P(\theta \mid D) = \frac{P(D \mid \theta) P(\theta)}{P(D)} = \frac{P(D \mid \theta) P(\theta)}{\int_\theta P(D \mid \theta) P(\theta) \, d\theta}$
The crucial equation can be written in words:
$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}$
For iid data, the likelihood is
$P(D \mid \theta) = \prod_{i=1}^{N} P(x_i \mid \theta) = \prod_{i=1}^{N} \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{\sum_i x_i} (1 - \theta)^{\sum_i (1 - x_i)} = \theta^{\#\text{heads}} (1 - \theta)^{\#\text{tails}}$
The prior $P(\theta)$ encodes our prior knowledge of the domain.
Different priors $P(\theta)$ will end up with different estimates $P(\theta \mid D)$!
Frequentist Parameter Estimation
Bayesian estimation has been criticized for being "subjective".
Frequentists think of a parameter as a fixed, unknown constant, not a random variable.
Hence they use different "objective" estimators, instead of Bayes' rule.
These estimators have different properties, such as being "unbiased", "minimum variance", etc.
A very popular estimator is the maximum likelihood estimator (MLE), which is simple and has good statistical properties:
$\hat{\theta}_{MLE} = \arg\max_\theta P(D \mid \theta) = \arg\max_\theta \prod_{i=1}^{N} P(x_i \mid \theta)$
MLE for Biased Coin
Objective function: the log-likelihood
$l(\theta; D) = \log P(D \mid \theta) = \log \left[ \theta^{n_h} (1 - \theta)^{n_t} \right] = n_h \log \theta + (N - n_h) \log(1 - \theta)$
We need to maximize this w.r.t. $\theta$.
Take the derivative w.r.t. $\theta$:
$\frac{\partial l}{\partial \theta} = \frac{n_h}{\theta} - \frac{N - n_h}{1 - \theta} = 0 \;\Rightarrow\; \hat{\theta}_{MLE} = \frac{n_h}{N}$, or equivalently $\hat{\theta}_{MLE} = \frac{1}{N} \sum_i x_i$
Maximum Likelihood Estimation for Bernoulli
What if we toss too few times, so that we saw zero heads in the data?
In this case $\hat{\theta}_{MLE} = \frac{n_h}{N} = 0$, and we will predict that the probability of seeing a head next is zero.
The rescue: add regularization to smooth the counts. Do maximum a posteriori (MAP) estimation:
$\hat{\theta}_{MAP} = \arg\max_\theta P(\theta \mid D) = \arg\max_\theta \left[ l(\theta; D) + \log P(\theta) \right]$
For instance, with $\log P(\theta) = n' \log \theta + n' \log(1 - \theta)$:
$\hat{\theta}_{MAP} = \frac{n_h + n'}{N + 2n'}$, where $n'$ is known as a pseudo-count.
But are we still "objective"?
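A small numerical sketch of the zero-count problem and the pseudo-count fix; the data and the value of $n'$ are illustrative.

```python
import numpy as np

# The zero-count problem and the pseudo-count fix.
flips = np.zeros(10, dtype=int)      # ten tosses, zero heads

N = len(flips)
n_h = flips.sum()
n_prime = 2                          # pseudo-count n' added to both heads and tails

theta_mle = n_h / N                                 # = 0: predicts heads are impossible
theta_map = (n_h + n_prime) / (N + 2 * n_prime)     # smoothed estimate, stays positive

print(theta_mle, theta_map)          # 0.0 vs ~0.14
```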
Bayesian estimation for biased coin
Prior over $\theta$: the Beta distribution
$P(\theta; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$
(For positive integers $x$, $\Gamma(x + 1) = x \Gamma(x) = x!$.)
Posterior distribution of $\theta$:
$P(\theta \mid x_1, \dots, x_N) = \frac{P(x_1, \dots, x_N \mid \theta) P(\theta)}{P(x_1, \dots, x_N)} \propto \theta^{n_h} (1 - \theta)^{n_t} \, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} = \theta^{n_h + \alpha - 1} (1 - \theta)^{n_t + \beta - 1}$
The posterior is the same type of function as the prior.
Such a prior is called a conjugate prior.
$\alpha$ and $\beta$ are hyperparameters and correspond to the number of "virtual" heads and tails (pseudo-counts).
Bayesian Estimation for Bernoulli
Posterior distribution of $\theta$:
$P(\theta \mid x_1, \dots, x_N) = \frac{P(x_1, \dots, x_N \mid \theta) P(\theta)}{P(x_1, \dots, x_N)} \propto \theta^{n_h} (1 - \theta)^{n_t} \, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} = \theta^{n_h + \alpha - 1} (1 - \theta)^{n_t + \beta - 1}$
Maximum a posteriori (MAP) estimation:
$\hat{\theta}_{MAP} = \arg\max_\theta \log P(\theta \mid x_1, \dots, x_N)$
Posterior mean estimation:
$\hat{\theta}_{Bayes} = \int \theta \, P(\theta \mid D) \, d\theta = C \int \theta \cdot \theta^{n_h + \alpha - 1} (1 - \theta)^{n_t + \beta - 1} \, d\theta = \frac{n_h + \alpha}{N + \alpha + \beta}$
Prior strength: $A = \alpha + \beta$.
$A$ can be interpreted as the size of an imaginary dataset.
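A small sketch of the conjugate update and the two point estimates (posterior mode and posterior mean); the flips and hyperparameters are illustrative.

```python
import numpy as np

# Conjugate Beta-Bernoulli update and the two point estimates.
flips = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 0])
alpha, beta = 2.0, 2.0               # Beta(alpha, beta) prior (virtual heads/tails)

n_h = flips.sum()
n_t = len(flips) - n_h
N = len(flips)

# Conjugacy: the posterior is Beta(n_h + alpha, n_t + beta).
a_post, b_post = n_h + alpha, n_t + beta

theta_map = (a_post - 1) / (a_post + b_post - 2)   # posterior mode
theta_bayes = a_post / (a_post + b_post)           # posterior mean = (n_h + alpha) / (N + alpha + beta)

print(theta_map, theta_bayes)
```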
Effect of Prior Strength
Suppose we have a uniform prior ($\alpha = \beta$), and we observed $n_h = 2$ and $n_t = 8$.
Weak prior, $A = \alpha + \beta = 2$. Posterior prediction:
$P(x = h \mid n_h = 2, n_t = 8, \alpha = 1, \beta = 1) = \frac{2 + 1}{10 + 2} = 0.25$
Strong prior, $A = \alpha + \beta = 20$. Posterior prediction:
$P(x = h \mid n_h = 2, n_t = 8, \alpha = 10, \beta = 10) = \frac{2 + 10}{10 + 20} = 0.4$
However, if we have enough data, it washes away the prior. E.g., with $n_h = 200$ and $n_t = 800$, the estimates under the weak and strong priors are $\frac{200 + 1}{1000 + 2}$ and $\frac{200 + 10}{1000 + 10}$ respectively, both close to 0.2.
How should estimators be used?
$\hat{\theta}_{MAP}$ is not Bayesian (even though it uses a prior), since it is a point estimate.
Consider predicting the future. A sensible way is to combine the predictions based on all possible values of $\theta$, weighted by their posterior probability; this is called Bayesian prediction:
$P(x_{new} \mid D) = \int P(x_{new}, \theta \mid D) \, d\theta = \int P(x_{new} \mid \theta, D) \, P(\theta \mid D) \, d\theta = \int P(x_{new} \mid \theta) \, P(\theta \mid D) \, d\theta$
[Figure: plate diagram with parameter $\theta$, the $N$ observed data points $X$, and the new point $X_{new}$.]
A frequentist prediction will typically use a "plug-in" estimator such as ML/MAP:
$P(x_{new} \mid D) = P(x_{new} \mid \hat{\theta}_{ML})$ or $P(x_{new} \mid D) = P(x_{new} \mid \hat{\theta}_{MAP})$
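A small sketch contrasting the Bayesian predictive probability with the ML and MAP plug-in predictions for the Beta-Bernoulli model; the data and hyperparameters are illustrative.

```python
import numpy as np

# Bayesian prediction vs. plug-in prediction for the Beta-Bernoulli model.
flips = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
alpha, beta = 2.0, 2.0

n_h, N = flips.sum(), len(flips)

# Bayesian prediction: P(x_new = 1 | D) = integral of P(x_new = 1 | theta) P(theta | D),
# which for this conjugate model equals the posterior mean of theta.
p_bayes = (n_h + alpha) / (N + alpha + beta)

# Plug-in predictions: substitute a single point estimate of theta.
theta_ml = n_h / N
theta_map = (n_h + alpha - 1) / (N + alpha + beta - 2)

print(p_bayes, theta_ml, theta_map)
```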
Frequentist vs. Bayesian
Advantages of the Bayesian approach:
Mathematically elegant
Works well when the amount of data is much less than the number of parameters
Easy to do incremental (sequential) learning
Can be used for model selection (maximum likelihood will always pick the most complex model)
Advantages of the frequentist approach:
Mathematically/computationally simpler
"Objective", unbiased, invariant to reparametrization
As the amount of data grows, $|D| \to \infty$, the two approaches become the same:
$P(\theta \mid D) \to \delta(\theta, \hat{\theta}_{ML})$
MLE for General Bayesian Networks
If we assume that the parameters for each CPT are globally independent, and all nodes are fully observed, then the log-likelihood function decomposes into a sum of local terms, one per node:
$l(\theta; D) = \log P(D \mid \theta) = \log \prod_i \prod_j P(x_j^i \mid pa(X_j)^i, \theta_j) = \sum_i \sum_j \log P(x_j^i \mid pa(X_j)^i, \theta_j)$
[Figure: the Allergy/Flu network (Allergy, Flu, Sinus, Nose, Headache).]
For each variable $X_i$:
$P_{MLE}(X_i = x \mid Pa_{X_i} = u) = \frac{\#(X_i = x, \, Pa_{X_i} = u)}{\#(Pa_{X_i} = u)}$
Why?
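A minimal sketch of this counting estimator for a single CPT, here $P(S \mid F, A)$, from fully observed (hypothetical) data records.

```python
from collections import Counter

# Counting-based MLE for one CPT, P(S | F, A), from fully observed data.
# The (a, f, s) records below are hypothetical.
data = [(1, 0, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0), (1, 1, 1), (0, 1, 1)]

joint = Counter()      # counts of (F = f, A = a, S = s)
parent = Counter()     # counts of (F = f, A = a)
for a, f, s in data:
    joint[(f, a, s)] += 1
    parent[(f, a)] += 1

def p_mle(s, f, a):
    # #(S = s, F = f, A = a) / #(F = f, A = a)
    if parent[(f, a)] == 0:
        return float("nan")          # parent configuration never observed
    return joint[(f, a, s)] / parent[(f, a)]

print(p_mle(1, f=1, a=0))            # estimate of P(S = 1 | F = 1, A = 0)
```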
MLE for General Bayesian Networks
$l(\theta; D) = \log P(D \mid \theta) = \sum_i \log P(a^i \mid \theta_A) + \sum_i \log P(f^i \mid \theta_F) + \sum_i \log P(s^i \mid a^i, f^i, \theta_S) + \sum_i \log P(n^i \mid s^i, \theta_N) + \sum_i \log P(h^i \mid s^i, \theta_H)$
One term for each CPT: the MLE problem breaks up into independent sub-problems.
Earlier we already learned how to estimate a single CPT.
Here we just need to estimate each CPT separately.
[Figure: the Allergy/Flu network (Allergy, Flu, Sinus, Nose, Headache).]