Learning Large-Scale Conditional Random Fields
Joseph K. Bradley
Thesis Defense
Committee
Carlos Guestrin (U. of Washington, Chair)
Tom Mitchell
John Lafferty (U. of Chicago)
Andrew McCallum (U. of Massachusetts at Amherst)
January 18, 2013
Carnegie Mellon
Modeling Distributions
Goal: Model distribution P(X) over random variables X
E.g.: Model life of a grad student.
X1: losing sleep?
X2: deadline?
X3: sick?
X4: losing hair?
X5: overeating?
X6: loud roommate?
X7: taking classes?
X8: cold weather?
X9: exercising?
X10: gaining weight?
X11: single?
2
Modeling Distributions
Goal: Model distribution P(X) over random variables X
E.g.: Model life of a grad student.
X1: losing sleep?
X2: deadline?
X5: overeating?
X7: taking classes?

P(X1, X5 | X2, X7) = P(losing sleep, overeating | deadline, taking classes)
3
Markov Random Fields (MRFs)
Goal: Model distribution P(X) over random variables X
E.g.: Model life of a grad student.
X1: losing sleep?
X2: deadline?
X3: sick?
X4: losing hair?
X5: overeating?
X6: loud roommate?
X7: taking classes?
X8: cold weather?
X9: exercising?
X10: gaining weight?
X11: single?
4
Markov Random Fields (MRFs)
Goal: Model distribution P(X) over random variables X
P(X) ∝ Ψ16(X1, X6) Ψ13(X1, X3) × ⋯

Each edge carries a factor Ψij (the parameters); the edges define the graphical structure.

[Figure: the graph over X1-X11 with a factor Ψ on each edge.]
5
Conditional Random Fields (CRFs)
MRFs: P(X)
CRFs: P(Y | X) (Lafferty et al., 2001)

P(Y | X) ∝ Ψ(Y1, X1) Ψ(Y1, Y3) × ⋯

[Figure: CRF with outputs Y1-Y5 and evidence variables X1-X6.]

• Simpler structure (over Y only)
• Do not model P(X)
6
MRFs & CRFs
Benefits
• Principled statistical and computational framework
• Large body of literature
Applications
• Natural language processing (e.g., Lafferty et al., 2001)
• Vision (e.g., Tappen et al., 2007)
• Activity recognition (e.g., Vail et al., 2007)
• Medical applications (e.g., Schmidt et al., 2008)
• ...
7
Challenges
Goal: Given data, learn CRF structure and parameters.
[Figure: CRF over Y1-Y5 with evidence X1-X6 and potentials Ψ(Y1, X1), Ψ(Y1, Y3), ...]

Learning is NP-hard in general (Srebro, 2003): a big structured optimization problem.

Many learning methods require inference, i.e., answering queries P(A | B), which is NP-hard even to approximate (Roth, 1996).

Approximations often lack strong guarantees.
8
Thesis Statement
CRFs offer statistical and computational advantages,
but traditional learning methods are often impractical
for large problems.
We can scale learning by using decompositions of
learning problems which trade off sample complexity,
computation, and parallelization.
9
Outline

Scaling core methods:
• Parameter Learning: learning without intractable inference
• Structure Learning: learning tractable structures

Parallel scaling, solved via:
• Parallel Regression: multicore sparse regression
10
Outline

Scaling core methods:
• Parameter Learning: learning without intractable inference
11
Log-linear MRFs
Goal: Model distribution P(X) over random variables X

P(X) ∝ Ψ12(X1, X2) Ψ24(X2, X4) × ⋯

Each factor has parameters θ and features Φ: Ψ12(X1, X2) = exp(θ12ᵀ Φ12(X1, X2)), so overall Pθ(X) ∝ exp(θᵀ Φ(X)).

[Figure: graph over X1-X10.]

All results generalize to CRFs.
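The following is not from the original slides: a minimal NumPy sketch of a log-linear model over binary variables, with the partition function computed by brute-force enumeration. The toy chain and feature map are invented for illustration; the exponential-size sum over x is exactly the quantity that makes exact inference intractable at scale.

```python
import itertools
import numpy as np

def log_potential(theta, phi, x):
    """Unnormalized log-probability: theta^T Phi(x)."""
    return theta @ phi(x)

def partition_function(theta, phi, n_vars):
    """Z_theta = sum over all x of exp(theta^T Phi(x)); cost is 2^n_vars."""
    return sum(np.exp(log_potential(theta, phi, np.array(x)))
               for x in itertools.product([0, 1], repeat=n_vars))

# Hypothetical toy chain X1 - X2 - X3 with pairwise agreement features.
def phi(x):
    return np.array([x[0] == x[1], x[1] == x[2]], dtype=float)

theta = np.array([1.0, 0.5])
Z = partition_function(theta, phi, 3)
p = np.exp(log_potential(theta, phi, np.array([1, 1, 0]))) / Z  # P_theta(x)
```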
12
Parameter Learning: MLE
Pθ(X) ∝ exp(θᵀ Φ(X))

Parameter learning: given the structure Φ and samples from Pθ*(X), learn the parameters θ.

Traditional method: maximum-likelihood estimation (MLE). Minimize the objective:

E_data[−log Pθ(X)] + regularization

where the first term is the loss.

Gold standard: MLE is (optimally) statistically efficient.
13
Parameter Learning: MLE
Pθ(X) = (1/Zθ) exp(θᵀ Φ(X)), where Zθ = Σx exp(θᵀ Φ(x))

MLE requires inference (computing Zθ), which is provably hard for general MRFs (Roth, 1996).

Inference makes learning hard. Can we learn without intractable inference?
15
Parameter Learning: MLE
Pθ(X) = (1/Zθ) exp(θᵀ Φ(X))

Approximate inference & objectives:
• Many works: Hinton (2002), Sutton & McCallum (2005), Wainwright (2006), ...
• Many lack strong theory.
• Almost no guarantees for general MRFs or CRFs.

Inference makes learning hard. Can we learn without intractable inference?
16
Our Solution
Bradley, Guestrin (2012)
Method                                   Sample complexity   Computational complexity   Parallel optimization
Max likelihood estimation (MLE)          Optimal             High                       Difficult
Max pseudolikelihood estimation (MPLE)   High                Low                        Easy

MPLE: PAC learnability for many MRFs!
17
Our Solution
Bradley, Guestrin (2012)
Method                                       Sample complexity   Computational complexity   Parallel optimization
Max likelihood estimation (MLE)              Optimal             High                       Difficult
Max composite likelihood estimation (MCLE)   Low                 Low                        Easy
Max pseudolikelihood estimation (MPLE)       High                Low                        Easy

Choose the MCLE structure to optimize these trade-offs.
19
Deriving Pseudolikelihood (MPLE)
MLE: minθ E_data[−log Pθ(X)]. Hard to compute, so replace it!

P(X) ∝ Ψ12(X1, X2) Ψ24(X2, X4) × ⋯

[Figure: graph over X1-X5 with factors Ψ12, Ψ23, Ψ24, Ψ25, Ψ35.]
20
Deriving Pseudolikelihood (MPLE)
MLE: minθ E_data[−log Pθ(X)]

MPLE: minθ E_data[−Σi log Pθ(Xi | X\i)]   (Besag, 1975)

Each conditional depends only on a variable's neighbors, e.g., P(X1 | X\1) ∝ Ψ12(X1, X2), so each factor can be estimated via regression:

Estimate Ψ12 via regression: minθ12 E_data[−log Pθ12(X1 | X\1)]

Tractable inference!
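As a concrete, hedged illustration of this regression view (not from the slides): a sketch of disjoint MPLE for a fully observed binary pairwise model, where each conditional P(Xi | X\i) is exactly a logistic function of the other variables, so MPLE reduces to d independent logistic regressions. The helper name and regularization setting are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mple_disjoint(X):
    """Disjoint MPLE sketch: fit each conditional P(Xi | all other variables)
    by logistic regression of column i on the remaining columns.
    X is an (n samples) x (d variables) binary matrix."""
    n, d = X.shape
    conditionals = []
    for i in range(d):
        neighbors = np.delete(X, i, axis=1)  # here: all other variables
        clf = LogisticRegression(C=100.0)    # mild L2 regularization
        clf.fit(neighbors, X[:, i])          # needs both 0s and 1s in X[:, i]
        conditionals.append(clf)
    return conditionals
```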
21
Pseudolikelihood (MPLE)
MPLE: minθ E_data[−Σi log Pθ(Xi | X\i)]   (Besag, 1975)

Pros:
• No intractable inference!
• Consistent estimator

Cons:
• Less statistically efficient than MLE (Liang & Jordan, 2008)
• No PAC bounds

PAC = Probably Approximately Correct (Valiant, 1984)
22
Sample Complexity: MLE
Our theorem: bound on n (# training examples needed):

n ≥ const · (1 / (Λmin² ε²)) · log(r / δ)

where Λmin = min eigenvalue of the Hessian of the loss at θ*; r = # parameters (length of θ); δ = probability of failure; ε = parameter error (L1).

Recall: MLE requires intractable inference.
23
Sample Complexity: MPLE
Our theorem: bound on n (# training examples needed):

n ≥ const · (1 / (Λmin² ε²)) · log(r / δ)

where Λmin = mini [min eigenvalue of the Hessian of component i at θ*]; r = # parameters (length of θ); δ = probability of failure; ε = parameter error (L1).

Recall: MPLE inference is tractable, so this gives PAC learnability for many MRFs!
24
Sample Complexity: MPLE
Our theorem: n ≥ const · (1 / (Λmin² ε²)) · log(r / δ), giving PAC learnability for many MRFs!

Related work:
• Ravikumar et al. (2010): regression Yi ~ X with Ising models; the basis of our theory.
• Liang & Jordan (2008): asymptotic analysis of MLE and MPLE; our bounds match theirs.
• Abbeel et al. (2006): the only previous method with PAC bounds for high-treewidth MRFs. We extend their work with an extension to CRFs, algorithmic improvements, and analysis; their method is very similar to MPLE.
25
Trade-offs: MLE & MPLE
Our theorem: bound on n (# training examples needed):

n ≥ const · (1 / (Λmin² ε²)) · log(r / δ)

MLE: larger Λmin, so lower sample complexity but higher computational complexity.
MPLE: smaller Λmin, so higher sample complexity but lower computational complexity.

A sample complexity vs. computational complexity trade-off.
26
Trade-offs: MPLE
Joint optimization for MPLE:
minθ E_data[−log Pθ(X1 | X2) − log Pθ(X2 | X1)]
Lower sample complexity.

Disjoint optimization for MPLE:
minθ E_data[−log Pθ(X1 | X2)]
minθ E_data[−log Pθ(X2 | X1)]
Data-parallel; yields 2 estimates of Ψ12, so average the estimates.

A sample complexity vs. parallelism trade-off.
27
Synthetic CRFs Pθ(X | E)

[Figure: example structures: chains, stars, and grids, with random or associative factors.]

Factor strength = strength of variable interactions.
28
Predictive Power of Bounds
Errors should be ordered: MLE < MPLE < MPLE-disjoint.

[Plot: L1 parameter error ε (lower is better) vs. # training examples (1 to 10,000), for length-4 chains with random factors of fixed strength; curves for MLE, MPLE, and MPLE-disjoint.]
29
Predictive Power of Bounds
MLE & MPLE sample complexity: n ≥ const · (1 / (Λmin² ε²)) · log(r / δ), which predicts ε ∝ 1/Λmin.

[Plot: actual ε (lower is better) vs. 1/Λmin (larger is harder), for length-6 chains with random factors, 10,000 training examples; MLE shown.]
30
Failure Modes of MPLE
Sample complexity: n = O(1 / Λmin²)

How do Λmin(MLE) and Λmin(MPLE) vary for different models? We vary:
• Model diameter
• Factor strength
• Node degree
31
Λmin: Model Diameter

[Plot: Λmin ratio MLE/MPLE (higher = MLE better) vs. model diameter, for chains with associative factors of fixed strength.]

Relative MPLE performance is independent of diameter in chains. (Same for random factors.)
32
Λmin: Factor Strength

[Plot: Λmin ratio MLE/MPLE (higher = MLE better) vs. factor strength, for length-8 chains with associative factors; the ratio grows sharply with strength.]

MPLE performs poorly with strong factors. (Same for random factors, and for star & grid models.)
33
Λmin: Node Degree

[Plot: Λmin ratio MLE/MPLE (higher = MLE better) vs. node degree, for stars with associative factors of fixed strength; the ratio grows with degree. Same for random factors.]

MPLE performs poorly with high-degree nodes.
34
Failure Modes of MPLE
Sample complexity: n = O(1 / Λmin²)

How do Λmin(MLE) and Λmin(MPLE) vary with model diameter, factor strength, and node degree? MPLE suffers with strong factors and high-degree nodes, and we can often fix this!
35
Composite Likelihood (MCLE)
MLE: estimate P(Y) all at once.
MPLE: estimate each P(Yi | Y\i) separately.
Something in between? Composite likelihood (MCLE): estimate P(YAi | Y\Ai) for blocks of variables YAi separately. (Lindsay, 1988)
38
Composite Likelihood (MCLE)
MCLE class: node-disjoint subgraphs which cover the graph. Generalizes MLE and MPLE, with analogous objective, sample complexity, and joint & disjoint optimization.

Choosing MCLE components (e.g., "combs" on grids):
• Trees (tractable inference)
• Follow the structure of P(X)
• Cover star structures
• Cover strong factors
• Choose large components
40
Structured MCLE on a Grid
Grid with associative factors; 10,000 training examples; Gibbs sampling.

[Plots: log-loss ratio (other/MLE, lower is better) and training time (sec) vs. grid size |X|, for MLE, MPLE, and MCLE (combs).]

MCLE (combs) lowers sample complexity... without increasing computation!

MCLE tailored to model structure. Also in thesis: tailoring to correlations in data.
41
Summary: Parameter Learning
• Finite sample complexity bounds for general MRFs, CRFs
• PAC learnability for certain classes
• Empirical analysis
• Guidelines for choosing MCLE structures: tailor to model and data

Method                        Sample complexity   Computational complexity   Parallel optimization
Likelihood (MLE)              Optimal             High                       Difficult
Composite likelihood (MCLE)   Low                 Low                        Easy
Pseudolikelihood (MPLE)       High                Low                        Easy
42
Outline

Scaling core methods:
• Structure Learning: learning tractable structures
43
CRF Structure Learning
P(Y | X) ∝ ∏j Ψj(YCj, XDj)

Structure learning: choose the YC, i.e., learn the conditional independence structure among the Y.
Evidence selection: choose the XD, i.e., select the X relevant to each YC.

[Figure: example with Y1: losing sleep?, Y2: losing hair?, Y3: sick?; evidence X1: loud roommate?, X2: taking classes?, X3: deadline?; potentials Ψ(Y1, X1) and Ψ(Y1, Y3).]
44
Related Work
Previous work            Method                                  Structure learning?   Tractable inference?   Evidence selection?
Torralba et al. (2004)   Boosted Random Fields                   Yes                   No                     Yes
Schmidt et al. (2008)    Block-L1 regularized pseudolikelihood   Yes                   No                     No
Shahaf et al. (2009)     Edge weights + low-treewidth model      Yes                   Yes                    No

Shahaf et al. (2009) is most similar to our work: they focus on selecting treewidth-k structures; we focus on the choice of edge weight.
45
Tree CRFs with Local Evidence
Bradley, Guestrin (2010)
Goal: learn a tree CRF structure via a scalable method, with fast inference at test time.

Given:
• Data
• Local evidence: the Xi relevant to each Yi, with features Φ(Yi, Yj, Xij) where |Xij| << |X|
46
Chow-Liu for MRFs
Chow & Liu (1968)

Algorithm:
1. Weight each edge with mutual information: w(i, j) = I(Yi; Yj)
2. Choose the max-weight spanning tree.

[Figure: triangle over Y1, Y2, Y3 with edge weights I(Y1; Y2), I(Y1; Y3), I(Y2; Y3).]

Chow-Liu finds a max-likelihood structure.
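A hedged sketch, not the thesis code, of Chow-Liu for fully observed discrete data: weight every pair by empirical mutual information, then take a maximum-weight spanning tree. SciPy only ships a minimum spanning tree, so the weights are negated.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_information(a, b):
    """Empirical mutual information between two discrete columns (in nats)."""
    mi = 0.0
    for va in np.unique(a):
        for vb in np.unique(b):
            p_ab = np.mean((a == va) & (b == vb))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(a == va) * np.mean(b == vb)))
    return mi

def chow_liu_tree(Y):
    """Max-weight spanning tree under MI edge weights (Chow & Liu, 1968)."""
    d = Y.shape[1]
    W = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            W[i, j] = mutual_information(Y[:, i], Y[:, j])
    # SciPy computes a *minimum* spanning tree, so negate the weights.
    mst = minimum_spanning_tree(-W)
    return [(int(i), int(j)) for i, j in zip(*mst.nonzero())]
```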
48
Chow-Liu for CRFs?
Algorithm:
1. Weight each possible edge: w(i, j) = ?
2. Choose the max-weight spanning tree.

What edge weight? It must be efficient to compute.

Global conditional mutual information (CMI): w(i, j) = I(Yi; Yj | X)
• Pro: finds a max-likelihood structure (with enough data)
• Con: intractable for large |X|
49
Generalized Edge Weights
Global CMI: w(i, j) = I(Yi; Yj | X) = −H(Yi, Yj | X) + H(Yi | X) + H(Yj | X)

Local Linear Entropy Scores (LLES): w(i, j) = a linear combination of entropies over Yi, Yj, Xi, Xj.

Theorem: No LLES can recover all tree CRFs (even with non-trivial parameters and exact entropies).
50
Heuristic Edge Weights
Global CMI: w(i, j) = I(Yi; Yj | X) = −H(Yi, Yj | X) + H(Yi | X) + H(Yj | X)

Local CMI: w(i, j) = I(Yi; Yj | Xi, Xj) = −H(Yi, Yj | Xi, Xj) + H(Yi | Xi, Xj) + H(Yj | Xi, Xj)

Decomposable Conditional Influence (DCI): w(i, j) = −H(Yi, Yj | Xi, Xj) + H(Yi | Xi) + H(Yj | Xj)

Method       Guarantees                             Compute w(i,j) tractably?   Comments
Global CMI   Recovers true tree                     No                          Shahaf et al. (2009)
Local CMI    Lower-bounds likelihood gain           Yes                         Fails with strong Yi-Xi potentials
DCI          Exact likelihood gain for some edges   Yes                         Best empirically
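To emphasize how cheap these weights are, here is a hedged sketch (not from the slides) of the DCI weight computed from empirical entropies over only (Yi, Yj, Xi, Xj), using the identity H(A | B) = H(A, B) − H(B). Function names are illustrative and the columns are assumed discrete.

```python
import numpy as np

def entropy(*cols):
    """Empirical joint entropy (in nats) of one or more discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def dci_weight(Yi, Yj, Xi, Xj):
    """DCI: -H(Yi,Yj | Xi,Xj) + H(Yi | Xi) + H(Yj | Xj),
    with each conditional entropy expanded as H(A,B) - H(B)."""
    h_pair = entropy(Yi, Yj, Xi, Xj) - entropy(Xi, Xj)
    h_i = entropy(Yi, Xi) - entropy(Xi)
    h_j = entropy(Yj, Xj) - entropy(Xj)
    return -h_pair + h_i + h_j
```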
51
Synthetic Tests
Trees with associative factors; |Y| = 40; 1000 test samples; error bars show 2 standard errors.

[Plot: fraction of true edges recovered (higher is better) vs. # training examples (0 to 500); curves for DCI, Global CMI, Local CMI, and Schmidt et al., with the true CRF at 1.]
52
Synthetic Tests
Trees with associative factors; |Y| = 40; 1000 test samples; error bars show 2 standard errors.

[Plot: training time in seconds (lower is better) vs. # training examples (0 to 500); Global CMI is by far the slowest, while DCI, Local CMI, and Schmidt et al. are much faster.]
53
fMRI Tests
(Application & data from Palatucci et al., 2009)

X: fMRI voxels (500), used to predict Y: semantic features (218).

[Plot: E[log P(Y | X)] (higher is better) for Disconnected (Palatucci et al., 2009), DCI 1, and DCI 2.]
54
Summary: Structure Learning
• Analyzed generalizing Chow-Liu to CRFs
• Proposed a class of edge weights: Local Linear Entropy Scores
• Negative result: insufficient for recovering trees
• Discovered useful heuristic edge weights: Local CMI, DCI
• Promising empirical results on synthetic & fMRI data

Generalized Chow-Liu: compute edge weights, then take the max-weight spanning tree.
55
Outline

Scaling core methods:
• Parameter Learning: pseudolikelihood and the canonical parameterization regress each variable on its neighbors, P(Xi | X\i)
• Structure Learning: generalized Chow-Liu computes edge weights via P(Yi, Yj | Xij)

Parallel scaling, solved via:
• Parallel Regression: multicore sparse regression
56
Sparse (L1) Regression
(Bradley, Kyrola, Bickson, Guestrin, 2011)

• Useful in the high-dimensional setting (# features >> # examples)
• Lasso and sparse logistic regression

Lasso (Tibshirani, 1996). Goal: predict y ∈ ℝ from x ∈ ℝ^d, given samples {(x(i), y(i))}i.

Objective: minw ½‖Xw − y‖₂² + λ‖w‖₁

The L1 penalty biases towards sparse solutions.
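For reference, the objective transcribed directly into code (a trivial sketch, not the authors' implementation):

```python
import numpy as np

def lasso_objective(X, y, w, lam):
    """F(w) = 0.5 * ||Xw - y||_2^2 + lam * ||w||_1."""
    return 0.5 * np.sum((X @ w - y) ** 2) + lam * np.abs(w).sum()
```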
57
Parallelizing Lasso

Many Lasso optimization algorithms: gradient descent, interior point, stochastic gradient, shrinkage, hard/soft thresholding.

Coordinate descent (a.k.a. Shooting; Fu, 1998) is one of the fastest algorithms (Yuan et al., 2010).

Parallel optimization:
• Matrix-vector ops (e.g., interior point): not great empirically
• Stochastic gradient (e.g., Zinkevich et al., 2010): best for many samples, not large d
• Shooting: inherently sequential

Shotgun: parallel coordinate descent for L1 regression. A simple algorithm with an elegant analysis.
58
Shooting: Sequential SCD

minw F(w), where F(w) = ½‖Xw − y‖₂² + λ‖w‖₁

Stochastic Coordinate Descent (SCD):
While not converged:
  choose a random coordinate j;
  update wj (closed-form minimization).
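A hedged NumPy sketch of the Shooting/SCD loop, not the authors' code. The closed-form coordinate update is wj ← S(xjᵀr + ‖xj‖² wj, λ) / ‖xj‖², where S is the soft-threshold operator and r = y − Xw is the residual, maintained incrementally.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def shooting(X, y, lam, iters=10000, seed=0):
    """Stochastic coordinate descent for
    F(w) = 0.5*||Xw - y||^2 + lam*||w||_1 (Fu, 1998).
    Assumes no all-zero columns in X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    r = y - X @ w                  # residual, kept up to date
    col_sq = (X ** 2).sum(axis=0)  # ||x_j||^2 per column
    for _ in range(iters):
        j = rng.integers(d)
        rho = X[:, j] @ r + col_sq[j] * w[j]  # correlation with partial residual
        w_new = soft_threshold(rho, lam) / col_sq[j]
        r += X[:, j] * (w[j] - w_new)         # incremental residual update
        w[j] = w_new
    return w
```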
59
Shotgun: Parallel SCD

minw F(w), where F(w) = ½‖Xw − y‖₂² + λ‖w‖₁

Shotgun algorithm (parallel SCD):
While not converged:
  on each of P processors:
    choose a random coordinate j;
    update wj (same as for Shooting).

Is SCD inherently sequential? Nice case: uncorrelated features. Bad case: correlated features.
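A matching sketch of Shotgun (again, not the authors' code): each round draws P coordinates and updates them from the same stale iterate, mimicking P processors racing on shared state; on a real multicore machine the inner loop would run concurrently.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def shotgun(X, y, lam, P=8, rounds=2000, seed=0):
    """Shotgun (parallel SCD) sketch: P coordinate updates per round,
    all computed from the same stale residual, as if on P cores.
    Requires P <= d and no all-zero columns in X."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(rounds):
        r = y - X @ w  # shared (stale) state for this round's P updates
        for j in rng.choice(d, size=P, replace=False):
            rho = X[:, j] @ r + col_sq[j] * w[j]
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w
```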
60
Shotgun: Theory

minw F(w), where F(w) = ½‖Xw − y‖₂² + λ‖w‖₁, x ∈ ℝ^d

Convergence theorem. Assume the number of parallel updates satisfies P ≤ d/ρ, where ρ is the spectral radius of XᵀX. Then after T iterations:

E[F(w(T))] − F(w*) ≤ d · (½‖w*‖₂² + 2F(w(0))) / (T · P)

(expected final objective minus optimal objective)

This generalizes bounds for Shooting (Shalev-Shwartz & Tewari, 2009).
61
Shotgun: Theory

minw F(w), where F(w) = ½‖Xw − y‖₂² + λ‖w‖₁

Convergence theorem. Assume P ≤ d/ρ (P = # parallel updates), where ρ = spectral radius of XᵀX. Then:

E[F(w(T))] − F(w*) ≤ d · (½‖w*‖₂² + 2F(w(0))) / (T · P)

Nice case (uncorrelated features): ρ = 1, so Pmax = d.
Bad case (correlated features): ρ = d, so Pmax = 1 (at worst).
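The theorem's parallelism threshold is simple to compute for a given design matrix; a small helper, assuming the columns of X are normalized as in the analysis:

```python
import numpy as np

def shotgun_pmax(X):
    """P_max = d / rho, where rho = spectral radius of X^T X."""
    d = X.shape[1]
    rho = np.linalg.eigvalsh(X.T @ X).max()  # X^T X is symmetric PSD
    return max(1, int(d / rho))
```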
62
Shotgun: Theory

Convergence theorem: assume P ≤ d/ρ; then E[F(w(T))] − F(w*) ≤ d · (½‖w*‖₂² + 2F(w(0))) / (T · P). Up to the threshold Pmax = d/ρ, linear speedups are predicted.

Experiments match our theory!

[Plots: T (iterations to convergence) vs. P (parallel updates), log-log scale. Mug32_singlepixcam (d = 1024): Pmax = 79. SparcoProblem7 (d = 2560): Pmax = 284.]
63
Lasso Experiments

Compared many algorithms:
• Interior point (L1_LS)
• Shrinkage (FPC_AS, SpaRSA)
• Projected gradient (GPSR_BB)
• Iterative hard thresholding (Hard_l0)
• Also ran: GLMNET, LARS, SMIDAS

Compared against Shooting and Shotgun with P = 8 (multicore); λ = 0.5, 10.

Datasets (35 total):
• Single-pixel camera: Pmax = 1
• Sparco (van den Berg et al., 2009): Pmax ∈ [1, 8683]
• Sparse compressed imaging: Pmax ∈ [1432, 5889]
• Large, sparse datasets: Pmax ∈ [107, 1036]

Shotgun proves most scalable & robust.
64
Shotgun: Speedup

[Plot: speedup vs. # cores (1 to 8), aggregated over all tests, against the optimal (linear) line; curves for Lasso iteration speedup, logistic regression time speedup, and Lasso time speedup.]

Lasso time speedups are not so great on average, but we are doing fewer iterations! Explanation: the memory wall (Wulf & McKee, 1995); the memory bus gets flooded. Logistic regression uses more FLOPS per datum, so the extra computation hides memory latency, giving better speedups.
65
Summary: Parallel Regression
• Shotgun: parallel coordinate descent on multicore
• Analysis: near-linear speedups, up to problem-dependent limit
• Extensive experiments (37 datasets, 7 other methods)
• Our theory predicts empirical behavior well.
• Shotgun is one of the most scalable methods.
Shotgun: decompose the computation by coordinate updates; trade a little extra computation for a lot of parallelism.
66
Recall: Thesis Statement
We can scale learning by using decompositions of learning problems which trade off sample complexity, computation, and parallelization.

• Decompositions use model structure & locality.
• Trade-offs use model- and data-specific methods.

Parameter learning: structured composite likelihood (MLE, MCLE, MPLE).
Structure learning: generalized Chow-Liu (edge weights and a max-weight spanning tree).
Parallel regression: Shotgun, parallel coordinate descent.
67
Future Work: Unified System

Structure learning (L1 structure learning; learning trees):
• Use structured MCLE?
• Learn trees for parameter estimators?

Parameter learning (structured MCLE): automatically choose the MCLE structure & parallelization strategy to optimize trade-offs, tailored to model & data.

Parallel regression (Shotgun, multicore to distributed): limited communication in the distributed setting; handle complex objectives (e.g., MCLE).
68
Summary

We can scale learning by using decompositions of learning problems which trade off sample complexity, computation, and parallelization.

Parameter learning:
• Structured composite likelihood
• Finite sample complexity bounds
• Empirical analysis
• Guidelines for choosing MCLE structures: tailor to model, data
• Analyzed the canonical parameterization of Abbeel et al. (2006)

Structure learning:
• Generalizing Chow-Liu to CRFs
• Proposed a class of edge weights: Local Linear Entropy Scores (insufficient for recovering trees)
• Discovered useful heuristic edge weights: Local CMI, DCI
• Promising empirical results on synthetic & fMRI data

Parallel regression:
• Shotgun: parallel coordinate descent on multicore
• Analysis: near-linear speedups, up to a problem-dependent limit
• Extensive experiments (37 datasets, 7 other methods)
• Our theory predicts empirical behavior well; Shotgun is one of the most scalable methods.

Thank you!
69