NLP in Scala with Breeze and Epic
David Hall
UC Berkeley
ScalaNLP Ecosystem
• Breeze ≈ Numpy/Scipy
  – Linear Algebra
  – Scientific Computing
  – Optimization
• Epic ≈ PyStruct/NLTK
  – Natural Language Processing
  – Structured Prediction
• Puck ≈ super-fast GPU parser for English
Natural Language Processing
[Figure: constituency parse tree (S, NP, VP, PP, …) of the sentence below]
Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.
It certainly won’t get there on looks.
Epic for NLP
[Figure: the same constituency parse tree, as produced by Epic]
Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.
Named Entity Recognition
Entity types: Person, Organization, Location, Misc
NER with Epic
> import epic.models.NerSelector
> val nerModel = NerSelector.loadNer("en").get
> val tokens = epic.preprocess.tokenize(
    "Almost 20 years ago, Bill Watterson walked away from \"Calvin & Hobbes.\"")
> println(nerModel.bestSequence(tokens).render("O"))
Almost 20 years ago , [PER:Bill Watterson] walked away from `` [LOC:Calvin & Hobbes] . ''
Not a location!
Annotate a bunch of data?
Building an NER system
> val data: IndexedSeq[Segmentation[Label, String]] = ???
> val system = SemiCRF.buildSimple(data, startLabel, outsideLabel)
> println(system.bestSequence(tokens).render("O"))
Almost 20 years ago , [PER:Bill Watterson] walked away from `` [MISC:Calvin & Hobbes] . ''
Gazetteers
http://en.wikipedia.org/wiki/List_of_newspaper_comic_strips_A%E2%80%93F
Using your own gazetteer
> val data: IndexedSeq[Segmentation[Label, String]] = ???
> val myGazetteer = ???
> val system = SemiCRF.buildSimple(data, startLabel, outsideLabel,
                                   gaz = myGazetteer)
Gazetteer
• Careful with gazetteers!
• If built from your training data, the system will learn to use it, and only it, to make predictions!
• So only known forms will be detected.
• Still, they can be very useful… (a rough illustration of building one follows)
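As a rough illustration only (this is not Epic's actual Gazetteer type): a gazetteer can be as simple as a set of known phrases loaded from a word list.

import scala.io.Source

// Hypothetical helper: read one phrase per line, tokenized on whitespace.
def loadPhrases(path: String): Set[IndexedSeq[String]] =
  Source.fromFile(path).getLines()
    .map(line => line.trim.split("\\s+").toIndexedSeq)
    .toSet

// e.g. built from the Wikipedia list linked above:
// val comicStrips = loadPhrases("comic_strips.txt") // hypothetical file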
Semi-CR-What?
• Semi-Markov Conditional Random Field
• Don’t worry about the name.
Semi-CRFs
score(Chez Panisse)
+ score(Berkeley, CA)
+ score(- A bowl of )
+ score(Churchill-Brenneis Orchards)
+ score(Page mandarins and medjool dates)
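Concretely, the model scores a whole segmentation as the sum of its per-segment scores (label-to-label transition terms are omitted in this minimal statement):

score(x, segmentation) = sum over segments (begin, end, label) of score(x, begin, end, label)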
Features
score(Chez Panisse) = w(starts-with-Chez)
                    + w(starts-with-C…)
                    + w(ends-with-P…)
                    + w(starts-sentence)
                    + w(shape:Xxx Xxx)
                    + w(two-words)
                    + w(in-gazetteer)
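A minimal sketch of that computation (the feature names and weight values here are illustrative, not Epic's internals):

val w = Map(
  "starts-with-Chez" -> 1.2,
  "shape:Xxx Xxx"    -> 0.8,
  "two-words"        -> 0.3,
  "in-gazetteer"     -> 2.0
)

// a span's score is the sum of the learned weights of its active features
def score(activeFeatures: Seq[String]): Double =
  activeFeatures.map(f => w.getOrElse(f, 0.0)).sum

score(Seq("starts-with-Chez", "shape:Xxx Xxx", "two-words", "in-gazetteer"))
// 4.3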
Building your own features
val dsl = new WordFeaturizer.DSL[L](counts) with SurfaceFeaturizer.DSL
import dsl._

word(begin)            // word at the beginning of the span
  + word(end - 1)      // word at the end of the span
  + word(begin - 1)    // word before the span (gets things like Mr.)
  + word(end)          // word one past the end
  + prefixes(begin)    // prefixes up to some length
  + suffixes(begin)    // suffixes up to some length
  + length(begin, end) // names tend to be 1-3 words
  + gazetteer(begin, end)
Using your own featurizer
> val data: IndexedSeq[Segmentation[Label, String]] = ???
> val myFeaturizer = ???
> val system = SemiCRF.buildSimple(data, startLabel, outsideLabel,
                                   featurizer = myFeaturizer)
Features
• So far, we’ve been able to do everything with (nearly) no math.
• To understand more, we need to do some math.
Machine Learning Primer
• Training example (x, y)
– x: sentence of some sort
– y: labeled version
• Goal:
– want score(x, y) > score(x, y'), for all y' ≠ y
Machine Learning Primer
score(x, y) = wᵀ f(x, y)
Machine Learning Primer
score(x, y) = w.t * f(x, y)
Machine Learning Primer
score(x, y) = w dot f(x, y)
Machine Learning Primer
score(x, y) >= score(x, y’)
Machine Learning Primer
w dot f(x, y) >= w dot f(x, y’)
Machine Learning Primer
[Figure: vector diagram of w, f(x,y), and f(x,y'); adding f(x,y) rotates w toward f(x,y)]
(w + f(x,y)) dot f(x,y) >= w dot f(x,y)
(the difference is f(x,y) dot f(x,y), which is always >= 0)
The Perceptron
> val featureIndex: Index[Feature] = ???
> val labelIndex: Index[Label] = ???
> val weights = DenseVector.rand[Double](featureIndex.size)
> for ( epoch <- 0 until numEpochs; (x, y) <- data ) {
    val labelScores = DV.tabulate(labelIndex.size) { yy =>
      val features = featuresFor(x, yy)
      weights.t * new FeatureVector(features)
      // or: weights dot new FeatureVector(features)
    }
    …
  }
The Perceptron (cont’d)
> val featureIndex: Index[Feature] = ???
> val labelIndex: Index[Label] = ???
> val weights = DenseVector.rand[Double](featureIndex.size)
> for ( epoch <- 0 until numEpochs; (x, y) <- data ) {
    val labelScores = ...
    val y_best = argmax(labelScores)
    if (y != y_best) {
      weights += new FeatureVector(featuresFor(x, y))
      weights -= new FeatureVector(featuresFor(x, y_best))
    }
  }
Structured Perceptron
• Can’t enumerate all segmentations! (there are on the order of L · 2ⁿ of them)
• But dynamic programs exist to max or sum over them (see the sketch below)…
• …if the feature function has a nice form.
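A minimal sketch of the max version of that dynamic program (semi-Markov Viterbi), assuming a hypothetical scoreSegment(begin, end, label) and ignoring label-to-label transition scores:

// best(end) = best score of any segmentation of tokens 0 until end
def bestSegmentationScore(n: Int, numLabels: Int, maxLength: Int,
                          scoreSegment: (Int, Int, Int) => Double): Double = {
  val best = Array.fill(n + 1)(Double.NegativeInfinity)
  best(0) = 0.0
  for {
    end   <- 1 to n
    begin <- math.max(0, end - maxLength) until end
    label <- 0 until numLabels
  } best(end) = math.max(best(end), best(begin) + scoreSegment(begin, end, label))
  best(n)
}

This runs in O(n · maxLength · numLabels) time instead of exponential.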
Structured Perceptron
> val featureIndex: Index[Feature] = ???
> val labelIndex: Index[Label] = ???
> val weights = DenseVector.rand[Double](featureIndex.size)
> for ( epoch <- 0 until numEpochs; (x, y) <- data ) {
    val y_best = bestStructure(weights, x)
    if (y != y_best) {
      weights += featuresFor(x, y)
      weights -= featuresFor(x, y_best)
    }
  }
Constituency Parsing
[Figure: constituency parse tree of the sentence below]
Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.
Multilingual Parser
[Bar chart: parsing accuracy (F1, axis 70–95) across languages, “Berkeley” parser vs. Epic]
“Berkeley”: [Petrov & Klein, 2007]; Epic: [Hall, Durrett, and Klein, 2014]
Epic Pre-built Models
• Parsing
  – English, Basque, French, German, Swedish, Polish, Korean
  – (working on Arabic, Chinese, Spanish)
• Part-of-Speech Tagging
  – English, Basque, French, German, Swedish, Polish
• Named Entity Recognition
  – English
• Sentence segmentation
  – English
  – (OK support for others)
• Tokenization
  – all of the above languages
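Loading a pre-built model mirrors the NerSelector call shown earlier; a sketch for the English parser (assuming the matching epic model artifact is on the classpath):

> import epic.models.ParserSelector
> val parser = ParserSelector.loadParser("en").get
> val tokens = epic.preprocess.tokenize("The Fuji could someday tumble the Red Delicious.")
> println(parser(tokens))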
What Is Breeze?
Dense Vectors, Matrices, Sparse Vectors,
Counters, Matrix Decompositions
What Is Breeze?
Nonlinear Optimization,
Probability Distributions
Getting started
libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "0.8.1",
  // optional: native linear algebra libraries
  // (bulky, but faster)
  "org.scalanlp" %% "breeze-natives" % "0.8.1"
)

scalaVersion := "2.11.1"
Linear Algebra
> import breeze.linalg._
> val x = DenseVector.zeros[Int](5)
// DenseVector(0, 0, 0, 0, 0)
> val m = DenseMatrix.zeros[Int](5,5)
> val r = DenseMatrix.rand(5,5)

> m.t    // transpose
> x + x  // addition
> m * x  // multiplication by vector
> m * 3  // by scalar
> m * m  // by matrix
> m :* m // element-wise multiplication, Matlab's .*
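A couple more common operations in the same vein (dot products and norms):

> val v = DenseVector(3.0, 4.0)
> v dot v  // 25.0
> norm(v)  // 5.0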
Return Type Selection
> import breeze.linalg.{DenseVector => DV}
> val dv = DV(1.0, 2.0)
> val sv = SparseVector.zeros[Double](2)
> dv + sv
// res: DenseVector[Double] = DenseVector(1.0, 2.0)
> sv + sv
// res: SparseVector[Double] = SparseVector()
Return Type Selection
> import breeze.linalg.{DenseVector => DV}
> val dv = DV(1.0, 2.0)
> val sv = SparseVector.zeros[Double](2)
> (dv: Vector[Double]) + (sv: Vector[Double])
// res: Vector[Double] = DenseVector(1.0, 2.0)
//      (static type: Vector; dynamic type: Dense)
> (sv: Vector[Double]) + (sv: Vector[Double])
// res: Vector[Double] = SparseVector()
//      (static type: Vector; dynamic type: Sparse)
Linear Algebra: Slices
> val m = DenseMatrix.zeros[Int](5, 5)
> m(::, 1) // slice a column
// DV(0, 0, 0, 0, 0)
> m(4, ::) // slice a row
> m(4, ::) := DV(1, 2, 3, 4, 5).t
> m.toString
0  0  0  0  0
0  0  0  0  0
0  0  0  0  0
0  0  0  0  0
1  2  3  4  5
Linear Algebra: Slices
> m(3 to 4, 1 to 2).toString
0  0
2  3
> m(IndexedSeq(3,1,4,2), IndexedSeq(4,4,3,1))
0  0  0  0
0  0  0  0
5  5  4  2
0  0  0  0
Universal Functions
> import breeze.numerics._
> log(DenseVector(1.0, 2.0, 3.0, 4.0))
// DenseVector(0.0, 0.6931471805599453,
//             1.0986122886681098, 1.3862943611198906)
> exp(DenseMatrix((1.0, 2.0), (3.0, 4.0)))
> sin(Array(2.0, 3.0, 4.0, 42.0))
// also: cos, sqrt, asin, floor, round, digamma, trigamma, ...
UFuncs: In-Place
> import breeze.numerics._
> val v = DV(1.0, 2.0, 3.0, 4.0)
> log.inPlace(v)
// v == DenseVector(0.0, 0.6931471805599453,
// 1.0986122886681098, 1.3862943611198906)
UFuncs: Reductions
> sum(DenseVector(1.0, 2.0, 3.0))
// 6.0
> sum(DenseVector(1, 2, 3))
// 6
> mean(DenseVector(1.0, 2.0, 3.0))
// 2.0
> mean(DenseMatrix((1.0, 2.0, 3.0),
                   (4.0, 5.0, 6.0)))
// 3.5
UFuncs: Reductions
> val m = DenseMatrix((1.0, 3.0), (4.0, 4.0))
> sum(m) == 12.0
> mean(m) == 3.0
> sum(m(*, ::)) == DenseVector(4.0, 8.0)
> sum(m(::, *)) == DenseMatrix((5.0, 7.0))
> mean(m(*, ::)) == DenseVector(2.0, 4.0)
> mean(m(::, *)) == DenseMatrix((2.5, 3.5))
Broadcasting
> val dm = DenseMatrix((1.0, 2.0, 3.0),
                       (4.0, 5.0, 6.0))
> sum(dm(::, *)) == DenseMatrix((5.0, 7.0, 9.0))
> dm(::, *) + DenseVector(3.0, 4.0)
// DenseMatrix((4.0, 5.0, 6.0), (8.0, 9.0, 10.0))
> dm(::, *) := DenseVector(3.0, 4.0)
// DenseMatrix((3.0, 3.0, 3.0), (4.0, 4.0, 4.0))
UFuncs: Implementation
> object log extends UFunc
> implicit object logDouble extends log.Impl[Double, Double] {
    def apply(x: Double) = scala.math.log(x)
  }
> log(1.0) // == 0.0

// elsewhere: magic to automatically make this work:
> log(DenseVector(1.0)) // == DenseVector(0.0)
> log(DenseMatrix(1.0)) // == DenseMatrix(0.0)
> log(SparseVector(1.0)) // == SparseVector() (the resulting 0.0 isn’t stored)
UFuncs: Implementation
> object add1 extends UFunc
> implicit object add1Double extends add1.Impl[Double, Double] {
    def apply(x: Double) = x + 1.0
  }
> add1(1.0) // == 2.0

// elsewhere: magic to automatically make this work:
> add1(DenseVector(1.0)) // == DenseVector(2.0)
> add1(DenseMatrix(1.0)) // == DenseMatrix(2.0)
> add1(SparseVector(1.0)) // == SparseVector(2.0)
Breeze Benchmarks
Singular Value Decomposition (lower is better)
[Bar chart: Breeze 685.5, jblas 135.04; Mahout and Apache Commons 2151 and 2626]
Breeze Benchmarks
Sparse/Dense Multiply (lower is better)
[Bar chart: Breeze 26, jblas N/A, Mahout 56, Apache Commons 2562]
Breeze Benchmarks
Sparse/Dense Addition (lower is better)
[Bar chart: Breeze 0.037, jblas N/A, Mahout 0.075, Apache Commons 25.376]
Nonnegative Matrix Factorization
V ≈ W H,   with Vij, Wij, Hij ≥ 0
Nonnegative Matrix Factorization
• Input: matrix V
• Output: W * H ≈ V, W and H ≥ 0
[Lee and Seung, 2001]
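The multiplicative updates from that paper, which the code below implements (:* and :/ denote element-wise multiplication and division, as in Breeze):

H ← H :* (Wᵀ V) :/ (Wᵀ W H)
W ← W :* (V Hᵀ) :/ (W H Hᵀ)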
Breeze NMF
val n = V.rows
val m = V.cols
val r = dims
val W = DenseMatrix.rand(n, r)
val H = DenseMatrix.rand(r, m)
for (i <- 0 until iters) {
  // multiplicative updates; :*= and :/= are element-wise, in place
  H :*= ((W.t * V) :/= (W.t * W * H))
  W :*= ((V * H.t) :/= (W * H * H.t))
}
(W, H)
From Breeze to Gust
Breeze NMF
val n = X.rows
val m = X.cols
val r = dims
val W = DenseMatrix.rand(n, r)
val H = DenseMatrix.rand(r, m)
for (i <- 0 until iters) {
  H :*= ((W.t * X) :/= (W.t * W * H))
  W :*= ((X * H.t) :/= (W * H * H.t))
}
(W, H)
Gust NMF
val n = X.rows
val m = X.cols
val r = dims
val W = CuMatrix.rand(n, r)   // same code, just CuMatrix instead of DenseMatrix
val H = CuMatrix.rand(r, m)
for (i <- 0 until iters) {
  H :*= ((W.t * X) :/= (W.t * W * H))
  W :*= ((X * H.t) :/= (W * H * H.t))
}
(W, H)
// ≥6x speedup on my laptop!
Optimization
(x² + y − 11)² + (x + y² − 7)²   (Himmelblau’s function)
Optimization
trait DiffFunction[T] extends (T => Double) {
  /** Calculates both the value
    * and the gradient at a point */
  def calculate(x: T): (Double, T)
}
Optimization
val df = new DiffFunction[DV[Double]] {
  def calculate(values: DV[Double]) = {
    val gradient = DV.zeros[Double](2)
    val (x, y) = (values(0), values(1))
    val value = pow(x * x + y - 11, 2) +
                pow(x + y * y - 7, 2)
    gradient(0) = 4 * x * (x * x + y - 11) +
                  2 * (x + y * y - 7)
    gradient(1) = 2 * (x * x + y - 11) +
                  4 * y * (x + y * y - 7)
    (value, gradient)
  }
}
Optimization
breeze.optimize.minimize(df, DV(0.0, 0.0))
// DenseVector(3.0000000000012905, 1.999999999997128)
Optimization
• Fully configurable (see the sketch below)
• Support for:
  – LBFGS (+ L2/L1 regularization)
  – Stochastic gradient (+ L2/L1 regularization)
  – Truncated Newton
  – Linear programs
  – Conjugate gradient
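For example, instead of the minimize shorthand above, you can configure the optimizer directly; a sketch (the maxIter and m values here are illustrative):

> import breeze.linalg.{DenseVector => DV}
> import breeze.optimize.LBFGS
> val lbfgs = new LBFGS[DV[Double]](maxIter = 100, m = 5)
> val optimum = lbfgs.minimize(df, DV(0.0, 0.0))
// ≈ DenseVector(3.0, 2.0)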
Thanks!
Breeze
Epic
Puck
www.github.com/dlwh/{breeze, epic, puck}