A Draft White Paper Comparing SVM and FAUST Classifiers
In this paper, we make a rough comparison of Support Vector Machines (SVM) versus
Functional Analytic Unsupervised and Supervised Technology (FAUST) Classifiers. The first
big difference is that SVM uses horizontally structured data and processes it vertically, while
FAUST uses vertically structured data (down to the bit-slice level) and processes it horizontally.
Big data usually means many rows (trillions) and only a few columns (tens, hundreds, or
thousands). Therefore, FAUST loops require many orders of magnitude fewer passes than SVM
processing loops. And even though each FAUST loop pass involves trillions of bits while each
SVM loop pass involves only tens, hundreds, or thousands of values, this difference does not
offset the difference in the number of loop passes, since massive bit strings are processed on the
metal, and such processing has become extremely fast (e.g., with GPUs instead of CPUs). The
following short description of SVM closely resembles the description in Wikipedia.
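As a rough illustration of what vertical, bit-sliced structuring means, here is a toy sketch in Python
with NumPy (the actual pTree encoding used by FAUST is not spelled out in this draft, so the layout
below is only an assumption for illustration):

    import numpy as np

    # Toy example: one 8-bit column of a table, stored vertically as 8 bit slices
    # instead of being read row by row.
    column = np.array([13, 200, 7, 255, 64], dtype=np.uint8)

    # Slice k holds bit k of every value in the column.
    bit_slices = [(column >> k) & 1 for k in range(8)]

    # Whole-column work then becomes a handful of bitwise passes over these slices
    # rather than one pass per row; here we simply reconstruct the column to show
    # that no information is lost.
    reconstructed = sum(bit_slices[k].astype(np.uint64) << k for k in range(8))
    assert np.array_equal(reconstructed, column)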
In Support Vector Machines, given a set of training examples, each belonging to one of two
categories, an SVM training algorithm builds a model that assigns new examples into one
category or the other. SVM represents points in space so that the examples of the separate
categories are divided by a clear linear gap that is as wide as possible. New examples are then
mapped into that same space and predicted to belong to a category based on which side of the
gap they fall on. In addition to performing linear classification, SVMs can perform a non-linear
classification using the kernel trick, implicitly mapping their inputs into high-dimensional
spaces. To use SVM for 1-class classification, the kernel trick is required, since a single class is
almost never linearly separable from its complement. FAUST does not have this shortcoming.
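For concreteness, a minimal two-class and 1-class SVM fit can be sketched with scikit-learn; the
library, the toy data, and the parameter choices below are illustrative assumptions only:

    import numpy as np
    from sklearn.svm import SVC, OneClassSVM

    rng = np.random.default_rng(0)
    # Two toy classes in the plane.
    X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
    X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
    X = np.vstack([X_pos, X_neg])
    y = np.array([1] * 50 + [-1] * 50)

    # Two-class case: a maximum-margin linear separator suffices.
    clf = SVC(kernel="linear").fit(X, y)
    print(clf.predict([[1.5, 1.8], [-3.0, -2.5]]))    # expected: [ 1 -1]

    # 1-class case: a kernel (here RBF) is needed, as noted above.
    occ = OneClassSVM(kernel="rbf", nu=0.1).fit(X_pos)
    print(occ.predict([[2.1, 1.9], [-4.0, -4.0]]))    # expected: [ 1 -1]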
More formally, SVM constructs a set of hyper-planes in a high dimensional space, which can be
used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the
hyperplane that has the largest distance to the nearest training data point of any class (so-called
functional margin), since in general the larger the margin the lower the error of the classifier.
Whereas the original problem may be stated in a low-dimensional space, it often happens that the
sets to discriminate are not linearly separable in that space. For this reason, it was proposed that
the original low-dimensional space be mapped into a much higher-dimensional space,
presumably making the separation easier in that space. To keep the computational load
reasonable, the mappings used by SVM schemes are designed to ensure that dot products may be
computed easily in terms of the variables in the original space, by defining them in terms of a
kernel function k(x, y) selected to suit the problem. The hyper-planes in the higher-dimensional
space are defined as the set of points whose dot product with a vector in that space is constant.
The vectors defining the hyper-planes can be chosen to be linear combinations, with parameters
αi, of images of feature vectors xi that occur in the data base. With this choice of a hyper-plane,
the points x in the feature space that are mapped into the hyper-plane are defined by the relation
Σi αi k(xi, x) = constant.
Note that if k(x, y) becomes small as y grows further away from x, each term in the sum
measures the degree of closeness of the test point x to the corresponding data base point xi. So
the sum of kernels above can be used to measure the relative nearness of each test point to the
data points originating in one or the other of the sets to be discriminated.
For Big Data, if one has to use a kernel, the computation cost (time) may become prohibitive,
whether the data is structured horizontally or vertically.
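The "sum of kernels" nearness measure just described can also be written out directly; the RBF
kernel and the placeholder weights below are assumptions for illustration (in a trained SVM the
weights would come from the optimization):

    import numpy as np

    def rbf_kernel(x, y, gamma=1.0):
        # k(x, y) becomes small as y grows further away from x.
        return np.exp(-gamma * np.sum((x - y) ** 2))

    def kernel_score(y, train_points, alphas, gamma=1.0):
        # Sum_i alpha_i * k(x_i, y): relative nearness of y to the training points.
        return sum(a * rbf_kernel(x, y, gamma) for a, x in zip(alphas, train_points))

    train_points = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
    alphas = np.array([1.0, 1.0, -1.0])   # placeholder weights, not trained values
    print(kernel_score(np.array([0.5, 0.5]), train_points, alphas))   # dominated by the first two points
    print(kernel_score(np.array([5.2, 4.9]), train_points, alphas))   # dominated by the third point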
(Figure: three candidate hyperplanes. H1 does not separate the classes; H2 does, but only with a
small margin; H3 separates them with the maximum margin.)
Functional Analytic Unsupervised and Supervised Technology (FAUST) Classifiers work for 1-class, two-class, and multi-class classification directly. The idea is to “house” or circumscribe
each class in a tight hull which is constructed so that it is very easy to determine if a point lies in
a hull or not. For convex classes, the so-called convex hull is mathematically the tightest hull
possible. It turns out that, the recursively constructed FAUST hulls are often even better than the
convex hull when the class is non-convex. FAUST class-hulls are piecewise hulls in the sense
that they are made up of a series of pairs of “boundary pieces” (n-1 dimensional pieces, if the
space is n dimensional) that fit up against the class tightly. These boundary pairs are linear when
using the dot product functional, L; round when using the spherical functional, S; and tubular
when using the radial functional, R. These functionals are:
Let X = X(X1,...,Xn) ⊆ Rⁿ with |X| = N. The classes are {C1,...,CK}.
Let d = (d1,...,dn), p = (p1,...,pn) ∈ Rⁿ with |d| = 1, and let o denote the dot product.
Ld ≡ Xod is a single column of numbers (bit sliced), and so are
Ld,p ≡ (X-p)od = Xod - pod = Ld - pod,
Sp ≡ (X-p)o(X-p) = XoX + Xo(-2p) + pop,
Rd,p ≡ Sp - Ld,p².
The FAUST classifier then assigns y to class Ck iff it is in the hull of class Ck, as follows:
y ∈ Ck iff y ∈ Hk ≡ {z | Fmind,p,k ≤ (z-p)od ≤ Fmaxd,p,k, ∀(d,p) from dpSet},
where F ranges over L, S and R. Fmin is the minimum and Fmax is the maximum value in the
respective column. dpSet is a set of unit vectors and points (used to define projection lines in the
direction of d through p, for the functionals). Typically, dpSet would include all the standard
basis unit vectors, so that L is just a column of X and requires no computation. In general, the
bigger dpSet is, the better (the tighter the hull).
Once the Fmin and Fmax values have been computed (using bit string processing on FAUST’s
massive vertical bit slice structures), the determination of whether y is in a class Ck or not
involves simple numeric comparisons.
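A minimal sketch of this hull test, using ordinary NumPy arrays rather than bit-sliced pTrees (the
function names, the choice of dpSet, and the toy data below are illustrative assumptions, not the
actual FAUST implementation):

    import numpy as np

    def build_hull(class_points, dp_set):
        # For each (d, p), record the [Fmin, Fmax] interval of L, S and R on the class.
        hull = []
        for d, p in dp_set:
            L = (class_points - p) @ d                    # linear functional Ld,p
            S = np.sum((class_points - p) ** 2, axis=1)   # spherical functional Sp
            R = S - L ** 2                                # radial (tubular) functional Rd,p
            hull.append((d, p, (L.min(), L.max()), (S.min(), S.max()), (R.min(), R.max())))
        return hull

    def in_hull(y, hull):
        # y is accepted only if every functional value lies inside its [Fmin, Fmax] interval.
        for d, p, (lmin, lmax), (smin, smax), (rmin, rmax) in hull:
            L = (y - p) @ d
            S = np.sum((y - p) ** 2)
            R = S - L ** 2
            if not (lmin <= L <= lmax and smin <= S <= smax and rmin <= R <= rmax):
                return False
        return True

    # Toy use: dpSet built from the standard basis vectors through the class mean.
    Ck = np.random.default_rng(1).normal(size=(100, 3))
    dp_set = [(np.eye(3)[i], Ck.mean(axis=0)) for i in range(3)]
    hull = build_hull(Ck, dp_set)
    print(in_hull(Ck[0], hull), in_hull(np.array([10.0, 10.0, 10.0]), hull))   # True False

With d a standard basis vector and p the class mean, Ld,p is just a (shifted) column of X, which
matches the remark above that the standard basis directions require no extra computation.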
(Figure: some typical FAUST boundary pieces, linear only.)
Some advantages of FAUST (over SVM) include:
1. FAUST scales well to very large datasets, both large cardinality and large dimensionality.
2. No translation or kernelization is ever required.
3. Building the hull models of each class is fast using pTree technology.
4. Applying the model is very fast (requiring only a series of number comparisons).
5. Elimination of False Positives can always be done by adding more boundary pieces (using more
(d,p) pairs).
6. 1-class classification and multi-class (k-class) classification are done the same way.
7. The model building phase is highly parallelizable.
8. The model can constantly be incrementally improved, even while it is in use (by adding more
boundary pairs).
9. If the training set is suspected to be somewhat uncharacteristic or inadequate (in terms of
faithfully representing classes), each boundary pair can be moved to the first and last Precipitous
Count Change in the functional instead of the min and max values (one possible reading of this is
sketched after this list). This will eliminate outlier values and may provide a more accurate model
(this corresponds to Lagrange noise elimination in SVM).
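The Precipitous Count Change rule in item 9 is not spelled out in this draft. One plausible reading,
sketched below under that assumption, is to histogram the functional values and take the first and
last bins whose counts rise above a small threshold, instead of the raw min and max:

    import numpy as np

    def pcc_bounds(values, bins=64, min_count=3):
        # Hypothetical outlier-trimming bounds: use the first and last histogram bin
        # whose count reaches min_count rather than the raw min and max.
        counts, edges = np.histogram(values, bins=bins)
        dense = np.flatnonzero(counts >= min_count)
        if dense.size == 0:
            return values.min(), values.max()   # fall back to the raw bounds
        return edges[dense[0]], edges[dense[-1] + 1]

    # A column of functional values with one far-off outlier.
    vals = np.concatenate([np.random.default_rng(2).normal(size=1000), [50.0]])
    print(vals.min(), vals.max())   # raw bounds, stretched by the outlier
    print(pcc_bounds(vals))         # trimmed bounds, ignoring the isolated value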