Sketch Recognition of Digital Logical Circuits

University of California, San Diego
Computer Science and Engineering
Masters Project
David Johnston
djohnsto@cs.ucsd.edu
Advisor:
Christine Alvarado
18 March 2013
Abstract
Two tasks are prevalent in most sketch recognition systems - grouping
and recognition. We describe a novel grouping algorithm which clusters
pen strokes into symbols. This algorithm is unique because it allows
a feedback loop which can improve the performance of other phases in
recognition. In our experiments, we improve the percentage of perfectly
grouped symbols from 76.7% to 94.2%. Additionally, a state-of-the-art
symbol classification technique was improved for use in the domain of
digital logical circuits. On average, this technique improved the accuracy
of symbol classification in the “factory trained” setting from 80.8% to
87.16%.
1 Introduction
In the early stages of engineering design, pen-and-paper sketches are often used
to quickly convey concepts and ideas. Free-form drawing is often preferable to
using computer interfaces due to its ease of use, fluidity, and lack of constraints.
Tablet PCs and digital drawing tablet devices have enabled computer sketch
recognition interfaces which can interpret hand-drawn sketches in an interactive
environment.
Sketch recognition software can be applied to many different application domains, including digital logic diagrams, family trees [12], free-body diagrams,
mathematical equations [4], electrical circuit diagrams [10], and chemical diagrams [11]. One challenge in developing sketch recognition software is maintaining generality: a hand-crafted recognition solution for use in one domain
will not necessarily generalize well to other domains. In this work, we explore
sketch recognition techniques that do not rely on a set of hand-crafted features
based on expertise in one domain.
Sketches are different from images because they are recorded in a vectorized
format: a series of pen strokes on a tablet device. Within each stroke, point-level information is sampled at high frequency. This point-level information
includes positional x and y coordinates, pen pressure, and time. This descriptive
data format enables traditional computer vision techniques, but also opens up
opportunities for vector-based algorithms. Vectorized data is often helpful in
Figure 1: An example of a digital circuit sketch in the LogiSketch program
sketch recognition, but can also be unreliable or noisy (for example, over-drawing
existing lines, using several strokes to draw one line, etc).
The goal is to obtain a higher-level interpretation of these strokes. Generally,
this interpretation is achieved by grouping strokes into meaningful symbols
and classifying, or predicting, what type of symbol was drawn.
LogiSketch
LogiSketch [2] is an example of sketch recognition software for digital logical
circuits. It has been an ongoing research project at Harvey Mudd College for
over six years. We will use LogiSketch as a platform to explore problems in
grouping and classification – these are important problems that almost all sketch
recognition systems face.
It is an interactive system that interprets hand-drawn logical circuit diagrams on-the-fly and provides several modes of feedback to the user, including
the result of the circuit recognition, the ability to simulate a circuit given some
inputs, and the ability to see a truth table for every possible input. An example
of the LogiSketch interface is shown in figure (1).
This software solves these problems of grouping and classifying using a three-step pipelined architecture, where each stage uses the results of the previous
stage.
1. Single stroke classification – each stroke is classified into one of three
categories (wire, text, or gate). The name of this step may be misleading,
because it really groups strokes into three “bins”.
2. Stroke grouping – An additional grouping stage is carried out for strokes
identified as being “gates”. These strokes are clustered into individual
gates.
3. Symbol classification – each cluster of gate strokes is classified into a
known symbol type (AND, OR, NAND, NOR, NOT, XOR, etc.)
Each step reduces the complexity of subsequent steps. The task of grouping
strokes into symbols is simplified because we can look only at the bin of “gates”
from the previous step. Likewise, the task of classifying symbols is simplified
under the assumption that the strokes in a single symbol have been correctly
identified (no missing or extra strokes in the symbol). Unfortunately, errors can
compound throughout the stages of this pipeline. Stroke grouping will always
be inaccurate if it uses the wrong “gate” strokes, and symbol classification will
always fail if the wrong set of strokes is identified as a symbol. Errors can
cascade through this pipeline; each step has an opportunity to add errors but
generally will not fix existing errors.
In this work, we explore ways to tighten the loop in these grouping and
classification stages. Feedback from latter stages can correct errors in earlier
stages. For example, if a stroke doesn’t fit well into any symbol’s group, then
this may be evidence that the single-stroke classifier incorrectly identified that
stroke as a gate. Likewise, if the symbol classifier cannot make a confident
classification, then it is possible that the stroke grouping stage identified the
wrong strokes in the group.
In the following sections, we will present a novel two-stage grouping algorithm
which incorporates a feedback loop. We also implement and modify a state-of-the-art classification algorithm for use in the domain of digital logical circuits.
2 Stroke Grouping
In the single stroke classification stage, each stroke is identified as a gate, wire,
or text. The subset of these strokes with the classification of “gate” are sent to
a stroke grouping procedure. Other types of strokes (such as wires or text) may
also require grouping algorithms, but such problems fall outside the scope of
this paper. The goal of stroke grouping is to group together sets of strokes that
compose the symbols in the sketch. In general, brute force grouping techniques,
such as attempting to recognize all possible partitions of strokes into groups,
are too expensive for interactive systems. Instead, more clever approaches are
needed.
The current algorithm used by LogiSketch classifies pairs of strokes into one
of three categories: (1) strokes are in the same symbol and are adjacent to each
other, (2) strokes are in the same symbol but are not adjacent to each other,
and (3) strokes are not in the same symbol. Each pairwise classification is made
by an AdaBoost classifier with decision trees based on 13 handcrafted spatial
and temporal features [12].
In this section, we present a novel grouping algorithm to overcome some
of the shortcomings of the current AdaBoost classifier. First, the AdaBoost
classifier can be slow; a decision tree prediction must be made for every pair
of strokes. This takes O(n²) time, where n is the number of strokes labeled as
symbols by the single-stroke classifier. Second, each decision is local, taking in
information only about the pair of strokes; there is an opportunity to incorporate
more contextual information into the decision.

Figure 2: Illustration of the pipeline architecture in LogiSketch. Red marks
indicate the feedback potential of the fine-grained algorithm.

Third, the predictive accuracy of the AdaBoost classifier depends on the quality of the 13 handcrafted
features. These handcrafted features require domain knowledge, may not generalize well to other domains, and do not capitalize on unanticipated patterns
in data. Lastly, and perhaps most importantly, we aim to improve grouping
accuracy. The grouping phase can become a bottleneck in symbol classification;
the symbol classification phase has no chance to correctly classify the symbol if
the symbol has missing or extra strokes.
Previous work has relied on “marker” symbols which attach two distinct
symbols; removing these markers can reveal boundaries between the remaining
symbols [4]. Other recent work converts sets of strokes that are likely to be
symbols into a graph representation, where nodes represent groups of strokes.
A max-product belief propagation algorithm is used to find the most likely set
of nodes in this graph [8]. This algorithm is state-of-the-art because it has an
element of “symbol-aware” grouping, where the grouping process is guided by a
holistic measurement of how much a candidate set of strokes resembles a known
symbol. However, the algorithm is computationally expensive, requiring 100
iterations of belief propagation.
In this section, a novel two-stage stroke grouping algorithm is introduced,
which retains this “symbol-aware” property while remaining fast enough for
interactive use. A fast coarse grouping is performed first, then each of these
coarse groups are later refined in a fine-grained stage.
This grouping algorithm has an additional desirable quality in that it enables a two-way conversation between single-stroke classification, stroke grouping, and symbol classification. The nature of the iteration used in the fine-grained
step can provide guidance to the single-stroke classification stage. Additionally,
a Hausdorff distance, which is most commonly used as an algorithm for symbol
classification, guides the stroke grouping procedure. These feedback mechanisms are illustrated as red marks in figure (2). Both feedback mechanisms will
be described in depth below.
Stage 1: Coarse-Grained Grouping
After observing several example circuit sketches, it was apparent that most
symbols can be separated simply by spatial separation once gate strokes are
isolated. Strokes that belong within a single gate often overlap or nearly overlap with neighboring strokes. Strokes belonging to separate gates often have
substantial distances between them.
The goal of the coarse-grained grouping algorithm is twofold. First, this spatial information must be exploited in a computationally efficient way suitable
for interactive systems. Second, it provides a set of coarse groupings for the
more computationally intensive fine-grained step. The fine-grained algorithm
works locally within a coarse grouping, so coarse grouping serves to decrease
the computational complexity of the fine-grained step. The coarse-grained algorithm has a tendency to err towards overgrouping, which means more strokes
get merged into one group than should be. The fine-grained step is designed to
correct such errors.
This algorithm first computes a rectangular bounding box for each stroke
with edges aligned to the x and y axes. An expanded bounding box is then
created with the same center as the original bounding box, but with a width
and height multiplied by a constant factor, m (a value of m = 1.32 worked well
in practice).
An undirected graph G = (V, E) is created, where vertices represent strokes
and edges represent relationships between strokes. An edge is added for any pair
of strokes whose expanded bounding boxes overlap. This step is O(|V|²), but is
fast overall because checking for rectangular overlap is an O(1) computation.
Next, the connected components of this graph are found using a series of
breadth-first-searches through the graph, which complete in O(|V | + |E|) time.
The connected components of this graph represent coarse groups of strokes.
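The steps above can be sketched in Python as follows; the stroke format (a list of (x, y) points) and the helper names are illustrative assumptions, not LogiSketch's actual data structures:

```python
from collections import defaultdict, deque

M = 1.32  # bounding-box expansion factor that worked well in practice

def expanded_bbox(stroke, m=M):
    """Axis-aligned bounding box, expanded about its center by factor m."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    hw, hh = (max(xs) - min(xs)) / 2 * m, (max(ys) - min(ys)) / 2 * m
    return (cx - hw, cy - hh, cx + hw, cy + hh)  # (xmin, ymin, xmax, ymax)

def overlaps(a, b):
    """Equation (1): rectangles overlap iff they intersect on both axes."""
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]

def coarse_groups(strokes):
    """Connected components of the expanded-bounding-box overlap graph."""
    boxes = [expanded_bbox(s) for s in strokes]
    n = len(strokes)
    adj = defaultdict(list)
    for i in range(n):              # O(n^2) pairwise checks, each O(1)
        for j in range(i + 1, n):
            if overlaps(boxes[i], boxes[j]):
                adj[i].append(j)
                adj[j].append(i)
    seen, groups = set(), []
    for i in range(n):              # BFS components: O(|V| + |E|)
        if i in seen:
            continue
        comp, queue = [], deque([i])
        seen.add(i)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        groups.append(sorted(comp))
    return groups
```

For example, two nearby strokes whose expanded boxes touch end up in one coarse group, while a distant stroke forms its own group.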
This coarse-grained stage ends up being very similar to the current approach
taken by LogiSketch, except that rectangular overlap is used as a criterion for
merging pairs of strokes instead of an AdaBoost decision tree. The overlap
computation executes much faster because it is a single logical expression
based on the limits of the bounding boxes. Two rectangles A and B overlap if:

(A_x^min < B_x^max) & (A_x^max > B_x^min) & (A_y^min < B_y^max) & (A_y^max > B_y^min),   (1)

where subscripts indicate either the x or y component of the bounding box
and superscripts indicate the minimum or maximum value of this component.
Figure (3) illustrates two examples of the output of the coarse-grained grouping. In the majority of sketches, gates are separated spatially enough to allow
the coarse-grained algorithm to perfectly group the gates (an example of this
scenario is on the left). In some cases, however, gates are drawn in close proximity to each other and are not separated enough for the coarse-grained grouping
algorithm to detect the boundaries between them (an example of this scenario
is on the right). In such cases, the problem groupings require refinement from
the fine-grained grouping stage.
Stage 2: Fine-Grained Grouping
In the coarse-grained grouping stage, symbols in close proximity may get erroneously grouped together. The goal of the fine-grained stage is to recognize and
correct such instances of “overgrouping”.
The fine-grained stage works under the assumption that spatial separation
alone is not enough to accurately group strokes into symbols. Instead, a more
rigorous grouping procedure is required which does not rely on spatial separation; instead, this stage relies on temporal information and “symbol awareness”
- a learned notion of how closely a candidate shape resembles a known gate. This
notion is captured by a crucial distance metric called the Hausdorff distance,
which is explained in detail below.

Figure 3: Examples of the coarse-grained grouping algorithm. Plots are drawn
with only gate strokes added. Colored rectangles represent group labels after
coarse grouping. On the left, perfect grouping. On the right, three gates have
been erroneously clustered together.

Figure 4: Illustration of the greedy fine-grained algorithm. First an AND symbol
is finalized, then an OR symbol.
This algorithm greedily finalizes fine-grained groups. Figure 4 illustrates
this approach. In each iteration, the algorithm examines several subsets of the
strokes in the coarse group. The “quality of fit” of each subset is assessed,
which is a measurement of how much the subset looks like a known symbol.
The subset that fits best is finalized as a fine-grained group and removed from
the coarse-grained group. This algorithm repeats until all strokes are finalized.
Although a coarse group is typically smaller than the set of all strokes labeled
as gates, an exhaustive search for the best partition into fine groups is still
intractable. Assuming that there are n strokes in a coarse group, then the
number of possible fine groupings in an exhaustive search is lower-bounded as
follows:
exhaustive ≥ Σ_{i=1}^{n} C(n, i)                   (2)
           = Σ_{i=1}^{n} n! / (i! (n − i)!)        (3)
(3)
In fact, the lower bound above describes the number of ways the first fine
group can be chosen from the coarse set. To get an exact number of combinations, this formula must be applied recursively for strokes remaining after a
fine-grained group is removed from consideration.
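The gap between this lower bound and the greedy search space can be reproduced numerically; the function names here are our illustrative additions (note the lower bound sums to 2^n − 1):

```python
from math import comb

def exhaustive_lower_bound(n):
    # Equations (2)-(3): ways to choose just the first fine group (= 2^n - 1)
    return sum(comb(n, i) for i in range(1, n + 1))

def greedy_candidates(n, k=5):
    # Consecutive-stroke subsets of length <= k examined in a single pass
    return sum(n - length + 1 for length in range(1, min(k, n) + 1))

for n in (4, 8, 12):
    print(n, exhaustive_lower_bound(n), greedy_candidates(n))
# 4 15 10
# 8 255 30
# 12 4095 50
```

The exhaustive count grows exponentially in n, while the number of consecutive-stroke candidates grows only linearly, which is the contrast illustrated in figure (5).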
In most sketches, we observe that users tend to draw gates in consecutive
strokes, rather than starting one gate, drawing a second gate, then finishing
the first gate. The fine-grained algorithm exploits this idea to limit the search
space for the best grouping. It is a greedy iterative procedure, where in each
iteration, it pulls out the best subset of consecutive strokes and “finalizes”
this subset as a fine-grained group. This greedy procedure explores subsets
of strokes most likely to be true groups under this “consecutive stroke” assumption while avoiding the state-space explosion illustrated in figure (5). It
is important to note that the term “consecutive” is local to this fine-grained
stage. If the coarse grouper identified the first, fifth, sixth, and ninth strokes as
a coarse group, then the set of consecutive strokes up to length k = 3 would be
[{1}, {5}, {6}, {9}, {1, 5}, {5, 6}, {6, 9}, {1, 5, 6}, {5, 6, 9}]. In algorithm (1), this
procedure is called getConsecutiveStrokes.
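A minimal sketch of getConsecutiveStrokes, reproducing the example above (representing strokes as a temporally ordered list of identifiers is our assumption):

```python
def get_consecutive_strokes(S, k):
    """All subsets of temporally consecutive strokes in S, up to length k.

    S is a temporally ordered list of stroke identifiers; "consecutive" is
    local to this list, so gaps left by already-finalized strokes are
    skipped over.
    """
    subsets = []
    for length in range(1, min(k, len(S)) + 1):
        for start in range(len(S) - length + 1):
            subsets.append(S[start:start + length])
    return subsets

# The coarse group of strokes 1, 5, 6, 9 with k = 3:
print(get_consecutive_strokes([1, 5, 6, 9], k=3))
# [[1], [5], [6], [9], [1, 5], [5, 6], [6, 9], [1, 5, 6], [5, 6, 9]]
```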
Input: a coarse-grained grouping S, maximum grouping length k
Result: a set of fine-grained groupings
begin
    S ← strokes in coarse grouping, ordered temporally;
    fineGroupings ← ∅;
    while S ≠ ∅ do
        consStrokes ← getConsecutiveStrokes(S, k);
        dist ← ∞;
        bestStrokeSubset ← ∅;
        for strokeSubset ∈ consStrokes do
            if HausdorffDistanceToGates(strokeSubset) < dist then
                dist ← HausdorffDistanceToGates(strokeSubset);
                bestStrokeSubset ← strokeSubset;
            end
        end
        domainSpecificCheck(bestStrokeSubset);
        append bestStrokeSubset to fineGroupings;
        S ← S − bestStrokeSubset;
    end
end

Algorithm 1: Fine-grained grouper at a high level
Figure 5: The state-space explosion of fine groupings (number of strokes vs.
number of combinations, log scale). The line marked “exhaustive” illustrates
the lower bound on the number of possible fine groupings as the number of
strokes increases. The line marked “greedy” illustrates the number of fine
groupings using the consecutive-stroke assumption.
In each iteration of the while loop, the algorithm examines all sets of consecutive strokes up to length k (in practice, k was set to 5). For each of these
temporally consecutive subsets of strokes, this “candidate gate” is compared to
known gates (e.g., AND, OR, NAND, NOR, NOT, and XOR) using Hausdorff
distance. This distance metric measures how well a candidate gate matches a
known gate in its entirety; the metric is essential to the performance of the
algorithm and is expanded upon in the following section. The candidate symbol
that matches most closely to a known symbol goes through one final check called
domainSpecificCheck before being finalized. It too will be elaborated upon in
following sections. The finalized candidate is set permanently as a fine-grained
group. This iterative procedure repeats until all strokes in the coarse group
have been placed into a fine-grained group.
While in most cases symbols will be drawn in consecutive strokes, this is not
always the case. This algorithm has the ability to recover if a shape X is started,
then a new shape Y is drawn, then shape X is finished. As an example, the
temporal order of strokes may be: x1 , x2 , y1 , y2 , x3 . The while loop is designed
to recognize y1 , y2 as a shape first, leaving x1 , x2 , x3 . Once the shape Y is
removed, the shape X becomes a consecutive series of strokes.
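This recovery behavior can be illustrated with a toy version of the greedy loop, in which a stand-in distance function that simply "knows" the true symbols replaces HausdorffDistanceToGates (and domainSpecificCheck is omitted):

```python
def consecutive_subsets(S, k):
    """All temporally consecutive subsets of S, up to length k."""
    return [S[i:i + n] for n in range(1, min(k, len(S)) + 1)
            for i in range(len(S) - n + 1)]

# Toy stand-in for HausdorffDistanceToGates, for illustration only.
TRUE_SYMBOLS = [{"y1", "y2"}, {"x1", "x2", "x3"}]

def distance_to_gates(subset):
    return 0.0 if set(subset) in TRUE_SYMBOLS else 1.0

def fine_groups(coarse_group, k=5):
    """The greedy while loop of algorithm (1)."""
    S = list(coarse_group)  # ordered temporally
    groups = []
    while S:
        best, best_dist = None, float("inf")
        for subset in consecutive_subsets(S, k):
            d = distance_to_gates(subset)
            if d < best_dist:
                best, best_dist = subset, d
        groups.append(best)
        S = [s for s in S if s not in best]  # finalize and remove
    return groups

# Interleaved drawing order x1, x2, y1, y2, x3: shape Y is finalized first,
# after which shape X becomes a consecutive run of strokes.
print(fine_groups(["x1", "x2", "y1", "y2", "x3"]))
# [['y1', 'y2'], ['x1', 'x2', 'x3']]
```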
Hausdorff Distance
In general, the Hausdorff distance measures how far two objects are from each
other in metric space. This distance metric has been successfully (and tractably)
applied in computer vision tasks to measure how well template images match
target images. In the procedure HausdorffDistanceToGates from algorithm
(1), a candidate shape is compared to a set of well-drawn template shapes. We
use five examples of each of the six gates for a total of 30 template gates.
To take advantage of this distance metric, candidate and template shapes
must have an appropriate image-based representation. This representation is
similar to the symbol classification approach in [4]. Symbols are centered and
projected onto a 48 by 48 bitmap, while preserving the original aspect ratio.
The Hausdorff distance between two bitmaps A and B is defined as:

H(A, B) = max(h(A, B), h(B, A)),   (4)

where

h(A, B) = max_{a ∈ A′} min_{b ∈ B′} ||a − b||,   (5)

and A′ represents the set of pixels in bitmap A whose value is 1.
Intuitively, the function h(A, B) measures the maximum of all distances one
can measure from each point in A to the closest point in B. This intuition is
illustrated in figure (6). This measurement has been criticized for being too
sensitive to outliers; however, in the context of symbol matching, this maximum
distance tends to capture the occurrence of missing or additional strokes. Additionally, the step of “binning” stroke vector (x, y) points onto a discrete bitmap
representation serves as a way to make computation tractable. A sparse matrix
representation is used to improve space and time complexity.
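Equations (4) and (5) translate directly into a few lines of numpy; this dense-bitmap sketch ignores the sparse-matrix optimization mentioned above:

```python
import numpy as np

def directed_hausdorff(A, B):
    """h(A, B) from equation (5): the farthest any on-pixel of A is from
    its nearest on-pixel of B. A and B are binary 2-D arrays."""
    a_pts = np.argwhere(A)  # coordinates of pixels whose value is 1
    b_pts = np.argwhere(B)
    # pairwise Euclidean distances; fine for 48 by 48 bitmaps
    d = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(A, B):
    """H(A, B) from equation (4): the symmetric Hausdorff distance."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# A missing stroke shows up as a large directed distance:
A = np.zeros((48, 48)); A[10, :] = 1                  # one horizontal line
B = np.zeros((48, 48)); B[10, :] = 1; B[40, :] = 1    # line plus extra stroke
print(hausdorff(A, B))  # 30.0: the extra row in B is 30 pixels from A
```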
The Hausdorff distance is essential to the success of the fine-grained algorithm. A sum of squared error (SSE) distance metric and a probabilistic
similarity metric based on a linear SVM model were also considered, using the
features described in the “Symbol Classification” section. Neither approach was
as successful as Hausdorff distance.
Incorporating Domain-Specific Knowledge
The greedy fine-grained algorithm iteratively pulls out and finalizes sets of
strokes as symbols. One problem faced in the logical circuit application domain is that some symbols are subsets of other symbols. For example, an AND
is a subset of a NAND gate, and an OR is a subset of both XOR and NOR.
This greedy approach has the potential to finalize a subshape (such as AND)
because it matches better via Hausdorff distance to a template than the shape
in its entirety (such as NAND). Therefore, a modification was made to this
fine-grained algorithm and is captured by the procedure domainSpecificCheck
in the algorithm pseudocode above. This procedure does a series of domainspecific checks before finalizing a symbol. For example, in the domain of digital
logical circuits, this procedure will check if it is possible that an AND may be
extended to a NAND.
The language for encoding domain-specific knowledge is generic to remain
applicable in other application domains as well.
domain_info = {
    "AND": [ { "candidate": "NAND", "degrees": 0, "distance": 0.5,
               "tolerance": 2.0 } ],
    ...
}

Figure 6: Illustration of Hausdorff distance applied to logic gates. The Hausdorff
distance between an incomplete candidate symbol (A) and a complete template
(B). Because of the missing stroke, h(B, A) is high
This example states that if the best match, M , is AND, then we must check
if it could possibly be extended to a NAND. To do this, the bounding box for
the AND symbol is shifted in the 0 degrees orientation (to the right) a distance
of 0.5 times the width of the bounding box of the AND symbol. All non-finalized strokes within this shifted bounding box are iteratively added to the
AND shape, and checked via Hausdorff distance. The temporary shape with
the smallest Hausdorff distance to a NAND template, T , is considered as an
alternative. If H(T )/H(M ) is less than the tolerance of 2.0, the algorithm will
finalize T instead of M .
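As a rough sketch under the config shown above (the helper names dist_to and in_shifted_box are hypothetical stand-ins for LogiSketch internals, not its actual API), the check might look like:

```python
DOMAIN_INFO = {
    "AND": [{"candidate": "NAND", "degrees": 0, "distance": 0.5,
             "tolerance": 2.0}],
}

def domain_specific_check(label, strokes, dist, unfinalized,
                          dist_to, in_shifted_box):
    """Return a possibly extended (label, strokes) pair.

    dist_to(strokes, label) is the Hausdorff distance to the best template
    of that class; in_shifted_box(strokes, stroke, rule) tests whether a
    stroke lies in the bounding box shifted per the rule's "degrees" and
    "distance" fields. Both are hypothetical stand-ins.
    """
    for rule in DOMAIN_INFO.get(label, []):
        nearby = [s for s in unfinalized if in_shifted_box(strokes, s, rule)]
        trial, best_alt, best_alt_dist = list(strokes), None, float("inf")
        for s in nearby:  # iteratively grow the shape, keep the best trial
            trial = trial + [s]
            d = dist_to(trial, rule["candidate"])
            if d < best_alt_dist:
                best_alt, best_alt_dist = trial, d
        # Finalize the extension T over the match M if H(T)/H(M) < tolerance
        if best_alt is not None and best_alt_dist / dist < rule["tolerance"]:
            label, strokes, dist = rule["candidate"], best_alt, best_alt_dist
    return label, strokes
```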
Grouping Experimental Results
To test the performance of these grouping algorithms and to compare against
the existing approach in the LogiSketch software, a test set of 477 digital logical
circuit sketches was created by 24 students. In total, 4,084 symbols were tested
(an average of 8.56 symbols per sketch). The true symbol groupings for these
sketches were labeled by hand. To simplify experimental procedures, we assumed
perfect single-stroke classification.

Four grouping algorithms were applied to this data; the results are shown in
table (1). First, just the coarse-grained grouping algorithm was used. Second,
both the coarse and fine-grained grouping stages were used (coarse+fine). Third,
the LogiSketch system was used under its current parameter settings. Lastly,
published LogiSketch results from Peterson [12] are listed, which report results
from the same data using LogiSketch with different parameter settings.
Surprisingly, the coarse-grained algorithm alone most accurately groups this
data set. The coarse+fine-grained approach rarely adds erroneous strokes to a
true group, but tends to have missing strokes more frequently than the coarse-grained approach by itself. While the coarse-grained algorithm alone achieves
the best performance on this data set, the coarse+fine-grained algorithm may
prove to be better in busy, crowded sketches where spatial separation alone is
not enough to distinguish between symbols.

Metric                                              Coarse  Coarse+  LogiSketch   LogiSketch
                                                            Fine     (published)  (current
                                                                                  parameters)
Percentage of true group ink correctly grouped      96.7%   91.0%    94.8%        71.3%
Percentage of ink that was erroneously added
to true group                                       7.9%    1.3%     4.5%         34.5%
Average number of strokes missing from true group   0.08    0.19     N/A          0.80
Average number of strokes erroneously added
to true group                                       0.05    0.01     N/A          0.80
Average total missing or extra strokes              0.13    0.20     N/A          1.60
Percentage of clusters with 0 errors                94.2%   82.6%    76.7%        63.7%
Percentage of clusters with 1 error                 3.3%    15.9%    16.5%        2.0%
Percentage of clusters with 2+ errors               2.4%    1.5%     5.0%         34.3%

Table 1: Grouping performance metrics for coarse-grained, coarse+fine grained,
LogiSketch under current running parameters, and published LogiSketch results
under different parameters
Although the coarse approach outperforms the coarse+fine approach, the
coarse+fine approach may provide additional benefits to the sketch recognition
process.
Future Grouping Work
One advantage of the fine-grained grouping algorithm is that for each coarse
group, the last finalized strokes are those that have the poorest fit to a known
template. It is possible that these strokes could have been mislabeled by the
single stroke classifier as a gate, and there is an opportunity to provide feedback
to the single stroke classification stage. For example, the single-stroke classifier
could have erroneously labeled a wire as a gate, as in figure (7). We would expect
that wire to be one of the last strokes finalized by the fine-grained grouping
algorithm, and the Hausdorff distance for this wire should be relatively large.
Additionally, strokes not labeled as gates which overlap with the coarse-grained bounding box could be added to the pool of strokes used by the fine-grained algorithm. If the fine-grained algorithm confidently finalizes such strokes
(into one of the first finalized fine-grained groups), this could be an indicator
to the single-stroke classifier that these strokes are truly gates. It would be
Figure 7: Illustration of an instance where the fine-grained algorithm can provide feedback to the single-stroke classifier
interesting to explore ways in which the fine-grained algorithm can provide
feedback to the single-stroke classification stage.
Currently, the LogiSketch framework simplifies sketch recognition by
separating it into a pipeline of three distinct steps: (1) single-stroke classification,
(2) stroke grouping, then (3) symbol classification. This fine-grained algorithm
has promise to achieve a tighter feedback loop in these pipeline stages, where
each stage gives feedback to previous stages.
In summary, this novel grouping algorithm provides two feedback mechanisms. As illustrated in figure (7), the order in which the fine-grained algorithm
finalizes strokes can provide feedback to the single-stroke classification stage.
Additionally, the Hausdorff distance (which is typically used to classify shapes)
is incorporated into the fine-grained iteration to guide the search for fine-grained
groups.
3 Symbol Classification
The symbol classifier takes as input a group of strokes from the stroke grouping
stage. The goal of this stage is to correctly identify the class of symbol represented by a group of strokes. In the domain of digital logical circuits, a symbol
is labeled as one of six possible gates: AND, OR, NAND, NOR, NOT, or XOR.
See figure (8) for examples of these gates.
The current LogiSketch system uses a nearest-neighbor based approach, which
predicts that a test symbol is the same class as the closest template image
according to a distance metric [3]. Classification is based on a voting scheme
between four distance metrics, including the Hausdorff distance, a modified
Hausdorff distance, the Tanimoto Similarity Coefficient, and the Yule Coefficient
[5].
In this section, we implement an existing state-of-the-art symbol classification algorithm [9], then modify it to work well in the domain of digital logical
circuits. This modified algorithm addresses some of the issues with the current
approach taken by LogiSketch. Like any other non-parametric nearest-neighbor
classifier, performance is only as good as the templates that a shape is matched
to.

Figure 8: Examples of user-drawn gates. Clockwise from top-left: AND, NAND,
NOR, XOR, OR, NOT

Additionally, it is often too costly to compare a test symbol to all symbols
in a training dataset; therefore, a representative template set must be chosen.
This section presents a parametric approach that will benefit from large training
corpora while still making predictions tractably.
Previous approaches to symbol classification generally fall into two categories: stroke-based or image-based. Stroke-based approaches such as [13]
classify a symbol based on the raw vectorized stroke information (a series of
(x, y, time, pressure) points). Often, stroke vector information can be noisy
due to users tracing over existing lines, segmenting edges of shapes into multiple strokes, or simply due to a user’s inherent messiness. This gave rise to
several image-based approaches such as [4], which convert the stroke vector information into a bitmap representation that is less prone to noise but also
loses temporal and exact spatial information.
Recent work by Ouyang and Davis in [9] proposes a novel hybrid symbol
classification technique which combines elements from stroke-based and image-based approaches. In the following section, this approach is introduced and
modifications to this approach are described which make it suitable to overcome
the challenges of symbol classification of digital logical circuits.
Ouyang and Davis’s Raster-Based Approach
Ouyang and Davis [9] developed a feature representation of symbols based on a
set of five low-resolution feature images, or rasters. These features capture the
orientation and endpoint information of a symbol. This approach is a hybrid
of both image- and vector-based approaches because orientation and endpoints
are qualities of the vectorized data representation; however, they are embedded
into an image-based representation.
A set of strokes from the stroke grouping stage is used as input to this
algorithm. The feature extraction process is summarized in the steps below; a
complete explanation can be found in their paper.
1. Additional points are interpolated within strokes to ensure constant spatial
distance between consecutive points.
2. The positional x and y values of each point in this symbol are scaled and
transformed so that the symbol’s center of mass is at the origin and so
that the points have unit variance along both the horizontal and vertical
axis.
3. For each point in every stroke of a symbol, five features are computed.
Orientation features are computed, which measure how nearly horizontal,
vertical or diagonal (45 or 135 degrees) the stroke is at that point. An
endpoint feature is an indicator variable describing whether or not the
point is an endpoint of a stroke. Each of these features is scaled to a
range of [0, 1].
4. Five 24 by 24 raster grids are created - one for each of the four orientations,
and one for endpoints. Each point is mapped to a pixel in these 24 by 24
grids. The pixel intensity for each pixel in the feature rasters is set to the
maximum feature value in all points that map to that pixel.
5. A Gaussian smoothing function is applied to each raster, then each raster
is down-sized to a 12 by 12 raster grid using a max-filter, where each pixel
in the downsized image is the maximum of the four pixels from the original
grid.
The result of this feature extraction process is a set of five 12 by 12 grids a total of 720 feature values for each symbol. Each raster contains orientation
or endpoint information extracted from the vectorized strokes. Finally, a test
symbol is compared to a set of template images to determine the predicted
class. This comparison is carried out using the image deformation model (IDM)
distance metric [6], which is tolerant to local deformations in the symbol.
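The five-step extraction above can be sketched in numpy as follows; this is our approximation of [9], with step 1 (resampling) and the Gaussian smoothing in step 5 omitted for brevity, and the exact orientation-feature formula an assumption:

```python
import numpy as np

ORIENTS = np.array([0, 45, 90, 135])  # reference angles in degrees

def raster_features(strokes, grid=24):
    """Sketch of steps 2-5. `strokes` is a list of (n_i, 2) float arrays of
    (x, y) points. Returns a (5, 12, 12) array: four orientation rasters
    plus one endpoint raster."""
    pts = np.concatenate(strokes)
    mean, std = pts.mean(axis=0), pts.std(axis=0) + 1e-9
    rasters = np.zeros((5, grid, grid))
    for s in strokes:
        p = (s - mean) / std                        # step 2: normalize
        # map normalized coords (clipped to +-2.5 std) onto the 24x24 grid
        cells = np.clip(((p + 2.5) / 5.0 * grid).astype(int), 0, grid - 1)
        seg = np.diff(p, axis=0)                    # per-segment direction
        ang = np.degrees(np.arctan2(seg[:, 1], seg[:, 0])) % 180
        for i, (cx, cy) in enumerate(cells[:-1]):
            # step 3: orientation features in [0, 1], peaked at each angle
            diff = np.abs((ang[i] - ORIENTS + 90) % 180 - 90)
            feat = np.clip(1 - diff / 45.0, 0, 1)
            # step 4: pixel intensity is the max over points in the cell
            rasters[:4, cy, cx] = np.maximum(rasters[:4, cy, cx], feat)
        for cx, cy in (cells[0], cells[-1]):        # endpoint raster
            rasters[4, cy, cx] = 1.0
    # step 5: downsample 24x24 -> 12x12 with a 2x2 max-filter
    return rasters.reshape(5, grid // 2, 2, grid // 2, 2).max(axis=(2, 4))
```

A horizontal stroke, for instance, lights up the 0-degree orientation raster and leaves the 90-degree raster empty.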
Adapting the Raster-Based Approach for Digital Logical
Circuits
The symbol classification task is especially challenging in the digital logical
circuit domain for two reasons. First, slight deformations in a symbol may
change it from one gate to another. For example, adding a bend to the vertical
back of an AND gate effectively converts it to an OR gate. Sketchers in practice
do not draw perfectly straight strokes, so it is important to pick up on the
important distinguishing factors of shapes (e.g., the curvature of the vertical
back) without falling victim to the imperfections of human sketchers. Second,
some gates are subsets of the strokes in other gates. For example, an OR gate
is a subset of a NOR gate in which the “bubble” has been removed. While the
majority of a symbol may be highly indicative of one gate, there may be a small
amount of ink which changes the class of that gate. A symbol classifier must
be sensitive to small amounts of ink which may suggest a change in predicted
gate.
Because logical gates are so sensitive to small deformations, the IDM distance
metric performed very poorly. Instead, a parametric approach was used. A
multi-class linear support vector machine (SVM) was trained based on features
from the 12 by 12 rasterized images. A support vector machine model has the
Figure 9: Examples of the six raster images used as features in a linear SVM
for symbol classification
added benefit of being able to tractably learn from large training sets while
still performing prediction quickly. The decision boundary of an SVM is a
hyperplane in high-dimensional space and a prediction can be made based on
a dot product calculation which can be computed in time O(num. features),
regardless of how many training examples there are.
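This constant-cost-per-prediction property can be seen directly: once a weight vector per class has been learned, prediction reduces to one matrix-vector product over the 720 raster features. The sketch below uses random placeholder weights rather than a trained model, and a one-weight-vector-per-class form for simplicity (the experiments below use the one-vs-one strategy instead).

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_features = 8, 720      # e.g. 8 gate classes, 720 raster features

W = rng.normal(size=(num_classes, num_features))   # learned weights (placeholder)
b = rng.normal(size=num_classes)                   # learned biases (placeholder)

x = rng.random(num_features)            # one symbol's raster feature vector
scores = W @ x + b                      # O(num_classes * num_features),
predicted = int(np.argmax(scores))      # independent of training-set size
print(predicted)
```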
We also modified Ouyang et al.’s algorithm in an attempt to improve classification accuracy. Ouyang’s five raster images introduced in the section above
represent where the pen ink is on the tablet, the angle at which the pen was
moving, and where the pen was put down or lifted up from the tablet. These
features describe the spatial characteristics of a symbol, but this feature set
does not describe any temporal information. This temporal information can describe not just where a symbol was drawn, but how it was drawn. For example,
sketchers must decrease pen velocity to make sharp turns or intricate patterns,
but may tend to draw straight lines quickly.
This temporal information was encoded in a sixth raster image, which projects
the pen’s velocity onto a similar 12 by 12 raster image. Similar steps were taken
for this raster image, including shifting, scaling, Gaussian smoothing, and downsampling.
In figure (9), an example of a complete set of six raster images is shown for
an AND symbol. In total there are 12 × 12 × 6 = 864 feature values.
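Assuming each pen sample carries a timestamp, the per-point velocity might be computed as the distance between consecutive samples divided by the elapsed time, then scaled to [0, 1] before being rasterized like the other five channels. The function below is a hypothetical sketch of that computation, not the original implementation.

```python
import numpy as np

def velocity_features(points, times):
    """Per-point pen speed from timestamped samples, scaled to [0, 1]."""
    pts = np.asarray(points, dtype=float)
    t = np.asarray(times, dtype=float)
    d = np.hypot(*(pts[1:] - pts[:-1]).T)    # distance between consecutive samples
    dt = np.maximum(t[1:] - t[:-1], 1e-9)    # guard against zero time deltas
    v = d / dt
    v = np.concatenate([[v[0]], v])          # give the first point its segment's speed
    return v / v.max() if v.max() > 0 else v

points = [(0, 0), (1, 0), (3, 0), (3, 1)]
times = [0.0, 0.1, 0.2, 0.4]
v = velocity_features(points, times)
print(v)   # the fastest segment (length 2 in 0.1 s) scales to 1.0
```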
Symbol Classification Experimental Results
Digital Logical Circuits
We tested both the parametric SVM classifier and the additional velocity raster
on a dataset of sketches from nine students at the University of California, Riverside and Harvey Mudd College. 1,508 symbols were taken from 175 sketches
from these nine students. True symbol classifications were labeled by hand.

Method                 2-fold c.v. accuracy   Notes
IDM                    60.06%
SVM (no velocity)      90.58%                 cost: 0.1
SVM (with velocity)    92.01%                 cost: 0.1

Table 2: Performance of symbol classifiers on digital logical circuit data. The cost reported in the “Notes” column is the linear SVM regularization parameter tuned by grid search.
To simplify the experimental procedure, we assumed perfect grouping from the
stroke grouping stage.
Two-fold cross validation was performed for each test. Cross validation
groups were assigned so that no student had symbols in both the training and
testing datasets. This rule ensures no bias due to already seeing a student’s
sketching style.
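This user-disjoint split can be implemented by assigning whole students to folds rather than individual symbols. The sketch below uses made-up student IDs and symbol labels; only the no-user-in-both-folds rule comes from the procedure described above.

```python
def two_fold_by_user(symbols):
    """Split (user_id, symbol) pairs so no user appears in both folds."""
    users = sorted({u for u, _ in symbols})
    fold_a_users = set(users[: len(users) // 2])
    fold_a = [s for s in symbols if s[0] in fold_a_users]
    fold_b = [s for s in symbols if s[0] not in fold_a_users]
    return fold_a, fold_b

# toy data: 9 students, 3 symbols each
data = [(u, f"sym{i}") for u in range(9) for i in range(3)]
a, b = two_fold_by_user(data)
assert {u for u, _ in a}.isdisjoint({u for u, _ in b})  # no student in both folds
print(len(a), len(b))
```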
Three classifiers were considered; all three are based on the raster feature
representation.
• The first classifier used a non-parametric nearest-neighbor predictor with
the IDM distance metric. In this approach, a random 20% subset of
the training data was used as a set of templates in the nearest-neighbor
comparison.
• Second, a multi-class linear SVM model was trained on the complete training set. In this experiment, no velocity raster was used. The one-vs-one
strategy was used to combine pairwise binary classifiers into a multi-class
model. Linear SVM models require a cost parameter, which was tuned via
grid search.
• Third, a similar multi-class linear SVM model was used, but the velocity
raster was added.
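The one-vs-one combination mentioned above trains one binary classifier per pair of classes and predicts by majority vote. The toy illustration below stands in hard-coded pairwise decisions for trained SVMs; the gate names and "prototype" values are made up for the example.

```python
from itertools import combinations
from collections import Counter

def one_vs_one_predict(classes, pairwise_decide, x):
    """Vote among all class pairs; each binary classifier picks one of its
    two classes, and the class with the most votes wins."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, x)] += 1
    return votes.most_common(1)[0][0]

# Toy stand-in for trained binary SVMs: each pair picks the class whose
# "prototype" value is closer to the scalar input x.
prototypes = {"AND": 0.0, "OR": 1.0, "NOR": 2.0}
decide = lambda a, b, x: a if abs(x - prototypes[a]) <= abs(x - prototypes[b]) else b

print(one_vs_one_predict(list(prototypes), decide, 1.2))   # "OR" wins the vote
```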
Results are listed in table (2). The IDM nearest neighbor technique achieved
an unimpressive 60.06% predictive accuracy. We believe this is due to the fact
that digital logical gates are sensitive to slight deformations, and allowing such
deformations can quickly make one type of gate look like another.
We also observe that the two linear SVM models classify relatively accurately. The additional velocity raster increases predictive accuracy by 1.43 percentage points, from 90.58% to 92.01%.
Digital Logical Circuits: A comparison to LogiSketch
In this experiment, we compare the image-based nearest neighbor classifier in
the current LogiSketch system with the best classifier from the previous section:
the SVM classifier using the velocity raster. To compare, the experimental
procedure described in [3] was replicated to ensure that results are comparable
to published results. In this procedure, classifiers are trained and tested on
three types of data:
            Training Activity   Testing Activity   LogiSketch   SVM
GlobalSet   ISO                 SYNTH              81.1%        83.3%
            COPY                SYNTH              79.4%        88.6%
            SYNTH               SYNTH              81.9%        90.1%
ComboSet    ISO                 SYNTH              86.9%        84.0%
            COPY                SYNTH              88.8%        81.9%
            SYNTH               SYNTH              85.8%        88.6%
UserSet     ISO                 SYNTH              88.5%        70.5%
            COPY                SYNTH              86.8%        62.1%
            SYNTH               SYNTH              93.4%        69.8%

Table 3: Classifier performance under different types of data, training tasks, and test tasks
1. GlobalSet – one global classifier is learned from all available training data.
This simulates a factory trained classifier.
2. ComboSet – individual classifiers are learned for each user such that 40%
of the training data is global and 60% is user specific. This simulates a
factory trained classifier tuned “online” to a user’s examples.
3. UserSet – individual classifiers are learned for each user using only other
training symbols from that user. This simulates a classifier trained only
on examples given by the user.
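The ComboSet training mix can be expressed as a simple sampling rule. The 40/60 split comes from the description above; the pool contents, sizes, and random sampling in this sketch are illustrative assumptions.

```python
import random

def combo_training_set(global_pool, user_pool, n, seed=0):
    """Build an n-example training set: 40% global data, 60% user-specific data."""
    rng = random.Random(seed)
    n_global = int(round(0.4 * n))
    return (rng.sample(global_pool, n_global)
            + rng.sample(user_pool, n - n_global))

global_pool = [("global", i) for i in range(100)]
user_pool = [("user", i) for i in range(100)]
train = combo_training_set(global_pool, user_pool, 50)
print(sum(1 for src, _ in train if src == "global"))   # 20 of 50 examples are global
```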
Additionally, the type of training data was also varied. Training data was
gathered from students under three premises: isolated symbol drawing (ISO),
copying a logic diagram (COPY), and synthesizing a logic diagram from a logical equation (SYNTH). We used test data from the SYNTH task. This most
accurately simulates how a user would use sketch recognition software in real
life.
The published LogiSketch results in [3] are based on template sets with five
templates per gate to remain tractable. SVM models typically improve with
more training data, so all available training data was used. Again, a two-fold cross
validation procedure was used.
In table (3), we see that the SVM classifier excels when one global classifier
is learned from all available training data (i.e., a factory-trained classifier). Results are mixed in the combo scenario, where user-specific models are trained
using a split of global data and user-specific data. LogiSketch’s nearest-neighbor
approach is dominant when user-specific models are trained using only other examples from that user (i.e., a personalized classifier).
In general, the SVM model performs well when it can take advantage of a
large training set such as GlobalSet, but when the training data “narrows in”
on just the examples of one user as in the UserSet, the SVM model does not
have enough training examples to learn a good model. We believe there is an
opportunity to use a “factory trained” global SVM model when a user first starts
using LogiSketch. As a user interacts with LogiSketch, it can switch to a model
which combines the decisions of the SVM and nearest-neighbor classifiers. We
leave this combined classifier to future work.

Figure 10: Examples of training handwritten pen digits. First stroke in blue,
second stroke in green.

Method                       test accuracy
linear SVM (no velocity)     98.4%
linear SVM (with velocity)   98.7%
Eigen-Deformations [7]       98.2%
IDM [9]                      99.2%

Table 4: Performance of symbol classifiers on handwritten pen digits dataset
Handwritten Pen Digits
Because there are so many application domains that can benefit from sketch
recognition software, it is important that sketch recognition algorithms can be
applied across application domains. In this section, the best classifier from
above (SVM with velocity) is tested on handwritten digits.
The handwritten pen digits data set was first introduced in 1997 by Alimoglu
and Alpaydin [1]. It contains 10,992 individual digit sketches divided into a
training and testing set. 30 users provided 7,493 training digit examples and an
additional 14 users provided 3,497 test digit examples. No writer had examples
in both the training and testing sets, so the evaluation measures how well learning
algorithms generalize to new users. Some example training digits are illustrated in figure (10).
A global linear SVM model was learned on all training examples then applied
to all test examples. This procedure was repeated twice – once for an SVM
model without a velocity raster, and once with the addition of the velocity
raster into the feature set. Results are listed in table (4).
The addition of the velocity raster increases test set accuracy from 98.4% to
98.7%, which further suggests that velocity information can increase the separability
of the data. The SVM with velocity method improves upon the best published
result before 2009 [7]; however, the IDM distance metric appears well suited
to pen digits and achieves the highest accuracy.
4 Conclusion
We have presented a novel two-stage stroke grouping algorithm which performs a
fast initial grouping and then refines these groups efficiently. Experiments
suggest that this approach improves upon the techniques currently used in LogiSketch, and that this grouping algorithm has the ability to tighten the loop
in the pipelined sketch recognition architecture. Additionally, we improved an
existing state-of-the-art symbol classification technique for use in the domain
of digital logical circuits. Experiments suggest that this classification technique
may be best suited to factory trained classifiers.
References
[1] F. Alimoglu and E. Alpaydin. Combining multiple representations and
classifiers for pen-based handwritten digit recognition. In Proceedings of
ICDAR, 1997.
[2] C. Alvarado. Sketch Recognition Research at HMC, September 2011.
http://www.cs.hmc.edu/~alvarado/research/sketch.html.
[3] M. Field, S. Gordon, E. Peterson, R. Robinson, T. Stahovich, and C. Alvarado. The effect of task on classification accuracy: using gesture recognition techniques in free-sketch recognition. In Proceedings of the 6th Eurographics Symposium on Sketch-Based Interfaces and Modeling, SBIM ’09,
pages 109–116, New York, NY, USA, 2009. ACM.
[4] L. B. Kara and T. F. Stahovich. Hierarchical parsing and recognition of
hand-sketched diagrams. In Proceedings of the 17th annual ACM symposium on User interface software and technology, UIST ’04, pages 13–22,
New York, NY, USA, 2004. ACM.
[5] L. B. Kara and T. F. Stahovich. An image-based trainable symbol recognizer for sketch-based interfaces. In AAAI Fall Symposium Series 2004:
Making Pen-Based Interaction Intelligent and Natural, pages 99–105. AAAI
Press, 2004.
[6] D. Keysers, C. Gollan, and H. Ney. Local context in non-linear deformation models for handwritten character recognition. In Pattern Recognition,
2004. ICPR 2004. Proceedings of the 17th International Conference on,
volume 4, pages 511–514, August 2004.
[7] H. Mitoma, S. Uchida, and H. Sakoe. Online character recognition using
eigen-deformations. In 9th International Workshop on Frontiers in Handwriting recognition, pages 26–29, 2004.
[8] T. Y. Ouyang and R. Davis. Learning from neighboring strokes: Combining
appearance and context for multi-domain sketch recognition, 2009.
[9] T. Y. Ouyang and R. Davis. A visual approach to sketched symbol recognition, 2009.
[10] T. Y. Ouyang and R. Davis. Visual recognition of sketched symbols. In
IUI 2009 Workshop on Sketch Recognition, 2009.
[11] T. Y. Ouyang and R. Davis. Chemink: a natural real-time recognition
system for chemical drawings. In Proceedings of the 16th international
conference on Intelligent user interfaces, IUI ’11, pages 267–276, New York,
NY, USA, 2011. ACM.
[12] E. J. Peterson, T. F. Stahovich, E. Doi, and C. Alvarado. Grouping strokes
into shapes in hand-drawn diagrams. In AAAI, 2010.
[13] J. O. Wobbrock, A. D. Wilson, and Y. Li. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In Proceedings
of the 20th annual ACM symposium on User interface software and technology, UIST ’07, pages 159–168, New York, NY, USA, 2007. ACM.