Sketch Recognition of Digital Logical Circuits

University of California, San Diego
Computer Science and Engineering
Masters Project

David Johnston
Advisor: Christine Alvarado
18 March 2013

Abstract

Two tasks are prevalent in most sketch recognition systems: grouping and recognition. We describe a novel grouping algorithm which clusters pen strokes into symbols. This algorithm is unique because it allows a feedback loop which can improve the performance of other phases of recognition. In our experiments, we improve the percentage of perfectly grouped symbols from 76.7% to 94.2%. Additionally, a state-of-the-art symbol classification technique was improved for use in the domain of digital logical circuits. On average, this technique improved the accuracy of symbol classification in the "factory trained" setting from 80.8% to 87.16%.

1 Introduction

In the early stages of engineering design, pen-and-paper sketches are often used to quickly convey concepts and ideas. Free-form drawing is often preferable to computer interfaces because of its ease of use, fluidity, and lack of constraints. Tablet PCs and digital drawing tablets have enabled sketch recognition interfaces which can interpret hand-drawn sketches in an interactive environment. Sketch recognition software can be applied to many application domains, including digital logic diagrams, family trees [12], free-body diagrams, mathematical equations [4], electrical circuit diagrams [10], and chemical diagrams [11].

One challenge in developing sketch recognition software is maintaining generality. A recognition solution hand-crafted for one domain will not necessarily generalize well to other domains. In this work, we explore sketch recognition techniques that do not rely on a set of hand-crafted features based on expertise in one domain.

Sketches differ from images because they are recorded in a vectorized format: a series of pen strokes on a tablet device.
Within each stroke, point-level information is sampled at high frequency. This point-level information includes the x and y coordinates, pen pressure, and time. This descriptive data format enables traditional computer vision techniques, but also opens up opportunities for vector-based algorithms. Vectorized data is often helpful in sketch recognition, but it can also be unreliable or noisy (for example, when users over-draw existing lines or use several strokes to draw one line).

Figure 1: An example of a digital circuit sketch in the LogiSketch program.

The goal is to obtain a higher-level interpretation of these strokes. Generally, this interpretation is achieved by grouping strokes into meaningful symbols and classifying, or predicting, what type of symbol was drawn.

LogiSketch

LogiSketch [2] is an example of sketch recognition software for digital logical circuits. It has been an ongoing research project at Harvey Mudd College for over six years. We use LogiSketch as a platform to explore grouping and classification, problems that almost all sketch recognition systems face. It is an interactive system that interprets hand-drawn logical circuit diagrams on the fly and provides several modes of feedback to the user, including the result of the circuit recognition, the ability to simulate a circuit given some inputs, and the ability to see a truth table for every possible input. An example of the LogiSketch interface is shown in figure (1).

LogiSketch solves these grouping and classification problems using a three-step pipelined architecture, where each stage uses the results of the previous stage.

1. Single stroke classification – each stroke is classified into one of three categories (wire, text, or gate). The name of this step may be misleading, because it really sorts strokes into three "bins".
2. Stroke grouping – an additional grouping stage is carried out for strokes identified as "gates".
   These strokes are clustered into individual gates.
3. Symbol classification – each cluster of gate strokes is classified into a known symbol type (AND, OR, NAND, NOR, NOT, XOR, etc.).

Each step reduces the complexity of subsequent steps. The task of grouping strokes into symbols is simplified because we need only consider the bin of "gates" from the previous step. Likewise, the task of classifying symbols is simplified under the assumption that the strokes in a single symbol have been correctly identified (no missing or extra strokes in the symbol).

Unfortunately, errors can compound throughout the stages of this pipeline. Stroke grouping will always be inaccurate if it uses the wrong "gate" strokes, and symbol classification will always fail if the wrong set of strokes is identified as a symbol. Each step has an opportunity to add errors but generally will not fix existing errors.

In this work, we explore ways to tighten the loop in these grouping and classification stages so that feedback from later stages can correct errors in earlier stages. For example, if a stroke does not fit well into any symbol's group, this may be evidence that the single-stroke classifier incorrectly identified that stroke as a gate. Likewise, if the symbol classifier cannot make a confident classification, the stroke grouping stage may have identified the wrong strokes in the group.

In the following sections, we present a novel two-stage grouping algorithm which incorporates a feedback loop. We also implement and modify a state-of-the-art classification algorithm for use in the domain of digital logical circuits.

2 Stroke Grouping

In the single stroke classification stage, each stroke is identified as a gate, wire, or text. The subset of strokes classified as "gate" is sent to a stroke grouping procedure.
Other types of strokes (such as wires or text) may also require grouping algorithms, but such problems fall outside the scope of this paper. The goal of stroke grouping is to group together the sets of strokes that compose the symbols in the sketch. In general, brute-force grouping techniques, such as attempting to recognize all possible partitions of strokes into groups, are too expensive for interactive systems, so more efficient approaches are needed.

The current algorithm used by LogiSketch classifies pairs of strokes into one of three categories: (1) strokes are in the same symbol and are adjacent to each other, (2) strokes are in the same symbol but are not adjacent to each other, and (3) strokes are not in the same symbol. Each pairwise classification is made by an AdaBoost classifier with decision trees based on 13 handcrafted spatial and temporal features [12].

In this section, we present a novel grouping algorithm to overcome some of the shortcomings of the current AdaBoost classifier. First, the AdaBoost classifier can be slow: a decision tree prediction must be made for every pair of strokes. This takes O(n^2) time, where n is the number of strokes labeled as gates by the single-stroke classifier. Second, each decision is local, taking in information only about the pair of strokes, so there is an opportunity to incorporate more contextual information into the decision. Third, the predictive accuracy of the AdaBoost classifier depends on the quality of the 13 handcrafted features. These handcrafted features require domain knowledge, may not generalize well to other domains, and do not capitalize on unanticipated patterns in data. Lastly, and perhaps most importantly, we aim to improve grouping accuracy.

Figure 2: Illustration of the pipeline architecture in LogiSketch. Red marks indicate the feedback potential of this fine-grained algorithm.
The grouping phase can become a bottleneck in symbol classification: the symbol classifier has no chance to correctly classify a symbol that has missing or extra strokes.

Previous work has relied on "marker" symbols which connect two distinct symbols; removing these markers can reveal boundaries between the remaining symbols [4]. Other recent work converts sets of strokes that are likely to be symbols into a graph representation, where nodes represent groups of strokes. A max-product belief propagation algorithm is used to find the most likely set of nodes in this graph [8]. This algorithm is state-of-the-art because it has an element of "symbol-aware" grouping, where the grouping process is guided by a holistic measurement of how much a candidate set of strokes resembles a known symbol. However, the algorithm is computationally expensive, requiring 100 iterations of belief propagation.

In this section, we introduce a novel two-stage stroke grouping algorithm which retains this "symbol-aware" property while remaining fast enough for interactive use. A fast coarse grouping is performed first, and each coarse group is then refined in a fine-grained stage. This grouping algorithm has the additional desirable quality of enabling a two-way conversation between single-stroke classification, stroke grouping, and symbol classification. The order in which the fine-grained step finalizes groups can provide guidance to the single-stroke classification stage. Additionally, the Hausdorff distance, which is most commonly used for symbol classification, guides the stroke grouping procedure. These feedback mechanisms are illustrated as red marks in figure (2), and both are described in depth below.

Stage 1: Coarse-Grained Grouping

After observing several example circuit sketches, it was apparent that once gate strokes are isolated, most symbols can be separated by spatial separation alone.
Strokes that belong to a single gate often overlap or nearly overlap with neighboring strokes, whereas strokes belonging to separate gates often have substantial distances between them.

The goal of the coarse-grained grouping algorithm is twofold. First, it must exploit this spatial information in a computationally efficient way suitable for interactive systems. Second, it provides a set of coarse groupings for the more computationally intensive fine-grained step. The fine-grained algorithm works locally within a coarse grouping, so coarse grouping reduces the computational cost of the fine-grained step. The coarse-grained algorithm tends to err towards overgrouping, meaning that more strokes are merged into one group than should be; the fine-grained step is designed to correct such errors.

This algorithm first computes an axis-aligned rectangular bounding box for each stroke. An expanded bounding box is then created with the same center as the original bounding box, but with its width and height multiplied by a constant factor m (a value of m = 1.32 worked well in practice). An undirected graph G = (V, E) is created, where vertices represent strokes and edges represent relationships between strokes. An edge is added for any pair of strokes whose expanded bounding boxes overlap. This step is O(|V|^2), but it is fast overall because checking for rectangular overlap is an O(1) computation. Next, the connected components of this graph are found using a series of breadth-first searches, which complete in O(|V| + |E|) time. The connected components of this graph represent coarse groups of strokes.

This coarse-grained stage ends up being very similar to the current approach taken by LogiSketch, except that rectangular overlap, rather than an AdaBoost decision tree, is used as the criterion for merging pairs of strokes.
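The coarse-grained stage described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions rather than LogiSketch's code: strokes are assumed to be lists of (x, y) points, boxes use a hypothetical (xmin, ymin, xmax, ymax) tuple layout, the overlap test follows the rectangle-overlap condition of equation (1), and m defaults to the 1.32 reported above.

```python
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax): an assumed layout


def bounding_box(stroke: Sequence[Point]) -> Box:
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return (min(xs), min(ys), max(xs), max(ys))


def expand(box: Box, m: float) -> Box:
    """Scale the box's width and height by m, keeping the same center."""
    xmin, ymin, xmax, ymax = box
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    hw, hh = m * (xmax - xmin) / 2, m * (ymax - ymin) / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def overlap(a: Box, b: Box) -> bool:
    """O(1) axis-aligned rectangle overlap test."""
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]


def coarse_groups(strokes: Sequence[Sequence[Point]], m: float = 1.32) -> List[List[int]]:
    """Return coarse groups as sorted lists of stroke indices (connected components)."""
    boxes = [expand(bounding_box(s), m) for s in strokes]
    n = len(boxes)
    adj: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):                 # O(|V|^2) pairwise overlap checks
        for j in range(i + 1, n):
            if overlap(boxes[i], boxes[j]):
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    groups: List[List[int]] = []
    for start in range(n):             # one BFS per component: O(|V| + |E|) overall
        if seen[start]:
            continue
        seen[start] = True
        queue, component = deque([start]), []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        groups.append(sorted(component))
    return groups


strokes = [[(0, 0), (1, 1)], [(1.1, 0), (2, 1)], [(10, 10), (11, 11)]]
print(coarse_groups(strokes))         # [[0, 1], [2]]
print(coarse_groups(strokes, m=1.0))  # [[0], [1], [2]]
```

Because each box is scaled about its own center, the first two strokes (separated by a 0.1-unit gap) are merged with m = 1.32 but kept apart with m = 1.0, which is the overgrouping tradeoff the fine-grained stage is meant to correct.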
The overlap computation can execute much faster because it is a single logical expression based on the limits of the bounding boxes. Two rectangles A and B overlap if

$$(A_x^{\min} < B_x^{\max}) \land (A_x^{\max} > B_x^{\min}) \land (A_y^{\min} < B_y^{\max}) \land (A_y^{\max} > B_y^{\min}) \qquad (1)$$

where subscripts indicate the x or y component of the bounding box and superscripts indicate the minimum or maximum value of this component.

Figure (3) illustrates two examples of the output of the coarse-grained grouping. In the majority of sketches, gates are separated enough spatially for the coarse-grained algorithm to group them perfectly (an example of this scenario is shown on the left). In some cases, however, gates are drawn so close to each other that the coarse-grained grouping algorithm cannot separate them (an example of this scenario is shown on the right). In such cases, the problem groupings require refinement from the fine-grained grouping stage.

Stage 2: Fine-Grained Grouping

In the coarse-grained grouping stage, symbols in close proximity may get erroneously grouped together. The goal of the fine-grained stage is to recognize and correct such instances of "overgrouping". The fine-grained stage works under the assumption that spatial separation alone is not enough to accurately group strokes into symbols. Instead of relying on spatial separation, this stage relies on temporal information and "symbol awareness": a learned notion of how closely a candidate shape resembles a known gate.

Figure 3: Examples of the coarse-grained grouping algorithm. Plots are drawn with only gate strokes. Colored rectangles represent group labels after coarse grouping. On the left, perfect grouping. On the right, three gates have been erroneously clustered together.

Figure 4: Illustration of the greedy fine-grained algorithm. First an AND symbol is finalized, then an OR symbol.
Symbol awareness is captured by a crucial distance metric called the Hausdorff distance, which is explained in detail below.

This algorithm greedily finalizes fine-grained groups, as illustrated in figure (4). In each iteration, the algorithm examines several subsets of the strokes in the coarse group and assesses the "quality of fit" of each subset, a measurement of how much the subset looks like a known symbol. The subset that fits best is finalized as a fine-grained group and removed from the coarse-grained group. This repeats until all strokes are finalized.

Although a coarse group is typically smaller than the set of all strokes labeled as gates, an exhaustive search for the best partition into fine groups is still intractable. If there are n strokes in a coarse group, the number of possible fine groupings in an exhaustive search is lower-bounded as follows:

$$\text{exhaustive} \ge \sum_{i=1}^{n} \binom{n}{i} \qquad (2)$$

$$= \sum_{i=1}^{n} \frac{n!}{i!\,(n-i)!} = 2^{n} - 1 \qquad (3)$$

In fact, this lower bound counts only the number of ways the first fine group can be chosen from the coarse set. To get the exact number of groupings, the count must be applied recursively to the strokes that remain after each fine-grained group is removed from consideration.

In most sketches, we observe that users tend to draw gates in consecutive strokes, rather than starting one gate, drawing a second gate, and then finishing the first. The fine-grained algorithm exploits this observation to limit the search space for the best grouping. It is a greedy iterative procedure: in each iteration, it pulls out the best subset of consecutive strokes and "finalizes" this subset as a fine-grained group. This greedy procedure explores the subsets of strokes most likely to be true groups under this "consecutive stroke" assumption while avoiding the state-space explosion illustrated in figure (5).

It is important to note that the term "consecutive" is local to this fine-grained stage.
If the coarse grouper identified the first, fifth, sixth, and ninth strokes as a coarse group, then the set of consecutive strokes up to length k = 3 would be [{1}, {5}, {6}, {9}, {1, 5}, {5, 6}, {6, 9}, {1, 5, 6}, {5, 6, 9}]. In algorithm (1), this procedure is called getConsecutiveStrokes.

    Input: a coarse-grained grouping S, maximum grouping length k
    Result: a set of fine-grained groupings
    begin
        S ← strokes in coarse grouping, ordered temporally
        fineGroupings ← ∅
        while S ≠ ∅ do
            consStrokes ← getConsecutiveStrokes(S, k)
            dist ← ∞
            bestStrokeSubset ← ∅
            for strokeSubset ∈ consStrokes do
                if HausdorffDistanceToGates(strokeSubset) < dist then
                    dist ← HausdorffDistanceToGates(strokeSubset)
                    bestStrokeSubset ← strokeSubset
                end
            end
            domainSpecificCheck(bestStrokeSubset)
            append bestStrokeSubset to fineGroupings
            S ← S − bestStrokeSubset
        end
    end

Algorithm 1: Fine-grained grouper at a high level

Figure 5: The state-space explosion of fine groupings, plotting the number of combinations (log scale, 1 to 10,000) against the number of strokes (2 to 14). The line marked "exhaustive" illustrates the lower bound on the number of possible fine groupings as the number of strokes increases. The line marked "greedy" illustrates the number of fine groupings under the consecutive stroke assumption.

In each iteration of the while loop, the algorithm examines all sets of consecutive strokes up to length k (in practice, k was set to 5). Each of these temporally consecutive subsets of strokes, or "candidate gates", is compared to known gates (e.g., AND, OR, NAND, NOR, NOT, and XOR) using the Hausdorff distance. This metric measures how well a candidate gate matches a known gate in its entirety; it is essential to the performance of the algorithm and is expanded upon in the following section.
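Algorithm 1 and the getConsecutiveStrokes procedure can be sketched in Python as follows. This is an illustrative sketch, not the original implementation: the names get_consecutive_strokes, fine_grained_groups, distance_to_gates, and domain_specific_check are hypothetical stand-ins for the procedures in the pseudocode, strokes are assumed to arrive already ordered by time, and toy_distance is a made-up scoring function used only to exercise the loop.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def get_consecutive_strokes(S: Sequence[T], k: int) -> List[Tuple[T, ...]]:
    """All runs of 1..k neighboring strokes in the time-ordered sequence S."""
    return [tuple(S[i:i + length])
            for length in range(1, min(k, len(S)) + 1)
            for i in range(len(S) - length + 1)]


def fine_grained_groups(
    coarse_group: Sequence[T],
    k: int,
    distance_to_gates: Callable[[Tuple[T, ...]], float],
    # Hypothetical hook; it may extend the best subset with remaining strokes,
    # but must return a non-empty subset of them or the loop will not terminate.
    domain_specific_check: Callable[[Tuple[T, ...], List[T]], Tuple[T, ...]] = lambda best, rest: best,
) -> List[Tuple[T, ...]]:
    """Greedy fine-grained grouping in the spirit of Algorithm 1."""
    S = list(coarse_group)  # assumed to be ordered temporally already
    fine: List[Tuple[T, ...]] = []
    while S:
        candidates = get_consecutive_strokes(S, k)
        # min() keeps the first of several tied candidates, like the strict '<' above.
        best = min(candidates, key=distance_to_gates)
        best = domain_specific_check(best, S)
        fine.append(best)
        S = [s for s in S if s not in best]
    return fine


# Toy stand-in for HausdorffDistanceToGates: each stroke is named after its shape;
# mixing shapes scores badly, and each missing stroke costs 1.
SHAPE_SIZE = {"x": 3, "y": 2}


def toy_distance(subset: Tuple[str, ...]) -> float:
    if len({s[0] for s in subset}) > 1:
        return 10.0
    return float(SHAPE_SIZE[subset[0][0]] - len(subset))


print(get_consecutive_strokes([1, 5, 6, 9], 3))
print(fine_grained_groups(["x1", "x2", "y1", "y2", "x3"], k=5, distance_to_gates=toy_distance))
# [('y1', 'y2'), ('x1', 'x2', 'x3')]
```

The first call reproduces the k = 3 example above. The toy run mirrors the x1, x2, y1, y2, x3 recovery scenario discussed in this section: the interleaved strokes of shape Y are removed first, after which the strokes of shape X become consecutive and are finalized together.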
The candidate symbol that most closely matches a known symbol goes through one final check, called domainSpecificCheck, before being finalized; this check is also elaborated upon in a following section. The finalized candidate is then set permanently as a fine-grained group. This iterative procedure repeats until every stroke in the coarse group has been placed into a fine-grained group.

While symbols are usually drawn in consecutive strokes, this is not always the case. This algorithm can recover if a shape X is started, then a new shape Y is drawn, and then shape X is finished. For example, the temporal order of strokes may be x1, x2, y1, y2, x3. The while loop is designed to recognize y1, y2 as a shape first, leaving x1, x2, x3. Once shape Y is removed, shape X becomes a consecutive series of strokes.

Hausdorff Distance

In general, the Hausdorff distance measures how far apart two subsets of a metric space are. This distance metric has been successfully (and tractably) applied in computer vision tasks to measure how well template images match target images. In the procedure HausdorffDistanceToGates from algorithm (1), a candidate shape is compared to a set of well-drawn template shapes. We use five examples of each of the six gates, for a total of 30 template gates.

To take advantage of this distance metric, candidate and template shapes must have an appropriate image-based representation. This representation is similar to the symbol classification approach in [4]. Symbols are centered and projected onto a 48 by 48 bitmap while preserving the original aspect ratio. The Hausdorff distance between two bitmaps A and B is defined as

$$H(A, B) = \max\big(h(A, B),\, h(B, A)\big) \qquad (4)$$

$$h(A, B) = \max_{a \in A'} \min_{b \in B'} \lVert a - b \rVert \qquad (5)$$

where A' represents the set of pixels in bitmap A whose value is 1 (and likewise B' for bitmap B). Intuitively, h(A, B) is the largest of the distances from each point in A to its closest point in B.
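The rasterization step and equations (4) and (5) translate into the short Python sketch below. It assumes bitmaps are stored sparsely as sets of "on" pixel coordinates (the A' and B' of equation (5)), that strokes are densely sampled so that binning their points draws the ink, and a brute-force nearest-pixel search costing O(|A'||B'|); distance_to_gates and its labeled-template list format are hypothetical.

```python
import math
from typing import Iterable, List, Set, Tuple

Pixel = Tuple[int, int]


def rasterize(points: Iterable[Tuple[float, float]], size: int = 48) -> Set[Pixel]:
    """Center the symbol on a size x size grid, scaling its longer side to fit
    while preserving the aspect ratio; returns the set of 'on' pixels."""
    pts = list(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    xmin, ymin = min(xs), min(ys)
    w, h = max(xs) - xmin, max(ys) - ymin
    scale = (size - 1) / (max(w, h) or 1.0)
    off_x = ((size - 1) - w * scale) / 2  # centers the shorter dimension
    off_y = ((size - 1) - h * scale) / 2
    return {(round((x - xmin) * scale + off_x), round((y - ymin) * scale + off_y))
            for x, y in pts}


def directed_hausdorff(A: Set[Pixel], B: Set[Pixel]) -> float:
    """h(A, B): the largest distance from a pixel of A to its nearest pixel in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A: Set[Pixel], B: Set[Pixel]) -> float:
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


def distance_to_gates(points, templates: List[Tuple[str, Set[Pixel]]]) -> Tuple[str, float]:
    """Best (label, distance) over labeled template bitmaps, e.g. five per gate type."""
    bitmap = rasterize(points)
    return min(((label, hausdorff(bitmap, t)) for label, t in templates),
               key=lambda pair: pair[1])


print(hausdorff({(0, 0)}, {(0, 0), (3, 4)}))  # 5.0
```

The example shows why the metric reacts to missing ink: every pixel of the single-pixel set A lies on B, so h(A, B) = 0, but B's extra pixel at (3, 4) is 5 units from A, so h(B, A), and therefore H(A, B), is 5.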
The intuition behind h(A, B) is illustrated in figure (6). This measurement has been criticized for being too sensitive to outliers; however, in the context of symbol matching, this maximum distance tends to capture the occurrence of missing or additional strokes. Additionally, "binning" the vector (x, y) points of each stroke onto a discrete bitmap makes the computation tractable, and a sparse matrix representation is used to improve space and time complexity.

The Hausdorff distance is essential to the success of the fine-grained algorithm. A sum of squared error (SSE) distance metric and a probabilistic similarity metric based on a linear SVM model, both using the features described in the "Symbol Classification" section, were also considered. Neither approach was as successful as the Hausdorff distance.

Incorporating Domain-Specific Knowledge

The greedy fine-grained algorithm iteratively pulls out and finalizes sets of strokes as symbols. One problem in the logical circuit domain is that some symbols are subsets of other symbols. For example, an AND gate is a subset of a NAND gate, and OR is a subset of both XOR and NOR. The greedy approach can therefore finalize a subshape (such as AND) because it matches a template better, via Hausdorff distance, than the shape in its entirety (such as NAND). To address this, the procedure domainSpecificCheck in the pseudocode of algorithm (1) performs a series of domain-specific checks before finalizing a symbol. For example, in the domain of digital logical circuits, this procedure checks whether an AND may be extended to a NAND. The language for encoding domain-specific knowledge is generic, so it remains applicable in other application domains as well.
    domain_info = {
        "AND": [
            {
                "candidate": "NAND",
                "degrees": 0,
                "distance": 0.5,
                "tolerance": 2.0,
            }
        ],
        ...
    }

Figure 6: Illustration of Hausdorff distance applied to logic gates: the Hausdorff distance between an incomplete candidate symbol (A) and a complete template (B). Because of the missing stroke, h(B, A) is high.

This example states that if the best match, M, is an AND, then we must check whether it could be extended to a NAND. To do this, the bounding box of the AND symbol is shifted in the 0-degree orientation (to the right) by a distance of 0.5 times the width of the AND symbol's bounding box. All non-finalized strokes within this shifted bounding box are iteratively added to the AND shape and checked via Hausdorff distance. The temporary shape with the smallest Hausdorff distance to a NAND template, T, is considered as an alternative. If H(T)/H(M) is less than the tolerance of 2.0, the algorithm finalizes T instead of M.

Grouping Experimental Results

To test the performance of these grouping algorithms and compare them to the existing approach in the LogiSketch software, we used a test set of 477 digital logical circuit sketches drawn by 24 students. In total, 4,084 symbols were tested (an average of 8.56 symbols per sketch). The true symbol groupings for these sketches were labeled by hand. To simplify the experimental procedure, we assumed perfect single-stroke classification.

Four grouping approaches are compared in table (1). First, only the coarse-grained grouping algorithm was used. Second, both the coarse- and fine-grained grouping stages were used (coarse+fine). Third, the LogiSketch system was used under its current parameter settings. Lastly, published LogiSketch results from Peterson [12] are listed; these were obtained on the same data using LogiSketch with different parameter settings. Surprisingly, the coarse-grained algorithm alone groups this data set most accurately.
The coarse+fine-grained approach rarely adds erroneous strokes to a true group, but it has missing strokes more frequently than the coarse-grained approach by itself.

Table 1: Grouping performance metrics for coarse-grained, coarse+fine-grained, LogiSketch under current running parameters, and published LogiSketch results under different parameters.

    Metric                                                     | Coarse | Coarse + Fine | LogiSketch (current parameters) | LogiSketch (published)
    Percentage of true group ink correctly grouped             | 96.7%  | 91.0%         | 71.3%                           | 94.8%
    Percentage of ink that was erroneously added to true group | 7.9%   | 1.3%          | 34.5%                           | 4.5%
    Average number of strokes missing from true group          | 0.08   | 0.19          | 0.80                            | N/A
    Average number of strokes erroneously added to true group  | 0.05   | 0.01          | 0.80                            | N/A
    Average total missing or extra strokes                     | 0.13   | 0.20          | 1.60                            | N/A
    Percentage of clusters with 0 errors                       | 94.2%  | 82.6%         | 63.7%                           | 76.7%
    Percentage of clusters with 1 error                        | 3.3%   | 15.9%         | 2.0%                            | 16.5%
    Percentage of clusters with 2+ errors                      | 2.4%   | 1.5%          | 34.3%                           | 5.0%

While the coarse-grained algorithm alone achieves the best performance on this data set, the coarse+fine-grained algorithm may prove better in busy, crowded sketches where spatial separation alone is not enough to distinguish between symbols. Even where the coarse approach is more accurate, the coarse+fine approach may provide additional benefits to the sketch recognition process.

Future Grouping Work

One advantage of the fine-grained grouping algorithm is that, for each coarse group, the last finalized strokes are those with the poorest fit to a known template. These strokes may have been mislabeled as gates by the single-stroke classifier, which creates an opportunity to provide feedback to the single-stroke classification stage. For example, the single-stroke classifier could have erroneously labeled a wire as a gate, as in figure (7).
We would expect that wire to be one of the last strokes finalized by the fine-grained grouping algorithm, with a relatively large Hausdorff distance.

Additionally, strokes not labeled as gates that overlap with a coarse-grained bounding box could be added to the pool of strokes used by the fine-grained algorithm. If the fine-grained algorithm confidently finalizes such strokes (into one of the first finalized fine-grained groups), this could indicate to the single-stroke classifier that these strokes are truly gates. It would be interesting to explore ways in which the fine-grained algorithm can provide feedback to the single-stroke classification stage.

Figure 7: Illustration of an instance where the fine-grained algorithm can provide feedback to the single-stroke classifier.

Currently, the LogiSketch framework simplifies sketch recognition by separating it into a pipeline of three distinct steps: (1) single-stroke classification, (2) stroke grouping, and (3) symbol classification. The fine-grained algorithm holds promise for a tighter feedback loop between these pipeline stages, where each stage gives feedback to previous stages.

In summary, this novel grouping algorithm provides two feedback mechanisms. As illustrated in figure (7), the order in which the fine-grained algorithm finalizes strokes can provide feedback to the single-stroke classification stage. Additionally, the Hausdorff distance (which is typically used to classify shapes) is incorporated into the fine-grained iteration to guide the search for fine-grained groups.

3 Symbol Classification

The symbol classifier takes as input a group of strokes from the stroke grouping stage. The goal of this stage is to correctly identify the class of symbol represented by a group of strokes. In the domain of digital logical circuits, a symbol is labeled as one of six possible gates: AND, OR, NAND, NOR, NOT, or XOR. See figure (8) for examples of these gates.
The current LogiSketch system uses a nearest-neighbor approach, which predicts that a test symbol belongs to the same class as the closest template image according to a distance metric [3]. Classification is based on a voting scheme between four distance metrics: the Hausdorff distance, a modified Hausdorff distance, the Tanimoto similarity coefficient, and the Yule coefficient [5].

In this section, we implement an existing state-of-the-art symbol classification algorithm [9], then modify it to work well in the domain of digital logical circuits. This modified algorithm addresses some of the issues with the current approach taken by LogiSketch. Like any non-parametric nearest-neighbor classifier, the current approach is only as good as the templates that a shape is matched to. Additionally, it is often too costly to compare a test symbol to all symbols in a training dataset, so a representative template set must be chosen. This section presents a parametric approach that benefits from large training corpora while still making predictions tractably.

Figure 8: Examples of user-drawn gates. Clockwise from top-left: AND, NAND, NOR, XOR, OR, NOT.

Previous approaches to symbol classification generally fall into two categories: stroke-based or image-based. Stroke-based approaches such as [13] classify a symbol based on the raw vectorized stroke information (a series of (x, y, time, pressure) points). Stroke vector information is often noisy because users trace over existing lines, segment the edges of shapes into multiple strokes, or are simply messy. This gave rise to several image-based approaches such as [4], which convert the stroke vector information into a bitmap representation that is less prone to noise but also loses temporal and exact spatial information.
Recent work by Ouyang and Davis [9] proposes a novel hybrid symbol classification technique which combines elements of stroke-based and image-based approaches. In the following section, this approach is introduced, together with modifications that make it suitable for the challenges of classifying symbols in digital logical circuits.

Ouyang et al.'s Raster-Based Approach

Ouyang and Davis [9] developed a feature representation of symbols based on a set of five low-resolution feature images, or rasters. These features capture the orientation and endpoint information of a symbol. This approach is a hybrid of image- and vector-based approaches because orientation and endpoints are qualities of the vectorized data representation, yet they are embedded into an image-based representation.

A set of strokes from the stroke grouping stage is used as input to this algorithm. The feature extraction process is summarized in the steps below; a complete explanation can be found in their paper.

1. Additional points are interpolated within strokes to ensure a constant spatial distance between consecutive points.
2. The x and y values of each point in the symbol are translated and scaled so that the symbol's center of mass is at the origin and the points have unit variance along both the horizontal and vertical axes.
3. For each point in every stroke of a symbol, five features are computed. Four orientation features measure how nearly horizontal, vertical, or diagonal (45 or 135 degrees) the stroke is at that point. An endpoint feature is an indicator variable describing whether or not the point is an endpoint of a stroke. Each of these features is scaled to the range [0, 1].
4. Five 24 by 24 raster grids are created: one for each of the four orientations, and one for endpoints. Each point is mapped to a pixel in these 24 by 24 grids.
   The intensity of each pixel in the feature rasters is set to the maximum feature value over all points that map to that pixel.
5. A Gaussian smoothing function is applied to each raster, then each raster is downsized to a 12 by 12 grid using a max filter, where each pixel in the downsized image is the maximum of four pixels from the original grid.

The result of this feature extraction process is a set of five 12 by 12 grids, a total of 720 feature values for each symbol. Each raster contains orientation or endpoint information extracted from the vectorized strokes. Finally, a test symbol is compared to a set of template images to determine the predicted class. This comparison uses the image deformation model (IDM) distance metric [6], which is tolerant to local deformations in the symbol.

Adapting the Raster-Based Approach for Digital Logical Circuits

The symbol classification task is especially challenging in the digital logical circuit domain for two reasons. First, slight deformations in a symbol may change it from one gate to another. For example, adding a bend to the vertical back of an AND gate effectively converts it to an OR gate. Sketchers in practice do not draw perfectly straight strokes, so it is important to pick up on the distinguishing factors of shapes (e.g., the curvature of the vertical back) without falling victim to the imperfections of human sketchers. Second, some gates are subsets of the strokes in other gates. For example, an OR gate is a NOR gate with the "bubble" removed. While the majority of a symbol may be highly indicative of one gate, a small amount of ink may change the class of that gate, so a symbol classifier must be sensitive to small amounts of ink.

Because logical gates are so sensitive to small deformations, the IDM distance metric performed very poorly. Instead, a parametric approach was used.
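The Ouyang and Davis feature extraction steps listed earlier can be approximated with the Python sketch below, covering steps 2 through 5. This is a simplified reading of their method, not their code: step 1 (resampling) is omitted, the orientation feature is assumed to fall linearly from 1 to 0 as the stroke direction moves 45 degrees away from each reference angle, the 24 by 24 grid is assumed to span 2.5 standard deviations on each side of the center, and a 3x3 binomial blur stands in for the Gaussian filter.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Grid = List[List[float]]
REFERENCE_ANGLES = (0.0, 45.0, 90.0, 135.0)  # horizontal, diagonal, vertical, diagonal


def normalize(strokes: Sequence[Sequence[Point]]) -> List[List[Point]]:
    """Step 2: move the center of mass to the origin and give x and y unit variance."""
    pts = [p for s in strokes for p in s]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pts) / n) or 1.0
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pts) / n) or 1.0
    return [[((x - mx) / sx, (y - my) / sy) for x, y in s] for s in strokes]


def point_features(stroke: Sequence[Point]) -> List[List[float]]:
    """Step 3: four orientation features plus an endpoint indicator per point."""
    last = len(stroke) - 1
    feats = []
    for i in range(len(stroke)):
        (x0, y0), (x1, y1) = stroke[max(i - 1, 0)], stroke[min(i + 1, last)]
        theta = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
        row = []
        for ref in REFERENCE_ANGLES:
            diff = abs(theta - ref)
            diff = min(diff, 180.0 - diff)           # stroke direction is undirected
            row.append(max(0.0, 1.0 - diff / 45.0))  # 1 when aligned, 0 beyond 45 degrees
        row.append(1.0 if i in (0, last) else 0.0)
        feats.append(row)
    return feats


def smooth(g: Grid) -> Grid:
    """Step 5a: 3x3 binomial blur, a small stand-in for Gaussian smoothing."""
    n, k = len(g), (1.0, 2.0, 1.0)
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            total = weight = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if 0 <= r + dr < n and 0 <= c + dc < n:
                        w = k[dr + 1] * k[dc + 1]
                        total += w * g[r + dr][c + dc]
                        weight += w
            out[r][c] = total / weight
    return out


def max_downsample(g: Grid) -> Grid:
    """Step 5b: each output pixel is the max of a 2x2 block (24x24 -> 12x12)."""
    n = len(g) // 2
    return [[max(g[2 * r][2 * c], g[2 * r][2 * c + 1], g[2 * r + 1][2 * c], g[2 * r + 1][2 * c + 1])
             for c in range(n)] for r in range(n)]


def feature_rasters(strokes, grid: int = 24, extent: float = 2.5) -> List[Grid]:
    """Steps 2-5: five 12x12 rasters (four orientations + endpoints) for one symbol."""
    rasters = [[[0.0] * grid for _ in range(grid)] for _ in range(5)]
    for stroke in normalize(strokes):
        for (x, y), f in zip(stroke, point_features(stroke)):
            # Map [-extent, extent] (in standard deviations) onto grid cells, clamped.
            col = min(grid - 1, max(0, int((x + extent) / (2 * extent) * grid)))
            row = min(grid - 1, max(0, int((y + extent) / (2 * extent) * grid)))
            for ch in range(5):  # step 4: keep the maximum feature value per pixel
                rasters[ch][row][col] = max(rasters[ch][row][col], f[ch])
    return [max_downsample(smooth(r)) for r in rasters]


line = [[(float(x), 0.0) for x in range(10)]]  # one horizontal stroke
rasters = feature_rasters(line)
print(len([v for r in rasters for row in r for v in row]))  # 720
```

For a single horizontal stroke, only the 0-degree raster and the endpoint raster receive ink; the vertical and diagonal rasters stay at zero, which is the kind of orientation-specific signal the classifier relies on.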
A multi-class linear support vector machine (SVM) was trained on features from the 12 by 12 rasterized images. A support vector machine model has the added benefit of being able to tractably learn from large training sets while still performing prediction quickly. The decision boundary of a linear SVM is a hyperplane in a high-dimensional feature space, so each binary prediction reduces to a dot product that can be computed in O(num. features) time, regardless of how many training examples there are.

We also modified Ouyang et al.’s algorithm in an attempt to improve classification accuracy. The five raster images introduced in the section above represent where the pen ink is on the tablet, the angle at which the pen was moving, and where the pen was put down on or lifted from the tablet. These features describe the spatial characteristics of a symbol, but they do not describe any temporal information. Temporal information can describe not just where a symbol was drawn, but how it was drawn. For example, sketchers must slow the pen down to make sharp turns or intricate patterns, but may tend to draw straight lines quickly. This temporal information was encoded in a sixth raster image, which projects the pen’s velocity onto a similar 12 by 12 raster image. The same steps were applied to this raster image, including shifting, scaling, Gaussian smoothing, and downsampling. In figure (9), an example of a complete set of six raster images is shown for an AND symbol. In total there are 12 × 12 × 6 = 864 feature values.

Figure 9: Examples of the six raster images used as features in a linear SVM for symbol classification

Symbol Classification Experimental Results

Digital Logical Circuits

We tested both the parametric SVM classifier and the additional velocity raster on a dataset of sketches from nine students at the University of California, Riverside and Harvey Mudd College. 1,508 symbols were taken from 175 sketches from these nine students. True symbol classifications were labeled by hand. To simplify the experimental procedure, we assumed perfect grouping from the stroke grouping stage. Two-fold cross validation was performed for each test. Cross validation groups were assigned so that no student had symbols in both the training and testing datasets. This rule prevents the bias that would arise if the classifier had already seen a student’s sketching style during training. Three classifiers were considered, all based on the raster feature representation:

• The first classifier used a non-parametric nearest-neighbor predictor with the IDM distance metric. In this approach, a random 20% subset of the training data was used as the set of templates in the nearest-neighbor comparison.
• The second classifier was a multi-class linear SVM model trained on the complete training set, without the velocity raster. The one-vs-one strategy is used to combine pairwise binary classifiers into a multi-class model. Linear SVM models require a cost parameter, which was tuned via grid search.
• The third classifier is a similar multi-class linear SVM model with the velocity raster added.

Table 2: Performance of symbol classifiers on digital logical circuit data. The cost reported in the “Notes” column is the linear SVM regularization parameter tuned by grid search.

Method                 2-fold c.v. accuracy   Notes
IDM                    60.06%
SVM (no velocity)      90.58%                 cost: 0.1
SVM (with velocity)    92.01%                 cost: 0.1

Results are listed in table (2). The IDM nearest-neighbor technique achieved an unimpressive 60.06% predictive accuracy. We believe this is because digital logical gates are sensitive to slight deformations, and allowing such deformations can quickly make one type of gate look like another. We also observe that the two linear SVM models classify relatively accurately. The additional velocity raster increases predictive accuracy by 1.43 percentage points, from 90.58% to 92.01%.
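The sixth (velocity) raster could be computed along the following lines. This is a hedged sketch, not the code used in these experiments: it assumes each point is an (x, y, t) tuple, estimates speed with a central difference between neighboring points, and divides by the symbol’s maximum speed so that values fall in [0, 1]; the text does not say how velocity is scaled. The grid extent `EXTENT` is likewise an assumption, and Gaussian smoothing is omitted for brevity (it would be applied to the 24 by 24 grid as for the other five rasters).

```python
import math

GRID, OUT = 24, 12   # same raster sizes as the five orientation/endpoint rasters
EXTENT = 2.5         # assumption: grid spans +/-2.5 standard deviations


def point_speeds(stroke):
    """Speed at each (x, y, t) point, estimated from its neighbors."""
    speeds = []
    for i in range(len(stroke)):
        a = stroke[max(i - 1, 0)]
        b = stroke[min(i + 1, len(stroke) - 1)]
        dt = b[2] - a[2]
        speeds.append(math.hypot(b[0] - a[0], b[1] - a[1]) / dt if dt > 0 else 0.0)
    return speeds


def velocity_raster(strokes):
    """Sixth raster: max normalized speed per cell, max-pooled to 12 x 12."""
    pts = [p for s in strokes for p in s]
    speeds = [v for s in strokes for v in point_speeds(s)]  # same order as pts
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sx = math.sqrt(sum((p[0] - mx) ** 2 for p in pts) / n) or 1.0
    sy = math.sqrt(sum((p[1] - my) ** 2 for p in pts) / n) or 1.0
    vmax = max(speeds) or 1.0  # assumption: scale speeds to [0, 1] per symbol

    def cell(v):
        return min(max(int((v + EXTENT) / (2 * EXTENT) * GRID), 0), GRID - 1)

    grid = [[0.0] * GRID for _ in range(GRID)]
    for (x, y, _), v in zip(pts, speeds):
        r, c = cell((y - my) / sy), cell((x - mx) / sx)
        grid[r][c] = max(grid[r][c], v / vmax)
    # Gaussian smoothing of `grid` is omitted here; 2 x 2 max-pooling follows.
    return [[max(grid[2 * r + i][2 * c + j] for i in (0, 1) for j in (0, 1))
             for c in range(OUT)] for r in range(OUT)]
```

Flattening the returned grid and appending it to the 720 orientation and endpoint values gives the 864-value feature vector. On a stroke drawn quickly at first and slowly at the end, the cells near the start of the stroke receive the largest values.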
Digital Logical Circuits: A Comparison to LogiSketch

In this experiment, we compare the image-based nearest-neighbor classifier in the current LogiSketch system with the best classifier from the previous section: the SVM classifier using the velocity raster. The experimental procedure described in [3] was replicated so that our results are comparable to published results. In this procedure, classifiers are trained and tested on three types of data:

1. GlobalSet – one global classifier is learned from all available training data. This simulates a factory-trained classifier.
2. ComboSet – individual classifiers are learned for each user such that 40% of the training data is global and 60% is user specific. This simulates a factory-trained classifier tuned “online” to a user’s examples.
3. UserSet – individual classifiers are learned for each user using only other training symbols from that user. This simulates a classifier trained only on examples given by the user.

The type of training data was also varied. Training data was gathered from students under three tasks: isolated symbol drawing (ISO), copying a logic diagram (COPY), and synthesizing a logic diagram from a logical equation (SYNTH). We used test data from the SYNTH task, because it most accurately simulates how a user would use sketch recognition software in real life. The published LogiSketch results in [3] are based on template sets with five templates per gate so that classification remains tractable. SVM models typically improve with more training data, so all available training data is used.

Table 3: Classifier performance under different types of data, training tasks, and test tasks

Data set     Training activity   Testing activity   LogiSketch   SVM
GlobalSet    ISO                 SYNTH              81.1%        83.3%
GlobalSet    COPY                SYNTH              79.4%        88.6%
GlobalSet    SYNTH               SYNTH              81.9%        90.1%
ComboSet     ISO                 SYNTH              86.9%        84.0%
ComboSet     COPY                SYNTH              88.8%        81.9%
ComboSet     SYNTH               SYNTH              85.8%        88.6%
UserSet      ISO                 SYNTH              88.5%        70.5%
UserSet      COPY                SYNTH              86.8%        62.1%
UserSet      SYNTH               SYNTH              93.4%        69.8%
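The three training regimes can be made concrete with a small helper that builds the training set for one user’s classifier. This is a hypothetical sketch: the `Example` record, the function name `training_set`, and the way ComboSet’s 40/60 split is realized (keeping all of the user’s own examples and randomly sampling enough examples from other users to make up 40% of the set) are our assumptions, not details taken from [3]. Here `pool` holds all labeled examples available for training, including the target user’s own training examples, and `activity` selects ISO, COPY, or SYNTH training data.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    user: str        # which student drew the symbol
    activity: str    # "ISO", "COPY", or "SYNTH"
    label: str       # gate class, e.g. "AND"
    features: list   # e.g. the 864 raster feature values


def training_set(pool, regime, user, activity, global_frac=0.4, seed=0):
    """Build the training data for one user's classifier under a given regime."""
    pool = [e for e in pool if e.activity == activity]
    own = [e for e in pool if e.user == user]
    if regime == "GlobalSet":   # one factory-trained model shared by everyone
        return pool
    if regime == "UserSet":     # only this user's own examples
        return own
    if regime == "ComboSet":    # by default 40% global, 60% user-specific
        others = [e for e in pool if e.user != user]
        n_global = round(len(own) * global_frac / (1 - global_frac))
        rng = random.Random(seed)
        return own + rng.sample(others, min(n_global, len(others)))
    raise ValueError(f"unknown regime: {regime}")
```

With six SYNTH examples from user "a" and ten SYNTH examples from another user, ComboSet yields the six own examples plus four sampled global ones, so 60% of the set is user specific.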
Again, a two-fold cross validation procedure is used. In table (3), we see that the SVM classifier excels when one global classifier is learned from all available training data (i.e., a factory-trained classifier). Results are mixed in the ComboSet scenario, where user-specific models are trained on a mix of global and user-specific data. LogiSketch’s nearest-neighbor approach is dominant when user-specific models are trained using only other examples from that user (i.e., a personalized classifier). In general, the SVM model performs well when it can take advantage of a large training set such as GlobalSet, but when the training data “narrows in” on the examples of just one user, as in UserSet, the SVM model does not have enough training examples to learn a good model. We believe there is an opportunity to use a “factory-trained” global SVM model when a user first starts using LogiSketch. As a user interacts with LogiSketch, it can switch to a model which combines the decisions of the SVM and nearest-neighbor classifiers. We leave this combined classifier to future work.

Figure 10: Examples of training handwritten pen digits. First stroke in blue, second stroke in green.

Table 4: Performance of symbol classifiers on handwritten pen digits dataset

Method                        Test accuracy
linear SVM (no velocity)      98.4%
linear SVM (with velocity)    98.7%
Eigen-Deformations [7]        98.2%
IDM [9]                       99.2%

Handwritten Pen Digits

Because so many application domains can benefit from sketch recognition software, it is important that sketch recognition algorithms can be applied across domains. In this section, the best classifier from above (SVM with velocity) is tested on handwritten digits. The handwritten pen digits data set was introduced in 1997 by Alimoglu and Alpaydin [1]. It contains 10,992 individual digit sketches divided into a training set and a testing set.
30 users provided 7,493 training digit examples and an additional 14 users provided 3,497 test digit examples. No writer had examples in both the training and testing sets, so the test set measures how well learning algorithms generalize to new users. Some example training digits are illustrated in figure (10). A global linear SVM model was learned on all training examples and then applied to all test examples. This procedure was performed twice: once for an SVM model without a velocity raster, and once with the velocity raster added to the feature set. Results are listed in table (4). The addition of the velocity raster increases test set accuracy from 98.4% to 98.7%, which further suggests that velocity information can increase the separability of the data. The SVM with velocity improves upon the best known published result before 2009 [7]; however, the IDM distance metric [9] appears to be well suited to pen digits and achieves the highest accuracy.

4 Conclusion

We have presented a novel two-stage stroke grouping algorithm which performs a fast initial grouping and then efficiently refines these groups. Experiments suggest that this approach improves upon the techniques currently used in LogiSketch. This grouping algorithm also has the ability to tighten the loop in the pipelined sketch recognition architecture. In addition, we improved an existing state-of-the-art symbol classification technique for use in the domain of digital logical circuits. Experiments suggest that this classification technique is particularly well suited to factory-trained classifiers.

References

[1] F. Alimoglu and E. Alpaydin. Combining multiple representations and classifiers for pen-based handwritten digit recognition. In Proceedings of ICDAR, 1997.
[2] C. Alvarado. Sketch Recognition Research at HMC, September 2011.
[3] M. Field, S. Gordon, E. Peterson, R. Robinson, T. Stahovich, and C. Alvarado.
The effect of task on classification accuracy: using gesture recognition techniques in free-sketch recognition. In Proceedings of the 6th Eurographics Symposium on Sketch-Based Interfaces and Modeling, SBIM ’09, pages 109–116, New York, NY, USA, 2009. ACM.
[4] L. B. Kara and T. F. Stahovich. Hierarchical parsing and recognition of hand-sketched diagrams. In Proceedings of the 17th annual ACM symposium on User interface software and technology, UIST ’04, pages 13–22, New York, NY, USA, 2004. ACM.
[5] L. B. Kara and T. F. Stahovich. An image-based trainable symbol recognizer for sketch-based interfaces. In AAAI Fall Symposium Series 2004: Making Pen-Based Interaction Intelligent and Natural, pages 99–105. AAAI Press, 2004.
[6] D. Keysers, C. Gollan, and H. Ney. Local context in non-linear deformation models for handwritten character recognition. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 4, pages 511–514, Aug. 2004.
[7] H. Mitoma, S. Uchida, and H. Sakoe. Online character recognition using eigen-deformations. In 9th International Workshop on Frontiers in Handwriting Recognition, pages 26–29, 2004.
[8] T. Y. Ouyang and R. Davis. Learning from neighboring strokes: Combining appearance and context for multi-domain sketch recognition, 2009.
[9] T. Y. Ouyang and R. Davis. A visual approach to sketched symbol recognition, 2009.
[10] T. Y. Ouyang and R. Davis. Visual recognition of sketched symbols. In IUI 2009 Workshop on Sketch Recognition, 2009.
[11] T. Y. Ouyang and R. Davis. ChemInk: a natural real-time recognition system for chemical drawings. In Proceedings of the 16th international conference on Intelligent user interfaces, IUI ’11, pages 267–276, New York, NY, USA, 2011. ACM.
[12] E. J. Peterson, T. F. Stahovich, E. Doi, and C. Alvarado. Grouping strokes into shapes in hand-drawn diagrams. In AAAI, 2010.
[13] J. O. Wobbrock, A. D. Wilson, and Y. Li.
Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In Proceedings of the 20th annual ACM symposium on User interface software and technology, UIST ’07, pages 159–168, New York, NY, USA, 2007. ACM.