Querying Graphics through Analysis and Recognition
INRIA Lorraine

Research fields
• Image processing and segmentation
• Structural pattern recognition
• Statistical pattern recognition
• Information spotting and retrieval
In the context of the analysis and recognition of graphics-rich documents

Scientific staff
Suzanne Collin, Assist. Prof. UHP
Philippe Dosch, Assist. Prof. Nancy 2
Bart Lamiroy, Assist. Prof. INPL/Mines
Gérald Masini, CR CNRS
Salvatore Tabbone, Assist. Prof. U. Nancy 2
Karl Tombre, Prof. INPL/Mines
Laurent Wendling, Assist. Prof. UHP

PhD students
Sabine Barrat, CIFRE contract (pending)
Thi Oanh Nguyen, joint supervision with IFI (Hanoi, Vietnam)
Oriol Ramos Terrades, joint supervision with UAB, Barcelona (Spain)
Jan Rendek, CIFRE France Télécom
Jean-Pierre Salmon, FRESH (European project)
Zhang Wan, joint supervision with City U.
  Hong Kong (pending)
Daniel Zuwala, MESR grant

Administrative staff
Isabelle Herlich (part time)
Françoise Laurent (part time)

Technical staff
Yamina Smail, Epeires project
X, Fresh project (pending)

Main results 2004-05

Hierarchical binarization

Focus on symbol recognition
– Symbol spotting combining Radon-based signature and structural approach

Improvement of recognition rates through combination of shape descriptors

[Figure: Recognition rates by descriptors, on the set of images I. Axes: recognition rates (0 to 100) against clusters (1 to 9). Series: Compactness, Ellipticity Degree, Angular Signature, Generic Fourier Descriptors, R-signature, Weighted Sum without weighted map.]

Application: extraction of letters in heritage documents

Ranking
Descriptors   C    E    SA   GFD  TRf  WS
Before        85   78   90   89   80   96
After         93   81   97   94   87   100

Recognition rates
Descriptors   C    E    SA   GFD  TRf  WS
Before        49   41   70   59   50   55
After         52   39   75   69   50   72

Raster-to-vector conversion method based on random sampling and parametric fitting

Segmenting the skeleton
RANVEC: random sampling on pairs of vector points
Extension of primitive as long as it fits arc or segment (linear regression)

Simplification and unification of primitives

Arc segmentation contest
          GREC’01        GREC’03        GREC’05
Winner    Dave Elliman   JiQiang Song   Xavier Hilaire
VRI       0.681          0.609          0.803

Application domains/transfer

Electrical wiring diagrams in aeronautics
FRESH project (FP6 STREP Aeronautics program)

Cultural heritage documents
ACI Madonne, FP6 STREP proposal QUIMERA-Doc submitted 9/05

QgarLib: library of C++ classes
QgarApps: applications
QgarGUI: user interface
qgar.org, APP
Refactoring to professional standards
Open architecture (XML)
80,000 lines of C++ code (comments not counted)
30 to 40 downloads of code per month
>10 documentation
  browses per day (robots excluded)

Positioning within INRIA
Fully within one of INRIA’s 7 challenges in the strategic plan: Developing multimedia data and information processing
Regular partnership with Imadoc (research group at Irisa)
Joint contacts Texmex (Sym-C)/Qgar with industrial partner
Recent contacts with Lear on browsing of large image bases

Collaborations
National: informal consortium Nancy, Rennes, La Rochelle, Rouen, Tours, Lyon with several joint projects (ACI Madonne, RNTL past and submission, Techno-Vision Epeires, IST submission) and coordination of actions
CVC/UAB, Barcelona: long-lasting relationship, associated team SymbolRec, joint PhD supervisions
City University Hong Kong: associated in Epeires, PAI submission accepted, joint PhD supervision
IFI, Vietnam: joint PhD supervision
University of Auckland (NZ), University of Bern, Carleton University (Canada)

Achievements, strengths, weaknesses
Leadership position at international level on graphics recognition
Announced in project and largely addressed:
• Symbol recognition and spotting
• Performance evaluation
Strong and adequate applicative backing
Improvement in number of PhD students
Still low on permanent workforce

Future work

Scalability of symbol recognition methods
• Large number of models
• Variations within the same shape class
  – Combining structural and statistical methods
  – Hierarchical approach

Complex symbols

Dynamic, on-the-fly recognition and spotting: from model-based recognition to freehand recognition

Multi-modal indexing (text / graphics / image / video) in multimedia and document databases (collaborations with Texmex, Lear, …)
Interactivity with user (relevance feedback)

Performance evaluation
• International symbol recognition contests 2003 & 2005
• Epeires
  – French Techno-Vision program
  – 4 universities, FT R&D, 1 company + foreign partners UAB & CityU
  – www.epeires.org
• Future research challenges
  – Simple and non-biased metrics
  – Ground-truth/recognition output matching methods
  – Generation of large sets of training and benchmarking data using realistic image degradation models

Epeires – ground-truthing

Software: increase number of applications
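
Illustration (not part of the original slides): the raster-to-vector idea listed under the main results, picking a random pair of neighbouring skeleton points and extending the primitive as long as it still fits, can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the RANVEC implementation or any Qgar code: it handles straight segments only (no arcs), it uses an orthogonal least-squares line fit as a stand-in for the linear regression mentioned on the slide, and the names `fit_line`, `max_deviation`, `grow_segment`, `vectorize` and the tolerance `tol` are hypothetical.

```python
import math
import random


def fit_line(points):
    """Orthogonal least-squares line fit: returns (centroid, unit direction)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Angle of the principal axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))


def max_deviation(points):
    """Largest perpendicular distance from the points to their fitted line."""
    (cx, cy), (dx, dy) = fit_line(points)
    return max(abs((x - cx) * dy - (y - cy) * dx) for x, y in points)


def grow_segment(chain, seed, tol=0.3):
    """Start from the pair (chain[seed], chain[seed + 1]) and extend the
    index range [lo, hi] in both directions while the line fit stays
    within tol (assumed tolerance, in pixel units)."""
    lo, hi = seed, seed + 1
    while True:
        extended = False
        if hi + 1 < len(chain) and max_deviation(chain[lo:hi + 2]) <= tol:
            hi += 1
            extended = True
        if lo - 1 >= 0 and max_deviation(chain[lo - 1:hi + 1]) <= tol:
            lo -= 1
            extended = True
        if not extended:
            return lo, hi


def vectorize(chain, tol=0.3, rng=None):
    """Pick random seed pairs among the still-unassigned ones and grow a
    segment from each, until every pair of neighbours is covered."""
    rng = rng or random.Random(0)
    free = set(range(len(chain) - 1))  # index i stands for the pair (i, i + 1)
    segments = []
    while free:
        seed = rng.choice(sorted(free))
        lo, hi = grow_segment(chain, seed, tol)
        segments.append((chain[lo], chain[hi]))
        free -= set(range(lo, hi))
    return segments


# Toy skeleton chain: an L-shaped polyline.
chain = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3)]
print(sorted(vectorize(chain)))
```

On this toy chain the sketch recovers the two arms of the L as two segments that meet at the corner point (4, 0). A full method would also need arc fitting and the simplification and unification step listed on the slides.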