IMG2XML
Linking Text and Image with SVG
Hugh Cayless, NYU
Friday, October 9, 2009

Background
  NEH-funded project at UNC Chapel Hill
  Prototype linking of images, transcription, and notes on the diary of a 19th-century UNC undergraduate
  Develop and refine methods for producing vector graphic tracings of manuscript text
  Explore the theoretical background

Papyrus: P. Mich. Inv. 3088

SVG Tracing of Previous

Huitfeldt’s and Sperberg-McQueen’s model of transcription
  A reading is a sequence of typed tokens
  The reader recognizes marks on the support and interprets them as tokens
  [Diagram labels: token; type ‘Α’]
  They aren’t really specific about the resolution of types; words may be OK, for example

The img2xml model
  An SVG tracing of the text consists of Shapes
  Structures (such as lines, words, or letters) are the overlap of one or more Shapes and a bounding Region
  Structures map to elements in a transcription or to annotations

[Diagram labels: SVG, Transcription, Shape (five times), Region, Structure, Line, Word, Letter]

SVG tracing with added Regions
  [Image labels: a Region; Shapes]

“Whence” in the line above is touched by the “W” in “With” and it in turn is touched by the descender in “gaze” above.
The descender in “from” overlaps the outlined Region, but does not interact with it to form a Structure.
The combination of Shapes and a Region constitutes a line Structure.
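
The relationship between Shapes, a Region, and a Structure described above can be sketched in Python. This is only an illustration under stated assumptions, not the project's implementation: the names Box, overlap_fraction, and build_structure, the page.svg identifiers, the coordinates, and the 0.5 threshold are all hypothetical, and bounding boxes stand in for real path geometry. The slides do not say how img2xml decides whether an overlapping Shape belongs to a Structure; the threshold is one possible way to leave out a stray descender that only clips the Region.

```python
from dataclasses import dataclass

ONTOLOGY = "http://philomousos.com/img2xml/ontology/"


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in SVG user units (hypothetical helper)."""
    ident: str
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def overlap_fraction(shape: Box, region: Box) -> float:
    """Fraction of the shape's bounding box that lies inside the region."""
    w = min(shape.x + shape.width, region.x + region.width) - max(shape.x, region.x)
    h = min(shape.y + shape.height, region.y + region.height) - max(shape.y, region.y)
    if w <= 0 or h <= 0 or shape.area == 0:
        return 0.0
    return (w * h) / shape.area


def build_structure(struct_id, region, shapes, threshold=0.5):
    """Collect shapes that mostly fall inside the region and describe the
    resulting Structure as (subject, predicate, object) statements.
    The threshold is an assumption made for this sketch."""
    members = [s for s in shapes if overlap_fraction(s, region) >= threshold]
    triples = [
        (struct_id, "isA", ONTOLOGY + "Structure"),
        (region.ident, "isA", ONTOLOGY + "Region"),
        (region.ident, "memberOf", struct_id),
    ]
    for s in members:
        triples.append((s.ident, "isA", ONTOLOGY + "Shape"))
        triples.append((s.ident, "memberOf", struct_id))
    return members, triples


# Made-up geometry: one word inside the line Region, one descender from the
# line above that only clips the Region, and one unrelated shape.
region = Box("page.svg#region1", 100, 200, 300, 30)
word = Box("page.svg#shape1", 110, 205, 40, 20)
descender = Box("page.svg#shape2", 150, 170, 10, 35)
elsewhere = Box("page.svg#shape3", 500, 500, 10, 10)

members, triples = build_structure("#struct1", region, [word, descender, elsewhere])

for subject, predicate, obj in triples:
    print(f"{subject} | {predicate} | {obj}")
```

With this geometry only the word ends up as a member of #struct1: the descender overlaps the Region but contributes too little of its area to count. A real implementation would more likely test the traced paths themselves, or rely on an editor's judgment, rather than a bounding-box threshold.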

#struct1 | isA | http://philomousos.com/img2xml/ontology/Structure
jld-p010.svg#rect12755 | isA | http://philomousos.com/img2xml/ontology/Region
jld-p010.svg#path10728 | isA | http://philomousos.com/img2xml/ontology/Shape
jld-p010.svg#path10832 | isA | http://philomousos.com/img2xml/ontology/Shape
jld-p010.svg#path10812 | isA | http://philomousos.com/img2xml/ontology/Shape
dusenbery.xml#lb10-8 | transcribes | #struct1
jld-p010.svg#rect12755 | memberOf | #struct1
jld-p010.svg#path10728 | memberOf | #struct1
jld-p010.svg#path10832 | memberOf | #struct1
jld-p010.svg#path10812 | memberOf | #struct1

<path d="m 120.24436,214.02349 c -2.99822,2.99822 -7.0785,7.32647 -7.98022,8.45362 -0.56358,0.69884 -1.33004,1.64564 -1.73581,2.0965 -0.76646,0.87918 -1.35258,2.02887 -1.12715,2.23176 0.22543,0.22543 1.10461,-0.18035 0.99189,-0.45086 -0.11271,-0.24798 1.19478,-2.07396 2.84042,-3.99011 0.27052,-0.31561 0.74392,-0.90172 1.05952,-1.33004 0.90172,-1.17224 6.15424,-6.49238 5.81609,-5.86118 -0.4734,0.87918 -3.08839,4.5086 -4.23808,5.86118 -1.66818,1.98378 -3.83231,4.89183 -3.83231,5.16235 0,0.56357 0.6312,0.13525 2.84042,-2.00633 2.00633,-1.91616 3.02076,-2.63753 3.02076,-2.14159 0,0.0676 -0.40577,0.76647 -0.90172,1.53293 -0.96935,1.48784 -1.12715,2.50227 -0.40577,2.93059 0.40577,0.27051 2.52481,-0.13526 3.60688,-0.67629 0.54103,-0.27052 0.60866,-0.24797 0.74392,0.18034 0.36068,1.1497 2.7277,1.12715 3.83231,-0.0225 0.36068,-0.38323 0.76646,-0.67629 0.87917,-0.67629 0.13526,0 0.92427,-0.56357 1.7809,-1.23987 0.83409,-0.67629 1.51038,-1.1046 1.51038,-0.9468 0,0.13526 -0.3156,0.81155 -0.67629,1.48784 -0.38323,0.67629 -0.67629,1.30749 -0.67629,1.42021 0,0.4734 0.90172,0.11271 1.44275,-0.58612 0.94681,-1.26241 2.07396,-1.82599 3.60688,-1.82599 1.89361,0 1.91616,-0.49594 0.0225,-0.54103 -1.01443,-0.0225 -1.62309,0.0676 -1.98378,0.3156 -0.69883,0.49595 -0.74392,0.45086 -0.65375,-0.67629 0.0902,-0.9468 0.0676,-1.01443 -0.4734,-1.01443 -0.36069,0 -1.05952,0.45086
-2.02887,1.33004 -2.02887,1.82598 -4.12537,3.22365 -4.73403,3.13347 -0.76646,-0.11271 -0.60866,-1.53292 0.29306,-2.61498 0.76646,-0.94681 0.94681,-2.16413 0.29306,-1.91616 -0.67629,0.22543 -1.21732,0.81155 -1.7809,1.89361 -0.56357,1.1497 -1.62309,1.9387 -2.95313,2.23176 -0.87918,0.20289 -1.42021,0.0902 -1.42021,-0.3156 0,-0.13526 0.36069,-0.96935 0.83409,-1.84853 0.49595,-0.9468 0.76646,-1.73581 0.67629,-1.96124 -0.27051,-0.69883 -1.33004,-0.42832 -2.61499,0.6312 -0.58611,0.47341 0.0902,-0.49594 3.02077,-4.41842 1.78089,-2.36702 2.45718,-3.404 2.79533,-4.32826 0.38323,-1.01443 -0.45086,-0.76646 -1.69073,0.49595 z" id="path10812" style="fill:#000000;stroke:none"/>

Advantages
  The SVG tracing gives you hooks in a facsimile of the image that you can hang transcriptions or annotations on.
  It allows the capture of the structure and its relation to the transcription.
  SVG shapes can be manipulated using JavaScript: made transparent, (in)visible, colored, zoomed, etc.
  Some structure-detection tasks are simpler with vector graphics.

Disadvantages
  2-dimensional
  Relies on collapsing the image to a 1-bit (black/white) colorspace.
  Browser support (though Google may recently have fixed that).
  SVG lacks semantics beyond simple geometry.

The Future / Questions
  Prototype completion in winter 2009 (I hope).
  Research into other texts (papyri, Archimedes palimpsest data).
  I’d like to incorporate this into a transcription tool (maybe SoSOL, a papyrus transcription editor under development at UKY).
  Is this useful?
  Is the model sensible?
  Do we need a model at all?

hugh.cayless@nyu.edu
http://github.com/hcayless/img2xml
http://philomousos.com/img2xml/
http://docsouth.unc.edu/dusenbery (not yet)