INLS 520 Information Organization
INLS 520 – Fall 2007 Erik Mitchell

Review
• Metadata models
  – DC, METS
• Metadata Standards
  – Dublin Core / QDC
• Encoding Schemes
  – HTML, XML, MARC…
• Advanced metadata concepts
  – Schemas, application profiles

Today
• Core Skills for Library/IS types
• MARC Overview
  – Encoding
  – Related Standards
  – Exercise
• RDF Introduction (brief)
• Introduction to programming (brief)

Discussion
• Read assigned posting from NGC4LIB discussion group.
• Share in group & think about the following questions:
  – What are the core skills that an information organization professional should have?
  – What is the relationship of information organization to these “core skills”?

Anatomy of a bibliographic record
• Encoding Standards (MARC)
• Content/syntax Standards (AACR)
• Classification Systems (LCSH)

MARC value standards
• Fields & Values
  – Fields, Indicators, Subfields
  – More information from OCLC
• Content and encoding standards
  – AACR2
  – RDA
    • Development started in 2004, slated for release in 2009
    • An enjoyable article on the development of RDA

How to enter a title into a MARC record
– AACR2
  • Transcribe title exactly according to spelling but not necessarily punctuation/capitalization.
  • If an alternative title is present, precede it by a comma following the regular title
  • Use a General Material Designation in brackets []
– MARC Standard
  • Use 245 field – indicates Main title
  • Indicator 2 – Number of non-filing characters (leading articles)
  • Subfield a – main title
  • Subfield b – remainder of title
  • Subfield h – General Material Designation in brackets []

MARC metadata
• Definition
  – MAchine-Readable Cataloging record
  – Combination of content, value, and encoding standard
• History
  – Created by Henriette Avram in 1968
  – Managed by the Library of Congress

MARC metadata
• The encoding standard
  – Variable length record
  – Set leader defines position of fields in record
  – Fixed fields in leader codify format information
  – Variable length fields provide descriptive content
• Examples
  – System ready example record (LC)
  – Uses of MARC fields by OCLC
• More information
  – More information from LC

Encoded MARC record
01802cam 22003371a 45000010018000000030008000180050017000260060019000430070015000620080041000
77015001400118035002100132035001800153040005100171041001300222043002100235
05000170025608200170027324501100029026000660040030000210046650505330048753
30157010206500025011776510033012027000044012357760032012798300048013118560
09201359949001301451ASPS00000161/nwldVaAlASP20061114120112.0m | d | cr |n ---||a|a730321s1955 mnu 000 0 eng aGB56-6680 9(DLC) 55009368 a(OCoLC)585815 aDLCcODaUdOCoLCdMnHidUkdPBfGdDLCdVaAlASP1 aenghnor an-us---ae-no---00aE184.S2bB55 a325.2481097300aLand of their choiceh[electronic resource] :bthe immigrants write home /cedited by Theodore C. Blegen. a[Minneapolis, Minn.] :bUniversity of Minnesota Press,c1955. a463 p. ;c24 cm.0 aThe immigrant image of America -- The "sloopfolk" arrive -- Westward to El-a-noy -- Wisconsin is the place -- The Atlantic crossing -- Scouting the promised land -- Spreading the gospel -- Journeying toward new horizons -- Ordeal and debate -- Appraising the American scene -- The transatlantic gold rush -- Cheerful voices at mid-century -- More than a ballad -- A humorist in Canaan -- A lady grows old in Texas -- In defense of the southwest -- From a frontier parsonage -- The beautiful land -- The glorious new Scandinavia.I0aElectronic reproduction.bAlexandria, VA :cAlexander Street Press,d2002.f(North American women's letters and diaries).nAvailable via World Wide Web. 0aNorwegian Americans. 0aUnited StatesxCivilization.1 aBlegen, Theodore Christian,d1891-1969.1 cOriginalw(DLC) 55009368 0aNorth American women's letters and diaries.40zAccess restricted to subscribers.uhttp://www.aspresolver.com/aspresolver.asp?NWLD;S16101aER_NAWLD

Text formatted MARC
=LDR 01802cam 22003371a 4500
=001 ASPS00000161/nwld
=003 VaAlASP
=005 20061114120112.0
=006 m\\\\|\\\d\|\\\\\\
=007 cr\|n\---||a|a
=008 730321s1955\\\\mnu\\\\\\\\\\\000\0\eng\\
=015 \\$aGB56-6680
=035 \\$9(DLC) 55009368
=035 \\$a(OCoLC)585815
=040 \\$aDLC$cODaU$dOCoLC$dMnHi$dUk$dPBfG$dDLC$dVaAlASP
=041 1\$aeng$hnor
=043 \\$an-us---$ae-no---
=050 00$aE184.S2$bB55
=082 \\$a325.24810973
=245 00$aLand of their choice$h[electronic resource] :$bthe immigrants write home /$cedited by Theodore C. Blegen.
=260 \\$a[Minneapolis, Minn.] :$bUniversity of Minnesota Press,$c1955.
=300 \\$a463 p. ;$c24 cm.
=505 0\$aThe immigrant image of America -- The "sloopfolk" arrive -- Westward to El-a-noy -- Wisconsin is the place -- The Atlantic crossing -- Scouting the promised land -- Spreading the gospel -- Journeying toward new horizons -- Ordeal and debate -- Appraising the American scene -- The transatlantic gold rush -- Cheerful voices at mid-century -- More than a ballad -- A humorist in Canaan -- A lady grows old in Texas -- In defense of the southwest -- From a frontier parsonage -- The beautiful land -- The glorious new Scandinavia.
=533 I0$aElectronic reproduction.$bAlexandria, VA :$cAlexander Street Press,$d2002.$f(North American women's letters and diaries).$nAvailable via World Wide Web.
=650 \0$aNorwegian Americans.
=651 \0$aUnited States$xCivilization.
=700 1\$aBlegen, Theodore Christian,$d1891-1969.
=776 1\$cOriginal$w(DLC) 55009368
=830 \0$aNorth American women's letters and diaries.
=856 40$zAccess restricted to subscribers.$uhttp://www.aspresolver.com/aspresolver.asp?NWLD;S161
=949 01$aER_NAWLD

MARC variable fields
• 245 14 $a The MARC record: $b revealed and detailed
  – Field tag: 245
  – Indicators: 14
  – Subfield: $a, $b
  – Contents

MARC leader
http://www.oclc.org/support/documentation/worldcat/records/subscription/1/1.pdf

MARC fields (1)
• 001-007   Leader/fixed fields
• 010-035   Identifying numbers
• 050-099   Call Numbers
• 100-130   Names
• 210-247   Title
• 250-270   Edition, imprint, etc.
• 300-362   Physical, publication info.

MARC fields (2)
• 500-599   Notes & contextual info.
• 600-699   Subject headings, names
• 700-799   Added entries
• 800-830   Series added entries
• 856       Electronic access
• 900-999   Local information

Example MARC fields (1)
=LDR 01802cam 22003371a 4500
=001 ASPS00000161/nwld
=003 VaAlASP
=005 20061114120112.0
=006 m\\\\|\\\d\|\\\\\\
=007 cr\|n\---||a|a
=008 730321s1955\\\\mnu\\\\\\\\\\\000\0\eng\\
=015 \\$aGB56-6680
=035 \\$9(DLC) 55009368
=035 \\$a(OCoLC)585815

MARC leader (006)
Position  Field                                        Value
00-04     Logical Record Length                        01802
05        RecStat (Record Status)                      c
06        Type (type of record)                        a
07        BLvl (Bibliographic level)                   m
08        Ctrl (type of control)                       \
09        Character Coding Scheme
10        Indicator Count
11        Subfield Code Count
12-16     Base Address of data
17        ELvl (Encoding Level)                        1
18        Desc (Descriptive catalog form AACR2/ISBD)   a
19        Linked Record Requirement
20        Length of Len-of-field
21        Length of starting character
22        Transaction type code in hex
23        Undf

008 Field (Leader – 2)
Position  Field                                    Value
00–05     Entered (Date added to WorldCat)         730321
06        DtSt (Date Type)                         s
07–10     Dates (Date 1)                           1955
11–14     Dates (Date 2)                           \\\\
15–17     Ctry (Required if avail.)                mnu
18–34     Format specific                          (See Summary of 008 and 006 Field Bytes.)
18        Illustrations                            acde
22        Audience                                 e
23        Form                                     r
24        Nature of Contents                       bcde
28        Gpub (Government Publication)            \
29        Conf (Conference Publication)            0
30        Fest (Festschrift)                       0
31        Indx (does the resource have an index)   1
33        LitF (literary form)                     m
34        Biog (Is the work biographical)          \
35–37     Lang (Mandatory)                         eng
38        MRec (Modified Record)                   \
39        Srce (Mandatory) (Cataloging source)     \

Example MARC fields (2)
=050 00$aE184.S2$bB55
=082 \\$a325.24810973
=245 00$aLand of their choice$h[electronic resource] :$bthe immigrants write home /$cedited by Theodore C. Blegen.
=260 \\$a[Minneapolis, Minn.] :$bUniversity of Minnesota Press,$c1955.
=300 \\$a463 p. ;$c24 cm.
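
The text-formatted lines above all follow one pattern: an equals sign, a three-character tag, two indicator characters (a backslash standing for a blank), then subfields introduced by `$`. The Python sketch below splits one such line into those parts. It is an illustration only, not part of MarcEdit: the function name `parse_mnemonic_field` is hypothetical, and it assumes that `$` never appears inside subfield data and that blank indicators are always written as backslashes.

```python
def parse_mnemonic_field(line):
    """Split one text-formatted MARC line into tag, indicators, and subfields.

    Assumptions (not from the slides): '$' only ever starts a subfield,
    and blank indicators are written as backslashes.
    """
    line = line.strip()
    if not line.startswith("=") or len(line) < 4:
        raise ValueError("expected a line such as '=245 00$aTitle'")
    tag = line[1:4]
    rest = line[4:].lstrip()
    # The leader and control fields (001-009) carry a single value,
    # with no indicators or subfields.
    if tag == "LDR" or tag < "010":
        return {"tag": tag, "indicators": None, "subfields": [], "value": rest}
    # Two indicator characters; a backslash stands for a blank.
    indicators = rest[:2].replace("\\", " ")
    # Everything after the indicators is a run of $<code><data> chunks.
    subfields = [(chunk[0], chunk[1:])
                 for chunk in rest[2:].split("$")[1:] if chunk]
    return {"tag": tag, "indicators": indicators,
            "subfields": subfields, "value": None}


title_field = parse_mnemonic_field(
    "=245 00$aLand of their choice$h[electronic resource] :"
    "$bthe immigrants write home /$cedited by Theodore C. Blegen."
)
for code, data in title_field["subfields"]:
    print(title_field["tag"], code, data)
```

Returning subfields as an ordered list of pairs, rather than a dictionary, keeps repeated subfield codes (such as the several `$d` entries in field 040) and preserves their order.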

Example MARC fields (3)
=505 0\$aExtracted notes fields.
=650 \0$aNorwegian Americans.
=651 \0$aUnited States$xCivilization.
=700 1\$aBlegen, Theodore Christian,$d1891
=830 \0$aNorth American women's letters
=856 40$zAccess restricted to subscribers.$uhttp://www.aspresolver.com/aspresolver.asp?NWLD;S161
=949 01$aER_NAWLD

MARC Exercises
• Introduction to MARCEdit
  – If you can’t use MARCEdit – use a text editor & follow this standard:
    • =245 04 $a content $b more content
  – Tour of the application
  – Exercise 1 – create a MARC record
  – Exercise 2 – decompile/compile MARC records, batch edit

FRBR Model
http://fictionfinder.oclc.org/
http://worldcat.org
http://www.frbr.org
http://www.ifla.org/

FRBR background
• Work/item
  – C.A. Cutter (1890)
    • Notion of a work
  – S. R. Ranganathan (1930-late 1960)
    • Intellectual entity – expressed thought
    • Physical entity – embodies thought
  – P. Wilson
    • Intellectual entity – work
      – Subject metadata
    • Physical entity – item
      – Selected descriptive metadata
Adapted from Jane Greenberg

FRBR components
• Work – distinct intellectual or artistic creation
• Expression – intellectual or artistic realization of a work
• Manifestation – physical embodiment of an expression of a work
• Item – a single exemplar of a manifestation
Adapted from Jane Greenberg

FRBR Example
• Rolling Stones’ IT'S ONLY ROCK-N-ROLL (1974) (work)
  – Group’s performance recorded for the album (Expression)
    • Recording released in 1974 by MCA Records on tape cassette (Manifestation)
    • Recording released in 1974 by MCA Records on compact disc (Manifestation)
    • Sheet music released in 1992 (?)
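
The four FRBR entities nest: a work has expressions, an expression has manifestations, and a manifestation has items. The sketch below models that nesting with Python dataclasses. It is one possible representation, not something prescribed by FRBR or by the slides: the class and field names are illustrative, the labels reuse the slide's example, and the two item copies are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    label: str  # a single exemplar, e.g. one library copy


@dataclass
class Manifestation:
    label: str  # physical embodiment of an expression
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    label: str  # realization of a work
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str  # distinct intellectual or artistic creation
    expressions: List[Expression] = field(default_factory=list)


def count_items(work):
    """Count every item held under any manifestation of the work."""
    return sum(len(m.items)
               for e in work.expressions
               for m in e.manifestations)


# Labels follow the slide's example; the two copies are hypothetical.
cassette = Manifestation("1974 tape cassette",
                         items=[Item("copy 1"), Item("copy 2")])
performance = Expression("Group's performance recorded for the album",
                         manifestations=[cassette])
work = Work("It's Only Rock-n-Roll", expressions=[performance])

print(work.title, "has", count_items(work), "items")
```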
(FRBR Example adapted from Jane Greenberg)

FRBR diagram
• Work: the Performance (1974)
• Expressions (E)
  – E: Music and lyrics
  – E: Music (just the instruments)
• Manifestations (M)
  – M: RS, LP 1974
  – M: 8-track, RCA, 1975
  – M: CD, RCA, 2005
• Items (I)
  – I: Your CD, RCA, 2005 c.1
  – I: My CD, RCA, 2005 c.2
  – I: UNC Musllib. CD, RCA, 2005 c.3
Adapted from Jane Greenberg

FRBR Algorithm (1)
• Process
  – Extract Author
    • Construct Authority author entry from 100, 400 using subfields and 008 data to limit
  – Extract Title
    • Construct Authority title entry from 130, 240, 245, etc. Normalize using NACO
  – Combine these two authorities to create a unique Work identifier
    • <author>Mitchell, Margaret</author><title>Gone with the wind</title>

FRBR Algorithm (2)
• Results from a sample extraction (From FRBR doc)
  – <author>/<title>                  (75.97%)
  – <uniform title>                   (1.34%)
  – /<title>/[one or more <name>]     (17.35%)
  – /<title>/<control number>         (5.34%)
• http://www.oclc.org/research/software/frbr/frbr_workset_algorithm.pdf

Warwick Framework
• Origins / Definition
  – Beginnings: Came out of DC discussions in 1995/6
  – Goal: to promote interoperability, define context of the DC metadata, come up with a way of ‘contextualizing’ DC description
  – Definition: A general model that describes the various parts of a complex object, including the various categories of metadata. http://www.cs.cornell.edu/wya/DigLib/MS1999/glossary.html
• Components
  – Container
  – Package
    • Metadata set
    • Indirect link
    • Another container

Resource Description Framework
• Origins
  – PICS (Platform for Internet Content Selection)
  – Warwick framework
  – Initial goal was to code metadata for the web
• Definition:
  – A data model
  – A set of “statements” about a “resource”
  – RDF Triple: Description = Resource + Property type + Value

RDF Example
• A resource is a uniquely identifiable thing (URI)
• Properties are given context (Property Type)
From Miller, 1998

RDF Model
Resource (Subject):         Webpage: http://ils.unc.edu
Property type (Predicate):  Author
Value (Object):             “Abe Crystal”
• “The author of the SILS Webpage is Abe Crystal”
• http://ils.unc.edu has a creator with name Abe Crystal
  – A literal, a triple, a statement
From Greenberg

How is RDF different?
• RDF is a descriptive model that
  – Allows variable contextualized description
  – Deconstructs the descriptive process
  – Allows more granular automated processing of data
  – Uses exact markup to indicate the context of values (namespaces, schemas)
• A simple Example

Encoding RDF in XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://purl.org/dc/elements/1.1/">
    <dc:title>The Hang: The Island of Black Jeans</dc:title>
    <dc:creator>SAKI KNAFO</dc:creator>
    <dc:identifier>http://www.stuff.com</dc:identifier>
    <dc:date>Sun, 16 Sep 2007 01:04:40 GMT</dc:date>
    <dc:description>descriptive content</dc:description>
  </rdf:Description>
</rdf:RDF>

Iterative RDF description
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:vcard="http://dli.grainger.uiuc.edu/publications/metadatacasestudy/dc_schemas/vcard.xsd"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://purl.org/dc/elements/1.1/">
    <dc:title>The Hang: The Island of Black Jeans</dc:title>
    <dc:creator rdf:resource="#Creator_001"/>
    <dc:identifier>http://www.stuff.com</dc:identifier>
    <dc:date>Sun, 16 Sep 2007 01:04:40 GMT</dc:date>
    <dc:description>descriptive content</dc:description>
  </rdf:Description>
  <rdf:Description rdf:ID="Creator_001"
                   rdf:about="http://dli.grainger.uiuc.edu/publications/metadatacasestudy/dc_,,,">
    <vcard:given>Saki</vcard:given>
    <vcard:family>Knafo</vcard:family>
    <vcard:email>
      <vcard:userid>knafo@www.nytimes.com</vcard:userid>
    </vcard:email>
  </rdf:Description>
</rdf:RDF>

DC in RDF
• Expressing Simple Dublin Core in RDF/XML (Beckett, et al., 2002)
  - http://dublincore.org/documents/dcmes-xml/
  - Note: you cannot do qualification with this recommendation.
• Expressing Qualified Dublin Core in RDF / XML (Kokkelink & Schwänzl, 2002)
  - http://dublincore.org/documents/2002/04/14/dcqrdf-xml/

Programming 101
• What is a program?
• What concepts do we need to understand?
• Is XSL a programming language?

Programming 101
• Definition:
  – “the act of creating software or some other set of instructions for a computer.” [1]
• Examples
  – Dynamic web sites
  – Compiled applications (like Firefox)
  – Small applications that perform a specific task (such as transform metadata)

Definitions
• Programming Language
  – “A formal language used to write instructions that can be translated into machine language and then executed by a computer.” (definitions)
• Scripting Language
  – Run-time (does not require compilation)
  – Restricted context (requires a specific environment)
  – Functional / Object oriented

Definitions
• Compiler / Interpreter
  – A program that builds and executes a program.
    Compilers create a self-executable file; interpreters read a text script at run-time.

Programming approaches
• Logical/structural programming
  – Stream of consciousness
  – Starts at line 1
• Procedural programming
  – Uses functions, sub-functions, subroutines
  – Encapsulation, modularization
• Object-oriented programming
  – Further encapsulation
  – Uses concepts of inheritance, modularity

Flow of Document Models
What is the relationship of the data model to the intended document use in the four following document examples?
• Structural: Data model and logic are intertwined
  – Examples: Text documents, Simple programs
• Procedural: Data model is encoded in standard
  – Examples: HTML document, Functional program
• Procedural+: Data model has no implicit use definition
  – Examples: XML document, Re-usable function
• Objects: Data model independent of use
  – Examples: RDF document, OOP

The programming process
• Analyze the problem
  – What do you want your program to do?
  – What are your users expecting, what data do you have?
• Plan program flow/logic
  – What steps need to occur, in what order?
  – Useful tools include Step-Form, flowcharts, pseudocode
• Code the program
  – Create variables, routines, functions
• Compile/run the program
• Test, verify
• Release and

Programming 101 - Concepts
• General structure
  – Programs have a ‘flow’ to them
  – Programs use functions, algorithms, and objects to compartmentalize operations
  – Programs follow a specific syntax (their own document model)
  – Programs operate in specific environments (compiled platforms, run-time platforms)

Programming 101 – Concepts
• Control Structures
  – Looping (while)
  – Decision making (if)
• Variables
  – Store information for use/reuse
  – A simple variable is name=value

Programming 101 - XSL
• Is XSL programming?
• What can we use XSL for?
• Why are we covering it here?
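
The three ideas from Programming 101 – Concepts (variables, looping with while, decision making with if) can be seen together in a few lines of Python. As a cataloging-flavored illustration, the sketch counts the non-filing characters that would go in the second indicator of a 245 field. This is a simplified assumption, not the AACR2 or MARC rule set: the function name and the three-word English article list are made up for the example.

```python
# A variable is simply name = value.
LEADING_ARTICLES = ("a", "an", "the")  # assumption: English articles only


def nonfiling_characters(title):
    """Return the number of leading characters to skip when filing a title."""
    first_word = title.split(" ", 1)[0]
    # Decision making (if): only count an article that is followed by more text.
    if " " in title and first_word.lower() in LEADING_ARTICLES:
        return len(first_word) + 1  # the article plus the space after it
    return 0


titles = ["The MARC record", "Land of their choice", "A humorist in Canaan"]
results = []

# Looping (while): repeat until every title has been processed.
index = 0
while index < len(titles):
    title = titles[index]
    results.append((title, nonfiling_characters(title)))
    index += 1

for title, count in results:
    print(count, title)
```

For "The MARC record" the function returns 4, which agrees with the second indicator in the earlier example `245 14 $a The MARC record:`.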

XSL Overview
• Extensible Stylesheet Language
• Components
  – Defined XML standard which is used in conjunction with a transformation engine to transform XML data
  – XQuery/XPath
• Capabilities, limitations
  – Document processing
  – Semi-functional programming language

XSL Introduction
• Styling
  – XSL - eXtensible Stylesheet Language
• Querying
  – XPath
  – XQuery
  – XPointer
  – XLink
• Good resources for reference
  – http://www.w3schools.com/xsl/default.asp
  – http://www.w3.org/Style/XSL/
  – http://www.w3schools.com/css/default.asp
  – http://www.csstutorial.net/

XSL Overview - 1
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/dc">
    Processing Instructions
  </xsl:template>
</xsl:stylesheet>

Contents of <xsl:template...>
<html>
  <head>
    <title>Sample XSL transformation</title>
  </head>
  <body>
    <xsl:for-each select="*">
      <p>
        <b>
          <xsl:value-of select="name(.)"/>
          <xsl:text>:</xsl:text>
        </b>
        <xsl:value-of select="./text()"/>
      </p>
    </xsl:for-each>
  </body>
</html>

XSL – Sample Stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/rss">
    <html>
      <body>
        <xsl:for-each select="./channel/item">
          <xsl:value-of select="title"/><br/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

XSL Control Structures
• For Each
  – <xsl:for-each select="/date"></xsl:for-each>
• Choosing between options
  – <xsl:choose>
      <xsl:when test="contains(/URL, '.edu')">
      </xsl:when>
    </xsl:choose>
• If
  – <xsl:if test="./title != ''"> </xsl:if>

XSL Templates
• Templates work like functions
• Defining a template
  – <xsl:template name="myName">
      <xsl:for-each…..>
      </xsl:for-each>
    </xsl:template>
• Calling a template
  – <xsl:call-template name="myName"/>

XSL Variables
• Variables store values for later use
  – In XSL variables are somewhat limited due to the processing relationship to the XML DOM (a variable cannot be reassigned once it is defined)
• Defining a Variable
  – <xsl:variable name="myVariable">value here</xsl:variable>
• Using a Variable
  – <xsl:value-of select="$myVariable"/>

XPath
• A DOM-style syntax that allows us to access elements in an XML file
• Examples
  – /dublinCore/title – Access the title of a DC record
  – /dublinCore/subject/@attribute – Access an attribute of the subject element
  – /dublinCore/

XPath (2)
• XPath functions
  – contains(//item/title, 'England')
  – substring-before(string1, string2), substring-after(string1, string2)
• XPath selectors
  – //elementname – finds an element anywhere in the DOM
  – ./ – from the current context
  – / – from the root context
  – * – wildcard match

XSL – Sample Stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/dc">
    <html>
      <head>
        <title>Sample XML File</title>
      </head>
      <body>
        <xsl:for-each select="*">
          <p><b><xsl:value-of select="name(.)"/>: </b><xsl:text> </xsl:text><xsl:value-of select="./text()"/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

User generated Metadata
• Based on our work with metadata so far – is this something a general ‘user’ could do?
• What system features would help/hurt user-generated metadata?
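
For practice outside an XSL processor, the path selectors from the XPath slides can be tried with Python's standard `xml.etree.ElementTree` module, which supports a limited subset of XPath. The sketch below mirrors the `./channel/item` loop of the RSS sample stylesheet. The RSS snippet is made-up sample data, and because ElementTree has no `contains()` function, that test is written as an ordinary Python `in` check.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the RSS feed the stylesheet expects.
RSS = """<rss><channel>
  <item><title>England in autumn</title></item>
  <item><title>Norwegian Americans</title></item>
</channel></rss>"""

root = ET.fromstring(RSS)  # root is the <rss> element

# ./channel/item : path from the current context, as in the xsl:for-each.
titles = [item.findtext("title") for item in root.findall("./channel/item")]

# //title : any title element anywhere below the root.
all_titles = [element.text for element in root.iter("title")]

# * : wildcard match on the children of <channel>.
child_tags = [child.tag for child in root.find("channel")]

# contains(//item/title, 'England') expressed in plain Python.
england_titles = [title for title in titles if "England" in title]

print(titles)
print(child_tags)
print(england_titles)
```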