Towards a Social Network of History

Ray R. Larson, School of Information, UC Berkeley
Daniel Pitti, University of Virginia, Institute for Advanced Technology in the Humanities
Yiming Liu, School of Information, UC Berkeley
Brian Tingle, California Digital Library
Adrian Turner, California Digital Library
Rachael Hu, California Digital Library

PNC 2013 – Kyoto, Japan
http://socialarchive.iath.virginia.edu

[Figure: name cloud of personal and family name headings around the label "Archival Name Authority System". Well-known headings include Hamilton, Alexander, 1757-1804; Luce, Clare Boothe, 1903-1987; Patton, George S. (George Smith), 1885-1945; Oppenheimer, J. Robert, 1904-1967; Sontag, Susan, 1933-2004; Washington, George, 1732-1799; Whitman, Walt, 1819-1892; Franklin, Benjamin, 1706-1790; Fuller, R. Buckminster (Richard Buckminster), 1895-1983; Wright, Lloyd, 1890-1978; and Patton family. They appear among many lesser-known headings such as Engelland, Jurgen (George); Enwall, Ogie (Aage); Hytmo, Guri Olsdatter; Gillam, Chandler B., 1833-1899; and Rodney family.]
Background
• Research and demonstration project
• Multi-year funding
• National Endowment for the Humanities (2010-2012)
• Andrew W. Mellon Foundation (2012-2014)
• Planning Project for Cooperative Service (2014-15, pending)

Objectives
1. Develop tools for extracting EAC-CPF records, drawing on existing data (EAD finding aids, MARC records)
2. Match, merge, and enhance; build a large test corpus of EAC-CPF records
3. Create a prototype biographical resource and access system, using those records

Project Team
• University of Virginia, Institute for Advanced Technology in the Humanities
  – Daniel Pitti (PI) and Worthy Martin
• UC Berkeley School of Information
  – Ray Larson and Yiming Liu
• California Digital Library
  – Rachael Hu, Brian Tingle, and Adrian Turner

Project Team
• Terry Catapano (Columbia University)
• Sara Sprenkle (Washington and Lee University)
• Sarah Wells (University of Virginia)
• Kathy Wisser (Simmons Graduate School of Library and Information Science)
• Tom Lynch (University of Illinois School of Library and Information Science)

EAC-CPF (Encoded Archival Context: Corporate Bodies, Persons, and Families)
• XML-based data structure standard for encoding archival authority records
• Authorized name headings for the entity
• Biographical/historical context for the entity
• Links to resources created by the entity
• Links to resources about the entity

Example EAD - Creator
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!-- Transformed with v1v2002_4.xsl -->
    <!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN"
      "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd" [
      <!ENTITY lcseal SYSTEM "http://lcweb2.loc.gov/xmlcommon/lcseal.jpg" NDATA jpeg>
    ]>
    <ead>
      <eadheader repositoryencoding="iso15511" … >
        <eadid mainagencycode="dlc" countrycode="us" …>http://hdl.loc.gov/loc.mss/eadmss.ms003073</eadid>
        <filedesc>
          <titlestmt>
            <titleproper encodinganalog="245$a">Clement F. Haynsworth Papers</titleproper>
    …
    <unitid label="ID No." encodinganalog="590" countrycode="US" …>MSS79781</unitid>
    <origination label="Creator">
      <persname source="lcnaf" encodinganalog="100">Haynsworth, Clement F. (Clement Furman), 1912-1989</persname>
    </origination>

Example EAD - bioghist
    …
    <bioghist encodinganalog="545">
      <head>Biographical Note</head>
      <chronlist>
        <listhead>
          <head01>Date</head01>
          <head02>Event</head02>
        </listhead>
        <chronitem>
          <date>1912, Oct. 30</date>
          <event>Born, Greenville, S.C.</event>
        </chronitem>
        <chronitem>
          <date>1933</date>
          <event>A.B., Furman University, Greenville, S.C.</event>
        </chronitem>
        <chronitem>
          <date>1936</date>
          <event>LL.B., Harvard University, Cambridge, Mass.</event>
        </chronitem>
    …

[Figure: diagram with four boxes labeled "Title" and the names John Brennan, George Jones, Frederick Jones, Martha Jones, and Thomas Smith]

Example EAD - controlaccess
    …
    </note>
    <controlaccess>
      <head>People</head>
      <persname encodinganalog="600" role="subject" source="lcnaf"
        altrender=":::PWEBRECON=^Barzun%2C+Jacques%2C+1907+Correspondence.^">Barzun, Jacques, 1907- --Correspondence.</persname>
      <persname encodinganalog="600" role="subject" source="lcnaf"
        altrender=":::PWEBRECON=^Brennan%2C+William+J.+%28William+Joseph%29%2C+1906-1997+Correspondence.^">Brennan, William J. (William Joseph), 1906-1997--Correspondence.</persname>
      <persname encodinganalog="600" role="subject" source="lcnaf"
        altrender=":::PWEBRECON=^Burger%2C+Warren+E.%2C+1907-1995+Correspondence.^">Burger, Warren E., 1907-1995--Correspondence.</persname>
      <persname encodinganalog="600" role="subject" source="lcnaf"
        altrender=":::PWEBRECON=^Clark%2C+Tom+C.+%28Tom+Campbell%29%2C+1899-1977+Correspondence.^">Clark, Tom C. (Tom Campbell), 1899-1977--Correspondence.</persname>
    …

Example EAD - scopecontent
    …
    The most significant and frequent of Haynsworth's correspondents are Jacques Barzun, William J. Brennan, Warren E. Burger, Tom C. Clark, John Paul Frank, Ernest F. Hollings, Edward Moore Kennedy, J. Woodrow Lewis, Daniel John Meador, Arthur Raphael Miller, Richard M. Nixon, Lewis F. Powell, Jr., Strom Thurmond, Johnnie McKeiver Walters, Bernard J. Ward, and Charles Alan Wright.</p>
    </scopecontent>
    …

Example EAD - unittitle
    <c04 level="file">
      <did>
        <unitid>No. 7383 </unitid>
        <unittitle encodinganalog="245$a">Long Mfg. Co. v. Holliday</unittitle>
      </did>
    </c04>
    …
    <c04 level="file">
      <did>
        <unitid>No. 7416 </unitid>
        <unittitle encodinganalog="245$a">Norfolk and Portsmouth Belt Line R.R. v. Brotherhood of R.R. Trainmen, Lodge No. 514</unittitle>
      </did>
    </c04>
    …
    <c03 level="file">
      <did>
        <container type="box">201</container>
        <unittitle encodinganalog="245$a">Wright, Charles Alan, 1970-1989 </unittitle>
        <physdesc>
          <extent encodinganalog="300">(10 folders)</extent>

Data Sources
• EAD finding aids [~150,000]
  – 13 regional and statewide consortia
  – 35 repositories in US, UK, and France; multiple US federal agencies
• MARC21 records [~4.5 million]
  – OCLC WorldCat
• Authority records
  – OCLC Research: Virtual International Authority File (VIAF) [~16 million]
  – Getty Vocabulary Program: Union List of Artist Names (ULAN) [~120,000]
  – Additional EAC-CPF (or other) name records from Archives nationales de France, British Library, NARA, New York State Archives, and Smithsonian Institution Archives

Consortia
• Archives Florida
• ArchivesHub (UK)
• Arizona Archives Online
• Points (OhioLink)
• EAD FACTORY
• Five Colleges
• Maine Archival Collections Online (MACON)
• Northwest Digital Archives (NWDA)
• Online Archive of California
• Philadelphia Area Consortium of Special Collections Libraries (PACSCL)
• Rhode Island Archival & Manuscript Collections Online (RIAMCO)
• Rocky Mountain Online Archive (RMOA)
• Texas Archival Resources Online (TARO)
• Virginia Heritage

Individual institutions
• American Philosophical Society
• Archives nationales (France)
• Archives of American Art
• Bibliothèque nationale de France
• BnF Archives et manuscripts
• French Union Catalog
• Brigham Young University
• Church of Latter Day Saints Archives
• Columbia University
• Cornell University
• Duke University
• Harvard University
• Indiana University
• Library of Congress (publicly available without restriction)
• Minnesota Historical Society
• Massachusetts Institute of Technology
• National Library of Medicine
• New York Public Library
• New York University
• North Carolina State
• Northwestern University
• Princeton University
• Rutgers University
• Smithsonian Institution Archives
• Syracuse University
• University of Alabama
• University of Chicago
• University of Connecticut
• University of Delaware
• University of Florida
• University of Illinois
• University of Kansas
• University of Maryland
• University of Michigan Bentley & Special Collections
• University of Minnesota
• University of Nebraska
• University of North Carolina, Chapel Hill
• University of Utah
• Utah State Archives
• Utah State University
• Yale University

Methods and Processing
• Extract EAC-CPF records from existing EAD-encoded archival descriptions
  – Extracting both creators and referenced CPF names
• Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF)
  – Enhance EAC-CPF by normalizing entries, adding alternative entries, titles (VIAF), and historical data (ULAN)
• Create a prototype historical resource and access system
  – Historical data and social-professional networks
  – Links to archive, library, and museum resources (by and about)

The Problem
• Proliferation of the forms of names
  – Different names for the same person
  – Different people with the same names
• Examples
  – from Books in Print (semi-controlled but not consistent)
  – ERIC author index (not controlled)

[Example slides: Goethe …etc…; John Muir]

Library and Archive Authority Control
• Library (or bibliographic) authority control is almost exclusively about the control of names
• Archival authority control involves biographical-historical description of the CPF entity
  – Descriptions based on controlled vocabularies, for example, occupations, place of birth and death
  – But also biographical-historical description
    • Prose
    • Chronological list
• Archival authority control provides context for understanding records, the context of their creation, the provenance

Matching and Merging in SNAC 2
• Developing an updateable database of merged EAC data (dumping Mongo for PostgreSQL)
  – Will permit incremental addition of new data and support editing and “forced” merges
• All original records and merged records will be in the database
• Permanent identifiers will be assigned to merged (and unmerged) EAC output records
  – Track these in the database

Merging EAC-CPF Records
[Diagram labels: EAC Record Input; Connect exactly matching records; Connect records using name authority information; Cheshire Search; LCNAF Repository; ULAN Repository; Postgres Merge; Merged EAC Records Output]

Merge System Step 1: Load Original Records
    CREATE TABLE original_records (
        id bigint NOT NULL,
        name character varying(255) DEFAULT ''::character varying NOT NULL,
        source_id character varying(255),
        collection_id character varying(64),
        path character varying(255),
        r_type character varying(64) NOT NULL,
        from_date date,
        from_date_type character varying(64),
        to_date date,
        to_date_type character varying(64),
        processed boolean DEFAULT false NOT NULL,
        last_processed timestamp without time zone,
        record_data text,
        created_at timestamp without time zone NOT NULL,
        updated_at timestamp without time zone NOT NULL,
        record_group_id bigint
    );
• Parse source EAC
  – Key attributes extracted for merge use
  – Original XML stored
• Timestamp for last merge run on record
  – Resumption of aborted merge runs or reruns

But…
• Exact merging assumes that archives are following LC cataloging practice in their EAD records
  – There are some problems with this assumption

Some failures for merging…
• Different abbreviations:
  – A. & G. Carisch & C.
  – A. & G. Carisch & Co.
• And spacing issues:
  – A. C. Peters & Bro.
  – A. C. Peters & Brother.
  – A. C. Peters. (??)
  – A. C.Peters & Bro.
• Completeness and alternate rules
  – Tabb, John B. (John Banister), 1845-1909.
  – Tabb, John Banister, 1845-1909.
• Also differing transliterations for non-Latin scripts

More…
• Variant romanizations (and spacing):
  – M. P. Belaieff.
  – M. P. Belaïeff.
  – M. P. Bieliaev.
  – M.P. Belaïeff.
  – M.P.Belaïeff.
• Initials vs. names:
  – Zabolotskii, N.A.
  – Zabolotskii, Nikolai Alekseevich, 1903-1958.
  – Zabolotskii.
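Some of the exact-match failures listed above (spacing, punctuation, diacritics, a few abbreviations) could be reduced by comparing normalized keys instead of raw headings. The following is only a minimal sketch under stated assumptions, not the SNAC merge code: the function name normalize_name and the ABBREVIATIONS table are hypothetical, and the table would need to be curated from real data.

```python
import re
import unicodedata

# Hypothetical abbreviation table (an assumption for illustration only).
ABBREVIATIONS = {"bro": "brother", "co": "company"}


def normalize_name(name: str) -> str:
    """Return a comparison key for a name heading (illustrative only)."""
    # Decompose accented characters and drop the combining marks (ï -> i).
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat every run of non-alphanumeric characters as a single separator.
    tokens = re.split(r"[^0-9a-z]+", stripped.lower())
    # Expand known abbreviations and drop empty tokens.
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens if tok)


if __name__ == "__main__":
    for heading in ("A. C. Peters & Bro.", "A. C.Peters & Brother.",
                    "M. P. Belaieff.", "M.P.Belaïeff.", "M. P. Bieliaev."):
        print(heading, "->", normalize_name(heading))
```

With this key the two "Peters & Bro./Brother" forms and the Belaieff/Belaïeff spacing variants collapse together, while the romanization variant Bieliaev, the "C." versus "Co." pair, and the fuller and shorter Tabb headings still differ. Those cases are where authority-file lookup or fuzzy matching would still be needed.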
Search Authority Files
• For each name, formulate a search of the VIAF database using the Cheshire system (SGML/XML retrieval system with probabilistic and Boolean matching)
  – Search both the “authoritative” and “non-authoritative” forms
  – Consider any name matching a non-authoritative form to be a candidate match for the authoritative form
  – Flag EAC records that match the same authority record as potential matches

NGRAM or Shingle Matching
Name: Einstein Albert
Shingle sequence: ein, ins, nst, ste, tei, ein, …, ert
Probability that the sequence (ins, nst, ste) follows ein is very high for the name einstein
Shingle Language Model for names: Krishna Janakiraman and Sean Marimpietri - Biograph

Name 1: Einstein Albert
Name 2: Ainshtain Albert
Name 3: Albert Einstein
[Figure: shingle graphs for the three names, with nodes such as ein, ins, nst, ste, tei, alb, lbe, ert, rte, ain, nsh, sht, hta, tai]
Shingle Language Model for names: Krishna Janakiraman and Sean Marimpietri - Biograph

Merge System Step 2: Record Matches
    CREATE TABLE record_groups (
        id bigint NOT NULL,
        name character varying(255) DEFAULT ''::character varying NOT NULL,
        g_type character varying(64) NOT NULL,
        viaf_record text,
        ulan_record text,
        is_valid boolean,
        invalidated_by bigint,
        created_at timestamp without time zone NOT NULL,
        updated_at timestamp without time zone NOT NULL
    );
[Diagram: each Original Record belongs to a Record Group; a Record Group has many Original Records]
• Execute merge algorithm and create record groups
  – Pointers from original records to record groups
  – Can be invalidated
• Matched authority record stored for reference

Merge Flagged Records
• For all of the exact matches and authority matches
  – Use the authoritative form of the name
  – Combine data from each match into a single EAC-CPF record
  – Retain all source record IDs and information
• Finally, output the merged EAC-CPF records
  – Actually, store how to build the merged record in the database as well
    • Records can be regenerated as needed from the merge data
  – Assign permanent identifier for the merged record

Merge System Step 3: Create Output
• Using valid record groups:
  – Generate merged EAC
  – Assign permanent ARK ID
  – Write to new EAC file
• Merged XML stored in db, referenced by record group
  – Do not need to regenerate XML
  – Keep track of assigned permanent IDs

Merging Conclusions
• There is not a single merging method, but a staged set of approaches that will allow us to go from the simplest exact matches to (we hope) reliably identifying variant forms of a name when corroborated by contextual information (dates, including “active” dates, etc.).

Prototype Access System
http://socialarchive.iath.virginia.edu

SNAC: Social Networks and Archival Context

NAAC: National Archival Authorities Cooperative (not the final name)
http://socialarchive.iath.virginia.edu/NAAC_index.html

Activities
1. Cultivate EAC-CPF expertise across the archival community, through 140 SAA-hosted workshops
2. Develop a blueprint for a sustainable, national archival authority cooperative

Planning is being extended with a proposal to the Mellon Foundation. Stay tuned for Spring 2014!

Prototype Access System
Brian Tingle and Adrian Turner, RBMS Pre-Conference 2012, San Diego, CA
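
Sketch: comparing names by character shingles

As a rough illustration of the shingle idea from the NGRAM or Shingle Matching slide, the sketch below breaks each name into overlapping three-character shingles and scores two names by the Jaccard overlap of their shingle sets. This is a deliberately simpler stand-in, not the shingle language model credited to Janakiraman and Marimpietri, and the function names shingles and shingle_similarity are hypothetical.

```python
def shingles(name: str, size: int = 3) -> set:
    """Character n-grams of the lowercased name, ignoring spaces and punctuation."""
    letters = "".join(ch for ch in name.lower() if ch.isalnum())
    return {letters[i:i + size] for i in range(len(letters) - size + 1)}


def shingle_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two shingle sets, from 0.0 (disjoint) to 1.0 (identical)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    reference = "Einstein Albert"
    for other in ("Albert Einstein", "Ainshtain Albert", "Whitman Walt"):
        print(f"{reference!r} vs {other!r}: {shingle_similarity(reference, other):.4f}")
```

On the three names from the slide, the inverted form "Albert Einstein" scores 0.75 against "Einstein Albert" and the variant romanization "Ainshtain Albert" scores 0.4375, while an unrelated name such as "Whitman Walt" shares no shingles. Any threshold for treating a score as a candidate match would have to be tuned on real data and corroborated by dates, as the Merging Conclusions slide notes.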