Larson_PNC_2013_slides

Towards a Social Network of History
Ray R. Larson, School of Information, UC Berkeley
Daniel Pitti, University of Virginia, Institute for Advanced Technology
in the Humanities
Yiming Liu, School of Information, UC Berkeley
Brian Tingle, California Digital Library
Adrian Turner, California Digital Library
Rachael Hu, California Digital Library
PNC 2013 – Kyoto, Japan
http://socialarchive.iath.virginia.edu
Archival Name Authority System
[Figure: animated cloud of authority-form names that fills with more names across successive slides. Legible entries include Hamilton, Alexander, 1757-1804; Washington, George, 1732-1799; Franklin, Benjamin, 1706-1790; Whitman, Walt, 1819-1892; Luce, Clare Boothe, 1903-1987; Patton, George S. (George Smith), 1885-1945; Oppenheimer, J. Robert, 1904-1967; Sontag, Susan, 1933-2004; Bush, Vannevar, 1890-1974; Frankfurter, Felix, 1882-1965; Block, Herbert, 1909-2001; Wright, Lloyd, 1890-1978; Patton family; Rodney family; and many lesser-known names such as Gillam, Chandler B., 1833-1899 and Henry, Oscar M., 1851-1916.]
Background
• Research and demonstration project
• Multi-year funding
• National Endowment for the Humanities
(2010-2012)
• Andrew W. Mellon Foundation (2012-2014)
• Planning Project for Cooperative Service
(2014-15 - Pending)
Objectives
1. Develop tools for extracting EAC-CPF
records, drawing on existing data (EAD
finding aids, MARC records)
2. Match, merge, and enhance; build a
large test corpus of EAC-CPF records
3. Create a prototype biographical
resource and access system, using
those records
Project Team
• University of Virginia, Institute for
Advanced Technology in the Humanities
– Daniel Pitti (PI) and Worthy Martin
• UC Berkeley School of Information
– Ray Larson and Yiming Liu
• California Digital Library
– Rachael Hu, Brian Tingle, and Adrian Turner
Project Team
• Terry Catapano (Columbia University)
• Sara Sprenkle (Washington and Lee University)
• Sarah Wells (University of Virginia)
• Kathy Wisser (Simmons Graduate School of Library and Information Science)
• Tom Lynch (University of Illinois School of Library and Information Science)
EAC-CPF
• XML-based data structure standard for
encoding archival authority records
• Authorized name headings for the entity
• Biographical/historical context for the entity
• Links to resources created by the entity
• Links to resources about the entity
Example EAD - Creator
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Transformed with v1v2002_4.xsl -->
<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival
Description (EAD) Version 2002)//EN"
"http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd" [
<!ENTITY lcseal SYSTEM "http://lcweb2.loc.gov/xmlcommon/lcseal.jpg " NDATA jpeg>
]>
<ead><eadheader repositoryencoding="iso15511" … >
<eadid mainagencycode="dlc" countrycode="us" …>http://hdl.loc.gov/loc.mss/eadmss.ms003073</eadid>
<filedesc><titlestmt>
<titleproper encodinganalog="245$a">Clement F. Haynsworth Papers</titleproper> …
<unitid label="ID No." encodinganalog="590" countrycode="US" …>MSS79781</unitid>
<origination label="Creator">
<persname source="lcnaf" encodinganalog="100">Haynsworth, Clement F.
(Clement Furman), 1912-1989</persname>
</origination>
Example EAD - bioghist
…<bioghist encodinganalog="545">
<head>Biographical Note</head>
<chronlist>
<listhead>
<head01>Date</head01>
<head02>Event</head02>
</listhead>
<chronitem>
<date>1912, Oct. 30</date>
<event>Born, Greenville, S.C.</event>
</chronitem>
<chronitem>
<date>1933</date>
<event>A.B., Furman University, Greenville, S.C.</event>
</chronitem>
<chronitem>
<date>1936</date>
<event>LL.B., Harvard University, Cambridge, Mass.</event>
</chronitem> …
[Figure: diagram relating collection titles to the names John Brennan, George Jones, Frederick Jones, Martha Jones, and Thomas Smith]
Example EAD - controlaccess
… </note>
<controlaccess>
<head>People</head>
<persname encodinganalog="600" role="subject" source="lcnaf"
altrender=":::PWEBRECON=^Barzun%2C+Jacques%2C+1907+Correspondence.^">Barzun, Jacques, 1907- --Correspondence.</persname>
<persname encodinganalog="600" role="subject" source="lcnaf"
altrender=":::PWEBRECON=^Brennan%2C+William+J.+%28William+Joseph%29%2C+1906-1997+Correspondence.^">Brennan, William J. (William Joseph), 1906-1997--Correspondence.</persname>
<persname encodinganalog="600" role="subject" source="lcnaf"
altrender=":::PWEBRECON=^Burger%2C+Warren+E.%2C+1907-1995+Correspondence.^">Burger, Warren E., 1907-1995--Correspondence.</persname>
<persname encodinganalog="600" role="subject" source="lcnaf"
altrender=":::PWEBRECON=^Clark%2C+Tom+C.+%28Tom+Campbell%29%2C+1899-1977+Correspondence.^">Clark, Tom C. (Tom Campbell), 1899-1977--Correspondence.</persname> …
Example EAD - scopecontent
…
The most significant and frequent of Haynsworth's correspondents are Jacques
Barzun, William J. Brennan, Warren E. Burger, Tom C. Clark, John Paul Frank, Ernest F.
Hollings, Edward Moore Kennedy, J. Woodrow Lewis, Daniel John Meador, Arthur
Raphael Miller, Richard M. Nixon, Lewis F. Powell, Jr., Strom Thurmond, Johnnie
McKeiver Walters, Bernard J. Ward, and Charles Alan Wright.</p>
</scopecontent>
…
Example EAD – unittitle
<c04 level="file">
<did>
<unitid>No. 7383 </unitid>
<unittitle encodinganalog="245$a">Long Mfg. Co. v. Holliday</unittitle>
</did>
</c04>
…<c04 level="file">
<did>
<unitid>No. 7416 </unitid>
<unittitle encodinganalog="245$a">Norfolk and Portsmouth Belt Line
R.R. v. Brotherhood of R.R. Trainmen, Lodge No. 514</unittitle>
</did>
</c04> …
<c03 level="file">
<did>
<container type="box">201</container>
<unittitle encodinganalog="245$a">Wright, Charles Alan, 1970-1989
</unittitle>
<physdesc>
<extent encodinganalog="300">(10 folders)</extent>
Data Sources
• EAD finding aids [~150,000]
  – 13 regional and statewide consortia
  – 35 repositories in US, UK, and France; multiple US federal agencies
• MARC21 records [~4.5 million]
  – OCLC WorldCat
• Authority records
  – OCLC Research: Virtual International Authority File (VIAF) [~16 million]
  – Getty Vocabulary Program: Union List of Artist Names (ULAN) [~120,000]
  – Additional EAC-CPF (or other) name records from Archives nationales de France, British Library, NARA, New York State Archives, and Smithsonian Institution Archives
Consortia
• Archives Florida
• ArchivesHub (UK)
• Arizona Archives Online
• Points (OhioLink)
• EAD FACTORY
• Five Colleges
• Maine Archival Collections Online (MACON)
• Northwest Digital Archives (NWDA)
• Online Archive of California
• Philadelphia Area Consortium of Special Collections Libraries (PACSCL)
• Rhode Island Archival & Manuscript Collections Online (RIAMCO)
• Rocky Mountain Online Archive (RMOA)
• Texas Archival Resources Online (TARO)
• Virginia Heritage
Individual institutions
• American Philosophical Society
• Archives nationales (France)
• Archives of American Art
• Bibliothèque nationale de France
• BnF Archives et manuscripts
• French Union Catalog
• Brigham Young University
• Church of Latter Day Saints Archives
• Columbia University
• Cornell University
• Duke University
• Harvard University
• Indiana University
• Library of Congress (publicly available without restriction)
• Minnesota Historical Society
• Massachusetts Institute of Technology
• National Library of Medicine
• New York Public Library
• New York University
• North Carolina State
• Northwestern University
• Princeton University
• Rutgers University
• Smithsonian Institution Archives
• Syracuse University
• University of Alabama
• University of Chicago
• University of Connecticut
• University of Delaware
• University of Florida
• University of Illinois
• University of Kansas
• University of Maryland
• University of Michigan Bentley & Special Collections
• University of Minnesota
• University of Nebraska
• University of North Carolina, Chapel Hill
• University of Utah
• Utah State Archives
• Utah State University
• Yale University
Methods and Processing
• Extract EAC-CPF records from existing EAD-encoded
archival descriptions
– Extracting both creators and referenced CPF names
• Match EAC-CPF records against one another and
against existing authority records (ULAN, VIAF, LCNAF)
– Enhance EAC-CPF by normalizing entries, adding
alternative entries, titles (VIAF), and historical data
(ULAN)
• Create a prototype historical resource and access
system
– Historical data and social-professional networks
– Links to archive, library, and museum resources (by and
about)
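The extraction step can be sketched roughly as follows. This is a minimal illustration, not the project's actual extraction tooling: the EAD fragment is a simplified stand-in modeled on the Haynsworth examples, and the function name `extract_names` is hypothetical. It pulls creator names from `origination` and referenced names from `controlaccess`.

```python
import xml.etree.ElementTree as ET

# Abbreviated EAD fragment modeled on the Haynsworth finding aid;
# the wrapper structure is simplified for illustration.
EAD_SAMPLE = """
<ead>
  <archdesc>
    <did>
      <origination label="Creator">
        <persname source="lcnaf">Haynsworth, Clement F. (Clement Furman), 1912-1989</persname>
      </origination>
    </did>
    <controlaccess>
      <persname role="subject" source="lcnaf">Barzun, Jacques, 1907- --Correspondence.</persname>
      <persname role="subject" source="lcnaf">Burger, Warren E., 1907-1995--Correspondence.</persname>
    </controlaccess>
  </archdesc>
</ead>
"""

# EAD elements for persons, corporate bodies, and families (the P, C, F of CPF).
NAME_TAGS = ("persname", "corpname", "famname")


def extract_names(ead_xml):
    """Return (creators, referenced) name lists from one EAD document."""
    root = ET.fromstring(ead_xml)
    creators, referenced = [], []
    for tag in NAME_TAGS:
        for el in root.iterfind(f".//origination/{tag}"):
            creators.append(" ".join((el.text or "").split()))
        for el in root.iterfind(f".//controlaccess/{tag}"):
            heading = " ".join((el.text or "").split())
            # Drop the subject subdivision ("--Correspondence.") to keep only the name.
            referenced.append(heading.split("--")[0].strip())
    return creators, referenced


creators, referenced = extract_names(EAD_SAMPLE)
print(creators)
print(referenced)
```

Names mentioned only in prose (for example in `scopecontent`) or in `unittitle` text are not covered by this sketch; finding those would need name recognition rather than element lookup.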
The Problem
• Proliferation of the forms of names
– Different names for the same person
– Different people with the same names
• Examples
– from Books in Print (semi-controlled but not
consistent)
– ERIC author index (not controlled)
Goethe
…etc…
John Muir
Library and Archive Authority Control
• Library (or bibliographic) authority control is almost
exclusively about the control of names
• Archival authority control involves biographical-historical description of the CPF entity
– Descriptions based on controlled vocabularies, for
example, occupations, place of birth and death
– But also biographical-historical description
• Prose
• Chronological list
• Archival authority control provides context for
understanding records, the context of their creation,
the provenance
Matching and Merging in SNAC 2
• Developing an updatable database of merged EAC data (replacing MongoDB with PostgreSQL)
– Will permit incremental addition of new data and
support editing and “forced” merges
• All original records and merged records will be
in the database
• Permanent identifiers will be assigned to
merged (and unmerged) EAC output records
– Track these in the database
Merging EAC-CPF Records
[Figure: merge pipeline. EAC record input → connect exactly matching records → connect records using name authority information (Cheshire search of the LCNAF and ULAN repositories) → Postgres → merge → merged EAC records output]
Merge System Step 1: Load Original Records
CREATE TABLE original_records (
    id bigint NOT NULL,
    name character varying(255) DEFAULT ''::character varying NOT NULL,
    source_id character varying(255),
    collection_id character varying(64),
    path character varying(255),
    r_type character varying(64) NOT NULL,
    from_date date,
    from_date_type character varying(64),
    to_date date,
    to_date_type character varying(64),
    processed boolean DEFAULT false NOT NULL,
    last_processed timestamp without time zone,
    record_data text,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    record_group_id bigint
);
• Parse source EAC
– Key attributes extracted for merge use
– Original XML stored
• Timestamp for last merge run on record
– Resumption of aborted merge runs or reruns
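A minimal sketch of this loading step, under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, keeps only a subset of the columns, and parses a simplified, namespace-free sample rather than a full EAC-CPF record. The function name `load_record` is hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Simplified, namespace-free stand-in for an EAC-CPF record (illustrative only).
EAC_SAMPLE = """<eac-cpf>
  <control><recordId>example-0001</recordId></control>
  <cpfDescription>
    <identity>
      <entityType>person</entityType>
      <nameEntry><part>Tabb, John B. (John Banister), 1845-1909.</part></nameEntry>
    </identity>
  </cpfDescription>
</eac-cpf>"""

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE original_records (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL DEFAULT '',
        source_id TEXT,
        r_type TEXT NOT NULL,
        processed INTEGER NOT NULL DEFAULT 0,
        last_processed TEXT,
        record_data TEXT,
        created_at TEXT NOT NULL,
        updated_at TEXT NOT NULL,
        record_group_id INTEGER
    )
""")


def load_record(conn, eac_xml):
    """Extract the key merge attributes and store them with the original XML."""
    root = ET.fromstring(eac_xml)
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO original_records "
        "(name, source_id, r_type, record_data, created_at, updated_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            root.findtext(".//nameEntry/part"),
            root.findtext(".//recordId"),
            root.findtext(".//entityType"),
            eac_xml,
            now,
            now,
        ),
    )


load_record(conn, EAC_SAMPLE)
# A resumed merge run only needs the rows that have not been processed yet.
pending = conn.execute(
    "SELECT name, r_type FROM original_records WHERE processed = 0"
).fetchall()
print(pending)
```

Keeping the `processed` flag and `last_processed` timestamp on each row is what lets an aborted run pick up where it stopped instead of starting over.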
But…
• Exact merging assumes that archives are
following LC cataloging practice in their EAD
records
– There are some problems with this assumption
Some failures for merging…
• Different abbreviations:
– A. & G. Carisch & C.
– A. & G. Carisch & Co.
• And spacing issues:
– A. C. Peters & Bro.
– A. C. Peters & Brother.
– A. C. Peters. (??)
– A. C.Peters & Bro.
• Completeness and alternate rules
– Tabb, John B. (John Banister), 1845-1909.
– Tabb, John Banister, 1845-1909.
• Also differing transliterations for non-Latin scripts
More…
• Variant romanizations (and spacing):
– M. P. Belaieff.
– M. P. Belaïeff.
– M. P. Bieliaev.
– M.P. Belaïeff.
– M.P.Belaïeff.
• Initials vs. names:
– Zabolotskii, N.A.
– Zabolotskii, Nikolai Alekseevich, 1903-1958.
– Zabolotskii.
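Some of these failures (spacing, diacritics, trailing punctuation) can be reduced by normalizing names before comparison. The sketch below is one illustrative approach, not the project's stated method; `normalize_name` is a hypothetical helper. It does not resolve different abbreviations, variant romanizations, or initials versus full names.

```python
import re
import unicodedata


def normalize_name(name):
    """Fold diacritics, repair spacing after periods, and drop the final period."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Remove combining marks so that "Belaïeff" and "Belaieff" compare equal.
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Insert the missing space in forms such as "M.P.Belaieff" or "A. C.Peters".
    spaced = re.sub(r"\.(?=\w)", ". ", folded)
    collapsed = re.sub(r"\s+", " ", spaced).strip()
    return collapsed.rstrip(".").lower()


variants = ["M. P. Belaieff.", "M. P. Belaïeff.", "M.P. Belaïeff.", "M.P.Belaïeff."]
normalized = {normalize_name(v) for v in variants}
print(normalized)

# A different romanization still does not match after normalization.
print(normalize_name("M. P. Bieliaev.") in normalized)
```

The four spacing and diacritic variants collapse to one key, while the variant romanization stays distinct and still needs authority-file or fuzzy matching.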
Search Authority Files
• For each name, formulate a search of the VIAF
database using the Cheshire system
(SGML/XML retrieval system with probabilistic
and Boolean matching)
– Search both the “authoritative” and “non-authoritative” forms
– Consider any name matching a non-authoritative
form to be a candidate match for the authoritative
form
– Flag EAC records that match the same authority
record as potential matches
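The flagging logic can be illustrated with a toy example. This sketch assumes a tiny in-memory authority index with made-up identifiers (`auth-1`, `eac-a`, and so on) and uses an exact dictionary lookup in place of a Cheshire probabilistic search of VIAF; all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical miniature authority index: each record has one authoritative
# form and any number of non-authoritative (variant) forms.
AUTHORITY_RECORDS = {
    "auth-1": {
        "authoritative": "Tabb, John B. (John Banister), 1845-1909",
        "variants": ["Tabb, John Banister, 1845-1909"],
    },
    "auth-2": {
        "authoritative": "Zabolotskii, Nikolai Alekseevich, 1903-1958",
        "variants": ["Zabolotskii, N.A."],
    },
}


def lookup_key(name):
    """Comparison key: case-folded name without trailing periods."""
    return name.rstrip(".").lower()


def build_index(records):
    """Map every authoritative and non-authoritative form to its authority ID."""
    index = {}
    for auth_id, rec in records.items():
        for form in [rec["authoritative"], *rec["variants"]]:
            index[lookup_key(form)] = auth_id
    return index


def flag_candidates(eac_names, index):
    """Group EAC record IDs by the authority record their name matches."""
    groups = defaultdict(list)
    for eac_id, name in eac_names.items():
        auth_id = index.get(lookup_key(name))
        if auth_id is not None:
            groups[auth_id].append(eac_id)
    # Only authority records hit by two or more EAC records yield merge candidates.
    return {auth_id: ids for auth_id, ids in groups.items() if len(ids) > 1}


EAC_NAMES = {
    "eac-a": "Tabb, John B. (John Banister), 1845-1909.",
    "eac-b": "Tabb, John Banister, 1845-1909.",
    "eac-c": "Zabolotskii, N.A.",
}
candidates = flag_candidates(EAC_NAMES, build_index(AUTHORITY_RECORDS))
print(candidates)
```

Here the two Tabb records are flagged because one matches the authoritative form and the other a variant form of the same authority record.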
NGRAM or Shingle Matching
Name: Einstein Albert
Shingle sequence: ein, ins, nst, ste, tei, ein … , ert
Probability that the sequence (ins, nst, ste) follows ein is very high for the
name einstein
Shingle Language Model for names
Krishna Janakiraman and Sean Marimpietri - Biograph
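A small sketch of character shingling follows. Note the assumption: it compares names by Jaccard overlap of their trigram sets, a simpler technique than the probabilistic shingle language model named on the slide, and the function names are hypothetical.

```python
def shingles(token, n=3):
    """Return the sequence of overlapping character n-grams for one token."""
    token = token.lower()
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def name_shingles(name, n=3):
    """Union of the shingles of each word, so word order does not matter."""
    result = set()
    for token in name.split():
        result.update(shingles(token, n))
    return result


def jaccard(name_a, name_b, n=3):
    """Share of distinct shingles the two names have in common (0.0 to 1.0)."""
    set_a, set_b = name_shingles(name_a, n), name_shingles(name_b, n)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


print(shingles("Einstein"))
print(jaccard("Einstein Albert", "Albert Einstein"))
print(round(jaccard("Einstein Albert", "Ainshtain Albert"), 3))
```

Reordered names score as identical, while the variant transliteration shares only part of its shingles, so a threshold on this score would have to be tuned against real data.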
Name 1: Einstein Albert
Name 2: Ainshtain Albert
Name 3: Albert Einstein
[Figure: shingle graphs for the three names, built from trigrams such as ein, ins, nst, ste, tei, ain, nsh, sht, hta, tai, alb, lbe, ert, rte]
Merge System Step 2: Record Matches
CREATE TABLE record_groups (
    id bigint NOT NULL,
    name character varying(255) DEFAULT ''::character varying NOT NULL,
    g_type character varying(64) NOT NULL,
    viaf_record text,
    ulan_record text,
    is_valid boolean,
    invalidated_by bigint,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);
[Diagram: an original record belongs to a record group; a record group has many original records]
• Execute merge algorithm and create record groups
– Pointers from original records to record groups
– Can be invalidated
• Matched authority record stored for reference
Merge Flagged Records
• For all of the exact matches and authority matches
– Use the Authoritative form of the name
– Combine data from each match into a single EAC-CPF
record
– Retain all source record IDs and information
• Finally, output the merged EAC-CPF records
– Actually – store how to build the merged record in the
database as well
• Records can be regenerated as needed from the merge data
– Assign permanent identifier for the merged record
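The combination step can be sketched as below. The record layout (`name`, `source_id`, `biog_hist`), the sample values, and the function `merge_group` are all hypothetical, chosen only to illustrate keeping the authoritative name, the variant names, and every source record ID; the real system works on EAC-CPF XML.

```python
def merge_group(authoritative_name, records):
    """Combine matched records into one merged record keyed on the authoritative name."""
    merged = {
        "name": authoritative_name,
        "alternative_names": [],
        "sources": [],
        "biog_hist": [],
    }
    for rec in records:
        name = rec["name"]
        if name != authoritative_name and name not in merged["alternative_names"]:
            merged["alternative_names"].append(name)
        # Retain every source record ID so the merge can be traced or rebuilt.
        merged["sources"].append(rec["source_id"])
        if rec.get("biog_hist"):
            merged["biog_hist"].append(rec["biog_hist"])
    return merged


matched = [
    {"name": "Tabb, John B. (John Banister), 1845-1909",
     "source_id": "repo-1/0001",
     "biog_hist": "Sample biographical note A."},
    {"name": "Tabb, John Banister, 1845-1909",
     "source_id": "repo-2/0417",
     "biog_hist": None},
]
merged = merge_group("Tabb, John B. (John Banister), 1845-1909", matched)
print(merged)
```

Because the list of source IDs is kept, the merged record acts as a recipe: it can be regenerated from the originals whenever the matching changes.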
Merge System Step 3: Create Output
• Using valid record groups:
– generate merged EAC
– assign permanent ARK ID
– write to new EAC file
• Merged XML stored in db, referenced by
record group
– Do not need to regenerate XML
– Keep track of assigned permanent IDs
Merging Conclusions
• There is no single merging method, but a staged set of approaches that lets us go from the simplest exact matches to (we hope) reliably identifying variant forms of a name when corroborated by contextual information (dates, including “active” dates, etc.)
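As one illustration of date corroboration, the sketch below compares the life dates embedded in two name headings. It is an assumed, simplified check (the helper names are hypothetical, and "active" dates are not handled), not the project's actual rule set.

```python
import re

# Matches "1845-1909" as well as open-ended ranges such as "1904-".
DATE_RANGE = re.compile(r"(\d{4})-(\d{4})?")


def life_dates(heading):
    """Return (birth, death) parsed from a heading, or None if it has no date range."""
    match = DATE_RANGE.search(heading)
    if match is None:
        return None
    death = int(match.group(2)) if match.group(2) else None
    return int(match.group(1)), death


def corroborated(heading_a, heading_b):
    """True if dates agree, False if they conflict, None if either heading lacks dates."""
    dates_a, dates_b = life_dates(heading_a), life_dates(heading_b)
    if dates_a is None or dates_b is None:
        return None
    if dates_a[0] != dates_b[0]:
        return False
    if None not in (dates_a[1], dates_b[1]) and dates_a[1] != dates_b[1]:
        return False
    return True


print(corroborated("Tabb, John B. (John Banister), 1845-1909.",
                   "Tabb, John Banister, 1845-1909."))
print(corroborated("Zabolotskii, N.A.",
                   "Zabolotskii, Nikolai Alekseevich, 1903-1958."))
print(corroborated("Wright, Lloyd, 1890-1978",
                   "Wright, Frank Lloyd, 1867-1959"))
```

A `None` result means the dates neither support nor rule out the match, so such pairs would still need other contextual evidence.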
Prototype Access System
http://socialarchive.iath.virginia.edu
SNAC
Social Networks and Archival Context
NAAC
National Archival Authorities Cooperative
(Not the final name)
http://socialarchive.iath.virginia.edu/NAAC_index.html
Activities
1. Cultivate EAC-CPF expertise across the archival community, through 140 SAA-hosted workshops
2. Develop a blueprint for a sustainable, national archival authority cooperative
Planning is being extended with a proposal to the Mellon Foundation.
Stay tuned for Spring 2014!
Brian Tingle and Adrian Turner
RBMS
Pre-Conference 2012
San Diego, CA