SEAclassics Khmer and the New Epigraphy Movement

Doug Cooper
Center for Research in Computational Linguistics
today
one: where do we come from?
two: what are we?
three: where are we going?
one: where do we come from?
the old epigraphy
is about artifacts
and explicit stories
and unique traditions
and definitive editions
it’s about conservation
and studying texts one by one
it’s saved in a box until it’s all done
the new epigraphy
is about texts
and implicit stories
and exploration
and continuous improvement
it’s about dissemination
it’s about collaboration
it’s shared and used as soon as any of it is done
the new epigraphy
doesn’t store the texts in a database
it stores the database in the texts
footnotes comment on texts
tags comment in texts
references send you there
links bring there here
databases collect evidence
attributes denote evidence
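
A minimal sketch of what "storing the database in the texts" could look like, assuming inline XML tags. The element and attribute names (line, place, term, lang, cert) and the sample words are hypothetical placeholders, not the project's schema and not a real inscription. Tags mark spans in the text, attributes carry the evidence, and the plain reading is still recoverable.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline-tagged line: names and content are illustrative only,
# not EpiDoc and not taken from any real inscription.
SAMPLE = (
    '<line n="1">offering at <place cert="low">PLACE-A</place> '
    'by <term lang="skt">TITLE-B</term></line>'
)

def extract_evidence(xml_text):
    """Return (tag, text, attributes) for every tagged span inside the line."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text, dict(el.attrib))
            for el in root.iter() if el is not root]

def plain_text(xml_text):
    """Recover the untagged reading by dropping the markup."""
    return "".join(ET.fromstring(xml_text).itertext())

evidence = extract_evidence(SAMPLE)
reading = plain_text(SAMPLE)
```

Here the "database" is just a view computed from the text: the same tagged line yields both the reading and the evidence records.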
the new epigraphy is fully engaged
lending: history, linguistics, archeology, political science
borrowing: computational linguistics, information retrieval, digital lexicography, image processing
texts that were static
have become dynamic
print publication was the ultimate result
of the process
digital publication is a critical element
of the process
the old epigraphy    the new epigraphy
artifacts            texts
explicit             implicit
tradition            exploration
definitive           continuous
competition          collaboration
conservation         access
sui generis          generic
on                   in
send                 bring
store                share
collect              denote
result               process
two: what are we?
SEAclassics: Pyu, Lao, Mon, Khmer, Thai, Cham, Malay, Burmese, Javanese
SEAclassics
research, education, exploration
interoperable* on-line resources
5th through 15th centuries
*interoperability = programmable Web access to data and services
XML tagged
API access
killer apps
for the first millennium
NB:
Everything you are about to see is
running on CRCL’s internal server.
Some of it is on line.
All of it is preliminary work.
Dictionary: pre-Angkor, Angkor, and MK (Jenner)
Analysis: readings, grammatical analysis (Jenner)
Corpus: an analytical corpus of all texts
Bitexts: translated / aligned sentence examples
Images: with embedded enhancement tools
Reference: to all available publications (w/permission)
integrated
extensible
interoperable
1. dictionary
search via any aspect of the text
integrated reference for
Sanskrit, Thai, Khmer...
linked to all other SEAclassics services
customized “click-click”
living works …
... open to
comment
correction
extension
2. digital critical editions
EpiDoc encoded (eventually),
with full critical apparatus
allowing a variety of approaches
to the text
3. analytic corpus
KWIC – transcription and estampage
integrated reference materials
mapping returned sites
restricting search areas and times
query expansion
collocate analysis
chronological distribution
epigraphic, editorial, and lexical views
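
As a rough illustration of the KWIC and collocate views named above, here is a minimal sketch over an already tokenized text. The function names, the window width, and the placeholder tokens are assumptions for this example; it does not describe how the SEAclassics corpus is actually implemented.

```python
from collections import Counter

def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

def collocates(tokens, keyword, width=3):
    """Count the tokens that occur within `width` positions of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - width):i])
            counts.update(tokens[i + 1:i + 1 + width])
    return counts

# Placeholder tokens stand in for a transcribed line.
tokens = "w1 w2 kw w3 w4 kw w5".split()
lines = kwic(tokens, "kw", width=2)
neighbours = collocates(tokens, "kw", width=2)
```

A real corpus would add the pieces the slide lists, such as restricting hits by area and date before the concordance is built.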
4. bitext corpus
Old Khmer / English
query expansion
explicit match / no-match controls
5. images
and images
6. reference
three: where are we going?
technology will change epigraphy
epigraphy will change technology
transliterating Old Khmer into modern Khmer
or deciding if databases should use Ś or Ç
aren’t the problems
building another website
translating another inscription
aren’t the solutions
worrying about them
is like worrying about ironing the estampage
instead of building new tools to explore it
if these are our questions ...
compare toponyms within 20km of ...
cross-check likely Javanese loans against ...
when do nak- constructions first ...
find the first images of maay han akaat in ...
what is the areal distribution of ...
where do Pali-skrit neologisms enter the ...
query via the Sanskrit root for ...
is there evidence of infixed “-ngk-” in …
are four-syllable elaborate ...
was there Cham contact with ...
then these are our problems
entity tagging
semantic tagging
etymological tagging
bounding-box tagging
syntax/grammar tagging
the Javanese corpus
the Burmese corpus
the Khmer corpus
the Malay corpus
the Mon corpus
the Tai corpus ...
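
A question such as "compare toponyms within 20km of ..." only becomes answerable once place names are tagged and tied to coordinates. The sketch below assumes that tagging already exists and shows the remaining step, a great-circle distance filter. The site names and coordinates are invented placeholders, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def toponyms_within(sites, center, radius_km=20.0):
    """Return the names of sites no farther than radius_km from center."""
    lat0, lon0 = center
    return sorted(name for name, lat, lon in sites
                  if haversine_km(lat0, lon0, lat, lon) <= radius_km)

# Invented placeholder sites: (name, latitude, longitude).
sites = [
    ("site-A", 13.40, 103.85),
    ("site-B", 13.50, 103.90),
    ("site-C", 14.40, 104.70),
]
nearby = toponyms_within(sites, center=(13.41, 103.87), radius_km=20.0)
```

The filter itself is trivial; the work lies in the entity tagging that produces the list of sites in the first place.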
we need to get from translating words
archeological onomastic
topographic lexicographic
grammatical historic political
semantic visual linguistic
to answering questions
MON|PYU|THAI|KHMER|LAO|CHAM
LINGUISTICS|EPIGRAPHY|HISTORY
SANSKRIT|CHINESE|PALI|JAVANESE
from here
MON|PYU|THAI|KHMER|LAO|CHAM
LINGUISTICS|EPIGRAPHY|HISTORY
SANSKRIT|CHINESE|PALI|JAVANESE
to there
SEAclassics Khmer
the new epigraphy