
New Frontiers in Auto-translation:
The HAH* Solution
An ISyracuseHigh Joint Initiative
Helen Szigeti, ISI
Abby Goodrum, Syracuse University
Helen Atkins, Highwire Press
* HAH: Helen, Abby, and another Helen
Issue: Citedness
•
Why aren’t JASIST authors more highly cited than they are?
Problem: Incomprehensibility
•
No one can understand articles in JASIST
•
Hence, no one cites JASIST
•
JASIST authors do not receive large amounts of grant
money, lucrative speaking engagements, a smooth
path to tenure, or invitations for guest appearances
on Oprah
Evidence of the problem
•
1999 JASIS article by HB Babs, HMS Trix, and A Bala:
“The synthesis of specialty narratives from co-citation
clusters. Part 1: Utilization of a real-time self organizing
approach to term co-occurrence and word frequency
analysis through collaborative filtering of
multidimensional databases.”
Hypothesis: Comprehension is time-consuming
•
By the time a reader reaches the end of a JASIST article
with a full understanding of the ideas and issues
presented, s/he has forgotten why s/he was reading the
article in the first place
Goal: Reduce the time needed to understand a
JASIST article
Solution: HAH Trans-JASIST Device℠
•
Automatically parses out pseudo-scholarly info-babble
leaving only root concepts, stop words, and thinly veiled
polysyllabic expletives.*
•
“Corporate” Version (2.0; in beta) can also reverse-translate from a simple executive memorandum to a
quality scholarly paper suitable for publication in any
information science journal.
* Note: ISyracuseHigh is currently working on a related parser that
will be capable of capitalizing on these expletives as a means
of generating a new method of relevance ranking
Elements of the Solution: part 1
•
HAH Redundancy Reducer (HAR-HAR)
- Occupational tendency for information scientists to
utilize the same data set to publish multiple papers
- The HAR-HAR takes a work or a corpus of work by a
single author and reduces it to a single paragraph (or in
some cases, a single phrase)
Elements of the Solution: part 2
•
HAH Seuss-O-Mapper (HAH-SOMMore)
- Our research uncovered a fundamental linguistic key*
that underlies all scholarly communication/ publication
patterns worldwide
- The HAH-SOMMore uses concept mapping algorithms
against the output from the HAR-HAR redundancy
reducer to generate a comprehensible, natural language
alternative to the original text.
* From the seminal work by Dr. Seuss entitled One Fish, Two
Fish, Red Fish, Blue Fish.
Demonstrations of the System
•
Academic paper to natural language
•
Corporate memo to academic paper
Academic paper to natural language
•
“The synthesis of specialty narratives from co-citation
clusters. Part 1: Utilization of a real-time self organizing
approach to term co-occurrence and word frequency
analysis through collaborative filtering of multidimensional databases.” (Babs, Trix, and Bala)
•
After reduction:
synthesis self-ego to group visual word and free ISI science
data through from grant of no-tenure wine damn damn damn
•
After mapping to natural language...
Academic paper to natural language
•
“A pretty picture we drew by putting ISI data (which we
got for free) into visualization software to show that
medicine can be considered a sub-category of life
sciences (who’da thunk?): We would have done more
but we blew our grant money on Merlot and DVDs.”
Corporate memo to academic paper
•
“Subject: Unauthorized use of telephone, fax, and email
for personal reasons.”
•
After reverse translation:
“Policy analysis for topical consensus on the roles,
rights, and responsibilities of individuals toward digital
materials and communication protocols within the
corporate learning organization: Optimization of
transactional analysis to benchmark performance
measures in a networked environment.”
Results
•
Although our translation engine has a 93% success rate, it
does not solve the problem initially identified by the
research team
•
Original hypothesis: If readers could understand JASIST
articles within a shorter time period, then citations to these
articles would increase
•
Actual outcome: Once fully comprehended in a
reasonable time frame, JASIST articles are even less
frequently cited because no worthwhile data,
methodologies, or conclusions are discernible
The HAH Axiom:
Comprehension works against citedness.
Conclusion
•
Do not try to be clear -- just keep doing what you’re doing.
Thank you!
ISyracuseHigh contact information:
Helen Szigeti
helen.szigeti@isinet.com
Abby Goodrum
aagoodru@syracuse.edu
Helen Atkins
something@highwire.org?