Language Life Lines By Jenny Maloney

For twenty-four years, Jugal Kalita has taught hundreds of
students in his computer science classes at the
University of Colorado Colorado Springs (UCCS). He
has watched the college grow from an entirely
undergraduate set of programs to a sprawling, post-graduate research university. Working in both a growing
university and the constantly changing computer science field, Kalita has learned to adapt and thrive.
That rapid change has inspired Kalita’s current research, which aims to create resources for endangered
languages. Across the world, resource-rich languages like English, Mandarin Chinese, Spanish and Hindi are
crowding out resource-poor languages. “If you look at a language like Cherokee, here in the US,” Kalita explained,
“or a language like Dimasa in northeast India or a language like Tai Daeng in Vietnam and Laos, these are
languages that barely have any resources. However, most of these languages have at least one bilingual
dictionary because of explorers, or people who tried to colonize these places, or because of people in the church
who wanted to translate the Bible. But that’s all, in terms of lexical resources, a language like these might have –
just one single bilingual dictionary with a limited number of terms. But, maybe, now computer technology can
help these languages.”
Kalita and his colleagues are attempting to expand the dictionaries available to speakers of endangered languages by
writing computer algorithms to translate a low-resource language like Cherokee into Chinese, French, Hindi, or
English. Because his focus is artificial intelligence, Kalita is teaching the computer to do the translation work.
“Using resources like the Bing translator or the Google translator and limited parallel textual documents, we’re
trying to translate dictionaries. We’re trying to use limited resources available on the Internet to develop a whole
bunch of additional dictionaries, automatically, without human help,” said Kalita.
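One common way to picture this kind of dictionary expansion is to route translations through a shared "pivot" language. The sketch below is a minimal illustration of that idea, not Kalita's actual method: the function name `compose_dictionaries`, the placeholder source words (`word_a`, `word_b`, `word_c`), and the tiny English-to-French table standing in for a machine translator are all assumptions made up for this example.

```python
from collections import defaultdict


def compose_dictionaries(source_to_pivot, pivot_to_target):
    """Build a source->target dictionary by passing through a pivot language.

    Both arguments map a word to a list of candidate translations.
    Source words whose pivot translations have no target entry are skipped.
    """
    result = defaultdict(set)
    for source_word, pivot_words in source_to_pivot.items():
        for pivot_word in pivot_words:
            for target_word in pivot_to_target.get(pivot_word, ()):
                result[source_word].add(target_word)
    # Sort the translations so the output is deterministic.
    return {word: sorted(words) for word, words in result.items()}


# Hypothetical toy data: placeholder entries for a low-resource language,
# paired with English glosses as a single bilingual dictionary might be.
low_resource_to_english = {
    "word_a": ["water"],
    "word_b": ["house", "home"],
    "word_c": ["sky"],  # no French entry below, so it is dropped
}

# Stand-in for a machine translator between two resource-rich languages.
english_to_french = {
    "water": ["eau"],
    "house": ["maison"],
    "home": ["maison", "foyer"],
}

new_dictionary = compose_dictionaries(low_resource_to_english, english_to_french)
print(new_dictionary)
```

A real system would also need to filter out wrong candidates, since a pivot word with several senses can introduce translations that do not fit the original word.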
Among the resources Kalita is working to construct are thesauruses and Wordnets, which are lexical databases
that group core elements of a language together. “In English there is a database, or we could call it an ontology,
of words and how words are related to each other – which word is a synonym of which other word, which word is
an antonym, which word is a so-called hypernym or hyponym, which word denotes an object that is a part of
another word,” said Kalita.
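The relations Kalita lists can be pictured as a small data structure. The following is a rough sketch under stated assumptions, not the format used by the Princeton Wordnet or by Kalita's group: the `Entry` class, the `hypernym_chain` helper, and the handful of English entries are hypothetical and chosen only to show synonyms, antonyms, hypernyms, and part-of links side by side.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Entry:
    """One word's relations in a toy lexical database (hypothetical layout)."""
    synonyms: Set[str] = field(default_factory=set)
    antonyms: Set[str] = field(default_factory=set)
    hypernym: Optional[str] = None  # the more general word
    part_of: Optional[str] = None   # the whole this word is a part of


lexicon: Dict[str, Entry] = {
    "car": Entry(synonyms={"automobile"}, hypernym="vehicle"),
    "vehicle": Entry(hypernym="object"),
    "object": Entry(),
    "wheel": Entry(part_of="car"),
    "hot": Entry(antonyms={"cold"}),
}


def hypernym_chain(word: str, lexicon: Dict[str, Entry]) -> List[str]:
    """Follow hypernym links upward from a word, most specific first."""
    chain: List[str] = []
    seen = {word}
    current = lexicon.get(word)
    while current is not None and current.hypernym is not None:
        parent = current.hypernym
        if parent in seen:  # guard against accidental cycles in the data
            break
        chain.append(parent)
        seen.add(parent)
        current = lexicon.get(parent)
    return chain


print(hypernym_chain("car", lexicon))
print(lexicon["wheel"].part_of)
```

Here "car" is a hyponym of "vehicle", which is why walking the hypernym links upward from "car" reaches progressively more general words.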
The best-known Wordnet is the one compiled at Princeton University for the English language. Kalita and his students
are attempting to create a similar resource for endangered languages. Kalita explained, “That kind of word
ontology or Wordnet resource is quite valuable in performing tasks computationally. So we’re trying to create
Wordnets for these languages which are endangered or who have very few resources.”
While the main funding for creating language resources generally goes to those studying the dominant languages,
Kalita feels developing resources for endangered languages is necessary work. He said, “Recently, there has been
an understanding among researchers that if these languages go away, it makes all of humanity poorer. The
diversity of languages, the diversity of cultures, the diversity of thought that is expressed in terms of languages
enrich everyone.”
Since computer science requires a great deal of hands-on work, from writing and rewriting programs to evaluating
the effectiveness of those programs, and encompasses a wide variety of subjects, Kalita does not work in a vacuum.
Collaboration and teamwork are key to making sure his research works.
Over the years, Kalita has worked with hundreds of undergraduate and graduate students, and in the past several
years he’s had the opportunity to work with UCCS’s new Ph.D. students. His students are his first collaborators.
Often, he will come up with the seed of the idea and then encourage his students to develop the ideas further
and develop computer programs to test out their ideas. “Usually I come up with the basic ideas myself – what
topic we want to work on in a broad area. Sometimes with some students, we come up with a few questions or
problems for which we need answers.”
Next, he tells the student, “Research these problems and choose the problem you’re most interested in.” He
added, “I work on explaining papers, asking questions, proposing possible answers, but they’re the ones who do
the deeper investigation.”
He went on to explain, “Because we’re in computer science we can’t just do theoretical work. People have to
write computer programming to verify whatever hypothesis they may have or whatever solution they may have
come up with.”
Another area of Kalita’s research that involves collaboration with students is designing automatically generated
comprehension questions to test natural language processing and artificial intelligence. “Suppose we were
working with a K-2 child and he or she reads a passage, a short story, or a fairy tale or a Dr. Seuss book. After
reading the book we want to ask a few questions to see if the child understood it. Sure, a teacher can generate
those questions on her own, but can a machine do it? Automatically?”
To help verify results of these computer programs, which cross a broad spectrum of subjects, Kalita has worked
with different departments throughout his history at UCCS. “I’ve worked with people in electrical engineering,
mechanical engineering, psychology, communication, biology, chemistry, and linguistics on our campus.”
His collaborations aren’t limited to Colorado Springs – he’s worked with professors and students from Brigham
Young University, Louisiana State University, SUNY-Buffalo, Colorado College, University of Texas, University of
Minnesota, and Stanford University.
He’s also collaborated outside the United States. He has worked closely with colleagues at several universities in
India, and in particular Tezpur University (just forty miles from where he grew up), both in language resource
production and several other fields. For example, he co-wrote a book on network security, Network Anomaly
Detection: A Machine Learning Perspective, with Dr. Dhurba Kumar Bhattacharyya. “His area of interest in
network security complements my interest in artificial intelligence and machine learning, and vice versa,” Kalita
said.
Kalita sees great benefit in working with collaborators: “It is always a great idea to look at a problem from
different perspectives. When people from different fields come together, new and exciting things are likely to
happen.”
For his expansive research and passion for teaching, Kalita was recognized at UCCS in 2011 with the Chancellor’s
Award, which is given to faculty who excel in research, service, and teaching. He has also received teaching,
research and service awards in the College of Engineering and Applied Science at UCCS.
Seeing his students succeed is one of the greatest points of pride for Kalita. He finds pleasure in working with
bright undergraduate researchers. For the past several summers, he has been kept busy with the Research
Experiences for Undergraduates program, funded by the National Science Foundation. About twenty published
papers have resulted from the grant, and in these papers undergraduates were the first authors.