1 - Text Summarization

EECS 595 / LING 541 / SI 661&761
Natural Language Processing
Fall 2005
Lecture Notes #1
Introduction
Course logistics
• Instructor: Prof. Dragomir Radev (radev@umich.edu)
Ph.D., Computer Science, Columbia University
Formerly at IBM TJ Watson Research Center
• Times: Thursdays 2:40-5:25 PM, in 411, West Hall
• Office hours: TBA, 3080 West Hall Connector
Course home page:
http://www.si.umich.edu/~radev/NLP-fall2005
Example (from a famous movie)
Dave Bowman: Open the pod bay doors, HAL.
HAL: I’m sorry Dave. I’m afraid I can’t do that.
Example
I saw her fall
• How many different interpretations does the
above sentence have? How many of them
are reasonable/grammatical?
Example 1
The Standard and Poor's 500 and the Nasdaq
composite index both reached four-year highs
Thursday as investors, unfazed by oil prices nearing
$70 per barrel, welcomed a raft of strong earnings
reports.
Example 2
Accenture posts higher earnings
Consulting and technology services firm beats estimates; stock gains
in after-hours trading.
July 7, 2005: 4:35 PM EDT
NEW YORK (Reuters) - Accenture Ltd., one of the world's largest
consulting and technology services firms, posted a higher quarterly
profit Thursday boosted by a rebound in consulting demand.
Fiscal third-quarter net income more than doubled to about $484 million, or
51 cents a share, from $210 million, or 37 cents a share, a year earlier, the
company said.
Analysts had expected earnings of 43 cents a share, according to First Call.
Accenture stock rose about 2 percent in after-hours trading after falling nearly
6 percent in regular New York Stock Exchange trading.
• Gary Larson (“The Far Side”) cartoon:
• What we say to dogs:
– “Okay Ginger! I’ve had it! You stay out of the
garbage! Understand, Ginger?”
• What they hear:
– “Blah Ginger! blah blah blah blah blah blah
blah blah blah blah blah Ginger?”
Time Warner to hold off on Cablevision
But top Time Warner execs said it may eventually be interested in the cable assets.
July 8, 2005: 7:20 PM EDT
SUN VALLEY, Idaho (Reuters) - A top Time Warner Inc. executive said Friday it could not bid for
Cablevision until it completes a deal to buy Adelphia Communications Corp., splashing cold water on
early buyout speculation.
Time Warner is in a joint deal with Comcast Corp. to buy bankrupt cable provider Adelphia Communications
Corp.
"We can't do anything else until we get it (Adelphia) integrated," said Don Logan, chairman of Time Warner's
media and communications group.
But he added, "We've always said we are interested in Cablevision. ... Anything is possible."
In June, the Dolan family offered Cablevision shareholders about $33.50 per share in a $7.9 billion deal to
take the company private.
Analysts and one of Cablevision's top investors have said the offer is too low and could put the cable system,
which serves 3 million customers in the New York area, into play for other suitors, including Time Warner
Cable and Comcast.
Wall Street analysts said in June that Time Warner, if it were to bid, could top the offer with a $35 to $40 per
share bid. Time Warner is the parent company of this Web site.
Time Warner chief executive Dick Parsons said on Friday his company's decision about whether to buy
Cablevision Corp. rests on whether the Dolan family decides to put it up for sale.
"Chuck (Dolan) controls it and it's not as if we could take it away from him," Parsons said during a break at
the Allen & Co. conference in Sun Valley, Idaho. "When he's ready to bring that asset to market he knows
we're here."
Parsons would not comment on whether he has had recent conversations with Dolan about buying
Cablevision.
Parsons said he and Dolan agree that cable assets are undervalued and that now is a good time to buy them.
Time Warner is the parent company of CNN/Money.
Stocks edge up
Major gauges make tentative gains at Friday's open after steep Fed-inspired selloff.
July 1, 2005: 9:46 AM EDT
NEW YORK (CNN/Money) - Stocks inched higher early Friday, recovering some from the big selloff after the Federal
Reserve boosted interest rates again, and signaled it didn't intend to pause anytime soon.
The Dow Jones industrial average (down 99.51 to 10,274.97, Charts), the broader Standard & Poor's 500 (up 2.50 to
1,193.83, Charts) index and the Nasdaq composite (up 4.84 to 2,061.80, Charts) all added a few points in the early going, with
the Nasdaq lagging the blue chip indicators a bit.
Stocks ended a mixed quarter on a down note Thursday, with the Dow losing more than 100 points after the Fed raised the
target for its fed funds rate, an overnight bank lending rate, another quarter point to 3.25 percent.
In the closely watched statement, the central bankers acknowledged the impact of higher energy prices and other negatives,
but said the economic expansion remains on track. They also pledged to keep raising rates at a "measured" pace, all of which
suggested that they don't plan to pause in the near term.
Gains early Friday were broad based, with 27 out of 30 Dow issues rising.
In corporate news, Microsoft (up $0.02 to $24.86, Research) has settled antitrust claims made by IBM (unchanged at $74.20,
Research), the companies said Friday. The software leader will pay IBM $775 million as part of the deal.
A number of economic reports were due around 10 a.m. ET.
The Institute for Supply Management's manufacturing index for June was expected to have risen to 51.5 in the month from
51.4 in May, according to a consensus of economists surveyed by Briefing.com.
The revised read on June consumer sentiment from the University of Michigan was also due, as was the May read on
construction spending.
Treasury prices slipped after Thursday's big rally. The fall raised the yield on the 10-year note to 3.94 percent from 3.92
percent late Thursday. Treasury prices and yields move in opposite directions.
In currency trading, the dollar jumped versus the euro and the yen.
U.S. light crude oil for August delivery rose 32 cents to trade at $56.82 a barrel in electronic trading. Crude set a record closing
price for a nearby futures contract at $60.54 on Monday.
COMEX gold fell $1.20 to $435.90 an ounce.
In global trade, Asian-Pacific markets ended mostly lower, and European markets rose at midday.
Google cracks $300
Shares of the popular search engine pass $300 for the first time and are now up 260% since IPO.
June 27, 2005: 5:52 PM EDT
By Paul R. La Monica, CNN/Money senior writer
NEW YORK (CNN/Money) - Shares of Google, the popular search-engine company, surpassed the $300 level for the first time on Monday,
sparking memories of the dot-com stock craze of the late 1990s.
Google gained 2.3 percent to finish at $304.10, slightly below its high for the day of $304.30. The stock has now gained nearly 260 percent since it went
public last August at $85 a share.
Much of the optimism surrounding Google comes from the fact that it is the leader in the white-hot online advertising industry. The company reported
much better than expected sales and earnings for the first quarter, thanks to a booming market for online advertising, particularly ads tied to specific
keyword searches.
And during the past few weeks, Google has released several new features -- including a desktop search function for businesses and a test version of a
personalized home page tool -- that should help the company remain competitive against rivals Yahoo! and Microsoft.
Several analysts have also speculated that Google will soon launch an online payment service that could compete against eBay's PayPal. In addition,
many investors have been betting that the company, which now has a market value of nearly $85 billion, will soon be added to the benchmark S&P 500
index.
But the stock's meteoric rise as of late -- shares have surged more than 50 percent since the company reported first-quarter results in mid-April -- has
some analysts thinking that the stock could take a hit in the near future.
"You might see the stock pause temporarily," said Marianne Wolk, an analyst with Susquehanna Financial Group. "For the longer term, we're still very
bullish but in the very short term it wouldn't be a surprise to see the stock stabilize or pull back."
The key for Google will be how strong its second quarter results are. Google is set to report these numbers on July 21. Analysts expect Google's sales,
excluding revenues it shares with affiliates, a figure known as traffic acquisition costs or TAC, to come in at $840 million, nearly double last year's levels.
Earnings, excluding certain one-time charges, are forecast at $1.21, an increase of 121 percent from a year ago.
Wolk thinks that Google should meet these targets but does not believe the company will report results that are significantly better than consensus
projections. And if Google does not continue to beat estimates, the stock could take a bath.
"For Google to keep heading higher, it's absolutely critical that they keep hitting numbers. Everyone now believes the story," said John Tinker, an
analyst with ThinkEquity Partners.
Still, many investors are finding it hard to bet against Google because it has been posting extremely strong levels of sales growth and healthy profit
margins as a public company. So the comparisons to the late 1990s, when shares of many unprofitable Internet companies soared solely due to hype,
may not be apt.
To that end, Google is expected to generate nearly $3.6 billion in sales, excluding TAC and revenue of $5 billion next year as the company continues to
benefit from a shift of advertising dollars from more mainstream media sources such as television, radio, and newspapers, to the Web.
In addition to its ubiquitous search engine, Google has branched out into related areas in order to capitalize on the boom in online advertising. The
company has a comparison shopping site, Froogle, a free e-mail service called Gmail which features ads embedded in e-mails, and a local search site
that operates as kind of a Web version of the Yellow Pages.
Google also has expanded rapidly abroad, with sales from outside the U.S. accounting for nearly 40 percent of total sales in the first quarter.
What's more, some argue that Google is not overvalued, since it continues to trade at a discount to its top rival, Yahoo. However, this gap has narrowed
significantly as of late. Google's price-to-earnings ratio, based on 2005 earnings estimates, is 58. Yahoo trades at 61.5 times earnings estimates for this
year.
"Google is not an undiscovered stock any more," said Tinker. "It's no longer inefficiently priced."
And Google also potentially faces the issue of the summer sluggishness that typically affects Internet stocks. Last year, shares of several Internet
companies plunged in July as results did not live up to lofty expectations.
Silly sentences
• Children make delicious snacks
• Stolen painting found by tree
• I saw the Grand Canyon flying to New York
• Court to try shooting defendant
• Ban on nude dancing on Governor’s desk
• Red tape holds up new bridges
• Iraqi head seeks arms
• Blair wins on budget, more lies ahead
• Local high school dropouts cut in half
• Hospitals are sued by seven foot doctors
• In America a woman has a baby every 15 minutes. How does
she do that?
Main problems in language
• Novel words and usages
– Blogs, little “r” me,7342.67
– Spam as verb, email
• Inconsistencies
– Beverly Hills, Beverly Sills
– junior college, college junior
– pet spray, pet llama
• Parsing problems
– Cup holder
– Federal Reserve Board Chairman
• Implicature/reasoning
• World knowledge
• Subjectivity, scoping, negation
Types of ambiguity
• Morphological: Joe is quite impossible. Joe is quite important.
• Phonetic: Joe’s finger got number.
• Part of speech: Joe won the first round.
• Syntactic: Call Joe a taxi.
• PP attachment: Joe ate pizza with a fork. Joe ate pizza with meatballs. Joe ate pizza
with Mike. Joe ate pizza with pleasure.
• Sense: Joe took the bar exam.
• Modality: Joe may win the lottery.
• Subjectivity: Joe believes that stocks will rise.
• Scoping: Joe likes ripe apples and pears.
• Negation: Joe likes his pizza with no cheese and tomatoes.
• Referential: Joe yelled at Mike. He had broken the bike.
Joe yelled at Mike. He was angry at him.
• Reflexive: John bought him a present. John bought himself a present.
• Ellipsis and parallelism: Joe gave Mike a beer and Jeremy a glass of wine.
• Metonymy: Boston called and left a message for Joe.
Synonyms/paraphrases
The S&P 500 climbed 6.93, or 0.56 percent, to 1,243.72, its best close since June 12, 2001.
The Nasdaq gained 12.22, or 0.56 percent, to 2,198.44 for its best showing since June 8, 2001.
The DJIA rose 68.46, or 0.64 percent, to 10,705.55, its highest level since March 15.
What is Natural Language Processing
• Natural Language Processing (NLP) is the
study of the computational treatment of
natural language.
• NLP draws on research in Linguistics,
Theoretical Computer Science,
Mathematics and Statistics, Artificial
Intelligence, Psychology, etc.
NLP
• Information extraction
• Named entity recognition
• Trend analysis
• Subjectivity analysis
• Text classification
• Anaphora resolution, alias resolution
• Cross-document crossreference
• Parsing
• Semantic analysis
• Word sense disambiguation
• Word clustering
• Question answering
• Summarization
• Document retrieval (filtering, routing)
• Structured text (relational tables)
• Paraphrasing and paraphrasing/entailment ID
• Text generation
• Machine translation
What is needed: (1) linguistic knowledge
• Examples:
– Zipf’s law: rank(w_i) * freq(w_i) = const
– Collocations:
• Strong beer but *powerful beer
• Big sister but *large sister
• Stocks rise but ?stocks ascend (225,000 hits on Google vs. 47 hits)
– Constituents:
• Children eat pizza.
• They eat pizza.
• My cousin’s neighbor’s children eat pizza.
• _ Eat pizza!
– Burstiness
• P(ct=2|ct>=1)
• How to get it:
– Manual rules
– Automatically acquired from large text collections (corpora)
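As a rough illustration of how such knowledge can be acquired from a corpus, the Python sketch below computes rank × frequency for each word (Zipf’s law) and estimates P(ct=2|ct>=1). Several things here are assumptions that do not come from the slides: the regex tokenizer, the toy texts, the helper names `tokenize`, `zipf_table` and `p_two_given_present`, and the reading of ct as the number of times a term occurs in a single document.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def zipf_table(text):
    """Return (rank, word, freq, rank * freq) tuples, most frequent word first."""
    counts = Counter(tokenize(text))
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


def p_two_given_present(docs, term):
    """Estimate P(ct = 2 | ct >= 1): among documents that contain `term`,
    the fraction in which it occurs exactly twice (assumed reading of ct)."""
    counts = [tokenize(doc).count(term) for doc in docs]
    present = [c for c in counts if c >= 1]
    if not present:
        return 0.0
    return sum(1 for c in present if c == 2) / len(present)


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat and the bird"
    for row in zipf_table(sample):
        print(row)

    toy_docs = ["spam spam eggs", "spam eggs", "eggs", "spam and spam"]
    print(p_two_given_present(toy_docs, "spam"))
```

On a text this small the rank × frequency products are far from constant; the regularity is only expected to emerge on large collections.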
Linguistics
• Knowledge about language:
– Phonetics and phonology - the study of sounds
– Morphology - the study of word components
– Syntax - the study of sentence and phrase structure
– Lexical semantics - the study of the meanings of words
– Compositional semantics - how to combine words
– Pragmatics - how to accomplish goals
– Discourse conventions - how to deal with units larger
than utterances
What is needed: (2) mathematical and computational tools
• Language models
• Estimation methods
• Hidden Markov Models (HMM): for sequences
• Context-free grammars (CFG): for trees
• Conditional Random Fields (CRF)
• Generative/discriminative models
• Maximum entropy models
• Random walks
• Latent semantic indexing (LSI)
• + Representation issues
• + Feature engineering
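To make the first two items concrete, here is a minimal sketch of a bigram language model with two estimation methods: maximum likelihood and add-one (Laplace) smoothing. The toy corpus, the `<s>` and `</s>` boundary markers, and the names `train_bigram` and `prob` are assumptions made for illustration, not material from the course.

```python
from collections import Counter

CORPUS = ["stocks rise", "stocks rise sharply", "stocks fall"]


def train_bigram(sentences):
    """Count context words and bigrams, padding each sentence with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}  # words that can be predicted; <s> is only ever a context
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs
    return contexts, bigrams, vocab


def prob(w_prev, w, model, smooth=False):
    """P(w | w_prev) by maximum likelihood, or with add-one smoothing."""
    contexts, bigrams, vocab = model
    if smooth:
        return (bigrams[(w_prev, w)] + 1) / (contexts[w_prev] + len(vocab))
    if contexts[w_prev] == 0:
        return 0.0
    return bigrams[(w_prev, w)] / contexts[w_prev]


if __name__ == "__main__":
    model = train_bigram(CORPUS)
    print(prob("stocks", "rise", model))             # seen bigram, MLE
    print(prob("rise", "fall", model))               # unseen bigram, MLE gives 0
    print(prob("rise", "fall", model, smooth=True))  # unseen bigram, smoothed
```

The smoothed estimate reserves some probability mass for unseen bigrams, which is the basic motivation for going beyond raw relative frequencies.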
Theoretical Computer Science
• Automata
– Deterministic and non-deterministic finite-state automata
– Push-down automata
• Grammars
– Regular grammars
– Context-free grammars
– Context-sensitive grammars
• Complexity
• Algorithms
– Dynamic programming
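One familiar instance of dynamic programming in language processing is minimum edit distance between two strings. The slide does not name a specific algorithm, so this choice is an assumption, as are the function name `edit_distance` and the unit cost for insertions, deletions and substitutions (Levenshtein distance).

```python
def edit_distance(source, target):
    """Levenshtein distance computed row by row with dynamic programming.

    prev[j] holds the distance between the first i-1 characters of `source`
    and the first j characters of `target`; curr is the row for i characters.
    """
    prev = list(range(len(target) + 1))  # distance from "" to target[:j]
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # distance from source[:i] to ""
        for j, t_char in enumerate(target, start=1):
            substitution = prev[j - 1] + (0 if s_char == t_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Each cell is computed once from three neighbouring cells, so the running time is proportional to the product of the two string lengths rather than exponential in them.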
Mathematics and Statistics
• Probabilities
• Statistical models
• Hypothesis testing
• Linear algebra
• Optimization
• Numerical methods
Artificial Intelligence
• Logic
– First-order logic
– Predicate calculus
• Agents
– Speech acts
• Planning
• Constraint satisfaction
• Machine learning
Existing applications
• Web search
• Natural language interfaces to databases
• Parsing job postings
• Military intelligence
• Summarizing medical records
• Information extraction for databases
• Wrapper induction
Potential applications
• Trend recognition
• Db conversion + named entity extraction +
classification + relation extraction
• Detecting change
• Summarization
• Social network analysis
• Assigning subjectivity scores (stars)
• Sentiment classification
• Alignment of text w/ other signal (time series)
• Record linkage
Current work at CLAIR
• Semi-supervised entity and relation extraction
• Subjectivity analysis + factuality extraction
• Protein interaction recognition
• Text summarization
• Text mining from the Web
• Lexical network models of the Web
• Syntactic alignment
• Chronology recovery
• Classification
Final remarks
• Language is not adversarial
• It is used to convey useful information
• Hard to extract this information automatically
• Need to use NLP
– Inference: mathematics, statistics, machine learning
– Networks/fields
– Graph theory
– Differential equations
– Statistics/optimization
– Linguistics/KR/AI
– Sequence alignment
– Linear algebra/vector analysis
Ambiguity
I saw her fall.
• The categories of knowledge of language can be
thought of as ambiguity-resolving components
• How many different interpretations does the above
sentence have?
• How can each ambiguous piece be resolved?
• Does speech input make the sentence even more
ambiguous?
Time flies like an arrow.
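One way to explore the "how many interpretations" question is to count the parse trees a grammar assigns to a sentence. The sketch below uses a CYK-style chart to do that for "Time flies like an arrow". The lexicon, the binary rules, and the decision to treat an imperative as a bare VP are all assumptions of this toy grammar; a different grammar would give different counts.

```python
from collections import defaultdict

# Toy lexicon (assumed): each word maps to the categories it may take.
LEXICON = {
    "time": {"NP", "N", "V"},
    "flies": {"NP", "N", "V"},
    "like": {"V", "P"},
    "an": {"Det"},
    "arrow": {"N"},
}

# Toy binary rules (assumed), written as (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "N", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "V", "PP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]


def count_parses(words):
    """Return {category: number of distinct trees} for the whole word sequence."""
    n = len(words)
    # chart[(i, k)][cat] = number of trees of category cat covering words[i:k]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for cat in LEXICON[word]:
            chart[i, i + 1][cat] = 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            k = i + length
            for j in range(i + 1, k):  # split point between the two children
                for parent, left, right in RULES:
                    chart[i, k][parent] += chart[i, j][left] * chart[j, k][right]
    return {cat: count for cat, count in chart[0, n].items() if count}


if __name__ == "__main__":
    print(count_parses("time flies like an arrow".split()))
```

Under this toy grammar the full sentence gets two analyses as a declarative S and two as an imperative VP, which shows how quickly readings multiply even for five words.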
The alphabet soup
(NLP vs. CL vs. SP vs. HLT vs. NLE)
• NLP (Natural Language Processing)
• CL (Computational Linguistics)
• SP (Speech Processing)
• HLT (Human Language Technology)
• NLE (Natural Language Engineering)
• Other areas of research: Speech and Text Generation,
Speech and Text Understanding, Information Extraction,
Information Retrieval, Dialogue Processing, Inference
• Related areas: Spelling Correction, Grammar Correction,
Text Summarization
Some demos
• AT&T Labs Text to Speech (http://www.research.att.com/projects/tts/demo.html)
• Babelfish (http://babelfish.altavista.com)
• OneAcross (http://www.oneacross.com)
• AskJeeves (http://www.ask.com)
• IONaut (http://www.ionaut.com:8400) – seems to be down
• NSIR (http://tangra.si.umich.edu/clair/NSIR/html/nsir.cgi)
• AnswerBus (http://www.answerbus.com)
• NewsInEssence (http://www.newsinessence.com)
The Turing Test
• Alan Turing: the Turing test (language as test for intelligence)
• Three participants: a computer and two humans (one is an
interrogator)
• Interrogator’s goal: to tell the machine and human apart
• Machine’s goal: to fool the interrogator into believing that a
person is responding
• Other human’s goal: to help the interrogator reach his goal
Q: Please write me a sonnet on the topic of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: 105621 (after a pause)
Some brief history
• Foundational insights (40’s and 50’s): automaton (Turing),
probabilities, information theory (Shannon), formal
languages (Backus and Naur), noisy channel and decoding
(Shannon), first systems (Davis et al., Bell Labs)
• Two camps (57-70): symbolic and stochastic.
Transformation grammar (Harris, Chomsky), artificial
intelligence (Minsky, McCarthy, Shannon, Rochester),
automated theorem proving and problem solving (Newell
and Simon)
Bayesian reasoning (Mosteller and Wallace)
Corpus work (Kučera and Francis)
Some brief history
• Four paradigms (70-83): stochastic (IBM), logic-based (Colmerauer, Pereira and Warren, Kay,
Bresnan), NLU (Winograd, Schank, Fillmore),
discourse modelling (Grosz and Sidner)
• Empiricism and finite-state models redux (83-93):
Kaplan and Kay (phonology and morphology),
Church (syntax)
• Late years (94-03): strong integration of different
techniques, different areas (including speech and
IR), probabilistic models, machine learning
The state of the art and the near-term future
• World-Wide Web (WWW)
• Sample scenarios:
– generate weather reports in two languages
– teaching deaf people to speak
– translate Web pages into different languages
– speak to your appliances
– find restaurants
– answer questions
– grade essays (?)
– closed-captioning in many languages
– automatic description of a soccer game
Structure of the course
• Three major parts:
– Linguistic, mathematical, and computational background
– Computational models of morphology, syntax, semantics, discourse,
pragmatics
– Applications: text generation, machine translation, information extraction,
etc.
• Three major goals:
– Learn the basic principles and theoretical issues underlying natural
language processing
– Learn techniques and tools used to develop practical, robust systems that
can communicate with users in one or more languages
– Gain insight into many open research problems in natural language
Readings
• Speech and Language Processing (Daniel Jurafsky and James Martin),
Prentice-Hall, 2000, ISBN: 0-13-095069-6
• Handouts given in class
• 1-2 chapters per week
Optional readings:
Natural Language Understanding by Allen
Foundations of Statistical Natural Language Processing by Manning and Schütze.
Grading
• Four homework assignments (40%)
• Midterm (15%)
• Final project (20%)
• Final exam (25%)
• Additional requirements for SI761
Assignments
• (subject to change)
– Finite-state modeling, part of speech tagging, and
information extraction
• Fsmtools/lextools/JMX (Bell Labs, Penn)
– Tagging and parsing
• Brill tagger/Charniak parser (JHU, Brown)
– Machine translation
• GIZA++/Rewrite decoder (Aachen, JHU, ISI)
– Text generation
• FUF/Surge (Columbia)
Syllabus
Introduction (JM1)
Linguistic Fundamentals
Regular Expressions and Automata (JM2)
Morphology and Finite-State Transducers (JM3)
Word Classes and Part of Speech Tagging (JM8)
Context-Free Grammars for English (JM9)
Parsing with Context-Free Grammars (JM10)
Features and Unification (JM11)
Lexicalized and Probabilistic Parsing (JM12)
Natural Language Generation (JM20) (Cont’d)
The Functional Unification Formalism (Handout)
Language and Complexity (JM13)
Representing Meaning (JM14)
Semantic Analysis (JM15)
Discourse (JM18)
Rhetorical Analysis (Handout)
Dialogue and Conversational Agents (JM19)
Other meetings
• CLAIR meeting (TBA)
• Artificial Intelligence Seminar (Tuesdays 4-5:30)
• STIET (Thursdays 4-5:30)
Projects
Each student will be responsible for designing and completing a research project that
demonstrates the ability to use concepts from the class in addressing a practical
problem. A significant part of the final grade will depend on the project assignment.
Students can elect to do a project on an assigned topic, or to select a topic of their own.
The final version of the project will be put on the World Wide Web, and will be
defended in front of the class at the end of the semester (procedure TBA).
In some cases (and only with instructor’s approval), students may be allowed to work
in pairs when the project’s scope is significant.
Sample projects
• Noun phrase parser
• Paraphrase identification
• Question answering
• NL access to databases
• Named entity tagging
• Rhetorical parsing
• Anaphora resolution, entity crossreference
• Document and sentence alignment
• Using bioinformatics methods
• Encyclopedia
• Information extraction
• Speech processing
• Sentence normalization
• Text summarization
• Sentence compression
• Definition extraction
• Crossword puzzle generation
• Prepositional phrase attachment
• Machine translation
• Generation
• Semi-structured document parsing
• Semantic analysis of short queries
• User-friendly summarization
• Number classification
• Domain-specific PP attachment
• Time-dependent fact extraction
Main research forums and other pointers
• Conferences: ACL/NAACL, SIGIR, AAAI/IJCAI, ANLP, Coling,
HLT, EACL/NAACL, AMTA/MT Summit, ICSLP/Eurospeech
• Journals: Computational Linguistics, Natural Language Engineering,
Information Retrieval, Information Processing and Management, ACM
Transactions on Information Systems, ACM TALIP, ACM TSLP
• University centers: Columbia, CMU, JHU, Brown, UMass, MIT,
UPenn, USC/ISI, NMSU, Michigan, Maryland, Edinburgh,
Cambridge, Saarland, Sheffield, and many others
• Industrial research sites: IBM, SRI, BBN, MITRE, MSR, (AT&T, Bell
Labs, PARC)
• Startups: Language Weaver, Ask.com, LCC
• The Anthology: http://www.aclweb.org/anthology
What this course is NOT
• EECS 597 / LING 792 / SI 661 “Language and Information”, last taught in
Winter 2005, essentially an introduction to corpus-based and statistical NLP.
– Topics covered: introduction to computational linguistics, information theory, data
compression and coding, N-gram models, clustering, lexicography, collocations,
text summarization, information extraction, question answering, word sense
disambiguation, analysis of style, and other topics.
• SI 760 “Information Retrieval”, last taught Winter 2005.
– Topics covered: information need, IR models, documents, queries, query languages,
relevance, retrieval evaluation, reference collections, query expansion and
relevance feedback, indexing and searching, XML retrieval, language modeling
approaches, crawling the Web, hyperlink analysis, measuring the Web, similarity
and clustering, social network analysis for IR, hubs and authorities, PageRank and
HITS, focused crawling, relevance transfer, question answering
• The new advanced NLP/IR course, to be offered Winter 2006.
• An undergraduate Linguistics course such as Ling 212 “Intro to the Symbolic
Analysis of Language” or Ling 320 “Programming for Linguistics and
Language Studies”
Other sites
• Johns Hopkins University (Jason Eisner)
http://www.cs.jhu.edu/~jason/465/
• Cornell University (Lillian Lee)
http://courses.cs.cornell.edu/cs674/2002SP/
• Stanford University (Chris Manning)
http://www.stanford.edu/class/cs224n/
• JHU Summer workshop
http://www.clsp.jhu.edu/ws2003/calendar/preliminary.shtml
Readings
• J&M Chapters 1, 2
• “What is Computational Linguistics” by
Hans Uszkoreit
http://www.coli.uni-sb.de/~hansu/what_is_cl.html
• Lecture notes #1