presentation - University of Illinois at Urbana

Introduction to
IR Research Methodology
ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
http://www.cs.uiuc.edu/homes/czhai, czhai@cs.uiuc.edu
Outline
1. What is research?
2. How to prepare yourself for research?
3. How to identify a good research problem?
4. How to identify, frame, and refine an IR research
problem?
5. How to formulate and test IR research hypotheses?
6. How to write and publish an IR paper?
Part 1. What is research?
What is Research?
• Research is to
– Discover new knowledge
– Seek answers to questions
• Basic research
– Often driven by curiosity for fundamental/general knowledge/understanding
– High impact examples: relativity theory, DNA, …
• Applied research
– Driven by practical needs
– High impact examples: computers, transistors, vaccinations, …
• The boundary is vague, and the distinction isn’t important, but all require innovations and discovery of new knowledge; others must be able to learn something new from your work that they can’t learn from any existing literature
Why Research?
[Figure: a chain from Curiosity through Basic Research (amount of knowledge), Applied Research (advancement of technology), and Application Development (utility of applications) to Quality of Life]
Where’s IR Research?
[Figure: the chain of Basic Research (amount of knowledge), Applied Research (advancement of technology), and Application Development (utility of applications) leading to Quality of Life, with Computer Science marked on it]
Where’s Your Position?
Different position benefits from different collaborators
[Figure: the chain of Basic Research (amount of knowledge), Applied Research (advancement of technology), and Application Development (utility of applications) leading to Quality of Life]
Research Process
• Identification of a topic & raise a research question
– research question = question with no answer, or no
good answer described in any literature
• Hypothesis formulation
– offer a (possible) answer to the research question
– formulate a hypothesis based on your answer (e.g.,
“your new method is better than an existing method”)
• Design experiments to test hypothesis (e.g.,
compare X and Y on the data)
• Draw conclusions and repeat the cycle if needed
Research Methods
• Exploratory research: Identify and frame a new
problem (e.g., “a survey/outlook of personalized
search”)
• Constructive research: Construct a (new) solution
to a problem (e.g., “a new method for expert finding”)
• Empirical research: evaluate and compare existing
solutions (e.g., “a comparative evaluation of link
analysis methods for web search”)
• The “E-C-E cycle”: exploratory → constructive → empirical → exploratory → …
Types of Research Questions and Results
• Exploratory (Framework): What’s out there?
• Descriptive (Principles): What does it look like?
How does it work?
• Evaluative (Empirical results): How well does a
method solve a problem?
• Explanatory (Causes): Why does something
happen the way it happens?
• Predictive (Models): What would happen if xxx ?
Solid and High Impact Research
• Solid work:
– A clear hypothesis (research question) with conclusive result (either positive or negative)
– Clearly adds to our knowledge base (can we really learn something new from this work?)
– Implications: a solid, focused contribution is often better than a non-conclusive broad exploration
• High impact = high-importance-of-problem * high-quality-of-solution
– high impact = open up an important problem
– high impact = close a problem with the best solution
– high impact = major milestones in between
– Implications: question the importance of the problem and don’t just be satisfied with a good solution, make it the best
Part 2. How to prepare yourself
for research?
What It Takes to Do Research
• Curiosity: allows you to ask questions
• Critical thinking: allows you to challenge assumptions
• Learning: takes you to the frontier of knowledge
• Persistence: so that you don’t give up
• Respect for data and truth: ensures your research is solid
• Communication: allows you to publish your work
•…
Critical Thinking
• Develop a habit of asking questions, especially why questions
• Always try to make sense of what you have read/heard; don’t let any question pass by
• Get used to challenging everything
• Practical advice
– Question every claim made in a paper or a talk (can you argue the other way?)
– Try to write two opposite reviews of a paper (one mainly to argue for accepting the paper and the other for rejecting it)
– Force yourself to challenge one point in every talk that you attend and raise a question
Respect Data and Truth
• Be honest with the experiment results
– Don’t throw away negative results!
– Try to learn from negative results
• Don’t twist data to fit your hypothesis; instead, let the hypothesis choose data
• Be objective in data analysis and interpretation; don’t mislead readers
• Aim at understanding/explanation instead of just good results
• Be careful not to over-generalize (for both good and bad results); you may be far from the truth
Communications
• General communication skills:
– Oral and written
– Formal and informal
– Talk to people with different levels of background
• Be clear, concise, accurate, and adaptive (elaborate
with examples, summarize by abstraction)
• English proficiency
• Get used to talking to people from different fields
Persistence
• Work only on topics that you are passionate about
• Work only on hypotheses that you believe in
• Don’t draw negative conclusions prematurely or
give up easily
– positive results may be hidden in negative results
– In many cases, negative results don’t completely reject
a hypothesis
• Be comfortable with criticisms about your work (learn
from negative reviews of a rejected paper)
• Think of possibilities of repositioning a work (what
you learn from an unsuccessful exploration can often
inspire a new interesting research direction)
Optimize Your Training
• Know your strengths and weaknesses
– strong in math vs. strong in system development
– creative vs. thorough
–…
• Train yourself to fix weaknesses
• Find strategic partners
• Position yourself to take advantage of your strengths
Part 3. How to identify a good
research problem?
What is a Good Research Problem?
• Well-defined: Would we be able to tell whether you’ve solved
the problem?
• Highly important: Who would really care about the solution to the problem? Does it solve a big pain?
– Identify fundamental problems
– Dream big to identify novel application opportunities
• Solvable: Is there any clue about how to solve it? Do you have a baseline approach? Do you have the needed resources?
• Matching your strength: Are you good at solving this kind of problem?
Challenge-Impact Analysis
[Figure: research problems placed on two axes, Level of Challenges (Known to Unknown) and Impact/Usefulness. Labeled regions:
– Difficult basic research problems, but questionable impact
– High impact, high risk (hard): good long-term research problems
– High impact, low risk (easy): good short-term research problems
– Low impact, low risk: bad research problems (may not be publishable)
– Novel application research problems
– “Entry point” problems
– Good applications, not interesting for research]
Optimizing “Research Return”:
Pick a Problem Best for You
[Figure: Your Passion, High (Potential) Impact, and Your Strength, with their overlap labeled “Best problems for you”]
Find your passion: If you don’t have to work/study for money, what would you do?
Test of impact: If you are given $1M to fund a research project, what would you fund?
Find your strength/Avoid your weakness: What are you (not) good at?
How to Find a Problem?
• Application-driven (Find a nail, then make a hammer)
– Identify a need by people/users that cannot be satisfied well currently (“complaints” about current data/information management systems?)
– How difficult is it to solve the problem?
• No big technical challenges: do a startup
• Lots of big challenges: write a research proposal
– Identify one technical challenge as your topic
– Formulate/frame the problem appropriately so that you can solve it
• Aim at a completely new application/function (find a high-stake nail)
How to Find a Problem? (cont.)
• Tool-driven (Hold a hammer, and look for a nail)
– Choose your favorite state-of-the-art tools
• Ideally, you have a “secret weapon”
• Otherwise, bring tools from area X to area Y
– Look around for possible applications
– Find a novel application that seems to match your tools
– How difficult is it to use your tools to solve the problem?
• No big technical challenges: do a startup
• Lots of big challenges: write a research proposal
– Identify one technical challenge as your topic
– Formulate/frame the problem appropriately so that you can solve it
• Aim at important extension of the tool (find an unexpected application and use the best hammer)
How to Find a Problem? (cont.)
• In practice, you do both in various kinds of ways
– You use your imagination, or talk to people in
application domains to identify new “nails”
– You take courses and read literature to acquire newest
or powerful “hammers”
– You check out related areas for both new “nails” and
new “hammers”
– You read visionary papers and the “future work”
sections of research papers, and then take a problem
from there
–…
Part 4. How to identify, frame, and
refine an IR research problem?
Three Basic Questions to Ask about an
IR Problem
• Who are the users?
– Everyone vs. small group of people
• What data do we have?
– Web (whole web vs. sub-web)
– Email (public email vs. personal email)
– Literature (general vs. special discipline)
– Blog, forum, …
• What functions do we want to support?
– Information access vs. knowledge acquisition
– Decision and task support
Example: users = everyone (who has an Internet connection); data = the whole web (indexed by Google); function = search (by keywords)
The Data-User-Service (DUS) Triangle
Users=?
Data=?
Services=?
The Data-User-Service (DUS) Triangle offers a general conceptual
framework for designing any intelligent information system
Millions of Ways to
Connect the DUS Triangle!
[Figure: the DUS triangle with example instantiations.
– Users: everyone, UIUC employees, scientists, online shoppers, customer service people, …
– Data: web pages, literature, organization docs, blog articles, product reviews, customer emails, …
– Services: search, browsing, alert, mining, task/decision support, …
– Example systems connecting them: web search, literature assistant, enterprise search, opinion advisor, customer relationship management]
High-Level Challenges in IR
• How to develop robust, effective, and efficient methods for a particular application?
– Methods need to “work all the time” without failure
– Methods need to be accurate enough to be useful
– Methods need to be efficient enough to be useful
• How to make use of imperfect IR techniques to do something definitely useful?
– Save human labor (e.g., partially automate a task)
– Create “add on” value (e.g., literature alert)
– A lot of HCI issues (e.g., allowing users to control)
• How to develop more intelligent general IR techniques to support many different applications?
Challenge 1: Optimize Search
• How to infer a user’s information need accurately?
• How to understand content of documents?
• How to match a user’s information need and
content of a document?
• How to support interactive search in an optimal
way?
• How to help users when their queries don’t work
well? (long tail queries)
Challenge 2: From Search
to Information Access
• Search is only one way to access information
• Browsing and recommendation are two other
ways
• How can we effectively combine these three ways
to provide integrated information access?
• E.g., artificially linking search results with
additional hyperlinks, “literature pop-ups”…
Challenge 3: From Information Access to
Task Support
• The purpose of accessing information is often to
perform some tasks
• How can we go beyond information access to
support a user at the task level?
• E.g., automatic/semi-automatic email reply for
customer service, literature information service
for paper writing (suggest relevant citations, term
definitions, etc), comparing prices for shoppers
Challenge 4: Support Whole Life Cycle of
Information
• A life cycle of information consists of “creation”,
“storage”, “transformation”, “consumption”,
“recycling”, etc
• Most existing applications support one stage (e.g.,
search supports “consumption”)
• How can we support the whole life cycle in an
integrated way?
• E.g., Community publication/subscription service
(no need for crawling, user profiling)
Challenge 5: Collaborative Information
Management
• Users (especially similar users) often have similar information needs
• Users who have explored the information space can share their experiences with other users
• How to exploit the collective expertise of users and allow users to help each other?
• E.g., allowing “information annotation” on the Web (“footprints”), collaborative filtering/retrieval, …
Look for New IR Research Questions
• Driven by new data: X is a new type of data emerging (e.g., X = blog vs. news)
– How is X different from existing types of data?
– What new issues/problems are raised by X?
– Are existing methods sufficient for solving old problems on X? If not, what are the new challenges?
– What new methods are needed?
– Are old evaluation measures adequate?
• Driven by new users: Y is a set of new users (e.g., ordinary people vs. librarians)
– How are the new users different from old ones? What new needs do they have?
– Can existing methods work well to satisfy their needs? If not, what are the new challenges?
– What new functions are appropriate for Y?
• Driven by new tasks (not necessarily new users or new data): Z is a new task (e.g., social networking, online shopping)
– What information management functions are needed to better support Z?
– Can these new functions be reduced to old ones? If not, what are the new challenges?
General Steps to Define a Research Problem
• Generate and Test
• Raise a question
• Novelty test: Figure out to what extent we know how to answer the question
– There’s already an answer to it: Is the answer good enough?
• Yes: not interesting, but can you make the question more challenging?
• No: your research problem is how to get a better answer to the raised question
– No obvious answer: you’ve got an interesting problem to work on
• Tractability test: Figure out whether the raised question can be answered
– I can see a way to answer it or potentially answer it: you’ve got a solvable problem
– I can’t easily see a way to answer it: Is it because the question is too hard or you’ve not worked hard enough? Try to reframe the problem to make it easier
• Evaluation test: Can you obtain a data set and define measures to test solutions/answers?
– Yes: you’ve got a clearly defined problem to work on
– No: can you think of any way to indirectly test the solutions/answers? Can you reframe the problem to fit the data?
• Every time you reframe a problem, try to do all the three tests again.
Frame a New Computation Task
• Define basic concepts
• Specify the input
• Specify the output
• Specify any preferences or constraints
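The four steps above can be sketched for a toy ad hoc retrieval task. This is only an illustration under stated assumptions: the names `TaskInput`, `TaskOutput`, `satisfies_constraints`, and `overlap_baseline` are hypothetical, and the term-overlap ranker is a stand-in baseline, not a recommended method.

```python
from dataclasses import dataclass
from typing import Dict, List

# Basic concepts (hypothetical names, chosen for illustration only).
DocId = str


@dataclass(frozen=True)
class TaskInput:
    """Input: a keyword query plus the collection to search (doc id -> text)."""
    query: str
    collection: Dict[DocId, str]


@dataclass(frozen=True)
class TaskOutput:
    """Output: document ids ranked from most to least relevant."""
    ranking: List[DocId]


def satisfies_constraints(inp: TaskInput, out: TaskOutput, k: int) -> bool:
    """Constraints: at most k results, no duplicates, only ids from the collection."""
    ids = out.ranking
    return (
        len(ids) <= k
        and len(set(ids)) == len(ids)
        and all(doc_id in inp.collection for doc_id in ids)
    )


def overlap_baseline(inp: TaskInput, k: int) -> TaskOutput:
    """Toy baseline: rank documents by the number of distinct query terms they contain."""
    query_terms = set(inp.query.lower().split())
    scored = []
    for doc_id, text in inp.collection.items():
        overlap = len(query_terms & set(text.lower().split()))
        if overlap > 0:
            scored.append((-overlap, doc_id))  # negative so that sorting puts best first
    scored.sort()
    return TaskOutput([doc_id for _, doc_id in scored[:k]])
```

Writing the input, output, and constraints down this explicitly makes it easier to tell whether a proposed solution actually solves the task as framed.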
From a new application to
a clearly defined research problem
• Try to picture a new system, thus clarify what new functionality is to be provided and what benefit you’ll bring to a user
• Among all the system modules, which are easy to build and which are challenging?
• Pick a challenge and try to formalize the challenge
– What exactly would be the input?
– What exactly would be the output?
• Is this challenge really a new challenge (not immediately clear how to solve it)?
– Yes, your research problem is how to solve this new problem
– No, it can be reduced to some known challenge: are existing methods sufficient?
• Yes, not a good problem to work on
• No, your research problem is how to extend/adapt existing methods to solve your new challenge
• Tuning the problem
Tuning the Problem
[Figure: the same two axes, Level of Challenges (Known to Unknown) and Impact/Usefulness, annotated with three ways to tune a problem: make an easy problem harder, increase impact (more general), make a hard problem easier]
“Short-Cut” for starting IR research
• Scan most recently published papers to find papers that you like or can understand
• Read such papers in detail
• Track down background papers to increase your understanding
• Brainstorm ideas of extending the work
– Start with ideas mentioned in the future work part
– Systematically question the solidness of the paper (have the authors answered all the questions? Can you think of questions that aren’t answered?)
– Is there a better formulation of the problem?
– Is there a better method for solving the problem?
– Is the evaluation solid?
• Pick one new idea and work on it
Part 5. How to formulate and
test IR research hypotheses?
Formulate Research Hypotheses
• Typical hypotheses in IR:
– Hypothesis about user characteristics (tested with user studies or user-log analysis, e.g., clickthrough bias)
– Hypothesis about data characteristics (tested by fitting actual data, e.g., Zipf’s law)
– Hypothesis about methods (tested with experiments):
• Method A works (or doesn’t work) for task B under condition C by measure D (feasibility)
• Method A performs better than method A’ for task B under condition C by measure D (comparative)
• Introducing baselines naturally leads to hypotheses
• Carefully study existing literature to figure out where exactly you can make a new contribution (what do you want others to cite your work as?)
• The more specialized a hypothesis is, the more likely it’s new, but a narrow hypothesis has lower impact than a general one, so try to generalize as much as you can to increase impact
• But avoid over-generalizing (claims must be supported by your experiments)
• Tuning hypotheses
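As an illustration of testing a data-characteristics hypothesis by fitting actual data, here is a minimal sketch that checks how closely term frequencies follow Zipf’s law (frequency roughly proportional to 1/rank^s). Assumptions: tokens are already extracted, the fit is ordinary least squares on log rank vs. log frequency (a rough diagnostic only, not a rigorous estimator), and the function name `zipf_fit` is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def zipf_fit(tokens: Iterable[str]) -> Tuple[float, float]:
    """Regress log(frequency) on log(rank); return (slope, r_squared).

    Under Zipf's law the slope is close to -s (often near -1) and
    r_squared is close to 1.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct terms to fit a line")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    # If all frequencies are equal the line is flat and fits exactly.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, r_squared
```

A slope far from the expected range, or a low r_squared, would be a reason to modify the hypothesis rather than to discard the data.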
Procedure of Hypothesis Testing
• Clearly define the hypothesis to be tested (include
any necessary conditions)
• Design the right experiments to test it (experiments
must match the hypothesis in all aspects)
• Carefully analyze results (seek understanding
and explanation rather than just description)
• Unless you’ve got a complete understanding of
everything, always attempt to formulate a further
hypothesis to achieve better understanding
Clearly Define a Hypothesis
• A clearly defined hypothesis helps you choose the
right data and right measures
• Make sure to include any necessary conditions so
that you don’t over claim
• Be clear about any justification for your hypothesis
(testing a random hypothesis requires more data
than testing a well-justified hypothesis)
Design the Right Experiments
• Flawed experiment design is a common cause of rejection of an IR paper (e.g., a poorly chosen baseline)
• The data should match the hypothesis
– A general claim like “method A is better than B” would need a variety of representative data sets to prove
• The measure should match the hypothesis
– Multiple measures are often needed (e.g., both precision and recall)
• The experiment procedure shouldn’t be biased
– Comparing A with B requires using identical procedure for both
– Common mistake: baseline method not tuned or not tuned seriously
• Test multiple hypotheses simultaneously if possible (for the sake of efficiency)
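A minimal sketch of reporting both precision and recall while pushing every method through the same procedure. Assumptions: a method is any function from a query string to a ranked list of document ids, relevance judgments are sets of ids, and the names `precision_recall_at_k` and `evaluate` are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple


def precision_recall_at_k(
    ranking: List[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Precision and recall over the top-k results of one query."""
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate(
    method: Callable[[str], List[str]],
    queries: Dict[str, str],
    qrels: Dict[str, Set[str]],
    k: int,
) -> Dict[str, Tuple[float, float]]:
    """Run one method over all queries with the same cutoff and judgments.

    Calling this same function for method A and for the baseline keeps
    the experiment procedure identical for both.
    """
    return {
        qid: precision_recall_at_k(method(query), qrels[qid], k)
        for qid, query in queries.items()
    }
```

The sketch does not cover tuning; the baseline’s parameters would still need to be tuned as seriously as those of the new method before the two sets of scores are compared.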
Carefully Analyze the Results
• Do the significance test if possible/meaningful
• Go beyond just getting a yes/no answer
– If positive: seek evidence to support your original
justification of the hypothesis.
– If negative: look into reasons to understand how your
hypothesis should be modified
– In general, seek explanations of everything!
• Get as much as possible out of the results of one
experiment before jumping to run another
– Don’t throw away negative data
– Try to think of alternative ways of looking at data
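One possible significance test for a comparative hypothesis is a paired randomization (sign-flip) test on per-query scores. The sketch below is an assumption-laden illustration, not the only valid choice: it assumes both methods were scored on the same queries in the same order, and the function name and defaults (`trials`, `seed`) are hypothetical.

```python
import random
from typing import Sequence


def paired_randomization_test(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    trials: int = 10000,
    seed: int = 0,
) -> float:
    """Two-sided p-value for the mean per-query difference between A and B.

    Under the null hypothesis the labels A and B are exchangeable for each
    query, so the sign of every per-query difference can be flipped at random.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need the same non-empty set of queries for both methods")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    extreme = 0
    for _ in range(trials):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        # Small tolerance guards against floating-point ties.
        if abs(total / n) >= observed - 1e-12:
            extreme += 1

    # Add-one smoothing keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (trials + 1)
```

A small p-value only says the difference is unlikely to be chance on these queries; explaining why the difference arises still requires looking at individual queries.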
Modify a Hypothesis
• Don’t stop at the current hypothesis; try to generate
a modified hypothesis to further discover new
knowledge
• If your hypothesis is supported, think about the
possibility of further generalizing the hypothesis and
test the new hypothesis
• If your hypothesis isn’t supported, think about how to
narrow it down to some special cases to see if it can
be supported in a weaker form
Derive New Hypotheses
• After you finish testing some hypotheses and
reaching conclusions, try to see if you can derive
interesting new hypotheses
– Your data may suggest an additional (sometimes
unrelated) hypothesis; you get a by-product
– A new hypothesis can also logically follow a current
hypothesis or help further support a current
hypothesis
• New hypotheses may help find causes:
– If the cause is X, then H1 must be true, so we test H1
Part 6:
How to write and publish an IR
paper?
When to Write a Paper?
• Survey/Review paper:
– An emerging field or topic has appeared (i.e., a hot topic) but no survey is available, or sufficient new development has occurred such that existing surveys are out of date
– You’ve read and digested enough papers about the topic
• Original research paper: when you have sufficient results to draw an interesting conclusion or answer an interesting research question, i.e., you’ve got a basic story to tell, e.g.,
– A new problem, a solution, and results showing how good the solution is
– An old problem, a new solution, and results showing advantage(s) of the new solution over the old ones
– An old problem, many old solutions, and results showing an understanding of their relative performance
– In general, a research question and an answer
Before you write any paper, be clear about the targeted readers
Typical Structure of a Survey Paper
• Introduction:
– Motivation for the survey
• An emerging field/topic, but no survey available
• Surveys exist, but they are out of date (e.g., due to new
development in a field/topic)
– Scope of the survey
• Background (if necessary)
• Conceptual framework (based on synthesis of the literature)
– Define basic concepts, terminology, etc
– Give a big picture of the topic so that your survey is coherent
Typical Structure of a Survey Paper (cont.)
• Systematic review of existing work
– It’s very important that you have some clear structure for this part
• The structure is usually your conceptual framework, or
• other meaningful structures (e.g., by time or some way to classify all the work)
– Be critical! Add your opinions about the work surveyed
– Don’t treat every work equally; elaborate on some representative work and simply give pointers to other work
• Summary
– Summarize the progress and the state of the art
– Give recommendations if any (e.g., for practitioners)
– Outlook (remaining challenges, future directions)
• References
Typical Structure of a Research Paper
• 1. Introduction
– Background discussion to motivate your problem
– Define your problem
– Argue why it’s important to solve the problem
– Identify knowledge gap in existing work or point out deficiency of existing answers/solutions
– Summarize your contributions
– Briefly mention potential impact
• Tips:
– Start with sentences understandable to almost everyone
– Tell the story at a high-level so that the entire introduction is
understandable to people with no/little technical background in
the topic
– Use examples if possible
Typical Structure of a Research Paper
(cont.)
• 2. Previous/Related work
– Sometimes this part is included in the introduction or appears later
– Previous work = work that you extend (readers must be familiar with it to understand your contribution)
– Related work = work related to your work (readers can wait until later in the paper to know about it)
• Tips:
– Make sure not to miss important related work
– Always safer to include more related work
– Discuss the existing work and its connection to your work
• Your work extends …
• Your work is similar to … but differs in that …
• Your work represents an alternative way of …
– Whenever possible, explicitly discuss your contribution in the context of
existing work
Typical Structure of a Research Paper
(cont.)
• 3. Problem definition/formulation
– Clearly define your problem
• If it’s a new problem, discuss its relation to existing related
problems
• If it’s an old problem, cite the previous work
– Justify why you define the problem in this way
– Discuss challenges in solving the problem
• Tips:
– Give both an informal description and a formal description if
possible
– Make sure that you mention any assumption you make when
defining the problem (e.g., your focus may be on studying the
problem in certain conditions)
Typical Structure of a Research Paper
(cont.)
• 4. Overview of the solution(s) (can be merged with the next part)
– Give a high-level informal description of the proposed solutions or solutions you study
– Use examples if possible
• 5. Specific components of your solution(s)
– Be precise (formal description helps)
– Use intuitive descriptions to help people understand it
• Tips:
– make sure that you organize this part so that it’s understandable
to people with various backgrounds
– Don’t just throw in formulas; include high-level intuitive
descriptions whenever possible
Typical Structure of a Research Paper
(cont.)
• 6. Experiment design: make sure you justify it
– Data set
– Measures
– Experiment procedure
• Tips:
– Give enough details so that people can reproduce
your experiments
– Discuss limitation/bias if any, and discuss its potential
influence on your study
Typical Structure of a Research Paper
(cont.)
• 7. Result analysis:
– Organized based on research questions to be answered or hypotheses tested
– Be comprehensive, but focus on the major conclusions
– Include “standard” components
• Baseline comparison
• Individual component analysis
• Parameter sensitivity analysis
• Individual query analysis
• Significance test
– Discuss the influence of any bias or limitation
• Tips
– Don’t leave any question unanswered (try to provide an explanation for
all the observed results)
– Discuss your findings in the context of existing work if possible
• Similar observations have also been made in …
• This is in contrast to … observed in … One explanation is ….
Typical Structure of a Research Paper
(cont.)
• 8. Conclusions and future work
– Summarize your contributions
– Discuss its potential impact
– Discuss its limitation and point out directions for
future work
• 9. References
Tips on Polishing your Paper
• Start with the core messages you want to convey in the paper and expand your paper by following the core story
• Try to convey the core messages at different levels so that people with different knowledge background can all get them
• Try to write a review of your paper yourself, commenting on its originality, technical soundness, significance, evaluation, etc, and then revise the paper if needed
• Check out reviewer’s instructions, e.g., the following: http://nips07.stanford.edu/nips07reviewers.html (not necessarily matching your conference, but should share a lot of common requirements)
• Try to polish English as much as you can
What an IR reviewer often looks for
• Most important factors:
– Realistic setup of a retrieval problem
• What kind of users would benefit from your research?
– Solid evaluation of methods
• Truly state of the art baseline
• Careful selection of data sets
– Use as many representative data sets as possible
– Always use a standard data set (e.g., TREC) if possible
• Careful definition of measures
• Unbiased experiment procedure
• General factors:
– Quality of argument, novelty, writing, …
– Avoid all kinds of careless mistakes! (If you aren’t careful about writing,
it’s possible you aren’t careful about your experiments either.)
Where to Publish IR Papers
• Core IR conferences:
– ACM SIGIR, ACM CIKM
– ECIR, AIRS
• Core IR journals
– ACM TOIS, IRJ
– IPM, JASIS
• Web Applications
– WWW, WSDM
• Other related conferences
– Natural Language Processing: HLT, ACL, NAACL, COLING, EMNLP
– Machine Learning: ICML, NIPS
– Data Mining: KDD, ICDM
– Databases: SIGMOD, VLDB, ICDE
• …
After You Get Reviews Back
• Carefully classify comments into:
– Unreasonable comments (e.g., misunderstanding):
• Try to improve the clarity of your writing
– Reasonable comments
• Constructive: easy to implement
• Non-constructive: think about it, either argue the other way or mention weakness of your work in the paper
• If paper is accepted
– Take the last chance to polish the paper as much as you can
– You’ll regret if later you discover an inaccurate statement or a typo in your published paper
• If paper is rejected
– Digest comments and try to improve the research work and the paper
– Run more experiments if necessary
– Don’t try to please reviewers (the next reviewer might say something opposite); instead use your own judgments and use their comments to help improve your judgments
– Reposition the paper if necessary (again, don’t reposition it just because a reviewer rejected your original positioning)
Summary
• Research is about discovery and increasing our knowledge (innovation & understanding)
• Intellectual curiosity and critical thinking are extremely important
• Creativity is a big plus!
• Work on important problems that you are passionate about
• Aim at becoming a top expert on one topic area
– Obtain complete knowledge about the literature on the topic (read all the important papers and monitor the progress)
– Write a survey if appropriate
– Publish one or more high-quality papers on the topic
• Don’t give up!