Microsoft Research Faculty Fellowship Nomination
Application Questions
Innovative Merit
Social, healthcare, and research tasks that were traditionally done individually and offline
are now accomplished much better collaboratively in online communities.
Mei’s work represents an early attempt at integrative and in situ information retrieval and
mining that targets the data emerging in those communities. His unique vision is to connect
the content, the context, the crowd, and the cloud, leading to a next-generation
community-based information system that he calls the Foreseer (“4-C-er”). He has
conducted the first formal study of the influence of context in text mining, including
environmental, emotional, and social context. This new paradigm of text mining,
contextual text mining, has helped him win two best paper awards from the top data
mining conference, KDD, and a dissertation award from SIGKDD.
Mei’s research offers insights into many other fundamental challenges in
community-based information analysis, such as how to jointly model user-generated content and
user behaviors, how to develop efficient algorithms for web-scale data mining, how to
incorporate user interactions and domain knowledge such as social theories into the
mining process, and how to utilize mining results to support decision-making.
Mei’s work has a strong interdisciplinary flavor. He leverages techniques and findings in
multiple fields, including computer science, linguistics, sociology, network science, and
biomedical science, to facilitate his own research in social computing and health
informatics. He aims to develop an advanced set of data analysis tools and
end-user-oriented information systems that are appreciated by social network users, researchers,
and healthcare providers and consumers (i.e., physicians and patients).
Potential to Advance the State of the Art
In existing text mining systems, the context and behaviors of users are usually isolated
from the mining process. Content and user behaviors tend to be modeled separately, and
context information is usually neglected. Mei’s work advances the state of the art by
bridging the gaps among these isolated components in a unified mining process: text
content, context information, user behaviors, and social networks. Mei’s dissertation has
already made a pioneering contribution by integrating context information into text mining,
opening a new direction called contextual text mining. The
dissertation contains a general framework for contextual text mining as well as many
instantiations of contextual language models that capture the effect of
environmental, emotional, and social context on the generative process of text data. This
new paradigm of text mining not only leads to a number of significantly improved methods
for information retrieval and topic modeling, but also provides the solution to many novel
and challenging research problems, such as modeling the evolution of topics in scientific
Qiaozhu Mei, University of Michigan School of Information
research, personalizing search, generating faceted summaries of public opinion, and
discovering topical communities in social networks.
Mei’s recent work moves beyond contextual text mining toward the
next generation of community-based data mining techniques that connect the content, the
context, the crowd, and the cloud. The Foreseer system in Mei’s vision is the first attempt
at integrative and in situ analysis of information generated in online communities. The
core of this new information system consists of a family of general probabilistic
community models, which advance the state of the art in community analysis by
modeling the dependency between the user-generated content, the user actions, the social
network structure, and the latent context information. Such a model provides a better
understanding of the content generation, information diffusion, and network evolution in
online communities and their correlations. The proposed models unify many existing
models and lead to a number of powerful new instantiations for community-based
information analysis.
In existing data mining systems, the mining results are usually only useful to data
analysts but not directly useful to the end users who have generated the data. There is
also a lack of user participation and interaction in the mining process. Mei’s research will
develop text mining systems that are compatible with the incentives of users, actively
interact with users, adjust to the guidance and feedback from users, and learn from the
collective wisdom of users. The new text mining systems aim to use the mining
results to directly benefit end users by enhancing the computational thinking and smart
decision-making of users in different domains. One example is Mei’s work on assessing
the credibility of controversial and widespread rumors created in online social
communities. Instead of relying on the endorsement of authoritative sources, the project
will provide personalized recommendations to users based on text mining and social
reputation systems. Another example is Mei’s work on a novel query suggestion service
for searching electronic health records (EHRs). The system will suggest medical search
queries to physicians and medical researchers through automatic text mining of
EHRs and user query logs as well as through a social recommendation mechanism.
In summary, Mei’s work will lead to an advanced set of models, tools, and systems for
community-based information analysis across disciplines. The portfolio includes
contextualized probabilistic models for text and user behaviors, scalable learning
algorithms, toolkits, and user-centered information systems for users in social
communities, researchers, and healthcare providers and consumers. Mei’s findings will also
be useful for the improvement of web information systems such as search engines. Besides
being evaluated using controlled datasets and controlled user studies, the new text mining
techniques will also be evaluated through prototype systems to understand how effective
they are in influencing user behaviors in the community.
University’s Current Support of the Nominee’s Work
The University of Michigan supports the work of Qiaozhu Mei in several ways. First, a
quarter of Qiaozhu's academic year salary is allocated for his research activity. Second,
we provide office and lab space for Qiaozhu. Third, when we hired Qiaozhu he received
a generous start-up package of over $310,000 that consists of research funds and student
support. Finally, the University allows Qiaozhu to recover a small percentage of the
indirect costs on his grants, which is paid into his discretionary account.
Current Funding from Other Grant Sources
* PI: NSF IIS-0968489. “Assessing information credibility without authoritative
sources.”
* Co-PI: NIH HHSN276201000032C. “Developing an Intelligent and Socially Oriented
Search Query Recommendation Service for Facilitating Information Retrieval in
Electronic Health Records.” (PI: Zheng)
* Co-PI: NIH 3-U54-DA-021519-05-S2. “NCIBI Bridge Supplement.” (PI: Athey)
Statement of Hypothesis, Objectives, and Methodology of Current Work and Plans
I. Research Objectives:
The rapid evolution of novel internet applications, especially Web 2.0 applications,
has fundamentally changed people’s lives. It has created a huge opportunity for “the
PeopleWeb,” by bringing the individuals behind the screens into the crowd of online
communities. Tasks that were traditionally done individually and offline are now done
collaboratively and interactively, which helps users enjoy better social experiences,
lead healthier lives, and accelerate scientific research. A new generation of information
systems has emerged along with this trend. These community-based information systems
offer their users a brand-new experience characterized by rich user-system interactions as
well as rich user-user interactions. On the other hand, this trend has also created a brand-new
opportunity and challenge for data miners, where users that were playing a passive role as
the data creators have now become part of the data themselves. Indeed, an unprecedented
volume of data is generated from these community-based information systems, which
consists of rich user-generated content, rich context information, as well as rich user
behaviors and interactions. How to cope with this huge volume of dynamic data, how to model the
influence of environmental and social context on user behaviors, how to infer the
correlation between the behaviors of users and the content they generate, and how to
utilize the knowledge discovered to influence the decision-making and enhance the
experiences of end users in these communities are all important research questions that
can lead to innovative technologies and have a large positive impact on our society.
The objective of Mei’s research is to develop theories, models, and systems for the next
generation of information retrieval and data mining techniques with broad and influential
applications to online communities. The goal of such a community-based information
analysis system is not only to generate useful results for analysts and researchers,
but also to influence both individual and collective behaviors of users in the communities,
for example by enhancing information seeking, social communication, scientific innovation, and
decision-making in health-related problems. Mei proposes to develop a new paradigm of
community-based information analysis by integrating the content, the context, the crowd,
and the cloud, and to generate interpretable mining results that facilitate computational
thinking and decision-making for social actors, scientific researchers, and healthcare
providers and consumers. Mei uses an interdisciplinary research approach where he
works closely with experts in multiple fields in order to integrate the theories and
findings of social behaviors and social context into the text mining process, and evaluate
the effectiveness of a mining system based on how well the system influences behaviors
in the communities. The interdisciplinary research model of the School of Information
and the University of Michigan has created an ideal environment for Mei’s research, with
ample opportunities for him to collaborate with world-class experts in social computing,
sociology, network science, and biomedical research.
II. Hypothesis:
The key hypothesis is that many traditionally individual tasks are now done in
communities, and an integrated community-based information retrieval and mining
system can be more effective than existing information systems that isolate the content
from the users and contexts in the community. Unlike traditional mining systems,
which mainly target data analysts, such an integrative and in situ information analysis
system is expected to directly influence the behaviors of end users in the community by
facilitating computational thinking and decision-making in their personal and collective
activities. The people who generate the data can also be the ones who benefit from the system.
Mei believes that the user behaviors and the content generated in those communities
are closely correlated with each other, and both are influenced by various types of
contexts, including environmental context, emotional context, and social context. Mei
believes that community-based mining systems can be significantly enhanced through
integrative modeling of text content and user activities conditional on context
information, where users themselves become a special type of context. The
community-based mining process can benefit from interaction between the system and the users as
well as the interaction between users and users. The mining of large-scale datasets can be
facilitated by leveraging the cloud. Mei believes that mining results can be presented to
end users in an understandable way, and can be utilized to influence user behaviors in the
communities. The effectiveness of such a system should be evaluated by how well it
enhances tasks and decision-making of end users in those communities.
Mei believes that techniques built from this new generation of information retrieval
and mining can facilitate many interdisciplinary research areas, such as social computing
and health informatics. These techniques will also enhance existing information systems,
such as Web search engines.
III. Methodology and Research Plan:
Mei has made a pioneering contribution in integrating context and content in text
mining. He has proposed a general framework of contextual text mining that consists of a
family of probabilistic language models that explain the generative process of text
depending on context variables, including environmental context such as time and
geographic location, emotional context such as sentiments, and social context such as
social networks. The framework also includes fast inference algorithms, a mechanism to
steer the mining process with users’ guidance and personal preferences, as well as a
labeling module that generates interpretable annotations for mining results.
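To make the idea of a context-dependent language model concrete, the following is a minimal illustrative sketch, not one of the models from the dissertation. It assumes the simplest possible setting: each document carries a single context label (for example a year), and the word distribution for a context is a mixture of a context-specific unigram model and a background model. The function name contextual_unigram, the mixing weight lam, and the toy data are hypothetical.

```python
from collections import Counter

def contextual_unigram(docs, lam=0.7):
    """Estimate p(word | context) for each context label.

    docs: list of (context_label, tokens) pairs; every context is assumed
          to contain at least one token.
    lam:  assumed mixing weight between the context-specific model and the
          background model (hypothetical parameter for this sketch).
    """
    background = Counter()
    by_context = {}
    for context, tokens in docs:
        background.update(tokens)
        by_context.setdefault(context, Counter()).update(tokens)

    total = sum(background.values())
    model = {}
    for context, counts in by_context.items():
        n = sum(counts.values())
        # Mixture of the context-specific estimate and the background estimate.
        model[context] = {
            word: lam * counts[word] / n + (1 - lam) * background[word] / total
            for word in background
        }
    return model

# Toy data: the context label here is a year (an environmental context).
docs = [
    ("2009", ["flu", "vaccine", "flu"]),
    ("2010", ["oil", "spill", "flu"]),
]
model = contextual_unigram(docs)
```

In this toy example the word "flu" receives a higher probability under the 2009 context than under the 2010 context, which is the kind of context-dependent variation such models are meant to expose.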
Mei has proposed a novel technique to regularize probabilistic language models based
on the structure of social and information networks. This regularization framework
provides a general way for incorporating structural and social context into text mining
models and can be flexibly extended to capture other assumptions and constraints about
social behavior and social influence.
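As an illustration of what regularizing a language model with network structure can look like, the sketch below combines a model's log-likelihood with a penalty that grows when linked nodes have different topic distributions. The specific form used here (a weighted squared-difference penalty and a trade-off weight lam) is an assumption made for this example, and the function and variable names are hypothetical.

```python
def regularized_objective(log_likelihood, topic_dist, edges, lam=0.5):
    """Trade off text fit against smoothness of topics over a network.

    log_likelihood: log-likelihood of the text under some topic model.
    topic_dist:     dict mapping node -> list of topic probabilities.
    edges:          list of (node_u, node_v, weight) triples.
    lam:            assumed trade-off weight in [0, 1].
    """
    penalty = 0.0
    for u, v, weight in edges:
        # Linked nodes are encouraged to have similar topic distributions.
        penalty += weight * sum(
            (pu - pv) ** 2 for pu, pv in zip(topic_dist[u], topic_dist[v])
        )
    return (1 - lam) * log_likelihood - lam * 0.5 * penalty

# Toy example: "a" and "b" discuss similar topics, "c" does not.
topic_dist = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.9]}
smooth = regularized_objective(-10.0, topic_dist, [("a", "b", 1.0)])
rough = regularized_objective(-10.0, topic_dist, [("a", "c", 1.0)])
```

With the same log-likelihood, the configuration whose linked nodes share similar topics scores higher, so maximizing such an objective pulls connected users or documents toward consistent topics.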
The next-generation community-based information retrieval and mining system
that Mei proposes, the Foreseer system, will consist of a new family of probabilistic
community models. These models integrate the generative processes of text content and
user activities and their dependency on heterogeneous context information. Mei also
plans to develop efficient inference and mining algorithms for the collection and analysis
of very large data sets, especially by utilizing cloud infrastructures.
Mei expects to work closely with domain experts to design computational models for
social communities that reflect the intuitions, findings, and theories of social behaviors,
social influence, and social context, and models for health informatics that reflect the
domain knowledge in healthcare and biomedical research. Mei plans to integrate such
prior knowledge into the model development, by regularizing the community models
with important objectives and constraints of how people behave in the community.
Mei plans to apply the developed community-based mining techniques to various
communities including social networking communities, online health communities, and
the community of physicians and medical researchers.
If funded, one portion of the funds will be used to support one Ph.D. student for three
years. Two nodes (12 cores each) will be added to the current SI Hadoop Cluster. The
funds will also be used to cover other research costs such as system development,
controlled user studies for system evaluation, and conference travel.