Summary Report
On
Development of a Singular Presence in Data
Analytics for The Ohio State University
To
Executive Team
by
Janet Box-Steffensmeier, Vernal Riffe Professor of Political Science
Casey Hoy, Kellogg Endowed Chair In Agricultural Ecosystems Management
William Martin, Dean of Public Health
Ellen Peters, Professor of Psychology
Gary Hattery, Project Manager Support
February 2014
Data Analytics Collaborative Framework
Acknowledgements
The authors would like to acknowledge and thank the faculty and staff members who provided
guidance, support, suggestions, and recommendations and helped provide the critical thinking
necessary in the creation of this Data Analytics Collaborative document. In particular, we would
like to thank the following participants in the process for their engagement, discussions, and
dedication in helping craft this document:
Climate and Environment Team
• Dorota Grejner-Brzezinska, Civil, Environmental, & Geodetic Engineering
• Erich Grotewold, Plant Cell & Molecular Biology
• Joel Johnson, Electrical and Computer Engineering
• Jay Martin, Food, Agricultural & Biological Engineering
Complex Systems and Network Science Team
• John Casterline, Sociology
• Elena Irwin, Agricultural, Environmental, and Development Economics
• Srinivasan Parthasarathy, Computer Science & Engineering
Foundational Core Team
• Mark Berliner, Statistics
• Randy Olsen, Economics
• Peter Shane, Law
• David Tomasko, Chemical & Biomolecular Engineering
Health and Well Being Team
• Zhong-Lin Lu, Psychology
• Cynthia Carnes, Pharmacy
• Larry Schlesinger, Microbial Infection & Immunity
• Steven Schwartz, Food Science and Technology
• Peter Shields, Internal Medicine
Office of Academic Affairs Discovery Themes Support Team
• Stephen Myers, Associate Provost
• Rebecca Momany, Program Coordinator
• Mary White, Executive Assistant
• Marty Kress, Assistant VP for Research Business Development
• Varun Garg, Graduate Assistant, Fisher College of Business
In addition, support from the Ohio Supercomputer Center (Pankaj Shah and associated
faculty/staff) in providing recognition of the available data handling infrastructure around these
efforts is gratefully acknowledged.
Hundreds of faculty members, students, staff and others outside the University offered their time
and talent to generating the ideas and insights that drove this process and we wish to extend our
appreciation to them as well for their contributions to this collaborative effort. We hope that
each will benefit from new opportunities in data analytics.
EXECUTIVE SUMMARY
A new era exists in research and application known as data science, data analytics, or Big Data.
Data streams at petabyte (10^15 bytes) and exabyte (10^18 bytes) scales are becoming common and
widely available via warehouse-scale computers and the internet. Due to cloud computing, very
large computers are also widely available. Big Data has joined theory and experiment to form a
triumvirate basis for the practice of science in areas ranging from cosmology and climate system
modeling to genomics and systems biology. Massive datasets on human behavior contain
information relevant to understanding the big problems facing society and individuals today.
Investment in data analytics will create a critical core capability at Ohio State as it develops
initiatives in the three Discovery Areas
and beyond.
In early December 2013, after review of
a total of 49 Statements of Intent in
response to the Discovery Themes
inaugural RFP in data analytics, the
Discovery Themes Executive Team
identified four specific areas of strength,
or clusters, for leveraging The Ohio
State University’s (OSU) presence in
data analytics. These clusters/cores are
(1) Climate and Environment; (2)
Foundations; (3) Health and Well-Being;
and (4) Complex Systems and Network
Science. At a December 5, 2013
meeting, the Provost charged these
teams with developing a Data Analytics
Framework document that would serve
as a basis for OSU creating “a
singular presence in data analytics leveraged by areas of strength across the campus.”
Leadership across these groups was provided by four (4) Conveners and a Project Manager
supporting the organizational efforts, with support from the Office of Academic Affairs Discovery
Themes Team. Experience and resources were leveraged from some twenty teams that
emerged from the Discovery Themes RFP. Brief descriptions of the four core areas follow
below.
The main objectives of this document are to:
1) Identify key disciplinary gaps that, when filled, will establish OSU’s eminence in Data
Analytics across the core areas
2) Recommend a set of Guiding Principles of Implementation to be used to launch the Data
Analytics Collaborative (hereafter referred to as the Collaborative) across the University
3) Identify Evaluation Metrics that can be used to assess the progress that the Collaborative
makes in its first 5 years in achieving a singular presence at OSU, advancing solutions to
the Discovery Theme Challenges, and establishing OSU among the top Universities in
Data Analytics.
Over the course of two months, which included holidays and inclement weather, this Data
Analytics Collaborative Framework document was developed through frequent meetings of the
conveners, meetings within teams and meetings across teams. The DA Collaborative document
consists of two phases.
• The first phase establishes a vision for key programmatic areas associated with developing
the Collaborative, “Big Ideas” of institutional strength that can leverage the Collaborative, and
disciplinary gaps to be addressed in order to achieve the full potential of the Collaborative.
These efforts (see figure at right) identified conceptual, disciplinary and resource overlaps, as
well as unique ideas, within and across clusters, and resulted in the recommendation that
around 100 positions with strengths in seven (7) methodological areas be considered for
recruitment.
• The second phase, which this effort has addressed to provide some structure and suggestions
for implementation of the Collaborative, recommends the immediate formation of an internal
leadership team, serving in an advisory capacity, to develop an implementation plan. The plan
would use the suggestions and recommendations provided in this Framework document as an
initial guideline in creating a strategic vision, mission and operating plan for the Collaborative.
This implementation team will ensure that momentum in developing the various aspects of the
Collaborative, including the new undergraduate major, is maintained and enhanced as
appropriate.
Guiding Principles of Implementation
The following principles provide guidance on implementing the Collaborative. They are not to
be construed as policy. The principles were proposed and agreed upon by the faculty who
developed the framework and are recommended to the ongoing faculty leadership and the
Discovery Themes Executive Team as being key to implementing the Collaborative as a singular
presence and achieving success as measured by the proposed metrics.
1. Faculty hires should each fit a personality and leadership profile that is consistent with an
interdisciplinary team player, someone who actively seeks collaboration, someone with a history
of promoting success in the teams and people with whom they collaborate, someone with a
history of putting success of their collaborative groups above their own self-interest. A means of
assessment for these qualities should be developed.
2. Recruiting rising stars from other institutions should be a priority (e.g., Associate Professors
who are poised to quickly reach the top of their fields).
3. A process for continued faculty leadership of the Collaborative should be put in place
immediately. An interim internal director should be appointed, and a Data Analytics Faculty
Advisory Board formed, to lead implementation. They should identify and consult with data
analytics thought leaders at OSU.
4. Prioritization of faculty hires will depend on reconciling the interests of the faculty
leadership of the Collaborative, College and Department leaders, and the Discovery Themes
Executive Team. These groups will need to negotiate priorities for hires and their potential
TIUs (tenure-initiating units), and resolve issues of unequal ability among TIUs to match DT
support at present; identify members for and participate on cross-disciplinary search and
recruiting committees that represent the TIU or TIUs; review and approve detailed job
descriptions; and provide consistency in information provided to candidates regarding the
Collaborative, expectations of new hires, and advantages of working at OSU.
5. Fairness to small and large colleges and TIUs should be an important principle.
6. Consistent with the Framework, the hires made in data analytics should span the entire
continuum from foundational to bridging to domain-focused scholars from the outset.
7. Leveraging of support from external (e.g., businesses, institutes, foundations) as well as
internal (e.g., colleges) partners should be explored as an immediate next step, based on phase I
position descriptions.
8. Support for the existing faculty who currently provide the strengths upon which the
collaborative is built, and support for integrating new faculty into existing research and scholarly
activity, should be addressed within the first year. Forums, internal sabbaticals, and bridging
postdoctoral fellows are examples of tested mechanisms that can ensure rapid integration of new
faculty and launch of collaborations.
9. A University-wide effort should be launched immediately to convene faculty across the
various areas discussed in the Framework, providing new opportunities to discuss the framework,
plan participation in the collaborative, and begin planning for coursework, grant proposals, new
collaborations, etc.
10. Because data analytics is expected to be shared by and integrated within disciplines
throughout the University, all TIUs, Schools and Colleges should share in data analytics
teaching, research and outreach, and no TIU, School or College should be recognized as having
any special claim on data analytics as its disciplinary area.
Indicators of success for the Data Analytics Collaborative as a Singular Presence
Finally, suggestions for evaluating the progress in creation of this “singular presence in data
analytics leveraged by areas of strength across campus” were generated as an aid and a prompt
for the implementation team. The following indicators of success are recommended as a set of
measurable 5-year outcomes expected from implementing the Collaborative. Additional
measures may be proposed and associated data gathered as the implementation proceeds and the
Faculty Advisory Board and Executive Team find new specific opportunities for the
Collaborative to excel.
Success in faculty, staff and student engagement in data analytics:
• Current faculty engagement with data analytics, measured by the number of
applications for sabbaticals in data analytics and the number of grants proposed
• Number of co-authorships between new hires and existing faculty on peer-reviewed
publications and grant proposals
• Number of collaborations with industry and other external partners with data analytics as
a key feature of the collaboration
Success in executing the framework and plan:
• First hires completed in 2014
• An interim Director is hired immediately
Success in teaching data analytics:
• Number of courses incorporating data analytics at the undergraduate and graduate levels
• Number of students enrolled in the undergraduate major
• Number of MS theses and PhD dissertations with data analytics in the title or key words
Success in national and international recognition:
• Grants and contracts awarded with data analytics in the title and key words
• Patent and license agreements related to data analytics and software or hardware
developments
• Number of peer-reviewed publications with data analytics in the title and key words
• Metrics used in the Battelle report, repeated in 5 years
The following four sections provide brief summaries of each of
the cores: (1) Foundations; (2) Health and Well-Being; (3)
Climate and Environment; and (4) Complex Systems and
Network Science.
The Foundations of Data Analytics:
Fundamental Needs for Advancing the
Discovery Themes
The Foundations group provides the domain-independent core
critical to current and future research in virtually all areas of
data-science investigation and applications of that research to
policy and decision making of societal importance. It consists of:
(i) the theory and practice of data science that is the basis for
cyber-enabled discovery; and (ii) the transformational and
synergistic approaches to legal and regulatory issues, decision
science, and social- and health-science Big Data resources that will guide, support and enhance
data-analytic applications for humans and society. In the Foundations core, education of the
current and future workforce and partnerships with a variety of institutions and businesses will be
paramount. We will revolutionize data analytics as the emerging basis for science, policy, and
decision making across the OSU campus and beyond.
The challenge for OSU is to lay a lasting foundation for collaborative data-analytic contributions
to the big problems facing society and individuals today and tomorrow. Data-analytic approaches
can catalyze solutions across levels of human activity from the individual decision maker, to
microsystems (such as engineers and health care providers) to macrosystems (such as businesses
and governments) that exist within an even larger geo-political context. The Foundations core
will provide leadership and synergy in Data Analytics and its applications to real world problems.
The two primary foundational notions are set in a landscape of collaborations in the areas
identified in this document and play critical roles in making OSU the leader in research on the
Discovery Themes (see figure on the right). First, advances in data science will produce the
demanding computing technologies, algorithms and software, and methodological techniques that
will enable compilation and combination of data needed for application-specific research
breakthroughs and to enable evidence-driven decision making. The second notion concerns
humans and societies in the 21st Century and how they shape and can be affected by Data
Analytics. Data are being collected, managed, and analyzed in ways never before imagined.
These uses require improved tools and resources to influence decision making and policy across
diverse societal issues from energy and the environment to health and well-being. At the same
time, these breakthroughs question our notions of privacy, personal and national security, ethics,
and social equity and require new regulatory and legal frameworks that recognize the realities of
our new society. Hence, though they appear quite diverse, the two primary components of
Foundations interact as technological and data science advances are combined with recognition of
emergent opportunities and challenges from the human side of the equation in ways that will
ultimately allow us to transform society for the better. In addition, through training of students at
all levels and retraining current workers and scholars, the Foundations team will support the
development of a 21st century workforce.
Data Analytics of Health and Well-Being: From the Molecule to the
Community
Our reality is that expanding human, animal and plant populations will continue to share a
complex ecosystem on this small planet and increasingly stress human health and well-being
from early child development to the end of life, made worse with limiting resources, unhealthy
behaviors and poverty. Health and Well-Being result from a healthy environment and lifestyle,
access to health care, and opportunities for individuals and communities to be productive and
happy. This is achieved by who we are, how we behave in our culture and environment, choices
we make (e.g., food we eat, use of tobacco, etc.), with whom we coexist (e.g., from microbes to
vertebrates), how we can improve health-care and how we care for ourselves as individuals and in
our community. The Health and Well-Being core, which builds on existing strengths at OSU,
can counteract our future challenges and stressors by advances in science and technology through
the use of big data characterizing multiple dimensions from the molecule to the community. Big
data and systems analysis will allow us to understand how seemingly disparate factors move
together to affect our health and well-being, and the interventions we can make to improve them.
For example, changes in a person’s environmental exposure and diet (individual and population
scale) alter the probability for health and disease. These changes are mediated and can be
predicted by an individual’s genetic make-up and gene expression, the metabolome for a given
biochemical pathway and/or microbiome, psychological state and social context, which in turn
can adversely affect health and well-being. For all of these pathways from the individual to the
community, understanding and predicting the complexities of the overall ecosystem affecting
health and well-being is essential to improving disease prevention, early treatment paradigms,
and promotion of life-long quality of life for individuals and for the communities in which they
live.
Overarching Conceptual Overlap: Control Theory for Health & Well-Being
The Health and Well-Being cluster has a common purpose: to use a systems approach to develop
new knowledge and fundamentally new strategies to improve health and well-being. We have a
common intent to intervene in the systems equilibrium from the molecule to the community
to improve health outcomes. The interventions in effect attempt to control or modify the existing
systems equilibrium, and this represents a major conceptual overlap across the Health and
Well-Being cluster. The fundamental premise of the theory of systems control, which is
particularly appropriate to health and well-being problems, is that even mild inputs or stimuli,
when properly administered, can be effective in changing the behavior of a system of
components. In many
applications a mild control is specifically required in order to tune the system without destroying
its fundamental nature. Mild controls also typically have better potential to reduce interventions’
undesirable side effects and to improve system stability. For instance, administering an
antimicrobial drug to a microbiome or chemotherapy to a cancer patient, or enforcing the
anti-smoking ban on the OSU campus, are all examples of mild interventions in specific
biosystems with the goal of altering their parameters just enough to achieve the desired change of
equilibrium. Due to various technological and scientific limitations as well as diversity and
uniqueness of specific scientific problems, no unified methodological approach to such problems
in the context of health and well-being currently exists. However, the advancements in analytical
as well as computing methods within the last decade, together with the increased data collection
capabilities in the life sciences, have made it possible to start asking general questions about the
problem of controlling biological health systems.
The success of any effective approach to improving health and well-being rests, ultimately, on
how well the available information about the health systems is collected, processed and utilized in
order to make informed choices that will achieve an optimal homeostatic (healthy) state. As the
data will differ substantially across various areas of health and well-being related activities, in
order to develop proper strategies for complicated biological multicomponent
systems, OSU needs to first develop a comprehensive and multifaceted approach to mining,
analyzing and modeling the increasingly vast amount of data (local, national and global). The
challenge is formidable, since the sheer amount and complexity of the information collected with
various modern research tools, ranging from the DNA sequencers to population surveys and
satellite imaging and spanning multiple physical scales from the molecular to individuals’ levels,
defies any currently available standard approaches. To give a simple example, consider the area
of health promotion, where in order to promote the healthy behavior of a community one needs to
analyze how personal-level decisions contribute to global changes in behavior. The
mathematical theory required to answer such questions is known as the control of interacting
particle systems. In this case the mathematical results describing the system behavior, when
coupled with the appropriate data analytics methods, are likely to hold the key to informing
policies that encourage consumers to increase the healthy behaviors needed to push the
overall community towards a healthier state. The use of such approaches requires us to
bridge the gap between the theory and practice of data science and their capacities to identify and
understand options to make informed decisions.
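The premise that a mild, well-timed input can move a system to a different equilibrium can be illustrated with a toy calculation. The sketch below is not the cluster's model; it is a deliberately simplified mean-field caricature of an interacting particle system, in which x is the fraction of a community with a healthy behavior. The bistable drift term, the 0.5 tipping threshold, the starting fraction of 0.3, and the size and duration of the nudge are all hypothetical values chosen for illustration.

```python
# Toy mean-field sketch (illustrative assumptions throughout, not the report's model).
# x is the fraction of a community showing a healthy behavior, between 0 and 1.

def drift(x, threshold=0.5):
    """Hypothetical peer-influence term: below the threshold the healthy
    behavior erodes, above it the behavior spreads on its own."""
    return x * (1.0 - x) * (x - threshold)


def simulate(x0, control, steps=2000, dt=0.1):
    """Forward-Euler integration of dx/dt = drift(x) + control(t),
    with x clamped to the interval [0, 1]."""
    x = x0
    for k in range(steps):
        x += dt * (drift(x) + control(k * dt))
        x = min(1.0, max(0.0, x))
    return x


def no_intervention(t):
    return 0.0


def mild_pulse(t):
    # A small constant nudge applied only for the first 20 time units.
    return 0.06 if t < 20.0 else 0.0


uncontrolled = simulate(0.3, no_intervention)
nudged = simulate(0.3, mild_pulse)
print(f"without intervention: {uncontrolled:.3f}")
print(f"with a mild, temporary nudge: {nudged:.3f}")
```

In this sketch the uncontrolled community drifts toward the unhealthy equilibrium, while the temporary nudge carries it past the tipping threshold, after which peer influence sustains the healthy state with no further input. Real applications would require estimating the drift term from data rather than assuming it.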
Data Analytics in Climate and the Environment to meet Discovery
Theme Challenges
The profusion of data that is being generated to
measure trends in climate and environmental
change spans from global to ecosystem to
molecular scales, including weather and climate
sensing and modeling, water and land use/land
cover satellite data, aerial and ground based
sensing systems, and multilevel –omics data.
The big ideas proposed for a data analytics
Climate and Environment core recognize and
build upon these current and anticipated data
streams through an existing network of over 200 faculty who contributed to big ideas and whose
collaboration, as part of both intramural and extramural partnerships, will be enhanced to meet
the Discovery Theme challenges.
Major earth systems proposed for focus and integration through data analytics include
fundamental climate and weather processes; watersheds and the land-water interface; foodsheds
and agriculture, particularly plant systems that are the foundation for all food production and
greenhouse gas sequestration; cities and the built environment; and global to regional scale
integration across these major systems. Specific research activities in data analytics include data
generation, processing and manipulation, integration, analysis, modeling, decision support, and
policy. The major systems and levels of data analytics, along with specific technologies and
research themes, create a matrix framework for the research areas that will build data analytics
capacity for climate and the environment.
Proposed research themes will build and synergize current strengths and lead to global solutions
to challenges in climate and the environment. Climate science has been about big data and
models for decades. While its predictions are gaining credence and sophistication, research is
needed to better understand the complex atmosphere to geosphere relationships and translate
predictions to adaptation and mitigation strategies. Technological aspects of sensing and
monitoring systems as well as data integration, analysis and modeling at global to local scales are
key to the science envisioned for this cluster and provide strong linkage to the Complex Systems
and Network Science core. Social science, including demographic and behavioral changes
related to climate, ecosystem services and ethics of data ownership and integration among
individuals and corporations, in precision agriculture for example, also form a strong link to the
Foundations core. Health is impacted by diet and food security, environmental impacts on
plants, pathogens, water quality and human demographics, creating strong linkage to the Health
and Well-being core and related Discovery Theme. Areas of research more unique to climate and
environment issues, and addressing the Energy and Environment and Food Production and
Security Discovery Themes, include the interfaces between the atmosphere, oceans, and polar
regions with terrestrial dynamics in watersheds, foodsheds and agriculture, built urban, and
associated landscapes and land uses. Genomics and environmental data on the critical plant
systems upon which food, materials and renewable energy rely provide opportunities for
adapting plant systems to a changing climate and mitigating climate change through their role in
C and N cycles. Cities and the built environment present their own set of data and challenges
including optimizing energy and food distribution across foodsheds, while managing watersheds
to provide downstream ecosystem services.
Proposed faculty hires will be expected to fill gaps and create synergy with the large existing
faculty and partner network in the Climate and Environment core at The Ohio State University.
Integration and interdisciplinary networking within and among the levels of analysis, major earth
systems, technologies, and topical research areas described above will provide an exciting,
compelling and productive environment for research, teaching and outreach, and position The
Ohio State University for meeting the major challenges inherent in our Discovery Themes.
Complex Systems and Network Science: Moving Beyond Disciplinary
Boundaries to Unlock Transformational Solutions
Complex Systems and Network Science constitute an interdisciplinary area of inquiry that
transcends traditional knowledge domains by focusing on the fundamental interdependencies
of components within systems. Examples abound from social networks to coupled human and
natural systems, from financial networks to disease systems, and from telecommunication
networks to energy and power systems. It is the interconnection among these components that
often sits at the heart of our most vexing global grand challenge problems, including climate
change, energy demands, food insecurity, health and wellness, and livelihoods and poverty.
Accordingly, the study of complex systems and networks – understanding their intrinsic
[Figure: Complex Systems and Network Science. Recoverable labels: Network Science &
Behavioral Sciences; Policy, Ethics, Governance; Social and Behavioral Systems & Networks;
Biological and Health Systems & Networks; Coupled Human and Natural Systems;
Sustainability Science; Network Visualization; Data Mining & Statistical Learning on Networks;
Network Data Management; Systems Modeling & Dynamics; Network Analytics: Algorithms,
Models & Systems; Bridging Faculty]
properties, changes to their structure over time or due to external factors, multi-scale behavior of
individuals to coarser grained modular communities – can afford important insights about optimal
strategies for tackling such grand challenges.
Complex systems are, by their very nature, dauntingly difficult to describe and even harder to
explain. While scientific advances have improved our understanding of various component parts,
our pursuit of deeper knowledge has been stymied by a failure to understand the connections
across these component parts. Currently our ability to produce and store data that relate to such
complex and networked systems has far outstripped our ability to analyze and utilize this data to
derive actionable insight. The crucial insights in such data often reside in the implicit
interconnections or explicit relationships among individual entities. Further advances in the field
of Complex Systems and Network Sciences depend on our ability to harness the increasing
availability of vast amounts of data, along with more sophisticated computing power and
modeling techniques, to develop and test new theories of interconnectedness in complex systems.
The challenges are manifold and include the ability to manage, model, visualize, and analyze
large scale complex systems and networks with rigor while facilitating effective decision making,
the ability to abstract common concepts across fields to realize new ways of thinking about
problems, and the ability to analyze such data in the presence of noise, variations in model
assumptions, dynamics and uncertainty. Bridging the complexity chasm in Systems and Network
Sciences requires not only the ability to handle the data deluge (storage and representation),
although this is an essential prerequisite, but also the ability to integrate information in the
presence of uncertainty and complex dynamics while making effective use of insights specific to
relevant scientific fields of inquiry.
Complex Systems and Network Sciences represent one of the most promising scientific
approaches for deriving transformational analytic solutions to 21st century problems. Confirming
this judgment, Network Science is recognized as one of four research frontiers in the U.S.
National Science Foundation’s agenda-setting statement Rebuilding the Mosaic (2011). The
Executive Summary of this report additionally stresses that: “Future research will be
interdisciplinary, data-intensive, and collaborative”, which is what this project sets into motion
for Ohio State University.
The establishment of a Data Analytics Collaborative with Complex Systems and Network
Sciences at its core will provide a mechanism for scholars to interact beyond disciplinary
boundaries to identify, learn and develop new methods for data extraction and analysis that are
guided by and inform Complex Systems and Network Sciences theories and methods. The
overarching goal is that Ohio State becomes a leader in an area that is certain to be central to all
domains of science during the next few decades. Doing so will be critical for transformational
solutions to 21st century grand challenges in areas such as Climate Change, Energy and
Sustainability, Food Security, and Health and Wellness.
The figure represents an integration of ideas in Complex Systems and Network Science and lays
out suggested hires that will come from a wide variety of disciplines. Central to accomplishing
the goals and big ideas will be hires who can integrate both methods and substance in the quest
for transformative solutions.