Executive Briefing Series
(Volume 6, Number 1)
January 2013
Big Data and Business Analytics:
Realizing Opportunities
An Executive Summary of the November 16th, 2012 Workshop
written by Dr. Erran Carmel and Mr. Michael Carleton
edited by Dr. Gwanhoo Lee and Ms. Marianne Du
Contents
1. Presentations
 Michael Brown, Chief Technology Officer, comScore, Inc.
 Michael W. Carleton, Senior Research Fellow, American University
(Former CIO, U.S. Department of Health and Human Services)
 Dr. Erran Carmel, Professor, American University
 Jill DeGraff Thorpe, Vice President, Strategic Initiatives & General
Counsel, AFrame Digital
2. Group Discussion
Facilitated by Michael Carleton, Senior Research Fellow, CITGE
Foundations and Questions
By Dr. Erran Carmel, Professor, American University
Michael W. Carleton, Senior Research Fellow, American
University, (Former CIO, U.S. Department of Health and
Human Services)
Carmel set the groundwork, giving the audience some background on Big Data. He used this definition:
Big Data – the amount of data just beyond technology's
ability to store, manage, and process efficiently
Carmel asked: How big is Big? Carleton later called it the data deluge. At the organizational level, "Currently Big Data refers to data volumes in the range of exabytes (10^18) and beyond." He estimated that 2 zettabytes (2x10^21) of data were created in 2011 alone. The numbers are staggering: Facebook creates 10 terabytes of data every day. The Large Hadron Collider generates 40 terabytes per second. 30 billion RFID tags are produced every year. Mammograms create vast volumes of data, but their interpretation is uncertain. We are now in the early Industrial Revolution of Data. Data is the new "raw material of business," says the Economist magazine. The rate of data generation exceeds storage capacity, and certainly exceeds our ability to use it.
Carleton pointed out that we often get large volumes of data but don't yet know how to interpret them. The public assumes that all this data will somehow make us "better," but we still need to find the sweet spot between the resources available for data collection and the value of the data.
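
To put those magnitudes in perspective, here is a quick back-of-the-envelope calculation using the estimates quoted above (the figures are the speakers' estimates, not precise measurements):

# Back-of-the-envelope comparison of the data volumes quoted above.
TB = 10**12   # terabyte, in bytes
EB = 10**18   # exabyte
ZB = 10**21   # zettabyte

created_2011 = 2 * ZB          # ~2 zettabytes created in 2011 (estimate)
facebook_per_day = 10 * TB     # Facebook: ~10 TB/day
lhc_per_second = 40 * TB       # Large Hadron Collider: ~40 TB/s

print(f"Facebook, one year: {facebook_per_day * 365 / EB:.4f} exabytes")
print(f"LHC, one day (if every byte were kept): {lhc_per_second * 86400 / EB:.1f} exabytes")
print(f"Facebook's share of all 2011 data: {facebook_per_day * 365 / created_2011:.2e}")

Even a whole year of Facebook's output is a rounding error against the 2011 total, which is the point of the "deluge" framing.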
Do you know the Three V's of Big Data? You should know this one by now; it is in the introduction to every big data discussion:
 High Volume
 High Velocity
 High Variety – data are very complex (from sensors, the internet, etc.)
Do you know the Types of Analytics? You should also know this one by now; it is also in the introduction to every big data discussion:
 Descriptive – e.g., medical
 Estimative
 Predictive – business, politics (Nate Silver's predictions about the Obama win)
 Prescriptive – medical
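
As a toy illustration of where descriptive analytics ends and predictive analytics begins (hypothetical numbers, not from the workshop):

# Descriptive vs. predictive analytics on a toy monthly-sales series.
sales = [100, 110, 125, 130, 150, 160]   # hypothetical data
n = len(sales)

# Descriptive: summarize what already happened.
mean = sum(sales) / n
print(f"mean monthly sales: {mean:.1f}")

# Predictive: fit a least-squares trend line and extrapolate one month ahead.
x_bar = (n - 1) / 2
slope = (sum((x - x_bar) * (y - mean) for x, y in zip(range(n), sales))
         / sum((x - x_bar) ** 2 for x in range(n)))
print(f"predicted next month: {mean + slope * (n - x_bar):.1f}")

Prescriptive analytics would go one step further and recommend an action based on that forecast.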
Carleton offered several observations about Big Data from a practitioner and public sector point
of view and emphasized the opportunities to create commercial business value and social value
while avoiding offending consumers and citizens.
As an example of the potential pitfalls in offending customers and citizens, he raised the need to ask more questions about security/privacy before proceeding with aggregations and mash-ups from disparate data sources, specifically: How can we share data without compromising security/privacy?



 How do non-government actors use big data? What are the security/privacy implications? (Take note: from a legal perspective, we should fear "Little Brother" (i.e., private business entities) uses of personally identifiable information more than "Big Brother" (i.e., government) uses.)
 How do we address the severability of data creation from value creation?
 What influences consent rates for data creation/sharing?
Each country defines privacy and propriety differently, with differing frameworks around whether personally identifiable data is protected as property or as a human right.
Carleton shared more examples of "the Data Deluge": high rates of data capture (e.g., there are already 1 billion transistors per human on the planet), the growing gap Tim O'Reilly measures between "information created" and "available storage," and the explosion of unstructured data like graphics and video files. He then offered an overview of the work by the National Institute of Standards and Technology to come to terms with the emerging technologies in a manner that balances beneficial opportunities against avoidable risks.
After touching upon NIST's differentiation between SQL and NoSQL Big Data frameworks, Tene and Polonetsky's urgent call for an update of privacy protections, and Armour and Kaisler's taxonomy of types of analytics, Carleton shared some anticipated findings from work in progress on Big Data business cases. He asked about the widely varying business models emerging around Big Data, especially data that comes from the public sector and is put to use (often in creative secondary and tertiary ways) by private-sector businesses and academic centers. These include, but apparently are not limited to:
 Using dynamic data streams (a heavy flow of real-time data)
 Keeping an exclusive "treasure trove" of data and selling it to others
 Making data free while keeping the algorithms proprietary
 Building a big-data-enabled value chain
He then offered a concise contrast between Federal government stewardship of big data sets and commercial-sector exploitation of big data sets:
 Many constraints (legislative and regulatory) on the collection of information by the Federal government
 Politicians often restrict data use to avoid regulation and gain votes
 Government can only collect data for defined purposes and only retain it for limited periods
 Agencies are dissuaded from multiple uses of data
 Sustaining Federal government information stewardship requires cooperation from citizens and government transparency
Next Carleton examined the ease and consequences of decoupling value creation and value extraction in the Information Age, touching upon:
 Value creation vs. value extraction
 Value creation using analytics on others' data – avoiding the costs of generating the data
 Value extraction – running algorithms to finely target some sectors and avoid or intentionally neglect others
 A potential conflict in which creators don't want to share with extractors
 Should we treat taxpayer-funded data sets as a public asset to be stewarded like public lands or water resources?
Carleton gave an example that is close to his heart from DHHS: the NHANES National Youth Fitness Survey (NYFS), which gathers data on the exercise and nutrition habits of U.S. youth through interviews and fitness tests.
Case 1: comScore
By Michael Brown (Founder of comScore, entrepreneur for two decades)
comScore, based in Reston with 1,000+ employees, is a huge data collector and analyzer. It is essentially a data factory and delivers "digital intelligence." The growth is staggering: 250 million records created every day in 2005; 1.6 billion created daily in 2009; 2.5 billion per hour in 2012.
As data quantities increased after 2005, comScore realized that it needed to update its methodology. It created tracking code for websites; every visit creates information.
The diagram presented showed the two main flows routed into one of two big data systems: Greenplum – a big SQL database (the Greenplum database is a lab for experimentation/ideation, not production) – and Hadoop. Both of these converge to a data warehouse in Sybase.
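
Schematically, the routing looks something like the toy sketch below; the flow names "panel" and "census" are illustrative shorthand for the two sources, not comScore's actual code:

# Toy sketch of the two-path architecture: each incoming flow lands in one of
# two big data systems, and both later converge into the data warehouse.
greenplum, hadoop, warehouse = [], [], []   # stand-ins for the real stores

def ingest(record):
    # Exploratory/panel work goes to the SQL store; bulk tag logs go to Hadoop.
    if record["flow"] == "panel":
        greenplum.append(record)   # lab for experimentation/ideation
    else:
        hadoop.append(record)      # high-volume batch processing

def consolidate():
    # Both systems converge into the (Sybase) warehouse.
    warehouse.extend(greenplum)
    warehouse.extend(hadoop)

ingest({"flow": "panel", "user": 1})
ingest({"flow": "census", "site": "example.com"})
consolidate()
print(len(warehouse), "records in warehouse")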
Now come the big business challenges. Can we gain new value/insight from data beyond its original intent? Can you gain new revenue? Solve a new problem? comScore is now able to develop an actual revenue-generating product in 4 hours using ~100 lines of PSQL.
Device Essentials shows some interesting trends. comScore used the dataset to determine the non-PC device share of online traffic by country. For example, it found that mobile is very popular in Singapore, less so in North America, and that iPhone users prefer Wi-Fi/LAN while Android users prefer mobile networks.
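
The kind of query behind such a product is, at heart, a grouped aggregation. A minimal sketch follows (hypothetical sample records; the real work runs as ~100 lines of PSQL over billions of rows):

from collections import defaultdict

# Hypothetical page-view records: (country, device_class).
views = [
    ("SG", "mobile"), ("SG", "mobile"), ("SG", "pc"),
    ("US", "pc"), ("US", "pc"), ("US", "mobile"),
]

totals = defaultdict(int)
non_pc = defaultdict(int)
for country, device in views:
    totals[country] += 1
    if device != "pc":
        non_pc[country] += 1     # count tablets, phones, consoles, etc.

for country in sorted(totals):
    print(f"{country}: non-PC share of traffic = {non_pc[country] / totals[country]:.0%}")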
The audience asked:
 Is privacy a concern? Brown answered that comScore designs its systems to destroy the native IP address (obfuscation) so information can be logged without violating privacy laws; other types of personal information are stripped from the dataset before processing (see the sketch after this list).
 What has to be disclosed to users? Brown answered that there is an opt-out page, and clients must sign agreements on data use.
 How do you account for user self-selection during data generation? Brown answered that they take a cross section and bundle it with demographics.
 What about user preferences (e.g., Washington Post vs. New York Times)? Brown answered that data is compartmentalized to avoid bias.
 What development framework do you use? Brown answered Agile/scrum with a 250-member team, about 30% software engineers.
 How does the experimental environment transfer to an individual company? If companies want insights from big data, can they run analysis at smaller scales? Brown answered:
o Software has historically been requirements-based
o Big data demands a flexible framework that can handle undiscovered requirements
o Large flexible datasets require flexible algorithms
o Tools are less important than the philosophy of data selection
 How will technology need to evolve to accommodate large data sets?
o Speed is absolutely core (refer to Dr. Black's last lecture)
o Must refine ways to make algorithms more efficient
 What are the interesting research areas? What do you need to know?
o Parallelized algorithms
o Sampling methods
o Translating anonymous data to discrete consumers
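
A common way to implement the IP obfuscation Brown described is to replace the native address with a salted one-way hash before anything is written to disk. The sketch below shows that general technique; the salt, truncation rule, and field names are illustrative assumptions, not comScore's actual design:

import hashlib

SALT = b"rotate-me-periodically"   # illustrative; rotating the salt limits long-term linkage

def obfuscate_ip(ip):
    """Replace a native IP address with a salted one-way hash."""
    digest = hashlib.sha256(SALT + ip.encode()).hexdigest()
    return digest[:16]   # truncated: enough to count distinct visitors, hard to reverse

def log_visit(ip, url):
    # The raw IP address never reaches the stored record.
    return {"visitor": obfuscate_ip(ip), "url": url}

print(log_visit("203.0.113.42", "http://example.com/page"))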
Case 2: AFrame Digital
By Jill DeGraff Thorpe, Vice President, Strategic Initiatives & General Counsel
Jill is from a healthcare startup that revolves around new ways of data capture and analysis. While it isn't a Big Data company yet, it will be soon. AFrame Digital addresses the critical and growing need for long-term care for an aging population. By 2020, an estimated 12 million older Americans will need long-term care.
The central technologies are a wristband device (pictured) and a mesh
network in the house (pictured further below). The firm is developing
non-intrusive technology for real-time fall detection and emergency services calling. Fall sensors can also capture rates of exertion. The mesh network in the home environment triangulates the user's movements in the house to identify patterns of behavior (using the bathroom too often, visiting the kitchen at night, etc.).
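
A minimal sketch of the on-device logic this implies, with a made-up accelerometer threshold (AFrame's actual detection algorithm is not public):

import math

FALL_THRESHOLD_G = 2.5   # illustrative threshold in g; a real device tunes this carefully

def magnitude(ax, ay, az):
    """Resultant acceleration from a 3-axis accelerometer, in g."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def check_sample(ax, ay, az):
    # Battery-friendly design: analyze locally, and only open a network
    # connection when a reading crosses the alert threshold.
    if magnitude(ax, ay, az) > FALL_THRESHOLD_G:
        send_alert()

def send_alert():
    print("threshold exceeded: opening connection, notifying caregivers")

check_sample(0.0, 0.1, 1.0)   # normal posture: no alert
check_sample(0.4, 0.3, 3.2)   # spike consistent with a hard fall: alert

This threshold-then-transmit pattern also answers the Wi-Fi question raised later in the Q&A: the microcontroller analyzes locally and connects only when it must.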
For example, a 91-year-old woman fell and became unconscious, but the system detected the fall and warned her family, got her to the hospital, and gave her doctor three months of sensor data to guide treatment. This illustrates the need to develop ways for an aging and unhealthy population to live with dignity while reducing unsustainable healthcare costs. It is especially important for underserved communities.
Jill asked: How can we bring intelligent alerting and analytics to relevant data? First, medical use is the most important issue for data systems. For example, IBM Watson is used for clinical diagnosis and care management instructions. Providers are beginning to recognize that data gathered in the home is more important than clinical data. Such next-generation providers are developing holistic, 24/7 medical data collection. The goal is to identify dangerous trends and warn users/doctors. In contrast to the older "Lifeline," which carried a stigma, AFrame must develop non-intrusive, socially acceptable data gathering with an attractive value proposition for the consumer.
How can researchers integrate with this? NIH and DARPA grants are a good foundation. One example project: effectiveness research to see if real-time continuous monitoring has a positive impact on user health and treatment.
A key weakness in our healthcare system is that 5% of the population generates 50% of expenditures. Health reform aims to improve the return on those expenditures. Spending rates level off after childhood, then spike after age 65. Where do these high expenditures occur? In transfers across care settings (the highest risk for re-admittance to the hospital). One of those settings is the home, so we need to adopt the home as a primary point of care. There is emerging consensus among doctors and medical researchers on the importance of home care, and a need to migrate from episodic to continuous care. Tele-monitoring enables new models of care; an integrated approach is expected to enable doctors to deliver new outcomes.
The audience had many questions about this game-changing technology for health and for health data.
 What about monitoring patient compliance? Jill responded that medication adherence is a $180 billion annual market. Many technologies are in use, but none incorporate biofeedback, so none are perfect; AFrame's product monitors biofeedback (heart rate, blood pressure) to track medication consumption.
 What types of data do you collect? There are three dimensions: stability (accelerometer), about 10% of the data; biometrics – any wireless sensor device can be pulled for vitals-tracking data; and activity – rates of exertion and patterns of behavior.
 Does the device have Wi-Fi? No, because it would create too much battery drain. A microcontroller does some analytics on the device and then opens a network connection when a threshold is reached.
 How mainstream is this? Is it limited to the upper range of the market? There is interest across the market, but the firm is focused on demand driven by health reform and pay-for-performance ideas. The PACE program is used as a model for super-generalized data. It is very complicated for the indigent elderly to navigate the Medicare/Medicaid framework, so the firm developed a per-user model. New medical payment and treatment models will spur demand.
 What about coordinating the MAC addresses of data points? A good idea, but also limited by battery life.
 Who pays for this? Is it reimbursable by insurance? Providers have an inherent financial interest in keeping clients well and out of the hospital. Obamacare is driving a switch in the model by shifting risk to providers, which creates an incentive for insurers to focus on general health. Insurers, providers, and large employers may use it to reduce their risk positions. The value proposition will ultimately come from health providers with local geographic monopolies (they can't steal customers from competitors, so they must boost efficiency and reduce risk to increase profits). And in a nod to Big Data: there are reams of opportunities in effectiveness research!
Carmel – Big Data in Academic Research
In this young area there are two vectors. Whereas the Management perspective is mainly advocacy-based research with very little empirical work, the Computer Science folks are all over this: very gung-ho about Big Data, with great innovation in algorithm design, and output that is orders of magnitude greater (100+ articles on Big Data are added to the ACM library every month). At least three major workshops have been held around the world. Carmel gave a sampling of research and topics in this field: CluChunk looks at clustering large-scale data – how to take large amounts of messy data and organize it into manageable and efficient chunks to optimize performance; it is CS-focused and examined the effectiveness of NoSQL databases for managing big data. Flex-KV, with authors from IBM and Carnegie Mellon, looks beyond NoSQL solutions, proposing a flexible key-value storage system.
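
In the spirit of that work, the simplest way to break a messy stream into manageable, evenly sized chunks is hash partitioning on a record key. The toy below illustrates the general idea (it is not the CluChunk or Flex-KV algorithm itself):

import hashlib
from collections import defaultdict

N_CHUNKS = 4   # illustrative; a real system derives this from cluster size

def chunk_of(key):
    """Assign a record to a chunk by hashing its key (stable across runs)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % N_CHUNKS

records = [("user42", "click"), ("user7", "view"), ("user42", "buy"), ("user99", "view")]
chunks = defaultdict(list)
for key, event in records:
    chunks[chunk_of(key)].append((key, event))   # same key always lands in the same chunk

for cid in sorted(chunks):
    print(f"chunk {cid}: {chunks[cid]}")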
Why is the management perspective lagging? Probably because it tends to look at the past rather than the future. Our field needs to incorporate predictive analytics for theory generation across academia. One article Carmel does recommend is the LaValle et al. (2012) survey article from HBR, a study sponsored and run by IBM looking at large corporate users of Big Data.
Carleton – The Philosophy of Big Data
Mike ended by discussing the philosophy of Big Data. Current models focus on building a digital platform for a business and owning part of the architecture. The key question in his mind is: what are the data transmission requirements for business success? Businesses must decide how to handle big data – develop their own framework or hire experts? Many opportunities are being created; the potential for value creation is nearly infinite, and the rewards are so great that their distribution is nearly arbitrary.
Open Discussion
Question for audience: How has big data affected your business?
 Huge expense up front (JHU spent $600 million on its record system, the single biggest expense after new buildings), so projects must be carefully selected to maximize the value returned.
 Start with a set of problems; find/buy/access data; then build models and potential solutions. But some problems don't have solutions – the solution perturbs the problem. This requires constant, incremental data gathering. The extremes are data mining (no domain knowledge in the algorithms; process everything) and its opposite (look at the details and you miss the forest, but look at the forest and you lose the details).
 Need to pay attention to government use of big data – it is already being used in the intelligence community and healthcare, but academics should look for opportunities in other federal agencies, especially in light of tightened fiscal budgets. Is there data already owned by agencies that could be reprocessed for new value (especially "dark data")? Are there nuggets in stored data that can be analyzed anew? Sensor data from flights?
o Mike: The Federal government has taken the position that if it exposes data to the public, someone will use it. But should the feds try to add value to the data? There is a big ideological divide. Who gets to repackage and sell data once it has been collected? The government is taking a hands-off approach to data value and leaving it to the private sector. But there are explicit restrictions on the usage of medical and financial data, so agencies are averse to exposing it to the public. Others are so overwhelmed by the volume of data that they do not have the resources to extract additional or secondary uses of it. There is also a fear that politicians and the public will misinterpret data due to a poor understanding of statistics and cause more problems. But there is still a possibility of creating a value chain where the government collects large data sets and makes them available for public processing.
o A big seller and driver for openness will be fraud prevention. Once the value has been proven, the public will want to create other value, such as disease prediction.
o Government often collects large volumes of data but only uses the small subset related to its mission. The government does not see data collection as its role.
o A former astrophysicist with Crohn's disease keeps an "amazingly complete" data set about his digestive system.
o There are many restrictions on when and how government can collect data.
o The regulatory framework is still based on the Paperwork Reduction Act, from when data collection was expensive.
o Rules are focused on recurring rather than opportunistic collection.
o So much big data is being collected that the challenge is sifting through large sets to figure out what's valuable. Storage growth is very important. How can businesses educate customers on different types of data?
o The challenge is developing good rules on how and where to use datasets, and then communicating them to customers.
o The ideal big data professional is a skilled programmer who can work with complex technical tools and a mathematician who can tailor and develop algorithms.
o Maybe that's too many skills for one person to master. The key to managing big data may be assembling strong cross-disciplinary teams.
o Businesses want to capture every single customer interaction, down to discrete page views, and develop tools to react in real time.
o Business readiness to embrace big data is very important too. Good data does not guarantee good decisions, and giving companies more data will not always lead to better decisions. Integrating big data is as much about refining management processes as it is about developing new technology.
o Some businesses are paralyzed by the scope and difficulty of big data systems – they may have access to useful datasets but lack the technical or managerial skills needed to leverage them.
o Organizations need to develop ways to openly share large datasets – at least within the organization. Again, see the importance of cross-disciplinary teams.
o See the Cheesecake Factory for a good example of a big data implementation, as described in this article: http://www.newyorker.com/reporting/2012/08/13/120813fa_fact_gawande
o Tools are not suited to the average person. Developing new tools that allow people to visualize and analyze big data is key to fully leveraging it.
o One of the things we need to think about is: how much data is enough? You can gather so much data that you cannot effectively process it. There is a point where additional data produces diminishing returns. We need to develop ways to measure the "goodness" of data – both in terms of quality and value delivered.
o It is also important to account for the motives of data creators. Are they truth-seeking or do they have an agenda? Different specialists can reach different conclusions from the same dataset even if motives are pure. We are operating under the theory that giving experts more data will guarantee better conclusions, and there is evidence that this is not true.
o What about NP-hard problems, where more data will have little or no meaningful impact?
o Behavioral economics shows a degradation of people's reasoning skills as the size of a data set increases.
o Companies would be more willing to share data if there were a stronger framework for them to share the value derived from it.
o The rising importance of big data highlights the importance of strong search algorithms.
Presenter Bios
Dr. Erran Carmel
Professor
American University
Professor Carmel teaches information technology with a specialty in globalization of technology. He
studies global software teams, offshoring of information technology, and emergence of software
industries around the world. His 1999 book "Global Software Teams" was the first on this topic and is
considered a landmark in the field, helping many organizations take their first steps into distributed tech
work. His second book "Offshoring Information Technology" came out in 2005 and has been especially
successful in outsourcing / offshoring classes. He has written over 80 articles, reports, and manuscripts.
He consults and speaks to industry and professional groups.
Michael Carleton
Senior Research Fellow
American University
(Former CIO,
U.S. Department of Health and Human Services)
Michael W. Carleton served as Chief Information Officer (CIO) for the United States Department of
Health and Human Services (HHS) and the General Services Administration (GSA). Mr. Carleton holds a
Master of Science in Information Resources Management from Syracuse University and a Master of
Public Administration from Northeastern University. He is also a distinguished alumnus of the National
Defense University’s Information Resources Management College and the Society for Information
Management International's Regional Leadership Forum. He is a past president of the Capital Area
Chapter of the Society for Information Management.
Michael Brown
Chief Technology Officer
comScore, Inc.
Michael Brown was a founding member of comScore, Inc. in 1999. He leads the company’s technology
efforts to measure Internet and digital activities. He has been responsible for over 17 patent applications
at comScore, three of which have already been issued by the U.S. Patent and Trademark Office. Prior to
joining comScore, Mike worked on projects that included a large help desk deployment and
modernization effort for Deutsche Bahn in Frankfurt, Germany. In 1993, Brown cofounded Pragmatic
Image Technologies, a consulting group focused on implementation of IBM’s ImagePlus. One of the core
projects completed was the successful rollout of ImagePlus at Pennsylvania Blue Shield to over 1,200
users, resulting in the largest image workflow installation on the East Coast at that time. Brown holds a
bachelor’s degree in computer science from the University of Maryland and a master's in computer and
information science from Hood College.
Jill DeGraff Thorpe
Vice President, Strategic Initiatives & General Counsel
AFrame Digital, Inc.
Jill DeGraff Thorpe brings 20 years’ experience advising public and private companies in corporate,
strategic partnering, M&A, structured finance, technology acquisition and private equity
transactions. She is familiar with all corporate operating functions and has advised on key matters in
intellectual property, product development, sales and marketing, contracts, corporate compliance, risk
management and human resources. Ms. DeGraff Thorpe was Associate General Counsel for CyberCash,
serving as its corporate and securities counsel. Before that, she practiced law at Morrison & Foerster,
specializing in corporate, securities and financial transactions. She holds a B.A. cum laude from
Wellesley College and a JD from The University of Virginia School of Law.
Confirmed Attendees (ordered by affiliation)

Jill Thorpe – AFrame Digital, Inc. – Vice President, Strategic Initiatives & General Counsel
Engin Cakici – American University – Assistant Professor
Mike Carleton – American University – Senior Research Fellow
Erran Carmel – American University – Professor
Mary Culnan – American University & Bentley University – Senior Research Fellow and Professor Emeritus
William DeLone – American University – Professor
J. Alberto Espinosa – American University – Associate Professor
Keyvan Gheissari – American University – Student
Michael Ginzberg – American University – Dean, Kogod School of Business
Itir Karaesmen-Aydin – American University – Assistant Professor
Jill Klein – American University – Director, Professional MBA; Executive in Residence, Information Technology
Irene Lam – American University – OIT - Enterprise Systems
Gwanhoo Lee – American University – Associate Professor and Director, CITGE
Kelsey Lee – American University – Student
Phyllis Peres – American University – Senior Vice Provost and Dean of Academic Affairs
Kamalika Sandell – American University – Associate CIO, Office of Information Technology
Matthew – American University – Student
Bob Sloan – American University – Faculty
Sandra Smothers – American University – Student
Paritosh Uttarwar – American University – Student
Margaret Weber – American University – Student
Larry Fitzpatrick – Computech, Inc. – President
Filippo Morelli – Computech, Inc. – Chief Technology Officer
Michael Brown – comScore, Inc. – Chief Technology Officer
Mindy Ko – COTELCO Center – Graduate Research Associate
Yvonne Chaplin – CSC – Partner
Patrick Murray – CSC – Consultant
Vanessa Sherman – CSC – Partner
Ron Renjilian – Emerios Government Services – President
Shannon
Mohamoud Jibrell – Howard Hughes Medical Institute – VP for Information Technology
Steve Kaisler – i_SW Corporation – Senior Scientist
Peter Keen – Keen Innovations – Chairman
Curtis Generous – Navy Federal Credit Union – Chief Technology Officer
Carol Hayes – Navy Federal Credit Union – Assistant Vice President, Enterprise Data
Susan Bennett – Paragon Technology Group, Inc. – Program Manager
Bill DeLeo – SAS – Director, Release Engineering
Jason Bongard – Strategy Services
Prasanna Lal Das – World Bank – Lead Program Officer
CITGE Executive Team
Dr. William H. DeLone
Executive Director, CITGE
Professor, Kogod School of
Business, American University
Dr. Gwanhoo Lee
Director, CITGE
Associate Professor, Kogod
School of Business, American
University
Dr. Richard J. Schroth
Executive-in-Residence, Kogod
School of Business, American
University
CEO, Executive Insights, Ltd.
Michael Carleton
Senior Research Fellow
Former CIO, U.S. Department of
Health and Human Services
Dr. Frank Armour
Research Fellow
CITGE Advisory Council
Steve Cooper
CIO, Air Traffic Organization,
Federal Aviation Administration
Bill DeLeo
Director of Release Engineering
Architecture, SAS
Associated Faculty and Research Fellows
Dr. Erran Carmel
Professor, Kogod School of
Business, American University
Mohamoud Jibrell
CIO, Howard Hughes Medical
Institute
Dr. J. Alberto Espinosa
Associate Professor, Kogod
School of Business, American
University
Joe Kraus
CIO, U.S. Holocaust Memorial
Museum
Dr. Peter Keen
Distinguished Research Fellow
Chairman, Keen Innovation
Ed Trainor
former CIO, AMTRAK
Dr. Mary Culnan
Senior Research Fellow
Slade Professor of Management
and Information Technology,
Bentley College
Susan Zankman
SVP of Information Resources
Finance and Management
Services, Marriott International