Program Prospectus
Ph.D. in Analytics and Data Science
Last Updated: November 13, 2012
Executive Summary
The McKinsey Global Institute has estimated that the demand for deep analytical talent
in the United States will outpace the supply by almost 200,000 people by 2018. The
White House has launched a “Big Data Research and Development Initiative” to
“expand the workforce needed to develop and use Big Data technologies”. This
theme is echoed by Thomas Davenport’s recent article in Harvard Business Review
titled “Data Scientist: The Sexiest Job of the 21st Century”.
These studies – and many others – point to the need for universities to train “Data
Scientists”. However, no university in the country currently has a degree program in
Data Science – defined as the intersection of Statistics, Mathematics and Computer Science.
Kennesaw State University is proposing a Ph.D. in Data Science – the first of its kind in
the country.
The degree will train individuals to translate large, unstructured, complex data into
information to improve decision making. This curriculum will include programming,
mathematics, data mining, statistical modeling, and the mathematical foundations to
support these concepts. Importantly, it will also emphasize communication skills –
both oral and written – as well as applying results and tying them to business and research.
Because this degree is a Ph.D. (rather than a Doctorate in Data Science), it creates
flexibility for the student. Graduates can either pursue a position in the private or
public sector as a “practicing” Data Scientist – where the demand is expected to greatly
outpace the supply – or pursue a position within academia, where they would be
uniquely qualified to teach these skills to the next generation.
Kennesaw State University is well positioned to launch this degree. This is evidenced
by the unparalleled success of the MS in Applied Statistics – where graduates are in
great demand and continue to have 100% placement – and by the Minor in Applied
Statistics with undergraduate demand from every college across the university. The
Minor supports approximately 200 undergraduates every semester – making it the most
successful and sought-after Minor in the history of KSU.
The Ph.D. in Data Science will not only help to close the talent gap in the area of Data
Science, but will also continue KSU’s trajectory of regional and national recognition in
the area of applied analytics.
SECTION 1: Justification of Need
“I skate to where the puck is going to be, not where it has been.” – Wayne Gretzky
The United States Federal Government recently issued a press release addressing what
it sees as a growing critical shortage of data analysts and, on March 29, 2012, announced the
“Big Data Research and Development Initiative”. One of the main purposes of the
initiative is to “expand the workforce needed to develop and use Big Data technologies”.
The term “Big Data” is beginning to dominate descriptions of required skill sets across a
wide variety of disciplines and sectors of the economy. While the accepted definition of
Big Data is continuing to evolve, there is no question that related concepts are becoming
more prevalent and will play an expanding role in the future.
According to The Economist magazine, unmanned American military aircraft (i.e.,
drone aircraft) flying over Iraq and Afghanistan in a single year (2009) produced
approximately 24 years’ worth of video surveillance footage. This astonishing fact
highlights at least four major points about the new direction of how data is collected,
analyzed, and used:
1. Extraordinary, previously unimaginable amounts of data are being collected and
stored for subsequent analysis, and these data potentially contain significant and
meaningful information for society at large.
2. It is not feasible to manually review and/or analyze such massive data in a timely
manner even with a team of human analysts using traditional methods.
Computer-assisted semi- or fully-automated processes using new computational
and data mining methods are needed in order to extract useful information from
massive data sources in a timely manner.
3. In addition to massive amounts of traditional structured data (i.e., tabular data),
extraordinary amounts of unstructured, non-traditional data such as video
footage, audio recordings, and unstructured text are being collected and stored.
Increasingly, these two very different types of data must be merged together in
systematic ways in order to obtain useful information.
4. Unlike in the past, data collection and analysis is no longer a purely academic
endeavor. Gathering and analyzing data to obtain useful information, most often to
support decision making, now takes place in almost every field and sector
imaginable, including the sciences, public health, the healthcare
industry, all aspects of business and finance (including retail, insurance,
marketing, the service industry, the credit industry, fraud detection, the
communications industry, etc.), psychology, education, public policy agencies,
government elections, and even national security and defense.
From these four points it follows that:
The next generation of statisticians will face very different challenges and issues than
previous generations of statisticians. As a result, the next generation of statisticians
needs a new set of knowledge and skills in order to effectively serve the data analysis
needs of the 21st century. These skills will incorporate more emphasis on applied
mathematics and on computer programming than has historically been the case – even
for applied statisticians.
The U.S. business community is also aware of this need: Hal Varian, Ph.D., the Chief
Economist at Google, Inc., states simply, “Data are available; what is scarce is the ability to
extract wisdom from them”.
In addition to the recognition of the talent shortage evidenced by the White House
Big Data Research and Development Initiative, the “Big Data Report” from the
McKinsey Global Institute (MGI) estimates that the demand for data analysts could
exceed the current supply by 140,000 to 190,000 positions by the year 2018 (see Figure
1). Figure 1 shows 440,000 to 490,000 total data analyst positions
projected for 2018, with only 300,000 trained analysts to fill those positions. In other
words, the demand for big data analysts could be 50 to 60% greater than its projected
supply by 2018.
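The arithmetic behind these figures can be checked directly from the quoted endpoints. This is a minimal sketch that assumes, as the wording above implies, that the gap is demand minus supply and that the percentage is measured relative to the projected supply of 300,000 analysts:

```latex
\begin{aligned}
\text{gap} &= \text{demand} - \text{supply} \\
\text{low estimate:}\quad 440{,}000 - 300{,}000 &= 140{,}000,
  \qquad \frac{140{,}000}{300{,}000} \approx 47\% \\
\text{high estimate:}\quad 490{,}000 - 300{,}000 &= 190{,}000,
  \qquad \frac{190{,}000}{300{,}000} \approx 63\%
\end{aligned}
```

Taken at face value, the endpoints imply a gap of roughly 47% to 63% of projected supply, a range that brackets the rounded 50 to 60% figure.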
FIGURE 1: The Talent Gap for Big Data Analysts
The Big Data MGI report also predicts differential gains as a result of the impact of big
data and its use across different sectors. According to MGI, finance and government
(Cluster B in Figure 2) are expected to benefit strongly from big data use in the future
whereas computer and electronic products and information sectors (Cluster A in Figure 2)
have already and will continue to experience substantial benefits from the impact and
use of big data.
FIGURE 2: Differential Potential Gains of Big Data by Sector
A brief survey of the diverse disciplines which have recognized the role of Big Data and
the changing role of analytics includes:
Customer relations management (CRM) is one of the most innovative and profitable
ways in which businesses use big data. CRM is essentially the business practice of
analyzing customer-centric big data to discover trends and use that information to
customize or personalize offers and communications with customers to optimize
business. CRM was once used only by Fortune 500 companies; however, now with the
proliferation of big data and reduced costs in collecting and storing it, all types of
companies are using it to optimize their business. In one example of a typical CRM
application, a U.S. bank used big data analytics to predict which product offer was most
likely to be accepted by a particular customer and thereby customize the next on-line
product offered to that customer in an effort to cross-sell to existing customers (Berry &
Linoff, 2000). This CRM initiative resulted in substantial gains in cross-selling and
therefore profits to the bank well above the cost of implementation. This is just one of
many examples of big data analytics in business. Others include fraud identification,
service rate estimation, predicting product failure, and optimizing direct mailing
campaigns. By all accounts, the main hindrance in CRM is a lack of
qualified data analysts (The Economist, 2010; Significance, 2012).
Healthcare & Public Health
The proper use of digitized medical records has the potential to revolutionize the
healthcare industry. Proper analysis of these records may be used to detect unwanted
drug interactions and/or side-effects, identify best practices in care (e.g., identify the
most effective drug therapies), and even predict the onset of certain diseases before
patients themselves are aware of symptoms (The Economist, 2012). In one example,
medical doctors and data analysts in Alabama developed automated infection
surveillance software that assists hospitals in identifying changes in nosocomial
infection (i.e., hospital-acquired infection) rates using massive data from Blue
Cross/Blue Shield of Alabama and statistical and data mining methods (Putman, 2003).
It has been estimated that nosocomial infections add as much as nine days to a patient’s
hospital stay, leading to more than $4 million per year in additional expense. This
infection surveillance software provides early warning to hospitals and allows them to
intervene in a timely manner. This is only one of many possible examples where nontraditional statistical work involving big data has made a substantial improvement in
healthcare quality and substantial savings to society.
According to the National Science Foundation (NSF, 2012) in the document entitled,
“Core Techniques and Technologies for Advancing Big Data Science & Engineering”
(NSF 12-499), the impact of big data is causing a paradigm shift in scientific and
biomedical investigation that is transforming the missions of a number of U.S. Federal
Government agencies:
Today, US government agencies recognize that the scientific, biomedical and
engineering research communities are undergoing a profound transformation with the
use of large-scale, diverse, and high-resolution data sets that allow for data-intensive
decision-making, including clinical decision making, at a level never before imagined.
New statistical and mathematical algorithms, prediction techniques, and modeling
methods, as well as multidisciplinary approaches to data collection, data analysis and
new technologies for sharing data and information are enabling a paradigm shift in
scientific and biomedical investigation. Advances in machine learning, data mining,
and visualization are enabling new ways of extracting useful information in a timely
fashion from massive data sets, which complement and extend existing methods of
hypothesis testing and statistical inference. As a result, a number of agencies are
developing big data strategies to align with their missions.
These examples and countless others highlight three common emerging themes:
1. Data is ubiquitous. All disciplines. All sectors of the economy.
2. Data is no longer considered a necessary cost to be managed down, but rather an
asset to be “mined” and leveraged.
3. All sectors are increasingly finding a dearth of analytical talent to support their
nascent but explosive analytical needs, particularly those related to Big Data.
In response to this, Kennesaw State University is proposing the development of a Ph.D.
in Analytics and Data Science. It is our position that the Data Scientist will be uniquely
positioned to fill the talent shortage as outlined above.
It is critical to note that we are proposing a Ph.D. program in Analytics and Data
Science rather than in Statistics.
A great deal of attention in the field of analytics is now focused on the role of the
Data Scientist –
From IBM - A data scientist represents an evolution from the business or data analyst role. The
formal training is similar, with a solid foundation typically in computer science and
applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong
application acumen, coupled with the ability to communicate findings…in a way that can
influence how an organization approaches a business challenge.
From Thomas Davenport, Senior Managing Partner at Accenture and author of
Competing on Analytics – “(Data Scientists) are not typical scientists…but rather hybrids of
science and computation. Somewhere along their career journeys they became interested in, and
good at, the manipulation of data. In fact, many of them really have ‘computational’ in front of
their scientific specialties: computational biology, computational ecology, etc. If you want some
evidence of this hybrid specialization, look at your favorite data scientist’s profile on LinkedIn -- the home, by the way, of some of the best data scientists around -- and check out the skills they
say they have. You’ll see “analytics” (quantitative analysis, statistical modeling, predictive
analytics, social network analysis, data mining, etc.) listed, of course. But you are also likely to
see SQL, Java, C, Python, R, distributed databases, and so forth. All of these skills actually are
found in one individual, and he seems typical of the breed…to my knowledge, no universities
have programs yet in big data analytics (though some are talking about them -- universities
typically don’t move too hastily).”
From Daniel Tunkelang, Chief Data Scientist at LinkedIn – “Strong analytical skills are a
given: above all, a data scientist needs to be able to derive robust conclusions from data. But a
data scientist also needs to possess creativity and strong communication skills. Creativity drives
the process of hypothesis generation, i.e., picking the right problems to solve that will create value
for users and drive business decisions. Communication is essential, because data scientists work
in horizontal roles and partner with groups across the entire organization. At LinkedIn, data
scientists collaborate with every other product group, as well as with sales and finance. Strong
communication skills are a must-have.”
From Steve Hillion, VP of Analytics at GreenPlum, as quoted in Forbes - “I’m sure in 30
years’ time, there will be lots and lots of degrees in data science and that’s where [data scientists
will] come from, but right now it’s coming from all these different buckets (math, computer
science, economics)…And, just as the early days of computing were born in the garages of
Silicon Valley do-it-yourself-ers, data science is likely to develop first in an ad-hoc, hands-on
It is our position that the intersection of skills outlined in multiple ways above is
brought together under the description of “Data Scientist” in a way that does not occur
in a traditional Statistics curriculum. This term has emerged as the moniker of an
individual with strong computational and programming skills who also possesses the
business/content acumen that enables clear and meaningful communication. As
Figure 3 shows, the term “data scientist” is emerging as a dominant search term on job
search engines.
FIGURE 3: Job Trends in Data Science
From Michael Rappa, Director of the Institute for Advanced Analytics at NC State –
“The future of data science in the enterprise will be extremely bright, if a few key things happen:
First, the right kinds of partnerships must be formed between data-rich companies and forward-thinking academic institutions. Second, institutions and employers need to encourage and
reward the right set of data-science skills.”
Are statisticians going away? No. There will always be a need for traditional statistics.
Disciplines such as psychology, nursing, marketing research, medical research, etc., will
always have a need for the traditional skills associated with hypothesis testing and
model development.
Data Scientists are different. They embody skills that traditional statisticians typically
don’t have. While data scientists must have strong skills in statistical testing and modeling,
they are also strong in computational mathematics, data architecture, the process of ETL
(extract, transform, load), programming and related tools (e.g., SAS, Java, C++, Hadoop),
and typically have some content knowledge (e.g., Chemistry, Biology, Finance).
The proposed Ph.D. in Analytics and Data Science at Kennesaw State University
directly addresses the national and local talent shortage in this space, as evidenced by
movements such as the Big Data Research and Development Initiative of 2012, by
effectively and thoroughly training and thereby expanding the workforce available to
develop and use Big Data technologies.
Furthermore, this degree program would transform teaching and learning in the field of
Big Data technology, another major objective of the White House Initiative.
Consequently, we believe the degree will directly and/or indirectly accelerate the pace
of discovery in science and engineering used to further understanding and knowledge,
strengthen U.S. national security, and increase the quality of life for the average
American citizen.
With respect to the shortage of big data analysts and their training, The Big Data MGI
report states, “…we believe that the constraint on this type of talent will be global, with the
caveat that some regions may be able to produce the supply that can fill talent gaps in other regions.”
It is our strongly held position that we can make Georgia one of these key regions
which produces Big Data analytical leadership for the world with this proposed Ph.D.
degree. A Ph.D. program in Analytics and Data Science at Kennesaw State University
has the potential to define Kennesaw State University, the University System of
Georgia, and the State of Georgia as cutting-edge, state-of-the-art innovators in the
methods and technologies that will shape and see us through the 21st century.
While there are no unique statistics on positions for “Data Scientists” from the Georgia
Department of Labor, there are unique statistics on the constituent disciplines of
Statistics, Mathematics and Computer Science and their projected employability.
TABLE 1: GA Department of Labor projections for Mathematics, Statistics and Computer Science
Occupational Employment Projections in Georgia for Multiple Occupations (base year 2010, projected year 2020)
Occupational Title | 2010 Estimated | 2020 Projected | Total 2010-2020 | Total Percent
Computer and Information Research Scientists
Computer and Mathematical Occupations
Mathematical Science Occupations
Source: Georgia Department of Labor
In addition, the economies of the State of Georgia and the City of Atlanta are heavily
dominated by Finance and Insurance, Government Services and Healthcare, the very
industries the McKinsey Global Institute identified in Figure 2 above as expecting the greatest
benefit from (and, by association, the greatest demand for) big data expertise.
SECTION 2: Demand for the Program
Local demand for the program is evidenced, in part, through the successes of both the
Minor in Applied Statistics and Data Analysis as well as in the Master of Science in
Applied Statistics.
The Minor in Applied Statistics, more than any other Minor field of study in the history
of KSU, is a flagship of interdisciplinary success. Students are required to complete 15
hours (five courses) in Statistics at the 3000 level or above to qualify for a Minor in
Applied Statistics and Data Analysis. In any given semester, the Minor serves the needs
of over 200 students from almost every college across the university.
Statistics courses at the 3000 and 4000 level draw the most diverse cross section of
majors of any course of study. Where most upper division courses are populated by
students from a single major, the statistics courses (all STAT courses are at the 3000
level or above) are consistently populated with students from Biology and Chemistry,
Finance and Economics, Psychology, Mathematics, Sociology…and even Theater (see
Figure 4).
FIGURE 4: Distribution of Minors in Applied Statistics and Data Analysis by Declared Major (Fall 2012)
Why do almost 1% of the undergraduates at KSU seek out a series of five upper
division electives in Statistics? We believe that there are three primary reasons that
have created this demand:
1. We have an inherently interdisciplinary faculty – the same faculty which will
power the Ph.D. in Analytics and Data Science. Most of the Statistics faculty have
had experience in the private sector, including Ford Motor Company, The
Children’s Hospital of Cincinnati, MD Anderson Cancer Center in
Houston, TX, MasterCard International, VISA EU (London), AT&T/BellSouth
(Brazil), Thomson Reuters, The Southern Company and ChoicePoint. Most
students can find a faculty member whose experience applying statistics outside
the classroom aligns with their career aspirations. We bring our experiences into the
classroom and students respond.
2. Statistics is the process through which data is converted into meaningful
information to support decision making. But, as outlined above, while data is
increasingly ubiquitous, cheap, and easy to capture and store, it remains difficult to
interpret. Students recognize that whether they are studying Finance or
Psychology, Biology or Political Science, they will have to understand how to
translate data into information. Since all disciplines work with data, in some
form, all disciplines of study need to have some integration of statistics for their
graduates to be marketable.
3. Jobs. Jobs. Jobs. Students are increasingly turning to Statistics as a great way to
differentiate themselves in the marketplace. Undergraduates with Minors in
statistics are having great success with job placement after graduation. Statistics
students from KSU are recruited for positions across a wide variety of companies
including The Home Depot, The Southern Company, Link Analytics, Aspen
Marketing Services, Epsilon, Ultimate Software, IBM, Assurant, CompuCredit,
The CDC, and Equifax.
The Master of Science in Applied Statistics has a similar story. Since the launch of the
degree in 2006, very few of the applicants have had undergraduate degrees in Statistics.
MSAS applicants come from Engineering, Business, Medicine, and Education. A
defining characteristic of the MSAS program is its fluid alignment with the needs of the
market. As a result, the MSAS is proud of its effective 0% unemployment rate amongst
students without work restrictions.
Statistics emerged as a unique discipline at KSU in Fall of 2006 – all of this success has
occurred in fewer than six years. In an effort to limit duplication with other
successful initiatives in Statistics within the University System, such as the programs at
the University of Georgia and at Georgia Tech, KSU pursued a strongly applied
orientation, meaning that our course materials were focused on leveraging our faculty
experience outside the classroom, and applying statistics the way practitioners apply
statistics. From the beginning we elected to have less emphasis on theoretical statistics.
This last choice required strong integration of statistical
software into our curriculum, so we looked to the dominant software/language in the
marketplace. This was, without question, SAS, which is used by 95% of the Fortune 500
– including all of the top companies in our regional footprint. As a result, all of our
students, both at the undergraduate and graduate levels, learn strong SAS
programming skills as a complement to their statistics skills.
It is this dimension of programming skills, combined with a strong mathematical
foundation, and deep and broad instruction in statistical modeling that has already
positioned the program well to offer a Ph.D. in Analytics and Data Science.
Additional evidence of local demand for these skills comes from analytical job sites
such as – a job posting site uniquely designed for analytical
professionals. A recent keyword search for open positions in Georgia generated the
following results:
• “Big Data” – 79 positions, including postings with Intuit, The Home Depot,
Hitachi, UnitedHealth Group, and IBM.
• “Statistics” – 759 positions, including postings with Coca-Cola, The Home Depot,
CDC, Assurant, Fiserv and Lockheed Martin.
• “Data Scientist” – 8 positions – all with salaries over $100,000.
• “Advanced Analytics” – 1,490 positions – including positions with every Fortune
500 company in the state.
Screen shots from these searches can be found in Appendix 1.
We have also received strong support for this program from the Statistical
Advisory Board.
TABLE 2: KSU Statistical Advisory Board
Chuck Clemens
Steven Einbender
Bill Franks
Ron Garmon
Will Hakes
Don Hayes
Jim Head
Darrell Maret
Billy Nix
Jerry Oglesby
Carol Pierannunzi
Brian Stone
Maxum Specialty Insurance Group
The Home Depot
Hayes Consulting
Southern Company
Senior Manager – Pricing
Chief Analytics Officer
Founder and CEO
Senior Vice President, Analytics
Senior Analyst
Vice President, Load Research
Division Head, Higher Education
Senior Survey Methodologist
Chief Risk Officer
As our Advisory Board guided the development of the proposal, they emphasized the
importance of practical experience to this degree. To that end, the Board has agreed, in
principle, to “hire” Ph.D. students on a contract basis for a minimum of one year, after
they have completed their coursework – but prior to completing their dissertation. This
would accomplish three objectives:
1. The hiring firm would cover one year (minimum) of doctoral student stipend
($25,000 to $30,000).
2. The Ph.D. student would apply concepts and skills learned in the classroom in a
“real” environment.
3. The experience has the potential to become a source of dissertation research.
This integration with the companies represented by the Advisory Board also represents
an important endorsement of our proposed program, as well as an extension of the
engagement with the business community which has been the trademark of the
Statistics programs to date.
Letters of Intent and Support from the Statistical Advisory Board can be found in
Appendix 2.
SECTION 3: Non-Duplication of Similar Programs at USG Institutions
The proposed program does not duplicate any program currently in existence in the
State of Georgia; it would be the first of its kind in the country. A brief outline of the
most closely related Ph.D. programs in the State of
Georgia is provided in Table 3 below.
TABLE 3: Comparison of related Ph.D. programs in the University System of Georgia

Georgia Institute of Technology: Ph.D. in Industrial Engineering with a Specialization in …
Stated Objectives: “The Ph.D. in (Industrial and Systems Engineering) is a research degree...students have the opportunity to pursue work at virtually any of the points across the applied/theoretical …”
Notes on Curriculum: Courses incorporate strong mathematics, with methods courses aligned with manufacturing and engineering. Requirements include five core courses, two theory courses, three methods courses, one elective course (11 courses total). No requirement for internships or co-op.
Program Housed: College of Engineering, H. Milton Stewart School of Industrial and Systems Engineering

Georgia Institute of Technology: Ph.D. in Industrial Engineering with a Specialization in Computational Science and Engineering
Stated Objectives: “Georgia Tech's CSE Ph.D. degree will prepare students for a variety of positions in industry, government and academia that emphasize research and development. Students will be well prepared for positions in industry…and in government. Graduates may pursue work in software and systems for modeling and simulation, systems integration, data mining and visualization, high performance computing, and computational modeling. Academic career possibilities include research and education in departments concerned with advancing the state-of-the-art in the development and application of computational models in engineering, the sciences and computing.”
Notes on Curriculum: The program emphasizes the integration and application of principles from mathematics, science, engineering and computing to create computational … Courses include 6 core courses in computational mathematics and in high performance computing, three elective courses which “must go beyond ‘using computers’ to deepen understanding of computational methods, preferably in the context of some application domain” and three elective courses in an application domain (12 courses total). No requirement for internships or co-op.
Program Housed: College of Engineering, H. Milton Stewart School of Industrial and Systems Engineering

University of Georgia: Ph.D. in Statistics
Notes on Curriculum: Heavy theoretical emphasis – placement is exclusively in academic positions. Program includes a minimum of 10 courses including four statistical theory courses (core), two subcore electives – one of which is a statistical computing course, and four unspecified STAT electives.
Program Housed: College of Arts and Science, Statistics Department.

Georgia State University: Ph.D. in Mathematics and Statistics
Stated Objectives: “The Ph.D. degree program in Mathematics and Statistics includes concentrations in bioinformatics, biostatistics, and mathematics. These concentrations address the critical need for mathematics faculty and the need for highly trained specialists in the areas of bioinformatics and biostatistics…(the program) will graduate individuals with a broad background in applied areas for direct placement in business, industry, governmental institutions and research …”
Notes on Curriculum: Heavy emphasis on mathematics. The four core courses include Real Analysis, Matrix Analysis, Theory of Probability and Linear Statistical Analysis. Remaining courses vary based upon selected concentration. The Concentration in Bioinformatics incorporates three computer science courses. Eighteen courses required.
Program Housed: College of Arts and Sciences, Department of Mathematics and Statistics.
These are all excellent programs which have achieved recognition in varying contexts.
However, none of these programs are aligned with the skills defining the “Data Scientist.”
APPENDIX 1: Screen Shots from for Georgia job postings
APPENDIX 2: Letters of Intent from the Statistical Advisory Board
