
Data Science Introduction

Data Science
Data science combines the scientific method, math and statistics, specialized
programming, advanced analytics, AI, and even storytelling to uncover and explain the
business insights buried in data.
What is data science?
Data science is a multidisciplinary approach to extracting actionable insights from the
large and ever-increasing volumes of data collected and created by today’s
organizations. Data science encompasses preparing data for analysis and processing,
performing advanced data analysis, and presenting the results to reveal patterns and
enable stakeholders to draw informed conclusions.
Data preparation can involve cleansing, aggregating, and manipulating the data so it is ready
for specific types of processing. Analysis requires the development and use of
algorithms, analytics, and AI models. It is driven by software that combs through data
to find patterns and transform those patterns into predictions that support
business decision-making. The accuracy of these predictions must be validated
through scientifically designed tests and experiments, and the results should be
shared through the skilful use of data visualization tools that make it possible for
anyone to see the patterns and understand trends.
As a result, data scientists (as data science practitioners are called) require computer
science and pure science skills beyond those of a typical data analyst. A data scientist
must be able to do the following:
 Apply mathematics, statistics, and the scientific method
 Use a wide range of tools and techniques for evaluating and preparing data, everything from SQL to data mining to data integration methods
 Extract insights from data using predictive analytics and artificial intelligence (AI), including machine learning and deep learning models
 Write applications that automate data processing and calculations
 Tell (and illustrate) stories that clearly convey the meaning of results to decision-makers and stakeholders at every level of technical knowledge and understanding
 Explain how these results can be used to solve business problems
This combination of skills is rare, and it’s no surprise that data scientists are currently
in high demand.
Data science tools
Data scientists must be able to build and run code in order to create models. The most
popular programming languages among data scientists are open source tools that
include or support pre-built statistical, machine learning and graphics capabilities.
These languages include:
 R: An open source programming language and environment for statistical computing and graphics, R is the most popular programming language among data scientists. It provides a broad variety of libraries and tools for cleansing and prepping data, creating visualizations, and training and evaluating machine learning and deep learning algorithms. It is also widely used among data science scholars and researchers.
 Python: Python is a general-purpose, object-oriented, high-level programming language that emphasizes code readability through its distinctive, generous use of white space. Several Python libraries support data science tasks, including NumPy for handling large, multi-dimensional arrays, Pandas for data manipulation and analysis, and Matplotlib for building data visualizations.
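As a small illustration of how these three libraries fit together (a minimal sketch that uses synthetic numbers, not data from any real source):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: generate a small synthetic array of daily sales figures
sales = np.random.default_rng(seed=42).normal(loc=200, scale=25, size=30)

# Pandas: wrap the array in a DataFrame for manipulation and analysis
df = pd.DataFrame({"day": range(1, 31), "sales": sales})
print(df.describe())  # summary statistics

# Matplotlib: build a simple visualization of the series
plt.plot(df["day"], df["sales"])
plt.xlabel("Day")
plt.ylabel("Sales")
plt.title("Daily sales (synthetic data)")
plt.show()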
Data science use cases
There’s no limit to the number or kind of enterprises that could potentially benefit
from the opportunities data science is creating. Nearly any business process can be
made more efficient through data-driven optimization, and nearly every type of
customer experience (CX) can be improved with better targeting and personalization.
Here are a few representative use cases for data science and AI:
 An international bank created a mobile app offering on-the-spot decisions to loan applicants using machine learning-powered credit risk models and a hybrid cloud computing architecture that is both powerful and secure.
 An electronics firm is developing ultra-powerful 3D-printed sensors that will guide tomorrow's driverless vehicles. The solution relies on data science and analytics tools to enhance its real-time object detection capabilities.
 A robotic process automation (RPA) solution provider developed a cognitive business process mining solution that reduces incident handling times by 15% to 95% for its client companies. The solution is trained to understand the content and sentiment of customer emails, directing service teams to prioritize those that are most relevant and urgent.
 A digital media technology company created an audience analytics platform that enables its clients to see what's engaging TV audiences as they're offered a growing range of digital channels. The solution employs deep analytics and machine learning to gather real-time insights into viewer behaviour.
 An urban police department created statistical incident analysis tools to help officers understand when and where to deploy resources in order to prevent crime. The data-driven solution creates reports and dashboards to augment situational awareness for field officers.
 A smart healthcare company developed a solution enabling seniors to live independently for longer. Combining sensors, machine learning, analytics, and cloud-based processing, the system monitors for unusual behaviour and alerts relatives and caregivers, while conforming to the strict security standards that are mandatory in the healthcare industry.
Lifecycle of Data Science
Here is a brief overview of the main phases of the Data Science Lifecycle:
Phase 1—Discovery: Before you begin the project, it is important to understand the
various specifications, requirements, priorities and required budget. You must possess
the ability to ask the right questions. Here, you assess if you have the required
resources present in terms of people, technology, time and data to support the
project. In this phase, you also need to frame the business problem and formulate
initial hypotheses (IH) to test.
Phase 2—Data preparation: In this phase, you require an analytical sandbox in which
you can perform analytics for the entire duration of the project. You need to explore,
preprocess, and condition data prior to modeling. Further, you will perform ETLT
(extract, transform, load, transform) to get data into the sandbox. Let's have a
look at the statistical analysis flow below.
You can use R for data cleaning, transformation, and visualization. This will help you
to spot the outliers and establish a relationship between the variables. Once you have
cleaned and prepared the data, it’s time to do exploratory analytics on it. Let’s see how
you can achieve that.
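The lifecycle description uses R for this step; the same cleansing and outlier-spotting pass can be sketched in Python with Pandas (the file name transactions.csv and its amount column are hypothetical, chosen only for illustration):

import pandas as pd

# Hypothetical raw extract loaded into the analytical sandbox
df = pd.read_csv("transactions.csv")  # assumed file with an 'amount' column

# Cleansing: drop duplicate rows and rows with missing values
df = df.drop_duplicates().dropna()

# Spot outliers in 'amount' using the interquartile range (IQR) rule
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers found")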
Phase 3—Model planning: Here, you will determine the methods and techniques to
draw the relationships between variables. These relationships will set the base for the
algorithms which you will implement in the next phase. You will apply Exploratory Data
Analytics (EDA) using various statistical formulas and visualization tools.
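For instance, a quick exploratory pass might compute pairwise correlations between the variables and visualize them; the sketch below assumes the cleaned DataFrame df from the previous phase:

import matplotlib.pyplot as plt

# Pairwise correlations reveal which variables move together
corr = df.corr(numeric_only=True)
print(corr)

# A simple heatmap makes the strongest relationships easy to spot
plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.xticks(range(len(corr.columns)), corr.columns, rotation=45)
plt.yticks(range(len(corr.columns)), corr.columns)
plt.colorbar(label="correlation")
plt.title("Exploratory correlation matrix")
plt.show()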
Let’s have a look at various model planning tools.
1. R has a complete set of modeling capabilities and provides a good environment
for building interpretive models.
2. SQL Analysis services can perform in-database analytics using common data
mining functions and basic predictive models.
3. SAS/ACCESS can be used to access data from Hadoop and is used for
creating repeatable and reusable model flow diagrams.
Although many tools are present in the market, R is the most commonly used. Now that
you have gained insights into the nature of your data and have decided on the algorithms
to be used, in the next stage you will apply those algorithms and build a model.
Phase 4—Model building: In this phase, you will develop datasets for training and
testing purposes. Here you need to consider whether your existing tools will suffice
for running the models or whether you will need a more robust environment (such as fast,
parallel processing). You will analyze various learning techniques such as classification,
association, and clustering to build the model.
Model building can be carried out with tools such as R or Python's machine learning libraries; a minimal Python sketch follows.
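The sketch below uses scikit-learn and assumes the prepared DataFrame df from the earlier phases, with a hypothetical binary target column named default; it illustrates the train/test split and a classification model, not a prescribed toolchain:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Develop datasets for training and testing purposes
X = df.drop(columns=["default"])  # hypothetical feature columns
y = df["default"]                 # hypothetical binary target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Build a classification model and evaluate it on held-out data
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))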
Phase 5—Operationalize: In this phase, you deliver final reports, briefings, code, and
technical documents. In addition, a pilot project is sometimes implemented in a
real-time production environment. This will provide you with a clear picture of the
performance and other related constraints on a small scale before full deployment.
Phase 6—Communicate results: Now it is important to evaluate whether you have
achieved the goal you planned in the first phase. So, in the last phase,
you identify all the key findings, communicate them to the stakeholders, and determine
whether the results of the project are a success or a failure based on the criteria developed in
Phase 1.
What is a Data Scientist?
Data scientists are big data wranglers, gathering and analyzing large sets of
structured and unstructured data. A data scientist's role combines computer science,
statistics, and mathematics. They analyze, process, and model data, then interpret
the results to create actionable plans for companies and other organizations.
Data scientists are analytical experts who utilize their skills in both technology and
social science to find trends and manage data. They use industry knowledge,
contextual understanding, and skepticism of existing assumptions to uncover solutions
to business challenges.
A data scientist’s work typically involves making sense of messy, unstructured data,
from sources such as smart devices, social media feeds, and emails that don’t neatly
fit into a database.
Roles & Responsibilities of a Data Scientist

Management: The Data Scientist plays an insignificant managerial role
where he supports the construction of the base of futuristic and technical
abilities within the Data and Analytics field in order to assist various
planned and continuing data analytics projects.
 Analytics: The Data Scientist represents a scientific role where he plans, implements, and assesses high-level statistical models and strategies for application to the business's most complex issues. The Data Scientist develops econometric and statistical models for various problems including projections, classification, clustering, pattern analysis, sampling, simulations, and so forth.
 Strategy/Design: The Data Scientist performs a vital role in the advancement of innovative strategies to understand the business's consumer trends and management, as well as ways to solve difficult business problems, for instance, the optimization of product fulfilment and overall profit.
 Collaboration: The role of the Data Scientist is not a solitary one; in this position, he collaborates with senior data scientists to communicate obstacles and findings to relevant stakeholders in an effort to enhance business performance and decision-making.
 Knowledge: The Data Scientist also takes the lead in exploring different technologies and tools with the vision of creating innovative data-driven insights for the business at the most agile pace feasible. In this situation, the Data Scientist also uses initiative in assessing and utilizing new and enhanced data science methods for the business, which he delivers to senior management for approval.
 Other Duties: A Data Scientist also performs related tasks and duties as assigned by the Senior Data Scientist, Head of Data Science, Chief Data Officer, or the employer.
Difference Between Data Scientist, Data Analyst, and Data Engineer
Data Scientist, Data Engineer, and Data Analyst are the three most
common careers in data science. So let's understand who a data scientist is by
comparing the role with these similar jobs.
Data Scientist
 The focus is on the futuristic display of data.
 Data scientists apply both supervised and unsupervised learning to data, for example regression and classification of data, neural networks, etc.
 Skills required: Python, R, SQL, Pig, SAS, Apache Hadoop, Java, Perl, Spark.
Data Analyst
 The main focus of a data analyst is on the optimization of scenarios, for example how an employee can enhance the company's product growth.
 The work involves forming and cleaning raw data, then interpreting and visualizing the data to perform the analysis and produce a technical summary of the data.
 Skills required: Python, R, SQL, SAS.
Data Engineer
 Data engineers focus on optimization techniques and the construction of data in a conventional manner. The purpose of a data engineer is to continuously advance data consumption.
 Data engineers frequently operate at the back end, where optimized machine learning algorithms are used to keep data stored and prepared as accurately as possible.
 Skills required: MapReduce, Hive, Pig, and Hadoop techniques.
Applications of Data Science in the Real World
Fraud and Risk Detection
The earliest applications of data science were in finance. Companies were fed up with the
bad debts and losses they incurred every year. However, they had a lot of data that was
collected during the initial paperwork while sanctioning loans, so they decided to bring
in data scientists to rescue them from these losses.
Over the years, banking companies learned to divide and conquer data via customer
profiling, past expenditures, and other essential variables to analyze the probabilities
of risk and default. Moreover, it also helped them to push their banking products based
on customers' purchasing power.
Healthcare
The healthcare sector, especially, receives great benefits from data science
applications.
1. Medical Image Analysis
Procedures such as detecting tumors, artery stenosis, and organ delineation employ
various methods and frameworks, such as MapReduce, to find optimal parameters
for tasks like lung texture classification. They apply machine learning methods, support
vector machines (SVM), content-based medical image indexing, and wavelet analysis
for solid texture classification.
2. Genetics & Genomics
Data science applications also enable an advanced level of treatment personalization
through research in genetics and genomics. The goal is to understand the impact of
DNA on our health and find individual biological connections between genetics,
diseases, and drug response. Data science techniques allow the integration of different
kinds of data with genomic data in disease research, which provides a deeper
understanding of genetic issues in reactions to particular drugs and diseases. As soon
as we acquire reliable personal genome data, we will achieve a deeper understanding
of human DNA. Advanced genetic risk prediction will be a major step towards
more individualized care.
3. Drug Development
The drug discovery process is highly complicated and involves many disciplines. The
greatest ideas are often constrained by billions of tests and huge expenditures of money
and time. On average, it takes twelve years to make an official submission.
Data science applications and machine learning algorithms simplify and shorten this
process, adding a perspective to each step, from the initial screening of drug
compounds to the prediction of the success rate based on biological factors. Such
algorithms can forecast how a compound will act in the body using advanced
mathematical modeling and simulations instead of lab experiments. The idea
behind computational drug discovery is to create computer simulations of
a biologically relevant network, simplifying the prediction of future outcomes with high
accuracy.
4. Virtual assistance for patients and customer support
Optimization of the clinical process builds upon the concept that in many cases it is
not actually necessary for patients to visit doctors in person. A mobile application can
give a more effective solution by bringing the doctor to the patient instead.
AI-powered mobile apps can provide basic healthcare support, usually as
chatbots. You simply describe your symptoms, or ask questions, and then receive key
information about your medical condition derived from a wide network linking
symptoms to causes. Apps can remind you to take your medicine on time and, if
necessary, schedule an appointment with a doctor.
This approach promotes a healthy lifestyle by encouraging patients to make healthy
decisions, saves them time waiting in line for an appointment, and allows doctors to
focus on more critical cases.
The most popular applications nowadays are Your.MD and Ada.
Internet Search
Now, this is probably the first thing that strikes your mind when you think of data science
applications.
When we speak of search, we think 'Google'. Right? But there are many other search
engines like Yahoo, Bing, Ask, AOL, and so on. All these search engines (including
Google) make use of data science algorithms to deliver the best result for a search
query in a fraction of a second. Consider the fact that Google processes more than
20 petabytes of data every day.
Had there been no data science, Google wouldn’t have been the ‘Google’ we know
today.
Targeted Advertising
If you thought Search would have been the biggest of all data science applications,
here is a challenger – the entire digital marketing spectrum. Starting from the display
banners on various websites to the digital billboards at the airports – almost all of them
are decided by using data science algorithms.
This is the reason why digital ads have been able to achieve a much higher CTR (Click-Through
Rate) than traditional advertisements. They can be targeted based on a user's past
behavior.
This is the reason why you might see ads for data science training programs while I
see an ad for apparel in the same place at the same time.
TOP 10 DATA SCIENCE TRENDS FOR THIS DECADE
The presence of data in every field you can think of is one reason why organizations
are showing interest in data science. The fact that data will continue to be an integral
part of our lives serves as yet another driver of data science. That said, it is really
important to stay updated with the hottest data science trends that could serve as a
blessing to grow your business. Here are the top 10 data science trends for this decade.
1. Predictive analysis
For a business to prosper, it is critical to know what the future might look like. This is
exactly where predictive analysis comes into play. Organizations rely on their
customers to a large extent. Hence, being able to understand their behaviors helps in
making better decisions ahead. This technique is one of the smartest ways to come up
with strategies for targeting customers, which helps retain existing customers and
attract new ones.
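As a toy illustration of the idea, the sketch below fits a simple linear trend to made-up monthly revenue figures and projects it forward; real predictive analysis would use richer models and real customer data:

import numpy as np

# Twelve months of made-up revenue figures
months = np.arange(1, 13)
revenue = np.array([100, 104, 103, 110, 115, 117, 121, 125, 124, 130, 133, 138])

# Fit a linear trend and project the next three months
slope, intercept = np.polyfit(months, revenue, deg=1)
future_months = np.arange(13, 16)
forecast = slope * future_months + intercept
print(dict(zip(future_months.tolist(), forecast.round(1).tolist())))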
2. Machine learning
Over the years, we have seen how much automation has transformed the world. This
is why machine learning has gained importance like never before. The coming years
will see more automation and hence the rise in the number of organizations adopting
machine learning will surpass one’s imagination for sure.
3. IoT
Gone are the days when IoT was considered to be something that would have limited
applications. Today, we are living in a world where our smartphones have the ability
to control appliances like TV, AC, etc. All of this is possible because of IoT. Google
Assistant is yet another remarkable innovation in the area of IoT. Thus, it comes as no
surprise that companies are looking for ways to invest in this technology. This simply
throws light on how rapidly the IoT industry will grow in the days ahead.
4. Blockchain
Needless to say, cryptocurrencies like Bitcoin, Litecoin, etc. have become the talk of
the world. All of these currencies employ blockchain technology. With the world
showing keen interest in this field, blockchain surely stands to see far-reaching
implementation in the coming years.
5. Edge computing
Edge computing is known for faster processing of information and it also boasts of
reducing latency, cost, and traffic. It is because of these features that
organizations are not willing to sideline this option. With edge computing in place,
dealing with real-time applications becomes far easier. The coming years
could see a considerable shift from traditional methods to edge computing.
6. DataOps
Let's face reality: the data pipeline has become more complex and thus requires
even more integration and governance tools. DataOps comes to our rescue. It covers
tasks all the way from collection to preparation to analysis, along with implementing
automated testing and delivery, to provide enhanced data quality and analysis. This
trend will continue for years to come.
7. Artificial Intelligence
Be it a small enterprise or a tech giant, all of them have relied on AI in one way or
another. Complex tasks are no longer a concern, because we can now rely on AI to
handle them. The reduction in errors is yet another strong reason why AI stands
apart. Now that we rely on AI so much, there is no going back.
8. Data visualization
This is one of those prominent trends that we can count on, because organizations
are moving their conventional data warehouses to the cloud.
9. Better user experience
The importance a company gives to user experience speaks volumes about its
success. This is why companies are leaving no stone unturned in
providing the best possible user experience, be it in the form of chatbots, personal
assistants, or AI-driven tools.
10. Data governance
This is yet another area that is gaining a lot of importance. Numerous companies out
there are still struggling to comply with the rules and regulations. It is critical not just
to comply with these, but also to understand their impact on present and
future operations. Data scientists who have sound knowledge of all of this are the
need of the hour.
These trends paint a clearer picture of which data science strategies need to be
implemented to retain your customers and take your business to new heights.
Current Trends in Data Science
With the diversity in data problems and requirements comes a broad range of
innovative solutions. These solutions often bring with them a host of data science
trends, granting businesses the agility they require while offering deeper insights
into their data. A few of these top data science trends are briefly explained below:
1. Graph Analytics
With data flowing in from all directions, it becomes harder to analyze.
Graph analytics aims to solve this problem by acting as a flexible yet powerful
tool that analyzes complicated data points and relationships using graphs. The
intention behind using graphs is to represent complex data abstractly and in a visual
format that is easier to digest and offers maximum insight. Graph analytics is applied
in a plethora of areas (a minimal code sketch follows the list below), such as:
 Filtering out bots on social media to reduce false information
 Identifying frauds in banking industries
 Preventing financial crime
 Analyzing power and water grids to find flaws
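The sketch below illustrates the idea using Python's networkx library; the accounts and transfers are invented purely for illustration:

import networkx as nx

# Build a small transaction graph: nodes are accounts, edges are transfers
G = nx.Graph()
G.add_edges_from([
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("C", "D"),   # densely connected cluster around A
    ("E", "F"),               # an unrelated pair
])

# Degree centrality highlights unusually well-connected accounts,
# a common starting point for fraud-ring or bot detection
centrality = nx.degree_centrality(G)
suspects = sorted(centrality, key=centrality.get, reverse=True)[:3]
print("Most connected accounts:", suspects)

# Connected components separate independent groups of accounts
print("Groups:", list(nx.connected_components(G)))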
2. Data Fabric
Data Fabric is a relatively new trend. At its core, it encapsulates an organization's
data collected from a vast number of sources, such as APIs, reusable data
services, pipelines, and semantic tiers, providing transformable access to that data.
Created to support the business context of data and to keep data intelligible
not just for users but also for applications, Data Fabrics enable you to have
scalable data while remaining agile.
By doing so, you get unparalleled access to process, manage, store, and share the
data as needed. Business Intelligence and Data Science rely heavily on Data
Fabrics due to their smooth and clean access to enormous amounts of data.
3. Data Privacy by Design
The trend of data privacy by design incorporates a safer and more proactive approach
to collecting and handling user data while training your machine learning model on it.
Corporations need user data to train their models on real-world scenarios, and they
collect data from various sources such as browsing patterns and devices.
The related idea of federated learning is to collect as little data as possible, keeping the
user in the loop by also giving them the option to opt out and erase all collected data
at any time.
While the data may come from an enormous audience, for privacy reasons it must be
guaranteed that reverse-engineering the original data to identify the user isn't
possible.
4. Augmented Analytics
Augmented Analytics refers to deriving better insights from the data at hand by
excluding incorrect conclusions or bias, for optimized decisions. By
infusing Artificial Intelligence and Machine Learning, Augmented Analytics aids users
in planning new models.
With reduced dependency on data scientists and machine learning experts,
Augmented Analytics aims to deliver relatively better insights on data to aid the entire
Business Intelligence process.
This subtle introduction of Artificial Intelligence & Machine Learning has a significant
impact on the traditional insight discovery process by automating many aspects of data
science. Augmented Analytics is gaining a strong foothold by providing better decisions,
free of errors and bias in the analysis.
5. Python as the De-Facto Language for Data Science
Python is an absolute all-rounder programming language and is considered a worthwhile
entry point if you're interested in getting into the world of Artificial Intelligence and Data
Science.
Python comes stacked with integrations for numerous programming languages and
libraries, making it an excellent option for, say, quickly creating a prototype for the
problem at hand or going in-depth into large datasets.
Some of its most popular libraries are:
● TensorFlow, for machine learning workloads and working with datasets
● scikit-learn, for training machine learning models
● PyTorch, for computer vision and natural language processing
● Keras, a high-level interface for highly complex mathematical calculations and operations
● Spark MLlib, Apache Spark's machine learning library, making machine learning easy for everyone with tools such as algorithms and utilities
6. Widespread Automation in Data Science
Time is a critical component, and none of it should be spent on performing repetitive
tasks.
As Artificial Intelligence has advanced, its automation capabilities have expanded as well.
Various innovations in automation are making many complex Artificial Intelligence tasks
easier.
Automation in the field of Data Science is already simplifying much of the process, if
not all of it. The entire process of Data Science includes identification of the problem,
data collection, processing, exploration, analysis, and sharing of the processed
information with others.
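One everyday form of such automation is chaining the repetitive preparation and modeling steps into a single reusable object. The sketch below uses a scikit-learn Pipeline and assumes training and test splits (X_train, X_test, y_train, y_test) already exist:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

# Each repetitive step (imputation, scaling, modeling) is automated in one object
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # normalize features
    ("model", LogisticRegression(max_iter=1000)),  # fit the classifier
])
pipeline.fit(X_train, y_train)         # assumed training split
print(pipeline.score(X_test, y_test))  # accuracy on the held-out data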
7. Conversational Analytics and Natural Language Processing
Natural Language Processing and Conversational Analytics are already making big
waves in the digital world by simplifying the way we interact with machines and look up
information online.
NLP has hugely helped us progress into an era where computers and humans can
communicate in common natural language, enabling a constant and fluent
conversation between the two.
The applications of NLP and conversational systems can be seen everywhere, such as
chatbots and smart digital assistants. It has been predicted that the usage of
voice-based searches will exceed the more commonly used text-based searches in a very
short time.
8. Super-sized Data Science in the Cloud
Since the onset of Artificial Intelligence, the amount of data generated has
skyrocketed. The size of data grew tremendously, from a few gigabytes to a
few hundred, as businesses grew their online presence.
This increased requirement of data storage and processing capabilities gave rise to
Data Science for a controlled and precise utilization of data and pushed organizations
working on a global scale to opt for cloud solutions.
Various cloud solution providers such as Google, Amazon, and Microsoft offer vast
cloud computing options that include enterprise-grade cloud server capabilities,
ensuring high scalability and zero downtime.
9. Mitigate Model Biases and Discrimination
No model is entirely immune to biases, and models can begin to exhibit discriminatory
behavior at any stage due to factors such as a lack of sufficient data, historical bias, and
incorrect data collection practices. Bias and discrimination are common problems with
models, and mitigating them is an emerging trend. If detected in time, these biases can
be mitigated at three stages:
 Pre-Processing Stage
 In-Processing Stage
 Post-Processing Stage
Each stage comes with its own set of corrective measures, including algorithms and
techniques to optimize the model for fairness and increase its accuracy, reducing
the chance of bias.
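As one very simple, hedged illustration of an in-processing adjustment, the sketch below reweights classes so that an under-represented group is not ignored by the model; it uses scikit-learn and assumes training and test splits already exist:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# 'balanced' reweights training samples inversely to class frequency,
# a simple in-processing guard against a model ignoring a rare class
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)  # assumed training data

# Inspect per-class behaviour to see whether the adjustment helped
print(classification_report(y_test, model.predict(X_test)))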
10. In-Memory Computing
In-Memory computing is an emerging trend that is vastly different from how we
traditionally process data.
In-Memory computing processes data stored in an in-memory database as opposed to
the traditional methods using hard drives and relational databases with a querying
language. This technique allows for processing and querying of data in real-time for
instant decision making and reporting.
With memory becoming cheaper and businesses relying on real-time results, in-memory
computing enables them to have applications with richer, more interactive
dashboards that can be supplied with newer data and be ready for reporting almost
instantly.
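A tiny sketch of the contrast using Python's built-in sqlite3 module, which can keep an entire database in memory instead of on disk:

import sqlite3

# ':memory:' keeps the whole database in RAM rather than on a hard drive
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (sensor TEXT, value REAL)")
cur.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("temp", 21.5), ("temp", 22.1), ("humidity", 48.0)],
)

# Queries run against memory, so results are available almost instantly
cur.execute("SELECT sensor, AVG(value) FROM events GROUP BY sensor")
print(cur.fetchall())
conn.close()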
11. Blockchain in Data and Analytics
Blockchain, in simpler terms, is a time-stamped collection of immutable data managed
by a cluster of computers, and not by any single entity. The chain here refers to the
connection between each of these blocks, bound together using cryptographic
algorithms.
Like data science, blockchain is transforming gradually. Blockchain is crucial for maintaining
and validating records, while data science works on collecting data and extracting
information from it. Data Science and Blockchain are related in that both use
algorithms to govern various segments of their processing.
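A minimal sketch of the hash-chaining idea in Python, using only the standard library; this is an illustration of the concept, not a real distributed ledger:

import hashlib
import json
import time

def make_block(data, previous_hash):
    # Create a time-stamped block whose hash depends on the previous block
    block = {"timestamp": time.time(), "data": data, "previous_hash": previous_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

# Each new block is cryptographically bound to the one before it
genesis = make_block("genesis record", previous_hash="0")
block1 = make_block({"reading": 42}, previous_hash=genesis["hash"])
block2 = make_block({"reading": 43}, previous_hash=block1["hash"])

# Tampering with an earlier block breaks every later link in the chain
print(block2["previous_hash"] == block1["hash"])  # True while the chain is intact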
12. Advanced Image Recognition
When you upload a photo with friends on Facebook, you start getting suggestions to
tag your friends. This automatic tag suggestion feature uses a face recognition
algorithm.
In their latest update, Facebook has outlined the additional progress they’ve made in
this area, making specific note of their advances in image recognition accuracy and
capacity.
“We’ve witnessed massive advances in image classification (what is in the image?) as
well as object detection (where are the objects?), but this is just the beginning of
understanding the most relevant visual content of any image or video. Recently we’ve
been designing techniques that identify and segment each and every object in an
image, a key capability that will enable entirely new applications.”
In addition, Google provides you with the option to search for images by uploading
them. It uses image recognition and provides related search results.
13. Speech Recognition
Some of the best examples of speech recognition products are Google Voice, Siri,
Cortana, etc. Using the speech-recognition feature, even if you aren't in a position to type
a message, your life wouldn't stop. Simply speak out the message and it will be
converted to text. However, at times you will notice that speech recognition doesn't
perform accurately.
14. Airline Route Planning
The airline industry across the world is known to bear heavy losses. Except for a few airline
service providers, companies are struggling to maintain their occupancy ratios and
operating profits. The steep rise in air-fuel prices and the need to offer heavy discounts to
customers have made the situation worse. It wasn't long before airline
companies started using data science to identify strategic areas of improvement.
Now, using data science, airline companies can:
1. Predict flight delay
2. Decide which class of airplanes to buy
3. Decide whether to fly directly to the destination or take a halt in between (for
example, a flight can take a direct route from New Delhi to New York, or it
can choose to halt in another country along the way)
4. Effectively drive customer loyalty programs
Southwest Airlines and Alaska Airlines are among the top companies that have embraced
data science to change the way they work.
15. Gaming
Games are now designed using machine learning algorithms that improve and upgrade
themselves as the player moves up to higher levels. In motion gaming, too, your
opponent (the computer) analyzes your previous moves and shapes its game
accordingly. EA Sports, Zynga, Sony, Nintendo, and Activision Blizzard have taken
the gaming experience to the next level using data science.
16. Augmented Reality
This is the last of the data science applications, and it seems the most exciting for the
future: augmented reality.
Data science and virtual reality do have a relationship, considering that a VR headset
combines computing knowledge, algorithms, and data to provide you with the best
viewing experience. A small step in this direction is the highly trending game
Pokemon GO, with its ability to walk around and see Pokemon on walls,
streets, and other things that aren't really there. The creators of this game used the data from
Ingress, the previous app from the same company, to choose the locations of the Pokemon
and gyms.