DR. NANDKUMAR KHACHANE
DATA SCIENCE

Data Science

Data science combines the scientific method, math and statistics, specialized programming, advanced analytics, AI, and even storytelling to uncover and explain the business insights buried in data.

What is data science?

Data science is a multidisciplinary approach to extracting actionable insights from the large and ever-increasing volumes of data collected and created by today’s organizations. Data science encompasses preparing data for analysis and processing, performing advanced data analysis, and presenting the results to reveal patterns and enable stakeholders to draw informed conclusions.

Data preparation can involve cleansing, aggregating, and manipulating data so that it is ready for specific types of processing. Analysis requires the development and use of algorithms, analytics and AI models. It’s driven by software that combs through data to find patterns and transform these patterns into predictions that support business decision-making. The accuracy of these predictions must be validated through scientifically designed tests and experiments. And the results should be shared through the skilful use of data visualization tools that make it possible for anyone to see the patterns and understand trends.

As a result, data scientists (as data science practitioners are called) require computer science and pure science skills beyond those of a typical data analyst.
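The data preparation step described above (cleansing, then aggregating) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the records, the field names `region` and `amount`, and the helper names `cleanse` and `aggregate` are all invented for this example and are not part of any particular tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records, invented for illustration only.
RAW_ORDERS = [
    {"region": " North ", "amount": "120.50"},
    {"region": "south", "amount": "80"},
    {"region": "North", "amount": ""},       # missing amount
    {"region": "SOUTH", "amount": "95.25"},
    {"region": "north", "amount": "n/a"},    # unparseable amount
]


def cleanse(records):
    """Normalize region names and drop rows whose amount is not a number."""
    cleaned = []
    for row in records:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip missing or malformed amounts
        cleaned.append({"region": row["region"].strip().lower(), "amount": amount})
    return cleaned


def aggregate(records):
    """Group cleaned rows by region and compute the mean amount per region."""
    by_region = defaultdict(list)
    for row in records:
        by_region[row["region"]].append(row["amount"])
    return {region: mean(amounts) for region, amounts in by_region.items()}


cleaned = cleanse(RAW_ORDERS)
summary = aggregate(cleaned)
print(summary)
```

In a real project the same two steps would typically run over far larger data and with more careful rules for what counts as a bad record; the point here is only the order of operations: clean first, then aggregate.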
A data scientist must be able to do the following:

● Apply mathematics, statistics, and the scientific method
● Use a wide range of tools and techniques for evaluating and preparing data, everything from SQL to data mining to data integration methods
● Extract insights from data using predictive analytics and artificial intelligence (AI), including machine learning and deep learning models
● Write applications that automate data processing and calculations
● Tell, and illustrate, stories that clearly convey the meaning of results to decision-makers and stakeholders at every level of technical knowledge and understanding
● Explain how these results can be used to solve business problems

This combination of skills is rare, and it’s no surprise that data scientists are currently in high demand.

Data science tools

Data scientists must be able to build and run code in order to create models. The most popular programming languages among data scientists are open source tools that include or support pre-built statistical, machine learning and graphics capabilities. These languages include:

● R: An open source programming language and environment for statistical computing and graphics, R is one of the most popular programming languages among data scientists. R provides a broad variety of libraries and tools for cleansing and prepping data, creating visualizations, and training and evaluating machine learning and deep learning algorithms. It’s also widely used among data science scholars and researchers.
● Python: Python is a general-purpose, object-oriented, high-level programming language that emphasizes code readability through its distinctive use of significant white space. Several Python libraries support data science tasks, including NumPy for handling large multidimensional arrays, pandas for data manipulation and analysis, and Matplotlib for building data visualizations.
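To make "apply mathematics and statistics" and "extract insights using predictive analytics" a little more concrete, here is a minimal sketch in Python. The libraries named above (NumPy, pandas) are third-party packages, so this sketch deliberately uses only the standard library to stay self-contained. The data, the variable names `ad_spend` and `sales`, and the helpers `pearson` and `fit_line` are assumptions made up for illustration.

```python
from statistics import mean

# Hypothetical observations, invented for illustration only.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


r = pearson(ad_spend, sales)                 # strength of the linear pattern
slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 60.0 + intercept         # prediction for an unseen spend of 60
print(r, slope, intercept, predicted)
```

The pattern (a correlation) is turned into a prediction (the fitted line evaluated at a new input). In practice the prediction would still need to be validated against data the model has not seen.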
Data science use cases

There’s no limit to the number or kind of enterprises that could potentially benefit from the opportunities data science is creating. Nearly any business process can be made more efficient through data-driven optimization, and nearly every type of customer experience (CX) can be improved with better targeting and personalization.

Here are a few representative use cases for data science and AI:

● An international bank created a mobile app offering on-the-spot decisions to loan applicants using machine learning-powered credit risk models and a hybrid cloud computing architecture that is both powerful and secure.
● An electronics firm is developing ultra-powerful 3D-printed sensors that will guide tomorrow’s driverless vehicles. The solution relies on data science and analytics tools to enhance its real-time object detection capabilities.
● A robotic process automation (RPA) solution provider developed a cognitive business process mining solution that reduces incident handling times by between 15% and 95% for its client companies. The solution is trained to understand the content and sentiment of customer emails, directing service teams to prioritize those that are most relevant and urgent.
● A digital media technology company created an audience analytics platform that enables its clients to see what’s engaging TV audiences as they’re offered a growing range of digital channels. The solution employs deep analytics and machine learning to gather real-time insights into viewer behavior.
● An urban police department created statistical incident analysis tools to help officers understand when and where to deploy resources in order to prevent crime. The data-driven solution creates reports and dashboards to augment situational awareness for field officers.
● A smart healthcare company developed a solution enabling seniors to live independently for longer. Combining sensors, machine learning, analytics, and cloud-based processing, the system monitors for unusual behavior and alerts relatives and caregivers, while conforming to the strict security standards that are mandatory in the healthcare industry.

Lifecycle of Data Science

Here is a brief overview of the main phases of the Data Science Lifecycle:

Phase 1 (Discovery): Before you begin the project, it is important to understand the various specifications, requirements, priorities and required budget. You must possess the ability to ask the right questions. Here, you assess whether you have the required resources in terms of people, technology, time and data to support the project. In this phase, you also need to frame the business problem and formulate initial hypotheses (IH) to test.

Phase 2 (Data preparation): In this phase, you need an analytical sandbox in which you can perform analytics for the entire duration of the project. You need to explore, preprocess and condition data prior to modeling. Further, you will perform ETLT (extract, transform, load and transform) to get data into the sandbox. Let’s have a look at the Statistical Analysis flow below.

You can use R for data cleaning, transformation, and visualization. This will help you to spot the outliers and establish a relationship between the variables. Once you have cleaned and prepared the data, it’s time to do exploratory analytics on it. Let’s see how you can achieve that.

Phase 3 (Model planning): Here, you will determine the methods and techniques to draw the relationships between variables. These relationships will set the base for the algorithms which you will implement in the next phase. You will apply Exploratory Data Analytics (EDA) using various statistical formulas and visualization tools.

Let’s have a look at various model planning tools.

1. R has a complete set of modeling capabilities and provides a good environment for building interpretive models.
2. SQL Analysis services can perform in-database analytics using common data mining functions and basic predictive models.
3. SAS/ACCESS can be used to access data from Hadoop and is used for creating repeatable and reusable model flow diagrams.

Although many tools are available in the market, R is the most commonly used. Now that you have insight into the nature of your data and have decided which algorithms to use, in the next stage you will apply the algorithm and build a model.

Phase 4 (Model building): In this phase, you will develop datasets for training and testing purposes. Here you need to consider whether your existing tools will suffice for running the models or whether you will need a more robust environment (such as fast and parallel processing). You will analyze various learning techniques like classification, association and clustering to build the model. You can achieve model building through the following tools.

Phase 5 (Operationalize): In this phase, you deliver final reports, briefings, code and technical documents. In addition, a pilot project is sometimes implemented in a real-time production environment. This gives you a clear picture of the performance and other related constraints on a small scale before full deployment.

Phase 6 (Communicate results): Now it is important to evaluate whether you have achieved the goal that you planned in the first phase. So, in the last phase, you identify all the key findings, communicate them to the stakeholders and determine whether the results of the project are a success or a failure based on the criteria developed in Phase 1.

What is a Data Scientist

Data scientists are big data wranglers, gathering and analyzing large sets of structured and unstructured data.
A data scientist’s role combines computer science, statistics, and mathematics. They analyze, process, and model data, then interpret the results to create actionable plans for companies and other organizations.

Data scientists are analytical experts who utilize their skills in both technology and social science to find trends and manage data. They use industry knowledge, contextual understanding, and skepticism of existing assumptions to uncover solutions to business challenges.

A data scientist’s work typically involves making sense of messy, unstructured data from sources such as smart devices, social media feeds, and emails that don’t neatly fit into a database.

Roles & Responsibilities of a Data Scientist

Management: The Data Scientist plays a minor managerial role, supporting the construction of a base of futuristic and technical abilities within the Data and Analytics field in order to assist various planned and continuing data analytics projects.

Analytics: The Data Scientist has a scientific role, planning, implementing, and assessing high-level statistical models and strategies for application to the business’s most complex issues. The Data Scientist develops econometric and statistical models for various problems including projections, classification, clustering, pattern analysis, sampling, simulations, and so forth.

Strategy/Design: The Data Scientist performs a vital role in the advancement of innovative strategies to understand the business’s consumer trends and management, as well as ways to solve difficult business problems, for instance the optimization of product fulfilment and overall profit.

Collaboration: The role of the Data Scientist is not a solitary one. In this position, the Data Scientist collaborates with senior data scientists to communicate obstacles and findings to relevant stakeholders in an effort to drive business performance and decision-making.
Knowledge: The Data Scientist also takes the lead in exploring different technologies and tools with the vision of creating innovative data-driven insights for the business at the most agile pace feasible. Here, the Data Scientist also takes the initiative in assessing and utilizing new and enhanced data science methods for the business, which are then delivered to senior management for approval.

Other Duties: A Data Scientist also performs related tasks as assigned by the Senior Data Scientist, Head of Data Science, Chief Data Officer, or the Employer.

Difference Between Data Scientist, Data Analyst, and Data Engineer

Data Scientist, Data Engineer, and Data Analyst are the three most common careers in data science. So let’s understand what a data scientist is by comparing the role with these similar jobs.

Focus
● Data Scientist: The focus is on a forward-looking view of data.
● Data Analyst: The main focus is on optimization of scenarios, for example how an employee can enhance the company’s product growth.
● Data Engineer: Data Engineers focus on optimization techniques and the construction of data in a conventional manner. The purpose of a data engineer is to continuously advance data consumption.

Work
● Data Scientist: Data scientists apply both supervised and unsupervised learning to data, for example regression, classification, neural networks, etc.
● Data Analyst: Data formation and cleaning of raw data, and interpreting and visualizing data to perform the analysis and produce the technical summary of the data.
● Data Engineer: Data engineers frequently operate at the back end. They use optimized machine learning algorithms to maintain data and to prepare it as accurately as possible.

Skills
● Data Scientist: Python, R, SQL, Pig, SAS, Apache Hadoop, Java, Perl, Spark.
● Data Analyst: Python, R, SQL, SAS.
● Data Engineer: MapReduce, Hive, Pig, and Hadoop techniques.
Application of Data Science in Real World

Fraud and Risk Detection

The earliest applications of data science were in Finance. Companies were fed up with bad debts and losses every year. However, they had a lot of data which used to be collected during the initial paperwork while sanctioning loans. They decided to bring in data scientists in order to rescue them from losses.

Over the years, banking companies learned to divide and conquer data via customer profiling, past expenditures, and other essential variables to analyze the probabilities of risk and default. Moreover, it also helped them to push their banking products based on customers’ purchasing power.

Healthcare

The healthcare sector, especially, receives great benefits from data science applications.

1. Medical Image Analysis
Procedures such as detecting tumors, artery stenosis, and organ delineation employ various methods and frameworks like MapReduce to find optimal parameters for tasks like lung texture classification. The field applies machine learning methods, support vector machines (SVM), content-based medical image indexing, and wavelet analysis for solid texture classification.

2. Genetics & Genomics
Data Science applications also enable an advanced level of treatment personalization through research in genetics and genomics. The goal is to understand the impact of DNA on our health and find individual biological connections between genetics, diseases, and drug response. Data science techniques allow integration of different kinds of data with genomic data in disease research, which provides a deeper understanding of genetic issues in reactions to particular drugs and diseases. As soon as we acquire reliable personal genome data, we will achieve a deeper understanding of human DNA. Advanced genetic risk prediction will be a major step towards more individual care.

3. Drug Development
The drug discovery process is highly complicated and involves many disciplines. The greatest ideas are often constrained by billions of tests and by huge financial and time expenditure. On average, it takes twelve years to make an official submission.

Data science applications and machine learning algorithms simplify and shorten this process, adding a perspective to each step from the initial screening of drug compounds to the prediction of the success rate based on biological factors. Such algorithms can forecast how the compound will act in the body using advanced mathematical modeling and simulations instead of “lab experiments”. The idea behind computational drug discovery is to create computer model simulations as a biologically relevant network, simplifying the prediction of future outcomes with high accuracy.

4. Virtual assistance for patients and customer support
Optimization of the clinical process builds upon the concept that in many cases it is not actually necessary for patients to visit doctors in person. A mobile application can give a more effective solution by bringing the doctor to the patient instead.

AI-powered mobile apps can provide basic healthcare support, usually as chatbots. You simply describe your symptoms, or ask questions, and then receive key information about your medical condition derived from a wide network linking symptoms to causes. Apps can remind you to take your medicine on time and, if necessary, arrange an appointment with a doctor.

This approach promotes a healthy lifestyle by encouraging patients to make healthy decisions, saves the time they would spend waiting in line for an appointment, and allows doctors to focus on more critical cases. The most popular applications nowadays are Your.MD and Ada.

Internet Search

Now, this is probably the first thing that strikes your mind when you think of Data Science Applications. When we speak of search, we think ‘Google’. Right?
But there are many other search engines like Yahoo, Bing, Ask, AOL, and so on. All these search engines (including Google) make use of data science algorithms to deliver the best result for our searched query in a fraction of a second. Consider the fact that Google processes more than 20 petabytes of data every day. Had there been no data science, Google wouldn’t have been the ‘Google’ we know today.

Targeted Advertising

If you thought Search would have been the biggest of all data science applications, here is a challenger: the entire digital marketing spectrum. Starting from the display banners on various websites to the digital billboards at the airports, almost all of them are decided by using data science algorithms.

This is the reason why digital ads have been able to get a much higher CTR (click-through rate) than traditional advertisements. They can be targeted based on a user’s past behavior. This is why you might see ads for Data Science Training Programs while I see an ad for apparel in the same place at the same time.

TOP 10 DATA SCIENCE TRENDS FOR THIS DECADE

The presence of data in every field that you can think of is one reason why organizations are showing interest in data science. The fact that data will continue to be an integral part of our lives is yet another driver of data science. That said, it’s really important to stay updated with the hottest data science trends that could help grow your business. Here are the top 10 data science trends for this decade.

1. Predictive analysis
For a business to prosper, it is critical to know what the future might look like. This is exactly where predictive analysis comes into play. Organizations rely on their customers to a large extent. Hence, being able to understand their behaviors helps in making better decisions ahead.
This technique is one of the smartest ways to come up with strategies for targeting customers, helping to retain existing customers and also to win new ones.

2. Machine learning
Over the years, we have seen how much automation has transformed the world. This is why machine learning has gained importance like never before. The coming years will see more automation, and hence the number of organizations adopting machine learning is likely to rise sharply.

3. IoT
Gone are the days when IoT was considered to be something that would have limited applications. Today, we are living in a world where our smartphones have the ability to control appliances like TVs, ACs, etc. All of this is possible because of IoT. Google Assistant is yet another remarkable innovation in the area of IoT. Thus, it comes as no big surprise that companies are looking for ways to invest in this technology. This simply throws light on how rapidly the IoT industry is expected to grow in the days ahead.

4. Blockchain
Needless to say, cryptocurrencies like Bitcoin, Litecoin, etc. have become the talk of the world. All of these currencies employ blockchain technology. With the world showing keen interest in this field, it is surely set for far-reaching implementation in the coming years.

5. Edge computing
Edge computing is known for faster processing of information, and it also reduces latency, cost and traffic. It is because of these features that organizations are not willing to sideline this option. With edge computing in place, dealing with real-time applications becomes much easier. The coming years could see a considerable shift from traditional methods to edge computing.

6. DataOps
Let’s face reality: the data pipeline has become more complex and thus requires even more integration and governance tools. This is where DataOps comes to the rescue.
It covers tasks all the way from collection to preparation to analysis, along with automated testing and delivery, in order to provide enhanced data quality and analysis. This trend will continue for the years to come.

7. Artificial Intelligence
Be it a small enterprise or a tech giant, all of them have relied on AI in one way or another. Complex tasks are no longer a concern, for we can now rely on AI for them. The reduction in errors is yet another strong reason why AI stands apart. Now that we rely on AI so much, there’s no going back!

8. Data visualization
This is one of those prominent trends that we can count on, because organizations are moving their conventional data warehouses to the cloud.

9. Better user experience
The importance a company gives to user experience speaks volumes about its success. This is why companies are leaving no stone unturned in providing the best possible user experience, be it in the form of chatbots, personal assistance, or AI-driven tools.

10. Data governance
This is yet another area that’s gaining a lot of importance. Numerous companies are still struggling to comply with rules and regulations. It is critical not just to comply with these but also to understand their impact on present and future operations. Data scientists who have sound knowledge of all of this are the need of the hour.

These trends give a clearer picture of which data science strategies need to be implemented to retain your customers and also take your business to new heights.

Current Trends in Data Science

With the diversity in data problems and requirements comes a broad range of innovative solutions. These solutions often bring with them a host of data science trends, granting businesses the agility they require while offering them deeper insights into their data.
A few of these top Data Science trends are briefly explained below:

1. Graph Analytics
With data flowing in from all directions, it becomes harder to analyze. Graph Analytics aims to solve this problem by acting as a flexible yet powerful tool that analyzes complicated data points and relationships using graphs. The intention behind using graphs is to represent complex data abstractly, in a visual format that is easier to digest and offers maximum insight. Graph Analytics is applied in a plethora of areas such as:

● Filtering out bots on social media to reduce false information
● Identifying fraud in the banking industry
● Preventing financial crime
● Analyzing power and water grids to find flaws

2. Data Fabric
Data Fabric is a relatively new trend. At its core, it encapsulates an organization’s data collected from a vast number of sources such as APIs, reusable data services, pipelines, and semantic tiers, providing transformable access to data.

Created to support the business context of data and to keep data intelligible not just for users but also for applications, Data Fabrics enable you to have scalable data while being agile. By doing so, you get unparalleled access to process, manage, store, and share the data as needed. Business Intelligence and Data Science rely heavily upon Data Fabrics because they give smooth and clean access to enormous amounts of data.

3. Data Privacy by Design
The trend of Data Privacy by Design incorporates a safer and more proactive approach to collecting and handling user data while training your machine learning model on it. Corporations need user data to train their models on real-world scenarios, and they collect data from various sources such as browsing patterns and devices.

The idea behind Federated Learning is to collect as little data as possible, keeping the user in the loop by also giving them the option to opt out and erase all collected data at any time.
While the data may come from an enormous audience, for privacy reasons it must be guaranteed that reverse-engineering the original data to identify the user isn’t possible.

4. Augmented Analytics
Augmented Analytics refers to driving better insights from the data in hand by excluding incorrect conclusions or bias, for optimized decisions. By infusing Artificial Intelligence and Machine Learning, Augmented Analytics aids users in planning a new model. With reduced dependency on data scientists and machine learning experts, Augmented Analytics aims to deliver relatively better insights on data to aid the entire Business Intelligence process.

This subtle introduction of Artificial Intelligence & Machine Learning has a significant impact on the traditional insight discovery process by automating many aspects of data science. Augmented Analytics is gaining a foothold in supporting better decisions with fewer errors and less bias in the analysis.

5. Python as the De-Facto Language for Data Science
Python is an all-rounder programming language and is considered a worthy entry point if you’re interested in getting into the world of Artificial Intelligence and Data Science. Python integrates with numerous programming languages and libraries, making it an excellent option for, say, creating a quick prototype for the problem at hand or going in-depth into large datasets. Some of its most popular libraries are:

● TensorFlow, for machine learning workloads and working with datasets
● scikit-learn, for training machine learning models
● PyTorch, for computer vision and natural language processing
● Keras, as a high-level code interface for the complex mathematical operations behind neural networks
● Spark MLlib, Apache Spark’s machine learning library, which makes machine learning more accessible with tools like algorithms and utilities

6. Widespread Automation in Data Science
Time is a critical component, and none of it should be spent on performing repetitive tasks. As Artificial Intelligence advanced, its automation capabilities expanded as well. Various innovations in automation are making many complex Artificial Intelligence tasks easier.

Automation in the field of Data Science is already simplifying much of the process, if not all of it. The entire process of Data Science includes identification of the problem, data collection, processing, exploration, analysis, and sharing of the processed information with others.

7. Conversational Analytics and Natural Language Processing
Natural Language Processing and Conversational Analytics are already making big waves in the digital world by simplifying the way we interact with machines and look up information online. NLP has hugely helped us progress into an era where computers and humans can communicate in common natural language, enabling a constant and fluent conversation between the two.

The applications of NLP and conversational systems can be seen everywhere, for example in chatbots and smart digital assistants. It has been predicted that the usage of voice-based searches will exceed the more commonly used text-based searches in a very short time.

8. Super-sized Data Science in the Cloud
Since the onset of Artificial Intelligence, the amount of data generated has skyrocketed. The size of data grew tremendously, from a few gigabytes to a few hundred, as businesses grew their online presence.

This increased requirement for data storage and processing capabilities gave rise to Data Science for a controlled and precise utilization of data, and pushed organizations working on a global scale to opt for cloud solutions.
Various cloud solution providers such as Google, Amazon, and Microsoft offer vast cloud computing options that include enterprise-grade cloud server capabilities designed for high scalability and minimal downtime.

9. Mitigate Model Biases and Discrimination
No model is entirely immune to biases, and models can begin to exhibit discriminatory behavior at any stage due to factors such as lack of sufficient data, historical bias, and incorrect data collection practices. Bias and discrimination are a common problem with models, and mitigating them is an emerging trend. If detected in time, these biases can be mitigated at three stages:

● Pre-Processing Stage
● In-Processing Stage
● Post-Processing Stage

Each stage comes with its own set of corrective measures, including algorithms and techniques to optimize the model for fairness and to improve its accuracy while reducing the chance of bias.

10. In-Memory Computing
In-Memory computing is an emerging trend that is vastly different from how we traditionally process data. In-Memory computing processes data stored in an in-memory database, as opposed to the traditional methods using hard drives and relational databases with a querying language. This technique allows for processing and querying of data in real time for instant decision making and reporting.

With memory becoming cheaper and businesses relying on real-time results, In-Memory computing enables them to have applications with richer, more interactive dashboards that can be supplied with newer data and be ready for reporting almost instantly.

11. Blockchain in Data and Analytics
Blockchain, in simpler terms, is a time-stamped collection of immutable data managed by a cluster of computers, and not by any single entity. The chain here refers to the connection between each of these blocks, bound together using cryptographic algorithms.
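The idea of time-stamped blocks bound together cryptographically can be illustrated with a toy hash chain. This is only a sketch under simplifying assumptions: it runs on a single machine, so it leaves out the "cluster of computers" part entirely (no network, no consensus), and the function names `block_hash`, `add_block` and `is_valid` are invented for this example. It uses SHA-256 from Python’s standard `hashlib` module as the binding algorithm.

```python
import hashlib
import json
import time

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, timestamp, data, prev_hash):
    """SHA-256 digest over the block's contents and its predecessor's hash."""
    payload = json.dumps(
        {"index": index, "timestamp": timestamp, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Append a time-stamped block that points at the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    timestamp = time.time()
    block = {
        "index": index,
        "timestamp": timestamp,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, timestamp, data, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Check every link and recompute every hash; any edit breaks the chain."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(
            block["index"], block["timestamp"], block["data"], block["prev_hash"]
        )
        if block["hash"] != recomputed:
            return False
    return True


chain = []
for record in ("record A", "record B", "record C"):
    add_block(chain, record)

valid_before = is_valid(chain)   # untouched chain
chain[1]["data"] = "tampered"    # silently edit a block in the middle
valid_after = is_valid(chain)    # the stored hash no longer matches
print(valid_before, valid_after)
```

Because each block’s hash covers the previous block’s hash, changing any earlier record invalidates the check, which is what makes the stored data effectively immutable once other parties hold copies of the chain.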
Blockchain is evolving gradually, much like Data Science. It is crucial for maintaining and validating records, while Data Science works on collecting data and extracting information from it. Data Science and Blockchain are related in that they both use algorithms to govern various segments of their processing.

12. Advanced Image Recognition
You upload a picture with friends on Facebook and you start getting suggestions to tag your friends. This automatic tag suggestion feature uses a face recognition algorithm.

In their latest update, Facebook has outlined the additional progress they’ve made in this area, making specific note of their advances in image recognition accuracy and capacity.

“We’ve witnessed massive advances in image classification (what is in the image?) as well as object detection (where are the objects?), but this is just the beginning of understanding the most relevant visual content of any image or video. Recently we’ve been designing techniques that identify and segment each and every object in an image, a key capability that will enable entirely new applications.”

In addition, Google provides you with the option to search for images by uploading them. It uses image recognition and provides related search results.

13. Speech Recognition
Some of the best examples of speech recognition products are Google Voice, Siri, Cortana, etc. With a speech recognition feature, you can carry on even if you aren’t in a position to type a message. Simply speak the message and it will be converted to text. However, at times you will notice that speech recognition doesn’t perform accurately.

14. Airline Route Planning
The airline industry across the world is known to bear heavy losses. Except for a few airline service providers, companies are struggling to maintain their occupancy ratio and operating profits.
A sharp rise in air-fuel prices and the need to offer heavy discounts to customers have made the situation worse. It wasn’t long before airline companies started using data science to identify strategic areas of improvement. Now, using data science, airline companies can:

1. Predict flight delays
2. Decide which class of airplanes to buy
3. Decide whether to fly directly to the destination or halt in between (for example, a flight from New Delhi to New York can take a direct route, or it can halt in another country)
4. Effectively drive customer loyalty programs

Southwest Airlines and Alaska Airlines are among the top companies that have embraced data science to bring changes in their way of working.

15. Gaming
Games are now designed using machine learning algorithms which improve or upgrade themselves as the player moves up to a higher level. In motion gaming too, your opponent (the computer) analyzes your previous moves and shapes its game accordingly. EA Sports, Zynga, Sony, Nintendo, and Activision-Blizzard have taken the gaming experience to the next level using data science.

16. Augmented Reality
This is the last of the data science applications, and the one that seems most exciting for the future. Data Science and Virtual Reality do have a relationship, considering that a VR headset combines computing knowledge, algorithms and data to provide you with the best viewing experience. A very small step towards this is the highly popular game Pokemon GO, with its ability to let you walk around and look at Pokemon on walls, streets, and other things that aren’t really there. The creators of this game used the data from Ingress, the previous app from the same company, to choose the locations of the Pokemon and gyms.