Software & Tools For Data Science

Dr. Nancy Agnes, Head, Technical Operations, Tutorsindia
info@tutorsindia.com

I. INTRODUCTION

Data Science is an analytical field that depends on large amounts of data, such as Big Data, to analyze a business problem and provide an accurate solution. Handling a huge amount of data is not an easy task. To avoid manual errors, the computational and logical processes are automated through tools and software. With these tools, a problem can be solved in minimal time and with high accuracy. [1]

II. NEED FOR SOFTWARE AND TOOLS

Several tools and software packages combine high flexibility with good data extraction and visualization features, and they remain accurate even when the data is large. Many of them provide highly efficient and accurate results. [4]

III. EFFICIENCY OF SOFTWARE AND TOOLS

By using software and tools, accurate results for a large number of business datasets can be obtained efficiently. Tools and software also help transform data that exists in structured or semi-structured form into a visual format. Each tool has its own way of representing data graphically. The tools generate results and outcomes based on the reports imported into them. The purpose of data science tools and software is to extract, manipulate, and process data. Structured data on its own does not convey much information; these tools convert it into useful information. [3]

IV. RECENT TOOLS AND SOFTWARE IN DATA SCIENCE

1. Tableau: Tableau is a complete data visualization tool. It supports all kinds of worksheets and structured data for data processing, exploratory data analysis, and database compatibility. It is not an open-source platform, and its use depends on the organization's needs. Its visualizations are attractive and well presented. An organization may hold large volumes of data on annual revenue, turnover and losses, and employee strength relative to productivity. Analyzing these data helps it understand current market values and strategies and forecast its own strength. For instance, the number of Netflix viewers may rise or fall depending on the shows released in a given period, and many viewers may cancel their accounts because of poor streaming quality. Netflix analyzes the root cause of these cancellations, and based on the analytics report, further modifications and recommendations are published. [2]

2. Jupyter Notebook: Jupyter Notebook is among the leading tools in the data science market because it is compatible with both of the statistical analysis languages, Python and R. It is a web-based application that supports all kinds of worksheets and spreadsheets for data extraction and data manipulation.

3. Matplotlib: Matplotlib was developed for the Python language to provide plotting and visualization features. It offers several modules for visualization; for instance, Pyplot provides functions for graphs and plots. [5]

4. Python: In recent years, many data scientists have settled on the Python language, which provides flexible packages for statistical and mathematical analysis. Python also works with related tools such as SciPy, Dask, HPAT, and Cython, which add flexibility and reliability.

5. R and RStudio: Like Python, R and RStudio are designed for statistical and mathematical analytics. RStudio is an open-source platform. Its console supports many library packages and analytical functions. [6]

6. BigML: BigML is built entirely on machine learning algorithms for data science and data analytics. It provides flexible packages for automated regression, linear regression analysis, cluster analysis, anomaly detection, and forecasting of time series data. BigML is available online from its source website, bigml.com.

V. FUTURE SCOPE

1. As data is generated everywhere around the world, handling and manipulating large volumes of it becomes a tedious process. The need for data scientists is therefore vast, and processing large amounts of data with automation tools provides better results.

2. Errors in manual computation lead to recomputation, which is time-consuming. To avoid those manual errors, tools and software with high efficiency and accurate results are needed, including for forecasting and predictive analysis.

3. Software and tools need far less processing time than manual computation, even for a small number of datasets. Automation tools predict and provide the outcome based on the trained data set.

VI. SUMMARY

The world is full of data, and that data can be stored either physically or virtually. Handling all of it is not a single-day process. It is routine for data scientists to work through tedious data and produce output from it. Datasets can be manipulated efficiently through recent technology-based tools such as Artificial Intelligence, Machine Learning, and Cloud computing algorithms.

REFERENCES

1. Zhang, Amy X., Michael Muller, and Dakuo Wang. "How do data science workers collaborate? Roles, workflows, and tools." Proceedings of the ACM on Human-Computer Interaction 4.CSCW1 (2020): 1-23.

2. Saling, Kristin C., and Michael D. Do. "Leveraging People Analytics for an Adaptive Complex Talent Management System." Procedia Computer Science 168 (2020): 105-111.

3. Bloice, Marcus D., and Andreas Holzinger. "A tutorial on machine learning and data science tools with python." Machine Learning for Health Informatics (2016): 435-480.

4. Van Der Aalst, Wil. "Data science in action." Process Mining. Springer, Berlin, Heidelberg, 2016. 3-23.

5. Ari, Niyazi, and Makhamadsulton Ustazhanov. "Matplotlib in python." 2014 11th International Conference on Electronics, Computer and Computation (ICECCO). IEEE, 2014.

6. Stander, Julian, and Luciana Dalla Valle. "On enthusing students about big data and social media visualization and analysis using R, RStudio, and RMarkdown." Journal of Statistics Education 25.2 (2017): 60-67.

Copyright © 2021 TutorsIndia. All rights reserved
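
As an illustration of the linear regression and time-series forecasting features listed under BigML in Section IV, and of the point in Section V that automated tools predict an outcome from a trained data set, the following is a minimal sketch in Python. It uses only the standard library and does not call BigML or any other tool named above. The function name fit_line and the monthly subscriber figures are hypothetical and invented purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means all x values are identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    # Sum of cross-deviations between x and y.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Invented "training" data: subscriber counts (in thousands) for months 1 to 5.
months = [1, 2, 3, 4, 5]
subscribers = [100, 110, 120, 130, 140]

# Fit the trend on the known months, then extrapolate one month ahead.
slope, intercept = fit_line(months, subscribers)
forecast_month_6 = slope * 6 + intercept

print(f"slope={slope:.1f}, intercept={intercept:.1f}, month 6 forecast={forecast_month_6:.1f}")
```

With this invented data the fitted trend is 10 thousand subscribers per month, so the script prints a month 6 forecast of 150.0. Real datasets are rarely this regular, and a dedicated tool would also report how well the line fits. The sketch only shows the fit-then-predict pattern that such tools automate.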