5 Python Libraries to Learn to Start Your Data Science Career
Master these libraries for a smoother career path

Federico Trotta · Published in Towards Data Science · Dec 3, 2022 · 6 min read
https://towardsdatascience.com/5-python-libraries-to-learn-to-start-your-data-science-career-2cd24a223431

Image by 200 Degrees on Pixabay

If you want to study Python for Data Science to start a new career, I’m sure you are struggling with everything there is to know and master. I know you are overwhelmed by all the new concepts, including the mathematics you should know, and you may feel you’ll never reach the goal of a new job.

I know: job descriptions do not help with that. It really seems like Data Scientists must be aliens; sometimes even the juniors.

In my opinion, an important skill to master is learning to overcome the fear of “I have to know everything”. Believe me: especially at the beginning, if you are pursuing a junior position, you absolutely do not have to know everything. To tell the truth, even seniors do not really know everything.

So, if you want to start a career in Data Science, in this article I show you five Python libraries you absolutely have to know.

1. Anaconda

As we can see on their website, Anaconda is:

“The world’s most popular open-source Python distribution platform”

Anaconda is a Python distribution created specifically for Data Science, so it is not strictly a library. Still, in software development a library is a collection of related modules, and Anaconda provides all the must-haves for Data Scientists, including the most used packages. For this reason we can think of it as a library, and it is a must-have for you.

The first important thing provided by Anaconda is Jupyter Notebook, which is:

“the original web application for creating and sharing computational documents. It offers a simple, streamlined, document-centric experience.”

A Jupyter Notebook. Image by Author.

Jupyter Notebook is a web application that runs locally on your machine, and it was created with Data Scientists in mind. The main characteristic that makes it attractive (and very useful) for Data Scientists is that each cell can be run on its own, which gives us the possibility to:

- Do mathematical and coding experiments in separate cells, without rerunning the whole code.
- Write text, if needed, in its own cells; this makes Jupyter Notebooks the perfect environment to present scientific work together with your code (so you can forget LaTeX environments, if you want).

To get started with Jupyter Notebooks, I advise you to read this guide here.

Then, when you gain experience, you may need some shortcuts to speed up your work.
You can use this guide here.

Also, as said before, Anaconda provides us with the packages most used in Data Science, so we don’t have to install them ourselves. For example, say you need “pandas”; without Anaconda, you need to install it by typing

$ pip install pandas

in your terminal. With Anaconda you don’t have to do that because it installs pandas for us. A very good advantage!

2. Pandas

Pandas is a library that lets you import, manipulate and analyze data. On their website, they say that pandas is

“a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.”

If you want to work with data you absolutely need to master Pandas because, nowadays, it is widely used by Data Scientists and Analysts.

The power of Pandas lies in the fact that this library lets us work with tabular data. In statistics, tabular data refers to data that is organized in a table with rows and columns. In Pandas, tabular data is represented as a data frame.

This is important because we work with tabular data in a lot of situations; for example:

- With Excel files.
- With CSV files.
- With databases.

A data frame is the representation of tabular data. Image from the pandas website here: https://pandas.pydata.org/docs/getting_started/index.html

The reality of many firms is that, regardless of your role, you’ll almost always have to deal, somehow, with data in Excel/CSV files and/or in databases; this is why Pandas is a fundamental resource for you to master.

Also, consider that you can even access data from databases and load it directly into your Jupyter Notebooks for further analysis in Pandas. We can do so using a library called pyodbc. Take a look at that here.

3. Matplotlib

After data manipulation and analysis with Pandas, you typically want to make some plots. This can be done with Matplotlib, which is:

“a comprehensive library for creating static, animated, and interactive visualizations in Python”

Matplotlib is the first plotting library I advise you to use, because it is widely used and, in my opinion, it helps you gain coding experience.

Matplotlib lets us draw the most important plots we may need:

- Statistical plots like histograms or bar charts.
- Scatterplots.
- Boxplots.

And many more.

You can start with Matplotlib here, using their tutorials.

4. Seaborn

At a certain point, when you’ve gained experience in analyzing data, you may not be completely satisfied with Matplotlib; in my experience, this is mainly because advanced plots require a lot of code in Matplotlib. This is where Seaborn may help you. Seaborn, in fact, is:

“a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.”

But what does it mean that Seaborn mainly helps us with advanced plots, letting us write less code than Matplotlib? For example, say you have some data about people tipping waiters. We want to plot the tip against the total bill, but we also want to show whether the people were smokers and whether they were at the restaurant for dinner or for lunch.
We can do so like this:

    # Import seaborn
    import seaborn as sns

    # Apply the default theme
    sns.set_theme()

    # Load the example dataset (downloaded on first use, so it needs an internet connection)
    tips = sns.load_dataset("tips")

    # Create the visualization: one panel per meal time,
    # with color and marker style by smoker and marker size by party size
    sns.relplot(
        data=tips,
        x="total_bill", y="tip", col="time",
        hue="smoker", style="smoker", size="size",
    )

And we get:

The visualization of the data coded above. The image is taken from one tutorial on the Seaborn website here: https://seaborn.pydata.org/tutorial/introduction.html

So, as we can see, with very few lines of code we can achieve a great result thanks to Seaborn.

So, a question may arise: “should I use Matplotlib or Seaborn?”

My advice is to start with Matplotlib and then move to Seaborn when you’ve gained some experience because the reality is that, most of the time, we use both Matplotlib and Seaborn (remember: Seaborn is based on Matplotlib).

5. Scikit-learn

The main thing that distinguishes a Data Analyst from a Data Scientist is the ability to use Machine Learning (ML). Machine Learning is the branch of Artificial Intelligence that focuses on the use of data and algorithms to make classifications or predictions.

In Python, ML models can be created and trained using a library called scikit-learn (sometimes called sklearn, after its import name), which is a library of:

“Simple and efficient tools for predictive data analysis.”

As a Data Scientist, much of your everyday Machine Learning work is done in scikit-learn, and this is why it is fundamental for you to master at least the basics of this library.
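To give an idea of what “the basics” of scikit-learn look like, here is a minimal sketch. It is not taken from the scikit-learn tutorials: the choice of the iris dataset bundled with the library, the logistic regression model, the 25% test split and the helper name train_and_score are all my assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_score(random_state=0):
    """Train a classifier on the iris data and return its test accuracy.

    Hypothetical helper written for this sketch, not part of scikit-learn.
    """
    # Load a small dataset bundled with scikit-learn: features X, labels y
    X, y = load_iris(return_X_y=True)

    # Hold out 25% of the rows to evaluate the model on data it has not seen
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )

    # Create the model, train it, and predict the held-out labels
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    # Fraction of held-out rows that were classified correctly
    return accuracy_score(y_test, predictions)


accuracy = train_and_score()
print(f"Test accuracy: {accuracy:.2f}")
```

The same pattern (split the data, fit, predict, score) applies to most scikit-learn models, so it is a good first thing to practice.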
Conclusions

The libraries are listed in the order I advise you to learn them.

So, first of all, install Anaconda to set up the environment and gain experience with Python, using Jupyter Notebooks. Then, start analyzing data with Pandas. Then visualize data with Matplotlib first and then with Seaborn. Finally, use scikit-learn for Machine Learning.
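As a small taste of the first steps in that order, here is a rough sketch that builds a data frame with Pandas and plots it with Matplotlib. The numbers are made up for illustration, and the helper names mean_tip_by_time and plot_bill_vs_tip are hypothetical, not part of either library.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line inside a Jupyter Notebook

import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only: five restaurant bills
bills = pd.DataFrame(
    {
        "time": ["Lunch", "Lunch", "Dinner", "Dinner", "Dinner"],
        "total_bill": [12.0, 18.0, 25.0, 31.0, 40.0],
        "tip": [2.0, 3.0, 4.0, 5.0, 6.0],
    }
)


def mean_tip_by_time(df):
    """Analyze with Pandas: average tip for each meal time."""
    return df.groupby("time")["tip"].mean()


def plot_bill_vs_tip(df):
    """Visualize with Matplotlib: scatterplot of tip against total bill."""
    fig, ax = plt.subplots()
    ax.scatter(df["total_bill"], df["tip"])
    ax.set_xlabel("total_bill")
    ax.set_ylabel("tip")
    ax.set_title("Tip versus total bill")
    return fig


summary = mean_tip_by_time(bills)
print(summary)

fig = plot_bill_vs_tip(bills)
fig.savefig("bill_vs_tip.png")  # writes the plot to the current folder
```

Once this feels comfortable, the Seaborn and scikit-learn steps build directly on the same data frame.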