5 Python Libraries to Learn to Start Your Data Science Career by Federico Trotta Towards Data Science

Published in Towards Data Science
Federico Trotta
Dec 3, 2022 · 6 min read
5 Python Libraries to Learn to Start Your Data
Science Career
Master these libraries for a smoother career path
https://towardsdatascience.com/5-python-libraries-to-learn-to-start-your-data-science-career-2cd24a223431
Image by 200 Degrees on Pixabay
If you want to study Python for Data Science to start a new career, I’m sure you
are struggling with everything there is to learn and master. I know you are
overwhelmed by all the new concepts, including the mathematics you should
know, and you may feel you’ll never reach the goal of a new job.
I know: job descriptions do not help with that. It really seems like Data Scientists must
be aliens; even juniors, sometimes.
In my opinion, an important skill to master is learning to let go of the fear of “I have
to know everything”. Believe me: especially at the beginning, if you are pursuing a
junior position, you absolutely do not have to know everything. To tell the truth,
even seniors do not really know everything.
So, if you want to start a career in Data Science, in this article I will show you five Python
libraries you absolutely have to know.
1. Anaconda
As we can see on their website, Anaconda is:
The world’s most popular open-source Python distribution platform
Anaconda is a Python distribution specifically created for Data Science, so it is not,
strictly speaking, a library. We can still think of it as one because, in software development, a
library is a collection of related modules, and Anaconda provides the must-haves for
Data Scientists, including the most used packages. It is also a must-have for you.
The first important thing provided by Anaconda is Jupyter Notebook which is:
the original web application for creating and sharing computational documents. It offers a
simple, streamlined, document-centric experience.
A Jupyter Notebook. Image by Author.
Jupyter Notebook is a web application that runs locally on your machine and is
designed with Data Scientists in mind. The main characteristic that makes it
attractive (and very useful) for Data Scientists is that every cell can be run
on its own, giving us the possibility to:
Do mathematical and coding experiments in independent cells, without affecting
the whole code.
Write text, if needed, in Markdown cells; this makes Jupyter Notebooks the perfect
environment to present scientific work alongside your code (so, you can forget LaTeX
environments, if you want).
To get started with Jupyter Notebooks, I advise you to read this guide here.
Then, as you gain experience, you may want some shortcuts to speed up your
work. You can use this guide here.
Also, as said before, Anaconda provides us with the packages most commonly needed for Data
Science, so we don’t have to install them ourselves. For example, say you need “pandas”;
without Anaconda, you need to install it by typing
$ pip install pandas
in your terminal. With Anaconda you don’t have to do that because it installs pandas for us. A
very good advantage!
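If you are not sure whether your environment already provides a package, you can check from Python before reaching for pip. The following is a minimal sketch that uses only the standard library; the helper name installed and the list of package names are illustrative choices, not something Anaconda itself provides.

```python
from importlib.util import find_spec


def installed(names):
    """Map each top-level package name to True if it can be imported here."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Illustrative list: the import names of the libraries in this article.
    wanted = ["pandas", "matplotlib", "seaborn", "sklearn"]
    for name, ok in installed(wanted).items():
        print(f"{name}: {'available' if ok else 'missing'}")
```

Note that scikit-learn is imported under the name sklearn, which is why that name appears in the list.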
2. Pandas
Pandas is a library that lets you import, manipulate and analyze data. On their
website, they say that
pandas is a fast, powerful, flexible and easy to use open source data analysis and
manipulation tool, built on top of the Python programming language.
If you want to work with data you absolutely need to master Pandas because,
nowadays, it is widely used by Data Scientists and Analysts.
The power of Pandas lies in the fact that it lets us work with tabular
data. In statistics, tabular data refers to data that is organized in a table with rows and
columns. In Pandas, a table of this kind is called a data frame.
This is important because we work with tabular data in a lot of situations; for example:
With Excel files.
With CSV files.
With databases.
A data frame is the representation of tabular data. Image from the pandas website here:
https://pandas.pydata.org/docs/getting_started/index.html
The reality of many firms is that, regardless of your role, you’ll almost always have to deal,
somehow, with data in Excel/CSV files and/or in databases; this is why Pandas is a
fundamental resource for you to master.
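To make the idea of a data frame concrete, here is a minimal sketch of loading CSV data and summarizing it. The CSV content, the column names and the variable names are made up for illustration; with a real file you would pass its path to read_csv instead of the in-memory text.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a real file on disk.
csv_text = """city,product,units
Rome,apples,10
Rome,pears,4
Milan,apples,7
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Total units sold per city.
units_per_city = df.groupby("city")["units"].sum()

print(df.head())
print(units_per_city)
```

Each column of the data frame keeps its own type, so the units column can be summed directly after grouping by city.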
Also, consider that you can even access data from databases and load it directly into
your Jupyter Notebooks for further analysis in Pandas. We can do so using a library
called pyodbc. Take a look at that here.
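The general pattern of querying a database and getting a data frame back can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of pyodbc, so that it runs without a database server or driver, and the orders table and its contents are invented. In a real project the connection would come from pyodbc or a similar driver.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database stands in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Anna", 12.5), ("Luca", 30.0), ("Anna", 7.5)],
)
conn.commit()

# Run a SQL query and get the result back as a data frame.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY customer
"""
orders = pd.read_sql_query(query, conn)
conn.close()

print(orders)
```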
3. Matplotlib
After data manipulation and analysis with Pandas, you typically want to make some
plots. This can be done with matplotlib which is:
a comprehensive library for creating static, animated, and interactive visualizations in
Python
Matplotlib is the first plotting library I advise you to use, because it is widely
used and, in my opinion, it helps you gain coding experience.
Matplotlib lets us create the most important plots we may need:
Statistical plots like histograms or bar charts.
Scatterplots.
Boxplots.
And many more. You can start with Matplotlib here, using their tutorials.
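As a rough illustration of the three plot types listed above, here is a minimal sketch that draws them side by side. The data is synthetic and generated only for this example, and the output file name is an arbitrary choice.

```python
import matplotlib

matplotlib.use("Agg")  # Render to files; drop this line inside Jupyter Notebook.
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, generated only for illustration.
rng = np.random.default_rng(seed=0)
x = rng.normal(loc=50, scale=10, size=200)
y = 2 * x + rng.normal(scale=15, size=200)

fig, (ax_hist, ax_scatter, ax_box) = plt.subplots(1, 3, figsize=(12, 4))

ax_hist.hist(x, bins=20)
ax_hist.set_title("Histogram")

ax_scatter.scatter(x, y, s=10)
ax_scatter.set_title("Scatterplot")

ax_box.boxplot([x, y])
ax_box.set_title("Boxplot")

fig.tight_layout()
fig.savefig("basic_plots.png")
```

Every element of the figure is set through an explicit call, which is part of why Matplotlib is a good way to gain coding experience.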
4. Seaborn
At a certain point, when you’ve gained experience in analyzing data, you may not be
completely satisfied with Matplotlib; mainly (in my experience) this may be because
creating advanced plots with Matplotlib requires a lot of code. This
is where Seaborn may help you. Seaborn, in fact:
is a Python data visualization library based on matplotlib. It provides a high-level interface
for drawing attractive and informative statistical graphics.
But what does it mean that Seaborn helps us with advanced plots, letting us
write less code than Matplotlib? For example, say you have some data about people
tipping waiters. We want to plot the tip against the total bill, but we also want to
show whether the people were smokers or not and whether they were at the restaurant at
dinner or at lunch. We can do so like this:
# Import seaborn
import seaborn as sns

# Apply the default theme
sns.set_theme()

# Load the dataset
tips = sns.load_dataset("tips")

# Create the visualization
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)
And we get:
The visualization of the data coded above. The image is taken from one tutorial on the Seaborn website here:
https://seaborn.pydata.org/tutorial/introduction.html
So, as we can see, with very few lines of code we can achieve a great result thanks to
Seaborn.
So, a question may arise: “should I use Matplotlib or Seaborn?”
My advice is to start with Matplotlib and then move to Seaborn when you’ve gained
some experience because the reality is that, most of the time, we use both Matplotlib
and Seaborn (because remember: Seaborn is based on Matplotlib).
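Using both libraries together can be sketched like this: Seaborn draws the plot, and plain Matplotlib calls then adjust it. The small data frame below is made up for illustration (it only loosely resembles the tips data), and the title, label and file name are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # Render to files; drop this line inside Jupyter Notebook.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A tiny made-up data frame, loosely shaped like the tips data.
bills = pd.DataFrame(
    {
        "total_bill": [10.0, 21.5, 15.2, 30.1, 18.4, 25.0],
        "tip": [1.5, 3.0, 2.0, 5.0, 2.5, 4.0],
        "smoker": ["No", "Yes", "No", "Yes", "No", "Yes"],
    }
)

fig, ax = plt.subplots(figsize=(6, 4))

# Seaborn draws onto the Matplotlib Axes it is given.
sns.scatterplot(data=bills, x="total_bill", y="tip", hue="smoker", ax=ax)

# From here on, plain Matplotlib calls customize the figure.
ax.set_title("Tip vs. total bill")
ax.set_xlabel("Total bill")
fig.savefig("tips_scatter.png")
```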
5. Scikit-learn
The main thing that distinguishes a Data Analyst from a Data Scientist is the ability to
use Machine Learning (ML). Machine Learning is the branch of Artificial Intelligence
that focuses on the use of data and algorithms to make classifications or predictions.
In Python, ML models can be created and trained using a library called scikit-learn
(often shortened to sklearn, its import name), which is a library of:
Simple and efficient tools for predictive data analysis.
As a Data Scientist, you will do much of your Machine Learning work in scikit-learn, and
this is why it is fundamental for you to master at least the basics of this library.
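To give an idea of what those basics look like, here is a minimal sketch of the usual split, fit and score workflow. The choice of the iris dataset, of logistic regression as the model, and of the split size are assumptions made for illustration; they are not prescribed by the library.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# The iris dataset ships with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to evaluate the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train a simple classifier on the training rows only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Fraction of held-out rows that the model classifies correctly.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn models follow the same fit and score pattern, so swapping in a different model usually means changing a single line.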
Conclusions
The libraries we introduced are numbered in the order I advise you to learn them,
so follow this order. First of all, install Anaconda to set up the
environment and gain experience with Python, using Jupyter Notebooks. Then, start
analyzing data with Pandas. Then visualize data with Matplotlib first and then with
Seaborn. Finally, use scikit-learn for Machine Learning.