
Outliers and Overfitting when Machine Learning Models can’t Reason -
19/07/2023, 15:18
Outliers and Overfitting when Machine Learning Models can’t Reason

Mobarak Inuwa — Published On July 5, 2022
Algorithm Beginner Machine Learning
This article was published as a part of the Data Science Blogathon.
Introduction
Datasets are to machine learning models what experiences are to human beings. Have you
ever witnessed a strange occurrence? What exactly do you consider to be strange? What
constitutes an odd event? Is it based on comparisons with uncommon circumstances or
things that have never happened to you? Accordingly, a weird encounter, incident, or event
deviates significantly from the norm.
Source: https://www.freepik.com/free-photo/economicalresearch_5402552.htm#query=graphs&position=10&from_view=search
https://www.analyticsvidhya.com/blog/2022/07/outliers-and-overfitting-when-machine-learning-models-cant-reason/
Humans base their decisions on specific events that have occurred in the past and use
reasoning to overcome unexpected situations. When odd data that has never been seen
before enters the picture, machines try to absorb it as it is, since they employ analytics rather
than reasoning. This occurs because machines are limited in their ability to think creatively,
which in this case means thinking outside of their datasets.
What are Outliers in Machine Learning?
When people run into something unexpected, their smooth flow is interrupted, and they are
forced to alter their usual behavior. Similarly, when a distribution or dataset from which a
computer should learn contains an unusual input that stands out, that input is referred to as
an outlier. Such a point disrupts the standard, common flow of the dataset. Outliers are
usually found in physical measurements. They occur when either the tool or the person using
it makes a mistake, or when an unusual event in the measurement environment causes a
disturbance.
When outliers occur in machine learning, the model encounters something strange. The outlier
pulls the model away from the usual pattern in the data, which can contribute to what is
known as overfitting in machine learning.
Using simple strategies, such as sorting and grouping the dataset, we can quickly detect the
presence of outliers. Once the data is sorted or grouped, an outlier goes against the grain
and becomes more apparent.
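As a minimal sketch of the sorting and grouping idea, the Python snippet below uses made-up readings (the values, the bucket width of 10, and the "largest gap" heuristic are all assumptions for illustration, not part of the original article). Sorting pushes the odd value to one end of the list, and grouping leaves it alone in its own bucket.

```python
from collections import Counter

# Hypothetical measurements; the values are invented for illustration.
readings = [21.4, 22.0, 21.8, 95.0, 22.3, 21.9, 22.1]

# Sorting pushes unusual values to the ends of the list.
sorted_readings = sorted(readings)

# Gap between each value and its neighbor in sorted order.
gaps = [b - a for a, b in zip(sorted_readings, sorted_readings[1:])]

# Assumption: the value just after the largest gap is the suspect.
largest_gap = max(gaps)
suspect = sorted_readings[gaps.index(largest_gap) + 1]

# Grouping: count how many readings fall into each bucket of width 10.
groups = Counter(int(v // 10) * 10 for v in readings)

print(sorted_readings)
print("suspect value:", suspect)
print("bucket counts:", dict(groups))
```

This heuristic only looks for a single unusually high value; real datasets may need a more careful rule.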
What is Overfitting?
Generally speaking, machines are faster and more accurate than people. But the capacity to
deduce is one area where computers fall short. While humans are deductive, machine learning
models operate mainly from analysis or analytics. In other words, computers work from
statistics, while people work from reasoning. By reasoning, rather than just accepting things,
we can make decisions that depart entirely from what is already in place. Computers, however,
do not think; instead, they operate on the so-called “garbage in, garbage out” principle.
Consider an educational setting, for example. A school aims to create a machine learning
model that predicts the graduating grades of new students using the test results of previous
students from the school database. The dataset will include the scores of students in various
courses. Take one student’s entries in the dataset, and assume that this student has had 40
courses recorded so far. Suppose this student is performing well, passing 39 courses with a
grade above 90 percent, before abruptly failing one course with a grade below 10 percent.
This sub-10% score stands out in the dataset as an outlier because it departs noticeably from
this student’s typical distribution.
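To make the example concrete, here is a rough Python sketch that builds a hypothetical version of this student's record and flags the odd score with a z-score check. The exact scores and the cutoff of 3 standard deviations are assumptions chosen for illustration; the article does not prescribe a detection rule.

```python
import statistics

# Hypothetical record: 39 courses above 90 percent and one below 10 percent.
scores = [92 + (i % 7) for i in range(39)] + [8]

mean = statistics.mean(scores)
stdev = statistics.pstdev(scores)  # population standard deviation

# z-score: how many standard deviations each score lies from the mean.
z_scores = [(s - mean) / stdev for s in scores]

# Assumed cutoff: flag scores more than 3 standard deviations from the mean.
outliers = [s for s, z in zip(scores, z_scores) if abs(z) > 3]

print("mean:", round(mean, 2), "stdev:", round(stdev, 2))
print("flagged scores:", outliers)
```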
Source: Fat man photo created by Anastasia Kazakova
The student was likely ill while taking this course, which would have harmed their
performance. Alternatively, it’s possible that the course lecturer miscalculated the student’s
grade. A human mind might be able to work this out, but our model would need more than
data analysis to overcome this problem. Humans can solve this issue via reasoning.
Computers cannot think, so they accept the situation as it is. This can occasionally have a
significant impact on the model’s accuracy or functionality, and overfitting issues arise as a
result.
Overfitting is when a machine learning model fits its training data too closely, including
uncommon occurrences such as outliers, and therefore produces incorrect results on new
data. Put differently, the model ends up stressing something illogical.
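The following Python sketch illustrates this effect under stated assumptions: the "true" pattern is taken to be y = 2x, one training point is corrupted to act as an outlier, and two polynomial models of degree 1 and degree 9 are fitted with NumPy. None of these choices come from the article; they are only one way to show a flexible model chasing an outlier.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Assumed ground truth for this sketch: y = 2x.
x_train = np.arange(10, dtype=float)
y_train = 2.0 * x_train
y_train[5] = 100.0  # the outlier (the true value would be 10)

# Held-out points that follow the true pattern.
x_test = x_train[:-1] + 0.5
y_test = 2.0 * x_test

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((model(x) - y) ** 2))

# A simple model (straight line) and a very flexible one (degree 9).
simple = Polynomial.fit(x_train, y_train, deg=1)
flexible = Polynomial.fit(x_train, y_train, deg=9)

for name, model in [("degree 1", simple), ("degree 9", flexible)]:
    print(name,
          "| train MSE:", mse(model, x_train, y_train),
          "| test MSE:", mse(model, x_test, y_test))
```

The degree 9 model passes through every training point, including the outlier, so its training error is lower, while its error on the held-out points is higher than that of the straight line.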
How to tackle Outliers and Overfitting?
Because computers cannot reason and take the data as it is, we most often combat
overfitting caused by outliers by removing the outliers from our work. Using the student in
the institution as an example: when one grade out of 40, where the others average above
90%, falls below 10%, we can delete it or, better yet, replace it with the most likely value,
which is the average of the other scores. By the reasoning in our example, this should be the
correct conclusion, but it might not hold true in all cases of outliers.
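Both options can be sketched in a few lines of Python. The scores below are hypothetical, and the position of the outlier is assumed to be already known from an earlier detection step.

```python
import statistics

# Hypothetical record: 39 courses above 90 percent and one below 10 percent.
scores = [92 + (i % 7) for i in range(39)] + [8]
outlier_index = 39  # assumed to be known from a prior detection step

# Option 1: delete the outlier.
dropped = [s for i, s in enumerate(scores) if i != outlier_index]

# Option 2: replace the outlier with the average of the other scores.
replacement = statistics.mean(dropped)
cleaned = list(scores)
cleaned[outlier_index] = replacement

print("average of the other scores:", round(replacement, 2))
print("mean before:", round(statistics.mean(scores), 2))
print("mean after replacement:", round(statistics.mean(cleaned), 2))
```

Replacing the value keeps the record at 40 entries, while deleting it shortens the record to 39; which is preferable depends on the dataset.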
It is also worth noting that some machine learning models perform well even when outliers
are present, while others fail badly. Which of the two happens depends on how the model is
constructed and designed.
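As a rough illustration of why design matters, the Python sketch below treats two summary statistics, the mean and the median, as stand-ins for two tiny "models". This comparison is an assumption added for illustration and is not taken from the article; the scores are the same hypothetical student record used above.

```python
import statistics

# Hypothetical record: 39 courses above 90 percent and one below 10 percent.
scores = [92 + (i % 7) for i in range(39)] + [8]
clean = scores[:-1]  # the same record without the outlier

# How far does each statistic move when the outlier is included?
mean_shift = abs(statistics.mean(scores) - statistics.mean(clean))
median_shift = abs(statistics.median(scores) - statistics.median(clean))

print("shift in mean:", round(mean_shift, 2))
print("shift in median:", round(median_shift, 2))
```

A statistic built on averages is pulled toward the outlier, while one built on rank order barely moves.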
Are Anomaly and Outlier the Same?
It is worth answering this question as well. The word “anomaly” is frequently used in data
science activities. Most of the time, it represents different information and differs from an
outlier. One way the two differ is that outliers occur in the dataset itself, are comparatively
few in number, and lie far away from the other points.
On the other hand, an anomaly is defined in data science as an output that may represent a
distribution or pattern but does not accurately reflect the dataset. Anomalies are more like
findings that might not have originated from outliers, while outliers are points that deviate
from the distribution and can be seen separately. Outliers can sometimes be mistaken for
anomalies, while the reverse is not always true.
Though in some rare cases outliers can be called anomalies, anomalies cannot be called
outliers.
For example, when dealing with an outlier we may delete the record and move on, but an
anomaly needs some preprocessing treatment; it cannot logically be removed or deleted right
away. As a result, we can argue that anomalies are outcomes and outliers are flawed inputs.
We look for outliers in our datasets up front, but anomalies can appear later, even in an
experiment that was previously good and clean.
Conclusion
Outliers aren’t always bad; we don’t always need to get rid of them or count them against the
data. In some cases, they provide essential information and serve as the project’s cherry on
top. Mathematically, however, outliers interfere with a model’s outcomes because most
machine learning models use ranges, averages, and distributions to apply their learning. The
presence of outliers therefore changes how the models and algorithms behave. For this
reason, it is more often necessary to remove outliers. Anomalies will usually require
reprocessing, and they might be due to outliers.
Key points to note:
- Humans utilize reasoning to overcome unforeseen situations and base their conclusions on
  specific events that have happened in the past.
- When a distribution or dataset from which a computer should learn has an odd input that
  stands out, that input is referred to as an outlier.
- Overfitting is when a model fits unusual occurrences in its input data so closely that it
  provides false results. Alternatively, the model can emphasize an illogical point.
- It is essential to remember that while some machine learning models may succeed even in
  the presence of outliers, others will utterly fail, depending on how the model is built and
  designed.
- Outliers can occasionally be mistaken for anomalies, while the opposite is not usually true.
The media shown in this article is not owned by Analytics Vidhya and is used at the
Author’s discretion.