
Cross-Validation Techniques

https://medium.com/geekculture/cross-validation-techniques-33d389...
Published in Geek Culture
Abhigyan
Aug 30, 2021 · 13 min read
This article aims to explain different Cross-Validation techniques and
how they work.
Contents:
→ Introduction
→ What is Cross-Validation?
→ Different Types of Cross-Validation
1. Hold-Out Method
2. K-Folds Method
3. Repeated K-Folds Method
4. Stratified K-Folds Method
5. Group K-Folds Method
6. Shuffle Split Method
7. Stratified Shuffle Split Method
8. Group Shuffle Split Method
9. Leave-One-Out Method
10. Leave-P-Out Method
11. Leave-One-Group-Out Method
12. Leave-P-Group-Out Method
13. Time Series Cross-Validation Method
14. Blocked Cross-Validation Method
15. Nested Cross-Validation Method
→ Conclusion
→ Reference
Introduction
Imagine building a model on a dataset, only to have it fail on unseen data.
We cannot just fit the model on our training data and sit back hoping it will perform
brilliantly on real, unseen data.
This is a case of over-fitting, where our model has learned all the patterns and noise of
the training data. To avoid this, we need some way to guarantee that our model has
captured most of the patterns and is not picking up every bit of noise in the data (low
bias and low variance). One of the many techniques to handle this is Cross-Validation.
What is Cross-Validation?
• In machine learning, Cross-Validation is a technique that evaluates an ML model by
training several models on subsets of the input data and evaluating them on the
complementary subset of the data.
• It is mainly used to estimate any quantitative measure of fit that is appropriate for
both the data and the model.
• In the cross-validation approach, the test results are less biased because, within
each split, the data used for training and testing do not overlap.
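Before the hand-built examples below, the idea of "training several models on subsets and scoring them on the complementary subset" can be sketched in a few lines with scikit-learn's cross_val_score. This is a minimal sketch; the iris dataset and logistic regression model here are just illustrative choices, not part of the original article:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# With cv=5, five models are trained; each is scored on the
# one fold it never saw during training.
scores = cross_val_score(model, X, y, cv=5)
print(scores)        # one accuracy score per held-out fold
print(scores.mean())
```

Each entry in `scores` comes from a model that never saw its test fold, which is exactly the non-overlapping property described above.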
Let’s first create two variables which I will be using to demonstrate further:

import pandas as pd

data = ['Subset1', 'Subset2', 'Subset3', 'Subset4', 'Subset5',
        'Subset6', 'Subset7', 'Subset8', 'Subset9', 'Subset10']
Y = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

df = pd.DataFrame({"data": data, "Y": Y})
df
Different methods of Cross-Validation are:
→ Hold-Out Method:
• It is a simple train-test split method.
• Once the train-test split is done, we can further split the test data into validation
data and test data.
For example:
1. Suppose there are 1,000 data points; we split the data into 80% train and 20% test.
2. The train data consists of 800 data points and the test data contains 200 data points.
3. Then we split our test data into 50% validation data and 50% test data.
from sklearn import model_selection

x_train, x_test, y_train, y_test = model_selection.train_test_split(
    df.data, df.Y, test_size=0.2)

for i, n in zip(x_train, y_train):
    print(i, "::", n)

for i, n in zip(x_test, y_test):
    print(i, "::", n)
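The snippet above stops at a single train-test split; the further split into validation and test data described in step 3 can be done by calling train_test_split a second time on the held-out portion. A minimal sketch on the same toy dataframe (random_state=42 is my addition, used only for reproducibility):

```python
import pandas as pd
from sklearn import model_selection

data = ['Subset1', 'Subset2', 'Subset3', 'Subset4', 'Subset5',
        'Subset6', 'Subset7', 'Subset8', 'Subset9', 'Subset10']
Y = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
df = pd.DataFrame({"data": data, "Y": Y})

# First split: 80% train, 20% held out.
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    df.data, df.Y, test_size=0.2, random_state=42)

# Second split: the held-out 20% becomes 50% validation, 50% test.
x_val, x_test, y_val, y_test = model_selection.train_test_split(
    x_test, y_test, test_size=0.5, random_state=42)

print(len(x_train), len(x_val), len(x_test))  # 8 1 1
```

With only 10 rows this leaves a single point each for validation and test, which is exactly why the article moves on to K-Folds and the other resampling methods for small datasets.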