Cross-Validation Techniques

Published in Geek Culture · Abhigyan · Aug 30, 2021 · 13 min read

This article aims to explain different Cross-Validation techniques and how they work.

Contents:

→ Introduction
→ What is Cross-Validation?
→ Different Types of Cross-Validation
1. Hold-Out Method
2. K-Folds Method
3. Repeated K-Folds Method
4. Stratified K-Folds Method
5. Group K-Folds Method
6. Shuffle Split Method
7. Stratified Shuffle Split Method
8. Group Shuffle Split Method
9. Leave-One-Out Method
10. Leave-P-Out Method
11. Leave-One-Group-Out Method
12. Leave-P-Group-Out Method
13. Time Series Cross-Validation Method
14. Blocked Cross-Validation Method
15. Nested Cross-Validation Method
→ Conclusion
→ Reference

Introduction

Imagine building a model on a dataset only to have it fail on unseen data. We cannot simply fit the model to our training data and sit back, hoping it will perform brilliantly on the real unseen data. That would be a case of over-fitting, where the model has learned all the patterns and the noise of the training data. To avoid this, we need some way to check that the model has captured most of the underlying patterns without picking up every bit of noise in the data (low bias and low variance). Cross-Validation is one of the many techniques for doing so.

What is Cross-Validation?

• In machine learning, Cross-Validation is a technique that evaluates an ML model by training several models on subsets of the input data and evaluating each one on the complementary subset of the data.
• It is mainly used to estimate any quantitative measure of fit that is appropriate for both the data and the model.
• In the cross-validation approach, the test results are usually not biased, because the data used for training and testing are mostly non-overlapping.

Let's first create two variables which I will be using in the demonstrations that follow:

```python
import pandas as pd

data = ['Subset1', 'Subset2', 'Subset3', 'Subset4', 'Subset5',
        'Subset6', 'Subset7', 'Subset8', 'Subset9', 'Subset10']
Y = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

df = pd.DataFrame({"data": data, "Y": Y})
df
```

Different methods of Cross-Validation are:

→ Hold-Out Method:

• It is a simple train-test split method.
• Once the train-test split is done, we can further split the test data into validation data and test data (see the sketch after the code below). For example:
1. Suppose there are 1,000 data points; we split them into 80% train and 20% test.
2. The train data then consists of 800 data points and the test data of 200 data points.
3. We then split our test data into 50% validation data and 50% test data.

```python
from sklearn import model_selection

x_train, x_test, y_train, y_test = model_selection.train_test_split(
    df.data, df.Y, test_size=0.2)

for i, n in zip(x_train, y_train):
    print(i, "::", n)

for i, n in zip(x_test, y_test):
    print(i, "::", n)
```
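The snippet above performs only the first split. As a minimal sketch of step 3, assuming scikit-learn's train_test_split is reused for the second stage (the names x_hold and y_hold are mine for illustration, not from the original code), the held-out 20% can be divided evenly into validation and test sets:

```python
from sklearn import model_selection

# First split: 80% train, 20% held out, as in the example above.
x_train, x_hold, y_train, y_hold = model_selection.train_test_split(
    df.data, df.Y, test_size=0.2)

# Second split: divide the held-out portion 50/50 into validation and test sets.
x_val, x_test, y_val, y_test = model_selection.train_test_split(
    x_hold, y_hold, test_size=0.5)

print(len(x_train), "train /", len(x_val), "validation /", len(x_test), "test")
```

The model is then fit on the training set, tuned against the validation set, and scored once on the final test set, so the test score stays untouched by any tuning decisions.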