Application of machine learning
1. Image recognition
2. Speech recognition
3. Traffic prediction
4. Product recommendation
5. Self-driving cars
6. Online fraud detection
7. Automatic language translation

Machine learning life cycle
1. Data gathering: in this step we collect data from various sources such as files, databases, the internet and mobile devices.
2. Data preparation: after collecting the data, we organize it for further analysis and prepare it for use in machine learning training.
3. Data wrangling: data wrangling is the process of cleaning and converting raw data into a usable format. It is the process of cleaning the data, selecting the variables to use and transforming the data into the proper format for further steps. In this step we handle:
   1. Missing values
   2. Duplicate data
   3. Invalid data
   4. Noise
4. Data analysis: selecting the analytical technique (such as classification, regression or clustering), building models and reviewing the results.
5. Train model
6. Test model
7. Deployment

Machine Learning Pipeline
Training set: used to train and teach a machine learning model. The goal is for the model to identify patterns and relationships so that it can make accurate predictions or classifications later on.
Testing set: used to check the model's performance. If training accuracy increases while test accuracy decreases, the model is overfitting. If both training and test accuracy are low, the model is underfitting.
Validation set: used for hyperparameter tuning.
Feature extraction: these techniques reduce the number of features from the original feature set in order to reduce model complexity and overfitting, enhance computational efficiency and reduce generalization error.

Supervised learning
Evaluation metrics provide objective and measurable criteria to compare and evaluate different models or algorithms.
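The data wrangling step above (missing values, duplicates, invalid data) can be sketched with pandas. This is a minimal illustration, not a fixed recipe: the column names and values below are made up for the example.

```python
import pandas as pd

# Hypothetical raw data exhibiting the wrangling issues listed above:
# a duplicate row, a missing age (None) and an invalid age (-3).
raw = pd.DataFrame({
    "age":    [25,    25,    None,  -3,    40],
    "income": [50000, 50000, 62000, 58000, 71000],
})

df = raw.drop_duplicates()                        # remove duplicate data
df = df[df["age"].isna() | (df["age"] >= 0)].copy()  # drop invalid data (negative age)
df["age"] = df["age"].fillna(df["age"].median())  # fill missing values
# Noise would typically be handled here too, e.g. by smoothing or clipping outliers.

print(df)
```

The order of the steps matters: duplicates are dropped first so they do not distort the median used to fill missing values.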
Evaluation metrics are used at various stages of the machine learning workflow, including model development, model selection and model deployment. Here are a few key reasons why evaluation metrics are important:
1. Model comparison: compare the performance of different models or algorithms.
2. Model selection: evaluation metrics help in selecting the best model that meets the desired performance criteria.
3. Hyperparameter tuning
4. Model deployment: evaluation metrics play a critical role in determining whether a model is ready for deployment.
Accuracy, precision, recall, F1 score and mean squared error are some commonly used metrics for classification and regression tasks.

Confusion matrix
A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table with 4 different combinations of predicted and actual values. It is extremely useful for measuring recall, precision, specificity, accuracy and AUC-ROC curves.

The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability. It tells how well the model can distinguish between classes: the higher the AUC, the better the model is at predicting 0 classes as 0 and 1 classes as 1.

Cross validation
"Cross-validation is a statistical method of evaluating and comparing learning algorithms." The data is divided into two parts:
- Training: to learn or train a model
- Testing: to validate the model
Cross validation is used for:
- Performance evaluation: evaluate the performance of a classifier using the given data.
- Model selection: compare the performance of two or more algorithms to determine the best algorithm for the given data.
- Tuning model parameters: compare the performance of two variants of a parametric model.
Types of cross validation:
1. Resubstitution validation
2. K-fold cross validation
3. Leave-one-out cross validation
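As a concrete illustration of the confusion matrix and the classification metrics derived from it, here is a minimal sketch in plain Python; the two label lists are invented for the example.

```python
# Hypothetical binary labels: 1 = positive class, 0 = negative class.
actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# The 4 cells of the confusion matrix.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)  # also called sensitivity
f1        = 2 * precision * recall / (precision + recall)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy} precision={precision} recall={recall} f1={f1}")
```

In practice a library such as scikit-learn computes these directly, but writing them out makes clear how each metric is just a different ratio of the four confusion-matrix cells.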
4. Repeated k-fold cross validation

To understand the data, you can perform the following steps.

S.No  Query                                  Syntax
1     Determine the size of the data         df.shape
2     Examine the data                       df.sample(5)
3     Check the data types of columns        df.info()
4     Identify missing values                df.isnull().sum()
5     Understand the data mathematically     df.describe()
6     Detect duplicate values                df.duplicated().sum()
7     Explore correlation between columns    df.corr()[columnName]
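The queries above can be tried on a small DataFrame; note that isnull() and duplicated() are methods and need parentheses. The dataset below is hypothetical, made up just to exercise each command.

```python
import pandas as pd

# Hypothetical dataset: one missing value and one duplicate row.
df = pd.DataFrame({
    "height": [150, 160, 170, 160, None],
    "weight": [50,  60,  70,  60,  80],
})

print(df.shape)               # size of the data: (rows, columns)
print(df.sample(3))           # examine a random sample of rows
df.info()                     # data types and non-null counts per column
print(df.isnull().sum())      # missing values per column
print(df.describe())          # summary statistics (count, mean, std, ...)
print(df.duplicated().sum())  # number of duplicate rows
print(df.corr()["weight"])    # correlation of every column with "weight"
```

Each call maps one-to-one onto a row of the table, so the same sequence works on any DataFrame once the column name passed to corr() is adjusted.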