Uploaded by Rinki

Application of machine learning

1. Image Recognition
2. Speech Recognition
3. Traffic Prediction
4. Product Recommendation
5. Self-driving car
6. Online fraud detection
7. Automatic language translation
Machine learning life cycle
1. Data gathering:
In this step we collect data from various sources such as files, databases, the
internet, and mobile devices.
2. Data preparation:
After collecting the data, we organize it for further analysis and prepare it for
use in machine learning training.
3. Data wrangling:
Data wrangling is the process of cleaning raw data and converting it into a usable
format. It involves cleaning the data, selecting the variables to use, and
transforming the data into a proper format for the following steps. In this step
we deal with:
   1. Missing values
   2. Duplicate data
   3. Invalid data
   4. Noise
4. Data analysis:
   -Selection of analytical techniques (such as classification, regression, clustering)
   -Building models
   -Reviewing the results
5. Train model
6. Test model
7. Deployment
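The four data wrangling problems listed above can be sketched with pandas. This is only a minimal illustration: the column names, the sample values, the plausible age range (0 to 120), and the noise cutoff (10 times the median income) are all assumptions made for the example, not rules from these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data containing the four problems listed above.
raw = pd.DataFrame({
    "age": [25, 25, np.nan, 41, -3, 38],
    "income": [30000.0, 30000.0, 42000.0, 51000.0, 45000.0, 9_000_000.0],
})

# Duplicate data: drop rows that repeat an earlier row exactly.
clean = raw.drop_duplicates()

# Invalid data: keep ages in an assumed plausible range (missing ages are
# kept for now so they can be filled in the next step).
clean = clean[clean["age"].isna() | clean["age"].between(0, 120)]

# Missing values: fill missing ages with the median of the valid ages.
clean = clean.assign(age=clean["age"].fillna(clean["age"].median()))

# Noise: drop incomes far above the median (assumed cutoff: 10x the median).
clean = clean[clean["income"] <= 10 * clean["income"].median()]

clean = clean.reset_index(drop=True)
print(clean)
```

Filling with the median is just one option; dropping rows with missing values or filling with the mean are equally common, and the right choice depends on the data.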
Machine Learning Pipeline
Training set:
Used to train a machine learning model. The goal is for the model to identify
patterns and relationships so that it can make accurate predictions or
classifications later on.
Testing set: used to check the model's performance on data it did not see during
training.
If training accuracy increases while test accuracy decreases, the model is overfitting.
If training accuracy as well as test accuracy is low, the model is underfitting.
Validation dataset: a set used for hyperparameter tuning.
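A minimal sketch of splitting data into the three sets described above, using only the Python standard library. The 60/20/20 proportions, the function names, and the thresholds in `diagnose` are assumptions chosen for illustration; there is no universal cutoff for "low" accuracy or for a "large" gap.

```python
import random

def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def diagnose(train_acc, test_acc, low=0.7, gap=0.1):
    """Rough rule of thumb; `low` and `gap` are illustrative thresholds."""
    if train_acc < low and test_acc < low:
        return "underfitting"   # the model does poorly everywhere
    if train_acc - test_acc > gap:
        return "overfitting"    # good on training data, worse on unseen data
    return "ok"

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
print(diagnose(0.99, 0.70))
print(diagnose(0.55, 0.52))
```

The validation list is the one to use when comparing hyperparameter settings, so that the test list stays untouched until the final check.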
Feature extraction: these techniques reduce the number of features in the
original feature set in order to reduce model complexity and overfitting,
improve computational efficiency, and reduce generalization error.
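The notes do not name a specific feature extraction technique. As one common example, the sketch below uses principal component analysis (PCA), implemented with NumPy's SVD, to turn 4 features into 2. The function name `pca_extract` and the synthetic data are assumptions made for the illustration.

```python
import numpy as np

def pca_extract(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
# Hypothetical data: 4 observed features that are mixtures of 2 underlying ones.
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + base[:, 1],
    base[:, 0] - base[:, 1],
])

Z = pca_extract(X, n_components=2)
print(X.shape, "->", Z.shape)
```

Because the 4 columns here are built from only 2 independent ones, the 2 extracted features keep all of the variance. On real data some information is usually lost, and the number of components is a trade-off.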
Supervised learning
Evaluation metrics provide objective and measurable criteria to compare and
evaluate different models or algorithms.
Evaluation metrics are used in various stages of the machine learning workflow,
including model development, model selection, and model deployment.
Here are a few key reasons why evaluation metrics are important:
1. Model comparison: evaluation metrics make it possible to compare the
performance of different models or algorithms.
2. Model selection: evaluation metrics help in selecting the best model that
meets the desired performance criteria.
3. Hyperparameter tuning: evaluation metrics guide the choice of hyperparameter
values.
4. Model deployment: evaluation metrics play a critical role in determining
whether a model is ready for deployment.
Accuracy, precision, recall, and F1 score are commonly used metrics for
classification tasks; mean squared error is a commonly used metric for
regression tasks.
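For reference, the standard definitions of these metrics are written out below. The symbols are introduced here and are not from the notes: TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives in a binary classification problem; for regression, y_i is the actual value, ŷ_i the predicted value, and n the number of samples.

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{MSE}       &= \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\end{aligned}
```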
Confusion matrix: a performance measurement for machine learning classification
problems where the output can be two or more classes. For a two-class problem it
is a table with 4 different combinations of predicted and actual values (true
positive, false positive, false negative, true negative).
It is extremely useful for measuring recall, precision, specificity, and
accuracy, and it underlies AUC and ROC curves.
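A small sketch of how the 4 combinations can be counted for a two-class problem and then turned into the measures mentioned above. The function name `confusion_counts` and the label lists are made up for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four predicted/actual combinations for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1   # predicted positive, actually positive
        elif predicted == positive:
            fp += 1   # predicted positive, actually negative
        elif actual == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return tp, fp, fn, tn

# Hypothetical labels: 4 actual positives and 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)

print("            predicted 1  predicted 0")
print(f"actual 1    {tp:^11}  {fn:^11}")
print(f"actual 0    {fp:^11}  {tn:^11}")
print(accuracy, precision, recall, specificity)
```

The sketch does not guard against division by zero, which happens when, for example, the model never predicts the positive class.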
The AUC-ROC curve is a performance measurement for classification
problems at various threshold settings. ROC is a probability curve
and AUC represents the degree or measure of separability. It tells
how well the model is able to distinguish between classes. The
higher the AUC, the better the model is at predicting class 0 as 0
and class 1 as 1.
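One way to make "separability" concrete: AUC equals the probability that a randomly chosen class 1 example receives a higher score than a randomly chosen class 0 example. The sketch below computes AUC directly from that pairwise definition. The function name and the scores are assumptions for the example, and this brute-force version is meant for small inputs only.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(positives) * len(negatives))

# Hypothetical model scores (higher means "more likely class 1").
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))
```

An AUC of 1.0 means every class 1 example outscores every class 0 example, while 0.5 is what random scores would give on average.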
Cross Validation:
“Cross-Validation is a statistical method of evaluating and
comparing learning algorithms.”
The data is divided into two parts:
-Training: to learn or train a model
-Testing: to validate the model
Cross validation is used for:
-Performance evaluation: evaluate the performance of a classifier
using the given data.
-Model selection: compare the performance of two or more
algorithms to determine the best algorithm for the given data.
-Tuning model parameters: compare the performance of two variants
of a parametric model.
Types of cross validation:
1. Resubstitution validation
2. K-fold cross validation
3. Leave-one-out cross validation
4. Repeated k-fold cross validation
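As an illustration of how k-fold splitting works, the sketch below generates the train and test indices for each fold. The function name `k_fold_indices` is hypothetical, and the data is assumed to be already shuffled, since the sketch splits the indices in order.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)

# Leave-one-out is the special case where k equals the number of samples.
loo = list(k_fold_indices(5, 5))
```

Each sample lands in the test part exactly once. Repeated k-fold would reshuffle the data and run the same procedure several times, averaging the results.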
To understand the data, you can perform the following steps.
S.No  Query                                 Syntax
1.    Determine the size of the data        df.shape
2.    Examine the data                      df.sample(5)
3.    Check the data types of columns       df.info()
4.    Identify missing values               df.isnull().sum()
5.    Understand the data mathematically    df.describe()
6.    Detect duplicate values               df.duplicated().sum()
7.    Explore correlation between columns   df.corr()[columnName]
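The steps in the table can be tried on a small DataFrame as sketched below. The column names and values are invented for the example; in practice `df` would be loaded from a file or database. Selecting the numeric columns before calling `corr()` is a choice made here, because correlation is not defined for text columns.

```python
import pandas as pd

# Hypothetical data with one missing value and one duplicated row.
df = pd.DataFrame({
    "hours": [1.0, 2.0, 2.0, 4.0, None],
    "score": [52.0, 60.0, 60.0, 81.0, 90.0],
    "city": ["Pune", "Delhi", "Delhi", "Pune", "Agra"],
})

print(df.shape)               # (rows, columns)
print(df.sample(3))           # a few random rows
df.info()                     # column data types and non-null counts
print(df.isnull().sum())      # missing values per column
print(df.describe())          # summary statistics for numeric columns
print(df.duplicated().sum())  # number of fully duplicated rows

# Correlation of every numeric column with the "score" column.
corr_with_score = df.select_dtypes("number").corr()["score"]
print(corr_with_score)
```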