
Project Report

Machine Learning Model for Analysis
OPTIMISED PREDICTIVE MODEL
Submitted by
Mukul Saini (22BCS14902)
Nishant Bamal (22BCS15558)
in partial fulfillment for the award of the degree of
Bachelor of Engineering
IN
Computer Science
Chandigarh University
February - June 2023
INTRODUCTION .............................................................

S.No.   Description                        Page No.
1.      Students and Academics             6
2.      About the Dataset and Model        7
3.      The Dataset                        10
4.      Timeline                           12
Literature Review…………………………………………

S.No.   Description                        Page No.
1.      Timeline of Reported Problem       13
2.      Existing Solutions                 13
3.      Bibliometric Analysis              14
4.      Review Summary                     14
5.      References                         15
Design Flow and Process……………………………………….

S.No.   Description                        Page No.
1.      Proposed Methodology               16
2.      Generalized Method                 16
3.      Phase-wise Association             17
Result Analysis and Validation

S.No.   Description                        Page No.
1.      Types of Analysis                  20
2.      Analysis Tables                    21
3.      Analysis through Graphs            23
Conclusion and Summary

S.No.   Description                        Page No.
1.      Conclusion and Summary             24
2.      Team Roles                         25
List of Figures

Figure 1.1   Gantt Chart of Project Timeline
Figure 1.2   Generalised Method Flow-chart
Figure 1.3   Phase-wise Association Flow-chart
Figure 1.4   Graph for Analysis
Figure 1.5   Graph for Analysis

List of Tables

Table 1.1   Accuracy Table
Table 1.2   Error Table
Table 1.3   TP Rate Table
Table 1.4   FP Rate Table
Table 1.5   ROC Rate Table
Table 1.6   Resampled Table
INTRODUCTION
Students and Academics
Academic life is one of the most significant parts of a student’s life. According to the dictionary, an academic is ‘a teacher or scholar in an educational institution’. Students and academics are closely intertwined, and a student’s hard work and intellectual ability are often judged by his or her academic scores. The academic performance of adolescents is also, in many cases, believed to be an indicator of their future success.
Academics act as a pillar for future professional growth and long-term success. Serious academic study usually begins in the teenage years and can occupy many years of an individual’s life, depending on the degree being pursued. Academics are the foundation stones of cognitive development and knowledge enhancement.
A student’s academic life can also be very challenging. The teenage years and early adulthood are also a time of physical development, so many students take up sports during this period. Managing extra-curricular activities alongside academics can be a demanding task. Done well, it increases overall competence; done poorly, it leads to weak performance in both areas and a resulting lack of confidence.
From the above it is clear that predicting a student’s academic percentage would be useful. A machine learning model that can predict a given student’s future percentage would therefore be valuable, for the following reasons:
• Helpful for teachers: It can help teachers identify students who are likely to perform weakly in the future. Knowing this, teachers can focus on these students accordingly, which can effectively result in better academic grades for all students.
• Helpful for students: It can help students get the attention and guidance they require, as teachers will be able to better guide students who are likely to struggle academically.
• Statistically helpful: It will provide a base for further research. It can also help assess the teaching techniques used by teachers, as it will provide concrete evidence for the success or failure of those techniques.
• Better management: Students in one category can be grouped together and the pace of teaching decided accordingly, improving the overall management of students.
• Growth in extra-curriculars: Knowing a student’s likely future percentage indicates whether the student will be able to manage sports and other extra-curriculars alongside academics. Accordingly, different students can be encouraged to take up sports.
About the Model and the Dataset
The model will be trained using machine learning on a dataset containing various parameters for the prediction task. Regression analysis will be used for building the model.
Machine Learning
Machine learning is a subset of artificial intelligence (AI) and computer science. It uses algorithms and data to imitate the way humans learn, and its accuracy gradually improves over time. In simple terms, it gives computers the ability to learn without being explicitly programmed, enabling them to learn automatically from past data.
Machine learning algorithms build a mathematical model from sample historical data (training data). Machine learning is essentially a branch of computer science and statistics that is helpful in creating predictive models. It is extremely useful to many organizations and companies, as it can provide insights that lead to good decisions. For example, a company can set the price of a specific item based on past records of how sales varied with price. This is a very simple example; much more complex decisions can be made with it.
Features of machine learning:
1. It learns from historical data and improves accordingly.
2. It is a data-driven technology.
3. It identifies patterns in a given dataset.
4. It is very similar to data mining in its ability to deal with huge datasets.
Classification of Machine Learning
1. Supervised Learning
In supervised learning, we provide labeled sample data for training the machine learning system, and on this basis it predicts the output. The goal of supervised learning is to map input data to output data. It is like a student learning under the supervision of a qualified teacher. It is further divided into two families of algorithms: classification and regression.
2. Unsupervised Learning
In this method, as the name implies, the machine learns without any supervision. The data has not been categorized or classified, and the algorithm must act on it unaided. The machine tries to find useful structure by itself. It can also be divided into two families of algorithms: clustering and association.
3. Reinforcement Learning
This is a feedback-based learning mechanism. It is about decision-making: maximizing reward by taking appropriate actions in an environment. It learns by trial and error, working towards the best outcome. Many modern robots work on this principle.
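As a toy illustration of this trial-and-error idea, the standard-library sketch below runs an ε-greedy multi-armed bandit: the agent mostly exploits the arm it currently believes pays best, but occasionally explores at random. The arm rewards, step count, and ε value are all invented for illustration; this is a sketch of the concept, not the project's method.

```python
import random

def epsilon_greedy_bandit(arm_rewards, steps=200, epsilon=0.1, seed=42):
    """Trial-and-error learning: estimate each arm's reward and mostly
    exploit the best estimate, occasionally exploring at random."""
    rng = random.Random(seed)
    n = len(arm_rewards)
    estimates = [0.0] * n
    counts = [0] * n

    # Pull every arm once so each estimate starts from real feedback.
    for a in range(n):
        estimates[a] = arm_rewards[a]
        counts[a] = 1

    for _ in range(steps):
        if rng.random() < epsilon:                  # explore a random arm
            a = rng.randrange(n)
        else:                                       # exploit the current best estimate
            a = max(range(n), key=lambda i: estimates[i])
        reward = arm_rewards[a]                     # feedback from the environment
        counts[a] += 1
        # Incremental average keeps a running estimate of each arm's reward.
        estimates[a] += (reward - estimates[a]) / counts[a]
    return estimates, counts

# Arm 2 pays the most, so the agent should learn to prefer it.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.9])
```

After the run, the agent's estimates match the true rewards and the best arm has been pulled far more often than the others — the essence of learning by reinforcement.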
Missing values in Machine Learning
In the pre-processing stage of machine learning, handling missing values is a
crucial step. It's critical to select the best approach based on the dataset, the
kind and quantity of missing data, and the particular specifications of the
current challenge. Additionally, it's critical to evaluate how the chosen
imputation method will affect the outcomes and take into account any biases
that may be introduced by imputed missing values.
For the purpose of machine learning, a number of algorithms and methods are frequently used to fill in the missing values in a dataset:
1. Mean/Median/Mode Imputation: This approach replaces missing values with the mean, median, or mode of the available data. It is commonly used for numerical features and is a simple and quick method. However, it assumes that the missing values follow a distribution similar to the observed data.
2. Regression Imputation: Regression models can be used to predict
missing values based on other features in the dataset. A regression
algorithm is trained using the complete data, with the feature
containing missing values as the target variable. The trained model can
then be used to predict the missing values.
3. k-Nearest Neighbours (KNN) Imputation: In this approach, the values of
the k nearest neighbours in the feature space are used to impute
missing values. The closest data points with complete information are
found using the distance measure, and their values are used to fill in
the missing values.
4. Expectation-maximization (EM) Imputation: Using an iterative process
called the EM algorithm, missing values are estimated by maximising
the likelihood function. Starting with an initial estimate, conditional
expectations are used to iteratively update the estimates until
convergence. For data with intricate interdependencies across
variables, EM is frequently utilised.
5. Multiple Imputation: Using statistical models to impute missing values,
multiple imputation produces several imputed datasets. The results are
then merged to account for the uncertainty brought on by missing
values after each dataset has been individually examined. This method
produces more reliable estimates by accounting for imputation
variability.
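A minimal standard-library sketch of method 1 (mean/median imputation) is shown below; the attendance column and its values are hypothetical:

```python
from statistics import mean, median

def impute_column(values, strategy="mean"):
    """Fill None entries with the mean or median of the observed values
    (simple mean/median imputation for a single numeric column)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical attendance column with two missing entries.
attendance = [80, None, 60, 90, None, 70]
print(impute_column(attendance))   # fills gaps with the mean of 80, 60, 90, 70
```

In practice the same idea is applied per column of the dataset, and the caveat from the text applies: the filled values inherit the distribution of the observed ones, which may introduce bias.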
Resampling
In machine learning, resampling is the act of picking and modifying existing data
points to produce new training datasets. Issues including unbalanced datasets,
overfitting, and model evaluation are frequently addressed with it. The two
primary categories of resampling procedures are:
1. Oversampling: In datasets where one class is underrepresented in comparison to others, oversampling is employed to remedy the class imbalance. To achieve balance with the majority class, more instances of the minority class are created. The Synthetic Minority Over-sampling Technique (SMOTE), which interpolates between existing minority class samples to construct synthetic samples, is the most widely used oversampling approach.
2. Undersampling: By lowering the proportion of instances in the majority
class, undersampling seeks to resolve the imbalance between classes.
In order to match the number of samples in the minority class, this
method randomly selects a subset of the samples from the majority
class. While undersampling can aid in dataset balancing, it may cause
information loss and may not fully reflect the diversity of the majority
class.
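The simplest form of oversampling — random duplication of minority rows — can be sketched with the standard library alone (SMOTE would instead interpolate new synthetic rows between neighbours). The toy rows and labels below are invented:

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    matches the majority-class count (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))   # re-sample an existing minority row
            out_labels.append(cls)
    return out_rows, out_labels

# Imbalanced toy data: four "Pass" rows, one "Fail" row.
rows = [[1], [2], [3], [4], [5]]
labels = ["Pass", "Pass", "Pass", "Pass", "Fail"]
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))   # both classes now have four rows
```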
Ensembling
To increase overall predictive performance, ensembling is a machine learning technique that merges the results of various models. It is based on the premise that, by combining the predictions of several models, the strengths of each model can compensate for the deficiencies of the others, producing predictions that are more reliable and accurate.
Ensembling has the following advantages:
 Better Predictive Performance: Combined models can perform more accurately than any one model alone, especially when the models have varied strengths and weaknesses.
 Robustness: By utilising the consensus of many models, ensembling helps minimise the influence of outliers or noisy data.
 Reduced Overfitting: By merging predictions from models trained on various subsets of the data, ensembling can help limit overfitting, especially in complicated models.
 Model Interpretability: Feature importance metrics provided by ensembling techniques such as Random Forests can be used to understand the relative value of different features in the prediction.
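A minimal sketch of hard (majority) voting — the form of ensembling applied later in this report — is shown below. The three per-sample model outputs are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from several models by hard voting:
    for each sample, the class predicted most often wins."""
    combined = []
    for sample_preds in zip(*predictions):   # one tuple of votes per sample
        combined.append(Counter(sample_preds).most_common(1)[0][0])
    return combined

# Hypothetical per-sample outputs of three trained classifiers.
model_a = ["Pass", "Fail", "Pass", "Pass"]
model_b = ["Pass", "Pass", "Fail", "Pass"]
model_c = ["Fail", "Pass", "Pass", "Pass"]
print(majority_vote([model_a, model_b, model_c]))
# → ['Pass', 'Pass', 'Pass', 'Pass']
```

With an odd number of voters, two-class ties cannot occur; soft voting would instead average the models' predicted probabilities.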
The Dataset
The dataset contains data for predicting students’ academic performance. It aims to predict the end-semester percentage based on different social, economic and academic attributes. The dataset was gathered from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Student+Academics+Performance). It was donated on 16 September 2018, and its source is Dr Sadiq Hussain, Dibrugarh University, Dibrugarh, Assam, India (sadiq '@' dibru.ac.in). It has a total of 22 attributes with 300 instances and does not contain any missing values.
Some of the attributes are as follows:
o Gender
It comprises two genders: Male and Female.
o Caste
It records the student’s caste as General, SC, ST or OBC.
o Marital Status
It tells whether the individual student is married or unmarried.
o Father’s Occupation
This attribute records the father’s occupation. It is very important, as occupation strongly influences the economic condition of the family, which in turn affects the education of the student. It has 5 options: Service, Business, Retired, Farmer, Others.
o Mother’s Occupation
Again, a very important attribute. It also has 5 options: Service, Business, Retired, Farmer, Others.
o Father’s Qualification
This is also a very important attribute, as the father’s education level affects the child’s learning and knowledge. It has 5 options: Illiterate, 10th, 12th, Degree, PG.
o Mother’s Qualification
The mother is the first teacher, so her educational qualification impacts the child significantly, in many cases even more than the father’s. It also has 5 options: Illiterate, 10th, 12th, Degree, PG.
o Medium of Education
Since this dataset was collected in India, Indian languages are included. It has 4 options: English, Hindi, Assamese, Bengali.
o Family Income
Family income supports education, hence it is a useful attribute. It has 4 options: Very high, high, medium, low.
o Academic Grades
This is the most important attribute, as past behavior is an indicator of future behavior. The academic grades cover 4 subjects. Each has 5 options: Best, Very Good, Good, Pass, Fail.
…and so on.
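Most learning algorithms require numeric inputs, so categorical attributes like those above must first be encoded. A minimal label-encoding sketch is shown below; the sample grade values are drawn from the options listed above, but the encoding itself is an illustrative choice, not the project's actual preprocessing:

```python
def label_encode(values):
    """Map each distinct category to an integer code (simple label encoding)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

# A hypothetical grades column using the options described above.
grades = ["Good", "Best", "Pass", "Good", "Fail"]
codes, mapping = label_encode(grades)
print(mapping)   # {'Best': 0, 'Fail': 1, 'Good': 2, 'Pass': 3}
print(codes)     # [2, 0, 3, 2, 1]
```

For genuinely ordinal attributes (e.g. Fail < Pass < Good < Very Good < Best) an explicit ordering should be supplied instead of the alphabetical one used here, and purely nominal attributes such as Caste are often better one-hot encoded.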
Timeline
Figure 1.1: Gantt Chart of Project Timeline
CHAPTER 2.
LITERATURE REVIEW
2.1 Timeline of the reported problem
A number of studies have been conducted in the past to predict students’ performance. It has been shown [1] that data about students’ activity during the semester improves the prediction. Predictive models using different algorithms have been built many times; one of the earliest is (Thai-Nghe et al., 2011) [2]. Since then, many studies have been published.
2.2 Existing solutions
1. PERSONALIZED MULTI-LINEAR REGRESSION MODELS (PLMR) [3] (Elbadrawy et al., 2016).
2. REGRESSION AND CLASSIFICATION MODELS [4] (Meier et al., 2016) and [5] (Zimmermann et al., 2015).
3. FACTORIZATION MACHINES (FM) [6] (Sweeney et al., 2015).
4. MATRIX FACTORIZATION MODEL [2] (Thai-Nghe et al., 2011).
2.3 Bibliometric analysis
1. PERSONALIZED MULTI-LINEAR REGRESSION
The PLMR could predict the next semester percentage with lower error rates. PLMR was also
useful for predicting grades on assessments within a traditional class or online course by
incorporating features captured through students’ interaction with LMS and MOOC server logs.
OBSERVED DRAWBACK –
Final grade prediction based on the limited initial data about students and courses is a challenging task because, at the beginning of undergraduate studies, most students are motivated and perform well in the first semester, but as time passes their motivation and performance may decrease.
2. REGRESSION AND CLASSIFICATION MODELS
(Meier et al., 2016) proposed an algorithm to predict the final grade of an individual student when
the expected accuracy of the prediction is sufficient. The algorithm can be used in both regression
and classification settings to predict students’ performance in a course. The study also demonstrated
that timely prediction of the performance of each student would allow instructors to intervene
accordingly.
(Zimmermann et al., 2015) considered regression models combined with variable selection and variable aggregation to predict the performance of graduate students and their aggregates. By analyzing the structure of the undergraduate program, they assessed a set of students’ abilities. Their results can be used as a methodological basis for deriving principled guidelines for admissions committees.
3. FACTORIZATION MACHINES (FM)
(Sweeney et al., 2015) developed a system for predicting students’ grades using simple baselines and MF-based methods on a dataset from George Mason University (GMU). Their study showed that the Factorization Machines (FM) model achieved the lowest prediction error and can accurately handle both cold-start and non-cold-start predictions.
4. MATRIX FACTORIZATION MODEL
(Thai-Nghe et al., 2011) [2] created matrix factorization models to predict student performance in algebra. This technique is useful in cases involving sparse data, and also when students’ background knowledge and task information are absent.
OBSERVED DRAWBACK –
It is not efficient when dealing with small sample sizes.
2.4 Review Summary
After careful analysis of the literature, it has been found that a number of studies have been conducted on this topic. By far the most accurate predictive models are factorization models and regression and classification models, while models such as matrix factorization struggled with small sample sizes.
REFERENCES
1. Koprinska, I., Stretton, J., and Yacef, K. 2015. Students at Risk: Detection and Remediation. The 8th International Conference on Educational Data Mining (EDM 2015), pp. 512–515.
2. Thai-Nghe, N., Drumond, L., Horváth, T., Nanopoulos, A., and Schmidt-Thieme, L. 2011. Matrix and tensor factorization for predicting student performance. In CSEDU (1). Citeseer, 69–78; Thai-Nghe, N., Drumond, L., Horváth, T., Schmidt-Thieme, L., et al. 2011. Multi-relational factorization models for predicting student performance. In Proc. of the KDD Workshop on Knowledge Discovery in Educational Data. Citeseer, 27–40.
3. Elbadrawy, A., Polyzou, A., Ren, Z., Sweeney, M., Karypis, G., and Rangwala, H. 2016. Predicting student performance using personalized analytics. Computer 49, 4, 61–69.
4. Meier, Y., Xu, J., Atan, O., and van der Schaar, M. 2016. Predicting grades. IEEE Transactions on Signal Processing 64, 4, 959–972.
5. Zimmermann, J., Brodersen, K. H., Heinimann, H. R., and Buhmann, J. M. 2015. A model-based approach to predicting graduate-level performance using indicators of undergraduate-level performance. JEDM – Journal of Educational Data Mining 7, 3, 151–176.
6. Sweeney, M., Lester, J., and Rangwala, H. 2015. Next-term student grade prediction. In Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 970–975.
Design Flow/Process
Proposed Methodology
In the proposed methodology, two approaches have been used:
1) Generalised Method
2) Phase-wise Association

Fig 1.2: Generalised Method flow-chart
GENERALISED METHOD
The dataset must first be imported from the UCI repository. Once the dataset has been imported, it becomes apparent that it does not meet the criteria, necessitating preprocessing, during which prominent features are discovered using three distinct feature selection techniques, and the feature selection methods are compared. Once the significant features have been identified and the dataset preprocessed, ensembling is applied to the model. Compared with the original dataset, ensembling attains the best accuracy. The model is then trained on both the original dataset and the prominent-feature dataset.
Next, analysis and feedback are taken. If the model is unable to meet the requirements, improvements are implemented in accordance with the recommendations, and the model is tested and verified once more. If it achieves the intended results, the robustness of the model is improved; this is one of the most crucial steps in determining whether or not the model is capable of dealing with real-world problems and situations. The model is ready to be used in the real world once its robustness has been increased.
Fig 1.3: Phase-wise Association flow-chart
Phase-wise Association
This section is divided into five phases. In the first phase, the problem statement is identified and a dataset relevant to it is imported. The imported dataset is not yet in the required form, so further processing is needed. In the second phase, preprocessing begins, during which prominent features are identified using different kinds of algorithms. This is one of the most important phases, as it focuses on meeting the requirements of the dataset: the dataset is balanced and structured. In the third phase, the model is trained with traditional machine learning algorithms and the results are collected and analysed; ensembling (voting) is then applied to further increase the accuracy. In phase four, the Test and Validate step compares the performance of the prominent features with the original features and improves the robustness of the model. The best accuracy, 77.2%, is achieved by ensembling through voting. In the fifth phase, the model is strengthened so that it can be used reliably in the real world. After considering all forms of input and analysis and making all of the suggested modifications, the model is deployed, prepared to face and solve real-world problems. Accuracy improved from 53.2% to 77.2%; the outcomes of the model are therefore satisfactory and ready for real-world use.
Result Analysis and Validation
The stages of result analysis and validation are essential to the assessment of student achievement. They help evaluate the data gathered and determine the validity and reliability of the evaluation techniques employed. Researchers and educators can make sure the evaluation process is accurate and useful by doing thorough analysis and validation. In this section we go over a few typical methods for result analysis and validation in student performance evaluations.
• Statistical Analysis: Statistical analysis is essential to understanding results. To find patterns, trends, and relationships in the data, several statistical approaches can be applied. Performance data can be summarised using descriptive statistics such as the mean, standard deviation, and frequency distributions. Inferential statistics such as t-tests and analysis of variance (ANOVA) can be used to compare student performance across different groups or conditions. Regression analysis can be used to find predictors of student performance, such as demographic variables or instructional interventions.
• Validity Analysis: Validity analysis focuses on determining whether the assessment methods measure what they are supposed to measure, i.e., whether the evaluation tools accurately capture the required learning outcomes or constructs. Several kinds of validity can be assessed, among them content validity, criterion-related validity, and construct validity. Examining content validity entails determining whether the assessment items accurately reflect the content domain. Criterion-related validity refers to comparing student performance against outside standards such as standardised examinations or professional opinions. Construct validity measures the degree to which the assessment matches theoretical constructs or concepts.
• Cross-validation: This technique is used to confirm that assessment models or findings are accurate and generalizable. The dataset is divided into several subsets; the model is trained on one subset and then validated on the remaining subsets. This procedure helps analyse the model's reliability and generalizability by gauging how well it performs on fresh or previously unseen data.
It is crucial to remember that rigorous result analysis and validation should be carried out while taking into account the context and procedural constraints of the assessment process. Researchers and educators can increase the validity, reliability, and credibility of student performance evaluations by thoroughly analysing and validating the findings, resulting in better decision-making and educational outcomes.
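The descriptive statistics and k-fold splitting described above can be sketched with the standard library alone. The ten end-semester percentages below are hypothetical:

```python
from statistics import mean, stdev

def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds for cross-validation;
    each fold serves once as the validation set."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)   # spread the remainder evenly
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# Hypothetical end-semester percentages for 10 students.
scores = [62, 71, 55, 80, 67, 49, 90, 73, 58, 66]
print(round(mean(scores), 1), round(stdev(scores), 1))   # descriptive summary
print(kfold_indices(len(scores), 5))                     # five folds of two indices each
```

In practice the folds would be shuffled (or stratified by class) before splitting; the contiguous version above keeps the mechanics visible.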
Accuracy Analysis

Algorithm               Original    Consistent    Resampled
Bayes Net               53.2675     54.9618       68.7023
Logistic                39.0704     38.9313       77.0992
Multilayer Perceptron   45.0481     48.0916       72.5191
SMO                     43.6044     44.2748       71.7557
Lazy.IBk                33.8779     41.9847       71.7557
Decision Stump          45.8333     43.5115       44.2748
Random Forest           48.6196     54.9618       71.7557
Random Tree             41.5653     37.4046       67.9389

Table 1.1
Error Analysis

Algorithm               Original    Consistent    Resampled
Bayes Net               46.7325     45.0382       31.2977
Logistic                60.9296     61.0687       22.9008
Multilayer Perceptron   54.9519     51.9084       27.4809
SMO                     56.3956     55.7252       28.2443
Lazy.IBk                66.1221     58.0153       28.2443
Decision Stump          54.1667     56.4885       55.7252
Random Forest           51.3804     45.0382       28.2443
Random Tree             58.4347     62.5954       32.0611

Table 1.2
TP Rate Analysis

Algorithm               Original    Consistent    Resampled
Bayes Net               0.533       0.55          0.687
Logistic                0.391       0.389         0.771
Multilayer Perceptron   0.45        0.481         0.725
SMO                     0.436       0.443         0.718
Lazy.IBk                0.339       0.42          0.718
Decision Stump          0.458       0.435         0.443
Random Forest           0.486       0.55          0.718
Random Tree             0.416       0.374         0.679

Table 1.3
FP Rate Analysis

Algorithm               Original    Consistent    Resampled
Bayes Net               0.234       0.235         0.161
Logistic                0.305       0.332         0.133
Multilayer Perceptron   0.275       0.296         0.166
SMO                     0.282       0.321         0.168
Lazy.IBk                0.331       0.337         0.155
Decision Stump          0.271       0.359         0.35
Random Forest           0.257       0.264         0.167
Random Tree             0.292       0.341         0.187

Table 1.4
ROC Rate Analysis

Algorithm               Original    Consistent    Resampled
Bayes Net               0.706       0.696         0.816
Logistic                0.581       0.574         0.803
Multilayer Perceptron   0.663       0.662         0.851
SMO                     0.647       0.627         0.793
Lazy.IBk                0.53        0.539         0.796
Decision Stump          0.55        0.553         0.544
Random Forest           0.677       0.655         0.898
Random Tree             0.57        0.539         0.77

Table 1.5
Resampled and Ensembled Data with W-saw and L-saw

Model                               Accuracy    Error      TP      FP      ROC     W-saw   L-saw
Bayes Net                           68.7023     31.2977    0.687   0.161   0.816   9.15    3.22
Logistic                            77.0992     22.9008    0.771   0.133   0.803   9.15    3.22
Multilayer Perceptron               72.5191     27.4809    0.725   0.166   0.851   9.15    3.22
SMO                                 71.7557     28.2443    0.718   0.168   0.793   9.15    3.22
Lazy.IBk                            71.7557     28.2443    0.718   0.155   0.796   9.15    3.22
Decision Stump                      44.2748     55.7252    0.443   0.35    0.544   9.15    3.22
Random Forest                       71.7557     28.2443    0.718   0.167   0.898   9.15    3.22
Random Tree                         67.9389     32.0611    0.679   0.187   0.77    9.15    3.22
Voting (Logistic + Random Forest)   77.0992     22.9008    0.771   0.133   0.916   9.15    3.22

Table 1.6
Graphs for Analysis of Ensembling and Resampled Data

[Fig 1.4: Graph comparing accuracy, error, TP rate, FP rate, ROC and W-saw across the models]

[Fig 1.5: Graph of accuracy, error, TP rate, FP rate and ROC across the ten models]
Conclusion and Summary
In conclusion, student performance evaluation is a crucial part of the educational process, since it enables teachers to evaluate their students' academic progress, pinpoint their areas of strength, and create efficient interventions. Through the literature review, we have learned the value of student performance evaluation datasets in identifying the variables affecting student performance and informing educational practices.
To guarantee the accuracy, reliability, and validity of student performance evaluation, the result analysis and validation processes are extremely important. Statistical analysis aids both data interpretation and the identification of patterns and links. Validity analysis guarantees that the assessment methods effectively measure the targeted learning outcomes, whereas reliability analysis provides consistency and stability in the evaluation measures. Cross-validation and expert opinion also aid the validation process.
A more thorough and complete picture of student performance can be obtained through further research and the development of novel assessment techniques, such as performance-based, project-based, and authentic assessments.
Research and development in student performance evaluation are ongoing. We can improve the efficacy and fairness of student performance evaluation by advancing assessment techniques, incorporating technology, conducting longitudinal studies, adopting multidimensional evaluation approaches, and addressing ethical issues. This will ultimately support students' academic success and growth.
Organization of the Report
Chapter 1 Problem Identification: This chapter introduces the project and describes the problem statement discussed earlier in the report.
Chapter 2 Literature Review: This chapter presents a review of various research papers that help us understand the problem better. It also describes what has already been done to solve the problem and what can be done further.
Chapter 3 Design Flow/Process: This chapter presents the need and significance of the proposed work based on the literature review. The proposed objectives and methodology are explained, establishing the relevance of the problem. It also presents a logical and schematic plan to resolve the research problem.
Chapter 4 Result Analysis and Validation: This chapter explains the various performance parameters used in the implementation. Experimental results are shown, along with what the results mean and why they matter.
Chapter 5 Conclusion and Summary: This chapter concludes the results, explains the best method found by this research, and defines the future scope of the study, i.e., the extent to which the research area will be explored in further work.
Team Roles

Member Name     UID          Roles
Mukul Saini     22BCS14902   • Collection and making of the dataset
                             • Clustering and distribution of the dataset
                             • Visualisation of the dataset
Nishant Bamal   22BCS15558   • Collection of the dataset
                             • Visualisation of the dataset
                             • Testing and training of the dataset