Uploaded by Rahul Bhangarwala

Internship

JSPM’S Jayawantrao Sawant College Of Engineering,
Hadapsar, Pune-28
Department of Computer Engineering
Third Year (A.Y. 2021 – 2022)
310255 : Internship
- Joel Silas -
Roll number : 3259
T.E. B Computer Dept.
JSPM’s JSCOE
 We live in a world where we collect huge amounts of data. Traditional
methods and techniques are no longer sufficient to process it. Alongside the
rapid development of computers, new ways of processing data are
evolving.
 Data Science is a new emerging multidisciplinary field that combines
classical disciplines like statistics and mathematics with computer science.
The main goal of Data Science is to turn large sets of both unstructured and
structured data into useful information that can help organizations to make
powerful data-driven decisions.
 At a high level, data science can be described as a set of fundamental
principles necessary for successful extraction of information from data.
Many powerful tools can help data scientists in this process, but to use them wisely,
data scientists need a solid grounding in statistics, mathematics and computer science,
and they must also be able to see business problems from a data perspective. Data science
uses scientific methods, processes, algorithms, statistics, data mining, databases, and
distributed systems to extract knowledge and insights from data.
The main goals are :
(i) To present a short summary of the history and definition of data science.
(ii) To elaborate similarities and differences between Business Intelligence and Data
Science.
(iii) To overview the life cycle of data science.
(iv) To outline the benefits and various applications of data.
 “Machine Learning is a field of study that gives computers the ability to learn
and determine the output without being explicitly programmed.” -Arthur
Samuel (1959)
 Machine learning (ML) is a sub-domain of artificial intelligence (AI) that
allows software applications to become more accurate at predicting
outcomes without being explicitly programmed to do so.
 Machine learning algorithms use historical data as input to predict new
output values.
 The Python programming language provides a rich set of libraries to
execute machine learning algorithms.
ETG is a company that organizes training programs and internships in an interactive
mode, where various individual and team-based exercises are conducted in order to
bridge the gap between academics and industry practices.
Vision :
ETG aims to revolutionize the education scenario across the world by inculcating a
pragmatic approach among students towards the academic knowledge imparted by
institutions.
Mission :
Elite Techno Groups emphasizes the intellectual development of students by
providing them with a practical mode of learning and thereby channeling their technical
knowledge towards innovative real-world applications.
 The primary purpose of machine learning is to discover patterns in the user data and then
make predictions based on these intricate patterns for answering business questions and
solving business problems. Machine learning helps in analyzing the data as well as identifying
trends.
 It gives enterprises a view of trends in customer behavior and operational business patterns, as
well as supports the development of new products.
 Ultimately it will help to explore, sort and analyze data from various sources and reach
conclusions to optimize decisions and business processes.
 To explore new learning methods and develop general learning algorithms independent of
applications.
 Objectives of the projects completed during this internship :
1. Regression : to predict the value of a dependent variable based on independent variables
2. Classification : to assign data to categories based on shared characteristics
3. Reinforcement Learning : a learning agent should be able to perceive and interpret its
environment, take actions and learn through trial and error.
4. Natural Language Processing (NLP) : to read, understand, and decode human language in a
valuable manner so as to enable natural communication between humans and computers
Computer Model
Standard x86 (32-bit) or x86 (64-bit) compatible desktop or laptop computer
Memory
At least 1GB of RAM
Operating System Requirement (Any one)
•Windows 10, 32- or 64-bit versions
•Windows 8 or 8.1, 32- or 64-bit versions
•Windows 7, 32- or 64-bit versions
Software Requirement (Any one)
• Anaconda Navigator
• Google Colab
• Jupyter Notebook
• PyCharm IDE
• Visual Studio Code
1. Machine learning is the scientific process of training systems to act upon data without
requiring explicit, programmed instructions.
2. Machine learning, a subtype of Artificial Intelligence (AI), leverages algorithms and
statistical models to identify patterns and predict future outcomes.
3. Most machine learning initiatives fit one of three models: supervised, unsupervised and
reinforcement learning.
 Supervised machine learning :
It begins with a known, labeled dataset — often called “training data” — and uses that data to
make predictions, which are compared against actual outcomes in order to further refine the
algorithm.
 Unsupervised algorithms :
These work on unlabeled data, with no known outcomes to compare against, in order to
discover patterns and groupings that are inherent in the data.
 Reinforcement learning :
Reinforcement learning is a machine learning training method based on rewarding desired
behaviors and/or punishing undesired ones.
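The supervised case described above can be sketched in a few lines. This is only an illustration under assumptions: the data, the feature meanings and the function name `nearest_neighbour_predict` are made up, and a 1-nearest-neighbour rule stands in for whichever algorithm a real project would use. It shows the two steps the text names: learning from labeled training data, then comparing predictions against actual outcomes.

```python
def nearest_neighbour_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    best_index = min(
        range(len(train_points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_points[i], query)),
    )
    return train_labels[best_index]


# Hypothetical labeled training data: (hours studied, hours slept) -> result
train_points = [(1.0, 4.0), (2.0, 5.0), (8.0, 7.0), (9.0, 8.0)]
train_labels = ["fail", "fail", "pass", "pass"]

# Held-out examples whose actual outcomes are known
test_points = [(1.5, 4.5), (8.5, 7.5), (5.5, 6.0)]
test_labels = ["fail", "pass", "pass"]

# Predict, then compare the predictions against the actual outcomes
predictions = [
    nearest_neighbour_predict(train_points, train_labels, p) for p in test_points
]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

The accuracy figure is the feedback used to refine the model, for example by adding training data or changing the algorithm.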
1. Inventory Management System (Python programming) – An inventory is the store-house
of a shop; tracking it shows which stock is available and which is required based on
market demand. Implementing it in Python makes the stock easy to manage. It also
records information such as expiry dates so that products can be distributed
carefully.
Project Link : https://github.com/joelsilas1816/Inventory-Management-System-for-Skill-India-AI-ML-internship/blob/main/IMS%20add%20products.ipynb
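A minimal sketch of such an inventory, assuming a plain dictionary keyed by product ID. The field names, product IDs and helper functions here are hypothetical and are not taken from the linked notebook.

```python
from datetime import date

# Hypothetical in-memory inventory: product ID -> product record
inventory = {}


def add_product(product_id, name, quantity, expiry):
    """Register a product with its stock level and expiry date."""
    inventory[product_id] = {"name": name, "quantity": quantity, "expiry": expiry}


def sell(product_id, quantity):
    """Reduce stock after a sale, refusing to sell more than is available."""
    item = inventory[product_id]
    if quantity > item["quantity"]:
        raise ValueError(f"only {item['quantity']} units of {item['name']} in stock")
    item["quantity"] -= quantity


def needs_restock(threshold):
    """Return IDs of products whose stock has fallen below `threshold`."""
    return sorted(pid for pid, item in inventory.items() if item["quantity"] < threshold)


def expiring_before(cutoff):
    """Return IDs of products that expire before the `cutoff` date."""
    return sorted(pid for pid, item in inventory.items() if item["expiry"] < cutoff)


add_product("P001", "Milk", 20, date(2022, 1, 10))
add_product("P002", "Rice", 50, date(2023, 6, 1))
sell("P001", 15)

low_stock = needs_restock(10)
expiring_soon = expiring_before(date(2022, 2, 1))
print(low_stock, expiring_soon)
```

Keeping the expiry date next to the quantity lets one pass over the records answer both questions: what to reorder and what to distribute first.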
2. Data Analysis and Visualization – It involves steps such as data wrangling, in which the
data is cleaned by removing missing values and outliers and the required features are
extracted to make it ready for analysis. Graphical / pictorial representation of the data
then helps identify meaningful insights in any Data Science project.
Project Link : https://github.com/joelsilas1816/Olympics-Analysis-Assignment-for-Skill-India-AI-ML-internship/blob/main/Summer.ipynb
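The cleaning step can be sketched as follows, using only the standard library. The sample values are made up, and the 1.5 × IQR rule is just one common convention for flagging outliers; the source does not say which rule the project used.

```python
import statistics


def clean(values):
    """Drop missing entries, then drop outliers using the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    q1, _median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in present if low <= v <= high]


# Hypothetical column of ages with two missing values and one implausible entry
raw_ages = [23, 25, None, 24, 26, 22, 95, None, 27]
cleaned_ages = clean(raw_ages)
print(cleaned_ages)
```

Missing values are removed first so that they cannot distort the quartiles used to detect outliers.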
3. Student Score Prediction based on number of hours of study (Linear Regression) –
Prediction of a student's marks by determining the relation between the number of
hours of study and the marks obtained, based on analysis of students' performance in
previous exams.
Project Link : https://github.com/joelsilas1816/Data-Science-Projects/blob/main/Prediction%20of%20students%20score/Project_1_Prediction_of_students_score_based_on_number_of_hours_of_study.ipynb
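As an illustration of the idea, simple linear regression fits a line score = slope × hours + intercept by least squares. The sketch below uses made-up data that happens to be exactly linear, and `fit_line` is a hypothetical helper, not the notebook's code.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical records from previous exams: hours studied and marks obtained
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [20.0, 30.0, 40.0, 50.0, 60.0]

slope, intercept = fit_line(hours, scores)

# Predict the marks of a student who studies 6 hours
predicted_score = slope * 6.0 + intercept
print(slope, intercept, predicted_score)
```

With real exam data the points would not lie exactly on a line, and the fitted line would be the one that minimizes the squared prediction error.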
4. Prediction of Parkinson's Disease (XGBoost Classification) – Based on the
characteristics and symptoms present in the medical record, the model predicts whether
or not the person has the disease.
Project Link : https://github.com/joelsilas1816/Data-Science-Projects/blob/main/Detection%20of%20Parkinsons%20disease/Project_2_Detection_of_Parkinsons_Disease.ipynb
5. Fake News Detection (PassiveAggressive Classifier) – News items are classified into
REAL and FAKE categories based on the words used in them.
Project Link : https://github.com/joelsilas1816/Data-Science-Projects/blob/main/Fake%20News%20Detection/Project_3_Fake_News_Detection.ipynb
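To show how a passive-aggressive classifier learns from word usage, here is a minimal sketch of the basic (unregularized) update rule on word counts. It is not the notebook's code: the headlines are invented, and the helpers `featurize`, `train` and `predict` are hypothetical. The model stays passive when an example is already classified with enough margin, and otherwise corrects its weights just enough to fix that example.

```python
def featurize(text):
    """Bag-of-words counts for a lower-cased, whitespace-split text."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def score(weights, features):
    return sum(weights.get(word, 0.0) * count for word, count in features.items())


def train(examples, epochs=5):
    """Basic passive-aggressive updates; labels are +1 (REAL) or -1 (FAKE)."""
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            x = featurize(text)
            if not x:
                continue  # nothing to learn from an empty text
            loss = max(0.0, 1.0 - label * score(weights, x))  # hinge loss
            if loss > 0.0:  # aggressive step only when the margin is violated
                tau = loss / sum(count * count for count in x.values())
                for word, count in x.items():
                    weights[word] = weights.get(word, 0.0) + tau * label * count
    return weights


def predict(weights, text):
    return "REAL" if score(weights, featurize(text)) >= 0 else "FAKE"


# Invented training headlines, for illustration only
examples = [
    ("government publishes official budget report", 1),
    ("shocking miracle cure doctors hate", -1),
    ("university releases official study results", 1),
    ("shocking secret celebrities hate revealed", -1),
]
weights = train(examples)

real_guess = predict(weights, "official budget report")
fake_guess = predict(weights, "shocking miracle hate")
print(real_guess, fake_guess)
```

In practice the word counts would usually be replaced by weighted features such as TF-IDF, and a library implementation would be used instead of a hand-written loop.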
6. Best Ad Prediction (Reinforcement Learning) – The algorithm helps determine which of
many ads attracts the most customers and so proves most beneficial for the
organization.
Project Link : https://github.com/joelsilas1816/Data-Science-Projects/blob/main/Best%20ad%20prediction/Project_1_Best_ad_prediction_Joel_Christopher_Silas%20(1).ipynb
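The source does not say which reinforcement learning algorithm the project used. As an assumption, the sketch below uses UCB1 (Upper Confidence Bound), a standard choice for picking the best of several ads by trial and error. The click simulator and its `periods` table are invented so that the run is deterministic.

```python
import math


def ucb1(num_ads, num_rounds, get_reward):
    """Show one ad per round, balancing exploration and exploitation (UCB1)."""
    counts = [0] * num_ads    # how many times each ad has been shown
    totals = [0.0] * num_ads  # total clicks (reward) collected by each ad
    for t in range(1, num_rounds + 1):
        if t <= num_ads:
            ad = t - 1  # show every ad once before comparing them
        else:
            ad = max(
                range(num_ads),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = get_reward(ad, counts[ad] + 1)
        counts[ad] += 1
        totals[ad] += reward
    return counts


# Invented simulator: ad i is clicked on every periods[i]-th showing,
# giving click rates of 10%, 50% and 20% respectively.
periods = [10, 2, 5]


def simulated_click(ad, showing_number):
    return 1 if showing_number % periods[ad] == 0 else 0


counts = ucb1(num_ads=3, num_rounds=2000, get_reward=simulated_click)
best_ad = counts.index(max(counts))
print(counts, best_ad)
```

The ad shown most often at the end is taken as the best one; the confidence term shrinks as an ad is shown more, so weak ads are tried less and less.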
7. Chatbot (NLP) – The model is trained on certain stories, incidents and facts, and
when questions pertaining to them are asked, the chatbot is able to respond.
Project Link : https://github.com/joelsilas1816/Data-Science-Projects/blob/main/Chatbot/Project_2_Chatbot_Joel_Christopher_Silas%20(1).ipynb
8. Bank Customer Churn Factor Prediction – Various factors cause customers to
discontinue the service they procure from particular agencies or companies. Factors such
as income, age, and banking services like a credit card facility are evaluated
to identify the cause of bank customer churn.
Project Link : https://github.com/joelsilas1816/Data-Science-Projects/blob/main/Bank%20Customer%20Churn%20Prediction/Joel_Silas_JSCOE__Bank_Customer_Churn_Problem%20(1).ipynb
9. Sentiment or Feedback Analysis (NLP) – Twitter, the social media platform, lets users
express views in the form of tweets. These tweets are analyzed to understand
whether the person is giving a positive or negative remark, which helps the company
work on customer expectations and feedback.
Project Link : https://github.com/joelsilas1816/Data-Science-Projects/blob/main/Sentiment%20Analysis/Sentiment_Analysis.ipynb
 Machine learning systems are designed to generate maximum business value from ML
models used in services and products. If you believe the media hype around AI, you could
think that data scientists only focus on achieving state-of-the-art (SOTA) performance and
designing ingenious model architectures. The reality is a bit different, and data scientists
have many more objectives to accomplish.
 When building AI systems, it’s always good to take a divide-and-conquer approach with your
goal. This means breaking down the problem statement into solvable components, and
studying how machine learning could help alleviate certain problems. A good understanding
of limitations can help you build better products.
 The workflow involves understanding the data and performing various analyses to clean it,
along with visualization to identify inherent patterns. Then, depending on the prediction
required and the class of data, suitable models and algorithms can be applied to train and
test data to generate meaningful results. The process is repeated iteratively, up to a point,
to enhance accuracy.
 Various algorithms and data structures have already been implemented which simply need
to be applied on various types of data for problem-solving.
1. Easily identifies trends and patterns
2. No human intervention needed (automation)
3. Continuous Improvement
4. Handling multi-dimensional and multi-variate data
5. Wide Applications in domains like Education, Healthcare, Armed Forces, Finance,
etc.
6. Predictive analysis makes it possible to prepare in advance for future
outcomes.
1. Data Acquisition :
Machine Learning requires massive data sets to train on, and these should be
inclusive/unbiased, and of good quality. There can also be times where one must wait for new
data to be generated.
2. Time and Resources :
ML needs enough time to let the algorithms learn and develop enough to fulfill their purpose
with a considerable amount of accuracy and relevancy. It also needs massive resources to
function. For one, this can mean additional computing power.
3. Certain analysis falls short of human understanding :
This includes detecting sarcasm in tweets and interpreting whether an exclamation mark
expresses anger, excitement or praise.
Executing these projects gave me a deep understanding of analytics and
its mechanisms.
I came to understand the importance of analytics and how to implement it
in the Python programming language.
This was a great learning experience for me, where I could
work on many new concepts with hands-on practical experience and
gain insightful knowledge.
The faculty, mentors, supervisors and support staff were very
cooperative and knowledgeable, and addressed each and every concern
that students had. The sessions were very interactive even though they were
conducted virtually.
I am glad to have exercised my skills in this domain.