Uploaded by Francisca Oladipo

2019 AICESSS Presentation for Journal

THE RANGE OF POSSIBILITIES IN
MACHINE LEARNING AND DATA
SCIENCE IN THE PARADIGM OF
MULTIDISCIPLINARY RESEARCH
Francisca O. Oladipo, PhD, FASI
Department of Computer Science, Faculty of Science, Islamic University in Uganda
francisca.oladipo@fulokoja.edu.ng
Malik Adeiza
Department of Computer Science, Faculty of Science, Federal University Lokoja
Outline
■ Background
– Data-Driven Research
■ Data Science
■ Data Analytics
■ Machine Learning
■ Application Areas
■ Public Datasets
■ Concluding Remarks
BACKGROUND
Background
■ Computing is central to our lives
– Solves problems across every domain: education, healthcare,
legal, social…
■ The use of data, and data as a discipline
■ Goes back a long way
• e.g. astronomy in the 16th and 17th centuries (maybe not the oldest):
observations leading to new theories
• e.g. 19th-century censuses in the UK and USA, analysed by hand
• Then Hollerith’s punched card machines, USA 1890 (it still took 6
years, that first time)
– …
Generic Research Paradigm
Data-Driven Research
Blog post by Adi Bhat, Global VP - Sales and Marketing at QuestionPro
DATA SCIENCE
Data Science
• The processing and analysis of data generated
from various sources to serve several purposes.
• Processes:
• Data extraction
• Data cleansing
• Analysis
• Visualization
• Generation of actionable insights
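The processes listed above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not a prescribed toolchain: the CSV text, the column names (`region`, `year`, `yield_tonnes`) and every function name are hypothetical, and only the Python standard library is used.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice it would come from a file, database or API.
RAW_CSV = """region,year,yield_tonnes
North,2017,12.5
North,2018,
South,2017,9.0
South,2018,10.5
south,2018,10.5
East,2018,n/a
"""

def extract(text):
    """Data extraction: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def cleanse(rows):
    """Data cleansing: normalise labels, drop unparseable values and duplicates."""
    seen, clean = set(), []
    for row in rows:
        try:
            value = float(row["yield_tonnes"])
        except ValueError:
            continue  # missing or non-numeric measurement
        key = (row["region"].strip().title(), int(row["year"]), value)
        if key not in seen:
            seen.add(key)
            clean.append({"region": key[0], "year": key[1], "yield_tonnes": value})
    return clean

def analyse(rows):
    """Analysis: mean yield per region."""
    by_region = {}
    for row in rows:
        by_region.setdefault(row["region"], []).append(row["yield_tonnes"])
    return {region: statistics.mean(vals) for region, vals in by_region.items()}

def visualise(summary):
    """Visualization: a plain-text bar chart, roughly one '#' per tonne."""
    return [f"{region:<6}{'#' * round(mean)} {mean:.2f}"
            for region, mean in sorted(summary.items())]

def insight(summary):
    """Actionable insight: name the region with the lowest mean yield."""
    worst = min(summary, key=summary.get)
    return f"Prioritise support for {worst} (lowest mean yield)"

rows = cleanse(extract(RAW_CSV))
summary = analyse(rows)
chart = visualise(summary)
message = insight(summary)
print("\n".join(chart))
print(message)
```

Each function maps to one bullet, so a real project can swap any stage (for example, a plotting library for `visualise`) without touching the others.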
MACHINE LEARNING
Machine Learning
■ A sub-field of artificial intelligence and part of
data science
■ Transformed computers from GIGO (garbage in, garbage out)
machines to having "a mind of their own"
■ Using appropriate statistical and algorithmic
models, computers perform certain tasks (on
data) which they have not been explicitly
programmed for
Machine Learning
■ Machine Learning is the ability given to a system to
learn from and process data sets autonomously,
using algorithms and techniques such as regression,
clustering, naïve Bayes, etc.
■ Machine Learning Algorithms apply what they have
learnt from existing data to forecast future behaviors,
outcomes, and trends to solve problems.
■ For example, an algorithm can be trained with cat
photos to recognize cats; the same algorithm can
also be trained with bicycle photos to recognize
bicycles without changing a line of code.
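The cat and bicycle point can be illustrated with a toy classifier. The sketch below uses a nearest-centroid rule, chosen only because it is short; it is not the algorithm behind any particular image system. The two-number feature vectors are hypothetical stand-ins for features that would be extracted from real photos.

```python
import math

def train(examples):
    """Learn one centroid (mean feature vector) per label.

    `examples` is a list of (features, label) pairs.
    """
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest to `features`."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical feature vectors standing in for photos.
cat_data = [((0.9, 0.8), "cat"), ((0.8, 0.9), "cat"),
            ((0.1, 0.2), "not cat"), ((0.2, 0.1), "not cat")]
bicycle_data = [((0.2, 0.9), "bicycle"), ((0.3, 0.8), "bicycle"),
                ((0.9, 0.1), "not bicycle"), ((0.8, 0.3), "not bicycle")]

# The same train/predict code is reused; only the training data changes.
cat_model = train(cat_data)
bicycle_model = train(bicycle_data)

cat_guess = predict(cat_model, (0.85, 0.7))
bicycle_guess = predict(bicycle_model, (0.25, 0.95))
print(cat_guess, "/", bicycle_guess)
```

What the system recognises is determined by the examples it was trained on, not by task-specific code.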
Generic Machine Learning Paradigm
■ Data Collection
■ Data Preparation
■ Choose a Model
■ Train the Model
■ Evaluate/Test the Model
■ Parameter Tuning
■ Make Predictions
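The seven steps can be walked through on a toy problem. The sketch below is one possible instance and rests on assumptions: the data is synthetic (two clusters of 2-D points), the model is k-nearest neighbours, and the candidate values of k are arbitrary. All names are hypothetical.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# 1. Data collection: synthetic 2-D points in two classes (stand-in for real data).
def make_points(cx, cy, label, n=40):
    return [((random.gauss(cx, 1.0), random.gauss(cy, 1.0)), label) for _ in range(n)]

data = make_points(0.0, 0.0, "A") + make_points(5.0, 5.0, "B")

# 2. Data preparation: shuffle, then split into training, validation and test sets.
random.shuffle(data)
train_set, valid_set, test_set = data[:48], data[48:64], data[64:]

# 3. Choose a model: k-nearest neighbours with a majority vote.
def knn_predict(examples, point, k):
    def squared_distance(example):
        (x, y), _ = example
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    nearest = sorted(examples, key=squared_distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# 4. Train the model: for k-NN, "training" is just storing train_set.

# 5. Evaluate/test the model: fraction of correctly labelled points.
def accuracy(examples, eval_set, k):
    hits = sum(knn_predict(examples, point, k) == label for point, label in eval_set)
    return hits / len(eval_set)

# 6. Parameter tuning: pick k on the validation set, never on the test set.
best_k = max((1, 3, 5, 7), key=lambda k: accuracy(train_set, valid_set, k))
test_accuracy = accuracy(train_set, test_set, best_k)

# 7. Make predictions on a new, unseen point.
prediction = knn_predict(train_set, (4.5, 5.2), best_k)
print(best_k, test_accuracy, prediction)
```

Keeping a separate test set, untouched during tuning, is what makes the accuracy from step 5 a fair estimate of performance on new data.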
Application Areas
■ Agriculture
■ Anatomy
■ Adaptive websites
■ Affective computing
■ Banking
■ Bioinformatics
■ Brain–machine interfaces
■ Cheminformatics
■ Computer networks
■ Computer vision
■ Credit-card fraud detection
■ Data quality
■ DNA sequence classification
■ Economics
■ Financial market analysis
■ General game playing
■ Handwriting recognition
■ Information retrieval
■ Insurance
■ Internet fraud detection
■ Linguistics
■ Machine learning control
■ Machine perception
■ Machine translation
■ Marketing
■ Medical diagnosis
■ Natural language processing
■ Natural language understanding
■ Online advertising
■ Optimization
■ Recommender systems
■ Robot locomotion
■ Search engines
■ Sentiment analysis
■ Sequence mining
■ Software engineering
■ Speech recognition
■ Structural health monitoring
■ Syntactic pattern recognition
■ Telecommunication
■ Theorem proving
■ Time series forecasting
■ User behavior analytics
Further Resources
https://data-flair.training/blogs/machine-learning-applications/
Machine Learning in Action
■ https://www.forbes.com/sites/bernardmarr/2018/04/30/27-incredible-examples-of-ai-and-machine-learning-in-practice/#478fdb0c7502
■ https://elitedatascience.com/machine-learning-impact
PUBLIC DATASETS
List of Selected Public Datasets
■ https://www.springboard.com/blog/free-public-data-sets-data-science-project/
UCI Machine Learning Repository
– One of the oldest sources of datasets on the web
– User-contributed, and thus with varying levels of cleanliness
– Data is freely available for download without registration
– http://mlr.cs.umass.edu/ml/
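Because user-contributed files vary in cleanliness, a quick missing-value count is a sensible first check after download. The sketch below is illustrative only: the sample rows are made up, the file is assumed to be headerless comma-separated text, and the set of tokens treated as missing (`""`, `"?"`, `"NA"`) is an assumption to adjust per dataset.

```python
import csv
import io

# Made-up rows in a headerless comma-separated layout (not real repository data).
SAMPLE = """5.1,3.5,1.4,0.2,class_a
4.9,?,1.4,0.2,class_a
?,3.2,4.7,1.4,class_b
6.3,3.3,6.0,,class_c
"""

# Assumption: tokens that mark a missing value in this file.
MISSING_MARKERS = {"", "?", "NA"}

def missing_per_column(text):
    """Count missing-value markers in each column of headerless CSV text."""
    counts = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(counts) < len(row):
            counts.extend([0] * (len(row) - len(counts)))
        for i, cell in enumerate(row):
            if cell.strip() in MISSING_MARKERS:
                counts[i] += 1
    return counts

report = missing_per_column(SAMPLE)
print(report)
```

Columns with many missing entries can then be imputed or dropped during data preparation.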
Kaggle
– Externally contributed
– Contains all kinds of niche datasets in its master list, from basketball data to Luganda
phrases
– https://www.kaggle.com/
Google Dataset Search
• A search engine for datasets rather than a single dataset
• Enables users to find datasets wherever they’re hosted
(a publisher’s site, a digital library, an author’s personal web
page, etc.)
• https://toolbox.google.com/datasetsearch
First-person Vision dataset of Office Activities
■ Source: Barcelona (Spain), Oxford (UK) and Nairobi (Kenya)
■ Multi-subject, first-person
■ Contains the highest number of subjects + activities compared to
existing office activity datasets.
– activities include person-to-person interactions, such as chatting and
handshaking, person-to-object interactions, such as using a computer
or a whiteboard, as well as generic activities such as walking.
– Video is provided along with its annotation and the extracted features.
■ The videos in the dataset present a number of challenges that, in
addition to intra-class differences and inter-class similarities, include
frames with illumination changes, motion blur, and lack of texture.
■ The dataset contains a description of the state-of-the-art features
extracted from the dataset and baseline activity recognition results
with a number of existing methods.
Still in Africa…
Additional List of Public Datasets
– https://opendatascience.com/25-excellent-machine-learning-open-datasets/
Requisite Skills
Data Science
Machine Learning
■ R, Python, Scala, SAS.
■ Expertise in coding
fundamentals.
■ Knowledge of databases
like SQL.
■ Good knowledge in the
field of mathematics and
statistics.
■ Understanding of
analytical functions.
■ Knowledge and
experience in machine
learning.
■ Programming concepts.
■ Probability and statistics.
■ Data modeling.
APPLIED
DATA
SCIENCE
THANK YOU