THE RANGE OF POSSIBILITIES IN MACHINE LEARNING AND DATA SCIENCE IN THE PARADIGM OF MULTIDISCIPLINARY RESEARCH
Francisca O. Oladipo, PhD, FASI
Department of Computer Science, Faculty of Science, Islamic University in Uganda
francisca.oladipo@fulokoja.edu.ng
Malik Adeiza
Department of Computer Science, Faculty of Science, Federal University Lokoja

Outline
■ Background – Data-Driven Research
■ Data Science
■ Data Analytics
■ Machine Learning
■ Application Areas
■ Public Datasets
■ Concluding Remarks

BACKGROUND
Background
■ Computing is central to our lives
– It solves problems across almost every domain: education, healthcare, law, social life…
■ The Use of Data as a Discipline
■ Goes back a long way
• e.g. astronomy in the 16th and 17th centuries (maybe not the oldest): observations led to new theories
• e.g. the 19th-century censuses in the UK and USA, analysed by hand
• Then Hollerith's punched-card machines, USA, 1890 (it still took 6 years, that first time)
– …

Generic Research Paradigm

Data-Driven Research
(Blog post by Adi Bhat, Global VP - Sales and Marketing at QuestionPro)

DATA SCIENCE
Data Science
• The processing and analysis of data generated from various sources to derive insights that serve several purposes.
• Processes:
• Data extraction
• Data cleansing
• Analysis
• Visualization
• Actionable insight generation

MACHINE LEARNING
Machine Learning
■ A sub-field of artificial intelligence and part of data science
■ Moved computers from only executing explicit instructions (garbage in, garbage out, or GIGO) to learning behavior from data
■ Using appropriate statistical and algorithmic models, computers perform certain tasks (on data) that they have not been explicitly programmed for

Machine Learning
■ Machine Learning is the ability given to a system to learn from and process data sets with little or no human intervention, using algorithms and techniques such as regression, clustering, naïve Bayes, etc.
■ Machine Learning algorithms apply what they have learnt from existing data to forecast future behaviors, outcomes and trends, and so help solve problems.
■ For example, an algorithm can be trained with cat photos to recognize cats; the same algorithm can also be trained with bicycle photos to recognize bicycles without changing a line of code.

Generic Machine Learning Paradigm
Data Collection → Data Preparation → Choose a Model → Train the Model → Evaluate/Test the Model → Parameter Tuning → Make Predictions

Application Areas
Agriculture, Anatomy, Adaptive websites, Affective computing, Banking, Bioinformatics, Brain–machine interfaces, Cheminformatics, Computer networks, Computer vision, Credit-card fraud detection, Data quality, DNA sequence classification, Economics, Financial market analysis, Machine translation, Marketing, Medical diagnosis, Natural language processing, Natural language understanding, Online advertising, Optimization, Recommender systems, Robot locomotion, Machine learning control, Machine perception, General game playing, Handwriting recognition, Information retrieval, Search engines, Sentiment analysis, Sequence mining, Software engineering, Speech recognition, Structural health monitoring, Syntactic pattern recognition, Telecommunication, Theorem proving, Time series forecasting, User behavior analytics, Insurance, Internet fraud detection, Linguistics

Further Resources
■ https://data-flair.training/blogs/machine-learning-applications/

Machine Learning in Action
■ https://www.forbes.com/sites/bernardmarr/2018/04/30/27-incredible-examples-of-ai-and-machine-learning-in-practice/
■ https://elitedatascience.com/machine-learning-impact

PUBLIC DATASETS
List of Selected Public Datasets
■ https://www.springboard.com/blog/free-public-data-sets-data-science-project/

UCI Machine Learning Repository
– One of the oldest sources of datasets on the web
– Datasets are user-contributed and thus vary in cleanliness
– Data is freely available for download without registration
– http://mlr.cs.umass.edu/ml/

Kaggle
– Externally contributed
– Contains all kinds of niche datasets in its master list, from basketball data to Luganda phrases
– https://www.kaggle.com/
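Whichever repository the data comes from, the Generic Machine Learning Paradigm (collect, prepare, choose a model, train, evaluate, tune, predict) can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the authors' method: the toy dataset is made up, the function names (`prepare`, `split`, `train`, `predict`, `accuracy`, `tune`) are hypothetical, and k-nearest neighbours (kNN) is swapped in only because it has an obvious parameter, k, to tune. Leave-one-out accuracy is just one reasonable tuning criterion. Only the Python 3.8+ standard library is used.

```python
import math
import random
from collections import Counter

# Step 1 - Data collection. Hypothetical, made-up toy data for illustration only:
# each row is (feature_1, feature_2, label); None marks a missing value.
RAW_DATA = [
    (1.0, 1.1, "A"), (1.2, 0.9, "A"), (0.8, 1.0, "A"), (1.1, None, "A"),
    (0.9, 1.3, "A"), (1.3, 1.2, "A"), (0.7, 0.8, "A"), (1.0, 0.7, "A"),
    (3.0, 3.1, "B"), (3.2, 2.9, "B"), (2.8, 3.0, "B"), (None, 3.3, "B"),
    (3.1, 3.4, "B"), (2.9, 2.7, "B"), (3.3, 3.0, "B"), (2.7, 3.2, "B"),
]


def prepare(rows):
    """Step 2 - Data preparation: drop rows that contain missing values."""
    return [row for row in rows if None not in row]


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle with a fixed seed, then hold out a test set for evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Step 3 - Choose a model: kNN, used here purely as an example.
def train(rows):
    """Step 4 - Train: a kNN 'model' is simply the stored labelled examples."""
    return list(rows)


def predict(model, point, k):
    """Step 7 - Predict: majority label among the k training rows nearest to point."""
    nearest = sorted(model, key=lambda row: math.dist(row[:2], point))[:k]
    votes = Counter(row[2] for row in nearest)
    return votes.most_common(1)[0][0]


def accuracy(model, rows, k):
    """Step 5 - Evaluate: fraction of rows whose label is predicted correctly."""
    correct = sum(predict(model, row[:2], k) == row[2] for row in rows)
    return correct / len(rows)


def tune(train_rows, candidate_ks=(1, 3, 5)):
    """Step 6 - Parameter tuning: pick k by leave-one-out accuracy on training data."""
    def leave_one_out_score(k):
        hits = 0
        for i, row in enumerate(train_rows):
            rest = train_rows[:i] + train_rows[i + 1:]
            hits += predict(rest, row[:2], k) == row[2]
        return hits / len(train_rows)
    # max() returns the first k among ties, so the smallest best k wins.
    return max(candidate_ks, key=leave_one_out_score)


clean = prepare(RAW_DATA)
train_rows, test_rows = split(clean)
model = train(train_rows)
print("test accuracy with k=3:", accuracy(model, test_rows, 3))
best_k = tune(train_rows)  # tuning sees only the training split
print("chosen k:", best_k, "test accuracy:", accuracy(model, test_rows, best_k))
print("prediction for (1.0, 1.0):", predict(model, (1.0, 1.0), best_k))
```

Because `train` and `predict` never inspect what the labels mean, the same functions could be pointed at a different labelled dataset unchanged, which mirrors the cat/bicycle example. Tuning uses only the training split so that the held-out test set still gives an honest estimate of performance on unseen data.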
Google Dataset Search
• A search engine for datasets
• Enables users to find datasets wherever they are hosted (a publisher's site, a digital library, an author's personal web page, etc.)
• https://toolbox.google.com/datasetsearch

First-person Vision dataset of Office Activities
■ Sources: Barcelona (Spain), Oxford (UK) and Nairobi (Kenya)
■ Multi-subject, first-person
■ Contains the highest number of subjects and activities compared with existing office activity datasets.
– Activities include person-to-person interactions, such as chatting and handshaking; person-to-object interactions, such as using a computer or a whiteboard; and generic activities such as walking.
– Video is provided along with its annotations and the extracted features.
■ In addition to intra-class differences and inter-class similarities, the videos present challenges such as frames with illumination changes, motion blur and lack of texture.
■ The dataset includes a description of the state-of-the-art features extracted from it, together with baseline activity recognition results for a number of existing methods.

Still in Africa…
Additional List of Public Datasets
– https://opendatascience.com/25-excellent-machine-learning-open-datasets/

Requisite Skills (Data Science, Machine Learning)
■ R, Python, Scala, SAS
■ Expertise in coding fundamentals
■ Knowledge of SQL databases
■ Good knowledge of mathematics and statistics
■ Understanding of analytical functions
■ Knowledge of and experience in machine learning
■ Programming concepts
■ Probability and statistics
■ Data modeling

APPLIED DATA SCIENCE

THANK YOU