Conference Paper · April 2018
DOI: 10.1109/SECON.2018.8479098
Open Source Platforms and Frameworks for Artificial
Intelligence and Machine Learning
Sambit Bhattacharya, Bogdan Czejdo
Department of Mathematics and Computer Science,
Fayetteville State University,
Fayetteville, NC, USA
{sbhattac, bczejdo}@uncfsu.edu
Rajeev Agrawal
U.S. Army Corps of Engineers
Engineer Research and Development Center
Vicksburg, MS, USA
rajeev.k.agrawal@erdc.dren.mil
Balakrishna Gokaraju
Department of Computer Information Systems &
Technology
University of West Alabama
Livingston, AL, USA
bgokaraju@uwa.edu
Erdem Erdemir
Department of Computer Science
Tennessee State University
Nashville, TN, USA
eerdemir@tnstate.edu
Abstract— In recent years Artificial Intelligence (AI) and
Machine Learning (ML) have emerged from academic labs and
become prominent drivers of innovation in the high-tech
industry. The number of jobs in this area has rapidly increased
along with research output from industry and the
commercialization of that research. It is widely accepted that the
changes resulting from these advances will shape society.
Diversity in these fields must be increased, not only to bring
people from underrepresented populations into these lucrative
jobs but also because a more diverse workforce is expected to
ensure the fairness of data-driven decisions made by AI and ML
algorithms, an issue that has come under scrutiny.
The work described here is a result of ongoing efforts to
modernize AI and ML courses and to make their techniques an
integral part of the software engineering capstone course at our
educational institutions. We report on how the courses and the
practice of software engineering have adopted cloud-based
platforms, where we gain the benefit of creating virtual machines
and containers for the necessary software stacks, and on the
pedagogical benefits of using open-source software that enables
the blending of markup text, code and output from code within
the same document. The choice of software libraries that help
students produce code and perform experiments on data at an
early stage is an important topic included in the discussion. We
list examples where we have leveraged existing student interest
in applications of AI and ML to motivate the work on projects.
The defense and intelligence communities of the US government
are invested in a diverse talent pool in this area. Efforts in
specialized certification and faculty-student visits to
government labs are described in this work.
Keywords—diversity, artificial intelligence, machine learning,
capstone course, project-based learning.
I. INTRODUCTION
Artificial Intelligence (AI) and Machine Learning (ML) are
rapidly changing with the discovery of new algorithms and
improvements to the state of the art on benchmarks. This
progress is driven by the availability of big data, significant
amounts of which have associated ground truth; new forms of
parallel computing hardware that support extremely high rates
of numerical computation; and innovations in existing
algorithms that leverage these compute capabilities to create
the emerging field of deep learning [1]. The number of jobs in
the AI and ML area has rapidly increased along with research
output from industry and the commercialization of that
research. Industry is also having a strong impact on these
developments by intensifying its own AI research due to the
potentially high number of commercial applications. It is
widely accepted that the changes resulting from these
advances will shape society. Diversity in these fields must be
increased, not only to bring people from underrepresented
populations into these lucrative jobs but also because a more
diverse workforce is expected to ensure the fairness of
data-driven decisions made by AI and ML algorithms, an issue
that has come under scrutiny [4].
Due to rapid advancements in the field, it is challenging to
include recent developments in AI and ML coursework that
teaches their real-world impact. Undergraduate research has
been identified as one of the highest impact practices [2, 3].
Student research is one of the best forms of active learning
and is associated with significant improvement in students’
comprehension and retention [5]. Our goal is to shift emphasis
from teaching to a more active form of student participation
through research. Group-based research activities are
important for the success of programs directed towards
broadening the participation of diverse groups [6, 8, 9].
The underrepresentation of women and minorities is
especially acute in computer science [7], and one way to
correct the resulting imbalances is to prepare students in
meaningful ways for the technical breakthroughs of the future.
In this paper we describe experiences in teaching capstone
projects, implementing project-based learning, and advising
students in research at HBCU institutions. We share details
that may help other institutions engage in similar activities.
II. CAPSTONE AND COURSE DESIGN
The main challenges we face when designing capstone
experiences and project-based learning in courses are creating
meaningful connections to real-world applications and
scaffolding research activities. Our collaborative efforts
address these needs at our two HBCU institutions.
In our experience, students feel motivated when faced with a
problem that they can connect with and that offers challenges
that can be solved by creating applications. The fruits of AI and
ML research have found their way into commercial products and
government. We will discuss their use in government and
defense in a later work. Here we discuss commercial
applications as examples that students can directly relate to,
since they are frequent users of those products. A good
starting point for discussions is recommendation systems. It is
almost certain that all students in our classes have used
recommendation-based services to buy products or to stream
multimedia content such as movies or television shows.
Recommendation systems are built from machine learning
algorithms that analyze a user’s activities and compare them
to those of the other users of the system (potentially millions
of people) to determine what the specific user may like to buy
or watch next. We find that students frequently find research
ideas in recommendation systems. One idea, selected by a
group of three women students for their capstone course, was
to create a fashion and clothing style recommendation system.
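The comparison of one user's activity with that of other users can be illustrated with a small user-based collaborative filtering sketch. It is only one of several ways such systems are built and is not the students' implementation; the ratings, user names and item names below are hypothetical, and only the Python standard library is used.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ana":   {"jacket": 5, "scarf": 4, "boots": 1},
    "bea":   {"jacket": 4, "scarf": 5, "sandals": 4, "hat": 2},
    "chris": {"boots": 5, "sandals": 2, "jacket": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    scores = {item: totals[item] / weights[item] for item in totals}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", RATINGS))
```

Each unseen item is scored by the ratings of other users, weighted by how similar those users are to the target user. A production system would replace the in-memory dictionary with a much larger, sparse data set and usually a learned model.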
The areas of data security and physical security are rich
sources of applications. Malware is a serious problem, with
new and significant cases discovered frequently. Machine
learning, and more recently deep learning, is being used to
create algorithms that distinguish normal software from
malware. Machine learning algorithms can also be developed
to detect patterns in how data in the cloud is accessed and to
report anomalies that could predict security problems.
Real-time surveying and mapping, persistent surveillance, remote
inspection and monitoring, high altitude imaging, standoff
adversary identification, and advanced video analytics are
some applications in physical security that are benefitting
from AI and ML research. In projects in these areas, students
find interest in how their understanding of how computers
work is challenged. For example, they are often caught by
surprise by the differences between human vision and
computer vision. However, the scale achieved by employing
computer algorithms instead of humans for visual tasks such as
detection and identification on large amounts of visual data
provides case studies on how ML can provide benefits in these
areas.
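As a first classroom exercise on reporting anomalies in data access, students can start from a simple statistical rule before moving to learned models. The sketch below is such a baseline, not an ML algorithm from this work: it flags values whose modified z-score (based on the median and the median absolute deviation) exceeds a threshold. The hourly read counts and the threshold of 3.5 are assumptions chosen for illustration.

```python
from statistics import median


def find_anomalies(counts, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a single extreme value does not hide itself
    by inflating the spread estimate.
    """
    center = median(counts)
    mad = median(abs(c - center) for c in counts)
    if mad == 0:
        return []  # no spread to compare against
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * abs(c - center) / mad > threshold
    ]


# Hypothetical hourly counts of read requests against cloud storage.
hourly_reads = [42, 39, 45, 41, 40, 44, 38, 43, 41, 400, 42, 40]
print(find_anomalies(hourly_reads))
```

The robust statistics are used here because a plain mean and standard deviation are themselves distorted by the spike that the rule is supposed to detect.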
III. SOFTWARE, LIBRARIES AND PLATFORMS
Students develop code on local research clusters or on
cloud-based accounts. To give student research proper support,
we created physical labs with appropriate hardware and
software.
Jupyter Notebook is an interactive, open-source web
application that helps instructors deploy an effective pedagogy
for data visualization, data mining, computer simulation,
machine learning and artificial intelligence. The popularity of
Jupyter Notebooks comes from a successful implementation of
literate programming, a software development style designed
by the computer scientist Donald Knuth [19]. Fernando Pérez
and Brian Granger combined the strengths of literate
programming with those of Python, a flexible, very high-level
language, to create the Jupyter Notebook.
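The blending of text and code in one document can be made concrete by looking at how a notebook is stored. The sketch below builds a two-cell notebook as a Python dictionary and serializes it to JSON, assuming version 4 of the notebook file format; the lesson content and the helper function names are hypothetical.

```python
import json


def markdown_cell(text):
    """Build a Markdown cell holding explanatory text."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Build a code cell that has not been executed yet (no outputs)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def build_notebook(cells):
    """Wrap cells in the top-level structure of a version 4 notebook."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical lesson content: prose first, then the code it describes.
lesson = build_notebook([
    markdown_cell("# Mean of a sample\nThe cell below averages three values."),
    code_cell("values = [2, 4, 9]\nprint(sum(values) / len(values))"),
])

ipynb_text = json.dumps(lesson, indent=1)
print(ipynb_text)
```

Saving `ipynb_text` to a file with the `.ipynb` extension should allow it to be opened in Jupyter. Once the code cell is run, its results are stored in the `outputs` list next to the code, which is how text, code and output end up in the same document.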
Our faculty have recently started using Jupyter Notebooks
for the graduate-level ML course and the undergraduate-level
AI course, which includes ML. This approach has produced
some initial results towards a more effective teaching strategy.
These results can be summarized as:
1. Jupyter Notebooks give students easy access to big data
and a supercomputer. All the big data (terabytes in size) and
the required libraries are hosted on a remote server. Students
can try their algorithms without installing anything or
downloading big data. They only run Jupyter Notebook in
their web browser, which connects to the remote server and
runs algorithms on the big data.
2. Students started learning and were able to see results,
figures and tables from the first class.
3. Students obtain both the necessary knowledge and
practical experience in machine learning, which has
increased their self-confidence before joining the
workforce.
We are monitoring the impact of the Jupyter Notebook on
each of the results listed above with post-course student
surveys, which will be collected at the completion of the
course. Students will be asked to fill out anonymous
questionnaires about their satisfaction and learning level in the
material, its relevance to their career goals, and its impact on
their knowledge of AI.
IV. KNOWLEDGE ELEMENTS
In a 2017 DoD-sponsored study, AI was recognized as the
key enabling technology for the success of DoD missions
[report reference]. The development of GPUs
and the availability of large labeled data sets have
revolutionized the field of AI. The Department of Defense has
five DoD Supercomputing Resource Centers (DSRCs) that
provide DoD scientists with high performance computing
infrastructure for the most complex computational problems
[link]. At the time of this writing, the centers deliver an
aggregate of 6.7 billion processor hours of computing power
per year and over 26 PetaFLOPS of computing capability via
multiple HPC systems. However, new employees with HPC,
ML and AI skill sets are required to utilize this excellent
infrastructure effectively. Colleges and universities are slow
to respond to the knowledge elements needed to operate these
centers. In most cases, students take time to become expert in
the new AI and ML skill sets. It would be valuable if
educational institutions could keep up with the demand for
newly needed skills.
Modern weapons systems incorporate varying degrees of
autonomy in the air, sea, and ground domains. The US military
has a large number of unmanned aerial systems, which have
been used very effectively in different parts of the world over
the last 15 years. DARPA’s Anti-Submarine Warfare
Continuous Trail Unmanned Vessel (ACTUV) program
recently commissioned a 130 ft. unmanned trimaran to track
diesel-electric submarines [1,2]. It is quite clear that AI and ML
topics are crucial to prospective DoD engineers and
scientists. We list here some of the skill sets that DoD seeks
in new hires.
Some scientists consider Machine Learning to be one
component of AI. ML provides a mathematical and statistical
foundation for developing AI applications. Supervised and
unsupervised learning on datasets are important high-level
topics. k-means clustering, principal component analysis,
support vector machines, Bayes models, ensemble methods
such as random forests, and hidden Markov models (HMM) are
algorithms that can prepare students for DoD jobs. Another
sub-area of ML that has evolved very fast in the last few years
is Deep Learning, which extends neural networks to a much
larger scale. For many years there were few neural network
applications, until GPUs and huge labeled data sets became
available. Deep Learning applications revolve around image
classification, object identification, speech recognition and
natural language understanding, though researchers are
constantly looking for applications in other domains as well.
The widespread application of the Internet of Things (IoT)
will further fast-track DL applications.
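Of the algorithms named above, k-means clustering is a convenient first example of unsupervised learning because it fits in a few lines. The following is a minimal sketch of the standard iterative (Lloyd's) procedure using only the Python standard library; the sample points, the seed and the iteration count are assumptions, and in coursework a library implementation would normally be used instead.

```python
import random
from math import dist


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    labels = [
        min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
    ]
    return centroids, labels


# Hypothetical 2-D data forming two well-separated groups.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.8)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

No labels are supplied to the algorithm; the grouping emerges from the distances alone, which is the distinction from supervised methods such as support vector machines.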
Deep Learning is taught only at the graduate level; however,
it is advisable to incorporate it at the undergraduate level
to prepare students for ‘futuristic’ jobs.
There are many ML and DL software tools available, both
as open source and commercial platforms. A sample of the
minimum skill set desired from students is listed below:
• Programming skills in R or Python with libraries such
as NumPy, Matplotlib, Pandas and the scikit-learn
machine learning toolkit
• Jupyter Notebook: an open-source web application
for creating and sharing documents that include
live code, text and visualizations
• Deep Learning: one of TensorFlow, Caffe, Torch,
Keras or Microsoft’s CNTK
V. RESEARCH ACTIVITIES
The progress of Artificial Intelligence (AI) and Machine
Learning (ML), the creation of easily accessible ML tools, and
the abundance of data have made the practical implementation
of programs for protein folding research possible. This
practical, real-world application helps to create strong
motivation for students. Easily accessible ML tools and data
allow research in this area to progress as never before.
Protein folding research is of critical importance for
understanding the functions of the human body and for
developing new medications. Proteins make up important parts
of the human body, since they act as tiny machines for
biological processes. They are initially created as a chain of
amino acids (the protein sequence), which folds into a
three-dimensional shape called the “native” structure. The
structure is essential to the biological function of the protein,
and many diseases have been linked with misfolded proteins [14].
Only a small percentage of proteins have known 3-D
structures. These structures have been determined through a
very expensive and slow experimental process. To make
quicker progress, alternative techniques need to be used. ML
techniques seem to be especially productive: they learn
computational methods that can quickly predict the native
structure of a protein from its sequence of amino acids. Data
describing both protein chains and known 3-D structures are
readily available in the Protein Data Bank as a training set for
computational experiments [16]. In fact, the training set is
much larger than that. It also includes thousands of
approximate protein structures (called protein models) that
have been generated for each protein chain based on physical
and chemical laws. The student researcher’s goal then becomes
more practically achievable: to use a scoring function to
select the best models for a given protein chain when the
native structure is unknown.
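The idea of a learned scoring function can be sketched as follows: quality scores are known for models of proteins whose native structure is available, and a regressor trained on them predicts the quality of new candidate models so that the best one can be selected. The sketch uses a k-nearest-neighbour average as a simple stand-in for the neural network and SVM scoring functions from the literature; the two-number feature vectors, quality values and model names are hypothetical.

```python
from math import dist

# Hypothetical training data: (feature vector, known quality in [0, 1]).
# In practice the features would be calculated from each protein model and
# the quality measured against the known native structure.
TRAINING = [
    ((0.2, 0.9), 0.85),
    ((0.3, 0.8), 0.80),
    ((0.7, 0.4), 0.45),
    ((0.9, 0.2), 0.20),
    ((0.8, 0.3), 0.30),
]


def predict_quality(features, training, k=2):
    """Average the known quality of the k most similar training models."""
    nearest = sorted(training, key=lambda ex: dist(features, ex[0]))[:k]
    return sum(quality for _, quality in nearest) / len(nearest)


def select_best_model(candidates, training, k=2):
    """Return the name of the candidate with the highest predicted quality."""
    return max(
        candidates,
        key=lambda name: predict_quality(candidates[name], training, k),
    )


# Hypothetical candidate models for a chain with unknown native structure.
CANDIDATES = {
    "model_a": (0.85, 0.25),
    "model_b": (0.25, 0.85),
    "model_c": (0.60, 0.50),
}
print(select_best_model(CANDIDATES, TRAINING))
```

Swapping `predict_quality` for a trained neural network or SVM regressor leaves the selection step unchanged, which makes this structure a convenient starting point for comparing scoring methods.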
Different types of approaches can be used to score protein
models in student research. Our students followed and
enhanced the experiments described in the literature using ML
algorithms such as neural networks [15], Support Vector
Machines (SVM) [16], and others that use calculated features
to estimate protein model quality. Recently, the most rapidly
developing techniques, and those that hold the greatest
potential, are related to deep learning [18], which is well
suited for novel and interesting discoveries in protein folding.
Our ongoing efforts to improve undergraduate research
projects resulted in an environment containing ML tools and
libraries. This environment supports student activities by
making available a preconfigured hardware and software
platform for protein folding experiments. More specifically,
our platform includes workstations with GPUs that have open
source ML software libraries installed. These components
allow students to avoid unnecessary impediments such as
having to configure hardware and software systems.
We have observed many benefits for our undergraduate
researchers that will be crucial in their careers: learning to
collect, analyze and transform data into the various
representations required by ML tools; using multiple ML
models; programming on multiple platforms, including GPUs;
and applying the rules of scientific experimentation with the
use of ML.
ACKNOWLEDGMENT
The work described in this paper was partly made possible
by an award (contract number W911NF-16-1-0426) from the
US Army Research Office. We also acknowledge support from
Google for providing Google Cloud Platform credits for
research and teaching.
REFERENCES
[1] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.
[2] Lopatto, D. 2003. The essential features of undergraduate research. Council on Undergraduate Research Quarterly, 23, 139–142.
[3] Fincher, Sally, Marian Petre, and Martyn Clark. 2001. Computer science project work: principles and pragmatics. Springer Science & Business Media, 2001.
[4] https://www.technologyreview.com/s/608986/forget-killer-robots-bias-is-the-real-ai-danger/
[5] Erkan, A.; Newmark, S.; and Ommen, N. 2009. Exposure to research through replication of research: a case in complex networks. In Proceedings of the 14th annual ACM SIGCSE conference on Innovation and technology in computer science education (ITiCSE '09). ACM, New York, NY, USA, 114-118.
[6] Baumgartner, J. “Why Diversity is the Mother of Creativity.” InnovationManagement.se, (2013); http://www.innovationmanagement.se/imtool-articles/why-diversity-is-the-mother-of-creativity/.
[7] “Global Diversity and Inclusion: Fostering Innovation Through a Diverse Workforce.” Forbes Insights (2015); http://forbes.com/forbesinsights.
[8] Expanding-the-pipeline-the-state-of-african-americans-in-computer-science-the-need-to-increase-representation. http://cra.org/crn/2015/09/expanding-the-pipeline-the-state-of-african-americans-in-computer-science-the-need-to-increase-representation/
[9] Institute for African-American Mentoring in Computing Sciences: http://www.iaamcs.org/
[10] https://centers.hpc.mil/
[11] https://fas.org/irp/agency/dod/jason/ai-dod.pdf
[12] https://www.washingtonpost.com/news/checkpoint/wp/2016/04/08/meet-sea-hunter-the-130-foot-unmanned-vessel-the-navy-wants-to-hunt-submarines/
[13] http://www.darpa.mil/program/anti-submarine-warfare-continuous-trail-unmanned-vessel
[14] Alberts B. et al. (2002), The Shape and Structure of Proteins, Molecular Biology of the Cell; Fourth Edition. New York and London: Garland Science. ISBN 0-8153-3218-1.
[15] Faraggi E. and Kloczkowski A. (2014), A global machine learning based scoring function for protein structure prediction, Proteins: Structure, Function, and Bioinformatics, vol. 82, no. 5, pp. 752-759.
[16] Mirzaei S., T. Sidi, C. Keasar, and S. Crivelli (2016), Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning, IEEE/ACM Transactions on
[17]
[18]
[19]
Zaremba W. et al. (2014), Recurrent Neural Network Regularization, arXiv preprint arXiv:1409.2329.
Unidata Online Python Training, https://unidata.github.io/online-python-training/introduction.html, 2017.