Open Source Platforms and Frameworks for Artificial Intelligence and Machine Learning

Conference Paper, April 2018. DOI: 10.1109/SECON.2018.8479098

Sambit Bhattacharya, Bogdan Czejdo
Department of Mathematics and Computer Science, Fayetteville State University, Fayetteville, NC, USA
{sbhattac, bczejdo}@uncfsu.edu

Rajeev Agrawal
U.S. Army Corps of Engineers, Engineer Research and Development Center, Vicksburg, MS, USA
rajeev.k.agrawal@erdc.dren.mil

Balakrishna Gokaraju
Department of Computer Information Systems & Technology, University of West Alabama, Livingston, AL, USA
bgokaraju@uwa.edu

Erdem Erdemir
Department of Computer Science, Tennessee State University, Nashville, TN, USA
eerdemir@tnstate.edu

Abstract—In recent years Artificial Intelligence (AI) and Machine Learning (ML) have emerged from academic labs and become prominent drivers of innovation in the high-tech industry. The number of jobs in this area has rapidly increased, along with research output from industry and the commercialization of that research. It is widely accepted that the changes resulting from these advances will shape society. Diversity in these fields must be increased, not only to bring people from underrepresented populations into these lucrative jobs but also because a more diverse workforce is expected to help ensure the fairness of data-driven decisions made by AI and ML algorithms, an issue that has come under scrutiny. The work described here is a result of ongoing efforts to modernize AI and ML courses, and to make their techniques an integral part of the software engineering capstone course at our educational institutions. We report on how the courses and the practice of software engineering have adopted cloud-based platforms, where we gain the benefit of creating virtual machines and containers for the necessary software stacks, and on the pedagogical benefits of using open-source software that enables blending of markup text, code and code output within the same document. Information about the choice of software libraries that help students produce code and perform experiments on data at an early stage is an important topic included in the discussion. We list examples of where we have leveraged existing student interest in applications of AI and ML to motivate work on projects. The defense and intelligence communities of the US government are invested in a diverse talent pool in this area. Efforts in specialized certification and faculty-student visits to government labs are described in this work.

Keywords—diversity, artificial intelligence, machine learning, capstone course, project-based learning.

I. INTRODUCTION

Artificial Intelligence (AI) and Machine Learning (ML) are changing rapidly with the discovery of new algorithms and improvements to the state of the art on benchmarks.
This progress is driven by the availability of big data, significant amounts of which have associated ground truth; new forms of parallel computing hardware that support extremely high rates of numerical computation; and innovations in existing algorithms that leverage these compute capabilities to create the emerging field of deep learning [1]. The number of jobs in the AI and ML area has rapidly increased, along with research output from industry and the commercialization of that research. Industry is also having a strong impact on these developments by intensifying its own AI research, due to the potentially high number of commercial applications. It is widely accepted that the changes resulting from these advances will shape society. Diversity in these fields must be increased, not only to bring people from underrepresented populations into these lucrative jobs but also because a more diverse workforce is expected to help ensure the fairness of data-driven decisions made by AI and ML algorithms, an issue that has come under scrutiny [4].

Due to rapid advancements in the field, it is challenging to include recent developments in AI and ML coursework that teaches their real-world impact. Undergraduate research has been identified as one of the highest-impact practices [2, 3]. Student research is one of the best forms of active learning and is associated with significant improvement in students’ comprehension and retention [5]. Our goal is to shift emphasis from teaching to a more active form of student participation through research. Group-based research activities are important for the success of programs directed towards broadening the participation of diverse groups [6, 8, 9]. The underrepresentation of women and minorities is especially acute in the computer science area [7], and one way to correct the resulting imbalances is to prepare students in meaningful ways for the technical breakthroughs of the future.
In this paper we describe experiences in teaching capstone projects, implementing project-based learning, and advising students in research at HBCU institutions. We share details that may help other institutions engage in similar activities.

II. CAPSTONE AND COURSE DESIGN

The main challenges we face when designing capstone experiences and project-based learning in courses are creating meaningful connections to real-world applications and scaffolding research activities. Our collaborative efforts address these needs at our two HBCU institutions. In our experience, students feel motivated when faced with a problem that they can connect with and that offers challenges that can be solved by creating applications. The fruits of AI and ML research have found their way into commercial products and government. We will discuss their use in government and defense in a later work. Here we discuss commercial applications as examples that students can directly relate to, since they are frequently users of those products.

A good starting point for discussions is recommendation systems. It is almost certain that all students in our classes have used recommendation-based services to buy products or to stream multimedia content such as movies or television shows. Recommendation systems are built from machine learning algorithms that analyze a user’s activities and compare them to those of the other users of the system (potentially millions of people) to determine what that specific user may like to buy or watch next. We find that students frequently find research ideas in recommendation systems. One idea, selected by a group of three women students as their capstone course project, was to create a fashion and clothing style recommendation system.

The areas of data security and physical security are rich sources of applications. Malware is a serious problem, with new and significant cases discovered frequently.
Machine learning and, more recently, deep learning are being used to create algorithms that distinguish normal software from malware. Machine learning algorithms can also be developed to detect patterns in how data in the cloud is accessed, and to report anomalies that could predict security problems. Real-time surveying and mapping, persistent surveillance, remote inspection and monitoring, high-altitude imaging, standoff adversary identification, and advanced video analytics are some applications in physical security that are benefitting from AI and ML research. In projects in these areas, students find interest in how their understanding of how computers work is challenged. For example, they are often caught by surprise by the differences between human vision and computer vision. However, the scale achieved by employing computer algorithms instead of humans for visual tasks such as detection and identification on large amounts of visual data provides case studies on how ML can provide benefits in these areas.

III. SOFTWARE, LIBRARIES AND PLATFORMS

Students develop code on local research clusters or on cloud-based accounts. To properly support student research, physical labs with appropriate hardware and software were created. Jupyter Notebook is an interactive, open-source web application that serves as a learning tool and helps instructors deploy an effective pedagogy for data visualization, data mining, computer simulation, machine learning and artificial intelligence. The popularity of Jupyter Notebooks comes from a successful implementation of literate programming, a software development style designed by the computer scientist Donald Knuth [19]. Fernando Pérez and Brian Granger combined the strengths of literate programming with those of Python, a flexible, very high-level language, and created the Jupyter Notebook.
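To make the cloud-access example from Section II concrete in the form of a short notebook-style cell, the following is a minimal sketch rather than a production detector. It assumes hypothetical hourly access counts for a single account and flags hours whose count deviates strongly from the median, using the median absolute deviation as a robust measure of spread. The data, the function name `flag_anomalies` and the threshold of 3.5 are illustrative assumptions; a course project would likely replace this statistical baseline with a learned model.

```python
from statistics import median


def flag_anomalies(counts, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold.

    The score is based on the median absolute deviation (MAD), which is
    less distorted by the outliers we are looking for than the mean and
    standard deviation would be.
    """
    center = median(counts)
    mad = median(abs(c - center) for c in counts)
    if mad == 0:
        return []  # no spread to measure deviations against
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * abs(c - center) / mad > threshold
    ]


# Hypothetical hourly access counts for one cloud storage account
hourly_access = [12, 15, 11, 14, 13, 240, 12, 16, 13, 14]
anomalous_hours = flag_anomalies(hourly_access)
print(anomalous_hours)
```

Run in a notebook, the cell shows its result directly beneath the code, so students can change the data or the threshold and immediately see which hours are flagged.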
Our faculty have recently started using Jupyter Notebooks for the graduate-level ML course and the undergraduate-level AI course, which includes ML. This approach has produced some initial results towards a more effective teaching strategy. These results can be summarized as:

1. Jupyter Notebooks give students easy access to big data and a supercomputer. All the big data (in terabytes) and the required libraries are hosted on a remote server. Students can try their algorithms without installing anything or downloading big data. They only need to run Jupyter Notebook in their web browser, which connects to the remote server and runs the algorithms on the big data.
2. Students start learning, and are able to see results, figures and tables, from the first class.
3. Students obtain both the necessary knowledge and practical experience in machine learning remarkably quickly, which has increased their self-confidence before joining the workforce.

We are monitoring the impact of Jupyter Notebook on an ongoing basis, assessing achievement in each of the categories above through post-course student surveys, which will be collected at the completion of the course. Students will be asked to fill out anonymous questionnaires about their satisfaction and learning level in the material, relevance to career goals, and the impact on their knowledge of AI.

IV. KNOWLEDGE ELEMENTS

In a recent 2017 DoD-sponsored study, AI was recognized as the key enabling technology for the success of DoD missions [report reference]. The development of GPUs and the availability of large labeled data sets have revolutionized the field of AI. The Department of Defense has five DoD Supercomputing Resource Centers (DSRCs) that provide DoD scientists with high-performance computing infrastructure for the most complex computational problems [link].
At the time of this writing, the centers deliver an aggregate of 6.7 billion processor hours of computing power per year and over 26 PetaFLOPS of computing capability via multiple HPC systems. However, new employees with HPC, ML and AI skill sets are required to utilize this excellent infrastructure effectively. Colleges and universities have been slow to respond to the knowledge elements needed to operate these centers. In most cases, students take time to become experts in the new AI and ML skill sets. It would be beneficial if educational institutions could keep up with the demand for newly needed skills.

Modern weapons systems incorporate varying degrees of autonomy in the air, sea, and ground domains. The US military has a large number of unmanned aerial systems, which have been used very effectively in different parts of the world over the last 15 years. DARPA’s Anti-Submarine Warfare Continuous Trail Unmanned Vessel (ACTUV) program recently commissioned a 130 ft. unmanned trimaran to track diesel-electric submarines [1,2]. It is quite clear that AI and ML topics are crucial for prospective DoD engineers and scientists. We list here some of the skill sets that the DoD seeks in new hires.

Some scientists consider machine learning to be one component of AI. ML provides a mathematical and statistical foundation for developing AI applications. Supervised and unsupervised learning on datasets are important high-level topics. Algorithms such as k-means clustering, principal component analysis, support vector machines, Bayes models, ensemble methods such as random forests, and hidden Markov models (HMMs) can prepare students for DoD jobs. Another sub-area of ML that has evolved very quickly in the last few years is deep learning, which extends neural networks to a much larger scale. For many years there were few neural network applications, until GPUs and huge labeled data sets became available.
Deep learning applications revolve around image classification, object identification, speech recognition and natural language understanding, though researchers are constantly looking for applications in other domains as well. The widespread adoption of the Internet of Things (IoT) will further fast-track DL applications. Deep learning is taught only at the graduate level; however, it is advisable to incorporate it at the undergraduate level to prepare students for ‘futuristic’ jobs. There are many ML and DL software tools available, both as open-source and as commercial platforms. A sample minimum skill set desired from students is listed below:

1. Programming skills in R or Python, with libraries such as NumPy, Matplotlib, Pandas and the scikit-learn machine learning toolkit.
2. Jupyter Notebook: an open-source web application that allows users to create and share documents that include live code, text and visualizations.
3. Deep learning: one of TensorFlow, Caffe, Torch, Keras or Microsoft’s CNTK.

V. RESEARCH ACTIVITIES

Progress in Artificial Intelligence (AI) and Machine Learning (ML), the creation of easily accessible ML tools, and an abundance of data have made the practical implementation of programs for protein folding research possible. This practical, real-world application helps to create strong motivation for students. Easily accessible ML tools and data allow research in this area to progress as never before.

Protein folding research is of critical importance for understanding the functions of the human body and for developing new medications. Proteins make up important parts of the human body, since they act as tiny machines for biological processes. They are initially created as a chain of amino acids (the protein sequence), which folds into a three-dimensional shape called the “native” structure. The structure is essential to the biological function of the protein, and many diseases have been linked with misfolded proteins [14].
Only a small percentage of proteins have known 3-D structures. These structures were determined through a very expensive and slow experimental process. To make faster progress, alternative techniques are needed. ML techniques appear especially productive, since they can learn computational methods that quickly predict the native structure of a protein from its sequence of amino acids. Data describing both protein chains and known 3-D structures are readily available in the Protein Data Bank as a training set for computational experiments [16]. In fact, the training set is much larger than that: it also includes thousands of approximate protein structures (called protein models) that have been generated for each protein chain based on physical and chemical laws. The goal for the student researcher then becomes the more practically achievable one of using scoring functions to select the best models for a given protein chain when the native structure is unknown.

Different types of approaches can be used to score protein models in student research. Our students followed and enhanced the experiments described in the literature using ML algorithms such as neural networks [15], Support Vector Machines (SVM) [16], and others that use calculated features to estimate protein model quality. Recently, the most rapidly developing techniques, which hold the greatest potential, are related to deep learning [18], which is perfectly suited to novel and interesting discoveries in protein folding.

Our ongoing efforts to improve undergraduate research projects have resulted in an environment that contains ML tools and libraries. Our environment therefore supports student activities by making available a preconfigured hardware and software platform for protein folding experiments. More specifically, our platform includes workstations with GPUs that have open-source ML software libraries installed.
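The model-selection task described earlier in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring functions used in the cited work: each protein model is reduced to a hypothetical two-number feature vector, a quality score between 0 and 1 is assumed to be known for the training models, and a k-nearest-neighbour regressor stands in for the neural network and SVM scorers mentioned in the text. The names `predict_quality` and `select_best_model`, and all numbers, are made up for illustration.

```python
import math


def predict_quality(features, training_set, k=3):
    """Estimate quality as the mean score of the k nearest training models."""
    by_distance = sorted(
        training_set,
        key=lambda item: math.dist(features, item[0]),
    )
    nearest = by_distance[:k]
    return sum(score for _, score in nearest) / len(nearest)


def select_best_model(candidates, training_set, k=3):
    """Return the name of the candidate with the highest predicted quality."""
    return max(
        candidates,
        key=lambda name: predict_quality(candidates[name], training_set, k),
    )


# Hypothetical training data: (feature vector, known quality score).
# The two features could be, for example, a normalized energy term
# and a compactness measure calculated from each model.
training_set = [
    ((0.10, 0.90), 0.85), ((0.15, 0.85), 0.80), ((0.20, 0.80), 0.75),
    ((0.70, 0.30), 0.30), ((0.80, 0.25), 0.20), ((0.90, 0.10), 0.10),
]

# Hypothetical candidate models for one chain whose native structure is unknown
candidates = {
    "model_a": (0.75, 0.30),
    "model_b": (0.12, 0.88),
    "model_c": (0.50, 0.50),
}

best = select_best_model(candidates, training_set)
print(best)
```

The same two-step structure (estimate a quality score from calculated features, then pick the highest-scoring candidate) carries over when the stand-in regressor is replaced by a trained neural network or SVM.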
The hardware and software components of our research environment allowed students to avoid unnecessary impediments such as having to configure hardware and software systems. We have observed many benefits for our undergraduate researchers that will be crucial in their careers: learning to collect, analyze and transform data into the various representations required by ML tools; using multiple ML models; programming on multiple platforms, including GPUs; and applying the rules of scientific experimentation with the use of ML.

ACKNOWLEDGMENT

The work described in this paper was partly made possible by an award (contract number W911NF-16-1-0426) from the US Army Research Office. We also acknowledge support from Google for providing Google Cloud Platform credits for research and teaching.

REFERENCES

[1] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.
[2] Lopatto, D. (2003). The essential features of undergraduate research. Council on Undergraduate Research Quarterly, 23, 139–142.
[3] Fincher, S., Petre, M., and Clark, M. (2001). Computer science project work: principles and pragmatics. Springer Science & Business Media.
[4] https://www.technologyreview.com/s/608986/forget-killer-robots-bias-is-the-real-ai-danger/
[5] Erkan, A., Newmark, S., and Ommen, N. (2009). Exposure to research through replication of research: a case in complex networks. In Proceedings of the 14th annual ACM SIGCSE conference on Innovation and technology in computer science education (ITiCSE '09). ACM, New York, NY, USA, 114-118.
[6] Baumgartner, J. (2013). “Why Diversity is the Mother of Creativity.” InnovationManagement.se; http://www.innovationmanagement.se/imtool-articles/why-diversity-is-the-mother-of-creativity/.
[7] “Global Diversity and Inclusion: Fostering Innovation Through a Diverse Workforce.” Forbes Insights (2015); http://forbes.com/forbesinsights.
[8] Expanding the pipeline: the state of African-Americans in computer science, the need to increase representation. http://cra.org/crn/2015/09/expanding-the-pipeline-the-state-of-african-americans-in-computer-science-the-need-to-increase-representation/
[9] Institute for African-American Mentoring in Computing Sciences: http://www.iaamcs.org/
[10] https://centers.hpc.mil/
[11] https://fas.org/irp/agency/dod/jason/ai-dod.pdf
[12] https://www.washingtonpost.com/news/checkpoint/wp/2016/04/08/meet-sea-hunter-the-130-foot-unmanned-vessel-the-navy-wants-to-hunt-submarines/
[13] http://www.darpa.mil/program/anti-submarine-warfare-continuous-trail-unmanned-vessel
[14] Alberts, B. et al. (2002). The Shape and Structure of Proteins. Molecular Biology of the Cell, Fourth Edition. New York and London: Garland Science. ISBN 0-8153-3218-1.
[15] Faraggi, E. and Kloczkowski, A. (2014). A global machine learning based scoring function for protein structure prediction. Proteins: Structure, Function, and Bioinformatics, vol. 82, no. 5, pp. 752-759.
[16] Mirzaei, S., Sidi, T., Keasar, C., and Crivelli, S. (2016). Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning. IEEE/ACM Transactions on
[17]
[18] Zaremba, W. et al. (2014). Recurrent Neural Network Regularization. arXiv preprint arXiv:1409.2329.
[19] Unidata Online Python Training, https://unidata.github.io/online-python-training/introduction.html, 2017.