Simplifying Concepts, Delivering Success℠

DESIGNING INTELLIGENT SYSTEMS USING RESOURCE CONSTRAINED EDGE DEVICES
Jacob Beningo | President
© 2019 Jacob Beningo All Rights Reserved

THE LECTURER
Jacob Beningo, President
Social media / contact: jacob@beningo.com · 810-844-1522 · Jacob_Beningo · Beningo Engineering · JacobBeningo · Embedded Basics (http://bit.ly/1BAHYXm)
Newsletters: Embedded Bytes
www.beningo.com

Consulting
• Secure Bootloaders
• Code Reviews
• Architecture Design
• Real-time Software
• Expert Firmware Analysis
• Microcontroller Systems

Embedded Training
• RTOS Workshop
• Bootloader Design
• Debugging Techniques
• Security Fundamentals
• MicroPython

SESSION OVERVIEW
Topics
1 Designing Intelligent Systems
2 Machine Learning Basics
3 Intelligence in the Cloud
4 Intelligence at the Edge
5 Datasets, Frameworks and Libraries
6 Example Applications
7 Best Practices
Objective: Explore artificial intelligence applications at the edge on Cortex-M processors.

INTRODUCTION
The Pillars of Embedded Software Development
For embedded software developers, there are core skill sets that every developer must master, such as:
• Architecture Design
• Code Analysis
• Debug
• Documentation
• Language Skills
• Processes and Standards
• Testing
• Tools
• Artificial Intelligence

DESIGNING INTELLIGENT SYSTEMS
Machine Learning
"Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to 'learn' with data, without being explicitly programmed." (Wikipedia)

DESIGNING INTELLIGENT SYSTEMS
Machine Learning
Why do we need intelligent systems?
• To solve problems that are not easy for humans to code for
• To scale system behaviors and results based on new data and situations
• To perform tasks that are easy for a human but traditionally difficult for computers
• To decrease system costs in certain applications
• Because it's cool and cutting edge

DESIGNING INTELLIGENT SYSTEMS
Machine Learning
What can machine learning be used for?
• Image recognition
• Speech and audio processing
• Language processing
• Robotics
• Bioinformatics
• Chemistry
• Video games
• Search

DESIGNING INTELLIGENT SYSTEMS
The Range of "Edge" Applications
Figure (two slides): image courtesy Arm.

MACHINE LEARNING BASICS – DEEP LEARNING
Perceptron Neuron
A perceptron with inputs $x_1 = 0$, $x_2 = 1$, $x_3 = 1$, weights $w_1 = 4$, $w_2 = -2$, $w_3 = 1$, and bias $b = 3$:

$$
w \cdot x = \sum_j w_j x_j,
\qquad
\text{output} =
\begin{cases}
0 & \text{if } w \cdot x + b \le 0 \\
1 & \text{if } w \cdot x + b > 0
\end{cases}
$$

$$
w \cdot x = (0 \cdot 4) + (1 \cdot (-2)) + (1 \cdot 1) = -1,
\qquad
w \cdot x + b = -1 + 3 = 2 > 0 \;\Rightarrow\; \text{output} = 1
$$

MACHINE LEARNING BASICS – DEEP LEARNING
Sigmoid Neuron
Inputs $x_1, x_2, x_3$ with weights $w_1, w_2, w_3$ produce the output $\sigma(w \cdot x + b)$, a fractional value between 0 and 1.

MACHINE LEARNING – DEEP LEARNING
Sigmoid Neuron
The Sigmoid Function

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

MACHINE LEARNING – DEEP LEARNING
Neural Networks
Figure: a neural network with an input layer, hidden layers, and an output layer.

INTELLIGENCE IN THE CLOUD
Embedded Architectures
INTELLIGENCE IN THE CLOUD
Cloud Experimentation
Experiment setup:
• STM32F779I-Eval
• Google Cloud Vision APIs
• Express Logic X-Ware IoT Platform
  - ThreadX
  - NetX HTTPS Client
  - NetX Secure TLS
  - etc.
Hardware shown: camera module, Ethernet, LCD, AC adapter, ST-Link.

INTELLIGENCE AT THE EDGE
Why is ML Moving to the Edge?
• Bandwidth
• Power
• Cost
• Latency
• Reliability
• Security
Image courtesy Arm

INTELLIGENCE AT THE EDGE
Model Deployment on Cortex-M MCUs
• Running a full ML framework on Cortex-M systems is impractical
• Bare-metal code is needed to use the limited resources efficiently
• Arm NN translates a trained model into code that runs on Cortex-M cores using CMSIS-NN functions
• CMSIS-NN: optimized low-level NN functions for Cortex-M CPUs
• CMSIS-NN APIs may also be used directly in application code
Image courtesy Arm

INTELLIGENCE AT THE EDGE
The Intelligent Edge
What do you need to do machine learning at the edge?
• A DSP-capable processor
• ML libraries
• Enough CPU cycles
• A training dataset
  - 5,000 labeled examples per category for acceptable performance
  - 10,000,000 labeled examples to achieve human performance
• Time and patience
Image source: hackernoon

DATASETS, FRAMEWORKS AND LIBRARIES
Datasets
Figure: dataset size (number of samples, log scale from 10^0 to 10^9) versus year (1900 to 2015) for Iris, T vs. G vs. F, Rotated T vs. G, Criminals, MNIST, CIFAR-10, Public SVHN, ImageNet, ILSVRC 2014, ImageNet 10k, Sports-1M, WMT, and Canadian Hansard. Image courtesy Arm.

DATASETS, FRAMEWORKS AND LIBRARIES
Software frameworks: DistBelief, TensorFlow, MXNet, Theano
Software libraries: PyLearn2, Torch, Caffe

DATASETS, FRAMEWORKS AND LIBRARIES
CMSIS-NN
CMSIS-NN: a collection of optimized neural network functions for Cortex-M CPUs.
Key considerations:
• Improve performance using SIMD instructions
• Minimize memory footprint
• NN-specific optimizations: data layout and offline weight reordering
Image source: Arm

DATASETS, FRAMEWORKS AND LIBRARIES
CMSIS-NN: Efficient NN Kernels for Cortex-M CPUs
Convolution
• Boost compute density with a GEMM-based implementation
• Reduce data movement overhead with a depth-first data layout
• Interleave data movement and compute to minimize memory footprint
Pooling
• Improve performance by splitting pooling into the x and y directions
• Improve memory access and footprint with in-situ updates
Activation
• ReLU: improve parallelism with a branch-free implementation
• Sigmoid/Tanh: fast table lookup instead of exponent computation
CMSIS-NN paper: https://arxiv.org/abs/1801.06601
*Baseline uses CMSIS 1D Conv and Caffe-like Pooling/ReLU. Image source: Arm

DATASETS, FRAMEWORKS AND LIBRARIES
CMSIS-NN: Efficient NN Kernels for Cortex-M CPUs
Many resources are available thanks to the openness of the ML community:
• DNN: https://research.google.com/pubs/archive/42537.pdf
• CNN: https://research.google.com/pubs/archive/43969.pdf
• CNN-GRU: https://arxiv.org/abs/1703.05390
• LSTM: https://arxiv.org/abs/1705.02411
Need compact models that fit within the Cortex-M system memory.
Need models with fewer operations to achieve real-time performance.
Figure: NN models from the literature, trained on
the Google Speech Commands dataset. Image source: Arm

EXAMPLE APPLICATIONS
Convolutional Neural Network (CNN) on Cortex-M7
• CNN with 8-bit weights and 8-bit activations
  - Total memory footprint: 87 kB weights + 40 kB activations + 10 kB buffers (I/O etc.), 137 kB in total
  - Example code is available in the CMSIS-NN GitHub repository
Board: NUCLEO-F746ZG, 216 MHz, 320 KB SRAM

EXAMPLE APPLICATIONS
OpenMV Camera
OpenMV Cam with a Cortex-M7
Video: https://www.youtube.com/watch?v=PdWi_fvY9Og

BEST PRACTICES
Machine Learning
1 Read Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
2 Start in the cloud or on a PC, and then work your way to the embedded target.
3 Create a "Hello World" application that can recognize handwritten digits.
4 Make sure you are using the right data.
5 Try multiple tools to see which one best fits your application and team.
6 Use 80% of your data for training and the remaining 20% for validating the model.
7 Review the Arm papers on keyword spotting and speech recognition.
8 Purchase a development kit, duplicate an example, and then try to scale it.
9 Explore CMSIS-NN and the white papers that surround it.
10 Start early; don't wait until the last minute to learn how machine vision works.
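Best practice 6 (80% of the data for training, 20% for validation) can be sketched in C as below. Shuffling the sample indices before splitting is an added assumption rather than something the slide states: it keeps the validation 20% from being biased when a dataset is stored in order, for example grouped by label. The `xorshift32` generator and the `split_80_20` function are hypothetical helpers for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: xorshift32 pseudo-random generator, used so the
   shuffle is reproducible from a seed. The state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s;
}

/* Fill indices with 0..n-1, shuffle them (Fisher-Yates), and return the
   number of training samples. indices[0 .. n_train-1] form the training
   set (80%, rounded down); the remaining entries form the validation set. */
size_t split_80_20(size_t *indices, size_t n, uint32_t seed)
{
    uint32_t state = (seed != 0u) ? seed : 1u;  /* xorshift32 cannot start at 0 */

    for (size_t i = 0; i < n; ++i) {
        indices[i] = i;
    }
    for (size_t i = n; i > 1; --i) {
        /* pick j in [0, i-1]; the modulo adds a small bias, acceptable for a sketch */
        const size_t j = (size_t)(xorshift32(&state) % i);
        const size_t tmp = indices[i - 1];
        indices[i - 1] = indices[j];
        indices[j] = tmp;
    }
    return (n * 8u) / 10u;
}
```

After `split_80_20(indices, n, seed)` returns `n_train`, the first `n_train` entries select the training samples and the rest select the validation samples; for n = 10 that is 8 and 2. Using a fixed seed keeps the split identical across runs, so results on the validation set stay comparable while the model is tuned.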
GOING FURTHER
Resources from beningo.com
• Embedded Bytes Newsletter
• Introduction Video: https://www.youtube.com/watch?v=aircAruvnKk
• Online Book: http://neuralnetworksanddeeplearning.com/
• MIT Course: http://introtodeeplearning.com/
• CMSIS-NN paper: https://arxiv.org/abs/1801.06601
• KWS (Keyword Spotting) paper: https://arxiv.org/abs/1711.07128

UPCOMING EVENTS
• RTOS Fundamentals Online
• Advanced RTOS Techniques Online
• Bootloaders Online
• Technology Primers
  - Debugging
  - Security
For more events, visit Beningo.com

THE NEXT SESSION WILL BEGIN SHORTLY
Introduction to the Cortex-M1 and Cortex-M3 using Arm DesignStart FPGA
This session will explore the benefits of using the Cortex-M1 and Cortex-M3 soft cores within Xilinx FPGAs. No FPGA knowledge is necessary to attend. The session will cover the architecture of the device, connecting peripherals, how to create and deploy a project, and the debugging options available. Although time is limited, attendees will take away a good overview of the benefits, the design and development life cycle, and how to address challenges encountered along the way.
Register at: https://www.beningo.com/insights/conferences or http://bit.ly/ArmDesignStartFPGA

THANK YOU!
beningo.com

Trademark and copyright statement: The trademarks featured in this presentation are registered and/or unregistered trademarks of Beningo Embedded Group (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. Copyright © 2019. All rights reserved.