Artificial Intelligence Introduction

NOTE: The transcript provided is not a "word for word" transcript of the presentation, but rather a compilation of the speaker's notes.

Speaker: Mike Hervey, Watson and Cloud Platform Technical Enablement

This is a science fiction AI family photo. AI from books and film has been around a long time in different forms. The manifestations of these AI may be clunky robots, androids, vehicles, bodiless software, or even synthetic humans. They are portrayed as sometimes friendly and sometimes menacing.

But in real life, the AI we see is more likely to be an app on your phone than a robot (even if your phone is called an Android). There is artificial intelligence in your Netflix queue, your digital assistant, and your navigation app. It's used behind the scenes to manage finances and identify credit card fraud. It helps keep spam out of your inbox and reminds you of important events. It can tell you which one of your cousins is with you in that old photo. It can help doctors diagnose illnesses, read medical imaging, and find appropriate clinical trials for patients. More and more, AI is involved in the decisions we make each day.

How can we define Artificial Intelligence? There are lots of definitions to choose from depending on who is doing the defining. For the purposes of this presentation, we'll use the IBM Research definition, which states that AI is anything that makes machines act more intelligently. AI is a broad field, and although certain aspects of AI may have the spotlight today, there is plenty of room for new innovations under the umbrella of AI.

IBM has been working on different applications of AI since the 1950s, but when word gets around, it's usually because the AI has proven to be the best at what it does at the time. Two times that IBM's applications have made the headlines are when Deep Blue beat Garry Kasparov at chess, and when Watson beat Ken Jennings and Brad Rutter at Jeopardy!. Both of these feats were proof that computers could be used to tackle problems for which human intelligence was previously the best or the only solution.

At IBM, we like to think of AI as Augmented Intelligence. The distinction is that AI should not attempt to replace human experts, but rather assist and amplify their intelligence. We can use Augmented Intelligence to extend human capabilities and accomplish things that neither humans nor machines could do on their own.

Some of the challenges we face today come from an excess of information. The internet has led to faster communications and access to more information. Distributed computing and IoT have led to the generation of massive amounts of data, and social networking has encouraged most of that data to be unstructured. There is so much data that human experts cannot keep up with all the changes and advancements in their fields of study. With Augmented Intelligence, we at IBM want to put the information that subject matter experts need at their fingertips, and back that information up with evidence, so that the experts can make more informed decisions. We want experts to scale their capabilities so they can better serve their customers. We want to let the machines do the time-consuming work so that the experts are able to do the things that matter.

As with any new disruptive technology, AI brings its own ethical challenges.
AI has the potential to access enormous amounts of information, imitate humans (even specific humans), make life-changing recommendations about health and finances, correlate data that may invade privacy, and so much more. The ethical challenges arise not because these issues are new; for a long time, we've had the capability to build tools and systems that can do these things. The challenge is that AI is now prolific and is used frequently in everyday life. Data is being collected on people, appliances, automobiles, networks, healthcare, purchasing decisions, weather, communications, and everything else you see. It's also easier than ever to get access to this data as people carry around and use their smartphones and smartwatches, and then use them to post pictures of their food, store purchases, and exercise routines on social media.

When developing AI-powered systems, we have to establish trust. If you cannot trust the AI to be completely honest with you, then you can't trust the advice it gives. As an example, AI bots are very common in social media and customer support. If a bot does not identify itself as such, but is later found to be one, the user's level of trust in the solution will most likely plummet. I'm not talking about illegal or nefarious uses of bots; just having a machine answer questions about your home loan or health issues without identifying itself as AI can cause trust issues. Also, if AI is going to be trusted, it should be able to support recommendations with data. If someone gives you advice, they need to have a level of trust built before you will follow it. One way to build trust is to show how the recommendation was constructed. Any solution that uses AI should be built with transparency and disclosure in mind.

Privacy is another concern that must be addressed from the beginning when building an AI solution. Once privacy is lost, you can't get it back. Privacy concerns come from the ability for seemingly unrelated data to be tracked and combined over time, intentionally or accidentally, to reveal patterns of behavior or even personal secrets. Any AI solution should consider the privacy of the individuals who contributed data, provide adequate notification of data collection practices, and allow users to opt in or opt out of tracking when appropriate. Privacy and protecting a company's intellectual property are also huge considerations. At IBM, your data belongs to you. We will not use your data to improve your competitors' solutions.

In traditional programming, everything is deterministic, and if-then-else logic determines how the software responds to varying conditions. That works well when you are comparing known values, like literal text strings or numbers in a spreadsheet. But when you want to classify words or objects into categories, simple if-then statements won't work. The world is far too complex, with shades of gray and multiple correct answers, and probabilistic answers become a requirement. Probabilistic answers also let AI advisors provide insight. Whereas a deterministic system can tell you, "The answer is X because this comparison succeeded", a probabilistic system can tell you, "I'm very confident this is the correct answer, but I also have evidence to support these alternatives".
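To make that contrast concrete, here is a minimal Python sketch. The fraud-detection scenario, the threshold, and the scoring formula are invented for illustration and are not part of the original presentation; the point is only the shape of the answers. The deterministic version returns a single hard answer, while the probabilistic version returns every candidate answer with a confidence score.

# Deterministic: a fixed comparison either matches or it does not.
def classify_transaction_deterministic(amount):
    # Invented rule: flag anything over a fixed threshold.
    if amount > 10000:
        return "fraud"
    else:
        return "not fraud"

# Probabilistic: return every candidate answer with a confidence score,
# so the supporting alternatives remain visible.
def classify_transaction_probabilistic(amount):
    # Toy scores for illustration only; a real system would learn these from data.
    fraud_score = min(amount / 20000, 1.0)
    candidates = [("fraud", fraud_score), ("not fraud", 1.0 - fraud_score)]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

print(classify_transaction_deterministic(12500))   # fraud
print(classify_transaction_probabilistic(12500))   # [('fraud', 0.625), ('not fraud', 0.375)]

The second style is what allows an AI advisor to say how confident it is and to show the alternatives it also considered.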
As I said before, AI means different things to different people. For a video game designer, AI means writing the code that affects how bots play or how the environment reacts to the player. For a movie writer, AI means a character that acts like a human with some trope computer features mixed in. For a data scientist, AI is a way of exploring and classifying data to meet specific goals. But no matter who is defining AI, the crucial point is that AI means intelligence.

How do we define intelligence? When you look at AI from a worldwide perspective, there are many lively discussions about what AI is. Many of these discussions are rooted in differences in defining what intelligence is. Where is the line drawn between intelligence and mimicry?

In 1950, Alan Turing published a paper called "Computing Machinery and Intelligence". In it, he asks the reader to consider whether or not machines can think, and proposes a game that will substitute for answering the question. In a variant of the game, an interrogator asks questions of two unseen contestants. One of the contestants is human, and the other is a machine. The machine wins if it is able to convince the interrogator that it is the human, and the other contestant is not. The fundamental idea is that if the machine is able to win the game, then surely it must be capable of thought.

Another thought experiment that relates to the Turing Test is the Chinese Room argument, initially posed by John Searle in 1980. Searle imagines a locked room with input and output slots through which sheets of paper containing Chinese characters can be passed. If a Chinese-literate person were to place a statement into the input slot and then receive a logical response from the output slot, then after some amount of positive interaction, that person would say that whatever is in the room understands Chinese. But if you were to open the room, inside you would find a person who speaks and writes only English, but who has access to a large collection of Chinese input phrases, corresponding output phrases, and a rule book written in English that gives instructions on matching the input to the output. The point is that the Chinese Room could pass the Turing Test, but still not truly understand Chinese. If we think of a computer as operating in the same way as the Chinese Room, can a computer ever really understand? The Turing Test and the Chinese Room are both very interesting topics with many years' worth of commentary. I would recommend you do some research on your own if you have any interest in machine intelligence.

Humans bring something to the equation that machines currently do not. They have an innate intelligence. If you were to show a child a toy, and then place it under a blanket, the child would understand that the toy did not cease to exist, and that it is located under the blanket. You did not have to teach the child the concepts of existence and visibility. If you then walk the child down the hallway into another room, the child would be able to backtrack to the blanket and get the toy without being explicitly told how to navigate back to the room or how to move the blanket. A machine, on the other hand, must be explicitly programmed to understand these concepts that we consider to be common sense.

There are many aspects of AI, each of which is influenced by advances in the sciences and philosophy. Computer science and electrical engineering determine how AI is implemented in software and hardware. Mathematics and statistics determine viable models and measure performance. Because AI is modeled on how we believe the brain works, psychology and linguistics play an important role in understanding how AI might work.
Philosophy provides guidance on topics like what intelligence is and what the ethical considerations are. It is the fusion of all these fields of study that makes it possible for us to build machines that act intelligently.

You will hear AI described in many ways, and sometimes there is overlap in the terms people use. For example, you may hear someone speak of a weak AI being the same as a narrow AI, or a strong AI being the same as Artificial General Intelligence. I'm not presenting this to get into a battle over semantics, so I'll just say that because the lines between these terms can get blurred, you should focus on understanding the context in which they are being used.

The terms "weak AI" and "strong AI" were used by Searle to explain the Chinese Room thought experiment, where a strong AI is one that could understand what it is processing, and a weak AI is one that has no understanding. In today's terms, all AI solutions are weak AI. Strong AI is also referred to as Artificial General Intelligence or Conscious AI.

A "narrow AI" is one that can be applied to a very small and specific domain. An example of a narrow AI could be one that translates words from one language to another, or one that identifies pictures of kittens. If I combine these two narrow AIs together with a system that drives a car so that it stops when it sees a real kitten and tells you in 14 different languages what it saw, I still have a narrow and weak AI (although it is a very capable one). "Broad AI" is a term that is not used very frequently, probably because multiple narrow AIs still lead to another narrow AI. A broad AI, though, is like Artificial General Intelligence. It would be an AI that is not limited to a single function like driving a car, but one that can interact and operate across a wide variety of independent and unrelated tasks.

Applied AI can perform specific tasks, but not learn new ones. It makes decisions based on programmed algorithms and training data. AI systems that classify images or language are examples of Applied AI. Artificial General Intelligence (AGI) is AI with the ability to learn new tasks in order to solve new problems. In order to learn new tasks, the AI must be able to teach itself new strategies for dealing with problems. AGI will be the combination of many AI strategies that learn from experience and perform at a human level of intelligence across multiple tasks.

By definition, conscious AI would be a strong AI. Conscious AI is an AI with human-level consciousness, which would require it to be self-aware. Because we are not yet able to fully define what consciousness is, it is unlikely that we will be able to create a conscious AI in the near future. That limitation has not stopped anyone from trying, however. One thing we can say with certainty is that when a Conscious AI is finally created, it will be able to play games and sort things really well.

Whether you are talking about natural intelligence or artificial intelligence, nothing is intelligent that cannot learn. Humans are born with some level of innate intelligence, and we can build on that intelligence through learning. The only innate intelligence machines have is what we give them, and common sense is not currently on that very short list of items. What we do provide, though, is the ability to examine examples and create machine learning models based on the inputs and desired outputs. For supervised learning, we provide the AI with examples. Let's take pictures of birds for example.
We can take a large volume of pictures of different birds that we want to recognize on demand later, and train the model to return the label "bird" whenever it is provided a picture of a bird. By itself, the model is worthless because it thinks everything is a bird until we show it something that is not a bird. So we can also create a label for "cat" and provide a large volume of pictures of cats to train on. When we are done, the machine model can be shown a picture of a cat or a bird, and it will label the picture with some level of confidence.

Unsupervised learning is where you provide inputs, but not labels, and let the machine infer qualities. This type of learning can be useful for clustering data, where data is grouped according to how similar it is to its neighbors and dissimilar to everything else. Once the data is clustered, you can use different techniques to explore that data and look for patterns. (A short code sketch at the end of this section makes both the supervised and unsupervised approaches concrete.)

Reinforcement learning is where the machine makes a decision, and is given a reward or a punishment depending on whether the decision was a good one. You could use reinforcement learning to teach a machine to play chess or navigate an obstacle course. Deep learning is a subset of machine learning that applies to neural networks. The "deep" refers to the arrangement of the nodes, and we'll cover this in more detail when talking about neural networks.

When choosing sources of data for training your machine models, it is important to sample data that is representative of the data that will be encountered in production. One problem that can occur is when the training data does not predict the future input data. Let's say you have a client who says, "I own a red sports car, so when I show your AI pictures of my car, it better recognize it as a sports car. Put lots of pictures of cars like mine in the training data. Oh, and our town uses yellow fire trucks, and I don't want the locals laughing if the AI doesn't know what a yellow fire truck looks like." So you train your model on the photographs the client gives you (after all, it is their data). How would you expect the model to perform on pictures taken in that town, where the sports cars are red and the fire trucks are yellow? How would you expect it to work on data outside of that town? A very important question is, where is the client going to use this solution?

Over the years, AI has proven itself useful in many different domains. Before AI, the idea of a computer recognizing people in photographs based solely on examples was unheard of. Today, we use AI to examine medical images to rapidly identify abnormalities, saving patients' lives. For many years now, AI has been in use by financial institutions to monitor investments, detect fraudulent transactions, and prevent other crimes. Bots are being used to assist customers by answering questions and fulfilling requests through chat windows and phone interactions, providing instant help without long queues.

What is AI good at, and what is it not good at? The answer to that often comes down to data. AI technology improves just like any technology does, so today's limitation may be tomorrow's breakthrough, but current AI technologies are very good at classification and translation. That could include tasks like telling the difference between a picture of a dog and a picture of a cat, or converting speech to text and vice versa.
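Here is the promised sketch of supervised and unsupervised learning. It uses scikit-learn and two made-up numeric features standing in for pictures; the library choice, the feature names, and the numbers are my own assumptions for illustration, not part of the original presentation.

from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised learning: labeled examples.
# The feature columns are invented stand-ins for image data: [body_length_cm, has_feathers].
X_train = [[15, 1], [20, 1], [25, 1], [40, 0], [45, 0], [50, 0]]
y_train = ["bird", "bird", "bird", "cat", "cat", "cat"]

model = LogisticRegression()
model.fit(X_train, y_train)

# The trained model labels a new example with a level of confidence for each label.
print(model.classes_)                  # ['bird' 'cat']
print(model.predict_proba([[22, 1]]))  # high probability for "bird"

# Unsupervised learning: the same inputs without labels, grouped into clusters.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X_train)
print(clusters)  # e.g. [0 0 0 1 1 1] -- the grouping is found, but not the names

The supervised model returns a confidence for each label it was trained on, while the clustering step only discovers that the examples fall into two groups; naming those groups is still up to us.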
The caveat is that you often need to train the AI with real-world examples. For example, if you are training a bot to answer questions about a service you offer, the bot needs to be trained on the different ways a customer could ask for that service. Because garbage in equals garbage out, no AI solution can give good results from bad data. What you can do if some of your data is bad, however, is identify and keep your usable data, and collect or build new data that can be used in your solution. You should be considering the amount, quality, and sensitivity of the data you have to work with.

It's important to note that with IBM services, we respect that you own your data. We will not use your data to improve the performance of your competitors' offerings. It will only be used to train your machine models. In order to give the most value, we have focused our resources on improving training times and squeezing the most insight from smaller quantities of data. The end results are solutions that require less data to build, are faster to train and deploy, and that protect your intellectual property.

Neural networks

NOTE: The transcript provided is not a "word for word" transcript of the presentation, but rather a compilation of the speaker's notes.

Speaker: Mike Hervey, Watson and Cloud Platform Technical Enablement

If you want to create a flying machine, you look to birds and insects as models. If you want to create machines that see or hear, you look to eyes and ears as models. But if you want to create machines that think, you model them after brains. The biological processing that goes on inside our brains is driven by chemical and electrical activities that happen between cells called neurons. Neurons have connectors called dendrites that act as inputs, and axons that act as outputs. The junctions of these connectors, where neurotransmitters are released and bind to receptors, are called synapses. To give you some perspective, the human brain is made up of somewhere on the order of 100 billion neurons, each with an average of 7,000 synaptic connections to other neurons. As a child grows into an adult, the number of synapses decreases until roughly 100 trillion to 500 trillion remain, before stabilizing. Artificial neural networks borrow some ideas from the biological networks of the brain in order to approximate some of the processing results.

Perceptrons are the simplest and oldest types of neural networks. They consist of some input nodes connected directly to an output node. You can see that the structure here is similar to what you might expect from some biological neurons, where the input nodes correspond to dendrites, and the output node operates like an axon. Perceptrons are an example of feedforward networks. Feedforward networks send information in one direction, from the input layer towards the output layer, when in use. When they are being trained, data is sent in the opposite direction. I'll cover that more in a few minutes.

Each of these nodes has some additional properties that we need to discuss in order to understand how a system of nodes can do something useful. We can start with inputs. An input is a number that represents a feature. No processing is done at the input layer; it just forwards the input values to the next layer, where each value is multiplied by a weight and the results are summed. In addition to the input and output nodes, most networks will also have hidden layers of nodes. Hidden nodes must receive input from other nodes, and then forward their output to other nodes. They do not receive input or produce output directly.
Hidden and output nodes have a property called bias. A bias is a special type of weight that applies to a node after the other inputs are considered. Finally, an activation function determines how a node responds to its inputs. The function is run against the sum of the inputs and bias, and then the result is forwarded as output. Activation functions can take different forms, and choosing them is a critical component of the success of a neural network.

In this network, there are two input values, each of which has a weight associated with it along the transfer line. The output node has a bias as well as an activation function. Each input is multiplied by its weight and summed with the others, and with the bias value. That total is passed through the activation function before being forwarded. The activation function is there to ensure the output is within the expected range for the next stage of the network. Typical activation functions can limit the output to a 0 or 1, or to a range such as -1 to 1. Initially the weights are just random numbers, but as the system is trained, the weights and biases will incrementally adjust.

Neural networks can learn through a process called backpropagation. Backpropagation uses a set of training data that matches known inputs to desired outputs. First, the inputs are plugged into the network, and outputs are determined. Next, an error function is used to determine how far the given output was from the desired output. Finally, adjustments are made to the weights and biases in order to reduce the error. (A code sketch below steps through this cycle on a single node.) This process is actually one way that artificial networks are dissimilar to the human brain, because in human neurons the information flows in only one direction. Learning is accomplished using different mechanisms in people.

Deep learning and deep networks refer to when you have more than one hidden layer in your node configuration. Hidden layers only receive input from other nodes, and only send output to other nodes. They are used to identify higher-order features in the data. The configuration shown here is an example of a Deep Feed Forward network. Deep networks have some advantages and added complexities over simple perceptrons. In a famous Indian fable, six blind men describe an elephant by feeling the different parts (side, tusk, trunk, etc.). In the end, each man's description is technically correct, but none of them sees the big picture. The first hidden layer of a deep network takes the raw data from the inputs and refines features from that data. So, when using a picture as input, the first layer might be used to find edges. Edges are important, but not sufficient by themselves to classify images, so we can use another layer that is made of nodes that look for combinations of edges that form shapes or facial features. As we progress through the network, the hidden layers correlate the pieces, and then forward their findings to the output layer.
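To make the node arithmetic and the backpropagation cycle concrete, here is a minimal NumPy sketch of a single node with a sigmoid activation learning a toy input-to-output mapping. The data, learning rate, and iteration count are invented for illustration and are not from the presentation.

import numpy as np

def sigmoid(x):
    # Activation function: squashes any input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# Training data that matches known inputs to desired outputs (here, logical OR).
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 1, 1, 1], dtype=float)

rng = np.random.default_rng(0)
weights = rng.normal(size=2)  # initially the weights are just random numbers
bias = 0.0
learning_rate = 0.5

for _ in range(5000):
    # Forward pass: each input is multiplied by its weight, summed with the bias,
    # and the total is passed through the activation function.
    outputs = sigmoid(inputs @ weights + bias)
    # Error: how far the given outputs are from the desired outputs.
    errors = outputs - targets
    # Backward pass: nudge the weights and bias to reduce the error
    # (the gradient of the squared error through the sigmoid).
    gradient = errors * outputs * (1.0 - outputs)
    weights -= learning_rate * (inputs.T @ gradient)
    bias -= learning_rate * gradient.sum()

print(np.round(sigmoid(inputs @ weights + bias), 2))  # close to [0, 1, 1, 1]

A real deep network repeats the same idea across many layers and many thousands of weights, which is exactly why the amount of arithmetic grows so quickly.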
When you consider the number of calculations a fully connected neural network needs to perform during backpropagation, you can quickly see that the hardware you are running on needs to be able to handle many simultaneous mathematical operations. The Central Processing Unit (CPU) that runs most desktop and laptop computers is designed to handle a mixed load of complex and mostly serial operations. Today's CPUs typically have core counts in the low tens (usually between 2 and 4 for desktops and laptops), which is fine for running a few applications and browser tabs, but is insufficient for graphics- and floating-point-intensive calculations. Contrast that with GPUs, which typically have thousands of cores that work well with the type of computations neural networks require. Chip manufacturers are devoting time and energy to developing new chips that address the need for highly parallel and energy-efficient processors. As of the time of this recording, the GPU market is seeing unprecedented demand as the chips are finding new uses in cryptocurrency and artificial intelligence. The big names in cloud services are offering GPU options in their cloud-based servers, including IBM of course. IBM Research is going a step further with the TrueNorth platform, which measures its power use in milliwatts while running a typical recurrent network at biological real-time. TrueNorth is not simply a new processor that replaces existing GPUs, but rather an entirely new architecture with new supporting tooling that can solve a wide class of problems ranging from vision and audition to multi-sensory fusion.

Natural Language Processing

NOTE: The transcript provided is not a "word for word" transcript of the presentation, but rather a compilation of the speaker's notes.

Speaker: Mike Hervey, Watson and Cloud Platform Technical Enablement

I'm going to give you the very broad definition of Natural Language Processing. Some other definitions will focus on the more textual and grammatical aspects, but for this talk, I'm throwing everything into the mix. Of all the creatures on Earth, humans have by far the most advanced method of communication, known as natural language. There are literally thousands of languages in use today (estimates range from 6,000 to 7,000), and although they may be very different in their origins, sounds, and appearances, they do tend to share some common features, like having words, nouns, and verbs. The two ways of communicating natural language are by spoken and written phrases.

Computers communicate in completely different ways, like by sending and receiving radio waves and electrical impulses. The protocols we use to transfer information between computers allow them to exchange enormous amounts of data around the world continually, silently and without vision. While humans can use computers to send voice and text messages to each other, computers do not innately know how to process natural language. The computer science field of natural language processing is devoted to empowering machines to process natural language, as well as to communicate with human users using natural language.

I bring up NLP in this course for two reasons. First, machine learning and AI advances have benefited Natural Language Processing. Neural networks are helpful for classifying language and performing translations. Language has imprecisions, but allows for error correction, and lends itself to probabilistic computation. NLP today is in use as broadly as it is because of machine learning. Secondly, we have turned a corner in human-machine interaction. In the early days of computer programming, people had to adjust to the machines. You could not input data without resorting to punch cards or something similar. As we made computers more complex, their components smaller, and their usable memory larger, we entered a phase of programming languages that made it easier for people to adjust to using computers.
But the problem still remained that people needed to speak the computer's language. With AI systems, we are finally at a stage where the computers are the ones that have to learn to adjust to people. Once that transformation is complete, anyone will have the ability to communicate naturally with machines.

This is not an exhaustive list, but natural language processing is broken down into many sub-categories related to audio and visual tasks. We can consider verbal communication from both the speaking and the listening sides of the exchange. Information is conveyed through the selection of words, but can also be transferred by volume and intonation. In order for computers to communicate in natural language, they need to be able to convert speech into text so that the communication is easier to process and relay, and they need to be able to convert text to speech so that users can interact without having to stare at a screen.

Optical character recognition has been around for a long time now, and has made many advancements, including real-time recognition and translation using a mobile phone. Image classifiers like the Watson Visual Recognition service can read signs, clothing, labels, and other text directly from photographs.

Natural language classification is the capability to take some text from a user and classify it into an intent. This functionality is important for building bots and virtual assistants. (A short code sketch at the end of this section shows the idea.)

Language translation has proven to be a lot harder than initially thought. Idiomatic expressions, multiple possible translations, and words that have no equivalent in the target language are just a few of the challenges that have made this activity difficult.

Sentiment analysis is about determining the emotions that a person is attempting to convey through natural language. It is frequently used to judge the reactions to social media posts, customer satisfaction surveys, and emails.

Natural language query refers to user interfaces that accept natural language as query input. Digital assistants that answer your questions, Q&A bots, and search prompts that accept phrases like "What were the operating expenses last quarter?" are all examples of UIs that support natural language query.

Natural language understanding and natural language generation are transformations that work in opposite directions. The goal of natural language understanding is to take something that is written and convert it into an idea or a concept. It is the extraction of information from text. For natural language generation, you are expressing an idea or concept in writing. Both of these tasks are highly complex due to the nature and nuance of language, as well as the challenge of distilling complex concepts into a form that can be processed.
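Here is the promised sketch of natural language classification. It trains a tiny bag-of-words intent classifier; scikit-learn, the example phrases, and the intent names are my own assumptions for illustration, and a production assistant built on a service such as Watson would be trained on far more data.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set: user phrases labeled with the intent they express.
phrases = [
    "what is my account balance",
    "how much money do I have",
    "I lost my credit card",
    "my card was stolen",
]
intents = ["check_balance", "check_balance", "report_lost_card", "report_lost_card"]

# Bag-of-words features feeding a simple classifier.
classifier = make_pipeline(CountVectorizer(), LogisticRegression())
classifier.fit(phrases, intents)

new_phrase = ["I think my card was stolen"]
print(classifier.predict(new_phrase))        # likely ['report_lost_card']
print(classifier.predict_proba(new_phrase))  # a confidence score for each intent

The classifier maps free-form text to one of the intents it was trained on and reports a confidence for each, which is the basic building block behind bots and virtual assistants.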
This URL will take you to the overview page of the impact of IBM Research in AI: https://researcher.watson.ibm.com/researcher/view_group.php?id=135. Once the page opens, you are encouraged to click the "Seminal Contributions to AI" link on the page to read real-world examples of IBM's contributions to AI.

Further reading:

Wikipedia: Deep Blue (chess computer) - https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)
Wikipedia: Watson (computer) - https://en.wikipedia.org/wiki/IBM_Watson
Forbes: Ethics And Artificial Intelligence With IBM Watson's Rob High - https://www.forbes.com/sites/blakemorgan/2017/06/13/ethics-and-artificial-intelligence-with-ibm-watsons-rob-high/?sh=748cc6f7260e
TechRepublic: IBM Watson CTO: The 3 ethical principles AI needs to embrace
IBM: The code of ethics for AI and chatbots that every brand should follow
IBM: Data Responsibility @ IBM
Wikipedia: Artificial intelligence
Wikipedia: Computing Machinery and Intelligence
Computing Machinery and Intelligence, by A. M. Turing
Wikipedia: Chinese room
Internet Encyclopedia of Philosophy: Chinese Room Argument
MIT Technology Review: Why humans learn faster than AI—for now
A Big Data Cheat Sheet: From Narrow AI to General AI
What is Narrow, General and Super Artificial Intelligence
Artificial General Intelligence – The Holy Grail of AI
The Hows, Whats and Whos of Neuroscience
The Human Memory: What it is, how it works, and how it can go wrong
Wikipedia: Neuron
IBM: IBM Watson Machine Learning service
Medium: Creating a Neural Network from Scratch
Neural Networks and Deep Learning
The mostly complete chart of Neural Networks, explained
Building High-level Features Using Large Scale Unsupervised Learning
Backpropagation
IBM: Unleash GPU power for faster processing, superior performance
Forbes: In The Era Of Artificial Intelligence, GPUs Are The New CPUs
IBM Research: Introducing a Brain-inspired Computer - TrueNorth's neurons to revolutionize system architecture
Wikipedia: Natural-language processing
Wikipedia: Sentiment analysis
IBM: Natural Language Understanding