Scholar: Andrew Emmott
Focus: Machine Learning
Advisors: Tom Dietterich, Prasad Tadepalli
Donors: Leslie and Mark Workman

ANOMALY DETECTION

What is Anomaly Detection?

Anomaly detection is the practice of identifying the strangest data points (items, people, events, etc.) among a large collection of normal ones. This is sometimes also called statistical outlier detection (where, given a large collection of data, we ask which items are the least likely), but in our work we prefer the more general idea of anomalies, because most applications are concerned with identifying some rare class of events that are influenced by some external process.

[Figure: a cluster of points labeled Normal, with one distant point labeled Anomaly]

For example, consider a fluke football game where the underdog team won by a very large score. This event was unlikely, but we might only consider it a true anomaly if we learned that the favored team had thrown the match. If it was an honest game, then the result was still generated by the normal football game process; it was simply an unlikely outcome. A thrown match is a true anomaly because it is caused by forces outside normal football.

A more serious example: consider that a cyber criminal might go to great lengths to make sure all of their actions appear normal. They are a rare class of person and are not following the same rules that normal people follow, but they might not seem weird at first glance.

Thus, it is not easy to establish an all-purpose definition of anomalies.

You might be familiar with some applications of Artificial Intelligence, such as face recognition. These problems are known as supervised learning problems because the machine is supervised: it is told which images contain faces and which don't, and it simply finds rules that distinguish faces from not-faces.

[Figure: example images labeled FACE and NOT A FACE]

Anomaly detection, by contrast, is an unsupervised learning problem. We want to develop algorithms that can distinguish normal from not-normal without being told which is which. This is more difficult but also more practical; we are asking the machine to find things we are already having trouble finding on our own!

What Are the Applications?

There are many real-world applications of anomaly detection. In general, whenever you might wish to identify things that are not normal, anomaly detection might be able to help! Some examples:

• Cyber Security & Insider Threat Detection
  • Normal: Regular Employees
  • Not Normal: Data Thief
• Machine Failure Prevention
  • Normal: Functioning Jet Engine
  • Not Normal: Failing Jet Engine
  • (Also: Elevators, Hard Drives, etc.)
• Medical Prognosis
  • Normal: Healthy Cells
  • Not Normal: Cancer Cells
• Surveillance
  • Normal: Bank teller is approached with a deposit slip.
  • Not Normal: Bank teller is approached with a gun.

Methodologies Work

Anomaly detection is still a nascent field, and there is a lack of consistency in how algorithms are evaluated. The bulk of my work has been exploring the efficacy of various evaluation methodologies. Different applications have different characteristics and may allow different algorithms to succeed, so we have urged a more careful and precise quantification of the qualities of anomaly detection problems. In simpler terms, we try to quantify the following things about any anomaly detection domain:

• Point difficulty – How difficult is it to distinguish the anomalies from the normal points?
  • Insider threats want to blend in: "Keep your distance. But don't look like you're trying to keep your distance."
  • Jet engine failures do not.
• Relative frequency – How much of the data is anomalous?
  • Signal failure might be relatively frequent.
  • Insider threats might be very, very rare.
• Semantic variation – How similar are the anomalies to each other?
  • Are they a de facto class?
  • Or are they simply not normal?
• Feature relevance/irrelevance – How well do statistical outliers in the data map to the application target?
  • An 8-foot-tall man is an outlier.
  • But he is not an insider threat.

(A toy sketch of how two of these properties can be measured on a labeled benchmark appears after the Algorithms Work section below.)

Algorithms Work

We have also worked on developing anomaly detection algorithms of our own. A high-level summary of the algorithms I have explored:

• Cross Prediction – If our data is described by some number of features, we can treat each of those features as its own supervised learning problem. In other words, find weirdness through cross-examination. Example: if two people buy five copies of Catcher in the Rye, that might seem weird. If one of them owns a bookstore and the other is unemployed, maybe only one of them seems strange. (A toy sketch in code follows this section.)

[Figure: Cross Prediction – each feature x1…x4 is predicted (p1…p4) from the other features]

• Repeated Impossible Discrimination Ensemble (RIDE) – Classification techniques can not only discriminate between classes; many of them can also provide a measure of confidence in their decisions. If we randomly separate the data into two indiscernible faux-classes, the machine should have low confidence in most points. But what of the points where the machine has high confidence, even on an "impossible" and meaningless classification task? These points are more easily distinguished from all the rest and are, therefore, not normal. (A sketch of this also follows below.)
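The sketch below illustrates the cross prediction idea; it is a minimal sketch, not the exact formulation from this research. It assumes random-forest regressors as the per-feature predictors, and the out-of-bag predictions and per-feature error normalization are likewise illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def cross_prediction_scores(X, n_trees=100, seed=0):
    """Score each row of X by how poorly each of its features is
    predicted from the remaining features (higher = more anomalous)."""
    n, d = X.shape
    errors = np.zeros((n, d))
    for j in range(d):
        others = np.delete(np.arange(d), j)
        model = RandomForestRegressor(
            n_estimators=n_trees, oob_score=True, random_state=seed
        )
        model.fit(X[:, others], X[:, j])
        # Out-of-bag predictions keep the forest from trivially
        # memorizing the very values it is asked to predict.
        err = (model.oob_prediction_ - X[:, j]) ** 2
        # Normalize per feature so no single column dominates the score.
        errors[:, j] = err / (err.mean() + 1e-12)
    return errors.mean(axis=1)
```

Note that a detector built this way flags points that break the relationships among features (the unemployed buyer of five copies of Catcher in the Rye), not points that are merely extreme in any single feature on its own.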
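Here, likewise, is a toy sketch of the RIDE idea under stated assumptions: the depth-limited decision tree, the number of rounds, and the confidence measure are illustrative stand-ins, not the ensemble used in this work. A shallow tree cannot memorize every random label, but it can still carve an easily separated point into a pure leaf, which is exactly the suspicious confidence the method looks for.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ride_scores(X, n_rounds=25, max_depth=5, seed=0):
    """Average classifier confidence over repeated 'impossible'
    random-label tasks (higher = more anomalous)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    confidence = np.zeros(n)
    for _ in range(n_rounds):
        y = rng.integers(0, 2, size=n)  # two meaningless faux-classes
        clf = DecisionTreeClassifier(max_depth=max_depth).fit(X, y)
        proba = clf.predict_proba(X)[:, 1]
        # Points in dense regions land in mixed leaves (proba near 0.5);
        # isolated points can end up in pure leaves (proba near 0 or 1).
        confidence += np.abs(proba - 0.5)
    return confidence / n_rounds
```

Averaging over many rounds matters: any single random labeling can produce accidental confidence, but only points that are genuinely easy to separate stay confident across repeated, independent faux-class splits.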
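Finally, returning to the domain properties listed under Methodologies Work: given a labeled benchmark (y = 1 marks the anomalies), two of them can be estimated directly. This is a hedged sketch; in particular, measuring point difficulty with a logistic-regression "oracle" trained on the labels is an illustrative assumption, not a definition taken from this research.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def relative_frequency(y):
    """Fraction of the benchmark that is anomalous."""
    return float(np.mean(y))

def point_difficulty(X, y):
    """For each anomaly, the oracle's belief that it is normal.
    High values mean anomalies that blend in (hard points)."""
    oracle = LogisticRegression(max_iter=1000).fit(X, y)
    p_normal = oracle.predict_proba(X)[:, 0]  # column 0 = class y == 0
    return p_normal[y == 1]
```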
Acknowledgements: Funding for my research is provided by the U.S. Army Research Office (ARO) and the Defense Advanced Research Projects Agency (DARPA) under Contract Number W911NF-11-C-0088. The content of the information in this document does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Further Acknowledgements: Additional funding for my research is provided by an ARCS Scholar Award from the Portland Chapter of the ARCS Foundation.

Thanks!