UC Berkeley
Detecting Large-Scale System Problems by Mining Console Logs
Wei Xu (徐葳)*, Ling Huang†, Armando Fox*, David Patterson*, Michael Jordan*
*UC Berkeley  †Intel Labs Berkeley
Jan 18, 2009, MSRA

Why console logs?
• Detecting problems in large-scale Internet services often requires detailed instrumentation
• Instrumentation can be costly to insert and maintain
  – Services often combine open-source building blocks that are not all instrumented
  – High code churn
• Can we use console logs in lieu of instrumentation?
  + Easy for developers, so nearly all software has them
  – Imperfect: not originally intended for instrumentation

The ugly console log
HODIE NATUS EST RADICI FRATER — "today unto the Root a brother is born."
* "that crazy Multics error message in Latin." http://www.multicians.org/hodie-natus-est.html

Result preview
• Pipeline: Parse → Detect → Visualize
• 200 nodes, >24 million lines of logs
• Abnormal log segments summarized in a single-page visualization
• Fully automatic process without any manual input

Outline
• Key ideas and general methodology
• Methodology illustrated with a Hadoop case study
  – Four-step approach
  – Evaluation
• An extension: supporting online detection
  – Challenges
  – Two-stage detection
  – Evaluation
• Related work and future directions

Our approach and contribution
• Pipeline: Parsing → Feature Creation → Machine Learning → Visualization
• A general methodology for processing console logs automatically (SOSP '09)
• A novel method for online detection (ICDM '09)
• Validation on real systems
• Working on production data from real companies

Key insights for analyzing logs
• The log contains the information needed to create features
  – Identifiers
  – State variables
  – Correlations among messages (sequences): e.g. "receiving blk_1" followed by "received blk_1" is NORMAL, while a "receiving blk_2" with no matching "received" indicates an ERROR
• Console logs are inherently structured
  – The structure is determined by the log printing statement

Outline → Methodology illustrated with a Hadoop case study (four-step approach, evaluation)

Step 1: Parsing
• Free text → semi-structured text
• Basic idea: Log.info("Receiving block " + blockId); produces the message "Receiving block blk_1", which matches the template "Receiving block (.*)"
  – Type: Receiving block (.*)
  – Variables: blockId (String) = blk_1
• Non-trivial in object-oriented languages
  – String representation of objects
  – Sub-classing
• Our solution
  – Type inference on source code
  – Pre-compute all possible message templates

Parsing: scaling with MapReduce
[Chart: parsing time (minutes, 0–20) vs. number of nodes (0–100), for 24 million lines of console logs = 203 nodes × 48 hours]

Parsing: language-specific parsers
• Source-code-based parsers
  – Java (OO language)
  – C / C++ (macros and complex format strings)
  – Python (scripting/dynamic languages)
• Binary-based parsers
  – Java byte code
  – Linux and Windows binaries using printf format strings

Step 2: Feature creation — message count vector
• Identifiers are widely used in logs
  – Variables that identify objects manipulated by the program: file names, object keys, user ids
• Grouping messages by identifier yields something similar to execution traces
  – e.g. from the interleaved log (receiving blk_1, receiving blk_2, receiving blk_1, received blk_2, received blk_1, received blk_1, receiving blk_2), the blk_1 messages form one trace and the blk_2 messages another
• Identifiers can be discovered automatically

Feature creation — message count vector example
• Numerical representation of these "traces", similar to the bag-of-words model in information retrieval
• Example log: Receiving blk_1, Receiving blk_2, Received blk_2, Receiving blk_1, Received blk_1, Received blk_1, Receiving blk_2
• Each identifier (blk_1, blk_2) gets a vector whose dimensions are message types and whose entries are counts of that type in the identifier's trace
[Diagram: the resulting message count vectors for blk_1 and blk_2]
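The grouping-and-counting step is mechanical once messages are parsed. A minimal Python sketch of the idea follows; the parsed-log tuples, template strings, and the blockId variable name are illustrative assumptions, not the paper's actual data structures.

    from collections import Counter, defaultdict

    # Hypothetical parsed-log input: (message_type, variables) pairs, where the
    # message type is the constant text of the log statement and the variables
    # include an identifier such as a block id.
    parsed_logs = [
        ("Receiving block #", {"blockId": "blk_1"}),
        ("Receiving block #", {"blockId": "blk_2"}),
        ("Received block # of size #", {"blockId": "blk_1", "size": 49486737}),
        ("Received block # of size #", {"blockId": "blk_2", "size": 3}),
    ]

    # Fix an ordering of message types so every vector has the same dimensions.
    message_types = sorted({mtype for mtype, _ in parsed_logs})

    # Group messages by identifier (here, blockId), then count each type:
    # one "message count vector" per identifier, analogous to bag-of-words.
    groups = defaultdict(Counter)
    for mtype, variables in parsed_logs:
        groups[variables["blockId"]][mtype] += 1

    vectors = {ident: [counts[m] for m in message_types]
               for ident, counts in groups.items()}

    for ident, vec in vectors.items():
        print(ident, vec)

Because every vector has one dimension per message type, vectors from different identifiers are directly comparable, which is what the detection step relies on.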
Step 3: Machine learning — PCA anomaly detection
• Most of the vectors are normal; the goal is to detect the abnormal ones
• Principal Component Analysis (PCA) based detection
  – PCA captures the normal patterns in these vectors
  – Based on correlations among the dimensions of the vectors
  – e.g. a block whose "receiving" count does not match its "received" count stands out as an ERROR

Evaluation setup
• Experiment on Amazon's EC2 cloud
  – 203 nodes × 48 hours
  – Running standard MapReduce jobs
  – ~24 million lines of console logs
  – ~575,000 HDFS blocks, i.e. 575,000 vectors with ~680 distinct values
• Manually labeled each distinct case
  – Normal/abnormal
  – Tried to learn why it is abnormal
  – For evaluation only

PCA detection results
Anomalies (actual / detected):
1. Forgot to update namenode for deleted block — 4297 / 4297
2. Write block exception, then client gives up — 3225 / 3225
3. Failed at the beginning, no block written — 2950 / 2950
4. Over-replicated, immediately deleted — 2809 / 2788
5. Received block that does not belong to any file — 1240 / 1228
6. Redundant addStoredBlock request received — 953 / 953
7. Trying to delete a block, but the block no longer exists on the data node — 724 / 650
8. Empty packet for block — 476 / 476
9. Exception in receiveBlock for block — 89 / 89
10. PendingReplicationMonitor timed out — 45 / 45
11. Other anomalies — 108 / 107
Total anomalies — 16,916 / 16,808; normal blocks — 558,223
False positives:
1. Normal background migration — 1397
2. Multiple replicas (for task / job description files) — 349
Total false positives — 1746
How can we make the results easy for operators to understand?

Step 4: Visualizing results with a decision tree
[Decision tree diagram: each internal node tests the count of one message type, e.g. "writeBlock # received exception" >= 1 → ERROR; "# Starting thread to transfer block # to #" >= 3 leads to further tests on "#: Got exception while serving # to #:#", "Unexpected error trying to delete block #. BlockInfo not found in volumeMap", "addStoredBlock request received for # on # size # but it does not belong to any file", "#: Verification succeeded for #", and "Receiving block # src: # dest: #"; leaves are labeled OK or ERROR]

The methodology is general
• Surveyed a number of software applications
  – Linux kernel, OpenSSH
  – Apache, MySQL, Jetty
  – Cassandra, Nutch
• Our methods apply to virtually all of them
• The same methodology worked with other features and ML algorithms
  – Another feature: "state count vector"
  – Graph visualization
• Work in progress: analyzing production logs at Google

Outline → An extension: supporting online detection (challenges, two-stage detection, evaluation)

What makes online detection hard?
• Pipeline: Parsing → Feature Creation → PCA Detection → Visualization
• The message count vector feature is based on sequences, NOT individual messages
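Before going further into the online setting, the offline Step 3 detector can be sketched compactly: project each message count vector onto the top principal components and flag vectors with a large residual. The toy data, the component count, and the percentile threshold below are illustrative assumptions; the actual system chooses the subspace dimension from the data and uses a statistical threshold on the residual.

    import numpy as np

    def pca_anomaly_scores(X, k=4):
        """Squared prediction error of each row of X in the residual subspace.

        X: (n_samples, n_features) matrix of message count vectors.
        k: number of principal components treated as the "normal" subspace
           (illustrative; in practice chosen from captured variance).
        """
        mu = X.mean(axis=0)
        Xc = X - mu
        # Principal directions from the SVD of the centered data.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:k].T                      # (n_features, k) normal subspace
        residual = Xc - Xc @ P @ P.T      # component in the abnormal subspace
        return np.sum(residual ** 2, axis=1)

    # Toy usage: mostly-correlated "normal" vectors plus one odd vector.
    rng = np.random.default_rng(0)
    normal = np.tile([2, 2, 1, 0], (200, 1)) + rng.poisson(0.05, (200, 4))
    abnormal = np.array([[2, 0, 0, 5]])
    X = np.vstack([normal, abnormal]).astype(float)

    scores = pca_anomaly_scores(X, k=2)
    threshold = np.percentile(scores, 99.5)   # simple stand-in for a real threshold
    print("flagged rows:", np.where(scores > threshold)[0])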
Online detection: when to run detection?
• Cannot wait for the entire trace — a trace can last an arbitrarily long time
• How long do we have to wait?
  – Long enough to keep the correlations; a wrong cut means a false positive
• Difficulties
  – No obvious boundaries
  – Inaccurate message ordering
  – Variations in session duration
[Timeline diagram: interleaved sessions — receiving blk_1 … received blk_1 … reading blk_1 … deleting blk_1 … deleted blk_1 — with no obvious point at which to cut]

Frequent patterns help determine session boundaries
• Key insight: most messages/traces are normal
  – Strong patterns
  – "Make common paths fast"
  – Tolerate noise

Two-stage detection overview
• Free-text logs from 200 nodes → Parsing → frequent-pattern-based filtering
  – Dominant cases → OK
  – Non-pattern events → PCA detection → normal cases (OK) or real anomalies (ERROR)

Stage 1 — Frequent patterns (1): frequent event sets
• Repeat until all patterns are found:
  – Coarse cut by time
  – Find the frequent item set
  – Refine the time estimation
[Diagram: a session (receiving blk_1, received blk_1, reading blk_1, deleting blk_1, deleted blk_1) cut along the time axis; an error event on blk_1 falls outside the pattern and goes to PCA detection]

Stage 1 — Frequent patterns (2): modeling session duration
• Cannot assume a Gaussian distribution
  – A Gaussian estimate of the 99.95th percentile is off by half, causing 45% more false alarms
• Use a mixture distribution: power-law tail + histogram head
[Charts: histogram of session durations (count vs. duration) and tail probability Pr(X >= x) vs. duration]

Frequent pattern matching filters most of the normal events
• 100% of parsed events enter the frequent-pattern-based filter
  – 86% match dominant patterns → OK
  – 14% are non-pattern events → PCA detection → 13.97% normal (OK), 0.03% real anomalies (ERROR)

Frequent patterns in HDFS (pattern / 99.95th-percentile duration / % of messages)
• Allocate, begin write / 13 sec / 20.3%
• Done write, update metadata / 8 sec / 44.6%
• Delete / – / 12.5%
• Serving block / – / 3.8%
• Read exception / – / 3.2%
• Verify block / – / 1.1%
• Total / / 85.6% (total events ~20 million)
• These patterns cover most messages and have short durations

Detection latency
• Detection latency is dominated by the wait time
[Chart: latency distributions for frequent patterns (matched), single-event patterns, frequent patterns (timed out), and non-pattern events]

Detection accuracy (total trace = 575,319)
• Online: 16,916 true positives, 2,748 false positives, 0 false negatives — precision 86.0%, recall 100.0%
• Offline: 16,808 true positives, 1,746 false positives, 108 false negatives — precision 90.6%, recall 99.3%
• There is ambiguity in what counts as "abnormal"
  – The manual labels mark sessions that are "eventually normal"
  – More than 600 of the online false positives are sessions with very long latency, e.g. a write session taking >500 sec to complete when the 99.99th percentile is 20 sec
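To make the two-stage flow concrete, here is a hedged sketch of the filtering logic: buffer events per identifier, discard sessions that complete a known frequent pattern within that pattern's duration cutoff, and hand everything else to the PCA detector when a timer expires. The pattern sets, cutoff values, and the pca_detect hook are illustrative assumptions, not the actual implementation.

    from collections import defaultdict

    # Hypothetical frequent patterns mined offline: the set of message types that
    # make up a normal session, plus a duration cutoff (seconds) taken from the
    # power-law-tailed duration model. Values here are made up for illustration.
    FREQUENT_PATTERNS = [
        ({"Receiving block #", "Received block # of size #"}, 13.0),   # allocate, write
        ({"Deleting block #", "Deleted block #"}, 8.0),                # delete
    ]

    sessions = defaultdict(list)        # identifier -> [(timestamp, message_type)]

    def on_event(ident, mtype, ts):
        """Stage 1: drop sessions that complete a frequent pattern in time."""
        sessions[ident].append((ts, mtype))
        types = {m for _, m in sessions[ident]}
        start = sessions[ident][0][0]
        for pattern, cutoff in FREQUENT_PATTERNS:
            if types == pattern and ts - start <= cutoff:
                del sessions[ident]     # dominant, normal case: filter it out
                return "OK"
        return None                     # keep buffering; no decision yet

    def on_timer(now, pca_detect):
        """Cut sessions that outlived every pattern cutoff; stage 2 decides."""
        max_cutoff = max(cutoff for _, cutoff in FREQUENT_PATTERNS)
        for ident in list(sessions):
            if now - sessions[ident][0][0] > max_cutoff:
                events = [m for _, m in sessions.pop(ident)]
                pca_detect(ident, events)   # PCA on the message count vector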
Outline → Related work and future directions

Related work: system monitoring and problem diagnosis
(organized roughly by instrumentation weight and data granularity)
• (Passive) data collection: syslog, SNMP, Chukwa [Boulon08]
• Configurable tracing frameworks (source/binary): DTrace, X-Trace [Fonseca07], VM-based tracing [King05], aspect-oriented programming, assertions
• Numerical traces and path-based problem detection: SystemHistory [Cohen05], Fingerprints [Bodik10], [Lakhina04], Pinpoint [Chen04], statistical debugging [Zheng06], AjaxScope [Kiciman07]
• Replay-based: deterministic replay [Altekar09]
• Events as time series: [Hellerstein02], [Lim08], [Yamanishi05]
• Alternative logging: PeerPressure [Wang04], Heat-ray [Dunagan09], [Kandula09]
• Runtime predicates: D3S [Liu08]

Related work: log parsing
• Rule-based and scripts: Microsoft Log Parser, Splunk, …
• Frequent-pattern based: [Vaarandi04] R. Vaarandi. A breadth-first algorithm for mining frequent patterns from event logs. In Proc. of INTELLCOMM, 2004.
• Hierarchical:
  – [Fu09] Q. Fu, et al. Execution anomaly detection in distributed systems through unstructured log analysis. In Proc. of ICDM 2009.
  – [Makanju09] A. A. Makanju, et al. Clustering event logs using iterative partitioning. In Proc. of KDD '09.
  – [Fisher08] K. Fisher, et al. From dirt to shovels: fully automatic tool generation from ad hoc data. In Proc. of POPL '08.
• Key benefit of using source code: it handles rare messages, which are likely to be more useful for problem detection

Related work: modeling logs
• As free text: [Stearley04] J. Stearley. Towards informatic analysis of syslogs. In Proc. of CLUSTER, 2004.
• Visualizing global trends:
  – [CretuCiocarlie08] G. Cretu-Ciocarlie, et al. Hunting for problems with Artemis. In Proc. of WASL 2008.
  – [Tricaud08] S. Tricaud. Picviz: Finding a needle in a haystack. In Proc. of WASL 2008.
• A single event stream (time series):
  – [Ma01] S. Ma, et al. Mining partially periodic event patterns with unknown periods. In Proc. of the 17th International Conference on Data Engineering, 2001.
  – [Hellerstein02] J. Hellerstein, et al. Discovering actionable patterns in event data, 2002.
  – [Yamanishi05] K. Yamanishi, et al. Dynamic syslog mining for network failure monitoring. In Proc. of ACM KDD, 2005.
• Multiple streams / state transitions:
  – [Tan08] J. Tan, et al. SALSA: Analyzing logs as state machines. In Proc. of WASL 2008.
  – [Fu09] Q. Fu, et al. Execution anomaly detection in distributed systems through unstructured log analysis. In Proc. of ICDM 2009.

Related work: techniques that inspired this project
• Path-based analysis:
  – [Chen04] M. Y. Chen, et al. Path-based failure and evolution management. In Proc. of NSDI, 2004.
  – [Fonseca07] R. Fonseca, et al. X-Trace: A pervasive network tracing framework. In Proc. of NSDI, 2007.
• Anomaly detection and kernel methods:
  – [Gartner03] T. Gärtner. A survey of kernels for structured data. SIGKDD Explorations, 2003.
  – [Twining03] C. J. Twining, et al. The use of kernel principal component analysis to model data distributions. Pattern Recognition, 36(1):217–227, January 2003.
• Vector-space information retrieval:
  – [Papineni01] K. Papineni. Why inverse document frequency? In NAACL, 2001.
  – [Salton87] G. Salton, et al. Term weighting approaches in automatic text retrieval. Cornell technical report, 1987.
• Using source code as an alternative source of information:
  – [Tan07] L. Tan, et al. /*iComment: bugs or bad comments?*/. In Proc. of ACM SOSP, 2007.

Future directions: console log mining
• Distributed log stream processing: handle large clusters and partial failures
• Integration with existing monitoring systems: lets existing detection algorithms apply directly
• Logs from the entire software stack
• Handling logs across software versions
• Allowing feedback from operators
• Providing suggestions on improving logging practice

Beyond console logs: textual information in software development and operation
• Documentation, comments, bug tracking, version control logs, unit testing, operation docs, scripts, configurations, problem tickets, code, papers, Makefiles, versioning diffs, continuous integration, profiling, monitoring data, platform counters, console logs, core dumps

Summary
• Pipeline: Parsing → Feature Creation → Machine Learning → Visualization
• A general methodology for processing console logs automatically (SOSP '09)
• A novel method for online detection (ICDM '09)
• Validation on real systems
http://www.cs.berkeley.edu/~xuw/    Wei Xu <xuw@cs.berkeley.edu>

Backup slides
Rare messages are important
• Most messages in a trace are common; the rare one often points at the problem:
    Receiving block blk_100 src: … dest: …
    Receiving block blk_200 src: … dest: …
    Receiving block blk_100 src: … dest: …
    Received block blk_100 of size 49486737 from …
    Received block blk_100 of size 49486737 from …
    PendingReplicationMonitor timed out block blk_200

Comparison to natural-language (semantics) based approaches
• We use the console log solely as a trace of program execution
• Semantic meanings are sometimes misleading — does "ERROR" really mean an error?
• The link between the message text and the actual program execution is missing
• Hard to completely eliminate manual search

Comparison to text-based parsing
• Existing work infers the structure of log messages without source code
  – Patterns in strings
  – Word weighting
  – Learning a "grammar" of logs
• Pros: no source code analysis needed
• Cons: unable to handle rare messages
  – 950 message types are possible, but fewer than 100 appeared in our log
  – Per-operation features require highly accurate parsing

Better logging practices
• We do not require developers to change their logging habits, but some practices help:
• Use logging
  – Traces vs. error reporting
• Distinguish your messages from others'
  – e.g. printf("%d", i); is impossible to attribute
• Use a logging framework
  – Levels, timestamps, etc.
• Use correct levels (think in a system context)
  – Avoid unnecessary WARNINGs and ERRORs

Ordering of events
• Message count vectors do not capture any ordering or timing information
• We cannot guarantee a global ordering on distributed systems — that would require significant changes to the logging infrastructure

Too many alarms — what can operators do with them?
• Operators see clusters of alarms instead of individual ones
  – Similar alarms are easy to cluster, and most alarms are the same
• Better understanding of problems
  – Deterministic bugs in the system: more informative bug reports, so either the bug or the log gets fixed
  – Transient failures: alarms cluster around single nodes, subnets, or times
• Better understanding of normal patterns

Case studies
• Surveyed a number of software applications
  – Linux kernel, OpenSSH
  – Apache, MySQL, Jetty
  – Hadoop, Cassandra, Nutch
• Our methods apply to virtually all of them

Log parsing process
[Figure: overview of the log parsing process]

Step 1: Log parsing — challenges in object-oriented languages
• The statement LOG.info("starting:" + transact); can produce, depending on the state of the Transaction object:
    starting: xact 325 is PREPARING
    starting: xact 325 is STARTING at Node1:1000
• The top-level template is "starting: (.*)" with one variable [transact] of type [Transaction] (logged at Participant.java:345); the Transaction object's string forms add sub-templates:
  – "xact (.*) is (.*)" with variables [tid, state] of types [int, String] → matches "starting: xact 325 is PREPARING"
  – "xact (.*) is (.*) at (.*)" with variables [tid, state, node] of types [int, String, Node] → matches "starting: xact 325 is STARTING at Node1:1000"
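At matching time, the pre-computed templates for one logging statement reduce to a small set of regular expressions with typed capture groups, tried in order. A minimal sketch of that matching step follows; the regexes, variable names, and types are illustrative, derived from the example above rather than taken from the actual parser.

    import re

    # Templates pre-computed from the logging statement and the possible
    # string forms of the logged Transaction object (illustrative values).
    TEMPLATES = [
        (re.compile(r"^starting: xact (\d+) is (\w+) at (\S+)$"),
         ("tid", "state", "node")),
        (re.compile(r"^starting: xact (\d+) is (\w+)$"),
         ("tid", "state")),
    ]

    def parse_line(line):
        """Return (template_index, variables) for the first matching template."""
        for i, (regex, names) in enumerate(TEMPLATES):
            m = regex.match(line)
            if m:
                return i, dict(zip(names, m.groups()))
        return None, {}   # unknown message: produced by no known statement

    print(parse_line("starting: xact 325 is PREPARING"))
    print(parse_line("starting: xact 325 is STARTING at Node1:1000"))

Listing the more specific sub-template first keeps the matcher unambiguous when one template's text is a prefix of another's.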
Intuition behind PCA
[Figure: geometric intuition for the PCA-based detector]

Another decision tree
[Figure: a second example decision tree]

Existing work on textual log analysis
• Frequent item set mining
  – R. Vaarandi. SEC - a lightweight event correlation tool. IEEE Workshop on IP Operations and Management, 2002.
• Temporal properties analysis
  – Y. Liang, A. Sivasubramaniam, and J. Moreira. Filtering failure logs for a BlueGene/L prototype. In Proceedings of IEEE DSN, 2005.
  – C. Lim, et al. A log mining approach to failure analysis of enterprise telephony systems. In Proceedings of IEEE DSN, 2008.
• Statistical modeling of logs
  – R. K. Sahoo, et al. Critical event prediction for proactive management in large-scale computer clusters. In Proceedings of ACM KDD, 2003.

Real-world logging tools
• Generation
  – Log4j, Jakarta Commons Logging, logback, SLF4J, …
  – Standard web server log formats (W3C, Apache)
• Collection — syslog and beyond
  – syslog-ng (better availability)
  – nsyslog, socklog, … (better security)
• Analysis
  – Query languages for logs: Microsoft Log Parser; SQL and continuous queries / complex event processing / GUIs such as Talend Open Studio
  – Manually specified rules: A. Oliner and J. Stearley. What supercomputers say: A study of five system logs. In Proceedings of IEEE DSN, 2007; logsurfer, Swatch, OSSEC (an open-source host intrusion detection system)
  – Searching the logs: Splunk
  – Log visualization: Syslog Watcher, …

References
[1] R. S. Barga, J. Goldstein, M. Ali, and M. Hong. Consistent streaming through time: A vision for event stream processing. In Proceedings of CIDR, pages 363–374, 2007.
[2] D. Borthakur. The Hadoop distributed file system: Architecture and design. Hadoop project website, 2007.
[3] R. Dunia and S. J. Qin. Multi-dimensional fault diagnosis using a subspace approach. In Proceedings of the American Control Conference, 1997.
[4] R. Feldman and J. Sanger. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, 2006.
[5] R. Fonseca, G. Porter, R. H. Katz, S. Shenker, and I. Stoica. X-Trace: A pervasive network tracing framework. In Proceedings of NSDI, 2007.
[6] T. Gärtner. A survey of kernels for structured data. SIGKDD Explorations Newsletter, 5(1):49–58, 2003.
[7] T. Gärtner, J. W. Lloyd, and P. A. Flach. Kernels and distances for structured data. Machine Learning, 57(3):205–232, 2004.
[8] M. Gilfix and A. L. Couch. Peep (the network auralizer): Monitoring your network with sound. In LISA '00: Proceedings of the 14th USENIX Conference on System Administration, pages 109–118, 2000. USENIX Association.
[9] L. Girardin and D. Brodbeck. A visual approach for monitoring logs. In LISA '98: Proceedings of the 12th USENIX Conference on System Administration, pages 299–308, 1998. USENIX Association.
[10] C. Gulcu. Short introduction to log4j, March 2002. http://logging.apache.org/log4j.
[11] J. Hellerstein, S. Ma, and C. Perng. Discovering actionable patterns in event data, 2002.
[12] J. E. Jackson and G. S. Mudholkar. Control procedures for residuals associated with principal component analysis. Technometrics, 21(3):341–349, 1979.
[13] A. Lakhina, M. Crovella, and C. Diot. Diagnosing network-wide traffic anomalies. In SIGCOMM '04, pages 219–230, 2004. ACM.
[14] A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E. D. Kolaczyk, and N. Taft. Structural analysis of network traffic flows. In SIGMETRICS '04/Performance '04, pages 61–72, 2004. ACM.
[15] Y. Liang, A. Sivasubramaniam, and J. Moreira. Filtering failure logs for a BlueGene/L prototype. In DSN '05: Proceedings of the 2005 International Conference on Dependable Systems and Networks, pages 476–485, 2005. IEEE Computer Society.
[16] C. Lim, N. Singh, and S. Yajnik. A log mining approach to failure analysis of enterprise telephony systems. In Proceedings of the International Conference on Dependable Systems and Networks, June 2008.
[17] S. Ma and J. L. Hellerstein. Mining partially periodic event patterns with unknown periods. In Proceedings of the 17th International Conference on Data Engineering, pages 205–214, 2001. IEEE Computer Society.
[18] I. Mierswa, M. Wurst, R. Klinkenberg, M. Scholz, and T. Euler. YALE: Rapid prototyping for complex data mining tasks. In KDD '06, pages 935–940, August 2006. ACM.
[19] A. Oliner and J. Stearley. What supercomputers say: A study of five system logs. In DSN '07, pages 575–584, 2007. IEEE Computer Society.
[20] J. E. Prewett. Analyzing cluster log files using Logsurfer. In Proceedings of the 4th Annual Conference on Linux Clusters, 2003.
[21] R. K. Sahoo, A. J. Oliner, I. Rish, M. Gupta, J. E. Moreira, S. Ma, R. Vilalta, and A. Sivasubramaniam. Critical event prediction for proactive management in large-scale computer clusters. In KDD '03, pages 426–435, 2003. ACM.
[22] J. Stearley. Towards informatic analysis of syslogs. In CLUSTER '04, pages 309–318, 2004. IEEE Computer Society.
[23] J. Stearley and A. J. Oliner. Bad words: Finding faults in Spirit's syslogs. In CCGRID '08, pages 765–770, 2008. IEEE Computer Society.
[24] T. Takada and H. Koike. Tudumi: Information visualization system for monitoring and auditing computer logs. Pages 570–576, 2002.
[25] C. J. Twining and C. J. Taylor. The use of kernel principal component analysis to model data distributions. Pattern Recognition, 36(1):217–227, January 2003.
[26] R. Vaarandi. SEC - a lightweight event correlation tool. In IEEE Workshop on IP Operations and Management, pages 111–115, 2002.
[27] R. Vaarandi. A data clustering algorithm for mining patterns from event logs. In IPOM 2003: 3rd IEEE Workshop on IP Operations and Management, pages 119–126, October 2003.
[28] R. Vaarandi. A breadth-first algorithm for mining frequent patterns from event logs. In INTELLCOMM, volume 3283 of Lecture Notes in Computer Science, pages 293–308. Springer, 2004.
[29] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers Inc., 2000.
[30] K. Yamanishi and Y. Maruyama. Dynamic syslog mining for network failure monitoring. In KDD '05, pages 499–508, 2005. ACM.
[31] P. Bodik, G. Friedman, L. Biewald, H. Levine, G. Candea, K. Patel, G. Tolle, J. Hui, A. Fox, M. I. Jordan, and D. Patterson. Combining visualization and statistical analysis to improve operator confidence and efficiency for failure detection and localization. Pages 89–100, 2005. IEEE Computer Society.
[32] M. Y. Chen, A. Accardi, E. Kiciman, J. Lloyd, D. Patterson, A. Fox, and E. Brewer. Path-based failure and evolution management. In NSDI '04, pages 23–23, 2004. USENIX Association.
[33] I. Cohen, S. Zhang, M. Goldszmidt, J. Symons, T. Kelly, and A. Fox. Capturing, indexing, clustering, and retrieving system history. SIGOPS Operating Systems Review, 39(5):105–118, 2005.
[34] E. Kiciman and B. Livshits. AjaxScope: A platform for remotely monitoring the client-side behavior of Web 2.0 applications. In SOSP '07, pages 17–30, 2007. ACM.
[35] L. Tan, D. Yuan, G. Krishna, and Y. Zhou. /*iComment: bugs or bad comments?*/. In SOSP '07, pages 145–158, 2007. ACM.
[36] K. Papineni. Why inverse document frequency? In NAACL '01, pages 1–8, 2001. Association for Computational Linguistics.
[37] G. Salton and C. Buckley. Term weighting approaches in automatic text retrieval. Technical report, Cornell University, Ithaca, NY, 1987.
[Dunagan09] J. Dunagan, et al. Heat-ray: Combating identity snowball attacks using machine learning, combinatorial optimization and attack graphs. In Proc. of SOSP 2009.
[Kandula09] S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, and P. Bahl. Detailed diagnosis in enterprise networks. In Proc. of SIGCOMM 2009.
[Kiciman07] E. Kiciman and B. Livshits. AjaxScope: A platform for remotely monitoring the client-side behavior of Web 2.0 applications. In Proc. of SOSP 2007.
[King05] S. T. King, G. W. Dunlap, and P. M. Chen. Debugging operating systems with time-traveling virtual machines. In Proceedings of the 2005 Annual USENIX Technical Conference, April 2005. Best Paper Award.