Adapt the Monitoring

Working Group 2 - Big Data Adaptive Monitoring
CIS, UNINA, UNICAL, UNIFI

First results
• The working group's first results were published in the article "Big Data for Security Monitoring: Challenges and Opportunities in Critical Infrastructures Protection"
  L. Aniello (2), A. Bondavalli (3), A. Ceccarelli (3), C. Ciccotelli (2), M. Cinque (1), F. Frattini (1), A. Guzzo (4), A. Pecchia (1), A. Pugliese (4), L. Querzoni (2), S. Russo (1)
  (1) UNINA (2) CIS (3) UNIFI (4) UNICAL
• Presented at the BIG4CIP workshop @EDCC (12 May 2014)

Data-Driven Security Framework
[Figure: framework diagram. Labels: Data Analysis (Attack Modeling, Invariant-Based Mining, Conformance Checking, Fuzzy Logic, Bayesian Inference, …), Data Processing, Monitoring Adapter, Raw Data Collection, Environmental Data, Node Resource Data, Network Audit, Application/System Logs, IDS Alerts, Critical Infrastructure, Knowledge Base, Protection Actions]

ADAPTIVE MONITORING

Scenario
• Problem: more data from distinct sources must be analyzed to improve the capability to detect faults and cyber attacks
  o Excessively large volumes of information to transfer and analyze
  o Negative impact on the performance of the monitored systems
• Proposed solution: dynamically adapt the granularity of monitoring
  o Normal case: coarse-grained monitoring (low overhead)
  o Upon anomaly detection: fine-grained monitoring (higher overhead)
• Two distinct scenarios
  o Fault detection → current research direction at CIS
  o Cyber attack detection

Anomaly Detection
• Metrics Selection
  o Find correlated metrics (invariants) to be used as anomaly signals
  o Learn which invariants hold when the system is healthy
    → Profile the healthy behavior of the monitored system
• Anomaly Detection
  o Monitor the health of the system by looking at a few metrics
    → How to choose these metrics?
  o When an invariant stops holding, adapt the monitoring
    → The aim is to detect the root cause of the problem
    → Possibility of false positives

[1] J., M., R., W., "Information-Theoretic Modeling for Tracking the Health of Complex Software Systems", 2008
[2] J., M., R., W., "Detection and Diagnosis of Recurrent Faults in Software Systems by Invariant Analysis", 2008
[3] M., J., R., W., "Filtering System Metrics for Minimal Correlation-Based Self-Monitoring", 2009

Adapt the Monitoring
• Two dimensions in adapting the monitoring
  o Change the set of monitored metrics
  o Change the frequency of metrics retrieval
• How should the adaptation be chosen on the basis of the detected anomaly?
• Additional issue
  o The goal of the adaptation is to discover the root cause of the problem
  o Need to zoom in on specific portions of the system
    → Very likely to increase the amount of data to transfer and analyze
    → Risk of a negative impact on system performance
    → Possible solution: keep the volume of monitored data limited by zooming out of other portions of the system

[4] M., R., J., A., W., "Adaptive Monitoring with Dynamic Differential Tracing-Based Diagnosis", 2008
[5] M., W., "Leveraging Many Simple Statistical Models to Adaptively Monitor Software Systems", 2014

Fault Localization
• Goal: given a set of alerts, determine which fault occurred and which component originated it
• Problems
  o The same alert may be due to different faults (Ambiguity)
  o A single fault may cause several alerts (Domino Effect)
  o Concurrent alerts may be generated by concurrent unrelated faults
  o Tradeoff: monitoring granularity vs. precision of fault identification
• Approaches:
  o Probabilistic models (e.g., HMM, Bayesian Networks)
  o Machine learning techniques (e.g.
    Neural Networks, Decision Trees)
  o Model-based techniques (e.g., Dependency Graphs, Causality Graphs)

[6] S., S., "A survey of fault localization techniques in computer networks", 2004
[7] D., G., B., C., "Hidden Markov Models as a Support for Diagnosis: ...", 2006

Prototype - Work in Progress
• Monitoring of a JBoss cluster by using Ganglia
[Figure: Host #1 to Host #4, each labeled with JBoss AS and gmond; monitored metrics; a monitoring host labeled with gmetad and Adaptive Monitoring]

Prototype - Goals
• Identify a small set of metrics to monitor on a JBoss cluster to detect possible faults
  o Find existing correlations
  o Profile healthy behavior
• Inject faults on JBoss with Byteman (http://byteman.jboss.org/)
• For each fault, identify the set of additional metrics to monitor
• Implement the prototype in order to evaluate
  o The effectiveness of the approach
  o The reactivity of the adaptation
  o The overhead of the adaptation

OPERATING SYSTEMS AND APPLICATION SERVERS MONITORING

Data collection and processing
• Collects a selection of attributes from the OS and the AS, through probes installed on the machines
  – The current implementation observes Tomcat 7 and CentOS 6
• Executes the Statistical Prediction and Safety Margin algorithm on the collected data
• The Esper CEP engine is used to apply rules to events (it performs the detection of anomalies)
• Work partially done within the context of the Secure! Project (see later today)

High level view
[Figure]

INVARIANTS MINING

Why invariants?
• Invariants are properties of a program that are guaranteed to hold for all executions of the program.
  – If those properties are broken at runtime, an alarm can be raised for immediate action
• Invariants can be useful to
  – detect transient faults, silent errors and failures
  – report performance issues
  – avoid SLA violations
  – help operators understand the runtime behavior of the app
• Pretty natural properties for apps performing batch work

An example of flow intensity invariant
• A platform for the batch processing of files: the processing time is proportional to the file size
• Measuring the file size and the time spent in a stage, I(x) and I(y) (the flow intensities), the equation $I(y) = k \cdot I(x)$ is an invariant relationship characterising the expected behaviour of the batch system.
  – If there is an execution problem (e.g., file processing hangs), the equation no longer holds (broken invariant)

Research questions
• RQ1: How to discover invariants out of the hundreds of properties observable from an application log?
• RQ2: How to detect broken invariants at runtime?

Our contribution
AUTOMATED MINING
A framework and a tool for mining invariants automatically from application logs
• tested on 9 months of logs collected from a real-world Infosys CPG SaaS application
• able to automatically select 12 invariants out of 528 possible relationships
IMPROVED DETECTION
An adaptive threshold scheme defined to significantly reduce the number of broken invariants
• from thousands to tens of broken invariants w.r.t. static thresholds on our dataset

BAYESIAN INFERENCE

Data-driven Bayesian Analysis
• Security monitors may produce a large number of false alerts
• A Bayesian network can be used to correlate alerts coming from different sources and to filter out false notifications
• This approach has been successfully used to detect credential stealing attacks
  – Raw alerts generated during the progression of an attack (e.g.
    user-profile violations and IDS notifications) are correlated
  – The approach removed around 80% of false positives (i.e., non-compromised users being declared compromised) without missing any compromised user

Data-driven Bayesian Analysis
• Vector extraction starting from raw data:
  – each vector represents a security event, e.g., an attack, a compromised user, etc.;
  – suitable for post-mortem forensics and runtime analysis;
  – sources: event logs, network audit, environmental sensors.
[Figure: vector extraction maps each event to a vector v1, v2, v3, …, vN of binary features (0 / 1)]

Bayesian network
• Allows estimating the probability of the hypothesis variable (attack event), given the evidence in the raw data:
  – hypothesis variable C (the user is compromised)
  – information variables (alerts) A1, A2, …, A14, e.g., unknown address, multiple logins, suspicious download
• Network parameters
  ▪ a-priori probability P(C);
  ▪ conditional probability table (CPT) for each alert A_i.

$$P(C \mid A) = \frac{P(A \mid C)}{P(A)} \times P(C)$$

$$P(A) = \sum_{i=1}^{N} P(A \mid C_i) \times P(C_i)$$

Incident analysis
• Estimate the probability that the vector represents an attack, given the feature vector V_i
[Figure: example feature vector V_i with the resulting estimate P(C) = 0.31]

Preliminary testbed
• A preliminary implementation with Apache Storm
  – Topology components: LogStreamer (spout), FactorCompute (bolt), AlertProcessor (bolt)
• Tested with synthetic logs emulating the activity of 2.5 million users, generating 5 million log entries per day (IDS logs and user access logs)

Log lines    Time (ms)
4,300,000    140,886
4,400,000    143,960
4,600,000    147,024
4,500,000    150,448
4,700,000    153,551
4,800,000    159,567
4,900,000    162,642
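
As a minimal sketch of the Bayes rule on the Bayesian network slide, the snippet below estimates P(C | alerts) for one binary feature vector. It assumes a two-state hypothesis (compromised / not compromised) and alerts that are conditionally independent given C, which is a simplification of a general Bayesian network. The function name, the prior and the CPT values are hypothetical and are not taken from the study.

```python
from math import prod


def posterior_compromised(prior, cpt, vector):
    """Return P(C=1 | vector) for binary alert features.

    prior  -- a-priori probability P(C=1)
    cpt    -- list of (P(A_i=1 | C=1), P(A_i=1 | C=0)), one pair per alert
    vector -- list of 0/1 features, one per alert
    """
    def likelihood(column):
        # P(vector | C) under the conditional-independence assumption
        return prod(p[column] if bit else 1.0 - p[column]
                    for p, bit in zip(cpt, vector))

    joint_compromised = likelihood(0) * prior
    joint_clean = likelihood(1) * (1.0 - prior)
    # P(A) = sum over hypothesis states of P(A | C_i) * P(C_i)
    evidence = joint_compromised + joint_clean
    return joint_compromised / evidence


# Hypothetical parameters, for illustration only
PRIOR = 0.01
CPT = [(0.7, 0.05), (0.6, 0.10), (0.5, 0.02)]

print(posterior_compromised(PRIOR, CPT, [1, 1, 1]))  # all three alerts raised
print(posterior_compromised(PRIOR, CPT, [0, 0, 0]))  # no alert raised
```

With these made-up numbers, three concurrent alerts push the posterior well above the prior, while a vector with no alerts pulls it below the prior; this is the mechanism that lets correlated alerts stand out from isolated false notifications.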

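
The flow intensity invariant $I(y) = k \cdot I(x)$ from the invariants mining slides can be checked at runtime roughly as follows. This is a sketch under stated assumptions: k is estimated by a least-squares fit through the origin on healthy observations, and a fixed relative tolerance stands in for the adaptive threshold scheme, whose definition is not given in the slides. All names and sample values are hypothetical.

```python
def fit_k(xs, ys):
    """Least-squares slope through the origin for I(y) = k * I(x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def is_broken(k, x, y, tolerance=0.2):
    """True when y deviates from k * x by more than the relative tolerance.

    The fixed tolerance is an assumption made for this sketch; it is not
    the adaptive threshold described in the slides.
    """
    expected = k * x
    return abs(y - expected) > tolerance * expected


# Hypothetical healthy observations: file sizes and stage processing times
healthy_sizes = [10.0, 20.0, 40.0]
healthy_times = [2.1, 3.9, 8.0]
k = fit_k(healthy_sizes, healthy_times)

print(is_broken(k, 30.0, 6.1))   # processing time close to k * 30
print(is_broken(k, 30.0, 60.0))  # much longer than expected, e.g. a hung stage
```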