Model-based Validation of Streaming Data
Cheng Xu, Tore Risch
Dept. Information Technology, Uppsala University, Sweden
Daniel Wedlund, Martin Helgoson
AB Sandvik Coromant, Sweden
Informationsteknologi

Talk Overview
  Motivation
  Approach and System Architecture
  Demonstrators
  Performance experiments
  Conclusion
  Related work
  Future work

Institutionen för informationsteknologi | www.it.uu.se

Motivation
  Functional products: integrated provision of hardware, software, and services, not just the traditional hardware
    => The manufacturer is responsible for functioning
  In modern manufacturing industry, sensors installed on equipment in use generate many high-rate data streams
  Ensuring productivity, reliability, and quality of functional products requires monitoring many streams for unexpected behavior
  When the number of machines increases and data flows are high, validation with low latency may be challenging
  SVALI (Stream VALIdator): a general system that validates correct equipment behavior by analyzing streams on the fly

SVALI, Stream VALIdator
  Two validation approaches:
  Model-and-validate
    The user defines an analytical math model of expected behavior based on streams from equipment sensors
    The user also defines a validation model that identifies abnormal equipment sensor readings by comparing the result of the analytical model with measured sensor streams
    A simple case is detecting when the difference between expected power consumption and measured power consumption exceeds some threshold
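The simple threshold case of model-and-validate can be sketched in Python as follows. This is only an illustration of the idea, not SVALI code: the event layout, the linear power model with its constant factor, and the threshold value are all assumptions made up for the example.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical event layout: (timestamp, load, measured_power).
Event = Tuple[float, float, float]


def expected_power(event: Event) -> float:
    """Illustrative analytical model: power assumed proportional to load."""
    _, load, _ = event
    return 2.0 * load  # the factor 2.0 is a made-up constant


def model_and_validate(
    stream: Iterable[Event],
    model: Callable[[Event], float],
    threshold: float,
) -> Iterator[Tuple[float, float, float]]:
    """Yield (ts, measured, expected) for events that deviate from the model."""
    for event in stream:
        ts, _, measured = event
        expected = model(event)
        # Validation model: report the event when the deviation is too large.
        if abs(expected - measured) > threshold:
            yield ts, measured, expected


# Three sample events; only the second deviates by more than the threshold.
events = [(0.0, 25.0, 50.0), (1.0, 25.0, 58.0), (2.0, 30.0, 61.0)]
alerts = list(model_and_validate(events, expected_power, threshold=5.0))
print(alerts)
```

The output tuples mirror the (ts, me, ex) shape of measured and expected values that a validation reports, so a visualizer or alerter could consume them directly.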
  Learn-and-validate
    The user provides a (statistical) learning model based on a sampled sub-stream of correctly behaving equipment
    As for model-and-validate, the user also provides a validation model

SVALI Architecture
  [Architecture diagram. Labels: client; updates; visualizers and alerters; continuous queries CQ 1 and CQ 2; "set threshold = 1.3"; DB; SVALI validation functions (model-n-validate, learn-n-validate); stream models (analytical model, statistical model); stream wrappers (stream wrapper A, stream wrapper B); TCP to equipment A and equipment B; EPIC DSMS]

SVALI Validation functions
  Model-and-validate
    model_n_validate(Bag of Stream s, Function modelfn, Function validatefn)
      -> Stream of (Number ts, Object me, Object ex)
    modelfn(Object se) -> Object ex
    validatefn(Object se, Object ex) -> (Number ts, Object me)
  Learn-and-validate
    The difference is how the model is defined
    learn_n_validate(Bag of Stream s, Function learnfn, Integer n, Function validatefn)
      -> Stream of (Number ts, Object me, Object ex)
    learnfn(Vector of Object sa) -> Object ex
    validatefn(Object se, Object ex) -> (Number ts, Object me)

Model-n-validate demonstrator
  [Figure: The side milling process]

    ae [mm]   fz [mm/tooth]   hex [mm]   ap [mm]   vc [m/min]   zc
    2         0.0756          0.05       20        200          4
    3         0.0641          0.05       20        200          4

  The analytical and validation models are entered into the SVALI system

    create function validatePower(Record r, Number ex) -> (Number ts, Number me)
      as select ts(r), me
         where me = measuredPower(r)
           and abs(ex - me) > th("mill1");

    select model_n_validate(bagof(input), #'expectedPower', #'validatePower')
    from Stream input
    where input = corenetJsonWrapper("h1", 1337);

Learn-n-validate demonstrator
  Cyclic behavior is defined as predicate (dynamic) windows.
  A vector of expected power consumptions is computed from the first n sampled predicate windows
  The learning model is the normalized average vector over the sampled windows
  Validation is done by checking the normalized Euclidean distance between the learnt power consumptions and the current window's power consumptions against a threshold

  The window starts when the trigger is 1:
    create function cycleStart(Record s) -> Boolean
      as s["trigger"] = 1;

  The window ends when the trigger is 0 and the window was started:
    create function cycleStop(Record s, Record r) -> Boolean
      as r["trigger"] = 0 and s["trigger"] = 1;

    create function extractPowerW(Window w) -> Vector of Number
      as vselect extractPower(r)
         from Record r
         where r in w;

    create function learnCycle(Vector of Window f) -> Vector of Number
      as navg(select extractPowerW(w)
              from Window w
              where w in f);

    create function validateCycle(Window w, Vector e) -> (Number ts, Vector of Number m)
      as select timestamp(w), m
         where neuclid(e, m) > th("machine2")
           and m = extractPowerW(w);

    select learn_n_validate(bagof(sw), #'learnCycle', 2, #'validateCycle')
    from Stream s, Stream sw
    where s = corenetJsonWrapper("h2", 1338)
      and sw = pwindowize(s, #'cycleStart', #'cycleStop');

  [Figure: Cyclic behavior]

Performance Experiments
  Experiment setup
    The performance of SVALI is measured by the average response time of two queries
    Dell PowerEdge R815 NUMA computer featuring 4 CPUs with 16 2.3 GHz cores each
    OS: Scientific Linux release 6.2
    Q1: model-and-validate over single stream events
    Q2: model-and-validate of a moving average over 0.1-second stream windows
    To scale up the number of machines, streams are generated from real data streams provided by the industrial partner, with different arrival rates (1 ms to 10 ms); each stream is tagged with a machine id

Performance Experiments: Central vs Parallel
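The central and parallel configurations compared here can be sketched in Python as follows. This is a minimal dataflow illustration under stated assumptions, not SVALI code: the event layout, the threshold check, and the function names are hypothetical, and the "parallel" variant runs sequentially here, so it only shows where the merge on ts sits relative to validation.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event layout: (ts, machine_id, measured, expected).
Event = Tuple[float, str, float, float]
Alert = Tuple[float, str, float]


def validate(events: Iterable[Event], threshold: float) -> Iterator[Alert]:
    """Emit (ts, machine_id, measured) when measured deviates from expected."""
    for ts, machine, measured, expected in events:
        if abs(expected - measured) > threshold:
            yield ts, machine, measured


def central(streams: List[List[Event]], threshold: float) -> List[Alert]:
    """Merge all machine streams on ts first, then run one validation."""
    # Each input stream must already be ordered by ts.
    merged = heapq.merge(*streams, key=lambda e: e[0])
    return list(validate(merged, threshold))


def parallel(streams: List[List[Event]], threshold: float) -> List[Alert]:
    """Validate each machine stream separately, then merge alerts on ts."""
    per_machine = [validate(s, threshold) for s in streams]
    return list(heapq.merge(*per_machine, key=lambda a: a[0]))


machine0 = [(0.0, "m0", 50.0, 50.0), (2.0, "m0", 60.0, 50.0)]
machine1 = [(1.0, "m1", 40.0, 50.0), (3.0, "m1", 51.0, 50.0)]

central_alerts = central([machine0, machine1], threshold=5.0)
parallel_alerts = parallel([machine0, machine1], threshold=5.0)
print(central_alerts == parallel_alerts)
```

Both variants report the same alerts in the same order; the difference is that the parallel layout keeps each validation independent of the other machines, which is what would allow the per-machine work to be distributed.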
  [Figure: central validation vs parallel validation. Central: machine0 ... machinei, merge on ts, validation, one SVALI node. Parallel: machine0 ... machinei, validation0 ... validationi, merge on ts]

Experiment Measurement Q1
  [Figure: central (machine0 ... machinei, merge on ts, validation, one SVALI node) and parallel (machine0 ... machinei, validation0 ... validationi, merge on ts) configurations]
  Fig. 1 Average response time Q1

Experiment Measurement Q2
  [Figure: same central and parallel configurations as for Q1]
  Central validation: validation includes a groupby on machine id
  Parallel validation (validation0 ... validationi): it is already grouped
  [Plot annotation: around 2 ms]
  Fig. 2 Average response time Q2

Conclusion
  Two general approaches to validating stream behavior were presented: model-and-validate and learn-and-validate
  Two demonstrators show how they are used on streams from real industrial applications
  Parallel execution enables stream validation with limited delays over many machines

Related work
  Jakubek, S. and Strasser, T.: Fault-diagnosis using neural networks with ellipsoidal basis functions. American Control Conference, Vol. 5, pp. 3846-3851, 2002
  Tan, T., Gu, X., and Wang, H.: Adaptive system anomaly prediction for large-scale hosting infrastructures. PODC Conf., 2010
    Prediction instead of detection
    Low arrival rates, e.g. one sample every 2 seconds, do not need parallelization
  Wang, D., Rundensteiner, E., Ellison, R.: Active Complex Event Processing for Realtime Health Care. VLDB Conf., 3(2), pp. 1545-1548, 2010
    Learning algorithm to reduce the number of measurements for fault detection, while we use parallel processing to enable low delays
    Lower level rule mechanism triggered by state changes during the continuous query process
  Zeitler, E.
  and Risch, T.: Massive scale-out of expensive continuous queries. Proceedings of the VLDB Endowment, ISSN 2150-8097, Vol. 4, No. 11, pp. 1181-1188, 2011
    SVALI's underlying DSMS, EPIC, extends that work with e.g. sliding windows and incremental aggregation
    SVALI provides validation functionality on top of EPIC

Future work
  Other strategies for automatic performance improvements
  Adaptive learning model by re-sampling
  Adaptive parallelization of expensive validation functions