Systems Group
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
The University of Texas at Dallas
April 23, 2007

Information about the Group
• Over 10 members
• Members of editorial boards of IEEE and ACM Transactions
• Advisory boards (e.g., Purdue University CS Department)
• Funding from NSF (including career awards), AFOSR, ARO, DoD, NASA and corporations
• PhDs from prestigious universities including Cornell, Princeton, USC, Purdue, UNC
• IEEE/AAAS Fellows, Senior Members, Awards
• Keynote addresses at major conferences (e.g., ACM SACMAT 04, PAKDD06, IEEE Policy 07)
• Collaboration with leading researchers
  – Purdue, UMBC, U of VA, GMU, UIUC, U of MN, GATech, etc.

Technology Themes
• Our research focuses on core systems areas such as
  – Embedded Systems, Distributed Systems and Networks, Data Management Systems
• We are also conducting extensive research in systems applications including
  – Data Mining, Visualization, Graphics, Bioinformatics, Multimedia and Animation, Geospatial information management, and Wireless Computing
• Security cuts across all areas
  – Data and applications security, Network security, Data Mining for Security Applications, Privacy, Secure languages, Embedded systems security, Secure data grid

Vision of the Systems Group
• Five-Pronged Approach to R&D in Systems and Applications
  – 1. Basic research in systems ranging from complexity results to systems design
    • Funding from NSF, AFOSR, ARO, etc.
  – 2. Applied research: large-scale design and implementation projects (Alcatel, Raytheon, Nokia, Rockwell, etc.)
  – 3. Technology transfer: work with corporations such as Raytheon to transfer the research to operational programs
  – 4. Standards: work with organizations such as OGC, W3C to transfer research to standards
  – 5. Commercialization: work with the Office of Sponsored Research to commercialize our tools (e.g., data mining for security)

Embedded Systems & Security
Edwin Sha

• Billions of units produced yearly, versus millions of desktop units
• Application-specific: more parallel, heterogeneous, networked
• Tightly constrained: low cost, low power, small memory
• Real-time and secure
• Need both hardware and software: design automation and optimization across compiler, OS, and hardware

Timing & Memory Optimization
• Timing optimization for loops: develop retiming and MD retiming. All the instructions in a loop nest can be executed in parallel.
• Hiding memory latency: the CPU is fast; memory is slow. Prefetch data before they are required. Combine with partitioning and iterational retiming. Completely hide memory latencies.

• Timing: all the instructions in a loop nest can be executed in parallel
• Power: switching activity is reduced by 42.8%
• Program size: code-size reduction technique reaches 50% reduction
• Security: Hardware/Software Defender protects systems from any buffer-overflow attacks
http://www.utdallas.edu/~edsha

HW/SW for Security
• Protection from buffer-overflow attacks
  – Problems: protection capability, overhead
  – Solution: Hardware/Software Defender (HSDefender)
• Intrusion detection for known worms and viruses
  – Problems: performance
  – Solution: very high-performance specialized parallel architectures
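The loop retiming idea listed under Timing & Memory Optimization can be illustrated with a small sketch. This is not the group's algorithm: the loop body, the arrays, and the function names `original_loop` and `retimed_loop` are hypothetical and chosen only for illustration. Retiming moves one statement across the iteration boundary, so the two statements left in the loop body no longer depend on each other within the same iteration and could be issued in parallel.

```python
def original_loop(x):
    """Hypothetical loop: S2 needs the result of S1 from the same iteration."""
    n = len(x)
    a = [0] * n
    b = [0] * n
    for i in range(n):
        a[i] = x[i] * 2   # S1
        b[i] = a[i] + 1   # S2 depends on S1 within this iteration
    return a, b


def retimed_loop(x):
    """Same computation after shifting S1 one iteration ahead (illustrative)."""
    n = len(x)
    a = [0] * n
    b = [0] * n
    if n == 0:
        return a, b
    a[0] = x[0] * 2                # prologue: first instance of S1
    for i in range(n - 1):
        # These two statements touch different data, so they are independent
        # within one iteration and could execute in parallel.
        b[i] = a[i] + 1            # S2 uses a[i] produced one iteration earlier
        a[i + 1] = x[i + 1] * 2    # S1 for the next iteration
    b[n - 1] = a[n - 1] + 1        # epilogue: last instance of S2
    return a, b
```

Both functions return the same arrays; only the placement of the dependence changes, which is the property retiming exploits.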
Visual Languages and Communications
Kang Zhang

Objectives
• Build a theoretical foundation for visual specification and reasoning
• Apply visual techniques to data engineering
• Enhance information access on mobile devices
• Promote aesthetic aspects of visualization for high usability

Funding: NSF ITR: 216K + proposal submitted; Scholarship grants: NSF CSEMS, DoEdu GAANN

Scientific/Technical Approaches
• Develop a spatial graph grammar formalism with efficient parsing
• Build a graph induction engine
• Add semantics to UML diagrams
• Design intuitive and effective graph visualization and navigation algorithms (e.g., graph labeling, mobile browsing)
• Learn from visual arts and design for aesthetic information visualization and user interfaces

[Figure labels: Visual Languages (Graph Grammars); Information Visualization; Round-Trip Visual Engineering; Visual Arts & Design; Mobile Display; Model-Driven Engineering; Applications; Multimedia Authoring; Data Interoperation]

Accomplishments
• Proposed a context-sensitive graph grammar formalism with polynomial parsing speed
• Applied graphical specification and reasoning to various application domains
• Developed a visual data clustering and noise removal system

Challenges
• Measurement/evaluation of aesthetics and visual effectiveness; usability; scalability

Next Generation Prolog Systems
Gopal Gupta

Objectives
• Develop the next generation of Prolog system that integrates various recent advances:
  • Finite Domain Constraints
  • Tabled Logic Programming
  • Coinductive Logic Programming
  • Answer Set Programming (ASP)
  • Deterministic coroutining
  • Parallelism (via multicores)

Rationale
• Research in logic programming is driven by the quest to find the optimal computation rule
  -- select clauses in optimal order
  -- select goals in optimal order
• Tabling/parallelism allows optimal clause order
• Deterministic coroutining/constraints allow optimal goal order
• Coinductive LP/ASP adds further power

Approach
• Develop simple-to-implement approaches (otherwise the implementation becomes too complex).
• Use an existing Prolog engine (GNU Prolog)
• Exploit parallelism on multicore machines

Applications
• Model checking and verification
• Non-monotonic reasoning
• Semantic web reasoning engines

Accomplishments
• Developed coinductive logic programming and efficient ways to implement it.
• Developed a scalable, easy-to-realize parallel implementation on the Beowulf architecture.
• Developed an easy-to-realize implementation for tabled logic programming.
• Developed methods for goal-directed execution of answer set programs (non-monotonic reasoning).

Assured Information Sharing
Bhavani Thuraisingham, Latifur Khan, Murat Kantarcioglu

Objectives
• Develop a framework for secure and timely data sharing across infospheres.
• Investigate access control and usage control policies for secure data sharing.
• Develop innovative techniques for extracting information from trustworthy, semi-trustworthy and untrustworthy partners.
Funding: AFOSR: 306K + 120K + proposal submitted; matching funds from dean

Scientific/Technical Approach
• Conduct experiments to determine how much information is lost as a result of enforcing security policies in the case of trustworthy partners
• Develop more sophisticated policies based on role-based and usage-control-based access control models
• Develop techniques based on game-theoretic strategies to handle partners who are semi-trustworthy
• Develop data mining techniques to carry out defensive and offensive information operations

[Figure: Data/Policy for Coalition; components holding Data/Policy for Agency A, Agency B, and Agency C each publish their data/policy to the coalition]

Accomplishments
• Developed an experimental system for determining information loss due to security policy enforcement
• Developed a strategy for applying game theory for semi-trustworthy partners; simulation results
• Developed data mining techniques for conducting defensive operations for untrustworthy partners

Challenges
• Handling dynamically changing trust levels; scalability

Malicious Code Detection using Data Mining
Latifur Khan and Bhavani Thuraisingham

Objectives
• Develop a framework for malicious code detection
• Overcome the shortcomings of traditional approaches, which are signature based and not effective against “zero day” attacks
• The proposed innovative framework will be deployed in untrustworthy partners

Funding: AFOSR: 306K + proposal submitted; matching funds from dean

Scientific/Technical Approach
• Develop a hybrid data mining approach to detect malicious executables.
• Important features of malicious and benign executables are identified and used to train classifiers.
• Three sets of features are extracted:
  – binary features are extracted from the binary executables;
  – assembly features are extracted from disassembled executables;
  – function call features are extracted from program headers.
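As a rough illustration of the binary feature set described above, the sketch below counts byte n-grams in an executable's raw bytes and turns them into a presence/absence vector. The use of n-grams, the default n = 2, and the names `byte_ngrams` and `binary_feature_vector` are assumptions made for this example; the slides do not say how the binary features are actually defined.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every contiguous n-byte sequence in `data` (hypothetical feature)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def binary_feature_vector(data: bytes, vocabulary: list, n: int = 2) -> list:
    """Return 1 for each vocabulary n-gram that occurs in `data`, else 0."""
    present = byte_ngrams(data, n)
    return [1 if gram in present else 0 for gram in vocabulary]
```

A vector like this, computed for both malicious and benign samples, could then be passed to any standard classifier; selecting which n-grams go into `vocabulary` is a separate feature-selection step not shown here.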
Accomplishments
• Developed a tool that can detect malicious executables in near real time.

Future Work
• Detect malicious executables in real time with a very low false alarm rate
• Extend this work to detect buffer overflow by discriminating messages containing code (i.e., attack messages) from messages containing no code (i.e., non-attack messages)

Geospatial Information Management for National Security
Latifur Khan and Bhavani Thuraisingham

Objectives
• Develop a framework for geospatial data integration to incorporate geospatial data sources and other sources
• The framework will facilitate standard metadata that describes geospatial repositories and a coherent mechanism to connect repositories: seamless integration of geospatial and non-geospatial information with minimal human intervention (a sample query: “Find movie theaters within 30 miles of 75080”)
• Funding: Raytheon: 200K + proposal submitted; matching funds from dean

Scientific/Technical Approach
• Develop Semantic Web Services, the conjunction of two powerful technologies: Semantic Web and Web Services
• Semantic Web Services provide the richer semantics required for automation of service discovery, selection and execution tasks
• Develop Geo Service Discovery and dynamic compositions to integrate geospatial information services by exploiting OWL-S to describe Web services

[Figure: DAGIS architecture with Client, Agent, Matchmaker, Composer, and Sequencer components; numbered steps 1 to 5 run from the query to the returned service URI; example services Zipcode Finder and Theater Finder (Richardson, TX; theaters within 30 miles)]

Accomplishments
• Developed a tool that can handle certain types of queries with a limited number of geospatial and non-geospatial data sources

Future Work
• Complete a toolkit that can handle a complex query automatically and effectively on the fly from a significant number of geospatial and non-geospatial data sources
• Extend this for national security data analysis

Securing Critical Information
I-Ling Yen

Objectives
• Many data-intensive applications host critical data
  – Data grid
  – Large-scale distributed database
• How to secure these systems under a hostile Internet environment
  – Secure storage
  – Secure operations on the data

Problem Statements
• No matter how good the intrusion detection systems are, adversaries always manage to penetrate the system
• Need to support intrusion tolerance
  – Even if the system is compromised, critical information can still stay secure
• Simple encryption won’t work
  – In storage systems: key management issues
  – In data applications: data need to be decrypted when operated on

Data Grid
• Developed data grid storage systems
• Combine secure sharing and replication to achieve security, availability, and integrity
• Efficient data placement algorithm for allocating data shares and their replicas to achieve the best access performance

Operating on Encrypted Data
• Developed a search algorithm to support the processing of search queries on encrypted data
• Developed new encryption algorithms to allow secure computation on secret data
• Need to integrate these algorithms in systems while ensuring overall system security

Data Integrity, Quality and Provenance for Command and Control Applications
Murat Kantarcioglu and Bhavani Thuraisingham

Objectives
• Reduce the complexity of the data integrity assurance process
• Develop tools to decide whether to “admit” data into a database
• Develop techniques to analyze the confidence of query results based on data provenance

Funding: AFOSR: 300K; matching funds from dean (Joint work with Elisa
Bertino from Purdue University)

Scientific/Technical Approach
• Develop an integrity and provenance policy language
• Develop a risk-management-based approach that considers risks due to data provenance
• Apply game-theoretic and incentive-based techniques to enforce honest behavior in policy enforcement

[Figure labels: Access Request; Access Control Results; Access Controller; Integrity Controller; Conventional Access Controller; Integrity Validator; Integrity Policy Repository; Integrity Metadata Repository; Integrity Policy Supplier]

Accomplishments
• Developed a comprehensive architecture for an integrity control system
• Developed an integrity policy language
• Developed an initial approach to risk evaluation

Challenges
• Developing techniques against malicious behavior

Privacy-Preserving Data Mining
Murat Kantarcioglu and Bhavani Thuraisingham

Objectives
• Learn data mining results without disclosing the private data
• Measure privacy loss due to data mining results
• Explore possible trade-offs between privacy, efficiency and accuracy
• Devise techniques to use data mining results privately

[Figure: Data Mining on Horizontally Partitioned Data (Association Rule Mining, Decision Trees, EM Clustering, Naïve Bayes Classifier, K-NN Classifier) built on Specific Secure Tools (Secure Sum, Secure Comparison, Secure Union, Secure Logarithm, Secure Polynomial Evaluation)]

Scientific/Technical Approach
• Develop secure multi-party computation based approaches for distributed data mining tasks under different adversarial assumptions
• Develop perturbation-based approaches for individually adaptable privacy preservation
• Develop statistical methods to measure privacy loss due to data mining results
• Develop a cryptographic framework for using data mining results privately

Accomplishments
• Showed that various distributed data mining protocols could be implemented using a few specific secure protocols (see the figure above)
• Developed a perturbation technique that allows individuals to choose their own privacy level
• Developed various secure tools for enabling privacy-preserving data mining.

Challenges
• Relative inefficiency of cryptographic techniques; accuracy loss in perturbation-based approaches

Classification and Prediction Models for Mining Spatial Data
Weili Wu

• Motivation and Application
  Historical examples:
  – London Asiatic cholera, 1854 (Griffith)
  – Dental health and fluoride in water, Colorado, early 1900s
  Current examples:
  – Crime hotspots (NIJ CML, police patrol)
  – Environmental justice (EPA), fair lending practices
  – Location-aware services (defense: sensor networks, mobile ad-hoc networks)
  – Ecology: spatial habitat model
• Funding
  – NSF 300K + matching funds from dean

• Research Problem Formulation
  Given:
  1. Spatial framework: $S = \{s_1, \ldots, s_n\}$
  2. Explanatory functions: $f_{X_k} : S \to \mathbb{R}$
  3. A dependent class: $f_C : S \to C = \{c_1, \ldots, c_M\}$
  4. A family of function mappings: $\mathbb{R} \times \cdots \times \mathbb{R} \to C$
  Find: classification model $\hat{f}_C$
  Objective: maximize classification_accuracy$(\hat{f}_C, f_C)$
  Constraints: spatial autocorrelation exists
• Accomplishments:
  – Developed an efficient spatial-temporal model to analyze geospatial data.
  – Developed a new spatial similarity measure to build a more advanced model.
  – Developed a new efficient search algorithm.
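The spatial autocorrelation constraint in the problem formulation above can be made concrete with Moran's I, a standard statistic for measuring it. The slides do not name this statistic, so its use here is an assumption, as are the function name `morans_i` and the simple neighbor-weight matrix in the usage note. Values near +1 mean neighboring sites carry similar values, and values near -1 mean neighbors tend to differ.

```python
def morans_i(values, weights):
    """Moran's I for site values and a square spatial weight matrix.

    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of the values and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    numerator = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denominator = sum(d * d for d in dev)
    return (n / w_total) * (numerator / denominator)
```

For four sites on a line, with weight 1 between adjacent sites, the clustered values `[1, 1, 5, 5]` give a positive score and the alternating values `[1, 5, 1, 5]` give a negative one, which is why a classifier that treats sites as independent samples can lose accuracy on spatial data.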
Dependable Distributed Systems
Neeraj Mittal

Objectives
• Develop novel algorithms for monitoring executions of distributed systems.
• Develop new algorithms for effective sharing of resources.

Challenges
• Asynchronous system with no global clock or shared memory.
• Processes and channels may be unreliable.
• Processes may join and leave the system at any time.

Scientific Accomplishments
• Developed algorithms for detecting stable properties (e.g., termination) under a variety of conditions:
  – processes may fail by crashing
  – failed processes may recover
• Developed efficient algorithms for group mutual exclusion.

Future Work
• Monitoring algorithms when the system is dynamic.
• Resource management algorithms when processes and/or channels may be unreliable.

Key Management in Sensor Networks
Neeraj Mittal

Objectives
• Develop novel schemes for securing communication among sensor nodes deployed in hostile territory.
• Communication between two sensor nodes may need to be protected against snooping by another node.

Challenges
• Sensor nodes have limited resources.
• Wireless communication is vulnerable to eavesdropping.
• Sensor nodes are vulnerable to physical capture.

Scientific Accomplishments
• Developed novel schemes for predistributing keys among sensor nodes under a variety of conditions:
  – limited deployment knowledge is available
  – some sensor nodes may be malicious

Future Work
• Dynamically refresh the keys stored at uncompromised sensor nodes.
• Protect against new malicious sensor nodes joining the network.
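To make key predistribution concrete, here is a minimal sketch of the basic random approach, in the style of the well-known Eschenauer-Gligor scheme. It is not one of the schemes developed by the group, and the function names, parameters, and integer key identifiers are all hypothetical. Before deployment each node is loaded with a random subset (a key ring) of a large key pool, and two nodes can protect their link only if their rings intersect.

```python
import random


def predistribute_keys(num_nodes, pool_size, ring_size, seed=0):
    """Give each node a random key ring drawn from a shared pool.

    Keys are represented by integer identifiers 0..pool_size-1 (illustrative).
    """
    rng = random.Random(seed)
    pool = range(pool_size)
    return [set(rng.sample(pool, ring_size)) for _ in range(num_nodes)]


def shared_keys(ring_a, ring_b):
    """Key identifiers two nodes have in common; empty means no direct secure link."""
    return ring_a & ring_b
```

Because each node stores only `ring_size` keys, capturing one node exposes a bounded fraction of the pool, at the cost that some neighboring pairs share no key and must fall back on a multi-hop path.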
Computational Systems Biology through Mining High Throughput Data
Ying Liu

Objectives
• Design efficient algorithms for biological network inference
• Integrate heterogeneous biological data
• Decompose biological networks into functional modules
• Discover functional hierarchy from biological networks

Scientific/Technical Approaches
• Biological networks are modular
• Use a random forest to integrate heterogeneous data
• New formulation for heavy subgraph mining
• Design graph mining algorithms
• Propose new metrics to measure dense subgraphs

Accomplishments
• Integrated 7 different types of data to construct protein-protein interaction networks
• Formulated the heavy subgraph discovery problem as a quadratic function
• Proposed new graph mining algorithms based on evolutionary computing and neural networks

Challenges
• Large-scale data; the heavy subgraph discovery problem is NP-complete.

Physically-Based Deformable Models
Xiaohu Guo

Objectives
• Develop a physically-based simulation and visualization platform for deformable models, which can perform dynamic simulation, collision detection, and material property visualization in real time.
• Investigate physically-based deformable models in a networked collaborative virtual environment.

Scientific/Technical Approach
• Investigate the theoretical foundations for quasiconformal surface mapping and harmonic volumetric mapping
• Based on the regular parametric domains induced by geometric mapping, develop a GPU-accelerated framework including a real-time PDE/ODE solver, collision detection, and volume rendering
• Given the regular parametric domains (i.e., geometry images), use image-based (2D/3D) compression and streaming techniques for efficient transmission of deformable models.
[Figure labels: Harmonic Surface and Volumetric Mapping; GPU-Accelerated PDE/ODE Solver; GPU-Accelerated Collision Detection; GPU-Accelerated Deformable Models; Geometry Images; GPU-Accelerated Volume Rendering; Deformable Models Compression and Network Streaming]

Potential Applications
• Surgical training and dynamic simulation of human tissues/muscles under interactive manipulation
• 3D model registration and target localization in medical imaging, based on deformable models

Challenges
• Collaborative manipulation by multiple users will result in data incoherency at different client sites; deformable model decomposition techniques can be further investigated

Language-based Software Security
Kevin W. Hamlen

Objectives
• Develop systems for safe execution of mobile code from untrusted sources
• Support low-level binary formats, legacy languages, etc.
• Provide formally provable security guarantees (e.g., using type theory)

Challenges
• When the source is untrusted, code signing doesn’t help
• Static analyses are useful when possible, but interesting security properties are undecidable
• In-lined Reference Monitors are sufficiently powerful, but need formal proof techniques to guarantee safety

Scientific Accomplishments
• Developed the first certified In-lined Reference Monitoring system
  – fully automatic program rewriter for managed .NET
  – all generated code has a machine-checkable soundness proof

Future Work
• Support lower-level binary formats (x86 machine code rather than .NET bytecode)
• Reduce the disconnect between theory and implementation by creating smaller verifiers (e.g., logic programming)

Multimedia Systems and Networking
B. Prabhakaran (praba@utdallas.edu)

• 3D Motions: motion capture and gesture sensor data
• 3D Models: educational instructions, role-playing games
• For delivery (streaming): focus on wireless networks
• Biomedical Applications
  – Physical Medicine & Rehabilitation
  – Parkinson’s and other neurological diseases
  – Study dynamics of human anticipation
• Security Applications
  – Emergency handling: streaming animated instructions over PDAs and laptops on wireless ad-hoc networks
  – Optimal sensor placement, suspicious activity identification
• Arts and Technology
  – Copyright protection: watermarking of 3D models and captured motions
  – Reusability of models and motions
• Funding from NSF Career and ARO

Our Directions and Plans
• Each technology area is making very good technical progress
• Will continue to enhance our research and follow the five-pronged model
• Also plan on developing interdisciplinary projects
  – Within the Group
  – Across the Groups
  – Across UTD and partners (e.g., School of Management, UT Southwestern Medical Center)
• Continue to increase the number of Fellows, board members, keynote talks, etc.
• A center-scale project is our major goal