CSCI 578 Software Architectures
Dr. Chris Mattmann
Tuesday, August 27, 2013

The Class
Will give you a complete treatment of the area of software architecture
• The fundamental building blocks of software systems
  • Components (units of computation)
  • Connectors (interactions between the software components)
  • Configurations (arrangements of components and connectors)
Will equip you with the necessary skills to design complex, real-world software
27-Aug-13 CS578 CAM-2

General Class Information
Lecture, but…
• You can participate
• You should participate
• You will participate, that is, if you want to do well :)
On-campus, and remote (DEN) sections
• DEN section capped at 20 students
• Not my limit, but DEN’s
• Please ask questions, if you need to

General Class Information
Syllabus/Web Site:
• http://sunset.usc.edu/classes/cs578_2013b/
• Visit it often, as the schedule may change!
• This is where all of your homework assignments will be posted
• This site will point you to required reading, and to lectures that you can download before class

What We’ll Cover
The entire spectrum of software architecture
• Where it fits in the overall software engineering process -- it’s the linchpin!
• Software architectural styles, product line architectures, components, connectors, implementation frameworks, middleware, nonfunctional properties, visualization, the role of the architect…lots of topics!
Topical research in software architecture
• (Optional) papers, data-intensive systems, etc.

Me
Graduated with my Ph.D. in Computer Science from USC in 2007
• Advisor: Dr. Nenad Medvidovic
Was a student at USC from 1998-2007
• B.S., Computer Science 2001
• M.S., Computer Science 2003
My research interests
• Open Source Software, Apache Software Foundation
• Was one of the inventors of Hadoop (Nutch PMC)
• Inventor of Apache Tika, http://tika.apache.org/
• The intersection of software architectures, and large-scale data dissemination
• Information Retrieval
• Search Engines – I’m teaching a class, CS572, on this topic during Spring 2013

My Other Day Job
• The Big Picture
  • Astronomy, Earth science, planetary science, life/physical science all drowning in data
  • Fundamental technologies and emerging techniques in archiving and data science
  • Largely center around open source communities and related systems
• Research challenges (adapted from NSF)
  • More data is being collected than we can store
  • Many data sets are too large to download
  • Many data sets are too poorly organized to be useful
  • Many data sets are heterogeneous in type, structure
  • Data utility is limited by our ability to use it
• Proposal focus: Big Data Archiving
  • Research methods for integrating intelligent algorithms for data triage, subsetting, summarization
  • Construct technologies for smart data movement
  • Evaluate cloud computing for storage/processing
  • Construct data/metadata translators “Babel Fish”
Image: U.S. National Climate Assessment (pic credit: Dr. Tom Painter)
Image: SKA South Africa: Square Kilometre Array (pic credit: Dr. Jasper Horrell, Simon Ratcliffe)

Where we’re headed
Nature magazine piece on “A Vision for Data Science” in Jan. 24th issue
• Big Data Initiative highlighted
• Outline algorithm integration (regridding, metrics); automatic understanding of data metadata formats and open source as “key issues”

Software Architecture Research Problem
Content repositories are growing rapidly in size
At the same time, we expect more immediate dissemination of this data
How do we distribute it…
• In a performant manner?
• Fulfilling system requirements?
Figure: NASA Planetary Data System Archive Volume Growth (accumulated TBytes by year, 1990-2008)

Data Distribution Technologies
• GridFTP, XML-RPC, bbFTP, Aspera, Bittorrent, SCP, RMI, UFTP, Siena, FTP, CORBA, SOAP, SFTP, JXTA, HTTP/REST, GLIDE/PRISM-MW
Which one is the best one?
• Given our current architecture?
• Given our distribution scenarios and requirements?

Architectural Decisions
• Architectural decisions (such as connector selection) impact functional and non-functional properties of the overall data distribution system architecture
• It does matter what connector you select
  • Functional (performance)
    • Efficiency, consistency, scalability, dependability of the data transfer
  • Non-functional (e.g., interoperability, security)
• We assert that this process has largely remained an art form and forces organizations to rely on organizational gurus whose knowledge is never encoded or understood

Overall Approach
Diagram labels: Data System Architect; Data Distribution System Architecture; “White Box” Guru; Connector KB; “Black Box” Guru; Performance KB

So, today…
You’re free to enjoy the day
Course book
(Possible) Reading assignment
• Chapter 1: The Big Idea
Be ready to get going on Thursday
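
Supplementary sketch (not from the lecture)
The three building blocks named on "The Class" slide (components, connectors, configurations) can be illustrated with a small Python model. This is only a minimal sketch under stated assumptions: the class names, the `connect` and `neighbors` helpers, and the "archive"/"client" components are hypothetical and chosen for illustration. The "HTTP/REST" label is borrowed from the Data Distribution Technologies slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A unit of computation (hypothetical illustration)."""
    name: str


@dataclass(frozen=True)
class Connector:
    """An interaction between two software components."""
    name: str
    source: Component
    target: Component


@dataclass
class Configuration:
    """An arrangement of components and connectors."""
    components: set = field(default_factory=set)
    connectors: list = field(default_factory=list)

    def add_component(self, component):
        self.components.add(component)

    def connect(self, name, source, target):
        # A connector may only join components already in the configuration.
        for component in (source, target):
            if component not in self.components:
                raise ValueError(f"unknown component: {component.name}")
        connector = Connector(name, source, target)
        self.connectors.append(connector)
        return connector

    def neighbors(self, component):
        """Components reachable from `component` through one connector."""
        return {k.target for k in self.connectors if k.source == component}


# Assumed example: a data archive serving a client over HTTP/REST.
archive = Component("archive")
client = Component("client")
config = Configuration()
config.add_component(archive)
config.add_component(client)
config.connect("HTTP/REST", archive, client)
print(sorted(c.name for c in config.neighbors(archive)))
```

Modeling the connector as its own object, rather than as a direct reference between components, keeps the choice of connector (FTP, GridFTP, HTTP/REST, and so on) visible and replaceable. That is the kind of decision the Architectural Decisions slide argues should be made explicitly.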