BEHOLD, WE HAVE SIGNAL

Software Architecture in Cyberspace
Mary Shaw
Carnegie Mellon University
Institute for Software Research
http://www.cs.cmu.edu/~shaw/

- The Internet infrastructure supports a creative and thriving social and economic system.
- The Internet is much richer than its communication infrastructure.
- Architectural styles help explain software systems.
- What is an appropriate theory of architectural style for Internet applications?

- The Internet infrastructure supports a creative and thriving social and economic system.
  - End users are integral to the Internet, not simply its audience
    - End users are active participants, not passive consumers
    - End-user producers outnumber technical professionals
- Architectural styles help explain software systems.
- What is an appropriate theory of architectural style for Internet applications?

Demographics of US Internet users

Group                  %
Overall
  Total adults         73
  Women                73
  Men                  73
Age
  18-29                90
  30-49                85
  50-64                70
  65+                  35
Geography
  Urban                74
  Suburban             77
  Rural                63
Education
  < high school        44
  High school          63
  Some college         84
  College +            91

Source: Pew Internet & American Life Project, 2008

Activities of online users (“have you ever?”)

Activity                            % of ‘net users   Date
Send or read email                  92                Dec 07
Use search engine                   89                May 08
Search for map/directions           86                Dec 06
Look for info on hobby/interest     83                F-M 07
Check on product before buying      81                Sep 07
Check weather                       80                May 08
Look for health/medical info        75                Dec 07
Get travel info                     73                M-J 04
Get news                            73                May 08
Buy product                         71                Dec 07
Visit government website            66                May 08
Buy travel reservation              64                Sep 07
Surf for fun                        62                F-A 06
+ 65 more

Source: Pew Internet & American Life Project, 2008

Content from online users (“have you ever?”)

Activity                                  % of ‘net users   Date
Upload photos to share                    37                Aug 06
Rate product with online rating system    32                Sep 07
Post comment or review of product         30                Sep 07
Use online social networking site         29                Dec 06
Categorize or tag online content          28                Dec 06
Share files from your computer            27                M-J 05
Post comments to online group or blog     22                Dec 07
Share something online you created        21                Dec 07
Create content for the internet           19                Nov 04 !!
Share files with peer-to-peer sharing     15                May 08
Create or work on your own webpage        14                Dec 07
Create web pages/blogs for others         13                Dec 07
Create or work on your own blog           12                May 08
Participate in online health discussion   12                Aug 06

Source: Pew Internet & American Life Project, 2008

Generation differences

                  Online                    Trailing  Leading            After
                  Teens    Gen Y    Gen X   Boomer    Boomer   Matures   Work
Age               12-17    18-28    29-40   41-50     51-59    60-69     70+
Go online         87       84       87      79        75       64        21

Teens and Gen Y more likely than older users:
Online games      81       54       37      29        25       25        32
Inst message      75       66       52      38        42       33        25
Download music    51       45       28      16        14       8         5
Download video    31       27       22      14        8        8         1

Gen X or older generations dominate:
Get health info   -        73       84      80        84       68        72
Travel rez        -        50       72      64        64       59        60
Bank online       -        38       50      44        37       35        22

Source: Pew Internet & American Life Project, 2005

There are lots of end users
- Using data from the Bureau of Labor Statistics, we estimate that over 90M Americans will use computers at work in 2012.
- Of these, only about 2.5M will be professional programmers; 40.5M will be managers and (non-software) professionals.
- This does not include home users or non-US users, so there will be many more than 90M total end users.
- Most of them will “program” in some way.

Source: C. Scaffidi, M. Shaw, and B. Myers. Estimating the Numbers of End Users and End User Programmers. VL/HCC'05: Proc. 2005 IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 207-214, 2005.

End users are not software engineers
- End users do not have rich and robust mental models of their computing systems
  - They fail to do backups
  - They misunderstand storage models (especially local vs network storage)
  - They execute malware
  - They innocently engage in other risky behavior
- The response of software engineering to this mismatch between real computing systems and end users’ models has been to seek ways to help users act “rationally”.
- But there is a lot of evidence that people do not reason that way.

Human network is part of the Internet
- Intrinsic property (say, dependability) of a unit in a specific (or assumed) environment, based on a specific set of attributes
  + context, value proposition
- Situational dependency, reflecting the environment and a specific user’s needs, tolerances, priorities, expectations
  + perception, understanding humans
- Pragmatic dependability, as realized in practice, which results from decisions made by people and social groups and by the ways users behave

- The Internet infrastructure supports a creative and thriving social and economic system.
- The Internet is much richer than its communication infrastructure.
  - This is an Ultra-Large-Scale System
  - Usual software system assumptions don’t hold
- Architectural styles help explain software systems.
- What is an appropriate theory of architectural style for Internet applications?

Ultra-Large-Scale Systems
- More than “systems of systems” or “networks of networks”
- Large size on many dimensions …
  - Lines of code, amount of data, users, dependencies among and complexity of components, etc.
- … but more than that
  - Decentralized operation and control
  - Conflicting, unknowable, diverse requirements
  - Continuous evolution and deployment
  - Heterogeneous, inconsistent, changing elements
  - Indistinct people/system boundary
  - Normal failures
  - New forms of acquisition and policy

Source: SEI. Ultra-Large-Scale Systems. 2006

Decentralized operation and control
- ULS system scale offers only limited possibilities for central or hierarchical control
  - Long life, multiple users and objectives, span of physical jurisdictions are the norm at ULS scale
  - Many versions of subsystems must work together
  - Modifications are developed and installed by independent groups
  - Spontaneous, unanticipated new uses arise
- Undermines common assumption:
  - All conflicts must be resolved and must be resolved uniformly

Conflicting, unknowable, diverse requirements
- ULS systems serve a wide range of competing needs
  - Competing users contend for requirements
  - Understanding of the problem evolves
  - Dependability is “better/worse”, not “right/wrong”
  - Wicked problems
- Undermines common assumptions:
  - Requirements are known in advance, evolve slowly
  - Tradeoff decisions will be stable

Continuous evolution and deployment
- ULS systems have long lives and multiple independent developers
  - Different groups may install capability for their own needs; this may conflict with other groups
  - Evolution can’t be controlled centrally; it must be shaped by rules and policies that protect critical services and allow diversity at the edges
- Undermines common assumptions:
  - System improvements are introduced in “releases”
  - Users/developers know about releases and can choose to accept them or not

Heterogeneous, inconsistent, changing elements
- ULS systems will be composed from independently created components
  - Heterogeneous: many sources, no single interface standard, often incorporating legacy systems
  - Inconsistent: evolution spontaneous, not planned; different objectives may cause inconsistent versions
  - Changing: hardware, software, operating environment change based on local decisions
- Undermines common assumptions:
  - Effect of change can be predicted adequately
  - Configuration information is accurate and controlled
  - Components and users are fairly homogeneous

Indistinct people/system boundary
- ULS systems’ service to a user depends on actions of other users; the user/developer distinction is soft
  - User actions may affect overall system health
  - System must adapt to changing usage patterns
  - Aggregate analysis may be better than exact analysis
- Undermines common assumptions:
  - Users’ behavior doesn’t affect overall system
  - Collective behavior of people is not relevant
  - Social interactions are not relevant

Normal failures
- ULS system scale implies inevitable failures, so systems must do protection/recovery/enforcement
  - Hardware failures are inevitable because of scale
  - Legitimate use of software and services outside planned capability will cause degradation/failure
  - Malicious use will cause problems
- Undermines common assumptions:
  - Failures will be infrequent and exceptional
  - Defects can be removed

New forms of acquisition and policy
- ULS systems will evolve, but there must be governance to prevent anarchy
  - Success of system depends on organic evolution
  - Individual developers won’t fully understand core infrastructure
  - Need effective guidance on allowed/unallowed change
- Undermines common assumption:
  - There is a single agent responsible for system development, operation, and evolution

Analogy: Cities and city planning
- Cities are complex systems
  - Built of individual components chosen by individuals
  - Constantly evolve
  - Withstand failures and attacks
- Cities are not centrally controlled
  - Standards
  - Building codes, highway standards
  - Policies for infrastructures that allow individual action within constraints
  - Zoning laws
  - Regulations that govern individual action
  - Enforcement after the fact, rather than prior constraint

- The Internet infrastructure supports a creative and thriving social and economic system.
- The Internet is much richer than its communication infrastructure.
- Architectural styles help explain software systems.
  - Style establishes the abstract structure of a system
  - Character of problem should lead to style for solution
- What is an appropriate theory of architectural style for Internet applications?

Architectural style in software
- Styles are rooted in common knowledge, intuition, informal prose, box-and-line diagrams
- A style is a set of design rules that identify
  - which kinds of components make up a system
  - which kinds of connectors compose the components
  - how control is shared among components
  - how data is transmitted through the system
  - what type of reasoning is compatible with the style
- Focus on abstractions used by designers
  - In practice, implemented as processes and calls …
  - … but that loses the designer’s intent

Source: M. Shaw and P. Clements, A Field Guide to Boxology… COMPSAC 1997

Examples of styles
- Data flow styles: batch sequential, dataflow network, closed-loop control
- Call-and-return styles: main program/subroutines; object-oriented systems
- Interacting process styles: communicating processes, event subsystems
- Data-centered repository styles: transactional database, blackboard
- Data-sharing styles: compound document, hypertext, Fortran common
- Hierarchical styles: layers

Members of Interacting Process Family
All members of the family are dominated by message-passing protocols among independent, usually concurrent processes with sporadic low-volume traffic.

                    Control                      Data
Substyle            Topology   Synchronicity     Topology    Mode
Comm proc           arb        not seq           arb         any
1way data flow      linear     asynch            linear      passed
Client-server       star       synch             star        passed
Heartbeat           hier       ls/par            hier/star   pass, share
Broadcast           arb        asynch            star        bdcast
Token passing       arb        asynch            arb         passed

Source: G. Andrews, Paradigms for Process Interaction in Distributed Programs. ACM Comp Surv 1991

Language of Styles: Constituent Parts
- Component: unit of software that performs some function at runtime
  - process, procedure, manager, stand-alone program, transducer, memory
- Connector: mechanism that mediates communication, coordination, cooperation among components
  - procedure call, shared representation, message protocol, data stream, direct access, implicit invocation

Language of Styles: Control Issues
- Topology: geometric form of control flow
  - linear, hierarchical, acyclic, arbitrary
- Synchronicity: dependency among components
  - synchronous, asynchronous, opportunistic
- Binding time: when are control relations set up
  - initialization time, compile time, while running

Language of Styles: Data Issues
- Topology: geometric form of data flow
  - linear, hierarchical, acyclic, arbitrary
- Continuity: how continuous is flow
  - continuous, sporadic (discrete times)
- Mode: how data is made available
  - explicit passing, broadcast, sharing, multicast
- Binding time: when are control relations set up
  - initialization time, compile time, while running

Language of Styles: Reasoning
- Data flow styles: functional composition
- Call-and-return styles: hierarchy (local reasoning)
- Interacting process styles: nondeterminism
- Data-centered repository styles: data integrity (ACID, convergence, invariants)
- Data-sharing styles: representation
- Hierarchical styles: levels of service

Utility of Arch Description Languages
- Establish uniform descriptive notation
- Clarify informal concepts
- Document design intent
- Discriminate among different styles
- Bring out significant differences that affect suitability for various tasks
- Provide advice on selecting style for a problem
- Following Jackson, characterize problem domain and use this to select appropriate abstract architecture
- Separate concerns about fit of architecture to problem from implementation and performance issues
- Allow comparison of alternatives, potentially simulation and analysis

- The Internet infrastructure supports a creative and thriving social and economic system.
- The Internet is much richer than its communication infrastructure.
- Architectural styles help explain software systems.
- What is an appropriate theory of architectural style for Internet applications?
  - Style establishes the abstract structure of a system
  - Character of problem should lead to style for solution

Picking a target in cyberspace
- Cyberspace supports many sophisticated applications, interactions, and communities, including everything from IM/email to enterprise integration.
- Focus here on the web as used by everyday people
  - This is the most different from conventional software systems
  - We need to start with a specific focus
  - End users are in dire need of support

Example: Twitter Vote Report
On US Election Day 2008, this application, set up by an interested user community, collected reports on the status of polling places and displayed them on a map in near real time so that interested people, especially the public, could spot problem areas.

VoteReport input stream
- Data was collected by Twitter, SMS, and telephone call.
- A stylized format was provided (and often ignored)
- The message feed was available

VoteReport composition
- The VoteReport application interpreted the Twitter/SMS feed, geo-referenced the data, and mapped them, presumably through the efforts of professional programmers.
- The report could be embedded in a web site by anyone
- The input stream was also available

Mashups
- “Mashups” combine functionality of multiple websites or augment existing websites; they show how users build on existing systems in new and innovative ways.
- Wong did a qualitative survey of high-quality mashups to see how they used/improved existing web sites, how they combined data from multiple sites, and what kinds of user tasks they support
- Overall impression: mashups are ad hoc and idiosyncratic

Source: J. Wong and J. Hong. What do we “mashup” when we make mashups? WEUSE IV 08

Capabilities of Mashups
Mashups provided a variety of capabilities, often more than one per mashup
- Search: Does it have a search interface?
- Visualization: Does it add visualization to the data?
- Real-time: Is the purpose to allow the user to monitor the original website as a real-time data set?
- Widget: Is it actually a widget or plugin?
- Personalized: Does it use personal information or enable personalization?
- Folksonomy: Does it use or add tagging?
- In-situ use: Does it simply tailor a website for a specific situation or use?

Categories of mashups
- Aggregation: aggregate multiple websites or summarize sets of data
- Alternate UI & In-situ use: provide new ways to interact with a website or support specific use cases
- Personalization: specialize a website based on personal data from that site or another source
- Focused View of Data: index or categorize contents of another website
- Real-time Monitoring: react to changes in a website and make user aware of them

Implementations of mashups
- Inputs: RSS feeds, search engine feeds, personal names, user ids, settings, ranges, geographical points and directions, search terms, tracking numbers, ISBNs, headlines, tags, “clippings” (screen scraping?)
- Outputs: RSS feeds, list of links, visualizations, text, maps, pictures, numeric tables, lists, reports
- APIs: Delicious, yelp, craigslist, flickr, city information feeds, eBay, Google Maps, Google Earth, USGS DRM, amazon, UPS, USPS, FedEx, iTunes, Napster, news sites, wikipedia, daylife, Rhapsody, search engines, Twitter, YouTube, MySpace

Architectural Task for Web Applications
- How do web applications differ from classical software systems?
  - Richer set of component types
  - Much stronger role for data and its presentation
  - More integration by inclusion or embedding
  - More significance for initiative (push/pull)
  - Minimal reliance on compiler for integration
- How much of classic architectural style carries over? How much extension is required?
  - Structure (component/connector/composition) remains valid
  - Details partially hold, but require extension

Online Activities
Classified by mode: interactive, synchronous, asynchronous
Between user and …
- … few known people: IM, chat, email, online meetings, private blog, photo sharing
- … many unknown people: multiplayer games, broadcast, newsgroups, mailing lists, RSS feeds, public blog, wiki, social networks, ratings
- … a server: e-commerce, banking, games, remote desktop, search, news, job hunt, web browsing, music download, storage/backup

Distinguishing the Classes of Solutions
- One role of an architecture for cyberspace is to explain the structural distinctions among the applications that support user activities
- Sometimes differences matter
  - You wouldn’t use IM for offline backup of photo files
  - What attributes of IM, photo files, and the backup task show the mismatch?
- Sometimes the same application can be realized in different ways
  - Both FTP and P2P support file sharing, but with different resource implications
  - What attributes show the abstract similarity, and what attributes show the resource implications?
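
One way to make the IM-versus-backup question concrete is to write down a few attributes for an application and for a task, then compare them. The sketch below is only an illustration: the `Profile` type, its three attributes, and the values assigned to instant messaging and offline storage are assumptions chosen for the example, not attributes or measurements taken from the talk.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical attribute set describing an application or a task."""
    persistent: bool    # does the data survive the interaction?
    max_size: str       # qualitative size: "small" or "large"
    low_latency: bool   # is delivery expected within seconds?


SIZE_ORDER = {"small": 0, "large": 1}


def mismatches(task: Profile, app: Profile) -> list[str]:
    """Return the attributes on which the application falls short of the task."""
    problems = []
    if task.persistent and not app.persistent:
        problems.append("persistence")
    if SIZE_ORDER[task.max_size] > SIZE_ORDER[app.max_size]:
        problems.append("size")
    if task.low_latency and not app.low_latency:
        problems.append("latency")
    return problems


# Illustrative values only (assumptions, not data from the slides).
instant_messaging = Profile(persistent=False, max_size="small", low_latency=True)
offline_storage = Profile(persistent=True, max_size="large", low_latency=False)
photo_backup_task = Profile(persistent=True, max_size="large", low_latency=False)

print(mismatches(photo_backup_task, instant_messaging))  # ['persistence', 'size']
print(mismatches(photo_backup_task, offline_storage))    # []
```

A real descriptive theory would need a richer attribute vocabulary; the point here is only that a mismatch becomes checkable once the attributes are explicit.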

Elementary Component Types (provisional)
- Text (human-readable, file or message)
- Encoded file (typed set of bits)
- Database (structured, with query capability)
- Web page (human-readable, possibly generated)
- Computation or service
- Stream (continuous, nonpersistent)
- Human (yes, people are part of the system)
- Attributes of interest: state persistence, qualitative size, type/internal structure, rate of change (static/dynamic)

Connector Primitives (provisional)
The lowest level of primitive is the TCP/UDP port

Port(s)      Service
20, 21       FTP
23           Telnet
25           SMTP
80, 443      HTTP, HTTPS
109, 110     POP2, POP3
143, 220     IMAP
194          IRC
666          Doom
3724         World of Warcraft
4664         Google desktop
5190         AOL IM
6881-6988    BitTorrent
… and a couple of thousand others

Attributes of interest: activation (push/pull/continuous/interactive query-respond); arity; signature; knowledge of targets; abstract protocol; synchronicity; performance (rates, bandwidth, latency); state of interaction (persistence, location); duration of interaction

Composition of primitives
Abstractly, we can recognize
- Simple connections: scripting, linking
- Remote call in various forms: RPC, service invocation
- Extended sessions: browser session with session ID
- User-facing compositions: installation of plug-in, mashups
- Enterprise compositions: SOA, SaaS

Example: Internet Messaging
A human creates short unstructured text units and pushes them via IRC to another known person. They are delivered with low latency, but they are not persistent.

Example: Grapevine model of email
- An email system has five components: a composer, a reader, a database for archiving messages, a directory system, and a transport mechanism.
- A message is a structured text entity. It has formatted headers, including an address text string, and may have internal formatting and/or attachments.
- The composer is an interactive editor (computation) that produces a mail message.
- The reader can display the message, and it is integrated with the database for archiving messages, which it can also classify and retrieve.
- The directory system translates a symbolic address to a destination.
- The transport mechanism moves mail to its destination.

Source: Grapevine paper, CACM, sometime in the 70s

Example: VoteReport
Here is a plausible architectural description of the VoteReport application
- The input is a stream of messages, merged from Twitter, SMS, and transcriptions of phone calls. The input is supposed to be structured but often is not
- A parser receives the input stream and transduces the messages (to the extent possible) to georeferenced reports
- A report has a text, a timestamp, and an icon
- Recent reports are displayed on a zoomable map embedded in a web page
- Filters are provided, and summaries are displayed
- The map and input stream are elements that can be incorporated in other web pages.

- The Internet infrastructure supports a creative and thriving social and economic system.
  - Some of the wires in the network have myelin sheaths
- The Internet is much richer than its communication infrastructure.
  - The problems are wicked, not just technical
- Architectural styles help explain software systems.
  - Software architecture provides a model for a descriptive theory of software in cyberspace
- What is an appropriate theory of architectural style for Internet applications?
  - Can we help end users at the ends of the network?

BEHOLD, WE HAVE SIGNAL
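
The VoteReport description given earlier (a merged message stream, a parser that transduces messages into georeferenced reports, and filters over those reports) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the application's actual code: the `#zip` and `#wait` tag format, the `Report` fields beyond text, timestamp, and icon, the 30-minute threshold, and the sample messages are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Report:
    text: str
    timestamp: str
    icon: str
    zip_code: str  # stands in for real geo-referencing (an assumption)


# Hypothetical stylized format, e.g. "#zip 15213 #wait 45 long line".
ZIP_TAG = re.compile(r"#zip\s+(\d{5})")
WAIT_TAG = re.compile(r"#wait\s+(\d+)")


def parse(message: str, timestamp: str) -> Optional[Report]:
    """Turn one message into a report, or None if it cannot be located."""
    zip_match = ZIP_TAG.search(message)
    if zip_match is None:
        return None  # unstructured input: nothing to place on the map
    wait = WAIT_TAG.search(message)
    icon = "long-wait" if wait and int(wait.group(1)) >= 30 else "ok"
    return Report(message, timestamp, icon, zip_match.group(1))


def transduce(feed: Iterable[Tuple[str, str]]) -> Iterator[Report]:
    """Parser component: keep only the messages that yield a report."""
    for timestamp, message in feed:
        report = parse(message, timestamp)
        if report is not None:
            yield report


# Made-up sample feed; the second message ignores the stylized format.
feed = [
    ("2008-11-04T09:15", "#zip 15213 #wait 45 line around the block"),
    ("2008-11-04T09:20", "machines are down at my polling place!"),
    ("2008-11-04T09:31", "#zip 15217 no problems here"),
]

reports = list(transduce(feed))
# Filter component: locations whose reports indicate a long wait.
problem_areas = [r.zip_code for r in reports if r.icon == "long-wait"]
print(len(reports), problem_areas)  # 2 ['15213']
```

The unstructured message is dropped rather than guessed at, which mirrors the "to the extent possible" qualifier in the description; a map display and embedding would sit downstream of `reports`.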