Making Mashups with Marmite
Jeff Wong, Jason I. Hong
Carnegie Mellon University

The Big Picture Problem
• Lots of content out there on the web
  – But not always in a form amenable to your needs
  – Ex. Easy to get a list of hotels in San Jose, not so easy to sort by distance to convention center
• Two observations:
  – In many cases, all of the data and services people need already exist, but are not connected together
  – Unlikely that a web site can predict all possible needs

A Solution: Mashups
• Rapidly growing community of users creating “mashups” combining content from multiple web sites
  – Ex. Housingmaps.com
  – Ex. MySpace child predators
  – Ex. Friendster locations
  – Ex. Most popular videos on YouTube, Yahoo Video, …
• ProgrammableWeb.com statistics
  – ~1500 mashups created since April 2005
  – 356 open web-based APIs available

But Creating Mashups is Hard
• Requires lots of skill to create a mashup
  – Ex. Housingmaps creator has PhD in computer science
  – Ex. MySpace child predator list took months
• Requires programming expertise in many areas
  – Web crawling
  – Text parsing
  – Pattern matching
  – Databases
  – HTML

Marmite
End-User Programming for Mashups
• Main idea: make it easy to create web mashups
• Use a dataflow approach connecting small operators
  – Inspired by Unix pipes and Apple’s Automator
• Example:
  – Get all events from Upcoming.org
  – Filter out events that are too old
  – Put them all onto a map
• Runs inside of a standard web browser

[Figure: Marmite interface, labeled Set of Operators, Data Flow View, Data View]

Using Marmite (Envisioned)
• Extract content from one or more web pages
  – names, addresses, dates, phone #, URLs
• Process it in a data flow manner
  – filtering out values or adding metadata
  – integrating with other data sources (similar to a database join operation)
• Direct the output to a variety of sinks
  – databases, map services, text files, visualizations, web pages, or source code that can be further edited

Marmite
• Motivation and Examples
• Features and Design Rationale
• User Evaluation

Features and Design Rationale
• Conducted a series of quick evaluations to understand design space and potential problems
  – Automator
  – Lo-fi prototypes

Automator

Informal Automator Evaluation
• Had three novices try three simple web-based tasks
  – Warm-up task
  – Traverse a set of web pages
  – Download a set of images
• Some findings:
  – Some difficulties knowing how to start and what to do next
  – Little feedback about state of system between operations
  – Difficult to iterate due to network speed issues

Lo-Fi Prototypes
• 6 paper prototypes with 20 participants

Design Solutions
• Problem: how to start and what to do next
• Solution: suggest next actions
  – Weak data typing to find types (addresses, numbers, etc.)
  – Filter operators to only show relevant ones
  – Suggest operators that might be applicable

Design Solutions
• Problem: little feedback about state of system between operations
• Solution: link data flow and data view together
  – Many systems take a program-centric view (ex. Automator) or a data-centric view (ex. spreadsheets)
  – Use hybrid data flow / data view, showing an operation and its effects together
  – Data view usually “spreadsheet”, other views possible too (for example, maps)

Design Solutions
• Problem: difficult to iterate due to network speeds
• Solution: cache data, let people “replay” data
  – Reload, pause, play

Other Design Findings
• Screen real estate issues
  – Collapsible operators, leaving a readable label

Extracting Generic Content
• Can’t have pre-defined extractor operators for every possible web site
  – Need a more general way of extracting data from pages
• Developed a generic wizard UI for selecting links
  – Content from that set could be extracted via other operators
  – Uses Solvent (MIT), an XPath-based algorithm for finding patterns in web pages
• Finds “groups” of related web content based on how HTML is structured

Marmite Operators
• Operators have input types
  – Operator uses this to guess which columns it wants
• Operators have output types

Implementation
• JavaScript (for underlying code) and Extensible Binding Language (XBL, for UI)
• Operators currently in JavaScript
  – Ideally could be scriptable in any programming language
  – Currently ~15 operators

Evaluation
• Informal user study with 6 people
  – 2 novices
  – 2 people with spreadsheet experience (formulas)
  – 2 people with programming experience
• Tasks (in increasing difficulty)
  – Warm-up task showing how to retrieve a set of addresses and how to geocode an address
  – Search for and filter out events further than a week away
  – Compile a list of events from two event services and plot them on a map
  – Recreate the housingmaps site

Results
• Three people able to complete all tasks in ~1 hour
  – First two users confused about suggested actions (automatically popped up, made manual for other 4 users)
  – Novice made some progress, not able to finish all tasks
• Able to re-create housingmaps in ~15 minutes

More Results
• Biggest barrier was understanding the data flow
  – Did not understand input and output concept
  – Applied operators as one-off, did not realize that it was a static representation of flow
  – Did not understand data flow and data view were linked

Future Directions
• Short-term
  – Better screen-scraping operators
  – More operators
  – Better connection with web services (WSDL and REST)
  – Better help for starting a data flow
• Long-term
  – Intelligence analysis
  – Better visualizations
  – Location-based services

Conclusions
• Marmite, a tool for creating web-based mashups
  – Extract content from one or more web pages
  – Process it in a data flow manner
  – Direct the output to a variety of sinks
• Hybrid data flow / data view
• User evaluation shows some promising results

Jeff Wong, Jason Hong, Making Mashups with Marmite: Re-purposing Web Content through End-User Programming, CHI 2007

Types of Operators
• Sources
  – Add data into Marmite by querying databases, extracting information from web pages, and so on.
• Processors
  – Modify, combine, or delete existing rows. Example operators include geocoding (converting street addresses to latitude and longitude) and filtering. Processor operators might add or remove columns as well.
• Sinks
  – Redirect the flow of data out of Marmite. Examples include showing data on a map, saving it to a file, or writing it to a web page.
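
The source, processor, and sink roles described above can be illustrated with a small sketch. This is not Marmite's code (the slides say its operators are written in JavaScript); it is a minimal Python illustration under stated assumptions. Every name here (`events_source`, `filter_old`, `geocode`, `map_sink`, `run_flow`, `FAKE_GEOCODER`) is hypothetical, the event rows are made-up sample data, and the geocoder is a lookup table standing in for a real service. The per-step snapshots are one possible way to show an operation together with its effect on the data, in the spirit of the hybrid data flow / data view.

```python
from datetime import date

# Assumption: a row of the data view is a dict mapping column name -> value.

def events_source():
    """Source: adds rows to the flow. Made-up sample data replaces a real query."""
    return [
        {"name": "Jazz Night", "address": "1 Main St", "date": date(2007, 5, 1)},
        {"name": "Old Fair", "address": "2 Oak Ave", "date": date(2007, 3, 1)},
    ]

def filter_old(rows, today):
    """Processor: deletes rows whose date is earlier than `today`."""
    return [r for r in rows if r["date"] >= today]

# Hypothetical lookup table standing in for a geocoding service.
FAKE_GEOCODER = {"1 Main St": (40.44, -79.99), "2 Oak Ave": (40.45, -79.95)}

def geocode(rows):
    """Processor: adds `lat` and `lon` columns derived from `address`."""
    out = []
    for r in rows:
        lat, lon = FAKE_GEOCODER[r["address"]]
        out.append({**r, "lat": lat, "lon": lon})
    return out

def map_sink(rows):
    """Sink: sends data out of the flow, here as marker strings, not a real map."""
    return [f"{r['name']} @ ({r['lat']}, {r['lon']})" for r in rows]

def run_flow(source, processors, sink):
    """Run source -> processors -> sink, keeping the rows seen after each step."""
    rows = source()
    snapshots = [rows]          # data view after the source
    for process in processors:
        rows = process(rows)
        snapshots.append(rows)  # data view after each processor
    return sink(rows), snapshots

markers, snapshots = run_flow(
    events_source,
    [lambda rows: filter_old(rows, date(2007, 4, 1)), geocode],
    map_sink,
)
print(markers)
print([len(s) for s in snapshots])
```

With the sample data, the filter drops the March event, so only one marker reaches the sink. Keeping a snapshot per step also hints at how cached intermediate data could be replayed without re-querying the network, though how Marmite actually caches data is not specified in the slides.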