Making Mashups with Marmite Jeff Wong Jason I. Hong

Making Mashups
with Marmite
Jeff Wong
Jason I. Hong
Carnegie Mellon University
The Big Picture Problem
• Lots of content out there on the web
– But not always in a form amenable to your needs
– Ex. Easy to get a list of hotels in San Jose, not so easy to sort by distance to convention center
• Two observations:
– In many cases, all of the data and services people need already exist, but not connected together
– Unlikely that a web site can predict all possible needs
A Solution: Mashups
• Rapidly growing community of users creating “mashups” combining content from multiple web sites
– Ex. Housingmaps.com
– Ex. MySpace child predators
– Ex. Friendster locations
– Ex. Most popular videos on YouTube, Yahoo Video, …
• ProgrammableWeb.com statistics
– ~1500 mashups created since April 2005
– 356 open web-based APIs available
But Creating Mashups is Hard
• Requires lots of skill to create a mashup
– Ex. Housingmaps creator has PhD in computer science
– Ex. MySpace child predator list took months
• Requires programming expertise in many areas
– Web crawling
– Text parsing
– Pattern matching
– Databases
– HTML
Marmite
End-User Programming for Mashups
• Main idea: make it easy to create web mashups
• Use a dataflow approach connecting small operators
– Inspired by Unix pipes and Apple’s Automator
• Example:
– Get all events from Upcoming.org
– Filter out events that are too old
– Put them all onto a map
• Runs inside of a standard web browser
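The example flow above can be sketched as a chain of small operators, in the spirit of Unix pipes. This is a minimal Python sketch, not Marmite's code (its operators are written in JavaScript). The function names `fetch_events`, `drop_old`, and `to_map_markers` and the sample rows are hypothetical, and the source returns canned data rather than querying Upcoming.org.

```python
from datetime import date

# Hypothetical canned rows standing in for events fetched from an event service.
SAMPLE_EVENTS = [
    {"name": "Jazz night", "date": date(2007, 4, 20), "address": "1 Example St"},
    {"name": "Old concert", "date": date(2007, 1, 5), "address": "2 Sample Ave"},
    {"name": "Reception", "date": date(2007, 5, 1), "address": "3 Placeholder Rd"},
]

def fetch_events():
    """Source operator: emits rows (here, canned data rather than a web query)."""
    return list(SAMPLE_EVENTS)

def drop_old(rows, today):
    """Processor operator: keeps only events on or after `today`."""
    return [row for row in rows if row["date"] >= today]

def to_map_markers(rows):
    """Sink operator: turns rows into marker labels a map view could display."""
    return [f'{row["name"]} @ {row["address"]}' for row in rows]

def run_flow(today):
    """Connects the operators so each one's output feeds the next."""
    return to_map_markers(drop_old(fetch_events(), today))

markers = run_flow(today=date(2007, 4, 1))
```

Each operator takes rows in and hands rows (or a final result) out, so a new step can be added by inserting another function call in `run_flow`.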
[Figure: Marmite interface with three labeled areas: Set of Operators, Data Flow View, Data View]
Using Marmite (Envisioned)
• Extract content from one or more web pages
– names, addresses, dates, phone #, URLs
• Process it in a data flow manner
– filtering out values or adding metadata
– integrating with other data sources (similar to a database join operation)
• Direct the output to a variety of sinks
– databases, map services, text files, visualizations, web pages, or source code that can be further edited
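The integration step is described as similar to a database join. A minimal sketch of that idea follows, assuming rows are dictionaries and both tables share a key column. The helper `join_rows` and the hotel and distance tables are illustrative only and do not come from Marmite.

```python
def join_rows(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    # Index the right-hand table by key so each left row is matched by lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(rrow)
            merged.update(lrow)  # on a column-name clash, the left value wins
            joined.append(merged)
    return joined

hotels = [
    {"hotel": "Hotel A", "zip": "95113"},
    {"hotel": "Hotel B", "zip": "95110"},
    {"hotel": "Hotel C", "zip": "94000"},
]
# Hypothetical second source: distance from each zip code to the convention center.
distances = [
    {"zip": "95110", "miles": 1.2},
    {"zip": "95113", "miles": 0.3},
]

nearby = sorted(join_rows(hotels, distances, "zip"), key=lambda r: r["miles"])
```

This mirrors the hotel example from the opening slide: once the two sources are joined, sorting by distance is a one-line step. Rows with no match in the second source (Hotel C here) are dropped, as in an inner join.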
Marmite
• Motivation and Examples
• Features and Design Rationale
• User Evaluation
Features and Design Rationale
• Conducted a series of quick evaluations to understand design space and potential problems
– Automator
– Lo-fi prototypes
Automator
Informal Automator Evaluation
• Had three novices try three simple web-based tasks
– Warm-up task
– Traverse a set of web pages
– Download a set of images
• Some findings:
– Some difficulties knowing how to start and what to do next
– Little feedback about state of system between operations
– Difficult to iterate due to network speed issues
Lo-Fi Prototypes
• 6 paper prototypes with 20 participants
Design Solutions
• Problem: how to start and what to do next
• Solution: Suggest next actions
– Weak data typing to find types (addresses, numbers, etc.)
– Filter operators to only show relevant ones
– Suggest operators that might be applicable
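One way to picture weak typing plus operator suggestion is shown below. It is a rough sketch under stated assumptions: the regular expressions, the 80% threshold, and the operator catalog `OPERATOR_INPUTS` are all invented for illustration, and the slides do not say how Marmite actually detects types.

```python
import re

# Illustrative patterns only; real address detection is much fuzzier.
TYPE_PATTERNS = {
    "number": re.compile(r"^-?\d+(\.\d+)?$"),
    "url": re.compile(r"^https?://\S+$"),
    "address": re.compile(r"^\d+\s+\w+.*\b(St|Ave|Rd|Blvd)\b", re.IGNORECASE),
}

def guess_type(values):
    """Return the first type matching at least 80% of non-empty values, else 'text'."""
    values = [v.strip() for v in values if v.strip()]
    for type_name, pattern in TYPE_PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if values and hits / len(values) >= 0.8:
            return type_name
    return "text"

# Hypothetical operator catalog: operator name -> the column type it needs.
OPERATOR_INPUTS = {
    "Geocode address": "address",
    "Filter by number": "number",
    "Fetch linked page": "url",
}

def suggest_operators(table):
    """List operators whose required input type appears among the table's columns."""
    found = {guess_type(column) for column in table.values()}
    return [name for name, needed in OPERATOR_INPUTS.items() if needed in found]

table = {
    "where": ["12 Main St", "400 Oak Ave", "7 Elm Rd"],
    "price": ["1200", "950.50", "1800"],
    "notes": ["sunny", "near park", ""],
}
suggestions = suggest_operators(table)
```

The typing is "weak" in the sense that a guess only has to fit most values in a column, so a few messy cells do not hide an otherwise useful suggestion.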
Design Solutions
• Problem: little feedback about state of system between operations
• Solution: link data flow and data view together
– Many systems take program-centric view (ex. Automator) or data-centric view (ex. spreadsheets)
– Use hybrid data flow / data view, showing an operation and its effects together
– Data view usually “spreadsheet”, other views possible too (for example, maps)
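The idea of showing each operation next to its effect can be approximated by keeping a copy of the table after every step. The sketch below assumes operators are plain functions over lists of dict rows; `run_with_snapshots` and the sample operators are hypothetical names, not part of Marmite.

```python
def run_with_snapshots(rows, operators):
    """Apply each (label, function) operator in order, keeping the table after every step."""
    snapshots = [("input", list(rows))]
    for label, func in operators:
        rows = func(rows)
        snapshots.append((label, list(rows)))
    return snapshots

operators = [
    ("keep prices under 1000", lambda rows: [r for r in rows if r["price"] < 1000]),
    ("add price per room", lambda rows: [dict(r, per_room=r["price"] / r["rooms"]) for r in rows]),
]
listings = [
    {"price": 900, "rooms": 2},
    {"price": 1500, "rooms": 3},
]

# A UI could render each label (data flow) beside its table (data view).
for label, table in run_with_snapshots(listings, operators):
    print(f"{label}: {table}")
```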
Design Solutions
• Problem: difficult to iterate due to network speeds
• Solution: cache data, let people “replay” data
– Reload, pause, play
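Caching with an explicit reload can be sketched as a thin wrapper around a fetch function. This is an assumption-laden illustration: `CachingSource` and `fake_fetch` are made-up names, the fetch is simulated rather than a real network call, and pause is not modeled.

```python
class CachingSource:
    """Wraps a slow fetch function and remembers its results per URL."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.network_calls = 0

    def get(self, url, reload=False):
        """Return cached data for `url`; fetch only on a miss or an explicit reload."""
        if reload or url not in self._cache:
            self._cache[url] = self._fetch(url)
            self.network_calls += 1
        return self._cache[url]

def fake_fetch(url):
    """Hypothetical stand-in for a network request."""
    return [f"row from {url}"]

source = CachingSource(fake_fetch)
first = source.get("http://example.com/events")
replayed = source.get("http://example.com/events")               # replay: served from cache
refreshed = source.get("http://example.com/events", reload=True)  # reload: fetches again
```

With the cache in place, downstream operators can be edited and re-run against the same rows without waiting on the network each time.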
Other Design Findings
• Screen real estate issues
– Collapsible operators, leaving a readable label
Extracting Generic Content
• Can’t have pre-defined extractor operators for every possible web site
– Need a more general way of extracting data from pages
• Developed a generic wizard UI for selecting links
– Content from that set could be extracted via other operators
– Uses Solvent (MIT), an XPath-based algorithm for finding patterns in web pages
• Finds “groups” of related web content based on how HTML is structured
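To give a feel for grouping content by HTML structure, the sketch below buckets links by the chain of tags that encloses them, so links in the same repeated structure land in the same group. This is a simplified illustration using Python's standard `html.parser`, not Solvent's algorithm; the class `LinkGrouper` and the sample page are assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser

class LinkGrouper(HTMLParser):
    """Groups link URLs by the chain of enclosing tags, a rough stand-in for an XPath."""

    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open tags, outermost first
        self.groups = defaultdict(list)  # structural path -> hrefs found there

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return  # void elements never get a closing tag, so keep them off the stack
        self.stack.append(tag)
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.groups["/".join(self.stack)].append(href)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

def group_links(html):
    """Parse `html` and return {structural path: [hrefs]}."""
    parser = LinkGrouper()
    parser.feed(html)
    return dict(parser.groups)

PAGE = """
<html><body>
  <div><a href="/about">About</a></div>
  <ul>
    <li><a href="/event/1">Event 1</a></li>
    <li><a href="/event/2">Event 2</a></li>
    <li><a href="/event/3">Event 3</a></li>
  </ul>
</body></html>
"""
groups = group_links(PAGE)
```

Here the three event links share the path `html/body/ul/li/a`, while the navigation link sits alone under `html/body/div/a`, which is the kind of distinction a link-selection wizard could offer to the user.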
Marmite
Operators
• Operators have input types
– Operator uses this to guess which columns it wants
• Operators have output types
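A typed operator that picks its own input column might look like the following. It is only a sketch: the `Column` and `Operator` classes, the "first matching column" rule, and the toy `word_count` operator are assumptions made for illustration, not Marmite's data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Column:
    name: str
    type: str          # e.g. "address", "number", "text"
    values: List[str]

@dataclass
class Operator:
    name: str
    input_type: str                    # column type this operator consumes
    output_types: Dict[str, str]       # new column name -> type it produces
    func: Callable[[str], Dict[str, str]]

    def pick_column(self, columns):
        """Guess the input column: the first one whose type matches input_type."""
        for column in columns:
            if column.type == self.input_type:
                return column
        raise ValueError(f"{self.name}: no column of type {self.input_type!r}")

    def apply(self, columns):
        """Run func on each input value and append one typed column per output."""
        source = self.pick_column(columns)
        results = [self.func(value) for value in source.values]
        new_columns = [
            Column(out_name, out_type, [r[out_name] for r in results])
            for out_name, out_type in self.output_types.items()
        ]
        return list(columns) + new_columns

# Hypothetical operator: a toy transformation standing in for a real one such as geocoding.
word_count = Operator(
    name="Count words",
    input_type="text",
    output_types={"words": "number"},
    func=lambda text: {"words": str(len(text.split()))},
)

columns = [
    Column("price", "number", ["900", "1500"]),
    Column("title", "text", ["Sunny two bedroom", "Loft"]),
]
result = word_count.apply(columns)
```

Because the new column carries its own type ("number"), a later operator that consumes numbers could bind to it the same way.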
Implementation
• JavaScript (for underlying code) and Extensible Binding Language (XBL, for UI)
• Operators currently in JavaScript
– Ideally could be scriptable in any programming language
– Currently ~15 operators
Evaluation
• Informal user study with 6 people
– 2 novices
– 2 people with spreadsheet experience (formulas)
– 2 people with programming experience
• Tasks (in increasing difficulty)
– Warmup task showing how to retrieve a set of addresses and how to geocode an address
– Search for and filter out events further than a week away
– Compile a list of events from two event services and plot them on a map
– Recreate the housingmaps site
Results
• Three people able to complete all tasks in ~1 hour
– First two users confused about suggested actions (automatically popped up, made manual for other 4 users)
– Novice made some progress, not able to finish all tasks
• Able to re-create housingmaps in ~15 minutes
Marmite
More Results
• Biggest barrier was understanding the data flow
– Did not understand input and output concept
– Applied operators as one-off, did not realize that it was a static representation of flow
– Did not understand data flow and data view were linked
Future Directions
• Short-term
– Better screen-scraping operators
– More operators
– Better connection with web services (WSDL and REST)
– Better help for starting a data flow
• Long-term
– Intelligence analysis
– Better visualizations
– Location-based services
Conclusions
• Marmite, a tool for creating web-based mashups
– Extract content from one or more web pages
– Process it in a data flow manner
– Direct the output to a variety of sinks
• Hybrid data flow / data view
• User evaluation shows some promising results
Jeff Wong, Jason Hong, Making Mashups with Marmite: Re-purposing Web Content through End-User Programming, CHI 2007
Marmite
Types of Operators
• Sources
– Add data into Marmite by querying databases, extracting information from web pages, and so on.
• Processors
– Modify, combine, or delete existing rows. Example operators include geocoding (converting street addresses to latitude and longitude) and filtering. Processor operators might add or remove columns as well.
• Sinks
– Redirect the flow of data out of Marmite. Examples include showing data on a map, or saving it to a file or to a web page.
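The effect of processors on rows and columns, and of a file-style sink, can be sketched as below. Everything here is assumed for illustration: `FAKE_GEOCODER` is a hard-coded lookup with made-up coordinates standing in for a real geocoding service, and the function names are not Marmite's.

```python
import csv
import io

# Hypothetical lookup table standing in for a real geocoding service.
FAKE_GEOCODER = {
    "1 Example St": (40.44, -79.94),
    "2 Sample Ave": (37.33, -121.89),
}

def geocode(rows):
    """Processor that adds columns: attaches lat/lon, dropping rows it cannot resolve."""
    out = []
    for row in rows:
        coords = FAKE_GEOCODER.get(row["address"])
        if coords is not None:
            out.append({**row, "lat": coords[0], "lon": coords[1]})
    return out

def drop_column(rows, name):
    """Processor that removes a column from every row."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]

def to_csv(rows):
    """Sink: serializes rows as CSV text, as if saving them to a file."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Source: hard-coded rows in place of a database query or page extraction.
rows = [
    {"name": "Venue A", "address": "1 Example St"},
    {"name": "Venue B", "address": "nowhere"},
]
csv_text = to_csv(drop_column(geocode(rows), "address"))
```

In this run the geocoding processor both deletes a row (the unresolvable address) and adds two columns, and a second processor removes the original address column before the sink writes the result.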