slides

advertisement
ArtifactWebs: Navigable Product
Structures
Susan Finger and Sharad Oberoi
Carnegie Mellon University
Collaborative learning in design
• Goal
– Develop tools that encourage process
competence, constructive skills, and reflective
practice
• Web–based collaboration tool
• Meeting capture and summarization
• Navigable artifact webs
Collaborative learning in design
• Goal
– Develop tools that encourage process
competence, constructive skills, and reflective
practice
• Web–based collaboration tool
• Meeting capture and summarization
• Navigable artifact webs
Collaborative learning in design
• Assertions
– Most learning in design classes takes place in
team meetings and in individual activities
undertaken to help meet team goals
– Argumentation, co-construction, and reflection
are important elements of collaborative learning
Outline
• Setting
– Engineering design capstone course
– Ongoing project to understand collaborative
learning by student design teams
Engineering design capstone course
• Required for all accredited engineering
programs in US
• Commonly stated goal: Students should
synthesize all the engineering knowledge
they have acquired as undergraduates
Engineering design course projects
• The projects are usually:
– Team-based
– Company-sponsored (or client-driven)
– Non-competing (each team has an independent
project)
– Often taught by academics with little project
experience and even less design experience
• The grade is usually based on
– The quality of the final product
– The self-reported quality of the team interactions
Engineering design course projects
• Students
– are novices in their domain knowledge
– are novices in their knowledge of the design
process
– often judge their success by the grade they earn or
by the artifacts they produce
• Teacher
– rarely plans to use the team’s design directly
– usually does not attend group meetings
– often does not know if a feasible solution exists to
the design problem as stated
Engineering design course projects
• Team membership can change over time, so it
is difficult to keep track of the progress as
well as the options explored
• Inherent temptation to start the work over
from scratch, wasting time and resources
• These problems exist for both industry and
student teams, but are usually more severe
for student teams
Engineering design course projects
learning goals
activities
assessment
Collaborative learning research group
• Our focus is to develop tools that encourage
process competence, constructive skills,
and reflective practice
– Need to capture process to understand student
learning
– Collaboration tools designed for industry rarely
work well for student teams
– Sequence of two National Science Foundation
grants on collaborative learning in design
NSF Grant: Collaborative Learning
across Time and Space
• Goal: To take advantage of advances in
mobile computing to create collaboration
tools for student design teams
• Means: Create an environment that
– facilitates group collaboration for students
– enables faculty to peer into the collaborative
learning process
• Hook: Students design the tools they need
for their own collaboration
Kiva collaboration tool
• Takes advantage of students willingness to
send email, use IM, post on newsgroups,
send text messages
• Design goal: Create an interface that
students perceive to be equivalent to their
preferred communication modes; that is:
make it feel like chat
Design education testbed
• RPCS: Rapid prototyping of computer
systems
– Interdisciplinary, capstone design course
– Ambitious projects, e.g.
• GM companion car-driver interface
• Context aware cell-phone
• Wireless classroom on the Voyager science
boat
Capturing in-process data
• For 4 years, RPCS has used the Kiva for
team collaboration
– Light-weight collaboration tool
– Combines functions of e-mail and bboards
– Widely accepted and liked by student teams; it
feels likes chat and meets their needs
– Each year’s Kiva has hundreds of threads and
thousands of posts and files
We have 4 years of data of all the team
conversations and files that would normally
go through email or chat
Alan
I was asking which formula you wanted to use. which comes
down to which regression line we are using to map from the fuel
values to RPM. I used : y = 327.89x3 - 2194.6x2 a 5087.4x 2719.1. R2 = 0.9997
Kim
Ah I see.. To be quite honest, I was planning on discussing this
issue during class for tomorrow. For now we can just use the one
that you wrote above and we will talk more about it during
tomorrow's lecture. Thanks!!
Chris It also might be good to do something as a special case in the
formula so that we don't return a negative number for low values
of fuel consumption. It looks really weird in the dashboard. ;)
Alan
hehe.. Right. Thanks for testing that. :
Alan
Kim: I switched formulas. Now I am using this one. y = -1812.5x4
+ 6744.9x3 - 8322.2x2 + 4325.9x + 4.1796. R2 = 0.9993. This is
because I had to re-center the data to 0, and this new formula
works much better then the alternatives. This is engine 1 I think.
(the first set of numbers)
Alan
Sigh.. another new formula. Forgot the upper bound. y = 81.054x3 + 311.21x2 + 396.24x - 10.003. R2 = 0.9859
Kiva usage
• How do students use the Kiva?
–
–
–
–
Group coordination (18%)
Knowledge and work exchange (33%)
Preparation of deliverables (24%)
Other (25%)
ADEPT - Assessing Design Engineering Projects
Classes with Multi-disciplinary Teams
• Develop a physical infrastructure that enables the
capture of synchronous and asynchronous interactions
of student design teams
– The (complete) up-to-date record of all of a team’s
interactions will enable us to create ArtifactWebs that
integrate and summarize team communications
– The ArtifactWebs will provide traceability and
accountability for individual contributions to shared
knowledge
– The ArtifactWebs will enable facilitated improvement of
engineering design courses (i.e. the instructor will know
when to intervene)
Capturing in-process data
• This year, we collected audio files of
meetings
– Individual speaker
– Automated speech to text transcripts
– Observation and coding of all team meetings
We have 1 year of data of team
conversations (with many gaps)
Objectives
• To create ArtifactWebs that
– represent the state of the project based on the
artifacts described in the project documents
– enable designers to search and navigate to find
relevant information quickly and efficiently
– evolve as the artifact, and the documents about
it, evolve.
Design documentation
• Design project documents are generated by different
team members at different times during a project, so
no one is aware of everything that is in all the
documents
• Locating the right information among evolving
documents or reference documents can be time
consuming
• Even for teams with well-structured document
management systems, finding the correct paragraph
or document fragment for a given topic can be
difficult
Visionary Scenario
A student in the wearable computer class is working on
developing a text to speech module for a mobile device.
Someone tells her that last year’s class developed an OCR
(optical character recognition) module for the Trinetra project.
She accesses the Trinetra DesignWeb through the class web
space.
Visionary Scenario
She quickly searches (using standard search) to find the subweb for the
OCR module.
She then browses within the OCR module exploring various aspects of
the OCR design from the previous team.
Visionary Scenario
Finally she focuses on the modules on the mobile device. She reads the
segment of the final report on the OCR mobile module as well as some of
the supporting documents that led to the final decisions in the OCR
design.
Challenges
• Levels of abstraction
• Alternate views for different users
• Credibility of source (transcripts of
meetings vs. final reports)
• Identifying the structure of created
knowledge, especially for different versions
of the same document
• Identifying the design intent
Strategy overview
• Divide documents into topic segments
• Cluster segments by semantic similarity (e.g.
revisions of same paragraph or similar
paragraphs from different sources)
• Summarize each cluster
• Create a diagram that connects the key
words in the document summaries
• Develop graphical display algorithms that
enable users to search and navigate the
graphs to access the underlying documents
Segmentation
• Divide documents into topic segments
– use the explicit structure of the documents (table
of contents and internal headings)
– use existing text segmentation algorithms such
as TextTiling, which performs semantic clustering
of terms and topic identification based on
clustering
• Issue: Size of segments (big or little chunks)
Clustering
• Cluster segments by semantic similarity (e.g.
revisions of same paragraph or similar
paragraphs from different sources)
– InfoMagnets, created by Rosé, uses Latent
Semantic Analysis and document clustering to
automatically generate a bubble diagram, which
a user can then incrementally adjust through the
interface.
• Issue: Non-standard vocabulary across
disciplines
Summarizing
• Summarize each cluster
– Summarization is widely used in web searches
– Many potential summarization algorithms exist
• Issues: What types of summaries are useful
for designers and what types are useful for
creating the graphs
Graphing
• Create a diagram that connects the key
words in the document summaries
– Use co-word analysis to find relationships among
the key words in the document summaries
• Issues: Level of granularity and strength of
relationships
Visualizing
• Develop graphical display algorithms that
enable designers to search and navigate the
graphs to access the underlying documents
• Issues: Algorithm and interface design
documents
Auto-summarization
Design
teams
Summarized
fragments
Document
fragments
 Collocation
analysis
 Version matching
 Credibility mapping
 Document structure
and associated
metadata
Network of versioned
fragments
documents
Auto-summarization
Design
teams
Summarized
fragments
Document
fragments
 Collocation
analysis
 Version matching
 Credibility mapping
 Document structure
and associated
metadata
Network of versioned
fragments
documents
Auto-summarization
Design
teams
Summarized
fragments
Document
fragments
 Collocation
analysis
 Version matching
 Credibility mapping
 Document structure
and associated
metadata
Network of versioned
fragments
Conclusions
• Creating ArtifactWebs automatically from student design
documents is useful for organizing the information into
product structures.
• These product structures can be used for developing
computational environments that support systematic
modeling and also for characterizing design problems.
• ArtifactWebs can help us understand the content and
nature of information related to various aspects of the
artifact and how designers generate and refine it.
Questions?
Prior work
•
Previous work on automatic topic segmentation has
focused on segmentation of expository text written by
professionals
–
–
–
•
•
technical articles, such as journal papers
non-technical articles (e.g. blogs)
multi-party dialogues in a synchronous (e.g. chat) or
asynchronous environment (e.g. discussion-boards)
Student project reports do not come under any of these
categories
Nobody has evaluated student design reports that are
often characterized by their authors’ lack of experience
in technical writing
Proposed Solution
•
Navigable ArtifactWebs that will:
– Aid instructors and students alike by giving them a
bird’s eye view of the evolving design.
– Enable team members to explore the ideas that have
been generated during the design process, the
connections between the ideas, and the evolution of
the ideas.
– Direct the users to the relevant fragment of a
document that contains the detailed discussion of an
idea, in addition to searching the relevant topics using
a query-based approach.
Challenges
• Levels of abstraction
• Alternate views for different users
• Credibility of source (transcripts of
meetings vs. final reports)
• Identifying the structure of created
knowledge, especially for different versions
of the same document
• Identifying the design intent
Background
• Two broad categories of previous work in topic segmentation:
1. Lexical Cohesion Models: based on the central idea that the
segmentation of text is guided primarily by distribution of
terms used in it, in contrast to using cue words for the
purpose.
Examples: TextTiling (Hearst, 1997) and Latent Semantic
Analysis (Landauer and Dumais, 1997)
2. Content-oriented Models: based on the evaluation of
reoccurrence of topic patterns over multiple thematically
similar discourses.
Examples: Approaches based on Hidden Markov models
(Barzilay et al, 2004).
TextTiling (Hearst, 1997)
1. Block comparison approach: Adjacent pairs of text blocks are
compared for overall lexical similarity.
The sentences are grouped into blocks of size N/2 each, where
the more the terms are similar to each other in the two blocks,
the higher the lexical score we get at the gap between them.
2. Vocabulary introduction approach: Adjacent pairs of text
blocks are compared for overall lexical dissimilarity.
The sentences are grouped into blocks of size N/2 each, where
the more thematically unrelated terms are introduced, the
higher the lexical score we get at the gap between them.
TextTiling (Contd)
3. Lexical chain-based approach: Adjacent pairs of text
blocks are compared for identifying the number of active
chains, or terms that repeat within threshold sentences
and span the sentence gap.
This approach is based on the assumption that when a
term is repeated in a more or less short distance (called
a hiatus), a lexical chain is created between these two
occurrences. Thematic boundaries are set in the text at
places where the number of chains is minimal. This
approach attempts to segment texts at places where the
local cohesion is the lowest.
Museli (Arguello et al,2006)
• Used for evaluating dialogues.
• It combined evidence of topic shifts from lexical
cohesion with linguistic evidence such as syntactically
distinct features.
• It used unigrams, bigrams, POS-tagging and lexical
scores as the features to solve the segmentation
problem as a binary classification problem where each
contribution is classified as NEW_TOPIC if the
contribution introduces a new topic and SAME_TOPIC
otherwise.
Three degenerative approaches
a) Classifying all contributions as NEW_TOPIC (ALL),
b) Classifying no contributions as NEW_TOPIC (NONE),
c) Classifying contributions as NEW_TOPIC at uniform
intervals (EVEN), corresponding to the average
reference topic length
Experiments
•
Data Source: Documents created by students
in the Rapid Prototyping of Computer Systems
classes at Carnegie Mellon as our data-set.
Experiments
• Evaluation Metrics:
a) Pk measure determines the probability of
misclassifying two contributions a distance of
k contributions apart from each other by
determining if they constitute the same topic
segment or not. Lower Pk values are preferred
over higher ones.
b) F-measure refers to the weighted harmonic
mean of precision and recall.
Experiments (Contd)
•
Gold Standard: We use the section and sub-section
headings for student documents as tags for different
student document fragments and the boundaries
between them as the correct segmentation locations.
Experiments (Contd)
•
1.
2.
3.
Methodology:
TextTiling: Block comparison approach
Museli: Naïve Bayes classifier with an attribute
selection wrapper and the Chi-square test for ranking
the attributes using 10-fold cross-validation. [All along
we were careful not to include instances from the same
document in both the training and test sets on any fold
so that the results would not be biased.]
We trained a model with the top 1000 features, and
applied that trained model to the test data.
Three degenerative approaches
Results
TextTiling works best, while Museli worked worst.
Conclusions
• TextTiling was designed to partition texts into
contiguous, non-overlapping subtopic segments. So it
works best for segmenting student design documents.
• Naïve Bayes algorithm works best when there is much
more training data available than what we provided, and
where the documents are more stratified, so there is less
chance of overlapping words in each category.
• The degenerative approaches give us a baseline for what
happens with regular segmentation regardless of the
content.
Possible weak points in our approach
• Stoplist could be customized for each document.
• Lemmatization removed words with the same root,
but this may have implications on engineering
design documents.
Summary
• We evaluated the approaches considered successful for
automated topic segmentation of monolithic text written
by professional authors and multi-party dialogues to the
documents written by students as part of their projectbased courses.
Future/Ongoing Work
Future/Ongoing Work
• We plan to evaluate the implementation of content-oriented
models used successfully for a series of thematically similar
discourses (such as a number of newspaper articles about
similar events) for different versions of student design
documents. Since these approaches greatly rely on the
structure of the document for successfully implementing the
topic segmentation, having student documents characterized
by a lack of such explicitly defined structures would be
interesting.
• We believe that such evaluations will slowly but steadily help
us move towards achieving the vision of ArtifactWebs.
Thanks!
2002 NSF grant
• Team-Based Design: Collaboration Across
Time and Space
– Daniel P. Siewiorek, Susan Finger and Asim
Smailagic
– Combined Research and Curriculum
Development Grant
Kiva vision
• Rapid Prototyping of Computer Systems
– 25 students primarily from engineering, and also
from design, English, HCI designed, developed,
integrated, and tested an environment for student
team work
– In their visionary scenario, the Kiva is an
interactive physical and digital workspace that
addresses the requirements of interdisciplinary
teams. It is the digital equivalent of a dedicated
project room.
Kiva vision
First implementation
Second iteration
Assessment strategy
•
•
•
•
Pre and post essays on design process
Pre and post domain knowledge tests
Focus groups
Coding and analysis of the posts
Proposed Solution
•
Navigable ArtifactWebs that will:
– Aid instructors and students alike by giving them a
bird’s eye view of the evolving design.
– Enable team members to explore the ideas that have
been generated during the design process, the
connections between the ideas, and the evolution of
the ideas.
– Direct the users to the relevant fragment of a
document that contains the detailed discussion of an
idea, in addition to searching the relevant topics using
a query-based approach.
Potential Benefits to Members
• Ability to find information in documents from
prior designs
• Ability to track a design-in-progress through
the completeness of its DesignWeb
Download