Pollock - ECE/CIS

advertisement
Lori Pollock
Professor, CIS
Program Analysis, Software Development &
Maintenance Tools, Optimizing Compilers
‘81
’81-’86
B.S. CS and Econ, Allegheny
PhD in CS, U of Pittsburgh
Married Mark
’86-’90
Assistant Prof, Rice U
Lauren ‘88; Lindsay ‘90
’91Assistant, Associate, Full Prof UD CIS
Matt ’95
Today: Lauren & Lindsay here at UD, Matt 15, 3 PhD
students, 3 masters students, and a few undergraduate
researchers
What I do here at UD
• Research
– Software Engineering and Compilation Lab (Hiperspace)
• 213 Smith Hall
– Collaborations
• Vijay Shanker (UD CIS), Terry Harvey (UD CIS), Lisa Marvel (Army
Research Lab), Martin Swany (UD CIS), Guang Gao (UD ECE)
– Funding
• Primarily NSF grants; previously some Army funding
• Graduate Teaching
–
–
–
–
CISC 672 Compilers
CISC 673 Program Analysis and Transformations
CISC 879 Software Testing and Maintenance
CISC 879 Software Tools and Environments
• Undergraduates
– Programming XO laptops for local middle school teachers
– Study abroad programs
What I do outside UD
• Associate Editor, Transactions on Software Engineering
and Methodology (TOSEM)
• Computing Research Association (CRA)’s Committee on
the Status of Women in Computer Research (CRA-W)
• Mentoring – speaker at mentoring
workshops for undergrads, grads,
assistant and associate profs, and
industry lab researchers
• Program committees, conf org,
NSF panels, paper reviews,… (typical of university
researchers)
Current Students in Training
Antony Danalis
PhD
Xiaoran Wang
PhD
Giri Sridhara
PhD
Masters: Divya, Suparna, Poonam
Current Undergraduates: Sana Malik, Michelle Allen
Recently Completed PhD
2007-10
David Shepherd
Postdoc, Startup
Ben Breech
Army
Research
Lab
Sara Sprenkle
Assistant Prof
Washington & Lee U
Mike Jochen
Assistant Prof
East Stroudsburg U
Emily Gibson Hill
Assist Prof, Montclair State U
Overview: Research Projects
Natural Language
Analysis of Programs
Giri, Xiaoran
Shanker, Hill
Green Software
Engineering
Clause
Winbladh
Testing Web
Applications
Program Analysis
Sprenkle
Compiler Technology
Optimizing
Cluster
Parallel Programs
Antony,
Swany
Identifying
Code vulnerabilities
To Malware
Software Tools…………..Testing…..........Compilers….……Parallel Computing
Optimizing Cluster Parallel Programs
Research Problem - How can scientific codes be scaled
to a cluster of many CPUs?
Major Challenge – Communication Costs
Approach and Contributions:
An integrated system to hide communication latency
-Surveyor: Collect “knowledge” of cluster
-Compiler: analyze dependencies and transform to create maximal
communication/computation overlap
-Communication Library: Use a companion library to MPI
ASPhALT: Automatic System for
Parallel AppLication Transformations
Contribution: FIRST to cluster-optimize MPI codes
Testing Web Applications
Web Application Structure
Client
(HTML)
Server
(Java)
Browser
Front End
Database
(MySql)
Back End
• Combination of
– Stand-alone applications
– GUIs and Database applications
– Distributed applications
• Numerous technologies and components
Traditional Software Testing Process
Hard to obtain when
testing web applications!!
Application
Representation
Test Case
User-session-based Testing
Application
Specification
Application
Implementation
Expected
Results
Generator
Test
Cases
Test
Cases
Replay Tool
Actual
Results
Oracle
Pass/ Fail
User-session-based Testing Process
User 1:
register.jsp?name=ss&pass=tst
User-session-based
login.jsp?name=ss&pass=tst
logout.jsp
Users
Beta Web Application
(v.0.9) Deployment
Web Application
Implementation (v.1.0)
Expecte
d
Results
User
Log User Sessions Create/Reduce
Create test
Requests
cases
Test
Cases
Test
Test Cases
Cases
Replay Tool
Actual
Results
Oracle
Pass/ Fail
Maintenance Testing for Web
Applications
Research Problem: How can we exploit user session logging for testing
of web applications after initial deployment, with minimal tester effort?
Contributions: Scalable, practical, automated structural testing
framework for web applications
* Test case generation
* Test suite reduction
* Test oracles
* Test coverage criteria in terms of URLs, parameters, values
Analyzing the Names in Software
Research Problem
- 60-90% software costs are in reading and navigating large software
systems to fix bugs and add new features. Can we help with automation
of search, navigation, location of relevant code?
- Key: Programmers leave clues of their intent as they choose names.
Focus on actions
-Correspond to verbs
-Verbs need Direct Object
- Phrases more useful
Proposed Approach
– Develop, extend, and apply natural language-based analysis to the
identifier names and comments
Contribution - Aid understanding, debugging, maintenance, development
Our Research Focus and
Impact
Search
Exploration
Understanding …
Software Maintenance Tools
NLPA
Natural Language Analysis
Word splitting
Word relations
(synonyms,
antonyms, …
Part of speech
tagging
Abbreviations…
How do SE tool users search code now?
Determine
Relevance of
Results
(Re)formulate
Query
Query
User
Search
Results
Search Method
1. User formulates query
2. Query executed
by search method
3. User views search
results
4. Repeat as necessary
Source
Code
Our focus:
Natural language (NL) queries
User faces 2 challenges:
• Decide what query words to search for
• Determine whether the results are relevant
“compile report” vs.
“compil*report”
Our Contextual Search
Process
(Re)formulate
Query
Query
Search Method
Source
Code
Informatio
Information
n
Extraction
Extraction
Process
Process
Determine
Relevance of
Results
User
Search Method
NL Phrase
Mapping
Partial
Source
Phrase
Code
Matching
Hierarchical
Search
Results
Another Tool - UD-Summarize: An Example
/* Update linear edge view.
If width <= 1, draw line to given graphics2d,
else draw polyline to graphics2d */
Challenges in Summary Comment
Generation for Methods
• Method name does not always suffice as
summary
Ex: main(), returnPressed()
• There is no notion of what constitutes an
important statement for a method summary
• A selected statement may have too little or
too much detail for a succinct summary
• Generated phrases should be
understandable without reading the
associated code and its context within the
method body
UD-Summarize: The Process
Method M signature and
body
Build required structural and linguistic
program representations
UD-GenPhrase: Generate Phrases for Selected
Statements and Combine Phrases
Summary comment for
M
Developing Basic NL Analyses
Search
Exploration
Understanding …
Software Maintenance Tools
NLPA
Natural Language Analysis
Word splitting
Word relations
(synonyms,
antonyms, …
Part of speech
tagging
Abbreviations…
Illustrating some issues:
Extracting Clues from Signatures
1.
2.
3.
4.
Split Name into Words
Part-of-speech tag method name
Chunk method name
Identify Verb and Direct-Object (DO)
public UserList getUserListFromFile( String path ) throws IOException {
POS Tag
get
get
try {
User
List
From
<verb>
<adj>
<noun>
<prep> File <noun>
File tmpFile
= new File(
path );
return parseFile(tmpFile);
Chunk
User List
From File
<verb
phrase>
<noun
phrase>
<prep phrase>
} catch( java.io.IOException e ) {
throw new IOrException( ”UserList format issue" + path + " file " + e );
•
•
•
Automatic Abbreviation
Expansion
Don’t want to miss relevant
code with abbreviations
Given a code segment, identify character sequences that are short forms
and determine long form
non-dictionary word
no boundary
Approach: Mine expansions from code [MSR 08]
1.Split Identifiers:
2.Identify non-dictionary
words
3.Determine long form
Issues with a
Simple Dictionary Approach
• Manually create a lookup table of common
abbreviations in code
- Vocabulary evolves over time, must maintain
table
- Same abbreviation can have different
Control
Flow
Graph
expansions depending
on
domain
AND context:
?
cfg
Context-Free Grammar
configuration
configure
Overview: Research Projects
Natural Language
Analysis of Programs
Giri, Xiaoran
Shanker, Hill
Green Software
Engineering
Clause
Winbladh
Testing Web
Applications
Program Analysis
Sprenkle
Compiler Technology
Optimizing
Cluster
Parallel Programs
Antony,
Swany
Identifying
Code vulnerabilities
To Malware
Software Tools…………..Testing…..........Compilers….……Parallel Computing
What are we doing now?
• Emily
– Developing a general software word usage model
– Implementing SWUM
– Evaluating for search
• Giri
– Developing techniques for automatic comment generation
• Which statements to include in summary content?
• How to generate phrases for that content?
– Providing feedback on SWUM refinement
• Divya
– Analyzing specific Java statement structures
– Developing templates for comment phrase generation
What have we learned overall?
• Evaluation studies indicate
Natural language analysis has far more potential to
improve software maintenance tools than we initially
believed
• Existing technology falls short
Synonyms, collocations, morphology, word
frequencies, part-of-speech tagging, AOIG
• Keys to further success
Improve recall
Extract additional NL clues
Another of our Tools: Dora
the Program Explorer*
Query
Natural Language Query
• Maintenance request
• Expert knowledge
• Query expansion
Program Structure
• Representation
• Current: call graph
• Seed starting point
Dora
Relevant Neighborhood
• Subgraph relevant to query
* Dora
Relevant
Neighborhood
comes from exploradora, the Spanish word for a female explorer.
Download