Combining Bibliometric and Knowledge
Elicitation Techniques to Map a
Knowledge Domain
Katherine W. McCain*, June M. Verner, Gregory W. Hislop,
William Evanco, & Vera Cole.
College of Information Science & Technology
Drexel University
KATE'S PHILADELPHIA BRAND BIBLIOMETRICS
PHILADELPHIA brand Bibliometrics
Organizations
ISI: Gene Garfield, Henry Small
Drexel: Belver Griffith, Howard White, Chaomei Chen, Xia Lin, Carl Drott, Jackie Mancall, and a host of grad students
Center for Research Planning: Dick Klavans, Len Simon
Major themes:
citation analysis/core literatures;
aging of scholarly literatures;
single-period and longitudinal studies of scholarly literatures and fields;
real-time, on-the-fly mapping of literatures, fields, paradigm shifts, vocabulary structures, etc.;
bibliometric applications in collection management, competitive intelligence, institutional evaluation, etc.
AGENDA
Introduction: Domain analysis & software engineering
Mapping methods:
Author Cocitation Analysis
Knowledge Elicitation – card sorting
Results
ACA clusters & map
PFNet author network
Card sorting clusters & map
Comparisons of ACA and KE results
Conclusions
DOMAIN ANALYSIS
SYSTEMS ANALYSIS: the task of identifying the operations and objects needed to specify information processing in a particular application domain
INFORMATION SCIENCE: the study of the field (knowledge domain) as a thought or discourse community. It focuses on such topics as knowledge organization, structure, cooperation patterns, language and communication forms, information systems, and relevance criteria as a way of understanding these communities (Hjørland & Albrechtsen, 1995)
An Aside On DISCOURSE COMMUNITY
A group (likely to be geographically dispersed) who share:
a common public goal or goals
a body of specialized knowledge
mechanisms of intercommunication and participation
a genre (e.g. scholarly journal)
a specialized vocabulary
Adapted from John Swales, Genre Analysis (Cambridge, 1990)
SOFTWARE ENGINEERING
The establishment and use of sound engineering principles in order to obtain economically software that is reliable and works efficiently on real machines.
The technological and managerial discipline concerned with systematic production and maintenance of software products that are developed and modified on time and within cost estimates.
DOMAIN ANALYSIS OF SOFTWARE ENGINEERING
a study of the journal literature of software engineering, based on both author referencing patterns and index term assignments
a study of the factors that affect the “visibility” of software engineering authors
an INSPEC-based co-descriptor mapping of software engineering
a conjoint study of the intellectual and cognitive structure of software engineering
a citation content analysis of Brooks’ The Mythical Man-Month
TWO APPROACHES TO MAPPING SE
BIBLIOMETRICS: Cocited author mapping uses the patterns of co-occurrence of authors’ names in reference lists to examine the intellectual structure of scholarly literatures and, by extension, the fields that produce those literatures
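A minimal sketch of the cocitation idea, assuming the reference lists are already in hand (the study itself retrieved pairwise counts from SCISEARCH rather than counting reference lists locally). Two selected authors are cocited once for every citing paper whose reference list names both. The function name `author_cocitations` and the sample papers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def author_cocitations(reference_lists, authors):
    """Count how often each pair of selected authors is cited together.

    reference_lists: one list of cited author names per citing paper.
    authors: the selected author set (60 authors in this study).
    A pair is counted at most once per citing paper.
    """
    selected = set(authors)
    counts = Counter()
    for refs in reference_lists:
        # Keep only selected authors; the set counts each author once per paper
        cited = sorted(selected.intersection(refs))
        for pair in combinations(cited, 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists from three citing papers
papers = [
    ["BROOKS FP", "PFLEEGER S", "BOEHM BW"],
    ["BROOKS FP", "BOEHM BW", "BROOKS FP"],
    ["PFLEEGER S", "BASILI V", "UNLISTED X"],
]
selected_authors = ["BASILI V", "BOEHM BW", "BROOKS FP", "PFLEEGER S"]
counts = author_cocitations(papers, selected_authors)
print(counts[("BOEHM BW", "BROOKS FP")])  # 2
```

Counting each author once per paper mirrors the search-based approach, where a citing paper is retrieved once no matter how many of an author's works it cites.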
KNOWLEDGE ELICITATION: the process of collecting from a human source of knowledge, information that is thought to be relevant to that knowledge [Cooke].
Card sorting: structural analysis of mental models elicited via sorting named cards into piles
AUTHOR COCITATION ANALYSIS
AUTHOR SELECTION: 60 authors highly cited in texts and in the core SE literature were selected for study
COCITATION DATA GATHERED: cocitation counts retrieved from SCISEARCH, 1990 – 1997
ANALYSIS:
Raw cocitation counts – PFNets
Correlation matrix – cluster analysis & multidimensional scaling
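The step from raw counts to the correlation matrix can be sketched as follows, assuming Pearson correlation between the authors' cocitation profiles (the columns of the raw matrix). ACA studies handle the empty diagonal in different ways; purely as an assumption, this sketch fills each diagonal cell with the author's largest off-diagonal count. Function names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant profile; reported as 0.0 here
    return sxy / sqrt(sxx * syy)


def cocitation_correlations(raw):
    """Correlate every pair of authors' cocitation profiles (matrix columns)."""
    n = len(raw)
    m = [row[:] for row in raw]  # copy so the caller's matrix is unchanged
    for i in range(n):
        # Assumption: diagonal = author's largest off-diagonal cocitation count
        m[i][i] = max(m[i][j] for j in range(n) if j != i)
    cols = [[m[i][j] for i in range(n)] for j in range(n)]
    return [[pearson(cols[a], cols[b]) for b in range(n)] for a in range(n)]


# Hypothetical raw counts: authors 0-1 and 2-3 are mostly cocited in pairs
raw = [
    [0, 10, 1, 0],
    [10, 0, 0, 1],
    [1, 0, 0, 12],
    [0, 1, 12, 0],
]
r = cocitation_correlations(raw)
print(round(r[0][1], 2), round(r[0][2], 2))
```

Correlating whole profiles, rather than using raw counts directly, groups authors who are cocited with the same other authors even when their own pairwise count is modest.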
60 AUTHORS
Abdel-Hamid, Tarek K.
Albrecht, Allan J.
Basili, Victor R.
Beizer, Boris
Biggerstaff, Ted J.
Boehm, Barry W.
Booch, Grady
Brooks, Frederick P., Jr.
Card, David N.
Clarke, Lori A.
Coad, Peter
Curtis, Bill
Davis, Alan M.
DeMarco, Tom
Dijkstra, Edsger W.
Fagan, M. E.
Fenton, Norman E.
Garlan, David
Ghezzi, Carlo
Gilb, Tom
Glass, Robert L.
Goldberg, Adele
Gomaa, Hassan
Grady, Robert B.
Harrison, W.
Hoare, C. A. R.
Humphrey, Watts S.
Jackson, Michael A.
Jacobson, Ivar
Jones, T. Capers
Kaiser, G. E.
Kemerer, C. F.
Kernighan, Brian W.
Kitchenham, Barbara A.
Lehman, M. M.
McCabe, Thomas J.
Meyer, Bertrand
Mills, Harlan D.
Musa, John D.
Myers, Glenford J.
Parnas, David L.
Pfleeger, Shari L.
Pressman, Roger S.
Prieto-Diaz, R.
Ramamoorthy, C. V.
Rombach, H. D.
Rumbaugh, James
Selby, R. W.
Shaw, Mary
Shepperd, M.
Shneiderman, Ben
Sommerville, Ian
Tichy, W. F.
Tracz, Will
Wasserman, A. I.
Weiser, M.
Weyuker, Elaine J.
Wing, Jeannette M.
Yourdon, Edward
Zave, Pamela
Data Gathering for ACA
[Figure: a retrieval strategy (e.g., CA = BROOKS FP AND CA = PFLEEGER S) retrieves the source papers citing both authors; the number of such papers is entered in the raw cocitation matrix.]
* Multiple forms of authors' names were used in the search strategies.
Analytical Tools for Raw Cocitation counts
Analytical Tools for Proximity Matrix
ACA ANALYSES
Raw Cocitation Matrix
PFNet: links nodes (authors) based on their single highest co-occurrence counts. The result is generally a network structure with some authors appearing as major foci (many links to others) representing specialties
Correlation Matrix
Hierarchical cluster analysis: 8-cluster solution identifies major subject clusters
Multidimensional scaling: 2-dimensional map shows overall structure and major themes
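One common way to build a PFNet from raw cocitation counts is the PFNet(r = infinity, q = n - 1) parameterization sketched below; the slides do not state which r and q values were used, so treat this as an assumption. A link between two authors survives only if no indirect path joins them through links that are all stronger. The function `pfnet_links` and the matrix are hypothetical.

```python
def pfnet_links(sim):
    """PFNet(r = infinity, q = n - 1) on a symmetric similarity matrix.

    Keeps link i-j only if no indirect path joins i and j through links
    that are all stronger than the direct link. Diagonal values are ignored.
    """
    n = len(sim)
    # best[i][j]: strongest weakest-link ("maximin") strength over all i-j paths
    best = [[sim[i][j] if i != j else 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(best[i][k], best[k][j])
                if via_k > best[i][j]:
                    best[i][j] = via_k
    # best[i][j] >= sim[i][j] always, so equality means no stronger path exists
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if sim[i][j] > 0 and sim[i][j] >= best[i][j]
    ]


# Hypothetical raw cocitation counts for four authors
sim = [
    [0, 10, 3, 0],
    [10, 0, 8, 1],
    [3, 8, 0, 5],
    [0, 1, 5, 0],
]
links = pfnet_links(sim)
print(links)  # [(0, 1), (1, 2), (2, 3)]
```

Here the 0-2 link (3) is dropped because the path 0-1-2 has no link weaker than 8, and 1-3 (1) is dropped via 1-2-3; with tied counts several links can survive, which is how "foci" with many links arise.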
Knowledge Elicitation Methods
Interviews and observation
Process tracing (e.g. protocol analysis)
Conceptual techniques
Card sorting is a conceptual technique that can be done alone or combined with semi-structured interviews.
Card Sorting
Software engineers contacted via e-mail and invited to participate in the study
Task: sort cards bearing authors’ names into piles, label the piles, and complete a short questionnaire
As many piles as desired
Piles with single authors
Pile of “don’t know” or “aren’t software engineers”
46 respondents participated in the postal mail study (a few via interviews)
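The raw co-pile counts behind the card-sorting analysis can be tallied with a sketch like the one below, assuming each respondent's sort is recorded as a mapping from pile label to author names. The slides do not say how the "don't know" pile was treated, so `skip_labels` is a hypothetical option; the respondents and piles are invented.

```python
from collections import Counter
from itertools import combinations


def copile_counts(sorts, skip_labels=()):
    """Sum, over respondents, how often each pair of authors shares a pile.

    sorts: one dict per respondent, mapping a pile label to author names.
    skip_labels: hypothetical option naming piles to ignore (case-insensitive).
    """
    skip = {label.lower() for label in skip_labels}
    counts = Counter()
    for piles in sorts:
        for label, members in piles.items():
            if label.lower() in skip:
                continue
            # Single-author piles yield no pairs, so they add nothing here
            for pair in combinations(sorted(set(members)), 2):
                counts[pair] += 1
    return counts


# Two invented respondents; pile labels are free text chosen by each respondent
sorts = [
    {"Metrics": ["BASILI", "FENTON", "PFLEEGER"],
     "Formal Methods": ["DIJKSTRA", "HOARE"],
     "Don't know": ["TRACZ", "ZAVE"]},
    {"Measurement": ["FENTON", "PFLEEGER"],
     "Classics": ["DIJKSTRA", "HOARE"],
     "Brooks, F.": ["BROOKS"]},
]
counts = copile_counts(sorts, skip_labels=["Don't know"])
print(counts[("FENTON", "PFLEEGER")])  # 2
```

Because only co-membership is counted, differently worded labels ("Metrics" vs. "Measurement") still contribute to the same author pair.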
Card Sorting Procedure
[Figure: a stack of cards with authors' names was sent to respondents with instructions; cards were sorted into piles and labeled based on respondents' perceptions (example pile labels: Metrics, Formal Methods, Brooks, F., Don't Know); pile memberships were tallied as raw "co-pile" counts.]
CARD SORTING ANALYSES (correlation matrix)
Hierarchical cluster analysis – 8-cluster level
Multidimensional scaling – 2-dimensional map
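The hierarchical clustering step, which both the ACA and the card-sorting correlation matrices go through, can be sketched as follows. The linkage rule is not given in the slides; complete linkage (two clusters are as similar as their least similar pair of members) is an assumption here, and merging stops once k clusters remain (k = 8 in the study). The function `agglomerate` and the correlation values are hypothetical.

```python
def agglomerate(sim, k):
    """Merge items bottom-up until k clusters remain.

    sim: symmetric similarity matrix (e.g., author correlations).
    Assumption: complete linkage, i.e. two clusters are as similar as
    their least similar pair of members.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best_pair, best_score = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = min(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best_score is None or score > best_score:
                    best_pair, best_score = (a, b), score
        a, b = best_pair
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical correlations for four authors
names = ["BASILI", "FENTON", "DIJKSTRA", "HOARE"]
corr = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
for cluster in agglomerate(corr, 2):
    print([names[i] for i in cluster])
```

Running the same function on the ACA and card-sorting matrices with the same k makes the two cluster solutions directly comparable.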
[Figure: Cocitation Map of 60 Highly Cited Authors in Software Engineering, 1990 - 1997. Region labels: SW ARCHITECTURE/SW REUSE; SW PROJECT MGT; OBJECT-ORIENTED ANALYSIS & DESIGN/PROGRAMMING; SYSTEMS ANALYSIS & DESIGN; SW PERFORMANCE; SW METRICS; FORMAL APPROACHES TO DEVELOPMENT/FORMAL METHODS; SW TESTING/RELIABILITY. Other labels: MICRO LEVEL, MACRO LEVEL, LOW FORMAL.]
[Figure: PFNet of Raw Cocitation Counts for 60 Software Engineering Authors, 1992 - 1997.]
Comparisons: ACA and KE
Cluster similarity – most authors fall into similar clusters in terms of membership; there are some differences in labeling
There are differences between the way authors’ works are cited and the way the authors are perceived and labeled (e.g., an author may be known for textbook writing but cited for specific textbook content)
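One way to put a number on "similar clusters in terms of membership" is the Jaccard overlap (shared members divided by all members) between a card-sorting cluster and its cocitation counterpart. The sketch below applies it, as an illustration, to the two SW METRICS member lists on the cluster-comparison slide; Jaccard is an assumption chosen here, not a measure the authors report.

```python
def jaccard(a, b):
    """Shared members divided by all members of two clusters."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# SW METRICS members as listed on the cluster-comparison slide
card_sort_metrics = {
    "JONES", "BASILI", "PFLEEGER", "ROMBACH", "CARD", "MCCABE", "GRADY",
    "FENTON", "KITCHENHAM", "HARRISON", "SELBY", "SHEPPERD", "KEMERER", "ALBRECHT",
}
cocitation_metrics = {
    "BASILI", "PFLEEGER", "ROMBACH", "CARD", "MCCABE", "GRADY", "FENTON",
    "KITCHENHAM", "HARRISON", "SELBY", "SHEPPERD", "WEYUKER", "KEMERER", "ALBRECHT",
}
print(sorted(card_sort_metrics - cocitation_metrics))  # ['JONES']
print(sorted(cocitation_metrics - card_sort_metrics))  # ['WEYUKER']
print(round(jaccard(card_sort_metrics, cocitation_metrics), 3))  # 0.867
```

The set differences show exactly which authors move between clusters, which is often more informative than the single overlap score.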
CARD SORTING CLUSTERS
SW METRICS: JONES, BASILI, PFLEEGER, ROMBACH, CARD, MCCABE, GRADY, FENTON, KITCHENHAM, HARRISON, SELBY, SHEPPERD, KEMERER, ALBRECHT
SE MANAGEMENT/PROCESS MODELING: BOEHM, GILB, CURTIS, HUMPHREY, ABDEL-HAMID, LEHMAN
COCITATION CLUSTERS
SW METRICS: BASILI, PFLEEGER, ROMBACH, CARD, MCCABE, GRADY, FENTON, KITCHENHAM, HARRISON, SELBY, SHEPPERD, WEYUKER, KEMERER, ALBRECHT
SE PROJECT MANAGEMENT: BOEHM, GILB, CURTIS, HUMPHREY, ABDEL-HAMID, LEHMAN, BROOKS
CARD SORTING CLUSTERS
Clusters: FORMAL METHODS/SW ARCHITECTURE; OBJECT ORIENTED PROGRAMMING & DESIGN; SE METHODOLOGIES/SE TEXTS
Authors: GARLAN, RAMAMOORTHY, DIJKSTRA, HOARE, PARNAS, SHAW, WING, ZAVE, GHEZZI, KERNIGHAN, BOOCH, RUMBAUGH, JACOBSON, MEYER, COAD, GOLDBERG, PRESSMAN, SOMMERVILLE, DEMARCO, YOURDON, WASSERMAN, GOMAA, JACKSON, BROOKS, GLASS, MILLS, MYERS, DAVIS
COCITATION CLUSTERS
Clusters: FORMAL METHODS/FORMAL APPROACHES; OO ANALYSIS & DESIGN/PROGRAMMING; SYSTEMS ANALYSIS & DESIGN
Authors: JONES, DAVIS, DIJKSTRA, HOARE, PARNAS, SHAW, WING, ZAVE, GHEZZI, KERNIGHAN, BOOCH, RUMBAUGH, JACOBSON, MEYER, COAD, GOLDBERG, SHNEIDERMAN, PRESSMAN, SOMMERVILLE, DEMARCO, YOURDON, WASSERMAN, GOMAA, JACKSON
CARD SORTING CLUSTERS
SW REUSE: BIGGERSTAFF, TRACZ, PRIETO-DIAZ
SW TOOLS & ENVIRONMENTS: KAISER, TICHY
COCITATION CLUSTERS
SW ARCHITECTURE/SW REUSE: BIGGERSTAFF, TRACZ, PRIETO-DIAZ, KAISER, TICHY, GARLAN
Comparisons: ACA and KE
Map similarity – similar distribution of authors and clusters along the X-axis (r = 0.73) but not along the Y-axis (r = -0.08)
The most important structural theme in Software Engineering, the “micro ↔ macro” dimension, exists both in citation patterns and in perceptions of the field by citing authors. Along the Y-axis, citing patterns focus on the content of authors’ work, while general perceptions include more aspects of the authors’ personae.
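The X- and Y-axis figures are correlations between the two maps' coordinates for the same authors. A minimal sketch, assuming the two MDS solutions are already aligned: MDS axes are defined only up to reflection and rotation, so reflecting one map flips the sign of r without changing its size. The function `axis_r` and all coordinates are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def axis_r(map_a, map_b, axis):
    """Correlate one coordinate axis (0 = X, 1 = Y) across two maps."""
    names = sorted(map_a)  # pair up the same authors in the same order
    return pearson([map_a[name][axis] for name in names],
                   [map_b[name][axis] for name in names])


# Hypothetical 2-D coordinates for four authors on each map
aca = {"BASILI": (1.0, 0.2), "BOEHM": (0.5, -0.4),
       "BOOCH": (-0.6, 0.7), "HOARE": (-1.2, -0.9)}
ke = {"BASILI": (0.9, -0.3), "BOEHM": (0.7, 0.5),
      "BOOCH": (-0.4, 0.1), "HOARE": (-1.1, -0.2)}
r_x = axis_r(aca, ke, 0)

# Reflecting the KE map's X-axis flips the sign of r, not its size
ke_reflected = {name: (-x, y) for name, (x, y) in ke.items()}
print(round(r_x, 2), round(axis_r(aca, ke_reflected, 0), 2))
```

This is why a near-zero Y-axis correlation is the meaningful contrast here: unlike a large negative value, it cannot be explained away by one map being a mirror image of the other.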
Conclusions
Boehm, Basili, Booch, and Hoare are central figures in the Software Engineering R&D literature; we can identify other authors as probable linkers between research specialties.
The main organizing principle in SE is a continuum of activities related to the process of software design, development, and evaluation.
Key specialties in Software Engineering in the 1990s included Object-Oriented Programming, Analysis & Design; Formal Methods; Software Reuse; Software Testing & Reliability; Software Process Management; and Software Metrics.
Conclusions
ACA (mapping, PFNets) and KE (card sorting) provide complementary views of software engineering. KE methods increase our understanding of the domain by capturing subjects’ mental models of the domain and providing additional information about mapped entities.
ACA and KE provide useful cross-validation. The structure of the literature as seen through networks of author indebtedness (citation of previous work) is a good reflection of software engineers’ mental models of the field, the place of the (cited) authors, and the relationships among their contributions.