Combining Bibliometric and Knowledge Elicitation Techniques to Map a Knowledge Domain

Katherine W. McCain*, June M. Verner, Gregory W. Hislop, William Evanco, & Vera Cole
College of Information Science & Technology, Drexel University

KATE'S PHILADELPHIA BRAND BIBLIOMETRICS

PHILADELPHIA brand Bibliometrics

Organizations
- ISI: Gene Garfield, Henry Small
- Drexel: Belver Griffith, Howard White, Chaomei Chen, Xia Lin, Carl Drott, Jackie Mancall, and a host of grad students
- Center for Research Planning: Dick Klavans, Len Simon

Major themes
- citation analysis/core literatures
- aging of scholarly literatures
- single period and longitudinal studies of scholarly literatures and fields
- real-time, on-the-fly mapping of literatures, fields, paradigm shifts, vocabulary structures, etc.
- bibliometric applications in collection management, competitive intelligence, institutional evaluation, etc.

AGENDA
- Introduction: Domain analysis & software engineering
- Mapping methods:
    - Author Cocitation Analysis
    - Knowledge Elicitation: card sorting
- Results
    - ACA clusters & map
    - PFNet author network
    - Card sorting clusters & map
- Comparisons of ACA and KE results
- Conclusions

DOMAIN ANALYSIS
- SYSTEMS ANALYSIS: the task of identifying the operations and objects needed to specify information processing in a particular application domain
- INFORMATION SCIENCE: the study of the field (knowledge domain) as a thought or discourse community. It focuses on such topics as knowledge organization, structure, cooperation patterns, language and communication forms, information systems, and relevance criteria as a way of understanding these communities (Hjørland & Albrechtsen, 1995).

An Aside On DISCOURSE COMMUNITY
A group (likely to be geographically dispersed) who share:
- a common public goal or goals
- a body of specialized knowledge
- mechanisms of intercommunication and participation
- a genre (e.g. scholarly journal)
- a specialized vocabulary
Adapted from John Swales, Genre Analysis (1990, Cambridge)

SOFTWARE ENGINEERING
- The establishment and use of sound engineering principles in order to obtain economically software that is reliable and works efficiently on real machines.
- The technological and managerial discipline concerned with systematic production and maintenance of software products that are developed and modified on time and within cost estimates.

DOMAIN ANALYSIS OF SOFTWARE ENGINEERING
- a study of the journal literature of software engineering, based on both author referencing patterns and index term assignments
- a study of the factors that affect the “visibility” of software engineering authors
- an INSPEC-based co-descriptor mapping of software engineering
- a conjoint study of the intellectual and cognitive structure of software engineering
- Citation content analysis of Brooks’ Mythical Man-Month

TWO APPROACHES TO MAPPING SE
- BIBLIOMETRICS: Cocited author mapping uses the patterns of co-occurrence of authors’ names in reference lists to examine the intellectual structure of scholarly literatures and, by extension, the fields that produce those literatures
- KNOWLEDGE ELICITATION: the process of collecting from a human source of knowledge, information that is thought to be relevant to that knowledge. [Cooke]
  Card sorting: structural analysis of mental models elicited via sorting named cards into piles

AUTHOR COCITATION ANALYSIS
- AUTHOR SELECTION: authors highly cited in texts and in the core SE literature = 60 authors selected for study
- COCITATION DATA GATHERED: cocitation counts retrieved from SCISEARCH, 1990 – 1997
- ANALYSIS:
    - Raw cocitation counts -> PFNets
    - Correlation matrix -> cluster analysis & multidimensional scaling

60 AUTHORS
Abdel-Hamid, Tarek K.      Fagan, M. E.         Kaiser, G. E.           Rombach, H. D.
Albrecht, Allan J.         Fenton, Norman E.    Kemerer, C. F.          Rumbaugh, James
Basili, Victor R.          Garlan, David        Kernighan, Brian W.     Selby, R. W.
Beizer, Boris              Ghezzi, Carlo        Kitchenham, Barbara A.  Shaw, Mary
Biggerstaff, Ted J.        Gilb, Tom            Lehman, M. M.           Shepperd, M.
Boehm, Barry W.            Glass, Robert L.     McCabe, Thomas J.       Shneiderman, Ben
Booch, Grady               Goldberg, Adele      Meyer, Bertrand         Sommerville, Ian
Brooks, Frederick P., Jr.  Gomaa, Hassan        Mills, Harlan D.        Tichy, W. F.
Card, David N.             Grady, Robert B.     Musa, John D.           Tracz, Will
Clarke, Lori A.            Harrison, W.         Myers, Glenford J.      Wasserman, A. I.
Coad, Peter                Hoare, C. A. R.      Parnas, David L.        Weiser, M.
Curtis, Bill               Humphrey, Watts S.   Pfleeger, Shari L.      Weyuker, Elaine J.
Davis, Alan M.             Jackson, Michael A.  Pressman, Roger S.      Wing, Jeannette M.
DeMarco, Tom               Jacobson, Ivar       Prieto-Diaz, R.         Yourdon, Edward
Dijkstra, Edsger W.        Jones, T. Capers     Ramamoorthy, C. V.      Zave, Pamela

Data Gathering for ACA
[Figure: Data Gathering for ACA. Panels: Source Papers (reference lists citing pairs of the selected authors), Retrieval Strategy* (e.g. CA = BROOKS FP AND CA = PFLEEGER S), and Raw Cocitation Matrix, feeding the analytical tools for raw cocitation counts and the analytical tools for the proximity matrix. * Multiple forms of authors' names were used in the search strategies.]

ACA ANALYSES

Raw Cocitation Matrix
- PFNet: links nodes (authors) based on their single highest co-occurrence counts.
The result is generally a network structure in which some authors appear as major foci (many links to others), representing specialties.

Correlation Matrix
- Hierarchical cluster analysis: 8 cluster solution identifies major subject clusters
- Multidimensional scaling: 2 dimensional map shows overall structure and major themes

Knowledge Elicitation Methods
- Interviews and observation
- Process tracing (e.g. protocol analysis)
- Conceptual techniques
Card sorting is a conceptual technique that can be done alone or combined with semi-structured interviews.

Card Sorting
- Software engineers contacted via e-mail, invited to participate in study
- Task: sort cards bearing authors’ names into piles, label piles, complete short questionnaire
    - As many piles as desired
    - Piles with single authors
    - Pile of “don’t know” or “aren’t software engineers”
- 46 respondents participated in postal mail study (a few interviews)

[Figure: Card Sorting Procedure. A stack of cards with authors' names was sent to respondents with instructions; cards were sorted into piles and labeled, based on respondents' perceptions (example pile labels shown: Don't Know, Metrics, Brooks, F., Formal Methods); the sorts yield a matrix of raw "co-pile" counts.]

CARD SORTING ANALYSES (correlation matrix)
- Hierarchical cluster analysis: 8 cluster level
- Multidimensional scaling: 2 dimensional map

[Figure: Cocitation Map of 60 Highly Cited Authors in Software Engineering 1990 - 1997. Two-dimensional map of the 60 authors; axis labels shown: MICRO LEVEL to MACRO LEVEL, and LOW FORMAL. Cluster labels shown: SW ARCHITECTURE / SW REUSE; SW PROJECT MGT; OBJECT-ORIENTED ANALYSIS & DESIGN / PROGRAMMING; SYSTEMS ANALYSIS & DESIGN; SW PERFORMANCE; SW METRICS; FORMAL APPROACHES TO DEVELOPMENT / FORMAL METHODS; SW TESTING / RELIABILITY.]

[Figure: PFNet of Raw Cocitation Counts for 60 Software Engineering Authors 1992 - 1997.]

Comparisons: ACA and KE
- Cluster similarity: most authors in similar clusters in terms of membership.
- Some differences in labeling: there are differences between the way authors’ works are cited and the way the authors are perceived in terms of labels (known for textbook writing, cited for specific textbook content).

CARD SORTING CLUSTERS
- SW METRICS: JONES, BASILI, PFLEEGER, ROMBACH, CARD, MCCABE, GRADY, FENTON, KITCHENHAM, HARRISON, SELBY, SHEPPERD, KEMERER, ALBRECHT
- SE MANAGEMENT PROCESS MODELING: BOEHM, GILB, CURTIS, HUMPHREY, ABDEL-HAMID, LEHMAN

COCITATION CLUSTERS
- SW METRICS: BASILI, PFLEEGER, ROMBACH, CARD, MCCABE, GRADY, FENTON, KITCHENHAM, HARRISON, SELBY, SHEPPERD, WEYUKER, KEMERER, ALBRECHT
- SE PROJECT MANAGEMENT: BOEHM, GILB, CURTIS, HUMPHREY, ABDEL-HAMID, LEHMAN, BROOKS

CARD SORTING CLUSTERS
- Cluster labels: FORMAL METHODS / SW ARCHITECTURE; OBJECT ORIENTED PROGRAMMING & DESIGN; SE METHODOLOGIES / SE TEXTS
- Authors, in slide order: GARLAN, RAMAMOORTHY, DIJKSTRA, HOARE, PARNAS, SHAW, WING, ZAVE, GHEZZI, KERNIGHAN, BOOCH, RUMBAUGH, JACOBSON, MEYER, COAD, GOLDBERG, PRESSMAN, SOMMERVILLE, DEMARCO, YOURDON, WASSERMAN, GOMAA, JACKSON, BROOKS, GLASS, MILLS, MYERS, DAVIS

COCITATION CLUSTERS
- Cluster labels: FORMAL METHODS / FORMAL APPROACHES; OO ANALYSIS & DESIGN / PROGRAMMING
- Authors, in slide order: JONES, DAVIS, DIJKSTRA, HOARE, PARNAS, SHAW, WING, ZAVE, GHEZZI, KERNIGHAN, BOOCH, RUMBAUGH, JACOBSON, MEYER, COAD, GOLDBERG
- SYSTEMS ANALYSIS & DESIGN: SHNEIDERMAN, PRESSMAN, SOMMERVILLE, DEMARCO, YOURDON, WASSERMAN, GOMAA, JACKSON

CARD SORTING CLUSTERS
- SW REUSE: BIGGERSTAFF, TRACZ, PRIETO-DIAZ
- SW TOOLS & ENVIRONMENTS: KAISER, TICHY

COCITATION CLUSTERS
- SW ARCHITECTURE / SW REUSE: BIGGERSTAFF, TRACZ, PRIETO-DIAZ, KAISER, TICHY, GARLAN

Comparisons: ACA and KE
- Map similarity: similar distribution of authors and clusters along the X-axis (r = 0.73) but not along the Y-axis (r = -0.08).
- The most important structural theme in Software Engineering, the “micro macro” dimension, exists in both citation patterns and in perceptions of the field by citing authors.
- Along the Y-axis, citing patterns focus on the content of authors’ work while general perceptions include more aspects of the authors’ personae.
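The axis-by-axis map comparison above (r = 0.73 on X, r = -0.08 on Y) can be illustrated with a minimal sketch. It assumes that each map is available as a mapping from author name to (x, y) coordinates and that the two maps have already been oriented the same way, since axis direction in a multidimensional scaling solution is arbitrary. The function names and the coordinates are hypothetical, invented for illustration; they are not the study's data or its software.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    ss_x = sum(d * d for d in dx)
    ss_y = sum(d * d for d in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant coordinates")
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(ss_x * ss_y)


def compare_maps(map_a, map_b):
    """Correlate the X and the Y coordinates of authors present on both maps."""
    shared = sorted(set(map_a) & set(map_b))
    r_x = pearson_r([map_a[n][0] for n in shared], [map_b[n][0] for n in shared])
    r_y = pearson_r([map_a[n][1] for n in shared], [map_b[n][1] for n in shared])
    return r_x, r_y


# Hypothetical coordinates for four placeholder authors (not the study's data).
aca_map = {"A": (-1.0, 0.6), "B": (-0.3, -0.9), "C": (0.5, 0.8), "D": (1.2, -0.4)}
ke_map = {"A": (-0.9, -0.2), "B": (-0.4, 0.7), "C": (0.3, 0.5), "D": (1.1, -0.6)}

r_x, r_y = compare_maps(aca_map, ke_map)
print(f"X-axis r = {r_x:.2f}, Y-axis r = {r_y:.2f}")
```

A high r on one axis and a near-zero r on the other is the pattern reported on the slide: the two maps order authors similarly along one dimension but not along the other.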
Conclusions
- Boehm, Basili, Booch, and Hoare are central figures in the Software Engineering R&D literature; other authors can be identified as probable linkers between research specialties.
- The main organizing principle in SE is a continuum of activities related to the process of software design, development, and evaluation.
- Key specialties in Software Engineering in the 1990s included Object-Oriented Programming, Analysis & Design; Formal Methods; Software Reuse; Software Testing & Reliability; Software Process Management; and Software Metrics.

Conclusions
- ACA (mapping, PFNets) and KE (card sorting) provide complementary views of software engineering.
- KE methods increase our understanding of the domain by capturing subjects’ mental models of the domain and providing additional information about mapped entities.
- ACA and KE provide useful cross-validation. The structure of the literature as seen through networks of author indebtedness (citation of previous work) is a good reflection of citing authors’ mental models of the field, the place of the (cited) authors, and the relationships among their contributions.
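As an illustration of the raw "co-pile" counts that feed the card sorting analysis, here is a minimal sketch. It assumes each respondent's sort is recorded as a mapping from pile label to the author names in that pile. The function name, the data layout, the two example sorts, and the decision to skip "don't know" style piles are all assumptions made for illustration; the slides do not say how such piles were treated.

```python
from collections import Counter
from itertools import combinations

# Assumption: piles with these labels carry no similarity information.
EXCLUDED_LABELS = {"don't know", "not software engineers"}


def co_pile_counts(sorts):
    """Count, for each author pair, how many respondents put both in one pile."""
    counts = Counter()
    for piles in sorts:
        for label, authors in piles.items():
            if label.strip().lower() in EXCLUDED_LABELS:
                continue
            # Sort names so each pair is stored under one canonical key.
            for pair in combinations(sorted(set(authors)), 2):
                counts[pair] += 1
    return counts


# Two invented respondents, for illustration only (not the study's data).
sorts = [
    {
        "Metrics": ["Basili", "Pfleeger"],
        "Formal Methods": ["Dijkstra", "Hoare"],
        "Don't know": ["Booch", "Jacobson"],
    },
    {
        "Measurement": ["Pfleeger", "Basili"],
        "Theory": ["Hoare", "Dijkstra"],
        "OO": ["Booch", "Jacobson"],
    },
]

counts = co_pile_counts(sorts)
for (first, second), n in sorted(counts.items()):
    print(first, second, n)
```

Pile labels differ between respondents, so only membership in the same pile is counted; the labels themselves would be analyzed separately.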