BASICS OF CONTENT ANALYSIS
Presented by Natalia Tomlin
Assistant Professor and Technical Services Librarian
B. Davis Schwartz Memorial Library, LIU Post

DEFINING CONTENT ANALYSIS
• “Summarizing, quantitative analysis of messages that relies on the scientific method” (Neuendorf, 2002)
• “Technique for the objective, systematic, and quantitative description of manifest content of communication” (Berelson, 1952)
• “Research technique for making replicable and valid inferences from texts (or other meaningful matter) in the context of their use” (Krippendorff, 2004)
• “Procedures for defining, measuring, and analyzing both the substance and meaning of texts or messages or documents” (Beck and Manuel, 2008)
Kimberly Neuendorf; Klaus Krippendorff; Stone, Dunphy, Smith, and Ogilvie, 1966

CONTENT ANALYSIS: QUANTITATIVE OR QUALITATIVE?
• Quantitative – focus on numerically measurable objectives
  - Research questions are stated as hypotheses
  - Use of inferential statistics
• Qualitative – focus on how things occur and how people think about processes; exploratory research; a more holistic, natural approach; language as the primary data; the researcher is part of the project
  - Use of verbal categories and descriptive statistics
• Content analysis may be quantitative or qualitative

BRIEF HISTORY OF CONTENT ANALYSIS
• 17th century – analysis of texts by the Church
• Speed (1893), “Do newspapers now give the news?” – content analysis of New York newspapers
• 1930s-1940s – early content analysis studies by sociologists
• World War II – propaganda analysis
• 1950s – use of content analysis by psychologists, anthropologists, historians, linguists, educators, psychiatrists, literary critics, and in library science
• 1958 – first computer-aided content analysis
• Evolution from word counts to discovering concepts

CONTENT ANALYSIS: AREAS OF IMPLEMENTATION
• Written materials – books, journals, official documents, advertisements, speeches, conversations
• Visual items – films, clothing, works of art
• Sound texts, operas, musicals, lyrics
• Combinations of communication content – blogs, webpages, performance art, computer programs
• Fields – marketing, literature, gender studies, political science, psychology, etc.

MANY PURPOSES OF CONTENT ANALYSIS
• Disclose international differences in communication content
• Audit communication against objectives
• Code open-ended questions in surveys
• Determine the psychological state of a person or group
• Determine the existence of propaganda
• Reveal the focus of individual groups
• Reflect cultural patterns of groups
• Describe trends in communication content
(Berelson, 1952)

[Diagram: Content Analysis – Data Collection Technique; Research Methodology]

EXAMPLES OF CONTENT ANALYSIS STUDIES
• Walker (1975) – differences and similarities in American black and white popular song lyrics, 1962-1973
• Aries (1973) – socialization differences in male, female, and mixed-sex small groups
• Adams and Schreibman (1978) – content analysis of news media
• Graham, Kamins, Oetomo (1993) – analysis of advertisements in Japan and Germany
• Horton (1986) – analysis of young adult books
• Kaur-Kasior (1987) – treatment of culture in greeting cards

CONTENT ANALYSIS IN LIS
• Turner and Beck (2002) – repair strategies of remote users searching the online catalog
• Sproles and Ratledge (2004) – librarian job ads
• Koufogiannakis, Slater, Crumley (2004) – content analysis of librarianship research
• Kuchi (2006) – academic library websites
• Tancheva (2003) – analysis of online tutorials
• Aharony (2009) – blogs of librarians
• LIS thesis and dissertation research (1946-1963) – 62% of dissertations used content analysis
• Koufogiannakis and Slater (2004) – content analysis is one of the top 5 preferred research methods in LIS

[Diagram: Content Analysis in LIS – Delivery of library services; Resource-specific studies; Studies of the profession itself]

[Diagram: Content Analysis – Conceptual: existence and frequency of concepts; Relational: relationships among concepts]

RESEARCH QUESTIONS
• Research question: Do technical services jobs require more advanced technology skills than reference services jobs?
  (observed reality)
• Hypothesis: Technical services jobs require more advanced information technology skills (prediction of a relationship between two variables)
• Importance of conceptual definitions of variables (exhaustive and mutually exclusive; previously developed or new). Coding is based on definitions
• *** Some content analysis studies may state hypotheses but do not employ tests for statistical significance

CONTENT ANALYSIS DESIGN
• Unitizing
• Sampling
• Coding
• Reducing
• Inferring
• Narrating
[Diagram label: Data making]

UNITS (WHAT IS TO BE OBSERVED)
• Sampling units (issues of newspapers, blogs, individual speeches: what to include in or exclude from the analysis)
• Recording units (blog posts, a specific newspaper column)
• Context units – what can be communicated within the text (words, phrases, pictures, ideas)

SAMPLING
Sampling – ability to generalize the properties found in a sample to the population from which the sample is drawn
• Random
  - Simple random (random number generator)
  - Systematic (every n-th element is chosen)
  - Stratified (the population is divided into subgroups, and the final subjects are then randomly selected from each stratum in proportion to its size)
• Non-random
  - Purposive (selected based on knowledge of the population and the purpose of the study)
  - Convenience

CODING
• Define the recording units (= unit of analysis): word, sentence, theme, paragraph, or whole text (the text must be short)
• Define categories (variables): mutually exclusive; decide how broad or narrow the categories will be
• Provide conceptual definitions for variables
• Test the scheme on a sample of the text
• Assess accuracy and reliability
• Revise coding rules if needed
• Test again
• Code all text
• Assess reliability and accuracy (2nd time)

EXAMPLE OF CODEBOOK
Unit of analysis: individual job ad posting
Conceptual definition: each academic library job ad posted between 2012 and 2013 on CHE website
• Job number
• Job posting date
• Job category
• Type of library
• Degree requirement
• Professional experience
• Preferred degrees
• Faculty status

CODEBOOK EXAMPLE
Job number: 001, 002
Job posting date: 1. 01.01.2013-01.31.2013; 2. 02.01.2013-02.28.2013
Job category: 1. Administrative; 2. Instructional; 3. Technical Services
Type of library: 1. Research library; 2. Community college; 3. 4-year college
Degree requirement: 1. MLS only; 2. MLS and one more master's degree; 3. Other
Professional experience: 1. None; 2. 1+ years; 3. 3+ years

CODEBOOK CREATION
• Using established conceptual definitions adds validity to the study (previous studies; established sources such as ODLIS)
• Exploratory studies are more likely to create their own conceptual definitions
• The codebook serves as a guide for coders and a record of the project
• The codebook needs to be refined during pretesting
• Better to have too many categories than too few

ASSIGNING ENUMERATIONS TO VARIABLES
• Nominal – numbers are used only as labels and have no numeric value. Example: type of library
• Ordinal – rank ordered
• Interval – numbers represent the distance between categories within a ranking. Example: years of experience
• Ratio – always has a “0” value. Example: age

QUALITY CONTROL
• Validation of the coding scheme through an inter-coder reliability test
• Acceptable inter-coder reliability levels vary
• The reliability test is done at the pilot stage and at the end of the study; the results of the latter are reported in the study
• Reliability problems can be addressed by additional training for coders, revising coding instructions, and combining or separating categories
• Calculating agreement: for nominal scales, percentage agreement, Cohen’s kappa, and Scott’s pi; for scales beyond nominal, Pearson’s correlation

REPORTING FINDINGS
• Reporting in raw numbers, percentages, or frequencies
• Must directly address the research questions
• Format: bars, charts, tables
• Test of statistical significance (chi-square) for associations between nominal variables

ANALYSIS OF THE STUDY (1)
“Libraries and public perceptions: A comparative analysis of the European press: Methodological insights” by Anna Galluzzi (2014)

“The analysis of newspapers has been figured out as an alternative method to measure the relevance and the public perception of libraries”

“The research aims at quantifying and qualifying the presence of issues concerning libraries in the European press over the last years …in order to answer the following research questions:
• which are the most discussed topics concerning libraries and
• have they changed over the last years?
• are there any significant differences between the European countries in the debate about libraries?
• are there any significant differences between the European newspapers in the debate about libraries”

“chronological span covered by the research is five years, from 2008 to 2012. This choice was made because 2008 is generally considered the starting point of the economic crisis which is still deeply affecting the Western economies and political scenarios”

“Countries taken into account are the United Kingdom, France, Spain and Italy, since they are considered representative of different areas and cultural traditions in Europe”

“second selection was made among the numerous print newspapers published, with the objective of choosing two titles for each country according to the following basic criteria. The two newspapers were picked among those of national relevance, the most widespread and the oldest in each country, avoiding - if possible - those officially representing political parties and the radical ones.
The selected newspapers are the following:
1. The United Kingdom: The Times and The Guardian
2. France: Le Figaro and Le Monde
3. Spain: El Mundo and El País
4. Italy: Corriere della Sera and La Repubblica

The keywords used as query parameters in the full text search were “librar*” and “bibliot*”

The articles retrieved using the abovementioned parameters are 41,611.

After the retrieval of the articles responding to the query parameters, the second step was to select the pertinent ones, i.e., those articles which concern libraries in a proper sense

The pertinent articles are 3,659.

“After the selection, a text and content analysis of the articles was carried out. Though aware of the many advantages (speed, completeness, objectivity and precision) of an automatic processing, the risk to think that the whole analysis could be delegated to computer software, instead of using them to speed up and enhance it, was given a special credit. the analysis was carried out manually and no text analysis software was used, starting from the firm belief that no software can replace human reasoning. A certain degree of subjectivity was considered somewhat inevitable and acceptable”

“First of all, each article was identified with a univocal name and an Excel™ worksheet was prepared to host the results of the coding. Then, the articles were analyzed and coded.
At the beginning, the texts were carefully reviewed and all concepts and ideas were annotated as they appeared and then grouped.”

Variables/Coding categories
1. country
2. newspaper title
3. year of publication
4. prevalence or not of libraries as subject of the article
5. type of library considered: Public, National, Academic, School, Special/Specialized, No specification or more than one type
6. main topic of the article: Mission/Roles, Conservation/Holdings/Catalogue, Digitization/Digital libraries, History, Reading/Marketing, Politics/Strategy/Management, Library closures/Budget cuts, Internet/Ebook/Technology, Services/Users, Staff/Recruitment, New libraries/New buildings, Acquisitions/Open access, Buildings/Architecture
7. the newspaper section where the article is published: Opinions/Letters/Debates, Culture/Education, In brief, Cities/National news, World/International news, Market/Economy/Business, Society, Science, Other

ANALYSIS OF THE STUDY (2)
“The Role of Online Videos in Research Communication: A Content Analysis of YouTube Videos Cited in Academic Publications” by Kousha, Thelwall, and Abdoli (2012)

“This article explores the extent to which YouTube videos are cited in academic publications and whether there are significant broad disciplinary differences in this practice”

Research questions:
“• How frequently are YouTube videos cited in academic publications and has frequency of use declined at any stage since the birth of YouTube (2005–2011)?
• What types of YouTube videos are commonly cited in research articles?
• Are there significant broad disciplinary differences in citing online videos?”

Data collection
Researchers “extracted URL citations to YouTube videos from academic publications indexed by Scopus from 2005 to 2011 across four broad disciplines: the sciences, medicine and health sciences, social sciences, and arts and humanities.
We then viewed a sample of the cited videos and classified their contents using a specially designed classification scheme”

“viewed 551 randomly sampled cited videos from research articles (omitting reviews, conference papers, editorials, letters, and notes) from the Scopus searches. In many cases, we also read the descriptions of, and some comments on, the YouTube videos (if available) and searched for a lecturer or speaker biography to better understand video contexts. The first and third authors separately conducted an initial content analysis of the videos based on a primary classification scheme derived from a previous classification of YouTube videos tweeted by academics (Thelwall et al., in press). To reach a reasonable degree of agreement on the classification procedure, the two coders first crosschecked the categorization process for a sample of 80 videos from different subject areas, discussing the coding of different types of videos.”

Examples of the categories they used:
“• Demonstration of a natural or formal science phenomenon: This subclass includes videos with an apparently scientific theme such as a real-time lab experiment in robotics
• Natural or formal science documentary: This subclass includes documentaries (usually with narration and edited with different types of shots) about natural or formal science
• Natural or formal science academic lectures: This group includes natural or formal science lectures, speeches, and talks by academics in conferences”

Limitations:
“Another practical limitation was the complex and subjective issue of coding video contents. We discussed the coding system after the initial classification process and modified it several times to get general agreement.
For instance, we first merged television shows and news-related videos into one class, but subsequently split them into two subclasses because shows are more related to arts and humanities whereas the news is more associated with the social sciences (e.g., political science and journalism). Furthermore, some scientific demonstrations also can be used for academic education, and, in rare cases, it was difficult to recognize whether they were created for scientific demonstrations, entertainment, or teaching.”

ANALYSIS OF THE STUDY (3)
“An analysis of American academic libraries' websites: 2000-2010” by Noa Aharony (2012)

“It is …interesting to trace the changes and developments that academic library websites have undergone over the last ten years, as expressed through the library websites themselves”

“research questions are:
• Is there a difference between the content of academic library websites in the year 2000 and in the year 2010?
• What are the LIS current trends and tendencies being expressed through those academic library websites?”

Conceptual definition: “According to McGillis and Toms (2001), a library website reflects its virtual public face, acting as a front door to the collections, services, and, to an extent, its staff”

“The first phase of the investigation involved choosing academic library homepages, which appear both on a current webpage and in the Internet Archive, to be included in the sample. These were located by examining the Association of College and Research Libraries (ACRL) accredited LIS schools, numbering 57. A total of 31 academic libraries were selected from this list based on the following criteria:
• The library has a current homepage.
• The library homepage appears in the Internet Archive in the year 2000.
Four out of the 31 libraries were not found in the Internet Archive in 2000, so data were collected from the first year that they appear in the Internet Archive”

Time frame: “The year 2000 was chosen because: firstly, while the Internet Archive began archiving its documents in 1996, most of the academic library content is found from the year 2000 onwards; and secondly, a ten year period was deemed suitable for tracing the changes, developments, and trends of the last decade, which contained many technological innovations and conceptual changes in the field of library and information science”.

She conducted “content analysis of academic library websites in the two periods, based on Qutab and Mahmood's (2009) website content analysis and modified for the purpose of the current study. The modified checklist includes 42 items divided into eight categories:
- site description
- currency
- website aids and tools
- library general information
- library resources services
- links to e-resources
- value added services.”

“The final percentage of agreement for all coding decisions was 89 per cent, which suggests that the coding classification used was reliable”

CONTENT ANALYSIS OF THE INTERVIEW TRANSCRIPT (4)
• Interviews: recorded and transcribed
• A team of 4 coders (2 groups) will work on an assigned number of interviews
• Print-outs of the interview text need to be read and the concepts highlighted
• Each group needs to meet and agree on the highlighted concepts, reporting the percentage of agreement
• All four coders will meet, discuss all concepts, and group them into larger categories

ADVANTAGES AND DRAWBACKS
• Operates directly with text/transcripts of communication
• Can use both qualitative and quantitative operations
• Allows research on historical documents
• Is an unobtrusive, nonreactive research technique
• Not geographically limited
• Time-consuming
• Reveals the content but not the content's significance
• Cannot make conclusions about motives, meanings, or effects of the messages
• Some texts (websites) have tight data collection periods

QUESTIONS?

SPECIAL THANKS TO MORGAN GELBER, MY TALENTED AND GIFTED DAUGHTER WHO NEVER GETS THE CREDIT SHE DESERVES.
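
WORKED EXAMPLE: INTER-CODER AGREEMENT
The QUALITY CONTROL slide names percentage agreement and Cohen's kappa as agreement measures for nominal codes. The sketch below shows one way the two could be computed for a pair of coders. It is an illustration only: the function names and the ten coded job ads are hypothetical and do not come from any study cited in this presentation. The codes follow the "Job category" variable of the codebook example (1 = Administrative, 2 = Instructional, 3 = Technical Services).

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of units to which both coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of units")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Chance agreement: product of the two coders' marginal proportions,
    # summed over every category either coder used.
    categories = freq_a.keys() | freq_b.keys()
    p_expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both coders used one identical category throughout; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical pilot data: ten job ads coded independently by two coders.
coder_a = [1, 2, 3, 3, 2, 1, 1, 3, 2, 2]
coder_b = [1, 2, 3, 2, 2, 1, 3, 3, 2, 2]

print(f"Percent agreement: {percent_agreement(coder_a, coder_b):.2f}")  # 0.80
print(f"Cohen's kappa:     {cohens_kappa(coder_a, coder_b):.2f}")       # 0.69
```

Kappa comes out lower than raw percentage agreement because it discounts the matches the two coders would be expected to produce by chance. Whether a given value is acceptable depends on the study, since acceptable inter-coder reliability levels vary.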