This is a sample of conceptual or thematic discovery and analysis.
The researcher hypothesizes that, among a set of interviews, which includes more than one interviewee, there are common themes or concepts, correlated with particular key words or phrases, which could be discovered using KH Coder and shown to be closely correlated. Certain of the key words, phrases, or concepts could be correlated with positive, negative, or neutral “affects” (see the tutorial about “coded rules”) or “outcomes”. Further, there may be correlation with the interviewees' gender, age, work place, job(s), and so on, which we call “variables”.
Sample applications include product reviews; interview analysis between interviewees and over time; news article analysis; and so on.
The researcher gathers their data as word processing files. The interviews may cover more than one topic (Topic A; Topic B; Topic C;...). Each topic may have more than one interviewee (Alice, Bob, Charlie, Diane...) and any interviewee may have been interviewed over time (1 Apr; 1 May; 1 June...2011; 2012; 2013; 2014...). The interview transcripts (files) have certain fields (variables) in them that contain some of the items of interest, such as gender, age, etc., each set off by a field name or particular string.
Nomenclature: “Characters” are alphanumeric or punctuation characters, not including blank spaces. A “string” is a series of characters which may include punctuation, blank spaces, and alphanumeric characters; strings have no particular delimiter. A “word” is a string of characters, not including blank spaces or punctuation marks (nor HTML header tags); words are delimited by one or more blank spaces or punctuation marks. A “phrase” is an array of two or more words. A “sentence” is an array of words, delimited at the beginning by one or more blank spaces and at the end by a period plus one or more blank spaces. A “line” or a “paragraph” is an array of one or more sentences, delimited by a newline at the end (which indicates the beginning of the next paragraph). An “interview” is a plain text file, containing one or more paragraphs, perhaps HTML header tags and perhaps “variables”.
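The definitions above can be sketched in a few lines of Python. This is only an illustration of the nomenclature, not how KH Coder itself tokenizes text. The sample text, the use of <h1> as the header tag, and the function names are assumptions made for the example; exclamation points and question marks are also treated as sentence endings, which goes slightly beyond the period-only definition.

```python
import re

# Hypothetical sample interview; the <h1> tag and the wording are illustrative only.
SAMPLE = (
    "<h1>Alice</h1>\n"
    "I like the new schedule.  It saves time.\n"
    "The office is noisy.\n"
)

# A line that consists only of an HTML header tag and its contents.
HEADER_TAG = re.compile(r"^<h[1-5]>.*</h[1-5]>$")


def paragraphs(text):
    """One paragraph per newline-delimited line; blank lines and header-tag lines are skipped."""
    stripped = (line.strip() for line in text.splitlines())
    return [line for line in stripped if line and not HEADER_TAG.match(line)]


def sentences(paragraph):
    """Split after '.', '!' or '?' when the mark is followed by one or more blanks."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def words(sentence):
    """Words are runs of letters or digits, delimited by blanks or punctuation."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", sentence)


if __name__ == "__main__":
    for paragraph in paragraphs(SAMPLE):
        for sentence in sentences(paragraph):
            print(words(sentence))
```

A splitter this simple will also break after abbreviations such as “etc.” when they are followed by a blank, so real transcripts may need a hand check.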
Variables are specific strings that indicate the sentence or paragraph contains specific information that can be compared across interviews. Interviews may be assembled into “cases” that contain at least one interview.
At a more abstract level: interviews may contain one or more “questions” that are the same between interviews (Question 1; Question 2;...). Interviews may be by the same or different interviewees. A “case” is at least one interview, with at least one question. Interviews that contain more than one question could be split into “questions”, but retain the “interview” fields (interviewee; gender; age; etc.). “Cases” could be assembled by interviewee; per question; per date; or any combination of similar instances.
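As a rough sketch of how “cases” might be assembled, the Python below groups responses by any combination of fields. The Response record, its field names, and the sample data are hypothetical and exist only for illustration; KH Coder does not require this structure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One answer to one question (hypothetical record layout)."""
    interviewee: str
    date: str       # ISO date string, e.g. "2011-04-01"
    question: str
    text: str


def assemble_cases(responses, by):
    """Group responses into cases keyed by the chosen fields, e.g. by=("interviewee",)."""
    cases = defaultdict(list)
    for response in responses:
        key = tuple(getattr(response, field) for field in by)
        cases[key].append(response)
    return dict(cases)


# Illustrative data only.
RESPONSES = [
    Response("Alice", "2011-04-01", "Question 1", "The schedule is tight."),
    Response("Alice", "2011-05-01", "Question 1", "The schedule is better now."),
    Response("Bob", "2011-04-01", "Question 1", "The schedule is fine."),
]

if __name__ == "__main__":
    per_interviewee = assemble_cases(RESPONSES, by=("interviewee",))
    per_question_and_date = assemble_cases(RESPONSES, by=("question", "date"))
    print(sorted(per_interviewee))
    print(sorted(per_question_and_date))
```

Passing a different tuple of field names to by gives cases per interviewee, per question, per date, or any combination, without changing the data.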
1. The researcher must decide how they wish to approach the data:
a) Does the researcher want to compare “all” interviews at once, to surface interesting concepts? (Probably a good first step; more granular analysis comes later.)
b) Perhaps the researcher wishes to compare an interviewee's responses over time? Then the interviews must be “tagged” with a variable about the interviewee, date or time.
c) Each interview should be set off by an HTML header tag. Each interview should contain a “date”, “interviewee”, and other variables that are of interest, designated by specific, common strings. Each paragraph in each interview should be delimited by a newline.
d) If the intent is to compare responses to questions, per question, both across and between interviewees, and perhaps over time, then the data must be prepared with variables for “question”; “interviewee”; and “date”.
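One possible way to produce such a tagged interview is sketched below in Python. The layout is an assumption made for illustration: an <h1> header line, then a single line of “Name: value” variables separated by semicolons, then one paragraph per line. The function name and the variable names are hypothetical; choose whatever common strings suit the study.

```python
def render_interview(title, variables, paragraphs):
    """Return one interview as plain text.

    Line 1: the HTML header tag and its contents.
    Line 2: all variables on a single line, as "Name: value" pairs.
    Then:   one paragraph per line.
    """
    lines = [f"<h1>{title}</h1>"]
    lines.append("; ".join(f"{name}: {value}" for name, value in variables.items()))
    for paragraph in paragraphs:
        # A paragraph must not contain a newline, or it would be read as two paragraphs.
        lines.append(paragraph.replace("\n", " ").strip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_interview(
        "Alice 2011-04-01",
        {"Interviewee": "Alice", "Date": "2011-04-01", "Gender": "F"},
        ["The schedule is tight.\nIt has been for weeks.", "The office is noisy."],
    ))
```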
2. The researcher prepares the data, according to the Data Preparation section, putting HTML header tags and their contents on one line; “variables” on one line; paragraphs on individual lines. The researcher “normalizes” (replaces) all sentence endings with the same delimiter: a punctuation mark and one or two blank spaces, but always the same number of blank spaces.
a) Everything is saved in Latin1 (ISO-8859-1) or US-ASCII encoding, with tabs, control characters, and other special characters removed.
b) Punctuation marks might include periods, exclamation points, or question marks.
c) For example: the researcher may decide that all sentences shall end with a punctuation mark and two blank spaces.
d) If the interviews include more than one question, the researcher sets the questions off by a variable (“Question 1”; “Question 2”;...).
e) It may be best to begin with one file per interview; the researcher can then “split” the interviews by question; assemble the responses over time, interviewee, or over question; and analyze from those perspectives.
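The normalization in step 2 could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions: two blank spaces after each sentence ending, curly quotes and dashes mapped to plain ASCII, and any character that Latin1 still cannot represent replaced by “?”. The function names are hypothetical.

```python
import re
import unicodedata

# Typographic characters that Latin1 lacks, mapped to plain equivalents (assumed list).
REPLACEMENTS = {
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis
}


def normalize_text(text, blanks=2):
    """Remove tabs and control characters and give every sentence ending the same delimiter."""
    # Tabs and other control characters (except newline) become a single blank.
    text = "".join(
        " " if unicodedata.category(ch) == "Cc" and ch != "\n" else ch
        for ch in text
    )
    for source, target in REPLACEMENTS.items():
        text = text.replace(source, target)
    lines = []
    for line in text.split("\n"):
        line = re.sub(r" {2,}", " ", line).strip()
        # Sentence-ending mark followed by blanks: keep the mark, use a fixed number of blanks.
        line = re.sub(r"([.!?]) +(?=\S)", r"\1" + " " * blanks, line)
        lines.append(line)
    return "\n".join(lines)


def to_latin1(text):
    """Encode as ISO-8859-1; characters outside Latin1 are replaced by '?'."""
    return text.encode("latin-1", errors="replace")


if __name__ == "__main__":
    raw = "Hello there.\tHow are you?   Fine!\r\nNext line. Done."
    print(normalize_text(raw))
```

Because the rule only looks at punctuation followed by a blank, abbreviations such as “e.g.” also receive the sentence delimiter; a manual check of the output is still advisable.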
3. Analysis (assuming there are more than two questions per interview and interviews occur over time):
a) To surface initial, “interesting” themes or concepts for analysis:
1) Ensure each interview has a different variable per interviewee, per interview, per question, and per date, using HTML header tags and variables.
2) Assemble all interviews into one text file, set off by the HTML tags and variables.
3) Create a “tag cloud”, “frequency count”, or “proximity” plot of the key words or phrases, just to see if there is sufficient correlation to continue the analysis.
4) Run one correspondence or co-occurrence network analysis on the whole, large file. See which words or key phrases are most frequent and most strongly correlated. Those are the key concepts.
5) Revisit the correspondence or co-occurrence analysis: analyze by the key words of each question, across interviews, interviewees, etc.
6) Now look for key words or phrases at a finer level of granularity.
b) To compare responses over time:
1) Separate (parse) the response to each question: save responses, per interviewee, per date, in separate text files. Delimit each interviewee's response, per date, via either an HTML header tag or a variable (interviewee's name).
2) Assemble each interviewee's responses, over time, to the same question, in one text file.
3) Analyze over time, and compare results across interviewees.
c) To compare responses between interviewees:
1) Parse the responses, to each question, per interviewee.
2) Assemble the responses in different files: per interviewee over time; per question, between interviewees, at the same relative time; and a final file, across interviewees and across time. Set them off by the “variable” of the interviewee's name, question, date and time, or a similar designator.
3) Analyze responses, per question, between interviewees and over time or other variables.
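The splitting and assembling in steps a) through c) can be scripted before the files are loaded into KH Coder. The Python sketch below assumes a hypothetical layout: each interview starts with an <h1> line, followed by one line of “Name: value” variables, and each question is set off by a line such as “Question 1”. The simple word count at the end is only a quick plausibility check and is not a substitute for KH Coder's own frequency, correspondence, or co-occurrence analyses.

```python
import re
from collections import Counter

HEADER = re.compile(r"^<h1>.*</h1>$")
QUESTION = re.compile(r"^Question \d+$")

# Illustrative corpus in the assumed layout.
CORPUS = """<h1>Alice 2011-04-01</h1>
Interviewee: Alice; Date: 2011-04-01
Question 1
The schedule is tight.
Question 2
The office is noisy.
<h1>Alice 2011-05-01</h1>
Interviewee: Alice; Date: 2011-05-01
Question 1
The schedule is better now.
<h1>Bob 2011-04-01</h1>
Interviewee: Bob; Date: 2011-04-01
Question 1
The schedule is fine.
"""


def parse_corpus(text):
    """Return one record per paragraph: its variables, its question, and its text."""
    records, variables, question = [], {}, None
    expect_variables = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if HEADER.match(line):
            variables, question, expect_variables = {}, None, True
        elif expect_variables:
            # The line right after a header holds the variables.
            variables = dict(item.split(": ", 1) for item in line.split("; "))
            expect_variables = False
        elif QUESTION.match(line):
            question = line
        else:
            records.append({**variables, "Question": question, "text": line})
    return records


def responses_over_time(records, interviewee, question):
    """One interviewee's answers to one question, oldest first (ISO dates sort as text)."""
    hits = [r for r in records
            if r["Interviewee"] == interviewee and r["Question"] == question]
    return sorted(hits, key=lambda r: r["Date"])


def responses_between(records, question, date):
    """All interviewees' answers to one question on one date."""
    hits = [r for r in records if r["Question"] == question and r["Date"] == date]
    return sorted(hits, key=lambda r: r["Interviewee"])


def word_counts(records):
    """Case-insensitive word frequencies across the given records."""
    counts = Counter()
    for record in records:
        counts.update(w.lower() for w in re.findall(r"[A-Za-z0-9']+", record["text"]))
    return counts


if __name__ == "__main__":
    records = parse_corpus(CORPUS)
    for record in responses_over_time(records, "Alice", "Question 1"):
        print(record["Date"], record["text"])
    print(word_counts(records).most_common(3))
```

Each selection can then be written to its own text file, with the interviewee's name, question, or date kept as the header tag or variable line, so that the per-interviewee, per-question, and combined files described above come from the same parsed records.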