There are two kinds of text analysis: qualitative and quantitative. The qualitative kind takes a corpus of texts (e.g., transcribed interviews, newspaper articles) and tags bits of text with codes. For example, a section of an interview in which a man talks about his marriage might be tagged "marriage". The process is subjective, inductive, and iterative: you just keep poring over the text, getting deeply familiar with it. The end result is typically a sense of understanding or, if taken to a formal level, a theory. Grounded theory methodology is an example of this approach.
The quantitative kind of text analysis is also known as content analysis. In content analysis (regardless of how the codes were obtained in the first place) you have a fixed codebook that lists all the codes and their meanings. A set of comparable texts, typically representing the units of study (such as patients, students, or employees), is then coded using the codebook. This is very much like measurement: for each case (text) we measure a set of variables (codes). Typically, the coding is done by several coders who are unfamiliar with the specifics of the research objectives, and inter-coder reliability is assessed in order to gauge the quality of the measurement.
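The measurement view of content analysis can be sketched in Python. This is a minimal illustration under stated assumptions, not a procedure taken from the readings: the codebook entries, the case IDs, and the helper names `to_matrix` and `percent_agreement` are all hypothetical. Each coder's work becomes a case-by-code matrix of 0/1 values (one variable per code), and raw agreement between two coders is computed code by code.

```python
from typing import Dict, Set

# Hypothetical codebook for illustration: code name -> definition.
CODEBOOK: Dict[str, str] = {
    "marriage": "Respondent talks about a spouse or their marriage",
    "conflict": "Respondent describes a disagreement or dispute",
    "mentoring": "Respondent describes giving or receiving guidance",
}

Matrix = Dict[str, Dict[str, int]]


def to_matrix(codings: Dict[str, Set[str]], codebook: Dict[str, str]) -> Matrix:
    """Convert {case_id: codes applied to that text} into a case-by-code 0/1 matrix."""
    allowed = set(codebook)
    matrix: Matrix = {}
    for case_id, codes in codings.items():
        unknown = codes - allowed
        if unknown:
            # A fixed codebook means coders may only use the codes it defines.
            raise ValueError(f"{case_id}: codes not in codebook: {sorted(unknown)}")
        # One variable (column) per code: 1 if applied to this text, else 0.
        matrix[case_id] = {code: int(code in codes) for code in codebook}
    return matrix


def percent_agreement(a: Matrix, b: Matrix, code: str) -> float:
    """Share of cases coded by both coders on which they agree about one code."""
    cases = sorted(a.keys() & b.keys())
    if not cases:
        raise ValueError("the two coders share no cases")
    agree = sum(a[c][code] == b[c][code] for c in cases)
    return agree / len(cases)


# Two coders independently apply the same codebook to three interview transcripts.
coder1 = to_matrix({"int01": {"marriage"}, "int02": {"conflict", "mentoring"}, "int03": set()}, CODEBOOK)
coder2 = to_matrix({"int01": {"marriage", "conflict"}, "int02": {"conflict"}, "int03": set()}, CODEBOOK)

for code in CODEBOOK:
    print(code, round(percent_agreement(coder1, coder2, code), 2))
# marriage 1.0
# conflict 0.67
# mentoring 0.67
```

Raw percent agreement does not correct for agreement that would occur by chance, which is why chance-corrected statistics such as Cohen's kappa are usually reported alongside or instead of it.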
How does content analysis differ from the grounded theory approaches?
How do you go about creating a codebook?
How do you use multiple coders for increased reliability?
Primary Readings
Gersick, Bartunek & Dutton. 2000. Learning from Academia: The importance of relationships in professional life. Academy of Management Journal 43(6): 1026-1044. [pdf]
Borgatti. Content analysis process [html]
Borgatti. What to look for when open-coding text [html]
Corbin, Juliet and Anselm Strauss. 2007. Basics of Qualitative Research. Sage.
Goodwin, Charles. 1994. "Professional Vision." American Anthropologist 96(3): 606-633. [pdf]
Borgatti. Notes on Professional Vision [Yes, we read this article before. Read it again.]
Secondary Readings
Jehn, Karen "Ettie". 1997. A qualitative analysis of conflict types and dimensions in organizational groups. Administrative Science Quarterly 42: 530-557. [pdf]
Cohen's Kappa
Calculator for cross-tabulated ratings [xls]
Overview of content analysis [html]
Achieving inter-coder reliability [html]
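Cohen's kappa corrects observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the proportion of units on which two coders agree and p_e is the agreement expected from each coder's marginal category frequencies. The Python sketch below computes it from a cross-tabulation of two coders' ratings, as in the calculator listed above. It is a minimal illustration, not a reproduction of that spreadsheet; the helper names `crosstab` and `cohens_kappa` and the example ratings are hypothetical.

```python
from typing import Hashable, List, Sequence


def crosstab(ratings_a: Sequence[Hashable], ratings_b: Sequence[Hashable],
             categories: Sequence[Hashable]) -> List[List[int]]:
    """table[i][j] = number of units coder A put in category i and coder B in category j."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both coders must rate the same units")
    index = {c: i for i, c in enumerate(categories)}
    table = [[0] * len(categories) for _ in categories]
    for x, y in zip(ratings_a, ratings_b):
        table[index[x]][index[y]] += 1
    return table


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square cross-tabulation of two coders' ratings."""
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table is empty")
    # Observed agreement: proportion of units on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two coders' marginal proportions, summed over categories.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# 50 texts, each coded "yes"/"no" for a single code by two coders.
coder_a = ["yes"] * 25 + ["no"] * 25
coder_b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
table = crosstab(coder_a, coder_b, ["yes", "no"])
print(table)                          # [[20, 5], [10, 15]]
print(round(cohens_kappa(table), 3))  # 0.4
```

Here the coders agree on 70% of the texts (p_o = 0.7), but their marginal frequencies alone would produce 50% agreement (p_e = 0.5), so kappa = (0.7 - 0.5) / (1 - 0.5) = 0.4.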
Datasets
Slides
content analysis2.pdf
themesmemes2.pdf