Content Analysis process

Post date: Nov 13, 2010 1:43:36 PM

The idea here is to measure a set of cases on a set of variables, where the data for each case is some kind of text. The variables are codes obtained earlier, either derived from theory or via an open-ended coding process. The data should be coded by multiple coders who are unfamiliar with the theory and hypotheses of the study.

It's best to have an odd number of coders to avoid ties.

Step 0. The data

    • Start with a set of comparable texts, normally one text per person or whatever the research subject is (firm, country, etc).

    • Examples include obituaries, personals, horoscopes, diaries, essays on the same topic ("what I did last summer"), commercials, web pages, annual reports, patents, etc.

Step 1. Initial Codebook

    • Start by creating an initial codebook to be used as a guide by the coders.

    • Each code should have a short name, a paragraph-long description, and a bit of text that exemplifies the code.

    • The codes can come from theory or from a previous open-ended coding process (e.g., a grounded theory process).
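
As a sketch, each codebook entry can be kept as a small record with exactly those three fields. The `CodebookEntry` class and the "optimism" code below are hypothetical examples, not from the post:

```python
from dataclasses import dataclass

@dataclass
class CodebookEntry:
    """One code in the codebook: short name, paragraph-long
    description, and a bit of exemplar text."""
    name: str
    description: str
    example: str

# Hypothetical entry for illustration only
codebook = {
    "optimism": CodebookEntry(
        name="optimism",
        description="The text expresses a positive expectation "
                    "about future events or outcomes.",
        example="I'm sure next year will be better than this one.",
    ),
}
```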

Step 2. Training

    • Take a set of, say, 20 cases, and have the coders code them independently

    • Now bring the coders together to discuss discrepancies. Clarify the code descriptions as needed so that all coders can agree across the 20 cases.

    • Now repeat the last two training steps with a fresh batch of 20 cases.

    • Construct a final version of the codebook
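
Between training rounds it helps to quantify how close the coders are. A minimal sketch, assuming each coder's labels for the batch are stored in a list (the `percent_agreement` helper is a hypothetical name, not from the post):

```python
def percent_agreement(codings):
    """codings: one list of labels per coder, all over the same
    batch of cases, e.g. [['yes','no',...], ['yes','yes',...]].
    Returns the share of cases on which ALL coders agree -- a
    quick progress check between training rounds."""
    n_cases = len(codings[0])
    agreed = sum(len(set(labels)) == 1 for labels in zip(*codings))
    return agreed / n_cases

# Two coders, four cases: they agree on three of them
percent_agreement([["y", "n", "y", "y"],
                   ["y", "y", "y", "y"]])   # 0.75
```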

Step 3. Measurement

    • Have the coders code (independently) the full dataset. This can include the training cases.

    • No matter how good the training, coders will disagree. Use majority rule to decide the right answer for each code for each case, creating a case-by-code data matrix. You can record NA when the level of agreement doesn't meet your standards (e.g., for a given case, 5 coders say yes and 4 say no; that bare majority may not be convincing, so you assign no code to that case)

    • Measure and report the inter-coder agreement using Cohen's Kappa (or Fleiss' Kappa for more than two coders; use the Pearson correlation coefficient if your codes are numeric, as in "amount of emotion expressed in case"). Exclude the training cases here, since they would artificially inflate agreement

      • Excel file for calculating Kappa is here
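
The majority rule with an NA threshold, and the agreement statistic, can be sketched as follows. The function names and the 0.75 cutoff are assumptions for illustration; averaging Cohen's Kappa over all coder pairs is one common way to summarize more than two coders:

```python
from collections import Counter
from itertools import combinations

def majority_code(votes, min_agreement=0.75):
    """Collapse one code's votes for one case, e.g.
    ['yes','yes','no',...]. Returns the majority value, or None
    (NA) when the winning share falls below min_agreement --
    so a bare 5-4 split stays NA at the 0.75 cutoff."""
    winner, count = Counter(votes).most_common(1)[0]
    return winner if count / len(votes) >= min_agreement else None

def cohens_kappa(coder_a, coder_b):
    """Cohen's Kappa for two coders over the same cases
    (training cases should be excluded upstream)."""
    n = len(coder_a)
    p_obs = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    cats = set(coder_a) | set(coder_b)
    p_exp = sum((coder_a.count(c) / n) * (coder_b.count(c) / n)
                for c in cats)
    return (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 1.0

def mean_pairwise_kappa(coders):
    """Average Cohen's Kappa over all coder pairs -- one simple
    multi-coder summary."""
    pairs = list(combinations(coders, 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)
```

For example, `majority_code(["yes"] * 5 + ["no"] * 4)` returns None under the default cutoff, matching the 5-vs-4 situation described above.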

Step 4. Analysis

    • Merge the resulting dataset with the other information you have collected on the cases, and analyze as usual. For example, you might see if a person's gender predicts the presence of a certain code in their text.
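
As a sketch of this step, assuming the coded matrix and the other case-level data are kept as plain dictionaries keyed by case id (all names here are hypothetical), the merge plus a quick 2x2 chi-square check of, say, gender against presence of a code might look like:

```python
def merge_cases(codes, attributes):
    """Join the case-by-code matrix with other case-level data
    by case id, keeping only cases present in both."""
    return {cid: {**attributes[cid], **codes[cid]}
            for cid in codes if cid in attributes}

def chi_square_2x2(table):
    """Chi-square statistic for a 2x2 table [[a, b], [c, d]],
    e.g. rows = gender, columns = code present / absent."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom
```

A perfectly balanced table gives a statistic of 0 (no association); the more the code's presence tracks one group, the larger the statistic.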