Post date: Nov 13, 2010 1:43:36 PM
The idea here is to measure a set of cases on a set of variables, where the data for each case is some kind of text. The variables are codes that have been obtained earlier, either derived from theory or via an open-ended coding process. The data should be coded by multiple coders who are unfamiliar with the theory and hypotheses of the study. The process has four main steps, plus the data preparation described in Step 0.
Step 0. The Data
Start with a set of comparable texts, normally one text per case, where a case may be a person, firm, country, or whatever the research subject is.
Examples include obituaries, personal ads, horoscopes, diaries, essays on the same topic ("what I did last summer"), commercials, web pages, etc.
Step 1. Initial Codebook
Start by creating an initial codebook to be used as a guide by the coders.
Each code should have a short name, a paragraph-long description, and a bit of text that exemplifies the code (a sketch follows below).
The codes can come from theory or from a previous open-ended coding process (e.g., a grounded theory process).
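One way to store the codebook, as a minimal sketch: it assumes you keep the codebook in a CSV file, and the file name, field names, and the two example codes are all hypothetical.

```python
import csv

# Each codebook entry: a short name, a paragraph-long description,
# and an exemplifying bit of text. The codes shown are made up.
codebook = [
    {"code": "optimism",
     "description": "The text expresses a positive outlook about future "
                    "events, e.g., expectations of personal success.",
     "example": "I know next year is going to be our best yet."},
    {"code": "kin_mention",
     "description": "The text refers to family members of the subject.",
     "example": "She is survived by her two daughters."},
]

with open("codebook.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["code", "description", "example"])
    writer.writeheader()
    writer.writerows(codebook)
```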
Step 2. Training
Take a set of, say, 20 cases, and have the coders code them independently.
Now bring the coders together to discuss discrepancies (a sketch for flagging them follows below). Clarify the code descriptions as needed until all coders can agree across the 20 cases.
Now repeat these two steps (independent coding, then discussion) with a fresh batch of 20 cases.
Construct a final version of the codebook.
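To focus the discussion, it helps to list exactly where the coders disagreed. A minimal sketch, assuming each coder's judgments are stored as a dict keyed by (case, code); the data and names are illustrative.

```python
# Flag the case/code pairs on which two coders disagree, so the
# training discussion can concentrate on them.
coder_a = {(1, "optimism"): 1, (1, "kin_mention"): 0, (2, "optimism"): 0}
coder_b = {(1, "optimism"): 1, (1, "kin_mention"): 1, (2, "optimism"): 0}

discrepancies = [key for key in coder_a
                 if key in coder_b and coder_a[key] != coder_b[key]]
print(discrepancies)  # [(1, 'kin_mention')]
```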
Step 3. Measurement
Have the coders code the full dataset. This can include the training cases.
Use majority rule or a simple average to decide the right answer for each code for each case, creating a case-by-code data matrix.
Measure and report the inter-coder agreement using Cohen's kappa or the Pearson correlation coefficient; a sketch covering both this and the majority-rule step follows below. Exclude the training cases here, since they would artificially inflate agreement.
An Excel file for calculating kappa is here.
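A minimal Python sketch of both the majority-rule step and the agreement calculation, as an alternative to the Excel file. It assumes binary codes, three coders, and made-up data; pairwise Cohen's kappa comes from scikit-learn and the Pearson correlation from SciPy.

```python
import pandas as pd
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Three coders' judgments on one binary code across five cases; real
# data would have one such table per code. Values are made up.
ratings = pd.DataFrame(
    {"coder1": [1, 0, 1, 1, 0],
     "coder2": [1, 0, 0, 1, 0],
     "coder3": [1, 1, 1, 1, 0]},
    index=["case1", "case2", "case3", "case4", "case5"])

# Majority rule: with an odd number of coders and binary codes, the
# mean exceeds 0.5 exactly when most coders said 1. For quantitative
# codes, use a simple average (ratings.mean(axis=1)) instead.
consensus = (ratings.mean(axis=1) > 0.5).astype(int)

# Agreement on the non-training cases only (pretend case1 was used
# in training). Cohen's kappa is pairwise; with more than two coders,
# report each pair's kappa or their average.
test = ratings.drop(index=["case1"])
kappa = cohen_kappa_score(test["coder1"], test["coder2"])
r, _ = pearsonr(test["coder1"], test["coder2"])
print(consensus.to_dict(), kappa, r)
```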
Step 4. Analysis
Merge the resulting dataset with the other information you have collected on the cases, and analyze as usual
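A minimal merge sketch in pandas, assuming the case-by-code matrix and the other case-level data share an identifier column; all file and column names are hypothetical.

```python
import pandas as pd

codes = pd.read_csv("case_by_code_matrix.csv")  # one row per case
other = pd.read_csv("case_attributes.csv")      # e.g., demographics
merged = codes.merge(other, on="case_id", how="inner")
merged.to_csv("analysis_dataset.csv", index=False)
```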