Transcribing the 25 audio files was at once thrilling, peppered with possible insights, and a long, repetitive, often tedious task. I was initially concerned that 25 interviews would not generate enough data from which to discover any unintended consequence-based theories, but my concerns were allayed by my first supervisor, who assured me that this was an adequate number. A conversation with a colleague carrying out an ethnographic study as part of his own doctoral research also helped: he had carried out a dozen interviews and was satisfied with that number. I also had to consider the eight participant journals I had collected, though transcribing written documents simply required format changes rather than accuracy checking.
Nineteen participant interviews were carried out via Zoom, an online meeting tool that automatically generates a full transcript of the meeting. While this automated system is useful and certainly saves a considerable amount of time, it is not completely accurate. The remaining six interviews were carried out face to face and recorded with a Dictaphone; these recordings were then uploaded to the university's lecture capture tool, Panopto, which, like Zoom, generates textual transcriptions within the tool. The process used to prepare each transcript for coding was as follows:
Download each automated transcript and copy and paste the content into a Microsoft Word document for ease of editing.
Listen to each audio file while reading through its transcript to check for accuracy, pausing the audio to delete extraneous utterances and fillers such as ‘like’ (‘…and I was, like, playing a lot of games…’). Tools such as find and replace or find and delete could not be used reliably: removing all instances of the word ‘like’ would certainly strip out fillers, but would also delete contextual uses, so that sentences such as ‘I don’t really like playing after 10pm’ would lose their meaning (‘I don’t really playing after 10pm’). Words or phrases that the transcription tools could not identify, or had misidentified, required rewinding and relistening, and, where careful relistening did not help, contextual linguistic knowledge to fill in the gaps. Much of the language used fell within the lexicon of gamers, which Zoom and Panopto often, and understandably, misidentified; when this occurred, both tools selected the ‘next best guess’, so that words like ‘cheesing’, ‘janky’, and ‘nerfing’ were replaced with similar-sounding but context-free alternatives. Knowing the words likely to arise in such a conversation allowed me to read the words either side of the misidentified one and determine what the participant had actually said.
Interview preamble and postamble were cut, unless they suggested the need for further analysis.
Each transcript was then formatted in a sans serif font for ease of reading, set at font size 12 for parity across transcripts, fully justified, with my questions in bold and responses in plain text. Lines were set at 1.5 spacing, again for ease of reading.
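The limitation of blanket find-and-delete described above can be sketched in a few lines of Python. This is a minimal illustration, using invented example utterances, of why removing every instance of ‘like’ strips fillers and contextual uses alike:

```python
import re

# Two hypothetical utterances: the first uses 'like' as a filler,
# the second uses it contextually (meaning 'enjoy').
utterances = [
    "and I was, like, playing a lot of games",
    "I don't really like playing after 10pm",
]

def strip_like_naively(text: str) -> str:
    # Delete every 'like' along with any surrounding commas and spaces,
    # with no regard for how the word is being used.
    return re.sub(r",?\s*\blike\b,?\s*", " ", text).strip()

for u in utterances:
    print(strip_like_naively(u))
# The first sentence survives intact, but the second loses its meaning:
# and I was playing a lot of games
# I don't really playing after 10pm
```

Because the tool cannot distinguish the two uses, each candidate deletion had to be checked by ear against the recording rather than applied in bulk.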
A number of images were also submitted as part of participants’ journal entries; these can also be coded using NVivo, software developed specifically for coding in qualitative research. Transcripts were uploaded to separate folders in NVivo in preparation for the coding process, titled ‘Interviews’, ‘Journals’, and ‘Images’.
All 33 complete transcripts are now ready for the next stage of research: initial line-by-line and incident-with-incident coding.
Two of my participants have very kindly given me permission to upload their (anonymised) transcripts so that anyone who is new to the coding process and wants to practise can download and use them. You can find them below.