This project assesses how pro- or anti-science the Republican and Democratic candidates were in the 2008 primaries.
Turn primary debates into strings
First, I read sixteen Republican debates and sixteen Democratic debates into text files using getTranscript. Each file was named by party and location. Each file was then converted into a string by readFile, which takes a text file and returns its contents as a string. The resulting strings were labeled dem2008_1 through dem2008_16 and repub2008_1 through repub2008_16.
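As a rough illustration of the readFile step, here is a minimal sketch. It assumes the transcripts are UTF-8 text files; the parameter name path and the encoding are assumptions, not details taken from the project code.

```python
def readFile(path):
    """Return the full contents of a text file as one string."""
    # Assumption: the transcript was saved as UTF-8 text.
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Each debate string (for example dem2008_1) would then be the result of calling readFile on the matching transcript file.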
Count references to science and religion
Next, I wrote countScience, which counts the number of times a debate uses scientific rhetoric. The scientific terms are stored in keyScience. countScience takes a string as input. First, the string is converted to lower case so that scientific terms match keyScience regardless of their original capitalization. The string is then split into a list of words, called myList_science. A nested for loop compares each word in myList_science against every word in keyScience. Whenever a word matches, I increase a counter, called science, by one and append the word to a list, called scienceWords. I then wrote a second function, countReligion, which works exactly like countScience but iterates through keyReligion, increases a counter called religion whenever religious rhetoric is used, and appends the word to a list, called religiousWords.
I then tested both functions using a made-up string, testScience, which refers to science twice and religion once. Both functions returned the expected counts when countScience(testScience) and countReligion(testScience) were run.
keyScience = ['science', 'scientific', 'scientist', 'math', 'climate', 'education', 'evolution', 'global warming']
keyReligion = ['religion', 'religious', 'christian', 'muslim', 'faith', 'bible', 'god', 'creationism', 'gospel']
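The counting step described above can be sketched as follows. This is a minimal sketch, not the project's actual code: stripping punctuation from each word and returning both the count and the matched words as a tuple are assumptions made for illustration.

```python
import string

keyScience = ['science', 'scientific', 'scientist', 'math', 'climate',
              'education', 'evolution', 'global warming']

def countScience(text):
    """Count the words in text that match an entry in keyScience."""
    science = 0
    scienceWords = []
    # Lowercase first so matching ignores the original capitalization.
    myList_science = text.lower().split()
    for word in myList_science:
        # Assumption: strip surrounding punctuation so "science." still matches.
        cleaned = word.strip(string.punctuation)
        for key in keyScience:
            if cleaned == key:
                science += 1
                scienceWords.append(cleaned)
    # Assumption: return the count together with the matched words.
    return science, scienceWords
```

In this sketch the text is split on whitespace, so a two-word entry such as 'global warming' can never equal a single token. Counting such phrases would need a different matching strategy, for example searching the lowercased string directly.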
Next, I built a concordance (in a function called buildConcordance) so I could print the context of each reference. This code closely follows code from Homework 2-4. First, I defined two variables, concordance and myWholeList. I made the string lower case, searched it for whole words, and iterated through the matches. For each match, I took the matched string and the position where it was found and stored the word and its position in a new list, called myList. Each myList was then appended to myWholeList, which combines all of the instances into a single list. Finally, for each element in myWholeList, I added the word to the concordance dictionary with its position if the word was not already a key, or appended the position to the existing entry if it was.
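One way to sketch the concordance step is shown below. It assumes that "whole words" are found with a regular expression and that positions are character offsets into the lowercased string. The pattern is an assumption, since the Homework 2-4 code is not shown here.

```python
import re

def buildConcordance(text):
    """Map each lowercase word in text to the character positions where it starts."""
    concordance = {}
    myWholeList = []
    # re.finditer yields one match object per whole word, with its position.
    for match in re.finditer(r"\b\w+\b", text.lower()):
        myList = [match.group(), match.start()]
        myWholeList.append(myList)
    for word, position in myWholeList:
        if word not in concordance:
            # First time this word is seen: start a new list of positions.
            concordance[word] = [position]
        else:
            # Word already has an entry: record the additional position.
            concordance[word].append(position)
    return concordance
```

For a short string such as "God and science. Science!", this sketch records 'science' at positions 8 and 17.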
I then found the context for each instance of rhetoric, whether scientific or religious. findContext takes three arguments: the concordance built by buildConcordance; myList, a list of key words; and the original text. I collected the concordance's keys into a list, keyList, and made the text lower case. Then I iterated through the keys in keyList. As in countScience, a nested for loop compares each key against every word in myList (keyScience or keyReligion). If the key matches one of the words in myList, I add the surrounding string (thirty characters on either side of the "hit") to myFinalList, correcting for hits near the beginning or end of the text. The function then prints the context of each appearance of the key words.
This is again tested using testScience. I use buildConcordance to build a concordance of the locations where keyScience and keyReligion words appear. I then print the output of findContext, which correctly gives two strings when run with keyScience and one when run with keyReligion.
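The context lookup can be sketched as follows. Several details are assumptions: the width parameter (defaulting to the thirty characters mentioned above), the choice to measure the right-hand window from the end of the matched word, and returning the list rather than printing it.

```python
def findContext(concordance, myList, text, width=30):
    """Return the text surrounding each occurrence of a key word in myList."""
    myFinalList = []
    keyList = list(concordance.keys())
    lowered = text.lower()
    for key in keyList:
        for word in myList:
            if key == word:
                for position in concordance[key]:
                    # Clamp the window so hits near either end stay in range.
                    start = max(0, position - width)
                    end = min(len(lowered), position + len(key) + width)
                    myFinalList.append(lowered[start:end])
    return myFinalList
```

Clamping with max and min is one simple way to handle hits near the beginning or end of the string; the window is shortened rather than raising an error or wrapping around.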
Find the data
I define a new function, putScienceTogether, that combines all of the functions above. It takes a string and reports the number of times scientific rhetoric is used and the context in which it is used. It returns var1, the number of times science is referred to, so I can later perform calculations on these numbers. I wrote a matching function, putReligionTogether, for religion.
Lastly, I use putScienceTogether and putReligionTogether to count the number of times science and religion are referred to and to perform calculations on this data. I run putScienceTogether and putReligionTogether on all of the debate strings. Each call returns a number, which I assign to a variable, dvar1 through dvar16 or rvar1 through rvar16. In totalScience and totalReligion, I add up the number of times science and religion are referred to and return that total. Because each party has the same number of debates, these totals can be compared directly.
In analyzeScience and analyzeReligion, I instead look at the average number of times Democrats and Republicans refer to science and religion per debate and calculate the standard deviation of each. I calculate the average by summing the counts for each party, then dividing by the number of debates, 16. I then use a new function, called standardDeviation, to calculate the standard deviation. This function was written from a description of how to calculate the standard deviation and checked by a test function, called testStandardDeviation.
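A standardDeviation function along these lines might look like the sketch below. It assumes the population form (dividing by the number of values); the project may instead have used the sample form, which divides by one less than the number of values.

```python
import math

def standardDeviation(values):
    """Return the population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    # Average the squared distances from the mean, then take the square root.
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return math.sqrt(variance)
```

A check in the spirit of testStandardDeviation could compare the result for a small hand-worked list, such as [2, 4, 4, 4, 5, 5, 7, 9], against the value computed by hand.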