This project analyzes the readability of “State of the Union Addresses and Messages” from 1993 to 2017 to track readability patterns over time and across the four most recent presidents (William J. Clinton, George W. Bush, Barack Obama, and Donald J. Trump).
The readability of “State of the Union Addresses and Messages” varies across presidents but remains consistent (standard deviation of the readability scores less than 0.5) within each president's time in office.
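The consistency criterion above can be checked mechanically once each address has a score. The sketch below is illustrative, not the author's program. It assumes scores arrive as hypothetical `(president, year, score)` tuples and uses the sample standard deviation (`statistics.stdev`). The source does not say whether sample or population deviation is meant, so `statistics.pstdev` may be the better fit. A president with only one address in the range has no defined sample deviation and is skipped.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def per_president_stdev(scores: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """Group readability scores by president and return each group's sample stdev.

    Assumption: input is (president, year, score) tuples. Presidents with
    fewer than two scores are omitted because stdev needs at least two values.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for president, _year, score in scores:
        groups[president].append(score)
    return {p: statistics.stdev(v) for p, v in groups.items() if len(v) >= 2}


def consistency_check(scores: Iterable[Tuple[str, int, float]],
                      threshold: float = 0.5) -> Dict[str, bool]:
    """Map each president to True if their score stdev is below the threshold."""
    return {p: sd < threshold for p, sd in per_president_stdev(scores).items()}
```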
The speech transcripts come from the American Presidency Documents Archive: http://www.presidency.ucsb.edu/sou.php
The “State of the Union Addresses and Messages” from 1993 to 2017 will be copied and pasted from the website as plain text and saved as separate .txt files, each named in the format “PresidentNameYear.txt”.
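Because the president and year are encoded only in the file name, a loader has to parse them back out. The following is a minimal sketch, not the author's code: `parse_filename` and `load_speeches` are hypothetical helpers, and the regex assumes the name is everything before a trailing four-digit year (e.g. `Clinton1993.txt`).

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed layout: "<PresidentName><4-digit year>.txt", e.g. "Clinton1993.txt".
FILENAME_PATTERN = re.compile(r"^(?P<president>.+?)(?P<year>\d{4})\.txt$")


def parse_filename(name: str) -> Tuple[str, int]:
    """Split a file name such as 'Clinton1993.txt' into ('Clinton', 1993)."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match.group("president"), int(match.group("year"))


def load_speeches(folder: str) -> List[Tuple[str, int, str]]:
    """Read every .txt file in the folder and return (president, year, text), sorted by year."""
    speeches = []
    for path in Path(folder).glob("*.txt"):
        president, year = parse_filename(path.name)
        speeches.append((president, year, path.read_text(encoding="utf-8")))
    speeches.sort(key=lambda item: item[1])
    return speeches
```

Sorting by year keeps the output in chronological order, which is what a pattern-over-time comparison needs.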
Downloaded Text Files: https://drive.google.com/open?id=0B2rUC8oOBYH4bU5zQ3BwelAwZ1U
Usage: python3 AoyingHuang_Proj2.py directory_of_rawtext_folder
An error message appears if a required argument is missing from the command line.
sentencecount
This function finds the number of sentence end-start patterns using regular expressions (regex); the total number of sentences equals the number of patterns found + 1. Specifically, the pattern includes the following 5 cases:
textfile.split()
Automated Readability Index (ARI) = 4.71 × (characters / words) + 0.5 × (words / sentences) - 21.43
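The sentence count and ARI steps can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the five end-start cases are not listed here, so `SENTENCE_BOUNDARY` is a single hypothetical pattern (sentence-ending punctuation, optional closing quote, whitespace, optional opening quote, capital letter). Characters are assumed to be letters and digits only. As described above, the sentence total is the number of matches + 1, and words come from `split()`.

```python
import re

# Hypothetical end-start pattern; the original program's five cases may differ.
SENTENCE_BOUNDARY = re.compile(r'[.!?]["\'\u201d]?\s+["\'\u201c]?[A-Z]')


def sentence_count(text: str) -> int:
    """Number of sentences = number of end-start boundaries found + 1."""
    return len(SENTENCE_BOUNDARY.findall(text)) + 1


def word_count(text: str) -> int:
    """Split on whitespace; the length of the list is the word count."""
    return len(text.split())


def character_count(text: str) -> int:
    """Count letters and digits only (assumption), ignoring spaces and punctuation."""
    return sum(ch.isalnum() for ch in text)


def ari(text: str) -> float:
    """Automated Readability Index: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43."""
    words = word_count(text)
    if words == 0:
        raise ValueError("text contains no words")
    chars = character_count(text)
    sentences = sentence_count(text)
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43
```

A single pattern like this over-counts at abbreviations such as "Mr. Speaker", which is presumably why a fuller set of cases is needed in practice.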
The text is split into a list of words with textfile.split(). The length of this list is the total number of words in the text.
Flesch-Kincaid Grade Level (FKGL) = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) - 15.59
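FKGL additionally needs a syllable count, which the text does not describe. The sketch below swaps in a common vowel-group heuristic: count runs of vowels (including "y") and drop a silent final "e" unless the word ends in "le". It is an approximation, not a dictionary-based count, and it undercounts some words. The sentence pattern is the same hypothetical single-case regex an ARI sketch might use, not the program's five cases.

```python
import re

# Hypothetical single-case sentence boundary (the original five cases are not listed).
SENTENCE_BOUNDARY = re.compile(r'[.!?]["\'\u201d]?\s+["\'\u201c]?[A-Z]')
VOWEL_GROUP = re.compile(r"[aeiouy]+")


def syllable_count(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not exact)."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    if not letters:
        return 0
    count = len(VOWEL_GROUP.findall(letters))
    # Treat a final "e" as silent ("make"), but keep "-le" endings ("table").
    if letters.endswith("e") and not letters.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    sentences = len(SENTENCE_BOUNDARY.findall(text)) + 1
    syllables = sum(syllable_count(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59
```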
.txt file.
Python Program: https://drive.google.com/open?id=0BwLqgtoS-HrCWDhCYXhreEFtSEE
For usage instructions, see the comments at the beginning of the Python program.