This tutorial uses the heart attack data, which comes with the following description of its variables:
Heart Attack Patients

This set of data is all of the hospital discharges in New York State with an admitting diagnosis of an Acute Myocardial Infarction (AMI), also called a heart attack, who did not have surgery, in the year 1993. There are 12,844 cases.

AGE gives age in years.
SEX is coded M for males, F for females.
DIAGNOSIS is in the form of an International Classification of Diseases, 9th Edition, Clinical Modification code. These tell which part of the heart was affected.
DRG is the Diagnosis Related Group. It groups together patients with similar management. In this data set there are just three different DRGs:
    121 for AMIs with cardiovascular complications who did not die.
    122 for AMIs without cardiovascular complications who did not die.
    123 for AMIs where the patient died.
LOS gives the hospital length of stay in days.
DIED has a 1 for patients who died in hospital and a 0 otherwise.
CHARGES gives the total hospital charges in dollars.

Data provided by Health Process Management of Doylestown, PA.
This is a very large data set, so it is provided as zip files (you may need a program such as WinZip to unzip them). Plain-text (tab-separated) and Excel versions of the data are available.
Getting tables into R can be complicated, so use this file, which contains only the data on the DIED variable. You can use its URL or save it to your hard drive. In RStudio, use the Import Dataset button on the Workspace tab. The data.frame is called DIED4R and it has one variable called V1 because the file has no heading. We could attach the data.frame and use V1 (the first three commands below), or we can access V1 directly (the last command). Either way, let's call the variable died:
> attach(DIED4R)
> died = V1
> detach(DIED4R)
> died = DIED4R$V1
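As a quick check that the two routes are equivalent, here is a minimal sketch using a small made-up data frame as a stand-in for the imported DIED4R. The six values and the names died_attached and died_direct are illustrative only, not part of the real data.

```r
# Stand-in for the imported data frame; these six values are made up.
DIED4R <- data.frame(V1 = c(0, 1, 0, 0, 1, 0))

# Route 1: attach the data frame, copy the column, then detach.
attach(DIED4R)
died_attached <- V1
detach(DIED4R)

# Route 2: access the column directly with $.
died_direct <- DIED4R$V1

# Compare the two vectors; identical() reports whether they match exactly.
identical(died_attached, died_direct)
```

The direct route avoids leaving a data frame attached by accident, which is one reason it is often preferred.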
With R alone, save the file to your hard drive in R's working directory (on some installations this is the directory where the R program is located; getwd() shows it). If you name the file DIED4R.txt, you can use this R command to read in the data:
> died = scan(file="DIED4R.txt")
This puts the data into a variable called "died". Use table() on this variable to get counts if you do not already have them.
> table(died)
1410 of the patients died.
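If you do not have the file at hand, the counts can still be reproduced. The sketch below rebuilds a 0/1 vector from the two numbers reported in this tutorial (1410 deaths among 12,844 cases) and then computes the sample proportion. The names n_cases, n_deaths and counts are chosen here for illustration, and the order of the values in the rebuilt vector is arbitrary.

```r
# Rebuild a 0/1 vector with the counts reported in this tutorial:
# 1410 deaths among 12,844 cases (the order of the values is arbitrary).
n_cases  <- 12844
n_deaths <- 1410
died <- rep(c(1, 0), times = c(n_deaths, n_cases - n_deaths))

counts <- table(died)   # counts of 0s and 1s
counts[["1"]]           # number of deaths
mean(died)              # sample proportion of deaths (mean of a 0/1 vector)
prop.table(counts)      # proportions of 0s and 1s together
```

Because died holds only 0s and 1s, its mean is the proportion of patients who died, which is the quantity tested in the next section.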
Proportion Test and Confidence Intervals
A single command, prop.test(), gives a confidence interval and tests any hypothesized proportion p0 that you specify. Here we compare the observed proportion to a (hypothetical) usual mortality rate of 10%. For the hypothesis test, ignore the X-squared value and use the p-value.
> prop.test(1410,12844,p=0.1)

        1-sample proportions test with continuity correction

data:  1410 out of 12844, null probability 0.1
X-squared = 13.5385, df = 1, p-value = 0.0002337
alternative hypothesis: true p is not equal to 0.1
95 percent confidence interval:
 0.1044507 0.1153421
sample estimates:
        p
0.1097789
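The pieces of this output can also be pulled out individually. The following is a sketch, not part of the original tutorial: it stores the test result in a variable (the names x, n, p0, result and x_squared are chosen here for illustration) and recomputes the X-squared statistic by hand, assuming the usual continuity-corrected form for a single proportion.

```r
# Counts taken from the tutorial.
x  <- 1410    # deaths
n  <- 12844   # cases
p0 <- 0.1     # hypothesized mortality rate

result <- prop.test(x, n, p = p0)

result$estimate   # sample proportion
result$p.value    # two-sided p-value
result$conf.int   # 95 percent confidence interval

# X-squared with continuity correction, computed by hand:
# (|x - n*p0| - 0.5)^2 / (n * p0 * (1 - p0))
x_squared <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

# Compare the hand computation with the value reported by prop.test().
all.equal(unname(result$statistic), x_squared)
```

Since the p-value is far below 0.05 and the confidence interval lies entirely above 0.1, the data are not consistent with a 10% mortality rate.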
Exercises
1. Make a guess as to the proportion of states that voted Democrat in the 1996 Presidential election. Now test your guess using the table() and prop.test() commands introduced in this tutorial. Was your guess within the 95% confidence interval?