Q3 Knowledge Base Pearson r

The Pearson Correlation Coefficient (Pearson r)

Researchers test theories and make predictions. How is a theory tested? First, the theory must be operationalized. That is, evidence must be weighed so conclusions can be drawn as to whether the theory is true. The evidence comes in the form of data. Data are collected most frequently by enumerating, counting, or observing characteristics and traits of "subjects." In other words, data are collected on variables.

 Example: A juvenile probation officer has witnessed the behavior of hundreds of juvenile offenders over the years and has come to identify a number of characteristics that she believes are common to these delinquents. They include: single-parent households, drug use, low self-esteem, low GPA in school, high tardiness and absence rate, poor diet, lack of routine or schedule, no work assignments or responsibilities, abhorrent television programming, and socialization with misfits.

This officer was interested in conducting a study to determine if, and how strongly, these characteristics and attributes are related to delinquency. She created a questionnaire and surveyed a number of at-risk teens who had just turned 18 years-of-age. The questionnaire is a survey instrument called the Delinquency Profile Test (DPT). It asks several questions that summarize those behaviors and characteristics. 

 Each questionnaire produced a score that she believed would reflect how likely a juvenile is to have been involved with the juvenile justice system during their high-risk years. This officer was particularly concerned about recidivism. She theorized the more of these characteristics that a teen possesses, the more times that juvenile will have been arrested during those at-risk years.

 Theory: Persons with high DPT scores will have more arrests 

 The theory is an abbreviated way of presenting the idea that persons with high Delinquency Profile Test (DPT) scores are most likely to have acquired a greater number of arrests than those with low DPT scores. What is the basis for that theory? The questions on the DPT are designed to create a profile of each teen. The test amounts to what is typically called, in the Criminal Justice sciences, a composite measure. A composite measure is a group of variables in a survey format that are scored to reflect a propensity toward a particular behavior, in this case, delinquency. Each question is worth a certain amount of points, and the sum of the points is that person's score. The higher the score, the more the teen is at risk for delinquency.  

 Dependent variable: Juvenile delinquency (as measured by times arrested)

 Independent variable: DPT score. 

 In fact, all of the questions on the questionnaire are independent variables. In this case they have been grouped into a "composite" measure called the "Delinquency Profile Test (DPT)." Therefore, the independent variable is the DPT score.

 After the data are collected, the first step in the analysis is to compute the descriptive statistics. The dependent variable is the outcome or effect: times arrested, a metric variable. The independent variable, the so-called "cause", is DPT score, a metric variable. When the data-types of the dependent variable and independent variable are both metric, the researcher is conducting a relationship study. 

 As previously mentioned, the standard deviation not only provides an idea of how spread out the scores are, but also informs the researcher where a given person or subject stands in relation to the mean, and all other persons in a population (in terms of percentile rank). When a researcher is investigating a relationship between two metric variables to test a theory, she or he uses a single number that performs a similar function: the Pearson Correlation Coefficient (Pearson r).

 Section 2 -- Describing Two Metric Variables Using a Single Number

 The Pearson Correlation Coefficient (Pearson r) is a single number that describes (1) the direction (positive or negative), and (2) the strength of the relationship between two metric variables. 

 A.  The direction of the relationship between two metric variables

 Pearson r describes how two metric variables "go together" in terms of the so-called "cause/effect" interaction that is implied by the theory. Pearson r shows the researcher if the variables move together or apart. They move together when high levels of one variable are accompanied by high levels of the other, a positive relationship. They move apart when high levels of one variable are accompanied by low levels of the other, a negative relationship.

 The direction of the relationship is revealed to us by the sign (+ or -) of Pearson r. If teens with high DPT scores have high arrests, the relationship is positive and Pearson r will be positive. If high DPT scores are associated with low arrests, the relationship is negative and Pearson r will be negative. If there is no particular pattern of high with high, or low with high, then Pearson r will be near zero. It may be a positive or negative number, but if it is close to zero, the relationship will be so weak as to be almost meaningless.

 The officer theorized that among this group of at-risk teens, high DPT scores will be associated with high number of arrests, and low DPT scores will be associated with low number of arrests.

 What is the null hypothesis?

--------------------------------------------------------------------------------------

There is not a statistically significant relationship between times arrested and DPT score.

 B.  The Strength of the relationship between two metric variables

 Pearson r informs the researcher about not only the direction of the relationship, but also the strength of the relationship between the two metric variables. Whenever there is a correlation between the independent variable (X) and the dependent variable (Y), it can be said that X contains some information about Y. Pearson r will always be a number between -1 and +1. The strength of the relationship is determined by how close Pearson r is to either +1 or -1. The closer Pearson r is to zero, the weaker the relationship. When Pearson r = zero, there is no linear relationship. Pearson r describes how closely the values of the independent variable go together with the values of the dependent variable.

 What does it mean "how closely the values go together?" Let's say the juvenile probation officer randomly selected 7 of the 18-year-olds from the sample of at-risk teens who had filled out the DPT survey. The process of operationalization started with the formulation of the theory, and the creation of the survey. The survey, in essence, tells the researcher two things: DPT score and times arrested. 

 To help visualize, it is as if the researcher lined all 7 teens up from left to right and went from person to person asking two questions: "What is your DPT score?" and "How many times have you been arrested?" If the relationship between DPT scores and "times arrested" is positive, teens with high DPT scores would have high arrest numbers (direction). If the relationship was strong, the correlation between the two variables would be close to +1 (strength). 

 If all 7 teens were lined up from left to right, and the researcher started at one end asking those two questions, we would expect the two variables to "move together" from one teen to the next: high DPT scores go with high arrests.

 Section 3 -- Describing Two Metric Variables Using a Graph

 How can this study be presented to the research community visually? As described earlier, researchers make use of a tool to display the data: a graph. The scatterplot is a graph used to visually represent two metric variables at once, similar to the way a frequency polygon represents one metric variable. With the dependent (Y) variable on the vertical axis (called the ordinate) and the independent variable (X) on the horizontal axis (called the abscissa), a "dot" is placed at the intersection of each subject's scores on the two axes.

 The scatterplot shows the relationship to be positive when the scores tend to create the pattern around a line stretching from lower left to upper right, and negative when the pattern is from upper left to lower right. Since this study predicts a positive relationship, high DPT scores are accompanied by high arrests, the pattern is from lower left to upper right.

Note how the scores reveal a pattern around the dotted line. There are two important points to be noted about what the scatterplot reveals about the relationship between the two variables. They are exactly the same things that Pearson r tells the researcher: the strength of the relationship (whether it is strong or weak) and the direction of the relationship (whether it is positive or negative).

 

The direction of the relationship is revealed by the pattern of how the points on the scatterplot tend to fall. Either the points tend to fall from lower left to upper right (positive), or they fall from upper left to lower right (negative). How does a scatterplot show the strength of the relationship? By how tightly the dots are congregated around the center of the pattern. Examine the sample scatterplots below and note how the dots in a strong relationship tend to follow a tighter pattern than dots in a weak relationship.

Describing Two Metric Variables Using a Graph

 1.  We used single numbers (the mean, the standard deviation, etc.) and graphs (the frequency polygon, the ogive, and the histogram) to describe a single metric variable. These “tools” helped us make sense of a collection of raw scores that, by themselves, are confusing and unwieldy. 

 2.  Pearson r does the same job of organizing two metric variables. Please be mindful, however, that we are striving to accomplish the mission of social science: to explain and predict. Therefore, describing and organizing a pair of metric variables is not an end in itself, but a tool we use to achieve a greater good: Provide explanations for social conditions.

 3.  The bottom line is this: It is a wonderful thing to have the power of mathematics to dazzle and amaze the intellect, but when it comes time to contribute to the understanding of the human social condition, statistics for the Criminal Justice sciences is quite distinct from math and mere “numbers on paper”; it is the key to efficacy in the creation of social policy.

Procedure for testing Pearson r for significance 

The purpose of testing Pearson r for significance is to ascertain if the theory is validated by the data, or if the observed relationship occurred by chance. If the theory can be validated by the data, a repetition of the study would yield similar results. The theory is being compared to the null hypothesis.

1.  Theory: Formulate the theory, and then create a linear model expressing the theory as an equation: (r ≠ 0). The equation conveys this concept: "there is a (positive or negative) correlation between the independent and dependent variables and the correlation is not equal to zero."

2.  Null: Formulate the Null, and express it as an equation:  (r = 0). The equation conveys this concept: "The correlation is not different from Zero."

3.  Results: Calculate the degrees of freedom: df = (N - 2), where N is the number in the sample for which there is complete data. (Think of it as the number of subjects or participants who answered both questions on the questionnaire.) Locate rcrit or Sig. (2-tailed), and test Pearson robt. What you are doing is ascertaining the probability of obtaining a Pearson robt this size by chance using either the “Critical Values of Pearson r” chart, or “Sig. (2-tailed)” on the SPSS output. Write the results of the test of the null hypothesis. The null will either be retained or rejected.

a.  When no output is available, use the "Critical Values for Pearson r" chart in Appendix 3: The ABSOLUTE VALUE of the obtained Pearson r (“robt”) must equal or exceed the value of the critical value of r (“rcrit”) at the appropriate df level. If the absolute value of r is equal to, or greater than, rcrit, reject the null. 

b.  Using an SPSS Output: The probability (Sig. 2-tailed) must be less than .05. If it is, reject the null. If the probability (Sig.) is greater than or equal to .05, retain the null. (Note: SPSS computes the exact probability and calls it “Sig. 2-tailed”. Therefore, when writing the analysis based on results from an SPSS printout, show the exact probability.)

4.  Analysis: Observe the Pearson r coefficient between the independent variable and the dependent variable. Be aware of the direction of the relationship. Write the analysis. Read the null hypothesis and note how the results compare to the theory. If the relationship is statistically significant, calculate the Magnitude of Effect (MOE), r². Include the strength of the variance explained and the variance that remains unexplained.
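The decision rules in steps 3a and 3b can be sketched in a few lines of Python. This is only an illustration, not part of the procedure itself: the function names are hypothetical, and the numbers plugged in (robt = .801, N = 7, rcrit = .754, Sig. = .030) are the ones that appear in the worked example elsewhere in this document.

```python
def degrees_of_freedom(n):
    """df = N - 2, where N is the number of subjects with complete data."""
    return n - 2


def decide_by_chart(r_obt, r_crit):
    """Step 3a: reject the null when |r_obt| equals or exceeds r_crit."""
    return "reject" if abs(r_obt) >= r_crit else "retain"


def decide_by_sig(sig_two_tailed, alpha=0.05):
    """Step 3b: reject the null when Sig. (2-tailed) is less than alpha."""
    return "reject" if sig_two_tailed < alpha else "retain"


df = degrees_of_freedom(7)                     # 7 subjects in the example
decision_chart = decide_by_chart(0.801, 0.754)  # chart route
decision_spss = decide_by_sig(0.030)            # SPSS route
print(df, decision_chart, decision_spss)
```

Both routes answer the same question, so for the same data they should lead to the same decision; the chart route only reports p < .05, while the SPSS route reports the exact probability.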

How what is Already Known about Describing Metric Data is Extended to Two Variables Simultaneously

1. We learned how to compute the standard deviation. The only reason we went to those lengths was to prepare for this comparison between the standard deviation and Pearson r. We spent a great deal of time emphasizing that the standard deviation is the most powerful tool we have with metric data. It is a single number that shows how much scores deviate from the mean.

2. We are now working with actual theories that make guesses about the relationship between two metric variables. It is extremely convenient that we are already so informed about the magic of the standard deviation, because we find Pearson r is basically the same thing (only with two variables at the same time).

3.  The bottom line is this: The more we can build on what we already know, the less time we have to spend on the learning curve associated with new material. The first step in testing a theory that involves a metric variable is to discover how much scores deviate from the mean. The standard deviation informs us about one metric variable; Pearson r does the job using two variables simultaneously. One standard deviation means one Z-score. Pearson r shows how many Z-scores the dependent variable moves for every one Z-score movement in the independent variable.

A review of what we know to be true is in order

1.  In the study, each person has a raw score on two variables. Therefore, each subject has a pair of raw scores, and hence, a pair of Z-scores. 

2.  When the raw scores are summed, and divided by the number of scores, the mean is the result. The mean is a single number that conveys the "centrality" of the data-set. 

3.  The standard deviation shows how much the raw scores vary from the mean, "on average," for any one variable.

4.  When the products of the paired Z-scores are summed, that is, when all cross-product deviations (Zx * Zy) are summed, and the sum is then divided by N-1, Pearson r is the result.

5.  Pearson r is like an average of the Z-scores, just like the mean is an average of the raw scores, and the standard deviation is an average of the deviations from the mean. 

6.  However, the mean is only a measure of one variable, and the standard deviation is only a measure of one variable. Pearson r is a measure for two variables. Collectively, the sample is comprised of several pairs of Z-scores. Multiply the Z-score of one variable by the Z-score of the other variable (cross product deviation), sum all the cross product deviations, and divide by N-1; the result is a single number that shows how two variables go together: Pearson r.
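The review above can be traced step by step in a short Python sketch. The function names and the seven pairs of scores are hypothetical and made up purely for illustration; they are not data from the officer's study. The sample standard deviation (dividing by N-1) is assumed, to match the N-1 divisor used for Pearson r.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert raw scores to Z-scores using the sample standard deviation (N - 1)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Sum the cross-products of the paired Z-scores and divide by N - 1."""
    zx = z_scores(x)
    zy = z_scores(y)
    cross_products = [a * b for a, b in zip(zx, zy)]
    return sum(cross_products) / (len(x) - 1)


# Hypothetical scores for 7 teens (illustration only, not study data).
dpt_scores = [12, 15, 9, 20, 7, 17, 11]
times_arrested = [2, 3, 1, 5, 0, 3, 2]

r = pearson_r(dpt_scores, times_arrested)
print(round(r, 3))
```

Because high scores are paired with high arrest counts in this made-up sample, most cross-products are positive and the resulting r is positive. Reversing the order of one list would flip the sign.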

A Conceptual Explanation for Pearson r

The bottom line is this: The basic concept of Pearson r is the average of the cross-product deviations. It shows how two variables vary from their respective means together. It shows how strongly they are related and the direction of that relationship.

Testing Pearson r for Statistical Significance

The juvenile probation officer who conducted this study, like any social scientist, is being asked to provide evidence that supports her theory. Many confuse this notion by thinking she is obligated to "prove" her theory. She can't "prove" anything. Why? Because anything can happen by chance; but she can demonstrate that a relationship as strong as the one she observed would be unlikely to occur by chance alone. How? She tests it for statistical significance using the null hypothesis.

Why is the null hypothesis used, and why can't the theory be tested directly? As explained earlier, the null is always stated as "no relationship" or "no difference." By rejecting the null (by saying that the null is false) nearly all “chance” is being removed. With the prospect that “chance” has been minimized, there is only one thing left to conclude: a relationship exists. Remember, not all chance is being removed, but enough random chance, or, if you will, enough luck can be disqualified from reasonable and prudent consideration to justify the notion that the theory is supported based on the collected data.

When a social scientist is asked to test a relationship, she or he uses data obtained in the sample to determine if the relationship is due to chance in the population. If it is discovered that, by using data from the sample, the likelihood of this relationship occurring by chance is less than five times in one hundred (.05), then the relationship is statistically significant.

The officer took a sample of 7 juvenile offenders and data was gathered on their “times detained” and their “Delinquency Profile scores.”  The computed Pearson r (robt) was .801.     

Null hypothesis (H0):  There is not a statistically significant relationship between times detained and Delinquency-Profile test scores.

The test of the null hypothesis will determine if .801 is different from zero. What does the statement “is Pearson r different from zero” mean? A strong positive Pearson r is near +1, a strong negative Pearson r is near -1. Anywhere in between -1 and +1 could be strong or weak depending on a researcher's perspective. Social scientists must have a concrete determination that is standardized and agreeable to all, in order to reveal the truth about the theory. That is why, in the Criminal Justice sciences, there is a threshold below which Pearson r is not strong enough to be of value. The name for that threshold is "statistical significance".

Since it was previously determined that the acceptable element of chance was less than 5 times out of 100, it is declared that “the significance level is less than .05” (p < .05).  In order to know if Pearson r meets the “standard” that it is being held to, it must be compared to a “critical” value. If the absolute value of Pearson r (robt) falls below that critical value (rcrit), Pearson r is not strong enough and is considered meaningless. If it is meaningless, it is said that the relationship occurred by chance. 

What is the critical value of r (rcrit), and how is it found? The critical value of Pearson r is found in a chart. The values of rcrit shown in the chart are only positive. Therefore, use the "absolute value" of Pearson r when comparing Pearson r to rcrit. (NOTE: the term "absolute value" means to ignore the + sign or - sign of the statistic.)

However, rcrit is dependent upon the sample size used in the study. For every different sample size, there is a different rcrit. The chart lists the critical values of Pearson r for a range of sample sizes, indexed by a statistical term known as the "degrees of freedom" (df).

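The degrees of freedom (df) are calculated by subtracting the number of variables from the total number of subjects. In a bivariate relationship there are always two variables, so the formula is always (N - 2), because there are "N" subjects and two variables. Therefore in this example df is 5 (7 - 2). At df = 5, rcrit is .754. Because the obtained Pearson r of .801 exceeds .754, the null is rejected: the relationship is statistically significant and is unlikely to have occurred by chance.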

The fact that it did not occur by chance means that out of all the people in the population that could have been in the sample, it was not just a coincidence that persons who were included in this sample with high DPT scores also happened to have high times detained. In other words the obtained Pearson r was not just "luck of the draw." Therefore the relationship was not "by chance."

The term "luck of the draw" illustrates a great point about what the term "by chance" actually means. Two people are playing a card game. The winner is the person who draws the high card. Using an honest deck, one person draws a "10" and the other draws a king. The king wins. Why? The person with the king "beat the odds" and won "by chance." 

It could have happened that, out of all people in the population, the 7 juveniles that were drawn for the sample had high times detained and high DPT scores "by chance." How is it known that this didn't happen by chance? Because the likelihood of that happening is taken into account when robt is compared to rcrit.

Therefore, if Pearson r is found to be statistically significant, the exact probability of selecting juveniles with high times detained and high DPT scores "by chance" has already been "figured in". The researcher can be confident that it was not "dumb luck" that this Pearson r turned out to be this big, and she can convey to the world the truth: The relationship between times detained and DPT scores is statistically significant.

The Magnitude of Effect (MOE) is an indication of how solid the evidence is on which the theory is based. It can demonstrate how much (or how little) of the dependent variable is actually attributed to variations in the independent variable. MOE, therefore, can be used as a safeguard against the notion that the theory has been “proven” by the mere acceptance of the fact that the results are statistically significant. Remember, just because the evidence supports the theory, it may not provide a complete explanation of the phenomenon, nor does it necessarily consist of the only explanation. 

When Pearson r is statistically significant, it shows that the independent variable contains information about the dependent variable, but how does a researcher describe “how much” information is really there? Assuming that the relationship is significant, the researcher shows that what is being explained is the “variability” or “variance” in the dependent variable. She or he does this in terms of “percentage of variance” explained. 

The squared correlation coefficient, r², is called the Coefficient of Determination. It is the proportion of variability in the dependent variable (Y) that can be accounted for through knowledge of the independent variable (X). For example, the correlation coefficient between the variables produced a Pearson r of .80. Then r² was equal to .64. Therefore, it can be said that 64% of the variability in times detained is accounted for by the DPT scores. By the same token, 1 - r² produces the Coefficient of Alienation. In other words, 36% of the variability is due to something else (error). r² = 64%; (1 - r²) = 36%.
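The arithmetic in this paragraph is small enough to check by hand, but a brief Python sketch makes the two quantities explicit. The function name `magnitude_of_effect` is hypothetical; the input is the rounded Pearson r of .80 used in the example.

```python
def magnitude_of_effect(r):
    """Return (explained, unexplained) proportions of variance for a Pearson r."""
    explained = round(r * r, 2)            # Coefficient of Determination
    unexplained = round(1 - explained, 2)  # Coefficient of Alienation
    return explained, unexplained


explained, unexplained = magnitude_of_effect(0.80)
print(f"explained: {explained:.0%}, unexplained: {unexplained:.0%}")
```

Note that the sign of r drops out when it is squared, so a correlation of -.80 would account for the same share of variance as +.80.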

Let's look at the example: A juvenile probation authority wanted to target "kids-at-risk" to ascertain which juveniles were more likely to get in trouble. She devised a "Delinquency-Profile" test comprised of several factors commonly associated with children-at-risk and administered it to inner-city youths. She also obtained the arrest records of a randomly sampled group of 7 juveniles. She theorized that repeated delinquency is associated with elevated risk factors among at-risk youths. That theory is being compared to the null hypothesis (H0). If H0 is rejected, there is only one remaining possibility -- a relationship exists. We operationalize the study.

If the null hypothesis is rejected: The relationship is statistically significant, it did not occur by chance. If the null were true, the probability of getting a correlation coefficient this size by chance is less than 5%. 

If the null is retained: The relationship is not statistically significant. If the null were true, the probability of getting a correlation coefficient this size by chance is 5% or greater, which is too high to rule out chance. This outcome leads to the conclusion that the null cannot be rejected.

   Analysis: Write the analysis. Read the null hypothesis and note how the results compare to the theory. If the relationship is statistically significant, calculate Magnitude of Effect (MOE). Always include the strength of the variance explained and the variance that remains un-explained.

r² = r * r

r² = .80 * .80

r² = .64

To conform with APA guidelines, include the statistic being tested, the degrees of freedom in parentheses, and the probability in the narrative. The reporting differs slightly depending on whether a chart of critical values or computer software was used:

Using critical values of r:  r(5) = .801, p < .05

Using an SPSS output:  r(5) = .801, p = .030

Example: The theory is supported. There is a strong positive correlation between Delinquency-Profile test scores and number of times detained. The correlation is statistically significant: r(5) = .801, p < .05 . Sixty-four percent (64%) of the variance in times detained was accounted for by Delinquency-Profile test scores, leaving  36% of the variance accounted for by other factors.
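For readers curious where an exact probability such as the one on an SPSS printout comes from, the sketch below reproduces it under a stated assumption: that the significance test converts Pearson r to a t statistic with df = N - 2 (the standard textbook equivalence; SPSS internals are not described in this document). The function names are hypothetical, and the t distribution is integrated numerically with Simpson's rule so that only the Python standard library is needed.

```python
import math


def t_from_r(r, df):
    """Convert Pearson r to the equivalent t statistic (requires |r| < 1)."""
    return r * math.sqrt(df) / math.sqrt(1 - r * r)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(r, df, steps=2000):
    """Two-tailed probability of an |r| this large when the null is true.

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even), then takes what is left in both tails.
    """
    t = abs(t_from_r(r, df))
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area


p = two_tailed_p(0.801, 5)
print(round(p, 3))
```

With r = .801 and df = 5 the result should agree, to three decimals, with the p = .030 shown for the SPSS output above.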

Testing Pearson r for Statistical Significance

1.  We are being asked to accept a theory that high DPT scores go “hand in glove” with high arrest rates. High DPT scores mean a juvenile has many “risk-factors.” The researchers are evaluating the evidence that can either support that theory, or discredit it based on the notion that the youths that were drawn in the sample were persons with high times detained who “just happened” to be persons who have a large number of risk-factors also (“by chance”).

2.  It will be easy to demonstrate that the theory is erroneous if we find that it would be common to draw 7 persons randomly, who have high times detained, but who have few or no risk factors. Such a population would produce very few, if any samples that have a Pearson r coefficient of .754 (rcrit) or greater. 

3.  Therefore, if the null were true and there were no relationship in the population, a sample of seven persons would produce a Pearson r of .754 or greater (in absolute value) fewer than five times out of one hundred. Restated, for every 100 times we randomly draw a sample of juvenile delinquents (n = 7) from such a population, about 95 times the robt will fall below .754.

4.  The bottom line is this: the null is retained when robt is .753 or less, because a correlation that small would turn up by chance more than 5 times out of 100. The null is rejected at robt = .754 or greater, because we think it would be highly unlikely to compute an robt that large if the null were true.
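The "times out of 100" reasoning in the points above can be explored with a small simulation. This is a sketch under stated assumptions, not part of the original study: the two variables are drawn as independent normal random numbers (so the null is true by construction), the function names are hypothetical, and the seed and number of trials are arbitrary choices.

```python
import math
import random


def pearson_r(x, y):
    """Pearson r from deviations about the means.

    Algebraically the same as summing Z-score cross-products
    and dividing by N - 1.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def share_reaching_rcrit(n=7, r_crit=0.754, trials=20000, seed=1):
    """Share of null-true samples whose |r| equals or exceeds r_crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Two unrelated variables: the population correlation is zero.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(x, y)) >= r_crit:
            hits += 1
    return hits / trials


share = share_reaching_rcrit()
print(share)
```

When the null is true, the printed share should land close to .05: only about 5 samples in 100 reach rcrit by luck alone, which is why an robt of .754 or greater is treated as evidence against the null.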