1. One Variable Introduction

Learning objectives (and summaries)

Understand the difference between different types of data and how they are represented in spreadsheets and graphs.

    • Be able to design and create an online survey with a variety of question types

    • Use Google Forms to create text response (say anything), text response (respond with a number of something), scale, multiple choice, and true/false questions.

    • Interpret the raw spreadsheet of responses with appropriate vocab for rows and columns

    • Identify each row as a person who took the survey (a subject/individual) and each column as a categorical variable (limited options), a quantitative variable (measured/counted number), or a qualitative variable (free-text answers that can be different for every person).

    • Interpret summary results from the survey

    • Be able to understand what the frequency charts, pie graphs, and bar graphs are communicating on the Google Forms summary page. Note that bar graphs and pie graphs can often both be used for the same type of data.

    • Understand why each type of graph is paired with a type of question

    • Scale questions use a bar graph to show the results in order from 1-5 (or whatever numbers your scale uses), multiple choice and true/false questions use a pie graph to show what percentage of responses each option received, and open text responses do not have any graphs. Note that numerical responses can be turned into graphs, just not in Google Forms, and this will be covered in the quantitative distributions module.

    • Describe the key differences between the following question types: open text response, numeric response, Likert scale, multiple choice, true/false or yes/no

    • The more open the response, the larger variety of possible answers, but it is harder to summarize the data. The more closed the response, the easier it is to analyze the data, but the more likely it is that some of the options will not accurately capture the thoughts and feelings of the people taking the survey.

Assessment (10 core points)

    • Test (8pts): 5 multiple choice or numerical questions; 1 of these free response questions (3pts):

      • What are the pros and cons of an open response question? When is an appropriate time to use one?

      • What are the pros and cons of a multiple choice question? When is an appropriate time to use one? How do you display the results with a graph?

      • When you set up a spreadsheet, what do the rows (across) and columns (down) represent? Why is a spreadsheet a good way to store data?

    • Show me completed survey activity (2pts)

Instruction

Printable guided notes: version 1, version 2

Vocabulary

Categorical Variable - A variable that can be classified into two or more categories; this variable does not have a quantity. Ex: yes/no, red/blue, made/missed, etc.

Individuals- items or subjects that are in a study

Subject- an individual that is a human

Qualitative Variable- a variable recorded from an open response question, where each answer is free text and may be different for every individual

Quantitative Variable- A variable whose values are numbers that were measured or counted; Ex: age, weight, a number scale, or an amount of money

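Relative Frequency- the number of times a particular answer occurs divided by the total number of responses; Ex: if 5 of 20 people answer yes, the relative frequency of yes is 5/20 = 0.25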

Variable- what is being measured or used to measure (in future modules, we will see independent vs dependent variables and explanatory vs response variables)
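
To make the vocabulary concrete, here is a minimal Python sketch of a tiny spreadsheet of responses. The survey questions, the answers, and the `classify_column` helper are all made up for illustration, and the helper is only a rough rule of thumb, not a formal definition: all-number columns are called quantitative, columns where every answer is different are called qualitative, and everything else is called categorical.

```python
# Hypothetical survey responses: each dict is one row (one individual/subject),
# and each key is one column (one variable).
responses = [
    {"favorite_color": "red", "num_pets": 2, "describe_your_weekend": "Went hiking with my cousins"},
    {"favorite_color": "blue", "num_pets": 0, "describe_your_weekend": "Mostly homework"},
    {"favorite_color": "red", "num_pets": 1, "describe_your_weekend": "Visited my grandmother"},
    {"favorite_color": "blue", "num_pets": 3, "describe_your_weekend": "Played in a soccer tournament"},
]


def classify_column(values):
    """Rough rule of thumb (an assumption, not a formal rule) for labeling a column."""
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative"   # measured or counted numbers
    if len(set(values)) == len(values):
        return "qualitative"    # every answer is different (open text)
    return "categorical"        # a limited set of repeated options


column_types = {}
for name in responses[0]:
    column = [row[name] for row in responses]  # read down one column
    column_types[name] = classify_column(column)

for name, kind in column_types.items():
    print(f"{name}: {kind}")
```

With only a handful of rows this rule of thumb can be fooled (for example, a multiple choice question where everyone happens to pick a different option), so treat it as a starting point for discussion rather than an answer key.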

In class

    • Use Google Forms to create a 10-question survey in groups of 3. Use at least one of each of the following types of questions: text response (say anything), text response (respond with a number of something), scale, multiple choice, and true/false.

    • After surveying at least 12 people, discuss the spreadsheet of results with teammates. Identify each row as a person who took the survey (a subject/individual) and each column as a categorical variable (limited options), a quantitative variable (numbers), or a qualitative variable (free-text answers that can be different for every person).

    • Look over the Google Forms summary page. Note how different types of data are displayed. Scale questions use a bar graph to show the results in order from 1-5 (or whatever numbers your scale uses), multiple choice and true/false questions use a pie graph to show what percentage of responses each option received, and open text responses do not have any graphs. Note that numerical responses can be turned into graphs, but Google Forms does not yet support this.

    • The more open the response, the larger variety of possible answers, but it is hard to summarize the data. The more closed the response, the easier it is to analyze the data, but the more likely it is that some people will not agree with any of the options or will agree with a couple options equally.
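
The open vs. closed tradeoff in the last point can be sketched in a few lines of Python. The answers below are invented for illustration (six imaginary people answering a yes/no question and an open question about the school library); the only tool used is `Counter` from Python's standard library.

```python
from collections import Counter

# Hypothetical answers from the same six people to two question types.
closed_answers = ["yes", "no", "yes", "yes", "no", "yes"]
open_answers = [
    "I like it because it is quiet",
    "Too crowded at lunch",
    "It's fine",
    "Needs more books",
    "I never go there",
    "Great place to study",
]

closed_summary = Counter(closed_answers)  # one count per option
open_summary = Counter(open_answers)      # one count per distinct sentence

print(dict(closed_summary))  # the whole question fits in a two-row frequency chart
print(len(open_summary))     # as many "categories" as there are people
```

The closed question collapses into two counts that are easy to graph, while the open question has a separate category for every person, which is why it has nothing useful to put in a pie or bar graph.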

DISCUSSION: why different types of questions have different purposes, and why they each are visualized in different graphs (and why some can't be visualized).

Practice

1. Label each of the following as a categorical variable, quantitative variable, individual, or subject.

    • a. A dog

    • b. Weight

    • c. Number of friends

    • d. Your friend Sam

    • e. Preferred political party

    • f. You

2. I took a survey on people’s favorite type of chocolate. Here is the raw data I received back:

Dark, milk, milk, milk, dark, caramel, caramel, almonds, milk, dark, dark, almonds, dark, milk, caramel, dark, dark, caramel, dark, milk, caramel

    • a. Create a frequency chart of this data

    • b. Add a column for relative frequency

    • c. Create a bar graph of the data using relative frequency as the vertical axis label

    • d. Create a pie chart of the data

    • e. If you want to persuade others that dark is more popular than milk chocolate, which graph is more convincing? Explain.

3. A new club has 3 freshmen, 5 sophomores, 11 juniors, and 4 seniors.

    • a. Create a frequency and relative frequency chart of these results

    • b. Create a bar graph of the data using frequency as the vertical axis label

    • c. Create a pie chart of the data

    • d. What do you think each graph best emphasizes? Explain.

    • e. Imagine that the bar graph started with 2 as the lowest frequency instead of zero. How would this change the perception of the graph?

Practice solutions

1. Label each:

    • a) individual (a dog is the thing being studied, not a measurement of it)

    • b) quantitative variable (describes an individual, is a number)

    • c) quantitative variable (describes an individual, is a number)

    • d) subject (an individual who is specifically a human, thus a subject)

    • e) categorical variable (describes a subject, is a set of options)

    • f) subject (an individual who is specifically a human, thus a subject)

2. Mmmmm chocolate:

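    • a/b) Frequency and relative frequency (21 responses total): dark 8 (8/21 ≈ 0.38), milk 6 (6/21 ≈ 0.29), caramel 5 (5/21 ≈ 0.24), almonds 2 (2/21 ≈ 0.10).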

    • c) Notice the label "relative frequency" on the left, the label "chocolate type" on the bottom, the option title below each bar, and the large graph title at the top. All are necessary.

    • d) Notice the labels of each option inside the circle -- you may either do this or create a legend on the side. It is also recommended to mark the percentage in the circle.

    • e) Though not the only acceptable answer, I think the bar graph makes a stronger case than the pie chart for highlighting differences. In a pie chart, it is harder to see small differences.
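
If you would like to check a frequency chart like the one in a/b without counting by hand, here is a minimal Python sketch that tallies the raw chocolate data from the problem. The variable names are my own, and lowercasing the answers is an assumption so that "Dark" and "dark" count as the same category.

```python
from collections import Counter

# Raw survey data copied from practice problem 2.
raw = ("Dark, milk, milk, milk, dark, caramel, caramel, almonds, milk, dark, "
       "dark, almonds, dark, milk, caramel, dark, dark, caramel, dark, milk, caramel")

# Normalize capitalization so "Dark" and "dark" are counted together.
answers = [a.strip().lower() for a in raw.split(",")]

counts = Counter(answers)   # frequency of each chocolate type
total = len(answers)        # total number of responses

# Relative frequency = count for a category / total number of responses.
relative = {choice: count / total for choice, count in counts.items()}

# Print the chart from most to least frequent.
for choice, count in counts.most_common():
    print(f"{choice:<8} {count:>2}  {relative[choice]:.3f}")
```

The relative frequencies should always add up to 1 (or 100%), which is a quick way to catch a counting mistake.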

3. A new club:

    • a) Note how I decided to put these categories NOT in order from most to least frequent. This is because these categories have an inherent order (9th grade, then 10th grade, and so forth). Thus, we logically expect to see graphs with these kinds of ordered categories in their normal order.

    • b) Be aware of the required titles/labels, as a graph without meaning is quite useless. Also notice the order of the categories -- this is important because there is an inherent order in classes (grades 9-12).

    • c) Again, be aware of the required titles/labels. Also notice that for the pie chart, I did put the categories in order of greatest to least.

    • d) Bar graph: juniors are clearly the largest group, then sophomores, then seniors, then freshmen -- the specific order is easy to notice, and so are the counts of students in each class

    • Pie chart: nearly half of the club is juniors, and the other three classes make up roughly equal parts of the remaining space

    • e) If you made the bottom of the bar graph start at 2, the bars would be drawn with heights of 1, 3, 9, and 2 instead of 3, 5, 11, and 4. It would then look like there are twice as many seniors as freshmen, 3 times as many sophomores as freshmen, and 9 times as many juniors as freshmen. This deceives the reader. Unless the goal is to zoom in on a graph or present a biased picture, graphs should start at zero.
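
The arithmetic behind part e can be sketched in Python. The `apparent_ratios` function is a hypothetical helper written for this example: it compares each bar's drawn height (count minus the axis baseline) to the freshmen bar's drawn height.

```python
# Club counts from practice problem 3, kept in grade order.
counts = {"freshmen": 3, "sophomores": 5, "juniors": 11, "seniors": 4}


def apparent_ratios(counts, baseline, reference="freshmen"):
    """Ratio of each bar's drawn height to the reference bar's drawn height."""
    ref_height = counts[reference] - baseline
    return {group: (n - baseline) / ref_height for group, n in counts.items()}


honest = apparent_ratios(counts, baseline=0)     # axis starts at zero
truncated = apparent_ratios(counts, baseline=2)  # axis starts at 2

for group in counts:
    print(f"{group:<10} baseline 0: {honest[group]:.2f}x   baseline 2: {truncated[group]:.2f}x")
```

With the axis starting at zero, the juniors bar is a bit under 4 times as tall as the freshmen bar, which matches the real counts (11 vs. 3). With the axis starting at 2, the same bar looks 9 times as tall.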

Notes

Fix use of ordinal data as quantitative -- see StatTrek's awesome description: http://stattrek.com/statistics/measurement-scales.aspx