What Is 1.1 Statistical Enquiry All About?

Mathematics 1.1

Achievement Standard 91944

Explore data using a statistical enquiry process

5 Credits

Internal

This assessment is about students exploring data using a statistical enquiry process with statistical insight.

This page covers each of the skills below and how to write the assessment.

Assessment Format

This standard is assessed through a short report (800 words maximum) completed in class time.

This report is digital and uses NZ Grapher to analyse the data.

The teacher can give guidance on the plan, the choice of dataset and the sample size, and can check that the question the student is asking will allow them to explain different sources of variation.

The teacher can also identify the population for the data and help with wording of the investigative question/statement.

All other work must be completed by the student during the time in class.

Skills for 1.1 Statistical Enquiry

This standard requires students to:

Complete an investigation about a dataset using the Statistical Enquiry Process (PPDAC) in one of four styles:
- comparison (numerical comparison of two or more groups)
- relationship (between two numerical variables)
- time series
- experimental probabilities (involving events with at least two stages).
Gather their own data through surveys or experiments or use existing data
Explain different sources of variation in data collection
Present data in an appropriate way
Describe what the data shows

PPDAC

Your report for 1.1 Statistical Enquiry, and the process for writing it, has 5 stages:

Problem
Plan
Data
Analysis
Conclusion

Problem

The Problem Statement is when you identify a problem to investigate.

There are four types of investigation you will learn about. Each has its own style of Problem Statement.

Follow these structures exactly when writing your report.

Relationship Investigations

This report investigates the relationship between NUMERICAL (units) and NUMERICAL (units) based on a dataset from POPULATION.

e.g.

This report investigates the relationship between the Age of Students (years) and the Arm Span of Students (cm) based on a dataset from NZ Census At School of High School Students in 2020.

Comparison Investigation

This report investigates whether GROUP A tend to have a greater NUMERICAL (units) than GROUP B based on a dataset from POPULATION.

e.g.

This report investigates whether Students who get their money for their cellphone plan from their parents tend to have a higher Monthly Phone Plan ($ NZD) than Students who get their money for their cellphone plan from their pocket money based on a dataset from NZ Census At School of High School Students in 2020.

Time Series Investigations

This report investigates the change in NUMERICAL (units) between START TIME and END TIME based on a dataset from POPULATION.

e.g.

This report investigates the change in Total Filled Jobs (millions) each month between July 2004 and February 2012 based on a dataset from Statistics New Zealand's Total Filled Jobs Survey.

Experimental Probability Investigations

This report investigates the probability that EVENT OCCURS NUMBER OF TIMES out of TOTAL NUMBER OF EVENTS based on probabilities from POPULATION.

e.g.

This report investigates the probability that Mr Wills has 7 good nights of sleep in a week based on probabilities from Mr Wills' record of sleep from 2020 to 2023.

Background

Students can also add some detail after their Problem Statement to explain why they chose this question and to give any background information that might be useful to a reader.

Hypothesis

After their Problem Statement, students can also add a hypothesis: whether they expect their Problem Statement to be true, and why.

Plan

The Plan is the section where you write exactly what you did in gathering and processing your data.

You should be familiar with how to gather data on your own and how to organise data into a table.

You will need to ensure your plan gives you enough data to make the call. The minimum amounts of data needed to make a reliable judgement are:

Relationship Investigation — 30 pairs of data.
Comparison Investigation — 30 pieces of data from each category explored.
Time Series Investigation — 5 complete cycles.
Experimental Probability Investigation — 30 trials.

Data Collection Methods

How to write a plan

Types of Data

Sources of Variation

To pass, you must write about sources of variation and how they are controlled in your Plan.

Sources of variation include:

Natural or Real
Occasion-to-Occasion
Measurement
Induced
Sampling

Sometimes you'll read the term Non-Sampling Variation. This is all variation except for Sampling Variation.

Natural or Real Variation

Variation that occurs because individuals are different.

This is the kind of variation we expect to measure. For some measures this will be large while others will be small.

e.g. different people will have different heights, different weights, etc. Basketball players will have less Natural Variation in height than the general population.

Occasion-to-Occasion Variation

Happens because some variables are not consistent, even over short time periods. Well-designed experiments reduce this by taking recordings at the same time or in the same situations.

e.g.

Blood pressure can be different when it is measured again just 5 minutes later. Test results are affected by the time of day the test is sat.

Measurement Variation

Happens because no measurement or person reading the measurement is perfect.

Well-designed experiments reduce this by using accurate measuring equipment and by being careful and thorough when recording. Good experiments quote the error range associated with any reading.

e.g. If a ruler marked in millimetres is used, the smallest measurement that can be read is 1 mm. This means any measurement could be out by up to 0.5 mm.

Induced Variation

Happens because members of the population are in different circumstances.

Well-designed experiments try to reduce the differences between groups.

This is often done with a control group: a separate group that does not receive the new procedure or circumstances.

e.g. using students from the same year level or skill level.

Sampling Variation

Happens because random chance causes people in different samples to be a little different. The smaller the sample, the more likely there will be an extreme difference. Taking larger samples reduces this variation, but larger samples are more costly and the variation is never completely removed. That is why we use confidence intervals, which estimate the likely size of the sampling error.

e.g. one random group of students might do better in a test than another random group of students.
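To see why larger samples reduce sampling variation, the sketch below draws many random samples from a made-up population and measures how much the sample means vary from sample to sample. The population of heights and the function name `spread_of_sample_means` are hypothetical, for illustration only.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats=1000, seed=1):
    """Draw many random samples and return the standard deviation of their means."""
    rng = random.Random(seed)  # fixed seed so the results are repeatable
    means = [statistics.mean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return statistics.stdev(means)

# Hypothetical population of 2,000 student heights (cm), for illustration only
rng = random.Random(0)
population = [rng.gauss(165, 10) for _ in range(2000)]

small = spread_of_sample_means(population, 10)
large = spread_of_sample_means(population, 100)
print(f"n = 10:  sample means typically vary by about {small:.1f} cm")
print(f"n = 100: sample means typically vary by about {large:.1f} cm")
```

The means of the larger samples cluster much more tightly, but they still vary, which matches the point that sampling variation is reduced, never removed.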

Minimising Bias

You should try to minimise bias when you write your plan.

Controlling sources of variation helps make our sample representative of our population, meaning our sample looks as much like our population as possible. It is never entirely possible to get a sample that is truly representative of our population, but we can get close.

Our sampling also needs to minimise Bias. A biased sample is one where the way we created our sample means it doesn't represent (look like) our population.

Examples of bias include:

When individuals select themselves to be studied. Individuals with strong opinions will be over-represented in your sample, creating bias.
Asking leading questions. Poorly worded questions or certain orders of questions can cause people to change their answers, creating bias.
When people lie. When asked certain questions, such as whether they have committed a crime, people don't always give their true answer, creating bias.
When people drop out of a study. Sometimes during long trials, people stop partway through. The people who drop out may be those who had a negative effect or no effect from an intervention, which biases the results towards the drug or treatment appearing to work.

Data

Data is where you put all your data and your data displays. The displays you will need to understand include:

scatter graph
time series graph
box and whisker plot
two-way table
bar or frequency graph of outcomes from a probability experiment
long-run relative frequency graph
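A two-way table counts how many individuals fall into each combination of two categorical variables. The Python sketch below builds one from a short list of made-up survey responses; the data and the function name `two_way_table` are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses: (who pays for the phone plan, year level)
responses = [
    ("Parents", "Year 9"), ("Pocket money", "Year 9"), ("Parents", "Year 10"),
    ("Parents", "Year 9"), ("Pocket money", "Year 10"), ("Pocket money", "Year 10"),
]

def two_way_table(pairs):
    """Count how many responses fall in each (row, column) combination."""
    counts = Counter(pairs)
    rows = list(dict.fromkeys(r for r, _ in pairs))  # keep order of first appearance
    cols = list(dict.fromkeys(c for _, c in pairs))
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return table, rows, cols

table, rows, cols = two_way_table(responses)
print("".ljust(14) + "".join(c.ljust(9) for c in cols) + "Total")
for r in rows:
    row_total = sum(table[r].values())
    print(r.ljust(14) + "".join(str(table[r][c]).ljust(9) for c in cols) + str(row_total))
```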

Making a Scatter Graph with Line of Best Fit

Making a Dot-Plot / Box-and-Whisker Plot

Making a Time Series Graph

Making a Two-Way Table

Making a Bar Graph / Frequency Graph of Probability Experiment Outcomes

Making a Long-Run Relative Frequency Graph and Digital Simulations
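A long-run relative frequency graph plots the proportion of trials so far in which an event has happened against the number of trials. The sketch below computes those running proportions for a simulated fair coin (a hypothetical example); the values could then be graphed in NZ Grapher or a spreadsheet.

```python
import random

def long_run_relative_frequency(outcomes, event):
    """Return the running proportion of trials so far in which `event` occurred."""
    running = []
    hits = 0
    for trial_number, outcome in enumerate(outcomes, start=1):
        if outcome == event:
            hits += 1
        running.append(hits / trial_number)
    return running

# Hypothetical experiment: 500 tosses of a fair coin
rng = random.Random(42)
tosses = [rng.choice(["H", "T"]) for _ in range(500)]
freqs = long_run_relative_frequency(tosses, "H")
for n in (10, 50, 100, 500):
    print(f"after {n} trials: {freqs[n - 1]:.3f}")
```

Early values jump around; as the number of trials grows, the running proportion settles near the true probability.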

Analysis

The Analysis section is where you explain what your results show.

Below is the list of things you need to discuss in your results.

Relationship Investigations

What to include in the Analysis of a Relationship Investigation:

direction of relationship
strength of relationship
linearity of relationship
clusters
unusual or interesting data points
patterns

Comparison Investigations

What to include in the Analysis of a Comparison Investigation:

centre
spread
shape
shift and overlap of two groups
clusters
unusual or interesting data points.

Time Series Investigations

What to include in the Analysis of a Time Series Investigation:

trend of time series
unusual or interesting data points, spikes, or troughs
seasonality, cycles, and patterns.

Experimental Probability Investigations

What to include in the Analysis of an Experimental Probability Investigation:

clusters
unusual or interesting data points
centre
spread
shape
patterns

Relationship Investigations Example

Direction: There appears to be a positive relationship between Student Age (years) and Arm Span (cm). We know this because the points start in the bottom left and end up in the top right. We also know the relationship is positive because the slope of 4.5 is positive.

Strength: The relationship between Student Age (years) and Arm Span (cm) appears to be weak. We know this because the points are far away from the line of best fit. We also know this because the r value is 0.25 which is less than 0.6, the value above which the relationship would be considered moderate.

Linearity: The relationship between Student Age (years) and Arm Span (cm) appears to be non-linear. We know this because there is an uneven scatter with more points above the line than below.

Cluster: There appears to be one large cluster of students aged 9 to 14 years with arm spans of 120 to 180 cm. This is about the arm span we would expect for 9 to 14 year olds.

Unusual Points: There is only one student who is 7 years old and one student who is 19 years old. There are also a few students with very low arm spans, less than 40cm. This might be due to a student having no arms or due to measurement error when these data were recorded.

Patterns: Age only takes specific values along the x-axis (exact years) because it was recorded in whole years. If date of birth had been used instead, the data might be more continuous.

Comparison Investigations Example

Centre: The median cellphone plan cost is $5 NZD higher for those students whose parents paid than those students who used their own pocket money. The mean cellphone plan cost is $15 NZD higher for those students whose parents paid than those students who used their own pocket money.

Spread: The IQR for the parent group is $30, while for the pocket money group it is just $10, which is $20 less.

Shape: The pocket money group shows only a slight right skew, with the mean $2.40 more than the median. The parent group shows a greater right skew, with the mean $12 more than the median.

Shift/Overlap: The UQ for the pocket money group is the same as the median for the parent group, at $20. This means about 3/4 of the pocket money group pay $20 or less, while about 1/2 of the parent group pay $20 or more.

Clusters: There appears to be clustering around the $10 and $20 marks in both groups. This may be due to common cellphone plans being available for $10 and $20 per month from Vodafone and 2degrees.

Unusual Points: One student in the parent group reported spending $180 per month on their phone plan. This is more than $40 above the next highest student, indicating an outlier. This may be because the student is on a family plan.
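The centre, spread and shape comparisons above all come from a few summary statistics. The sketch below calculates them for two made-up groups of phone plan costs. It uses Python's `statistics.quantiles` with its default method, which may calculate quartiles slightly differently from NZ Grapher, so small differences are expected. The function name `summarise` is hypothetical.

```python
import statistics

def summarise(group):
    """Quartiles, IQR, mean, and a simple skew check (mean minus median)."""
    lq, median, uq = statistics.quantiles(group, n=4)  # default 'exclusive' method
    mean = statistics.mean(group)
    return {"LQ": lq, "median": median, "UQ": uq, "IQR": uq - lq,
            "mean": mean, "mean - median": mean - median}

# Hypothetical monthly phone plan costs ($ NZD), for illustration only
parents = [10, 20, 20, 25, 30, 40, 45, 60, 20, 15]
pocket = [10, 10, 15, 20, 20, 10, 12, 25, 15, 20]

for name, group in (("parents", parents), ("pocket money", pocket)):
    summary = summarise(group)
    print(name, {k: round(v, 2) for k, v in summary.items()})
```

A positive "mean - median" suggests right skew and a negative one suggests left skew, which is the same reasoning used in the Shape paragraph.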

Time Series Investigations Example

Trend: The trend starts in February 2004 at 1.81 million filled jobs and rises to 1.95 million filled jobs in April 2008, an average rise of approximately 2,900 filled jobs each month. It then falls sharply, by about 3,300 filled jobs per month, to a low of 1.89 million filled jobs in October 2009. The trend then rises more slowly, by just 1,000 filled jobs per month, to 1.92 million filled jobs in February 2012.

Spikes / Troughs and Unusual Points: There is an unusual pattern during 2008, with values not matching the pattern seen in other years. This may be due to the 2008 global financial crisis.

Cycles: There appears to be a maximum each December, with about 60,000 more jobs than the trend, and fewer jobs from June to October, around 20,000 fewer filled jobs than the trend. This may be due to seasonal jobs like fruit picking, which are more available in the New Zealand summer. In January there is a big fall in jobs, around 50,000 fewer than the trend. This is likely due to people quitting their jobs at the end of the year as they look for new work.
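The monthly rates in the trend description come from dividing the change in the trend by the number of months between two points. The sketch below repeats that arithmetic for the fall and the later rise, using the trend values quoted above; the function name `monthly_change` is hypothetical.

```python
from datetime import date

def monthly_change(start_value, start, end_value, end):
    """Average change per month between two points read off the trend line."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return (end_value - start_value) / months

# Trend values quoted in the example (filled jobs)
fall = monthly_change(1_950_000, date(2008, 4, 1), 1_890_000, date(2009, 10, 1))
rise = monthly_change(1_890_000, date(2009, 10, 1), 1_920_000, date(2012, 2, 1))
print(f"fall: {fall:,.0f} jobs per month")  # fall: -3,333 jobs per month
print(f"rise: {rise:,.0f} jobs per month")  # rise: 1,071 jobs per month
```

These round to the "about 3,300" and "about 1,000" figures used in the Trend paragraph.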

Experimental Probability Investigations Example

Clusters: There are no apparent clusters in this dataset.

Unusual Points: Getting no good nights of sleep in a whole week might seem unusual; this occurred 2 times out of 161 weeks. However, as these points are not far away from the rest of the data, they cannot be considered unusual.

Centre: The median number of good nights of sleep that Mr Wills had in a week (out of 7) was 5. The mean number of good nights of sleep Mr Wills had in a week was 4.52. The mode (the most frequent number of good nights of sleep Mr Wills had in a week) was also 5.

Spread: The standard deviation, roughly the typical amount each value differs from the mean, is 1.62 good nights of sleep per week. This means a week with 7 good nights of sleep is (7 - 4.52)/1.62 = 1.53 standard deviations above the mean.

Shape: The distribution appears to show a slight left skew. This is supported by the mean being 0.48 nights of sleep per week below the median.

Patterns: There are no apparent additional patterns present.
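The centre and spread values above can be calculated directly from a weekly record. The sketch below uses a short made-up record rather than Mr Wills' real data, and uses the population standard deviation (`statistics.pstdev`); whether the 1.62 figure used this or the sample standard deviation is not stated, so treat that choice as an assumption. The hypothetical `z_score` helper reproduces the 1.53 calculation from the quoted mean and standard deviation.

```python
import statistics

# Hypothetical record: number of good nights of sleep (out of 7) in each week
weeks = [5, 4, 6, 5, 3, 7, 5, 2, 6, 5, 4, 0, 6, 5, 7]

mean = statistics.mean(weeks)
median = statistics.median(weeks)
mode = statistics.mode(weeks)   # most frequent value
sd = statistics.pstdev(weeks)   # population standard deviation (an assumption)

def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

print(f"mean={mean:.2f}, median={median}, mode={mode}, sd={sd:.2f}")
print(f"a 7-night week is {z_score(7, mean, sd):.2f} SDs above this record's mean")
print(round(z_score(7, 4.52, 1.62), 2))  # 1.53, using the figures quoted above
```

In this made-up record the mean is below the median, the same left-skew signal described in the Shape paragraph.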

Relationship Investigations - Predictions

In a relationship investigation you must include a Prediction.

A prediction is where you use your linear model (y=mx+c) to predict a y value from an x value. You must also round this answer to a reasonable value.

This video shows how to make this graph.

You must include this in your analysis section.
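A prediction substitutes an x value into y = mx + c. In the sketch below, the slope of 4.5 cm per year comes from the relationship example, but the intercept of 90.3 cm and the age of 15 are hypothetical values chosen for illustration, as is the function name `predict`.

```python
def predict(x, slope, intercept, decimals=None):
    """Use the line of best fit y = m*x + c to predict y, then round sensibly."""
    y = slope * x + intercept
    # With no decimals given, round to a whole number (e.g. whole cm)
    return round(y) if decimals is None else round(y, decimals)

# slope 4.5 cm/year is from the example; intercept 90.3 cm is hypothetical
print(predict(15, slope=4.5, intercept=90.3))  # 158
```

Note that Python's `round` rounds exact halves to the nearest even number (so 162.5 becomes 162), which can differ from the "round half up" rule used by hand.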

Comparison Investigations - Making The Call

In a Comparison Investigation, you must "make the call" which means decide whether the Problem statement was correct or not.

In order to "make the call" you must calculate the Overall Visible Spread (OVS) and Difference Between the Medians (DBM). The size of your sample also matters, because it decides what fraction of the OVS the DBM must be greater than.

This video shows how to make this graph.

You should include this at the bottom of your analysis.
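The sketch below shows one way to calculate DBM and OVS and make the call. It assumes OVS is measured from the lower of the two lower quartiles to the higher of the two upper quartiles, and applies the rule of thumb commonly taught alongside this method: DBM must be greater than 1/3 of OVS for samples of around 30 per group, and greater than 1/5 of OVS for samples of around 100. Treating 100 as the switch-over point is an assumption; check the exact rule with your teacher. Quartiles come from `statistics.quantiles`, which may differ slightly from NZ Grapher, and the function name `make_the_call` is hypothetical.

```python
import statistics

def make_the_call(group_a, group_b):
    """Return (DBM, OVS, whether we can call that group A tends to be greater)."""
    lq_a, med_a, uq_a = statistics.quantiles(group_a, n=4)
    lq_b, med_b, uq_b = statistics.quantiles(group_b, n=4)
    dbm = med_a - med_b                         # Difference Between the Medians
    ovs = max(uq_a, uq_b) - min(lq_a, lq_b)     # Overall Visible Spread
    n = min(len(group_a), len(group_b))
    # Assumed rule of thumb: about 30 per group -> 1/3, about 100 per group -> 1/5
    fraction = 1 / 3 if n < 100 else 1 / 5
    return dbm, ovs, med_a > med_b and dbm > fraction * ovs

# Hypothetical groups of 30 values each, for illustration only
group_a = list(range(50, 80))
group_b = list(range(0, 30))
print(make_the_call(group_a, group_b))
```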

Time Series Investigations - Seasonal Graphs / Cycle Graphs

In a Time Series Investigation you must include a Forecast.

A Forecast is a prediction of what you might expect to find in the future based on the combination of the trend and the seasonal effect.

You can use a Forecast Graph on NZ Grapher to do this. You must add this forecast into your report with units.

This video shows how to make this graph.

You should include this at the bottom of your analysis.

Experimental Probability Investigations - Digital Simulations

Plinko Probability

In an Experimental Probability Investigation you should also include a Simulation.

A simulation is a computer experiment using data to mimic real life.

The website linked above shows one way to do this. The settings used in the example below assume "Mr Wills thinks he gets a good sleep 70% of the time": Rows = 7, Binary Probability = 0.70, then run it 1,000 times.

You should include this at the bottom of your analysis.
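The Plinko settings above (7 rows, probability 0.70, 1,000 runs) can also be mimicked in Python: each simulated week has 7 nights, each night is good with probability 0.70, and we count the weeks where all 7 are good. Like the Plinko board, the sketch assumes nights are independent of each other. Because it is random, each run (or seed) gives a slightly different answer, so the simulated probability will usually differ a little from the theoretical value 0.70^7. The function name `simulate_weeks` is hypothetical.

```python
import random

def simulate_weeks(p_good=0.70, nights=7, runs=1000, seed=2023):
    """Simulate `runs` weeks; return the proportion with a good sleep every night."""
    rng = random.Random(seed)
    perfect_weeks = 0
    for _ in range(runs):
        good_nights = sum(rng.random() < p_good for _ in range(nights))
        if good_nights == nights:
            perfect_weeks += 1
    return perfect_weeks / runs

simulated = simulate_weeks()
exact = 0.70 ** 7  # all 7 nights good, assuming nights are independent
print(f"simulated: {simulated:.1%}, theoretical: {exact:.1%}")
```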

Conclusion

The Conclusion is the section where you make the call on whether your Problem Statement was true or false.

For each style of investigation, making the call is slightly different.

Relationship Investigations

This report concludes that there is a Positive/Negative Weak/Moderate/Strong Linear/Non-Linear relationship between NUMERICAL (units) and NUMERICAL (units) based on a dataset from POPULATION.

e.g.

This report concludes that there is a Positive Weak Non-Linear relationship between the Age of Students (years) and the Arm Span of Students (cm) based on a dataset from NZ Census At School of High School Students in 2020.

Comparison Investigation

This report concludes that because the DBM is / is not greater than 1/3 (or 1/5, depending on sample size) of the OVS, we can / cannot make the call that GROUP A tend to have a greater NUMERICAL (units) than GROUP B based on a dataset from POPULATION.

e.g.

This report concludes that because the DBM is not greater than 1/3 of the OVS, we cannot make the call that Students who get their money for their cellphone plan from their parents tend to have a higher Monthly Phone Plan ($ NZD) than Students who get their money for their cellphone plan from their pocket money based on a dataset from NZ Census At School of High School Students in 2020.

Time Series Investigations

This report concludes that there was a general trend of GENERAL TREND along with a seasonal trend of SEASONAL TREND between START DATE and END DATE based on a dataset from POPULATION.

e.g.

This report concludes that there was a general trend of a rise, a fall, and then a slower rise, along with a seasonal trend of fewer jobs in winter, more jobs in summer, and a drop each January, in Total Filled Jobs (millions) each month between July 2004 and February 2012 based on a dataset from Statistics New Zealand's Total Filled Jobs Survey.

Experimental Probability Investigations

This report concludes that the probability that EVENT OCCURS NUMBER OF TIMES out of TOTAL NUMBER OF EVENTS is CALCULATED PROBABILITY while our simulation showed this probability is SIMULATED PROBABILITY based on probabilities from POPULATION.

e.g.

This report concludes that the probability that Mr Wills has 7 good nights of sleep in a week is 10.6% while our simulation showed this probability is 6.9% based on probabilities from Mr Wills' record of sleep from 2020 to 2023.

Enough Data or Reliable Enough Data?

You also need to consider if you had enough data or reliable enough data to make the call.

The minimum amounts of data needed to make a call are:

Relationship — 30 pairs of data.
Comparison — 30 pieces of data from each category explored.
Time series — 5 complete cycles.
Experimental probability — 30 trials.
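These minimums can be written as a simple lookup so a dataset can be checked before analysis. The dictionary `MINIMUMS` and the function `enough_data` are hypothetical names; the numbers are the ones listed above.

```python
# Minimum amounts of data listed above, keyed by investigation type
MINIMUMS = {
    "relationship": 30,              # pairs of data
    "comparison": 30,                # values in each group
    "time series": 5,                # complete cycles
    "experimental probability": 30,  # trials
}

def enough_data(investigation, counts):
    """True if every count (e.g. each group's size) meets the minimum for this type."""
    return all(c >= MINIMUMS[investigation] for c in counts)

print(enough_data("comparison", [42, 28]))  # False: one group has fewer than 30
```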

Other Improvements?

You also need to consider if the experiment had any flaws at any stage in the PPDAC cycle.

Reflecting on how to improve the experiment is the difference between an Achieved and a Merit/Excellence.

Additional Resources for 1.1 Statistical Enquiry

Liz Sneddon's website is very useful.

Ensure you have a strong knowledge of the Statistics parts of Junior Mathematics.

Relationship Investigation Exemplar

Comparison Investigation Exemplar

Time Series Investigation Exemplar

Experimental Probability Investigation Exemplar

Practice Assessments and Marking Schemes

The following are practice assessments along with their marking schemes.

Practice MATH 1.1 Assessment Information - CardioGoodFitness

Practice MATH 1.1 Dataset - CardioGoodFitness

MATH 1.1 Marking Scheme

Achieved Exemplar from Unpacking Math 1.1 NCEA

Merit Exemplar from Unpacking Math 1.1 NCEA

Excellence Exemplar from Unpacking Math 1.1 NCEA
