The Data Analysis criterion assesses the extent to which the report provides evidence that you have recorded, processed and presented the data in ways that are relevant to the research question.
3 strands, maximum 6 points
The communication of the recording and processing of the data is both clear and precise.
The recording and processing of data shows evidence of an appropriate consideration of uncertainties.
The processing of data relevant to addressing the research question is carried out appropriately and accurately.
collection of sufficient and relevant data to address the research question
appropriate qualitative observations (images/drawings correctly labelled)
concise presentation (of text, tables, calculations, graphs, other illustrations)
use of correct scientific units and their symbols
appropriate formatting of data: units are correct and uncertainties are identified; consistent number of decimal places or significant figures
clear and precise processed data that addresses the research question
a sample calculation or the use of screenshots where appropriate
relevance of graphs (e.g. with best-fit lines or curves).
Qualitative observations
Descriptions of qualitative observations are expected to accompany the raw data where applicable. Their importance will depend on the nature of the investigation.
Data table
Use concise column headings with units and uncertainties. It is not necessary to provide separate tables for raw data and processed data. A correct and consistent number of decimal places, based on the degree of precision, is expected in raw and processed data.
Tables should have appropriate titles and table numbers, or they should be set in a context that makes them unambiguous. Within the text of the report, they should be referenced using, for example, Table 1 (and, similarly, Figure 1 or Graph 1 for other displays).
If large amounts of data have been collected, insert only a sample of the raw data. Data taken directly from an electronic device are raw data and require further processing to constitute processed data.
Data processing - see separate section below.
Sample calculations
Percentages, means and ranges at the end of a column or row of data are part of data processing. You are expected to use software (e.g. Google Sheets) to calculate means, ranges, errors and percentages. You must explain the reasoning behind the processing and include a screenshot of the calculation, including the formula used, so that the processing is clear and the validity of the calculations and interpretations can be verified. A worked example is only necessary for unusual processing.
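The same processing a spreadsheet formula performs can be sketched in a few lines of Python. The values below are illustrative, not real data; the point is that mean, range and half-range uncertainty are simple, checkable operations.

```python
# Mean, range and half-range uncertainty for repeated trials of one
# measurement (hypothetical times in seconds; a spreadsheet would use
# AVERAGE, MAX and MIN for the same steps).

trials_s = [2.31, 2.35, 2.29, 2.33, 2.32]

mean_s = sum(trials_s) / len(trials_s)
range_s = max(trials_s) - min(trials_s)
uncertainty_s = range_s / 2  # half-range estimate of the uncertainty in the mean

# Report the mean and uncertainty to the same decimal places as the raw data.
print(f"mean = ({mean_s:.2f} ± {uncertainty_s:.2f}) s")
```

Whatever tool is used, the screenshot or sample calculation should make each of these steps visible.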
Graphs
Inadequate labelling of a graph (axes, legends, titles) will impact data analysis. The type, size, proportions and scaling of the graph impact not only presentation, but also the graph’s usefulness in data analysis.
Both axes must be labelled with the names of quantities. Names can be spelled out in full or given using the symbols already defined in the report. For example, labels for a graph of time squared against distance might be “d / cm” for the x-axis and “t² / s²” for the y-axis. Note that quantity symbols are styled in italics and units are not. The standard for expressing the units for quantities is to separate the quantity symbol from the unit with a solidus (forward slash), “/”. There is no penalty if units appear in parentheses (e.g. “d (cm)”).
degrees of precision in the instruments used
consideration of errors and uncertainties
consistency in the reported uncertainties
variation in the results, as shown by propagation of uncertainty, uncertainty bars, maximum and minimum lines of best fit
ranges (maximum value minus minimum value)
an appropriate response to outlier data.
Measurement uncertainties can be obtained from an instrument’s graduations, manufacturer specifications (for electronic devices) or the read-out for least count. The realistic use of an instrument also needs to be considered. For example, a handheld stopwatch used to measure the time of an event will not have a precision of 0.001 seconds, even if the stopwatch can provide such a read-out—human reaction times are not this fast. You should justify the size of uncertainty based on the nature of the experiment. Repeating a measurement for the same event often reveals an uncertainty larger than the precision of the instrument.
Uncertainties associated with single measurements must be expressed to the same degree of precision as raw data. Using the least count, the uncertainty could be expressed as 0.1 for 2.3 s, 0.01 for 2.34 s and 0.001 for 2.345 s. This is the minimum uncertainty, but often the uncertainty is greater. For example, measurement of a length could be (87.4 ± 0.2) cm. Expressions such as (87.4 ± 0.05) cm or (87.4 ± 2) cm are inappropriate.
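The matching of decimal places described above can be sketched as a small Python helper. The helper name and the sample values are hypothetical, and it applies only the one-significant-figure convention (the exception for uncertainties beginning with “1” is noted later in this section).

```python
# Hypothetical helper: round an uncertainty to one significant figure and
# quote the value to the same decimal place, so (87.43 ± 0.216) cm becomes
# (87.4 ± 0.2) cm. Keeps one s.f. only; uncertainties starting with "1"
# may keep two s.f. (see below) and are not handled here.

import math

def quote(value, uncertainty):
    exp = math.floor(math.log10(abs(uncertainty)))  # position of leading digit
    u = round(uncertainty, -exp)                    # one significant figure
    v = round(value, -exp)                          # same decimal place as u
    return f"{v} ± {u}"

print(quote(87.43, 0.216))
```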
Where relevant, measurement uncertainties should appear in the column headings along with the units. Uncertainties are also expressed graphically using scatter plots with trend lines. Graphs should include uncertainty bars. Uncertainties that are present but too small to be visible should be noted in the report. Uncertainty bars can be different for each data point, or each point can have the same absolute uncertainty. Uncertainty bars are usually only drawn for the dependent variable.
Uncertainties in processed data
Propagation of uncertainties involves mathematical operations using the non-statistical rules provided in the data booklet (p. 3). See also the uncertainties section of Tsokos (old syllabus) for more details.
Other processes, including trigonometric and logarithmic functions, should take account of the range of values and be illustrated by a sample calculation.
For repeated measurements, the uncertainty of the mean can be determined by using one-half of the range between the maximum and minimum values. It is acceptable to assume symmetry here so that the plus and minus values are the same. It is important that the raw data values, the mean value and the associated uncertainty are expressed to the same number of decimal places.
Most uncertainties should be expressed as a single digit. However, if the uncertainty begins with a “1” then two significant figures are acceptable, for example 87.4 ± 1.2.
If repeated measurements reveal no differences, then the least count remains the uncertainty. If one-half of the range reveals an uncertainty that is smaller than the individual raw data uncertainty, then the uncertainty in the mean must remain the larger value.
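The two rules above (half-range estimate, with the least count as a floor) amount to taking the larger of the two values. A minimal sketch with assumed numbers:

```python
# The uncertainty in the mean is the half-range of the repeats, but never
# smaller than the instrument's least count (values assumed for illustration).

least_count_s = 0.01                     # stopwatch read-out precision
trials_s = [2.34, 2.35, 2.34, 2.35]      # hypothetical repeated measurements

half_range = (max(trials_s) - min(trials_s)) / 2  # 0.005 s here
uncertainty = max(half_range, least_count_s)      # least count wins: 0.01 s

print(f"uncertainty in the mean = {uncertainty} s")
```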
The minimum and maximum lines should simply be estimated while considering all the uncertainty bars. They should not be lines drawn using just the first and last data points’ uncertainty bars (as shown in Hodder; that approach is incorrect). The minimum and maximum lines are not required to touch or include all the uncertainty bars. You only need to judge their location by eye, based on reasonable judgement rather than a mathematical procedure.
Data identified as possible or probable outliers should not be systematically omitted from calculations. Outliers are actual measured results and therefore need to be considered. Removing them so that the results “fit better” with expectations or with a general model is not good practice. This is manipulation of data and it is unscientific. Instead, you could consider presenting the outcome with the outliers included and excluded, thereby revealing their impact.
Outliers are most likely to occur as the result of human error, methodological flaws, or irregularity in the equipment or environment. Often, the quantity in question can be remeasured. The scientific method requires rigour and integrity in gathering data, while the IB requires academic integrity from students. Both of these are more important than attempts to make data appear consistent. Therefore, being cautious when rejecting data is essential, and data exclusion requires a justification. You should never reject two or more data points, as this distorts the true quality of the data set. Although there is no single agreed method for rejecting outliers, common sense and careful analysis are always helpful.
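One transparent way to reveal an outlier’s impact, as suggested above, is to report the result both with and without it. A sketch with made-up numbers:

```python
# Report the mean with the suspected outlier included and excluded, so its
# impact is visible rather than hidden (all values are made up).

data = [4.1, 4.2, 4.0, 4.1, 5.6]   # 5.6 looks like a possible outlier
suspect = 5.6

mean_all = sum(data) / len(data)
without = [x for x in data if x != suspect]
mean_without = sum(without) / len(without)

print(f"mean with outlier:    {mean_all:.2f}")
print(f"mean without outlier: {mean_without:.2f}")
```

Presenting both values, with a justification for why the point is suspect, satisfies academic integrity while still showing the cleaner trend.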
processing that is efficiently presented and at the DP level for the topic
appropriate processing tools
realistic trend lines in presented data
appropriate graphing techniques including adequate scale, title and labelled axes
correct calculations and graphing.
Data processing
Processing is the transformation of raw data to arrive at a conclusion.
Graphing, even that of raw data, is part of processing, especially if it is used to derive values such as gradients for rates. If graphing the processed data is more appropriate than graphing the raw data, only graph the processed data.
Trend-lines
An appropriate best-fit line or curve is common practice and should be guided by theoretical considerations, known equations or dimensional analysis. You should not assume a linear fit unless it is justified. If a best-fit line is justified by theory and experiment, the scatter of data points above and below the best-fit line should be approximately equal, demonstrating that the variation is genuinely random.
If you do not have sufficient data, a trend line may be used to show how the limited data collected fit a given model.
The purpose of a graph is to reveal the trend or mathematical function between the quantities graphed. This means looking for a continuous line or curve approximating the data scatter. The best-fit line or curve should never connect data point to data point, as this would not reveal a mathematical function. Nor should the best-fit line be forced through the (0, 0) origin, as this would disguise any systematic shift. Extrapolating to the axis origin and beyond the maximum value provides additional evidence that the chosen line or curve makes physical sense.
If a linear graph line is established through linearisation using the known theory, minimum and maximum lines can be used to determine the uncertainty in the gradient and intercept.
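The gradient-uncertainty step can be sketched numerically. The data, and the min/max gradients “estimated by eye” from the uncertainty bars, are all hypothetical; the best-fit gradient here uses an ordinary least-squares formula, and the quoted uncertainty is half the spread between the steepest and shallowest plausible lines.

```python
# Hypothetical linearised data; best-fit gradient from least squares,
# uncertainty from estimated minimum and maximum lines.

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 2.0, 4.1, 5.9, 8.2]

n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
m_best = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # least-squares gradient

# Gradients of the min and max lines, judged by eye from the uncertainty bars
m_min, m_max = 1.9, 2.1
delta_m = (m_max - m_min) / 2

print(f"gradient = {m_best:.2f} ± {delta_m:.2f}")
```

The same half-difference approach gives the uncertainty in the intercept from the intercepts of the two extreme lines.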
Notes:
For further guidance on uncertainties and errors see IB guidance on uncertainties and errors.
The processing of data to obtain an uncertainty value is assessed in the third strand (relevant processing of data) of the “Data analysis” criterion.
Interpretation of the data as it relates to the research question and consideration of the impact of uncertainties is assessed in the “Conclusion” criterion because (in part) this criterion assesses the relevance of the conclusion to the analysis.
SI units are expected for base quantities (e.g. time in seconds, distance in metres) and derived quantities (e.g. force in newtons or energy in joules). Non-SI units (e.g. eV, u, ly) are acceptable if the scientific context makes them relevant.
Source: Roberta Rodriguez, IB Physics Teacher