Data Capture Strategy

“How many readings should I take?”

You know what your independent variable is: it is the factor you set that influences the outcome of the experiment. But what values are you going to set that independent variable to? Students conducting labs quite often ask me “how many readings should I take?” Presented here is a generalised method to establish the answer to that question for any experiment you conduct.

The range

The first thing to do is establish the range: the smallest and largest values you intend to set your independent variable to. You may already be anticipating the outcome of your experiment, for example because you are testing a product that has a typical operating range, or investigating theory that applies to a specific range of values. In that case, the aim of the experiment dictates the range.

If you don’t know what will happen during an experiment (and the best experiments are the ones where the results are not known in advance) then the best approach is to check the range on the equipment. What is the maximum and minimum value of the independent variable that the equipment will allow you to achieve?

Top Tip! Check that the instrumentation measuring the dependent variable is capable and accurate across the full range of values you set for the independent variable.

The number of readings

Next you need to determine the number of readings to take within this range. As with all engineering projects, you have to balance the quality of your result with resources available to achieve it, and in this case time is a resource. It might be that you are in an undergraduate class and your time in the laboratory is finite due to your timetable, or you may be working in industry and your time is valuable, so should be used effectively. Ideally, you would take as many readings as possible to fully resolve the relationship between independent and dependent variables. However, when the number of readings you can obtain is limited, you want to ensure you are collecting only the minimum number of data points necessary to achieve a decent outcome.

The image to the left shows a large number of data points (crosses) describing the relationship between the independent and dependent variable (grey line). The large number of data points does a good job of resolving the shape of the curve, but is probably too many to justify.

To determine a sensible number of readings to take, you first need to establish whether you have an approximate idea of what the results will be, for example where the purpose of the experiment is to validate your expectations.

I'm anticipating the results

Suppose you anticipate the results of the experiment, for example because you are validating that the experimental system performs in the way predicted by a mathematical model. If this is the case, you can determine the minimum number of data points you need to gather, based on the aim of the experiment. You will need to think about what you are trying to achieve by conducting the experiment and whether the number of data points you plan to capture will be enough to fulfil that aim. Let’s take an example.

Say you are expecting your system to produce a linear relationship between independent and dependent variables. You could decide to collect just two data points, as shown in the graph on the left. This would allow you to determine the gradient and intercept of the linear relationship (the blue line), which might be all you wanted from the experiment. However, it won’t confirm that the relationship is linear. For example, from just two data points the relationship could follow the green line, or any other curve that passes through points 1 and 2.

To test whether the relationship is linear, a third data point would need to be gathered, as shown in the graph on the right, and found to lie on the same line as the first two. Strictly, three points on a line will not rule out relationships other than linearity. For example, the three data points on the graph to the right could be found for a relationship described by the green curve. But given you were expecting a linear relationship and the first three data points agree with it, it is highly likely the hypothesis is correct. Obtaining more data points will increase the certainty of the overall experimental finding and, if they are quick and easy to obtain, they may be sensible to gather. Be pragmatic in balancing the resources required to collect additional data against the confidence you have in the result.
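The two-point and three-point reasoning above can be sketched in a few lines of Python. This is only an illustration: the function names, the readings and the tolerance are all made up for the example, and in practice the tolerance would come from your own estimate of the measurement uncertainty.

```python
def line_through(p1, p2):
    """Return (gradient, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different independent-variable settings")
    gradient = (y2 - y1) / (x2 - x1)
    intercept = y1 - gradient * x1
    return gradient, intercept


def third_point_agrees(p1, p2, p3, tolerance):
    """True if p3 lies within `tolerance` (dependent-variable units) of the line through p1 and p2."""
    gradient, intercept = line_through(p1, p2)
    predicted = gradient * p3[0] + intercept
    return abs(p3[1] - predicted) <= tolerance


# Hypothetical readings, chosen only to illustrate the idea.
point_1 = (0.0, 1.0)
point_2 = (10.0, 21.0)

# Two points are enough to fix a gradient and an intercept...
gradient, intercept = line_through(point_1, point_2)

# ...but only a third point can support or contradict linearity.
consistent = third_point_agrees(point_1, point_2, (5.0, 11.2), tolerance=0.5)
curved = third_point_agrees(point_1, point_2, (5.0, 6.0), tolerance=0.5)
```

With these made-up numbers the first mid-range reading sits close to the line and the second does not, which is the situation the green curve represents. A pass is consistent with a linear relationship but does not prove it.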

But what if you don't know what results you are expecting?

I'm experimenting because I don't know what will happen

If you are unsure about the relationship between the independent and dependent variable that the experiment might produce (this is often the case as the purpose of an experiment is to discover something you didn’t previously know), or where a particular feature of the relationship might occur, then you need a strategy to determine how many readings to take. This is one suggestion:

  1. Consider how many readings you are capable of taking in a fraction, say half, of the time available for your experiment and make this the initial number of measurements. Reserve enough time to analyse the outcomes from this initial set of data.
  2. Space this number of settings of the independent variable equally across the range. For example, if the range of the independent variable has already been determined as between 0 and 10 Amps and you have enough time to perform 6 measurements, then you would run your experiment at 0, 2, 4, 6, 8 and 10 Amps.
  3. Plot the data and determine where additional data points need to be gathered with the insight provided by the initial data.
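As a minimal sketch of step 2, the equally spaced settings can be worked out like this. The function name is hypothetical, and the example values are the 0 to 10 Amp range and 6 measurements used above.

```python
def equally_spaced_settings(minimum, maximum, count):
    """Return `count` settings spread evenly from `minimum` to `maximum` inclusive."""
    if count < 2:
        raise ValueError("at least two settings are needed to cover a range")
    step = (maximum - minimum) / (count - 1)
    return [minimum + i * step for i in range(count)]


# Range of 0 to 10 Amps with time for 6 measurements.
settings = equally_spaced_settings(0, 10, 6)
print(settings)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

Note that the step is the range divided by one less than the number of readings, because both ends of the range are included.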

Top Tip! Check for hysteresis. Hysteresis occurs when the data you obtain differs depending on the order in which you obtain it. Data doesn't necessarily need to be collected in one particular order, for example lowest setting to highest setting. So consider, in your experimental design, collecting data in both directions to determine if hysteresis is present.
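One simple way to check for hysteresis, sketched below, is to compare the reading taken at each setting on the way up with the reading taken at the same setting on the way down. The function names, the data and the tolerance are assumptions made for illustration; a sensible tolerance would be based on the uncertainty of your own measurements.

```python
def hysteresis_gaps(ascending, descending):
    """Map each shared setting to (descending reading - ascending reading)."""
    return {
        setting: descending[setting] - ascending[setting]
        for setting in sorted(ascending)
        if setting in descending
    }


def shows_hysteresis(ascending, descending, tolerance):
    """True if any shared setting differs by more than `tolerance` between the two sweeps."""
    gaps = hysteresis_gaps(ascending, descending)
    return any(abs(gap) > tolerance for gap in gaps.values())


# Hypothetical readings keyed by independent-variable setting.
sweep_up = {0: 0.0, 2: 1.0, 4: 2.0, 6: 3.0}    # lowest setting to highest
sweep_down = {0: 0.0, 2: 1.5, 4: 2.5, 6: 3.0}  # highest setting to lowest

gaps = hysteresis_gaps(sweep_up, sweep_down)
hysteresis_found = shows_hysteresis(sweep_up, sweep_down, tolerance=0.1)
```

Here the two sweeps agree at the ends of the range but differ in the middle by more than the chosen tolerance, so the check flags hysteresis.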

Once you have obtained your initial data, there is no definitive rule to determine if you should gather more data, how much to gather or where to gather it. If the initial data reveals an area of particular interest, then this would be worthy of further exploration. If there are large, discontinuous changes in your data, then further resolution in this area is advisable to gain a complete understanding of the relationship between the variables. But at this stage, considered judgement from the experimentalist, based on the aims of the experiment, is required. Consider the following three cases as examples:

The graph above shows a clearly linear trend between independent and dependent variable. The initial data in this range is fairly conclusive and gathering additional data would probably be unnecessary.

In the example shown above, the data displays a peak value. It may be valuable to determine more precisely where this peak occurs by gathering more data between points 3 and 5.

Here, the data appears to be (approximately) constant between points 1 and 3 and between points 4 and 6. There is little value in gathering extra data in these regions. However, there is a large jump between points 3 and 4. Gathering extra data between points 3 and 4 may provide useful insight into the behaviour of the system.
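The peak and jump cases can be turned into a rough aid for deciding where to look next. The sketch below is not a rule, just one possible way to flag candidate regions. All names and data are hypothetical, and the settings reuse the 0 to 10 Amp example, so points 1 to 6 sit at 0, 2, 4, 6, 8 and 10.

```python
def largest_jump_interval(settings, readings):
    """Return the pair of adjacent settings between which the reading changes most."""
    jumps = [abs(readings[i + 1] - readings[i]) for i in range(len(readings) - 1)]
    i = jumps.index(max(jumps))
    return settings[i], settings[i + 1]


def peak_bracket(settings, readings):
    """Return the settings either side of the largest reading."""
    i = readings.index(max(readings))
    low = settings[max(i - 1, 0)]
    high = settings[min(i + 1, len(settings) - 1)]
    return low, high


def extra_settings(low, high, count):
    """Return `count` equally spaced settings strictly between `low` and `high`."""
    step = (high - low) / (count + 1)
    return [low + step * k for k in range(1, count + 1)]


settings = [0, 2, 4, 6, 8, 10]

# Made-up data with a peak at point 4: refine between points 3 and 5.
peak_readings = [1.0, 2.0, 3.5, 4.0, 3.0, 1.5]
peak_region = peak_bracket(settings, peak_readings)

# Made-up data with a jump between points 3 and 4: refine inside that gap.
step_readings = [1.0, 1.1, 1.0, 5.0, 5.1, 5.0]
jump_region = largest_jump_interval(settings, step_readings)
new_settings = extra_settings(*jump_region, count=3)
```

A script like this only points at candidate regions. Whether they are worth the extra laboratory time is still a judgement based on the aims of the experiment.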

Summary: A data capture strategy should specify the range of values the independent variable is set to and the number of readings in that range, or specify how the range and number will be decided during the experiment. The range can be dictated by the aim of the experiment or, if that isn’t available, the maximum operating range of the equipment. To determine the number of data points to gather, some judgement is required. If you anticipate the result, you can determine the minimum number of points to gather to meet the aims of the experiment, but you may want to add a few more to eliminate any reasonable doubt in your findings. If you are unable to predict the results in advance, take initial readings and reserve sufficient time to interpret them and gather more data based on your initial findings.