By Goutham Vinjamuri, BS-MS 2021
Data collection is the first step in any astronomical study. The data is what enables astronomers and astrophysicists to study the systems of their interest and learn about them. But before the collected data can be used for astronomical studies, it goes through a number of processes that improve the quality of the information collected. In general, this process consists of two components: the first is called calibration, and the second is data processing.
Data is collected using optical instruments working in different parts of the electromagnetic spectrum. In most cases, a telescope collects the data, which is then stored for future use. It is common knowledge that there are no ideal machines: all equipment introduces some unintended signal (noise) into the original data in the very process of collecting it, and this cannot be eliminated completely. If left untreated, however, this causes problems, as different instruments will produce different results. This inconsistency is undesirable and must be avoided to obtain meaningful scientific results. To deal with the problem, one performs certain corrections on the raw data that minimize the effects of these imperfections, so that one obtains (more or less) consistent data from the observations. This step is called calibration.
This article attempts to address some basic ideas and tools used in calibration in optical telescopes because these are fundamental concepts that are modified and used for other applications as well. As a bonus, since most amateur astronomy tools also operate in the visible region, one can use these techniques to get better results in photographing deep-sky objects.
The most important prerequisite for understanding how good data is obtained is knowing how data is collected in the first place. So first, let us talk about the basic parts of all optical instruments and understand their functions.
Any optical instrument, whether reflective or refractive, must have a component that collects incoming light; this part is known as the objective. The term is predominantly used in the context of refractors, where it refers to the first (and usually the largest) lens, which collects the incoming light and focuses it onto something else. In reflectors, this role is played by the primary mirror. The other component of the telescope is its eyepiece: a lens (usually a combination of lenses) that receives light from the objective and magnifies the image. The light from the eyepiece is what we end up using to collect data (or to see, in the case of visual observation). Simply put, one can make a simple telescope by placing an objective and an eyepiece in the desired configuration inside a tube. Apart from this basic setup, which all telescopes share, professional work requires a few more tools that support accurate data collection. These include a shutter and an electronic detector known as a CCD. Although other tools and filters may be used in data collection, for the purposes of this article we shall only discuss these and how the techniques used in them affect the quality of data.
Shutter
The principal role of a shutter is exactly what its name suggests: it shuts, stopping light from falling onto the eyepiece or any sensor attached to it. But why do we need to control the amount of light that enters our instrument? Because we receive different amounts of light from different objects: a bright, nearby object delivers far more photons per second to the detector than a faint, distant one. To compensate, one can let more photons hit the detector by keeping the shutter open for a longer period, so that more information is collected. This is called exposure. In professional observations, exposure times typically range from about 30 seconds to 300 seconds, meaning the shutter is kept open for that long, exposing the detector to the incoming photons. This is similar to the shutter-speed setting commonly used in phones and DSLRs when taking low-light pictures. Shutters can be separate mechanical parts of the telescope or, in some instruments, arrays of tiny electronic components built into the detector, called microshutters.
Charge-Coupled Devices (CCDs)
The other very important part is the detector itself, which in astronomy is most commonly a CCD. A Charge-Coupled Device (CCD) is an electronic sensor that stores information in the form of electric charge. CCDs were first developed at Bell Labs in 1969, originally as a form of computer memory. A CCD is a semiconductor device that works on the basis of the photoelectric effect. In simple terms, it is a 2D array of photosensitive elements separated by potential barriers. Each of these small divisions is called a pixel (picture element). When a photon strikes a pixel, it knocks out an electron-hole pair, creating a charge. Because of the potential barriers separating the pixels, the charge in each one remains isolated. The more photons that hit a pixel, the more charge it accumulates.
Now, what remains is to read out the charge in each pixel and store it digitally. The incredible advantage CCDs provide is fast and reliable readout. Since our data is spread out on a 2D grid, it must be collected sequentially to avoid mixing values up. Think of the array as a stack of rows, each with the same number of entries. To read it out, shift every row down by one position: the bottom row is pushed into a separate serial register and the top row is emptied. Then clock the serial register out horizontally, one pixel at a time, recording each value. Iterate this process until all the rows of the array are empty.
This is done electronically using horizontal and vertical shift registers.
Once the charges are read out, they are converted into voltages and stored digitally; the numbers can then be converted into images (or kept as raw numbers) for research.
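The shift-register readout described above can be sketched as a toy simulation. This is not real CCD electronics, just the row-shifting bookkeeping, with the array size chosen arbitrarily:

```python
import numpy as np

def read_out(ccd):
    """Toy model of CCD readout: repeatedly shift all rows down by one,
    push the bottom row into a serial register, and clock that register
    out horizontally, one pixel at a time."""
    rows, cols = ccd.shape
    charges = ccd.copy()          # charge still sitting on the chip
    image = np.zeros_like(ccd)    # digitized output
    for r in range(rows - 1, -1, -1):
        # vertical shift: the bottom row drops into the serial register
        serial_register = charges[-1].copy()
        charges[1:] = charges[:-1]
        charges[0] = 0            # the top row is emptied
        # horizontal shift: clock the serial register out pixel by pixel
        for c in range(cols):
            image[r, c] = serial_register[c]
    return image
```

Reading the output back reproduces the original charge pattern, which is the whole point: the sequential shifting moves every packet of charge to the amplifier without mixing pixels.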
Now that we have a rudimentary understanding of the components involved and how they work, let us look at the common problems one might encounter while collecting data in this manner and how they can be dealt with.
A Schmidt-Cassegrain telescope with a CCD attached to the eyepiece
Sources of Error
The first two components we discussed form the optics of the system. While optics can also introduce imperfections and errors in the form of optical defects, these have been greatly reduced in recent times by advances in optical design. The biggest optical problems are aberrations, and one must understand that they cannot be eliminated from the system entirely. Aberrations arise because not all light rays can be made to converge at a single point, either because white light contains different wavelengths that refract differently (chromatic aberration, caused by dispersion) or because of the geometry of the optics (edge effects). These problems are mitigated using special lenses and prisms that reduce aberrations as much as possible.
The biggest source of noise is the sensor itself. Before readout from the CCD, each pixel is given a constant non-zero voltage, called an offset voltage. This is done to separate the electrons and holes near the electrode so that a depletion region is created; the offset also prevents negative count readings. This is called the bias, or bias voltage. In theory the bias voltage should be equal on all pixels, but in practice we notice slight variations from pixel to pixel. These produce false counts and unintended readings, which slightly distort the result.
And since the registered photons (counts) come from the photoelectric effect, the thermal motion of electrons can also produce counts. These are called dark currents. Cosmic rays hitting the sensor can also cause unintentional counts.
Apart from this, there is always a constant noise source generated by the electronics itself; it is not possible to design electronic components that produce no noise. On top of all this, there is atmospheric noise, also known as the sky background.
One must also be mindful that not all pixels of the sensor are perfect. With repeated exposure, some pixels lose sensitivity over time, resulting in dead or poorly responding pixels.
As one can see, astronomers have so many things to take care of before obtaining the data! Some of the above-mentioned problems are treatable, some are not. Let us look into some popular methods which are employed to correct these problems.
Corrections
One must understand that all these sources of error contribute additional counts: the raw data from the CCD readout contains counts due to the object under observation (intended counts) along with counts due to instrumental imperfections and noise (unintended counts).
Thus, to correct our final image, we have to subtract the unintended counts from the total readout counts. But clearly, some of these errors are stochastic in nature: one cannot predict which pixel has how much bias voltage, or which pixel a cosmic ray may strike. A common practice in such situations is to repeat the correction measurement multiple times and take the median of all the frames; this median is then used to correct for the respective error. An important question to ask is: why median stacking? Why not the mean, or some other way of combining all the data into one frame? Statistically, the mean is the sum of the values divided by the number of values, while the median is the middle value that divides the ordered data into two equal halves. Both quantities lie between the upper and lower bounds of the data, and when the data is symmetrically distributed they lie quite close to each other. The difference appears when the distribution is asymmetric or contaminated by outliers: a single extreme value (a cosmic-ray hit, say) can drag the mean far from the typical value, while the median barely moves. The median is therefore a more robust representative of the sample.
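A tiny numerical example (with made-up counts) shows why the median wins when one frame is contaminated:

```python
import numpy as np

# Five repeated readings of the same pixel; one frame was hit by a
# cosmic ray, producing the 900-count outlier. Values are hypothetical.
pixel_counts = np.array([101, 99, 100, 900, 102])

print(np.mean(pixel_counts))    # 260.4 -- dragged far up by the outlier
print(np.median(pixel_counts))  # 101.0 -- barely affected
```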
To treat the bias voltage, one first obtains bias-correction data. As the bias is an offset DC voltage applied to each pixel, it does not depend on the data collected; it is always present. To find the offset level of each pixel, all one needs to do is take a picture without allowing any external photons to hit the sensor, or in simpler words, a picture with zero exposure (closed shutter). The readings generated this way form a bias frame. Another important step is trimming: some pixel rows or columns are unexposed or covered by design, a region called the overscan. The bias data of these rows is not used in the image itself, so the frame is shortened, or trimmed.
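As a sketch (frame size, offset level, noise width, and overscan width are all made-up numbers), building and applying a master bias might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten simulated zero-exposure readouts: a fixed ~300-count offset
# plus per-pixel read noise. 512 image columns + 32 overscan columns.
bias_frames = [300 + rng.normal(0, 5, size=(512, 544)) for _ in range(10)]

# Median-stack the individual frames into a master bias
master_bias = np.median(bias_frames, axis=0)

# Trim away the overscan columns
master_bias = master_bias[:, :512]

# A raw science frame would then be corrected by simple subtraction:
#   corrected = raw_frame[:, :512] - master_bias
```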
Dark current is the current flowing in the sensor even without any incident photons. Its primary cause is the thermal motion of electrons: when the charge generated this way exceeds the barrier potential, it is registered as a valid count. Since the effect is thermal in nature, dark current increases with temperature and accumulates over the exposure time. Hence, in most cases, CCDs are cooled externally to minimize this effect. Cooling is a very effective treatment, in many cases reducing the recorded dark current to near zero. Any residual dark current can be corrected further: its statistical behavior has been studied theoretically and shown to follow well-defined distributions, and in practice it is commonly removed by subtracting dark frames, exposures of the same length taken with the shutter closed.
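To get a feel for why cooling works so well, here is a back-of-the-envelope calculation using the common rule of thumb that dark current roughly doubles for every ~6 °C rise in temperature. The rate and temperatures below are illustrative, not taken from any particular sensor:

```python
def dark_counts(rate_at_25c, temp_c, exposure_s):
    """Rough dark-signal estimate: dark current scales linearly with
    exposure time and, as a rule of thumb, doubles per ~6 deg C.
    rate_at_25c is in electrons per pixel per second."""
    rate = rate_at_25c * 2 ** ((temp_c - 25) / 6.0)
    return rate * exposure_s

warm = dark_counts(1.0, 25, 300)   # 300 electrons in a 300 s exposure
cold = dark_counts(1.0, -20, 300)  # under 2 electrons after cooling to -20 C
```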
As explained in the previous section, not all pixels are equally sensitive to light, so one must know the sensitivity of each pixel to correct for this. The sensitivity difference multiplies the actual readings and hence must be measured and divided out. The simplest method is to illuminate the sensor uniformly and record the counts registered in each pixel; the actual data can then be divided by this map to remove the non-uniformity in light sensitivity. One common approach uses a fixed white-light source with a translucent cover placed over the sensor to ensure uniform illumination. This procedure is called flat fielding.
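A minimal sketch of flat fielding, with a hypothetical sensitivity map standing in for the real pixel-to-pixel variation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-pixel sensitivity (each pixel responds at 90-110%)
sensitivity = rng.uniform(0.9, 1.1, size=(256, 256))

# Flat frame: a uniform light source seen through that sensitivity
flat = 10000 * sensitivity

# Normalize the master flat so division preserves the signal level
flat_norm = flat / np.median(flat)

# A "science" frame of a uniform 5000-count field, distorted by the
# same sensitivity map, becomes uniform again after the division
science = 5000 * sensitivity
corrected = science / flat_norm
```

After the division, `corrected` is flat to within rounding error: every pixel reports the same value, because the sensitivity factor cancels.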
The frames described above are each taken multiple times and median-stacked before being used to correct the images. They are called calibration frames, and this process is called calibration. Further processing must still be done to reduce the noise, because this is only the basic step. We have not yet dealt with atmospheric noise, which is random, harder to remove, and requires complex data analysis along with a good understanding of how images are stored. In any case, those operations are performed only after obtaining properly calibrated data from the sensor.
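Putting the pieces together, the whole calibration reduces to element-wise arithmetic once the median-stacked master frames are in hand. The function and frame names below are hypothetical:

```python
import numpy as np

def calibrate(raw, master_bias, master_dark, master_flat):
    """Sketch of standard CCD reduction: subtract the bias offset and
    the dark signal, then divide by the normalized flat to remove
    pixel-to-pixel sensitivity variation. All frames share one shape."""
    flat_norm = master_flat / np.median(master_flat)
    return (raw - master_bias - master_dark) / flat_norm
```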
These methods are basic yet very powerful, and they have many applications: they are used not only in astronomy but in any imaging application that uses CCD or CMOS sensors.
For reference, here is an example of how data looks before and after calibration!
As we can see, the calibrated image contains so much more detail when compared with the uncalibrated image. This is why calibration is such an important step before doing further image processing!