ULTRASPEC timing problem

Summary: ULTRASPEC runs occasionally suffer from the insertion of an extra "null" timestamp, which shifts the registration between times and data by one frame. The problem can be reliably recognised, and it can be fixed by manipulation of the data; we have developed a Python script to do so. Read on for more details.


The problem

In January 2014, we noticed that "rtplot", the reduction pipeline command used to display incoming ULTRASPEC data, was throwing up the following error:

Ultracam::read_header WARNING: time unreliable: GPS clock not yet synced since power up

At the same time, it was reporting a time in the year 1970. That is the zero point of the time system used by ULTRASPEC, so clearly something was going wrong on occasion.

Each ULTRASPEC data frame consists of 32 bytes of timing information (the "timestamp" from now on), followed by the data. It is obviously important that the association of timestamp and data frame is consistent, but this is exactly what the timing glitch affects. What happens is that a null (all bytes zero) timestamp appears. This triggers the warning above because the timing data contain various flags, and the first to be tested is the one that indicates whether the GPS clock has been synchronised. Since all the bytes are zero, so too is the bit flag representing the sync status, hence the (in this case spurious) warning. The zeroed timing bytes likewise account for the spurious 1970 time.

So far we have never seen this happen more than once in a given run. Why it happens we do not yet know; it probably marks a failure to acquire the time from the GPS, but establishing the cause needs further work. When it happens, it tends to occur early in a run, typically on the second frame, but not reliably so. Some window formats seem to trigger it more than others, but again not in a consistent manner.

The important point is the effect, which is to delay the correct timestamp until the next frame comes in, and this propagates to all subsequent frames. In a nutshell, the glitch corrupts the timestamp associated with one frame and attaches too early a timestamp to all subsequent frames. Times based upon such data without correction will therefore be too early by one cycle: e.g. a sequence taken at 3-second cadence will yield times that are too early by 3 seconds if uncorrected post-glitch data are used. If you are interested in timing, this can be very significant, because it is often possible to establish times to much higher precision than the cadence.
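
Recognising an affected run is therefore straightforward: one looks for a frame whose 32 timing bytes are all zero. Below is a minimal sketch of such a check in Python. Here "frame_bytes" (the total size of one frame, timestamp plus data, which in practice must be worked out from the corresponding ".xml" file) and the function name are illustrative assumptions of ours, not part of uspfix.py.

TIMESTAMP_BYTES = 32

def find_null_timestamp(path, frame_bytes):
    """Return the index of the first frame whose 32-byte timestamp
    is entirely zero, or None if the run is unaffected."""
    with open(path, 'rb') as f:
        nframe = 0
        while True:
            frame = f.read(frame_bytes)
            if len(frame) < frame_bytes:
                # reached the end of the file without finding a null timestamp
                return None
            if frame[:TIMESTAMP_BYTES] == b'\x00' * TIMESTAMP_BYTES:
                return nframe
            nframe += 1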

It is important to realise that, apart from on the final frame of the run, the timestamps are not lost, hence the times are largely correctable. In fact, even the last frame's time can typically be recovered, as it is usually based upon the timestamps of the preceding two frames, owing to the precise order in which times are taken. We have developed a Python-based corrector script to fix the data affected by this bug (and to leave unaffected data untouched). The script works by moving the timing bytes of every frame after the corrupted one backwards by one frame within the file.
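
In outline, the correction amounts to a one-frame shift of the timestamps. The sketch below illustrates the idea in Python; "frame_bytes", "bad_frame" and the function name are again illustrative assumptions, and the real uspfix.py handles details omitted here, such as extrapolating the final frame's timestamp.

import os
import shutil

TIMESTAMP_BYTES = 32

def shift_timestamps(path, frame_bytes, bad_frame):
    """Copy each timestamp from frame n+1 back to frame n, for every
    frame from the corrupted one (bad_frame) to the penultimate one."""
    # safety copy, made before any modification, as uspfix.py does
    shutil.copyfile(path, path + '.old')
    nframes = os.path.getsize(path) // frame_bytes
    with open(path, 'r+b') as f:
        for n in range(bad_frame, nframes - 1):
            f.seek((n + 1) * frame_bytes)
            stamp = f.read(TIMESTAMP_BYTES)
            f.seek(n * frame_bytes)
            f.write(stamp)
    # the last frame's timestamp must then be extrapolated from the
    # timestamps of the two preceding frames, as described above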


Correction of timing bug in the end_of_night_tasks script

If you run the "end_of_night_tasks" archiving script at the end of your night, the timing corrector script is automatically run on the data before they are copied. You can also run the script separately if necessary, as described next.


How to use the script (Linux, Mac)

To use the script, first download it from the link above, save it somewhere on your local disk, and make it executable (chmod +x uspfix.py). You must have Python version 2.6 or greater available. Then go to the directory containing the (potentially) corrupted data files, which come in pairs such as "run011.xml" and "run011.dat". NB The script operates only on raw ULTRASPEC data; it will not work, for instance, on FITS files created from them.

Next ensure that the ".dat" files are writeable. On Linux, for example, the following command will do this:

chmod +w run*.dat

on all the run*.dat files in your current directory. This is needed because the script modifies the data files. If you want to run the script recursively on a directory tree, the following command will set the permissions recursively:

find . -name "run*.dat" | xargs chmod +w

You are now ready to run the script, which is a matter of:

directory_path_to_script/uspfix.py

to correct just the files in the directory you are currently in, or

directory_path_to_script/uspfix.py -r

to run recursively on the current directory and all its sub-directories. The latter is useful when correcting several nights of data. NB If you are on the data reduction PC ("drpc") in the TNT control room, then simply typing "uspfix.py" or "uspfix.py -r" should work because the script should be in your command search path.
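
For those curious, the recursive mode has only to locate every run*.dat file below the current directory, along the lines of the following Python sketch (the function name is ours, and the actual traversal inside uspfix.py may differ):

import os

def find_data_files(root='.'):
    """Yield the paths of all run*.dat files below `root`."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.startswith('run') and name.endswith('.dat'):
                yield os.path.join(dirpath, name)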


After the script has run, you may find files such as "run011.dat.old" in your directories. These indicate that the corresponding file was affected by the bug and has been corrected: immediately before making a correction, the script makes a safety copy of the data and gives it a name with the ".old" suffix.

Please try not to interrupt the operation of the script. It traps <ctrl-C> so that a correction cannot be stopped midway, but it might still be possible to interrupt it while a file is only partially corrected, which would be difficult to recover from. The script can safely be run twice or more without problem; on the second and later passes it will not modify the data further. If you suspect that you might have managed to corrupt your data, then the copy made to "run###.dat.old" (which happens prior to any modification) may help you revert to the start.
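
The kind of trap meant here is sketched below using Python's standard signal module. This illustrates the idea only, and is not necessarily how uspfix.py implements it:

import signal
from contextlib import contextmanager

@contextmanager
def deferred_interrupt():
    """Hold back SIGINT (<ctrl-C>) until the critical section finishes."""
    pending = []
    # temporarily replace the SIGINT handler with one that just records it
    previous = signal.signal(signal.SIGINT, lambda sig, frame: pending.append(sig))
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, previous)
        if pending:
            # deliver the interrupt now that the critical section is done
            raise KeyboardInterrupt()

Wrapping the byte-shuffling for each file in "with deferred_interrupt():" would then let it run to completion even if <ctrl-C> is pressed partway through.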

Are my data affected?

The short answer is, "Yes", or at least they are potentially affected. The timing glitch is not new; we simply did not notice it until the end of January 2014. All ULTRASPEC data taken to date are therefore potentially at risk and require correction; ignore this only if timing is not an issue for you. The easiest way to determine whether your data need correction is simply to run the script, since that both answers the question and applies the fix.

Our apologies to any observer affected by this problem. We will endeavour to fix the issue properly (i.e. so that it never occurs) as soon as possible.


Contact us

Should you ever encounter a data file affected by more than one null timestamp, please e-mail Tom Marsh at the University of Warwick about it. We have never seen this occur, but it might be possible, in which case we would like to see the data to check whether the script can be modified to correct this case as well.