Fast Search Algorithm

DISCLAIMER: The algorithm used to search for dispersed transients is surely not new - it's a very simple, basic method.

The development of the algorithm was motivated by the enormous processing requirements associated with the usual professional 'single pulse search' applications (e.g. Heimdall, 'single_pulse_search.py'). With my consumer-grade PC the time needed to process the filterbank data using those professional packages is totally impractical.

The idea came from manually looking at dynamic spectrum plots and noting that it is very easy to sort - by eye - the spectra which contain dispersed pulses from those which do not.

When looking at these dynamic spectra I noted that what my eye-brain was picking out was bright points which lie along a straight line. The algorithm mimics that ability with reasonable efficacy.

Conveniently - the narrow bandwidth of the HawkRAO Vela observation data means the dispersion across channels is very close to being linear. A more general approach would be to fit to the inverse-square-law (delay proportional to 1/f^2) dispersion delay curve.
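To illustrate why a straight line is a good approximation over a narrow band, the sketch below compares the standard cold-plasma delay curve with its best-fit line. The band used here (32 channels across 2.4 MHz centred on 436 MHz) and the DM of 68 are placeholder values chosen for illustration, not figures taken from the observation setup, and the function names are hypothetical.

```python
import numpy as np

K_DM = 4.148808e3  # dispersion constant, s MHz^2 pc^-1 cm^3

def channel_delays(dm, freqs_mhz):
    """Delay (s) of each channel relative to the highest frequency in the band."""
    freqs_mhz = np.asarray(freqs_mhz, dtype=float)
    return K_DM * dm * (freqs_mhz ** -2 - freqs_mhz.max() ** -2)

def linear_residual_fraction(dm, freqs_mhz):
    """Largest departure from a straight-line fit, as a fraction of the total sweep."""
    delays = channel_delays(dm, freqs_mhz)
    chan = np.arange(len(delays))
    slope, intercept = np.polyfit(chan, delays, 1)
    resid = delays - (slope * chan + intercept)
    return np.max(np.abs(resid)) / (delays.max() - delays.min())

# Assumed narrow band: 32 channels, 2.4 MHz wide, centred on 436 MHz
freqs = np.linspace(437.2, 434.8, 32)
narrow = linear_residual_fraction(68.0, freqs)

# For contrast, an assumed wide band where the curvature is no longer negligible
wide = linear_residual_fraction(68.0, np.linspace(1500.0, 1200.0, 32))

print(f"narrow band: {narrow:.4%} of sweep, wide band: {wide:.4%} of sweep")
```

For a wide fractional bandwidth the residual grows quickly, which is where a fit to the full 1/f^2 curve would be needed instead.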

The Basic Algorithm

For a given maximum DM, the basic algorithm calculates the maximum dispersion delay across the observation bandwidth at the observation frequency. That delay sets the length (in time) of the block of data required to encompass the maximum dispersion delay across the bandwidth - with a lower limit of 1 second.
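The block-length calculation can be sketched as follows. This is a minimal sketch using the standard dispersion delay formula. The function and parameter names are hypothetical, and the 436 MHz / 2.4 MHz figures in the usage lines are placeholders. Only the 1 second lower limit comes from the description above.

```python
import math

K_DM = 4.148808e3  # dispersion constant, s MHz^2 pc^-1 cm^3

def max_dispersion_time(max_dm, centre_mhz, bandwidth_mhz):
    """Delay (s) between the top and bottom of the band at the maximum DM."""
    f_hi = centre_mhz + bandwidth_mhz / 2.0
    f_lo = centre_mhz - bandwidth_mhz / 2.0
    return K_DM * max_dm * (f_lo ** -2 - f_hi ** -2)

def block_samples(max_dm, centre_mhz, bandwidth_mhz, tsamp_s, min_block_s=1.0):
    """Number of time samples per block, never shorter than min_block_s."""
    block_s = max(max_dispersion_time(max_dm, centre_mhz, bandwidth_mhz), min_block_s)
    return int(math.ceil(block_s / tsamp_s))

# Assumed band: a modest maximum DM gives a sweep far below the 1 s floor
print(max_dispersion_time(100.0, 436.0, 2.4))   # roughly 0.024 s
print(block_samples(100.0, 436.0, 2.4, 0.25))   # floor applies: 1 s / 0.25 s
```

With a narrow band the 1 second floor dominates unless the maximum DM is very large.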

The filterbank data is read in blocks with time duration as calculated above. However, a dispersed pulse may straddle the boundary between one block and the next - so the start of each block is advanced by only half the calculated block duration, to ensure that any dispersed pulse is fully contained inside one block. In this way the data is actually read in twice.
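The half-block stepping might look like the sketch below. The generator name and its arguments are hypothetical, and it works on sample indices only, leaving the actual filterbank reading aside. Note that with this scheme a pulse is guaranteed to sit wholly inside one block when its sweep is no longer than half a block.

```python
def overlapping_blocks(n_samples, block_len):
    """Yield (start, stop) sample ranges; each block starts half a block after the last."""
    step = max(block_len // 2, 1)
    for start in range(0, n_samples, step):
        stop = min(start + block_len, n_samples)
        yield start, stop
        if stop == n_samples:
            # The end of the data has been reached; later blocks would add nothing new
            break

for start, stop in overlapping_blocks(10, 4):
    print(start, stop)
```

Apart from the first and last half-blocks, every sample falls inside exactly two blocks, which is the "read in twice" cost mentioned above.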

Normally - in other applications - the blocks are processed by de-dispersing over a range of test DMs.  The de-dispersion process chews up massive amounts of processing time.

Instead, in this algorithm the data is not de-dispersed. Each channel is searched for its maximum value (assuming that the FRB signal is above the noise in a sufficient number of channels), and a linear fit to these points is tested against a minimum 'coefficient of determination' (COD). The test is done on groups of 4 channels out of the 32 - incremented by 1 channel for each test. Any group which meets the set minimum COD is added to a 'valid channels' mask. If the group tests pass the set requirements, a final COD test is done using this 'valid channels' mask. The linear fit algorithm returns the starting point (w.r.t. the highest frequency channel) and the slope - which is a measure of the DM.
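A minimal sketch of this search is given below, under stated assumptions. The thresholds (min_cod, min_valid) and all function names are hypothetical, and the "set requirements" are interpreted here as a minimum number of valid channels, which may not match the real criteria. The group size of 4 and the 32 channels are the only values taken from the description.

```python
import numpy as np

def cod_linear(x, y):
    """Coefficient of determination (R^2) of a least-squares line through (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    # A perfectly flat set of points has no variance to explain; score it as 0
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return r2, slope, intercept

def search_block(block, group=4, min_cod=0.9, min_valid=8):
    """block: 2-D array (channels, time samples), channel 0 = highest frequency."""
    n_chan = block.shape[0]
    chan = np.arange(n_chan, dtype=float)
    peak_t = np.argmax(block, axis=1).astype(float)  # brightest sample per channel
    valid = np.zeros(n_chan, dtype=bool)
    # Slide a window of `group` channels along the band, one channel at a time
    for first in range(n_chan - group + 1):
        sl = slice(first, first + group)
        r2, _, _ = cod_linear(chan[sl], peak_t[sl])
        if r2 >= min_cod:
            valid[sl] = True
    if valid.sum() < min_valid:  # assumed form of the 'set requirements'
        return None
    # Final test: one line through all the valid channels
    r2, slope, intercept = cod_linear(chan[valid], peak_t[valid])
    if r2 < min_cod:
        return None
    return {"cod": r2, "slope": slope, "start": intercept, "n_valid": int(valid.sum())}

# Synthetic block: a noiseless pulse drifting 3 samples per channel from sample 20
demo = np.zeros((32, 200))
for c in range(32):
    demo[c, 20 + 3 * c] = 1.0
result = search_block(demo)
print(result)
```

The slope is in samples per channel. Converting it to a DM needs the sample time and channel width, which are left out of this sketch.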

Various parameters are recorded which allow subsequent curation of candidates to eliminate those candidates which are obviously RFI.

Only those blocks of data which contain the curated candidates are de-dispersed and subjected to further curation. This is what accounts for the fast processing of data - a 3.5 GB filterbank file is processed in ~1 minute.
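For a candidate block, de-dispersion with the fitted slope could be as simple as the sketch below. This is an assumption about how the step might be done, not the method actually used. The function name is hypothetical, and the linear shift relies on the narrow-band approximation described earlier.

```python
import numpy as np

def dedisperse_linear(block, slope):
    """Shift channel c earlier by round(slope * c) samples and sum over channels.

    block: 2-D array (channels, time samples), channel 0 = highest frequency.
    slope: fitted drift in samples per channel.
    Note: np.roll wraps around, so samples shifted off the start reappear at the end.
    """
    n_chan, n_samp = block.shape
    out = np.zeros(n_samp)
    for c in range(n_chan):
        shift = int(round(slope * c))
        out += np.roll(block[c], -shift)
    return out

# Synthetic block: a noiseless pulse drifting 3 samples per channel from sample 20
demo_block = np.zeros((32, 200))
for c in range(32):
    demo_block[c, 20 + 3 * c] = 1.0
series = dedisperse_linear(demo_block, 3.0)
print(int(np.argmax(series)), series.max())
```

Because this is done for only a handful of candidate blocks and a single slope each, its cost is small compared with a full trial-DM search.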