4. Binned Data Production

Cube data

'Cube' is the nickname we gave to the intermediate analysis result: a 2D detector image binned as a function of certain variables. This is how the first level of data reduction is achieved, through averaging. The binning variable is most typically the timetool-corrected delay time, but it can be other variables as well, such as laser power, temperature, etc.

The code has been rewritten to use the SmallDataAna(_psana) interface, as that allows a more flexible definition of both binning and selection variables using derived variables, rather than being restricted to the values saved directly in the hdf5 file. As with the SmallData production, there is a driver script (makeCube) and a "production" python file called "MakeCube.py". An example of the production file can be found here:

https://github.com/slac-lcls/smalldata_tools/blob/master/examples/MakeCube.py

The relevant lines are here:

ana.addCut('lightStatus/xray',0.5,1.5,'on')

ana.addCut('lightStatus/laser',0.5,1.5,'on')

ana.addCube('cube','delay',np.arange(13.,15.,0.05),'on')

cs140Dict = {'source':'cs140_rob','full':1}

ana.addToCube('cube',['ipm2/sum','ipm3/sum','diodeU/channels',cs140Dict])

anaps.makeCubeData('cube')

Now let us dissect what this is doing:

ana.addCut('lightStatus/xray',0.5,1.5,'on')

ana.addCut('lightStatus/laser',0.5,1.5,'on')

We are defining an event selection called "on". At this point we only require that both laser and X-rays are present. Typically one would add a requirement on the incoming intensity and, if interested in the timetool, some quality requirements on the timetool signal.
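What such a selection amounts to can be sketched in plain numpy (the event data and the additional intensity variable below are made up for illustration; the real bookkeeping happens inside SmallDataAna):

```python
import numpy as np

# Hypothetical per-event scalars, standing in for smallData variables.
xray_on = np.array([1, 1, 0, 1, 1])              # lightStatus/xray
laser_on = np.array([1, 0, 1, 1, 1])             # lightStatus/laser
ipm2_sum = np.array([0.2, 1.5, 0.8, 2.1, 3.0])   # incoming intensity

# Each addCut(var, low, high, name) ANDs another low < var < high
# condition into the filter named "name".
on = (0.5 < xray_on) & (xray_on < 1.5)
on &= (0.5 < laser_on) & (laser_on < 1.5)
# A typical additional requirement on the incoming intensity:
on &= (1.0 < ipm2_sum) & (ipm2_sum < 5.0)

print(on)  # boolean event mask: only events passing all cuts are binned
```

Only events where the mask is True enter the cube.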

ana.addCube('cube','delay',np.arange(13.,15.,0.05),'on')

Here we are defining a cube called "cube": we give it a name (here "cube"), a variable we want to bin in (here "delay"), the bins we would like to use for the binning, and lastly the name of the filter/event selection we defined previously (here "on").
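The binning itself boils down to sorting events into the bins and averaging per bin. A minimal sketch with made-up delay/signal values (explicit bin edges are used here to sidestep floating-point edge effects of np.arange):

```python
import numpy as np

# Hypothetical event data: a delay value and a scalar signal per event.
delay = np.array([13.01, 13.02, 13.06, 13.07, 13.11])
signal = np.array([1.0, 3.0, 2.0, 4.0, 5.0])

# Bin edges, playing the role of the array passed to addCube.
bins = np.array([13.00, 13.05, 13.10, 13.15])
idx = np.digitize(delay, bins) - 1   # which bin each event falls into

# Average the signal over the events in each bin -- the "first level
# of data reduction through averaging" that the cube performs.
binned = np.array([signal[idx == i].mean() for i in range(len(bins) - 1)])
print(binned)
```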

cs140Dict = {'source':'cs140_rob','full':1}

ana.addToCube('cube',['ipm2/sum','ipm3/sum','diodeU/channels',cs140Dict])

Now we specify what data we would like to bin. You can either pass the names of variables in the littleData or the names of detectors in the "big" data. The latter is passed as a dictionary with the source name (the alias) and information about what you would like to add to the cube (the main use case is the full data, requested as above with 'full':1 as the key/value pair - the value 1 is unimportant, the code only checks for the presence of the "full" key).
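A sketch of how the two kinds of entries can be told apart (plain Python; the actual parsing lives inside smalldata_tools, and the classify helper below is purely illustrative):

```python
cs140Dict = {'source': 'cs140_rob', 'full': 1}

def classify(entry):
    """Strings name smallData variables; dicts describe big-data detectors."""
    if isinstance(entry, dict):
        # Only the presence of the 'full' key matters, not its value.
        return ('detector', entry['source'], 'full' in entry)
    return ('smallData', entry)

for entry in ['ipm2/sum', 'ipm3/sum', 'diodeU/channels', cs140Dict]:
    print(classify(entry))
```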

anaps.makeCubeData('cube')

At last we make the cube. Note that we are calling this on "anaps" (!). "ana" has the same function, but it will only bin the data present in the smallData file (or the derived fields attached to the xarray) and will quietly ignore the variables that can only be obtained from the xtc files. Because the "anaps" version reads data from the xtc files, you will want to run it with mpi using the driver script; checking the cube definition (correct definition of bins, ...) can be done using "ana" interactively.

The cube name will be used to name the hdf5 file that gets written by the function. The "ana" function by default will NOT write a file and only returns a dictionary with the binned data; it has a parameter that will make it write an hdf5 file. The "anaps" function will always write the hdf5 file, as this is integral to how the events are distributed among cores and how the data is reassembled in the end.

Binning variables:

The primary binning variable is defined in the cube definition. It needs to be either a variable originally in the smallData or an added variable. Using "delay" will create a derived variable for the X-ray/laser delay using the scan variable (if applicable), the timetool and the fast delay-stage encoder value. If the bins are not passed, the code will try to use np.unique(scanValue), which will only work for "step" scans.
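The np.unique default makes sense for step scans, where the scan variable takes a small set of repeated values (the scan values below are made up for illustration):

```python
import numpy as np

# Hypothetical scan-variable values for a "step" scan: each event
# records the motor position of its step, so values repeat.
scan_value = np.array([10.0, 10.0, 10.5, 10.5, 10.5, 11.0, 11.0])

# Without explicit bins, one bin per unique step value is the default.
bins = np.unique(scan_value)
print(bins)

# For a continuous variable (e.g. a timetool-corrected delay) the
# values rarely repeat, so np.unique would give one bin per event --
# which is why explicit bin edges must be passed in that case.
```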

In addition to the primary binning variables, you can now add more binning variables to make a higher dimensional "cube". This can be done like this:

ana.add_BinVar(addBinInfo)

You can pass either a list like [varname, bins] or a dict with variables names as keys and bin boundaries as values.
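Both accepted forms carry the same information; a sketch of reducing them to a common representation (the variable name 'laser_power' and the helper are hypothetical):

```python
import numpy as np

# Hypothetical extra binning variable, given in either accepted form:
as_list = ['laser_power', np.arange(0., 10., 1.)]     # [varname, bins]
as_dict = {'laser_power': np.arange(0., 10., 1.)}     # {varname: bins}

def normalize(bin_info):
    """Reduce either accepted form to a {name: bin_edges} dict."""
    if isinstance(bin_info, dict):
        return bin_info
    name, edges = bin_info
    return {name: edges}

print(sorted(normalize(as_list)))
```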

Add variables from ana:

You can add lists of variables in the smallData, whether they were present originally or were added to the data. There are now two ways to also bin droplet data: you can either save an image based on the droplets in each bin, or make square arrays in x/y/adu for each bin. This is specified like this:

ana.addToCube('cube',['droplet:epix/droplets:image','droplet:epix_2/droplet:array'])

Add variables from the xtc (images):

Data from the xtc files are added as dictionaries as described above. Options for the dictionary include:

full: save full detector data

image: if present, save image.

thresAdu: require pixels to be above this threshold (in ADU) to be added to the image

thresRms: same, but with the threshold in units of the pixel noise (rms)

common_mode: number identifying the common mode method
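Putting these options together in one dictionary might look like this (the alias 'epix_1' and the specific option values are made up for illustration; which combinations are sensible depends on the detector):

```python
# A hypothetical detector dictionary combining the options above.
epixDict = {
    'source': 'epix_1',   # detector alias in the data (made up here)
    'full': 1,            # save the full detector data
    'image': 1,           # also save the assembled image
    'thresAdu': 25.,      # keep only pixels above 25 ADU
    'common_mode': 5,     # number identifying the common mode method
}
# This would then be passed along: ana.addToCube('cube', [epixDict])
print('full' in epixDict)   # only the presence of the key is checked
```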

---old---

These files are typically named "CubeSetup_<somethingDescriptive>" and are passed to a job submission script called "cubeRun". cubeRun has a help function that explains the command line parameters, which are very similar to those of littleDataRun, aside from the necessary "-c <CubeSetupFilename>". The following are some of the options that differ from littleDataRun. -m takes the common mode parameter: 5 means using the unbonded pixels and 1 uses the zero peak. "1" works better, but fails if ASICs have a lot of signal. The unbonded pixels always work. If we want to threshold the pixels in high gain mode, I would suggest 2.5 rms or 25 ADU as typically working values to start with.

-s <size>: rebin image to size x size pixels

-m <common mode parameter>: apply common mode

-t <thres in ADU>: hitfinder

-T <thres in Rms>: hitfinder

-R store raw cspad data (NOT image)

The hdf5 file also stores the pedestal and rms values. If the data is stored in "raw" format, then the big CsPad will have the shape 32x185x388 instead of 1692x1691. The same is true for the pedestal and the rms. We also store the x/y values for each pixel.
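The practical consequence of matching shapes is that pedestal and rms can be applied to the raw data tile by tile, without any geometry information; a minimal numpy sketch (with placeholder zero arrays standing in for the stored data):

```python
import numpy as np

# The CsPad in "raw" form: 32 ASIC tiles of 185 x 388 pixels each.
raw = np.zeros((32, 185, 388))

# Pedestal and rms are stored with the same shape as the raw data,
# so pedestal subtraction is a plain elementwise operation:
pedestal = np.zeros_like(raw)
corrected = raw - pedestal
print(corrected.shape)   # (32, 185, 388)

# Assembling the tiles into a 2D image instead requires the per-pixel
# x/y coordinates that are also stored in the file.
```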