These started off as notes in my lab book on how to do tasks that I routinely do during the course of my research.
This tutorial uses URL queries to automatically send SONA participants to your Qualtrics and/or Pavlovia study (recording their participant ID automatically) and then back to SONA for immediate credit.
I made this into a PDF and uploaded it to my OSF page here for ease of sharing - thanks to PsychoPy for picking this up on social media and sharing it on their website!
TL;DR - An easy way of combining multiple data files (comma separated, tab delimited, etc.) is to use the sort command in Terminal, dragging the files into the Terminal window: sort [dragged files] >newFile.txt
Experimental programs typically output one file (usually .txt or .csv) per participant. While some (e.g. ePrime) have a function in their complementary data preparation/analysis software to draw the data from specified files to run group analyses, others don't come bundled with additional analysis software (e.g. PsychoPy). You'd then have to write some code to combine the files, and the logic typically involves running a loop that reads each file in the output folder and writes the contents into a specified combined file. From my experience the same script will not work for every case, since each program and experiment has its own way of producing output files, which requires you to edit the original script.
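That loop logic can be sketched in a few lines of Python (the file pattern and names here are just illustrative examples, and this assumes every file starts with the same header row):

```python
import glob

def combine_files(pattern, out_path):
    # Collect the matching file paths in a stable, sorted order
    paths = sorted(glob.glob(pattern))
    with open(out_path, "w") as out:
        for i, path in enumerate(paths):
            with open(path) as f:
                lines = f.readlines()
            # keep the header row from the first file only,
            # skip the repeated header in every subsequent file
            out.writelines(lines if i == 0 else lines[1:])
```

Calling something like combine_files("exampleDataFiles/*.txt", "combined.txt") would then mimic the Terminal trick below, minus the row sorting (use glob.glob(pattern, recursive=True) with a "**" pattern if the files sit in subfolders).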
If you only use one program and your output files are always consistent, it would be more efficient to write a specific script for that and reuse it all the time. This tutorial is one way of combining files that I find gets the job done the fastest when dealing with data from different experimental programs. It uses the Terminal and Finder, which come standard on any Mac (ever, probably? The earliest OS X I've used was 10.2).
The example files in this demonstration are from Experiment Builder, which annoyingly outputs a folder of files for each participant, with all the files within each folder having the same names as the corresponding ones in other participants' folders. To be fair, EB is mainly for running eye-tracking experiments, and their accompanying software (Data Viewer) for eye-tracking data is great. Would be nice if they had similar software for non-eye-tracking data.
I've previously written code in R to do this for EB files, and it's obviously much easier once you have a working script. But I've been using a few other programs recently, which output files a bit differently, so this is a good general alternative.
This example deals with a folder that contains one folder for each participant, each containing a data file called RESULTS_FILE.txt, so minor tweaking has to be done if your data is organised slightly differently.
1. How the folders are organised: Individual folder for each participant, containing the results files (oops, realised I had previously deleted all the extra log files that EB spits out with the data, so each participant folder only has one file. But it still works exactly the same.).
2. Launch Terminal (I used Spotlight: cmd+space) and navigate to the folder using the 'cd' command. Since my folder 'exampleDataFiles' is on the desktop, I typed cd desktop followed by cd exampleDataFiles
3. Select all of the files that you want to be combined. Since all the files have a similar (the same, in this case) name, I can simply search for the file name in the 'exampleDataFiles' folder, which will show me all the files within all the subfolders. Cmd+A will select them all.
4. In the Terminal, type sort, then drag the selected files onto the Terminal window, then type > followed by the name you want for the combined data file. The new file will be written into the current folder (the exampleDataFiles folder in this case).
TaDaaa!
Do note that the new file is sorted by the first column of the original files, so you'll have to bear that in mind and reorder the rows if required.
----------------------------------------------------------------------------------------------------------------------------
This is a quick guide/notes on how to run the QMPE software. For a comprehensive guide, always refer to the technical manual. Also note that I mainly work with response time data and fully within-participant designs, and thus some of the specific tips might not be fully applicable to other situations, and adjustments should be made.
I wrote this guide for a student who was doing ex-Gauss analysis for his Masters project. He didn't have any questions or ask for any clarifications, so I guess that's a sign that the instructions are clear enough!
Required files and software
To generate the ex-Gaussian parameters you will need the QMPE software, an instruction file, and a data file (all downloadable from the Newcastle Cognition lab website).
Preparing the data
All the data needs to be in one single file, so if the experiment software only generates individual data files for each participant, these will have to be combined first (easily done with free online software such as txtcollector, or with code; on Mac, this can be done using the sort command in Terminal <tutorial soon>).
The data file has to be formatted in a way that the QMPE software can recognise (i.e. long format). The data file has a .dat extension, but is essentially a tab-delimited document which can be opened and edited in any software that can read tab-delimited format, like MS Excel, Wordpad, TextEdit, etc.
For the basic function of estimating ex-Gaussian parameters (which is what I normally do), a correctly formatted data file consists of two columns: a ‘data’ column on the right, which contains the individual data points (e.g. the RT of each trial), and an ‘index’ column, which identifies the participant and condition the data point belongs to.
In this format, every data point takes up one row. So the total number of rows will be equal to the total number of trials all participants went through. For example, if 20 participants did an experiment that had 100 trials, the data file will have 2000 rows (20 x 100).
The index column may have to be created if it was not already pre-programmed into the experiment output file. This can be easily done in Excel via sorting/filtering or concatenating the participant and condition columns (just make sure the resulting index is an integer and not a string). The end result is that all trials from the same condition and participant will have a unique number, with the total number of unique numbers in a fully within-participant design equal to the number of conditions multiplied by the number of participants. I recommend sorting the data by condition first, then participant (like in the example below) to make it easier to reorganise the output for analysis.
Example of a data file with 2 participants and 2 experimental conditions. First column indexes the unique participant/condition combination, second column is the DV. Column C would not be in an actual data file.
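In code, one possible scheme for that concatenation (entirely hypothetical column names; adapt to your own output file) pads the participant number to two digits and prepends the condition number, giving a unique integer per condition/participant pair:

```python
def add_index(rows):
    # rows: list of dicts, each with integer 'condition' and
    # 'participant' fields (hypothetical names for illustration).
    # Concatenate condition and participant into one integer index,
    # so sorting by index groups trials by condition first, then participant.
    for row in rows:
        row["index"] = int(f'{row["condition"]}{row["participant"]:02d}')
    return rows
```

For example, condition 1 / participant 2 becomes index 102, and condition 2 / participant 2 becomes 202, matching the condition-first ordering recommended above.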
Removing invalid data
For my research, only responses that are correct and between 200 and 2500ms are included in analyses. The rationale for using only correct responses is that since the tasks are very simple (accuracy is typically very high, >95%), erroneous trials can be assumed to be due to other cognitive processes that are not being investigated. The hard RT cut-offs are conservative assumptions of the reasonable amount of time needed for encoding, processing, response selection and response execution. Correct trials faster than 200ms may be anticipatory or accidental, while responses over 2500ms may be due to distraction by an external stimulus. As with wrong responses, I assume that the factors influencing these extremely fast and slow responses are not relevant to the cognitive processes under investigation and should be left out of the analyses.
After removing all the invalid data, the number of valid data points in each index has to be calculated (easily done using pivot tables in Excel). The QMPE software requires the user to specify the number of quantiles to be estimated. The maximum number of quantiles is N-1, where N is the number of valid data points left in the smallest index. For example, if after all invalid data has been removed the index with the fewest valid data points has 85 trials, the number of quantiles to be specified later in the instruction file is 84.
Invalid data can be removed prior to formatting, but I personally do it after, as it is easier to generate an index when I know exactly how many trials I am working with (i.e. all of them). Save the file and, if required, change the extension to .dat (make sure the file type is changed to .dat, and not just the file name - e.g. dataFile.dat.xlsx is still an Excel file and will not be recognised by the software).
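As a sketch of this cleaning step in code, here is one way to apply the cut-offs, count valid trials per index, and write the tab-delimited .dat file (the column names and cut-off values are my examples; substitute your own criteria):

```python
import csv
from collections import Counter

def clean_and_count(trials):
    # trials: list of dicts with 'index', 'rt' (ms), and 'correct' (1/0)
    # fields - hypothetical names for illustration.
    # Keep only correct responses between 200 and 2500 ms.
    valid = [t for t in trials
             if t["correct"] == 1 and 200 <= t["rt"] <= 2500]
    # Count valid trials per index; the maximum number of quantiles
    # is the smallest count minus one (N - 1).
    counts = Counter(t["index"] for t in valid)
    max_quantiles = min(counts.values()) - 1
    return valid, max_quantiles

def write_dat(valid, path):
    # Write a tab-delimited file: index column, then data column
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for t in valid:
            writer.writerow([t["index"], t["rt"]])
```

This reproduces the pivot-table step programmatically; you would still rename the output to a .dat extension for QMPE.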
Preparing the instruction file
The instruction file has a .p extension, but can be edited in Notepad, Wordpad, etc. Instructions are embedded in the comments but can be a bit confusing. The sample instruction file is presented below, and the text in bold is my additional comments/instructions. Those comments correspond to the lines that I would typically take note of and edit before running the program.
# A line beginning with # -Hash- is ignored (for comments)
# Text lines cannot be followed by comments, numeric parameters can.
#
# SAMPLE INSTRUCTION FILE
# This file contains instructions to start an analysis.
# Format is very strict.
#
################################################################
# First, present the input data file name.
sample.dat #make sure this matches your data file name
#
################################################################
# Second is the output file stem (no extension)
# creates:
# *.par for best-fitting parameters and standard errors correlations
# *.oe = observed and expected quantiles/vincentiles/raw
#
Sample #this will be the name of your output file
#
################################################################
# Third
.00001 Measurement unit size, in general 1 ms for RT. #change this unit size to suit your data
# Fourth: Mode: 0 = silent running, 1= one output/cell,
# 2 = trace fit, >7 conditional trace mode.
1
#
################################################################
# Fifth to seventh, convergence parameters,
# Parameters can be changed while fitting is running in trace mode
1.e-9 Proportional objective function change tolerance
1.e-5 Proportional L(inf)-norm tolerance,
# i.e. all parameters must change less than this
250 Maximum number of iterations allowed in one search
#
################################################################
# Eighth,
1 Type of distribution to fit (1=ExGaus, 2=Weibull, 3=LogNormal, 4=Gumbel,5=Wald)
# 0=none, remaining of this file is ignored.
#
################################################################
# Ninth
2 Fit to 1=raw data, 2=quantile, following lines ignored if 1 on this line.
# For next parameter line, 0=>precalculated quantiles, 1=>maximum number of quantiles,
# 2=> fixed number of evenly spaced quantiles, 3=>specify p values, one per line, start with 0 end with 1
3 #This is potentially the most confusing part. This sample uses user specified p-values instead
0 # of even quantiles. Change the ‘3’ to ‘2’, and then enter the maximum quantile (calculated
0.2 # earlier) in the line below it. The other lines (0.2, 0.4 etc.) can be deleted.
0.4
0.6
0.8
1
Output files
Run the QMPE program by specifying the instruction file, and two output files will be generated, with extensions .oe and .par. The program will automatically exit after it finishes the procedure, but also when it encounters an error, so always check that the number of rows in the .par file matches the number of indices.

The .oe file contains the observed and expected values (which can be plotted to visualise model fit), while the .par file contains the parameter estimates. Estimates for Mu, Sigma, and Tau are in columns 7, 8, and 9 respectively of the .par file, and should be extracted to your preferred statistical analysis program. Column 5 is also important to take note of, as it contains the exit codes, which can highlight a potential problem with fitting that specific parameter. Typically anything <32 is not an issue, while parameters with exit code 32 and above are removed from analysis (those parameters also typically have values that are very different from the others). Again, consult the manual if you are unsure. To try not to lose such data you can re-run the analysis with adjustments to the starting values and maximum iterations specified in the instruction file, if you know what you're doing, but typically us mere mortals just stick with the default numbers and remove those parameters.
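If you prefer to script the extraction, here is a rough sketch, assuming the .par file is whitespace-delimited with the column positions described above (I have not verified the exact layout of every QMPE version, so check against a real .par file first):

```python
def read_par(path):
    # Pull Mu, Sigma, Tau (columns 7-9) from a .par file, dropping any
    # row whose exit code (column 5) is 32 or above. Column numbers are
    # 1-based as in the text, so the Python indices are 4 and 6:9.
    estimates = []
    with open(path) as f:
        for line in f:
            cols = line.split()
            try:
                exit_code = int(float(cols[4]))
                mu, sigma, tau = (float(c) for c in cols[6:9])
            except (IndexError, ValueError):
                continue  # skip blank lines or any header/text lines
            if exit_code >= 32:
                continue  # questionable fit, exclude from analysis
            estimates.append((mu, sigma, tau))
    return estimates
```

The returned list can then be written out in whatever shape your statistics package expects.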
If you are unsure how to work with tab-delimited files, an easy way is to open the file in Excel, where all the data will be in one column, and use Data > Text to Columns to separate them out into individual columns. Then copy/paste the desired columns into a new sheet and format it for analysis, e.g. SPSS requires a ‘wide format’ where each participant takes up one row, which is a simple copy/paste job if the index was organised by condition first then participant, as recommended earlier (if not, you will have to use a few more functions and/or pivot tables).
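For those who prefer code to Excel, a minimal sketch of that long-to-wide reshape, assuming a hypothetical index scheme where the condition number is followed by a two-digit participant number, and parameter rows appear in condition-major order:

```python
def long_to_wide(indices, values):
    # indices and values are parallel lists, sorted by condition first
    # and then participant (condition-major order).
    wide = {}
    for idx, val in zip(indices, values):
        participant = idx % 100  # last two digits = participant number
        # each participant accumulates one value per condition, in order
        wide.setdefault(participant, []).append(val)
    return wide
```

Each participant then maps to a list of values, one per condition, i.e. one SPSS-style row per participant.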
The parameters can then be analysed as you would any typical data.