Sea Hero Quest (SHQ) is a game developed by researchers at University College London (UCL) for mass distribution, intended to enable behavioral screening of wayfinding and path integration at scale.
We have chosen to adopt this for our first study with the epileptic patients at IUPUI.
Materials related to this project are organized as follows:
newmanmemorylab@gmail.com google drive /Projects/SeaHeroQuest
Dropbox (NewmanLab)/scripts/projects/Sea Hero Data
Password: SeaHeroQuest#1324
The data is HIPAA protected. This means several things:
It must be de-identified before downloading from clinical servers
Downloaded data should be on lab workstations, lab hard drives, or lab-managed HPC data partitions only. No data should ever be on personal computers.
People must be on Dr. Gupta's IRB before being granted access to the data. This involves:
Having an up-to-date CoI form: https://research.iu.edu/compliance/conflict-interest/disclosure.html
Having completed the IRB training: https://research.iu.edu/training/required/human-subjects.html
Below are notes collected from discussions with various relevant people:
People local to the project
Hospital side
Dr. Kunal Gupta M.D. - Neurosurgeon who implants patients
Hill, Haley M <hhill5@iuhealth.org>
Role: Administrative lead of the Epilepsy Monitoring Unit
Holloway, Matthew A <mhollowa@iuhealth.org>
Title: Sr. Applications Support Analyst - Neurophysiology Indiana University Health at Riley Hospital for Children
We spoke to him about plugging our game into the acquisition system
He was not particularly helpful when it came time to get this set up (and was fairly short with us)
Below are non-local people who also use Sea Hero Quest
Here are people cited by Antoine at the end of his 3/12/21 presentation
Hugo Spiers - One of the PIs leading the development of SHQ
Antoine Coutrot - Data scientist and first author of Global Determinants of Navigation Ability
Michael Hornberger - PI of the SHQ project?!? (guessing based on how Hugo loops him in to all major decisions), currently doing work with SHQ and AD
Screen grab of papers Antoine cited in his 3/12/21 presentation
Coutrot, Antoine and Manley, Ed and Yesiltepe, Demet and Dalton, Ruth C and Wiener, Jan and Hölscher, Christoph and Hornberger, Michael and Spiers, Hugo. Cities have a negative impact on navigation ability: evidence from 38 countries, bioRxiv, 2020.
Coutrot, Antoine and Schmidt, Sophie and Coutrot, Lena and Pittman, Jessica and Hong, Lynne and Wiener, Jan and Hölscher, Christoph and Dalton, Ruth C and Hornberger, Michael and Spiers, Hugo. Virtual navigation tested on a mobile app is predictive of real-world navigation performance, PLoS ONE, Vol. 14, No. 3, e0213272, 2019.
Coutrot, Antoine and Silva, Ricardo and Manley, Ed and de Cothi, Will and Sami, Saber and Bohbot, Véronique and Wiener, Jan and Hölscher, Christoph and Dalton, Ruth C and Hornberger, Michael and Spiers, Hugo. Global Determinants of Navigation Ability, Current Biology, Vol. 28, No. 17, pp. 2861-2866, 2018.
Jan 14, 2021 - Ehren decided that Sea Hero Quest was the best choice of task to use with patients at the hospital
Reasons included:
Appetitive task. Kahana's advice had been to identify a game-like task that patients would want to play rather than be exhausted by. Sea Hero Quest was designed to be fun and engaging.
Relevant. Our lab studies hippocampal dependent function including spatial navigation. SHQ was designed to measure two forms of spatial navigation: wayfinding and path integration.
Rapidly deployable. The first patients were being implanted within a few months of the initiation of our collaboration with Dr. Kunal Gupta. We needed a task that would be sufficiently polished and pilot tested in ~2 months. This game was already built and tested. We do not assume that a better task could not be found or developed given time, but SHQ was ready for immediate use.
Known entity. In the spirit of not re-inventing the wheel, the fact that SHQ had been used extensively with human subjects previously and had a track record in 'the literature' was appealing and will hopefully save us headaches later in convincing reviewers that it is a valid tool.
Enormous benchmark data set available. Unlike a novel task, the fact that ~4 million people have played SHQ worldwide provides solid contextual background to interpret data we collect in our patients.
Jan 19, 2021 - First subject implanted (code named AA-01 or 210119) and Ningyao, Anna, and Austen helped to collect data over four days (1/22, 1/23, 1/25, 1/27)
Lessons learned included
avoid running the subject immediately after 'stim testing' - performance was terrible and the patient struggled to focus.
avoid running the subject the day after the implantation - they are still groggy and recovering from the surgery
Feb 2, 2021 - Second subject implanted (code named AA-02 or 210202). Data was collected over two days (2/5 & 2/6)
Feb 16, 2021 - Third subject implanted (code named AA-03 or 210216). Data was collected over three days (2/17, 2/19, 2/23)
Mar 12, 2021 - Antoine Coutrot, first author on several prior SHQ publications, gave a Zoom presentation about the work that has been done with SHQ
A recording of the second two-thirds of the talk is available here. Not recorded was his description of how useful video-game-based behavioral assessment can be for getting a broader cross section of data than standard laboratory-based data collection. He also explained that his video game is a navigation-based task called Sea Hero Quest in which players navigate a boat across a map to a series of checkpoints. The recording covers all of the findings that have emerged from the use of this game thus far (and all that could still be done!).
Notes Ehren took during the talk:
Pitched SHQ as solution to WEIRD problem in science - studies focus on people from Western, Educated, Industrialized, Rich, Democratic populations (Henrich 2010)
SHQ collected data from all countries in the world, 55:45% male:female ratio, from 6/16-3/19
Major published results included:
performance declines with age (until mid-60s, but above that may be sampling error), (Coutrot et al., Current Bio, 2018)
performance is worse for females but this effect is proportional to the gender equality index from the country of the player (Coutrot et al., Current Bio, 2018)
people from cities perform worse especially if those cities have very low entropy in the layout (a grid layout is low entropy compared to non-grid layout), (Coutrot et al, BioRxiv 2020)
performance in SHQ is more discriminative of APOE gene status than declarative memory assessment (Coughlan et al PNAS, 2019)
Effect was even clearer when data was normalized by global demographics
Ongoing work includes
Education and country significantly mediate the predictive relationship of age and gender with wayfinding ability (Coutrot et al., in prep).
Gave example of 'causal evidence' comparing the change in navigation ability for UK residents affected and not affected by a new law requiring one extra year of education - one additional year of education increased navigation performance.
Analyses of trajectory information rather than simply 'time to complete a level'
Work of Coutrot graduate student Hippolyte Dubois
Performing trajectory clustering, graph signal processing
Topics discussed with him after the talk
"Should we normalize performance by levels 1 & 2?"
Yes. Do it both ways. See what the effect is.
"How to choose a level to use with the patients?"
Possible to score level difficulty by dividing the median path length by the minimum path length. Rank all levels by this.
"Are some levels more predictive of performance on other levels?"
We've looked at this, I can send you the results
"What advice do you have for us looking at what are effectively 'case studies' given our low n?"
Plot and study every trajectory, compare them to the mean over other subjects from the same demographic. Make a movie plotting their movements versus everyone else's. Look for systematic differences in how they perform (e.g., stay closer to walls, turn around more often, etc.)
"What other work is going on using SHQ with AD patients?"
Michael Hornberger is doing more work.
Antoine would like to build new ties to AD populations to use SHQ with them
I indicated that I would like to do the same at IADC and would include him if I could get a foot in the door there
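The level-difficulty score Antoine suggested (median path length divided by minimum path length) could be computed along these lines. This is only a sketch: it assumes that "minimum path length" means the shortest path observed among players on that level (a shortest possible path measured from the level map could be substituted), and the function name and numbers are invented for illustration.

```python
from statistics import median


def rank_levels_by_difficulty(path_lengths_by_level):
    """Return (level, score) pairs sorted from hardest to easiest.

    score = median observed path length / minimum observed path length.
    Using the observed minimum is an assumption; the notes do not say
    whether the minimum is observed or derived from the map.
    """
    scores = {
        level: median(lengths) / min(lengths)
        for level, lengths in path_lengths_by_level.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up path lengths for two levels, for illustration only.
observed = {6: [30.0, 36.0, 45.0], 11: [80.0, 120.0, 200.0]}
ranking = rank_levels_by_difficulty(observed)
print(ranking)
```

A score near 1 means the typical player stays close to the shortest path; larger scores mean the typical player wanders further from it.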
Information dumps from Hugo
from 1/25/21:
"here’s a link to the paper we’ve been working on that examines the graph theoretic aspects of different levels. It needs significant revision, but the core idea is there.
"https://drive.google.com/drive/folders/1sNU5sI9BrHTkdlNcbDlNAW2GR4wjJjks?usp=sharing"
** Note from Ehren: We have saved a copy of this folder to the memlab Google Drive **
** most helpful, it includes an analysis of which levels are most difficult and it includes overhead maps of all levels **
Information Dumps from Antoine
from 1/25/21:
Regarding existence of documentation regarding the formatting of the SHQ log files:
"I don’t have a formal documentation but the log structure is simple. Each json file corresponds to 1 attempt of 1 level. It contains 3 fields:
- meta, with the number of the attempt and of the level, the duration (in sec) of the participant attempt to complete the level, and the map view duration (in sec) prior the start of the navigation. For path integration levels (levels 4, 9, 14, 19…) the map view duration is replaced by the flare accuracy. Flare accuracy = 3 corresponds to the correct direction, flare accuracy = 1 or 2 to the incorrect directions.
- events, which you can ignore,
- players, which contains the coordinates of the trajectory sampled at 2 Hz.
"I can send you Matlab scripts to extract the trajectory lengths, duration, and flare accuracies from the log files, or you can create your own scripts with your favorite software. Let me know what you prefer. I can also send you the global dataset. It’s quite a heavy file, so it can help if we narrow it down to the demographics you’ll need to compare your patients with. "
from 1/27/21:
"The events are the landmarks that can be seen from the current boat coordinates. The problem is that we don’t have a dictionary linking the landmark numbers to actual landmark shape and position, so it is not interpretable. This comes from a misunderstanding between the game developers and our team. Technically we could rebuild this dictionary by hand identifying the landmarks one by one, but we never have had time for this…
"Here is a link to the scripts I use to extract the trajectory length, duration, flare accuracy…: https://www.dropbox.com/s/00xrcnzjzjuqj5w/process_shq_data.zip?dl=0
The main script is process_data.m, the data should be in the data folder (1 folder per participant). The map folder contains the sketch maps of each level.
Don’t hesitate to ask if anything is unclear.
"The processed data file (i.e. with only the distance / duration / flare accuracy) is about 4 GB. The raw data with all the trajectory coordinates is about 1 TB and quite complicated to process.
"For the comparison, percentile rank of the path length totally makes sense. We usually normalize the path length of the wayfinding levels by the path length of the tutorial levels (levels 1 and 2) to account for video gaming skills."
** Note from Ehren: I downloaded and saved the scripts from the above link to Dropbox. They are currently here - /Users/ehren/Dropbox (NewmanLab)/docs (1)/docs_Ningyao/Sea Hero Data/scripts **
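For anyone who prefers Python to the Matlab scripts, the following is a minimal sketch of reading one attempt log with the structure described in the 1/25/21 message (one JSON file per attempt, trajectory sampled at 2 Hz). It is not a port of process_data.m. The key names are assumptions: the notes refer to the trajectory field as both 'players' and 'player', so the key is passed as a parameter, and the coordinate keys "x" and "y" are hypothetical. Path length is taken as the summed straight-line distance between consecutive samples, which may differ from how the Matlab scripts compute it.

```python
import json
import math

SAMPLE_RATE_HZ = 2.0  # trajectory sampling rate stated in the 1/25/21 message


def summarize_attempt(log, samples_key="player", x_key="x", y_key="y"):
    """Summarize one attempt log that has already been parsed from JSON.

    samples_key, x_key and y_key are assumed names; check them against a
    real log file before relying on this.
    """
    points = [(sample[x_key], sample[y_key]) for sample in log[samples_key]]
    # Sum of Euclidean distances between consecutive trajectory samples.
    path_length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return {
        "n_samples": len(points),
        "path_length": path_length,
        # Sample-based duration: number of samples divided by sampling rate.
        "duration_s": len(points) / SAMPLE_RATE_HZ,
    }


# Tiny invented log, for illustration only.
example = json.loads(
    '{"meta": {}, "events": [], "player": ['
    '{"x": 0, "y": 0, "r": 0}, {"x": 3, "y": 4, "r": 1}, {"x": 3, "y": 10, "r": 4}]}'
)
summary = summarize_attempt(example)
print(summary)
```

For a real file, the log would be loaded with json.load() on the open file and passed to summarize_attempt() in the same way.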
From 1/29/21:
"Hi Ehren,
See answers below.
On Jan 27, 2021, at 17:19, Ehren Newman <enewman@gmail.com> wrote:
Thanks for the code Antoine. I gave it a run and it was perfect and clear. I have a few additional questions. I'm sorry to be any burden!
The extracted duration aligns better to the number of samples in the 'player' section of the log than the 'duration' in the 'meta' section. For example, in the attached log for level 4, meta/duration=15.9, player has 56 samples, and the extracted Metrics.duration = 28.5. Is there a reason that meta/duration is off?
Interesting! In the Metrics table, the duration is computed from the number of samples. Since the sampling frequency is 2 Hz and there are 57 samples, Metrics.duration = 28.5 s
I’ve never seen such a discrepancy between the meta duration and the sample-based duration. The game dev team coded the meta information, I guess there is a bug, maybe only for the flare levels, which would explain why I never noticed it before?
Anyway, you should stick with the sample-based duration stored in the Metrics table, since we know how it’s calculated.
For the full database of processed data, is that the 'Metrics' table for all data? If so, that would be great to get a copy of. Can you include demographic information? I don't think we need the full trajectory data at this point.
Yes, more or less.
all_levels.csv contains the trajectory length for all levels (except levels 5, 10, 15… where you have to take a picture of a monster).
all_levels_dur.csv contains the duration for the wayfinding levels (levels 1,2,3,6, 7,8,11,12,13,…), and the flare accuracy for the path integration levels (levels 4, 9, 14, 19…).
SHQ_simpledemographics.csv contains the age, gender, country, handedness, level of education of the participants.
Concretely, how do you normalize by levels 1 and 2? Is it: pathlength_norm_lvl_x = pathlength_lvl_x / (pathlength_lvl_1 + pathlength_lvl_2) for every player separately?
This is correct.
Then the percentile ranking for a given individual is it pct_rank = mean(pathlength_norm_lvl_x_player_y / [pathlength_norm_lvl_x]) where [pathlength_norm_lvl_x] is the normalized path lengths over all players?
"I would use pct_rank = (CF-1)/N, where CF is the count of all scores less than or equal to pathlength_norm_lvl_x_player_y and N is the total number of scores.
That's too bad about the events not being so usable. Labeling these may be the kind of thing I could have a student do. Even if we can't label all of them, it may be possible to ID key landmarks by playing the game with this goal in mind and then examining the log files. This would take time but would be doable if we know what points are possible landmarks.
"Yes, this is what I was considering as well. Poor student ^^ I’m not even sure whether all the daymarks are coded in this landmark dictionary or not. But it’s worth trying and see if it makes sense.
Last question - I assume that 'r' in 'player' is an indication of rotation but what are the units?
"Correct. It parses the full circle in 16 bins, so r=0 means between 0 and 22.5°, r = 1 means between 22.5° and 45° and so on. Don’t ask me why it’s coded like that, it’s bad. If you need more precise number you can also compute them from the trajectory coordinates.
Finally, I should say that Hugo mentioned that you've done some cool work parsing the trajectories to compare when / where people go astray. At some point after this project has advanced further (we have more data at the least), it would be great to chat with you more about this. I love the idea that your approach could improve our estimate of how oriented the patient is at every point in time - a value we could compare to their ongoing brain activity. If you're interested to share what you've been working on in this regard, I would be glad if you would present to my group at a lab meeting eventually.
"I think Szymon Walkowiak, one of Hugo's PhD students is precisely working on that matter. I work with a PhD student who uses machine learning to extract rich representations from the trajectories (beyond the simple length or duration), and tries to link these representations to the demographics. I’m sure he would be glad to present his work if you are interested. Or I’d be happy to present a panorama of the SHQ-related studies published so far :)
"Don’t hesitate if you have further questions!"
** Note from Ehren: I've download
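The normalization, percentile-rank, and rotation conventions from the 1/29/21 exchange can be written out as below. This is a sketch rather than Antoine's code: the function names are invented, and it assumes the patient's own normalized score is included in the list of scores being ranked against (which is how the "CF - 1" term is read here).

```python
def normalize_path_length(length_level_x, length_level_1, length_level_2):
    """Path length on level x divided by the summed tutorial path lengths."""
    return length_level_x / (length_level_1 + length_level_2)


def percentile_rank(score, all_scores):
    """pct_rank = (CF - 1) / N.

    CF is the count of scores less than or equal to `score`;
    N is the total number of scores.
    """
    cf = sum(1 for other in all_scores if other <= score)
    return (cf - 1) / len(all_scores)


def heading_bin_to_degrees(r, n_bins=16):
    """Return the (lower, upper) heading range in degrees for rotation bin r."""
    width = 360.0 / n_bins
    return (r * width, (r + 1) * width)


# Invented numbers, for illustration only.
patient_norm = normalize_path_length(150.0, 40.0, 60.0)
benchmark_norms = [0.8, 1.0, 1.2, 1.5]  # includes the patient's own 1.5
patient_pct = percentile_rank(patient_norm, benchmark_norms)
print(patient_norm, patient_pct, heading_bin_to_degrees(1))
```

Because the score is a path length, a lower percentile rank corresponds to a shorter (normalized) path than most of the comparison group.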
From 2/2/21:
"Is there any documentation on what the possible landmarks are?
"Good question. Not sure, but if someone knows, it would be Demet (demet.yesiltepe@northumbria.ac.uk). She is a PhD student at Northumbria University (UK) and works on the effect of space syntax on navigation. She played a bit with SHQ landmarks, see this paper: https://www.tandfonline.com/doi/abs/10.1080/13875868.2020.1830993
"Would we be trying to accomplish too much if we have a single 60 or 90 min meeting aiming to cover both the panorama and the generation of rich representations?
"Not necessarily since the generation of rich representations is still work in progress :)
Anyway, I’d be happy to present a panorama and we can develop the angles that most interest you in the discussion."
I was tasked by Professor Newman with finding a good way to read this large dataset and look for patterns in the data.
The following documentation goes along with the .ipynb file linked on this wiki page. The file is also on the memlab Google Drive in SeaHeroQuest_frommax under SeaHero.ipynb.
The scores for 3.8 million participants are in the CSV file all_levels.csv, stored with the rest of the SHQ data on Dropbox.
The first goal was to find a way to read such a large file and access all of its rows in a timely manner. To do this I used a Python library called Dask.
Dask builds on the pandas library: it splits a large dataset into partitions that are handled as pandas DataFrames and evaluates operations lazily, so a file too large to hold in memory can still be navigated and computed on in Python.
Dask DataFrames work much like pandas DataFrames because the Dask API deliberately mirrors the pandas one.
dd.read_csv() (with dask.dataframe imported as dd, the usual convention) takes the CSV file to read as a parameter and loads the dataset into a variable of type Dask DataFrame
Computational functions for the mean, standard deviation, and other mathematical and statistical formulas are available under the same names as in pandas
.compute() runs the pending Dask computation and returns the result as a pandas object (a Dask DataFrame becomes a pandas DataFrame); call it after setting up the large computations with Dask
For more specific documentation on using Dask DataFrames, visit https://docs.dask.org/en/latest/dataframe-api.html
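A minimal sketch of the read-then-reduce pattern described above. It is not the notebook's code: the function name is invented, and the column names in the usage comment are hypothetical because the layout of all_levels.csv is not documented here.

```python
def level_summary(csv_path, columns):
    """Return (means, stds) for the given columns as pandas Series."""
    # Imported inside the function so this file still loads on machines
    # where Dask is not installed.
    import dask
    import dask.dataframe as dd

    ddf = dd.read_csv(csv_path)      # lazy: the file is not fully read yet
    means = ddf[columns].mean()      # lazy reduction
    stds = ddf[columns].std()        # lazy reduction
    # Evaluate both reductions together; the results are pandas Series.
    means, stds = dask.compute(means, stds)
    return means, stds


# Hypothetical usage (column names are assumptions):
# means, stds = level_summary("all_levels.csv", ["level_1", "level_2"])
```

Computing both reductions in a single dask.compute() call lets Dask share the work of reading the file rather than reading it once per statistic.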
After successfully reading the data in a way that allowed it to be manipulated, I combined the all_levels.csv dataset with the demographic dataset.
Using the entire dataset, I computed Pearson's correlations between levels. Several levels were only weakly correlated (the maximum being about 0.5), and the middle levels (levels 40-60) had the largest number of correlated levels. I figured that the low correlation could be due to skewed data, so I excluded anybody who had not completed at least level 48 and anybody who reported an age of either 18 or 99. This reduced the dataset from 3.8 million rows to only 78,000 rows.
I recalculated the Pearson's correlations using the restricted data. While the correlations did seem to increase, there was still no clear correlation between any two levels. However, the filtered data show that levels 52 & 53 have the strongest correlation with each other, and that levels 43, 52, 53, 58 & 71 have the most overall correlation with other levels, with correlations around 0.3.
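The filtering and correlation steps described above can be sketched in pandas as follows. This is an illustration under stated assumptions, not the notebook's code: it assumes both tables are indexed by a participant ID, that levels a participant never played are stored as missing values, and that age is in a column named "age". All column names and numbers are invented.

```python
import pandas as pd


def filter_and_correlate(scores, demographics, level_columns,
                         required_level, excluded_ages=(18, 99)):
    """Filter participants, then return (kept_rows, Pearson correlation matrix).

    Assumes unplayed levels are NaN and that both tables share an index of
    participant IDs (hypothetical layout).
    """
    merged = scores.join(demographics, how="inner")
    reached_level = merged[required_level].notna()
    plausible_age = ~merged["age"].isin(excluded_ages)
    kept = merged.loc[reached_level & plausible_age, level_columns]
    # DataFrame.corr uses pairwise-complete observations for each column pair.
    return kept, kept.corr(method="pearson")


# Toy data, invented for illustration only.
ids = [101, 102, 103, 104, 105]
scores = pd.DataFrame(
    {"level_43": [1.0, 2.0, 3.0, 4.0, 5.0],
     "level_48": [2.0, 4.0, 6.0, 8.0, None],
     "level_52": [5.0, 4.0, 3.0, 2.0, 1.0]},
    index=ids,
)
demographics = pd.DataFrame({"age": [25, 30, 18, 40, 50]}, index=ids)
kept, corr = filter_and_correlate(
    scores, demographics, ["level_43", "level_48", "level_52"],
    required_level="level_48",
)
print(len(kept))
print(corr)
```

In the toy data, participant 105 is dropped for not reaching level 48 and participant 103 is dropped for reporting an age of 18, leaving three rows for the correlation.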
So what could make those levels correlate?
Possibilities
All are type: checkpoint
All are Difficulty: Hard, only 10 levels out of 74 are labeled 'Hard'
All are aligned with the path
4/5 are Landmarks: Easy, with one level (53) being Landmarks: None
4/5 are Global: No, with one level (53) being Global: Yes
At this point we wanted to see reasons why there may not be any correlation. The graph above is a plot of each participant's distance traveled for level 53. Notice the exponential increase to over 8000, when the mean is under 1000. This graph looks the same for every level, and I thought some of the low correlation between levels could be due to these high numbers skewing the correlation. A good way to see whether the data are skewed is to compute z-scores using the standard deviation and mean for each level and to discretize the scores into categories. Doing this also tells us about the variability of each level.
Z-table graph for scores on Level 1:
low variability
Z-table graph for scores on Level 41:
more variability
looks more like standard normal distribution
The attached file all_zscores.csv contains the raw data for the graphs shown above and all other levels.
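The z-score discretization described above can be illustrated with a short sketch. The choices here are assumptions, not necessarily what was used to produce all_zscores.csv: the population standard deviation is used, and scores are grouped into bins one standard deviation wide.

```python
from collections import Counter
from math import floor
from statistics import mean, pstdev


def zscore_histogram(values):
    """Return (z_scores, counts), where counts[k] covers k <= z < k + 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    z_scores = [(value - mu) / sigma for value in values]
    counts = Counter(floor(z) for z in z_scores)
    return z_scores, dict(sorted(counts.items()))


# Invented path lengths for one level, for illustration only.
lengths = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores, counts = zscore_histogram(lengths)
print(counts)
```

A long right tail of very large distances shows up as a few occupied bins at high positive z with nothing to balance them at the low end, which is one simple way to see the skew.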
One thing that was discussed but could not be completed in the available time is looking at the z-scores for all levels based on quintile splits of levels 1 & 6. The hope is that this analysis will reveal which levels become more difficult across quintiles based on performance on earlier levels. This would be useful for calibrating which levels to have patients at the hospital play based on benchmark level performance.
Non-EEG Data Collected from Patients and Other Related Clinical Questions
Taken from an e-mail conversation in which Vibin asked Dr. Gupta questions. Vibin's questions and Dr. Gupta's responses are shown below.