Full command log:
Full command record
Log into the server
ssh cinliu@137.110.172.99
cd /space/chen-syn01/1/data/cinliu/data/fIDlist/
cat idlist.txt
cd /space/gwas-syn1/1/data/GWAS/UKBioBank/phenotypes/Baskets/
ls –l
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb43789.csv > /space/chen-syn01/1/data/cinliu/data/fIDlist/list43789.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb42438.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list42438.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb42012.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list42012.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) c{print fn" = "$fn;}; exit; }' ukb41296.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list41296.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb40545.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list40545.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb40544.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list40544.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb40543.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list40543.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb40542.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list40542.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb40541.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list40541.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb40539.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list40539.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb37384.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list37384.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb37115.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list37115.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb37113.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list37113.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb37112.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list37112.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb37111.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list37111.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb37110.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list37110.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb37109.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list37109.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb37108.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list37108.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb37107.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list37107.csv
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukb32537.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/list32537.csv
logout
scp cinliu@137.110.172.99:/space/chen-syn01/1/data/cinliu/data/fIDlist/idlist.txt /Users/niniliu/Desktop/2020lab/hao/DataUKB
scp cinliu@137.110.172.99:/space/chen-syn01/1/data/cinliu/data/fIDlist/*.csv /Users/niniliu/Desktop/2020lab/hao/DataUKB
grep -f idlist.txt list43789.csv
grep -f idlist.txt list42438.csv
grep -f idlist.txt list42012.csv
grep -f idlist.txt list41296.csv
grep -f idlist.txt list40545.csv
grep -f idlist.txt list40544.csv
grep -f idlist.txt list40543.csv
grep -f idlist.txt list40542.csv
grep -f idlist.txt list40541.csv
grep -f idlist.txt list40539.csv
grep -f idlist.txt list37384.csv
grep -f idlist.txt list37115.csv
grep -f idlist.txt list37113.csv
grep -f idlist.txt list37112.csv
grep -f idlist.txt list37111.csv
grep -f idlist.txt list37110.csv
grep -f idlist.txt list37109.csv
grep -f idlist.txt list37108.csv
grep -f idlist.txt list37107.csv
grep -f idlist.txt list32537.csv
ssh cinliu@137.110.172.99
cd /space/gwas-syn1/1/data/GWAS/UKBioBank/phenotypes/Baskets/
awk -F "," '{print$1,$367,$368,$369,$389,$390,$391,$392,$393,$394,$395,$396,$397,$398,$399,$400,$401,$402,$403,$404,$405,$406,$407,$408,$409,$410,$411,$412}' ukb40545.csv > /space/chen-syn01/1/data/cinliu/data/UKBcsv/40545.csv
awk -F "," '{print$1,$958,$959,$960,$961,$1323,$1324,$1325,$1326,$14462,$14463,$14464,$14465,$14466,$9471,$9472,$9473,$9474,$11060,$11061,$11062,$11063,$9616,$1339,$1340,$1341,$1342,$1132,$1133,$1134,$1135,$8779,$8780,$8781,$8782,$8631,$8632,$8633,$8634,$629,$630,$631,$632,$878,$879,$880,$881,$6414,$6415,$15512,$15513,$15514,$15515,$15516,$15507,$15508,$15509,$15510,$15511,$15502,$15503,15504,$15505,$15506,$1020,$593,$610}' ukb40539.csv > /space/chen-syn01/1/data/cinliu/data/UKBcsv/40539.csv
awk -F "," '{print$1,$4500}' ukb37110.csv > /space/chen-syn01/1/data/cinliu/data/UKBcsv/37110.csv
scp cinliu@137.110.172.99:/space/chen-syn01/1/data/cinliu/data/UKBcsv/*.csv /Users/niniliu/Desktop/2020lab/hao/DataUKB/
Page Break
EXPLAINATION
This is a step by step explanation of how I extracted the UKBio bank data.
(I really did it half manually so apologies in advance if that causes any inconvenience.)
PART1: locating the data
cd /space/chen-syn01/1/data/cinliu/data/fIDlist/
cat idlist.txt
Copy and pastes the ‘Data-field ID number’ from the given factors into a .txt file called “idlist.txt”
Go to /space/gwas-syn1/1/data/GWAS/UKBioBank/phenotypes/Baskets
Using the ls –l command view the timestamp of the files
Using the following command I was able to view just the first row with their collumns number printed:
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' filename.csv
So the displayed output is everything in the first row only, the number infront of the equal sign prints out the column number.
Using the command:
awk 'BEGIN{ FS="," }{ for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' ukbxxxx.csv> /space/chen-syn01/1/data/cinliu/data/fIDlist/listxxxx.csv
I saved the first the most recent 20 .csv’s first row inofmation with the columns number into a separate list to space/chen-syn01/1/data/cinliu/data/fIDlist/
Now using grep -f idlist.txt listxxxx.csv look through the lists manually and record the location (column number and file number) in an excel
(this step takes a while because it’s just manual)
But basically what returns for each command is something that looks like this:
I just manually look throught the field ID (right hand side of the equal sign) to see if there is any match, if there is a match with then I note it down the information in the excel: https://ucsdhs-my.sharepoint.com/:x:/g/personal/cil001_health_ucsd_edu/Ea3BxBdGOcdMnQl6UExyTSkBj2mVpOCNsEnIWpw5jo7Xhg?e=tCQ1mS
PART2: extracting the data
Once locations (ie column numbers) have been located I log back into our server and export the desired columns to separate csv
Scp the csv files and combine into one excel (using R, see ‘readUKB.R’)
Page Break
Background information
There are 2 main types of files I used (accessed from /space/gwas-syn1/1/data/GWAS/UKBioBank/phenotypes/Baskets):
The main data = ukbxxxx.csv
The coding key for the field = ukbxxx.html
The main data
The actual data of the .csv file will look something like this is opened in excel:
The meaning of the column headers:
F = field number
I = instance index - used to distinguish data for a field which was gathered at different times
A = array index - used to distinguish multiple pieces of data for that field which were gathered at the same time
Note: More information on data can be found on page 21-23 of Accessing_UKB_data_v2.3.pdf
After the data is extracted you can look in the .html files to see the details of how each item is coded
Click on the hyperlinks in the description to see the details of the incoding.