awk
Example 1
Print the third field of every line whose second field matches a keyword:
$ awk '$2=="keyA" || $2=="keyB" {print $3}' subject.txt > outfile.txt
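The field test above can be tried on a small made-up table. This is only a sketch: the file contents and the variable name "selected" are hypothetical and chosen for illustration.

```shell
# Build a three-line, space-delimited table in a temporary directory
# (contents are made up for this demonstration).
tmpdir=$(mktemp -d)
printf '%s\n' \
  'row1 keyA alpha' \
  'row2 keyC beta' \
  'row3 keyB gamma' > "$tmpdir/subject.txt"

# Keep lines whose 2nd field is keyA or keyB, and print their 3rd field.
selected=$(awk '$2=="keyA" || $2=="keyB" {print $3}' "$tmpdir/subject.txt")
printf '%s\n' "$selected"

rm -r "$tmpdir"
```

Only the rows carrying keyA or keyB contribute a value; the keyC row is skipped.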
Example 2
Use 'grep' and 'awk' to extract lines containing keywords.
$ grep "Aragorn:1.2" tig00000001.gff | awk '$7 == "-" {print $0}' > tig00000001_RNA_minus.txt
Example 3
Use with 'sort | uniq -c' to grasp NGS read variation in a fastq file (convenient for amplicon sequencing)
$awk '(NR%4==2){print $0}' subject.fastq | sort | uniq -c > output.txt
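A minimal sketch of the same read-variation count on a tiny FASTQ file. The four reads are invented for illustration, and the extra "sort -k1,1nr" step (most abundant sequence first) is an addition that is not part of the command above.

```shell
# Build a tiny FASTQ file (hypothetical reads) in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '@read1' 'ACGT' '+' 'IIII' \
  '@read2' 'ACGT' '+' 'IIII' \
  '@read3' 'TTGA' '+' 'IIII' \
  '@read4' 'ACGT' '+' 'IIII' > "$tmpdir/demo.fastq"

# Line 2 of every 4-line record is the sequence. Count identical sequences,
# then list the most abundant one first.
counts=$(awk 'NR % 4 == 2' "$tmpdir/demo.fastq" | sort | uniq -c | sort -k1,1nr)
printf '%s\n' "$counts"

rm -r "$tmpdir"
```

Each output line holds a count followed by the sequence, so the dominant amplicon variant appears at the top.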
BAPS
# Preparation of the BAPS input file.
1. Prepare a vcf file. Reorder the columns according to the source/country of the isolates.
2. Convert the modified vcf to BAPS format using PGDSpider2 (launch with: java -Xmx1024m -Xms512m -jar path/to/PGDSpider2.jar). MLST data can also be used as input after adding the isolate ID to the last column.
# Inferring genetic population structure using SNP, VNTR or MLST data:
1. Launch BAPS by typing ./run_baps6.sh /Applications/MATLAB/MATLAB_Compiler_Runtime/v713 in the terminal.
3. Start “clustering of individuals” (note that PGDSpider2 has already written the isolate number to the last column). Give several values for the maximum number of clusters, e.g. "5,8,10". This step requires the population names (= isolation source/country) and the first row number of each population.
4. Save the pre-processed data (1 file) and the mixture results (2 files).
5. Load the BAPS results (.mat) file for admixture analysis. Define the cut-off cluster size; “1” takes all clusters into account.
6. Save the two result files. The result file with probabilities can be used to generate a barplot after reformatting.
BEDTools
Example 1
Calculate read depth over the genome(s) excluding "0" coverage regions.
$ bedtools genomecov -ibam XXXXXX.bam -bg >out.txt
Example 2
Calculate read depth for each base position in the genome.
$ bedtools genomecov -ibam XXXXXX.bam -d >out.txt
bamToFastq (in BEDTools package)
Example 1
Generate a fastq file from a bam file.
$ bamToFastq -i XXXXXX.bam -fq XXXXXX.fastq
bc
Example 1
Use as a calculator in Linux/Unix. "quit" is typed inside bc, not at the shell prompt.
$ bc (Enter)
1001037168/4
250259292
quit
DBGWAS
Description: "It uses a compacted De Bruijn Graph (cDBG) structure to represent the variability within all bacterial genome assemblies given as input. Then cDBG nodes are tested for association with a phenotype of interest and the resulting associated nodes are then re-mapped on the cDBG."
# See the input file format at the following site.
https://gitlab.com/leoisl/dbgwas/-/tree/master/containers/docker#running-dbgwas-on-singularity
It runs on Singularity on a Linux cluster machine.
$ singularity run ../software/dbgwas-0.5.4.simg -strains ./dbgwas/sample_example/strains -newick ./dbgwas/sample_example/strains.newick -nc-db Resistance_DB_for_DBGWAS.fasta -pt-db uniprot_sprot_bacteria_for_DBGWAS.fasta
blast+
Example 0
First, create blast database.
$ makeblastdb -in multifasta_file -out database_name -dbtype nucl -parse_seqids
Example 1
Generate results in alignment format.
$ tblastn -query aa.fas -db database_prefix -max_hsps 2 >blast_aln.txt
Example 2
Generate results in table format.
$ tblastn -query aa.fas -db database_prefix -outfmt 6 -max_hsps 2 >blast_tbl.txt
legacy blast
Example 1
Generate results in alignment format:
$ blastall -p tblastn -i aa_fasta.fas -d /usr/local/db/blast/refseq/refseq-genomic-bacteria -v 1 -b 1 -o blast_test.txt
link to legacy blast option:
https://www.ncbi.nlm.nih.gov/Class/BLAST/blastallopts.txt
Bowtie
Example 1
Generate alignments in SAM format
Indexing first.
$ bowtie-build fasta_file prefix
# for single end reads
$ bowtie -S -m 1 -v 1 prefix_of_index XXX.fastq XXX.sam
Bowtie2
Example 1
Generate alignments in SAM format. This handles compressed (gz) files.
Indexing first.
$ bowtie2-build -f reference.fa prefix_of_index
# for paired_end reads
$ bowtie2 -x prefix_of_index -1 XXXX_1.fastq -2 XXXX_2.fastq -q -N 1 -S XXXX.sam
Breseq
Example 1.
Identifying structural variation in haploid microbial genomes from short-read resequencing data.
1. Installation on a Linux machine. Move to the source directory, then type:
$ ./configure --prefix=${PWD}
$ make
$ make test
$ make install
2. Usage 1. Standard mode.
$ breseq -n name_of_output -j 2 -r reference1.gbk reads1.fastq.gz reads2.fastq.gz
# -j : number of CPU you wish to use.
3. Usage 2. Polymorphism mode
$ breseq -n name_of_output -p -j 4 -r reference1.gbk reads1.fastq.gz reads2.fastq.gz reads3.fastq.gz
4. Usage 3. Retrieving coverage data
$ breseq BAM2COV -b ./data/reference.bam -f ./data/reference.fasta -o file_name_prefix <entry name here>:10000-15000 -t
# -t : tabular format option. It reports the coverage of uniquely mapped reads and ambiguously mapped reads separately.
5. Usage 4. Generating coverage plot
$ breseq BAM2COV -b ./data/reference.bam -f ./data/reference.fasta --format PNG -o file_name_prefix <entry name here>:10000-15000
BSMAP/sambamba/methratio.py
Example 1
Determine cytosine methylation level for each site in a genome.
Step 1. Map Bisulfite-treated paired-end reads to a reference genome.
# (-r 0) allows only uniquely mapped reads.
# output format can be selected from either ".bsp" ".sam" or ".bam" by specifying the output file name.
# Works in both Linux and MacOSX.
https://github.com/popucui/bsmap/issues/17
$ bsmap -a R1.fastq -b R2.fastq -d ref.fasta -o outfile.bam -p 2 -w 100 -r 0 -n 0
Step 2. Remove PCR duplicates using "sambamba"
# sambamba needs to be installed
$ sambamba markdup -r -t 2 input.bam output.bam
Step 3. Determine the C to C+T ratio using "methratio.py"
$ python methratio.py -o output.txt -d ref.fasta bsmap_output_file.bam
bwa mem
canu v 1.8
Example 1. Assembling long reads.
# shell script example.
#!/bin/sh
#$ -S /bin/sh
#$ -l short
#$ -l mem_req=8G
canu \
-p strain_A -d strain_A_results_dir \
genomeSize=5.4m \
-pacbio-raw ./strain_A_pac_fastq/*.fastq \
minReadLength=4000 \
gridOptions="-l mem_req=8G" \
gridEngineMemoryOption="-l mem_req=MEMORY" \
gridEngineThreadsOption="-pe def_slot THREADS" \
corMhapSensitivity=high \
corMinCoverage=0
# Four particularly important options.
minReadLength=4000 (or other values, default is 1000)
correctedErrorRate=0.040 (or higher or lower values)
corMhapSensitivity=high (or normal)
corMinCoverage=0 (or default)
cat
Example 1
Add text data to an existing file:
$ cat file_A >> file_B
chmod
Example 1
Allow the user (owner) to read, write and execute a file or directory.
$ chmod u+rwx directory_name or file_name
Example 2
Set permissions recursively for everything under the selected directory: the owner can read, write and execute; group and others can read and execute. (Use this when erasing a directory containing protected files.)
$ chmod -R 755 directory_name or file_name
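A short sketch showing the effect of "u+rwx" on a throwaway file. The file name script.sh and the starting mode 600 are assumptions made only for this demonstration.

```shell
# Create a throwaway script file in a temporary directory.
tmpdir=$(mktemp -d)
printf '#!/bin/sh\necho ok\n' > "$tmpdir/script.sh"

chmod 600 "$tmpdir/script.sh"      # start with owner read/write only
chmod u+rwx "$tmpdir/script.sh"    # add (keep) read, write, execute for the owner

# The first 10 characters of "ls -l" show the file type and permission bits.
perms=$(ls -l "$tmpdir/script.sh" | cut -c1-10)
printf '%s\n' "$perms"

rm -r "$tmpdir"
```

The symbolic form (u+rwx) only touches the owner bits, whereas a numeric mode such as 755 overwrites the bits for owner, group and others at once.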
Circlator
https://github.com/sanger-pathogens/circlator/wiki
Circos installation
1. First, install perl modules by repeating the following command. XXXXX is a module name.
$ sudo perl -MCPAN -e 'install XXXXX'
2. Download GD-2.50 separately then compile files, and install.
$ curl -O http://www.cpan.org/authors/id/L/LD/LDS/GD-2.50.tar.gz
$ tar xvfz GD-2.50.tar.gz
$ cd GD-2.50
$ perl Makefile.PL
$ sudo make install
3. Download Circos executable, then export path to .bash_profile.
To run circos, type:
$ circos -conf circos.conf
ClonalFrameML
Example 1
Infer core genome phylogeny after removing recombination tracts. Whole genome alignment and ML tree are required.
$ ClonalFrameML newick_file seq_file output_prefix [OPTIONS]
# remove quote of tip label in newick file.
# https://github.com/xavierdidelot/clonalframeml/wiki
# Rscript command can no longer be installed in MAC
COG
Example 1
Assign COG ID to amino acid sequence list in a multifasta file:
1. Generate a list of amino acid sequence in multifasta format.
2. Submit the multifasta file to the Web CD search tool (http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi). Select "Search only", then select COG. Wait for the email.
3. Go to the NCBI site by following the link in the email. Select 'Domain Hits', Data mode=concise, and then 'Download'. Download results in HTML format.
4. COG ID - function category mapper is available from here (ftp://ftp.ncbi.nih.gov/pub/wolf/COGs/COG0303/cogs.csv).
du
Example 1
Check directory size
$ du -sh directory_name
EMBOSS
Example 1. seqret
Extract one entry from multifasta file.
$ seqret multifasta_name:entry_name entry_name.fasta
Example 2.
Extract one entry from a multifasta file and generate its reverse-complementary sequence.
$ seqret -srev multifasta_name:entry_name entry_name.fasta
Example 3. extractseq
Extract specific regions from one entry sequence in a multifasta file and generate their concatenated sequence.
http://www.bioinformatics.nl/cgi-bin/emboss/help/extractseq
$ extractseq multifasta_name:entry_name output_name.fasta -regions "10-20 30-45 533-537"
Example 4. dotpath
Draw a dotplot.
$ dotpath ref.(multi)fasta query.(multi)fasta -graph ps -overlaps -word 20 -goutfile dotpath
Reverse the query sequence:
$ dotpath ref.(multi)fasta query.(multi)fasta -sreverse2 -graph ps -overlaps -word 20 -goutfile dotpath_r
Example 5. yank, union
Concatenate the sequences in a list.
First create a list file using yank:
http://www.bioinformatics.nl/cgi-bin/emboss/help/union
Then run the union command. @ is required.
$ union -sequence @entry.list -outseq test.fasta
FALCON
https://pb-falcon.readthedocs.io/en/latest/tutorial.html
fastGEAR (linux ver. needs MatLab)
https://users.ics.aalto.fi/pemartti/fastGEAR/fastGEAR_manual.pdf
fastME
Example 1
Construct distance-based phylogenetic tree.
# alignment needs to be in Phylip format.
# Example of DNA alignment (TN93+G model)
$ fastme -i input.phy -d TN93 -g -m NJ -b 100 -T 2
fineSTRUCTURE v2
Example 1
Estimate "fine" genetic population structure based on recombination events
$ fs project_folder_name -idfile XXX.txt -phasefiles YYY.phase -ploidy 1 -recombfiles ZZZ.recombfile
grep
Example 1
Extract lines containing a keyword:
$ grep -f keywordlist.txt subject.txt >outfile.txt
Example 2
Replace a phrase in a file (or files) by combining 'xargs', 'sed' commands.
# remove or modify values in .nwk file.
# -i '.bk' is the macOS (BSD) sed form; GNU sed on Linux expects -i.bk
$ grep -rl ')[0-9]\.[0-9]\{6\}' ./ampC_copy.tree | xargs sed -i '.bk' 's/)0\.[0-9]\{6\}\:/):/g'
$ rm *.bk
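A hedged sketch of the same substitution on a one-line Newick tree. The tree string and file names are made up; writing to a new file instead of editing in place is a deliberate change here, because it avoids the -i flag whose syntax differs between GNU and BSD sed.

```shell
# A made-up Newick tree with one 6-decimal support value after a ")".
tmpdir=$(mktemp -d)
printf '%s\n' '((A:0.1,B:0.2)0.950000:0.3,C:0.4);' > "$tmpdir/demo.nwk"

# Strip support values of the form ")0.dddddd:" and keep the branch length.
# Input and output are separate files, so no backup file has to be removed.
sed 's/)0\.[0-9]\{6\}:/):/g' "$tmpdir/demo.nwk" > "$tmpdir/demo_nosupport.nwk"
cleaned=$(cat "$tmpdir/demo_nosupport.nwk")
printf '%s\n' "$cleaned"

rm -r "$tmpdir"
```

Branch lengths such as ":0.3" are left untouched because the pattern only matches a value that directly follows a closing parenthesis.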
Gubbins installation on MAC
# first, install miniconda3, then
$ conda config --add channels r
$ conda config --add channels defaults
$ conda config --add channels conda-forge
$ conda config --add channels bioconda
$ conda install gubbins
Gubbins
Example 1
Generate ML tree of bacterial species based on core genome alignments after filtering recombination tracts.
$ run_gubbins.py [FASTA alignment]
# gubbins_drawer.py is no longer supported!!
# phandango for data visualization
https://jameshadfield.github.io/phandango/#/
GenomeMatcher, (MiGAP - discontinued)
Example 1
Annotate bacterial genome sequences, then extract qualifier values (COG classification, product name, gene name etc.) per feature in a single line in tab-delimited format.
1. Create a sequence file in fasta or multifasta format.
2. Submit the fasta(multifasta) file to MiGAP.
3. Download annotation in GenBank/DDBJ/EMBL format.
4. Open GenomeMatcher -> Accessories -> ExtractFromGenBankFile
then, load the GenBank file.
5. Select features of interest, then click on ‘execute extraction’
6. Copy the results and paste them on excel sheet.
Example 2
Visualize inversion in a bacterial chromosome using GenomeMatcher:
1. Load sequences to x and y axis in main window.
2. Click on ‘colorgram’. This runs blast program.
gzip, tar compression in mac
Example 1: tar
Compress large files, for example fastq files, to reduce file size
# file
$ tar -zcvf file_name.tar.gz file_name
# directory
$ tar -zcvf directory_name.tar.gz directory_name
# directory, writing the archive to the parent directory
$ tar czvf ../short_read_directory.tar.gz ./short_read_directory
Example 2 : tar
Decompress .tar.gz files
$ tar -zxvf file_name.tar.gz
Example 3: gzip
Compress a large file, for example fastq file, to reduce file size without leaving a copy of the original file.
$ gzip file_name.fastq
Example 4 : gzip
Compress a large file while keeping the original file
$ gzip -c file_name.fastq >file_name.fastq.gz
Example 5: gzip
Decompress .gz file(s) without keeping the compressed file.
$ gzip -d file_name.fastq.gz
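The difference between the gzip variants can be checked on a throwaway file. This is a sketch under assumed names: demo.fastq, demo_copy.fastq.gz and the two "files_after_..." variables are hypothetical.

```shell
# One-read FASTQ file in a temporary directory (contents are made up).
tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$tmpdir/demo.fastq"

# -c writes the compressed stream to stdout, so demo.fastq is kept.
gzip -c "$tmpdir/demo.fastq" > "$tmpdir/demo_copy.fastq.gz"

# Without -c, gzip replaces demo.fastq with demo.fastq.gz.
gzip "$tmpdir/demo.fastq"
files_after_compress=$(ls "$tmpdir" | sort | tr '\n' ' ')

# -d restores demo.fastq and removes demo.fastq.gz.
gzip -d "$tmpdir/demo.fastq.gz"
files_after_decompress=$(ls "$tmpdir" | sort | tr '\n' ' ')

echo "after gzip:    $files_after_compress"
echo "after gzip -d: $files_after_decompress"

rm -r "$tmpdir"
```

Use the -c form when the uncompressed file is still needed by a later step.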
HTseq-count
Example 1
Obtain mapped read count based on annotation in a .gff file in strand specific RNA-seq
$ htseq-count -f sam -s yes -t CDS -i locus_tag XXXX.sam YYY.gff >count.txt
# Illumina TruSeq
$ htseq-count -f sam -s reverse -t CDS -i locus_tag XXXX.sam YYY.gff >count.txt
Example 2.
Obtain mapped read count based on annotation in a .gff file in non-strand-specific RNA-seq
$ htseq-count -f bam -s no -t CDS -i locus_tag XXXX.bam YYY.gff >count.txt
Image J
Example 1
Create a stack image.
1. Click on "File" > "Open" (select my TIF file)
2. Click on "Image" > "Images to stack" (two images will be merged)
3. Click on "Image" > "Crop", if necessary
4. Click on "File" > "Save as" (enter new file name)
Example 2
Create a montage.
1. Click on "File" > "Open" (select a stack TIF file)
2. Click on "Image" > "Stack" > "Make Montage"
3. Move to tool bar. Click on ">>" > "Magic Montage"
4. Click on "Montage Shuffler Tool" in tool bar. Click on the montage image, then drag the slice.
5. Click on "File" > "Save as" (enter new file name)
Illustrator
Save an RGB image in TIFF.
Export -> format TIF -> Export -> Select the following:
Color mode: RGB
Resolution: 300 dpi
Anti-aliasing: Art-optimized
LZW compression: uncheck
Embed ICC profile: check
ls, find
Example 1
Check file sizes. Create a list of file names.
$ ls -s | sort -nr | more
# -size +20 selects files larger than 20 blocks of 512 bytes (about 10 KB)
$ find . -size +20 -print >list.txt
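A small sketch of how the "-size +20" test behaves. When no unit is given, find counts sizes in 512-byte blocks. The two file names and their sizes are invented, and "-type f" is added here only to keep the directory itself out of the result.

```shell
# Two throwaway files: one of 20,000 bytes and one of a few bytes.
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/large.dat" bs=1000 count=20 2>/dev/null
printf 'small\n' > "$tmpdir/small.dat"

# +20 means "more than 20 blocks of 512 bytes", so only large.dat qualifies.
big_files=$(cd "$tmpdir" && find . -type f -size +20 -print)
printf '%s\n' "$big_files"

rm -r "$tmpdir"
```

Redirecting the output to a file, as in the command above, gives a reusable list of the large files.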
MatLab compiler Runtime (MCR)
Example 1
Install to Linux machine:
$ ./install -mode silent -agreeToLicense yes -destinationFolder /sshare1/home/yano/XXXXXXX/XXXXXXXX(absolute path)
ModelGenerator
Example 1
Find optimal model for construction of maximum likelihood tree.
$ java -jar ../../software/modelgenerator_v_851/modelgenerator.jar gene(aa).aln 4 >gene_modelgen.txt
Mummer 3
http://mummer.sourceforge.net/manual/#snpdetection
Example 1
Draw a dotplot (mummer-3.23 x gnuplot 5.2 ).
$ mummer -mum -b -c ref.(multi)fas que.(multi)fas > out
$ mummerplot --postscript -p mapping out
$ gnuplot mapping.gp
Example 2
nucmer. This is useful for multifasta files. I encountered a problem when mummer was installed through conda:
"x27" in mummerplot needed to be replaced with " using an editor.
$GNUPLOT_EXE = "false" needed to be changed to "gnuplot".
$ nucmer -minmatch 60 ../../data/query.fasta ../../data/reference.fasta
$ mummerplot -x "[0,15000000]" -y "[0,15000000]" -postscript -p test out.delta
# "[ , ]" is coordinate range
KEGG
Example 1
Assign KO number to amino acid sequence list in a multifasta file (blastKOALA):
1. Generate a list of amino acid sequence in multifasta format.
2. Submit the multifasta file to KEGG mapper (http://www.kegg.jp/kegg/tool/annotate_sequence.html).
3. Get results in the KEGG website.
open
Example 1
Open multiple R console windows in mac.
$ open -n /Applications/R.app
OrderedPainting
Example 1
Find recombination hot regions in the aligned genomes. This uses chromoPainter.
# the program distinguishes between capital and small letters in the .hap file !!
# .cshrc
$ qsub -cwd -l s_vmem=8G,mem_req=8G -o test1.log -e test1.log <<< "/bin/bash ./orderedPainting.sh -g path/to/.hap -l path/to/strain_name.txt"
Parsnp/Gingr/harvesttools
Example 1
Align closely related bacterial genomes and detect SNPs in the aligned regions. Six files will be generated. The run (tree construction step) does not finish when there are too many unaligned regions on the reference genome; in that case, change the reference file. -c is required to force inclusion of all genomes.
# for qsub, increase memory limit: (#$ -l mem_req=24G,s_vmem=24G)
$ parsnp -g ./reference_genome.gbk -d ./genome_directory_name -c
or
$ parsnp -r ./reference_genome.fa -d ./genome_directory_name -c
Example 2
Generating multifasta file of SNPs:
# remember that the output file contains "N" at non reference sequence.
# Output file is filtered: it does not contain gap "-", SNPs in small (<200 bp) LCB, high density InDel regions (20 indels in 100 bp). But it contains "N".
# position information is not recorded. position information should be obtained from .vcf file.
$ harvesttools -i parsnp.ggr -S output.snps
Example 3
Generating multifasta file of concatenated aligned regions:
# remember that SNPs are not filtered!!!
$ harvesttools -i parsnp.ggr -M output.fa
PEAT
Example 1.
Detect and trim potential adapter sequences in fastq file
$ PEAT_mac paired -1 ../XXX_R1.fastq.gz -2 ../XXX_R2.fastq.gz -o XXX -n 2 --adapter_contexts --out_gzip
https://github.com/jhhung/PEAT
PhyML
Example 1
Generates maximum likelihood tree.
#The input alignment format needs to be Phylip format.
# Add * to the head of entry name if you wish to specify out group.
# The tree file contains "0.000000:" to node. This should be removed before running PAML.
# Nucleotide alignment example
$ ../PhyML-3.1_macOS -i ampC.phy -q -d nt -m GTR -c 4 -a e -b 100
# amino-acid alignment example
$ ../PhyML-3.1_macOS -i ampC.phy -q -d aa -m JTT -c 4 -a e --free_rates -b 100
Pilon
Example 1
Correct (mainly bacterial) genome sequences using Illumina reads. BAM files should be prepared using Bowtie2 etc. and samtools.
# The .bai index should be in the same directory as the BAM file.
$ java -jar pilon.jar --genome XXX.fasta --bam XXX.sort.bam --outdir YYYY --output prefix --changes
# add --vcf to also write a VCF file
PROKKA (installation to Mac)
# First, attach Bio::Perl etc to current version of Perl.
$ sudo cpan Time::Piece XML::Simple Bio::Perl Digest::MD5
# install Prokka and its dependencies through Homebrew
$ brew tap homebrew/science
$ brew update
$ brew install prokka --HEAD
PROKKA
Example 1
Annotate bacterial genome. Generate GFF3 file, sqn file etc.
see the following link:
https://github.com/tseemann/prokka/blob/master/README.md#invoking-prokka
$ prokka --outdir ./anno_${name} --prefix ${name} --locustag ${name} ./pilon.fasta
PGDSpider GUI
$ java -Xmx1024m -Xms512m -jar PGDSpider2.jar
Pfam
Example 1
Assign Pfam ID to amino acid sequence list in a multifasta file:
1. Generate a list of amino acid sequence in multifasta format.
2. Submit the multifasta file to ‘Batch search’ of Pfam (http://pfam.xfam.org/search#tabview=tab1).
3. Receive results by email.
Pubmed
Example 1
Find published papers using keywords.
Type as follows in the search window.
(Kobayashi I[Author]) AND restriction[Title]
(Yano H[Author]) AND transposon[All Fields]
pyenv
Example 1
Change python version in a working directory
# pyenv needs to be installed using Homebrew ($ brew install pyenv)
$ pyenv local 2.7.9
pyseer
Detect associations between a phenotype and unitigs (or k-mers) in a bacterial population (bacterial GWAS) using GLM or LMM, given population structure or lineage information. The --lineage option tries to detect the lineages most associated with the phenotype.
Tutorial is found here: https://pyseer.readthedocs.io/en/master/
Step 1. Generate phenotype file in tab-delimited format.
Step 2. Generate a distance file. The easiest way to generate the distance file is to use "phylogeny_distance.py" provided by the authors. This converts a tree file to a distance matrix file.
$ phylogeny_distance.py core_genome.tree > phylogeny_distances.tsv
Step 3. Generate unitig list using "unitig-counter". "unitig.txt" is generated in the output directory.
$ unitig-counter -strains strain_list.txt -output output -nb-cores 4
Step 4. pyseer fixed effect analysis
$ pyseer --phenotypes phenotypes.tsv --kmers unitig.txt --uncompressed --distances structure.tsv --print-samples --min-af 0.01 --max-af 0.99 --cpu 4 --filter-pvalue 1E-8 > pyseer.assoc
qsub, qdel, qreport
Example 1
Increase the memory limit (e.g. to 8, 12 or 16 GB):
$ qsub -cwd -l s_vmem=8G,mem_req=8G shellfile.sh
Example 2
Delete job:
$ qdel jobID
Example 3
See stats of finished job:
$ qreport -j jobID
Regular expression in Linux
See the following link (in Japanese)
http://itpro.nikkeibp.co.jp/article/COLUMN/20060228/231171/
Roary (install via anaconda3)
# Sometimes the SGE job is terminated before core_gene_alignment.aln is generated.
Example 1
# standard usage
$ roary -f output_dir_prefix *.gff
Example 2
# -e : align individual core genes (99% conservation) with prank; (the run stalled when generating the alignment. prank is not always a good choice!!)
$ roary -e -f output_dir_prefix *.gff
Example 3
# -cd: conservation level that defines a core gene; default 99
# -z option: keep the alignment file of each gene
# -e --mafft: use mafft instead of prank
$ roary -cd 95 -e --mafft -n -z -f output_dir_prefix *.gff
Example 4
# -i : blastP percent identity cut-off
$ roary -i 90 *.gff
Example 5
# generate amino acid sequence alignments. Gene names should be in *.gff. (This is useful only for conserved known genes)
$ query_pan_genome -a gyrA1_multifasta -n gyrA1 /gff/*.gff
Example 6
# Quickly generate concatenated core gene alignment
# It took 3 hours for 206 genomes on a Mac (3.1 GHz Dual-Core Intel Core), but the process completed.
$ roary -e --mafft -n -p 8 -f output_folder /*.gff
Samtools
Example 1
Extract selected entries from a multifasta file.
$ xargs samtools faidx test.fa < namelist.txt
Example 2
Extract unmapped reads from a bam file (flag 4) generated by single end mapping
$ samtools bam2fq -f 4 XXXX.sort.bam >XXXX_unmap.fastq
Example 3
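Convert SAM format to BAM format, then sort and index the BAM file.
$ samtools view -bS XXX.sam > XXX.bam
$ samtools sort XXX.bam XXX.sort
# The line above is the legacy (samtools 0.1.x) syntax; samtools 1.x uses: samtools sort -o XXX.sort.bam XXX.bam
$ samtools index XXX.sort.bam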
Script
Example 1
Record the terminal session (commands and output) in a text file.
$ script log.txt
After the analysis, stop recording:
$ exit
Singularity (Linux)
see:
https://www.sylabs.io/guides/3.1/user-guide/quick_start.html#interact-with-images
Example 1.
Copying a container image from Docker.
First of all, "qlogin" at the home directory. Invoke "singularity" using "module".
$ module load singularity
Then,
$ singularity pull docker://godlovedc/lolcow
lolcow.simg is generated in the directory.
Example 2.
Use "blast +" in the singularity container.
$ module load singularity
$ singularity exec /usr/local/biotools/b/blast\:2.7.1--boost1.64_1 tblastn -query aa.fas -db database_prefix -max_hsps 2 >blast_results
SolexaQA++
Example 1
Quality analysis and graphs generation
$ SolexaQA++ analysis XXXXX.fastq
Example 2
Filter reads based on Phred score. This software handles compressed files.
# -h 20
# -h 30
# need "-454" option for 454 data
$ SolexaQA++ dynamictrim XXX.fastq.gz YYY.fastq.gz -d directory -h 30
Example 3
Filter reads based on the length of the trimmed reads.
$ SolexaQA++ lengthsort XXX.fastq.trimmed.gz YYY.fastq.trimmed.gz --length 50
ssh-keygen
Example 1
Generate exchange keys to access remote server using SSH:
1. On the Mac, generate the key files using ssh-keygen:
local$ ssh-keygen
Enter file in which to save the key (/username/.ssh/): (press enter)
Enter passphrase: XXXXXXXX
Enter same passphrase again: XXXXXXXX
‘id_rsa’ and ‘id_rsa.pub’ files will be generated in ~/.ssh.
2. Then create an ‘authorized_keys’ file on the server and append the key in id_rsa.pub to it:
local$ scp ~/.ssh/id_rsa.pub username@xx.xx:.ssh/
remote$ cd ~/.ssh
remote$ touch authorized_keys
remote$ chmod 600 authorized_keys
remote$ cat id_rsa.pub >> authorized_keys
remote$ rm id_rsa.pub
Key pairs can be generated on multiple computers. Append (cat) each computer's ‘id_rsa.pub’ from its local ~/.ssh directory to ~/.ssh/authorized_keys on the server.
Star
Trinity (Linux only)
Example 1
Assemble transcripts from Illumina RNA-seq reads.
# modify "runME.sh" file, then submit.
# Trinity is installed in the HGC server. /usr/local/package/trinity/2.2.0/Trinity
# for single-end reads
in runME.sh,
#!/bin/bash -ve
#######################################################
## Run Trinity to Generate Transcriptome Assemblies ##
#######################################################
/usr/local/package/trinity/2.2.0/Trinity --seqType fq \
--max_memory 8G \
--single XXXXXX.fastq.trimmed \
--SS_lib_type F \
--CPU 2
# for paired-end reads
in runME.sh,
#!/bin/bash -ve
#######################################################
## Run Trinity to Generate Transcriptome Assemblies ##
#######################################################
/usr/local/package/trinity/2.2.0/Trinity --seqType fq \
--max_memory 8G \
--left XXXXXX_1.fastq,YYYY_1.fastq \
--right XXXXXX_2.fastq,YYYY_2.fastq \
--SS_lib_type RF \
--CPU 2
tr
Example 1.
Delete a character (here, "-").
$ tr -d '-' <infile >outfile
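A one-line sketch of the deletion on a string instead of a file. The sequence and the variable names "aligned" and "ungapped" are hypothetical.

```shell
# A made-up sequence containing "-" characters.
aligned='ATG--CGT-A'

# tr -d removes every occurrence of the listed character from the input.
ungapped=$(printf '%s\n' "$aligned" | tr -d '-')
printf '%s\n' "$ungapped"
```

Quoting the hyphen makes it clear that it is the character to delete and not the start of an option.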