Commonplace book

awk

Example 1

Extract a word in a line containing keywords:

$ awk '$2=="keyA" || $2=="keyB" {print $3}' subject.txt > outfile.txt
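
The same filter can be tried on a throwaway file. This is only a sketch: the function name `extract_third_field` and the sample lines are made up for illustration, and it assumes whitespace-delimited columns with the keyword in column 2.

```shell
#!/bin/sh
# Print column 3 of every line whose column 2 equals keyA or keyB.
# (Function name and sample data are hypothetical.)
extract_third_field() {
    awk '$2 == "keyA" || $2 == "keyB" { print $3 }' "$1"
}

# Small demonstration on a temporary file.
tmp=$(mktemp)
printf 'id1 keyA apple\nid2 keyC banana\nid3 keyB cherry\n' > "$tmp"
extract_third_field "$tmp"    # prints "apple" and "cherry"
rm -f "$tmp"
```

Note that the quotes around the awk program must be plain single quotes; curly quotes pasted from a word processor make the shell fail.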

Example 2

Use 'grep' and 'awk' to extract lines containing keywords.

$ grep "Aragorn:1.2" tig00000001.gff | awk '$7 == "-" {print $0}' > tig00000001_RNA_minus.txt

Example 3

Use with 'sort | uniq -c' to grasp NGS read variation in a fastq file (convenient for amplicon sequencing)

$awk '(NR%4==2){print $0}' subject.fastq | sort | uniq -c > output.txt
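
A minimal sketch of the same idea wrapped in a function, with the variants listed most abundant first. It assumes strict 4-line FASTQ records (no wrapped sequence lines); the function name `count_read_variants` and the three sample reads are invented for the demonstration.

```shell
#!/bin/sh
# Count identical read sequences in a FASTQ file, most abundant first.
# Output: count <TAB> sequence. Assumes 4 lines per record.
count_read_variants() {
    awk 'NR % 4 == 2' "$1" \
        | sort | uniq -c \
        | awk '{ print $1 "\t" $2 }' \
        | sort -k1,1nr -k2,2
}

# Demonstration with three reads, two of them identical.
tmp=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nIIII\n@r3\nACGT\n+\nIIII\n' > "$tmp"
count_read_variants "$tmp"
rm -f "$tmp"
```

The extra awk step only normalizes the padding that `uniq -c` puts in front of the counts, which differs between GNU and BSD tools.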

BAPS

# preparation of BAPS file.

1. Prepare vcf file. Reorder columns according to source/country of the isolates.

2. Convert the modified vcf to BAPS format using PGDSpider2 (launched with: java -Xmx1024m -Xms512m -jar path/to/PGDSpider2.jar). MLST data can be used as input data after adding the isolate ID to the last column.

# Inferring genetic population structure using SNP, VNTR or MLST data:

1. Launch BAPS by typing ./run_baps6.sh /Applications/MATLAB/MATLAB_Compiler_Runtime/v713 in the terminal.

3. Start “clustering of individual” (be aware that the isolate number has already been added to the last column by PGDSpider2). Define several values for the maximum number of clusters, e.g. "5,8,10". This step requires the population names (= isolation source/country) and the first row number of each population.

4. Save pre-processed data (1 file) and mixture results (2 files).

5. Load the BAPS results (.mat) file for admixture analysis. Define the cut-off cluster size. “1” takes all clusters into account.

6. Save the two result files. The result file (with probabilities) can be used to generate a barplot after reformatting.

BEDTools

Example 1

Calculate read depth over the genome(s) excluding "0" coverage regions.

$ bedtools genomecov -ibam XXXXXX.bam -bg >out.txt

Example 2

Calculate read depth for each base position in the genome.

$ bedtools genomecov -ibam XXXXXX.bam -d >out.txt
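
The per-base table written by `genomecov -d` has three tab-separated columns (sequence name, position, depth), so it can be summarized with awk. The following is a sketch only; the function name `mean_depth_per_contig` and the sample values are hypothetical.

```shell
#!/bin/sh
# Mean read depth per sequence from "bedtools genomecov -d" output
# (columns: sequence name, 1-based position, depth).
mean_depth_per_contig() {
    awk -F '\t' '{ sum[$1] += $3; n[$1]++ }
        END { for (c in sum) printf "%s\t%.2f\n", c, sum[c] / n[c] }' "$1" | sort
}

# Demonstration on a tiny made-up table.
tmp=$(mktemp)
printf 'chr1\t1\t10\nchr1\t2\t20\nchr2\t1\t5\n' > "$tmp"
mean_depth_per_contig "$tmp"
rm -f "$tmp"
```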

bamToFastq (in BEDTools package)

Example 1

Generate a fastq file from a bam file.

$ bamToFastq -i XXXXXX.bam -fq XXXXXX.fastq

bc

Example 1

Use as a calculator in Linux/Unix.

$ bc (Enter)

1001037168/4

250259292

quit

DBGWAS

Example: "It uses a compacted De Bruijn Graph (cDBG) structure to represent the variability within all bacterial genome assemblies given as input. Then cDBG nodes are tested for association with a phenotype of interest and the resulting associated nodes are then re-mapped on the cDBG."

# see input file format in the following site.

https://gitlab.com/leoisl/dbgwas/-/tree/master/containers/docker#running-dbgwas-on-singularity

It runs with Singularity on a Linux cluster machine.

$ singularity run ../software/dbgwas-0.5.4.simg -strains ./dbgwas/sample_example/strains -newick ./dbgwas/sample_example/strains.newick -nc-db Resistance_DB_for_DBGWAS.fasta -pt-db uniprot_sprot_bacteria_for_DBGWAS.fasta

blast+

Example 0

First, create blast database.

$ makeblastdb -in multifasta_file -out database_name -dbtype nucl -parse_seqids

Example 1

Generate results in alignment format.

$ tblastn -query aa.fas -db database_prefix -max_hsps 2 >blast_aln.txt

Example 2

Generate results in table format.

$ tblastn -query aa.fas -db database_prefix -outfmt 6 -max_hsps 2 >blast_tbl.txt
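
The default `-outfmt 6` table has 12 tab-separated columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), so hits can be filtered afterwards with awk. A minimal sketch, in which the function name `filter_blast_hits`, the thresholds and the two sample hits are all illustrative assumptions:

```shell
#!/bin/sh
# Keep hits with % identity >= $2 and e-value <= $3 from a BLAST
# tabular file ($1) in default outfmt 6 (pident = column 3, evalue = column 11).
filter_blast_hits() {
    awk -F '\t' -v min_id="$2" -v max_e="$3" \
        '($3 + 0) >= (min_id + 0) && ($11 + 0) <= (max_e + 0)' "$1"
}

# Demonstration with two made-up hits; only the first passes.
tmp=$(mktemp)
printf 'q1\ts1\t98.5\t200\t3\t0\t1\t200\t1001\t1600\t1e-50\t380\n' >  "$tmp"
printf 'q1\ts2\t60.0\t150\t60\t2\t5\t150\t300\t750\t1e-5\t45.1\n' >> "$tmp"
filter_blast_hits "$tmp" 90 1e-10
rm -f "$tmp"
```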

legacy blast

Example 1

Generate results in alignment format:

$ blastall -p tblastn -i aa_fasta.fas -d /usr/local/db/blast/refseq/refseq-genomic-bacteria -v 1 -b 1 -o blast_test.txt

link to legacy blast option:

https://www.ncbi.nlm.nih.gov/Class/BLAST/blastallopts.txt

Bowtie

Example 1

Generate alignments in SAM format

Indexing first.

$ bowtie-build fasta_file prefix

# for single end reads

$ bowtie -S -m 1 -v 1 prefix_of_index XXX.fastq XXX.sam

Bowtie2

Example 1

Generate alignments in SAM format. This handles compressed (gz) files.

Indexing first.

$ bowtie2-build -f reference.fa prefix_of_index

# for paired_end reads

$ bowtie2 -x prefix_of_index -1 XXXX_1.fastq -2 XXXX_2.fastq -q -N 1 -S XXXX.sam

Breseq

Example 1.

Identifying structural variation in haploid microbial genomes from short-read resequencing data.

1. Installation on a Linux machine. Move to the source directory, then type:

$ ./configure --prefix=${PWD}

$ make

$ make test

$ make install

2. Usage 1. Standard mode.

$ breseq -n name_of_output -j 2 -r reference1.gbk reads1.fastq.gz reads2.fastq.gz

# -j : number of CPU threads to use.

3. Usage 2. Polymorphism mode

$ breseq -n name_of_output -p -j 4 -r reference1.gbk reads1.fastq.gz reads2.fastq.gz reads3.fastq.gz

4. Usage 3. Retrieving coverage data

$ breseq BAM2COV -b ./data/reference.bam -f ./data/reference.fasta -o file_name_prefix <entry name here>:10000-15000 -t

# -t : tabular format option. It reports the coverage of uniquely mapped reads and ambiguously mapped reads separately.

5. Usage 4. Generating coverage plot

$ breseq BAM2COV -b ./data/reference.bam -f ./data/reference.fasta --format PNG -o file_name_prefix <entry name here>:10000-15000

BSMAP/sambamba/methratio.py

Example 1

Determine cytosine methylation level for each site in a genome.

Step 1. Map Bisulfite-treated paired-end reads to a reference genome.

# (-r 0) allows only uniquely mapped reads.

# output format can be selected from either ".bsp" ".sam" or ".bam" by specifying the output file name.

# Works in both Linux and MacOSX.

https://github.com/popucui/bsmap/issues/17

$ bsmap -a R1.fastq -b R2.fastq -d ref.fasta -o outfile.bam -p 2 -w 100 -r 0 -n 0

Step 2. Remove PCR duplicates using "sambamba"

# sambamba needs to be installed

$ sambamba markdup -r -t 2 input.bam output.bam

Step 3. Determine the C to C+T ratio using "methratio.py"

$ python methratio.py -o output.txt -d ref.fasta bsmap_output_file.bam

bwa mem

canu v 1.8

Example 1. Assembling long reads.

# shell script example.

#!/bin/sh
#$ -S /bin/sh
#$ -l short
#$ -l mem_req=8G
canu \
-p strain_A -d strain_A_results_dir \
genomeSize=5.4m \
-pacbio-raw ./strain_A_pac_fastq/*.fastq \
minReadLength=4000 \
gridOptions="-l mem_req=8G" \
gridEngineMemoryOption="-l mem_req=MEMORY" \
gridEngineThreadsOption="-pe def_slot THREADS" \
corMhapSensitivity=high \
corMinCoverage=0

# Four particularly important options:

minReadLength=4000 (or other values, default is 1000)

correctedErrorRate=0.040 (or higher or lower values)

corMhapSensitivity=high (or normal)

corMinCoverage=0 (or default)

cat

Example 1

Add text data to an existing file:

$ cat file_A >> file_B

chmod

Example 1

Allow user to write, read, execute a file.

$ chmod u+rwx directory_name or file_name

Example 2

Allow user to write, read, execute all files under the selected directory. (Use this when erasing a directory containing protected files.)

$ chmod -R 755 directory_name or file_name

Circlator

https://github.com/sanger-pathogens/circlator/wiki

Circos installation

1. First, install perl modules by repeating the following command. XXXXX is a module name.

$ sudo perl -MCPAN -e 'install XXXXX'

2. Download GD-2.50 separately then compile files, and install.

$ curl -O http://www.cpan.org/authors/id/L/LD/LDS/GD-2.50.tar.gz

$ tar xvfz GD-2.50.tar.gz

$ cd GD-2.50

$ perl Makefile.PL

$ sudo make install

3. Download the Circos executable, then add its location to PATH in .bash_profile.

To run circos, type:

$ circos -conf circos.conf

ClonalFrameML

Example 1

Infer core genome phylogeny after removing recombination tracts. Whole genome alignment and ML tree are required.

$ ClonalFrameML newick_file seq_file output_prefix [OPTIONS]

# Remove the quotes around tip labels in the newick file.

# https://github.com/xavierdidelot/clonalframeml/wiki

# Rscript command can no longer be installed on Mac

COG

Example 1

Assign COG ID to amino acid sequence list in a multifasta file:

1. Generate a list of amino acid sequence in multifasta format.

2. Submit the multifasta file to the Web CD search tool (http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi). Select "Search only", then select COG. Wait for the email.

3. Go to NCBI site following the link in email. Select 'Domain Hits', Data mode=concise, and then 'Download'. Download results in HTML format.

4. COG ID - function category mapper is available from here (ftp://ftp.ncbi.nih.gov/pub/wolf/COGs/COG0303/cogs.csv).

du

Example 1

Check directory size

$ du -sh directory_name

EMBOSS

Example 1. seqret

Extract one entry from multifasta file.

$ seqret multifasta_name:entry_name entry_name.fasta

Example 2.

Extract one entry from a multifasta file and generate its reverse-complementary sequence.

$ seqret -srev multifasta_name:entry_name entry_name.fasta

Example 3. extractseq

Extract specific regions from one entry sequence in a multifasta file and generate their concatenated sequence.

http://www.bioinformatics.nl/cgi-bin/emboss/help/extractseq

$ extractseq multifasta_name:entry_name output_name.fasta -regions "10-20 30-45 533-537"

Example 4. dotpath

Draw a dotplot.

$ dotpath ref.(multi)fasta query.(multi)fasta -graph ps -overlaps -word 20 -goutfile dotpath

With the query sequence reversed:

$ dotpath ref.(multi)fasta query.(multi)fasta -sreverse2 -graph ps -overlaps -word 20 -goutfile dotpath_r

Example 5. yank, union

Concatenate the sequences in a list.

First, create a list file using yank:

http://www.bioinformatics.nl/cgi-bin/emboss/help/union

Then run the union command. The @ prefix is required.

$ union -sequence @entry.list -outseq test.fasta

FALCON

https://pb-falcon.readthedocs.io/en/latest/tutorial.html

fastGEAR (linux ver. needs MatLab)

https://users.ics.aalto.fi/pemartti/fastGEAR/fastGEAR_manual.pdf

fastME

Example 1

Construct distance-based phylogenetic tree.

# alignment needs to be in Phylip format.

# Example of DNA alignment (TN93+G model)

$ fastme -i input.phy -d TN93 -g -m NJ -b 100 -T 2

fineSTRUCTURE v2

Example 1

Estimate "fine" genetic population structure based on recombination events

$ fs project_folder_name -idfile XXX.txt -phasefiles YYY.phase -ploidy 1 -recombfiles ZZZ.recombfile

grep

Example 1

Extract lines containing any of the keywords listed in a file:

$ grep -f keywordlist.txt subject.txt >outfile.txt
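
By default `grep -f` treats every line of the keyword file as a regular expression and also matches substrings, so a keyword such as gyrA would also pull out lines with gyrA1. A minimal sketch of a stricter variant follows; the function name `extract_keyword_lines` and the sample data are hypothetical.

```shell
#!/bin/sh
# Extract lines of $2 that contain any keyword listed in $1 (one per line).
# -F: keywords are fixed strings, not regular expressions
# -w: a keyword must match as a whole word
extract_keyword_lines() {
    grep -F -w -f "$1" "$2"
}

# Demonstration: "gyrA1" is not reported because of -w.
keys=$(mktemp)
subject=$(mktemp)
printf 'gyrA\nrecA\n' > "$keys"
printf 'gene1 gyrA\ngene2 gyrA1\ngene3 recA\n' > "$subject"
extract_keyword_lines "$keys" "$subject"
rm -f "$keys" "$subject"
```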

Example 2

Replace a phrase in a file (or files) by combining 'xargs', 'sed' commands.

# remove or modify values in .nwk file.

$ grep -rl ')[0-9]\.[0-9]\{6\}' ./ampC_copy.tree | xargs sed -i '.bk' 's/)0\.[0-9]\{6\}\:/):/g'

$ rm *bk
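
The `sed -i '.bk'` form above is the BSD/macOS spelling. As a sketch under that assumption, writing the suffix directly after `-i` (no space) is accepted by both GNU sed and BSD sed; the function name `strip_node_values` and the sample tree are invented for the demonstration.

```shell
#!/bin/sh
# Replace ")0.dddddd:" with "):" in a newick file, in place.
# "-i.bk" (suffix attached to -i) works with GNU sed and BSD/macOS sed;
# the backup file "$1.bk" is removed afterwards.
strip_node_values() {
    sed -i.bk 's/)0\.[0-9]\{6\}:/):/g' "$1"
    rm -f "$1.bk"
}

# Demonstration on a small made-up tree.
tmp=$(mktemp)
printf '((A:0.1,B:0.2)0.000000:0.05,C:0.3);\n' > "$tmp"
strip_node_values "$tmp"
cat "$tmp"    # ((A:0.1,B:0.2):0.05,C:0.3);
rm -f "$tmp"
```

Branch lengths such as A:0.1 are left alone because the pattern only matches a value that directly follows a closing parenthesis.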

Gubbins installation on MAC

# first, install miniconda3, then

$ conda config --add channels r

$ conda config --add channels defaults

$ conda config --add channels conda-forge

$ conda config --add channels bioconda

$ conda install gubbins

Gubbins

Example 1

Generate ML tree of bacterial species based on core genome alignments after filtering recombination tracts.

$ run_gubbins.py [FASTA alignment]

# gubbins_drawer.py is no longer supported!!

# phandango for data visualization

https://jameshadfield.github.io/phandango/#/

GenomeMatcher, (MiGAP - discontinued)

Example 1

Annotate bacterial genome sequences, then extract qualifier values (COG classification, product name, gene name etc.) per feature in a single line in tab-delimited format.

1. Create a sequence file in fasta or multifasta format.

2. Submit the fasta(multifasta) file to MiGAP.

3. Download annotation in GenBank/DDBJ/EMBL format.

4. Open GenomeMatcher -> Accessories -> ExtractFromGenBankFile

then, load the GenBank file.

5. Select features of interest, then click on ‘execute extraction’

6. Copy the results and paste them into an Excel sheet.

Example 2

Visualize inversion in a bacterial chromosome using GenomeMatcher:

1. Load sequences to x and y axis in main window.

2. Click on ‘colorgram’. This runs blast program.

gzip, tar compression in mac

Example 1: tar

Compress large files, for example fastq files, to reduce file size

# file

$ tar -zcvf file_name.tar.gz file_name

# directory

$ tar -zcvf directory_name.tar.gz directory_name

# directory

$ tar czvf ../short_read_directory.tar.gz ./short_read_directory

Example 2 : tar

Decompress .tar.gz files

$ tar -zxvf file_name.tar.gz

Example 3: gzip

Compress a large file, for example fastq file, to reduce file size without leaving a copy of the original file.

$ gzip file_name.fastq

Example 4 : gzip

Compress a large file while keeping the original file.

$ gzip -c file_name.fastq >file_name.fastq.gz

Example 5: gzip

Decompress .gz file(s) without keeping the compressed file.

$ gzip -d file_name.fastq.gz
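
A compressed fastq file does not have to be unpacked on disk just to inspect it: `gzip -dc` writes the decompressed data to standard output. A minimal sketch that counts reads this way, assuming 4 lines per record; the function name `count_reads_gz` and the two sample reads are hypothetical.

```shell
#!/bin/sh
# Count reads in a gzip-compressed FASTQ file without writing the
# decompressed file to disk (assumes 4 lines per record).
count_reads_gz() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Demonstration with two made-up reads.
tmp=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$tmp"
gzip "$tmp"                  # replaces "$tmp" with "$tmp.gz"
count_reads_gz "$tmp.gz"     # prints 2
rm -f "$tmp.gz"
```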

HTseq-count

Example 1

Obtain mapped read count based on annotation in a .gff file in strand specific RNA-seq

$ htseq-count -f sam -s yes -t CDS -i locus_tag XXXX.sam YYY.gff >count.txt

# Illumina TruSeq

$ htseq-count -f sam -s reverse -t CDS -i locus_tag XXXX.sam YYY.gff >count.txt

Example 2.

Obtain mapped read count based on annotation in a .gff file in non-strand-specific RNA-seq

$ htseq-count -f bam -s no -t CDS -i locus_tag XXXX.bam YYY.gff >count.txt
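
The count table ends with summary rows whose names start with two underscores (for example `__no_feature` and `__ambiguous`). A sketch for totalling only the reads assigned to features follows; it assumes a two-column, tab-separated count.txt, and the function name `sum_assigned_counts` and the sample counts are made up.

```shell
#!/bin/sh
# Sum the counts of all features in an htseq-count table, skipping the
# special summary rows that start with "__".
sum_assigned_counts() {
    awk -F '\t' '$1 !~ /^__/ { total += $2 } END { print total + 0 }' "$1"
}

# Demonstration on a made-up table.
tmp=$(mktemp)
printf 'geneA\t10\ngeneB\t5\n__no_feature\t3\n__ambiguous\t1\n' > "$tmp"
sum_assigned_counts "$tmp"    # prints 15
rm -f "$tmp"
```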

Image J

Example 1

Create a stack image.

1. Click on "File" > "Open" (select my TIF file)

2. Click on "Image" > "Images to stack" (two images will be merged)

3. Click on "Image" > "Crop", if necessary

4. Click on "File" > "Save as" (enter new file name)

Example 2

Create a montage.

1. Click on "File" > "Open" (select a stack TIF file)

2. Click on "Image" > "Stack" > "Make Montage"

3. Move to tool bar. Click on ">>" > "Magic Montage"

4. Click on "Montage Shuffler Tool" in tool bar. Click on the montage image, then drag the slice.

5. Click on "File" > "Save as" (enter new file name)

Illustrator

Save RGB image in TIFF.

Export -> format TIF -> Export -> Select the following:

Color mode: RGB

Resolution: 300 dpi

Anti-aliasing: Art-optimized

LZW compression: uncheck

Embed ICC profile: check

ls, find

Example 1

Check the file size. Create a list of file names.

$ ls -s | sort -nr | more

$ find . -size +20 -print >list.txt

MatLab compiler Runtime (MCR)

Example 1

Install to Linux machine:

$ ./install -mode silent -agreeToLicense yes -destinationFolder /sshare1/home/yano/XXXXXXX/XXXXXXXX(absolute path)

ModelGenerator

Example 1

Find optimal model for construction of maximum likelihood tree.

$ java -jar ../../software/modelgenerator_v_851/modelgenerator.jar gene(aa).aln 4 >gene_modelgen.txt

Mummer 3

http://mummer.sourceforge.net/manual/#snpdetection

Example 1

Draw a dotplot (mummer-3.23 x gnuplot 5.2 ).

$ mummer -mum -b -c ref.(multi)fas que.(multi)fas > out

$ mummerplot --postscript -p mapping out

$ gnuplot mapping.gp

Example 2

nucmer. This is useful for multifasta files. I encountered a problem when I installed mummer through conda:

"x27" in mummerplot needed to be replaced with " using an editor.

$GNUPLOT_EXE = "false" needed to be changed to "gnuplot".

$ nucmer -minmatch 60 ../../data/query.fasta ../../data/reference.fasta

$ mummerplot -x "[0,15000000]" -y "[0,15000000]" -postscript -p test out.delta

# "[ , ]" is coordinate range

KEGG

Example 1

Assign KO number to amino acid sequence list in a multifasta file (blastKOALA):

1. Generate a list of amino acid sequence in multifasta format.

2. Submit the multifasta file to KEGG mapper (http://www.kegg.jp/kegg/tool/annotate_sequence.html).

3. Get results in the KEGG website.

open

Example 1

Open multiple R console windows in mac.

$ open -n /Applications/R.app

orderedPainting

Example 1

Find recombination hot regions in the aligned genomes. This uses chromoPainter.

# the program distinguishes between capital and small letters in the .hap file !!

# .cshrc

$ qsub -cwd -l s_vmem=8G,mem_req=8G -o test1.log -e test1.log <<< "/bin/bash ./orderedPainting.sh -g path/to/.hap -l path/to/strain_name.txt"

Parsnp/Gingr/harvesttools

Example 1

Align closely related bacterial genomes and detect SNPs in the aligned regions. Six files will be generated. The run (tree construction step) does not finish when there are too many unaligned regions on the reference genome; if this is the case, change the reference file. To force inclusion of all genomes, -c is required.

# for qsub, increase memory limit: (#$ -l mem_req=24G,s_vmem=24G)

$ parsnp -g ./reference_genome.gbk -d ./genome_directory_name -c

$ parsnp -r ./reference_genome.fa -d ./genome_directory_name -c

Example 2

Generating multifasta file of SNPs:

# Remember that the output file contains "N" at non-reference sequence.

# The output file is filtered: it does not contain gaps "-", SNPs in small (<200 bp) LCBs, or high-density indel regions (20 indels in 100 bp). But it contains "N".

# Position information is not recorded; it should be obtained from the .vcf file.

$ harvesttools -i parsnp.ggr -S output.snps

Example 3

Generating multifasta file of concatenated aligned regions:

# remember that SNPs are not filtered!!!

$ harvesttools -i parsnp.ggr -M output.fa

PEAT

Example 1.

Detect and trim potential adapter sequences in fastq file

$ PEAT_mac paired -1 ../XXX_R1.fastq.gz -2 ../XXX_R2.fastq.gz -o XXX -n 2 --adapter_contexts --out_gzip

https://github.com/jhhung/PEAT

PhyML

Example 1

Generates maximum likelihood tree.

#The input alignment format needs to be Phylip format.

# Add * to the head of entry name if you wish to specify out group.

# The tree file contains "0.000000:" to node. This should be removed before running PAML.

# Nucleotide alignment example

$ ../PhyML-3.1_macOS -i ampC.phy -q -d nt -m GTR -c 4 -a e -b 100

# amino-acid alignment example

$ ../PhyML-3.1_macOS -i ampC.phy -q -d aa -m JTT -c 4 -a e --free_rates -b 100

Pilon

Example 1

Correct a (mainly bacterial) genome sequence using Illumina reads. BAM files should be prepared using Bowtie2 etc. and samtools.

# The .bai index should be in the same directory as the BAM file.

$ java -jar pilon.jar --genome XXX.fasta --bam XXX.sort.bam --outdir YYYY --output prefix --changes

# --vcf

PROKKA (installation to Mac)

# First, attach Bio::Perl etc to current version of Perl.

$ sudo cpan Time::Piece XML::Simple Bio::Perl Digest::MD5

# Install Prokka and its dependencies through Homebrew

$ brew tap homebrew/science

$ brew update

$ brew install prokka --HEAD

PROKKA

Example 1

Annotate bacterial genome. Generate GFF3 file, sqn file etc.

see the following link:

https://github.com/tseemann/prokka/blob/master/README.md#invoking-prokka

$ prokka --outdir ./anno_${name} --prefix ${name} --locustag ${name} ./pilon.fasta

PGDSpider GUI

$ java -Xmx1024m -Xms512m -jar PGDSpider2.jar

Pfam

Example 1

Assign Pfam ID to amino acid sequence list in a multifasta file:

1. Generate a list of amino acid sequence in multifasta format.

2. Submit the multifasta file to ‘Batch search’ of Pfam (http://pfam.xfam.org/search#tabview=tab1).

3. Receive results by email.

Pubmed

Example 1

Find published papers using keywords.

Type as follows in the search window.

(Kobayashi I[Author]) AND restriction[Title]

(Yano H[Author]) AND transposon[All Fields]

pyenv

Example 1

Change python version in a working directory

# pyenv needs to be installed using Homebrew ($ brew install pyenv)

$ pyenv local 2.7.9

pyseer

Detect association between phenotype and unitig (or k-mer) in a bacterial population (bacterial GWAS) using GLM or LMM, given population structure or lineage information. The --lineage option tries to detect the lineages most associated with the phenotype.

Tutorial is found here: https://pyseer.readthedocs.io/en/master/

Step 1. Generate phenotype file in tab-delimited format.

Step 2. Generate distance file. The easiest way to generate the distance file is to use "phylogeny_distance.py" provided by the authors. This converts a tree file to a distance matrix file.

$ phylogeny_distance.py core_genome.tree > phylogeny_distances.tsv

Step 3. Generate unitig list using "unitig-counter". "unitig.txt" is generated in the output directory.

$ unitig-counter -strains strain_list.txt -output output -nb-cores 4

Step 4. pyseer fixed effect analysis

$ pyseer --phenotypes phenotypes.tsv --kmers unitig.txt --uncompressed --distances structure.tsv --print-samples --min-af 0.01 --max-af 0.99 --cpu 4 --filter-pvalue 1E-8 > pyseer.assoc

qsub, qdel, qreport

Example 1

Increase the memory limit (e.g. to 8, 12 or 16 GB):

$ qsub -cwd -l s_vmem=8G,mem_req=8G shellfile.sh

Example 2

Delete job:

$ qdel jobID

Example 3

See stats of finished job:

$ qreport -j jobID

Regular expression in Linux

See the following link (in Japanese)

http://itpro.nikkeibp.co.jp/article/COLUMN/20060228/231171/

Roary (install via anaconda3)

# Sometimes, SGE job is terminated before generating core_gene_alignment.aln.

Example 1

# standard usage

$ roary -f output_dir_prefix *.gff

Example 2

# -e : prank alignment option for individual core genes (99% conservation). (The run got stuck when generating the alignment; prank is not always a good choice!)

$ roary -e -f output_dir_prefix *.gff

Example 3

# -cd: definition of conservation level of core gene; default 99

# -z option: leave alignment file of each gene

# -e --mafft: use mafft instead of prank

$ roary -cd 95 -e --mafft -n -z -f output_dir_prefix *.gff

Example 4

# -i : minimum percentage identity for blastp

$ roary -i 90 *.gff

Example 5

# Generate amino acid sequence alignments. Gene names should be in *.gff. (This is useful only for conserved known genes.)

$ query_pan_genome -a gyrA1_multifasta -n gyrA1 /gff/*.gff

Example 6

# Quickly generate concatenated core gene alignment

# It took 3 hours for 206 genomes on a Mac (3.1 GHz Dual-Core Intel Core), but the process completed.

$ roary -e --mafft -n -p 8 -f output_folder /*.gff

Samtools

Example 1

Extract selected entries from a multifasta file.

$ xargs samtools faidx test.fa < namelist.txt
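
Where samtools is not available, a similar extraction can be sketched with awk alone. This is an alternative technique, not what `samtools faidx` does internally; it assumes entry names are the first word of each header line, and the function name `extract_fasta_entries` and the sample sequences are hypothetical.

```shell
#!/bin/sh
# Print the multifasta entries ($2) whose names are listed in $1,
# one name per line. The name list must not be empty.
extract_fasta_entries() {
    awk 'NR == FNR { want[$1] = 1; next }
         /^>/ { name = substr($1, 2); keep = (name in want) }
         keep' "$1" "$2"
}

# Demonstration: keep seqA and seqC, drop seqB.
names=$(mktemp)
fasta=$(mktemp)
printf 'seqA\nseqC\n' > "$names"
printf '>seqA desc\nAAAA\n>seqB\nCCCC\n>seqC\nGGGG\nTTTT\n' > "$fasta"
extract_fasta_entries "$names" "$fasta"
rm -f "$names" "$fasta"
```

Unlike `samtools faidx`, this prints entries in the order they appear in the fasta file, not in the order of the name list.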

Example 2

Extract unmapped reads from a bam file (flag 4) generated by single end mapping

$ samtools bam2fq -f 4 XXXX.sort.bam >XXXX_unmap.fastq

Example 3

Convert SAM format to BAM format, then generate indexed BAM file.

$ samtools view -bS XXX.sam > XXX.bam

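$ samtools sort XXX.bam XXX.sort

# With samtools 1.3 or later, the output file is given with -o: samtools sort -o XXX.sort.bam XXX.bam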

$ samtools index XXX.sort.bam

Script

Example 1

Record the terminal session log in a text file.

$ script log.txt

After the analysis, type:

$ exit

Singularity (Linux)

see:

https://www.sylabs.io/guides/3.1/user-guide/quick_start.html#interact-with-images

Example 1.

Copying a container image from Docker.

First of all, "qlogin" at the home directory. Invoke "singularity" using "module".

$ module load singularity

Then,

$ singularity pull docker://godlovedc/lolcow

lolcow.simg is generated in the directory.

Example 2.

Use "blast +" in the singularity container.

$ module load singularity

$ singularity exec /usr/local/biotools/b/blast\:2.7.1--boost1.64_1 tblastn -query aa.fas -db database_prefix -max_hsps 2 >blast_results

SolexaQA++

Example 1

Quality analysis and graphs generation

$ SolexaQA++ analysis XXXXX.fastq

Example 2

Filter reads based on Phred score. This software handles compressed files.

# -h 20

# -h 30

# need "-454" option for 454 data

$ SolexaQA++ dynamictrim XXX.fastq.gz YYY.fastq.gz -d directory -h 30

Example 3

Filter reads based on the length of the trimmed reads.

$ SolexaQA++ lengthsort XXX.fastq.trimmed.gz YYY.fastq.trimmed.gz --length 50

ssh-keygen

Example 1

Generate exchange keys to access remote server using SSH:

1. In Mac generate files using ssh-keygen:

local$ ssh-keygen

Enter file in which the key is (/username/.ssh/): (press enter)

Enter passphrase: XXXXXXXX

Enter same passphrase again: XXXXXXXX

‘id_rsa’ and ‘id_rsa.pub’ files will be generated in ~/.ssh.

2. Then, generate an ‘authorized_keys’ file on the server and copy the key in id_rsa.pub to ‘authorized_keys’:

local$ scp id_rsa.pub username@xx.xx:username/.ssh

remote$ touch authorized_keys

remote$ chmod 600 authorized_keys

remote$ cat id_rsa.pub >> authorized_keys

remote$ rm id_rsa.pub

New ‘id_rsa.pub’ files can be generated in multiple computers. Add (cat) the new key in ‘id_rsa.pub’ in the local /.ssh directory to /.ssh/authorized_keys in the server.

Star

Trinity (Linux only)

Example 1

Assemble transcripts from Illumina RNA-seq reads.

# modify "runME.sh" file, then submit.

# Trinity is installed in the HGC server. /usr/local/package/trinity/2.2.0/Trinity

# for single-end reads

in runME.sh,

#!/bin/bash -ve
#######################################################
## Run Trinity to Generate Transcriptome Assemblies ##
#######################################################
/usr/local/package/trinity/2.2.0/Trinity --seqType fq \
--max_memory 8G \
--single XXXXXX.fastq.trimmed \
--SS_lib_type F \
--CPU 2

# for paired-end reads

in runME.sh,

#!/bin/bash -ve
#######################################################
## Run Trinity to Generate Transcriptome Assemblies ##
#######################################################
/usr/local/package/trinity/2.2.0/Trinity --seqType fq \
--max_memory 8G \
--left XXXXXX_1.fastq,YYYY_1.fastq \
--right XXXXXX_2.fastq,YYYY_2.fastq \
--SS_lib_type RF \
--CPU 2

tr

Example 1.

Delete a character

$ tr -d \- <infile >outfile
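
In the command above the backslash only protects the hyphen from the shell; quoting it as '-' is equivalent. A minimal sketch that removes alignment gaps from a sequence read on standard input (the function name `strip_gaps` and the sample sequence are made up):

```shell
#!/bin/sh
# Delete every "-" character from standard input.
strip_gaps() {
    tr -d '-'
}

# Demonstration on a gapped sequence.
printf 'ATG--CC-A\n' | strip_gaps    # prints ATGCCA
```

Because `tr` works on characters, it would also delete hyphens from FASTA header lines; apply it to sequence lines only if the headers contain hyphens.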
