Programmer's notes

This page collects tips and tricks that were sometimes time-consuming to figure out, mainly for R, bash (Unix shell), Matlab and Perl.

############### IN THE "R" LANGUAGE (in ECLIPSE) ####################################

# resolving issues writing a csv from R (with col.names=F) -> write.csv() ignores col.names (with a warning); switch to write.table(x, file="out.csv", sep=",", col.names=FALSE), which write.csv() calls anyway, and you're golden

# wondering about the structure of a particular variable named "a" -> str(a)

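# looking to get a string after a pattern -> regexpr() is what you need: it locates the match, and regmatches() extracts the matched text (sub() with a capture group also works)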

# finding a string incrementally (Eclipse) -> use Edit > Incremental Find Next (Ctrl+J) or Edit > Incremental Find Previous (Ctrl+Shift+J) to enter the incremental find mode, and start typing the string to match. Matches are found incrementally as you type. The search string is shown in the status line. Press Ctrl+J or Ctrl+Shift+J to go to the next or previous match. Press Enter or Esc to exit incremental find mode.

# converting numeric to string -> formatC(Vect, digits=3, width=7, format="f", flag="0") or sprintf("%06.3f", c(12.234, 234.5675, 1.5))

############### IN THE BASH LANGUAGE ############################################

# text compatibility issues (CR and LF) -> dos2unix
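
When dos2unix is not installed, the same cleanup can be approximated with standard tools. The snippet below is only a minimal sketch under assumptions: the file names dos.txt and unix.txt are made up for the example, and the scratch directory exists only so that no real file is touched. It builds a file with Windows (CRLF) line endings, counts the lines carrying a carriage return, and strips them with tr.

```shell
# Work in a scratch directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small file with Windows (CRLF) line endings
printf 'line one\r\nline two\r\n' > dos.txt

# Count the lines that contain a carriage return
cr_before=$(grep -c "$(printf '\r')" dos.txt || true)

# Strip every carriage return (roughly what dos2unix does for plain text)
tr -d '\r' < dos.txt > unix.txt

# Count again on the cleaned copy
cr_after=$(grep -c "$(printf '\r')" unix.txt || true)
echo "CR lines before: $cr_before, after: $cr_after"
```

Writing to a second file rather than overwriting the input keeps the original available in case the file was not plain text after all.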

# deleting a job on the cluster -> qdel (qstat might also help to find the job ID)

# need to get the right to execute files -> chmod u+x *
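
As a small illustration of what u+x changes, here is a sketch that creates a throwaway script (hello.sh is a hypothetical name), checks whether it is executable, grants the owner execute permission, and checks again.

```shell
# Work in a scratch directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny script, explicitly left without execute permission
printf '#!/bin/sh\necho hello\n' > hello.sh
chmod a-x hello.sh

if [ -x hello.sh ]; then before=yes; else before=no; fi

# Grant execute permission to the owner (u) only
chmod u+x hello.sh

if [ -x hello.sh ]; then after=yes; else after=no; fi

# The script can now be run directly
output=$(./hello.sh)
echo "executable before: $before, after: $after, output: $output"
```

Note that chmod u+x * touches every file in the current directory; naming the file, as above, is the more cautious habit.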

# autocompletion -> Tab key

(1) counting number of sequences in a fasta file: grep -c "^>" file.fa

(2) add something to end of all header lines: sed 's/>.*/&WHATEVERYOUWANT/' file.fa > outfile.fa

(3) clean up a fasta file so that only the first field of each header is kept: awk '{print $1}' file.fa > output.fa

Here are some other cool ones: http://chrisduran.eu/bioinformatics/linux-and-osx-commands-for-working-with-fasta-files/
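
To see what commands (1) to (3) do, they can be tried on a toy file. This is just a sketch: the two records, the suffix _v1 and the output names suffixed.fa and clean.fa are invented for the example.

```shell
# Work in a scratch directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy FASTA file: two records, headers carry a description after the ID
cat > file.fa <<'EOF'
>seq1 first toy record
ACGT
ACGT
>seq2 second toy record
TTGA
EOF

# (1) count the records (one header line per record)
n_seqs=$(grep -c "^>" file.fa)

# (2) append a suffix to every header line (& stands for the matched text)
sed 's/>.*/&_v1/' file.fa > suffixed.fa

# (3) keep only the first whitespace-delimited field of each line;
#     sequence lines have no spaces, so they pass through unchanged
awk '{print $1}' file.fa > clean.fa

echo "records: $n_seqs"
grep "^>" suffixed.fa
grep "^>" clean.fa
```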

(4) linearize a fasta file (each sequence on a single line; --no-group-separator needs GNU grep): awk '{if (substr($0,1,1)==">"){if (p){print "\n";} print $0} else printf("%s",$0);p++;}END{print "\n"}' input.fasta | grep '>' -A 1 --no-group-separator > joinedinput.fasta

(5) add an index behind each sequence ID in a fasta file: awk '/^>/{print $0(++i)}!/^>/' file.fasta
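
The linearizing and indexing steps can also be tried on a toy file. The sketch below uses an alternative single-pass awk for linearizing, which is not the command given above but should produce the same layout without the grep step; the records and the file names joined.fasta and indexed.fasta are invented for the example.

```shell
# Work in a scratch directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy FASTA file: the first sequence is wrapped over two lines
cat > input.fasta <<'EOF'
>seqA
ACGT
TTGA
>seqB
GGCC
EOF

# Linearize: close the previous sequence before each new header,
# print the header, and glue the sequence lines together
awk '/^>/ { if (NR > 1) printf("\n"); print; next }
     { printf("%s", $0) }
     END { printf("\n") }' input.fasta > joined.fasta

# Index: append a running counter to each header, print other lines unchanged
awk '/^>/ { print $0 (++i); next } { print }' joined.fasta > indexed.fasta

cat indexed.fasta
```

Anchoring the pattern with ^ means a ">" appearing elsewhere in a line is not mistaken for a header.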

############### IN THE MATLAB LANGUAGE #########################################

# want to use a number but it is stored as characters -> str2double() (or str2num()); the reverse conversion, number to characters, is num2str()

############### FOR THE SLURM CLUSTER ###########################################

# loading R on the slurm cluster -> module load system/R-3.5.3
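
The same module line can go into a batch script. The sketch below only writes such a script and checks its syntax; nothing is submitted. The job name, time limit, memory request and the script name analysis.R are all placeholders, and whether this cluster needs extra setup before module works inside a batch job is not something these notes establish.

```shell
# Work in a scratch directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write a minimal batch script; every value below is a placeholder
cat > run_r.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=r_job
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# Load R exactly as on the interactive command line
module load system/R-3.5.3

# analysis.R is a hypothetical script name
Rscript analysis.R
EOF

# Check the syntax without running anything; submit by hand with:
#   sbatch run_r.sh
if bash -n run_r.sh; then syntax=ok; else syntax=bad; fi
echo "syntax check: $syntax"
```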

############### MISCELLANEOUS #################################################

Highlighting elements in a text file -> Did you know that Notepad++ lets you define your own language for syntax highlighting (User Defined Language)?

Cygwin is particularly useful for handling files with Unix commands from the Windows console.

To give Unix commands priority when they conflict with a Windows command (for example the "find" command), the Cygwin path has to be AT THE FRONT of the PATH environment variable, for example:

C:\cygwin64\bin\;C:\ProgramData\Oracle\Java\javapath;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;%SYSTEMROOT%\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Intel\OpenCL SDK\2.0\bin\x86;C:\Program Files (x86)\Intel\OpenCL SDK\2.0\bin\x64;C:\Program Files\Intel\DMIX;C:\Program Files (x86)\MATLAB\MATLAB Compiler Runtime\v714\runtime\win32;C:\Program Files (x86)\Skype\Phone\
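
The "first entry wins" rule is easy to check in a Unix shell, where PATH entries are separated by ":" rather than ";". The sketch below is illustrative only: mytool is a made-up command name, and the two directories are created in a scratch location.

```shell
# Two scratch directories, each holding a different program called "mytool"
workdir=$(mktemp -d)
mkdir "$workdir/first" "$workdir/second"

printf '#!/bin/sh\necho from-first\n'  > "$workdir/first/mytool"
printf '#!/bin/sh\necho from-second\n' > "$workdir/second/mytool"
chmod u+x "$workdir/first/mytool" "$workdir/second/mytool"

# The directory listed first is searched first, so its copy runs
front=$(PATH="$workdir/first:$workdir/second:$PATH"; mytool)

# Swap the order and the other copy runs instead
back=$(PATH="$workdir/second:$workdir/first:$PATH"; mytool)

echo "$front / $back"
```

The PATH changes are made inside command substitutions, so the PATH of the calling shell is left as it was.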

https://www.howtogeek.com/howto/41382/how-to-use-linux-commands-in-windows-with-cygwin/