Programmer's notes
This page collects tips and tricks that took a while to figure out, mainly for R, bash (Unix shell), Matlab or Perl.
############### IN THE "R" LANGUAGE (in ECLIPSE) ####################################
# writing a csv from R without column names (col.names=F) -> write.csv() ignores col.names (with a warning); switch to write.table(x, file, sep=",", col.names=F), which write.csv() calls anyway, and you're golden
# wondering about the structure of a particular variable named "a" -> str(a)
# looking to get a string after a pattern -> regexpr() is what you need (regmatches() then extracts the matched text)
# finding a string incrementally -> Use Edit > Incremental Find Next (Ctrl+J) or Edit > Incremental Find Previous (Ctrl+Shift+J) to enter the incremental find mode, and start typing the string to match. Matches are found incrementally as you type. The search string is shown in the status line. Press Ctrl+J or Ctrl+Shift+J to go to the next or previous match. Press Enter or Esc to exit incremental find mode.
# converting numeric to string -> formatC(Vect, digits=3, width=7, format="f", flag="0") or sprintf("%06.3f", c(12.234, 234.5675, 1.5))
############### IN THE BASH LANGUAGE ############################################
# text compatibility issues (Windows CR+LF vs Unix LF line endings) -> dos2unix
# deleting a job on the cluster -> qdel (qstat might also help, to find the job ID)
# need to get the right to execute files -> chmod u+x *
# autocompletion -> tab
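As a rough illustration of the line-ending and permission tips above, the sketch below builds a throwaway script with Windows (CR+LF) line endings, strips the carriage returns and makes the result executable. The file names and the scratch directory are made up for the example, and `tr -d '\r'` is only a stand-in that should behave like dos2unix on simple text files when dos2unix is not installed.

```shell
# Work in a scratch directory so nothing real is touched (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small script with Windows-style CR+LF line endings.
printf '#!/bin/sh\r\necho hello\r\n' > demo.sh

# A literal carriage return, used to build the grep pattern portably.
cr=$(printf '\r')

# Count the lines that end with a carriage return.
crlf_before=$(grep -c "${cr}\$" demo.sh || true)

# Strip the carriage returns ("dos2unix demo.sh" would do this in place).
tr -d '\r' < demo.sh > demo_unix.sh
crlf_after=$(grep -c "${cr}\$" demo_unix.sh || true)

# Give the owner the right to execute the file, then run it.
chmod u+x demo_unix.sh
output=$(./demo_unix.sh)

echo "CRLF lines before: $crlf_before, after: $crlf_after, output: $output"
```

Running the original demo.sh directly would most likely fail, because the carriage return becomes part of the interpreter name on the first line.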
(1) counting number of sequences in a fasta file: grep -c "^>" file.fa
(2) add something to end of all header lines: sed 's/>.*/&WHATEVERYOUWANT/' file.fa > outfile.fa
(3) clean up a fasta file so that only the first column of each header is kept: awk '{print $1}' file.fa > output.fa
Here are some other cool ones: http://chrisduran.eu/bioinformatics/linux-and-osx-commands-for-working-with-fasta-files/
(4) linearize a fasta file (each sequence on a single line), with awk only so that it does not depend on GNU grep options: awk '/^>/{if (NR>1) printf("\n"); print; next} {printf("%s",$0)} END{printf("\n")}' input.fasta > joinedinput.fasta
(5) add an index behind each sequence ID in a fasta file: awk '/>/{print $0(++i)}!/>/' file.fasta
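To see what the fasta one-liners above do, here is a small sketch that chains a few of them on a toy file. The file names, the sequence IDs (geneA, geneB) and the scratch directory are invented for the example, and the linearizing step uses an awk-only form that should give the same result as (4) on well-formed input.

```shell
# Work in a scratch directory (hypothetical names throughout).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy fasta file: two records, sequences wrapped over two lines each.
cat > toy.fa <<'EOF'
>geneA first record
ACGT
TTGA
>geneB second record
GGCC
AA
EOF

# Count the sequences (one header line per sequence).
n_seq=$(grep -c "^>" toy.fa)

# Keep only the first column of each header.
awk '{print $1}' toy.fa > short.fa

# Linearize: one header line followed by one sequence line.
awk '/^>/{if (NR>1) printf("\n"); print; next} {printf("%s",$0)} END{printf("\n")}' short.fa > linear.fa

# Append a running index to each sequence ID.
awk '/>/{print $0(++i)}!/>/' linear.fa > indexed.fa

echo "sequences: $n_seq"
cat indexed.fa
```

On this toy input the last file should contain the headers >geneA1 and >geneB2, each followed by its sequence on a single line.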
############### IN THE MATLAB LANGUAGE #########################################
# want to use a value as a number but it is stored as characters -> str2double() or str2num() (num2str() does the reverse, number to characters)
############### FOR THE SLURM CLUSTER ###########################################
# loading R on the slurm cluster -> module load system/R-3.5.3
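A minimal sketch of how that module load line might sit inside a Slurm batch script. Only the module name comes from the note above; the job name, the resource values, the log file name and my_script.R are placeholders to adapt. The sketch writes the batch script to a file so it can be checked before submitting.

```shell
# Write a minimal batch script to run_r.sh in a scratch directory
# (all names and values below are placeholders, not a real setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > run_r.sh <<'EOF'
#!/bin/bash
# Hypothetical job name, wall-time limit and memory request.
#SBATCH --job-name=r_job
#SBATCH --time=01:00:00
#SBATCH --mem=4G
# %j expands to the job ID in the log file name.
#SBATCH --output=r_job_%j.log

# Load R, then run the (hypothetical) script.
module load system/R-3.5.3
Rscript my_script.R
EOF

# On the cluster, the job would then be submitted with: sbatch run_r.sh
echo "batch script written to $workdir/run_r.sh"
```

Once the job is submitted, squeue and scancel play the roles that qstat and qdel have on other schedulers.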
############### MISCELLANEOUS #################################################
Highlighting elements in a text file -> Did you know that Notepad++ has a user-defined language feature?
Cygwin is particularly useful for handling files with Unix commands from the Windows console.
To give Unix commands priority when they conflict with a Windows command (for example the "find" command), the Cygwin path has to be AT THE FRONT of the PATH environment variable, for example:
C:\cygwin64\bin\;C:\ProgramData\Oracle\Java\javapath;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;%SYSTEMROOT%\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Intel\OpenCL SDK\2.0\bin\x86;C:\Program Files (x86)\Intel\OpenCL SDK\2.0\bin\x64;C:\Program Files\Intel\DMIX;C:\Program Files (x86)\MATLAB\MATLAB Compiler Runtime\v714\runtime\win32;C:\Program Files (x86)\Skype\Phone\
https://www.howtogeek.com/howto/41382/how-to-use-linux-commands-in-windows-with-cygwin/
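The "first directory wins" rule behind the PATH tip above can be tried out in any Unix-like shell, Cygwin included. The sketch below is only an illustration: the command name mytool and the two directories are invented, and it uses the shell's ":" separator where the Windows variable uses ";".

```shell
# Two scratch directories, each holding a different program with the same name.
workdir=$(mktemp -d)
mkdir "$workdir/first" "$workdir/second"

printf '#!/bin/sh\necho "from first"\n'  > "$workdir/first/mytool"
printf '#!/bin/sh\necho "from second"\n' > "$workdir/second/mytool"
chmod u+x "$workdir/first/mytool" "$workdir/second/mytool"

# The command is looked up in the PATH directories from left to right,
# so whichever directory is listed first provides "mytool".
result_a=$(PATH="$workdir/first:$workdir/second:$PATH"; mytool)
result_b=$(PATH="$workdir/second:$workdir/first:$PATH"; mytool)

echo "first listed first:  $result_a"
echo "second listed first: $result_b"
```

This is why putting the Cygwin bin directory ahead of the system32 directory makes the Unix "find" run instead of the Windows one.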