SnC - Search and Concatenate 1.5
 - Combining files made easier.

A command-line tool for combining data files. It was written to generate reports that join gene lists with data and annotations in microarray research, but it is a general-purpose tool that can combine any delimited text data.


       

Search and Concatenate

Version: 1.5

Each line of the keyword-file supplies one keyword: the first word (first column) of that line. SnC searches the first column of the database-file for each keyword. The whole matching line is appended to the keyword line, and the result is printed to the console.

Syntax: snc [-c] [-m] [-r] [-b] [-h] [-sSeparator] keyword-file database-file

All input files should have keywords in the first column and use either tab-delimited format or a custom separator.
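
The matching described above can be sketched in Python. This is only an illustration, not SnC's actual source: the function name search_and_concatenate is hypothetical, and the default behaviour (case-insensitive comparison, surrounding spaces ignored, first match only, unmatched keyword lines printed unchanged) is an assumption inferred from the option descriptions and the sample result in this document.

```python
def search_and_concatenate(keyword_lines, database_lines, sep="\t"):
    """Append to each keyword line the first database line with the same first column."""
    # Index the database by its first column; setdefault keeps the FIRST line seen.
    first_match = {}
    for line in database_lines:
        line = line.rstrip("\n")
        key = line.split(sep, 1)[0].strip().lower()
        if key:  # skip blank lines
            first_match.setdefault(key, line)

    result = []
    for line in keyword_lines:
        line = line.rstrip("\n")
        key = line.split(sep, 1)[0].strip().lower()
        match = first_match.get(key)
        # Assumption: a keyword line without a match is emitted as it is.
        result.append(line if match is None else line + sep + match)
    return result


keywords = ["Probe_name\tM01", "GE102179\t0.97", "GE102186\t0.79"]
database = ["Probe_name\tAccession", "GE102179\tNM_030598"]
for row in search_and_concatenate(keywords, database):
    print(row)
```

Building a dictionary from the database once keeps the lookup for each keyword cheap, which matters when the database file is much larger than the keyword file.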

 Options:
   -c   Case sensitive: match keywords case-sensitively.
   -m   Multiple matches: display all matched lines. This may produce
           more lines than the keyword file contains. Without this
           option, only the first matched line is displayed.
   -r   Preserve leading and trailing spaces of keywords.
   -b   Remove blank lines in the keyword file.
   -sSeparator Set the column separator. Default is tab (\t).
   -h   Show this help.
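
How the -c, -r and -m options might change the comparison can be sketched as follows. The helper names normalize and matches_for are hypothetical and the code is an assumption about the behaviour described in the option list, not the tool's implementation.

```python
def normalize(keyword, case_sensitive=False, reserve_spaces=False):
    """Turn a first-column value into the key used for comparison."""
    if not reserve_spaces:   # without -r: ignore leading and trailing spaces
        keyword = keyword.strip()
    if not case_sensitive:   # without -c: ignore case
        keyword = keyword.lower()
    return keyword


def matches_for(keyword, database_lines, sep="\t", multiple=False, **opts):
    """Return matching database lines: all with multiple=True (-m), else only the first."""
    wanted = normalize(keyword, **opts)
    hits = [line for line in database_lines
            if normalize(line.split(sep, 1)[0], **opts) == wanted]
    return hits if multiple else hits[:1]


db = ["ge1\tfirst", "GE1\tsecond", "GE2\tthird"]
print(matches_for("GE1", db))                       # default: first match, case ignored
print(matches_for("GE1", db, multiple=True))        # like -m
print(matches_for("GE1", db, case_sensitive=True))  # like -c
```

With multiple=True one keyword line can yield several output lines, which is why the output may be longer than the keyword file.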

Usage: snc -c Index.txt Database.txt

Examples for input files:

>type Index.txt
        Probe_name      M01     M02     M03
        GE102179        0.97    1.02    0.56
        GE102180        0.76    0.99    0.77
        GE102182        0.81    0.74    1.23
        GE102183        1.01    0.91    0.73
        GE102184        0.99    0.77    1.25
        GE102186        0.79    0.72    1.12
        GE102187        1.25    1.07    1.11
        GE102188        1.11    1.26    1.13
        .....

>type Database.txt
        Probe_name      Accession       UGCluster       Symbol
        GE102179        NM_030598       Mm.251242       Dscr1l1
        GE102180        NM_027248       Mm.92391        Zfp219
        GE102182        NM_019439       Mm.32191        Gabbr1
        GE102183        NM_011110       Mm.23347        Pla2g5
        GE102184        AI390431        Mm.309707       Sep3
        GE102187        NM_009855       Mm.89474        Cd80
        GE102188        NM_029702       Mm.87720        Arfrp1
        GE102189        NM_021389       Mm.286495       Sh3kbp1
        GE102190        NM_008975       Mm.153891       Ptp4a3
        GE102191        NM_020618       Mm.27330        Smarce1
        GE102192        NM_183171       Mm.52641        Fez1
        ......

The command would be:

>snc Index.txt Database.txt > result.txt

The result would be:

>type result.txt

Probe_name      M01     M02     M03     Probe_name      Accession       UGCluster       Symbol
GE102179        0.97    1.02    0.56    GE102179        NM_030598       Mm.251242       Dscr1l1
GE102180        0.76    0.99    0.77    GE102180        NM_027248       Mm.92391        Zfp219
GE102182        0.81    0.74    1.23    GE102182        NM_019439       Mm.32191        Gabbr1
GE102183        1.01    0.91    0.73    GE102183        NM_011110       Mm.23347        Pla2g5
GE102184        0.99    0.77    1.25    GE102184        AI390431        Mm.309707       Sep3
GE102186        0.79    0.72    1.12
GE102187        1.25    1.07    1.11    GE102187        NM_009855       Mm.89474        Cd80
GE102188        1.11    1.26    1.13    GE102188        NM_029702       Mm.87720        Arfrp1
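
In the result above, GE102186 has no entry in Database.txt, so its line carries only the original keyword columns. A small Python sketch for listing such unmatched keywords in a result file follows. The function name unmatched_keywords is hypothetical, and it assumes that every keyword line has the same number of columns (four in this example).

```python
def unmatched_keywords(result_lines, keyword_columns, sep="\t"):
    """Return the keywords whose result line has no database columns appended."""
    missing = []
    for line in result_lines:
        fields = line.rstrip("\n").split(sep)
        # A non-blank line with no extra columns had no database match.
        if fields != [""] and len(fields) <= keyword_columns:
            missing.append(fields[0])
    return missing


rows = [
    "GE102184\t0.99\t0.77\t1.25\tGE102184\tAI390431\tMm.309707\tSep3",
    "GE102186\t0.79\t0.72\t1.12",
    "GE102187\t1.25\t1.07\t1.11\tGE102187\tNM_009855\tMm.89474\tCd80",
]
print(unmatched_keywords(rows, keyword_columns=4))
```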

 

Version: This is an alpha version still under development. Please feel free to send your comments to drakkarsoft@gmail.com

 

Download: SnC.exe (20 KB)

 

Requires:  Microsoft .NET Framework Version 2.0

                 

Install / uninstall: This is "green" (portable) software. Just put it wherever you like, e.g. "C:\PathwayDiagrammer", create a shortcut on your desktop, and run it. If you don't like it, just move it to your "Recycle Bin". Nothing will remain on your system.