# spaced seeds

*spaced k-mers, gapped q-grams, gapped k-mers, gapped n-mers, ...*

Note that I also try to keep a list of optimized spaced seed patterns for which the percent identity varies continuously: please contact me if you need specific values or specific models.

[1]

C.-A. Leimeister, J. Schellhorn, S. Schöbel, M. Gerth, C. Bleidorn, and B. Morgenstern, “Prot-SpaM: Fast alignment-free phylogeny reconstruction based on whole-proteome sequences,” *bioRxiv*, May 2018. [ DOI | http | .pdf ]

[2]

T. Dencker, C.-A. Leimeister, M. Gerth, C. Bleidorn, S. Snir, and B. Morgenstern, “Multi-SpaM: a Maximum-Likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees,” in *Proceedings of the 16th RECOMB international conference on Comparative Genomics, Magog-Orford (Canada)*, vol. 11183 of *Lecture Notes in Computer Science*, pp. 227-241, Springer, October 2018. [ DOI ]

[3]

S. Girotto, M. Comin, and C. Pizzi, “FSH: fast spaced seed hashing exploiting adjacent hashes,” *Algorithms for Molecular Biology*, vol. 13, March 2018. (earlier version in WABI 2017). [ DOI | http ]

[4]

D. E. K. Martin, “Minimal auxiliary Markov chains through sequential elimination of states,” *Communications in Statistics - Simulation and Computation*, February 2018. [ DOI | http ]

[5]

L. Mallet, T. Bitard-Feildel, F. Cerutti, and H. Chiapello, “PhylOligo: a package to identify contaminant or untargeted organism sequences in genome assemblies,” *Bioinformatics*, vol. 33, pp. 3283-3285, October 2017. [ DOI | http ]

[6]

S. Girotto, M. Comin, and C. Pizzi, “Metagenomic reads binning with spaced seeds,” *Theoretical Computer Science*, vol. 698, pp. 88-99, October 2017. [ DOI | http ]

[7]

S. Girotto, M. Comin, and C. Pizzi, “Fast spaced seed hashing,” in *Proceedings of the 17th International Workshop on Algorithms in Bioinformatics (WABI), Boston (USA)* (R. Schwartz and K. Reinert, eds.), vol. 88 of *Leibniz International Proceedings in Informatics (LIPIcs)*, pp. 7:1-7:14, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, August 2017. [ DOI | http ]

[8]

S. Girotto, M. Comin, and C. Pizzi, “Binning metagenomic reads with probabilistic sequence signatures based on spaced seeds,” in *Proceedings of the 12th IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Manchester (UK)*, August 2017. [ DOI | http ]

[9]

C.-A. Leimeister, S. Sohrabi-Jahromi, and B. Morgenstern, “Fast and accurate phylogeny reconstruction using filtered spaced-word matches,” *Bioinformatics*, vol. 33, pp. 971-979, April 2017. [ DOI | http ]

[10]

L. Noé, “Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds,” *Algorithms for Molecular Biology*, vol. 12, February 2017. [ DOI | http | .pdf ]

[11]

D. E. K. Martin and L. Noé, “Faster exact distributions of pattern statistics through sequential elimination of states,” *Annals of the Institute of Statistical Mathematics*, vol. 69, pp. 231-248, February 2017. [ DOI | http | .pdf ]

[12]

L. Hahn, C.-A. Leimeister, R. Ounit, S. Lonardi, and B. Morgenstern, “rasbhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison,” *PLoS Computational Biology*, vol. 12, p. e1005107, October 2016. [ DOI | http ]

[13]

R. Ounit and S. Lonardi, “Higher classification sensitivity of short metagenomic reads with CLARK-S,” *Bioinformatics*, vol. 32, pp. 3823-3825, August 2016. [ DOI | http ]

[14]

H. Chen, A. D. Smith, and T. Chen, “WALT: fast and accurate read mapping for bisulfite sequencing,” *Bioinformatics*, vol. 32, pp. 3507-3509, July 2016. [ DOI | http ]

[15]

J. Healy, “FLAK: Ultra-fast Fuzzy Whole Genome Alignment,” in *Proceedings of the 10th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB)*, vol. 477 of *Advances in Intelligent Systems and Computing*, pp. 123-131, Springer, June 2016. [ DOI | http ]

[16]

I. Sović, M. Šikić, A. Wilm, S. N. Fenlon, S. Chen, and N. Nagarajan, “Fast and sensitive mapping of nanopore sequencing reads with GraphMap,” *Nature Communications*, vol. 7, April 2016. [ DOI | http ]

[17]

R. Wang, Y. Xu, and B. Liu, “Recombination spot identification based on gapped k-mers,” *Nature Scientific Reports*, vol. 6, March 2016. RETRACTED: 20 March 2018. [ DOI | http ]

[18]

Y. Gheraibia, A. Moussaoui, Y. Djenouri, S. Kabir, P.-Y. Yin, and S. Mazouzi, “Penguin search optimisation algorithm for finding optimal spaced seeds,” *International Journal of Software Science and Computational Intelligence (IJSSCI)*, vol. 7, pp. 85-99, November 2015. [ DOI | http ]

[19]

P.-T. Do and C.-G. Tran-Thi, “An improvement of the overlap complexity in the spaced seed searching problem between genomic DNAs,” in *Proceedings of the 2nd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS), Ho Chi Minh City (Vietnam)*, pp. 271-276, IEEE Computer Society Press, September 2015. [ DOI | http ]

[20]

R. Ounit and S. Lonardi, “Higher classification accuracy of short metagenomic reads by discriminative spaced k-mers,” in *Proceedings of the 15th International Workshop on Algorithms in Bioinformatics (WABI), Atlanta (USA)*, vol. 9289 of *Lecture Notes in Bioinformatics*, pp. 286-295, Springer, August 2015. [ DOI | http ]

[21]

K. Břinda, M. Sykulski, and G. Kucherov, “Spaced seeds improve k-mer based metagenomic classification,” *Bioinformatics*, vol. 31, pp. 3584-3592, July 2015. [ DOI | http ]

[22]

I. Petrov, S. Brillet, E. Drezen, S. Quiniou, L. Antin, P. Durand, and D. Lavenier, “KLAST: fast and sensitive software to compare large genomic databanks on cloud,” in *Proceedings of the World Congress in Computer Science, Computer Engineering, and Applied Computing (WORLDCOMP), Las Vegas (USA)*, pp. 85-90, July 2015. [ .pdf ]

[23]

T. T. Tran, M. Giraud, and J.-S. Varré, “Perfect hashing structures for parallel similarity searches,” in *Proceedings of the 14th IEEE International Workshop on High Performance Computational Biology (HICOMB), Hyderabad, India*, pp. 332-341, May 2015. [ DOI | .pdf ]

[24]

L. Egidi and G. Manzini, “Multiple seeds sensitivity using a single seed with threshold,” *Journal of Bioinformatics and Computational Biology*, vol. 13, p. 1550011, March 2015. [ DOI | http ]

[25]

I. Birol, J. Chu, H. Mohamadi, S. D. Jackman, K. Raghavan, B. P. Vandervalk, A. Raymond, and R. L. Warren, “Spaced seed data structures for de novo assembly,” *International Journal of Genomics*, vol. 2015, p. ID 196591, March 2015. [ DOI | .pdf ]

[26]

B. Morgenstern, B. Zhu, S. Horwege, and C.-A. Leimeister, “Estimating evolutionary distances between genomic sequences from spaced-word matches,” *Algorithms for Molecular Biology*, vol. 10, February 2015. [ DOI | http | .pdf ]

[27]

L. Noé and D. E. K. Martin, “A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances,” *Journal of Computational Biology*, vol. 21, pp. 947-963, December 2014. [ DOI | http | http | http ]

[28]

B. Buchfink, C. Xie, and D. H. Huson, “Fast and sensitive protein alignment using DIAMOND,” *Nature Methods*, vol. 12, pp. 59-60, November 2014. [ DOI | .html ]

[29]

E. Giaquinta, K. Fredriksson, S. Grabowski, A. I. Tomescu, and E. Ukkonen, “Motif matching using gapped patterns,” *Theoretical Computer Science*, vol. 548, pp. 1-13, September 2014. [ DOI | http ]

[30]

M. Ghandi, M. Mohammad-Noori, and M. A. Beer, “Robust k-mer frequency estimation using gapped k-mers,” *Journal of Mathematical Biology*, vol. 69, pp. 469-500, August 2014. [ DOI | http | .pdf ]

[31]

M. Ghandi, D. Lee, M. Mohammad-Noori, and M. A. Beer, “Enhanced regulatory sequence prediction using gapped k-mer features,” *PLoS Computational Biology*, vol. 10, p. e1003711, July 2014. [ DOI | http ]

[32]

S. Horwege, S. Lindner, M. Boden, K. Hatje, M. Kollmar, C.-A. Leimeister, and B. Morgenstern, “Spaced words and kmacs: Fast alignment-free sequence comparison based on inexact word matches,” *Nucleic Acids Research*, vol. 42, pp. W7-W11, May 2014. [ DOI | http | .pdf ]

[33]

C.-A. Leimeister, M. Boden, S. Horwege, S. Lindner, and B. Morgenstern, “Fast alignment-free sequence comparison using spaced-word frequencies,” *Bioinformatics*, vol. 30, pp. 1991-1999, March-April 2014. [ DOI | http | .pdf ]

[34]

K. Břinda, “Languages of lossless seeds,” in *Proceedings of the 14th International Conference on Automata and Formal Languages (AFL), Szeged, Hungary* (Z. Ésik and Z. Fülöp, eds.), vol. 151 of *Electronic Proceedings in Theoretical Computer Science*, pp. 139-150, 2014. [ DOI | http | .pdf ]

[35]

J. Healy and D. Chambers, “Approximate k-mer matching using fuzzy hash maps,” *IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)*, vol. 11, pp. 258-264, March 2014. [ DOI | http ]

[36]

T. Gagie, G. Manzini, and D. Valenzuela, “Compressed spaced suffix arrays,” in *Proceedings of the 2nd International Conference on Algorithms for Big Data (ICABD), Palermo (Italy)*, vol. 1146 of *CEUR-WS*, pp. 37-45, 2014. [ .pdf ]

[37]

W. Li, B. Ma, and K. Zhang, “Optimizing spaced k-mer neighbors for efficient filtration in protein similarity search,” *IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)*, vol. 11, pp. 398-406, February 2014. [ DOI | http ]

[38]

L. Egidi and G. Manzini, “Spaced seeds design using perfect rulers,” *Fundamenta Informaticae*, vol. 131, pp. 187-203, March 2014. (earlier version in SPIRE 2011). [ DOI | http ]

[39]

M. C. Frith and L. Noé, “Improved search heuristics find 20 000 new alignments between human and mouse genomes,” *Nucleic Acids Research*, vol. 42, p. e59, February 2014. [ DOI | http | .pdf ]

[40]

L. Egidi and G. Manzini, “Design and analysis of periodic multiple seeds,” *Theoretical Computer Science*, vol. 522, pp. 62-76, February 2014. [ DOI | http ]

[41]

A. M. S. Shrestha, M. C. Frith, and P. Horton, “A bioinformatician's guide to the forefront of suffix array construction algorithms,” *Briefings in Bioinformatics*, vol. 15, pp. 138-154, January 2014. [ DOI | http | .pdf ]

[42]

M. Boden, M. Schöneich, S. Horwege, S. Lindner, C. Leimeister, and B. Morgenstern, “Alignment-free sequence comparison with spaced *k*-mers,” in *Proceedings of the German Conference on Bioinformatics (GCB)*, vol. 34 of *OpenAccess Series in Informatics (OASIcs)*, pp. 24-34, September 2013. [ DOI | .pdf ]

[43]

T. Onodera and T. Shibuya, “The gapped spectrum kernel for support vector machines,” in *Proceedings of the International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM)*, vol. 7988 of *Lecture Notes in Computer Science*, pp. 1-15, Springer, April 2013. [ DOI | http | .pdf ]

[44]

L. Egidi and G. Manzini, “Better spaced seeds using quadratic residues,” *Journal of Computer and System Sciences*, vol. 79, pp. 1144-1155, November 2013. [ DOI | http ]

[45]

L. Ilie, H. Mohamadi, G. Brian Golding, and W. F. Smyth, “BOND: Basic OligoNucleotide Design,” *BMC Bioinformatics*, vol. 14, February 2013. [ DOI | http | .pdf ]

[46]

M. Hou, L. Zhang, and R. S. Harris, “Alignment seeding strategies using contiguous pyrimidine purine matches,” in *Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (BCB), Orlando (USA)*, pp. 384-391, October 2012. [ DOI | http ]

[47]

W. Li, B. Ma, and K. Zhang, “Efficient filtration for similarity search with spaced k-mer neighbors,” in *Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Philadelphia (USA)*, pp. 11-16, IEEE Computer Society Press, October 2012. [ DOI | http ]

[48]

T. Marschall, I. Herms, H.-M. Kaltenbach, and S. Rahmann, “Probabilistic arithmetic automata and their applications,” *IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)*, vol. 9, pp. 1737-1750, December 2012. [ DOI | http ]

[49]

D. Do Duc, H. Q. Dinh, T. H. Dang, K. Laukens, and X. H. Hoang, “AcoSeeD: An ant colony optimization for finding optimal spaced seeds in biological sequence search,” in *Proceedings of the 8th International Conference on Swarm Intelligence (ANTS), Brussels (Belgium)*, vol. 7461 of *Lecture Notes in Computer Science*, pp. 204-211, Springer, September 2012. [ DOI | http | .pdf ]

[50]

S. Ilie, “Efficient computation of spaced seeds,” *BMC Research Notes*, vol. 5, February 2012. [ DOI | http | .pdf ]

[51]

M. Startek, S. Lasota, M. Sykulski, A. Bulak, L. Noé, G. Kucherov, and A. Gambin, “Efficient alternatives to PSI-BLAST,” *Bulletin of the Polish Academy of Sciences: Technical Sciences*, vol. 60, pp. 495-505, December 2012. [ DOI | http | .pdf ]

[52]

M. Pellegrini, M. E. Renda, and A. Vecchio, “Ab initio detection of fuzzy amino acid tandem repeats in protein sequences,” *BMC Bioinformatics*, vol. 13, p. S8, March 2012. [ DOI | http ]

[53]

M. David, M. Dzamba, D. Lister, L. Ilie, and M. Brudno, “SHRiMP2: Sensitive yet practical short read mapping,” *Bioinformatics*, vol. 27, pp. 1011-1012, April 2011. [ DOI | http ]

[54]

E. Bao, T. Jiang, I. Kaloshian, and T. Girke, “SEED: efficient clustering of next-generation sequences,” *Bioinformatics*, vol. 27, pp. 2502-2509, August 2011. [ DOI | http | .pdf ]

[55]

L. Egidi and G. Manzini, “Spaced seeds design using perfect rulers,” in *Proceedings of the 18th International Symposium on String Processing and Information Retrieval (SPIRE), Pisa (Italy)*, vol. 7024 of *Lecture Notes in Computer Science*, pp. 32-43, Springer, October 2011. [ DOI | http | .pdf ]

[56]

K. Chen, K. She, and Q. Zhu, “Overlap digraph: An effective model for finding good spaced seeds for biological sequence local alignment,” *Chinese Science Bulletin*, vol. 56, pp. 1100-1107, April 2011. [ DOI | http | .pdf ]

[57]

L. Ilie, S. Ilie, and A. Mansouri Bigvand, “SpEED: fast computation of sensitive spaced seeds,” *Bioinformatics*, vol. 27, pp. 2433-2434, September 2011. [ DOI | http | .pdf ]

[58]

L. Ilie, S. Ilie, S. Khoshraftar, and A. Mansouri Bigvand, “Seeds for effective oligonucleotide design,” *BMC Genomics*, vol. 12, p. 280, June 2011. [ DOI | http | .pdf ]

[59]

A. Gambin, S. Lasota, M. Startek, M. Sykulski, L. Noé, and G. Kucherov, “Subset seed extension to Protein BLAST,” in *Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS 2011), January 26-29 2011, Rome (Italy)*, pp. 149-158, SciTePress Digital Library, January 2011. [ DOI | http ]

[60]

S. M. Kielbasa, R. Wan, K. Sato, P. Horton, and M. C. Frith, “Adaptive seeds tame genomic sequence comparison,” *Genome Research*, vol. 21, pp. 487-493, March 2011. [ DOI | http | .pdf ]

[61]

M. Crochemore and G. Tischler, “The gapped suffix array: A new index structure for fast approximate matching,” in *Proceedings of the 17th International Symposium on String Processing and Information Retrieval (SPIRE), Los Cabos (Mexico)* (E. Chavez and S. Lonardi, eds.), vol. 6393 of *Lecture Notes in Computer Science*, pp. 359-364, Springer, October 2010. [ DOI | http | .pdf ]

[62]

E. Giladi, J. Healy, G. Myers, C. Hart, P. Kapranov, D. Lipson, S. Roels, E. Thayer, and S. Letovsky, “Error tolerant indexing and alignment of short reads with covering template families,” *Journal of Computational Biology*, vol. 17, pp. 1397-1411, October 2010. [ DOI | http | http ]

[63]

L. Noé, M. Gîrdea, and G. Kucherov, “Designing efficient spaced seeds for SOLiD read mapping,” *Advances in Bioinformatics*, vol. 2010, p. ID 708501, July 2010. [ DOI | http | .pdf ]

[64]

L. Zhou, I. Mihai, and L. Florea, “Spaced seeds for cross-species cDNA-to-genome sequence alignment,” *Communications in Information and Systems*, vol. 10, no. 2, pp. 115-136, 2010. [ http | .pdf ]

[65]

L. Noé, M. Gîrdea, and G. Kucherov, “Seed design framework for mapping SOLiD reads,” in *Proceedings of the 14th Annual International Conference on Research in Computational Molecular Biology (RECOMB), April 25-28, 2010, Lisbon (Portugal)* (B. Berger, ed.), vol. 6044 of *Lecture Notes in Computer Science*, pp. 384-396, Springer, April 2010. [ DOI | http | http | http ]

[66]

W.-H. Chung and S.-B. Park, “Hit integration for identifying optimal spaced seeds,” *BMC Bioinformatics - Selected articles from the 8th Asia-Pacific Bioinformatics Conference (APBC), 18-21 January, Bangalore, India*, vol. 11, p. S37, January 2010. [ DOI | http | .pdf ]

[67]

G. Battaglia, D. Cangelosi, R. Grossi, and N. Pisanti, “Masking patterns in sequences: A new class of motif discovery with don't cares,” *Theoretical Computer Science*, vol. 410, pp. 4327-4340, October 2009. [ DOI | http ]

[68]

W.-H. Chung and S.-B. Park, “An empirical study of choosing efficient discriminative seeds for oligonucleotide design,” *BMC Genomics*, vol. 10, p. S3, December 2009. [ DOI | http | .pdf ]

[69]

V.-H. Nguyen and D. Lavenier, “PLAST: parallel local alignment search tool for database comparison,” *BMC Bioinformatics*, vol. 10, p. 329, October 2009. [ DOI | http | .pdf ]

[70]

Y. Chen, T. Souaiaia, and T. Chen, “PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds,” *Bioinformatics*, vol. 25, pp. 2514-2521, October 2009. [ DOI | http | .pdf ]

[71]

K. Chen, Q. Zhu, F. Yang, and D. Tang, “An efficient way of finding good indel seeds for local homology search,” *Chinese Science Bulletin*, vol. 54, pp. 3837-3842, November 2009. [ DOI | http | .pdf ]

[72]

B. Ma and H. Yao, “Seed optimization for i.i.d. similarities is no easier than optimal Golomb ruler design,” *Information Processing Letters*, vol. 109, pp. 1120-1124, September 2009. (earlier version in APBC 2008). [ DOI | http ]

[73]

S. M. Rumble, P. Lacroute, A. V. Dalca, M. Fiume, A. Sidow, and M. Brudno, “SHRiMP: Accurate mapping of short color-space reads,” *PLoS Computational Biology*, vol. 5, p. e1000386, May 2009. [ DOI | http ]

[74]

L. Zhou, M. Pertea, A. L. Delcher, and L. Florea, “Sim4cc: A cross-species spliced alignment program,” *Nucleic Acids Research*, vol. 37, p. e80, May 2009. [ DOI ]

[75]

W. Li, B. Ma, and K. Zhang, “Amino acid classification and hash seeds for homology search,” in *Proceedings of the 1st International Conference in Bioinformatics and Computational Biology, BICoB 2009, New Orleans LA (USA)*, vol. 5462 of *Lecture Notes in Computer Science*, pp. 44-51, Springer, April 2009. [ DOI | http | .pdf ]

[76]

L. Ilie and S. Ilie, “Fast computation of neighbor seeds,” *Bioinformatics*, vol. 25, pp. 822-823, March 2009. [ DOI | http | .pdf ]

[77]

D. Y. Mak and G. Benson, “All hits all the time: parameter free calculation of seed sensitivity,” *Bioinformatics*, vol. 25, pp. 302-308, February 2009. (earlier version in APBC 2007). [ DOI | http | .pdf ]

[78]

M. A. Roytberg, A. Gambin, L. Noé, S. Lasota, E. Furletova, E. Szczurek, and G. Kucherov, “On subset seeds for protein alignment,” *IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)*, vol. 6, pp. 483-494, July 2009. [ DOI | http | http | http ]

[79]

K.-M. Chao and L. Zhang, *Sequence Comparison: Theory and Methods*, vol. 7 of *Computational Biology*. Springer, 2008. [ DOI | http ]

[80]

D. Lavenier, “Ordered index seed algorithm for intensive DNA sequence comparison,” in *IEEE International Symposium on Parallel and Distributed Processing (IPDPS)*, pp. 1-8, April 2008. [ DOI | http | .pdf ]

[81]

G. Benson and D. Y. Mak, “Exact distribution of a spaced seed statistic for DNA homology detection,” in *Proceedings of the 15th International Symposium on String Processing and Information Retrieval (SPIRE), Melbourne (Australia)* (A. Amir, A. Turpin, and A. Moffat, eds.), vol. 5280 of *Lecture Notes in Computer Science*, pp. 282-293, Springer, November 2008. [ DOI | http | .pdf ]

[82]

V.-H. Nguyen and D. Lavenier, “Speeding up subset seed algorithm for intensive protein sequence comparison,” in *Proceedings of the 6th IEEE International Conference on Research, Innovation & Vision for the Future*, pp. 57-63, July 2008. [ DOI | http ]

[83]

J. Yang and L. Zhang, “Run probabilities of seed-like patterns and identifying good transition seeds,” *Journal of Computational Biology*, vol. 15, pp. 1295-1313, December 2008. (earlier version in APBC 2008). [ DOI | http | http ]

[84]

F. Nicolas and É. Rivals, “Hardness of optimal spaced seed design,” *Journal of Computer and System Sciences*, vol. 74, pp. 831-849, August 2008. (earlier version in CPM 2005). [ DOI | http | .pdf ]

[85]

I. Herms and S. Rahmann, “Computing alignment seed sensitivity with probabilistic arithmetic automata,” in *Proceedings of the 8th International Workshop on Algorithms in Bioinformatics (WABI), Karlsruhe (Germany)*, vol. 5251 of *Lecture Notes in Bioinformatics*, pp. 318-329, Springer, September 2008. [ DOI | http | .pdf ]

[86]

H. Lin, Z. Zhang, M. Q. Zhang, B. Ma, and M. Li, “ZOOM! Zillions Of Oligos Mapped,” *Bioinformatics*, vol. 24, pp. 2431-2437, November 2008. [ DOI | http | .pdf ]

[87]

M. A. Roytberg, A. Gambin, L. Noé, S. Lasota, E. Furletova, E. Szczurek, and G. Kucherov, “Efficient seeding techniques for protein similarity search,” in *Bioinformatics Research and Development, Proceedings of the 2nd International Conference BIRD 2008, Vienna (Austria), July 7-9, 2008* (M. Elloumi, J. Küng, M. Linial, R. Murphy, K. Schneider, and C. Toma, eds.), vol. 13 of *Communications in Computer and Information Science*, pp. 466-478, Springer, July 2008. [ DOI | http | http | http ]

[88]

D. G. Brown, *Bioinformatics Algorithms: Techniques and Applications*, ch. A survey of seeding for sequence alignment, pp. 126-152. Wiley-Interscience (I. Mandoiu, A. Zelikovsky), February 2008. [ DOI ]

[89]

J. Yang and L. Zhang, “Run probability of high-order seed patterns and its applications to finding good transition seeds,” in *Proceedings of the 6th Asia Pacific Bioinformatics Conference (APBC), 14-17 January 2008, Kyoto, Japan* (A. Brazma, S. Miyano, and T. Akutsu, eds.), vol. 6 of *Advances in Bioinformatics and Computational Biology*, pp. 123-132, Imperial College Press, January 2008. [ DOI | http | .pdf ]

[90]

B. Ma and H. Yao, “Seed optimization is no easier than optimal Golomb ruler design,” in *Proceedings of the 6th Asia Pacific Bioinformatics Conference (APBC), 14-17 January 2008, Kyoto, Japan* (A. Brazma, S. Miyano, and T. Akutsu, eds.), vol. 6 of *Advances in Bioinformatics and Computational Biology*, pp. 133-144, Imperial College Press, January 2008. [ DOI | http | .pdf ]

[91]

L. Zhou, J. Stanton, and L. Florea, “Universal seeds for cDNA-to-genome comparison,” *BMC Bioinformatics*, vol. 9, p. 36, January 2008. [ DOI | http | .pdf ]

[92]

Z. Zhang, H. Lin, and M. Li, “Mango: multiple alignment with N gapped oligos,” *Journal of Bioinformatics and Computational Biology*, vol. 6, pp. 521-541, June 2008. [ DOI | .html | .pdf ]

[93]

R. S. Harris, *Improved pairwise alignment of genomic DNA*. PhD thesis, The Pennsylvania State University, December 2007. [ bib ]

[94]

Z. Zhang, H. Lin, and M. Li, “Mango: A new approach to multiple sequence alignment,” in *Proceedings of the 6th International Conference on Computational Systems Bioinformatics (CSB), San Diego (USA)*, vol. 6, pp. 237-247, August 2007. [ .html | .pdf ]

[95]

G. Kucherov, L. Noé, and M. A. Roytberg, “Subset seed automaton,” in *Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA), July 16-18, 2007, Prague (Czech Republic)* (J. Holub and J. Zdarek, eds.), vol. 4783 of *Lecture Notes in Computer Science*, pp. 180-191, Springer, July 2007. [ DOI | http | http | http ]

[96]

P. Peterlongo, L. Noé, D. Lavenier, G. Georges, J. Jacques, G. Kucherov, and M. Giraud, “Protein similarity search with subset seeds on a dedicated reconfigurable hardware,” in *Proceedings of the 2nd Workshop on Parallel Bio-Computing (PBC), September 9-12, 2007 Gdansk (Poland)* (R. Wyrzykowski, J. Dongarra, K. Karczewski, and J. Wasniewski, eds.), vol. 4967 of *Lecture Notes in Computer Science*, pp. 1240-1248, Springer, September 2008. [ DOI | http | .pdf ]

[97]

J.-E. Duchesne, M. Giraud, and N. El-Mabrouk, “Seed-based exclusion method for non-coding RNA gene search,” in *Proceedings of the 13th International Computing and Combinatorics Conference (COCOON)*, vol. 4598 of *Lecture Notes in Computer Science*, pp. 27-39, Springer, July 2007. [ DOI | http | .pdf ]

[98]

L. Ilie and S. Ilie, “Long spaced seeds for finding similarities between biological sequences,” in *Proceedings of the 2nd International Conference on Bioinformatics & Computational Biology (BIOCOMP)*, pp. 3-8, 2007. [ .pdf ]

[99]

L. Ilie and S. Ilie, “Multiple spaced seeds for homology search,” *Bioinformatics*, vol. 23, pp. 2969-2977, September 2007. [ DOI | http | .pdf ]

[100]

L. Ilie and S. Ilie, “Fast computation of good multiple spaced seeds,” in *Proceedings of the 7th International Workshop on Algorithms in Bioinformatics (WABI), Philadelphia (USA)*, vol. 4645 of *Lecture Notes in Bioinformatics*, pp. 346-358, Springer, September 2007. [ DOI | http | .pdf ]

[101]

L. Zhang, “Superiority of spaced seeds for homology search,” *IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)*, vol. 4, pp. 496-505, July 2007. [ DOI | http ]

[102]

X. Gao, S. C. Li, and Y. Lu, “New algorithms for the spaced seeds,” in *Frontiers of Algorithmic Workshop 2007 (FAW2007)*, vol. 4613 of *Lecture Notes in Computer Science*, pp. 51-61, Springer, August 2007. [ DOI | http | .pdf ]

[103]

S. Feng and E. R. Tillier, “A fast and flexible approach to oligonucleotide probe design for genomes and gene families,” *Bioinformatics*, vol. 23, pp. 1195-1202, May 2007. [ DOI | http | .pdf ]

[104]

Y. Kong, “Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search,” *Journal of Computational Biology*, vol. 14, pp. 238-254, March 2007. [ DOI | http | http ]

[105]

B. Ma and M. Li, “On the complexity of spaced seeds,” *Journal of Computer and System Sciences*, vol. 73, pp. 1024-1034, March 2007. [ DOI | http ]

[106]

M. Farach-Colton, G. M. Landau, S. Cenk Sahinalp, and D. Tsur, “Optimal spaced seeds for faster approximate string matching,” *Journal of Computer and System Sciences*, vol. 73, pp. 1035-1044, November 2007. [ DOI | http ]

[107]

L. Zhou and L. Florea, “Designing sensitive and specific spaced seeds for cross-species mRNA-to-genome alignment,” *Journal of Computational Biology*, vol. 14, pp. 113-130, March 2007. [ DOI | http | http ]

[108]

D. Y. Mak and G. Benson, “All hits all the time: parameter free calculation of seed sensitivity,” in *Proceedings of the 5th Asia Pacific Bioinformatics Conference (APBC)* (D. Sankoff, L. Wang, and F. Chin, eds.), vol. 5 of *Advances in Bioinformatics and Computational Biology*, pp. 327-340, Imperial College Press, January 2007. [ DOI | http | .pdf ]

[109]

M. Csűrös and B. Ma, “Rapid homology search with neighbor seeds,” *Algorithmica*, vol. 48, pp. 187-202, June 2007. (earlier version in COCOON 2005). [ DOI | http | .pdf ]

[110]

J. Xu, D. G. Brown, M. Li, and B. Ma, “Optimizing multiple spaced seeds for homology search,” *Journal of Computational Biology*, vol. 13, pp. 1355-1368, September 2006. (earlier version in CPM 2004). [ DOI | http | http ]

[111]

D. Y. Mak, Y. Gelfand, and G. Benson, “Indel seeds for homology search,” *Bioinformatics*, vol. 22, no. 14, pp. e341-e349, 2006. [ DOI | http | .pdf ]

[112]

A. E. Darling, T. J. Treangen, L. Zhang, C. Kuiken, X. Messeguer, and N. T. Perna, “Procrastination leads to efficient filtration for local multiple alignment,” in *Proceedings of the 6th International Workshop on Algorithms in Bioinformatics (WABI), Zürich (Switzerland)*, vol. 4175 of *Lecture Notes in Bioinformatics*, pp. 126-137, Springer, September 2006. [ DOI | http | .pdf ]

[113]

Y. Sun and J. Buhler, “Choosing the best heuristic for seeded alignment of DNA sequences,” *BMC Bioinformatics*, vol. 7, p. 133, March 2006. [ DOI | http | .pdf ]

[114]

M. Li, B. Ma, and L. Zhang, “Superiority and complexity of the spaced seeds,” in *Proceedings of the 17th Symposium on Discrete Algorithms (SODA)*, pp. 444-453, ACM Press, January 2006. [ DOI | http | .pdf ]

[115]

G. Kucherov, L. Noé, and M. A. Roytberg, “A unifying framework for seed sensitivity and its application to subset seeds,” *Journal of Bioinformatics and Computational Biology*, vol. 4, pp. 553-569, November 2006. [ DOI | .html | http | http ]

[116]

A. Pol and T. Kahveci, “Highly scalable and accurate seeds for subsequence alignments,” in *Proceedings of the IEEE 5th Symposium on Bioinformatics and Bioengineering (BIBE), Minneapolis (USA)*, pp. 27-31, IEEE Computer Society Press, October 2005. [ DOI | http ]

[117]

K. P. Choi and L. Zhang, “Analysis of spaced seed technique in sequence alignment,” *COSMOS*, vol. 1, pp. 57-73, May 2005. [ DOI | .html | .pdf ]

[118]

M. Fontaine, S. Burkhardt, and J. Kärkkäinen, “BDD-based analysis of gapped *q*-gram filters,” *International Journal of Foundations of Computer Science*, vol. 16, pp. 1121-1134, December 2005. (earlier version in PSC 2004). [ DOI | .ps.gz ]

[119]

M. Csűrös and B. Ma, “Rapid homology search with two-stage extension and daughter seeds,” in *Proceedings of the 11th International Computing and Combinatorics Conference (COCOON)*, vol. 3595 of *Lecture Notes in Computer Science*, pp. 104-114, Springer, August 2005. [ DOI | http | .pdf ]

[120]

F. P. Preparata, L. Zhang, and K. P. Choi, “Quick, practical selection of effective seeds for homology search,” *Journal of Computational Biology*, vol. 12, pp. 1137-1152, November 2005. [ DOI | http | http ]

[121]

J. Buhler, U. Keich, and Y. Sun, “Designing seeds for similarity search in genomic DNA,” *Journal of Computer and System Sciences*, vol. 70, no. 3, pp. 342-363, 2005. (earlier version in RECOMB 2003). [ DOI | http | .pdf ]

[122]

B. Brejová, D. G. Brown, and T. Vinař, “Vector seeds: An extension to spaced seeds,” *Journal of Computer and System Sciences*, vol. 70, no. 3, pp. 364-380, 2005. (earlier version in WABI 2003). [ DOI | http ]

[123]

Y. Sun and J. Buhler, “Designing multiple simultaneous seeds for DNA similarity search,” *Journal of Computational Biology*, vol. 12, no. 6, pp. 847-861, 2005. (earlier version in RECOMB 2004). [ DOI | http | http ]

[124]

M. Farach-Colton, G. M. Landau, S. Cenk Sahinalp, and D. Tsur, “Optimal spaced seeds for faster approximate string matching,” in *Proceedings of the 32nd International Colloquium on Automata, Languages and Programming (ICALP'05), Lisboa (Portugal)*, vol. 3580 of *Lecture Notes in Computer Science*, pp. 1251-1262, Springer, 2005. [ DOI | http | .pdf ]

[125]

F. Nicolas and É. Rivals, “Hardness of optimal spaced seed design,” in *Proceedings of the 16th Annual Symposium on Combinatorial Pattern Matching (CPM), Jeju Island (Korea)* (A. Apostolico, M. Crochemore, and K. Park, eds.), vol. 3537 of *Lecture Notes in Computer Science*, pp. 144-155, Springer, 2005. [ DOI | http | .pdf ]

[126]

D. G. Brown, “Optimizing multiple seeds for protein homology search,” *IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)*, vol. 2, pp. 29-38, January 2005. (earlier version in WABI 2004). [ DOI | http ]

[127]

D. Kisman, M. Li, B. Ma, and W. Li, “tPatternhunter: gapped, fast and sensitive translated homology search,” *Bioinformatics*, vol. 21, pp. 542-544, February 2005. [ DOI | http | .pdf ]

[128]

B. Brejová, *Evidence Combination in Hidden Markov Models for Gene Prediction*. PhD thesis, University of Waterloo, 2005. [ http | .pdf ]

[129]

L. Noé and G. Kucherov, “YASS: enhancing the sensitivity of DNA similarity search,” *Nucleic Acids Research*, vol. 33 (web-server issue), pp. W540-W543, April 2005. [ DOI | http | .pdf ]

[130]

G. Kucherov, L. Noé, and M. A. Roytberg, “A unifying framework for seed sensitivity and its application to subset seeds (extended abstract),” in *Proceedings of the 5th International Workshop on Algorithms in Bioinformatics (WABI), October 3-6, 2005, Mallorca (Spain)* (R. Casadio and G. Myers, eds.), vol. 3692 of *Lecture Notes in Computer Science*, pp. 251-263, Springer, October 2005. [ DOI | http | http | http ]

[131]

G. Kucherov, L. Noé, and M. A. Roytberg, “Multiseed lossless filtration,” *IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)*, vol. 2, pp. 51-61, January 2005. [ DOI | http | http | http ]

[132]

L. Noé, *Recherche de similarités dans les séquences d'ADN: modèles et algorithmes pour la conception de graines efficaces*. PhD thesis, Université Henri Poincaré - Nancy, September 2005. [ http | http ]

[133]

J. Flannick and S. Batzoglou, “Using multiple alignments to improve seeded local alignment algorithms,” *Nucleic Acids Research*, vol. 33, pp. 4563-4577, August 2005. [ DOI | http ]

[134]

P. Peterlongo, N. Pisanti, F. Boyer, and M.-F. Sagot, “Lossless filter for finding long multiple approximate repetitions using a new data structure, the bi-factor array,” in *Proceedings of the 12th International Conference on String Processing and Information Retrieval (SPIRE), Buenos Aires (Argentina)* (M. Consens and G. Navarro, eds.), vol. 3772 of *Lecture Notes in Computer Science*, pp. 179-190, November 2005. [ DOI | http | .pdf ]

[135]

D. G. Brown, M. Li, and B. Ma, “A tutorial of recent developments in the seeding of local alignment,” *Journal of Bioinformatics and Computational Biology*, vol. 2, no. 4, pp. 819-842, 2004. [ DOI | .html | .pdf ]

[136]

M. Fontaine, S. Burkhardt, and J. Kärkkäinen, “BDD-based analysis of gapped *q*-gram filters,” in *Proceedings of the 9th Prague Stringology Conference (PSC)*, pp. 56-68, 2004. [ .html | .pdf ]

[137]

D. G. Brown, “Multiple vector seeds for protein alignment,” in *Proceedings of the 4th International Workshop on Algorithms in Bioinformatics (WABI), Bergen (Norway)* (I. Jonassen and J. Kim, eds.), vol. 3240 of *Lecture Notes in Bioinformatics*, pp. 170-181, Springer, September 2004. [ DOI | http | .pdf ]

[138]

D. G. Brown and A. K. Hudek, “New algorithms for multiple DNA sequence alignment,” in *Proceedings of the 4th International Workshop on Algorithms in Bioinformatics (WABI), Bergen (Norway)* (I. Jonassen and J. Kim, eds.), vol. 3240 of *Lecture Notes in Bioinformatics*, pp. 314-325, Springer, September 2004. [ DOI | http | .pdf ]

[139]

X. Huang, L. Ye, H.-H. Chou, I.-H. Yang, and K.-M. Chao, “Efficient combination of multiple word models for improved sequence comparison,” *Bioinformatics*, vol. 20, no. 16, pp. 2529-2533, 2004. [ DOI | http | .pdf ]

[140]

U. Keich, M. Li, B. Ma, and J. Tromp, “On spaced seeds for similarity search,” *Discrete Applied Mathematics*, vol. 138, no. 3, pp. 253-263, 2004. (earlier version in 2002). [ DOI | http ]

[141]

M. Csűrös, “Performing local similarity searches with variable length seeds,” in *Proceedings of the 15th Annual Combinatorial Pattern Matching Symposium (CPM), Istanbul (Turkey)* (S. Sahinalp, S. Muthukrishnan, and U. Dogrusoz, eds.), vol. 3109 of *Lecture Notes in Computer Science*, pp. 373-387, Springer, 2004. [ DOI | http | .pdf ]

[142]

J. Xu, D. G. Brown, M. Li, and B. Ma, “Optimizing multiple spaced seeds for homology search,” in *Proceedings of the 15th Symposium on Combinatorial Pattern Matching (CPM), Istanbul (Turkey)* (S. Sahinalp, S. Muthukrishnan, and U. Dogrusoz, eds.), vol. 3109 of *Lecture Notes in Computer Science*, pp. 47-58, Springer, 2004. [ DOI | http | http ]

[143]

I.-H. Yang, S.-H. Wang, Y.-H. Chen, P.-H. Huang, L. Ye, X. Huang, and K.-M. Chao, “Efficient methods for generating optimal single and multiple spaced seeds,” in *Proceedings of the IEEE 4th Symposium on Bioinformatics and Bioengineering (BIBE), Taichung (Taiwan)*, pp. 411-416, IEEE Computer Society Press, 2004. [ DOI | http ]

[144]

B. Brejová, D. G. Brown, and T. Vinař, “Optimal spaced seeds for homologous coding regions,” *Journal of Bioinformatics and Computational Biology*, vol. 1, pp. 595-610, January 2004. [ DOI | .html | .pdf ]

[145]

M. Li, B. Ma, D. Kisman, and J. Tromp, “PatternHunter II: Highly sensitive and fast homology search,” *Journal of Bioinformatics and Computational Biology*, vol. 2, no. 3, pp. 417-439, 2004. (earlier version in GIW 2003). [ DOI | .html ]

[146]

Y. Sun and J. Buhler, “Designing multiple simultaneous seeds for DNA similarity search,” in *Proceedings of the 8th Annual International Conference on Research in Computational Molecular Biology (RECOMB), San Diego (California)*, pp. 76-84, March 2004. [ DOI | http ]

[147]

K. P. Choi, F. Zeng, and L. Zhang, “Good spaced seeds for homology search,” *Bioinformatics*, vol. 20, no. 7, pp. 1053-1059, 2004. [ DOI | http | .pdf ]

[148]

L. Noé and G. Kucherov, “Improved hit criteria for DNA local alignment,” *BMC Bioinformatics*, vol. 5, p. 149, October 2004. [ DOI | http | .pdf ]

[149]

G. Kucherov, L. Noé, and M. A. Roytberg, “Multi-seed lossless filtration (extended abstract),” in *Proceedings of the 15th Annual Combinatorial Pattern Matching Symposium (CPM), July 5-7, 2004, Istanbul (Turkey)* (S. Sahinalp, S. Muthukrishnan, and U. Dogrusoz, eds.), vol. 3109 of *Lecture Notes in Computer Science*, pp. 297-310, Springer, July 2004. [ DOI | http | http | http ]

[150]

G. Kucherov, L. Noé, and Y. Ponty, “Estimating seed sensitivity on homogeneous alignments,” in *Proceedings of the IEEE 4th Symposium on Bioinformatics and Bioengineering (BIBE), May 19-21, 2004, Taichung (Taiwan)*, pp. 387-394, IEEE Computer Society Press, April 2004. [ DOI | http | http | http ]

[151]

L. Noé and G. Kucherov, “Improved hit criteria for DNA local alignment,” in *Proceedings of the 5th Open Days in Biology, Computer Science and Mathematics (JOBIM), June 28-30, 2004, Montréal (Canada)*, June 2004. [ http | http ]

[152]

W. Chen and W.-K. Sung, “On half gapped seed,” *Genome Informatics*, vol. 14, pp. 176-185, 2003. (earlier version in GIW 2003). [ DOI | http | .pdf ]

[153]

B. Brejová, D. G. Brown, and T. Vinař, “Vector seeds: an extension to spaced seeds allows substantial improvements in sensitivity and specificity,” in *WABI*, vol. 2812 of *Lecture Notes in Computer Science*, pp. 39-54, Springer, September 2003. [ DOI | http | .pdf ]

[154]

K. P. Choi and L. Zhang, “Sensitivity analysis and efficient method for identifying optimal spaced seeds,” *Journal of Computer and System Sciences*, vol. 68, no. 1, pp. 22-40, 2004. [ DOI | http ]

[155]

J. Buhler, U. Keich, and Y. Sun, “Designing seeds for similarity search in genomic DNA,” in *Proceedings of the 7th Annual International Conference on Research in Computational Molecular Biology (RECOMB), Berlin (Germany)*, pp. 67-75, ACM Press, April 2003. [ DOI | .pdf ]

[156]

S. Schwartz, W. J. Kent, A. Smit, Z. Zhang, R. Baertsch, R. C. Hardison, D. Haussler, and W. Miller, “Human-mouse alignments with BLASTZ,” *Genome Research*, vol. 13, pp. 103-107, 2003. [ DOI | http ]

[157]

B. Ma, J. Tromp, and M. Li, “PatternHunter: Faster and more sensitive homology search,” *Bioinformatics*, vol. 18, no. 3, pp. 440-445, 2002. [ DOI | http | .pdf ]

[158]

B. Brejová, D. G. Brown, and T. Vinař, “Optimal spaced seeds for Hidden Markov Models, with application to homologous coding regions,” in *Proceedings of the 14th Symposium on Combinatorial Pattern Matching (CPM), Morelia (Mexico)* (M. C. R. Baeza-Yates, E. Chavez, eds.), vol. 2676 of *Lecture Notes in Computer Science*, pp. 42-54, Springer, June 2003. [ DOI | http | .pdf ]

[159]

S. Burkhardt and J. Kärkkäinen, “Better filtering with gapped *q*-grams,” *Fundamenta Informaticae*, vol. 56, no. 1-2, pp. 51-70, 2002. (earlier version in CPM 2001). [ http | .ps.gz ]

[160]

S. Burkhardt and J. Kärkkäinen, “One-gapped *q*-gram filters for Levenshtein Distance,” in *Proceedings of the 13th Symposium on Combinatorial Pattern Matching (CPM)*, vol. 2373 of *Lecture Notes in Computer Science*, pp. 225-234, Springer, 2002. [ DOI | http | .pdf ]

[161]

J. Buhler, “Provably sensitive indexing strategies for biosequence similarity search,” in *RECOMB, Washington DC (USA)*, pp. 90-99, ACM Press, April 2002. [ DOI | http ]

[162]

P. Nicodème, B. Salvy, and P. Flajolet, “Motif statistics,” *Theoretical Computer Science*, vol. 287, no. 2, pp. 593-617, 2002. [ DOI | http ]

[163]

S. Burkhardt and J. Kärkkäinen, “Better filtering with gapped *q*-grams,” in *Proceedings of the 12th Symposium on Combinatorial Pattern Matching (CPM)*, vol. 2089 of *Lecture Notes in Computer Science*, pp. 73-85, Springer, July 2001. [ DOI | http | .pdf ]

[164]

J. Buhler and M. Tompa, “Finding motifs using random projections,” in *Proceedings of the 5th Annual International Conference on Research in Computational Molecular Biology (RECOMB)*, pp. 69-76, ACM Press, 2001. [ DOI | http ]

[165]

J. Buhler, “Efficient large-scale sequence comparison by locality-sensitive hashing,” *Bioinformatics*, vol. 17, no. 5, pp. 419-428, 2001. [ DOI | http | .pdf ]

[166]

W. J. Kent and A. M. Zahler, “Conservation, regulation, synteny, and introns in a large-scale C. briggsae–C. elegans genomic alignment,” *Genome Research*, vol. 10, pp. 1115-1125, August 2000. [ DOI | http | .pdf ]

[167]

A. Califano and I. Rigoutsos, “FLASH: A fast look-up algorithm for string homology,” in *Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology (ISMB)*, pp. 56-64, July 1993. [ DOI | http | .pdf ]