Mailing Lists

    • DDLBETA: An invitation-only mailing list for discussions of text classification, text mining, and related issues.
    • SIG-IRList: A moderated, regularly issued source of information retrieval (IR) news.
    • DBWorld: A mailing list for messages of interest to the database research community.
    • KDNuggets: A bi-weekly electronic newsletter focusing on data mining and knowledge discovery.

Software Documents and Manuals

Crawlers

    • Heritrix: A web crawler written in Java.
    • Larbin: Another web crawler, written in C++.
    • Sitemap: A crawling protocol supported by Google, Yahoo! and MSN that goes beyond robots.txt by letting a site list the URLs it wants crawled.
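
To make the relationship between Sitemap and robots.txt concrete, here is a minimal Python sketch of how a crawler might combine the two: it reads the URLs a site advertises in a sitemap and keeps only those that robots.txt permits. The sample sitemap, the sample robots.txt, the helper names sitemap_urls and allowed_urls, and the user agent "example-crawler" are all illustrative assumptions; none of them come from Heritrix, Larbin, or any search engine's actual implementation.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# XML namespace defined by the Sitemap 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap and robots.txt contents, inlined so the sketch
# runs without network access.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2008-01-01</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/private/report.html</loc>
  </url>
</urlset>
"""

EXAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def allowed_urls(xml_text, robots_text, agent="example-crawler"):
    """Keep only the sitemap URLs that robots.txt lets `agent` fetch."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    return [url for url in sitemap_urls(xml_text) if rules.can_fetch(agent, url)]


if __name__ == "__main__":
    for url in allowed_urls(EXAMPLE_SITEMAP, EXAMPLE_ROBOTS):
        print(url)
```

The sitemap tells the crawler what the site would like indexed, while robots.txt still decides what may be fetched, so this sketch treats robots.txt as the final filter.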

"It is a very sad thing that nowadays there is so little useless information." -- Oscar Wilde
