When an email collection grows into a large pile of documents, we often need to organize and analyze it to learn what it contains. An email client such as Thunderbird or Microsoft Outlook provides tools for the basic tasks: sending, retrieving, and organizing messages, and detecting spam. These tools are not designed for analyzing large email collections. For that, we need a dedicated analysis tool that reports information about the collection as a whole. Analyzing our email tells us who communicates with us, which groups of correspondents form based on email frequency and how many there are, what information is most frequently exchanged among participants, and which topics are discussed most.
Based on these requirements and problems, I have developed a dedicated tool with many features aimed specifically at analyzing large document collections. I call this application BuddyMiner. The first version of BuddyMiner can read only the mbox file format (the mailbox format in which Mozilla mail clients such as Thunderbird store email). BuddyMiner helps us find patterns of information, cluster emails automatically, produce statistical graphics of an email collection, and run information retrieval over the collection, among other things.
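BuddyMiner's own code is not described here, so the following is only a minimal sketch, assuming Python's standard-library mailbox module, of the kind of first step such a tool performs: reading an mbox file and counting messages per sender address, which is a basic input for the question of who communicates with us. The function name count_senders is hypothetical.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def count_senders(mbox_path: str) -> Counter:
    """Hypothetical helper: count messages per sender address in an mbox file."""
    counts: Counter = Counter()
    # create=False raises mailbox.NoSuchMailboxError if the file does not exist.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # parseaddr splits 'Name <addr>' into (name, addr); keep only the address.
            _, address = parseaddr(message.get("From", ""))
            if address:
                # Lowercase so 'Alice@example.com' and 'alice@example.com' count together.
                counts[address.lower()] += 1
    finally:
        box.close()
    return counts
```

For example, count_senders("Inbox").most_common(10) would list the ten most frequent senders in a Thunderbird mail folder file (the path is illustrative). Grouping correspondents or clustering topics would build on per-message data like this.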
Features
BuddyMiner is built on text mining techniques: clustering, information retrieval, and information extraction. This approach makes BuddyMiner suitable for any organization that wants to find hidden patterns of information in its email collection. BuddyMiner is designed to analyze documents in Indonesian and English. Picture 1 illustrates the main interface of BuddyMiner.
Picture 1. An Illustration of the BuddyMiner Interface