Lab 1 - Inverted Index

Due Tuesday 9/7, 11:59pm

In this assignment, you will brush up on your Java skills and review Java's file I/O libraries and Collections framework. You will write a program that processes several text files and builds an inverted index. Your inverted index will be a data structure that stores a mapping from words to the documents in which those words were found.

Requirements

    1. You will design the inverted index data structure using any data structures available in the Java Collections framework. Think about efficiency! Insertion of new records should be fast. Also, given a word, finding its record should be fast.
    2. Your program will take as input a String denoting a directory on the user's computer. It will traverse the directory and all its subdirectories. For each text file found (you may assume you only process files with extension .txt), your program will process the file and add the appropriate data to the inverted index.
      1. For each word in the file, your program will store a record in the inverted index indicating the document in which the word appears and the position at which the word was found in the document.
    3. Your program will ignore all characters except letters and digits.
    4. The output of your program will be a text file named output.txt that contains the information in the inverted index.
    5. You will submit all of your code and class files in a jar called invertedindex.jar. Your program will be tested with the command below. If it does not run with that exact command, one letter grade will be deducted from your score.

java -cp invertedindex.jar Driver -d /My/Directory
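
As a starting point for thinking about the requirements above, here is a minimal sketch of one possible index layout built from the Collections framework. It is not the required design. The class name InvertedIndexSketch and its methods are hypothetical, and the sketch assumes that positions are 1-based word counts, that words are lowercased, and that only ASCII letters and digits are kept. The assignment does not specify any of these, so check them against the sample output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: word -> (document -> positions of the word in that document).
class InvertedIndexSketch {
    private final Map<String, Map<String, List<Integer>>> index = new TreeMap<>();

    /** Records that word occurs in document at the given position. */
    void add(String word, String document, int position) {
        index.computeIfAbsent(word, w -> new TreeMap<>())
             .computeIfAbsent(document, d -> new ArrayList<>())
             .add(position);
    }

    /** Returns the record for word, or an empty map if the word was never seen. */
    Map<String, List<Integer>> lookup(String word) {
        Map<String, List<Integer>> record = index.get(word);
        return record == null ? Collections.<String, List<Integer>>emptyMap() : record;
    }

    /** Drops every character except ASCII letters and digits. Lowercasing is an assumption. */
    static String normalize(String token) {
        return token.replaceAll("[^A-Za-z0-9]", "").toLowerCase();
    }

    /** Splits text on whitespace and indexes each non-empty word with a 1-based position. */
    void addText(String document, String text) {
        int position = 0;
        for (String token : text.split("\\s+")) {
            String word = normalize(token);
            if (!word.isEmpty()) {
                position++;
                add(word, document, position);
            }
        }
    }
}
```

A TreeMap keeps words in sorted order, which can make the output file easier to compare by eye, and gives logarithmic-time insertion and lookup. A HashMap gives expected constant-time insertion and lookup if ordering does not matter. Either choice is a judgment call you should be able to justify.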

Testing

  1. One test case has been provided for you in /home/public/cs212, though you should develop other test cases of your own. The lab1 directory there contains a sample directory, sampledir, and a sample output file, output.txt. If you run the following command from your home directory, your output should look very similar to the provided output.txt. The program should also take no more than a few seconds to run on this input.

java -cp invertedindex.jar Driver -d /home/public/cs212/lab1/sampledir
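
Both commands in this handout pass the directory with a -d flag and expect every .txt file beneath it to be processed. The sketch below shows one way that flag handling and the recursive traversal could look using java.nio.file. The class name TextFileFinder and both method names are hypothetical, and the sketch assumes that only a lowercase .txt extension counts. java.io.File with manual recursion would work equally well.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: argument parsing and recursive .txt discovery.
class TextFileFinder {
    /** Returns the value following "-d" in args, or null if the flag is missing. */
    static String directoryFromArgs(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equals("-d")) {
                return args[i + 1];
            }
        }
        return null;
    }

    /** Returns every regular file under root, at any depth, whose name ends in .txt, sorted by path. */
    static List<Path> findTextFiles(Path root) throws IOException {
        // Files.walk holds open directory handles, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".txt"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

Sorting the paths makes the processing order, and therefore the output, the same from run to run, which helps when comparing against a sample output file.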

Submission Instructions