For this project, you will extend your previous project to support multithreading. You must make your inverted index data structure thread-safe using a custom lock class that allows multiple read operations and exclusive write operations. The building and searching of your inverted index must also be multithreaded using a work queue and thread pool.
The input and output requirements of this project are identical to the previous project. In addition to the normal testing of your project, you must also compare the runtimes of this project to your previous one.
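One way to approach the runtime comparison is to time the same workload several times and average the results. The following is only a sketch: the class name RuntimeComparison and the method averageMillis are hypothetical, and the provided unit tests may measure runtime differently.

```java
// Hypothetical helper for comparing runtimes; not part of the provided test code.
class RuntimeComparison {

    // Runs the given work the requested number of times and returns the
    // average wall-clock time per run in milliseconds.
    static double averageMillis(Runnable work, int runs) {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            work.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1000000.0 / runs;
    }
}
```

You could wrap a full build-and-search run of each project in a Runnable and compare the two averages. A single run is usually too noisy to be meaningful, so use several runs on the same inputs.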
The suggested deadline for this project is Monday, April 08, 2013 at 11:59pm. You must still meet the functionality requirements of the previous project.
In addition to the requirements of the previous project, you must extend your inverted index to support the following functionality:
Create a custom lock class that allows concurrent read operations, and exclusive/non-concurrent write and read/write operations.
Make your inverted index data structure thread-safe using your custom lock class.
Make the building of your inverted index multithreaded, such that each worker thread parses a single text file.
Make the searching of your inverted index multithreaded, such that each worker thread builds the search results for a single query (which may contain multiple words).
Make sure searching does not begin until your inverted index is finished being populated.
Use work queues/thread pools for the multithreading in this project. You may use the IBM developerWorks Work Queue implementation.
Exit gracefully without calling System.exit() when all of the building and searching operations are complete.
You may NOT use any of the classes in the java.util.concurrent package.
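The lock and thread-safety requirements above can be sketched as follows. This is a minimal illustration, not a required design: the class names SimpleReadWriteLock and ThreadSafeIndex, their method names, and the word-to-file-to-positions layout of the index are all assumptions. It relies only on synchronized, wait(), and notifyAll(), so it does not touch java.util.concurrent.

```java
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical lock: any number of readers, or exactly one writer.
class SimpleReadWriteLock {
    private int readers = 0;
    private boolean writing = false;

    public synchronized void lockRead() {
        waitWhileBlocked(false);
        readers++;
    }

    public synchronized void unlockRead() {
        readers--;
        if (readers == 0) {
            notifyAll(); // a waiting writer may now proceed
        }
    }

    public synchronized void lockWrite() {
        waitWhileBlocked(true);
        writing = true;
    }

    public synchronized void unlockWrite() {
        writing = false;
        notifyAll(); // wake waiting readers and writers
    }

    // Readers wait for an active writer; writers also wait for active readers.
    // Only called from synchronized methods, so the monitor is already held.
    private void waitWhileBlocked(boolean wantWrite) {
        boolean interrupted = false;
        while (writing || (wantWrite && readers > 0)) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, restore the flag afterwards
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }
}

// Hypothetical index wrapper (assumed layout: word -> file path -> positions).
class ThreadSafeIndex {
    private final TreeMap<String, TreeMap<String, TreeSet<Integer>>> index =
            new TreeMap<String, TreeMap<String, TreeSet<Integer>>>();
    private final SimpleReadWriteLock lock = new SimpleReadWriteLock();

    public void add(String word, String path, int position) {
        lock.lockWrite();
        try {
            TreeMap<String, TreeSet<Integer>> files = index.get(word);
            if (files == null) {
                files = new TreeMap<String, TreeSet<Integer>>();
                index.put(word, files);
            }
            TreeSet<Integer> positions = files.get(path);
            if (positions == null) {
                positions = new TreeSet<Integer>();
                files.put(path, positions);
            }
            positions.add(position);
        } finally {
            lock.unlockWrite(); // always release, even if an exception occurs
        }
    }

    public int count(String word, String path) {
        lock.lockRead();
        try {
            TreeMap<String, TreeSet<Integer>> files = index.get(word);
            if (files == null || !files.containsKey(path)) {
                return 0;
            }
            return files.get(path).size();
        } finally {
            lock.unlockRead();
        }
    }
}
```

Note that this sketch is not reentrant (a thread holding the write lock must not request it again) and a steady stream of readers can starve writers. Whether that matters depends on how your own index methods call each other.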
It is very important you take an iterative approach for this project. Here is a suggested breakdown of tasks:
Make the inverted index thread-safe using the synchronized keyword, without using a custom lock. Rerun all unit tests.
Make the building of the inverted index multithreaded using a work queue with a single thread. Rerun all unit tests.
Make the building of the inverted index multithreaded using a work queue with multiple threads. Rerun all unit tests.
Make the searching of the inverted index multithreaded using a work queue. Rerun all unit tests.
Make the inverted index thread-safe using your custom lock class, replacing the synchronized keyword with calls to your custom lock object. Rerun all unit tests.
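For the work queue steps above, the following sketch shows one possible shape for a queue that also tracks pending work. It is not the IBM developerWorks implementation; the class name SimpleWorkQueue and the methods execute, finish, and shutdown are hypothetical. The pending counter is one way to make sure searching does not start before building is done, and shutdown lets the worker threads end so the program can exit without System.exit().

```java
import java.util.LinkedList;

// Hypothetical minimal work queue built only on synchronized/wait/notifyAll.
class SimpleWorkQueue {
    private final LinkedList<Runnable> tasks = new LinkedList<Runnable>();
    private final Thread[] workers;
    private int pending = 0;          // tasks that are queued or still running
    private boolean shutdown = false;

    public SimpleWorkQueue(int threads) {
        workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    workLoop();
                }
            });
            workers[i].start();
        }
    }

    // Adds a task to the queue and wakes a waiting worker.
    public synchronized void execute(Runnable task) {
        if (shutdown) {
            throw new IllegalStateException("queue is shut down");
        }
        pending++;
        tasks.addLast(task);
        notifyAll();
    }

    // Blocks the caller until every submitted task has finished running.
    public synchronized void finish() {
        boolean interrupted = false;
        while (pending > 0) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    // Asks workers to exit once the queue is empty.
    public synchronized void shutdown() {
        shutdown = true;
        notifyAll();
    }

    private void workLoop() {
        while (true) {
            Runnable task;
            synchronized (this) {
                while (tasks.isEmpty() && !shutdown) {
                    try {
                        wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (tasks.isEmpty()) {
                    return; // shutdown requested and nothing left to do
                }
                task = tasks.removeFirst();
            }
            try {
                task.run(); // run outside the lock so workers execute in parallel
            } catch (RuntimeException e) {
                System.err.println("Task failed: " + e);
            } finally {
                synchronized (this) {
                    pending--;
                    if (pending == 0) {
                        notifyAll(); // wake any thread blocked in finish()
                    }
                }
            }
        }
    }
}
```

With a queue like this, a driver could submit one task per text file, call finish(), submit one task per query, call finish() again, and then call shutdown(). Starting with a single worker thread, as suggested in the breakdown, only requires passing 1 to the constructor.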
You should not be experiencing any significant slowdowns of your code. If you do at any step, you may want to re-evaluate your approach. However, to be eligible for code review, you only need to achieve correct results. Efficiency will be evaluated as part of the design process.
Your code must run on the lab computers. If you are developing your code on a home computer or laptop, be sure to check out your code on a lab computer and test it. Your main method must be placed in a class named Driver. This should be the only file that is specific to this project rather than generalized.
Your code will be tested using the following commands:
svn export https://www.cs.usfca.edu/svn/<username>/cs212/project3
cd project3
java -cp project3.jar Driver <arguments>
where <arguments> will be the following command-line arguments (in any order):
-d <directory> where -d indicates the next argument is a directory, and <directory> is the directory of text files that must be processed for the inverted index
-q <queryfile> where -q indicates the next argument is a file path, and <queryfile> is a text file containing search queries
-i <filename> where -i is an optional flag such that:
If present, you should output the inverted index to a file. If not present, do not output the inverted index.
If the <filename> is missing, you should use invertedindex.txt as the default filename.
-r <filename> where -r is an optional flag such that:
If present, you should output the search results to a file. If not present, do not output the search results.
If the <filename> is missing, you should use searchresults.txt as the default filename.
-t <threads> where -t indicates the next argument <threads> is the number of threads to use in the work queue/thread pool
If the proper command-line arguments are not provided, your program should output a user-friendly error message to the console and exit gracefully.
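Because the arguments may appear in any order and some flags have optional values, it can help to parse them into a map first. The sketch below is one possible approach; the class name ArgumentParser and its methods are hypothetical, and it assumes that any argument starting with a dash is a flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for flag/value pairs such as "-d input -i -t 5".
class ArgumentParser {
    private final Map<String, String> flags = new HashMap<String, String>();

    public ArgumentParser(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (!isFlag(args[i])) {
                continue; // stray value with no flag in front of it; ignored here
            }
            String flag = args[i];
            String value = null;
            // The next argument is this flag's value unless it is another flag.
            if (i + 1 < args.length && !isFlag(args[i + 1])) {
                value = args[i + 1];
                i++;
            }
            flags.put(flag, value);
        }
    }

    // Assumption: anything that starts with "-" and is longer than "-" is a flag.
    private static boolean isFlag(String arg) {
        return arg.startsWith("-") && arg.length() > 1;
    }

    public boolean hasFlag(String flag) {
        return flags.containsKey(flag);
    }

    // Returns the flag's value, or the default if the flag or its value is missing.
    public String getValue(String flag, String defaultValue) {
        String value = flags.get(flag);
        return (value == null) ? defaultValue : value;
    }
}
```

For the optional output flags, a driver would first check hasFlag("-i") and only then call getValue("-i", "invertedindex.txt"), so that no file is written when the flag is absent. Required flags such as -d still need their own check and a user-friendly error message when the value is missing.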
The output of your program will be identical to the previous project. Do NOT output the files invertedindex.txt and searchresults.txt unless the appropriate flags are specified.
You must submit your project to your SVN repository at:
https://www.cs.usfca.edu/svn/<username>/cs212/project3
where <username> should be replaced with your CS username. You should include the following files in this directory:
a jar file named project3.jar in all lowercase that includes all of the necessary *.class files to run your program
a src directory with all of the *.java files necessary to compile your program
a readme.txt file with your name, email address, student id, and a brief description/justification of your approach
If there are any issues with your submission, you will be asked to resubmit the project and a code review will not be performed.
You should thoroughly test your own code. Make sure it meets the functionality requirements, performs proper exception and error handling, and produces the correct output. Your code must be fully functional before submitting it for code review.
Several test input, output, and JUnit files have been provided to help you test your code. You may access these files on the lab computers at:
/home/public/cs212
These are the same input/output files as the previous project.
You MUST pass all of the Project 2 unit tests and Project 3 unit tests before moving on to Project 4. However, you may sign up for code review if you are passing all of the unit tests EXCEPT the testRuntime() Project 3 unit test.