You can find some general hints here about how to successfully pass Project 1 – Inverted Index and Project 2 – Partial Search. Note that each approach is different, so some of these hints may not apply.
This project involves creating a custom data structure: specifically, an inverted index. Think about a HashMap data structure. Does it have any string or file parsing logic within the class? It does not. The data structure focuses only on maintaining data: storing it, retrieving it, and outputting it. Other classes then use this data structure to store information and provide any string or file parsing necessary.
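As an illustration, here is a minimal sketch of such a class. The name InvertedIndex and its methods add, contains, and count are hypothetical, and the nested layout (word to location to sorted positions) is one common choice, not a requirement. Notice that it accepts already-cleaned words; reading files and splitting text belong in other classes.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * A minimal inverted index that maps each word to the locations where it appears,
 * and each location to the positions of that word. It only stores, retrieves, and
 * outputs data; it never opens files or splits text into words.
 */
class InvertedIndex {

    /** Nested mapping from word to location to sorted word positions. */
    private final TreeMap<String, TreeMap<String, TreeSet<Integer>>> index;

    /** Creates an empty inverted index. */
    public InvertedIndex() {
        index = new TreeMap<>();
    }

    /**
     * Adds one occurrence of a word.
     *
     * @param word the already-cleaned word to add
     * @param location where the word was found, such as a file path
     * @param position the position of the word within that location
     * @return true if the index changed as a result of this call
     */
    public boolean add(String word, String location, int position) {
        return index.computeIfAbsent(word, w -> new TreeMap<>())
                .computeIfAbsent(location, l -> new TreeSet<>())
                .add(position);
    }

    /**
     * Checks whether a word appears anywhere in the index.
     *
     * @param word the word to look up
     * @return true if the word has been added at least once
     */
    public boolean contains(String word) {
        return index.containsKey(word);
    }

    /**
     * Counts how many times a word appears in one location.
     *
     * @param word the word to look up
     * @param location the location to look in
     * @return the number of stored positions, or 0 if there are none
     */
    public int count(String word, String location) {
        Map<String, TreeSet<Integer>> locations = index.get(word);
        if (locations == null) {
            return 0;
        }
        TreeSet<Integer> positions = locations.get(location);
        return positions == null ? 0 : positions.size();
    }

    /**
     * Returns the index contents as text.
     *
     * @return a string such as {word={location=[1, 5]}}
     */
    @Override
    public String toString() {
        return index.toString();
    }
}
```

Because the class never parses anything, the same index can be filled from files, web pages, or test strings without changes.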
Remember that only the Driver class should be project-specific; any general-purpose code should be placed in other classes. The Driver class will usually handle argument parsing, decide which methods to call and when, and handle some exceptions.
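For example, the mechanics of argument parsing can themselves be generalized. The sketch below is a hypothetical ArgumentParser for "-flag value" pairs; the class name, its methods, and the flag format are assumptions. The Driver would still decide which flags matter and what to do with them, while this class could be reused in any project.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * A hypothetical, reusable parser for command-line arguments given as
 * "-flag value" pairs. It knows nothing about any particular project's flags.
 */
class ArgumentParser {

    /** Maps each flag to its value, or to null if the flag had no value. */
    private final Map<String, String> map;

    /**
     * Parses the command-line arguments.
     *
     * @param args the arguments passed to main
     */
    public ArgumentParser(String[] args) {
        map = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (isFlag(args[i])) {
                // A flag only takes the next argument as its value if that argument is not a flag.
                boolean hasValue = i + 1 < args.length && !isFlag(args[i + 1]);
                map.put(args[i], hasValue ? args[i + 1] : null);
            }
        }
    }

    /**
     * Checks whether text looks like a flag: a dash followed by at least one character.
     *
     * @param text the argument to check
     * @return true if the text is a flag
     */
    public static boolean isFlag(String text) {
        return text != null && text.length() > 1 && text.startsWith("-");
    }

    /**
     * Checks whether a flag was provided, with or without a value.
     *
     * @param flag the flag to look for
     * @return true if the flag was provided
     */
    public boolean hasFlag(String flag) {
        return map.containsKey(flag);
    }

    /**
     * Returns the value for a flag, or a default if the flag is missing or had no value.
     *
     * @param flag the flag to look up
     * @param defaultValue the value to use if none was given
     * @return the flag's value or the default
     */
    public String getString(String flag, String defaultValue) {
        String value = map.get(flag);
        return value == null ? defaultValue : value;
    }
}
```

Usage in a Driver might look like `new ArgumentParser(args).getString("-path", null)`, where "-path" is only an example flag name.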
You are expected to produce professional code, which includes following proper code style and commenting guidelines. Use Javadoc comments, and make sure you comment every method and member. Be descriptive, and use proper capitalization and grammar.
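A small, hypothetical class showing the kind of Javadoc expected: a class comment, a comment on the member, and comments on each method describing parameters, return values, and thrown exceptions. The class itself is only a style example.

```java
/**
 * Counts how many words have been seen so far. This class is a hypothetical
 * example of Javadoc style only.
 */
class WordCounter {

    /** The total number of words added so far. */
    private int total;

    /** Creates a counter that starts at zero. */
    public WordCounter() {
        total = 0;
    }

    /**
     * Adds words to the running total.
     *
     * @param words the number of words to add; must not be negative
     * @throws IllegalArgumentException if {@code words} is negative
     */
    public void add(int words) {
        if (words < 0) {
            throw new IllegalArgumentException("Word count must not be negative: " + words);
        }
        total += words;
    }

    /**
     * Returns the total number of words added so far.
     *
     * @return the running total
     */
    public int getTotal() {
        return total;
    }
}
```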
Make methods static when possible. A method can be static if it does not access any instance members or call any instance methods. If that describes one of your methods, declare it static. This makes it clear that the method does not depend on object state and lets you call it without creating an instance. It can also yield a small speedup, because calls to static methods are bound at compile time instead of dispatched at runtime, although in practice the difference is usually minor.
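A sketch of what this looks like in practice: hypothetical text-cleaning helpers that never touch instance state, so both are static, and the class has a private constructor so it cannot be instantiated. The class and method names are illustrative, and the cleaning rule (lowercase, keep only letters and whitespace) is an assumption, not a project requirement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical text-cleaning utilities. None of these methods read or write
 * instance members, so each one is declared static.
 */
final class TextCleaner {

    /** Prevents instantiation of this utility class. */
    private TextCleaner() {
    }

    /**
     * Lowercases text and removes every character that is not a letter or whitespace.
     *
     * @param text the text to clean
     * @return the cleaned text
     */
    public static String clean(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\s]+", "");
    }

    /**
     * Cleans text and splits it into words, skipping empty strings.
     *
     * @param text the text to parse
     * @return the words in order of appearance
     */
    public static List<String> parse(String text) {
        List<String> words = new ArrayList<>();
        for (String word : clean(text).split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }
}
```

Callers simply write `TextCleaner.parse(line)`; no object is ever created.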
Don't make all your variables instance members. Unless you need to store an attribute of the instance, try local variables instead. Instance members should almost always be declared private, initialized in the constructor, and properly encapsulated within your class. This takes quite a bit of extra work, so only make instance members when necessary.
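A hypothetical example of the distinction: the word frequencies are a lasting attribute of the object, so they are a private member initialized in the constructor, while values needed only during a single method call stay local. The class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * A hypothetical class that tracks how often each word has been seen.
 */
class WordStats {

    /**
     * Word frequencies. This is a lasting attribute of the object, so it is a
     * private instance member initialized in the constructor.
     */
    private final Map<String, Integer> counts;

    /** Creates an empty set of statistics. */
    public WordStats() {
        counts = new HashMap<>();
    }

    /**
     * Records each word in the array once.
     *
     * @param words the words to record
     */
    public void addAll(String[] words) {
        // "word" only matters during this loop, so it is a local variable.
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    /**
     * Finds a most frequent word. The running best is only needed during this
     * call, so it lives in local variables rather than instance members.
     *
     * @return a most frequent word (any one of them if tied), or null if none were recorded
     */
    public String mostFrequent() {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    /**
     * Returns how many times a word was recorded.
     *
     * @param word the word to look up
     * @return the count, or 0 if the word was never recorded
     */
    public int count(String word) {
        return counts.getOrDefault(word, 0);
    }
}
```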
Some of your classes may seem sparse for this project, but you'll likely fill in additional functionality in later projects.
Make sure you work on this project iteratively. Exactly how you break this project up can vary. I recommend focusing on partial search and search result sorting separately.
You know you will eventually have to sort your search results. We've examined some ways of doing this with Collections.sort(), Comparable, Comparator, and anonymous inner classes. Knowing this, you may want to create a class for storing search results. Each search result should have a total count attribute and an earliest position attribute. This will make sorting search results later much easier.
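A minimal sketch of such a class, assuming each result also records the location it refers to. The name SearchResult, the update method, and the sort order (more matches first, then earlier position, then location name, case-insensitively) are assumptions; check your project's specification for the required ordering.

```java
/**
 * A hypothetical search result for one location: how many times the query
 * words matched there, and the earliest position of any match.
 */
class SearchResult implements Comparable<SearchResult> {

    /** The location this result refers to, such as a file path. */
    private final String location;

    /** The total number of matches found in the location. */
    private int count;

    /** The earliest position of any match in the location. */
    private int position;

    /**
     * Creates an empty result for a location.
     *
     * @param location the location this result refers to
     */
    public SearchResult(String location) {
        this.location = location;
        this.count = 0;
        this.position = Integer.MAX_VALUE;
    }

    /**
     * Adds matches found in this location.
     *
     * @param matches the number of new matches
     * @param firstPosition the earliest position among the new matches
     */
    public void update(int matches, int firstPosition) {
        count += matches;
        position = Math.min(position, firstPosition);
    }

    /** @return the location this result refers to */
    public String getLocation() {
        return location;
    }

    /** @return the total number of matches */
    public int getCount() {
        return count;
    }

    /** @return the earliest match position */
    public int getPosition() {
        return position;
    }

    /**
     * Orders results by count (highest first), then position (earliest first),
     * then location (case-insensitive). This order is an assumption.
     */
    @Override
    public int compareTo(SearchResult other) {
        int result = Integer.compare(other.count, count);
        if (result == 0) {
            result = Integer.compare(position, other.position);
        }
        if (result == 0) {
            result = location.compareToIgnoreCase(other.location);
        }
        return result;
    }
}
```

With this in place, a `List<SearchResult>` can be sorted with a single call to `Collections.sort(results)`.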
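You may be tempted to use a tree-like collection, such as TreeMap or TreeSet, to sort search results. However, these collections do not behave well with (a) custom objects whose comparison is inconsistent with equals(), since elements that compare as equal are treated as duplicates and silently dropped, and (b) mutable objects whose sort attributes may change after insertion, since the tree does not reorder an element when those attributes change. A better approach is to use Collections.sort() once the search results have been generated.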
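One way to follow this advice, sketched under assumptions: keep results in a HashMap keyed by location while searching, update them freely, and sort a list of them only once at the end. The class ResultCollector, its nested Result class, and the ordering are hypothetical. The comparator is built with Comparator.comparingInt, one of the alternatives to Comparable mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * A hypothetical collector that updates mutable results while searching and
 * sorts them only once at the end.
 */
class ResultCollector {

    /** A minimal mutable result whose sort attributes change during a search. */
    static class Result {
        final String location;
        int count = 0;
        int position = Integer.MAX_VALUE;

        Result(String location) {
            this.location = location;
        }
    }

    /** Highest count first, then earliest position, then location (an assumed order). */
    static final Comparator<Result> ORDER = Comparator
            .comparingInt((Result r) -> r.count).reversed()
            .thenComparingInt(r -> r.position)
            .thenComparing(r -> r.location, String.CASE_INSENSITIVE_ORDER);

    /** Quick lookup from location to its result while the search runs. */
    private final Map<String, Result> lookup;

    /** Creates an empty collector. */
    ResultCollector() {
        lookup = new HashMap<>();
    }

    /**
     * Records matches for a location, creating its result on first use.
     * Mutating a result here is safe because nothing has been sorted yet.
     */
    public void add(String location, int matches, int firstPosition) {
        Result result = lookup.computeIfAbsent(location, Result::new);
        result.count += matches;
        result.position = Math.min(result.position, firstPosition);
    }

    /**
     * Copies the finished results into a list and sorts that list exactly once.
     *
     * @return the results in sorted order
     */
    public List<Result> sorted() {
        List<Result> results = new ArrayList<>(lookup.values());
        Collections.sort(results, ORDER);
        return results;
    }
}
```

The HashMap gives constant-time lookup while counts are still changing; sorting happens only after they stop changing, so no element is ever out of place.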
When working on how to search through your inverted index efficiently, keep in mind that you do not want to search through your ENTIRE inverted index for every query. You will need a sorted set of keys to do this efficiently. Either use a data structure that maintains a sorted order, or maintain a separate set of sorted keys. In the second case, just make sure you don't re-sort the keys unless you need to. Then, you should be able to figure out (a) where to start looking, and (b) where to stop looking.
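A sketch of the "where to start, where to stop" idea using a TreeMap, whose tailMap(fromKey, inclusive) method returns the keys from a starting point onward in sorted order. Because every key that starts with a prefix sorts at or after the prefix itself, and all such keys are contiguous, the loop can stop at the first key that no longer matches. The class and method names are hypothetical; with a separate sorted TreeSet of keys, TreeSet.tailSet works the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * A hypothetical helper for prefix (partial) search over sorted keys.
 */
class PartialSearch {

    /** Prevents instantiation of this utility class. */
    private PartialSearch() {
    }

    /**
     * Returns every key that starts with the given prefix, in sorted order.
     * The search starts at the first key greater than or equal to the prefix
     * and stops at the first key that does not start with it, so keys outside
     * the matching range are never visited (except the one that ends the loop).
     *
     * @param index a sorted map whose keys are words
     * @param prefix the partial word to match
     * @return the matching keys
     */
    public static List<String> keysStartingWith(TreeMap<String, ?> index, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String key : index.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // every later key also falls outside the prefix range
            }
            matches.add(key);
        }
        return matches;
    }
}
```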
Make sure your Driver class triggers building the index, searching the index, and printing any results. These should not be automatically triggered by methods in your other classes, as this order of operations is project-specific.
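A minimal, hypothetical sketch of a Driver that owns the order of operations. To stay self-contained, it indexes text passed with a hypothetical "-text" flag instead of reading files, and looks up one word given by a hypothetical "-query" flag; the nested Index class and the build method stand in for the separate, reusable classes a real project would have. In a real project, main is also where you would catch exceptions such as IOException and print a user-friendly message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * A hypothetical Driver sketch. It is the only class that knows the
 * project-specific order of operations: build, then search, then print.
 */
class Driver {

    /** Stand-in for a reusable index class (word to positions); it stores data only. */
    static class Index {
        private final TreeMap<String, TreeSet<Integer>> map;

        Index() {
            map = new TreeMap<>();
        }

        void add(String word, int position) {
            map.computeIfAbsent(word, w -> new TreeSet<>()).add(position);
        }

        /** Returns a read-only view of a word's positions, or an empty set. */
        Set<Integer> positions(String word) {
            TreeSet<Integer> positions = map.get(word);
            return positions == null ? Collections.emptySet() : Collections.unmodifiableSet(positions);
        }
    }

    /** Stand-in for a reusable builder: lowercases, splits on whitespace, numbers words from 1. */
    static void build(Index index, String text) {
        int position = 0;
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                position++;
                index.add(word, position);
            }
        }
    }

    /**
     * Runs each step in order and returns the lines to print.
     *
     * @param args hypothetical "-text" and "-query" flags, each followed by a value
     * @return the output lines
     */
    static List<String> run(String[] args) {
        String text = null;
        String query = null;
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-text")) {
                text = args[i + 1];
            } else if (args[i].equals("-query")) {
                query = args[i + 1];
            }
        }

        Index index = new Index();
        if (text != null) {
            build(index, text); // Step 1: build the index.
        }

        List<String> output = new ArrayList<>();
        if (query != null) {
            String word = query.toLowerCase();
            output.add(word + ": " + index.positions(word)); // Step 2: search.
        }
        return output;
    }

    /** Entry point; step 3 prints the results. */
    public static void main(String[] args) {
        for (String line : run(args)) {
            System.out.println(line);
        }
    }
}
```

Running it with the arguments `-text "the fox saw the dog" -query The` would print `the: [1, 4]`. Neither Index nor build decides when the other steps happen; only the Driver does.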
With this project, many students are tempted to break encapsulation by passing around references to private mutable data, or to preserve encapsulation by making unnecessary copies of data, which wastes time and space. If you find yourself doing either, there is likely a better design approach you can take. However, don't get too caught up on design in your first iteration. Focus on correctness first, and then refactor your code for efficiency. You should get used to the fact that the first version of your code will rarely be the last version!
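One common option, shown here as a hypothetical sketch (the class and method names are illustrative): instead of returning the private collection itself, or copying it on every call, return a read-only view with Collections.unmodifiableSet. The view wraps the existing set without copying it and throws UnsupportedOperationException on any modification attempt. Note that the view reflects later changes to the index, so it is not a snapshot, and that with nested collections each level you return needs its own view.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * A hypothetical index that contrasts three ways of returning private data.
 */
class LocationIndex {

    /** Maps each word to the locations where it appears. */
    private final TreeMap<String, TreeSet<String>> index;

    /** Creates an empty index. */
    public LocationIndex() {
        index = new TreeMap<>();
    }

    /**
     * Records that a word appears in a location.
     *
     * @param word the word to record
     * @param location where the word appears
     */
    public void add(String word, String location) {
        index.computeIfAbsent(word, w -> new TreeSet<>()).add(location);
    }

    /** AVOID: returns the private set itself (or null), so callers can modify it. */
    public Set<String> getLocationsUnsafe(String word) {
        return index.get(word);
    }

    /** Safe but costly: copies every location on every call. */
    public Set<String> getLocationsCopy(String word) {
        TreeSet<String> locations = index.get(word);
        return locations == null ? new TreeSet<>() : new TreeSet<>(locations);
    }

    /** Safe and cheap: wraps the private set in a read-only view without copying it. */
    public Set<String> getLocations(String word) {
        TreeSet<String> locations = index.get(word);
        return locations == null ? Collections.emptySet() : Collections.unmodifiableSet(locations);
    }
}
```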
Make sure your Project 2 code still passes the Project 1 unit tests!