Testing fight!
29 August 2022
29 August 2022
Testing is a key step to implement best development practices: it helps to assess the functionality of the code. By testing our code we can identify which parts of the code need improvement or do not work as expected. During the last two weeks, I have been working on the tests for the guess_topologyAttribute API and DefaultGuesser (add format) after modifying the code base. I should say that working on tests after modifying code is one of the most unwise decisions I have recently taken, probably only surpassed by my decision of eating pineapple on pizza! 🍕 🍍 Probably adding test after instead of simultaneously with the code base, wasn't the best decision given that The challenge in my GSoC project is not much about the complexity of the code but rather that the introduced changes have an impact across a significant part of the MDAnalysis library.
In this blog post, I will share how I started adding unit tests after modifying nearly thirty files in the code base, how it made me realize (maybe a bit too late) that a test-driven development is far more efficient and clean and how I learned that it is always better to add tests together with changes in the code base!
Fixing the existing tests
The first thing I had to do before writing new tests was to make sure that the existing tests in the MDAnalysis test suite is working fine. Obviously, the tests that had the most dramatic failures were those concerned with parsers. Parsers have three main tests that were directly affected:
test_mandatory_attributes(): for making sure that mandatory attributes had been created properly (those are 'ids', 'resids', 'resnums', 'segids').
test_expected_attributes(): for ensuring that the expected attributes(either read or guessed) have been created properly.
test_no_unexpected_attributes(): for catching any foreign attribute that wasn't expected to exist.
As I removed guessing inside parsers (commit 0cbc497), the above tests had failed in one way or another for each parser. So, I navigated through all parsers tests files and modify their mandatory and expected lists accordingly. However, while working on this fix I had to make sure that I don’t remove types or masses by mistake because some parsers read types and masses if they are available.
The table below provides details about the behavior of each guesser with atom types and masses.
Removing atom type and mass guessing from parsers was followed by transferring it to take place at the universe initiation level. This was necessary for now; because we still don’t want to break the default behavior of the current version of MDAnlaysis, so whatever changes I do it must not affect the overall output of the universe initiation. That's why I put a simple condition inside the universe constructor after reading the topology file: if atom types or masses are not a part of the topology then add them to the to_guess parameter list. This seemed fine to me until I discovered some Universe unit tests failed because the universe was trying to guess atom type and masses for empty universes or universes constructed from a topology object that doesn't need this automatic guessing behavior, and it is irrelevant to them and will cause crashes.
So, I had to put extra checkpoints for when automatic guessing should happen, which is mainly needed in the case when the topology data is coming from a valid parser. This directed me to track where in universe __init__ we can get the information of the source of the topology data, so if it is from one of our registered parsers, we can then start automatic guessing. I found out this info comes from a method called topology_from_file_like(), which takes the topology file as input and try to find a parser for its format, then return a topology object, so I modified it to return the topology object + the name of the parser used to read the file stream if any.
Before setting on the above solution, I had a discussion with Jonathan (@jbarnoud, my mentor) about where we should check for either automatic guessing should happen or not. Originally, I thought about putting this checkpoint inside the topology_from_file_like() method and then returning a boolean value that reflects if we should start guessing or not (commit b80edec). But Jonathan advised me to remove this from the the topology_from_file_like() method, because in the end these checkpoints are not permanent, and they are just there to help with automatic guessing, which will be removed in an upcoming version of the library, so there is no need to pollute existing stable helper methods.
Writing test
After resolving all the issues with the existing tests, its time to write new tests for the new functionalities, this was more straightforward, as I already fixed many bugs and issues in the code while trying to pass existing tests, besides that, DefaultGuesser contains the same guesser methods that already exist and tested, I just needed to make a minor modification to it.
MDAnalysis tests
Mdanalysis uses pytest to develop and run its test suite. I was already familiar with it because I made a pull request with a couple of unit tests at the application phase of GSoC. For more details about contributing to unit tests for MDAnalysis, see Test in MDAnalysis.