Putting final touches

15 October 2022

It has been a while since I documented my progress on the guess_topologyAttrs API. Last time, I talked about testing both the API and the other parts of the library that I had touched. Over the past month, I have mainly been adding the final touches to the code and addressing my mentors’ feedback, as well as finishing the documentation and docstrings for the API and the DefaultGuesser.

The most important final updates that have been done in the last month are as follows:


Removing parser checking inside the universe


This was probably the most debated point with my mentors while working on the API. My view was that we should check the parser before guessing masses and types: it seemed more efficient not to run automatic guessing for every universe created, and to guess types and masses only when the topology comes from a parser that used to do so. Jonathan (@jbarnoud, my mentor), on the other hand, preferred not to check the parser at all and simply to use ('types', 'masses') as the default value of the universe’s to_guess parameter. This results in cleaner code, because parser checking is a temporary measure that would have no meaning in future versions. After several conversations, I was convinced by Jonathan’s point of view and removed the parser-checking part.
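To make the difference concrete, here is a minimal sketch of the "default value" approach. ToyUniverse is a hypothetical stand-in written for this post, not the MDAnalysis Universe class, and the "fill only when the attribute is absent" rule is a simplification I am assuming for the example.

```python
# Hypothetical, simplified stand-in for a universe; not the MDAnalysis class.
class ToyUniverse:
    def __init__(self, attrs, to_guess=("types", "masses")):
        self.attrs = dict(attrs)
        self.guessed = []
        # No parser check here: every attribute listed in to_guess
        # is guessed if the topology did not already provide it.
        for name in to_guess:
            if name not in self.attrs:
                self.attrs[name] = f"<guessed {name}>"
                self.guessed.append(name)


# Default behavior: types and masses are guessed for any universe.
default = ToyUniverse({"names": ["CA", "N"]})

# A user who wants no guessing opts out explicitly.
opt_out = ToyUniverse({"names": ["CA", "N"]}, to_guess=())
```

The point of the design is that opting in or out of guessing lives in one parameter, instead of in a list of parsers that the universe has to know about.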


Autopep8 accident


I ran autopep8 on several files that I was working on (universe.py, table.py, and some topology parser modules) to style them automatically instead of by hand. This did more harm than good: it made lots of unnecessary changes to those files, which made it harder to review the real code changes in the pull request. It took some time to restore the old style of the affected files. The lesson, which I learned the hard way, is simple: don’t run autopep8 on a whole module.


Testing old vs new types and masses values

I added three tests to the parsers’ base.py module to check three things:

1- Types and/or masses are guessed as expected in all universes created with parsers, now that the parsers themselves no longer do any guessing.

2- The values of the types guessed with the guess_topologyAttrs API are the same as those from the old behavior.

3- The values of the masses guessed with the guess_topologyAttrs API are the same as those from the old behavior.

Once those three tests pass, it is safe to say that we are not breaking the default behavior of the code.

Force guessing and partial guessing

In the above phase, I discovered that the ITPParser doesn’t fully guess masses; instead, it reads masses and fills the unknown or invalid values with guessed ones. To preserve this behavior, the guess_topologyAttrs API needs to be aware of it, so the default behavior is to fill in only the missing values if the attribute already exists in the universe. This led to adding a force_guess parameter to the API, which gives it more flexibility in handling different cases. The user can now either guess an attribute by filling its gaps, if there are any, or guess all of the attribute’s values unconditionally by passing the attribute of interest to the force_guess parameter. This behavior is handled in the guess_attrs method of the BaseGuesser class.
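The logic can be sketched roughly like this. This is a simplified illustration and not the actual BaseGuesser code: the function names, the boolean force_guess flag, and the tiny mass table are all assumptions made for the example.

```python
import math

# Hypothetical mass table, for illustration only.
TOY_MASSES = {"C": 12.011, "H": 1.008, "O": 15.999}


def guess_masses(elements):
    """Guess a mass for every atom; unknown elements give nan."""
    return [TOY_MASSES.get(e, math.nan) for e in elements]


def guess_attr(current, elements, force_guess=False):
    """Return masses, filling only the gaps unless force_guess is set."""
    guessed = guess_masses(elements)
    if current is None or force_guess:
        # Attribute absent, or the user asked for everything to be guessed.
        return guessed
    # Partial guessing: keep the values that were read, fill nan entries.
    return [g if math.isnan(c) else c for c, g in zip(current, guessed)]


elements = ["C", "H", "O"]
read = [12.0, math.nan, 16.0]  # masses as read from a topology file

partial = guess_attr(read, elements)
forced = guess_attr(read, elements, force_guess=True)
```

Here partial keeps the two masses that were read and only fills the nan entry, while forced replaces all three values with guessed ones.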

missing_value_label TopologyAttr attribute

Adding the concept of partial guessing gave rise to the need for a label that represents the missing value for each attribute. For that reason, I added a missing_value_label attribute to the TopologyAttr class. For now, it is declared only for Masses, as nan. So, every time we carry out partial guessing for masses, we check whether a mass value is nan, and this check is done through the is_missing_value class method of TopologyAttr.
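A minimal sketch of the idea, assuming simplified toy classes rather than the real TopologyAttr hierarchy (ToyAttr and ToyMasses are hypothetical names; only missing_value_label and is_missing_value come from the description above):

```python
import math


class ToyAttr:
    # None means "this attribute has no missing-value marker".
    missing_value_label = None

    @classmethod
    def is_missing_value(cls, values):
        """Return a boolean mask marking the missing entries."""
        label = cls.missing_value_label
        if label is None:
            return [False] * len(values)
        if isinstance(label, float) and math.isnan(label):
            # nan never compares equal to itself, so == cannot be used.
            return [isinstance(v, float) and math.isnan(v) for v in values]
        return [v == label for v in values]


class ToyMasses(ToyAttr):
    missing_value_label = math.nan


mask = ToyMasses.is_missing_value([12.0, math.nan, 16.0])
```

Keeping the label on the attribute class means the guesser does not need to know what "missing" looks like for each attribute; it only asks the attribute for a mask.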


Merged pull requests 🎉

In September, I had two pull requests merged, both related to fixing bugs and minor issues that I ran into while working on the API:


1- guessed_attributes and read_attributes methods unexpected behavior #3779:

This pull request fixes a bug that I ran into while using the guessed_attributes and read_attributes topology methods. The issue was that those methods did not properly handle the is_guessed values of topology objects such as bonds and angles.


2- adding elements attribute to TXYZParser #3826:

This pull request added an elements attribute to the TXYZParser. This parser was the only one that guessed masses from names instead of atom types. To avoid breaking this behavior, I added an elements attribute to it, whose values are simply valid element names. Now the DefaultGuesser can guess masses from elements without worrying about any special behavior.



Although the guesser-basics pull request is not merged yet, the guess_topologyAttrs functionality is almost complete, and the remaining updates to it are not major ones.


Next step

I have now started implementing the PDBGuesser, which I’ll talk about in detail in the next post.