Scrubbing Statements

Note: This section has absolutely nothing to do with installing or using the PocketSense scripts. It's just a FYI for folks interested in what's going on behind the scenes.

Scrub-a-dub-dub

Implementing OFX transactions isn't terribly complicated. Tedious... yes, but not complicated. There's an entire standard dedicated to describing what a developer needs to know. Every transaction in an OFX account statement has a FITID field, which I presume is derived from the term FInancial Transaction IDentifier. The Microsoft OFX implementation requires that every transaction for an account have a unique FITID value, since this field is used to match download transactions with those that already exist.

So, what were the odds of running into issues right out of the starting gate? Apparently, they were pretty high. We use Discover as our primary charge card, and this just happens to be one of the institutions that didn't implement the OFX standard very well. When I first started testing the code, I found that duplicate entries were being inserted into my Money file. Not one or two duplicates, but every transaction that I had retrieved during the time interval. If I re-ran the code on that same day, there would be no more duplicates. If I waited a day and ran it again, the duplicates would come back. All of them. This, obviously, didn't cut it. Fixing Discover data transmissions, so that Money would properly match transactions, became a priority.

I decided that a mechanism was needed for "scrubbing" OFX data files sent from an institution to correct known issues. Based on my limited account set, none of the institutions format their transaction identifiers the same. It therefore seemed reasonable to include the ability to call tailor-made routines prior to sending OFX files to Money. At that time, I was only concerned about Discover, but felt that the "fix" shouldn't be hard-coded into the main routine. This way, any site could have a special routine that would be called prior to transferring the data to Money. Simple enough, except that we first needed to figure out what was actually wrong with the Discover data. You can read more about it below, if you're interested, but suffice it to say that Discover Card transaction downloads have worked great since.

The Tribulations of Discover

If the FITID field is implemented correctly, Money should never duplicate transactions from a series of downloaded statements. All of the institutions that I currently use have implemented this requirement in a manner that works. Well, everyone except Discover. Looking inside a Discover OFX statement, we find that the their FITID field always has the following form: FITIDYYYYMMDDamt##### . Breaking this value into its components:

On the surface, this looks to meet the Microsoft OFX requirement for a unique FITID, except for one small (but important) glitch. The 5-digit serial number isn't always the same. If you connect to Discover on Tuesday, and then again on Wednesday, the identical transaction may download with two different 5-digit serial numbers. Why Discover does this is a mystery (to me) at this point in time, but it causes a big problem when trying to match transactions from downloaded statements that were collected on different days.

My first "fix" for this was to edit the code to retrieve only those transactions that had posted since the last successful statement download. I rewrote the script to do this, and had it working great, until a weekend came along. It turns out, that Discover may not send a transaction through it's OFX server until a day or two after it posts to their system. If I download transactions for the past week on Sunday, but a transaction they post on Monday has a post date prior to my connect time on Sunday, the next statement download will miss that transaction. The net result is that transactions can be missed. That's not good.

Looking at the FITID value, it's pretty obvious that simply deleting the 5-digit serial number from every transaction would force all duplicate transactions to have a matching FITID, regardless of the date of download. The problem with that approach is that truly different transactions may well have the same dollar amount on the same day. If we simply strip the 5-digit code, we run the risk of overwriting transactions that have the same date and dollar amount. This is the reason for the 5-digit code, allowing up to 99999 different transactions on the same date with the same dollar amount. The scheme would work, except that Discover appears to regenerate the 5-digit value on a daily basis! Their logic appears to be that each OFX statement download will have a unique FITID assigned to each transaction, but there is no guarantee that the transactions will have the same FITID tomorrow. In fact, it's not likely that they will.

The fix that I settled on for this "issue" was to replace the ##### value with a sequential number. The 5-digit code will be replaced with 0. If there is a true duplicate entry on that date (i.e., two transactions with the same dollar amount on the same day), then the second transaction will have a code of 1, and the value will increment up for each repetition after that. So long as Discover always sends transactions that post on a given date at the same time, this should do the trick. If, for some reason, they post two identical looking transactions on the same date, but then send them on different days, there's a possibility that this scheme could misidentify one. The odds of this happening seem to be pretty low, and even if it does, so long as Discover sends them in the same order (i.e., the first received is the first that shows in the OFX file), then the method will still work. Once duplicate transactions are recorded for a given date, it doesn't really matter if they come in the same order in the next OFX statement. Since each of the new FITID values will have already been recorded with an associated transaction, they will all be skipped during the import process, leaving the values received the first time around intact.

Scrubber.py

It seemed likely that since Discover has an oddball way of doing things, then so may others. Rather than implement a one-time fix for the Discover statements, I decided to implement a module named scrubber.py. This module gets called for each statement download, but unless it finds a known problem in the file, nothing happens. If there is a scrubber routine defined for the issue, then it gets fixed prior to sending the OFX file to Money. The Scrubber.py routine has progressed from it's original purpose, and now has numerous routines to address issues that users have discovered, and will be continually updated as new problems arise.