Nowcasting Texas RRC Oil and Gas data (ongoing project)

US shale oil and gas production is probably one of the hottest topics in current energy markets, and Texas oil production is at the center of this "revolution". This is why Texas RRC oil and gas data are closely watched by energy analysts worldwide. However, these data are only preliminary when first released. James Osborne recently wrote a nice article explaining the reasons behind this practice:

" ...The railroad commission declined to make staff available to discuss the divergent data. But a spokeswoman for the agency wrote in an email its data “reflects a snapshot in time” and is continually revised as oil and gas companies submit their reports.
Most production revisions occur within 90 days as the Commission issues a letter to operators who are delinquent with their production reports and gives them 30 days to file a late report. After that 30-day timeframe has passed, the Commission issues [a] certified letter informing the operator their lease will be severed and shut in if they do not submit a production report, so most late reports are turned in within 90 days,” she wrote... "

Unfortunately, as energy practitioners know, major corrections to RRC data can take place up to 1 year after the initial release, and minor corrections up to 2 years after it. Moreover, preliminary RRC vintage data show a very strong negative bias in the first year after their initial release and a slight positive bias in the second year: that is, they strongly underestimate the real data during the first year and slightly overestimate them during the second year. Clearly, this is quite a different situation from vintage data and nowcasting in (macro)economics.

Given this situation, I have started working out a (simple) methodology to reconstruct the "real" Texas oil and gas production data. The idea is as follows: using the latest RRC data up to month T and the previous data up to month T-1 (published one month earlier), I compute the amount of correction that each month should undergo to be close to the real data. In doing this, I consider only the last 24 months, because older months receive only negligible corrections. For each month, I sum the corrections which took place in the previous "h" months, where I set h=24 for computational simplicity. By doing this for all of the past 24 months, I build a set of correction factors to reconstruct the supposed "real" Texas oil and gas production data for months T back to T-23. I then combine this set of correction factors with those computed from past vintage data to obtain the average correction factors over all vintage data sets.
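The steps above can be sketched in a few lines of Python. This is only one possible reading of the procedure, not the actual implementation behind the results reported here. It assumes that corrections are additive (they could equally be handled as ratios), that months are stored as consecutive integers, and that each vintage is a dictionary mapping reference month to reported production. All function names are hypothetical.

```python
def step_revisions(latest, previous, pub_month, h=24):
    """Revision received by each month between two consecutive vintages.

    Returns {age: latest - previous}, where age = pub_month - reference_month
    is the age of that month in the latest vintage (1..h).
    """
    revisions = {}
    for age in range(1, h + 1):
        month = pub_month - age
        if month in latest and month in previous:
            revisions[age] = latest[month] - previous[month]
    return revisions


def average_revisions(vintages, h=24):
    """Average the age-by-age revisions over all consecutive vintage pairs.

    vintages: {publication_month: {reference_month: value}} (assumed layout).
    """
    sums, counts = {}, {}
    pubs = sorted(vintages)
    for prev_pub, pub in zip(pubs, pubs[1:]):
        if pub - prev_pub != 1:  # skip pairs that are not one month apart
            continue
        revs = step_revisions(vintages[pub], vintages[prev_pub], pub, h)
        for age, rev in revs.items():
            sums[age] = sums.get(age, 0.0) + rev
            counts[age] = counts.get(age, 0) + 1
    return {age: sums[age] / counts[age] for age in sums}


def remaining_corrections(avg_rev, h=24):
    """Correction still expected for a month of a given age (0..h-1):
    the sum of the average revisions at all later ages."""
    remaining, total = {}, 0.0
    for age in range(h - 1, -1, -1):
        total += avg_rev.get(age + 1, 0.0)
        remaining[age] = total
    return remaining


def corrected_series(vintages, h=24):
    """Add the expected remaining corrections to the most recent vintage."""
    pub = max(vintages)
    remaining = remaining_corrections(average_revisions(vintages, h), h)
    return {month: value + remaining.get(pub - month, 0.0)
            for month, value in vintages[pub].items()}
```

Calling `corrected_series(vintages)` returns the latest vintage with months T back to T-23 adjusted and older months left unchanged, which mirrors the choice of ignoring corrections beyond 24 months.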

The average correction factors clearly show that the preliminary Texas RRC data strongly underestimate recent production, while they slightly overestimate older data. Finally, confidence intervals can be computed around the corrected data using the variability in past vintage data.
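The text does not say how the confidence intervals are built, so the following is only an illustrative sketch. It assumes that, for a month of a given age, the total corrections observed in past vintages are available as a list, and it uses a normal approximation with z = 1.96 for a roughly 95% interval. The function name and both of these choices are assumptions, not the method used here.

```python
from statistics import mean, stdev


def correction_interval(reported, past_corrections, z=1.96):
    """Return (lower, centre, upper) for one month's corrected value.

    reported: preliminary value for the month in the latest vintage.
    past_corrections: total corrections observed in past vintages for
        months of the same age (needs at least two observations).
    z: normal quantile (assumption: 1.96 for an approximate 95% interval).
    """
    centre = reported + mean(past_corrections)
    half_width = z * stdev(past_corrections)  # sample standard deviation
    return centre - half_width, centre, centre + half_width
```

With only a handful of vintages the normal approximation is rough; empirical quantiles of the past corrections would be a natural alternative.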

The "corrected" approximate real production data for crude + condensate (using all vintage data), together with the latest EIA data, are reported below:

This method has proved to be reliable over time, providing estimates of Texas oil production that are very close to the final data and available much earlier than the latter are published (as I said above, it may take more than 2 years). Moreover, these estimates proved to be closer to the real data than the official EIA data for Texas: for example, on 31/08/2016, with a delay of more than 1 year, the EIA revised its Texas data for 2014 and 2015 and aligned them with my corrected Texas RRC data:

This is an ongoing project, performed in real time: if this method proves to be stable and fairly robust, I intend to write a paper about it. If you are an oil & gas researcher, journalist, producer, etc., and you are potentially interested in writing a joint paper, please let me know.