behind the magic
A behind the scenes look at the THR Leaderboard spreadsheet used to track tournament results. The spreadsheet uses an "importer" to scrape raw data from ACR result pages and populating the ID and Result sheets for its corresponding tournament type; the Freebuy (F25) and the Bonanza (BB) currently.
F25 & BB Leaderboards
Tournament IDs, Results and LBs for November 2018
You can view all recorded tournaments for the current month which includes complete tournament results per ACR. Feel free to click on individual sheets in the spreadsheet above. Try the Points Scale to view the current method for scoring.
This spreadsheet is in a current state of flux with new features being added all the time. This iteration includes tracking FT appearances in the Freebuy and Top 6 finishes in the Bonanza. Points and Winnings averages were included as well.
Web Scraping 101 via the IMPORTXML function
Importing results data for the leaderboards is accomplished with the IMPORTXML function. Lots of docs on that, not going into details on its inner workings.
The function is quirky sometimes and can be stubborn. First the error for about three minutes. All of sudden the error stops and the data is imported. I used to get this a lot on last months sheets. The reason was I had too many IMPORTXML statements on the ID & Results sheets, two for each game times 2 . The more you have, the more it bogged down.
Since we only need to import the data once, I created an importer sheet. Import the results once, then copy to the appropriate sheets. Only 2 import functions total. Not that many errors any more, except when we are unable to get the results because the page is not reachable.
The error came back again. Maybe the site is unreachable? No, it is fin I waited and waited ....
Then the import came back. This time there was a shift of data elements on the page. This threw it all out of whack.
Something that will always come up when you are extracting data from a website is the inevitable Html change to the page. One inserted DIV or SPAN is enough to throw everything out of whack. Happened right in the middle of my import for a Bonanza game one morning.
It is a good thing we are not using IMPORTXML functions for every result set, otherwise it would be a giant PITA to rework them all. Only have two import functions to deal with, just a matter of re-calibration and we are good to go.
The important part of the current IMPORTXML statement is the path string at the end. The path points to a group of elements that the IMPORTXML function will pull values from.