So, as I have recently published the 2021 NYC Mayoral Election maps, for the uninitiated who want to make RCV maps of other cities, or New York itself, here is a step by step guide to understanding how one calculates Final round, and the hypothetical matchups as presented in a Cast Vote Record. Or as close as I have figured out how to calculate the results.
In order to understand how to calculate the final round/Hypothetical results, we first need to understand the basics of RCV. Trust me, this was important to figuring it out, and is important in order to make sure that all bugs can be sorted out.
Basically, RCV gives you a certain amount of options where you a rank a preference between your favorite or least favorite. Example rank: 1) Sheila Nezhad, 2) Kate Knuth 3) Jacob Frey.
To understand how RCV votes count, you need to know that ALL votes for a candidate* count towards them no matter how early or late the prior candidates votes exhausted.
for NYC 2021, say I voted for Wiley, Stringer, Adams, Garcia, and Adams again, that vote would count for Adams in the final round, once Wiley was eliminated despite Scott Stringer being eliminated much earlier.
Note that the vote doesn't count for Garcia, or Adams twice. This will be important later.
The Cast vote record is exactly as it sounds, a record of all the individual ballots cast in any given election. Ipso facto what this does is give you how *every voter* individually ranked their ballot, from this you will need to use a countif function to get the round 1 results, and a more complicated Countifs function, deliniated with a table, to get the final round results—once you understand all the excentricities in Ranking ballots.
Mouthfull title, I know. Anyway. Due to the above mentioned nature, there are many potential eccentricities that need to be accounted for. While Ballot Exhaust is the one that's most well known—because it's the one candidates hate the most—for calculating results the worst ones are what I like to call Multi-voting and Double voting.
In this context multivoting is voting for both of the candidates who's votes you want to isolate, and double voting is voting for one of the candidates you want to isolate multiple times.
Were you to isolate each preference voting individually using a Countifs function in excel these types of vote would over-inflate a candidate beyond the votes that they actually got.
Note that undervoting and Overvoting aren't problems so long as the ballot is labeled *within the CVR* because the only issues are over-voting for one candidate and voting for both candidates, as such, all equations need to solve for these two problems to get how the CVR delineates the ballots
Now, we get to the fun part, the Equatiosn *jazz hands*.
Here, you need to isolate each candidates votes in each round making sure that each ballot only counts as (1) One vote for (1) One Candidate. As such, the following Equations are going to be needed. Numbers deliniate Columns
Countif(A:A,"CandidateOne")
Countif(B:B,"CandidateOne", A:A,"<>CandidateTwo")
Countif(c:c,"CandidateOne", b:b,"<>CandidateTwo", a:a,"<>CandidateTwo")
Countif(d:d,"CandidateOne", c:c, "<>CandidateTwo), b:b,"<>CandidateTwo", a:a,"<>CandidateTwo")
and so on and so force depending on the ammount of choices, Repeat for Candidate two
Next, you do two columns for each of the candidates—if(sum(a2:d2)>1,"1","0"). What this'll do is give you how many voters actually voted for candidate One over candidate Two, whilst isolating for double voting as the five colums above isolate for the issue of multi voting, and this count if greater than one isolates for double voting.
Next up you have to use the countif function on each candidate's vote row to see how many votes they got in each precinct (reminder we just isolated how many *voters* voted for each candidate over the other we isolated) and there we have it! a way to count the votes.
I will note that this only gets us to 99.9% of the way there, as you may find some candidates off by a few votes from official statistics. The data is still usable and correct for most forms of analysis (I enphasize: trust the official certified figures), but I would note this even if it were off by a singular vote. I have been working to decode the Cast Vote Records and even getting to this point where we can tell how 99.9% of people voted was not easy and required a lot of trial and Error.
I will continue to try and get to 100% accuracy and edit this when I find out the issue, but for most unofficial purposes, this should serve people to get as close as is possible with Excel to the real results.
Oh and for NYC, make sure you calculate all the vote totals together per Borough individually (Vote types together, Boroughs by themselves), if you don't that will Royally screw you up on the easy step of counting the votes on the First Round.
CVRs are very sensitive to these sorts of thing.
Addendum:
Thanks to the help of @Orcasaccount on Twitter I was able to get this to 100%, what I was missing was the fact that in NYC vs Other Jurisdictions overvote was disqualified. Still, note the above issue for the Cast Vote Record However as that is significant.
to resolve the issue, add another section to the Countiffs, (i.e. Countif(B2,"CandidateOne", A2,"<>CandidateTwo", a2, "<>Overvote"))