Welcome back. In the previous posts we saw how to do inference using the beta-binomial to get probabilities for the proportion of fake ballots in an election, as well as an upper bound on the probability that the election result is incorrect. We briefly mentioned the hypergeometric distribution but did discuss it further nor use it.
Like the binomial (and beta-binomial), the hypergeometric distrbution can be used to model the number of successes in a series of sampling events with a binary outcome. The distinction is that the binomial models sampling with replacement, whereas the hypergeometric models sampling without replacement. In other words, if we are sampling from a box, the binomial applies when the sample is returned to the box before drawing more samples. The hypergeometric applies when the sample is not returned. But wait, doesn’t that mean that we’ve been doing it wrong?
When auditing ballots we keep track of those already checked, a ballot is never audited twice. Shouldn’t we then be using the hypergeometric distribution? It turns out that the binomial distribution approaches the hypergeometric distribution in the limit of a large total number of items compared to the number sampled. This fits our case, as we can only audit a limited number of ballots compared to all those cast.
As we saw in the previous post, the beta distribution is a conjugate prior for the binomial, which makes inference very easy. Unfortunately this is no the case for the hypergeometric. But because of the converging behaviour seen above, we can stick to the beta-binomial’s easy calculations without sacrificing accuracy. For the sake of completeness we will quickly show the posterior for the hypergeometric, following [1]. Incidentally this “manual calculation” is what allowed us to obtain the images above, through the javascript implementation in the jsfiddle.
Again, this is just Bayes theorem with the hypergeometric likelihood function and a uniform prior. In [1] it is also pointed out that the normalization factor can be computed directly with this expression
We use this in the jsfiddle implementation to normalize. Another thing to note is that the hypergeometric posterior is 0 at positions that are inconsistent with evidence. One cannot have less successes than have been observed, nor more than are possible given the evidence. These conditions are checked explicitly in the implementation. Finally, the jsfiddle does not contain an implementation for obtaining the upper bound on the probablity of election error, only code for the posterior is present. How about forking it and adding that yourself?
In these three posts we have used bayesian inference to calculate probabilities over proportion of fake ballots, and from there to calculate probabilities that an election result was incorrect. These probabilities could be used to achieve trust from stakeholders in that everything went well, or conversely to detect a possible fraud and invalidate an election.
I’ll finish the post by mentioning that the techniques we have seen here can be generalized beyond the special case of detecting fake ballots for plurality votes. For example, one could use bayesian inference to conduct ballot audits for the sake of checking tally correctness, not just from failures in authentication, but from errors in counting. See [2] for this kind of more general treatment
[1] http://www.wmbriggs.com/public/HGDAmstat4.pdf
[2] https://www.usenix.org/system/files/conference/evtwote12/rivest_bayes_rev_073112.pdf
In this work, because results of audits are not just binary, and because tallies are not only plurality, the authors use dirichlet distrbutions and sampling using posteriors to project possible alternative tallies.