I need to stop being such a perfectionist and publish things I’ve done, even if they aren’t as complete as I would like. Here are some:

## Merrill 1984

I’ve reproduced all the tables and figures from *A Comparison of Efficiency of Multicandidate Electoral Systems* using my own simulator written in Python, then added some modern voting methods to compare. Merrill simulates Condorcet Efficiency and Social Utility Efficiency with both the unrealistic “impartial culture” random voter preference model and a more realistic spatial model.

**Condorcet Efficiency** measures the likelihood of electing the “pairwise champion”, when one exists (ignoring simulations in which there is a tie).

**Social Utility Efficiency** measures the “utility” (favorability or representativeness) of the winner, proportional to the best and worst options that were on the ballot. This has also been called Effectiveness (Weber), Voter Satisfaction Index (Shentrup), Voter Satisfaction Efficiency (Quinn), and is related to “Bayesian Regret” (Smith).
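To make the Condorcet Efficiency definition concrete, here is a minimal sketch (not my actual simulator) that estimates it for FPTP under an impartial-culture-style model with uniformly random utilities. The function names, the default parameters, and the lowest-index tie-break in `fptp_winner` are all assumptions made for illustration.

```python
import random

def pairwise_champion(utilities):
    """Index of the candidate who wins every head-to-head contest, else None.

    utilities[v][c] is voter v's utility for candidate c.
    """
    n_cands = len(utilities[0])
    for a in range(n_cands):
        if all(
            sum(u[a] > u[b] for u in utilities) > sum(u[b] > u[a] for u in utilities)
            for b in range(n_cands) if b != a
        ):
            return a
    return None  # a cycle or a pairwise tie: no champion exists

def fptp_winner(utilities):
    """Plurality winner with honest voters (assumption: ties go to the lowest index)."""
    tallies = [0] * len(utilities[0])
    for u in utilities:
        tallies[u.index(max(u))] += 1
    return tallies.index(max(tallies))

def condorcet_efficiency(method, n_elections=2000, n_voters=25, n_cands=4, seed=0):
    """Fraction of elections with a pairwise champion in which `method` elects it."""
    rng = random.Random(seed)
    hits = counted = 0
    for _ in range(n_elections):
        utilities = [[rng.random() for _ in range(n_cands)] for _ in range(n_voters)]
        champion = pairwise_champion(utilities)
        if champion is None:
            continue  # elections without a champion are ignored
        counted += 1
        hits += int(method(utilities) == champion)
    return hits / counted

print(condorcet_efficiency(fptp_winner))
```

Any other voting method can be passed in as `method`, as long as it takes the same voters-by-candidates utility table and returns a winner index.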

Here are the original Condorcet Efficiency graphs (2.5–10k simulated elections):

And here are my reproductions (with same parameters and axes, but 100k simulations for all points, and addition of STAR Voting and Score Voting):

Here are the original Social Utility Efficiency graphs (also 2.5–10k elections?):

And my reproductions (same updates as before):

Zoomed in to show the distinctions between the methods clustered at the top:

All these simulations use honest voter strategy. For ranked systems, they just rank the candidates honestly. For Score and STAR, they give maximum score to their favorite candidate, minimum to their least favorite, and proportional in between. The Approval Voting strategy in all of these is the “optimal” strategy described in the paper, where voters approve of any candidate they like more than average (which is probably not realistic).
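As a rough sketch of those three honest strategies, the functions below turn one voter's utilities into a ranking, a score ballot, and an approval ballot. The names are hypothetical, and rounding to an integer 0–5 ballot is my assumption for illustration; the description above only says scores are proportional between the minimum and maximum.

```python
def honest_ranking(utils):
    """Candidate indices ordered from most to least preferred."""
    return sorted(range(len(utils)), key=lambda c: -utils[c])

def honest_scores(utils, max_score=5):
    """Favorite gets max_score, least favorite gets 0, the rest scale linearly.

    Assumption: scores are rounded to integers, as on a 0-5 STAR ballot.
    """
    lo, hi = min(utils), max(utils)
    if hi == lo:
        return [0] * len(utils)  # an indifferent voter expresses no preference
    return [round(max_score * (u - lo) / (hi - lo)) for u in utils]

def above_average_approvals(utils):
    """Approve every candidate liked more than the voter's average utility."""
    mean = sum(utils) / len(utils)
    return [u > mean for u in utils]

voter = [0.1, 0.3, 0.9, 0.2]
print(honest_ranking(voter))           # [2, 1, 3, 0]
print(honest_scores(voter))            # [0, 1, 5, 1]
print(above_average_approvals(voter))  # [False, False, True, False]
```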

**To do:**

- Figure out why there are discrepancies from the original, up to 16% for one point. (Just random variation from the low number of simulations, or is there a bug in mine or the original?)
- Add error bars (*way* more complicated topic than I expected)
- Add *Top Four* and *Final Five* voting methods. These will of course be identical to Hare RCV for 4 or fewer candidates, but I predict that with more candidates, they will degrade slightly towards FPTP, as the single-mark primary stage eliminates good candidates through vote-splitting.
- Simulate other variations like Approval primary→Top 5→Hare RCV general, FPTP primary→Top 5→Condorcet RCV general, Unified Primary, etc.

- Do the same tests in 1- or 1.5-dimensional attribute spaces, etc. I suspect Condorcet has better SUE/VSE at lower dimensionality where circular ties can’t exist.

**Note:** Weber’s “Effectiveness” and Merrill’s SUE values disagree, despite having the same formula, because Merrill normalizes utilities before finding the utility winner in each election. Smith mentions Merrill’s 100% SUE results as evidence of a “bug”, but it’s just a difference in interpretation. I think Weber’s approach makes more sense, since I believe that elections with polarizing majoritarian winners beating broadly-liked candidates really could happen.
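A toy example of why the order of operations matters, assuming “normalizes” means rescaling each voter’s utilities to the 0–1 range (that reading, and the function names, are my assumptions for illustration):

```python
def utility_winner(utilities):
    """Candidate with the highest total utility across all voters."""
    n_cands = len(utilities[0])
    totals = [sum(u[c] for u in utilities) for c in range(n_cands)]
    return totals.index(max(totals))

def normalize_per_voter(utilities):
    """Rescale each voter's utilities so their favorite is 1 and least favorite is 0."""
    normalized = []
    for u in utilities:
        lo, hi = min(u), max(u)
        span = (hi - lo) or 1.0  # avoid dividing by zero for indifferent voters
        normalized.append([(x - lo) / span for x in u])
    return normalized

# Two voters slightly prefer candidate 0; one voter strongly prefers candidate 1.
raw = [[1.0, 0.9], [1.0, 0.9], [0.0, 0.9]]
print(utility_winner(raw))                       # 1 (broadly liked)
print(utility_winner(normalize_per_voter(raw)))  # 0 (majority favorite)
```

With raw utilities the broadly-liked candidate is the utility winner, so a majoritarian method that elects candidate 0 scores below 100%. After normalization, candidate 0 becomes the utility winner and the same outcome counts as perfect.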

## Winner distributions on a 1D spectrum

Assuming a one-dimensional preference spectrum (spatial model again), we can plot the probability distribution of winners along that spectrum. I typically assume a “bell curve” distribution of voter preferences, since that’s roughly similar to what I’ve seen in real-life surveys.

The most obvious distribution of candidate positions would then be the same as that of the voters. (I freely admit this could be completely wrong compared to the real world, but the point is to show some general principles.)

### Comparing systems with different candidate dispersions

First a comparison of several systems with candidates distributed the same as voters:

“Random Winner”, or Sortition, is mostly just here to show the probability distribution of the candidates, and to act as a worst-case, since they are selected completely randomly, without regard for the preferences of the voters. (This is not the same thing as Random Ballot.)

“Best possible winner” is just the candidate nearest the center of the voters’ preferences, as a best-case scenario for comparison.

For the systems that count only first choice preferences in each round (FPTP, T2R, IRV), you can see the center-squeeze effect making them biased in favor of more polarizing candidates away from the center. The extra rounds of T2R and IRV improve this somewhat, but don’t fundamentally fix it.
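Here is a small, hand-built illustration of center squeeze on a 1D spectrum, followed by a crude Monte Carlo estimate. It is a sketch, not the code behind the plots: the function names, the use of the voters’ mean as the “center”, and the `dispersion` parameter (candidate standard deviation relative to the voters’) are assumptions.

```python
import random

def nearest(candidates, point):
    """Index of the candidate closest to `point` on the 1D spectrum."""
    return min(range(len(candidates)), key=lambda c: abs(candidates[c] - point))

def fptp_1d(voters, candidates):
    """Each voter picks the nearest candidate; most votes wins (lowest index on ties)."""
    tallies = [0] * len(candidates)
    for v in voters:
        tallies[nearest(candidates, v)] += 1
    return tallies.index(max(tallies))

def best_winner(voters, candidates):
    """Candidate nearest the center of the voters (assumption: center = mean)."""
    return nearest(candidates, sum(voters) / len(voters))

# Hand-built case: the centrist is squeezed between two flanking candidates.
voters = [-1.0] * 36 + [0.0] * 30 + [1.0] * 34
candidates = [-0.6, 0.0, 0.7]
print(fptp_1d(voters, candidates))      # 0 (the left-flank candidate)
print(best_winner(voters, candidates))  # 1 (the centrist)

def fptp_picks_best_rate(n_elections=500, n_voters=201, n_cands=5, dispersion=1.0, seed=0):
    """Fraction of random elections in which FPTP elects the best possible winner."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_elections):
        vs = [rng.gauss(0, 1) for _ in range(n_voters)]
        cs = [rng.gauss(0, dispersion) for _ in range(n_cands)]
        hits += int(fptp_1d(vs, cs) == best_winner(vs, cs))
    return hits / n_elections

print(fptp_picks_best_rate(dispersion=1.0))
print(fptp_picks_best_rate(dispersion=0.5))
```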

I suspect the better performance of Condorcet vs STAR is because of the one-dimensional spectrum without circular ties. As I mentioned above, I suspect Condorcet would have better SUE/VSE than STAR in this case. (To do: Test that.)

I was surprised at how obvious the center-squeeze effect becomes when the candidate distribution is half as wide as the voter distribution (which is plausible if, for example, extremist candidates know they have no chance and choose not to run at all):

All three plurality-based systems behave pretty much the same in this scenario, being biased strongly against the best representative, and in favor of polarizing candidates to either side.

This effect also gets worse with more candidates, as expected from the SUE graphs above. (To do: Make plots of that.)

Plotting the winner distribution as the relative dispersion of the candidates changes, you can see how the plurality-based systems become more biased *against* the best candidates as the candidates become better overall:

(FPTP on the left, Hare RCV on the right.)

Meanwhile, the consensus-based systems produce better results with better candidates:

(Condorcet on left, STAR on right.)

### Approvals per ballot

I didn’t include Approval Voting in the above, because it varies so much with the number of approvals per ballot. I don’t have a good model of real-world voter behavior yet, but here’s a crude first attempt (relative dispersion 0.5):

This is just the “Vote-for-*k*” method described in Weber 1977, where every voter approves of the same number of candidates per ballot. Obviously the center-squeeze effect for Vote-for-1 is identical to FPTP, then it improves as the approvals per ballot reaches half of the number of candidates, then gets worse again.
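A minimal sketch of Vote-for-*k* on a 1D spectrum, where every voter approves of their *k* nearest candidates. The function name and the hand-built electorate are hypothetical, chosen only to show the winner changing with *k*.

```python
def vote_for_k_winner(voters, candidates, k):
    """Every voter approves of their k nearest candidates; most approvals wins.

    Assumption: ties in the final tally go to the lowest candidate index.
    """
    tallies = [0] * len(candidates)
    for v in voters:
        by_distance = sorted(range(len(candidates)), key=lambda c: abs(candidates[c] - v))
        for c in by_distance[:k]:
            tallies[c] += 1
    return tallies.index(max(tallies))

# Three blocs of voters and three candidates on the spectrum.
voters = [-1.0] * 36 + [0.0] * 30 + [1.0] * 34
candidates = [-0.6, 0.0, 0.7]
print(vote_for_k_winner(voters, candidates, 1))  # 0 (same as FPTP)
print(vote_for_k_winner(voters, candidates, 2))  # 1 (the centrist)
```

With *k* equal to the number of candidates, every candidate gets the same tally and the result carries no information, which is the extreme case of the degradation at high *k*.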

The 23 real-world binding approval voting elections I’ve found so far have from 1.1 to 3.1 approvals per ballot per seat (A/B/S), but most are closer to 1 than 3. It’s also not known how A/B/S is distributed across voters in these elections, and I would imagine that this affects the winner distributions. (To do: Test and find out.)

The only within-election A/B/S distributions I’ve found are from non-binding polls, so they should be taken with a grain of salt. Unsurprisingly, they don’t match the Vote-for-*k* model: different voters approve of different numbers of candidates, with distributions that look like this:

How to model this for simulations? I don’t know. TBD.

**To do:**

- Clean up table of real-world approval elections
- Add references for all of this
- Put the labels on the right side of the graphs, next to each line

[All images created by me, and licensed as CC0, unless otherwise specified]