Comparing the penalty-taking abilities of men and women.
I recently came across a dataset of historic penalty performance for a variety of football matches over the past twenty or so years. The data includes both men’s and women’s penalties, so I thought it might be interesting to compare and contrast the two, particularly given the increase in popularity of women’s football over the last few years. The code for this analysis can be found on my GitHub:
The women’s success rate varied quite strongly over time, while the men’s was relatively consistent in comparison. This led to a few further insights:
The first step of the analysis was to evaluate the mean conversion rate for each gender over every year in the dataset as shown below.
year | mean_male | num_penalties_male | scored_male | mean_female | num_penalties_female | scored_female |
---|---|---|---|---|---|---|
2003 | 1.000000 | 73 | 73 | 1.000000 | 8 | 8 |
2007 | 0.820896 | 536 | 440 | 1.000000 | 7 | 7 |
2008 | 0.852273 | 528 | 450 | 1.000000 | 7 | 7 |
2009 | 0.819113 | 586 | 480 | 1.000000 | 12 | 12 |
2010 | 0.787572 | 692 | 545 | 1.000000 | 18 | 18 |
2011 | 0.795518 | 714 | 568 | 0.885714 | 70 | 62 |
2012 | 0.782421 | 694 | 543 | 1.000000 | 40 | 40 |
2013 | 0.801460 | 685 | 549 | 0.804348 | 46 | 37 |
2014 | 0.754941 | 759 | 573 | 1.000000 | 19 | 19 |
2015 | 0.757616 | 755 | 572 | 0.929577 | 71 | 66 |
2016 | 0.767492 | 929 | 713 | 1.000000 | 51 | 51 |
2017 | 0.775408 | 797 | 618 | 0.880000 | 125 | 110 |
2018 | 0.772242 | 843 | 651 | 0.844660 | 206 | 174 |
2019 | 0.788560 | 979 | 772 | 0.854839 | 310 | 265 |
2020 | 0.823759 | 1027 | 846 | 0.831169 | 231 | 192 |
2021 | 0.784272 | 941 | 738 | 0.778443 | 334 | 260 |
2022 | 0.744845 | 776 | 578 | 0.699029 | 309 | 216 |
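As a rough illustration of how a table like this can be built, here is a minimal sketch that groups per-penalty records by year and gender. The record layout `(year, gender, scored)`, the helper name `yearly_conversion`, and the sample values are all assumptions made for the example; they are not taken from the original dataset or code.

```python
from collections import defaultdict

# Hypothetical per-penalty records: (year, gender, scored).
# These values are made up purely to exercise the function.
records = [
    (2021, "male", True),
    (2021, "male", True),
    (2021, "male", True),
    (2021, "male", False),
    (2021, "female", True),
    (2021, "female", False),
    (2022, "male", True),
    (2022, "male", False),
]


def yearly_conversion(records):
    """Return {(year, gender): (mean, num_penalties, scored)}."""
    counts = defaultdict(lambda: [0, 0])  # [num_penalties, scored]
    for year, gender, scored in records:
        counts[(year, gender)][0] += 1
        counts[(year, gender)][1] += int(scored)
    return {key: (s / n, n, s) for key, (n, s) in counts.items()}


summary = yearly_conversion(records)
for (year, gender), (mean, n, scored) in sorted(summary.items()):
    print(year, gender, f"{mean:.6f}", n, scored)
```

Each group yields the same three quantities as the table columns: the mean conversion rate, the number of penalties, and the number scored.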
Plotting these results (including standard deviations, which are omitted from the table) we see some interesting behaviour. The simplest feature to see is that after about $2007$ the male conversion rate settles around $80\%$. Looking at the number of penalties, we can be reasonably confident in this figure, as only $2003$ has a small number of penalties; every remaining year has in excess of $500$ penalties taken by men. On the other hand, the women’s figure is initially very noisy, which is likely a consequence of having relatively few penalties in the data. Only from $2017$ onwards are more than $100$ women’s penalties included per year. One important feature of this data is that women’s league data is included from $2018$; even before that, however, the women’s conversion rate appears to approach the men’s once enough penalties are included, and adding the league data seems to strengthen this trend.
This plot shows that after about $2007$ the male conversion rate is stable, while the female rate is noisy. The two rates are similar after $2018$, indicating a slight change in behaviour. The change in the female rate appears to begin at about $2016$; however, the volume of data in $2016$ and $2017$ is still smaller than from $2018$ onwards, so we use $2018$ as the point of demarcation, especially since this is when the women’s league data is included.
Prior to $2018$ the women’s standard deviations are much less consistent, which could be due to the relative scarcity of data compared with the male dataset (in some years there are more than $10\times$ as many male penalties as female penalties). Once the number of penalties recorded is on the same scale, the deviations look similar.
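To make the effect of sample size concrete, one option is the binomial standard error of a proportion, $\sqrt{p(1-p)/n}$. This is a sketch under that assumption and is not necessarily the deviation shown in the plot; the function name `proportion_standard_error` is hypothetical, while the counts are taken from the table above.

```python
import math


def proportion_standard_error(scored, num_penalties):
    """Binomial standard error of a conversion rate: sqrt(p * (1 - p) / n)."""
    p = scored / num_penalties
    return math.sqrt(p * (1 - p) / num_penalties)


# (scored, num_penalties) pairs copied from the yearly table.
women_2011 = proportion_standard_error(62, 70)
men_2011 = proportion_standard_error(568, 714)
women_2019 = proportion_standard_error(265, 310)
men_2019 = proportion_standard_error(772, 979)

print(f"2011: women {women_2011:.3f}, men {men_2011:.3f}")
print(f"2019: women {women_2019:.3f}, men {men_2019:.3f}")
```

For $2011$ this gives roughly $0.038$ for women against $0.015$ for men, and the gap narrows by $2019$ as the number of women’s penalties grows. Note that this estimate is exactly zero in years with a $100\%$ conversion rate, which understates the real uncertainty when only a handful of penalties were recorded.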
In summary, women’s performance shifts between the period before $2018$ and the period from $2018$ onwards.
Male performance is roughly similar throughout. In the period up to $2018$, the women’s conversion rate moves noisily between $0.8$ and $1.0$, which is up to about $20$ percentage points higher than the typical male conversion rate.
This large difference does suggest that behaviour is different between male and female takers prior to $2018$.
We employ hypothesis testing to answer the question more emphatically: is there a difference in penalty conversion rates between men and women? The attributes we test for independence are `gender` and `conversion_rate` in the original data. If `conversion_rate` is statistically independent of `gender`, then we would expect the conversion rates to be similar, irrespective of whether `gender == male` or `gender == female`.
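One way such an independence test could be run is a Pearson chi-squared test on a $2\times2$ table of gender against scored or missed. The sketch below is an assumption about the method, not the test used in the original analysis: it pools the yearly table into two periods split at $2018$, ignores any year-to-year effects, and applies no continuity correction. The helper names `period_totals` and `chi2_independence_2x2` are hypothetical.

```python
import math

# (year, male_penalties, male_scored, female_penalties, female_scored),
# copied from the yearly table.
rows = [
    (2003, 73, 73, 8, 8), (2007, 536, 440, 7, 7), (2008, 528, 450, 7, 7),
    (2009, 586, 480, 12, 12), (2010, 692, 545, 18, 18), (2011, 714, 568, 70, 62),
    (2012, 694, 543, 40, 40), (2013, 685, 549, 46, 37), (2014, 759, 573, 19, 19),
    (2015, 755, 572, 71, 66), (2016, 929, 713, 51, 51), (2017, 797, 618, 125, 110),
    (2018, 843, 651, 206, 174), (2019, 979, 772, 310, 265), (2020, 1027, 846, 231, 192),
    (2021, 941, 738, 334, 260), (2022, 776, 578, 309, 216),
]


def period_totals(rows, first_year, last_year):
    """Sum (male_scored, male_taken, female_scored, female_taken) over a period."""
    chosen = [r for r in rows if first_year <= r[0] <= last_year]
    return (
        sum(r[2] for r in chosen), sum(r[1] for r in chosen),
        sum(r[4] for r in chosen), sum(r[3] for r in chosen),
    )


def chi2_independence_2x2(scored_a, taken_a, scored_b, taken_b):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)."""
    observed = [
        [scored_a, taken_a - scored_a],
        [scored_b, taken_b - scored_b],
    ]
    total = taken_a + taken_b
    row_totals = [taken_a, taken_b]
    col_totals = [scored_a + scored_b, total - scored_a - scored_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


pre_totals = period_totals(rows, 2003, 2017)
post_totals = period_totals(rows, 2018, 2022)
chi2_pre, p_pre = chi2_independence_2x2(*pre_totals)
chi2_post, p_post = chi2_independence_2x2(*post_totals)

print(f"up to 2017:   chi2 = {chi2_pre:.2f}, p = {p_pre:.3g}")
print(f"2018 onwards: chi2 = {chi2_post:.2f}, p = {p_post:.3g}")
```

On these pooled totals, the p-value for the period up to $2017$ falls far below the conventional $0.05$ threshold, while the p-value from $2018$ onwards is well above it, so independence of gender and conversion is rejected only for the earlier period.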
In conclusion, although it may have historically seemed that women had a better conversion rate, more recent evidence is not strong enough to maintain this position.