Monday, June 25, 2018

The World Cup in Numbers

By Dr. Patrick English – World Cup Data Hub ‘Chief of Stats’

As we move today into round three of the group stage games, let’s take a look at the numbers so far. With 32 games now played, this blog post looks at the key data from both rounds of group stage games, how successful different formations and styles of play have been (and which kinds of teams have most often been using them), and finally what our regression model currently suggests best predicts a winning team at this tournament.

Firstly, round two of the group stage has been characterised by more goals than round one, fewer draws, and far more wins for higher ranked teams. In round one, teams scored an average of 1.2 goals per team, per match. At the conclusion of round two, this figure had increased to 1.5 goals per team, per game on average. We’ve also seen two fewer draws in the second round than the first, with the number of matches ending in shared points falling from six to four. Finally, while in round one top ranked teams (pot one) recorded only two victories, in round two there were six wins for the highest ranked teams in the competition. So perhaps we can, with some notable exceptions, suggest that round two marked a shift toward more ‘business as usual’ in terms of footballing hierarchy?

Secondly, three-defender formations appear to be enjoying a considerable amount of success at the tournament so far. Of those formations used more than once, five have a win rate of 50% or more. Of those five, two feature a back three and their combined success rate is 66%. The most successful four-defender formations have been the “4-2-3-1” (54% win rate), and the “4-3-3” and “4-4-2” (each on 50%).
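For readers who want to see how a win rate per formation can be worked out, here is a minimal sketch. The rows in `records` are hypothetical and are not taken from the Data Hub dataset, and the `win_rates` helper and its `min_uses` parameter are names invented for this illustration. It simply counts wins and uses per formation and ignores formations used only once, mirroring the "used more than once" filter above.

```python
from collections import defaultdict

# Hypothetical records: (formation, result) for one team in one match.
# Illustrative only; these are NOT rows from the Data Hub dataset.
records = [
    ("3-4-2-1", "win"),
    ("3-4-2-1", "win"),
    ("3-4-2-1", "loss"),
    ("4-2-3-1", "win"),
    ("4-2-3-1", "draw"),
    ("4-4-2", "win"),
    ("4-4-2", "loss"),
    ("5-4-1", "loss"),
]


def win_rates(rows, min_uses=2):
    """Return {formation: win rate} for formations used at least min_uses times."""
    wins = defaultdict(int)
    uses = defaultdict(int)
    for formation, result in rows:
        uses[formation] += 1
        if result == "win":
            wins[formation] += 1
    # Draws and losses both count against the win rate.
    return {f: wins[f] / uses[f] for f in uses if uses[f] >= min_uses}


rates = win_rates(records)
for formation, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{formation}: {rate:.0%}")
```

Note that a draw lowers the win rate in this sketch, which is one reason a formation can sit on exactly 50%.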


According to our data, direct attacking remains the most successful style of play, but only just ahead of possession football, with win rates of 57% and 55% respectively. Interestingly, the “4-2-3-1” formation appears to be the best option when it comes to beating higher ranked teams, while the attack-minded “4-3-3” (as well as a traditional “4-4-2”) appears to be good at doing the job against lower ranked sides.

Some strong, but perhaps quite logical, stories have been developing regarding the usage of formations and styles of play by higher and lower ranked teams at this year’s World Cup. For instance, teams in pot one have used a total of seven different formations so far, with “4-2-3-1” and “3-4-2-1” the most frequently deployed.


Conversely, teams in pot four have used a total of six formations and have been much more likely to field solid back fours, lone strikers, and anchoring defensive midfielders in their games so far this tournament. Interestingly, though, the most attacking formations seem to be coming from teams in pot two, with the “4-3-3” used by teams in this seeding group a total of five times.


Teams in pot four have also been using a ‘balanced’ style of play in over half of their total games, while the same is true for the ‘possession’ focus for teams in pots one and two. Teams in pot three have used a ‘defensive’ approach the most – a total of four times.  

Also, as was discussed in the inaugural World Cup Data Hub podcast, there seems to be a strong connection between a higher team FIFA World Ranking and the ability to get more shots on target and at a better rate of accuracy than their opponents. The graph below demonstrates a clear trend between fewer shots on target and lower team ranking. This, combined with the shooting accuracy statistics, suggests that efficiency and accuracy in shooting are a strong component of being a successful footballing team on the world stage.


Finally, the logit regression model, a statistical tool used to work out which combinations of teams and tactics are associated with winning games, is reporting some very interesting findings. Generally speaking, it brings much of the above together into one simple story, which is exactly what regressions are so good for. According to this analysis, the strongest current predictors of a winning team at the World Cup are being in pot two (though the results also suggest that, relative to pot four teams, pot one and pot three sides are also better at winning), playing with fewer defenders, and facing fewer shots.
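For anyone unfamiliar with logit models, the general shape of such a model can be sketched as below. This is an illustration only: the predictor names are assumptions chosen to match the findings described above, with pot four as the reference category, and the Data Hub's actual specification may include other variables.

```latex
\[
\Pr(\mathrm{win}_i = 1) = \frac{1}{1 + e^{-\eta_i}}, \qquad
\eta_i = \beta_0 + \beta_1\,\mathrm{Pot1}_i + \beta_2\,\mathrm{Pot2}_i + \beta_3\,\mathrm{Pot3}_i
       + \beta_4\,\mathrm{Defenders}_i + \beta_5\,\mathrm{ShotsFaced}_i
\]
```

Read this way, the findings above would correspond roughly to positive pot coefficients (largest for pot two) and negative coefficients on the number of defenders and on shots faced.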


The first result highlights how, as well as teams in pot four, many of the highest ranked teams in the competition have been struggling at this year’s tournament: Poland and Argentina (0 wins) spring to mind, but also Portugal and Brazil, who currently have only one win each. Conversely, pot two teams have been doing very well (Croatia, England, Mexico and Uruguay spring to mind). The latter two factors, I think, are quite closely connected; thinking back to last week’s podcast, this has very much been a growing theme at the World Cup and perhaps highlights something about the kind of chances and approach play that teams will have against opposition units fielding three centre backs. Teams playing against a back three are probably more likely to exploit the flanks and focus on crosses, with a packed-out midfield in front of the defence and three centre backs perhaps resulting in fewer direct shooting opportunities.

So, there we have it! The World Cup so far in numbers, stats, and data. Stay tuned for the next podcast (releasing tomorrow) and for further blogs from the World Cup Data Hub team.

Sunday, June 17, 2018

The World Cup Data Hub Launch and Explainer

The Sheffield Methods Institute and Q-Step Centre are launching our World Cup Data Hub, which will collect, analyse, and present a huge range of data and statistics from the 2018 men’s World Cup.

The World Cup Data Hub will be your one-stop shop for stats, data, key figures and team by team analysis over the next four weeks as the World Cup unfolds.

This blog post explores in detail the type of data we are collecting and what it all means.

1.    Formations 

Tactics and how teams are using them to varying degrees of success will be central to the data and analysis provided by the Data Hub. One of the key elements of any tactical plan is the formation that a team uses. How many defenders do they play? Midfielders? Attackers? How advanced is their midfield? Will five defenders really mean a solid back five or three holding the fort while two wing backs push on high up the field? We will be recording how teams line up, such as whether they use a 4-4-2 formation, or a 5-4-1, or something more adventurous such as a 3-2-3-2. We will also record the exact number of players fielded in each position to analyse the relationship between players fielded and match statistics (do more defenders mean, on average, fewer goals conceded, for example?). Our data will group match statistics and results achieved by each formation played and try to determine which formations have been most successful, and indeed least successful, in this year’s World Cup. 

For example, Croatia lined up against Nigeria with four defenders, two central midfielders, an attacking central midfielder in Rebic, and then each of Perisic, Kramaric, and Mandzukic in advanced attacking positions. The formation entered into the database was therefore a “4-3-3”. Elsewhere, both Uruguay and Portugal lined up in their matches in a “4-4-2” shape, while Saudi Arabia began their opening match with a “4-5-1” against Russia, who themselves fielded the ever-popular “4-2-3-1” formation.
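To illustrate how a formation label might be turned into the number of players in each position, here is a minimal sketch. The `position_counts` function is hypothetical, and it rests on a simplifying assumption that is not taken from the Data Hub's coding rules: the first number is read as defenders, the last as attackers, and everything in between as midfielders.

```python
def position_counts(formation):
    """Split a formation string such as "4-2-3-1" into positional counts.

    Assumption for this sketch only: the first number is defenders, the last
    is attackers, and every line in between counts as midfield.
    """
    lines = [int(part) for part in formation.split("-")]
    if len(lines) < 3:
        raise ValueError(f"expected at least three lines, got {formation!r}")
    if sum(lines) != 10:
        raise ValueError(f"outfield players must total 10, got {sum(lines)}")
    return {
        "defenders": lines[0],
        "midfielders": sum(lines[1:-1]),
        "attackers": lines[-1],
    }


for shape in ("4-4-2", "5-4-1", "3-2-3-2", "4-2-3-1"):
    print(shape, position_counts(shape))
```

A real coding scheme would need a judgement call for shapes like the “4-2-3-1”, where the advanced three could reasonably be counted as either midfielders or attackers.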

2.    Playing Styles 

As well as formations, the way that teams approach the game in terms of their playing style and philosophy is also very important. Will teams focus on dominating possession and patiently awaiting the opportunity to score? Or will they operate a defensive game, staying solid and hoping to give nothing away? Again, we will be collecting information on how every team approaches each game in their World Cup campaign, looking to see which approaches will be most or least successful. Approaches will be categorised into one of the following five (with examples provided):

Defensive
Teams focus on remaining solid and hard to break down, will keep 10 or 11 players behind the ball when not in possession, and will generally not commit too many players forward in attacks at any great speed; there is not much risk in the game plan. For example, both Australia and Iceland operated with defensive approaches in their games against France and Argentina respectively.

Counter-attacking
Teams will play largely within their own half and defend in numbers but equally will show strong intent on attacking their opponents (usually quickly and directly) when the opportunity arises, committing plenty of bodies forward. For example, Portugal played a very strong counter-attacking game against Spain in their opening match.

Balanced
Teams will do a little bit of everything, focusing equally on defending in a good shape and not pushing their defensive line or pressing too high, but also maintaining some possession and attacking for significant portions of the game. For example, Denmark used a balanced approach in their game against Peru, managing to keep possession and patiently fashion chances but without giving too much away in the midfield or defensive areas.

Control/Possession
Teams will focus on dominating possession and attacking their opponents in a sustained, systematic manner, committing plenty of players (including full/wing-backs) forward into the opposition half. They will typically press fairly high and operate with a fairly high line. For example, Argentina, Spain, and Saudi Arabia all used (slightly different) styles of possession football in their opening matches.

Direct Attacking
Teams will be aggressive in their attacking play and force the issue all over the pitch, pressing their opponents for possession at every opportunity (often in groups) and launching quick, direct attacks to spearheading forward players as often as possible. For example, Russia, Peru and Mexico opened their World Cup campaigns with aggressive and direct attacking play.


3.    Goals and Attempts

Goals win games and winning games wins tournaments. We will be collecting all kinds of data on goals. How many goals a team scores, how many they concede, how many shots they take and how many of those are on target, and how many shots a team faces will all be coded up. As well as this, we will record the type of goals that teams score and concede. Do teams score more headers than shots, and do the assists come from crosses or passes? We are also collecting total ball possession for each team.

For example, in their high-scoring, frantic affair, Spain and Portugal managed to score six goals between them (three apiece). Two of Portugal’s goals were direct set pieces (one penalty, one free-kick), and one was a shot resulting from a pass. Spain scored one headed goal from a set piece play (free-kick routine), and then two from winning the ball from an opponent. Each of these goals is captured individually in the database and coded accordingly. Spain enjoyed 55% of the ball, with Portugal seeing 45%.
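As a rough illustration of what coding those goals might look like, the sketch below stores the Spain v Portugal goals described above and tallies them. The tuple layout and the category labels are hypothetical and are not the Data Hub's actual coding scheme; only the counts and possession figures come from the example.

```python
from collections import Counter

# Goals from Spain v Portugal as described above. The (team, origin) layout
# and the origin labels are hypothetical, not the Data Hub's real codes.
goals = [
    ("Portugal", "direct set piece"),   # penalty
    ("Portugal", "direct set piece"),   # free-kick
    ("Portugal", "pass"),               # shot resulting from a pass
    ("Spain", "set piece routine"),     # headed goal from a free-kick routine
    ("Spain", "ball won from opponent"),
    ("Spain", "ball won from opponent"),
]

# Ball possession in percent, as reported in the example.
possession = {"Spain": 55, "Portugal": 45}

# Total goals per team, and goals per (team, origin) pair.
goals_by_team = Counter(team for team, _ in goals)
origins_by_team = Counter(goals)

for (team, origin), count in sorted(origins_by_team.items()):
    print(f"{team}: {count} x {origin}")
```

Keeping one row per goal, rather than one total per team, is what makes questions like "how many goals came from set pieces?" answerable later.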

4.    Changes

Finally, we will be collecting information on whether and how teams change their tactics and approach over the course of the match. Further, we will collect data on any enforced man-disadvantages a team faces (through a red card, or injuries after the maximum number of substitutions has been made).


All of this information is added to a statistics datafile with detailed team information, including their ranking and seeding going into the tournament. The data is then processed into six fully interactive modules which users can look at to visualise our dataset. They are as follows:

Fixation on Formation 
a)    See how different formations have been more or less successful at winning games, retaining ball possession, taking shots and scoring goals, and a range of other match outcomes and statistics.

Positional Pulls
b)   See how the number of players fielded in each main footballing position (defenders, midfielders, and attackers) is connected to winning matches, conceding and scoring goals, ball possession, and other match statistics.

Has It All Gone Potty
c)    See whether or not there are patterns in the way teams are using formations, playing approach (whether they are attacking, controlling the game, counterattacking, or 'parking the bus'), and changing their tactics across the different World Cup seeding groups.

Rank 'Em Up
d)   Use the graph to see the relationship between FIFA World Rankings and match statistics including goals scored, shots taken, shots faced, and more - are higher ranked teams posting better statistics?

Team by Team
e)   Load up a country from the menu to see detailed information on the formations, playing approach, winning rates and match statistics for every team at this year's World Cup.

Regression Analysis
f)     One for the really statsy fans out there - use the menu to load variables into a logistic regression to find out what best predicts teams winning matches over the course of the World Cup.