Sunday, June 17, 2018

The World Cup Data Hub Launch and Explainer

The Sheffield Methods Institute and Q-Step Centre are launching our World Cup Data Hub, which will collect, analyse, and present a huge range of data and statistics from the 2018 men’s World Cup.

The World Cup Data Hub will be your one-stop shop for stats, data, key figures and team-by-team analysis over the next four weeks as the World Cup unfolds.

This blog post explores in detail the type of data we are collecting and what it all means.

1.    Formations 

Tactics and how teams are using them to varying degrees of success will be central to the data and analysis provided by the Data Hub. One of the key elements of any tactical plan is the formation that a team uses. How many defenders do they play? Midfielders? Attackers? How advanced is their midfield? Will five defenders really mean a solid back five or three holding the fort while two wing backs push on high up the field? We will be recording how teams line up, such as whether they use a 4-4-2 formation, or a 5-4-1, or something more adventurous such as a 3-2-3-2. We will also record the exact number of players fielded in each position to analyse the relationship between players fielded and match statistics (do more defenders mean, on average, fewer goals conceded, for example?). Our data will group match statistics and results achieved by each formation played and try to determine which formations have been most successful, and indeed least successful, in this year’s World Cup. 

For example, Croatia lined up against Nigeria with four defenders, two central midfielders, an attacking central midfielder in Rebic, and then each of Perisic, Kramaric, and Mandzukic in advanced attacking positions. The formation entered into the database for them was therefore a “4-3-3”. By contrast, both Uruguay and Portugal lined up in their matches in a “4-4-2” shape, while Saudi Arabia began their opening match with a “4-5-1” against Russia, who themselves fielded the ever-popular “4-2-3-1” formation.
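As a rough sketch of how a formation string could be turned into the positional counts described above, the Python below splits a formation into defenders, midfielders, and attackers and then averages a match statistic by number of defenders. The function names, the rule that the first number is defence and the last is attack, and the rows passed to the averaging function are all assumptions made for illustration; they are not the Data Hub's actual coding scheme or data.

```python
from collections import defaultdict


def positional_counts(formation):
    """Split a formation string such as "4-2-3-1" into outfield counts.

    Assumption for this sketch: the first number is the defence, the last
    is the attack, and every line in between counts as midfield.
    """
    lines = [int(part) for part in formation.split("-")]
    if len(lines) < 3 or sum(lines) != 10:
        raise ValueError(f"not a ten-player outfield formation: {formation!r}")
    return {
        "defenders": lines[0],
        "midfielders": sum(lines[1:-1]),
        "attackers": lines[-1],
    }


def average_conceded_by_defenders(rows):
    """Average goals conceded, grouped by number of defenders fielded.

    `rows` is an iterable of (formation, goals_conceded) pairs.
    """
    conceded = defaultdict(list)
    for formation, goals in rows:
        conceded[positional_counts(formation)["defenders"]].append(goals)
    return {n: sum(values) / len(values) for n, values in conceded.items()}


# Formations mentioned in the post.
for shape in ["4-3-3", "4-4-2", "4-5-1", "4-2-3-1", "3-2-3-2"]:
    print(shape, positional_counts(shape))

# Placeholder rows only: these are NOT real match results.
placeholder_rows = [("4-4-2", 1), ("4-3-3", 3), ("5-4-1", 0), ("5-4-1", 2)]
print(average_conceded_by_defenders(placeholder_rows))
```

Under this rule a “4-2-3-1” counts as five midfielders and one attacker; a different rule (for example, treating the line of three as attackers) would be just as easy to encode.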

2.    Playing Styles 

As well as formations, the way that teams approach the game in terms of their playing style and philosophy is also very important. Will teams focus on dominating possession and patiently awaiting the opportunity to score? Or will they operate a defensive game, staying solid and hoping to give nothing away? Again, we will be collecting information on how every team approaches each game in their World Cup campaign, looking to see which approaches are most or least successful. Each approach will be categorised as one of the following five (with examples provided):

Defensive
Teams focus on remaining solid and hard to break down. They will keep 10 or 11 players behind the ball when not in possession and will generally not commit many players forward at any great speed, so there is little risk in the game plan. For example, both Australia and Iceland operated with defensive approaches in their games against France and Argentina respectively.

Counter-attacking
Teams will play largely within their own half and defend in numbers but equally will show strong intent on attacking their opponents (usually quickly and directly) when the opportunity arises, committing plenty of bodies forward. For example, Portugal played a very strong counter-attacking game against Spain in their opening match.

Balanced
Teams will do a little bit of everything: defending in a good shape without pushing their defensive line or pressing too high, but also maintaining some possession and attacking for significant portions of the game. For example, Denmark used a balanced approach in their game against Peru, managing to keep possession and patiently fashion chances but without giving too much away in the midfield or defensive areas.

Control/Possession
Teams will focus on dominating possession and attacking their opponents in a sustained, systematic manner, committing plenty of players (including full/wing-backs) forward into the opposition half. They will typically press fairly high and operate with a fairly high line. For example, Argentina, Spain, and Saudi Arabia all used (slightly different) styles of possession football in their opening matches.

Direct Attacking
Teams will be aggressive in their attacking play and force the issue all over the pitch, pressing their opponents for possession at every opportunity (often in groups) and launching quick, direct attacks towards their forward players as often as possible. For example, Russia, Peru and Mexico opened their World Cup campaigns with aggressive and direct attacking play.
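To make the five categories concrete, here is a minimal Python sketch that stores them as an enumeration and tallies the opening-match examples given above. The class and variable names are hypothetical, and the mapping only covers the teams named in this post, so it is an illustration rather than the Data Hub's real coding.

```python
from collections import Counter
from enum import Enum


class Approach(Enum):
    """The five playing approaches described in this post."""
    DEFENSIVE = "Defensive"
    COUNTER_ATTACKING = "Counter-attacking"
    BALANCED = "Balanced"
    CONTROL_POSSESSION = "Control/Possession"
    DIRECT_ATTACKING = "Direct Attacking"


# Opening-match approaches, taken only from the examples in the text above.
OPENING_MATCH_APPROACH = {
    "Australia": Approach.DEFENSIVE,
    "Iceland": Approach.DEFENSIVE,
    "Portugal": Approach.COUNTER_ATTACKING,
    "Denmark": Approach.BALANCED,
    "Argentina": Approach.CONTROL_POSSESSION,
    "Spain": Approach.CONTROL_POSSESSION,
    "Saudi Arabia": Approach.CONTROL_POSSESSION,
    "Russia": Approach.DIRECT_ATTACKING,
    "Peru": Approach.DIRECT_ATTACKING,
    "Mexico": Approach.DIRECT_ATTACKING,
}

# Count how many of the example teams used each approach.
approach_counts = Counter(OPENING_MATCH_APPROACH.values())
for approach in Approach:
    print(f"{approach.value}: {approach_counts[approach]}")
```

Using a fixed enumeration rather than free text means a misspelt category fails loudly instead of quietly creating a sixth group.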


3.    Goals and Attempts

Goals win games and winning games wins tournaments. We will be collecting all kinds of data on goals. How many goals a team scores, how many they concede, how many shots they take and how many of those are on target, and how many shots a team faces will all be coded up. We will also record the type of goals that teams score and concede. Do teams score more from headers than from shots, and do the assists come from crosses or passes? In addition, we are collecting total ball possession for each team.

For example, in their high-scoring, frantic affair, Spain and Portugal managed to score six goals between them (three apiece). Two of Portugal’s goals came directly from set pieces (one penalty, one free-kick), and one was a shot resulting from a pass. Spain scored one headed goal from a set piece play (a free-kick routine), and then two from winning the ball from an opponent. Each of these goals is recorded individually in the database and coded accordingly. Spain enjoyed 55% of the ball, with Portugal seeing 45%.
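The Spain v Portugal example could be coded along the following lines. This is only a sketch: the `Goal` record, its field names, and the category labels are assumptions chosen for illustration, and the goals are listed in the order this post describes them rather than the order they were scored.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    team: str
    origin: str           # how the chance arose (hypothetical labels)
    headed: bool = False  # True for a headed finish


# Goals as described in the paragraph above.
SPAIN_PORTUGAL_GOALS = [
    Goal("Portugal", "direct set piece"),          # penalty
    Goal("Portugal", "direct set piece"),          # free-kick
    Goal("Portugal", "pass"),                      # shot resulting from a pass
    Goal("Spain", "set piece play", headed=True),  # free-kick routine
    Goal("Spain", "won from opponent"),
    Goal("Spain", "won from opponent"),
]

# Ball possession in percent, as reported above.
POSSESSION = {"Spain": 55, "Portugal": 45}

goals_by_team = Counter(goal.team for goal in SPAIN_PORTUGAL_GOALS)
goals_by_origin = Counter((goal.team, goal.origin) for goal in SPAIN_PORTUGAL_GOALS)
headed_goals = sum(goal.headed for goal in SPAIN_PORTUGAL_GOALS)

print(goals_by_team)
print(goals_by_origin)
print("Headed goals:", headed_goals)
```

Storing one row per goal keeps the raw coding intact, so totals by team, by origin, or by finish type can all be derived later without re-entering anything.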

4.    Changes

Finally, we will be collecting information on whether and how teams change their tactics and approach over the course of the match. Further, we will collect data on any enforced man-disadvantages a team faces (through a red card, or through injuries after the maximum number of substitutions has been made).


All of this information is added to a statistics data file with detailed team information, including each team's ranking and seeding going into the tournament. The data is then processed into six fully interactive modules that users can explore to visualise our dataset. They are as follows:

a) Fixation on Formation
See how different formations have been more or less successful at winning games, retaining ball possession, taking shots and scoring goals, and a range of other match outcomes and statistics.

b) Positional Pulls
See how the number of players fielded in each main footballing position (defenders, midfielders, and attackers) is connected to winning matches, conceding and scoring goals, ball possession, and other match statistics.

c) Has It All Gone Potty
See whether or not there are patterns in the way teams are using formations, playing approach (whether they are attacking, controlling the game, counter-attacking, or 'parking the bus'), and changing their tactics across the different World Cup seeding groups.

d) Rank 'Em Up
Use the graph to see the relationship between FIFA World Rankings and match statistics including goals scored, shots taken, shots faced, and more. Are higher-ranked teams posting better statistics?

e) Team by Team
Load up a country from the menu to see detailed information on the formations, playing approach, winning rates and match statistics for every team at this year's World Cup.

f) Regression Analysis
One for the real stats fans out there: use the menu to load variables into a logistic regression to find out what best predicts teams winning matches over the course of the World Cup.
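
For readers curious about what a logistic regression does under the hood, the Python sketch below fits a one-variable model by plain gradient descent. It is not the Data Hub's implementation: the function names are hypothetical, the choice of shots on target as the predictor is an assumption, and the numbers are made-up placeholder values rather than World Cup results.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(win) = sigmoid(intercept + slope * x) by gradient descent."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_intercept = 0.0
        grad_slope = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-loss for one observation.
            error = sigmoid(intercept + slope * x) - y
            grad_intercept += error
            grad_slope += error * x
        intercept -= learning_rate * grad_intercept / n
        slope -= learning_rate * grad_slope / n
    return intercept, slope


# Placeholder data only: NOT real match statistics.
shots_on_target = [1, 2, 2, 3, 4, 5, 6, 7]
won = [0, 0, 1, 0, 1, 0, 1, 1]

intercept, slope = fit_logistic(shots_on_target, won)
for shots in (1, 4, 7):
    probability = sigmoid(intercept + slope * shots)
    print(f"{shots} shots on target -> estimated win probability {probability:.2f}")
```

A positive slope means that, in this toy data, more shots on target go with a higher estimated chance of winning; with real data the same reading applies to whichever variables are loaded into the model.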