Load some fun and essential libraries.

suppressMessages(library(dplyr))
suppressMessages(library(tidyr))
suppressMessages(library(readxl))
suppressMessages(library(tinytex))
suppressMessages(library(plotly))
suppressMessages(library(lubridate))
suppressMessages(library(htmltools))
suppressMessages(library(DT))

Revenues

We took revenue data from the fabulous work released each year by Deloitte with their Deloitte Money Football League. We painstakingly pored over each year’s leaderboard and compiled into Excel, which is presented here as CSV. We have no reason to doubt anything produced by Deloitte, however, we did do a couple of crosschecks against some filings with Companies House. We are content to rely on Deloitte’s work and thank them for it.

Although we are dealing with a review of Liverpool and Manchester City, all teams have been recorded in our csv file.

Clicking on the link below will take you to the Deloitte Football Money League home and contains some excellent and informative reads over the years they have compiled this data.

Deloitte Football Money League

Let’s grab the data we prepared. The read.csv() function will create an R data frame for us with the contents of the csv file we specify. A data frame is an R object similar in structure to a traditional table.

R uses the ‘<-’ assignment operator. There are others but we will not complicate the conversation. Best practice typically recommends <- for assignment.

Football.Revenues <- read.csv(file="footiedata/FootballRevenues.csv", header=TRUE, sep=",")
## This creates an object (in this case a data.frame) named 'Football.Revenues' from the csv file.

head(Football.Revenues, n=10)
##       Season Rank Identifier                Club Matchday Broadcast Commercial Total
## 1  2017/2018    1         RM         Real Madrid    127.1     222.6      315.5 665.2
## 2  2017/2018    2        FCB        FC Barcelona    128.3     197.5      285.8 611.6
## 3  2017/2018    3       MUFC   Manchester United    105.9     204.1      280.0 590.0
## 4  2017/2018    4         BM       Bayern Munich     92.0     156.5      308.9 557.4
## 5  2017/2018    5       MCFC     Manchester City     56.6     211.5      235.4 503.5
## 6  2017/2018    6        PSG Paris Saint-Germain     89.2     113.2      277.5 479.9
## 7  2017/2018    7        LFC        Liverpool FC     81.2     222.6      151.3 455.1
## 8  2017/2018    8        CFC             Chelsea     73.9     204.2      169.9 448.0
## 9  2017/2018    9        ARS             Arsenal     98.9     183.3      106.9 389.1
## 10 2017/2018   10       THFC   Tottenham Hotspur     75.5     200.7      103.2 379.4
## We call the new object 'Football.Revenues to obtain a visualization and display only top 10.
## Calling str() 'structure' of an object displays information about the object and contents.
str(Football.Revenues)
## 'data.frame':    220 obs. of  8 variables:
##  $ Season    : chr  "2017/2018" "2017/2018" "2017/2018" "2017/2018" ...
##  $ Rank      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Identifier: chr  "RM" "FCB" "MUFC" "BM" ...
##  $ Club      : chr  "Real Madrid" "FC Barcelona" "Manchester United" "Bayern Munich" ...
##  $ Matchday  : num  127.1 128.3 105.9 92 56.6 ...
##  $ Broadcast : num  223 198 204 156 212 ...
##  $ Commercial: num  316 286 280 309 235 ...
##  $ Total     : num  665 612 590 557 504 ...
## class() will show the type of object.
class(Football.Revenues)
## [1] "data.frame"

We will also be using custom palette for plots leveraging club colors. We are generating this global variable “footie_palette” to reference in all our plots.

footie_palette <- c(rgb(200,16,46, maxColorValue = 255), rgb(108,171,221, maxColorValue = 255))
footie_palette <- setNames(footie_palette, c("LFC", "MCFC"))
LabelTags <- c("Liverpool FC", "Manchester City")
LabelTags <- setNames(LabelTags, c("LFC", "MCFC"))

We are going to visualize total revenues for Liverpool and Manchester City to get a feel for context and orders of magnitude. The goal is to filter the data.frame Football.Revenues to work with only LFC and MCFC data. The filter() function is a component of library(dplyr).

LFCvMCFC.Revenues <- # assignment to an object.
 filter(Football.Revenues, Identifier %in% 
          c("LFC","MCFC"))
# we like to use %in% but there are other methods.
 
LFCvMCFC.Revenues
##       Season Rank Identifier            Club Matchday Broadcast Commercial Total
## 1  2017/2018    5       MCFC Manchester City     56.6     211.5      235.4 503.5
## 2  2017/2018    7        LFC    Liverpool FC     81.2     222.6      151.3 455.1
## 3  2016/2017    5       MCFC Manchester City     51.9     203.5      198.1 453.5
## 4  2016/2017    9        LFC    Liverpool FC     68.8     156.8      138.9 364.5
## 5  2015/2016    5       MCFC Manchester City     52.5     161.4      178.7 392.6
## 6  2015/2016    9        LFC    Liverpool FC     56.8     125.7      119.5 302.0
## 7  2014/2015    6       MCFC Manchester City     43.4     135.4      173.8 352.6
## 8  2014/2015    9        LFC    Liverpool FC     57.1     124.6      116.4 298.1
## 9  2013/2014    6       MCFC Manchester City     47.5     133.2      165.8 346.5
## 10 2013/2014    9        LFC    Liverpool FC     51.0     101.0      103.8 255.8
## 11 2012/2013    6       MCFC Manchester City     39.6      88.4      143.0 271.0
## 12 2012/2013   12        LFC    Liverpool FC     44.6      63.9       97.7 206.2
## 13 2011/2012    7       MCFC Manchester City     30.8      88.2      112.1 231.1
## 14 2011/2012    9        LFC    Liverpool FC     45.2      63.3       80.2 188.7
## 15 2010/2011    9        LFC    Liverpool FC     40.9      65.3       77.4 183.6
## 16 2010/2011   12       MCFC Manchester City     26.6      68.8       57.8 153.2
## 17 2009/2010    8        LFC    Liverpool FC     42.9      79.5       62.1 184.5
## 18 2009/2010   11       MCFC Manchester City     24.4      54.0       46.7 125.1
## 19 2008/2009    7        LFC    Liverpool FC     42.5      74.6       67.7 184.8
## 20 2008/2009   19       MCFC Manchester City     20.8      48.2       18.0  87.0
## 21 2007/2008    7        LFC    Liverpool FC     39.2      76.3       51.5 167.0
## 22 2007/2008   20       MCFC Manchester City     18.5      43.3       20.5  82.3

We can see that the list is now filtered the way we like. Let’s summarize and plot a visual of total revenues over the period in question.

Revenue.Totals <-
  LFCvMCFC.Revenues %>% # we will discuss the "pipe" %>% below.
  group_by(Club) %>%
    summarize(Total = sum(Total))
Revenue.Totals
## # A tibble: 2 × 2
##   Club            Total
##   <chr>           <dbl>
## 1 Liverpool FC    2790.
## 2 Manchester City 2998.

A decent amount of money, then.

Revenue.Pie <-
  
  plot_ly(
    data = Revenue.Totals,
    labels = ~Club,
    values = ~Total,
    type = 'pie',
    marker = list(colors = footie_palette)) %>% #note piping '%>%' the plot to layout.
  
layout(title = 'Total Revenue LFC v MCFC - 2007/2008 to 2017/2018 (£ Millions)')

Revenue.Pie

What’s £200 Million between friends? So, from the 2007/2008 season up to the 2017/2018 season, Manchester City officially recorded a smidgeon under £3 Billion in revenues while Liverpool recroded £2.79 Billion. One would not think this to be an astronomical advantage. But let’s see how the numbers look over time.

Revenue.Plot <-
  
  plot_ly(
    data = LFCvMCFC.Revenues,
    x = ~Season,
    y = ~Total,
    type = "scatter",
    mode = "lines",
    color = ~Identifier,
    colors = footie_palette) %>%
  
  layout(legend = list(x = 0.05, y = 0.95), #sometimes we move the legend around
      title = "LFC & MCFC Annual Revenue",
         yaxis = list(title = '(£ Millions)'))

Revenue.Plot

This is an interesting graphic. Liverpool booked more revenue than Manchester City from the 2007/2008 season until 2010/2011 season. From 2011/2012 onwards, Manchester City has steadily outperformed Liverpool. Although, for 2017/2018 it appears Liverpool have shrunk the gap somewhat. Whether this becomes a future trend is impossible to say but worth watching. In any event, it is clear that over 11 seasons, the overall revenue for these two teams had a steep incline. No doubt this is a large reflection of the tv deals signed by the Premier League over the past decade.

We do not expect to see a striking visualization but let’s check and see how each compares on the three major categories of revenue.

Stream.Summary <-
  LFCvMCFC.Revenues %>%
  group_by(Club)%>%
  summarize(Matchday = sum(Matchday), Commercial = sum(Commercial), Broadcast = sum(Broadcast))


Stream.Summary
## # A tibble: 2 × 4
##   Club            Matchday Commercial Broadcast
##   <chr>              <dbl>      <dbl>     <dbl>
## 1 Liverpool FC        570.      1066.     1154.
## 2 Manchester City     413.      1350.     1236.
Matchday.Plot <-
  plot_ly(
    data = LFCvMCFC.Revenues,
    x = ~Season,
    y = ~Matchday,
    type = 'scatter',
    mode = 'lines',
    color = ~Identifier,
    colors = footie_palette,
    showlegend = FALSE
  ) %>%
  
  layout(annotations = list(x = 0.2 , y = .95, text = "Matchday", showarrow = F, 
xref='paper', yref='paper'))

Broadcast.Plot <-
  plot_ly(
    data = LFCvMCFC.Revenues,
    x = ~Season,
    y = ~Broadcast,
    type = 'scatter',
    mode = 'lines',
    color = ~Identifier,
    colors = footie_palette,
    showlegend = FALSE
  ) %>%

layout(annotations = list(x = 0.4 , y = .95, text = "Broadcast", showarrow = F, 
xref='paper', yref='paper'))
  
Commercial.Plot <-
  plot_ly(
    data = LFCvMCFC.Revenues,
    x = ~Season,
    y = ~Commercial,
    type = 'scatter',
    mode = 'lines',
    color = ~Identifier,
    colors = footie_palette
  ) %>%
  
  
layout(annotations = list(x = 0.6 , y = .95, text = "Commercial", showarrow = F, 
xref='paper', yref='paper'))

Stream.Plot <- 
  subplot(Matchday.Plot, Broadcast.Plot, Commercial.Plot, shareY = TRUE) %>%
 
layout(title = "LFC & MCFC Revenue Categories",
      yaxis = list(title = '(£ Millions)')
)

Stream.Plot

That certainly tells a story.

  1. Matchday revenue is less relied upon than broadcast or commercial. After some remodelling at Anfield, Liverpool has opened an advantage.
  2. Broadcast revenue has seen huge growth and jumps can be seen every 3 years as new broadcast contracts are negotiated. Both teams are relatively even here with some fluctuation based on Champions League participation and other factors.
  3. Commercial revenue is significantly disparate. The latest figures show Manchester City a whopping £84 Million ahead for 2017/2018. While Liverpool is also growing its commercial revenue, Manchester City have held this advantage for some time now.

One might suspect the club receives generous arrangements from sister companies within the family. However, we will have to see if this sort of data is available for review.

Player Transfers

Grabbing the data. The read_xlsx() function allows us to auto-generate an R data.frame directly from an existing MS Excel file. There are similar functions, available through R and packages, for other file types such as read.csv(). read_xlsx() is a function found within the library(readxl).

LFCvMCFC <- read_xlsx("footiedata/LFCVMCFC.xlsx")

The data enclosed was produced using information on transferleague.co.uk, transfermarkt.com/.co.uk, wikipedia and club/fan sites to help clarify some questions.

In most cases, transfer amounts are as listed on transferleague.co.uk and player attributes primarily sourced on transfermarkt. Some notes:

  1. Alex Manninger was removed from the data.
  2. Danny Ings transfer value set at tribunal April 2016.
  3. Removed Lawrence Vigouroux transaction between U20 sides.
  4. Fábio Aurélio re-signed with Liverpool in August 2010. This transaction was removed and not considered to be a transfer.
  5. Rabbi Matonda transaction removed. Research shows only sale to Schalke and no evidence of large fee transaction incoming for MCFC.

A quick look at the summary data:

#display summary
summary(LFCvMCFC)
##       Date                            Club           Player_Purchased        Fee          Birthdate                       Age_at_Transfer
##  Min.   :2008-07-01 00:00:00.00   Length:164         Length:164         Min.   : 0.00   Min.   :1974-04-12 00:00:00.000   Min.   :15.28  
##  1st Qu.:2010-07-01 00:00:00.00   Class :character   Class :character   1st Qu.: 1.65   1st Qu.:1984-06-04 00:00:00.000   1st Qu.:21.17  
##  Median :2012-08-31 00:00:00.00   Mode  :character   Mode  :character   Median : 8.25   Median :1989-10-02 00:00:00.000   Median :24.29  
##  Mean   :2013-02-25 14:38:02.93                                         Mean   :13.98   Mean   :1988-10-13 19:01:27.804   Mean   :24.39  
##  3rd Qu.:2015-07-29 06:00:00.00                                         3rd Qu.:22.00   3rd Qu.:1993-03-10 18:00:00.000   3rd Qu.:27.01  
##  Max.   :2019-07-09 00:00:00.00                                         Max.   :75.00   Max.   :2000-07-23 00:00:00.000   Max.   :35.33  
##      From              League            Position         International     
##  Length:164         Length:164         Length:164         Length:164        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 

Quick points of interest are median transfer fee at £8.25 Million, median age at time of transfer 24.29 years.

We intend to plot against calendar years primarily and add a column “Year” using lubridate year() function:

#add column to df extracting year from "Date"
LFCvMCFC$Year <-
year(LFCvMCFC$Date)

#display structure
str(LFCvMCFC)
## tibble [164 × 11] (S3: tbl_df/tbl/data.frame)
##  $ Date            : POSIXct[1:164], format: "2018-07-01" "2018-07-01" "2018-07-19" "2018-07-13" ...
##  $ Club            : chr [1:164] "LFC" "LFC" "LFC" "LFC" ...
##  $ Player_Purchased: chr [1:164] "Fabinho" "Naby Keïta" "Alisson" "Xherdan Shaqiri" ...
##  $ Fee             : num [1:164] 39 52.8 56.2 13 3 ...
##  $ Birthdate       : POSIXct[1:164], format: "1993-10-23" "1995-02-10" "1992-10-02" "1991-10-10" ...
##  $ Age_at_Transfer : num [1:164] 24.7 23.4 25.8 26.8 19.8 ...
##  $ From            : chr [1:164] "Monaco" "RB Leipzig" "AS Roma" "Stoke City" ...
##  $ League          : chr [1:164] "Ligue 1" "Bundesliga" "Serie A" "Premier League" ...
##  $ Position        : chr [1:164] "Defensive Midfiled" "Centre Midfield" "Goalkeeper" "Right Wing" ...
##  $ International   : chr [1:164] "Brazil" "Guinea" "Brazil" "Switzerland" ...
##  $ Year            : num [1:164] 2018 2018 2018 2018 2017 ...

How much money did both teams spend on transfers from mid 2008 to 2019?

SpendSummary <-
  
  LFCvMCFC %>%
  group_by(Club) %>%
  summarise(TotalFees = sum(Fee))

SpendSummary
## # A tibble: 2 × 2
##   Club  TotalFees
##   <chr>     <dbl>
## 1 LFC        901.
## 2 MCFC      1391.

Yeah, quite a lot.

TotalPie <-
  
  plot_ly(
    data = SpendSummary,
    labels = ~Club,
    values = ~TotalFees,
    type = 'pie',
    marker = list(colors = footie_palette)) %>% #note piping '%>%' the plot to layout.
  
layout(title = 'Total Transfer Spend LFC v MCFC - 2008 to 2019 (£ Millions)')

TotalPie

Of all the fees that LFC and MCFC spent from 2008 to 2019, Liverpool accounted for 39% and MCFC accounted for 61%. Or put another way, MCFC spent half as much again as LFC…

(1391.05-900.95)/900.95 * 100
## [1] 54.39814

Intuitively, fans will probably have expected amounts in this order of magnitude.

Let’s see how that spending played out year over year…

AnnualSpending <-
  LFCvMCFC %>%
  group_by(Year, Club) %>%
  summarise(AnnualSpend = sum(Fee))
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
AnnualSpending
## # A tibble: 23 × 3
## # Groups:   Year [12]
##     Year Club  AnnualSpend
##    <dbl> <chr>       <dbl>
##  1  2008 LFC          39  
##  2  2008 MCFC         77.7
##  3  2009 LFC          36  
##  4  2009 MCFC        168  
##  5  2010 LFC          22.4
##  6  2010 MCFC        135. 
##  7  2011 LFC         114. 
##  8  2011 MCFC        100  
##  9  2012 LFC          28.9
## 10  2012 MCFC         57  
## # ℹ 13 more rows
AnnualPlot <-
  
  plot_ly(
    data = AnnualSpending,
    x = ~Year,
    y = ~AnnualSpend,
    type = "scatter",
    mode = "line",
    color = ~Club,
    colors = footie_palette) %>%
  
  layout(legend = list(x = 0.05, y = 0.95), #sometimes we move the legend around
      title = "LFC & MCFC Annual Transfer Spending",
         yaxis = list(title = 'Spent (£ Millions)'))

AnnualPlot

It would appear Manchester City took full advantage of new investment in 2008, 2009 and 2010. This pattern also resumed in 2015, 2016 and 2017 where significant investment was made. Liverpool spent more on transfers than Manchester City during only three years; 2011, 2014 and 2018.

Let’s see what patterns might emerge from a look at the individual transactions.

TransactionScatter <- 
 
   plot_ly(
    data = LFCvMCFC,
    x = ~Year,
    y = ~Fee,
    type = "scatter",
    mode = 'markers',
    color = ~Club,
    colors = footie_palette,
    size = 10, #we have opted for a standard size here
    marker = list(opacity = 0.75), #we have a lot of density in parts so opacity is increased
    text = ~paste(Player_Purchased, Fee, Club), #text and hoverinfo introduced to improve default hovertext
    hoverinfo = "text") %>%

layout(legend = list(x = 0.06, y = 1.0),
       title = "LFC & MCFC Individual Transfers",
       yaxis = list(title = 'Cost (£ Millions)'))

TransactionScatter

Interestingly, we can see LFC has the highest amount paid for a single player (Virgil van Dijk in 2018) but in all other years, MCFC purchased the most expensive player; in many years, several more expensive players than LFC.

Let’s have a look at distribution of purchase transactions:

PriceViola <- LFCvMCFC %>%
  
  plot_ly(
    x = ~Club,
    y = ~Fee,
    split = ~Club,
    type = 'violin',
    colors = footie_palette,
    color = ~Club,
    box = list(visible = T),
    meanline = list(visible = T)) %>% 
  
  layout(title = "LFC & MCFC Fee Distribution of Transfer Targets",
    xaxis = list(title = "Club"),
        yaxis = list(title = "Fee (£ Millions)",
      zeroline = F)
  )

PriceViola

We like this visual. You can see that Liverpool’s distribution of player purchase values skews lower than Manchester City. Liverpool median = £7 Million and Manchester City £12 Million. The LFC mean = £11.5 Million and MCFC mean = £16.1 Million. Liverpool KDE of 1 collides at £3.27 Million vs £4.27 Million for Manchester City. LFC 3rd quartile is £17 Million and Manchester City at £25.5 Milion. If one assumes a correlation of quality to cost, this visualization would suggest a deeper distribution of quality in the Manchester City side.

We were curious how these massive financial resources had be allocated to positions on the pitch, also…

PositionSummary <-
  
 LFCvMCFC %>%
  group_by(Position, Club) %>%
    summarise(Total= sum(Fee))
## `summarise()` has grouped output by 'Position'. You can override using the `.groups` argument.
PositionSummary
## # A tibble: 24 × 3
## # Groups:   Position [12]
##    Position           Club  Total
##    <chr>              <chr> <dbl>
##  1 Attacking Midfield LFC    31.8
##  2 Attacking Midfield MCFC  102  
##  3 Centre Back        LFC   138. 
##  4 Centre Back        MCFC  261. 
##  5 Centre Forward     LFC   200. 
##  6 Centre Forward     MCFC  257. 
##  7 Centre Midfield    LFC   186. 
##  8 Centre Midfield    MCFC   88.8
##  9 Defensive Midfiled LFC    53.5
## 10 Defensive Midfiled MCFC  102. 
## # ℹ 14 more rows

Visual will make this more meaningful…

PositionScatter <- 
  
  plot_ly(
    data = PositionSummary,
    x = ~Total,
    y = ~Position,
    type = "scatter",
    mode = 'markers',
    color = ~Club,
    colors = footie_palette,
    size = 10) %>%
      
layout(title = "LFC & MCFC Transfer Spend by Position",
       xaxis = list(title = 'Total (£ Millions)'))

PositionScatter

We are not surprised that most financial resources have been allocated to the spine of the team in Centre Back, Centre Midfield and Centre Forward. This is true of both teams although MCFC have deployed significantly more. Interesting outlier at Right Wing where MCFC have deployed susbstantial resources.

What stories emerge when looking at teams that players were purchased “From”?

PartnerSummary <-
  LFCvMCFC %>%
  group_by(Club, From) %>%
  summarise(Total = sum(Fee))
## `summarise()` has grouped output by 'Club'. You can override using the `.groups` argument.
PartnerSummary
## # A tibble: 110 × 3
## # Groups:   Club [2]
##    Club  From                Total
##    <chr> <chr>               <dbl>
##  1 LFC   1.FSV Mainz 05        4.7
##  2 LFC   AC Milan             16  
##  3 LFC   AEK Athens            1.5
##  4 LFC   AS Roma             117. 
##  5 LFC   Ajax                 23  
##  6 LFC   Arsenal              35  
##  7 LFC   Aston Villa          52.5
##  8 LFC   Atletico Madrid       0  
##  9 LFC   Bayer 04 Leverkusen  11  
## 10 LFC   Benfica              20  
## # ℹ 100 more rows
PartnerScatter <- 
  
  plot_ly(
   data = PartnerSummary,
   x = ~Total,
   y = ~From,
   type = "scatter",
   mode = 'markers',
   color = ~Club,
   colors = footie_palette,
   marker = list(size = ~Total/4)) %>% #we have changed scaling here to drag the eye in desired direction
     
layout(title = "LFC & MCFC Transfer Spending by Selling Club",
       xaxis = list(title = "Spending (£ Millions)")
   )

PartnerScatter

Liverpool really have spent a lot of money at Southampton and AS Roma. However, this plot is super high on density and low on readability. Do we need to see all the dots around y axis? Probably not. Let’s do a top 10 leveraging top_n() function.

TopPartnerSummary <-
  top_n(PartnerSummary, 10, Total)

TopPartnerSummary
## # A tibble: 20 × 3
## # Groups:   Club [2]
##    Club  From           Total
##    <chr> <chr>          <dbl>
##  1 LFC   AS Roma        117. 
##  2 LFC   Ajax            23  
##  3 LFC   Arsenal         35  
##  4 LFC   Aston Villa     52.5
##  5 LFC   Monaco          39  
##  6 LFC   Newcastle       66.3
##  7 LFC   RB Leipzig      52.8
##  8 LFC   Southampton    170. 
##  9 LFC   Sunderland      26  
## 10 LFC   TSG Hoffenheim  29  
## 11 MCFC  Arsenal         70  
## 12 MCFC  Atletic Bilbao  57  
## 13 MCFC  Benfica         51  
## 14 MCFC  Everton         84.5
## 15 MCFC  FC Porto        52  
## 16 MCFC  Leicester       60  
## 17 MCFC  Monaco          95  
## 18 MCFC  Real Madrid     59  
## 19 MCFC  Valencia        57  
## 20 MCFC  VfL Wolfsburg   82
TopPartnerScatter <- 
  
  plot_ly(
    data = TopPartnerSummary,
    x = ~Total,
    y = ~From,
    type = "scatter",
    mode = 'markers',
    color = ~Club,
    colors = footie_palette,
    marker = list(size = ~Total/4)) %>%
  
  layout(title = "LFC & MCFC Top 10 Transfer Partners",
         xaxis = list(title = "Spending (£ Millions)")
)
TopPartnerScatter

LFC and MCFC clearly had their preferred partners during the period in question. MCFC more evenly distributed across Monaco, Everton, Wolfsburg and Arsenal. We wonder if this pattern holds true when looking at the League players have come from. top_n() will be used here again.

LeagueSummary <-
  LFCvMCFC %>%
  group_by(Club, League) %>%
  summarise(Total = sum(Fee))
## `summarise()` has grouped output by 'Club'. You can override using the `.groups` argument.
TopLeagueSummary <-
  top_n(LeagueSummary, 10, Total)

TopLeagueScatter <- 
  
  plot_ly(
    data = TopLeagueSummary,
    x = ~Total,
    y = ~League,
    type = "scatter",
    mode = 'markers',
    color = ~Club,
    colors = footie_palette,
    
    marker = list(size = ~Total/5)) %>%
          
layout(title = "LFC & MCFC Transfer Purchases by Top 10 League of Origin",
       xaxis = list(title = "Spending (£ Millions)")
)

TopLeagueScatter

Clearly, both teams have a preference for players already in the Premier League. Manchester City have shown a significant appetite for high value players from Spain’s La Liga and to a lesser extent Germany’s Bundesliga. Liverpool have also spent a good deal on Italy’s Serie A, which is no surprise as we earlier saw the value of transactions with AS Roma. We are going to sidetrack for a moment as a question just occurred to us. Did the arrival of Pep Guardiola as manager of Manchester City influence spending on players from La Liga and Bundesliga? Guardiola managed in both these leagues prior to joining Manchester City.

#Filter dataset
PepSpending <-
  LFCvMCFC %>%
  filter(Club == "MCFC" & League %in% c("La Liga", "Bundesliga", "Premier League"))

#Summarize the data
  PepSpendingSummary <-
    PepSpending %>%
    group_by(Year, League) %>%
    summarise(Total = sum(Fee))
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
PepSpendingSummary
## # A tibble: 26 × 3
## # Groups:   Year [11]
##     Year League         Total
##    <dbl> <chr>          <dbl>
##  1  2008 Bundesliga       6.7
##  2  2008 La Liga         39  
##  3  2008 Premier League  14  
##  4  2009 Bundesliga      16  
##  5  2009 La Liga          0  
##  6  2009 Premier League 152  
##  7  2010 Bundesliga      11  
##  8  2010 La Liga         49  
##  9  2010 Premier League  26  
## 10  2011 Bundesliga      27  
## # ℹ 16 more rows

And let’s visualize this with Premier League for reference:

PepSpendingPlot <-
  
  plot_ly(
    data = PepSpendingSummary,
    x = ~Year,
    y = ~Total,
    type = "scatter",
    mode = "line",
    color = ~League) %>%

layout(title = "MCFC Spending on Bundesliga and La Liga",
       yaxis = list(title = "Spending (£ Millions)")
       )
PepSpendingPlot

Pep Guardiola arrived February 1, 2016. There is clearly an uptick for spending on players from La Liga but peaks also exist in 2010 and 2013. Pep’s knowledge of the league may have influenced some of the recent purchases.

Fans may be curious about distribution of purchases across country of origin. This is what we found. Again, we have refined based on a top 10.

NationSummary <-
  LFCvMCFC %>%
  group_by(Club, International) %>%
  summarise(Total = sum(Fee))
## `summarise()` has grouped output by 'Club'. You can override using the `.groups` argument.
TopNationSummary <-
  top_n(NationSummary, 10, Total)

TopNationScatter <- 
  
  plot_ly(
  data = TopNationSummary,
  x = ~Total, y = ~International,
  type = "scatter",
  mode = 'markers',
  color = ~Club,
  colors = footie_palette,
  marker = list(size = ~Total/3)) %>%
  
  layout(title = "LFC & MCFC Transfer Spending by Top 10 Country of Origin",
         xaxis = list(title = "Spending (£ Millions)",
         yaxis = list(title = "Country")
         )
)

TopNationScatter

In terms of monetary value, both clubs have spent the most on English players. MCFC have also invested large sums in French and Brazilian players. The absolute numbers are certainly telling but how are they trending? We will examine the top 6 nationalities represented above.

#filter the data
LFCFliter <-
  LFCvMCFC %>%
  filter(Club == "LFC" & International %in% c("England", "Brazil", "France", "Argentina", "Netherlands", "Spain"))
 
#summarize
LFCNationTrendSummary <-
  LFCFliter %>%
  group_by(Year, International) %>%
  summarise(Total = sum(Fee))
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
LFCNationTrendPlot <- 
  plot_ly(
    
    data = LFCNationTrendSummary,
          
          x = ~Year, y = ~Total,
          
          type = "scatter",
          mode = 'line',
          linetype = ~International
          
          )

LFCNationTrendPlot
## Adding lines to mode; otherwise linetype would have no effect.
## Adding lines to mode; otherwise linetype would have no effect.
## Adding lines to mode; otherwise linetype would have no effect.
## Adding lines to mode; otherwise linetype would have no effect.
## Adding lines to mode; otherwise linetype would have no effect.
## Adding lines to mode; otherwise linetype would have no effect.

For Liverpool we can see the early impact of spending on English players which also shows a recent slow increase. Large jump in resources deployed on Brazilian and Dutch players. Spending on the Dutch is due to the record on Virgil van Diyk.

and MCFC?

#filter the data
MCFCFliter <-
  LFCvMCFC %>%
  filter(Club == "MCFC" & International %in% c("England", "Brazil", "France", "Argentina", "Netherlands", "Spain"))
 
#summarize
MCFCNationTrendSummary <-
  MCFCFliter %>%
  group_by(Year, International) %>%
  summarise(Total = sum(Fee))
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
MCFCNationTrendPlot <- 
  plot_ly(
    
    data = MCFCNationTrendSummary,
          
          x = ~Year, y = ~Total,
          
          type = "scatter",
          mode = 'line',
          linetype = ~International
              )

MCFCNationTrendPlot
## Adding lines to mode; otherwise linetype would have no effect.
## Adding lines to mode; otherwise linetype would have no effect.
## Adding lines to mode; otherwise linetype would have no effect.
## Adding lines to mode; otherwise linetype would have no effect.
## Adding lines to mode; otherwise linetype would have no effect.
## Adding lines to mode; otherwise linetype would have no effect.

We are curious about the ages of players being purchased. Are they younger or established? Is one team buying younger than the other and investing for a longer future? We will use some slightly more involved statistical investigation here, opting for a violin plot. Please google them if you need detailed understanding.

AgeViola <- LFCvMCFC %>%
  
  plot_ly(
    x = ~Club,
    y = ~Age_at_Transfer,
    split = ~Club,
    type = 'violin',
    colors = footie_palette,
    color = ~Club,
    box = list(visible = T),
    meanline = list(visible = T)) %>% 
  
  layout(title = "LFC & MCFC Age Distribution of Transfer Targets",
    xaxis = list(title = "Club"),
        yaxis = list(title = "Age at Transfer",
      zeroline = F)
  )

AgeViola

It is down to the individual reader to make any inferences here in terms of club policy. However, LFC does demonstrate a mean age of transfer targets almost a full year younger than MCFC. This is also true of the q3 values and slightly less true of q1. The difference is reduced somewhat looking at median. KDE of 1 occurs for LFC at ~25 years of age and for MCFC at ~24.80. Again, readers can reach their own conclusions but we find this interesting and would be of even greater interest if combined with data regarding contract length, etc.

We feel like trying a subplot of Country and League of Origin…

TopLeagueScatterSub <- 
  
  plot_ly(
    data = TopLeagueSummary,
    x = ~Total,
    y = ~League,
    type = "scatter",
    mode = 'markers',
    color = ~Club,
    colors = footie_palette,
    marker = list(size = ~Total/5),
    showlegend = FALSE) %>%
          
layout(xaxis = list(title = "Spending (£ Millions)")
)

TopNationScatterSub <- 
  
  plot_ly(
  data = TopNationSummary,
  x = ~Total, y = ~International,
  type = "scatter",
  mode = 'markers',
  color = ~Club,
  colors = footie_palette,
  marker = list(size = ~Total/4)) %>%
  
  layout(xaxis = list(title = "Spending (£ Millions)")
)

NationAndLeagueSub <-
  
subplot(TopLeagueScatterSub, TopNationScatterSub, shareX = TRUE, margin = 0.1) %>%
  
  layout(title = "LFC & MCFC Transfer Spending by Nationality and League of Origin")

NationAndLeagueSub

Generating HTML datatables:

RevTable <- datatable(Football.Revenues, caption = "Deloitte Football Money League Rankings (all revenues in £ Millions)")

RevTable
PurchaseTable <- datatable(LFCvMCFC, caption = "Player Purchase Transactions (Fee £ Millions")

PurchaseTable