Simple to start - following interesting data to its conclusions
The current iteration of this post is from 2019-10-24
This project started after I came across a visualisation of the years of income needed to pay off a house across counties in the U.S. (assuming median house price and median income per region). This seemed like an achievable visualisation, and something I could do. It compelled me to start this blog to record my own projects trying to answer questions using data.
That post is a straightforward visualisation built from simple statistics, but it highlights striking differences in the relative cost of acquiring a home across the United States. As a resident of the UK, my first question was obvious: what does this look like for the UK? Combining the data with what little I know of U.S. geography, the peaks appear to coincide with the locations of large cities, so how well does this measure correlate with population density? Are there other meaningful correlations that can be drawn? Do such correlations hold for the UK?
I had a few hypotheses as to what the data might show:
- London will be the most expensive region, as well as being the most expensive region relative to earnings
- Cities will likely have higher price:income ratios (a higher number of years of income needed to buy a house) than the surrounding areas
  - Based on the assumption that younger people move to the cities, and older/higher wage people move out later in their careers
- London may appear different (higher) to other cities, as it is a popular location for foreign investment in property, meaning it has outsized demand relative to supply versus other UK cities without as much interest.
- Would expect southern regions to be more expensive than northern ones, though not sure if the house price/income ratio for these regions will be altered.
- Not confident in other regions
To find out what story the data tells, the first step was to break down the problem into smaller chunks:
- Is equivalent data publicly available for the UK?
  - Both median house price and median income per region
- How do I go about plotting this region-specific data to form a colour-coded map (choropleth)?
  - How do I get a spatial representation of the regions the data describes?
  - How do I plot the regions?
  - How can I plot the regions using the data I find (colour-scaled representation)?
To answer the first question I began looking for census data, assuming this should contain regionally organised income data. This search led me to the Office for National Statistics (ONS), which was somewhat like letting a child loose in a sweet shop. The ONS has far more detailed and numerous datasets than I had expected. Its website provides the annual median house price per region, where the region is an ONS standard area known as a ‘middle layer super output area’, or ‘MSOA’. I was also able to find model-based estimates of MSOA-level household income for various years (2012, 2014 and 2016).
So with this data I had hit the first of my bullet points. The next task was to overlay this data on a map of the UK.
In search of guidance for plotting map data, I came across this post detailing choropleth creation from JSON data using R. After familiarising myself with the code provided in the article, I found a source of geoJSON files for MSOA regions in the UK.
After spending a while trying to untangle the geoJSON structure to create a dataframe from multiple small geoJSON area files, I went back to ONS and was able to find a single shape file containing all of the regions that could be converted to geoJSON or topoJSON. From here I was able to extract the region-associated coordinates to construct a single data frame to which I could append the data I wished to plot.
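As a rough illustration of that flattening step, here is a minimal sketch. The guide mentioned above used R; this sketch uses Python with only the standard library instead, so it is not the code behind the plots in this post. The property name `msoa_code`, the region codes and the prices are all invented for the example, and a real ONS file will use different field names.

```python
import json

# Tiny stand-in for a GeoJSON FeatureCollection. The property name
# "msoa_code" and the codes below are hypothetical, not taken from ONS files.
GEOJSON_TEXT = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"msoa_code": "X001"},
   "geometry": {"type": "Polygon",
                "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]}},
  {"type": "Feature",
   "properties": {"msoa_code": "X002"},
   "geometry": {"type": "MultiPolygon",
                "coordinates": [[[[2.0, 2.0], [3.0, 2.0], [3.0, 3.0], [2.0, 2.0]]],
                                [[[5.0, 5.0], [6.0, 5.0], [6.0, 6.0], [5.0, 5.0]]]]}}
]}
"""


def flatten_features(collection, code_key="msoa_code"):
    """Return one row per boundary vertex: region code, polygon index, lon, lat."""
    rows = []
    for feature in collection["features"]:
        code = feature["properties"][code_key]
        geometry = feature["geometry"]
        # Treat a Polygon as a MultiPolygon with a single member.
        if geometry["type"] == "Polygon":
            polygons = [geometry["coordinates"]]
        else:
            polygons = geometry["coordinates"]
        for poly_index, polygon in enumerate(polygons):
            outer_ring = polygon[0]  # holes are ignored in this sketch
            for lon, lat in outer_ring:
                rows.append({"code": code, "polygon": poly_index,
                             "lon": lon, "lat": lat})
    return rows


def attach_values(rows, values_by_code, column):
    """Append a data column to every row by matching on the region code."""
    return [dict(row, **{column: values_by_code.get(row["code"])}) for row in rows]


rows = flatten_features(json.loads(GEOJSON_TEXT))
median_price = {"X001": 150000, "X002": 420000}  # invented values
table = attach_values(rows, median_price, "median_price")
print(len(table), table[0])
```

Keeping a polygon index alongside the region code matters for regions made of several separate pieces (a mainland area plus islands, say): without it, a plotting routine may join the pieces into one shape.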
With a single dataframe created, I just needed to plot the data. I started off with a histogram (binned distribution) of the median house price across the MSOA regions in 2018 (England and Wales). Plotting this allows a look at the data distribution, which helps determine how best to show the data on the choropleth.
The histogram shows a prominent positive skew with quite a long tail. I think I was expecting significantly less of this skew, i.e. closer to a normal distribution, so this visualisation was informative for me.
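For anyone who wants to check for this kind of skew without plotting, a minimal sketch in Python follows. The prices are invented purely to show the shape of the calculation; they are not the ONS figures. A mean sitting well above the median is a quick sign of a long right-hand tail.

```python
from statistics import mean, median


def histogram(values, bin_width):
    """Count values into fixed-width bins, keyed by each bin's lower edge."""
    counts = {}
    for value in values:
        lower_edge = int(value // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


# Invented median prices for ten hypothetical regions.
prices = [120_000, 135_000, 150_000, 155_000, 160_000,
          180_000, 210_000, 260_000, 450_000, 1_200_000]

bins = histogram(prices, 100_000)
for lower_edge, count in bins.items():
    print(f"{lower_edge:>9,} {'#' * count}")

# With a long right tail, a few very high values pull the mean above the median.
print("mean:", mean(prices), "median:", median(prices))
```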
The presence of this tail of very high values suggests that plotting the median house price per region on a log scale is likely the best way to show the distribution of the data while making use of the whole colour scale.
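To show why the log scale helps, here is a minimal Python sketch that maps a price onto a palette position both linearly and logarithmically. The price range and the five-colour palette are assumptions made for the example, not the values used for the map in this post.

```python
import math


def linear_scale(value, vmin, vmax):
    """Map value to a position in [0, 1], linearly between vmin and vmax."""
    return min(1.0, max(0.0, (value - vmin) / (vmax - vmin)))


def log_scale(value, vmin, vmax):
    """Map value to a position in [0, 1] on a log10 scale between vmin and vmax."""
    span = math.log10(vmax) - math.log10(vmin)
    return min(1.0, max(0.0, (math.log10(value) - math.log10(vmin)) / span))


def colour_bin(position, n_colours):
    """Convert a 0-1 position into an index into a palette of n_colours."""
    return min(n_colours - 1, int(position * n_colours))


# Assumed range and palette size, chosen only for illustration.
VMIN, VMAX, N_COLOURS = 100_000, 1_600_000, 5

for price in (150_000, 200_000, 400_000, 800_000, 1_600_000):
    linear_index = colour_bin(linear_scale(price, VMIN, VMAX), N_COLOURS)
    log_index = colour_bin(log_scale(price, VMIN, VMAX), N_COLOURS)
    print(f"{price:>9,}  linear bin {linear_index}  log bin {log_index}")
```

On the linear scale, most of the lower prices land in the first one or two colours because the tail stretches the range. The log scale spreads them across the palette.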
This is what this looks like when you’re able to see every region across England and Wales:
Choropleth of house prices across England and Wales (MSOA), 2018
It’s a pretty striking image, and there’s a lot of valuable information one can extract from this one image.
A couple of notable points:
- It’s possible to identify the location of cities across England and Wales, and not only by the more densely packed regions. Many of the prominent cities, with the notable exception of London, appear to have distinctly lower median house prices than the surrounding areas. This doesn’t seem to be the case for cities in southern England.
- In general, southern England is more expensive than the rest of England and Wales. Both Wales overall, and densely packed regions in the north (those with smaller region sizes) appear to have median house prices that are much lower than the rest of the country.
- London seems to have a split identity: an expensive (and in some cases very expensive) strip runs from the north to the south west, while the rest of London has a lower median house price than the surrounding region.
- Knowing absolutely nothing about it prior to this, I expected the Isle of Wight to have a relatively higher median house price than it does.
Unrelated to the conclusions from the data, there are a couple of quirks that I’ve noticed with my choropleth:
- There’s some skewing that appears to come from the way the geoJSON files converted from the shapefile are handled. While the southern parts of England look about right, there’s some serious stretching going on as you approach the Scottish border. This is fixable.
- Two islands in the River Severn are messing with the mapping of a region north west of Bristol. I’ll have to dig into the data to find out why these are being mapped the way they are, but if I had to guess, they’re part of the same MSOA region. Dang.
Now that we’ve seen what the house price data looks like across England and Wales, I was curious as to whether this was a typical distribution for house prices there. Using the MSOA-level data, I created a histogram for each year in the data (1997-2018) and used ffmpeg to animate them, showing the evolution of the house price distribution over time.
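One common way to do this is to save one image per year with sequentially numbered file names and hand the sequence to ffmpeg. The Python sketch below only builds the command rather than running it. The file naming, frame rate and output name are assumptions made for the example, not a record of the exact command used for the animation in this post.

```python
import shlex


def frame_name(year, first_year):
    """File name for one year's histogram frame, numbered from 0."""
    return f"frame_{year - first_year:04d}.png"


def ffmpeg_command(frames_per_second, pattern, output):
    """Build an ffmpeg call that turns a numbered image sequence into a video."""
    return [
        "ffmpeg",
        "-y",                                  # overwrite the output if it exists
        "-framerate", str(frames_per_second),  # input frames shown per second
        "-i", pattern,                         # numbered image sequence
        "-c:v", "libx264",                     # encode as H.264
        "-pix_fmt", "yuv420p",                 # pixel format most players accept
        output,
    ]


FIRST_YEAR, LAST_YEAR = 1997, 2018
frames = [frame_name(year, FIRST_YEAR) for year in range(FIRST_YEAR, LAST_YEAR + 1)]
command = ffmpeg_command(2, "frame_%04d.png", "house_prices.mp4")

print(frames[0], "...", frames[-1])
print(shlex.join(command))
# Once the frames exist on disk, the command could be run with
# subprocess.run(command, check=True).
```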
From this data it’s clear that the distribution of house prices has been changing quite dramatically over the past 20 years. From the start of the series to ~2005 the distribution was closer to normal, with an expected tail leading off to higher prices. Since then, while the modal (most frequent) house price hasn’t changed very much, that tail has become more of a dominant feature over time. This has the effect of dragging both the mean and the median house price up significantly during this period.
This might not be too much of a bad thing for house affordability, provided that annual income is increasing at the same pace.
Unfortunately MSOA-level annual income data is only available for 2012, 2014 and 2016. To find data for a comparable date range to that in the animation above, I found income at the local authority (LA) level, again through the ONS. The source of this data actually contained not only the median income per region, but the median house price and median house price/income ratio. Clearly I am not the first to think of analysing the data in this way.
Using this source for income, I plotted a similar animated histogram covering 2002-2018.
It's slightly harder to make judgements from the smaller set of medians in the local authority dataset than from the MSOA-level resolution available for house price data. That said, median incomes are clearly rising across England and Wales throughout the time period, with a brief stall around 2009-2012. The shape of the distribution doesn't appear to change too much, though it does seem to spread towards the right as time goes on, indicating a rise in the occurrence of high-end wages.
Stratifying this data by region enables greater scrutiny of how incomes are affected across the country. It’s likely that London and the south have higher wages and incomes than other regions, but it’s hard to tell when looking at the overall summary where increases and decreases are coming from.
London certainly takes the crown for most volatile here.
A breakdown of what appears to be going on:
- London has always had the highest median income of any region throughout the time series, by several thousand pounds.
- There is a very wide gulf between the median income in the top- and bottom-earning regions. This gap is widening, but slowed down after ~2012.
- London has the highest variance in annual income across each of its local authorities.
- The North East region has a notably small variance in median incomes across the local authorities within the region.
  - This is the far end of a trend, where the southern regions and the ‘East’ region have higher median incomes and larger variances, while the other regions have similarly low median incomes and variances.
- Median annual income increases every year across every region, though the growth is not consistent
- Wage growth appears to slow down in the middle of the timeseries, around 2008-2012, possibly related to the 2007/2008 financial crisis.
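The per-region comparisons above come down to grouping local authorities by region and summarising each group. A minimal Python sketch of that grouping follows, using invented local authority names and incomes rather than the ONS figures.

```python
from collections import defaultdict
from statistics import median, pvariance

# (region, local authority, median annual income); all values are invented.
records = [
    ("London", "LA-A", 28_000),
    ("London", "LA-B", 34_000),
    ("London", "LA-C", 46_000),
    ("North East", "LA-D", 22_000),
    ("North East", "LA-E", 23_000),
    ("North East", "LA-F", 24_000),
]


def summarise_by_region(rows):
    """Median and population variance of local authority incomes per region."""
    incomes_by_region = defaultdict(list)
    for region, _local_authority, income in rows:
        incomes_by_region[region].append(income)
    return {
        region: {"median": median(incomes), "variance": pvariance(incomes)}
        for region, incomes in incomes_by_region.items()
    }


summary = summarise_by_region(records)
for region, stats in summary.items():
    print(region, stats)
```

Population variance is used here on the assumption that every local authority in a region is present in the data. With a sample of authorities, `statistics.variance` would be the better choice.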
Looking more closely at the change in median income over time, we can see what is happening.
Again, let’s try to disentangle what’s going on across these regions.
- Median annual income growth was falling from the beginning of the dataset (2002/3) until 2010-12, since when it has been increasing. This increase in income growth has been slower than the fall.
- Median annual income grew every year, by at least £250/year across every region. I don’t know from this data whether this kept up with the increasing cost of living.
- Before 2010, London was far ahead of everywhere else in terms of income growth, but since then has performed similarly to other well-performing regions.
- The North East growth curve looks absolutely nothing like the rest of the group. Fascinatingly, and perhaps depressingly, this region has had resolutely declining wage growth from 2002-18.
- For some regions, the minimum income growth occurred earlier than in other regions, in 2010/11, rather than 2011/12. I assume this minimum represents an indicator of the region beginning the recovery from the 2007/8 financial crisis.
  - 2010/11: West Midlands, East Midlands, Wales, North West
  - 2011/12: London, South East, East, South West, Yorkshire and The Humber
  - Those regions with typically higher median incomes (and Yorkshire...) all appeared to recover slightly later.
- It looks as though the rate of income growth slowed across all regions after 2015/16. This could be to do with the Brexit vote and uncertainty, but this result will need further analysis to be more than speculation.
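The growth figures discussed above are year-on-year differences in median income. A minimal Python sketch of that calculation, and of finding the year of slowest growth, is below. The incomes are invented, and labelling the change from 2010 to 2011 as "2010/11" is my assumption about how the periods are named.

```python
def yearly_growth(income_by_year):
    """Change in income between consecutive years, keyed by a label like '2010/11'."""
    years = sorted(income_by_year)
    return {
        f"{previous}/{str(current)[-2:]}": income_by_year[current] - income_by_year[previous]
        for previous, current in zip(years, years[1:])
    }


def slowest_growth_period(growth):
    """Label of the period with the smallest income growth."""
    return min(growth, key=growth.get)


# Invented median incomes (in pounds) for one hypothetical region.
median_income = {2008: 24_000, 2009: 24_600, 2010: 25_100,
                 2011: 25_400, 2012: 25_900, 2013: 26_600}

growth = yearly_growth(median_income)
print(growth)
print("slowest growth:", slowest_growth_period(growth))
```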
With both income and house price data available, I plotted the ratio of the two to get a measure of relative house affordability across the different regions, from 2002-2018.
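The ratio itself is just the median house price divided by the median annual income, which reads as the number of years of income needed to cover the price. A minimal Python sketch follows, with prices and incomes invented to give round ratios rather than taken from the ONS data.

```python
def affordability_ratio(median_house_price, median_annual_income):
    """Years of median income needed to match the median house price."""
    return median_house_price / median_annual_income


# (region, year) -> (median house price, median annual income); invented values.
data = {
    ("Region A", 2002): (140_000, 20_000),
    ("Region A", 2018): (420_000, 30_000),
    ("Region B", 2002): (60_000, 18_000),
    ("Region B", 2018): (130_000, 26_000),
}

ratios = {key: affordability_ratio(price, income)
          for key, (price, income) in data.items()}

for (region, year), ratio in sorted(ratios.items()):
    print(f"{region} {year}: {ratio:.1f}x median income")
```

In this made-up example incomes rise in both regions, but prices rise faster, so the ratio gets worse anyway.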
Median house price/income ratio across regions of England and Wales

- In every region across England and Wales, it is relatively more expensive to buy a house than it was in 2002.
- In most regions except London, the South East and the East, house affordability relative to income is about the same as it was in 2007/2008, right around the time of the financial crisis. In a couple of these regions, it’s (marginally) better!
- In the South East and the East, house affordability has been yo-yoing since 2005. It would be interesting to know what is causing this.
- Perhaps it’s a more responsive, but over corrective housing or job market?
- London is a ludicrously expensive place for its inhabitants to buy a house. It started off bad, at a ratio of around 7 times the median annual income, more than twice the ratio found in the north of the country. Since then it has doubled to around 14 times the median annual income, and three other regions (the South East, the South West and the East) have crossed the level of unaffordability that London was at in 2002.
- As a resident of London, I have been made more sad by the existence of this graph.
- If you spot a flaw in my methodology, please do let me know ASAP by email.
Going to hit pause there for now while I recuperate my spirits. I’ll be back with a choropleth showing the ratio visualised across England and Wales. There will also be another post following this detailing my thoughts and learning experiences as I carried out the project.