Property Intelligence User Guide

 


This user guide provides information on the Property Intelligence database, prepared by GBG for customers looking to use the data. Property Intelligence is designed to assist in the process of providing insurance quotes for customers in England, Wales and Scotland by supplying data on residential properties.

Property coverage:

  • Houses, bungalows and flats are covered, although the majority of testing has been on freehold properties, i.e. typically houses and bungalows;

Geographic coverage:

  • England and Wales full coverage;
  • Scotland full coverage but reduced accuracy because of different policies for land and property registration and the publication of data;
  • Northern Ireland limited coverage, just includes estate agent data;
  • Isle of Man and Channel Islands not covered;

Property Intelligence is currently built on a quarterly basis, although this is subject to review. The underlying datasets have a range of update frequencies, from monthly upwards. Updates are included in the build as they become available.

2. Database Fields


Utility fields

The database contains a set of utility fields and a set of feature fields. The utility fields are as follows:

  • UPRN - the Unique Property Reference Number originating from the Ordnance Survey
  • UDPRN - the Unique Delivery Point Reference Number originating from the Royal Mail
  • UMRRN - the Unique Multiple Residence Reference Number originating from the Royal Mail
  • Address1 - a standardised first line of address containing house name or number and street
  • Postcode - a full postcode
  • Easting - Ordnance Survey National Grid Easting
  • Northing - Ordnance Survey National Grid Northing
  • Latitude - latitude in ETRS89 converted from the easting and northing using OSTN02
  • Longitude - longitude in ETRS89 converted from the easting and northing using OSTN02
  • Output area code - Census 2011 Output Area (OA) code from the ONS Postcode Directory
  • Lower super output area code - Census 2011 Lower Super Output Area (LSOA) code from the ONS Postcode Directory
  • Output area code - Census 2021 Output Area (OA) code from the ONS Postcode Directory
  • Lower super output area code - Census 2021 Lower Super Output Area (LSOA) code from the ONS Postcode Directory
  • Country - one of England, Northern Ireland, Scotland, or Wales from the ONS Postcode Directory

Data items

The feature fields in the database are arranged in sets of three:

  • X - this is the data item of interest, for example a number of bedrooms for which X = “bedrooms”;
  • source_X - this is the source of the information, for example the number 2 indicates that this data item is sourced from the Land Registry;
  • p_X - this is a confidence score for the data ranging between 0 and 1. Confidence scores are calculated, where possible, as a “fraction correct” measure against a groundtruth dataset of 36,000 properties supplied by Simple and Open;
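
The X / source_X / p_X convention lends itself to a small helper that reads all three values together. The following is a minimal sketch in Python, assuming each database row has already been loaded into a dictionary keyed by field name. The example row, the helper name get_feature and the thresholds used are illustrative assumptions and are not part of the product.

```python
from typing import Any, NamedTuple, Optional


class Feature(NamedTuple):
    value: Any         # the data item X
    source: int        # source_X, a data source lookup code
    confidence: float  # p_X, between 0 and 1


def get_feature(record: dict, name: str, min_confidence: float = 0.0) -> Optional[Feature]:
    """Return the (X, source_X, p_X) triple for `name`, or None if the
    confidence score is below `min_confidence`."""
    feature = Feature(record[name], record[f"source_{name}"], record[f"p_{name}"])
    return feature if feature.confidence >= min_confidence else None


# Hypothetical row, for illustration only.
row = {"bedrooms": 3, "source_bedrooms": 4, "p_bedrooms": 0.85}

print(get_feature(row, "bedrooms"))                      # all three values
print(get_feature(row, "bedrooms", min_confidence=0.9))  # None, since 0.85 < 0.9
```

A caller can choose a threshold per field, since confidence scores vary by both field and source.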

The unique key to the database is the UDPRN / UMRRN pair supplied by Royal Mail; the UPRN is also supplied. The list of data items is as follows:

  • Property type - whether the property is semi-detached, detached, terraced or a flat
  • Number of floors - the estimated number of floors in the property based on the height of the building.
  • Number of bedrooms - the number of bedrooms in a property
  • Number of bathrooms - the number of bathrooms in a property
  • Number of rooms in total - the number of rooms excluding bathrooms and kitchens
  • Building construction period - the construction date of a building in one of the following periods: before 1719 (old), 1720-1839 (Georgian), 1840-1919 (Victorian/Edwardian), 1920-1945 (Inter-war), 1946-1979 (Post-war) and 1980 to date (Modern)
  • Year built - the year built, only available for those buildings in the Land Registry Price Paid data, built after 1995
  • Listed building - The grade of listing of a building, if it is listed, using data supplied by English Heritage, Cadw or Historic Scotland
  • Cadastral polygon area - the area of the cadastral parcel in which the building sits expressed in square metres using data from Land Registry
  • Height - the building height in metres
  • Building footprint (square metres) - the approximate footprint of the building expressed in square metres
  • Building volume (cubic metres) - the approximate volume of the building expressed in cubic metres
  • Average roof slope - the average slope of the property roof, can be used to identify properties with flat roofs
  • Flat roof fraction - the estimated fraction of a building which has a flat roof
  • Distance to tree - distance from the nearest tree over 10 metres tall to the property geocode
  • Geocode multiplicity - the number of property geocodes falling within the footprint of the building at 1.8 metres above ground level
  • Floor area (square metres) - the liveable floor area in square metres
  • Last transaction price - the price paid at the last transaction recorded by the Land Registry (England and Wales only, back to 1995)
  • Last transaction date - the date of the last transaction recorded by the Land Registry (England and Wales only, back to 1995)
  • Estimated current value - estimated current value based on data from Land Registry (England and Wales only, back to 1995)
  • Number of transactions - the number of transactions recorded by the Land Registry (England and Wales only, back to 1995)
  • Estimated council tax band - estimated council tax from price at reference years using Land Registry data (England and Wales only, back to 1995)
  • Within 200 metres of watercourse - flag indicating whether there is a watercourse within 200 metres
  • Distance to watercourse (within 200 metres) - distance (in metres) to a watercourse, if it is within 200 metres
  • Distance to road - the distance to the centre line of the nearest road from the property geocode, not necessarily accessible
  • Road class - road class, as provided by Ordnance Survey
  • Business usage - a flag indicating potential business usage
  • Planning classification - planning classification as per Town and Country Planning (Use Classes) Order 1987 for non-domestic properties
  • Congestion zone - a flag indicating if a property is in the London Congestion Zone
  • Burglary rate - the number of burglaries per property per year averaged over a LSOA (England and Wales only)
  • Storey on which flat sits - the storey on which a flat sits. This is either N/A, where the value is not available or not applicable, or a number, which may have been derived from a model based on the text found in the original data source
  • Is top floor flat? - flag indicating whether a flat is on the top floor of the building
  • Number of extensions - the number of extensions to a property, typically 1 but up to 4
  • Wall type - the type of wall used in construction, possible values cavity wall, solid brick, sandstone, granite, timber frame, system built and SAP05
  • Main central heating fuel - Main central heating fuel, possible values include gas, electricity, oil, coal, LPG, wood, B30K (a biofuel mix) and also 'not known' and 'none'
  • Type of tenure - type of tenure: owner-occupier, rented or social housing
  • Energy rating - Energy rating as indicated in the EPC Energy Certificate
  • EPC Inspection Date - Inspection date indicated in the EPC Energy Certificate

Technical details

 

Technical details for each of these fields are shown in the table below:
Title | Field name | Data type
UPRN | UPRN | Integer
UDPRN | UDPRN | Integer
UMRRN | UMRRN | Integer
Address1 | address1 | Text
Postcode | postcode | Text
Easting | easting | Float
Northing | northing | Float
Latitude | latitude | Float
Longitude | longitude | Float
Output area code | OA11CD | Text
Lower super output area code | LSOA11CD | Text
Output area code | OA21CD | Text
Lower super output area code | LSOA21CD | Text
Country | country | Text
Property type | property_type | Lookup
Number of floors | floors | Integer
Number of bedrooms | bedrooms | Integer
Number of bathrooms | bathrooms | Integer
Number of rooms in total | total_rooms | Integer
Building construction period | age | Lookup
Year built | year_built | Integer
Listed building | listed | Lookup
Cadastral polygon area | cadastral | Float
Height | height | Float
Building footprint (square metres) | footprint | Float
Building volume (cubic metres) | volume | Float
Average roof slope | avg_roof_slope | Float
Flat roof fraction | flat_roof_fraction | Float
Distance to tree | distance_to_tree | Float
Geocode multiplicity | geocode_multiplicity | Integer
Floor area (square metres) | floor_area | Float
Last transaction price | last_transaction_price | Integer
Last transaction date | last_transaction_date | Text
Estimated current value | est_current_value | Integer
Number of transactions | n_transactions | Integer
Estimated council tax band | est_council_tax | Lookup
Within 200 metres of watercourse | watercourse_200M | Lookup
Distance to watercourse (within 200 metres) | distance_to_water | Float
Distance to road | distance_to_road | Float
Road class | road_class | Lookup
Business usage | business_usage | Lookup
Planning classification | planning_classification | Lookup
Congestion zone | congestion_zone | Lookup
Burglary rate | burglary_rate | Float
Storey on which flat sits | flat_floor | Text
Is top floor flat? | top_floor_flat | Lookup
Number of extensions | extensions | Integer
Wall type | wall_type | Lookup
Main central heating fuel | main_fuel | Lookup
Type of tenure | tenure | Lookup
Energy rating | energy_rating | Lookup
EPC Inspection Date | epc_inspection_date | Text
Table 1: Technical details for each utility and data field. Lookup fields contain non-negative integers (starting from zero). source_X fields are lookup fields; p_X fields are number fields.

3. Lookup tables

Tables 2-14 are the lookup tables relating the numbers found in the database fields to descriptions, for example for the property type, property age, Council Tax band and data source. The Yes/No lookup is used for the 'watercourse 200M', 'congestion zone' and 'top floor flat' fields.

Yes/no lookup

Description Value
No 0
Yes 1
Table 2: Yes/no lookup

 

Property type lookup

Description Value
Detached 0
Semi-detached 1
Terraced 2
Flat 3
Unknown 4
Table 3: Property type lookup
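
Lookup fields such as property_type hold the integer code rather than the description, so they need decoding before display. The following is a minimal sketch with the mapping copied from Table 3. The function name and the fallback text for codes outside the table are assumptions made for illustration.

```python
PROPERTY_TYPE = {
    0: "Detached",
    1: "Semi-detached",
    2: "Terraced",
    3: "Flat",
    4: "Unknown",
}


def decode_property_type(code: int) -> str:
    """Translate a property_type code into its Table 3 description."""
    # Codes not listed in Table 3 are reported rather than raising an
    # error; this fallback behaviour is an assumption.
    return PROPERTY_TYPE.get(code, f"Unrecognised code {code}")


print(decode_property_type(3))  # Flat
```

The same pattern applies to the other lookup tables, with one dictionary per table.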

 

Property age lookup

Description Value
Before 1719 (old) 0
1720-1839 (Georgian) 1
1840-1919 (Victorian/Edwardian) 2
1920-1945 (Inter-war) 3
1946-1979 (Post-war) 4
1980 to date (Modern) 5
Not known 6
Table 4: Property age lookup
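
When comparing the age field with a known construction year (for example from year_built), it can help to convert the year into the corresponding Table 4 value. The sketch below shows one way to do this. The function name is hypothetical and, because the table leaves the year 1719 itself unassigned, the code assumes that it belongs to the earliest band.

```python
from bisect import bisect_right
from typing import Optional

# First year of bands 1 to 5 in Table 4; band 0 covers every earlier year.
BAND_START_YEARS = [1720, 1840, 1920, 1946, 1980]
NOT_KNOWN = 6


def age_band(year: Optional[int]) -> int:
    """Return the Table 4 lookup value for a construction year."""
    if year is None:
        return NOT_KNOWN
    # bisect_right counts how many band start years are <= year.
    return bisect_right(BAND_START_YEARS, year)


print(age_band(1905))  # 2, i.e. 1840-1919 (Victorian/Edwardian)
print(age_band(1998))  # 5, i.e. 1980 to date (Modern)
```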

 

Council Tax lookup

Description Value
A 0
B 1
C 2
D 3
E 4
F 5
G 6
H 7
I 8
N/A 100
Table 5: Council tax band lookup

 

Data source lookup

Description Value
Default 0
Land Registry 2
Historic England 3
Estate agent 4
LIDAR 7
NROSH multipart 8
NROSH snapshot 9
VOA 12
Heuristic 14
ML (age) 15
Naive Bayes (age) 17
Banded VOA 18
ML (bedrooms) 19
VOA (Council Tax) 20
OS Open Rivers 21
NB (bedrooms) 24
OS Open Map 25
Transport for London 28
police.uk 29
Flats modeller 30
Cadw 33
Historic Environment Scotland 34
OS Open Roads 35
Royal Mail 36
DCLG 37
DCLG non-domestic 38
Prefix flat floor modeller 42
Flats per floor modeller 43
Nearest neighbour modeller 44
DCLG Scotland 45
DCLG Scotland non-domestic 46
Financial Services 48
Table 6: Data source lookup

 

Business usage lookup

Description Value
Domestic 0
Business 1
Table 7: Business usage lookup

 

Main fuel lookup

Description Value
Gas 0
Electricity 1
Oil 2
Not known 3
Coal 4
LPG 5
Wood 6
None 7
B30K 8
Other 9
Biomass/Biogas 10
District heating 11
Waste heat 12
Table 8: Main fuel lookup

 

Wall type lookup

Description Value
Cavity wall 0
Solid brick 1
Sandstone 2
Timber frame 3
Granite 4
System built 5
SAP05 6
Not known 7
Table 9: Wall type lookup

 

Planning classification lookup

Description Value
Not known 0
A1/A2 Retail and Financial/Professional services 1
A3/A4/A5 Restaurant and Cafes/Drinking Establishments and Hot Food takeaways 2
B1 Offices and Workshop businesses 3
B2 to B7 General Industrial and Special Industrial Groups 4
B8 Storage or Distribution 5
C1 Hotels 6
C2 Residential Institutions - Hospitals and Care Homes 7
C2 Residential Institutions - Residential schools 8
C2 Residential Institutions - Universities and colleges 9
C2A Secure Residential Institutions 10
C3 - Dwelling houses 11
D1 Non-residential Institutions - Community/Day Centre 12
D1 Non-residential Institutions - Crown and County Courts 13
D1 Non-residential Institutions - Education 14
D1 Non-residential Institutions - Libraries Museums and Galleries 15
D1 Non-residential Institutions - Primary Health Care Building 16
D2 General Assembly and Leisure plus Night Clubs and Theatres 17
Others - Passenger terminals 18
Others - Emergency services 19
Others - Miscellaneous 24hr activities 21
Others - Car Parks 24 hrs 22
Others - Stand alone utility block 23
Others - Telephone exchanges 24
Sui generis 25
Table 10: Planning classification lookup

 

Road class lookup

Description Value
Unclassified 0
Not classified 1
Classified unnumbered 2
B Road 3
A Road 4
Motorway 5
Unknown 6
Table 11: Road class lookup

 

Listed building grade lookup

Description Value
Not listed 0
I or A 1
II* or B 2
II or C 3
Table 12: Listed building grade lookup

 

Tenure lookup

Description Value
Owner-occupier 0
Rented 1
Social 2
Table 13: Tenure lookup

 

Energy rating lookup

Description Value
A 0
B 1
C 2
D 3
E 4
F 5
G 6
Table 14: Energy rating lookup

 

4. Accuracy

Accuracy for the tested fields calculated using Y24M02-groundtruths.sqlite on 2024-03-01 22:52:52 against 33096 properties is shown in the table below.

Field Accuracy (%)
Number of bedrooms 71.4
Number of bathrooms 77.4
Building construction period 68.7
Property type 82.3
Number of floors 89.9

 

Table 15: Summary accuracy for fields, measured against 'groundtruth' properties in England and Wales, excluding flats

 

5. Coverage

The following tables show, for each data source, the coverage, accuracy and confidence for number of bedrooms, number of bathrooms, age, property type and number of floors, based on measurements against the 33,000 property groundtruth dataset covering England and Wales.

Source Coverage Accuracy Confidence
DCLG 0.115 0.700 0.700
Default 0.029 0.394 0.500
Estate agent 0.446 0.795 0.850
Flats modeller 0.001 0.318 0.600
NB (bedrooms) 0.401 0.651 0.641
NROSH multipart 0.008 0.772 0.800
NROSH snapshot 0.000 1.000 0.800
Overall 1.000 0.714 0.738
Table 16: Accuracy and coverage for bedrooms
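
The Overall row is numerically consistent with a coverage-weighted mean of the per-source figures. The guide does not state how the row is derived, so the sketch below is only a check under that assumption, using the bedrooms figures from Table 16.

```python
# (source, coverage, accuracy, confidence) rows copied from Table 16.
BEDROOMS = [
    ("DCLG", 0.115, 0.700, 0.700),
    ("Default", 0.029, 0.394, 0.500),
    ("Estate agent", 0.446, 0.795, 0.850),
    ("Flats modeller", 0.001, 0.318, 0.600),
    ("NB (bedrooms)", 0.401, 0.651, 0.641),
    ("NROSH multipart", 0.008, 0.772, 0.800),
    ("NROSH snapshot", 0.000, 1.000, 0.800),
]


def weighted_mean(rows, column):
    """Coverage-weighted mean of one column (2 = accuracy, 3 = confidence)."""
    total_coverage = sum(row[1] for row in rows)
    return sum(row[1] * row[column] for row in rows) / total_coverage


overall_accuracy = weighted_mean(BEDROOMS, 2)
overall_confidence = weighted_mean(BEDROOMS, 3)
print(f"{overall_accuracy:.3f} {overall_confidence:.3f}")  # 0.714 0.738
```

Both values agree with the Overall row of Table 16.
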
Source Coverage Accuracy Confidence
Default 0.578 0.800 0.730
Estate agent 0.422 0.707 0.760
Overall 1.000 0.774 0.743
Table 17: Accuracy and coverage for bathrooms

 

Source Coverage Accuracy Confidence
Cadw 0.000 0.636 0.630
DCLG 0.420 0.778 0.650
Heuristic 0.011 0.460 0.600
Historic England 0.004 0.565 0.540
Land Registry 0.021 0.941 0.950
Naive Bayes (age) 0.324 0.708 0.721
VOA 0.221 0.471 0.473
Overall 1.000 0.687 0.639
Table 18: Accuracy and coverage for age

 

Source Coverage Accuracy Confidence
Banded VOA 0.057 0.629 0.597
DCLG 0.068 0.939 0.900
Default 0.001 0.471 0.540
Estate agent 0.680 0.853 0.880
LIDAR 0.191 0.747 0.800
Land Registry 0.000 0.000 0.920
NROSH multipart 0.003 0.152 0.800
Overall 1.000 0.823 0.849
Table 19: Accuracy and coverage for property_type

 

Source Coverage Accuracy Confidence
Banded VOA 0.126 0.826 0.813
DCLG 0.101 0.962 0.940
LIDAR 0.772 0.903 0.900
Overall 1.000 0.899 0.893
Table 20: Accuracy and coverage for floors

 

6. Attribute distribution charts

The following charts show the distribution of values for selected fields, for domestic properties, not arising from the default model. 



Figure 1: Distribution of property type


Figure 2: Distribution of number of bedrooms


Figure 3: Distribution of number of bathrooms


Figure 4: Distribution of building construction period


Figure 5: Distribution of number of floors

 

7. Direct data content

The following table shows the coverage with direct data for the five fields tested against groundtruth.

Attribute Percentage direct
Property type 89.1
Floors 73.8
Bedrooms 61.3
Bathrooms 35.8
Age 56.8
Age 55.3
Table 21: Percentage of data supplied from direct sources rather than modelled

 

8. Data recency

Data recency for the Property Intelligence dataset is determined by a number of factors, listed below:

  • The build process for Property Intelligence takes approximately 2 months from start to delivery to customer with quarterly scheduled releases;
  • Individual datasets have a range of update frequencies, some are static and will never be updated, others are yearly, quarterly or monthly;
  • Two datasets, EPC (formerly DCLG) and Estate agent data, have property-level fields which indicate when an inspection was carried out so potentially day-level data on recency could be provided;
  • The LIDAR data is a composite dataset, 80% of which has been collected in the last 10 years;

The table below shows the dates of the datasets used in this version of Property Intelligence along with an indication of the expected update frequency.

Dataset Frequency Date
Congestion Zone Once None
DCLG Quarterly 2024-01-31
DCLG Scotland Quarterly 2024-01-31
ONS Postcode to LSOA/LA lookup Quarterly 2024-01-31
Land Registry House Price Index Monthly 2024-01-31
Land Registry Cadastral Polygons Quarterly None
Land Registry Price Paid Monthly 2024-01-31
English Heritage Yearly 2023-06-21
Historic Environment Scotland Yearly 2023-06-21
Cadw Yearly 2022-05-09
NROSH Once 2016-12-12
ONSPD Quarterly 2023-12-04
ONS rural-urban classification Once 2016-12-12
OS Open UPRN Quarterly 2024-01-01
OS Open Rivers Quarterly None
OS Open Roads Quarterly None
Police.uk Monthly 2024-01-31
Royal Mail Monthly 2024-01-23
VOA Yearly 2022-11-28
Estate Agent Monthly 2024-02-01

Table 22: Data recency and frequency by dataset
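
To judge how old each input is for a given build, the dates in Table 22 can be converted into ages in days. This is an illustrative sketch only: the reference date is taken from the accuracy run timestamp in Section 4, only a few rows are copied, and a Date of None is assumed to mean that no date is recorded for that dataset.

```python
from datetime import date
from typing import Optional

# A few rows copied from Table 22; None means no date is given there.
DATASET_DATES = {
    "Estate Agent": "2024-02-01",
    "VOA": "2022-11-28",
    "NROSH": "2016-12-12",
    "OS Open Roads": None,
}


def age_in_days(dataset_date: Optional[str], reference: date) -> Optional[int]:
    """Days between a dataset's date and the reference date, or None if undated."""
    if dataset_date is None:
        return None
    return (reference - date.fromisoformat(dataset_date)).days


reference_date = date(2024, 3, 1)  # date of the accuracy run in Section 4
for name, dataset_date in DATASET_DATES.items():
    print(name, age_in_days(dataset_date, reference_date))
```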

 

The Environment Agency started to systematically cover England for LIDAR measurement in about 2005 and has added, very approximately, 5% coverage in each year since then.

Figure 6: Cumulative percentage of LIDAR coverage

9. Attributions

Release Notes

March 2024: 

No new fields have been added in this release but sources have been updated.

January 2024: 

No new fields have been added in this release but sources have been updated.

October 2023: 

No new fields have been added in this release but sources have been updated.

July 2023: 

No new fields have been added in this release but sources have been updated.

April 2023: 

An EPC_INSPECTION_DATE field is added. References to 'DCLG', the original department responsible for the EPC Energy Certificate data, are replaced with 'EPC' in documentation.

The Census 2021 codings oa21cd and lsoa21cd are added, to sit alongside the Census 2011 codings. Currently the source Open Data used to derive fields in Property Intelligence still use the Census 2011 codings.

January 2023:

No new fields have been added in this release but sources have been updated.

October 2022:

Data refreshed - there are no new fields that have been added in this release but the data sources have been updated to reflect this release.

July 2022:

Data refreshed - there are no new fields that have been added in this release but the data sources have been updated to reflect this release.

April 2022:

No new fields have been added in this release but sources have been updated.

Our supplier of business information which is used to populate the business usage field has changed.

January 2022:

No new fields have been added in this release but sources have been updated.

Floor areas for flats are now included in modelling so that values for neighbouring flats are used if direct data is not available.

As a result of changing our address cleanser to the standard GBG Loqate Verify engine we now include some data from Northern Ireland.

October 2021:

No new fields have been added in this release but sources have been updated.

We have added the DCLG Scotland data which provides a significant improvement in accuracy for property type, and property age in Scotland as well as improvements in accuracy to numbers of bedrooms and floors. DCLG Scotland also provides fields including extension count, wall type, main fuel, floor area, total rooms, tenure and energy rating which were not previously populated for Scotland.

There are improvements in the flat floor modeller such that it does not return unreasonably large values (over 90 storeys) or non-numeric values (other than N/A).

July 2021:

No new fields have been added in this release but sources have been updated.

We have introduced modelling for 'flat_floor' - the storey a flat sits on which improves the coverage for this field, and introduces new entries to the data sources table.

April 2021:

Tenure and energy rating fields have been added. Tenure is a replacement for the previously removed tenancy field. It indicates whether a property is owner-occupied, private rental or social housing. The name has been changed to retain consistency with the underlying dataset.

The property age field has improved direct data content and accuracy as a result of the addition of a new dataset.

January 2021:

No new fields have been added in this release but sources have been updated.

October 2020:

We have resumed supply of two fields which had been suspended:

  • the congestion zone field from the raw TFL data rather than using a third party as a supplier;
  • the cadastral area (building plot area) which had been suspended due to licensing issues. This is based on the Land Registry INSPIRE Polygons data;

The Land Registry House Price Index is available once again, and thus Estimated Current Values will be up to date unless a sale went through during the period in which the HPI was suspended.

July 2020:

As noted in the April 2020 Release notes we have removed the following fields from this release:

  • geocode accuracy
  • red route
  • tenancy
  • multiplicity
  • outdoor area
  • building count
  • adult occupants

The Land Registry UK House Price Index has been suspended as of the April 2020 release, which was due to be published in June. This is because of the impact of COVID-19, which means that only limited transactions are occurring on which to base the Index. This change is described in the relevant Land Registry Bulletin. As a result, the estimated current value field will contain the estimated current value as at the last release of the House Price Index, 1st March 2020.

April 2020:

This build incorporates the OS AddressBase Premium property-level easting / northing and latitude / longitude coordinates, which replace those provided by our previous supplier. This data is supplied under evaluation terms which you have already signed up to.

These will be replaced with coordinates derived from Ordnance Survey Open Source data once this has been released in July 2020.

As a result of recent supplier changes we are also withdrawing a number of fields including the geocode accuracy and red route fields. The Congestion Zone field will remain but not be populated in the next build.

We will also be withdrawing the Tenancy field as a result of other supplier licensing changes.

Finally there are a number of fields which have not been populated for some time including multiplicity, outdoor area, building count and number of adult occupants. All of these fields are present in this build containing default values in most cases but will be removed from the July 2020 build.

January 2020:

No new fields have been added in this release but sources have been updated.

October 2019:

No new fields have been added in this release but sources have been updated.

July 2019:

No new fields have been added in this release but sources have been updated.

April 2019:

The listed building field now reports the grade of listing (I, II* or II in England or Wales, A, B or C in Scotland). Previously buildings were just reported listed/not listed.

October 2018:

A tenancy field was added in this release which identifies a property as being rented, social housing or owner-occupier.

June 2018:

No new fields have been added in this release but sources have been updated. In addition, the documentation now provides details of data recency.

February 2018:

This release contains further parameters derived from LIDAR data. These include building footprint, building volume, the average roof slope, a flat roof fraction, the distance from the property geocode to the nearest tree over 10 metres high, and a geocode multiplicity which counts the number of geocodes within a building. The building footprint is not listed below since it is not a new field but has been re-calculated using LIDAR data.

  • Building volume (cubic metres)
  • Average roof slope
  • Flat roof fraction
  • Geocode multiplicity
  • Distance to tree

October 2017:

This release incorporates a major new dataset which has brought improved accuracy to numbers of bedrooms, numbers of floors, property type and property age fields as well as introducing a number of new fields, listed below. The cadastral, outdoor area and footprint fields will be populated only with default values from this release onwards for licensing reasons. We hope to re-introduce the building footprint in the February 2018 release.

  • Floor area (square metres)
  • Number of transactions
  • Distance to watercourse (within 200 metres)
  • Planning classification
  • Road class
  • Storey on which flat sits
  • Is top floor flat?
  • Number of extensions
  • Wall type
  • Main central heating fuel

June 2017:

This release sees a switch to using the Royal Mail PAF, Not Yet Built and Multiple Residence files as the base address list, which results in approximately 10% more addresses than earlier releases. In addition, accuracy in identifying flats was improved substantially, and a number of fields pertaining specifically to flats were included.

  • Distance to road
  • Burglary rate

February 2017:

This release introduced the following new fields, with a focus on logistics.

  • Year built
  • Business usage
  • Congestion zone

October 2016:

This is the first public release of the Property Intelligence dataset

  • Property type
  • Number of floors
  • Number of bedrooms
  • Number of bathrooms
  • Number of rooms in total
  • Listed building
  • Building construction period
  • Building footprint (square metres)
  • Last transaction price
  • Last transaction date
  • Cadastral polygon area
  • Height
  • Estimated current value
  • Estimated council tax band
  • Within 200 metres of watercourse