Property Intelligence User Guide
This user guide describes the Property Intelligence database and has been prepared by GBG for customers who want to use the data. Property Intelligence is designed to assist in providing insurance quotes for customers in England, Wales and Scotland by supplying data on residential properties.
Property coverage:
- Houses, bungalows and flats are covered, although the majority of testing has been on freehold properties, i.e. typically houses and bungalows;
Geographic coverage:
- England and Wales full coverage;
- Scotland full coverage but reduced accuracy because of different policies for land and property registration and the publication of data;
- Northern Ireland limited coverage, including estate agent data only;
- Isle of Man and Channel Islands not covered.
Property Intelligence is currently built on a quarterly basis, although this is subject to review. The underlying datasets have a range of update frequencies from monthly upwards. Updates are included in the build as they become available.
2. Database Fields
Utility fields
The database contains a set of utility fields and a set of feature fields. The utility fields are as follows:
- UPRN - the Unique Property Reference Number originating from the Ordnance Survey
- UDPRN - the Unique Delivery Point Reference Number originating from the Royal Mail
- UMRRN - the Unique Multiple Residence Reference Number originating from the Royal Mail
- Address1 - a standardised first line of address containing house name or number and street
- Postcode - a full postcode
- Easting - Ordnance Survey National Grid Easting
- Northing - Ordnance Survey National Grid Northing
- Latitude - latitude in ETRS89, converted from the easting and northing using OSTN02
- Longitude - longitude in ETRS89, converted from the easting and northing using OSTN02
- Output area code - Census 2011 Output Area (OA) code from the ONS Postcode Directory
- Lower super output area code - Census 2011 Lower Super Output Area (LSOA) code from the ONS Postcode Directory
- Output area code - Census 2021 Output Area (OA) code from the ONS Postcode Directory
- Lower super output area code - Census 2021 Lower Super Output Area (LSOA) code from the ONS Postcode Directory
- Country - one of England, Northern Ireland, Scotland, or Wales from the ONS Postcode Directory
Data items
The feature fields in the database are arranged in sets of three:
- X - this is the data item of interest, for example a number of bedrooms for which X = “bedrooms”;
- source_X - this is the source of the information, for example the number 2 indicates that this data item is sourced from the Land Registry;
- p_X - this is a confidence score for the data ranging between 0 and 1. Confidence scores are calculated, where possible, as a “fraction correct” measure against a groundtruth dataset of 36,000 properties supplied by Simple and Open.
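The X / source_X / p_X arrangement can be illustrated with a minimal Python sketch. It assumes a record has already been loaded into a dictionary keyed by field name. The `get_feature` helper, the `min_confidence` parameter and the sample values are hypothetical and are not part of the product.

```python
from typing import Any, Optional


def get_feature(
    record: dict[str, Any], name: str, min_confidence: float = 0.0
) -> Optional[dict[str, Any]]:
    """Return the value, source code and confidence for one data item.

    Returns None when the item is missing or its confidence score
    is below the requested threshold.
    """
    value = record.get(name)
    source = record.get(f"source_{name}")
    confidence = record.get(f"p_{name}")
    if value is None or confidence is None or confidence < min_confidence:
        return None
    return {"value": value, "source": source, "confidence": confidence}


# Hypothetical record: the values are illustrative only.
record = {"bedrooms": 3, "source_bedrooms": 4, "p_bedrooms": 0.85}

print(get_feature(record, "bedrooms", min_confidence=0.8))
print(get_feature(record, "bedrooms", min_confidence=0.9))
```

A threshold of this kind is one possible way to decide when to rely on a supplied value and when to ask the customer directly. The appropriate threshold is a choice for the user of the data.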
The unique key to the database is the UDPRN / UMRRN pair supplied by Royal Mail. The UPRN is also supplied. The list of data items is as follows:
- Property type - whether the property is semi-detached, detached, terraced or a flat
- Number of floors - the estimated number of floors in the property based on the height of the building.
- Number of bedrooms - the number of bedrooms in a property
- Number of bathrooms - the number of bathrooms in a property
- Number of rooms in total - the number of rooms excluding bathrooms and kitchens
- Building construction period - the construction date of a building in one of the following periods: before 1719 (old), 1720-1839 (Georgian), 1840-1919 (Victorian/Edwardian), 1920-1945 (Inter-war), 1946-1979 (Post-war) and 1980 to date (Modern)
- Year built - the year built, only available for those buildings in the Land Registry Price Paid data, built after 1995
- Listed building - The grade of listing of a building, if it is listed, using data supplied by English Heritage, Cadw or Historic Scotland
- Cadastral polygon area - the area of the cadastral parcel in which the building sits expressed in square metres using data from Land Registry
- Height - the building height in metres
- Building footprint (square metres) - the approximate footprint of the building expressed in square metres
- Building volume (cubic metres) - the approximate volume of the building expressed in cubic metres
- Average roof slope - the average slope of the property roof, can be used to identify properties with flat roofs
- Flat roof fraction - the estimated fraction of a building which has a flat roof
- Distance to tree - distance from the nearest tree over 10 metres tall to the property geocode
- Geocode multiplicity - the number of property geocodes falling within the footprint of the building at 1.8 metres above ground level
- Floor area (square metres) - the liveable floor area in square metres
- Last transaction price - the price paid at the last transaction recorded by the Land Registry (England and Wales only, back to 1995)
- Last transaction date - the date of the last transaction recorded by the Land Registry (England and Wales only, back to 1995)
- Last transaction duration type - the duration type of the last transaction recorded by the Land Registry (England and Wales only, back to 1995)
- Estimated current value - estimated current value based on data from Land Registry (England and Wales only, back to 1995)
- Number of transactions - the number of transactions recorded by the Land Registry (England and Wales only, back to 1995)
- Estimated council tax band - estimated council tax from price at reference years using Land Registry data (England and Wales only, back to 1995)
- Within 200 metres of watercourse - flag indicating whether there is a watercourse within 200 metres
- Distance to watercourse (within 200 metres) - distance (in metres) to a watercourse, if it is within 200 metres
- Distance to road - the distance to the centre line of the nearest road from the property geocode, not necessarily accessible
- Road class - road class, as provided by Ordnance Survey
- Business usage - a flag indicating potential business usage
- Planning classification - planning classification as per Town and Country Planning (Use Classes) Order 1987 for non-domestic properties
- Congestion zone - a flag indicating if a property is in the London Congestion Zone
- Burglary rate - the number of burglaries per property per year averaged over a LSOA (England and Wales only)
- Storey on which flat sits - storey on which a flat sits. This typically contains N/A where it is not available or applicable or a number which may have been derived from a model based on the text found in the original data source
- Is top floor flat? - a flag indicating whether a flat is on the top floor of the building
- Number of extensions - the number of extensions to a property, typically 1 but up to 4
- Wall type - the type of wall used in construction, possible values cavity wall, solid brick, sandstone, granite, timber frame, system built and SAP05
- Main central heating fuel - Main central heating fuel, possible values include gas, electricity, oil, coal, LPG, wood, B30K (a biofuel mix) and also 'not known' and 'none'
- Type of tenure - type of tenure: owner-occupier, rented or social housing
- Energy rating - Energy rating as indicated in the EPC Energy Certificate
- EPC Inspection Date - Inspection date indicated in the EPC Energy Certificate
Technical details
Title | Field name | Data type |
---|---|---|
UPRN | UPRN | Integer |
UDPRN | UDPRN | Integer |
UMRRN | UMRRN | Integer |
Address1 | address1 | Text |
Postcode | postcode | Text |
Easting | easting | Float |
Northing | northing | Float |
Latitude | latitude | Float |
Longitude | longitude | Float |
Output area code | OA11CD | Text |
Lower super output area code | LSOA11CD | Text |
Output area code | OA21CD | Text |
Lower super output area code | LSOA21CD | Text |
Country | country | Text |
Property type | property_type | Lookup |
Number of floors | floors | Integer |
Number of bedrooms | bedrooms | Integer |
Number of bathrooms | bathrooms | Integer |
Number of rooms in total | total_rooms | Integer |
Building construction period | age | Lookup |
Year built | year_built | Integer |
Listed building | listed | Lookup |
Cadastral polygon area | cadastral | Float |
Height | height | Float |
Building footprint (square metres) | footprint | Float |
Building volume (cubic metres) | volume | Float |
Average roof slope | avg_roof_slope | Float |
Flat roof fraction | flat_roof_fraction | Float |
Distance to tree | distance_to_tree | Float |
Geocode multiplicity | geocode_multiplicity | Integer |
Floor area (square metres) | floor_area | Float |
Last transaction price | last_transaction_price | Integer |
Last transaction date | last_transaction_date | Text |
Last transaction duration type | last_transaction_duration_type | Lookup |
Estimated current value | est_current_value | Integer |
Number of transactions | n_transactions | Integer |
Estimated council tax band | est_council_tax | Lookup |
Within 200 metres of watercourse | watercourse_200M | Lookup |
Distance to watercourse (within 200 metres) | distance_to_water | Float |
Distance to road | distance_to_road | Float |
Road class | road_class | Lookup |
Business usage | business_usage | Lookup |
Planning classification | planning_classification | Lookup |
Congestion zone | congestion_zone | Lookup |
Burglary rate | burglary_rate | Float |
Storey on which flat sits | flat_floor | Text |
Is top floor flat? | top_floor_flat | Lookup |
Number of extensions | extensions | Integer |
Wall type | wall_type | Lookup |
Main central heating fuel | main_fuel | Lookup |
Type of tenure | tenure | Lookup |
Energy rating | energy_rating | Lookup |
EPC Inspection Date | epc_inspection_date | Text |
3. Lookup tables
Tables 2-13 are the lookup tables relating the numbers found in the database fields to their descriptions, for example for the property type, property age, Council Tax band and data source. The Yes/No lookup is used for the watercourse_200M, congestion_zone and top_floor_flat fields.
Yes/no lookup
Description | Value |
---|---|
No | 0 |
Yes | 1 |
Property type lookup
Description | Value |
---|---|
Detached | 0 |
Semi-detached | 1 |
Terraced | 2 |
Flat | 3 |
Unknown | 4 |
Property age lookup
Description | Value |
---|---|
Before 1719 (old) | 0 |
1720-1839 (Georgian) | 1 |
1840-1919 (Victorian/Edwardian) | 2 |
1920-1945 (Inter-war) | 3 |
1946-1979 (Post-war) | 4 |
1980 to date (Modern) | 5 |
Not known | 6 |
Council Tax lookup
Description | Value |
---|---|
A | 0 |
B | 1 |
C | 2 |
D | 3 |
E | 4 |
F | 5 |
G | 6 |
H | 7 |
I | 8 |
N/A | 100 |
Data source lookup
Description | Value |
---|---|
Default | 0 |
Land Registry | 2 |
Historic England | 3 |
Estate agent | 4 |
LIDAR | 7 |
NROSH multipart | 8 |
NROSH snapshot | 9 |
VOA | 12 |
Heuristic | 14 |
ML (age) | 15 |
Naive Bayes (age) | 17 |
Banded VOA | 18 |
ML (bedrooms) | 19 |
VOA (Council Tax) | 20 |
OS Open Rivers | 21 |
NB (bedrooms) | 24 |
OS Open Map | 25 |
Transport for London | 28 |
police.uk | 29 |
Flats modeller | 30 |
Cadw | 33 |
Historic Environment Scotland | 34 |
OS Open Roads | 35 |
Royal Mail | 36 |
DCLG | 37 |
DCLG non-domestic | 38 |
Prefix flat floor modeller | 42 |
Flats per floor modeller | 43 |
Nearest neighbour modeller | 44 |
DCLG Scotland | 45 |
DCLG Scotland non-domestic | 46 |
Financial Services | 48 |
Business usage lookup
Description | Value |
---|---|
Domestic | 0 |
Business | 1 |
Main fuel lookup
Description | Value |
---|---|
Gas | 0 |
Electricity | 1 |
Oil | 2 |
Not known | 3 |
Coal | 4 |
LPG | 5 |
Wood | 6 |
None | 7 |
B30K | 8 |
Other | 9 |
Biomass/Biogas | 10 |
District heating | 11 |
Waste heat | 12 |
Wall type lookup
Description | Value |
---|---|
Cavity wall | 0 |
Solid brick | 1 |
Sandstone | 2 |
Timber frame | 3 |
Granite | 4 |
System built | 5 |
SAP05 | 6 |
Not known | 7 |
Planning classification lookup
Description | Value |
---|---|
Not known | 0 |
A1/A2 Retail and Financial/Professional services | 1 |
A3/A4/A5 Restaurant and Cafes/Drinking Establishments and Hot Food takeaways | 2 |
B1 Offices and Workshop businesses | 3 |
B2 to B7 General Industrial and Special Industrial Groups | 4 |
B8 Storage or Distribution | 5 |
C1 Hotels | 6 |
C2 Residential Institutions - Hospitals and Care Homes | 7 |
C2 Residential Institutions - Residential schools | 8 |
C2 Residential Institutions - Universities and colleges | 9 |
C2A Secure Residential Institutions | 10 |
C3 - Dwelling houses | 11 |
D1 Non-residential Institutions - Community/Day Centre | 12 |
D1 Non-residential Institutions - Crown and County Courts | 13 |
D1 Non-residential Institutions - Education | 14 |
D1 Non-residential Institutions - Libraries Museums and Galleries | 15 |
D1 Non-residential Institutions - Primary Health Care Building | 16 |
D2 General Assembly and Leisure plus Night Clubs and Theatres | 17 |
Others - Passenger terminals | 18 |
Others - Emergency services | 19 |
Others - Miscellaneous 24hr activities | 21 |
Others - Car Parks 24 hrs | 22 |
Others - Stand alone utility block | 23 |
Others - Telephone exchanges | 24 |
Sui generis | 25 |
Road class lookup
Description | Value |
---|---|
Unclassified | 0 |
Not classified | 1 |
Classified unnumbered | 2 |
B Road | 3 |
A Road | 4 |
Motorway | 5 |
Unknown | 6 |
Listed building grade lookup
Description | Value |
---|---|
Not listed | 0 |
I or A | 1 |
II* or B | 2 |
II or C | 3 |
Tenure lookup
Description | Value |
---|---|
Owner-occupier | 0 |
Rented | 1 |
Social | 2 |
Energy rating lookup
Description | Value |
---|---|
A | 0 |
B | 1 |
C | 2 |
D | 3 |
E | 4 |
F | 5 |
G | 6 |
Last transaction duration type lookup
Description | Value |
---|---|
Not known | 0 |
Freehold | 1 |
Leasehold | 2 |
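As an illustration of how the lookup tables above might be applied, the following Python sketch decodes a few Lookup fields into their descriptions. It assumes the codes are held as integers in a dictionary keyed by field name. The `decode` function and the sample record are hypothetical, and only three of the lookup tables are reproduced.

```python
# Lookup tables copied from the tables in this section.
YES_NO = {0: "No", 1: "Yes"}
PROPERTY_TYPE = {
    0: "Detached", 1: "Semi-detached", 2: "Terraced", 3: "Flat", 4: "Unknown",
}
COUNCIL_TAX = {
    0: "A", 1: "B", 2: "C", 3: "D", 4: "E",
    5: "F", 6: "G", 7: "H", 8: "I", 100: "N/A",
}

# Which lookup applies to which field (field names from the technical details table).
LOOKUPS = {
    "property_type": PROPERTY_TYPE,
    "est_council_tax": COUNCIL_TAX,
    "watercourse_200M": YES_NO,
    "congestion_zone": YES_NO,
    "top_floor_flat": YES_NO,
}


def decode(record: dict) -> dict:
    """Return a copy of the record with known lookup codes replaced by descriptions."""
    decoded = dict(record)
    for field, table in LOOKUPS.items():
        if field in decoded:
            code = decoded[field]
            # Keep unrecognised codes visible rather than failing.
            decoded[field] = table.get(code, f"Unrecognised code {code}")
    return decoded


# Hypothetical record: the values are illustrative only.
sample = {"property_type": 1, "est_council_tax": 100, "congestion_zone": 0, "bedrooms": 3}
print(decode(sample))
```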
4. Accuracy
Accuracy for the tested fields calculated using Y24M05-groundtruths.sqlite on 2024-06-08 13:53:38 against 33096 properties is shown in the table below.
Field | Accuracy (%) |
---|---|
Number of bedrooms | 71.4 |
Number of bathrooms | 77.4 |
Building construction period | 68.8 |
Property type | 82.3 |
Number of floors | 89.9 |
5. Coverage
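The following tables show dataset coverage and accuracy by data source, along with the confidence score, for five attributes, based on measurements against the 33,000-property groundtruth dataset covering England and Wales. The tables are not individually labelled. Judging by the Overall accuracy rows, which match the figures in section 4, they appear in the order number of bedrooms, number of bathrooms, building construction period, property type and number of floors.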
Source | Coverage | Accuracy | Confidence |
---|---|---|---|
DCLG | 0.116 | 0.697 | 0.700 |
Default | 0.029 | 0.397 | 0.500 |
Estate agent | 0.450 | 0.794 | 0.850 |
Flats modeller | 0.001 | 0.318 | 0.600 |
NB (bedrooms) | 0.397 | 0.652 | 0.641 |
NROSH multipart | 0.007 | 0.769 | 0.800 |
NROSH snapshot | 0.000 | 1.000 | 0.800 |
Overall | 1.000 | 0.714 | 0.739 |
Source | Coverage | Accuracy | Confidence |
---|---|---|---|
Default | 0.574 | 0.800 | 0.730 |
Estate agent | 0.426 | 0.707 | 0.760 |
Overall | 1.000 | 0.774 | 0.743 |
Source | Coverage | Accuracy | Confidence |
---|---|---|---|
Cadw | 0.000 | 0.636 | 0.630 |
DCLG | 0.424 | 0.778 | 0.650 |
Default | 0.010 | 0.495 | 0.470 |
Heuristic | 0.011 | 0.462 | 0.600 |
Historic England | 0.004 | 0.565 | 0.540 |
Land Registry | 0.021 | 0.940 | 0.950 |
Naive Bayes (age) | 0.314 | 0.713 | 0.720 |
Overall | 1.000 | 0.688 | 0.637 |
VOA | 0.216 | 0.473 | 0.472 |
Source | Coverage | Accuracy | Confidence |
---|---|---|---|
Banded VOA | 0.055 | 0.629 | 0.596 |
DCLG | 0.069 | 0.939 | 0.900 |
Default | 0.002 | 0.417 | 0.540 |
Estate agent | 0.681 | 0.852 | 0.880 |
LIDAR | 0.189 | 0.747 | 0.800 |
Land Registry | 0.000 | 0.000 | 0.920 |
NROSH multipart | 0.003 | 0.152 | 0.800 |
Overall | 1.000 | 0.823 | 0.850 |
Source | Coverage | Accuracy | Confidence |
---|---|---|---|
Banded VOA | 0.122 | 0.828 | 0.814 |
DCLG | 0.103 | 0.961 | 0.940 |
Default | 0.003 | 0.752 | 0.840 |
LIDAR | 0.772 | 0.903 | 0.900 |
NROSH multipart | 0.000 | 1.000 | 0.800 |
Overall | 1.000 | 0.899 | 0.893 |
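As a worked check on how the Overall rows relate to the per-source rows, the Python sketch below recomputes the Overall figures for the first table in this section. That table appears to be number of bedrooms, since its overall accuracy of 0.714 matches section 4. The sketch assumes that Overall is a coverage-weighted mean of the per-source values. This is an observation from the published numbers, not a documented method, and not every table reproduces as closely.

```python
# Rows from the first coverage table: (source, coverage, accuracy, confidence).
bedroom_sources = [
    ("DCLG", 0.116, 0.697, 0.700),
    ("Default", 0.029, 0.397, 0.500),
    ("Estate agent", 0.450, 0.794, 0.850),
    ("Flats modeller", 0.001, 0.318, 0.600),
    ("NB (bedrooms)", 0.397, 0.652, 0.641),
    ("NROSH multipart", 0.007, 0.769, 0.800),
    ("NROSH snapshot", 0.000, 1.000, 0.800),
]

# Assumption: the Overall row is the coverage-weighted mean of each column.
total_coverage = sum(cov for _, cov, _, _ in bedroom_sources)
overall_accuracy = sum(cov * acc for _, cov, acc, _ in bedroom_sources) / total_coverage
overall_confidence = sum(cov * conf for _, cov, _, conf in bedroom_sources) / total_coverage

print(f"coverage   {total_coverage:.3f}")
print(f"accuracy   {overall_accuracy:.3f}")    # published Overall: 0.714
print(f"confidence {overall_confidence:.3f}")  # published Overall: 0.739
```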
6. Attribute distribution charts
The following charts show the distribution of values for selected fields, for domestic properties, not arising from the default model.
![](/media/hszdqiy2/chart-1.png?rmode=max&width=500&height=375)
![](/media/q5edixq0/chart-2.png?rmode=max&width=500&height=375)
![](/media/lptbldwu/chart-3.png?rmode=max&width=500&height=375)
![](/media/wqbbgdvr/chart-4.png?rmode=max&width=500&height=375)
![](/media/ruif0ukh/chart-5.png?rmode=max&width=500&height=375)
7. Direct data content
The following table shows the coverage with direct data for the five fields tested against groundtruth.
Attribute | Percentage direct |
---|---|
Property type | 89.0 |
Floors | 73.9 |
Bedrooms | 61.5 |
Bathrooms | 36.0 |
Age | 57.1 |
8. Data recency
Data recency for the Property Intelligence dataset is determined by a number of factors, listed below:
- The build process for Property Intelligence takes approximately 2 months from start to delivery to customer with quarterly scheduled releases;
- Individual datasets have a range of update frequencies: some are static and will never be updated, others are updated yearly, quarterly or monthly;
- Two datasets, EPC (formerly DCLG) and Estate agent data, have property-level fields which indicate when an inspection was carried out, so day-level data on recency could potentially be provided;
- The LIDAR data is a composite dataset, 80% of which has been collected in the last 10 years.
The table below shows the dates of the datasets used in this version of Property Intelligence along with an indication of the expected update frequency.
Dataset | Frequency | Date |
---|---|---|
Congestion Zone | Once | None |
DCLG | Quarterly | 2024-04-30 |
DCLG Scotland | Quarterly | 2024-04-30 |
ONS Postcode to LSOA/LA lookup | Quarterly | 2024-01-31 |
Land Registry House Price Index | Monthly | 2024-04-30 |
Land Registry Cadastral Polygons | Quarterly | None |
Land Registry Price Paid | Monthly | 2024-05-01 |
English Heritage | Yearly | 2023-06-21 |
Historic Environment Scotland | Yearly | 2023-06-21 |
Cadw | Yearly | 2022-05-09 |
NROSH | Once | 2016-12-12 |
ONSPD | Quarterly | 2024-02-19 |
ONS rural-urban classification | Once | 2016-12-12 |
OS Open UPRN | Quarterly | 2024-04-01 |
OS Open Rivers | Quarterly | None |
OS Open Roads | Quarterly | None |
Police.uk | Monthly | 2024-04-30 |
Royal Mail | Monthly | 2024-04-24 |
VOA | Yearly | 2023-11-27 |
Estate Agent | Monthly | 2024-05-01 |
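To put the dataset dates above into context, the following Python sketch computes the age in days of a few datasets relative to a reference date. The reference date used here is the date of the accuracy calculation in section 4 (2024-06-08), which is an assumption made for illustration. The `age_in_days` helper is hypothetical, and entries with a date of None are treated as having no known age.

```python
from datetime import date
from typing import Optional


def age_in_days(dataset_date: Optional[str], reference: date) -> Optional[int]:
    """Days between a dataset date (ISO format) and the reference date.

    Returns None for datasets listed with no date.
    """
    if dataset_date is None or dataset_date == "None":
        return None
    return (reference - date.fromisoformat(dataset_date)).days


# A subset of the table above, copied as published.
dataset_dates = {
    "Land Registry Price Paid": "2024-05-01",
    "VOA": "2023-11-27",
    "NROSH": "2016-12-12",
    "OS Open Rivers": "None",
}

# Assumed reference date, taken from the accuracy run in section 4.
reference = date(2024, 6, 8)

for name, published in dataset_dates.items():
    print(name, age_in_days(published, reference))
```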
The Environment Agency started to cover England systematically with LIDAR measurements in about 2005 and has added, very approximately, 5% coverage in each year since then.
Figure 6: Cumulative percentage of LIDAR coverage
9. Attributions
This dataset contains Open Data, typically provided under the UK government's Open Government Licence v3.0 (OGL3). A requirement of this licence is that an attribution is provided for the data. The attributions are as follows:
- EPC: Contains public sector information licensed under the Open Government Licence v3.0.
- EPC Scotland: Contains public sector information licensed under the Open Government Licence v3.0.
- LIDAR - Environment Agency: (c) Environment Agency copyright and/or database right (2019). All rights reserved.
- LIDAR - Scottish Government: Crown copyright Scottish Government, SEPA and Scottish Water (2012)
- LIDAR - Lle: Contains Natural Resources Wales information © Natural Resource Wales and Database Right. All rights Reserved
- Land Registry Price Paid: Contains HM Land Registry data © Crown copyright and database right 2024. This data is licensed under the Open Government Licence v3.0.
- Land Registry INSPIRE Polygons: This information is subject to Crown copyright and database rights [2024] and is reproduced with the permission of HM Land Registry. The polygons (including the associated geometry, namely x, y co-ordinates) are subject to Crown copyright and database rights [2024] Ordnance Survey 100026316.
- Listed buildings England: Historic England (2024). Contains Ordnance Survey data © Crown copyright and database right (2024). The Historic England GIS Data contained in this material was obtained on 2023-06-21
- Listed buildings Wales: Designated Historic Asset Descriptive Information, The Welsh Historic Environment Service (Cadw), 2023-06-21, licensed under the Open Government Licence http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
- Listed buildings Scotland: Contains Historic Environment Scotland and Ordnance Survey data © Historic Environment Scotland - Scottish Charity No. SC045925 © Crown copyright and database right [2024]
- Office for National Statistics: Source: Office for National Statistics licensed under the Open Government Licence v.3.0
- Office for National Statistics: Contains Royal Mail data © Royal Mail copyright and database right [2024]
- Office for National Statistics: Contains OS data © Crown copyright and database right [2024]
- Office for National Statistics: Contains GeoPlace data © Local Government Information House Limited copyright and database right [2024] [100050727]
- Ordnance Survey: Contains OS data © Crown copyright and database right (2024)
- Transport for London: Powered by TfL Open Data. Contains OS data © Crown copyright and database rights 2016 and Geomni UK Map data © and database rights [2019]