This user guide provides information on the Property Intelligence database, prepared by GBG for customers looking to use the data. Property Intelligence is designed to assist in the process of providing insurance quotes for customers in England, Wales and Scotland by supplying data on residential properties.
Property coverage:
Houses, bungalows and flats are covered, although the majority of testing has been on freehold properties, i.e. typically houses and bungalows;
Geographic coverage:
England and Wales full coverage;
Scotland full coverage but reduced accuracy because of different policies for land and property registration and the publication of data;
Northern Ireland limited coverage, just includes estate agent data;
Isle of Man and Channel Islands not covered;
Property Intelligence is currently built on a quarterly basis; this is subject to review. The underlying datasets have a range of update frequencies from monthly upwards. Updates are included in the build as they become available.
2. Database Fields
Utility fields
The database contains a set of utility fields and a set of feature fields. The utility fields are as follows:
UPRN- the Unique Property Reference Number originating from the Ordnance Survey
UDPRN- the Unique Delivery Point Reference Number originating from the Royal Mail
UMRRN- the Unique Multiple Residence Reference Number originating from the Royal Mail
Address1- a standardised first line of address containing house name or number and street
Postcode- a full postcode
Easting- Ordnance Survey National Grid Easting
Northing- Ordnance Survey National Grid Northing
Latitude- latitude in ETRS89 converted from the easting and northing using OSTN02
Longitude- longitude in ETRS89 converted from the easting and northing using OSTN02
Output area code- Census 2011 Output Area (OA) code from the ONS Postcode Directory
Lower super output area code- Census 2011 Lower Super Output Area (LSOA) code from the ONS Postcode Directory
Output area code- Census 2021 Output Area (OA) code from the ONS Postcode Directory
Lower super output area code- Census 2021 Lower Super Output Area (LSOA) code from the ONS Postcode Directory
Country- one of England, Northern Ireland, Scotland, or Wales from the ONS Postcode Directory
Data items
The feature fields in the database are arranged in sets of three:
X - this is the data item of interest, for example a number of bedrooms for which X = “bedrooms”;
source_X- this is the source of the information, for example the number 2 indicates that this data item is sourced from the Land Registry;
p_X- this is a confidence score for the data ranging between 0 and 1. Confidence scores are calculated, where possible, as a “fraction correct” measure against a groundtruth dataset of 36,000 properties supplied by Simple and Open;
The unique key to the database is the UDPRN / UMRRN pair supplied by Royal Mail; the UPRN is also supplied. The list of data items is as follows:
Property type- whether the property is semi-detached, detached, terraced or a flat
Number of floors- the estimated number of floors in the property based on the height of the building.
Number of bedrooms- the number of bedrooms in a property
Number of bathrooms- the number of bathrooms in a property
Number of rooms in total- the number of rooms excluding bathrooms and kitchens
Building construction period- the construction date of a building in one of the following periods: before 1719 (old), 1720-1839 (Georgian), 1840-1919 (Victorian/Edwardian), 1920-1945 (Inter-war), 1946-1979 (Post-war) and 1980 to date (Modern)
Year built- the year built, only available for those buildings in the Land Registry Price Paid data, built after 1995
Listed building- The grade of listing of a building, if it is listed, using data supplied by English Heritage, Cadw or Historic Scotland
Cadastral polygon area- the area of the cadastral parcel in which the building sits expressed in square metres using data from Land Registry
Height- the building height in metres
Building footprint (square metres)- the approximate footprint of the building expressed in square metres
Building volume (cubic metres)- the approximate volume of the building expressed in cubic metres
Average roof slope- the average slope of the property roof, can be used to identify properties with flat roofs
Flat roof fraction- the estimated fraction of a building which has a flat roof
Distance to tree- distance from the nearest tree over 10 metres tall to the property geocode
Geocode multiplicity- the number of property geocodes falling within the footprint of the building at 1.8 metres above ground level
Floor area (square metres)- the liveable floor area in square metres
Last transaction price- the price paid at the last transaction recorded by the Land Registry (England and Wales only, back to 1995)
Last transaction date- the date of the last transaction recorded by the Land Registry (England and Wales only, back to 1995)
Estimated current value- estimated current value based on data from Land Registry (England and Wales only, back to 1995)
Number of transactions- the number of transactions recorded by the Land Registry (England and Wales only, back to 1995)
Estimated council tax band- estimated council tax from price at reference years using Land Registry data (England and Wales only, back to 1995)
Within 200 metres of watercourse- flag indicating whether there is a watercourse within 200 metres
Distance to watercourse (within 200 metres)- distance (in metres) to a watercourse, if it is within 200 metres
Distance to road- the distance to the centre line of the nearest road from the property geocode, not necessarily accessible
Road class- road class, as provided by Ordnance Survey
Business usage- a flag indicating potential business usage
Planning classification- planning classification as per Town and Country Planning (Use Classes) Order 1987 for non-domestic properties
Congestion zone- a flag indicating if a property is in the London Congestion Zone
Burglary rate- the number of burglaries per property per year averaged over a LSOA (England and Wales only)
Storey on which flat sits- the storey on which a flat sits. This typically contains N/A where it is not available or applicable, or a number, which may have been derived from a model based on the text found in the original data source
Is top floor flat?- whether a flat is on the top floor of the building
Number of extensions- the number of extensions to a property, typically 1 but up to 4
Wall type- the type of wall used in construction, possible values cavity wall, solid brick, sandstone, granite, timber frame, system built and SAP05
Main central heating fuel- Main central heating fuel, possible values include gas, electricity, oil, coal, LPG, wood, B30K (a biofuel mix) and also 'not known' and 'none'
Type of tenure- type of tenure: owner-occupier, rented or social housing
Energy rating- Energy rating as indicated in the EPC Energy Certificate
EPC Inspection Date- Inspection date indicated in the EPC Energy Certificate
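The arrangement of feature fields in sets of three (X, source_X and p_X), keyed on the UDPRN / UMRRN pair, can be illustrated with a minimal Python sketch. It assumes a record has already been loaded into a dictionary keyed by field name; the guide does not specify a delivery format, and the helper names get_feature and record_key, along with the example values, are hypothetical.

```python
from typing import Any, NamedTuple, Optional


class FeatureValue(NamedTuple):
    value: Any                    # the data item X itself
    source: Optional[int]         # source_X: code from the data source lookup
    confidence: Optional[float]   # p_X: confidence score between 0 and 1


def get_feature(record: dict, name: str) -> FeatureValue:
    """Collect X, source_X and p_X for one feature from a record."""
    return FeatureValue(
        value=record.get(name),
        source=record.get(f"source_{name}"),
        confidence=record.get(f"p_{name}"),
    )


def record_key(record: dict) -> tuple:
    """Return the UDPRN / UMRRN pair that uniquely identifies a record."""
    return (record["UDPRN"], record["UMRRN"])


# Hypothetical record; the values are invented for illustration only.
example = {
    "UDPRN": 12345678,
    "UMRRN": 0,
    "bedrooms": 3,
    "source_bedrooms": 4,   # 4 = Estate agent in the data source lookup
    "p_bedrooms": 0.85,
}

bedrooms = get_feature(example, "bedrooms")
```

Keeping the three values together makes it straightforward to filter on confidence or source before a data item is used, for example when deciding whether to pre-fill a quote question.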
Technical details
Technical details for each of these fields are shown in the table below:
Title | Field name | Data type
UPRN | UPRN | Integer
UDPRN | UDPRN | Integer
UMRRN | UMRRN | Integer
Address1 | address1 | Text
Postcode | postcode | Text
Easting | easting | Float
Northing | northing | Float
Latitude | latitude | Float
Longitude | longitude | Float
Output area code | OA11CD | Text
Lower super output area code | LSOA11CD | Text
Output area code | OA21CD | Text
Lower super output area code | LSOA21CD | Text
Country | country | Text
Property type | property_type | Lookup
Number of floors | floors | Integer
Number of bedrooms | bedrooms | Integer
Number of bathrooms | bathrooms | Integer
Number of rooms in total | total_rooms | Integer
Building construction period | age | Lookup
Year built | year_built | Integer
Listed building | listed | Lookup
Cadastral polygon area | cadastral | Float
Height | height | Float
Building footprint (square metres) | footprint | Float
Building volume (cubic metres) | volume | Float
Average roof slope | avg_roof_slope | Float
Flat roof fraction | flat_roof_fraction | Float
Distance to tree | distance_to_tree | Float
Geocode multiplicity | geocode_multiplicity | Integer
Floor area (square metres) | floor_area | Float
Last transaction price | last_transaction_price | Integer
Last transaction date | last_transaction_date | Text
Estimated current value | est_current_value | Integer
Number of transactions | n_transactions | Integer
Estimated council tax band | est_council_tax | Lookup
Within 200 metres of watercourse | watercourse_200M | Lookup
Distance to watercourse (within 200 metres) | distance_to_water | Float
Distance to road | distance_to_road | Float
Road class | road_class | Lookup
Business usage | business_usage | Lookup
Planning classification | planning_classification | Lookup
Congestion zone | congestion_zone | Lookup
Burglary rate | burglary_rate | Float
Storey on which flat sits | flat_floor | Text
Is top floor flat? | top_floor_flat | Lookup
Number of extensions | extensions | Integer
Wall type | wall_type | Lookup
Main central heating fuel | main_fuel | Lookup
Type of tenure | tenure | Lookup
Energy rating | energy_rating | Lookup
EPC Inspection Date | epc_inspection_date | Text
Table 1: Technical details for each utility and data field. Lookup fields contain non-negative integers (starting from zero). source_X fields are lookup fields; p_X fields are number fields.
3. Lookup tables
Tables 2-14 are the lookup tables relating the numbers found in the database fields to descriptions, for example for the property type, property age, Council Tax band and data source. The Yes/No lookup is used for the 'watercourse_200M', 'congestion_zone' and 'top_floor_flat' fields.
Yes/no lookup
Description | Value
No | 0
Yes | 1
Table 2: Yes/no lookup
Property type lookup
Description | Value
Detached | 0
Semi-detached | 1
Terraced | 2
Flat | 3
Unknown | 4
Table 3: Property type lookup
Property age lookup
Description | Value
Before 1719 (old) | 0
1720-1839 (Georgian) | 1
1840-1919 (Victorian/Edwardian) | 2
1920-1945 (Inter-war) | 3
1946-1979 (Post-war) | 4
1980 to date (Modern) | 5
Not known | 6
Table 4: Property age lookup
Council Tax lookup
Description | Value
A | 0
B | 1
C | 2
D | 3
E | 4
F | 5
G | 6
H | 7
I | 8
N/A | 100
Table 5: Council tax band lookup
Data source lookup
Description | Value
Default | 0
Land Registry | 2
Historic England | 3
Estate agent | 4
LIDAR | 7
NROSH multipart | 8
NROSH snapshot | 9
VOA | 12
Heuristic | 14
ML (age) | 15
Naive Bayes (age) | 17
Banded VOA | 18
ML (bedrooms) | 19
VOA (Council Tax) | 20
OS Open Rivers | 21
NB (bedrooms) | 24
OS Open Map | 25
Transport for London | 28
police.uk | 29
Flats modeller | 30
Cadw | 33
Historic Environment Scotland | 34
OS Open Roads | 35
Royal Mail | 36
DCLG | 37
DCLG non-domestic | 38
Prefix flat floor modeller | 42
Flats per floor modeller | 43
Nearest neighbour modeller | 44
DCLG Scotland | 45
DCLG Scotland non-domestic | 46
Financial Services | 48
Table 6: Data source lookup
Business usage lookup
Description | Value
Domestic | 0
Business | 1
Table 7: Business usage lookup
Main fuel lookup
Description | Value
Gas | 0
Electricity | 1
Oil | 2
Not known | 3
Coal | 4
LPG | 5
Wood | 6
None | 7
B30K | 8
Other | 9
Biomass/Biogas | 10
District heating | 11
Waste heat | 12
Table 8: Main fuel lookup
Wall type lookup
Description | Value
Cavity wall | 0
Solid brick | 1
Sandstone | 2
Timber frame | 3
Granite | 4
System built | 5
SAP05 | 6
Not known | 7
Table 9: Wall type lookup
Planning classification lookup
Description | Value
Not known | 0
A1/A2 Retail and Financial/Professional services | 1
A3/A4/A5 Restaurant and Cafes/Drinking Establishments and Hot Food takeaways | 2
B1 Offices and Workshop businesses | 3
B2 to B7 General Industrial and Special Industrial Groups | 4
B8 Storage or Distribution | 5
C1 Hotels | 6
C2 Residential Institutions - Hospitals and Care Homes | 7
C2 Residential Institutions - Residential schools | 8
C2 Residential Institutions - Universities and colleges | 9
C2A Secure Residential Institutions | 10
C3 - Dwelling houses | 11
D1 Non-residential Institutions - Community/Day Centre | 12
D1 Non-residential Institutions - Crown and County Courts | 13
D1 Non-residential Institutions - Education | 14
D1 Non-residential Institutions - Libraries Museums and Galleries | 15
D1 Non-residential Institutions - Primary Health Care Building | 16
D2 General Assembly and Leisure plus Night Clubs and Theatres | 17
Others - Passenger terminals | 18
Others - Emergency services | 19
Others - Miscellaneous 24hr activities | 21
Others - Car Parks 24 hrs | 22
Others - Stand alone utility block | 23
Others - Telephone exchanges | 24
Sui generis | 25
Table 10: Planning classification lookup
Road class lookup
Description | Value
Unclassified | 0
Not classified | 1
Classified unnumbered | 2
B Road | 3
A Road | 4
Motorway | 5
Unknown | 6
Table 11: Road class lookup
Listed building grade lookup
Description | Value
Not listed | 0
I or A | 1
II* or B | 2
II or C | 3
Table 12: Listed building grade lookup
Tenure lookup
Description | Value
Owner-occupier | 0
Rented | 1
Social | 2
Table 13: Tenure lookup
Energy rating lookup
Description | Value
A | 0
B | 1
C | 2
D | 3
E | 4
F | 5
G | 6
Table 14: Energy rating lookup
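As an illustration of how the lookup tables might be applied, the following Python sketch decodes the integer held in a lookup field into its description. Only three of the lookup tables are transcribed here, and the names LOOKUPS and decode are hypothetical rather than part of any supplied tooling.

```python
# Values transcribed from the Yes/no, property type and listed building lookups.
YES_NO = {0: "No", 1: "Yes"}
PROPERTY_TYPE = {0: "Detached", 1: "Semi-detached", 2: "Terraced", 3: "Flat", 4: "Unknown"}
LISTED = {0: "Not listed", 1: "I or A", 2: "II* or B", 3: "II or C"}

# Map database field names to the lookup table that applies to them.
LOOKUPS = {
    "property_type": PROPERTY_TYPE,
    "listed": LISTED,
    "watercourse_200M": YES_NO,
    "congestion_zone": YES_NO,
    "top_floor_flat": YES_NO,
}


def decode(field, value):
    """Return the description for a lookup field, or the raw value otherwise."""
    table = LOOKUPS.get(field)
    if table is None:
        return value  # not a lookup field covered by this sketch
    return table.get(value, f"Unrecognised code {value}")


print(decode("property_type", 3))
print(decode("listed", 2))
```

Returning a placeholder for unrecognised codes, rather than raising an error, is a design choice for this sketch; several lookup tables have gaps in their numbering, so unexpected codes are worth logging rather than assuming they cannot occur.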
4. Accuracy
Accuracy for the tested fields, calculated using Y23M05-groundtruths.sqlite on 2023-06-22 19:13:24 against 33096 properties, is shown in the table below.
Field | Accuracy (%)
Number of bedrooms | 71.4
Number of bathrooms | 77.4
Building construction period | 68.5
Property type | 82.3
Number of floors | 89.9
Table 15: Summary accuracy for fields, measured against 'groundtruth' properties in England and Wales, excluding flats
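The guide describes confidence scores as a "fraction correct" measure against the groundtruth dataset. A minimal sketch of such a measure is given below; it assumes predicted and groundtruth values are held in dictionaries keyed by a property identifier and that a prediction counts as correct only on an exact match. Neither assumption is confirmed by this guide, and the example values are invented.

```python
def fraction_correct(predicted, groundtruth):
    """Share of properties present in both inputs whose values match exactly."""
    common = predicted.keys() & groundtruth.keys()
    if not common:
        raise ValueError("no properties in common")
    correct = sum(1 for key in common if predicted[key] == groundtruth[key])
    return correct / len(common)


# Invented bedroom counts for four hypothetical properties.
predicted = {"a": 3, "b": 2, "c": 4, "d": 3}
truth = {"a": 3, "b": 3, "c": 4, "d": 3}

accuracy_pct = round(100 * fraction_correct(predicted, truth), 1)
print(accuracy_pct)
```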
5. Coverage
The following tables show coverage and accuracy by data source for bedrooms, bathrooms, age, property type and number of floors, along with the confidence for these attributes, based on measurements against the 33,000 property groundtruth dataset covering England and Wales.
Source | Coverage | Accuracy | Confidence
DCLG | 0.112 | 0.700 | 0.700
Default | 0.029 | 0.396 | 0.500
Estate agent | 0.441 | 0.797 | 0.850
Flats modeller | 0.001 | 0.311 | 0.600
NB (bedrooms) | 0.408 | 0.651 | 0.641
NROSH multipart | 0.008 | 0.772 | 0.800
NROSH snapshot | 0.000 | 1.000 | 0.800
Overall | 1.000 | 0.714 | 0.737
Table 16: Accuracy and coverage for bedrooms
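One way to read the bedrooms table is that the Overall accuracy is close to the coverage-weighted average of the per-source accuracies. The sketch below checks this using the figures from the table. It is an observation about the published numbers rather than a documented calculation method; the per-source figures are rounded, and not every table reproduces its Overall row this closely.

```python
# (source, coverage, accuracy) rows transcribed from the bedrooms table.
BEDROOMS_BY_SOURCE = [
    ("DCLG", 0.112, 0.700),
    ("Default", 0.029, 0.396),
    ("Estate agent", 0.441, 0.797),
    ("Flats modeller", 0.001, 0.311),
    ("NB (bedrooms)", 0.408, 0.651),
    ("NROSH multipart", 0.008, 0.772),
    ("NROSH snapshot", 0.000, 1.000),
]


def coverage_weighted_accuracy(rows):
    """Average the per-source accuracy, weighting each source by its coverage."""
    total_coverage = sum(coverage for _, coverage, _ in rows)
    weighted = sum(coverage * accuracy for _, coverage, accuracy in rows)
    return weighted / total_coverage


overall = coverage_weighted_accuracy(BEDROOMS_BY_SOURCE)
print(round(overall, 3))  # compare with the Overall accuracy of 0.714
```

Dividing by the summed coverage, rather than assuming it is exactly 1, allows for the rounding in the published coverage column.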
Source | Coverage | Accuracy | Confidence
Default | 0.583 | 0.800 | 0.730
Estate agent | 0.417 | 0.707 | 0.760
Overall | 1.000 | 0.774 | 0.743
Table 17: Accuracy and coverage for bathrooms
Source | Coverage | Accuracy | Confidence
Cadw | 0.000 | 0.636 | 0.630
DCLG | 0.410 | 0.777 | 0.650
Heuristic | 0.011 | 0.462 | 0.600
Historic England | 0.004 | 0.565 | 0.540
Land Registry | 0.021 | 0.941 | 0.950
Naive Bayes (age) | 0.329 | 0.708 | 0.720
Overall | 1.000 | 0.685 | 0.639
VOA | 0.225 | 0.471 | 0.473
VOA | 0.229 | 0.471 | 0.473
Table 18: Accuracy and coverage for age
Source | Coverage | Accuracy | Confidence
Banded VOA | 0.057 | 0.630 | 0.597
DCLG | 0.066 | 0.937 | 0.900
Default | 0.001 | 0.457 | 0.540
Estate agent | 0.677 | 0.854 | 0.880
LIDAR | 0.195 | 0.747 | 0.800
Land Registry | 0.000 | 0.000 | 0.920
NROSH multipart | 0.003 | 0.152 | 0.800
Overall | 1.000 | 0.823 | 0.849
Overall | 1.000 | 0.823 | 0.848
Table 19: Accuracy and coverage for property_type
Source | Coverage | Accuracy | Confidence
Banded VOA | 0.128 | 0.824 | 0.813
DCLG | 0.099 | 0.962 | 0.940
LIDAR | 0.772 | 0.903 | 0.900
Overall | 1.000 | 0.899 | 0.893
Overall | 1.000 | 0.898 | 0.893
Table 20: Accuracy and coverage for floors
6. Attribute distribution charts
The following charts show the distribution of values for selected fields, for domestic properties, not arising from the default model.
Figure 1: Distribution of property type
Figure 2: Distribution of number of bedrooms
Figure 3: Distribution of number of bathrooms
Figure 4: Distribution of building construction period
Figure 5: Distribution of number of floors
7. Direct data content
The following table shows the coverage with direct data for the five fields tested against groundtruth.
Attribute | Percentage direct
Property type | 88.9
Floors | 73.7
Bedrooms | 60.9
Bathrooms | 35.5
Age | 56.0
Age | 55.3
Table 21: Percentage of data supplied from direct sources rather than modelled
8. Data recency
Data recency for the Property Intelligence dataset is determined by a number of factors, listed below:
The build process for Property Intelligence takes approximately 2 months from start to delivery to customer with quarterly scheduled releases;
Individual datasets have a range of update frequencies, some are static and will never be updated, others are yearly, quarterly or monthly;
Two datasets, EPC (formerly DCLG) and Estate agent data, have property-level fields which indicate when an inspection was carried out so potentially day-level data on recency could be provided;
The LIDAR data is a composite dataset, 80% of which has been collected in the last 10 years;
The table below shows the dates of the datasets used in this version of Property Intelligence along with an indication of the expected update frequency.
Dataset | Frequency | Date
Congestion Zone | Once | None
DCLG | Quarterly | 2023-08-24
DCLG Scotland | Quarterly | 2023-08-24
ONS Postcode to LSOA/LA lookup | Quarterly | 2023-09-06
Land Registry House Price Index | Monthly | 2023-08-24
Land Registry Cadastral Polygons | Quarterly | None
Land Registry Price Paid | Monthly | 2023-09-05
English Heritage | Yearly | 2023-06-21
Historic Environment Scotland | Yearly | 2023-06-21
Cadw | Yearly | 2022-05-09
NROSH | Once | 2016-12-12
ONSPD | Quarterly | 2023-08-23
ONS rural-urban classification | Once | 2016-12-12
OS Open UPRN | Quarterly | 2023-09-01
OS Open Rivers | Quarterly | None
OS Open Roads | Quarterly | None
Police.uk | Monthly | 2023-08-24
Royal Mail | Monthly | 2023-08-22
VOA | Yearly | 2022-11-28
Estate Agent | Monthly | 2023-09-01
Table 22: Data recency and frequency by dataset
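The dataset dates in the table can be turned into an age in days relative to a chosen reference date, which may help when assessing recency. The sketch below uses a handful of rows from the table; the reference date is a hypothetical choice for illustration, and entries recorded as None are passed through as unknown rather than guessed.

```python
from datetime import date

# A few rows transcribed from the data recency table (dataset -> date text).
DATASET_DATES = {
    "Land Registry Price Paid": "2023-09-05",
    "Cadw": "2022-05-09",
    "NROSH": "2016-12-12",
    "OS Open Rivers": "None",
}

# Hypothetical reference date, for example the date a build is delivered.
REFERENCE = date(2023, 9, 30)


def dataset_age_days(date_text, reference):
    """Return the age of a dataset in days, or None where no date is recorded."""
    if date_text == "None":
        return None
    return (reference - date.fromisoformat(date_text)).days


ages = {name: dataset_age_days(text, REFERENCE) for name, text in DATASET_DATES.items()}
for name, age in ages.items():
    print(name, age)
```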
The Environment Agency started to systematically cover England for LIDAR measurement in about 2005 and has added, very approximately, 5% coverage in each year since then.
9. Attributions
This dataset contains Open Data typically provided under the UK government's OGL3 license; a requirement of this license is that an attribution is provided for the data. These are as follows:
An EPC_INSPECTION_DATE field is added. References to 'DCLG', the original department responsible for the EPC Energy Certificate data, are replaced with 'EPC' in documentation.
The Census 2021 codings oa21cd and lsoa21cd are added, to sit alongside the Census 2011 codings. Currently the source Open Data used to derive fields in Property Intelligence still use the Census 2011 codings.
No new fields have been added in this release but sources have been updated.
We have added the DCLG Scotland data which provides a significant improvement in accuracy for property type, and property age in Scotland as well as improvements in accuracy to numbers of bedrooms and floors. DCLG Scotland also provides fields including extension count, wall type, main fuel, floor area, total rooms, tenure and energy rating which were not previously populated for Scotland.
There are improvements in the flat floor modeller such that it does not return unreasonably large values (over 90 storeys) or non-numeric values (other than N/A).
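The constraint described in this note can be expressed as a simple check on the flat_floor text field. The sketch below is an illustration of the stated rule only, not the modeller's actual implementation; the function name is hypothetical, and because the note gives no lower bound, none is applied here.

```python
MAX_STOREY = 90  # values over 90 storeys are described as unreasonable


def is_valid_flat_floor(value: str) -> bool:
    """Accept 'N/A' or a whole number of storeys no greater than MAX_STOREY."""
    text = value.strip()
    if text == "N/A":
        return True
    try:
        storey = int(text)
    except ValueError:
        return False  # non-numeric text other than N/A
    return storey <= MAX_STOREY


for candidate in ["N/A", "12", "120", "top"]:
    print(candidate, is_valid_flat_floor(candidate))
```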
No new fields have been added in this release but sources have been updated.
We have introduced modelling for 'flat_floor' - the storey a flat sits on which improves the coverage for this field, and introduces new entries to the data sources table.
Tenure and energy rating fields have been added. Tenure is a replacement for the previously removed tenancy field. It indicates whether a property is owner-occupied, private rental or social housing. The name has been changed to retain consistency with the underlying dataset.
The property age field has improved direct data content and accuracy as a result of the addition of a new dataset.
We have resumed supply of two fields which had been suspended:
the congestion zone field from the raw TFL data rather than using a third party as a supplier;
the cadastral area (building plot area) which had been suspended due to licensing issues. This is based on the Land Registry INSPIRE Polygons data;
The Land Registry House Price Index is available once again, and thus Estimated Current Values will be up to date unless a sale went through during the period in which the HPI was suspended.
As noted in the April 2020 Release notes we have removed the following fields from this release:
geocode accuracy
red route
tenancy
multiplicity
outdoor area
building count
adult occupants
The Land Registry UK House Price Index release for April 2020, due to be published in June, has been suspended because of the impact of COVID-19, which means limited transactions are occurring on which to base the Index. The relevant Land Registry Bulletin describing this change is here. This means the estimated current value field will contain the estimated current value at the last release of the House Price Index - 1st March 2020.
This build incorporates the OS AddressBase Premium property-level easting / northing and latitude / longitude coordinates; these replace those provided by our previous supplier. This data is supplied under evaluation terms which you have already signed up to.
These will be replaced with coordinates derived from Ordnance Survey Open Source data once this has been released in July 2020.
As a result of recent supplier changes we are also withdrawing a number of fields including the geocode accuracy and red route fields. The Congestion Zone field will remain but not be populated in the next build.
We will also be withdrawing the Tenancy field as a result of other supplier licensing changes.
Finally there are a number of fields which have not been populated for some time including multiplicity, outdoor area, building count and number of adult occupants. All of these fields are present in this build containing default values in most cases but will be removed from the July 2020 build.
The listed building field now reports the grade of listing (I, II* or II in England or Wales, A, B or C in Scotland). Previously buildings were just reported listed/not listed.
This release contains further parameters derived from LIDAR data, these include building footprint, building volume, the average roof slope, a flat roof fraction, the distance to the nearest tree over 10 metres high to the property geocode and a geocode multiplicity which counts the number of geocodes within a building. The building footprint is not listed below since it is not a new field but has been re-calculated using LIDAR data.
This release incorporates a major new dataset which has brought improved accuracy to numbers of bedrooms, numbers of floors, property type and property age fields as well as introducing a number of new fields, listed below. The cadastral, outdoor area and footprint fields will be populated only with default values from this release onwards for licensing reasons. We hope to re-introduce the building footprint in the February 2018 release.
This release sees a switch to using the Royal Mail PAF, Not Yet Built and Multiple Residence files as the base address list which results in approximately 10% more addresses than earlier releases. In addition accuracy in identifying flats was improved substantially, and a number of fields pertaining specifically to flats included.