Data
GBNames traces the origins and spread of family groups as manifest through surnames over the distant and more recent past.
Digitally encoded historical census data for England, Scotland, and Wales, provided by the Economic and Social Research Council
I-CeM project, supply population-wide microdata of individuals’
names and addresses (available to researchers under a secure data user agreement). This integrated collection of historic census
microdata derives from the decennial Censuses of England and Wales for 1851, 1861, 1881, 1891, 1901 and 1911,
and for Scotland for the period 1851 to 1901 (although we do not use the 1871 data). Addresses are georeferenced to parishes,
the boundaries of which have been digitised. For more information, see
Higgs and Schürer (2014).
Individual names and addresses cannot be obtained from censuses until the data are 100 years old, and so we use consumer
registers for the period 1997-2016: details of these data are available in Lansley et al. (2019) and
Van Dijk et al. (2021).
Maps
One way to analyse and compare individual surname distributions over time without needing to accommodate changing administrative
areas is to assign individual occurrences of names to the central point (centroid) of the parishes that existed when each of the
historical censuses was conducted. For the more recent past, we geocode all adult individuals found in the Consumer Registers
using georeferenced postcodes. We then map the overall geographic distribution of these points on a surname-by-surname basis
using a technique called kernel density estimation (KDE), in effect placing a search window (kernel) over successive locations
and recording changes in the density of points across space. Details of the technique are available in the 6th Edition of
M. J. de Smith, M. F. Goodchild, and P.A. Longley (2018) Geospatial Analysis, available online at
www.spatialanalysisonline.com,
and in Cheshire and Longley (2011).
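As a rough illustration of the idea, the sketch below evaluates a fixed-bandwidth Gaussian kernel density on a small regular grid. It is not the code used for this website: the function name, the grid, the bandwidth value, and the example coordinates are all assumptions chosen for illustration.

```python
import math


def gaussian_kde_grid(points, xs, ys, bandwidth):
    """Evaluate a fixed-bandwidth Gaussian kernel density on a regular grid.

    points    -- list of (x, y) coordinates, one per surname bearer
                 (for example, the centroid of the bearer's parish)
    xs, ys    -- grid coordinates along each axis
    bandwidth -- kernel standard deviation, in the same units as the coordinates
    Returns a list of rows, one row per value in ys.
    """
    if not points:
        raise ValueError("at least one point is required")
    # Normalising constant of a 2D Gaussian, averaged over all points.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    grid = []
    for y in ys:
        row = []
        for x in xs:
            total = 0.0
            for px, py in points:
                d2 = (x - px) ** 2 + (y - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        grid.append(row)
    return grid


# Hypothetical input: three bearers at one centroid and one bearer at another.
points = [(0.0, 0.0)] * 3 + [(10.0, 10.0)]
xs = [0.0, 5.0, 10.0]
ys = [0.0, 5.0, 10.0]
density = gaussian_kde_grid(points, xs, ys, bandwidth=2.0)
```

In this toy example the estimated density is highest at the centroid with three bearers, lower at the centroid with one, and close to zero midway between them.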
For all data, KDEs for selected years are calculated only for surnames that have at least 100 individual bearers. This threshold is used to avoid disclosing
information about living individuals and to comply with our data licensing agreements. Because data for Scotland are not available for 1911,
KDEs for this year for names that are highly concentrated in Scotland are replaced by their corresponding 1901 KDEs (only England and Wales). This is done
to avoid creating the false impression that these surnames' centres have significantly changed. The KDE calculations are parallelised
through GNU Parallel and executed using the R
scripting language (with the sparr package). More information on these procedures
is available in one of our academic papers: Van Dijk and Longley (2020).
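The two rules described above can be sketched as follows. This is a minimal illustration, not the production code: it assumes that the 100-bearer threshold is applied separately to each year, and the cut-off used to decide that a name is "highly concentrated in Scotland" is a hypothetical value, since the text does not state one.

```python
MIN_BEARERS = 100  # disclosure threshold stated in the text


def years_to_map(counts_by_year, min_bearers=MIN_BEARERS):
    """Return the years for which a KDE would be calculated.

    Assumption: the threshold is checked year by year.
    """
    return sorted(
        year for year, count in counts_by_year.items() if count >= min_bearers
    )


def kde_source_year(year, scottish_share, concentration_cutoff=0.5):
    """Return the census year whose KDE is displayed for `year`.

    scottish_share       -- share of the name's bearers located in Scotland
    concentration_cutoff -- hypothetical cut-off, not taken from the source
    """
    if year == 1911 and scottish_share >= concentration_cutoff:
        # No Scottish data for 1911: fall back to the 1901 surface.
        return 1901
    return year
```

For example, `years_to_map({1851: 40, 1881: 120, 1901: 100})` keeps only 1881 and 1901, and `kde_source_year(1911, 0.8)` falls back to 1901.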
Further analysis of historic surname regions and population change is published in:
Kandt et al. (2020), Longley et al. (2021), and
Longley et al. (2023).
This website is powered by the Python-based open source web framework Django.
Leaflet.js has been used for creating the interactive maps. Website design is facilitated by Bootstrap 4.
Historical Forenames / Recent Forenames
For your selected surname, we provide the most popular associated forenames for all residents between 1851 and 1911 and those for
adults between 1997 and 2016. Historical forenames are classified as male or female using census returns from the period, while a
lookup table is used for the more recent period.
Top Historical Parishes / Top Contemporary Areas
We identify the historical and present-day areas in which your selected surname occurs most frequently. The Top Historical
Parishes identify Registration Districts as well as Parishes, while the Top Contemporary Areas are divisions of present-day (2011) Local Authority
Districts (Middle layer Super Output Areas). These are calculated as the highest frequencies over the historical or recent periods.
Ethnicity Estimation
Given and family names provide clues as to ethnicity. We provide a rough estimate of the probable ethnicity of the surname that you entered.
Better estimates can be obtained using CDRC and other full names classification software. For more details on CDRC's Ethnicity Estimator
software, which uses family naming practices, please refer to
Kandt and Longley (2018).
UK Output Area classification
This is a classification of UK neighbourhoods, arranged into Supergroups and Groups based upon 2021/22 Census of Population data.
We created this through a collaboration with the Office for National Statistics (ONS). We show the 2021/22 OAC Supergroup and Group in which your selected surname occurs most frequently. The classification
is described fully in Wyszomierski et al. (2023).
London Output Area classification
This is a classification of London's neighbourhoods, arranged into Supergroups and Groups based upon 2021 Census of Population data.
The classification was created through a partnership between the Consumer Data Research Centre, University College London and the University of Liverpool.
We show the 2021 LOAC Supergroup and Group in which your selected surname occurs most frequently. For more details, you can refer to the 2011 version of this classification that is fully described in
Singleton and Longley (2015).
Access to Healthy Assets and Hazards (AHAH)
AHAH is a multi-dimensional index for Great Britain developed by the Consumer Data Research Centre (CDRC) in order to describe
neighbourhoods as more or less healthy places in which to live. Various data describe access to retail facilities,
health services, and physical features. Neighbourhoods can be ranked from best to worst and we show the decile in which your
selected surname occurs most frequently. The classification is fully described in Green et al. (2018).
We use Version 3, which has been updated with the most up-to-date data for 2022.
Internet User Classification
The Consumer Data Research Centre Internet User Classification describes the nature and extent of Internet usage by the residents
of neighbourhoods across Great Britain. We show the Group into which the largest number of bearers of your selected surname fall.
You can read more about this classification in Singleton et al. (2020).
Index of Multiple Deprivation (IMD)
Indices of Multiple Deprivation (IMDs) are periodically calculated by UK governments in order to summarise prevailing physical
and social conditions at neighbourhood scale. Assuming equivalence between different national measures, we use the 2019 IMDs for
England and Wales and the 2020 Index for Scotland in order to identify the modal decile within which your selected surname falls.
You can read more about the calculation of the IMDs for: England,
Wales, and Scotland.
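The modal decile is simply the decile that contains the most bearers of the name. A minimal sketch is given below; the function name is hypothetical, and the rule of returning the lowest decile when two deciles tie is an assumption, since the text does not say how ties are handled.

```python
from collections import Counter


def modal_decile(deciles):
    """Return the most frequent decile among a surname's bearers.

    deciles -- iterable of integers from 1 to 10, one per bearer, taken from
               the neighbourhood in which each bearer lives
    Ties are resolved by returning the lowest decile (an assumption).
    """
    counts = Counter(deciles)
    if not counts:
        raise ValueError("at least one decile value is required")
    highest = max(counts.values())
    return min(d for d, n in counts.items() if n == highest)
```

The same counting approach applies to any of the neighbourhood measures on this page that report the category in which a surname occurs most frequently.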
Broadband Speed
This statistic records the modal value of the broadband speed available to bearers of your selected surname, based upon the fixed broadband download speed in Mbit/s.
You can find out more information at Broadband Speed.
Frequencies
For your selected name, we show the total number of (adult and child) occurrences for 1851-1911 (excluding 1871, for which data
for England and Wales are not available, and excluding Scotland for 1911, for which data are also unavailable). The corresponding
figures for 1997-2016 are estimated adult frequencies.
Funding
This work is funded by the UK ESRC Consumer Data Research Centre (CDRC) grant reference ES/L011840/1 and EPSRC grant EP/M023583/1 (‘UK Regions Digital Research Facility’).