Statistical tools to interpret soil variation.


Understanding the geochemical composition of soil across the landscape is key to determining where land is suitable to support crop growth, flood management and the provision of raw materials and, in particular, whether contamination by toxic constituents poses risks to human health and the environment (Fordyce et al., 2017). However, there are major statistical challenges in obtaining this information. The geochemical composition of the soil is complex and spatially heterogeneous, reflecting the multiple natural and anthropogenic sources of various elements, the different pressures and erosional processes which apply locally, and the interactions between the components of the soil matrix. Heterogeneity is particularly marked in urban areas, where soil may have been imported or affected by historical and recent transport, construction and industrial activities.

Soil information is generally obtained through expensive surveys that require the collection of soil samples and laboratory analyses. Novel multivariate (Gelfand et al., 2010) and spatial statistical models are required to interpret such data efficiently. Extreme value theory (Coles, 2001) is also needed to predict where concentrations of toxic elements exceed a harmful threshold. This project will address such challenges with reference to the Geochemical Baseline Survey of the Environment (G-BASE) for the Clyde Basin (Fordyce et al., 2017). This survey consists of almost 3000 measurements of 50 chemical parameters in soil samples from urban Glasgow and the peri-urban and rural surrounds. The student will:

• Identify the soil geochemical information required by land managers to enable decisions regarding land use and potential remediation
• Develop the statistical tools needed to integrate the G-BASE measurements with other environmental data to provide this information and to quantify the uncertainty in these predictions
• Identify the interactions between different measured chemical parameters and assess whether these might be more easily inferred by analysing the composition of each soil sample as a whole rather than treating the parameters as a set of correlated variables
• Assess the spatially varying risk of the concentrations of parameters exceeding regulatory thresholds and the tendency for such exceedances in different parameters to coincide
• Examine the relevance of the developed methodology to other G-BASE and international soil surveys

Figure: Spatial variation of measured log(Pb) across the Clyde Basin (data owned by BGS)


The student will be based at BGS (Keyworth) where they will have day-to-day interactions with soil geochemists and environmental statisticians. They will have weekly video meetings with their University of Glasgow supervisors and make three two-week visits to the University of Glasgow each year where they will receive specialist statistical supervision. A University of Glasgow supervisor will also visit BGS to provide statistical guidance.

BGS collaborate with the French National Institute for Agricultural Research, and the student will make two visits to Orléans, France, to ensure the relevance of their work to the wider soil science community. The project will utilise existing datasets. Details of the phases of the project are contained in the timeline below.

Project Timeline

Year 1

The student will review the Clyde Basin G-BASE data and the environmental and industrial context of the study area. Relevant environmental datasets will be compiled (e.g. geological maps, land cover maps, satellite imagery and digital terrain models). Through discussions with BGS scientists, land managers and officials from Glasgow City Council, the student will prioritize the required soil information.

Geostatistical tools, such as linear mixed models, will initially be used to integrate the datasets and make the required spatial predictions. The student will consider whether more realistic models of the relationships between the data sources could be established through the use of more general statistical approaches such as Gaussian process regression or machine learning (Hengl et al., 2018). This work will identify the most suitable methods for extracting the required information about the expected composition of the soil at a particular location and scale, and for quantifying the uncertainty in such predictions.
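To illustrate the kind of spatial prediction and uncertainty quantification described above, the following is a minimal Python sketch of simple kriging, which is Gaussian process regression with a known mean. It is not the project's method: the site coordinates, log(Pb) values, mean, and the exponential covariance with its sill and range are all hypothetical, and in practice the covariance parameters would be estimated from the data and covariates would enter through the mean.

```python
import math

def exp_cov(p, q, sill=1.0, range_km=5.0):
    """Exponential covariance between two (x, y) points, distances in km.
    The sill and range are assumed values, not fitted ones."""
    return sill * math.exp(-math.dist(p, q) / range_km)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def krige(sites, values, target, mean, sill=1.0, range_km=5.0):
    """Simple kriging at `target`: returns (prediction, prediction variance)."""
    K = [[exp_cov(p, q, sill, range_km) for q in sites] for p in sites]
    k = [exp_cov(p, target, sill, range_km) for p in sites]
    w = solve(K, k)  # kriging weights
    pred = mean + sum(wi * (v - mean) for wi, v in zip(w, values))
    var = sill - sum(wi * ki for wi, ki in zip(w, k))
    return pred, var

# Hypothetical sample sites (km) and log(Pb) measurements
sites = [(0.0, 0.0), (2.0, 1.0), (5.0, 5.0), (1.0, 4.0)]
log_pb = [4.2, 4.8, 3.9, 4.5]
mean_log_pb = 4.3  # assumed known mean

pred_at_site, var_at_site = krige(sites, log_pb, sites[1], mean_log_pb)
pred_mid, var_mid = krige(sites, log_pb, (2.0, 3.0), mean_log_pb)
pred_far, var_far = krige(sites, log_pb, (500.0, 500.0), mean_log_pb)

print("at a sampled site:", pred_at_site, var_at_site)
print("between sites:    ", pred_mid, var_mid)
print("far from all data:", pred_far, var_far)
```

With no measurement-error (nugget) term, the prediction reproduces the observation at a sampled site with zero variance, while far from any data it reverts to the assumed mean with variance equal to the sill. The prediction variance is the quantity a land manager would use to judge how much to trust the map at a given location.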

Year 2

The student will address challenges resulting from the compositional nature of the geochemical data (i.e. the need for the concentrations of each element to sum to the whole of each sample). For example, the majority of samples might contain a substantial proportion of silica, but where this proportion is reduced through processes such as erosion, the concentrations of the other, more erosion-resistant elements all increase together. Standard correlation measures might therefore indicate a misleading link between these elements. The student will consider compositional statistical models (McKinley et al., 2016) that address these issues and explore the relationships between constituents of the soil that can be inferred. This work will lead to insights into causes of spatial features within the Clyde Basin data and contribute to the ongoing academic debate regarding the importance of compositional analyses. The student will report their findings at the World Soil Congress, which takes place in Glasgow in August 2022.
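The closure effect described above can be reproduced with simulated data. The Python sketch below is illustrative only: the three-part composition (a dominant silica-like part and two trace elements), the lognormal parameters and the sample size are all assumptions. It generates independent raw abundances, rescales each sample to sum to one, and compares correlations before and after. It also shows the centred log-ratio (clr) transform, one commonly used log-ratio representation of compositional data, chosen here as an example rather than as the project's method.

```python
import math
import random

def closure(parts):
    """Rescale a sample so that its parts sum to 1 (the compositional constraint)."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(composition):
    """Centred log-ratio transform: log of each part relative to the geometric mean."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
# Hypothetical, mutually independent raw abundances for each simulated sample
raw = [(rng.lognormvariate(4.0, 0.5),   # dominant silica-like part
        rng.lognormvariate(1.0, 0.3),   # trace element A
        rng.lognormvariate(1.0, 0.3))   # trace element B
       for _ in range(2000)]
closed = [closure(sample) for sample in raw]

r_raw = pearson([s[1] for s in raw], [s[2] for s in raw])
r_closed = pearson([c[1] for c in closed], [c[2] for c in closed])
print("correlation of A and B, raw abundances:    ", round(r_raw, 3))
print("correlation of A and B, closed proportions:", round(r_closed, 3))

# The log-ratio of two parts is unchanged by closure
log_ratio_raw = math.log(raw[0][1] / raw[0][2])
log_ratio_closed = math.log(closed[0][1] / closed[0][2])
print("log(A/B) before and after closure:", log_ratio_raw, log_ratio_closed)
print("clr of first sample:", clr(closed[0]))
```

Elements A and B are simulated independently, yet their proportions appear strongly correlated because both are inflated or deflated by the same variation in the dominant part. Log-ratios between parts do not depend on the closure, which is why compositional methods work with them rather than with the raw proportions.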

Year 3

Land managers will be particularly concerned about where there is a risk that concentrations of toxic elements exceed regulatory thresholds. Standard geostatistical approaches are likely to underestimate this risk because the dataset might contain only a small number of examples of such exceedances. Also, these approaches consider each constituent of the soil individually, whereas some contaminants might be associated with each other leading to particularly grave impacts where they coincide. The student will examine how extremes models for threshold exceedances (Coles, 2001) might extend the existing geostatistical approaches. This work will lead to predictions of contamination risk at a site that reflect all of the constituents of the soil matrix.
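As a rough illustration of a threshold-exceedance model, the Python sketch below fits a generalised Pareto distribution (GPD) to the excesses over a high threshold and uses it to estimate the probability of exceeding a higher level. It rests on several assumptions: the concentrations are simulated, the "regulatory" level of 250 is hypothetical, the modelling threshold is simply taken as the empirical 90th percentile, and the GPD is fitted by the method of moments for brevity rather than by the maximum likelihood approach set out in Coles (2001). It treats a single element and ignores spatial dependence and joint exceedances, which are the extensions the project is concerned with.

```python
import math
import random

def fit_gpd_moments(excesses):
    """Method-of-moments estimates (shape, scale) of a generalised Pareto distribution.
    Uses mean = scale / (1 - shape) and mean^2 / variance = 1 - 2 * shape."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((e - mean) ** 2 for e in excesses) / (n - 1)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (ratio + 1.0)
    return shape, scale

def exceedance_probability(level, threshold, rate, shape, scale):
    """Estimate P(X > level) for level >= threshold from the fitted GPD tail.
    `rate` is the proportion of observations above `threshold`."""
    z = (level - threshold) / scale
    if abs(shape) < 1e-9:          # shape of zero gives an exponential tail
        return rate * math.exp(-z)
    base = 1.0 + shape * z
    if base <= 0.0:                # beyond the finite endpoint implied by a negative shape
        return 0.0
    return rate * base ** (-1.0 / shape)

rng = random.Random(42)
# Simulated concentrations (hypothetical units) with an exponential tail of mean 50
conc = [rng.expovariate(1.0 / 50.0) for _ in range(20000)]

threshold = sorted(conc)[int(0.9 * len(conc))]   # modelling threshold, not a regulatory one
excesses = [c - threshold for c in conc if c > threshold]
rate = len(excesses) / len(conc)
shape, scale = fit_gpd_moments(excesses)

level = 250.0                                     # hypothetical regulatory level
p_tail = exceedance_probability(level, threshold, rate, shape, scale)
p_empirical = sum(c > level for c in conc) / len(conc)

print("fitted shape and scale:", shape, scale)
print("GPD estimate of P(X > level):", p_tail)
print("empirical proportion above level:", p_empirical)
```

The empirical proportion relies on the handful of samples that happen to exceed the level, whereas the GPD estimate borrows strength from all excesses over the lower modelling threshold. This is the sense in which an extremes model can give more stable estimates of rare exceedances than a direct count.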

In the second half of the year, the student will consider the wider implications of their work and its applicability to other parts of G-BASE and to soil monitoring in France.

Year 3.5

The student will complete the discussion of the implications of their work and present them to the collaborators in France. The thesis write-up will be completed.

Training & Skills

This is a multi-disciplinary project which will require the student to develop specialist skills and knowledge in statistics, soil science, data processing and machine learning and to collaborate with scientists across these disciplines and stakeholders.

Much of the statistical and soil science training will be provided through one-to-one supervision from experts in the field. BGS offers formal training in the use of R, in geostatistical modelling and the use of geographical information systems. PhD students registered in Glasgow attend four one-week intensive courses in statistical methods in their first year, and additional courses provided by the University of Glasgow (e.g. machine learning and high-performance computing) can be followed as required.

BGS and the University of Glasgow will both provide generic training courses (e.g. communication and listening, time management and presentation skills) which are required to become a successful independent researcher.

References & further reading

Coles, S.G., 2001. An Introduction to the Statistical Modeling of Extreme Values. Springer.

Fordyce, F. et al., 2017. Soil geochemical atlas of the Clyde Basin. Edinburgh, UK, British Geological Survey, 126pp. (OR/14/032)

Gelfand, A.E. et al., 2010. Handbook of Spatial Statistics. CRC Press.

Hengl, T. et al., 2018. Random forest as a generic framework for predictive modelling of spatial and spatio-temporal variables. PeerJ, 6, e5518.

McKinley, J.M. et al., 2016. The single component geochemical map: Fact or fiction? Journal of Geochemical Exploration, 162, 16-28.

Further Information

Contact Ben Marchant (01491 692483)
