Hibdige, S.; Larionov, A.; Harris, J.; Pawlett, M.
Soil metagenomics and chemical data from calcareous grassland restoration sites in England, 2021
https://doi.org/10.5285/8c8a836c-2117-4a36-bc1b-214b40e66feb
Cite this dataset as:
Hibdige, S.; Larionov, A.; Harris, J.; Pawlett, M. (2026). Soil metagenomics and chemical data from calcareous grassland restoration sites in England, 2021. NERC EDS Environmental Information Data Centre. https://doi.org/10.5285/8c8a836c-2117-4a36-bc1b-214b40e66feb
Download/Access
This dataset is available under the terms of the Open Government Licence
By accessing or using this dataset, you agree to the terms of the relevant licence agreement(s). You will ensure that this dataset is cited in any publication that describes research in which the data have been used.
This dataset comprises alpha diversity and functional metric estimates for soil bacterial and fungal communities, together with soil chemical properties (pH and electrical conductivity), from samples collected at 60 calcareous grassland restoration sites and a further six wildcard sites made up of ancient grasslands and rewilding sites, all in southern England. Samples were taken in 2021.
Soil samples were collected using a 10 cm soil corer, together with information on site management during the sampling year and the historic age of each restoration site. DNA was extracted from the soil and sequenced for metagenomic analysis.
Publication date: 2026-03-16
Format
Comma-separated values (CSV)
Spatial information
Study area
Spatial representation type
Triangular irregular network
Spatial reference system
OSGB 1936 / British National Grid
Spatial resolution
10 metres
Temporal information
Temporal extent
2021-01-01 to 2021-12-31
Provenance & quality
Fieldwork instrumentation used. Basic field instruments were used to mark plot positions (GPS), set out the plots (20m tape measures), and collect soil samples (soil corer).
Methods of collection and how data values were recorded. In each site, five circular plots (radius 10 m) were established with their centres randomly decided in advance using ArcGIS. Sampling of the restoration sites occurred in 2021. Some were subsequently resampled in 2023. Data processing and lab-based techniques continued into 2024. Soil: Five 10 cm soil cores were collected using a 'W' sampling strategy covering a 60 x 60 m area in the centre of restoration grassland fields from each plot, and stored at 4°C. Soil was sieved to 2 mm, and samples were split between storage at -20°C for DNA extraction and storage at 4°C. Site characteristics: Information on site management and history (including site age) was recorded based on management agreements for agri-environment schemes. Surrounding land use cover was recorded based on the UKCEH Landcover Map.
Nature and units of recorded variables and processing steps.
pH: Soil was air dried and the five samples per site were pooled together. Ultrapure water was added and the samples agitated before values were measured with a pH probe. The pH reading was considered stable when the value did not vary by more than 0.02 units over five seconds.
Electrical conductivity: Soil was air dried and the five samples per site were pooled together. Ultrapure water was added and the samples agitated, centrifuged and filtered. Electrical conductivity was then measured using a probe and recorded in µS.
Alpha diversity metrics: DNA was extracted from the soil using enzyme extraction and sequenced by a third-party company (Novogene) on a paired-end Illumina platform (NovaSeq 6000), targeting well-established amplicon regions. The data were processed using QIIME2 to generate feature tables per sample. Taxonomy was assigned to the amplicons using Greengenes 2 (bacteria) and UNITE dynamic (fungi). Shannon, Faith PD and Chao1 were calculated from the QIIME2 feature tables and the assigned taxonomy.
Functional metrics (bacteria): PICRUSt2 was used to predict Enzyme Commission (EC) numbers, KEGG orthologs (KO) and MetaCyc pathways, using default parameters and EPA-NG to place sequences into the required reference phylogeny. QIIME2 was then used to calculate Shannon entropy and Chao1 from the pathway abundance feature tables from PICRUSt2.
Functional metrics (fungi): FUNGuild was used to estimate guild associations per sample. QIIME2 was then used to calculate Shannon entropy and Chao1 from the guild tables.
Former land use: Categorical with 4 levels: Industrial (n=30), Agricultural (n=30), Ancient Woodland (n=4) and Rewilding (n=2).
Further information can be found within the supporting documentation.
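As an illustration of the two alpha diversity metrics named above, the following minimal sketch computes Shannon entropy and Chao1 from a single sample's feature counts. It is not the QIIME2 implementation used to produce this dataset: the function names, the choice of log base 2, and the fallback used when there are no doubletons are assumptions made for the example, and software implementations may apply a different (for example bias-corrected) variant of Chao1.

```python
import math
from typing import Sequence


def shannon_entropy(counts: Sequence[int], base: float = 2.0) -> float:
    """Shannon entropy H = -sum(p_i * log(p_i)) over features with non-zero counts.

    The log base (2 here) is an assumption for this sketch.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log(p, base)
    return entropy


def chao1(counts: Sequence[int]) -> float:
    """Chao1 richness estimate from observed richness, singletons (F1) and doubletons (F2)."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # features seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # features seen exactly twice
    if f2 > 0:
        # Classic form: S_obs + F1^2 / (2 * F2)
        return observed + f1 * f1 / (2 * f2)
    # With no doubletons the classic form is undefined, so fall back to
    # the bias-corrected form: S_obs + F1 * (F1 - 1) / 2
    return observed + f1 * (f1 - 1) / 2


# Hypothetical feature counts for one sample (not taken from the dataset)
example_counts = [5, 3, 1, 1, 2, 0]
print(shannon_entropy(example_counts), chao1(example_counts))
```

Both functions take one column of a feature table (counts per feature for one sample), which is the same input shape whether the features are taxa, predicted pathways or fungal guilds.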
Quality control/assessment applied to the data. Data were collected in the field using pre-prepared data sheets. Data sheets were checked visually before digital data entry, and any suspected errors were checked against the raw field data sheets. Extracted DNA was quantified on a NanoDrop spectrophotometer to measure concentration (ng/μl) and the 260/280 and 260/230 ratios before being sent for sequencing. The sequencing company (Novogene) provided their own quality control before and after sequencing, and any samples that failed were re-extracted and re-sequenced. The received sequenced reads were examined using FastQC and MultiQC and trimmed accordingly with Cutadapt. The reads were then quality filtered and processed using DADA2. Rarefaction plots were generated to examine feature counts to ensure feature tables were of expected quality.
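The rarefaction check mentioned above can be sketched as follows: a sample's reads are subsampled without replacement to a series of depths, and the number of observed features is recorded at each depth. This is a minimal illustration, not the QIIME2 routine used for this dataset; the function names, the fixed random seed and the example counts are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def rarefy(counts: Sequence[int], depth: int, seed: int = 0) -> List[int]:
    """Subsample a count vector to `depth` reads without replacement."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand to one entry per read, labelled by its feature index
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    rng = random.Random(seed)
    picked = rng.sample(reads, depth)
    rarefied = [0] * len(counts)
    for i in picked:
        rarefied[i] += 1
    return rarefied


def rarefaction_curve(
    counts: Sequence[int], depths: Sequence[int], seed: int = 0
) -> List[Tuple[int, int]]:
    """Return (depth, observed feature count) pairs for each requested depth."""
    curve = []
    for d in depths:
        observed = sum(1 for c in rarefy(counts, d, seed) if c > 0)
        curve.append((d, observed))
    return curve


# Hypothetical feature counts for one sample (not taken from the dataset)
example_counts = [50, 30, 15, 4, 1]
print(rarefaction_curve(example_counts, [10, 25, 50, 100]))
```

A curve that flattens as depth increases suggests that sequencing depth was sufficient to capture most of the features present in the sample.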
Limitations on the data's reliability. Amplicon sequencing is generally accurate only to genus level, and taxonomy assignment is limited by database curation. Functional analysis based on amplicon sequencing is less accurate than shotgun sequencing because functionality is inferred.
Licensing and constraints
This dataset is available under the terms of the Open Government Licence
Correspondence/contact details
Authors
Hibdige, S.
Cranfield University
Larionov, A.
Cranfield University
Other contacts
Publisher
NERC EDS Environmental Information Data Centre
info@eidc.ac.uk
Rights holder
Cranfield University
