Terry, J.C.D.; O'Sullivan, J.D.; Rossberg, A.G.
Code and synthetic data for analysis of relation between species sizes and local abundances trends
Cite this model code as:
Terry, J.C.D.; O'Sullivan, J.D.; Rossberg, A.G. (2024). Code and synthetic data for analysis of relation between species sizes and local abundances trends. NERC EDS Environmental Information Data Centre. https://doi.org/10.5285/1f7687de-d68e-4349-b4d3-b7d3a127a7df
Download/Access
PLEASE NOTE: By accessing or using this model code, you agree to the terms of the relevant licence agreement(s). You will ensure that this model code is cited in any publication that describes research in which the data have been used.
Publication of this model code by the EIDC does not signify any endorsement or approval. By accessing and using the resource, you acknowledge that it is entirely at your own risk and you are solely responsible for any loss or liability that may arise.
This model code is available under the terms of the Open Government Licence
https://doi.org/10.5285/1f7687de-d68e-4349-b4d3-b7d3a127a7df
This resource contains the R code and core results of a study that tested whether, within communities, larger- and smaller-bodied species show systematically different population trends at a global scale. The study used the global BioTIME database (community time series from hundreds of studies, mostly covering the last 20-50 years) and several large trait databases to obtain a very large sample (12,956 assemblage time series from 144 studies, incorporating 2,109,593 observations of 10,286 species, of which 7,234 could be linked to at least one size trait).
Data resources in this deposit include matched trait values for each species, several population trends of each species, and community level correlations between population trends and body-size trait values. Additionally, html files describing the R markdown code to produce the data resources are included. This resource does not contain the raw population and trait data, which are openly available from various sources that are listed in the supporting documentation.
Matching between the global databases required extensive initial cleaning and filtering. Although the data were subject to a number of checks, the dataset, with tens of thousands of species, was too large for all alignments to be checked manually, and trait matches assume a single body-size value for each species across its range. The original purpose was to generate a relative rank within a community, so caution is needed when using this approach for more fine-grained analyses.
Publication date: 2024-05-01
Formats
Comma-separated values (CSV), html, md
Provenance & quality
To generate the assemblage time series, we downloaded all studies available in the ‘open’ component of the BioTIME database of community time series from Zenodo. We identified studies as ‘multi-site’ or ‘single-site’ based on the number of coordinates in the BioTIME database. Single-site studies were considered as one combined assemblage, whilst widely dispersed ‘multi-site’ studies were partitioned into assemblages based on a global hexagonal grid of 96 km² cells using the dggridR package in R. We retained records from assemblages with abundance or biomass data of at least 10 distinct species and at least 5 years between the first and last record.
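The retention rule at the end of this step can be illustrated with a minimal sketch. The deposited scripts are written in R; the Python below is not the authors' code, and the record layout `(assemblage_id, species, year)` and the function name are hypothetical. It also assumes that "at least 5 years between the first and last record" means last year minus first year is at least 5.

```python
from collections import defaultdict


def retained_assemblages(records, min_species=10, min_year_span=5):
    """Return the ids of assemblages meeting both retention thresholds.

    records: iterable of (assemblage_id, species, year) tuples
             (a hypothetical layout, not the BioTIME schema).
    """
    species = defaultdict(set)
    years = defaultdict(set)
    for assemblage_id, sp, year in records:
        species[assemblage_id].add(sp)
        years[assemblage_id].add(year)

    keep = set()
    for assemblage_id in species:
        # Assumed reading of the rule: span = last year - first year.
        span = max(years[assemblage_id]) - min(years[assemblage_id])
        if len(species[assemblage_id]) >= min_species and span >= min_year_span:
            keep.add(assemblage_id)
    return keep
```

Assigning multi-site records to hexagonal grid cells is not shown here, since it depends on the dggridR package.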
We used four separate trait databases that include some measure of organism size, but we did not mix information between databases. For amniotes, an amniote life history database was downloaded. For plants, we used the TRY database. For fish, we downloaded a curated database of fish traits, which in turn is largely based on data from the FishBase database. It is focused on the North Atlantic and Pacific continental shelf, but this represents the majority of the relevant BioTIME studies. For marine species, we downloaded size data from the WoRMS database.
Data cleaning was performed in order to match up species names in the trait databases and BioTIME database.
We assessed each assemblage–trait combination where ≥40% and ≥5 of the species had data for that trait and >80% of year samples contained at least 5 species. We excluded transitory species within each assemblage by including only those species that were seen in over half of the year samples. Where this filtering left data from less than 1% of the cells in the original study, we removed the whole study. Where a study included both ‘abundance’ and ‘biomass’ data, we preferentially used the abundance data. Studies with only presence–absence data were not used.
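The three thresholds in this step can be sketched as small predicates. This is an illustration in Python rather than the deposited R code. It assumes that a "year sample" is any year with at least one record in the assemblage, and it does not settle the order in which the filters were applied, which the text leaves open. All function and argument names are hypothetical.

```python
def stable_species(presence):
    """Species recorded in over half of the sampled years.

    presence: dict mapping species -> set of years in which it was recorded.
    """
    sampled_years = set().union(*presence.values())
    return {sp for sp, yrs in presence.items()
            if len(yrs) > len(sampled_years) / 2}


def passes_trait_coverage(species, trait_values, min_fraction=0.4, min_count=5):
    """True if >=40% and >=5 of the species have a value for the trait."""
    with_trait = [sp for sp in species if trait_values.get(sp) is not None]
    return (len(with_trait) >= min_count
            and len(with_trait) >= min_fraction * len(species))


def passes_year_coverage(presence, min_species=5, min_fraction=0.8):
    """True if more than 80% of sampled years contain at least 5 species."""
    sampled_years = set().union(*presence.values())
    rich_years = sum(
        1 for year in sampled_years
        if sum(year in yrs for yrs in presence.values()) >= min_species
    )
    return rich_years > min_fraction * len(sampled_years)
```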
To calculate the relative change in abundance of each species, we fitted the square-root transformed and scaled species totals as a function of year for each assemblage using ordinary least-squares regression models and calculated the slope β for each species.
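A minimal sketch of the slope calculation for one species in one assemblage follows. It is written in Python for illustration and is not the deposited R code. The text does not specify how totals were scaled; the sketch assumes centring and division by the sample standard deviation (the default behaviour of R's `scale()`), and the function name is hypothetical.

```python
import math


def abundance_slope(years, totals):
    """OLS slope of square-root transformed, scaled totals against year.

    Requires at least two observations from at least two distinct years.
    """
    roots = [math.sqrt(t) for t in totals]
    n = len(roots)
    mean_root = sum(roots) / n
    # Assumed scaling: centre and divide by the sample standard deviation.
    sd = math.sqrt(sum((r - mean_root) ** 2 for r in roots) / (n - 1))
    if sd == 0:
        return 0.0  # a constant series has no trend
    scaled = [(r - mean_root) / sd for r in roots]

    mean_year = sum(years) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * s for y, s in zip(years, scaled))
    return sxy / sxx
```

Because the response is scaled before fitting, β under this assumption is expressed in standard deviations per year, which makes slopes comparable between common and rare species.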
The main response variable τ for each assemblage was then computed as Kendall’s rank correlation coefficient between size trait values and the set of βs. Species with missing trait values were excluded from the calculation of τ. Where there were multiple assemblages per study, study-level τ was taken as a simple arithmetic mean of all assemblage-level τ values. We also tested two alternative transformations of the population data.
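The calculation of τ can be sketched as follows, again in Python rather than the deposited R. The text does not state which variant of Kendall's τ was used; the sketch assumes tau-b, which adjusts for ties. The function names and the dictionary inputs are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b computed by direct pair comparison, O(n^2)."""
    concordant = discordant = ties_x = ties_y = n_pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        n_pairs += 1
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def assemblage_tau(traits, slopes):
    """tau between size trait and slope, skipping species with no trait value.

    traits: dict species -> trait value or None; slopes: dict species -> slope.
    """
    shared = [sp for sp in slopes if traits.get(sp) is not None]
    return kendall_tau_b([traits[sp] for sp in shared],
                         [slopes[sp] for sp in shared])


def study_tau(assemblage_taus):
    """Study-level tau as the simple arithmetic mean of assemblage values."""
    return sum(assemblage_taus) / len(assemblage_taus)
```

A positive τ indicates that larger species tend to have more positive abundance slopes within the assemblage, and a negative τ the reverse.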
To examine study-level determinants of τ within each size trait, for each study we calculated: (1) the mean total species richness of each assemblage over the time frame, (2) the mean assemblage-level trait data completeness, (3) the mean number of years from which there were data, (4) the mean span of years from which there were data, (5) the log10-transformed number of assemblages within the study (that is, the spatial extent), (6) the absolute latitude of the centre of the study and (7) the range of traits in the assemblage (log10(max) − log10(min)). We fitted a set of linear models to assess whether these factors could predict either τ or τ².
All analyses used the R language, and scripts are included in the KnittedScripts folder. More information is provided in the supporting information.
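Two of the listed covariates have explicit formulas, which can be written out as a short sketch. This is illustrative Python, not the deposited R code, and the function names are hypothetical. It assumes all trait values are strictly positive, as the logarithm requires.

```python
import math


def trait_range(trait_values):
    """Covariate (7): log10(max) - log10(min) over the available trait values."""
    known = [v for v in trait_values if v is not None]
    return math.log10(max(known)) - math.log10(min(known))


def log_assemblage_count(n_assemblages):
    """Covariate (5): log10-transformed number of assemblages in a study."""
    return math.log10(n_assemblages)
```

For example, an assemblage whose species range from 1 g to 1,000 g spans three orders of magnitude, so `trait_range([1, 10, 1000])` is 3.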
Licensing and constraints
This model code is available under the terms of the Open Government Licence
Supplemental information
Terry, J.C.D., O’Sullivan, J.D., & Rossberg, A.G. (2022). No pervasive relationship between species size and local abundance trends. Nat Ecol Evol, 6, 140–144.
Biotime - bodysize: archived files on Zenodo
Correspondence/contact details
Terry, J.C.D.
Queen Mary University of London
London
UNITED KINGDOM
christopher.terry@biology.ox.ac.uk
Authors
Rossberg, A.G.
Queen Mary University of London
Other contacts
Rights holder
Queen Mary University of London
Custodian
NERC EDS Environmental Information Data Centre
info@eidc.ac.uk
Publisher
NERC EDS Environmental Information Data Centre
info@eidc.ac.uk
Additional metadata
Funding
Natural Environment Research Council Award: NE/T003510/1
Last updated
14 November 2024 13:45