"Dissimilarity measures affected by richness differences yield biased delimitations of biogeographic realms"

Recently, Costello et al.1 (hereinafter COS) established 30 marine biogeographic realms, complementing similar work on terrestrial biotas2. In our opinion, however, their methods had two major limitations. First, the results could not be reproduced from the reported methods. Second, they defined regions using Jaccard dissimilarity (βjac), but this index is not appropriate for delimiting biogeographic regions3 because it is affected by differences in species richness4. As a result, sites with impoverished biotas appear dissimilar from richer sites and can be identified as a distinct biogeographic region, even if that region has no unique species. This bias is particularly problematic when sampling effort is uneven, which COS acknowledge to be the case in their dataset1. Given these limitations, we argue that the marine biogeographic realms published by COS1 should be reconsidered in light of the recommendations we provide here.
When defining biogeographic realms, the choice of the dissimilarity measure between cells is fundamental. Indices that account only for the replacement component of assemblage dissimilarity4–6, and are thus independent of richness differences7, such as Simpson's dissimilarity index8 (βsim), must be selected3. COS1 also analyse their data using βsim and argue that their results are robust to these alternative measures. However, we observe important discrepancies between their main result, which shows marine realms based on βjac (see Fig. 2 in ref.1), and their map of realms based on βsim (see Fig. 3c in ref.1). For instance, in the former the Atlantic Ocean is divided into two regions (northern and southern) and there is a separate region in the Indian Ocean, whereas in the latter all these regions seem to be lumped into one.
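To make the richness effect concrete, here is a minimal Python sketch using the standard presence-absence formulas. For two cells, let a be the number of shared species and b and c the numbers of species unique to each. Then βjac = (b + c)/(a + b + c) and βsim = min(b, c)/(a + min(b, c)). The function name and the toy species lists are hypothetical illustrations. They are not taken from COS or from Supplementary Software 1.

```python
def beta_jac_sim(site_x: set[str], site_y: set[str]) -> tuple[float, float]:
    """Return (beta_jac, beta_sim) for two cells given as sets of species names."""
    if not site_x or not site_y:
        raise ValueError("both sites need at least one species")
    a = len(site_x & site_y)   # species shared by both cells
    b = len(site_x - site_y)   # species only in site_x
    c = len(site_y - site_x)   # species only in site_y
    beta_jac = (b + c) / (a + b + c)          # mixes turnover and richness difference
    beta_sim = min(b, c) / (a + min(b, c))    # turnover only
    return beta_jac, beta_sim


rich = {f"sp{i}" for i in range(1, 11)}   # 10 species
poor = {"sp1", "sp2"}                      # nested subset of `rich`: no unique species
print(beta_jac_sim(rich, poor))            # (0.8, 0.0)
```

In this example, a species-poor cell nested within a richer one gets βjac = 0.8 but βsim = 0. Only βjac would therefore tend to separate it as a distinct region. When both cells hold unique species, both indices are positive. For example, two five-species cells sharing two species give βjac = 0.75 and βsim = 0.6.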
We used the dataset provided by COS1 in their Supplementary Material (species presence-absence in 5° × 5° cells) to test whether we could define similar marine biogeographic regions by using βsim between cells and well-established procedures for delineating biogeographic regions2,3. We also used βjac with the aim of reproducing the authors' results. All analyses were conducted in R9 using the scripts provided in Supplementary Software 1. Given the large differences in sampling effort across cells, we removed cells with fewer than 5 species, following the procedure of Costello et al.1. This step is not stated explicitly in their text, but it can be deduced from the cells missing from their maps (see Fig. 3c–d in ref.1). Nonetheless, alternative analyses based on the complete presence-absence table (Supplementary Fig. 1) yield regionalisations that are roughly similar to our main result. From the presence-absence table we obtained a matrix of dissimilarities between cells using the function beta.pair() in the package betapart10. We then performed a hierarchical cluster analysis on this dissimilarity matrix using the function hclust() in R9. Unlike the choice of dissimilarity measure, the choice of clustering algorithm is not straightforward, and there are two criteria that could be maximised2: (i) the internal coherence of clusters (minimising dissimilarities within clusters and maximising dissimilarities between them), and (ii) the correlation between the original dissimilarities and the cophenetic distances in the dendrogram. The Ward clustering algorithm is intended to maximise the first criterion, and according to previous contributions2, average clustering performs well for the second. We therefore implemented both and assessed their performance using ANOSIM tests11 (function anosim() in the package vegan) for the first criterion, and using the correlation (Spearman ρ) between βsim dissimilarities and cophenetic distances for the second.
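The steps just described can be sketched in Python with NumPy and SciPy: filtering species-poor cells, computing a βsim matrix, and comparing average and Ward trees by the Spearman correlation between dissimilarities and cophenetic distances. This is a hedged analogue, not the R scripts of Supplementary Software 1. The random presence-absence matrix, the seed, and the helper names are hypothetical. SciPy documents Ward linkage as correctly defined only for Euclidean distances, and βsim is not Euclidean, so treat the Ward branch as an approximation. This section also does not say which Ward variant (for example hclust's ward.D or ward.D2) was used.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

MIN_RICHNESS = 5  # cells with fewer species are dropped, as described in the text


def simpson_matrix(pa: np.ndarray) -> np.ndarray:
    """Pairwise beta_sim between the rows (cells) of a cells x species 0/1 matrix."""
    pa = pa.astype(int)
    shared = pa @ pa.T                      # a: species shared by each pair of cells
    richness = pa.sum(axis=1)               # species richness S of each cell
    # min(b, c) = min(S_i, S_j) - a, so beta_sim = 1 - a / min(S_i, S_j)
    return 1.0 - shared / np.minimum.outer(richness, richness)


def cophenetic_spearman(dist: np.ndarray, method: str) -> float:
    """Spearman rho between input dissimilarities and the tree's cophenetic distances."""
    condensed = squareform(dist, checks=False)   # square matrix -> condensed vector
    tree = linkage(condensed, method=method)
    coph = cophenet(tree)                        # condensed cophenetic distances
    rho, _ = spearmanr(condensed, coph)
    return float(rho)


# Hypothetical toy data standing in for the 5 x 5 degree cell table.
rng = np.random.default_rng(42)
pa = rng.random((50, 80)) < 0.15
pa = pa[pa.sum(axis=1) >= MIN_RICHNESS]          # drop species-poor cells

beta_sim = simpson_matrix(pa)
for method in ("average", "ward"):
    print(method, round(cophenetic_spearman(beta_sim, method), 3))
```

The vectorised form uses the identity βsim = 1 − a/min(S_i, S_j), which follows from min(b, c) = min(S_i, S_j) − a. This avoids looping over cell pairs.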
Ward clustering consistently performed better on the first criterion, yielding higher internal coherence of clusters than average clustering for any number of clusters greater than 6. In turn, average clustering performed better than Ward clustering on the second criterion (Spearman ρ = 0.43 vs. ρ = 0.30, respectively). The average clustering method, as used by COS1, yielded unbalanced dendrograms. As a result, most newly defined clusters consisted of only one cell or very few cells (see Supplementary Figs. 2–3). COS1 started with more than 200 clusters and then manually lumped them into 30 realms. This approach implies that different realms are defined at varying levels of dissimilarity, and it introduces subjective decisions into the biogeographic classification. In contrast, Ward clustering yielded a more balanced regionalisation. In our view, …
DOI: 10.1038/s41467-018-06291-1
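The comparison of balance and coherence can be sketched by cutting each tree into k clusters and recording the cluster sizes together with the ANOSIM R statistic. R = (r_B − r_W)/(M/2), where r_B and r_W are the mean ranks of between-cluster and within-cluster dissimilarities and M = n(n − 1)/2 is the number of cell pairs. This is a minimal Python sketch, assuming SciPy. Here fcluster(..., criterion="maxclust") stands in for cutting the R dendrogram. R is computed directly, without the permutation test that vegan's anosim() also performs. The toy data and helper names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def anosim_r(condensed: np.ndarray, labels: np.ndarray) -> float:
    """ANOSIM R statistic for a condensed dissimilarity vector and cluster labels."""
    n = len(labels)
    i, j = np.triu_indices(n, k=1)            # same pair order as squareform's output
    within = labels[i] == labels[j]
    if within.all() or not within.any():
        return float("nan")                   # undefined for one cluster or all singletons
    ranks = rankdata(condensed)               # tied dissimilarities get average ranks
    m = condensed.size                        # number of cell pairs, n(n - 1) / 2
    return float((ranks[~within].mean() - ranks[within].mean()) / (m / 2))


def compare_partitions(dist: np.ndarray, ks=range(2, 11)) -> None:
    """Print ANOSIM R and cluster sizes for average and Ward trees cut at each k."""
    condensed = squareform(dist, checks=False)
    trees = {m: linkage(condensed, method=m) for m in ("average", "ward")}
    for k in ks:
        for method, tree in trees.items():
            labels = fcluster(tree, t=k, criterion="maxclust")   # labels start at 1
            sizes = sorted(np.bincount(labels)[1:].tolist(), reverse=True)
            r = anosim_r(condensed, labels)
            print(f"k={k:2d} {method:<7} R={r:.3f} sizes={sizes}")


# Hypothetical toy data; beta_sim = 1 - a / min(S_i, S_j) (Simpson dissimilarity).
rng = np.random.default_rng(7)
pa = (rng.random((40, 60)) < 0.2).astype(int)
pa = pa[pa.sum(axis=1) >= 5]
shared = pa @ pa.T
richness = pa.sum(axis=1)
beta_sim = 1.0 - shared / np.minimum.outer(richness, richness)
compare_partitions(beta_sim)
```

An unbalanced partition shows up as a size list dominated by one large cluster plus many singletons. A higher R at a given k indicates that within-cluster dissimilarities rank lower relative to between-cluster ones, that is, greater internal coherence.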

Keywords: average clustering; COS; criterion; biogeographic realms; ward clustering; dissimilarity

Journal Title: Nature Communications
Year Published: 2018
