Research Article | OCEANOGRAPHY

Elucidating ecological complexity: Unsupervised learning determines global marine eco-provinces

Science Advances  29 May 2020:
Vol. 6, no. 22, eaay4740
DOI: 10.1126/sciadv.aay4740
  • Fig. 1 The SAGE method workflow.

    (A) Sketch of the workflow to determine the eco-provinces. The raw 55-dimensional data are reduced, by summation within functional groups, to 11-dimensional model output comprising the biomass of seven functional/trophic groups of plankton and four nutrient supply rates. Negligible values and areas of persistent ice cover are discarded. The data are normalized and standardized. The 11-dimensional data are given to the t-SNE algorithm to highlight statistically similar feature combinations. DBSCAN then selects the clusters, with its parameter values set carefully. The data are finally projected back onto a latitude/longitude projection. Note that this process is repeated 10 times, because t-SNE introduces a slight stochastic element. (B) Illustration of how the AEPs are arrived at by repeating the workflow in (A) 10 times. For each of the 10 realizations, the interprovince Bray-Curtis (BC) dissimilarity matrix is determined based on the biomass of the 51 phytoplankton types. The BC dissimilarity within the aggregated provinces is determined going from a complexity of 1 AEP to the full complexity of 115. The BC benchmark is set by the Longhurst provinces.
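
The first steps of the workflow in (A), summation within functional groups followed by standardization, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the toy data, the index lists in `groups`, and the function names `sum_within_groups` and `standardize` are hypothetical, and the later steps (discarding negligible values, t-SNE, DBSCAN, and the projection back to latitude/longitude) are omitted.

```python
from statistics import mean, pstdev


def sum_within_groups(row, groups):
    """Collapse one grid point's raw fields into one value per group."""
    return [sum(row[i] for i in idx) for idx in groups]


def standardize(rows):
    """Rescale every column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has zero spread; divide by 1.0 instead of 0.0.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


# Toy stand-in for the model output: 3 grid points with 5 raw fields each.
# The paper reduces 55 fields to 11; the grouping below is hypothetical.
raw = [
    [1.0, 2.0, 0.5, 0.5, 4.0],
    [2.0, 2.0, 1.0, 1.0, 8.0],
    [3.0, 5.0, 1.5, 0.5, 6.0],
]
groups = [[0, 1], [2, 3], [4]]

reduced = [sum_within_groups(r, groups) for r in raw]
features = standardize(reduced)
```

Here `reduced` holds one summed value per group for each grid point, and `features` is the standardized table that would be handed to a dimensionality-reduction step.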

  • Fig. 2 Eco-provinces in geographical and t-SNE space.

    (A) Modeled nutrient supply rates, phytoplankton, and zooplankton functional group biomass as rendered by the t-SNE algorithm and colored by province using DBSCAN. Each point represents one point in the high-dimensional space, with the majority of points captured, as demonstrated in Fig. 6B. Axes refer to “t-SNE” dimensions 1, 2, and 3. (B) Geographical projection of the provinces discovered by DBSCAN onto the original latitude-longitude grid. Colors should be considered arbitrary but correspond to (A).

  • Fig. 3 The eco-province BC dissimilarity.

    (A) BC dissimilarity metric evaluated for every province compared to every other province for the global surface 20-year mean of the 51 plankton biomasses. Note the expected symmetry of the values. (B) Spatial projection of one column (or row) of (A): the global distribution of the BC dissimilarity metric evaluated for one province in the oligotrophic gyre compared to every other province for the global surface 20-year mean. Black (BC = 0) denotes an identical region, while white (BC = 1) denotes no similarity.
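
For readers unfamiliar with the metric, the sketch below computes a pairwise BC dissimilarity matrix using the standard definition, the sum of absolute differences divided by the sum of all biomasses in both communities. The three toy "provinces" and the function names are hypothetical, and the handling of two empty communities is an assumption made only to avoid division by zero.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative biomass vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # assumption: two empty communities count as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def bc_matrix(provinces):
    """Dissimilarity of every province against every other province."""
    return [[bray_curtis(p, q) for q in provinces] for p in provinces]


# Toy mean biomasses for three provinces and three plankton types.
provinces = [
    [4.0, 1.0, 0.0],
    [2.0, 1.0, 2.0],
    [0.0, 0.0, 5.0],
]
matrix = bc_matrix(provinces)
```

The matrix is symmetric with zeros on the diagonal, matching the symmetry noted in (A), and a single row of it is the kind of vector that (B) projects onto the map.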

  • Fig. 4 Heuristic processes to determine a minimum level of biogeochemical complexity.

    (A, B, and D) The intraprovince BC dissimilarity is assessed as the mean BC dissimilarity of the individual grid point communities compared to the mean province with no reduction in complexity. (B) The global mean intraprovince BC dissimilarity is 0.227 ± 0.117. This is the benchmark for the ecologically motivated sorting presented in this work [green line in (C)]. (C) Averaged intraprovince BC dissimilarity: The black line illustrates the intraprovince BC dissimilarity with increasing complexity. The 2σ is from 10 repeats of the eco-province recognition process. For the full complexity in the provinces discovered by DBSCAN, (A) illustrates that an intraprovince BC dissimilarity of 0.099 is reached, while sorting into a complexity of 12 as suggested by (C) results in an intraprovince BC dissimilarity of 0.200, as demonstrated in (D).
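
The intraprovince measure described here, the mean BC dissimilarity of each grid point community to its province mean, could be computed roughly as follows. This is a sketch on toy numbers with hypothetical function names, assuming the standard BC definition; it does not reproduce the values quoted in the caption.

```python
from statistics import mean


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative biomass vectors."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0


def intraprovince_bc(communities):
    """Mean BC dissimilarity of each grid point to the province mean."""
    province_mean = [mean(col) for col in zip(*communities)]
    return mean(bray_curtis(c, province_mean) for c in communities)


# Toy province: three grid points, two plankton types.
province = [[2.0, 2.0], [4.0, 0.0], [0.0, 4.0]]
value = intraprovince_bc(province)
```

A lower value means the grid points inside a province resemble one another more closely, so merging provinces into fewer AEPs is expected to raise it toward the benchmark.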

  • Fig. 5 AEP interpretation for complexity 12.

    Sorting the provinces into the 12 AEPs A to L. (A) Biomass (mg C/m³) of the ecological ensemble in the 12 provinces. (B) Nutrient flux rates (mmol/m³ per year) for dissolved inorganic nitrogen (N), iron (Fe), phosphate (P), and silicic acid (Si). Fe and P are multiplied by 16 and 16 × 10³, respectively, so that the bars are normalized to the phytoplankton stoichiometric requirements. (C) Note the distinction between polar, subtropical gyres and dominantly seasonal/upwelling regions. Monitoring stations are marked as follows: 1, SEATS; 2, ALOHA; 3, station P; and 4, BATS.

  • Fig. 6 Setting the DBSCAN parameters.

    Setting the parameters for t-SNE, the resultant number of found clusters is used as a measure of the connectedness (A), alongside the percentage of the data assigned to a cluster (B). The red dot illustrates the optimal combination of coverage and connectedness. The minimum number was set on the basis of the minimum number relevant for ecology.
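
The two diagnostics in this figure, the number of clusters found and the percentage of data assigned to a cluster, can be illustrated with a small parameter sweep. The sketch below uses a minimal pure-Python DBSCAN written for this example; the swept parameter `eps` (neighborhood radius), the `min_samples` threshold, the toy points, and all function names are assumptions, since the caption does not state which parameters were varied or which implementation was used.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: one label per point, -1 meaning noise."""
    def neighbors(i):
        # The point itself counts toward min_samples.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


def sweep(points, eps_values, min_samples):
    """For each eps, report (eps, cluster count, percent of points clustered)."""
    results = []
    for eps in eps_values:
        labels = dbscan(points, eps, min_samples)
        n_clusters = len({lab for lab in labels if lab != -1})
        coverage = 100.0 * sum(lab != -1 for lab in labels) / len(labels)
        results.append((eps, n_clusters, coverage))
    return results


# Toy embedding: two tight groups of three points and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
results = sweep(points, [0.5, 1.5, 100.0], min_samples=3)
```

On this toy data a very small radius leaves every point as noise, an intermediate radius recovers both groups while excluding the outlier, and a very large radius merges everything into a single cluster with full coverage. That trade-off between coverage and cluster separation is what the red dot in the figure balances.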

Supplementary Materials

  • Supplementary Materials

    Elucidating ecological complexity: Unsupervised learning determines global marine eco-provinces

    Maike Sonnewald, Stephanie Dutkiewicz, Christopher Hill, Gael Forget

    This PDF file includes:

    • Notes S1 and S2
    • Figs. S1 to S4
    • References
