Next-generation sequencing, bioinformatics and machine learning as tools to diagnose the quality of soils


The impact of farming practices and pesticides on soil quality and health is a growing concern for consumers, farmers and land managers. Bioindicators such as protists have great potential for evaluating this impact, but their use is limited because current methods do not allow soil samples to be analyzed both rapidly and in detail. Identifying species from their DNA sequences with next-generation sequencing is a promising way to overcome these limitations, but the enormous number of sequences and their great complexity make them difficult to process by conventional means. It is therefore essential to develop methods that combine bioinformatics and machine learning to (i) quantify, analyze and process protist sequences; (ii) identify and select bioindicators (a subset of protists) associated with different stressors; and (iii) model their relative abundance under the different conditions, leading to the construction of diagnostic models. In this project, we aim to develop a biomonitoring approach for vineyard soils based on the quantification of protists by metabarcoding and on the predictive power of machine learning.

Our approach

This development will build on two data sets: a laboratory experiment in which the impact of a pesticide mixture, temperature and soil moisture content on protist communities will be tested in microcosms, and a field study in which the impact of environmental factors on protist communities will be evaluated in a network of 33 vineyards in Valais.
One of the main challenges of this study is to account for the enormous abundance of protists and their great complexity, which make them difficult to process by conventional means. It is therefore essential to develop methods for (i) quantifying, analyzing and processing protist sequences; (ii) identifying and selecting bioindicators, that is, a subset of protist OTUs (operational taxonomic units), that reflect the quality of vineyard soils; and (iii) modeling their relative abundance as a function of the different conditions. To do this, both data sets will be processed and analyzed in a similar way with machine learning methods.
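
As a rough illustration of steps (i) and (ii), the sketch below converts raw OTU read counts into relative abundances and ranks OTUs by how strongly their mean relative abundance differs between two conditions. It is not the project's method: the function names, the OTU identifiers, the condition labels and the read counts are all hypothetical, and the mean-difference score is only a simple stand-in for the machine learning based selection described above.

```python
from statistics import mean


def relative_abundance(counts):
    """Convert raw OTU read counts for one sample into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {otu: n / total for otu, n in counts.items()}


def rank_candidate_indicators(samples, labels, group_a, group_b):
    """Rank OTUs by the absolute difference in mean relative abundance
    between two conditions (a simple illustrative score, largest first)."""
    profiles = [relative_abundance(s) for s in samples]
    otus = sorted({otu for p in profiles for otu in p})
    scores = {}
    for otu in otus:
        # An OTU missing from a sample counts as zero abundance there.
        a = [p.get(otu, 0.0) for p, lab in zip(profiles, labels) if lab == group_a]
        b = [p.get(otu, 0.0) for p, lab in zip(profiles, labels) if lab == group_b]
        scores[otu] = abs(mean(a) - mean(b))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: read counts per OTU in four soil samples.
samples = [
    {"OTU_1": 80, "OTU_2": 15, "OTU_3": 5},
    {"OTU_1": 70, "OTU_2": 20, "OTU_3": 10},
    {"OTU_1": 20, "OTU_2": 18, "OTU_3": 62},
    {"OTU_1": 30, "OTU_2": 20, "OTU_3": 50},
]
labels = ["control", "control", "treated", "treated"]

ranking = rank_candidate_indicators(samples, labels, "control", "treated")
for otu, score in ranking:
    print(f"{otu}: {score:.3f}")
```

In this toy data, OTU_1 and OTU_3 shift strongly between the two conditions while OTU_2 barely changes, so only the first two would be retained as candidate bioindicators. Step (iii), modeling relative abundance as a function of the conditions, would require a dedicated machine learning library and real data, and is deliberately left out of the sketch.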

The specific aims of this project

A. Acquisition of laboratory and field experimental data

  1. Annotated laboratory data
  2. Annotated field data

B. Highly predictive machine learning models

  1. List of biomarkers
  2. Prediction model(s)
  3. Explanatory model

C. Environmental diagnostic tool

  1. Prototype of the prediction tool
  2. Publications


Dr Thierry Heger, Changins, Nyon, Switzerland.