Dr Miguel A. Barreto-Sanz presented at Bacteriophages in Food, Medicine and Biotechnology, St Hilda’s College, Oxford, UK.

In-silico prediction of phage-bacteria interactions at a strain-level through omics data analysis and machine learning

Miguel A Barreto-Sanz¹,², Diogo M Carvalho Leite¹,², Xavier Brochet¹,², Grégory Resch³, Yok-Ai Que⁴, and Carlos Peña-Reyes¹,²
¹ School of Business and Engineering Vaud (HEIG-VD), University of Applied Sciences of Western Switzerland (HES-SO), Yverdon-les-Bains, Switzerland
² Computational Intelligence for Computational Biology (CI4CB), SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
³ Department of Fundamental Microbiology, University of Lausanne, 1015 Lausanne, CH.
⁴ Department of Intensive Care Medicine, Bern University Hospital (Inselspital), Freiburgstrasse, 3010 Bern, CH.

Abstract
The emergence and rapid dissemination of antibiotic resistance threatens medical progress and calls for innovative approaches to the management of multidrug-resistant infections. Phage therapy, i.e., the use of viruses that specifically infect and kill bacteria during their life cycle, is a re-emerging and promising alternative for addressing this problem. The success of phage therapy relies mainly on an exact match between the target pathogenic bacterium and the therapeutic phage. Therefore, access to a fully characterized phage library is necessary, although not sufficient, to start phage therapy. An essential second step in designing personalized phage therapy treatments is the capacity to predict the interactions between the target pathogen and its potential phages. Several papers propose models to predict phage-bacteria interactions, but only at the species level. In clinical applications, species-level prediction is not enough to target a given pathogenic bacterial strain. In this paper we present an original methodology, based on an ensemble-learning approach, to predict in silico whether a given phage-bacterium pair would interact at the strain level. Using information from GenBank, phagesdb.org and the Department of Fundamental Microbiology of the University of Lausanne, we assembled a dataset containing the genome sequences of 2’028 bacteria and 3’810 phages, together with 2’728 interactions. A set of informative features extracted from these genomes forms the basis of the quantitative datasets used to train our predictive models. These features include the distribution of predicted protein-protein interaction scores, as well as the amino acid frequency, the chemical composition, and the molecular weight of such proteins. Using an independent test dataset to evaluate the performance of our methodology, we obtained encouraging results: 75% AUC, 78% sensitivity and 72% specificity.
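
For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how sensitivity, specificity and AUC can be computed for a binary interacts / does-not-interact classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions made for illustration, and the numbers it produces are unrelated to the results reported in the abstract.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = interaction)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    # Sensitivity: fraction of true interactions that were predicted.
    # Specificity: fraction of non-interactions that were correctly rejected.
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive pair outscores a random
    negative pair (ties count as half), i.e. the Mann-Whitney formulation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 4 interacting and 4 non-interacting pairs.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
    # Assumed decision threshold of 0.5 to turn scores into class predictions.
    preds = [1 if s >= 0.5 else 0 for s in scores]
    sens, spec = sensitivity_specificity(labels, preds)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
          f"AUC={roc_auc(labels, scores):.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the three figures are usually reported together.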