Diogo Leite presented a poster at the Basel Computational Biology Conference
Computational models able to predict phage-bacteria relationships through genomics data analysis
Diogo Manuel Carvalho Leite 1,2, Xavier Brochet 1,2, Grégory Resch 3, Yok-Ai Que 4, and Carlos Peña-Reyes 1,2
1 School of Business and Engineering Vaud (HEIG-VD), University of Applied Sciences of Western Switzerland (HES-SO), Yverdon-Les-Bains, Switzerland {diogo.leite,xavier.brochet,carlos.pena}@heig-vd.ch
2 SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
3 Department of Fundamental Microbiology, University of Lausanne, Lausanne, Switzerland
4 Department of Intensive Care Medicine, Bern University Hospital (Inselspital), Bern, Switzerland
The emergence and rapid worldwide dissemination of antibiotic resistance hinders medical progress and threatens a return to the pre-antibiotic era. Phage therapy, an alternative to antibiotics, uses viruses (phages) that specifically infect and kill bacteria during their life cycle to reduce or eliminate the bacterial load. However, because phages are highly strain-specific, the challenge is to find suitable matches for a given bacterium within a fully characterized phage library. Currently, scientists select phages by means of infection tests that may take several days of lab work. We address this challenge by combining genomic feature extraction with machine-learning predictive modelling.
We created a dataset containing more than 1000 known phage-bacteria interactions, together with the corresponding genomes, based on public data from the GenBank and phagesdb.org databases. From these genomes we extracted features, including the distribution of protein-protein interaction scores and the amino acid frequency and chemical composition of the proteins, to build a quantitative dataset for training our predictive machine-learning models.
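To illustrate one of the feature types named above, amino acid frequency, here is a minimal Python sketch that turns a set of protein sequences into a fixed-length numeric vector. The function names and the choice to average frequencies across proteins are assumptions made for this example; the abstract does not say how the features were aggregated.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(protein: str) -> dict[str, float]:
    """Relative frequency of each standard amino acid in one protein sequence."""
    sequence = protein.upper()
    # Ignore ambiguous or non-standard residue codes such as X or *.
    counts = Counter(residue for residue in sequence if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def genome_feature_vector(proteins: list[str]) -> list[float]:
    """Average the per-protein frequencies into one 20-value vector.

    Averaging is an illustrative assumption, not the poster's stated method.
    """
    if not proteins:
        return [0.0] * len(AMINO_ACIDS)
    per_protein = [amino_acid_frequencies(p) for p in proteins]
    return [
        sum(freqs[aa] for freqs in per_protein) / len(per_protein)
        for aa in AMINO_ACIDS
    ]
```

A phage-bacterium pair could then be represented, for instance, by concatenating the vector of the phage with that of the bacterium before passing it to a classifier.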
Our approach attains, on average, performance values of around 90% in terms of F-measure, accuracy, specificity, and sensitivity. In addition, predictions are obtained in much less time than the corresponding in vitro experiments. These promising results encourage us to investigate further features to extract as well as additional predictive models (e.g., a weighted ensemble-learning voting system). We will also enlarge our phage-bacteria interaction database so as to increase the predictive value and enable prediction at the bacterial strain level.
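For readers unfamiliar with the four reported measures, the sketch below shows how they are conventionally derived from the confusion matrix of a binary classifier, with label 1 meaning "the phage infects the bacterium" and 0 meaning "no interaction". The function name and label encoding are assumptions for illustration; the example data are invented and are not results from the poster.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and F-measure for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-measure (F1): harmonic mean of precision and sensitivity.
    denominator = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denominator if denominator else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Invented toy labels, used only to exercise the function.
example = classification_metrics(
    [1, 1, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
)
```

Reporting specificity alongside sensitivity matters here because a model that predicted "interaction" for every pair would reach perfect sensitivity while being useless for selecting phages.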