IWBBIO: International Work-Conference on Bioinformatics and Biomedical Engineering

Two new publications (Springer) for the IWBBIO conference

Computational Prediction of Host-Pathogen Interactions Through Omics Data Analysis and Machine Learning

Diogo Manuel Carvalho Leite 1,2, Xavier Brochet 1,2, Grégory Resch 3, Yok-Ai Que 4, Aitana Neves 1,2, and Carlos Peña-Reyes 1,2
1 School of Business and Engineering Vaud (HEIG-VD), University of Applied Sciences of Western Switzerland (HES-SO), Yverdon-Les-Bains, Switzerland {diogo.leite,xavier.brochet,carlos.pena}@heig-vd.ch
2 SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
3 Department of Fundamental Microbiology, University of Lausanne, Lausanne, Switzerland
4 Department of Intensive Care Medicine, Bern University Hospital (Inselspital), Bern, Switzerland

Abstract
The emergence and rapid worldwide dissemination of antibiotic resistance threatens medical progress and calls for innovative approaches to the management of multidrug-resistant infections. Phage therapy, i.e., the use of viruses (phages) that specifically infect and kill bacteria during their life cycle, is a re-emerging and promising alternative for addressing this problem. The success of phage therapy relies mainly on an exact match between the target pathogenic bacterium and the therapeutic phage. Currently, only a few tools or methodologies efficiently predict phage-bacterium interactions suitable for phage therapy, and phage-bacterium pairs are therefore tested empirically in the laboratory. In this paper we present an original methodology, based on an ensemble-learning approach, to predict whether a given phage-bacterium pair would interact. Using publicly available information from GenBank and phagesdb.org, we assembled a dataset containing more than two thousand phage-bacterium interactions with their corresponding genomes. A set of informative features extracted from these genomes forms the basis of the quantitative datasets used to train our predictive models. These features include the distribution of predicted protein-protein interaction scores, as well as the amino acid frequency, chemical composition, and molecular weight of such proteins. Evaluated on an independent test dataset, our approach achieves encouraging performance, with accuracy, specificity, and sensitivity all above 90%.

DOI: 10.1007/978-3-319-56154-7_33
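
For readers unfamiliar with the three metrics reported in the abstract above, the following minimal Python sketch shows how accuracy, sensitivity, and specificity are conventionally derived from binary predictions. It is not the authors' evaluation code: the function name `binary_metrics` and the example labels are hypothetical, chosen only for illustration (1 = the phage-bacterium pair interacts, 0 = it does not).

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for 0/1 labels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")

    return {
        # Fraction of all pairs classified correctly.
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Fraction of truly interacting pairs that were detected.
        "sensitivity": tp / (tp + fn),
        # Fraction of non-interacting pairs correctly rejected.
        "specificity": tn / (tn + fp),
    }


# Made-up labels for eight phage-bacterium pairs (assumption, not real data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when interacting and non-interacting pairs are unevenly represented, since accuracy alone can look high for a model that favors the majority class.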


A Meta-Review of Feature Selection Techniques in the Context of Microarray Data

Zahra Mungloo-Dilmohamud 1, Yasmina Jaufeerally-Fakim 1, and Carlos Peña-Reyes 2
1 University of Mauritius, Reduit, Mauritius
2 University of Applied Sciences Western Switzerland (HES-SO), School of Business and Engineering Vaud (HEIG-VD), Swiss Institute of Bioinformatics (SIB), CI4CB, Computational Intelligence for Computational Biology Group, Yverdon, Switzerland

Abstract
Microarray technologies produce very large amounts of data that need to be classified for interpretation. Large data volumes coupled with small sample sizes make it challenging for researchers to extract useful information, so considerable effort goes into the design and testing of feature selection tools, and the literature abounds with descriptions of numerous methods. In this paper we select five representative review papers in the field of feature selection for microarray data in order to understand their underlying classification of methods. On this basis, we then propose an extended taxonomy for categorizing feature selection techniques and use it to classify the main methods presented in the selected reviews.