Skip to main content Skip to main navigation


LGCA-VHPPI: A local-global residue context aware viral-host protein-protein interaction predictor

Muhammad Nabeel Asim; Muhammad Ali Ibrahim; Muhammad Imran Malik; Andreas Dengel; Sheraz Ahmed
In: PLoS ONE (PLOS), Vol. 7, Pages 1-26, PLOS, 2022.


Viral-host protein protein interaction (PPI) analysis is essential to decode the molecular mechanism of viral pathogen and host immunity processes which eventually help to control viral diseases and optimize therapeutics. The state-of-the-art viral-host PPI predictor leverages unsupervised embedding learning technique (doc2vec) to generate statistical representations of viral-host protein sequences and a Random Forest classifier for interaction prediction. However, doc2vec approach generates the statistical representations of viralhost protein sequences by merely modelling the local context of residues which only partially captures residue semantics. The paper in hand proposes a novel technique for generating better statistical representations of viral and host protein sequences based on the infusion of comprehensive local and global contextual information of the residues. While local residue context aware encoding captures semantic relatedness and short range dependencies of residues. Global residue context aware encoding captures comprehensive long-range residues dependencies, positional invariance of residues, and unique residue combination distribution important for interaction prediction. Using concatenated rich statistical representations of viral and host protein sequences, a robust machine learning framework LGCAVHPPI is developed which makes use of a deep forest model to effectively model complex non-linearity of viral-host PPI sequences. An in-depth performance comparison of the proposed LGCA-VHPPI framework with existing diverse sequence encoding schemes based viral-host PPI predictors reveals that LGCA-VHPPI outperforms state-of-the-art predictor by 6%, 2%, and 2% in terms of matthews correlation coefficient over 3 different benchmark viral-host PPI prediction datasets.