Computationally predicting bacterial strain susceptibility to phages
openNIAID - National Institute of Allergy and Infectious Diseases
Summary. The prevalence of antibiotic-resistant (AR) bacterial infections continues to grow. Although developing
novel antimicrobial compounds is one approach to combating AR infections, phage therapy, the use of lytic phages to
treat bacterial infections, offers another solution with many attractive benefits: infection specificity that leaves
the healthy microbiome intact, low toxicity, and the wide diversity of available phages.
However, before phage therapy can be widely used in the clinic, a significant challenge must be addressed: selecting
which phage to use against a given bacterial pathogen. Phage infection specificity is
complicated by the fact that bacteria encode a diverse array of phage defense systems that block phage
infection, typically carried on horizontally transferable DNA elements. The bacterial pathogen must also
encode and express the phage receptor and any bacterial host factors that the phage requires for successful
replication and phage production. Currently, bacterial pathogens are manually screened against large phage
biobanks to select phage cocktails that provide effective in vivo killing. Although effective, this approach is
costly and, more importantly, time-intensive, and it will be difficult to scale as phage therapy becomes more widely
used. The field therefore needs rapid, cost-effective approaches that identify effective phages for any bacterial
pathogen from the genome sequences of the bacterium and the candidate phages. The multiple principal investigators
(MPIs) of this proposal, Ravi and Waters, will draw on their diverse expertise in bacterial pathogenesis, phage
biology and defense, AR, microbial genomics, and computational biology to develop a machine learning (ML)-based
prediction model that identifies effective phages, as well as molecular features associated with phage resistance,
for any given E. coli strain. Another
critical outcome of this work will be the gold-standard data set generated in Aim 1, which will define whether each of
the 69 dsDNA E. coli phages in the well-characterized BASEL phage collection successfully infects each of ~600
sequenced pathogenic and non-pathogenic E. coli strains, yielding ~42,000 unique data points. Aim 2 will first
identify all known phage defense, AR, and virulence elements in this E. coli collection and merge these annotated
features with the phage-host infection phenotypes from Aim 1. Supervised and unsupervised ML approaches (e.g.,
logistic regression, random forest) will then be used to build models that predict which phages will effectively
infect a given E. coli host, along with the underlying molecular features (genes, proteins, domains) that lead to
resistance or susceptibility. The model will be validated on 50 additional E. coli strains. Successful
completion of this proposal will generate a clinically useful predictive model for E. coli and lay the groundwork for
building similar predictive models for phage therapy against other bacterial pathogens. Moreover, the model will lead
to the discovery of novel phage defense elements and bacterial factors that affect phage infection, and to a deeper
understanding of how bacterial pathogens evolve resistance to phage infection, knowledge that can be used to tailor
phage therapy and prevent the widespread emergence of resistance.
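The Aim 2 idea of joining per-strain genomic features with phage-host infection outcomes and fitting a classifier can be illustrated with a minimal sketch. It is not the proposal's model. It assumes a hypothetical binary encoding (presence or absence of a phage receptor and two defense systems, named here only for illustration) and uses synthetic toy labels rather than any BASEL data. It fits plain logistic regression, one of the methods the summary names, by gradient descent in standard-library Python so the mechanics are visible. A random forest or a regularized library implementation would be a natural substitute.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical host features: 1 = element present, 0 = absent.
# These names are illustrative assumptions, not features from the proposal.
FEATURES = ["receptor_present", "defense_A", "defense_B"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that the phage lyses a host with features x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic_regression(
    X: List[List[int]], y: List[int], lr: float = 1.0, epochs: int = 5000
) -> Tuple[List[float], float]:
    """Fit weights by full-batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic toy data for ONE phage: every combination of the three features.
# Assumed labeling rule: infection succeeds (1) only when the receptor is
# present and defense_A is absent; defense_B has no effect.
X = [[r, a, c] for r in (0, 1) for a in (0, 1) for c in (0, 1)]
y = [int(r == 1 and a == 0) for r, a, _ in X]

w, b = train_logistic_regression(X, y)
for name, weight in zip(FEATURES, w):
    # Positive weight: associated with susceptibility; negative: with resistance.
    print(f"{name:>16}: {weight:+.2f}")

# Query a host profile carrying the receptor and defense_B but not defense_A.
query_host = [1, 0, 1]
print("P(lysis) =", round(predict_proba(w, b, query_host), 3))
```

On this toy matrix the fitted weight for `receptor_present` comes out positive and the weight for `defense_A` negative, which is the kind of feature-level signal the summary describes. In a real analysis, rows from many phages and strains would feed the model, and such weights would only nominate candidate features for experimental follow-up.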
Up to $435K
Deadline: 2028-01-31
Health