The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been characterised by the emergence of new variants with increased fitness. Identifying new lineages and estimating their relative fitness is therefore an important task for guiding outbreak response. However, current phylogenetic approaches do not scale well to large numbers of genomes; for instance, more than 6 million genomes are available for SARS-CoV-2. In addition, estimates of relative fitness do not usually take into account the effects of individual mutations: doing so could help identify which mutations are most associated with fitness increase, allowing for better prediction and understanding of the biology of transmission. To address these gaps, Fritz Obermeyer, Jacob E. Lemieux and colleagues propose a hierarchical regression model that can infer lineage fitness, predict the fitness of new lineages, forecast future lineage proportions, and estimate the effects of individual mutations on fitness.
To avoid a full phylogenetic inference, the proposed model, PyR0, first clusters genomes by genetic similarity. After clustering, sequence data are used to construct spatiotemporal lineage prevalence counts and amino acid mutation covariates. Then, the authors fit these data to a Bayesian multivariate logistic multinomial regression model: the model regresses lineage counts against mutation covariates. The authors leverage PyTorch, a machine learning framework, and Pyro, a probabilistic programming language, to scale efficiently to large datasets.
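The regression structure described above can be illustrated with a minimal sketch. It is not PyR0 itself: it omits the Bayesian hierarchy, the spatial component and the Pyro-based fitting, and shows only the forward relationship in which a lineage's relative fitness is the sum of the effects of the mutations it carries and lineage proportions follow a multinomial logistic (softmax) curve over time. All lineage names, mutation names, effect sizes, intercepts and the choice of days as the time unit are made-up assumptions for illustration.

```python
import math


def lineage_fitness(mutation_effects, lineage_mutations):
    """Relative fitness of each lineage: sum of the effects of its mutations."""
    return {
        lineage: sum(mutation_effects[m] for m in mutations)
        for lineage, mutations in lineage_mutations.items()
    }


def lineage_proportions(intercepts, fitness, t):
    """Multinomial logistic (softmax) proportions of each lineage at time t."""
    logits = {name: intercepts[name] + fitness[name] * t for name in fitness}
    peak = max(logits.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical per-mutation effects on growth rate (per day); not real estimates.
mutation_effects = {"mut_a": 0.04, "mut_b": 0.05, "mut_c": 0.02}

# Hypothetical lineages and the mutations each one carries.
lineage_mutations = {
    "A": [],
    "B": ["mut_a"],
    "C": ["mut_b", "mut_c"],
}

# Hypothetical initial log-abundances: lineage A starts out dominant.
intercepts = {"A": 0.0, "B": -3.0, "C": -5.0}

fitness = lineage_fitness(mutation_effects, lineage_mutations)
for day in (0, 60, 120):
    proportions = lineage_proportions(intercepts, fitness, day)
    print(day, {name: round(p, 3) for name, p in proportions.items()})
```

Because fitness is expressed through shared mutation effects, a lineage that has not yet been observed growing can still be assigned a fitness from the mutations it carries, which is the property that lets a model of this shape predict the fitness of new lineages.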
The authors applied PyR0 to all publicly available SARS-CoV-2 genomes, estimating the contribution of various mutations to lineage fitness. Driver mutations in the Spike protein, previously established experimentally, were identified by the model, and other non-Spike mutations were associated with the increased fitness of BA.2 (an Omicron subvariant). In addition, the authors demonstrated the model's capability to infer the fitness of new mutations based on the trajectories of other lineages in which they have previously emerged. According to the authors, inference and prediction take about ten minutes on a single graphics processing unit. The overall scalability and predictive features of the proposed model have the potential to better support public health efforts in rapidly detecting and understanding new lineages as they emerge.
Cite this article
Chirigati, F. Effects of mutations on SARS-CoV-2 fitness.
Nat Comput Sci (2022). https://doi.org/10.1038/s43588-022-00289-y
https://www.nature.com/articles/s43588-022-00289-y