Inspired by ChatGPT, Google DeepMind predicts 71 million genetic mutations! AI that deciphers the human genetic code is published in Science
Original source: Xinzhiyuan
After the protein prediction model AlphaFold set off a tsunami-level wave in the AI world, the Alpha family ushered in a new upstart.
Today, Google DeepMind released a new AI model, AlphaMissense, which can predict the effects of 71 million "missense mutations."
Specifically, AlphaMissense classified 89% of these "missense mutations": 57% as likely benign and 32% as likely pathogenic.
By comparison, only 0.1% of these mutations have been confirmed by human experts.
In order for researchers to better understand its possible impact, Google has also made public the entire catalog of tens of millions of "missense mutations."
Discovering the underlying cause of a disease has long been one of the greatest challenges in human genetics.
The birth of AlphaMissense demonstrates the huge potential of AI in the medical field, especially in genetics.
It is of great significance for understanding the relationship between genetic variation and disease and developing targeted drug treatments.
Following AlphaFold, AlphaMissense may become an AI that can change the world and is expected to overcome the problems of human genetics!
"Missense mutation" is a term used in biomedicine and molecular biology for a particular kind of mutation in protein-coding genes:
The substitution of a single letter in DNA results in a different amino acid in a protein.
If you think of DNA as a language, then the substitution of a single letter can change a word and completely change the meaning of a sentence.
In this case, changes to the DNA lead to changes in amino acids that affect the function of the protein.
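As a rough illustration of the single-letter substitution described above, the sketch below translates a short coding sequence before and after one letter is changed. The sequence is modeled on the start of the HBB gene and the well-known sickle-cell substitution; the codon table is deliberately partial, and the names `CODON_TABLE` and `translate` are illustrative assumptions, not part of AlphaMissense.

```python
# Partial codon table: only the codons that appear in this example.
CODON_TABLE = {
    "ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L",
    "ACT": "T", "CCT": "P", "GAG": "E",
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acids, codon by codon."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


# Short sequence modeled on the start of the HBB coding region (assumed here).
reference = "ATGGTGCATCTGACTCCTGAGGAG"

# One letter changes: A -> T at 0-based index 19, turning codon GAG into GTG.
position = 19
variant = reference[:position] + "T" + reference[position + 1:]

ref_protein = translate(reference)
var_protein = translate(variant)

# Collect every position where the two proteins differ.
differences = [
    (index, ref_aa, var_aa)
    for index, (ref_aa, var_aa) in enumerate(zip(ref_protein, var_protein))
    if ref_aa != var_aa
]
print(ref_protein, var_protein, differences)
```

A single DNA letter changes exactly one amino acid (glutamate to valine in this example), which is what makes the mutation "missense" rather than silent.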
Generally speaking, most of these missense mutations are benign and have little impact on the human body. But the remaining few are pathogenic and can severely disrupt protein function.
Missense mutations can be used for the diagnosis of rare genetic diseases, because a few or even a single missense mutation may directly cause the disease.
In addition, they are important for studying complex diseases, such as type II diabetes, which may be caused by many different types of genetic variants.
Of the more than 4 million missense mutations that have appeared in humans, only 2% have been labeled by experts as pathogenic or benign.
This represents only about 0.1% of all possible 71 million missense mutations.
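The two percentages above fit together with simple arithmetic. The check below uses only the rounded figures quoted in the text, so the result is approximate.

```python
observed_missense = 4_000_000       # "more than 4 million" seen in humans
expert_labeled_share = 0.02         # 2% labeled by experts
all_possible_missense = 71_000_000  # all possible missense mutations

expert_labeled = observed_missense * expert_labeled_share
share_of_all = expert_labeled / all_possible_missense

print(f"{expert_labeled:,.0f} labeled variants = {share_of_all:.2%} of all possible")
```

Roughly 80,000 expert-labeled variants out of 71 million comes to about 0.1%.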
But with AlphaMissense, we now have the clearest picture yet of the effects of these mutations:
Using a score threshold that yields 90% precision on a database of known disease mutations, AlphaMissense can classify 89% of all variants.
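One way to read the "90%" figure is as a precision target used to pick a score cutoff on variants whose labels are already known. The sketch below is a simplified illustration of that idea, not DeepMind's calibration procedure; the function names and the toy data are invented for the example.

```python
def precision_at(threshold, scored):
    """Precision when every variant with score >= threshold is called pathogenic."""
    called = [is_pathogenic for score, is_pathogenic in scored if score >= threshold]
    return sum(called) / len(called) if called else None


def lowest_threshold_for(target, scored):
    """Smallest observed score whose precision reaches the target, or None."""
    for threshold in sorted({score for score, _ in scored}):
        precision = precision_at(threshold, scored)
        if precision is not None and precision >= target:
            return threshold
    return None


# Invented (score, is_pathogenic) pairs, for illustration only.
toy = [
    (0.05, False), (0.10, False), (0.20, False), (0.30, True),
    (0.40, False), (0.55, True), (0.60, True), (0.65, True),
    (0.70, True), (0.75, True), (0.80, True), (0.85, True),
    (0.90, True), (0.95, True),
]

cutoff = lowest_threshold_for(0.90, toy)
print(cutoff, precision_at(cutoff, toy))
```

A lower cutoff classifies more variants but admits more errors; a precision target fixes that trade-off explicitly.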
So, how exactly is AlphaMissense built?
Since their release, AlphaFold and AlphaFold 2 have predicted, from amino acid sequences alone, the structures of almost all proteins known to science: more than 200 million of them.
Building on this, Google researchers adapted AlphaFold (hereinafter referred to as AF) so that it can predict the pathogenicity of missense mutations that change a single amino acid in a protein.
The AlphaMissense model is trained in two stages:
The first stage
In this stage, a neural network similar to AF is trained. The network is inspired by large models like ChatGPT.
By predicting the identity of amino acids masked at random positions in multiple sequence alignments (MSA), it learns single-chain structure prediction as well as protein language modeling.
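The masking step can be pictured with a toy example. The sketch below hides a random subset of residues in a small alignment and records what the model would be asked to recover; the mask symbol, the masking rate and the function name `mask_msa` are assumptions made for illustration, not details taken from the paper.

```python
import random

MASK = "#"  # hypothetical mask symbol
GAP = "-"   # alignment gap, never masked in this sketch


def mask_msa(msa, mask_rate, seed=0):
    """Hide a random subset of residues; return the masked MSA and the hidden targets."""
    rng = random.Random(seed)
    masked_rows = []
    targets = {}
    for row_index, row in enumerate(msa):
        chars = list(row)
        for col_index, residue in enumerate(chars):
            if residue != GAP and rng.random() < mask_rate:
                targets[(row_index, col_index)] = residue  # what the model must predict
                chars[col_index] = MASK
        masked_rows.append("".join(chars))
    return masked_rows, targets


# Invented three-row alignment, for illustration only.
toy_msa = ["MVHLTPEEK", "MVHLTAEEK", "MV-LSPDEK"]
masked_msa, targets = mask_msa(toy_msa, mask_rate=0.15)
print(masked_msa, targets)
```

During training, the network sees the masked alignment and is scored on how well it recovers each hidden residue from the surrounding context.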
The researchers made some minor architectural modifications to AF and increased the loss weights for protein language modeling, while still achieving comparable structure prediction performance to AF.
After pre-training, the masked language modeling head can already be used for mutation effect prediction by calculating the log-likelihood ratio between the reference amino acid and alternative amino acid probabilities, as MSA Transformer and Evolutionary Scale Modeling (ESM) do.
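The log-likelihood ratio mentioned above can be written down in a few lines. In the sketch below, the probabilities stand in for a model's output at one masked position and are invented; the sign convention (alternative minus reference, so that more negative means less plausible) is also an assumption.

```python
import math


def log_likelihood_ratio(probs, ref_aa, alt_aa):
    """Return log p(alt) - log p(ref) for one masked position."""
    return math.log(probs[alt_aa]) - math.log(probs[ref_aa])


# Invented predicted probabilities at one masked position (not real model output).
predicted = {"E": 0.80, "D": 0.15, "K": 0.04, "V": 0.01}

score_d = log_likelihood_ratio(predicted, ref_aa="E", alt_aa="D")
score_v = log_likelihood_ratio(predicted, ref_aa="E", alt_aa="V")
print(round(score_d, 3), round(score_v, 3))
```

Under this convention, the substitution the model finds less likely (E to V here) receives the more negative score, which is read as a stronger hint of a damaging change.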
These neural networks have proven good at predicting protein structures and designing new proteins, and are especially useful for variant prediction because they have already learned which sequences are plausible and which are not.
The second stage
In this stage, the researchers fine-tuned the model on human proteins, placing the variant sequence in the second row of the MSA and adding a variant pathogenicity classification objective.
Mutations were then labeled from human and primate population data, following the PrimateAI approach.
Common mutations are considered benign, and never-before-seen mutations are considered pathogenic.
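The labeling rule can be sketched as a small function over population allele frequencies. The cutoff for "common" and the handling of rare-but-observed variants below are hypothetical choices made for illustration; the article does not give the actual values.

```python
def weak_label(allele_frequency, common_cutoff=1e-3):
    """Return 'benign', 'pathogenic', or None (ambiguous, left out of training).

    allele_frequency is None or 0 when the variant was never observed.
    common_cutoff is a hypothetical value, not one taken from the paper.
    """
    if allele_frequency is None or allele_frequency == 0:
        return "pathogenic"  # never seen in the population data
    if allele_frequency >= common_cutoff:
        return "benign"      # common in the population
    return None              # observed but rare: no confident label


# Invented variants and frequencies, for illustration only.
examples = {"variant_a": 0.05, "variant_b": None, "variant_c": 0.00001}
labels = {name: weak_label(freq) for name, freq in examples.items()}
print(labels)
```

These labels are noisy by design: a never-observed variant is not necessarily harmful, which is why they serve as weak training signal rather than ground truth.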
The researchers stopped training once the model began to overfit on the validation set (2,526 ClinVar variants, with equal numbers of benign and pathogenic variants per gene).
AlphaMissense does not predict how a mutation changes the protein's structure. Instead, it uses AlphaFold's "intuition" about structure to identify possible disease-causing mutations in proteins.
Specifically, it uses databases of related protein sequences and the structural context of the mutation to generate a continuous score between 0 and 1 that approximates the probability that the mutation is pathogenic.
This continuous score allows users to select a threshold to classify mutations as pathogenic or benign, depending on their accuracy requirements.
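Turning the continuous score into a call takes only a pair of cutoffs. The thresholds in the sketch below are placeholder values chosen for illustration, and the three-way split into benign, pathogenic and uncertain is an assumption about how a user might apply them.

```python
def classify(score, benign_below, pathogenic_above):
    """Map a score in [0, 1] to a label using two user-chosen cutoffs."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if benign_below > pathogenic_above:
        raise ValueError("benign_below must not exceed pathogenic_above")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    return "uncertain"


# Placeholder cutoffs; a stricter accuracy requirement widens the uncertain band.
calls = [classify(s, benign_below=0.3, pathogenic_above=0.6) for s in (0.05, 0.45, 0.97)]
print(calls)
```

Moving the two cutoffs apart trades coverage for confidence: fewer variants get a definite label, but those that do are more likely to be labeled correctly.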
In experimental evaluation, AlphaMissense achieved state-of-the-art predictions across a wide range of genetic and experimental benchmarks, all without requiring explicit training on such data.
AlphaMissense outperforms other computational methods when classifying variants from ClinVar, a public data archive on the relationship between human variation and disease.
AlphaMissense was also the most accurate method for predicting the results of laboratory experiments, which suggests it is consistent with different ways of measuring pathogenicity.
AI changes genetics
A year ago, Google DeepMind released 200 million protein structures predicted using AlphaFold.
This initiative has helped millions of scientists around the world accelerate research and paved the way for new discoveries.
Now AlphaMissense, built on AlphaFold, further deepens the world's understanding of proteins by tracing changes in them back to their origin in DNA.
Again, a key step in translating this research is collaboration with the scientific community.
Google DeepMind has been working with Genomics England to explore how AlphaMissense's predictions can help study the genetics of rare diseases.
Genomics England cross-referenced AlphaMissense's findings with previously compiled data on the pathogenicity of known human mutations.
Google DeepMind has published a lookup table of missense mutations and shared expanded predictions for all 216 million possible single amino acid substitutions in more than 19,000 human proteins.
The published data also includes an average predicted value for each gene, which is similar to a measure of a gene's evolutionary constraints, indicating how important that gene is to an organism's survival.
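The per-gene average is a plain group-and-mean over variant scores. The sketch below shows the computation on invented rows; the gene names, scores and the function name `mean_score_per_gene` are illustrative and do not reflect the layout of the published files.

```python
from collections import defaultdict
from statistics import mean


def mean_score_per_gene(rows):
    """Average the variant scores for each gene; rows are (gene, score) pairs."""
    scores_by_gene = defaultdict(list)
    for gene, score in rows:
        scores_by_gene[gene].append(score)
    return {gene: mean(scores) for gene, scores in scores_by_gene.items()}


# Invented rows, for illustration only.
toy_rows = [
    ("GENE_A", 0.9), ("GENE_A", 0.7),
    ("GENE_B", 0.1), ("GENE_B", 0.3),
]
gene_means = mean_score_per_gene(toy_rows)
print(gene_means)
```

A gene whose possible substitutions are mostly predicted to be damaging ends up with a high average, which is why the value behaves like a measure of how little change the gene tolerates.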
(Red = predicted to be pathogenic, blue = predicted to be benign, gray = uncertain)
Left: Beta-hemoglobin subunit (HBB protein). Variations in this protein can cause sickle cell anemia.
Right: Cystic fibrosis transmembrane conductance regulator protein (CFTR protein). Variations in this protein can lead to cystic fibrosis.
Google DeepMind has also partnered with EMBL-EBI. Through the Ensembl Variant Effect Predictor, researchers will be able to apply AlphaMissense's predictions more easily.
It is believed that in the near future, AlphaMissense will help solve core problems in genomics and the entire biological sciences.