Discovery of a fundamental limit to the evolution of the genetic code

IRB Barcelona

Source -

Fig1 onlytrna pablodans3D respresentation of a transfer RNA (tRNA). These molecules are crucial for the translation of genes into proteins and they are also the reason why the genetic code cannot exceed 20 amino acid. (Author: Pablo Dans,IRB Barcelona)

A study performed at IRB Barcelona offers an explanation as to why the genetic code, the dictionary used by organisms to translate genes into protein, stopped growing 3,000 million years ago. The reason is attributed to the structure of transfer RNAs—the key molecules in the translation of genes into proteins. The genetic code is limited to 20 amino acids—the building blocks of proteins—the maximum number that prevents systematic mutations, which are fatal for life. The discovery could have applications in synthetic biology.

Nature is constantly evolving—its limits determined only by variations that threaten the viability of species.Research into the origin and expansion of the genetic code* are fundamental to explain the evolution of life. In Science Advances, a team of biologists specialised in this field explain a limitation that put the brakes on the further development of the genetic code, which is the universal set of rules that all organisms on Earth use to translate genetic sequences of nucleic acids (DNA and RNA) into the amino acid sequences that comprise the proteins that undertake cell functions.

Headed by ICREA researcher Lluís Ribas de Pouplana at the Institute for Research in Biomedicine (IRB Barcelona) and in collaboration with Fyodor A. Kondrashov, at the Centre for Genomic Regulation (CRG) and Modesto Orozco, from IRB Barcelona, the team of scientists has demonstrated that the genetic code evolved to include a maximum of 20 amino acids and that it was unable to grow further because of a functional limitation of transfer RNAs—the molecules that serve as interpreters between the language of genes and that of proteins. This halt in the increase in the complexity of life happened more than 3,000 million years ago, before the separate evolution of bacteria, eukaryotes and archaebacteria, as all organisms use the same code to produce proteins from genetic information.

The authors of the study explain that the machinery that translates genes into proteins* is unable to recognise more than 20 amino acids because it would confuse them, which would lead to constant mutations in proteins and thus the erroneous translation of genetic information “with catastrophic consequences”, in Ribas’ words. “Protein synthesis based on the genetic code is the decisive feature of biological systems and it is crucial to ensure faithful translation of information,” says the researcher.

A limitation imposed by shape

Saturation of the genetic code has its origin in transfer RNAs (tRNAs*), the molecules responsible for recognising genetic information and carrying the corresponding amino acid to the ribosome, the place where chain of amino acids are made into proteins following the information encoded in a given gene. However, the cavity of the ribosome into which the tRNAs have to fit means that these molecules have to adopt an L-shape, and there is very little possibility of variation between them. “It would have been to the system’s benefit to have made new amino acids because, in fact, we use more than the 20 amino acids we have, but the additional ones are incorporated through very complicated pathways that are not connected to the genetic code. And there came a point when Nature was unable to create new tRNAs that differed sufficiently from those already available without causing a problem with the identification of the correct amino acid. And this happened when 20 amino acids were reached,” explains Ribas.

Application in synthetic biology

One of the goals of synthetic biology is to increase the genetic code and to modify it to build proteins with different amino acids in order to achieve novel functions. For this purpose, researchers use organisms such as bacteria in highly controlled conditions to make proteins of given characteristics. “But this is really difficult to do and our work demonstrates that the conflict of identify between synthetic tRNAs designed in the lab and existing tRNAs has to be avoided if we are to achieve more effective biotechnological systems,”concludes the researcher.

This study has been funded by the Ministry of the Economy and Competitiveness, the Generalitat de Catalunya, the European Research Council (ERC) and the Howard Hughes Medical Institute in the US.

Reference article:

Saturation of recognition elements blocks evolution of new tRNA identities - Adélaïde Saint-Léger, Carla Bello-Cabrera, Pablo D. Dans, Adrian Gabriel Torres, Eva Maria Novoa, Noelia Camacho, Modesto Orozco, Fyodor A. Kondrashov, and Lluís Ribas de Pouplana - Science Advances (29 April 2016). DOI: 10.1126/sciadv.1501860


Codon wheel yourgenome www yourgenome orgGenetic code. Every three letters of genetic information (triplets or codons) correspond to one of the amino acids or to a “stop signal” that marks the end of the protein. For ex., the codon GCT represents the amino acid alanine (source:

Why do we need the genetic code?

Genetic information is stored in the cell nucleus in the form of DNA. Genes hold the information to produce proteins, which are the molecules that carry out most functions in a cell and therefore in an organism. But proteins are produced outside the nucleus, in the cytoplasm. Furthermore, genes and proteins use different languages. That used by the former is based on the letters of DNA—the 4 base types known as Adenine (A), Thymine (T), Cytosine (C) and Guanine (G). In contrast, the latter use amino acids—20 distinct molecules, which, when combined, comprise a wide variety of proteins. The genetic code is the dictionary that that nature “invented” to be able to translate from one language to another.

How are genes translated into proteins?

Genes are very long sequences formed by DNA bases (ATGCTTTTCACC...). The code reads these in sets of three, which are called triplets or codons (ATG,CTT,TTC,ACC,...). Each codon corresponds to an amino acid. For example, the codon ATG codes for the amino acid methionine and GCT for alanine. First, the genes are copied into a messenger RNA (mRNA), a nucleic acid with a simpler structure than that of DNA. This messenger molecule moves to the cytoplasm, where it can translate the information. The key molecules involved in this process are the ribosome, which is the protein “factory”, and transfer RNA (tRNA).

Proteinsyn usanationallibraryofmedicineFrom genes to proteins. DNA is copied into RNA in the cell nucleus. Messenger RNA moves to the cytoplasm where it is translated into proteins. In this last phase, transfer RNA serves as an interpreter between the two types of language. (US Library of Med)


tRNA is the physical representation of the genetic code and it speaks the languages of both DNA and proteins. In this regard, these crucial molecules have a codon recognition sequence at one end, e.g. GCT, while at the other end they have the amino acid corresponding to the codon, i.e. alanine. As the ribosome reads the mRNA, the amino acids of tRNAs start building chains to form the protein encoded by the gene.

Anticodon copy