Cellular Oncogenes as the Ancestors of Endocrine and Paracrine Growth Factors and Their Evolutionary Relic Status in Vertebrates

S. Ohno Hämatol. Bluttransf. Vol 29

Beckman Research Institute of The City of Hope, Duarte, California 91010, USA

A Introduction

The generally held belief that any gene whose expression is precisely regulated in development ought to perform an indispensable function to the host organism is not quite correct and, in fact, has no solid foundation. It should be recalled that the DNA replication mechanism of modern organisms has developed a high degree of precision (the error rate is 10 -10 per base pair per replication or thereabout) and that whatever damage sustained by DNA is effectively rectified by multitudes of DNA repair mechanisms. Accordingly, even dispensable pcptide chains such as fibrinopeptides A and B do not change their amino acid sequences very rapidly, a 1 % change in their amino acid sequences taking roughly 1.0- 1.6 million years [ 1 ]. Under these circumstances, a gene which has lost its usefulness to the host organism does not disappear quite so readily. The half-life of an enzyme gene that has become redundant has been estimatcd as 50 million years [2].

B. Redundant and Useless Genes May Persist for 50 Million Years or More

By becoming tetraploid, an organism initially gains four alleles at every gene locus. Subsequently, each set of four homologous chromosomes differentiates into two pairs, thus completing the process of diploidization. At this stage, freshly diploidized tetraploid species are endowed with twice the number of gene loci when compared with their diploid counterparts. This is the stage, at which trout and salmon of the teleost family Salmonidae, whitefish of the family Coregonidae, and grayling of the family Thymallidae find themselves [3]. A few of those duplicated, and therefore redundant, genes manage to acquire anew role; e.g., of all the vertebrates, only diploidized tetraploid teleost fish are endowed with liver-specific lactate dehydrogenase (LDH), in addition to the customary skeletal muscle and heart LDH. The mechanism of gene duplication as the means to acquire new genes with previously nonexistent functions, however, is very inefficient, having a very low success ratio: the phrase Salvandrum paucitas, dammnundrum multitudo gives ample testimony to its high failure ratio. Accordingly, older diploidized tetraploids of the teleost family Cyprinidae as well as Catostomidae have lost progressively larger numbers of these redundant, duplicated loci by silencing mutations. Since the fossil record gives the origin of these diploidized tetraploids, Ferris and Whitt [2] were able to calculate the average half-life of enzyme loci that became redundant as 50 million years, It should be noted here that this halflife refers to the average time needed for half of the redundant enzyme loci to lose their assigned functions. After losing their assigned functions, these redundant enzyme loci may continue to code for functionless polypeptide chains. The case in point is the murine Sip locus, situated in the middle of the major histocompatibility (MHC) antigen gene complex region of the mouse genome. While the neighboring Ss locus specifies C4 (complement 4 of antibody-mediated lysis), a protein specified by the Sip locus has already lost its assigned function as C4 owing to accumulated mutations. Yet, this SIp locus was androgen dependent in most mouse strains, and operator constitutive mutation of this locus was found in wild mice [4]. It would thus appear that a substantial portion of the redundant gene loci may continue to specify functionless proteins even after 100 million years of independence from natural selection. Although mammal-Iike reptiles were already an independent lineage at the time of the dinosaurs, mammals as we know them came into being only 70 million years ago. This should make us realize the unreality of the statement that any gene loci with precisely regulated expression must be indispensable to the host organism. The fact is that genes that have outlived their usefulness may linger on for 50-100 million years.

C. Most Oncogenes are Evolutionary Relics of the Cell Autonomous Stage of Development

Although multitudes of cellular oncogenes perform divergent functions (some of their products are found in the nucleus, while others are found inside the plasma membrane), it is clear that all of them function as intracellular cell growth factors. In unicellular eukaryotes such as baker's yeast as well as in many of the multicellular eukaryotes with underdeveloped circulatory systems such as insects, these intracellular growth factors have apparently played a vital role, for it should be recalled that embryonic development of insects is still largely a cell autonomous process as discussed in detail elsewhere [5]. With the advent of the cardiovascular system, development of vertebrates became a centrally controlled affair via multitudes of peptide and steroid hormones and cellular autonomy was suppressed. While intracellular growth factors of earlier times served as ancestors of these peptide hormones as well as of their plasma membrane receptors, they themselves largely became evolutionary relics whose functions have become redundant [5].

D. Near Immortality of Certain Oncogenes Conferred on Them by Their Original Construction

If cellular oncogenes became redundant at the onset of vertebrate evolution, most of them should have become silent by now, in spite of their long estimated half-life of 50 million years, for primitive vertebrates were already in evidence more than 300 million years ago. However, the continued performance of essential functions need not be invoked to explain this persistence for more than 300 million years. The view first expressed in 1981 [6] that all the coding sequences originally were repeats of base oligomers has found increasing support from independent sources [7-9]. Provided that the number of bases in the oligomeric unit is not a multiple of three, coding sequences made of oligomeric repeats are inherently impervious to normally very damaging base substitutions, deletions, and insertions, thus possessing a near immortality. In this kind of oligomeric repeat, three consecutive copies of the oligomeric unit translated in three different reading frames gives the unit periodicity to their polypeptide chains: while nonameric repeats, the unit sequence being a multiple of three, can give only tripeptide periodicity to their peptide chains, three consecutive copies of decameric repeats encode the decapeptidic periodicity to its polypeptide chain. Thus, if one reading frame of this kind of oligomeric repeat is open, the other two are automatically open as well. It follows then that the potentially most damaging base substitution that changes an amino acid-specifying codon to the chain terminator ( e.g., Trp cod on TGG to chain-terminating TAG or TGA) merely silences one of the three open reading frames. Deletions or insertions of bases that are not multiples of three are usually as damaging, for resulting frame shifts alter downstream amino acid sequences and most often result in premature chain terminations. In this type of oligomeric repeat, such insertions and deletions are of no consequence either, for downstream amino acid sequences are not at all affected by frame shifts. In a previous paper [10], we have analyzed the published coding sequence of human c-myc gene [II] in detail. Within the 5' half of c-myc coding sequence, we identified one each of recurring base tetradecamer, duodecamer, and two monodecamers. The significance of this becomes clear once it is realized that if c-myc is a unique sequence sensu stricto, even a given base decamer is expected to recur only once every 1048576 bases. Yet, here we found a recurring base tetradecamer within a mere 687-base 5' half c-myc coding sequence. Furthermore, recurring duodecamer and monodecamers were found to represent slightly modified parts of the tetradecameric sequence GGCCGCCGCCTCCT. Thus, it was concluded that the entire 5' half of the c-myc coding sequence originated from repeats of the previously noted base tetradecamer. Since 14 is not a multiple of 3, three consecutive copies of it translated in three different reading frames would have given the following tetradecapeptidic periodicity to the original c-myc polypeptide chain, at least the amino terminal half of it

Gly Arg Arg Leu Leu
GGC CGC CGC CTC CT/G

Ala Ala Ala Ser Trp
GCC GCC GCC TCC T/GG

Pro Pro Pro Pro
CCG CCG CCT CCT

Indeed, the human c-myc coding sequence, at least the 5' half of it, apparently inherited a measure of immortality from its original construction, for we found two long, alternative open reading frames, one covering the first 30 I bases and the other from the 599th to 952nd bases. When this region of human c-myc coding sequence [ II] was compared with the corresponding region of v-myc coding sequence of avian retrovirus MC29 [12], we found that the two differed from each other not so much by amino acid substitutions as by five stretches of insertions and two stretches of deletions. Thus, c-myc's inherent imperviousness to deletions and insertions was shown [ 10].

E. Resurrection of a Silenced v-src Gene by Utilization of its Alternative Open Reading Frame

A measure of immortality inherited by some of the oncogenes from their original construction was indeed shown by the following experiment of Mardon and Varmus [ 13]. First, they established the rat cell line that was transformed by the integration into the genome of a single copy of strain 877 Rous sarcoma virus v-src coding sequence. One of the defective mutations sustaincd by the integrated v-src that deprived from the rat cell line of the transformed phenotype was identified as an insertion of a single base A between 146th Glu cod on GAA and 147th Glu codon GAG. A resulting frame shift created anew chain terminator 51 bases further downstream, thus, silencing a mutated v-src [ 13]. The surprise was the second mutation that resurrected a silenced v-src as a transforming gene. This second event was an insertion of a duplicated 242-base segment into the position between T and GG of the 148th Trp cod on in the original reading frame. This 242-base segment started from GAT representing the 68th Asp codon in the original reading frame and ended in T of 148th Trp also in the original reading frame, thus including a previously inserted A. Since the inserted segment is now translated in an alternative reading frame, the resulting double frame shifts restored the original reading frame, starting from GAG of the 147th Glu of the wild-type v-src and downward which in the resurrected v-src became the 228th Glu. Such restoration of function by an insertion in the midst of the polypeptide chain of an 81-residue new amino acid sequence is hardly believable, unless anew sequence specified by a repeated coding segment translated in an alternative open reading frame resembles parts of the preexisting amino acid sequence. Such a resemblance, in turn, is expected only if the coding sequence itself still maintains a sufficient vestige of the ancestral construction; i.e., the coding sequence originating from repeats of a base oligomer in which the number of bases in the oligomeric unit was not a multiple of three. Indeed, the existence of so long an alternative open reading frame itself is a reflection of the v-src coding sequence's ultimate derivation from oligomeric repeats, the number of bases in the oligomeric unit not being a multiple of three. As might be expected, when translated in an alternative open reading frame, amino acid residues 1-8 encoded by a duplicated 242-base segment were Thr-ProSe-Arg-Arg-Arg-Ser-Val. In the standard amino acid sequence of Rous sarcoma V-src, the very similar nonapeptide,
Thr-Pro-Ser-(Gln)-Arg-Arg-Arg-Ser-Leu customarily occupies positions 10-18 [5].

F. Summary

Contrary to the popularly held view, genes that have lost their usefulness to the host organism may continue to encode proteins for 50 million years or longer. Accordingly, precisely regulated expression of genes can not be taken as proof of their indispensability. My view is that multitudes of oncogenes of vertebrates are evolutionary relics harking back to the days of invertebrate ancestors in which embryogenesis was still a cell autonomous process. Parts of certain oncogene coding sequences originated from repeats of base oligomers whose numbers of bases were not multiples of three. Thus, these segments are still endowed with a measure of immortality in that they are impervious to normally very deleterious base substitutions, insertions, and deletions.

References

I. Dayhoff MO (ed) (1972) Atlas of protein sequences and structure. National biomedical research foundation, Silver Springs. Maryland
2. Ferris SO, Whitt GS (1977) Loss of duplicated gene expression after polyploidization. Nature 265:258-260
3. Ohno S (1970) Evolution by gene duplication. Springer, Heidelberg Berlin New York
4. Hansen TH, Shreffler OC (1976) Characterizalion of a constitutive variant of lhe murine serum prolein allolype, Slp. J Immunol 117: 1507-1513
5. Ohno S ( 1984) Repeats of base oligomers as the primordial coding sequence of lhe primeval earlh and their vestiges in modern genes. J Mol EvoI20.313-321 (1984)
6. Ohno S (1981) Original domain for the serum albumin family arose from repealed sequences. Proc Natl Acad Sci USA 79.1999 –2002
7. Blake C ( 1983) Exon -present from the beginning? Nature 306: 535-537
8. Go M (1983) Modular structural unils, exonsand functions in chicken lysozyme. Proc Natl Acad Sci USA 80: 1964-1968
9. Alexander F, Young PR, Tilghman SM (1984) Evolution of the albumin: a-feloprotein ancestral gene from the amplification of a 27 nucleolide sequence. J Mol BioI 173.159-174
10. Ohno S, Yazaki A (1983) Simple construction of human c-myc gene implicated in B-cell neoplasms and ils relalionship with avian v-myc and human lymphokines. Scand J ImmunoI18.373-388
11. Watt R, Stanton LW, Marcu KB, Gallo RC. Croce CN, Tovera G (1983) Nucleotide sequence of cloned cONA of human c-myc oncogene. Nalure 303.725- 727
12. Alitalo K, Bishop MJ. Smilh OH, Chen E, Colby WW, Levinson AO (1983) Nucleotide sequence of the v-myc oncogene of avian relrovirus MC29. Proc Natl Acad Sci USA 80. 100-105
13. Mardon G, Vermus HE (1983) Frameshift and inlragenic suppressor mutations in a Rous sarcoma provirus suggest SRC encodes two proteins. CeI132:871-879