Research Article | GENE TRANSCRIPTION

The landscape of transcription errors in eukaryotic cells

Science Advances  20 Oct 2017:
Vol. 3, no. 10, e1701484
DOI: 10.1126/sciadv.1701484
  • Fig. 1 A visual representation of the circle-sequencing assay.

    The circle-sequencing protocol identifies transcription errors (orange circles) by fragmenting RNA (green strands) into short oligonucleotides, circularizing them, and reverse-transcribing the RNA circles in a rolling-circle reaction to generate linear cDNA molecules made up of tandem repeats of the original RNA fragment (blue strands). During this step, artificial mutations may arise in the cDNA (purple circles). The cDNA is then processed to generate a library, amplified, and sequenced, during which further artifacts may arise (teal circles). However, because these artifacts are only present in one copy of the tandem repeats, they can be distinguished from true transcription errors, which are present in all tandem repeats. bp, base pair.
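The distinction drawn in this caption (true transcription errors appear in every tandem repeat, artifacts in only one copy) can be sketched in a few lines of Python. This is an illustrative toy, not the authors' pipeline: the function names `split_repeats` and `classify_mismatches` are hypothetical, and the sketch assumes the repeat length is already known and that the read starts exactly at a repeat boundary, which a real analysis would have to work out first.

```python
from collections import Counter

def split_repeats(read: str, unit_len: int) -> list[str]:
    """Cut a concatemer read into consecutive, full-length copies of the repeat unit."""
    return [read[i:i + unit_len] for i in range(0, len(read) - unit_len + 1, unit_len)]

def classify_mismatches(repeats: list[str], reference: str):
    """Compare every repeat to the reference, position by position.

    Returns (candidate_errors, artifacts) as lists of (position, ref_base, alt_base).
    A mismatch shared by all repeats is a candidate transcription error;
    a mismatch seen in only some repeats is treated as a technical artifact.
    """
    candidate_errors, artifacts = [], []
    for pos, ref_base in enumerate(reference):
        bases = [r[pos] for r in repeats]
        mismatching = [b for b in bases if b != ref_base]
        if not mismatching:
            continue
        if len(set(bases)) == 1:
            # Every copy carries the same non-reference base.
            candidate_errors.append((pos, ref_base, bases[0]))
        else:
            # Copies disagree, so the change arose after circularization.
            most_common_alt = Counter(mismatching).most_common(1)[0][0]
            artifacts.append((pos, ref_base, most_common_alt))
    return candidate_errors, artifacts

# Toy example: three tandem copies of a 6-nt fragment (hypothetical sequences).
reference = "ACGTAC"
read = "ACATAC" + "ACATAC" + "ACATTC"
repeats = split_repeats(read, len(reference))
errors, artifacts = classify_mismatches(repeats, reference)
print(errors)     # [(2, 'G', 'A')]
print(artifacts)  # [(4, 'A', 'T')]
```

The G→A change at position 2 is present in all three copies and is kept as a candidate error, whereas the A→T change at position 4 occurs in a single copy and is set aside. With only one repeat the two cases cannot be told apart, so a practical implementation would also require a minimum number of copies per read.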

  • Fig. 2 Overview of transcriptional mutagenesis in yeast.

    Over the course of our experiments, we detected >200,000 transcription errors. Here, we provide a broad overview of our results at increasing levels of detail. (A) The transcription errors detected were distributed across the entire genome of yeast. (B) Although transcription errors occurred randomly across the length of a chromosome, most errors were detected in highly transcribed genes. These genes do not display an increased error rate per nucleotide but were simply sequenced at a greater frequency and thus provided the greatest amount of information to our data set. “Errors” indicate the total number of errors detected within a 100-bp interval. “Coverage” indicates the number of times a base pair in that interval was sequenced. (C) Depiction of a subset of the errors that were detected in the ADH1 gene. More than 2000 errors were detected in the ADH1 gene, affecting approximately 50% of all possible nucleotides. Each block represents a single error. Green blocks represent errors that changed the start codon of the ADH1 gene, purple blocks represent errors that changed its stop codon, and red blocks represent errors that generated premature termination codons. We also detected synonymous (orange) and nonsynonymous (blue) errors, which altered almost every aspect of protein function and structure. (D) Individual errors detected in a small region of the ADH1 mRNA. (E) All errors detected in the ADH1 mRNA, mapped onto the protein structure. All amino acids in which errors were detected are shown in red. For clarity, NAD is depicted in blue, and zinc is depicted in yellow.
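Panel B separates the raw error count in a window from the error rate per nucleotide. A minimal sketch of that normalization follows; the function name `window_error_rate` and the numbers are hypothetical, and it assumes coverage is available as a per-base depth for each position in the window.

```python
def window_error_rate(error_count: int, per_base_coverage: list[int]) -> float:
    """Errors per sequenced base in one window (for example, a 100-bp interval)."""
    total_bases_sequenced = sum(per_base_coverage)
    if total_bases_sequenced == 0:
        raise ValueError("window has no coverage; the rate is undefined")
    return error_count / total_bases_sequenced

# Two made-up 100-bp windows: the second is sequenced 10x more deeply and
# yields 10x more errors, yet the per-nucleotide rate is the same.
low_expression = window_error_rate(2, [1_000] * 100)
high_expression = window_error_rate(20, [10_000] * 100)
print(low_expression, high_expression)
```

Dividing by the number of sequenced bases rather than by window length is what keeps a highly transcribed gene from looking error-prone simply because it was observed more often.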

  • Fig. 3 The error rate and error spectrum of transcription in yeast.

    (A) Error rate of transcripts generated by all major RNA polymerases in yeast cells. Because the error rate of transcription is >10-fold higher than the genetic mutation frequency, <1% of these errors are likely due to genetic mutation. Additional safety mechanisms have been built into our bioinformatic pipeline to identify these genetic mutations and remove them from further analysis. (B) Loss of Rpb9 and Dst1 or introduction of the rpb1E1103G allele results in error-prone transcription by RNAPII. Loss of Rpa12 results in error-prone transcription by RNAPI. (C) Error spectrum of transcripts generated by RNAPI, RNAPII, RNAPIII, and mtRNAP (mitochondrial RNAP). (D) Matrices depicting the genetic context in which transcription errors occur in WT cells and three error-prone cell lines. The focal base is the base where the error occurred. The first base on the y axis is directly upstream of the focal base, whereas the second base is directly downstream. (E) All error-prone alleles that we tested resulted in a marked increase in G→A transitions by RNAPII. (F) Loss of Rpa12 results in a similar increase in G→A transitions by RNAPI.
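The context matrices in panel D are indexed by the focal base and by the pair of bases immediately upstream and downstream of it. One way such a tally could be built is sketched below; `context_counts` is a hypothetical helper, the sequence is made up, and "upstream" is taken to mean the preceding position in the string as written.

```python
from collections import Counter

def context_counts(sequence: str, error_positions: list[int]) -> Counter:
    """Count errors by (focal base, upstream base + downstream base)."""
    counts: Counter = Counter()
    for pos in error_positions:
        # Skip the first and last positions, which lack one neighbor.
        if 0 < pos < len(sequence) - 1:
            focal = sequence[pos]
            flank = sequence[pos - 1] + sequence[pos + 1]
            counts[(focal, flank)] += 1
    return counts

# Toy input: errors at positions 2 and 4; position 0 has no upstream base.
print(context_counts("ACGTGCA", [2, 4, 0]))
```

In a real analysis the counts for each context would still need to be divided by how often that context was sequenced, for the same reason that raw error counts are normalized by coverage.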

  • Fig. 4 Frameshifts arise during transcription in yeast.

    (A) Insertions and deletions occur less frequently than base pair substitutions in yeast. (B) Homopolymeric tracts are hotspots for frameshift errors in yeast. Here, all possible homopolymer tracts (A, C, G, and T) were combined. (C) Tracts of dinucleotides are hotspots for frameshift errors in yeast as well. (D) Loss of Rpb9 and Dst1 or introduction of the rpb1E1103G allele results in an increase in frameshift errors in molecules transcribed by RNAPII, but not by RNAPI. (E) Loss of Upf2 increased the frequency of insertions in the error-prone cell lines. (F) Insertions were detected primarily at the 3′ end of genes. “Start” indicates the first codon of the transcript, whereas “Stop” indicates the stop codon. (G) Loss of Upf2 abolished the relationship between insertions and distance along a gene.
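Panels B and C group frameshift errors by the homopolymeric or dinucleotide tract they fall in. A small sketch of how such tracts might be located with regular expressions is given here; the function names, default thresholds, and example sequences are assumptions for illustration, not values taken from the study.

```python
import re

def homopolymer_tracts(seq: str, min_len: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for each run of one base with length >= min_len."""
    # ([ACGT]) captures a base; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), m.end() - m.start()) for m in pattern.finditer(seq)]

def dinucleotide_tracts(seq: str, min_repeats: int = 3) -> list[tuple[int, str, int]]:
    """Return (start, unit, repeat_count) for each tandem run of a two-base unit."""
    # (?!\2) forces the second base of the unit to differ from the first,
    # so homopolymers such as AAAA are not reported as (AA)(AA).
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_repeats - 1))
    return [(m.start(), m.group(1), (m.end() - m.start()) // 2)
            for m in pattern.finditer(seq)]

print(homopolymer_tracts("GCAAAAATCGGGGTTTC"))  # [(2, 'A', 5), (9, 'G', 4)]
print(dinucleotide_tracts("TTACACACACGG"))      # [(2, 'AC', 4)]
```

Counting frameshifts inside versus outside the returned intervals, normalized by coverage, would give the kind of tract-length comparison the panels describe.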

  • Fig. 5 Biological effects of transcription errors in eukaryotic cells.

    (A) Error-prone cell lines display a reduced growth rate. (B) Error-prone cells display a reduced life span. (C and D) Deletion of the molecular chaperone Ydj1 in Dst1Δ cells markedly decreases growth rate and life span, indicating that the error-prone cells exhibit proteotoxic stress. Previously, we made similar observations for rpb9Δ and rpb1E1103G cells (32). (E) A transcriptome analysis of two error-prone cell lines indicates that 75% of the genes that are overexpressed >2-fold in rpb1E1103G cells are also overexpressed in rpb9Δ cells. (F) A proteomic analysis of two error-prone cell lines indicates that 68% of the proteins that are up-regulated >2-fold in rpb1E1103G cells are also up-regulated in rpb9Δ cells. (G) List of all the genes that are up-regulated at the transcriptome level in both error-prone cell lines. Genes that were also up-regulated at the protein level in both error-prone cell lines are listed in red. NS, not significant. (H and I) Metabolomic analysis of pathways that are up-regulated at the protein and transcriptome level using guanine, citrulline, and kynurenine as examples. Each point represents one biological replicate.
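The overlap percentages in panels E and F express how many of the genes (or proteins) up-regulated in one cell line are also up-regulated in the other. As a sketch of that arithmetic only, with placeholder identifiers rather than real gene names and a hypothetical helper `overlap_fraction`:

```python
def overlap_fraction(reference_set: set[str], other_set: set[str]) -> float:
    """Fraction of reference_set members that also appear in other_set."""
    if not reference_set:
        raise ValueError("reference set is empty; the fraction is undefined")
    return len(reference_set & other_set) / len(reference_set)

# Placeholder identifiers, not data from the study.
up_in_line_1 = {"gene_a", "gene_b", "gene_c", "gene_d"}
up_in_line_2 = {"gene_c", "gene_d", "gene_x"}
print(overlap_fraction(up_in_line_1, up_in_line_2))  # 0.5
```

The measure is asymmetric: swapping the two arguments changes the denominator, so the direction of comparison needs to be stated, as the caption does.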

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/3/10/e1701484/DC1

    fig. S1. Optimizing the circle-sequencing assay.

    fig. S2. The error rate of transcription is not affected by the expression level of a gene.

    fig. S3. The error rate of transcription is not affected by the vicinity of a gene to an origin of replication.

    fig. S4. The error rate of transcription is equal along the length of a gene.

    fig. S5. Cell lines that display error-prone transcription do not exhibit elevated mutation frequencies.

    fig. S6. Transcriptional deletion rate in WT and error-prone cell lines.

    fig. S7. Multiple components of the purine synthesis and salvage pathways are affected in error-prone cells.

    fig. S8. Multiple components of nitrogen metabolism are affected in error-prone cells.

    fig. S9. Multiple components of NAD metabolism are affected in error-prone cells.

    table S1. Distribution of synonymous, missense, and nonsense errors in WT and error-prone cell lines.

    table S2. Genes significantly up-regulated >2-fold at the RNA level in rpb1E1103G cells.

    table S3. Genes significantly up-regulated >2-fold at the RNA level in rpb9Δ cells.

    table S4. Proteins significantly up-regulated >2-fold at the protein level in both rpb1E1103G and rpb9Δ cells.

    References (42, 43)
