Optimized gene expression from bacterial chromosome by high-throughput integration and screening


Science Advances 12 Feb 2021:
Vol. 7, no. 7, eabe1767
DOI: 10.1126/sciadv.abe1767


Chromosomal integration of recombinant genes is desirable compared with expression from plasmids due to increased stability, reduced cell-to-cell variability, and elimination of the need for antibiotics for plasmid maintenance. Here, we present a new approach for tuning pathway gene expression levels via random integration and high-throughput screening. We demonstrate multiplexed gene integration and expression-level optimization for isobutanol production in Escherichia coli. The integrated strains could, with far lower expression levels than plasmid-based expression, produce high titers (10.0 ± 0.9 g/liter isobutanol in 48 hours) and yields (69% of the theoretical maximum). Close examination of pathway expression in the top-performing, as well as other isolates, reveals the complexity of cellular metabolism and regulation, underscoring the need for precise optimization while integrating pathway genes into the chromosome. We expect this method for pathway integration and optimization can be readily extended to a wide range of pathways and chassis to create robust and efficient production strains.


Microbial biosynthesis is a sustainable, high-specificity approach to achieve chemical conversions with the potential to produce a vast assortment of pharmaceutical, fuel, and commodity chemicals. Development of a high-performing production strain for a desired molecule generally requires tuning the expression levels of native and/or heterologous genes in hosts such as Escherichia coli (1, 2). While extrachromosomal multicopy plasmids offer a convenient method for rapidly prototyping different gene expression levels, and the majority of metabolic engineering efforts use them, they have several challenging attributes. First and foremost, plasmids are unstable, a problem that has been extensively documented and studied because of the importance of plasmids in biotechnology (3–9). Cells with plasmids can suffer from both structural instability, in which the plasmid is still carried but mutations inactivate the gene of interest, and segregational instability, in which some cells no longer carry the plasmid (10, 11). Plasmids can also multimerize, which decreases their stability (12, 13). Second, plasmid copy number can vary between cells within stable populations, and the degree of variation is often not well characterized even for widely used plasmids (14). Münch et al. (15) presented a notable example of cellular heterogeneity during plasmid-based production of recombinant protein in Bacillus megaterium. The authors found that, under strong selective pressure, 30% of the cell population was in a low-producing state, and this heterogeneity was attributed to asymmetric plasmid distribution. Third, plasmids require a selective pressure, which is typically an antibiotic. The addition of antibiotics raises process costs, puts additional stress on the cells, and is not always effective in enforcing plasmid maintenance. Kanamycin, for example, can become an ineffective selection agent at high phosphate concentrations (16).
Furthermore, because of the rise of multidrug-resistant pathogens, there is strong motivation to reduce antibiotic usage (17).
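To make the segregational instability described above concrete, a textbook random-partitioning model can be sketched. It is an illustration rather than an analysis from this study, and it assumes that each plasmid copy goes to either daughter independently with probability 1/2, that there is no active partitioning or multimerization, and that plasmid-free cells grow as fast as plasmid-bearing ones. Under these assumptions, a division with n copies yields a plasmid-free daughter with probability 2^(1−n), so loss rises steeply as copy number falls.

```python
# Minimal sketch of segregational plasmid loss. Assumptions (illustrative only):
# each cell carries `copies` plasmids at division, every copy independently goes
# to either daughter with probability 1/2 (no active partitioning, no
# multimerization), and plasmid-free cells grow as fast as plasmid-bearing ones.


def plasmid_free_daughter_probability(copies: int) -> float:
    """Probability that one division yields a plasmid-free daughter."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    # All copies end up in daughter 1, or all end up in daughter 2.
    return 2.0 * 0.5 ** copies


def plasmid_free_fraction(copies: int, generations: int) -> float:
    """Fraction of plasmid-free cells after `generations` without selection."""
    p_loss = plasmid_free_daughter_probability(copies)
    carrying = 1.0  # start from a fully plasmid-bearing population
    for _ in range(generations):
        # A plasmid-bearing division gives (2 - p_loss) bearing daughters on
        # average, while the population as a whole doubles.
        carrying *= (2.0 - p_loss) / 2.0
    return 1.0 - carrying


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, plasmid_free_daughter_probability(n), plasmid_free_fraction(n, 50))
```

Real plasmids deviate from this model in the directions discussed above: cell-to-cell copy-number variation and multimerization both increase loss, and plasmid-free cells often outgrow plasmid-bearing ones once selection is removed.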

Chromosomal integration and expression of genes is an effective way of avoiding the issues of plasmids described above. This alternative approach leads to more stable and robust production strains and reduces the leakiness of inducible expression systems, which are highly desirable traits for large-scale and long-term production processes (18, 19). Mairhofer et al. (20) compared a plasmid-based with a chromosomally integrated expression system and found a substantially stronger stress response in the plasmid-carrying strain. They identified massive overtranscription of the plasmid-based gene of interest, which diverts ribosomes and other cellular resources, as the source of the metabolic burden.

Considerable research efforts have been devoted to developing techniques for synthetic integrations (21–23). The two tools most widely used in prokaryotes for targeted integration of constructs are homologous recombination by expression of the RecET or λ-Red proteins (24–26) and site-specific recombination (27). Homologous recombination enables rapid integration into a desired locus but is suitable only for constructs that are at most several kilobases long and that lack substantial homology to other parts of the genome. Site-specific recombination places fewer restrictions on the input construct but requires a bacterial chromosomal attachment site (attB) in the recipient genome, which can limit the choice of integration location. This restriction can be circumvented by first inserting the attachment site into the target locus (28). Yet, despite the recognized desirability of chromosomal integration and the genetic tools available for targeted integration, achieving chromosomal gene expression levels suitable for optimal production remains far more challenging than manipulating gene expression via plasmids, even in well-studied organisms such as E. coli (22).

Chromosomal gene expression levels depend on multiple determinants. As with plasmid-based gene expression, elements of the genetic construct, including promoters, ribosome-binding sites, and enhancers or activators, can be exploited to optimize heterologous gene expression. Yet, single-copy expression from the chromosome is generally weaker than that of the same construct from a multicopy plasmid. Consequently, there have been efforts to increase the copy number on the chromosome to achieve high-level gene expression (18, 29).

Another factor that strongly affects the expression level from the chromosome is the location of gene integration. It has long been known that during exponential growth, genes closer to the origin of replication experience higher gene dosage and, therefore, higher expression levels (30, 31). More recently, it has been shown that other factors related to genomic position, such as the level of DNA compaction or proximity to active genes, can also affect gene expression levels (32, 33). In examining transcription levels of a reporter gene at various sites in the E. coli genome, Bryant et al. (32) and Scholz et al. (33) observed considerable differences, up to ~300-fold, in expression across the E. coli genome, excluding gene dosage effects. In addition, it has been suggested that in nature, microorganisms may use transposon-mediated transfer of catabolic genes to different genomic locations as a means of fine-tuning their expression levels to adapt to new environmental conditions (34). These findings suggest that modulating the gene integration position can serve as a useful tool for regulating expression levels. For instance, Loeschcke et al. (35) developed an approach for transfer and expression (TREX) of biosynthetic pathways in bacteria that uses Tn5 transposase to integrate gene clusters into random sites of the chromosome and, based on the resulting colony color intensities, observed a range of production levels for pigmented secondary metabolites.
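The gene-dosage component of position effects can be estimated with the Cooper–Helmstetter relationship, in which a locus at fractional distance x from oriC (0 at oriC, 1 at the terminus) is present at 2^((C/τ)(1−x)) copies relative to the terminus, where C is the chromosome replication time and τ the doubling time. The sketch below assumes steady-state exponential growth and uses hypothetical C and τ values; it deliberately omits the non-dosage factors (compaction, neighboring transcription) discussed above.

```python
# Minimal sketch of replication-driven gene dosage (Cooper-Helmstetter model).
# Assumptions: steady-state exponential growth, a fixed C period (time to
# replicate the chromosome), and x measured as the fraction of the way from
# oriC to the terminus along a replichore. Non-dosage position effects
# (compaction, neighboring transcription) are deliberately not modeled.


def relative_gene_dosage(x: float, c_period_min: float, doubling_time_min: float) -> float:
    """Average copies of a locus per cell, relative to a terminus locus.

    x = 0 at oriC, x = 1 at the terminus.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if c_period_min <= 0 or doubling_time_min <= 0:
        raise ValueError("times must be positive")
    return 2.0 ** ((c_period_min / doubling_time_min) * (1.0 - x))


if __name__ == "__main__":
    # Hypothetical values: a 40-min C period at 20-min vs 60-min doubling times.
    for tau in (20.0, 60.0):
        print(tau, [round(relative_gene_dosage(x, 40.0, tau), 2) for x in (0.0, 0.5, 1.0)])
```

Under these assumptions the origin-to-terminus dosage ratio is only 4 at a 20-min doubling time and about 1.6 at 60 min, far smaller than the up to ~300-fold position effects reported independently of dosage (32, 33).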

Synthetic biology applications targeting the production of a small molecule often require delicately balanced cellular resource allocation between production and growth, as well as across different genes of the biosynthetic pathway. Predicting the optimal gene expression levels remains difficult for most pathways because of the highly entangled nature of cellular metabolic and regulatory networks. In addition, for chromosomal gene expression, it is not yet fully understood how expression from different regions of the chromosome is affected by changes in environmental conditions and whether and how an inserted gene construct will affect the native expression level of the insertion region. It is, therefore, very difficult to rationally decide where to integrate a gene construct. Here, we present a new metabolic engineering strategy for optimizing a pathway’s performance by tuning gene expression through position-dependent expression variation (Fig. 1). In this method, we use the Tn5 transposase to randomly integrate pathway genes into the E. coli genome in a multiplexed fashion. We subsequently screen the libraries using syntrophic coculture amplification of production (SnoCAP), which converts the production phenotype into a readily screenable growth phenotype, as described in our previous work (36). We demonstrate the approach using a pathway for the production of isobutanol, a promising drop-in biofuel (37).

Fig. 1 New approach for optimizing chromosomal heterologous gene expression by position-dependent expression variation.

(A) Gene expression level varies depending on distance from the origin of replication, as well as other position-dependent factors. The expression level of a heterologous pathway gene affects the production phenotype of a strain. Increasing the expression level of a gene can improve production by increasing flux through the pathway, but avoiding excessive expression can also be beneficial, for example by reducing the buildup of toxic intermediates and avoiding unnecessary metabolic burden. (B) Libraries with pathway genes integrated into various genome locations can be screened for production phenotype using syntrophic coculture amplification of production (SnoCAP) (36). A cross-feeding coculture is configured that consists of a secretor strain producing a target molecule and a sensor strain that cannot synthesize the target molecule autonomously. The cocultures are compartmentalized in microfluidic droplets, incubated to allow cogrowth, and then sorted to identify the strains with high production phenotypes.

Efficient isobutanol production in E. coli has been achieved by diverting the branched-chain amino acid pathway intermediate 2-ketoisovalerate (2-KIV) for conversion into isobutanol (Fig. 2A) (38). It has been demonstrated that by overexpressing acetolactate synthase (AlsS) from Bacillus subtilis, ketol-acid reductoisomerase (IlvC) and dihydroxy-acid dehydratase (IlvD) from E. coli (for the conversion of pyruvate to 2-KIV), and 2-ketoacid decarboxylase (KivD) and alcohol dehydrogenase (AdhA) from Lactococcus lactis (for the conversion of 2-KIV to isobutanol), production can reach up to 84% of the theoretical yield under aerobic batch conditions (38) and titers can reach up to 50.8 g/liter under fed-batch conditions (39). Resolution of a cofactor imbalance has resulted in strains reaching 100% of theoretical yield under anaerobic conditions (40), and a recently developed strategy for isobutanol-linked growth selection under anaerobic conditions enabled optimization of pathway expression levels for achieving optimal production rates (41).

Fig. 2 Screening of libraries with alsS and ilvCD integrated by Tn5.

(A) Isobutanol production pathway (36, 52). Acetolactate synthase (AlsS), ketol-acid reductoisomerase (IlvC), dihydroxy-acid dehydratase (IlvD), 2-ketoacid decarboxylase (KivD), and alcohol dehydrogenase (AdhA). Colored boxes correspond to the integrations in libraries A (magenta), B (yellow), and C (green). (B) Histograms comparing fluorescence signal from droplets containing cocultures of fluorescent sensor strain with unlabeled secretor strain libraries, compared with the parent strain (i). The libraries were constructed by Tn5 integration of kan-PLlacO1-alsS (library A) (ii), PLlacO1-ilvD followed by kan-PLlacO1-alsS (library B) (iii), or PLlacO1-ilvCD followed by kan-PLlacO1-alsS (library C) (iv). Each histogram comprises data from the analysis of about 60,000 droplets. a.u., arbitrary units.

While much work with this pathway has relied on plasmid-based expression of the pathway genes, there is also interest in developing chromosomally integrated strains. Akita et al. (42) have demonstrated the integration of the five pathway genes into the E. coli genome, resulting in a titer of 6.8 g/liter and a yield of 55%, without the addition of costly antibiotics or inducers. Bassalo et al. (43) developed a new CRISPR-based method for high-efficiency, single-step chromosomal integration in E. coli and, to demonstrate its capacity for large inserts, integrated the entire five-gene isobutanol pathway into a single chromosomal locus, leading to an isobutanol titer of 2.2 g/liter from glucose (85 g/liter).

In the present work, we demonstrate that generating libraries with the isobutanol pathway genes integrated into diverse chromosomal positions, combined with high-throughput production phenotype screening, is an effective means of obtaining high-performance chromosomally integrated strains. The E. coli strains resulting from this study show, to our knowledge, the highest titers and yields yet reported with the isobutanol pathway genes expressed exclusively from the chromosome.


Construction and screening of Tn5 integration libraries

We began by constructing an integration library (library A) in which a construct containing the alsS gene, under the PLlacO1 promoter, and a kanamycin-resistance gene was integrated by Tn5 transposase into the genome of JCL260 ∆lysA in random locations. JCL260 was developed by Atsumi et al. (38) as an isobutanol production strain, and its genome features six gene deletions created to decrease by-product formation, as well as the lacIq mutation for reduced leakiness of genes under the lac promoter before induction. We introduced the ∆lysA deletion into JCL260 in our previous work to render the strain a lysine auxotroph for applying the SnoCAP screening approach (36). In the SnoCAP approach, a library is coencapsulated in water-in-oil microdroplets with a fluorescent sensor strain that is auxotrophic for the target molecule of interest. The library itself is also auxotrophic for a second, orthogonal molecule (here, lysine) supplied by the sensor strain. This cross-feeding configuration enables conversion of the production phenotype into a fluorescent output, amplified by coculture growth (36). To screen for overproducers of 2-KIV, an intermediate in the isobutanol pathway (Fig. 2A), we use a sensor strain with a deletion of the ilvD gene (encoding a dihydroxy-acid dehydratase), which prevents it from converting 2,3-dihydroxy-isovalerate into 2-KIV. This strain can only grow when it is provided an exogenous source of 2-KIV and/or branched-chain amino acids. We demonstrated in our previous work that this sensor can be used to identify strains that will overproduce isobutanol once the genes responsible for converting 2-KIV to isobutanol (kivD and adhA) are added to them (36).
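The cross-feeding logic of SnoCAP can be illustrated with a toy model of a single droplet, in which the secretor grows only on lysine released by the sensor and the sensor grows only on the target molecule released by the secretor. All kinetics and parameter values below are hypothetical and chosen only to show the qualitative point that sensor growth, and hence droplet fluorescence, increases with the secretor's production rate; the model is not taken from the SnoCAP work (36).

```python
# Toy model of SnoCAP-style cross-feeding in one droplet. Everything here is a
# hypothetical illustration (arbitrary units, Monod kinetics, made-up
# parameters), not a model or parameter set from the study.


def simulate_droplet(production_rate: float, hours: float = 12.0, dt: float = 0.01) -> float:
    """Return the final sensor-cell density for a given secretor production rate."""
    mu_max = 0.8         # maximum specific growth rate, 1/h
    k_half = 1.0         # Monod half-saturation constant for either metabolite
    biomass_yield = 1.0  # cells formed per unit of metabolite consumed
    lysine_rate = 0.5    # lysine released per sensor cell per hour

    secretor, sensor = 1.0, 1.0  # one cell of each at encapsulation
    kiv, lysine = 0.0, 0.0       # target molecule (2-KIV) and lysine

    for _ in range(round(hours / dt)):
        # Each auxotroph grows only on the metabolite supplied by its partner.
        growth_secretor = mu_max * lysine / (k_half + lysine) * secretor * dt
        growth_sensor = mu_max * kiv / (k_half + kiv) * sensor * dt

        # Explicit Euler update: release by one partner minus uptake by the other.
        kiv += production_rate * secretor * dt - growth_sensor / biomass_yield
        lysine += lysine_rate * sensor * dt - growth_secretor / biomass_yield
        kiv, lysine = max(kiv, 0.0), max(lysine, 0.0)

        secretor += growth_secretor
        sensor += growth_sensor
    return sensor


if __name__ == "__main__":
    for rate in (0.0, 0.05, 0.2):
        print(rate, round(simulate_droplet(rate), 2))
```

Setting production_rate to zero leaves the sensor at its starting density, whereas higher rates give progressively more sensor biomass within the same incubation time.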

The aforementioned integration of alsS into JCL260 ∆lysA was accomplished by polymerase chain reaction (PCR) amplification of the alsS and kan genes with primers that add the 19–base pair (bp) mosaic end sequence recognized by Tn5, followed by reaction with Tn5 and electroporation of cells with the resulting transposome. The approximate library size was determined to be ~3.4 ± 1.1 × 10⁴ by plating a small quantity of the cells on kanamycin-containing plates after a short initial recovery. The remainder of the library was subjected to a liquid kanamycin selection and subsequently screened by SnoCAP.
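The primer design step can be sketched as simple string operations: both gene-specific primers carry the Tn5 mosaic end (ME) at their 5′ ends, so the amplicon is flanked by MEs in inverted orientation, the configuration Tn5 transposase binds to form a transposome. The ME used here is the widely used 19-bp hyperactive-Tn5 sequence, which is an assumption because the primers are not listed in this section; the annealing sequences are placeholders.

```python
# Sketch of adding Tn5 mosaic ends (MEs) to an insert by PCR. Assumption: the
# 19-bp ME below is the widely used hyperactive-Tn5 sequence; the primers used
# in the study are not given in this section, and the annealing sequences in
# the example are placeholders.

MOSAIC_END = "CTGTCTCTTATACACATCT"

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def me_tailed_primers(fwd_anneal: str, rev_anneal: str) -> tuple[str, str]:
    """Prefix both gene-specific primers with the ME at their 5' ends."""
    return MOSAIC_END + fwd_anneal.upper(), MOSAIC_END + rev_anneal.upper()


def expected_amplicon(insert: str) -> str:
    """Top strand of the PCR product: ME, insert, then the ME in inverted orientation."""
    return MOSAIC_END + insert.upper() + reverse_complement(MOSAIC_END)


if __name__ == "__main__":
    fwd, rev = me_tailed_primers("ACGTACGTACGTACGTAC", "TTGCATGCATGCATGCAT")  # placeholders
    print(fwd, rev, sep="\n")
```

Because the reverse primer's ME ends up as the reverse complement on the top strand, the two ends of the product form the inverted repeat that the transposase recognizes.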

The cells are loaded into microdroplets at densities such that all droplets contain sensor cells, and most droplets contain either zero or one secretor cell from the integration library. Thus, the cogrowth phenotype of individual library members may be assessed. The microdroplet compartmentalization format enables high-throughput screening and isolation of droplets of interest by fluorescence-activated droplet sorting (44).
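The loading condition described above (sensor cells in essentially every droplet, mostly zero or one secretor cell) follows from Poisson statistics if cells are encapsulated independently. The sketch below computes the relevant fractions; the mean occupancies λ are hypothetical, not values from the study.

```python
import math

# Sketch of Poisson droplet loading. Assumption: cells are encapsulated
# independently, so the number per droplet is Poisson distributed with mean
# lambda (cells per droplet volume). The lambda values used are hypothetical.


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def loading_summary(secretor_lambda: float, sensor_lambda: float) -> dict:
    p0 = poisson_pmf(0, secretor_lambda)
    p1 = poisson_pmf(1, secretor_lambda)
    multi = 1.0 - p0 - p1
    return {
        "no_secretor": p0,
        "one_secretor": p1,
        "multiple_secretors": multi,
        # Among droplets holding any secretor, the share with two or more
        # secretor cells (which may be different library members).
        "multiple_given_occupied": multi / (1.0 - p0),
        "sensor_present": 1.0 - poisson_pmf(0, sensor_lambda),
    }


if __name__ == "__main__":
    for key, value in loading_summary(secretor_lambda=0.1, sensor_lambda=5.0).items():
        print(f"{key}: {value:.4f}")
```

At a secretor λ of 0.1, about 90% of droplets contain no secretor and roughly 5% of occupied droplets contain more than one, while a sensor λ of 5 leaves fewer than 1% of droplets without a sensor cell.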

Fluorescence measurement of the droplets containing library A members cocultured with fluorescent sensor strain showed that more droplets exhibited growth than when droplets contained cocultures with the parent strain, which lacks the alsS gene, as the secretor (Fig. 2B, i and ii). We sorted the top 0.3% of droplets with the highest fluorescence signal and plated the pool of droplets on plates with tetracycline to select for the secretor cells and against the sensor cells (the parent strain of the secretor strain libraries contains a tetracycline resistance gene in the F plasmid, whereas the sensor strain is tetracycline sensitive). We assessed the isobutanol production of colonies obtained from the screening and colonies from the unscreened library, after transformation with plasmid pSA65, which encodes KivD and AdhA. Of 12 colonies chosen at random from the unscreened library, 10 produced less than 0.1 g/liter after 48 hours of fermentation (similar to the parent strain), 1 produced 0.4 g/liter, and the other produced 2.1 g/liter (fig. S1). Of 10 randomly selected colonies from the sorted pool, all produced more than 2.5 g/liter (fig. S1). There is a significant difference in isobutanol production between these two groups (P < 0.0001, unpaired two-sample t test), which demonstrates the effectiveness of the screening method.
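Selecting the top 0.3% of droplets amounts to ranking by fluorescence and keeping the brightest fraction. The sketch below does this offline on a list of intensities, which is an assumption; a droplet sorter applies an equivalent threshold in real time. For a run of about 60,000 droplets (the number analyzed per histogram in Fig. 2B), a 0.3% gate keeps about 180 droplets.

```python
# Sketch of selecting the top fraction of droplets by fluorescence, as in the
# 0.3% sort described above. Assumption: this operates offline on a list of
# measured intensities; a real sorter applies a threshold on the fly.


def top_fraction_indices(fluorescence: list[float], top_fraction: float = 0.003) -> list[int]:
    """Indices (in original order) of the brightest `top_fraction` of droplets."""
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    n_keep = max(1, round(top_fraction * len(fluorescence)))
    ranked = sorted(range(len(fluorescence)), key=lambda i: fluorescence[i], reverse=True)
    return sorted(ranked[:n_keep])


def gate_threshold(fluorescence: list[float], top_fraction: float = 0.003) -> float:
    """Lowest intensity that still falls inside the sort gate."""
    return min(fluorescence[i] for i in top_fraction_indices(fluorescence, top_fraction))
```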

We also attempted integration of the entire three-gene operon (alsS and ilvCD), along with the kanamycin resistance cassette, in one transposome. This is a 6.3-kb construct, compared with 3 kb for kan-alsS. While we could generate a substantial number of kanamycin-resistant cells from the whole operon integration (6.0 ± 2.7 × 10³ in a single integration), upon encapsulation with the sensor strain and incubation, we did not detect any cogrowth in the droplets at either 0.5 or 1.0 g/liter norvaline, whereas the kan-alsS integration library led to droplets displaying detectable cell growth at both of these norvaline concentrations. We hypothesized that this lack of growth could be due to a problem with the library generation, such as incomplete integrations.

Given the above observation that the longer construct had a lower integration efficiency, we chose to minimize construct length and to use the ilvD gene, rather than an antibiotic resistance gene, as a selection marker by means of its essentiality in minimal medium. We deleted ilvD from JCL260 ∆lysA and integrated constructs containing either both ilvC and ilvD behind the PLlacO1 promoter or only ilvD behind the same promoter. We performed the selection in minimal M9 medium supplemented with isopropyl-β-d-thiogalactopyranoside (IPTG) and lysine, plating a small amount of the transformed cells on solid medium for library size estimation and subjecting the rest to liquid selection. These integrations produced far smaller libraries than the alsS integration construct (~10² integrants per electroporation versus ~10⁴ for alsS). We determined that this was due to substantially lower survival rates in the minimal medium selection compared with kanamycin selection in LB medium. For example, when JCL260 ∆lysA ∆ilvD was transformed with pSA69, which carries alsS/ilvCD as well as a kanamycin resistance gene, the number of colonies recovered on LB plates with kanamycin was two orders of magnitude higher than the number on M9 plates with lysine and IPTG. Further optimization of the recovery method, such as supplementation with additional nutrients or seeding with small quantities of the branched-chain amino acids, may help improve this lower-than-expected recovery rate and increase library diversity. It should also be noted that the survival rate on solid medium may differ from that in liquid medium, which could contribute to inaccuracies in the estimated library size. Deep sequencing of the library could be used to quantify library size more accurately. We repeated the ilvD and ilvCD library generation several times and pooled the integrants, resulting in libraries of ~500 to 1000 members each (estimated from agar plating).
We then integrated the alsS-kan transposome into these libraries, which led to ~1.1 ± 0.5 × 10⁴ integrants for the ilvD/alsS library (library B) and 1.3 ± 0.3 × 10⁴ for the ilvCD/alsS library (library C). Therefore, each library should contain about 10 different alsS integration locations for each ilvD or ilvCD integration library member.
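The library-size arithmetic above can be sketched as follows. It assumes that the plated aliquot is representative of the whole transformation, that replicate electroporations are summarized as mean and SD, and that every alsS integrant is a distinct event; the colony counts in the example are hypothetical.

```python
import statistics

# Sketch of the library-size arithmetic. Assumptions: library size is scaled up
# from colony counts on a plated aliquot, replicate electroporations are
# summarized as mean and SD, and the colony counts in the example are made up.


def library_size(colony_counts: list[int], fraction_plated: float) -> tuple[float, float]:
    """Mean and SD of the total number of integrants across replicate transformations."""
    estimates = [count / fraction_plated for count in colony_counts]
    return statistics.mean(estimates), statistics.stdev(estimates)


def alss_sites_per_background_member(alss_integrants: float, background_size: float) -> float:
    """Average number of alsS insertions per ilvD or ilvCD library member,
    assuming every integrant is a distinct insertion event."""
    return alss_integrants / background_size


if __name__ == "__main__":
    mean, sd = library_size([30, 40, 35], fraction_plated=0.001)  # hypothetical counts
    print(f"library size ~ {mean:.2g} +/- {sd:.1g}")
    for background in (500, 1000):
        print(background, alss_sites_per_background_member(1.1e4, background))
```

Dividing ~1.1 × 10⁴ alsS integrants by a background of ~1000 members gives about 11 insertions per background member, and a 500-member background gives about 22, so the estimate inherits the uncertainty of the plating-based library sizes noted above.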

When assessed by the SnoCAP method (library members coencapsulated in microdroplets with sensor strain, incubated for cogrowth, and the fluorescence of the droplets measured), the ilvD and ilvCD integration libraries displayed no measurable cogrowth. The ilvD/alsS and ilvCD/alsS libraries (libraries B and C, respectively), on the other hand, showed substantially more fluorescence compared with the library with only alsS integrated (library A) (Fig. 2B).

We sorted the droplets containing cocultures of these libraries, recovered colonies from the pools of droplets, and then transformed the isolates with pSA65 and assessed isobutanol production. The best isolates we identified came from library C. In particular, an isolate termed C7 reached a production level closely approaching that of the strain expressing the pathway genes from a plasmid (Fig. 3A). With the goal of understanding the relationship between production phenotype and genome position–dependent gene expression, we decided to examine isolates displaying a spectrum of production levels and, hence, also isolated several library members with low and intermediate production levels (Fig. 3A) by screening the random libraries via the microplate format of SnoCAP.

Fig. 3 Production phenotype and gene insertion locations of select library isolates.

(A) Isobutanol titers of library isolates displaying various production levels. Samples were taken 48 hours after induction of the pathway genes. Each of the isolates has been transformed with the pSA65 plasmid expressing kivD and adhA. Parent strain JCL260 ΔlysA and JCL260 ΔlysA pSA69, the latter of which expresses the alsS and ilvCD genes from the pSA69 plasmid, each also carrying pSA65, are included for comparison. Error bars represent the SD of three biological replicates. (B) Gene integration locations of the isolates whose production is shown in (A).

Genomic characterization of select library isolates

We determined the insertion site locations in the selected isolates by transposon footprinting, which also allowed us to eliminate duplicates of the same genotype and resulted in a panel of unique isolates reported in this work. The integration locations were found to be spread across the whole genome (Fig. 3B and Table 1). Each of the library B isolates, B1 and B2, contains integrations into the F plasmid, a large (~100 kb) plasmid previously introduced to the base strain to supply lacIq for increased levels of the lac repressor. The F plasmid is stably maintained in E. coli because of an active partitioning system (45, 46), and it has been suggested that placing heterologous genes on the F plasmid may provide improved stability over the use of other plasmids (47, 48). It is also interesting that in isolate B2, both integration constructs are found to be located on the F plasmid.

Table 1 Gene integration locations.


Another noteworthy observation was that two of the disrupted genes, ttdT and yjhI, are genes in which we had found mutations during previous work in which we chemically mutagenized JCL260 ΔlysA, screened for improved isobutanol productivity, and resequenced the genome of an improved isolate (36). In that study, the strain with the best production level had a mutation in the ttdT gene leading to an alanine-to-valine substitution at position 390 of the protein, and yjhI contained a mutation leading to a premature stop codon after the first 187 amino acids (of 227).

In the best isolate, C7, the two integration constructs landed in ttdT and yqiH, two genes that are quite close to each other, separated by only 18 kb (Fig. 4A). To better understand the genetic mechanism underlying the high-production phenotype of this isolate, we investigated whether the gene disruptions, besides expression of the inserted genes, can contribute to increased production. We knocked out the ttdT and yqiH genes individually in a base strain JCL260 ΔlysA. Into each deletion strain, we then integrated alsS in the yghX site, which we have previously seen leads to an intermediate production level (and therefore should allow us to see changes in the production that may not be visible in a very low– or high–producing background strain). After transformation of the strains with pSA65, we tested isobutanol production and found that the ttdT deletion strain produces more isobutanol to a statistically significant degree compared with the strain having wild-type ttdT, whereas yqiH deletion does not have a statistically significant effect (Fig. 4B).
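The significance test used here, an unpaired t test on three biological replicates per strain, can be sketched with Python's standard library, assuming the equal-variance (Student's) form. The function returns the pooled-variance t statistic and its degrees of freedom; the replicate values are left to the user, and none are taken from the study. For a p-value, SciPy's scipy.stats.ttest_ind(a, b), whose default equal_var=True gives the same pooled-variance test, also returns a two-sided p-value.

```python
import math
import statistics

# Sketch of an unpaired two-sample Student's t statistic (pooled variance).
# Replicate titers are supplied by the user; none are taken from the study.


def students_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for an unpaired, equal-variance t test."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two replicates")
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, na + nb - 2
```

With three replicates per strain, as in Fig. 4B, the test has 4 degrees of freedom.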

Fig. 4 Examination of the effect of gene disruptions incurred during the Tn5 integrations in the C7 isolate on production phenotype.

(A) Schematic of the locations of kan-PLlacO1-alsS and PLlacO1-ilvCD integrations in isolate C7, determined by transposon footprinting. (B) Isobutanol production of strains with alsS genomically integrated, containing plasmid pSA65 encoding kivD and adhA, and having deletions of either ttdT or yqiH, compared with the same strain without either deletion. Data shown are from samples taken 72 hours after induction with IPTG. Error bars in (B) represent the SD of three biological replicates. *** indicates significant difference at P < 0.0001 using an unpaired t test. n.s., not significant.

Characterization of pathway gene transcript levels in select library isolates

We next analyzed gene expression at the transcript level in various isolates of different production levels. We harvested the RNA during exponential growth (3 hours after induction of the pathway genes with IPTG), reverse transcribed it, and analyzed gene expression of alsS, ilvC, and ilvD by quantitative real-time PCR (qPCR) (Fig. 5B), while continuing to grow the cultures and measuring their isobutanol production 48 hours following induction (Fig. 5A). It was found that the pathway gene expression in each of the integrated strains was, as expected, far lower than that in the strain expressing alsS/ilvCD from the pSA69 plasmid (Fig. 5B, note the log scale of the y axis). Yet, with considerably lower expression levels of all three genes, the top isolate C7 showed a production level very close to that of the plasmid strain. Although random integration may affect cell growth, all of the examined isolates displayed similar growth patterns under the culture conditions used in this study (fig. S2). This is likely because integrants with major growth defects were lost during the stages of the library construction and screening process that involved monoculture growth (liquid selection after transposon integration and selection on agar plates after SnoCAP screening).

Fig. 5 Investigation of the expression levels underlying production phenotypes of select library isolates.

(A) Isobutanol production 48 hours after induction of the cultures from which the RNA had been harvested. (B) Expression levels of alsS, ilvC, and ilvD, relative to alsS level in JCL260 ΔlysA pSA65/9. RNA was harvested 3 hours after induction with IPTG. * indicates no amplification during the qPCR. Data from two biological replicate cultures of each strain are shown. Error bars in (B) represent the SD of three technical replicates.
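Relative transcript levels such as those in Fig. 5B are commonly computed with the comparative Ct (2^−ΔΔCt) method. This section does not state the exact normalization used, so the sketch below is an assumption: each target Ct is normalized to a hypothetical reference gene in the same sample and then to a calibrator (here alsS in JCL260 ΔlysA pSA65/9, as in the figure legend), with perfect doubling per cycle by default.

```python
from typing import Optional

# Sketch of comparative-Ct (2^-ddCt) relative quantification. Assumptions: the
# study's exact qPCR normalization is not described in this section; here every
# target is normalized to a hypothetical reference gene and then to a calibrator
# (alsS in the plasmid strain, matching the Fig. 5B normalization), with perfect
# amplification (doubling per cycle) unless another factor is given.


def relative_expression(ct_target: Optional[float], ct_reference: float,
                        ct_calibrator_target: float, ct_calibrator_reference: float,
                        amplification: float = 2.0) -> Optional[float]:
    if ct_target is None:
        # No amplification (the wells marked * in Fig. 5B): no Ct, no value.
        return None
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_calibrator_target - ct_calibrator_reference
    return amplification ** -(delta_sample - delta_calibrator)
```

Because each cycle corresponds to a twofold change under this assumption, a difference of about 7 cycles already exceeds 100-fold, which is why such data are plotted on a log axis.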

The quantified transcript levels showed no strong correlation between expression level and distance from the origin. For instance, close examination of the four isolates from library A (A1 to A4) showed a large variation in the expression level of alsS between these isolates, whose integration sites were at 3.8, 4.5, 4.2, and 1.4 Mb, respectively (Fig. 3B). Notably, the expression level of A1 was very low despite the integration’s close proximity to the replication origin (oriC, 3.9 Mb), whereas A4 exhibited a very high expression level even though its integration site was the farthest from the origin among these four integrants (Fig. 5B). This observation is consistent with previous studies (32, 33) and exemplifies that position-dependent expression is affected by a myriad of known and yet-to-be-elucidated factors, rather than simply by the gene dosage determined by the distance from the origin.
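The distances involved can be checked with a short calculation on the circular chromosome. The sketch assumes an approximate E. coli K-12 genome length of 4.64 Mb; the oriC and insertion positions are those given above.

```python
# Sketch of circular distance from oriC for the four library A insertions.
# Assumption: an approximate E. coli K-12 genome length of 4.64 Mb; the oriC
# position (3.9 Mb) and insertion positions are those quoted in the text.

GENOME_MB = 4.64
ORIC_MB = 3.9

LIBRARY_A_SITES_MB = {"A1": 3.8, "A2": 4.5, "A3": 4.2, "A4": 1.4}


def distance_from_oric(position_mb: float, oric_mb: float = ORIC_MB,
                       genome_mb: float = GENOME_MB) -> float:
    """Shortest distance along the circular chromosome, in Mb."""
    d = abs(position_mb - oric_mb) % genome_mb
    return min(d, genome_mb - d)


distances = {name: round(distance_from_oric(pos), 2) for name, pos in LIBRARY_A_SITES_MB.items()}
ranked_near_to_far = sorted(distances, key=distances.get)

if __name__ == "__main__":
    print(distances)
    print(ranked_near_to_far)
```

Measured the short way around the chromosome, A1 lies about 0.1 Mb from oriC and A4 about 2.1 Mb (rather than 2.5 Mb), so a dosage-only model would rank A1 highest and A4 lowest, the opposite of the alsS expression observed for these two isolates.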

With the gene transcript profiles of these isolates, we sought to elucidate the relationship between gene expression and production phenotype, and several interesting observations emerged. As expected, isolates from library A have lower expression of ilvD than isolates from libraries B and C. However, ilvC is up-regulated even in certain strains that do not have an ilvC insertion (A3, A4, and B1). These strains have high expression levels of alsS, so it is likely that accumulation of acetolactate, the product of AlsS, induces higher transcription of ilvC in these strains, as has been previously observed (49). We also noted a similar pattern of relative expression levels within the pathway (ilvC highest, followed by alsS and then ilvD) in the medium- and high-production isolates (A3, A4, B1, C5, C6, and C7), which was also seen in the plasmid strain. Most notably, however, there was no clear relationship between gene expression levels and production. In particular, although the top four analyzed isolates (B1, C5, C6, and C7) all showed much higher expression of ilvD than the next two, substantially lower-producing isolates (A3 and A4), it is not clear what causes C7 to outproduce the other three isolates or why differences in gene expression levels across these three do not lead to differences in production. Examination of the low-production isolates also provided some insights. Low expression, especially of alsS, appears to have caused low production in A1 and A2. Another isolate, C1, showed much higher expression of the three pathway genes but did not exhibit increased production. We suspect that this could be due to deleterious effects of disrupting the genes at the integration sites or to poor pathway balancing. These findings underscore the highly entangled and complex nature of optimizing pathway gene expression for a specific production phenotype.

We have here analyzed expression only at the mRNA level, and it should be noted that posttranscriptional processes often have considerable effects on protein abundance and activity (50). Further study involving quantitative proteomics and enzyme activity measurements may provide further insights into the mechanisms underlying the performance of the integrated strains.

Construction and characterization of strains with all isobutanol pathway genes integrated

To generate strains in which the entire isobutanol pathway is expressed from the chromosome, we integrated the adhA and kivD genes into isolate C7. Because these genes were expressed from a high–copy number plasmid in the work of Atsumi et al. (38), we expected that high expression levels would be needed to achieve suitable production levels. We used the chemically inducible chromosomal evolution (CIChE) method of Tyo et al. (18) to generate multiple copies of a gene at a single integration site. In this approach, the genes of interest are placed alongside an antibiotic resistance marker between 1-kb homology regions that enable RecA-mediated recombination, so that the copy number increases in response to increasing antibiotic concentration (Fig. 6A). If desired, the copy number can be stabilized by deleting the recA gene. We integrated the CIChE construct containing kivD, adhA, and the chloramphenicol resistance gene cat into the aslB site, a high-expression site (32). Before integrating this construct into C7, we also integrated it into a clean background strain, passaged this strain into increasing chloramphenicol concentrations, deleted the recA gene to stabilize the copy number, and measured the construct copy number by qPCR of the cat gene (fig. S3). These measurements demonstrated that this method can increase the copy number to about 60 copies. We tested production by the C7 strain carrying the CIChE kivD-adhA integration (C7 CIChE) in chloramphenicol (50 μg/ml) and found that it produced more isobutanol than a strain with a single copy of kivD and adhA integrated into the same site. Further increasing the chloramphenicol concentration did not increase production. The fully integrated strain C7 CIChE performed very similarly to C7 pSA65, generating isobutanol titers and yields close to, though lower than, those of the double-plasmid strain (Fig. 6, B and C). Notably, however, this strain reached higher cell densities than the plasmid-bearing strains (Fig. 6D).

Fig. 6 Fully integrated strain performance compared with strains with plasmids.

(A) Overview of the CIChE approach for gene copy number amplification. A construct containing the genes of interest, kivD and adhA, along with the chloramphenicol resistance marker, cat, flanked by two identical 1-kb homology regions, is integrated into the E. coli chromosome. RecA-mediated recombination of the homology regions results in some daughter cells having a higher copy number of the integrated genes, and these cells are thus able to tolerate higher chloramphenicol concentrations. (B) Isobutanol production by strain C7 after integration of kivD-adhA by CIChE, compared with C7 pSA65, which has alsS and ilvCD integrated into the genome and kivD and adhA expressed from the pSA65 plasmid, and JCL260 ΔlysA pSA65/9, which has all five pathway genes expressed from the plasmids. Error bars represent the SD of three biological replicates. (C) The yields of the cultures based on glucose consumed. The inset in (C) lists the percentage of the theoretical maximum yield (0.41 g of isobutanol per gram of glucose). (D) Cell densities of the production cultures whose performance is displayed in (B) and (C). OD600, optical density at 600 nm. (E) Expression levels of kivD and adhA, relative to the kivD level in JCL260 ΔlysA pSA65/9. RNA was harvested 4 hours after induction with IPTG. Error bars represent the SD of three technical replicates.
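The theoretical maximum quoted in the inset of Fig. 6C follows from the net stoichiometry of the pathway, C6H12O6 → C4H10O + 2 CO2 + H2O, that is, at most 1 mol of isobutanol per mol of glucose, which gives 74.12/180.16 ≈ 0.41 g/g. The minimal Python sketch below reproduces that number and converts an observed yield into a percentage of it. The function names and the titer and glucose values in the usage line are hypothetical illustrations, not data from this study.

```python
# Molar masses in g/mol (standard atomic weights).
MW_GLUCOSE = 180.156     # C6H12O6
MW_ISOBUTANOL = 74.122   # C4H10O


def theoretical_max_yield() -> float:
    """Maximum mass yield (g isobutanol per g glucose).

    Assumes the net stoichiometry C6H12O6 -> C4H10O + 2 CO2 + H2O,
    i.e. at most 1 mol isobutanol per mol glucose consumed.
    """
    return MW_ISOBUTANOL / MW_GLUCOSE


def percent_of_theoretical(isobutanol_g_per_l: float,
                           glucose_consumed_g_per_l: float) -> float:
    """Observed yield on glucose consumed, as a percentage of the maximum."""
    if glucose_consumed_g_per_l <= 0:
        raise ValueError("glucose consumed must be positive")
    observed_yield = isobutanol_g_per_l / glucose_consumed_g_per_l
    return 100.0 * observed_yield / theoretical_max_yield()


if __name__ == "__main__":
    print(f"max yield: {theoretical_max_yield():.3f} g/g")
    # Hypothetical culture: 5.0 g/L isobutanol from 20.0 g/L glucose consumed.
    print(f"{percent_of_theoretical(5.0, 20.0):.1f}% of theoretical")
```

Yield is computed on glucose consumed rather than glucose supplied, matching the definition in panel (C).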

We also examined the kivD and adhA transcript levels in the fully integrated strain compared with the plasmid-bearing strains. As expected, they were lower in C7 CIChE. The magnitude of the difference (more than two orders of magnitude), however, was quite unexpected (Fig. 6E). This indicates that the expression levels provided by the high–copy number plasmid are unnecessarily high, and it further demonstrates that high production can be achieved with considerably lower pathway gene expression from the chromosome.


Synthetic biology is making strides toward solutions to some of society’s most pressing problems. The challenges of improving strain stability and reducing cell-to-cell variability, however, still present major hurdles to overcome (51). Moving gene expression from plasmids to the genome can help to manage these issues. Yet, it remains difficult to predict the expression levels that will be optimal for production and achievable by genomic integration into a particular site. Moreover, the interplay between gene integration and surrounding regions on the chromosome, such as gene disruption and expression level perturbation, could cause complications and additional difficulty in optimizing pathway expression.

The new approach of screening random integration libraries developed in the present study bypasses the challenges of rational selection of integration locations. The transposon integration method is versatile and applicable to almost any gene of interest. Because the integration is not homology-based (unlike λ-Red recombinase–based integration, for example), it can even be used on genes that are already natively present in the genome at suboptimal expression levels, such as ilvC in the work presented here. The high-throughput screening enables large libraries to be assessed, making it feasible to multiplex integrations of different pathway genes.

The application of the approach to the isobutanol pathway demonstrates its effectiveness and has generated a strain with, to our knowledge, the highest production titers yet reported in the literature for an E. coli strain with the isobutanol pathway chromosomally integrated. This chromosomally integrated strain, which achieves production performance close to that of a plasmid-based strain with substantially higher pathway gene expression levels, also provides a notable example of how plasmid-based expression can be unnecessarily high. We expect that the lower expression levels in the integrants may be beneficial under more stressful conditions, alleviating unnecessary demands on the cell. Even under the relatively optimal fermentation conditions used here, the integrated strains reach higher cell densities than the plasmid-bearing strains, which may, in part, be a result of reduced burden on these strains. Furthermore, we found that this approach can take advantage of beneficial gene deletions that may not be obvious choices for targeted deletion to yield optimal production.

Our results also open intriguing questions for future investigation, the answers to which would be valuable in guiding rational genome engineering. For example, it will be interesting to learn which factors cause a strain with relatively high expression levels of all pathway genes, such as C1, to produce substantially less than other strains. Also intriguing is the fact that in the highest-producing isolate we identified, C7, as well as in another high-producing isolate, B2, the gene integrations are located quite close to each other. It remains unclear whether this good performance arises because that region provides optimal expression levels, because the proximity of the constructs confers some benefit, or merely by chance. Deep sequencing of droplet pools collected at different screening stringencies could provide further insight into whether proximity improves production. In addition, studies with targeted integrations could be designed to probe potential mechanisms. Even without answers to these questions about the mechanisms underlying the strain improvements, the approach presented here, which combines generation of large integration libraries by transposon integration with high-throughput screening by SnoCAP, provides a streamlined and broadly applicable method for generating efficient chromosomally integrated biochemical producers.


Strains and plasmids

Strains and plasmids used are listed in table S1. JCL260, pSA65, and pSA69 (38, 52) were provided by J. Liao, University of California, Los Angeles (UCLA). Keio strains were obtained from the E. coli Genetic Stock Center (CGSC). Plasmid pSAS31 was constructed by S. Scholz (33).


M9IPG, consisting of M9 salts (47.8 mM Na2HPO4, 22.0 mM KH2PO4, 8.55 mM NaCl, 9.35 mM NH4Cl, 1 mM MgSO4, and 0.3 mM CaCl2), micronutrients [2.91 nM (NH4)2MoO4, 401.1 nM H3BO3, 30.3 nM CoCl2, 9.61 nM CuSO4, 51.4 nM MnCl2, 6.1 nM ZnSO4, and 0.01 mM FeSO4], thiamine HCl (3.32 μM), and dextrose (D-glucose) at the stated concentrations, was used for all culturing experiments. For SnoCAP screening, in both microdroplets and microplates, the glucose concentration was 20 g/liter, and the medium was supplemented with 3 mM L-isoleucine and kanamycin (50 μg/ml). For isobutanol production monocultures, the medium contained glucose (36 g/liter) and yeast extract (5 g/liter), and no antibiotics were added. Precultures for both screening and production cultures were carried out in LB Lennox with antibiotics appropriate to the strain. When used, antibiotics were supplied at the following concentrations: ampicillin, 100 μg/ml; kanamycin, 50 μg/ml; tetracycline, 10 μg/ml; and chloramphenicol, 50 μg/ml.
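The M9 salts above are specified as millimolar concentrations, whereas reagents are weighed in grams. The sketch below converts the listed concentrations into grams per liter of final medium. It assumes anhydrous salts, because the hydration state of each reagent is not stated here; hydrated forms such as Na2HPO4·7H2O have larger molar masses and would need their own entries. The dictionary and function names are illustrative only.

```python
# Molar masses (g/mol) of the anhydrous salts. Assumption: anhydrous
# reagents, since the hydration state is not specified in the text.
MOLAR_MASS_G_PER_MOL = {
    "Na2HPO4": 141.96,
    "KH2PO4": 136.09,
    "NaCl": 58.44,
    "NH4Cl": 53.49,
    "MgSO4": 120.37,
    "CaCl2": 110.98,
}

# Final concentrations of the M9 salts in M9IPG (mM), as listed in the text.
M9_SALTS_MM = {
    "Na2HPO4": 47.8,
    "KH2PO4": 22.0,
    "NaCl": 8.55,
    "NH4Cl": 9.35,
    "MgSO4": 1.0,
    "CaCl2": 0.3,
}


def grams_needed(salt: str, millimolar: float, volume_l: float = 1.0) -> float:
    """Mass (g) of an anhydrous salt that gives `millimolar` in `volume_l` liters."""
    return millimolar / 1000.0 * MOLAR_MASS_G_PER_MOL[salt] * volume_l


def m9_salt_masses(volume_l: float = 1.0) -> dict:
    """Grams of each M9 salt required for the requested final volume."""
    return {salt: grams_needed(salt, mm, volume_l)
            for salt, mm in M9_SALTS_MM.items()}


if __name__ == "__main__":
    for salt, grams in m9_salt_masses(1.0).items():
        print(f"{salt:8s} {grams:.3f} g/L")
```

For example, 47.8 mM Na2HPO4 corresponds to about 6.79 g of the anhydrous salt per liter.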

Transposon integration library construction

All primers and oligonucleotides were ordered from Integrated DNA Technologies Inc. and are listed in table S2. Each integration construct was amplified by PCR from the pSA69 plasmid with primers that add the Tn5 mosaic ends. Primers phosph_transp_kan_for and phosph_transp_alsS_rev were used to amplify kan-PLlacO1-alsS (for libraries A, B, and C). Primers phosph_transp_PLlacO1_ilvC_for and phosph_transp_ilvD_rev were used to amplify ilvCD and introduce a PLlacO1 promoter in front of ilvC (for library C). Primers phosph_transp_PLlacO1_ilvD_for and phosph_transp_ilvD_rev were used to amplify ilvD and introduce a PLlacO1 promoter in front of ilvD (for library B). The linear PCR product was then digested with DpnI and SpeI [New England Biolabs (NEB)] to digest the template plasmid, phosphorylated with T4 polynucleotide kinase (NEB), purified with a PCR clean-up kit (Qiagen), and eluted in TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 7.5). The DNA was then reacted with EZ-Tn5 transposase (Lucigen) according to the manufacturer's instructions, and the resulting transposome was electroporated into the appropriate strain. The cells were recovered for 1.5 hours in 1 ml of super optimal broth with catabolite repression (SOC) medium. Then, 50 μl was diluted and plated on LB plates with kanamycin to assess library size, and the remaining cells were grown to saturation in 100 ml of selective medium [either LB with kanamycin (50 μg/ml) for alsS integration or M9IPG with 0.1 mM IPTG and 3 mM lysine for ilvCD or ilvD integration]. The cells were then resuspended in fresh LB with 25% glycerol and frozen in 1-ml aliquots, which were later thawed, washed, and grown in LB with kanamycin (50 μg/ml) to prepare them for screening.
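Library size is assessed here by plating part of the recovery culture on selective plates. The sketch below shows the usual back-calculation from a plate count to the total number of transformants in the recovery volume. The function name, the dilution scheme, and the colony count in the usage line are hypothetical; the roughly 1-ml recovery volume is taken from the SOC volume above and ignores the small added cell volume. Because cells can divide during the 1.5-hour recovery, the estimate is best read as an upper bound on the number of independent insertion events.

```python
def estimate_transformants(colonies: int,
                           plated_volume_ul: float,
                           dilution: float,
                           recovery_volume_ul: float = 1000.0) -> float:
    """Estimate total transformants in the recovery culture from one plate count.

    colonies: colonies counted on one LB + kanamycin plate
    plated_volume_ul: volume of the diluted suspension spread on that plate
    dilution: fraction of recovery culture in the plated suspension
              (e.g. 0.01 for a 1:100 dilution)
    recovery_volume_ul: total recovery volume (about 1 ml SOC assumed here)
    """
    if colonies < 0 or plated_volume_ul <= 0 or not 0 < dilution <= 1:
        raise ValueError("invalid plating parameters")
    # Colony-forming transformants per microliter of undiluted recovery culture.
    transformants_per_ul = colonies / (plated_volume_ul * dilution)
    return transformants_per_ul * recovery_volume_ul


if __name__ == "__main__":
    # Hypothetical count: 150 colonies from 100 ul of a 1:100 dilution.
    print(f"~{estimate_transformants(150, 100.0, 0.01):.2e} transformants")
```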

SnoCAP screening

SnoCAP screening for 2-KIV production, in droplet and microplate formats, was performed as previously described (36). Stationary-phase cultures in LB were used as the inocula for both formats. For the microplate format, K12 ΔilvD was used as the sensor strain. M9IPG with glucose (20 g/liter), 3 mM isoleucine, and kanamycin (50 μg/ml) was used as the medium for all screening cocultures except those with the ilvD and ilvCD libraries (without alsS integration), for which kanamycin was omitted. For the droplet format, K12 ΔilvD pSAS31, which expresses mNeonGreen, was used as the sensor strain. For the droplet sorting assay, the droplet collection device was soaked in a mixture of HFE-7500 oil and water for several days before use to improve droplet stability after collection.
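In the droplet format, mNeonGreen fluorescence from the K12 ΔilvD pSAS31 sensor serves as the readout of 2-KIV production by the co-encapsulated library member, so sorting amounts to keeping the brightest droplets. The gating rule actually used is described in (36) and is not reproduced in this section. The sketch below simply assumes a fixed "keep the top fraction" gate, with hypothetical function names and fluorescence values, to illustrate how screening stringency maps onto a threshold.

```python
from typing import List, Sequence


def top_fraction_threshold(fluorescence: Sequence[float],
                           top_fraction: float) -> float:
    """Fluorescence threshold that keeps roughly the brightest `top_fraction` of droplets."""
    if not fluorescence:
        raise ValueError("no droplets measured")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(fluorescence, reverse=True)
    # Keep at least one droplet even for very stringent gates.
    n_keep = max(1, round(len(ranked) * top_fraction))
    return ranked[n_keep - 1]


def select_droplets(fluorescence: Sequence[float], threshold: float) -> List[int]:
    """Indices of droplets at or above the threshold (ties may keep a few extra)."""
    return [i for i, f in enumerate(fluorescence) if f >= threshold]


if __name__ == "__main__":
    # Hypothetical signals for 100 droplets; keep the brightest 5%.
    signals = [float(x) for x in range(1, 101)]
    gate = top_fraction_threshold(signals, 0.05)
    print(gate, len(select_droplets(signals, gate)))
```

Lowering `top_fraction` raises the stringency of the screen at the cost of recovering fewer droplets.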

Isobutanol fermentations and analysis

Isobutanol cultures were grown essentially as previously described (36). Overnight cultures in LB with antibiotics were diluted 1:100 (v/v) into 10 ml of M9IPG with glucose (36 g/liter), yeast extract (5 g/liter), and no antibiotics in 125-ml baffled, unvented polypropylene flasks. Cells were grown for 2.5 to 3 hours at 37°C, 250 rpm, followed by induction of the pathway genes with 0.1 mM IPTG. Flasks were then sealed with parafilm and incubated at 30°C, 250 rpm. Samples were taken daily to measure cell density and glucose and isobutanol concentrations. Cell density was assessed by measuring the OD600 (optical density at 600 nm) of 200 μl of culture, diluted into the linear range, in a VersaMax microplate reader (Molecular Devices). Glucose and isobutanol were measured by high-performance liquid chromatography, as previously described (36). For cultures from which RNA was harvested, 0.5 ml of cells was collected with RNAprotect Bacteria Reagent (Qiagen) 3 to 4 hours after induction. RNA was isolated using an RNA mini kit (Qiagen) according to the manufacturer's instructions, including on-column DNase (deoxyribonuclease) I digestion. An additional digestion using the Turbo DNA-free Kit (Invitrogen) was performed after purification to eliminate any remaining genomic DNA. The RNA was then reverse transcribed to complementary DNA (cDNA) using MultiScribe Reverse Transcriptase (Invitrogen) with approximately 400 ng of RNA per 20-μl reaction and the following temperature profile: 25°C, 10 min; 37°C, 120 min; and 85°C, 5 min. A no-reverse-transcriptase control was carried out for each sample and checked with at least one primer set during real-time PCR analysis to ensure the absence of residual genomic DNA. qPCR was conducted on the cDNA using SYBR Green qPCR Master Mix (Life Technologies) in 20-μl reactions. Reactions were run in 96-well plates on a 7900HT Fast Real-Time PCR Machine (Applied Biosystems), courtesy of the University of Michigan Advanced Genomics Core. The PCR program was as follows: 50°C, 2 min; 95°C, 10 min; and 40 cycles of 95°C for 15 s and 60°C for 1 min. Primers used are listed in table S2. Data were analyzed using the 2^(−ΔΔCT) method (53).
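The 2^(−ΔΔCT) method normalizes the target Ct to a reference gene within each sample (ΔCT) and then to a calibrator sample (ΔΔCT); in Fig. 6E the calibrator is JCL260 ΔlysA pSA65/9. A minimal sketch follows. The reference gene is not named in this section, so the arguments are generic, and the Ct values in the usage line are hypothetical. Like the published method, it assumes that target and reference amplify with roughly 100% efficiency (doubling each cycle).

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_sample: Sequence[float],
                        reference_sample: Sequence[float],
                        target_calibrator: Sequence[float],
                        reference_calibrator: Sequence[float]) -> float:
    """Fold change relative to the calibrator by the 2^(-ddCt) method.

    Each argument holds Ct values from technical replicates; replicates are
    averaged before the differences are taken.
    """
    delta_ct_sample = mean(target_sample) - mean(reference_sample)
    delta_ct_calibrator = mean(target_calibrator) - mean(reference_calibrator)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 7 cycles later
    # (relative to the reference) in the sample than in the calibrator,
    # i.e. 2**-7, about 128-fold lower expression.
    fold = relative_expression([25.0, 25.1, 24.9], [18.0, 18.0, 18.0],
                               [18.0, 18.0, 18.0], [18.0, 18.0, 18.0])
    print(f"relative expression = {fold:.4f}")
```

Under the doubling assumption, a difference of more than two orders of magnitude corresponds to a ΔΔCT of at least about 6.6 cycles, because 2^6.64 ≈ 100.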

kivD and adhA integration

A CIChE construct consisting of kivD, adhA, and a chloramphenicol resistance gene, cat, flanked by 1-kb homology regions, was previously generated and integrated into the aslB locus of the NV3r1 strain by λ-Red recombineering (36). Following the method of Tyo et al. (18), we passaged this strain into increasing concentrations of chloramphenicol and then deleted the recA gene by P1 transduction from donor strain BW26547 ΔrecA::kan Lambda recA+ (obtained from the CGSC). Transductants were selected on LB plates with kanamycin (50 μg/ml) and the corresponding concentration of chloramphenicol. Genomic DNA was isolated using the DNeasy Blood & Tissue Kit (Qiagen), and the copy number was determined by qPCR using the same primers as Tyo et al. (18), with one set for the cat gene and one for the bioA gene, which served as a single-copy reference gene. Absolute copy number was determined by comparison to a strain with a single copy of cat integrated into the intC locus (NV3r1 ΔintC::yfp-cat). To transfer this integration into the C7 strain generated in this study, we prepared P1 lysates from NV3r1 (recA+) carrying the kivD-adhA CIChE construct after growth in chloramphenicol (80 μg/ml) and transduced them into the C7 strain, selecting on LB plates with chloramphenicol (40 μg/ml). The resulting colonies were streaked for isolation twice on LB agar with chloramphenicol (40 μg/ml) and 0.8 mM sodium citrate.
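The copy-number measurement normalizes the cat signal to the single-copy bioA gene and then to the single-copy NV3r1 ΔintC::yfp-cat strain. The minimal sketch below shows that arithmetic. It adds optional per-amplicon efficiency terms (a Pfaffl-style efficiency correction, which is our addition and not something stated in the text); with both efficiencies left at 2.0 it reduces to the standard 2^(−ΔΔCT) calculation. The function name and the Ct values in the usage line are hypothetical.

```python
def copy_number(ct_cat: float, ct_bioa: float,
                ct_cat_single: float, ct_bioa_single: float,
                eff_cat: float = 2.0, eff_bioa: float = 2.0) -> float:
    """Estimate cat copies per genome relative to a single-copy calibrator.

    ct_cat, ct_bioa: Ct values for the strain being measured
    ct_cat_single, ct_bioa_single: Ct values for the single-copy cat strain
    eff_cat, eff_bioa: per-cycle amplification factors (2.0 = 100% efficiency)
    """
    # More cat template in the sample means an earlier (lower) Ct.
    cat_ratio = eff_cat ** (ct_cat_single - ct_cat)
    # bioA is single copy in both strains, so this ratio corrects for
    # differences in the total amount of genomic DNA loaded.
    bioa_ratio = eff_bioa ** (ct_bioa_single - ct_bioa)
    return cat_ratio / bioa_ratio


if __name__ == "__main__":
    # Hypothetical Ct values: cat crosses threshold 6 cycles earlier than in the
    # single-copy strain while bioA is unchanged, i.e. about 2**6 = 64 copies.
    print(copy_number(16.0, 20.0, 22.0, 20.0))
```

If the amplification efficiencies of the cat and bioA primer sets differ noticeably from 100%, measuring them from a dilution series and passing them in avoids systematically over- or underestimating high copy numbers.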


Supplementary material for this article is available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We thank S. Scholz and P. Freddolino for valuable discussions regarding position-dependent gene expression variation and transposon footprinting, as well as M. Burns, L. Eniola-Adefeso, and C. Barr for the use of equipment in laboratories. The isobutanol strains and plasmids were originally provided by J. Liao (UCLA, formerly). The CIChE construct was constructed using the pTGD plasmid provided by K. Tyo (Northwestern U). Funding: This work was supported by the USDA AFRI NIFA Fellowships Grant Program (grant no. 2016-67011-24725). Author contributions: T.E.S. and X.N.L. conceived and designed the study. T.E.S. and A.K. constructed the integration libraries. M.T.C. and K.K. developed the droplet sorting platform. T.E.S. and M.T.C. performed the library screening. T.E.S. and D.N.C. characterized the library isolates and fully integrated strains. T.E.S. and X.N.L. wrote the manuscript. All authors discussed the results and commented on the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
