Regulatory encoding of quantitative variation in spatial activity of a Drosophila enhancer


Science Advances  02 Dec 2020:
Vol. 6, no. 49, eabe2955
DOI: 10.1126/sciadv.abe2955


Developmental enhancers control the expression of genes prefiguring morphological patterns. The activity of an enhancer varies among cells of a tissue, but collectively, expression levels in individual cells constitute a spatial pattern of gene expression. How the spatial and quantitative regulatory information is encoded in an enhancer sequence is elusive. To link spatial pattern and activity levels of an enhancer, we used systematic mutations of the yellow spot enhancer, active in developing Drosophila wings, and tested their effect in a reporter assay. Moreover, we developed an analytic framework based on the comprehensive quantification of spatial reporter activity. We show that the quantitative enhancer activity results from densely packed regulatory information along the sequence, and that a complex interplay between activators and multiple tiers of repressors carves the spatial pattern. Our results shed light on how an enhancer reads and integrates trans-regulatory landscape information to encode a spatial quantitative pattern.


Enhancers constitute a particular class of cis-regulatory elements that control in which cells a gene is transcribed, when, and at which rate (1, 2). Notably, enhancers play a central role during development in plants and animals (3), generating patterns of gene expression that delineate embryonic territories and prefigure future forms (4). How the information determining these patterns is encoded in a developmental enhancer has therefore been at the center of attention for several decades. Enhancers integrate spatial information from transcription factors (TFs) bound to them, and the number, affinity, and arrangement of TF binding sites (TFBSs) in the enhancer sequence are relevant to the enhancer spatial activity [reviewed in (5)]. However, the logic of TFBS organization that determines a spatial pattern is not sufficiently understood to reliably design a functional synthetic enhancer driving correct expression levels (6, 7).

The study of developmental enhancers has been polarized by two conceptions of gene expression patterns. Until recently, most studies have referred to enhancer activities in qualitative terms exclusively, where the notion of spatial pattern evokes discrete and relatively homogeneous domains of gene expression (8). With the rise of genomics from the early 2000s, it has become possible to precisely measure gene expression and, by extension, enhancer activity. However, whether it is measured in a given tissue or in single cells, this quantification of gene expression is done at the expense of losing spatial information [e.g., (9–11)], with few exceptions [e.g., (12, 13)]. It is nevertheless critical to appreciate that the overall levels and the spatial pattern of activity in a given tissue are intrinsically linked. Therefore, to understand how a spatial pattern of gene expression is encoded in the sequence of an enhancer, it is necessary to measure quantitative variation of gene expression in space in the tissue where the enhancer is active. Leading this endeavor, recent studies have quantified spatial enhancer activity but without considering the pattern itself as a quantitative object (13–18).

To pursue this effort of measuring quantitative variation in spatial gene expression, we have analyzed the structure and the functional logic of a compact Drosophila enhancer sequence with quantitative measurements of its spatial activity in fly wings. The so-called spot196 enhancer, from the yellow gene of the fruit fly Drosophila biarmipes, drives a patterned gene expression in pupal wings with heterogeneous expression levels among cells (19–21). The spot196 enhancer sequence contains at least four TFBSs for the activator Distal-less (Dll) and at least one TFBS for the repressor Engrailed (En) (Fig. 1A) (19, 20). Together, these inputs were considered to be sufficient to explain the spatial activity of spot196 in the wing, with activation in the distal region and repression in the posterior wing compartment (19, 20). Grafting TFBSs for these factors on a naïve sequence in their native configuration, however, proved insufficient to produce regulatory activity in wings. This prompted us to dissect the spot196 element further to identify what determines its regulatory activity, considering simultaneously spatial pattern and activity levels.

Fig. 1 A mutational scan of the D. biarmipes spot196 enhancer with a quantitative reporter assay.

(A) Wild-type ([+]) and mutant ([0] to [16]) versions of the spot196 enhancer from the D. biarmipes yellow locus (depicted at the top) were cloned upstream of a DsRed reporter to assay their respective activities in transgenic D. melanogaster. Each mutant targets a position of the enhancer, where the native sequence was replaced by an A-tract (color code: light green, guanine; purple, adenine; dark green, cytosine; pink, thymine). Four characterized binding sites for the TF Distal-less (Dll-a, Dll-b, Dll-c, and Dll-d) (19) are highlighted in red, and a single binding site for the TF Engrailed (20) is highlighted in blue across all constructs. (B) Average wing reporter expression for each construct depicted in (A) and an empty reporter vector (ø). Each average image is produced from 11 to 77 individual wing images (38 on average; data file S2), aligned onto a unique wing model. The average image is smoothed, and intensity levels are indicated by a colormap. (C) Mutational effect on intensity of activity along the spot196 sequence. The phenotypic effect of each mutation described in (A) along the spot196 sequence (x axis) is plotted as the average level of expression across the wing relative to the wild-type average levels. Shaded gray areas around the curve represent the 95% confidence interval of the average levels per position. “1” on the y axis represents the mean wild-type intensity of reporter expression. The graph shows how each construct departs from the wild-type activity (see Materials and Methods). Mutation positions in constructs [0] to [16] are indicated above the graph. The locations of blocks A, B, and C, analyzed in Fig. 3, are also indicated above the graph. The yellow curve above the graph indicates the helical phasing.

We first introduced systematic small-scale mutations along the 196 base pairs (bp) of the enhancer sequence to test the necessity of the mutated positions; we then randomized large blocks of the enhancer sequence to test the sufficiency of the remaining intact sequence to drive activity. To assess the activity of each mutant enhancer, we devised a pipeline that uses comprehensive descriptors to quantify variations in reporter activity levels across the wing of Drosophila melanogaster transgenic lines. Our quantitative analysis revealed a high density of regulatory information, with all mutated positions along the spot196 enhancer sequence contributing significantly to the activity levels. It also outlined an unanticipated regulatory logic for this enhancer, where the spatial pattern in the wing results from a complex interplay between activators and multiple tiers of repressors carving a spatial pattern.


Regulatory information distributed along the entire spot196 enhancer contributes to its quantitative spatial activity in the wing

We first systematically evaluated the potential role of all positions along the spot196 enhancer sequence to produce an activity pattern and wild-type levels of gene expression. We generated a series of mutants scanning the element and thereby testing the necessity of short adjacent segments to the enhancer function. Notably, we made no prior assumption (e.g., predicted TFBSs) on the function of the mutated nucleotides. We maximized the disruption of sequence information by introducing stretches of 10 to 18 bp (11.5 bp on average) of poly(dA:dT), also known as A-tracts (22), at adjacent positions along the sequence (Fig. 1A). Thus, the sequence of each of the 17 constructs (spot196 [0] to spot196 [16], or [0] to [16] in short; Fig. 1A) is identical to the wild-type spot196 ([+] in short), except for one segment where the sequence was replaced by the corresponding number of adenines. These mutations affect the local sequence composition, without changing distances or helical phasing in the rest of the enhancer. We measured activities of each mutant enhancer in the wing of the corresponding reporter construct line of D. melanogaster, here used as an experimental recipient for site-specific integration. In brief, for each reporter construct line, we imaged individually around 30 male wings (1 wing per fly) under bright-field and fluorescent light. We detected the venation on the bright-field images of all wings and used it to compare reporter activity across wings. For this, we applied a deformable model to warp the fluorescent image of each wing, using landmarks placed along the veins of the corresponding bright-field image and aligning them to a reference venation (see Materials and Methods for details). The resulting dataset is a collection of fluorescence images for which the venation of all specimens is perfectly aligned. These images, represented as the list of fluorescence intensity of all pixels, constitute the basis of all our quantitative dissection. 
To assess whether the activity driven by a given enhancer sequence significantly differs from any other, wild type or mutant, we used the scores produced by principal components analysis (PCA) that comprehensively summarizes the variation of the pixel intensities across wings. To visualize the reporter activity per line, we used images representing the average activity per pixel (hereafter average phenotype).
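The two descriptors named above (the average phenotype and PCA scores of pixel intensities) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the synthetic image stacks, and the choice of two components are assumptions made here, and the images are assumed to be already registered to a common venation.

```python
import numpy as np

def average_phenotype(images):
    """Mean intensity per pixel across registered wing images of shape (n, h, w)."""
    return np.asarray(images, dtype=float).mean(axis=0)

def pca_scores(images, n_components=2):
    """Project each wing, flattened to a pixel vector, onto the top principal components."""
    x = np.asarray(images, dtype=float)
    x = x.reshape(x.shape[0], -1)       # one row per wing, one column per pixel
    x = x - x.mean(axis=0)              # center each pixel across wings
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Synthetic stand-ins for two reporter lines (hypothetical data, 30 wings each).
rng = np.random.default_rng(0)
wild_type = rng.normal(1.0, 0.1, size=(30, 8, 16))
mutant = rng.normal(0.5, 0.1, size=(30, 8, 16))

scores = pca_scores(np.concatenate([wild_type, mutant]))
wt_avg = average_phenotype(wild_type)
```

In this toy case the first component separates the two lines; a statistical comparison between lines would then be run on such scores rather than on raw pixels.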

The activity of each mutant (Fig. 1B) differs significantly from that of [+], as measured in the PCA space (fig. S1 and data file S1). This means that the activity of each mutant had some features, more or less pronounced, that significantly differentiate its activity from [+], revealing the high density of regulatory information distributed along the sequence of spot196. The magnitude and direction of the effects, however, vary widely among mutants, ranging from activity levels well above those of [+] to a near-complete loss of activity.

The average activity levels of each mutant construct in the wing relative to the average activity levels of [+] show how effect directions and intensities are distributed along the enhancer sequence (Fig. 1C). This distribution of regulatory information and the magnitude and direction of the effects, including several successions of overexpressing and underexpressing mutants, suggest a more complex enhancer structure than previously thought (20). The density of regulatory information is also reminiscent of what has been found for other enhancers (13, 23, 24).

In principle, the localized mutations we introduced can affect the spot196 enhancer function through nonexclusive molecular mechanisms. Mutations may affect TF-DNA interactions by disrupting TFBS cores or by influencing TF binding at neighboring TFBSs [for instance, by altering DNA shape properties (25, 26)]. A-tract mutations may also influence nucleosome positioning and thereby the binding of TFs at adjacent sites (27). In addition, because of stacking interactions between adjacent As and Ts, A-tracts increase local DNA rigidity (22, 28, 29) and may thereby hinder or modulate TF interactions. These changes in rigidity, which we have evaluated for our mutant series (fig. S2A), may affect TF-TF interactions (fig. S2B). Regardless of the precise molecular mechanisms underlying the mutations we introduced in the spot196 sequence, we wanted to assess how they affect the integration of spatial information along the enhancer sequence.

An enhancer’s view on the wing trans-regulatory landscape revealed by logRatio images

We introduced a spatial visualization of the effect of a mutation on enhancer activity. For every pixel, we computed the log of the ratio between two average phenotypes (single mutant over [+]) (30), hereafter noted logRatio. The advantages of using logRatio are detailed in the Supplementary Materials and briefly summarized here. logRatio images show how much a mutant affects the enhancer activity across the wing in proportion to the local activity level. By contrast, the absolute difference in expression generally scales with the local level of expression, so effects in areas of high activity tend to be much more visible than those in areas of low activity (compare Fig. 2 and fig. S3). logRatio images instead represent the local proportional effects and are therefore suitable to reveal the variety of spatial effects of mutations, irrespective of the expression pattern itself.
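A short sketch may help contrast logRatio with a plain image difference. The function below is hypothetical. The logarithm base (2) and the small offset `eps` that guards against division by zero are assumptions of this example, since the text does not specify either.

```python
import numpy as np

def log_ratio(mutant_avg, reference_avg, eps=1e-6):
    """Pixel-wise log2 of mutant over reference average phenotypes.

    eps is an assumed offset that keeps dark pixels from dividing by zero.
    """
    m = np.asarray(mutant_avg, dtype=float)
    r = np.asarray(reference_avg, dtype=float)
    return np.log2((m + eps) / (r + eps))

# Toy average phenotypes: one bright pixel and one dim pixel, both halved in the mutant.
reference = np.array([[100.0, 1.0]])
mutant = np.array([[50.0, 0.5]])

difference = mutant - reference       # dominated by the bright pixel
ratio = log_ratio(mutant, reference)  # the same proportional effect at both pixels
```

Here the difference image reports -50 and -0.5, whereas the logRatio image reports approximately -1 at both pixels, which is why proportional effects in regions of low activity remain visible.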

Fig. 2 Trans-regulatory integration along the spot196 sequence.

(A) Average phenotypes reproduced from Fig. 1B. (B) logRatio images [log([mutant]/[+]) for intensity values of each pixel of registered wing images] reveal what spatial information is integrated by each position along the enhancer sequence. For instance, a blue region on an image indicates that the enhancer position contains information for activation in this region. When mutated, this enhancer position results in lower activity than [+] in this region of the wing. Note that logRatio illustrates local changes between [+] and mutants far better than image differences (fig. S3) in regions of relatively low activity. (C) Summary of spatial information integrated along the enhancer sequence.

Depending on how TF integration is modified by a mutant, logRatio images can also reflect the distribution of the individual spatial inputs received and integrated along the spot196 sequence. They can be particularly informative when both a TFBS and the spatial distribution of the cognate TF are known, as they shed light on how directly the TF information is integrated. This is the case for En and Dll, for which TFBSs have been previously characterized in spot196 (19, 20). The disruption of an En binding site (Fig. 1, A and B, construct [15]) resulted in a proportional increase of activity in the posterior wing compartment (75%, F(1,124) = 77.8, P = 8.8818 × 10^−15). The log([15]/[+]) image (Fig. 2) shows that mutant [15] proportionally affects the activity mostly in the posterior wing. The effect correlates with En distribution (20) and is consistent with the repressive effect of its TF. Contrary to what the average phenotypes suggested (Fig. 1C), mutant [16] shows a very similar logRatio to that of [15], albeit with only 25% increase in activity. The effect of mutant [16] was barely discernible when considering the variation in the overall fluorescence signal (Fig. 1C), illustrating the power of the logRatio analysis to detect local effects in areas of low activity. Mutations that disrupted characterized Dll binding sites (Fig. 1, A and B, constructs [0], [1], [7], and [9]) resulted in strong reduction in reporter expression (90%, F(1,74) = 143.3, P = 0; 75%, F(1,78) = 109.3, P = 2.2204 × 10^−16; 47%, F(1,107) = 75.4, P = 4.8073 × 10^−14; and 39%, F(1,74) = 23.2, P = 7.6363 × 10^−6, respectively; data file S1). The logRatio images for mutants [0], [1], and, to a lesser extent, [7] show a patterned decrease of activity in line with Dll distribution in the wing (Fig. 2) (19), with a proportionally stronger loss of activity toward the distal wing margin. This corroborates previous evidence that Dll binds to these sites.
The respective logRatio images for segments [0] and [1] correlate with levels of Dll across the wing. This suggests that these sites individually integrate mostly Dll information and do so in a near-linear fashion. Site [9], which produces a relatively different picture with areas showing overexpression, is discussed below. Mutations of Dll sites, however, have nonadditive effects, as mutants [0], [1], [7], and [9] result in a decrease of activity levels by 90, 75, 47, and 39% compared to [+], respectively. This nonadditivity could be explained by a strong cooperative binding of Dll at these sites or, alternatively, by considering that these Dll TFBSs interact with other sites in the sequence.
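One way to read the nonadditivity argument is as simple arithmetic. If each Dll site contributed an independent, additive share of the wild-type activity, the activity lost on mutating each site in turn could not sum to more than the whole. The sketch below only restates the percentages quoted above; the additive model it tests is an assumption made for illustration.

```python
# Fraction of wild-type activity lost when each Dll site is mutated (values from the text).
losses = {"[0]": 0.90, "[1]": 0.75, "[7]": 0.47, "[9]": 0.39}

# Under a purely additive model, the four losses would sum to at most 1.0 (100% of [+]).
total_loss = sum(losses.values())
exceeds_additive_bound = total_loss > 1.0
```

The four losses sum to about 2.5 times the wild-type activity, so the sites cannot be contributing independently.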

In addition, we noted that, despite mutating a Dll TFBS, mutant [9] showed a substantially different logRatio than [0] and [1] but similar to [8], with a repressing activity in the posterior wing compartment, proximally, and a distal activation (Fig. 2B). This dual effect could be explained by the disruption of the Dll site along with a distinct TFBS for a posterior repressor. Alternatively, a single TFBS could be used by different TFs with opposite activities. In this regard, we note that the homeodomains of Dll and En have similar binding motifs (31) and could both bind the Dll TFBS disrupted by [9] (and possibly [8]). The posterior repression of En and the distal activation of Dll seem compatible with this hypothesis.

Unraveling trans-regulatory integration along the spot196 sequence

Following the same approach, we next analyzed the information integrated in other segments. Apart from the known Dll and En TFBSs, the enhancer scan in Fig. 1C identified several segments with strong quantitative effects on the regulatory activity. Between the two pairs of Dll TFBSs, we found an alternation of activating sites [[3] and [6], reducing overall levels by 36% (F(1,69) = 17.6, P = 7.8336 × 10^−5) and 93% (F(1,98) = 284.9, P = 0) compared to [+], respectively] and strong repressing sites [[2], [4], and [5], with an overall level increase of 3.2-fold (F(1,72) = 511.5, P = 0), 1.9-fold (F(1,85) = 103.2, P = 2.2204 × 10^−16), and 2.7-fold (F(1,82) = 426.5, P = 0) compared to [+], respectively]. Construct [3] proportionally decreases the expression mostly around the wing veins (Fig. 2B), suggesting that this segment integrates information from an activator of the vein regions. We had found a similar activity for this region of yellow from another species, Drosophila pseudoobscura, where no other wing blade activity concealed it (20). The logRatio of mutant [6], with a stronger, more uniform effect than for the other mutants that repress the activity, suggests a different trans-regulatory integration than Dll sites. We have recently shown that this site regulates the chromatin state of the enhancer (21). Regarding segments with a repressive effect, mutants [4] and [5] result in a fairly uniform relative increase in expression, different from the activity of [2], indicating that the information integrated by these two regions ([2] versus [4] and [5]) likely involves different TFs. Three segments, [6], [0], and [1] (the last two containing previously known Dll binding sites), each decrease the activity levels by 75% or more. Finding additional strong repressive sites ([2], [4], and [5]) with a global effect on the enhancer activity across the wing is also unexpected.

The analysis revealed another activating stretch of the sequence, between 116 and 137 bp, as mutated segments [10] and [11] decreased activity by 56% relative to [+] and showed very similar logRatios. Mutant [12] showed a mixed effect, with practically no effect, in absolute terms, in the anterior distal wing quadrant. Last, segments [13], [14], and [15] showed a succession of repressing and activating sites, as we have seen for segments [2] to [6], although with a lower amplitude. Mutant [13] caused an overall increase in activity (1.4-fold relative to [+]) with, proportionally, a uniform effect across the wing (logRatio). By contrast, mutant [14] decreased the overall activity by 36%, with a logRatio indicating an activating effect in the spot region and a repressive effect in the proximal part of the posterior wing compartment, similarly to mutants [8] and [9] but with lesser effects.

Together, this first dissection, focusing on the necessity of segments for the enhancer activity at the scale of a TFBS, which is typically 10 bp long (32), suggested a much higher density of regulatory information in the spot196 enhancer than previously described (19, 20). The nonadditivity of effects at Dll binding sites, three repressing and four activating and previously unidentified segments distributed in alternation along the enhancer, and the variety of their effects pointed to a complex regulatory logic, involving more (possibly six to eight) factors than just Dll and En. We resorted to a different approach to further probe the regulatory logic of spot196.

An interplay of activating and repressing inputs produces a spatial pattern of enhancer activity

The first series of mutations informed us on the contribution of the different elementary components of the spot196 enhancer sequence to its regulatory activity. However, it failed to explain how the components integrated by each segment interact to produce the enhancer activity. To unravel the regulatory logic of this enhancer, we need to understand not only which segments are sufficient to drive expression but also how the elementary components underlying the regulatory logic influence each other. To evaluate the sufficiency of, and interactions between, different segments, we would need to test all possible combinations of mutated segments, namely, a combinatorial dissection. Doing this at the same segment resolution as above is unrealistic, because the number of constructs doubles with each additional segment. Instead, we used three sequence blocks of comparable sizes in the spot196 enhancer (A, B, and C, defined arbitrarily; Fig. 3A) and produced constructs where selected blocks were replaced by a randomized sequence (noted “-”). This second series, therefore, consists of eight constructs, including all combinations of one or two randomized blocks, a wild-type [ABC] (which has strictly the same sequence as [+] from the first series), and a fully randomized sequence, [---].
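To make the scale of a combinatorial dissection concrete: if every unit is either intact or mutated, n units give 2^n constructs. The sketch below enumerates the block series with the bracket notation used in the text and, for comparison, counts the combinations for the 17 segments of the first series. The variable names are illustrative only.

```python
from itertools import product

blocks = "ABC"

# Each block is either kept (its letter) or randomized ("-").
constructs = [
    "[" + "".join(b if keep else "-" for b, keep in zip(blocks, mask)) + "]"
    for mask in product([True, False], repeat=len(blocks))
]

# Same reasoning at the resolution of the first series (segments [0] to [16]).
n_segments = 17
n_segment_combinations = 2 ** n_segments
```

The list contains the eight constructs from [ABC] to [---], whereas the same design at segment resolution would require 131,072 constructs.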

Fig. 3 Regulatory interactions in the spot196 sequence.

(A) Schematics of constructs with block randomizations. The spot196 sequence was arbitrarily divided into three blocks (A, 63 bp; B, 54 bp; C, 79 bp). In each construct, the sequence of one, two, or all three blocks was randomized. (B) Terminology for parts of the wing where constructs from (A) drive reporter expression. (C) Average phenotypes resulting from constructs in (A). Constructs where single blocks remain indicate the sufficiency of these blocks to promote wing activity: A in the veins, B in the alula, and C at high levels across the wing blade. Constructs with two nonrandomized blocks show the effect of one block on the other. For instance, B is sufficient to suppress the wing blade activation promoted by C, as seen by comparing [-B-], [--C], and [-BC]. Colormap of average phenotypes normalized for all constructs of the block series, including block permutations of Fig. 4B. (D) Block interactions are best visualized with logRatio images of construct phenotypes shown in (C). For each logRatio, the denominator is the reference construct, and the image shows on a logarithmic scale how much the construct in the numerator changes compared to this reference. For instance, log([-BC]/[--C]) shows the effect of B on C, a global repression, except in the spot region. Colormap indicates an increase or a decrease of activity compared to the reference (denominator). For an overview of all comparisons, particularly the relative contribution of each block to the entire enhancer activity, see fig. S4 (C to F).

With these constructs, we can track which segments, identified in the first series as necessary for activation in the context of the whole spot196, are also sufficient to drive activity (table S3; see Fig. 1C for the correspondence between the two series of mutations). Of the three blocks (constructs [A--], [-B-], and [--C]), only block C is sufficient to produce activity levels comparable to those of the wild-type spot196 in the wing blade, although with a different pattern from [ABC] (fig. S4, A to C). Reciprocally, randomizing block C (construct [AB-]) results in a uniform collapse of the activity (fig. S4, A to C). We concluded that the sequence of block C contains information necessary and sufficient to drive high levels of activity in the wing in the context of our experiment. This is particularly interesting because C does not contain previously identified Dll TFBSs or strong activating segments. By contrast, blocks A and B, although they each contain two Dll sites, do not drive wing blade expression. The activating segments in block C revealed in the first dissection, particularly segments [10] and [11], are therefore candidates to drive the main activity of spot196 in the context of these reporter constructs.

Block A alone ([A--]) produces high levels of expression in the veins (fig. S4, A to C). Combined with block C (construct [A-C]), it also increases the vein expression compared to C alone. We concluded that A is sufficient to drive expression in the veins. Segment [3], which proportionally decreased the activity mostly in the veins, could therefore be the necessary counterpart for this activation.

Block B alone drives expression only near the wing hinge, in a region called the alula ([-B-]; Fig. 3, B to D). The first dissection series, however, did not identify a mutated segment within block B that affected specifically the alula.

The necessity of Dll binding sites (in segments [0], [1], [7], and [9]) and of segment [6], and their insufficiency to drive activity in the wing blade in the context of block A alone, block B alone, or blocks A and B combined, suggest that these sites with a strong activation effect function as permissive sites. We next focused on understanding the interplay between repressing and activating sites to shed light on how the spot196 patterning information is built. In the first series of constructs, we identified several strong repressing segments in block A ([2] and [4]) and block B ([5]). Using sufficiency reasoning with the second series of constructs, we further investigated how these inputs interacted with other parts of the enhancer (Fig. 3). These interactions are best visualized with logRatios, comparing this time double-block constructs to single-block constructs used as references (Fig. 3D and fig. S4, D to F). Block B has a strong repressive effect on block C throughout the wing, except at the anterior distal tip, where C activity is nearly unchanged [log([-BC]/[--C]); Fig. 3D]. Likewise, log([AB-]/[A--]) shows that B also represses the vein expression driven by A. Similarly, block A represses the C activity across the wing blade, except in the spot region [log([A-C]/[--C])]. We have seen above that blocks A and B both contain not only strong repressing segments but also known Dll TFBSs. Because both A and B show a repressive effect on block C, except in the spot region, we submit that the apparent patterned activation by Dll may result from its repressive effect on direct repressors of activity, mostly at the wing tip. This indirect activation model would explain the nonadditivity of the individual Dll binding sites observed in the first construct series and why grafting Dll TFBSs on a naïve DNA sequence is not sufficient to create a wing spot pattern.
Together, these results outline an unexpectedly complex regulatory logic that contrasts with the simple model we had initially proposed (19, 20) and involves multiple activators and several tiers of repressors.

Sequence reorganization affects activity levels of the spot196 enhancer, not its spatial output

In a final series of experiments, we wondered whether the complex regulatory architecture uncovered by the first two mutant series was sensitive to the organization of the inputs. To test the effect of changes in the organization of enhancer logical elements, we introduced new constructs with permutations of blocks A, B, and C (Fig. 4A). These permutations preserve the entire regulatory content of the enhancer, except at the junction of adjacent blocks where regulatory information may be lost or created. All permutations that we have tested (four of five possible permutations) drive significantly higher levels of expression than the wild-type [ABC] [[ACB]: 2.9-fold (F(1,98) = 191.8, P = 0); [BAC]: 6-fold (F(1,93) = 589.1, P = 0); [BCA]: 5.8-fold (F(1,93) = 589.1, P = 0); [CBA]: 8.4-fold (F(1,93) = 1664.2, P = 0); Fig. 4B] yet with minor effects on the activity distribution proportionally to the wild type (Fig. 4C). We concluded from these experiments that, in terms of pattern, the regulatory output is generally resilient to large-scale rearrangements. As long as all inputs are present in the sequence, the spatial activity is deployed in a similar pattern, yet its quantitative activity is strongly modulated. Because they have little influence on the activity pattern, the rearrangements may not change the nature of the interactions within the enhancer or with the core promoter. Although we would need to challenge this conclusion with additional constructs and blocks with different breakpoints, we speculate that, molecularly, the block randomization perturbs the action of some of the uniformly repressing elements. It highlights the robustness of the enhancer logic to produce a given patterned activity.
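The count of "four of five possible permutations" follows from the 3! = 6 orderings of three blocks, one of which is the wild type. The sketch below enumerates them and identifies the ordering not listed among the tested constructs. The fold changes are those quoted above; the dictionary and variable names are illustrative.

```python
from itertools import permutations

# All orderings of the three blocks, in bracket notation.
orders = ["[" + "".join(p) + "]" for p in permutations("ABC")]
non_wild_type = [o for o in orders if o != "[ABC]"]

# Fold change in overall activity relative to [ABC], as reported in the text.
tested = {"[ACB]": 2.9, "[BAC]": 6.0, "[BCA]": 5.8, "[CBA]": 8.4}

# Ordering absent from the reported series.
untested = [o for o in non_wild_type if o not in tested]
```

By elimination, the single ordering not reported here is [CAB].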

Fig. 4 Block permutations scale the activity of the spot196 enhancer.

(A) Schematics of constructs with block permutations. In this series, the same blocks of sequences as in Fig. 3A were permutated. (B) Average phenotypes resulting from constructs in (A). Colormap of average phenotypes normalized for all constructs of the block series, including block randomizations of Fig. 3C and fig. S4B. (C) Average phenotypes in (B) compared to the average phenotype of the wild-type [ABC] (logRatio). Note that, in contrast to constructs with randomized blocks (Fig. 3), constructs with block permutations result in near-uniform changes of activity across the wing. Colormap indicates an increase or a decrease of activity compared to the wild-type enhancer [ABC].


With this work, we set out to decipher the regulatory logic of an enhancer, spot196. The viewpoint presented here is the information that the enhancer integrates along its sequence. Combined with the quantitative measurement of enhancer activity in a tissue, the wing, this information reveals the enhancer regulatory logic and how it reads the wing trans-regulatory environment to encode a spatial pattern. The strength of our arguments stems from the introduction of two complementary aspects of the method (discussed in the following sections): one to combine the assessment of necessity and sufficiency of regulatory information in our analysis and another to compare the spatial activity of enhancer variants (logRatio).

Regulatory necessity and regulatory sufficiency

When dissecting a regulatory element, it is straightforward to assess the necessity of a TFBS or any stretch of the sequence to the activity, by introducing mutations. It is generally more difficult to assess whether the same sequence is sufficient to promote regulatory activity at all, and most enhancer dissections focus on necessity analysis [see, for instance, (12, 17, 19, 20, 23, 33–37)]. However, our study shows that, to decipher regulatory logic and eventually design synthetic enhancers, understanding which regulatory components are sufficient to build an enhancer activity is key.

A visual tool to compare spatial activities driven by enhancer variants

We introduced a new representation to compare activities between enhancer variants, typically a wild type and a mutant. Proportional effects, or local fold changes, as revealed by logRatio produce representations that are independent of the distribution of the reference activity. They also better reflect the distribution of factors in trans and their variations as seen by the enhancer (here, across the wing) than differential comparisons (compare Fig. 2 and fig. S3). Differential comparisons are dominated by regions of high activity and thereby focus our attention on the regions of high variation of activity. By contrast, logRatios reveal strong effects in regions of low activity that would hardly be visible using differential comparisons, highlighting some cryptic components of the regulatory logic. When additional knowledge about TFBSs and TF distribution becomes available, logRatios will also inform us on the contribution of each TF to the regulatory logic. In this respect, the introduction of logRatios in our analysis has proven useful and could be adapted to any system where image alignment is possible, such as Drosophila blastoderm embryos (38) or developing mouse limbs (39).

A-tracts did not disrupt the major effect of TF-TF interactions

A-tracts are known to change local conformational properties of DNA. Hence, our A-tract mutations could influence the regulatory logic not only by directly disrupting the information contained in the sequence they replaced but also, indirectly, by introducing more changes than intended. The alternative, sequence randomization, is however more likely to create spurious TFBSs, which is difficult to control for, especially if not all the determinants of the enhancer activity are known. The possible occurrence of undesired and undetected TFBSs would have biased our interpretation of the effect of individual segments and, consequently, of the regulatory logic of the enhancer. The chance that A-tracts introduce new TFBSs in the enhancer sequence is quite low compared to sequence randomization, which is why we favored this mutational approach for the analysis of short, individual segments. However, A-tracts can modify various physical properties of the DNA molecule and, in turn, influence interactions between TFs binding the enhancer. The disruption of a TF-TF interaction due to the introduction of an A-tract between two TFBSs (fig. S2B) would be revealed if mutating a particular segment had an effect similar to that of mutating the immediately adjacent flanking segments. We note, however, that we do not have such a situation in our dataset. This suggests that the A-tracts we introduced, if anything, only mildly altered TF-TF interactions through changes in the physical properties of spot196. Instead, we think that the effects of A-tract mutations are mostly due to disrupted TFBSs along the enhancer sequence.

The regulatory logic underlying spot196 enhancer activity

The main finding of our study is that the spot196 enhancer likely integrates six to eight distinct regulatory inputs, with multiple layers of cross-interactions (Fig. 5). We had previously proposed that the spot pattern resulted from the integration of only two spatial regulators: the activator Dll and the repressor En (19, 20). The regulatory density that we reveal here (Figs. 1C and 2) is reminiscent of what has been found for other enhancers (13, 23, 24). A logical analysis of systematic mutations along the enhancer gives a different status to the factors controlling spot196. The main levels of spot196 activity across the wing blade seem to result mostly from two unknown activators: one promoting a relatively uniform expression in the wing blade, and another along the veins (Fig. 5A). This activation is, in turn, globally repressed throughout the wing by an unknown repressor whose action masks that of the global activator (Fig. 5B). Upon these first two regulatory layers, the actual spot pattern of activity is carved by two local repressions. A distal repression counteracts the effect of the global repressor in the distal region of the wing (Fig. 5C), but the spatial range of this repression is limited to the anterior wing compartment by another repressor acting across the posterior wing compartment (Fig. 5D). The former local repression could be mediated by Dll itself, a hypothesis compatible with the nonadditive effects of Dll TFBS mutations, whereas the latter is almost certainly due to En. Thus, the pattern of activity results not so much from local activation but from multiple tiers of repressors.

Fig. 5 A model of the regulatory logic governing the spot196 enhancer.

(A to D) The schematics show step by step how regulatory information and interactions integrated along the enhancer sequence produce a spatial pattern of activity. (A) Three independent inputs, respectively, in blocks A, B, and C promote activity (arrows) in the wing veins, the alula, and the wing blade, as illustrated with average phenotypes of constructs [A--], [-B-], and [--C], respectively. Note that activity levels in the wing blade, stemming from block C, match the final levels of the spot196 enhancer activity in the spot region. (B) A first set of repressive inputs suppresses activity in the wing blade (stemming from blocks A and B) and the veins (stemming from block B). The overall combined output of the initial activation and the global repressive inputs is a near-complete loss of activity, except in the alula. (C) A second set of repressive inputs, whose action is localized in the distal wing region, counters the global repression, thereby carving a pattern of distal activity promoted by block C. (D) The distal activity is repressed in the posterior wing compartment, likely through the repressive action of Engrailed, resulting in a final pattern of activity in the spot region.

One would expect this complex set of interactions between TFs that bind along the enhancer sequence to be vulnerable to sequence reorganization. We unexpectedly found that shuffling blocks of the sequence resulted in marked changes in activity levels with little effect on the activity pattern. Similarly, many of the mutations still produced a pattern of activity quite similar to that of [+]. This suggests that the exact organization of the different inputs and the absence of some of these inputs do not affect the TF-enhancer and TF-TF interactions required for a patterned activity, which here translates mainly to the role of Dll in repressing global repressors and the repressing role of En. The frequency of these interactions, or the interactions with the core promoter, may, however, change significantly upon sequence modifications, affecting transcription rate. In other words, the regulatory logic described above is robust to changes for the production of a spatial pattern but less so for the tuning of enhancer activity levels.

The regulatory logic of this enhancer perhaps reflects the evolutionary steps of the emergence of spot196. The spot196 element evolved from the co-option of a preexisting wing blade enhancer (20). The sequences of this ancestral wing blade enhancer and the evolutionarily derived spot196 overlap and share at least one common input (21). This perspective is consistent with the idea that a novel pattern emerged by the progressive evolution of multiple tiers of repression carving a spot pattern from a uniform regulatory activity in the wing blade. To further deconstruct the regulatory logic governing the spot196 enhancer and its evolution, a first task will be to investigate how some of the mutations we introduced affect the activity of a broader fragment containing the entire spot activity (and the wing blade enhancer), closer to the native context of this enhancer. Another challenging step will be to identify the direct inputs integrated along its sequence. It will also be necessary to characterize their biochemical interactions with DNA and with one another. Ultimately, fully grasping the enhancer logic will mean being able to recreate these interactions in a functional synthetic regulatory element.


Fly husbandry

Our D. melanogaster stocks were maintained on standard cornmeal medium at 25°C with a 12:12 day-night light cycle.


All reporter constructs were injected as in (19). We used ɸC31-mediated transgenesis (40) and integrated all constructs at the genomic attP site VK00016 (41) on chromosome 2. All transgenic lines were genotyped to ascertain that the enhancer sequence was correct.

Molecular biology

All 196-bp constructs derived from the D. biarmipes spot196 sequence were synthesized in vitro by a biotech company (Integrated DNA Technologies, Coralville, USA; catalog no. 121416). Table S1 provides a list of all constructs and their sequences. Each construct was cloned by In-Fusion (Takara, Mountain View, USA) in our pRedSA vector [a custom version of the transformation vector pRed H-Stinger (42) with a 284-bp attB site for ɸC31-mediated transgenesis (40) cloned at the Avr II site of pRed H-Stinger]. All constructs in Fig. 1 were cloned by cutting pRedSA with Kpn I and Nhe I and using the following homology arms for In-Fusion cloning: 5′-GAGCCCGGGCGAATT-3′ and 5′-GATCCCTCGAGGAGC-3′. Likewise, constructs in Fig. 3 were cloned by cutting pRedSA with Bam HI and Eco RI and using the following homology arms for In-Fusion cloning: 5′-GAGCCCGGGCGAATT-3′ and 5′-GATCCCTCGAGGAGC-3′.

Wing preparation and imaging

All transgenic wings imaged in this study were homozygous for the reporter construct. Males were selected at emergence from pupa, a stage that we call “post-emergence,” when their wings are unfolded but still slightly curled. When flies were massively emerging from an amplified stock, we collected every 10 min and froze staged flies at −20°C until we had reached a sufficient number of flies. In any case, staged flies were processed after a maximum of 48 hours at −20°C. We dissected a single wing per male. Upon dissection, wings were immediately mounted onto a microscope slide coated with transparent glue (see below) and fixed for 1 hour at room temperature in 4% paraformaldehyde diluted in phosphate-buffered saline–1% Triton X-100 (PBST). Slides with mounted wings were then rinsed in PBST and kept in a PBST bath at 4°C until the next day. Slides were then removed from PBST, and the wings were covered with Vectashield (Vector Laboratories, Burlingame, USA). The samples were then covered with a coverslip. Preparations were stored for a maximum of 48 hours at 4°C until image acquisition.

The glue-coated slides were prepared immediately before wing mounting by dissolving adhesive tape (Tesa brand, tesafilm, ref. 57912) in heptane (two rolls in 100 ml of heptane) and spreading a thin layer of this solution onto a clean microscope slide. Once the heptane had evaporated (under a fume hood), the slide was ready for wing mounting. All wing images were acquired as 16-bit images on a Ti2 Eclipse Nikon microscope equipped with a Nikon 10× plan apochromatic lens (numerical aperture, 0.45; Nikon Corporation, Tokyo, Japan) and a pco.edge 5.5 Mpx sCMOS camera (PCO, Kelheim, Germany) under illumination from a Lumencor SOLA SE II light source (Lumencor, Beaverton, OR, USA). Each wing was imaged by tiling and stitching of several z-stacks (z-step, 4 μm) with 50% overlap between tiles. Each image comprises a fluorescent channel (ET-DSRed filter cube, Chroma Technology Corporation, Bellows Falls, VT, USA) and a bright-field channel (acquired using flat field correction from the Nikon NIS-Elements software throughout), the latter being used for later image alignment. To ensure that fluorescence measurements are comparable between imaging sessions, we used identical settings for the fluorescence light source (100% output), light path, and camera (20-ms exposure time, no active shutter) to achieve comparable fluorescence excitation.


Stitched three-dimensional (3D) stacks were projected to 2D images for subsequent analysis. The local sharpness average of the bright-field channel was computed for each pixel position in each z-slice, and an index of the slice with the maximum sharpness was recorded and smoothed with a Gaussian kernel (sigma = 5 px). Both bright-field and fluorescent 2D images were reconstituted by taking the value of the sharpest slice for each pixel.
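
The projection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the sharpness measure (locally averaged squared gradient), the NumPy-only Gaussian blur, and the function names `gaussian_blur` and `project_sharpest` are all hypothetical choices. Only the Gaussian smoothing of the slice index (sigma = 5 px) and the per-pixel selection of the sharpest slice come from the text.

```python
import numpy as np


def gaussian_blur(image, sigma):
    """Separable Gaussian blur with reflected borders (NumPy only)."""
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="reflect")
    blur_1d = lambda line: np.convolve(line, kernel, mode="valid")
    # Blur along rows first, then along columns.
    return np.apply_along_axis(blur_1d, 0, np.apply_along_axis(blur_1d, 1, padded))


def project_sharpest(brightfield, fluorescence, sigma=5.0):
    """Project two (n_slices, height, width) stacks to 2D images.

    For each pixel, the slice with the highest bright-field sharpness is
    selected, after smoothing the slice-index map with a Gaussian kernel.
    """
    brightfield = np.asarray(brightfield, dtype=float)
    fluorescence = np.asarray(fluorescence, dtype=float)
    # Assumed sharpness measure: squared gradient magnitude, locally averaged.
    grad_y, grad_x = np.gradient(brightfield, axis=(1, 2))
    sharpness = np.stack([gaussian_blur(s, 1.0) for s in grad_x**2 + grad_y**2])
    # Index of the sharpest slice per pixel, smoothed across the image.
    index = gaussian_blur(np.argmax(sharpness, axis=0), sigma)
    index = np.clip(np.rint(index).astype(int), 0, brightfield.shape[0] - 1)
    rows, cols = np.indices(index.shape)
    return brightfield[index, rows, cols], fluorescence[index, rows, cols]
```

Smoothing the index map rather than the images themselves avoids abrupt jumps between neighboring pixels taken from distant slices while leaving the pixel values untouched.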

Image alignment

Wing images were aligned using the veins as a reference. Fourteen landmarks on vein intersections and end points and 26 sliding landmarks equally spaced along the veins were placed on bright-field images using a semi-automated pipeline. Landmark coordinates were then used to warp the bright-field and fluorescent images by thin plate spline interpolation (43), so that their landmarks matched those of an arbitrarily chosen reference wing. All wings were then in the same coordinate system, defined by their venation.

Fluorescent signal description

A transgenic line with an empty reporter vector (ø) was used as a proxy to measure noise and tissue autofluorescence. The median raw fluorescent image was computed across all ø images and subtracted from all raw images before the following steps, to remove autofluorescence. All variation of fluorescence below the median ø value was discarded. The DsRed reporter signal was mostly localized in the cell nuclei. We measured the local average fluorescence levels by smoothing the raw 2D fluorescent signal with a Gaussian filter (sigma = 8 px). The sigma corresponded roughly to twice the distance between adjacent nuclei. To lower the memory requirement, images were then subsampled by a factor of 2. We used the 89,735 pixels inside the wings as descriptors of the phenotype for all subsequent analyses.
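
As an illustration, the steps of this paragraph could be chained as in the sketch below. It assumes that "discarding variation below the median ø value" amounts to clipping negative values to zero after subtraction, and it uses a NumPy-only Gaussian blur as a stand-in for whatever filter implementation was actually used; the function names `gaussian_blur` and `reporter_signal` are hypothetical.

```python
import numpy as np


def gaussian_blur(image, sigma):
    """Separable Gaussian blur with reflected borders (NumPy only)."""
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="reflect")
    blur_1d = lambda line: np.convolve(line, kernel, mode="valid")
    return np.apply_along_axis(blur_1d, 0, np.apply_along_axis(blur_1d, 1, padded))


def reporter_signal(raw, empty_vector_stack, sigma=8.0, step=2):
    """Background-correct, smooth, and subsample one aligned wing image.

    raw:                2D fluorescence image of one wing
    empty_vector_stack: (n_images, height, width) stack of empty-vector wings
    """
    # Pixel-wise median of the empty-vector images estimates autofluorescence.
    background = np.median(np.asarray(empty_vector_stack, dtype=float), axis=0)
    # Assumption: values below the background are set to zero.
    corrected = np.clip(np.asarray(raw, dtype=float) - background, 0.0, None)
    # Local average signal; sigma is about twice the inter-nuclear distance.
    smoothed = gaussian_blur(corrected, sigma)
    # Keep every `step`-th pixel in both directions to reduce memory use.
    return smoothed[::step, ::step]
```

In practice, a library routine such as `scipy.ndimage.gaussian_filter` would normally replace the hand-written blur.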

Average phenotypes, differences, logRatio colormaps, and normalization

Average reporter expression phenotypes were computed as the average smoothed fluorescence intensity at every pixel among all individuals in a given group (tens of individuals from the same transgenic line). The difference between groups was computed as the pixel-wise difference between the averages of the groups (fig. S3). The logRatio between two constructs represents the fold change of a phenotype relative to another and is calculated as the pixel-wise logarithm of the ratio between the two phenotypes. Averages, difference, and logRatio images were represented using colors equally spaced in CIELAB perceptual color space (44). With these colormaps, the perceived difference in colors corresponds to the actual difference in signal. Colormaps for average phenotypes were spread between the minimal and maximal signals across all averages. Colormaps for differences and logRatios were spread symmetrically between minus and plus the largest absolute value across all comparisons, with gray indicating that the two compared phenotypes are equal.
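
The contrast between the two comparisons can be made concrete with a short sketch. The base of the logarithm is not stated in the text, so base 2 is an assumption here, as are the small constant `eps` guarding against division by zero and all function names.

```python
import numpy as np


def average_phenotype(images):
    """Pixel-wise mean over individuals; images: (n_individuals, h, w)."""
    return np.asarray(images, dtype=float).mean(axis=0)


def difference(mutant_avg, reference_avg):
    """Pixel-wise difference between two average phenotypes."""
    return mutant_avg - reference_avg


def log_ratio(mutant_avg, reference_avg, eps=1e-6):
    """Pixel-wise log fold change (base 2 and eps are assumptions)."""
    return np.log2((mutant_avg + eps) / (reference_avg + eps))


def symmetric_limits(image):
    """Color limits centered on zero, so that 'no change' maps to gray."""
    bound = float(np.max(np.abs(image)))
    return -bound, bound


# Toy phenotypes: one high-activity pixel and one low-activity pixel.
reference = np.array([[100.0, 1.0]])
mutant = np.array([[50.0, 4.0]])
diff_image = difference(mutant, reference)    # about [[-50, 3]]
ratio_image = log_ratio(mutant, reference)    # about [[-1, 2]]
```

In this toy case, the difference image is dominated by the high-activity pixel, whereas the logRatio shows that the larger fold change occurs at the low-activity pixel.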

Mutation effect direction and intensity

We represented the necessity of a stretch of the sequence along the enhancer with the activity levels of mutants of this stretch relative to the wild-type ([+]) activity. To summarize the overall effect of mutants (overexpression or underexpression), we measured the average level of activity across each wing relative to that of the reference. The reference level was defined as the average level of activity of all [+] individuals. The value at each position corresponds to the average over all individuals carrying a mutation that affects this position. The effect of a mutation is not strictly limited to the mutated bases, because mutations can also modify DNA properties at flanking positions (45). To take this effect into account and produce a more realistic and conservative estimate of necessity at each position, we weighted the phenotypic contribution of each mutant line by the strength of the changes it introduces to the DNA shape descriptors at this position. At each position, the phenotypes of constructs that do not affect the DNA shape descriptors compared to [+] were not considered. When two mutants modify the DNA shape descriptors at one position, typically near the junction of two adjacent mutations, the effect at this position was computed as the weighted average of the effects of the two mutants, where the weight is the extent of the DNA shape modification relative to the [+] sequence. DNA shape descriptors were computed with the R package DNAshapeR (46). Notably, with an average of 11.5 bp, our A-tract mutations are somewhat larger than an average eukaryotic TFBS [~10 bp (32)], and each mutation is likely to affect up to two TFBSs. This size represents the limit of regulatory content that we can discriminate in this study.
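
The weighting scheme can be illustrated with a small sketch. It assumes that each mutant line is summarized by one relative activity value and by one non-negative weight per position (the magnitude of its DNA shape change relative to [+]); how these weights are derived from the DNAshapeR descriptors is not reproduced here, and the function name and toy numbers are hypothetical.

```python
import math
from typing import List, Sequence


def necessity_profile(
    effects: Sequence[float],
    shape_change: Sequence[Sequence[float]],
) -> List[float]:
    """Weighted average mutant effect at each enhancer position.

    effects:      one relative activity value per mutant line
    shape_change: per mutant line, one weight per position (0 = no change)
    Positions that no mutant affects are returned as NaN.
    """
    n_positions = len(shape_change[0])
    profile = []
    for pos in range(n_positions):
        weights = [line[pos] for line in shape_change]
        total = sum(weights)
        if total == 0:
            profile.append(math.nan)  # no construct informs this position
        else:
            weighted = sum(e * w for e, w in zip(effects, weights))
            profile.append(weighted / total)
    return profile


# Two adjacent mutants whose shape effects overlap at the junction (position 2).
profile = necessity_profile(
    effects=[0.5, 2.0],
    shape_change=[[1.0, 1.0, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 1.0, 0.0]],
)
# profile -> [0.5, 0.5, 1.25, 2.0, nan]
```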

PCA and difference significance

The intensity measure averages expression that varies across the wing. Hence, mutations with different effects on the phenotype can have the same intensity value. To test whether a mutant significantly differs from [+], we used comprehensive and unbiased phenotype descriptors provided by PCA, which removes the correlation between pixel intensities and describes the variation in reporter gene expression. PCA was calculated on the matrix regrouping intensities of all pixels for every individual, of dimensions (n_individuals × n_pixels on the wing). The significance of the difference between two constructs considers the multivariate variation of the phenotypes and is tested using multivariate analysis of variance (MANOVA) on the first five components, which each explain more than 0.5% of the total variance (data file S3).
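
A minimal sketch of the PCA step is given below, assuming a standard PCA by singular value decomposition of the centered individuals × pixels matrix; the function name and the way components are selected by their explained variance are illustrative assumptions. The MANOVA on the retained scores is not shown.

```python
import numpy as np


def pca_scores(pixels, min_explained=0.005):
    """PCA of an (n_individuals, n_pixels) matrix via SVD.

    Returns the scores of the components that each explain more than
    `min_explained` of the total variance, and those variance fractions.
    """
    matrix = np.asarray(pixels, dtype=float)
    centered = matrix - matrix.mean(axis=0)  # center each pixel
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # fraction of variance per component
    keep = explained > min_explained
    return u[:, keep] * s[keep], explained[keep]
```

The returned scores are uncorrelated by construction, so each individual is described by a handful of independent values instead of tens of thousands of correlated pixel intensities.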

Overall expression intensity and significance

The overall expression level was measured for each individual as the average intensity across the wing. This was used to test the significance of overall increase and decrease in expression levels relative to the wild-type levels.

DNA rigidity scores

A-tracts are runs of consecutive A/T base pairs without a TpA step. Stacking interactions and inter–base pair hydrogen bonds in ApA (TpT) or ApT steps of A-tracts lead to conformational rigidity (28). The length of an A-tract directly correlates with increased rigidity (47). To parametrize DNA rigidity at nucleotide resolution, we used A-tract length as a metric. For each position in a given DNA sequence, we find the longest consecutive run of the form AnTm that contains this position (with the requirement of n ≥ 0, m ≥ 0, and n + m ≥ 2), and score DNA rigidity at that position using the length of this subsequence. For example, the sequence AATCGCAT will map to the scores 3,3,3,0,0,0,2,2 because AAT and AT are A-tracts of lengths 3 and 2 bp, respectively.


Supplementary material for this article is available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: Funding: This work was supported by funds from the Ludwig Maximilian University of Munich, the Human Frontiers Science Program (program grant RGP0021/2018 to N.G., S.P., and R.R.), the Deutsche Forschungsgemeinschaft (grants INST 86/1783-1 LAGG and GO 2495/5-1 to N.G. and SPP 2202 to H.L. and H.H.), the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013/ERC Grant Agreement no. 615789 to B.P.), and the NIH (grant R35GM130376 to R.R.). Y.X. was supported by a fellowship from the China Scholarship Council (fellowship 201506990003). L.L. was supported by a DFG fellowship through the Graduate School of Quantitative Biosciences Munich (QBM). M.M. and D.D. are recipients of fellowships from the German Academic Exchange Service (DAAD). E.M. was supported by the Amgen Scholar program of the LMU. Author contributions: Y.L.P.: conceptualization, methodology, software, validation, formal analysis, data curation, writing—original draft, and visualization; Y.X.: validation, investigation, formal analysis, and data curation; L.L.: investigation and formal analysis; B.M.: investigation; R.J.: investigation; D.H.: software and data curation; D.B.: software and data curation; H.H.: methodology and supervision; H.L.: supervision; Y.W.: methodology, software, and formal analysis; E.O.: investigation; M.M.: investigation and formal analysis; D.D.: investigation and formal analysis; E.M.: investigation and formal analysis; R.R.: methodology, supervision, and funding acquisition; S.P.: software, supervision, and funding acquisition; B.P.: conceptualization, writing—original draft, and funding acquisition; N.G.: conceptualization, validation, writing—original draft, visualization, supervision, project administration, and funding acquisition. Competing interests: The authors declare that they have no competing interests. 
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
