Research Article | Neuroscience

Autism-linked gene FoxP1 selectively regulates the cultural transmission of learned vocalizations

Science Advances  03 Feb 2021:
Vol. 7, no. 6, eabd2827
DOI: 10.1126/sciadv.abd2827


Autism spectrum disorders (ASDs) are characterized by impaired learning of social skills and language. Memories of how parents and other social models behave are used to guide behavioral learning. How ASD-linked genes affect the intertwined aspects of observational learning and behavioral imitation is not known. Here, we examine how disrupted expression of the ASD gene FOXP1, which causes severe impairments in speech and language learning, affects the cultural transmission of birdsong between adult and juvenile zebra finches. FoxP1 is widely expressed in striatal-projecting forebrain mirror neurons. Knockdown of FoxP1 in this circuit prevents juvenile birds from forming memories of an adult song model but does not interrupt learning how to vocally imitate a previously memorized song. This selective learning deficit is associated with potent disruptions to experience-dependent structural and synaptic plasticity in mirror neurons. Thus, FoxP1 regulates the ability to form memories essential to the cultural transmission of behavior.


Humans and other animals learn many of their complex and socially oriented behaviors by imitating more experienced individuals in their environment. For example, development of spoken language is rooted in a child’s ability to imitate the speech patterns of their parent(s) and other adults (1–3). Developmental learning of culturally transmitted behaviors is impaired in many neurodevelopmental disorders, and disruptions in learning social skills and speech and language are important early indicators of autism spectrum disorder (ASD) (4–6). Nonetheless, how ASD risk genes affect discrete aspects of behavioral imitation, like acquiring memories of appropriate social behaviors or mimicry of those observed behaviors, is still poorly understood. To start to address this issue, we sought to examine the role of the high-risk ASD gene FOXP1 (forkhead-box protein 1) in forebrain circuits important for the cultural transmission of song between adult and juvenile zebra finches (Fig. 1, A to D, fig. S1, and Materials and Methods for details on analysis of song behavior) (7, 8).

Fig. 1 Overview of zebra finch song learning and neural circuits for song.

(A) Timeline of zebra finch song learning in juvenile males. (B to D) Example representations of an adult zebra finch song; each color represents a syllable or note in the song. (B) Spectrogram of an adult male’s song. The y axis represents the frequency range (0 to 11.025 kHz), while the x axis represents total duration (5.27 s), and the colors reflect the amplitude. Colored bars underneath indicate introductory notes (pink, i) and syllables (a to h). (C) A syntax raster plot showing the syllables sung over repeated song bouts; colors reflect the syllables produced. (D) A representation of song syntax, with thickness of arrows representing the probability of syllable transitions. (E) Parasagittal schematic of the song circuit, with relevant nuclei labeled: Area X, striato-pallidal basal ganglia nucleus; Av, nucleus avalanche; HVC, premotor song nucleus; LMAN, lateral magnocellular nucleus of the anterior nidopallium; NIf, nucleus interfacialis of the nidopallium; Uva, nucleus uvaeformis; RA, robust nucleus of the arcopallium.

FOXP1 is among the top five ASD risk genes (9), and its haploinsufficiency causes specific language impairment and intellectual disability (10, 11). FoxP1 is expressed in many of the same areas of the pallium and basal ganglia in mammals and songbirds (12–14). In zebra finches, FoxP1 expression is enriched in many forebrain regions known to be important for song learning (Fig. 1E and fig. S2) (13–15). Here, we focus on the role of FoxP1 in the pallial region HVC (proper name), a premotor cortical analog. HVC is involved in the formation of song memories, in the vocal imitation process, and is necessary for the production of learned song (16–22).

It has been suggested that in humans cortical mirror neurons participate in the dual functions of perception and expression of culturally transmitted behaviors like speech and language (23–29). However, the role of mirror neurons in neurodevelopmental disorders that impair learning of culturally transmitted behaviors is still unclear (30–34). The songbird brain, and in particular HVC, contains mirror neurons hypothesized to be important for song imitation (28, 35–37). Young zebra finches learn to imitate song by first memorizing the temporal and spectral properties of an adult bird’s song and then by practicing singing several thousand times per day for ~60 days (21). Mirror neurons in HVC project to a portion of the striatum involved in song learning (Area X) and are hypothesized to be important for young birds to improve their song as they practice (23, 28, 38). Nonetheless, the function of Area X projecting HVC neurons (HVCX) in song learning is still poorly understood.

In this study, we find that FoxP1 is widely expressed in HVCX neurons (14). Knockdown of FoxP1 (FP1-KD) in HVC disrupts experience-driven structural and functional plasticity in HVCX neurons. Ultimately, it potently blocks a juvenile bird’s ability to learn from an adult song tutor, resulting in birds that fail to imitate any song over development despite having had extensive opportunities to learn from natural interactions with their tutor.


FoxP1 is expressed in striatal-projecting HVCX neurons

HVC has three nonoverlapping classes of projection neurons: HVCX, HVCAv, and HVCRA (Fig. 1E) (39). HVCX and HVCAv neurons transmit vocal motor-related signals to a portion of the striatum involved in song learning, Area X, and to the auditory nucleus Avalanche (Av), respectively. HVCRA neurons provide descending motor commands to the motor cortical-analog robust nucleus of the arcopallium (RA). HVCRA projections are necessary for the production of learned song at all stages of life (20, 40–42). In contrast, HVCX and HVCAv projections are important for motor imitation of tutor song in juvenile birds but are not essential for song production in adult birds (35, 39, 40).

We used anatomical tracing and immunolabeling to examine FoxP1 expression in these different classes of neurons (Fig. 2). We found that FoxP1 is expressed in most of the HVC neurons projecting to the striatum (74.41 ± 2.17% of HVCX neurons) and in a smaller proportion of neurons projecting to the auditory system (24.27 ± 2.54% of HVCAv neurons) or to RA (29.98 ± 0.65% of HVCRA neurons; Fig. 2A and fig. S2).

Fig. 2 FoxP1 expression and knockdown in HVC.

(A) FoxP1 expression in different classes of HVC projection neurons. (Top) Schematics of retrograde injections, (middle) the proportion of cells that express FoxP1 for each cell type, and (bottom) FoxP1-expressing neurons for each HVC subtype, per HVC section (HVCX: 74.4 ± 2.2%, n = 3 birds, 6 hemispheres; HVCRA: 30.0 ± 0.7%, n = 2 birds, 4 hemispheres; HVCAv: 24.3 ± 2.5%, n = 3 birds, 6 hemispheres). (B) Western blot using a custom-made rabbit anti-FoxP1 antibody (59) of lysates from HVC injected with control (rAAV9/ds-CBh-GFP) (Ctrl) or shFoxP1 AAV (pscAAV-GFP-shFoxP1) (FP1-KD). (Bottom, left) Schematic of viral injections of control (n = 4 birds) or shFoxP1 (n = 4 birds) groups. (Bottom, right) Graph shows quantification of FoxP1 protein. Signals were normalized to GAPDH, averaged for each condition, and normalized to the controls. Histograms represent average ± SEM (FoxP1-80: control: 100 ± 26.6% versus FP1-KD: 54.7 ± 3.5%, Student’s t test with Bonferroni-Sidak correction for multiple comparisons, P > 0.05; FoxP1-70: control: 100 ± 25.0% versus FP1-KD: 24.7 ± 4.4%, Student’s t test with Bonferroni-Sidak, P = 0.027). n.s., not significant. (C) Representative examples of HVC sections from control (top) and FP1-KD (bottom) birds. Injections were performed as in schematic in (B). HVCX cells labeled with retrograde tracer in Area X (magenta, left), GFP signal from AAV-control/AAV-shFoxP1 injection (yellow, middle left), FoxP1 staining with antibody (cyan, middle right), and a merged composite (right). Inset boxes indicate example cell per condition, and arrowheads indicate the soma of the example neurons. Scale bars, 50 μm. (D) Quantification of (C), showing the difference in colocalization between control (n = 3 birds, 6 hemispheres) and FP1-KD (n = 3 birds, 5 hemispheres), as the normalized percentage of tracer-labeled cells that express FoxP1. Bar graphs represent average ± SEM (control: 100 ± 1.8% versus FP1-KD: 84.59 ± 2.52%, Student’s t test, P = 0.0006).

The widespread expression of FoxP1 in striatal-projecting HVCX neurons is of interest because HVCX neurons are thought to provide timing cues to basal ganglia circuits involved in reinforcement-based motor imitation of song elements (28, 38, 43–46). In support of this view, lesions of HVCX neurons have recently been shown to significantly disrupt behavioral imitation of song in juvenile zebra finches (35); however, it is not known whether these learning deficits arise from problems in acquiring a memory during interactions with a song tutor or from problems in modifying song as juveniles practice singing.

Knockdown of FoxP1 in HVC blocks tutor song memory, but not behavioral imitation of song

To test the function of FoxP1 in the cultural transmission of birdsong, we developed a short hairpin RNA against FoxP1. Using an adeno-associated virus (AAV), we demonstrated that this construct can knock down FoxP1 expression in HVC significantly and that the virus preferentially expresses in HVCX neurons (Fig. 2, B to D, and figs. S3 and S4A). We then proceeded to knock down the expression of FoxP1 (FP1-KD) in age-matched juvenile birds at two stages of song learning: before or after they had an opportunity to memorize the song of an adult song tutor.

Juvenile male zebra finches memorize the song of their father or other adult song tutor(s) in the first 2 months of life. They then use auditory feedback and extensive practice to learn how to accurately imitate this memorized song by 90 to 100 days post-hatching (dph) (21). Juvenile birds can memorize the song of a tutor at any time between 20 and 60 dph, but they do not start to practice singing until approximately 35 to 40 dph (Fig. 1A). This developmental progression and the ability to raise birds in groups without a song tutor—referred to as “isolates”—allowed us to knock down FoxP1 expression before behavioral imitation of song, either before or after birds had an opportunity to form a memory of a tutor song (Fig. 3A: tutor exposure before FP1-KD, referred to as “behavioral imitation” group, FP1-KD BI; Fig. 3E: tutor exposure after FP1-KD, referred to as “social experience” group, FP1-KD SE) (19). Given the widespread expression of FoxP1 in striatal-projecting HVCX neurons, we hypothesized that FP1-KD might disrupt motor aspects of behavioral imitation of the tutors’ song (35). For example, it could impair a young bird’s ability to precisely modify song syllables to produce a good imitation of their tutor’s song. However, we found that FP1-KD, after birds had already formed a memory of the tutor song (FP1-KD BI birds), did not disrupt their ability to learn how to imitate that song over development (Fig. 3, A to D, and fig. S5A). We found that the adult songs of the FP1-KD BI group were stereotyped and indistinguishable from normal zebra finch song, suggesting that FoxP1 in HVC is not necessary for motor aspects of behavioral imitation in juvenile birds or in more basic aspects of song production.

Fig. 3 Song learning is impaired by FoxP1 KD.

(A, E, I, and M) Timelines illustrating tutoring experience of FP1-KD behavioral imitation (A), FP1-KD social experience (E), control social experience (I), and full isolate birds (M). (B, F, J, and N) Representative spectrograms from a single bird belonging to each experimental group and their tutor. All spectrograms are 5.27 s in duration and reflect a frequency range of 0 to 11.025 kHz. Colored underlines reflect syllable labels used for subsequent syntax visualizations and analysis. Similarity refers to percentage similarity to tutor; group comparisons in Fig. 4. The full isolate bird (N) had no tutor and therefore has no similarity to tutor score. It is contrasted with the song of an adult zebra finch with typical tutor exposure. (C, G, K, and O) Syntax raster plots illustrating the syntax stereotypy of an example bird for each condition. This is the same bird used for the spectrogram and syntax diagram. Each row reflects a single song bout, and each colored block reflects the syllable sung at that position in the bout. Rows are sorted according to syllable order. ER, entropy rate; group comparisons in Fig. 4. (D, H, L, and P) Diagrams reflecting syllable transitions produced by an example bird for each condition. Line thickness is proportional to the transition probability from the originating syllable to the following. Transitions with a probability of less than 4% are omitted for clarity.

In contrast, FP1-KD before experience with a tutor severely disrupted subsequent song learning (FP1-KD SE birds; Fig. 3, E to H, and fig. S5B). We found that birds with FP1-KD before tutor experience subsequently learned little from their song tutor and significantly less than the FP1-KD BI or control birds (Figs. 3, A to L, and 4, A and B, and fig. S5C). All but a single outlier failed to imitate any of their tutor’s song syllables. As adults, FP1-KD SE birds sang songs that were highly variable from trial to trial and with entropy rates (Materials and Methods) higher than their tutors (fig. S6), the FP1-KD BI birds, and the control birds (Figs. 3, A to L, and 4, C and D). We found that their songs had entropy rates that were statistically indistinguishable from birds that were never tutored during development (full isolates; Figs. 3, M to P, and 4E, and fig. S5D). As a further control, we found that birds injected with an shScrambled-encoding virus (Scr) and raised identically to the FP1-KD SE and Ctrl SE birds appear to learn song normally, with tutor similarity scores and entropy rates indistinguishable from the Ctrl SE group (fig. S7).
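
The syntax measures used above (syllable transition probabilities and entropy rate) can be illustrated with a minimal sketch. The exact definition used in the study is given in its Materials and Methods, which is not reproduced here, so the code below assumes a first-order Markov model of syllable order and a base-2 logarithm. The function names and the input format (one string of syllable labels per song bout) are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def transition_counts(bouts):
    """Count syllable-to-syllable transitions within each song bout."""
    counts = defaultdict(Counter)
    for bout in bouts:
        for current, following in zip(bout, bout[1:]):
            counts[current][following] += 1
    return counts


def transition_probabilities(bouts):
    """Convert transition counts into per-syllable transition probabilities."""
    return {
        syllable: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for syllable, row in transition_counts(bouts).items()
    }


def entropy_rate(bouts):
    """Entropy rate in bits per transition, assuming a first-order Markov model.

    H = -sum_i p(i) * sum_j P(j|i) * log2 P(j|i), where p(i) is taken to be
    the fraction of all observed transitions that start from syllable i
    (an assumption, not necessarily the estimator used in the study).
    """
    counts = transition_counts(bouts)
    total = sum(sum(row.values()) for row in counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for row in counts.values():
        row_total = sum(row.values())
        for n in row.values():
            p = n / row_total
            h -= (row_total / total) * p * log2(p)
    return h


# A perfectly stereotyped song: every syllable has exactly one successor.
stereotyped = ["abcd"] * 10
# A variable song: each syllable is followed by "a" or "b" equally often.
variable = ["ab", "aa", "ba", "bb"]
print(entropy_rate(stereotyped), entropy_rate(variable))
```

Under these assumptions, a fully stereotyped song yields an entropy rate of 0 bits and the variable example yields 1 bit, which matches the qualitative pattern described in the text: higher entropy rates indicate less predictable syllable order.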

Fig. 4 Quantification of song learning and syntax in FP1-KD birds.

(A) FP1-KD social experience birds (n = 8) have significantly lower song similarity to tutor than FP1-KD behavioral imitation birds (n = 9, FP1-KD SE: 12.39 versus FP1-KD BI: 72.32, Mann-Whitney test, P < 0.001). Filled points correspond to the example birds shown in Fig. 3. (B) FP1-KD social experience birds (n = 8) have significantly lower song similarity to tutor than control social experience birds (n = 10, FP1-KD SE: 12.39 versus Ctrl SE: 54.6, Mann-Whitney test, P = 0.0031). Filled points correspond to the example birds shown in Fig. 3. (C) FP1-KD social experience birds (n = 8) have significantly higher song syntax entropy rates than FP1-KD behavioral imitation birds (n = 9, FP1-KD SE: 1.274 versus FP1-KD BI: 0.4512, Mann-Whitney test, P < 0.001). Filled points correspond to the example birds shown in Fig. 3. (D) FP1-KD social experience birds (n = 8) have significantly higher song syntax entropy rates than control social experience birds (n = 10, FP1-KD SE: 1.274 versus Ctrl SE: 0.6113, Mann-Whitney test, P = 0.0085). Filled points correspond to the example birds shown in Fig. 3. (E) Song syntax entropy rates do not differ significantly between FP1-KD social experience (n = 8) and full isolate birds (n = 8, FP1-KD SE: 1.274 versus full isolate: 0.9667, Mann-Whitney test, P > 0.05). Filled points correspond to the example birds shown in Fig. 3. For all box plots, median, 25th and 75th percentile, and minimum and maximum are reported. The single data points are overlaid on the side.

These findings suggest that FP1-KD critically impairs the cultural transmission of vocal behavior by selectively disrupting only one aspect of the song learning process: the ability to form appropriate memories during interactions with a song tutor. Unexpectedly, FP1-KD in HVC did not disrupt the ability to imitate a previously memorized song model, a sensorimotor learning process that requires extensive practice but does not require further interactions with the song tutor or other birds. This suggests that FP1-KD does not disrupt vocal production or the ability to evaluate and modify song performances using a tutor-song memory but selectively and potently impairs the ability to form that memory. The widespread expression of FoxP1 in HVCX neurons further implicates this circuitry in the initial formation of tutor-song memories.

Knockdown of FoxP1 inhibits dendritic spine turnover and decreases the intrinsic excitability of HVCX neurons

We next sought to identify the consequences of FP1-KD on the structural plasticity and intrinsic excitability of HVCX neurons. Previous research has shown that plasticity in HVC is predictive of a young bird’s ability to form tutor-song memories during development (16, 17). Birds with high levels of dendritic spine turnover are better learners than birds with low turnover (17). Therefore, we first used longitudinal in vivo two-photon imaging to track FP1-KD–mediated changes to dendritic spine dynamics in our social experience groups and in separate cohorts of juvenile isolates (Fig. 5A). We imaged spines on virally transduced, retrogradely labeled HVCX neurons, both in our FP1-KD cohorts and in control birds.

Fig. 5 FP1-KD reduces structural plasticity on HVCX neurons.

(A) Left: Schematic of the experimental protocol and timeline of the experiments. Right: In vivo two-photon images of sample GFP-labeled (green) and retrogradely labeled (red) control and FP1-KD HVCX neurons. Scale bar, 50 μm. (B) Left: Representative in vivo two-photon images of GFP-expressing dendrite sections from control (top) and FP1-KD (bottom) normally reared adult HVCX neurons. Scale bar, 5 μm. Right: Average ± SEM dendritic spine density (spines per micrometer) from adult HVCX neurons (control adult: 0.56 ± 0.03, n = 821 spines, 6 cells, 2 animals; FP1-KD adult: 0.67 ± 0.02, n = 668 spines, 6 cells, 5 animals; Student’s t test, P = 0.01). (C) Left: Representative in vivo two-photon images of GFP-expressing dendrite sections from control (top) and FP1-KD (bottom) juvenile isolate HVCX neurons. Scale bar, 5 μm. Right: Average ± SEM dendritic spine density (spines per micrometer) from juvenile HVCX neurons (control juvenile: 0.73 ± 0.03, n = 769 spines, 4 cells, 3 animals; FP1-KD juvenile: 0.51 ± 0.03, n = 745 spines, 6 cells, 5 animals; Student’s t test, P < 0.001). (D) Left: Control and FP1-KD adult dendritic segments from HVCX neurons taken at two different times (t0, t1 across a 4-hour imaging interval). Filled and empty arrowheads indicate gained and lost spines, respectively. Scale bars, 2 μm. Right: Average ± SEM percent dendritic spine turnover (acquired + lost spines/total spines counted) from control and FP1-KD adults (control adult: 4.3 ± 0.8%, n = 1126 spines, 6 cells, 2 animals; FP1-KD adult: 0.0 ± 0.0%, n = 1148 spines, 6 cells, 5 animals; Student’s t test, P < 0.001). (E) Left: Representative images of control and FP1-KD juvenile dendritic segments from HVCX neurons, taken at two different times (t0, t1 across a 2-hour imaging interval). Scale bars, 2 μm. Right: Average ± SEM percent dendritic spine turnover (control juvenile: 13.6 ± 3.1%, n = 650 spines, 4 cells, 3 animals; FP1-KD juvenile: 0.0 ± 0.0%, n = 735 spines, 6 cells, 5 animals; Student’s t test, P < 0.001).

We found that FP1-KD significantly reduced spine turnover in both juvenile and adult birds compared to age-matched controls (Fig. 5, D and E), suggesting that structural plasticity in HVCX neurons may be important for young birds to form the tutor-song memories used to guide song imitation. In addition, we found that HVCX neurons had significantly higher spine density in FP1-KD adults and lower spine density in FP1-KD juveniles than in their respective age-matched controls (Fig. 5, B and C), suggesting that FP1-KD also disrupts changes in spine density that occur during song learning.
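
The two structural measures reported in Fig. 5, spine density (spines per micrometer) and percent spine turnover ((acquired + lost spines)/total spines counted), reduce to simple arithmetic. The sketch below is illustrative only: the function names and the use of spine identifiers are hypothetical, and "total spines counted" is assumed here to mean every distinct spine observed at either time point, which may differ from the counting convention used in the study.

```python
def spine_density(n_spines, dendrite_length_um):
    """Spines per micrometer of dendrite."""
    if dendrite_length_um <= 0:
        raise ValueError("dendrite length must be positive")
    return n_spines / dendrite_length_um


def spine_turnover_percent(spines_t0, spines_t1):
    """Percent turnover between two imaging sessions.

    spines_t0 and spines_t1 are sets of spine identifiers seen at t0 and t1.
    Turnover = 100 * (gained + lost) / total, where total is assumed to be
    the number of distinct spines seen at either time point.
    """
    gained = spines_t1 - spines_t0   # present only at t1
    lost = spines_t0 - spines_t1     # present only at t0
    total = len(spines_t0 | spines_t1)
    if total == 0:
        raise ValueError("no spines counted")
    return 100.0 * (len(gained) + len(lost)) / total


# Toy example: one spine lost (id 1) and one gained (id 5) out of five seen.
print(spine_turnover_percent({1, 2, 3, 4}, {2, 3, 4, 5}))  # 40.0
print(spine_density(56, 100.0))                            # 0.56
```

A dendrite on which no spine appears or disappears across the imaging interval gives 0% turnover under this definition, regardless of its spine density.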

Decreased spine turnover in FP1-KD birds could be tied to differences in the intrinsic excitability of HVCX neurons. For example, deafening of adult zebra finches causes increased intrinsic excitability and increased dendritic spine turnover in HVCX neurons (47). To examine this, we conducted whole-cell current-clamp recordings from virally expressing retrogradely labeled HVCX neurons from birds receiving injections with either shFoxP1-GFP (green fluorescent protein) or control viruses in HVC (Fig. 6A). We first examined the effect of FP1-KD in young adults that had been raised with song tutors and allowed to learn songs. Birds received viral injections in HVC between 85 and 95 dph, at the end of the formative stage for song imitation, and electrophysiological recordings were conducted 10 days later (95 to 105 dph). We found that neurons expressing shFoxP1 had significantly decreased intrinsic excitability compared to control HVCX neurons, expressing either GFP or Scr (Fig. 6B and fig. S8). We next examined whether FP1-KD had similar effects on the intrinsic excitability of HVCX neurons in juvenile isolates. Although the overall excitability of HVCX neurons appeared lower in young isolates, FP1-KD still significantly decreased intrinsic excitability of HVCX neurons (Fig. 6C). Together, these results point to the potentially important role of HVCX neuron intrinsic excitability and dendritic spine plasticity in forming memories used to guide vocal learning.

Fig. 6 FP1-KD reduces excitability of HVCX neurons.

(A) Schematic of an ex vivo slice and patch-clamp recording setup with high-resolution image of HVC in a brain slice used for electrophysiology. Scale bar, 50 μm. (B) Example traces (left; scale bars, 20 mV and 200 ms) and plot (right) reporting the number of action potentials (AP) elicited by somatic current injections in HVCX neurons from control and FP1-KD adult brain slices. FoxP1 knockdown decreased the intrinsic excitability of HVCX neurons in adults (controls, n = 10, 5 animals; FP1-KD, n = 8, 3 animals; two-way ANOVA, interaction F10,160 = 30.87, treatment F1,16 = 34.56, P < 0.001). (C) Example traces (left; scale bars, 20 mV and 200 ms) and plot (right) reporting the number of action potentials elicited by somatic current injections in HVCX neurons from control and FP1-KD juvenile brain slices. FoxP1 knockdown decreased the intrinsic excitability of HVCX neurons in isolate juveniles (controls, n = 12, 3 animals; FP1-KD, n = 6, 2 animals; two-way ANOVA, interaction F10,200 = 7.053, treatment F1,20 = 7.627, P = 0.01). All data are reported as average ± SEM.

Knockdown of FoxP1 blocks synaptic and network hallmarks of memory formation

We next explored whether experience with a tutor, and presumably formation of a tutor-song memory, is sufficient to elicit changes in the intrinsic excitability and synaptic physiology of HVCX neurons. We compared juvenile isolates with age-matched birds that were housed with a song tutor for two consecutive days (Fig. 7A). We found that 2 days of experience with a song tutor drove a significant increase in the intrinsic excitability of HVCX neurons in control birds (Fig. 7B). FP1-KD prevented this experience-dependent change in excitability. Neither FP1-KD nor experience with a tutor affected the excitation-to-inhibition ratio (E/I ratio) in HVCX neurons (Fig. 7C). However, 2 days of social experience led to a significant increase in AMPA/N-methyl-d-aspartate (NMDA) receptor ratios in HVCX neurons from control birds (Fig. 7D), a result consistent with synaptic strengthening following tutor experience (16, 17). In contrast, FP1-KD prevented this experience-dependent increase in AMPA/NMDA receptor ratio. These results indicate that experience with a song tutor results in an increase in both excitability and AMPA/NMDA receptor ratios in HVCX neurons and that FP1-KD is sufficient to block these signatures of tutor-song memory.

Fig. 7 FP1-KD prevents experience-dependent synaptic strength modifications.

(A) Experimental timeline. (B) HVCX neurons were more excitable in control birds subjected to the 2-day tutoring regime compared to isolates (two-way ANOVA, interaction F10,190 = 4.598, treatment F1,19 = 5.376, P = 0.03). This difference is not present in FP1-KD birds (two-way ANOVA, interaction F10,180 = 0.4578, treatment F1,18 = 0.4018, P > 0.05). FoxP1 knockdown decreased intrinsic excitability of HVCX neurons (controls, n = 9, 4 animals; FP1-KD, n = 10, 3 animals; two-way ANOVA, interaction F10,170 = 9.308, treatment F1,17 = 10.73, P = 0.005). Trend lines relative to the same experiments, but conducted in isolates, are reported here from Fig. 6C. Data are reported as average ± SEM. (C) AMPA receptor (AMPAR)– and GABA receptor (GABAR)–mediated currents recorded at −60 or 10 mV, respectively (scale bars, 100 pA and 100 ms), in HVCX neurons from isolate and 2-day tutored birds. AMPAR/GABAR current amplitude ratios from isolates (open triangles) and 2-day tutored birds (open squares). FoxP1 knockdown has no significant effect on the AMPAR/GABAR ratios (isolate control: 0.25 ± 0.04, n = 10, 5 animals; isolate FP1-KD: 0.23 ± 0.06, n = 11, 5 animals; 2d tutored control: 0.16 ± 0.02, n = 14, 5 animals; 2d tutored FP1-KD: 0.22 ± 0.03, n = 18, 6 animals; Kruskal-Wallis test, P > 0.05). (D) Evoked AMPAR- and NMDAR-mediated currents recorded at −70 or +40 mV, respectively (scale bars, 50 pA and 100 ms), in HVCX neurons in isolates and 2-day tutored birds (isolate control: 2.2 ± 0.2, n = 12, 6 animals; isolate FP1-KD: 1.4 ± 0.2, n = 8, 5 animals; 2d tutored control: 8.0 ± 1.4, n = 10, 6 animals; 2d tutored FP1-KD: 2.9 ± 0.4, n = 17, 6 animals; Kruskal-Wallis test, P < 0.001; Dunn’s multiple comparisons, isolate control versus 2d tutored controls, P = 0.002, isolate FP1-KD versus 2d tutored controls, P < 0.001, 2d tutored controls versus 2d tutored FP1-KD, P = 0.008). For all box plots, median, 25th and 75th percentile, and minimum and maximum are reported.

Last, we tested whether these cellular and synaptic effects of FP1-KD were sufficient to block in vivo, network-level hallmarks of tutor-song memory. Acquisition of tutor-song memories is correlated with the rapid emergence of prolonged patterns of bursting activity in HVC and in RA (17, 18, 48). Taking advantage of the lack of a corpus callosum directly connecting HVC from the right and left hemispheres in zebra finches (49), we knocked down FoxP1 in only one hemisphere while expressing a control virus in the other. Birds were either maintained in social isolation from a song tutor (fig. S9A) or given 2 days of experience with a song tutor (Fig. 8A). We then made bilateral extracellular recordings from HVC to assess baseline and learning-related changes to network activity. In isolate birds (baseline condition), we did not detect any network-level differences in spontaneous neuronal activity between the FP1-KD and control hemispheres (figs. S9 and S10, A and B). This indicates that FP1-KD does not cause large-scale changes in the excitability or bursting properties of HVC neurons in the absence of previous tutor-song experience.

Fig. 8 FP1-KD reduces the experience-dependent reorganization of network-level activity.

(A) Schematic of experimental time line. AAV injections in HVC to knock down FoxP1 in one hemisphere and control virus in the other hemisphere (pseudo-randomized). Extracellular activity was recorded in both hemispheres (n = 6 birds, 3 to 5 recordings per hemisphere). (B) Sample traces (scale bar, 0.5 mV, 1 s) and average interspike interval distribution (bin 1 ms, 1 to 100 ms, logarithmic scale, 300 s per recording) (control hemispheres versus FP1-KD hemispheres, two-way ANOVA, interaction F99,990 = 1.222, P > 0.05). Data are reported as average (thick line) ± SEM (semitransparent contour). (C) Total number of bursts (control hemispheres: 199.4 ± 44.4 versus FP1-KD hemispheres: 148.0 ± 41.2; Wilcoxon matched-pairs signed-rank test, P = 0.03). (D) Average number of spikes in a burst (control hemispheres: 9.4 ± 1.6 versus FP1-KD hemispheres: 6.0 ± 0.4; Wilcoxon matched-pairs signed-rank test, P = 0.03). (E) Average interburst interval (control hemispheres: 1.5 ± 0.3 versus FP1-KD hemispheres: 2.4 ± 0.5; Wilcoxon matched-pairs signed-rank test, P = 0.03). (F) Average burst length (control hemispheres: 101.6 ± 30 versus FP1-KD hemispheres: 58.6 ± 5.8; Wilcoxon matched-pairs signed-rank test, P = 0.03). (G) Relative distribution of burst duration, normalized for each recording (5-ms duration bins, 5 to 2500 ms, logarithmic scale; control hemispheres versus FP1-KD hemispheres, two-way ANOVA, interaction F498,4980 = 1.972, P < 0.001, control versus FP1-KD F1,10 = 5.570, P = 0.04). Data are reported as average (thick line) ± SEM (semitransparent contour). (H) Average relative prevalence of bursts with durations between 5 and 15 ms (control hemispheres: 27.9 ± 4.0 versus FP1-KD hemispheres: 37.8 ± 2.8; Wilcoxon matched-pairs signed-rank test, P = 0.03). (I) Average relative prevalence of bursts with durations between 15 and 2500 ms (control hemispheres: 72.1 ± 4.0 versus FP1-KD hemispheres: 62.2 ± 2.8; Wilcoxon matched-pairs signed-rank test, P = 0.03).

In contrast, following 2 days of experience with a song tutor, we observed large-scale differences in the spontaneous bursting activity recorded between the control and FP1-KD hemispheres (Fig. 8, B to I). We observed significantly fewer bursts in the FP1-KD hemispheres, and these bursts had fewer spikes and were shorter than bursts recorded in the control brain hemispheres (Fig. 8, C to I). The total number of spikes, interspike interval distribution, and interspike intervals within bursts were not affected by FP1-KD (Fig. 8B and fig. S10, C and D). This suggests that while overall activity levels were preserved in the two hemispheres, the tutor-song experience drove redistribution of activity into sustained bursting patterns in the control, but not in the FP1-KD hemispheres. This indicates that knockdown of FoxP1 is sufficient to block circuit-level hallmarks associated with forming tutor-song memories and, thus, impairs the encoding of vocal memories that guide the cultural transmission of song.
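
The burst measures compared across hemispheres (number of bursts, spikes per burst, burst length, and interburst interval) can be sketched from a list of spike times. The burst criteria used in the study are not stated in this section, so the sketch below assumes a simple maximum-interspike-interval rule. The threshold values (`max_isi_ms=10`, `min_spikes=3`) and all function names are hypothetical placeholders, not the authors' parameters.

```python
def detect_bursts(spike_times_ms, max_isi_ms=10, min_spikes=3):
    """Group sorted spike times (ms) into bursts.

    A burst is assumed to be a run of at least `min_spikes` spikes in which
    every consecutive interspike interval is <= `max_isi_ms`.
    """
    bursts, run = [], []
    for t in spike_times_ms:
        if run and t - run[-1] > max_isi_ms:
            # Gap too long: close the current run and keep it if long enough.
            if len(run) >= min_spikes:
                bursts.append(run)
            run = []
        run.append(t)
    if len(run) >= min_spikes:
        bursts.append(run)
    return bursts


def burst_summary(spike_times_ms, **criteria):
    """Summarize bursts: count, mean spikes per burst, mean length, mean gap."""
    bursts = detect_bursts(spike_times_ms, **criteria)
    n = len(bursts)
    if n == 0:
        return {"n_bursts": 0, "spikes_per_burst": None,
                "burst_length_ms": None, "interburst_interval_ms": None}
    # Interburst interval: end of one burst to the start of the next.
    gaps = [nxt[0] - prev[-1] for prev, nxt in zip(bursts, bursts[1:])]
    return {
        "n_bursts": n,
        "spikes_per_burst": sum(len(b) for b in bursts) / n,
        "burst_length_ms": sum(b[-1] - b[0] for b in bursts) / n,
        "interburst_interval_ms": sum(gaps) / len(gaps) if gaps else None,
    }


# Toy spike train: two bursts separated by an isolated spike at 500 ms.
spikes = [0, 5, 10, 500, 1000, 1004, 1008, 1012]
print(burst_summary(spikes))
```

With these placeholder criteria, the toy train contains two bursts (three and four spikes), and the isolated spike at 500 ms contributes to the total spike count but not to any burst. This is the kind of dissociation the text describes, where total spiking is preserved while its organization into bursts differs.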


Genetic manipulations of ASD-linked genes can now be applied in many animal models. However, it is challenging to study how these genes affect learning of culturally transmitted behaviors because most species used in ASD research do not transmit behavioral repertoires from one generation to the next via imitation (50). Zebra finches learn their song via imitation, continually transmitting this behavior across generations, thus providing an opportunity to examine the role of ASD-linked genes in developmental learning of a complex culturally transmitted behavior.

We found that FP1-KD in the songbird brain produces a series of synaptic, cellular, and network deficits, some of which resemble those recently described in mouse models (51–53). We show that FoxP1 is widely expressed in striatal-projecting HVCX mirror neurons and that FoxP1 expression in this circuit is essential for young birds to encode memories that are used to guide imitation, but not for learning how to vocally imitate a previously memorized song. While the functional significance of mirror neurons in behavioral imitation and ASD has been a topic of much debate (27, 29–31, 54, 55), our results suggest that they play an essential role in the cultural transmission of vocal behaviors from one generation to the next by helping to form memories of behaviors that young animals subsequently learn to imitate.

We found that FP1-KD in HVC impairs dendritic spine plasticity on HVCX neurons, dampens their intrinsic excitability, and blocks the cellular- and network-level signatures associated with tutor-song memory formation following experience with a song tutor. Although it is not yet clear whether the disruptions in learning shown here depend exclusively on HVCX neurons, as they are not the only cell type in HVC expressing FoxP1, we found that FoxP1 is broadly expressed across HVCX neurons and that our viral manipulations preferentially targeted these neurons. In addition to encoding tutor-song memories, HVC is also known to be essential for vocal imitation and song motor control (16–22, 39). Therefore, our finding that FP1-KD selectively disrupts experience-dependent tutor-song memory encoding, while leaving song sensorimotor imitation and vocal production intact, is unexpected and indicates a specialized role for the entire network of FoxP1-expressing HVC neurons in vocal learning.

In this context, we should note that HVCX neurons are developmentally well positioned to play a role in tutor-song memory. Zebra finches can begin to form tutor-song memories as early as 20 days after hatching. The connections between HVC and Area X, via HVCX neurons, are well established by this age. However, HVC neurons do not start to innervate HVC’s other main target, RA, until 25 to 30 dph, well after the time when birds are known to be forming tutor-song memories (56, 57).

HVCX neurons project onto medium spiny neurons in Area X, and it has recently been shown that FoxP1 is also expressed in medium spiny neurons in both the direct and indirect pathways traversing the vocal basal ganglia (14, 58). A recent study demonstrated that FP1-KD in Area X of juvenile zebra finches causes song learning deficits characterized by incomplete imitation of tutor songs (15). These birds accurately imitate ~70% of their tutor’s song syllables by adulthood. This degree of learning is in stark contrast to the complete lack of imitation we observe in birds with FP1-KD in HVC before song tutoring. It is still unclear whether the learning deficits following FP1-KD in Area X are associated with disruption in tutor-song memory formation or sensorimotor learning. FP1-KD in Area X was carried out in birds raised with their song tutor when they were in the midst of forming tutor-song memories (23 dph). Therefore, distinguishing the role of FoxP1 in this downstream circuit in encoding information about tutor-song memories will require further investigation.

Our examination of FoxP1 in HVC provides proof that a detailed characterization of how ASD-linked genes affect different aspects of vocal imitation can offer insights into their roles in neurodevelopmental disorders. Together, this study implicates a role for FoxP1, mirror neurons, and experience-dependent plasticity in forming the memories used to transmit communicative behaviors from one generation to the next.



Experiments described in this study were conducted using juvenile and adult male zebra finches (30 to 130 dph). Juvenile male zebra finches were raised in one of three ways: in isolation from an adult song model (isolates), with normal access to an adult song model (non-isolates), or as isolates that were then exposed to 2 days of tutoring by an adult song model (2-day tutored). All procedures were performed in accordance with protocols approved by the Animal Care and Use Committee at UT Southwestern Medical Center.

Viral vectors

The following adeno-associated viral vectors were used in these experiments: rAAV2/9/ds-CBh-GFP (The University of North Carolina at Chapel Hill Gene Therapy Center Vector Core) and pscAAV-GFP-shFoxp1 [Intellectual and Developmental Disabilities Research Center (IDDRC) Neuroconnectivity Core at Baylor College of Medicine]. All viral vectors were aliquoted and stored at −80°C until use.


pscAAV-GFP-shFoxP1 was generated by polymerase chain reaction (PCR) amplification of U6-shFoxp1 from pLKO.1 (TRCN0000072005, Broad Institute) while adding Not I and Bam HI sites and then ligating into pscAAV9-CBh-GFP (46) digested with these enzymes. The short hairpin target sequence was GCTAACACTAAACGAAATCTA. The PCR program used was 98°C 2 min, 35 × (98°C 10 s, 55°C 15 s, and 72°C 5 s), and 72°C 7 min. The primers used were 5′-ATAAGAATGCGGCCGCTTTCCCATGATTCCTTC-3′ (forward) and 3′-CGCGGATCCAAAAAGCTAACACTAAACG-5′ (reverse). The scrambled hairpin (shScramble, sequence CCACTGTACTATCTATAACAT) was designed as a control.

Stereotaxic surgery

All surgical procedures were performed under aseptic conditions. Birds were anaesthetized using isoflurane inhalation (0.8 to 1.5%) and placed in a stereotaxic surgical apparatus. The centers of HVC and RA were identified with electrophysiological recordings, and Area X and Av were identified using stereotaxic coordinates.

Viral injections to HVC were performed using previously described procedures (46). Briefly, AAV vectors (pscAAV-GFP-shFoxP1 or rAAV9/ds-CBh-GFP) were injected into HVC (50 nl per injection and ~60 injections, for a total of ~3 μl) at ~35 dph, and the transgenes were allowed to express for a minimum of 10 days. We also injected 500 to 950 nl of differently conjugated tracers (dextran, Alexa Fluor 488, or Alexa Fluor 594, 10,000 molecular weight; Invitrogen) bilaterally into birds’ Area X, Av, and RA, respectively. Tracer injections were performed at the following approximate stereotaxic coordinates relative to interaural zero and the brain surface (rostral-caudal, medial-lateral, and dorsal-ventral, in millimeters, head angle): HVC (0, ±2.4, 0.1 to 0.6, with 30° head angle), RA (−1.0, ±2.4, 1.7 to 2.4, with 30° head angle), Area X (5.1, ±1.6, 3.3, with 45° head angle), and Av (1.75, ±2.0, 1.0, with 45° head angle).

Tutoring conditions

“Social experience” groups [FP1-KD SE (n = 8), Ctrl SE (n = 10), and Scr SE (n = 3)]. Juvenile male zebra finches raised in isolation from an adult song model were injected with viruses into HVC at 35 to 40 dph. After optimal viral expression, these juveniles were housed with a song tutor from ~47 to 65 dph. All birds were separated from their tutors at 65 dph and raised to adulthood.

“Behavioral imitation” group [FP1-KD BI (n = 9)]. Juvenile male zebra finches were reared with a song tutor and injected with viruses into HVC at 35 to 40 dph. They continued to be housed with their song tutor after viral injections until 60 dph, at which point they were separated from their tutors and raised to adulthood.

“Two-day tutored” groups [FP1-KD 2d (n = 21) and Ctrl 2d (n = 21)]. Juvenile male zebra finches that were raised as isolates were injected with viruses into HVC at 35 to 40 dph. After allowing time for viral expression, a song tutor was placed into the isolate’s cage for 2 days of tutoring. Birds were then separated from their tutors and immediately used for experiments.

“Full isolates” (Fig. 3 and fig. S5, n = 8). Juvenile male zebra finches raised in isolation from an adult song model until at least 90 dph, after which they were housed in groups with other adult males.

“Isolates” [Figs. 4 and 5, FP1-KD (n = 3) and Ctrl (n = 2)]. Juvenile male zebra finches raised in isolation from an adult song model were injected with viruses into HVC at ~35 dph and housed without exposure to an adult song model until used for electrophysiology 10 to 12 days following viral injection.

“Normally reared” [Fig. 5, FP1-KD (n = 3) and Ctrl (n = 5)]. Juvenile male zebra finches were reared with a song tutor, injected with viruses into HVC at 85 to 95 dph, and used for electrophysiology experiments ~10 days later.

In vivo two-photon imaging

We conducted longitudinal dendritic spine imaging in male juvenile zebra finches raised in isolation from an adult song model (isolates) aged 45 to 55 dph or adult zebra finches 90 to 100 dph. Viruses (pscAAV-GFP-shFoxP1 or rAAV9/ds-CBh-GFP) were allowed to express for a minimum of 14 days before a cranial window over HVC was made. Birds were anaesthetized by isoflurane inhalation (0.8 to 1.5%) and positioned in a stereotaxic apparatus for cranial windowing. The scalp overlying HVC was removed, and the scalp margins were sealed to the surface of the skull using Vetbond (n-butyl cyanoacrylate). Bilateral craniotomies (~1 to 1.5 mm²) were made in the skull overlying HVC. The dura mater was excised, leaving the pia mater, the 60- to 150-μm-thick layer of neural tissue, and the lateral telencephalic ventricle overlying HVC intact. A custom-cut coverslip (no. 1 thickness) was placed directly on the pial surface and then sealed to the skull with dental acrylic. A head post was also affixed to the skull with dental acrylic. Birds were placed onto a custom stage under an Ultima IV Bruker laser scanning microscope running Prairie View software. Only HVC neurons that expressed both the retrograde tracer dextran Alexa Fluor 594 from Area X (HVCX neurons) and GFP from viruses were chosen for spine imaging. Dendritic segments of these neurons were imaged at high resolution during the bird’s subjective nighttime [1024 × 1024 pixels, 76 × 76 μm², 3.2 μs per pixel, averaging two samples per pixel with 1-μm z steps, focused through a ×20, numerical aperture (NA) 1.0 Zeiss IR-Achroplan immersion objective]. Birds were then returned to a darkened holding cage and allowed to sleep and were reimaged 2 to 4 hours later.

Spine image analysis

Dendritic spine images were analyzed as reported previously (17). Briefly, three-dimensional image stacks were auto-aligned and smoothed using a Gaussian filter (ImageJ), and the same dendritic segment, imaged twice with a 2-hour interval, was selected. Images exhibiting changes in fluorescence or rotational artifacts were excluded from further analysis. Sets of selected three-dimensional image stacks were scored by four researchers blind to the experimental condition to independently test the veracity of the comprehensive analysis carried out by a single researcher. This independent and blinded analysis verified that FP1-KD significantly reduced spine turnover in both juvenile and adult birds compared to age-matched controls (P < 0.05, paired t test). To assess spine growth and retraction, we compared individual dendritic spines across 2- to 4-hour time intervals and calculated spine stability ($N_{\text{stable}}/N_{\text{total}}$), spine elimination ($N_{\text{lost}}/N_{\text{total}}$), spine addition ($N_{\text{gained}}/N_{\text{total}}$), and spine turnover [$(N_{\text{gained}} + N_{\text{lost}})/(2N_{\text{total}})$], where $N_{\text{stable}}$ is the number of spines that were stable over the time interval, $N_{\text{lost}}$ is the number of spines lost over the time interval, $N_{\text{gained}}$ is the number of spines gained over the time interval, and $N_{\text{total}}$ is the total number of spines from the first imaging time point. Changes in spine density ($N_{\text{total}}$ divided by dendritic length in micrometers) were measured from the same dendritic segments used to assess spine turnover.
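The spine metrics defined above can be illustrated with a short Python sketch. This is not the analysis code used in the study; the function name `spine_dynamics` and its arguments are hypothetical, and the sketch assumes that every spine present at the first time point is scored as either stable or lost, so that the total at the first time point equals stable plus lost.

```python
def spine_dynamics(n_stable, n_lost, n_gained, dendrite_length_um):
    """Return spine stability, elimination, addition, turnover, and density.

    Hypothetical helper: counts come from comparing one dendritic segment
    across two imaging sessions.
    """
    # Assumption: each spine seen at the first time point is either stable or lost.
    n_total = n_stable + n_lost
    if n_total == 0 or dendrite_length_um <= 0:
        raise ValueError("need at least one spine and a positive dendrite length")
    return {
        "stability": n_stable / n_total,
        "elimination": n_lost / n_total,
        "addition": n_gained / n_total,
        # Turnover averages gains and losses relative to the initial spine count.
        "turnover": (n_gained + n_lost) / (2 * n_total),
        # Density in spines per micrometer at the first time point.
        "density_per_um": n_total / dendrite_length_um,
    }


# Example with made-up counts: 90 stable, 10 lost, 6 gained on a 50-um segment.
example = spine_dynamics(n_stable=90, n_lost=10, n_gained=6, dendrite_length_um=50.0)
```

With these illustrative counts, turnover is (6 + 10)/(2 × 100) = 0.08 and density is 2 spines per micrometer.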

In vivo extracellular recordings

Isolated, juvenile (30 to 35 dph) birds received a unilateral (pseudo-randomized) injection of pscAAV-GFP-shFoxp1 in HVC. The other hemisphere received the control virus (rAAV9/ds-CBh-GFP). After 10 to 12 days, we recorded HVC electrophysiological activity from both hemispheres (three to five recordings per hemisphere). We performed the recordings under light isoflurane anesthesia (0.8%) with Carbostar carbon electrodes (impedance: 1670 microhms/cm; Kation Scientific). To minimize variability, we advanced the electrodes in place in HVC. We then waited for 5 min to allow the activity to stabilize after the electrode penetration. We next collected five continuous minutes (300 s) of spontaneous activity and used these data for subsequent analysis. Recordings from both HVCs were sampled in a pseudo-randomized order. Signals were acquired at 10 kHz and filtered (high pass, 300 Hz; low pass, 20 kHz). We used Spike2 to analyze the spikes whose amplitude reached a threshold of 0.3 mV (determined on the basis of the average noise level among all the recordings). A Spike2 script was used to analyze the characteristics of spikes in bursting patterns. Bursts were defined as a minimum of two spikes separated by 10 ms or less, and the burst epoch was considered terminated if no spike was detected for 100 ms after the last spike of the burst. Values for each recording were averaged per hemisphere and statistically compared pairwise where indicated to reveal differences. For the comparison of burst lengths, the data from each hemisphere were normalized to produce a frequency distribution of the burst lengths for that hemisphere (bin size of 1 ms). The data were then compared across treatments and further subdivided into two burst duration categories: 5 to 15 ms and 15 to 2500 ms.
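The burst criteria described above can be sketched in Python as follows. This is an illustration only and not the Spike2 script used for the analysis. The function names are hypothetical, and the sketch rests on one reading of the definition: a burst begins when two consecutive spikes are separated by 10 ms or less, and it continues to absorb spikes until a silent period of at least 100 ms occurs.

```python
def detect_bursts(spike_times_ms, onset_isi_ms=10.0, end_gap_ms=100.0):
    """Group sorted spike times (ms) into bursts.

    Assumed interpretation: onset needs an inter-spike interval <= onset_isi_ms;
    the burst ends once no spike arrives for end_gap_ms after its last spike.
    """
    bursts = []
    current = None  # spikes of the burst being built, or None outside a burst
    for prev, nxt in zip(spike_times_ms, spike_times_ms[1:]):
        isi = nxt - prev
        if current is None:
            if isi <= onset_isi_ms:
                current = [prev, nxt]  # two close spikes open a burst
        elif isi < end_gap_ms:
            current.append(nxt)  # spike arrived before the termination gap
        else:
            bursts.append(current)  # silence of end_gap_ms or more closes it
            current = None
    if current is not None:
        bursts.append(current)
    return bursts


def burst_durations(bursts):
    """Duration of each burst, from its first to its last spike (ms)."""
    return [burst[-1] - burst[0] for burst in bursts]


# Made-up spike train (ms) for illustration.
spikes = [0, 4, 9, 60, 300, 305, 900]
bursts = detect_bursts(spikes)
durations = burst_durations(bursts)
```

The resulting durations could then be binned at 1 ms and split into the two duration categories named in the text; how boundary values such as exactly 15 ms are assigned is not specified there and would need to be chosen explicitly.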

Ex vivo physiology

Slice preparation. All extracellular solutions were adjusted to 310 mOsm (pH 7.3 to 7.4) and aerated with a 95% O2/5% CO2 mix. Zebra finches were first deeply anesthetized with isoflurane. Once the bird was no longer responsive to a toe pinch, it was quickly decapitated. The brain was removed from the skull and submerged in cold (1° to 4°C) oxygenated dissection buffer. Acute sagittal 230- to 250-μm brain slices were cut in dissection buffer at 4°C containing 225 mM sucrose, 3 mM KCl, 1.25 mM NaH2PO4, 26 mM NaHCO3, 10 mM d-(+)-glucose, 2 mM MgSO4, 0.5 mM CaCl2, and 2 mM kynurenic acid. Individual slices were incubated in a custom-made holding chamber saturated with 95% O2/5% CO2 at 34°C for 20 min and then kept at 30°C for a minimum of 45 min in artificial cerebrospinal fluid (aCSF) containing 126 mM NaCl, 3 mM KCl, 1.25 mM NaH2PO4, 26 mM NaHCO3, 10 mM d-(+)-glucose, 2 mM MgSO4, and 2 mM CaCl2.

Slice electrophysiological recording. Slices were constantly perfused in a submersion chamber with 32°C oxygenated normal aCSF. Patch pipettes were pulled to a final resistance of 3 to 5 megohms from filamented borosilicate glass on a Sutter P-1000 horizontal puller. HVCX cell bodies double-labeled with GFP and dextran Alexa Fluor 594 were visualized by epifluorescence imaging using a water immersion objective (×40, 0.8 numerical aperture) on an upright Olympus BX51 WI microscope, with video-assisted infrared charge-coupled device camera (QImaging Rolera). Data were low-pass–filtered at 10 kHz and acquired at 2 kHz with an Axon MultiClamp 700B amplifier and an Axon Digidata 1550B Data Acquisition system under the control of Clampex 10.6 (Molecular Devices).

For voltage clamp whole-cell recordings of HVCX projecting neurons, the internal solution contained 120 mM Cs-gluconate, 10 mM Hepes, 5 mM tetraethylammonium-Cl, 2.8 mM NaCl, 0.6 mM EGTA, 4 mM MgATP, 0.3 mM NaGTP, 5 mM BAPTA, and 7 mM QX314 chloride (adjusted to pH 7.3 to 7.4 with CsOH, 297 mOsm). For current clamp recordings, the internal solution contained 116 mM K-gluconate, 20 mM Hepes, 6 mM KCl, 2 mM NaCl, 0.5 mM EGTA, 4 mM MgATP, 0.3 mM NaGTP, and 10 mM Na-phosphocreatine (adjusted to pH 7.3 to 7.4 with KOH, 299 mOsm).

Electrically evoked synaptic currents were measured by delivering one electric stimulus (1 ms, 10 to 30 μA) every 12 s, with an isolation unit, through a glass stimulation monopolar electrode filled with aCSF, placed at about 50 to 100 μm from the recorded HVCX neuron. Synaptic responses were monitored at different stimulation intensities before baseline recording. “Normal” stimulation was defined as a stimulation reliably evoking a synaptic current in the range of 100 pA to 1 nA.

Excitatory and inhibitory synaptic currents were recorded in the whole-cell voltage clamp mode with the Cs-based patch pipette solution to measure the E/I ratio. Only recordings with series resistance below 20 megohms were included. Excitatory postsynaptic currents (EPSCs) and inhibitory postsynaptic currents (IPSCs) were recorded at the reversal potentials for IPSCs (−60 mV) and EPSCs (+10 mV), respectively, in the presence of the NMDA receptor antagonist APV (d,l-2-amino-5-phosphonovaleric acid; 100 μM). We also used the Cs-based pipette solution to measure the ratio between N-methyl-d-aspartate receptor–mediated currents (INMDA) and α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid receptor–mediated currents (IAMPA) in HVCX neurons. We added the γ-aminobutyric acid type A (GABAA) receptor antagonist picrotoxin (10 μM) to the aCSF for these recordings. IAMPA was recorded at a holding potential of Vh = −60 mV and measured at its peak. INMDA was recorded in the same cell at Vh = +40 mV. INMDA amplitude was calculated as the mean between 95 and 105 ms after the electric stimulation artifact to minimize the possible contamination by IAMPA. Access resistance (10 to 20 megohms) was monitored throughout the experiment.
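As an illustration of the INMDA and IAMPA measurements described above, the following Python sketch computes both amplitudes from sampled traces. It is not the analysis code used in the study. The function names are hypothetical, and the sketch assumes baseline-subtracted traces from which the stimulation artifact has already been removed.

```python
def ampa_peak(times_ms, current_pa, stim_ms):
    """Peak absolute current after the stimulus in the trace held at -60 mV."""
    post = [i for t, i in zip(times_ms, current_pa) if t > stim_ms]
    return max(abs(i) for i in post)


def nmda_amplitude(times_ms, current_pa, stim_ms, start_ms=95.0, stop_ms=105.0):
    """Mean current 95 to 105 ms after the stimulus in the trace held at +40 mV."""
    window = [
        i for t, i in zip(times_ms, current_pa)
        if stim_ms + start_ms <= t <= stim_ms + stop_ms
    ]
    return sum(window) / len(window)


def nmda_ampa_ratio(times_ms, trace_minus60_pa, trace_plus40_pa, stim_ms):
    """Ratio of the late +40 mV current to the peak -60 mV current."""
    return (nmda_amplitude(times_ms, trace_plus40_pa, stim_ms)
            / ampa_peak(times_ms, trace_minus60_pa, stim_ms))


# Synthetic traces sampled every 1 ms, with the stimulus at 10 ms (made-up values).
times = list(range(200))
trace_minus60 = [-200.0 if t == 12 else 0.0 for t in times]
trace_plus40 = [80.0 if t >= 10 else 0.0 for t in times]
ratio = nmda_ampa_ratio(times, trace_minus60, trace_plus40, stim_ms=10)
```

Measuring the +40 mV current in a late window, rather than at its peak, follows the rationale given in the text: the faster AMPA receptor component has largely decayed by then.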

Intrinsic excitability. Neuronal intrinsic excitability was examined with the potassium gluconate–based pipette solution. After whole-cell current clamp mode was achieved, cells were maintained at −80 mV. Input resistances were monitored by injecting a 150-ms hyperpolarizing current (40 pA) to generate a small membrane potential hyperpolarization from the resting membrane potentials. Firing rate represents the average value measured from one to three cycles (700-ms duration at 0.1 Hz, −200- to +250-pA range with 50- or 40-pA step increment, every 12 s).

Western blotting on HVC tissue

Samples for Western blotting were collected over 2 days between 9:00 a.m. and 12:00 p.m. from 90- to 92-dph birds that received either a control (rAAV9/ds-CBh-GFP), scrambled (rAAV9/CS-sh-scrambled-mCherry), or FoxP1 knockdown (pscAAV-GFP-shFoxP1) viral injection into HVC. Briefly, birds were anesthetized with isoflurane, and their brains were dissected. Sagittal sections (230 μm thick) were collected in aCSF using Vibratome VT1200. After confirming the GFP fluorescence under a microscope, HVC was dissected out from these sections. Phosphate-buffered saline (PBS) was pipetted out, and the tissue was lysed in radioimmunoprecipitation assay buffer containing protease and phosphatase inhibitors. Protein quantification was performed using a Bradford assay, and 50 μg of protein per well was used for Western blots. Proteins were run on a 10% SDS–polyacrylamide gel electrophoresis resolving gel, with 5% stacking gel at 80 V until the loading dye front ran off the gel, and were then transferred to an Immuno-Blot polyvinylidene difluoride membrane (Bio-Rad Laboratories) at 250 mA for 2 hours at 4°C. The membrane was dried at room temperature (RT) for 1 hour, reactivated in methanol, and blocked for 1 hour in 5% milk in tris-buffered saline (TBS). The membrane was cut above 50 kDa, and then each half was incubated with appropriate primary antibodies in 5% milk in TBS with 0.1% Tween 20 (TBS-T) overnight at 4°C. The following day, it was washed in TBS-T, incubated with secondary antibodies in 5% milk in TBS-T for 1 hour at RT, washed in TBS-T again, and imaged with TBS on the Odyssey Infrared Imaging System (LI-COR Biosciences). 
The following antibodies were used: rabbit α-FOXP1 (1:5000) (59), rabbit α-FOXP1 (#2005, Cell Signaling Technology; 1:1000), mouse α-GAPDH (glyceraldehyde-3-phosphate dehydrogenase) (#MAB374, Millipore; 1:10,000), donkey α-rabbit IgG (immunoglobulin G) IRDye 800 (#926-32213, LI-COR Biosciences; 1:20,000), and donkey α-mouse IgG IRDye 680 (#926-68072, LI-COR Biosciences; 1:20,000). The images were quantified using the Odyssey Imaging System (LI-COR Biosciences).


Immunohistochemistry experiments were performed following standard procedures. Briefly, birds were anesthetized with Euthasol (Virbac, TX, USA) and transcardially perfused with PBS, followed by 4% paraformaldehyde in PBS. Free-floating sagittal sections (40 μm) were cut using a cryostat (Leica CM1950, Leica). Sections were first washed in PBS. The tissues were then blocked in 5% normal donkey serum in PBST (0.3% Triton X-100 in PBS) for 1 hour at RT and incubated with primary antibodies diluted in the blocking buffer (5% donkey serum in PBST), first for 1 hour at RT and then at 4°C for 48 hours. Slices were then washed with PBS and incubated with fluorescent secondary antibodies (diluted in blocking buffer) at RT for 2 hours. After a final PBS wash, sections were mounted onto slides with Fluoromount-G (eBioscience, CA, USA). Composite images were acquired and stitched using an LSM 880 laser scanning confocal microscope (Carl Zeiss, Germany). The following primary antibodies were used: mouse anti-FoxP1 (ab32010, Abcam; 1:1000) and rabbit anti-FoxP1 (#2005, Cell Signaling Technology; 1:800). The following secondary antibodies were used: donkey anti-mouse conjugated to Alexa Fluor 405 (ab175658, Abcam; 1:200) and donkey anti-rabbit conjugated to Alexa Fluor 488 (A21206, Invitrogen; 1:500). All image analyses were performed using ImageJ, and graphs were prepared in GraphPad Prism 7.

Song analysis

A single day of undirected songs from each FP1-KD behavioral imitation (n = 9), FP1-KD social experience (n = 8), and control social experience bird (n = 10) was selected for analysis when the birds were between 90 and 100 dph. Adult isolate songs were obtained from eight birds ranging in age from 99 to 463 dph. Birds were recorded using Sound Analysis Pro 2011.

Syntax analysis

From the single day of song for each bird, a random subset of 30 song files was selected for syllable labeling. Syllables were labeled by hand by an expert based on the song spectrograms using a custom MATLAB program. Syllable labels, onset times, and offset times were exported to R for further analysis. Entropy rate was calculated using the ccber package in R for each bird based on the syllable labels, including a label reflecting the start or end of a song file. Entropy rate reflects an overall measure of the predictability of syllable transitions as a first-order Markov chain, calculated according to the following formula

$$\text{Entropy rate} = -\sum_{i}\sum_{j} \pi_i P_{ij} \log_2 P_{ij}$$

where $P_{ij}$ is the probability of transitioning from state $i$ to state $j$, and $\pi_i$ is the stationary distribution of the model for state $i$. A perfectly predictable sequence, where every syllable is always followed by the same syllable, will have an entropy score of 0. A maximally entropic syllable sequence, where there is an equal probability that a given syllable transitions to any of the possible syllables (or a file end), will have an entropy rate of $\log_2 K$, where $K$ is the number of different states or, in this case, syllables plus an additional state representing the end of a file.
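The entropy rate of a first-order Markov chain can be sketched in pure Python as below. This is not the ccber implementation. The function names are hypothetical, and the sketch assumes a transition table in which every state has outgoing transitions summing to 1 and every target is itself a state. The stationary distribution is approximated by iterating a "lazy" version of the chain, which has the same stationary distribution and also converges for periodic sequences.

```python
from math import log2


def stationary_distribution(P, n_iter=10_000, tol=1e-12):
    """Approximate the stationary distribution of transition table P.

    P maps each state to a dict {next_state: probability}.
    """
    states = sorted(P)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(n_iter):
        step = {s: 0.0 for s in states}
        for i in states:
            for j, p in P[i].items():
                step[j] += pi[i] * p
        # Lazy update: stay put half the time, which damps periodic oscillation.
        new = {s: 0.5 * pi[s] + 0.5 * step[s] for s in states}
        if max(abs(new[s] - pi[s]) for s in states) < tol:
            return new
        pi = new
    return pi


def entropy_rate(P):
    """Entropy rate in bits: -sum_i sum_j pi_i * P_ij * log2(P_ij)."""
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * p * log2(p)
        for i, row in P.items()
        for p in row.values()
        if p > 0
    )


# A perfectly predictable cycle and a maximally entropic chain over K = 4 states.
cycle = {"A": {"B": 1.0}, "B": {"C": 1.0}, "C": {"A": 1.0}}
uniform = {s: {t: 0.25 for t in "ABCD"} for s in "ABCD"}
```

Under these assumptions, `entropy_rate(cycle)` evaluates to 0 and `entropy_rate(uniform)` to log2(4) = 2 bits, matching the two limiting cases described in the text.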

Syllable transition probabilities used to determine arrow thickness in Fig. 3 (D, H, L, and P) were calculated using the markovchain package in R. Each syllable was treated as a state in the first-order Markov chain, and file boundaries and gaps lasting longer than 100 ms were treated as “interbout gap” states. Each pairwise transition between states (syllables or interbout gaps) was tallied to determine the probability of each of the possible states occurring following a given current state, generating a two-dimensional transition probability matrix. Arrow thicknesses in Fig. 3 (D, H, L, and P) are directly proportional to the probability that the origin syllable would transition to the syllable designated by the arrow.
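The tallying of transitions, including interbout gap states, can be sketched in Python as follows. This is an illustration and not the markovchain package or the authors' R code. The names `GAP`, `state_sequence`, and `transition_probabilities` are hypothetical, and each file is assumed to be a list of `(label, onset_ms, offset_ms)` tuples in temporal order.

```python
from collections import Counter, defaultdict

GAP = "gap"  # hypothetical label for the interbout gap state


def state_sequence(files, max_gap_ms=100.0):
    """Flatten labeled files into one state sequence with gap states inserted."""
    seq = [GAP]  # the first file starts at a file boundary
    for syllables in files:
        if not syllables:
            continue
        prev_offset = None
        for label, onset, offset in syllables:
            # A silent interval longer than max_gap_ms counts as an interbout gap.
            if prev_offset is not None and onset - prev_offset > max_gap_ms:
                seq.append(GAP)
            seq.append(label)
            prev_offset = offset
        seq.append(GAP)  # file boundary
    return seq


def transition_probabilities(seq):
    """Tally pairwise transitions and normalize each row to probabilities."""
    counts = defaultdict(Counter)
    for current, following in zip(seq, seq[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Two made-up files: times are in ms relative to the start of each file.
files = [
    [("a", 0, 50), ("b", 60, 120), ("a", 300, 350)],
    [("a", 0, 50), ("b", 70, 110)],
]
seq = state_sequence(files)
probs = transition_probabilities(seq)
```

Each row of `probs` corresponds to one origin state, so a row's values could be mapped directly to arrow thickness in a transition diagram.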

Syntax raster plots shown in Fig. 3 (C, G, K, and O) were created using custom R code. Syllable labels are first arranged into bouts. Bouts are considered strings of at least two syllables that are not separated by a gap longer than 100 ms or a file boundary. All bouts are then arranged according to user-specified primary and secondary alignment syllables. These alignment syllables are chosen to maximize the overall alignment of all bouts in the final raster plot. Each bout is shifted along the x axis such that the first occurrence of the primary alignment syllable occupies the 0 position across all bouts, with the order of syllables maintained within each bout. If a bout does not contain the primary alignment syllable, it is shifted such that the first secondary alignment syllable occupies the 0 position, and the order of syllables within the bout is maintained. Bouts that contain neither the primary nor secondary alignment syllables are plotted above the others at an offset such that the last syllable in that bout occupies the x position −1. Once aligned, the bouts are ordered along the y axis alphabetically based on the syllable labels following the alignment syllable. The result is a representation of the sequence of syllables across all labeled bouts from a single bird arranged to maximize the emergence of patterns and dominant sequences.
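The alignment and ordering steps for the raster plots can be sketched in Python as below. This is not the custom R code used for the figures. The function name `align_bouts` is hypothetical, and bouts are assumed to be lists of syllable labels. Each returned row pairs a bout with an x offset, so that syllable `k` of the bout would be drawn at x position `offset + k`.

```python
def align_bouts(bouts, primary, secondary):
    """Return (x_offset, bout) rows ordered from bottom to top of the raster."""
    aligned, unaligned = [], []
    for bout in bouts:
        if primary in bout:
            idx = bout.index(primary)  # first occurrence of the primary syllable
        elif secondary in bout:
            idx = bout.index(secondary)  # fall back to the secondary syllable
        else:
            # No alignment syllable: place the last syllable at x = -1.
            unaligned.append((-len(bout), bout))
            continue
        aligned.append((-idx, bout))  # alignment syllable lands at x = 0
    # Order alphabetically by the labels that follow the alignment syllable.
    aligned.sort(key=lambda row: row[1][-row[0] + 1:])
    # Bouts without either alignment syllable are plotted above the others.
    return aligned + unaligned


# Made-up bouts for illustration.
bouts = [["i", "a", "b", "c"], ["a", "b", "d"], ["c", "d"], ["x", "y"]]
rows = align_bouts(bouts, primary="a", secondary="c")
```

Choosing the primary and secondary alignment syllables is left to the user, as in the text; the sketch only applies a given choice.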

Similarity scoring

The lack of stereotyped motif structure in the isolate tutored shFoxP1 birds made it difficult to use standard automated song similarity scoring programs. Instead, similarity between spectrograms was evaluated by 11 human experts. Each was presented with a total of 126 spectrogram pairs and was instructed to rate the similarity of the two spectrograms on a scale of 1 (not similar) to 10 (very similar), following a five-comparison training set. The spectrograms used were all generated using Sound Analysis Pro 2011, and all reflected a duration of approximately 2.5 s. All spectrograms began at the beginning of a song bout and, with the exception of songs that lacked a typical motif structure, included at least one full song motif. The order of comparisons within the test set was randomized for each participant and included four comparisons between tutee and tutor for each tutee in the FP1-KD behavioral imitation, FP1-KD social experience, and control social experience groups (4 × 27 birds), eight comparisons between a tutor and itself to ensure that scorers were using the full range of the scale, and 10 duplicated tutor-tutee pairs to ensure that the scorers were internally consistent.

No individual scorer differed from the mean score by more than an average of 2 SDs, none differed by more than an average of 2 points on the duplicated comparisons, and all but one made use of the full 10-point scale (they gave at least one comparison a score of 1 and at least one a score of 10). One scorer never gave any comparison a perfect 10/10 score, so their scores were rescaled such that they spanned the full 1 to 10 range. A percentage similarity score was calculated for each bird by taking the mean similarity rating for all four comparisons to the tutor across all scorers. The full scoring set is available at
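The rescaling of one scorer's ratings and the per-bird averaging can be sketched in Python as follows. The text does not state how the rescaling was done, so the linear (min-max) mapping used here is an assumption, as are the function names.

```python
def rescale_scores(scores, low=1.0, high=10.0):
    """Linearly map one scorer's ratings so they span low to high.

    Assumption: a min-max rescaling; the source does not specify the method.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("scores must contain at least two distinct values")
    return [low + (s - lo) * (high - low) / (hi - lo) for s in scores]


def mean_similarity(ratings_by_scorer):
    """Mean rating for one bird across all scorers and all of its comparisons.

    ratings_by_scorer is a list with one list of ratings per scorer.
    """
    all_ratings = [r for scorer in ratings_by_scorer for r in scorer]
    return sum(all_ratings) / len(all_ratings)


# Made-up ratings from a scorer who never used the top of the scale.
rescaled = rescale_scores([1, 3, 5, 9])

# Made-up ratings for one bird: three scorers, four tutor comparisons each.
bird_mean = mean_similarity([[2, 3, 2, 3], [1, 2, 2, 3], [3, 3, 2, 4]])
```

After rescaling, the lowest rating maps to 1 and the highest to 10, with intermediate ratings spaced proportionally.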

Sound Analysis Pro 2011 was used for song similarity scoring of Scr, FP1-KD SE, and Ctrl SE birds in fig. S6. A representative motif from the song tutor was selected and compared to 30 to 60 different motif renditions from their pupil bird, recorded when the pupil was between 80 and 100 dph. These comparisons were performed using the asymmetric time-courses similarity tool. The final percentage similarity score for each bird is the mean of the percentage similarity of the 30 to 60 comparisons.

Statistical analysis

All data were tested for normality using the Shapiro-Wilk test. Parametric or nonparametric statistical tests were then used as appropriate: t test (unpaired or paired, as indicated), Mann-Whitney test (unpaired), or Wilcoxon matched-pairs signed-rank test (paired). One-way analysis of variance (ANOVA) or Kruskal-Wallis test was performed when comparisons were made across more than two conditions. Two-way ANOVA (post hoc Sidak) was used to test differences between two or more groups across different conditions. Statistical significance is denoted as *P < 0.05, **P < 0.01, and ***P < 0.001. Statistical details for all experiments are included in the corresponding figure legends.


Supplementary materials for this article are available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We thank members of the Roberts and Konopka laboratories for discussion and comments on the manuscript and A. Guerrero and M. Harper for laboratory support. Funding: This research was supported by grants from the NIH (R21DC016340 to T.F.R. and G.K., R01NS108424 and R01DC014364 to T.F.R., and R01MH102603 to G.K.) and the NSF (IOS-1457206 to T.F.R.). F.A. and D.H.A. were supported by T32HL139438. Author contributions: T.F.R. and G.K. conceived the project. T.F.R. supervised the research. F.G.-O., T.M.I.K., H.P., M.T., and T.F.R. designed the experiments and wrote the manuscript with input from all authors. F.G.-O., T.M.I.K., H.P., M.T., V.D., M.C., S.E.P., D.H.A., and J.E.H. collected and analyzed the data for the project. G.K., M.C., S.E.P., and F.A. designed and tested the knockdown viral construct. All authors read and commented on the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
