Research ArticleNEUROSCIENCE

Does the mismatch negativity operate on a consciously accessible memory trace?

See allHide authors and affiliations

Science Advances  13 Nov 2015:
Vol. 1, no. 10, e1500677
DOI: 10.1126/sciadv.1500677


The extent to which the contents of short-term memory are consciously accessible is a fundamental question of cognitive science. In audition, short-term memory is often studied via the mismatch negativity (MMN), a change-related component of the auditory evoked response that is elicited by violations of otherwise regular stimulus sequences. The prevailing functional view of the MMN is that it operates on preattentive and even preconscious stimulus representations. We directly examined the preconscious notion of the MMN using informational masking and magnetoencephalography. Spectrally isolated and otherwise suprathreshold auditory oddball sequences were occasionally random rendered inaudible by embedding them in random multitone masker “clouds.” Despite identical stimulation/task contexts and a clear representation of all stimuli in auditory cortex, MMN was only observed when the preceding regularity (that is, the standard stream) was consciously perceived. The results call into question the preconscious interpretation of MMN and raise the possibility that it might index partial awareness in the absence of overt behavior.

  • consciousness
  • awareness
  • Perception
  • short-term memory
  • mismatch negativity
  • auditory cortex
  • magnetoencephalography
  • ERP
  • ERF


Short-term memory is a fundamental construct in cognitive science, where one of the central questions is whether and to what extent its contents are consciously accessible (13). In audition, the neural basis of short-term memory is often studied via mismatch negativity (MMN) (46), a change-related component of auditory evoked response and one of the most oft-studied brain responses in neuroscience. Normally elicited by a deviant stimulus in the so-called auditory oddball paradigm (6), MMN can be observed in response to any discriminable violation of an otherwise regular stimulus sequence (7).

Although its sensitivity to task and stimulation contexts is well established (8), the prevailing functional view of MMN is that it reflects automatic change detection, operating on a preattentive and even preconscious representation of the preceding context regularity (that is, the standard stream) (7). On the basis of this interpretation, MMN has been used extensively to assess the neural basis of preconscious auditory memory. Apart from the many studies claiming that it is only mildly sensitive to manipulations of directed attention, the preconscious view of MMN also stems from the fact that it has sometimes been observed in various states of behavioral unconsciousness (for example, sleep, sedation, and disorders of consciousness) (9, 10). However, these studies have failed to reach a consensus on whether MMN can be elicited in the absence of perceptual awareness of the standard stream, and to our knowledge, this has never been directly examined in the healthy, waking brain.

The present study addressed this question using informational masking (11) and simultaneous magnetoencephalography (MEG). Classical auditory oddball sequences were embedded in random multitone masker “clouds” (Fig. 1A and audio S1 and S2), rendering them sometimes inaudible despite spectral isolation and suprathreshold sensation levels (12). In each trial, listeners (N = 20) indicated via button press the moment at which they began to hear out the regular standard-stream sequence from the irregular multitone background, allowing us to bin MEG responses according to listeners’ moment-by-moment perception of the standard stream. Sequences consisting only of the multitone cloud stimulus (“catch trials”) facilitated the measurement of false-positive rates and the sensitivity index d′. Listeners were instructed to ignore the pitch change associated with the frequency deviant. Thus, in contrast to previous work examining the task dependence and context dependence of MMN, our paradigm required no stimulus or task manipulations to produce different perceptual interpretations of the deviant-preceding standard stream.

Fig. 1 Stimulation paradigm and behavioral results.

(A) Spectrographic representation of a targets-present masked trial (left), a targets-absent masked trial (middle), and an unmasked control trial (right). (B) True- and false-positive rates (that is, hits and false alarms) as a function of position within the stimulus sequence. (C) Corresponding d′. (D) True-positive rates as a function of position, binned by target frequency.


Listeners were sensitive to the presence of oddball sequences

On average, only 59% of all standard streams were perceived when they were embedded in multitone masker clouds (Fig. 1B). Listeners were sensitive to the presence of such sequences, however, as the false-positive rate never exceeded 7%, on average, resulting in a sensitivity index (d′) that plateaued around 2 (Fig. 1C). Repeated-measures analysis of variance (ANOVA) confirmed significant main effects of response (true-positive versus false-positive rates, F1,19 = 259.8, P < 0.0001), tone position (F1,19 = 212.7, P < 0.0001), and their interaction (F1,19 = 129.6, P < 0.0001), indicating a larger difference between true-positive and false-positive rates for later tone positions. Sorting the true-positive rates by standard-stream frequency further revealed that lower-frequency standard streams were easier to detect, especially for later tone positions (Fig. 1D) (main effect of frequency: F1,19 = 106.4, P < 0.0001; main effect of tone position: F1,19 = 237.1, P < 0.0001; two-way interaction between frequency and tone position: F1,19 = 56.0, P < 0.0001).

MMN was not generated in the context of unperceived standard streams

To examine whether perceptual awareness of the standard stream is required for MMN generation, we first verified that MMN could be elicited by our oddball paradigm in the absence of the multitone masker. Indeed, in quiet, deviants elicited larger peak amplitudes than standards in both left and right auditory cortex (AC) on the anterior superior temporal plane (aSTP) (Fig. 2, A to C) (deviants versus standards: F1,19 = 19.5, P < 0.0005).

Fig. 2 Neural activity in the control (A to C) and masked (D to F) conditions.

(A) Grand-average normalized dSPM between 100 and 150 ms for deviants minus standards under control (unmasked) conditions. (B) Grand-average MxNE solutions for (top) standards (black traces) and deviants (gray traces) and (bottom) their respective difference waveforms. (C) Quantified amplitudes for standards (black) and deviants (gray). (D) Grand-average MxNE solutions for virtual targets (gray), deviants and standards for undetected standard streams (blue), deviants and standards for detected standard streams (orange), and their respective difference waveforms (deviants minus standards). (E) Corresponding dipole locations. (F) Quantified amplitudes for MMN responses generated in the context of detected (orange) and undetected (blue) standard streams. au, arbitrary units.

For oddball sequences embedded in multitone masker clouds, MEG responses to both deviant and standard stimuli were binned and averaged separately according to the context in which they occurred (that is, perceived or unperceived standard streams) (Fig. 2, D to F). Additional epochs, composed of the MEG response to “virtual targets,” were created from the catch trial (that is, masker-alone) stimuli by time-locking the MEG signal relative to the potential positions of target tones, had they been present. This condition serves as a baseline and should result in a flat average response due to the random nature of the time-locking, thus yielding an empirical estimate of the signal-to-noise ratio of the averaging procedure (Fig. 2D, gray traces).

Examining the source waveforms from this analysis, it is immediately apparent that only deviants occurring in the context of perceived standard streams elicited an MMN (Fig. 2D, orange and blue traces) located on the aSTP (Fig. 2E). A three-way ANOVA performed on quantified amplitudes (Fig. 2F), with deviance (deviants versus standards), percept (detected versus undetected), and hemisphere (left versus right) as factors, revealed significant main effects of deviance (F1,19 = 11.4, P < 0.005) and percept (F1,19 = 29.5, P < 0.0005), as well as a significant two-way interaction between them (F1,19 = 5.7, P < 0.05), suggesting that deviants elicited larger responses than standards, but only if they occurred during perceived standard streams. This was confirmed by two subsequent two-way ANOVAs (each with deviance and hemisphere as factors) for amplitudes in the context of (i) undetected standard streams (where there was no significant effect of deviance) and (ii) detected standard streams (where there was a significant effect of deviance) [undetected: F1,19 = 0.0, not significant (n.s.); detected: F1,19 = 13.9, P < 0.005], as well as a direct comparison of MMN amplitude from deviant-standard subtraction waveforms (detected versus undetected: F1,19 = 5.7, P < 0.05; no significant effect of hemisphere; no significant interaction).

A supplemental analysis controlling for the frequency bias in the behavioral data (cf. Fig. 1D) produced the same pattern of results (fig. S1). In addition, the per-frequency deviance responses in the control conditions were nearly identical (fig. S2). Thus, the difference between detected and undetected deviance responses cannot be accounted for by differing distributions of higher-frequency versus lower-frequency stimuli comprising the detected and undetected bins.

To further examine whether MMN may have been elicited during unperceived standard streams, we conducted a Bayes factor analysis, which provides an indication of whether nonsignificant results actually support the null hypothesis (Bayes factors ≤ 1/3, by convention) versus being merely insensitive to differences between groups (Bayes factors around 1) (cf. Materials and Methods) (13). Assuming a uniform distribution of plausible population effect sizes with the maximum taken as the mean MMN amplitude elicited in the context of perceived standard streams, we obtained a Bayes factor of 0.29 (0.27 after controlling for the aforementioned frequency bias). Even when using a more conservative half-normal effect-size plausibility distribution, we still obtained values of 0.42 and 0.37 for the uncontrolled and controlled analyses, respectively. Thus, rather than merely being insensitive to its potential presence, our data provide evidence that no MMN was generated in the context of unperceived standard streams.

All stimuli were represented in primary AC

The lack of an MMN during unperceived standard streams raises the question of whether unperceived standards were at all represented at the cortical level. We therefore carried out an additional source analysis focused on early activity, which indicated that all stimuli generated a P1 response (already visible in the MMN-focused analysis of Fig. 2D) irrespective of listeners’ awareness (Fig. 3 and table S1). Thus, all stimuli were represented in cortex at an early stage of processing. Like the MMN, the P1 was also significantly larger for deviants that occurred in the context of perceived standard streams, but not unperceived standard streams. This was confirmed by a two-way interaction between deviance and percept (F1,19 = 10.3, P < 0.005; no significant main effects), as well as by a significant effect of deviance in the context of detected standard streams (F1,19 = 6.9, P < 0.05), but no such effect in the context of undetected standard streams (F1,19 = 1.8, n.s.). The same pattern of results held after epoch equalization was applied, although the comparisons did not reach statistical significance (fig. S3).

Fig. 3 Neural activity under masked conditions, with emphasis on the P1 latency range.

(A) Grand-average MxNE solutions for virtual targets (gray), deviants and standards for undetected standard streams (blue), deviants and standards for detected standard streams (orange), and their respective difference waveforms. (B) Corresponding dipole locations. (C) Quantified amplitudes for P1 responses generated in the context of detected (orange) and undetected (blue) standard streams, for both standards (left) and deviants (right).

Masker tones elicited similar activity before and after standard-stream detection

To further evaluate a potential attentional interpretation of our findings, we carried out an additional analysis on neural responses to masker tones. If the difference in the responses observed in the context of detected versus undetected standard streams were the result of selective attention to the standard stream after its detection, we would expect to observe correspondingly weaker responses to masker tones after listeners became aware of the standard stream. Although there was a trend for masker tone P1 responses to be smaller after target-stream detection in left AC, this effect was small and not present in right AC (Fig. 4 and table S2; see also fig. S4). Furthermore, we observed similar effects when masker-evoked responses were compared between the early portions and the late portions of the trials for which target streams were either absent or present but subliminal, suggesting that the effect is likely related to adaptation rather than attention. Finally, an attention-based account of our findings would also predict enhanced P1 responses for detected versus undetected standards. However, if anything, we observed the opposite (that is, smaller P1 responses for detected versus undetected standards) (Fig. 3C) (left AC: T19 = 2.1, P < 0.05; right AC: T19 = 0.82, n.s.).

Fig. 4 Neural activity elicited by masker tones, with emphasis on the P1 latency range.

(A) Grand-average MxNE solutions for responses elicited by masker tones for targets-detected trials (solid traces; top) and targets-undetected/targets-absent trials (dotted traces; bottom) for early (green) and late (magenta) time intervals. Epochs for targets-detected trials were created by splitting those trials into two time intervals (before and after target detection). Epochs for targets-undetected/targets-absent trials were created by splitting those trials into two time periods based on the average detection time of the targets-detected trials. (B) Corresponding dipole locations. (C) Quantified P1 amplitudes in response to masker tones both before (green) and after (purple) detection of the standard stream. Closed (open) bars are for targets-detected (targets-undetected) trials. A three-way ANOVA with hemisphere, target detection, and time interval as factors revealed significant two-way interactions between hemisphere and interval (F1,19 = 9.2, P < 0.01) as well as a marginally significant two-way interaction between target detection and interval (F1,19 = 3.3, P = 0.09). However, subsequent paired comparisons did not show significant effects of the time interval on either left AC or right AC in the direction one would expect based on an attentional account of the data (table S2). A similar pattern of results was observed when the more anterior MMN-defined source space was used (cf. figs. S4 and S6).

Finally, in addition to P1, masker tones also elicited a small but reliable N1, which is generally much more sensitive to attention than P1 (14). We thus performed a similar masker-tone analysis focused on the N1 (Fig. 5 and table S2; see also fig. S5). Here, the before/after effect present in targets-detected trials (that is, smaller masker-elicited N1 responses after target detection) was even larger for targets-undetected and targets-absent trials, the opposite of what would be expected based on an attentional suppression of masker-tone responses after target detection. These data are thus strong evidence against an attentional interpretation of our data and suggest that the reduction in amplitude observed for after-detection versus before-detection masker responses (for both the P1 and the N1) was the result of adaptation rather than a shift of selective attention.

Fig. 5 Neural activity elicited by masker tones, with emphasis on the N1 latency range.

(A) Grand-average MxNE solutions. (B) Corresponding dipole locations. (C) Quantified N1 amplitudes. A three-way ANOVA with hemisphere, target detection, and time interval as factors revealed a significant main effect of interval (F1,19 = 15.1, P < 0.005) as well as a significant two-way interaction between interval and target detection (F1,19 = 5.1, P < 0.05). However, this interaction went in the opposite direction that would be expected based on an attentional account of the data (that is, the early-late difference was larger for targets-undetected and targets-absent trials; table S2). Here, the source space used for the masker-elicited N1 was the same as for the masker-elicited P1. The same pattern of results was found when the more anterior MMN-defined source space was used (cf. figs. S5 and S6).


Our findings have several important implications for the interpretation of MMN, its neural generators, and its diagnostic utility in disorders of consciousness. Functionally, they suggest that MMN operates on conscious stimulus representations rather than preconscious ones (7), consistent with a recent interpretation (8) emphasizing its sensitivity to auditory perceptual organization, which, in our paradigm, likely determined listeners’ perception of the standard stream. However, in contrast to previous studies, our paradigm required no task or stimulus differences to manipulate MMN. This interpretation does not necessarily contradict previous work showing that behavioral detection of deviance may not be necessary for MMN generation (15). However, others have shown a strong correspondence between MMN latency and behavioral reaction time (16), highlighting MMN’s proximity to conscious processes. Nor does our interpretation argue against the existence of preconscious auditory memory traces (17), only that such traces may be insufficient for the MMN process (18).

An alternative interpretation that cannot be ruled out relates to attention. Dissociating awareness from attention is certainly challenging (2, 19). Here, because hearing the standard stream makes it easier to attend, attention—and not awareness—might have been the crucial factor underlying MMN generation. Although this interpretation is attractive given that (i) attention can modulate MMN (2024) and (ii) general attention to the stimuli used here was undoubtedly required for listeners to perform their detection task, we find it unlikely for several reasons.

First, the results of our masker-tone analysis strongly argue against an attentional interpretation. Second, several studies have observed MMN when participants actively ignored the stimuli by engaging in other, sometimes highly demanding tasks (7, 25). In these studies, basic perception of the standard stream is nevertheless likely given the otherwise quiet conditions. Third, in contrast to previous studies examining attentional modulation of MMN, the deviant tones used here were never task-relevant; thus, although listeners could have attended to the standard streams after becoming aware of them, they were neither required nor instructed to do so. Last, as we detail below, MMN can sometimes be observed during states of behavioral unconsciousness (that is, sleep, sedation, and minimally conscious state), where basic awareness of sensory stimuli seems more likely than the ability to attend them.

Thus, the strong attentional effects observed by previous MMN studies may have instead been a result of stimuli/task manipulations inducing changes in awareness. However, a recent study using perceptually bistable triplet sequences did not find such effects (26), perhaps because the deviant stimulus used was deviant regardless of perceptual organization.

Anatomically, our results agree with previous work by Näätänen and Alho (27) suggesting that the aSTP is the predominant MMN generator. First, it was clearly the most active area in our source analysis. Second, only deviants embedded in detected standard streams elicited an MMN, and previous work has implicated nonprimary AC in auditory perceptual awareness (28). Finally, all stimuli elicited earlier responses (P1) thought to arise from primary AC (29), and although intracortical data from humans (30), cats (31), and nonhuman primates (32) have identified an MMN-like response there, our source analyses did not show deviance-related activity in this area at MMN latency.

In addition to temporal generators, some studies have reported a frontal component of MMN (33), consistent with diminished MMN amplitudes after frontal lesions (34) and modeling work (35) [but see Wacongne et al. (36)]. There was no evidence for such a component in our data, suggesting either that no frontal activity was evoked by our paradigm or that MEG may not be sensitive to frontal MMN sources (37). This does not rule out that frontal processes, even if they do not contribute to MMN per se, might constitute a necessary antecedent to its generation. To the extent that frontal cortex is necessary for conscious access (38), this would be consistent with our finding that MMN generation required perceptual awareness of the standard stream.

Although we did not observe a deviance effect in primary AC at the time of MMN, our data did show an earlier effect around the P1 latency, consistent with aforementioned animal models and recent human studies (39). Like MMN, this effect was observed only in the context of perceived standard streams, a somewhat surprising result in light of the present and previous results showing that the presence or absence of P1 does not covary with perceptual awareness (40). If the P1 deviance effect we observed is related to stimulus-specific adaptation (SSA), as has been observed in other contexts (41, 42) and proposed for MMN (43, 44), then such SSA might be highly context-dependent—even at early stages of cortical processing—possibly as a result of perceptually gated tuning-curve sharpening (45, 46).

However, a recent study observed an early (~85 ms) deviance effect that persisted in deep sleep (47), an apparent discrepancy with our finding that this early deviance response (~55 ms in our data) was abolished during unperceived standard streams. The reasons for this are unclear but may be related to the tuning-curve hypothesis outlined above (that is, when subliminal, standards and deviants activated similar neuronal populations). Here, it is worth noting that Straus and colleagues (47) did not observe this early deviance response for “global” deviants (that is, stimuli that were physically identical to the immediately preceding context but deviant from the standard pattern on that block) (48). Another possibility is substantial paradigmatic difference (vowels versus tones, short versus long stimulus onset asynchrony, no masker). Both arguments would predict that the P1-related deviance effect we observed could appear in the context of unperceived standard streams given appropriate stimulus parameters.

A major difference between typical MMN paradigms and that used here is the complexity of the stimuli. Whereas the canonical MMN paradigm involves only a single stream of tones (frequent standards and rare deviants, as in our control conditions), the multitone masker presents listeners with a perceptually demanding segregation task and probably led to the slightly later/broader MMN we observed in masked versus control conditions. Specifically, the multitone masker likely decreased the salience of the difference between deviants and standards, a factor known to alter MMN morphology and peak latency (15, 16).

Whether generation of MMN requires standard-stream awareness in simpler stimulus configurations is debatable. However, in quiet, any oddball sequence for which MMN can be elicited will nearly always be perceived, and devising a paradigm in which oddball sequences are consciously accessible yet sometimes subliminal is challenging, and a primary advantage of the paradigm used here. Another way that this has been examined is through the recording of evoked responses during various states of behavioral unconsciousness, including sleep (49, 50), sedation (51, 52), and disorders of consciousness (10), where it is often concluded that MMN can be observed in the absence of consciousness, defined behaviorally (9). As mentioned in the Introduction, however, these studies have failed to reach a consensus on the relationship between perceptual awareness and MMN.

During sleep, MMN is typically observed only in particular stages [for example, rapid eye movement (REM) sleep or light stage 1] where individuals can be partially aware of—even behaviorally responsive to—external stimuli (47, 5355) and is completely abolished in deeper stages except for hypersalient, very rare, or extremely deviant stimuli (56). Some studies have failed to observe MMN even in REM sleep (57, 58). A very recent study has confirmed this general pattern (47) and, given that MMN was the only component to be completely abolished in deep sleep (in fact, before that, in nonresponsive stage 1), is consistent with the notion that MMN could reflect partial awareness. The results are clearer in the anesthesia literature, where MMN can be observed under sedation (51, 52) but with markedly reduced amplitudes; it is abolished during (59) or even before (60) behavioral unconsciousness. The notion that MMN is linked to perceptual awareness is also supported by studies showing its sensitivity to violations of syntax (25, 61), which has been shown to depend on listeners’ level of consciousness (62).

Perhaps most controversial is the literature on disorders of consciousness, where many studies have observed MMN during behavioral unconsciousness. Crucially, however, the presence of an MMN is highly predictive of imminent recovery in such patients (10, 63, 64), even though they are not generally thought to be even partially aware at the time of its elicitation (9). Our findings suggest arriving at this conclusion with caution and raise the possibility that the presence of MMN, or its emergence across recording sessions (65), could be one of the earliest available indicators of partial awareness.


Ethics statement

All procedures were approved by the Institutional Review Board at the University Hospital Heidelberg. Each participant provided written informed consent before participation.


Twenty healthy participants (female, 11; male, 9) aged between 21 and 32 years (mean ± SD, 26 ± 4 years) took part in the study. None of the listeners reported any history of hearing or neurological disorders. Seven additional subjects were excluded from the analysis because of poor behavioral performance (d′ < 1 or false-alarm rate > 0.3) (n = 3) or inability to obtain anatomical magnetic resonance imaging (MRI) (n = 4).

Stimuli and procedure

Sound stimuli were auditory oddball sequences (~5 s long) consisting of 10 short-duration pure tones (100 ms), with 10-ms on-ramps and off-ramps and 500-ms stimulus onset asynchrony. Only one deviant tone was presented in each trial and could occur anywhere between the third position and the eighth position within the sequence (inclusive). Four blocks of sequences (trials) were presented. The first and last blocks served as controls in which only oddball sequences (N = 120) were presented. Standard (deviant) tone frequencies were 589 (619), 902 (947), 1334 (1400), and 1931 (2028) Hz, reversed in half of the trials.

Listeners were asked to listen to the sequences and were required to initiate each trial with a button press. In addition to keeping the subjects alert during the measurement, the requirement that they initiate each trial also allowed them brief rest periods during which they were allowed to blink or close their eyes. In the second and third blocks, the same oddball sequences used in blocks 1 and 4 were embedded in a complex multitone masker consisting of tones randomly placed in time and frequency (12, 66). Individual masker tones were drawn from a log-uniform distribution between 200 and 5000 Hz (specifically 0.2, 0.25, 0.30, 0.36, 0.43, 0.51, 0.59, 0.68, 0.79, 0.90, 1.03, 1.17, 1.33, 1.51, 1.71, 1.93, 2.18, 2.45, 2.76, 3.10, 3.48, 3.90, 4.37, and 4.89 kHz). Listeners’ tasks on these blocks were to (i) indicate via button press the moment at which they began to hear out a regularly repeating target-tone stream amidst the multitone masker and (ii) initiate the start of the next trial. For the measurement of behavioral sensitivity (d′) to oddball sequences when embedded in multitone masker clouds, an oddball sequence was presented in only 80% (n = 120) of all trials (N = 150) in each masked block; the other 20% of the trials (n = 30) were “catch” trials in which no oddball sequence was presented. This enabled us to estimate false-positive rates and corresponding d′ values (z-transformed true-positive rates minus z-transformed false-positive rates) (67).

All sound stimuli were generated in MATLAB (The MathWorks) and stored as 32-bit .wav files with a sampling rate of 48 kHz. Digital sound files were converted into analog waveforms by an on-board sound card (RME DIGI96/8 PAD) and a freestanding digital-to-analog converter (RME ADI-8 DS; RME), which in turn were controlled by SoundMexPro software (SoundMexPro) in the MATLAB environment. The analog stimuli passed through a programmable attenuator (TDT PA5; Tucker-Davis Technologies) before being amplified (TDT HB7; Tucker-Davis Technologies) and presented to listeners via ER-3 insert earphones (Etymotic Research) at a comfortable listening level. Listeners’ behavioral responses were recorded via an optical button device from Current Designs Inc.

Data acquisition and preprocessing

MEG was recorded at a sampling rate of 500 Hz (using a 160-Hz online low-pass filter) via a Neuromag-122 system with 2 orthogonal planar gradiometers at each of 61 locations around the head. Structural MRI scans with the same field of view and resolution (1-mm isotropic), including T1-weighted magnetization-prepared rapid gradient echo (MPRAGE) and multiecho fast low-angle shot (FLASH) sequences, were acquired for each subject using a 3-T Siemens TIM Trio scanner.

Raw MEG data from each block were visually inspected to identify and reject time epochs or channels containing large artifacts or flat signals. They were then bandpass-filtered between 0.5 and 15 Hz and cleaned of blink artifacts using signal-space projection (68) in the MNE software package ( (69). The data were then epoched between −50 and 450 ms peristimulus (rejecting any epoch with gradiometer signals larger than ±10 pT/cm) and binned according to (i) standards versus deviants, (ii) standard/deviant target-stream frequency, and (iii) whether the standard stream was perceived before the occurrence of the deviant (in the case of masked conditions). Additional epochs, composed of virtual targets, were created from the catch trial (that is, masker-alone) stimuli by time-locking the MEG signal relative to virtual target tones (that is, the positions in the sequence of potential target tones, had they been present). Because the time-locking used here is effectively random, the resulting average response should be flat, yielding an empirical estimate of the signal-to-noise ratio of the averaging procedure.

For unmasked control blocks, the number of standard and deviant epochs was equalized across the four frequency bins before (i) averaging within bins and (ii) combining all standards and deviants into their respective bins to obtain average event-related fields for both standards and deviants. In our main analysis of masked conditions, epoch equalization was not performed because of the large effect of frequency on the detection of target streams (cf. Fig. 1D) and the resulting impact that such equalization would have on the signal-to-noise ratio of the responses. However, we did perform such epoch equalization as part of a supplemental analysis, as follows. Within each subject and for each frequency bin (F1, F2, F3, and F4), the numbers of epochs in the detected/undetected bins were equalized (separately for standards and deviants) by discarding trials in the bin with the higher initial number of epochs. Such equalization yields identical weighting of frequency bins for detected and undetected epoch distributions.

In addition to the analysis of target tones described above, we conducted a separate analysis of responses to masker tones. Individual masker-tone events were obtained by computing spectrograms of each stimulus sequence and finding energy onsets in each frequency band, after filtering out the band including the target tones (when present). Specifically, spectrograms were computed at each of the 24 frequencies in the log-normal distribution of possible masker-tone frequencies (plus intermediate frequencies) using 768-sample Hamming windows with an 87.5% overlap, yielding a temporal resolution of 2 ms, the same as for the 500-Hz MEG acquisition. This ensured that the timing of masker-tone onsets obtained using this method was no less accurate than if we had used trigger pulses recorded by the MEG acquisition system. Then, at each of the 24 possible masker-tone frequencies, we used an onset-detection algorithm that identified the first sample that exceeded 75% of the maximum power at the given frequency, ignoring the subsequent 100 samples at that frequency. This 100-sample (or 200-ms) “refractory period” likely resulted in some masker-tone events being missed, but this is unlikely to have affected the results given that there were 18,600 possible masker events across the two blocks [2 blocks × (120 targets-present trials/block × 10 500-ms windows/trial × 6 masker tones/window + 30 targets-absent trials/block × 10 500-ms windows/trial × 7 masker tones/window)].

The spectrograms were divided into two groups: (i) targets present and detected (targets-detected trials) and (ii) targets either absent or present but undetected (targets-absent and targets-undetected trials). Masker epochs (from −50 to 450 ms peristimulus) from these two categories were then subdivided into “early” and “late” epochs. For targets-detected trials, this corresponded to before and after target detection. For targets-undetected trials, this corresponded to before and after the average detection time from targets-detected trials. For targets-detected (targets-undetected) trials, this resulted in an average of 2107 (2361) and 5727 (6569) masker-tone events in the early and late time windows, respectively. The overall larger number of masker-tone events in targets-undetected trials reflects the fact that, when a target stream was present, it replaced one of the masker tones in each time window to preserve the overall stimulus energy across time. This analysis allowed us to examine whether the responses to masker elements changed as a function of target detection while controlling for potential effects of time.

With the FreeSurfer ( and MNE software packages, the MRI data were used to construct individualized, cortically constrained source spaces and boundary element models (BEMs). The high-resolution MPRAGE data were used to create three-dimensional (3D) reconstructions of the white and pial surface of each participant using FreeSurfer (70, 71). The resulting white matter surface was decimated (to about 4100 dipole locations per hemisphere) to create the MEG source space. The resulting dipoles represent, on average, a cortical surface area of about 24 mm2, with an intersource spacing of about 5 mm. The FLASH images were used to produce high-resolution inner-skull, outer-skull, and scalp surfaces. These, in turn, were decimated and used to construct participant-specific three-layer BEMs. The MEG and MRI coordinate frames were aligned by manually demarking the fiducial points on the high-resolution scalp surface. Refinement of this initial 3D affine registration was carried out using the iterative closest-point algorithm (72) as implemented by the MNE suite.

Source analysis

We used a combination of anatomically constrained, noise-normalized, minimum-norm estimates (73, 74) known as dynamic statistical parametric mapping (dSPM) (75) and sparse inverse estimates based on an L1/L2 mixed norm (MxNE) (76). For both masked and control conditions, we first computed the dSPM for the deviants-minus-standards subtraction condition. The resulting masked dSPMs served as priors in the computation of MxNE solutions, which proceeded as follows.

First, because this analysis indicated that the primary—indeed only—generator of MMN in our data was located on the superior temporal plane anterior to the transverse temporal gyrus (TTG) (cf. Fig. 2A), we restricted the source space for the MMN component by explicitly zeroing out vertices that were not in this area during sparse source analysis (cf. fig. S5). This was performed to avoid dipoles on the opposite side of the lateral fissure (a common problem in any MEG source localization method) and to restrict the resulting dipole solutions to the vicinity of the foci of activation revealed by dSPM for MMN under control conditions.

Second, a known issue with the MxNE is its strong dependence on the regularization of the noise covariance matrix, such that too much regularization can result in multiple dipoles in circumscribed areas with similar source time courses as opposed to a single dipole capturing the same variance. To circumvent this issue, we set the regularization parameter (α, in the case of the mixed-norm solver implemented in the MNE-Python suite) such that only a single source (in either left AC or right AC, depending on the individual participant) would be returned. We then computed a subsequent MxNE on the residual (that is, the portion of the data not explained by the first MxNE). We proceeded in this manner until at least one source was present in each hemisphere, and we took only the first dipole found in each hemisphere as the final resulting source waveform for that participant/hemisphere.

For control conditions, we first restricted the time window of the MxNE to that which narrowly enveloped the primary component of the mismatch response (that is, the MMN). The resulting dipole locations were then used as weights to a subsequent MxNE in which the entire epoch duration was analyzed, resulting in the source waveforms shown in Fig. 2. Note that the MxNE jointly solves for all evoked time courses of interest, in this case standards, deviants, and deviants minus standards. The same analysis was performed for masked conditions, except that, in this case, we jointly estimated the following conditions: (i) virtual targets, by time-locking the MEG signal to points at which target standards and deviants would have occurred had they been present; (ii) undetected standards, undetected deviants, and their subtraction; and (iii) detected standards, detected deviants, and their subtraction.

Source data from each of the 20 individual participants were transformed onto a template brain (the FreeSurfer average surface) using a 2D spherical registration method that is known to yield accurate alignment of both functional and anatomical brain areas across individuals (77). This was performed for both the distributed dSPM and the sparse mixed-norm estimates, resulting in (i) a grand-average source map (in the case of transformed dSPMs) and (ii) a collation of dipole locations (in the case of transformed MxNEs).

In addition to the MMN-focused source analysis described above, we performed a similar analysis for the target-elicited P1 component and masker-elicited P1/N1 components, with the only difference being the source spaces used. Consistent with typical P1 source topographies (78), for the target-elicited P1, we used a more posterior source space defined by the transverse temporal gyrus and the adjacent portion of the planum temporale (PT) (fig. S6A). For the masker-elicited P1 and N1 components, two subanalyses were carried out—one using the same posterior source space (fig. S6A) and the other using the more anterior source space defined by the control-condition MMN (fig. S6B). This was performed for two reasons. First, although the predominant generators of both the P1 (as mentioned above) and the N1 are typically found in TTG or PT (78), anterior subcomponents of N1 have also been observed. Second, because this analysis was used to examine whether our masked MMN results were influenced by attention, we also wanted to be sure to use as similar a source configuration as possible for it and the masker-elicited P1/N1 components.

Statistical analysis

All data were statistically evaluated using a repeated-measures ANOVA framework in R (

For behavioral data, true- and false-positive rates were evaluated using a two-way ANOVA with response (hits versus false alarms) and tone position (10 levels) as factors. True-positive (that is, hit) rates were further evaluated after they had been sorted into their respective frequency bins using a two-way ANOVA with frequency (8 levels) and tone position (10 levels) as factors.

For neural responses under control conditions, we compared the amplitudes of the responses elicited by both standards and deviants in both left AC and right AC. Quantified amplitudes were defined as follows. First, within each participant, the average responses for standards and deviants were combined into a single grand-average response, separately for each hemisphere. We then identified the negative-going peak (occurring between 100 and 200 ms) of this grand-average waveform. The amplitude values for deviants and standards were defined as the average amplitude (per condition) within 50 ms (±25) of the latency of the grand-average peak. These amplitude values were entered into a two-way ANOVA with condition (standards versus deviants) and hemisphere as factors.

For neural responses under masked conditions, we compared the amplitudes of the responses elicited by both standards and deviants in both left AC and right AC for both detected and undetected standard streams, a three-factor design. Quantified amplitudes were defined as the average amplitude between 100 and 250 ms (separately for each hemisphere), reflecting the slightly later/broader nature of MMN under masked conditions. The same pattern of results held when we used the peak-picking procedure from the control conditions (that is, finding the negative-going peak of the grand-average between 100 and 200 ms and taking the average amplitude within 50 ms of the latency of that peak, per condition). These amplitude values were entered into a three-way ANOVA with condition (standards versus deviants), percept (detected versus undetected), and hemisphere as factors. In addition, to further evaluate the effect of standard-stream perception on MMN generation, we directly compared MMN amplitudes for detected and undetected standard streams using a two-way ANOVA with percept (detected versus undetected) and hemisphere (left versus right) as factors. MMN amplitude was defined as for the individual conditions but using the deviant-minus-standard difference waveforms.

Bayes factor analysis

Because we were interested in whether the MMN would be elicited in the context of unperceived standard streams, we further analyzed MMN amplitudes in the context of unperceived standard streams using Bayes factors (13). With the mean and SE of an effect of interest (in our case, MMN amplitude in the context of unperceived standard streams), as well as a specification of a population effect-size “plausibility” distribution, the Bayes factor (ranging between zero and ∞) gives an indication of whether statistically nonsignificant differences actually provide evidence for the null hypothesis rather than being merely insensitive to a potential effect. Despite being a continuous metric, values less than 1/3 are conventionally taken as “substantial” evidence in favor of the null hypothesis, whereas values between 1/3 and 1 are taken as weak or “anecdotal” evidence (79). Values greater than 1 (3) are taken as weak (substantial) evidence against the null hypothesis.

To compute Bayes factors, we used the online calculator provided by Dienes (80), which allows one to specify (in addition to the required mean and SE of the effect of interest) the shape of the plausibility distribution for the alternative hypothesis (here, that the MMN was elicited in the context of unperceived standard streams). We first used a uniform distribution with a minimum of zero and a maximum of the mean MMN amplitude observed in the context of perceived standard streams. Intuitively, this indicates that although we did not precisely predict what the amplitude of the population mean MMN during unperceived standard streams would be, we assumed that it would be negative and that it could not be larger than that observed in the context of perceived standard streams. Second, because we wondered whether a uniform plausibility distribution might be too “vague” (that is, one might predict that MMN in the context of unperceived standard streams, even if present, would likely be smaller than that in the context of perceived standard streams), we also used a half-normal distribution with a mode of zero and an SD of half of the perceived-context MMN amplitude. As for the uniform specification, this assumes that the effective maximum plausible population effect size would be that which we observed in the context of perceived standard streams (that is, 2 SDs from the mode). However, it is a more conservative specification because values closer to zero are taken to be more plausible than those farther from it.


Supplementary material for this article is available at

Fig. S1. Neural activity under masked conditions after equalization of the number of epochs in each frequency bin across detected and undetected standards and deviants.

Fig. S2. Neural activity under control conditions, binned by frequency.

Fig. S3. Neural activity under masked conditions after epoch equalization, with emphasis on the P1 latency range.

Fig. S4. Masker-elicited P1 responses using the same source space as was used for MMN.

Fig. S5. Masker-elicited N1 responses using the same source space as was used for MMN.

Fig. S6. Source-space vertices for P1 and MMN source analyses.

Table S1. Target-tone P1 statistics.

Table S2. Masker-tone P1/N1 statistics.

Audio S1. Example of an auditory oddball sequence (an isochronous 902-Hz standard stream with a 947-Hz deviant and a 500-ms stimulus onset asynchrony) embedded in a random multitone masker cloud.

Audio S2. Example of an auditory oddball sequence in isolation.

This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Acknowledgments: We would like to thank B. Burghardt, H. Riedel, E. Tauberschmidt, and C. Uhlig for technical assistance, as well as A. Gramfort, Z. Dienes, and A. Rupp for helpful discussion. Funding: This work was supported by the German Federal Ministry of Education and Research (BMBF; grant 01EV0712 to A.G.). Author contributions: A.R.D. and A.G. conceived and designed the study. A.R.D. collected the data and performed the analysis. A.R.D. and A.G. wrote the paper. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
View Abstract

Navigate This Article