## Abstract

Mammalian grid cells fire when an animal crosses the points of an imaginary hexagonal grid tessellating the environment. We show how animals can navigate by reading out a simple population vector of grid cell activity across multiple spatial scales, even though neural activity is intrinsically stochastic. This theory of dead reckoning explains why grid cells are organized into discrete modules within which all cells have the same lattice scale and orientation. The lattice scale changes from module to module and should form a geometric progression with a scale ratio of around 3/2 to minimize the risk of making large-scale errors in spatial localization. Such errors should also occur if intermediate-scale modules are silenced, whereas knocking out the module at the smallest scale will only affect spatial precision. For goal-directed navigation, the allocentric grid cell representation can be readily transformed into the egocentric goal coordinates needed for planning movements. The goal location is set by nonlinear gain fields that act on goal vector cells. This theory predicts neural and behavioral correlates of grid cell readout that transcend the known link between grid cells of the medial entorhinal cortex and place cells of the hippocampus.

- grid cell
- entorhinal cortex
- spatial cognition
- goal-directed navigation
- self-localization
- maximum likelihood decoding
- population vector
- nonlinear gain fields
- goal-vector cells

## INTRODUCTION

The strikingly periodic spatial pattern of grid cell firing has caught the attention of experimental and theoretical neuroscientists alike and is thought to constitute a metric for space (*1*). Grid cells fire at regularly spaced intervals, so the distance an animal travels might be estimated by counting the number of episodes in which a particular grid cell fired. However, the spacing between firing fields and, hence, the distance traveled depend on the angle of movement relative to the grid’s orientation (Fig. 1A), which belies the notion of a metric at the single-cell level. The ensemble activity of grid cells could be deciphered to yield a spatial metric (*2*, *3*), but the cipher might need to be learned (*4*). We show here that a simple, biologically plausible decoder exists. This decoder need not be learned, is consistent with known properties of grid cells, and leads to a number of experimentally testable predictions.

Grid cells are organized into discrete modules (*5*); within each module, the spatial scale and orientation of the grid lattice are the same, but the lattice for different cells is shifted in space (Fig. 1B). This organization, which might have a mechanistic explanation (*6*, *7*), seems to complicate the encoding of spatial information. Constellations of grid cells will fire together at repeated locations, making the code ambiguous (Fig. 1B); if the grid orientations were not aligned, different sets of grid cells would fire together at these locations, creating unique labels for encoding spatial position (Fig. 1C). Paradoxically, the alignment of grid lattices within each grid cell module is a prerequisite for reading out the grid code using population vector averages across multiple spatial scales (Fig. 1D), as we show here.

## RESULTS

### Theoretical framework

For each grid cell, the spatial map of firing is captured by a bell-shaped function of space that repeats itself on a hexagonal lattice (Fig. 1D). All grid cells within one module have the same lattice orientation and grid scale λ but differ in their spatial phase (*5*, *8*). To determine what the neuronal population’s activity reveals about an animal’s location, consider a snapshot of the activity by counting the number of spikes *n*_{j} for each neuron in a fixed time window, such that the population’s response is *n* = (*n*_{1}, …, *n*_{N}). When the animal is at position $\vec{x}$, each neuron fires an average of $\Omega_j(\vec{x})$ spikes. However, the true number *n*_{j} scatters around this value. Assuming Poisson variability and statistically independent neurons, an ideal observer, given the set of spike counts *n*, will assign the following probability for the animal to be at position $\vec{x}$:

$$P(\vec{x} \mid n) \propto \prod_{j=1}^{N} \frac{\Omega_j(\vec{x})^{n_j}}{n_j!}\, e^{-\Omega_j(\vec{x})} \qquad (1)$$

Choosing the most likely position is known as maximum likelihood (ML) decoding.
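Equation 1 can be evaluated by brute force. The following sketch (all parameter values hypothetical) draws one Poisson snapshot from a single module of cells with periodic bell-shaped tuning on a 1D track and recovers the position by maximizing the Poisson log-likelihood over a grid of candidate locations:

```python
import numpy as np

rng = np.random.default_rng(0)

# One module of M grid cells with periodic (von Mises) tuning on a 1D track.
M, lam, kappa, n_max = 50, 1.0, 2.0, 5.0
phases = np.arange(M) / M * lam                       # equidistant phases c_j

def rates(x):
    """Expected spike counts Omega_j(x) for all cells at position x."""
    return n_max * np.exp(kappa * (np.cos(2 * np.pi * (x - phases) / lam) - 1))

def ml_decode(n, candidates):
    """Maximize the Poisson log-likelihood of Eq. 1 over candidate positions."""
    log_like = [np.sum(n * np.log(rates(x)) - rates(x)) for x in candidates]
    return candidates[int(np.argmax(log_like))]

x_true = 0.37
spikes = rng.poisson(rates(x_true))                   # one stochastic snapshot
candidates = np.linspace(0.0, lam, 1000, endpoint=False)
x_hat = ml_decode(spikes, candidates)
print(abs(x_hat - x_true))
```

Because the tuning is periodic with period λ, the estimate is only defined modulo λ; the multiscale schemes discussed below resolve this ambiguity.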

### Population vector decoding and error correction in one dimension

Grid spacings range from about 25 cm to several meters in rats (*9*). The spiking of grid cells at the coarsest scale λ_{0} conveys a rough idea of where the animal is, but is subject to some uncertainty δ. Equation 1 generically predicts that an error δ introduced at scale λ_{0} can be corrected when a module with scale λ_{1} is added; δ is reduced to δ/[1 + *M*_{1}/(*M*_{0}*s*^{2})], where *s* = λ_{0}/λ_{1} and *M*_{k} is the number of cells at scale λ_{k}. Staggered spatial scales thus implement error correction, and the improvement grows with the number of neurons at the smaller scale (Supplementary Materials and fig. S1).
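As a quick numerical illustration of the quoted reduction factor (module sizes here are hypothetical), adding a second, equal-sized module at scale ratio *s* = 3/2 shrinks the coarse-scale uncertainty by roughly 30%, and quadrupling the fine module shrinks it further:

```python
# Uncertainty reduction from adding a finer module, per the factor
# delta / [1 + M1 / (M0 * s**2)] quoted in the text.
def corrected_uncertainty(delta, M0, M1, s):
    return delta / (1.0 + M1 / (M0 * s**2))

delta = 10.0                                                # coarse uncertainty (cm)
print(corrected_uncertainty(delta, M0=100, M1=100, s=1.5))
print(corrected_uncertainty(delta, M0=100, M1=400, s=1.5))  # more fine cells help
```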

When space is restricted to one dimension, as on a narrow track (Fig. 2A), ML decoding of a single module reduces to a linear readout of the population vector average (Fig. 2B), provided the firing rate maps are given by von Mises functions (*10*). These are periodic generalizations of the Gaussian, Ω_{j}(*x*) = *n*_{max} · exp{κ[cos(2π(*x* – *c*_{j})/λ_{j}) – 1]}, where *c*_{j} is the preferred spatial phase of cell *j* and λ_{j} is its spatial period. The population vector (*11*) points exactly to the most likely position of the animal, and its length conveys the confidence in the position estimate.
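This linear readout is easy to state in code. The sketch below (hypothetical parameters) forms the complex-valued population vector of one module; its angle gives the position estimate and its normalized length the confidence:

```python
import numpy as np

rng = np.random.default_rng(1)

# One 1D module with von Mises tuning and equidistant spatial phases.
M, lam, kappa, n_max = 40, 1.0, 2.0, 8.0
c = np.arange(M) / M * lam

def rates(x):
    return n_max * np.exp(kappa * (np.cos(2 * np.pi * (x - c) / lam) - 1))

x_true = 0.62
n = rng.poisson(rates(x_true))

phasor = np.sum(n * np.exp(2j * np.pi * c / lam))     # sum_j n_j exp(i c_j)
x_pv = (np.angle(phasor) / (2 * np.pi)) % 1.0 * lam   # population vector estimate
confidence = np.abs(phasor) / np.sum(n)               # resultant length in [0, 1]
print(x_pv, confidence)
```

A strongly peaked spike-count pattern yields a long resultant (confidence near 1); a flat pattern yields a short one.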

In analogy to how one tells time using a traditional clock with an hour hand and a minute hand (Fig. 2C), neuronal population vectors lend themselves to an explicit algorithm for decoding position across multiple scales. Assume that the environment fits within the fundamental domain of the module with scale λ_{0}. The population vector is formed by summing over all cells at scale λ_{0}, weighting each cell’s spike count *n*_{j} by its spatial phase *c*_{j} (expressed as an angle). The coarse-scale estimate is then

$$\hat{x}_0 = \frac{\lambda_0}{2\pi} \arg\left( \sum_{j} n_j\, e^{i c_j} \right) \qquad (2)$$

where exp(*ic*_{j}) = cos(*c*_{j}) + *i* sin(*c*_{j}) is a phasor and the arg function computes the angle of a phasor in the complex plane (Fig. 2B). This estimate (Fig. 2A) is refined by the population vector of the second module, which contains grid cells with scale λ_{1}:

$$\hat{x}_1 = \hat{x}_0 + \frac{\lambda_1}{2\pi} \arg\left( \sum_{j} n_j\, e^{i (c_j - 2\pi \hat{x}_0/\lambda_1)} \right) \qquad (3)$$

with the sum now running over the cells of the second module. The term involving $\hat{x}_0$ subtracts out the contribution from the earlier position estimate. This procedure can be recursively iterated for modules at finer scales and is necessary for a correct position estimate (fig. S2 and Supplementary Materials). Multiple modules thus implement error correction, as shown by the increasingly narrow distribution of position estimates (Fig. 2D). The performance of such a population vector decoder matches that of an ideal observer, but it requires that the grid cells be arranged into modules whose spatial periods form a discrete set. When the grid lattices span a continuum, an ideal observer might still glean a comparable amount of spatial information from the neuronal response, but a population vector decoder is invariably inferior (see figs. S7 and S8).
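The recursion of Eqs. 2 and 3 can be sketched directly (hypothetical parameters; three modules with scale ratio *s* = 3/2). Each module's population vector is rotated by the running estimate, so that only the residual within the current period is decoded:

```python
import numpy as np

rng = np.random.default_rng(2)

M, kappa, n_max = 60, 2.0, 6.0
scales = [1.0, 1.0 / 1.5, 1.0 / 1.5**2]               # geometric progression, s = 3/2
phases = [np.arange(M) / M * lam for lam in scales]   # equidistant phases per module

def rates(x, lam, c):
    return n_max * np.exp(kappa * (np.cos(2 * np.pi * (x - c) / lam) - 1))

def decode(modules):
    """Coarse-to-fine population vector decoding across nested modules."""
    x_hat = 0.0
    for lam, c, n in modules:
        # Subtract the running estimate inside the phasor (the Eq. 3 step),
        # then read out the residual within one period lam.
        phasor = np.sum(n * np.exp(2j * np.pi * (c - x_hat) / lam))
        x_hat += np.angle(phasor) / (2 * np.pi) * lam
    return x_hat

x_true = 0.41
modules = [(lam, c, rng.poisson(rates(x_true, lam, c)))
           for lam, c in zip(scales, phases)]
print(decode(modules))
```

The coarse module fixes the rough position; each finer module then only needs to resolve a displacement smaller than half its own period.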

### Alignment of grid cell lattices in two dimensions within and across modules

Unlike time, space has more than one dimension. Using three superimposed plane waves as the argument of the von Mises function, one obtains a model for hexagonal firing fields in the plane. Periodicity in two dimensions means that the lattice’s unit cell is mapped onto a torus (Fig. 2E). Rather than a single clock, we now have two clocks, one for each angle variable on the torus.

In one dimension, a population of neurons with von Mises tuning curves yields a posterior probability that is also von Mises, albeit more peaked. In two dimensions, the lattices of different grid cells may be shifted in spatial phase, as before, or also rotated. Random rotations within a module (Fig. 3A) destroy the hexagonal structure in $P(\vec{x} \mid n)$. The tolerance to deviations from a perfect alignment can be assessed by computing the grid score (*1*), which measures the degree of local hexagonal symmetry. If the lattice orientations vary by more than 10°, the grid score of $P(\vec{x} \mid n)$ drops precipitously (Fig. 3B). As the lattices of measured grid cells are tightly aligned in orientation (*5*, *12*), the ensemble activity generates a grid-like posterior position probability.

Whether the lattice orientations are aligned or not, any neural ensemble can be decoded using Eq. 1. For ensembles with few neurons and low peak firing rates, alignment within a single module leads to slightly more accurate position estimates (Fig. 3C). Furthermore, the discrete symmetry axes in $P(\vec{x} \mid n)$ allow the animal to calculate its position by trilateral intersection (Fig. 4A). Three population vectors μ_{l} are formed by projecting the spatial phases onto the wave vectors $\vec{k}_l$ of the hexagonal lattice. The ML estimate then reads

$$\hat{\vec{x}} = \frac{2}{3} \sum_{l=1}^{3} \arg(\mu_l)\, \frac{\vec{k}_l}{\|\vec{k}_l\|^2} \qquad (4)$$

This procedure is equivalent to topographically arranging the population’s spike counts and multiplying them by a set of spatial weighting functions that are sinusoidal gratings (Fig. 4B), which correspond to the sine and cosine terms of Eq. 2. These terms then give rise to the population vectors that are summed in Eq. 4. Implicit in the gratings’ spatial phases is the origin of the coordinate system, yet this origin is arbitrary; for instance, it could represent the animal’s home or a reward location. Switching between locations can be accomplished by rotating the phases (Fig. 4B). Because $e^{i(c_j + \varphi)} = e^{i c_j}\, e^{i \varphi}$, such a phase change is tantamount to a multiplication of the sinusoidal gratings by a function of the phase φ and resummation, akin to the multiplicative modulation of visual tuning curves in the posterior parietal cortex by gaze direction (*13*).
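A minimal 2D sketch of this trilateration (parameters hypothetical; the normalization constant follows from the identity $\sum_l \vec{k}_l \vec{k}_l^{\top} = \tfrac{3}{2}\omega^2 I$ for three wave vectors spaced 60° apart, and the true position is assumed to lie well inside the central unit cell so that no angle wraps):

```python
import numpy as np

rng = np.random.default_rng(3)

lam, kappa, n_max, M = 1.0, 2.0, 20.0, 300
omega = 2 * np.pi / (np.sin(np.pi / 3) * lam)
phis = -np.pi / 6 + np.arange(1, 4) * np.pi / 3              # grid axes, 60 deg apart
K = omega * np.stack([np.cos(phis), np.sin(phis)], axis=1)   # wave vectors k_l

# Spatial phases c_j drawn uniformly over the hexagonal lattice's unit cell.
basis = lam * np.array([[1.0, 0.0], [0.5, np.sqrt(3) / 2]])
c = rng.uniform(size=(M, 2)) @ basis

def rates(x):
    """2D von Mises rate map: three cosines summed, then exponentiated."""
    return n_max * np.exp(kappa * (np.cos((x - c) @ K.T) - 1).sum(axis=1))

x_true = np.array([0.10, 0.15])                       # well inside the unit cell
n = rng.poisson(rates(x_true))

mu = (n[:, None] * np.exp(1j * (c @ K.T))).sum(axis=0)   # three population vectors
theta = np.angle(mu)                                     # ~ k_l . x_true, per axis
x_hat = (2.0 / (3.0 * omega**2)) * (theta @ K)           # least-squares intersection
print(x_hat)
```

As in one dimension, the angle readout is only valid modulo the lattice; coarser modules determine which unit cell the animal occupies.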

The estimate of the homing vector at one spatial scale sets an offset for refining the estimate at the next finer scale (Fig. 5A). Shorter scales imply that the lattice’s tiling of space becomes finer; thus, this offset resolves the ambiguity associated with the lattice’s periodic nature. The metric readout of the animal’s position relative to different locations of interest is then the result of a linear combination of scales (Fig. 5B). Though a population vector code requires the grid axes to be aligned within a module, alignment across modules is not essential. However, such an alignment improves the spatial resolution (Fig. 6, A to F) and has been observed experimentally (*5*, *14*, *15*).

### Decoding from elliptic grid patterns

In many cases, measured grid cell activity patterns deviate from hexagonal symmetry, most often in the vicinity of the boundaries of the environmental enclosure (*5*, *8*, *12*, *15*, *16*). If the grid lattice shears, the grid pattern becomes elliptical; this shearing can become more pronounced over time, starting from an almost hexagonal initial activity pattern (*14*).

No matter how extreme the ellipticity, the proposed population vector readout using Eq. 4 is unchanged, except for a (potentially location-dependent) rescaling of the wave vectors (see the Supplementary Materials and eq. S9). Indeed, any invertible linear transformation of the lattice can be readily incorporated into the readout. Different modules may be subject to distinct distortions (*14*, *15*), and Eq. 4 can be adapted separately for each module. Within each module, all grids should be distorted (locally) in a similar way, in accordance with findings by Stensola *et al.* (*14*).

### Optimal scale ratio

The total number of grid cells might be as low as 5000 in rats (*17*), and downstream neurons might sample from only a small set of cells. To study the limits of grid cell coding under such adverse conditions, we now consider low firing rates and few neurons per module. The posterior $P(\vec{x} \mid n)$ will then have a shallow peak, implying that decoding becomes more uncertain and mistakes become more likely. If one reads off the minute hand on a clock, the answer cannot be off by more than 30 min; likewise, the worst error in decoding a 2D module with length scale λ is λ/2.

Refining the position estimate relies on nesting modules at different length scales, with the goal of making the peak in the probability distribution narrower. Figure 7A illustrates the change in $P(x \mid n)$ in one spatial dimension upon the addition of a second module with a length scale λ_{1}. If the scale ratio is *s* = λ_{0}/λ_{1} = 2, the second module increases the probability of the worst possible error: miscomputing the position by ±λ_{0}/2.

The ratio *s* = 3/2, on the other hand, is a safe choice. At *x* = λ_{0}/2, the second module contributes a term proportional to cos(*s*π) to the log probability, but cos(3π/2) = 0. Together with the normalization of the probability distribution, this trigonometric fact ensures that adding a second module does not increase the probability of large decoding errors, irrespective of the number of neurons, the firing rate, or the shape parameter of the neurons’ tuning curves.
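This trigonometric fact is easy to verify numerically: at the worst-case location *x* = λ₀/2, the second module's cosine term equals cos(*s*π), which is +1 (reinforcing the error) for *s* = 2 but 0 (neutral) for *s* = 3/2:

```python
import numpy as np

lam0 = 1.0
x_worst = lam0 / 2                                     # worst-case decoding error
for s in (2.0, 1.5):
    lam1 = lam0 / s
    contribution = np.cos(2 * np.pi * x_worst / lam1)  # equals cos(s * pi)
    print(s, contribution)
```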

The same argument holds when the population code represents 2D space. For the nonideal ratio of length scales *s* = 2, the histogram of positions decoded from the population spike counts, measured relative to the true position, exhibits a rosette-like pattern (Fig. 7B): The hexagonal “petals” in this pattern reflect the interference between successive modules.

For a low firing rate and a small module size *M*, a four-module grid code conveys the most information when the length scales obey 1 ≤ *s* ≤ 2 (Fig. 7C). If the number of bits is *b*, the spike count vector can resolve 2^{b} different locations. The greater the number of spikes across the four modules, the higher the scale ratio *s* can be. The optimal *s* falls into discrete levels, as assessed by the average decoding error (Fig. 7E). The first discrete level distinct from *s* = 1 is centered on *s* = 3/2 (dotted line in Fig. 7D).

### Representing space beyond the longest grid scale

If the scale ratio *s* = λ_{0}/λ_{1} is not an integer, the smaller grid scales are no longer periodic with respect to the unit cell ℒ_{0} at the largest scale. Indeed, the posterior $P(\vec{x} \mid n)$, treated as a function of $\vec{x}$, does not repeat itself until the least common multiple of all scales has been reached. This fact has led to the proposal that grid codes instantiate a residue number system to represent a spatial range much larger than λ_{0} (*2*, *18*): Each module signals an independent spatial phase, such that the set of phases represents a combinatorial code for spatial position (Fig. 8A).

A fixed scale ratio *s* implies a geometric progression of scales across modules. As soon as the number of modules *L* exceeds 2, the scales do not form a co-prime set; in other words, the least common multiple of the set is less than the product of all scales. This contravenes the Chinese Remainder Theorem (*19*), which underlies modular arithmetic and states that any combination of discrete spatial phases across modules maps onto a unique position (modulo the least common multiple of all scales). However, violating the Chinese Remainder Theorem turns out to provide an advantage. If we approximate the scale ratio by a rational number *s* = *p*/*q*, where *p* and *q* are integers, certain combinations of discretized spatial phases across modules are forbidden—they should never appear in the deterministic limit. Because of the stochastic nature of neuronal discharge, such combinations can nonetheless occur when the population vectors are calculated; the readout can then seek the closest valid set of phases, as illustrated in Fig. 8B, and thereby correct the error.
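The forbidden-combination idea can be illustrated with a toy residue code over integer positions (a caricature of the neural phases, not an implementation). With three modules at scales 9, 6, 4, whose successive ratios are 3/2, only lcm(9, 6, 4) = 36 of the 9 · 6 · 4 = 216 possible residue triples are valid, and an invalid triple produced by noise can be snapped back to the nearest valid one:

```python
# Modules with integer scales 9, 6, 4 (ratio s = 3/2); lcm(9, 6, 4) = 36.
scales = (9, 6, 4)
valid = {(x % 9, x % 6, x % 4): x for x in range(36)}  # 36 valid triples of 216

def circ_dist(a, b, m):
    """Circular distance between residues a and b modulo m."""
    d = abs(a - b) % m
    return min(d, m - d)

def correct(triple):
    """Snap a (possibly corrupted) residue triple to the nearest valid one."""
    best = min(valid, key=lambda v: sum(circ_dist(a, b, m)
                                        for a, b, m in zip(triple, v, scales)))
    return valid[best]

x = 17                                                # true position
noisy = (x % 9, (x % 6 + 1) % 6, x % 4)               # one-step error, middle module
print(len(valid), correct(noisy))                     # 36 17
```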

For *s* = *p*/*q*, the linear dimension of the range is *q*^{L−1}λ_{0}, where *L* is the number of modules. In two dimensions, a set of three population vectors is projected along the wave vectors $\vec{k}_l$. For each $\vec{k}_l$, a separate residue number system exists. These are then summed as in Eq. 4. As shown in Fig. 8C, modular arithmetic can be used to decode positions in 2D space far beyond the fundamental domain of the largest grid scale, as long as there are sufficiently many grid cells firing vigorously. In the limit of high noise, low firing rates, or low numbers of neurons, modular arithmetic collapses (Fig. 8D). As the population model yields an explicit and simple representation of the posterior probability $P(\vec{x} \mid n)$, we can quantitatively assess the likelihood of making catastrophic errors.

Let us focus on one dimension and *L* = 2, as the results that follow generalize (Supplementary Materials). Writing the two scales as λ_{0} = *p*/*c* and λ_{1} = *q*/*c* for a common factor *c*, and taking the true position to be *x* = 0, we can write the posterior as

$$P(x \mid n) \propto \exp\left\{ M\kappa \left[ \cos\!\left(\frac{2\pi c x}{p}\right) + \cos\!\left(\frac{2\pi c x}{q}\right) \right] \right\}$$

where we assume that each module has the same number of neurons *M*. The term cos(2π*cx*/*p*) has maxima at *x* = *kp*/*c*, *k* ∈ ℤ, whereas cos(2π*cx*/*q*) has maxima at *x* = *lq*/*c*, *l* ∈ ℤ. Errors most likely occur when the amplitude of the secondary peaks in the posterior *P*(*x*|*n*) lies close to the amplitude of the primary peak (Fig. 8E). The highest secondary peak occurs when

$$kp - lq = \pm 1$$

Solving this Diophantine (integer congruence) equation, we find that the difference between the primary and secondary maxima in the posterior is

$$\Delta \ln P(x \mid n) \approx \frac{2\pi^2 M \kappa}{p^2 + q^2} \qquad (5)$$

The scaling law in Eq. 5 also correctly predicts the frequency of making large-scale decoding mistakes in the 2D plane, as determined numerically (Fig. 8F).

Ideally, Δln *P*(*x*|*n*) should be as large as possible. In other words, *p* and *q* should be small integers. We must have *q* > 1 for the range to be larger than λ_{0}; hence, the smallest integer we can choose is *q* = 2. With this choice, the smallest larger integer that is co-prime to *q* is *p* = 3. Therefore, the optimal scale ratio is

$$s = \frac{p}{q} = \frac{3}{2}$$

which coincides with the most robust *s* based on iterative refinement of the vector estimate (Fig. 7, A to E). In the limit of high noise and low firing rates, we predict that a modular arithmetic code would sacrifice range for robustness.

## DISCUSSION

Multiscale grid codes can represent vast areas of space (*2*, *20*) or a more limited area with high precision (*3*). The resolution of such a code could reach a millimeter or less, based on an ideal observer decoding the population response. As we have demonstrated here, the ideal observer is unnecessary: Reading the code is both simple and biologically plausible, once population vector decoding is used. Similar to a land survey, measuring the position in two dimensions relies on determining multiple vectors; trigonometry predicts that several neuronal population vectors should be added together to obtain a position estimate. This estimate yields a new egocentric vector from the current (allocentric) position $\vec{x}$ to an (arbitrary) origin $\vec{x}_0$; changing the origin of the coordinate system using nonlinear gain fields is straightforward, so that a multiscale grid code could let a foraging animal always know the direction and distance to home or some goal. Such a mechanism generalizes population vector average decoders that have been implicated in visuomotor transformations (*11*, *21*, *22*).

The spatial activity of a single grid cell does not constitute a metric, whereas an ensemble of hierarchically organized grid cells does provide a distance measure, as our results prove. Although spatial information is highly distributed across scales, the readout is a simple linear combination of population vector averages that performs error correction. The metric’s accuracy stems from the geometric progression of length scales, ranging from coarse to fine. Such nested scales have been predicted by optimal coding (*3*). Measured grid cell maps cluster into discrete groups at different length scales, such that the ratio *s* of successive scales lies between 1.4 and 1.7 (*5*, *8*). Wei *et al.* (*23*) derived an optimal scale ratio of *e*^{1/D} using a different measure of resolution, where *D* is the dimension of space. In contrast, our argument that *s* ≈ 3/2 does not depend on *D* and is based on a worst-case analysis for small numbers of spikes (see the Supplementary Materials); we predict that grid fields in flying or swimming mammals also cluster into discrete modules and that the scale ratio is similar to the one observed in the spatial firing maps for terrestrial movement.

As the measured ratio between λ_{i} and λ_{i+2} is about 2 (*5*) or larger (*8*), silencing an intermediate-scale module should lead to systematic errors (Fig. 7B). On the other hand, removing the grid module with the smallest scale would only affect the fine precision of navigation. Likewise, our theory predicts that increasing some grid scales by down-regulating specific cellular conductances (*24*) should gradually decrease spatial precision, whereas it would drastically alter the readout if grid cells were used for modular arithmetic (*2*). These predictions could be tested systematically with path integration experiments, in which an animal would have to reproduce specific distances with high spatial accuracy without the aid of landmarks. Specific subpopulations called island and ocean cells have already been genetically identified and characterized (*17*, *25*). Given the ability to deep sequence single neurons, acute manipulations of single grid modules could become feasible in the near future. By perturbing specific modules using targeted pharmacogenetic (*26*) or optogenetic manipulations (*27*), the contributions of individual modules to navigational accuracy could be compared to theoretical predictions.

Experiments show that grid patterns in adjacent compartments fuse into a single, continuous pattern spanning both compartments while the animal familiarizes itself with the environment (*28*). Therefore, a single decoder will be able to read out the grid code, even over long distances. The universal nature of the metric has been called into question, however, as firing fields become more irregular in narrow regions of space (*15*), which corroborates earlier findings that the spatial representation in grid cells changes when a hairpin maze is introduced into an open arena (*16*). The measured grids are typically elliptical, and the eccentricity of the grid pattern often varies throughout one arena (*14*, *29*); the activity of boundary cells might influence these differences in eccentricity (*15*, *30*). Yet, as we have shown, the metric nature of the spatial representation lies not in the regularity or periodicity of the grid but in the population activity. Whatever the sources of distortion are, one can think of mechanisms by which the (location-dependent) eccentricity would modulate the readout formula (Eq. 4), preserving the population metric for space. The prerequisite for a universal metric is that the distortion be common to all cells within the same module. Across modules, however, the eccentricity can be different, as observed experimentally (*5*, *14*, *15*). If the brain did not use an adaptive readout, this should result in distortions of spatial perception, which could be tested in psychophysical experiments. Regardless of whether the metric is universal or not, the grid metric by itself may prove insufficient for real-world navigation, as it contains no information about physical obstacles in the environment. How the nervous system might translate the goal direction vector into a feasible shortest path is an open question.

Within the nested hierarchy of grid modules, self-similar scaling implies that the finer-scale modules provide higher spatial resolution. In modular arithmetic, the resolution is assumed to be the same for each module. However, as long as the true resolution is sufficient for modular arithmetic, the two decoding schemes are compatible with each other: a readout mechanism could do both. In either scheme, making the grid code robust predicts a geometric progression of grid scales, in both cases with the scale ratio *s* = 3/2. Such a progression stands in contrast to the original proposal for modular arithmetic, which envisions a set of grid scales that are not divisible by a common factor (*2*). Although a geometric progression forfeits some of the range that co-prime number sequences of spatial periods would permit, the spatial range could still cover several square kilometers, as the following calculation inspired by experimental estimates shows: take 10 nested grid cell modules (*5*), with scales ranging from 25 cm to several meters (*9*), which is consistent with a scale ratio of *s* = 3/2. With these numbers, the range along each dimension of space is 2^{9} · (3/2)^{9} · 0.25 m, which is approximately 5 km. Modular arithmetic codes need to first be converted into an explicitly metric representation (*2*), though, which must be learned (*4*, *31*). The high variability of neuronal discharge in grid cells (*10*, *32*, *33*) stresses the importance of robustness for any biological coding scheme. Given this variability, it seems unlikely that single grid cells will signal spatial phase to downstream networks of neurons with sufficient reliability; regardless of the grid cell coding scheme, population averages will be required.
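The back-of-the-envelope range estimate in the preceding paragraph can be checked directly (numbers taken from the text):

```python
# Ten modules, scale ratio s = 3/2 = p/q, smallest scale 0.25 m.
L, p, q = 10, 3, 2
lam0 = (p / q) ** (L - 1) * 0.25                 # largest scale: ~9.61 m
range_m = q ** (L - 1) * lam0                    # q^(L-1) * lam0 = 4920.75 m
print(range_m)                                   # about 5 km
```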

Population vector decoding predicts not only that grid cells are grouped into modules but also that lattices within one module should be aligned, as observed experimentally (*1*, *12*). Spatial information would still be present, were the lattices randomly aligned or the modules absent, but would not be as easily decoded. Yet, which neurons read out the grid code? Landmark vector cells in the hippocampal area CA1 are one candidate (*34*, *35*), as these cells respond when the homing vector matches a fixed vector describing a specific direction and distance to a landmark. Outside of the hippocampus, circuits in the retrosplenial and posterior parietal cortex, areas essential to memory-dependent spatial navigation (*36*, *37*), may be involved. A direct link exists between (presumptive) grid cells of the presubiculum and retrosplenial cortex (*38*), whereas the posterior parietal cortex, well known for the multiplicative interactions between its inputs (*13*), receives afferents from the medial entorhinal cortex (*39*).

One of the key predictions of our theory is that the nervous system will be able to rotate the population vector averages. According to the sum rules of trigonometry, such a rotation is equivalent to multiplying the readout weights with cosine-like functions, which would act as gain fields. The effect of such a multiplication is a near-instantaneous change in the represented goal location. Neural mechanisms for such multiplications have been proposed (*40*, *41*), but the neural implementation could be implicit at the network level (*42*) and possibly be based on a restricted set of “template” functions (*43*). In the latter case, a set of cells would change their firing rate on the same time scale with which the goal location changes and switch the phase of the readout weights. Two-photon imaging in animals performing cued navigation tasks [for example, Harvey *et al.* (*36*)] could be used to search for such goal-encoding neurons and, with widefield imaging in freely navigating animals (*44*), could reveal signatures of these computations. Multiplication is an intermediate step in the computation of the homing vector, whose final result, if we interpret the model literally, corresponds to a linear ramp of firing activity as a function of distance to the goal (fig. S4); other representations are also possible (*45*). Whether grid cell ensemble activity is indeed decoded in this manner remains a question for further research.

## MATERIALS AND METHODS

To model the spatial firing rate map of a grid cell, which describes the cell’s expected spike count when the animal is at location $\vec{x}$, we superimposed three plane waves with wave vectors $\vec{k}_l$, *l* ∈ {1,2,3}, and exponentiated the result:

$$\Omega(\vec{x}) = n_{\max} \exp\left\{ \kappa \sum_{l=1}^{3} \left[ \cos(\vec{k}_l \cdot \vec{x}) - 1 \right] \right\} \qquad (6)$$

This is a 2D von Mises function. For a hexagonal grid that is aligned to the *x* axis, the three vectors are $\vec{k}_l = \omega(\cos\varphi_l, \sin\varphi_l)$ with φ_{l} = −π/6 + *l*π/3. The term ω is given by ω = 2π/(sin(π/3)λ), where λ is the spatial scale of the firing pattern (or grid size), 1/κ measures the cell’s relative tuning width, and *n*_{max} is the maximal expected spike count. To capture the 2D spatial phase of cell *j*’s firing pattern, the argument on the right-hand side of Eq. 6 is replaced by $\vec{k}_l \cdot (\vec{x} - \vec{c}_j)$.
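A direct transcription of Eq. 6 (parameter values hypothetical) confirms two basic properties: the peak rate equals *n*_{max} at a lattice vertex, and the map is exactly periodic under translation by a lattice vector of length λ:

```python
import numpy as np

lam, kappa, n_max = 0.5, 2.0, 10.0
omega = 2 * np.pi / (np.sin(np.pi / 3) * lam)
phis = -np.pi / 6 + np.arange(1, 4) * np.pi / 3
K = omega * np.stack([np.cos(phis), np.sin(phis)], axis=1)   # wave vectors k_l

def Omega(x):
    """Eq. 6: superpose three plane waves, then exponentiate."""
    return n_max * np.exp(kappa * np.sum(np.cos(K @ x) - 1.0))

x = np.array([0.13, 0.07])
a1 = np.array([lam, 0.0])                 # lattice vector of an x-aligned grid
print(Omega(np.zeros(2)))                 # peak rate: n_max
print(Omega(x + a1) - Omega(x))           # periodicity: ~0
```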

We assumed that the number of spikes emitted by the *j*th grid cell obeys Poisson statistics, such that the expected spike count is $\Omega_j(\vec{x})$. Each neuron is statistically independent, so the conditional probability for being at location $\vec{x}$ given the spiking activity in a population of *N* neurons is

$$P(\vec{x} \mid n) \propto \prod_{j=1}^{N} \frac{\Omega_j(\vec{x})^{n_j}}{n_j!}\, e^{-\Omega_j(\vec{x})} \qquad (7)$$

In the Supplementary Materials, we reformulated this posterior probability for the population of grid cells to derive a simple and biologically plausible decoder that provides a position estimate given a stochastic (noisy) realization of the population spike count vector *n* = (*n*_{1},…, *n*_{N}). Grid cells in the entorhinal cortex are subdivided into populations of cells with different grid spacings (*5*); these subpopulations are called “modules.” To capture this property, we assigned the same tuning curve to each neuron within a module, such that ω, *n*_{max}, and κ are identical, but the spatial phases are different. The phases are either equidistantly arranged or randomly (but uniformly) distributed across the unit cell ℒ (also known as the fundamental domain) of the hexagonal grid. The Supplementary Materials show how a population vector decoder for a single module can be derived from Eqs. 6 and 7 and how to combine the information about position across modules.

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/1/11/e1500816/DC1

Text

Fig. S1. Hierarchical self-similar scales enable error correction.

Fig. S2. Noninteger scale ratios imply that the decoding algorithm must be able to rotate population vectors.

Fig. S3. The lengths of the population vectors along the hexagonal grid’s axes are correlated.

Fig. S4. Possible distributed representations of the position vector estimate of eq. S6 across a population of readout neurons.

Fig. S5. Comparison of different decoding schemes for a single module in two dimensions.

Fig. S6. Comparison of different decoding schemes for multiple modules in two dimensions.

Fig. S7. If the grid scales are not organized into discrete modules, population vector decoding is no longer straightforward.

Fig. S8. Continuum decoding requires multiple population vectors.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:** We thank E. Moser for stimulating discussions, T. Stensola for providing Fig. 1D, M. Amoroso for graphical support, and M. Amoroso, D. Derdikman, and C. Harvey for comments on the manuscript.

**Funding:** Work at the Bernstein Center for Computational Neuroscience Munich was supported by Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie (01GQ1004A). A.M. received support from Deutsche Forschungsgemeinschaft grant MA 6176/1-1 and the Marie Curie Fellowship (PIOF-GA-2013-622943 of the European Union’s Seventh Framework Programme FP7 2007-13).

**Author contributions:** M.S., A.M., and A.V.M.H. worked out the theory, developed experimental predictions, and wrote the manuscript.

**Competing interests:** The authors declare that they have no competing interests.

**Data and materials availability:** All data used to obtain the conclusions in this paper are presented in the paper and/or the Supplementary Materials.

- Copyright © 2015, The Authors