Research Article | NEUROSCIENCE

Connecting multiple spatial scales to decode the population activity of grid cells


Science Advances  18 Dec 2015:
Vol. 1, no. 11, e1500816
DOI: 10.1126/science.1500816


Mammalian grid cells fire when an animal crosses the points of an imaginary hexagonal grid tessellating the environment. We show how animals can navigate by reading out a simple population vector of grid cell activity across multiple spatial scales, even though neural activity is intrinsically stochastic. This theory of dead reckoning explains why grid cells are organized into discrete modules within which all cells have the same lattice scale and orientation. The lattice scale changes from module to module and should form a geometric progression with a scale ratio of around 3/2 to minimize the risk of making large-scale errors in spatial localization. Such errors should also occur if intermediate-scale modules are silenced, whereas knocking out the module at the smallest scale will only affect spatial precision. For goal-directed navigation, the allocentric grid cell representation can be readily transformed into the egocentric goal coordinates needed for planning movements. The goal location is set by nonlinear gain fields that act on goal vector cells. This theory predicts neural and behavioral correlates of grid cell readout that transcend the known link between grid cells of the medial entorhinal cortex and place cells of the hippocampus.

  • grid cell
  • entorhinal cortex
  • spatial cognition
  • goal-directed navigation
  • self localization
  • maximum likelihood decoding
  • population vector
  • nonlinear gain fields
  • goal-vector cells


The strikingly periodic spatial pattern of grid cell firing has caught the attention of experimental and theoretical neuroscientists alike and is thought to constitute a metric for space (1). Grid cells fire at regularly spaced intervals, so the distance an animal travels might be estimated by counting the number of episodes in which a particular grid cell fired. However, the spacing between firing fields and, hence, the distance traveled depend on the angle of movement relative to the grid’s orientation (Fig. 1A), which belies the notion of a metric at the single-cell level. The ensemble activity of grid cells could be deciphered to yield a spatial metric (2, 3), but the cipher might need to be learned (4). We show here that a simple, biologically plausible decoder exists. This decoder need not be learned, is consistent with known properties of grid cells, and leads to a number of experimentally testable predictions.

Fig. 1 Decoding spatial relations from single grid cells and single grid cell modules.

(A) Grid cells fire when the animal traverses the vertices of an internally generated hexagonal grid tiling the environment. However, counting the activity bouts of a single grid cell does not convey a measure of distance. Depending on the angle of the animal’s motion relative to the grid cell’s lattice, bumps in the spatial firing map will be encountered at spacings such as λ, Embedded Image, or Embedded Image. The periodicity of grid cell activity also makes it impossible to uniquely decode spatial location from a single grid cell. a.u., arbitrary units. (B) Nearby grid cells within a module share a common lattice orientation and scale, as depicted by the firing patterns of three idealized grid cells. The population activity repeats periodically as well; thus, it is impossible to decode the position at the population level, at least not on the basis of a single module when firing fields are set more closely than the dimensions of the environment. (C) Grid cell modules with different single-cell grid orientations could be decoded unambiguously but are not observed experimentally. As a consequence, spatial location can only be decoded by combining grid cell modules with multiple spatial scales. (D) Indeed, grid cells have lattices whose length scales λk form a discrete set, ranging from coarse to fine [figure adapted from Stensola et al. (5)]. The trajectories of a rat in a 2.2 × 2.2 m² enclosure are shown in gray; spikes from four grid cells are shown in red.

Grid cells are organized into discrete modules (5); within each module, the spatial scale and orientation of the grid lattice are the same, but the lattice for different cells is shifted in space (Fig. 1B). This organization, which might have a mechanistic explanation (6, 7), seems to complicate the encoding of spatial information. Constellations of grid cells will fire together at repeated locations, making the code ambiguous (Fig. 1B); if the grid orientations were not aligned, different sets of grid cells would fire together at these locations, creating unique labels for encoding spatial position (Fig. 1C). Paradoxically, the alignment of grid lattices within each grid cell module is a prerequisite for reading out the grid code using population vector averages across multiple spatial scales (Fig. 1D), as we show here.


Theoretical framework

For each grid cell, the spatial map of firing is captured by a bell-shaped function of space that repeats itself on a hexagonal lattice (Fig. 1D). All grid cells within one module have the same lattice orientation and grid scale λ but differ in their spatial phase (5, 8). To determine what the neuronal population’s activity reveals about an animal’s location, consider a snapshot of the activity by counting the number of spikes nj for each neuron in a fixed time window, such that the population’s response is n = (n1, …, nN). When the animal is at position $\vec{x}$, each neuron fires an average of $\Omega_j(\vec{x})$ spikes. However, the true number nj scatters around this value. Assuming Poisson variability and statistically independent neurons, an ideal observer, given the set of spike counts n, will assign the following probability for the animal to be at position $\vec{x}$:

$$P(\vec{x} \mid \mathbf{n}) \propto \prod_{j=1}^{N} \frac{\Omega_j(\vec{x})^{n_j}}{n_j!}\, e^{-\Omega_j(\vec{x})} \qquad (1)$$

Choosing the most likely position $\hat{\vec{x}}$ is known as maximum likelihood (ML) decoding.
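The ideal-observer readout described above can be illustrated with a minimal sketch in one dimension. It assumes a flat prior over position, a single module, and the von Mises tuning curve that the article introduces for the one-dimensional case; the function names, the grid of candidate positions, and all parameter values are illustrative choices, not taken from the article.

```python
import numpy as np

def mean_counts(x, centers, lam, n_max=2.0, kappa=2.0):
    """Expected spike counts; rows index neurons, columns index positions."""
    x = np.atleast_1d(x)
    phase = 2 * np.pi * (x[None, :] - centers[:, None]) / lam
    return n_max * np.exp(kappa * (np.cos(phase) - 1.0))

def log_posterior(counts, rates):
    """Poisson log posterior over candidate positions (flat prior, up to a constant)."""
    # sum_j [n_j * log(rate_j(x)) - rate_j(x)]; the log(n_j!) term does not depend on x
    return counts @ np.log(rates) - rates.sum(axis=0)

def ml_decode(counts, centers, lam, x_grid):
    """Return the candidate position with the highest posterior probability."""
    rates = mean_counts(x_grid, centers, lam)
    return x_grid[np.argmax(log_posterior(counts, rates))]

lam = 1.0                                              # spatial period (arbitrary units)
centers = np.linspace(0.0, lam, 20, endpoint=False)    # evenly spaced preferred positions
x_grid = np.linspace(0.0, lam, 400, endpoint=False)    # candidate positions (assumed grid)

rng = np.random.default_rng(0)
true_x = 0.3
counts = rng.poisson(mean_counts(true_x, centers, lam)[:, 0])   # one noisy snapshot
print("ML estimate:", ml_decode(counts, centers, lam, x_grid))
```

Because a single module is periodic, this estimate is only meaningful within one period; resolving positions beyond that requires combining modules.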

Population vector decoding and error correction in one dimension

Grid spacings range from about 25 cm to several meters in rats (9). The spiking of grid cells in the coarsest scale λ0 conveys a rough idea of where the animal is, but is subject to some uncertainty δ. Equation 1 generically predicts that an error δ introduced at scale λ0 can be corrected when a module with scale λ1 is added; δ is reduced to δ/[1 + M1/(M0 s²)], where s = λ0/λ1 and Mk is the number of cells at scale λk. Staggered spatial scales thus implement error correction, and the improvement grows with the number of neurons at the smaller scale (Supplementary Materials and fig. S1).

When space is restricted to one dimension, as on a narrow track (Fig. 2A), ML decoding of a single module reduces to a linear readout of the population vector average (Fig. 2B), provided the firing rate maps are given by von Mises functions (10). These are periodic generalizations of the Gaussian, Ωj(x) = nmax · exp{κ[cos(2π(x − cj)/λj) − 1]}, where cj is the preferred spatial phase of cell j and λj is its spatial period. The population vector (11) points exactly to the most likely position of the animal, and its length conveys the confidence in the position estimate.
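A minimal sketch of the single-module population vector readout follows. It assumes evenly spaced preferred positions within one period and uses illustrative values (20 cells, nmax = 2, κ = 2); the helper names `von_mises_rate` and `decode_module` are hypothetical.

```python
import numpy as np

def von_mises_rate(x, centers, lam, n_max=2.0, kappa=2.0):
    """Mean spike count of each cell at position x (von Mises tuning)."""
    return n_max * np.exp(kappa * (np.cos(2 * np.pi * (x - centers) / lam) - 1.0))

def decode_module(counts, centers, lam):
    """Population vector readout: position within one period and vector length."""
    phases = 2 * np.pi * centers / lam                  # preferred phase of each cell
    pv = np.sum(counts * np.exp(1j * phases))           # spike-count-weighted phasor sum
    x_hat = (lam / (2 * np.pi) * np.angle(pv)) % lam    # angle mapped back to position
    return x_hat, np.abs(pv)

lam, n_cells = 1.0, 20
centers = np.linspace(0.0, lam, n_cells, endpoint=False)
rng = np.random.default_rng(1)
counts = rng.poisson(von_mises_rate(0.3, centers, lam))  # noisy snapshot at x = 0.3
x_hat, length = decode_module(counts, centers, lam)
print(f"estimate {x_hat:.3f}, vector length {length:.2f}")
```

With noise-free counts the angle of the phasor sum recovers the true position to numerical precision; with Poisson counts the estimate scatters around it, and a short vector signals low confidence.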

In analogy to how one tells time using a traditional clock with an hour hand and a minute hand (Fig. 2C), neuronal population vectors lend themselves to an explicit algorithm for decoding position across multiple scales. Assume that the environment fits within the fundamental domain of the module with scale λ0. The population vector is formed by summing over all cells at scale λ0, weighting each cell’s spike count nj by its spatial phase cj. The coarse-scale estimate is then

$$\hat{x}_0 = \frac{\lambda_0}{2\pi}\, \arg\Big( \sum_{j} n_j \exp(i c_j) \Big) \qquad (2)$$

where exp(icj) = cos(cj) + i sin(cj) is a phasor and the arg function computes the angle of a phasor in the complex plane (Fig. 2B). This estimate (Fig. 2A) is refined by the population vector of the second module that contains grid cells with scale λ1: Embedded Image (3), with Embedded Image. The term involving Embedded Image subtracts out the contribution from the earlier position estimate. This procedure can be recursively iterated for modules at finer scales and is necessary for a correct position estimate (fig. S2 and Supplementary Materials). Multiple modules thus implement error correction as shown by the increasingly narrow distribution of position estimates (Fig. 2D). The performance of such a population vector decoder matches that of an ideal observer, but it requires that the grid cells be arranged into modules whose spatial periods form a discrete set. When the grid lattices span a continuum, an ideal observer might still glean a comparable amount of spatial information from the neuronal response, but a population vector decoder is invariably inferior (see figs. S7 and S8).
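The coarse-to-fine procedure can be sketched as follows. This is a simplified, unweighted variant and not necessarily the article's Eq. 3, which may weight the contributions of successive modules: here each finer module simply shifts the running estimate to the nearest position whose phase matches that module's population vector. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def rate(x, centers, lam, n_max=2.0, kappa=2.0):
    """Von Mises tuning: mean spike count of each cell at position x."""
    return n_max * np.exp(kappa * (np.cos(2 * np.pi * (x - centers) / lam) - 1.0))

def population_vector(counts, centers, lam):
    """Spike counts weighted by the phasors of the preferred phases."""
    return np.sum(counts * np.exp(1j * 2 * np.pi * centers / lam))

def refine(x_prev, pv, lam):
    """Shift x_prev to the nearest position whose phase at period lam matches pv."""
    residual = np.angle(pv * np.exp(-1j * 2 * np.pi * x_prev / lam))  # in (-pi, pi]
    return x_prev + lam / (2 * np.pi) * residual

def decode_nested(counts_list, centers_list, lams):
    """Coarse-to-fine readout; lams[0] is assumed to span the environment."""
    pv0 = population_vector(counts_list[0], centers_list[0], lams[0])
    x_hat = (lams[0] / (2 * np.pi) * np.angle(pv0)) % lams[0]
    for counts, centers, lam in zip(counts_list[1:], centers_list[1:], lams[1:]):
        x_hat = refine(x_hat, population_vector(counts, centers, lam), lam)
    return x_hat

s, n_modules, n_cells = 1.5, 4, 20
lams = [1.0 / s**k for k in range(n_modules)]            # geometric progression of scales
centers_list = [np.linspace(0.0, lam, n_cells, endpoint=False) for lam in lams]
rng = np.random.default_rng(2)
true_x = 0.3
counts_list = [rng.poisson(rate(true_x, c, lam)) for c, lam in zip(centers_list, lams)]
print("nested estimate:", decode_nested(counts_list, centers_list, lams))
```

The refinement step can only correct an earlier error smaller than half the finer period; a larger error is "corrected" toward the wrong peak, which is the failure mode the article analyzes when it discusses the scale ratio.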

Fig. 2 Reading a multiscale periodic code.

(A) Tuning curves of four nested modules with M = 20 grid cells and evenly spaced phases. The animal’s position yields a spike vector n in each module. The likelihood P(x|n) at that scale depicts the probability of being at a certain location, given the respective spike vector. Modules with smaller spatial periods λ have more localized likelihoods, but their multiple peaks result in ambiguous position estimates. The joint likelihood given the responses of all modules, shown in gray, is highly localized and nonperiodic. The overall ML estimate is closer to the animal’s position than x0, the ML estimate of the first module. (B) All ML estimates are determined by population vectors (PVs), which are formed by assigning each position x to a phase on the unit circle, weighting the number of spikes of each cell by its preferred phase, and then summing, as shown for the first two modules. (C) These PVs can be combined for refining the position estimate, similar to how the hour and minute hands of a clock are combined to read the time of the day. In a clock, the ratio of successive scales is 12, as there are 12 hours in each half-day, and in each hour, the minute hand completes one full cycle. (D) However, the scale ratio for successive grid modules is not generally an integer [the example in (A) has a ratio of 3:2]; hence, at the next scale, a new PV refines the position estimate by using the earlier estimate xᵢ as the center of the range of possible values for xᵢ₊₁ (Eq. 3). The refined estimate x1 in (A) is close to but not identical with the ML estimate from this module. Further estimates taking into account modules 2 and 3 are recursively calculated (eq. S13). Histograms of these estimates for 2^13 realizations of the spike vector n are shown in colors corresponding to the different modules in (A). The relative SDs σ/λ0 highlight that the estimate at each scale successively refines the position estimate (simulation parameters: nmax = 2, κ = 2, and s = 3/2). (E) In two dimensions, the periodicity of the lattice means that the unit cell (black hexagon) can be mapped onto a torus. The position Embedded Image can be read out like a two-dimensional (2D) clock with multiple scales.

Alignment of grid cell lattices in two dimensions within and across modules

Unlike time, space has more than one dimension. Using three superimposed plane waves as the argument of the von Mises function, one obtains a model for hexagonal firing fields in the plane. Periodicity in two dimensions means that the lattice’s unit cell is mapped onto a torus (Fig. 2E). Rather than a single clock, we now have two clocks, one for each angle variable on the torus.
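One way to write such a two-dimensional firing map is sketched below. The three wave vectors are taken 60° apart with magnitude 4π/(√3 λ), which gives a hexagonal lattice with field spacing λ; dividing κ by 3 so that the exponent matches the one-dimensional form at its extremes is an assumption of this sketch, as are the function names.

```python
import numpy as np

def wave_vectors(lam, orientation=0.0):
    """Three wave vectors, 60 degrees apart, for a hexagonal lattice with spacing lam."""
    k = 4 * np.pi / (np.sqrt(3) * lam)
    angles = orientation + np.array([0.0, np.pi / 3, 2 * np.pi / 3])
    return k * np.stack([np.cos(angles), np.sin(angles)], axis=1)   # shape (3, 2)

def grid_rate(pos, center, lam, n_max=2.0, kappa=2.0, orientation=0.0):
    """Mean spike count at 2D position pos for a cell with spatial phase `center`."""
    ks = wave_vectors(lam, orientation)
    phase = (np.asarray(pos) - np.asarray(center)) @ ks.T           # shape (..., 3)
    # sum of three plane waves as the argument of the von Mises function
    return n_max * np.exp(kappa / 3.0 * np.sum(np.cos(phase) - 1.0, axis=-1))

lam = 0.5
center = np.array([0.1, 0.2])
lattice_step = lam * np.array([np.sqrt(3) / 2, 0.5])   # one lattice vector for orientation 0
print(grid_rate(center, center, lam))                   # peak rate at the field center
print(grid_rate(center + lattice_step, center, lam))    # same rate one lattice step away
```

Shifting `center` changes the spatial phase of the cell, while `orientation` rotates the whole lattice; cells within one module would share `lam` and `orientation` and differ only in `center`.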

In one dimension, a population of neurons with von Mises tuning curves yields a posterior probability Embedded Image that is also von Mises, albeit more peaked. In two dimensions, the lattices of different grid cells may be shifted in spatial phase, as before, or also rotated. Random rotations within a module (Fig. 3A) destroy the hexagonal structure in Embedded Image. The tolerance to deviations from a perfect alignment can be assessed by computing the grid score (1), which measures the degree of local hexagonal symmetry. If the lattice orientations vary by more than 10°, the grid score of Embedded Image drops precipitously (Fig. 3B). As the lattices of measured grid cells are tightly aligned in orientation (5, 12), the ensemble activity generates a grid-like posterior position probability.

Fig. 3 At the population level, a periodic representation of 2D space results only for aligned grid lattices.

(A) Four hundred neurons with randomly phase-shifted, yet aligned grid-like tuning curves yield a posterior probability distribution Embedded Image that is hexagonal (left). If the lattices are randomly oriented, the hexagonal structure disappears (right). (B) The degree of variation in the lattice orientations strongly affects the hexagonal structure in Embedded Image, as measured by its “gridness” (1). (C) Even for randomly oriented lattices, the population response can be decoded by an ideal observer; if the number of neurons is small, aligned lattices result in a lower root mean square (RMS) error. Randomly positioning the lattices, as opposed to evenly spacing them, worsens the error. Size of square box, 1 m².

Whether the lattice orientations are aligned or not, any neural ensemble can be decoded using Eq. 1. For ensembles with few neurons and low peak firing rates, alignment within a single module leads to slightly more accurate position estimates (Fig. 3C). Furthermore, the discrete symmetry axes in Embedded Image allow the animal to calculate its position by trilateral intersection (Fig. 4A). Three population vectors μl are formed by projecting the spatial phase onto the vectors Embedded Image of the hexagonal lattice. The ML estimate then reads Embedded Image (4). This procedure is equivalent to topographically arranging the population’s spike counts and multiplying them by a set of spatial weighting functions that are sinusoidal gratings (Fig. 4B), which correspond to the sine and cosine terms of Eq. 2. These terms then give rise to the population vectors that are summed in Eq. 4. Implicit in the gratings’ spatial phases is the origin of the coordinate system, yet this origin is arbitrary; for instance, it could represent the animal’s home or a reward location. Switching between locations can be accomplished by rotating the phases (Fig. 4B). Because Embedded Image, such a phase change is tantamount to a multiplication of the sinusoidal gratings by a function of the phase φ and resummation, akin to the multiplicative modulation of visual tuning curves in the posterior parietal cortex by gaze direction (13).

Fig. 4 Decoding position in two dimensions.

(A) The hexagonal lattice has three wave vectors, Embedded Image, Embedded Image, and Embedded Image, spaced 60° apart. One can transform the hexagonal unit cell into different equally sized rectangles: Form three such rectangles so that the short edge aligns with the Embedded Image’s. Compute the population vector estimate of the position μl along the short edge of the rectangle, averaging across the long edge. For each rectangle, this yields a position estimate along the axis Embedded Image, without specifying the position in the orthogonal direction. At the height μl, draw a line parallel to the long edge in each rectangle. If the projected position estimates are exact, the three resulting lines will meet at one point, the true position of the animal. Otherwise, the three lines form an equilateral triangle, whose center is the ML solution Embedded Image. (B) The grid cell population response can yield a homing vector in egocentric coordinates to any point in the environment, such as the location of the nest (purple) or a reward (orange). Topographically rearranging the cells according to spatial phase yields a spike count map. The population vector is formed by multiplying the spike count with cosine gratings, which are aligned along the three axes of the hexagonal lattice. Each such grating is complemented by a weight function phase-shifted by 90° (not shown). The phase of the gratings determines where the homing vector points; rotating the phase of the weights shifts the vector from pointing to the nest to pointing to the reward location.

The estimate of the homing vector at one spatial scale sets an offset for refining the estimate at the next finer scale (Fig. 5A). Shorter scales imply that the lattice’s tiling of space becomes finer; thus, this offset resolves the ambiguity associated with the lattice’s periodic nature. The metric readout of the animal’s position relative to different locations of interest is then the result of a linear combination of scales (Fig. 5B). Though a population vector code requires the grid axes to be aligned within a module, alignment across modules is not essential. However, such an alignment improves the spatial resolution (Fig. 6, A to F) and has been observed experimentally (5, 14, 15).

Fig. 5 Combining population vectors at different scales.

(A) For each scale, there is a reference frame set by the position estimate from the previous scale. This guarantees that the correction of the position estimate lies within the hexagonal unit cell at the next scale. (B) At each scale, periodicity maps the unit cell in 2D Euclidean space onto a torus. The population vectors from the corresponding module yield a vector from the origin (circles) to the estimate of current position (star). As shown in (A), this estimate sets the origin of the coordinate system at the next scale. A linear sum of the estimates (θii) at each length scale, multiplied by weights Wi, produces a precise estimate of the homing vector. These Wi’s are functions of the ratio s = λk+1/λk of successive length scales. As long as the longest length scale λ0 is large enough to cover the local environment, the homing vector maps directly back onto Euclidean 2D space.

Fig. 6 Failures of decoding and the alignment of orientations across modules.

Take two modules whose spatial periods are in the ratio s = λ0/λ1 = 2 and rotate the finer-scale module’s lattice orientation by an angle φ relative to the first module. (A) The expected logarithm of the posterior distribution Embedded Image when the two lattices are aligned, given that the true position is at Embedded Image. (B) Suppose that the population vector at scale λ0 yields an estimate Embedded Image. This Embedded Image centers the fundamental domain of the module with the smaller lattice scale λ1 (blue dashed lines), but now the true position must lie within the smaller fundamental domain. If it does not, as shown above, then the refinement stage of decoding will try to estimate Embedded Image and not Embedded Image. Here, ℒ11 is the fundamental domain at length scale λ1 centered at Embedded Image. For a scale ratio of s = 2, this leads to an error of λ0/2, where λ0 is the distance between firing fields at the coarsest scale. (C) If the lattices in the second module are rotated by π/6 relative to the first, the side peaks Embedded Image move toward the vertices of the fundamental domain. Note that the effective spatial period in Embedded Image also shrinks, compared to the case of aligned orientations. (D) An error at the coarser scale is “corrected” toward Embedded Image, which lies close to the vertex. (E) As the second module’s orientation varies, the expected variance of Embedded Image at a given spike count peaks at π/6. A value of Embedded Image was chosen for the posterior distribution (eq. S5). (F) This effect is most pronounced in the regime of low neuron numbers M or low spike counts nmax, which leads to the small concentration parameter Embedded Image for the posterior probability distribution. The width of the posterior distribution is inversely related to the concentration parameter Embedded Image.

Decoding from elliptic grid patterns

In many cases, measured grid cell activity patterns deviate from hexagonal symmetry, most often in the vicinity of the boundaries of the environmental enclosure (5, 8, 12, 15, 16). If the grid lattice shears, the grid pattern becomes elliptical; this shearing can become more pronounced over time, starting from an almost hexagonal initial activity pattern (14).

No matter how extreme the ellipticity, the proposed population vector readout using Eq. 4 is unchanged, except for a (potentially location-dependent) rescaling of the wave vectors Embedded Image (see the Supplementary Materials and eq. S9). Indeed, any invertible linear transformation of the lattice can be readily incorporated into the readout. Different modules may be subject to distinct distortions (14, 15), and Eq. 4 can be adapted separately for each module. Within each module, all grids should be distorted (locally) in a similar way, in accordance with findings by Stensola et al. (14).

Optimal scale ratio

The total number of grid cells might be as low as 5000 in rats (17), and downstream neurons might sample from only a small set of cells. To study the limits of grid cell coding under such adverse conditions, we now consider low firing rates and few neurons per module. Embedded Image will have a shallow peak, implying that decoding Embedded Image becomes more uncertain and mistakes become more likely. If one reads off the minute hand on a clock, the answer cannot be off by more than 30 min; likewise, the worst error in decoding a 2D module with length scale λ is λ/2.

Refining the position estimate relies on nesting modules at different length scales, with the goal of making the peak in the probability distribution Embedded Image narrower. Figure 7A illustrates the change in Embedded Image in one spatial dimension upon the addition of a second module with a length scale λ1. If the scale ratio is s = λ0/λ1 = 2, the second module increases the probability of the worst possible error: miscomputing the position by ±λ0/2.

Fig. 7 For grid cell decoding to be robust, the ratio of length scales in successive grid cell modules should be below 3/2.

(A) Comparison of the log posterior probability log[P(x|n)] for two scenarios with two modules each, but with different scale ratios s = λ0/λ1 > 1. Red shading indicates regions in which the second module increases P(x|n); blue shading represents a decrease. (B) For a grid network composed of four modules with M = 64 neurons each, the histogram shows the positions decoded from the spike count relative to the true positions, which were chosen at random 2^15 times. The length scale of each module was one-half of the next coarser module (s = 2), such that the modules interfered. The thin hexagon delineates the spacing between firing fields at the coarsest length scale λ0, whereas the thick inscribed hexagon is the unit cell of the lattice. The maximum of the neuron’s tuning curve was 2; the tuning curve’s shape parameter was κ = 2. (C) Spatial information in the four-module network as a function of the scale ratio s, which reaches a maximum around s = 3/2. (D) The optimal scale ratio s depends on the expected number of spikes 〈n〉 across all four modules and falls into discrete levels. (E) The RMS error for the optimal scale ratio s, relative to a 1-m² enclosure.

The ratio s = 3/2, on the other hand, is a safe choice. At x = λ0/2, the second module contributes a term proportional to cos(sπ) to the log probability, but cos(3π/2) = 0. Together with the normalization of the probability distribution, this trigonometric fact ensures that adding a second module does not increase the probability of large decoding errors, irrespective of the number of neurons, the firing rate, or the shape parameter of the neurons’ tuning curve.
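The trigonometric argument can be checked numerically. The short sketch below evaluates only the cosine factor of the finer module's contribution, assuming the true position is at x = 0 and the finer period is λ1 = λ0/s; amplitudes that depend on cell numbers and firing rates are deliberately left out, and the function name is illustrative.

```python
import numpy as np

def second_module_term(x, s, lam0=1.0):
    """cos(2*pi*x/lam1) with lam1 = lam0/s: shape of the finer module's
    contribution to the log posterior when the true position is x = 0."""
    return np.cos(2 * np.pi * s * x / lam0)

for s in (2.0, 1.5):
    at_truth = second_module_term(0.0, s)
    at_worst = second_module_term(0.5, s)      # x = lam0 / 2, the worst possible error
    print(f"s = {s}: term at x = 0 is {at_truth:.2f}, at x = lam0/2 is {at_worst:.2f}")
```

For s = 2 the term is as large at x = λ0/2 as at the true position, so the finer module supports the worst-case error just as strongly as the correct answer; for s = 3/2 it vanishes there.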

The same argument holds when the population code represents 2D space. For the nonideal ratio of length scales s = 2, the histogram of positions decoded from the population spike counts, measured relative to the true position, exhibits a rosette-like pattern (Fig. 7B): The hexagonal “petals” in this pattern reflect the interference between successive modules.

For a low firing rate and a small module size M, a four-module grid code conveys the most information when the length scales obey 1 ≤ s ≤ 2 (Fig. 7C). If the number of bits is b, the spike count vector can resolve 2^b different locations. The greater the number of spikes across the four modules, the higher the scale ratio s can be. The optimal s falls into discrete levels, as assessed by the average decoding error (Fig. 7E). The first discrete level distinct from s = 1 is centered on s = 3/2 (dotted line in Fig. 7D).

Representing space beyond the longest grid scale

If the scale ratio s = λ0/λ1 is not an integer, the smaller grid scales are no longer periodic with respect to the unit cell ℒ0 at the largest scale. Indeed, the posterior Embedded Image, treated as a function of Embedded Image, does not repeat itself until the least common multiple of all scales has been reached. This fact has led to the proposal that grid codes instantiate a residue number system to represent a spatial range much larger than λ0 (2, 18): Each module signals an independent spatial phase, such that the set of phases represents a combinatorial code for spatial position (Fig. 8A).

Fig. 8 Localization beyond the spatial scale of the largest module.

(A) For the grid scales λ = {9,7,4}, the Chinese Remainder Theorem holds: for any set of integers 0 ≤ ni < λi, there is an integer m that satisfies m mod λi = ni. The λi’s in this case form a co-prime set; that is, no two of them share a common divisor greater than 1. (B) On the other hand, the grid scales λ = {9,6,4}, which form part of a geometric progression with s = λᵢ/λᵢ₊₁ = 3/2, are not co-prime. If we treat ni as the discrete phase of the ith module’s neuronal response, then not all combinations of 0 ≤ ni < λi are allowed to occur. If they do occur, error correction in the readout could map the response to the closest valid combination of phases. (C) Decoding the position (x,y) = (4.83,−2.32) from 200 realizations of the population vectors across L = 3 modules. The fundamental domain ℒ0 at the largest scale has a unit area. The ratio between the chosen spatial periods is s = 7/5, implying that the range of the code increases 25-fold. (D) For a smaller number of neurons and lower firing rates, modular arithmetic fails. One thousand two hundred realizations are shown. The scatter of decoded positions is not completely random but reflects ancillary peaks in the posterior probability, whose spatial positions are dictated by the scale ratio s = 7/5. The vertices of the dashed hexagon, which is exactly 5² times the size of ℒ0, represent but one subset of these ancillary peaks. (E) Discrete spatial scales imply that the log posterior probability for the model reflects the sum of sine waves with different spatial frequencies, shown here with frequencies in the ratio of 7:5. Blue and red dots indicate the maxima of the individual sine components. When these points draw close together, a secondary maximum is observed in the sum (purple line). Robust encoding requires that the true position x = 0 be much more likely than any positions corresponding to secondary maxima. (F) The probability of making catastrophic errors in the 2D plane using L = 2 modules. Such errors correspond to population vector decoding yielding a position outside of the fundamental domain ℒ0 around the true position. Samples (totaling 2^15) are drawn for each scale ratio s = p/q = {3/2,4/3,5/3,7/5,9/5}. Reliable decoding requires that both p and q be small.

A fixed scale ratio s implies a geometric progression of scales across modules. As soon as the number of modules L exceeds 2, the scales do not form a co-prime set; in other words, the least common multiple of the set is less than the product of all scales. This contravenes the Chinese Remainder Theorem (19) that underlies modular arithmetic and states that any combination of discrete spatial phases across modules maps onto a unique position (modulo the least common multiple of all scales). However, violating the Chinese Remainder Theorem turns out to provide an advantage. If we approximate the scale ratio by a rational number s = p/q, where p and q are integers, certain combinations of discretized spatial phases across modules are forbidden; they should never appear in the deterministic limit. Because of the stochastic nature of neuronal discharge, such combinations might nevertheless occur when the population vectors are calculated, but then the readout could seek the closest valid set of phases, as illustrated in Fig. 8B, and thereby correct the error.
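The difference between the two sets of scales in Fig. 8, A and B, can be made concrete with a short enumeration. The sketch below lists which phase combinations a true integer position can produce, and maps an invalid combination to the nearest valid one; using the summed circular distance as the notion of "closest" is an assumption of this sketch, and the function names are illustrative.

```python
from math import lcm, prod

def valid_phase_tuples(scales):
    """All phase tuples (m mod scale, ...) that a true position m can produce."""
    return {tuple(m % lam for lam in scales) for m in range(lcm(*scales))}

def circular_distance(a, b, lam):
    """Distance between two discrete phases on a ring of length lam."""
    d = abs(a - b) % lam
    return min(d, lam - d)

def closest_valid(phases, scales):
    """Valid tuple with the smallest summed circular distance to `phases`."""
    return min(
        valid_phase_tuples(scales),
        key=lambda t: sum(circular_distance(a, b, lam)
                          for a, b, lam in zip(phases, t, scales)),
    )

for scales in [(9, 7, 4), (9, 6, 4)]:
    print(scales, "valid:", len(valid_phase_tuples(scales)), "of", prod(scales))

# (6, 5, 1) cannot occur for scales (9, 6, 4); the readout can snap it to a valid tuple
print(closest_valid((6, 5, 1), (9, 6, 4)))
```

For the co-prime set every one of the 9 · 7 · 4 combinations is reachable, so a corrupted phase always decodes to some position and the error goes unnoticed. For the geometric progression only a fraction of the combinations is valid, which is what leaves room for error detection and correction.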

For s = p/q, the linear dimension of the range is q^(L−1) λ0, where L is the number of modules. In two dimensions, a set of three population vectors is projected along the wave vectors Embedded Image. For each Embedded Image, a separate residue number system exists. These are then summed as in Eq. 4. As shown in Fig. 8C, modular arithmetic can be used to decode positions in 2D space far beyond the fundamental domain of the largest grid scale, as long as there are sufficiently many grid cells firing vigorously. In the limit of high noise, low firing rates, or low numbers of neurons, modular arithmetic collapses (Fig. 8D). As the population model yields an explicit and simple representation of the posterior probability Embedded Image, we can quantitatively assess the likelihood of making catastrophic errors.
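The stated range can be verified by brute force in one dimension. The sketch below assumes p and q are co-prime and measures distance in units of λ0, so that module k has period (q/p)^k; it searches for the smallest distance at which every module has completed a whole number of periods. The function name is illustrative.

```python
from fractions import Fraction

def code_range(p, q, n_modules):
    """Smallest distance, in units of lambda_0, after which all modules realign."""
    ratios = [Fraction(p, q) ** k for k in range(n_modules)]   # lambda_0 / lambda_k
    x = 1
    # x is a common period once x / lambda_k is an integer for every module
    while any((x * r).denominator != 1 for r in ratios):
        x += 1
    return x

for p, q, n_modules in [(3, 2, 3), (7, 5, 3), (3, 2, 4)]:
    print(f"s = {p}/{q}, L = {n_modules}: range = {code_range(p, q, n_modules)} lambda_0")
```

For s = 7/5 and L = 3 the linear range is 25 λ0, which corresponds to the hexagon 5² times the size of ℒ0 in Fig. 8D; for s = 3/2 the range only doubles with each added module.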

Let us focus on one dimension and L = 2, as the results that follow generalize (Supplementary Materials). We can write the posterior as Embedded Image, where we assume that each module has the same number of neurons M. The term cos(2πcx/p) has maxima at x = kp/c, k ∈ ℤ, whereas cos(2πcx/q) has maxima at x = lq/c, l ∈ ℤ. Errors most likely occur when the amplitude of the secondary peaks in the posterior P(x|n) lies close to the amplitude of the primary peak (Fig. 8E). The highest secondary peak occurs when

Embedded Image

Solving this Diophantine (integer congruence) equation, we find that the difference between the primary and secondary maxima in the posterior is Embedded Image (5). The scaling law in Eq. 5 also correctly predicts the frequency of making large-scale decoding mistakes in the 2D plane, as determined numerically (Fig. 8F).

Ideally, Δln P(x|n) should be as large as possible. In other words, p and q should be small integers. We must have q > 1 for the range to be larger than λ0; hence, the smallest integer we can choose is q = 2. With this choice, the smallest integer that is co-prime to q is p = 3. Therefore, the optimal scale ratio is s = p/q = 3/2, which coincides with the most robust s based on iterative refinement of the vector estimate (Fig. 7, A to E). In the limit of high noise and low firing rates, we predict that a modular arithmetic code would sacrifice range for robustness.


Multiscale grid codes can represent vast areas of space (2, 20) or a more limited area with high precision (3). The resolution of such a code could reach a millimeter or less, based on an ideal observer decoding the population response. As we have demonstrated here, the ideal observer is unnecessary: Reading the code is both simple and biologically plausible, once population vector decoding is used. Similar to a land survey, measuring the position in two dimensions relies on determining multiple vectors; trigonometry predicts that several neuronal population vectors should be added together to obtain a position estimate. This estimate yields a new egocentric vector from the current (allocentric) position Embedded Image to an (arbitrary) origin Embedded Image; changing the origin of the coordinate system using nonlinear gain fields is straightforward, so that a multiscale grid code could let a foraging animal always know the direction and distance to home or some goal. Such a mechanism generalizes population vector average decoders that have been implicated in visuomotor transformations (11, 21, 22).

The spatial activity of a single grid cell does not constitute a metric, whereas an ensemble of hierarchically organized grid cells does provide a distance measure, as our results prove. Although spatial information is highly distributed across scales, the readout is a simple linear combination of population vector averages that performs error correction. The metric’s accuracy stems from the geometric progression of length scales, ranging from coarse to fine. Such nested scales have been predicted by optimal coding (3). Measured grid cell maps cluster into discrete groups at different length scales, such that the ratio s of successive scales lies between 1.4 and 1.7 (5, 8). Wei et al. (23) derived an optimal scale ratio of e^(1/D) using a different measure of resolution, where D is the dimension of space. In contrast, our argument that s ≈ 3/2 does not depend on D and is based on a worst-case analysis for small numbers of spikes (see the Supplementary Materials); we predict that grid fields in flying or swimming mammals also cluster into discrete modules and that the scale ratio is similar to the one observed in the spatial firing maps for terrestrial movement.

As the measured ratio between $\lambda_i$ and $\lambda_{i+2}$ is about 2 (5) or larger (8), silencing an intermediate-scale module should lead to systematic errors (Fig. 7B). On the other hand, removing the grid module with the smallest scale would only affect the fine precision of navigation. Likewise, our theory predicts that increasing some grid scales by down-regulating specific cellular conductances (24) should gradually decrease spatial precision, whereas it would drastically alter the readout if grid cells were used for modular arithmetic (2). These predictions could be tested systematically with path integration experiments, in which an animal would have to reproduce specific distances with high spatial accuracy without the aid of landmarks. Specific subpopulations called island and ocean cells have already been genetically identified and characterized (17, 25). Given the ability to deep sequence single neurons, acute manipulations of single grid modules could become feasible in the near future. By perturbing specific modules using targeted pharmacogenetic (26) or optogenetic manipulations (27), the contributions of individual modules to navigational accuracy could be compared to theoretical predictions.

Experiments show that grid patterns in adjacent compartments fuse into a single, continuous pattern spanning both compartments while the animal familiarizes itself with the environment (28). Therefore, a single decoder will be able to read out the grid code, even over long distances. The universal nature of the metric has been called into question, however, as firing fields become more irregular in narrow regions of space (15), which corroborates earlier findings that the spatial representation in grid cells changes when a hairpin maze is introduced into an open arena (16). The measured grids are typically elliptical, and the eccentricity of the grid pattern often varies throughout one arena (14, 29); the activity of boundary cells might influence these differences in eccentricity (15, 30). Yet, as we have shown, the metric nature of the spatial representation lies not in the regularity or periodicity of the grid but in the population activity. Whatever the sources of distortion are, one can think of mechanisms by which the (location-dependent) eccentricity would modulate the readout formula (Eq. 4), preserving the population metric for space. The prerequisite for a universal metric is that the distortion be common to all cells within the same module. Across modules, however, the eccentricity can be different, as observed experimentally (5, 14, 15). If the brain did not use an adaptive readout, this should result in distortions of spatial perception, which could be tested in psychophysical experiments. Regardless of whether the metric is universal or not, the grid metric by itself may prove insufficient for real-world navigation, as it contains no information about physical obstacles in the environment. How the nervous system might translate the goal direction vector into a feasible shortest path is an open question.

Within the nested hierarchy of grid modules, self-similar scaling implies that the finer-scale modules provide higher spatial resolution. In modular arithmetic, the resolution is assumed to be the same for each module. However, as long as the true resolution is sufficient for modular arithmetic, the two decoding schemes are compatible with each other: a readout mechanism could do both. In either scheme, making the grid code robust predicts a geometric progression of grid scales, in both cases with the scale ratio s = 3/2. Such a progression stands in contrast to the original proposal for modular arithmetic, which envisions a set of grid scales that are not divisible by a common factor (2). Although a geometric progression forfeits some of the range that co-prime number sequences of spatial periods would permit, the spatial range could still cover several square kilometers, as the following calculation inspired by experimental estimates shows: take 10 nested grid cell modules (5), with scales ranging from 25 cm to several meters (9), which is consistent with a scale ratio of s = 3/2. With these numbers, the range along each dimension of space is $2^9 \cdot (3/2)^9 \cdot 0.25$ m, which is approximately 5 km. Modular arithmetic codes need to first be converted into an explicitly metric representation (2), though, which must be learned (4, 31). The high variability of neuronal discharge in grid cells (10, 32, 33) stresses the importance of robustness for any biological coding scheme. Given this variability, it seems unlikely that single grid cells will signal spatial phase to downstream networks of neurons with sufficient reliability; regardless of the grid cell coding scheme, population averages will be required.
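The range estimate above can be checked with a few lines of arithmetic. This sketch assumes that the exponent 9 in the expression is the number of modules minus one; the helper names `geometric_scales` and `range_estimate_m` are hypothetical and only serve the illustration.

```python
def geometric_scales(smallest_m: float, ratio: float, n_modules: int) -> list:
    """Grid scales in meters forming a geometric progression, smallest first."""
    return [smallest_m * ratio**i for i in range(n_modules)]

def range_estimate_m(smallest_m: float, ratio: float, n_modules: int) -> float:
    """Range per spatial dimension, following the expression 2^9 * (3/2)^9 * 0.25 m.

    Assumption: the exponent equals n_modules - 1.
    """
    k = n_modules - 1
    return 2**k * ratio**k * smallest_m

scales = geometric_scales(0.25, 1.5, 10)
range_m = range_estimate_m(0.25, 1.5, 10)
print(f"largest grid scale: {scales[-1]:.2f} m")
print(f"range per dimension: {range_m:.2f} m")
```

With 10 modules, a smallest scale of 25 cm, and s = 3/2, the largest scale comes out at several meters and the range at roughly 5 km, in line with the figures in the text.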

Population vector decoding predicts not only that grid cells are grouped into modules but also that lattices within one module should be aligned, as observed experimentally (1, 12). Spatial information would still be present, were the lattices randomly aligned or the modules absent, but would not be as easily decoded. Yet, which neurons read out the grid code? Landmark vector cells in the hippocampal area CA1 are one candidate (34, 35), as these cells respond when the homing vector matches a fixed vector describing a specific direction and distance to a landmark. Outside of the hippocampus, circuits in the retrosplenial and posterior parietal cortex, areas essential to memory-dependent spatial navigation (36, 37), may be involved. A direct link exists between (presumptive) grid cells of the presubiculum and retrosplenial cortex (38), whereas the posterior parietal cortex, well known for the multiplicative interactions between its inputs (13), receives afferents from the medial entorhinal cortex (39).

One of the key predictions of our theory is that the nervous system will be able to rotate the population vector averages. According to the sum rules of trigonometry, such a rotation is equivalent to multiplying the readout weights with cosine-like functions, which would act as gain fields. The effect of such a multiplication is a near-instantaneous change in the represented goal location. Neural mechanisms for such multiplications have been proposed (40, 41), but the neural implementation could be implicit at the network level (42) and possibly be based on a restricted set of “template” functions (43). In the latter case, a set of cells would change their firing rate on the same time scale with which the goal location changes and switch the phase of the readout weights. Two-photon imaging in animals performing cued navigation tasks [for example, Harvey et al. (36)] could be used to search for such goal-encoding neurons and, with widefield imaging in freely navigating animals (44), could reveal signatures of these computations. Multiplication is an intermediate step in the computation of the homing vector, whose final result, if we interpret the model literally, corresponds to a linear ramp of firing activity as a function of distance to the goal (fig. S4); other representations are also possible (45). Whether grid cell ensemble activity is indeed decoded in this manner remains a question for further research.
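The trigonometric identity behind this prediction can be verified numerically. The sketch below is an illustration under simplifying assumptions, not the article's readout: cells are given random preferred phases on a circle and random Poisson spike counts, and the population vector is the count-weighted sum of unit vectors. Rotating that vector by an angle `alpha` gives the same result as keeping the counts fixed and multiplying the cosine and sine readout weights by the gains cos(alpha) and sin(alpha). All names and parameter values are illustrative.

```python
import numpy as np

def population_vector(counts, w_cos, w_sin):
    """Count-weighted sum of the readout weights (x and y components)."""
    return np.array([np.sum(counts * w_cos), np.sum(counts * w_sin)])

def gain_modulated_weights(w_cos, w_sin, alpha):
    """Readout weights after multiplication by cosine-like gains.

    Uses cos(t + a) = cos(t)cos(a) - sin(t)sin(a) and
         sin(t + a) = sin(t)cos(a) + cos(t)sin(a).
    """
    g_cos, g_sin = np.cos(alpha), np.sin(alpha)
    return g_cos * w_cos - g_sin * w_sin, g_sin * w_cos + g_cos * w_sin

rng = np.random.default_rng(0)
phases = rng.uniform(0.0, 2.0 * np.pi, size=50)      # preferred phases (assumed)
counts = rng.poisson(3.0, size=50).astype(float)     # toy spike counts (assumed)
w_cos, w_sin = np.cos(phases), np.sin(phases)
alpha = 0.7                                          # rotation angle in radians

# Route 1: decode first, then rotate the resulting vector explicitly.
rotation = np.array([[np.cos(alpha), -np.sin(alpha)],
                     [np.sin(alpha),  np.cos(alpha)]])
rotated_vector = rotation @ population_vector(counts, w_cos, w_sin)

# Route 2: modulate the readout weights with the gains, then decode.
new_cos, new_sin = gain_modulated_weights(w_cos, w_sin, alpha)
gain_field_vector = population_vector(counts, new_cos, new_sin)

print(rotated_vector, gain_field_vector)
```

Because the gains act on the weights rather than on the spike counts, changing `alpha` changes the decoded vector immediately, without any change in the grid cell activity itself.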


To model the spatial firing rate map of a grid cell, which describes the cell’s expected spike count when the animal is at a given location, we superimposed three plane waves, one for each l ∈ {1, 2, 3}, and exponentiated the result (Eq. 6). This is a 2D von Mises function. For a hexagonal grid that is aligned to the x axis, the three wave vectors are oriented at angles $\varphi_l = -\pi/6 + l\pi/3$. The term ω is given by $\omega = 2\pi/(\sin(\pi/3)\,\lambda)$, where λ is the spatial scale of the firing pattern (or grid size), 1/κ measures the cell’s relative tuning width, and $n_{\max}$ is the maximal expected spike count. To capture the 2D spatial phase of cell j’s firing pattern, the position argument on the right-hand side of Eq. 6 is shifted by that phase.
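The verbal description above can be turned into a short numerical sketch. The exact form of Eq. 6 is not reproduced here, so two details are assumptions: the plane-wave directions are taken as unit vectors at the angles $\varphi_l$, and the exponent is normalized as (κ/3) Σ (cos(·) − 1), chosen so that the peak value equals the maximal expected spike count. The function name `grid_rate_map` and the parameter values are illustrative.

```python
import numpy as np

def grid_rate_map(x, lam, kappa, n_max, phase=(0.0, 0.0)):
    """Expected spike count of a model grid cell at position(s) x.

    x     : array of shape (..., 2), positions in the same units as lam
    lam   : grid scale (distance between neighboring firing fields)
    kappa : tuning parameter; 1/kappa sets the relative field width
    n_max : maximal expected spike count
    phase : 2D spatial phase (location of one firing field)
    """
    shifted = np.asarray(x, dtype=float) - np.asarray(phase, dtype=float)
    omega = 2.0 * np.pi / (np.sin(np.pi / 3.0) * lam)
    angles = -np.pi / 6.0 + np.arange(1, 4) * np.pi / 3.0       # l = 1, 2, 3
    k = np.stack([np.cos(angles), np.sin(angles)], axis=-1)     # (3, 2), assumed unit vectors
    waves = np.cos(omega * shifted @ k.T)                       # (..., 3) plane waves
    # Assumed normalization: the exponent is zero at a field center.
    return n_max * np.exp(kappa / 3.0 * np.sum(waves - 1.0, axis=-1))

lam, kappa, n_max = 0.5, 2.0, 10.0
peak = grid_rate_map([0.0, 0.0], lam, kappa, n_max)
neighbor_x = grid_rate_map([lam, 0.0], lam, kappa, n_max)
neighbor_diag = grid_rate_map([lam / 2.0, lam * np.sqrt(3.0) / 2.0], lam, kappa, n_max)
between = grid_rate_map([lam / 2.0, 0.0], lam, kappa, n_max)
print(peak, neighbor_x, neighbor_diag, between)
```

With this choice of ω, the firing fields repeat at distance λ along the x axis and along the direction 60° from it, which is the hexagonal lattice aligned to the x axis described in the text.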

We assumed that the number of spikes emitted by the jth grid cell obeys Poisson statistics, with the expected spike count given by that cell’s firing rate map at the animal’s location. Each neuron is statistically independent, so the conditional probability for being at a given location, given the spiking activity in a population of N neurons, is

Embedded Image(7)
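Under the two stated assumptions (Poisson spike counts and statistically independent neurons), the posterior over a discrete set of candidate positions can be computed as sketched below. This is an illustration rather than a transcription of Eq. 7: the flat prior, the discretization into candidate positions, and the names `log_posterior`, `rates`, and `counts` are assumptions made for the example.

```python
import numpy as np
from math import lgamma

def log_posterior(counts, rates, log_prior=None):
    """Log posterior over candidate positions for independent Poisson neurons.

    counts    : (N,) observed spike counts, one per neuron
    rates     : (P, N) expected spike counts (all > 0) at P candidate positions
    log_prior : (P,) log prior over positions; flat if None (assumption)
    """
    counts = np.asarray(counts, dtype=float)
    rates = np.asarray(rates, dtype=float)
    log_factorials = np.array([lgamma(n + 1.0) for n in counts])
    # Poisson log likelihood, summed over neurons because they are independent.
    log_like = np.sum(counts * np.log(rates) - rates - log_factorials, axis=1)
    if log_prior is None:
        log_prior = np.zeros(rates.shape[0])
    unnormalized = log_like + log_prior
    # Normalize over the candidate positions in a numerically stable way.
    peak = unnormalized.max()
    return unnormalized - (peak + np.log(np.sum(np.exp(unnormalized - peak))))

# Toy example: two neurons, three candidate positions.
rates = np.array([[5.0, 1.0],
                  [1.0, 5.0],
                  [3.0, 3.0]])
counts = np.array([5, 1])
log_post = log_posterior(counts, rates)
map_index = int(np.argmax(log_post))
print(np.exp(log_post), map_index)
```

In the toy example the observed counts match the expected counts at the first candidate position, which therefore receives the highest posterior probability.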

In the Supplementary Materials, we reformulated this posterior probability for the population of grid cells to derive a simple and biologically plausible decoder that provides a position estimate given a stochastic (noisy) realization of the population spike count vector $n = (n_1, \ldots, n_N)$. Grid cells in the entorhinal cortex are subdivided into populations of cells with different grid spacings (5); these subpopulations are called “modules.” To capture this property, we assigned the same tuning curve to each neuron within a module, such that ω, $n_{\max}$, and κ are identical, but the spatial phases are different. The phases are either equidistantly arranged or randomly (but uniformly) distributed across the unit cell ℒ (also known as the fundamental domain) of the hexagonal grid. The Supplementary Materials show how a population vector decoder for a single module can be derived from Eqs. 6 and 7 and how to combine the information about position across modules.
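As a rough illustration of population vector decoding within one module, the following sketch uses a one-dimensional analog: cells share one tuning curve with period λ and differ only in their equidistant phases. It is a standard circular population vector, not the 2D decoder derived in the Supplementary Materials, and all names and parameter values (`tuning_1d`, `population_vector_estimate`, λ = 0.5 m, κ = 2, 20 cells) are assumptions for the example.

```python
import numpy as np

def tuning_1d(x, centers, lam, kappa, n_max):
    """Expected spike counts of cells with a periodic (von Mises) tuning curve."""
    return n_max * np.exp(kappa * (np.cos(2.0 * np.pi * (x - centers) / lam) - 1.0))

def population_vector_estimate(counts, centers, lam):
    """Position estimate modulo lam from the count-weighted sum of unit phasors."""
    angles = 2.0 * np.pi * centers / lam
    resultant = np.sum(counts * np.exp(1j * angles))
    return (np.angle(resultant) % (2.0 * np.pi)) * lam / (2.0 * np.pi)

lam, kappa, n_max, n_cells = 0.5, 2.0, 10.0, 20
centers = np.arange(n_cells) * lam / n_cells      # equidistant spatial phases
true_x = 1.87                                     # true position in meters

expected = tuning_1d(true_x, centers, lam, kappa, n_max)
rng = np.random.default_rng(1)
counts = rng.poisson(expected)                    # one noisy realization

noise_free = population_vector_estimate(expected, centers, lam)
noisy = population_vector_estimate(counts, centers, lam)
print(f"true position modulo lam: {true_x % lam:.4f} m")
print(f"noise-free estimate: {noise_free:.4f} m, noisy estimate: {noisy:.4f} m")
```

A single module only determines position modulo its period, which is why information has to be combined across modules with different scales to obtain an unambiguous estimate.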


Supplementary material for this article is available at


Fig. S1. Hierarchical self-similar scales enable error correction.

Fig. S2. Noninteger scale ratios imply that the decoding algorithm must be able to rotate population vectors.

Fig. S3. The lengths of the population vectors along the hexagonal grid’s axes are correlated.

Fig. S4. Possible distributed representations of the position vector estimate of eq. S6 across a population of readout neurons.

Fig. S5. Comparison of different decoding schemes for a single module in two dimensions.

Fig. S6. Comparison of different decoding schemes for multiple modules in two dimensions.

Fig. S7. If the grid scales are not organized into discrete modules, population vector decoding is no longer straightforward.

Fig. S8. Continuum decoding requires multiple population vectors.

References (46–50)

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We thank E. Moser for stimulating discussions, T. Stensola for providing Fig. 1D, M. Amoroso for graphical support, and M. Amoroso, D. Derdikman, and C. Harvey for comments on the manuscript. Funding: Work at the Bernstein Center for Computational Neuroscience Munich was supported by Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie (01GQ1004A). A.M. received support from Deutsche Forschungsgemeinschaft grant MA 6176/1-1 and the Marie Curie Fellowship (PIOF-GA-2013-622943 of the European Union’s Seventh Framework Programme FP7 2007-13). Author contributions: M.S., A.M., and A.V.M.H. worked out the theory, developed experimental predictions, and wrote the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data used to obtain the conclusions in this paper are presented in the paper and/or the Supplementary Materials.
