Research Article | COGNITIVE NEUROSCIENCE

Number detectors spontaneously emerge in a deep neural network designed for visual object recognition

Science Advances  08 May 2019:
Vol. 5, no. 5, eaav7903
DOI: 10.1126/sciadv.aav7903

Figures

  • Fig. 1 An HCNN for object recognition.

    (A) Simplified architecture of the HCNN. The feature extraction network consists of convolutional layers that compute multiple feature maps. Each feature map represents the presence of a certain visual feature at all possible locations in the input and is computed by convolving the input with a filter and then applying a nonlinear activation function. Max-pooling layers aggregate responses by computing the maximum response in small nonoverlapping regions of their input. The classification network consists of a global average-pooling layer that computes the average response in each input feature map, and a fully connected layer where the response of each unit represents the probability that a specific object class is present in the input image. (B) Example of the network successfully distinguishing a wolf spider from other arthropods. Images representative of those used in the test set are shown together with the top five predictions made by the network for each image, ranked by confidence. Ground-truth labels are shown above each image. Images shown here are from the public domain (Wikimedia Commons).

  • Fig. 2 Numerosity-tuned units emerging in the HCNN.

    (A) Examples of the stimuli used to assess numerosity encoding. Standard stimuli contain dots of the same average radius. Dots in Area & Density stimuli have a constant total area and density across all numerosities. Dots in Shape & Convex hull stimuli have random shapes and a uniform pentagon convex hull (for numerosities >4). (B) Tuning curves for individual numerosity-selective network units. Colored curves show the average responses for each stimulus set. Black curves show the average responses over all stimulus sets. Error bars indicate SEM. PN, preferred numerosity. (C) Same as (B), but for neurons in monkey prefrontal cortex (20). Only the average responses over all stimulus sets are shown. (D) Distribution of preferred numerosities of the numerosity-selective network units. (E) Same as (D), but for real neurons recorded in monkey prefrontal cortex [data from (20)].

  • Fig. 3 Tuning curves of numerosity-selective network units.

    Average tuning curves of numerosity-selective network units tuned to each numerosity. Each curve is computed by averaging the responses of all numerosity-selective units that have the same preferred numerosity. The pooled responses are normalized to the 0 to 1 range. Preferred numerosity and number of numerosity-selective network units are indicated above each curve. Error bars indicate SEM.

  • Fig. 4 Tuning properties of numerosity-selective network units.

    (A) Left: Average tuning curves for network units preferring each numerosity plotted on a linear scale. Right: Same tuning curves plotted on a logarithmic scale. (B) Average goodness-of-fit measure for fitting Gaussian functions to the tuning curves on different scales [P_linear-log = 0.009; P_linear-pow(1/2) = 0.003; P_linear-pow(1/3) = 0.001]. (C) SD of the best-fitting Gaussian function for each of the tuning curves of numerosity-selective network units for different scales.

  • Fig. 5 Relevance of numerosity-selective units to network performance.

    Average activity of numerosity-selective network units shown as a function of numerical distance between preferred numerosities and sample numerosities in the matching task. Left: Data from network units. Responses were averaged separately for correct trials (black) and error trials (gray). Responses during error trials are normalized to the maximum average response during correct trials (P = 0.019). Right: Same plot but for real neurons recorded from monkey prefrontal cortex [data from (20)].

  • Fig. 6 Performance of the HCNN model in the numerosity matching task.

    (A) Left: Performance functions resulting from the discrimination of numerosities plotted on a linear scale. Each curve shows the probability of the model predicting that the sample image contains the same number of items as the test image (peak of the function). Sample numerosity is indicated above each curve. Right: Same performance functions but plotted on a logarithmic scale. (B) Average goodness-of-fit measure for fitting Gaussian functions to the performance tuning curves on different scales [P_linear-log = 0.003; P_linear-pow(1/2) = 0.049; P_linear-pow(1/3) = 0.016]. *P < 0.05. (C) SD of the best-fitting Gaussian function for each of the performance tuning curves for different scales.
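
The scale comparison described in the captions of Figs. 4 and 6 (Gaussian fits on linear, power, and logarithmic scales) can be illustrated with a small sketch. This is not the authors' fitting procedure. It assumes a Gaussian with unit amplitude whose center is fixed at the preferred numerosity, fits only the width by grid search, and uses r² as the goodness-of-fit measure. The function names, the numerosity range 1 to 30, the base-2 logarithm, and the synthetic tuning curve are all hypothetical choices made for illustration.

```python
import math

# Hypothetical scale transforms; the labels mirror the caption, the code is illustrative.
SCALES = {
    "linear": lambda n: float(n),
    "pow(1/2)": lambda n: n ** (1 / 2),
    "pow(1/3)": lambda n: n ** (1 / 3),
    "log2": lambda n: math.log2(n),
}


def gaussian(x, center, sigma):
    """Unit-amplitude Gaussian evaluated at x."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def fit_sigma(xs, responses, center):
    """Grid-search the Gaussian width (0.05 to 20.0); return (best_sigma, best_r2)."""
    best_sigma, best_r2 = None, -math.inf
    for step in range(1, 401):
        sigma = step * 0.05
        predicted = [gaussian(x, center, sigma) for x in xs]
        r2 = r_squared(responses, predicted)
        if r2 > best_r2:
            best_sigma, best_r2 = sigma, r2
    return best_sigma, best_r2


def fit_on_all_scales(numerosities, responses, preferred):
    """Fit the same tuning curve after transforming the numerosity axis."""
    results = {}
    for name, transform in SCALES.items():
        xs = [transform(n) for n in numerosities]
        results[name] = fit_sigma(xs, responses, transform(preferred))
    return results


# Synthetic tuning curve (an assumption, not data from the paper):
# Gaussian on a log2 axis, preferred numerosity 4, width 0.8.
numerosities = list(range(1, 31))
preferred = 4
responses = [gaussian(math.log2(n), math.log2(preferred), 0.8) for n in numerosities]

fits = fit_on_all_scales(numerosities, responses, preferred)
for name, (sigma, r2) in fits.items():
    print(f"{name:>9}: sigma={sigma:.2f}  r2={r2:.3f}")
```

Because the synthetic curve is symmetric only on the logarithmic axis, the symmetric Gaussian fits it better there than on the linear axis, which is the kind of difference a goodness-of-fit comparison across scales is meant to detect.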

Tables

  • Table 1 Description of the layers in the HCNN.
    Role                Layer  Type                Number of feature maps  Spatial size  Kernel size
    Feature extraction  0      Input image         3                       224 × 224
                        1      Convolutional       32                      224 × 224     9 × 9
                        2      Max-pooling         32                      224 × 224     2 × 2
                        3      Convolutional       48                      112 × 112     9 × 9
                        4      Max-pooling         48                      112 × 112     2 × 2
                        5      Convolutional       96                      56 × 56       7 × 7
                        6      Max-pooling         96                      56 × 56       2 × 2
                        7      Convolutional       192                     28 × 28       5 × 5
                        8      Max-pooling         192                     28 × 28       2 × 2
                        9      Convolutional       384                     14 × 14       5 × 5
                        10     Max-pooling         384                     14 × 14       2 × 2
                        11     Convolutional       768                     7 × 7         5 × 5
                        12     Convolutional       768                     7 × 7         5 × 5
                        13     Convolutional       768                     7 × 7         5 × 5
    Classification      14     Average-pooling     768                     1 × 1         7 × 7
                        15     Softmax classifier  1000                    1 × 1         1 × 1
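
The spatial sizes in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that each convolution preserves spatial size (stride 1 with padding, which the table implies but does not state) and that each 2 × 2 max-pooling layer is nonoverlapping and therefore halves the size, as described in the Fig. 1 caption. The layer list is copied from the table; the function name is hypothetical.

```python
# Feature-extraction layers 1 to 13 from Table 1, as (layer type, number of feature maps).
LAYERS = [
    ("conv", 32), ("pool", 32),
    ("conv", 48), ("pool", 48),
    ("conv", 96), ("pool", 96),
    ("conv", 192), ("pool", 192),
    ("conv", 384), ("pool", 384),
    ("conv", 768), ("conv", 768), ("conv", 768),
]


def trace_output_sizes(input_size, layers, pool_size=2):
    """Return the output spatial size after each layer.

    Assumption: convolutions keep the spatial size unchanged, and
    nonoverlapping pooling divides it by pool_size.
    """
    sizes = []
    size = input_size
    for kind, _maps in layers:
        if kind == "pool":
            size //= pool_size
        sizes.append(size)
    return sizes


sizes = trace_output_sizes(224, LAYERS)
for index, ((kind, maps), size) in enumerate(zip(LAYERS, sizes), start=1):
    print(f"layer {index:>2} {kind:<4} maps={maps:<4} output={size} x {size}")
```

Under these assumptions the last convolutional layer outputs 7 × 7 maps, which matches the 7 × 7 kernel of the average-pooling layer (layer 14). The max-pooling rows of the table appear to list the size of the pooling input, whereas this sketch reports the output of each layer.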
