Research Article: Social Sciences

Best reply structure and equilibrium convergence in generic games


Science Advances  20 Feb 2019:
Vol. 5, no. 2, eaat1328
DOI: 10.1126/sciadv.aat1328
  • Fig. 1 Illustration of the best reply structure.

    SR = {1, 2, 3, 4} and SC = {1, 2, 3, 4} are the possible actions of players Row and Column, and each cell in the matrix represents their payoffs (Row is given first). The best response arrows point to the cell corresponding to the best reply. The vertical arrows correspond to player Row, and the horizontal arrows correspond to player Column. The arrows are colored red if they are part of a cycle, orange if they are not part of a cycle but lead to one, blue if they lead directly to a fixed point, and cyan if they lead to a fixed point in more than one step. The best reply matrix in (B) is a Boolean reduction that is constructed to have the same best reply structure as the payoff matrix in (A), but to have only ones and zeros as its entries.
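The Boolean reduction described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; `payoff_row` and `payoff_col` are assumed to be the two players' payoff arrays, with `payoff_row[i, j]` Row's payoff when Row plays i and Column plays j.

```python
import numpy as np

def best_reply_matrix(payoff_row, payoff_col):
    """Boolean reduction: 1 where an action is a best reply, 0 elsewhere.

    Ties are broken toward the lowest index; for generic (e.g., normally
    distributed) payoffs, ties occur with probability zero.
    """
    br_row = np.zeros_like(payoff_row)
    br_col = np.zeros_like(payoff_col)
    # Row best-replies within each column (vertical arrows in Fig. 1).
    br_row[payoff_row.argmax(axis=0), np.arange(payoff_row.shape[1])] = 1
    # Column best-replies within each row (horizontal arrows in Fig. 1).
    br_col[np.arange(payoff_col.shape[0]), payoff_col.argmax(axis=1)] = 1
    return br_row, br_col
```

By construction, the pair (br_row, br_col) has the same best reply structure as the original game, so the arrows of Fig. 1A and Fig. 1B coincide.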

  • Fig. 2 How the best reply structure influences convergence of six learning algorithms.

    For each payoff matrix (A) to (D), we show the dynamics of two learning algorithms. We plot the probabilities x1 and x2 for Row to play actions sR = 1 and sR = 2, respectively, and the probability y1 for Column to play sC = 1. The black dots are the positions of the NE: For payoff matrices (A) to (C), the dot at x1 = x2 = y1 = 0 is the pure strategy NE; all other dots are mixed strategy equilibria. The trajectories are colored in shades of blue if they converge to a pure equilibrium, in shades of green if they converge to a mixed equilibrium, and in shades of red if they do not converge. NE, Nash equilibrium.
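As one concrete example of such a learning algorithm, fictitious play (see also fig. S2) has each player best-reply to the empirical frequency of the opponent's past actions. The following is a minimal sketch under that standard definition, not the paper's exact simulation protocol; initial actions and tie-breaking are arbitrary choices here.

```python
import numpy as np

def fictitious_play(payoff_row, payoff_col, steps=5000):
    """Each player best-replies to the empirical mix of the opponent's
    past actions; returns the final empirical mixed strategies (x, y)."""
    n, m = payoff_row.shape
    count_row, count_col = np.zeros(n), np.zeros(m)
    i, j = 0, 0  # arbitrary initial actions
    for _ in range(steps):
        count_row[i] += 1
        count_col[j] += 1
        i = int(np.argmax(payoff_row @ count_col))  # Row vs. Column's history
        j = int(np.argmax(count_row @ payoff_col))  # Column vs. Row's history
    return count_row / steps, count_col / steps
```

In matching pennies, for instance, the realized actions cycle forever while the empirical frequencies slowly approach the mixed NE at (1/2, 1/2), illustrating the distinction between convergence of play and convergence of beliefs.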

  • Fig. 3 Test for how well the best reply structure predicts nonconvergence for six learning algorithms.

    We generate 1000 random payoff matrices with N = 20 actions, and for each of these, we simulate learning 100 times based on random initial conditions. The same process is repeated using the best reply matrix associated with each of the 1000 random games. Each circle corresponds to a specific best reply vector υ. Its size is the logarithm of the number of times a payoff matrix with υ was sampled. The horizontal axis is the share of best reply cycles F(υ). For example, the largest circle at F(υ) = 0.66 corresponds to υ = (0, …, 0, 1, 1). The vertical axis gives the frequency of nonconvergence in the simulations, averaged over all payoff matrices and initial conditions having the same υ. The top row shows results for the best reply matrices, and the bottom row shows results using normally distributed payoffs. The identity line is plotted for reference.
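The caption's example (F(υ) = 0.66 for υ = (0, …, 0, 1, 1), i.e., one 2-cycle and one fixed point) is consistent with reading F(υ) as the share of attractor weight on cycles, with a k-cycle weighted by k and a fixed point by 1, giving 2/(2 + 1) ≈ 0.66. The sketch below encodes that assumed reading; the paper's formal definition should be checked against it.

```python
def share_of_cycles(v):
    """Assumed reading of F(v) for a best reply vector v.

    v lists counts of attractors ordered from longest cycles to fixed
    points, so v = (n_K, ..., n_2, n_fp): the next-to-last entry counts
    2-cycles and the last entry counts fixed points. Each k-cycle is
    assumed to contribute weight k, each fixed point weight 1.
    """
    *cycles, n_fixed = v
    K = len(cycles) + 1  # cycle length of the first (longest) entry
    cycle_weight = sum(n * (K - idx) for idx, n in enumerate(cycles))
    return cycle_weight / (cycle_weight + n_fixed)
```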

  • Fig. 4 How the share of best reply cycles predicts convergence as a function of the number of actions N and the competition parameter Γ.

    Dashed lines are the average share of best reply cycles 〈F(υ)〉N for those values of N and Γ. Markers are the fraction of simulation runs in which the learning algorithms do not converge.
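A plausible way to generate such a game ensemble, assuming Γ is the correlation between the two players' payoffs within each cell (so Γ near -1 is strongly competitive, zero-sum-like, and Γ = 0 is uncorrelated), is to draw each cell's payoff pair from a bivariate normal. This is an assumed construction for illustration; the paper's exact sampling scheme may differ in detail.

```python
import numpy as np

def random_game(N, gamma, seed=None):
    """Draw an N x N random game: the two payoffs in each cell are
    standard normals with correlation gamma (assumed meaning of the
    competition parameter)."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, gamma], [gamma, 1.0]]
    draws = rng.multivariate_normal([0.0, 0.0], cov, size=(N, N))
    return draws[..., 0], draws[..., 1]  # Row's and Column's payoff matrices
```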

  • Fig. 5 Comparison of analytical predictions about best reply cycles to numerical simulations when Γ = 0.

    Markers are numerical results, and solid lines are analytical results. Red circles depict the frequency of randomly generated payoff matrices with no fixed points (F(υ) = 1), and blue triangles show the frequency with at least one cycle (F(υ) > 0). The text in the figure refers to the area delimited by solid lines, e.g., “Cycles + fixed points” means that the fraction of payoff matrices with both cycles and fixed points is the distance between the red and blue lines. Last, green squares represent the average share of best reply cycles 〈F(υ)〉N; this is discontinued at N = 50 due to excessive computational cost (see section S5.2).

  • Table 1 Terminology.

    NE, Nash equilibrium.

    Best reply: Action that gives the best payoff in response to a given action by an opponent
    Best reply structure: Arrangement of the best replies in the payoff matrix
    Best reply matrix: Derived payoff matrix, with a one for the best reply to each possible move of the opponent and a zero everywhere else
    Best reply dynamics: Simple learning algorithm in which the players myopically choose the best reply to the last action of their opponent
    Best reply k-cycle: Closed loop of best replies of length k (each player moves k times)
    Best reply fixed point: Pure NE, i.e., the action for each player that is a best reply to the move of the other player
    Best reply vector υ: List of the number of distinct attractors of the best reply dynamics, ordered from longest cycles to fixed points
    Free action/free best reply: Best reply to an action that is neither part of a cycle nor a fixed point
    Best reply configuration: Unique set of best replies by both players to all actions of their opponent
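The best reply dynamics in the table can be sketched directly: starting from some cell, players alternately switch to their myopic best reply, and since the state space is finite the trajectory must end in a fixed point (pure NE) or a k-cycle. An illustrative sketch, not the authors' implementation:

```python
import numpy as np

def best_reply_dynamics(payoff_row, payoff_col, start=(0, 0), max_steps=1000):
    """Iterate myopic best replies from `start`.

    Returns ("fixed point", cell) for a pure NE, or ("cycle", k) where k
    counts how many times each player moves per cycle. One loop step is
    one joint move (Row replies to Column, then Column replies to Row);
    any trajectory terminates within the number of cells, so max_steps
    is only a safety bound.
    """
    seen = {}
    i, j = start
    for step in range(max_steps):
        if (i, j) in seen:
            return "cycle", step - seen[(i, j)]
        seen[(i, j)] = step
        ni = int(np.argmax(payoff_row[:, j]))   # Row's best reply to j
        nj = int(np.argmax(payoff_col[ni, :]))  # Column's best reply to ni
        if (ni, nj) == (i, j):
            return "fixed point", (i, j)
        i, j = ni, nj
```

For example, matching pennies yields a 2-cycle, while a pure coordination game reaches a fixed point immediately.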

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/5/2/eaat1328/DC1

    Section S1. Details of the simulation protocol

    Section S2. Best reply structure and mixed strategy NE

    Section S3. Further evidence on the predictive power of the best reply structure

    Section S4. Ensemble of games constrained by a potential function

    Section S5. Analytical calculations on the best reply structure with uncorrelated payoffs

    Fig. S1. Instances of simulation runs of the Bush-Mosteller reinforcement learning algorithm with N = 20.

    Fig. S2. Instances of simulation runs of fictitious play with N = 20.

    Fig. S3. Effect of negative payoff correlation on fictitious play for some values of N and Γ.

    Fig. S4. Instances of simulation runs of replicator dynamics with N = 20.

    Fig. S5. Instances of simulation runs of EWA with N = 20.

    Fig. S6. Instances of simulation runs of EWA and EWA with noise with N = 20.

    Fig. S7. Mixed strategy NE classified in relation to the best reply structure.

    Fig. S8. Test for how well the best reply structure predicts nonconvergence with N = 5, instead of N = 20 as in the main paper.

    Fig. S9. Test for how well the best reply structure predicts nonconvergence with N = 50, instead of N = 20 as in the main paper.

    Fig. S10. Frequency of convergence to mixed strategy NE for N = 20.

    Fig. S11. Correlation matrix of the co-occurrence of nonconvergence for the six learning algorithms.

    Fig. S12. How the share of best reply cycles predicts convergence as perfect potential games are approached.

    Fig. S13. Exhaustive account of best reply configurations with the same best reply vector.

    Fig. S14. Payoff matrix with N = 11 that is used to illustrate the calculation of the frequency of the best reply vectors.

    Fig. S15. Frequency of k-cycles, ρ(N, k), as a function of the number of actions N.

    References (55–62)
