I do not know the mathematics of evolutionary games, and Replicator Dynamics is just a name to me. But I do know how to run computer simulations, and the Agent-Based Model (ABM) is my language.
Suppose we have an $n \times n$ grid. Generate a population of agents by multiplying the number of cells by the population density. At each step, agents carry an action, move across the grid, find another agent in their Von Neumann neighbourhood, play a round of a classical game, then update their action and move to the next step. All agents update their action the same way. The above defines the basic elements of the model.
Now consider one key variable: what is the basis on which agents update their action?
Model 1: each agent compares this step's payoff $P_1$ with its neighbour's; in the next step, it adopts whichever action earned the higher $P_1$.
Model 2: each agent compares the cumulative payoff $P_2$ from all historical games with its neighbour's; in the next step, it adopts whichever action earned the higher $P_2$.
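The setup and the two update rules can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not a definitive implementation: the payoff table `PAYOFF`, the wrap-around grid, the one-cell random walk, the single randomly chosen partner per step, and the names `make_agents`, `neighbours`, and `step` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical payoff table, PAYOFF[my_action][their_action]; not from the text.
PAYOFF = {"A": {"A": 0.0, "B": 3.0}, "B": {"A": 1.0, "B": 2.0}}
OFFSETS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # Von Neumann neighbourhood


def make_agents(n, density, rng):
    """Place int(n * n * density) agents on distinct cells with random actions."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return [{"pos": pos, "action": rng.choice("AB"), "p1": 0.0, "p2": 0.0}
            for pos in rng.sample(cells, int(n * n * density))]


def neighbours(agent, occupied, n):
    """Agents on the four adjacent cells (the grid wraps around, an assumption)."""
    r, c = agent["pos"]
    cells = [((r + dr) % n, (c + dc) % n) for dr, dc in OFFSETS]
    return [occupied[cell] for cell in cells if cell in occupied]


def step(agents, n, rule, rng):
    """One tick: move, play one neighbour, then imitate by rule 'p1' or 'p2'."""
    occupied = {a["pos"]: a for a in agents}
    for a in agents:  # random walk into an adjacent cell if it is free
        dr, dc = rng.choice(OFFSETS)
        target = ((a["pos"][0] + dr) % n, (a["pos"][1] + dc) % n)
        if target not in occupied:
            del occupied[a["pos"]]
            a["pos"] = target
            occupied[target] = a
    partners = []
    for a in agents:  # play: record this step's payoff and the running sum
        near = neighbours(a, occupied, n)
        other = rng.choice(near) if near else None
        partners.append(other)
        a["p1"] = PAYOFF[a["action"]][other["action"]] if other is not None else 0.0
        a["p2"] += a["p1"]
    # update: copy the partner's action if the partner scores higher on `rule`
    new_actions = [other["action"]
                   if other is not None and other[rule] > a[rule]
                   else a["action"]
                   for a, other in zip(agents, partners)]
    for a, action in zip(agents, new_actions):
        a["action"] = action


rng = random.Random(0)
agents = make_agents(10, 0.5, rng)
for _ in range(20):
    step(agents, 10, "p1", rng)  # pass "p2" for the cumulative rule
share_a = sum(a["action"] == "A" for a in agents) / len(agents)
```

Switching between the two models is a one-word change (`"p1"` versus `"p2"`), which is the point of the comparison that follows.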
At the micro level, the only difference is "look at the present" versus "look at history." But when I tried to describe these two models mathematically, I found that they correspond to two completely different physical dynamics: a first-order system versus a second-order system, velocity versus acceleration.
Below I work through the derivation step by step.
From Code to Equation: The Mean-Field Approximation
In a computer simulation, there is an $n \times n$ grid and agents walk around looking for neighbours. Mathematicians make a "lazy" but extremely effective assumption called the mean-field approximation: assume the grid is infinitely large, and that everyone is mixed as thoroughly as gas molecules, meeting each other at random.
What does this mean? Suppose at the current moment a proportion $x$ of the population uses strategy $A$ and a proportion $1-x$ uses strategy $B$. In a tiny time step $\Delta t$, if we randomly grab an agent, the probability that they use strategy $B$ is $1-x$; the probability that they happen to bump into an $A$-using neighbour is $x$. So the joint probability of the event "$B$ meets $A$" is $x(1-x)$.
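Once $B$ meets $A$, will it switch to $A$? In our rule, the comparison is based on payoffs. Assume the probability of switching is proportional to the payoff difference: if $\pi_A > \pi_B$, the probability that $B$ becomes $A$ is $P(B \to A) = \alpha (\pi_A - \pi_B)$, where $\alpha$ is a constant scaling factor small enough that this stays below 1. If instead $\pi_B > \pi_A$, the flow reverses: an $A$ that meets a $B$ switches with probability $\alpha (\pi_B - \pi_A)$, so the single signed expression $\alpha (\pi_A - \pi_B)$ covers both directions.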
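A quick Monte Carlo check of this bookkeeping, as a sketch under stated assumptions: the payoff gap $\pi_A - \pi_B = 1 - 2x$ is a hypothetical example game, the reverse switch ($A$ to $B$ when $B$ pays more) is assumed to mirror the rule above, and the function names are my own. It compares the sampled net flow into $A$ with the product $\alpha x(1-x)(\pi_A - \pi_B)$.

```python
import random


def payoff_gap(x):
    """pi_A(x) - pi_B(x) for a hypothetical game whose gap is 1 - 2x."""
    return 1.0 - 2.0 * x


def mean_field_drift(x, alpha):
    """Expected net flow into A per encounter under the mean-field formula."""
    return alpha * x * (1.0 - x) * payoff_gap(x)


def sampled_drift(x, alpha, trials, rng):
    """Estimate the same quantity by drawing random encounters."""
    gap = payoff_gap(x)
    net = 0
    for _ in range(trials):
        focal_is_a = rng.random() < x     # the agent we grabbed
        partner_is_a = rng.random() < x   # whoever it bumps into
        if focal_is_a == partner_is_a:
            continue  # same strategy on both sides: nothing to imitate
        if not focal_is_a and gap > 0 and rng.random() < alpha * gap:
            net += 1  # B met A and switched to A
        elif focal_is_a and gap < 0 and rng.random() < alpha * -gap:
            net -= 1  # A met B and switched to B (assumed mirror rule)
    return net / trials


rng = random.Random(1)
estimate = sampled_drift(0.3, 0.5, 200_000, rng)
exact = mean_field_drift(0.3, 0.5)
```

With enough trials, `estimate` should sit close to `exact`; the remaining gap is sampling noise.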
That is the bridge from micro to macro. From here, the two models diverge.
Model 1: Looking at the Current Payoff (Replicator Dynamics)
Derivation
Let $x(t)$ be the proportion of agents using strategy $A$ at time $t$ across the grid, and $1-x(t)$ the proportion using $B$. The expected per-step payoff for strategy $A$ is $\pi_A(x)$, and for strategy $B$ is $\pi_B(x)$.
After a tiny time step $\Delta t$, how much does the proportion of strategy $A$ change?
The increment of strategy $A$ = "people who were originally $B$" $\times$ "they met $A$" $\times$ "probability they decide to become $A$":
$$\Delta x = (1-x) \cdot x \cdot \alpha(\pi_A - \pi_B) \cdot \Delta t$$
Divide both sides by $\Delta t$:
$$\frac{\Delta x}{\Delta t} = \alpha x(1-x)(\pi_A - \pi_B)$$
As $\Delta t \to 0$, $\frac{\Delta x}{\Delta t}$ becomes the derivative $\frac{dx}{dt}$. Setting $\alpha = 1$ (absorbed into the time scale) yields:
$$\frac{dx}{dt} = x(1-x)[\pi_A(x) - \pi_B(x)]$$
This is the famous standard replicator equation in evolutionary game theory.
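To see what this equation does, it can be stepped forward numerically. The sketch below uses a plain forward-Euler step and a hypothetical payoff gap $\pi_A - \pi_B = 1 - 2x$ (my assumption, chosen so that the interior rest point is $x = 0.5$); `replicator_path` is an illustrative name.

```python
def payoff_gap(x):
    """pi_A(x) - pi_B(x) for a hypothetical game whose gap is 1 - 2x."""
    return 1.0 - 2.0 * x


def replicator_path(x0, dt, steps):
    """Forward-Euler integration of dx/dt = x(1-x)(pi_A - pi_B)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * x * (1.0 - x) * payoff_gap(x))
    return xs


path = replicator_path(0.1, 0.01, 5000)  # 50 time units starting from x = 0.1
```

Under these assumptions the share of $A$ climbs from 0.1 toward 0.5 and settles there without overshooting.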
Characteristics
In plain language: velocity = encounter probability $\times$ payoff-driven force.
This is a first-order ordinary differential equation. $\frac{dx}{dt}$ is the "velocity" of population evolution, directly determined by the current payoff difference. If $A$ earns more this step, the share of $A$ starts growing immediately.
The system has the Markov property (memoryless): the next step's state depends only on the current step's payoff state, with no dependence on the past.
A physical analogy: motion in thick honey. As soon as the push vanishes (the payoff difference is zero), motion stops, and the system sits at a rest point, which for an interior mix of the two strategies is a Nash equilibrium.
Model 2: Looking at the Cumulative Payoff (Inertial Dynamics)
Derivation
In Model 2, agents no longer compare the current per-step payoff $\pi$, but the cumulative payoff across all historical games.
In discrete code, this is implemented as a running sum. In continuous calculus, "accumulate over time" is a definite integral:
$$U_A(t) = \int_0^t \pi_A(x(\tau))\, d\tau, \quad U_B(t) = \int_0^t \pi_B(x(\tau))\, d\tau$$
The strategy update logic is unchanged; only the comparison targets are swapped for $U_A$ and $U_B$:
$$\frac{dx}{dt} = x(1-x)[U_A(t) - U_B(t)]$$
It looks as though we have simply replaced $\pi$ with $U$, but the mathematical properties have fundamentally changed. To see this clearly, rearrange the equation.
Divide both sides by $x(1-x)$, which is valid while $0 < x < 1$:
$$\frac{1}{x(1-x)} \frac{dx}{dt} = \int_0^t \pi_A(x(\tau))\, d\tau - \int_0^t \pi_B(x(\tau))\, d\tau$$
Differentiate both sides of the equation with respect to time $t$.
By the fundamental theorem of calculus (the derivative of a definite integral with a variable upper limit is the integrand itself), the integral signs on the right are "stripped away," revealing the current-step payoff difference $\pi_A - \pi_B$:
$$\frac{d}{dt} \left( \frac{1}{x(1-x)} \frac{dx}{dt} \right) = \pi_A(x) - \pi_B(x)$$
Characteristics
The left side is the derivative of a term that already contains the velocity $\frac{dx}{dt}$, and the derivative of velocity is acceleration.
The current payoff difference $\pi_A - \pi_B$ no longer determines the velocity; it determines the acceleration. What started as an integro-differential equation has become a second-order differential equation with strong path dependence (memory and inertia), known in the literature as inertial dynamics.
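As a worked check on the "acceleration" reading (my own expansion, not part of the derivation above), apply the quotient rule to the left side of the Model 2 equation, using $\frac{d}{dt}[x(1-x)] = (1-2x)\frac{dx}{dt}$:

$$\frac{d}{dt} \left( \frac{1}{x(1-x)} \frac{dx}{dt} \right) = \frac{1}{x(1-x)} \frac{d^2 x}{dt^2} - \frac{1-2x}{[x(1-x)]^2} \left( \frac{dx}{dt} \right)^2$$

Setting this equal to $\pi_A - \pi_B$ and multiplying through by $x(1-x)$ isolates the second derivative:

$$\frac{d^2 x}{dt^2} = x(1-x)[\pi_A(x) - \pi_B(x)] + \frac{1-2x}{x(1-x)} \left( \frac{dx}{dt} \right)^2$$

The picture is cleaner in the log-odds variable $z = \ln \frac{x}{1-x}$ (a substitution introduced here only for convenience). Since $\frac{dz}{dt} = \frac{1}{x(1-x)} \frac{dx}{dt}$, the Model 2 equation reads

$$\frac{d^2 z}{dt^2} = \pi_A(x) - \pi_B(x)$$

so the payoff difference is exactly the second time derivative of the log-odds.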
A physical analogy: Newton's second law $F = ma$. The current payoff difference is the "force"; force sets the acceleration, not the velocity. The system has inertia.
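The inertia can be seen numerically by integrating the pair $\frac{dU}{dt} = \pi_A - \pi_B$, $\frac{dx}{dt} = x(1-x)U$, where $U = U_A - U_B$. This is a sketch under assumptions of mine: the hypothetical payoff gap $1 - 2x$, no accumulated history at $t = 0$, and a semi-implicit Euler step that updates $U$ first and then $x$. The counter `against` records the steps in which $A$ is still growing although $B$ currently pays more.

```python
def payoff_gap(x):
    """pi_A(x) - pi_B(x) for a hypothetical game whose gap is 1 - 2x."""
    return 1.0 - 2.0 * x


def inertial_path(x0, dt, steps):
    """Integrate dU/dt = pi_A - pi_B and dx/dt = x(1-x)U, with U = U_A - U_B."""
    x, u = x0, 0.0            # U starts at zero: no history at t = 0
    xs, against = [x0], 0
    for _ in range(steps):
        u += dt * payoff_gap(x)        # accumulate the payoff gap
        dx = dt * x * (1.0 - x) * u    # velocity is driven by the running sum
        if dx > 0 and payoff_gap(x) < 0:
            against += 1               # A still growing although B pays more now
        x += dx
        xs.append(x)
    return xs, against


path, against = inertial_path(0.1, 0.01, 5000)
# Number of times the trajectory passes through the rest point x = 0.5.
crossings = sum((a - 0.5) * (b - 0.5) < 0 for a, b in zip(path, path[1:]))
```

In this example the trajectory does not settle at 0.5: it overshoots, turns around only after the accumulated advantage has been paid back, and keeps crossing the rest point.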
How the Two Dynamics Behave in Simulation
Write both models into code and run them, and they produce strikingly different phenomena.
In Model 1, if the game's Nash equilibrium point shifts, agents pivot with great agility and immediately move toward the new optimum, with the curve gliding smoothly toward a stable proportion. In Model 2, "turning around becomes difficult" β even when the current $A$ is no longer as profitable as $B$ ($\pi_A < \pi_B$), agents are still blindly turning into $A$ because $A$ has accumulated a huge historical wealth total ($U_A > U_B$). Only when $B$'s current advantage has lasted long enough to erase the historical deficit does the population start to switch. The curve shows severe oscillations and delays.
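The contrast can be reproduced in the mean-field equations with a small experiment. This is a sketch under assumptions of mine: a hypothetical payoff gap $\pi_A - \pi_B = 2(x^* - x)$ whose rest point $x^*$ jumps from 0.5 to 0.3 partway through the run, and the illustrative names `gap` and `run`.

```python
def gap(x, x_star):
    """Hypothetical payoff gap pi_A - pi_B that vanishes at x = x_star."""
    return 2.0 * (x_star - x)


def run(model, x0, shift_at, steps, dt=0.01):
    """Integrate Model 1 ('current') or Model 2 ('cumulative') across a shift."""
    x, u, path = x0, 0.0, [x0]
    for k in range(steps):
        x_star = 0.5 if k < shift_at else 0.3   # the equilibrium moves once
        g = gap(x, x_star)
        if model == "current":
            x += dt * x * (1.0 - x) * g          # velocity follows the gap
        else:
            u += dt * g                          # running sum of payoff gaps
            x += dt * x * (1.0 - x) * u          # velocity follows the sum
        path.append(x)
    return path


m1 = run("current", 0.5, 1000, 6000)
m2 = run("cumulative", 0.5, 1000, 6000)
# How often each trajectory passes through the new rest point x = 0.3.
m2_crossings = sum((a - 0.3) * (b - 0.3) < 0 for a, b in zip(m2, m2[1:]))
```

Under these assumptions `m1` slides from 0.5 down to 0.3 and stays there, while `m2` swings well below 0.3 and continues to oscillate around it.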
In Model 2, if a fraction of agents gets extremely lucky with a particular strategy in the first few rounds, amassing a very high historical score, that strategy can become permanently locked in across the grid, even though it is not the current global optimum.
Summary
Model 1 is the mathematics of "going with the times," and Model 2 is the mathematics of "historical depth." I find this correspondence both surprising and natural.
In Model 1, payoff differences directly determine the velocity of strategy evolution (replicator dynamics).
In Model 2, payoff differences accumulate over time and determine the acceleration of strategy evolution, leading to an inertial (second-order) dynamic system.
At the micro level, the change is a tiny one: comparing the current step's payoff versus the cumulative payoff. But at the macro level, the mathematics jumps from first order to second order, from memoryless to inertial. Naturally, once the logic of this jump is understood, it all becomes inevitable: cumulative payoff is the integral of the current payoff. Put the integral into the equation, then differentiate with respect to time to eliminate the integral, and an extra order of differentiation emerges.
ABM is evolutionary game theory from the micro perspective: local interactions, probabilistic imitation, bounded rationality. The mathematical equation, by contrast, is the macro approximation that assumes an infinitely large grid, infinitely many agents, and uniform mixing.
They are two sides of the same coin, and the mean-field approximation is the flip of that coin.
ABM captures spatial effects and stochastic fluctuations; the mathematical equation yields clear qualitative judgments. For instance, the conclusion that "looking at historical scores causes system oscillations" can be predicted directly from the mathematical properties of the second-order equation.
But conversely, the mathematical equation's predictions also have limits. The mean-field approximation ignores spatial aggregation effects: when agents tend to cluster with their own kind, the actual encounter probabilities deviate from $x(1-x)$, and the macro equation's predictions break down. In that case, ABM is the more honest tool.
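The size of that deviation is easy to measure on a grid. The sketch below is an illustration under my own assumptions: a fully occupied, wrap-around $20 \times 20$ grid with half the cells playing $A$, compared in a clustered layout (left half $A$, right half $B$) and a shuffled one. The function `b_meets_a_rate` is a hypothetical helper that counts how often a focal $B$ has an $A$ in a given Von Neumann direction.

```python
import random

OFFSETS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # Von Neumann neighbourhood


def b_meets_a_rate(grid):
    """Fraction of (cell, direction) pairs where a B cell faces an A cell."""
    n = len(grid)
    hits = 0
    for r in range(n):
        for c in range(n):
            if grid[r][c] != "B":
                continue
            for dr, dc in OFFSETS:
                if grid[(r + dr) % n][(c + dc) % n] == "A":
                    hits += 1
    return hits / (4 * n * n)


n = 20
# Clustered: left half A, right half B; B faces A only along the two seams.
clustered = [["A"] * (n // 2) + ["B"] * (n // 2) for _ in range(n)]
# Mixed: the same number of A and B cells, shuffled across the grid.
cells = ["A"] * (n * n // 2) + ["B"] * (n * n // 2)
random.Random(2).shuffle(cells)
mixed = [cells[r * n:(r + 1) * n] for r in range(n)]

x = 0.5
mean_field = x * (1 - x)                   # 0.25
clustered_rate = b_meets_a_rate(clustered)  # 40 of 1600 pairs
mixed_rate = b_meets_a_rate(mixed)
```

In the clustered layout only the two boundary columns of $B$ ever face an $A$, so the encounter rate is a tenth of the mean-field value, while the shuffled layout stays close to $x(1-x)$.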
The "inertia" of evolutionary dynamics makes me think of a broader question: many social phenomena, such as technology lock-in, institutional path dependence, and cultural inertia, could all be understood through a similar second-order dynamic. When individual decisions are based on accumulated historical information rather than current signals, the system naturally exhibits hysteresis and oscillation. This may not be coincidence, but some deeper thermodynamic or statistical-physics principle at work.