The Rhythm of the Game

I have written two ABMs (agent-based models) before. On a grid, agents are paired at random, play one round of a game, then update their action. The only variable is "what you look at": the payoff of this step, or the cumulative payoff across all historical games. I cannot derive the differential equations myself (the mean-field approximation was copied from the literature), but I can still follow the order of the ODEs: one is first-order, the other is second-order. Velocity versus acceleration, memoryless versus inertial. At the micro level, the two setups are a hair's breadth apart.

But those two models share a hidden premise: the payoff matrix is cast in iron. The Prisoner's Dilemma is always the Prisoner's Dilemma. The Hawk-Dove game is always the Hawk-Dove game.

Grass does not feel pain.

Weitz et al. (2016) brought the grass to life: strategies alter the environment, the environment rewrites the payoff structure, and the payoff structure reshapes the strategies in turn. Once a closed loop forms, the system begins to breathe.

What I want to do is simpler: I am not opening a continuous feedback channel from the environment, just giving the grid a rhythm.

Giving the Grid a Rhythm

Add a resource state variable to the original ABM, with an initial value of $A$. At each step, agents play games on the grid, consuming $1$ unit of resource. The resource drops from $A$ all the way to $0$, and after a fixed number of steps, resets back to $A$.

Let the resource stock $a \in \{0, 1, 2, 3, 4\}$ and set $b=1$, where $b$ is the amount subtracted in the mutual-defection payoff $(a-b)/2$. The row player's payoff matrix is:

$$ \begin{array}{c|cc} & C & D \\ \hline C & a/2 & 0 \\ D & a & (a-1)/2 \end{array} $$

Starting from the top, $a=4 \implies \begin{pmatrix} 2 & 0 \\ 4 & 1.5 \end{pmatrix}$

A textbook Prisoner's Dilemma. $D$ strictly dominates $C$: no matter what the opponent chooses, defection earns more than cooperation. Replicator dynamics tells you the rest: close the door, release the defectors, and the whole grid falls.

$a=2 \implies \begin{pmatrix} 1 & 0 \\ 2 & 0.5 \end{pmatrix}$

Still a Prisoner's Dilemma. But the gap between cooperation and defection is narrowing.

$a=1 \implies \begin{pmatrix} 0.5 & 0 \\ 1 & 0 \end{pmatrix}$

The threshold. When the opponent defects, choosing cooperation or defection is a tie. The structure of the Prisoner's Dilemma begins to loosen.

$a=0 \implies \begin{pmatrix} 0 & 0 \\ 0 & -0.5 \end{pmatrix}$

The table flips. Mutual defection drops into negative territory, and the only move that never loses is cooperation. This is not the standard Hawk-Dove matrix, but it shares part of the same DNA: defection is no longer dominant.

One table, five ways of setting it. As $a$ slides from $4$ to $0$, the underlying game shifts all the way from a Prisoner's Dilemma to what I will loosely call the Hawk-Dove regime.
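The slide can be checked mechanically. Below is a minimal sketch, assuming the matrix above with $b=1$; the function names `payoff_matrix` and `classify` are my own, and the labels only compare $D$ against $C$ for each opponent move, nothing more.

```python
from fractions import Fraction

def payoff_matrix(a, b=1):
    """Row player's payoffs, rows and columns ordered (C, D), at resource stock a."""
    a = Fraction(a)
    return [[a / 2, Fraction(0)],
            [a, (a - b) / 2]]

def classify(a, b=1):
    """Label the game by D's edge over C against each possible opponent move."""
    (cc, cd), (dc, dd) = payoff_matrix(a, b)
    vs_c = dc - cc  # D's edge over C when the opponent cooperates
    vs_d = dd - cd  # D's edge over C when the opponent defects
    if vs_c > 0 and vs_d > 0:
        return "D strictly dominates"
    if vs_c >= 0 and vs_d >= 0:
        return "D weakly dominates"
    if vs_c <= 0 and vs_d <= 0:
        return "C weakly dominates"
    return "no dominant strategy"

for a in range(4, -1, -1):
    print(a, classify(a))
```

Under these definitions, $a=4,3,2$ come out as strict dominance for $D$, $a=1$ as weak dominance for $D$ (the tie against a defector), and $a=0$ as weak dominance for $C$.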

The scarcer the resources, the less foolish cooperation becomes.

Breathing

Now add the reset.

Every fixed number of steps (say, $N=10$) the resource snaps back to $A=4$. So:

Resources abundant $\to$ Prisoner's Dilemma $\to$ defection expands $\to$ resources depleted $\to$ the game loosens $\to$ cooperation rises $\to$ reset triggers $\to$ Prisoner's Dilemma returns.
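A minimal sketch of that rhythm, assuming the stock drops by one unit per step, floors at $0$, and resets every $N$ steps. The function name `resource_at` is hypothetical, and the defaults $A=4$, $N=10$ are just the example values used here.

```python
def resource_at(t, A=4, N=10):
    """Resource stock at step t: minus 1 per step, floored at 0, reset every N steps."""
    return max(A - (t % N), 0)

schedule = [resource_at(t) for t in range(20)]
print(schedule)
# [4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0]
```

Under this reading, a reset interval of $N \le A$ never lets the stock reach $0$ at all (with $N=3$ the sequence is just $4, 3, 2$ repeated), and $N = A+1$ leaves exactly one step in the trough.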

Inhale, exhale. Close, open.

The only difference from Weitz et al. is the form of the feedback: they use continuous differential equations to couple environment and strategy, while I use a discrete resource budget plus periodic reset. But the underlying logic is the same: the structure of the game is no longer a constant, but a function of the resource stock. Strategy bites the resource, the resource bites the matrix, the matrix bites the strategy. A closed loop.

The oscillation is not a bug. It is by design.

What Model 1 Predicts: The Critical Density

The above assumes the entire grid is synchronised: every cell drops $a$ at the same time and resets at the same time. The mean-field approximation runs smoothly on top of that assumption.

But an ABM is not a mean field. Agents move locally across the grid, and pairings happen in neighbourhoods. Different cells, different moments, different agents bumping into different $a$ values.

That adds another layer.

Suppose at some step, a sizeable fraction of cells has slid to $a=0$, the Hawk-Dove region, where the cooperator's current payoff is at least as high as the defector's and $C$ no longer loses out. But for an agent to switch from $D$ to $C$, it needs to actually bump into a $C$ in its neighbourhood, and that $C$'s current score must be higher than its own. Only then is a switch possible.

If the proportion of $C$ in the population is too low (say, only $5\%$ left), most agents cannot see a $C$ in their neighbourhood at all. All they see are other $D$s. In the $a=0$ payoff matrix, a $D$-$D$ encounter scores $-0.5$, admittedly worse than the $C$-$C$ score of $0$. But the agent has no idea what $C$-$C$ would yield, because it has never seen a $C$. Its comparison only happens among "the people it actually sees."

Switching requires a critical density. Below a threshold, the population is locked into $D$, not because the Prisoner's Dilemma is still in effect (the payoff matrix has already changed), but because there are not enough demonstrators.
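To get a rough feel for "not enough demonstrators", here is a back-of-the-envelope sketch. It assumes each of an agent's $k$ neighbours is independently a $C$ with probability $x$, which ignores spatial clustering entirely. The neighbourhood size $k=8$ (a Moore neighbourhood) and the function name are my own choices, not something the model fixes.

```python
def p_sees_cooperator(x, k=8):
    """Chance that at least one of k independently drawn neighbours is a C,
    given a population-wide cooperator fraction x."""
    return 1 - (1 - x) ** k

for x in (0.05, 0.2, 0.5):
    print(x, round(p_sees_cooperator(x), 3))
```

With $x = 0.05$ and $k = 8$ this probability stays well below one half, so most agents see no $C$ at all. Seeing a $C$ is also only a necessary condition: the $C$ must have outscored the observer in that step.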

That brings out the real role of $N$, the number of steps between resets.

$N$ does not only set the phase length of the oscillation. It is the window width that cooperation needs to snowball during the $a$ trough.

If $N$ is short (three or five steps, say), the window is just enough for a few isolated agents to switch to $C$ based on their current payoff. The density is far from the threshold for a chain reaction. Once the reset comes, $a$ jumps back to $4$, the Prisoner's Dilemma table is restored, and the current-step advantage of $D$ swallows the remaining $C$s one by one, until not a single $C$ is left to reproduce.

If $N$ is long (twenty or thirty steps), the window is wide enough. The early-switching $C$s have enough steps to convert their neighbours through local neighbourhoods. Once the snowball starts rolling, it picks up speed quickly. The reset can only flatten part of the snowball; the core remains.

Look at it the other way. Even if $N$ is long enough, if at the reset moment there are still large patches of cells with relatively high $a$ values (the "highlands" of the Prisoner's Dilemma), then the defectors in those regions are still a barrier to $C$'s spread. An agent that has just become a $C$ in a low-$a$ region walks one step into a high-$a$ region and is immediately punished by the Prisoner's Dilemma: the opponent plays $D$, and the agent scores $0$. The neighbours see it plummet to the bottom in one round. Who would copy it?

Spatial pattern here works in concert with the length of $N$: the $a$ troughs need not only to last long enough, but to spread wide enough, so wide that $C$'s diffusion does not just step out of the safe zone and slam straight into $D$'s advantageous battlefield.

So Model 1's prediction has an invisible string attached: $N$ must cross some critical value before the local $C$ density can leap from "isolated" to "self-sustaining." Only after that string is crossed does the sawtooth oscillation actually happen; only then does the flag flutter in the wind. Below it, the cooperator fraction $x$ sits motionless in the trough and is quietly swallowed by the Prisoner's Dilemma that follows.

Where exactly is that string? It depends on population density, initial $C$ proportion, agent movement speed, and the spatial distribution of $a$, none of which can be pinned down from the differential equation on paper. You have to write the code.
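A minimal sketch of what that code could look like for Model 1, not a finished experiment. The simplifications are mine:

- agents sit still on an $L \times L$ torus instead of moving
- the resource stock is global and synchronised
- each agent plays one random von Neumann neighbour per step
- each agent then copies one random neighbour if that neighbour scored strictly more in the current step

All names and default values are hypothetical.

```python
import random

def payoff(me, other, a, b=1):
    """Row player's payoff for moves in {'C', 'D'} at resource stock a."""
    if me == "C":
        return a / 2 if other == "C" else 0.0
    return float(a) if other == "C" else (a - b) / 2

def neighbours(i, j, L):
    """Von Neumann neighbours on an L x L torus."""
    return [((i - 1) % L, j), ((i + 1) % L, j), (i, (j - 1) % L), (i, (j + 1) % L)]

def run(L=20, steps=200, N=10, A=4, x0=0.5, seed=0):
    """Return the cooperator fraction after each step (current payoff only)."""
    rng = random.Random(seed)
    grid = [["C" if rng.random() < x0 else "D" for _ in range(L)] for _ in range(L)]
    trace = []
    for t in range(steps):
        a = max(A - (t % N), 0)  # synchronised stock: minus 1 per step, reset every N
        # 1) every agent plays one round against a random neighbour
        score = [[0.0] * L for _ in range(L)]
        for i in range(L):
            for j in range(L):
                oi, oj = rng.choice(neighbours(i, j, L))
                score[i][j] = payoff(grid[i][j], grid[oi][oj], a)
        # 2) synchronous imitation: copy a random neighbour if it scored strictly more
        new = [row[:] for row in grid]
        for i in range(L):
            for j in range(L):
                ni, nj = rng.choice(neighbours(i, j, L))
                if score[ni][nj] > score[i][j]:
                    new[i][j] = grid[ni][nj]
        grid = new
        trace.append(sum(row.count("C") for row in grid) / (L * L))
    return trace

if __name__ == "__main__":
    for N in (3, 10, 30):
        print(N, run(N=N)[-1])
```

Sweeping `N` and `x0` over many seeds is the natural next step. A single run like the one above says very little on its own.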

What Model 2 Predicts: The Two Faces of Inertia

Model 2 compares cumulative scores. $U_A(t) = \int_0^t \pi_A(\tau)\,d\tau$, where the subscript $A$ labels a strategy ($C$ or $D$), not the initial resource level. It has inertia.

In the earlier analysis, I assumed $N$ was short, short enough that the $a=0$ phase never gave $C$ a chance to overturn the cumulative ledger. In that scenario, the adaptive lag is a pure burden. That is correct.

But if $N$ is long enough, the situation flips.

A long $N$ means the $a=0$ state persists for dozens of steps. (At $a=1$, $D$ is still weakly ahead, so the ledger only starts to turn at $a=0$.) At every $a=0$ step, as long as some $D$s remain, $C$'s expected payoff is higher than $D$'s, and $U_C$ gains on $U_D$. The increments are small, but the steps are many.

Suppose the $a=0$ state persists for $T$ steps, with a fraction $x$ of the population being $C$. $C$'s per-step payoff is $0$. $D$'s expected per-step payoff is $0 \cdot x + (-0.5) \cdot (1-x) = -0.5(1-x)$. Each step, $U_C - U_D$ widens by $0.5(1-x)$.

Over $T$ steps, holding $x$ fixed, $U_C$ gains $0.5(1-x) \cdot T$ on $U_D$. With $T$ large enough, this accumulated gap can fill in the historical lead that $D$ built up during the Prisoner's Dilemma phase, and then overtake it.
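The arithmetic above can be turned around to ask how long the trough has to last. A minimal sketch, assuming $x$ stays frozen during the trough and that $D$ enters it with some cumulative lead; `lead` is a free illustrative input, not a quantity derived from the model.

```python
from fractions import Fraction
import math

def steps_to_overtake(lead, x):
    """Smallest whole number of a=0 steps after which U_C - U_D >= 0,
    given D's prior cumulative lead and a frozen cooperator fraction x."""
    lead, x = Fraction(str(lead)), Fraction(str(x))
    if not 0 <= x < 1:
        raise ValueError("x must be in [0, 1); at x = 1 no D is left to fall behind")
    gap_per_step = Fraction(1, 2) * (1 - x)  # per-step gain of U_C over U_D at a = 0
    return max(0, math.ceil(lead / gap_per_step))

print(steps_to_overtake(4, 0.2))  # lead of 4, gap 0.4 per step
print(steps_to_overtake(5, 0.5))  # lead of 5, gap 0.25 per step
```

The required $T$ grows like $1/(1-x)$: the more cooperators there already are, the fewer $D$-$D$ encounters there are to drag $U_D$ down, and the longer the ledger takes to turn.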

Once that overtake happens, the nature of the game changes.

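Agents start switching en masse from $D$ to $C$. Every additional $C$ shrinks $(1-x)$, so $D$-$D$ encounters become rarer and $D$'s expected payoff $-0.5(1-x)$ creeps back toward $0$. The per-step gap narrows, but it stays positive as long as any $D$ remains, so $U_C - U_D$ keeps growing. $U_C$ itself is unchanged ($C$ always earns $0$ per step); it is $U_D$ that keeps sinking. The positive feedback, if there is one, comes from imitation rather than from payoffs: every switch puts one more $C$ in front of the remaining $D$s.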

When the reset arrives ($a$ jumps back to $4$ and the Prisoner's Dilemma reopens), $C$ has already banked a cumulative lead. In the $a=4$ regime, $D$'s current payoff crushes $C$: $D$ against $C$ scores $4$, $C$ against $D$ scores $0$. Every step chips away at $C$'s cumulative reserves. But if the reserves are thick enough, the chipping is slow.

This is cooperation inertia.

The cumulative score that $C$ has built up during the Hawk-Dove window becomes its shield during the Prisoner's Dilemma window. Agents compare the cumulative tally, and on that tally, $C$ is still in the lead. It takes many steps of $D$'s current-step dominance to overturn the ledger. And during those "many steps," $C$ is still there: the population has not collapsed into pure $D$.

One mechanism, two faces. $N$ is the switch:

Short $N$: the historical lead that $D$ builds during the Prisoner's Dilemma phase $\to$ inertia holds back cooperation's counterattack $\to$ reset returns $\to$ $D$'s lead deepens. This is the "historical sandbag" described earlier: inertia only weighs cooperation down.

Long $N$: the historical lead that $C$ builds during the Hawk-Dove window $\to$ inertia holds back $D$'s counterattack after the reset $\to$ $C$ survives the Prisoner's Dilemma shock. Inertia ends up protecting cooperation.
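The two faces can be put on one ledger. Below is a mean-field sketch that freezes $x$ over a whole cycle (a strong assumption) and adds up the expected per-step change in $U_C - U_D$ while $a$ walks down from $A$ and then sits at $0$ until the reset. The function names are my own.

```python
from fractions import Fraction

def gap_step(a, x, b=1):
    """Expected one-step change in U_C - U_D at resource stock a and C fraction x."""
    pi_c = x * Fraction(a, 2)                    # C meets C: a/2;  C meets D: 0
    pi_d = x * a + (1 - x) * Fraction(a - b, 2)  # D meets C: a;    D meets D: (a-b)/2
    return pi_c - pi_d

def gap_per_cycle(N, x, A=4):
    """Net change in U_C - U_D over one reset cycle of N steps, with x frozen."""
    x = Fraction(str(x))
    return sum(gap_step(max(A - t, 0), x) for t in range(N))

for N in (5, 10, 20, 30):
    print(N, float(gap_per_cycle(N, 0.5)))
```

With $A=4$ and $b=1$, the four Prisoner's Dilemma steps cost $C$ a total of $3 + 2x$ on the ledger, and each trough step pays back $0.5(1-x)$, so under these assumptions the sign flips at $N^* = 4 + (6+4x)/(1-x)$. For $x=0.5$, the per-cycle change is negative at $N=10$, zero at $N=20$ and positive at $N=30$. The real ABM lets $x$ move and adds space, so this is a direction, not a prediction.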

More subtly, there is the spatial dimension. If the $a=0$ region forms a contiguous patch (a "cooperative enclave"), $C$s paired within it hold $U_C$ steady. $D$s that wander in mostly meet $C$s, with their payoffs locked at $0$ (not negative), so $U_D$ does not drop; the enclave's inertia only affects $C$ "from the inside." But if the $a=0$ region is scattered and fragmented, with $C$ and $D$ mixed, $D$-$D$ encounters are frequent, $U_D$ drops faster, and cooperation inertia actually builds more quickly.

(This logic is a bit counterintuitive: a cooperative enclave that is too "safe" leaves $D$ unpunished, which works against cooperation pulling ahead on the cumulative ledger. A fragmented $a$ trough, one that lets $D$s bite each other, is the shortcut to $C$ building inertia.)

Short $N$: defection's inertia flattens cooperation. Model 2 is the least favourable to $C$ among all possibilities.

Long $N$: cooperation's inertia fights the reset. Model 2 may hold $C$ better than Model 1, because Model 1's cooperators have no memory and the reset sweeps them away under $D$, whereas Model 2's cooperators have reserves to fall back on and can hold out until the next $a$ trough.

Translated into Another Language

The previous post translated the two models into "instantaneous selection" and "historical selection." At the time, I said we do not know which one natural selection is really looking at: the single-step fitness, or the cumulative fitness.

In this version with a resource rhythm, the distinction is no longer just "slower." It now has a front and a back.

The weakness of instantaneous selection is no longer "short-sightedness." It is a problem that only shows up in finite populations: the abiotic conditions have changed, but the biotic response falls behind because of a lack of demonstrators. The payoff matrix has already been retuned to Hawk-Dove, but the density is too low, and $D$ is locked in place. This is something the mean-field equation will never see, because the mean field assumes an infinite, well-mixed population, where density is never a variable.

The case of historical selection depends on the timescale of environmental fluctuation, that is, on $N$.

Short $N$ extreme: historical selection is defection's accomplice. The cumulative lead that $D$ builds during the Prisoner's Dilemma phase holds back every attempt to switch. $C$ has not yet had time to settle its old debts in the $a$ trough before the reset arrives. History is not memory; it is a millstone.

Long $N$ extreme: historical selection is cooperation's armour. Over a long enough Hawk-Dove window, $C$ accumulates reserves: not just one or two steps of current payoff, but dozens of cumulative steps. The reset reopens the Prisoner's Dilemma, but $C$'s cumulative lead absorbs $D$'s first wave of counterattack, buying the population a grace period to "think it over."

One mechanism. Two fates. The switch is $N$.

This brings to mind a very concrete question in biology: how long is the environmental fluctuation cycle? A season? A year? A decade? If the resource trough is deep enough and long enough (say, a multi-year drought), then natural selection's cumulative ledger will leave cooperators with a foothold when the rainy season returns. If the trough is just a flash (a cold snap that lasts only a few days), the cumulative ledger instead becomes a drag: cooperators have not yet had time to benefit from the historical score before the environment warms up.

Adaptive lag is not an absolute value. It is the phase difference between the memory window and the environmental rhythm.

Instantaneous selection overfits in environments with sharp fluctuations: a chance $a$ trough tricks it into switching to cooperation, a chance $a$ peak tricks it into switching back.

Historical selection underfits in slowly drifting environments: the old information weighs too heavily, and it stubbornly refuses to follow the new signal.

But if the length of the fluctuation cycle happens to fall inside the memory window, then historical selection is neither underfit nor overfit. It does one thing: filter out high-frequency noise with low-frequency inertia. It is not jerked around by every twitch of $a$, and it does not completely ignore the trend in $a$.

Natural selection may not look at a single step, nor at the entire history. It may be some kind of sliding window: a weighted sum of the most recent $n$ steps. The width of that window is, perhaps, evolution's definition of "the present."
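One way such a window could be written down, as a sketch only: score an agent by a weighted sum of its last $n$ payoffs, with an optional geometric `decay` so that older steps count less. Both the weighting scheme and the function name are my assumptions. $n=1$ recovers Model 1's current-step comparison, and $n$ equal to the full history with `decay=1.0` recovers Model 2's cumulative ledger.

```python
def windowed_score(payoffs, n, decay=1.0):
    """Weighted sum of the most recent n payoffs. The newest payoff has weight 1;
    each step further back is multiplied by `decay` (1.0 gives a plain sum)."""
    recent = payoffs[-n:] if n > 0 else []
    return sum(p * decay ** age for age, p in enumerate(reversed(recent)))

history = [4, 3, 2, 1, 0, 0, 0, 0]                 # illustrative payoff stream
print(windowed_score(history, 1))                  # current step only
print(windowed_score(history, len(history)))       # full cumulative ledger
print(windowed_score(history, 3, decay=0.5))       # short, fading memory
```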

Too narrow a window, and noise is everything. Too wide a window, and the signal drowns in the old ledger.

And on this breathing grid, the width of the window is set by $N$. With $N$ long enough, cooperation's inertia can take hold, not because cooperation is "better," but because this Hawk-Dove window is finally long enough to let history tell a different story.

A Note on Code

All of the above is still reasoning on paper. The mean-field approximation gives the prediction of a first-order ODE, and the second-order ODE gives the direction. But the local pairings, spatial patches, and finite populations in an ABM are the variables that truly determine where the story goes.

Where exactly is the critical density for Model 1? At what $N$ does Model 2's inertia flip from "weighing cooperation down" to "protecting cooperation"? Does the spatial fragmentation of $a$ troughs help cooperation build inertia, or break cooperation's enclaves apart? As the reset step count is pulled from $3$ to $30$, does the level of cooperation rise monotonically with $N$, or does it bend at some point?

Maybe, once the code is finished, all of the above turns out to be wrong.