Swarm Economies: Market-Based Coordination for Autonomous Agent Fleets

Abstract

A fleet of autonomous agents sharing a finite pool of resources - tool slots, API rate limits, GPU quota, a spend budget - faces a coordination problem that the prevailing answer, a central orchestrator, solves badly. We make the case for an internal market precise. We model coordination as a welfare-maximizing allocation of $m$ scarce resources among $n$ agents with private valuations, and establish four results. First, the welfare-maximizing combinatorial allocation is NP-hard and requires centralizing every agent's private valuation, so the planner scales with neither the fleet nor the catalogue (Proposition 1). Second, a competitive-equilibrium allocation at clearing prices is Pareto efficient - the First Welfare Theorem - so the market reaches the efficient frontier with each agent doing only local arithmetic on a price (Theorem 1). Third, the Vickrey-Clarke-Groves mechanism makes truthful bidding a dominant strategy by charging each winner the externality it imposes (Proposition 2). Fourth, for the relevant smooth auction class the price of anarchy is bounded by a constant, so selfish equilibria retain a constant fraction of optimal welfare (Theorem 2); and per-agent budgets, denominated in a common currency, bound consumption so runaway behavior is self-limiting (Proposition 3). Each quantitative claim is validated by a reproducible Monte Carlo simulation, included and linked. The thesis follows: prices, not a planner, are the cheapest coordination kernel for an agent fleet, and their guarantees - efficiency, truthfulness, graceful degradation - are exactly the ones a central optimizer cannot promise.

The instinct, when a hundred agents start fighting over the same rate limit, is to appoint a manager. It is the wrong instinct, and economics has known why for seventy years.

Picture the situation concretely. A fleet of agents is at work - some researching, some writing code, some reviewing, some idle - and all of them draw on a shared substrate: a bounded number of concurrent tool slots, a rate-limited external API, a pool of GPU quota for embeddings or fine-tuning, and a finite dollar budget that funds the model calls underneath everything. At any instant the demand for these resources exceeds the supply. Two agents want the last sandbox; three want to flush a large batch through the same throttled endpoint. Something has to decide who gets what, and when.

The default answer is a central scheduler that knows everyone's needs and hands out resources optimally. This is the answer the previous essays in this series leaned on for a single agent's task graph, and within one agent it is correct. Across a fleet it breaks, for reasons that are not engineering inconveniences but structural impossibilities. The thesis here is that coordination among self-interested, resource-contending agents is best handled not by a planner but by a market: prices as signals, auctions for allocation, budgets as control. Markets come with efficiency theorems, they decentralize the information a planner would have to centralize, and - the property I find most underrated - they degrade gracefully.¹

Contributions. We (i) give a minimal but faithful model of fleet coordination as combinatorial welfare maximization under private valuations; (ii) state the hardness and informational barriers that doom the planner (Proposition 1), and the efficiency, truthfulness, and robustness guarantees the market enjoys in their place (Theorem 1, Proposition 2, Theorem 2); (iii) show that per-agent budgets convert an open-ended safety question into a closed-form one (Proposition 3); and (iv) validate the quantitative claims with seeded simulations whose code is included and runnable - the market holds near-optimal welfare as contention grows, and selfish equilibria stay above the smoothness price-of-anarchy floor.

Coordination is allocation, and allocation is hard

Strip the scenario to its mathematical core. There are $m$ scarce resources and a population of $n$ agents indexed by $i$ . An allocation $x = (x_{1}, \dots, x_{n})$ assigns bundles to agents, drawn from the feasible set $X$ - the set of assignments that respect every resource's supply.

Definition 1 (Allocation problem). Each agent

i

holds a private valuation

v_{i} : X_{i} \to ℝ_{\geq 0}

over the bundle

x_{i}

it receives, and may hold a budget

b_{i}

in a common currency. The social welfare of an allocation is

W (x) = \sum_{i} v_{i} (x_{i})

. The planner seeks the welfare-maximizing assignment.

x^{⋆} = \underset{x \in X}{arg max} W (x) = \underset{x \in X}{arg max} \sum_{i} v_{i} (x_{i}) .

This is the welfare-maximizing allocation, and it is the problem a central planner is implicitly trying to solve every time it decides who gets the sandbox. Two facts about it doom the planner.

Proposition 1 (Central allocation is hard and centralizing). When agents have combinatorial valuations over bundles - resources that are complements or substitutes - computing

x^{⋆}

is NP-hard, and inapproximable to within

m^{1 / 2 - ε}

under standard assumptions. Moreover, posing the objective at all requires the planner to hold every private valuation

v_{i}

centrally.

The single-minded combinatorial-auction winner-determination problem contains weighted set packing, which is NP-hard and hard to approximate beyond

m^{1 / 2 - ε}

; fleet allocation under complementarities reduces to it.[9] The objective

W (x) = \sum_{i} v_{i} (x_{i})

cannot even be evaluated without each

v_{i}

, which lives inside agent

i

's local state. □

The planner's decision time blows up in the size of the fleet and the catalogue of resources, exactly the two quantities we intend to grow. And the informational barrier is the deeper one: those valuations are private - they live inside each agent's local state, its task, its sense of urgency - and eliciting them honestly is its own mechanism-design problem. A planner that demands global private information before it can act is not merely slow; it is a centralization of knowledge that the system does not naturally possess. This is Hayek's objection to central planning, transplanted intact into the datacenter: the relevant information is dispersed, local, and never given to one mind in its totality.

Prices decentralize the impossible problem

The market's trick is to never assemble the global picture at all. Instead of collecting valuations, post a price for each resource. Let $p$ be a price vector, one entry per resource. Each agent, seeing only $p$ and its own valuation, solves a small local problem - demand the bundle that maximizes its value net of cost - producing a demand $d_{i} (p)$ . A price vector $p^{⋆}$ clears the market when aggregate demand equals supply $s$ :

\sum_{i} d_{i} (p^{⋆}) = s .

The allocation that arises at clearing prices is a competitive equilibrium, and the reason it matters is the First Welfare Theorem.

Theorem 1 (First Welfare Theorem). Under local non-satiation (satisfied by the monotone valuations assumed here), every competitive-equilibrium allocation

(x^{⋆}, p^{⋆})

is Pareto efficient: no feasible reallocation makes some agent strictly better off without making another strictly worse off.

Suppose a feasible

x'

Pareto-dominates

x^{⋆}

. Any bundle an agent strictly prefers to its equilibrium choice must cost strictly more at prices

p^{⋆}

(else it would have demanded it); any weakly preferred bundle costs at least as much. Summing the budget inequalities,

p^{⋆} \cdot \sum_{i} x_{i}^{'} > p^{⋆} \cdot \sum_{i} x_{i}^{⋆}

, contradicting market clearing

\sum_{i} x_{i}^{'} = \sum_{i} x_{i}^{⋆} = s

. □

No reassignment can make some agent better off without making another worse off - which is to say the market reaches the efficient frontier the planner was straining toward, but reaches it with each agent doing only local arithmetic on a price. The global objective is solved without anyone holding the global problem. Existence is not wishful: Arrow and Debreu proved that under convexity a competitive equilibrium exists for a general economy,[5] and Wellman's market-oriented programming showed the same machinery solving distributed multicommodity flow as a computational artifact rather than an economic metaphor.[6]

How does the system find $p^{⋆}$ ? By the oldest idea in the book: tâtonnement. Raise the price of any resource in excess demand, lower the price of any in excess supply, repeat. Under gross substitutes this iteration converges to the clearing vector, and its discrete cousin - an ascending auction where the price of an over-demanded item ticks up until demand fits supply - is the practical realization. The auctioneer broadcasts prices, agents respond with demand, prices adjust. Nobody centralizes a valuation; the price is the only message that ever crosses the wire. Figure 1 sketches the whole loop.

Buyers bid value and quantity upward to the auctioneer, which iterates a clearing price (the dashed tâtonnement loop) and pushes prices and allocations down to the scarce resources. Each agent's budget meter throttles how much it can demand; no agent ever reveals its private valuation to the center.

Auctions price the externality

Clearing prices coordinate when valuations are honestly revealed, but agents have no reason to be honest about demand if shading it lowers what they pay. Auction theory closes this gap. The canonical instrument is the Vickrey-Clarke-Groves (VCG) mechanism, which generalizes the second-price sealed-bid auction [2] to arbitrary combinatorial allocation [3][4]. VCG first computes the welfare-maximizing allocation $x^{⋆}$ given reported valuations, then charges each winner not what it bid but the externality it imposes on everyone else.

Proposition 2 (VCG truthfulness). Under the VCG payment - the difference between the welfare others would enjoy in agent

i

's absence and the welfare they enjoy in its presence - reporting one's true valuation is a dominant strategy, and the mechanism implements the welfare-maximizing allocation

x^{⋆}

in dominant strategies.

t_{i} = \sum_{j \neq i} v_{j} (x_{- i}^{⋆}) - \sum_{j \neq i} v_{j} (x_{j}^{⋆}) .

Fix the others' reports. Agent

i

's utility under report

{\hat{v}}_{i}

v_{i} (x_{i}^{⋆}) - t_{i} = v_{i} (x_{i}^{⋆}) + \sum_{j \neq i} v_{j} (x_{j}^{⋆}) - K

, where

K = \sum_{j \neq i} v_{j} (x_{- i}^{⋆})

is independent of the report. The agent's utility is thus the total reported welfare minus a constant, and the chosen

x^{⋆}

maximizes exactly that quantity precisely when its own report is truthful. □

The remarkable consequence is that bidding one's true valuation becomes a dominant strategy: no report, whatever others do, beats honesty. Strategic deliberation collapses, and the mechanism implements the welfare-maximizing allocation as an equilibrium. For an agent fleet this is seductive - it makes the global optimum the thing self-interested agents arrive at on their own.

The seduction has limits. VCG inherits the NP-hardness of the allocation it sits on: computing payments means solving the welfare problem once with all agents and once per winner without them. It can extract low revenue, it is vulnerable to false-name and collusive bidding, and its truthfulness rests on agents being rational utility-maximizers - an assumption that a language model bidding on its own behalf may quietly violate.² So in practice we reach for simpler instruments: posted prices, ascending clock auctions, proportional-share schemes. These sacrifice exact optimality, and the discipline that tells us how much we sacrifice is the price of anarchy.

Graceful degradation and the price of anarchy

When agents play simple, self-interested strategies rather than a perfectly truthful optimal mechanism, the system settles into an equilibrium that need not be welfare-optimal. The price of anarchy measures the damage: it is the ratio of the optimum to the worst equilibrium's welfare,

PoA = max_{E \in E} \frac{OPT}{W (E)} (\geq 1),

so that $1 / PoA$ is the fraction of optimal welfare the worst equilibrium retains.

Theorem 2 (Constant price of anarchy via smoothness). For the relevant smooth class of mechanisms the price of anarchy is bounded by a constant: simultaneous item auctions with submodular valuations have

PoA \leq e / (e - 1) \approx 1.58

, and the bound is robust - it holds not only at exact Nash equilibria but at coarse-correlated equilibria and under no-regret learning dynamics. Hence every such equilibrium retains at least a constant fraction

(e - 1) / e \approx 0.63

of optimal welfare.

A mechanism is

(λ, μ)

-smooth if for every strategy profile there is a deviation guaranteeing each player a payoff such that the welfare obeys

\sum_{i} u_{i} (a_{i}^{⋆}, a_{- i}) \geq λ \cdot OPT - μ \cdot W (a)

. Summing the equilibrium conditions over players yields

W (E) \geq λ / (1 + μ) \cdot OPT

; the same inequality, being a one-shot deviation argument, survives the relaxation to coarse-correlated and no-regret play.[8] Instantiating the smoothness constants for simultaneous item auctions gives

e / (e - 1)

. □

Self-interested agents, playing cheap decentralized strategies, stay within a small constant factor of the welfare a flawless central planner would achieve - and they do so while learning, while mis-bidding, while the population churns. This is the property a planner cannot match. A central optimizer that loses a partition, lags under load, or chokes on the combinatorics produces no allocation - the fleet stalls on a single point of failure. A market has no such point. If the auctioneer is slow, prices simply lag and agents transact at stale prices, losing a little efficiency. If a fraction of agents behave irrationally, the smoothness bound still caps the damage. Coordination becomes a property of the equilibrium rather than of any one component's correctness, and equilibria, unlike planners, fail soft.

Theorems are existence statements; the engineering question is whether a market actually holds efficiency as a fleet scales. Figure 2 answers it by simulation. We draw random instances of $m$ units of a scarce resource and $n$ agents with private per-unit values, clear by value density (the market), compare against serving agents in arrival order (a naive planner with no price signal), and score both against the exact combinatorial optimum computed by dynamic programming. As contention $n / m$ rises, the market's realized welfare stays near the optimum while the naive heuristic decays - precisely the regime where coordination matters most.

Efficiency ratio (realized welfare divided by the exact optimum) versus contention n/m from about 0.3 to 2.7. The market-clearing curve stays near 0.95 to 0.99 across the whole range with a tight percentile band; the naive greedy curve starts comparable but falls below 0.7 as contention grows, with a wider band. — Allocation efficiency against contention. Market clearing (value-density ordering, the clearing rule below) holds near-optimal welfare as the fleet outgrows supply; a naive arrival-order heuristic degrades. Bands are 10th-90th percentiles over random instances; the dotted line marks the exact optimum computed by a 0/1-knapsack dynamic program ( $m = 6$ , 4,000 instances per point).

Efficiency under truthful clearing is the optimistic half. The robustness claim of Theorem 2 is the load-bearing one, and it too can be put to the simulation. Figure 3 lets bidders shade their reports - underbidding their true per-unit value by a tunable amount - and tracks the realized welfare of the resulting selfish allocation as a fraction of the optimum, as the shading parameter $φ$ grows. The welfare ratio sags as agents grow more strategic, but it never falls through the constant smoothness floor $(e - 1) / e \approx 0.63$ - the empirical price of anarchy stays bounded, exactly as the theory predicts.

Welfare ratio of the selfish equilibrium to the optimum versus a strategic shading parameter phi from 0 to about 0.95. The curve starts near 0.99 at phi = 0 and declines to about 0.88 as shading grows, with a percentile band whose lower edge stays near 0.76, well above a dashed horizontal line at 0.63 marking the smoothness price-of-anarchy floor. — Empirical price of anarchy. As bidders shade reports more aggressively (parameter $φ$ ), realized welfare relative to the optimum declines but stays above the constant smoothness floor $1 / (e / (e - 1)) \approx 0.63$ (dashed). Bands are 10th-90th percentiles over instances ( $n = 14$ , $m = 8$ , 6,000 instances per point).

Budgets are control, and budgets are safety

The final piece turns the market from a coordination device into a governance device. Give each objective - each top-level task the fleet is pursuing - a budget denominated in a common internal currency, and let agents spend it to bid for resources. Two things fall out for free.

The first is automatic rationing by importance. Prices ration scarce tools without anyone writing a priority policy: high-value work can outbid low-value work because it is willing to spend more of its budget, and the clearing price rises until only the work that values a resource above its market price gets it. Priorities are not declared in a config file; they are revealed by willingness to pay, and they re-sort themselves continuously as conditions change. This is dominant resource fairness [10] and its market kin: fairness and efficiency expressed in a single denomination.

Proposition 3 (Budgets as throttle and safety boundary). If agent

i

holds budget

b_{i}

in the common currency and every grant is debited at a non-negative clearing price, then the total resource value it can consume over any horizon is bounded by

b_{i} / \min_{r} p_{r}

; once its balance reaches zero it can win nothing further. Runaway consumption is therefore self-limiting, with a blast radius fixed in advance by

b_{i}

The second property is safety, which I take to be the quietly decisive argument.³ A runaway agent - stuck in a loop, exploiting a tool, hallucinating an expensive plan - is, in a market, an agent that spends. As it spends, the resources it monopolizes rise in price, which throttles it; and when its budget hits zero, it simply stops, exactly as Proposition 3 makes formal. The blast radius of a malfunction is bounded by an accounting limit set in advance, not by a watchdog noticing in time. Better still, the budget makes the cost of any behavior - legitimate or pathological - observable in a common unit. An exploit has a price tag; a misfiring agent leaves a ledger. Denominating action in a finite currency converts an open-ended safety question into a closed-form one.

The auctioneer's core is small. A sketch of a single clearing round:

# A market clearer. Agents bid against budgets; the auctioneer
# allocates a scarce resource and charges the externality (VCG-style).

def clear(bids, supply, budgets):
    # bids: list of (agent_id, value, qty); supply: units available
    # keep only bids the agent can actually afford
    live = [b for b in bids if b.value <= budgets[b.agent_id]]
    live.sort(key=lambda b: b.value / b.qty, reverse=True)

    alloc, pay, remaining = {}, {}, supply
    for b in live:
        if b.qty > remaining:        # all-or-nothing: skip bids that do not fit
            continue
        grant = b.qty
        # price = the externality: the best displaced bid per unit
        displaced = next((x for x in live if x is not b
                         and remaining - grant < x.qty), None)
        unit_price = (displaced.value / displaced.qty) if displaced else 0.0

        alloc[b.agent_id] = grant
        pay[b.agent_id] = unit_price * grant
        budgets[b.agent_id] -= pay[b.agent_id]   # debit; a broke agent cannot bid again
        remaining -= grant

    return alloc, pay, unit_price        # the last unit_price is the clearing price

A single-resource clearing round. Affordability is checked against budgets, allocation is greedy by value density, the price is the externality of the marginal displaced bid, and budgets are debited so spend is self-limiting.

Reproducibility

Figures 2 and 3 are produced by a single seeded script with no inputs beyond its parameters; the market allocation is scored against the exact combinatorial optimum so the reader can see how close clearing comes to it. The full script is available here; the core of the efficiency experiment, which clears by value density and computes the exact optimum on each small instance, is:

# efficiency = market welfare / exact optimum, over random instances
for n in ns:                                  # sweep the fleet size; m fixed
    for _ in range(trials):
        w, q, cap = draw_instance(n, m, RNG)     # per-unit values, demands, supply
        opt    = welfare_opt(w, q, cap)          # exact optimum (0/1-knapsack DP)
        market = welfare_market(w, q, cap)[0]    # greedy-by-value-density clearing
        naive  = welfare_naive(w, q, cap)        # serve in arrival order, no prices
        mkt_r.append(market / opt)             # efficiency ratio in [0, 1]
        naive_r.append(naive / opt)

The efficiency simulation (excerpt). Running the full script regenerates both figures deterministically.

Related work

None of this is new, and that is the strength of the argument. Market-based coordination of computational agents has a forty-five-year lineage. Smith's contract net protocol [1] let nodes in a distributed problem solver announce tasks and award them to the best bidder - a decentralized auction in everything but name. The truthfulness machinery descends from Vickrey's second-price auction [2] and its generalization through Clarke [3] and Groves [4] to the VCG mechanism. Arrow and Debreu established existence of competitive equilibrium [5], the foundation under Theorem 1. Wellman's market-oriented programming [6] made prices a first-class programming construct and computed equilibria to solve resource problems; Clearwater's collection on market-based control [7] gathered a decade of evidence that decentralized price systems regulate everything from climate control to bandwidth. Roughgarden's smoothness framework [8] furnishes the robust price-of-anarchy bound of Theorem 2, and the algorithmic-game-theory program [9] supplies the hardness and inapproximability results of Proposition 1. Dominant resource fairness [10] is the budgeted-fairness kin of Proposition 3. Our contribution is not a new mechanism but a synthesis: we assemble these guarantees - hardness, efficiency, truthfulness, robustness, bounded blast radius - into the case that an agent fleet is the setting these ideas were waiting for, and we validate the two quantitative ones by simulation.

Limitations and threats to validity

The model is deliberately minimal, and its assumptions are where it can mislead. Bidders may not be rational. VCG truthfulness (Proposition 2) and the smoothness bound (Theorem 2) both assume agents are utility-maximizers, but the bidders here are language models that may misreport, hallucinate valuations, or collude; strategic manipulation and false-name bidding are real threats, and the threat model deserves its own treatment rather than an assumption. Valuations are hard to elicit. The whole apparatus presumes an agent can name a price for a resource in a common currency, but mapping a half-finished task's marginal need for GPU quota onto a scalar bid is itself an unsolved inference problem. VCG is computationally heavy. Proposition 1's hardness falls on VCG too, so exact payments are infeasible at fleet scale and the simpler instruments we actually deploy only inherit the price-of-anarchy guarantee, not optimality. The simulation's valuation model is synthetic. Figures 2 and 3 draw values from fixed distributions and clear a single divisible resource; they confirm the mathematics is faithful to its own assumptions but say nothing about the correlated, bursty, multi-resource demand of a real fleet. Results validate models, not deployments. Nothing here substitutes for measuring efficiency and equilibrium welfare on a live multi-agent LLM system, which is the natural next step and the place where these predictions should be falsified.

Conclusion

The shift in stance is the whole point. Stop asking who should decide and start asking what should the price be. The planner's question is NP-hard, centralizing, and brittle; the market's question is local, incentive-aligned, and robust. For a single agent, plan. For a fleet, price.

This is the fourth essay in an arc with one underlying claim: autonomy at scale is a systems problem before it is a modeling problem. We began with the substrate - the operating system that arbitrates one agent's contended resources. We took its memory subsystem and treated the context window as the top of a managed hierarchy. We addressed verification - making a single agent reliable enough to trust per step, and found that reliability is bought on the checking side, up to a ceiling set by how correlated the checks are. And now coordination - markets across a whole fleet, where no single mind can or should hold the global problem. Each layer is the same move: take an ad-hoc behavior the model was improvising and replace it with a principled mechanism that comes with guarantees. The market is not an application running on the substrate; it is a kernel-level coordination service, as load-bearing as the scheduler. The endpoint of the arc is software that runs itself - not because the model got smart enough to manage everything, but because we stopped asking it to.

“Graceful degradation” is doing real work here. A central planner has a sharp failure mode: when it is overloaded or partitioned, the system produces no allocation at all. A market has a soft failure mode: stale prices, slightly inefficient equilibria, but always an allocation. The price-of-anarchy bound of Theorem 2 quantifies exactly how soft. ↩

VCG’s dominant-strategy truthfulness assumes rational utility-maximizers. The bidders here are language-model agents that may misreport, hallucinate valuations, or collude. The threat model is real; mitigations include constraining and normalizing the bid space, preferring strategyproof-by-construction mechanisms (posted prices, serial dictatorship), capping per-agent influence, and auditing the ledger for anomalous spend. ↩

Budgets double as a safety mechanism: denominating every action in a finite common currency makes runaway behavior self-limiting (a broke agent cannot act, by Proposition 3) and makes the cost of an exploit observable and comparable across agents. Spend becomes both the throttle and the audit log. ↩