In Search of an Easy Way to Arm Your Agent Fleet: YAML, CLI, and In-Repo Allocation

This is the second in a series of documents where I work through the architecture of an agent-orchestration library I'm building. The previous one was about the environment in which the agent runs. This one is about how to declare agents and drive them through the library.

I wanted to build something that let me declare agents easily. Most libraries achieve this by making each agent a thread with a fixed configuration. Spawning one is then trivial because there's almost nothing to configure.

I wanted each agent to be its own independent entity, with its own runtime, workspace, harness, and toolchain. This certainly has a cost: if every agent can differ on every axis, then declaring and operating a fleet becomes very tedious, because every agent has to spell out everything about itself.

Note that each agent still has its own isolated workspace, artifacts, and secrets, for the reasons I went through in the previous post. This one is about how to declare and drive them.

Here is the solution that I came up with.

Where It Lives

Everything lives under .agents in the repo root. The config sits there too, in agents.yaml, next to the artifacts, secrets, and workspaces from before.

<repo-root>/
	.agents/
		.artifacts/
			agents/
			common/
		.secrets/
			agents/
			common/
		.workspaces/
		agents.yaml

The thing I care about here is that the whole fleet is a directory I can open and read. The config and the state are in the repo, possibly in git.

Declaring Agents

The config has two parts: defaults that every agent inherits, and per-agent entries that override only the axes that differ.

defaults:
  harness: claude
  workspace:
    environment: filesystem
    mode: worktree

agents:
  atlas:
    harness: codex
    runtime: docker
    toolchain: nodepy
    workspace:
      mode: clone
  agent2:
    harness: kimi
  agent3:
    harness: codex
  agent4: null
  agent5: null
  agent6: null

Each agent inherits everything from defaults and changes only what it needs. atlas overrides its harness, runtime, toolchain, and workspace mode. agent2 and agent3 only swap their harness. agent4 through agent6 are null, so they take the defaults whole.

Any agent can override any axis by writing down the parts that actually differ.
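To make the inheritance concrete, here is a minimal sketch of how the effective config for one agent could be resolved. It assumes a recursive merge, so that atlas overriding workspace.mode still keeps workspace.environment from defaults. The config is written as plain dicts to avoid needing a YAML parser, and deep_merge and resolve_agent are hypothetical names, not the library's API.

```python
from copy import deepcopy

# Mirrors part of the agents.yaml example, written as plain dicts so the
# sketch needs no YAML parser.
DEFAULTS = {
    "harness": "claude",
    "workspace": {"environment": "filesystem", "mode": "worktree"},
}

AGENTS = {
    "atlas": {
        "harness": "codex",
        "runtime": "docker",
        "toolchain": "nodepy",
        "workspace": {"mode": "clone"},
    },
    "agent2": {"harness": "kimi"},
    "agent4": None,  # null in YAML: take the defaults whole
}


def deep_merge(base, override):
    """Return a new dict: base with override applied, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in (override or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_agent(name):
    """Hypothetical helper: the effective config for one agent."""
    return deep_merge(DEFAULTS, AGENTS[name])


print(resolve_agent("atlas")["workspace"])
# -> {'environment': 'filesystem', 'mode': 'clone'}
```

With a shallow merge instead, atlas would lose workspace.environment as soon as it overrode workspace.mode, which is why the sketch recurses into nested mappings.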

I chose YAML over JSON because this is a file I hand-edit constantly while running the fleet, so I want it optimized for reading and editing, not for being a serialization format.

Driving Them

To start an agent you would do:

agents init # automatically creates the directories for each agent
agents provision atlas # creates the workspace
agents start atlas # starts the actual environment (like the docker container)

init sets things up, provision builds an agent's environment, start runs it.

Once an environment is started, you can attach to it:

agents attach atlas --via tmux

# to open a single tmux session with one window per agent
agents attach atlas,agent2,agent3 --via tmux

Attaching to the environment is one of the ways to communicate with an agent. The command teleports the user into the environment. There is another method of communication as well.

The important part is that each agent can have a different, personalized environment without sacrificing convenience: the environment itself knows how to attach to it and talk to it.
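One way to picture this is a small function per environment that returns its own attach command, plus a planner that wraps those commands in tmux windows. The sketch below only builds the command lines and does not run them. The container name, the workspace path under .agents/.workspaces, and the attach_command and tmux_plan helpers are all assumptions for illustration. Only the tmux subcommands and docker exec are real.

```python
import shlex


def attach_command(name, config):
    """Hypothetical: the shell command that lands you inside one agent's environment."""
    if config.get("runtime") == "docker":
        # Assumes the container is named after the agent and ships bash.
        return shlex.join(["docker", "exec", "-it", name, "bash"])
    # Assumed fallback for host agents: a shell inside the agent's workspace.
    workspace = f".agents/.workspaces/{name}"
    return f"cd {shlex.quote(workspace)} && exec bash"


def tmux_plan(session, agents):
    """Build the tmux invocations for one session with one window per agent."""
    plan = []
    for index, (name, config) in enumerate(agents.items()):
        command = attach_command(name, config)
        if index == 0:
            # The first agent creates the detached session and its first window.
            plan.append(["tmux", "new-session", "-d", "-s", session, "-n", name, command])
        else:
            # Later agents each get a new window in the same session.
            plan.append(["tmux", "new-window", "-t", f"{session}:", "-n", name, command])
    plan.append(["tmux", "attach-session", "-t", session])
    return plan


for argv in tmux_plan("fleet", {"atlas": {"runtime": "docker"}, "agent2": {}}):
    print(shlex.join(argv))
```

The caller never branches on the runtime. It asks each environment for its command and the tmux layer stays the same for every agent.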

init and doctor Commands

The library provides two commands that make it easy to set everything up and to check that the tools are set up correctly.

agents init is the bootstrap command. It creates .agents, copies a built-in template into it, and lays down the directory structure that the rest of the library assumes exists.
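As a rough sketch of what that bootstrap step amounts to, the function below lays down the tree shown earlier and is safe to run twice. The per-agent subdirectories under .artifacts/agents and .secrets/agents, the placeholder template contents, and the init_tree name are assumptions, not the library's actual behavior.

```python
import tempfile
from pathlib import Path

# Directory layout taken from the tree under .agents shown earlier.
LAYOUT = [
    ".artifacts/agents",
    ".artifacts/common",
    ".secrets/agents",
    ".secrets/common",
    ".workspaces",
]


def init_tree(repo_root, agent_names):
    """Create .agents and its subdirectories; existing paths are left alone."""
    root = Path(repo_root) / ".agents"
    for rel in LAYOUT:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for name in agent_names:
        # Assumed: one subdirectory per agent for artifacts and secrets.
        (root / ".artifacts" / "agents" / name).mkdir(exist_ok=True)
        (root / ".secrets" / "agents" / name).mkdir(exist_ok=True)
    config = root / "agents.yaml"
    if not config.exists():
        # Placeholder standing in for the built-in template.
        config.write_text("defaults: {}\nagents: {}\n")
    return root


with tempfile.TemporaryDirectory() as repo:
    root = init_tree(repo, ["atlas", "agent2"])
    for path in sorted(root.rglob("*")):
        print(path.relative_to(root).as_posix())
```

Because every mkdir uses exist_ok and the config is only written when missing, rerunning the step does not clobber a hand-edited agents.yaml.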

Secrets have one slightly non-obvious implementation detail: on POSIX hosts, the directories are created with mode 755, and most copied credential files are expected to be 644. That looks loose if you only think about a single-user host process, but it is intentional for Docker-based agents. A lot of agentic tools need to run as a non-root user, so mounted secrets need to be readable inside the container. For the Kimi credential file the expected mode is stricter, 600, because that tool expects it.

There is also an optional --copy-credentials flag. When used, init copies known AI-tool credentials from the user's home directory into .agents/.secrets/common, including Claude, Codex, Gemini, GitHub CLI/Copilot, Kimi, and MiniMax config files. This gives all agents a common fallback credential set without forcing each agent to have its own copy.

agents doctor is the corresponding preflight command.

It checks the directory tree first, verifying that the files exist. Missing per-agent secret directories are not failures, because many agents can rely on common secrets, but wrong permissions on existing secret directories are reported with the exact chmod command to fix them.
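The permission part of that check can be sketched in a few lines. This assumes a POSIX host and uses a hypothetical permission_hint helper: a missing path is not a failure, a matching mode passes, and anything else yields the chmod command that would fix it.

```python
import os
import shlex
import stat
import tempfile
from pathlib import Path


def permission_hint(path, expected):
    """Return None if path is missing or has the expected mode, else a chmod fix."""
    path = Path(path)
    if not path.exists():
        return None  # missing per-agent secret directories are not failures
    actual = stat.S_IMODE(path.stat().st_mode)
    if actual == expected:
        return None
    return f"chmod {expected:o} {shlex.quote(str(path))}"


with tempfile.TemporaryDirectory() as tmp:
    secrets = Path(tmp) / "secrets"
    secrets.mkdir()
    os.chmod(secrets, 0o700)  # deliberately wrong for a directory expected at 755
    print(permission_hint(secrets, 0o755))
```

The same helper covers files: pass 0o644 for most credential files and 0o600 for the Kimi one.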

After that it checks tools. git is required. Harness binaries are discovered from the configured agents and reported if present, but they are treated as optional at doctor time because an agent may run them inside Docker instead of on the host. Docker is only a hard requirement if at least one configured agent uses runtime: docker; otherwise it is just reported as optional.
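The required-versus-optional logic is easy to get subtly wrong, so here is a small sketch of it. It assumes the harness name doubles as the binary name on PATH, which may not hold for every harness, and tool_requirements and check_tools are hypothetical names.

```python
import shutil


def tool_requirements(defaults, agents):
    """Map tool name -> True if required on the host, False if optional."""
    required = {"git": True}
    for overrides in agents.values():
        # Top-level keys are enough here, so a shallow merge will do.
        config = {**defaults, **(overrides or {})}
        harness = config.get("harness")
        if harness:
            # Assumption: the harness binary is named after the harness.
            required.setdefault(harness, False)
        if config.get("runtime") == "docker":
            required["docker"] = True
    required.setdefault("docker", False)  # no docker agents: report as optional
    return required


def check_tools(requirements, which=shutil.which):
    """Return (tool, required, found) rows; `which` is injectable for tests."""
    return [(tool, req, which(tool) is not None) for tool, req in requirements.items()]


reqs = tool_requirements(
    {"harness": "claude"},
    {"atlas": {"harness": "codex", "runtime": "docker"}, "agent4": None},
)
for tool, required, found in check_tools(reqs):
    print(tool, "required" if required else "optional", "found" if found else "missing")
```

A missing optional tool shows up in the report but never counts against the final result.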

Finally, unless disabled with --no-network, it runs outbound connectivity checks for GitHub, Anthropic, and OpenAI.
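A connectivity check like this can be as simple as opening a TCP connection on port 443. The sketch below assumes that approach and assumes the hostnames. The source does not say which endpoints or which protocol doctor actually probes. The probe is injectable so the logic can be exercised without touching the network.

```python
import socket

# Assumed endpoints; the real command may probe different hosts.
ENDPOINTS = {
    "GitHub": "github.com",
    "Anthropic": "api.anthropic.com",
    "OpenAI": "api.openai.com",
}


def can_connect(host, port=443, timeout=3.0):
    """True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def network_checks(no_network=False, probe=can_connect):
    """Return {service: reachable}; empty when checks are disabled."""
    if no_network:
        return {}
    return {name: probe(host) for name, host in ENDPOINTS.items()}


# Equivalent of passing --no-network: nothing is probed.
print(network_checks(no_network=True))
```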

The command exits with 0 only when every required check passes. Otherwise it exits with 1 and prints a summary of the failures and hints. This makes it useful both as a human command and as a setup gate in scripts.
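The exit-code rule can be sketched as a fold over check results. The Check record and summarize function are hypothetical. The point is only that optional failures are ignored and any required failure flips the code to 1.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    passed: bool
    required: bool = True
    hint: str = ""


def summarize(checks):
    """Return (exit_code, failure_lines); only required failures count."""
    failures = [c for c in checks if c.required and not c.passed]
    lines = [
        f"FAIL {c.name}" + (f" (hint: {c.hint})" if c.hint else "")
        for c in failures
    ]
    return (1 if failures else 0), lines


code, lines = summarize([
    Check("git", passed=True),
    Check("docker", passed=False, required=False),
    Check("secrets/common mode", passed=False, hint="chmod 755 .agents/.secrets/common"),
])
print(code)
print("\n".join(lines))
```

A real entry point would pass the code to sys.exit, which is what lets a setup script gate on the result.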

Feedback and Next Steps

There are a few valuable user suggestions regarding this, which I still need to plan how to implement:

One layer I would add is a tiny per-agent state/receipt file: goal, touched files, assumptions/contracts, checks run or skipped, and what would make the agent's work stale.

Note about this suggestion: this is already accounted for in the design, but it's not really a layer so much as something that I could instruct the agent to do.

I'd try to keep credentials mounted read-only and tool-specific, with the shortest lifetime you can tolerate, and make "which agent saw which secret" auditable.

These suggestions don't break the existing model, though. They are improvements that can be made on top of it.

This part is pretty simple, but it sets the foundation for everything else. The next post is about the environment lifecycle for each agent.