01
Agent harnesses · a journey up the stack

Ark & ArkOS

Building reliable coding agents — from harness to operating system.

① Agents & harnesses ② Ark ③ ArkOS ④ Future agent model
Agenda

Four steps up the stack

1Agents & harnesseswhat an agent is ·
the 7-part harness framework
2Arka workflow harness ·
tiers · lifecycle · specs
3ArkOSworkflow as a service ·
self-evolving substrate
4The future agent modelhow agents will work ·
autonomous, isolated, at scale
01

Agents & harnesses

A model in a loop is the engine. The harness is everything that makes it ship.
① Agents & harnesses ② Ark ③ ArkOS ④ Future agent model
① Agents & harnesses

What is an agent?

AGENT the agentic loop reason act perceive 🧠 Model reasoning core 🧰 Tools edit · shell · git · web 💾 Memory what it knows / recalls 🗺 Planning decompose · sequence ENVIRONMENT — codebase · OS · results

An agent = a loop (perceive → reason → act) wired to a model, tools, memory, and planning — acting on its environment until the goal is met.

① Agents & harnesses

The model is only the engine

📦 Framework a library — you import it LangChain · Agents SDK · CrewAI
Harness an app — you install & run it Claude Code · Codex · Aider · OpenHands · Ark

Framework = you import it. Harness = you install & run it.

① Agents & harnesses · the framework

Inside the harness

THE HARNESS — everything that isn't the model 🖥 InterfaceCLI · IDE · web UI — how you talk to it 🔁 Orchestrationthe agent loop · planning · subagents · retries 🧠 Context & memoryprompt · working state · compaction · recall 🧰 Tools & protocolsMCP · bash · file I/O · AGENTS.md 📦 Substrate & sandboxworktree → container → microVM CROSS-CUTTING 📊 EvaluationSWE-bench · RL envs 📡 Observabilitytraces · cost · replay 🛡 Guardrailsgates · policy · audit 🧩 Reference implsClaude Code · Codex apply to every layer

Four layers of runtime, three cross-cutting concerns — the Agent Harness Engineering survey's seven dimensions, in one map.

① Agents & harnesses · in detail

What each dimension does

DimensionWhat it ownsKey techniques
🔁 Orchestrationhow the agent decides & coordinates workloop · subagents · skill composition
🧠 Context & statewhat fits in the window vs. what persistscompaction · memory injection · planning-as-file
📦 Substrateswhere code runs, safely & resumablymicroVM · snapshot/restore · egress policy
🔌 Protocolshow the agent talks to tools & reposMCP · AGENTS.md · git-native contracts
📊 Evaluationwhether a change is actually an improvementSWE-bench · trajectory audit · RL envs
📡 Observabilityseeing what the agent did & what it costtracing · cost attribution · replay
🛡 Guardrailskeeping autonomy safe & accountableapproval gates · policy · fail-closed
① Agents & harnesses · the thesis

The harness is load-bearing

46→80 same model two harnesses, one benchmark
+26% LangChain harness change only
23→45 scaffold basic vs tuned, SWE-bench

Same model. Only the wrapper changed.

02

Ark

An opinionated workflow harness — the seven dimensions, with discipline baked in.
① Agents & harnesses ② Ark ③ ArkOS ④ Future agent model
② Ark · the gap

Today's harnesses have no process

🌀 No structure

no PRD, no plan, no record of why

👤 No second opinion

the agent grades its own work

🧩 No memory

knowledge evaporates between sessions

② Ark · positioning

A workflow harness above the agent

Ark — workflow harness tiered lifecycle · gates · SPECs · journals coding-agent harness Claude Code · Codex · OpenCode model + tools (the agent loop) host — your machine, git, filesystem

Ark is local, multi-platform, workflow-opinionated — the meta-layer that turns any coding agent into a software-engineering process.

② Ark · the framework

Ark on the seven dimensions

DimensionArk's answer
🔁 Orchestrationtiered lifecycle + researcher / reviewer / verifier subagents
🧠 Context & statestructured artifacts on disk · ark context projection
📦 Substratesworktree-per-task · .ark.db snapshots
🔌 ProtocolsAGENTS.md · slash commands · per-platform templates
📊 EvaluationVERIFY gate · PLAN → REVIEW step
🛡 Guardrailsstate machine · user-staged atomic commit

Same seven axes — Ark's take is opinionated workflow discipline.

② Ark · the idea

Two pieces, working as one

ark-cli

the mechanism — a Rust binary you install & run

ark-workflow

the discipline — a tiered lifecycle in workflow.md

Install once → every transition runs through an ark command.

② Ark · in practice

What it looks like

~/my-project — ark
$ ark init                          # scaffold .ark/ + agent integrations
 initialized ark · claude-code, codex, opencode

$ ark context                       # orient: git + tasks + specs
 /ark:design --deep "refactor auth layer"
  DESIGN → PLAN → REVIEW → EXECUTE → VERIFY → COMMIT
 one atomic commit: work + task.toml + SPEC + journal

Two layers of commands: the public lifecycle (init / context / …) and what you type inside the agent (/ark:*).

② Ark · CLI

One binary, two layers

Public — manage Ark's footprint

initcontextloadunloadupgradearchivecleanup

Hidden ark agent — the workflow engine

task
newplanreviewexecuteverifycommitarchive
spec
extractimportregister
workspace
recorddeveloper

You type the slash commands; they call ark agent. The state machine guards every transition.

② Ark · tiers

Pick the smallest tier that fits

⚡ Quick

/ark:quick · reversible
PRD

◆ Standard

/ark:design · feature scope
PRDPLANVERIFY

◈ Deep

/ark:design --deep · architectural
PRDPLAN → REVIEWVERIFYSPEC

🔬 Research

/ark:research · corpus = deliverable
PRDresearch/

When in doubt, pick lower — promotion is cheap.

② Ark · workflow

One task, end to end

DESIGN task new PLAN task plan REVIEW task review deep only fold findings → PLAN EXECUTE task execute VERIFY task verify COMMIT task commit -m gate: no PENDING ARCHIVED ark archive later · bulk RESEARCH /ark:research research → commit → archive (no PLAN/REVIEW/VERIFY)

Each phase is an ark agent command. You type a slash command; it drives the state machine.

② Ark · lifecycle · decide

Decide what, then how

🟢
DESIGN
write the PRD — What · Why · Outcome · related specs
ark agent task new ⛓ fields filled
🔵
PLAN
elaborate how — goals G-N · architecture · API · validation
ark agent task plan ✓ every G-N → a test
🟣
REVIEW deep · single pass
independent reviewer files R-NNN — fold findings into PLAN, then execute
ark agent task review you pick the reviewer
② Ark · lifecycle · ship

Build · audit · close

🟠
EXECUTE
implement the PLAN, following project & feature specs
ark agent task execute ✓ tests · lints · build
🟡
VERIFY
audit shipped code — spec compliance · plan fidelity · drift
ark agent task verify ⛓ nothing PENDING
🔶
COMMIT
you stage, then one atomic commit closes the task
ark agent task commit → next slide
② Ark · the keystone

commit — five things, one commit

① VERIFY gatedeep: refuse on PENDING ② extract SPECdeep · → features tree ③ task.tomlphase = Committed ④ stage filesexactly Ark's · no -A ⑤ journalsession entry 1 git commit all-or-nothing on failure ↺ scoped rollback your index untouched
② Ark · subagents

The author never grades itself

🔎 ark-researcher DESIGN · PLAN gathers what the session lacks → research/
ark-reviewer REVIEW · deep judges the PLAN → verdict + R-NNN
ark-verifier VERIFY · final gate audits shipped code → V-NNN
authorhands offindependent agent you choose
② Ark · memory

Memory ≠ context

💨 Context in-prompt · ephemeral gone when the session ends · costs tokens every turn
💾 Memory on disk · durable survives sessions, agents, platforms · loaded on demand
Ark's bet: state lives in structured files — PRD · PLAN · SPEC · VERIFY · journals. Available, not resident.
② Ark · memory · specs

Specs — the durable contract

📐 Project specs specs/project/<name>/SPEC.md user-authored conventions read before EVERY task · always apply 🧩 Feature specs specs/features/…/SPEC.md (recursive) machine-promoted contracts read only the ones your task touches deep PLAN ## Spec section on commit extract → SPEC.md + upsert INDEX leaf→root VERIFY checks adherence drift needs a CHANGELOG entry to pass

Anchored, versioned, drift-detected — specs are how knowledge compounds instead of evaporating between sessions.

② Ark · memory · the rest

The rest of the durable state

Journals.ark/workspace/<dev>/a session block written on every commit
Worktreesone branch / taskparallel tasks, no collisions
Snapshotsunload / loadfreeze & restore the whole footprint

All on disk, versioned with the repo — the project remembers.

② Ark · architecture

Inside the binary

ARK-CLI · thin adapter main.rs agent_cli.rs parse · dispatch · render(summary) ARK-CORE · all logic commands/ init load unload upgrade context … commands/agent/ task · spec workspace · worktree platforms.rs one path → claude·codex·opencode layout.rs rooted paths discover·resolve_safe state/ manifest·snapshot .state.toml io/ — structured FS & git PathExt · managed blocks · git agent/state.rs tier × phase transition table templates.rs — include_dir!() ark·claude·codex·opencode trees in binary

The CLI parses & prints; the core does everything. A platform registry drives all three agents from one code path.

03

ArkOS

The harness becomes a substrate — workflow as a service, for agents instead of humans.
① Agents & harnesses ② Ark ③ ArkOS ④ Future agent model
③ ArkOS · RFC 001

Next: ArkOS

What Ark does for humans,
ArkOS does for agents.
Ark human · gated a CLI you drive
siblings
not a stack
ArkOS agent · autonomous workflow as a service
services →lifecycletask treememorySPEC storageevent loggrounding hooks
③ ArkOS · how it evolves

Swap the agent form. The substrate stays.

WORKLOADS tasks agents work on · autonomous orchestrators · products — supply the grounding signal SUBSTRATE — peers, different audience Ark · human audience CLI harness · human-gated · no self-improve ArkOS · agent audience workflow as service · autonomy-operable · self-improves AGENT RUNTIMES — the swappable "agent form" Claude Code Codex OpenCode native runtime (stage 2) + raw LLM API HOST — POSIX · Linux · git · process isolation

Stage 1: bootstrap on today's runtimes (Ark's primitives re-exposed to agents). Stage 2: grow a native runtime — the agent form swaps, the substrate is constant.

③ ArkOS · self-evolution

Self-evolving — but grounded

ArkOS rev N+1 improved primitives agents run a real workload workload grades tests · LTP · panel faster · fewer iterations · smaller budget → keep the change 🚫 the substrate never grades itself it cannot edit its own evaluation harness · the judge is independent of the generator

Like the Linux kernel: it doesn't self-grade — programs running on it shipping faster is the signal.

04

The future agent model

My vision for how agents will work — autonomous, isolated, and running at scale.
① Agents & harnesses ② Ark ③ ArkOS ④ Future agent model
④ The future · the agent model

From one agent
to a fleet of sandboxes

🧑‍💻 Today one agent · your terminal you drive it, step by step
🛰️ Next N isolated sandboxes each drives its own project, on its own

Isolation is a stack: worktree → container → microVM / hypervisor. The fleet model takes it all the way down.

④ The future · architecture

Hypervisor below · agent sandboxes above

HYPERVISOR manages VMs · CPU · memory · isolation · scheduling bare metal / cloud host VMs microVM microVM microVM AGENT SANDBOXES 📁 project + 🤖 ArkOS agent drives it ⟳ 📁 project + 🤖 ArkOS agent drives it ⟳ 📁 project + 🤖 ArkOS agent drives it ⟳ + … N more running in parallel fully isolated

One hypervisor schedules many VMs; each VM hosts one agent sandbox — a project plus its own ArkOS agent. No sibling-interference: every project is walled off.

④ The future · inside a sandbox

Each sandbox runs itself

sandbox (one microVM) 🤖 ArkOS agent workflow as a service grounded · self-evolving 📁 the project code · tests · git the grounding signal DESIGN→PLAN→…→COMMIT tests pass / fail → next move ⟳ continuously, with no human in the loop

Gates from Ark · autonomy & grounding from ArkOS · isolation from the sandbox.

Up the stack

Agenta model in a loop — the harness makes it reliable
Arka workflow harness — gates, review, living specs
ArkOSworkflow as a service — self-evolving, grounded
The fleetautonomous agents, isolated, at scale
npm install -g @anekoique/ark github.com/Anekoique/ark Thank you — questions?
1 / 16
/ Space navigate · F fullscreen · O overview