Agent harnesses · a journey up the stack

Ark & ArkOS

Building reliable coding agents — from harness to operating system.

① Agents & harnesses → ② Ark → ③ ArkOS → ④ Future agent model

Agenda

Four steps up the stack

1 · Agents & harnesses: what an agent is · the 7-part harness framework

2 · Ark: a workflow harness · tiers · lifecycle · specs

3 · ArkOS: workflow as a service · self-evolving substrate

4 · The future agent model: how agents will work · autonomous, isolated, at scale

01

Agents & harnesses

A model in a loop is the engine. The harness is everything that makes it ship.

① Agents & harnesses

What is an agent?

An agent = a loop (perceive → reason → act) wired to a model, tools, memory, and planning — acting on its environment until the goal is met.
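A minimal sketch of that loop, assuming a stubbed model and a single tool. The `Agent` class, `stub_model`, and the `add` tool are hypothetical names for illustration, not any real framework's API; planning is left out to keep the loop visible.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """Hypothetical minimal agent: a loop wired to a model, tools, and memory."""
    # The "model" maps (observation, memory) to a (tool name, argument) decision.
    model: Callable[[str, list[str]], tuple[str, str]]
    tools: dict[str, Callable[[str], str]]
    memory: list[str] = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 10) -> str:
        observation = goal                                    # perceive: start from the goal
        for _ in range(max_steps):
            tool, arg = self.model(observation, self.memory)  # reason
            if tool == "done":                                # the model says the goal is met
                return arg
            observation = self.tools[tool](arg)               # act, then perceive the result
            self.memory.append(observation)                   # remember what happened
        raise RuntimeError("step budget exhausted before the goal was met")


def stub_model(observation: str, memory: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM: call the adder once, then finish with what it saw."""
    if not memory:
        return ("add", "2 3")
    return ("done", memory[-1])


def add(arg: str) -> str:
    return str(sum(int(x) for x in arg.split()))


agent = Agent(model=stub_model, tools={"add": add})
result = agent.run("add 2 and 3")
```

Everything outside `run` is what a harness has to supply: the tools, the memory, and the step budget.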

① Agents & harnesses

The model is only the engine

📦 Framework: a library you import. LangChain · Agents SDK · CrewAI

⚙ Harness: an app you install & run. Claude Code · Codex · Aider · OpenHands · Ark

Framework = you import it. Harness = you install & run it.

① Agents & harnesses · the framework

Inside the harness

Four layers of runtime, three cross-cutting concerns — the Agent Harness Engineering survey's seven dimensions, in one map.

① Agents & harnesses · in detail

What each dimension does

Dimension	What it owns	Key techniques
🔁 Orchestration	how the agent decides & coordinates work	loop · subagents · skill composition
🧠 Context & state	what fits in the window vs. what persists	compaction · memory injection · planning-as-file
📦 Substrates	where code runs, safely & resumably	microVM · snapshot/restore · egress policy
🔌 Protocols	how the agent talks to tools & repos	MCP · AGENTS.md · git-native contracts
📊 Evaluation	whether a change is actually an improvement	SWE-bench · trajectory audit · RL envs
📡 Observability	seeing what the agent did & what it cost	tracing · cost attribution · replay
🛡 Guardrails	keeping autonomy safe & accountable	approval gates · policy · fail-closed

① Agents & harnesses · the thesis

The harness is load-bearing

46 → 80: same model, two harnesses, one benchmark

+26%: LangChain, harness change only

23 → 45: basic vs. tuned scaffold, SWE-bench

Same model. Only the wrapper changed.

02

Ark

An opinionated workflow harness — the seven dimensions, with discipline baked in.

② Ark · the gap

Today's harnesses have no process

🌀 No structure

no PRD, no plan, no record of why

👤 No second opinion

the agent grades its own work

🧩 No memory

knowledge evaporates between sessions

② Ark · positioning

A workflow harness above the agent

Ark is local, multi-platform, workflow-opinionated — the meta-layer that turns any coding agent into a software-engineering process.

② Ark · the framework

Ark on the seven dimensions

Dimension	Ark's answer
🔁 Orchestration	tiered lifecycle + researcher / reviewer / verifier subagents
🧠 Context & state	structured artifacts on disk · `ark context` projection
📦 Substrates	worktree-per-task · `.ark.db` snapshots
🔌 Protocols	AGENTS.md · slash commands · per-platform templates
📊 Evaluation	VERIFY gate · PLAN → REVIEW step
🛡 Guardrails	state machine · user-staged atomic commit

Same seven axes — Ark's take is opinionated workflow discipline.

② Ark · the idea

Two pieces, working as one

⌘ ark-cli

the mechanism — a Rust binary you install & run

◇ ark-workflow

the discipline — a tiered lifecycle in workflow.md

Install once → every transition runs through an ark command.

② Ark · in practice

What it looks like

~/my-project — ark

$ ark init                          # scaffold .ark/ + agent integrations
✓ initialized ark · claude-code, codex, opencode

$ ark context                       # orient: git + tasks + specs
› /ark:design --deep "refactor auth layer"
  DESIGN → PLAN → REVIEW → EXECUTE → VERIFY → COMMIT
✓ one atomic commit: work + task.toml + SPEC + journal

Two layers of commands: the public lifecycle (init / context / …) and what you type inside the agent (/ark:*).

② Ark · CLI

One binary, two layers

⌘ Public — manage Ark's footprint

init · context · load · unload · upgrade · archive · cleanup

◇ Hidden `ark agent` — the workflow engine

task

new · plan · review · execute · verify · commit · archive

spec

extract · import · register

workspace

record · developer

You type the slash commands; they call ark agent. The state machine guards every transition.

② Ark · tiers

Pick the smallest tier that fits

⚡ Quick

/ark:quick · reversible

PRD

◆ Standard

/ark:design · feature scope

PRD · PLAN · VERIFY

◈ Deep

/ark:design --deep · architectural

PRD · PLAN → REVIEW · VERIFY · SPEC

🔬 Research

/ark:research · corpus = deliverable

PRD · research/

When in doubt, pick lower — promotion is cheap.
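One way to picture why promotion is cheap: a larger tier only adds artifacts to what the smaller one already wrote. The sketch below copies the artifact lists from the tier cards above; the `TIER_ARTIFACTS` table and `promotion_delta` helper are hypothetical, not part of ark-cli.

```python
# Artifacts per tier, as listed on the tier cards above.
TIER_ARTIFACTS = {
    "quick": ["PRD"],
    "standard": ["PRD", "PLAN", "VERIFY"],
    "deep": ["PRD", "PLAN", "REVIEW", "VERIFY", "SPEC"],
    "research": ["PRD", "research/"],
}


def promotion_delta(current: str, target: str) -> list[str]:
    """Artifacts still to be written when a task moves from `current` to `target`."""
    have = set(TIER_ARTIFACTS[current])
    return [a for a in TIER_ARTIFACTS[target] if a not in have]


quick_to_standard = promotion_delta("quick", "standard")
standard_to_deep = promotion_delta("standard", "deep")
```

Under this reading, starting at Quick and promoting later discards nothing: the PRD carries over and only the missing artifacts are added.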

② Ark · workflow

One task, end to end

Each phase is an ark agent command. You type a slash command; it drives the state machine.
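A rough sketch of a state machine that guards transitions, assuming the phase order DESIGN → PLAN → REVIEW → EXECUTE → VERIFY → COMMIT and that REVIEW applies only to deep tasks. The `Task` class and `IllegalTransition` error are hypothetical; Ark's actual engine may model this differently.

```python
PHASES = ["DESIGN", "PLAN", "REVIEW", "EXECUTE", "VERIFY", "COMMIT"]


class IllegalTransition(Exception):
    """Raised when a command asks for a phase the task is not ready for."""


class Task:
    def __init__(self, deep: bool = False):
        self.deep = deep          # assumption: only deep tasks pass through REVIEW
        self.phase = "DESIGN"

    def next_phase(self) -> str:
        i = PHASES.index(self.phase)
        if i + 1 == len(PHASES):
            raise IllegalTransition("task is already committed")
        nxt = PHASES[i + 1]
        if nxt == "REVIEW" and not self.deep:
            nxt = "EXECUTE"       # non-deep tasks skip the review pass
        return nxt

    def advance(self, to: str) -> None:
        expected = self.next_phase()
        if to != expected:
            raise IllegalTransition(
                f"{self.phase} -> {to} refused; expected {expected}"
            )
        self.phase = to
```

The point of the guard is that skipping VERIFY is not a matter of discipline: `advance("COMMIT")` from EXECUTE simply fails.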

② Ark · lifecycle · decide

Decide what, then how

🟢

DESIGN

write the PRD — What · Why · Outcome · related specs

ark agent task new ⛓ fields filled

🔵

PLAN

elaborate how — goals G-N · architecture · API · validation

ark agent task plan ✓ every G-N → a test

🟣

REVIEW deep · single pass

independent reviewer files R-NNN — fold findings into PLAN, then execute

ark agent task review you pick the reviewer
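The PLAN gate ("every G-N → a test") can be sketched as a simple coverage check. This assumes goals appear in the plan text as `G-1`, `G-2`, and so on, and that the mapping from goals to tests is available as a dictionary; `unmapped_goals` and the sample test names are hypothetical.

```python
import re


def unmapped_goals(plan_text: str, goal_to_tests: dict[str, list[str]]) -> list[str]:
    """Return the goal ids found in the plan that have no test mapped to them."""
    goals = sorted(
        set(re.findall(r"\bG-\d+\b", plan_text)),
        key=lambda g: int(g[2:]),          # numeric order, so G-2 sorts before G-10
    )
    return [g for g in goals if not goal_to_tests.get(g)]


plan = "G-1 rotate tokens\nG-2 keep sessions alive\nG-10 audit log"
mapping = {"G-1": ["test_rotation"], "G-2": []}
missing = unmapped_goals(plan, mapping)
```

A gate built this way would pass only when `missing` is empty.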

② Ark · lifecycle · ship

Build · audit · close

🟠

EXECUTE

implement the PLAN, following project & feature specs

ark agent task execute ✓ tests · lints · build

🟡

VERIFY

audit shipped code — spec compliance · plan fidelity · drift

ark agent task verify ⛓ nothing PENDING

🔶

COMMIT

you stage, then one atomic commit closes the task

ark agent task commit → next slide

② Ark · the keystone

⚛ `commit` — five things, one commit

② Ark · subagents

The author never grades itself

🔎 ark-researcher · DESIGN · PLAN: gathers what the session lacks → research/

⚖ ark-reviewer · REVIEW · deep: judges the PLAN → verdict + R-NNN

✓ ark-verifier · VERIFY · final gate: audits shipped code → V-NNN

author → hands off → independent agent you choose

② Ark · memory

Memory ≠ context

💨 Context: in-prompt · ephemeral. Gone when the session ends · costs tokens every turn.

💾 Memory: on disk · durable. Survives sessions, agents, platforms · loaded on demand.

Ark's bet: state lives in structured files — PRD · PLAN · SPEC · VERIFY · journals. Available, not resident.
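"Available, not resident" can be sketched as an index that costs a few tokens plus a loader that reads a file only when asked. The `DiskMemory` class and the file names `PRD.md` and `PLAN.md` are assumptions for illustration, not Ark's actual layout.

```python
import tempfile
from pathlib import Path


class DiskMemory:
    """Hypothetical durable memory: artifacts live in files, not in the prompt."""

    def __init__(self, root: Path):
        self.root = root

    def index(self) -> list[str]:
        # Cheap to keep in the prompt: names only, no contents.
        return sorted(p.name for p in self.root.iterdir() if p.is_file())

    def load(self, name: str) -> str:
        # Paid for only when the agent actually needs the artifact.
        return (self.root / name).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "PRD.md").write_text("What: refactor auth layer\n", encoding="utf-8")
    (root / "PLAN.md").write_text("G-1: rotate tokens\n", encoding="utf-8")

    memory = DiskMemory(root)
    available = memory.index()
    prompt = "Artifacts on disk: " + ", ".join(available)   # resident
    plan = memory.load("PLAN.md")                            # on demand
```

The prompt carries only the index; the plan's contents enter the context when a phase asks for them.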

② Ark · memory · specs

Specs — the durable contract

Anchored, versioned, drift-detected — specs are how knowledge compounds instead of evaporating between sessions.
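One possible reading of "anchored" and "drift-detected", sketched under loud assumptions: a spec records a fingerprint of each source file it describes, and drift means a file's current fingerprint no longer matches. The spec shape, `fingerprint`, and `drifted` are hypothetical; the slides do not say how Ark detects drift.

```python
import hashlib


def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def drifted(anchors: dict[str, str], sources: dict[str, str]) -> list[str]:
    """Paths whose current content no longer matches the recorded fingerprint.

    anchors: path -> fingerprint recorded when the spec was last versioned
    sources: path -> current file content (a missing path counts as drift)
    """
    return sorted(
        path
        for path, recorded in anchors.items()
        if fingerprint(sources.get(path, "")) != recorded
    )


original = {"auth.rs": "fn login() {}", "token.rs": "fn rotate() {}"}
anchors = {path: fingerprint(text) for path, text in original.items()}
changed = dict(original, **{"token.rs": "fn rotate(force: bool) {}"})
stale = drifted(anchors, changed)
```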

② Ark · memory · the rest

The rest of the durable state

✎ Journals · .ark/workspace/<dev>/ · a session block written on every commit

⑂ Worktrees · one branch / task · parallel tasks, no collisions

⤓ Snapshots · unload / load · freeze & restore the whole footprint

All on disk, versioned with the repo — the project remembers.

② Ark · architecture

Inside the binary

The CLI parses & prints; the core does everything. A platform registry drives all three agents from one code path.
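A sketch of the registry idea, assuming each platform is just a data record and installation iterates the registry instead of branching per agent. The `Platform` record and the `example/...` directories are placeholders, not the real locations any of these agents use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    command_dir: str   # placeholder path; real locations are platform-specific


# One entry per supported agent; adding a platform means adding a row.
REGISTRY = {
    p.name: p
    for p in [
        Platform("claude-code", "example/claude-code/commands"),
        Platform("codex", "example/codex/commands"),
        Platform("opencode", "example/opencode/commands"),
    ]
}


def install_paths(commands: list[str]) -> list[str]:
    """Single code path: the same loop serves every registered platform."""
    return [
        f"{platform.command_dir}/{command}.md"
        for platform in REGISTRY.values()
        for command in commands
    ]


paths = install_paths(["design", "quick"])
```

Nothing in `install_paths` names a platform, which is what lets one code path drive all three agents.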

03

ArkOS

The harness becomes a substrate — workflow as a service, for agents instead of humans.

③ ArkOS · RFC 001

Next: ArkOS

What Ark does for humans,
ArkOS does for agents.

⌘ Ark: human · gated. A CLI you drive.

⚖ Siblings, not a stack.

◈ ArkOS: agent · autonomous. Workflow as a service.

Services → lifecycle · task tree · memory · SPEC storage · event log · grounding hooks

③ ArkOS · how it evolves

Swap the agent form. The substrate stays.

Stage 1: bootstrap on today's runtimes (Ark's primitives re-exposed to agents). Stage 2: grow a native runtime — the agent form swaps, the substrate is constant.

③ ArkOS · self-evolution

Self-evolving — but grounded

Like the Linux kernel, it doesn't grade itself: the signal is whether the programs running on it ship faster.

04

The future agent model

My vision for how agents will work — autonomous, isolated, and running at scale.

④ The future · the agent model

From one agent to a fleet of sandboxes

🧑‍💻 Today: one agent · your terminal. You drive it, step by step.

→

🛰️ Next: N isolated sandboxes. Each drives its own project, on its own.

Isolation is a stack: worktree → container → microVM / hypervisor. The fleet model takes it all the way down.

④ The future · architecture

Hypervisor below · agent sandboxes above

One hypervisor schedules many VMs; each VM hosts one agent sandbox — a project plus its own ArkOS agent. No sibling-interference: every project is walled off.

④ The future · inside a sandbox

Each sandbox runs itself

Gates from Ark · autonomy & grounding from ArkOS · isolation from the sandbox.

Up the stack

① Agent: a model in a loop; the harness makes it reliable

② Ark: a workflow harness with gates, review, living specs

③ ArkOS: workflow as a service, self-evolving and grounded

④ The fleet: autonomous agents, isolated, at scale

npm install -g @anekoique/ark

github.com/Anekoique/ark

Thank you. Questions?
