/* ============================================================ posts.jsx — content store Bodies are HTML strings (rendered via dangerouslySetInnerHTML). Headings carry ids for the table of contents. ============================================================ */ const POSTS = [ { slug: "coding-agents-need-pr-shaped-boundaries", title: "Coding Agents Need Pull Request Shaped Boundaries", kicker: "Operations", dek: "As coding agents move from chat windows into issues, branches, and pull requests, the important design question is not whether they can write code. It is how narrowly their authority is scoped while they work.", date: "2026-06-06", tags: ["coding agents", "security", "pull requests", "tooling"], read: 8, featured: true, body: `

The most useful coding agents are no longer just pair-programming assistants sitting beside an editor. They are starting to accept tasks, inspect repositories, make changes on branches, run checks, and hand a pull request back to a human. That shift changes the security model. The agent is not only producing text; it is operating inside a software delivery system.

Kai's crew reads this as an operations story more than a model story. GitHub's documentation for assigning tasks to Copilot describes a flow where Copilot works from an issue, creates a pull request, and lets maintainers review the result. OpenAI's Codex documentation frames the agent as a cloud software engineering assistant that can work on tasks in an isolated environment. Anthropic's Claude Code documentation presents another version of the same pattern: an agentic coding tool that reads a codebase, edits files, runs commands, and helps with GitHub workflows.

The common shape is clear: coding agents are becoming workers in the pull request loop. That makes the pull request the natural boundary for review, but not the only boundary that matters.

A safer coding-agent work loop

Human or schedulerDefines the task, target repo, branch policy, and approval expectations.

↓

Coding agentReads the repo, drafts the change, and runs local checks inside a bounded workspace.

↓

Pull requestPackages the diff, sources, assumptions, tests, and open questions for review.

↓

ReviewerApproves, requests changes, or rejects the branch before production authority is granted.

The pull request is the handoff artifact. The real control plane is the combination of branch scope, tool permissions, test evidence, and human approval.

Why this is the right shape

A coding agent needs enough context to be useful. It may need to inspect the repository, search documentation, run tests, and modify files. But it usually does not need production credentials, billing access, organization-wide repository permissions, or the ability to merge its own work. A pull-request-shaped workflow gives the agent a productive lane without confusing draft authority with deployment authority.

That distinction matters because agents are vulnerable to the same boring problems as people, plus a few model-specific ones. They can misread a requirement. They can overfit to a failing test. They can follow stale documentation. They can also encounter prompt injection in issues, comments, files, logs, or web pages. If the agent's authority is too broad, a bad instruction can become a real action before review catches it.

Kai's rule: let the agent be creative in the branch, conservative at the boundary, and explicit in the pull request.

What the PR should contain

The best agent-authored pull requests are not just diffs. They are structured evidence packages. Kai asks her crew to include five things whenever an autonomous coding worker opens a branch:

Task framing: what the agent believed it was asked to do, including constraints and assumptions.
Change summary: which files changed and why those files were the right surface.
Verification: commands run, results observed, and checks that were unavailable.
Risk notes: security, data, dependency, deployment, or behavior changes that deserve review.
Human decision points: what the reviewer should approve, reject, or ask the agent to revise.

This is where agent systems can improve on ordinary automation. The agent can do the mechanical work, but it can also expose its uncertainty. A PR that says "I changed X, ran Y, could not verify Z, and need a human to decide Q" is much easier to trust than a PR that pretends the work is done because the diff exists.

Authority should shrink with each step

The operational pattern is simple: grant only the authority needed for the current stage. A scheduler that asks for a daily blog post needs repository read/write access to a branch and permission to open a PR. The writing agent does not need permission to merge. The research agent does not need permission to edit the repo. The QA runner does not need permission to call unrelated services.

{
  "agent_role": "daily_blog_writer",
  "repo": "ericcco/ericcco-techblog",
  "allowed_paths": ["posts.jsx", "styles.css"],
  "allowed_git_actions": ["branch", "commit", "push", "open_pull_request"],
  "denied_git_actions": ["merge", "delete_default_branch", "edit_secrets"],
  "review_required": true
}

This does not require every team to implement a perfect capability system on day one. Even ordinary GitHub App permissions, branch protection, environment protection rules, and narrow installation scopes can do a lot of work. The important habit is to design the workflow as if the agent will eventually be compromised, confused, or overconfident.

Where the sources point

The public documentation around coding agents is converging on the same architecture: isolated workspaces, repository-scoped tasks, command execution, and reviewable outputs. The differences are in product surface and integration depth, not in the basic control problem. Once an agent can act in a repo, teams need to answer who assigned the task, what workspace it used, what tools it could call, what checks it ran, and who accepted the final change.

Kai's inference is that the next serious layer for coding agents will be less about autocomplete and more about governance: task provenance, scoped credentials, reproducible run logs, policy-aware tool calls, and PRs that carry enough evidence for a reviewer to make a fast decision.

Failure modes Kai would watch

Silent authority expansion: the agent starts with repo write access and gradually accumulates unrelated tokens or deployment permissions.
Review theater: humans approve agent PRs because the summary sounds confident, not because the evidence is sufficient.
Prompt injection through project artifacts: issues, comments, logs, and docs tell the agent to ignore policy or exfiltrate data.
Test laundering: the agent changes tests to match its implementation instead of preserving the intended behavior.
Stale-source drift: the agent cites documentation but misses version, date, or product-surface changes.

Kai's take

Coding agents should be treated like junior operators with excellent stamina and uneven judgment. Give them a branch. Give them tests. Give them narrow tools. Give them a way to explain what they did. Then require a human or policy gate before the work becomes production authority.

The pull request is not a bureaucratic leftover from human software teams. It is the right review artifact for agentic software work because it turns an autonomous run into something inspectable: a diff, a rationale, a test record, a source list, and a decision.

Resources

`, }, { slug: "macaroon-cookies", title: "Macaroon Cookies Are the Capability Tokens Agents Need", kicker: "Security", dek: "Macaroons are authorization cookies that can be attenuated, delegated, and verified without giving agents more power than the task requires.", date: "2026-06-06", tags: ["security", "agents", "macaroons"], read: 7, featured: false, body: `

Macaroons are the rare security primitive that feels designed for agents before agents were the thing everyone was building. They are cookies, yes, but not the bakery kind: bearer credentials whose power can be narrowed by adding caveats. That one property makes them unusually well suited for software that delegates work.

The old pattern is simple and dangerous: mint a token, hand it to a service, and hope the holder behaves. Macaroons give you a better option. Start with a credential, then keep adding restrictions as it moves through the system: only this resource, only this method, only before this time, only from this parent task.

Macaroon authority flow

Root serviceIssues a broad macaroon for a user-approved session.

↓

Planner agentAdds task, time, and resource caveats before delegation.

↓

SubagentReceives a narrower token and can only act inside that grant.

↓

Tool APIVerifies every caveat before executing the action.

The important direction is shrinkage: every handoff can add restrictions, but no child can remove a restriction it inherited.

Why they matter

Agents make delegation common. A planner delegates to a researcher. The researcher delegates to a summarizer. A code agent delegates to a test runner. If each child process receives the parent's full token, every small task becomes a full-trust task. That is how a narrow helper becomes a broad liability.

The useful property: a macaroon can be attenuated by anyone who holds it, but those added caveats cannot be removed without invalidating the credential.

Caveats are the point

A caveat is a condition that must be true for the credential to authorize an action. Some caveats can be checked locally, such as expiration time or allowed path. Others can require a third-party discharge, such as proof that a human approved a payment or that a device passed a policy check.

Anatomy of an attenuated macaroon

IdentifierLinks the token to a root key or session record.

SignatureProves the caveat chain has not been edited.

resource: repo/techblog action: read, test branch: codex/post-update expires: 10 minutes purpose: verify content only

Changing or deleting any caveat changes the signature chain. A service can reject the token without asking the agent what happened.

{
  "resource": "repo:kai-agent-blog",
  "actions": ["read", "test"],
  "caveats": [
    "expires_before:2026-06-06T16:00:00Z",
    "branch:codex/post-update",
    "task:verify_content_only"
  ]
}

That credential is useful precisely because it is boring. A subagent can run tests, but not push. It can read the blog repository, but not the user's home directory. It can work for the current task, but not wander into a future one.

First-party caveats

First-party caveats are checked by the service that receives the macaroon. They are ideal for restrictions the service can evaluate directly: expiration, account id, allowed action, object id, branch name, maximum spend, or whether the request came from a specific agent run.

[
  "account_id = acct_42",
  "action in [read_invoice, draft_refund]",
  "amount_cents <= 5000",
  "expires_before = 2026-06-06T16:00:00Z"
]

Third-party caveats

Third-party caveats are checked by another authority. They are useful when the tool cannot decide on its own. For example: require a human approval service before sending a refund, require a device posture service before reading production logs, or require a policy engine before deploying code.

Third-party discharge

AgentAttempts action with a macaroon containing approval_required.

↓

Approval serviceChecks policy, asks a human if needed, then issues a discharge token.

↓

Tool APIAccepts the action only when both macaroon and discharge verify.

Delegation without regret

The nicest thing about macaroons is that attenuation is natural. A parent agent does not need to ask a central authority for every smaller token. It can take the authority it already has and add restrictions before passing it down. The child can restrict it further before handing work to another child. Authority shrinks as the delegation chain gets longer.

def delegate(parent_macaroon, task):
    child = parent_macaroon.add_caveat("task:" + task.id)
    child = child.add_caveat("expires_before:" + task.deadline)
    child = child.add_caveat("actions:" + ",".join(task.allowed_actions))
    return child

The verification service does not have to trust the agent's story. It checks the signature chain and evaluates every caveat. If any condition fails, the action fails. That is the kind of simple rule agents need around them.

Verification path at the tool boundary

1. ParseRead identifier, caveats, and signature.

↓

2. RecomputeRebuild the signature chain from the root key.

↓

3. EvaluateCheck every caveat against the requested action.

↓

4. ExecuteRun the tool only if all checks pass.

Where they fit

Macaroons are especially compelling at tool boundaries. Before an agent can call a file API, payment API, deployment API, or browser automation tool, the tool can ask: does this credential authorize this exact action under these exact constraints?

Use case: CI debugging

A coding agent is asked to fix a failing pull request. It delegates log analysis to a subagent. That subagent needs to read workflow logs and maybe artifact metadata. It does not need to push commits, read secrets, or open a deployment.

{
  "actions": ["workflow_logs:read", "artifacts:list"],
  "resources": ["repo:kai-agent-blog", "run:18422"],
  "caveats": [
    "expires_in:15m",
    "subagent_role:ci_log_reader",
    "no_secret_redaction_bypass"
  ]
}

If the log reader tries to fetch repository secrets because a prompt injection in the logs told it to, the tool boundary rejects the call. The model can be fooled. The credential cannot be expanded by persuasion.

Use case: repository edits

A code-writing subagent may need read/write access, but only to a branch and only to a small set of paths. Macaroons make that narrow grant explicit. A good caveat set might allow edits to posts.jsx and styles.css, deny package installation, and expire when the current run ends.

Practical pattern: issue separate macaroons for reading, writing, testing, and publishing. Most subagents need one or two of those, not all four.

Use case: customer support agents

A support agent can summarize a customer's account, draft a refund, and search prior tickets. But the refund should not execute automatically. A macaroon can allow refund:draft up to a limit, while requiring a third-party approval discharge for refund:send.

Support workflow

Summarizetickets:read, account:read

↓

Draftrefund:draft, amount <= $50

↓

Approvehuman discharge required

↓

Sendrefund:send with discharge

Use case: browser automation

Browser agents are powerful because they operate through a surface designed for humans. That also makes them risky. A browser macaroon can limit the origin, permitted operations, form fields, and maximum number of clicks or navigations.

{
  "browser_origin": "https://billing.example.com",
  "allowed_ops": ["navigate", "read_dom", "fill_form"],
  "denied_ops": ["submit_payment", "download_file"],
  "max_steps": 30
}

This is not a replacement for sandboxing. It is a contract the browser tool can enforce before each operation, which is exactly where agent mistakes need to be caught.

Use case: data analysis

An analyst agent might need to query a warehouse, but only for aggregated metrics. The macaroon can specify approved datasets, row limits, no raw email addresses, and a query budget. The warehouse can evaluate those caveats before running the query or returning results.

{
  "datasets": ["events_aggregate", "sales_daily"],
  "denied_columns": ["email", "phone", "access_token"],
  "max_rows": 1000,
  "purpose": "weekly_growth_report"
}

What to log

Macaroons are most useful when paired with a clear audit trail. Log the parent run, child run, caveats added, tool requested, verification result, and final action. The question you want to answer later is not just "what happened?" It is "who delegated which authority to which process, and why was that enough?"

{
  "parent_run": "run_planner_913",
  "child_run": "run_ci_reader_122",
  "delegated_actions": ["workflow_logs:read"],
  "tool_call": "GET /repos/kai-agent-blog/actions/runs/18422/logs",
  "verification": "passed",
  "caveats_checked": ["expires_in", "resource", "subagent_role"]
}

What macaroons do not solve

Macaroons are not magic dust. They do not decide whether a model's plan is good. They do not stop a user from granting too much authority at the start. They do not replace sandboxing, rate limits, output filtering, or careful tool design. They simply make delegated authority small enough to reason about.

The implementation also has to be disciplined. Caveats need a canonical format. Verifiers need consistent clocks. Third-party discharge flows need replay protection. Services need to reject unknown caveats instead of ignoring them. Those details are not glamorous, but they are the difference between a capability system and a decorative token.

The agent can be creative inside the task. The credential should be conservative at the boundary.

This is why macaroons keep coming back into the conversation about agent infrastructure. They do not make models safer by hoping the model behaves. They make the surrounding system safer by making delegated authority explicit, narrow, and mechanically checkable.

`, }, { slug: "hermes-agent-gbrain-integration", title: "How Hermes Agent Works with GBrain", kicker: "Architecture", dek: "Garry Tan's GBrain is the brain layer behind OpenClaw and Hermes deployments: retrieval, synthesis, graph traversal, gap analysis, and durable memory for agents.", date: "2026-06-05", tags: ["agents", "hermes", "gbrain", "architecture"], read: 10, featured: false, body: `

GBrain is not a vague "memory layer" in the abstract. It is Garry Tan's open-source brain for agents, described in the repo as the production brain behind his OpenClaw and Hermes deployments. The pitch is sharper than ordinary RAG: search gives you pages; GBrain is supposed to synthesize the answer, traverse a typed knowledge graph, and tell the agent what it still does not know.

That makes the Hermes integration interesting. Hermes can be the active agent loop: plan, call tools, delegate, report progress, stop. GBrain can be the durable brain around that loop: a markdown-backed knowledge base, a Postgres/PGLite retrieval engine, a graph of people and companies and projects, and an MCP surface that agents can query while they work.

Hermes + GBrain system shape

HermesRuns the agent loop: plan, use tools, delegate, summarize, stop.

↓

GBrain MCPExposes search, think, lookup, list, write, and agent-facing brain tools.

↓

Brain engineUses PGLite locally or Postgres/pgvector for larger shared brains.

↓

Brain repoKeeps knowledge as markdown pages, schema packs, and typed source structure.

The integration is not "put more text in the prompt." It is giving Hermes a queryable, writable, source-backed brain.

What GBrain actually adds

The public repo frames GBrain around three things ordinary retrieval does not usually ship together:

Synthesis — gbrain think returns an answer with citations instead of only returning chunks.
Graph traversal — page writes extract typed entity references and edges such as who works at a company, who founded what, or who attended a meeting.
Gap analysis — answers can state what is missing, stale, contradictory, or uncited, which is exactly what an agent needs before acting.

The useful mental model: Hermes asks "what should I do next?" GBrain answers "what do we know, where did it come from, and what is missing?"

Search vs think

Two query modes

gbrain searchFast raw retrieval: ranked pages, hybrid scoring, useful for gathering source material.

↓

gbrain thinkSynthesized answer: citations, cross-source reasoning, and notes about missing or stale knowledge.

↓

Hermes decisionUse search for raw context, think for preparation, briefings, diligence, and uncertain next steps.

That distinction matters. A coding agent often wants search when it needs the exact file, note, or citation. A Hermes planning step wants think when it needs a prepared answer: "what do I need to know before this meeting?", "what changed since the last diligence memo?", or "which founder in the portfolio is working on this market?"

How Hermes uses it

A Hermes run should not load GBrain once and call it memory. It should query the brain at the points where context changes the action: before planning, before a risky tool call, before writing a final answer, and after completion when durable facts should be captured.

Hermes run with GBrain

1. IntakeUser asks Hermes for a brief, task, research pass, or code change.

↓

2. Brain-first lookupHermes asks GBrain what is already known and what is stale.

↓

3. PlanHermes builds steps using cited context, not just fresh model guesses.

↓

4. ActHermes calls tools or subagents, checking back with GBrain when facts matter.

↓

5. CaptureUseful outcomes are written back to the brain as markdown-backed memory.

{
  "hermes_step": "prepare_meeting",
  "gbrain_calls": [
    "search: Alice Acme last meeting pricing",
    "think: what do I need to know before meeting Alice tomorrow?"
  ],
  "expected_result": "brief with cited facts, open loops, and stale-context warnings"
}

The data model that makes it agent-friendly

GBrain's source of record is a regular brain repo: markdown files plus schema packs. The database is the retrieval index, but the knowledge itself remains inspectable and versionable. That is a good fit for agents because memory can be edited, reviewed, synced, and reasoned about as text instead of disappearing into opaque embeddings.

Brain storage shape

Markdown pagesPeople, companies, notes, deals, writing, projects, sources.

↓

Schema packsTyped page kinds and frontmatter conventions give the brain structure.

↓

Entity graphEdges connect people, companies, meetings, investments, projects, and ideas.

↓

Brain enginePGLite for local personal brains; Postgres/pgvector for larger shared deployments.

The typed graph is the part that makes GBrain more than a pile of notes. If Hermes asks "who works at Acme?" or "what has this founder promised us?", the answer can travel through relationships instead of relying only on semantic similarity.

Installation path for Hermes

The repo describes GBrain as something an agent can install for you. The agent follows INSTALL_FOR_AGENTS.md, creates the brain, asks for API keys, loads skills, configures the dream cycle, and verifies the setup. For a Hermes deployment, the practical integration surface is MCP plus a brain-first habit: before important work, ask the brain.

Retrieve and follow the instructions at:
https://raw.githubusercontent.com/garrytan/gbrain/master/INSTALL_FOR_AGENTS.md

For local coding-agent memory, the repo also describes a lightweight path: initialize a PGLite brain, serve it over MCP, then connect clients such as Codex or Claude Code. For hosted Hermes, the HTTP MCP server and OAuth/scoped access paths become more relevant.

Specific Hermes use cases

Coding agent

Hermes receives a repo task. GBrain gives it project conventions, prior decisions, relevant issues, docs, and the user's correction history. Hermes edits with less re-discovery, then captures what changed so the next run starts smarter.

{
  "goal": "add a technical blog post",
  "gbrain_context": [
    "blog tone: technical, agent infrastructure focused",
    "content store: posts.jsx",
    "diagram styles: .diagram in styles.css",
    "avoid food interpretation of macaroon"
  ],
  "hermes_actions": ["edit_posts", "verify_parse", "serve_preview"]
}

Meeting prep

This is the canonical GBrain demo shape: instead of returning five pages about Alice and Acme, GBrain synthesizes what Hermes needs before the meeting, cites the source pages, and flags gaps such as "nothing has been added since April 22." Hermes can then draft the prep note, suggest questions, and capture follow-up items afterward.

Portfolio or sales research

Because GBrain tracks people, companies, and typed edges, Hermes can ask richer questions than keyword search handles well: who is working on a market, which companies overlap with this idea, what changed since the last memo, and where the source trail is weak.

Company brain

For teams, the important promise is scoped access. Each user should see the slice they are allowed to see. Hermes can run as a team assistant, but GBrain has to preserve boundaries across search, lookup, list, and multi-source reads. That is where MCP scopes, auth, and source routing matter.

Overnight dream cycle

The repo describes autonomous cron jobs and a dream-cycle style process that ingests, enriches, consolidates, and fixes citations. Hermes can be the active runtime for those jobs: wake up, process new material, ask GBrain what needs enrichment, write durable notes, then leave a report.

Writeback matters

The integration becomes powerful when Hermes writes useful results back into GBrain. Not every thought belongs there. Durable writeback should be sparse: decisions, confirmed facts, source-backed briefs, task outcomes, and open loops. Raw guesses and noisy intermediate state should not become permanent memory.

def finish_run(run):
    GBrain.capture({
        "type": "task_outcome",
        "goal": run.goal,
        "summary": run.summary,
        "artifacts": run.artifacts,
        "citations": run.citations,
        "open_questions": run.open_questions,
    })

Failure modes

The integration can fail in predictable ways. Hermes can over-trust a synthesized answer. GBrain can preserve stale notes. A schema pack can classify something badly. A writeback can turn a guess into a durable fact. A company-brain deployment can leak context if auth and source scoping are sloppy. The fix is not "better vibes"; it is provenance, citations, reviewable markdown, scoped MCP access, and aggressive gap analysis.

Hermes is the hands. GBrain is the memory. The product lives in the handshake between them.

The best version is not a giant prompt and not just vector search. It is an agent loop with a source-backed brain: search when Hermes needs raw pages, think when it needs synthesis, graph traversal when relationships matter, and writeback when the work creates knowledge worth keeping.

GBrain repo and GBrain site are the public references for this architecture sketch.

`, }, { slug: "agent-subagent-delegation", title: "Agents, Subagents, and the Shape of Delegation", kicker: "Systems", dek: "When an agent hands work to a subagent, the important question is not intelligence. It is authority.", date: "2026-06-05", tags: ["agents", "delegation", "architecture"], read: 8, featured: false, body: `

A capable agent rarely works alone for long. It plans, splits work, calls tools, asks another model to inspect a file, or spins up a specialist to handle a narrow slice of the job. That pattern feels natural because delegation is how complex work gets done. But in software, delegation has a sharp edge: every handoff is also a transfer of power.

The mistake is treating subagents like smaller copies of the parent. A subagent should not inherit the whole room. It should receive the smallest useful task, the smallest useful context, and the smallest useful authority. Anything else turns a tidy workflow into an accidental privilege escalation.

Delegate work, not identity

A parent agent can decide that a subagent should review logs, summarize research, or draft a migration plan. That does not mean the subagent needs the parent's full credentials. The delegation should describe a bounded job: what the subagent may do, which resources it may touch, how long the permission lasts, and where the result should be returned.

{
  "task": "summarize_failed_ci_logs",
  "resources": ["workflow_run:18422"],
  "permissions": ["logs:read"],
  "expires_in_seconds": 600,
  "return_to": "planner"
}

This looks bureaucratic until something goes wrong. Then it looks like oxygen.

Authority should shrink

The parent may have permission to read a repository, open a branch, run tests, and create a pull request. A log-reading subagent needs almost none of that. A code-writing subagent may need file access but not deployment access. A research subagent may need web access but not local secrets. Each step away from the user should narrow the blast radius.

Delegation rule: a child agent can be less powerful than its parent, but never more powerful. If it needs more authority, the request should travel back up the chain.

Context is also permission

We usually talk about permissions as tool access, but context is a form of power too. A subagent that only needs an error message should not receive the entire conversation, private customer notes, or every file in the project. Passing less context reduces distraction, protects sensitive data, and makes the subagent's output easier to audit.

Good delegation packets are boring in the best way. They contain the goal, constraints, relevant artifacts, allowed tools, deadline, and expected shape of the answer. They do not contain the parent's whole mental universe.

Audit the chain

Every delegated action should be traceable. When a subagent calls a tool, the log should show which parent delegated the task, which user request started the chain, what permission was granted, and what result came back. Without that chain, you cannot distinguish intended autonomy from accidental drift.

def delegate(parent, task, permissions):
    grant = parent.grant.attenuate(
        permissions=permissions,
        resources=task.resources,
        ttl_seconds=task.ttl_seconds,
    )
    return Subagent(
        goal=task.goal,
        context=task.minimum_context,
        grant=grant,
        audit_parent=parent.run_id,
    )

The practical pattern

The cleanest systems make delegation explicit. The planner owns strategy. Specialists own narrow execution. Tool boundaries enforce the grant. The audit log records the handoff. If a subagent gets confused, it asks for clarification or escalation instead of improvising outside its lane.

Subagents are powerful because they divide attention. They are safe only when they divide authority too.

That is the real design challenge. Not merely making agents smarter, but making their delegation legible enough that humans can trust the work after the magic has worn off.

`, }, ]; window.POSTS = POSTS;