Engineering Notes

Continuous AI: What GitHub's Agentic Workflows Actually Change

Thiago Victorino
10 min read

GitHub shipped Agentic Workflows yesterday. The announcement led with markdown-instead-of-YAML authoring. Most coverage followed that thread. This is understandable and also wrong. The authoring format is a convenience. The actual shift is a new category of work entering the CI/CD pipeline: non-deterministic tasks executed by AI agents, running on every push, governed by the same infrastructure that runs your tests.

That distinction --- deterministic versus non-deterministic work --- is the only thing in this announcement that matters at the architectural level.

Deterministic Work Has a Ceiling

CI/CD solved a specific class of problem: work that produces the same output given the same input. Build the code. Run the tests. Check the lint rules. Deploy to staging. These are functions in the mathematical sense. Deterministic, repeatable, auditable. The entire value of CI/CD rests on this property. You trust the pipeline because it behaves identically every time.

But engineering organizations do substantial work that is not deterministic. Triaging an issue requires reading the report, understanding the codebase context, and making a judgment call about severity and routing. Keeping documentation aligned with code requires recognizing when a change is significant enough to warrant a docs update, then writing the update. Investigating a CI failure requires reasoning about what broke, why, and whether the fix is in the code or the test.

These tasks share three properties. They require judgment. They resist automation through traditional scripting. And they are done inconsistently, if they are done at all.

GitHub’s argument is that this class of work --- judgment-requiring, non-deterministic, perpetually under-resourced --- can now be systematized. Not automated in the CI/CD sense. Systematized in the sense that an AI agent can do a reasonable first pass, every time, on a schedule, without someone remembering to do it.
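The distinction is easier to see in miniature. Below is an illustrative sketch, not anything from GitHub's implementation: `model_call` is a hypothetical stand-in for any LLM client, and `deterministic_check` stands in for a conventional CI gate.

```python
# A hypothetical `model_call` stands in for any LLM client here.

def deterministic_check(source: str) -> bool:
    """CI-style work: a pure function of its input, identical every run."""
    return "TODO" not in source

def agent_triage(issue_text: str, model_call) -> str:
    """Continuous-AI-style work: the output is a model's judgment,
    so two runs on the same input may legitimately differ."""
    return model_call(f"Assign a severity (low/medium/high) to: {issue_text}")

# Repeatable by construction:
assert deterministic_check("x = 1") == deterministic_check("x = 1")

# Only as repeatable as the model behind it:
fake_model = lambda prompt: "medium"
severity = agent_triage("App crashes when the config file is empty", fake_model)
```

The first function earns trust through identity: same input, same output, forever. The second earns it, if at all, through review. That difference is the whole category boundary.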

They are calling this “Continuous AI” --- a third column alongside Continuous Integration and Continuous Deployment. The framing is deliberate. CI handles deterministic build-and-test. CD handles deterministic deployment. Continuous AI handles non-deterministic tasks that require judgment.

The Platform Play Hiding in Plain Sight

The announcement supports three AI engines: GitHub Copilot CLI, Claude Code, and OpenAI Codex. This is the detail most coverage treated as a feature list. It is actually the strategic signal.

GitHub is not building a product. It is building a platform.

A product would lock you into Copilot. A platform lets you bring whatever agent you trust most and routes it through GitHub’s infrastructure --- their runners, their security model, their permissions system, their audit log. The agent is interchangeable. The governance layer is not.

This mirrors the pattern we described in the convenience-to-control arc with GitHub Actions. When GitHub becomes the execution environment for your AI agents, you get immediate convenience and long-term dependency. The difference this time is that the dependency is on the governance infrastructure, not just the execution runtime. And governance infrastructure is harder to replace than YAML files.

Organizations evaluating this should understand what they are actually adopting. The agent is a component. The workflow runtime is the commitment. Choosing GitHub Agentic Workflows means choosing GitHub as the governance layer for your AI-assisted development processes. That decision deserves the same scrutiny as any infrastructure commitment --- more, given that the system is in technical preview and the official documentation calls it a “research demonstrator” that “may change significantly.”

The Agent Never Gets Write Access

The security model is the most architecturally interesting part of the announcement, and the least discussed.

GitHub Agentic Workflows implement a five-stage defense-in-depth pipeline. Pre-activation checks verify role permissions and lock file integrity. Input sanitization neutralizes @mentions, filters URLs, and enforces size limits. During execution, the agent operates with read-only access --- all writes are buffered as artifacts, never applied directly. A separate AI-powered threat detection job analyzes the agent’s output before anything is written. Only after passing threat detection do permission-scoped write jobs execute the changes.
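The shape of those stages can be sketched in a few lines. Everything below uses invented names (`ProposedChange`, `passes_threat_detection`) and is a simplified illustration of the pattern, not GitHub's code: the agent returns proposals, an independent gate inspects them, and only a path-scoped writer ever touches the repository.

```python
from dataclasses import dataclass

@dataclass
class ProposedChange:
    """A write the agent wants to make, buffered as an artifact."""
    path: str
    content: str

def run_agent(task: str) -> list:
    """Stage 3: the agent runs read-only; its writes are only proposals."""
    # Stand-in for the real agent loop: propose one docs update.
    return [ProposedChange("docs/CHANGELOG.md", f"Note: {task}")]

def passes_threat_detection(changes) -> bool:
    """Stage 4: an independent job inspects output before any write."""
    suspicious = ("rm -rf", "curl http", "GITHUB_TOKEN")
    return not any(s in c.content for c in changes for s in suspicious)

def apply_changes(repo: dict, changes, allowed_prefixes) -> None:
    """Stage 5: the only component with write access, and it is path-scoped."""
    for c in changes:
        if not c.path.startswith(allowed_prefixes):
            raise PermissionError(c.path)
        repo[c.path] = c.content

repo = {}
proposals = run_agent("document the new retry flag")
if passes_threat_detection(proposals):
    apply_changes(repo, proposals, allowed_prefixes=("docs/",))
```

The structural point is that `run_agent` cannot reach `repo` at all; the write path exists only downstream of the gate, and even there it is scoped.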

Ken Muse, a GitHub Staff DevOps Architect, put the core principle directly: “the agent itself never gets write access.”

This is a meaningful architectural decision. The dominant pattern in AI agent deployment today is to give the agent whatever permissions it needs to complete its task and hope the prompt constrains its behavior. Hope is not a security model. GitHub’s approach treats the agent as an untrusted subprocess --- capable of useful work, but never trusted with direct write access to the repository.

The Agent Workflow Firewall routes all traffic through a Squid proxy with domain allowlisting. Each MCP server runs in an isolated container. The lock file mechanism ensures that the compiled workflow matches the markdown source --- no silent modifications between what a human authored and what actually runs.
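The lock file idea reduces to a digest binding between source and compiled output. A minimal sketch, assuming a content-hash scheme; GitHub's actual lock format is richer and may differ:

```python
import hashlib

def lock_entry(markdown_source: str, compiled_workflow: str) -> dict:
    """Bind the compiled workflow to a digest of its markdown source."""
    return {
        "source_sha256": hashlib.sha256(markdown_source.encode()).hexdigest(),
        "compiled": compiled_workflow,
    }

def source_matches(markdown_source: str, entry: dict) -> bool:
    """Refuse to run when the source no longer matches what was compiled."""
    digest = hashlib.sha256(markdown_source.encode()).hexdigest()
    return digest == entry["source_sha256"]

entry = lock_entry("Triage new issues nightly.", "jobs: { triage: ... }")
```

Any edit to the markdown invalidates the digest, so a modified workflow cannot run under a stale compilation, and a modified compilation cannot masquerade as the authored source.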

This is defense-in-depth applied to AI agent execution. It is also, whether GitHub intended it or not, a reference architecture for how organizations should think about agent permissions broadly. Read-only by default. Writes buffered and reviewed. Threat detection as a separate, independent stage. Every agent deployment in your organization should follow this pattern, regardless of whether it runs on GitHub.

The Maturity Gap You Should Not Ignore

GitHub’s own positioning of this feature is contradictory in a way that matters for planning.

The GitHub Changelog labels it a “technical preview.” The official documentation labels it a “research demonstrator” created by GitHub Next and Microsoft Research, with the explicit disclaimer that it “may change significantly.” These are not the same maturity level. A technical preview implies a product on its way to general availability. A research demonstrator implies an experiment that may or may not become a product.

The collaboration between GitHub Next (GitHub’s research lab) and Microsoft Research (led by Don Syme, the creator of F#, and Peli de Halleux) suggests this is closer to the research end. The ideas are serious. The implementation is real. The commitment to a specific API surface, pricing model, and long-term support is absent.

Peli de Halleux said that the barrier to entry is “basically all the way to almost zero.” That is true for trying it. It is not true for depending on it.

Organizations should experiment with this. Organizations should not build critical workflows on it. The distinction between exploring a research demonstrator and adopting infrastructure is the distinction between learning and liability. GitHub has been transparent about where this sits. The question is whether organizations will read the label or just the headline.

What 1,400 Tests for Eighty Dollars Actually Means

GitHub Next reported generating over 1,400 tests across 45 days for approximately eighty dollars in LLM token costs. This is a concrete data point that deserves both attention and context.

The cost efficiency is real. Generating tests at this scale through human effort would cost orders of magnitude more. For coverage-gap analysis and regression test generation --- the kind of work that engineering teams perpetually defer because it is important but not urgent --- the economics are compelling.
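The unit economics fall straight out of the reported figures:

```python
tests_generated = 1_400   # reported by GitHub Next
total_cost_usd = 80       # approximate LLM token spend
days = 45

cost_per_test = total_cost_usd / tests_generated   # under six cents per test
cost_per_day = total_cost_usd / days
print(f"${cost_per_test:.3f} per test, ${cost_per_day:.2f} per day")
```

A marginal cost of a few cents per test is low enough that the bottleneck stops being generation entirely; as the next section argues, it moves to verification and review.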

But the number of tests generated is not the same as the number of useful tests generated. Test generation is one of the use cases where AI agents perform well because the success criteria are concrete: does the test compile, does it run, does it cover a previously uncovered path. These are measurable, verifiable outcomes. This is exactly the kind of non-deterministic work --- judgment-requiring but objectively evaluable --- where Continuous AI should start.
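That mechanical verifiability can itself be automated. A sketch of such a gate, assuming generated tests are plain Python scripts whose assertions fail loudly; a real pipeline would also measure coverage, which is omitted here:

```python
import pathlib
import subprocess
import sys
import tempfile

def verify_generated_test(test_source: str) -> bool:
    """Mechanical gate for a generated test: it must parse and must pass."""
    try:
        compile(test_source, "<generated>", "exec")  # cheap syntax check
    except SyntaxError:
        return False
    with tempfile.TemporaryDirectory() as d:
        path = pathlib.Path(d, "test_generated.py")
        path.write_text(test_source)
        # Run in a subprocess so a broken test cannot damage this process.
        run = subprocess.run([sys.executable, str(path)], capture_output=True)
    return run.returncode == 0
```

Note what this gate cannot check: whether the test asserts anything meaningful. A passing, compiling, coverage-adding test can still encode a wrong expectation, which is why human review stays in the loop even for the easy category.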

Compare this to the harder use cases in GitHub’s list. Continuous documentation requires judging what changes are significant enough to document. Continuous code simplification requires evaluating whether a proposed refactor actually improves the codebase or just rearranges it. Continuous triage requires understanding organizational context that does not exist in the repository. These are tasks where the agent’s judgment is harder to verify and the cost of wrong outputs is higher.

The eighty-dollar number is not a benchmark for all Continuous AI. It is a benchmark for the easiest category --- and organizations should scale their expectations accordingly.

The Absorption Problem Returns

We have written previously about the absorption problem: the gap between what AI systems can produce and what organizations can responsibly integrate. Code generation velocity has outpaced review capacity. The same dynamic applies here.

Each of the six Continuous AI use cases --- triage, documentation, simplification, testing, quality hygiene, reporting --- produces output that a human must evaluate. An auto-generated triage decision must be reviewed before routing is finalized. An auto-generated documentation update must be verified for accuracy before merging. An auto-generated refactoring PR must be reviewed with the same rigor as a human-authored one.

The risk is that organizations deploy all six use cases simultaneously, generating a volume of AI-produced artifacts that exceeds their capacity to review. The output looks like productivity. Triaged issues, updated docs, new tests, refactoring PRs, health reports --- all appearing on schedule, all requiring human judgment to validate. If the review capacity does not scale with the generation capacity, the organization is not getting Continuous AI. It is getting continuous noise.
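The capacity mismatch is simple arithmetic. Every number below is an illustrative assumption, not a measurement, but the shape of the calculation is the one worth running against your own team:

```python
# Every number here is an illustrative assumption, not a measurement.
artifacts_per_week = {
    "triage": 40, "documentation": 10, "simplification": 5,
    "testing": 30, "quality hygiene": 8, "reporting": 2,
}
minutes_per_review = 12        # average human attention per AI artifact
reviewer_hours_per_week = 6    # review time the team can actually spare

demand_hours = sum(artifacts_per_week.values()) * minutes_per_review / 60
backlog = demand_hours - reviewer_hours_per_week  # unreviewed hours/week

print(f"review demand: {demand_hours:.0f}h/week, "
      f"capacity: {reviewer_hours_per_week}h/week, "
      f"unabsorbed: {backlog:.0f}h/week")
```

Under these assumptions the team falls behind by double-digit hours every week, and the deficit compounds: unreviewed artifacts either block work or get rubber-stamped, which is worse than not generating them.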

Home Assistant is using Agentic Workflows to analyze thousands of issues for trend identification. CNCF is using it for documentation automation and cross-org reporting. These are organizations with the engineering maturity to absorb AI output at scale. Most organizations are not Home Assistant or CNCF.

What This Means for Your Organization

  • The concept of Continuous AI is sound. Non-deterministic, judgment-requiring work that is perpetually under-resourced is a legitimate category for AI agent automation. The framing --- CI for deterministic work, Continuous AI for non-deterministic work --- is a useful mental model regardless of whether you use GitHub’s implementation.

  • The security architecture is a reference pattern. Read-only agent execution, buffered writes, independent threat detection, domain-allowlisted network access. This is how every AI agent deployment should be structured. Study it even if you never use GitHub Agentic Workflows.

  • The maturity level does not support production dependency. A research demonstrator from GitHub Next and Microsoft Research is an experiment, not infrastructure. Experiment with it. Learn from it. Do not build critical processes on it until GitHub commits to a stability guarantee.

  • Start with objectively verifiable outputs. Test generation worked at scale for eighty dollars because test quality is measurable. Begin with use cases where the agent’s output can be mechanically verified. Move to judgment-heavy tasks only after you have built the review capacity to absorb them.

  • The platform commitment is the real decision. Choosing an agent is choosing a component. Choosing GitHub as the runtime for your AI workflows is choosing governance infrastructure. Evaluate the second decision with the rigor it deserves.

  • Do not deploy all six use cases at once. Each one generates output that requires human review. Deploying them sequentially, starting with the most verifiable and lowest-risk, is not conservative. It is the only approach that does not overwhelm your review capacity.




At Victorino Group, we help organizations design governance architectures for AI agent deployment --- from permission models to review workflows to absorption capacity planning. If you are evaluating where Continuous AI fits in your engineering organization, reach out or visit victorinollc.com.

If this resonates, let's talk

We help companies implement AI without losing control.

Schedule a Conversation