AI Coding’s Discipline Turn: Three Open-Source Frameworks, Superpowers, gstack, and GSD, Outpace Model Upgrades

Developers Chasing Better AI Code Quality Find Answers in Rule-Enforcement, Role-Separation, and Context Hygiene — Not Smarter Models

Six months after Anthropic shipped a plugin system for Claude Code, the most-installed extensions on the platform have nothing to do with new capabilities. They are opinionated rulebooks — and the three that have broken into mass adoption this spring share a thesis that has since spread to 14 different AI coding agents: the bottleneck in AI-assisted software development is not model intelligence. It is discipline.

GSD, the youngest of the three frameworks, marked its 57th release on May 3, less than six months after its December 2025 debut, and has since shipped further updates. It now has 61,600-plus GitHub stars and 138 active contributors. Alongside Superpowers' 192,000-plus stars and gstack's nearly 100,000, the figures outline a category that barely existed at the end of 2025 and now shapes how hundreds of thousands of developers interact with AI coding agents daily.

Superpowers: Discipline as Iron Law

Superpowers, built by Jesse Vincent, a former Perl 5 release manager, Keyboardio co-founder, and creator of the Request Tracker ticketing system, launched on October 9, 2025, the same day Anthropic opened Claude Code to third-party plugins. Anthropic accepted it into the official plugin marketplace on January 15, 2026. As of May 2026 it carries 192,000-plus GitHub stars, 17,100 forks, and a v5.1.0 stable release.

The framework ships as a folder of markdown files — 14 skill files, each encoding a phase of the development cycle: brainstorm, plan, implement, review, ship. The key design choice is not what those phases contain but how they are enforced. Every skill file opens with what open-source maintainer Marc Nuri, a Senior Principal Software Engineer at Red Hat who reviewed the framework in detail, described as a "capitalized, non-negotiable rule, an Iron Law, followed by a table of red flags: the rationalizations the agent is most likely to use to skip the rule."

That structure targets a specific failure mode: the tendency of large language models to reason their way out of constraints mid-session. The target, Nuri wrote, is not teaching the agent — because it already knows the rules — but preventing it from talking itself out of following them. Simon Willison, the creator of Datasette and co-creator of Django, called Vincent "one of the most creative users of coding agents" he knows.

The practical result: when a developer asks Claude Code to build something with Superpowers installed, the agent does not start by writing code. It opens a Socratic design conversation, forces approval of a spec, then proceeds through mandatory test-driven development (red, green, refactor, no exceptions). A hard gate blocks implementation until the planning phase is complete.
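A hard gate of this kind can be pictured as a small state machine. The sketch below is illustrative only and is not how Superpowers is implemented (the framework, as described above, is a set of markdown skill files); the names `Phase`, `Workflow`, `approve_spec`, and `GateError` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    # Phase names follow the cycle listed in the article.
    BRAINSTORM = 1
    PLAN = 2
    IMPLEMENT = 3
    REVIEW = 4
    SHIP = 5


class GateError(Exception):
    """Raised when a phase is entered before its prerequisite is met."""


class Workflow:
    """Hypothetical phase gate: no implementation until the spec is approved."""

    def __init__(self) -> None:
        self.phase = Phase.BRAINSTORM
        self.spec_approved = False

    def approve_spec(self) -> None:
        # The developer signs off on the design produced during planning.
        self.spec_approved = True

    def advance(self) -> Phase:
        if self.phase is Phase.SHIP:
            raise GateError("already at the final phase")
        nxt = Phase(self.phase.value + 1)
        # The gate: refuse to enter IMPLEMENT without an approved spec.
        if nxt is Phase.IMPLEMENT and not self.spec_approved:
            raise GateError("planning incomplete: spec not approved")
        self.phase = nxt
        return nxt
```

Calling `advance()` from the plan phase raises `GateError` until `approve_spec()` has been called, which is the behavior the article attributes to the framework's gate. A prompt-based rulebook can only ask the agent to behave this way; the sketch shows what the rule amounts to when stated as code.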

gstack: Discipline as Role Separation

Y Combinator President and CEO Garry Tan open-sourced gstack on March 12, 2026. It hit 50,000 GitHub stars in 16 days — TechCrunch covered the launch — and has since climbed toward 100,000 stars with 284 commits and 49 contributors.

Where Superpowers constrains what the agent does inside a task, gstack constrains which perspective the agent occupies before a task begins. The framework splits a single Claude Code session into 23-plus named roles — CEO, Designer, Engineering Manager, Release Manager, QA, Doc Engineer, Chief Security Officer — each implemented as a slash command with its own priorities and constraints. The premise, drawn directly from Tan's README, is that "a single builder with the right tooling can move faster than a traditional team" — but only if the agent is forced to switch modes between product, engineering, and quality work rather than blend them.
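To make the role-per-command idea concrete, here is a minimal sketch of a registry that maps a slash command to one role and its priorities. It is an assumption-laden illustration, not gstack's code: the command names (`/ceo`, `/qa`), the priority strings, and the `build_prompt` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    priorities: tuple[str, ...]


# Hypothetical command names and priorities, for illustration only.
ROLES = {
    "/ceo": Role("CEO", ("question the problem framing", "cut scope")),
    "/qa": Role("QA", ("reproduce bugs", "test edge cases")),
}


def build_prompt(command: str, task: str) -> str:
    """Prefix the task with exactly one role's priorities."""
    role = ROLES.get(command)
    if role is None:
        raise KeyError(f"unknown role command: {command}")
    lines = [f"You are acting as {role.name}. Priorities:"]
    lines += [f"- {p}" for p in role.priorities]
    lines.append(f"Task: {task}")
    return "\n".join(lines)
```

The design point is that each invocation carries one role's constraints and nothing from the others, so product, engineering, and quality concerns are handled in separate passes instead of being blended.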

Tan claims to have averaged roughly 11,400 logical lines of code per day over a recent 60-day period, describing that as approximately 810 times his 2013 pace, with the methodology and a reproduction script published in the repository itself for verification. Independent reviewers have been skeptical but not dismissive: line counts are a weak proxy for code quality, and gstack's most credible value proposition is structural — the role-switching workflow — not its author's personal output figures.

The reception was polarized almost immediately. YouTuber Mo Bitar produced a video titled "AI is making CEOs delusional," arguing that gstack is essentially "a bunch of prompts in a text file." On Product Hunt, startup founder Sherveen Mashayekhi wrote that if Tan were not the CEO of Y Combinator, the project would not have been featured there.

Those critiques are not entirely wrong. gstack is markdown files. Any experienced Claude Code developer has likely assembled some version of this workflow privately. But that misses the point that multiple reviewers converged on: the value of gstack is not proprietary technology. It is a battle-tested, opinionated workflow encoded once rather than reinvented per project — and co-developed with 49 contributors using the same Claude Opus model the framework is designed to guide.

GSD: Discipline as Context Hygiene

GSD — Get Shit Done — takes the diagnosis in a third direction. Created in December 2025 by developer Lex Christopherson, who publishes under the names TÂCHES and "glittercowboy," it targets what it calls context rot: the quality drop that occurs as an AI coding session fills its context window.

The mechanism is architectural. Rather than constraining what an agent does or what role it occupies, GSD breaks work into atomic plans, then executes each plan in a fresh sub-agent session with a clean 200,000-token context window. The main session — used only for orchestration — is kept at 30 to 40 percent of its window throughout the project. The goal is that task 50 maintains the same quality as task 1.
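The orchestration pattern described here can be sketched in a few lines. This is a simplified illustration under stated assumptions, not GSD's implementation: `run_plans` is a hypothetical name, `run_subagent` stands in for whatever actually launches a sub-agent session, and the 200-character digest is an arbitrary choice.

```python
from typing import Callable


def run_plans(
    plans: list[str],
    run_subagent: Callable[[list[str]], str],
) -> list[str]:
    """Execute each plan in a fresh context and keep only short summaries."""
    summaries: list[str] = []
    for plan in plans:
        # A new list per plan: nothing from earlier tasks carries over.
        fresh_context = [plan]
        result = run_subagent(fresh_context)
        # The orchestrator stores a short digest, not the full transcript,
        # so its own context grows slowly as the project proceeds.
        summaries.append(result[:200])
    return summaries
```

Because every sub-agent starts from the plan alone, the fiftieth task sees the same amount of prior context as the first, which is the property the framework is aiming for.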

GSD has shipped 60-plus releases since December 2025. The v1.40.0 update on May 3 added a Minimal Install Profile that cuts system prompt overhead from roughly 12,000 tokens to 700 — a 94 percent reduction that makes GSD viable for local models and metered API plans. The v1.42.x series, released during the week of May 11, added per-phase model selection, dynamic routing with failure-tier escalation, and a package legitimacy gate against what the release notes call "slopsquatting" — malicious packages designed to impersonate legitimate dependencies.
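The 94 percent figure follows from the two token counts quoted above, both of which are approximate. A quick check:

```python
before, after = 12_000, 700  # approximate system prompt sizes, in tokens

reduction = (before - after) / before
print(f"{reduction:.1%}")  # prints 94.2%
```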

The framework now supports 14 AI coding agents — Claude Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, Cline, Augment, and six others — and ships an installer that auto-detects and configures the right file layout for each runtime.

Three Diagnoses, One Convergence

What makes the moment notable is not any single framework but their convergence on the same underlying argument. None of the three adds model capability. Superpowers is largely a system of prohibitions. GSD describes itself as a "context engineering system." Garry Tan, in posts surrounding gstack's launch, framed his tools explicitly as forced cognitive switching, not intelligence amplifiers.

The implicit claim across all three is that current frontier coding models are routinely capable enough to ship production code — and routinely talked out of doing it well by the way they are prompted. Each framework codifies a different answer to which intervention matters most: rule discipline (Superpowers), role separation (gstack), or context hygiene (GSD). The community has begun treating them as complementary layers rather than competitors — gstack for strategic decisions, GSD for context stability, Superpowers for execution — with unified install guides and combination workflows emerging across developer blogs and documentation.

Anthropic has, at minimum, implicitly endorsed the direction by featuring Superpowers as a verified marketplace plugin and by consolidating its own slash commands and skills systems into a single unified skills format — a change that makes community-built skill frameworks the canonical extension mechanism for Claude Code.

Adoption Caveats and What Comes Next

The adoption numbers carry caveats. GitHub stars and install counts measure attention, not outcomes. GSD's claim of use by engineers at Amazon, Google, Shopify, and Webflow appears in its own README and has not been independently confirmed by those companies. Tan's line-count figures have drawn consistent criticism from developers who note that raw output volume is a poor proxy for shipped value.

The trajectory is also forking in ways that suggest the current plugin architecture is an intermediate state. GSD's v2 rewrite moved from markdown prompts to a TypeScript application specifically because, as its maintainers noted, injecting instructions through slash commands left no actual control over context windows, sessions, or execution flow. The per-phase model selection and dynamic routing added in the v1.42.x series are capabilities that sit closer to infrastructure than to prompting.
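Routing with failure-tier escalation is easiest to see as code that lives outside the prompt. The following is a generic sketch of the idea, not GSD's routing logic: `run_with_escalation` and the `attempt` callback are hypothetical, and a real system would need its own definition of what counts as a failed attempt.

```python
from typing import Callable, Optional, Sequence


def run_with_escalation(
    task: str,
    tiers: Sequence[str],
    attempt: Callable[[str, str], Optional[str]],
) -> tuple[str, str]:
    """Try each model tier in order; escalate when an attempt returns None."""
    for model in tiers:
        result = attempt(model, task)
        if result is not None:
            # Stop at the first tier that succeeds.
            return model, result
    raise RuntimeError(f"all tiers failed for task: {task}")
```

Here the decision to retry on a stronger model is made by the runtime, not requested of the agent, which is the distinction the maintainers' move away from pure prompting points to.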

That shift — from discipline-as-prompting to discipline-as-infrastructure — marks the likely next phase of this category. If it succeeds, the methodology will no longer depend on an agent choosing to follow rules. It will be the runtime.

For now, the simpler reading is the one Superpowers keeps pushing. The agents already know how to write code. They need to be told, in writing, not to start until they have planned.

ⓒ 2026 TECHTIMES.com All rights reserved. Do not reproduce without permission.
