AI Coding Maturity Model: Practical Guide — Action Plans and Operating Cadence
Challenge
The AI Coding Maturity Model tells you where you stand. But what do you actually do to move to the next level?
A maturity model is useful as a map, but a map alone won't get you to your destination. This post covers concrete action plans for the L1→L2, L2→L3, and L3→L4 transitions, plus an operating cadence to keep the model from becoming shelfware.
L1→L2: A 30-Day Action Plan
The transition from Level 1 (Ad-hoc Exploration) to Level 2 (Individual Optimization) has the highest ROI across the entire maturity model. Policy creation and tool selection are low-cost and the results show up immediately.
Week 1: Assess the Current State
- Run a 5-minute survey on agent usage across the team (tool names, frequency, primary use cases, personal vs. company billing)
- Use the results to identify 2–3 people who are already using agents effectively
Skipping this step and jumping to "everyone starts at once" is a common failure mode. Understanding reality is the starting point.
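The Week 1 survey results can be tallied with a few lines of code. A minimal sketch, assuming a hypothetical response format of (name, tool, days used per week); the field names and the "3+ days per week" adopter threshold are illustrative, not part of the model:

```python
from collections import Counter

# Hypothetical survey responses: (name, tool, days_used_per_week)
responses = [
    ("alice", "claude-code", 5),
    ("bob", "copilot", 1),
    ("carol", "claude-code", 4),
    ("dan", None, 0),
]

# Adoption snapshot: how many people use each tool
tool_counts = Counter(tool for _, tool, _ in responses if tool)

# Candidates for the Week 3 pilot: anyone using an agent 3+ days a week
early_adopters = [name for name, _, days in responses if days >= 3]

print(tool_counts)     # per-tool usage counts
print(early_adopters)  # → ['alice', 'carol']
```

Even this crude tally answers the two Week 1 questions: which tools are actually in use, and who the effective users are.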
Week 2: Establish Policy
Work with the security team to create a usage policy. The four minimum decisions:
- Approved tools list
- Scope of code that must not be sent to agents
- Data retention policy
- Whether personal accounts are allowed
The key is defining "you may use these within this scope" rather than imposing a blanket ban. To prevent Shadow AI — developers sending proprietary code through personal accounts — explicit permission is more effective than prohibition.
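Encoding the four policy decisions as data rather than a prose document makes them checkable, for example in a pre-commit hook or CI. A minimal sketch; the policy values, path prefixes, and the `agent_use_allowed` helper are all hypothetical:

```python
# Hypothetical usage policy encoded as data so it can be checked automatically
POLICY = {
    "approved_tools": {"claude-code", "copilot"},
    "restricted_paths": ("secrets/", "billing/"),  # code that must not be sent to agents
    "retention_days": 30,
    "personal_accounts_allowed": False,
}

def agent_use_allowed(tool: str, path: str) -> bool:
    """Return True if this tool may be used on this file under the policy."""
    if tool not in POLICY["approved_tools"]:
        return False
    return not path.startswith(POLICY["restricted_paths"])

print(agent_use_allowed("claude-code", "src/app.py"))         # → True
print(agent_use_allowed("claude-code", "billing/invoice.py")) # → False
```

The design choice here mirrors the policy's intent: an allowlist of tools plus a denylist of paths expresses "you may use these within this scope" directly.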
Week 3: Start the Pilot
- Ask the early adopters identified in Week 1 to set up initial CLAUDE.md configurations
- Compare PRs for similar tasks with and without agent assistance
- Hold a 15-minute end-of-week sharing session on results and challenges
The comparison doesn't need to be a rigorous experiment. Simple labeling like "this PR used an agent" vs. "this one didn't" is enough to spot trends.
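Once PRs carry the label, the before/after comparison is a simple group-by. A minimal sketch assuming hypothetical per-PR fields (`hours_to_merge`, `review_rounds`); the numbers are illustrative:

```python
from statistics import mean

# Hypothetical labeled PR records collected during the pilot
prs = [
    {"agent": True,  "hours_to_merge": 6,  "review_rounds": 2},
    {"agent": True,  "hours_to_merge": 4,  "review_rounds": 1},
    {"agent": False, "hours_to_merge": 10, "review_rounds": 2},
    {"agent": False, "hours_to_merge": 12, "review_rounds": 3},
]

def summarize(with_agent: bool) -> dict:
    """Average the tracked fields for one group of PRs."""
    group = [p for p in prs if p["agent"] == with_agent]
    return {
        "hours_to_merge": mean(p["hours_to_merge"] for p in group),
        "review_rounds": mean(p["review_rounds"] for p in group),
    }

print("agent:   ", summarize(True))
print("no agent:", summarize(False))
```

This is deliberately not a controlled experiment; the point is that even a two-column average is enough to feed the Week 4 evaluation.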
Week 4: Evaluate and Decide on Rollout
- Assess quantitative impact from before/after PR comparisons
- Categorize "what was effective" vs. "what didn't work well"
- Draft a team-wide rollout plan (tools, budget, training)
You don't need a perfect setup after 30 days. The entry point for Level 2 is: "a policy exists, tools are selected, and at least a few people are consciously using agents."
L2→L3: Start with Review Guidelines
The highest-impact action for the L2→L3 transition isn't building a skills library — it's creating review guidelines for agent-generated code.
The reason is simple: code review is something everyone does every day. Skills libraries tend to split into "builders" and "consumers," but review guidelines affect the entire team. Building shared understanding of "how to evaluate agent-generated code" raises AI literacy across the board.
Review Criteria for Agent-Generated Code
The following is a starting template for review guidelines. Customize it for your team's tech stack and quality standards.
Always check:
- Business logic correctness (agents frequently misunderstand requirements)
- Edge case handling (agents tend to bias toward happy paths)
- Security (input validation, authentication/authorization gaps)
- Consistency with existing code (naming conventions, architectural patterns)
Watch for:
- Unnecessary dependency additions
- Over-abstraction (agents tend to favor abstraction)
- Test coverage (are generated tests only covering happy paths?)
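One low-friction way to operationalize the guidelines is to render them as a checkbox comment that a bot posts on agent-labeled PRs. A minimal sketch; the item wording and the `checklist_comment` helper are illustrative, and any bot integration is left out:

```python
# Hypothetical: render the review guidelines as a PR comment template
ALWAYS_CHECK = [
    "Business logic matches the ticket's requirements",
    "Edge cases beyond the happy path are handled",
    "Input validation and authn/authz checks are present",
    "Naming and architecture match the existing codebase",
]
WATCH_FOR = [
    "New dependencies that existing code could cover",
    "Abstractions with only one call site",
    "Generated tests that only exercise the happy path",
]

def checklist_comment() -> str:
    """Build a markdown checklist for agent-generated PRs."""
    lines = ["Agent-generated code review checklist:", "", "Always check:"]
    lines += [f"- [ ] {item}" for item in ALWAYS_CHECK]
    lines += ["", "Watch for:"]
    lines += [f"- [ ] {item}" for item in WATCH_FOR]
    return "\n".join(lines)

print(checklist_comment())
```

Posting the checklist automatically keeps it visible without relying on reviewers to remember a wiki page.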
After Review Guidelines Are Established
Once review guidelines have taken root, move to the next steps:
- Build a team-shared CLAUDE.md (or agent configuration) — Convert individual tacit knowledge into explicit shared knowledge. Don't aim for perfection; start with a draft based on what effective developers are already using
- Build a skills library — Define commonly used workflows (test creation, refactoring, review, etc.) as skills under version control
- Incorporate agent usage into new member onboarding — Creating an environment where new hires can use agents from day one lets team standards propagate naturally
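For the team-shared configuration step, a starting draft can be very small. The following is a hypothetical CLAUDE.md skeleton, not a prescribed format; every section name and rule below is an illustrative placeholder to be replaced with what your effective developers already use:

```markdown
# CLAUDE.md (hypothetical team draft)

## Project conventions
- Follow the existing module layout; do not introduce new top-level directories.
- Match the naming style of neighboring files.

## Testing
- Every change to business logic needs a test, including at least one edge case.
- Run the project's standard test command before declaring a task done.

## Off limits
- Do not modify files under the paths listed in the usage policy's restricted set.
```

Keeping the first draft short makes it easy to review in a single sharing session and iterate from there.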
L3→L4: Bootstrapping Cross-Team Visibility
Once intra-team standardization stabilizes, the next step is cross-team visibility. The biggest obstacle here isn't technology — it's getting teams to agree.
When different teams use different agents and prioritize different metrics, defining "common indicators" is politically difficult. The effective approach is starting with the minimum set of shared metrics that all teams can agree on.
Minimum Shared Metrics
Limit required metrics to three:
- Monthly active user rate — measures adoption breadth
- Average agent cost per PR — measures cost efficiency
- Review revision rate for agent-generated code — measures quality
Beyond these, each team selects optional KPIs that fit their context (spec completion rate, autonomous execution success rate, etc.).
Keeping required metrics to three lowers the consensus barrier while creating a foundation for cross-team comparison.
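The three required metrics reduce to three divisions over data most teams already have. A minimal sketch; the function name and input fields are hypothetical, and how each count is collected is left to the team:

```python
# Hypothetical monthly rollup of the three required metrics
def shared_metrics(devs_active: int, devs_total: int,
                   agent_cost: float, pr_count: int,
                   prs_revised: int, agent_prs: int) -> dict:
    return {
        # adoption breadth: developers who used an agent this month
        "active_user_rate": devs_active / devs_total,
        # cost efficiency: total agent spend spread across all PRs
        "cost_per_pr": agent_cost / pr_count,
        # quality: agent-generated PRs that needed review-driven revisions
        "revision_rate": prs_revised / agent_prs,
    }

m = shared_metrics(devs_active=18, devs_total=24,
                   agent_cost=600.0, pr_count=120,
                   prs_revised=21, agent_prs=70)
print(m)  # → {'active_user_rate': 0.75, 'cost_per_pr': 5.0, 'revision_rate': 0.3}
```

Because each metric is a ratio, teams of different sizes produce comparable numbers, which is the whole point of the shared set.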
Metrics Operation Pitfalls
Defining metrics is only half the work; without thoughtful operation they tell you nothing:
- Separate comparable from non-comparable metrics — Usage rate can be compared across teams, but cost per PR depends on task characteristics and can't be directly compared
- Pre-define action thresholds — "If revision rate exceeds 30%, revisit review guidelines." Decide responses to metric changes upfront
- Don't let optimization become the goal — Prevent "increase agent usage rate" from becoming a target that forces agents onto unsuitable tasks
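Pre-defined action thresholds can also live in code, so the quarterly review starts from the same agreed responses every time. A minimal sketch; the threshold values and action strings are hypothetical examples of an operating agreement, not recommendations:

```python
# Hypothetical pre-agreed thresholds: metric name -> (limit, agreed response)
THRESHOLDS = {
    "revision_rate": (0.30, "Revisit review guidelines"),
    "cost_per_pr": (8.0, "Audit skill and prompt usage for waste"),
}

def triggered_actions(metrics: dict) -> list:
    """Return the agreed follow-up action for every metric over its limit."""
    return [action for name, (limit, action) in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

print(triggered_actions({"revision_rate": 0.35, "cost_per_pr": 5.0}))
# → ['Revisit review guidelines']
```

Writing the responses down next to the limits prevents the "metric moved, now what?" debate from restarting every quarter.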
Operating Cadence for the Maturity Model
Executing action plans isn't the finish line. A model that isn't revisited regularly becomes shelfware. You need a cycle of assessing your current position and deciding the next action.
Quarterly Review
Run a level assessment every three months and check for changes since the last review. Even if the level hasn't increased, evaluate improvements within the same level — for instance, a richer skills library at Level 3.
Specifically, have each team lead answer the level assessment checklist from the previous article and share results across the organization. A 15-minute self-check is sufficient.
Connecting Metrics to Maturity
Use dashboard data as evidence for maturity assessments. Quantitative data like "adoption rate increased" or "cost per PR decreased" becomes input for level transition decisions. Conversely, if the level increased but metrics didn't improve, the progress may be purely formal.
The three shared metrics defined in the previous section (monthly active user rate, cost per PR, review revision rate) form the foundation for this connection.
Bottom-Up Improvement Proposals
Maturity improvement shouldn't be exclusively top-down. Create channels for frontline developers to propose changes when they see opportunities. The simplest approach: solicit improvement proposals during quarterly reviews.
Takeaways
- L1→L2 can start in 30 days — A four-week cycle of assessment, policy, pilot, and evaluation gets the first step done. Don't wait for perfection.
- Review guidelines are the L2→L3 catalyst — Starting standardization with an activity everyone does daily lets skills library building and onboarding improvements follow naturally.
- L3→L4 starts with three shared metrics — Keep required metrics minimal to lower the consensus barrier, combining them with team-specific optional KPIs.
- Run the model on a quarterly cadence — Beyond executing action plans, regular assessment connected to metrics keeps the maturity model a living framework.