MengNotes

From Dev Experience to a Complete Skill: How Engineers Can Partner with AI to Package Expertise into Executable Specs

A Skill is a structured document that makes AI work your way. This article walks through how software and data engineers can systematically extract, structure, test, and iterate on their domain knowledge to produce a reliable, AI-executable Skill.

March 28, 2026 · 2,660 words · 14 min read
#ai-agent #skill-engineering #prompt-engineering #developer-workflow #knowledge-management

Your Engineering Intuition Is Your Most Valuable — and Most Fragile — Asset

If you have three or more years of software or data engineering experience, your head is packed with reflexive judgments.

An on-call alert fires — you know which dashboard to check first, which log group to open, and which keyword to grep for. You inherit someone else's data pipeline — your first glance isn't at the logic but at whether there's a null check and schema validation. A new requirement lands — you mentally size it as a one-day or one-week job within three seconds.

These judgments happen almost without thinking. But ask yourself: "Could you write this all down so a new hire could follow it step by step?" You'd probably pause.

Knowledge management research calls this "knowing how to do it but not being able to explain it" tacit knowledge. In Nonaka and Takeuchi's SECI model from the 1990s, "Externalization" — converting tacit knowledge into explicit form — is the most critical yet most bottlenecked step in any organization.

Then AI-assisted development tools became part of everyday work, and a new medium emerged: the Skill.

A Skill is not documentation, not a README, not a prompt template. It is a workflow specification that AI can automatically discover, load, and execute step by step. A well-written Skill turns your engineering intuition into an API that AI can call.

This article breaks down the entire journey from "experience in your head" to "a reliable, working Skill."


How Skills Work: AI Doesn't Read the Whole Thing at Once

Before you start writing, understand how AI actually uses your Skill — otherwise you'll invest effort in the wrong places.

AI doesn't load a Skill by "opening a document and reading it top to bottom." It follows a Progressive Disclosure pattern rooted in 1980s HCI principles — surface the most critical information first, reveal details only when needed. AI tools apply this same philosophy to the Skill loading mechanism:

Layer 1: The description in frontmatter. AI sees this in every conversation. Token cost is tiny, but these few sentences are how it decides "Is this Skill relevant to the current task?" This is the make-or-break filter.

Layer 2: The SKILL.md body. Only loaded into context after AI determines the Skill is relevant. Your full instructions, decision logic, and constraints live here.

Layer 3: Resources in the references/ directory. API docs, templates, example code. AI only reaches for these when it has a concrete need.

For engineers, this architecture is intuitive — it mirrors how you'd structure a library. The description is the opening paragraph of a README, the body is the API docs, and references are the examples/ directory.

Once you understand this, you know: if the frontmatter isn't right, nothing else matters.
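To make the three layers concrete, here is a minimal sketch of what a SKILL.md file might look like. The skill name, description wording, and section contents are hypothetical examples, not a prescribed template:

```markdown
---
name: pipeline-incident-triage
description: >
  USE FOR: data pipeline failure diagnosis, silent ETL job troubleshooting.
  DO NOT USE FOR: local dev debugging, new pipeline architecture design.
---

# Pipeline Incident Triage

## Steps
1. Classify the failure source: upstream schema, infrastructure, or permissions.
2. Read the last 50 lines of the job log before touching anything.

## NEVER
- NEVER modify production resources without a dry-run.
```

Layer 1 is everything above the second `---`; Layer 2 is the body below it; Layer 3 would be additional files under a `references/` directory next to this one.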


Step 1: Mine Your Experience from Real Scenarios

The most common mistake engineers make is skipping the extraction step and jumping straight into writing SKILL.md.

The result is either too abstract ("analyze the problem and propose a solution") or too granular (listing fifteen CLI commands without explaining when to use which one).

The right approach is to revisit your actual work scenarios with the lens of "What would AI need to know to do this in my place?" Review at least three to five real cases.

Practical Tip

Open your work notes, Slack conversations, or PR review history from the past month. Find tasks you've done repeatedly. If you can identify the same judgment pattern across three different cases, that's a strong candidate for a Skill.

Then use AI for a structured interview. Don't ask it to "write a Skill for you" — ask it to extract patterns from your cases:

I recently handled these three data pipeline incidents:
[Case A: Upstream schema change caused downstream null pointer]
[Case B: Kafka lag spike caused processing delays]
[Case C: S3 permission update caused silent ETL job failure]

From these three cases, please identify:
1. What criteria I used to classify the problem type
2. My fixed investigation sequence or priority order
3. Under what conditions I skip certain steps
4. Things I absolutely never do (e.g., never manually modify production tables)

This works because AI excels at extracting patterns from concrete examples — but it can't guess your preferences from thin air. You provide the raw material; it surfaces regularities you hadn't noticed.

After a few rounds of back-and-forth, you'll have a rough but authentic process transcript. This isn't the final Skill, but it's your most genuine raw material.


Step 2: Compress Raw Material into Engineering Structure

With a transcript in hand, the next step is structuring it. This step determines whether AI can follow your process as a spec rather than freestyling.

From a data engineering perspective, I think of it as schema design — except you're defining not data fields but the "judgment fields" and "transformation rules" for AI when processing a task.

A good Skill body typically consists of the following building blocks (not all required every time — choose based on complexity):

Decision Branches: Let AI Take Different Paths Based on Input

Your workflow has decision points. The same type of problem — different source, different severity — may require a completely different approach.

A Skill without decision branches forces AI onto the default path every time — usually whichever it considers "safest," which isn't necessarily what you need.

When designing decision branches, keep the node count to four or fewer. Each additional judgment layer degrades AI's execution accuracy. Same principle as designing branching logic in a data pipeline — too many branches equals no branches.
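As an illustration, a decision-branch section with four nodes might read like this in the SKILL.md body (the branch conditions and section names are hypothetical):

```markdown
## Decision Branches

1. Output table is empty AND the job reports success → suspect a silent
   upstream failure; go to "Upstream checks".
2. Job failed with an error → read the last 50 log lines first; classify as
   schema error, permission error, or infra error.
3. Job is still running past its SLA → check queue/consumer lag before
   restarting anything.
4. Anything else → fall through to the default triage steps.
```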

Step-by-Step Procedures: Write It Like a Runbook, Not a Manual

Execution steps should be written as directives AI can act on directly, not concept descriptions.

Bad: "Check data quality."

Good: "Run null count on the latest batch of the target table; if it exceeds 5%, halt subsequent steps and report."

Think of each step as a runbook entry — the kind of clarity that lets you follow it at 3 AM when your brain is half asleep. AI needs that same level of clarity.
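The "good" step above is precise enough to sketch directly. A minimal Python version, assuming the latest batch is already loaded as a list of dicts (the table contents and column name are hypothetical):

```python
def null_rate(rows, column):
    """Fraction of rows whose `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)

def quality_gate(rows, column, threshold=0.05):
    """Return (ok, rate); ok is False when the null rate exceeds the threshold."""
    rate = null_rate(rows, column)
    return rate <= threshold, rate

# Hypothetical latest batch of the target table.
batch = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": None},
    {"user_id": 3, "email": "c@example.com"},
]

ok, rate = quality_gate(batch, "email")
if not ok:
    print(f"HALT: null rate {rate:.1%} exceeds 5% threshold")
```

The point is not this particular check but the shape: a step that names its input, its threshold, and its stop condition can be executed, and verified, without interpretation.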

The NEVER List: Your Hard-Won Lessons

This is the section with the highest ROI in the entire Skill.

Community practice has repeatedly validated that telling AI "what not to do" is often more impactful than telling it "what to do." Without explicit constraints, AI defaults to the most generic approach — and those generic approaches happen to be the most dangerous in engineering contexts.

Every NEVER rule should map to a real incident you've lived through. For example:

  • NEVER modify production resources without a dry-run
  • NEVER assume upstream schema won't change — always do defensive parsing
  • NEVER write credentials into code or logs

These aren't generic security guidelines — they're things you learned the hard way. Their value comes precisely from real experience — AI won't generate these rules on its own because it doesn't know what went wrong in your organization.

Done Criteria: What Does "Finished" Actually Mean?

A Skill without done criteria leaves AI unsure where to stop. It might finish too early or keep going down rabbit holes it doesn't need to explore.

Good done criteria should be mechanically verifiable, for example:

  • All relevant tests pass
  • Output files conform to the specified schema
  • git diff is within expected scope
  • No lint errors or type errors

Same thinking as a data pipeline quality gate — you wouldn't let a pipeline go live without passing data validation, and you shouldn't accept a Skill execution result that hasn't passed acceptance criteria.
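The gate idea can be sketched as a tiny aggregator: each criterion is a named boolean, and the run only counts as done when all of them hold. The criterion names below are illustrative:

```python
def check_done(criteria):
    """Return (done, failed); done is True only if every criterion passed."""
    failed = [name for name, passed in criteria.items() if not passed]
    return not failed, failed

# Hypothetical results collected after a Skill run.
criteria = {
    "tests_pass": True,
    "schema_valid": True,
    "diff_in_scope": False,  # e.g. git diff touched an unexpected file
    "lint_clean": True,
}

done, failed = check_done(criteria)
print("DONE" if done else f"NOT DONE, failed: {failed}")
```

Any criterion that cannot be reduced to a boolean like this is probably not mechanically verifiable and belongs in the step descriptions instead.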


Step 3: Frontmatter — Three Lines That Decide Your Skill's Fate

Back to Layer 1 of progressive disclosure. The description in frontmatter is the sole basis for AI's decision to load or skip your Skill.

Many people treat description as a field to fill in casually. In reality, it's closer to an API endpoint routing rule — it defines which requests should be routed here.

A well-designed description addresses three dimensions simultaneously:

Positive trigger conditions (when to apply this Skill):

USE FOR: data pipeline failure diagnosis, silent ETL job troubleshooting,
upstream-downstream schema mismatch investigation

Negative exclusion conditions (when not to use it):

DO NOT USE FOR: local dev environment debugging, SQL query optimization,
new pipeline architecture design (use pipeline-design skill instead)

Semantic trigger words (how users might phrase it):

Trigger words: pipeline broken, ETL failed, data missing,
schema mismatch, job stuck

Negative exclusion conditions are the most overlooked yet most important design element. A Skill without boundaries is like an API endpoint with no content-type check — every request gets through, but the ones that actually belong are drowned in noise.
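Put together, the three dimensions can live in a single description field. A sketch, where the skill name and exact wording are hypothetical:

```yaml
name: pipeline-incident-triage
description: >
  Diagnose failing or silently broken data pipelines.
  USE FOR: data pipeline failure diagnosis, silent ETL job troubleshooting,
  upstream-downstream schema mismatch investigation.
  DO NOT USE FOR: local dev environment debugging, SQL query optimization,
  new pipeline architecture design (use pipeline-design skill instead).
  Trigger words: pipeline broken, ETL failed, data missing, schema mismatch,
  job stuck.
```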


Step 4: Let AI Generate the First Draft, Then Correct It with Your Experience

With structured material ready, you can now ask AI to assemble the first version of SKILL.md.

Your prompt should include full context — not just "write me a Skill":

I need to create an Agent Skill for [one-sentence description].

Here's the material I extracted from real cases:

## Trigger Conditions
[trigger conditions from Step 1]

## Out-of-Scope Scenarios
[scenarios where this Skill should NOT apply]

## Decision Branches
[your decision nodes]

## Execution Steps
[your step-by-step workflow]

## Red Lines
[your NEVER list]

## Done Criteria
[verifiable completion conditions]

Please output in SKILL.md format, with frontmatter including name and description.
The description should include both positive trigger words and DO NOT USE FOR exclusions.

AI-generated first drafts typically have two problems:

  1. Overly generic step descriptions. AI abstracts your specific logic — turning "check Kafka consumer group lag" into "check message queue delay" — which loses precision.
  2. Including things AI already does by default. "Ensure the code compiles" or "use correct Markdown formatting" are pure noise — AI doesn't need you to remind it of basics, and writing these wastes context space.

After receiving the draft, your correction tasks are:

  • Revert generalized steps back to your specific scenario language
  • Remove all statements about things AI already does by default
  • Verify every NEVER rule maps to a real, memorable incident

Quality Check Mindset

For every line you write into a Skill, ask yourself: "If I remove this line, will AI's output get worse?" If the answer is "no," remove it. The truly valuable content is what only you know and AI can't figure out on its own. Redundant content doesn't just waste tokens — it dilutes the weight of genuinely important instructions.


Step 5: Three-Dimensional Testing — A Skill Isn't Ready Just Because It's Written

You wouldn't deploy a data pipeline straight to production the moment it compiles. Same goes for Skills.

Testing a Skill involves three distinct verification dimensions, each catching a different category of problems:

Trigger Boundary Testing

Is your description accurate? Prepare three sets of conversations:

  • Positive match: Clearly belongs to this Skill's scenario. Example: "Help me figure out why pipeline X didn't run this morning"
  • Synonym rewrite: Same meaning, different phrasing. Example: "ETL job X's output table is empty today — what happened?"
  • Unrelated: Completely different intent. Example: "Write me a Python function to read a CSV"

If the first two don't trigger, your description is too narrow. If the third also triggers, it's too broad.
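These three sets are worth keeping as a small, repeatable checklist you rerun after every description change. A sketch in Python: the boolean records your intent, and you verify it manually against whether the assistant actually loaded the Skill (the prompts are the examples above):

```python
# Trigger-boundary checklist: (category, prompt, should the Skill trigger?).
TRIGGER_TESTS = [
    ("positive match",
     "Help me figure out why pipeline X didn't run this morning", True),
    ("synonym rewrite",
     "ETL job X's output table is empty today -- what happened?", True),
    ("unrelated",
     "Write me a Python function to read a CSV", False),
]

for category, prompt, should_trigger in TRIGGER_TESTS:
    expectation = "SHOULD trigger" if should_trigger else "should NOT trigger"
    print(f"[{category}] {expectation}: {prompt}")
```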

Process Compliance Testing

Once triggered, does AI actually follow the steps you wrote?

The point isn't whether AI completed the task — it's whether it skipped steps. When AI encounters vague step descriptions, it tends to self-assess importance and skip what it considers unimportant.

Skipped steps almost always point to the same root cause: the instruction for that step isn't specific enough. Changing "check relevant settings" to "read the source_table and target_table fields in config.yaml and verify both sides have matching schema versions" usually fixes the problem.
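The tightened instruction is concrete enough to sketch. Assuming the config file has already been parsed into a dict (with PyYAML or similar; the field layout follows the example wording above and is hypothetical):

```python
def schema_versions_match(config):
    """Compare the declared schema versions of source_table and target_table."""
    src = config["source_table"]["schema_version"]
    tgt = config["target_table"]["schema_version"]
    return src == tgt, src, tgt

# Hypothetical parsed contents of config.yaml.
config = {
    "source_table": {"name": "raw.events", "schema_version": "v3"},
    "target_table": {"name": "mart.events", "schema_version": "v2"},
}

ok, src, tgt = schema_versions_match(config)
if not ok:
    print(f"schema mismatch: source={src}, target={tgt}")
```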

With-Skill vs Without-Skill Comparison Testing

Run the same task twice — once with the Skill enabled and once without. Compare output quality.

If the difference isn't significant, your Skill isn't actually guiding AI — likely because most of the body's content is stuff AI would do anyway. Time to go back and re-distill, adding more content that genuinely comes from your unique experience.


Iteration Is the Main Event; the First Version Is Just the Starting Point

If you've built data pipelines, you know this: the first version is never the final version.

Schemas evolve, upstream sources change formats, business logic shifts. Your pipeline needs ongoing maintenance. Skills are exactly the same.

After each time AI executes your Skill, observe where the output diverges from your expectations. Don't just accept or reject — trace back to which rule wasn't clear enough and fix it.

Common iteration scenarios:

AI doesn't trigger the Skill: Usually the description lacks sufficient keyword coverage. An effective debugging method is to ask the AI directly, "When would you use this Skill?" It will quote the description back to you, and you'll immediately see what's missing.

AI triggers too often: Add negative exclusion conditions. Explicitly tell it "this isn't your job — for scenario X, use a different Skill."

AI completes but the result is wrong: Step descriptions aren't specific enough, or a decision branch is missing a scenario.

AI does something it shouldn't: The NEVER list needs expansion — add the new edge case you just discovered.

Each fix makes the Skill more precise and more reliable. This cycle is the process of gradually "compiling" your experience into the Skill.


Classification Thinking: Not Every Skill Produces Code

Engineering experience isn't limited to writing code. Your judgment takes different forms at different stages, and the Skill design should reflect that.

Companion/Thinking type: Guides you through clarifying your thoughts without directly producing output. For example, requirements review — AI shouldn't jump into implementation but should ask you a series of questions first. Characteristics: open-ended input, output is a decision direction rather than code files, minimal tool requirements.

Planning type: Converts fuzzy requirements into actionable plans. Output is typically structured documents (task lists, architecture decision records). These Skills draw on your schema design experience — which fields are required, which have defaults, which formats upstream and downstream systems accept.

Execution type: Given a clear spec, it gets to work. Writing tests, doing code review, running deployment workflows. These Skills have the most detailed steps, the strictest NEVER lists, and the most rigorous done criteria — because execution-type errors are the most expensive.

Before you start writing, determine which type your Skill belongs to. Different types have different structural centers of gravity: imposing execution-type rigor on a planning Skill makes it too rigid, and giving an execution Skill thinking-type open-ended guidance makes it uncontrollable.


Closing: Your Experience Deserves to Be Compiled

The barrier to creating a Skill is low: a folder and a Markdown file. But writing one well takes not prompt-engineering tricks but a deep understanding of your own workflow.

Anthropic repeatedly emphasizes one principle when designing agent systems: the interface design of a tool is as important as the tool itself. SKILL.md is the interface between you and AI. The care you put into designing it should match the care you put into your code.

Ultimately, a good Skill isn't just a technical document for AI. It's a crystallization of years of your experience — version-controllable, testable, continuously evolvable.

Those things in your head that you "just know without thinking" deserve to be written down. Not in a Notion page people occasionally browse, but as an executable spec that AI can run directly — so your judgment automatically takes effect in every task.


References

  • Anthropic — Building effective agents
  • VS Code — Agent Skills
  • Claude API Docs — Skill authoring best practices
  • Block Engineering — 3 Principles for Designing Agent Skills


© 2024-2026 MengNotes | All Rights Reserved