GitHub Copilot CLI 101: How to use GitHub Copilot from the command line
You don't have to leave your terminal to use GitHub Copilot anymore.
With GitHub Copilot CLI, you can ask questions, generate scripts, refactor code, and run commands—all without breaking your flow.
In this blog, we’ll explore exactly what GitHub Copilot CLI is, how it works, the best ways to use it, and how you can start working with Copilot right from your terminal.
What is GitHub Copilot CLI?
A command-line interface (CLI) is where you can type commands directly into a terminal or console to interact with software and systems. It’s how you often run scripts, automate workflows, and access APIs.
The GitHub Copilot CLI brings Copilot’s AI capabilities right into that environment. Instead of jumping between your IDE and browser, you can ask Copilot to generate, explain, or execute commands. In short, Copilot CLI gives you more precision and control over how you work.
For instance, you can ask:
copilot "create a bash script to check for uncommitted changes and push if clean"
The Copilot CLI will write the script, explain what it does, and ask you to confirm before running it.
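The generated script varies with your setup, but a plausible sketch (our illustration, not verbatim Copilot output) looks something like this:
#!/bin/bash
# Push only if the working tree has no uncommitted changes
if [ -z "$(git status --porcelain)" ]; then
  git push
else
  echo "Uncommitted changes found; commit or stash them before pushing."
fi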
Whether you’re debugging code, managing environments, navigating a legacy codebase, or handling complex implementations, Copilot CLI helps you work faster without leaving your local environment—saving you time and effort.
How does GitHub Copilot CLI work?
CLIs follow a simple loop: you type a command, the system runs it, and you get a result.
GitHub Copilot CLI builds on that workflow but adds an AI-powered twist. Instead of just running predefined commands, you can talk to your terminal in natural language. You tell Copilot CLI what to do, and it figures out the commands to make it happen. (Copilot CLI also supports a number of slash commands and integrations with MCP to extend its capabilities.)
You can use Copilot CLI in two ways:
- Interactive mode (the default mode) lets you start a session with the copilot command and have a back-and-forth conversation, refining tasks as you go.
- Programmatic mode is for one-off prompts: pass a request directly with -p or --prompt, and Copilot responds inline (see the example below). For tasks that involve modifying or executing files, you can enable approval options to keep things safe and consistent.
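For example, a quick one-off request in programmatic mode might look like this (the prompt itself is just an illustration; Copilot still asks before using tools unless you pre-approve them):
copilot -p "summarize what this repository does and list its main scripts"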
No matter how you use it, Copilot CLI will always ask for confirmation before reading, modifying, or executing files. That means you stay in control of your environment. (Note: One exception is if you choose “Yes, and remember this folder for future sessions” or “Yes, and approve TOOL for the rest of the session” when prompted—Copilot will follow these instructions instead. More details in our starter kit below!)
Starter kit: How to install and use GitHub Copilot CLI
Using GitHub Copilot CLI is easier than you think. We created a starter kit for you that explains how to install GitHub Copilot CLI, offers a step-by-step tutorial on how to use the tool, and shares common use cases and prompts you can use with Copilot in your terminal. Let's dive in.
Step one: Installing GitHub Copilot CLI
To get started with GitHub Copilot CLI, you need:
- A GitHub Copilot subscription: Copilot CLI is currently available with the GitHub Copilot Pro, GitHub Copilot Pro+, GitHub Copilot Business, and GitHub Copilot Enterprise plans.
- Node.js version 22 or later
- npm version 10 or later
[Note: If you get Copilot access from an organization, the Copilot CLI policy must be enabled in your organization’s settings.]
Install Copilot CLI with the following command:
npm install -g @github/copilot
That’s it. Now you’re ready to get started. ✨
Step two: How to use GitHub Copilot CLI
It’s time to start using Copilot in the command line (we have step-by-step instructions on how to do this in our docs).
- In your terminal, navigate to the folder with the code you're working on.
- Run the copilot command to begin using Copilot CLI. You'll be prompted to confirm that you trust the contents of the folder.
Important: During your GitHub Copilot CLI session, Copilot may read, modify, and execute files in and below this folder. Only proceed if you trust the files in this location. Read About GitHub Copilot CLI to learn more about trusted directories.
- You can select one of these options:
- Yes, proceed: Copilot can access and use the files in this location for only this session.
- Yes, and remember this folder for future sessions: The files in this folder are trusted for current and future sessions. When starting Copilot CLI from this folder, you won’t be asked this question again (so only select this option if you are sure that it will always be safe for Copilot to work with these files).
- No, exit (Esc): End your Copilot CLI session.
- If you are not currently logged in to GitHub, you’ll be prompted to use the /login slash command. Enter this command and follow the on-screen instructions to authenticate.
- Enter your prompt in the CLI (we'll explore some examples of great prompts in the next section!)
- Sometimes you'll need to approve Copilot's use of tools that modify or execute files. You'll have three options:
- Yes: Allow Copilot to use this tool once (and approve it again the next time Copilot needs to use the same tool).
- Yes, and approve TOOL for the rest of the session: Give Copilot full permission to use this tool for the rest of the current session (you'll need to approve the command again for any future sessions). This is helpful when you don't want to approve commands repeatedly in the same session, but be aware of the security implications. For instance, choosing this option for the command rm would let Copilot delete any file in or below the current folder without needing your approval.
- No, and tell Copilot what to do differently: Instead of running the command, Copilot will end the current operation and wait for you to prompt it. You can instruct Copilot to continue the task but suggest a different approach.
Step three: GitHub Copilot CLI use cases, plus example prompts and workflows
In this section, we’re providing tons of use cases along with sample prompts that you can feed Copilot to achieve similar outcomes.
GitHub Learn: GitHub Copilot CLI
In this video tutorial, @arilivigni, senior learning advocate and cloud solutions architect at GitHub, demonstrates some foundational ways to use GitHub Copilot CLI to create GitHub issues, pull requests, and more.
Here are some of the GitHub Copilot CLI prompts that were highlighted in the video:
Create a GitHub Issue: Log actionable tasks that keep progress visible.
Create an issue for adding GitHub Copilot instructions
Create Copilot custom instructions: Give Copilot more context on your project so it can deliver even better AI assistance that fits your workflows.
Create a branch for GitHub Copilot custom instructions
Create a pull request: Propose changes that enhance code quality.
Create a pull request with the changes we have made
Attach this pull request to issue #4
Show the content of issue #4
What pull requests are attached to this issue?
Use the MCP server to list all open issues and pull requests
Use MCP servers to query Microsoft Learn: Access official Microsoft Learn content directly from the CLI for quick answers and guidance.
Using the Microsoft Learn MCP server, tell me all the GitHub Copilot Microsoft Learn modules that exist
What are the names of the hands-on Skills exercises that exist in each module?
Create a README with all the Microsoft Learn GitHub Copilot modules and the hands-on skills with headings and subheadings
Create an alias to use with the CLI: Reduce repetitive effort with shortcuts that boost speed and efficiency.
alias cpcli='copilot --allow-all-tools -p "$@"'
alias | grep copilot
Explain and fix scripts: Diagnose script errors and apply fixes to keep your code running smoothly.
cpcli "Explain each of these scripts and offer improvements"
Other ways to use GitHub Copilot CLI
Here are some common use cases for GitHub Copilot CLI, along with specific prompts you can feed Copilot.
Codebase maintenance: Handle security patches, upgrade dependencies, and perform focused refactoring to keep your codebase healthy.
- Request a script for scanning and fixing security vulnerabilities
Generate a bash script to run npm audit and apply fixes automatically
- Ask for targeted refactoring guidance
Upgrade all npm dependencies to their latest safe versions
Generating documentation: Create or update project documentation that improves clarity and visibility.
- Generate more beginner-friendly documentation.
Review the project README to make it easier for newcomers to understand
Understanding your system: Ask questions about system resources, like how your laptop storage is being used, which folders take up the most space, or what processes are running.
- Get a summary of your laptop’s storage capacity.
What is taking up the most space on my own laptop?
Improving test coverage: Add new test suites and enhance existing ones to strengthen quality assurance.
- Ask Copilot to create a command for generating Jest test files
Generate a command to scaffold new Jest test suites for uncovered components
- Request a script to run coverage analysis
Create a bash script to run npm test with coverage and output a summary report
- Ask for best practices for adding integration tests
Suggest steps to add integration tests for API endpoints using Supertest
Prototyping new projects: Kick off greenfield projects and experiment with fresh ideas.
- Ask Copilot to build a proof-of-concept application from scratch
Use create-next-app with Tailwind CSS to build a Next.js dashboard. The dashboard should pull data from the GitHub API and display metrics like build success rate, average build time, failed builds count, and automated test pass rate. After setup, provide clear steps to build, run, and view the app locally in a browser.
Setting up your environment: Run terminal commands to configure your local environment for existing projects.
- Ask Copilot for environment setup commands
Provide commands to set up a Python virtual environment and install requirements.txt
- Request a script for cloning and preparing a project
Generate a bash script to clone a GitHub repo, install dependencies, and start the dev server
- Ask for Docker setup instructions
Suggest commands to build and run a Docker container for this project
Finding the right command to perform a task: Get Copilot's suggestions for commands relevant to your current task (typical answers are sketched after the prompts below).
- Ask Copilot for a Git command to undo the last commit without losing changes
What is the Git command to undo the last commit but keep the changes staged?
- Request a command to squash commits
Provide the Git command to squash the last three commits into one
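For reference, the commands Copilot will typically suggest for these two prompts are standard Git one-liners (shown here as a sketch, not verbatim Copilot output):
git reset --soft HEAD~1   # undo the last commit, keep its changes staged
git rebase -i HEAD~3      # interactive rebase: mark the older commits as "squash" to combine them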
Explaining an unfamiliar command: Receive natural language explanations of what a command does and why it’s useful.
- Ask Copilot to explain a Docker command
Explain docker run -it --rm ubuntu bash
- Request an explanation for a Git command
Explain git rebase --interactive HEAD~3
What’s next
Copilot CLI is in public preview, and your feedback will help shape our roadmap—so we want to hear what you think about using the product. You can share your experience using /feedback.
Take this with you
GitHub Copilot CLI brings the power of agentic AI right to where you work: your terminal. Whether you’re launching a new project or tackling a backlog of fixes, putting Copilot at the command line lets you build momentum with less friction and more flow. Happy coding!
Looking to try GitHub Copilot CLI?
Read the Docs and get started today.
More resources to explore:
- GitHub Copilot CLI: How to get started
- Responsible use of GitHub Copilot CLI
- GitHub Copilot CLI demo from GitHub Universe 2025
The post GitHub Copilot CLI 101: How to use GitHub Copilot from the command line appeared first on The GitHub Blog.
GitHub Copilot tutorial: How to build, test, review, and ship code faster (with real prompts)
If you haven't used GitHub Copilot since before mission control launched, you haven't experienced what it can do now.
Copilot used to be an autocomplete tool. Now, it’s a full AI coding assistant that can run multi-step workflows, fix failing tests, review pull requests, and ship code—directly inside VS Code or GitHub.
Back in 2021, Copilot changed how you edited code. Today with Agent HQ and mission control, it's changing how you build, review, secure, and ship software.
Here’s one example:
// Before
"Write tests for this module" = manual setup, fixtures, and edge cases
// Now
Ask Copilot: "Generate Jest tests for userSessionService with cache-enabled branch coverage"
Full test suite + explanations in record time
Under the hood, Copilot runs on multiple models tuned for reasoning, speed, and code understanding. It can see more of your project, generate more accurate results, and move naturally between your editor, terminal, and GitHub.
This guide walks through every part of the new Copilot experience with working examples, best practices, and prompts you can try right now (which you should).
What’s new with Copilot
Larger context + cross-file reasoning (now surfaced through mission control)
Early versions of Copilot saw only what you were typing. Now, it can read across multiple files, helping it understand intent and relationships between modules.
Ask in mission control: “Find every function using outdated crypto libraries and refactor them to the new API. Open a draft PR.”
Copilot can trace patterns across your codebase, make updates, and explain what changed.
You can choose the right model for the job
You can now choose models based on your needs: one optimized for speed when prototyping, another for deeper reasoning during complex refactors.
It goes beyond code completion
Copilot is now a suite of tools built for every step of the workflow:
- Mission control: Run multi-step tasks, generate tests, and open pull requests.
- Agent mode: Define the outcome, and Copilot determines the best approach, seeking feedback from you as needed, testing its own solutions, and refining its work in real time.
- Copilot CLI: Automate and explore your repository directly from the terminal.
- Coding agent: Offload routine fixes or scaffolding to Copilot.
- Code review: Let Copilot highlight risky diffs or missing tests before you merge.
- Scoped agents: Offload routine fixes, refactors, docs, or test generation.
How to use GitHub Copilot (with examples)
Here are actionable items for each mode of Copilot, with code snippets and prompt examples.
Build faster with mission control and agent mode in VS Code
Once you’ve installed the Copilot extension, enable agent mode in settings and open mission control from the sidebar. Start by selecting a workflow (tests, refactor, documentation) or run a custom prompt.
Prompt pattern:
# Add caching to userSessionService to reduce DB hits
In mission control: “Add a Redis caching layer to userSessionService, generate hit/miss tests, and open a draft PR.”
Copilot will create a new file, update the service, add tests, and open a draft pull request with a summary of changes.
Tip: Write comments that explain why, not just what.
// Cache responses by userId for 30s to reduce DB hits >1000/min
Short, specific comments make Copilot work better.
Break into the terminal with Copilot CLI
Copilot CLI brings the same intelligence to your terminal. To install it, use the following command in your terminal:
npm install -g @github/copilot
copilot /login
Once installed and authenticated, run:
copilot explain .
You’ll get a structured summary of your repository, dependencies, test coverage, and potential issues.
Here are some common, useful commands:
copilot explain .
copilot fix tests
copilot setup project
copilot edit src/**/*.py
Try this:
After a failing CI run, use the following command to have Copilot locate the issue, explain why it’s failing, and propose a fix for review.
copilot fix tests
Use Copilot code review
Copilot can now review pull requests directly in GitHub—no plugins required. It identifies risky diffs, missing test coverage, and potential bugs.
Enable Copilot code review via your repository settings to get started.
When a pull request is created, Copilot can comment on:
- Missing test coverage
- Potential bug/edge-case
- Security vulnerabilities
Here’s an example:
In your pull request chat, try writing:
Summarize the potential risks in this diff and suggest missing test coverage.
Copilot will reply inline with notes you can accept or ignore. It’s not here to merge for you. It’s here to help you think through issues and concepts faster.
Setting up async tasks with Copilot coding agent
Copilot coding agent can take a structured issue, write code, and open a draft pull request—all asynchronously.
Here’s an example issue:
### Feature Request: CSV Import for User Sessions
- File: import_user_sessions.py
- Parse CSV with headers userId, timestamp, action
- Validate: action in {login, logout, timeout}
- Batch size: up to 10k rows
- On success: append to session table
- Include: tests, docs, API endpoint
Assign that issue to Copilot. It will clone the repo, implement the feature, and open a draft pull request for your review.
Coding agent is best for:
- Repetitive refactors
- Boilerplate or scaffolding
- Docs and test generation
You always review before merge, but Copilot accelerates everything leading up to it.
Best practices and guardrails
- Review everything. AI writes code; you approve it. Always check logic, style, and docs before you ship.
- Prompt with context. The better your prompt (why, how, constraints), the better the output.
- Use small increments. For agent mode or CLI edits, do one module at a time. Avoid “rewrite entire app in one shot.”
- Keep developers in the loop. Especially for security, architecture, design decisions.
- Document prompts and decisions. Maintain a log: “Used prompt X, result good/bad, adjustments made”. This helps refine your usage.
- Build trust slowly. Use Copilot for non-critical paths first (tests, refactors), then expand to core workflows.
- Keep context limits in mind. Although Copilot handles more context now, extremely large monolithic repos may still expose limitations.
Why this matters
More than 36 million developers joined GitHub this year (that’s more than one every second!), and 80% used Copilot in their first week.
AI-powered coding is no longer experimental. It’s part of the job.
Languages like TypeScript and Python dominate GitHub today, and their structure (static types in TypeScript, type hints in Python) makes them ideal partners for Copilot. Strong typing plus smart suggestions equals faster feedback loops and fewer regressions.
And now with mission control, everything’s in one place. You don’t need five AI tools, ten browser tabs, or a separate review bot. Everything happens where you already build software.
Take this with you
If you’ve been waiting to see what Copilot can really do, mission control is the moment.
With GitHub Copilot—in the editor, in your terminal, in your reviews, and in the background of your team—you’re getting a toolkit designed to help you do real work faster, smarter, and on your terms.
You decide the architecture. You write the tests (or at least the ones you want to write). You merge the pull requests. Copilot helps with boilerplate, scaffolding, and routine tasks so you can keep your focus on the problem that really matters.
Pick one part of your stack this week—tests, docs, refactor—and run it through mission control. See where it saves time, then scale up.
This guide is your map. The tools are in your hands. Now it’s your turn to build.
Start using GitHub Copilot >
The post GitHub Copilot tutorial: How to build, test, review, and ship code faster (with real prompts) appeared first on The GitHub Blog.
Measuring what matters: How offline evaluation of GitHub MCP Server works
MCP (Model Context Protocol) is a simple, common way for AI models (LLMs) to talk to APIs and data. Think of it like a universal plug: if both sides support MCP, they can connect and work together. An MCP server is any service or app that “speaks MCP” and offers tools the model can use, publishing a list of tools, what each tool does, and what inputs (parameters) each tool needs.
The GitHub MCP Server is the foundation for many GitHub Copilot workflows, both inside and outside of GitHub. As an engineering team working on GitHub MCP, we’re always looking to deliver new features and functionality, while avoiding regressions and improving quality with every iteration. And how we name a tool, explain what it does, and spell out its parameters directly affects whether the model picks the right tool, in the right order, with the right arguments.
When it comes to our work, small edits matter: tightening a description, adding or removing a tool, or combining a few similar tools can shift results a lot. When descriptions are off, agents choose the wrong tool, skip a step, send arguments in the wrong format, or drop them entirely. The outcome is weak. We need a safe way to change MCP and know if things actually got better, not worse. That’s where offline evaluation comes in.
Offline evaluation catches regressions before users see them and keeps the feedback loop short, so we can ship changes that genuinely improve performance.
This article walks through our evaluation pipeline and explains the metrics and algorithms that help us achieve these goals.
How automated offline evaluation works
Our offline evaluation pipeline checks how well our tool prompts work across different models. The tool instructions are kept simple and precise so the model can choose the right tool and fill in the correct parameters. Because LLMs vary in how they use tools, we systematically test each model–MCP pairing to measure compatibility, quality, and gaps.
We have curated datasets that we use as benchmarks. Every benchmark contains the following parameters:
- Input: This is a user request formulated in natural language.
- Expected tools: Tools we expect to be called.
- Expected arguments: Arguments we expect to be passed to each tool.
Here are a few examples:
Asking how many issues were created in a given time period
Input: How many issues were created in the github/github-mcp-server repository during April 2025?
Expected tools: list_issues with arguments:
owner: github
repo: github-mcp-server
since: 2025-04-01T00:00:00Z
Merging pull requests
Input: Merge PR 123 in github/docs using squash merge with title “Update installation guide”
Expected tools: merge_pull_request with arguments:
owner: github
repo: docs
pullNumber: 123
merge_method: squash
commit_title: Update installation guide
Requesting code reviews
Input: Request reviews from alice456 and bob123 for PR 67 in team/project-alpha
Expected tools: update_pull_request with arguments:
owner: team
repo: project-alpha
pullNumber: 67
reviewers: ["alice456", "bob123"]
Summarizing discussion comments
Input: Summarize the comments in discussion 33801, in the facebook/react repository
Expected tools: get_discussion_comments with arguments:
owner: facebook
repo: react
discussionNumber: 33801
The evaluation pipeline has three stages: fulfillment, evaluation, and summarization.
- Fulfillment: We run each benchmark across multiple models, providing the list of available MCP tools with every request. For each run, we record which tools the model invoked and the arguments it supplied.
- Evaluation: We process the raw outputs and compute metrics and scores.
- Summarization: We aggregate dataset-level statistics and produce the final evaluation report.
Evaluation metrics and algorithms
Our evaluation targets two aspects: whether the model selects the correct tools and whether it supplies correct arguments.
Tool selection
When benchmarks involve a single tool call, tool selection reduces to a multi-class classification problem. Each benchmark is labeled with the tool it expects, and each tool is a “class.”
Models tasked with this classification are evaluated using accuracy, precision, recall, and F1-score.
- Accuracy is the simplest measure that shows the percentage of correct classifications. In our case it means the percentage of inputs that resulted in an expected tool call. This is calculated on the whole dataset.
- Precision shows the proportion of the cases for which the tool was called correctly out of all cases where the tool was called. Low precision means the model picks the tool even for the cases where the tool is not expected to be called.
- Recall shows the proportion of correctly called tools out of all cases where the given tool call was expected. Low recall may indicate that the model doesn’t understand that the tool needs to be called and fails to call the tool or calls another tool instead.
- F1-score is a harmonic mean showing how well the model is doing in terms of both precision and recall.
If the model confuses two tools, it can result in low precision or recall for these tools.
Two similar tools that used to be confused often are list_issues and search_issues. Let's say we have 10 benchmarks for list_issues and 10 benchmarks for search_issues. Imagine list_issues is called correctly in all 10 of its cases and, on top of that, in 30% of the cases where search_issues should be called.
This means we’re going to have lower recall for search_issues and lower precision for list_issues:
Precision (list_issues) = 10 (cases where tool is called correctly) / (10 + 3 (cases where tool is called instead of search_issues)) = 0.77
Recall (search_issues) = 7 (tool was called correctly) / 10 (cases where tool is expected to be called) = 0.7
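Carrying the same example through to F1 (a calculation derived from the numbers above, not results from a real evaluation run): precision (search_issues) = 7 / 7 = 1.0 and recall (list_issues) = 10 / 10 = 1.0, so
F1 (list_issues) = 2 × 0.77 × 1.0 / (0.77 + 1.0) ≈ 0.87
F1 (search_issues) = 2 × 1.0 × 0.7 / (1.0 + 0.7) ≈ 0.82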
In order to have visibility into which tools are confused with each other, we build a confusion matrix. The confusion matrix for the search_issues and list_issues tools from the example above would look like the following:
| Expected tool / Called tool | search_issues | list_issues |
|---|---|---|
| search_issues | 7 | 3 |
| list_issues | 0 | 10 |
The confusion matrix allows us to see the reason behind low precision and recall for certain tools and tweak their descriptions to minimize confusion.
Argument correctness
Selecting the right tool isn’t enough. The model must also supply correct arguments. We’ve defined a set of argument-correctness metrics that pinpoint specific issues, making regressions easy to diagnose and fix.
We track four argument-quality metrics:
- Argument hallucination: How often the model supplies argument names that aren’t defined for the tool.
- All expected arguments provided: Whether every expected argument is present.
- All required arguments provided: Whether all required arguments are included.
- Exact value match: Whether provided argument values match the expected values exactly.
These metrics are computed for tools that were correctly selected. The final report summarizes each tool’s performance across all four metrics.
Looking forward and filling the gaps
The current evaluation framework gives us a solid read on tool performance against curated datasets, but there’s still room to improve.
More is better
Benchmark volume is the weak point of offline evaluation. With so many classes (tools), we need more robust per-tool coverage. Evaluations based on just a couple of examples aren’t dependable alone. Adding more benchmarks is always useful to increase the reliability of classification evaluation and other metrics.
Evaluation of multi-tool flows
Our current pipeline handles only single tool calls. In practice, tools are often invoked sequentially, with later calls consuming the outputs of earlier ones. To evaluate these flows, we must go beyond fetching the MCP tool list and actually execute tool calls (or mock their responses) during evaluation.
We’ll also update summarization. Today we treat tool selection as multi-class classification, which assumes one tool per input. For flows where a single input can trigger multiple tools, multi-label classification is the better fit.
Take this with you
Offline evaluation gives us a fast, safe way to iterate on MCP, so models pick the right GitHub tools with the right arguments. By combining curated benchmarks with clear metrics—classification scores for tool selection and targeted checks for argument quality—we turn vague “it seems better” into measurable progress and actionable fixes.
We’re not stopping here. We’re expanding benchmark coverage, refining tool descriptions to reduce confusion, and extending the pipeline to handle real multi-tool flows with execution or faithful mocks. These investments mean fewer regressions, clearer insights, and more reliable agents that help developers move faster.
Most importantly, this work raises the bar for product quality without slowing delivery. As we grow the suite and deepen the evaluation, you can expect steadier improvements to GitHub MCP Server—and a better, more predictable experience for anyone building with it.
The post Measuring what matters: How offline evaluation of GitHub MCP Server works appeared first on The GitHub Blog.
How to find, install, and manage MCP servers with the GitHub MCP Registry
Picture this: you walk into a grocery store and nothing makes sense. The cereal is scattered across three aisles. The milk is hiding in some random cooler near self-checkout. And those produce labels? They haven't been updated in months.
That’s exactly what discovering Model Context Protocol (MCP) servers felt like. Until now.
As a refresher, MCP is how developers connect tools, APIs, and workflows to their AI systems. Each MCP server is like an ingredient in your AI stack, whether it’s Playwright for browser automation, Notion for knowledge access, or GitHub’s own MCP server with over a hundred tools.
The new GitHub MCP Registry changes everything by giving you a single, canonical source for discovering, installing, and managing MCP servers right on GitHub.
Here’s what you need to know about finding the right tools for your AI stack, publishing your own servers, and setting up governance for your team.
In this blog, we’ll walk through how to:
- Install an MCP server
- Publish your own
- Enable governance and team use
We’ll also share a few tips and tricks for power users. Let’s go!
What’s in the registry today
Currently, the GitHub MCP Registry has 44 MCP servers, including:
- Playwright: Automate and test web apps.
- GitHub MCP server: Access 100+ GitHub API tools.
- Context7, MarkItDown (Microsoft), Terraform (HashiCorp).
- Partner servers from Notion, Unity, Firecrawl, Stripe, and more.
You can browse by tags, popularity, or GitHub stars to find the tools you need.
How to install an MCP server
The registry makes installation a one-click experience in VS Code or VS Code Insiders.
Example: Installing Playwright
- Navigate to Playwright MCP server in the GitHub MCP Registry.
- Click Install in VS Code.
- VS Code launches with a pre-filled configuration.
- Accept or adjust optional parameters (like storage paths).
That’s it. You’re ready to use Playwright in your agentic workflows.
✅ Pro tip: Remote MCP servers (like GitHub’s) use OAuth during install so you don’t need to manually handle tokens or secrets. Just authenticate once and start building.
How to publish your own MCP server
1. Install the MCP Publisher CLI
- macOS/Linux/WSL (Homebrew, recommended):
brew install mcp-publisher
- macOS/Linux/WSL (prebuilt binary, latest version):
"https://github.com/modelcontextprotocol/registry/releases/download/latest/mcp-publisher_$(uname -s | tr '[:upper:]' '[:lower:]')_$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/').tar.gz" | tar xz mcp-publisher && sudo mv mcp-publisher /usr/local/bin/
2. Initialize your server.json file
Navigate to your server’s source directory and run:
cd /path/to/your/mcp-server
mcp-publisher init
This creates a server.json file. Example:
{
"$schema": "https://static.modelcontextprotocol.io/schemas/2025-09-29/server.schema.json",
"name": "io.github.yourname/your-server",
"title": "Describe Your Server",
"description": "A description of your MCP server",
"version": "1.0.0",
"packages": [
{
"registryType": "npm",
"identifier": "your-package-name",
"version": "1.0.0",
"transport": { "type": "stdio" }
}
]
}
3. Prove you own the package
Add the required metadata for your package type.
- NPM: Add an "mcpName" field to your package.json:
{
"name": "your-npm-package",
"mcpName": "io.github.username/server-name"
}
- PyPI/NuGet: Add this to your README:
mcp-name: io.github.username/server-name
- Docker: Add a label to your Dockerfile:
LABEL io.modelcontextprotocol.server.name="io.github.username/server-name"
4. Authentication
- For GitHub-based namespaces (io.github.*), run:
mcp-publisher login github
This will open a browser for OAuth login.
- For custom domains (com.yourcompany/*), follow DNS verification steps in the official docs.
5. Publish your server
Once authenticated, publish to the registry:
mcp-publisher publish
If successful, your server will be discoverable in the MCP registry. You can verify with:
curl "https://registry.modelcontextprotocol.io/v0/servers?search=io.github.yourname/your-server"
Once you’ve completed the steps above, email partnerships@github.com and request that your server be included.
✅ Pro tips:
- Namespace: Use io.github.username/* for GitHub auth, or com.yourcompany/* for DNS-based verification.
- Remote endpoints: Add a "remotes" array in your server.json for cloud/HTTP endpoints:
"remotes": [
{
"type": "streamable-http",
"url": "https://yourdomain.com/yourserver"
}
]
- Multiple deployment options: You can list both "packages" and "remotes" for hybrid deployments (see the sketch after this list).
- Examples: See airtable-mcp-server (npm/docker/MCPB), time-mcp-nuget, time-mcp-pypi.
Automate publishing with GitHub Actions
You can automate publishing so every tagged release is published to both your package registry and the MCP registry.
Create .github/workflows/publish-mcp.yml:
name: Publish to MCP Registry
on:
push:
tags: ["v*"]
jobs:
publish:
runs-on: ubuntu-latest
permissions:
id-token: write # For OIDC
contents: read
steps:
- uses: actions/checkout@v5
# (Edit these for your package type)
- name: Setup Node.js
uses: actions/setup-node@v5
with:
node-version: "lts/*"
- name: Install dependencies
run: npm ci
- name: Build and test
run: |
npm run build --if-present
npm run test --if-present
- name: Publish to npm
run: npm publish
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
# MCP publishing (works for all package types)
- name: Download MCP Publisher
run: |
curl -L "https://github.com/modelcontextprotocol/registry/releases/download/latest/mcp-publisher_$(uname -s | tr '[:upper:]' '[:lower:]')_$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/').tar.gz" | tar xz mcp-publisher
- name: Publish to MCP Registry
run: |
./mcp-publisher login github-oidc
./mcp-publisher publish
# Optional: keep server.json version in sync with git tag
- run: |
VERSION=${GITHUB_REF#refs/tags/v}
jq --arg v "$VERSION" '.version = $v' server.json > tmp && mv tmp server.json
To trigger the workflow:
git tag v1.0.0
git push origin v1.0.0
When you publish, your server shows up in the open source registry and downstream registries (like GitHub’s) automatically pick up updates. No more notifying a dozen different registries every time you ship a new version.
✅ Pro tips:
- Host your code in a public GitHub repository to show verified ownership.
- Add tags in server.json so developers can easily discover your server by category.
- Updates propagate automatically downstream—no manual notifications required.
How to manage MCP servers in the enterprise
If you’re managing MCP usage across a large organization, governance isn’t optional. You need control over which servers your developers can install—especially when those servers interact with sensitive data.
GitHub now supports registry allow lists so admins can control which MCP servers are available to developers.
Here are the steps for admins (which may be you!):
- Stand up or connect an internal registry that follows the MCP API spec (registry + HTTP endpoint).
- Add vetted MCP servers (internal + external) to your registry.
- Point GitHub Enterprise settings to that registry endpoint.
- MCP-aware surfaces (starting with VS Code) enforce the allow list automatically.
Example: How the allow list works
Your internal registry at https://internal.mybank.com/mcp-registry returns:
{
"servers": [
{
"name": "github.com/github/mcp-server",
"version": "1.0.0"
},
{
"name": "github.com/microsoft/markitdown-mcp",
"version": "2.1.0"
},
{
"name": "internal.mybank.com/mcp-servers/custom-tools",
"version": "1.5.0"
}
]
}
When developers try to install an MCP server in VS Code, GitHub checks your registry endpoint and only allows installations from your approved list.
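To sanity-check what developers will be allowed to install, you can query the endpoint directly (using the example URL above; substitute your own registry address and path):
curl "https://internal.mybank.com/mcp-registry"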
This governance model means you can vet partnerships, run security scans, and maintain compliance, all while giving developers access to the tools they need.
✅ Pro tip: Use GitHub’s API or your existing security pipeline to vet MCP servers before adding them to your allow list.
Tips and tricks for power users
Once you’ve got the basics down, here are some shortcuts to get more out of the registry:
- Sort smarter: Use GitHub stars and org verification to quickly assess quality and legitimacy. If a server has thousands of stars and comes from a verified org like Microsoft or HashiCorp, that’s a strong signal.
- Local testing: Test your MCP server before publishing using the MCP Inspector (see the command sketch after this list). This helps you catch issues early without polluting the registry.
- Agent synergy: Copilot coding agent comes preloaded with GitHub and Playwright MCP servers. This combo enables auto-generated pull requests with screenshots of web apps, perfect for UI-heavy projects where visual validation matters.
- Tool overload fix: VS Code is rolling out semantic tool lookups, so your agent won’t flood contexts with 90+ tools. Instead, only the relevant ones surface based on your prompt. This makes working with large MCP servers like GitHub’s much more manageable.
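For the local-testing tip above, the Inspector is typically launched with npx against a local build of your server (the build path here is a placeholder; check the Inspector docs for your setup):
npx @modelcontextprotocol/inspector node build/index.js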
What’s next?
The GitHub MCP Registry is just getting started. Here’s a look at what’s on the horizon—from self-publication to enterprise adoption—so you can see where the ecosystem is heading.
- Self-publication: Expected in the next couple months. This will unlock community-driven growth and make the registry the canonical source for all public MCP servers.
- More IDE support: Other IDEs are coming. The goal is to make MCP server installation seamless regardless of where you write code.
- Enterprise features: Governance flows to help unlock MCP usage in regulated industries. Think financial services, healthcare, and other sectors where compliance isn’t negotiable.
- Agentic workflows: GitHub MCP server will start bundling tools into use-case-driven flows (e.g., “analyze repository + open pull request”) instead of just exposing raw API endpoints. This will make complex workflows feel like simple commands.
Get started today
The GitHub MCP Registry has 44 servers today and will continue growing (trust us!).
👉 Explore the MCP Registry on GitHub
👉 To nominate your server now, email partnerships@github.com.
Soon, this registry will become the single source of truth for MCP servers, giving you one place to discover, install, and govern tools without hopping across outdated registries.
The future of AI-assisted development isn’t about coding faster. It’s about orchestrating tools that amplify your impact. And the GitHub MCP Registry is where that orchestration begins.
The post How to find, install, and manage MCP servers with the GitHub MCP Registry appeared first on The GitHub Blog.
The road to better completions: Building a faster, smarter GitHub Copilot with a new custom model
Code completion remains the most widely used GitHub Copilot feature, helping millions of developers stay in the flow every day. Our team has continuously iterated on the custom models powering the completions experience in GitHub Copilot, driven by developer feedback. That work has had a big impact on giving you faster, more relevant suggestions in the editor.
We’re now delivering suggestions with 20% more accepted and retained characters, 12% higher acceptance rate, 3x higher token-per-second throughput, and a 35% reduction in latency.
These updates now power GitHub Copilot across editors and environments. We’d like to share our journey on how we trained and evaluated our custom model for code completions.
Why it matters
When Copilot completions improve, you spend less time editing and more time building. The original Copilot was optimized for the highest acceptance rate possible. However, we realized that a heavy focus on acceptance rates could lead to incorrectly favoring a high volume of simple and short suggestions.
We heard your feedback that this didn’t reflect real developer needs or deliver the highest quality experience. So, we pivoted to also optimize for accepted and retained characters, code flow, and other metrics.
- 20% higher accepted-and-retained characters results in more of each Copilot suggestion staying in your final code, not just ending up temporarily accepted and deleted later. In other words, suggestions provide more value with fewer keystrokes.
- 12% higher acceptance rate means you find suggestions more useful more often, reflecting better immediate utility.
- 3x throughput with 35% lower latency makes Copilot feel faster. It handles more requests at once while keeping your coding flow unbroken (throughput describes how much work the system can handle overall, while latency describes how quickly each individual request completes).
How we evaluate custom models
Copilot models are evaluated using combined signals from offline, pre-production, and production evaluations. Each layer helps us refine different aspects of the experience while ensuring better quality in real developer workflows.
1) Offline evaluations
Execution-based benchmark: As part of our offline evaluations, we first test against internal and public repositories with strong unit-test and scenario coverage, spanning all major languages. Each test simulates real tasks, accepts suggestions, and measures build-and-test pass rates. This emphasizes functional correctness over surface fluency.
Below is an example of a partial token completion error: the model produced dataet instead of dataset.

LLM-judge scoring: While we start with execution-based evaluation, this has downsides: it only tells if the code will compile, but the results are not always aligned with developer preferences. To ensure the best possible outcomes, we run an independent LLM to score completions across three axes:
- Quality: Ensure syntax validity, duplication/overlap, format and style consistency.
- Relevance: Focus on relevant code, avoid hallucination and overreach.
- Helpfulness: Reduce manual effort, avoid outdated or deprecated APIs.
2) Pre-production evaluations: Qualitative dogfooding
Our next step includes working with internal developers and partners to test models side-by-side in real workflows (to do the latter, we exposed the preview model to developers through Copilot’s model picker). We collect structured feedback on readability, trust, and “taste.” Part of this process includes working with language experts to improve overall completion quality. This is unique: while execution-based testing, LLM-based evaluations, dogfood testing, and A/B testing are common, we find language-specific evaluations lead to better outcomes along quality and style preferences.
3) Production-based evaluations: A/B testing
Ultimately, the lived experience of developers like you is what matters most. We measure improvements using accepted-and-retained characters, acceptance rates, completion-shown rate, time-to-first token, latency, and many other metrics. We ship only when statistically significant improvements hold up under real developer workloads.
How we trained our new Copilot completions model
Mid-training
Modern codebases use modern APIs. Before fine-tuning, we build a code-specific foundational model via mid-training using a curated, de-duplicated corpus of modern, idiomatic, public, and internal code with nearly 10M repositories and 600-plus programming languages. (Mid-training refers to the stage after the base model has been pretrained on a very large, diverse corpus, but before it undergoes final fine-tuning or instruction-tuning).
This is a critical step to ensure behaviors, new language syntax, and recent API versions are utilized by the model. We then use supervised fine-tuning and reinforcement learning while mixing objectives beyond next-token prediction—span infillings and docstring/function pairs—so the model learns structure, naming, and intent, not just next-token prediction. This helps us make the foundational model code-fluent, style-consistent, and context-aware, ready for more targeted fine-tuning via supervised fine-tuning.
Supervised fine-tuning
Newer general-purpose chat models perform well in natural language to generate code, but underperform on fill-in-the-middle (FIM) code completion. In practice, chat models experience cursor-misaligned inserts, duplication of code before the cursor (prefix), and overwrites of code after the cursor (suffix).
As we moved to fine-tuned behaviors, we trained models specialized in completions by way of synthetic fine-tuning to behave like a great FIM engine. In practice, this improves:
- Prefix/suffix awareness: Accurate inserts between tokens, mid-line continuations, full line completions, and multi-line block completions without trampling the suffix.
- Formatting fidelity: Respect local style (indentation, imports, docstrings) and avoid prefix duplication.
The result is significantly improved FIM performance. For example, here is a benchmark comparing our latest completions model to GPT-4.1-mini on OpenAI’s HumanEval Infilling Benchmarks.

Reinforcement learning
Finally, we used a custom reinforcement learning algorithm, teaching the model through rewards and penalties to internalize what makes code suggestions useful in real developer scenarios along three axes:
- Quality: Syntax-valid, compilable code that follows project style (indentations, imports, headers).
- Relevance: On-task suggestions that respect surrounding context and the file’s intent.
- Helpfulness: Suggestions that reduce manual effort and prefer modern APIs.
Together, these create completions that are correct, relevant, and genuinely useful at the cursor instead of being verbose or superficially helpful.
What we learned
After talking with programming language experts and finding success in our prompt-based approach, one of our most important lessons was adding related files like C++ header files to our training data. Beyond this, we also came away with three key learnings:
- Reward carefully: An early reinforcement learning version over-optimized for longer completions, adding too many comments in the form of “reward hacking.” To mitigate this problem, we introduced comment guardrails to keep completions concise and focused on moving the task forward while penalizing unnecessary commentary.
- Metrics matter: Being hyper-focused on a metric like acceptance rate can lead to experiences that look good on paper, but do not result in happy developers. That makes it critical to evaluate performance by monitoring multiple metrics with real-world impact.
- Train for real-world usage: We align our synthetic fine-tuning data with real-world usage and adapt our training accordingly. This helps us identify problematic patterns and remove them via training to improve real-world outcomes.
What’s next
We’re continuing to push the frontier of Copilot completions by:
- Expanding into domain-specific slices (e.g., game engines, financial, ERP).
- Refining reward functions for build/test success, semantic usefulness (edits that advance the user’s intent without bloat), and API modernity preference for up-to-date, idiomatic libraries and patterns. This is helping us shape completion behavior with greater precision.
- Driving faster, cheaper, higher-quality completions across all developer environments.
Experience faster, smarter code completions yourself. Try GitHub Copilot in VS Code >
Acknowledgments
First, a big shoutout to our developer community for continuing to give us feedback and push us to deliver the best possible experiences with GitHub Copilot. Moreover, a huge thanks to the researchers, engineers, product managers, and designers across GitHub and Microsoft who curated the training data and built the training pipeline, evaluation suites, and client and serving stack — and to the GitHub Copilot product and engineering teams for smooth model releases.
The post The road to better completions: Building a faster, smarter GitHub Copilot with a new custom model appeared first on The GitHub Blog.
How to update community health files with AI
Maintaining your project’s community health files shouldn’t get in the way of writing code. GitHub Copilot can help you update and enhance your documentation, so you can stay focused on what really matters: working on the projects that excite you most.
In this blog, we’ll touch on some of the most common community health files (focusing on README, contributor guides, and licenses) and why they’re so important for maintainers, along with actionable steps you can take to add them to your projects. ✨
What are community health files and why are they so important?
Community health files are standardized documents that help maintain a welcoming, organized, and collaborative environment in open source projects. These files communicate expectations, guide contributors, and support the overall health of a repository. They do not include technical documentation or code itself, but rather the scaffolding that supports healthy collaboration. You can typically find them in a repository’s root directory or in a special .github folder (if they need to be applied across multiple repositories).
Keeping these files up-to-date should be considered a practical investment into your project’s future and reputation, as they’re often the first touchpoint for new contributors, and their existence signals project maturity and maintainability. They not only improve transparency, consistency, and collaboration, but also help set the tone for how contributors and maintainers interact and engage productively.
If crucial community health files are missing or outdated, everyone feels the effects. Picture this: Your open source project starts gaining traction with new contributors. They want to help, but your repository doesn’t have the right files, which leads to contributors unintentionally formatting pull requests incorrectly, opening vague issues, and even introducing security vulnerabilities—all because they didn’t know the proper procedures from the start. Now, your maintainers are overwhelmed and faced with answering the same questions over and over, while also trying to retroactively enforce standards.
It’s clear that the presence of these files helps promote efficiency and clearly communicates best practices, which in turn, creates a better environment for contributors and makes life easier for maintainers—and thanks to AI, the process doesn’t have to be manual. AI tools like GitHub Copilot, for example, can automatically detect missing or stale files, suggest updates, and even generate drafts—saving time and reducing human error.
Here are three common types of community health files and why they’re so important for building a welcoming community (and don’t worry, we’ll tell you exactly how you can generate your own with Copilot later in this blog!):
README
Often one of the first things a visitor sees when viewing a repository, a README.md introduces the project and explains its purpose, along with how to get started. Intended to help remove barriers, this document gives your users crucial information they need to quickly get up and running—like what the project is, information on its features, and how to install or use it.
CONTRIBUTOR GUIDE
A contributor guide provides guidelines on how contributors can and should participate—things like coding standards and pull request instructions. This guide tells users how they can efficiently contribute and what to expect. For instance, does the project even accept contributions? Contributor guides help set standards and expectations.
LICENSE
A license specifies the legal terms under which the project can be used, modified, and distributed. In short, it tells people how they can use your software. A common example of this type of file is the MIT License.
Here are some other popular community health files:
| File | Purpose |
|---|---|
| ISSUE/PULL REQUEST TEMPLATES | Standardizes the format and information required when submitting issues or pull requests. |
| SECURITY | Provides instructions for reporting vulnerabilities and outlines the project’s security policy. |
| GOVERNANCE | Explains how the project is managed, including roles, responsibilities, and decision-making processes. |
| CODE OF CONDUCT | Defines standards for how to engage in a community. |
| SUPPORT | Shares specific guidance on how others can get help with your project. |
| FUNDING | Displays a sponsor button in your repository to increase the visibility of funding options for your open source project. |
And while it’s not exactly considered a community health file, we wanted to give an honorable mention to… the Copilot instructions file, which is an AI configuration that complements health docs. It uses the other community health files as context and tells GitHub Copilot exactly how to interact with the codebase, including what to prioritize or avoid. This file helps ground the LLM—whether you’re using GitHub Copilot or another LLM in VS Code, on github.com, or Copilot coding agent—giving it an understanding of what your project is and how it’s structured, allowing for consistency across your codebase.
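As a rough sketch (the contents below are purely illustrative, not a template from the docs), a repository-level .github/copilot-instructions.md might look like:
# Copilot instructions
This repository is a TypeScript CLI tool.
- Follow the coding standards described in CONTRIBUTING.md.
- Prefer small, well-tested functions, and update the README when behavior changes.
- Do not add new runtime dependencies without discussing them in an issue first.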
Having these kinds of files in your project is so important, especially when it comes to scaling open source projects where maintainers probably don’t have time to personally help every contributor.
That’s where time-saving tools like GitHub Copilot come in handy. Keep on reading for actionable next steps, tips, and tutorials on the most efficient ways to add these files to your repositories. ✨
Starter kit: How to update community health files using GitHub Copilot
We created a starter kit for you that explains how you can use AI to add these valuable files to your projects, complete with prompting best practices, a checklist full of things to consider, and step-by-step tutorials on how to add three common files to your repository using Copilot. Let’s dive in.
Part one: Prompting
Whether you’re starting from scratch or refining existing documentation, GitHub Copilot can help you write clearer, more consistent community health files with just a few prompts.
One thing to note: The LLMs powering GitHub Copilot are nondeterministic, which means that you can receive different outputs each time you prompt the model. Prompt engineering can drastically improve the quality and relevance of the outputs you get from an LLM, but you’ll still want to verify the accuracy of these outputs, especially when using Copilot to generate more sensitive files like licenses that have legal weight.
Part two: Checklist
This checklist helps ensure that Copilot-generated content is accurate, inclusive, secure, and aligned with your project’s goals.
🔍 Before you start
- Have you reviewed existing community health files in similar or related repositories?
- Do you have clear goals for what each file should communicate (e.g., onboarding, behavior expectations, security reporting)?
- Are you familiar with your organization’s GitHub usage policies and branding guidelines?
🧠 Prompting Copilot effectively
- Are your prompts specific and contextual? (e.g., “Generate a CONTRIBUTING.md for a Python-based open source project with a code style guide.”)
- Have you included examples or tone preferences in your prompt? (e.g., “Use inclusive language and a welcoming tone.”)
🛡️ Security & privacy
- Are you avoiding prompts that include sensitive or proprietary information (e.g., internal credentials, private URLs, confidential project names)?
- Have you reviewed your repository’s visibility settings (public vs. private) and ensured that community health files are appropriate for that audience?
- Are you familiar with GitHub Copilot’s privacy settings and how your prompts and suggestions are handled?
- Will your SECURITY.md include:
  - A clear contact method for reporting vulnerabilities?
  - A brief explanation of how security issues are triaged?
  - Any relevant links to your organization’s responsible disclosure policy?
🧾 Reviewing Copilot output
- Does the generated content reflect your project’s values and community standards?
- Have you checked for hallucinated links, names, or policies that don’t exist?
- Are all references to external resources accurate and up-to-date?
🧪 Testing & feedback
- Have you asked a teammate or contributor to review the generated files?
- Have you tested any instructions (e.g., setup steps in README or CONTRIBUTING) to ensure they work?
- Are you open to iterating based on community feedback?
Part three: Tutorial
In this tutorial, we’ll walk through how you can use Copilot to quickly and easily update README.md, a LICENSE file, and CONTRIBUTING.md.
📝 Create a README
Why make a README? Adding a README provides a clear overview of your project, helping users and contributors quickly understand its purpose, setup, and usage. Without it, potential users could abandon your repository due to confusion or lack of context.
Here’s how to make one:
- Open GitHub Copilot Chat in your IDE (e.g., VS Code).
- Switch to agent mode to enable project-aware assistance.
- Select your preferred model (e.g., Claude for strong writing and coding support).
- Ensure your project is open in the IDE so Copilot can read its context (e.g., package.json, app.tsx).
- In the chat window, type: “Help me write a README.md for my project. Ensure it includes installation instructions, a project overview, and follows standard README practices.”
- Review the generated README.md. Copilot will analyze your project files and generate a structured README.md (see the sketch below for the kind of structure to expect).
- Validate the installation instructions manually to ensure accuracy (LLMs may hallucinate).
- If satisfied, click “Keep” to save the README.md file.
- Commit the README.md to your repository.
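Exact output varies from project to project, but the structure Copilot produces usually looks something like this minimal sketch (the project name and section contents below are placeholders, not real output):

```markdown
<!-- Hypothetical sketch of a typical generated structure; "my-project" and its contents are placeholders. -->
# my-project

One-paragraph overview: what the project does and who it is for.

## Installation

Steps to install dependencies and run the app locally (for example, npm install and npm start).

## Usage

A minimal example showing the most common way to use the project.

## Contributing

A link to CONTRIBUTING.md and a short note on how to get involved.

## License

The name of the license, linking to the LICENSE file.
```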
📄 Add a license
Why make a license? A license defines how others can legally use, modify, and distribute your code, protecting both your rights and theirs. It removes ambiguity and prevents misuse, making your project safer to adopt and contribute to.
Here’s how to add one:
- Open GitHub Copilot Chat in your IDE.
- Decide what kind of license you want to add.
- Type the following prompt: “Can you add [the license you want] to my project?”
- Copilot will generate a LICENSE file with the license of your choice.
- Review the license to ensure it’s accurate (especially any copyright owner names and statements).
- If correct, click “Keep” to save the file.
- Commit the LICENSE file to your repository.
🤝 Create a contributor guide
Why make a contributor guide? A contributor guide streamlines collaboration by outlining contribution standards, workflows, and expectations. This makes it easier for others to get involved with your project. The goal is to reduce friction and errors while also encouraging consistent, scalable contributions.
Here’s how to create one:
- Open GitHub Copilot Chat in your IDE.
- Click the “+” icon to start a new chat.
- Type this prompt: “Create a contributing guide file that follows best practices and link it in the README.”
- Copilot will generate a CONTRIBUTING.md file with:
  - Contribution guidelines
  - Code standards
  - Pull request instructions
  - Issue reporting process
- Review and edit the guide to match your team’s workflow (see the sketch below for a typical outline).
- Save and commit the CONTRIBUTING.md file.
- Update your README to include a link to the contributor guide:
## Contributing
See CONTRIBUTING.md for guidelines.
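As for the guide itself, a generated CONTRIBUTING.md typically follows an outline along these lines (a hedged sketch only; the section names are illustrative and should be adapted to your project):

```markdown
<!-- Hypothetical outline only; adjust sections to match your team's actual workflow. -->
# Contributing to this project

## Getting started
How to fork, clone, and set up a local development environment.

## Code standards
Formatting, linting, and naming conventions contributors are expected to follow.

## Pull requests
Branch naming, commit message style, and what to include in a PR description.

## Reporting issues
How to file a bug report or feature request, and what details to include.
```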
Take this with you
GitHub Copilot isn’t just for writing code—it can be your documentation sidekick, too. By helping you write smarter, faster, and with less friction, Copilot sharpens your community health files, scales best practices, and turns good intentions into great documentation.
The result? Better docs, stronger communities, and happier maintainers.
Read the Docs to learn more about GitHub Copilot features or get started today.
The post How to update community health files with AI appeared first on The GitHub Blog.
]]>The post Copilot: Faster, smarter, and built for how you work now appeared first on The GitHub Blog.
]]>You probably remember when GitHub Copilot first showed up in your editor with that little gray box. It was fast, surprising, and sometimes weird. But it hinted at something bigger: AI could actually help you code, not just autocomplete it.
Fast forward to today, and AI is part of our daily workflows. From Cursor to Windsurf and Claude Code to Gemini to OpenAI Codex, there’s no shortage of new tools. And that’s great. Developers need options.
But with 20 million-plus developers across IDEs, the command line, and pull requests, GitHub Copilot is the most-used AI tool among developers, according to a recent Pragmatic Engineer survey. Devs have used Copilot to accept more than 3 billion code suggestions to date. And every month, Copilot helps deliver millions of code reviews and contribute 1.2 million pull requests, directly inside GitHub.
And because GitHub is where your code already lives (plus your pull requests, reviews, and tests), Copilot doesn’t stop at writing code. It plugs into everything you rely on via the GitHub MCP Server.
We haven’t always been the fastest (though our Changelog may beg to differ) or the loudest. But we’ve been building Copilot since before ChatGPT existed, and we are focused on one purpose: to help developers turn TODOs into committed code. And while some chase the bleeding edge, we know developers don’t want their production code balanced on it.
All that to say: if you tried Copilot early on, things have changed in some pretty big ways.

From autocomplete to actual collaboration 💻
If 2024 was about showing what’s possible with AI, 2025 is about making it practical. Copilot has quietly grown from a neat autocomplete trick into a multi-modal, multi-model assistant that actually understands your projects and helps you move them forward.
After opening up support for multiple models from different providers in 2024, we’ve been shipping new models almost as fast as they drop, from OpenAI’s latest releases to Google’s Gemini 2.0 Flash.
This evolution didn’t happen by accident. Developers told us what worked, what didn’t, and that they wanted more powerful agentic workflows and multi-file editing. So we made that happen.
And that’s just one part of how far Copilot’s come. It’s all part of a bigger goal: making Copilot smarter without you ever needing to install or configure a thing.
From idea to merge in record time ⚡
Over the last year, raw speed and agentic workflows helped define a new crop of AI tools. We took that as a challenge.
- Agent mode: Copilot now takes on cross-file tasks, runs commands, refactors entire modules, and suggests terminal operations—all without leaving your editor.
- Coding agent: Assign an issue to Copilot, and it drafts a pull request with code, tests, and context from your project. Coding agent now contributes to roughly 1.2 million pull requests per month.
- Next-edit suggestions: Copilot predicts the next change you’ll make and offers it inline. One Tab and you’re done.
- Low-latency completions: Most Copilot responses now render in under 400 ms (fast enough that you stop noticing them).
- Copilot CLI: The same brains, now in your terminal. Setup, debug, and script without switching windows.
- Multi-model routing: Different jobs call for different brains. Copilot gives you access to multiple LLMs from leading frontier AI firms.
The result: fewer interruptions, faster loops, and a workflow that finally keeps pace with how you think.
AI that scales with your workflow 📐
Copilot doesn’t live in a new environment you need to learn. It’s part of the same ecosystem you already use, and scales with it.
- JetBrains + VS Code + CLI parity: Same Copilot, wherever you build.
- Custom instructions: Drop a .copilot-instructions.md file in to teach Copilot your naming conventions, test frameworks, comment formats.
- GitHub MCP Server: Lets any AI tool securely access your GitHub context (pull requests, issues, actions) without leaving GitHub.
- Workspace prompt files: Reusable blueprints for consistent prompts across teams.
- 20M+ developers strong: Every Copilot update compounds through the world’s largest network of real developer data (and feedback).
Copilot isn’t a separate tool you “add” to GitHub. It’s part of what makes GitHub a full-stack development platform. Other tools might help you code; Copilot helps you build, test, secure, and ship.
Smarter, cleaner, and safer code 🔍
Fast is nice. Correct is better (ask us how we know). We’ve spent a lot of cycles quietly leveling up Copilot’s overall code quality and security guardrails where they matter most to you.
- Copilot Autofix: Detects and patches vulnerabilities automatically (it was used to fix over a million vulnerabilities this year alone).
- Code review: Summarizes diffs, flags logic bugs, and suggests fixes right inside your pull requests with a tool that powers millions of code reviews a month on GitHub.
- Improved model reasoning: Generates more readable, test-passing code with fewer lint errors and fewer regressions.
- CodeQL integration: Integrations with GitHub Advanced Security, Dependabot, and GitHub Actions keep your supply chain solid.
- Built-in privacy: Enterprise isolation, audit logs, and tenant-level control mean your work stays off the grid.
Our research shows new code written with Copilot tends to have higher readability, better reliability, and improved maintainability scores.
Here’s the good news: Copilot’s backed by the same security stack that protects the world’s largest open source ecosystem and more than 90% of Fortune 100 companies.

Real talk: Copilot vs. the rest 👀
Let’s be honest: there are some great tools out there that make agentic coding workflows feel intuitive and bring real polish to multi-file editing.
Copilot lives in GitHub. That means it’s close to everything else you do, whether it’s your pull requests, GitHub Actions workflows, or CI/CD pipelines. Every day, GitHub powers over 3 million pull request merges and 50 million actions runs. And Copilot lives in that flow.
Other tools might help you write code faster. But Copilot helps you ship better software.
That means:
- No migration, no new IDE, no new habits: Copilot lives inside the tools you already use.
- Full-stack awareness: Your pull requests, reviews, tests, and workflows are part of the same conversation.
- End-to-end coverage: Copilot brings AI assistance to real-world delivery.
What’s next 🚀
We’re just getting started.
At the end of this month, GitHub Universe 2025 kicks off, and you can expect a lot of news. From smarter agent workflows to deeper multi-model integration and next-gen security features, we’re building what’s next for how software gets built.
Because our goal hasn’t changed. We’re here to help every developer commit code faster instead of chasing TODOs.
Ready to see how far we’ve come? Get started with GitHub Copilot >
The post Copilot: Faster, smarter, and built for how you work now appeared first on The GitHub Blog.
]]>The post How GitHub Copilot and AI agents are saving legacy systems appeared first on The GitHub Blog.
]]>Picture this: you’re a developer in 2025, and your company just told you they need to modernize a mainframe system that processes millions of ATM transactions daily. We’re talking about COBOL, a programming language that’s been around for 65 years. That’s older than the internet.
Now, your first instinct might be to laugh or maybe cry a little. But here’s the thing—COBOL isn’t going anywhere. In fact, it’s powering some of the largest and most critical systems on the planet right now.
The problem? Finding developers who understand COBOL is like finding unicorns. The original developers are retiring, and yet 200 billion lines of COBOL code are still running our banks, insurance companies, and government systems.
But here’s the plot twist: we now have the opportunity to support the unicorns. We have GitHub Copilot and autonomous AI agents.
Meet the developer who’s modernizing COBOL (without learning COBOL)
I recently spoke with Julia Kordick, Microsoft Global Black Belt, who’s modernizing COBOL systems using AI. What’s remarkable? She never learned COBOL.
Julia brought her AI expertise and worked directly with the people who had decades of domain knowledge. That partnership is the key insight here. She didn’t need to become a COBOL expert. Instead, she focused on what she does best: designing intelligent solutions. The COBOL experts provided the legacy system knowledge.
When this whole idea of Gen AI appeared, we were thinking about how we can actually use AI to solve this problem that has not been really solved yet.
Julia Kordick, Microsoft Global Black Belt
The three-step framework for AI-powered legacy modernization
Julia and her team at Microsoft have cracked the code (pun intended) with a systematic approach that works for any legacy modernization project, not just COBOL. Here’s their GitHub Copilot powered, battle-tested framework.
Step 1: Code preparation (reverse engineering)
The biggest problem with legacy systems? Organizations have no idea what their code actually does anymore. They use it, they depend on it, but understanding it? That’s another story.
This is where GitHub Copilot becomes your archaeological tool. Instead of hiring consultants to spend months analyzing code, you can use AI to:
- Extract business logic from legacy files.
- Document everything in markdown for human review.
- Automatically identify call chains and dependencies.
- Clean up irrelevant comments and historical logs.
- Add additional information as comments where needed.
| 💡Pro tip: Always have human experts review AI-generated analysis. AI is incredible at pattern recognition, but domain knowledge still matters for business context. |
Here’s what GitHub Copilot generates for you:
# Business Logic Analysis Generated by GitHub Copilot
## File Inventory
- listings.cobol: List management functionality (~100 lines)
- mainframe-example.cobol: Full mainframe program (~10K lines, high complexity)
## Business Purpose
Customer account validation with balance checking
- Validates account numbers against master file
- Performs balance calculations with overdraft protection
- Generates transaction logs for audit compliance
## Dependencies Discovered
- DB2 database connections via SQLCA
- External validation service calls
- Legacy print queue system
Step 2: Enrichment (making code AI-digestible)
You usually need to add context to help AI understand your code better. Here’s what that looks like:
Translation: If your code has Danish, German, or other non-English comments, translate them. Models work better with English context.
Structural analysis: COBOL has deterministic patterns. Even if you’ve never written COBOL, you can leverage these patterns because they’re predictable. Here’s how:
COBOL programs always follow the same four-division structure:
- IDENTIFICATION DIVISION (metadata about the program)
- ENVIRONMENT DIVISION (file and system configurations)
- DATA DIVISION (variable declarations and data structures)
- PROCEDURE DIVISION (the actual business logic)
Ask GitHub Copilot to map these divisions for you. Use prompts like:
"Identify all the divisions in this COBOL file and summarize what each one does"
"List all data structures defined in the DATA DIVISION and their purpose"
"Extract the main business logic flow from the PROCEDURE DIVISION"
The AI can parse these structured sections and explain them in plain English. You don’t need to understand COBOL syntax. You just need to know that COBOL’s rigid structure makes it easier for AI to analyze than more flexible languages.
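For example, asking Copilot to identify the divisions might return a summary along these lines (a hypothetical sketch; the program name and details are invented placeholders, not real output):

```markdown
<!-- Hypothetical example output; "customer-balance.cobol" and its contents are placeholders. -->
## Division map: customer-balance.cobol

- IDENTIFICATION DIVISION: program CUSTBAL, last maintained 1998, author unknown
- ENVIRONMENT DIVISION: reads the CUSTOMER-MASTER file, writes the DAILY-REPORT file
- DATA DIVISION: customer record layout, working totals, overdraft flag
- PROCEDURE DIVISION: validates the account number, applies overdraft rules, writes an audit line
```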
Documentation as source of truth: Save everything AI generates as markdown files that become the primary reference. Julia explained it this way: “Everything that you let Copilot generate as a preparation, write it down as a markdown file so that it can actually reference these markdown files as source of truth.”
| 💡Pro tip: COBOL’s verbosity is actually an advantage here. Statements like ADD TOTAL-SALES TO ANNUAL-REVENUE are almost self-documenting. Ask Copilot to extract these business rules into natural language descriptions. |
Step 3: Automation aids (scaling the process)
Once you’ve analyzed and enriched individual files, you need to understand how they all fit together. This is where you move from using Copilot interactively to building automated workflows with AI agents.
Julia’s team built a framework using Microsoft Semantic Kernel, which orchestrates multiple specialized agents. Each agent has a specific job, and they work together to handle the complexity that would overwhelm a single AI call.
Here’s what this orchestration looks like in practice:
- Call chain mapping: Generate Mermaid diagrams showing how files interact. One agent reads your COBOL files, another traces the CALL statements between programs, and a third generates a visual diagram. You end up with a map of your entire system without manually tracing dependencies.
- Test-driven modernization: Extract business logic (agent 1), generate test cases that validate that logic (agent 2), then generate modern code that passes those tests (agent 3). The tests become your safety net during migration.
- Dependency optimization: Identify utility classes and libraries that you can replace with modern equivalents. An agent analyzes what third-party COBOL libraries you’re using, checks if modern alternatives exist, and flags opportunities to simplify your migration.
Think of it like this: Copilot in your IDE is a conversation. This framework is a production line. Each agent does one thing well, and the orchestration layer manages the workflow between them.
| 💡Pro tip: Use Mermaid diagrams to visualize complex dependencies before making any changes. It helps you catch edge cases early. You can generate these diagrams by asking Copilot to trace all CALL statements in your codebase and output them in Mermaid syntax. Mermaid chart example: |

The reality check: It’s not a silver bullet
Julia’s brutally honest about limitations:
Everyone who’s currently promising you, ‘hey, I can solve all your mainframe problems with just one click’ is lying to you.
The reality is:
- Humans must stay in the loop for validation.
- Each COBOL codebase is unique and complex.
- We’re early in the agentic AI journey.
- Full automation is probably at least five years away.
But that doesn’t mean we can’t make massive progress today.
See it in action: the Azure samples framework
Julia and her team have open-sourced their entire framework. It’s built with Microsoft Semantic Kernel for agentic orchestration and includes:
- Multiple specialized agents: DependencyMapperAgent, COBOLAnalyzerAgent, JavaConverterAgent
- Cost tracking: See exactly how much each AI operation costs (usually $2-5 per 1000 lines analyzed)
- Human validation points: Built-in checkpoints for expert review
- doctor.sh: A configuration and testing script that gets you started quickly
Try running the COBOL modernization framework:
- Fork the repository: aka.ms/cobol
- Set up your environment: Configure Azure OpenAI endpoint (or use local models for sensitive data)
- Run the doctor script: ./doctor.sh doctor validates your setup and dependencies
- Start modernization: ./doctor.sh run begins the automated process
# Quick setup for the impatient developer
git clone https://github.com/Azure-Samples/Legacy-Modernization-Agents
cd Legacy-Modernization-Agents
./doctor.sh setup
./doctor.sh run
The business case that changes everything
This isn’t just about technical debt. It’s about business survival. Organizations are facing a critical shortage of COBOL expertise right when they need it most.
The traditional approach has been to hire expensive consultants, spend 5+ years on manual conversion, and end up with auto-generated code that’s unmaintainable. I’ve seen this play out at multiple organizations. The consultants come in, run automated conversion tools, hand over thousands of lines of generated code, and leave. Then the internal team is stuck maintaining code they don’t understand in a language they’re still learning.
The AI-powered approach changes this. You use AI to understand business logic, generate human-readable modern code, and maintain control of your intellectual property. Your team stays involved throughout the process. They learn the business logic as they go. The code that comes out the other end is something your developers can actually work with.
Julia explained the shift:
A lot of customers do not want to give all their intellectual property, like a hundred percent, to a partner anymore, right? They want to keep it in check.
Start here: your path to becoming the modernization hero
Whether you’re dealing with COBOL, ancient Java, or any legacy system, here’s how you can start today:
Start small
- Identify one problematic legacy system (start with fewer than 5,000 lines)
- Use GitHub Copilot to analyze a single file
- Document what you discover in markdown
- Share findings with your team
Build your AI toolkit
- Experiment with the Azure Samples framework
- Learn prompt engineering for code analysis (try: “Analyze this COBOL program and explain its business purpose in simple terms”)
- Practice iterative modernization techniques
Think beyond code
- Consider nonfunctional requirements for cloud-native design
- Plan for distributed systems architecture
- Remember: most COBOL programs are doing simple CRUD operations. They don’t need the complexity of a mainframe. They need the simplicity of modern architecture.
Here’s a challenge: Find a legacy system in your organization. Six-month-old code counts as legacy in our industry. Try using GitHub Copilot to:
- Generate business logic documentation
- Identify potential modernization opportunities
- Create a migration plan with human validation checkpoints
Share your results on LinkedIn and tag me. I’d love to see what you discover.
The best time to start is now
The most powerful insight from my conversation with Julia is this: AI doesn’t replace developer expertise. It amplifies it.
COBOL experts bring irreplaceable domain knowledge. Modern developers bring fresh perspectives on architecture and best practices. AI brings pattern recognition and translation capabilities at scale.
When these three forces work together, legacy modernization transforms from an impossible challenge into an achievable project.
The best time to modernize legacy code was 10 years ago. The second-best time is now.
Special thanks to Julia Kordick, Microsoft Global Black Belt, who shared her insights and experiences that made this blog post possible.
Ready to dive deeper? Check out the full blog post about this project at aka.ms/cobol-blog and connect with Julia on LinkedIn for the latest updates.
The age of legacy code doesn’t have to be a barrier anymore. With the right AI tools and framework, even 65-year-old COBOL can become approachable, maintainable, and modern.
What legacy system will you modernize next? Start building with GitHub Copilot now >
The post How GitHub Copilot and AI agents are saving legacy systems appeared first on The GitHub Blog.
]]>The post GitHub Copilot CLI: How to get started appeared first on The GitHub Blog.
]]>You already live in the terminal. You clone repositories there, install dependencies, debug issues, and run builds. But until now, when you needed AI help, you had to leave the CLI and open your editor or browser. Not anymore.
GitHub Copilot CLI brings that same assistance straight to your shell. No switching contexts, no breaking flow. Just you, your terminal, and an AI that can actually help you get things done.
Install once, authenticate, and start working
With Copilot CLI, you don’t have to juggle your API keys. Just install the assistant, sign in with your existing GitHub Copilot Pro, Pro+, Business, or Enterprise plan, and go.
# 1. Install via npm
npm install -g @github/copilot
# 2. Launch Copilot CLI
copilot
# 3. Authenticate with your GitHub account
/login
Requirements:
- Node v22+
- npm version 10 or later
- Launch Copilot CLI
- Log in with your GitHub account
From here, you can get hands-on immediately—debugging tests, spinning up preview deploys, or writing one-off scripts—without leaving your terminal.
Use case: From clone to pull request in the terminal
Imagine this: You’ve just cloned a repository you want to contribute to. Normally, you’d spend time reading through the README, manually checking dependencies, and combing through open issues to find a place to start. But with Copilot CLI, you can offload all of that.
1. Clone and launch Copilot
First things first. Grab the repository you want to work on and run Copilot CLI in your terminal.
gh repo clone github/spec-kit
cd spec-kit
copilot
Copilot greets you in the terminal. Type / at any time to see available commands, or use Ctrl+R to see logs of the commands Copilot has run on your behalf.
2. Get oriented in a new codebase
Once you’re inside the project, the first step is understanding how everything fits together. Instead of scrolling through files manually or piecing it together from the README, ask Copilot to explain it for you.
You say: Explain the layout of this project.
Copilot inspects the repository using find, tree, and the README, then returns a clean Markdown summary. No more hunting through nested directories trying to figure out where things live.
3. Check your environment
After you know the lay of the land, the next question is: can you actually build it? Normally, you’d spend time hunting for dependencies and making sure you’ve installed the right versions. Copilot now handles that.
You say: Make sure my environment is ready to build this project.
Copilot verifies dependencies, installs missing tools (like Go for the GitHub CLI), and confirms you can build locally. All without you having to comb through setup docs or run trial-and-error commands.
4. Find a good first issue
Now that you’re set up, you’ll want to start contributing. Instead of browsing through dozens of open issues, let Copilot surface the ones that make sense for you.
You say: Find good first issues in this repository and rank them by difficulty.
Copilot queries GitHub Issues with its built-in GitHub MCP server and suggests a curated list, complete with difficulty levels. Instead of scanning dozens of issues, you can dive straight into a task that matches your comfort zone.
5. Start implementing
Now comes the real work. Normally, after finding an issue to work on, you’d create a branch, open the right files, make edits, and double-check your changes before committing. With Copilot CLI, you can let it draft the fix for you while you stay in control at every step.
You say: Start implementing issue #1234. Show me the diff before applying.
Copilot drafts a plan, makes the edits, and presents the diff. You stay in control, review, and approve before changes are applied.
👀 Pro tip: You can @-mention files in your prompt if you want Copilot to focus on specific parts of the code.
6. Commit and open a draft pull request
Once the changes look good, the next step is packaging them up and sharing your work. Normally, that means staging files, writing a commit message, pushing a branch, and opening a pull request, which is all a bit of a dance in Git. Copilot CLI streamlines the whole flow so you can stay focused on the code.
You say: Stage changes, write a commit referencing #1234, and open a draft PR.
Copilot will then stage files, write the commit message, and open a draft pull request for you to review.
7. Bonus: Kill that process hogging your port
Let’s say you’ve hit another common headache: a process hogging a port. You know, that moment when you try to start your dev server and it tells you a port (let’s say 8080 for this example) is already in use, and you have to go hunting for the right lsof command and flags.
You say: What process is using port 8080? Kill it and verify the port is free.
Copilot runs the right lsof command, shows the PID, kills the process, and verifies it’s gone. No more Googling arcane flags or trying to remember if it’s lsof -i :8080 or lsof -t -i:8080 or something else entirely.
I’m horrible at remembering commands, especially ones I use infrequently. With Copilot CLI, I just defer these tasks straight to it. Maybe I’ll remember the command next time, or maybe (probably) not. But I’ll definitely ask Copilot again.
Stay in control
Copilot always asks before running commands or accessing directories. This is critical when you’re giving an AI access to run things on your machine.
Before Copilot can execute anything, it will prompt you to:
- Allow once
- Allow always for this command
- Deny
You can also:
- Use
/sessionto view what’s currently allowed - Reset permissions at any time with
/reset - Add directories to your allowed list with
/add-directory
Extend with MCP servers
Copilot CLI ships with the GitHub MCP server already installed and running. This is what powers the issue search and repository interactions. But you can add any MCP server you want from the registry using /mcp.
Want to add Playwright for browser testing? Or integrate with your company’s internal tools? You can customize and extend Copilot CLI to match your workflow.

Why this matters
Here’s what I appreciate most about Copilot CLI: It meets me where I already work. I spend a lot of time in the terminal anyway, jumping between repositories, checking logs, running builds. Having Copilot right there means I’m not constantly switching contexts between my IDE, browser, and command line just to get AI help.
When I’m onboarding contributors to our projects or exploring a new codebase myself, I can stay in that flow. I can ask about the project structure, verify dependencies, find issues to work on, and start implementing without bouncing around between tools. That consistency matters when you’re trying to maintain momentum.
This isn’t about replacing your IDE. It’s about having the right tool in the right place.
What’s next
Copilot CLI is in public preview, and your feedback will shape our roadmap. We have ideas for what’s coming next, but we want to know what matters most to you.
👉 Install it today with:
npm install -g @github/copilot
Then share your experience using /feedback.
Start using GitHub Copilot CLI >
The post GitHub Copilot CLI: How to get started appeared first on The GitHub Blog.
]]>The post How to build reliable AI workflows with agentic primitives and context engineering appeared first on The GitHub Blog.
]]>Many developers begin their AI explorations with a prompt. Perhaps you started the same way: You opened GitHub Copilot, started asking questions in natural language, and hoped for a usable output. This approach can work for simple fixes and code suggestions, but as your needs get more complex—or as your work gets more collaborative—you’re going to need a more foolproof strategy.
This guide will introduce you to a three-part framework that transforms this ad-hoc style of AI experimentation into a repeatable and reliable engineering practice. At its core are two concepts: agentic primitives, which are reusable, configurable building blocks that enable AI agents to work systematically; and context engineering, which ensures your AI agents always focus on the right information. By familiarizing yourself with these concepts, you’ll be able to build AI systems that can not only code independently, but do so reliably, predictably, and consistently.

Markdown prompt engineering + agent primitives + context engineering = reliability
Whether you’re new to AI-native development or looking to bring deeper reliability to your agent workflows, this guide will give you the foundation you need to build, scale, and share intelligent systems that learn and improve with every use.
What are agent primitives?
The three-layer framework below turns ad-hoc AI experimentation into a reliable, repeatable process. It does this by combining the structure of Markdown; the power of agent primitives, simple building blocks that give your AI agents clear instructions and capabilities; and smart context management, so your agents always get the right information (not just more information).
Layer 1: Use Markdown for more strategic prompt engineering
We’ve written about the importance of prompt engineering. But here’s what you need to know: The clearer, more precise, and more context-rich your prompt, the better and more accurate your outcome. This is where Markdown comes in. With Markdown’s structure (its headers, lists, and links), you can naturally guide AI’s reasoning, making outputs more predictable and consistent.
To provide a strong foundation for your prompt engineering, try these techniques with Markdown as your guide:
- Context loading: [Review existing patterns](./src/patterns/). In this case, links become context injection points that pull in relevant information, either from files or websites.
- Structured thinking: Use headers and bullets to create clear reasoning pathways for the AI to follow.
- Role activation: Use phrases like “You are an expert [in this role].” This triggers specialized knowledge domains and will focus the AI’s responses.
- Tool integration: Use MCP tool tool-name. This lets your AI agent run code in a controlled, repeatable, and predictable way on MCP servers.
- Validation gates: “Stop and get user approval.” Make sure there is always human oversight at critical decision points.
For example, instead of saying, Find and fix the bug, use the following:
You are an expert debugger, specialized in debugging complex programming issues.
You are particularly great at debugging this project, whose architecture and quirks can be consulted in the [architecture document](./docs/architecture.md).
Follow these steps:
1. Review the [error logs](./logs/error.log) and identify the root cause.
2. Use the `azmcp-monitor-log-query` MCP tool to retrieve infrastructure logs from Azure.
3. Once you find the root cause, think about 3 potential solutions with trade-offs
4. Present your root cause analysis and suggested solutions with trade-offs to the user and seek validation before proceeding with fixes - do not change any files.
Once you’re comfortable with structured prompting, you’ll quickly realize that manually crafting perfect prompts for every task is unsustainable. (Who has the time?) This is where the second step comes in: turning your prompt engineering insights into reusable, configurable systems.
Layer 2: Agentic primitives: Deploying your new prompt engineering techniques
Now it’s time to implement all of your new strategies more systematically, instead of prompting ad hoc. These configurable tools will help you do just that.
Core agent primitives
When it comes to AI-native development, a core agent primitive refers to a simple, reusable file or module that provides a specific capability or rule for an agent.
Here are some examples:
- Instructions files: Deploy structured guidance through modular .instructions.md files with targeted scope. At GitHub, we offer custom instructions to give Copilot repository-specific guidance and preferences.
- Chat modes: Deploy role-based expertise through .chatmode.md files with MCP tool boundaries that prevent security breaches and cross-domain interference. For example, professional licenses that keep architects from building and engineers from planning.
- Agentic workflows: Deploy reusable prompts through .prompt.md files with built-in validation.
- Specification files: Create implementation-ready blueprints through .spec.md files that ensure repeatable results, whether the work is done by a person or by AI.
- Agent memory files: Preserve knowledge across sessions through .memory.md files (see the sketch below for what one might look like).
- Context helper files: Optimize information retrieval through .context.md files.
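The framework doesn’t prescribe an exact format for memory files, but a .memory.md file can be as simple as a running log of decisions and constraints the agent should remember across sessions (a minimal sketch; the entries below are invented examples):

```markdown
<!-- Hypothetical .memory.md sketch; the decisions below are invented examples. -->
# Project memory

## Decisions
- 2025-06-12: Chose PostgreSQL over MongoDB for relational reporting needs.
- 2025-07-03: All public APIs are versioned under /api/v1; breaking changes require a new version.

## Known constraints
- The payments module cannot be refactored until the Q4 compliance audit is complete.
```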
This transformation might seem complex, but notice the pattern: What started as an ad-hoc request became a systematic workflow with clear handoff points, automatic context loading, and built-in validation.
When you use these files and modules, you can keep adjusting and improving how your AI agent works at every step. Every time you iterate, you make your agent a little more reliable and consistent. And this isn’t just random trial and error — you’re following a structured, repeatable approach that helps you get better and more predictable results every time you use the AI.
| 💡 Native VS Code support: While VS Code natively supports .instructions.md, .prompt.md, and .chatmode.md files, this framework takes things further with .spec.md, .memory.md, and .context.md patterns that unlock even more exciting possibilities for AI-powered software development. |
With your prompts structured and your agentic primitives set up, you may encounter a new challenge: Even the best prompts and primitives can fail when they’re faced with irrelevant context or they’re competing for limited AI attention. The third layer, which we’ll get to next, addresses this through strategic context management.
Layer 3: Context engineering: Helping your AI agents focus on what matters
Just like people, LLMs have limited memory (context windows) and can sometimes be forgetful. If you’re strategic about the context you give them, you can help them focus on what’s relevant and enable them to get started and work more quickly. This preserves valuable context window space and improves their reliability and effectiveness.
Here are some techniques to make sure they get the right context—this is called context engineering:
- Session splitting: Use distinct agent sessions for different development phases and tasks. For example, use one session for planning, one for implementation, and one for testing. If an agent has fresh context, it’ll have better focus. It’s always better to have a fresh context window for complex tasks.
- Modular and custom rules and instructions: Apply only relevant instructions through targeted .instructions.md files using applyTo YAML frontmatter syntax. This preserves context space for actual work and reduces irrelevant suggestions.
- Memory-driven development: Leverage agent memory through .memory.md files to maintain project knowledge and decisions across sessions and time.
- Context optimization: Use .context.md context helper files strategically to accelerate information retrieval and reduce cognitive load (a minimal sketch follows this list).
- Cognitive focus optimization: Use chat modes in .chatmode.md files to keep the AI’s attention on relevant domains and prevent cross-domain interference. Less context pollution means you’ll have more consistent and accurate outputs.
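Likewise, a .context.md helper file can act as a curated index that points the agent straight to the right places instead of letting it wander the repository (a minimal sketch; the paths and descriptions are hypothetical):

```markdown
<!-- Hypothetical .context.md sketch; paths and descriptions are placeholders. -->
# Context: payments service

- [API routes](./src/payments/routes.ts): all public endpoints for the service
- [Domain rules](./docs/payments-rules.md): refund limits and currency handling
- [Test fixtures](./tests/fixtures/): canonical sample payloads used across the suite
```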
Agentic workflows: The complete system in action
Now that you understand all three layers, you can see how they combine into agentic workflows—complete, systematic processes where all of your agentic primitives are working together, understanding your prompts, and using only the context they need.
These agentic workflows can be implemented as .prompt.md files that coordinate multiple agentic primitives into processes, designed to work whether executed locally in your IDE, in your terminal or in your CI pipelines.
Tooling: how to scale agent primitives
Now that you understand the three-layer framework and that the agentic primitives are essentially executable software written in natural language, the question is: How can you scale these Markdown files beyond your individual development workflow?
Natural language as code
The answer mirrors every programming ecosystem’s evolution. Just like JavaScript evolved from browser scripts to using Node.js runtimes, package managers, and deployment tooling, agent primitives need similar infrastructure to reach their full potential.
This isn’t just a metaphor: These .prompt.md and .instructions.md files represent a genuine new form of software development that requires proper tooling infrastructure.
Here’s what we mean: Think of your agent primitives as real pieces of software, just written in natural language instead of code. They have all the same qualities: You can break complex tasks into smaller pieces (modularity), use the same instructions in multiple places (reusability), rely on other tools or files (dependencies), keep improving and updating them (evolution), and share them across teams (distribution).
That said, your natural language programs are going to need the same infrastructure support as any other software.
Agent CLI runtimes
Most developers start by creating and running agent primitives directly in VS Code with GitHub Copilot, which is ideal for interactive development, debugging, and refining daily workflows. However, when you want to move beyond the editor—to automate your workflows, schedule them, or integrate them into larger systems—you need agent CLI runtimes like Copilot CLI.
These runtimes let you execute your agent primitives from the command line and tap into advanced model capabilities. This shift unlocks automation, scaling, and seamless integration into production environments, taking your natural language programs from personal tools to powerful, shareable solutions.
Runtime management
While VS Code and GitHub Copilot handle individual development, some teams may want additional infrastructure for sharing, versioning, and productizing their agent primitives. Managing multiple Agent CLI runtimes can become complex quickly, with different installation procedures, configuration requirements, and compatibility matrices.
APM (Agent Package Manager) solves this by providing unified runtime management and package distribution. Instead of manually installing and configuring each vendor CLI, APM handles the complexity while preserving your existing VS Code workflow.
Here’s how runtime management works in practice:
# Install APM once
curl -sSL https://raw.githubusercontent.com/danielmeppiel/apm/main/install.sh | sh
# Optional: setup your GitHub PAT to use GitHub Copilot CLI
export GITHUB_COPILOT_PAT=your_token_here
# APM manages runtime installation for you
apm runtime setup copilot # Installs GitHub Copilot CLI
apm runtime setup codex # Installs OpenAI Codex CLI
# Install MCP dependencies (like npm install)
apm install
# Compile Agent Primitive files to Agents.md files
apm compile
# Run workflows against your chosen runtime
# This will trigger 'copilot -p security-review.prompt.md' command
# Check the example apm.yml file a bit below in this guide
apm run copilot-sec-review --param pr_id=123
As you can see, your daily development stays exactly the same in VS Code, APM installs and configures runtimes automatically, your workflows run regardless of which runtime is installed, and the same apm run command works consistently across all runtimes.
Distribution and packaging
Agent primitives’ similarities to traditional software become most apparent when you get to the point of wanting to share them with your team or deploying them into production—when you start to require things like package management, dependency resolution, version control, and distribution mechanisms.
Here’s the challenge: You’ve built powerful agent primitives in VS Code and your team wants to use them, but distributing Markdown files and ensuring consistent MCP dependencies across different environments becomes unwieldy. You need the equivalent of npm for natural language programs.
APM provides this missing layer. It doesn’t replace your VS Code workflow—it extends it by creating distributable packages of agent primitives complete with dependencies, configuration, and runtime compatibility that teams can share, just like npm packages.
Package management in practice
# Initialize new APM project (like npm init)
apm init security-review-workflow
# Develop and test your workflow locally
cd security-review-workflow
apm compile && apm install
apm run copilot-sec-review --param pr_id=123
# Package for distribution (future: apm publish)
# Share apm.yml and Agent Primitive files with team
# Team members can install and use your primitives
git clone your-workflow-repo
cd your-workflow-repo && apm compile && apm install
apm run copilot-sec-review --param pr_id=456
The benefits compound quickly: You can distribute tested workflows as versioned packages with dependencies, automatically resolve and install required MCP servers, track workflow evolution and maintain compatibility across updates, build on (and contribute to) shared libraries from the community, and ensure everyone’s running the same thing.
Project configuration
The following apm.yml configuration file serves as the package.json equivalent for agent primitives, defining scripts, dependencies, and input parameters:
# apm.yml - Project configuration (like package.json)
name: security-review-workflow
version: 1.2.0
description: Comprehensive security review process with GitHub integration
scripts:
copilot-sec-review: "copilot --log-level all --log-dir copilot-logs --allow-all-tools -p security-review.prompt.md"
codex-sec-review: "codex security-review.prompt.md"
copilot-debug: "copilot --log-level all --log-dir copilot-logs --allow-all-tools -p security-review.prompt.md"
dependencies:
mcp:
- ghcr.io/github/github-mcp-server
With this, your agent primitives can now be packaged as distributable software with managed dependencies.
Production deployment
The final piece of the tooling ecosystem enables continuous AI: packaged agent primitives can now run automatically in the same CI/CD pipelines you use every day, bringing your carefully developed workflows into your production environment.
Using APM GitHub Action, and building on the security-review-workflow package example above, here’s how the same APM project deploys to production with multi-runtime flexibility:
# .github/workflows/security-review.yml
name: AI Security Review Pipeline
on:
pull_request:
types: [opened, synchronize]
jobs:
security-analysis:
runs-on: ubuntu-latest
strategy:
matrix:
# Maps to apm.yml scripts
script: [copilot-sec-review, codex-sec-review, copilot-debug]
permissions:
models: read
pull-requests: write
contents: read
steps:
- uses: actions/checkout@v4
- name: Run Security Review (${{ matrix.script }})
uses: danielmeppiel/action-apm-cli@v1
with:
script: ${{ matrix.script }}
parameters: |
{
"pr_id": "${{ github.event.pull_request.number }}"
}
env:
GITHUB_COPILOT_PAT: ${{ secrets.COPILOT_CLI_PAT }}
Key connection: The matrix.script values (copilot-sec-review, codex-sec-review, copilot-debug) correspond exactly to the scripts defined in the apm.yml configuration above. APM automatically installs the MCP dependencies (ghcr.io/github/github-mcp-server) and passes the input parameters (pr_id) to your security-review.prompt.md workflow.
Here’s why this matters:
- Automation: Your AI workflows now run on their own, without anyone needing to manually trigger them.
- Reliability: They run with the same consistency and reproducibility as traditional code deployments.
- Flexibility: You can run different versions or types of analysis (mapped to different scripts) as needed.
- Integration: These workflows become part of your organization’s standard CI/CD pipelines, just like regular software quality checks.
This setup ultimately means your agent primitives are no longer just local experiments—they are fully automated tools that you can rely on as part of your software delivery process, running in CI/CD whenever needed, with all dependencies and parameters managed for you.
Ecosystem evolution
This progression follows the same predictable pattern as every successful programming ecosystem. Understanding this pattern helps you see where AI-native development is heading and how to position your work strategically.
The evolution happens in four stages:
- Raw code → agent primitives (.prompt.md, .instructions.md files)
- Runtime environments → Agent CLI runtimes
- Package management → APM (distribution and orchestration layer)
- Thriving ecosystem → Shared libraries, tools, and community packages
Just as npm enabled JavaScript’s explosive growth by solving the package distribution problem, APM enables the agent primitive ecosystem to flourish by providing the missing infrastructure layer that makes sharing and scaling natural language programs practical.
The transformation is profound: what started as individual Markdown files in your editor becomes a systematic software development practice with proper tooling, distribution, and production deployment capabilities.
How to get started with building your first agent primitive
Now it’s time to build your first agent primitives. Here’s the plan:
- Start with instructions: Write clear instructions that tell the AI exactly what you want it to do and how it should behave.
- Add chat modes: Set up special rules (chat modes) to create safe boundaries for the AI, making sure it interacts in the way you want and avoids unwanted behavior.
- Build reusable prompts: Create prompt templates for tasks you do often, so you don’t have to start from scratch each time. These templates help the AI handle common jobs quickly and consistently.
- Create specification templates: Make templates that help you plan out what you want your AI to accomplish, then turn those plans into actionable steps the AI can follow.
Instructions architecture
Instructions form the bedrock of reliable AI behavior: They’re the rules that guide the agent without cluttering your immediate context. Rather than repeating the same guidance in every conversation, instructions embed your team’s knowledge directly into the AI’s reasoning process.
The key insight is modularity: instead of one massive instruction file that applies everywhere, you can create targeted files that activate only when working with specific technologies or file types. This context engineering approach keeps your AI focused and your guidance relevant.
✅ Quick actions:
- Create the general copilot-instructions.md file in the .github folder for the repository with common rules.
- Create modular .instructions.md files in the .github/instructions/ folder by domain (frontend, backend, testing, docs, specs…).
- Use applyTo: "**/*.{js,ts...}" patterns for selective application.
.github/
├── copilot-instructions.md # Global repository rules
└── instructions/
    ├── frontend.instructions.md # applyTo: "**/*.{jsx,tsx,css}"
    ├── backend.instructions.md # applyTo: "**/*.{py,go,java}"
    └── testing.instructions.md # applyTo: "**/test/**"
Example: Markdown prompt engineering in Instructions with frontend.instructions.md:
---
applyTo: "**/*.{ts,tsx}"
description: "TypeScript development guidelines with context engineering"
---
# TypeScript Development Guidelines
## Context Loading
Review [project conventions](../docs/conventions.md) and
[type definitions](../types/index.ts) before starting.
## Deterministic Requirements
- Use strict TypeScript configuration
- Implement error boundaries for React components
- Apply ESLint TypeScript rules consistently
## Structured Output
Generate code with:
- [ ] JSDoc comments for all public APIs
- [ ] Unit tests in `__tests__/` directory
- [ ] Type exports in appropriate index files
⚠️ Checkpoint: Instructions are context-efficient and non-conflicting.
Chat modes configuration
With your instruction architecture in place, you still need a way to enforce domain boundaries and prevent AI agents from overstepping their expertise. Chat modes solve this by creating professional boundaries similar to real-world licensing. For example, you’d want your architect to plan a bridge and not build it themself.
Here’s how to set those boundaries:
- Define domain-specific custom chat modes with MCP tool boundaries.
- Encapsulate tech stack knowledge and guidelines per mode.
- Define the most appropriate LLM model for your chat mode.
- Configure secure MCP tool access to prevent cross-domain security breaches.
| 💡 Security through MCP tool boundaries: Each chat mode receives only the specific MCP tools needed for their domain. Giving each chat mode only the tools it needs keeps your AI workflows safe, organized, and professionally separated—just like real-world roles and permissions. |
.github/
└── chatmodes/
    ├── architect.chatmode.md # Planning specialist - designs, cannot execute
    ├── frontend-engineer.chatmode.md # UI specialist - builds interfaces, no backend access
    ├── backend-engineer.chatmode.md # API specialist - builds services, no UI modification
    └── technical-writer.chatmode.md # Documentation specialist - writes docs, cannot run code
Example: Creating MCP tool boundaries with backend-engineer.chatmode.md:
---
description: 'Backend development specialist with security focus'
tools: ['changes', 'codebase', 'editFiles', 'runCommands', 'runTasks',
'search', 'problems', 'testFailure', 'terminalLastCommand']
model: Claude Sonnet 4
---
You are a backend development specialist focused on secure API development, database design, and server-side architecture. You prioritize security-first design patterns and comprehensive testing strategies.
## Domain Expertise
- RESTful API design and implementation
- Database schema design and optimization
- Authentication and authorization systems
- Server security and performance optimization
You have mastered the backend of this project, having read all [the backend docs](../../docs/backend).
## Tool Boundaries
- **CAN**: Modify backend code, run server commands, execute tests
- **CANNOT**: Modify client-side assets
You can also create security and professional boundaries, including:
- Architect mode: Allow access to research tools only, so they can’t execute destructive commands or modify production code.
- Frontend engineer mode: Allow access to UI development tools only, so they can’t access databases or backend services.
- Backend engineer mode: Allow access to API and database tools only, so they can’t modify user interfaces or frontend assets.
- Technical writer mode: Allow access to documentation tools only, so they can’t run code, deploy, or access sensitive systems.
⚠️ Checkpoint: Each mode has clear boundaries and tool restrictions.
Agentic workflows
Agentic workflows can be implemented as reusable .prompt.md files that orchestrate all your primitives into systematic, repeatable end-to-end processes. These can be executed locally or delegated to independent agents. Here’s how to get started:
- Create .prompt.md files for complete development processes.
- Build in mandatory human reviews.
- Design workflows for both local execution and independent delegation.
.github/prompts/
├── code-review.prompt.md # With validation checkpoints
├── feature-spec.prompt.md # Spec-first methodology
└── async-implementation.prompt.md # GitHub Coding Agent delegation
Example: Complete agentic workflow with feature-spec.prompt.md:
---
mode: agent
model: gpt-4
tools: ['file-search', 'semantic-search', 'github']
description: 'Feature implementation workflow with validation gates'
---
# Feature Implementation from Specification
## Context Loading Phase
1. Review [project specification](${specFile})
2. Analyze [existing codebase patterns](./src/patterns/)
3. Check [API documentation](./docs/api.md)
## Deterministic Execution
Use semantic search to find similar implementations
Use file search to locate test patterns: `**/*.test.{js,ts}`
## Structured Output Requirements
Create implementation with:
- [ ] Feature code in appropriate module
- [ ] Comprehensive unit tests (>90% coverage)
- [ ] Integration tests for API endpoints
- [ ] Documentation updates
## Human Validation Gate
🚨 **STOP**: Review implementation plan before proceeding to code generation.
Confirm: Architecture alignment, test strategy, and breaking change impact.
⚠️ Checkpoint: As you can see, these prompts include explicit validation gates.
Specification templates
There’s often a gap between planning (coming up with what needs to be built) and implementation (actually building it). Without a clear, consistent way to document requirements, things can get lost in translation, leading to mistakes, misunderstandings, or missed steps. This is where specification templates come in. These templates ensure that both people and AI agents can take a concept (like a new feature or API) and reliably implement it.
Here’s what these templates help you accomplish:
- Standardize the process: You create a new specification for each feature, API endpoint, or component.
- Provide blueprints for implementation: These specs include everything a developer (or an AI agent) needs to know to start building: the problem, the approach, required components, validation criteria, and a checklist for handoff.
- Make handoff deterministic: By following a standard, the transition from planning to doing is clear and predictable.
Spec-kit is a neat tool that fully implements a specification-driven approach to agentic coding. It allows you to easily get started with creating specs (spec.md), an implementation plan (plan.md) and splitting that into actual tasks (tasks.md) ready for developers or coding agents to work on.
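If you’re not using spec-kit, even a minimal .spec.md template gets you most of the way there. Here’s a sketch of one possible shape (illustrative only, not a prescribed format):

```markdown
<!-- Hypothetical .spec.md template; sections are illustrative, not a prescribed format. -->
# Spec: [feature name]

## Problem
What user or business problem this feature solves.

## Approach
The chosen design and the main alternatives that were rejected, with reasons.

## Components
Files, modules, and APIs that will be created or changed.

## Validation criteria
Tests, metrics, or manual checks that prove the implementation is complete.

## Handoff checklist
- [ ] Requirements reviewed by a human
- [ ] Tasks split and sized for a developer or coding agent
- [ ] Open questions resolved or explicitly deferred
```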
⚠️ Checkpoint: Specifications are split into tasks that are implementation-ready before delegation.
Ready to go? Here’s a quickstart checklist
You now have a complete foundation for systematic AI development. The checklist below walks through the implementation sequence, building toward creating complete agentic workflows.
- Understand Markdown prompt engineering principles (semantic structure, precision, and tools).
- Grasp context engineering fundamentals (context window optimization and session strategy).
- Create .github/copilot-instructions.md with basic project guidelines (context engineering: global rules).
- Set up domain-specific .instructions.md files with applyTo patterns (context engineering: selective loading).
- Configure chat modes for your tech stack domains (context engineering: domain boundaries).
- Create your first .prompt.md agentic workflow.
- Build your first .spec.md template for feature specifications (you can use spec-kit for this).
- Practice a spec-driven approach with session splitting: plan first, split into tasks, and lastly, implement.
Take this with you
Working with AI agents doesn’t have to be unpredictable. With the right planning and tools, these agents can quickly become a reliable part of your workflow and processes—boosting not only your own productivity, but your team’s too.
Ready for the next phase of multi-agent coordination and delegation? Try GitHub Copilot CLI to get started >
The post How to build reliable AI workflows with agentic primitives and context engineering appeared first on The GitHub Blog.
]]>