Trying something new
The AI side of the Microsoft stack moves faster than the weekly Azure update can comfortably absorb, so I’m starting a separate monthly AI round-up alongside the Azure Weekly: a focused view of what changed across Foundry, M365 Copilot, GitHub Copilot, Copilot Studio and the model catalog. If it’s useful, I’ll keep it going.
This is the May 2026 edition. Grouped by area.
Models
The model catalog had a busy month. Frontier additions, lower-cost options, and Microsoft’s own MAI models.
Claude Opus 4.8 in Foundry, M365 Copilot and GitHub Copilot
Anthropic’s latest frontier model is now available in Microsoft Foundry, M365 Copilot Chat and GitHub Copilot. It’s designed for:
- Complex multi-step coding.
- Long-horizon agentic work (runs that go for days).
- Improved tool selection and instruction following.
- Better multi-turn follow-through.
- 4x less likely than Opus 4.7 to let flaws in its own code pass through.
- Dynamic workflows: it can plan and run hundreds of parallel sub-agents.
It’s positioned squarely at frontier coding and large-scale agentic work. Codebase migrations across very large repos are the obvious use case. Same caveat as always: don’t reflex-reach for the biggest model. Use Opus where the task actually demands it, and route everything else to cheaper options.
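To make the "don't reflex-reach for the biggest model" rule concrete, here is a minimal routing sketch. The tier names, the `Task` fields and the 50-file threshold are all hypothetical placeholders for illustration, not anything Foundry or Copilot exposes.

```python
from dataclasses import dataclass

# Hypothetical tier labels; substitute the deployment names from your own catalog.
FRONTIER = "frontier-model"
CHEAP = "small-model"


@dataclass
class Task:
    prompt: str
    files_touched: int = 0            # rough size of the change
    needs_long_horizon: bool = False  # multi-hour or multi-day agentic run


def pick_model(task: Task, file_threshold: int = 50) -> str:
    """Return the frontier tier only when the task looks like it needs it."""
    if task.needs_long_horizon or task.files_touched >= file_threshold:
        return FRONTIER
    return CHEAP


print(pick_model(Task("rename a variable", files_touched=1)))
print(pick_model(Task("migrate the monorepo", files_touched=400)))
```

The signals you route on will differ per team. The point is that the decision is explicit and cheap to change, rather than a default baked into every call.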
GPT-5.5 Instant and GPT-5.5 Thinking in Foundry and M365 Copilot Chat
Two flavours of GPT-5.5:
- Instant (GPT-chat-latest): low-latency, tuned for chat, retrieval, multi-turn assistance, interactive experiences, RAG-shaped apps.
- Thinking: deeper reasoning for analysis and harder problems.
You can select between them directly in Copilot Chat. Fast model for routine work, thinking model where the problem actually requires deliberation.
GPT realtime 2, GPT realtime translate, GPT realtime whisper in Foundry
The three new OpenAI realtime models all landed in Foundry as global standard deployments:
- GPT realtime 2: speech-to-speech with internal reasoning.
- GPT realtime translate: continuous real-time speech translation.
- GPT realtime whisper: low-latency streaming transcription.
Useful in any application where you need real-time voice (customer support, translation services, conversational agents) and where the previous-generation realtime models weren’t quite good enough.
xAI Grok 4.3 in Foundry
xAI’s Grok 4.3 is also in Foundry. It brings improvements in tool calling and instruction following, plus fewer hallucinations. Another frontier choice inside the same enterprise-governed catalog, so you can route the right prompt to the right model.
DeepSeek V4, V4 Pro and Kimi 2.6 in Foundry via Fireworks
Three additional models exposed through Foundry via Fireworks:
- DeepSeek V4 Pro: high precision, deep reasoning, long document understanding.
- DeepSeek V4: low-latency, high-throughput scenarios.
- Kimi 2.6: Moonshot’s long-context reasoning model.
The economics matter here. These are capable models that often sit at lower price points than the frontier alternatives. Useful where you’ve got high-volume summarisation, classification, or routine reasoning that doesn’t need an Opus to handle.
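A quick back-of-the-envelope calculation shows why the price point matters at volume. The prices below are PLACEHOLDERS I made up to illustrate the arithmetic, not real list prices for any model. Plug in the numbers from your own pricing page.

```python
def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_million_tokens: float, days: int = 30) -> float:
    """Total monthly spend for a workload at a flat per-token price."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * price_per_million_tokens


# PLACEHOLDER prices per million tokens, for illustration only.
FRONTIER_PRICE = 10.0
BUDGET_PRICE = 1.0

frontier = monthly_cost(100_000, 2_000, FRONTIER_PRICE)
budget = monthly_cost(100_000, 2_000, BUDGET_PRICE)
print(f"frontier: {frontier:,.0f}  budget: {budget:,.0f}  saved: {frontier - budget:,.0f}")
```

With these placeholder numbers, 100,000 requests a day at 2,000 tokens each comes to 60,000 per month on the expensive tier versus 6,000 on the cheaper one. The ratio is what matters, not the made-up figures.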
Cohere Command A+ and new image models in Foundry
Cohere Command A+ is an enterprise RAG-and-agent model. Alongside it, a batch of image models (Z Image Turbo, Flux 1 Schnell, SDXL 1.0) landed for image generation and multimodal retrieval. Data stays in your Azure tenant.
MAI-Image-2-Efficient in Foundry Labs
Microsoft’s own MAI-Image-2-Efficient is 22% faster and 4x more efficient than the previously released MAI-Image-2, and roughly 40% ahead of leading models on average. It lives in Foundry Labs, which is the experimental playground for early-stage AI. Cheaper, faster image generation lowers the cost of adding visuals to whatever you’re building.
Microsoft Foundry
Trace-based evaluation for external and hosted agents
This is the most useful Foundry update this month, even though it sounds mundane.
Evaluations are essential when you ship generative AI because traditional input-A-yields-output-B testing doesn’t work. These models are non-deterministic. You use evals (often LLM-as-judge) to check that the agent stays fit for purpose and doesn’t degrade.
What’s new: instead of running evals against a curated test dataset only in CI / dev cycles, you can now run them against real production traces. The agent emits traces (App Insights or equivalent), and quality is measured against real user interactions. No curated test set required.
Works across Foundry, GCP, AWS, any framework. Anywhere you can wire up tracing. Combine it with Foundry’s optimiser and you get a continuous improvement loop driven by real usage. This is what production-grade agents actually look like.
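The shape of a trace-based eval is simple enough to sketch. This is not the Foundry evaluation API. `Trace`, `overlap_judge` and `evaluate_traces` are names I made up, and the judge is a trivial stand-in for where an LLM-as-judge call would go.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable


@dataclass
class Trace:
    user_input: str
    agent_output: str


def overlap_judge(trace: Trace) -> float:
    """Stand-in for an LLM-as-judge: 1.0 if the answer is non-empty and
    shares at least one word with the question, else 0.0."""
    asked = set(trace.user_input.lower().split())
    answered = set(trace.agent_output.lower().split())
    return 1.0 if asked & answered else 0.0


def evaluate_traces(traces: Iterable[Trace], judge: Callable[[Trace], float],
                    threshold: float = 0.5) -> tuple[float, list[Trace]]:
    """Score every production trace; return the mean score and the failures."""
    scored = [(t, judge(t)) for t in traces]
    if not scored:
        raise ValueError("no traces to evaluate")
    failures = [t for t, score in scored if score < threshold]
    return mean(score for _, score in scored), failures


traces = [
    Trace("reset my password", "To reset your password, open settings."),
    Trace("refund status", ""),
]
score, failures = evaluate_traces(traces, overlap_judge)
print(score, [t.user_input for t in failures])
```

In practice the traces would come from your tracing backend and the judge would be a model call with a rubric. The loop stays the same: score real interactions, track the mean over time, and look at the failures.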
GPT-5 reinforcement fine-tuning, gated GA
GPT-5 reinforcement fine-tuning reaches gated GA. Still gated, but RL-based fine-tuning lets you specialise a frontier model for your domain through a reward-based process instead of relying on labelled data. Higher accuracy on proprietary workflows than you’ll get from a system prompt or few-shot examples alone.
If you’ve been hitting the ceiling of what prompt engineering can do, this is the next lever.
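To give a feel for what "reward-based" means, here is a minimal sketch of a reward function. It is an illustration only: the invoice-extraction task, the JSON shape and the scoring weights are my assumptions, and this is not the grader format Foundry expects.

```python
import json


def reward(model_output: str) -> float:
    """Score one sampled answer between 0.0 and 1.0 without a labelled target.

    Hypothetical task: the model extracts an invoice as JSON with
    "line_items" (a list of numbers) and "total".
    """
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0  # not even valid JSON
    if not isinstance(data, dict):
        return 0.0
    items = data.get("line_items")
    total = data.get("total")
    if not isinstance(items, list) or not isinstance(total, (int, float)):
        return 0.0
    if not all(isinstance(x, (int, float)) for x in items):
        return 0.0
    # Well-formed output earns partial credit; internal consistency earns full credit.
    if abs(sum(items) - total) < 0.005:
        return 1.0
    return 0.5


print(reward('{"line_items": [10, 5.5], "total": 15.5}'))
print(reward('{"line_items": [10, 5.5], "total": 20}'))
print(reward("not json"))
```

The training process samples outputs, scores them with something like this, and nudges the model towards higher-scoring behaviour. Designing a reward that can't be gamed is most of the work.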
Managed virtual networks, project cost attribution, Content Understanding (GA)
Three Foundry features go GA:
- Managed virtual networks: keep traffic inside a defined network boundary, no public endpoint exposure.
- Project-level cost attribution: actually understand which project is spending what.
- Content Understanding: the read and layout analysers go GA.
All useful in the same direction: control and finance visibility for production AI.
Open agentic stack from Microsoft Research: MagenticLite, MagenticBrain, Fara 1.5
An interesting drop from Microsoft Research. A transparent, self-hostable open agentic stack:
- MagenticLite: application layer.
- MagenticBrain: 14B-parameter orchestrator (fine-tuned from Qwen 3 8B). Handles planning, coding, delegation.
- Fara 1.5: computer-use models in 3 sizes (4B / 9B / 27B). The 9B is considered state of the art among small computer-use models.
The pitch: an autonomous computer-use agent you can run yourself, with full control and cost advantages over a closed black-box service. Worth a look if you’re investigating agentic systems and want something you can host and reason about.
Foundry Local 1.1 and 1.2 + azure-ai-projects 2.2.0 SDK
Foundry Local (running models locally at the edge) got two releases. New across them:
- Live audio transcription.
- Text embeddings.
- Qwen 3.5 vision.
- Multilingual ASR (speech recognition).
- Linux ARM64 support.
- Cancel model downloads.
- ONNX Runtime 1.26.
The azure-ai-projects 2.2.0 SDK adds skills, toolboxes, external agent definitions, and a model weight registry.
The whole point of Foundry Local is hybrid. Use cloud for what cloud is good for, run on-edge or on-device for privacy, offline, low-latency or cost reasons. Same SDK either side.
Foundry IQ
Foundry IQ is the curated-knowledge layer. A knowledge base sits over multiple knowledge sources and decides which source matters for each query, customises the retrieval, and returns either a generated answer or the raw data.
Why it matters: token economics. Sending an agent your whole document store and hoping the model figures it out is wasteful. Sending the specific data the model actually needs is much cheaper and produces better output. Foundry IQ is the path to that.
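The idea of picking the source first and then sending only what matters can be sketched in a few lines. This is a toy using keyword overlap, not how Foundry IQ works internally. `Source`, `pick_source` and `retrieve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    description: str
    documents: list[str]


def _words(text: str) -> set[str]:
    return set(text.lower().split())


def pick_source(query: str, sources: list[Source]) -> Source:
    """Choose the source whose description overlaps most with the query."""
    return max(sources, key=lambda s: len(_words(query) & _words(s.description)))


def retrieve(query: str, sources: list[Source], top_k: int = 1) -> list[str]:
    """Return only the top_k most relevant documents from the chosen source."""
    source = pick_source(query, sources)
    ranked = sorted(source.documents,
                    key=lambda d: len(_words(query) & _words(d)),
                    reverse=True)
    return ranked[:top_k]


sources = [
    Source("hr", "leave policy holiday hr benefits",
           ["Annual leave is 25 days.", "Parental leave policy details."]),
    Source("it", "laptop vpn password it support",
           ["Reset your vpn password in the portal.", "Laptop refresh every 3 years."]),
]
print(retrieve("how do I reset my vpn password", sources))
```

The model sees one short passage instead of four documents. Scale that to a real document store and the token saving is the whole argument.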
Copilot Studio
Computer-using agents
Many things have APIs or MCP servers. Many do not. They only have an app or a website. Computer-using agents in Copilot Studio let an agent drive the actual UI of an app or website, and can now be wired into multi-step workflows.
For organisations with long tails of internal systems that never got modernised, this is useful. You don’t have to wait for someone to expose an API.
New workflow experience
A redesigned workflow experience in early-release environments. Single unified canvas, more intuitive, easier to design end-to-end. The interesting bit:
- Workflow steps are deterministic and rule-based for consistency, speed, repeatability.
- Within those steps, you can call a Foundry-hosted or prompt-based agent where you need judgment or creativity.
Deterministic where determinism matters, agentic where you actually need judgment. Don’t treat them as opposing choices.
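The pattern is easy to show outside Copilot Studio. A minimal sketch, assuming a hypothetical expense-approval flow: the rules run first and are fully deterministic, and the agent (here any callable that takes a prompt and returns text) is only consulted for the judgment call.

```python
from typing import Callable


def process_expense(claim: dict, agent: Callable[[str], str],
                    auto_approve_limit: float = 100.0) -> str:
    """Rule-based steps first; hand off to an agent only where judgment is needed."""
    # Deterministic steps: same input, same outcome, every time.
    if claim["amount"] <= 0:
        return "rejected: invalid amount"
    if not claim.get("receipt"):
        return "rejected: missing receipt"
    if claim["amount"] <= auto_approve_limit:
        return "approved"
    # Judgment step: delegated to the agent.
    verdict = agent(
        f"Is this expense reasonable? {claim['description']} for {claim['amount']}"
    )
    return "approved" if verdict.strip().lower().startswith("yes") else "needs review"


# A stub agent for illustration; in practice this would call a hosted agent.
def stub_agent(prompt: str) -> str:
    return "Yes, this looks like a normal client dinner."


print(process_expense({"amount": 40, "receipt": True, "description": "taxi"}, stub_agent))
print(process_expense({"amount": 250, "receipt": True, "description": "client dinner"}, stub_agent))
```

Most claims never reach the agent, which keeps the workflow fast and predictable. The agent only sees the cases where a rule can't decide.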
M365 Copilot
Redesign
A full redesign of the M365 Copilot surface. Better task awareness, expandable prompt box, automatic Work IQ grounding, automatic model selection. The point isn’t the look. A cleaner surface should translate into higher productivity and better ROI on Copilot licensing.
Implicit Outlook grounding
Copilot now grounds relevant interactions in your Outlook content automatically when appropriate. You don’t have to explicitly point Copilot at a thread for it to pick up the context.
PDFs in chat
Copilot Chat can now reason directly over PDFs you drop into the conversation.
GitHub Copilot
Opus 4.8, Gemini 3.5 Flash GA, GPT-5.3 Codex, automatic model selection
The Copilot model menu expanded again. Opus 4.8, Gemini 3.5 Flash (GA) and GPT-5.3 Codex are all available. Automatic selection routes each request to the appropriate model. Hover over the response to see which model handled it. Same direction as the Foundry Model Router: stop picking models manually for routine work.
GitHub Copilot app (preview)
A standalone Copilot app (technical preview). Manages sessions, each with its own branch, files, conversation and state. You can move between sessions to keep work streams separate without losing context.
Copilot CLI remote control (GA)
The Copilot CLI gains remote control. Start a session in VS Code, JetBrains, or the CLI, enable remote mode, and the session streams, so you can pick it up from the GitHub mobile app or the web.
For anyone juggling devices, this is a real productivity win.
Organisational model rules (preview)
At enterprise level you can now set organisational rules for which models are available. Preferred, optional, blocked. Finer-grained governance of model use across the orgs in your enterprise.
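The three-state rule maps naturally onto a small policy table. This sketch is my own illustration of the idea, not GitHub's configuration format. The model names are placeholders, and treating unlisted models as optional is an assumption.

```python
from enum import Enum


class Rule(Enum):
    PREFERRED = "preferred"
    OPTIONAL = "optional"
    BLOCKED = "blocked"


def available_models(catalog: list[str], policy: dict[str, Rule],
                     default: Rule = Rule.OPTIONAL) -> list[str]:
    """Drop blocked models and list preferred ones first.

    Models missing from the policy fall back to `default` (an assumption).
    """
    allowed = [m for m in catalog if policy.get(m, default) is not Rule.BLOCKED]
    # sorted() is stable, so catalog order is kept within each group.
    return sorted(allowed, key=lambda m: policy.get(m, default) is not Rule.PREFERRED)


catalog = ["model-a", "model-b", "model-c", "model-d"]
policy = {"model-b": Rule.PREFERRED, "model-c": Rule.BLOCKED}
print(available_models(catalog, policy))
```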
Final thoughts
A few things from the month worth acting on.
Foundry’s trace-based evaluation is the one I’d wire up first. It closes the gap between “agent works on the demo” and “agent stays good in production over time”. If you ship anything agentic, plug it in.
Pick the right model. The catalog keeps expanding both up and down. Opus 4.8 and Fable-class at the frontier, DeepSeek V4, Kimi 2.6, MAI-Image-2-Efficient and Foundry Local for cost-efficient routine work. Use Opus where the task demands Opus. Route everything else to a smaller, cheaper model. Automatic model selection in Copilot and the Foundry Model Router are the easy wins.
Hybrid is real. Foundry Local 1.2 plus the same SDK between cloud and edge, plus MagenticBrain and Fara 1.5 as a self-hostable stack. The agentic story is no longer cloud-only. For workloads with privacy, offline or latency constraints, you have real options.
And the redesigned Copilot Studio workflow experience captures the right pattern for production AI work: deterministic workflows for consistency, agents for judgment. Use both.
See you next month.
Sources
- John Savill, “Microsoft AI Update May 2026,” YouTube, https://www.youtube.com/watch?v=N-K1AS7vbAQ