Category: Artificial Intelligence in Business

  • How to Build a Custom Connector for Copilot Studio Step by Step

    The built-in connectors cover a lot, but the moment you need to talk to an internal REST API from a Copilot Studio agent, you need a custom connector. This walkthrough builds a working Copilot Studio custom connector end to end, from the OpenAPI definition to the agent calling it as a tool with structured output the LLM can actually reason about.

    The result: an agent that calls an internal API, gets back a typed JSON response, and uses it in the conversation without guessing.

    Step 1: Get your API contract straight before you open the portal

    Do not start in Power Apps. Start with the API spec. You need an OpenAPI 2.0 (Swagger) file or a clear list of endpoints, methods, query parameters, request bodies, and response schemas.

    If your API returns 200 OK with an empty body on success, fix that first. An agent needs structured output to evaluate what happened. I learned this the hard way building agentic flows: returning {"status":"done"} is not enough. Return the actual record, the actual ID, the actual changed fields.

    Make sure every response has a defined schema. type: object with named properties. No loose additionalProperties: true dumps. The agent reads the schema to decide how to use the result.
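
    One way to enforce this before you go near the portal is to lint the spec. The helper below is a hypothetical sketch, not part of any Microsoft tooling: it walks a Swagger 2.0 file loaded as a Python dict, resolves local #/definitions/... references, and flags 2xx responses that have no schema, no named properties, or additionalProperties set to true. It is deliberately strict, so an endpoint that returns a bare array or an empty body gets flagged too.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def resolve(spec: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Follow a local '#/definitions/Name' reference; return other schemas unchanged."""
    ref = schema.get("$ref", "")
    if ref.startswith("#/definitions/"):
        return spec.get("definitions", {}).get(ref.split("/")[-1], {})
    return schema


def loose_responses(spec: dict[str, Any]) -> list[str]:
    """List 2xx responses that give an agent nothing structured to reason about."""
    problems = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as 'parameters'
            for code, response in operation.get("responses", {}).items():
                if not str(code).startswith("2"):
                    continue  # only success responses are checked here
                where = f"{method.upper()} {path} {code}"
                if "schema" not in response:
                    problems.append(f"{where}: no schema")
                    continue
                schema = resolve(spec, response["schema"])
                if schema.get("type") != "object" or not schema.get("properties"):
                    problems.append(f"{where}: not an object with named properties")
                elif schema.get("additionalProperties") is True:
                    problems.append(f"{where}: additionalProperties is true")
    return problems


# Example: one well-described endpoint and one that returns an empty 200.
spec = {
    "swagger": "2.0",
    "paths": {
        "/orders/{orderId}": {
            "get": {"responses": {"200": {"description": "The order",
                                          "schema": {"$ref": "#/definitions/Order"}}}},
            "delete": {"responses": {"200": {"description": "OK"}}},
        }
    },
    "definitions": {
        "Order": {
            "type": "object",
            "properties": {"orderId": {"type": "string"}, "status": {"type": "string"}},
        }
    },
}
print(loose_responses(spec))  # ['DELETE /orders/{orderId} 200: no schema']
```

    The DELETE endpoint is exactly the "200 OK with an empty body" case: give it a schema that returns the deleted record or at least its ID and final state.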

    Step 2: Create the custom connector in the Power Platform maker portal

    Go to make.powerautomate.com and pick the right environment (this matters: custom connectors are environment-scoped). Then go to Data, Custom connectors, New custom connector, Import an OpenAPI file.

    Upload your Swagger file. Fill in:

    • Host: api.yourdomain.com (no https prefix)
    • Base URL: /v1 or whatever your API uses
    • Scheme: HTTPS only. Do not ship HTTP.
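
    In the Swagger file itself, these three settings correspond to the host, basePath and schemes fields. The Swagger 2.0 specification says host carries no scheme or sub-path and basePath starts with a slash; the HTTPS-only rule is the one above. A minimal pre-import check, written as a hypothetical helper:

```python
from typing import Any


def check_host_settings(spec: dict[str, Any]) -> list[str]:
    """Check the Swagger 2.0 host, basePath and schemes fields before importing."""
    issues = []
    host = spec.get("host", "")
    if not host:
        issues.append("host is missing")
    elif "://" in host or "/" in host:
        issues.append("host must be a bare name such as api.yourdomain.com")
    # A missing basePath means the API is served at the host root, which is fine.
    if not spec.get("basePath", "/").startswith("/"):
        issues.append("basePath must start with '/', for example /v1")
    if spec.get("schemes") != ["https"]:
        issues.append("schemes should be exactly ['https']")
    return issues


print(check_host_settings({"host": "api.yourdomain.com", "basePath": "/v1", "schemes": ["https"]}))  # []
print(check_host_settings({"host": "https://api.yourdomain.com", "basePath": "v1",
                           "schemes": ["http", "https"]}))
```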

    If you do not have a Swagger file, use the blank template and define operations manually. It is slower but gives you full control over operation IDs and summaries, which the agent uses to pick the right tool.

    Step 3: Configure authentication properly

    For internal enterprise APIs, OAuth 2.0 with Microsoft Entra ID is the only option I would ship. API key auth works for prototypes and falls apart the moment you need per-user identity in the agent. The same principle applies to Power Platform governance that does not kill adoption: authentication decisions made at the connector level affect every maker and every environment downstream.

    In the Security tab, pick OAuth 2.0, set Identity Provider to Azure Active Directory, and fill in:

    • Client ID: from your app registration
    • Client secret: from the same registration
    • Resource URL: the App ID URI of the API, for example api://your-api-app-id
    • Scope: the scope you exposed on the API app registration, e.g. access_as_user

    Save, then copy the Redirect URL Power Platform generates and add it to your Entra app registration’s redirect URIs. Skip this and you will get AADSTS50011 on every connection attempt.

    Step 4: Define operations with names the LLM can understand

    This is where most connectors fail as agent tools. The operation summary and description are not metadata. They are the prompt the LLM uses to pick the tool.

    Bad: GetData. Good: Get order details by order ID.

    For each operation set:

    • Summary: what the operation does, in plain English
    • Description: when to use it, what it returns
    • Operation ID: camelCase, descriptive, like getOrderById
    • Parameter descriptions: every parameter, including required vs optional
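
    Here is roughly what that looks like in the Swagger file for the getOrderById example, written as a Python dict, plus a small hypothetical lint for the rules in the list. The description text and the ORD-10023 sample ID are illustrative; what matters is that summary, description, operationId, and every parameter's description and required flag are filled in.

```python
import re
from typing import Any

CAMEL_CASE = re.compile(r"^[a-z][a-zA-Z0-9]*$")

# A Swagger 2.0 operation written as a Python dict (illustrative values).
get_order_by_id = {
    "operationId": "getOrderById",
    "summary": "Get order details by order ID",
    "description": ("Use when the user asks about one specific order. "
                    "Returns the order status, line items and total."),
    "parameters": [
        {"name": "orderId", "in": "path", "required": True, "type": "string",
         "description": "The order ID exactly as the user gave it, for example ORD-10023."},
    ],
    "responses": {"200": {"description": "The order", "schema": {"$ref": "#/definitions/Order"}}},
}


def lint_operation(operation: dict[str, Any]) -> list[str]:
    """Flag the naming gaps that make an operation hard for the LLM to pick."""
    issues = []
    for field in ("summary", "description"):
        if not operation.get(field, "").strip():
            issues.append(f"missing {field}")
    op_id = operation.get("operationId", "")
    if not CAMEL_CASE.match(op_id):
        issues.append(f"operationId {op_id!r} is not camelCase")
    for param in operation.get("parameters", []):
        name = param.get("name", "?")
        if not param.get("description", "").strip():
            issues.append(f"parameter {name!r} has no description")
        if "required" not in param:
            issues.append(f"parameter {name!r} does not say whether it is required")
    return issues


print(lint_operation(get_order_by_id))  # []
print(lint_operation({"operationId": "GetData", "parameters": [{"name": "id"}]}))
```

    The second call reports five problems, which is a fair summary of why GetData makes a poor tool.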

    Test each operation in the Test tab before moving on. If it does not return clean JSON here, it will not work in the agent.

    Step 5: Add the connector to a Copilot Studio agent as a tool

    Open your agent in Copilot Studio. Go to Tools, Add a tool, then pick Connector and find your custom connector. Microsoft’s documentation on connector tools in Copilot Studio covers the current UI if anything looks different in your tenant. If you are still getting oriented with the platform itself, read If You Are Starting Copilot Studio in 2026 Skip the Chatbot Tutorials before you wire up your first tool.

    For each operation you want to expose:

    • Review the auto-generated tool description. Rewrite it if it is vague.
    • Mark inputs as Dynamically fill with AI for parameters the LLM should infer from conversation, or Custom value for static config.
    • Set the connection. For testing, use your own. For production, configure end-user authentication so each user authenticates with their own identity.

    Step 6: Test it end to end with realistic input

    In the Test pane, do not type the obvious phrasing. Type how a real user would ask. Watch the activity map. You want to see the tool getting picked, the parameters extracted correctly, the response coming back as structured JSON, and the agent using specific values from that JSON in its reply.

    If the agent paraphrases instead of citing values, your response schema is too loose or the tool description is misleading. Tighten both.

    The final state and one common pitfall

    You now have a custom connector calling an authenticated internal API, registered as a tool the agent can invoke based on conversation context, returning structured data the LLM uses in its response.

    The pitfall to watch for: assuming connections carry across environments. A connector that works in your dev environment will not auto-promote. You need to export the solution, import it into the next environment, and recreate the connection reference there. Skip that and your ALM pipeline will fail in ways that look like auth bugs but are really environment scoping. If your setup involves agents talking to external systems across multiple environments, the same scoping problems come up in Power Platform Agents Talking to GitHub Sounds Simple Until You Hit Enterprise Environment Sprawl.

    Frequently Asked Questions

    How do I create a custom connector in Copilot Studio?

    To build a Copilot Studio custom connector, start by preparing an OpenAPI 2.0 (Swagger) file that defines your API endpoints, parameters, and response schemas. Then go to make.powerautomate.com, navigate to Data, Custom Connectors, and import your Swagger file. From there you configure authentication, define your operations, and connect the connector to your Copilot Studio agent as a tool.

    What is the best authentication method for a custom connector in Power Platform?

    OAuth 2.0 with Microsoft Entra ID is the recommended approach for enterprise APIs, as API key authentication does not support per-user identity. Authentication choices made at the connector level affect every maker and environment that uses it downstream, so it is worth getting right from the start.

    Why does my Copilot Studio agent not understand the API response correctly?

    If your API returns a vague or empty response body, the agent has nothing structured to reason about. Every response should include a defined schema with named properties so the agent can interpret the result and decide how to use it in the conversation.

    When should I build a custom connector instead of using a built-in one?

    You need a custom connector when your Copilot Studio agent needs to call an internal or proprietary REST API that is not covered by the existing built-in connectors. If your use case involves company-specific data or internal services, a custom connector is the right path.

  • Power Automate Error Handling Patterns That Actually Work

    Most Power Automate error handling I see in the wild is one Try scope, one Catch scope, and a Teams message that says Flow failed. That is not error handling. That is a notification with extra steps.

    Real Power Automate error handling patterns answer three questions. What failed. Why it failed. What happens to the work that was already in flight when it failed. If your flow does not answer all three, you are going to find out about problems from an angry colleague, not from your monitoring.

    I have rebuilt enough flows after silent failures to have strong opinions on this. Here is what I actually use.

    The Try Catch Finally pattern is the floor, not the ceiling

    Three scopes. Try runs your logic. Catch is configured with Run After set to has failed, is skipped, and has timed out. Finally runs after both, regardless of outcome. This is documented well in the Microsoft Learn Power Automate docs and most builders get this far.

    The problem is what people put inside Catch. Usually a single Post Message action with @{workflow()?['run']?['name']} and a generic failure string. That tells you the flow failed. You already knew that. It does not tell you which action failed inside the Try, what the actual error message was, or what input caused it.

    The fix is using result('Try') inside Catch and filtering for items where status is not Succeeded. That gives you the specific action name, the status code, and the error body. Now your alert is useful.
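
    In the flow, this is a Filter array action inside Catch, with From set to result('Try') and the advanced-mode condition @not(equals(item()?['status'], 'Succeeded')). The Python below mirrors that logic so the shape of the output is easy to see. The item layout (name, status, and an outputs object with statusCode and body) is an assumption based on typical run history, so check a real failed run before relying on specific property names. Actions skipped after the failure also show up, because their status is Skipped rather than Succeeded.

```python
from typing import Any


def failed_actions(try_results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep every item whose status is not Succeeded and pull out the useful fields."""
    failures = []
    for item in try_results:
        if item.get("status") == "Succeeded":
            continue
        outputs = item.get("outputs") or {}
        failures.append({
            "action": item.get("name"),
            "status": item.get("status"),
            "statusCode": outputs.get("statusCode"),
            # Fall back to a top-level error object if the action produced no body.
            "errorBody": outputs.get("body") or item.get("error"),
        })
    return failures


# Hypothetical result('Try') payload: one action succeeded, one failed.
sample = [
    {"name": "Create_item", "status": "Succeeded",
     "outputs": {"statusCode": 201, "body": {"ID": 42}}},
    {"name": "Update_a_row", "status": "Failed",
     "outputs": {"statusCode": 400, "body": {"error": {"message": "Invalid value for Amount"}}}},
]
print(failed_actions(sample))
```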

    Differentiate transient from terminal errors

    This is the pattern most flows skip and it is the one that matters most in production. A 429 from a connector is not the same problem as a 400 from bad input. One needs a retry with backoff. The other needs a human.

    Inside Catch, parse the error and branch. Status codes 408, 429, 500, 502, 503, 504 are transient. Retry them, ideally with a Do Until that has a delay and a max iteration count. Status codes 400, 401, 403, 404 are terminal. Do not retry. Log them and move on or escalate.
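
    The branching is easy to get subtly wrong, so here it is outside the designer. This is a sketch of the Do Until loop described above, assuming a starting delay of 2 seconds that doubles on each retry and a cap of four attempts; both numbers are illustrative. In a flow, the call would be the HTTP or connector action, the wait a Delay action, and the cap the Do Until count limit.

```python
import time
from typing import Callable

TRANSIENT = {408, 429, 500, 502, 503, 504}  # retry with backoff
TERMINAL = {400, 401, 403, 404}             # log or escalate, never retry


def classify(status_code: int) -> str:
    if status_code in TRANSIENT:
        return "transient"
    if status_code in TERMINAL:
        return "terminal"
    return "unknown"


def call_with_retry(call: Callable[[], int], max_attempts: int = 4, base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> tuple[int, int]:
    """Retry transient failures with a doubling delay; return (last status code, attempts used)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        code = call()
        if code < 400 or classify(code) != "transient":
            return code, attempt  # success, or an error a retry will not fix
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, 8s with the defaults
    return code, max_attempts  # retries exhausted: treat as needing a human


# Usage: a fake endpoint that is throttled twice, then succeeds.
responses = iter([429, 503, 200])
waits: list[float] = []
print(call_with_retry(lambda: next(responses), sleep=waits.append))  # (200, 3)
print(waits)  # [2.0, 4.0]
```

    Injecting sleep keeps the logic testable; a terminal code such as 404 returns on the first attempt without waiting at all.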

    Power Automate’s built-in retry policy on individual actions covers some of this, but it does not let you do anything intelligent with the failure. It just retries with exponential backoff and then gives up. That gap matters most for anything that touches an external system with rate limits; I wrote separately about how this connects to throttling limits and why default retry behaviour can mask problems until volume increases.

    Compensating actions for partial failures

    This is the one almost nobody does. If your flow creates a SharePoint item, then sends an email, then updates a Dataverse record, and step three fails, what happens to steps one and two? Nothing, by default. You have a SharePoint item that should not exist and an email that should not have been sent.

    The pattern is simple. Inside Catch, run compensating actions for whatever Try already completed. Delete the SharePoint item. Send a correction email. Mark the Dataverse record as Failed rather than leaving it half-updated. You do this by checking result('Try') for which actions actually succeeded before the failure, then reversing only those. If you are using SharePoint lists as your backend, as I covered in SharePoint Lists Are Still the Best Backend for 80 Percent of Power Platform Apps, the compensating delete is straightforward because the list item ID is always available in scope.
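
    Outside Power Automate this is usually called the compensating transaction (or saga) pattern. The sketch below shows only the control flow: each step is paired with an undo action, and on failure only the steps that completed are reversed, newest first. The step functions are hypothetical stand-ins for the SharePoint, email and Dataverse actions; in a flow you would read which steps completed from result('Try') instead of tracking them in a list.

```python
from typing import Callable

Step = tuple[str, Callable[[], None], Callable[[], None]]  # (name, do, undo)


def run_with_compensation(steps: list[Step]) -> list[str]:
    """Run steps in order; if one fails, undo the completed ones in reverse order."""
    completed: list[Step] = []
    log: list[str] = []
    for step in steps:
        name, do, _undo = step
        try:
            do()
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            for done_name, _do, undo in reversed(completed):
                undo()
                log.append(f"compensated: {done_name}")
            return log
        completed.append(step)
        log.append(f"done: {name}")
    return log


# Hypothetical stand-ins for the three actions in the example above.
state = {"sharepoint_item": None, "emails": []}

def create_item(): state["sharepoint_item"] = 42
def delete_item(): state["sharepoint_item"] = None
def send_email(): state["emails"].append("Your request has been submitted")
def send_correction(): state["emails"].append("Correction: please ignore the previous email")
def update_record(): raise RuntimeError("400 Bad Request")

for line in run_with_compensation([
    ("Create SharePoint item", create_item, delete_item),
    ("Send email", send_email, send_correction),
    ("Update Dataverse record", update_record, lambda: None),
]):
    print(line)
```

    The failed Dataverse step itself is not compensated because it never completed; marking that record as Failed is a separate action in Catch.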

    It is more code. It is also the difference between a flow that fails cleanly and a flow that leaves your data in a state nobody can reason about three weeks later.

    Centralise error logging

    Stop writing custom logging logic in every flow. Build one child flow that takes the run ID, the flow name, the failed action, the error body, and the input payload, and writes it to a single Dataverse table or SharePoint list. Every flow calls that child flow from its Catch scope.

    Now you have one place to look when things break. You can build a Power BI report on it. You can spot patterns across flows. You can see that 80 percent of your failures are coming from one connector and actually fix the root cause instead of patching individual flows.
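
    A sketch of what the child flow receives and what the central table lets you do with it. The first five fields are the ones listed above; the connector column is a hypothetical addition that turns the 80 percent question into a one-line aggregation. The in-memory list stands in for the Dataverse table or SharePoint list.

```python
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class ErrorLogEntry:
    run_id: str
    flow_name: str
    failed_action: str
    error_body: Any
    input_payload: Any
    connector: str  # hypothetical extra column, not one of the five listed fields


ERROR_LOG: list[dict[str, Any]] = []  # stand-in for the central table


def log_error(entry: ErrorLogEntry) -> None:
    """What every Catch scope would call: one row per failure, same shape everywhere."""
    ERROR_LOG.append(asdict(entry))


def failure_share_by_connector(rows: list[dict[str, Any]]) -> list[tuple[str, float]]:
    """Share of all logged failures per connector, largest first."""
    counts = Counter(row["connector"] for row in rows)
    total = sum(counts.values())
    return [(name, count / total) for name, count in counts.most_common()]


for i, connector in enumerate(["SharePoint", "SharePoint", "Office 365 Outlook", "SharePoint"]):
    log_error(ErrorLogEntry(f"run-{i}", "Invoice approval", "Update_item",
                            {"status": 429}, {"invoiceId": i}, connector))
print(failure_share_by_connector(ERROR_LOG))  # [('SharePoint', 0.75), ('Office 365 Outlook', 0.25)]
```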

    The notification trap

    If every failure sends a Teams message, people stop reading them within two weeks. I have seen this play out on multiple internal builds. Tier your alerts. Transient errors that self-recover do not need a notification. Terminal errors that need human input do. Compensating actions that ran successfully need a log entry, not a ping.
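
    The tiers can be written down as a small rule, which is also a useful thing to agree on with whoever receives the alerts. Two assumptions go beyond the text: self-recovered transient errors still get a log row for the central table, and a transient error that exhausts its retries, or a compensation that itself fails, is treated as needing a human.

```python
from typing import Literal

Outcome = Literal["transient", "terminal", "compensated"]


def alert_tier(outcome: Outcome, resolved: bool) -> str:
    """Return 'log' or 'notify'. resolved means the retry or compensation worked."""
    if outcome == "terminal":
        return "notify"  # a human has to fix the input or the permissions
    if resolved:
        return "log"     # self-recovered transient error, or compensation that ran cleanly
    return "notify"      # retries exhausted, or the compensation itself failed


for case in [("transient", True), ("transient", False), ("terminal", False), ("compensated", True)]:
    print(case, alert_tier(*case))
```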

    The goal is that when a notification arrives, the person receiving it actually opens it. Anything else is noise. This connects to a broader problem I have written about in Power Platform Governance That Does Not Kill Adoption, where poorly designed alerting policies erode trust in automation the same way overly restrictive DLP policies erode maker trust in the platform.

    The pattern that ties it together

    Try Catch Finally for structure. Result filtering for specificity. Transient versus terminal branching for intelligence. Compensating actions for data integrity. Centralised logging for visibility. Tiered notifications for sanity.

    None of this is exotic. All of it is skipped because the happy path works in testing and the edge cases only show up at volume. Build the error handling first. The flow will be slower to ship and faster to trust. And if you are still deciding whether Power Automate is worth investing this depth of effort into, Why Power Automate Is Still Worth Learning in 2026 covers exactly that question.

    Frequently Asked Questions

    What are the best Power Automate error handling patterns to use in production flows?

    Effective Power Automate error handling patterns go beyond a basic Try and Catch scope. You should capture specific action-level failures using result('Try'), differentiate between transient and terminal errors, and include compensating actions to undo partial work when a flow fails midway.

    How do I find out which action failed inside a Power Automate Try scope?

    Use the result('Try') expression inside your Catch scope and filter for items where the status is not Succeeded. This returns the specific action name, status code, and error body, giving you meaningful diagnostic information instead of a generic failure message.

    When should I retry a failed action in Power Automate versus escalate to a human?

    Retry transient errors such as 408, 429, 500, and 503 using a Do Until loop with a delay and a maximum iteration count. Terminal errors like 400, 401, 403, and 404 indicate a problem with the request itself, so retrying will not help and the failure should be logged or escalated instead.

    Why does Power Automate’s built-in retry policy not work well for rate-limited connectors?

    The default retry policy applies exponential backoff and then stops, but it does not let you inspect or act on the failure in a meaningful way. At low volumes this can go unnoticed, but as traffic increases the lack of intelligent handling can cause widespread failures that are difficult to diagnose.

  • Microsoft Discovery Is the First Real Glimpse of Domain-Specific Agent Platforms

    I came across the Azure Blog post about Microsoft Discovery expanding its preview, and it crystallised something I have been chewing on for months. Most enterprise AI conversation right now is stuck on horizontal agents. Generic copilots doing generic things across generic data. Microsoft Discovery takes agentic R&D in the other direction, and that direction is where the interesting architectural decisions are about to happen.

    What Microsoft Discovery Actually Is If You Skip the Marketing

    Strip the announcement language away and Discovery is a vertical agent platform shaped specifically for research and development workflows. It is not a chatbot. It is not Copilot Studio with a science skin. It is a purpose-built layer with domain primitives baked in: scientific data structures, simulation orchestration, multi-agent coordination tuned for R&D problems instead of generic enterprise tasks.

    The important word is shape. A horizontal agent platform gives you a blank canvas and a set of generic tools. A domain-shaped platform gives you a canvas where the grid lines already match the work. You give up flexibility. In return, when the shape fits, you can cut build time to a tenth.

    Why Domain-Shaped Agent Platforms Beat Generic Copilots for R&D Workflows

    I have written before about how most agentic workflows are just fancy if/then logic in a trench coat. The reason is almost always the same. Teams use a general-purpose tool to model a domain it does not understand, then spend weeks bolting domain logic on top through prompts and tool definitions.

    R&D is the perfect example. A real research workflow involves hypothesis tracking, simulation runs, candidate scoring, lineage of why a decision was made three steps ago. None of that is native to a generic Copilot Studio agent. You can build it. I have seen people try. It ends up as a fragile stack of topics, variables, and Power Automate flows pretending to be a state machine.

    A domain-shaped platform encodes those primitives directly. The agent does not need a 4000-token system prompt explaining what a candidate molecule is, because the platform already knows. That is the productivity unlock, and it is also why I think we are about to see a lot more of these.

    How This Changes the Build vs Buy Decision for Power Platform Teams

    Here is the part Power Platform people should pay attention to. The skill that matters going forward is not how well you can build in Power Automate or Copilot Studio. It is picking the right altitude for the automation in front of you.

    I keep seeing teams default to building everything in Copilot Studio because that is the tool they know. Someone wants a research assistant. Someone wants a contract review agent. Someone wants a finance close helper. All of it gets crammed into Copilot Studio topics and custom connectors, and six months later the build is brittle, slow, and three people deep in technical debt. If you are new to the platform, starting with Copilot Studio in 2026 means skipping the chatbot tutorials entirely and learning to think in terms of orchestration first.

    The decision tree is going to look more like this:

    • Is there a domain-shaped platform that already models this work? Use it. Customise on top.
    • Is the workflow generic but cross-system? Copilot Studio agent with deterministic Power Automate flows underneath.
    • Is the workflow narrow, predictable, high volume? Raw Power Automate. No agent. No reasoning layer. Just a flow.
    • Is the workflow heavy on judgment with messy unstructured inputs? Reasoning model in the orchestration layer, not the response layer. I covered this in my post on Claude as orchestration brain.
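
    Written as code, the tree makes one thing explicit that the list leaves open: what happens when more than one answer is yes. The ordering below (domain platform first, then the no-agent option, then the reasoning layer, then Copilot Studio) is an assumption, chosen so that a predictable workflow never gets a reasoning layer it does not need. Reorder it if your priorities differ.

```python
def pick_altitude(domain_platform_fits: bool, narrow_predictable_high_volume: bool,
                  judgment_heavy_unstructured: bool, cross_system: bool) -> str:
    """Walk the decision tree; the first question answered yes decides the altitude."""
    if domain_platform_fits:
        return "Use the domain-shaped platform and customise on top"
    if narrow_predictable_high_volume:
        return "Raw Power Automate flow, no agent"
    if judgment_heavy_unstructured:
        return "Reasoning model in the orchestration layer"
    if cross_system:
        return "Copilot Studio agent with deterministic Power Automate flows underneath"
    return "No clear fit yet: pin down the workflow before choosing a tool"


print(pick_altitude(domain_platform_fits=False, narrow_predictable_high_volume=True,
                    judgment_heavy_unstructured=False, cross_system=True))
# Raw Power Automate flow, no agent
```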

    Picking the wrong altitude is the most expensive mistake I see. Discovery is interesting precisely because it adds a new altitude that did not exist in the Microsoft stack before. R&D teams who would have been forced into Copilot Studio now have a layer that fits their work natively.

    What I Would Watch For Next in the Microsoft Agent Stack

    Discovery is the canary. R&D is just the first vertical because Microsoft has obvious customers there and the workflows are well-understood. The pattern will repeat. I would expect domain-shaped agent layers for clinical workflows, manufacturing operations, financial close, regulatory review. Each one will sit above the general-purpose Copilot stack and offer the same trade: less flexibility, much faster time to a working system.

    The thing I am watching is interoperability. Can a domain platform like Discovery call out to a Copilot Studio agent for a side task? Can a Power Automate flow trigger a Discovery workflow? If yes, the stack becomes composable and the architectural decisions get genuinely interesting. If no, we end up with another round of silos with their own latency problems and integration debt.

    For now, the practical move is to stop treating Copilot Studio as the universal hammer. In my experience, the teams who consistently ship working automations are the ones who match the tool to the shape of the work. Discovery just made that decision a little more interesting.

    Frequently Asked Questions

    What is Microsoft Discovery and how does it differ from Copilot Studio?

    Microsoft Discovery is a purpose-built agent platform designed specifically for research and development workflows, not a general-purpose copilot tool. Unlike Copilot Studio, it comes with domain-specific primitives like scientific data structures and simulation orchestration built in, so teams spend far less time engineering workarounds for R&D-specific tasks.

    How does Microsoft Discovery's agentic R&D approach improve research and development workflows?

    Because the platform already understands R&D concepts like hypothesis tracking, candidate scoring, and simulation runs, agents do not need lengthy prompts or custom-built logic to handle them. This reduces build time significantly compared to trying to model the same workflows on a generic agent platform.

    When should I choose a domain-specific agent platform over a generic one like Copilot Studio?

    A domain-specific platform makes sense when your workflows map closely to the vertical it was designed for, since the built-in primitives cut build time and reduce fragility. If your use case is too broad or does not fit the platform shape, a general-purpose tool with custom configuration will give you more flexibility.

    Why do generic agentic workflows often fail for complex enterprise use cases?

    General-purpose platforms require teams to manually encode domain logic through prompts, tool definitions, and automation flows, which produces brittle systems that are hard to maintain. When the platform has no native understanding of the domain, complexity accumulates quickly and the resulting agent is difficult to scale or debug.

    This post was inspired by Microsoft Discovery: Advancing agentic R&D at scale via Azure Blog.

  • Anthropic Running Claude on Trainium Matters More for Enterprise Than the Benchmarks Suggest

    Most of the coverage of Anthropic running Claude on Amazon’s Trainium chips frames it as a benchmark race. Faster training. Cheaper inference. Another shot at Nvidia. That framing misses what actually matters if you are building production automations. What should make enterprise Power Platform and AI people pay attention to Claude on Amazon Trainium is not raw performance. It is supply, capacity, and price stability.

    I have been building Claude-backed flows internally for a while now. The model quality has not been my problem. The economics and the throttling have.

    Why the Trainium Story Is Actually a Capacity Story

    When you read about Anthropic moving serious training and inference onto Trainium, the interesting part is not whether the chip beats an H100 on some synthetic benchmark. The interesting part is that for the first time there is a credible path to Claude pricing that is not entirely tied to Nvidia GPU scarcity.

    If you have ever tried to scale a customer-facing agent on a shared inference pool, you know what I mean. Peak hours hit. Latency drifts up. Occasionally you get a 429. Your Power Automate flow has a retry policy, sure, but the user already saw the spinning circle for nine seconds and moved on.

    Capacity is the silent killer. Benchmarks are the loud distraction.

    What Token Cost Drift Looks Like in a Real Power Automate Flow

    Here is the thing nobody tells you when you slot Claude into a flow through Bedrock or the API. The first version of your agent has a tight system prompt. Maybe 800 tokens. Then someone asks for a new edge case. You add a few examples. Then someone reports a wrong answer, so you add a guardrail paragraph. Then you add tool descriptions. Then you add a few more examples because the tool descriptions confused the model.

    Six weeks later your system prompt is 4,200 tokens. Every single invocation pays for those tokens. If your flow runs 12,000 times a month, you have just multiplied the system-prompt share of your input cost by more than five (4,200 / 800 = 5.25), and nobody noticed because the per-call cost still looks tiny on the invoice.
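
    The arithmetic, spelled out. This counts only the system prompt portion of each call, and the price per million input tokens is a placeholder, not any provider's actual rate, so plug in your own rate card.

```python
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # placeholder, not a real price list


def monthly_prompt_cost(prompt_tokens: int, runs_per_month: int,
                        price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> tuple[int, float]:
    """Return (system prompt tokens billed per month, their cost)."""
    tokens = prompt_tokens * runs_per_month
    return tokens, tokens * price_per_million / 1_000_000


before = monthly_prompt_cost(800, 12_000)
after = monthly_prompt_cost(4_200, 12_000)
print(before)                # (9600000, 28.8)
print(after)                 # (50400000, 151.2)
print(after[0] / before[0])  # 5.25
```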

    I learned this the hard way on an internal agent. The unit cost looked fine. The monthly bill did not. The fix was not switching models. The fix was treating the system prompt like code, with a review step, and splitting context that only some topics need into retrieval rather than baking it into every call.

    This is the part where chip economics actually touch your Power Automate flow. If inference cost per token drops because Anthropic has cheaper compute, that bloated prompt hurts less. If it does not drop, your business case erodes quietly while you build more features on top.

    Bedrock vs Direct Anthropic API for Enterprise Automation Workloads

    People ask me which one to use. The honest answer is it depends on what your governance team will sign off on, not what the model does.

    Bedrock gives you the AWS contract, the data residency story, the IAM model your security team already understands, and provisioned throughput as an option. The direct Anthropic API gives you faster access to new models and sometimes better pricing on burst usage.

    For anything customer-facing or anything that touches regulated data, Bedrock usually wins on the paperwork alone. For internal experimentation and prototypes, the direct API is fine. The mistake I see people make is prototyping on the direct API and then trying to lift and shift to Bedrock at the last minute. Region availability, model version naming, and quota structure are different enough that you will burn a sprint on it.

    Pick the path your production version will live on. Build there from day one. If you are still weighing how Claude fits into your automation architecture more broadly, Claude as an Orchestration Brain Is the Most Interesting Thing Happening in Enterprise AI Right Now covers where it actually earns its place as a reasoning layer.

    How I Would Plan Claude Capacity for an Agent You Actually Depend On

    If the agent matters, on-demand inference is not enough. Provisioned or reserved capacity is starting to look less like a luxury and more like a baseline. Latency Is the Quiet Killer of Agentic Workflows and Almost Nobody Talks About It goes into how to budget round-trip time before you build, and the same logic applies to capacity. A flow that works at 2pm on Tuesday and times out at 10am on Monday is not a production system. It is a demo with good luck.

    Three things I would actually do.

    Measure your real token distribution. Not the average. The 95th percentile input and output. That is what your capacity needs to handle, not the median case.
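
    A minimal way to get both numbers from logged token counts, using the nearest-rank definition of a percentile. The sample values are invented to show how far apart the median and the 95th percentile can sit when a few calls carry long conversations or large tool outputs.

```python
import math
import statistics


def percentile(values: list[int], pct: float) -> int:
    """Nearest-rank percentile: smallest sample with at least pct percent of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented input token counts for ten calls.
input_tokens = [900, 950, 1000, 1100, 1200, 1300, 1500, 2200, 4800, 6100]
print(statistics.median(input_tokens))  # 1250.0
print(percentile(input_tokens, 95))     # 6100
```

    With real volumes you would run this over thousands of logged calls, separately for input and output tokens.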

    Separate your workloads. The agent that drafts an email for an internal user can sit on shared on-demand inference. The agent that responds to a customer in under three seconds cannot. Different SLAs, different capacity tiers.

    Track cost per successful outcome, not cost per call. An agent that fails 20 percent of the time and gets retried needs 1.25 calls per success on average, so every outcome costs at least 25 percent more than the per-call price suggests, before you count the time people spend dealing with the failures. This is where bad tool design quietly destroys your unit economics. If you are unsure whether the model choice even matters as much as you think for automation workloads, Claude vs ChatGPT Is the Wrong Question When You Are Building Automations is worth reading before you optimize the wrong variable.
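
    The retry overhead can be estimated directly. Under the simplest assumption, where every attempt fails independently with the same probability and failures are retried until they succeed, the expected number of calls per successful outcome is 1 / (1 - failure rate). That covers model calls only; the cost of people noticing and cleaning up failures comes on top. The per-call price below is a placeholder.

```python
def calls_per_success(failure_rate: float) -> float:
    """Expected attempts per success when each attempt fails independently with this probability."""
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1 / (1 - failure_rate)


def cost_per_success(cost_per_call: float, failure_rate: float) -> float:
    """Model cost of one successful outcome, including the retried attempts."""
    return cost_per_call * calls_per_success(failure_rate)


print(round(calls_per_success(0.20), 4))        # 1.25
print(round(cost_per_success(0.004, 0.20), 6))  # 0.005
```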

    The Trainium news matters because it changes the long-term curve on what any of this costs. But the curve only helps you if your architecture is set up to benefit from it. Bloated prompts, on-demand only inference, and no measurement of cost per outcome will eat any savings the chip story delivers.

    Read the news through that lens. The benchmarks are not the point.

    Frequently Asked Questions

    Why does running Claude on Amazon Trainium matter for enterprise AI deployments?

    The real benefit of Claude on Amazon Trainium for enterprise is not raw chip performance but improved supply, capacity, and price stability for inference workloads. Enterprises building production automations have historically struggled with throttling and unpredictable costs tied to GPU scarcity, and Trainium offers a credible path to more reliable, affordable access.

    Why does my Power Automate flow keep getting slow or failing when using Claude?

    The most common culprit is shared inference pool capacity, especially during peak hours, which causes latency spikes and occasional rate limit errors. Even with a retry policy in place, the delay is often long enough that users abandon the process before it completes.

    How do I control Claude API costs in a high-volume Power Automate flow?

    Treat your system prompt like code and review it regularly, because prompts tend to grow over time as edge cases and guardrails are added, multiplying your input token costs across every invocation. Moving context that is only relevant to certain topics into retrieval rather than including it in every call can significantly reduce per-run costs.

    When should I start worrying about token cost drift in an AI automation?

    Token cost drift can begin within weeks of deploying an agent as system prompts expand to handle new requirements and edge cases. The per-call cost often still looks small, so the problem tends to go unnoticed until the monthly total becomes difficult to justify.

  • Power Platform Governance That Does Not Kill Adoption

    I keep seeing the same pattern on LinkedIn and in conversations with people at other organisations. Power Platform governance gets handed to a security team, they lock everything down, and six months later nobody is building anything. Then someone writes a post about low adoption and blames the makers. It is not the makers. It is the governance design.

    Good Power Platform governance is not about stopping people. It is about making the safe path the easy path. If your governance model forces a citizen dev to file three tickets and wait two weeks to use a SharePoint connector, you do not have governance. You have a queue.

    The two failure modes of Power Platform governance

    Every governance setup I have seen fails in one of two ways.

    The first is the lockdown. Default environment is disabled for everyone. Every connector is in the Blocked DLP group unless explicitly approved. New environments require a business case signed by three people. Makers give up and go back to Excel macros. Shadow IT grows in Teams and OneDrive where nobody is watching.

    The second is the free-for-all. Everyone builds in the default environment. No DLP. No naming conventions. Flows owned by people who left two years ago still run in production because nobody knows what they do or who to ask. The CoE Kit is installed but nobody looks at the dashboards.

    Both end in the same place. Leadership decides Power Platform does not work for the enterprise. The real problem was never the platform.

    What actually works

    The teams I talk to who have this working treat governance as a product with makers as the users. That framing changes everything.

    They have a managed maker environment that anyone in the org can request access to in under an hour. Standard connectors are in the Business group. Premium and risky connectors need justification but the path is documented. People know where to go.

    They have a promotion path from maker environment to a shared Dev, then UAT, then Prod. Each environment has different DLP settings and the differences are written down. This is the part most teams miss. I wrote about this before in the context of Power Platform agents and GitHub integration, where a connector sitting in Business in Dev and Blocked in Prod silently kills an agent at go-live. Same pattern applies to any flow or app that crosses environments.

    They have an ownership policy that actually runs. When someone leaves, their flows and apps get reassigned within a defined window, not forgotten. The Power Platform PowerShell modules handle most of this if you script it. The CoE Kit is fine for inventory but I have stopped relying on it as the source of truth for ownership because the lag is too long.
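    The leaver check itself is small once the inventory exists. Here is a minimal sketch. It assumes you have already exported flows and their owners (for example with the PowerShell modules) and that a hypothetical HR feed supplies leaver dates. The `Flow` shape, the `overdue_reassignments` name, and the 14-day window are illustrative. None of them are part of any Microsoft tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Flow:
    name: str
    owner: str


def overdue_reassignments(flows, leavers, today, window_days=14):
    """Return (flow name, days since owner left) for flows past the reassignment window.

    flows: list of Flow records from a hypothetical inventory export.
    leavers: dict mapping owner -> date they left, from a hypothetical HR feed.
    """
    overdue = []
    for flow in flows:
        left_on = leavers.get(flow.owner)
        if left_on is None:
            continue  # owner is still active
        days = (today - left_on).days
        if days > window_days:
            overdue.append((flow.name, days))
    # Oldest orphans first: they have run without an owner the longest
    return sorted(overdue, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    flows = [Flow("Invoice approvals", "alice"), Flow("Weekly report", "bob"), Flow("Onboarding", "carol")]
    leavers = {"alice": date(2026, 1, 5), "bob": date(2026, 2, 20)}
    for name, days in overdue_reassignments(flows, leavers, today=date(2026, 3, 1)):
        print(f"{name}: owner left {days} days ago, reassign now")
```

    Everything this returns is a flow that has run unowned past your window. Sorting oldest first puts the riskiest orphans at the top of the list.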

    DLP policies that do not punish makers

    Most DLP policies I see are written by someone who has never built a flow. They block the HTTP connector by default and then wonder why nobody integrates with anything. Or they put SharePoint and Outlook in different data groups and break every approval flow in the tenant.

    The approach that works is starting from what makers actually need and working backwards. Look at what the top 50 flows in your tenant use. Build your Business data group around that. Put the genuinely risky stuff like custom connectors to unknown endpoints in Blocked. Review quarterly, not yearly.
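    Once you have an export, counting connector usage across your busiest flows takes only a few lines. This sketch assumes a hypothetical inventory in which each flow has a name, a 30-day run count, and the list of connectors it uses. The field names are mine, not a CoE Kit schema.

```python
from collections import Counter


def connector_usage(flows, top_n=50):
    """Count how many of the top_n most-run flows use each connector.

    flows: list of dicts with hypothetical keys 'name', 'runs_30d', 'connectors'.
    """
    busiest = sorted(flows, key=lambda f: f["runs_30d"], reverse=True)[:top_n]
    usage = Counter()
    for flow in busiest:
        # set() so a flow calling SharePoint in five actions still counts once
        usage.update(set(flow["connectors"]))
    return usage.most_common()
```

    Connectors that appear across most of your busiest flows are candidates for the Business group. Treat the output as input to the review, not as the policy itself.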

    And write the policy down in plain language somewhere makers can find it. Not a SharePoint page buried in the IT site. Pin it in the Teams channel where makers ask questions.

    The CoE Kit is not a governance strategy

    I see this a lot. A team installs the CoE Kit, enables a few flows, and calls it done. The CoE Kit is a starting point. It is inventory and some reporting. It is not a governance strategy on its own.

    The manual updates every few months genuinely do break things. The premium license requirement for some of the governance features is a real cost. The handoff complexity when the person who installed it leaves is a known problem. None of this is a secret.

    What I have seen work is using the CoE Kit for what it is good at, which is discovery, and building lightweight custom tooling for the parts that matter to your org. Inactivity policies. Ownership reassignment on leaver events. Environment request intake. None of this needs to be fancy. A few flows and a SharePoint list go a long way, which ties back to a point I have made before about SharePoint being the right backend for most Power Platform needs.

    Governance is a product

    The shift that changed how I think about this is treating governance as a product with makers as the customer. If makers avoid your governance model, it is broken. If they route around it, it is broken. If they ask for exceptions constantly, it is broken.

    The safe path has to be the easy path. Otherwise adoption dies and you spend the next year explaining to leadership why the platform did not deliver. If you are still building out the foundation, understanding why Power Automate is still worth learning in 2026 is a good place to start before layering governance on top of a platform your team is not yet fluent in.

    Frequently Asked Questions

    What is Power Platform governance and why does it matter?

    Power Platform governance is the set of policies, environments, and controls that shape how people build and share apps, flows, and agents in your organisation. When designed well, it protects the business without slowing down the people trying to do useful work. When designed poorly, it either locks everything down or lets chaos grow unchecked.

    How do I set up Power Platform governance without killing adoption?

    Treat governance as a product where makers are the users, not a security checklist applied on top of them. Give people a managed environment they can access quickly, document the DLP policies clearly, and make the approved path easier than any workaround. If the process to get started takes weeks, people will find another way.

    Why does my Power Platform app break when it moves from Dev to Production?

    This usually happens because DLP connector policies differ between environments and those differences are not documented anywhere. A connector allowed in your Dev environment may be blocked in Production, which silently breaks apps or flows at go-live. Writing down the DLP differences for each environment and testing against them before promotion prevents this.

    When should I reassign Power Platform flows and apps after someone leaves the organisation?

    Reassignment should happen within a defined window as soon as someone leaves, not reactively when something breaks. Flows and apps owned by former employees can keep running in production for years with no clear owner, which creates a serious operational and security risk. Power Platform PowerShell modules can automate much of this process.

  • Claude vs ChatGPT Is the Wrong Question When You Are Building Automations

    Comparing Claude vs ChatGPT for automation workflows inside Power Platform

    Another Claude vs ChatGPT comparison landed in my feed this week: a piece on the Zapier Blog running the usual head-to-head of reasoning, coding, writing, and ethical dilemmas. It is useful if you are picking a chat assistant for personal use. It is almost useless if you are choosing between Claude and ChatGPT for automation inside a real enterprise flow.

    I keep seeing people pick a model based on a consumer benchmark and then act confused when their Copilot Studio agent starts returning malformed JSON in week three. The criteria that matter when a model sits behind a connector are not the criteria that make for a good blog post.

    Why Head-to-Head Model Comparisons Stop Being Useful the Moment You Add a Connector

    Consumer comparisons test the model in isolation. One prompt in, one answer out, a human judges the output. That setup tells you nothing about what happens when the model has to call a tool, parse a response, call another tool, and feed a structured result into a downstream action.

    Inside an automation, the model is not the product. The model is one component in a pipeline. The question is not which one writes better poetry. The question is which one fails in ways your orchestration layer can actually handle.

    I wrote about this angle in a previous post on agentic workflows. The LLM is the reasoning layer, not the agent. Picking the reasoning layer on vibes from a consumer benchmark is how you end up with a beautifully worded confident response for a task that never completed.

    The Four Things That Actually Matter When a Model Sits Inside an Automation

    These are what I actually test for. None of them show up in head-to-head comparisons.

    Structured output stability under load. Ask the same model for the same JSON schema a hundred times with slightly different inputs. Count how often it adds a trailing comma, drops a required field, wraps the JSON in a code fence, or decides today is the day to add a helpful explanation before the response. This is the single biggest source of silent failures I see in production.
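    Counting those failure modes is easy to automate. This rough sketch assumes you have logged the raw replies as strings, and it uses a hypothetical two-field schema. It labels each reply with the first problem it finds and returns the overall malformed rate.

```python
import json
from collections import Counter

# Hypothetical schema for illustration: the two fields every reply must carry
REQUIRED_FIELDS = {"category", "confidence"}
FENCE = "`" * 3  # a markdown code fence the model wrapped around its JSON


def classify_output(raw: str) -> str:
    """Label one raw model reply with the first failure mode found, or 'ok'."""
    text = raw.strip()
    if text.startswith(FENCE):
        return "code_fence"
    if not text.startswith("{"):
        return "leading_prose"  # e.g. "Sure! Here is the JSON: {...}"
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return "invalid_json"  # trailing commas and truncated objects land here
    if REQUIRED_FIELDS - set(data):
        return "missing_field"
    return "ok"


def failure_report(outputs):
    """Count each failure mode and return the overall malformed-output rate."""
    counts = Counter(classify_output(o) for o in outputs)
    failures = len(outputs) - counts["ok"]
    return counts, failures / len(outputs)
```

    Run it over a hundred logged replies. The per-mode counts tell you whether a prompt change or a validation layer is the right fix.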

    Tool-calling predictability with multiple connectors. Give the model five tools. Watch how it picks. A model that is 95 percent accurate on tool selection with two tools can drop to 70 percent with five because the descriptions start competing. Consumer tests never measure this.

    Behaviour when context gets long. Most real flows accumulate context: user input, previous tool results, system instructions, retrieved documents. I want to know how the model behaves at 40k tokens of accumulated state, not at 500. Instruction drift usually shows up here first.

    Pricing behaviour under loops. An agent that retries three times on a failed tool call can quietly 10x your cost. The cheaper model on paper is not always the cheaper model in production once you account for retry patterns and token accumulation. Latency Is the Quiet Killer of Agentic Workflows covers how round-trip costs compound in ways most people never budget for until it is too late.
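    The compounding comes from context, not just call count. This back-of-envelope sketch assumes, hypothetically, that each failed attempt appends the bad output plus an error message to the context before the next try. The token figures in the example are illustrative, not measurements.

```python
def tokens_for_attempts(prompt_tokens, output_tokens, feedback_tokens, attempts):
    """Total (input, output) tokens when every retry resends the growing context.

    Assumption: each failed attempt adds its own output plus an error message
    (feedback_tokens) to the context that the next attempt sends.
    """
    total_in = total_out = 0
    context = prompt_tokens
    for _ in range(attempts):
        total_in += context
        total_out += output_tokens
        context += output_tokens + feedback_tokens  # failed reply and error carried forward
    return total_in, total_out


if __name__ == "__main__":
    single = tokens_for_attempts(2000, 500, 300, attempts=1)
    with_retries = tokens_for_attempts(2000, 500, 300, attempts=4)  # 1 call + 3 retries
    print(single, with_retries)
```

    With those illustrative numbers, one call plus three retries sends 12,800 input tokens instead of 2,000. That is a 6.4x increase before you count output tokens or an agent making several such calls per turn.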

    How I Pick Between Claude and GPT for a Specific Flow

    I do not pick a model for the whole platform. I pick per use case.

    For long-context reasoning where the model needs to hold a lot of state and follow detailed instructions without drifting, Claude has been the more predictable option in my testing. Fewer surprise deviations from the system prompt when the context gets messy. If you want to go deeper on why Claude works well as a reasoning layer inside enterprise pipelines, Claude as an Orchestration Brain Is the Most Interesting Thing Happening in Enterprise AI Right Now gets into the architecture side of that decision.

    For fast, cheap, high-volume classification or extraction where the schema is simple and the input is short, GPT models tend to win on cost-per-call and latency. If the task is “read this email and return one of five categories,” I am not paying for a heavyweight reasoning model.

    For tool-calling inside a Copilot Studio agent with multiple Power Automate actions, I test both. There is no universal winner. It depends on how the tool descriptions are written, how many there are, and how ambiguous the user input gets.

    The honest answer most of the time is: it does not matter as much as the people arguing about it think it does. The bigger wins come from tool design, prompt structure, and failure handling. A well-designed flow with a mid-tier model beats a sloppy flow with the flagship every time.

    What to Test Before You Commit a Model to Production

    Before a model goes behind a production flow, I run four checks. Not benchmarks. Checks against the actual flow.

    Run the real schema a hundred times with production-like inputs. Measure malformed output rate. Anything above one percent and you need a validation and retry layer, no matter which model you picked.
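    Here is a minimal validation and retry layer, sketched under three assumptions. First, `call_model` is a hypothetical stand-in for whatever client or connector you use. Second, the reply must be a JSON object with the required fields. Third, each retry tells the model what was wrong. The layer raises an error instead of passing bad output downstream.

```python
import json
from typing import Callable, Set


def call_with_validation(call_model: Callable[[str], str], prompt: str,
                         required: Set[str], max_attempts: int = 3) -> dict:
    """Call a model, validate its JSON reply, and retry with the error fed back.

    call_model is a hypothetical stand-in: it takes a prompt string and
    returns the raw reply text.
    """
    last_error = None
    for _ in range(max_attempts):
        message = prompt if last_error is None else (
            f"{prompt}\n\nYour previous reply was rejected: {last_error}. "
            "Return only a JSON object."
        )
        raw = call_model(message).strip()
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON ({exc.msg})"
            continue
        if not isinstance(data, dict):
            last_error = "top-level value is not an object"
            continue
        missing = required - set(data)
        if missing:
            last_error = f"missing fields {sorted(missing)}"
            continue
        return data
    # Fail loudly instead of passing malformed output to the next action
    raise ValueError(f"no valid output after {max_attempts} attempts: {last_error}")
```

    Keep `max_attempts` low. Every retry resends the prompt, so this layer is also where retry costs begin.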

    Run the tool-calling logic with the real connector set, not a simplified test set. Watch for the model picking the wrong tool when two descriptions overlap. This is where I lost the most time the hard way.
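    You can catch overlapping descriptions before any model is involved. This crude lexical pre-check assumes tool names and descriptions are available as a plain dictionary. It scores every pair by the Jaccard similarity of their word sets. It says nothing about how a given model actually chooses, so treat a flagged pair as a description to rewrite and retest, not as proof of a routing bug.

```python
import re
from itertools import combinations

# Filler words that would otherwise inflate similarity between any two descriptions
STOP_WORDS = {"the", "a", "an", "of", "for", "to", "and", "or", "in", "by", "from", "with"}


def words(text):
    """Lowercased word set of a description, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def overlapping_tools(tools, threshold=0.5):
    """Return (tool_a, tool_b, score) for description pairs with Jaccard similarity >= threshold."""
    flagged = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(tools.items(), 2):
        a, b = words(desc_a), words(desc_b)
        if not a or not b:
            continue
        score = len(a & b) / len(a | b)
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    # Most similar pairs first: these are the descriptions to rewrite
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```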

    Simulate a long session. Feed it accumulated context that looks like a real user journey, not a single clean turn. Watch for instruction drift.

    Load test with the pricing model in mind. Know what a retry storm costs you before it happens in production, not after finance asks questions. The Power Automate documentation covers retry policies, but most people never configure them until something breaks.
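    The arithmetic is worth doing before the load test. This deliberately simple sketch uses hypothetical inputs: runs per day, failure rate, retries per failed run, and cost per call. It assumes every failed run burns all of its retries.

```python
def retry_storm_cost(runs_per_day, failure_rate, retries_per_failure, cost_per_call):
    """Estimate daily model calls and cost, counting retries on failed runs.

    Linear worst case: every run makes one call, and each failed run uses all
    retries_per_failure extra calls.
    """
    calls = runs_per_day * (1 + failure_rate * retries_per_failure)
    return round(calls), round(calls * cost_per_call, 2)


if __name__ == "__main__":
    print(retry_storm_cost(10_000, 0.02, 3, 0.01))  # a normal day
    print(retry_storm_cost(10_000, 1.0, 3, 0.01))   # a downstream outage
```

    With these illustrative numbers, a normal day at 2 percent failures costs 106 units instead of 100. A full outage with three retries per run costs 400. The outage figure is the one to budget for.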

    The Claude vs ChatGPT question is the wrong frame. The right question is: which model handles the specific shape of failure my flow is most exposed to. Answer that and the comparison stops mattering. That is the part I keep trying to explain when people ask me, and it still gets pushed aside for whichever model topped a benchmark last week.

    Frequently Asked Questions

    Which is better, Claude vs ChatGPT for automation workflows?

    Choosing between Claude and ChatGPT for automation is less about which model performs better in general benchmarks and more about how each behaves inside a pipeline. The criteria that matter are structured output reliability, tool-calling accuracy, and how well the model holds instructions as context grows. Testing both models against your specific workflow conditions will tell you far more than any consumer comparison.

    Why does my AI agent start producing errors after working fine at first?

    This often happens because the model experiences instruction drift as context accumulates over time. Long flows gather user inputs, tool results, and retrieved documents, and some models struggle to maintain consistent behaviour at high token counts. Testing your model under realistic context lengths before going to production can help catch this early.

    How do I choose an AI model for a Power Automate or Copilot Studio flow?

    Focus on how the model handles structured outputs, selects the right tools when multiple connectors are available, and behaves when context is long rather than short. Consumer benchmarks test models in isolation, but real automation pipelines require consistent, predictable behaviour across repeated calls with varying inputs. Running your own tests against your actual schema and tools will give you more reliable answers.

    What causes silent failures in AI automation workflows?

    One of the most common causes is inconsistent structured output, where a model occasionally adds unexpected formatting, drops required fields, or wraps a response in a code block instead of returning clean JSON. These errors can pass through without triggering obvious alerts while still breaking downstream actions. Testing output stability across many varied inputs is one of the most important steps before deploying a model-powered flow.

    This post was inspired by Claude vs. ChatGPT: What’s the difference? [2026] via Zapier Blog.

  • Scheduled Codex Runs Are the Missing Piece Between Chatbots and Real Automation

    Codex automations scheduled ai runs replacing recurring jobs

    I came across a post from OpenAI about Codex Automations the other day, and it reminded me of a pattern I keep seeing people miss. Everyone is obsessed with chatbots. Meanwhile, the real unlock is boring and familiar to anyone from the Power Platform world. It is the schedule. Scheduled Codex automation runs are the bridge between a cool demo and something that actually replaces a recurring job.

    Most AI tooling still assumes a human is in the loop pressing a button. That assumption is the ceiling. Break it and the shape of what you build changes.

    Why a scheduled AI run is different from a scheduled flow

    A scheduled Power Automate flow is deterministic. Same trigger, same actions, same branches. You can draw it on a whiteboard before it runs and the drawing will be correct. I have written about this before. If you can fully diagram the execution path before it runs, it is not an agentic workflow. It is a flow.

    A scheduled Codex run is the opposite. The trigger fires on a schedule, but the work happening inside is a reasoning step. The model decides what to read, what to compare, what to summarise, what to flag. You are not wiring actions. You are wiring a recurring thought.

    That sounds fluffy. It is not. It changes what workloads are worth automating at all.

    The workload shape where scheduled Codex runs actually fit

    Here is the shape I look for. The task runs on a cadence. The inputs vary in structure every time. The output is a judgement, a summary, or a prioritised list. No two runs look the same but the goal is identical.

    Think about the recurring jobs that never got automated because the logic was too fuzzy. The weekly review of open pull requests that actually need attention. The Monday morning scan of overnight alerts to decide which three matter. The monthly pass over a folder of documents to flag what changed in a way a human cares about.

    In Power Automate you would try and fail. You would end up with a flow that emails everything to a human who then does the real work. The flow is a courier, not an automation.

    A scheduled AI run is different. The reasoning is the automation. The delivery is the courier part.

    What I would build with this tomorrow if I had it internally

    A daily 7am run that reads the previous day’s pipeline run logs across a set of flows, clusters the failures by likely root cause, and posts a short Teams message with the three things worth looking at. Not the raw error list. The interpretation.
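    The interpretation is the model's job, but a cheap deterministic pass can shrink what it has to read. This sketch of that pre-grouping step assumes error messages arrive as plain strings. It masks GUIDs and long numeric IDs so runs that failed the same way share a signature, then counts the signatures. The regular expressions are illustrative. Short numbers such as HTTP status codes are deliberately kept, so a 429 and a 401 stay in separate groups.

```python
import re
from collections import Counter

GUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
LONG_ID = re.compile(r"\d{4,}")  # record IDs, timestamps; leaves 3-digit status codes alone


def signature(message: str) -> str:
    """Normalise an error message so runs that failed the same way share one key."""
    msg = GUID.sub("<guid>", message.lower())
    return LONG_ID.sub("<n>", msg).strip()


def top_failure_groups(errors, k=3):
    """Return the k most common normalised signatures with their counts."""
    return Counter(signature(e) for e in errors).most_common(k)
```

    The model then reads a handful of grouped signatures with counts instead of hundreds of raw lines. The interpretation it adds on top is the part worth posting to Teams.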

    A weekly pass over a shared folder that produces a diff in plain English. What changed, who changed it, whether it looks like policy drift or normal edits.

    A monthly review of connector usage that flags flows quietly heading toward platform-level throttling before they break in production.
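    The trend part of that review can be a simple projection. This sketch assumes a hypothetical export of daily action counts, oldest first, and a `limit` you fill in from the current Power Platform request limits for your licences. It fits a least-squares line and estimates how many days remain before the line crosses the limit.

```python
def days_until_limit(daily_counts, limit):
    """Estimate days until a least-squares trend of daily_counts reaches limit.

    daily_counts: recent daily action totals, oldest first (hypothetical export).
    limit: the ceiling that applies to you; look it up, do not hard-code a guess.
    Returns None when usage is flat or falling.
    """
    n = len(daily_counts)
    if n < 2:
        return None
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(daily_counts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_counts))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope <= 0:
        return None
    latest = mean_y + slope * (xs[-1] - mean_x)  # trend value for the most recent day
    return max(0.0, (limit - latest) / slope)
```

    For example, a flow with counts of 1,000, 2,000, 3,000 and 4,000 over the last four days and a limit of 10,000 returns 6.0 days. Flat or falling usage returns `None`.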

    None of these are chatbots. None of them need a human to press a button. All of them are reasoning tasks that happen on a clock. That is the fit.

    Where Power Automate still wins and where it does not

    Power Automate wins the moment the work is deterministic and the integrations are inside the Microsoft estate. Approvals. SharePoint updates. Dataverse writes. Email parsing with known templates. Anything with governance, DLP, and environment strategy attached. A scheduled AI run from outside the tenant does not solve those things. Power Automate does.

    It loses the moment the work is a judgement call on messy inputs that change shape every run. That is where a scheduled Codex or Claude run wins by a wide margin. Trying to force that work into a flow gives you the courier pattern: useful, but not automation. The principle from Latency Is the Quiet Killer of Agentic Workflows applies here too. The more reasoning steps you stack inside a scheduled run, the more carefully you need to budget what actually happens inside that window.

    The interesting move is using both. The scheduled AI run produces the judgement. Power Automate delivers it, logs it, routes approvals, and writes to the system of record. The reasoning layer decides. The execution layer acts. I have said this more than once, and I will keep saying it because most teams still collapse the two. If you are wondering where Workspace Agents sit relative to Power Automate in this picture, that comparison is worth reading before you decide which layer owns the work.
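    One way to keep the two layers apart is a fixed contract between them. This sketch assumes the flow starts with the "When an HTTP request is received" trigger, whose URL Power Automate generates when you save the flow. The payload field names are hypothetical. The sketch also leaves out the authentication and retry handling you would want in production.

```python
import json
import urllib.request

# Hypothetical contract: mirror these fields in the trigger's request body JSON schema
FINDING_FIELDS = {"title", "severity", "evidence"}


def build_handoff(run_id, findings):
    """Shape the scheduled run's judgement into a fixed payload for the flow."""
    for finding in findings:
        if set(finding) != FINDING_FIELDS:
            raise ValueError(f"unexpected finding shape: {sorted(finding)}")
    return {"runId": run_id, "findingCount": len(findings), "findings": findings}


def send_to_flow(trigger_url, payload):
    """POST the payload to the flow's HTTP trigger URL and return the status code."""
    request = urllib.request.Request(
        trigger_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

    The reasoning run owns `build_handoff`. The flow owns everything after the POST: logging, approvals, and writes to the system of record.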

    If you already think in triggers and schedules from the Power Platform world, you are better positioned than most to use this well. You know what a cadence looks like. You know what idempotent means. You know why retry logic matters. Now the thing running inside the schedule can think. That is the shift.
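    Idempotency matters more once the scheduled job can think, because a retried run may phrase its output differently and slip past a naive duplicate check. This sketch derives the key from the job name and the scheduled slot rather than from the output. An in-memory set stands in for wherever you record deliveries, such as a SharePoint list.

```python
import hashlib


def idempotency_key(job_name, scheduled_for):
    """Stable key for one scheduled slot: a retry of the same slot gets the same key."""
    raw = f"{job_name}|{scheduled_for}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


class DeliveryLog:
    """In-memory stand-in for a durable record of delivered runs (e.g. a SharePoint list)."""

    def __init__(self):
        self._delivered = set()

    def deliver_once(self, key, deliver):
        """Call deliver() only if this key has not been delivered; return True if it ran."""
        if key in self._delivered:
            return False  # already delivered for this slot; a retried run must not post again
        deliver()
        # Recorded after delivery so a failed delivery can be retried; a crash between
        # these two steps can still produce one duplicate, which a durable store should handle
        self._delivered.add(key)
        return True
```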

    Stop waiting for someone to press a button.

    Frequently Asked Questions

    What are scheduled Codex automation runs and how do they work?

    Scheduled Codex automation runs are recurring AI tasks that fire on a set schedule, where the model performs reasoning rather than following a fixed, pre-wired set of actions. Unlike a traditional scheduled flow, the AI decides what to read, compare, summarise, or flag each time it runs. This makes them suited to tasks where the inputs vary but the goal stays the same.

    How do I know when to use a scheduled AI run instead of a Power Automate flow?

    If you can map out every branch and action of a task before it runs, a standard flow is the right tool. When the output requires interpretation, prioritisation, or judgement based on inputs that change each time, a scheduled AI run is a better fit. Tasks like triaging alerts, reviewing documents for meaningful changes, or summarising error logs fall into this second category.

    Why does a scheduled Power Automate flow struggle with fuzzy or variable logic?

    Power Automate flows are deterministic, meaning they follow the same paths every time regardless of context. When the logic requires understanding nuance or making a judgement call, the flow typically ends up forwarding everything to a human rather than completing the work itself. The flow becomes a delivery mechanism rather than a true automation.

    When should I consider replacing a recurring manual review task with an AI automation?

    If a task runs on a regular cadence, involves inputs that vary in structure, and produces an output that is a summary, ranking, or decision rather than a fixed result, it is a strong candidate for AI automation. Examples include weekly pull request reviews, overnight alert triage, and monthly document audits where a human currently does the interpretation work.

    This post was inspired by Automations via OpenAI.

  • Power Platform Agents Talking to GitHub Sounds Simple Until You Hit Enterprise Environment Sprawl

    Power Platform agent GitHub integration enterprise environment mapping diagram

    I keep seeing the same demo on LinkedIn. Someone wires a Power Platform agent to GitHub in ten minutes, the agent answers questions about a repo, and everyone claps. Then a team at a real enterprise tries to copy the pattern and stalls for three weeks. The problem is never the connector. The problem is enterprise reality: twelve environments, three GitHub orgs, DLP policies that differ per environment, and a security team that wants tokens rotated quarterly.

    The connector works. The environment strategy around it does not.

    Why the GitHub Connector Demo Lies to You

    In a demo, one person builds in their default environment, authenticates with their own GitHub account, and points the agent at a public or personal repo. Every layer of that setup is the easy path.

    Personal auth hides the service account problem. The default environment hides the DLP problem. A single repo hides the org sprawl problem. Building and testing in one environment hides the fact that Prod sits in a different security posture entirely.

    I have watched teams ship an agent to UAT, watch it work, promote it to Prod, and then hit a wall where the connection simply cannot see the repo. There is no useful error message. There is just empty knowledge and a confident agent saying it could not find the information. It is the same failure mode I wrote about in agent testing versus production behavior. The agent answers. The answer is empty. Nobody notices for a week.

    The Environment Sprawl Problem Nobody Scopes For

    Large enterprises do not have one Power Platform environment. They have Dev, multiple UAT environments per business unit, a shared services environment, regional Prod environments, a sandbox for citizen devs, and a couple of orphaned ones nobody wants to delete. Each one can fall under a different set of DLP policies, scoped at the tenant level or the environment level.

    GitHub is not one thing either. Most large orgs I hear about run at least two GitHub organisations. Sometimes an enterprise account with several orgs under it. One for platform code, one for product code, maybe one for security-sensitive work behind SSO with stricter SAML enforcement.

    Now ask the real question. Which Power Platform environment is allowed to talk to which GitHub org, with which connector classification, using which identity? Most teams have never drawn this map. They build in Dev, it works, and they assume Prod behaves the same. It does not.

    The GitHub connector can sit in the Business data group in one environment and the Blocked group in another. The agent that passed UAT will not even load the connection in Prod because the policy blocks it. You find out at go-live.

    How I Would Map Power Platform Environments to GitHub Orgs and Repos

    Start with the principle that environment design is the hard part, not the connector. Then draw the map before you build anything.

    Dev Power Platform environment talks to a Dev GitHub org, or a dedicated sandbox org, with a scoped token and loose DLP. UAT talks to the same repos as Prod but through a read-only identity, so you are testing against real structure without write risk. Prod talks to Prod GitHub orgs through a managed service identity, with the connector in the Business data group and DLP exceptions documented.
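    Drawing the map can be as literal as writing it down as data and checking it. This sketch assumes one row per environment-to-org pairing. The identity labels are my own shorthand, and the DLP group values follow the Business, Non-Business and Blocked data groups. The two checks shown are examples, not a complete policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    environment: str   # Power Platform environment name
    github_org: str    # GitHub organisation the agent reads from
    dlp_group: str     # GitHub connector's data group here: "Business", "Non-Business" or "Blocked"
    identity: str      # shorthand: "personal", "read-only service", "github-app"


def check_map(bindings):
    """Return readable problems found in an environment-to-GitHub-org map."""
    problems = []
    for b in bindings:
        if b.dlp_group == "Blocked":
            problems.append(f"{b.environment}: GitHub connector is Blocked, so the connection will not load")
        if b.environment.lower().startswith("prod") and b.identity == "personal":
            problems.append(f"{b.environment}: personal identity used for {b.github_org} in Prod")
    return problems
```

    Run a check like this against the map before promotion rather than after go-live. That is the whole point of drawing it first.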

    The knowledge source in the agent has to point at a repo that the target environment’s connection can actually see. This is where most builds break. The agent was built in Dev pointing at a repo in the Dev org. In Prod, that repo does not exist, and the Prod connection has no permission on the Prod equivalent. The fix is not technical. It is an environment variable pattern where the repo reference is parameterised per environment, and solution deployment swaps the value.
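    In Power Platform solutions, an environment variable carries a default value and can also have a current value in each environment. When both exist, the current value wins. This sketch mimics that resolution rule and adds one guard: outside Dev, a missing current value is an error rather than a silent fallback to the Dev repo. The variable name and repo paths are hypothetical.

```python
# Hypothetical environment variable holding the knowledge-source repo reference
REPO_VARIABLE = {
    "schema_name": "contoso_GitHubRepo",
    "default_value": "contoso-sandbox/agent-knowledge",  # points at the Dev org
}

# Hypothetical current values supplied per target environment at deployment time
CURRENT_VALUES = {
    "UAT": "contoso-platform/agent-knowledge",
    "Prod": "contoso-platform/agent-knowledge",
}


def resolve_repo(environment, variable=REPO_VARIABLE, current_values=CURRENT_VALUES,
                 dev_environments=("Dev",)):
    """Return the repo reference an environment should use.

    Mirrors the platform rule (current value overrides default) and adds a guard:
    outside Dev, a missing current value is an error, not a silent Dev default.
    """
    current = current_values.get(environment)
    if current:
        return current
    if environment in dev_environments:
        return variable["default_value"]
    raise LookupError(
        f"{variable['schema_name']} has no current value in {environment}; "
        "the Dev default would point the agent at a repo it cannot see"
    )
```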

    The Microsoft Learn docs on environment strategy cover the platform side. They do not cover the mapping to external orgs. That part is on you.

    Service Accounts, Tokens, and the Stuff That Actually Breaks in Prod

    Personal Access Tokens are how every demo works and how no enterprise should run anything. The person who created the PAT leaves the company, the token is revoked, the agent goes dark. I have seen this happen. Twice.

    GitHub Apps are the right answer for Prod. They give you fine-grained permissions, installation per org, and credential rotation without losing the identity. The connector supports GitHub App auth. Use it. The trade-off is setup time. You have to get the security team to approve the app installation on the target org, which takes weeks in a large enterprise. Plan for that before you commit to a go-live date.

    Service account seats are the other thing that breaks quietly. The identity your Prod connection uses needs a seat on the target repo. In a GitHub Enterprise plan with seat limits, this is a budget conversation, not a technical one. I have seen agent deployments stall because nobody wanted to pay for an extra seat.

    Token rotation policy is the last piece. If your security team rotates every ninety days, build the rotation into your deployment pipeline, not into a calendar reminder. Otherwise the agent fails silently on day ninety-one and the confident-but-empty response problem shows up again. And if those silent failures start compounding across chained steps, you are looking at the kind of agentic workflow latency problem that is easy to miss until it is already affecting users.
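    The pipeline check does not need to be clever. This sketch assumes the credential's issue date is available from your secret store's metadata. It uses the 90-day policy above with a hypothetical 14-day warning window, and a non-zero exit fails the deployment stage.

```python
from datetime import date


def rotation_status(issued_on, today, max_age_days=90, warn_days=14):
    """Classify a credential as 'ok', 'rotate-soon' or 'expired' under a fixed-age policy."""
    age = (today - issued_on).days
    if age >= max_age_days:
        return "expired"
    if age >= max_age_days - warn_days:
        return "rotate-soon"
    return "ok"


if __name__ == "__main__":
    # Hypothetical issue date; in a pipeline this would come from secret store metadata
    status = rotation_status(date(2026, 1, 1), date.today())
    print(status)
    if status != "ok":
        raise SystemExit(1)  # non-zero exit fails the deployment stage
```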

    The connector is not the hard part. It never was. The teams that succeed stop treating integration as a connector problem and start treating it as an environment design problem. If you are still early in this process, getting started with Copilot Studio in 2026 means thinking through environment strategy from day one, not after your first failed Prod deployment. Draw the map first. Build second.

    Frequently Asked Questions

    How do I set up a Power Platform agent GitHub integration in an enterprise environment?

    Start by mapping which Power Platform environments need to connect to which GitHub organisations, and what DLP policies apply to each. You cannot assume a connection that works in Dev or UAT will behave the same way in Production, since connector classifications can differ across environments. Sorting out service account credentials and token rotation policies before you build will save significant rework later.

    Why does my Power Platform agent work in UAT but fail in Production when connecting to GitHub?

    The most common cause is a difference in DLP policies between environments. The GitHub connector may be classified as allowed in your UAT environment but blocked or restricted in Production, which stops the connection from loading at all. The agent will often still respond but return empty results, making the failure easy to miss until users report it.

    What is environment sprawl and why does it matter for Power Platform agent deployments?

    Environment sprawl refers to the accumulation of multiple Power Platform environments across an organisation, each with its own DLP rules, security posture, and connector permissions. It matters for agent deployments because a GitHub connection that is permitted in one environment may be completely blocked in another, and most teams do not map these differences before they start building.

    When should I use a service account instead of personal authentication for a GitHub connector in Power Platform?

    Any time the agent is intended for a team or production use case, a service account is the right choice over personal authentication. Personal credentials tie the connection to an individual user, which creates access and continuity risks when that person changes roles or leaves the organisation. A shared service account also makes token rotation and permission auditing much easier to manage.

  • Workspace Agents Are ChatGPT’s Answer to Power Automate and That Comparison Matters

    OpenAI Workspace Agents compared to Power Automate flow diagrams

    I came across the OpenAI page on Workspace Agents and my first thought was blunt. This is Power Automate with a chat interface sitting in front of it. That is not a dig. The fact that OpenAI Workspace Agents land so close to what Microsoft has been building for years is the interesting part, because it tells you where the bar is moving for every automation builder in the enterprise.

    I have been building on Power Platform full time inside a large organisation for years. I am not worried about Workspace Agents replacing anything in my stack next week. I am thinking about what happens when the people I build for start using ChatGPT at home and walk into the office expecting the same feel.

    What OpenAI actually shipped and why it looks familiar

    Strip the marketing language and Workspace Agents are a way to let a user describe a repeatable task, connect some tools, and have the agent run it on a schedule or on demand. Triggers. Actions. Conditions. A reasoning layer that decides what to do next.

    If that sounds like a Power Automate flow with a Copilot Studio agent sitting on top, that is because functionally it overlaps a lot. The difference is not in what it does. It is in how you build it.

    Conversation-first automation versus flow-first automation

    Power Automate starts from a diagram. You pick a trigger, you add steps, you see the branches. Even when Copilot writes the flow for you, the output is still a visual graph you can inspect, test, and version.

    Workspace Agents start from a conversation. You tell the agent what you want. It figures out the steps. You refine by talking, not by dragging.

    Neither approach is better. They attract different builders and produce different kinds of automations. Flow-first builders think in terms of state, error paths, and what happens when step 4 fails. Conversation-first builders think in terms of outcomes and trust the model to fill in the middle.

    I have written before about what actually makes a workflow agentic, and the same rule applies here. If you can fully diagram the execution path before it runs, you built a flow with a chat skin. The interesting Workspace Agent use cases are the ones where the agent genuinely picks the path.

    What this means if you already run on Power Platform

    Workspace Agents are not going to displace Power Platform inside a large enterprise. Governance, DLP, environment strategy, audit, the whole compliance layer. None of that is solved by a chat interface on top of a model provider.

    But the comparison matters for two reasons.

    First, it shows what conversation-first building can feel like when it works. Power Automate with Copilot is moving in that direction, just slower and with more guard rails. If you want to understand where the platform is heading, watching how people actually use Workspace Agents is more useful than reading another Microsoft roadmap post.

    Second, it exposes the parts of Power Platform that still feel heavy: creating a solution, picking an environment, sorting out connection references, publishing, and sharing. A business user who just had a working agent in ChatGPT in four minutes is going to ask why the internal version takes four days. Part of that friction is unavoidable. As I explored in why Power Automate is still worth learning in 2026, the platform carries real enterprise weight that consumer tools simply do not have to carry.

    The expectation shift that is about to hit your intake queue

    This is the part people I talk to at other organisations are not ready for.

    The OpenAI Workspace Agents launch does not change what is technically possible inside your tenant. It changes what your users think should be easy. Someone who built an agent over the weekend to summarise emails and update a Google Sheet is going to file an intake ticket asking for the same thing against SharePoint and Outlook, and they will be confused when the answer is not “sure, by Friday.”

    The honest answer is that the internal version has to handle auth, permissions, data residency, retention, and the fact that the output will be read by someone who makes a decision based on it. That is not bureaucracy. That is the cost of operating in a regulated enterprise. But nobody wants to hear it when the external version just works.

    The teams that will handle this well are the ones that stop treating every request as a custom build and start shipping pre-approved agent templates with the governance already baked in. Citizen devs get conversation-first speed. The platform team keeps control of the risk surface. That is the only way the intake queue survives the next year. And it is worth remembering that who owns the decision inside these automations matters as much as how fast they run. Shipping an agent template without settling that question just moves the risk downstream.

    I have opinions on how to structure that, and I will write about it soon. You can follow along on my LinkedIn if you want the next piece when it lands.

    Workspace Agents are not a threat. They are a preview of the conversation you are about to have with every business user who used ChatGPT over the weekend.

    Frequently Asked Questions

    What are OpenAI Workspace Agents?

    OpenAI Workspace Agents let users describe a repeatable task, connect tools, and have an agent run it on a schedule or on demand. They use a conversation-first approach, meaning you define what you want through chat rather than building a visual workflow diagram.

    How do OpenAI Workspace Agents compare to Power Automate?

    Both handle triggers, actions, and conditions to automate tasks, so they overlap significantly in what they can do. The key difference is how you build them: Power Automate starts from a visual flow diagram, while Workspace Agents start from a conversation with the model.

    When should I use Power Automate instead of a conversational agent?

    Power Automate is better suited when you need clear error handling, version control, and a fully inspectable execution path. Conversation-first tools like Workspace Agents work well when you want to define an outcome and let the model determine the steps.

    Why does the rise of OpenAI Workspace Agents matter for enterprise automation builders?

    As more people use conversational AI tools like ChatGPT in their personal lives, they will expect a similar experience in workplace tools. This raises the bar for how automation platforms present themselves, even if enterprise governance and compliance requirements still favour established platforms.

    This post was inspired by Workspace agents via OpenAI.

  • Latency Is the Quiet Killer of Agentic Workflows and Almost Nobody Talks About It

    Diagram showing agentic workflow latency across multiple model calls in a Copilot Studio and Power Automate loop

    Everyone obsesses over model quality, tool design, and prompt structure when building agents. The thing that actually kills adoption in production is something else entirely. Agentic workflow latency is the quiet killer, and most Power Platform and Copilot Studio builders are not thinking about it until users start abandoning the tool.

    I came across a post from OpenAI about using WebSockets and connection-scoped caching in the Responses API to speed up their Codex loop. It confirmed something I keep running into when building multi-step agents internally. The math is brutal once you do it honestly.

    Why Agent Loops Feel Slow Even When Each Call Is Fast

    A single model call at 800ms feels fine. A tool call at 300ms feels fine. A Dataverse lookup at 500ms feels fine. Everyone looks at these numbers in isolation and says the platform is fast enough.

    Then you build an actual agent. It reasons, calls a tool, reads the result, reasons again, calls another tool, checks a condition, calls a third tool, summarises, responds. That is 8 to 15 round trips for one user request. Each round trip carries connection setup, authentication overhead, token streaming setup, and the model’s own time to first token.

    A 400ms overhead per call sounds small. Multiply by 12 calls. That is almost 5 seconds of pure overhead before any actual thinking or work happens. Add the model and tool time on top and a single request easily lands in the 10 to 15 second range. Users do not wait 15 seconds for a confident answer. They ask once, get nothing for a few seconds, and switch back to the old way of doing it.
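    To make the arithmetic concrete, here is a back-of-envelope latency model in Python. It is a sketch, not a measurement: the 400 ms per-call overhead and the per-step work times are the illustrative figures from this section, and the 12-step mix (six model calls, three tool calls, three Dataverse lookups) is an assumption chosen for the example.

```python
# Back-of-envelope latency model for one user request in a sequential agent loop.
# All numbers are illustrative assumptions, not platform measurements.

PER_CALL_OVERHEAD_MS = 400  # connection setup, auth, streaming setup, time to first token


def overhead_ms(round_trips: int, per_call_overhead_ms: int = PER_CALL_OVERHEAD_MS) -> int:
    """Fixed cost paid on every round trip before any real work happens."""
    return round_trips * per_call_overhead_ms


def total_ms(work_ms: list[int], per_call_overhead_ms: int = PER_CALL_OVERHEAD_MS) -> int:
    """Each sequential step pays its own work time plus the fixed overhead."""
    return sum(work_ms) + overhead_ms(len(work_ms), per_call_overhead_ms)


# Hypothetical 12-step loop: 6 model calls at 800 ms,
# 3 tool calls at 300 ms, 3 Dataverse lookups at 500 ms.
loop = [800] * 6 + [300] * 3 + [500] * 3

print(overhead_ms(len(loop)))  # 4800
print(total_ms(loop))          # 7200 ms of work + 4800 ms of overhead = 12000
```

    The overhead alone is 4.8 seconds; the actual work adds another 7.2, so this loop sits at roughly 12 seconds for one request.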

    I have watched this kill internal tools that were technically correct. The agent did the right thing. Nobody used it.

    What OpenAI Just Shipped and Why It Matters Beyond Codex

    The short version of what they did: move from repeated HTTP requests to a persistent WebSocket connection, and keep cache state scoped to that connection so repeat context does not need to be re-processed on every turn.

    This is not a Codex-only trick. It is a general pattern. Connection-scoped caching means the expensive part of a call, the part that handles your system prompt and tool definitions and prior context, does not get redone from scratch every time your agent takes another step.
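    A toy model helps show why connection-scoped caching matters. The sketch below does not use the Responses API or any real WebSocket library; `ConnectionScopedSession` and `stateless_cost` are hypothetical names, and context is counted in abstract "segments" rather than tokens. It only compares how much context has to be processed when every turn starts from scratch versus when a connection remembers the prefix it has already handled.

```python
class ConnectionScopedSession:
    """Hypothetical stand-in for one persistent connection.

    The cache lives on the session object, so it is scoped to this connection
    and disappears when the session does.
    """

    def __init__(self) -> None:
        self._cached_prefix: list[str] = []  # context already processed on this connection
        self.segments_processed = 0          # running total of expensive work

    def send_turn(self, context: list[str]) -> int:
        """Process only the segments after the cached prefix; return how many."""
        shared = 0
        for cached, current in zip(self._cached_prefix, context):
            if cached != current:
                break  # first difference ends the reusable prefix
            shared += 1
        new_segments = len(context) - shared
        self.segments_processed += new_segments
        self._cached_prefix = list(context)
        return new_segments


def stateless_cost(turns: list[list[str]]) -> int:
    """Independent requests: every turn reprocesses its full context."""
    return sum(len(context) for context in turns)


base = ["system prompt", "tool definitions"]
turns = [
    base + ["user: where is order 42?"],
    base + ["user: where is order 42?", "tool: order 42 found"],
    base + ["user: where is order 42?", "tool: order 42 found", "tool: shipped today"],
]

session = ConnectionScopedSession()
for context in turns:
    session.send_turn(context)

print(stateless_cost(turns), session.segments_processed)  # 12 5
```

    Across three turns the stateless approach processes 12 segments while the session processes 5, because the system prompt, tool definitions, and earlier turns are handled once. In this model, changing any early segment ends the shared prefix at that point and everything after it is processed again, which is why a stable system prompt and tool list matter.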

    For anyone building agents that loop, this is the shape of the next year of infrastructure work. The platforms that expose this properly will feel instant. The ones that do not will feel like they are thinking through molasses.

    What This Looks Like Inside Copilot Studio and Power Automate

    Here is where it gets uncomfortable. In Copilot Studio, you do not see the round trips. You see a topic, a few actions, a generative answer node. The platform hides every call behind its own orchestration.

    That hiding is the problem. A Copilot Studio agent doing generative orchestration with three tool calls backed by Power Automate flows is making far more round trips than most builders realise. Each tool call is a Power Automate HTTP trigger plus whatever that flow does internally, often including another connector call to SharePoint, Dataverse, or an external API. The agent then reads the response and decides what to do next, which is another model call. And if you are hitting Power Automate throttling limits under real load, every one of those round trips gets longer.
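    One way to see how quickly the hidden round trips add up is to count them. The function below follows the breakdown in the paragraph above: an initial model call, then for each tool call the flow trigger, the connector calls inside that flow, and a model call to read the result. It is a simplification for illustration, not a description of Copilot Studio's documented internals, and `hidden_round_trips` is a hypothetical helper.

```python
def hidden_round_trips(inner_calls_per_flow: list[int]) -> int:
    """Count round trips behind one user turn (simplified, assumed shape).

    One model call to plan, then for each tool call: the flow trigger,
    each connector call inside that flow, and a model call to read the result.
    """
    planning_call = 1
    per_tool = sum(1 + inner + 1 for inner in inner_calls_per_flow)
    return planning_call + per_tool


# Three tools, each flow making one SharePoint, Dataverse, or API call.
print(hidden_round_trips([1, 1, 1]))  # 10
# Same three tools, but the flows make 2, 1, and 3 connector calls.
print(hidden_round_trips([2, 1, 3]))  # 13
```

    Three tool calls already mean ten round trips; at a 400 ms overhead per call, that is 4 seconds before any flow or model does real work.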

    I built one recently that felt snappy in testing with one user. In production with ten concurrent sessions, response times doubled. Nothing in the flow was slow on its own. The sum was slow, and throttling on shared connectors made it worse. This is the same class of problem I wrote about in Most Agentic Workflows Are Just Fancy If/Then Logic in a Trench Coat. The difference between a real agent loop and a glorified flow shows up in latency first.

    How I Would Budget Latency Before I Build the Agent

    I treat latency as a first-class design constraint now, not something I measure after the fact. Before I build, I do this:

    • Estimate the number of model calls per user request. Not best case. Typical case.
    • Estimate the number of tool calls and what each one hits. A SharePoint list call in the same tenant is not a Graph API call with an auth handshake.
    • Set a budget. I aim for under 4 seconds total for anything conversational, under 10 seconds for anything that is clearly doing work.
    • Cut calls aggressively. Can two tools be one? Can I pre-fetch context in a single call instead of three? Can the agent skip a reasoning step when the intent is obvious?
    • Parallelise where I can. Power Automate lets you run actions in parallel branches. Most builders do not use them.
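    The budget check and the parallel-branch point can be sketched together. In this Python model, a plan is a sequence of stages; each stage holds one or more branches that run in parallel, and a stage finishes when its slowest branch does. The 4 and 10 second budgets come from the list above; the step latencies (each assumed to include roughly 400 ms of overhead) and the helper names are assumptions for illustration, not Power Automate APIs.

```python
# Latency budget check where parallel branches cost only their slowest branch.
CONVERSATIONAL_BUDGET_MS = 4_000
WORK_BUDGET_MS = 10_000

Stage = list[list[int]]  # parallel branches, each a sequence of step latencies in ms


def stage_ms(stage: Stage) -> int:
    """A stage finishes when its slowest branch finishes."""
    return max(sum(branch) for branch in stage)


def plan_ms(stages: list[Stage]) -> int:
    """Stages run one after another, so their latencies add up."""
    return sum(stage_ms(stage) for stage in stages)


def within_budget(stages: list[Stage], conversational: bool) -> bool:
    """True if the plan finishes under the relevant budget."""
    budget = CONVERSATIONAL_BUDGET_MS if conversational else WORK_BUDGET_MS
    return plan_ms(stages) < budget


# Assumed latencies per call, overhead included.
MODEL, SHAREPOINT, DATAVERSE, GRAPH = 1_200, 700, 900, 1_100

sequential = [[[MODEL]], [[SHAREPOINT]], [[DATAVERSE]], [[GRAPH]], [[MODEL]]]
parallel = [[[MODEL]], [[SHAREPOINT], [DATAVERSE], [GRAPH]], [[MODEL]]]

print(plan_ms(sequential), within_budget(sequential, conversational=True))  # 5100 False
print(plan_ms(parallel), within_budget(parallel, conversational=True))      # 3500 True
```

    Running the three lookups one after another puts the request at 5.1 seconds, over the conversational budget. Putting them in parallel branches drops it to 3.5 seconds, because that stage now costs only as much as the slowest lookup.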

    The other thing I stopped doing: chaining LLM calls for steps that do not need reasoning. If a step is deterministic, I call the tool directly, not through the model. Every model call I can remove from the loop gives me back 500 to 1500ms.
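    A minimal sketch of skipping the model for deterministic steps, assuming a request format strict enough to match with a regular expression. `route`, `lookup_order`, and `run_agent_loop` are hypothetical; in a Power Platform build the rough equivalent might be a topic or condition that calls the flow directly instead of handing the request to generative orchestration.

```python
import re
from typing import Callable

# Hypothetical unambiguous request shape: "status of order <digits>".
ORDER_STATUS = re.compile(r"^\s*status of order (\d+)\s*$", re.IGNORECASE)


def route(request: str,
          lookup_order: Callable[[str], dict],
          run_agent_loop: Callable[[str], dict]) -> dict:
    """Call the tool directly when intent is obvious; otherwise use the model."""
    match = ORDER_STATUS.match(request)
    if match:
        # Deterministic path: one tool call, zero model calls.
        return {"path": "direct", "result": lookup_order(match.group(1))}
    # Open-ended request: let the model decide which tools to call.
    return {"path": "agent", "result": run_agent_loop(request)}


# Stub implementations so the sketch runs on its own.
model_calls = {"count": 0}


def fake_lookup(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}


def fake_agent(request: str) -> dict:
    model_calls["count"] += 1  # stands in for one or more model round trips
    return {"answer": f"handled by model: {request}"}


print(route("Status of order 1042", fake_lookup, fake_agent)["path"])     # direct
print(route("Why was my refund late?", fake_lookup, fake_agent)["path"])  # agent
print(model_calls["count"])  # 1
```

    Only the open-ended request reaches the model. The trade-off is brittleness: the pattern has to be strict enough that a false match is rare, otherwise you save latency by answering the wrong question.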

    Latency is also where the question of who owns the decision in an agentic workflow becomes a performance problem, not just a governance one. Every checkpoint that routes back to a human approver adds another wait state to the loop. The more of those you have, the more your total response time is dominated by human latency, not model latency.

    I have written more about my approach to this kind of trade-off on my LinkedIn, because I keep having the same conversation with people at other organisations whose demos hit a wall the moment they meet real users.

    The agents that win in production are not the smartest ones. They are the ones that answer before the user gives up.

    Frequently Asked Questions

    Why does agentic workflow latency get so bad in multi-step agents?

    Each individual call in an agent loop may seem fast, but the overhead adds up across 8 to 15 round trips per user request. Connection setup, authentication, and token streaming costs stack on every single step, turning individually acceptable delays into a frustrating overall wait time.

    What is connection-scoped caching and how does it help agent performance?

    Connection-scoped caching keeps expensive context like system prompts, tool definitions, and prior conversation state ready across multiple calls instead of reprocessing it each time. This avoids redundant work on every step of an agent loop and significantly reduces the overhead that accumulates across a multi-turn interaction.

    How do I reduce latency in Copilot Studio and Power Automate agents?

    Start by auditing how many round trips your agent actually makes for a single user request, since this is where most hidden latency lives. Look for opportunities to batch tool calls, reduce unnecessary steps in your loop, and watch for platform-level improvements like persistent connections that reduce per-call overhead.

    Why do users abandon AI agents even when the agent gives correct answers?

    If the response takes too long, users lose confidence and revert to familiar alternatives before the agent finishes. Technical correctness does not matter if the experience feels slow enough to suggest something has gone wrong.

    This post was inspired by Speeding up agentic workflows with WebSockets in the Responses API via OpenAI.