Tag: Power Platform

  • Claude as an Orchestration Brain Is the Most Interesting Thing Happening in Enterprise AI Right Now


    Most of the conversation around Claude in enterprise automation circles is stuck on the wrong question. People are comparing it to GPT-4o or Gemini as a text generator, debating which one writes better emails or summarises documents more accurately. That framing completely misses what makes Claude enterprise automation orchestration genuinely interesting right now.

    The practitioners I talk to who are getting real results are not using Claude as a chatbot. They are using it as the reasoning layer that decides what to do next in a multi-step, stateful workflow. That is a different problem than answering a question, and it changes everything about where Claude fits in your architecture.

    The chatbot framing is getting in the way

    When a team says they want to “add Claude” to something, the default mental model is a chat interface. User sends message, model replies. Maybe it calls a tool or two. That is not orchestration. That is a smarter input box.

    Orchestration is what happens when you need a model to receive a complex goal, break it into sequenced steps, call different tools at different points, evaluate intermediate results, and decide whether to continue, retry, or escalate. The model is not answering a question. It is managing execution across a process that has state, has branching conditions, and has consequences if it goes wrong.

    I wrote about this problem directly in the post on agentic workflows. The LLM is not the agent. The LLM is the reasoning layer. If you treat them as the same thing, you end up bolting a model onto the response step of what is really just a structured flow. That is not orchestration. That is decoration.

    What makes Claude specifically interesting for orchestration logic

    Two things stand out when I look at how Claude behaves in multi-step contexts compared to other models at similar capability levels.

    First, instruction following under load. When you give Claude a detailed system prompt with conditional logic, constraints, tool-use rules, and output format requirements, it holds those instructions across a long session more reliably than most alternatives. With other models I have tested, instruction drift starts showing up once you push past a few thousand tokens of context. Claude handles longer, more complex prompts without silently dropping constraints mid-execution. For orchestration, where the system prompt is essentially your process logic written in natural language, that matters a lot.

    Second, the extended context window is not just about volume. It is about statefulness. A workflow that processes a contract, then a set of approval records, then a policy document, then makes a decision that references all three needs a model that can hold all of that in scope simultaneously. Losing context partway through an orchestration run means the model makes decisions with incomplete information. It does not know it has incomplete information. It proceeds confidently anyway. I have seen exactly this failure mode in Copilot Studio agents, where silent context loss leads to confident-sounding responses for tasks that were never properly evaluated.

    Where I would actually slot this into a Power Platform architecture

    I would not replace the existing orchestration layer in a Power Automate flow with a Claude prompt. That is not the use case. Power Automate is still the right place for deterministic, sequential steps with connectors, triggers, and error handling you can inspect.

    Where Claude earns its place is in the decision layer that sits above or between those steps. Think of a workflow that processes incoming requests, where each request has variable structure, ambiguous intent, and routing logic that depends on context that changes week to week. A hard-coded set of conditions in Power Automate will break the moment the business logic shifts. A Claude orchestration layer that reads the request, evaluates the current context loaded from Dataverse, and decides which downstream flow to invoke handles that variability without you rewriting conditions every time.

    In practice, I would build it as a Copilot Studio agent backed by Claude through a custom connector or direct API call, where Claude handles the reasoning and routing logic and Power Automate handles execution of the discrete steps. The agent decides. The flows act. The separation matters because it keeps your execution logic testable and your reasoning logic flexible. Before wiring any of this together, it is also worth auditing what adding Copilot to an existing app actually changes versus what it just surfaces differently.
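    In practice, the "agent decides, flows act" split comes down to a thin validation layer between the model's answer and anything that executes. The sketch below is a minimal illustration of that idea, assuming a convention where Claude is prompted to return a JSON routing decision and the orchestration code checks it against an allow-list of known flows before anything is invoked. All flow names and URLs here are hypothetical placeholders, not real endpoints.

```python
import json

# Hypothetical allow-list mapping routing decisions to Power Automate
# HTTP-trigger URLs. Names and URLs are illustrative only.
ALLOWED_FLOWS = {
    "route_to_procurement": "https://example.invalid/flows/procurement",
    "route_to_legal_review": "https://example.invalid/flows/legal-review",
    "escalate_to_human": "https://example.invalid/flows/escalation",
}

def parse_routing_decision(model_output: str) -> dict:
    """Validate the model's routing decision before any flow is invoked.

    The system prompt asks the model to reply with JSON such as
    {"flow": "route_to_legal_review", "reason": "..."}. Anything
    malformed, or naming a flow outside the allow-list, falls back to
    human escalation, so a confident-but-wrong answer can never trigger
    an action you did not define.
    """
    try:
        decision = json.loads(model_output)
        flow = decision.get("flow")
    except (json.JSONDecodeError, AttributeError):
        decision, flow = {}, None
    if flow not in ALLOWED_FLOWS:
        return {"flow": "escalate_to_human",
                "reason": "unparseable or disallowed decision"}
    return {"flow": flow, "reason": decision.get("reason", "")}
```

    The design choice worth noting is the allow-list: the model proposes, deterministic code disposes. Claude never holds the flow URLs itself, so a reasoning failure degrades into an escalation rather than an unintended action.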

    The governance piece from the post on enterprise Power Platform applies here too. Calling an external Anthropic API endpoint means your orchestration reasoning is leaving the tenant. That is an audit trail split and a DLP conversation you need to have before you build, not after.

    The honest constraints before you redesign anything

None of this is free. Longer context windows mean higher token costs per run, and orchestration workflows that run hundreds of times a day will surface that quickly in billing. Model latency at high context volumes is also real. If your process requires sub-second decisions, this is not your tool.

    The other constraint is testability. When your orchestration logic lives in a system prompt rather than a flow diagram, reproducing a failure is harder. The model made a bad routing decision on Tuesday afternoon. Why? You need logging at the prompt level, not just at the action level. Most teams I see building this way have not set that up, and they hit the same silent failure problem I described in the Copilot Studio testing post: everything looks fine until a real user finds the edge case.
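    Prompt-level logging does not need to be elaborate. One sketch, with field names that are illustrative rather than prescriptive, is to record a hash of the system prompt in force alongside the exact context the model saw and the decision it returned, per call:

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestration")

def log_orchestration_call(system_prompt: str, context: dict, decision: str) -> dict:
    """Record enough to reproduce a routing decision after the fact.

    Hashing the system prompt identifies exactly which version of your
    process logic was in force, without storing the full text on every
    call. Combined with the context the model saw and the decision it
    returned, "why did it route wrong on Tuesday afternoon" becomes an
    answerable question instead of a shrug.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_sha256": hashlib.sha256(system_prompt.encode("utf-8")).hexdigest(),
        "context": context,
        "decision": decision,
    }
    log.info(json.dumps(record))
    return record
```

    Action-level logs in Power Automate tell you what ran. A record like this tells you why the model chose it, which is the half most teams are missing.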

    Claude as an orchestration brain is a genuinely different capability than what most teams are building with today. The question is not whether it is smarter than the last model. The question is whether your architecture is designed to use a reasoning layer at all, or whether you are still just looking for a better chatbot to put at the front of a process that was never designed to be orchestrated.

    Frequently Asked Questions

    What is Claude enterprise automation orchestration and how is it different from using Claude as a chatbot?

    Claude enterprise automation orchestration means using Claude as the reasoning layer that manages multi-step workflows, rather than as a simple question-and-answer interface. Instead of responding to single prompts, Claude receives a complex goal, breaks it into steps, calls tools, evaluates results, and decides how to proceed. This requires stateful, branching logic that goes well beyond what a chat interface is designed to handle.

    Why does instruction drift matter when using an LLM for workflow orchestration?

    In orchestration, your system prompt acts as the process logic for the entire workflow, so if the model quietly forgets constraints or rules mid-execution, the whole process can break or produce incorrect outcomes. Some models begin losing adherence to instructions as context grows, which is a serious problem in long-running enterprise workflows. Consistency across extended sessions is one of the key reasons practitioners favour certain models for this use case.

    When should I use an LLM as an orchestration layer instead of a traditional workflow tool?

    An LLM-based orchestration layer becomes valuable when your workflow involves conditional reasoning, ambiguous inputs, or decisions that depend on synthesising information from multiple sources rather than following a fixed rule set. If your process logic can be fully mapped in advance and never changes based on context, a traditional workflow tool is likely simpler and more reliable. The LLM adds value where judgment and adaptability are required at execution time.

    How does a large context window improve multi-step enterprise workflows?

    A large context window allows the model to hold all relevant documents, intermediate results, and prior decisions in scope at once, rather than losing earlier information as the workflow progresses. This matters in processes that require a final decision to reference multiple earlier inputs, such as reviewing a contract alongside approval records and a policy document. Losing that context mid-run can lead to decisions that are inconsistent with earlier steps in the same workflow.

  • Adding Copilot to Your Power App Is Not the Same as Making It Smarter


    Microsoft published a post this week about making business apps smarter by embedding Copilot, app skills, and agents directly into Power Apps. The features are real and some of them are genuinely useful. But I keep seeing teams read announcements like that and immediately open their existing apps to start wiring things in. That is where it goes wrong. Adding Copilot to Power Apps does not make the app smarter. It makes the AI visible. Those are different things.

    What App Skills and Agent Integration Actually Do Under the Hood

    When you expose a Power App as an app skill or embed a Copilot Studio agent into a canvas app, you are giving the AI a surface to operate on. The agent can read context from the app, trigger actions, and return responses into the UI. In theory, the AI bridges what the user needs and what the app can do.

    In practice, the agent is only as capable as what you hand it. It reads data from your app’s data sources. It calls the actions you have defined. It interprets user intent against the topics and instructions you have written. If your data model is inconsistent, your actions are incomplete, or your process logic has gaps, the agent does not compensate for any of that. It just operates on top of it and returns confident-sounding responses anyway.

    I wrote about this problem in a different context when covering why Copilot Studio agents fail in production. Silent action failures are one of the nastiest issues: the agent completes its response, the user thinks something happened, nothing actually did. That risk does not disappear when you move the agent inside a Power App. If anything, it gets harder to spot because users expect the app to be reliable.

    Why the Data Model and UX Structure Matter More Than the AI Feature

    Most Power Apps I have seen built inside large organisations were designed around a specific, narrow workflow. The data model reflects decisions made at the time of build, often under time pressure, often by someone who is no longer on the team. Fields are repurposed. Status columns hold values that mean three different things depending on which team is using them. Lookup tables have orphaned records nobody cleaned up.

    When you put an agent on top of that, the agent queries this data and tries to give useful answers. The answers will be coherent. They will not be correct. Not reliably.

    The UX structure compounds this. Canvas apps built for point-and-click navigation do not automatically become good AI surfaces. If a user can ask the agent to update a record, but the app’s own form has fifteen required fields and three conditional rules that only run client-side, you now have a conflict between what the agent can do via a Power Automate action and what the app enforces through its UI. One of them will win. It will not always be the right one.

This is the same argument I made about automating a bad process. The automation does not fix the process; it executes it faster and more consistently, including the broken parts. Embedding AI into a poorly structured app works the same way.

    What I Check Before Wiring Any Agent Into an Existing App

    Before I connect anything to a Copilot Studio agent or enable app skills on an existing Power App, I go through a short audit. Not a formal document. Just four questions that save a lot of cleanup later.

    • Is the data model clean enough to query? If the same concept is stored in three different columns across two tables with inconsistent naming, the agent will surface that inconsistency directly to the user. Fix the model first.
    • Are the actions the agent can trigger complete and safe? Every Power Automate flow an agent can call needs proper error handling and a defined failure response. Silent failures inside agent topics are a known problem. If the flow does not return a clear success or failure, the agent cannot respond accurately.
    • Does the app enforce rules that the agent needs to know about? If business logic lives only in Power Fx expressions inside the app’s forms, the agent does not see it. Validation that matters needs to exist at the data layer or inside the flows the agent calls.
    • Is the process the app supports well-defined enough to describe to an AI? If I cannot write a clear system prompt describing what the agent should and should not do in this app, the process is not ready. Ambiguity in the process becomes ambiguity in agent behaviour.
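    The second check, a defined failure response from every flow, can be enforced with a small response contract on the agent side. This sketch assumes a convention where every callable flow returns a body like `{"status": "success"}` or `{"status": "failure", "error": "..."}`; that shape is something you define in each flow's response action, not a platform default:

```python
def interpret_flow_result(response: dict) -> str:
    """Map a flow's structured reply to what the agent should tell the user.

    Assumes (by convention, not platform default) that every flow the
    agent can call returns {"status": "success"} or
    {"status": "failure", "error": "..."}. A missing or unknown status
    is treated as a failure, so the agent never confirms work that may
    not have actually happened.
    """
    status = response.get("status")
    if status == "success":
        return "Done. The request has been submitted."
    error = response.get("error", "no error detail returned")
    return f"That did not complete: {error}. Nothing has been changed."
```

    The point is the default branch: anything the flow does not explicitly confirm is reported as a failure, which closes off the silent-action problem described above.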

    When Embedding AI in a Power App Is Worth It and When It Is Not

    There are genuinely good cases for this. An app where users regularly need to find records across complex filters is a reasonable candidate. Surfacing a conversational shortcut to navigate a large dataset, trigger a common action, or get a summary of a record without clicking through multiple screens can reduce real friction. I have seen it work well when the underlying data is clean and the scope of what the agent can do is narrow and explicit.

    The cases where it is not worth it yet are more common. An app with inconsistent data. A process with unresolved exceptions. A UX that was never designed with AI interaction in mind. In those situations, embedding an agent creates a new layer of support burden without a proportional benefit.

    I also want to be direct about something I mentioned in my post on when Copilot Studio is the wrong choice: not every interaction benefits from being conversational. Some things in a Power App are faster as a button. The AI control is not always an upgrade on a well-placed filter or a clear form layout.

    The Microsoft announcement covers what these features can do. That is useful to know. But the question worth spending time on is not whether you can add Copilot to your Power App. It is whether the app you have is ready to have AI sitting on top of it. Most of the time, that answer requires more honesty than the feature release notes will prompt you to apply.

    Frequently Asked Questions

    How do I add Copilot to a Power App?

    You can embed a Copilot Studio agent into a canvas app or expose your app as an app skill, giving the AI a surface to read context and trigger actions. However, before doing this, your data model and process logic need to be solid, because the agent will only be as reliable as what you give it to work with.

    Why does adding Copilot to Power Apps not make the app smarter?

    Embedding Copilot makes the AI visible inside your app, but it does not fix underlying problems with your data or logic. If your data model is inconsistent or your actions are incomplete, the agent will still return confident-sounding responses that may not be accurate or reliable.

    What is the difference between an app skill and a Copilot Studio agent in Power Apps?

    An app skill exposes your Power App so an AI can interact with it from outside, while embedding a Copilot Studio agent brings the AI directly into the canvas app interface. Both approaches rely on the same principle: the AI can only work with the data sources and actions you have defined for it.

    When should I consider adding AI features to an existing Power App?

    You should only add AI features once your data model is clean, your process logic is complete, and your app’s actions are properly defined and tested. Layering AI onto a poorly structured app creates a conflict between what the agent can do and what the app enforces, which makes failures harder to detect.

    This post was inspired by Making business apps smarter with AI, Copilot, and agents in Power Apps via Microsoft Power Platform Blog.

  • Automating a Bad Process Just Makes It Fail Faster


I came across a post from Zapier Blog about process improvement recently, and it made a familiar point: most broken work isn't actually broken work; it's a broken process behind it. Messy handoffs, unclear ownership, approvals that live in one person's head. Good framing. But it treats process improvement before automation as something you do once, upfront, like a checklist item you can tick and move past. In enterprise Power Platform work, that assumption is where things go wrong.

    What Building Automations Teaches You About How Work Actually Flows

    When you sit down to build a flow, you have to make the process machine-readable. That means every branch needs a condition. Every input needs a defined type. Every approval needs an owner. Every exception needs a path.

    Most processes handed over for automation have none of that. What they have instead is a document someone wrote two years ago, a few spreadsheets nobody fully trusts, and a senior colleague who holds the real logic in their head and has been doing it long enough that they don’t notice the decisions they’re making.

    The automation developer ends up being the first person to actually interrogate the process at that level of detail. Not because they went looking for it, but because the flow won’t build until the ambiguity is resolved. You cannot write a condition on a field that sometimes exists and sometimes doesn’t. You cannot route an approval to a role when the role changes depending on factors nobody documented.

    This is not a Power Platform problem. It surfaces in every serious automation project I’ve heard about across different organisations. The tool just makes the gaps visible faster than any process workshop usually does.

    The Specific Process Failures That Surface When You Try to Automate

    There are a few categories I keep running into.

    Unclear ownership. A task gets triggered, but nobody agreed who acts on it. The automation sends an email. Nobody responds. The flow sits waiting. Eventually it times out. Everyone blames the automation.

    Inconsistent inputs. The data coming in doesn’t conform to any standard. Fields are free text when they should be dropdowns. Dates are formatted three different ways. Required fields are blank because the upstream system never enforced them. Your flow handles the clean case fine and breaks silently on everything else. I wrote about this kind of silent failure in the context of Copilot Studio agents failing in production, but the same thing happens in flows where bad input just passes through without raising an error until something downstream breaks.
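    A concrete way to stop bad input passing through silently is to normalise it at the boundary and treat anything unparseable as an explicit failure path. The sketch below shows the "dates formatted three different ways" case; the formats listed are illustrative, and you would extend them to match whatever the upstream system actually emits:

```python
from datetime import date, datetime
from typing import Optional

# Illustrative formats: three inconsistent styles an upstream system
# might emit. Extend this list to match what you actually receive.
KNOWN_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %B %Y"]

def normalise_date(raw: str) -> Optional[date]:
    """Return a canonical date, or None so the flow can fail loudly.

    The point is that unparseable input becomes an explicit branch in
    the flow, not a value that slips through untouched and breaks
    something downstream hours later.
    """
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None
```

    The same pattern applies to free-text fields that should have been dropdowns: map known values, route everything else to a human instead of guessing.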

    Approval logic nobody can fully articulate. You ask who approves a request above a certain threshold. You get three different answers from three different people. All of them are confident. When you automate the majority answer, you will eventually automate the wrong one for someone’s edge case, and that person will be senior enough that it becomes your problem.

    Exception handling that lives in tribal knowledge. The manual process survives because a human notices something feels off and picks up the phone. The automated process has no equivalent. The exception just propagates.

    Why Fixing the Process First Does Not Mean Waiting to Build

    The standard advice is to fix the process before you automate it. That advice is correct and also almost never followed, because the people who own the process don’t feel urgency until they see the automation breaking. The broken automation is what creates the pressure to fix the underlying problem.

    This doesn’t mean you should automate bad processes and hope for the best. It means process improvement and automation are parallel work, not sequential steps. You build, you find the gap, you surface it to the right person, you agree on a rule, you build that rule into the flow. Then you find the next gap.

    The first build is often a diagnostic as much as a delivery. You are not just producing a flow. You are producing a map of where the process is genuinely undefined. That map is more useful than most process workshops, because it was produced by the requirement to actually execute the logic rather than describe it at a whiteboard.

    The risk is treating that diagnostic build as the final product. It isn’t. The flow that handles the happy path and ignores edge cases is not done. It is a prototype that revealed the real work. Those edge cases are also where Power Automate throttling limits tend to surface, once real volume hits paths that were never properly stress-tested.

    How to Pressure-Test Process Logic Before You Commit It to a Flow

    Before building anything complex, I walk through the process as if I were writing the conditions myself, not interviewing someone about it. Specifically:

    • Ask what happens when a required field is missing. If the answer is “that doesn’t happen,” it will happen.
    • Ask who the fallback approver is when the primary approver is unavailable. If there isn’t one, your flow will block silently until someone notices.
    • Ask what the exception path looks like and who owns it. If the answer is vague, you have found the part of the process that was always handled by instinct rather than logic.
    • Take a real sample of historical cases and walk them through your intended logic manually before writing a single action. The cases that don’t fit cleanly are the ones that will break production.

    This is not a formal methodology. It is just refusing to start building until the people handing you the process have answered the questions the machine will ask anyway.
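    The last step, replaying historical cases, takes a few lines of throwaway code before anything gets built in Power Automate. This is a sketch under stated assumptions: `draft_route` stands in for your intended logic written as a plain function, and the field names are hypothetical. The cases it cannot handle are exactly the ones that would have broken production.

```python
def replay_cases(cases, route):
    """Run real historical cases through draft routing logic.

    Any case that raises KeyError (a field the logic assumed always
    exists) or routes to None is collected as unroutable. Those are
    the gaps to take back to the process owner before building.
    """
    unroutable = []
    for case in cases:
        try:
            decision = route(case)
        except KeyError:
            decision = None
        if decision is None:
            unroutable.append(case)
    return unroutable

# A deliberately naive draft rule, for illustration only:
def draft_route(case):
    if case["amount"] > 10000:
        return "senior_approval"
    if case["amount"] >= 0:
        return "standard_approval"
    return None
```

    Running a few hundred real records through this produces the map of undefined territory faster than any workshop, because it asks the same questions the flow will.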

    The automation doesn’t forgive ambiguity. It just executes it. And when it does, at scale, faster than any manual process ever ran, the results are hard to ignore. That’s not a bug in the automation. That’s the process finally being honest about itself.

If you are responsible for building the automation, you are often the first person in the room with both the technical access and the obligation to ask those questions. Use it. The alternative is building something that works perfectly and fixes nothing. And if the process involves deciding whether to introduce an AI layer on top, that same discipline applies: agentic workflows require a different design approach, not just dropping intelligence onto a process that was never properly defined in the first place.

    Frequently Asked Questions

    Why should I focus on process improvement before automation?

Automating a flawed process does not fix it; it just causes problems to occur more quickly and at greater scale. Issues like unclear ownership, inconsistent data, and undocumented decision logic only become more visible and damaging once a workflow is running automatically. Resolving these gaps before building any automation saves significant rework later.

    How do I know if a process is ready to automate?

    A process is ready to automate when every decision has a documented condition, every input has a defined and consistent format, and every approval has a named owner. If you cannot describe the process in those terms without relying on a single person’s institutional knowledge, it needs more work before automation begins.

    Why does automation fail even when it appears to be built correctly?

Many automations fail silently because the underlying process was never fully defined. Problems like missing fields, inconsistently formatted data, or approval logic that varies depending on who you ask can all cause a flow to break or stall without an obvious error. The automation itself is often not the cause; the broken process feeding into it is.

    When should I involve a developer in reviewing a business process?

    Bringing a developer in early, before any build starts, is worthwhile because the act of making a process machine-readable forces a level of scrutiny that workshops and documentation reviews often miss. Developers building automation are frequently the first people to ask the precise questions that expose gaps in how a process actually works.

    This post was inspired by What is process improvement? via Zapier Blog.

  • Copilot Studio Is Not Always the Answer


    I keep seeing this on LinkedIn and in community forums. Someone describes an internal use case, and the first five replies are all “have you tried Copilot Studio?” The tool has gotten good enough that it has become the reflexive answer to any question involving automation, conversation, or AI. That reflex is causing real problems. Knowing when Copilot Studio is the wrong tool is as important as knowing how to build with it well.

    When Copilot Studio Is the Wrong Tool for the Job

    Most misuse I see falls into one of three situations. The use case is purely transactional. The interaction model is not conversational. Or the team wants a workflow, not an agent.

    If someone needs to submit a form, approve a request, or trigger a process on a schedule, that is Power Automate territory. Putting a conversational interface in front of a single-action task does not make it better. It makes it slower, harder to test, and harder to maintain. Users do not want to type a sentence to do something they could do in two clicks.

The second situation is harder to spot. Some interactions look conversational but are not. A knowledge base search, a document lookup, a status check. These are point-in-time queries with no real back-and-forth. You could build them in Copilot Studio. You could also build them as a Power Apps canvas app with a simple search interface and ship it in a day with fewer moving parts and a much more predictable failure surface.

    The Agent Complexity Problem

    There is also a complexity ceiling that teams hit faster than expected. Copilot Studio agents work well when the conversation scope is tight. One domain. A few topics. Defined intents. When someone tries to build a single agent that handles HR queries, IT requests, and finance approvals inside the same session, topic routing starts failing at the edges. I wrote about this in Your Copilot Studio Agent Passed Every Test and Still Failed in Production. When a user’s phrasing sits between two topics, the agent picks one confidently and gets it wrong. The more topics you add, the more edge cases you create, and the harder they are to test systematically.

    The instinct to build one agent that does everything is understandable. It feels cleaner. In practice it produces an agent that does everything poorly and fails in ways that are genuinely difficult to diagnose.

    Where the Wrong Choice Usually Starts

    It usually starts with the framing of the requirement. Someone says “we want a chatbot” and that phrase triggers Copilot Studio before anyone has defined what the interaction actually needs to do. I have seen teams spend weeks building agent topics, writing generative AI prompts, and wiring up Power Automate actions, when what the users actually wanted was a better SharePoint search and a weekly digest email.

    The honest question to ask before opening Copilot Studio is this: does this use case genuinely require back-and-forth conversation, or does it just need to surface information or move data? If the answer is the second one, there is almost always a simpler path.

This is not a knock on Copilot Studio. The tool is genuinely capable when it fits the problem. Handling multi-turn conversations, routing across complex intent patterns, integrating generative answers with structured actions: those are things it does well. But that capability comes with a real operational cost. There is a topic structure to maintain, system prompts that drift when production data introduces edge cases, Power Automate actions that can fail silently inside a topic and return a confident-sounding response for work that was never done.

    What to Reach for Instead

    Power Apps for anything with a fixed interaction model. Canvas apps are underrated for internal tooling. They give you a defined UI, predictable state, and a clear place to debug when something breaks.

    Power Automate for anything triggered, scheduled, or event-driven. If there is no user in the loop having a conversation, there is no reason for Copilot Studio to be involved. Keep in mind that even straightforward flows can run into issues at scale, as Power Automate throttling limits will break your flow in production under real load if you have not accounted for them.

    SharePoint or Dataverse with a search interface for knowledge retrieval. If users are looking something up, build a search experience, not a conversational one.

    In enterprise environments, the governance overhead of Copilot Studio also matters. You are managing an agent that generates natural language responses. That response quality needs to be reviewed, monitored, and occasionally corrected. Most teams I talk to underestimate this cost until they are three months into production and someone in legal asks why the agent said something it should not have.

    The Right Question Before You Build

    Before any Copilot Studio project starts, the question worth asking is not “how do we build this agent” but “does this use case actually need an agent.” If the answer requires you to stretch the definition of conversation to make it fit, that is a sign to stop and pick the simpler tool.

    Copilot Studio is a good tool. It is not a default. Using it where it fits produces something worth building. Using it where it does not produces something you will be maintaining and explaining for a long time.

    Frequently Asked Questions

    When should I use Copilot Studio instead of another tool?

    Copilot Studio works best when the interaction is genuinely conversational, scoped to a single domain, and involves a defined set of intents. If the task is transactional, point-in-time, or better served by a simple form or search interface, tools like Power Automate or Power Apps are likely a faster and more maintainable choice.

    What is the difference between Copilot Studio and Power Automate?

    Power Automate is built for workflow and process automation, such as form submissions, approvals, and scheduled triggers. Copilot Studio is designed for conversational agent experiences. Using Copilot Studio for single-action tasks adds unnecessary complexity without improving the user experience.

    Why does my Copilot Studio agent keep routing users to the wrong topic?

    Topic routing breaks down when an agent is built to handle too many domains or intents within a single session. When a user’s phrasing falls between two topics, the agent will confidently pick one and get it wrong. Keeping each agent focused on a narrow scope reduces these edge cases and makes failures easier to diagnose.

    How do I know if my use case actually needs a chatbot?

    Start by defining what the interaction needs to do before choosing a tool. If users need a back-and-forth conversation to complete a task, a conversational agent may be appropriate. If they need a search result, a status update, or a simple action, a canvas app or improved search interface will often deliver a better outcome in less time.

  • Low-Code Platform Comparisons Miss the Point for Enterprise Power Platform Teams

    Low-Code Platform Comparisons Miss the Point for Enterprise Power Platform Teams

    I came across a post from Zapier Blog ranking the best low-code automation platforms, and it reminded me of a conversation I keep having with stakeholders. Someone reads a roundup, sends it over, and asks why we are not using one of the other tools on the list. The question sounds reasonable. The comparison is not. For teams doing power platform for enterprise automation, these lists are almost always built around the wrong frame entirely.

    Why Platform Comparison Lists Are Built for Buyers Who Do Not Exist in Enterprise

    Roundups like this are useful for one type of reader: someone at a small company, starting from scratch, with no existing infrastructure, who needs to pick a tool this week. That reader exists. Most people building automation inside a large organisation are not that reader.

    Enterprise teams are not choosing between platforms in a vacuum. They are operating inside a tenant. They have an existing Microsoft 365 agreement. They have an IT security function that has already decided what can touch production data. They have a DLP policy, or they are about to have one. The question is never which platform wins a feature comparison. The question is what is already inside the perimeter and how far can it go.

    When the starting point is a Microsoft 365 E3 or E5 agreement, Power Platform is not an option on a menu. It is largely already there. The conversation is about how deeply to use it, not whether to adopt it at all.

    What These Roundups Get Wrong About How Power Platform Actually Works at Scale

    The comparisons that show up in these lists treat features as equivalent when they are not. They will note that Power Automate supports HTTP connectors, and so does Zapier, so check. They will note that both have flow triggers and conditional logic. Check and check.

    What they do not cover is how governance works when you have hundreds of flows built by dozens of makers across multiple environments. Power Platform has environment-level DLP policies that enforce which connectors can interact with which data classifications. You can block a connector tenant-wide from the admin centre. You can require solution-aware flows before anything goes near a production environment. None of that is a feature you evaluate in a roundup. It is architecture you depend on when something goes wrong at 2am and you need to know exactly what touched what.
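The grouping behaviour those DLP policies enforce can be sketched conceptually. This is a minimal model, not the platform's implementation: connectors are classified as Business, Non-Business, or Blocked, and a single flow may not mix connectors across the first two groups. The connector names and policy contents here are illustrative.

```python
# Conceptual sketch of Power Platform DLP connector grouping: connectors
# are classified as Business, Non-Business, or Blocked, and one flow may
# not mix Business with Non-Business connectors. Policy contents are
# illustrative, not real tenant data.

BUSINESS = {"SharePoint", "Dataverse", "Office 365 Outlook"}
NON_BUSINESS = {"Twitter", "Dropbox"}
BLOCKED = {"UnsanctionedHttp"}

def dlp_violations(flow_connectors):
    """Return a list of human-readable DLP problems for a flow."""
    problems = []
    used_business = [c for c in flow_connectors if c in BUSINESS]
    used_non_business = [c for c in flow_connectors if c in NON_BUSINESS]
    for c in flow_connectors:
        if c in BLOCKED:
            problems.append(f"{c} is blocked tenant-wide")
    if used_business and used_non_business:
        problems.append(
            f"flow mixes Business ({used_business}) with "
            f"Non-Business ({used_non_business}) connectors"
        )
    return problems

# SharePoint plus Dataverse coexist; adding Dropbox breaks the policy.
assert dlp_violations(["SharePoint", "Dataverse"]) == []
assert dlp_violations(["SharePoint", "Dropbox"]) != []
```

The point of the model is the one the roundups miss: the policy is evaluated per environment, against every flow, by the platform itself, rather than being left to individual makers.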

    Connector-level governance also ties directly into Entra ID. Service principal authentication, conditional access policies, managed identities for flows that call Azure resources. These are not nice-to-haves. They are what your security team will ask about before any automation touches HR data or finance systems. A platform comparison that does not address this is not comparing the same thing your enterprise is actually buying.

    The Governance and Tenant Boundary Argument Nobody in These Lists Makes

    The argument that actually matters for enterprise teams is about the boundary. Everything inside your Microsoft tenant shares an identity layer, a licensing model, an audit log, and a set of compliance controls. Power Platform lives inside that boundary by design. When a Power Automate flow calls Dataverse, or a Copilot Studio agent hands off to an AI Builder model, or a Power App writes back to SharePoint, none of that crosses a boundary. It is all inside the same governance envelope.

    When you bring in a third-party automation tool, you immediately introduce a boundary crossing. Data leaves the tenant. Authentication has to be managed separately. Your audit trail splits. Your DLP logic does not follow. That is not an argument against ever using other tools. But it is the argument that platform comparison lists never make, because they are not written for people managing compliance obligations across a 10,000-person organisation.

    I have written before about how throttling in Power Automate has two distinct layers, platform-level and connector-level, and understanding which one you are hitting matters. The same principle applies here. There are two distinct layers to platform selection: what the tool can do, and what the tool is allowed to do inside your security perimeter. Most comparison articles only address the first layer.

    How to Respond When a Stakeholder Sends You One of These Articles

    This happens. Someone senior reads a roundup, sees that another tool scored well on ease of use or pricing, and asks a reasonable question. Here is how I handle it.

    First, do not get defensive about Power Platform. That reads as tribal and closes the conversation. Instead, reframe the question. The roundup is answering “which tool is easiest to try”. The enterprise question is “which tool can we govern, audit, and scale without introducing a new identity boundary or violating our data residency requirements”.

    Second, be specific about what already exists. If you have 200 flows in production, connectors pre-approved by security, an admin centre your IT team actually monitors, and makers who already know the platform, the switching cost is not zero. It is very large. That context belongs in the conversation.

    Third, acknowledge what the other tools do well. Zapier is genuinely easier to set up for a simple two-step integration. Make has a visual canvas that some people find clearer than Power Automate’s. Agreeing on the narrow case where another tool wins builds credibility for the broader argument about why it does not win at enterprise scale. The same logic applies when teams start layering AI into their automations: as I explored in Agentic Workflows Are Not Just Fancy Automation, adding an AI layer does not transform a poorly governed process into a reliable one, regardless of which platform you are on.

    The roundup is not wrong. It is just answering a different question. Once you say that clearly, the conversation usually moves to something more useful than defending a platform choice that was effectively made the day the Microsoft agreement was signed.

    Frequently Asked Questions

    Why should enterprises use Power Platform for enterprise automation instead of other low-code tools?

    For most large organisations, Power Platform is already included in their Microsoft 365 agreement, so the decision is less about choosing a tool and more about how deeply to use one that is already available. It also integrates directly with existing Microsoft security infrastructure, including Entra ID, conditional access policies, and tenant-level governance controls that other platforms simply cannot replicate in that environment.

    How do I govern Power Automate flows across a large organisation?

    Power Platform allows admins to apply environment-level DLP policies that control which connectors can access which types of data, and connectors can be blocked tenant-wide from the admin centre. Requiring solution-aware flows before anything reaches a production environment adds another layer of control, giving teams a clear audit trail when something needs investigating.

    What is a DLP policy in Power Platform and why does it matter for enterprise teams?

    A DLP (Data Loss Prevention) policy in Power Platform defines which connectors can interact with business or sensitive data within a given environment. For enterprise teams handling HR or finance data, these policies are a security requirement rather than an optional feature, and they are enforced at the tenant level rather than left to individual flow builders.

    When should I question a low-code platform comparison for enterprise use?

    Most platform comparison lists are designed for small teams starting from scratch with no existing infrastructure, which is a very different situation from a large organisation with an established Microsoft 365 tenancy and security requirements already in place. If a comparison does not address governance at scale, service principal authentication, or tenant boundary controls, it is not evaluating the same things your enterprise actually needs.

    This post was inspired by The 7 best low-code automation platforms in 2026 via Zapier Blog.

  • Your Copilot Studio Agent Passed Every Test and Still Failed in Production

    Your Copilot Studio Agent Passed Every Test and Still Failed in Production

    I came across a post from Zapier Blog about AI agent evaluation, and it described something I keep seeing inside large organisations: an agent that looks perfect in a demo, gets signed off, goes live, and then immediately starts doing things nobody expected. Wrong tool calls. Conversation loops that never resolve. Outputs that look confident and are completely wrong. The post frames this well as a sandbox problem. But the fix it describes, better test coverage and smarter metrics, only gets you partway there. The deeper issue with Copilot Studio agent testing is not the quantity of your tests. It is what you are actually testing for.

    Why Demo-Passing Agents Break in Real Workflows

    When a team builds an agent in Copilot Studio, they test it against the happy path. A user asks a clean question. The agent triggers the right topic or action. The response looks good. Someone in the review meeting says it works great. The agent gets promoted to production.

    The problem is that real users do not ask clean questions. They ask incomplete ones. They switch intent halfway through a conversation. They paste in text that includes formatting your prompt never anticipated. They use your agent for things it was never designed to do, because nothing in the interface tells them not to.

    None of that shows up in a demo. It shows up three days after go-live when someone forwards you a conversation log that reads like a stress test you forgot to run.

    The Three Failure Modes I Keep Seeing in Copilot Studio Agents

Having built and reviewed a number of agents internally, I find the failures cluster into three patterns.

    Topic misrouting at the edges. Your agent routes correctly when the user says exactly what you expected. But natural language is messy. When a user’s phrasing sits between two topics, the agent picks one confidently and gets it wrong. You only discover this when someone captures a failed session and traces it back. By then, a dozen other users have hit the same wall and just stopped using the agent.

    Action failures that degrade silently. A Power Automate flow or a connector action fails in the background and the agent carries on as if nothing happened. No error surfaced. No fallback triggered. The user gets a response that implies the task completed. It did not. This is the agent equivalent of a flow that retries quietly and masks the problem until the load goes up. I wrote about that pattern in the context of Power Automate throttling limits breaking flows under real load. The same logic applies here: silent success is not success.

    Prompt instruction drift under real data. Your system prompt was written against clean test data. Production data is not clean. It has unexpected characters, long strings, mixed languages, or values that push the model toward an interpretation you did not intend. The agent’s behaviour drifts. Not catastrophically. Just enough to become unreliable in ways that are hard to reproduce and harder to explain to stakeholders.
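The second failure mode, silent action failure, can be guarded against at the call boundary. A minimal sketch, with hypothetical names: wrap whatever invokes the backing flow so a failure surfaces as an error the agent must handle, instead of letting the conversation imply the task completed.

```python
# Sketch of the "silent success is not success" point: wrap the action
# call so a failed flow raises, forcing the agent into its fallback path
# rather than replying as though the request went through. call_flow and
# the response shape are hypothetical stand-ins.

class ActionFailed(Exception):
    pass

def run_action(call_flow, payload):
    """Invoke a backing flow and refuse to treat failure as success."""
    result = call_flow(payload)
    if not result.get("ok"):
        # Raising here is what lets the fallback topic fire.
        raise ActionFailed(result.get("error", "unknown failure"))
    return result["body"]

def flaky_flow(payload):
    # Simulates a connector action failing quietly in the background.
    return {"ok": False, "error": "HTTP 502 from downstream"}

try:
    run_action(flaky_flow, {"ticket": 123})
    outcome = "silent success"
except ActionFailed as exc:
    outcome = f"fallback triggered: {exc}"
```

Without the explicit check, the failed call would return control to the agent and the user would get a confident response about a task that never happened.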

How to Build a Behavioural Test Suite Instead of an Output Checklist

    Most teams build an output checklist. Did the agent return the right answer for these ten questions? That tells you almost nothing about production behaviour.

What you actually need is a behavioural test suite. The difference is this: output testing checks what the agent said. Behavioural testing checks how the agent handled the situation.

    Here is how I approach it inside Copilot Studio before promoting anything to production.

    Build adversarial input sets, not just representative ones. For every topic your agent handles, write three versions of the trigger: the clean version, an ambiguous version that could belong to two topics, and a broken version with incomplete or oddly formatted input. If the agent routes all three correctly, you have something worth shipping. If it fails on the ambiguous case, you have a routing gap that will hit real users constantly.
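The clean/ambiguous/broken triple can be run as an offline check against a routing function. A sketch, where `route()` and the example utterances are hypothetical stand-ins for your agent's topic routing:

```python
# Adversarial trigger sets: for each topic, a clean, an ambiguous, and a
# broken phrasing, evaluated against a routing function. The router and
# utterances here are illustrative stand-ins.

TRIGGER_SETS = {
    "reset_password": {
        "clean": "I need to reset my password",
        "ambiguous": "I can't get into my account",  # could also be an MFA topic
        "broken": "pasword reset pls!!",             # typo, messy formatting
    },
}

def routing_failures(route, trigger_sets):
    """Return the (topic, variant) pairs the router got wrong."""
    failures = []
    for topic, variants in trigger_sets.items():
        for variant, utterance in variants.items():
            if route(utterance) != topic:
                failures.append((topic, variant))
    return failures

def toy_route(text):
    # A naive keyword router: passes the clean case, misses the rest.
    return "reset_password" if "password" in text.lower() else "unknown"
```

Running `routing_failures(toy_route, TRIGGER_SETS)` flags the ambiguous and broken variants, which is exactly the routing gap a clean-only demo never shows.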

    Test conversation state, not just single turns. Copilot Studio agents hold context across a conversation. Test what happens when a user changes their mind on turn three. Test what happens when they ask a follow-up that assumes context the agent should have retained but might not. Single-turn testing misses an entire class of failure that only appears in multi-turn sessions. This is also why agentic workflows require a fundamentally different design approach, not just an AI layer placed on top of existing processes.
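A multi-turn check can be scripted the same way: replay a fixed session and assert what intent the agent is tracking after each turn, including the turn where the user changes their mind. The agent here is a toy model, not the Copilot Studio runtime.

```python
# Sketch of a multi-turn behavioural check: replay a scripted session and
# record the tracked intent per turn. The agent is a toy stand-in.

class ToyAgent:
    def __init__(self):
        self.current_intent = None

    def handle(self, text):
        if "cancel" in text:
            self.current_intent = "cancel_order"
        elif "order" in text:
            self.current_intent = "order_status"
        # Otherwise retain the previous intent: context carries over.

def intent_trace(agent, turns):
    """Feed turns in order and collect the agent's intent after each one."""
    trace = []
    for user_text in turns:
        agent.handle(user_text)
        trace.append(agent.current_intent)
    return trace

turns = [
    "where is my order",        # turn 1: order status
    "it was placed on Monday",  # turn 2: context must be retained
    "actually, cancel it",      # turn 3: the user changes their mind
]
```

The assertion worth writing is that the trace reads order_status, order_status, cancel_order. Single-turn testing can never catch the turn-two retention or the turn-three switch.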

    Inject real data samples into action inputs. Pull a sample of actual data from your environment and run it through the actions your agent calls. Do not use synthetic test data if you can avoid it. Real data has edge cases your synthetic data will never cover. If your agent calls a flow that queries a SharePoint list, run the query against the actual list with actual entries, including the ones with blank fields and formatting you did not anticipate.

    Define explicit fallback behaviour and test it deliberately. Every agent should have a defined behaviour for when it cannot complete a task. Most teams add a fallback topic and assume it works. Test it by constructing inputs that should trigger it. If the fallback does not fire, or fires on the wrong inputs, fix it before go-live. A graceful failure is far better than a confident wrong answer.
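Testing the fallback deliberately means maintaining a list of inputs that should never reach a business topic and checking the router against it. A sketch, again with a hypothetical `route()`:

```python
# Sketch of deliberately testing the fallback topic: inputs that should
# never land on a business topic, checked against the router. route() is
# a hypothetical stand-in.

OUT_OF_SCOPE = [
    "write me a poem about printers",
    "what is the CEO's salary",
    "",  # empty input should never reach a business topic
]

def fallback_gaps(route, inputs):
    """Inputs that should have hit the fallback topic but did not."""
    return [text for text in inputs if route(text) != "fallback"]

def overconfident_route(text):
    # A router with no working fallback: everything maps to a business topic.
    return "reset_password"
```

An agent whose `fallback_gaps` list is non-empty before go-live is an agent that will give confident wrong answers after it.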

    What to Monitor After Go-Live and When to Pull an Agent Back

    Testing before launch is necessary but not sufficient. Agent behaviour shifts as the inputs it receives in production diverge from what you tested against. You need monitoring in place from day one.

    Track escalation rate and abandon rate per topic. If a topic is seeing significantly higher escalations than others, that is a signal of routing or response quality problems, not user error. Track action failure rates separately from conversation outcomes. An agent can complete a conversation and still have failed to do the thing the user needed.

Set a threshold before launch. If escalation rate exceeds a number you agree on in advance, or if a specific action is failing more than a defined percentage of the time, you pull the agent back or disable the affected topic. The exact threshold is arbitrary. Having no threshold at all is the real failure.
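The threshold check itself is simple enough to express directly. The limits and the stats shape below are illustrative choices, not platform values:

```python
# Sketch of the pre-agreed threshold check: escalation rate per topic and
# action failure rate, each against a limit fixed before launch. Limits
# and numbers are illustrative.

ESCALATION_LIMIT = 0.25
ACTION_FAILURE_LIMIT = 0.05

def topics_to_pull(stats):
    """Return topics breaching either pre-agreed limit."""
    flagged = []
    for topic, s in stats.items():
        esc_rate = s["escalations"] / s["sessions"] if s["sessions"] else 0.0
        fail_rate = s["action_failures"] / s["action_calls"] if s["action_calls"] else 0.0
        if esc_rate > ESCALATION_LIMIT or fail_rate > ACTION_FAILURE_LIMIT:
            flagged.append(topic)
    return flagged

stats = {
    "order_status": {"sessions": 200, "escalations": 12,
                     "action_failures": 1, "action_calls": 180},
    "refunds": {"sessions": 80, "escalations": 31,
                "action_failures": 9, "action_calls": 70},
}
```

With these numbers, refunds breaches both limits and gets pulled; order_status stays live. The value is not the arithmetic, it is that the decision was made before anyone had a stake in the answer.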

    The agents I have seen hold up in production are not the ones with the most sophisticated prompts. They are the ones where someone spent real time on the failure cases before launch and built actual monitoring into the plan from the start.

    If you are still signing off agents based on demo performance, you are not testing. You are hoping.

    Frequently Asked Questions

    Why does my Copilot Studio agent testing pass in demos but fail in production?

    Most Copilot Studio agent testing is built around ideal user inputs and predictable conversation paths, which do not reflect how real users actually behave. In production, users ask incomplete questions, switch intent mid-conversation, and use the agent in unintended ways that no demo ever surfaces. Testing needs to go beyond the happy path to catch these edge cases before go-live.

    What are the most common failure modes in Copilot Studio agents?

    The three patterns that appear most often are topic misrouting when user phrasing falls between two intents, action failures that complete silently without triggering any error or fallback, and prompt instructions that break down when they encounter messy real-world data. Each of these can go undetected in testing because they only emerge under realistic conditions.

    How do I know if a Power Automate action failed inside my Copilot Studio agent?

    Silent action failures are a serious risk because the agent can continue the conversation and imply a task completed when it did not. You need explicit error handling and fallback logic in your flows so that failures surface to the user rather than being masked by a confident-sounding response.

    When should I test my Copilot Studio agent against real production data?

    You should test against realistic data before promotion to production, not after. System prompts written against clean test data can behave unpredictably when they encounter unexpected characters, mixed languages, or long strings that only appear in live environments. Incorporating a sample of real or representative data into your test suite is a necessary step before sign-off.

    This post was inspired by AI agent evaluation: How to test and improve your AI agents via Zapier Blog.

  • Power Automate Throttling Limits Will Break Your Flow in Production

    Power Automate Throttling Limits Will Break Your Flow in Production

    If you have ever had a Power Automate flow run perfectly in testing and then start failing two weeks after go-live, Power Automate throttling limits are a likely culprit. Not a bug in your logic. Not a connector issue. Just the platform telling you that you asked for too much, too fast.

    This post is not about what throttling is in theory. It is about what it looks like when it hits you, and what you can actually do about it.

    What Power Automate Throttling Actually Looks Like

    Throttling in Power Automate surfaces as HTTP 429 errors. You will see them in your flow run history as failed actions, usually on connector calls. SharePoint, Dataverse, and HTTP actions are the most common places I see them show up.

    The problem is that most people do not notice at first. The flow has retry logic built in by default, so it quietly retries and sometimes succeeds. Then load increases. Retries stack up. Runs queue behind each other. Eventually things time out or fail hard, and by then you have a real incident on your hands.

    I ran into this building a document processing flow internally. Under testing with twenty files it was fine. Under real load with several hundred files triggered in a short window, the SharePoint connector started returning 429s, retries piled up, and runs that should take seconds were taking minutes or failing entirely.

    Understanding the Two Layers of Throttling

There are two distinct layers, and conflating them leads to bad fixes.

The first is platform-level throttling. Power Automate itself limits how many actions a flow can execute per minute and per day depending on your licence tier. Higher-tier plans and add-ons, such as the attended RPA add-on, raise those limits. If you are running high-volume flows on a standard per-user licence, you will hit these limits faster than you expect.

    The second is connector-level throttling. This is imposed by the service on the other end, not by Power Automate. SharePoint has API call limits per user per minute. Dataverse has its own service protection limits. An external API you are calling over HTTP has whatever limits its vendor decided on. Power Automate has no control over these, and the retry behaviour it adds does not always help if you are genuinely over the limit.

    Most tutorials only mention one of these. Then your flow breaks in prod and you spend an afternoon figuring out which layer you actually hit.

    How to Handle Power Automate Throttling Limits

    There is no single fix. The right approach depends on which layer is throttling you and why.

    Slow down intentional bulk operations. If your flow is processing items in a loop, add a Delay action inside the loop. Even a one or two second delay dramatically reduces API pressure. It feels wrong to add artificial waits, but it is far better than random failures.
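The shape of that fix, sketched in Python, where `process_item` stands in for whatever action the loop performs and `time.sleep` plays the role of the Delay action:

```python
import time

# Sketch of the Delay-in-the-loop advice: pacing sequential connector
# calls. process_item is a stand-in for the action inside the loop.

def process_all(items, process_item, delay_seconds=1.0):
    """Process items one at a time with a pause between calls."""
    results = []
    for item in items:
        results.append(process_item(item))
        time.sleep(delay_seconds)  # deliberate pacing to stay under API limits
    return results
```

The run takes longer by design; that is the trade you are making against 429s.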

Reduce concurrency. When concurrency control is enabled, Apply to Each defaults to a degree of parallelism of 20, with a maximum of 50. Dropping this to 1 or 5 is often enough to stop triggering connector-level throttling. Yes, your flow will run slower. That is usually acceptable. Failed runs are not.
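The same trade-off is easy to see in code: `max_workers` below plays the role of the degree of parallelism, and lowering it cuts the instantaneous request rate against the connector at the cost of wall-clock time.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the concurrency trade-off: max_workers is the analogue of
# Apply to Each's degree of parallelism. process_item is a stand-in for
# the connector action.

def process_concurrently(items, process_item, max_workers=5):
    """Process items with a bounded number of parallel workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(process_item, items))
```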

    Batch instead of looping. SharePoint and Dataverse both support batch operations. If you are creating or updating records one at a time in a loop, look at whether you can batch those calls. Fewer requests means less throttling exposure.
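The batching arithmetic is worth making concrete. In this sketch, `send_batch` stands in for a SharePoint or Dataverse batch call; the only thing being modelled is how the request count falls:

```python
# Sketch of the batching payoff: grouping single-record calls into chunks
# so one request carries many operations. send_batch is a stand-in for a
# SharePoint or Dataverse batch call.

def chunked(items, batch_size):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def create_records(items, send_batch, batch_size=100):
    """Send items in batches; return how many requests were made."""
    requests_made = 0
    for batch in chunked(items, batch_size):
        send_batch(batch)
        requests_made += 1
    return requests_made
```

250 records become 3 requests instead of 250, which is the whole point: fewer requests means less throttling exposure.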

Check your licence tier against your actual volume. This one people skip. If you are running flows that process thousands of actions per day, look at your licence entitlements honestly. The Power Automate Process licence exists for high-volume scenarios. Using a per-user licence for something that genuinely needs a process licence is not a workaround; it is a problem waiting to happen.

    Do not rely on default retry logic as a strategy. The built-in retry handles transient blips. It is not designed to absorb sustained throttling. If your flow needs retries to survive normal operating conditions, that is a signal to fix the root cause, not tune the retry settings.
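Why retries cannot absorb sustained throttling is clearest in a sketch: a bounded retry loop that honours a Retry-After hint rides out a transient 429, but sustained throttling exhausts the attempts and must surface as a real failure. Here `call()` is a stand-in returning a status code and a retry-after delay in seconds.

```python
import time

# Sketch of bounded retry with backoff: absorbs a transient 429, but
# sustained throttling runs out of attempts and fails loudly. call() is
# a stand-in returning (status_code, retry_after_seconds).

def call_with_backoff(call, max_attempts=4):
    for attempt in range(1, max_attempts + 1):
        status, retry_after = call()
        if status != 429:
            return status
        if attempt == max_attempts:
            break
        # Prefer the service's Retry-After hint; fall back to exponential.
        time.sleep(retry_after if retry_after else 2 ** attempt)
    raise RuntimeError("still throttled after retries; fix the request rate")
```

If this loop raises under your normal operating load, the fix is the request rate, not a higher `max_attempts`.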

    The Monitoring Gap

    Most teams I talk to have no visibility into throttling until something breaks. Flow run history shows failures, but it does not surface throttling patterns clearly. Setting up alerts on failed runs is table stakes. What is less common is tracking run duration over time. A flow that starts taking twice as long to complete is often being quietly throttled before it starts failing outright.
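Tracking duration drift does not need much machinery. A minimal sketch: compare a recent window's median run duration to a baseline and flag when it crosses a factor you choose. The 2x factor here is an illustrative choice, not a platform value.

```python
from statistics import median

# Sketch of run-duration tracking: flag quiet throttling when the recent
# median duration has at least doubled against a baseline window. The
# factor and sample durations are illustrative.

def duration_alert(baseline_secs, recent_secs, factor=2.0):
    """True when the recent median duration reaches factor x the baseline."""
    return median(recent_secs) >= factor * median(baseline_secs)
```

A flow whose baseline median was 11 seconds and whose recent runs median 28 seconds trips the alert long before anything shows up as a failed run.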

    Azure Monitor and the Power Platform admin centre both give you data here. Use them before users start sending messages asking why the automation is slow.

    The Bottom Line

    Power Automate throttling limits are not a corner case. They are something you will hit if your flows handle real enterprise volume. The fix is almost never a single setting. It is a combination of slowing down bulk operations, reducing concurrency, batching where possible, and being honest about whether your licence matches your workload. If you are also thinking about how automation fits into larger orchestration patterns, agentic workflows are not just fancy automation and require a fundamentally different design approach from the start.

    Test under realistic load before go-live. Not twenty items. The actual volume you expect in week three after rollout.

    Frequently Asked Questions

    What are Power Automate throttling limits and why do they cause flows to fail?

    Power Automate throttling limits are restrictions on how many actions or API calls your flow can make within a given time window. There are two layers: platform-level limits set by Microsoft based on your licence tier, and connector-level limits imposed by external services like SharePoint or Dataverse. When these limits are exceeded, you get HTTP 429 errors that can cause flows to fail or time out under real production load.

    Why does my Power Automate flow work in testing but fail in production?

    Testing typically uses a small number of records or files, which stays well within throttling thresholds. Once real users and data volumes are involved, API call rates increase and throttling kicks in. Built-in retry logic can mask the problem initially, but as load grows the retries stack up and flows start timing out or failing outright.

    How do I fix throttling errors in a Power Automate loop?

    Adding a Delay action inside your loop is one of the most effective ways to reduce API pressure during bulk operations. Even a one to two second pause between iterations can significantly cut the rate of connector calls and prevent 429 errors from accumulating.

    How do I know if my Power Automate flow is being throttled?

    Check your flow run history for failed actions showing HTTP 429 responses, which is the standard signal that a throttling limit has been hit. You may also notice runs taking much longer than expected, since the built-in retry logic can quietly delay execution before an eventual hard failure.