Why does my Power Platform Center of Excellence setup stop working after a few weeks?

The CoE Starter Kit relies on scheduled sync flows that call admin connectors on a recurring basis. If the service account loses its licence, hits throttling limits, or has a permission issue, those flows fail silently and your dashboards show stale data without any obvious warning.

What licences and permissions does the CoE Starter Kit service account need?

The service account requires either a Power Platform Administrator or Global Administrator role, plus a per-user Power Automate licence that covers premium connectors. Without the premium entitlement, the admin connector calls used by the sync flows will not run.

How do I know if my CoE sync flows have stopped running correctly?

The dashboards will not alert you automatically when sync flows fail, so you need to monitor flow run history directly. Comparing your app and environment counts against known tenant activity over time is a practical way to spot when the inventory has drifted from reality.

Why does the CoE Starter Kit struggle with throttling on large tenants?

The sync flows paginate through every environment and every app in a single run, which generates a high volume of connector calls in a short period. This makes them prone to both platform-level and connector-level throttling, so transient errors need to be handled with retries rather than treated as permanent failures.

What are the best Power Automate error handling patterns to use in production flows?

Effective Power Automate error handling patterns go beyond a basic Try and Catch scope. You should capture specific action-level failures using result('Try'), differentiate between transient and terminal errors, and include compensating actions to undo partial work when a flow fails midway.

How do I find out which action failed inside a Power Automate Try scope?

Use the result('Try') expression inside your Catch scope and filter for items where the status is not Succeeded. This returns the specific action name, status code, and error body, giving you meaningful diagnostic information instead of a generic failure message.

When should I retry a failed action in Power Automate versus escalate to a human?

Retry transient errors such as 408, 429, 500, and 503 using a Do Until loop with a delay and a maximum iteration count. Terminal errors like 400, 401, 403, and 404 indicate a problem with the request itself, so retrying will not help and the failure should be logged or escalated instead.

Why does Power Automate's built-in retry policy not work well for rate-limited connectors?

The default retry policy applies exponential backoff and then stops, but it does not let you inspect or act on the failure in a meaningful way. At low volumes this can go unnoticed, but as traffic increases the lack of intelligent handling can cause widespread failures that are difficult to diagnose.

Why does running Claude on Amazon Trainium matter for enterprise AI deployments?

The real benefit of Claude on Amazon Trainium for enterprise is not raw chip performance but improved supply, capacity, and price stability for inference workloads. Enterprises building production automations have historically struggled with throttling and unpredictable costs tied to GPU scarcity, and Trainium offers a credible path to more reliable, affordable access.

Why does my Power Automate flow keep getting slow or failing when using Claude?

The most common culprit is shared inference pool capacity, especially during peak hours, which causes latency spikes and occasional rate limit errors. Even with a retry policy in place, the delay is often long enough that users abandon the process before it completes.

How do I control Claude API costs in a high-volume Power Automate flow?

Treat your system prompt like code and review it regularly, because prompts tend to grow over time as edge cases and guardrails are added, multiplying your input token costs across every invocation. Moving context that is only relevant to certain topics into retrieval rather than including it in every call can significantly reduce per-run costs.

When should I start worrying about token cost drift in an AI automation?

Token cost drift can begin within weeks of deploying an agent as system prompts expand to handle new requirements and edge cases. The per-call cost often still looks small, so the problem tends to go unnoticed until the monthly total becomes difficult to justify.

Which is better for automation workflows, Claude or ChatGPT?

Choosing between Claude and ChatGPT for automation is less about which model performs better in general benchmarks and more about how each behaves inside a pipeline. The criteria that matter are structured output reliability, tool-calling accuracy, and how well the model holds instructions as context grows. Testing both models against your specific workflow conditions will tell you far more than any consumer comparison.

Why does my AI agent start producing errors after working fine at first?

This often happens because the model experiences instruction drift as context accumulates over time. Long flows gather user inputs, tool results, and retrieved documents, and some models struggle to maintain consistent behaviour at high token counts. Testing your model under realistic context lengths before going to production can help catch this early.

How do I choose an AI model for a Power Automate or Copilot Studio flow?

Focus on how the model handles structured outputs, selects the right tools when multiple connectors are available, and behaves when context is long rather than short. Consumer benchmarks test models in isolation, but real automation pipelines require consistent, predictable behaviour across repeated calls with varying inputs. Running your own tests against your actual schema and tools will give you more reliable answers.

What causes silent failures in AI automation workflows?

One of the most common causes is inconsistent structured output, where a model occasionally adds unexpected formatting, drops required fields, or wraps a response in a code block instead of returning clean JSON. These errors can pass through without triggering obvious alerts while still breaking downstream actions. Testing output stability across many varied inputs is one of the most important steps before deploying a model-powered flow.

What are Codex automations (scheduled AI runs) and how do they work?

Codex automations are scheduled AI runs: recurring AI tasks that fire on a set schedule, where the model performs reasoning rather than following a fixed, pre-wired set of actions. Unlike a traditional scheduled flow, the AI decides what to read, compare, summarise, or flag each time it runs. This makes them suited to tasks where the inputs vary but the goal stays the same.

How do I know when to use a scheduled AI run instead of a Power Automate flow?

If you can map out every branch and action of a task before it runs, a standard flow is the right tool. When the output requires interpretation, prioritisation, or judgement based on inputs that change each time, a scheduled AI run is a better fit. Tasks like triaging alerts, reviewing documents for meaningful changes, or summarising error logs fall into this second category.

Why does a scheduled Power Automate flow struggle with fuzzy or variable logic?

Power Automate flows are deterministic, meaning they follow the same paths every time regardless of context. When the logic requires understanding nuance or making a judgement call, the flow typically ends up forwarding everything to a human rather than completing the work itself. The flow becomes a delivery mechanism rather than a true automation.

When should I consider replacing a recurring manual review task with an AI automation?

If a task runs on a regular cadence, involves inputs that vary in structure, and produces an output that is a summary, ranking, or decision rather than a fixed result, it is a strong candidate for AI automation. Examples include weekly pull request reviews, overnight alert triage, and monthly document audits where a human currently does the interpretation work.

What are OpenAI Workspace Agents?

OpenAI Workspace Agents let users describe a repeatable task, connect tools, and have an agent run it on a schedule or on demand. They use a conversation-first approach, meaning you define what you want through chat rather than building a visual workflow diagram.

How do OpenAI Workspace Agents compare to Power Automate?

Both handle triggers, actions, and conditions to automate tasks, so they overlap significantly in what they can do. The key difference is how you build them: Power Automate starts from a visual flow diagram, while Workspace Agents start from a conversation with the model.

When should I use Power Automate instead of a conversational agent?

Power Automate is better suited when you need clear error handling, version control, and a fully inspectable execution path. Conversation-first tools like Workspace Agents work well when you want to define an outcome and let the model determine the steps.

Why does the rise of OpenAI Workspace Agents matter for enterprise automation builders?

As more people use conversational AI tools like ChatGPT in their personal lives, they will expect a similar experience in workplace tools. This raises the bar for how automation platforms present themselves, even if enterprise governance and compliance requirements still favour established platforms.

Why does agentic workflow latency get so bad in multi-step agents?

Each individual call in an agent loop may seem fast, but the overhead adds up across 8 to 15 round trips per user request. Connection setup, authentication, and token streaming costs stack on every single step, turning individually acceptable delays into a frustrating overall wait time.

What is connection-scoped caching and how does it help agent performance?

Connection-scoped caching keeps expensive context like system prompts, tool definitions, and prior conversation state ready across multiple calls instead of reprocessing it each time. This avoids redundant work on every step of an agent loop and significantly reduces the overhead that accumulates across a multi-turn interaction.

How do I reduce latency in Copilot Studio and Power Automate agents?

Start by auditing how many round trips your agent actually makes for a single user request, since this is where most hidden latency lives. Look for opportunities to batch tool calls, reduce unnecessary steps in your loop, and watch for platform-level improvements like persistent connections that reduce per-call overhead.

Why do users abandon AI agents even when the agent gives correct answers?

If the response takes too long, users lose confidence and revert to familiar alternatives before the agent finishes. Technical correctness does not matter if the experience feels slow enough to suggest something has gone wrong.

What is AI automation decision ownership and why does it matter?

AI automation decision ownership refers to who or what holds responsibility for making judgment calls inside a workflow. It matters because modern AI agents can now handle exceptions and reasoning tasks that previously required a human approver, fundamentally changing the role people play in automated processes.

How do I know if my automation is truly agentic or just a faster rule-based flow?

A good indicator is where decisions actually happen. If a human or a rigid rule set still handles every exception and the AI only executes predefined steps, the workflow is not truly agentic. An agentic setup puts the AI in the decision layer, with deterministic steps carrying out what it concludes.

Why does automating a bad process lead to worse outcomes?

Automation removes the friction that sometimes forces people to notice problems. When a flawed process runs faster and at higher volume, errors multiply at the same rate as the throughput, making the underlying issues harder to catch and more costly to fix.

When should I move a human out of the approval step in an automated workflow?

A human can move out of direct approvals when the decision follows a consistent pattern that can be expressed as clear policy constraints. The better use of human judgment at that point is writing and periodically reviewing those constraints, rather than approving individual cases one by one.

Is it still worth it to learn Power Automate in 2026?

Yes, learning Power Automate in 2026 remains valuable because it is the execution layer for almost everything that runs inside a Microsoft-stack enterprise. AI tools and agents can help you build flows faster, but you still need to understand how flows work to debug failures, handle errors, and get workflows into production reliably.

What is the difference between Copilot Studio and Power Automate?

Copilot Studio handles the reasoning and decision-making layer, while Power Automate is the execution layer that actually interacts with systems like SharePoint, Dataverse, and external APIs. An agent can trigger a flow, but the flow is what performs the real work and touches the system of record.

Why does my Power Automate flow work in testing but fail in production?

Common causes include missing retry logic, shared credentials hitting rate limits under load, or steps that silently fail and pass null values forward. These issues are easy to miss during demos but become serious problems at scale, which is why understanding flow structure matters beyond just getting something running.

When should I learn Power Automate instead of relying on AI-generated flows?

You should invest in learning Power Automate when you are responsible for workflows that need to be reliable in production, not just functional in a demo. If you cannot read a run history or diagnose a throttling error, you will struggle to fix failures that AI tooling alone cannot explain.

Tag: Power Automate

Inside a Power Platform Center of Excellence: Why Most Setups Stall in Month Three

Most people think a Power Platform Center of Excellence setup works like installing a product. You import the CoE Starter Kit solution, run the setup wizard, point it at your tenant, and the dashboards fill up. Job done.

That is the surface behaviour. The actual mechanism underneath is a chain of dependencies, sync jobs, and admin connector calls that quietly degrade if any one link breaks. I keep seeing teams hit this on LinkedIn and in conversations with people at other organisations. The kit looks healthy for six weeks, then the inventory stops matching reality and nobody knows why.

Let me walk through what is actually happening underneath.

What you see on the surface

You install the CoE Starter Kit, the wizard provisions a Dataverse environment, and a set of cloud flows starts populating tables like Environments, Apps, Flows, and Makers. The Power BI dashboard lights up. You see a maker count, an app count, an orphaned resource list.

From the outside, it looks like the kit is scanning your tenant. It is not scanning anything in real time. Every number you see is the result of scheduled flows that ran sometime in the last 24 hours, hit admin connectors, paginated through results, and wrote rows into Dataverse. The dashboard is just a read on that table.

This matters because the moment those flows stop succeeding, your dashboard stops being true. And it does not tell you it stopped being true.

The underlying mechanism

The CoE kit runs on a stack of sync flows. The most important ones are Admin Sync Template v3 (environments), Admin Sync Template v4 (apps and flows), and the maker activity flows. Each one authenticates as the service account you set up during install and calls the Power Platform for Admins, Power Apps for Admins, and Power Automate Management connectors.

Three things have to be true for those flows to keep working. The service account needs an active Power Platform Administrator or Global Administrator role. The account needs a per-user Power Automate licence with the right premium entitlements, because the admin connectors are premium. And the account needs to not be hitting throttling limits while paginating through a tenant with thousands of resources.

The CoE sync flows are exactly the kind of workload that hits both platform-level and connector-level throttling, because they loop through every environment and every app in the tenant in one run. Getting your Power Automate error handling patterns right matters here, because transient throttling errors need to be caught and retried differently from terminal failures, or the sync silently drops data.

Where it breaks

The most common failure mode is not the install. It is month three.

The service account password expires, or MFA gets enforced tenant-wide, or someone removes the admin role because of a security review. The flows start failing silently. Default retry logic masks it for a week or two. Then the runs hit timeout and stop entirely. The dashboard freezes on stale data, but the numbers still look plausible, so nobody notices.

The second failure mode is scale. The kit was designed for small to medium tenants. If you have 40,000 apps and 80,000 flows across hundreds of environments, the sync flows do not finish inside the 30-day Dataverse retention window for run history. You lose visibility into your own automation.

The third one is the licensing trap. Teams install the kit on a trial, then move to production without giving the service account a proper premium licence. The flows technically run, but premium connectors throw 403s on specific calls, and only some tables populate. Half the dashboard works. The other half lies.

What this means for how you build it

Treat the CoE as a product you operate, not a kit you install. That changes a few decisions.

Use a dedicated service principal with certificate auth where the connectors support it, instead of a user account with a password. The service principal has no password to expire, is not subject to MFA, and does not get caught in a leaver process. Its certificate does still expire, so that rotation needs an owner too. Where you must use a user account, document it, monitor it, and put the password rotation in a runbook owned by a real team.

Build a health check flow that runs daily and alerts when the last successful sync timestamp on each core table is older than 48 hours. Do not trust the dashboard to tell you the dashboard is broken.
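
The staleness check itself is small. Here is a minimal sketch of the logic in Python rather than as a flow, assuming you can read a last successful sync timestamp per table. The table names, the `find_stale_tables` helper, and the sample timestamps are hypothetical and only there for illustration.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the 48-hour rule; adjust to your own sync cadence.
STALE_AFTER = timedelta(hours=48)


def find_stale_tables(last_sync, now, stale_after=STALE_AFTER):
    """Return the tables whose last successful sync is older than the threshold.

    last_sync maps a table name to the timestamp of its last successful
    sync, or None if the table has never synced.
    """
    stale = []
    for table, synced_at in last_sync.items():
        if synced_at is None or now - synced_at > stale_after:
            stale.append(table)
    return sorted(stale)


# Hypothetical sample data: one healthy table, one stale, one never synced.
now = datetime(2026, 5, 3, 9, 0, tzinfo=timezone.utc)
last_sync = {
    "Environments": now - timedelta(hours=6),
    "Apps": now - timedelta(hours=72),
    "Flows": None,
}
print(find_stale_tables(last_sync, now))  # ['Apps', 'Flows']
```

A table that has never synced is treated as stale on purpose. A missing timestamp is exactly the case the dashboard will not warn you about.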

For larger tenants, split the sync flows by environment group instead of running them tenant-wide. The kit supports filtering, and partial visibility refreshed daily beats full visibility refreshed never.

Decide what governance question the CoE is actually answering for you before you build dashboards on top of it. Inventory is not governance. A list of 12,000 apps with no owner attached is just a longer problem. The broader challenge of Power Platform governance that does not kill adoption is worth thinking through before you design your DLP and ownership policies around what the CoE surfaces, because the data is only useful if makers trust the system enough to stay inside it.

The CoE Starter Kit is genuinely good engineering. It just is not magic. If you are starting to build out more automation on top of your tenant inventory, the question of why Power Automate is still worth learning in 2026 is a good framing for where to focus the team’s time once the CoE is stable. If you want to compare notes on how other teams are running theirs, I am always up for that conversation.

May 3, 2026

RPA vs AI Automation for Enterprise Workflows

The decision I keep watching teams get wrong: should this workflow be built with RPA or with an AI agent? The RPA vs AI automation debate gets framed as old tech versus new tech, which is the wrong frame entirely. They solve different problems. Picking the wrong one is how you end up with a fragile bot that needs babysitting or an agent that hallucinates its way through invoice approvals.

I have built both inside a large org. Here is how I actually decide.

Determinism and predictability

RPA assumes the screen, the field, and the click path are the same every time. If the SAP transaction code is VA01 today and VA01 tomorrow, RPA wins. It will execute that path 10,000 times with zero variance.

AI automation assumes variance is the input. The email phrasing changes, the PDF layout changes, the customer asks the same thing five different ways. An agent reasons over that variance. It is non-deterministic by design, which is a feature for unstructured input and a liability for structured execution.

Rule of thumb I use: if I can write the decision tree on a whiteboard in 15 minutes, it is RPA work. If the decision tree has more than 30 branches and half of them are “it depends on the wording,” it is agent work.

Cost per execution

Dimension	RPA (Power Automate Desktop)	AI Agent (Copilot Studio)
Per-run cost	Near zero after license	Roughly 1 message credit per turn, often 5 to 15 turns per task
License model	Per-bot or per-user attended/unattended	Message packs, 25,000 messages per pack
Scaling cost	Linear with bot count	Linear with conversation volume and tool calls
Failure cost	Bot stops, you fix it	Agent confidently completes the wrong task

RPA at 100,000 runs a month is basically free compute after the license. An agent at 100,000 runs is not. I have seen teams underestimate this by an order of magnitude because they tested with 50 runs and extrapolated linearly without counting tool calls and orchestration turns.
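
To make that concrete, here is a rough back-of-envelope sketch in Python using only the figures from the table: 25,000 messages per pack and 5 to 15 turns per task. It assumes one message per turn and ignores any extra consumption from tool calls, so treat the result as a floor rather than a quote. The `estimate_message_packs` helper is hypothetical.

```python
import math

MESSAGES_PER_PACK = 25_000  # pack size quoted in the table above


def estimate_message_packs(runs_per_month, turns_per_task):
    """Estimate monthly messages and packs, assuming one message per turn."""
    messages = runs_per_month * turns_per_task
    packs = math.ceil(messages / MESSAGES_PER_PACK)
    return messages, packs


for turns in (5, 15):
    messages, packs = estimate_message_packs(100_000, turns)
    print(f"{turns} turns per task: {messages:,} messages, {packs} packs")
# 5 turns per task: 500,000 messages, 20 packs
# 15 turns per task: 1,500,000 messages, 60 packs
```

The gap between 20 and 60 packs comes entirely from turns per task, which is the number a 50-run test tends not to measure.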

Maintenance and brittleness

RPA breaks when the UI changes. A vendor pushes a new SAP Fiori update, three selectors shift, your bot fails at 3am. I have lived this. The fix is usually 30 minutes, but you need someone on call who knows the bot.

AI agents break differently. They do not fail loudly. They drift. The model provider updates, your prompt that worked last month now produces a slightly different output format, and downstream parsing silently fails. I wrote about this in my agentic workflow post. The failure mode is worse because users find out three days later when the wrong invoice gets paid. If you are building flows that sit underneath an agent, Power Automate error handling patterns that actually work will save you from the silent failures that surface weeks after go-live.

RPA maintenance is reactive and obvious. Agent maintenance is proactive and requires evaluation infrastructure most teams do not build.

What the work actually looks like

This is the dimension nobody compares on. Look at the input.

Structured input, structured output, no judgment needed: RPA. Copying 200 rows from a legacy system into a SharePoint list, kicking off a daily report, screen-scraping a vendor portal that has no API. Boring, repetitive, deterministic. Power Automate Desktop handles this all day. If you are still deciding whether to invest time in the broader platform, RPA Is Not the Right Tool for Every Repetitive Task is worth reading before you commit to a build.

Unstructured input, structured output, judgment needed: AI. Reading 500 supplier emails and extracting the PO number, classifying tickets by intent, summarizing a 40-page contract into five bullet points. This is where Copilot Studio or a custom agent earns its cost.

The hybrid case is the most common one and the one most teams miss. The agent reads the email, extracts the structured fields, then hands off to an RPA bot or a cloud flow that executes the deterministic part. The agent is the reasoning layer. RPA is the execution layer. They are not competitors. They are stacked.
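
The handoff between the two layers is where silent format drift does its damage, so it is worth a gate. Below is a minimal sketch in Python of validating the agent's output before the deterministic layer runs. The field names in `REQUIRED_FIELDS`, the `parse_agent_output` helper, and the sample reply are all hypothetical, and a real build would more likely use a schema validator than a hand-rolled check.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, which models sometimes wrap JSON in
REQUIRED_FIELDS = ("po_number", "supplier", "amount")  # hypothetical schema


def parse_agent_output(raw):
    """Parse the agent's reply into a dict, or raise ValueError.

    Tolerates a reply wrapped in a Markdown code fence, then insists on
    a JSON object that contains every required field.
    """
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("agent output is not a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"agent output is missing fields: {missing}")
    return data


# Hypothetical agent reply, wrapped in a code fence it was not asked for.
raw = (
    FENCE + "json\n"
    + '{"po_number": "PO-1042", "supplier": "Contoso", "amount": 1250.0}\n'
    + FENCE
)
fields = parse_agent_output(raw)
print(fields["po_number"])  # PO-1042
```

The point of raising instead of returning a partial result is that the execution layer never sees a half-filled record. A loud failure at the boundary is cheaper than a wrong invoice three days later.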

Governance and auditability

RPA logs are simple. Action ran, action succeeded, here is the screenshot. Auditors love this.

AI agents need decision logs, not just execution logs. You need to capture why the agent picked tool A over tool B. Most teams I talk to are not logging this and will get caught when the first compliance review hits. I covered this in The Real Shift Is Not Faster Work It Is Who Owns the Decision. Based on what I have built, this is the gap that bites you 6 months in, not on day one.

Choose RPA if / Choose AI if

Choose RPA if: the input is structured, the path is deterministic, the volume is high, the cost per run needs to be near zero, and the system has no API. This is most legacy integration work.

Choose AI automation if: the input is unstructured, the work requires classification or extraction or summarization, variance is the norm, and you have the evaluation discipline to catch silent drift.

Choose both if: you have a real workflow. Most enterprise automation is hybrid. The line is not RPA versus AI. It is figuring out which layer does what.

Frequently Asked Questions

What is the difference between RPA and AI automation for enterprise workflows?

RPA is built for repetitive, predictable tasks where the process follows the same steps every time, while AI automation handles unstructured or variable inputs that require reasoning. They are not competing technologies but tools suited to different problems. Choosing the wrong one leads to either a fragile bot or an agent making confident mistakes.

When should I use RPA instead of an AI agent?

Use RPA when your process is consistent, rule-based, and can be mapped out as a clear decision tree. If the same fields, screens, or steps repeat thousands of times without variation, RPA will be faster, cheaper, and more reliable than an AI agent.

How do I know if AI automation is worth the cost for my workflow?

AI agents consume message credits per turn and most tasks require multiple turns, so costs scale quickly at high volumes. Before committing, calculate expected monthly runs and multiply by average turns per task, not just per conversation. Teams often underestimate this significantly when testing at small scale.

Why does RPA break so often in enterprise environments?

RPA relies on fixed UI selectors, so any interface update from a vendor can shift elements and cause the bot to fail. These failures are usually quick to fix but require someone familiar with the bot to be available when issues occur. Unlike AI agents, RPA fails loudly and immediately rather than silently producing wrong results.

May 1, 2026

Power Automate Error Handling Patterns That Actually Work

Most Power Automate error handling I see in the wild is one Try scope, one Catch scope, and a Teams message that says Flow failed. That is not error handling. That is a notification with extra steps.

Real Power Automate error handling patterns answer three questions. What failed. Why it failed. What happens to the work that was already in flight when it failed. If your flow does not answer all three, you are going to find out about problems from an angry colleague, not from your monitoring.

I have rebuilt enough flows after silent failures to have strong opinions on this. Here is what I actually use.

The Try Catch Finally pattern is the floor, not the ceiling

Three scopes. Try runs your logic. Catch is configured with Run After set to has failed, is skipped, and has timed out. Finally runs after both, regardless of outcome. This is documented well in the Microsoft Learn Power Automate docs and most builders get this far.

The problem is what people put inside Catch. Usually a single Post Message action with @{workflow()?['run']?['name']} and a generic failure string. That tells you the flow failed. You already knew that. It does not tell you which action failed inside the Try, what the actual error message was, or what input caused it.

The fix is using result('Try') inside Catch and filtering for items where status is not Succeeded. That gives you the specific action name, the status code, and the error body. Now your alert is useful.
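
The filtering logic is easy to reason about outside the designer. Here is a minimal sketch in Python, assuming `result('Try')` hands back a list of per-action records with `name`, `status`, and `outputs` fields. The record shape below is simplified and the sample data is made up, so check it against what your own run actually returns.

```python
def failed_actions(try_results):
    """Keep only the actions whose status is not 'Succeeded'."""
    failures = []
    for item in try_results:
        if item.get("status") == "Succeeded":
            continue
        outputs = item.get("outputs") or {}  # skipped actions have no outputs
        failures.append(
            {
                "action": item.get("name"),
                "status": item.get("status"),
                "status_code": outputs.get("statusCode"),
                "error_body": outputs.get("body"),
            }
        )
    return failures


# Hypothetical, simplified stand-in for what result('Try') returns.
try_results = [
    {"name": "Create_item", "status": "Succeeded",
     "outputs": {"statusCode": 201, "body": {"ID": 42}}},
    {"name": "Send_email", "status": "Failed",
     "outputs": {"statusCode": 429, "body": {"message": "Rate limit exceeded"}}},
    {"name": "Update_row", "status": "Skipped"},
]

for failure in failed_actions(try_results):
    print(failure["action"], failure["status"], failure["status_code"])
# Send_email Failed 429
# Update_row Skipped None
```

Skipped actions show up alongside the failed one. That is usually what you want in an alert, because it shows how far the Try got before it stopped.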

Differentiate transient from terminal errors

This is the pattern most flows skip and it is the one that matters most in production. A 429 from a connector is not the same problem as a 400 from bad input. One needs a retry with backoff. The other needs a human.

Inside Catch, parse the error and branch. Status codes 408, 429, 500, 502, 503, 504 are transient. Retry them, ideally with a Do Until that has a delay and a max iteration count. Status codes 400, 401, 403, 404 are terminal. Do not retry. Log them and move on or escalate.
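
As a sketch of that branching in Python: the status code sets come straight from the lists above, while the `classify` and `call_with_retry` helpers, the default of four attempts, and the doubling delay are my own assumptions standing in for a Do Until with a delay and a max iteration count.

```python
import time

TRANSIENT = {408, 429, 500, 502, 503, 504}
TERMINAL = {400, 401, 403, 404}


def classify(status_code):
    """Sort a status code into success, transient, terminal, or unknown."""
    if 200 <= status_code < 300:
        return "success"
    if status_code in TRANSIENT:
        return "transient"
    if status_code in TERMINAL:
        return "terminal"
    return "unknown"


def call_with_retry(action, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call action() until the status is not transient or attempts run out.

    action is any callable that returns an HTTP-style status code.
    Returns (last_status_code, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status = action()
        if classify(status) != "transient":
            return status, attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, 8s, ...
    return status, max_attempts


# Hypothetical connector that is throttled twice, then succeeds.
responses = iter([429, 503, 200])
print(call_with_retry(lambda: next(responses), sleep=lambda seconds: None))
# (200, 3)
```

A terminal code returns after the first attempt, so the caller can log or escalate it straight away. Anything outside both lists comes back as unknown and is not retried, which is the cautious default.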

Power Automate’s built-in retry policy on individual actions covers some of this, but it does not let you do anything intelligent with the failure. It just retries with exponential backoff and then gives up. That matters for anything that touches an external system with rate limits. I wrote about how this connects to throttling limits, and why default retry behaviour can mask problems until volume increases.

Compensating actions for partial failures

This is the one almost nobody does. If your flow creates a SharePoint item, then sends an email, then updates a Dataverse record, and step three fails, what happens to steps one and two? Nothing, by default. You have a SharePoint item that should not exist and an email that should not have been sent.

The pattern is simple. Inside Catch, run compensating actions for whatever Try already completed. Delete the SharePoint item. Send a correction email. Mark the Dataverse record as Failed rather than leaving it half-updated. You do this by checking result('Try') for which actions actually succeeded before the failure, then reversing only those. If you are using SharePoint lists as your backend, as I covered in SharePoint Lists Are Still the Best Backend for 80 Percent of Power Platform Apps, the compensating delete is straightforward because the list item ID is always available in scope.
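
Here is a minimal sketch of "reverse only what succeeded" in Python. It assumes the same simplified per-action records as a `result('Try')` output, and the action names, the `compensate` helper, and the undo steps are hypothetical placeholders for real delete, correction, and update actions.

```python
def compensate(try_results, compensations):
    """Run the undo step for each action that succeeded, newest first.

    compensations maps an action name to a callable that reverses it.
    Returns the names of the actions that were undone, in the order run.
    """
    undone = []
    for item in reversed(try_results):
        name = item["name"]
        if item["status"] == "Succeeded" and name in compensations:
            compensations[name](item)
            undone.append(name)
    return undone


# Hypothetical run: two steps completed, the third failed.
try_results = [
    {"name": "Create_item", "status": "Succeeded", "outputs": {"body": {"ID": 42}}},
    {"name": "Send_email", "status": "Succeeded", "outputs": {}},
    {"name": "Update_row", "status": "Failed", "outputs": {"statusCode": 500}},
]

log = []  # stands in for the real compensating actions
compensations = {
    "Create_item": lambda item: log.append(
        f"delete item {item['outputs']['body']['ID']}"
    ),
    "Send_email": lambda item: log.append("send correction email"),
}

print(compensate(try_results, compensations))  # ['Send_email', 'Create_item']
print(log)  # ['send correction email', 'delete item 42']
```

Undoing in reverse order is a design choice rather than a rule. It mirrors how the work was built up, which tends to be the safest default when later steps depend on earlier ones.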

It is more code. It is also the difference between a flow that fails cleanly and a flow that leaves your data in a state nobody can reason about three weeks later.

Centralise error logging

Stop writing custom logging logic in every flow. Build one child flow that takes the run ID, the flow name, the failed action, the error body, and the input payload, and writes it to a single Dataverse table or SharePoint list. Every flow calls that child flow from its Catch scope.
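
It helps to pin down the shape of that log record before building the child flow. A minimal sketch in Python, using the five fields named above: the `ErrorLogEntry` class, the `to_log_row` helper, and the sample values are hypothetical, and serialising the nested parts to JSON text is just one way to fit them into a single column.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ErrorLogEntry:
    run_id: str
    flow_name: str
    failed_action: str
    error_body: dict
    input_payload: dict


def to_log_row(entry):
    """Flatten an entry into a row of plain strings for a list or table."""
    row = asdict(entry)
    # Nested structures become JSON text so they fit in one text column.
    row["error_body"] = json.dumps(entry.error_body, sort_keys=True)
    row["input_payload"] = json.dumps(entry.input_payload, sort_keys=True)
    return row


# Hypothetical failure, as a Catch scope might pass it to the child flow.
entry = ErrorLogEntry(
    run_id="run-001",
    flow_name="Invoice sync",
    failed_action="Update_row",
    error_body={"message": "Rate limit exceeded", "statusCode": 429},
    input_payload={"invoice_id": 1042},
)
row = to_log_row(entry)
print(row["failed_action"])  # Update_row
print(row["error_body"])  # {"message": "Rate limit exceeded", "statusCode": 429}
```

Keeping one fixed record shape across every flow is what makes the cross-flow reporting possible later.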

Now you have one place to look when things break. You can build a Power BI report on it. You can spot patterns across flows. You can see that 80 percent of your failures are coming from one connector and actually fix the root cause instead of patching individual flows.

The notification trap

If every failure sends a Teams message, people stop reading them within two weeks. I have seen this play out on multiple internal builds. Tier your alerts. Transient errors that self-recover do not need a notification. Terminal errors that need human input do. Compensating actions that ran successfully need a log entry, not a ping.

The goal is that when a notification arrives, the person receiving it actually opens it. Anything else is noise. This connects to a broader problem I have written about in Power Platform Governance That Does Not Kill Adoption, where poorly designed alerting policies erode trust in automation the same way overly restrictive DLP policies erode maker trust in the platform.

The pattern that ties it together

Try Catch Finally for structure. Result filtering for specificity. Transient versus terminal branching for intelligence. Compensating actions for data integrity. Centralised logging for visibility. Tiered notifications for sanity.

None of this is exotic. All of it is skipped because the happy path works in testing and the edge cases only show up at volume. Build the error handling first. The flow will be slower to ship and faster to trust. And if you are still deciding whether Power Automate is worth investing this depth of effort into, Why Power Automate Is Still Worth Learning in 2026 covers exactly that question.

Frequently Asked Questions

What are the best Power Automate error handling patterns to use in production flows?

Effective Power Automate error handling patterns go beyond a basic Try and Catch scope. You should capture specific action-level failures using result('Try'), differentiate between transient and terminal errors, and include compensating actions to undo partial work when a flow fails midway.

How do I find out which action failed inside a Power Automate Try scope?

Use the result('Try') expression inside your Catch scope and filter for items where the status is not Succeeded. This returns the specific action name, status code, and error body, giving you meaningful diagnostic information instead of a generic failure message.

When should I retry a failed action in Power Automate versus escalate to a human?

Retry transient errors such as 408, 429, 500, and 503 using a Do Until loop with a delay and a maximum iteration count. Terminal errors like 400, 401, 403, and 404 indicate a problem with the request itself, so retrying will not help and the failure should be logged or escalated instead.
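The same decision can be sketched outside the designer. This Python version mirrors a Do Until loop with a delay and an iteration cap; the function name, the default attempt count, and the delay are assumptions, and the action is any callable that returns an HTTP status code.

```python
import time
from typing import Callable

# Status codes treated as transient, taken from the answer above.
TRANSIENT = {408, 429, 500, 503}

def call_with_retry(action: Callable[[], int], max_attempts: int = 4,
                    delay_seconds: float = 5.0, sleep=time.sleep) -> dict:
    """Retry transient failures up to max_attempts (at least 1), else escalate."""
    status = None
    for attempt in range(1, max_attempts + 1):
        status = action()
        if 200 <= status < 300:
            return {"outcome": "succeeded", "attempts": attempt, "status": status}
        if status not in TRANSIENT:
            # Terminal error: retrying the same request will not help.
            return {"outcome": "escalate", "attempts": attempt, "status": status}
        if attempt < max_attempts:
            sleep(delay_seconds)
    # Transient error that never cleared within the iteration cap.
    return {"outcome": "escalate", "attempts": max_attempts, "status": status}

# Simulated responses; the no-op sleep keeps the demo instant.
responses = iter([429, 503, 200])
recovered = call_with_retry(lambda: next(responses), sleep=lambda s: None)
not_found = call_with_retry(lambda: 404, sleep=lambda s: None)
print(recovered)
print(not_found)
```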

Why does Power Automate’s built-in retry policy not work well for rate-limited connectors?

The default retry policy applies exponential backoff and then stops, but it does not let you inspect or act on the failure in a meaningful way. At low volumes this can go unnoticed, but as traffic increases the lack of intelligent handling can cause widespread failures that are difficult to diagnose.

April 26, 2026
Anthropic Running Claude on Trainium Matters More for Enterprise Than the Benchmarks Suggest

Most of the coverage of Anthropic running Claude on Amazon’s Trainium chips frames it as a benchmark race. Faster training. Cheaper inference. Another shot at Nvidia. That framing misses what actually matters if you are building production automations. What should make enterprise Power Platform and AI people pay attention to Claude on Amazon Trainium is not raw performance. It is supply, capacity, and price stability.

I have been building Claude-backed flows internally for a while now. The model quality has not been my problem. The economics and the throttling have.

Why the Trainium Story Is Actually a Capacity Story

When you read about Anthropic moving serious training and inference onto Trainium, the interesting part is not whether the chip beats an H100 on some synthetic benchmark. The interesting part is that for the first time there is a credible path to Claude pricing that is not entirely tied to Nvidia GPU scarcity.

If you have ever tried to scale a customer-facing agent on a shared inference pool, you know what I mean. Peak hours hit. Latency drifts up. Occasionally you get a 429. Your Power Automate flow has a retry policy, sure, but the user already saw the spinning circle for nine seconds and moved on.

Capacity is the silent killer. Benchmarks are the loud distraction.

What Token Cost Drift Looks Like in a Real Power Automate Flow

Here is the thing nobody tells you when you slot Claude into a flow through Bedrock or the API. The first version of your agent has a tight system prompt. Maybe 800 tokens. Then someone asks for a new edge case. You add a few examples. Then someone reports a wrong answer, so you add a guardrail paragraph. Then you add tool descriptions. Then you add a few more examples because the tool descriptions confused the model.

Six weeks later your system prompt is 4,200 tokens. Every single invocation pays for those tokens. If your flow runs 12,000 times a month, you just multiplied your input cost by five and nobody noticed because the per-call cost still looks tiny on the invoice.
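The arithmetic is worth writing down once. This sketch counts system prompt tokens only, using the figures above; it deliberately leaves price out because the rate depends on your model and contract.

```python
def monthly_prompt_tokens(system_prompt_tokens: int, runs_per_month: int) -> int:
    """Input tokens spent on the system prompt alone across a month of runs."""
    return system_prompt_tokens * runs_per_month

before = monthly_prompt_tokens(800, 12_000)
after = monthly_prompt_tokens(4_200, 12_000)
growth = after / before

print(f"before: {before:,} tokens per month")
print(f"after:  {after:,} tokens per month")
print(f"growth: {growth:.2f}x")
```

That works out to 9.6 million tokens a month growing to 50.4 million, a 5.25x increase, without a single extra run.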

I learned this the hard way on an internal agent. The unit cost looked fine. The monthly bill did not. The fix was not switching models. The fix was treating the system prompt like code, with a review step, and splitting context that only some topics need into retrieval rather than baking it into every call.

This is the part where chip economics actually touch your Power Automate flow. If inference cost per token drops because Anthropic has cheaper compute, that bloated prompt hurts less. If it does not drop, your business case erodes quietly while you build more features on top.

Bedrock vs Direct Anthropic API for Enterprise Automation Workloads

People ask me which one to use. The honest answer is it depends on what your governance team will sign off on, not what the model does.

Bedrock gives you the AWS contract, the data residency story, the IAM model your security team already understands, and provisioned throughput as an option. The direct Anthropic API gives you faster access to new models and sometimes better pricing on burst usage.

For anything customer-facing or anything that touches regulated data, Bedrock usually wins on the paperwork alone. For internal experimentation and prototypes, the direct API is fine. The mistake I see people make is prototyping on the direct API and then trying to lift and shift to Bedrock at the last minute. Region availability, model version naming, and quota structure are different enough that you will burn a sprint on it.

Pick the path your production version will live on. Build there from day one. If you are still weighing how Claude fits into your automation architecture more broadly, Claude as an Orchestration Brain Is the Most Interesting Thing Happening in Enterprise AI Right Now covers where it actually earns its place as a reasoning layer.

How I Would Plan Claude Capacity for an Agent You Actually Depend On

If the agent matters, on-demand inference is not enough. Provisioned or reserved capacity is starting to look less like a luxury and more like a baseline. Latency Is the Quiet Killer of Agentic Workflows and Almost Nobody Talks About It goes into how to budget round-trip time before you build, and the same logic applies to capacity. A flow that works at 2pm on Tuesday and times out at 10am on Monday is not a production system. It is a demo with good luck.

Three things I would actually do.

Measure your real token distribution. Not the average. The 95th percentile input and output. That is what your capacity needs to handle, not the median case.

Separate your workloads. The agent that drafts an email for an internal user can sit on shared on-demand inference. The agent that responds to a customer in under three seconds cannot. Different SLAs, different capacity tiers.

Track cost per successful outcome, not cost per call. An agent that fails 20 percent of the time and gets retried costs about 25 percent more per successful outcome than the per-call price suggests, before you count the extra context each retry carries. This is where bad tool design quietly destroys your unit economics. If you are unsure whether the model choice even matters as much as you think for automation workloads, Claude vs ChatGPT Is the Wrong Question When You Are Building Automations is worth reading before you optimize the wrong variable.
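Two of those measurements are easy to sketch. The token counts and call log below are made-up sample data, and the nearest-rank percentile is just one reasonable definition; use whatever your reporting tool already provides.

```python
def percentile_nearest_rank(values: list[int], pct: int) -> int:
    """Nearest-rank percentile, using integer maths to avoid float rounding."""
    ordered = sorted(values)
    rank = max((pct * len(ordered) + 99) // 100, 1)  # ceiling of pct * n / 100
    return ordered[rank - 1]

def cost_per_successful_outcome(calls: list[tuple[int, bool]]) -> float:
    """Total spend divided by calls that actually produced a usable result."""
    total_cost = sum(cost for cost, _ in calls)
    successes = sum(1 for _, ok in calls if ok)
    return total_cost / successes

# Hypothetical input token counts for 20 runs: mostly small, two large outliers.
input_tokens = [900] * 18 + [6_000, 9_000]
mean_tokens = sum(input_tokens) / len(input_tokens)
p95_tokens = percentile_nearest_rank(input_tokens, 95)

# Hypothetical call log: (cost in cents, succeeded). Two failures were retried.
calls = [(3, True)] * 6 + [(3, False)] * 2
per_call = sum(cost for cost, _ in calls) / len(calls)
per_outcome = cost_per_successful_outcome(calls)

print(mean_tokens, p95_tokens)
print(per_call, per_outcome)
```

In this sample the mean is 1,560 tokens while the 95th percentile is 6,000, and the cost per outcome is 4 cents against a per-call cost of 3. Both gaps are invisible if you only look at averages and invoices.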

The Trainium news matters because it changes the long-term curve on what any of this costs. But the curve only helps you if your architecture is set up to benefit from it. Bloated prompts, on-demand only inference, and no measurement of cost per outcome will eat any savings the chip story delivers.

Read the news through that lens. The benchmarks are not the point.

Frequently Asked Questions

Why does running Claude on Amazon Trainium matter for enterprise AI deployments?

The real benefit of Claude on Amazon Trainium for enterprise is not raw chip performance but improved supply, capacity, and price stability for inference workloads. Enterprises building production automations have historically struggled with throttling and unpredictable costs tied to GPU scarcity, and Trainium offers a credible path to more reliable, affordable access.

Why does my Power Automate flow keep getting slow or failing when using Claude?

The most common culprit is shared inference pool capacity, especially during peak hours, which causes latency spikes and occasional rate limit errors. Even with a retry policy in place, the delay is often long enough that users abandon the process before it completes.

How do I control Claude API costs in a high-volume Power Automate flow?

Treat your system prompt like code and review it regularly, because prompts tend to grow over time as edge cases and guardrails are added, multiplying your input token costs across every invocation. Moving context that is only relevant to certain topics into retrieval rather than including it in every call can significantly reduce per-run costs.

When should I start worrying about token cost drift in an AI automation?

Token cost drift can begin within weeks of deploying an agent as system prompts expand to handle new requirements and edge cases. The per-call cost often still looks small, so the problem tends to go unnoticed until the monthly total becomes difficult to justify.

April 25, 2026
Claude vs ChatGPT Is the Wrong Question When You Are Building Automations

Another Claude vs ChatGPT comparison landed in my feed this week. I came across a piece on the Zapier Blog running the usual head-to-head: reasoning, coding, writing, ethical dilemmas. Useful if you are picking a chat assistant for personal use. Almost useless if you are deciding between Claude and ChatGPT for automation inside a real enterprise flow.

I keep seeing people pick a model based on a consumer benchmark and then act confused when their Copilot Studio agent starts returning malformed JSON in week three. The criteria that matter when a model sits behind a connector are not the criteria that make for a good blog post.

Why Head to Head Model Comparisons Stop Being Useful the Moment You Add a Connector

Consumer comparisons test the model in isolation. One prompt in, one answer out, a human judges the output. That setup tells you nothing about what happens when the model has to call a tool, parse a response, call another tool, and feed a structured result into a downstream action.

Inside an automation, the model is not the product. The model is one component in a pipeline. The question is not which one writes better poetry. The question is which one fails in ways your orchestration layer can actually handle.

I wrote about this angle in a previous post on agentic workflows. The LLM is the reasoning layer, not the agent. Picking the reasoning layer on vibes from a consumer benchmark is how you end up with a beautifully worded confident response for a task that never completed.

The Four Things That Actually Matter When a Model Sits Inside an Automation

These are what I actually test for. None of them show up in head-to-head comparisons.

Structured output stability under load. Ask the same model for the same JSON schema a hundred times with slightly different inputs. Count how often it adds a trailing comma, drops a required field, wraps the JSON in a code fence, or decides today is the day to add a helpful explanation before the response. This is the single biggest source of silent failures I see in production.

Tool-calling predictability with multiple connectors. Give the model five tools. Watch how it picks. A model that is 95 percent accurate on tool selection with two tools can drop to 70 percent with five because the descriptions start competing. Consumer tests never measure this.

Behaviour when context gets long. Most real flows accumulate context: user input, previous tool results, system instructions, retrieved documents. I want to know how the model behaves at 40k tokens of accumulated state, not at 500. Instruction drift usually shows up here first.

Pricing behaviour under loops. An agent that retries three times on a failed tool call can quietly 10x your cost. The cheaper model on paper is not always the cheaper model in production once you account for retry patterns and token accumulation. Latency Is the Quiet Killer of Agentic Workflows covers how round-trip costs compound in ways most people never budget for until it is too late.
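The structured output check is the easiest of these to automate. Here is a minimal sketch that classifies the failure modes described above; the schema fields and the sample responses are invented for illustration, and a real harness would feed in actual model responses.

```python
import json
from collections import Counter

FENCE = "`" * 3  # markdown code fence marker

def check_model_output(raw: str, required_fields: set[str]) -> str:
    """Classify one model response as ok or name the way it broke."""
    text = raw.strip()
    if text.startswith(FENCE):
        return "code_fence"
    if not text.startswith("{"):
        return "preamble_text"
    try:
        data = json.loads(text)  # rejects trailing commas and other bad JSON
    except json.JSONDecodeError:
        return "invalid_json"
    if required_fields - set(data):
        return "missing_field"
    return "ok"

required = {"category", "confidence"}
samples = [
    '{"category": "billing", "confidence": 0.9}',
    '{"category": "billing", "confidence": 0.9,}',
    '{"category": "billing"}',
    FENCE + 'json\n{"category": "billing", "confidence": 0.9}\n' + FENCE,
    'Sure, here is the JSON: {"category": "billing", "confidence": 0.9}',
]

counts = Counter(check_model_output(s, required) for s in samples)
malformed_rate = 1 - counts["ok"] / len(samples)
print(dict(counts))
print(malformed_rate)
```

Classifying the failures rather than just counting them matters, because a code fence can be stripped cheaply while a missing field usually needs a retry.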

How I Pick Between Claude and GPT for a Specific Flow

I do not pick a model for the whole platform. I pick per use case.

For long-context reasoning where the model needs to hold a lot of state and follow detailed instructions without drifting, Claude has been the more predictable option in my testing. Fewer surprise deviations from the system prompt when the context gets messy. If you want to go deeper on why Claude works well as a reasoning layer inside enterprise pipelines, Claude as an Orchestration Brain Is the Most Interesting Thing Happening in Enterprise AI Right Now gets into the architecture side of that decision.

For fast, cheap, high-volume classification or extraction where the schema is simple and the input is short, GPT models tend to win on cost-per-call and latency. If the task is “read this email and return one of five categories,” I am not paying for a heavyweight reasoning model.

For tool-calling inside a Copilot Studio agent with multiple Power Automate actions, I test both. There is no universal winner. It depends on how the tool descriptions are written, how many there are, and how ambiguous the user input gets.

The honest answer most of the time is: it does not matter as much as the people arguing about it think it does. The bigger wins come from tool design, prompt structure, and failure handling. A well-designed flow with a mid-tier model beats a sloppy flow with the flagship every time.

What to Test Before You Commit a Model to Production

Before a model goes behind a production flow, I run four checks. Not benchmarks. Checks against the actual flow.

Run the real schema a hundred times with production-like inputs. Measure malformed output rate. Anything above one percent and you need a validation and retry layer, no matter which model you picked.

Run the tool-calling logic with the real connector set, not a simplified test set. Watch for the model picking the wrong tool when two descriptions overlap. This is where I lost the most time the hard way.

Simulate a long session. Feed it accumulated context that looks like a real user journey, not a single clean turn. Watch for instruction drift.

Load test with the pricing model in mind. Know what a retry storm costs you before it happens in production, not after finance asks questions. The Power Automate documentation covers retry policies, but most people never configure them until something breaks.
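To put a rough number on a retry storm, the sketch below assumes each failed attempt appends its error and previous output to the context before the next try. The token figures are hypothetical; the point is the shape of the growth, not the exact multiplier.

```python
def input_tokens_with_retries(base_context_tokens: int,
                              tokens_added_per_failed_attempt: int,
                              retries: int) -> int:
    """Total input tokens billed when every retry re-sends a growing context."""
    total = 0
    context = base_context_tokens
    for _ in range(retries + 1):  # first attempt plus each retry
        total += context
        context += tokens_added_per_failed_attempt
    return total

clean_run = input_tokens_with_retries(2_000, 1_500, retries=0)
stormy_run = input_tokens_with_retries(2_000, 1_500, retries=3)
multiplier = stormy_run / clean_run

print(clean_run, stormy_run, multiplier)
```

With these sample numbers, three retries cost 8.5 times the input tokens of a clean run, even though only four calls were made.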

The Claude vs ChatGPT question is the wrong frame. The right question is: which model handles the specific shape of failure my flow is most exposed to. Answer that and the comparison stops mattering. That is the part I keep trying to explain when people ask me, and it still gets pushed aside for whichever model topped a benchmark last week.

Frequently Asked Questions

Which is better, Claude vs ChatGPT for automation workflows?

Choosing between Claude and ChatGPT for automation is less about which model performs better in general benchmarks and more about how each behaves inside a pipeline. The criteria that matter are structured output reliability, tool-calling accuracy, and how well the model holds instructions as context grows. Testing both models against your specific workflow conditions will tell you far more than any consumer comparison.

Why does my AI agent start producing errors after working fine at first?

This often happens because the model experiences instruction drift as context accumulates over time. Long flows gather user inputs, tool results, and retrieved documents, and some models struggle to maintain consistent behaviour at high token counts. Testing your model under realistic context lengths before going to production can help catch this early.

How do I choose an AI model for a Power Automate or Copilot Studio flow?

Focus on how the model handles structured outputs, selects the right tools when multiple connectors are available, and behaves when context is long rather than short. Consumer benchmarks test models in isolation, but real automation pipelines require consistent, predictable behaviour across repeated calls with varying inputs. Running your own tests against your actual schema and tools will give you more reliable answers.

What causes silent failures in AI automation workflows?

One of the most common causes is inconsistent structured output, where a model occasionally adds unexpected formatting, drops required fields, or wraps a response in a code block instead of returning clean JSON. These errors can pass through without triggering obvious alerts while still breaking downstream actions. Testing output stability across many varied inputs is one of the most important steps before deploying a model-powered flow.

This post was inspired by Claude vs. ChatGPT: What’s the difference? [2026] via Zapier Blog.

April 23, 2026
Scheduled Codex Runs Are the Missing Piece Between Chatbots and Real Automation

I came across a post from OpenAI about Codex Automations the other day, and it reminded me of a pattern I keep seeing people miss. Everyone is obsessed with chatbots. Meanwhile the real unlock is boring and familiar to anyone from the Power Platform world. It is the schedule. Scheduled AI runs in Codex Automations are the bridge between a cool demo and something that actually replaces a recurring job.

Most AI tooling still assumes a human is in the loop pressing a button. That assumption is the ceiling. Break it and the shape of what you build changes.

Why a scheduled AI run is different from a scheduled flow

A scheduled Power Automate flow is deterministic. Same trigger, same actions, same branches. You can draw it on a whiteboard before it runs and the drawing will be correct. I have written about this before. If you can fully diagram the execution path before it runs, it is not an agentic workflow. It is a flow.

A scheduled Codex run is the opposite. The trigger fires on a schedule, but the work happening inside is a reasoning step. The model decides what to read, what to compare, what to summarise, what to flag. You are not wiring actions. You are wiring a recurring thought.

That sounds fluffy. It is not. It changes what workloads are worth automating at all.

The workload shape where scheduled AI runs in Codex Automations actually fit

Here is the shape I look for. The task runs on a cadence. The inputs vary in structure every time. The output is a judgement, a summary, or a prioritised list. No two runs look the same but the goal is identical.

Think about the recurring jobs that never got automated because the logic was too fuzzy. The weekly review of open pull requests that actually need attention. The Monday morning scan of overnight alerts to decide which three matter. The monthly pass over a folder of documents to flag what changed in a way a human cares about.

In Power Automate you would try and fail. You would end up with a flow that emails everything to a human who then does the real work. The flow is a courier, not an automation.

A scheduled AI run is different. The reasoning is the automation. The delivery is the courier part.

What I would build with this tomorrow if I had it internally

A daily 7am run that reads the previous day’s pipeline run logs across a set of flows, clusters the failures by likely root cause, and posts a short Teams message with the three things worth looking at. Not the raw error list. The interpretation.

A weekly pass over a shared folder that produces a diff in plain English. What changed, who changed it, whether it looks like policy drift or normal edits.

A monthly review of connector usage that flags flows quietly heading toward platform-level throttling before they break in production.

None of these are chatbots. None of them need a human to press a button. All of them are reasoning tasks that happen on a clock. That is the fit.

Where Power Automate still wins and where it does not

Power Automate wins the moment the work is deterministic and the integrations are inside the Microsoft estate. Approvals. SharePoint updates. Dataverse writes. Email parsing with known templates. Anything with governance, DLP, and environment strategy attached. A scheduled AI run from outside the tenant does not solve those things. Power Automate does.

It loses the moment the work is a judgement call on messy inputs that change shape every run. That is where a scheduled Codex or Claude run wins by a wide margin. Trying to force that into a flow gives you the courier pattern. Useful, but not automation. I made the case for budgeting round trips in Latency Is the Quiet Killer of Agentic Workflows, and the same principle applies here: the more reasoning steps you stack inside a scheduled run, the more carefully you need to budget what actually happens inside that window.

The interesting move is using both. The scheduled AI run produces the judgement. Power Automate delivers it, logs it, routes approvals, writes to the system of record. The reasoning layer decides. The execution layer acts. I have said this more than once and I will keep saying it because most teams still collapse the two. If you are wondering how Workspace Agents compare to Power Automate in this picture, Workspace Agents Are ChatGPT’s Answer to Power Automate and That Comparison Matters is worth reading before you decide which layer owns the work.

If you already think in triggers and schedules from the Power Platform world, you are better positioned than most to use this well. You know what a cadence looks like. You know what idempotent means. You know why retry logic matters. Now the thing running inside the schedule can think. That is the shift.

Stop waiting for someone to press a button.

Frequently Asked Questions

What are scheduled AI runs in Codex Automations and how do they work?

Scheduled AI runs in Codex Automations are recurring AI tasks that fire on a set schedule, where the model performs reasoning rather than following a fixed, pre-wired set of actions. Unlike a traditional scheduled flow, the AI decides what to read, compare, summarise, or flag each time it runs. This makes them suited to tasks where the inputs vary but the goal stays the same.

How do I know when to use a scheduled AI run instead of a Power Automate flow?

If you can map out every branch and action of a task before it runs, a standard flow is the right tool. When the output requires interpretation, prioritisation, or judgement based on inputs that change each time, a scheduled AI run is a better fit. Tasks like triaging alerts, reviewing documents for meaningful changes, or summarising error logs fall into this second category.

Why does a scheduled Power Automate flow struggle with fuzzy or variable logic?

Power Automate flows are deterministic, meaning they follow the same paths every time regardless of context. When the logic requires understanding nuance or making a judgement call, the flow typically ends up forwarding everything to a human rather than completing the work itself. The flow becomes a delivery mechanism rather than a true automation.

When should I consider replacing a recurring manual review task with an AI automation?

If a task runs on a regular cadence, involves inputs that vary in structure, and produces an output that is a summary, ranking, or decision rather than a fixed result, it is a strong candidate for AI automation. Examples include weekly pull request reviews, overnight alert triage, and monthly document audits where a human currently does the interpretation work.

This post was inspired by Automations via OpenAI.

April 23, 2026
Workspace Agents Are ChatGPT’s Answer to Power Automate and That Comparison Matters

I came across the OpenAI page on Workspace Agents and my first thought was blunt. This is Power Automate with a chat interface sitting in front of it. That is not a dig. The fact that OpenAI Workspace Agents land so close to what Microsoft has been building for years is the interesting part, because it tells you where the bar is moving for every automation builder in the enterprise.

I have been building on Power Platform full time inside a large organisation for years. I am not worried about Workspace Agents replacing anything in my stack next week. I am thinking about what happens when the people I build for start using ChatGPT at home and walk into the office expecting the same feel.

What OpenAI actually shipped and why it looks familiar

Strip the marketing language and Workspace Agents are a way to let a user describe a repeatable task, connect some tools, and have the agent run it on a schedule or on demand. Triggers. Actions. Conditions. A reasoning layer that decides what to do next.

If that sounds like a Power Automate flow with a Copilot Studio agent sitting on top, that is because functionally it overlaps a lot. The difference is not in what it does. It is in how you build it.

Conversation-first automation versus flow-first automation

Power Automate starts from a diagram. You pick a trigger, you add steps, you see the branches. Even when Copilot writes the flow for you, the output is still a visual graph you can inspect, test, and version.

Workspace Agents start from a conversation. You tell the agent what you want. It figures out the steps. You refine by talking, not by dragging.

Neither approach is better. They attract different builders and produce different kinds of automations. Flow-first builders think in terms of state, error paths, and what happens when step 4 fails. Conversation-first builders think in terms of outcomes and trust the model to fill in the middle.

I have written before about what actually makes a workflow agentic, and the same rule applies here. If you can fully diagram the execution path before it runs, you built a flow with a chat skin. The interesting Workspace Agent use cases are the ones where the agent genuinely picks the path.

What this means if you already run on Power Platform

Workspace Agents are not going to displace Power Platform inside a large enterprise. Governance, DLP, environment strategy, audit, the whole compliance layer. None of that is solved by a chat interface on top of a model provider.

But the comparison matters for two reasons.

First, it shows what conversation-first building can feel like when it works. Power Automate with Copilot is moving in that direction, just slower and with more guard rails. If you want to understand where the platform is heading, watching how people actually use Workspace Agents is more useful than reading another Microsoft roadmap post.

Second, it exposes the parts of Power Platform that still feel heavy. Creating a solution, picking an environment, sorting out connection references, publishing, sharing. A business user who just had a working agent in ChatGPT in four minutes is going to ask why the internal version takes four days. Part of that friction is unavoidable — as I explored in why Power Automate is still worth learning in 2026, the platform carries real enterprise weight that consumer tools simply do not have to.

The expectation shift that is about to hit your intake queue

This is the part people I talk to at other organisations are not ready for.

The OpenAI Workspace Agents launch does not change what is technically possible inside your tenant. It changes what your users think should be easy. Someone who built an agent over the weekend to summarise emails and update a Google Sheet is going to file an intake ticket asking for the same thing against SharePoint and Outlook, and they will be confused when the answer is not “sure, by Friday.”

The honest answer is that the internal version has to handle auth, permissions, data residency, retention, and the fact that the output will be read by someone who makes a decision based on it. That is not bureaucracy. That is the cost of operating in a regulated enterprise. But nobody wants to hear it when the external version just works.

The teams that will handle this well are the ones that stop treating every request as a custom build and start shipping pre-approved agent templates with the governance already baked in. Citizen devs get conversation-first speed. The platform team keeps control of the risk surface. That is the only way the intake queue survives the next year. And it is worth remembering that who owns the decision inside these automations matters as much as how fast they run — shipping an agent template without settling that question just moves the risk downstream.

I have opinions on how to structure that, and I will write about it soon. You can follow along on my LinkedIn if you want the next piece when it lands.

Workspace Agents are not a threat. They are a preview of the conversation you are about to have with every business user who used ChatGPT over the weekend.

Frequently Asked Questions

What are OpenAI Workspace Agents?

OpenAI Workspace Agents let users describe a repeatable task, connect tools, and have an agent run it on a schedule or on demand. They use a conversation-first approach, meaning you define what you want through chat rather than building a visual workflow diagram.

How do OpenAI Workspace Agents compare to Power Automate?

Both handle triggers, actions, and conditions to automate tasks, so they overlap significantly in what they can do. The key difference is how you build them: Power Automate starts from a visual flow diagram, while Workspace Agents start from a conversation with the model.

When should I use Power Automate instead of a conversational agent?

Power Automate is better suited when you need clear error handling, version control, and a fully inspectable execution path. Conversation-first tools like Workspace Agents work well when you want to define an outcome and let the model determine the steps.

Why does the rise of OpenAI Workspace Agents matter for enterprise automation builders?

As more people use conversational AI tools like ChatGPT in their personal lives, they will expect a similar experience in workplace tools. This raises the bar for how automation platforms present themselves, even if enterprise governance and compliance requirements still favour established platforms.

This post was inspired by Workspace agents via OpenAI.

April 22, 2026
Latency Is the Quiet Killer of Agentic Workflows and Almost Nobody Talks About It

Everyone obsesses over model quality, tool design, and prompt structure when building agents. The thing that actually kills adoption in production is something else entirely. Agentic workflow latency is the quiet killer, and most Power Platform and Copilot Studio builders are not thinking about it until users start abandoning the tool.

I came across a post from OpenAI about using WebSockets and connection-scoped caching in the Responses API to speed up their Codex loop. It confirmed something I keep running into building multi-step agents internally. The math is brutal once you do it honestly.

Why Agent Loops Feel Slow Even When Each Call Is Fast

A single model call at 800ms feels fine. A tool call at 300ms feels fine. A Dataverse lookup at 500ms feels fine. Everyone looks at these numbers in isolation and says the platform is fast enough.

Then you build an actual agent. It reasons, calls a tool, reads the result, reasons again, calls another tool, checks a condition, calls a third tool, summarises, responds. That is 8 to 15 round trips for one user request. Each round trip carries connection setup, authentication overhead, token streaming setup, and the model’s own time to first token.

A 400ms overhead per call sounds small. Multiply by 12 calls. That is almost 5 seconds of pure overhead before any actual thinking or work happens. Users do not wait 15 seconds for a confident answer. They ask once, get nothing for a few seconds, and switch back to the old way of doing it.
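The arithmetic above can be checked in a few lines. This is only a sketch: the 400 ms figure and the 8 to 15 round-trip range come from the text, and the function name is made up for illustration.

```python
def loop_overhead_ms(round_trips: int, overhead_per_call_ms: int) -> int:
    """Fixed overhead across one agent loop, excluding model and tool work."""
    return round_trips * overhead_per_call_ms


# Low end, the 12-call case from the text, and the high end of the range.
for trips in (8, 12, 15):
    total_ms = loop_overhead_ms(trips, 400)
    print(f"{trips} round trips x 400 ms = {total_ms / 1000:.1f} s of overhead")
```

The 12-call case comes out at 4.8 seconds, which is the "almost 5 seconds" quoted above.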

I have watched this kill internal tools that were technically correct. The agent did the right thing. Nobody used it.

What OpenAI Just Shipped and Why It Matters Beyond Codex

The short version of what they did: move from repeated HTTP requests to a persistent WebSocket connection, and keep cache state scoped to that connection so repeat context does not need to be re-processed on every turn.

This is not a Codex-only trick. It is a general pattern. Connection-scoped caching means the expensive part of a call, the part that handles your system prompt and tool definitions and prior context, does not get redone from scratch every time your agent takes another step.
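To see why this matters for a loop, here is a toy cost model. It is not the Responses API and makes no claim about how any provider implements or prices caching. It assumes a hypothetical prefix (system prompt plus tool definitions) and counts how many input tokens get processed when the full context is re-read every turn versus when context already seen on the connection is reused.

```python
def tokens_processed(prefix_tokens: int, new_tokens_per_turn: list, cached: bool) -> int:
    """Count input tokens processed across an agent loop (toy model).

    prefix_tokens: system prompt plus tool definitions (hypothetical size).
    new_tokens_per_turn: tokens added on each turn (tool results, prior output).
    cached: if True, context already seen on this connection is not re-processed.
    """
    total = 0
    context = prefix_tokens
    for turn, new_tokens in enumerate(new_tokens_per_turn):
        if cached and turn > 0:
            total += new_tokens            # only the new material is processed
        else:
            total += context + new_tokens  # the whole context is processed again
        context += new_tokens
    return total


turns = [200] * 5  # five loop steps, 200 new tokens each (made-up numbers)
print("re-processed every turn:", tokens_processed(3000, turns, cached=False))
print("connection-scoped cache:", tokens_processed(3000, turns, cached=True))
```

Under these made-up numbers the uncached loop processes 18,000 tokens and the cached loop 4,000. The gap widens with every extra step, which is why the pattern matters most for agents that loop.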

For anyone building agents that loop, this is the shape of the next year of infrastructure work. The platforms that expose this properly will feel instant. The ones that do not will feel like they are thinking through molasses.

What This Looks Like Inside Copilot Studio and Power Automate

Here is where it gets uncomfortable. In Copilot Studio, you do not see the round trips. You see a topic, a few actions, a generative answer node. The platform hides every call behind its own orchestration.

That hiding is the problem. A Copilot Studio agent doing generative orchestration with three tool calls backed by Power Automate flows is making far more round trips than most builders realise. Each tool call is a Power Automate HTTP trigger plus whatever that flow does internally, often including another connector call to SharePoint, Dataverse, or an external API. The agent then reads the response and decides what to do next, which is another model call. And if you are hitting Power Automate throttling limits under real load, every one of those round trips gets longer.

I built one recently that felt snappy in testing with one user. In production with ten concurrent sessions, response times doubled. Nothing in the flow was slow on its own. The sum was slow, and throttling on shared connectors made it worse. This is the same class of problem I wrote about in Most Agentic Workflows Are Just Fancy If/Then Logic in a Trench Coat. The difference between a real agent loop and a glorified flow shows up in latency first.

How I Would Budget Latency Before I Build the Agent

I treat latency as a first-class design constraint now, not something I measure after the fact. Before I build, I do this:
- Estimate the number of model calls per user request. Not best case. Typical case.
- Estimate the number of tool calls and what each one hits. A SharePoint list call in the same tenant is not a Graph API call with auth handshake.
- Set a budget. I aim for under 4 seconds total for anything conversational, under 10 seconds for anything that is clearly doing work.
- Cut calls aggressively. Can two tools be one? Can I pre-fetch context in a single call instead of three? Can the agent skip a reasoning step when the intent is obvious?
- Parallelise where I can. Power Automate lets you run actions in parallel branches. Most builders do not use them.

The other thing I stopped doing: chaining LLM calls for steps that do not need reasoning. If a step is deterministic, I call the tool directly, not through the model. Every model call I can remove from the loop gives me back 500 to 1500ms.
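The budgeting steps above can be sketched as a small estimator. Everything here is an assumption for illustration: the step names, the per-step latencies, and the `parallel_group` field, which stands in for actions placed in parallel branches. Only the 4 second conversational budget comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    kind: str                      # "model" or "tool"
    latency_ms: int                # typical-case estimate, not best case
    parallel_group: Optional[str] = None  # steps sharing a group run side by side


def estimate_total_ms(steps: list) -> int:
    """Sum sequential steps; a parallel group costs only its slowest member."""
    sequential = sum(s.latency_ms for s in steps if s.parallel_group is None)
    slowest_per_group = {}
    for s in steps:
        if s.parallel_group is not None:
            current = slowest_per_group.get(s.parallel_group, 0)
            slowest_per_group[s.parallel_group] = max(current, s.latency_ms)
    return sequential + sum(slowest_per_group.values())


BUDGET_MS = 4000  # conversational budget from the text

plan = [
    Step("reason about intent", "model", 1000),
    Step("SharePoint lookup", "tool", 400, parallel_group="fetch"),
    Step("Dataverse lookup", "tool", 500, parallel_group="fetch"),
    Step("reason over results", "model", 1000),
    Step("update record", "tool", 400),
    Step("summarise", "model", 1000),
]
serial_plan = [Step(s.name, s.kind, s.latency_ms) for s in plan]

for label, candidate in (("serial", serial_plan), ("parallel fetch", plan)):
    total = estimate_total_ms(candidate)
    verdict = "within" if total <= BUDGET_MS else "over"
    print(f"{label}: {total} ms, {verdict} the {BUDGET_MS} ms budget")
```

With these numbers the serial plan lands at 4300 ms and the plan with a parallel fetch at 3900 ms. Dropping one model step would save more than the parallel branch does, which matches the point about removing model calls from the loop.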

Latency is also where the question of who owns the decision in an agentic workflow becomes a performance problem, not just a governance one. Every checkpoint that routes back to a human approver adds another wait state to the loop. The more of those you have, the more your total response time is dominated by human latency, not model latency.

I have written more about my approach to this kind of trade-off on my LinkedIn, because I keep having the same conversation with people at other organisations who hit the wall when their demo hits real users.

The agents that win in production are not the smartest ones. They are the ones that answer before the user gives up.

Frequently Asked Questions

Why does agentic workflow latency get so bad in multi-step agents?

Each individual call in an agent loop may seem fast, but the overhead adds up across 8 to 15 round trips per user request. Connection setup, authentication, and token streaming costs stack on every single step, turning individually acceptable delays into a frustrating overall wait time.

What is connection-scoped caching and how does it help agent performance?

Connection-scoped caching keeps expensive context like system prompts, tool definitions, and prior conversation state ready across multiple calls instead of reprocessing it each time. This avoids redundant work on every step of an agent loop and significantly reduces the overhead that accumulates across a multi-turn interaction.

How do I reduce latency in Copilot Studio and Power Automate agents?

Start by auditing how many round trips your agent actually makes for a single user request, since this is where most hidden latency lives. Look for opportunities to batch tool calls, reduce unnecessary steps in your loop, and watch for platform-level improvements like persistent connections that reduce per-call overhead.

Why do users abandon AI agents even when the agent gives correct answers?

If the response takes too long, users lose confidence and revert to familiar alternatives before the agent finishes. Technical correctness does not matter if the experience feels slow enough to suggest something has gone wrong.

This post was inspired by Speeding up agentic workflows with WebSockets in the Responses API via OpenAI.

April 22, 2026
The Real Shift Is Not Faster Work It Is Who Owns the Decision
I came across a post from the Microsoft Power Platform Blog about intelligent apps, human leadership, and the new shape of work. The framing is fine, but the speed angle buries what I think is the actual shift happening right now. The real story is about AI automation decision ownership, not output volume. Two years ago a flow ran rules and a human approved the exceptions. Now the agent handles the exceptions and the human sets the policy the agent operates under. That is a completely different job.

Speed Was Never The Interesting Part Of Automation

Every automation pitch I have seen in the last five years leads with hours saved. It is the easiest metric to put on a slide. It is also the least interesting thing about a good automation.

The flows I am proud of did not win because they were fast. They won because they removed a decision that did not need a human in the loop. A purchase order under a certain threshold. A leave request that matches a pattern. A ticket that routes itself based on content. The speed was a side effect of letting the process run without waiting for someone to click approve.

Speed framing also makes everyone lazy about design. If the only goal is faster, you end up automating a bad process and shipping the same broken logic at ten times the throughput. I have written about this before. Bad process plus automation equals faster failure.

The Decision Boundary Is What Actually Moved

Here is the shift I keep seeing internally and hearing from people at other organisations.

Old model: deterministic flow runs the rules, human handles anything weird. The human owns judgment. The flow owns execution.

New model: agent handles the weird cases too, because it can reason about context, read the attachment, compare against policy, and make a call. The human no longer sits in the approval step. The human sits above the agent, writing the constraints it operates under.

That is not a speed change. That is a decision ownership change. The human used to approve ten exceptions a day. Now the human writes the rules for how exceptions should be resolved and reviews a sample at the end of the week.

Most teams I talk to have not internalised this yet. They still put the agent in the response step of a structured flow, which I already called out as not really agentic. The interesting version is when the agent sits in the decision layer and the deterministic steps execute what it decides.

What This Changes About How I Build Flows And Agents

When I build a flow now, I spend less time on the happy path and more time on what the agent is allowed to decide on its own.

Concretely:
- I write the policy first in plain language, not the flow. What can the agent approve without escalation. What must it always escalate. What does good look like. What does a bad outcome look like.
- I design the tools the agent calls as if I am writing an API contract, because that is what I am doing. A tool returning done is useless. It needs to return the state the agent can reason about.
- I build the escalation path before I build the automation path. If the agent is uncertain, where does it hand off. To whom. With what context.
- I log the decisions, not just the executions. A flow run log tells me what happened. A decision log tells me why the agent chose what it chose, which is the only way to improve the policy.

This is closer to writing Power Automate flows with an orchestration brain on top than it is to classical automation. If you are curious about that orchestration layer, Claude has been the most interesting model for this in my testing. Anthropic is shipping the kind of stateful reasoning this job actually needs.
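The tool contract and decision log points above could look something like this. It is a minimal sketch under stated assumptions: the field names, the 500 threshold, and the `procurement-queue` hand-off target are all hypothetical, and a real policy would live somewhere an owner can edit it rather than in a constant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolResult:
    """What a tool hands back: state the agent can reason about, not just 'done'."""
    status: str               # "completed", "partial", or "failed"
    record_id: Optional[str]
    detail: str


@dataclass
class DecisionLogEntry:
    """Records why a decision was made, not only that a step ran."""
    request_id: str
    decision: str             # "approve" or "escalate"
    policy_rule: str
    rationale: str
    escalate_to: Optional[str] = None


AUTO_APPROVE_LIMIT = 500.0    # hypothetical policy threshold


def decide_purchase_order(request_id: str, amount: float, lookup: ToolResult) -> DecisionLogEntry:
    # Escalation path first: an incomplete tool result is never approved.
    if lookup.status != "completed":
        return DecisionLogEntry(
            request_id, "escalate", "tool-must-complete",
            f"Lookup returned {lookup.status}: {lookup.detail}", "procurement-queue",
        )
    if amount <= AUTO_APPROVE_LIMIT:
        return DecisionLogEntry(
            request_id, "approve", "under-auto-approve-limit",
            f"Amount {amount} is within the limit of {AUTO_APPROVE_LIMIT}",
        )
    return DecisionLogEntry(
        request_id, "escalate", "over-auto-approve-limit",
        f"Amount {amount} exceeds the limit of {AUTO_APPROVE_LIMIT}", "procurement-queue",
    )


ok = ToolResult("completed", "PO-1001", "supplier verified")
print(decide_purchase_order("PO-1001", 250.0, ok))
print(decide_purchase_order("PO-1002", 250.0, ToolResult("partial", None, "supplier lookup timed out")))
```

Each entry names the policy rule that fired, so a weekly review can see which rule produced which outcome and adjust the policy rather than the flow.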

Stop Measuring Hours Saved Start Measuring Decisions Delegated

If your AI project feels underwhelming even when it technically works, look at what you are measuring. Hours saved is a dashboard metric. Decisions delegated is an architecture metric.

Some questions I ask when I review an automation now:
- How many decisions used to need a human that no longer do.
- What is the policy the agent is enforcing, and who owns that policy.
- When the agent is wrong, how do we find out, and how fast do we update the policy.
- What decisions are we deliberately not delegating, and why.

None of these show up on a time-saved slide. All of them determine whether the automation holds up six months in.

The job is not building flows anymore. The job is writing the operating constraints for something that makes judgment calls. That is a different skill, and I think the teams that figure this out early will look very different from the ones still counting hours. In my own experience, the projects that aged well are the ones where someone owned the policy, not the flow.

Frequently Asked Questions

What is AI automation decision ownership and why does it matter?

AI automation decision ownership refers to who or what holds responsibility for making judgment calls inside a workflow. It matters because modern AI agents can now handle exceptions and reasoning tasks that previously required a human approver, fundamentally changing the role people play in automated processes.

How do I know if my automation is truly agentic or just a faster rule-based flow?

A good indicator is where decisions actually happen. If a human or a rigid rule set still handles every exception and the AI only executes predefined steps, the workflow is not truly agentic. An agentic setup puts the AI in the decision layer, with deterministic steps carrying out what it concludes.

Why does automating a bad process lead to worse outcomes?

Automation removes the friction that sometimes forces people to notice problems. When a flawed process runs faster and at higher volume, errors multiply at the same rate as the throughput, making the underlying issues harder to catch and more costly to fix.

When should I move a human out of the approval step in an automated workflow?

A human can move out of direct approvals when the decision follows a consistent pattern that can be expressed as clear policy constraints. The better use of human judgment at that point is writing and periodically reviewing those constraints, rather than approving individual cases one by one.

This post was inspired by Intelligent apps, human leadership, and the new shape of work via Microsoft Power Platform Blog.

April 21, 2026
Why Power Automate Is Still Worth Learning in 2026

I keep hearing a version of the same idea from people just getting into the Microsoft stack. Why spend time learning Power Automate when you can describe what you want to a Copilot Studio agent and have it figure out the execution? The assumption behind that question is that Power Automate is scaffolding you work around, not something you need to understand. That assumption is costing people real time. If you want to learn Power Automate in 2026, the argument for doing it has not weakened. It has gotten stronger.

The ‘Just Use an Agent’ Shortcut Has a Hidden Cost

Copilot Studio agents and AI-assisted flow building have made it faster to get something working. That part is real. The problem is that faster to working and closer to production are not the same thing.

I have seen this pattern repeatedly. Someone builds an agentic workflow, the demo runs cleanly, and then it breaks in production in a way they cannot explain. Not because the agent reasoning was wrong. Because the underlying flow had no retry logic, the connection was using a shared credential that hit a rate limit under load, or a step was silently failing and passing a null value forward. The agent kept going. The result was wrong. Nobody knew until downstream.

I wrote about this dynamic in the context of agentic workflows before. An agent that generates a confident-sounding response for a task that did not complete destroys user trust faster than almost any other failure mode. The agent is the reasoning layer. Power Automate is still the execution layer. If you do not understand the execution layer, you cannot debug what the reasoning layer is telling you went wrong.

What Power Automate Actually Does That Nothing Else Replaces

Power Automate is where things actually run inside a Microsoft-stack enterprise. SharePoint events, Teams messages, Dataverse writes, approval routing, HTTP callouts to external APIs, scheduled jobs, document generation triggers. The connectors, the authentication, the retry behaviour, the action limits. All of it lives here.

Copilot Studio can call a Power Automate flow. An AI Builder model can feed into one. An agent can trigger one. But the flow is what actually touches the system of record. When something goes wrong at 2am and you are looking at a run history, you need to know what you are reading. You need to know whether a 429 error is platform-level throttling or connector-level throttling, because the fix is different. I spent a full post on exactly that distinction and it is still one of the most common misdiagnoses I hear about. Applying the wrong fix because you conflated the two layers is a common reason throttling problems persist after troubleshooting.
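When you call an API from your own code, the general shape of handling a 429 looks roughly like the sketch below. It does not tell you which layer did the throttling, and it is not how Power Automate's built-in retry policy is implemented. It only shows the idea of honouring a `Retry-After` header when the server sends one (handling just the seconds form), backing off otherwise, and failing fast on errors that a retry will not fix. The `call` callable and its return shape are assumptions for illustration.

```python
import random
import time

# Status codes treated as transient here; anything else at 400 or above is terminal.
TRANSIENT_STATUS = {429, 502, 503, 504}


def call_with_retry(call, max_attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Invoke call() until it succeeds, a terminal error occurs, or attempts run out.

    call() is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = call()
        if status < 400:
            return body
        if status not in TRANSIENT_STATUS or attempt == max_attempts:
            raise RuntimeError(f"request failed with status {status} after {attempt} attempt(s)")
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # the server said how long to wait
        else:
            # Exponential backoff with a little jitter so callers do not retry in lockstep.
            delay = base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 0.25)
        sleep(delay)


# Usage with a stubbed call: throttled once, then successful.
responses = iter([(429, {"Retry-After": "2"}, None), (200, {}, "ok")])
waits = []
print(call_with_retry(lambda: next(responses), sleep=waits.append), waits)
```

Passing `sleep` in as a parameter keeps the sketch testable without real waiting.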

That kind of mechanical understanding does not come from describing what you want to an agent. It comes from building flows, watching them fail, and understanding why. The Microsoft Learn documentation for Power Automate covers the connector model and trigger types well, but the intuition for where things go wrong only comes from time in the tool.

What I Would Focus on Learning First

If I were starting today, I would ignore the visual polish and go straight to the things that bite you in production.

Trigger types first. Automated, scheduled, and instant triggers behave differently. Understanding which one is firing your flow, and why, is foundational. A flow that should run once per item can run multiple times if the trigger is misconfigured and the conditions are wrong.

Then connector authentication. Service principal vs. shared connection vs. connection reference. In enterprise environments, shared personal connections on flows that other people own are a support incident waiting to happen. Connection references and service principals exist for a reason. Learn them before you need them.

Then error handling. Specifically, set Configure run after on every branch that matters. A flow that has no error path is a flow that fails silently. Silent failures in automation are worse than loud ones. I have written about how automated processes have no equivalent to a human noticing something feels off and escalating. Automating a bad process just makes it fail faster, because the exception propagates silently instead of being caught. Configure run after is the closest thing Power Automate gives you to that instinct.
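One way to build intuition for run after is to model it. The sketch below is a simplified Python model, not the platform's engine. It assumes each action lists its predecessors and the statuses it is willing to run after, loosely mirroring the `runAfter` map in a flow definition. The action names are made up.

```python
def should_run(run_after: dict, statuses: dict) -> bool:
    """An action runs only if every predecessor ended in one of its allowed statuses."""
    return all(statuses.get(pred) in allowed for pred, allowed in run_after.items())


# Default behaviour: the next step only runs after success.
notify_success = {"Update_record": ["Succeeded"]}
# Explicit error path: runs when the predecessor failed or timed out.
handle_failure = {"Update_record": ["Failed", "TimedOut"]}

for outcome in ("Succeeded", "Failed", "TimedOut"):
    statuses = {"Update_record": outcome}
    print(
        outcome,
        "-> notify_success:", should_run(notify_success, statuses),
        "| handle_failure:", should_run(handle_failure, statuses),
    )
```

Without the `handle_failure` branch, a failed `Update_record` leaves nothing downstream that runs, which is the silent failure described above.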

Apply to each concurrency settings come next. The default behaviour is sequential. Changing it to parallel speeds things up and introduces race conditions if you are not careful. Knowing when to use which, and how to tune it, matters the moment you are processing more than a handful of records.
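The trade-off is easier to see in ordinary code. This is a Python analogy, not Power Automate itself: a sequential loop versus a pool of workers updating one shared total. The lock stands in for whatever you do to keep parallel iterations from trampling shared state, and the worker count is a hypothetical degree of parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


def process_sequential(amounts: list) -> int:
    """One record at a time: slower, but no shared-state surprises."""
    total = 0
    for amount in amounts:
        total += amount
    return total


def process_parallel(amounts: list, degree: int) -> int:
    """Several records at once; the shared total is guarded by a lock."""
    total = 0
    lock = Lock()

    def handle(amount: int) -> None:
        nonlocal total
        # Without the lock, two workers could read the same old total
        # and one update would overwrite the other.
        with lock:
            total += amount

    with ThreadPoolExecutor(max_workers=degree) as pool:
        list(pool.map(handle, amounts))  # drain the iterator so every task finishes
    return total


records = list(range(100))
print(process_sequential(records), process_parallel(records, degree=8))
```

A safer design is often to avoid the shared value entirely, for example by having each iteration return its own result and combining them afterwards.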

Where This Knowledge Starts Paying Off in More Advanced Work

Once you understand how flows are structured and where they fail, everything built on top of them becomes more readable. Copilot Studio action steps that call flows stop being black boxes. You can look at what a cloud flow is doing inside an agent and understand whether the tool response is reliable. You can design the tool contract properly instead of returning a status that gives the agent nothing to evaluate.

The people I talk to at other organisations who are building agentic workflows that actually hold up in production share one thing. They did not skip the foundation. They know what happens inside the flow the agent is calling. They know what a bad intermediate result looks like and they have built the error paths to catch it before it propagates.

The practitioners skipping straight to orchestration are building things that look impressive in demos and require constant firefighting after launch. That gap in reliability traces back to the same place every time. They do not know the execution layer well enough to control it.

Learning Power Automate in 2026 is not about catching up on something old. It is about having the mechanical understanding that makes everything else you build on the Microsoft stack actually reliable. That foundation is still the thing that separates flows that work from flows that work until they do not.

Frequently Asked Questions

Is it still worth it to learn Power Automate in 2026?

Yes, learning Power Automate in 2026 remains valuable because it is the execution layer for almost everything that runs inside a Microsoft-stack enterprise. AI tools and agents can help you build flows faster, but you still need to understand how flows work to debug failures, handle errors, and get workflows into production reliably.

What is the difference between Copilot Studio and Power Automate?

Copilot Studio handles the reasoning and decision-making layer, while Power Automate is the execution layer that actually interacts with systems like SharePoint, Dataverse, and external APIs. An agent can trigger a flow, but the flow is what performs the real work and touches the system of record.

Why does my Power Automate flow work in testing but fail in production?

Common causes include missing retry logic, shared credentials hitting rate limits under load, or steps that silently fail and pass null values forward. These issues are easy to miss during demos but become serious problems at scale, which is why understanding flow structure matters beyond just getting something running.

When should I learn Power Automate instead of relying on AI-generated flows?

You should invest in learning Power Automate when you are responsible for workflows that need to be reliable in production, not just functional in a demo. If you cannot read a run history or diagnose a throttling error, you will struggle to fix failures that AI tooling alone cannot explain.

April 19, 2026
