Category: Artificial Intelligence in Business

  • Anthropic Just Launched Claude Finance Agents and the Specialization Trend Is Real

    Anthropic shipped Claude Finance Agents this week: a set of Claude-powered agents built specifically for financial analysts, with native connectors to LSEG, Moody’s, S&P Global, and Morningstar. This is not a generic chat assistant pointed at a finance prompt. It is a packaged product with the data plumbing already done.

    I have been watching this trend build for months. This release makes it impossible to ignore.

    What Anthropic actually shipped

    Three agents wrapped around specific analyst workflows. One handles modeling. One handles due diligence. One handles comparable company analysis. Each one is a Claude agent with a defined scope, a system prompt tuned to that workflow, and direct connectors into the data providers analysts already pay for.

    The connector list is the part that matters. LSEG for market data. Moody’s for credit. S&P Global for company financials. Morningstar for funds. These are not scraped sources. They are licensed enterprise feeds. Anthropic did the integration work that every internal team building a finance copilot would otherwise have to do themselves.

    You can read the full announcement on Anthropic’s site. The pricing structure is enterprise. The target user is clear. This is not a consumer move.

    Why this release matters beyond finance

    For two years the pitch from foundation model vendors was: here is the model, build whatever you want. That is over.

    Now the pitch is: here is the model, here is the agent, here are the connectors, here is the workflow. The vendor is moving up the stack into the application layer. Anthropic is doing it for finance. Microsoft is doing it for general enterprise productivity. OpenAI is doing it for coding and research. The pattern is consistent. Anthropic launching an enterprise AI services arm was an early signal of exactly this direction.

    This changes the build-vs-buy math in a real way. If you are an enterprise team that was about to spend six months building a Claude-based comparable company analysis agent on top of a generic platform, you now have to ask whether your custom version will actually beat what Anthropic ships out of the box. Most of the time, in the specific domains where these vertical agents land, the answer will be no.

    That does not mean custom builds are dead. It means the line moves. Custom builds make sense where the vendor product does not exist or does not match your specific data and policies. Generic finance modeling? Probably not worth building. Your firm’s specific deal screening logic with your proprietary scoring model? Still custom.

    The other thing this release confirms is that tool design is product design. I have written before that agentic workflows live or die on the quality of their tool layer. Anthropic clearly figured this out. Wrapping LSEG and S&P data with proper structured outputs that Claude can reason about is the actual hard work. Anyone who has tried to build this on top of raw connectors knows.

    This specialization pattern also raises a real architectural question: when the vendor ships a domain-specific agent, does your orchestration layer treat it as a peer, a sub-agent, or a replacement? That is the same question I work through in my post on when to build a multi-agent system instead of a single agent.

    What I would do with it this week

    I do not work in finance, so I am not deploying this in production. But here is what I would do if I were on a finance team, and what I am doing in adjacent domains.

    First, audit any internal agent project that overlaps with what Anthropic just shipped. If a team has been building a comparable company analysis tool for four months and Anthropic just released one, that conversation has to happen now, not in Q3.

    Second, look at the connector list and ask which of those data sources your team already licenses. The value of Claude Finance Agents drops fast if you do not have LSEG or S&P feeds. Vendor lock-in through data integration is the real moat here.

    Third, think about what the equivalent looks like in your domain. If Anthropic shipped finance agents in May 2026, what does an HR agent product look like? A legal one? A procurement one? Someone is building each of these. Probably more than one someone. In my experience, the teams that win the build-vs-buy decision are the ones that ask the question early, not the ones that finish their custom build and then discover the vendor product. The same specialization logic is visible in Microsoft Discovery as the first real glimpse of domain-specific agent platforms.

    For Power Platform builders, this is also a useful signal. Copilot Studio is Microsoft’s answer to the same trend, and the business skills work in Dataverse is the integration layer equivalent. The shape of the market is clear.

    The era of generic agent platforms competing on model quality alone is closing. The next round is about who owns the workflow.

    This post was inspired by Finance Agents via Anthropic.

  • Microsoft Just Shipped Business Skills in Dataverse and This Is How You Teach Agents Your Org

    Microsoft announced business skills in Dataverse on May 1, and this is the announcement I have been waiting for. Dataverse business skills for agents let you encode org processes, policies, and the tribal knowledge that lives in people’s heads as natural-language instructions. Agents discover them and follow them at runtime. No more cramming everything into a 4000-token system prompt and hoping the model remembers how your finance team handles approvals.

    I have been reading the docs since Friday. Here is my honest take.

    What business skills actually do

    A business skill is a Dataverse record. It contains a natural-language description of when the skill applies, what the agent should do, and what data or actions it can use. Agents query Dataverse at runtime, find the skills that match the user’s intent, and follow the instructions inside.

    The shape matters. You are not writing code. You are writing the kind of paragraph you would send to a new hire on day one. Things like: “When someone asks about expense approvals over 5000 EUR, route to the regional finance lead, never to the team manager. The lookup table is in the Finance Approvers table. Always confirm the amount and the cost center before submitting.”

    That description is stored, versioned, and indexed. Multiple agents can use the same skill. You update the skill once and every agent that discovers it picks up the new behavior. There is also a permission layer, so a skill can be scoped to a security role, a team, or an environment.

    Underneath, this is grounding. The agent does not memorize your org. It retrieves the relevant skill at runtime and follows it.
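
    To make the retrieval idea concrete, here is a minimal sketch in Python. It is not the Dataverse API. The `BusinessSkill` fields, the role check, and the word-overlap scoring are all assumptions I made up for illustration, and the real matching is presumably much richer than counting shared words.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessSkill:
    # Hypothetical shape; the real Dataverse record schema may differ.
    name: str
    applies_when: str   # natural-language description of when the skill applies
    instructions: str   # what the agent should do
    allowed_roles: set[str] = field(default_factory=set)  # empty = visible to all


def _words(text: str) -> set[str]:
    cleaned = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in cleaned if len(w) > 3}


def find_skills(query: str, role: str, skills: list[BusinessSkill],
                top_k: int = 2) -> list[BusinessSkill]:
    """Return the skills visible to `role` whose description best overlaps the query."""
    visible = [s for s in skills if not s.allowed_roles or role in s.allowed_roles]
    scored = [(len(_words(query) & _words(s.applies_when)), s) for s in visible]
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [s for _, s in ranked[:top_k]]


SKILLS = [
    BusinessSkill(
        name="expense-approval-routing",
        applies_when="Someone asks about expense approvals over 5000 EUR",
        instructions="Route to the regional finance lead, never to the team manager. "
                     "Confirm the amount and the cost center before submitting.",
        allowed_roles={"finance", "employee"},
    ),
    BusinessSkill(
        name="onboarding-checklist",
        applies_when="A manager asks about onboarding a new hire",
        instructions="Walk through the onboarding checklist step by step.",
        allowed_roles={"manager"},
    ),
]

# Retrieve at runtime, then ground the prompt in only the matching skills.
matches = find_skills("Who handles expense approvals above 5000 EUR?", "employee", SKILLS)
prompt_context = "\n".join(s.instructions for s in matches)
```

    The point of the sketch is the shape: the prompt only ever contains the skills that matched this request and that this caller is allowed to see, not the whole rulebook.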

    Why this changes how you build internal agents

    The hardest part of deploying internal agents has never been the model. It has been getting the agent to behave like someone who actually works at your company. The model can reason. It cannot know that your procurement policy changed in March, or that the Madrid office handles APAC tickets on Wednesdays because of a coverage gap.

    Until now, that context lived in three bad places. System prompts that grew until they hit the context window and started degrading. Power Automate flows with hardcoded business logic that nobody could find six months later. Or worse, it lived nowhere and the agent guessed.

    I have written before that a focused 400-token instruction set produces more reliable behavior than a 4000-token one. Business skills make that practical. You stop stuffing the prompt and start composing skills. The agent picks the right ones for the job.

    The other thing this fixes is ownership. A business skill in Dataverse has an owner, an audit trail, and a lifecycle. When the policy changes, the policy owner updates the skill. They do not need to find the agent maker, file a ticket, or wait for a release. That is a real architectural shift, not a feature flag. If your org is still working out who owns what when policies change, my post on Power Platform governance that does not kill adoption covers how to structure that before it becomes a cleanup problem.

    The risk I am watching: skill sprawl. If every team writes their own skills with overlapping scopes, the agent will face the same routing problem multi-agent setups face. Skill descriptions will start competing with each other and you will get silent misrouting. Governance has to come early, not as a cleanup project in month six.

    What I would do with it this week

    Pick one painful, well-bounded process. The kind where the answer is always “it depends on who you ask.” Approval routing is a good candidate. Onboarding checklists work too.

    Write three to five business skills that capture the rules. Keep each one short and specific. Connect them to a Copilot Studio agent that already has the right Dataverse and Power Automate connectors. Test with the messy questions, not the clean ones. Watch which skills get picked and which do not.

    The thing to measure is not whether the agent answers correctly. It is whether the right skill was selected for the right phrasing. If selection is unreliable, your skill descriptions are too similar or too vague. Rewrite them and try again. If you are also wiring up custom connectors to extend what the agent can reach, How to Build a Custom Connector for Copilot Studio Step by Step is worth keeping open in a tab.
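
    A rough way to track that is a small selection report. The sketch below is a Python illustration, not anything the platform ships: `toy_selector` is a hypothetical stand-in for reading the selected skill out of a real agent run, and the test phrasings are invented.

```python
from collections import defaultdict


def selection_report(test_cases, select_skill):
    """test_cases: list of (phrasing, expected_skill).

    Returns the per-skill hit rate and the list of misrouted phrasings.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    misses = []
    for phrasing, expected in test_cases:
        chosen = select_skill(phrasing)
        totals[expected] += 1
        if chosen == expected:
            hits[expected] += 1
        else:
            misses.append((phrasing, expected, chosen))
    rates = {skill: hits[skill] / totals[skill] for skill in totals}
    return rates, misses


# Stand-in selector for the demo. In a real pilot this would come from
# the agent's run trace, not from keyword checks.
def toy_selector(phrasing):
    text = phrasing.lower()
    if "approv" in text:
        return "expense-approval-routing"
    if "onboard" in text or "new hire" in text:
        return "onboarding-checklist"
    return None


CASES = [
    ("Who approves a 7000 EUR expense?", "expense-approval-routing"),
    ("my manager is out, can someone else sign off on this invoice", "expense-approval-routing"),
    ("What do I need for a new hire starting Monday?", "onboarding-checklist"),
    ("onboarding steps for contractors", "onboarding-checklist"),
]

rates, misses = selection_report(CASES, toy_selector)
for phrasing, expected, chosen in misses:
    print(f"MISS: {phrasing!r} expected {expected}, got {chosen}")
```

    The messy second phrasing is the one that misses here, which is the kind of result that tells you a description needs rewriting.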

    I will be writing more about this once I have run a real internal pilot. Early signals are good. In my experience, the patterns that survive contact with production are the ones where context is stored where it belongs, not where it was convenient at build time.

    This one belongs in Dataverse. Finally.

    This post was inspired by Introducing business skills: Teach agents how your organization works via Microsoft Power Platform Blog.

  • When should I build a multi-agent system instead of a single agent?

    Short answer: stay single-agent until you hit one of three specific failure modes. Tool overload past roughly 8 to 10 connectors. Conflicting system prompts that cannot be reconciled in one instruction set. Or genuinely parallel workstreams that need to run at the same time. If none of those apply, the single agent vs multi-agent question is already answered. Build one agent.

    Most posts on this topic make multi-agent sound like the natural next step. It is not. It is a tax you pay when a single agent can no longer do the job, not an upgrade you take because it sounds more sophisticated.

    The longer answer

    I read a good piece on Towards Data Science walking through ReAct workflows and when scaling to multi-agent makes sense. The framing matched what I keep running into when I talk to people building on Copilot Studio and similar platforms.

    A single agent is a loop. It reads, picks a tool, calls the tool, reads the result, picks again, until it decides it is done. That loop works well when the tool list is small enough for the model to reason over cleanly and the instructions do not pull the model in two directions.
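
    Stripped of the model, that loop fits in a dozen lines. This is a sketch in Python, not how Copilot Studio or any specific framework implements it. `choose_action` stands in for the model call, and `scripted_policy` and `lookup_policy` are hypothetical toys so the example runs on its own.

```python
def run_agent(task, choose_action, tools, max_steps=8):
    """Single-agent loop: read, pick a tool, call it, read the result, repeat."""
    history = [("task", task)]
    for _ in range(max_steps):
        action, argument = choose_action(history, list(tools))
        if action == "finish":
            return argument, history
        result = tools[action](argument)
        history.append((action, result))
    raise RuntimeError("agent did not finish within max_steps")


# Toy tool. A real one would call a connector.
def lookup_policy(topic):
    return f"policy for {topic}: approvals over 5000 EUR go to the regional finance lead"


# Toy decision function. A real agent would ask the model to choose.
def scripted_policy(history, tool_names):
    last_kind, last_value = history[-1]
    if last_kind == "task":
        return "lookup_policy", "expenses"
    return "finish", last_value


answer, trace = run_agent(
    "Who approves a 7000 EUR expense?",
    scripted_policy,
    {"lookup_policy": lookup_policy},
)
```

    Everything that goes wrong later happens inside `choose_action`: the more tools in that list and the more the instructions conflict, the harder that one decision gets.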

    It starts breaking in predictable places.

    The first is tool overload. I have written before about how a model that hits 95 percent accuracy on tool selection with two connectors can drop to 70 percent with five, because tool descriptions start competing with each other. By the time you have ten or twelve tools, the agent picks the wrong one regularly and you cannot fix it with prompt tweaks.

    The second is prompt conflict. If your agent needs to behave like a strict policy checker for one task and a friendly explainer for another, those two personas fight inside one system prompt. You can feel it in the outputs. The model compromises in the wrong direction.

    The third is parallelism. A single agent loop is sequential by design. If you have three independent workstreams that must run at the same time, no amount of prompt engineering will make a single ReAct loop parallel. This is also the point where it is worth asking whether AI automation is the right fit at all, or whether a simpler RPA approach would do.

    Everything else (latency, observability, prompt size) can usually be solved without splitting agents.

    How to decide in practice

    I use a short checklist when someone asks about single agent vs multi-agent for a Copilot Studio build.

    Count the tools. If you are under eight connectors and the descriptions do not overlap, one agent is fine. Past ten with overlap, start thinking about splitting.

    Read the system prompt out loud. If it contains contradictory instructions for different scenarios, that is a real signal. Splitting reduces prompt size per agent, and a focused 400-token instruction set produces more reliable behavior than a 4000-token one. I covered this in more detail in my post on multi-agent orchestration patterns in Copilot Studio.

    Map the workstreams. Are they actually independent, or are they sequential steps you are calling parallel because it sounds nicer? Most automation work is sequential. Real parallelism is rarer than people think.

    Budget the latency. Every hop between agents adds round-trip overhead. If you split a single agent into three, you have just added two more model calls and two more HTTP boundaries to every request. I have written about how accumulated round-trip overhead kills perceived performance long before any single call gets slow.
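
    The checklist is simple enough to write down as code. The sketch below is in Python and uses the thresholds from this post. The field names and the per-hop latency figure are my own assumptions, so treat the numbers as placeholders for your own measurements.

```python
from dataclasses import dataclass


@dataclass
class AgentDesign:
    tool_count: int
    descriptions_overlap: bool
    prompt_has_contradictions: bool
    independent_parallel_workstreams: int


def reasons_to_split(design):
    """Return the failure modes that apply. An empty list means: build one agent."""
    reasons = []
    if design.tool_count > 10 and design.descriptions_overlap:
        reasons.append("tool overload")
    if design.prompt_has_contradictions:
        reasons.append("prompt conflict")
    if design.independent_parallel_workstreams > 1:
        reasons.append("parallel workstreams")
    return reasons


def added_latency_ms(agent_count, per_hop_ms):
    """Extra round-trip time when one agent becomes `agent_count` agents in sequence."""
    return max(agent_count - 1, 0) * per_hop_ms


small = AgentDesign(tool_count=6, descriptions_overlap=False,
                    prompt_has_contradictions=False,
                    independent_parallel_workstreams=1)
crowded = AgentDesign(tool_count=12, descriptions_overlap=True,
                      prompt_has_contradictions=True,
                      independent_parallel_workstreams=1)

print(reasons_to_split(small))     # no reasons: stay single-agent
print(reasons_to_split(crowded))   # tool overload and prompt conflict
print(added_latency_ms(3, 800))    # 800 ms per hop is an assumed figure
```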

    If the checklist points to multi-agent, default to a supervisor pattern. One parent agent routes to focused child agents. Skip the peer network where agents call each other freely. It looks elegant in diagrams and is painful to debug in production.
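
    Here is what the supervisor pattern looks like as a minimal Python sketch. It is not Copilot Studio's orchestration: `keyword_route` is a hypothetical stand-in for the model-driven routing decision, and the two child agents are toy lambdas.

```python
class Supervisor:
    """Parent agent: the only component allowed to pick and call a child."""

    def __init__(self, route):
        self._route = route      # callable(request, {name: description}) -> name or None
        self._children = {}

    def register(self, name, description, handler):
        self._children[name] = (description, handler)

    def handle(self, request):
        descriptions = {name: desc for name, (desc, _) in self._children.items()}
        chosen = self._route(request, descriptions)
        if chosen not in self._children:
            return "supervisor", "No child agent matched this request."
        _, handler = self._children[chosen]
        return chosen, handler(request)


# Toy router: matches longer words from each description against the request.
def keyword_route(request, descriptions):
    text = request.lower()
    for name, description in descriptions.items():
        keywords = [w.strip(".,") for w in description.lower().split() if len(w) > 5]
        if any(k in text for k in keywords):
            return name
    return None


sup = Supervisor(keyword_route)
sup.register("policy_checker", "Checks expense claims against policy.",
             lambda r: f"[strict] checked: {r}")
sup.register("explainer", "Explains benefits and onboarding in plain language.",
             lambda r: f"[friendly] explained: {r}")

print(sup.handle("Is this expense within policy?"))
print(sup.handle("Can you explain my benefits?"))
```

    The children never hold a reference to each other, so every hop goes through one place you can log and debug. That is the property the peer network gives up.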

    Microsoft has shipped real multi-agent orchestration in Copilot Studio, so the platform support is there. The question is whether your problem actually needs it.

    Related gotchas

    Routing in Copilot Studio multi-agent setups depends on the description you write for each connected agent, not on trigger phrases. A vague description causes silent misrouting that is harder to debug than a broken trigger. Write descriptions like API contracts, not marketing copy.

    When a parent agent picks the wrong child confidently, you get the same failure mode as a single overloaded agent, just one layer deeper. Splitting agents does not eliminate misrouting. It moves it.

    Token costs multiply faster than you expect. Each agent in the chain re-processes context. Three agents in a sequence is not three times the cost. It is often closer to five or six times once you count the context each one needs to reason properly.
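
    The arithmetic is easy to check. This is a back-of-the-envelope Python sketch under one simplifying assumption: each agent in the chain re-reads the shared context plus everything the upstream agents handed over. The token counts are illustrative, not measurements.

```python
def chain_input_tokens(base_context, handoff_tokens, agents):
    """Total input tokens when agent k reads the base context plus k upstream handoffs."""
    return sum(base_context + k * handoff_tokens for k in range(agents))


single = chain_input_tokens(base_context=2000, handoff_tokens=0, agents=1)
chain = chain_input_tokens(base_context=2000, handoff_tokens=1500, agents=3)
ratio = chain / single

# 2000 for the first agent, 3500 for the second, 5000 for the third.
print(single, chain, ratio)
```

    With those assumed numbers the three-agent chain reads 10,500 input tokens against 2,000 for the single agent, a little over five times the cost before you count any output tokens.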

    If you are still on the fence, build single first. You can always split later. Going from multi-agent back to single, on the other hand, almost never happens once the architecture is in place. That is the trade-off worth keeping in mind.

    Frequently Asked Questions

    When should I use a single agent vs multi-agent system?

    Stick with a single agent until you run into one of three problems: too many tools causing the model to pick the wrong one, conflicting instructions that cannot coexist in one system prompt, or parallel workstreams that need to run at the same time. If none of those apply, a single agent is the right choice.

    How do I know if my agent has too many tools?

    A good rule of thumb is to keep your tool count under eight to ten connectors. Beyond that, tool descriptions start competing with each other and the model’s ability to select the right one drops noticeably, and prompt tweaks alone rarely fix it.

    Why does a single agent struggle with conflicting instructions?

    When a system prompt asks the model to take on two opposing behaviours, such as acting as a strict policy checker and a friendly explainer, those personas create tension within a single instruction set. The model tends to compromise in ways that produce unreliable outputs rather than handling each mode correctly.

    What are the main reasons to build a multi-agent system?

    Multi-agent systems are worth the added complexity when you need genuinely parallel workstreams, have irreconcilable prompt conflicts, or are dealing with tool overload that cannot be resolved through better descriptions. Outside of those scenarios, the extra coordination overhead rarely pays off.

    This post was inspired by Single Agent vs Multi-Agent: When to Build a Multi-Agent System via Towards Data Science.

  • Anthropic Is Launching an Enterprise AI Services Arm and That Changes the Vendor Conversation

    Anthropic announced it is standing up a dedicated enterprise AI services company to help large organizations deploy Claude in production. This is the kind of move that does not look loud on a Monday but reshapes how conversations about Anthropic's enterprise AI services go inside large orgs for the next two years.

    Until now, if you wanted hands-on Anthropic help inside your organisation, you went through a partner, hired a boutique, or figured it out yourself. That is changing.

    What Anthropic actually announced

    Anthropic is launching a services arm focused on enterprise deployment. Not just API access. Not just Claude for individuals. A real services org built to sit next to enterprise teams and help them stand up Claude inside production environments.

    That means architecture work, integration help, deployment patterns, and the kind of hands-on engagement that used to belong exclusively to Microsoft Consulting Services, Accenture, Deloitte, and the big SI bench. Anthropic is now in that conversation directly.

    It is worth being precise here. This is not Anthropic becoming a generic consultancy. The framing is narrower: help enterprises actually deploy Claude in production for high-value use cases. That is a much sharper offer than “AI transformation,” and it is the kind of focus that tends to ship working systems instead of slide decks.

    Why this matters for enterprise AI buyers

    For most of 2024 and 2025, if you were building AI automation inside a large enterprise, your reference architecture defaulted to the Microsoft stack. Azure OpenAI, Copilot Studio, Power Platform, the whole vertical. That is not because Microsoft is always the best fit. It is because Microsoft has the procurement story, the EA discount, the field engineers, and the reference architectures already on the shelf.

    Anthropic just moved to close that gap.

    A credible second source for enterprise AI deployment work changes three things in real procurement conversations. First, you can now run a genuine bake-off where the non-Microsoft option has hands-on support, not just a model endpoint. Second, reference architectures stop being Microsoft-shaped by default. Third, the negotiation leverage shifts. When the only credible vendor is also your cloud provider, your CRM, your collaboration suite, and your AI copilot, you are not really negotiating.

    I have written before about why the Bedrock vs direct Anthropic API question is a governance decision, not a model decision. This announcement is the next step on that same line. Anthropic is acknowledging that getting Claude into a regulated enterprise is not a model problem. It is a deployment problem. They are now staffing for that reality.

    The honest part: services orgs at model companies are hard. The talent market for AI engineers who can also navigate enterprise IT is brutal. Anthropic will get pulled into pre-sales work, RFP responses, and stakeholder meetings that consume engineering capacity. Whether they keep the focus tight or drift into generic consultancy is the open question.

    What I would do with this news this week

    If you are anywhere near AI procurement or architecture decisions, three concrete things.

    First, put Anthropic on your shortlist for the next AI deployment review. Not as a model. As a deployment partner. The conversation is now legitimately different from “call the AWS rep about Bedrock.”

    Second, revisit your reference architectures. If yours quietly assumes the Microsoft stack from end to end, write down why. Some of those reasons will hold up. Some will turn out to be “because that is what the last project did.” Those are the ones to challenge.

    Third, if you are a Power Platform shop, this does not mean ripping anything out. Copilot Studio, Power Automate, and the Microsoft surface area are still where citizen development happens. But the heavy orchestration brain behind your agents does not have to be Azure OpenAI by default. I have been thinking about this for months, and as I covered in Claude vs ChatGPT Is the Wrong Question When You Are Building Automations, the model choice sitting behind your flows is less important than the deployment and governance story around it. It is now a real option with real support behind it.

    The interesting enterprise AI conversations for the rest of this year are not going to be about which model wins a benchmark. They are going to be about who shows up when you need to ship. From what I see in the community, that is exactly the conversation Anthropic just inserted itself into.

    And if the services arm delivers on its framing, it will also change how multi-agent orchestration patterns get designed in the first place, because you will finally have an Anthropic-native team in the room when those architecture decisions get made.

    The vendor field just got more interesting.

    This post was inspired by Enterprise Ai Services Company via Anthropic.

  • Microsoft Open Sourced the Azure Integrated HSM Design and That is a Bigger Deal Than It Sounds

    Microsoft open-sourced the design of the Azure Integrated HSM. The release is not a marketing move dressed up as transparency. This is the hardware security module that sits in Azure silicon and anchors key protection for workloads running on top, and its design is now public for anyone to read, audit, and pick apart.

    I have been reading through the announcement and the surrounding material for a couple of days. My honest first take: this matters more than the headline suggests, especially if you are building anything agentic that touches sensitive data.

    What it actually does

    Azure Integrated HSM is a hardware security module Microsoft designed in-house and integrates into Azure servers. The job of an HSM is narrow and important. It generates, stores, and uses cryptographic keys inside tamper-resistant hardware so that the keys never leave the chip in plaintext. Encryption, signing, and key wrapping happen inside the module. The application above it gets the result, not the key.
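
    If the boundary idea is new to you, a toy version in Python shows the contract. To be loud about the assumptions: this is a software stand-in, not the Azure Integrated HSM interface. `ToyKeyModule` is a made-up name, HMAC from the standard library replaces whatever algorithms the real module uses, and a Python object cannot enforce the isolation that tamper-resistant hardware does.

```python
import hashlib
import hmac
import secrets


class ToyKeyModule:
    """Stand-in for the HSM boundary: the key is created inside and never returned."""

    def __init__(self):
        # Generated inside the module. No method hands it back to the caller.
        self.__key = secrets.token_bytes(32)

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self.__key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        return hmac.compare_digest(self.sign(message), tag)


module = ToyKeyModule()
tag = module.sign(b"tool-call: transfer 100 EUR")

# The application only ever sees results, never key material.
ok = module.verify(b"tool-call: transfer 100 EUR", tag)
tampered = module.verify(b"tool-call: transfer 900 EUR", tag)
```

    The application code above can prove a message was signed and detect that it was changed, and at no point does it handle the key. That is the whole job, done in hardware instead of a class.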

    What shipped this week is the design itself. Schematics, firmware interfaces, the cryptographic boundary, the attestation flow. Open-sourced for review. Not the silicon, the design.

    This sits underneath services people actually use. Azure Key Vault Managed HSM, confidential computing workloads, the key material protecting storage and databases, and increasingly the trust roots for AI inference where prompts and outputs cannot be exposed to the host. If you have ever clicked “customer-managed key” on an Azure resource, something like this was already in the path. The shift is that you can now read how it works.

    Why it matters

    Cloud trust has been a faith-based exercise for a long time. You read the compliance certifications, you trust the vendor, you move on. That worked when the workloads were a SQL database and a web app. It works less well when the workload is an agent making autonomous decisions over sensitive data, calling tools, and producing outputs that have to be cryptographically attributable.

    Open-sourcing the HSM design changes the trust model from “Microsoft says it is secure” to “here is the design, run it past your own cryptographers.” That is a real shift. Apple did something similar with Private Cloud Compute in 2024, publishing the design and inviting external researchers in. The pattern is becoming the bar for any infrastructure provider that wants to host AI workloads with sensitive data.

    The other reason it matters: agentic workloads will multiply the number of cryptographic operations per user request by an order of magnitude. Every tool call that needs a signed token, every cross-service hop that needs an attestation, every model output that needs to be tied back to a verified context. The HSM is no longer a sleepy compliance box. It is in the hot path.

    I have written before about latency in agentic workflows. Cryptographic operations are part of that budget. Knowing how the hardware actually works, and being able to reason about what it costs per call, stops being academic.

    What I would do with it this week

    I am not going to pretend I will sit down and audit silicon firmware this week. I will not. But there are concrete things worth doing if you build on Azure and you care about where your keys live.

    First, read the design document end to end. Even at a surface level, understanding the attestation flow, the key hierarchy, and the boundary between firmware and host gives you a much better mental model when you are reasoning about Key Vault, Managed HSM, and confidential computing. The Managed HSM docs become much more useful once you can picture what is underneath.

    Second, look at where in your current architecture you are accepting hardware-rooted trust on faith. If you are building Power Platform solutions that pull from sensitive data sources, the keys protecting that data sit in this stack. Decisions about who owns and governs that data access matter too, something I covered in Power Platform Governance That Does Not Kill Adoption. If you are building Copilot Studio agents that call into systems holding regulated content, your trust chain runs through here. Knowing the chain is the first step to defending it in a design review.

    Third, watch how the community responds. Open-sourcing a design only matters if people actually look. The interesting signal over the next few months will be what independent researchers find, what they push back on, and how Microsoft responds. That conversation is more informative than any vendor whitepaper.

    For a deeper dive into the rationale, the Azure blog post is the place to start. My own running notes on infrastructure shifts like this end up on my LinkedIn as I work through them.

    Inspectable infrastructure is becoming the floor for serious AI workloads, and this release nudges that floor higher. The broader question of who owns the decision when agents act autonomously over that infrastructure is the next thing worth thinking through.

    This post was inspired by Enforcing trust and transparency: Open-sourcing the Azure Integrated HSM via Azure Blog.

  • Inside a Power Platform Center of Excellence: Why Most Setups Stall in Month Three

    Most people think a Power Platform Center of Excellence setup works like installing a product. You import the CoE Starter Kit solution, run the setup wizard, point it at your tenant, and the dashboards fill up. Job done.

    That is the surface behaviour. The actual mechanism underneath is a chain of dependencies, sync jobs, and admin connector calls that quietly degrade if any one link breaks. I keep seeing teams hit this on LinkedIn and in conversations with people at other organisations. The kit looks healthy for six weeks, then the inventory stops matching reality and nobody knows why.

    Let me walk through what is actually happening underneath.

    What you see on the surface

    You install the CoE Starter Kit, the wizard provisions a Dataverse environment, and a set of cloud flows starts populating tables like Environments, Apps, Flows, and Makers. The Power BI dashboard lights up. You see a maker count, an app count, an orphaned resource list.

    From the outside, it looks like the kit is scanning your tenant. It is not scanning anything in real time. Every number you see is the result of scheduled flows that ran sometime in the last 24 hours, hit admin connectors, paginated through results, and wrote rows into Dataverse. The dashboard is just a read on those tables.

    This matters because the moment those flows stop succeeding, your dashboard stops being true. And it does not tell you it stopped being true.

    The underlying mechanism

    The CoE kit runs on a stack of sync flows. The most important ones are Admin Sync Template v3 (environments), Admin Sync Template v4 (apps and flows), and the maker activity flows. Each one authenticates as the service account you set up during install and calls the Power Platform for Admins, Power Apps for Admins, and Power Automate Management connectors.

    Three things have to be true for those flows to keep working. The service account needs an active Power Platform Administrator or Global Administrator role. The account needs a per-user Power Automate licence with the right premium entitlements, because the admin connectors are premium. And the account needs to not be hitting throttling limits while paginating through a tenant with thousands of resources.

    The CoE sync flows are exactly the kind of workload that hits both platform-level and connector-level throttling, because they loop through every environment and every app in the tenant in one run. Getting your Power Automate error handling patterns right matters here: transient throttling errors need to be caught and retried differently from terminal failures, or the sync silently drops data.
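
    The logic is the same whatever tool you build it in. Here it is as a Python sketch rather than a flow definition, because it is easier to read that way. The status code split, the attempt count, and the delays are my assumptions, and `request` is a hypothetical callable standing in for a connector call.

```python
import time

# Assumed split: throttling and gateway errors are worth retrying.
TRANSIENT_STATUS = {429, 502, 503, 504}


class TerminalError(Exception):
    """Raised when retrying cannot help, or when retries are exhausted."""


def call_with_retry(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """`request` is any callable returning (status_code, body)."""
    for attempt in range(1, max_attempts + 1):
        status, body = request()
        if 200 <= status < 300:
            return body
        if status not in TRANSIENT_STATUS:
            # For example a 403 from a missing licence: fail loudly, do not retry.
            raise TerminalError(f"status {status}: not retrying")
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise TerminalError(f"still failing after {max_attempts} attempts")


# Demo with canned responses and a recorded sleep, so nothing actually waits.
responses = iter([(429, None), (429, None), (200, ["env-1", "env-2"])])
delays = []
environments = call_with_retry(lambda: next(responses), sleep=delays.append)
```

    The important branch is the one that refuses to retry. A 403 that gets retried and then swallowed is how a table ends up half populated with nobody alerted.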

    Where it breaks

    The most common failure mode is not the install. It is month three.

    The service account password expires, or MFA gets enforced tenant-wide, or someone removes the admin role because of a security review. The flows start failing silently. Default retry logic masks it for a week or two. Then the runs hit timeout and stop entirely. The dashboard freezes on stale data, but the numbers still look plausible, so nobody notices.

    The second failure mode is scale. The kit was designed for small to medium tenants. If you have 40,000 apps and 80,000 flows across hundreds of environments, the sync flows do not finish inside the 30-day Dataverse retention window for run history. You lose visibility into your own automation.

    The third one is the licensing trap. Teams install the kit on a trial, then move to production without giving the service account a proper premium licence. The flows technically run, but premium connectors throw 403s on specific calls, and only some tables populate. Half the dashboard works. The other half lies.

    What this means for how you build it

    Treat the CoE as a product you operate, not a kit you install. That changes a few decisions.

    Use a dedicated service principal with certificate auth where the connectors support it, instead of a user account with a password. A service principal has no user password to expire, is not subject to user MFA policies, and does not get caught in a leaver process, although its certificate still has an expiry date that someone needs to track. Where you must use a user account, document it, monitor it, and put the password rotation in a runbook owned by a real team.

    Build a health check flow that runs daily and alerts when the last successful sync timestamp on each core table is older than 48 hours. Do not trust the dashboard to tell you the dashboard is broken.
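
    The staleness check itself is small. Here is a minimal sketch in Python, assuming you can already read a last-successful-sync timestamp per core table. The table names and the shape of the `last_sync` mapping are hypothetical, not something the kit exposes under those names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(hours=48)


def stale_tables(
    last_sync: dict[str, Optional[datetime]],
    now: Optional[datetime] = None,
) -> list[str]:
    """Return table names whose last successful sync is missing or older than 48 hours."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, synced_at in last_sync.items()
        # A table that has never synced is treated as stale, not as healthy.
        if synced_at is None or now - synced_at > STALE_AFTER
    )
```

    Anything this returns should go to a channel a human reads, not back into the dashboard it is checking.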

    For larger tenants, split the sync flows by environment group instead of running them tenant-wide. The kit supports filtering, and partial visibility refreshed daily beats full visibility refreshed never.

    Decide what governance question the CoE is actually answering for you before you build dashboards on top of it. Inventory is not governance. A list of 12,000 apps with no owner attached is just a longer problem. The broader challenge of Power Platform governance that does not kill adoption is worth thinking through before you design your DLP and ownership policies around what the CoE surfaces, because the data is only useful if makers trust the system enough to stay inside it.

    The CoE Starter Kit is genuinely good engineering. It just is not magic. If you are starting to build out more automation on top of your tenant inventory, the question of why Power Automate is still worth learning in 2026 is a good framing for where to focus the team’s time once the CoE is stable. If you want to compare notes on how other teams are running theirs, I am always up for that conversation.

    Frequently Asked Questions

    Why does my Power Platform center of excellence setup stop working after a few weeks?

    The CoE Starter Kit relies on scheduled sync flows that call admin connectors on a recurring basis. If the service account loses its licence, hits throttling limits, or has a permission issue, those flows fail silently and your dashboards show stale data without any obvious warning.

    What licences and permissions does the CoE Starter Kit service account need?

    The service account requires either a Power Platform Administrator or Global Administrator role, plus a per-user Power Automate licence that covers premium connectors. Without the premium entitlement, the admin connector calls used by the sync flows fail, often only on specific calls, so some tables populate and others do not.

    How do I know if my CoE sync flows have stopped running correctly?

    The dashboards will not alert you automatically when sync flows fail, so you need to monitor flow run history directly. Comparing your app and environment counts against known tenant activity over time is a practical way to spot when the inventory has drifted from reality.

    Why does the CoE Starter Kit struggle with throttling on large tenants?

    The sync flows paginate through every environment and every app in a single run, which generates a high volume of connector calls in a short period. This makes them prone to both platform-level and connector-level throttling, so transient errors need to be handled with retries rather than treated as permanent failures.

  • RPA vs AI Automation for Enterprise Workflows

    The decision I keep watching teams get wrong: should this workflow be built with RPA or with an AI agent? The RPA vs AI automation debate gets framed as old tech versus new tech, which is the wrong frame entirely. They solve different problems. Picking the wrong one is how you end up with a fragile bot that needs babysitting or an agent that hallucinates its way through invoice approvals.

    I have built both inside a large org. Here is how I actually decide.

    Determinism and predictability

    RPA assumes the screen, the field, and the click path are the same every time. If the SAP transaction code is VA01 today and VA01 tomorrow, RPA wins. It will execute that path 10,000 times with zero variance.

    AI automation assumes variance is the input. The email phrasing changes, the PDF layout changes, the customer asks the same thing five different ways. An agent reasons over that variance. It is non-deterministic by design, which is a feature for unstructured input and a liability for structured execution.

    Rule of thumb I use: if I can write the decision tree on a whiteboard in 15 minutes, it is RPA work. If the decision tree has more than 30 branches and half of them are “it depends on the wording,” it is agent work.

    Cost per execution

    Dimension | RPA (Power Automate Desktop) | AI Agent (Copilot Studio)
    Per-run cost | Near zero after license | Roughly 1 message credit per turn, often 5 to 15 turns per task
    License model | Per-bot or per-user attended/unattended | Message packs, 25,000 messages per pack
    Scaling cost | Linear with bot count | Linear with conversation volume and tool calls
    Failure cost | Bot stops, you fix it | Agent confidently completes the wrong task

    RPA at 100,000 runs a month is basically free compute after the license. An agent at 100,000 runs is not. I have seen teams underestimate this by an order of magnitude because they tested with 50 runs and extrapolated linearly without counting tool calls and orchestration turns.
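
    The arithmetic is worth writing down before the pilot. This is a rough sketch using only the figures quoted in the comparison above (about 1 credit per turn, 5 to 15 turns per task, 25,000 messages per pack). The function name and the flat credits-per-turn assumption are illustrative, and real consumption depends on which tools and orchestration steps each turn triggers.

```python
import math

MESSAGES_PER_PACK = 25_000  # pack size quoted in the comparison above


def monthly_message_packs(
    runs_per_month: int,
    turns_per_task: int,
    credits_per_turn: float = 1.0,  # assumption: flat rate per turn
) -> tuple[int, int]:
    """Return (messages consumed, packs needed) for one month of agent runs."""
    messages = math.ceil(runs_per_month * turns_per_task * credits_per_turn)
    packs = math.ceil(messages / MESSAGES_PER_PACK)
    return messages, packs


print(monthly_message_packs(50, 5))        # pilot scale: (250, 1)
print(monthly_message_packs(100_000, 5))   # production, low end: (500000, 20)
print(monthly_message_packs(100_000, 15))  # production, high end: (1500000, 60)
```

    A 50-run pilot fits inside a single pack at either end of the range, which is why the bill only becomes visible at production volume.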

    Maintenance and brittleness

    RPA breaks when the UI changes. A vendor pushes a new SAP Fiori update, three selectors shift, your bot fails at 3am. I have lived this. The fix is usually 30 minutes, but you need someone on call who knows the bot.

    AI agents break differently. They do not fail loudly. They drift. The model provider updates, your prompt that worked last month now produces a slightly different output format, and downstream parsing silently fails. I wrote about this in my agentic workflow post. The failure mode is worse because users find out three days later when the wrong invoice gets paid. If you are building flows that sit underneath an agent, Power Automate error handling patterns that actually work will save you from the silent failures that surface weeks after go-live.

    RPA maintenance is reactive and obvious. Agent maintenance is proactive and requires evaluation infrastructure most teams do not build.
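
    One cheap piece of that evaluation infrastructure is a contract check between the agent and whatever parses its output. The sketch below assumes the agent is asked to reply with a JSON object. The field names and types in `REQUIRED_FIELDS` are hypothetical and stand in for whatever your downstream step expects.

```python
import json

# Hypothetical contract for an invoice-extraction agent.
REQUIRED_FIELDS = {"po_number": str, "amount": float, "currency": str}


class AgentOutputError(ValueError):
    """Raised when the agent's reply does not match the expected contract."""


def parse_agent_output(raw: str) -> dict:
    """Parse and validate an agent reply; fail loudly instead of passing bad data on."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise AgentOutputError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise AgentOutputError("reply is not a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise AgentOutputError(f"missing field: {name}")
        value = data[name]
        # Accept whole numbers where a float is expected, but never booleans.
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected_type):
            raise AgentOutputError(f"{name} should be {expected_type.__name__}")
        data[name] = value
    return data
```

    The point of raising instead of defaulting is to turn silent drift into the loud, RPA-style failure that someone actually gets paged for.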

    What the work actually looks like

    This is the dimension nobody compares on. Look at the input.

    Structured input, structured output, no judgment needed: RPA. Copying 200 rows from a legacy system into a SharePoint list, kicking off a daily report, screen-scraping a vendor portal that has no API. Boring, repetitive, deterministic. Power Automate Desktop handles this all day. If you are still deciding whether to invest time in the broader platform, RPA is not the right tool for every repetitive task is worth reading before you commit to a build.

    Unstructured input, structured output, judgment needed: AI. Reading 500 supplier emails and extracting the PO number, classifying tickets by intent, summarizing a 40-page contract into five bullet points. This is where Copilot Studio or a custom agent earns its cost.

    The hybrid case is the most common one and the one most teams miss. The agent reads the email, extracts the structured fields, then hands off to an RPA bot or a cloud flow that executes the deterministic part. The agent is the reasoning layer. RPA is the execution layer. They are not competitors. They are stacked.

    Governance and auditability

    RPA logs are simple. Action ran, action succeeded, here is the screenshot. Auditors love this.

    AI agents need decision logs, not just execution logs. You need to capture why the agent picked tool A over tool B. Most teams I talk to are not logging this and will get caught when the first compliance review hits. I covered this in The Real Shift Is Not Faster Work It Is Who Owns the Decision. Based on what I have built, this is the gap that bites you 6 months in, not on day one.
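
    As a rough illustration of what a decision log entry could hold, here is a minimal Python sketch. Every field name is hypothetical, and where the rationale text comes from (the model's own explanation, a trace, a reviewer) is left as an assumption for you to fill in.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    """One tool-selection decision made by an agent (field names are hypothetical)."""
    run_id: str
    user_request: str
    candidate_tools: list[str]
    chosen_tool: str
    rationale: str
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(record: DecisionRecord) -> str:
    """Serialise a decision as one JSON line so it can be appended to any log store."""
    if record.chosen_tool not in record.candidate_tools:
        raise ValueError("chosen_tool must be one of candidate_tools")
    return json.dumps(asdict(record), sort_keys=True)
```

    Recording the tools that were not chosen alongside the one that was is what lets a reviewer answer "why A over B" after the fact.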

    Choose RPA if / Choose AI if

    Choose RPA if: the input is structured, the path is deterministic, the volume is high, the cost per run needs to be near zero, and the system has no API. This is most legacy integration work.

    Choose AI automation if: the input is unstructured, the work requires classification or extraction or summarization, variance is the norm, and you have the evaluation discipline to catch silent drift.

    Choose both if: you have a real workflow. Most enterprise automation is hybrid. The line is not RPA versus AI. It is figuring out which layer does what.

    Frequently Asked Questions

    What is the difference between RPA and AI automation for enterprise workflows?

    RPA is built for repetitive, predictable tasks where the process follows the same steps every time, while AI automation handles unstructured or variable inputs that require reasoning. They are not competing technologies but tools suited to different problems. Choosing the wrong one leads to either a fragile bot or an agent making confident mistakes.

    When should I use RPA instead of an AI agent?

    Use RPA when your process is consistent, rule-based, and can be mapped out as a clear decision tree. If the same fields, screens, or steps repeat thousands of times without variation, RPA will be faster, cheaper, and more reliable than an AI agent.

    How do I know if AI automation is worth the cost for my workflow?

    AI agents consume message credits per turn and most tasks require multiple turns, so costs scale quickly at high volumes. Before committing, calculate expected monthly runs and multiply by average turns per task, not just per conversation. Teams often underestimate this significantly when testing at small scale.

    Why does RPA break so often in enterprise environments?

    RPA relies on fixed UI selectors, so any interface update from a vendor can shift elements and cause the bot to fail. These failures are usually quick to fix but require someone familiar with the bot to be available when issues occur. Unlike AI agents, RPA fails loudly and immediately rather than silently producing wrong results.

  • Anthropic Just Launched Claude for Creative Work and It Is a Real Positioning Move

    Anthropic shipped Claude for Creative Work this week. It is a dedicated offering aimed at writers, designers, and other creative professionals. I spend most of my day in the automation side of AI, but I want to talk about this one because the positioning is more interesting than the product itself.

    This is the first time I have seen a major lab carve out creative work as a first-class lane, separate from coding and enterprise automation. That is a signal worth paying attention to.

    What Claude for Creative Work actually does

    The release packages Claude around the workflows that creative professionals actually run. Long-form drafting, editorial revision, voice and style consistency across a body of work, brainstorming with a model that pushes back instead of agreeing with everything. There are tighter integrations for the tools writers and designers already live in, and the framing leans heavily on Claude’s writing quality rather than benchmark scores.

    It is not a new model. It is a new product wrapper around Claude with prompts, defaults, and surfaces tuned for creative output. Think of it as Anthropic saying: we know writers were already using us, here is the version built for them.

    The technical novelty is limited. The positioning novelty is not.

    Why this segmentation matters

    For the last two years, model vendors have sold one product to everyone. Same Claude for the lawyer, the developer, the marketing copywriter, the Power Platform builder. Differentiation happened at the system prompt layer, which meant every team had to figure out their own configuration from scratch.

    Anthropic is now segmenting by job-to-be-done. Claude for coding. Claude for enterprise. Claude for creative work. This is closer to how Adobe, Microsoft, and Salesforce package software, and it is a meaningful shift in how AI gets sold and bought.

    Two things follow from this.

    First, procurement gets easier for non-technical buyers. A marketing director does not want to evaluate a foundation model. They want to know if the tool fits their team’s workflow. A clearly named product solves that.

    Second, the system prompt and tooling work that used to be invisible becomes the actual product. I wrote about this when I covered Claude on Trainium: prompt engineering is production code, and the cost of treating it casually compounds. Anthropic is now productising that layer for one specific audience. I expect the same move for legal, research, and analytics in the next year.

    The interesting question for anyone in automation is whether the enterprise lane gets the same treatment. A Claude for Enterprise Automation with first-class tool calling defaults, audit logging, and connection-scoped caching would do more for the agentic workflow problem than another model bump. If you are thinking about how Claude already fits into that orchestration layer, Claude as an Orchestration Brain Is the Most Interesting Thing Happening in Enterprise AI Right Now is worth reading alongside this one.

    What I would do with it this week

    I am not a creative professional. But drafting is a real part of my week, and I want to test this against my current setup.

    I would use Claude for Creative Work to draft long-form posts and technical explainers, then compare the output against the same prompts run through the standard Claude interface. The thing I want to know is whether the tuned defaults actually change the output meaningfully, or whether a good system prompt gets you 90 percent of the way there. My bet is on the latter, but I want to be wrong.

    I would also try it on editorial revision passes. Take a 1500-word draft I already wrote, run it through with instructions to tighten and remove filler, and see how it handles voice preservation. This is where Claude has historically been strong, and a product surface tuned for it should make the loop faster.

    One thing I will not do is force this into a Power Platform flow. Power Automate already has decent options for content generation through the existing Claude and OpenAI connectors. A creative-tuned product surface does not change the connector story. If you are weighing which model to point at which workload inside those flows, Claude vs ChatGPT Is the Wrong Question When You Are Building Automations covers exactly that tradeoff.

    The real takeaway is not the product. It is the precedent. Vendors are starting to segment foundation models by job-to-be-done, and that changes how I will think about which model lane to point at which workload going forward. More on that as the other lanes ship. I have been tracking this shift closely because it changes the buying conversation as much as the building one.

    Watching to see if Anthropic ships a creative-equivalent for enterprise automation next.

    This post was inspired by Claude For Creative Work via Anthropic.

  • Power Pages Agentic Code Just Got Server-Side Skills and I Cannot Wait to Try Them

    Microsoft just shipped three new skills for the Power Pages agentic code plugin that finally let GitHub Copilot and Claude Code CLI generate server-side logic, not just front-end markup. The announcement landed on the Power Platform blog, and this is the gap-closer I have been waiting for. Server-side generation is the part Power Pages agentic code was missing.

    If you have built anything non-trivial on Power Pages you know the pattern. The studio handles the front-end fine. The moment you need a Web API call, a table permission rule, or a Liquid template that actually does work, you are out of the visual layer and into code that the AI tools could not see properly.

    What it actually does

    The plugin now ships three new skills focused on the server-side surface of a Power Pages site. From the post and what I have read so far, these cover generating and wiring up Liquid templates with the right context, scaffolding Web API calls against your actual Dataverse tables, and producing the configuration around table permissions and site settings that normally requires you to know exactly which knob to turn.

    The important detail is grounding. The plugin pulls from your real site context: the tables you have, the columns on them, the page structure, the site settings already in place. So when GitHub Copilot or Claude Code CLI generates Liquid or a Web API snippet, it is generated against your actual environment, not a hypothetical portal.

    I wrote about this grounding problem before in an earlier post on Power Pages and AI. Generic LLM output for Liquid looks correct and breaks immediately on deploy because the model has no idea what your tables are called or what permissions are wired up. Environmental grounding is the entire game here.

    Why server-side AI generation is different leverage

    Front-end scaffolding from AI is useful but cheap. Anyone can prompt a model to spit out HTML and CSS for a form. The hard part of Power Pages was never the markup. It was the layer underneath: Liquid templates, Web API permissions, plugin logic, the settings that decide whether a record is visible to an authenticated contact or not.

    That layer is where sites break. That layer is where you used to bolt on a Power Automate flow because it felt easier than figuring out the right server-side pattern. And bolting on a flow for what should be a server-side query is exactly the kind of decision that creates a silo three months later when nobody remembers why the form posts to a flow instead of using the Web API directly. Understanding Power Automate error handling patterns helps when those flows do need to exist, but the goal should be keeping server-side work on the server side.

    Generating server-side logic with proper grounding cuts that path off. You stay inside the site. You use the layer that was already designed for it. The trade-off is real though: you now have AI-generated Liquid and permissions in a place where mistakes are harder to spot than a broken button. Server-side bugs do not show up as the dreaded broken image icon. They show up as a record being visible to the wrong user, which is a much worse failure mode.

    This is why I keep saying tool design for AI agents is an API design problem. The quality of the environmental signals fed to the model matters more than the model itself. Microsoft Learn for Power Pages is the reference I keep open when I am sanity-checking what these skills produce.

    What I would do with it this week

    I would pick a small internal-style site with two or three Dataverse tables, authenticated users, and one workflow that currently lives in a Power Automate flow it should not live in. Something where a list view filters by the logged-in contact, and a form writes back to a related table.

    Then I would have Claude Code CLI do three things. First, generate the Liquid for the filtered list using the new skill, against the real tables. Second, generate the Web API call for the form submission with the right permissions, instead of the Power Automate detour. Third, write a small piece of logic that should obviously fail without context, like referencing a column that does not exist, just to see how the grounding holds up.

    The point is not to ship anything. The point is to find where it breaks. In my experience, the only honest way to evaluate one of these releases is to push it until it produces something wrong, then look at why. If you are thinking about where this fits in a broader agentic architecture, most agentic workflows are just fancy if/then logic until the grounding is solid enough to trust the generated output in production.

    If the grounding is as good as the announcement suggests, this changes how I would start any new Power Pages build going forward.

    This post was inspired by Build your server-side logic with AI: new Power Pages Agentic Code skills via Microsoft Power Platform Blog.

  • Microsoft Shipped Multi-Agent Orchestration in Copilot Studio and the Patterns Matter More Than the Feature

    Microsoft shipped general availability of multi-agent orchestration in Copilot Studio this month, letting one agent call another agent as a skill, including agents built in Azure AI Foundry and external M365 Copilot agents. The announcement landed on the Copilot Studio docs and the blog feed last week. If you build agents inside Microsoft 365, this changes how you should think about agent design starting now.

    I have been testing it for a few days. The feature itself is straightforward. The interesting part is which multi-agent orchestration pattern you pick, because most teams will default to the wrong one and rebuild six months from now.

    What it actually does

    You can now register another agent as a connected agent inside Copilot Studio. The parent agent sees it as a tool. When the LLM decides the user request fits the connected agent’s described purpose, it hands off the conversation, gets a structured response back, and continues reasoning.

    Three things are worth knowing. The handoff is generative, meaning routing depends on the description you write for each connected agent, not on trigger phrases. The child agent runs with its own instructions, its own tools, and its own knowledge sources. And the parent agent receives the child’s output as context it can reason over, not as a final answer it must return verbatim.

    You can chain this. Parent calls child A, child A calls child B. You can also fan out, where the parent calls multiple children based on the request shape.

    Why it matters

    Single-agent designs hit a wall fast. I wrote about this in Most Agentic Workflows Are Just Fancy If/Then Logic in a Trench Coat. When one agent owns too many topics, routing fails at the edges, the system prompt balloons past 4000 tokens, and every change risks breaking three unrelated flows.

    Multi-agent orchestration lets you split by domain. One agent for HR queries, one for IT, one for finance, each with its own tight instructions and tool list. The parent becomes a router with personality. That sounds clean, and it can be, but only if you pick the right pattern.

    From what I have seen building and from what I hear from people at other organisations, three patterns keep showing up:

    Supervisor pattern. One parent agent owns the conversation, delegates to specialists, aggregates results. Best when the user does not need to know which agent is answering. Trade-off: the parent’s instructions become the bottleneck. If the parent picks the wrong child confidently, you get the same misrouting failure mode I described in my post on Copilot Studio agents passing tests and still failing in production, just one layer deeper and harder to debug.

    Sequential pipeline. Agent A produces output, agent B consumes it, agent C finalises. Best for structured workflows like draft, review, publish. Trade-off: latency stacks. Each hop adds seconds, and latency is the quiet killer of agentic workflows — budgeting round-trip time before you build matters more than most teams expect.

    Peer network. Agents call each other based on need, no fixed parent. Powerful, almost always premature. I have not seen a single internal use case where this beats a supervisor design once you account for debugging cost.
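
    The latency stacking in the sequential pipeline is easy to budget on paper before building anything. A minimal sketch, assuming you have a measured or estimated round-trip time per hop. The numbers in the usage lines are made up for illustration, not measurements from Copilot Studio.

```python
def pipeline_latency(hop_seconds: list[float], budget_seconds: float) -> tuple[float, bool]:
    """Sum per-hop latency for a sequential pipeline and report whether it fits the budget."""
    total = sum(hop_seconds)
    return total, total <= budget_seconds


# Hypothetical draft -> review -> publish pipeline against a 10-second budget.
print(pipeline_latency([4.0, 6.0, 5.0], budget_seconds=10.0))  # (15.0, False)
# Two hops fit where three do not.
print(pipeline_latency([4.0, 5.0], budget_seconds=10.0))       # (9.0, True)
```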

    The real win is not the multi-agent capability itself. It is the reduction in system prompt size per agent. When each agent has 400 tokens of focused instructions instead of 4000 tokens trying to cover everything, behavior gets predictable. Output testing gets meaningful. Drift gets easier to catch.

    What I would do with it this week

    Pick one existing Copilot Studio agent that has grown too many topics. Look at the system prompt. If it is over 2000 tokens or covers more than two distinct domains, it is a candidate.

    Split it. Create one child agent per domain. Move the relevant tools, knowledge sources, and instructions into the child. Write a clear description for each child, because that description is what the parent’s LLM uses to route. Vague descriptions kill routing accuracy faster than anything else. If you need to wire external capabilities into those child agents, building a custom connector for Copilot Studio is the practical path for connecting APIs that do not already have a prebuilt connector.

    Then test the edges. Phrasings that sit between two children. Requests that should hit two children sequentially. Requests that should fail cleanly when no child fits. The supervisor pattern looks great on the happy path. The behavior at the edges is what tells you whether the split was worth it.
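
    Those edge cases are worth keeping as a small regression set. A minimal harness is sketched below, assuming you can wrap your parent agent in a hypothetical `route` function that returns the name of the child it picked, or `None` when it declined to route. The phrasings and child names are invented examples.

```python
from typing import Callable, Optional

# Each case: (user phrasing, expected child agent, or None when no child should fit).
EDGE_CASES: list[tuple[str, Optional[str]]] = [
    ("My laptop died and I need it replaced before payroll closes", "it"),
    ("How do I expense a new monitor?", "finance"),
    ("What is the weather in Lisbon?", None),
]


def routing_mismatches(
    route: Callable[[str], Optional[str]],
    cases: list[tuple[str, Optional[str]]] = EDGE_CASES,
) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Return (phrasing, expected, actual) for every case the router gets wrong."""
    failures = []
    for phrasing, expected in cases:
        actual = route(phrasing)
        if actual != expected:
            failures.append((phrasing, expected, actual))
    return failures
```

    Rerunning the same set after every change to a child's description shows whether a wording tweak fixed one edge and broke another.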

    Multi-agent orchestration is not a silver bullet. It is a structural tool. Used right, it makes agents maintainable. Used wrong, it builds another silo with extra latency. The patterns matter more than the feature.