“Self-healing” is one of the most overloaded terms in test automation. Walk into any vendor demo and you will hear it. Read any product page for any AI testing tool launched in the last four years and you will see it. Ask ten different tools what self-healing means and you will get ten slightly different answers, most of which describe the same basic mechanism dressed up in progressively more impressive-sounding language.
This post is for the people who are tired of the marketing. If you’ve been burned by a “self-healing” tool that kept breaking after every software update, or you’re evaluating platforms and trying to figure out what the term actually means before you sign a contract, this is where that question gets answered plainly.
We’ll cover three genuinely different levels of self-healing capability, what most vendors are actually selling when they use the word, and the specific questions to ask in a demo that will tell you immediately which level you’re looking at.
Full disclosure up front: Sofy builds AI test agents for SAP and Dynamics 365, and we think we operate at a level most tools don’t. We’re going to explain exactly what we mean by that, and give you the tools to verify it for yourself, not just take our word for it.
If you’re coming to this from a specific ERP context, the Beyond RSAT guide covers how self-healing specifically applies to Dynamics 365 release wave testing, and our AI agents for ERP testing explainer covers the broader category. But this post is tool-agnostic, useful whether you’re evaluating Sofy, Leapwork, Tricentis, or anything else.
1. What ‘Self-Healing’ Actually Means: The Three Levels
The phrase “self-healing test automation” sounds intuitive: tests that fix themselves when something breaks. But that description covers an enormous range of actual capability, from “retries the same thing twice before giving up” to “understands that the underlying business process hasn’t changed and finds a new path to validate it.”
These aren’t points on a spectrum. They’re fundamentally different architectural approaches to the same stated goal. Here’s the framework that actually separates them:
| Level | Name | What it does | Who offers it | What it actually means |
| --- | --- | --- | --- | --- |
| Level 1 | Selector Retry | Retries the same CSS/XPath selector 2–3 times before failing | Most automation tools | Fails on any structural DOM change. Just delays failure. |
| Level 2 | Element Re-ID | Scans the DOM to find a visually similar element after a change | Some AI-assisted tools | Still UI-bound. Breaks when layout changes fundamentally. |
| Level 3 | Workflow Adaptation | Understands the process intent and re-routes the test path | Sofy AI Agents | Heals at the business process layer, not the DOM layer. |
Most demos you see will look like Level 3 from the outside. The test runs. Something changes. The test adapts. It passes. What you don’t see in the demo is what actually happened under the hood, and that determines everything about how the system behaves at scale, under real update pressure, in a production-grade ERP environment.
2. Level 1 (Most Tools): Selector Retry, and Why It’s Not Really Healing
When most automation tools market “self-healing,” they mean this: when a test step fails to find an element, the tool tries the same selector two or three more times before marking the test as failed.
That’s it. There is no intelligence involved and no adaptation. The tool simply waits and retries, on the assumption that the failure was a timing issue rather than a structural change. In some contexts (slow-loading pages, network latency), retry logic is useful. In ERP test automation, where failures are almost never timing-related, it’s close to useless.
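To make that concrete, here is a minimal Python sketch of what Level 1 “healing” amounts to. The `find_element` callable is a hypothetical stand-in for a real driver lookup, and the attempt count and 500 ms delay are illustrative defaults, not any vendor’s settings.

```python
import time
from typing import Callable, Optional


class ElementNotFound(Exception):
    """Raised when a selector still matches nothing after every retry."""


def find_with_retry(
    find_element: Callable[[str], Optional[object]],
    selector: str,
    attempts: int = 3,
    delay_s: float = 0.5,
) -> object:
    """Level 1 'self-healing': re-run the SAME selector, pausing between tries.

    find_element is a hypothetical lookup that returns the element or None.
    The selector itself never changes, so a renamed or restructured element
    fails on every attempt; the only effect of retrying is a later failure.
    """
    for attempt in range(1, attempts + 1):
        element = find_element(selector)
        if element is not None:
            return element
        if attempt < attempts:
            time.sleep(delay_s)  # assumes the failure was timing, not structure
    raise ElementNotFound(f"{selector!r} not found after {attempts} attempts")
```

If a button is permanently renamed, every attempt fails identically. Retrying only helps when the same element eventually shows up.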
Here’s what actually happens when a Dynamics 365 release wave changes a button label from “Submit” to “Post”:
- A Level 1 tool finds the “Submit” button. It’s not there. It waits 500ms. Still not there. It waits another 500ms. Still not there. It marks the test as failed.
- Your QA team gets the alert at 7am. Someone opens the test. Someone realizes the button was renamed. Someone updates the selector. The test gets re-run. It passes.
- This process takes between 20 minutes and 2 days, depending on team size and release pressure.
- Multiply this by the 150–300 UI changes that come with a typical D365 Wave release. That’s your regression sprint.
Most tools that claim self-healing are in this category. The tell is in how they describe it: “automatic retry,” “flaky test detection,” “wait strategies,” “smart delays.” These are timing mechanisms, not healing mechanisms.
The test: ask the vendor “what happens when a button is permanently renamed, not temporarily missing?” If the answer involves retrying, waiting, or flagging the test as flaky, that’s Level 1. If the answer doesn’t involve any of those things, keep reading.
3. Level 2: Element Re-Identification (Better, but Still UI-Bound)
Level 2 is a genuine improvement. Instead of retrying the same selector, these tools scan the page after a failure and try to find an element that looks functionally similar to the one they were looking for. They might use machine learning to match characteristics such as position, color, label text, and surrounding context, and re-identify the element in its new state.
This is materially more useful than retry logic. If a button moves from the top of a form to the bottom, Level 2 can often find it. If a field gets a new CSS class but keeps the same label, Level 2 can often match it. For straightforward web application testing with stable layouts, Level 2 represents a real reduction in manual maintenance.
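Here’s a simplified Python sketch of the Level 2 idea, using plain text similarity and screen distance in place of a trained model. The `Element` fields, the 0.7/0.3 weights, and the 0.6 threshold are assumptions for illustration only. Note what it cannot do: if the recorded screen no longer exists, there is simply no candidate to score.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional, Sequence


@dataclass(frozen=True)
class Element:
    role: str    # e.g. "button", "textbox"
    label: str   # visible text or accessible name
    x: float     # left edge, in pixels
    y: float     # top edge, in pixels


def similarity(recorded: Element, candidate: Element) -> float:
    """Score 0..1 for how closely a candidate resembles the recorded element.

    The weights are illustrative assumptions, not any vendor's model.
    """
    if candidate.role != recorded.role:
        return 0.0
    text = SequenceMatcher(None, recorded.label.lower(), candidate.label.lower()).ratio()
    distance = ((candidate.x - recorded.x) ** 2 + (candidate.y - recorded.y) ** 2) ** 0.5
    proximity = 1.0 / (1.0 + distance / 100.0)  # 1.0 at the same spot, decays with distance
    return 0.7 * text + 0.3 * proximity


def reidentify(recorded: Element, page: Sequence[Element], threshold: float = 0.6) -> Optional[Element]:
    """Level 2 healing: pick the most similar element on the current page.

    Returns None when nothing clears the threshold, e.g. when the screen was
    replaced outright and no comparable element exists to match.
    """
    best = max(page, key=lambda c: similarity(recorded, c), default=None)
    if best is None or similarity(recorded, best) < threshold:
        return None
    return best
```

The matching happens entirely at the element layer. Nothing in this code knows what the click is supposed to accomplish.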
The problem, particularly for ERP testing, is that Level 2 is still fundamentally UI-bound. It heals at the element layer, finding the right thing to click, but it has no understanding of whether clicking that thing produces the correct outcome. Consider what happens when:
- A D365 Wave release reorganizes a Financial Dimensions form, moving fields between tabs, adding a new required field, and changing the tab structure. A Level 2 tool might find some of the moved elements, but it has no framework for knowing that the test it’s running is supposed to validate a journal posting, and that the new required field changes what “valid submission” means.
- SAP migrates a workflow from SAP GUI to SAP Fiori, a completely different interface with different interaction patterns. Level 2 element matching fails because there are no similar elements to match. The old SAP GUI transaction is gone. The new Fiori app is a different product.
- A multi-step approval workflow in D365 Finance gets a new intermediate step added by a Wave update. The test gets partway through the approval chain and then hits a new screen it has never seen. Element re-identification can’t help because there’s nothing to re-identify: it’s a page that didn’t exist when the test was recorded.
In each of these cases, Level 2 healing doesn’t fail loudly. It either silently gets stuck or marks the test as failed for reasons that require a human to diagnose. That means your QA team is still doing the same work; they just see slightly fewer false failures from moved or restyled elements.
“Level 2 is like a GPS that finds the same address in a city that’s been rebuilt. It can’t navigate if the road no longer exists.”
4. Level 3 (Sofy): Workflow-Level Adaptation, Healing at the Process Layer
Level 3 starts from a fundamentally different premise. Instead of asking “where is the element I need to click?”, it asks “what is this test trying to validate, and what does success look like at the process level?”
For most software testing, this distinction barely matters. A web form is a web form. What you’re validating is usually close enough to what’s on the screen that UI-level testing is sufficient.
For ERP testing, the distinction is everything. A D365 general journal posting test isn’t trying to validate that a specific button was clicked. It’s trying to validate that:
- The correct GL account was debited
- The correct GL account was credited
- The financial dimensions on the header carried through to the line
- The posting period is correct
- The resulting ledger entry is in balance
None of those validations are UI states. They’re financial data outcomes. And the specific sequence of UI interactions needed to trigger and verify them might change entirely with a release wave, while the underlying business requirement stays exactly the same.
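As an illustration, here’s a minimal Python sketch that checks the five outcomes listed above directly against posted ledger lines. It assumes the lines have already been read back from the ERP. How that happens is out of scope here, and the account numbers, dimension names, and period format are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List


@dataclass
class LedgerLine:
    account: str
    debit: Decimal
    credit: Decimal
    dimensions: Dict[str, str]


@dataclass
class ExpectedPosting:
    debit_account: str
    credit_account: str
    header_dimensions: Dict[str, str]
    period: str


def validate_posting(lines: List[LedgerLine], period: str, expected: ExpectedPosting) -> List[str]:
    """Check the financial outcome of a posting, independent of the UI used to create it.

    Returns a list of problems; an empty list means the outcome matches.
    """
    problems: List[str] = []
    if not any(line.account == expected.debit_account and line.debit > 0 for line in lines):
        problems.append(f"no debit to {expected.debit_account}")
    if not any(line.account == expected.credit_account and line.credit > 0 for line in lines):
        problems.append(f"no credit to {expected.credit_account}")
    # Header dimensions should carry through to every line.
    for line in lines:
        for name, value in expected.header_dimensions.items():
            if line.dimensions.get(name) != value:
                problems.append(
                    f"{line.account}: dimension {name}={line.dimensions.get(name)!r}, expected {value!r}"
                )
    if period != expected.period:
        problems.append(f"posted to period {period}, expected {expected.period}")
    total_debit = sum((line.debit for line in lines), Decimal("0"))
    total_credit = sum((line.credit for line in lines), Decimal("0"))
    if total_debit != total_credit:
        problems.append(f"out of balance: debits {total_debit} != credits {total_credit}")
    return problems
```

None of these checks reference a button, field position, or form layout. That independence from the UI is what lets this kind of validation survive a redesign.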
When Sofy’s Finance Agent runs a journal posting test and a Wave release changes the journal entry form (new fields added, some moved, the approval flow extended), the agent doesn’t look for the old form elements. It understands what a valid journal posting looks like in D365 Finance: which tables get updated, what values they should contain, what the GL entry looks like when the process completes correctly. It navigates whatever UI is present to reach that outcome, then validates the outcome directly against the expected financial state.
This is the distinction that matters for compliance-driven testing: “Did the test pass?” is a UI question. “Did the journal post to the correct GL account with the correct dimensions?” is a financial question. Level 3 answers the financial question. Levels 1 and 2 only answer the UI question.
What workflow-level adaptation looks like in practice
Here’s a concrete example from a recent change to how Microsoft Dynamics 365 Business Central launches.
Microsoft altered how Business Central is opened from within the Microsoft 365 Copilot environment. What used to be a direct entry point no longer consistently lands the user inside the application.
From Microsoft’s perspective, this is a minor navigation change. For automation, it breaks the workflow.
For a Level 1 tool:
- The script clicks the Business Central entry point.
- The expected application screen does not load.
- The script has no awareness of context or fallback paths.
- Test fails. Manual investigation required.
For a Level 2 tool:
The tool successfully identifies and clicks the Business Central element. But the application does not launch as expected; it remains on the Copilot page with an account flyout visible.
The tool cannot reason about why the workflow failed or what to do next. Test fails. Manual investigation required.
For Sofy:
The agent recognizes that clicking the Business Central entry did not result in the expected application state.
Instead of stopping, it evaluates the context:
- Confirms current state is still within Copilot
- Detects that Business Central did not launch
- Identifies alternate valid entry paths
It then dynamically switches strategy:
- Navigates directly via the Business Central URL
- Re-enters the application through a secondary route
Once the expected landing state (the sign-in or dashboard flow) is detected, the agent resumes execution.
Test completes successfully. No manual intervention.

Screenshot: Sofy self-healing event log from a Microsoft Business Central change, showing change detection, agent adaptation, and continued test execution.
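The recovery described above follows a recognizable control-flow pattern: try an entry path, observe the resulting state, and fall back to an alternate route if the goal state isn’t reached. The Python sketch below shows that pattern. It is not Sofy’s implementation. The strategy names, state strings, and the `current_state` hook are hypothetical, and a real agent would infer alternate paths rather than receive a fixed list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class HealingLog:
    events: List[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        self.events.append(message)


def enter_application(
    strategies: Sequence[Tuple[str, Callable[[], None]]],
    current_state: Callable[[], str],
    target_states: Sequence[str],
    log: HealingLog,
) -> bool:
    """Try entry paths in order until the app reaches an expected landing state.

    strategies: (name, action) pairs, e.g. click the Copilot entry point, then
    open the app URL directly. current_state reports what the agent observes.
    Both are hypothetical hooks into whatever UI driver is in use.
    """
    for name, action in strategies:
        action()
        state = current_state()
        if state in target_states:
            log.record(f"{name}: reached {state}; resuming test")
            return True
        log.record(f"{name}: landed on {state}, expected one of {list(target_states)}; trying next path")
    log.record("all entry paths exhausted; escalating for manual review")
    return False
```

The important design choice is that success is judged by the observed state, not by whether a click went through. In the Level 2 scenario, the click succeeded and the workflow still failed.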
5. When Self-Healing Isn’t Enough: What Still Needs Human Judgment
This section exists because honest marketing matters. Sofy’s Level 3 self-healing is genuinely powerful for the scenarios we’ve described. It’s not magic. There are situations where automated healing, at any level, isn’t appropriate, and where human judgment is the right call.
If we don’t tell you this, you’ll find out in production. Better to hear it from us now.
When the business process itself changes, not just the UI
Self-healing of any kind addresses UI changes: adaptations to how a process is navigated. If Microsoft changes the underlying business logic of a process in a release wave (which it does; vendor payment terms, tax calculation methods, and financial dimension validation rules have all changed in recent waves), a self-healing test should not automatically adapt to that change.
If the process outcome should be different in the new version, an automated test that adapts to produce the old outcome is not healing; it’s masking a change that a human needs to evaluate. Sofy’s agents flag process-level changes for human review rather than silently adapting. The healing log distinguishes between UI path changes (auto-adapted) and process logic changes (flagged for review).
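One way to express that distinction in code: compare the outcome the new path produced with the expected outcome, and never rewrite the expectation automatically. The Python sketch below is a hypothetical illustration of the rule, not any product’s logic. Outcomes are simplified to string maps, and keys such as `terms` are made up.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class HealingEntry:
    step: str
    category: str   # "ui_path_change", "process_logic_change", or "none"
    action: str     # "auto-adapted", "flagged for review", or "no action"
    detail: str


def classify_change(step: str, ui_path_changed: bool,
                    expected: Dict[str, str], observed: Dict[str, str]) -> HealingEntry:
    """Separate navigation changes from changes in what the process produces.

    The expected outcome is never rewritten automatically: if the observed
    outcome differs from it, the difference is reported for a human to judge.
    """
    diffs = sorted(k for k in expected.keys() | observed.keys() if expected.get(k) != observed.get(k))
    if diffs:
        return HealingEntry(step, "process_logic_change", "flagged for review",
                            "outcome differs on: " + ", ".join(diffs))
    if ui_path_changed:
        return HealingEntry(step, "ui_path_change", "auto-adapted",
                            "new navigation path, same outcome")
    return HealingEntry(step, "none", "no action", "no change detected")
```

A new path that yields the same outcome is safe to adopt. A different outcome is information for a person, not something to smooth over.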
When a new process step has compliance implications
Some workflow changes are safe to auto-heal, especially when a new field or step allows a default action that doesn’t impact the financial or operational outcome.
But not all changes are equal. A newly introduced mandatory approval step in a critical process (like period close) is different. Automatically bypassing or resolving a step that is designed to require human authorization would undermine the control itself.
Real self-healing at the workflow level requires understanding which adaptations are safe to make autonomously and which require human intervention. A robust system makes this distinction explicit; it doesn’t blindly adapt to every change.
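To make “explicit” concrete, here’s a deliberately conservative policy sketched in Python. The attributes and rules are assumptions chosen to mirror the two cases above (a defaulted field versus a mandatory approval in period close). They are not a product rule set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewStep:
    name: str
    mandatory: bool
    has_safe_default: bool        # a default exists that leaves the outcome unchanged
    requires_authorization: bool  # the step exists to obtain a human sign-off


def can_auto_heal(step: NewStep, controlled_process: bool) -> bool:
    """Return True only when an agent may resolve a new step on its own.

    Illustrative policy (an assumption): a step that exists to collect
    authorization is never auto-resolved, and neither is a mandatory step in
    a controlled process such as period close. An optional step, or one with
    a default that doesn't alter the outcome, can be handled autonomously.
    """
    if step.requires_authorization:
        return False
    if controlled_process and step.mandatory:
        return False
    return step.has_safe_default or not step.mandatory
```

Writing the policy down this way also makes it reviewable: auditors can read which kinds of change the agent is allowed to absorb.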
When test coverage was wrong to begin with
Self-healing can’t fix a test that was validating the wrong thing. If a test was written to check that a field contains a specific hardcoded value rather than that the field is correct for the given transaction context, no amount of UI adaptation will make that test meaningful. Garbage in, garbage out, at every level of self-healing.
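A small hypothetical example shows the difference. The invoice fields and the rule that an invoice’s currency should match its vendor’s configured currency are assumptions for illustration only.

```python
from typing import Dict


def brittle_check(invoice: Dict[str, str]) -> bool:
    # Validates a hardcoded value: only "works" while every test invoice happens to be in USD.
    return invoice["currency"] == "USD"


def contextual_check(invoice: Dict[str, str], vendors: Dict[str, Dict[str, str]]) -> bool:
    # Validates correctness for this transaction: the invoice currency should
    # match the currency configured on the invoice's vendor.
    return invoice["currency"] == vendors[invoice["vendor"]]["currency"]
```

With a EUR vendor, the brittle check rejects a correct EUR invoice and accepts a USD invoice posted in error. Healing the navigation around that assertion changes neither result.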
This is why we always recommend building a test suite that validates outcomes before optimizing for self-healing. A self-healing test that validates the right thing is powerful. A self-healing test that validates the wrong thing is an efficiently maintained false sense of security.
The honest summary: self-healing is not a replacement for good test design. It is a multiplier on good test design: it makes well-constructed tests durable across change. It cannot make poorly constructed tests meaningful.
6. How to Evaluate a Vendor’s Self-Healing Claim in 3 Questions
You’re in a demo. The vendor shows you self-healing in action. The test adapts. It passes. You’re impressed.
Here are the three questions that will tell you in under five minutes which level you’re actually looking at, and whether it will hold up in a real release wave environment.
| Question | Ask your vendor this | Level 1–2 answer | Level 3 answer | Why this question matters |
| --- | --- | --- | --- | --- |
| Question 1 | What layer does healing operate at? | DOM/UI layer | Business process / workflow layer | If the answer is about selectors, locators, or DOM scanning, it is Level 1 or 2. Real workflow healing never mentions the DOM. |
| Question 2 | Can it heal when the UI is redesigned, not just when an element moves? | No, requires re-scripting for layout changes | Yes, process logic is independent of UI structure | Most tools answer yes to element moves. Very few can handle a full UI redesign, like the Fiori migration in SAP or a D365 form reorganization after a Wave release. |
| Question 3 | Can you show me a healing event log from a real release wave change? | Pass/fail log only, no healing detail | Field-level log: change detected, path re-routed, test passed | Ask for evidence, not a claim. A real self-healing system produces an audit trail showing what changed, how the agent adapted, and what the outcome was. |
These three questions work for any vendor and any tool. Level 1 and Level 2 tools will struggle with Questions 1 and 3. Question 2 is where most demos are structured to impress (elements moving around a page), while real ERP scenarios involve processes changing, not just elements moving.
The healing event log question (Question 3) is the most important. A tool that genuinely heals at the workflow level will have a detailed, inspectable record of what changed, how the agent adapted, and what the test validated. A tool that retries selectors will have a pass/fail log with a note that the test was retried.
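For a sense of what “field-level” means, here is one possible shape for such a record, sketched in Python. The field names are a hypothetical schema, not any vendor’s log format. The point is that each event says what changed, what the agent did, and what was validated afterward. A retry-only tool could fill in little beyond `result`.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class HealingEvent:
    step: str
    change_detected: str   # what differed from the recorded run
    adaptation: str        # what the agent did instead
    validated: str         # the outcome that was checked afterwards
    result: str            # e.g. "passed", "failed", or "flagged"


def to_audit_log(run_id: str, events: List[HealingEvent]) -> str:
    """Serialize healing events as an inspectable JSON audit record."""
    return json.dumps({"run_id": run_id, "events": [asdict(e) for e in events]}, indent=2)
```

When a vendor shows you a log, check whether each of these questions can be answered from it.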
One more question worth asking, specific to ERP: “Has your self-healing been tested against an actual D365 release wave or SAP transport in a customer environment?”
Press for specifics. Ask for the Wave version. Ask what changed. Ask what the agent adapted. Real answers are specific. Marketing answers are general.
The Bottom Line
“Self-healing test automation” means three very different things depending on who’s saying it. Most vendors mean selector retry with an AI label on top. Some mean element re-identification, which is genuinely useful but still breaks when processes change, not just when elements move. A small number mean workflow-level adaptation that heals at the business process layer and produces an auditable record of every adaptation.
If you’re running ERP test automation for SAP, Dynamics 365, or any enterprise platform with a twice-yearly release cadence, the level of self-healing you need is Level 3. Levels 1 and 2 reduce the frequency of test maintenance. They don’t eliminate it. And in an environment that ships hundreds of changes twice a year, “reduced maintenance” isn’t a solution; it’s a slower version of the same problem.
The good news: the questions above will tell you exactly which level you’re evaluating. Ask them early. Ask for evidence. Don’t accept general claims about AI as a substitute for a specific healing event log from a real release scenario.
See Level 3 self-healing in action, for Dynamics 365 and SAP.
Sofy’s AI agents adapt to ERP changes at the workflow level, not the DOM level. Field-level healing event logs. Auditable evidence, not a green checkmark.
Dynamics 365 Agent | ERP Test Agent | SAP Test Agent
Frequently Asked Questions
What is self-healing test automation?
Self-healing test automation is the ability of a test system to detect that something has changed in the application under test and automatically adapt, without requiring a human to manually update test scripts. The term covers three genuinely different levels of capability: selector retry (Level 1), element re-identification (Level 2), and workflow-level process adaptation (Level 3). Most vendor marketing uses the term to describe Level 1 or 2 behavior.
Why does ERP test automation need process-level self-healing?
In ERP environments, self-healing must operate at the process level, not just the UI level, because ERP releases change business logic, form structures, and approval workflows simultaneously. A tool that re-identifies UI elements (Level 2) will still fail when a process step is added or a form is redesigned. Level 3 self-healing understands the business process intent (e.g., “validate a vendor payment posting”) and navigates whatever UI is present to reach and validate the correct financial outcome, regardless of how the interface changed.
What’s the difference between adaptive and self-healing test automation?
“Adaptive test automation” and “self-healing test automation” are often used interchangeably. When a vendor uses “adaptive,” it typically signals Level 2 capability: adapting to element changes within the same UI paradigm. “Self-healing” is broader and covers all three levels. The more meaningful distinction is not terminology but the layer at which adaptation occurs: DOM/selector level, element/visual level, or business process level.
Can self-healing test automation replace human QA judgment?
No, and any vendor who claims it can is overselling. Self-healing automation is a multiplier on good test design, not a replacement for human judgment. Process-level changes with compliance implications should always involve human review. Tests that were poorly designed remain poorly designed after self-healing; they’re just maintained more efficiently. The right mental model is that self-healing eliminates the maintenance overhead of UI-level changes, freeing QA teams to focus on higher-value work like test design, exploratory testing, and release readiness assessment.