planning: Automation testing strategy for Jan #4907

david-menloai opened this issue Apr 14, 2025 · 1 comment

Problem Statement:

Currently, creating automated UI tests involves a significant amount of manual effort and technical expertise. The typical process includes:

  • Manually Identifying Element Locators: Finding stable CSS selectors, XPaths, IDs, etc., for each UI element the test interacts with. This is often brittle and time-consuming.
  • Coding User Interactions: Writing code (e.g., using Selenium, Playwright, Cypress) to simulate user actions like clicking buttons, typing text, and selecting dropdowns, based on the identified locators.
  • Structuring Tests: Grouping individual actions into logical test steps and combining steps to form complete test scenarios.
  • Maintenance: Updating locators and scripts whenever the UI changes.

This multi-step, code-intensive process acts as a bottleneck: it slows down test creation, increases maintenance overhead, and requires specialized automation engineers. The sketch below illustrates what this looks like in practice.
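To make the problem concrete, here is what a hand-written Playwright test for the hardware-usage scenario might look like today. The selectors and URL are invented for illustration (they are not Jan's real DOM attributes), but they show the locator-heavy, brittle code every scenario currently requires:

```typescript
import { test, expect } from '@playwright/test';

test('view hardware usage', async ({ page }) => {
  await page.goto('http://localhost:3000'); // hypothetical dev-server URL

  // Hard-coded selectors like these break as soon as a class name,
  // label, or layout detail changes -- exactly the maintenance burden
  // described above.
  await page.locator('div.footer button.system-monitor-toggle').click();
  await expect(page.locator('span.cpu-usage-value')).toBeVisible();
  await expect(page.locator('span.memory-usage-value')).toBeVisible();
});
```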

Proposed Solution:

We propose the development of an AI Test Agent designed to drastically simplify UI test automation. The core idea is to shift the low-level implementation details (locator finding, action simulation, step coding) from the human user to an AI.

The intended workflow would be:

  1. User Input: The user provides high-level test scenarios written in a Behavior-Driven Development (BDD) format (e.g., Gherkin syntax: Given/When/Then). Example:

```gherkin
Scenario: View Jan hardware usage
  Given I open the Jan application
  When I click the 'System Monitor' button on the bottom right
  Then I should see current hardware usage for CPU and Memory
```

  2. AI Test Agent Processing: The AI agent takes the BDD scenario as input and performs the following (a minimal sketch of this loop follows the list):
  • Parses BDD Steps: Understands the intent described in each Given/When/Then step.
  • Interprets Actions: Maps natural-language descriptions (e.g., "click the 'System Monitor' button") to corresponding UI interactions.
  • Dynamic Element Location: Intelligently identifies the target UI elements (e.g., the button labeled "System Monitor", the labels "CPU" and "Memory") at runtime, potentially using a combination of visual AI and DOM analysis, reducing reliance on brittle selectors.
  • Executes Actions: Uses an underlying browser automation driver (Playwright) to perform the identified actions on the application under test.
  • Performs Validations: Checks for the expected outcomes described in the "Then" steps.
  3. Output: The agent executes the test and reports the results (pass/fail) for the BDD scenario.
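As a sketch of that loop: the agent parses each step, hands the step plus a compact page snapshot to a model, and executes the structured action the model returns via Playwright. Everything below is illustrative rather than a committed design: the `AgentAction` schema and `askModel` helper are invented placeholders; only the Playwright calls are real API.

```typescript
import { Page } from 'playwright';

// Hypothetical action schema the model would be asked to emit per step.
type AgentAction =
  | { kind: 'click'; target: string }
  | { kind: 'type'; target: string; value: string }
  | { kind: 'assertVisible'; target: string };

// Placeholder for the LLM call: given a BDD step and a page snapshot,
// return one structured action. The real prompt/model is out of scope.
async function askModel(step: string, snapshot: string): Promise<AgentAction> {
  throw new Error('LLM integration not implemented in this sketch');
}

async function runScenario(page: Page, steps: string[]): Promise<void> {
  for (const step of steps) {
    // Full page HTML can exceed the model's context window on large
    // DOMs (see the comment further down), so a trimmed snapshot is safer.
    const snapshot = await page.content();
    const action = await askModel(step, snapshot);

    switch (action.kind) {
      case 'click':
        // Locate by visible text rather than brittle CSS/XPath.
        await page.getByText(action.target, { exact: false }).first().click();
        break;
      case 'type':
        await page.getByLabel(action.target).fill(action.value);
        break;
      case 'assertVisible':
        if (!(await page.getByText(action.target).first().isVisible())) {
          throw new Error(`Step failed: expected to see "${action.target}"`);
        }
        break;
    }
  }
}
```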

Goals & Motivation:

  • Reduce Manual Effort: Significantly decrease the time and effort required to create and maintain UI automation scripts.
  • Accelerate Test Creation: Enable faster development of new automated tests.
  • Lower Technical Barrier: Allow team members with less coding experience (e.g., manual QAs, BAs) to contribute effectively to test automation by focusing on writing BDD scenarios.
  • Improve Maintainability: Reduce test fragility by relying on AI for element location, potentially making tests more resilient to minor UI changes.
  • Focus on Behavior: Shift the team's focus from how to automate (implementation details) to what to test (application behavior).

Acceptance Criteria (Initial High-Level):

  • The system accepts BDD feature files or plain text scenarios as input.
  • The AI Test Agent can parse common Gherkin keywords (Given, When, Then, And, But).
  • The AI agent can interpret and execute basic UI actions (e.g., navigate, click, type, select, verify text presence).
  • The AI agent can dynamically locate target UI elements based on textual descriptions (e.g., labels, placeholders, button text) within the BDD steps (see the locator sketch after this list).
  • Test execution results are reported clearly, indicating pass/fail status for scenarios/steps.
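On the dynamic-location criterion: Playwright's built-in semantic locators (`getByRole`, `getByLabel`, `getByPlaceholder`, `getByText`) already map textual descriptions to elements, so the agent could fall back through them. A minimal sketch; the strategy ordering is an assumption, not a settled design:

```typescript
import { Page, Locator } from 'playwright';

// Resolve a textual description from a BDD step to a locator.
// Locator.or() matches elements satisfying either alternative, so
// whichever semantic strategy finds the element contributes matches.
function locateByDescription(page: Page, description: string): Locator {
  return page
    .getByRole('button', { name: description })        // button text
    .or(page.getByLabel(description))                  // form labels
    .or(page.getByPlaceholder(description))            // placeholders
    .or(page.getByText(description, { exact: false })) // visible text
    .first(); // avoid strict-mode failures when several strategies match
}

// Usage, e.g. for the System Monitor step:
//   await locateByDescription(page, 'System Monitor').click();
```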

Potential Challenges & Considerations:

  • Reliability and accuracy of AI-driven element identification.
  • Handling ambiguity in natural language BDD steps.
  • Performance overhead of AI analysis during test execution.
  • Integration with existing testing frameworks and CI/CD pipelines.
  • Managing complex interactions, waits, and synchronization issues (see the note below).
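On the last point, one mitigating factor: Playwright's web-first assertions retry until a timeout, which absorbs most rendering and async delays without hand-rolled sleeps, so the agent's "Then" validations could lean on them. A minimal illustration (the 10-second timeout is an arbitrary example value):

```typescript
import { expect, Page } from '@playwright/test';

// expect(locator).toBeVisible() polls until the element appears or the
// timeout expires -- no explicit waits or sleeps needed in agent code.
async function verifyTextVisible(page: Page, text: string): Promise<void> {
  await expect(page.getByText(text).first()).toBeVisible({ timeout: 10_000 });
}
```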
david-menloai added the QA label on Apr 14, 2025
david-menloai self-assigned this on Apr 14, 2025
github-project-automation bot moved this to Investigating in Menlo on Apr 14, 2025

david-menloai commented Apr 14, 2025

A problem with the current tool-based implementation is the context length limitation.

```
{'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 206607 tokens (205528 in the messages, 1079 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
```

❌ This happens with a demo website with a huge DOM 😅
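One mitigation worth prototyping: prune the snapshot before it reaches the model, since most of a large DOM (scripts, styles, deeply nested wrappers) carries no interaction signal. A rough sketch; the choice of which elements count as "interactive" is a guess, and an accessibility-tree snapshot could serve the same purpose:

```typescript
import { Page } from 'playwright';

// Instead of sending page.content() (the full HTML) to the model,
// extract only interactive elements plus their visible text/labels.
async function compactSnapshot(page: Page): Promise<string> {
  return page.evaluate(() => {
    const selector =
      'button, a, input, select, textarea, [role], [aria-label]';
    const lines: string[] = [];
    document.querySelectorAll(selector).forEach((el) => {
      const text = (el.textContent ?? '').trim().slice(0, 80);
      const label = el.getAttribute('aria-label') ?? '';
      lines.push(`<${el.tagName.toLowerCase()}> "${text || label}"`);
    });
    return lines.join('\n');
  });
}
```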
