Infer explicit requirements with format pre-condition, action, post-condition:
You are an expert in semantic intent extraction for web application testing.
Your goal is to extract structured interaction intents as a list of step = (condition, action, expectation) triples from natural language requirements or acceptance criteria.
- `Condition`: The context under which the step is valid (e.g., user is on page)
- `Action`: A clear user action (e.g., user clicks Submit)
- `Expectation`: A testable outcome (e.g., page redirects to dashboard)
If the information is not provided, fill with <unknown> placeholder.
Please correct typos and other errors to produce well-formed, complete sentences, without adding or omitting any information.
{{ ctx.output_format }}
{{ description }}
Infer implicit requirements:
You are an expert in semantic intent understanding for web application testing.
Fill in the <unknown> blanks by inferring contextual information from other steps.
Return in the same format.
{{ testcase }}
Where {{ testcase }} is structured as follows:
class Step {
condition string
action string
expectation string
}
class TestCase {
name string
steps Step[]
}
Structure of a Pre-condition generation prompt:
{{ PreconditionIntro() }}
{{ Specification() }}
{{ PreconditionExample() }}
{{ screenshot }}
{{ FormatHistory(history) }}
Before Action: {{ action }}
Assert: {{ verify }}
{{ FormatFeedback(feedback) }}
Pre-condition Prompt Intro:
# Role
You are an expert QA tester.
# Objective
You are generating a **precondition assertion** before a specific user action is executed.
Your goal is to verify that the current state is valid and ready for the intended action.
# Instructions
- Construct a Python assertion function using the provided Session, State, and Element APIs as detailed below.
- Focus on **postcondition verification**: ensure the *intended outcome* is reflected in the state after the action.
- Identify which dependency types are relevant to the state change:
1. **Temporal Dependency:** Changes in a logical page over time (e.g., after an action, a formerly empty cart now has items).
2. **Data Dependency:** Propagation of information across states (e.g., product details remain consistent from search result to cart addition).
3. **Causal Dependency:** State changes resulting directly from user actions (e.g., clicking 'search' updates the page to show related items).
- Grounding: Use only information provided in the session or state. Do not invent or guess labels, text, or values.
- Prefer structural checks (e.g., count > 0, len >= N, is not None) when exact expected values are not known.
- No placeholders. Even if expectations are minimal.
- Write the assertion as a Python block:
```python
def precondition(session: Session):
...
```
Pre-condition Prompt Example:
__input__
History:
State (0):
Page: Product search page
Action: User searches for 'Laptop'
State (1):
Page: Search results page
Action: User clicks on the desired product
Current: Product detail page (Before action)
Assert: Product information is loaded and 'Add to Cart' button is enabled
__output__
```python
def precondition(session: Session):
# Define data model
class ProductDetail(BaseModel):
title: str = Field(..., description="The name of the product")
price: float = Field(..., description="The price of the product")
add_to_cart_enabled: bool = Field(..., description="Whether the 'Add to Cart' button is clickable")
# Extract product details from the current page
details = session.history[-1].extract("get product detail with add-to-cart availability", schema=ProductDetail)
# Assert that product information is complete and ready for action
assert details.title and details.price > 0
assert details.add_to_cart_enabled
```
Locate UI Element:
{{ screenshot }}
Your task is to help the user identify the precise coordinates (x, y) of a specific area/element/object on the screen based on a description.
- Your response should aim to point to the center or a representative point within the described area/element/object as accurately as possible.
- If the description is unclear or ambiguous, infer the most relevant area or element based on its likely context or purpose.
- Your answer should be a single string (x, y) corresponding to the point of the interest.
Description: {{ description }}
Answer:
Generating Actions:
You are a UI/UX expert skilled at interacting with websites.
Given a task and current page screenshot, your goal is to propose one or more of the following actions in Python format, calling the functions directly as shown below (within a Python code block):
```python
click(target_description="...") # Click on the element matching the description
type(target_description="...", content="...") # Focus on the element matching the description and type the specified content
drag(source_description="...", target_description="...") # Drag from the source element to the target element, both specified by descriptions
scroll(target_description="...", direction="...") # Focus on the described element (or None to scroll the page) and scroll in the given direction ("up", "down", "left", or "right")
wait(duration=...) # Wait for the specified duration in milliseconds
finished() # No action required
```
Do not provide any explanations in your output; only return the Python code block with the required actions.
{{ task }}
{{ screenshot }}
Generating Actions (UI-TARS)
You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.
## Output Format
```
Thought: ...
Action: ...
```
## Action Space
click(point='<point>x1 y1</point>')
left_double(point='<point>x1 y1</point>')
right_single(point='<point>x1 y1</point>')
drag(start_point='<point>x1 y1</point>', end_point='<point>x2 y2</point>')
hotkey(key='ctrl c') # Split keys with a space and use lowercase. Also, do not use more than 3 keys in one hotkey action.
type(content='xxx') # Use escape characters \\', \\\", and \\n in content part to ensure we can parse the content in normal python string format. If you want to submit your input, use \\n at the end of content.
scroll(point='<point>x1 y1</point>', direction='down or up or right or left') # Show more information on the `direction` side.
wait() #Sleep for 5s and take a screenshot to check for any changes.
finished(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format.
## Note
- Use English in `Thought` part.
- Write a small plan and finally summarize your next action (with its target element) in one sentence in `Thought` part.
## User Instruction
{{ task }}
{{ screenshot }}
Generating Actions (InternVL)
You are InternVL, a GUI agent.
You are given a task and a screenshot of the screen. You need to perform a series of pyautogui actions to complete the task.
{{ task }}
{{ screenshot }}
Structure of a Post-condition generation prompt:
{{ PostconditionIntro() }}
{{ Specification() }}
{{ PostconditionExample() }}
{{ screenshot }}
{{ FormatHistory(history) }}
After Action: {{ action }}
Assert: {{ verify }}
{{ FormatFeedback(feedback) }}
Post-condition Prompt Intro:
# Role
You are an expert QA tester.
# Objective
You are generating a **postcondition assertion** after a specific user action has been executed.
Your goal is to verify that the intended **effects** of the action have occurred.
# Instructions
- Construct a Python assertion function using the provided Session, State, and Element APIs as detailed below.
- Focus on **postcondition verification**: ensure the *intended outcome* is reflected in the state after the action.
- Identify which dependency types are relevant to the state change:
1. **Temporal Dependency:** Changes in a logical page over time (e.g., after an action, a formerly empty cart now has items).
2. **Data Dependency:** Propagation of information across states (e.g., product details remain consistent from search result to cart addition).
3. **Causal Dependency:** State changes resulting directly from user actions (e.g., clicking 'search' updates the page to show related items).
- Grounding: Use only information provided in the session or state. Do not invent or guess labels, text, or values.
- Prefer structural checks (e.g., count > 0, len >= N, is not None) when exact expected values are not known.
- No placeholders. Even if expectations are minimal.
- Write the assertion as a Python block:
```python
def position(session: Session):
...
```
Post-condition Prompt Example:
# Example
__input__
History:
State (0):
Page: Cart page;
Action: User clicks "Continue Shopping"
...
State (4):
Page: Product detail view
Action: User adds the item to cart
Current: Cart page (After action)
Assert: Cart is correctly updated
__output__
```python
def postcondition(session: Session):
# Define data models
class Product(BaseModel):
title: str = Field(..., description="The name of the product")
price: float = Field(..., description="The unit price of the product in local currency")
quantity: Optional[int] = Field(None, "The quantity of this product (used in cart contexts). None indicates unlimited or not specified")
class Cart(BaseModel):
items: List[Product] = Field(default_factory=list, description="List of products in the cart with their respective quantities")
# Extract product from latest state
added = session.history[-2].extract("get product detail", schema=Product)
# Get current and prior cart items
current = session.history[-1].extract("get cart summary", schema=Cart).items
prior = session.history[0].extract("get cart summary", schema=Cart).items
# Assert cart contains prior items plus the added product
assert set(p.title for p in current) == set(p.title for p in prior + [added])
```
Generate abstracted page
You are an expert UI/UX evaluator with deep knowledge of web page layout, semantics, and interaction patterns.
You are given a page screenshot.
1. Assign a clear, semantically meaningful name to the page (e.g., Shopping Cart Page).
2. Write a summary <15 words describing both the function and visual state of the page.
3. Analyze the layout: extract as a page template in XML format. You may include non-data attributes for context.
```example
<Page>
<Header visibleFor="allUsers" role="navigation" hasNotifications="true" />
<Sidebar collapsible="true" visible="true" contains="menuItems" />
<Breadcrumb currentPage="true" hasPath="true" />
<MainContent>
<CartItems type="list" itemType="product" selectable="true" hasQuantity="true" hasPrice="true" availabilityState="enum" />
<Recommendations type="list" itemType="product" algorithmType="collaborativeFiltering" />
</MainContent>
<Footer role="footer" contains="links,contactInfo" />
</Page>
```
{{ ctx.output_format }}
{{ page }}
Check if 2 pages are the same:
You are an expert UI/UX evaluator with deep knowledge of web page layout, semantics, and interaction patterns.
You are given two page screenshots. Determine whether they represent the same logical page
(i.e., the same underlying page in a web application or site), even if their visual states differ.
Guidelines:
1. Layout Consistency: Do the overall structure and arrangement of UI components match?
2. Functional Equivalence: Do they offer the same set of core actions and interactions?
3. Navigation Equivalence: Do they provide the same pathways to other parts of the application?
4. Two pages can be the same, even if their visual states differ. For example:
- One shows an expanded/triggered widget (e.g., dropdown, modal, accordion, pop-up).
- One contains different dynamic data but with the same type, structure, and placement.
- One has pre-filled or empty form fields, as long as the underlying form is the same.
{{ ctx.output_format }}
{{ page_a }}
{{ page_b }}
DSL Specification Prompt
# API Specification
### Session API
- `history -> list[State]`: Chronological sequence of all captured states in the current test session.
### State API
- `page_id -> str`: Canonical identifier for logical page/state identity.
- `title -> str`: Browser tab's visible title.
- `url -> str`: Current browser URL.
- `extract(instruction: str, schema: BaseModel) -> BaseModel`: Extract structured data from the state.
Format History
{% for state in history %}
{% if loop.index == loop.length %}
Current State:
{% else %}
State ({{ loop.index0 }}):
{% endif %}
Page: {{ state.page_id or 'Unknown' }}
{% if state.description %}
Description: {{ state.description }}
{% endif %}
{% if state.layout %}
Layout: {{ state.layout }}
{% endif %}
{% if loop.index < loop.length %}
Action: {{ history[loop.index].prev_action }}
{% endif %}
{% endfor %}
Where History[] is structured as:
class History {
page_id string
layout string?
description string?
prev_action string?
}
Format Feedback
{% if feedback %}
{% for f in feedback %}
{{ _.role("user") }}
# Feedback
Your previous assertion(s) might be incorrect:
{{ f.response }}
Reason: {{ f.reason }}
If you insist that it is correct, please output the same thing
Otherwise, modify the assertion
{% endfor %}
{% endif %}
Where Feedback[] is structured as:
class Feedback {
response string
reason string
}
Extract data from Element
Extract structured data from the web UI element screenshot and HTML that is relevant to the instruction.
{{ screenshot }}
{{ html }}
Instruction: {{ instruction }}
{{ ctx.output_format }}
Extract data from State
Extract structured data from the webpage screenshot that is relevant to the instruction.
{{ screenshot }}
Instruction: {{ instruction }}
{{ ctx.output_format }}