Demonstration Data
Every aspect of our platform is designed to generate high-quality training data for multimodal computer-use agents. From AI-generated tasks to expert demonstrations, we capture comprehensive data that helps train AI to understand and replicate human computer interactions.
Demonstration Files Structure
Each recorded demonstration consists of multiple files:
input_log.jsonl - Detailed event log of all user interactions
meta.json - Metadata and configuration information about the demonstration
recording.mp4 - Video recording of the screen during the demonstration
When you record a demonstration in the Training Gym, the system captures all your interactions in a standardized JSON format. Each user interaction is saved as a separate event in a JSON Lines (JSONL) file called input_log.jsonl. Understanding this format can help you create more effective demonstrations.
Event Structure
Each recorded event has the following structure:
{
  "event": "eventType",
  "data": {
    // Event-specific properties
  },
  "time": 1234567890
}
event: The type of interaction (e.g., mousemove, mousedown, keydown)
data: Object containing event-specific details
time: Timestamp in milliseconds when the event occurred
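As a rough illustration of working with this format, here is a minimal Python sketch that reads input_log.jsonl line by line and counts how many events of each type a recording contains. The file name comes from the file list above; everything else is illustrative:

import json
from collections import Counter

def count_event_types(path="input_log.jsonl"):
    """Count how many events of each type appear in a demonstration log."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)  # one JSON object per line (JSONL)
            counts[event["event"]] += 1
    return counts

for event_type, n in count_event_types().most_common():
    print(f"{event_type}: {n}")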
Event Types
The system records the following types of events:
Accessibility Tree Events (Windows)
axtree
{ "event": "axtree", "data": { "duration": 5645, "focused_element": { "bbox": { "height": 600, "width": 800, "x": 722, "y": 335 }, "children": [], "description": "", "name": "", "role": "Pane", "states": { "enabled": true, "keyboard_focusable": false, "visible": true }, "value": "" }, "queries": { "cursor": { "element": { /* Element details */ }, "position": { "x": 1270, "y": 688 } }, "random1": { "element": { /* Element details */ }, "position": { "x": 378, "y": 400 } }, "random2": { "element": { /* Element details */ }, "position": { "x": 1985, "y": 813 } } }, "tree": [ /* Array of UI elements and their properties */ ] }, "time": 1741119497902 }
Provides comprehensive information about the accessibility tree, capturing the structure and properties of all on-screen elements from every open application.
duration: Time in milliseconds that the scan took
focused_element: Details about the currently focused UI element
queries: Information about elements at specific positions
tree: Full hierarchical structure of all visible UI elements, including:
bbox: Bounding box (position and dimensions)
children: Nested UI elements
description: Element description
name: Element name
role: Element type (e.g., "Pane", "Text", "Button", "Image")
states: Element state flags (e.g., enabled, keyboard_focusable, visible)
value: Element content value
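To give a sense of how the tree array can be consumed, the following minimal Python sketch recursively collects every element with a given role. The field names tree, children, role, name, and bbox come from the structure above; the function itself is illustrative, not part of the platform:

def find_by_role(elements, role):
    """Recursively collect UI elements of a given role from an axtree 'tree' array."""
    matches = []
    for element in elements:
        if element.get("role") == role:
            matches.append(element)
        # descend into nested UI elements
        matches.extend(find_by_role(element.get("children", []), role))
    return matches

# Example: list every button in one axtree snapshot, where axtree_event is
# a parsed line from input_log.jsonl with "event": "axtree"
# buttons = find_by_role(axtree_event["data"]["tree"], "Button")
# for b in buttons:
#     print(b["name"], b["bbox"])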
Mouse Events
mousemove
{ "event": "mousemove", "data": { "x": 123, "y": 456 }, "time": 1234567890 }
Tracks the mouse cursor position on screen.
mousedown
{ "event": "mousedown", "data": { "x": 123, "y": 456, "button": "Left" }, "time": 1234567890 }
Recorded when a mouse button is pressed.
mouseup
{ "event": "mouseup", "data": { "x": 123, "y": 456, "button": "Left" }, "time": 1234567890 }
Recorded when a mouse button is released.
mousewheel
{ "event": "mousewheel", "data": { "delta": -120 }, "time": 1234567890 }
Tracks scrolling with positive values for scrolling down and negative for scrolling up.
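As a small illustration of how these events combine, the sketch below (Python, assuming a list of events already parsed from input_log.jsonl) extracts the cursor path and the net scroll from a recording:

def summarize_mouse_activity(events):
    """Extract the cursor path and net scroll from parsed input_log.jsonl events."""
    path = []        # (time, x, y) samples of the cursor position
    net_scroll = 0   # running sum of mousewheel deltas
    for e in events:
        if e["event"] == "mousemove":
            path.append((e["time"], e["data"]["x"], e["data"]["y"]))
        elif e["event"] == "mousewheel":
            net_scroll += e["data"]["delta"]
    return path, net_scroll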
Keyboard Events
keydown
{ "event": "keydown", "data": { "key": "A" }, "time": 1234567890 }
Recorded when a key is pressed.
keyup
{ "event": "keyup", "data": { "key": "A" }, "time": 1234567890 }
Recorded when a key is released.
Key Identifiers
The system supports both Windows and Mac key formats:
Windows Format Keys
Letters: A, B, C, ... Z
Numbers: Zero, One, Two, ... Nine
Modifiers: Shift, LeftCtrl, RightCtrl, LeftAlt, RightAlt
Navigation: Left, Right, Up, Down, Home, End, PageUp, PageDown
Special: Space, Return, Backspace, Escape, Tab, Delete, etc.
Symbols: BackTick, BackSlash, ForwardSlash, Plus, Minus, etc.
Mac Format Keys
Letters: KeyA, KeyB, KeyC, ... KeyZ
Numbers: Num0, Num1, Num2, ... Num9
Modifiers: ShiftLeft, ShiftRight, ControlLeft, ControlRight, Alt, AltGr, etc.
Navigation: LeftArrow, RightArrow, UpArrow, DownArrow
Function keys: F1, F2, ... F12
Symbols: BackQuote, Equal, Minus, LeftBracket, RightBracket, etc.
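Because the same physical key can appear under either naming scheme, tooling that consumes recordings from both platforms typically normalizes key names first. The sketch below is a minimal, illustrative Python example; the mapping covers only a handful of keys, not the full set listed above:

# Partial mapping from Mac-format key names to Windows-format equivalents.
# Only a few keys are shown for illustration; a real table would cover the full set.
MAC_TO_WINDOWS = {
    "KeyA": "A",
    "Num0": "Zero",
    "ShiftLeft": "Shift",
    "ControlLeft": "LeftCtrl",
    "LeftArrow": "Left",
    "BackQuote": "BackTick",
}

def normalize_key(key):
    """Return the Windows-format name for a key, passing through unknown names."""
    return MAC_TO_WINDOWS.get(key, key)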
Event Processing
The demonstration system processes these raw events into higher-level actions:
mouseclick: A brief press and release at nearly the same position
mousedrag: A sequence of mouse movements with the button held down
type: A sequence of character inputs combined into text
hotkey: Special key combinations like keyboard shortcuts
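As a rough sketch of this kind of aggregation, the Python example below pairs mousedown and mouseup events into mouseclick actions. The 5-pixel and 500-millisecond thresholds are arbitrary values chosen for the example, not the platform's actual parameters:

def extract_clicks(events, max_dist=5, max_ms=500):
    """Pair mousedown/mouseup events into click actions.

    A click is a press and release of the same button within max_ms
    milliseconds and within max_dist pixels (Manhattan distance) of the
    press position.
    """
    clicks = []
    pending = {}  # button -> mousedown event awaiting its release
    for e in events:
        if e["event"] == "mousedown":
            pending[e["data"]["button"]] = e
        elif e["event"] == "mouseup":
            down = pending.pop(e["data"]["button"], None)
            if down is None:
                continue  # release without a matching press
            moved = abs(e["data"]["x"] - down["data"]["x"]) + abs(e["data"]["y"] - down["data"]["y"])
            if moved <= max_dist and e["time"] - down["time"] <= max_ms:
                clicks.append({
                    "action": "mouseclick",
                    "button": e["data"]["button"],
                    "x": down["data"]["x"],
                    "y": down["data"]["y"],
                    "time": down["time"],
                })
    return clicks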
Tips for Quality Recordings
Clear Actions: Make deliberate, clear mouse movements and clicks
Consistent Typing: Type at a steady pace to ensure accurate capture
Complete Workflows: Ensure you capture all steps without skipping any
Minimize Errors: While the system can handle corrections, try to minimize misclicks or typing errors
Demonstration Metadata (meta.json)
The meta.json file contains important contextual information about each demonstration:
{
  "id": "20250304_151816", // Unique identifier for the demonstration (timestamp format)
  "timestamp": "2025-03-04T15:18:16.813394600-05:00", // When the demonstration was recorded (ISO 8601 format)
  "duration_seconds": 21, // Total length of the demonstration in seconds
  "status": "completed", // Current status of the demonstration (completed, in-progress, aborted)
  "title": "Find Kendrick Lamar concert tickets", // Title of the demonstration task
  "description": "", // Optional additional description
  "platform": "windows", // Operating system used (windows, macos, linux)
  "arch": "x86_64", // System architecture
  "version": "10.0.26100", // OS version
  "locale": "en-GB", // System language/region setting
  "primary_monitor": { // Information about the main display
    "width": 3440, // Screen width in pixels
    "height": 1440 // Screen height in pixels
  },
  "reason": "fail", // Why the demonstration ended (fail = task failed, done = task completed)
  "quest": { // Information about the assigned task
    "title": "Find Kendrick Lamar concert tickets", // Task title
    "app": "StubHub", // Primary application to be used
    "icon_url": "https://s2.googleusercontent.com/s2/favicons?domain=stubhub.com&sz=64", // Icon for the application
    "objectives": [ // Step-by-step objectives for the task
      "Open <app>StubHub</app> website in your browser",
      "Search for Kendrick Lamar concert in Inglewood",
      "Select quantity of tickets and apply price filter",
      "Review available options before purchase"
    ],
    "content": "Hi! I need to find 4 tickets to the Kendrick Lamar concert in Inglewood, and I'm trying to keep it under $500. Can you assist me with using StubHub?", // Task instructions in natural language
    "pool_id": "67af7cf3a903635118192a5d", // Identifier for the task pool
    "reward": { // Reward information
      "time": 1741119480000, // Timestamp when reward was calculated
      "max_reward": 7 // Maximum possible reward for completing this task (in $VIRAL)
    }
  }
}
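A minimal Python sketch for loading this metadata and printing the task objectives is shown below. It assumes the actual meta.json on disk is plain JSON (the // comments in the example above are annotations for this page) and sits alongside the other demonstration files:

import json

def load_meta(path="meta.json"):
    """Load a demonstration's metadata file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

meta = load_meta()
print(meta["title"], f'({meta["duration_seconds"]}s, {meta["status"]})')
for objective in meta["quest"]["objectives"]:
    print("-", objective)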
Video Recording (recording.mp4)
Each demonstration includes a screen recording that shows exactly what the user saw and did during the task. This provides:
Visual context for all the recorded interactions
Confirmation of the UI state during each action
A reference for how the task should be completed
Visual verification for quality evaluation
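One common use is lining the video up with the event log. The sketch below (Python) converts an event's millisecond timestamp into an offset into recording.mp4, assuming the recording starts at roughly the same moment as the first logged event; that alignment is an assumption made for illustration, not a documented guarantee:

def event_video_offset(events, event):
    """Seconds into recording.mp4 at which a given event occurred.

    Assumes the recording begins at (approximately) the time of the
    first logged event; this is an illustrative assumption.
    """
    start_ms = events[0]["time"]
    return (event["time"] - start_ms) / 1000.0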
Understanding all components of the demonstration data helps you create high-quality submissions that achieve better scores in the grading system, maximizing both AI training effectiveness and your rewards.