Demonstration Data
Every aspect of our platform is designed to generate high-quality training data for multimodal computer-use agents. From AI-generated tasks to expert demonstrations, we capture comprehensive data that helps train AI to understand and replicate human computer interactions.
Demonstration Files Structure
Each recorded demonstration consists of multiple files:
input_log.jsonl - Detailed event log of all user interactions
meta.json - Metadata and configuration information about the demonstration
recording.mp4 - Video recording of the screen during the demonstration
When you record a demonstration in the Training Gym, the system captures all your interactions in a standardized JSON format. Each user interaction is saved as a separate event in a JSONL file (JSON Lines format: one JSON object per line) called input_log.jsonl. Understanding this format can help you create more effective demonstrations.
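Because each line of input_log.jsonl is an independent JSON object, the file can be read one line at a time with Python's standard json module. The sketch below is a minimal reader, not part of the platform; the function names iter_events and load_events are hypothetical, and it assumes the file is UTF-8 encoded.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed event per non-blank JSON Lines record."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc


def load_events(path: str) -> list[dict]:
    """Read every event from an input_log.jsonl file (assumed UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return list(iter_events(f))


if __name__ == "__main__":
    sample = [
        '{"event": "mousemove", "data": {}, "time": 1000}',
        '{"event": "mousedown", "data": {}, "time": 1050}',
        "",
    ]
    # Count how many events of each type the sample contains.
    print(Counter(e["event"] for e in iter_events(sample)))
```

Reading lazily through a generator keeps memory use flat even for long recordings, since events are parsed only as they are consumed.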
Event Structure
Each recorded event has the following structure:
event: The type of interaction (e.g., mousemove, mousedown, keydown)
data: Object containing event-specific details
time: Timestamp in milliseconds when the event occurred
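The three fields above map naturally onto a small typed record. This is a minimal sketch, assuming every event carries exactly these three keys; the Event class, from_dict, and elapsed_ms are illustrative names rather than part of any platform API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Event:
    """One record from input_log.jsonl."""
    event: str             # interaction type, e.g. "mousemove"
    data: dict[str, Any]   # event-specific details
    time: float            # timestamp in milliseconds

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Event":
        """Check the three documented fields and build an Event."""
        if not isinstance(raw.get("event"), str):
            raise ValueError("missing or non-string 'event'")
        if not isinstance(raw.get("data"), dict):
            raise ValueError("missing or non-object 'data'")
        t = raw.get("time")
        if isinstance(t, bool) or not isinstance(t, (int, float)):
            raise ValueError("missing or non-numeric 'time'")
        return cls(event=raw["event"], data=raw["data"], time=float(t))


def elapsed_ms(events: list[Event]) -> float:
    """Span between the earliest and latest event, in milliseconds."""
    if not events:
        return 0.0
    times = [e.time for e in events]
    return max(times) - min(times)
```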
Event Types
The system records the following types of events:
Accessibility Tree Events (Windows)
axtree
Provides comprehensive information about the accessibility tree, capturing the structure and properties of all on-screen elements from every open application.
duration: Time in milliseconds that the scan took
focused_element: Details about the currently focused UI element
queries: Information about elements at specific positions
tree: Full hierarchical structure of all visible UI elements, including:
bbox: Bounding box (position and dimensions)
children: Nested UI elements
description: Element description
name: Element name
role: Element type (e.g., "Pane", "Text", "Button", "Image")
states: Element properties (enabled, focusable, visible)
value: Element content value
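Because the tree is nested through children, most processing of an axtree event is a traversal. The sketch below assumes each node is a JSON object with the role, name, and children keys listed above; the exact shape of the top-level tree value is not specified here, so treat that as an assumption. walk, find_by_role, and outline are hypothetical helpers. An explicit stack is used instead of recursion so that very deep trees cannot hit Python's recursion limit.

```python
from typing import Any, Iterator

Node = dict[str, Any]


def walk(tree: Node) -> Iterator[tuple[int, Node]]:
    """Depth-first, pre-order traversal yielding (depth, node) pairs."""
    stack = [(0, tree)]
    while stack:
        depth, node = stack.pop()
        yield depth, node
        children = node.get("children") or []
        # Push in reverse so the first child is visited next (pre-order).
        stack.extend((depth + 1, child) for child in reversed(children))


def find_by_role(tree: Node, role: str) -> list[str]:
    """Names of all elements whose role matches exactly, e.g. "Button"."""
    return [n.get("name", "") for _, n in walk(tree) if n.get("role") == role]


def outline(tree: Node) -> str:
    """Indented text outline of the tree, one element per line."""
    return "\n".join(
        f"{'  ' * depth}{node.get('role', '?')}: {node.get('name', '')}"
        for depth, node in walk(tree)
    )
```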
Mouse Events
mousemove
Tracks the mouse cursor position on screen.
mousedown
Recorded when a mouse button is pressed.
mouseup
Recorded when a mouse button is released.
mousewheel
Tracks scrolling with positive values for scrolling down and negative for scrolling up.
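The data payloads of mouse events are not described in this section, so the sketch below assumes hypothetical x and y fields on mousemove and a hypothetical delta field on mousewheel; check real recordings for the actual names. Under the sign convention above, summing the wheel deltas gives the net scroll direction.

```python
from typing import Any, Iterable

RawEvent = dict[str, Any]


def net_scroll(events: Iterable[RawEvent], delta_key: str = "delta") -> float:
    """Sum mousewheel deltas: positive means net scroll down, negative net up."""
    return sum(
        e["data"].get(delta_key, 0) for e in events if e.get("event") == "mousewheel"
    )


def scroll_direction(events: Iterable[RawEvent], delta_key: str = "delta") -> str:
    """Summarize the net wheel movement as "down", "up", or "none"."""
    total = net_scroll(events, delta_key)
    if total > 0:
        return "down"
    if total < 0:
        return "up"
    return "none"


def cursor_path(events: Iterable[RawEvent], x_key: str = "x", y_key: str = "y"):
    """(time, x, y) samples from mousemove events, in recorded order."""
    return [
        (e["time"], e["data"][x_key], e["data"][y_key])
        for e in events
        if e.get("event") == "mousemove"
    ]
```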
Keyboard Events
keydown
Recorded when a key is pressed.
keyup
Recorded when a key is released.
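Since presses and releases are recorded as separate events, pairing them recovers how long each key was held. This minimal sketch assumes a hypothetical key field in data holding the key identifier; key_presses is an illustrative name. It keeps the first keydown when a key auto-repeats and ignores a keyup that has no matching keydown.

```python
from typing import Any, Iterable


def key_presses(
    events: Iterable[dict[str, Any]], key_field: str = "key"
) -> list[tuple[str, float, float]]:
    """Pair each keydown with the next keyup of the same key.

    Returns (key, pressed_at_ms, held_for_ms) tuples in order of release.
    """
    pressed: dict[str, float] = {}          # key -> time of its first keydown
    presses: list[tuple[str, float, float]] = []
    for e in events:
        kind = e.get("event")
        if kind not in ("keydown", "keyup"):
            continue
        key = e["data"][key_field]
        if kind == "keydown":
            pressed.setdefault(key, e["time"])  # ignore auto-repeat keydowns
        elif key in pressed:
            start = pressed.pop(key)
            presses.append((key, start, e["time"] - start))
    return presses
```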
Key Identifiers
The system supports both Windows and Mac key formats:
Windows Format Keys
Letters: A, B, C, ... Z
Numbers: Zero, One, Two, ... Nine
Modifiers: Shift, LeftCtrl, RightCtrl, LeftAlt, RightAlt
Navigation: Left, Right, Up, Down, Home, End, PageUp, PageDown
Special: Space, Return, Backspace, Escape, Tab, Delete, etc.
Symbols: BackTick, BackSlash, ForwardSlash, Plus, Minus, etc.
Mac Format Keys
Letters: KeyA, KeyB, KeyC, ... KeyZ
Numbers: Num0, Num1, Num2, ... Num9
Modifiers: ShiftLeft, ShiftRight, ControlLeft, ControlRight, Alt, AltGr, etc.
Navigation: LeftArrow, RightArrow, UpArrow, DownArrow
Function keys: F1, F2, ... F12
Symbols: BackQuote, Equal, Minus, LeftBracket, RightBracket, etc.
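When combining recordings from both platforms, it can help to map one naming scheme onto the other. The sketch below converts Mac-format names to Windows-format names for the pairs that correspond directly in the two lists above and passes everything else through unchanged. normalize_key is a hypothetical helper and the table is an illustrative subset, not an official mapping. ShiftLeft and ShiftRight both collapse to Shift because the Windows list has a single Shift key, and Alt and AltGr are left as-is because their Windows equivalents are not stated.

```python
import re

_DIGIT_WORDS = ["Zero", "One", "Two", "Three", "Four",
                "Five", "Six", "Seven", "Eight", "Nine"]

# Mac-format name -> Windows-format name (illustrative subset only).
_MAC_TO_WINDOWS = {
    "ShiftLeft": "Shift",
    "ShiftRight": "Shift",
    "ControlLeft": "LeftCtrl",
    "ControlRight": "RightCtrl",
    "LeftArrow": "Left",
    "RightArrow": "Right",
    "UpArrow": "Up",
    "DownArrow": "Down",
    "BackQuote": "BackTick",
}


def normalize_key(name: str) -> str:
    """Map a Mac-format key name to its Windows-format equivalent.

    Windows-format names, and names with no known equivalent, are
    returned unchanged.
    """
    if m := re.fullmatch(r"Key([A-Z])", name):      # KeyA -> A
        return m.group(1)
    if m := re.fullmatch(r"Num([0-9])", name):      # Num7 -> Seven
        return _DIGIT_WORDS[int(m.group(1))]
    return _MAC_TO_WINDOWS.get(name, name)
```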
Event Processing
The demonstration system processes these raw events into higher-level actions:
mouseclick: A brief press and release at nearly the same position
mousedrag: A sequence of mouse movements with the button held down
type: A sequence of character inputs combined into text
hotkey: Special key combinations like keyboard shortcuts
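The exact rules the platform uses for this step are not documented here, so the following is only a sketch of how such grouping could work. It assumes hypothetical data fields (x and y on mouse button events, key holding Windows-format names on keyboard events) and hypothetical click thresholds. For simplicity it classifies a press by the distance between its mousedown and mouseup positions rather than by inspecting the intermediate mousemove events, and it treats a lone non-character key such as Return as a one-key hotkey.

```python
import math
from typing import Any, Optional

# Hypothetical thresholds; the platform's real values are not documented.
CLICK_MAX_DISTANCE_PX = 5.0
CLICK_MAX_DURATION_MS = 500.0

MODIFIERS = {"Shift", "LeftCtrl", "RightCtrl", "LeftAlt", "RightAlt"}
DIGITS = ["Zero", "One", "Two", "Three", "Four",
          "Five", "Six", "Seven", "Eight", "Nine"]

RawEvent = dict[str, Any]


def classify_press(down: RawEvent, up: RawEvent) -> Optional[str]:
    """Label a mousedown/mouseup pair as "mouseclick" or "mousedrag"."""
    dist = math.hypot(up["data"]["x"] - down["data"]["x"],
                      up["data"]["y"] - down["data"]["y"])
    if dist > CLICK_MAX_DISTANCE_PX:
        return "mousedrag"   # pointer moved while the button was held
    if up["time"] - down["time"] <= CLICK_MAX_DURATION_MS:
        return "mouseclick"  # brief press at (nearly) the same position
    return None              # long stationary press: unclassified here


def to_char(key: str, shift: bool) -> Optional[str]:
    """Printable character for a Windows-format key name, if known."""
    if len(key) == 1 and key.isalpha():
        return key.upper() if shift else key.lower()
    if key in DIGITS and not shift:
        return str(DIGITS.index(key))
    if key == "Space":
        return " "
    return None  # symbols and shifted digits depend on keyboard layout


def group_keys(events: list[RawEvent]) -> list[tuple[str, str]]:
    """Fold keydown/keyup events into ("type", text) and ("hotkey", combo)."""
    actions: list[tuple[str, str]] = []
    held: set[str] = set()   # modifiers currently pressed
    text: list[str] = []     # characters of the current "type" run

    def flush() -> None:
        if text:
            actions.append(("type", "".join(text)))
            text.clear()

    for e in events:
        if e.get("event") not in ("keydown", "keyup"):
            continue
        key, is_down = e["data"]["key"], e["event"] == "keydown"
        if key in MODIFIERS:
            if is_down:
                held.add(key)
            else:
                held.discard(key)
            continue
        if not is_down:
            continue  # only presses produce actions
        # Any modifier other than Shift turns the press into a hotkey.
        ch = None if held - {"Shift"} else to_char(key, "Shift" in held)
        if ch is not None:
            text.append(ch)
        else:
            flush()
            actions.append(("hotkey", "+".join(sorted(held) + [key])))
    flush()
    return actions
```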
Tips for Quality Recordings
Clear Actions: Make deliberate, clear mouse movements and clicks
Consistent Typing: Type at a steady pace to ensure accurate capture
Complete Workflows: Ensure you capture all steps without skipping any
Minimize Errors: While the system can handle corrections, try to minimize misclicks or typing errors
Demonstration Metadata (meta.json)
The meta.json file contains important contextual information about each demonstration.
Video Recording (recording.mp4)
Each demonstration includes a screen recording that shows exactly what the user saw and did during the task. This provides:
Visual context for all the recorded interactions
Confirmation of the UI state during each action
A reference for how the task should be completed
Visual verification for quality evaluation
Understanding all components of the demonstration data helps you create high-quality submissions that achieve better scores in the grading system, maximizing both AI training effectiveness and your rewards.