📊Demonstration Data

Demonstration Data

Every aspect of our platform is designed to generate high-quality training data for multimodal computer-use agents. From AI-generated tasks to expert demonstrations, we capture comprehensive data that helps train AI to understand and replicate human computer interactions.

Demonstration Files Structure

Each recorded demonstration consists of multiple files:

  1. input_log.jsonl - Detailed event log of all user interactions

  2. meta.json - Metadata and configuration information about the demonstration

  3. recording.mp4 - Video recording of the screen during the demonstration

When you record a demonstration in the Training Gym, the system captures all your interactions in a standardized JSON format. Each user interaction is saved as a separate event in a JSONL file (JSON Lines format) called input_log.jsonl. Understanding this format can help you create more effective demonstrations.

Event Structure

Each recorded event has the following structure:

{
  "event": "eventType",
  "data": {
    // Event-specific properties
  },
  "time": 1234567890
}
  • event: The type of interaction (e.g., mousemove, mousedown, keydown)

  • data: Object containing event-specific details

  • time: Timestamp in milliseconds when the event occurred

Event Types

The system records the following types of events:

Accessibility Tree Events (Windows)

  1. axtree

    Provides comprehensive information about the accessibility tree, capturing the structure and properties of all on-screen elements from every open application.

    • duration: Time in milliseconds that the scan took

    • focused_element: Details about the currently focused UI element

    • queries: Information about elements at specific positions

    • tree: Full hierarchical structure of all visible UI elements, including:

      • bbox: Bounding box (position and dimensions)

      • children: Nested UI elements

      • description: Element description

      • name: Element name

      • role: Element type (e.g., "Pane", "Text", "Button", "Image")

      • states: Element properties (enabled, focusable, visible)

      • value: Element content value

Mouse Events

  1. mousemove

    Tracks the mouse cursor position on screen.

  2. mousedown

    Recorded when a mouse button is pressed.

  3. mouseup

    Recorded when a mouse button is released.

  4. mousewheel

    Tracks scrolling with positive values for scrolling down and negative for scrolling up.

Keyboard Events

  1. keydown

    Recorded when a key is pressed.

  2. keyup

    Recorded when a key is released.

Key Identifiers

The system supports both Windows and Mac key formats:

Windows Format Keys

  • Letters: A, B, C, ... Z

  • Numbers: Zero, One, Two, ... Nine

  • Modifiers: Shift, LeftCtrl, RightCtrl, LeftAlt, RightAlt

  • Navigation: Left, Right, Up, Down, Home, End, PageUp, PageDown

  • Special: Space, Return, Backspace, Escape, Tab, Delete, etc.

  • Symbols: BackTick, BackSlash, ForwardSlash, Plus, Minus, etc.

Mac Format Keys

  • Letters: KeyA, KeyB, KeyC, ... KeyZ

  • Numbers: Num0, Num1, Num2, ... Num9

  • Modifiers: ShiftLeft, ShiftRight, ControlLeft, ControlRight, Alt, AltGr, etc.

  • Navigation: LeftArrow, RightArrow, UpArrow, DownArrow

  • Function keys: F1, F2, ... F12

  • Symbols: BackQuote, Equal, Minus, LeftBracket, RightBracket, etc.

Event Processing

The demonstration system processes these raw events into higher-level actions:

  1. mouseclick: A brief press and release at nearly the same position

  2. mousedrag: A sequence of mouse movements with the button held down

  3. type: A sequence of character inputs combined into text

  4. hotkey: Special key combinations like keyboard shortcuts

Tips for Quality Recordings

  • Clear Actions: Make deliberate, clear mouse movements and clicks

  • Consistent Typing: Type at a steady pace to ensure accurate capture

  • Complete Workflows: Ensure you capture all steps without skipping any

  • Minimize Errors: While the system can handle corrections, try to minimize misclicks or typing errors

Demonstration Metadata (meta.json)

The meta.json file contains important contextual information about each demonstration:

Video Recording (recording.mp4)

Each demonstration includes a screen recording that shows exactly what the user saw and did during the task. This provides:

  1. Visual context for all the recorded interactions

  2. Confirmation of the UI state during each action

  3. A reference for how the task should be completed

  4. Visual verification for quality evaluation

Understanding all components of the demonstration data helps you create high-quality submissions that achieve better scores in the grading system, maximizing both AI training effectiveness and your rewards.

Last updated