Platform

From physical capture to training-ready tensors

ROBOTRAIN is the end-to-end data pipeline for embodied AI teams. We handle collection, annotation, and packaging so your engineers stay focused on models, not logistics.

Pipeline

Four stages, one continuous flow

Each stage has clear inputs and outputs so your team always knows where data sits in the lifecycle.

01

Capture

Field operators deploy calibrated POV rigs across residential and industrial sites. Every session records RGB video, depth, IMU telemetry, and controller signals in synchronized streams. Raw data is encrypted at the device and transferred over secure channels.

02

Process

Footage passes automated quality checks — blur, occlusion, sensor sync — before annotation. Human reviewers label manipulation events, object classes, scene boundaries, and safety flags. QA rejection rates are logged per annotator.
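To illustrate per-annotator logging, here is a minimal sketch that computes rejection rates from a list of QA review records. The field names (`annotator_id`, `rejected`) and the function name are assumptions made for this example, not the actual ROBOTRAIN log schema.

```python
from collections import defaultdict


def rejection_rates(reviews):
    """Return {annotator_id: rejected / total} for a list of review dicts.

    Each review is assumed to look like
    {"annotator_id": "a1", "rejected": True}; these field names are hypothetical.
    """
    totals = defaultdict(int)
    rejected = defaultdict(int)
    for review in reviews:
        annotator = review["annotator_id"]
        totals[annotator] += 1
        if review["rejected"]:
            rejected[annotator] += 1
    # Every annotator in `totals` has at least one review, so no division by zero.
    return {a: rejected[a] / totals[a] for a in totals}


# Made-up sample records, for illustration only.
reviews = [
    {"annotator_id": "a1", "rejected": True},
    {"annotator_id": "a1", "rejected": False},
    {"annotator_id": "a2", "rejected": False},
]
rates = rejection_rates(reviews)
```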

03

Package

Approved sessions are packaged into versioned releases. Each release includes a schema manifest, per-frame quality scores, a provenance log, and exports in HDF5, RLDS, and MP4 with JSON sidecars.
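As a rough picture of what a schema manifest could contain, the sketch below models one as a small dataclass and serializes it to JSON. The class name, every field name, and the sample values are hypothetical; the real manifest layout is defined by each release.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReleaseManifest:
    """Hypothetical manifest shape; not the published ROBOTRAIN schema."""
    version: str
    modalities: list
    export_formats: list
    session_ids: list = field(default_factory=list)


manifest = ReleaseManifest(
    version="0.1.0",  # made-up semantic tag
    modalities=["rgb", "depth", "imu", "telemetry"],
    export_formats=["hdf5", "rlds", "mp4+json"],
    session_ids=["session-0001"],  # made-up session identifier
)

# sort_keys keeps the serialized manifest stable across runs,
# which makes it easier to diff or hash.
manifest_json = json.dumps(asdict(manifest), indent=2, sort_keys=True)
```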

04

Access

Approved teams pull data via API or bulk download. Releases are immutable — pin a version for reproducible experiments. New captures ship as incremental releases.
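Pinning can be as simple as recording a release tag in your experiment config and refusing to run against anything else. The sketch below assumes tags of the form `v1.2.3`; the tag format, the function names, and the sample tags are illustrative assumptions, not part of a published ROBOTRAIN client.

```python
def parse_tag(tag):
    """Turn 'v1.2.3' or '1.2.3' into a comparable tuple (1, 2, 3)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def resolve_release(available, pinned=None):
    """Return the pinned tag if given, otherwise the newest available tag.

    Raises ValueError when the pinned tag is missing, so an experiment
    never silently falls back to a different release.
    """
    if pinned is not None:
        if pinned not in available:
            raise ValueError(f"pinned release {pinned!r} is not available")
        return pinned
    # Compare numerically so v0.10.0 sorts after v0.2.0.
    return max(available, key=parse_tag)


available = ["v0.1.0", "v0.2.0", "v0.10.0"]  # made-up tags
latest = resolve_release(available)
reproducible = resolve_release(available, pinned="v0.1.0")
```

Resolving by numeric comparison rather than string order matters once minor versions reach double digits.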

Dataset specifications

What ships in each release

Current release: 150 POV annotated videos
Viewpoint: Ego-centric (head / wrist mount)
Environments: Residential · Light industrial
Modalities: RGB · Depth · IMU · Telemetry
Annotation types: Object · Action · Scene · Safety
Export formats: HDF5 · RLDS · MP4 + JSON
Versioning: Semantic tags · Immutable snapshots
Access: Private preview; details posted with each release

Access

Choose the tier that fits your team

Research

Academic labs & independent researchers

  • Full dataset download
  • Version-pinned snapshots
  • Community support channel
  • Non-commercial research use

Access and onboarding — coming soon

Commercial

Robotics companies & AI teams

  • Full dataset download
  • Priority new-release access
  • API access for large-scale jobs
  • Dedicated support
  • Custom capture requests

Access and onboarding — coming soon

For developers

Plug into your stack

Dataset bundles are schema-first. Drop HDF5 into a LeRobot or OpenVLA run, or use RLDS for JAX pipelines. JSON sidecars carry full annotation metadata when you need raw video.
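For the raw-video path, a sidecar can be read with nothing more than the standard library. The sketch below assumes the sidecar shares the video's file name with a `.json` suffix and holds a `frames` list with `index`, `quality`, and `labels` keys. That layout is an assumption for illustration; check the schema manifest of your release for the real field names.

```python
import json
import tempfile
from pathlib import Path


def sidecar_path(video_path):
    """Assumed convention: same name as the video, with a .json suffix."""
    return Path(video_path).with_suffix(".json")


def frames_above(video_path, min_quality):
    """Return indices of frames whose quality score meets the threshold."""
    with open(sidecar_path(video_path)) as f:
        meta = json.load(f)
    return [fr["index"] for fr in meta["frames"] if fr["quality"] >= min_quality]


# Demo against a throwaway sidecar with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    video = Path(tmp) / "session-0001.mp4"
    sidecar_path(video).write_text(json.dumps({
        "frames": [
            {"index": 0, "quality": 0.92, "labels": ["grasp"]},
            {"index": 1, "quality": 0.41, "labels": []},
            {"index": 2, "quality": 0.88, "labels": ["place"]},
        ]
    }))
    good = frames_above(video, min_quality=0.8)
```

Filtering on the per-frame score before decoding video lets you skip low-quality frames without touching the MP4 itself.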

Compatible with

  • LeRobot
  • OpenVLA
  • Octo
  • Diffusion Policy
  • Custom RLDS pipelines