Platform
From physical capture to training-ready tensors
ROBOTRAIN is the end-to-end data pipeline for embodied AI teams. We handle collection, annotation, and packaging so your engineers stay focused on models, not logistics.
Pipeline
Four stages, one continuous flow
Each stage has clear inputs and outputs so your team always knows where data sits in the lifecycle.
Capture
Field operators deploy calibrated POV rigs across residential and industrial sites. Every session records RGB video, depth, IMU telemetry, and controller signals in synchronized streams. Raw data is encrypted on the device before it is transferred over secure channels.
Process
Footage passes automated quality checks — blur, occlusion, sensor sync — before annotation. Human reviewers label manipulation events, object classes, scene boundaries, and safety flags. QA rejection rates are logged per annotator.
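As a rough illustration of what a sensor-sync check can look like, the sketch below measures how far each video frame is from its nearest IMU sample and fails the session when the worst gap exceeds a tolerance. The function names, the 5 ms tolerance, and the timestamp layout are assumptions for illustration, not ROBOTRAIN's actual checks.

```python
from bisect import bisect_left


def max_sync_gap(frame_ts, imu_ts):
    """Return the largest distance (seconds) from any frame timestamp
    to its nearest IMU timestamp. Both lists must be sorted ascending."""
    if not frame_ts or not imu_ts:
        raise ValueError("both streams need at least one timestamp")
    worst = 0.0
    for t in frame_ts:
        i = bisect_left(imu_ts, t)
        # Candidates: the IMU sample just before and just after t.
        candidates = imu_ts[max(i - 1, 0):i + 1]
        nearest = min(abs(t - c) for c in candidates)
        worst = max(worst, nearest)
    return worst


def passes_sync_check(frame_ts, imu_ts, tolerance_s=0.005):
    """Hypothetical pass/fail rule: every frame needs an IMU sample
    within tolerance_s seconds."""
    return max_sync_gap(frame_ts, imu_ts) <= tolerance_s
```

A session whose IMU stream dropped samples would show a large worst-case gap and could be routed back for review before any annotation time is spent on it.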
Package
Approved sessions become versioned releases with a schema manifest, per-frame quality scores, a provenance log, and exports for HDF5, RLDS, and MP4 + JSON sidecars.
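To make the idea of a schema manifest concrete, here is a minimal sketch of how one could be modeled and serialized. Every field name, the release identifier, and the session values are hypothetical placeholders; the real manifest layout is not specified on this page.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SessionEntry:
    session_id: str
    num_frames: int
    mean_quality: float  # average of the per-frame quality scores


@dataclass
class ReleaseManifest:
    release: str          # hypothetical release identifier
    schema_version: str   # hypothetical schema version string
    formats: list = field(default_factory=lambda: ["hdf5", "rlds", "mp4+json"])
    sessions: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested SessionEntry dataclasses.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


manifest = ReleaseManifest(release="example-release", schema_version="1.0")
manifest.sessions.append(SessionEntry("session-0001", 1800, 0.93))
print(manifest.to_json())
```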
Access
Approved teams pull data via API or bulk download. Releases are immutable — pin a version for reproducible experiments. New captures ship as incremental releases.
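One common way to take advantage of immutable releases is to record a checksum when a version is first pinned and verify it on every later download. The sketch below shows that pattern with the standard library only; the lock entry, release name, and archive bytes are illustrative assumptions, and it does not call any ROBOTRAIN API.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(data: bytes, pinned_digest: str) -> bool:
    """True when the downloaded bytes match the digest recorded at pin time."""
    return sha256_of(data) == pinned_digest


# Hypothetical lock entry written when the experiment was first run.
archive = b"example release bytes"
lock = {"release": "example-release", "sha256": sha256_of(archive)}

# Later runs re-download the same release and check it before training.
print(verify_pinned(archive, lock["sha256"]))
```

Storing the lock entry alongside experiment configs means a rerun can confirm it trained on exactly the same bytes.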
Dataset specifications
What ships in each release
Access
Choose the tier that fits your team
Research
Academic labs & independent researchers
- Full dataset download
- Version-pinned snapshots
- Community support channel
- Non-commercial research use
Access and onboarding — coming soon
Commercial
Robotics companies & AI teams
- Full dataset download
- Priority new-release access
- API access for large-scale jobs
- Dedicated support
- Custom capture requests
Access and onboarding — coming soon
For developers
Plug into your stack
Dataset bundles are schema-first: every export follows the schema manifest shipped with its release. Use the HDF5 export for LeRobot or OpenVLA runs, or RLDS for JAX pipelines. When you need raw video, the JSON sidecars carry the full annotation metadata.
Compatible with
- LeRobot
- OpenVLA
- Octo
- Diffusion Policy
- Custom RLDS pipelines
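For teams working from the MP4 + JSON export, a sidecar can be read with nothing more than the standard library. The example below is a sketch under assumed field names ("video", "fps", "events", "label", "start_frame", "end_frame"); the actual sidecar schema is defined by the release manifest, not by this snippet.

```python
import json

# Hypothetical sidecar contents, inlined so the example is self-contained.
SIDECAR_TEXT = """
{
  "video": "session-0001.mp4",
  "fps": 30,
  "events": [
    {"label": "grasp", "start_frame": 120, "end_frame": 180},
    {"label": "place", "start_frame": 200, "end_frame": 260},
    {"label": "grasp", "start_frame": 390, "end_frame": 450}
  ]
}
"""


def event_spans_seconds(sidecar: dict, label: str):
    """Return (start, end) times in seconds for every event with the given label."""
    fps = sidecar["fps"]
    return [
        (e["start_frame"] / fps, e["end_frame"] / fps)
        for e in sidecar["events"]
        if e["label"] == label
    ]


sidecar = json.loads(SIDECAR_TEXT)
grasps = event_spans_seconds(sidecar, "grasp")
print(grasps)
```

The resulting time spans can be used to cut clips from the matching video file or to build a per-event index before training.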