HSR Inverse Dynamics Model (IDM)

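An inverse dynamics model (IDM) trained on manipulation episodes from the Toyota HSR (Human Support Robot). Given consecutive camera frames, it predicts the robot actions associated with the observed transition.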

Architecture

  • Vision encoder: SigLIP-2 (google/siglip2-base-patch16-224, frozen)
  • Action head: Flow Matching Transformer (4-layer, hidden_dim=512)
  • Input: frame pairs (frame_t, frame_t+1) from the head and hand cameras (2 cameras × 2 timesteps = 4 images)
  • Output: action_chunk of H=4 future actions, each 8-DOF
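
The card does not describe how the flow matching head is sampled at inference time. As a rough illustration only, the sketch below shows one common approach: start from Gaussian noise shaped like the action chunk (H=4, 8-DOF) and integrate a velocity field from t=0 to t=1 with Euler steps. The function names, the step count, and the toy velocity field are all assumptions made for this example; the real head would predict the velocity from the encoded input frames.

```python
import random

H, DOF = 4, 8  # chunk length and action dimension, as stated in the card


def sample_action_chunk(velocity_fn, num_steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps.

    num_steps and the Gaussian prior are assumptions, not taken from the card.
    """
    rng = random.Random(seed)
    # Start from Gaussian noise with the same shape as the action chunk.
    x = [[rng.gauss(0.0, 1.0) for _ in range(DOF)] for _ in range(H)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [[x[i][j] + dt * v[i][j] for j in range(DOF)] for i in range(H)]
    return x


# Hypothetical stand-in for the trained head: the exact velocity of a
# straight-line path toward a fixed target chunk. A real model would
# condition on image features instead of a known target.
TARGET = [[0.1 * (i + j) for j in range(DOF)] for i in range(H)]


def toy_velocity(x, t):
    return [[(TARGET[i][j] - x[i][j]) / (1.0 - t) for j in range(DOF)]
            for i in range(H)]


chunk = sample_action_chunk(toy_velocity)  # nested list, H rows x DOF columns
```

With the toy field, the integration ends on TARGET, which makes the sketch easy to check. A learned velocity field would only approximate this behavior, and more Euler steps generally trade speed for accuracy.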

Training

  • Dataset: 44,892 train / 4,987 val frame-action pairs from approved HSR episodes
  • Optimization: 50 epochs, AdamW (lr=1e-4), cosine schedule with warmup, stepped once per batch
  • Mixed precision: BF16
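
To make "per-batch stepping" concrete, here is a minimal sketch of a cosine schedule with linear warmup, evaluated per optimizer step rather than per epoch. The base learning rate, epoch count, and training set size come from this card; the batch size, warmup length, and decay to zero are assumptions, since the card does not state them.

```python
import math

BASE_LR = 1e-4         # from the card
EPOCHS = 50            # from the card
TRAIN_PAIRS = 44_892   # from the card
BATCH_SIZE = 64        # assumption: not stated in the card
WARMUP_STEPS = 500     # assumption: not stated in the card

# With per-batch stepping, the schedule length is counted in batches.
steps_per_epoch = math.ceil(TRAIN_PAIRS / BATCH_SIZE)
total_steps = EPOCHS * steps_per_epoch


def lr_at(step):
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Linear warmup from BASE_LR / WARMUP_STEPS up to BASE_LR.
        return BASE_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from BASE_LR toward 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))
```

Under these assumed values there are 702 steps per epoch and 35,100 steps in total, so the learning rate changes smoothly within each epoch instead of jumping at epoch boundaries.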

Files

  • best_model.pt: Model checkpoint containing the full model weights (no optimizer state)
  • action_stats.json: Action normalization statistics (mean, std, min, max)
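
The statistics file is presumably used to normalize actions for training and to map predictions back to robot units. The sketch below shows mean/std normalization and its inverse for one 8-DOF action. The JSON layout (top-level "mean", "std", "min", "max" keys, each a list of 8 values), the example numbers, and the helper names are assumptions; check the actual file before relying on them.

```python
import json

# Hypothetical contents standing in for action_stats.json; the real file
# may use a different layout or different values.
example_json = json.dumps({
    "mean": [0.0] * 8,
    "std": [0.5] * 8,
    "min": [-1.0] * 8,
    "max": [1.0] * 8,
})
stats = json.loads(example_json)

EPS = 1e-8  # guards against division by zero for constant dimensions


def normalize(action, stats):
    """Map a raw 8-DOF action to zero-mean, unit-variance space."""
    return [(a - m) / (s + EPS)
            for a, m, s in zip(action, stats["mean"], stats["std"])]


def denormalize(z, stats):
    """Map a normalized model output back to raw action units."""
    return [v * (s + EPS) + m
            for v, m, s in zip(z, stats["mean"], stats["std"])]


raw = [0.25, -0.5, 0.0, 0.1, 0.2, -0.1, 0.3, 0.05]
roundtrip = denormalize(normalize(raw, stats), stats)
```

Whether the model was trained with mean/std or min/max scaling is not stated here; the file provides both, so the same pattern would apply with "min" and "max" if that is the convention used.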