FT-LLM-2026-RAMEN/hsr-idm-siglip2

inverse-dynamics

HSR Inverse Dynamics Model (IDM)

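An inverse dynamics model (IDM) trained on manipulation episodes recorded with the Toyota HSR (Human Support Robot). Given camera frames at two consecutive timesteps, it predicts the robot actions that connect them.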

Architecture

Vision encoder: SigLIP-2 (google/siglip2-base-patch16-224, frozen)
Action head: Flow Matching Transformer (4-layer, hidden_dim=512)
Input: frames at t and t+1 from both the head and hand cameras (2 cameras × 2 timesteps = 4 images)
Output: action chunk of H=4 future actions, each 8-DOF (shape 4 × 8)
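
A flow matching head produces an action chunk by starting from Gaussian noise and integrating a learned velocity field from t=0 to t=1. The sketch below illustrates that sampling loop with plain Euler steps. It is not the checkpoint's actual inference code: `sample_action_chunk`, `toy_velocity`, the step count, and the use of `cond` are all hypothetical, and the card does not say which flow matching variant or solver was used. In the real model, the conditioning would be the SigLIP-2 features of the 4 input images and the velocity would come from the 4-layer transformer.

```python
import numpy as np

H, DOF = 4, 8  # chunk length and action dimension, as stated in the card


def sample_action_chunk(velocity_fn, cond, num_steps=10, rng=None):
    """Euler-integrate dx/dt = velocity_fn(x, t, cond) from t=0 (noise) to t=1."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal((H, DOF))  # initial sample: Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t, cond)
    return x


# Toy stand-in for the trained action head (hypothetical). It pushes every
# sample toward a fixed target chunk along a straight line, so the loop can
# be checked without any model weights.
target = np.linspace(-1.0, 1.0, H * DOF).reshape(H, DOF)


def toy_velocity(x, t, cond):
    # Here `cond` is simply the target chunk; a real head would receive
    # image features instead.
    return (cond - x) / (1.0 - t)


chunk = sample_action_chunk(
    toy_velocity, cond=target, num_steps=10, rng=np.random.default_rng(0)
)
```

With the toy field the final Euler step lands on `target`, which makes the loop easy to sanity-check. A trained head would instead return a normalized action chunk that still needs to be mapped back to robot units.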

Training

Dataset: 44,892 train / 4,987 val frame-action pairs from approved HSR episodes
Optimization: 50 epochs, AdamW (lr=1e-4), cosine learning-rate schedule with warmup, stepped once per batch
Mixed precision: BF16
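
The learning-rate schedule above can be sketched as a function of the global batch index. This is a minimal illustration under stated assumptions: the card gives only the peak rate (1e-4), the epoch count, and per-batch stepping. The batch size (64), warmup length (500 steps), linear warmup shape, and final rate of 0 are placeholders chosen for the example, and `lr_at_step` is a hypothetical helper.

```python
import math

BASE_LR = 1e-4       # from the card
NUM_EPOCHS = 50      # from the card
TRAIN_PAIRS = 44_892  # from the card


def lr_at_step(step, total_steps, warmup_steps, base_lr=BASE_LR, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr.

    `step` is the global batch index, because the schedule is advanced
    once per batch rather than once per epoch.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed values, not stated in the card.
batch_size = 64
warmup_steps = 500

steps_per_epoch = math.ceil(TRAIN_PAIRS / batch_size)
total_steps = NUM_EPOCHS * steps_per_epoch
```

In a PyTorch training loop the same curve is typically wrapped in `torch.optim.lr_scheduler.LambdaLR`, with `scheduler.step()` called after every `optimizer.step()`.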

Files

best_model.pt: Full model checkpoint (weights only, no optimizer state)
action_stats.json: Action normalization statistics (mean, std, min, max)
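
Because the model is trained on normalized actions, predictions have to be mapped back to robot units with the statistics in action_stats.json. The sketch below shows one way this could look. The card does not specify the JSON layout or which normalization was applied, so both are assumptions here: the file is taken to hold top-level `mean`, `std`, `min`, and `max` lists with one entry per action dimension, and the helpers `load_action_stats` and `denormalize` are hypothetical. Both z-score and min-max (to [-1, 1]) variants are included since either is consistent with the listed statistics.

```python
import json


def load_action_stats(path="action_stats.json"):
    """Read the statistics file; the key layout is an assumption."""
    with open(path) as f:
        return json.load(f)


def denormalize(chunk, stats, mode="zscore"):
    """Map a normalized chunk (H actions, each a list of floats) to robot units."""
    out = []
    for action in chunk:
        if mode == "zscore":
            # Inverse of (a - mean) / std
            row = [a * s + m for a, s, m in zip(action, stats["std"], stats["mean"])]
        elif mode == "minmax":
            # Inverse of scaling [min, max] to [-1, 1]
            row = [
                (a + 1.0) / 2.0 * (hi - lo) + lo
                for a, lo, hi in zip(action, stats["min"], stats["max"])
            ]
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        out.append(row)
    return out


# Synthetic statistics for illustration only; real values come from the file.
example_stats = {
    "mean": [1.0] * 8,
    "std": [2.0] * 8,
    "min": [-3.0] * 8,
    "max": [5.0] * 8,
}
zero_chunk = [[0.0] * 8 for _ in range(4)]
restored = denormalize(zero_chunk, example_stats, mode="zscore")
```

Check the training code or the contents of the file to confirm which variant applies before sending actions to a robot.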
