Given sparse-view RGB-D videos, PhysHanDI reconstructs a dense 3D hand–deformable object interaction model through a three-stage optimization pipeline: (1) hand reconstruction, which fits a dense parametric hand model (MANO) to the RGB-D observations; (2) object reconstruction, which fits a Spring–Mass model whose deformations are simulated from the interaction forces induced by the reconstructed hand motions; and (3) hand refinement, which refines the hands via inverse physics using the fitted physics-based object model. Unlike previous linear pipelines, we establish a novel cyclical dependency: the reconstructed 3D hand motions yield the interaction forces that drive a physically plausible object simulation, and the resulting deformable object in turn provides an inverse-physics prior for refining the hand poses.
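To make the Spring–Mass idea in stage (2) concrete, the following is a minimal sketch of one forward simulation step, not the PhysHanDI implementation. It assumes Hooke's-law springs, a single shared stiffness `k`, uniform node mass, simple velocity damping, and semi-implicit Euler integration; the function name `spring_mass_step` and all of its parameters are hypothetical. The hand-induced interaction forces are modeled here simply as an external force per node.

```python
import math


def spring_mass_step(pos, vel, springs, rest, ext_force,
                     k=50.0, damping=0.5, mass=1.0, dt=0.01):
    """Advance a spring-mass system by one semi-implicit Euler step.

    pos, vel, ext_force: one [x, y, z] list per node.
    springs: (i, j) node index pairs; rest: rest length of each spring.
    All parameter names and defaults are illustrative assumptions.
    """
    # Start from the external (e.g. hand-induced) force on every node.
    force = [list(f) for f in ext_force]

    # Accumulate Hooke's-law spring forces along each edge.
    for (i, j), r0 in zip(springs, rest):
        d = [pos[j][a] - pos[i][a] for a in range(3)]
        length = math.sqrt(sum(c * c for c in d))
        if length == 0.0:
            continue  # direction undefined for coincident nodes; skip
        mag = k * (length - r0)  # positive when the spring is stretched
        for a in range(3):
            f = mag * d[a] / length
            force[i][a] += f  # pulls node i toward node j when stretched
            force[j][a] -= f  # equal and opposite force on node j

    # Update velocities first, then positions using the new velocities.
    new_pos, new_vel = [], []
    for i in range(len(pos)):
        v = [(vel[i][a] + dt * force[i][a] / mass) * (1.0 - damping * dt)
             for a in range(3)]
        new_vel.append(v)
        new_pos.append([pos[i][a] + dt * v[a] for a in range(3)])
    return new_pos, new_vel


# Usage: two nodes joined by one spring at rest; push node 1 along +x.
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
push = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
pos, vel = spring_mass_step(pos, vel, [(0, 1)], [1.0], push)
```

In a fitting setting, a step like this would be rolled out over time and its parameters (such as stiffness) adjusted so that the simulated nodes match the observations; how PhysHanDI performs that optimization is not specified in this section.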
Compared to prior state-of-the-art baselines, PhysHanDI produces more accurate object simulations that are better aligned with both the ground-truth observations and the interacting hand contacts, for reconstruction as well as future prediction. Yellow circles indicate regions where the baselines are less accurately aligned.
To evaluate reconstruction quality under dense hand–deformable object interactions, we introduce the DenseHDI dataset, captured with RealSense D455 RGB-D cameras that record synchronized three-view videos. Unlike the existing PhysTwin dataset, which mainly features sparse, point-like contacts, DenseHDI targets richer, full-palm interactions (such as wiping with a dishcloth or folding a pouch) that induce denser and more challenging hand–object contacts.
@inproceedings{lee2026physhandi,
  title={PhysHanDI: Physics-Based Reconstruction of Hand-Deformable Object Interactions},
  author={Lee, Jihyun and Lee, Changmin and Kim, Donghwan and Kim, Tae-Kyun},
  booktitle={ICML},
  year={2026}
}