HumanSplat: Closing the Loop Between Human Mesh Recovery and Gaussian Splatting Avatars

Zong, Yeheng*; Kung, Pou-Chun*; Pan, Yike; Isaacson, Seth; Chen, Yizhou; Vasudevan, Ram; Skinner, Katherine A.

Yeheng Zong* (yehengz@umich.edu)
Pou-Chun Kung* (pckung@umich.edu)
Yike Pan (yikepan@umich.edu)
Seth Isaacson (sethgi@umich.edu)
Yizhou Chen (yizhouch@umich.edu)
Ram Vasudevan (ramv@umich.edu)
Katherine A. Skinner (kskin@umich.edu)

University of Michigan Robotics
*Indicates equal contribution

Paper | Code (coming soon)

Abstract

Accurately recovering human pose and appearance from video is an essential component of scene reconstruction, with applications to motion capture, motion prediction, virtual reality, and digital twinning. Despite significant interest in building realistic human avatars from video, this paper demonstrates that existing methods do not accurately recover the 3D geometry of humans. ViT-based human mesh recovery approaches are not consistently reliable and can overfit to 2D views, while NeRF- and Gaussian Splatting-based avatars treat pose and appearance separately, which limits how well their renderings generalize to new poses. To resolve these shortcomings, this paper proposes HumanSplat, a joint optimization framework that refines 3D human poses while simultaneously learning a high-fidelity avatar for novel-view and novel-pose synthesis. Our key insight is to close the loop between geometric pose estimation and differentiable rendering. Prior human avatar methods rely on accurate human poses obtained from motion capture systems or offline refinement, both of which are impractical in in-the-wild scenarios; our approach instead uses only human mesh estimates from a state-of-the-art human pose estimator, to better reflect real-world conditions. Because these estimates are imperfect, HumanSplat does not use the human pose only as a deformation prior: it backpropagates photometric, segmentation, and depth losses through a differentiable renderer to the pose parameters and global position. This coupling refines the global 3D pose over time, improving accuracy and alignment while producing better renderings from novel views. Experiments show consistent improvements over pose recovery baselines that omit image-level refinement and over avatar baselines that decouple pose estimation from avatar reconstruction.
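The closed loop described above can be illustrated with a deliberately tiny toy: a single 1D Gaussian "avatar" whose position `t` stands in for the pose/global position in the image plane, with a scalar color `c` (appearance) and a scalar depth `z`. Mask (segmentation), photometric, and depth losses are backpropagated through hand-derived chain-rule gradients, so pose and appearance are refined together. This is a minimal sketch under those assumptions only; the renderer, the function names (`render`, `total_loss`, `closed_loop_fit`), the learning rate, and the unit loss weights are all hypothetical and are not HumanSplat's implementation, which the page does not describe at this level.

```python
import math
from typing import List, Tuple

Render = Tuple[List[float], List[float], List[float]]


def render(t: float, c: float, z: float, xs: List[float], sigma: float = 2.0) -> Render:
    """Hypothetical toy differentiable 'renderer': one 1D Gaussian at position t.

    Returns per-pixel (alpha mask, color, depth). Illustrative only.
    """
    alpha = [math.exp(-((x - t) ** 2) / (2 * sigma**2)) for x in xs]
    return alpha, [c * a for a in alpha], [z * a for a in alpha]


def total_loss(obs: Render, t: float, c: float, z: float,
               xs: List[float], sigma: float = 2.0) -> float:
    """Sum of squared mask, photometric, and depth residuals (unit weights)."""
    pred = render(t, c, z, xs, sigma)
    return sum((p - o) ** 2 for p_ch, o_ch in zip(pred, obs) for p, o in zip(p_ch, o_ch))


def closed_loop_fit(obs: Render, t: float, c: float, z: float, xs: List[float],
                    sigma: float = 2.0, lr: float = 0.02,
                    steps: int = 1000) -> Tuple[float, float, float]:
    """Plain gradient descent on position t, appearance c, and depth z jointly."""
    m_obs, c_obs, d_obs = obs
    for _ in range(steps):
        alpha, color, depth = render(t, c, z, xs, sigma)
        g_t = g_c = g_z = 0.0
        for i, x in enumerate(xs):
            a = alpha[i]
            da_dt = a * (x - t) / sigma**2   # d(alpha)/dt for a Gaussian profile
            r_mask = a - m_obs[i]            # segmentation residual
            r_rgb = color[i] - c_obs[i]      # photometric residual
            r_dep = depth[i] - d_obs[i]      # depth residual
            # Chain rule: all three residuals flow back into the position t.
            g_t += 2 * (r_mask + r_rgb * c + r_dep * z) * da_dt
            g_c += 2 * r_rgb * a             # appearance sees the photometric loss
            g_z += 2 * r_dep * a             # depth sees the depth loss
        t, c, z = t - lr * g_t, c - lr * g_c, z - lr * g_z
    return t, c, z


if __name__ == "__main__":
    xs = [float(i) for i in range(20)]
    obs = render(10.0, 0.8, 3.0, xs)  # "observed" frame from ground-truth parameters
    t, c, z = closed_loop_fit(obs, t=8.5, c=0.5, z=2.0, xs=xs)
    print(f"t={t:.4f} c={c:.4f} z={z:.4f}")
```

In this toy, starting from a misaligned position (t = 8.5 instead of 10.0) with wrong color and depth, each loss term pulls `t` toward greater overlap with the observation, which is the intuition behind letting rendering losses correct an imperfect pose estimate rather than treating the pose as fixed.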

High-level overview of HumanSplat: HumanSplat takes SMPL estimates from HMR 2.0 as input and improves both SMPL estimation and novel-view rendering by jointly optimizing the human mesh and the Gaussian splats. In contrast, prior works such as GART rely on accurate SMPL parameters, obtained from motion capture or offline refinement, to deform Gaussians for avatar reconstruction. These methods also deliberately decouple SMPL from the Gaussian splats to improve rendering quality, but this decoupling prevents refinement of the human pose estimate and ultimately leads to sub-optimal novel-view performance. HumanSplat removes this limitation by letting the human pose and the Gaussian splats refine each other. As a result, it achieves more accurate SMPL estimation and consistently higher rendering quality.
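Deforming Gaussians with SMPL is commonly done with linear blend skinning (LBS), the skinning scheme SMPL itself uses: each point is moved by a weighted blend of per-joint rigid transforms, x' = Σ_k w_k (R_k x + t_k). The page does not spell out HumanSplat's exact deformation model, so the following is only a generic sketch of LBS applied to a single Gaussian center; the helper names (`rot_z`, `transform`, `lbs_deform`) are hypothetical, and a full implementation would also rotate each Gaussian's covariance by the blended rotation.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def rot_z(theta: float) -> Mat3:
    """Rotation matrix about the z axis (stand-in for a joint rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def transform(R: Mat3, t: Vec3, x: Vec3) -> Vec3:
    """Apply one joint's rigid transform: R @ x + t."""
    return tuple(sum(R[i][j] * x[j] for j in range(3)) + t[i] for i in range(3))


def lbs_deform(x: Vec3, weights: Sequence[float],
               rotations: Sequence[Mat3], translations: Sequence[Vec3]) -> Vec3:
    """Hypothetical LBS of one Gaussian center.

    Blending the transformed points equals applying the blended matrix
    (sum_k w_k R_k) x + sum_k w_k t_k, because the map is linear in x.
    """
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("skinning weights must sum to 1")
    out = [0.0, 0.0, 0.0]
    for w, R, t in zip(weights, rotations, translations):
        y = transform(R, t, x)
        for i in range(3):
            out[i] += w * y[i]
    return (out[0], out[1], out[2])
```

For example, a point with weights split 50/50 between a static bone and a bone rotated 90 degrees about z lands at the midpoint of the two rigidly transformed positions. Blending this way is simple and differentiable, which is what lets image losses reach the pose parameters, but it is known to lose volume at strongly twisted joints.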

Method Overview

CAMEL illustration: Cloth-Aware Mesh-Embedded Loss (CAMEL). CAMEL loosely couples the SMPL mesh with the Gaussian representation; the key motivation is to better model clothing by allowing local non-rigid deformations. CAMEL keeps Gaussians close to the human mesh while enforcing surface alignment and ensuring full mesh coverage. In the illustration, n is the normal at a mesh vertex and δ is the tolerance margin between the cloth and the body.
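The caption describes CAMEL only qualitatively, so the sketch below is one plausible reading, not the paper's loss. It assumes (i) a "shell" penalty that keeps the signed offset of each Gaussian center along the normal n of its nearest mesh vertex inside [0, δ], i.e. cloth may sit outside the body by up to δ but not inside it, and (ii) a coverage penalty that asks every vertex to have some Gaussian within a hypothetical `coverage_radius`. The function name, both penalty forms, and the brute-force nearest-vertex matching are assumptions; the surface-alignment term is left out because the caption does not define it.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _dist(a: Vec3, b: Vec3) -> float:
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def camel_like_loss(centers: Sequence[Vec3], vertices: Sequence[Vec3],
                    normals: Sequence[Vec3], delta: float,
                    coverage_radius: float) -> Tuple[float, float]:
    """Hypothetical CAMEL-style loss returning (shell_term, coverage_term)."""
    # Shell term: signed offset s along the nearest vertex normal must lie in [0, delta].
    shell = 0.0
    for mu in centers:
        j = min(range(len(vertices)), key=lambda k: _dist(mu, vertices[k]))
        s = _dot(_sub(mu, vertices[j]), normals[j])
        shell += max(0.0, -s) ** 2 + max(0.0, s - delta) ** 2
    # Coverage term: every vertex should have a Gaussian center nearby.
    coverage = 0.0
    for v in vertices:
        nearest = min(_dist(v, mu) for mu in centers)
        coverage += max(0.0, nearest - coverage_radius) ** 2
    return shell / len(centers), coverage / len(vertices)
```

Both terms are zero when every Gaussian floats within δ above the surface and every vertex is covered, so the loss only "loosely" couples the splats to the mesh, leaving room for clothing. The O(N·V) search is for clarity; a real implementation would use a spatial index.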

HumanSplat Deforms on Shohei

HumanSplat avatar animated with Shohei's motion.

HumanSplat Deforms on Brady

HumanSplat avatar animated with Brady's motion.

BibTeX

@article{zongkung2025humansplat,
  title={HumanSplat: Closing the Loop Between Human Mesh Recovery and Gaussian Splatting Avatars},
  author={Zong, Yeheng* and Kung, Pou-Chun* and Pan, Yike and Isaacson, Seth and Chen, Yizhou and Vasudevan, Ram and Skinner, Katherine A.},
  journal={In Submission 2025},
  year={2025},
  url={https://scottyehengz.github.io/HumanSplat/}
}
