Occlusion-Aware Video Object Inpainting

Lei Ke


Yu-Wing Tai

Kwai Inc.


Conventional video inpainting is neither object-oriented nor occlusion-aware, making it liable to obvious artifacts when large occluded object regions are inpainted. This paper presents occlusion-aware video object inpainting, which recovers both the complete shape and appearance for occluded objects in videos given their visible mask segmentation.

To facilitate this new research, we construct the first large-scale video object inpainting benchmark, YouTube-VOI, which provides realistic occlusion scenarios with both occluded and visible object masks available. Our technical contribution, VOIN, jointly performs video object shape completion and occluded texture generation. In particular, the shape completion module models long-range object coherence, while the flow completion module recovers accurate flow with sharp motion boundaries, so that temporally consistent texture can be propagated to the same moving object across frames. For more realistic results, VOIN is optimized using both T-PatchGAN and a new spatio-temporal attention-based multi-class discriminator.
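To illustrate the idea of propagating texture along completed flow, the following is a minimal sketch and not the VOIN implementation. It assumes frames are small grids stored as nested lists, that the flow gives a per-pixel `(dy, dx)` offset from the target frame into a source frame, and that nearest-pixel lookup is sufficient. The function name `propagate_texture` and all parameter names are hypothetical.

```python
def propagate_texture(target, target_known, source, source_known, flow):
    """Fill unknown target pixels by following flow into a source frame.

    target, source: H x W grids of pixel values (nested lists).
    target_known, source_known: H x W grids of booleans (True = visible).
    flow: H x W grid of (dy, dx) offsets from a target pixel to the
          corresponding source pixel (assumed to be already completed).
    Returns the filled target grid and its updated known-pixel grid.
    """
    height, width = len(target), len(target[0])
    filled = [row[:] for row in target]
    known = [row[:] for row in target_known]
    for y in range(height):
        for x in range(width):
            if target_known[y][x]:
                continue  # visible pixels are kept unchanged
            dy, dx = flow[y][x]
            sy, sx = y + round(dy), x + round(dx)  # nearest-pixel lookup
            inside = 0 <= sy < height and 0 <= sx < width
            if inside and source_known[sy][sx]:
                filled[y][x] = source[sy][sx]  # copy visible source texture
                known[y][x] = True
    return filled, known


# Toy 1 x 3 example: the middle target pixel is occluded, and its flow
# points one pixel to the left in the source frame.
target = [[1, 0, 3]]
target_known = [[True, False, True]]
source = [[7, 8, 9]]
source_known = [[True, True, False]]
flow = [[(0, 0), (0, -1), (0, 0)]]

filled, known = propagate_texture(target, target_known, source, source_known, flow)
print(filled)  # [[1, 7, 3]]
```

Pixels whose flow lands outside the frame, or on a source pixel that is itself occluded, stay unfilled in this sketch; in a full pipeline such regions would have to be handled by a generative step.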

Finally, we compare VOIN with strong baselines on YouTube-VOI. Experimental results clearly demonstrate the efficacy of our method, including on inpainting complex and dynamic objects. VOIN degrades gracefully when the input visible mask is inaccurate.


  author = {Ke, Lei and Tai, Yu-Wing and Tang, Chi-Keung},
  title = {Occlusion-Aware Video Object Inpainting},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year = {2021}