Most recent diffusion-based methods for video frame interpolation still lag well behind non-diffusion methods in both accuracy and efficiency. Most of them formulate the problem as a denoising procedure directly in latent space, which is less effective because the latent space is large. We propose to model bilateral optical flow explicitly with hierarchical diffusion models, which gives the denoising procedure a much smaller search space. Based on the flow diffusion model, we then use a flow-guided image synthesizer to produce the final result. We train the flow diffusion model and the image synthesizer end to end. Our method achieves state-of-the-art accuracy and is more than 10 times faster than other diffusion-based methods.
Different strategies for video frame interpolation with diffusion models. Given an image pair \( (I_0, I_1) \), our goal is to predict the intermediate frame \(\tilde{I}_t\). (a) Most diffusion-based methods formulate the problem as a denoising process directly in the latent space \(\tilde{F}_t\), and train the diffusion network and the encoder-decoder network (“E” and “D”) separately. This strategy is less effective because the latent space is large, and it cannot handle complex motions or large displacements. (b) We use a hierarchical strategy with explicit flow modeling. We first train a flow-based encoder-decoder for image synthesis using image pairs and the ground-truth optical flow. Then, unlike most diffusion-based methods, which denoise directly in the latent space, we use a hierarchical diffusion model, conditioned on the encoder features \((F_0, F_1)\), to explicitly denoise optical flow from coarse to fine. We use the predicted bilateral flow \((\tilde{f}_0, \tilde{f}_1)\) to warp image features for the synthesizer, and finally fine-tune the synthesizer and the diffusion models jointly.
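The warping and coarse-to-fine steps described in (b) can be illustrated with a minimal NumPy sketch. It is not the implementation used here: the function names (`backward_warp`, `upsample_flow`, `warp_pair`), the flow convention (channel 0 is the horizontal displacement in pixels, channel 1 the vertical), bilinear sampling, border clamping, and nearest-neighbor upsampling are all assumptions made for illustration.

```python
import numpy as np

def backward_warp(feat, flow):
    """Bilinearly sample feat (H, W, C) at pixel positions shifted by flow (H, W, 2).

    Assumed convention: flow[..., 0] is the x displacement, flow[..., 1] the y
    displacement, both in pixels. Out-of-range positions are clamped to the border.
    """
    h, w = feat.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom

def upsample_flow(flow):
    """Move a flow field (H, W, 2) to the next finer level (2H, 2W, 2).

    Displacements are doubled because one coarse pixel spans two fine pixels.
    """
    finer = np.repeat(np.repeat(flow, 2, axis=0), 2, axis=1)
    return 2.0 * finer

def warp_pair(feat0, feat1, flow_t0, flow_t1):
    """Warp both input feature maps toward time t with the bilateral flow."""
    return backward_warp(feat0, flow_t0), backward_warp(feat1, flow_t1)
```

In this sketch, a coarse flow estimate would be passed through `upsample_flow` to initialize the next finer level, and the finest bilateral flow would be given to `warp_pair` to produce the warped features consumed by the synthesizer.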
@inproceedings{yang2025hfd,
title = {Hierarchical Flow Diffusion for Efficient Frame Interpolation},
author = {Hai, Yang and Wang, Guo and Su, Tan and Jiang, Wenjie and Hu, Yinlin},
booktitle = {CVPR},
year = {2025},
}