Abstract: Transferring visual language models (VLMs) from the image domain to the video domain has recently yielded great success on human action recognition tasks. However, standard recognition ...
Abstract: The diffusion model is widely leveraged for either controllable video generation or video interpolation. As each field has its task-specific problems, it is difficult to merely develop a ...