Categories
CVPR
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
arXiv
GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives
Fast-ParC: Position Aware Global Kernel for ConvNets and ViTs
Towards a Unified View on Visual Parameter-Efficient Transfer Learning
Prompt-Matched Semantic Segmentation
MM
Toward Human Perception-Centric Video Thumbnail Generation