GPT4SGG: Synthesizing Scene Graphs from Holistic ...
Learning scene graphs from natural language descriptions has proven to be a cheap and promising scheme for Scene Graph Generation (SGG). However, such unstructured caption...
arXiv
Toward Human Perception-Centric Video Thumbnail G...
A video thumbnail plays an essential role in summarizing video content into a compact and concise image for users to browse efficiently. However, automatically generating attractive...
MM
Fast-ParC: Position Aware Global Kernel for ConvN...
Transformer models have made tremendous progress in various fields in recent years. In the field of computer vision, vision transformers (ViTs) have also become strong alternatives...
arXiv
Towards a Unified View on Visual Parameter-Effici...
Since the release of various large-scale pre-trained natural language processing (NLP) models, parameter-efficient transfer learning (PETL) has become a popular paradigm capable of achieving...
arXiv