Times are displayed in (UTC-07:00) Pacific Time (US & Canada)
2/3/2025 | 9:30 AM - 10:30 AM | Regency A
Write sentence with images: Revisit the large vision model with visual sentence
Author(s)
Quan Liu | Vanderbilt University
Can Cui | Vanderbilt University
Ruining Deng | Vanderbilt University
Tianyuan Yao | Vanderbilt University
Yuechen Yang | Vanderbilt University
Yucheng Tang | NVIDIA
Yuankai Huo | Vanderbilt University
Abstract
This paper introduces a novel approach to image generation from visual sentences extracted from videos. By combining a lightweight autoregressive model with a Vector Quantized Generative Adversarial Network (VQGAN), we aim to bridge the gap between generation quality and computational efficiency. Unlike traditional methods, which often require extensive computational resources, our approach achieves performance comparable to state-of-the-art models while improving processing efficiency. The autoregressive model captures the sequential patterns within visual sentences, enabling more coherent and contextually accurate image generation. Our experimental results demonstrate that this approach generates high-quality images at a lower computational cost, making it a viable option for real-time applications and resource-constrained environments. This work strikes a balance between performance and efficiency that could benefit a variety of multimedia and creative domains.
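The abstract does not include an implementation, but the pipeline it describes, tokenizing frames with a VQGAN-style quantizer and then modeling the token sequence autoregressively, can be sketched. The PyTorch sketch below is a minimal, self-contained illustration under stated assumptions: `ToyVQEncoder`, `LightweightARModel`, the codebook size of 1024, and the 16x16 token grid are all hypothetical choices for illustration, not the authors' architecture; a real system would use a pretrained VQGAN encoder and decoder in place of the toy quantizer.

```python
import torch
import torch.nn as nn

class ToyVQEncoder(nn.Module):
    """Stand-in for a pretrained VQGAN encoder: maps an image to a grid of
    discrete codebook indices. Codebook size and grid are illustrative."""
    def __init__(self, codebook_size=1024):
        super().__init__()
        # One strided conv as a placeholder for the real encoder + quantizer.
        self.conv = nn.Conv2d(3, codebook_size, kernel_size=16, stride=16)

    def forward(self, images):                       # images: (B, 3, 256, 256)
        logits = self.conv(images)                   # (B, K, 16, 16)
        return logits.argmax(dim=1).flatten(1)       # (B, 256) token indices

class LightweightARModel(nn.Module):
    """Small causal transformer that predicts the next visual token, so a
    'visual sentence' (tokens from consecutive frames) can be continued."""
    def __init__(self, codebook_size=1024, d_model=256, n_layers=4,
                 n_heads=4, max_len=1024):
        super().__init__()
        self.tok = nn.Embedding(codebook_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, codebook_size)

    def forward(self, tokens):                       # tokens: (B, T)
        T = tokens.size(1)
        x = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
        # Causal mask so each position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        return self.head(self.blocks(x, mask=mask))  # (B, T, K) next-token logits

# Usage sketch: frame -> token sequence -> autoregressive next-token prediction.
enc = ToyVQEncoder()
ar = LightweightARModel()
frame = torch.rand(1, 3, 256, 256)                   # one frame of a visual sentence
sentence = enc(frame)                                # (1, 256) discrete tokens
logits = ar(sentence)                                # (1, 256, 1024) logits
next_tok = logits[:, -1].argmax(-1)                  # greedy pick of the next token;
                                                     # a real VQGAN decoder would map
                                                     # completed token grids back to pixels
```

The efficiency argument rests on this factoring: once frames are compressed to short discrete token sequences, the sequence modeling can be done by a small causal transformer rather than a large pixel-space generator.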
Description
Date and Location: 2/3/2025 | 09:30 AM - 09:50 AM | Regency A
Primary Session Chair:
Xiao Wang | Oak Ridge National Laboratory
Session Co-Chair: