Multimedia

Authors and titles for May 2025

Total of 146 entries (showing entries 126-146)

[126] arXiv:2505.20405 (cross-list from cs.CV)

Title: What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

Lorenzo Baraldi, Davide Bucciarelli, Federico Betti, Marcella Cornia, Lorenzo Baraldi, Nicu Sebe, Rita Cucchiara

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)

[127] arXiv:2505.20606 (cross-list from cs.CL)

Title: Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation

Dancheng Liu, Amir Nassereldine, Chenhui Xu, Jinjun Xiong

Comments: in submission

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)

[128] arXiv:2505.20638 (cross-list from cs.SD)

Title: Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs

Wenhao You, Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Zhongyu Ouyang, Chiyu Ma, Tingxuan Wu, Noah Wei, Zong Ke, Ming Cheng, Soroush Vosoughi, Jiang Gui

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[129] arXiv:2505.20756 (cross-list from eess.AS)

Title: REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion

Ishan D. Biyani, Nirmesh J. Shah, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv R. Shah

Comments: Accepted in INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)

[130] arXiv:2505.20770 (cross-list from cs.SD)

Title: Can Large Language Models Predict Audio Effects Parameters from Natural Language?

Seungheon Doh, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Juhan Nam, Yuki Mitsufuji

Comments: Accepted for publication at The IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[131] arXiv:2505.21445 (cross-list from cs.SD)

Title: VoxAging: Continuously Tracking Speaker Aging with a Large-Scale Longitudinal Dataset in English and Mandarin

Zhiqi Ai, Meixuan Bao, Zhiyong Chen, Zhi Yang, Xinnuo Li, Shugong Xu

Comments: 5 pages, 4 figures, Accepted by Interspeech 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

[132] arXiv:2505.21459 (cross-list from cs.DB)

Title: LazyVLM: Neuro-Symbolic Approach to Video Analytics

Xiangru Jian, Wei Pang, Zhengyuan Dong, Chao Zhang, M. Tamer Özsu

Comments: 5 pages, 2 figures, Working paper

Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)

[133] arXiv:2505.21905 (cross-list from cs.CV)

Title: Reference-Guided Identity Preserving Face Restoration

Mo Zhou, Keren Ye, Viraj Shah, Kangfu Mei, Mauricio Delbracio, Peyman Milanfar, Vishal M. Patel, Hossein Talebi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

[134] arXiv:2505.21966 (cross-list from cs.HC)

Title: MapStory: Prototyping Editable Map Animations with LLM Agents

Aditya Gunturu, Ben Pearman, Keiichi Ihara, Morteza Faraji, Bryan Wang, Rubaiat Habib Kazi, Ryo Suzuki

Comments: UIST 2025. Project page: this https URL

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)

[135] arXiv:2505.22053 (cross-list from cs.SD)

Title: AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation

Yan Rong, Jinting Wang, Guangzhi Lei, Shan Yang, Li Liu

Subjects: Sound (cs.SD); Multiagent Systems (cs.MA); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[136] arXiv:2505.22266 (cross-list from cs.SD)

Title: FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation

Jialin Yan, Yu Cheng, Zhaoxia Yin, Xinpeng Zhang, Shilin Wang, Tanfeng Sun, Xinghao Jiang

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[137] arXiv:2505.22517 (cross-list from cs.CL)

Title: Multi-MLLM Knowledge Distillation for Out-of-Context News Detection

Yimeng Gu, Zhao Tong, Ignacio Castro, Shu Wu, Gareth Tyson

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)

[138] arXiv:2505.22633 (cross-list from cs.CL)

Title: Spatial Knowledge Graph-Guided Multimodal Synthesis

Yida Xue, Zhen Bi, Jinnan Yang, Jungang Lou, Huajun Chen, Ningyu Zhang

Comments: Ongoing work

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)

[139] arXiv:2505.22705 (cross-list from cs.CV)

Title: HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer

Qi Cai, Jingwen Chen, Yang Chen, Yehao Li, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Yiheng Zhang, Fengbin Gao, Peihan Xu, Yimeng Wang, Kai Yu, Wenxuan Chen, Ziwei Feng, Zijian Gong, Jianzhuang Pan, Yi Peng, Rui Tian, Siyu Wang, Bo Zhao, Ting Yao, Tao Mei

Comments: Source codes and models are available at this https URL and this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

[140] arXiv:2505.23268 (cross-list from cs.CV)

Title: Unsupervised Transcript-assisted Video Summarization and Highlight Detection

Spyros Barbakos, Charalampos Antoniadis, Gerasimos Potamianos, Gianluca Setti

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

[141] arXiv:2505.23586 (cross-list from cs.CV)

Title: Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features

Ziyong Wang, Charith Abhayaratne

Comments: This paper was presented at the British Machine Vision Conference 2024 workshop on Media authenticity in the age of artificial intelligence

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)

[142] arXiv:2505.23727 (cross-list from cs.CV)

Title: PixelThink: Towards Efficient Chain-of-Pixel Reasoning

Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

[143] arXiv:2505.23784 (cross-list from cs.SD)

Title: Learning Normal Patterns in Musical Loops

Shayan Dadman, Bernt Arild Bremdal, Børre Bang, Rune Dalmo

Comments: 27 pages, 10 figures

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[144] arXiv:2505.23822 (cross-list from cs.CL)

Title: Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction

Mai Ali, Christopher Lucasius, Tanmay P. Patel, Madison Aitken, Jacob Vorstman, Peter Szatmari, Marco Battaglia, Deepa Kundur

Comments: 6 pages, 1 figure, 3 tables. The corresponding author is Mai Ali (maia dot ali at mail dot utoronto dot ca). Christopher Lucasius and Tanmay P. Patel contributed equally

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)

[145] arXiv:2505.24253 (cross-list from cs.CV)

Title: Interactive Video Generation via Domain Adaptation

Ishaan Rawal, Suryansh Kumar

Comments: Preprint. Under Review

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

[146] arXiv:2505.24518 (cross-list from cs.SD)

Title: ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation

Jiatong Shi, Yifan Cheng, Bo-Hao Su, Hye-jin Shim, Jinchuan Tian, Samuele Cornell, Yiwen Zhao, Siddhant Arora, Shinji Watanabe

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
