Multimedia

Authors and titles for October 2025

Total of 38 entries; entries 1-25 are listed here.

[1] arXiv:2510.00050 Title: Object-AVEdit: An Object-level Audio-Visual Editing Model

Youquan Fu, Ruiyang Si, Hongfa Wang, Dongzhan Zhou, Jiacheng Sun, Ping Luo, Di Hu, Hongyuan Zhang, Xuelong Li

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:2510.01284 Title: Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation

Chetwin Low, Weimin Wang, Calder Katyal

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3] arXiv:2510.02161 Title: Comparing Contrastive and Triplet Loss: Variance Analysis and Optimization Behavior

Donghuo Zeng

Comments: 8 pages, 4 tables, 3 figures

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[4] arXiv:2510.02746 Title: Detecting Notational Errors in Digital Music Scores

Géré Léo (Cnam, CEDRIC - VERTIGO), Nicolas Audebert (LaSTIG, IGN, CEDRIC - VERTIGO), Florent Jacquemard (CEDRIC - VERTIGO)

Journal-ref: International Conference on Technologies for Music Notation and Representation (TENOR) 2025, Oct 2025, Beijing, China

Subjects: Multimedia (cs.MM)
[5] arXiv:2510.03965 Title: FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction

Dong Shu, Yanguang Liu, Huopu Zhang, Mengnan Du

Subjects: Multimedia (cs.MM)
[6] arXiv:2510.04396 Title: Evaluating Keyframe Layouts for Visual Known-Item Search in Homogeneous Collections

Bastian Jäckl, Jiří Kruchina, Lucas Joos, Daniel A. Keim, Ladislav Peška, Jakub Lokoč

Comments: 28 pages, 17 figures

Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
[7] arXiv:2510.05839 Title: Towards Robust and Realible Multimodal Fake News Detection with Incomplete Modality

Hengyang Zhou, Yiwei Wei, Jian Yang, Zhenyu Zhang

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[8] arXiv:2510.06060 Title: Controllable Audio-Visual Viewpoint Generation from 360° Spatial Information

Christian Marinoni, Riccardo Fosco Gramaccioni, Eleonora Grassucci, Danilo Comminiello

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[9] arXiv:2510.07326 Title: Audio-Visual Separation with Hierarchical Fusion and Representation Alignment

Han Hu, Dongheng Lin, Qiming Huang, Yuqi Hou, Hyung Jin Chang, Jianbo Jiao

Subjects: Multimedia (cs.MM); Sound (cs.SD)
[10] arXiv:2510.07355 Title: AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues

Krish Patel, Dingkun Zhou, Ajay Kankipati, Akshaj Gupta, Zeyi Austin Li, Mohul Shukla, Vibhor Narang, Sara Kofman, Zongli Ye, Grace Wang, Xiaoyu Shi, Tingle Li, Guan-Ting Lin, Kan Jen Cheng, Huang-Cheng Chou, Jiachen Lian, Gopala Anumanchipalli

Subjects: Multimedia (cs.MM); Sound (cs.SD)
[11] arXiv:2510.00006 (cross-list from cs.SD) Title: Unpacking Musical Symbolism in Online Communities: Content-Based and Network-Centric Approaches

Kajwan Ziaoddini

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Computers and Society (cs.CY); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[12] arXiv:2510.00058 (cross-list from eess.IV) Title: Variable Rate Image Compression via N-Gram Context based Swin-transformer

Priyanka Mudgal, Feng Liu

Comments: Accepted at ISVC 2025

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[13] arXiv:2510.00261 (cross-list from cs.CL) Title: Retrieval-Augmented Generation for Electrocardiogram-Language Models

Xiaoyu Song, William Han, Tony Chen, Chaojing Duan, Michael A. Rosenberg, Emerson Liu, Ding Zhao

Comments: 5 pages, 2 figures; Submitted to ICASSP 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[14] arXiv:2510.00481 (cross-list from cs.NI) Title: Make a Video Call with LLM: A Measurement Campaign over Five Mainstream Apps

Jiayang Xu, Xiangjie Huang, Zijie Li, Zili Meng

Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Performance (cs.PF)
[15] arXiv:2510.00990 (cross-list from cs.CY) Title: Disc-Cover Complexity Trends in Music Illustrations from Sinatra to Swift

Nicolas Fracaro, Stefano Cecconello, Mauro Conti, Niccolò Di Marco, Alessandro Galeazzi

Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[16] arXiv:2510.01009 (cross-list from cs.CV) Title: POVQA: Preference-Optimized Video Question Answering with Rationales for Data Efficiency

Ashim Dahal, Ankit Ghimire, Saydul Akbar Murad, Nick Rahimi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[17] arXiv:2510.01174 (cross-list from cs.CV) Title: Code2Video: A Code-centric Paradigm for Educational Video Generation

Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[18] arXiv:2510.01361 (cross-list from eess.IV) Title: An Efficient Quality Metric for Video Frame Interpolation Based on Motion-Field Divergence

Conall Daly, Darren Ramsook, Anil Kokaram

Comments: IEEE 17th International Conference on Quality of Multimedia Experience 2025 accepted manuscript, 7 pages

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[19] arXiv:2510.01698 (cross-list from cs.IR) Title: TalkPlay-Tools: Conversational Music Recommendation with LLM Tool Calling

Seungheon Doh, Keunwoo Choi, Juhan Nam

Comments: Accepted for publication at The Workshop on AI for Music, Neural Information Processing Systems (NeurIPS-AI4Music)

Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2510.02790 (cross-list from cs.CV) Title: MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding

Jingyuan Deng, Yujiu Yang

Comments: Accepted to EMNLP 2025 Findings

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[21] arXiv:2510.03833 (cross-list from eess.IV) Title: Towards Robust and Generalizable Continuous Space-Time Video Super-Resolution with Events

Shuoyan Wei, Feng Li, Shengeng Tang, Runmin Cong, Yao Zhao, Meng Wang, Huihui Bai

Comments: 17 pages, 12 figures, 14 tables. Under review

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[22] arXiv:2510.04010 (cross-list from cs.IR) Title: Visual Lifelog Retrieval through Captioning-Enhanced Interpretation

Yu-Fei Shih, An-Zi Yen, Hen-Hsen Huang, Hsin-Hsi Chen

Journal-ref: 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 479-486

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[23] arXiv:2510.04024 (cross-list from cs.CV) Title: Enhancing Fake News Video Detection via LLM-Driven Creative Process Simulation

Yuyan Bu, Qiang Sheng, Juan Cao, Shaofei Wang, Peng Qi, Yuhui Shi, Beizhe Hu

Comments: ACM CIKM 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[24] arXiv:2510.04577 (cross-list from cs.SD) Title: Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers

Juncheng Wang, Chao Xu, Cheng Yu, Zhe Hu, Haoyu Xie, Guoqi Yu, Lei Shang, Shujun Wang

Comments: Accepted to EMNLP 2025

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[25] arXiv:2510.04630 (cross-list from cs.CV) Title: SFANet: Spatial-Frequency Attention Network for Deepfake Detection

Vrushank Ahire, Aniruddh Muley, Shivam Zample, Siddharth Verma, Pranav Menon, Surbhi Madan, Abhinav Dhall

Journal-ref: IEEE SPS Signal Processing Cup at ICASSP 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
