Multimedia

Authors and titles for July 2025

Total of 147 entries; this page lists entries 1-50.
[1] arXiv:2507.00926 [pdf, other]
Title: HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction
Liliang Ye (1), Yunyao Zhang (1), Yafeng Wu (1), Yi-Ping Phoebe Chen (2), Junqing Yu (1), Wei Yang (1), Zikai Song (1) ((1) Huazhong University of Science and Technology, Wuhan, China, (2) La Trobe University, Melbourne, Australia)
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
[2] arXiv:2507.01320 [pdf, html, other]
Title: Robust Multi-generation Learned Compression of Point Cloud Attribute
Xiangzuo Liu, Zhikai Liu, PengPeng Yu, Ruishan Huang, Fan Liang
Subjects: Multimedia (cs.MM)
[3] arXiv:2507.02080 [pdf, html, other]
Title: TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation
Yubeen Lee, Sangeun Lee, Chaewon Park, Junyeop Cha, Eunil Park
Comments: 9 pages, 2 figures, 2 tables
Subjects: Multimedia (cs.MM); Sound (cs.SD)
[4] arXiv:2507.02626 [pdf, html, other]
Title: VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning
Siran Chen, Boyu Chen, Chenyun Yu, Yuxiao Luo, Ouyang Yi, Lei Cheng, Chengxiang Zhuo, Zang Li, Yali Wang
Subjects: Multimedia (cs.MM)
[5] arXiv:2507.04758 [pdf, html, other]
Title: Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning
Jiayun Hu, Yueyi He, Tianyi Liang, Changbo Wang, Chenhui Li
Subjects: Multimedia (cs.MM)
[6] arXiv:2507.05113 [pdf, html, other]
Title: CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation
Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang
Comments: 15 pages, 9 figures, 15 tables. To appear in the Proceedings of the 33rd ACM International Conference on Multimedia (MM '25)
Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[7] arXiv:2507.07396 [pdf, html, other]
Title: IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing
Zeyang Song, Shimin Zhang, Yuhong Chou, Jibin Wu, Haizhou Li
Comments: Under review at TNNLS
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:2507.07911 [pdf, other]
Title: The Potential of Olfactory Stimuli in Stress Reduction through Virtual Reality
Yasmin Elsaddik Valdivieso, Mohd Faisal, Karim Alghoul, Monireh (Monica) Vahdati, Kamran Gholizadeh Hamlabadi, Fedwa Laamarti, Hussein Al Osman, Abdulmotaleb El Saddik
Comments: Accepted to IEEE Medical Measurements & Applications (MeMeA) 2025
Journal-ref: 2025 IEEE Medical Measurements & Applications (MeMeA), Chania, Greece, 2025, pp. 1-6
Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[9] arXiv:2507.07938 [pdf, html, other]
Title: Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency
Abolfazl Zarghani, Amirhossein Ebrahimi, Amir Malekesfandiari
Subjects: Multimedia (cs.MM)
[10] arXiv:2507.08064 [pdf, other]
Title: PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning
Yibo Lyu, Rui Shao, Gongwei Chen, Yijie Zhu, Weili Guan, Liqiang Nie
Comments: Accepted to ACM MM 2025
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[11] arXiv:2507.08104 [pdf, html, other]
Title: VideoConviction: A Multimodal Benchmark for Human Conviction and Stock Market Recommendations
Michael Galarnyk, Veer Kejriwal, Agam Shah, Yash Bhardwaj, Nicholas Meyer, Anand Krishnan, Sudheer Chava
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[12] arXiv:2507.08590 [pdf, html, other]
Title: Visual Semantic Description Generation with MLLMs for Image-Text Matching
Junyu Chen, Yihua Gao, Mingyong Li
Comments: Accepted by ICME 2025 (oral)
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[13] arXiv:2507.09647 [pdf, html, other]
Title: KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection
Peican Zhu, Yubo Jing, Le Cheng, Keke Tang, Yangming Guo
Comments: Accepted by ACM MM 2025
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[14] arXiv:2507.09945 [pdf, html, other]
Title: ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization
Huilai Li, Yonghao Dang, Ying Xing, Yiming Wang, Jianqin Yin
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[15] arXiv:2507.10066 [pdf, html, other]
Title: LayLens: Improving Deepfake Understanding through Simplified Explanations
Abhijeet Narang, Parul Gupta, Liuyijia Su, Abhinav Dhall
Comments: Accepted to ACM ICMI 2025 Demos
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[16] arXiv:2507.10109 [pdf, html, other]
Title: DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio Synthesis
Wenjie Tian, Xinfa Zhu, Haohe Liu, Zhixian Zhao, Zihao Chen, Chaofan Ding, Xinhan Di, Junjie Zheng, Lei Xie
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2507.10859 [pdf, html, other]
Title: MultiVox: Benchmarking Voice Assistants for Multimodal Interactions
Ramaneswaran Selvakumar, Ashish Seth, Nishit Anand, Utkarsh Tyagi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha
Comments: Work In Progress
Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[18] arXiv:2507.13415 [pdf, html, other]
Title: SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection
Peican Zhu, Yubo Jing, Le Cheng, Bin Chen, Xiaodong Cui, Lianwei Wu, Keke Tang
Comments: Accepted by SMC 2025
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[19] arXiv:2507.14915 [pdf, html, other]
Title: Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
Xiaojie Li, Ronghui Li, Shukai Fang, Shuzhao Xie, Xiaoyang Guo, Jiaqing Zhou, Junkun Peng, Zhi Wang
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2507.15491 [pdf, html, other]
Title: Prompt-aware of Frame Sampling for Efficient Text-Video Retrieval
Deyu Zhang, Tingting Long, Jinrui Zhang, Ligeng Chen, Ju Ren, Yaoxue Zhang
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[21] arXiv:2507.15673 [pdf, html, other]
Title: Point Cloud Streaming with Latency-Driven Implicit Adaptation using MoQ
Andrew Freeman, Michael Rudolph, Amr Rizk
Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[22] arXiv:2507.16396 [pdf, html, other]
Title: Knowledge-aware Diffusion-Enhanced Multimedia Recommendation
Xian Mo, Fei Liu, Rui Tang, Jintao Gao, Hao Liu
Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
[23] arXiv:2507.17232 [pdf, html, other]
Title: A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task
Mashiro Toyooka, Kiyoharu Aizawa, Yoko Yamakata
Comments: Accepted to ACM Multimedia 2025. The dataset is publicly available at: this https URL
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[24] arXiv:2507.17653 [pdf, html, other]
Title: QuMAB: Query-based Multi-Annotator Behavior Modeling with Reliability under Sparse Labels
Liyun Zhang, Zheng Lian, Hong Liu, Takanori Takebe, Yuta Nakashima
Comments: 12 pages. arXiv admin note: substantial text overlap with arXiv:2503.15237
Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
[25] arXiv:2507.18750 [pdf, other]
Title: CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation
Hyunwoo Oh, SeungJu Cha, Kwanyoung Lee, Si-Woo Kim, Dong-Jin Kim
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2507.18932 [pdf, html, other]
Title: MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks
Lei Zhang, Xin Zhou, Chaoyue He, Di Wang, Yi Wu, Hong Xu, Wei Liu, Chunyan Miao
Comments: Accepted at ACM MM 2025
Subjects: Multimedia (cs.MM); Computation and Language (cs.CL)
[27] arXiv:2507.19863 [pdf, html, other]
Title: Anchoring Trends: Mitigating Social Media Popularity Prediction Drift via Feature Clustering and Expansion
Chia-Ming Lee, Bo-Cheng Qiu, Cheng-Jun Kang, Yi-Hsuan Wu, Jun-Lin Chen, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Chung Hsu
Comments: Accepted by ACM Multimedia 2025
Subjects: Multimedia (cs.MM)
[28] arXiv:2507.20627 [pdf, other]
Title: Controllable Video-to-Music Generation with Multiple Time-Varying Conditions
Junxian Wu, Weitao You, Heda Zuo, Dengming Zhang, Pei Chen, Lingyun Sun
Comments: Accepted by the 33rd ACM International Conference on Multimedia (ACMMM 2025). The project page is available at this https URL
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29] arXiv:2507.20738 [pdf, other]
Title: Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning
Yu Zhao, Ying Zhang, Xuhui Sui, Baohang Zhou, Haoze Zhu, Jeff Z. Pan, Xiaojie Yuan
Comments: Accepted by ACM MM 2025
Subjects: Multimedia (cs.MM)
[30] arXiv:2507.21395 [pdf, html, other]
Title: Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion
Zeyu Deng, Yanhui Lu, Jiashu Liao, Shuang Wu, Chongfeng Wei
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2507.21557 [pdf, html, other]
Title: PC-JND: Subjective Study and Dataset on Just Noticeable Difference for Point Clouds in 6DoF Virtual Reality
Chunling Fan, Yun Zhang, Dietmar Saupe, Raouf Hamzaoui, Weisi Lin
Comments: 13 pages, 10 figures, Journal
Subjects: Multimedia (cs.MM)
[32] arXiv:2507.21926 [pdf, other]
Title: Efficient Sub-pixel Motion Compensation in Learned Video Codecs
Théo Ladune, Thomas Leguay, Pierrick Philippe, Gordon Clare, Félix Henry
Subjects: Multimedia (cs.MM); Image and Video Processing (eess.IV)
[33] arXiv:2507.22731 [pdf, html, other]
Title: GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation
Quanwei Yang, Luying Huang, Kaisiyuan Wang, Jiazhi Guan, Shengyi He, Fengguo Li, Hang Zhou, Lingyun Yu, Yingying Li, Haocheng Feng, Hongtao Xie
Comments: 10 pages, 5 figures, Accepted by ICCV 2025
Subjects: Multimedia (cs.MM)
[34] arXiv:2507.23444 [pdf, html, other]
Title: Hybrid CNN-Mamba Enhancement Network for Robust Multimodal Sentiment Analysis
Xiang Li, Xianfu Cheng, Xiaoming Zhang, Zhoujun Li
Subjects: Multimedia (cs.MM)
[35] arXiv:2507.00055 (cross-list from cs.LG) [pdf, html, other]
Title: Leveraging Unlabeled Audio-Visual Data in Speech Emotion Recognition using Knowledge Distillation
Varsha Pendyala, Pedro Morgado, William Sethares
Comments: Accepted at INTERSPEECH 2025
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
[36] arXiv:2507.00466 (cross-list from cs.SD) [pdf, html, other]
Title: Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
Sebastian Murgul, Michael Heizmann
Comments: Accepted to the 22nd Sound and Music Computing Conference (SMC), 2025
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[37] arXiv:2507.00498 (cross-list from cs.SD) [pdf, html, other]
Title: MuteSwap: Visual-informed Silent Video Identity Conversion
Yifan Liu, Yu Fang, Zhouhan Lin
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[38] arXiv:2507.00950 (cross-list from cs.CV) [pdf, html, other]
Title: MVP: Winning Solution to SMP Challenge 2025 Video Track
Liliang Ye (1), Yunyao Zhang (1), Yafeng Wu (1), Yi-Ping Phoebe Chen (2), Junqing Yu (1), Wei Yang (1), Zikai Song (1) ((1) Huazhong University of Science and Technology, Wuhan, China, (2) La Trobe University, Melbourne, Australia)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[39] arXiv:2507.01022 (cross-list from eess.AS) [pdf, html, other]
Title: Workflow-Based Evaluation of Music Generation Systems
Shayan Dadman, Bernt Arild Bremdal, Andreas Bergsland
Comments: 54 pages, 3 figures, 6 tables, 5 appendices
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
[40] arXiv:2507.01582 (cross-list from cs.SD) [pdf, html, other]
Title: Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder
Jing Luo, Xinyu Yang, Jie Wei
Comments: Accepted by IEEE SMC 2025
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[41] arXiv:2507.01652 (cross-list from cs.CV) [pdf, html, other]
Title: Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective
Yuxin Mao, Zhen Qin, Jinxing Zhou, Hui Deng, Xuyang Shen, Bin Fan, Jing Zhang, Yiran Zhong, Yuchao Dai
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[42] arXiv:2507.01776 (cross-list from cs.HC) [pdf, html, other]
Title: Human-Machine Collaboration-Guided Space Design: Combination of Machine Learning Models and Humanistic Design Concepts
Yuxuan Yang
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[43] arXiv:2507.01800 (cross-list from cs.CV) [pdf, html, other]
Title: HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision
Shengli Zhou, Jianuo Zhu, Qilin Huang, Fangjing Wang, Yanfu Zhang, Feng Zheng
Comments: ICANN 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[44] arXiv:2507.02000 (cross-list from cs.IR) [pdf, html, other]
Title: Why Multi-Interest Fairness Matters: Hypergraph Contrastive Multi-Interest Learning for Fair Conversational Recommender System
Yongsen Zheng, Zongxuan Xie, Guohua Wang, Ziyao Liu, Liang Lin, Kwok-Yan Lam
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Multimedia (cs.MM)
[45] arXiv:2507.02271 (cross-list from cs.CV) [pdf, html, other]
Title: Spotlighting Partially Visible Cinematic Language for Video-to-Audio Generation via Self-distillation
Feizhen Huang, Yu Wu, Yutian Lin, Bo Du
Comments: Accepted by IJCAI 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[46] arXiv:2507.02900 (cross-list from cs.CV) [pdf, html, other]
Title: Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions
Vineet Kumar Rakesh, Soumya Mazumdar, Research Pratim Maity, Sarbajit Pal, Amitabha Das, Tapas Samanta
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[47] arXiv:2507.02941 (cross-list from cs.CV) [pdf, html, other]
Title: GameTileNet: A Semantic Dataset for Low-Resolution Game Art in Procedural Content Generation
Yi-Chun Chen, Arnav Jhala
Comments: Note: This is a preprint version of a paper submitted to AIIDE 2025. It includes additional discussion of limitations and future directions that were omitted from the conference version due to space constraints
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[48] arXiv:2507.03286 (cross-list from cs.HC) [pdf, html, other]
Title: Gaze and Glow: Exploring Editing Processes on Social Media through Interactive Exhibition
Yang Hong, Jie-Yi Feng, Yi-Chun Yao, I-Hsuan Cho, Yu-Ting Lin, Ying-Yu Chen
Comments: 6 pages, 6 figures, to be published in DIS 2025 (Provocations and Works in Progress)
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[49] arXiv:2507.03434 (cross-list from cs.CV) [pdf, html, other]
Title: Unlearning the Noisy Correspondence Makes CLIP More Robust
Haochen Han, Alex Jinpeng Wang, Peijun Ye, Fangming Liu
Comments: ICCV 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[50] arXiv:2507.03797 (cross-list from cs.HC) [pdf, html, other]
Title: Assessing the Viability of Wave Field Synthesis in VR-Based Cognitive Research
Benjamin Kahl
Comments: 35 pages
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)