Multimedia

Authors and titles for June 2025

Total of 153 entries (showing 1-50)

[1] arXiv:2506.00868 [pdf, html, other]: Title: Multiverse Through Deepfakes: The MultiFakeVerse Dataset of Person-Centric Visual and Conceptual Manipulations

Parul Gupta, Shreya Ghosh, Tom Gedeon, Thanh-Toan Do, Abhinav Dhall

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[2] arXiv:2506.01211 [pdf, html, other]: Title: Iola Walker: A Mobile Footfall Detection System for Music Composition

Will James

Subjects: Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[3] arXiv:2506.01668 [pdf, html, other]: Title: Small Stickers, Big Meanings: A Multilingual Sticker Semantic Understanding Dataset with a Gamified Approach

Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang

Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
[4] arXiv:2506.02380 [pdf, html, other]: Title: EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR

Zihao Ding, Cheng-Tse Lee, Mufeng Zhu, Tao Guan, Yuan-Chun Sun, Cheng-Hsin Hsu, Yao Liu

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC)
[5] arXiv:2506.02414 [pdf, html, other]: Title: StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion

Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu

Comments: 5 pages, 2 figures, Accepted by Interspeech 2025, Demo: this https URL

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2506.02997 [pdf, html, other]: Title: Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style-Rich Representation

Yongqi Wang, Chunlei Zhang, Hangting Chen, Zhou Zhao, Dong Yu

Subjects: Multimedia (cs.MM)
[7] arXiv:2506.03530 [pdf, other]: Title: How Far Are We from Predicting Missing Modalities with Foundation Models?

Guanzhou Ke, Yi Xie, Xiaoli Wang, Guoqing Chao, Bo Wang, Shengfeng He

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[8] arXiv:2506.05851 [pdf, html, other]: Title: DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection

Marcel Klemt, Carlotta Segna, Anna Rohrbach

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2506.05987 [pdf, html, other]: Title: The JPEG XL Image Coding System: History, Features, Coding Tools, Design Rationale, and Future

Jon Sneyers, Jyrki Alakuijala, Luca Versari, Zoltán Szabadka, Sami Boukortt, Amnon Cohen-Tidhar, Moritz Firsching, Evgenii Kliuchnikov, Tal Lev-Ami, Eric Portis, Thomas Richter, Osamu Watanabe

Comments: 73 pages, 62 figures

Subjects: Multimedia (cs.MM)
[10] arXiv:2506.06018 [pdf, html, other]: Title: Optimization-Free Universal Watermark Forgery with Regenerative Diffusion Models

Chaoyi Zhu, Zaitang Li, Renyi Yang, Robert Birke, Pin-Yu Chen, Tsung-Yi Ho, Lydia Y. Chen

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[11] arXiv:2506.06037 [pdf, html, other]: Title: SVD: Spatial Video Dataset

M. H. Izadimehr, Milad Ghanbari, Guodong Chen, Wei Zhou, Xiaoshuai Hao, Mallesham Dasari, Christian Timmerer, Hadi Amirpour

Subjects: Multimedia (cs.MM)
[12] arXiv:2506.06691 [pdf, html, other]: Title: An Efficient Digital Watermarking Technique for Small Scale devices

Kaushik Talathi, Aparna Santra Biswas

Comments: 28 pages, 11 figures, 4 tables

Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR)
[13] arXiv:2506.06743 [pdf, html, other]: Title: The State-of-the-Art in Lifelog Retrieval: A Review of Progress at the ACM Lifelog Search Challenge Workshop 2022-24

Allie Tran, Werner Bailer, Duc-Tien Dang-Nguyen, Graham Healy, Steve Hodges, Björn Þór Jónsson, Luca Rossetto, Klaus Schoeffmann, Minh-Triet Tran, Lucia Vadicamo, Cathal Gurrin

Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
[14] arXiv:2506.06938 [pdf, other]: Title: Experimental Evaluation of Static Image Sub-Region-Based Search Models Using CLIP

Bastian Jäckl, Vojtěch Kloda, Daniel A. Keim, Jakub Lokoč

Comments: 14 pages, 4 figures, 2 tables

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[15] arXiv:2506.07076 [pdf, html, other]: Title: Harmony-Aware Music-driven Motion Synthesis with Perceptual Constraint on UGC Datasets

Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos

Subjects: Multimedia (cs.MM)
[16] arXiv:2506.09506 [pdf, html, other]: Title: Dynamic Sub-region Search in Homogeneous Collections Using CLIP

Bastian Jäckl, Vojtěch Kloda, Daniel A. Keim, Jakub Lokoč

Comments: 18 pages, 4 figures, 5 tables

Subjects: Multimedia (cs.MM)
[17] arXiv:2506.09795 [pdf, html, other]: Title: Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model for Video Quality Assessment

Amritha Premkumar, Prajit T Rajendran, Vignesh V Menon

Comments: ICME 2025

Subjects: Multimedia (cs.MM)
[18] arXiv:2506.10001 [pdf, html, other]: Title: Semantic Communication-Enabled Cloud-Edge-End-collaborative Metaverse Services Architecure

Yuxuan Li, Sheng Jinag, Bizhu Wang

Comments: arXiv admin note: text overlap with arXiv:2407.13764 by other authors

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[19] arXiv:2506.10002 [pdf, html, other]: Title: EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis

Jianwu Fang, Lei-Lei Li, Zhedong Zheng, Hongkai Yu, Jianru Xue, Zhengguo Li, Tat-Seng Chua

Comments: Accepted by IEEE-TMM

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[20] arXiv:2506.10003 [pdf, html, other]: Title: Integrating multimedia documents in 3D city models for a better understanding of territories

C. Gautier, J. Delanoy, G. Gesquière

Comments: 8 pages, 11 figures

Journal-ref: sprs-annals-X-4-W2-2022-69-2022

Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[21] arXiv:2506.10004 [pdf, other]: Title: Immersive Multimedia Communication: State-of-the-Art on eXtended Reality Streaming

Haopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik

Comments: accepted by ACM Transactions on Multimedia Computing, Communications, and Applications

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Networking and Internet Architecture (cs.NI)
[22] arXiv:2506.10006 [pdf, other]: Title: HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction

Jie Qin, Wei Yang, Yan Su, Yiran Zhu, Weizhen Li, Yunyue Pan, Chengchang Pan, Honggang Qi

Comments: 7 pages, 5 figures, 3 tables, submitted to the 33rd ACM International Conference on Multimedia (ACM MM 2025)

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[23] arXiv:2506.10007 [pdf, html, other]: Title: Controllable Expressive 3D Facial Animation via Diffusion in a Unified Multimodal Space

Kangwei Liu, Junwu Liu, Xiaowei Yi, Jinlin Guo, Yun Cao

Comments: Accepted by ICME2025

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[24] arXiv:2506.10008 [pdf, html, other]: Title: Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics

Yi-Chun Chen

Comments: This paper has been submitted to ACM Multimedia 2025 and is currently under review

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[25] arXiv:2506.10010 [pdf, other]: Title: Multimodal Emotion Coupling via Speech-to-Facial and Bodily Gestures in Dyadic Interaction

Von Ralph Dane Marquez Herbuela, Yukie Nagai

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2506.10011 [pdf, html, other]: Title: WDMIR: Wavelet-Driven Multimodal Intent Recognition

Weiyin Gong, Kai Zhang, Yanghai Zhang, Qi Liu, Xinjie Sun, Junyu Lu, Linbo Zhu

Comments: Accepted at IJCAI 2025, 9 pages, 6 figures

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
[27] arXiv:2506.10012 [pdf, other]: Title: Thief of Truth: VR comics about the relationship between AI and humans

Joonhyung Bae

Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[28] arXiv:2506.10013 [pdf, html, other]: Title: Immersive Fantasy Based on Digital Nostalgia: Environmental Narratives for the Korean Millennials and Gen Z

Yerin Doh, Joonhyung Bae

Comments: Accepted at ISEA 2025 (International Symposium on Electronic Art)

Subjects: Multimedia (cs.MM); Computers and Society (cs.CY)
[29] arXiv:2506.10016 [pdf, other]: Title: A Survey of Generative Categories and Techniques in Multimodal Large Language Models

Longzhen Han, Awes Mubarak, Almas Baimagambetov, Nikolaos Polatidis, Thar Baker

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[30] arXiv:2506.10416 [pdf, html, other]: Title: Can Sound Replace Vision in LLaVA With Token Substitution?

Ali Vosoughi, Jing Bi, Pinxin Liu, Yunlong Tang, Chenliang Xu

Comments: 29 pages including references and appendices

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2506.14803 [pdf, html, other]: Title: Omnidirectional Video Super-Resolution using Deep Learning

Arbind Agrahari Baniya, Tsz-Kwan Lee, Peter W. Eklund, Sunil Aryal

Journal-ref: in IEEE Transactions on Multimedia, vol. 26, pp. 540-554, 2024

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[32] arXiv:2506.16258 [pdf, html, other]: Title: ViFusion: In-Network Tensor Fusion for Scalable Video Feature Indexing

Yisu Wang, Yixiang Zhu, Xinjiao Li, Yulong Zhang, Ruilong Wu, Dirk Kutscher

Subjects: Multimedia (cs.MM)
[33] arXiv:2506.16495 [pdf, html, other]: Title: DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution Transformation

Changsheng Gao, Zijie Liu, Li Li, Dong Liu, Xiaoyan Sun, Weisi Lin

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[34] arXiv:2506.17623 [pdf, html, other]: Title: Can Generated Images Serve as a Viable Modality for Text-Centric Multimodal Learning?

Yuesheng Huang, Peng Zhang, Riliang Liu, Jiaqi Liang

Comments: 4 figures, 7 tables

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[35] arXiv:2506.18055 [pdf, html, other]: Title: Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings

Jason Clarke, Yoshihiko Gotoh, Stefan Goetze

Comments: Accepted to EUSIPCO 2025. 5 pages, 1 figure. To appear in the Proceedings of the 33rd European Signal Processing Conference (EUSIPCO), September 8-12, 2025, Palermo, Italy

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36] arXiv:2506.19769 [pdf, html, other]: Title: A Survey of Multi-sensor Fusion Perception for Embodied AI: Background, Methods, Challenges and Prospects

Shulan Ruan, Rongwei Wang, Xuchen Shen, Huijie Liu, Baihui Xiao, Jun Shi, Kun Zhang, Zhenya Huang, Yu Liu, Enhong Chen, You He

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[37] arXiv:2506.20944 [pdf, html, other]: Title: E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs

Van-Hoang Phan, Long-Khanh Pham, Dang Vu, Anh-Duy Tran, Minh-Son Dao

Comments: Accepted to AsiaCCS 2025 @ SCID

Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR)
[38] arXiv:2506.21865 [pdf, html, other]: Title: RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture

Haofeng Wang, Yilin Guo, Zehao Li, Tong Yue, Yizong Wang, Enci Zhang, Rongqun Lin, Feng Gao, Shiqi Wang, Siwei Ma

Comments: IEEE International Conference on Multimedia and Expo Workshop, 2025 (Accepted)

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL)
[39] arXiv:2506.23484 [pdf, html, other]: Title: TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity

Yuzhuo Chen, Zehua Ma, Han Fang, Weiming Zhang, Nenghai Yu

Comments: Accepted by ICCV 2025 (2025 IEEE/CVF International Conference on Computer Vision)

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[40] arXiv:2506.23707 [pdf, html, other]: Title: Efficient and Accurate Image Provenance Analysis: A Scalable Pipeline for Large-scale Images

Jiewei Lai, Lan Zhang, Chen Tang, Pengcheng Sun

Comments: 25 pages, 6 figures

Subjects: Multimedia (cs.MM)
[41] arXiv:2506.00562 (cross-list from cs.CV) [pdf, html, other]: Title: SEED: A Benchmark Dataset for Sequential Facial Attribute Editing with Diffusion Models

Yule Zhu, Ping Liu, Zhedong Zheng, Wei Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[42] arXiv:2506.00667 (cross-list from cs.CV) [pdf, html, other]: Title: Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis

Vasilii Korolkov

Comments: 24 pages, 8 figures, submitted as a preprint. ArXiv preprint only, not submitted to a journal yet

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[43] arXiv:2506.00854 (cross-list from cs.CL) [pdf, html, other]: Title: EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG

Jacky Tai-Yu Lu, Jung Chiang, Chi-Sheng Chen, Anna Nai-Yun Tung, Hsiang Wei Hu, Yuan Chiao Cheng

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Neurons and Cognition (q-bio.NC)
[44] arXiv:2506.00974 (cross-list from cs.CV) [pdf, html, other]: Title: Camera Trajectory Generation: A Comprehensive Survey of Methods, Metrics, and Future Directions

Zahra Dehghanian, Pouya Ardekhani, Amir Vahedi, Hamid Beigy, Hamid R. Rabiee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[45] arXiv:2506.01109 (cross-list from cs.CV) [pdf, html, other]: Title: CountingFruit: Real-Time 3D Fruit Counting with Language-Guided Semantic Gaussian Splatting

Fengze Li, Yangle Liu, Jieming Ma, Hai-Ning Liang, Yaochun Shen, Huangxiang Li, Zhijing Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[46] arXiv:2506.01319 (cross-list from cs.SD) [pdf, html, other]: Title: Learning Sparsity for Effective and Efficient Music Performance Question Answering

Xingjian Diao, Tianzhen Yang, Chunhui Zhang, Weiyi Wu, Ming Cheng, Jiang Gui

Comments: Accepted to the main conference of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[47] arXiv:2506.01478 (cross-list from cs.LG) [pdf, html, other]: Title: MUDI: A Multimodal Biomedical Dataset for Understanding Pharmacodynamic Drug-Drug Interactions

Tung-Lam Ngo, Ba-Hoang Tran, Duy-Cat Can, Trung-Hieu Do, Oliver Y. Chén, Hoang-Quynh Le

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Multimedia (cs.MM); Quantitative Methods (q-bio.QM)
[48] arXiv:2506.01482 (cross-list from cs.LG) [pdf, html, other]: Title: Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?

Zijian Zhao, Dian Jin, Zijing Zhou, Xiaoyu Zhang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[49] arXiv:2506.01822 (cross-list from cs.CV) [pdf, html, other]: Title: GSCodec Studio: A Modular Framework for Gaussian Splat Compression

Sicheng Li, Chengzhen Wu, Hao Li, Xiang Gao, Yiyi Liao, Lu Yu

Comments: Repository of the project: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[50] arXiv:2506.01850 (cross-list from cs.CV) [pdf, html, other]: Title: MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs

Wayner Barrios, Andrés Villa, Juan León Alcázar, SouYoung Jin, Bernard Ghanem

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
