Multimedia

Authors and titles for February 2024

Total of 86 entries
[1] arXiv:2402.00045 [pdf, html, other]
Title: Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[2] arXiv:2402.00622 [pdf, other]
Title: Gain of Grain: A Film Grain Handling Toolchain for VVC-based Open Implementations
Vignesh V Menon, Adam Wieckowski, Jens Brandenburg, Benjamin Bross, Thomas Schierl, Detlev Marpe
Comments: 2024 Mile High Video (MHV)
Subjects: Multimedia (cs.MM)
[3] arXiv:2402.03413 [pdf, html, other]
Title: Perceptual Video Quality Assessment: A Survey
Xiongkuo Min, Huiyu Duan, Wei Sun, Yucheng Zhu, Guangtao Zhai
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[4] arXiv:2402.03513 [pdf, html, other]
Title: Video Super-Resolution for Optimized Bitrate and Green Online Streaming
Vignesh V Menon, Prajit T Rajendran, Amritha Premkumar, Benjamin Bross, Detlev Marpe
Comments: 2024 Picture Coding Symposium (PCS)
Subjects: Multimedia (cs.MM)
[5] arXiv:2402.03946 [pdf, other]
Title: BioNet-XR: Biological Network Visualization Framework for Virtual Reality and Mixed Reality Environments
Busra Senderin, Nurcan Tuncbag, Elif Surer
Subjects: Multimedia (cs.MM)
[6] arXiv:2402.05508 [pdf, html, other]
Title: Performance Evaluation of Associative Watermarking Using Statistical Neurodynamics
Ryoto Kanegae, Masaki Kawamura
Comments: 8 pages, 6 figures
Journal-ref: J. Phys. Soc. Jpn., Vol.93, No.11, 2024, Article ID: 114004
Subjects: Multimedia (cs.MM); Statistical Mechanics (cond-mat.stat-mech)
[7] arXiv:2402.06424 [pdf, other]
Title: Reducing Latency for Multimedia Broadcast Services Over Mobile Networks
C. M. Lentisco, L. Bellido, A. Cárdenas, R. F. Moyano, D. Fernández
Comments: 10 pages
Journal-ref: IEEE Transactions on Multimedia, vol. 19, no. 1, pp. 173-182, Jan. 2017
Subjects: Multimedia (cs.MM)
[8] arXiv:2402.06437 [pdf, other]
Title: Design of a 5G Multimedia Broadcast Application Function Supporting Adaptive Error Recovery
C. M. Lentisco, L. Bellido, A. Cárdenas, R. F. Moyano, D. Fernández
Comments: 14 pages, 10 figures
Journal-ref: in IEEE Transactions on Multimedia, vol. 25, pp. 378-388, 2023
Subjects: Multimedia (cs.MM)
[9] arXiv:2402.06945 [pdf, html, other]
Title: Evaluation Metrics for Automated Typographic Poster Generation
Sérgio M. Rebelo, J. J. Merelo, João Bicker, Penousal Machado
Comments: Paper accepted to be presented at the 13th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2024), held as part of EvoStar 2024, Aberystwyth, Wales, United Kingdom, April 3–5, 2024
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[10] arXiv:2402.07402 [pdf, other]
Title: BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
Yuanyuan Mao, Xin Lin, Qin Ni, Liang He
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[11] arXiv:2402.07640 [pdf, html, other]
Title: Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data
Puneet Kumar, Sarthak Malik, Balasubramanian Raman, Xiaobai Li
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[12] arXiv:2402.09062 [pdf, other]
Title: Blind Deep-Learning-Based Image Watermarking Robust Against Geometric Transformations
Hannes Mareen, Lucas Antchougov, Glenn Van Wallendael, Peter Lambert
Comments: Accepted and presented at IEEE International Conference on Consumer Electronics (ICCE) 2024
Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[13] arXiv:2402.09392 [pdf, other]
Title: LL-GABR: Energy Efficient Live Video Streaming Using Reinforcement Learning
Adithya Raman, Bekir Turkkan, Tevfik Kosar
Comments: 10 pages, 3 figures, 3 Tables
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[14] arXiv:2402.09720 [pdf, html, other]
Title: SpaceMeta: Global-Scale Massive Multi-User Virtual Interaction over LEO Satellite Constellations
Jiahe Huang, Yifei Zhu
Comments: Accepted by IEEE Satellite'23
Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[15] arXiv:2402.10805 [pdf, other]
Title: Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond
Yongqi Li, Wenjie Wang, Leigang Qu, Liqiang Nie, Wenjie Li, Tat-Seng Chua
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
[16] arXiv:2402.12629 [pdf, html, other]
Title: Television Discourse Decoded: Comprehensive Multimodal Analytics at Scale
Anmol Agarwal, Pratyush Priyadarshi, Shiven Sinha, Shrey Gupta, Hitkul Jangra, Ponnurangam Kumaraguru, Kiran Garimella
Comments: KDD 2024 [Updates for Camera Ready version]
Subjects: Multimedia (cs.MM); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
[17] arXiv:2402.12760 [pdf, html, other]
Title: A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis
Nailei Hei, Qianyu Guo, Zihao Wang, Yan Wang, Haofen Wang, Wenqiang Zhang
Comments: Accepted by The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[18] arXiv:2402.14326 [pdf, html, other]
Title: Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation
Mingxuan Yan, Yi Wang, Xuedou Xiao, Zhiqing Luo, Jianhua He, Wei Wang
Comments: Accepted by ACM Multimedia 2023
Subjects: Multimedia (cs.MM)
[19] arXiv:2402.15513 [pdf, html, other]
Title: Investigating the Generalizability of Physiological Characteristics of Anxiety
Emily Zhou, Mohammad Soleymani, Maja J. Matarić
Journal-ref: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023, pp. 4848-4855
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Signal Processing (eess.SP); Medical Physics (physics.med-ph)
[20] arXiv:2402.18107 [pdf, html, other]
Title: Multimodal Interaction Modeling via Self-Supervised Multi-Task Learning for Review Helpfulness Prediction
HongLin Gong, Mengzhao Jia, Liqiang Jing
Comments: 10 pages, 4 figures, 4 tables
Subjects: Multimedia (cs.MM)
[21] arXiv:2402.18400 [pdf, html, other]
Title: Towards Alleviating Text-to-Image Retrieval Hallucination for CLIP in Zero-shot Learning
Hanyao Wang, Yibing Zhan, Liu Liu, Liang Ding, Yan Yang, Jun Yu
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Multimedia (cs.MM)
[22] arXiv:2402.18702 [pdf, other]
Title: Characterizing Multimedia Information Environment through Multi-modal Clustering of YouTube Videos
Niloofar Yousefi, Mainuddin Shaik, Nitin Agarwal
Comments: 14 pages, In the 4th International Conference on SMART MULTIMEDIA, 2024
Subjects: Multimedia (cs.MM)
[23] arXiv:2402.01180 (cross-list from cs.NI) [pdf, html, other]
Title: Real-time Extended Reality Video Transmission Optimization Based on Frame-priority Scheduling
Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaojing Chen, Yanzan Sun
Comments: 6 pages, 7 figures
Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM); Signal Processing (eess.SP)
[24] arXiv:2402.02210 (cross-list from cs.CV) [pdf, other]
Title: Wavelet-Decoupling Contrastive Enhancement Network for Fine-Grained Skeleton-Based Action Recognition
Haochen Chang, Jing Chen, Yilin Li, Jixiang Chen, Xiaofeng Zhang
Comments: Accepted by ICASSP 2024
Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2024, Seoul (Korea), South Korea
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[25] arXiv:2402.02369 (cross-list from cs.CV) [pdf, other]
Title: M$^3$Face: A Unified Multi-Modal Multilingual Framework for Human Face Generation and Editing
Mohammadreza Mofayezi, Reza Alipour, Mohammad Ali Kakavand, Ehsaneddin Asgari
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[26] arXiv:2402.02733 (cross-list from cs.CV) [pdf, html, other]
Title: ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer
Bumsoo Kim, Abdul Muqeet, Kyuchul Lee, Sanghyun Seo
Comments: Accepted at CVPR 2024 AI4CC Workshop, Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[27] arXiv:2402.02836 (cross-list from eess.IV) [pdf, other]
Title: Perceptual Learned Image Compression via End-to-End JND-Based Optimization
Farhad Pakdaman, Sanaz Nami, Moncef Gabbouj
Comments: Copyright 2024 IEEE - Submitted to IEEE ICIP 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[28] arXiv:2402.02936 (cross-list from eess.IV) [pdf, html, other]
Title: Panoramic Image Inpainting With Gated Convolution And Contextual Reconstruction Loss
Li Yu, Yanjun Gao, Farhad Pakdaman, Moncef Gabbouj
Comments: Copyright 2024 IEEE - to appear in IEEE ICASSP 2024
Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[29] arXiv:2402.03040 (cross-list from cs.CV) [pdf, other]
Title: InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue
Comments: Code, models, and demo are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[30] arXiv:2402.03190 (cross-list from cs.CL) [pdf, other]
Title: Unified Hallucination Detection for Multimodal Large Language Models
Xiang Chen, Chenxi Wang, Yida Xue, Ningyu Zhang, Xiaoyan Yang, Qiang Li, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen
Comments: Accepted by ACL 2024 (main conference)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
[31] arXiv:2402.03658 (cross-list from cs.CL) [pdf, html, other]
Title: Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue
Kun Ouyang, Liqiang Jing, Xuemeng Song, Meng Liu, Yupeng Hu, Liqiang Nie
Comments: Accepted by IEEE TMM
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[32] arXiv:2402.05448 (cross-list from cs.CV) [pdf, html, other]
Title: Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application
Bumsoo Kim, Sanghyun Byun, Yonghoon Jung, Wonseop Shin, Sareer UI Amin, Sanghyun Seo
Comments: 2 pages, 2 figures. Accepted as Spotlight to NeurIPS 2023 Workshop on Machine Learning for Creativity and Design
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[33] arXiv:2402.05457 (cross-list from cs.CL) [pdf, other]
Title: It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Ensiong Chng, Chao-Han Huck Yang
Comments: Accepted to ICLR 2024, 17 pages. This work will be open sourced under MIT license
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34] arXiv:2402.05567 (cross-list from cs.SD) [pdf, other]
Title: Listening Between the Lines: Synthetic Speech Detection Disregarding Verbal Content
Davide Salvi, Temesgen Semu Balcha, Paolo Bestagini, Stefano Tubaro
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[35] arXiv:2402.05582 (cross-list from eess.IV) [pdf, other]
Title: Joint End-to-End Image Compression and Denoising: Leveraging Contrastive Learning and Multi-Scale Self-ONNs
Yuxin Xie, Li Yu, Farhad Pakdaman, Moncef Gabbouj
Comments: Copyright 2024 IEEE - Submitted to IEEE ICIP 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[36] arXiv:2402.05608 (cross-list from cs.CV) [pdf, html, other]
Title: Scalable Diffusion Models with State Space Backbone
Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[37] arXiv:2402.05887 (cross-list from eess.IV) [pdf, html, other]
Title: Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers
Onur G. Guleryuz, Philip A. Chou, Berivan Isik, Hugues Hoppe, Danhang Tang, Ruofei Du, Jonathan Taylor, Philip Davidson, Sean Fanello
Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[38] arXiv:2402.06178 (cross-list from cs.SD) [pdf, html, other]
Title: MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon
Comments: Accepted to IJCAI 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[39] arXiv:2402.06244 (cross-list from cs.CV) [pdf, html, other]
Title: Quantifying and Enhancing Multi-modal Robustness with Modality Preference
Zequn Yang, Yake Wei, Ce Liang, Di Hu
Comments: Accepted to ICLR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[40] arXiv:2402.06389 (cross-list from cs.AI) [pdf, other]
Title: Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example
Aven-Le Zhou, Yu-Ao Wang, Wei Wu, Kang Zhang
Comments: 9 pages, 10 figures
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[41] arXiv:2402.06661 (cross-list from cs.CR) [pdf, other]
Title: Authentication and integrity of smartphone videos through multimedia container structure analysis
Carlos Quinto Huamán, Ana Lucila Sandoval Orozco, Luis Javier García Villalba
Journal-ref: Quinto Huamán, A. L. Sandoval Orozco, L. J. García Villalba: Authentication and Integrity of Smartphone Videos Through Multimedia Container Structure Analysis. Future Generation Computer Systems. Vol. 108, pp. 15-33, July 2020
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[42] arXiv:2402.06692 (cross-list from eess.IV) [pdf, html, other]
Title: HistoHDR-Net: Histogram Equalization for Single LDR to HDR Image Translation
Hrishav Bakul Barua, Ganesh Krishnasamy, KokSheik Wong, Abhinav Dhall, Kalin Stefanov
Comments: Submitted to IEEE
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[43] arXiv:2402.06777 (cross-list from cs.HC) [pdf, html, other]
Title: Capturing Cancer as Music: Cancer Mechanisms Expressed through Musification
Rostyslav Hnatyshyn, Jiayi Hong, Ross Maciejewski, Christopher Norby, Carlo C. Maley
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44] arXiv:2402.06984 (cross-list from cs.SD) [pdf, html, other]
Title: Speech motion anomaly detection via cross-modal translation of 4D motion fields from tagged MRI
Xiaofeng Liu, Fangxu Xing, Jiachen Zhuo, Maureen Stone, Jerry L. Prince, Georges El Fakhri, Jonghye Woo
Comments: SPIE Medical Imaging 2024: Image Processing
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[45] arXiv:2402.07057 (cross-list from eess.IV) [pdf, other]
Title: Rate-Quality or Energy-Quality Pareto Fronts for Adaptive Video Streaming?
Angeliki Katsenou, Xinyi Wang, Daniel Schien, David Bull
Comments: 6, submitted to a conference
Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[46] arXiv:2402.07300 (cross-list from cs.HC) [pdf, html, other]
Title: SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
Zheng Ning, Brianna L. Wimer, Kaiwen Jiang, Keyi Chen, Jerrick Ban, Yapeng Tian, Yuhang Zhao, Toby Jia-Jun Li
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[47] arXiv:2402.07466 (cross-list from cs.IR) [pdf, html, other]
Title: VCR: Video representation for Contextual Retrieval
Oron Nir, Idan Vidra, Avi Neeman, Barak Kinarti, Ariel Shamir
Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[48] arXiv:2402.07736 (cross-list from cs.IR) [pdf, other]
Title: Multimodal Learned Sparse Retrieval for Image Suggestion
Thong Nguyen, Mariya Hendriksen, Andrew Yates
Comments: 5 pages, TREC 2023
Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[49] arXiv:2402.07924 (cross-list from cs.HC) [pdf, html, other]
Title: IllusionX: An LLM-powered mixed reality personal companion
Ramez Yousri, Zeyad Essam, Yehia Kareem, Youstina Sherief, Sherry Gamil, Soha Safwat
Comments: 9 pages
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[50] arXiv:2402.08125 (cross-list from cs.RO) [pdf, html, other]
Title: Customizable Perturbation Synthesis for Robust SLAM Benchmarking
Xiaohao Xu, Tianyi Zhang, Sibo Wang, Xiang Li, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Xiaonan Huang
Comments: 40 pages
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[51] arXiv:2402.08577 (cross-list from cs.CL) [pdf, other]
Title: Test-Time Backdoor Attacks on Multimodal Large Language Models
Dong Lu, Tianyu Pang, Chao Du, Qian Liu, Xianjun Yang, Min Lin
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[52] arXiv:2402.08846 (cross-list from cs.CL) [pdf, html, other]
Title: An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen
Comments: Work in progress; will be open-sourced soon
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53] arXiv:2402.09318 (cross-list from cs.SD) [pdf, other]
Title: Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio
Pablo Alonso-Jiménez, Leonardo Pepino, Roser Batlle-Roca, Pablo Zinemanas, Dmitry Bogdanov, Xavier Serra, Martín Rocamora
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[54] arXiv:2402.09430 (cross-list from eess.SP) [pdf, html, other]
Title: WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
Shuokang Huang, Kaihan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann
Comments: We present WiMANS, to our knowledge, the first dataset for multi-user activity sensing based on WiFi
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[55] arXiv:2402.09871 (cross-list from cs.SD) [pdf, html, other]
Title: MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music
Zihao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, Jinyang Luo, Yan Liu, Ming Xi, Kejun Zhang
Comments: Accepted by International Joint Conference on Artificial Intelligence 2024 (IJCAI 2024)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[56] arXiv:2402.09883 (cross-list from cs.CV) [pdf, other]
Title: Lester: rotoscope animation through video object segmentation and tracking
Ruben Tous
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM)
[57] arXiv:2402.10002 (cross-list from cs.CV) [pdf, html, other]
Title: MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding
Hai-Tao Yu, Mofei Song
Comments: Accepted by AAAI 2024
Journal-ref: AAAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[58] arXiv:2402.10294 (cross-list from cs.HC) [pdf, html, other]
Title: LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Bryan Wang, Yuliang Li, Zhaoyang Lv, Haijun Xia, Yan Xu, Raj Sodhi
Comments: Paper accepted to the ACM Conference on Intelligent User Interfaces (ACM IUI) 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[59] arXiv:2402.11250 (cross-list from eess.IV) [pdf, other]
Title: Hierarchical Prior-based Super Resolution for Point Cloud Geometry Compression
Dingquan Li, Kede Ma, Jing Wang, Ge Li
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[60] arXiv:2402.11520 (cross-list from cs.CV) [pdf, html, other]
Title: Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading
Samar Daou, Achraf Ben-Hamadou, Ahmed Rekik, Abdelaziz Kallel
Comments: submitted for review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[61] arXiv:2402.11812 (cross-list from cs.CV) [pdf, other]
Title: Interpretable Embedding for Ad-hoc Video Search
Jiaxin Wu, Chong-Wah Ngo
Comments: accepted in ACMMM 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[62] arXiv:2402.11954 (cross-list from cs.SD) [pdf, html, other]
Title: Multimodal Emotion Recognition from Raw Audio with Sinc-convolution
Xiaohui Zhang, Wenjie Fu, Mangui Liang
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[63] arXiv:2402.12121 (cross-list from cs.CL) [pdf, html, other]
Title: IRR: Image Review Ranking Framework for Evaluating Vision-Language Models
Kazuki Hayashi, Kazuma Onishi, Toma Suzuki, Yusuke Ide, Seiji Gobara, Shigeki Saito, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
Comments: 18 pages, accepted at COLING 2025
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[64] arXiv:2402.12412 (cross-list from cs.HC) [pdf, html, other]
Title: Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating The Same
Sungjun Ahn, Hyun-Jeong Yim, Youngwan Lee, Sung-Ik Park
Comments: 13 pages, 7 figures
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Signal Processing (eess.SP)
[65] arXiv:2402.12451 (cross-list from cs.CV) [pdf, html, other]
Title: The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
Comments: ACL 2024 (Findings)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[66] arXiv:2402.13022 (cross-list from cs.CL) [pdf, html, other]
Title: SoMeLVLM: A Large Vision Language Model for Social Media Processing
Xinnong Zhang, Haoyu Kuang, Xinyi Mou, Hanjia Lyu, Kun Wu, Siming Chen, Jiebo Luo, Xuanjing Huang, Zhongyu Wei
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[67] arXiv:2402.14947 (cross-list from cs.HC) [pdf, html, other]
Title: An Avalanche of Images on Telegram Preceded Russia's Full-Scale Invasion of Ukraine
William Theisen, Michael Yankoski, Kristina Hook, Ernesto Verdeja, Walter Scheirer, Tim Weninger
Comments: 20 pages, 7 figures
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Social and Information Networks (cs.SI)
[68] arXiv:2402.15096 (cross-list from cs.LG) [pdf, html, other]
Title: Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park, Edward Choi
Comments: Accepted to ICASSP 2024 (5 pages)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[69] arXiv:2402.15300 (cross-list from cs.CV) [pdf, html, other]
Title: Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding
Ailin Deng, Zhirui Chen, Bryan Hooi
Comments: Code URL: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[70] arXiv:2402.15444 (cross-list from cs.AI) [pdf, html, other]
Title: Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion
Yichi Zhang, Zhuo Chen, Lei Liang, Huajun Chen, Wen Zhang
Comments: Accepted by LREC-COLING 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[71] arXiv:2402.15695 (cross-list from cs.HC) [pdf, other]
Title: Applied User Research in Virtual Reality: Tools, Methods, and Challenges
Leonie Bensch, Andrea Casini, Aidan Cowley, Florian Dufresne, Enrico Guerra, Paul de Medeiros, Tommy Nilsson, Flavie Rometsch, Andreas Treuer, Anna Vock
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[72] arXiv:2402.15746 (cross-list from cs.CV) [pdf, html, other]
Title: Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT
Sixiao Zheng, Jingyang Huo, Yu Wang, Yanwei Fu
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[73] arXiv:2402.15923 (cross-list from cs.LG) [pdf, html, other]
Title: Predicting Outcomes in Video Games with Long Short Term Memory Networks
Kittimate Chulajata, Sean Wu, Fabien Scalzo, Eun Sang Cha
Comments: 7 pages, 2 Figures, 2 Tables. Kittimate Chulajata and Sean Wu are considered co-first authors
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[74] arXiv:2402.16110 (cross-list from cs.IR) [pdf, html, other]
Title: Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation with Interpretability
Xin Zhou, Chunyan Miao
Comments: 12 pages, 7 figures
Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[75] arXiv:2402.16153 (cross-list from cs.SD) [pdf, html, other]
Title: ChatMusician: Understanding and Generating Music Intrinsically with LLM
Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
Comments: GitHub: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[76] arXiv:2402.16318 (cross-list from cs.CV) [pdf, html, other]
Title: Gradient-Guided Modality Decoupling for Missing-Modality Robustness
Hao Wang, Shengda Luo, Guosheng Hu, Jianguo Zhang
Comments: AAAI24
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[77] arXiv:2402.16364 (cross-list from cs.CL) [pdf, html, other]
Title: Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions
Tzuf Paz-Argaman, Sayali Kulkarni, John Palowitch, Jason Baldridge, Reut Tsarfaty
Journal-ref: EACL 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[78] arXiv:2402.16366 (cross-list from cs.CV) [pdf, html, other]
Title: SPC-NeRF: Spatial Predictive Compression for Voxel Based Radiance Field
Zetian Song, Wenhong Duan, Yuhuai Zhang, Shiqi Wang, Siwei Ma, Wen Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[79] arXiv:2402.16665 (cross-list from cs.HC) [pdf, html, other]
Title: The Interaction Fidelity Model: A Taxonomy to Distinguish the Aspects of Fidelity in Virtual Reality
Michael Bonfert, Thomas Muender, Ryan P. McMahan, Frank Steinicke, Doug Bowman, Rainer Malaka, Tanja Döring
Comments: 34 pages incl. references and appendix
Journal-ref: International Journal of Human-Computer Interaction by Taylor and Francis, 2024, 1-33
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR); Multimedia (cs.MM)
[80] arXiv:2402.17723 (cross-list from cs.CV) [pdf, html, other]
Title: Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen
Comments: Accepted to CVPR 2024. Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2402.18122 (cross-list from cs.CV) [pdf, html, other]
Title: G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment
Juan Zhang, Jiahao Chen, Cheng Wang, Zhiwang Yu, Tangquan Qi, Di Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[82] arXiv:2402.18208 (cross-list from cs.SI) [pdf, html, other]
Title: Shorts on the Rise: Assessing the Effects of YouTube Shorts on Long-Form Video Content
Prajit T. Rajendran, Kevin Creusy, Vivien Garnes
Subjects: Social and Information Networks (cs.SI); Multimedia (cs.MM)
[83] arXiv:2402.18761 (cross-list from eess.IV) [pdf, html, other]
Title: Exploration of Learned Lifting-Based Transform Structures for Fully Scalable and Accessible Wavelet-Like Image Compression
Xinyue Li, Aous Naman, David Taubman
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[84] arXiv:2402.18844 (cross-list from cs.CV) [pdf, html, other]
Title: Deep learning for 3D human pose estimation and mesh recovery: A survey
Yang Liu, Changzhen Qiu, Zhiyong Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[85] arXiv:2402.18927 (cross-list from cs.CV) [pdf, html, other]
Title: Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering
Xiang Chen, Wenjie Zhu, Jiayuan Chen, Tong Zhang, Changyan Yi, Jun Cai
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[86] arXiv:2402.19330 (cross-list from cs.CV) [pdf, html, other]
Title: A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation
Hanxi Li, Zhengxun Zhang, Hao Chen, Lin Wu, Bo Li, Deyin Liu, Mingwen Wang
Comments: 13 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)