Multimedia

Authors and titles for February 2024

Total of 86 entries

[1] arXiv:2402.00045 [pdf, html, other]: Title: Detecting Multimedia Generated by Large AI Models: A Survey

Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[2] arXiv:2402.00622 [pdf, other]: Title: Gain of Grain: A Film Grain Handling Toolchain for VVC-based Open Implementations

Vignesh V Menon, Adam Wieckowski, Jens Brandenburg, Benjamin Bross, Thomas Schierl, Detlev Marpe

Comments: 2024 Mile High Video (MHV)

Subjects: Multimedia (cs.MM)
[3] arXiv:2402.03413 [pdf, html, other]: Title: Perceptual Video Quality Assessment: A Survey

Xiongkuo Min, Huiyu Duan, Wei Sun, Yucheng Zhu, Guangtao Zhai

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[4] arXiv:2402.03513 [pdf, html, other]: Title: Video Super-Resolution for Optimized Bitrate and Green Online Streaming

Vignesh V Menon, Prajit T Rajendran, Amritha Premkumar, Benjamin Bross, Detlev Marpe

Comments: 2024 Picture Coding Symposium (PCS)

Subjects: Multimedia (cs.MM)
[5] arXiv:2402.03946 [pdf, other]: Title: BioNet-XR: Biological Network Visualization Framework for Virtual Reality and Mixed Reality Environments

Busra Senderin, Nurcan Tuncbag, Elif Surer

Subjects: Multimedia (cs.MM)
[6] arXiv:2402.05508 [pdf, html, other]: Title: Performance Evaluation of Associative Watermarking Using Statistical Neurodynamics

Ryoto Kanegae, Masaki Kawamura

Comments: 8 pages, 6 figures

Journal-ref: J. Phys. Soc. Jpn., Vol.93, No.11, 2024, Article ID: 114004

Subjects: Multimedia (cs.MM); Statistical Mechanics (cond-mat.stat-mech)
[7] arXiv:2402.06424 [pdf, other]: Title: Reducing Latency for Multimedia Broadcast Services Over Mobile Networks

C. M. Lentisco, L. Bellido, A. Cárdenas, R. F. Moyano, D. Fernández

Comments: 10 pages

Journal-ref: IEEE Transactions on Multimedia, vol. 19, no. 1, pp. 173-182, Jan. 2017

Subjects: Multimedia (cs.MM)
[8] arXiv:2402.06437 [pdf, other]: Title: Design of a 5G Multimedia Broadcast Application Function Supporting Adaptive Error Recovery

C. M. Lentisco, L. Bellido, A. Cárdenas, R. F. Moyano, D. Fernández

Comments: 14 pages, 10 figures

Journal-ref: in IEEE Transactions on Multimedia, vol. 25, pp. 378-388, 2023

Subjects: Multimedia (cs.MM)
[9] arXiv:2402.06945 [pdf, html, other]: Title: Evaluation Metrics for Automated Typographic Poster Generation

Sérgio M. Rebelo, J. J. Merelo, João Bicker, Penousal Machado

Comments: Paper accepted to be presented at the 13th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2024), held as part of EvoStar 2024, Aberystwyth, Wales, United Kingdom, April 3–5, 2024

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[10] arXiv:2402.07402 [pdf, other]: Title: BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind

Yuanyuan Mao, Xin Lin, Qin Ni, Liang He

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[11] arXiv:2402.07640 [pdf, html, other]: Title: Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data

Puneet Kumar, Sarthak Malik, Balasubramanian Raman, Xiaobai Li

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[12] arXiv:2402.09062 [pdf, other]: Title: Blind Deep-Learning-Based Image Watermarking Robust Against Geometric Transformations

Hannes Mareen, Lucas Antchougov, Glenn Van Wallendael, Peter Lambert

Comments: Accepted and presented at IEEE International Conference on Consumer Electronics (ICCE) 2024

Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[13] arXiv:2402.09392 [pdf, other]: Title: LL-GABR: Energy Efficient Live Video Streaming Using Reinforcement Learning

Adithya Raman, Bekir Turkkan, Tevfik Kosar

Comments: 10 pages, 3 figures, 3 Tables

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[14] arXiv:2402.09720 [pdf, html, other]: Title: SpaceMeta: Global-Scale Massive Multi-User Virtual Interaction over LEO Satellite Constellations

Jiahe Huang, Yifei Zhu

Comments: Accepted by IEEE Satellite'23

Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[15] arXiv:2402.10805 [pdf, other]: Title: Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond

Yongqi Li, Wenjie Wang, Leigang Qu, Liqiang Nie, Wenjie Li, Tat-Seng Chua

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
[16] arXiv:2402.12629 [pdf, html, other]: Title: Television Discourse Decoded: Comprehensive Multimodal Analytics at Scale

Anmol Agarwal, Pratyush Priyadarshi, Shiven Sinha, Shrey Gupta, Hitkul Jangra, Ponnurangam Kumaraguru, Kiran Garimella

Comments: KDD 2024 [Updates for Camera Ready version]

Subjects: Multimedia (cs.MM); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
[17] arXiv:2402.12760 [pdf, html, other]: Title: A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis

Nailei Hei, Qianyu Guo, Zihao Wang, Yan Wang, Haofen Wang, Wenqiang Zhang

Comments: Accepted by The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[18] arXiv:2402.14326 [pdf, html, other]: Title: Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation

Mingxuan Yan, Yi Wang, Xuedou Xiao, Zhiqing Luo, Jianhua He, Wei Wang

Comments: Accepted by ACM Multimedia 2023

Subjects: Multimedia (cs.MM)
[19] arXiv:2402.15513 [pdf, html, other]: Title: Investigating the Generalizability of Physiological Characteristics of Anxiety

Emily Zhou, Mohammad Soleymani, Maja J. Matarić

Journal-ref: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023, pp. 4848-4855

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Signal Processing (eess.SP); Medical Physics (physics.med-ph)
[20] arXiv:2402.18107 [pdf, html, other]: Title: Multimodal Interaction Modeling via Self-Supervised Multi-Task Learning for Review Helpfulness Prediction

HongLin Gong, Mengzhao Jia, Liqiang Jing

Comments: 10 pages, 4 figures, 4 tables

Subjects: Multimedia (cs.MM)
[21] arXiv:2402.18400 [pdf, html, other]: Title: Towards Alleviating Text-to-Image Retrieval Hallucination for CLIP in Zero-shot Learning

Hanyao Wang, Yibing Zhan, Liu Liu, Liang Ding, Yan Yang, Jun Yu

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Multimedia (cs.MM)
[22] arXiv:2402.18702 [pdf, other]: Title: Characterizing Multimedia Information Environment through Multi-modal Clustering of YouTube Videos

Niloofar Yousefi, Mainuddin Shaik, Nitin Agarwal

Comments: 14 pages, In the 4th International Conference on SMART MULTIMEDIA, 2024

Subjects: Multimedia (cs.MM)
[23] arXiv:2402.01180 (cross-list from cs.NI) [pdf, html, other]: Title: Real-time Extended Reality Video Transmission Optimization Based on Frame-priority Scheduling

Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaojing Chen, Yanzan Sun

Comments: 6 pages, 7 figures

Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM); Signal Processing (eess.SP)
[24] arXiv:2402.02210 (cross-list from cs.CV) [pdf, other]: Title: Wavelet-Decoupling Contrastive Enhancement Network for Fine-Grained Skeleton-Based Action Recognition

Haochen Chang, Jing Chen, Yilin Li, Jixiang Chen, Xiaofeng Zhang

Comments: Accepted by ICASSP 2024

Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2024, Seoul (Korea), South Korea

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[25] arXiv:2402.02369 (cross-list from cs.CV) [pdf, other]: Title: M$^3$Face: A Unified Multi-Modal Multilingual Framework for Human Face Generation and Editing

Mohammadreza Mofayezi, Reza Alipour, Mohammad Ali Kakavand, Ehsaneddin Asgari

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[26] arXiv:2402.02733 (cross-list from cs.CV) [pdf, html, other]: Title: ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer

Bumsoo Kim, Abdul Muqeet, Kyuchul Lee, Sanghyun Seo

Comments: Accepted at CVPR 2024 AI4CC Workshop, Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[27] arXiv:2402.02836 (cross-list from eess.IV) [pdf, other]: Title: Perceptual Learned Image Compression via End-to-End JND-Based Optimization

Farhad Pakdaman, Sanaz Nami, Moncef Gabbouj

Comments: Copyright 2024 IEEE - Submitted to IEEE ICIP 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[28] arXiv:2402.02936 (cross-list from eess.IV) [pdf, html, other]: Title: Panoramic Image Inpainting With Gated Convolution And Contextual Reconstruction Loss

Li Yu, Yanjun Gao, Farhad Pakdaman, Moncef Gabbouj

Comments: Copyright 2024 IEEE - to appear in IEEE ICASSP 2024

Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[29] arXiv:2402.03040 (cross-list from cs.CV) [pdf, other]: Title: InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue

Comments: Code, models, and demo are available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[30] arXiv:2402.03190 (cross-list from cs.CL) [pdf, other]: Title: Unified Hallucination Detection for Multimodal Large Language Models

Xiang Chen, Chenxi Wang, Yida Xue, Ningyu Zhang, Xiaoyan Yang, Qiang Li, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen

Comments: Accepted by ACL 2024 (main conference)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
[31] arXiv:2402.03658 (cross-list from cs.CL) [pdf, html, other]: Title: Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue

Kun Ouyang, Liqiang Jing, Xuemeng Song, Meng Liu, Yupeng Hu, Liqiang Nie

Comments: Accepted by IEEE TMM

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[32] arXiv:2402.05448 (cross-list from cs.CV) [pdf, html, other]: Title: Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application

Bumsoo Kim, Sanghyun Byun, Yonghoon Jung, Wonseop Shin, Sareer UI Amin, Sanghyun Seo

Comments: 2 pages, 2 figures. Accepted as Spotlight to NeurIPS 2023 Workshop on Machine Learning for Creativity and Design

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[33] arXiv:2402.05457 (cross-list from cs.CL) [pdf, other]: Title: It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Ensiong Chng, Chao-Han Huck Yang

Comments: Accepted to ICLR 2024, 17 pages. This work will be open sourced under MIT license

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34] arXiv:2402.05567 (cross-list from cs.SD) [pdf, other]: Title: Listening Between the Lines: Synthetic Speech Detection Disregarding Verbal Content

Davide Salvi, Temesgen Semu Balcha, Paolo Bestagini, Stefano Tubaro

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[35] arXiv:2402.05582 (cross-list from eess.IV) [pdf, other]: Title: Joint End-to-End Image Compression and Denoising: Leveraging Contrastive Learning and Multi-Scale Self-ONNs

Yuxin Xie, Li Yu, Farhad Pakdaman, Moncef Gabbouj

Comments: Copyright 2024 IEEE - Submitted to IEEE ICIP 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[36] arXiv:2402.05608 (cross-list from cs.CV) [pdf, html, other]: Title: Scalable Diffusion Models with State Space Backbone

Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[37] arXiv:2402.05887 (cross-list from eess.IV) [pdf, html, other]: Title: Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers

Onur G. Guleryuz, Philip A. Chou, Berivan Isik, Hugues Hoppe, Danhang Tang, Ruofei Du, Jonathan Taylor, Philip Davidson, Sean Fanello

Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[38] arXiv:2402.06178 (cross-list from cs.SD) [pdf, html, other]: Title: MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models

Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

Comments: Accepted to IJCAI 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[39] arXiv:2402.06244 (cross-list from cs.CV) [pdf, html, other]: Title: Quantifying and Enhancing Multi-modal Robustness with Modality Preference

Zequn Yang, Yake Wei, Ce Liang, Di Hu

Comments: Accepted to ICLR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[40] arXiv:2402.06389 (cross-list from cs.AI) [pdf, other]: Title: Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example

Aven-Le Zhou, Yu-Ao Wang, Wei Wu, Kang Zhang

Comments: 9 pages, 10 figures

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[41] arXiv:2402.06661 (cross-list from cs.CR) [pdf, other]: Title: Authentication and integrity of smartphone videos through multimedia container structure analysis

Carlos Quinto Huamán, Ana Lucila Sandoval Orozco, Luis Javier García Villalba

Journal-ref: Quinto Huamán, A. L. Sandoval Orozco, L. J. García Villalba: Authentication and Integrity of Smartphone Videos Through Multimedia Container Structure Analysis. Future Generation Computer Systems. Vol. 108, pp. 15-33, July 2020

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[42] arXiv:2402.06692 (cross-list from eess.IV) [pdf, html, other]: Title: HistoHDR-Net: Histogram Equalization for Single LDR to HDR Image Translation

Hrishav Bakul Barua, Ganesh Krishnasamy, KokSheik Wong, Abhinav Dhall, Kalin Stefanov

Comments: Submitted to IEEE

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[43] arXiv:2402.06777 (cross-list from cs.HC) [pdf, html, other]: Title: Capturing Cancer as Music: Cancer Mechanisms Expressed through Musification

Rostyslav Hnatyshyn, Jiayi Hong, Ross Maciejewski, Christopher Norby, Carlo C. Maley

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44] arXiv:2402.06984 (cross-list from cs.SD) [pdf, html, other]: Title: Speech motion anomaly detection via cross-modal translation of 4D motion fields from tagged MRI

Xiaofeng Liu, Fangxu Xing, Jiachen Zhuo, Maureen Stone, Jerry L. Prince, Georges El Fakhri, Jonghye Woo

Comments: SPIE Medical Imaging 2024: Image Processing

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[45] arXiv:2402.07057 (cross-list from eess.IV) [pdf, other]: Title: Rate-Quality or Energy-Quality Pareto Fronts for Adaptive Video Streaming?

Angeliki Katsenou, Xinyi Wang, Daniel Schien, David Bull

Comments: 6 pages, submitted to a conference

Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[46] arXiv:2402.07300 (cross-list from cs.HC) [pdf, html, other]: Title: SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers

Zheng Ning, Brianna L. Wimer, Kaiwen Jiang, Keyi Chen, Jerrick Ban, Yapeng Tian, Yuhang Zhao, Toby Jia-Jun Li

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[47] arXiv:2402.07466 (cross-list from cs.IR) [pdf, html, other]: Title: VCR: Video representation for Contextual Retrieval

Oron Nir, Idan Vidra, Avi Neeman, Barak Kinarti, Ariel Shamir

Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[48] arXiv:2402.07736 (cross-list from cs.IR) [pdf, other]: Title: Multimodal Learned Sparse Retrieval for Image Suggestion

Thong Nguyen, Mariya Hendriksen, Andrew Yates

Comments: 5 pages, TREC 2023

Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[49] arXiv:2402.07924 (cross-list from cs.HC) [pdf, html, other]: Title: IllusionX: An LLM-powered mixed reality personal companion

Ramez Yousri, Zeyad Essam, Yehia Kareem, Youstina Sherief, Sherry Gamil, Soha Safwat

Comments: 9 pages

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[50] arXiv:2402.08125 (cross-list from cs.RO) [pdf, html, other]: Title: Customizable Perturbation Synthesis for Robust SLAM Benchmarking

Xiaohao Xu, Tianyi Zhang, Sibo Wang, Xiang Li, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Xiaonan Huang

Comments: 40 pages

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[51] arXiv:2402.08577 (cross-list from cs.CL) [pdf, other]: Title: Test-Time Backdoor Attacks on Multimodal Large Language Models

Dong Lu, Tianyu Pang, Chao Du, Qian Liu, Xianjun Yang, Min Lin

Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[52] arXiv:2402.08846 (cross-list from cs.CL) [pdf, html, other]: Title: An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

Comments: Work in progress; will be open-sourced soon

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53] arXiv:2402.09318 (cross-list from cs.SD) [pdf, other]: Title: Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio

Pablo Alonso-Jiménez, Leonardo Pepino, Roser Batlle-Roca, Pablo Zinemanas, Dmitry Bogdanov, Xavier Serra, Martín Rocamora

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[54] arXiv:2402.09430 (cross-list from eess.SP) [pdf, html, other]: Title: WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing

Shuokang Huang, Kaihan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann

Comments: We present WiMANS, to our knowledge, the first dataset for multi-user activity sensing based on WiFi

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[55] arXiv:2402.09871 (cross-list from cs.SD) [pdf, html, other]: Title: MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

Zihao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, Jinyang Luo, Yan Liu, Ming Xi, Kejun Zhang

Comments: Accepted by International Joint Conference on Artificial Intelligence 2024 (IJCAI 2024)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[56] arXiv:2402.09883 (cross-list from cs.CV) [pdf, other]: Title: Lester: rotoscope animation through video object segmentation and tracking

Ruben Tous

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM)
[57] arXiv:2402.10002 (cross-list from cs.CV) [pdf, html, other]: Title: MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding

Hai-Tao Yu, Mofei Song

Comments: Accepted by AAAI 2024

Journal-ref: AAAI 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[58] arXiv:2402.10294 (cross-list from cs.HC) [pdf, html, other]: Title: LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

Bryan Wang, Yuliang Li, Zhaoyang Lv, Haijun Xia, Yan Xu, Raj Sodhi

Comments: Paper accepted to the ACM Conference on Intelligent User Interfaces (ACM IUI) 2024

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[59] arXiv:2402.11250 (cross-list from eess.IV) [pdf, other]: Title: Hierarchical Prior-based Super Resolution for Point Cloud Geometry Compression

Dingquan Li, Kede Ma, Jing Wang, Ge Li

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[60] arXiv:2402.11520 (cross-list from cs.CV) [pdf, html, other]: Title: Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading

Samar Daou, Achraf Ben-Hamadou, Ahmed Rekik, Abdelaziz Kallel

Comments: submitted for review

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[61] arXiv:2402.11812 (cross-list from cs.CV) [pdf, other]: Title: Interpretable Embedding for Ad-hoc Video Search

Jiaxin Wu, Chong-Wah Ngo

Comments: accepted in ACMMM 2020

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[62] arXiv:2402.11954 (cross-list from cs.SD) [pdf, html, other]: Title: Multimodal Emotion Recognition from Raw Audio with Sinc-convolution

Xiaohui Zhang, Wenjie Fu, Mangui Liang

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[63] arXiv:2402.12121 (cross-list from cs.CL) [pdf, html, other]: Title: IRR: Image Review Ranking Framework for Evaluating Vision-Language Models

Kazuki Hayashi, Kazuma Onishi, Toma Suzuki, Yusuke Ide, Seiji Gobara, Shigeki Saito, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

Comments: 18 pages, Accepted at COLING25

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[64] arXiv:2402.12412 (cross-list from cs.HC) [pdf, html, other]: Title: Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating The Same

Sungjun Ahn, Hyun-Jeong Yim, Youngwan Lee, Sung-Ik Park

Comments: 13 pages, 7 figures

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Signal Processing (eess.SP)
[65] arXiv:2402.12451 (cross-list from cs.CV) [pdf, html, other]: Title: The Revolution of Multimodal Large Language Models: A Survey

Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

Comments: ACL 2024 (Findings)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[66] arXiv:2402.13022 (cross-list from cs.CL) [pdf, html, other]: Title: SoMeLVLM: A Large Vision Language Model for Social Media Processing

Xinnong Zhang, Haoyu Kuang, Xinyi Mou, Hanjia Lyu, Kun Wu, Siming Chen, Jiebo Luo, Xuanjing Huang, Zhongyu Wei

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[67] arXiv:2402.14947 (cross-list from cs.HC) [pdf, html, other]: Title: An Avalanche of Images on Telegram Preceded Russia's Full-Scale Invasion of Ukraine

William Theisen, Michael Yankoski, Kristina Hook, Ernesto Verdeja, Walter Scheirer, Tim Weninger

Comments: 20 pages, 7 figures

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Social and Information Networks (cs.SI)
[68] arXiv:2402.15096 (cross-list from cs.LG) [pdf, html, other]: Title: Multimodal Transformer With a Low-Computational-Cost Guarantee

Sungjin Park, Edward Choi

Comments: Accepted to ICASSP 2024 (5 pages)

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[69] arXiv:2402.15300 (cross-list from cs.CV) [pdf, html, other]: Title: Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

Ailin Deng, Zhirui Chen, Bryan Hooi

Comments: Code URL: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[70] arXiv:2402.15444 (cross-list from cs.AI) [pdf, html, other]: Title: Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion

Yichi Zhang, Zhuo Chen, Lei Liang, Huajun Chen, Wen Zhang

Comments: Accepted by LREC-COLING 2024

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[71] arXiv:2402.15695 (cross-list from cs.HC) [pdf, other]: Title: Applied User Research in Virtual Reality: Tools, Methods, and Challenges

Leonie Bensch, Andrea Casini, Aidan Cowley, Florian Dufresne, Enrico Guerra, Paul de Medeiros, Tommy Nilsson, Flavie Rometsch, Andreas Treuer, Anna Vock

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[72] arXiv:2402.15746 (cross-list from cs.CV) [pdf, html, other]: Title: Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT

Sixiao Zheng, Jingyang Huo, Yu Wang, Yanwei Fu

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[73] arXiv:2402.15923 (cross-list from cs.LG) [pdf, html, other]: Title: Predicting Outcomes in Video Games with Long Short Term Memory Networks

Kittimate Chulajata, Sean Wu, Fabien Scalzo, Eun Sang Cha

Comments: 7 pages, 2 Figures, 2 Tables. Kittimate Chulajata and Sean Wu are considered co-first authors

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[74] arXiv:2402.16110 (cross-list from cs.IR) [pdf, html, other]: Title: Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation with Interpretability

Xin Zhou, Chunyan Miao

Comments: 12 pages, 7 figures

Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[75] arXiv:2402.16153 (cross-list from cs.SD) [pdf, html, other]: Title: ChatMusician: Understanding and Generating Music Intrinsically with LLM

Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo

Comments: GitHub: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[76] arXiv:2402.16318 (cross-list from cs.CV) [pdf, html, other]: Title: Gradient-Guided Modality Decoupling for Missing-Modality Robustness

Hao Wang, Shengda Luo, Guosheng Hu, Jianguo Zhang

Comments: AAAI24

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[77] arXiv:2402.16364 (cross-list from cs.CL) [pdf, html, other]: Title: Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions

Tzuf Paz-Argaman, Sayali Kulkarni, John Palowitch, Jason Baldridge, Reut Tsarfaty

Journal-ref: EACL 2024

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[78] arXiv:2402.16366 (cross-list from cs.CV) [pdf, html, other]: Title: SPC-NeRF: Spatial Predictive Compression for Voxel Based Radiance Field

Zetian Song, Wenhong Duan, Yuhuai Zhang, Shiqi Wang, Siwei Ma, Wen Gao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[79] arXiv:2402.16665 (cross-list from cs.HC) [pdf, html, other]: Title: The Interaction Fidelity Model: A Taxonomy to Distinguish the Aspects of Fidelity in Virtual Reality

Michael Bonfert, Thomas Muender, Ryan P. McMahan, Frank Steinicke, Doug Bowman, Rainer Malaka, Tanja Döring

Comments: 34 pages incl. references and appendix

Journal-ref: International Journal of Human-Computer Interaction by Taylor and Francis, 2024, 1-33

Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR); Multimedia (cs.MM)
[80] arXiv:2402.17723 (cross-list from cs.CV) [pdf, html, other]: Title: Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen

Comments: Accepted to CVPR 2024. Project website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2402.18122 (cross-list from cs.CV) [pdf, html, other]: Title: G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment

Juan Zhang, Jiahao Chen, Cheng Wang, Zhiwang Yu, Tangquan Qi, Di Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[82] arXiv:2402.18208 (cross-list from cs.SI) [pdf, html, other]: Title: Shorts on the Rise: Assessing the Effects of YouTube Shorts on Long-Form Video Content

Prajit T. Rajendran, Kevin Creusy, Vivien Garnes

Subjects: Social and Information Networks (cs.SI); Multimedia (cs.MM)
[83] arXiv:2402.18761 (cross-list from eess.IV) [pdf, html, other]: Title: Exploration of Learned Lifting-Based Transform Structures for Fully Scalable and Accessible Wavelet-Like Image Compression

Xinyue Li, Aous Naman, David Taubman

Comments: This work has been submitted to the IEEE for possible publication

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[84] arXiv:2402.18844 (cross-list from cs.CV) [pdf, html, other]: Title: Deep learning for 3D human pose estimation and mesh recovery: A survey

Yang Liu, Changzhen Qiu, Zhiyong Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[85] arXiv:2402.18927 (cross-list from cs.CV) [pdf, html, other]: Title: Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering

Xiang Chen, Wenjie Zhu, Jiayuan Chen, Tong Zhang, Changyan Yi, Jun Cai

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[86] arXiv:2402.19330 (cross-list from cs.CV) [pdf, html, other]: Title: A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation

Hanxi Li, Zhengxun Zhang, Hao Chen, Lin Wu, Bo Li, Deyin Liu, Mingwen Wang

Comments: 13 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
