Multimedia

Authors and titles for April 2024

Total of 123 entries

[1] arXiv:2404.04545 [pdf, html, other]: Title: TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis

Weize Quan, Yunfei Feng, Ming Zhou, Yunzhen Zhao, Tong Wang, Dong-Ming Yan

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL)
[2] arXiv:2404.05522 [pdf, html, other]: Title: 3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering

Qingyuan Zhou, Weidong Yang, Ben Fei, Jingyi Xu, Rui Zhang, Keyi Liu, Yeqi Luo, Ying He

Comments: Accepted at AAAI-25

Subjects: Multimedia (cs.MM)
[3] arXiv:2404.07484 [pdf, other]: Title: Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios

Yuan Zhang, Xiaomei Tao, Hanxu Ai, Tao Chen, Yanling Gan

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[4] arXiv:2404.07872 [pdf, html, other]: Title: Video Compression Beyond VVC: Quantitative Analysis of Intra Coding Tools in Enhanced Compression Model (ECM)

Mohsen Abdoli, Ramin G. Youvalari, Karam Naser, Kevin Reuzé, Fabrice Le Léannec

Comments: Submitted to IEEE ICIP 2024

Subjects: Multimedia (cs.MM); Image and Video Processing (eess.IV)
[5] arXiv:2404.08264 [pdf, html, other]: Title: Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis

Masahiro Yasuda, Noboru Harada, Yasunori Ohishi, Shoichiro Saito, Akira Nakayama, Nobutaka Ono

Comments: 13 pages, 7 figures, under review

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[6] arXiv:2404.09029 [pdf, other]: Title: A Parametric Rate-Distortion Model for Video Transcoding

Maedeh Jamali, Nader Karimi, Shadrokh Samavi, Shahram Shirani

Subjects: Multimedia (cs.MM); Information Theory (cs.IT); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[7] arXiv:2404.09245 [pdf, html, other]: Title: Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics

Haosong Peng, Wei Feng, Hao Li, Yufeng Zhan, Ren Jin, Yuanqing Xia

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[8] arXiv:2404.10528 [pdf, html, other]: Title: AllTheDocks road safety dataset: A cyclist's perspective and experience

Chia-Yen Chiang, Ruikang Zhong, Jennifer Ding, Joseph Wood, Stephen Bee, Mona Jaber

Subjects: Multimedia (cs.MM)
[9] arXiv:2404.10702 [pdf, html, other]: Title: Retrieval Augmented Verification for Zero-Shot Detection of Multimodal Disinformation

Arka Ujjal Dey, Artemis Llabrés, Ernest Valveny, Dimosthenis Karatzas

Subjects: Multimedia (cs.MM)
[10] arXiv:2404.11938 [pdf, html, other]: Title: HyDiscGAN: A Hybrid Distributed cGAN for Audio-Visual Privacy Preservation in Multimodal Sentiment Analysis

Zhuojia Wu, Qi Zhang, Duoqian Miao, Kun Yi, Wei Fan, Liang Hu

Comments: 13 pages, IJCAI-2024

Subjects: Multimedia (cs.MM); Distributed, Parallel, and Cluster Computing (cs.DC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:2404.12169 [pdf, html, other]: Title: Shotit: compute-efficient image-to-video search engine for the cloud

Leslie Wong

Comments: Submitted to ACM ICMR 2024

Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
[12] arXiv:2404.12903 [pdf, html, other]: Title: ConCLVD: Controllable Chinese Landscape Video Generation via Diffusion Model

Dingming Liu, Shaowei Li, Ruoyan Zhou, Lili Liang, Yongguan Hong, Fei Chao, Rongrong Ji

Subjects: Multimedia (cs.MM)
[13] arXiv:2404.13134 [pdf, html, other]: Title: Deep Learning-based Text-in-Image Watermarking

Bishwa Karki, Chun-Hua Tsai, Pei-Chi Huang, Xin Zhong

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[14] arXiv:2404.13619 [pdf, html, other]: Title: Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering

Ben Fei, Yixuan Li, Weidong Yang, Lipeng Ma, Ying He

Subjects: Multimedia (cs.MM)
[15] arXiv:2404.13640 [pdf, html, other]: Title: Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer

Kepeng Xu, Li Xu, Gang He, Wenxin Yu, Yunsong Li

Comments: 9 pages

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[16] arXiv:2404.13792 [pdf, html, other]: Title: Counterfactual Reasoning Using Predicted Latent Personality Dimensions for Optimizing Persuasion Outcome

Donghuo Zeng, Roberto S. Legaspi, Yuewen Sun, Xinshuai Dong, Kazushi Ikeda, Peter Spirtes, Kun Zhang

Comments: 14 pages, 10 figures, Accepted by Persuasive Technology 2024

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[17] arXiv:2404.13993 [pdf, html, other]: Title: Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion

Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui

Comments: Accepted to ACM Multimedia 2024. Project page: this https URL ; Github repo: this https URL

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[18] arXiv:2404.14573 [pdf, html, other]: Title: Tile-Weighted Rate-Distortion Optimized Packet Scheduling for 360$^\circ$ VR Video Streaming

Haopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik

Comments: Accepted by IEEE Intelligent Systems

Subjects: Multimedia (cs.MM)
[19] arXiv:2404.14687 [pdf, html, other]: Title: Pegasus-v1 Technical Report

Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, Genie Heo, Henry Choi, Jenna Kang, Kevin Han, Noah Seo, Sunny Nguyen, Ryan Won, Yeonhoo Park, Anthony Giuliani, Dave Chung, Hans Yoon, James Le, Jenny Ahn, June Lee, Maninder Saini, Meredith Sanders, Soyoung Lee, Sue Kim, Travis Couture

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[20] arXiv:2404.14755 [pdf, html, other]: Title: SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models

Bo Lin, Yingjing Xu, Xuanwen Bao, Zhou Zhao, Zhouyang Wang, Jianwei Yin

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[21] arXiv:2404.14934 [pdf, html, other]: Title: G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition

Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma

Comments: 18 pages, 29 figures

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[22] arXiv:2404.15875 [pdf, html, other]: Title: Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval

Haokun Wen, Xuemeng Song, Xiaolin Chen, Yinwei Wei, Liqiang Nie, Tat-Seng Chua

Comments: ACM SIGIR 2024

Subjects: Multimedia (cs.MM)
[23] arXiv:2404.16305 [pdf, html, other]: Title: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model

Gehui Chen, Guan'an Wang, Xiaowen Huang, Jitao Sang

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2404.17151 [pdf, html, other]: Title: MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection

Chengpei Xu, Wenjing Jia, Ruomei Wang, Xiaonan Luo, Xiangjian He

Comments: Accepted by Transactions on Multimedia

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[25] arXiv:2404.18162 [pdf, html, other]: Title: fMRI Exploration of Visual Quality Assessment

Yiming Zhang, Ying Hu, Xiongkuo Min, Yan Zhou, Guangtao Zhai

Subjects: Multimedia (cs.MM); Neurons and Cognition (q-bio.NC)
[26] arXiv:2404.18343 [pdf, html, other]: Title: G-Refine: A General Quality Refiner for Text-to-Image Generation

Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchaun Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[27] arXiv:2404.18746 [pdf, html, other]: Title: Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models

Hongyi Zhu, Jia-Hong Huang, Stevan Rudinac, Evangelos Kanoulas

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[28] arXiv:2404.19282 [pdf, html, other]: Title: Dual Dynamic Threshold Adjustment Strategy for Deep Metric Learning

Xiruo Jiang, Yazhou Yao, Sheng Liu, Fumin Shen, Liqiang Nie, Xiansheng Hua

Comments: accepted by ACM Transactions on Multimedia Computing, Communications, and Applications

Subjects: Multimedia (cs.MM)
[29] arXiv:2404.00511 (cross-list from cs.CL) [pdf, html, other]: Title: MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models

Zebang Cheng, Fuqiang Niu, Yuxiang Lin, Zhi-Qi Cheng, Bowen Zhang, Xiaojiang Peng

Comments: Ranked 3rd in SemEval '24 Task 3 with F1 of 0.3435, close to 1st & 2nd by 0.0339 & 0.0025

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[30] arXiv:2404.00621 (cross-list from cs.IR) [pdf, html, other]: Title: Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey

Qijiong Liu, Jieming Zhu, Yanting Yang, Quanyu Dai, Zhaocheng Du, Xiao-Ming Wu, Zhou Zhao, Rui Zhang, Zhenhua Dong

Comments: Accepted by KDD 2024. See our tutorial materials at this https URL

Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[31] arXiv:2404.00636 (cross-list from cs.CV) [pdf, html, other]: Title: Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation

Taekyung Ki, Dongchan Min, Gyeongsu Chae

Comments: ECCV 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[32] arXiv:2404.00989 (cross-list from cs.CV) [pdf, html, other]: Title: 360+x: A Panoptic Multi-modal Scene Understanding Dataset

Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao

Comments: CVPR 2024 (Oral Presentation), Project page: this https URL

Journal-ref: The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2404.01174 (cross-list from cs.CV) [pdf, html, other]: Title: SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding

Wenrui Li, Xiaopeng Hong, Ruiqin Xiong, Xiaopeng Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[34] arXiv:2404.01291 (cross-list from cs.CV) [pdf, other]: Title: Evaluating Text-to-Visual Generation with Image-to-Text Generation

Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan

Comments: We open-source our data, model, and code at: this https URL ; Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[35] arXiv:2404.01336 (cross-list from cs.CL) [pdf, html, other]: Title: FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection

Ziyi Zhou, Xiaoming Zhang, Litian Zhang, Jiacheng Liu, Senzhang Wang, Zheng Liu, Xi Zhang, Chaozhuo Li, Philip S. Yu

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[36] arXiv:2404.01409 (cross-list from cs.CV) [pdf, html, other]: Title: OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation

Xiongwei Wu, Sicheng Yu, Ee-Peng Lim, Chong-Wah Ngo

Comments: CVPR 2024; 12 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[37] arXiv:2404.01617 (cross-list from cs.NI) [pdf, html, other]: Title: Designing Network Algorithms via Large Language Models

Zhiyuan He, Aashish Gottipati, Lili Qiu, Xufang Luo, Kenuo Xu, Yuqing Yang, Francis Y. Yan

Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Multimedia (cs.MM)
[38] arXiv:2404.01713 (cross-list from cs.CL) [pdf, html, other]: Title: Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G

Nassim Sehad, Lina Bariah, Wassim Hamidouche, Hamed Hellaoui, Riku Jäntti, Mérouane Debbah

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[39] arXiv:2404.01735 (cross-list from cs.IR) [pdf, html, other]: Title: CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling

Yunshan Ma, Yingzhi He, Wenjun Zhong, Xiang Wang, Roger Zimmermann, Tat-Seng Chua

Comments: arXiv preprint, 10 pages, 4 figures, 6 tables

Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[40] arXiv:2404.01862 (cross-list from cs.CV) [pdf, html, other]: Title: Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu

Comments: 22 pages, 8 figures, CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[41] arXiv:2404.02731 (cross-list from eess.IV) [pdf, html, other]: Title: Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

Yunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, Hui Xiong

Comments: Accepted for the CVPR 2024 Workshop on Mobile Intelligent Photography & Imaging

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[42] arXiv:2404.02755 (cross-list from cs.CV) [pdf, html, other]: Title: DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement

Hao Wu, Huabin Liu, Yu Qiao, Xiao Sun

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[43] arXiv:2404.03161 (cross-list from cs.CV) [pdf, html, other]: Title: BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes

Tomohiro Nishimoto, Taichi Nishimura, Koki Yamamoto, Keisuke Shirai, Hirotaka Kameko, Yuto Haneji, Tomoya Yoshida, Keiya Kajimura, Taiyu Cui, Chihiro Nishiwaki, Eriko Daikoku, Natsuko Okuda, Fumihito Ono, Shinsuke Mori

Comments: ICIP2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[44] arXiv:2404.03179 (cross-list from cs.CV) [pdf, html, other]: Title: UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization

Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng, Ling Shao

Comments: This work has been submitted to the IEEE for possible publication

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[45] arXiv:2404.03635 (cross-list from cs.CV) [pdf, html, other]: Title: WorDepth: Variational Language Prior for Monocular Depth Estimation

Ziyao Zeng, Daniel Wang, Fengyu Yang, Hyoungseob Park, Yangchao Wu, Stefano Soatto, Byung-Woo Hong, Dong Lao, Alex Wong

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[46] arXiv:2404.04037 (cross-list from cs.CV) [pdf, html, other]: Title: InstructHumans: Editing Animated 3D Human Textures with Instructions

Jiayin Zhu, Linlin Yang, Angela Yao

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[47] arXiv:2404.04807 (cross-list from cs.CV) [pdf, html, other]: Title: D2SL: Decouple Defogging and Semantic Learning for Foggy Domain-Adaptive Segmentation

Xuan Sun, Zhanfu An, Yuyu Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[48] arXiv:2404.04996 (cross-list from cs.CV) [pdf, html, other]: Title: Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM

Pingping Zhang, Tianyu Yan, Yang Liu, Huchuan Lu

Comments: Accepted by CVPR 2024 as Poster (Highlight)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[49] arXiv:2404.05206 (cross-list from cs.CV) [pdf, html, other]: Title: SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman

Comments: Accepted at CVPR 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2404.05317 (cross-list from cs.CV) [pdf, html, other]: Title: WebXR, A-Frame and Networked-Aframe as a Basis for an Open Metaverse: A Conceptual Architecture

Giuseppe Macario

Comments: draftcls option

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[51] arXiv:2404.05321 (cross-list from eess.IV) [pdf, other]: Title: Unravelling the Power of Single-Pass Look-Ahead in Modern Codecs for Optimized Transcoding Deployment

Vibhoothi Vibhoothi, Julien Zouein, François Pitié, Anil Kokaram

Comments: Accepted paper for NAB 2024

Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[52] arXiv:2404.05802 (cross-list from cs.CE) [pdf, html, other]: Title: BatSort: Enhanced Battery Classification with Transfer Learning for Battery Sorting and Recycling

Yunyi Zhao, Wei Zhang, Erhai Hu, Qingyu Yan, Cheng Xiang, King Jet Tseng, Dusit Niyato

Subjects: Computational Engineering, Finance, and Science (cs.CE); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[53] arXiv:2404.06022 (cross-list from cs.CV) [pdf, other]: Title: Band-Attention Modulated RetNet for Face Forgery Detection

Zhida Zhang, Jie Cao, Wenkui Yang, Qihang Fan, Kai Zhou, Ran He

Comments: The essay is poorly expressed in writing and will be re-optimised

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[54] arXiv:2404.06165 (cross-list from cs.CV) [pdf, html, other]: Title: Enhanced Radar Perception via Multi-Task Learning: Towards Refined Data for Sensor Fusion Applications

Huawei Sun, Hao Feng, Gianfranco Mauro, Julius Ott, Georg Stettinger, Lorenzo Servadei, Robert Wille

Comments: Accepted by IEEE Intelligent Vehicles Symposium (IV 2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
[55] arXiv:2404.06220 (cross-list from cs.LG) [pdf, html, other]: Title: Zero-Shot Relational Learning for Multimodal Knowledge Graphs

Rui Cai, Shichao Pei, Xiangliang Zhang

Comments: In the Proceedings of the 2024 IEEE International Conference on Big Data (IEEE BigData 2024)

Subjects: Machine Learning (cs.LG); Multimedia (cs.MM)
[56] arXiv:2404.06243 (cross-list from cs.CV) [pdf, html, other]: Title: ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos

Sharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy, Raveendran Paramesran, Raphael C.-W. Phan

Comments: Submitted for peer review

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM)
[57] arXiv:2404.06365 (cross-list from cs.CV) [pdf, html, other]: Title: Dynamic Resolution Guidance for Facial Expression Recognition

Songpan Wang, Xu Li, Tianxiang Jiang, Yuanlun Xie

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[58] arXiv:2404.06563 (cross-list from cs.DB) [pdf, html, other]: Title: Demonstration of MaskSearch: Efficiently Querying Image Masks for Machine Learning Workflows

Lindsey Linxi Wei, Chung Yik Edward Yeung, Hongjian Yu, Jingchuan Zhou, Dong He, Magdalena Balazinska

Subjects: Databases (cs.DB); Machine Learning (cs.LG); Multimedia (cs.MM)
[59] arXiv:2404.06936 (cross-list from cs.CV) [pdf, html, other]: Title: Efficient and Generic Point Model for Lossless Point Cloud Attribute Compression

Kang You, Pan Gao, Zhan Ma

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[60] arXiv:2404.07206 (cross-list from cs.CV) [pdf, html, other]: Title: GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

Zewei Zhang, Huan Liu, Jun Chen, Xiangyu Xu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[61] arXiv:2404.07336 (cross-list from cs.CV) [pdf, html, other]: Title: PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores

Lucas Goncalves, Prashant Mathur, Chandrashekhar Lavania, Metehan Cekic, Marcello Federico, Kyu J. Han

Comments: 24 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[62] arXiv:2404.08281 (cross-list from cs.CV) [pdf, html, other]: Title: Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation

Yichen Yan, Xingjian He, Sihan Chen, Jing Liu

Comments: 9 pages, 8 figures, ICMR 2024. arXiv admin note: text overlap with arXiv:2305.14969

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[63] arXiv:2404.08965 (cross-list from cs.CV) [pdf, html, other]: Title: Seeing Text in the Dark: Algorithm and Benchmark

Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, Wenjie Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[64] arXiv:2404.09516 (cross-list from cs.LG) [pdf, html, other]: Title: State Space Model for New-Generation Network Alternative to Transformers: A Survey

Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, Yaowei Wang, Yonghong Tian, Jin Tang

Comments: The First review of State Space Model (SSM)/Mamba and their applications in artificial intelligence, 33 pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[65] arXiv:2404.09654 (cross-list from cs.CV) [pdf, html, other]: Title: Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection

Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Junran Wu

Comments: Accepted by MM'24 (Oral)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[66] arXiv:2404.09905 (cross-list from cs.NI) [pdf, html, other]: Title: Quality of Experience Oriented Cross-layer Optimization for Real-time XR Video Transmission

Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaojing Chen, Yanzan Sun

Comments: 14 pages, 13 figures. arXiv admin note: text overlap with arXiv:2402.01180

Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM); Image and Video Processing (eess.IV); Systems and Control (eess.SY)
[67] arXiv:2404.10141 (cross-list from cs.CV) [pdf, html, other]: Title: ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis

Aashish Anantha Ramakrishnan, Sharon X. Huang, Dongwon Lee

Comments: 23 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[68] arXiv:2404.10250 (cross-list from cs.PL) [pdf, html, other]: Title: AniFrame: A Programming Language for 2D Drawing and Frame-Based Animation

Mark Edward M. Gonzales, Hans Oswald A. Ibrahim, Elyssia Barrie H. Ong, Ryan Austin Fernandez

Comments: Accepted for paper presentation at the 24th Philippine Computing Science Congress (PCSC 2024), held in Laguna, Philippines

Subjects: Programming Languages (cs.PL); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[69] arXiv:2404.10292 (cross-list from cs.CV) [pdf, html, other]: Title: From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search

Jintao Sun, Hao Fei, Zhedong Zheng, Gangyi Ding

Comments: 11 pages, 8 figures, Proceedings of the ACM Web Conference 2025 (WWW '25)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[70] arXiv:2404.10342 (cross-list from cs.CV) [pdf, html, other]: Title: Referring Flexible Image Restoration

Runwei Guan, Rongsheng Hu, Zhuhao Zhou, Tianlang Xue, Ka Lok Man, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

Comments: 15 pages, 19 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[71] arXiv:2404.10838 (cross-list from cs.CV) [pdf, html, other]: Title: Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Representation Learning

Zhengyang Liang, Meiyu Liang, Wei Huang, Yawen Li, Zhe Xue

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[72] arXiv:2404.10989 (cross-list from cs.CV) [pdf, html, other]: Title: FairSSD: Understanding Bias in Synthetic Speech Detectors

Amit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp

Comments: Accepted at CVPR 2024 (WMF)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73] arXiv:2404.11116 (cross-list from cs.SD) [pdf, html, other]: Title: Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge

Keren Shao, Ke Chen, Shlomo Dubnov

Comments: 2 pages, 2 figures, 1 table, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[74] arXiv:2404.11119 (cross-list from cs.IR) [pdf, html, other]: Title: DREAM: A Dual Representation Learning Model for Multimodal Recommendation

Kangning Zhang, Yingjie Qin, Jiarui Jin, Yifan Liu, Ruilong Su, Weinan Zhang, Yong Yu

Comments: 10 pages, 11 figures

Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[75] arXiv:2404.11209 (cross-list from cs.AI) [pdf, html, other]: Title: Prompt-Guided Generation of Structured Chest X-Ray Report Using a Pre-trained LLM

Hongzhao Li, Hongyu Wang, Xia Sun, Hua He, Jun Feng

Comments: Accepted by IEEE Conference on Multimedia Expo 2024

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[76] arXiv:2404.11375 (cross-list from cs.CV) [pdf, html, other]: Title: Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion

Xinghan Wang, Zixi Kang, Yadong Mu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[77] arXiv:2404.12257 (cross-list from cs.CV) [pdf, html, other]: Title: Food Portion Estimation via 3D Object Scaling

Gautham Vinod, Jiangpeng He, Zeman Shao, Fengqing Zhu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[78] arXiv:2404.12330 (cross-list from cs.CV) [pdf, html, other]: Title: A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Christoph Reich, Oliver Hahn, Daniel Cremers, Stefan Roth, Biplob Debnath

Comments: Accepted at CVPR 2024 Workshop on AI for Streaming (AIS)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[79] arXiv:2404.12630 (cross-list from cs.CV) [pdf, html, other]: Title: MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

Zixuan Gong, Qi Zhang, Guangyin Bao, Lei Zhu, Ke Liu, Liang Hu, Duoqian Miao

Comments: AAAI 2025, 14 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[80] arXiv:2404.12725 (cross-list from cs.SD) [pdf, html, other]: Title: Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction

Zhaoxi Mu, Xinyu Yang

Comments: Accepted by IJCAI 2024

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[81] arXiv:2404.12794 (cross-list from cs.CV) [pdf, html, other]: Title: MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang

Comments: Accepted to ACM MM 2024. The source code is publicly available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO); Image and Video Processing (eess.IV)
[82] arXiv:2404.12900 (cross-list from cs.CV) [pdf, html, other]: Title: Training-and-Prompt-Free General Painterly Harmonization via Zero-Shot Disentenglement on Style and Content References

Teng-Fang Hsiao, Bo-Kai Ruan, Hong-Han Shuai

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[83] arXiv:2404.13282 (cross-list from cs.CV) [pdf, html, other]: Title: Wills Aligner: Multi-Subject Collaborative Brain Visual Decoding

Guangyin Bao, Qi Zhang, Zixuan Gong, Jialei Zhou, Wei Fan, Kun Yi, Usman Naseem, Liang Hu, Duoqian Miao

Comments: AAAI 2025, 16 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[84] arXiv:2404.13289 (cross-list from cs.CL) [pdf, html, other]: Title: Double Mixture: Towards Continual Event Detection from Speech

Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Yinwei Wei, Hao Yang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari

Comments: The first two authors contributed equally to this work

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:2404.13306 (cross-list from cs.CV) [pdf, html, other]: Title: FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models

Yixuan Li, Xuelin Liu, Xiaoyang Wang, Bu Sung Lee, Shiqi Wang, Anderson Rocha, Weisi Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[86] arXiv:2404.13370 (cross-list from cs.CV) [pdf, html, other]: Title: Movie101v2: Improved Movie Narration Benchmark

Zihao Yue, Yepeng Zhang, Ziheng Wang, Qin Jin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[87] arXiv:2404.13621 (cross-list from cs.CV) [pdf, html, other]: Title: Attack on Scene Flow using Point Clouds

Haniyeh Ehsani Oskouie, Mohammad-Shahram Moin, Shohreh Kasaei

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[88] arXiv:2404.13628 (cross-list from cs.CL) [pdf, html, other]: Title: Mixture of LoRA Experts

Xun Wu, Shaohan Huang, Furu Wei

Comments: 17 pages, 11 figures

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[89] arXiv:2404.13789 (cross-list from cs.SD) [pdf, html, other]: Title: Anchor-aware Deep Metric Learning for Audio-visual Retrieval

Donghuo Zeng, Yanan Wang, Kazushi Ikeda, Yi Yu

Comments: 9 pages, 5 figures. Accepted by ACM ICMR 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[90] arXiv:2404.13808 (cross-list from cs.IR) [pdf, html, other]: Title: General Item Representation Learning for Cold-start Content Recommendations

Jooeun Kim, Jinri Kim, Kwangeun Yeo, Eungi Kim, Kyoung-Woon On, Jonghwan Mun, Joonseok Lee

Comments: 14 pages

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
[91] arXiv:2404.13899 (cross-list from cs.CL) [pdf, html, other]: Title: Towards Better Text-to-Image Generation Alignment via Attention Modulation

Yihang Wu, Xiao Cao, Kaixin Li, Zitan Chen, Haonan Wang, Lei Meng, Zhiyong Huang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[92] arXiv:2404.13914 (cross-list from cs.SD) [pdf, html, other]: Title: A Survey on Speech Deepfake Detection

Menglu Li, Yasaman Ahmadiadli, Xiao-Ping Zhang

Comments: 38 pages. This paper has been accepted by ACM Computing Surveys

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[93] arXiv:2404.13944 (cross-list from cs.CV) [pdf, html, other]: Title: Gorgeous: Create Your Desired Character Facial Makeup from Any Ideas

Jia Wei Sii, Chee Seng Chan

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[94] arXiv:2404.14037 (cross-list from cs.CV) [pdf, html, other]: Title: GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu

Comments: Accepted by ACM MM 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[95] arXiv:2404.14381 (cross-list from cs.CV) [pdf, html, other]: Title: TAVGBench: Benchmarking Text to Audible-Video Generation

Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai

Comments: Technical Report. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[96] arXiv:2404.14674 (cross-list from cs.LG) [pdf, html, other]: Title: HOIN: High-Order Implicit Neural Representations

Yang Chen, Ruituo Wu, Yipeng Liu, Ce Zhu

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[97] arXiv:2404.14985 (cross-list from cs.CV) [pdf, html, other]: Title: Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification

Yingquan Wang, Pingping Zhang, Dong Wang, Huchuan Lu

Comments: Accepted by CVIU2024. More modifications may be performed

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[98] arXiv:2404.15100 (cross-list from cs.CV) [pdf, html, other]: Title: Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

Xun Wu, Shaohan Huang, Furu Wei

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[99] arXiv:2404.15107 (cross-list from cs.HC) [pdf, html, other]: Title: MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos

Zheng Ning, Zheng Zhang, Jerrick Ban, Kaiwen Jiang, Ruohong Gan, Yapeng Tian, Toby Jia-Jun Li

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[100] arXiv:2404.15143 (cross-list from cs.SD) [pdf, html, other]: Title: Every Breath You Don't Take: Deepfake Speech Detection Using Breath

Seth Layton, Thiago De Andrade, Daniel Olszewski, Kevin Warren, Kevin Butler, Patrick Traynor

Comments: Submitted to ACM journal -- Digital Threats: Research and Practice

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[101] arXiv:2404.15276 (cross-list from cs.CV) [pdf, html, other]: Title: SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation

Xiangyu Xu, Lijuan Liu, Shuicheng Yan

Comments: Published at TPAMI 2024

Journal-ref: https://www.computer.org/csdl/journal/tp/2024/05/10354384/1SP2qWh8Fq0

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[102] arXiv:2404.15349 (cross-list from eess.SP) [pdf, html, other]: Title: A Survey on Multimodal Wearable Sensor-based Human Action Recognition

Jianyuan Ni, Hao Tang, Syed Tousiful Haque, Yan Yan, Anne H.H. Ngu

Comments: Multimodal Survey for Wearable Sensor-based Human Action Recognition

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Multimedia (cs.MM)
[103] arXiv:2404.15406 (cross-list from cs.CV) [pdf, html, other]: Title: Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

Davide Caffagni, Federico Cocchi, Nicholas Moratelli, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Comments: CVPR 2024 Workshop on What is Next in Multimodal Foundation Models

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[104] arXiv:2404.15637 (cross-list from cs.SD) [pdf, html, other]: Title: HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts

Xinlei Niu, Jing Zhang, Charles Patrick Martin

Comments: Proceedings of Interspeech

Journal-ref: Proc. Interspeech 2024, 4368-4372

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[105] arXiv:2404.15771 (cross-list from cs.CV) [pdf, html, other]: Title: DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines

Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[106] arXiv:2404.16012 (cross-list from cs.CV) [pdf, html, other]: Title: GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting

Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn, Seungryong Kim

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[107] arXiv:2404.16038 (cross-list from cs.CV) [pdf, html, other]: Title: A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming

Pengyuan Zhou, Lin Wang, Zhi Liu, Yanbin Hao, Pan Hui, Sasu Tarkoma, Jussi Kangasharju

Comments: 16 pages, 10 figures, 4 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[108] arXiv:2404.16112 (cross-list from cs.LG) [pdf, html, other]: Title: Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges

Badri Narayana Patro, Vijay Srinivas Agneeswaran

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[109] arXiv:2404.16193 (cross-list from cs.CV) [pdf, html, other]: Title: Improving Multi-label Recognition using Class Co-Occurrence Probabilities

Samyak Rawlekar, Shubhang Bhatnagar, Vishnuvardhan Pogunulu Srinivasulu, Narendra Ahuja

Comments: Accepted to ICPR 2024, CVPR workshops 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[110] arXiv:2404.16205 (cross-list from cs.CV) [pdf, html, other]: Title: AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Wei Sun, Yuqin Cao, Yanwei Jiang, Jun Jia, Zhichao Zhang, Zijian Chen, Weixia Zhang, Xiongkuo Min, Steve Göring, Zihao Qi, Chen Feng

Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[111] arXiv:2404.16302 (cross-list from cs.CV) [pdf, html, other]: Title: CFMW: Cross-modality Fusion Mamba for Robust Object Detection under Adverse Weather

Haoyuan Li, Qi Hu, Binjia Zhou, You Yao, Jiacheng Lin, Kailun Yang, Peng Chen

Comments: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). The dataset and source code will be made publicly available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO); Image and Video Processing (eess.IV)
[112] arXiv:2404.17534 (cross-list from cs.CV) [pdf, html, other]: Title: Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models

Yuhang Huang, Zihan Wu, Chongyang Gao, Jiawei Peng, Xu Yang

Comments: 11 pages, 9 figures, 6 tables. For associated code, see this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[113] arXiv:2404.17821 (cross-list from cs.SD) [pdf, other]: Title: An automatic mixing speech enhancement system for multi-track audio

Xiaojing Liu, Hongwei Ai, Joshua D. Reiss

Comments: 5 pages

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[114] arXiv:2404.18081 (cross-list from cs.SD) [pdf, other]: Title: ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[115] arXiv:2404.18114 (cross-list from cs.CV) [pdf, html, other]: Title: Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching

Haiwen Diao, Ying Zhang, Shang Gao, Xiang Ruan, Huchuan Lu

Comments: 12 pages, 9 figures, Accepted by TIP2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[116] arXiv:2404.18136 (cross-list from cs.CV) [pdf, html, other]: Title: SafePaint: Anti-forensic Image Inpainting with Domain Adaptation

Dunyun Chen, Xin Liao, Xiaoshuai Wu, Shiwei Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[117] arXiv:2404.18149 (cross-list from cs.CV) [pdf, html, other]: Title: Compressed Deepfake Video Detection Based on 3D Spatiotemporal Trajectories

Zongmei Chen, Xin Liao, Xiaoshuai Wu, Yanxiang Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[118] arXiv:2404.18202 (cross-list from cs.AI) [pdf, html, other]: Title: WorldGPT: Empowering LLM as Multimodal World Model

Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang

Comments: update v2

Subjects: Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[119] arXiv:2404.18398 (cross-list from cs.CL) [pdf, html, other]: Title: UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts

Zhi-Qi Cheng, Xiang Li, Jun-Yan He, Junyao Chen, Xiaomao Fan, Xiaojiang Peng, Alexander G. Hauptmann

Comments: Accepted to ICASSP 2025, Code available at this https URL

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[120] arXiv:2404.18976 (cross-list from cs.LG) [pdf, html, other]: Title: Foundations of Multisensory Artificial Intelligence

Paul Pu Liang

Comments: CMU Machine Learning Department PhD Thesis

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[121] arXiv:2404.19311 (cross-list from cs.CV) [pdf, html, other]: Title: A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images

Wang Zhang, Tingting Li, Yuntian Zhang, Gensheng Pei, Xiruo Jiang, Yazhou Yao

Comments: accepted by Information Fusion

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[122] arXiv:2404.19500 (cross-list from cs.CV) [pdf, html, other]: Title: Towards Real-world Video Face Restoration: A New Benchmark

Ziyan Chen, Jingwen He, Xinqi Lin, Yu Qiao, Chao Dong

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[123] arXiv:2404.19615 (cross-list from cs.CV) [pdf, other]: Title: SemiPL: A Semi-supervised Method for Event Sound Source Localization

Yue Li, Baiqiao Yin, Jinfu Liu, Jiajun Wen, Jiaying Lin, Mengyuan Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
