Multimedia

Authors and titles for May 2024

Total of 100 entries : 1-50 51-100

[51] arXiv:2405.09539 (cross-list from eess.IV) [pdf, html, other]: Title: MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer

Chengyu Wu, Chengkai Wang, Yaqi Wang, Huiyu Zhou, Yatao Zhang, Qifeng Wang, Shuai Wang

Comments: Early accepted to MICCAI 2024 (6/6/5)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[52] arXiv:2405.10121 (cross-list from cs.CL) [pdf, html, other]: Title: Distilling Implicit Multimodal Knowledge into Large Language Models for Zero-Resource Dialogue Generation

Bo Zhang, Hui Ma, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin

Comments: Accepted by Information Fusion. The code is available at this https URL

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[53] arXiv:2405.11093 (cross-list from eess.AS) [pdf, html, other]: Title: AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations

David Xu

Comments: typos corrected

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)
[54] arXiv:2405.11145 (cross-list from cs.CV) [pdf, html, other]: Title: Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

Junzhang Liu, Zhecan Wang, Hammad Ayyubi, Haoxuan You, Chris Thomas, Rui Sun, Shih-Fu Chang, Kai-Wei Chang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[55] arXiv:2405.11273 (cross-list from cs.AI) [pdf, html, other]: Title: Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang

Comments: 22 pages, 13 figures. Project Website: this https URL. Working in progress

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[56] arXiv:2405.11295 (cross-list from eess.IV) [pdf, other]: Title: Medical Image Analysis for Detection, Treatment and Planning of Disease using Artificial Intelligence Approaches

Nand Lal Yadav, Satyendra Singh, Rajesh Kumar, Sudhakar Singh

Comments: 10 pages, 3 figures

Journal-ref: International Journal of Microsystems and IoT, Vol. 1, Issue 5, pp. 278-287, 2023

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[57] arXiv:2405.12126 (cross-list from cs.CV) [pdf, html, other]: Title: Alzheimer's Magnetic Resonance Imaging Classification Using Deep and Meta-Learning Models

Nida Nasir, Muneeb Ahmed, Neda Afreen, Mustafa Sameer

Subjects: Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Multimedia (cs.MM)
[58] arXiv:2405.12221 (cross-list from cs.CV) [pdf, html, other]: Title: Images that Sound: Composing Images and Sounds on a Single Canvas

Ziyang Chen, Daniel Geng, Andrew Owens

Comments: Accepted to NeurIPS 2024. Project site: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[59] arXiv:2405.12336 (cross-list from cs.CR) [pdf, other]: Title: Interoperable Provenance Authentication of Broadcast Media using Open Standards-based Metadata, Watermarking and Cryptography

John C. Simmons, Joseph M. Winograd

Comments: 17 pages, 9 figures. Submitted to IBC2024 Technical Papers Programme

Journal-ref: IBC2024 Technical Papers Programme. https://www.ibc.org/technical-papers/ibc2024-tech-papers-interoperable-provenance-authentication-of-broadcast-media-using-open-standards-based-metadata-watermarking-and-cryptography/12063.article

Subjects: Cryptography and Security (cs.CR); Multimedia (cs.MM)
[60] arXiv:2405.12512 (cross-list from cs.CV) [pdf, html, other]: Title: Rethink Predicting the Optical Flow with the Kinetics Perspective

Yuhao Cheng, Siru Zhang, Yiqiang Yan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[61] arXiv:2405.12540 (cross-list from cs.CV) [pdf, html, other]: Title: Context-Enhanced Video Moment Retrieval with Large Language Models

Weijia Liu, Bo Miao, Jiuxin Cao, Xuelin Zhu, Bo Liu, Mehwish Nasim, Ajmal Mian

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[62] arXiv:2405.12564 (cross-list from q-bio.QM) [pdf, html, other]: Title: ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

Comments: ACL 2024, 9 pages

Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Multimedia (cs.MM)
[63] arXiv:2405.12847 (cross-list from cs.IR) [pdf, html, other]: Title: A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability

Li-Yang Tseng, Tzu-Ling Lin, Hong-Han Shuai, Jen-Wei Huang, Wen-Whei Chang

Journal-ref: Proceedings of the 24th International Society for Music Information Retrieval Conference, 174-181. Milan, Italy, November 5-9, 2023

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64] arXiv:2405.12983 (cross-list from eess.AS) [pdf, html, other]: Title: Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer

Maxime Burchi, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg, Radu Timofte

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
[65] arXiv:2405.13049 (cross-list from cs.CL) [pdf, html, other]: Title: SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations

Fanfan Wang, Heqing Ma, Jianfei Yu, Rui Xia, Erik Cambria

Comments: Accepted to the 18th International Workshop on Semantic Evaluation (SemEval-2024). 12 pages, 3 figures, 4 Tables

Journal-ref: https://aclanthology.org/2024.semeval-1.277/

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[66] arXiv:2405.13127 (cross-list from cs.CV) [pdf, html, other]: Title: Towards Retrieval-Augmented Architectures for Image Captioning

Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara

Comments: ACM Transactions on Multimedia Computing, Communications and Applications (2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[67] arXiv:2405.13389 (cross-list from cs.CV) [pdf, html, other]: Title: HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera

Yunfan Lu, Zipeng Wang, Yusheng Wang, Hui Xiong

Comments: 30 pages, 20 figures, 8 tables. This work was submitted for review in the second half of 2023. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO)
[68] arXiv:2405.13403 (cross-list from eess.IV) [pdf, html, other]: Title: Adaptive Wireless Image Semantic Transmission and Over-The-Air Testing

Jiarun Ding, Peiwen Jiang, Chao-Kai Wen, Shi Jin

Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[69] arXiv:2405.13762 (cross-list from cs.CV) [pdf, html, other]: Title: A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

Journal-ref: In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70] arXiv:2405.13984 (cross-list from cs.CL) [pdf, html, other]: Title: Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation

Dimitris Gkoumas, Maria Liakata

Comments: ACL25 Main

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[71] arXiv:2405.14225 (cross-list from q-bio.QM) [pdf, html, other]: Title: ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining

Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

Comments: ACL 2024 Findings, 9 pages

Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Multimedia (cs.MM)
[72] arXiv:2405.14312 (cross-list from cs.CV) [pdf, html, other]: Title: Improving Gloss-free Sign Language Translation by Reducing Representation Density

Jinhui Ye, Xing Wang, Wenxiang Jiao, Junwei Liang, Hui Xiong

Comments: Accepted at NeurIPS'24; Representation Density Problem and Performance Drop in Gloss-free SLT

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[73] arXiv:2405.14556 (cross-list from cs.CV) [pdf, other]: Title: Deep Learning Classification of Photoplethysmogram Signal for Hypertension Levels

Nida Nasir, Mustafa Sameer, Feras Barneih, Omar Alshaltone, Muneeb Ahmed

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[74] arXiv:2405.14598 (cross-list from cs.CV) [pdf, html, other]: Title: Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75] arXiv:2405.14709 (cross-list from cs.CV) [pdf, html, other]: Title: OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance

Shuheng Ge, Haoyu Xing, Li Zhang, Xiangqian Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[76] arXiv:2405.15451 (cross-list from cs.CV) [pdf, html, other]: Title: Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval

Yiming Wu, Hangfei Li, Fangfang Wang, Yilong Zhang, Ronghua Liang

Comments: ICASSP 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)
[77] arXiv:2405.15757 (cross-list from cs.CV) [pdf, html, other]: Title: Looking Backward: Streaming Video-to-Video Translation with Feature Banks

Feng Liang, Akio Kodaira, Chenfeng Xu, Masayoshi Tomizuka, Kurt Keutzer, Diana Marculescu

Comments: ICLR 2025. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[78] arXiv:2405.16000 (cross-list from cs.SD) [pdf, html, other]: Title: Carnatic Raga Identification System using Rigorous Time-Delay Neural Network

Sanjay Natesan, Homayoon Beigi

Comments: 7 pages, 2 tables, 3 figures

Journal-ref: Recognition Technologies, Inc. Technical Report (2024), RTI-20240524-01

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[79] arXiv:2405.16296 (cross-list from cs.CV) [pdf, html, other]: Title: Neural Network-Based Tracking and 3D Reconstruction of Baseball Pitch Trajectories from Single-View 2D Video

Jhen Hsieh

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)
[80] arXiv:2405.16640 (cross-list from cs.AI) [pdf, html, other]: Title: A Survey of Multimodal Large Language Model from A Data-centric Perspective

Tianyi Bai, Hao Liang, Binwang Wan, Yanran Xu, Xi Li, Shiyu Li, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Ping Huang, Jiulong Shan, Conghui He, Binhang Yuan, Wentao Zhang

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[81] arXiv:2405.16728 (cross-list from cs.CV) [pdf, other]: Title: Towards Multi-Task Multi-Modal Models: A Video Generative Perspective

Lijun Yu

Comments: PhD thesis

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[82] arXiv:2405.16807 (cross-list from cs.CV) [pdf, other]: Title: Extreme Compression of Adaptive Neural Images

Leo Hoshikawa, Marcos V. Conde, Takeshi Ohashi, Atsushi Irie

Comments: Technical Report. Work in progress

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM)
[83] arXiv:2405.16961 (cross-list from eess.IV) [pdf, other]: Title: Blind Data Adaptation to tackle Covariate Shift in Operational Steganalysis

Rony Abecidan (CRIStAL), Vincent Itier (IMT Nord Europe, CRIStAL), Jérémie Boulanger (CRIStAL), Patrick Bas (CRIStAL), Tomáš Pevný (CTU)

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Multimedia (cs.MM)
[84] arXiv:2405.17729 (cross-list from cs.CV) [pdf, html, other]: Title: Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

Rui Zhang, Shuailong Li, Junxiao Xue, Feng Lin, Qing Zhang, Xiao Ma, Xiaoran Yan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[85] arXiv:2405.17730 (cross-list from cs.CV) [pdf, html, other]: Title: MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Yake Wei, Di Hu

Comments: Accepted by ICML2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[86] arXiv:2405.17842 (cross-list from cs.CV) [pdf, html, other]: Title: MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Comments: ICLR 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2405.18386 (cross-list from cs.SD) [pdf, html, other]: Title: Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

Comments: Accepted at ISMIR 2025 Conference. Code and demo are available at: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[88] arXiv:2405.18726 (cross-list from cs.SD) [pdf, html, other]: Title: Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI

Che Liu, Changde Du, Xiaoyu Chen, Huiguang He

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[89] arXiv:2405.18790 (cross-list from cs.CV) [pdf, html, other]: Title: Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics

Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, Shiqi Wang

Comments: Accepted to IEEE Transactions on Multimedia 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[90] arXiv:2405.18887 (cross-list from cs.HC) [pdf, html, other]: Title: 4Doodle: Two-handed Gestures for Immersive Sketching of Architectural Models

Fernando Fonseca, Maurício Sousa, Daniel Mendes, Alfredo Ferreira, Joaquim Jorge

Comments: 9 pages; 15 Figures

Subjects: Human-Computer Interaction (cs.HC); Computational Engineering, Finance, and Science (cs.CE); Graphics (cs.GR); Multimedia (cs.MM)
[91] arXiv:2405.18959 (cross-list from cs.CV) [pdf, html, other]: Title: Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval

Rui Yang, Shuang Wang, Yingping Han, Yuanheng Li, Dong Zhao, Dou Quan, Yanhe Guo, Licheng Jiao

Comments: 16 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[92] arXiv:2405.18991 (cross-list from cs.CV) [pdf, html, other]: Title: EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang

Comments: 8 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[93] arXiv:2405.19226 (cross-list from cs.CV) [pdf, html, other]: Title: ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions

Honglin Lin, Siyu Li, Guoshun Nan, Chaoyue Tang, Xueting Wang, Jingxin Xu, Rong Yankai, Zhili Zhou, Yutong Gao, Qimei Cui, Xiaofeng Tao

Comments: Accepted in ACL 2024 Findings

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[94] arXiv:2405.19334 (cross-list from cs.AI) [pdf, html, other]: Title: LLMs Meet Multimodal Generation and Editing: A Survey

Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen

Comments: 52 Pages with 16 Figures, 12 Tables, and 545 References. GitHub Repository at: this https URL

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
[95] arXiv:2405.19889 (cross-list from eess.SP) [pdf, html, other]: Title: Deep Joint Semantic Coding and Beamforming for Near-Space Airship-Borne Massive MIMO Network

Minghui Wu, Zhen Gao, Zhaocheng Wang, Dusit Niyato, George K. Karagiannidis, Sheng Chen

Comments: Major Revision by IEEE JSAC

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG); Multimedia (cs.MM)
[96] arXiv:2405.20032 (cross-list from cs.NI) [pdf, html, other]: Title: Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion

Jiangkai Wu, Liming Liu, Yunpeng Tan, Junlin Hao, Xinggong Zhang

Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[97] arXiv:2405.20606 (cross-list from cs.CV) [pdf, html, other]: Title: Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning

Yang Chen, Tian He, Junfeng Fu, Ling Wang, Jingcai Guo, Ting Hu, Hong Cheng

Comments: Accepted by IEEE Transactions on Multimedia

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[98] arXiv:2405.20675 (cross-list from cs.CV) [pdf, html, other]: Title: Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling

Kidist Amde Mekonnen, Nicola Dall'Asen, Paolo Rota

Comments: 7 pages, 11 figures, ELLIS Doctoral Symposium 2023 in Helsinki, Finland

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[99] arXiv:2405.20687 (cross-list from cs.CV) [pdf, html, other]: Title: Conditioning GAN Without Training Dataset

Kidist Amde Mekonnen

Comments: 5 pages, 2 figures, Part of my MSc project course, School Project Course 2022

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[100] arXiv:2405.20775 (cross-list from cs.CR) [pdf, html, other]: Title: Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Xijie Huang, Xinyuan Wang, Hantao Zhang, Yinghao Zhu, Jiawen Xi, Jingkun An, Hao Wang, Hao Liang, Chengwei Pan

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
