Multimedia

Authors and titles for April 2024

Total of 123 entries : 1-25 26-50 51-75 76-100 ... 101-123
[1] arXiv:2404.04545 [pdf, html, other]
Title: TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis
Weize Quan, Yunfei Feng, Ming Zhou, Yunzhen Zhao, Tong Wang, Dong-Ming Yan
Subjects: Multimedia (cs.MM); Computation and Language (cs.CL)
[2] arXiv:2404.05522 [pdf, html, other]
Title: 3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering
Qingyuan Zhou, Weidong Yang, Ben Fei, Jingyi Xu, Rui Zhang, Keyi Liu, Yeqi Luo, Ying He
Comments: Accepted at AAAI-25
Subjects: Multimedia (cs.MM)
[3] arXiv:2404.07484 [pdf, other]
Title: Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
Yuan Zhang, Xiaomei Tao, Hanxu Ai, Tao Chen, Yanling Gan
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[4] arXiv:2404.07872 [pdf, html, other]
Title: Video Compression Beyond VVC: Quantitative Analysis of Intra Coding Tools in Enhanced Compression Model (ECM)
Mohsen Abdoli, Ramin G. Youvalari, Karam Naser, Kevin Reuzé, Fabrice Le Léannec
Comments: Submitted to IEEE ICIP 2024
Subjects: Multimedia (cs.MM); Image and Video Processing (eess.IV)
[5] arXiv:2404.08264 [pdf, html, other]
Title: Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis
Masahiro Yasuda, Noboru Harada, Yasunori Ohishi, Shoichiro Saito, Akira Nakayama, Nobutaka Ono
Comments: 13 pages, 7 figures, under review
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[6] arXiv:2404.09029 [pdf, other]
Title: A Parametric Rate-Distortion Model for Video Transcoding
Maedeh Jamali, Nader Karimi, Shadrokh Samavi, Shahram Shirani
Subjects: Multimedia (cs.MM); Information Theory (cs.IT); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[7] arXiv:2404.09245 [pdf, html, other]
Title: Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
Haosong Peng, Wei Feng, Hao Li, Yufeng Zhan, Ren Jin, Yuanqing Xia
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[8] arXiv:2404.10528 [pdf, html, other]
Title: AllTheDocks road safety dataset: A cyclist's perspective and experience
Chia-Yen Chiang, Ruikang Zhong, Jennifer Ding, Joseph Wood, Stephen Bee, Mona Jaber
Subjects: Multimedia (cs.MM)
[9] arXiv:2404.10702 [pdf, html, other]
Title: Retrieval Augmented Verification for Zero-Shot Detection of Multimodal Disinformation
Arka Ujjal Dey, Artemis Llabrés, Ernest Valveny, Dimosthenis Karatzas
Subjects: Multimedia (cs.MM)
[10] arXiv:2404.11938 [pdf, html, other]
Title: HyDiscGAN: A Hybrid Distributed cGAN for Audio-Visual Privacy Preservation in Multimodal Sentiment Analysis
Zhuojia Wu, Qi Zhang, Duoqian Miao, Kun Yi, Wei Fan, Liang Hu
Comments: 13 pages, IJCAI-2024
Subjects: Multimedia (cs.MM); Distributed, Parallel, and Cluster Computing (cs.DC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:2404.12169 [pdf, html, other]
Title: Shotit: compute-efficient image-to-video search engine for the cloud
Leslie Wong
Comments: Submitted to ACM ICMR 2024
Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
[12] arXiv:2404.12903 [pdf, html, other]
Title: ConCLVD: Controllable Chinese Landscape Video Generation via Diffusion Model
Dingming Liu, Shaowei Li, Ruoyan Zhou, Lili Liang, Yongguan Hong, Fei Chao, Rongrong Ji
Subjects: Multimedia (cs.MM)
[13] arXiv:2404.13134 [pdf, html, other]
Title: Deep Learning-based Text-in-Image Watermarking
Bishwa Karki, Chun-Hua Tsai, Pei-Chi Huang, Xin Zhong
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[14] arXiv:2404.13619 [pdf, html, other]
Title: Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering
Ben Fei, Yixuan Li, Weidong Yang, Lipeng Ma, Ying He
Subjects: Multimedia (cs.MM)
[15] arXiv:2404.13640 [pdf, html, other]
Title: Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer
Kepeng Xu, Li Xu, Gang He, Wenxin Yu, Yunsong Li
Comments: 9 pages
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[16] arXiv:2404.13792 [pdf, html, other]
Title: Counterfactual Reasoning Using Predicted Latent Personality Dimensions for Optimizing Persuasion Outcome
Donghuo Zeng, Roberto S. Legaspi, Yuewen Sun, Xinshuai Dong, Kazushi Ikeda, Peter Spirtes, Kun Zhang
Comments: 14 pages, 10 figures, Accepted by Persuasive Technology 2024
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[17] arXiv:2404.13993 [pdf, html, other]
Title: Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion
Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui
Comments: Accepted to ACM Multimedia 2024. Project page: this https URL ; Github repo: this https URL
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[18] arXiv:2404.14573 [pdf, html, other]
Title: Tile-Weighted Rate-Distortion Optimized Packet Scheduling for 360$^\circ$ VR Video Streaming
Haopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik
Comments: Accepted by IEEE Intelligent Systems
Subjects: Multimedia (cs.MM)
[19] arXiv:2404.14687 [pdf, html, other]
Title: Pegasus-v1 Technical Report
Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, Genie Heo, Henry Choi, Jenna Kang, Kevin Han, Noah Seo, Sunny Nguyen, Ryan Won, Yeonhoo Park, Anthony Giuliani, Dave Chung, Hans Yoon, James Le, Jenny Ahn, June Lee, Maninder Saini, Meredith Sanders, Soyoung Lee, Sue Kim, Travis Couture
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[20] arXiv:2404.14755 [pdf, html, other]
Title: SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models
Bo Lin, Yingjing Xu, Xuanwen Bao, Zhou Zhao, Zhouyang Wang, Jianwei Yin
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[21] arXiv:2404.14934 [pdf, html, other]
Title: G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition
Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma
Comments: 18 pages, 29 figures
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[22] arXiv:2404.15875 [pdf, html, other]
Title: Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval
Haokun Wen, Xuemeng Song, Xiaolin Chen, Yinwei Wei, Liqiang Nie, Tat-Seng Chua
Comments: ACM SIGIR 2024
Subjects: Multimedia (cs.MM)
[23] arXiv:2404.16305 [pdf, html, other]
Title: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Gehui Chen, Guan'an Wang, Xiaowen Huang, Jitao Sang
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2404.17151 [pdf, html, other]
Title: MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection
Chengpei Xu, Wenjing Jia, Ruomei Wang, Xiaonan Luo, Xiangjian He
Comments: Accepted by Transactions on Multimedia
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[25] arXiv:2404.18162 [pdf, html, other]
Title: fMRI Exploration of Visual Quality Assessment
Yiming Zhang, Ying Hu, Xiongkuo Min, Yan Zhou, Guangtao Zhai
Subjects: Multimedia (cs.MM); Neurons and Cognition (q-bio.NC)