Multimedia

Authors and titles for June 2023

Total of 81 entries (showing 1-50)

[1] arXiv:2306.03395 [pdf, other]: Title: Computational Technologies for Fashion Recommendation: A Survey

Yujuan Ding, Zhihui Lai, P. Y. Mok, Tat-Seng Chua

Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
[2] arXiv:2306.03873 [pdf, other]: Title: Pivotuner: automatic real-time pure intonation and microtonal modulation

Dmitri Volkov

Comments: 5 pages, associated files and additional information available at this https URL

Subjects: Multimedia (cs.MM)
[3] arXiv:2306.04202 [pdf, other]: Title: Video Compression with Arbitrary Rescaling Network

Mengxi Guo, Shijie Zhao, Hao Jiang, Junlin Li, Li Zhang

Comments: Accepted as a one-page poster at the 2023 Data Compression Conference (DCC). This is the full paper.

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[4] arXiv:2306.05241 [pdf, other]: Title: Two Heads Are Better Than One: Improving Fake News Video Detection by Correlating with Neighbors

Peng Qi, Yuyang Zhao, Yufeng Shen, Wei Ji, Juan Cao, Tat-Seng Chua

Comments: To appear in ACL 2023 Findings

Subjects: Multimedia (cs.MM)
[5] arXiv:2306.06285 [pdf, other]: Title: Circular Rectification of 3D Video and Efficient Modification of 3D-HEVC

Jarosław Samelak, Marek Domański

Subjects: Multimedia (cs.MM); Image and Video Processing (eess.IV)
[6] arXiv:2306.07187 [pdf, other]: Title: Video-to-Music Recommendation using Temporal Alignment of Segments

Laure Prétet, Gaël Richard, Clément Souchier, Geoffroy Peeters

Journal-ref: IEEE Transactions on Multimedia, 18 February 2022

Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7] arXiv:2306.08089 [pdf, html, other]: Title: 360TripleView: 360-Degree Video View Management System Driven by Convergence Value of Viewing Preferences

Qian Zhou, Michael Zink, Ramesh Sitaraman, Klara Nahrstedt

Subjects: Multimedia (cs.MM)
[8] arXiv:2306.08306 [pdf, other]: Title: Towards Balanced Active Learning for Multimodal Classification

Meng Shen, Yizheng Huang, Jianxiong Yin, Heqing Zou, Deepu Rajan, Simon See

Comments: 12 pages, accepted by ACM MM 2023

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
[9] arXiv:2306.08966 [pdf, other]: Title: Training Multimedia Event Extraction With Generated Images and Captions

Zilin Du, Yunxin Li, Xu Guo, Yidan Sun, Boyang Li

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[10] arXiv:2306.09431 [pdf, other]: Title: Towards Long Form Audio-visual Video Understanding

Wenxuan Hou, Guangyao Li, Yapeng Tian, Di Hu

Subjects: Multimedia (cs.MM)
[11] arXiv:2306.09776 [pdf, other]: Title: Inspire creativity with ORIBA: Transform Artists' Original Characters into Chatbots through Large Language Model

Yuqian Sun, Xingyu Li, Ze Gao

Comments: 5 pages, 2 figures, 1 table

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[12] arXiv:2306.11341 [pdf, other]: Title: MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian

Willy Fitra Hendria

Comments: 13 pages, 5 figures, 5 tables

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[13] arXiv:2306.12790 [pdf, other]: Title: DiffWA: Diffusion Models for Watermark Attack

Xinyu Li

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[14] arXiv:2306.12829 [pdf, other]: Title: Relevance-Based Compression of Cataract Surgery Videos

Natalia Mathá, Klaus Schoeffmann, Konstantin Schekotihin, Stephanie Sarny, Doris Putzgruber-Adamitsch, Yosuf El-Shabrawi

Comments: 11 pages, 5 figures, 3 tables

Subjects: Multimedia (cs.MM)
[15] arXiv:2306.13592 [pdf, other]: Title: TACOformer: Token-channel compounded Cross Attention for Multimodal Emotion Recognition

Xinda Li

Comments: Accepted by the IJCAI 2023 AI4TS workshop

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16] arXiv:2306.14170 [pdf, other]: Title: AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction

Jiuxin Lin, Xinyu Cai, Heinrich Dinkel, Jun Chen, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Zhiyong Wu, Yujun Wang, Helen Meng

Comments: Accepted by ICASSP 2023

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2306.15401 [pdf, html, other]: Title: Explainable Multimodal Emotion Recognition

Zheng Lian, Haiyang Sun, Licai Sun, Hao Gu, Zhuofan Wen, Siyuan Zhang, Shun Chen, Mingyu Xu, Ke Xu, Kang Chen, Lan Chen, Shan Liang, Ya Li, Jiangyan Yi, Bin Liu, Jianhua Tao

Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[18] arXiv:2306.15808 [pdf, other]: Title: Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data

Kai Chieh Chang, Mark Hasegawa-Johnson, Nancy L. McElwain, Bashima Islam

Comments: Preprint for APSIPA2023

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[19] arXiv:2306.16359 [pdf, other]: Title: Mulsemedia Communication Research Challenges for Metaverse in 6G Wireless Systems

Ian F. Akyildiz, Hongzhi Guo, Rui Dai, Wolfgang Gerstacker

Journal-ref: ITU Journal on Future and Evolving Technologies, Volume 4 (2023), Issue 4, Pages 562-579

Subjects: Multimedia (cs.MM)
[20] arXiv:2306.16786 [pdf, other]: Title: All-intra rate control using low complexity video features for Versatile Video Coding

Vignesh V Menon, Anastasia Henkel, Prajit T Rajendran, Christian R. Helmrich, Adam Wieckowski, Benjamin Bross, Christian Timmerer, Detlev Marpe

Comments: Accepted at the IEEE International Conference on Image Processing (ICIP), 2023

Subjects: Multimedia (cs.MM)
[21] arXiv:2306.17498 [pdf, other]: Title: INDCOR White Paper 0: Interactive Digital Narratives (IDNs) -- A Solution to the Challenge of Representing Complex Issues

Hartmut Koenitz, Jonathan Barbara, Lissa Holloway-Attaway, Frank Nack, Mirjam Palosaari Eladhari, Agnes Bakk

Subjects: Multimedia (cs.MM)
[22] arXiv:2306.00110 (cross-list from cs.SD) [pdf, other]: Title: MuseCoco: Generating Symbolic Music from Text

Peiling Lu, Xin Xu, Chenfei Kang, Botao Yu, Chengyi Xing, Xu Tan, Jiang Bian

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[23] arXiv:2306.01016 (cross-list from cs.CL) [pdf, other]: Title: PV2TEA: Patching Visual Modality to Textual-Established Information Extraction

Hejie Cui, Rongmei Lin, Nasser Zalmout, Chenwei Zhang, Jingbo Shang, Carl Yang, Xian Li

Comments: ACL 2023 Findings

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[24] arXiv:2306.01081 (cross-list from cs.CV) [pdf, other]: Title: 4DSR-GCN: 4D Video Point Cloud Upsampling using Graph Convolutional Networks

Lorenzo Berlincioni, Stefano Berretti, Marco Bertini, Alberto Del Bimbo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[25] arXiv:2306.01304 (cross-list from cs.SD) [pdf, other]: Title: JEPOO: Highly Accurate Joint Estimation of Pitch, Onset and Offset for Music Information Retrieval

Haojie Wei, Jun Yuan, Rui Zhang, Yueguo Chen, Gang Wang

Comments: This paper has been accepted by IJCAI 2023; 11 pages, 6 figures

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[26] arXiv:2306.02623 (cross-list from cs.CV) [pdf, other]: Title: Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models

Jiabang He, Yi Hu, Lei Wang, Xing Xu, Ning Liu, Hui Liu, Heng Tao Shen

Comments: SIGIR 2023. The code and datasets for our Do-GOOD benchmark can be found at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[27] arXiv:2306.02898 (cross-list from cs.CV) [pdf, other]: Title: Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark

Shuyu Yang, Yinan Zhou, Yaxiong Wang, Yujiao Wu, Li Zhu, Zhedong Zheng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[28] arXiv:2306.03403 (cross-list from cs.CV) [pdf, html, other]: Title: SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation

Xuewei Li, Tao Wu, Zhongang Qi, Gaoang Wang, Ying Shan, Xi Li

Comments: Accepted by IJCAI 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[29] arXiv:2306.03718 (cross-list from cs.SD) [pdf, other]: Title: Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder

Shulei Ji, Xinyu Yang

Comments: Accepted by IEEE SMC 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[30] arXiv:2306.04216 (cross-list from cs.CV) [pdf, other]: Title: MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Jielin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Ding Zhao, Bo Li, Lijuan Wang

Comments: Project website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[31] arXiv:2306.04321 (cross-list from cs.AI) [pdf, other]: Title: Generative Semantic Communication: Diffusion Models Beyond Bit Recovery

Eleonora Grassucci, Sergio Barbarossa, Danilo Comminiello

Subjects: Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[32] arXiv:2306.04345 (cross-list from cs.CV) [pdf, other]: Title: An Overview of Challenges in Egocentric Text-Video Retrieval

Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim

Comments: 4 pages, CVPR 2023 Joint Ego4D & EPIC Workshop, Extended Abstract

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)
[33] arXiv:2306.04628 (cross-list from cs.SD) [pdf, other]: Title: Systematic Analysis of Music Representations from BERT

Sangjun Han, Hyeongrae Ihm, Woohyung Lim

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[34] arXiv:2306.05268 (cross-list from cs.LG) [pdf, other]: Title: Factorized Contrastive Learning: Going Beyond Multi-view Redundancy

Paul Pu Liang, Zihao Deng, Martin Ma, James Zou, Louis-Philippe Morency, Ruslan Salakhutdinov

Comments: NeurIPS 2023. Code available at: this https URL

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[35] arXiv:2306.05523 (cross-list from cs.CL) [pdf, other]: Title: FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering

Megha Chakraborty, Khushbu Pahwa, Anku Rani, Shreyas Chatterjee, Dwip Dalal, Harshit Dave, Ritvik G, Preethi Gurumurthy, Adarsh Mahor, Samahriti Mukherjee, Aditya Pakala, Ishan Paul, Janvita Reddy, Arghya Sarkar, Kinjal Sensharma, Aman Chadha, Amit P. Sheth, Amitava Das

Comments: arXiv admin note: text overlap with arXiv:2305.04329

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[36] arXiv:2306.05584 (cross-list from cs.CV) [pdf, other]: Title: Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation

Jia-Xing Zhong, Ta-Ying Cheng, Yuhang He, Kai Lu, Kaichen Zhou, Andrew Markham, Niki Trigoni

Comments: To appear at NeurIPS 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[37] arXiv:2306.05704 (cross-list from cs.CV) [pdf, other]: Title: Exploring Effective Mask Sampling Modeling for Neural Image Compression

Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[38] arXiv:2306.06284 (cross-list from cs.SD) [pdf, other]: Title: Everybody Compose: Deep Beats To Music

Conghao Shen, Violet Z. Yao, Yixin Liu

Comments: Accepted at MMSys '23

Journal-ref: Proceedings of the 14th Conference on ACM Multimedia Systems (2023)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[39] arXiv:2306.06979 (cross-list from cs.HC) [pdf, other]: Title: A Weakly Supervised Approach to Emotion-change Prediction and Improved Mood Inference

Soujanya Narayana, Ibrahim Radwan, Ravikiran Parameshwara, Iman Abbasnejad, Akshay Asthana, Ramanathan Subramanian, Roland Goecke

Comments: 9 pages, 3 figures, 6 tables, published in IEEE International Conference on Affective Computing and Intelligent Interaction

Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM)
[40] arXiv:2306.07346 (cross-list from cs.CV) [pdf, html, other]: Title: Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training

Lorenzo Baraldi, Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, Andrea Pilzer, Rita Cucchiara

Comments: Computer Vision and Image Understanding (2025)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[41] arXiv:2306.07646 (cross-list from cs.CV) [pdf, other]: Title: Enhanced Multimodal Representation Learning with Cross-modal KD

Mengxi Chen, Linyu Xing, Yu Wang, Ya Zhang

Comments: Accepted by CVPR 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[42] arXiv:2306.07678 (cross-list from cs.CV) [pdf, other]: Title: Localization of Just Noticeable Difference for Image Compression

Guangan Chen, Hanhe Lin, Oliver Wiedemann, Dietmar Saupe

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[43] arXiv:2306.07848 (cross-list from cs.CL) [pdf, html, other]: Title: GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition

Yu Pan, Yanni Hu, Yuguang Yang, Wen Fei, Jixun Yao, Heng Lu, Lei Ma, Jianjun Zhao

Comments: 5 pages

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44] arXiv:2306.07969 (cross-list from cs.CV) [pdf, other]: Title: GeneCIS: A Benchmark for General Conditional Image Similarity

Sagar Vaze, Nicolas Carion, Ishan Misra

Comments: CVPR 2023 (Highlighted Paper). Project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[45] arXiv:2306.08657 (cross-list from cs.CV) [pdf, other]: Title: EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge

Mijanur Palash, Bharat Bhargava

Comments: Emotion Recognition, Deep Learning, Multi-modal, Convolutional neural network (CNN), LSTM, Situational-Knowledge, Novelty

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[46] arXiv:2306.08730 (cross-list from eess.SP) [pdf, html, other]: Title: Over-the-Air Learning-based Geometry Point Cloud Transmission

Chenghong Bian, Yulin Shao, Deniz Gunduz

Comments: 17 pages, accepted to IEEE JSAC SI on Intelligent Communications for Real-Time Computer Vision (Comm4CV)

Subjects: Signal Processing (eess.SP); Multimedia (cs.MM)
[47] arXiv:2306.08733 (cross-list from cs.CV) [pdf, other]: Title: Continuous Learning Based Novelty Aware Emotion Recognition System

Mijanur Palash, Bharat Bhargava

Comments: Automatic Emotion Detection, Novelty, Deep Learning

Journal-ref: AAAI Spring Symposium 2022

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[48] arXiv:2306.09085 (cross-list from cs.CV) [pdf, other]: Title: COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Sihan Chen, Xingjian He, Handong Li, Xiaojie Jin, Jiashi Feng, Jing Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[49] arXiv:2306.09126 (cross-list from cs.SD) [pdf, other]: Title: STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

Comments: 27 pages, 9 figures, accepted for publication in NeurIPS 2023 Track on Datasets and Benchmarks

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[50] arXiv:2306.09382 (cross-list from cs.SD) [pdf, other]: Title: Sound Demixing Challenge 2023 Music Demixing Track Technical Report: TFC-TDF-UNet v3

Minseok Kim, Jun Hyung Lee, Soonyoung Jung

Comments: 5 pages, 4 tables

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)