Multimedia

Authors and titles for June 2023

Total of 81 entries : 1-25 26-50 51-75 76-81
[26] arXiv:2306.02623 (cross-list from cs.CV) [pdf, other]
Title: Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models
Jiabang He, Yi Hu, Lei Wang, Xing Xu, Ning Liu, Hui Liu, Heng Tao Shen
Comments: SIGIR 2023. The code and datasets for our Do-GOOD benchmark can be found at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[27] arXiv:2306.02898 (cross-list from cs.CV) [pdf, other]
Title: Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark
Shuyu Yang, Yinan Zhou, Yaxiong Wang, Yujiao Wu, Li Zhu, Zhedong Zheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[28] arXiv:2306.03403 (cross-list from cs.CV) [pdf, html, other]
Title: SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation
Xuewei Li, Tao Wu, Zhongang Qi, Gaoang Wang, Ying Shan, Xi Li
Comments: Accepted by IJCAI 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[29] arXiv:2306.03718 (cross-list from cs.SD) [pdf, other]
Title: Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder
Shulei Ji, Xinyu Yang
Comments: Accepted by IEEE SMC 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[30] arXiv:2306.04216 (cross-list from cs.CV) [pdf, other]
Title: MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Ding Zhao, Bo Li, Lijuan Wang
Comments: Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[31] arXiv:2306.04321 (cross-list from cs.AI) [pdf, other]
Title: Generative Semantic Communication: Diffusion Models Beyond Bit Recovery
Eleonora Grassucci, Sergio Barbarossa, Danilo Comminiello
Subjects: Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[32] arXiv:2306.04345 (cross-list from cs.CV) [pdf, other]
Title: An Overview of Challenges in Egocentric Text-Video Retrieval
Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim
Comments: 4 pages, CVPR 2023 Joint Ego4D&EPIC Workshop, Extended Abstract
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)
[33] arXiv:2306.04628 (cross-list from cs.SD) [pdf, other]
Title: Systematic Analysis of Music Representations from BERT
Sangjun Han, Hyeongrae Ihm, Woohyung Lim
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[34] arXiv:2306.05268 (cross-list from cs.LG) [pdf, other]
Title: Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Paul Pu Liang, Zihao Deng, Martin Ma, James Zou, Louis-Philippe Morency, Ruslan Salakhutdinov
Comments: NeurIPS 2023. Code available at: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[35] arXiv:2306.05523 (cross-list from cs.CL) [pdf, other]
Title: FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering
Megha Chakraborty, Khushbu Pahwa, Anku Rani, Shreyas Chatterjee, Dwip Dalal, Harshit Dave, Ritvik G, Preethi Gurumurthy, Adarsh Mahor, Samahriti Mukherjee, Aditya Pakala, Ishan Paul, Janvita Reddy, Arghya Sarkar, Kinjal Sensharma, Aman Chadha, Amit P. Sheth, Amitava Das
Comments: arXiv admin note: text overlap with arXiv:2305.04329
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[36] arXiv:2306.05584 (cross-list from cs.CV) [pdf, other]
Title: Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation
Jia-Xing Zhong, Ta-Ying Cheng, Yuhang He, Kai Lu, Kaichen Zhou, Andrew Markham, Niki Trigoni
Comments: To appear at NeurIPS 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[37] arXiv:2306.05704 (cross-list from cs.CV) [pdf, other]
Title: Exploring Effective Mask Sampling Modeling for Neural Image Compression
Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[38] arXiv:2306.06284 (cross-list from cs.SD) [pdf, other]
Title: Everybody Compose: Deep Beats To Music
Conghao Shen, Violet Z. Yao, Yixin Liu
Comments: Accepted MMSys '23
Journal-ref: Proceedings of the 14th Conference on ACM Multimedia Systems (2023)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[39] arXiv:2306.06979 (cross-list from cs.HC) [pdf, other]
Title: A Weakly Supervised Approach to Emotion-change Prediction and Improved Mood Inference
Soujanya Narayana, Ibrahim Radwan, Ravikiran Parameshwara, Iman Abbasnejad, Akshay Asthana, Ramanathan Subramanian, Roland Goecke
Comments: 9 pages, 3 figures, 6 tables, published in IEEE International Conference on Affective Computing and Intelligent Interaction
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM)
[40] arXiv:2306.07346 (cross-list from cs.CV) [pdf, html, other]
Title: Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
Lorenzo Baraldi, Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, Andrea Pilzer, Rita Cucchiara
Comments: Computer Vision and Image Understanding (2025)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[41] arXiv:2306.07646 (cross-list from cs.CV) [pdf, other]
Title: Enhanced Multimodal Representation Learning with Cross-modal KD
Mengxi Chen, Linyu Xing, Yu Wang, Ya Zhang
Comments: Accepted by CVPR2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[42] arXiv:2306.07678 (cross-list from cs.CV) [pdf, other]
Title: Localization of Just Noticeable Difference for Image Compression
Guangan Chen, Hanhe Lin, Oliver Wiedemann, Dietmar Saupe
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[43] arXiv:2306.07848 (cross-list from cs.CL) [pdf, html, other]
Title: GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition
Yu Pan, Yanni Hu, Yuguang Yang, Wen Fei, Jixun Yao, Heng Lu, Lei Ma, Jianjun Zhao
Comments: 5 pages
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44] arXiv:2306.07969 (cross-list from cs.CV) [pdf, other]
Title: GeneCIS: A Benchmark for General Conditional Image Similarity
Sagar Vaze, Nicolas Carion, Ishan Misra
Comments: CVPR 2023 (Highlighted Paper). Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[45] arXiv:2306.08657 (cross-list from cs.CV) [pdf, other]
Title: EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge
Mijanur Palash, Bharat Bhargava
Comments: Emotion Recognition, Deep Learning, Multi-modal, Convolutional neural network (CNN), LSTM, Situational-Knowledge, Novelty
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[46] arXiv:2306.08730 (cross-list from eess.SP) [pdf, html, other]
Title: Over-the-Air Learning-based Geometry Point Cloud Transmission
Chenghong Bian, Yulin Shao, Deniz Gunduz
Comments: 17 pages, accepted to IEEE JSAC SI on Intelligent Communications for Real-Time Computer Vision (Comm4CV)
Subjects: Signal Processing (eess.SP); Multimedia (cs.MM)
[47] arXiv:2306.08733 (cross-list from cs.CV) [pdf, other]
Title: Continuous Learning Based Novelty Aware Emotion Recognition System
Mijanur Palash, Bharat Bhargava
Comments: Automatic Emotion Detection, Novelty, Deep Learning
Journal-ref: AAAI Spring Symposium 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[48] arXiv:2306.09085 (cross-list from cs.CV) [pdf, other]
Title: COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Sihan Chen, Xingjian He, Handong Li, Xiaojie Jin, Jiashi Feng, Jing Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[49] arXiv:2306.09126 (cross-list from cs.SD) [pdf, other]
Title: STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji
Comments: 27 pages, 9 figures, accepted for publication in NeurIPS 2023 Track on Datasets and Benchmarks
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[50] arXiv:2306.09382 (cross-list from cs.SD) [pdf, other]
Title: Sound Demixing Challenge 2023 Music Demixing Track Technical Report: TFC-TDF-UNet v3
Minseok Kim, Jun Hyung Lee, Soonyoung Jung
Comments: 5 pages, 4 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)