Multimedia

Authors and titles for October 2022

Total of 98 entries
[1] arXiv:2210.00330 [pdf, other]
Title: Social VR and multi-party holographic communications: Opportunities, Challenges and Impact in the Education and Training Sectors
Mario Montagud, Gianluca Cernigliaro, Miguel Arevalillo-Herráez, Miguel García-Pineda, Jaume Segura-Garcia, Sergi Fernández
Subjects: Multimedia (cs.MM)
[2] arXiv:2210.00821 [pdf, other]
Title: A high accuracy and low complexity quality control method for image compression
Xiao Yan, Zhangxin Gong, Wenqiang Wang, Xiaoyang Zeng, Yibo Fan
Subjects: Multimedia (cs.MM)
[3] arXiv:2210.01652 [pdf, other]
Title: A Conditional-Probability-Distribution Model for Bandwidth Estimation with Application in Live Video Streaming
Weijia Zheng
Comments: 5 pages, 6 figures
Subjects: Multimedia (cs.MM)
[4] arXiv:2210.02206 [pdf, other]
Title: Improving Visual-Semantic Embedding with Adaptive Pooling and Optimization Objective
Zijian Zhang, Chang Shu, Ya Xiao, Yuan Shen, Di Zhu, Jing Xiao, Youxin Chen, Jey Han Lau, Qian Zhang, Zheng Lu
Subjects: Multimedia (cs.MM)
[5] arXiv:2210.04677 [pdf, other]
Title: UAV Placement for Real-time Video Acquisition: A Tradeoff between Resolution and Delay
Tang Xiao-Wei, Huang Xin-Lin
Comments: Submitted to IEEE for possible publication. arXiv admin note: text overlap with arXiv:2006.14438 by other authors
Subjects: Multimedia (cs.MM)
[6] arXiv:2210.06697 [pdf, other]
Title: Size Does Matter: An Experimental Study of Anxiety in Virtual Reality
Junyi Shen, Itaru Kitahara, Shinichi Koyama, Qiaoge Li
Comments: to appear in VRST 2022
Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[7] arXiv:2210.06794 [pdf, other]
Title: Towards Holographic Video Communications: A Promising AI-driven Solution
Yakun Huang, Yuanwei Zhu, Xiuquan Qiao, Xiang Su, Schahram Dustdar, Ping Zhang
Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[8] arXiv:2210.06802 [pdf, other]
Title: Multi-Player Immersive Communications and Interactions in Metaverse: Challenges, Architecture, and Future Directions
Yakun Huang, Xiuquan Qiao, Haowen Wang, Xiang Su, Schahram Dustdar, Ping Zhang
Subjects: Multimedia (cs.MM)
[9] arXiv:2210.06808 [pdf, other]
Title: ISCom: Interest-aware Semantic Communication Scheme for Point Cloud Video Streaming
Yakun Huang, Boyuan Bai, Yuanwei Zhu, Xiuquan Qiao, Xiang Su, Ping Zhang
Subjects: Multimedia (cs.MM)
[10] arXiv:2210.07839 [pdf, other]
Title: Contrastive Audio-Visual Masked Autoencoder
Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass
Comments: Accepted at ICLR 2023 as a notable top 25% paper. Code and pretrained models are at this https URL
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:2210.09651 [pdf, other]
Title: Comparison of Popular Video Conferencing Apps Using Client-side Measurements on Different Backhaul Networks
Rohan Kumar, Dhruv Nagpal, Vinayak Naik, Dipanjan Chakraborty
Journal-ref: Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing 2022
Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[12] arXiv:2210.09946 [pdf, other]
Title: MMGA: Multimodal Learning with Graph Alignment
Xuan Yang, Quanjin Tao, Xiao Feng, Donghong Cai, Xiang Ren, Yang Yang
Comments: Please contact xuany@zju.this http URL for the dataset
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[13] arXiv:2210.10330 [pdf, other]
Title: Content-adaptive Encoder Preset Prediction for Adaptive Live Streaming
Vignesh V Menon, Hadi Amirpour, Prajit T Rajendran, Mohammad Ghanbari, Christian Timmerer
Comments: Accepted in PCS2022
Subjects: Multimedia (cs.MM)
[14] arXiv:2210.10972 [pdf, other]
Title: A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition
Vijay John, Yasutomo Kawanishi
Comments: Accepted for ACM Multimedia Asia, 2022
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[15] arXiv:2210.12201 [pdf, other]
Title: A computational analysis on the relationship between melodic originality and thematic fame in classical music from the Romantic period
Hudson Griffith
Subjects: Multimedia (cs.MM)
[16] arXiv:2210.13827 [pdf, other]
Title: End-to-end Transformer for Compressed Video Quality Enhancement
Li Yu, Wenshuai Chang, Shiyu Wu, Moncef Gabbouj
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[17] arXiv:2210.13890 [pdf, other]
Title: Fast multi-encoding to reduce the cost of video streaming
Hadi Amirpour, Vignesh V Menon, Ekrem Çetinkaya, Adithyan Ilangovan, Christian Feldmann, Martin Smole, Christian Timmerer
Comments: Accepted in IBC2022
Subjects: Multimedia (cs.MM)
[18] arXiv:2210.14349 [pdf, other]
Title: A DirectX-Based DICOM Viewer for Multi-User Surgical Planning in Augmented Reality
Menghe Zhang, Weichen Liu, Nadir Weibel, Jurgen Schulze
Journal-ref: ISVC 2022 symposium proceeding, will be on Lecture Notes in Computer Science (LNCS) series
Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[19] arXiv:2210.15300 [pdf, other]
Title: Leveraging Computer Vision Application in Visual Arts: A Case Study on the Use of Residual Neural Network to Classify and Analyze Baroque Paintings
Daniel Kvak
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[20] arXiv:2210.15824 [pdf, other]
Title: Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis
Peipei Liu, Xin Zheng, Hong Li, Jie Liu, Yimo Ren, Hongsong Zhu, Limin Sun
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[21] arXiv:2210.16470 [pdf, other]
Title: Improving Audio Captioning Using Semantic Similarity Metrics
Rehana Mahfuz, Yinyi Guo, Erik Visser
Comments: Accepted at ICASSP 2023
Subjects: Multimedia (cs.MM)
[22] arXiv:2210.16639 [pdf, other]
Title: GRACE: Loss-Resilient Real-Time Video Communication Using Data-Scalable Autoencoder
Yihua Cheng, Anton Arapin, Ziyi Zhang, Qizheng Zhang, Hanchen Li, Nick Feamster, Junchen Jiang
Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[23] arXiv:2210.00312 (cross-list from cs.CL) [pdf, other]
Title: Multimodal Analogical Reasoning over Knowledge Graphs
Ningyu Zhang, Lei Li, Xiang Chen, Xiaozhuan Liang, Shumin Deng, Huajun Chen
Comments: Accepted by ICLR 2023. The project website is this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[24] arXiv:2210.00378 (cross-list from eess.AS) [pdf, other]
Title: Optimized Decoders for Mixed-Order Ambisonics
Aaron Heller (1), Eric Benjamin (2), Fernando Lopez-Lezcano (3) ((1) Artificial Intelligence Center, SRI International, (2) Surround Research, (3) Center for Computer Research in Music and Acoustics (CCRMA), Stanford University)
Comments: 9 pages, 10 figures
Journal-ref: Paper 10507, 150th Audio Engineering Society Convention, May 2021
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[25] arXiv:2210.00434 (cross-list from eess.AS) [pdf, other]
Title: Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings
Zhihuan Kuang, Shi Zong, Jianbing Zhang, Jiajun Chen, Hongfu Liu
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)
[26] arXiv:2210.00753 (cross-list from cs.SD) [pdf, other]
Title: Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection
Xuanjun Chen, Haibin Wu, Helen Meng, Hung-yi Lee, Jyh-Shing Roger Jang
Comments: Accepted by SLT 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[27] arXiv:2210.00757 (cross-list from cs.CV) [pdf, other]
Title: Fully Transformer Network for Change Detection of Remote Sensing Images
Tianyu Yan, Zifu Wan, Pingping Zhang
Comments: 18 pages, 6 figures and 5 tables. This work will appear in ACCV2022 as a poster paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[28] arXiv:2210.01402 (cross-list from cs.CV) [pdf, other]
Title: Streaming Video Analytics On The Edge With Asynchronous Cloud Support
Anurag Ghosh, Srinivasan Iyengar, Stephen Lee, Anuj Rathore, Venkat N Padmanabhan
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Multimedia (cs.MM)
[29] arXiv:2210.01719 (cross-list from cs.SD) [pdf, html, other]
Title: Learning Temporal Resolution in Spectrogram for Audio Classification
Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley
Comments: Accepted by the 38th Annual AAAI Conference on Artificial Intelligence
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[30] arXiv:2210.02227 (cross-list from cs.CV) [pdf, other]
Title: Comprint: Image Forgery Detection and Localization using Compression Fingerprints
Hannes Mareen, Dante Vanden Bussche, Fabrizio Guillaro, Davide Cozzolino, Glenn Van Wallendael, Peter Lambert, Luisa Verdoliva
Comments: Presented at the Workshop on MultiMedia FORensics in the WILD 2022, held in conjunction with the International Conference on Pattern Recognition (ICPR) 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[31] arXiv:2210.02257 (cross-list from cs.CR) [pdf, other]
Title: Hiding Images in Deep Probabilistic Models
Haoyu Chen, Linqi Song, Zhenxing Qian, Xinpeng Zhang, Kede Ma
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[32] arXiv:2210.02324 (cross-list from cs.CV) [pdf, other]
Title: Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images
Yafei Yang, Bo Yang
Comments: NeurIPS 2022. Code and data are available at project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Robotics (cs.RO)
[33] arXiv:2210.02391 (cross-list from cs.CV) [pdf, other]
Title: Geometry Driven Progressive Warping for One-Shot Face Animation
Yatao Zhong, Faezeh Amjadi, Ilya Zharkov
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[34] arXiv:2210.02437 (cross-list from cs.SD) [pdf, other]
Title: ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild
Xuechen Liu, Xin Wang, Md Sahidullah, Jose Patino, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas Evans, Andreas Nautsch, Kong Aik Lee
Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[35] arXiv:2210.02946 (cross-list from cs.IR) [pdf, other]
Title: VLSNR:Vision-Linguistics Coordination Time Sequence-aware News Recommendation
Songhao Han (1), Wei Huang (1), Xiaotian Luan (2) ((1) Beihang University, (2) Peking University)
Comments: 10 pages, 5 figures
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[36] arXiv:2210.03382 (cross-list from cs.CV) [pdf, other]
Title: Temporal Feature Alignment in Contrastive Self-Supervised Learning for Human Activity Recognition
Bulat Khaertdinov, Stylianos Asteriadis
Comments: Accepted to IJCB 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[37] arXiv:2210.03625 (cross-list from cs.CL) [pdf, other]
Title: C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass
Comments: Accepted at ICASSP 2023. The code, models, and dataset are available at this https URL
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[38] arXiv:2210.03799 (cross-list from cs.SD) [pdf, other]
Title: Supervised and Unsupervised Learning of Audio Representations for Music Understanding
Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, Fabien Gouyon, Andreas F. Ehmann
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[39] arXiv:2210.04112 (cross-list from cs.CV) [pdf, other]
Title: Leveraging progressive model and overfitting for efficient learned image compression
Honglei Zhang, Francesco Cricri, Hamed Rezazadegan Tavakoli, Emre Aksu, Miska M. Hannuksela
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[40] arXiv:2210.04135 (cross-list from cs.CV) [pdf, other]
Title: VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa
Comments: Published in TMLR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[41] arXiv:2210.04183 (cross-list from cs.CV) [pdf, other]
Title: MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Zijia Zhao, Longteng Guo, Xingjian He, Shuai Shao, Zehuan Yuan, Jing Liu
Comments: SIGIR 2023, 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[42] arXiv:2210.04216 (cross-list from cs.CV) [pdf, other]
Title: AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation
Hongxin Lin, Yunwei Chiu, Peiyuan Wu
Comments: ICASSP 2023 Accepted Paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[43] arXiv:2210.04671 (cross-list from cs.CV) [pdf, other]
Title: TCDM: Transformational Complexity Based Distortion Metric for Perceptual Point Cloud Quality Assessment
Yujie Zhang, Qi Yang, Yifei Zhou, Xiaozhong Xu, Le Yang, Yiling Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[44] arXiv:2210.05335 (cross-list from cs.CV) [pdf, other]
Title: MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang
Comments: CVPR 2023 Main Track Long Paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[45] arXiv:2210.05357 (cross-list from cs.CV) [pdf, other]
Title: Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment
Haoning Wu, Chaofeng Chen, Liang Liao, Jingwen Hou, Wenxiu Sun, Qiong Yan, Jinwei Gu, Weisi Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[46] arXiv:2210.05766 (cross-list from cs.CV) [pdf, other]
Title: Match Cutting: Finding Cuts with Smooth Visual Transitions
Boris Chen, Amir Ziai, Rebecca Tucker, Yuchen Xie
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[47] arXiv:2210.05916 (cross-list from cs.CL) [pdf, other]
Title: Hate-CLIPper: Multimodal Hateful Meme Classification based on Cross-modal Interaction of CLIP Features
Gokul Karthik Kumar, Karthik Nandakumar
Comments: Accepted at EMNLP 2022 Workshop on NLP for Positive Impact
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[48] arXiv:2210.06366 (cross-list from cs.CV) [pdf, other]
Title: A Generalist Framework for Panoptic Segmentation of Images and Videos
Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J. Fleet
Comments: ICCV'23. Code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[49] arXiv:2210.06756 (cross-list from cs.CV) [pdf, other]
Title: Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features
Changde Du, Kaicheng Fu, Jinpeng Li, Huiguang He
Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)
[50] arXiv:2210.06790 (cross-list from cs.RO) [pdf, other]
Title: Deep Gesture Generation for Social Robots Using Type-Specific Libraries
Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi
Subjects: Robotics (cs.RO); Multimedia (cs.MM)
[51] arXiv:2210.07055 (cross-list from cs.CV) [pdf, other]
Title: Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman
Comments: Accepted as a spotlight presentation for the BMVC 2022. Code: this https URL Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52] arXiv:2210.07594 (cross-list from cs.CV) [pdf, other]
Title: See Blue Sky: Deep Image Dehaze Using Paired and Unpaired Training Images
Xiaoyan Zhang, Gaoyang Tang, Yingying Zhu, Qi Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[53] arXiv:2210.08164 (cross-list from cs.CV) [pdf, other]
Title: Linear Video Transformer with Feature Fixation
Kaiyue Lu, Zexiang Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, Dong Li, Xuyang Shen, Hui Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[54] arXiv:2210.08481 (cross-list from cs.CV) [pdf, other]
Title: TLDW: Extreme Multimodal Summarisation of News Videos
Peggy Tang, Kun Hu, Lei Zhang, Jiebo Luo, Zhiyong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[55] arXiv:2210.08535 (cross-list from cs.CV) [pdf, other]
Title: Realistic, Animatable Human Reconstructions for Virtual Fit-On
Gayal Kuruppu, Bumuthu Dilshan, Shehan Samarasinghe, Nipuna Madhushan, Ranga Rodrigo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)
[56] arXiv:2210.08737 (cross-list from cs.CV) [pdf, other]
Title: Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows
Anyi Rao, Xuekun Jiang, Sichen Wang, Yuwei Guo, Zihao Liu, Bo Dai, Long Pang, Xiaoyu Wu, Dahua Lin, Libiao Jin
Comments: Extended Abstract of ECCV 2022 Workshop on AI for Creative Video Editing and Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[57] arXiv:2210.08759 (cross-list from cs.CL) [pdf, other]
Title: Towards Relation Extraction From Speech
Tongtong Wu, Guitao Wang, Jinming Zhao, Zhaoran Liu, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari
Comments: Accepted by EMNLP 2022
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[58] arXiv:2210.08812 (cross-list from cs.CV) [pdf, other]
Title: ITSRN++: Stronger and Better Implicit Transformer Network for Continuous Screen Content Image Super-Resolution
Sheng Shen, Huanjing Yue, Jingyu Yang, Kun Li
Comments: 14 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[59] arXiv:2210.08821 (cross-list from cs.CL) [pdf, other]
Title: MoSE: Modality Split and Ensemble for Multimodal Knowledge Graph Completion
Yu Zhao, Xiangrui Cai, Yike Wu, Haiwei Zhang, Ying Zhang, Guoqing Zhao, Ning Jiang
Comments: Accepted by EMNLP 2022
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[60] arXiv:2210.09052 (cross-list from cs.CV) [pdf, other]
Title: Digital Image Forensics using Deep Learning
Akash Nagaraj, Mukund Sood, Vivek Kapoor, Yash Mathur, Bishesh Sinha
Comments: This paper was written in 2018 as a part of our submission to the 2018 IEEE Signal Processing Cup: Forensic Camera Model Identification Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[61] arXiv:2210.09911 (cross-list from cs.HC) [pdf, other]
Title: Leveraging Cluster Analysis to Understand Educational Game Player Experiences and Support Design
Luke Swanson, David Gagnon, Jennifer Scianna, John McCloskey, Nicholas Spevacek, Stefan Slater, Erik Harpstead
Comments: Presented at Games, Learning & Society (GLS) 2022 Conference. Irving, CA
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM)
[62] arXiv:2210.10349 (cross-list from cs.SD) [pdf, other]
Title: Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation
Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu
Comments: Accepted by the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[63] arXiv:2210.10652 (cross-list from cs.IR) [pdf, other]
Title: Multi-Modal Recommendation System with Auxiliary Information
Mufhumudzi Muthivhi, Terence L. van Zyl, Hairong Wang
Comments: 15 pages, 3 figures, 3 tables, to be published in the SACAIR CCIS Springer proceedings volume
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
[64] arXiv:2210.11065 (cross-list from cs.CV) [pdf, other]
Title: MovieCLIP: Visual Scene Recognition in Movies
Digbalay Bose, Rajat Hebbar, Krishna Somandepalli, Haoyang Zhang, Yin Cui, Kree Cole-McLaughlin, Huisheng Wang, Shrikanth Narayanan
Comments: Accepted to 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2023). Project website with supplemental material: this https URL. Revised version with updated author affiliations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[65] arXiv:2210.11166 (cross-list from cs.NI) [pdf, other]
Title: Dude, where's my NFT? Distributed Infrastructures for Digital Art
Leonhard Balduf, Martin Florian, Björn Scheuermann
Comments: To be presented at the DICG Workshop 2022
Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM)
[66] arXiv:2210.11319 (cross-list from cs.CV) [pdf, other]
Title: Image-Text Retrieval with Binary and Continuous Label Supervision
Zheng Li, Caili Guo, Zerun Feng, Jenq-Neng Hwang, Ying Jin, Yufeng Zhang
Comments: 13 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[67] arXiv:2210.11549 (cross-list from cs.CV) [pdf, other]
Title: H4VDM: H.264 Video Device Matching
Ziyue Xiang, Paolo Bestagini, Stefano Tubaro, Edward J. Delp
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[68] arXiv:2210.11603 (cross-list from cs.HC) [pdf, other]
Title: 3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows
Vivian Liu, Jo Vermeulen, George Fitzmaurice, Justin Matejka
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Multimedia (cs.MM)
[69] arXiv:2210.12309 (cross-list from cs.CL) [pdf, other]
Title: Learning a Grammar Inducer from Massive Uncurated Instructional Videos
Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, Jiebo Luo
Comments: Accepted by EMNLP 2022
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[70] arXiv:2210.12430 (cross-list from cs.SD) [pdf, other]
Title: Speech Emotion Recognition via an Attentive Time-Frequency Neural Network
Cheng Lu, Wenming Zheng, Hailun Lian, Yuan Zong, Chuangao Tang, Sunan Li, Yan Zhao
Comments: This paper has been accepted as a regular paper on IEEE Transactions on Computational Social Systems
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[71] arXiv:2210.12444 (cross-list from cs.CV) [pdf, other]
Title: Weakly-Supervised Temporal Article Grounding
Long Chen, Yulei Niu, Brian Chen, Xudong Lin, Guangxing Han, Christopher Thomas, Hammad Ayyubi, Heng Ji, Shih-Fu Chang
Comments: EMNLP 2022, this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[72] arXiv:2210.12705 (cross-list from cs.CV) [pdf, other]
Title: Few-Shot Meta Learning for Recognizing Facial Phenotypes of Genetic Disorders
Ömer Sümer, Fabio Hellmann, Alexander Hustinx, Tzung-Chien Hsieh, Elisabeth André, Peter Krawitz
Comments: This paper is accepted for publication at MIE 2023 Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[73] arXiv:2210.14163 (cross-list from cs.CV) [pdf, other]
Title: Multi-Granularity Cross-Modality Representation Learning for Named Entity Recognition on Social Media
Peipei Liu, Gaosheng Wang, Hong Li, Jie Liu, Yimo Ren, Hongsong Zhu, Limin Sun
Comments: We have reconducted experiments of the paper, but found that there were fatal errors in our datasets leading to the wrong results and analyses. Therefore, we have to withdraw the paper to ensure the authenticity of science. We are very sorry
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[74] arXiv:2210.14321 (cross-list from eess.AS) [pdf, other]
Title: Artificial ASMR: A Cyber-Psychological Approach
Zexin Fang, Bin Han, C. Clark Cao, Hans. D. Schotten
Comments: Accepted by IEEE MLSP 2023
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Signal Processing (eess.SP)
[75] arXiv:2210.14461 (cross-list from cs.CV) [pdf, other]
Title: TPFNet: A Novel Text In-painting Transformer for Text Removal
Onkar Susladkar, Dhruv Makwana, Gayatri Deshmukh, Sparsh Mittal, Sai Chandra Teja R, Rekha Singhal
Comments: 10 pages, 5 figures, 5 tables, NeurIPS Proceedings
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[76] arXiv:2210.14632 (cross-list from cs.CR) [pdf, other]
Title: Cover Reproducible Steganography via Deep Generative Models
Kejiang Chen, Hang Zhou, Yaofei Wang, Menghan Li, Weiming Zhang, Nenghai Yu
Comments: Accepted by IEEE TDSC
Subjects: Cryptography and Security (cs.CR); Multimedia (cs.MM)
[77] arXiv:2210.14714 (cross-list from cs.CV) [pdf, other]
Title: TAMFormer: Multi-Modal Transformer with Learned Attention Mask for Early Intent Prediction
Nada Osman, Guglielmo Camporese, Lamberto Ballan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[78] arXiv:2210.14889 (cross-list from cs.CR) [pdf, other]
Title: Perfectly Secure Steganography Using Minimum Entropy Coupling
Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[79] arXiv:2210.15198 (cross-list from cs.LG) [pdf, other]
Title: Watermarking for Out-of-distribution Detection
Qizhou Wang, Feng Liu, Yonggang Zhang, Jing Zhang, Chen Gong, Tongliang Liu, Bo Han
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM)
[80] arXiv:2210.15230 (cross-list from cs.CL) [pdf, other]
Title: How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions?
Hritik Bansal, Da Yin, Masoud Monajatipoor, Kai-Wei Chang
Comments: 13 pages, 8 figures, 6 tables. Accepted as Oral Presentation at EMNLP 2022
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[81] arXiv:2210.15247 (cross-list from cs.LG) [pdf, other]
Title: A few-shot learning approach with domain adaptation for personalized real-life stress detection in close relationships
Kexin Feng, Jacqueline B. Duong, Kayla E. Carta, Sierra Walters, Gayla Margolin, Adela C. Timmons, Theodora Chaspari
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM)
[82] arXiv:2210.15368 (cross-list from cs.SD) [pdf, other]
Title: A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech
Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Comments: Accepted by Interspeech 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[83] arXiv:2210.15370 (cross-list from cs.SD) [pdf, other]
Title: CasNet: Investigating Channel Robustness for Speech Separation
Fan-Lin Wang, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Comments: Submitted to ICASSP 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[84] arXiv:2210.15511 (cross-list from cs.CV) [pdf, other]
Title: ProContEXT: Exploring Progressive Context Transformer for Tracking
Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie
Comments: Accepted at ICASSP 2023, source code is at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[85] arXiv:2210.15518 (cross-list from cs.CV) [pdf, other]
Title: LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception
Chenyang Li, Zhi-Qi Cheng, Jun-Yan He, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie
Comments: Accepted at ICASSP 2023, source code is at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[86] arXiv:2210.15638 (cross-list from cs.SD) [pdf, other]
Title: LyricJam Sonic: A Generative System for Real-Time Composition and Musical Improvisation
Olga Vechtomova, Gaurav Sahu
Comments: 15 pages, 9 figures, 2 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[87] arXiv:2210.15750 (cross-list from cs.SD) [pdf, other]
Title: One-Shot Acoustic Matching Of Audio Signals -- Learning to Hear Music In Any Room/ Concert Hall
Prateek Verma, Chris Chafe, Jonathan Berger
Comments: 5 pages, 1 figure; fixed up broken url; added acknowledgments
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[88] arXiv:2210.15796 (cross-list from cs.CV) [pdf, other]
Title: Layout Aware Inpainting for Automated Furniture Removal in Indoor Scenes
Prakhar Kulshreshtha, Konstantinos-Nektarios Lianos, Brian Pugh, Salma Jiddi
Comments: 6 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[89] arXiv:2210.15828 (cross-list from cs.SD) [pdf, other]
Title: On the Role of Visual Context in Enriching Music Representations
Kleanthis Avramidis, Shanti Stewart, Shrikanth Narayanan
Comments: 5 pages, 4 figures, 1 table
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[90] arXiv:2210.15872 (cross-list from cs.CV) [pdf, other]
Title: Exploring Spatial-Temporal Features for Deepfake Detection and Localization
Wu Haiwei, Zhou Jiantao, Zhang Shile, Tian Jinyu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[91] arXiv:2210.15947 (cross-list from cs.CV) [pdf, other]
Title: NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields
Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, Andreas Geiger
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)
[92] arXiv:2210.15977 (cross-list from cs.CV) [pdf, other]
Title: FedVMR: A New Federated Learning method for Video Moment Retrieval
Yan Wang, Xin Luo, Zhen-Duo Chen, Peng-Fei Zhang, Meng Liu, Xin-Shun Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)
[93] arXiv:2210.15988 (cross-list from cs.SD) [pdf, other]
Title: Spectrograms Are Sequences of Patches
Leyi Zhao, Yi Li
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[94] arXiv:2210.16428 (cross-list from eess.AS) [pdf, other]
Title: Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang
Comments: INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD)
[95] arXiv:2210.16478 (cross-list from cs.CV) [pdf, other]
Title: GPA-Net:No-Reference Point Cloud Quality Assessment with Multi-task Graph Convolutional Network
Ziyu Shan, Qi Yang, Rui Ye, Yujie Zhang, Yiling Xu, Xiaozhong Xu, Shan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[96] arXiv:2210.17009 (cross-list from cs.CV) [pdf, other]
Title: Point-Syn2Real: Semi-Supervised Synthetic-to-Real Cross-Domain Learning for Object Classification in 3D Point Clouds
Ziwei Wang, Reza Arablouei, Jiajun Liu, Paulo Borges, Greg Bishop-Hurley, Nicholas Heaney
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[97] arXiv:2210.17222 (cross-list from cs.SD) [pdf, other]
Title: Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection
Luigi Attorresi, Davide Salvi, Clara Borrelli, Paolo Bestagini, Stefano Tubaro
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[98] arXiv:2210.17367 (cross-list from cs.SD) [pdf, other]
Title: Analysis and Detection of Singing Techniques in Repertoires of J-POP Solo Singers
Yuya Yamamoto, Juhan Nam, Hiroko Terasawa
Comments: Accepted at ISMIR 2022, appendix website: this https URL
Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)