Multimedia

Authors and titles for January 2023

Total of 55 entries; showing entries 1-50.
[1] arXiv:2301.00254 [pdf, other]
Title: Depression Diagnosis and Analysis via Multimodal Multi-order Factor Fusion
Chengbo Yuan, Qianhui Xu, Yong Luo
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[2] arXiv:2301.00726 [pdf, other]
Title: 3-D Markerless Tracking of Human Gait by Geometric Trilateration of Multiple Kinects
Lin Yang, Bowen Yang, Haiwei Dong, Abdulmotaleb El Saddik
Journal-ref: IEEE Systems Journal, vol. 12, no. 2, pp. 1393-1403, 2018
Subjects: Multimedia (cs.MM)
[3] arXiv:2301.01134 [pdf, other]
Title: Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos
Khalid Alnajjar, Mika Hämäläinen, Shuo Zhang
Comments: Figlang 2022
Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[4] arXiv:2301.01420 [pdf, other]
Title: Improved CNN Prediction Based Reversible Data Hiding
Yingqiang Qiu, Wanli Peng, Xiaodan Lin, Huanqiang Zeng, Zhenxing Qian
Subjects: Multimedia (cs.MM); Image and Video Processing (eess.IV)
[5] arXiv:2301.02363 [pdf, other]
Title: Text2Poster: Laying out Stylized Texts on Retrieved Images
Chuhao Jin, Hongteng Xu, Ruihua Song, Zhiwu Lu
Comments: 5 pages, Accepted to ICASSP 2022
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[6] arXiv:2301.05541 [pdf, other]
Title: From Ember to Blaze: Swift Interactive Video Adaptation via Meta-Reinforcement Learning
Xuedou Xiao, Mingxuan Yan, Yingying Zuo, Boxi Liu, Paul Ruan, Yang Cao, Wei Wang
Comments: 9 pages, 13 figures
Subjects: Multimedia (cs.MM)
[7] arXiv:2301.06375 [pdf, html, other]
Title: OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
Jeongkyun Park, Jung-Wook Hwang, Kwanghee Choi, Seung-Hyun Lee, Jun Hwan Ahn, Rae-Hong Park, Hyung-Min Park
Comments: Accepted to ICASSP 2024
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[8] arXiv:2301.06876 [pdf, other]
Title: CS-lol: a Dataset of Viewer Comment with Scene in E-sports Live-streaming
Junjie H. Xu, Yu Nakano, Lingrong Kong, Kojiro Iizuka
Comments: 5 pages, 3 figures, In ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 23)
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
[9] arXiv:2301.07681 [pdf, other]
Title: Reduced-Reference Quality Assessment of Point Clouds via Content-Oriented Saliency Projection
Wei Zhou, Guanghui Yue, Ruizeng Zhang, Yipeng Qin, Hantao Liu
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[10] arXiv:2301.07740 [pdf, other]
Title: The Metaverse from a Multimedia Communications Perspective
Haiwei Dong, Jeannie S. A. Lee
Journal-ref: IEEE Multimedia Magazine, vol. 29, no. 4, pp. 123-127, 2022
Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[11] arXiv:2301.09080 [pdf, html, other]
Title: Dance2MIDI: Dance-driven multi-instruments music generation
Bo Han, Yuheng Li, Yixuan Shen, Yi Ren, Feilin Han
Comments: has been accepted by Computational Visual Media Journal
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12] arXiv:2301.11648 [pdf, other]
Title: Top-down and bottom-up approaches to video Quality of Experience studies; overview and proposal of a new model
Kamil Koniuch, Sabina Baraković, Jasmina Baraković Husić, Katrien De Moor, Lucjan Janowski, Michał Wierzchoń
Comments: 35 pages, 2 figures, preprint submitted to review
Subjects: Multimedia (cs.MM)
[13] arXiv:2301.12191 [pdf, other]
Title: Multi-resolution encoding and optimization for next generation video compression
Vignesh V Menon
Comments: Degree project in Electrical Engineering, Second Cycle, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology (16 October 2020)
Subjects: Multimedia (cs.MM)
[14] arXiv:2301.12831 [pdf, html, other]
Title: M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System
Chenqi Kong, Kexin Zheng, Yibing Liu, Shiqi Wang, Anderson Rocha, Haoliang Li
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[15] arXiv:2301.13523 [pdf, other]
Title: Towards Better Quality of Experience in HTTP Adaptive Streaming
Babak Taraghi, Selina Zoë Haack, Christian Timmerer
Subjects: Multimedia (cs.MM)
[16] arXiv:2301.13617 [pdf, other]
Title: A Closer Look into Recent Video-based Learning Research: A Comprehensive Review of Video Characteristics, Tools, Technologies, and Learning Effectiveness
Evelyn Navarrete, Andreas Nehring, Sascha Schanze, Ralph Ewerth, Anett Hoppe
Subjects: Multimedia (cs.MM)
[17] arXiv:2301.00078 (cross-list from physics.flu-dyn) [pdf, other]
Title: Image and video compression of fluid flow data
Vishal Anatharaman, Jason Feldkamp, Kai Fukami, Kunihiko Taira
Subjects: Fluid Dynamics (physics.flu-dyn); Multimedia (cs.MM)
[18] arXiv:2301.00965 (cross-list from cs.CV) [pdf, other]
Title: OccluMix: Towards De-Occlusion Virtual Try-on by Semantically-Guided Mixup
Zhijing Yang, Junyang Chen, Yukai Shi, Hao Li, Tianshui Chen, Liang Lin
Comments: To be published in IEEE T-MM; Code is available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[19] arXiv:2301.01904 (cross-list from cs.CY) [pdf, other]
Title: Piloting Virtual Reality Photo-Based Tours among Students of a Filipino Language Class: A Case of Emergency Remote Teaching in Japan
Roberto Bacani Figueroa Jr., Florinda Amparo Adarayan Palma Gil, Hiroshi Taniguchi
Comments: 25 pages including appendices
Journal-ref: Avant: trends in interdisciplinary studies 13(1) (2022)
Subjects: Computers and Society (cs.CY); Multimedia (cs.MM)
[20] arXiv:2301.01949 (cross-list from cs.CL) [pdf, other]
Title: SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
Yuxing Long, Binyuan Hui, Fulong Ye, Yanyang Li, Zhuoxin Han, Caixia Yuan, Yongbin Li, Xiaojie Wang
Comments: AAAI 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[21] arXiv:2301.03127 (cross-list from cs.CL) [pdf, other]
Title: Logically at Factify 2: A Multi-Modal Fact Checking System Based on Evidence Retrieval techniques and Transformer Encoder Architecture
Pim Jordi Verschuuren, Jie Gao, Adelize van Eeden, Stylianos Oikonomou, Anil Bandhakavi
Comments: Accepted in AAAI'23: Second Workshop on Multimodal Fact-Checking and Hate Speech Detection, February 2023, Washington, DC, USA
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[22] arXiv:2301.03829 (cross-list from cs.LG) [pdf, other]
Title: From Plate to Prevention: A Dietary Nutrient-aided Platform for Health Promotion in Singapore
Kaiping Zheng, Thao Nguyen, Jesslyn Hwei Sing Chong, Charlene Enhui Goh, Melanie Herschel, Hee Hoon Lee, Changshuo Liu, Beng Chin Ooi, Wei Wang, James Yip
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Multimedia (cs.MM)
[23] arXiv:2301.03992 (cross-list from cs.CV) [pdf, other]
Title: Vision Transformers Are Good Mask Auto-Labelers
Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[24] arXiv:2301.04117 (cross-list from eess.IV) [pdf, other]
Title: Adaptive and Scalable Compression of Multispectral Images using VVC
Philipp Seltsam, Priyanka Das, Mathias Wien
Comments: 10 pages, 5 figures, accepted as poster at Data Compression Conference 2023
Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[25] arXiv:2301.04366 (cross-list from cs.CL) [pdf, other]
Title: Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
Paul Lerner, Olivier Ferret, Camille Guinaudeau
Comments: Accepted at ECIR 2023
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
[26] arXiv:2301.05174 (cross-list from cs.IR) [pdf, other]
Title: Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
Mariya Hendriksen, Svitlana Vakulenko, Ernst Kuiper, Maarten de Rijke
Comments: 18 pages, accepted as a reproducibility paper at ECIR 2023
Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[27] arXiv:2301.05908 (cross-list from cs.SD) [pdf, other]
Title: An Order-Complexity Model for Aesthetic Quality Assessment of Symbolic Homophony Music Scores
Xin Jin, Wu Zhou, Jinyu Wang, Duo Xu, Yiqing Rong, Shuai Cui
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[28] arXiv:2301.06993 (cross-list from cs.HC) [pdf, other]
Title: Your Day in Your Pocket: Complex Activity Recognition from Smartphone Accelerometers
Emma Bouton--Bessac, Lakmal Meegahapola, Daniel Gatica-Perez
Comments: 16th EAI International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth) 2022
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[29] arXiv:2301.07431 (cross-list from cs.CV) [pdf, other]
Title: Sharp Eyes: A Salient Object Detector Working The Same Way as Human Visual Characteristics
Ge Zhu, Jinbao Li, Yahong Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[30] arXiv:2301.08565 (cross-list from cs.HC) [pdf, other]
Title: Developing a Framework for Heterotopias as Discursive Playgrounds: A Comparative Analysis of Non-Immersive and Immersive Technologies
Elif Hilal Korkut, Elif Surer
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[31] arXiv:2301.08664 (cross-list from cs.CV) [pdf, other]
Title: AccDecoder: Accelerated Decoding for Neural-enhanced Video Analytics
Tingting Yuan, Liang Mi, Weijun Wang, Haipeng Dai, Xiaoming Fu
Comments: Accepted by 2023 IEEE INFOCOM
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[32] arXiv:2301.08752 (cross-list from eess.IV) [pdf, other]
Title: Optimized learned entropy coding parameters for practical neural-based image and video compression
Amir Said, Reza Pourreza, Hoang Le
Comments: 2022 IEEE International Conference on Image Processing (ICIP)
Journal-ref: IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 2022, pp. 661-665
Subjects: Image and Video Processing (eess.IV); Information Theory (cs.IT); Machine Learning (cs.LG); Multimedia (cs.MM)
[33] arXiv:2301.08783 (cross-list from cs.CV) [pdf, other]
Title: An Asynchronous Intensity Representation for Framed and Event Video Sources
Andrew C. Freeman, Montek Singh, Ketan Mayer-Patel
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[34] arXiv:2301.09492 (cross-list from cs.HC) [pdf, other]
Title: Understanding Context to Capture when Reconstructing Meaningful Spaces for Remote Instruction and Connecting in XR
Hanuma Teja Maddali, Amanda Lazar
Comments: 26 pages, 5 figures, 4 tables
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Graphics (cs.GR); Multimedia (cs.MM)
[35] arXiv:2301.09772 (cross-list from cs.HC) [pdf, other]
Title: SONIA: an immersive customizable virtual reality system for the education and exploration of brain networks
Owen Hellum, Christopher Steele, Yiming Xiao
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[36] arXiv:2301.09776 (cross-list from eess.IV) [pdf, other]
Title: Differentiable bit-rate estimation for neural-based video codec enhancement
Amir Said, Manish Kumar Singh, Reza Pourreza
Journal-ref: Picture Coding Symposium (PCS), San Jose, CA, USA, 2022, pp. 379-383
Subjects: Image and Video Processing (eess.IV); Information Theory (cs.IT); Machine Learning (cs.LG); Multimedia (cs.MM)
[37] arXiv:2301.09799 (cross-list from eess.IV) [pdf, other]
Title: LDMIC: Learning-based Distributed Multi-view Image Coding
Xinjie Zhang, Jiawei Shao, Jun Zhang
Comments: Accepted by ICLR 2023
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[38] arXiv:2301.10056 (cross-list from cs.CR) [pdf, other]
Title: Side Eye: Characterizing the Limits of POV Acoustic Eavesdropping from Smartphone Cameras with Rolling Shutters and Movable Lenses
Yan Long, Pirouz Naghavi, Blas Kojusner, Kevin Butler, Sara Rampazzi, Kevin Fu
Journal-ref: 2023 IEEE Symposium on Security and Privacy
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2301.10455 (cross-list from eess.IV) [pdf, other]
Title: Rate-Perception Optimized Preprocessing for Video Coding
Chengqian Ma, Zhiqiang Wu, Chunlei Cai, Pengwei Zhang, Yi Wang, Long Zheng, Chao Chen, Quan Zhou
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[40] arXiv:2301.10972 (cross-list from cs.CV) [pdf, other]
Title: On the Importance of Noise Scheduling for Diffusion Models
Ting Chen
Comments: tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[41] arXiv:2301.11145 (cross-list from cs.CV) [pdf, html, other]
Title: Learning from Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation
Elena Camuffo, Umberto Michieli, Simone Milani
Journal-ref: IEEE Transactions on Multimedia (TMM), 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Machine Learning (stat.ML)
[42] arXiv:2301.11274 (cross-list from cs.CV) [pdf, other]
Title: Self-Supervised RGB-T Tracking with Cross-Input Consistency
Xingchen Zhang, Yiannis Demiris
Comments: 12 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[43] arXiv:2301.11752 (cross-list from cs.CV) [pdf, other]
Title: Inter-View Depth Consistency Testing in Depth Difference Subspace
Pravin Kumar Rana, Markus Flierl
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[44] arXiv:2301.12084 (cross-list from cs.SD) [pdf, other]
Title: Automated Arrangements of Multi-Part Music for Sets of Monophonic Instruments
Matthew Mccloskey, Gabrielle Curcio, Amulya Badineni, Kevin Mcgrath, Dimitris Papamichail
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[45] arXiv:2301.12097 (cross-list from cs.IR) [pdf, other]
Title: Enhancing Dyadic Relations with Homogeneous Graphs for Multimodal Recommendation
Hongyu Zhou, Xin Zhou, Lingzi Zhang, Zhiqi Shen
Comments: 17 pages, 3 figures
Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[46] arXiv:2301.12354 (cross-list from cs.SD) [pdf, other]
Title: Artistic Curve Steganography Carried by Musical Audio
Christopher J. Tralie
Comments: 18 pages, 14 figures, in Proceedings of EvoMUSART 2023
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[47] arXiv:2301.12503 (cross-list from cs.SD) [pdf, other]
Title: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley
Comments: Accepted by ICML 2023. Demo and implementation at this https URL. Evaluation toolbox at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[48] arXiv:2301.12613 (cross-list from cs.CV) [pdf, other]
Title: AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio
Xiaoyang Huang, Yanjun Wang, Yang Liu, Bingbing Ni, Wenjun Zhang, Jinxian Liu, Teng Li
Comments: Accepted by Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[49] arXiv:2301.12661 (cross-list from cs.SD) [pdf, other]
Title: Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao
Comments: Audio samples are available at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[50] arXiv:2301.12662 (cross-list from cs.SD) [pdf, other]
Title: SingSong: Generating musical accompaniments from singing
Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)