Multimedia

Authors and titles for January 2023

Total of 55 entries : 1-50 51-55

[1] arXiv:2301.00254 [pdf, other]: Title: Depression Diagnosis and Analysis via Multimodal Multi-order Factor Fusion

Chengbo Yuan, Qianhui Xu, Yong Luo

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[2] arXiv:2301.00726 [pdf, other]: Title: 3-D Markerless Tracking of Human Gait by Geometric Trilateration of Multiple Kinects

Lin Yang, Bowen Yang, Haiwei Dong, Abdulmotaleb El Saddik

Journal-ref: IEEE Systems Journal, vol. 12, no. 2, pp. 1393-1403, 2018

Subjects: Multimedia (cs.MM)
[3] arXiv:2301.01134 [pdf, other]: Title: Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos

Khalid Alnajjar, Mika Hämäläinen, Shuo Zhang

Comments: Figlang 2022

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[4] arXiv:2301.01420 [pdf, other]: Title: Improved CNN Prediction Based Reversible Data Hiding

Yingqiang Qiu, Wanli Peng, Xiaodan Lin, Huanqiang Zeng, Zhenxing Qian

Subjects: Multimedia (cs.MM); Image and Video Processing (eess.IV)
[5] arXiv:2301.02363 [pdf, other]: Title: Text2Poster: Laying out Stylized Texts on Retrieved Images

Chuhao Jin, Hongteng Xu, Ruihua Song, Zhiwu Lu

Comments: 5 pages, Accepted to ICASSP 2022

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[6] arXiv:2301.05541 [pdf, other]: Title: From Ember to Blaze: Swift Interactive Video Adaptation via Meta-Reinforcement Learning

Xuedou Xiao, Mingxuan Yan, Yingying Zuo, Boxi Liu, Paul Ruan, Yang Cao, Wei Wang

Comments: 9 pages, 13 figures

Subjects: Multimedia (cs.MM)
[7] arXiv:2301.06375 [pdf, html, other]: Title: OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset

Jeongkyun Park, Jung-Wook Hwang, Kwanghee Choi, Seung-Hyun Lee, Jun Hwan Ahn, Rae-Hong Park, Hyung-Min Park

Comments: Accepted to ICASSP 2024

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[8] arXiv:2301.06876 [pdf, other]: Title: CS-lol: a Dataset of Viewer Comment with Scene in E-sports Live-streaming

Junjie H. Xu, Yu Nakano, Lingrong Kong, Kojiro Iizuka

Comments: 5 pages, 3 figures, In ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 23)

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
[9] arXiv:2301.07681 [pdf, other]: Title: Reduced-Reference Quality Assessment of Point Clouds via Content-Oriented Saliency Projection

Wei Zhou, Guanghui Yue, Ruizeng Zhang, Yipeng Qin, Hantao Liu

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[10] arXiv:2301.07740 [pdf, other]: Title: The Metaverse from a Multimedia Communications Perspective

Haiwei Dong, Jeannie S. A. Lee

Journal-ref: IEEE Multimedia Magazine, vol. 29, no. 4, pp. 123-127, 2022

Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[11] arXiv:2301.09080 [pdf, html, other]: Title: Dance2MIDI: Dance-driven multi-instruments music generation

Bo Han, Yuheng Li, Yixuan Shen, Yi Ren, Feilin Han

Comments: Accepted by Computational Visual Media Journal

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12] arXiv:2301.11648 [pdf, other]: Title: Top-down and bottom-up approaches to video Quality of Experience studies; overview and proposal of a new model

Kamil Koniuch, Sabina Baraković, Jasmina Baraković Husić, Katrien De Moor, Lucjan Janowski, Michał Wierzchoń

Comments: 35 pages, 2 figures, preprint submitted to review

Subjects: Multimedia (cs.MM)
[13] arXiv:2301.12191 [pdf, other]: Title: Multi-resolution encoding and optimization for next generation video compression

Vignesh V Menon

Comments: Degree project in Electrical Engineering, Second Cycle, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology (16 October 2020)

Subjects: Multimedia (cs.MM)
[14] arXiv:2301.12831 [pdf, html, other]: Title: M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System

Chenqi Kong, Kexin Zheng, Yibing Liu, Shiqi Wang, Anderson Rocha, Haoliang Li

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[15] arXiv:2301.13523 [pdf, other]: Title: Towards Better Quality of Experience in HTTP Adaptive Streaming

Babak Taraghi, Selina Zoë Haack, Christian Timmerer

Subjects: Multimedia (cs.MM)
[16] arXiv:2301.13617 [pdf, other]: Title: A Closer Look into Recent Video-based Learning Research: A Comprehensive Review of Video Characteristics, Tools, Technologies, and Learning Effectiveness

Evelyn Navarrete, Andreas Nehring, Sascha Schanze, Ralph Ewerth, Anett Hoppe

Subjects: Multimedia (cs.MM)
[17] arXiv:2301.00078 (cross-list from physics.flu-dyn) [pdf, other]: Title: Image and video compression of fluid flow data

Vishal Anatharaman, Jason Feldkamp, Kai Fukami, Kunihiko Taira

Subjects: Fluid Dynamics (physics.flu-dyn); Multimedia (cs.MM)
[18] arXiv:2301.00965 (cross-list from cs.CV) [pdf, other]: Title: OccluMix: Towards De-Occlusion Virtual Try-on by Semantically-Guided Mixup

Zhijing Yang, Junyang Chen, Yukai Shi, Hao Li, Tianshui Chen, Liang Lin

Comments: To be published in IEEE T-MM; Code is available at: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[19] arXiv:2301.01904 (cross-list from cs.CY) [pdf, other]: Title: Piloting Virtual Reality Photo-Based Tours among Students of a Filipino Language Class: A Case of Emergency Remote Teaching in Japan

Roberto Bacani Figueroa Jr., Florinda Amparo Adarayan Palma Gil, Hiroshi Taniguchi

Comments: 25 pages including appendices

Journal-ref: Avant: trends in interdisciplinary studies 13(1) (2022)

Subjects: Computers and Society (cs.CY); Multimedia (cs.MM)
[20] arXiv:2301.01949 (cross-list from cs.CL) [pdf, other]: Title: SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

Yuxing Long, Binyuan Hui, Fulong Ye, Yanyang Li, Zhuoxin Han, Caixia Yuan, Yongbin Li, Xiaojie Wang

Comments: AAAI 2023

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[21] arXiv:2301.03127 (cross-list from cs.CL) [pdf, other]: Title: Logically at Factify 2: A Multi-Modal Fact Checking System Based on Evidence Retrieval techniques and Transformer Encoder Architecture

Pim Jordi Verschuuren, Jie Gao, Adelize van Eeden, Stylianos Oikonomou, Anil Bandhakavi

Comments: Accepted in AAAI'23: Second Workshop on Multimodal Fact-Checking and Hate Speech Detection, February 2023, Washington, DC, USA

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[22] arXiv:2301.03829 (cross-list from cs.LG) [pdf, other]: Title: From Plate to Prevention: A Dietary Nutrient-aided Platform for Health Promotion in Singapore

Kaiping Zheng, Thao Nguyen, Jesslyn Hwei Sing Chong, Charlene Enhui Goh, Melanie Herschel, Hee Hoon Lee, Changshuo Liu, Beng Chin Ooi, Wei Wang, James Yip

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Multimedia (cs.MM)
[23] arXiv:2301.03992 (cross-list from cs.CV) [pdf, other]: Title: Vision Transformers Are Good Mask Auto-Labelers

Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[24] arXiv:2301.04117 (cross-list from eess.IV) [pdf, other]: Title: Adaptive and Scalable Compression of Multispectral Images using VVC

Philipp Seltsam, Priyanka Das, Mathias Wien

Comments: 10 pages, 5 figures, accepted as poster at Data Compression Conference 2023

Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[25] arXiv:2301.04366 (cross-list from cs.CL) [pdf, other]: Title: Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering

Paul Lerner, Olivier Ferret, Camille Guinaudeau

Comments: Accepted at ECIR 2023

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
[26] arXiv:2301.05174 (cross-list from cs.IR) [pdf, other]: Title: Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study

Mariya Hendriksen, Svitlana Vakulenko, Ernst Kuiper, Maarten de Rijke

Comments: 18 pages, accepted as a reproducibility paper at ECIR 2023

Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[27] arXiv:2301.05908 (cross-list from cs.SD) [pdf, other]: Title: An Order-Complexity Model for Aesthetic Quality Assessment of Symbolic Homophony Music Scores

Xin Jin, Wu Zhou, Jinyu Wang, Duo Xu, Yiqing Rong, Shuai Cui

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[28] arXiv:2301.06993 (cross-list from cs.HC) [pdf, other]: Title: Your Day in Your Pocket: Complex Activity Recognition from Smartphone Accelerometers

Emma Bouton-Bessac, Lakmal Meegahapola, Daniel Gatica-Perez

Comments: 16th EAI International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth) 2022

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[29] arXiv:2301.07431 (cross-list from cs.CV) [pdf, other]: Title: Sharp Eyes: A Salient Object Detector Working The Same Way as Human Visual Characteristics

Ge Zhu, Jinbao Li, Yahong Guo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[30] arXiv:2301.08565 (cross-list from cs.HC) [pdf, other]: Title: Developing a Framework for Heterotopias as Discursive Playgrounds: A Comparative Analysis of Non-Immersive and Immersive Technologies

Elif Hilal Korkut, Elif Surer

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[31] arXiv:2301.08664 (cross-list from cs.CV) [pdf, other]: Title: AccDecoder: Accelerated Decoding for Neural-enhanced Video Analytics

Tingting Yuan, Liang Mi, Weijun Wang, Haipeng Dai, Xiaoming Fu

Comments: Accepted by 2023 IEEE INFOCOM

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[32] arXiv:2301.08752 (cross-list from eess.IV) [pdf, other]: Title: Optimized learned entropy coding parameters for practical neural-based image and video compression

Amir Said, Reza Pourreza, Hoang Le

Comments: 2022 IEEE International Conference on Image Processing (ICIP)

Journal-ref: IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 2022, pp. 661-665

Subjects: Image and Video Processing (eess.IV); Information Theory (cs.IT); Machine Learning (cs.LG); Multimedia (cs.MM)
[33] arXiv:2301.08783 (cross-list from cs.CV) [pdf, other]: Title: An Asynchronous Intensity Representation for Framed and Event Video Sources

Andrew C. Freeman, Montek Singh, Ketan Mayer-Patel

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[34] arXiv:2301.09492 (cross-list from cs.HC) [pdf, other]: Title: Understanding Context to Capture when Reconstructing Meaningful Spaces for Remote Instruction and Connecting in XR

Hanuma Teja Maddali, Amanda Lazar

Comments: 26 pages, 5 figures, 4 tables

Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Graphics (cs.GR); Multimedia (cs.MM)
[35] arXiv:2301.09772 (cross-list from cs.HC) [pdf, other]: Title: SONIA: an immersive customizable virtual reality system for the education and exploration of brain networks

Owen Hellum, Christopher Steele, Yiming Xiao

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[36] arXiv:2301.09776 (cross-list from eess.IV) [pdf, other]: Title: Differentiable bit-rate estimation for neural-based video codec enhancement

Amir Said, Manish Kumar Singh, Reza Pourreza

Journal-ref: Picture Coding Symposium (PCS), San Jose, CA, USA, 2022, pp. 379-383

Subjects: Image and Video Processing (eess.IV); Information Theory (cs.IT); Machine Learning (cs.LG); Multimedia (cs.MM)
[37] arXiv:2301.09799 (cross-list from eess.IV) [pdf, other]: Title: LDMIC: Learning-based Distributed Multi-view Image Coding

Xinjie Zhang, Jiawei Shao, Jun Zhang

Comments: Accepted by ICLR 2023

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[38] arXiv:2301.10056 (cross-list from cs.CR) [pdf, other]: Title: Side Eye: Characterizing the Limits of POV Acoustic Eavesdropping from Smartphone Cameras with Rolling Shutters and Movable Lenses

Yan Long, Pirouz Naghavi, Blas Kojusner, Kevin Butler, Sara Rampazzi, Kevin Fu

Journal-ref: 2023 IEEE Symposium on Security and Privacy

Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2301.10455 (cross-list from eess.IV) [pdf, other]: Title: Rate-Perception Optimized Preprocessing for Video Coding

Chengqian Ma, Zhiqiang Wu, Chunlei Cai, Pengwei Zhang, Yi Wang, Long Zheng, Chao Chen, Quan Zhou

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[40] arXiv:2301.10972 (cross-list from cs.CV) [pdf, other]: Title: On the Importance of Noise Scheduling for Diffusion Models

Ting Chen

Comments: tech report

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
[41] arXiv:2301.11145 (cross-list from cs.CV) [pdf, html, other]: Title: Learning from Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation

Elena Camuffo, Umberto Michieli, Simone Milani

Journal-ref: IEEE Transactions on Multimedia (TMM), 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Machine Learning (stat.ML)
[42] arXiv:2301.11274 (cross-list from cs.CV) [pdf, other]: Title: Self-Supervised RGB-T Tracking with Cross-Input Consistency

Xingchen Zhang, Yiannis Demiris

Comments: 12 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[43] arXiv:2301.11752 (cross-list from cs.CV) [pdf, other]: Title: Inter-View Depth Consistency Testing in Depth Difference Subspace

Pravin Kumar Rana, Markus Flierl

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[44] arXiv:2301.12084 (cross-list from cs.SD) [pdf, other]: Title: Automated Arrangements of Multi-Part Music for Sets of Monophonic Instruments

Matthew Mccloskey, Gabrielle Curcio, Amulya Badineni, Kevin Mcgrath, Dimitris Papamichail

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[45] arXiv:2301.12097 (cross-list from cs.IR) [pdf, other]: Title: Enhancing Dyadic Relations with Homogeneous Graphs for Multimodal Recommendation

Hongyu Zhou, Xin Zhou, Lingzi Zhang, Zhiqi Shen

Comments: 17 pages, 3 figures

Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM)
[46] arXiv:2301.12354 (cross-list from cs.SD) [pdf, other]: Title: Artistic Curve Steganography Carried by Musical Audio

Christopher J. Tralie

Comments: 18 pages, 14 figures, in Proceedings of EvoMUSART 2023

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[47] arXiv:2301.12503 (cross-list from cs.SD) [pdf, other]: Title: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley

Comments: Accepted by ICML 2023. Demo and implementation at this https URL. Evaluation toolbox at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[48] arXiv:2301.12613 (cross-list from cs.CV) [pdf, other]: Title: AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio

Xiaoyang Huang, Yanjun Wang, Yang Liu, Bingbing Ni, Wenjun Zhang, Jinxian Liu, Teng Li

Comments: Accepted by Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[49] arXiv:2301.12661 (cross-list from cs.SD) [pdf, other]: Title: Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao

Comments: Audio samples are available at this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[50] arXiv:2301.12662 (cross-list from cs.SD) [pdf, other]: Title: SingSong: Generating musical accompaniments from singing

Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
