Showing 1–50 of 82 results for author: Aizawa, K

  1. arXiv:2506.01952  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks

    Authors: Atsuyuki Miyai, Zaiying Zhao, Kazuki Egashira, Atsuki Sato, Tatsumi Sunada, Shota Onohara, Hiromasa Yamanishi, Mashiro Toyooka, Kunato Nishina, Ryoma Maeda, Kiyoharu Aizawa, Toshihiko Yamasaki

    Abstract: Powered by a large language model (LLM), a web browsing agent operates web browsers in a human-like manner and offers a highly transparent path toward automating a wide range of everyday tasks. As web agents become increasingly capable and demonstrate proficiency in general browsing tasks, a critical question emerges: Can they go beyond general browsing to robustly handle tasks that are tedious an…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Project Page: https://webchorearena.github.io/

  2. arXiv:2505.20298  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding

    Authors: Jeonghun Baek, Kazuki Egashira, Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Hikaru Ikuta, Kiyoharu Aizawa

    Abstract: Manga, or Japanese comics, is a richly multimodal narrative form that blends images and text in complex ways. Teaching large multimodal models (LMMs) to understand such narratives at a human-like level could help manga creators reflect on and refine their stories. To this end, we introduce two benchmarks for multimodal manga understanding: MangaOCR, which targets in-page text recognition, and Mang…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 20 pages, 11 figures

  3. arXiv:2505.11118  [pdf, ps, other]

    astro-ph.IM astro-ph.CO

    Prototype sub-wavelength structure anti-reflection coating on alumina filters for ground-based CMB telescopes

    Authors: Kosuke Aizawa, Ryosuke Akizawa, Scott Cray, Shaul Hanany, Shotaro Kawano, Jürgen Koch, Kuniaki Konishi, Rex Lam, Tomotake Matsumura, Haruyuki Sakurai, Ryota Takaku

    Abstract: We present designs and fabrication of sub-wavelength anti-reflection (AR) structures on alumina for infrared absorptive filters with passbands near 30, 125, and 250 GHz. These bands are widely used by ground-based instruments measuring the cosmic microwave background radiation. The designs are tuned to provide reflectance of 2% or less for fractional bandwidths between 51% and 72%, with each of th…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 15 pages, 4 figures, submitted to JATIS

  4. arXiv:2504.18727  [pdf, other]

    cs.IR cs.AI

    World Food Atlas Project

    Authors: Ali Rostami, Z Xie, A Ishino, Y Yamakata, K Aizawa, Ramesh Jain

    Abstract: A coronavirus pandemic is forcing people to be "at home" all over the world. In a life of hardly ever going out, we would have realized how the food we eat affects our bodies. What can we do to know our food more and control it better? To give us a clue, we are trying to build a World Food Atlas (WFA) that collects all the knowledge about food in the world. In this paper, we present two of our tri…

    Submitted 25 April, 2025; originally announced April 2025.

    Journal ref: Proceedings of the 13th International Workshop on Multimedia for Cooking and Eating Activities 2021

  5. Experimental Studies on Spatial Resolution of a Delay-Line Current-Biased Kinetic-Inductance Detector

    Authors: The Dang Vu, Hiroaki Shishido, Kazuya Aizawa, Takayuki Oku, Kenichi Oikawa, Masahide Harada, Kenji M. Kojima, Shigeyuki Miyajima, Kazuhiko Soyama, Tomio Koyama, Mutsuo Hidaka, Soh Y. Suzuki, Manobu M. Tanaka, Masahiko Machida, Shuichi Kawamata, Takekazu Ishida

    Abstract: A current-biased kinetic inductance detector (CB-KID) is a novel superconducting detector to construct a neutron transmission imaging system. The characteristics of a superconducting neutron detector have been systematically studied to improve spatial resolution of our CB-KID neutron detector. In this study, we investigated the distribution of spatial resolutions under different operating conditio…

    Submitted 16 April, 2025; originally announced April 2025.

  6. arXiv:2502.14778  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    Harnessing PDF Data for Improving Japanese Large Multimodal Models

    Authors: Jeonghun Baek, Akiko Aizawa, Kiyoharu Aizawa

    Abstract: Large Multimodal Models (LMMs) have demonstrated strong performance in English, but their effectiveness in Japanese remains limited due to the lack of high-quality training data. Current Japanese LMMs often rely on translated English datasets, restricting their ability to capture Japan-specific cultural knowledge. To address this, we explore the potential of Japanese PDF data as a training resourc…

    Submitted 31 May, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted to ACL2025 Findings. Code: https://github.com/ku21fan/PDF-JLMM

  7. arXiv:2501.18463  [pdf, ps, other]

    cs.CV

    A Benchmark and Evaluation for Real-World Out-of-Distribution Detection Using Vision-Language Models

    Authors: Shiho Noda, Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa

    Abstract: Out-of-distribution (OOD) detection is a task that detects OOD samples during inference to ensure the safety of deployed models. However, conventional benchmarks have reached performance saturation, making it difficult to compare recent OOD detection methods. To address this challenge, we introduce three novel OOD detection benchmarks that enable a deeper understanding of method characteristics an…

    Submitted 29 May, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: Accepted at ICIP2025 Dataset and Benchmark Track

  8. arXiv:2410.17250  [pdf, other]

    cs.CL cs.AI cs.CV

    JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

    Authors: Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Kazuki Egashira, Jeonghun Baek, Xiang Yue, Graham Neubig, Kiyoharu Aizawa

    Abstract: Accelerating research on Large Multimodal Models (LMMs) in non-English languages is crucial for enhancing user experiences across broader populations. In this paper, we introduce JMMMU (Japanese MMMU), the first large-scale Japanese benchmark designed to evaluate LMMs on expert-level tasks based on the Japanese cultural context. To facilitate comprehensive culture-aware evaluation, JMMMU features…

    Submitted 19 March, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: Accepted at NAACL 2025. Project page: https://mmmu-japanese-benchmark.github.io/JMMMU/

  9. FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation

    Authors: Yuki Imajuku, Yoko Yamakata, Kiyoharu Aizawa

    Abstract: Research on food image understanding using recipe data has been a long-standing focus due to the diversity and complexity of the data. Moreover, food is inextricably linked to people's lives, making it a vital research area for practical applications such as dietary management. Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities, not only in th…

    Submitted 3 March, 2025; v1 submitted 27 September, 2024; originally announced September 2024.

    Comments: 15 pages, 5 figures. We found errors in the calculation of evaluation metrics, which were corrected in this version with modifications highlighted in blue. Please also see the Appendix

  10. arXiv:2409.00313  [pdf, other]

    cs.CV

    Training-Free Sketch-Guided Diffusion with Latent Optimization

    Authors: Sandra Zhang Ding, Jiafeng Mao, Kiyoharu Aizawa

    Abstract: Based on recent advanced diffusion models, Text-to-image (T2I) generation models have demonstrated their capabilities to generate diverse and high-quality images. However, leveraging their potential for real-world content creation, particularly in providing users with precise control over the image generation result, poses a significant challenge. In this paper, we propose an innovative training-f…

    Submitted 7 May, 2025; v1 submitted 30 August, 2024; originally announced September 2024.

    Comments: 8 pages

  11. arXiv:2408.04844  [pdf, other]

    cs.HC

    Investigating the Perception of Facial Anonymization Techniques in 360° Videos

    Authors: Leslie Wöhler, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: In this work, we investigate facial anonymization techniques in 360° videos and assess their influence on the perceived realism, anonymization effect, and presence of participants. In comparison to traditional footage, 360° videos can convey engaging, immersive experiences that accurately represent the atmosphere of real-world locations. As the entire environment is captured simultaneously, it is…

    Submitted 8 August, 2024; originally announced August 2024.

  12. arXiv:2408.03040  [pdf, other]

    astro-ph.IM astro-ph.CO

    Multi-dimensional optimisation of the scanning strategy for the LiteBIRD space mission

    Authors: Y. Takase, L. Vacher, H. Ishino, G. Patanchon, L. Montier, S. L. Stever, K. Ishizaka, Y. Nagano, W. Wang, J. Aumont, K. Aizawa, A. Anand, C. Baccigalupi, M. Ballardini, A. J. Banday, R. B. Barreiro, N. Bartolo, S. Basak, M. Bersanelli, M. Bortolami, T. Brinckmann, E. Calabrese, P. Campeti, E. Carinos, A. Carones, et al. (83 additional authors not shown)

    Abstract: Large angular scale surveys in the absence of atmosphere are essential for measuring the primordial $B$-mode power spectrum of the Cosmic Microwave Background (CMB). Since this proposed measurement is about three to four orders of magnitude fainter than the temperature anisotropies of the CMB, in-flight calibration of the instruments and active suppression of systematic effects are crucial. We inv…

    Submitted 15 November, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  13. arXiv:2407.21794  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

    Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Yueqian Lin, Qing Yu, Go Irie, Shafiq Joty, Yixuan Li, Hai Li, Ziwei Liu, Toshihiko Yamasaki, Kiyoharu Aizawa

    Abstract: Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Meanwhile, several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework w…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: survey paper. We welcome questions, issues, and paper requests via https://github.com/AtsuMiyai/Awesome-OOD-VLM

  14. arXiv:2407.19034  [pdf, other]

    cs.CV cs.MM

    MangaUB: A Manga Understanding Benchmark for Large Multimodal Models

    Authors: Hikaru Ikuta, Leslie Wöhler, Kiyoharu Aizawa

    Abstract: Manga is a popular medium that combines stylized drawings and text to convey stories. As manga panels differ from natural images, computational systems traditionally had to be designed specifically for manga. Recently, the adaptive nature of modern large multimodal models (LMMs) shows possibilities for more general approaches. To provide an analysis of the current capability of LMMs for manga unde…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  15. arXiv:2407.17555  [pdf, other]

    astro-ph.CO

    LiteBIRD Science Goals and Forecasts. Mapping the Hot Gas in the Universe

    Authors: M. Remazeilles, M. Douspis, J. A. Rubiño-Martín, A. J. Banday, J. Chluba, P. de Bernardis, M. De Petris, C. Hernández-Monteagudo, G. Luzzi, J. Macias-Perez, S. Masi, T. Namikawa, L. Salvati, H. Tanimura, K. Aizawa, A. Anand, J. Aumont, C. Baccigalupi, M. Ballardini, R. B. Barreiro, N. Bartolo, S. Basak, M. Bersanelli, D. Blinov, M. Bortolami, et al. (82 additional authors not shown)

    Abstract: We assess the capabilities of the LiteBIRD mission to map the hot gas distribution in the Universe through the thermal Sunyaev-Zeldovich (SZ) effect. Our analysis relies on comprehensive simulations incorporating various sources of Galactic and extragalactic foreground emission, while accounting for specific instrumental characteristics of LiteBIRD, such as detector sensitivities, frequency-depend…

    Submitted 23 October, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: 38 pages, 13 figures, abstract shortened. Updated to match version accepted by JCAP

  16. arXiv:2406.02724  [pdf, other]

    astro-ph.IM astro-ph.CO physics.ins-det

    The LiteBIRD mission to explore cosmic inflation

    Authors: T. Ghigna, A. Adler, K. Aizawa, H. Akamatsu, R. Akizawa, E. Allys, A. Anand, J. Aumont, J. Austermann, S. Azzoni, C. Baccigalupi, M. Ballardini, A. J. Banday, R. B. Barreiro, N. Bartolo, S. Basak, A. Basyrov, S. Beckman, M. Bersanelli, M. Bortolami, F. Bouchet, T. Brinckmann, P. Campeti, E. Carinos, A. Carones, et al. (134 additional authors not shown)

    Abstract: LiteBIRD, the next-generation cosmic microwave background (CMB) experiment, aims for a launch in Japan's fiscal year 2032, marking a major advancement in the exploration of primordial cosmology and fundamental physics. Orbiting the Sun-Earth Lagrangian point L2, this JAXA-led strategic L-class mission will conduct a comprehensive mapping of the CMB polarization across the entire sky. During its 3-…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures, 1 table, SPIE Astronomical Telescopes + Instrumentation 2024

  17. arXiv:2405.05924  [pdf]

    cs.HC

    Privacy Protection and Video Manipulation in Immersive Media

    Authors: Leslie Wöhler, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: In comparison to traditional footage, 360° videos can convey engaging, immersive experiences and even be utilized to create interactive virtual environments. Like regular recordings, these videos need to consider the privacy of recorded people and could be targets for video manipulations. However, due to their properties like enhanced presence, the effects on users might differ from traditional, n…

    Submitted 23 April, 2024; originally announced May 2024.

    Comments: This is an accepted position statement of CHI 2024 Workshop (Novel Approaches for Understanding and Mitigating Emerging New Harms in Immersive and Embodied Virtual Spaces: A Workshop at CHI 2024)

  18. arXiv:2404.13993  [pdf, other]

    cs.MM cs.CV

    Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion

    Authors: Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui

    Abstract: Recognizing characters and predicting speakers of dialogue are critical for comic processing tasks, such as voice generation or translation. However, because characters vary by comic title, supervised learning approaches like training character classifiers which require specific annotations for each comic title are infeasible. This motivates us to propose a novel zero-shot approach, allowing machi…

    Submitted 4 September, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to ACM Multimedia 2024. Project page: https://liyingxuan1012.github.io/zeroshot-speaker-prediction ; Github repo: https://github.com/liyingxuan1012/zeroshot-speaker-prediction

  19. arXiv:2403.20331  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models

    Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa

    Abstract: This paper introduces a novel task to evaluate the robust understanding capability of Large Multimodal Models (LMMs), termed Unsolvable Problem Detection (UPD). Multiple-choice question answering (MCQA) is widely used to assess the understanding capability of LMMs, but it does not guarantee that LMMs truly comprehend the answer. UPD assesses the LMM's ability to withhold answers when en…

    Submitted 9 June, 2025; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by ACL 2025 Main Conference

  20. arXiv:2403.16141  [pdf, other]

    cs.CV

    Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes

    Authors: Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: Recent advancements in the study of Neural Radiance Fields (NeRF) for dynamic scenes often involve explicit modeling of scene dynamics. However, this approach faces challenges in modeling scene dynamics in urban environments, where moving objects of various categories and scales are present. In such settings, it becomes crucial to effectively eliminate moving objects to accurately reconstruct stat…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), Project website: https://otonari726.github.io/entitynerf/

  21. arXiv:2312.10806  [pdf, other]

    cs.CV

    Cross-Lingual Learning in Multilingual Scene Text Recognition

    Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

    Abstract: In this paper, we investigate cross-lingual learning (CLL) for multilingual scene text recognition (STR). CLL transfers knowledge from one language to another. We aim to find the condition that exploits knowledge from high-resource languages for improving performance in low-resource languages. To do so, we first examine if two general insights about CLL discussed in previous works are applied to m…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted at ICASSP2024, 5 pages, 2 figures

  22. arXiv:2312.08872  [pdf, other]

    cs.CV

    The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization

    Authors: Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa

    Abstract: Text-to-image diffusion models allow users control over the content of generated images. Still, text-to-image generation occasionally leads to generation failure requiring users to generate dozens of images under the same text prompt before they obtain a satisfying result. We formulate the lottery ticket hypothesis in denoising: randomly initialized Gaussian noise images contain special pixel bloc…

    Submitted 8 October, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: ECCV 2024

  23. arXiv:2311.13602  [pdf, other]

    cs.CV

    Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

    Authors: Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa

    Abstract: Content-aware graphic layout generation aims to automatically arrange visual elements along with a given content, such as an e-commerce product image. In this paper, we argue that the current layout generation approaches suffer from the limited training data for the high-dimensional layout structure. We show that a simple retrieval augmentation can significantly improve the generation quality. Our…

    Submitted 15 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted to CVPR 2024 (Oral), Project website: https://udonda.github.io/RALF/ , GitHub: https://github.com/CyberAgentAILab/RALF

  24. arXiv:2310.00847  [pdf, other]

    cs.CV

    Can Pre-trained Networks Detect Familiar Out-of-Distribution Data?

    Authors: Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa

    Abstract: Out-of-distribution (OOD) detection is critical for safety-sensitive machine learning applications and has been extensively studied, yielding a plethora of methods developed in the literature. However, most studies for OOD detection did not use pre-trained models and trained a backbone from scratch. In recent years, transferring knowledge from large pre-trained models to downstream tasks by lightw…

    Submitted 12 October, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

  25. Orientation mapping of YbSn$_3$ single crystals based on Bragg-dip analysis using a delay-line superconducting sensor

    Authors: Hiroaki Shishido, The Dang Vu, Kazuya Aizawa, Kenji M. Kojima, Tomio Koyama, Kenichi Oikawa, Masahide Harada, Takayuki Oku, Kazuhiko Soyama, Shigeyuki Miyajima, Mutsuo Hidaka, Soh Y. Suzuki, Manobu M. Tanaka, Shuichi Kawamata, Takekazu Ishida

    Abstract: Recent progress in high-power pulsed neutron sources has stimulated the development of the Bragg-dip and Bragg-edge analysis methods using a two-dimensional neutron detector with high temporal resolution to resolve the neutron energy by the time-of-flight method. The delay-line current-biased kinetic-inductance detector (CB-KID) is a two-dimensional superconducting sensor with a high temporal reso…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 18 pages, 6 figures

    Journal ref: Journal of Applied Crystallography 56, 1108-1113 (2023)

  26. arXiv:2307.16204  [pdf, other]

    cs.CV

    Open-Set Domain Adaptation with Visual-Language Foundation Models

    Authors: Qing Yu, Go Irie, Kiyoharu Aizawa

    Abstract: Unsupervised domain adaptation (UDA) has proven to be very effective in transferring knowledge obtained from a source domain with labeled data to a target domain with unlabeled data. Owing to the lack of labeled data in the target domain and the possible presence of unknown classes, open-set domain adaptation (ODA) has emerged as a potential solution to identify these classes during the training p…

    Submitted 30 July, 2023; originally announced July 2023.

  27. arXiv:2306.17469  [pdf, other]

    cs.CV

    Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection

    Authors: Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui

    Abstract: The expanding market for e-comics has spurred interest in the development of automated methods to analyze comics. For further understanding of comics, an automated approach is needed to link text in comics to characters speaking the words. Comics speaker detection research has practical applications, such as automatic character assignment for audiobooks, automatic translation according to characte…

    Submitted 22 April, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted to ICME2024

  28. arXiv:2306.01293  [pdf, other]

    cs.CV

    LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning

    Authors: Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa

    Abstract: We present a novel vision-language prompt learning approach for few-shot out-of-distribution (OOD) detection. Few-shot OOD detection aims to detect OOD images from classes that are unseen during training using only a few labeled in-distribution (ID) images. While prompt learning methods such as CoOp have shown effectiveness and efficiency in few-shot ID classification, they still face limitations…

    Submitted 25 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS 2023

  29. Guided Image Synthesis via Initial Image Editing in Diffusion Model

    Authors: Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa

    Abstract: Diffusion models have the ability to generate high quality images by denoising pure Gaussian noise images. While previous research has primarily focused on improving the control of image generation through adjusting the denoising process, we propose a novel direction of manipulating the initial noise to control the generated image. Through experiments on stable diffusion, we show that blocks of pi…

    Submitted 8 October, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: ACM MM 23

  30. arXiv:2304.04521  [pdf, other]

    cs.CV

    GL-MCM: Global and Local Maximum Concept Matching for Zero-Shot Out-of-Distribution Detection

    Authors: Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa

    Abstract: Zero-shot out-of-distribution (OOD) detection is a task that detects OOD images during inference with only in-distribution (ID) class names. Existing methods assume ID images contain a single, centered object, and do not consider the more realistic multi-object scenarios, where both ID and OOD objects are present. To meet the needs of many users, the detection method must have the flexibility to a…

    Submitted 21 January, 2025; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted by International Journal of Computer Vision (IJCV) 2025

  31. arXiv:2303.00587  [pdf, other]

    eess.IV

    Comprehensive Comparisons of Uniform Quantization in Deep Image Compression

    Authors: Koki Tsubota, Kiyoharu Aizawa

    Abstract: In deep image compression, uniform quantization is applied to latent representations obtained by using an auto-encoder architecture for reducing bits and entropy coding. Quantization is a problem encountered in the end-to-end training of deep image compression. Quantization's gradient is zero, and it cannot backpropagate meaningful gradients. Many methods have been proposed to address the approxim…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: Published in IEEE Access

  32. arXiv:2212.03635  [pdf, other]

    cs.CV cs.GR

    Non-uniform Sampling Strategies for NeRF on 360° images

    Authors: Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: In recent years, the performance of novel view synthesis using perspective images has dramatically improved with the advent of neural radiance fields (NeRF). This study proposes two novel techniques that effectively build NeRF for 360° omnidirectional images. Due to the characteristics of a 360° image of ERP format that has spatial distortion in their high latitude regions…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Accepted at the 33rd British Machine Vision Conference (BMVC) 2022

  33. arXiv:2211.10437  [pdf, other]

    cs.CV

    A Structure-Guided Diffusion Model for Large-Hole Image Completion

    Authors: Daichi Horita, Jiaolong Yang, Dong Chen, Yuki Koyama, Kiyoharu Aizawa, Nicu Sebe

    Abstract: Image completion techniques have made significant progress in filling missing regions (i.e., holes) in images. However, large-hole completion remains challenging due to limited structural information. In this paper, we address this problem by integrating explicit structural guidance into diffusion-based image completion, forming our structure-guided diffusion model (SGDM). It consists of two casca…

    Submitted 6 September, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: BMVC2023. Code: https://github.com/UdonDa/Structure_Guided_Diffusion_Model

  34. arXiv:2211.00918  [pdf, other]

    eess.IV cs.CV

    Universal Deep Image Compression via Content-Adaptive Optimization with Adapters

    Authors: Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa

    Abstract: Deep image compression performs better than conventional codecs, such as JPEG, on natural images. However, deep image compression is learning-based and encounters a problem: the compression performance deteriorates significantly for out-of-domain images. In this study, we highlight this problem and address a novel task: universal deep image compression. This task aims to compress images belonging…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

  35. arXiv:2210.12681  [pdf, other]

    cs.CV

    Rethinking Rotation in Self-Supervised Contrastive Learning: Adaptive Positive or Negative Data Augmentation

    Authors: Atsuyuki Miyai, Qing Yu, Daiki Ikami, Go Irie, Kiyoharu Aizawa

    Abstract: Rotation is frequently listed as a candidate for data augmentation in contrastive learning but seldom provides satisfactory improvements. We argue that this is because the rotated image is always treated as either positive or negative. The semantics of an image can be rotation-invariant or rotation-variant, so whether the rotated image is treated as positive or negative should be determined based…

    Submitted 24 November, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

  36. Saliency-based Multiple Region of Interest Detection from a Single 360° image

    Authors: Yuuki Sawabe, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: 360° images are informative -- it contains omnidirectional visual information around the camera. However, the areas that cover a 360° image is much larger than the human's field of view, therefore important information in different view directions is easily overlooked. To tackle this issue, we propose a method for predicting the optimal set of Region of Interest (RoI) from a single 360° image usin…

    Submitted 8 September, 2022; originally announced September 2022.

    Journal ref: in IEEE Access, vol. 10, pp. 89124-89133, 2022

  37. Evaluating the Stability of Deep Image Quality Assessment With Respect to Image Scaling

    Authors: Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa

    Abstract: Image quality assessment (IQA) is a fundamental metric for image processing tasks (e.g., compression). With full-reference IQAs, traditional IQAs, such as PSNR and SSIM, have been used. Recently, IQAs based on deep neural networks (deep IQAs), such as LPIPS and DISTS, have also been used. It is known that image scaling is inconsistent among deep IQAs, as some perform down-scaling as pre-processing…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: IEICE Transactions on Information and Systems (Letter)

  38. arXiv:2207.04675  [pdf, other]

    cs.CV

    COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts

    Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

    Abstract: Recognizing irregular texts has been a challenging topic in text recognition. To encourage research on this topic, we provide a novel comic onomatopoeia dataset (COO), which consists of onomatopoeia texts in Japanese comics. COO has many arbitrary texts, such as extremely curved, partially shrunk texts, or arbitrarily placed texts. Furthermore, some texts are separated into several parts. Each par…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022. 25 pages, 16 figures

  39. arXiv:2206.10329  [pdf, other]

    cs.CV

    SVG Vector Font Generation for Chinese Characters with Transformer

    Authors: Haruka Aoki, Kiyoharu Aizawa

    Abstract: Designing fonts for Chinese characters is highly labor-intensive and time-consuming. While the latest methods successfully generate the English alphabet vector font, despite the high demand for automatic font generation, Chinese vector font generation has been an unsolved problem owing to its complex shape and numerous characters. This study addressed the problem of automatically generating Chines…

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted to ICIP 2022

  40. arXiv:2204.04634  [pdf, other

    cs.CV cs.MM

    Intersection Prediction from Single 360° Image via Deep Detection of Possible Direction of Travel

    Authors: Naoki Sugimoto, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: Movie-Map, an interactive first-person-view map that engages the user in a simulated walking experience, comprises short 360° video segments separated by traffic intersections that are seamlessly connected according to the viewer's direction of travel. However, in wide urban-scale areas with numerous intersecting roads, manual intersection segmentation requires significant human effort. Therefore,…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: Accepted for publication in BMVC

  41. arXiv:2204.01027  [pdf, other

    cs.CV

    Distortion-Aware Self-Supervised 360° Depth Estimation from A Single Equirectangular Projection Image

    Authors: Yuya Hasegawa, Ikehata Satoshi, Kiyoharu Aizawa

    Abstract: 360° images have become widely available over the last few years. This paper proposes a new technique for single 360° image depth prediction under open environments. Depth prediction from a single 360° image is not easy for two reasons. One is the limitation of supervision datasets - the currently available dataset is limited to indoor scenes. The other is the problems caused by Equirectangular Projection…

    Submitted 3 April, 2022; originally announced April 2022.

  42. arXiv:2202.03176  [pdf, other

    cs.CV

    Field-of-View IoU for Object Detection in 360° Images

    Authors: Miao Cao, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: 360° cameras have gained popularity over the last few years. In this paper, we propose two fundamental techniques -- Field-of-View IoU (FoV-IoU) and 360Augmentation for object detection in 360° images. Although most object detection neural networks designed for the perspective images are applicable to 360° images in equirectangular projection (ERP) format, their performance deteriorates owing to t…

    Submitted 22 September, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

  43. arXiv:2110.10456  [pdf, other

    cs.CV

    Noisy Annotation Refinement for Object Detection

    Authors: Jiafeng Mao, Qing Yu, Yoko Yamakata, Kiyoharu Aizawa

    Abstract: Supervised training of object detectors requires well-annotated large-scale datasets, whose production is costly. Therefore, some efforts have been made to obtain annotations in economical ways, such as crowdsourcing. However, datasets obtained by these methods tend to contain noisy annotations such as inaccurate bounding boxes and incorrect class labels. In this study, we propose a new problem s…

    Submitted 7 December, 2021; v1 submitted 20 October, 2021; originally announced October 2021.

  44. arXiv:2110.03976  [pdf, other

    physics.app-ph physics.ins-det

    High Spatial Resolution Neutron Transmission Imaging Using a Superconducting Two-Dimensional Detector

    Authors: Hiroaki Shishido, Kazuma Nishimura, The Dang Vu, Kazuya Aizawa, Kenji M. Kojima, Tomio Koyama, Kenichi Oikawa, Masahide Harada, Takayuki Oku, Kazuhiko Soyama, Shigeyuki Miyajima, Mutsuo Hidaka, Soh Y. Suzuki, Manobu M. Tanaka, Shuichi Kawamata, Takekazu Ishida

    Abstract: Neutron imaging is one of the most powerful tools for nondestructive inspection owing to the unique characteristics of neutron beams, such as high permeability for many heavy metals, high sensitivity for certain light elements, and isotope selectivity owing to a specific nuclear reaction between an isotope and neutrons. In this study, we employed a superconducting detector, current-biased kinetic-…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: 7 pages, 6 figures, Accepted in IEEE Transactions on Applied Superconductivity

  45. arXiv:2105.03612  [pdf

    cond-mat.supr-con nucl-ex

    Practical tests of neutron transmission imaging with a superconducting kinetic-inductance sensor

    Authors: The Dang Vu, Hiroaki Shishido, Kazuya Aizawa, Kenji M. Kojima, Tomio Koyama, Kenichi Oikawa, Masahide Harada, Takayuki Oku, Kazuhiko Soyama, Shigeyuki Miyajima, Mutsuo Hidaka, Soh Y. Suzuki, Manobu M. Tanakai, Alex Malins, Masahiko Machida, Shuichi Kawamata, Takekazu Ishida

    Abstract: Samples were examined using a superconducting (Nb) neutron imaging system employing a delay-line technique, which in previous studies was shown to have high spatial resolution. We found excellent correspondence between neutron transmission and scanning electron microscope (SEM) images of Gd islands with sizes between 15 and 130 micrometers, which were thermally sprayed onto a Si substrate. Neutron tr…

    Submitted 8 May, 2021; originally announced May 2021.

    Comments: 19 pages, 6 figures

  46. arXiv:2105.03072  [pdf, other

    eess.IV cs.CV

    NTIRE 2021 Challenge on Perceptual Image Quality Assessment

    Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang , et al. (25 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These o…

    Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  47. arXiv:2103.04685  [pdf, other

    cs.LG

    A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels

    Authors: Daiki Tanaka, Daiki Ikami, Kiyoharu Aizawa

    Abstract: Positive-unlabeled learning refers to the process of training a binary classifier using only positive and unlabeled data. Although unlabeled data can contain positive data, all unlabeled data are regarded as negative data in existing positive-unlabeled learning methods, which results in diminished performance. We provide a new perspective on this problem -- considering unlabeled data as noisy-l…

    Submitted 8 March, 2021; originally announced March 2021.

  48. arXiv:2103.04400  [pdf, other

    cs.CV

    What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels

    Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

    Abstract: Scene text recognition (STR) task has a common practice: All state-of-the-art STR models are trained on large synthetic data. In contrast to this practice, training STR models only on fewer real labels (STR with fewer labels) is important when we have to train STR models without synthetic data: for handwritten or artistic texts that are difficult to generate synthetically and for languages other t…

    Submitted 5 June, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

    Comments: CVPR 2021

  49. Building Movie Map -- A Tool for Exploring Areas in a City -- and its Evaluation

    Authors: Naoki Sugimoto, Yoshihito Ebine, Kiyoharu Aizawa

    Abstract: We propose a new Movie Map system, with an interface for exploring cities. The system consists of four stages: acquisition, analysis, management, and interaction. In the acquisition stage, omnidirectional videos are taken along streets in target areas. Frames of the video are localized on the map, intersections are detected, and videos are segmented. Turning views at intersections are subsequently…

    Submitted 17 November, 2020; originally announced November 2020.

    Journal ref: ACM Multimedia 2020

  50. arXiv:2011.02206  [pdf, other

    cs.CV

    Few-Shot Font Generation with Deep Metric Learning

    Authors: Haruka Aoki, Koki Tsubota, Hikaru Ikuta, Kiyoharu Aizawa

    Abstract: Designing fonts for languages with a large number of characters, such as Japanese and Chinese, is an extremely labor-intensive and time-consuming task. In this study, we addressed the problem of automatically generating Japanese typographic fonts from only a few font samples, where the synthesized glyphs are expected to have coherent characteristics, such as skeletons, contours, and serifs. Existi…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted to ICPR 2020