Showing 1–50 of 148 results for author: Tanaka, K

Searching in archive cs.
  1. arXiv:2506.22301  [pdf, ps, other]

    cs.LG

    Weakly-Supervised Domain Adaptation with Proportion-Constrained Pseudo-Labeling

    Authors: Takumi Okuo, Shinnosuke Matsuo, Shota Harada, Kiyohito Tanaka, Ryoma Bise

    Abstract: Domain shift is a significant challenge in machine learning, particularly in medical applications where data distributions differ across institutions due to variations in data collection practices, equipment, and procedures. This can degrade performance when models trained on source domain data are applied to the target domain. Domain adaptation methods have been widely studied to address this iss…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted at IJCNN2025

  2. arXiv:2506.19335  [pdf, ps, other]

    cs.SD

    Learning to assess subjective impressions from speech

    Authors: Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko, Noboru Harada

    Abstract: We tackle a new task of training neural network models that can assess subjective impressions conveyed through speech and assign scores accordingly, inspired by the work on automatic speech quality assessment (SQA). Speech impressions are often described using phrases like "cute voice." We define such phrases as subjective voice descriptors (SVDs). Focusing on the difference in usage scenarios bet…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Accepted at EUSIPCO 2024

  3. arXiv:2506.18326  [pdf, ps, other]

    cs.SD eess.AS

    Selecting N-lowest scores for training MOS prediction models

    Authors: Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko

    Abstract: Automatic speech quality assessment (SQA) has been extensively studied as a way to predict speech quality without time-consuming questionnaires. Recently, neural-based SQA models have been actively developed for speech samples produced by text-to-speech or voice conversion, with a primary focus on training mean opinion score (MOS) prediction models. The quality of each speech sample may not be cons…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted at ICASSP 2024

  4. arXiv:2506.18307  [pdf, ps, other]

    cs.SD eess.AS

    Rethinking Mean Opinion Scores in Speech Quality Assessment: Aggregation through Quantized Distribution Fitting

    Authors: Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko

    Abstract: Speech quality assessment (SQA) aims to evaluate the quality of speech samples without relying on time-consuming listener questionnaires. Recent efforts have focused on training neural-based SQA models to predict the mean opinion score (MOS) of speech samples produced by text-to-speech or voice conversion systems. This paper aims to improve the performance of MOS prediction models. We propose…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted at ICASSP 2025

  5. arXiv:2506.18296  [pdf, ps, other]

    cs.SD eess.AS

    JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles

    Authors: Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko

    Abstract: We construct the Japanese Idol Speech Corpus (JIS) to advance research in speech generation AI, including text-to-speech synthesis (TTS) and voice conversion (VC). JIS will facilitate more rigorous evaluations of speaker similarity in TTS and VC systems since all speakers in JIS belong to a highly specific category: "young female live idols" in Japan, and each speaker is identified by a stage name, en…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025

  6. arXiv:2505.12752  [pdf, ps, other]

    cs.RO

    MOON: Multi-Objective Optimization-Driven Object-Goal Navigation Using a Variable-Horizon Set-Orienteering Planner

    Authors: Daigo Nakajima, Kanji Tanaka, Daiki Iwata, Kouki Terashima

    Abstract: Object-goal navigation (ON) enables autonomous robots to locate and reach user-specified objects in previously unknown environments, offering promising applications in domains such as assistive care and disaster response. Existing ON methods -- including training-free approaches, reinforcement learning, and zero-shot planners -- generally depend on active exploration to identify landmark objects (…

    Submitted 26 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: 9 pages, 7 figures, technical report

  7. arXiv:2505.04162  [pdf, other]

    cs.RO

    SCU-Hand: Soft Conical Universal Robotic Hand for Scooping Granular Media from Containers of Various Sizes

    Authors: Tomoya Takahashi, Cristian C. Beltran-Hernandez, Yuki Kuroda, Kazutoshi Tanaka, Masashi Hamaya, Yoshitaka Ushiku

    Abstract: Automating small-scale experiments in materials science presents challenges due to the heterogeneous nature of experimental setups. This study introduces the SCU-Hand (Soft Conical Universal Robot Hand), a novel end-effector designed to automate the task of scooping powdered samples from various container sizes using a robotic arm. The SCU-Hand employs a flexible, conical structure that adapts to…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 2025 IEEE International Conference on Robotics and Automation (ICRA2025). Preprint. Accepted January 2025

  8. arXiv:2505.01693  [pdf, ps, other]

    cs.CL

    High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers

    Authors: Brian Wong, Kaito Tanaka

    Abstract: Automated labeling of chest X-ray reports is essential for enabling downstream tasks such as training image-based diagnostic models, population health studies, and clinical decision support. However, the high variability, complexity, and prevalence of negation and uncertainty in these free-text reports pose significant challenges for traditional Natural Language Processing methods. While large lan…

    Submitted 3 May, 2025; originally announced May 2025.

  9. arXiv:2504.04428  [pdf, other]

    cs.SD cs.AI

    Formula-Supervised Sound Event Detection: Pre-Training Without Real Data

    Authors: Yuto Shibata, Keitaro Tanaka, Yoshiaki Bando, Keisuke Imoto, Hirokatsu Kataoka, Yoshimitsu Aoki

    Abstract: In this paper, we propose a novel formula-driven supervised learning (FDSL) framework for pre-training an environmental sound analysis model by leveraging acoustic signals parametrically synthesized through formula-driven methods. Specifically, we outline detailed procedures and evaluate their effectiveness for sound event detection (SED). The SED task, which involves estimating the types and timi…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: Accepted by ICASSP 2025

  10. arXiv:2503.20241  [pdf, ps, other]

    cs.RO cs.AI

    LGR: LLM-Guided Ranking of Frontiers for Object Goal Navigation

    Authors: Mitsuaki Uno, Kanji Tanaka, Daiki Iwata, Yudai Noda, Shoya Miyazaki, Kouki Terashima

    Abstract: Object Goal Navigation (OGN) is a fundamental task for robots and AI, with key applications such as mobile robot image databases (MRID). In particular, mapless OGN is essential in scenarios involving unknown or dynamic environments. This study aims to enhance recent modular mapless OGN systems by leveraging the commonsense reasoning capabilities of large language models (LLMs). Specifically, we ad…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: 10 pages, 11 figures, technical report

  11. arXiv:2503.12768  [pdf, ps, other]

    cs.RO cs.CV

    Dynamic-Dark SLAM: RGB-Thermal Cooperative Robot Vision Strategy for Multi-Person Tracking in Both Well-Lit and Low-Light Scenes

    Authors: Tatsuro Sakai, Kanji Tanaka, Jonathan Tay Yu Liang, Muhammad Adil Luqman, Daiki Iwata

    Abstract: In robot vision, thermal cameras hold great potential for recognizing humans even in complete darkness. However, their application to multi-person tracking (MPT) has been limited due to data scarcity and the inherent difficulty of distinguishing individuals. In this study, we propose a cooperative MPT system that utilizes co-located RGB and thermal cameras, where pseudo-annotations (bounding boxes…

    Submitted 13 April, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: 10 pages, 9 figures, technical report

  12. arXiv:2503.06138  [pdf, other]

    cs.AI cs.RO q-bio.NC

    System 0/1/2/3: Quad-process theory for multi-timescale embodied collective cognitive systems

    Authors: Tadahiro Taniguchi, Yasushi Hirai, Masahiro Suzuki, Shingo Murata, Takato Horii, Kazutoshi Tanaka

    Abstract: This paper introduces the System 0/1/2/3 framework as an extension of dual-process theory, employing a quad-process model of cognition. Expanding upon System 1 (fast, intuitive thinking) and System 2 (slow, deliberative thinking), we incorporate System 0, which represents pre-cognitive embodied processes, and System 3, which encompasses collective intelligence and symbol emergence. We contextualiz…

    Submitted 13 March, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: Under review

  13. arXiv:2503.02256  [pdf, ps, other]

    cs.RO

    Continual Multi-Robot Learning from Black-Box Visual Place Recognition Models

    Authors: Kenta Tsukahara, Kanji Tanaka, Daiki Iwata, Jonathan Tay Yu Liang

    Abstract: In the context of visual place recognition (VPR), continual learning (CL) techniques offer significant potential for avoiding catastrophic forgetting when learning new places. However, existing CL methods often focus on knowledge transfer from a known model to a new one, overlooking the existence of unknown black-box models. We explore a novel multi-robot CL approach that enables knowledge transfe…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 6 pages, 4 figures, technical report

  14. arXiv:2502.09899  [pdf, other]

    cs.HC

    Transtiff: A Stylus-shaped Interface for Rendering Perceived Stiffness of Virtual Objects via Stylus Stiffness Control

    Authors: Ryoya Komatsu, Ayumu Ogura, Shigeo Yoshida, Kazutoshi Tanaka, Yuichi Itoh

    Abstract: The replication of object stiffness is essential for enhancing haptic feedback in virtual environments. However, existing research has overlooked how stylus stiffness influences the perception of virtual object stiffness during tool-mediated interactions. To address this, we conducted a psychophysical experiment demonstrating that changing stylus stiffness combined with visual stimuli altered user…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: 11 pages, 16 figures

  15. arXiv:2501.02114  [pdf, other]

    quant-ph cond-mat.stat-mech cs.AI cs.LG

    Relaxation-assisted reverse annealing on nonnegative/binary matrix factorization

    Authors: Renichiro Haba, Masayuki Ohzeki, Kazuyuki Tanaka

    Abstract: Quantum annealing has garnered significant attention as a metaheuristic inspired by quantum physics for combinatorial optimization problems. Among its many applications, nonnegative/binary matrix factorization stands out for its complexity and relevance in unsupervised machine learning. The use of reverse annealing, a derivative procedure of quantum annealing to prioritize the search in a vicinity…

    Submitted 3 January, 2025; originally announced January 2025.

  16. arXiv:2412.17282  [pdf, ps, other]

    cs.RO

    LMD-PGN: Cross-Modal Knowledge Distillation from First-Person-View Images to Third-Person-View BEV Maps for Universal Point Goal Navigation

    Authors: Riku Uemura, Kanji Tanaka, Kenta Tsukahara, Daiki Iwata

    Abstract: Point goal navigation (PGN) is a mapless navigation approach that trains robots to visually navigate to goal points without relying on pre-built maps. Despite significant progress in handling complex environments using deep reinforcement learning, current PGN methods are designed for single-robot systems, limiting their generalizability to multi-robot scenarios with diverse platforms. This paper a…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: Draft version of a conference paper: 5 pages with 2 figures

  17. arXiv:2412.11523  [pdf, ps, other]

    cs.RO

    ON as ALC: Active Loop Closing Object Goal Navigation

    Authors: Daiki Iwata, Kanji Tanaka, Shoya Miyazaki, Kouki Terashima

    Abstract: In simultaneous localization and mapping, active loop closing (ALC) is an active vision problem that aims to visually guide a robot to maximize the chances of revisiting previously visited points, thereby resetting the drift errors accumulated in the incrementally built map during travel. However, current mainstream navigation strategies that leverage such incomplete maps as workspace prior knowle…

    Submitted 14 May, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

    Comments: Draft version of a conference paper with 7 pages, 5 figures, and 1 table

  18. arXiv:2412.10758  [pdf, ps, other]

    cs.CV

    Optimizing Vision-Language Interactions Through Decoder-Only Models

    Authors: Kaito Tanaka, Benjamin Tan, Brian Wong

    Abstract: Vision-Language Models (VLMs) have emerged as key enablers for multimodal tasks, but their reliance on separate visual encoders introduces challenges in efficiency, scalability, and modality alignment. To address these limitations, we propose MUDAIF (Multimodal Unified Decoder with Adaptive Input Fusion), a decoder-only vision-language model that seamlessly integrates visual and textual inputs thr…

    Submitted 14 December, 2024; originally announced December 2024.

  19. arXiv:2412.08343  [pdf, other]

    cs.GR cs.SD eess.AS

    SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering

    Authors: Hiroki Nishizawa, Keitaro Tanaka, Asuka Hirata, Shugo Yamaguchi, Qi Feng, Masatoshi Hamanaka, Shigeo Morishima

    Abstract: Automatically generating realistic musical performance motion can greatly enhance digital media production, often involving collaboration between professionals and musicians. However, capturing the intricate body, hand, and finger movements required for accurate musical performances is challenging. Existing methods often fall short due to the complex mapping between audio and motion, typically req…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: 10 pages, 7 figures, 6 tables, WACV 2025

  20. arXiv:2411.14750  [pdf, other]

    cs.CV cs.LG

    Ordinal Multiple-instance Learning for Ulcerative Colitis Severity Estimation with Selective Aggregated Transformer

    Authors: Kaito Shiku, Kazuya Nishimura, Daiki Suehiro, Kiyohito Tanaka, Ryoma Bise

    Abstract: Patient-level diagnosis of severity in ulcerative colitis (UC) is common in real clinical settings, where the most severe score in a patient is recorded. However, previous UC classification methods (i.e., image-level estimation) mainly assumed the input was a single image. Thus, these methods cannot utilize severity labels recorded in real clinical settings. In this paper, we propose a patient-le…

    Submitted 22 November, 2024; originally announced November 2024.

    Comments: 10 pages, 9 figures, Accepted in WACV 2025

  21. arXiv:2411.13153  [pdf, other]

    cs.LG

    Long-term Detection System for Six Kinds of Abnormal Behavior of the Elderly Living Alone

    Authors: Kai Tanaka, Mineichi Kudo, Keigo Kimura, Atsuyoshi Nakamura

    Abstract: The proportion of elderly people is increasing worldwide, particularly those living alone in Japan. As elderly people age, their risk of physical disability and health problems increases. For discovering these issues automatically and at low cost in daily life, sensor-based detection in a smart home is a promising approach. As part of the effort towards early detection of abnormal behaviors, we propose a sim…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: 20 pages, 3 figures

  22. arXiv:2410.21885  [pdf, other]

    cs.CV

    Self-Relaxed Joint Training: Sample Selection for Severity Estimation with Ordinal Noisy Labels

    Authors: Shumpei Takezaki, Kiyohito Tanaka, Seiichi Uchida

    Abstract: Severity level estimation is a crucial task in medical image diagnosis. However, accurately assigning severity class labels to individual images is very costly and challenging. Consequently, the attached labels tend to be noisy. In this paper, we propose a new framework for training with "ordinal" noisy labels. Since severity levels have an ordinal relationship, we can leverage this to train a c…

    Submitted 29 October, 2024; originally announced October 2024.

    Comments: Accepted at WACV2025

  23. CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization

    Authors: Shigemichi Matsuzaki, Kazuhito Tanaka, Kazuhiro Shintani

    Abstract: This letter proposes a method of global localization on a map with semantic object landmarks. One of the most promising approaches for localization on object maps is to use semantic graph matching using landmark descriptors calculated from the distribution of surrounding objects. These descriptors are vulnerable to misclassification and partial observations. Moreover, many existing methods rely on…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: IEEE Robotics and Automation Letters

  24. arXiv:2409.20116  [pdf, other]

    cs.CV

    REST-HANDS: Rehabilitation with Egocentric Vision Using Smartglasses for Treatment of Hands after Surviving Stroke

    Authors: Wiktor Mucha, Kentaro Tanaka, Martin Kampel

    Abstract: Stroke is the third leading cause of death and disability worldwide and is recognised as a significant global health problem. A major challenge for stroke survivors is persistent hand dysfunction, which severely affects the ability to perform daily activities and the overall quality of life. In order to regain their functional hand ability, stroke survivors need rehabilitation therapy. However, t…

    Submitted 30 September, 2024; originally announced September 2024.

    Comments: Accepted at ACVR ECCV 2024

  25. arXiv:2409.14899  [pdf, ps, other]

    cs.RO cs.CV cs.LG

    CON: Continual Object Navigation via Data-Free Inter-Agent Knowledge Transfer in Unseen and Unfamiliar Places

    Authors: Kouki Terashima, Daiki Iwata, Kanji Tanaka

    Abstract: This work explores the potential of brief inter-agent knowledge transfer (KT) to enhance robotic object goal navigation (ON) in unseen and unfamiliar environments. Drawing on the analogy of human travelers acquiring local knowledge, we propose a framework in which a traveler robot (student) communicates with local robots (teachers) to obtain ON knowledge through minimal interactions. We frame…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: 6 pages, 3 figures, workshop paper's draft version

  26. arXiv:2409.09276  [pdf, other]

    cs.RO

    Visuo-Tactile Zero-Shot Object Recognition with Vision-Language Model

    Authors: Shiori Ueda, Atsushi Hashimoto, Masashi Hamaya, Kazutoshi Tanaka, Hideo Saito

    Abstract: Tactile perception is vital, especially when distinguishing visually similar objects. We propose an approach to incorporate tactile data into a Vision-Language Model (VLM) for visuo-tactile zero-shot object recognition. Our approach leverages the zero-shot capability of VLMs to infer tactile properties from the names of tactilely similar objects. The proposed method translates tactile data into a…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: 9 pages, 9 figures, accepted to IROS2024, project page: https://omron-sinicx.github.io/visuo-tactile-recognition/

  27. arXiv:2409.04952  [pdf, other]

    cs.CV

    Deep Bayesian Active Learning-to-Rank with Relative Annotation for Estimation of Ulcerative Colitis Severity

    Authors: Takeaki Kadota, Hideaki Hayashi, Ryoma Bise, Kiyohito Tanaka, Seiichi Uchida

    Abstract: Automatic image-based severity estimation is an important task in computer-aided diagnosis. Severity estimation by deep learning requires a large amount of training data to achieve high performance. In general, severity estimation uses training data annotated with discrete (i.e., quantized) severity labels. Annotating discrete labels is often difficult in images with ambiguous severity, and the…

    Submitted 9 September, 2024; v1 submitted 7 September, 2024; originally announced September 2024.

    Comments: 14 pages, 8 figures, accepted in Medical Image Analysis 2024

    Journal ref: Medical Image Analysis 2024

  28. arXiv:2409.02245  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS stat.ML

    FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo

    Abstract: Diffusion-based voice conversion (VC) techniques such as VoiceGrad have attracted interest because of their high VC performance in terms of speech quality and speaker similarity. However, a notable limitation is the slow inference caused by the multi-step reverse diffusion. Therefore, we propose FastVoiceGrad, a novel one-step diffusion-based VC that reduces the number of iterations from dozens to…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted to Interspeech 2024. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/fastvoicegrad/

  29. arXiv:2408.11202  [pdf, other]

    stat.ML cs.AI cs.LG

    Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits

    Authors: Tatsuhiro Shimizu, Koichi Tanaka, Ren Kishimoto, Haruka Kiyohara, Masahiro Nomura, Yuta Saito

    Abstract: We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the action space. For example, it might choose a set of furniture pieces (a bed and a drawer) from available items (bed, drawer, chair, etc.) for interior design sales. This setting is widespread in fields such as recommender systems and healthcare, yet OPE/L of CCB r…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted at RecSys2024

  30. arXiv:2408.06874  [pdf, other]

    cs.CL

    Leveraging Language Models for Emotion and Behavior Analysis in Education

    Authors: Kaito Tanaka, Benjamin Tan, Brian Wong

    Abstract: The analysis of students' emotions and behaviors is crucial for enhancing learning outcomes and personalizing educational experiences. Traditional methods often rely on intrusive visual and physiological data collection, posing privacy concerns and scalability issues. This paper proposes a novel method leveraging large language models (LLMs) and prompt engineering to analyze textual data from stud…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 8 pages

  31. arXiv:2408.04293  [pdf, other]

    cs.CL cs.CY

    Are Social Sentiments Inherent in LLMs? An Empirical Study on Extraction of Inter-demographic Sentiments

    Authors: Kunitomo Tanaka, Ryohei Sasano, Koichi Takeda

    Abstract: Large language models (LLMs) are supposed to acquire unconscious human knowledge and feelings, such as social common sense and biases, through training on large amounts of text. However, it is not clear how much the sentiments of specific social groups can be captured in various LLMs. In this study, we focus on social groups defined in terms of nationality, religion, and race/ethnicity, and va…

    Submitted 8 August, 2024; originally announced August 2024.

  32. arXiv:2406.16535  [pdf, other]

    cs.CL cs.AI cs.LG

    Token-based Decision Criteria Are Suboptimal in In-context Learning

    Authors: Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue

    Abstract: In-Context Learning (ICL) typically utilizes classification criteria from output probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, despite delicate calibrations applied through translation and constrained rotation. To address this problem, we propose Hidden Calibration, which renounces token prob…

    Submitted 5 February, 2025; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 24 pages, 15 figures, 13 tables. NAACL 2025 Main Conference Accepted. Camera-ready version

  33. arXiv:2406.11266  [pdf, ps, other]

    cs.CV

    DRIP: Discriminative Rotation-Invariant Pole Landmark Descriptor for 3D LiDAR Localization

    Authors: Dingrui Li, Dedi Guo, Kanji Tanaka

    Abstract: In 3D LiDAR-based robot self-localization, pole-like landmarks are gaining popularity as lightweight and discriminative landmarks. This work introduces a novel approach called "discriminative rotation-invariant poles," which enhances the discriminability of pole-like landmarks while maintaining their lightweight nature. Unlike conventional methods that model a pole landmark as a 3D line segment pe…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 4 pages, 1 table

  34. arXiv:2406.01468  [pdf, other]

    cs.CL cs.AI cs.LG

    Understanding Token Probability Encoding in Output Embeddings

    Authors: Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue

    Abstract: In this paper, we investigate the output token probability information in the output embedding of language models. We find an approximate common log-linear encoding of output token probabilities within the output embedding vectors and empirically demonstrate that it is accurate and sparse. As a causality examination, we steer the encoding in output embedding to modify the output probability distri…

    Submitted 11 December, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 15 pages, 17 figures, 3 tables. COLING 2025 Accepted

  35. arXiv:2405.06185  [pdf, other]

    cs.CV

    Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection

    Authors: Koji Takeda, Kanji Tanaka, Yoshimasa Nakamura, Asako Kanezaki

    Abstract: In everyday indoor navigation, robots often need to detect non-distinctive small-change objects (e.g., stationery, lost items, and junk) to maintain domain knowledge. This is most relevant to ground-view change detection (GVCD), a recently emerging research area in the field of computer vision. However, these existing techniques rely on high-quality class-specific object priors to regularize a c…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 7 pages, 7 figures

  36. arXiv:2404.11727  [pdf]

    cs.CV

    Deep Learning for Video-Based Assessment of Endotracheal Intubation Skills

    Authors: Jean-Paul Ainam, Erim Yanik, Rahul Rahul, Taylor Kunkes, Lora Cavuoto, Brian Clemency, Kaori Tanaka, Matthew Hackett, Jack Norfleet, Suvranu De

    Abstract: Endotracheal intubation (ETI) is an emergency procedure performed in civilian and combat casualty care settings to establish an airway. Objective and automated assessment of ETI skills is essential for the training and certification of healthcare providers. However, the current approach is based on manual feedback by an expert, which is subjective, time- and resource-intensive, and is prone to poo…

    Submitted 17 April, 2024; originally announced April 2024.

  37. 1-out-of-n Oblivious Signatures: Security Revisited and a Generic Construction with an Efficient Communication Cost

    Authors: Masayuki Tezuka, Keisuke Tanaka

    Abstract: The 1-out-of-n oblivious signature by Chen (ESORICS 1994) is a protocol between a user and a signer. In this scheme, the user makes a list of n messages and chooses from it the message for which the user wants to obtain a signature. The user interacts with the signer by providing this message list and obtains the signature for only the chosen message without letting the signer identify which messa…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: ICISC 2023

  38. arXiv:2403.16464  [pdf, other]

    cs.SD cs.LG eess.AS

    Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka

    Abstract: A generative adversarial network (GAN)-based vocoder trained with an adversarial discriminator is commonly used for speech synthesis because of its fast, lightweight, and high-quality characteristics. However, this data-driven model requires a large amount of training data, incurring high data-collection costs. This fact motivates us to train a GAN-based vocoder on limited data. A promising solutio…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to ICASSP 2024. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/augcondd/

  39. ProgrammableGrass: A Shape-Changing Artificial Grass Display Adapted for Dynamic and Interactive Display Features

    Authors: Kojiro Tanaka, Akito Mizuno, Toranosuke Kato, Masahiko Mikawa, Makoto Fujisawa

    Abstract: There are various proposals for employing grass materials as a green landscape-friendly display. However, it is difficult for current techniques to display smooth animations using 8-bit images and to adjust display resolution, similar to conventional displays. We present ProgrammableGrass, an artificial grass display with scalable resolution, capable of swiftly controlling grass color at 8-bit lev…

    Submitted 18 March, 2024; originally announced March 2024.

  40. arXiv:2403.10552  [pdf, ps, other]

    cs.LG cs.AI cs.CV cs.RO

    Training Self-localization Models for Unseen Unfamiliar Places via Teacher-to-Student Data-Free Knowledge Transfer

    Authors: Kenta Tsukahara, Kanji Tanaka, Daiki Iwata

    Abstract: A typical assumption in state-of-the-art self-localization models is that an annotated training dataset is available in the target workspace. However, this does not always hold when a robot travels in a general open world. This study introduces a novel training scheme for open-world distributed robot systems. In our scheme, a robot ("student") can ask the other robots it meets at unfamiliar places…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 7 pages, 3 figures, technical report

  41. arXiv:2402.15830  [pdf, other]

    cs.HC cs.ET cs.RO

    Swarm Body: Embodied Swarm Robots

    Authors: Sosuke Ichihashi, So Kuroki, Mai Nishimura, Kazumi Kasaura, Takefumi Hiraki, Kazutoshi Tanaka, Shigeo Yoshida

    Abstract: The human brain's plasticity allows for the integration of artificial body parts into the human body. Leveraging this, embodied systems realize intuitive interactions with the environment. We introduce a novel concept: embodied swarm robots. Swarm robots constitute a collective of robots working in harmony to achieve a common objective, in our case, serving as functional body parts. Embodied swarm…

    Submitted 29 February, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  42. arXiv:2402.06092  [pdf, other]

    cs.CV cs.RO

    CLIP-Loc: Multi-modal Landmark Association for Global Localization in Object-based Maps

    Authors: Shigemichi Matsuzaki, Takuma Sugino, Kazuhito Tanaka, Zijun Sha, Shintaro Nakaoka, Shintaro Yoshizawa, Kazuhiro Shintani

    Abstract: This paper describes a multi-modal data association method for global localization using object-based maps and camera images. In global localization, or relocalization, using object-based maps, existing methods typically resort to matching all possible combinations of detected objects and landmarks with the same object category, followed by inlier extraction using RANSAC or brute-force search. Thi…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 7 pages, 7 figures. Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2024

  43. arXiv:2401.10005  [pdf, other]

    cs.CV cs.CL

    Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation

    Authors: Kohei Uehara, Nabarun Goswami, Hanqin Wang, Toshiaki Baba, Kohtaro Tanaka, Tomohiro Hashimoto, Kai Wang, Rei Ito, Takagi Naoya, Ryo Umagami, Yingyi Wen, Tanachai Anakewat, Tatsuya Harada

    Abstract: The increasing demand for intelligent systems capable of interpreting and reasoning about visual content requires the development of large Vision-and-Language Models (VLMs) that are not only accurate but also have explicit reasoning capabilities. This paper presents a novel approach to develop a VLM with the ability to conduct explicit reasoning based on visual content and textual instructions. We…

    Submitted 17 July, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  44. arXiv:2401.09014  [pdf, other]

    cs.HC cs.MA

    Data assimilation approach for addressing imperfections in people flow measurement techniques using particle filter

    Authors: Ryo Murata, Kenji Tanaka

    Abstract: Understanding and predicting people flow in urban areas is useful for decision-making in urban planning and marketing strategies. Traditional methods for understanding people flow can be divided into measurement-based approaches and simulation-based approaches. Measurement-based approaches have the advantage of directly capturing actual people flow, but they face the challenge of data imperfection…

    Submitted 17 January, 2024; originally announced January 2024.

  45. arXiv:2401.08242  [pdf, other]

    cs.CG math.NA

    Polygonal Sequence-driven Triangulation Validator: An Incremental Approach to 2D Triangulation Verification

    Authors: Sora Sawai, Kazuaki Tanaka, Katsuhisa Ozaki, Shin'ichi Oishi

    Abstract: Two-dimensional Delaunay triangulation is a fundamental aspect of computational geometry. This paper presents a novel algorithm that is specifically designed to ensure the correctness of 2D Delaunay triangulation, namely the Polygonal Sequence-driven Triangulation Validator (PSTV). Our research highlights the paramount importance of proper triangulation and the often overlooked, yet profound, impa…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 27 pages, 18 figures

    MSC Class: 65D18; 68U05; 65N30; 65G50

  46. arXiv:2312.16852  [pdf, other]

    cs.LG cs.HC eess.SP

    Sensor Data Simulation for Anomaly Detection of the Elderly Living Alone

    Authors: Kai Tanaka, Mineichi Kudo, Keigo Kimura

    Abstract: With the increase of the number of elderly people living alone around the world, there is a growing demand for sensor-based detection of anomalous behaviors. Although smart homes with ambient sensors could be useful for detecting such anomalies, there is a problem of lack of sufficient real data for developing detection algorithms. For coping with this problem, several sensor data simulators have…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 26 pages, 10 figures

    Journal ref: IEEE Internet of Things Journal, 11-19 (2024), pp. 31675-31686

  47. arXiv:2312.15897  [pdf, ps, other]

    cs.RO cs.CV cs.LG

    Recursive Distillation for Open-Set Distributed Robot Localization

    Authors: Kenta Tsukahara, Kanji Tanaka

    Abstract: A typical assumption in state-of-the-art self-localization models is that an annotated training dataset is available for the target workspace. However, this is not necessarily true when a robot travels around the general open world. This work introduces a novel training scheme for open-world distributed robot systems. In our scheme, a robot ("student") can ask the other robots it meets at unfamil…

    Submitted 26 September, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: 5 pages, 4 figures, technical report

  48. arXiv:2311.00967  [pdf, other]

    cs.RO cs.AI cs.CL

    Vision-Language Interpreter for Robot Task Planning

    Authors: Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, Shinsuke Mori

    Abstract: Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file used by the planners to find a plan. By gener…

    Submitted 19 February, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: ICRA 2024

  49. arXiv:2310.15504  [pdf, ps, other]

    cs.CV cs.RO

    Cross-view Self-localization from Synthesized Scene-graphs

    Authors: Ryogo Yamamoto, Kanji Tanaka

    Abstract: Cross-view self-localization is a challenging scenario of visual place recognition in which database images are provided from sparse viewpoints. Recently, an approach for synthesizing database images from unseen viewpoints using NeRF (Neural Radiance Fields) technology has emerged with impressive performance. However, synthesized images provided by these techniques are often of lower quality than…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 5 pages, 5 figures, technical report

  50. Multimodal Active Measurement for Human Mesh Recovery in Close Proximity

    Authors: Takahiro Maeda, Keisuke Takeshita, Norimichi Ukita, Kazuhito Tanaka

    Abstract: For physical human-robot interactions (pHRI), a robot needs to estimate the accurate body pose of a target person. However, in these pHRI scenarios, the robot cannot fully observe the target person's body with equipped cameras because the target person must be close to the robot for physical interaction. This close distance leads to severe truncation and occlusions and thus results in poor accurac…

    Submitted 8 October, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted at Robotics and Automation Letters (RA-L) in Sep 2024