Showing 1–13 of 13 results for author: Shirai, K

Searching in archive cs.
  1. arXiv:2506.03270

    cs.RO cs.AI

    Grounded Vision-Language Interpreter for Integrated Task and Motion Planning

    Authors: Jeremy Siburian, Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Michael Görner, Atsushi Hashimoto

    Abstract: While recent advances in vision-language models (VLMs) have accelerated the development of language-guided robot planners, their black-box nature often lacks safety guarantees and interpretability crucial for real-world deployment. Conversely, classical symbolic planners offer rigorous safety verification but require significant expert knowledge for setup. To bridge the current gap, this paper pro…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Project website: https://omron-sinicx.github.io/ViLaIn-TAMP/

  2. arXiv:2504.10011

    cs.RO

    KeyMPs: One-Shot Vision-Language Guided Motion Generation by Sequencing DMPs for Occlusion-Rich Tasks

    Authors: Edgar Anarossi, Yuhwan Kwon, Hirotaka Tahara, Shohei Tanaka, Keisuke Shirai, Masashi Hamaya, Cristian C. Beltran-Hernandez, Atsushi Hashimoto, Takamitsu Matsubara

    Abstract: Dynamic Movement Primitives (DMPs) provide a flexible framework wherein smooth robotic motions are encoded into modular parameters. However, they face challenges in integrating multimodal inputs commonly used in robotics like vision and language into their framework. To fully maximize DMPs' potential, enabling them to handle multimodal inputs is essential. In addition, we also aim to extend DMPs'…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 17 pages, Submitted to IEEE Access April 9th 2025

  3. arXiv:2410.05343

    cs.CV cs.AI cs.CL

    EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos Referring to Procedural Texts

    Authors: Yuto Haneji, Taichi Nishimura, Hirotaka Kameko, Keisuke Shirai, Tomoya Yoshida, Keiya Kajimura, Koki Yamamoto, Taiyu Cui, Tomohiro Nishimoto, Shinsuke Mori

    Abstract: Mistake action detection is crucial for developing intelligent archives that detect workers' errors and provide feedback. Existing studies have focused on visually apparent mistakes in free-style activities, resulting in video-only approaches to mistake detection. However, in text-following activities, models cannot determine the correctness of some actions without referring to the texts. Addition…

    Submitted 11 February, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Main 6 pages, supplementary 13 pages

  4. arXiv:2404.03161

    cs.CV cs.CL cs.MM

    BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes

    Authors: Tomohiro Nishimoto, Taichi Nishimura, Koki Yamamoto, Keisuke Shirai, Hirotaka Kameko, Yuto Haneji, Tomoya Yoshida, Keiya Kajimura, Taiyu Cui, Chihiro Nishiwaki, Eriko Daikoku, Natsuko Okuda, Fumihito Ono, Shinsuke Mori

    Abstract: This paper introduces BioVL-QR, a biochemical vision-and-language dataset comprising 23 egocentric experiment videos, corresponding protocols, and vision-and-language alignments. A major challenge in understanding biochemical videos is detecting equipment, reagents, and containers because of the cluttered environment and indistinguishable objects. Previous studies assumed manual object annotation,…

    Submitted 29 May, 2025; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: ICIP2025

  5. arXiv:2403.16483

    cs.CL

    Automatic Construction of a Large-Scale Corpus for Geoparsing Using Wikipedia Hyperlinks

    Authors: Keyaki Ohno, Hirotaka Kameko, Keisuke Shirai, Taichi Nishimura, Shinsuke Mori

    Abstract: Geoparsing is the task of estimating the latitude and longitude (coordinates) of location expressions in texts. Geoparsing must deal with the ambiguity of the expressions that indicate multiple locations with the same notation. For evaluating geoparsing systems, several corpora have been proposed in previous work. However, these corpora are small-scale and suffer from the coverage of location expr…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024

  6. arXiv:2312.09718

    cs.CL

    Discovering Highly Influential Shortcut Reasoning: An Automated Template-Free Approach

    Authors: Daichi Haraguchi, Kiyoaki Shirai, Naoya Inoue, Natthawut Kertkeidkachorn

    Abstract: Shortcut reasoning is an irrational process of inference, which degrades the robustness of an NLP model. While a number of previous work has tackled the identification of shortcut reasoning, there are still two major limitations: (i) a method for quantifying the severity of the discovered shortcut reasoning is not provided; (ii) certain types of shortcut reasoning may be missed. To address these i…

    Submitted 15 December, 2023; originally announced December 2023.

  7. arXiv:2311.00967

    cs.RO cs.AI cs.CL

    Vision-Language Interpreter for Robot Task Planning

    Authors: Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, Shinsuke Mori

    Abstract: Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file used by the planners to find a plan. By gener…

    Submitted 19 February, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: ICRA 2024

  8. arXiv:2305.19497

    cs.CL

    Towards Flow Graph Prediction of Open-Domain Procedural Texts

    Authors: Keisuke Shirai, Hirotaka Kameko, Shinsuke Mori

    Abstract: Machine comprehension of procedural texts is essential for reasoning about the steps and automating the procedures. However, this requires identifying entities within a text and resolving the relationships between the entities. Previous work focused on the cooking domain and proposed a framework to convert a recipe text into a flow graph (FG) representation. In this work, we propose a framework ba…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: RepL4NLP 2023

  9. arXiv:2209.05840

    cs.CL cs.AI

    Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows

    Authors: Keisuke Shirai, Atsushi Hashimoto, Taichi Nishimura, Hirotaka Kameko, Shuhei Kurita, Yoshitaka Ushiku, Shinsuke Mori

    Abstract: We present a new multimodal dataset called Visual Recipe Flow, which enables us to learn each cooking action result in a recipe text. The dataset consists of object state changes and the workflow of the recipe text. The state change is represented as an image pair, while the workflow is represented as a recipe flow graph (r-FG). The image pairs are grounded in the r-FG, which provides the cross-mo…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: COLING 2022

  10. arXiv:2105.09034

    cs.GR cs.CV

    Guided Facial Skin Color Correction

    Authors: Keiichiro Shirai, Tatsuya Baba, Shunsuke Ono, Masahiro Okuda, Yusuke Tatesumi, Paul Perrotin

    Abstract: This paper proposes an automatic image correction method for portrait photographs, which promotes consistency of facial skin color by suppressing skin color changes due to background colors. In portrait photographs, skin color is often distorted due to the lighting environment (e.g., light reflected from a colored background wall and over-exposure by a camera strobe), and if the photo is artificia…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: 12 pages, 16 figures

  11. arXiv:2012.14124

    cs.CL cs.AI

    Neural Text Generation with Artificial Negative Examples

    Authors: Keisuke Shirai, Kazuma Hashimoto, Akiko Eriguchi, Takashi Ninomiya, Shinsuke Mori

    Abstract: Neural text generation models conditioning on given input (e.g. machine translation and image captioning) are usually trained by maximum likelihood estimation of target text. However, the trained models suffer from various types of errors at inference time. In this paper, we propose to suppress an arbitrary type of errors by training the text generation model in a reinforcement learning framework,…

    Submitted 28 December, 2020; originally announced December 2020.

  12. Fast Singular Value Shrinkage with Chebyshev Polynomial Approximation Based on Signal Sparsity

    Authors: Masaki Onuki, Shunsuke Ono, Keiichiro Shirai, Yuichi Tanaka

    Abstract: We propose an approximation method for thresholding of singular values using Chebyshev polynomial approximation (CPA). Many signal processing problems require iterative application of singular value decomposition (SVD) for minimizing the rank of a given data matrix with other cost functions and/or constraints, which is called matrix rank minimization. In matrix rank minimization, singular values o…

    Submitted 19 May, 2017; originally announced May 2017.

    Comments: This is a journal paper

  13. arXiv:1611.10017

    cs.CV cs.LG cs.MM

    Fast Supervised Discrete Hashing and its Analysis

    Authors: Gou Koutaki, Keiichiro Shirai, Mitsuru Ambai

    Abstract: In this paper, we propose a learning-based supervised discrete hashing method. Binary hashing is widely used for large-scale image retrieval as well as video and document searches because the compact representation of binary code is essential for data storage and reasonable for query searches using bit-operations. The recently proposed Supervised Discrete Hashing (SDH) efficiently solves mixed-int…

    Submitted 30 November, 2016; originally announced November 2016.

    Comments: 12 pages