-
Hypothesis on the Functional Advantages of the Selection-Broadcast Cycle Structure: Global Workspace Theory and Dealing with a Real-Time World
Authors:
Junya Nakanishi,
Jun Baba,
Yuichiro Yoshikawa,
Hiroko Kamide,
Hiroshi Ishiguro
Abstract:
This paper discusses the functional advantages of the Selection-Broadcast Cycle structure proposed by Global Workspace Theory (GWT), inspired by human consciousness, particularly focusing on its applicability to artificial intelligence and robotics in dynamic, real-time scenarios. While previous studies often examined the Selection and Broadcast processes independently, this research emphasizes their combined cyclic structure and the resulting benefits for real-time cognitive systems. Specifically, the paper identifies three primary benefits: Dynamic Thinking Adaptation, Experience-Based Adaptation, and Immediate Real-Time Adaptation. This work highlights GWT's potential as a cognitive architecture suitable for sophisticated decision-making and adaptive performance in unsupervised, dynamic environments. It suggests new directions for the development and implementation of robust, general-purpose AI and robotics systems capable of managing complex, real-world tasks.
Submitted 20 May, 2025;
originally announced May 2025.
-
Proactive User Information Acquisition via Chats on User-Favored Topics
Authors:
Shiki Sato,
Jun Baba,
Asahi Hentona,
Shinji Iwata,
Akifumi Yoshimoto,
Koichiro Yoshino
Abstract:
Chat-oriented dialogue systems designed to provide tangible benefits, such as sharing the latest news or preventing frailty in senior citizens, often require Proactive acquisition of specific user Information via chats on user-faVOred Topics (PIVOT). This study proposes the PIVOT task, designed to advance the technical foundation for these systems. In this task, a system needs to acquire a user's answers to predefined questions, without the questions feeling abrupt to the user, while engaging in a chat on a predefined topic. We found that even recent large language models (LLMs) show a low success rate in the PIVOT task. We constructed a dataset suitable for the analysis to develop more effective systems. Finally, we developed a simple but effective system for this task by incorporating insights obtained through the analysis of this dataset.
Submitted 10 April, 2025;
originally announced April 2025.
-
A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment
Authors:
Koji Inoue,
Yuki Okafuji,
Jun Baba,
Yoshiki Ohira,
Katsuya Hyodo,
Tatsuya Kawahara
Abstract:
Turn-taking is a crucial aspect of human-robot interaction, directly influencing conversational fluidity and user engagement. While previous research has explored turn-taking models in controlled environments, their robustness in real-world settings remains underexplored. In this study, we propose a noise-robust voice activity projection (VAP) model, based on a Transformer architecture, to enhance real-time turn-taking in dialogue robots. To evaluate the effectiveness of the proposed system, we conducted a field experiment in a shopping mall, comparing the VAP system with a conventional cloud-based speech recognition system. Our analysis covered both subjective user evaluations and objective behavioral analysis. The results showed that the proposed system significantly reduced response latency, leading to a more natural conversation where both the robot and users responded faster. The subjective evaluations suggested that faster responses contribute to a better interaction experience.
Submitted 8 March, 2025;
originally announced March 2025.
-
What Drives You to Interact?: The Role of User Motivation for a Robot in the Wild
Authors:
Amy Koike,
Yuki Okafuji,
Kenya Hoshimure,
Jun Baba
Abstract:
In this paper, we aim to understand how user motivation shapes human-robot interaction (HRI) in the wild. To explore this, we conducted a field study by deploying a fully autonomous conversational robot in a shopping mall over two days. Through sequential video analysis, we identified five patterns of interaction fluency (Smooth, Awkward, Active, Messy, and Quiet), four types of user motivation for interacting with the robot (Function, Experiment, Curiosity, and Education), and user positioning towards the robot. We further analyzed how these motivations and positioning influence interaction fluency. Our findings suggest that incorporating users' motivation types into the design of robot behavior can enhance interaction fluency, engagement, and user satisfaction in real-world HRI scenarios.
Submitted 8 January, 2025;
originally announced January 2025.
-
User Willingness-aware Sales Talk Dataset
Authors:
Asahi Hentona,
Jun Baba,
Shiki Sato,
Reina Akama
Abstract:
User willingness is a crucial element in the sales talk process that affects the achievement of the salesperson's or sales system's objectives. Despite the importance of user willingness, to the best of our knowledge, no previous study has addressed the development of automated sales talk dialogue systems that explicitly consider user willingness. A major barrier is the lack of sales talk datasets with reliable user willingness data. Thus, in this study, we developed a user willingness-aware sales talk collection by leveraging the ecological validity concept, which is discussed in the field of human-computer interaction. Our approach focused on three types of user willingness essential in real sales interactions. We created a dialogue environment that closely resembles real-world scenarios to elicit natural user willingness, with participants evaluating their willingness at the utterance level from multiple perspectives. We analyzed the collected data to gain insights into practical user willingness-aware sales talk strategies. In addition, as a practical application of the constructed dataset, we developed and evaluated a sales dialogue system aimed at enhancing the user's intent to purchase.
Submitted 27 December, 2024;
originally announced December 2024.
-
RetailOpt: Opt-In, Easy-to-Deploy Trajectory Estimation from Smartphone Motion Data and Retail Facility Information
Authors:
Ryo Yonetani,
Jun Baba,
Yasutaka Furukawa
Abstract:
We present RetailOpt, a novel opt-in, easy-to-deploy system for tracking customer movements offline in indoor retail environments. The system uses readily accessible information from customer smartphones and retail apps, including motion data, store maps, and purchase records. This eliminates the need for additional hardware installations/maintenance and ensures customers full data control. Specifically, RetailOpt first uses inertial navigation to recover relative trajectories from smartphone motion data. The store map and purchase records are cross-referenced to identify a list of visited shelves, providing anchors to localize the relative trajectories in a store through continuous and discrete optimization. We demonstrate the effectiveness of our system in five diverse environments. The system, if successful, would produce accurate customer movement data, essential for a broad range of retail applications including customer behavior analysis and in-store navigation.
Submitted 15 July, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Influence of collaborative customer service by service robots and clerks in bakery stores
Authors:
Yuki Okafuji,
Sichao Song,
Jun Baba,
Yuichiro Yoshikawa,
Hiroshi Ishiguro
Abstract:
In recent years, various service robots have been introduced in stores as recommendation systems. Previous studies attempted to increase the influence of these robots by improving their social acceptance and trust. However, when such service robots recommend a product to customers in real environments, the effect on the customers is influenced not only by the robot itself, but also by the social influence of the surrounding people such as store clerks. Therefore, leveraging the social influence of the clerks may increase the influence of the robots on the customers. Hence, we compared the influence of robots with and without collaborative customer service between the robots and clerks in two bakery stores. The experimental results showed that collaborative customer service increased the purchase rate of the recommended bread and improved the impression regarding the robot and store experience of the customers. Because the results also showed that the workload required for the clerks to collaborate with the robot was not high, this study suggests that all stores with service robots may show high effectiveness in introducing collaborative customer service.
Submitted 20 December, 2022;
originally announced December 2022.
-
An Estimation Framework for Passerby Engagement Interacting with Social Robots
Authors:
Taichi Sakaguchi,
Yuki Okafuji,
Kohei Matsumura,
Jun Baba,
Junya Nakanishi
Abstract:
Social robots are expected to be a human labor support technology, and one application of them is an advertising medium in public spaces. When social robots provide information, such as recommended shops, adaptive communication according to the user's state is desired. User engagement, which is also defined as the level of interest in the robot, is likely to play an important role in adaptive communication. Therefore, in this paper, we propose a new framework to estimate user engagement. The proposed method focuses on four unsolved open problems: multi-party interactions, process of state change in engagement, difficulty in annotating engagement, and interaction dataset in the real world. The accuracy of the proposed method for estimating engagement was evaluated using interaction duration. The results show that the interaction duration can be accurately estimated by considering the influence of the behaviors of other people; this also implies that the proposed model accurately estimates the level of engagement during interaction with the robot.
Submitted 6 June, 2022;
originally announced June 2022.
-
Decoupling Speaker-Independent Emotions for Voice Conversion Via Source-Filter Networks
Authors:
Zhaojie Luo,
Shoufeng Lin,
Rui Liu,
Jun Baba,
Yuichiro Yoshikawa,
Ishiguro Hiroshi
Abstract:
Emotional voice conversion (VC) aims to convert a neutral voice to an emotional (e.g. happy) one while retaining the linguistic information and speaker identity. We note that decoupling emotional features from other speech information (such as speaker and content) is the key to achieving remarkable performance. Some recent attempts at speech representation decoupling for neutral speech do not work well on emotional speech, owing to the more complex acoustic properties of the latter. To address this problem, we propose a novel Source-Filter-based Emotional VC model (SFEVC) to achieve proper filtering of speaker-independent emotion features from both the timbre and pitch features. Our SFEVC model consists of multi-channel encoders, emotion separate encoders, and one decoder. All encoder modules adopt a purpose-designed information-bottleneck auto-encoder. Additionally, to further improve the conversion quality for various emotions, we propose a novel two-stage training strategy based on the 2D Valence-Arousal (VA) space. Experimental results show that the proposed SFEVC, combined with the two-stage training strategy, outperforms all baselines and achieves state-of-the-art performance in speaker-independent emotional VC with nonparallel data.
Submitted 3 October, 2021;
originally announced October 2021.
-
Behavioral assessment of a humanoid robot when attracting pedestrians in a mall
Authors:
Yuki Okafuji,
Yasunori Ozaki,
Jun Baba,
Junya Nakanishi,
Kohei Ogawa,
Yuichiro Yoshikawa,
Hiroshi Ishiguro
Abstract:
Research is currently being conducted on the use of robots as a human labor support technology. In particular, the service industry needs to allocate more manpower, and it will be important for robots to support people. This study focuses on using a humanoid robot as a social service robot to convey information in a shopping mall, and analyzes the robot's behavioral concepts. To convey the information, two processes must occur: pedestrians must stop in front of the robot, and the robot must continue the engagement with them. Three types of general-purpose autonomous behavioral concepts of the robot were analyzed and compared across these processes in the experiment: active, passive-negative, and passive-positive. After interactions were attempted with 65,000+ pedestrians, this study revealed that the passive-negative concept can make pedestrians stop more often and stay longer. To evaluate the effectiveness of the robot in a real environment, the three behaviors were compared with human advertisers; the comparison revealed that (1) the results of the active and passive-positive concepts of the robot are comparable to those of the humans, and (2) the performance of the passive-negative concept is higher than that of all participants. These findings demonstrate that the performance of robots is comparable to that of humans in information-providing tasks in a limited environment; therefore, it is expected that service robots as a labor support technology will be able to perform well in the real world.
Submitted 6 September, 2021;
originally announced September 2021.
-
3D Head-Position Prediction in First-Person View by Considering Head Pose for Human-Robot Eye Contact
Authors:
Yuki Tamaru,
Yasunori Ozaki,
Yuki Okafuji,
Junya Nakanishi,
Yuichiro Yoshikawa,
Jun Baba
Abstract:
For a humanoid robot to make eye contact and initiate communication with a person, it is necessary to estimate the person's head position. However, eye contact becomes difficult due to the mechanical delay of the robot when the person is moving. Owing to these issues, it is important to conduct a head-position prediction to mitigate the effect of the delay in the robot motion. Based on the fact that humans turn their heads before changing direction while walking, we hypothesized that the accuracy of three-dimensional (3D) head-position prediction from a first-person view can be improved by considering the head pose. We compared our method with a conventional Kalman filter-based approach, and found our method to be more accurate. The experiment results show that considering the head pose helps improve the accuracy of 3D head-position prediction.
Submitted 20 January, 2022; v1 submitted 10 March, 2021;
originally announced March 2021.