-
A 3D Mobile Crowdsensing Framework for Sustainable Urban Digital Twins
Authors:
Taku Yamazaki,
Kaito Watanabe,
Tatsuya Kase,
Kenta Hasegawa,
Koki Saida,
Takumi Miyoshi
Abstract:
In this article, we propose a 3D mobile crowdsensing (3D-MCS) framework aimed at sustainable urban digital twins (UDTs). The framework comprises four key mechanisms: (1) the 3D-MCS mechanism, consisting of active and passive models; (2) the Geohash-based spatial information management mechanism; (3) the dynamic point cloud integration mechanism for UDTs; and (4) the web-based real-time visualizer for 3D-MCS and UDTs. The active sensing model features a gamified 3D-MCS approach, where participants collect point cloud data through an augmented reality territory coloring game. In contrast, the passive sensing model employs a wearable 3D-MCS approach, where participants wear smartphones around their necks without disrupting daily activities. The spatial information management mechanism efficiently partitions the space into regions using Geohash. The dynamic point cloud integration mechanism incorporates point clouds collected by 3D-MCS into UDTs through global and local point cloud registration. Finally, we evaluated the proposed framework through real-world experiments. We verified the effectiveness of the proposed 3D-MCS models from the perspectives of subjective evaluation and data collection and analysis. Furthermore, we analyzed the performance of the dynamic point cloud integration using a dataset.
Submitted 30 May, 2025;
originally announced May 2025.
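The Geohash-based partitioning mentioned in the abstract above can be illustrated with a minimal sketch of standard Geohash encoding. This is not the authors' implementation; the function name `geohash_encode` and the choice of precision are assumptions made for illustration only.

```python
# Minimal sketch of standard Geohash encoding (illustrative, not the paper's code).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a Geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulates 5 bits per output character
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Points whose hashes share a prefix fall in the same (coarser) region,
# so a prefix can serve as a region key when grouping collected data.
region_key = geohash_encode(57.64911, 10.40744, precision=5)
```

A shorter hash names a larger cell and a longer hash a smaller one, which is why truncating a hash is a cheap way to move between region granularities.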
-
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
Authors:
Wiradee Imrattanatrai,
Masaki Asada,
Kimihiro Hasegawa,
Zhi-Qi Cheng,
Ken Fukuda,
Teruko Mitamura
Abstract:
This paper presents VDAct, a dataset for a Video-grounded Dialogue on Event-driven Activities, alongside VDEval, a session-based context evaluation metric specially designed for the task. Unlike existing datasets, VDAct includes longer and more complex video sequences that depict a variety of event-driven activities that require advanced contextual understanding for accurate response generation. The dataset comprises 3,000 dialogues with over 30,000 question-and-answer pairs, derived from 1,000 videos with diverse activity scenarios. VDAct displays a notably challenging characteristic due to its broad spectrum of activity scenarios and wide range of question types. Empirical studies on state-of-the-art vision foundation models highlight their limitations in addressing certain question types on our dataset. Furthermore, VDEval, which integrates dialogue session history and video content summaries extracted from our supplementary Knowledge Graphs to evaluate individual responses, demonstrates a significantly higher correlation with human assessments on the VDAct dataset than existing evaluation metrics that rely solely on the context of single dialogue turns.
Submitted 30 January, 2025;
originally announced January 2025.
-
Annealing Machine-assisted Learning of Graph Neural Network for Combinatorial Optimization
Authors:
Pablo Loyola,
Kento Hasegawa,
Andres Hoyos-Idobro,
Kazuo Ono,
Toyotaro Suzumura,
Yu Hirate,
Masanao Yamaoka
Abstract:
While Annealing Machines (AM) have shown increasing capabilities in solving complex combinatorial problems, positioning themselves as a more immediate alternative to the expected advances of future fully quantum solutions, there are still scaling limitations. In parallel, Graph Neural Networks (GNN) have been recently adapted to solve combinatorial problems, showing competitive results and potentially high scalability due to their distributed nature. We propose a merging approach that aims at retaining both the accuracy exhibited by AMs and the representational flexibility and scalability of GNNs. Our model considers a compression step, followed by a supervised interaction where partial solutions obtained from the AM are used to guide local GNNs, from which node feature representations are obtained and combined to initialize an additional GNN-based solver that handles the original graph's target problem. Intuitively, the AM can solve the combinatorial problem indirectly by infusing its knowledge into the GNN. Experiments on canonical optimization problems show that the idea is feasible, effectively allowing the AM to solve problems of sizes beyond its original limits.
Submitted 10 January, 2025;
originally announced January 2025.
-
Multilingual Open QA on the MIA Shared Task
Authors:
Navya Yarrabelly,
Saloni Mittal,
Ketan Todi,
Kimihiro Hasegawa
Abstract:
Cross-lingual information retrieval (CLIR) \cite{shi2021cross, asai2021one, jiang2020cross}, for example, can find relevant text in any language, such as English (high resource) or Telugu (low resource), even when the query is posed in a different, possibly low-resource, language. In this work, we aim to develop useful CLIR models for this constrained, yet important, setting where we do not require any kind of additional supervision or labelled data for the retrieval task and hence can work effectively for low-resource languages.
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot multilingual question generation model, which is a pre-trained language model, to compute the probability of the input question in the target language conditioned on a retrieved passage, which may be in a different language. We evaluate our method in a completely zero-shot setting; it does not require any training. Thus the main advantage of our approach is that it can be used to re-rank results obtained by any sparse retrieval method, such as BM25. This eliminates the need to obtain the expensive labelled corpus required for retrieval tasks, and hence the method can be used for low-resource languages.
Submitted 7 January, 2025;
originally announced January 2025.
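The re-ranking idea in the abstract above (score each retrieved passage by how likely the question is given that passage, then sort) can be sketched as follows. This is only an illustration under stated assumptions: `rerank` and `toy_log_likelihood` are hypothetical names, and the smoothed unigram scorer is a stand-in for the pre-trained multilingual question generation model, which is not reproduced here.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def toy_log_likelihood(question: str, passage: str,
                       alpha: float = 1.0, vocab_size: int = 10_000) -> float:
    """Stand-in scorer (assumption): add-alpha smoothed unigram log p(question | passage).

    A real system would instead use a question generation model's
    log-probability of the question conditioned on the passage.
    """
    counts = Counter(passage.lower().split())
    total = sum(counts.values())
    return sum(
        math.log((counts[token] + alpha) / (total + alpha * vocab_size))
        for token in question.lower().split()
    )


def rerank(question: str, passages: List[str],
           log_likelihood: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Re-score passages with log p(question | passage) and sort best-first."""
    scored = [(passage, log_likelihood(question, passage)) for passage in passages]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Passages as they might come back from a sparse first-stage retriever.
retrieved = [
    "bananas are rich in potassium",
    "the capital of france is paris",
]
ranked = rerank("what is the capital of france", retrieved, toy_log_likelihood)
```

Because the scorer is passed in as a callable, the first-stage retriever and the scoring model can be swapped independently, which mirrors the abstract's point that the re-ranker can sit on top of any sparse retriever.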
-
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding
Authors:
Kimihiro Hasegawa,
Wiradee Imrattanatrai,
Zhi-Qi Cheng,
Masaki Asada,
Susan Holm,
Yuran Wang,
Ken Fukuda,
Teruko Mitamura
Abstract:
Multimodal systems have great potential to assist humans in procedural activities, where people follow instructions to achieve their goals. Despite diverse application scenarios, systems are typically evaluated on traditional classification tasks, e.g., action recognition or temporal action segmentation. In this paper, we present a novel evaluation dataset, ProMQA, to measure system advancements in application-oriented scenarios. ProMQA consists of 401 multimodal procedural QA pairs on user recordings of procedural activities coupled with their corresponding instructions. For QA annotation, we take a cost-effective human-LLM collaborative approach, where the existing annotation is augmented with LLM-generated QA pairs that are later verified by humans. We then provide the benchmark results to set the baseline performance on ProMQA. Our experiment reveals a significant gap between human performance and that of current systems, including competitive proprietary multimodal models. We hope our dataset sheds light on new aspects of models' multimodal understanding capabilities.
Submitted 29 October, 2024;
originally announced October 2024.
-
Aerial Push-Button with Two-Stage Tactile Feedback using Reflected Airborne Ultrasound Focus
Authors:
Hiroya Sugawara,
Masaya Takasaki,
Keisuke Hasegawa
Abstract:
We developed a new aerial push-button with tactile feedback using focused airborne ultrasound. This study has two significant novelties compared to past related studies: 1) ultrasound emitters are placed behind the user's finger, and the reflected ultrasound emission, focused just above the solid plane placed under the finger, presents tactile feedback to the finger pad, and 2) tactile feedback is presented at two stages during the pressing motion: at the time of pushing the button and at the time of withdrawing the finger from it. The former has a significant advantage in apparatus implementation in that the input surface of the device can be composed of a generic thin plane including touch panels, potentially capable of presenting input touch feedback only when the user touches objects on the screen. We experimentally found that the two-stage tactile presentation is much more effective in strengthening perceived tactile stimulation and feeling of input completion when compared with a conventional single-stage method. This study proposes an aerial push-button composition that is more practical than previous ones. The proposed system composition is expected to be one of the simplest frameworks in the airborne ultrasound tactile interface.
Submitted 1 July, 2024; v1 submitted 28 June, 2024;
originally announced June 2024.
-
Formulation Comparison for Timeline Construction using LLMs
Authors:
Kimihiro Hasegawa,
Nikhil Kandukuri,
Susan Holm,
Yukari Yamakawa,
Teruko Mitamura
Abstract:
Constructing a timeline requires identifying the chronological order of events in an article. In prior timeline construction datasets, temporal orders are typically annotated by either event-to-time anchoring or event-to-event pairwise ordering, both of which suffer from missing temporal information. To mitigate the issue, we develop a new evaluation dataset, TimeSET, consisting of single-document timelines with document-level order annotation. TimeSET features saliency-based event selection and partial ordering, which enable a practical annotation workload. Aiming to build better automatic timeline construction systems, we propose a novel evaluation framework to compare multiple task formulations with TimeSET by prompting open LLMs, i.e., Llama 2 and Flan-T5. Considering that identifying temporal orders of events is a core subtask in timeline construction, we further benchmark open LLMs on existing event temporal ordering datasets to gain a robust understanding of their capabilities. Our experiments show that (1) NLI formulation with Flan-T5 demonstrates a strong performance among others, while (2) timeline construction and event temporal ordering are still challenging tasks for few-shot LLMs. Our code and data are available at https://github.com/kimihiroh/timeset.
Submitted 1 March, 2024;
originally announced March 2024.
-
EdgePruner: Poisoned Edge Pruning in Graph Contrastive Learning
Authors:
Hiroya Kato,
Kento Hasegawa,
Seira Hidano,
Kazuhide Fukushima
Abstract:
Graph Contrastive Learning (GCL) is unsupervised graph representation learning that can obtain useful representation of unknown nodes. The node representation can be utilized as features of downstream tasks. However, GCL is vulnerable to poisoning attacks as with existing learning models. A state-of-the-art defense cannot sufficiently negate adverse effects by poisoned graphs although such a defense introduces adversarial training in the GCL. To achieve further improvement, pruning adversarial edges is important. To the best of our knowledge, the feasibility remains unexplored in the GCL domain. In this paper, we propose a simple defense for GCL, EdgePruner. We focus on the fact that the state-of-the-art poisoning attack on GCL tends to mainly add adversarial edges to create poisoned graphs, which means that pruning edges is important to sanitize the graphs. Thus, EdgePruner prunes edges that contribute to minimizing the contrastive loss based on the node representation obtained after training on poisoned graphs by GCL. Furthermore, we focus on the fact that nodes with distinct features are connected by adversarial edges in poisoned graphs. Thus, we introduce feature similarity between neighboring nodes to help more appropriately determine adversarial edges. This similarity is helpful in further eliminating adverse effects from poisoned graphs on various datasets. Finally, EdgePruner outputs a graph that yields the minimum contrastive loss as the sanitized graph. Our results demonstrate that pruning adversarial edges is feasible on six datasets. EdgePruner can improve the accuracy of node classification under the attack by up to 5.55% compared with that of the state-of-the-art defense. Moreover, we show that EdgePruner is immune to an adaptive attack.
Submitted 12 December, 2023;
originally announced December 2023.
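One ingredient named in the abstract above, feature similarity between neighboring nodes, can be illustrated with a small sketch. This covers only that heuristic, assuming cosine similarity and a fixed threshold; it is not EdgePruner itself, which also relies on the contrastive loss. The names `cosine_similarity` and `prune_dissimilar_edges` and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prune_dissimilar_edges(edges: List[Tuple[int, int]],
                           features: Dict[int, Sequence[float]],
                           threshold: float) -> List[Tuple[int, int]]:
    """Keep only edges whose endpoint features are at least `threshold` similar."""
    return [
        (a, b) for a, b in edges
        if cosine_similarity(features[a], features[b]) >= threshold
    ]


# Toy graph: nodes 0 and 1 have similar features, node 2 is very different.
features = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
edges = [(0, 1), (1, 2), (0, 2)]
kept = prune_dissimilar_edges(edges, features, threshold=0.5)
```

The intuition matches the abstract's observation that adversarial edges tend to connect nodes with distinct features, so edges between dissimilar endpoints are the natural candidates for removal.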
-
BlindSpotNet: Seeing Where We Cannot See
Authors:
Taichi Fukuda,
Kotaro Hasegawa,
Shinya Ishizaki,
Shohei Nobuhara,
Ko Nishino
Abstract:
We introduce 2D blind spot estimation as a critical visual task for road scene understanding. By automatically detecting road regions that are occluded from the vehicle's vantage point, we can proactively alert a manual driver or a self-driving system to potential causes of accidents (e.g., draw attention to a road region from which a child may spring out). Detecting blind spots in full 3D would be challenging, as 3D reasoning on the fly, even if the car is equipped with LiDAR, would be prohibitively expensive and error prone. We instead propose to learn to estimate blind spots in 2D, just from a monocular camera. We achieve this in two steps. We first introduce an automatic method for generating "ground-truth" blind spot training data for arbitrary driving videos by leveraging monocular depth estimation, semantic segmentation, and SLAM. The key idea is to reason in 3D but from 2D images by defining blind spots as those road regions that are currently invisible but become visible in the near future. We construct a large-scale dataset with this automatic offline blind spot estimation, which we refer to as the Road Blind Spot (RBS) dataset. Next, we introduce BlindSpotNet (BSN), a simple network that fully leverages this dataset for fully automatic estimation of frame-wise blind spot probability maps for arbitrary driving videos. Extensive experimental results demonstrate the validity of our RBS Dataset and the effectiveness of our BSN.
Submitted 8 July, 2022;
originally announced July 2022.
-
R-HTDetector: Robust Hardware-Trojan Detection Based on Adversarial Training
Authors:
Kento Hasegawa,
Seira Hidano,
Kohei Nozawa,
Shinsaku Kiyomoto,
Nozomu Togawa
Abstract:
Hardware Trojans (HTs) have become a serious problem, and their elimination is strongly required for enhancing the security and safety of integrated circuits. An effective solution is to identify HTs at the gate level via machine learning techniques. However, machine learning has specific vulnerabilities, such as adversarial examples. Indeed, it has been reported that adversarially modified HTs greatly degrade the performance of a machine learning-based HT detection method. Therefore, we propose a robust HT detection method using adversarial training (R-HTDetector). We formally describe the robustness of R-HTDetector in modifying HTs. Our work gives the world's first adversarial training for HT detection with theoretical backgrounds. We show through experiments with Trust-HUB benchmarks that R-HTDetector overcomes adversarial examples while maintaining its original accuracy.
Submitted 26 May, 2022;
originally announced May 2022.
-
Node-wise Hardware Trojan Detection Based on Graph Learning
Authors:
Kento Hasegawa,
Kazuki Yamashita,
Seira Hidano,
Kazuhide Fukushima,
Kazuo Hashimoto,
Nozomu Togawa
Abstract:
In the fourth industrial revolution, securing the protection of the supply chain has become an ever-growing concern. One such cyber threat is a hardware Trojan (HT), a malicious modification to an IC. HTs are often identified in the hardware manufacturing process, but should be removed earlier, when the design is being specified. Machine learning-based HT detection in gate-level netlists is an efficient approach to identify HTs at the early stage. However, feature-based modeling has limitations in discovering an appropriate set of HT features. We thus propose NHTD-GL in this paper, a novel node-wise HT detection method based on graph learning (GL). Given the formal analysis of HT features obtained from domain knowledge, NHTD-GL bridges the gap between graph representation learning and feature-based HT detection. The experimental results demonstrate that NHTD-GL achieves 0.998 detection accuracy and outperforms state-of-the-art node-wise HT detection methods. NHTD-GL extracts HT features without heuristic feature engineering.
Submitted 15 March, 2022; v1 submitted 3 December, 2021;
originally announced December 2021.
-
Cross-document Event Identity via Dense Annotation
Authors:
Adithya Pratapa,
Zhengzhong Liu,
Kimihiro Hasegawa,
Linwei Li,
Yukari Yamakawa,
Shikun Zhang,
Teruko Mitamura
Abstract:
In this paper, we study the identity of textual events from different documents. While the complex nature of event identity has been studied previously (Hovy et al., 2013), the case of events across documents is unclear. Prior work on cross-document event coreference has two main drawbacks. First, it restricts the annotations to a limited set of event types. Second, it insufficiently tackles the concept of event identity. Such an annotation setup reduces the pool of event mentions and prevents one from considering the possibility of quasi-identity relations. We propose a dense annotation approach for cross-document event coreference, comprising a rich source of event mentions and a dense annotation effort between related document pairs. To this end, we design a new annotation workflow with careful quality control and an easy-to-use annotation interface. In addition to the links, we further collect overlapping event contexts, including time, location, and participants, to shed some light on the relation between identity decisions and context. We present an open-access dataset for cross-document event coreference, CDEC-WN, collected from English Wikinews, and open-source our annotation toolkit to encourage further research on cross-document tasks.
Submitted 13 September, 2021;
originally announced September 2021.
-
Manual Character Transmission by Presenting Trajectories of 7mm-high Letters in One Second
Authors:
Keisuke Hasegawa,
Tatsuma Sakurai,
Yasutoshi Makino,
Hiroyuki Shinoda
Abstract:
In this paper, we report a method of intuitively transmitting symbolic information to untrained users via only their hands without using any visual or auditory cues. Our simple concept is presenting three-dimensional letter trajectories to the user's hand via a stylus which is mechanically manipulated. By this simple method, in our experiments, participants were able to read 14 mm-high lower-case letters displayed at a rate of one letter per second with an accuracy rate of 71.9% in their first trials, which was improved to 91.3% after a five-minute training period. These results showed small individual differences among participants (standard deviation of 12.7% in the first trials and 6.7% after training). We also found that this accuracy was still retained to a high level (85.1% with SD of 8.2%) even when the letters were reduced to a height of 7 mm. Thus, we revealed that sighted adults potentially possess the ability to read small letters accurately at normal writing speed using their hands.
Submitted 1 October, 2015; v1 submitted 24 March, 2015;
originally announced March 2015.