-
GenAI-Powered Inference
Authors:
Kosuke Imai,
Kentaro Nakamura
Abstract:
We introduce GenAI-Powered Inference (GPI), a statistical framework for both causal and predictive inference using unstructured data, including text and images. GPI leverages open-source Generative Artificial Intelligence (GenAI) models - such as large language models and diffusion models - not only to generate unstructured data at scale but also to extract low-dimensional representations that capture their underlying structure. Applying machine learning to these representations, GPI enables estimation of causal and predictive effects while quantifying associated estimation uncertainty. Unlike existing approaches to representation learning, GPI does not require fine-tuning of generative models, making it computationally efficient and broadly accessible. We illustrate the versatility of the GPI framework through three applications: (1) analyzing Chinese social media censorship, (2) estimating predictive effects of candidates' facial appearance on electoral outcomes, and (3) assessing the persuasiveness of political rhetoric. An open-source software package is available for implementing GPI.
Submitted 5 July, 2025;
originally announced July 2025.
-
Correspondence of high-dimensional emotion structures elicited by video clips between humans and Multimodal LLMs
Authors:
Haruka Asanuma,
Naoko Koide-Majima,
Ken Nakamura,
Takato Horii,
Shinji Nishimoto,
Masafumi Oizumi
Abstract:
Recent studies have revealed that human emotions exhibit a high-dimensional, complex structure. Fully capturing this complexity requires new approaches, as conventional models that disregard high dimensionality risk overlooking key nuances of human emotions. Here, we examined the extent to which the latest generation of rapidly evolving Multimodal Large Language Models (MLLMs) capture these high-dimensional, intricate emotion structures, including their capabilities and limitations. Specifically, we compared self-reported emotion ratings from participants watching videos with model-generated estimates (e.g., Gemini or GPT). We evaluated performance not only at the individual video level but also in terms of emotion structures that account for inter-video relationships. At the level of simple correlation between emotion structures, our results demonstrated strong similarity between human and model-inferred emotion structures. To further explore whether the similarity between humans and models is at the single-item level or the coarse-categorical level, we applied Gromov-Wasserstein Optimal Transport. We found that although performance was not necessarily high at the strict, single-item level, performance across video categories that elicit similar emotions was substantial, indicating that the model could infer human emotional experiences at the category level. Our results suggest that current state-of-the-art MLLMs broadly capture the complex high-dimensional emotion structures at the category level, while showing apparent limitations in accurately capturing entire structures at the single-item level.
Submitted 23 May, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
-
Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures
Authors:
Junwon Seo,
Kensuke Nakamura,
Andrea Bajcsy
Abstract:
Recent advances in generative world models have enabled classical safe control methods, such as Hamilton-Jacobi (HJ) reachability, to generalize to complex robotic systems operating directly from high-dimensional sensor observations. However, obtaining comprehensive coverage of all safety-critical scenarios during world model training is extremely challenging. As a result, latent safety filters built on top of these models may miss novel hazards and even fail to prevent known ones, overconfidently misclassifying risky out-of-distribution (OOD) situations as safe. To address this, we introduce an uncertainty-aware latent safety filter that proactively steers robots away from both known and unseen failures. Our key idea is to use the world model's epistemic uncertainty as a proxy for identifying unseen potential hazards. We propose a principled method to detect OOD world model predictions by calibrating an uncertainty threshold via conformal prediction. By performing reachability analysis in an augmented state space (spanning both the latent representation and the epistemic uncertainty), we synthesize a latent safety filter that can reliably safeguard arbitrary policies from both known and unseen safety hazards. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our uncertainty-aware safety filter preemptively detects potential unsafe scenarios and reliably proposes safe, in-distribution actions. Video results can be found on the project website at https://cmu-intentlab.github.io/UNISafe
Submitted 1 May, 2025;
originally announced May 2025.
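The abstract of "Uncertainty-aware Latent Safety Filters" mentions calibrating an uncertainty threshold via conformal prediction. A minimal sketch of that idea, assuming the standard split-conformal quantile rule applied to scalar uncertainty scores from held-out in-distribution data, might look as follows. The function names, the synthetic scores, and the choice of `alpha` are illustrative assumptions; the abstract does not specify the paper's actual procedure.

```python
import math


def conformal_threshold(calibration_scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score.

    calibration_scores are uncertainty values measured on held-out
    in-distribution data (hypothetical input, not the paper's API).
    """
    n = len(calibration_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: nothing can be flagged.
        return math.inf
    return sorted(calibration_scores)[k - 1]


def is_out_of_distribution(score, threshold):
    """Flag a prediction whose uncertainty exceeds the calibrated threshold."""
    return score > threshold


# Synthetic calibration scores 1..99, purely for illustration.
calibration_scores = list(range(1, 100))
threshold = conformal_threshold(calibration_scores, alpha=0.1)
flagged = is_out_of_distribution(95, threshold)
```

Under exchangeability of calibration and test scores, this rule flags an in-distribution sample with probability at most `alpha`, which is the usual motivation for using it as an OOD detector.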
-
Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval
Authors:
Yuji Nozawa,
Yu-Chieh Lin,
Kazumoto Nakamura,
Youyang Ng
Abstract:
The goal of this paper is to enhance pretrained Vision Transformer (ViT) models for focus-oriented image retrieval with visual prompting. In real-world image retrieval scenarios, both query and database images often exhibit complexity, with multiple objects and intricate backgrounds. Users often want to retrieve images containing a specific object, which we define as the Focus-Oriented Image Retrieval (FOIR) task. While a standard image encoder can be employed to extract image features for similarity matching, it may not perform optimally in the multi-object-based FOIR task. This is because each image is represented by a single global feature vector. To overcome this, a prompt-based image retrieval solution is required. We propose an approach called Prompt-guided attention Head Selection (PHS) to leverage the head-wise potential of the multi-head attention mechanism in ViT in a promptable manner. PHS selects specific attention heads by matching their attention maps with the user's visual prompts, such as a point, box, or segmentation. This empowers the model to focus on the specific object of interest while preserving the surrounding visual context. Notably, PHS does not necessitate model re-training and avoids any image alteration. Experimental results show that PHS substantially improves performance on multiple datasets, offering a practical and training-free solution to enhance model performance in the FOIR task.
Submitted 2 April, 2025;
originally announced April 2025.
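The PHS abstract above describes selecting attention heads by matching their attention maps with a visual prompt. One way to picture this is sketched below, assuming each head's attention map is a flat list of per-patch weights and the prompt is a binary patch mask. The matching score (fraction of attention mass inside the mask) and the top-k rule are hypothetical choices for illustration; the abstract does not state the actual matching criterion.

```python
def select_heads(attention_maps, prompt_mask, k):
    """Return indices of the k heads whose attention best overlaps the prompt.

    attention_maps: one list of non-negative patch weights per head.
    prompt_mask: list of 0/1 flags marking patches covered by the prompt.
    The score is the share of a head's attention mass inside the mask
    (an assumed criterion, not taken from the paper).
    """
    scored = []
    for head, attn in enumerate(attention_maps):
        total = sum(attn)
        inside = sum(a for a, m in zip(attn, prompt_mask) if m)
        score = inside / total if total else 0.0
        scored.append((score, head))
    # Highest score first; ties broken by lower head index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [head for _, head in scored[:k]]


# Toy example: 3 heads over 4 patches, prompt covers patches 1 and 2.
attention_maps = [
    [0.1, 0.4, 0.4, 0.1],      # mostly on the prompted patches
    [0.7, 0.1, 0.1, 0.1],      # mostly elsewhere
    [0.25, 0.25, 0.25, 0.25],  # uniform
]
prompt_mask = [0, 1, 1, 0]
selected = select_heads(attention_maps, prompt_mask, k=2)
```

Because the selection only reads attention maps that the model already produces, a scheme of this kind needs no re-training and leaves the input image untouched, which matches the training-free framing in the abstract.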
-
CyberCScope: Mining Skewed Tensor Streams and Online Anomaly Detection in Cybersecurity Systems
Authors:
Kota Nakamura,
Koki Kawabata,
Shungo Tanaka,
Yasuko Matsubara,
Yasushi Sakurai
Abstract:
Cybersecurity systems continuously produce a huge number of time-stamped events in the form of high-order tensors, such as {count; time, port, flow duration, packet size, ...}. How can we detect anomalies/intrusions in real time? How can we identify multiple types of intrusions and capture their characteristic behaviors? The tensor data consists of categorical and continuous attributes, and the data distributions of continuous attributes typically exhibit skew. These data properties require handling skewed infinite and finite dimensional spaces simultaneously. In this paper, we propose a novel streaming method, namely CyberCScope. The method effectively decomposes incoming tensors into major trends while explicitly distinguishing between categorical and skewed continuous attributes. To our knowledge, it is the first to compute hybrid skewed infinite and finite dimensional decomposition. Based on this decomposition, it finds distinct time-evolving patterns in a streaming fashion, enabling the detection of multiple types of anomalies. Extensive experiments on large-scale real datasets demonstrate that CyberCScope detects various intrusions with higher accuracy than state-of-the-art baselines while providing meaningful summaries for the intrusions that occur in practice.
Submitted 2 March, 2025;
originally announced March 2025.
-
Characterizing Photorealism and Artifacts in Diffusion Model-Generated Images
Authors:
Negar Kamali,
Karyn Nakamura,
Aakriti Kumar,
Angelos Chatzimparmpas,
Jessica Hullman,
Matthew Groh
Abstract:
Diffusion model-generated images can appear indistinguishable from authentic photographs, but these images often contain artifacts and implausibilities that reveal their AI-generated provenance. Given the challenge to public trust in media posed by photorealistic AI-generated images, we conducted a large-scale experiment measuring human detection accuracy on 450 diffusion-model generated images and 149 real images. Based on collecting 749,828 observations and 34,675 comments from 50,444 participants, we find that scene complexity of an image, artifact types within an image, display time of an image, and human curation of AI-generated images all play significant roles in how accurately people distinguish real from AI-generated images. Additionally, we propose a taxonomy characterizing artifacts often appearing in images generated by diffusion models. Our empirical observations and taxonomy offer nuanced insights into the capabilities and limitations of diffusion models to generate photorealistic images in 2024.
Submitted 17 February, 2025;
originally announced February 2025.
-
Tensor Decomposition Meets Knowledge Compilation: A Study Comparing Tensor Trains with OBDDs
Authors:
Ryoma Onaka,
Kengo Nakamura,
Masaaki Nishino,
Norihito Yasuda
Abstract:
A knowledge compilation map analyzes tractable operations in Boolean function representations and compares their succinctness. This enables the selection of appropriate representations for different applications. In the knowledge compilation map, all representation classes are subsets of the negation normal form (NNF). However, Boolean functions may be better expressed by a representation that is different from that of the NNF subsets. In this study, we treat tensor trains as Boolean function representations and analyze their succinctness and tractability. Our study is the first to evaluate the expressiveness of a tensor decomposition method using criteria from knowledge compilation literature. Our main results demonstrate that tensor trains are more succinct than ordered binary decision diagrams (OBDDs) and support the same polytime operations as OBDDs. Our study broadens their application by providing a theoretical link between tensor decomposition and existing NNF subsets.
Submitted 5 February, 2025;
originally announced February 2025.
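To make the idea in the tensor-train abstract above concrete (treating a tensor train as a Boolean function representation), here is a small sketch using the standard tensor-train format: the function value at an assignment is a product of one matrix per variable, chosen by that variable's bit, contracted with boundary vectors. The parity function, the rank-2 cores, and all names are illustrative assumptions; the abstract does not specify the paper's exact encoding.

```python
IDENTITY = [[1, 0], [0, 1]]  # bit = 0 keeps the parity state
SWAP = [[0, 1], [1, 0]]      # bit = 1 flips the parity state


def parity_tensor_train(num_vars):
    """Rank-2 tensor train for the parity (XOR) of num_vars bits."""
    left = [1, 0]                              # start in the "even" state
    cores = [(IDENTITY, SWAP)] * num_vars      # core[bit] is a 2x2 matrix
    right = [0, 1]                             # read out the "odd" state
    return left, cores, right


def tt_evaluate(left, cores, right, bits):
    """Evaluate the tensor train at one assignment by chained vector-matrix products."""
    vec = left
    for core, bit in zip(cores, bits):
        mat = core[bit]
        vec = [
            sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))
        ]
    return sum(v * r for v, r in zip(vec, right))


left, cores, right = parity_tensor_train(3)
value = tt_evaluate(left, cores, right, [1, 0, 1])  # two ones, so even parity
```

The two-dimensional state carried between cores plays the same role as the two nodes per level in a width-2 OBDD for parity, which gives some intuition for why the two representations can be compared on succinctness.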
-
Generalizing Safety Beyond Collision-Avoidance via Latent-Space Reachability Analysis
Authors:
Kensuke Nakamura,
Lasse Peters,
Andrea Bajcsy
Abstract:
Hamilton-Jacobi (HJ) reachability is a rigorous mathematical framework that enables robots to simultaneously detect unsafe states and generate actions that prevent future failures. While in theory, HJ reachability can synthesize safe controllers for nonlinear systems and nonconvex constraints, in practice, it has been limited to hand-engineered collision-avoidance constraints modeled via low-dimensional state-space representations and first-principles dynamics. In this work, our goal is to generalize safe robot controllers to prevent failures that are hard, if not impossible, to write down by hand, but can be intuitively identified from high-dimensional observations: for example, spilling the contents of a bag. We propose Latent Safety Filters, a latent-space generalization of HJ reachability that tractably operates directly on raw observation data (e.g., RGB images) to automatically compute safety-preserving actions without explicit recovery demonstrations by performing safety analysis in the latent embedding space of a generative world model. Our method leverages diverse robot observation-action data of varying quality (including successes, random exploration, and unsafe demonstrations) to learn a world model. Constraint specification is then transformed into a classification problem in the latent space of the learned world model. In simulation and hardware experiments, we compute an approximation of Latent Safety Filters to safeguard arbitrary policies (from imitation-learned policies to direct teleoperation) from complex safety hazards, like preventing a Franka Research 3 manipulator from spilling the contents of a bag or toppling cluttered objects.
Submitted 30 April, 2025; v1 submitted 2 February, 2025;
originally announced February 2025.
-
MoHeat: A Modular Platform for High-Responsive Non-Contact Thermal Feedback Interactions
Authors:
Jiayi Xu,
Kazuma Nakamura,
Yoshihiro Kuroda,
Masahiko Inami
Abstract:
MoHeat is a modular hardware and software platform designed for rapid prototyping of highly responsive, non-contact thermal feedback interactions. In our previous work, we developed an intensity-adjustable, highly responsive, non-contact thermal feedback system by integrating the vortex effect and thermal radiation. In this study, we further enhanced the system by developing an authoring tool that allows users to freely adjust the intensity of thermal stimuli, the duration of stimuli, the delay time before stimuli, and the interval between alternating hot and cold stimuli. This modular approach enables countless combinations of non-contact thermal feedback experiences.
Submitted 7 November, 2024;
originally announced November 2024.
-
Improving Image Clustering with Artifacts Attenuation via Inference-Time Attention Engineering
Authors:
Kazumoto Nakamura,
Yuji Nozawa,
Yu-Chieh Lin,
Kengo Nakata,
Youyang Ng
Abstract:
The goal of this paper is to improve the performance of pretrained Vision Transformer (ViT) models, particularly DINOv2, in the image clustering task without requiring re-training or fine-tuning. As model size increases, a high-norm artifact anomaly appears in the patches of multi-head attention. We observe that this anomaly leads to reduced accuracy in zero-shot image clustering. These artifacts are characterized by disproportionately large values in the attention map compared to other patch tokens. To address these artifacts, we propose an approach called Inference-Time Attention Engineering (ITAE), which manipulates the attention function during inference. Specifically, we identify the artifacts by investigating one of the Query-Key-Value (QKV) patches in the multi-head attention and attenuate their corresponding attention values inside the pretrained models. ITAE shows improved clustering accuracy on multiple datasets by exhibiting more expressive features in latent space. Our findings highlight the potential of ITAE as a practical solution for reducing artifacts in pretrained ViT models and improving model performance in clustering tasks without the need for re-training or fine-tuning.
Submitted 7 October, 2024;
originally announced October 2024.
-
Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments
Authors:
Kosuke Imai,
Kentaro Nakamura
Abstract:
In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence (GenAI). Specifically, we propose to use a deep generative model such as large language models (LLMs) to efficiently generate treatments and use their internal representation for subsequent causal effect estimation. We show that the knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other possibly unknown confounding features. Unlike existing methods, the proposed GenAI-Powered Inference (GPI) methodology eliminates the need to learn causal representation from the data, and hence produces more accurate and efficient estimates. We formally establish the conditions required for the nonparametric identification of the average treatment effect, propose an estimation strategy that avoids the violation of the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the proposed methodology to the settings in which the treatment feature is based on human perception. The proposed GPI methodology is also applicable to text reuse where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using the generated text data from an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.
Submitted 2 July, 2025; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Decoding Algorithm Correcting Single-Insertion Plus Single-Deletion for Non-binary Quantum Codes
Authors:
Ken Nakamura,
Takayuki Nozaki
Abstract:
In this paper, we assume an error such that a single insertion occurs and then a single deletion occurs. Under such an error model, this paper provides a decoding algorithm for non-binary quantum codes constructed by Matsumoto and Hagiwara.
Submitted 17 September, 2024;
originally announced September 2024.
-
ARIM-mdx Data System: Towards a Nationwide Data Platform for Materials Science
Authors:
Masatoshi Hanai,
Ryo Ishikawa,
Mitsuaki Kawamura,
Masato Ohnishi,
Norio Takenaka,
Kou Nakamura,
Daiju Matsumura,
Seiji Fujikawa,
Hiroki Sakamoto,
Yukinori Ochiai,
Tetsuo Okane,
Shin-Ichiro Kuroki,
Atsuo Yamada,
Toyotaro Suzumura,
Junichiro Shiomi,
Kenjiro Taura,
Yoshio Mita,
Naoya Shibata,
Yuichi Ikuhara
Abstract:
In modern materials science, effective and high-volume data management across leading-edge experimental facilities and world-class supercomputers is indispensable for cutting-edge research. However, existing integrated systems that handle data from these resources have primarily focused on smaller-scale cross-institutional or single-domain operations. As a result, they often lack the scalability, efficiency, agility, and interdisciplinarity needed for handling substantial volumes of data from various researchers. In this paper, we introduce the ARIM-mdx data system, which aims to be a nationwide data platform for materials science in Japan. Currently in its trial phase, the platform involves 11 universities and institutes all over Japan and is used by over 800 researchers from around 140 organizations in academia and industry, with the intention of gradually expanding its reach. The ARIM-mdx data system, as a pioneering nationwide data platform, has the potential to contribute to the creation of new research communities and accelerate innovations.
Submitted 4 November, 2024; v1 submitted 8 September, 2024;
originally announced September 2024.
-
Aortic root landmark localization with optimal transport loss for heatmap regression
Authors:
Tsuyoshi Ishizone,
Masaki Miyasaka,
Sae Ochi,
Norio Tada,
Kazuyuki Nakamura
Abstract:
Anatomical landmark localization is gaining attention to ease the burden on physicians. Focusing on aortic root landmark localization, the three hinge points of the aortic valve can reduce the burden by automatically determining the valve size required for transcatheter aortic valve implantation surgery. Existing methods for landmark prediction of the aortic root mainly use time-consuming two-step estimation methods. We propose a highly accurate one-step landmark localization method from even coarse images. The proposed method uses an optimal transport loss to break the trade-off between prediction precision and learning stability in conventional heatmap regression methods. We apply the proposed method to the 3D CT image dataset collected at Sendai Kousei Hospital and show that it significantly improves the estimation error over existing methods and other loss functions. Our code is available on GitHub.
Submitted 5 July, 2024;
originally announced July 2024.
-
How to Distinguish AI-Generated Images from Authentic Photographs
Authors:
Negar Kamali,
Karyn Nakamura,
Angelos Chatzimparmpas,
Jessica Hullman,
Matthew Groh
Abstract:
The high level of photorealism in state-of-the-art diffusion models like Midjourney, Stable Diffusion, and Firefly makes it difficult for untrained humans to distinguish between real photographs and AI-generated images. To address this problem, we designed a guide to help readers develop a more critical eye toward identifying artifacts, inconsistencies, and implausibilities that often appear in AI-generated images. The guide is organized into five categories of artifacts and implausibilities: anatomical, stylistic, functional, violations of physics, and sociocultural. For this guide, we generated 138 images with diffusion models, curated 9 images from social media, and curated 42 real photographs. These images showcase the kinds of cues that prompt suspicion towards the possibility an image is AI-generated and why it is often difficult to draw conclusions about an image's provenance without any context beyond the pixels in an image. Human-perceptible artifacts are not always present in AI-generated images, but this guide reveals artifacts and implausibilities that often emerge. By drawing attention to these kinds of artifacts and implausibilities, we aim to better equip people to distinguish AI-generated images from real photographs in the future.
Submitted 12 June, 2024;
originally announced June 2024.
-
Single Family Algebra Operation on BDDs and ZDDs Leads To Exponential Blow-Up
Authors:
Kengo Nakamura,
Masaaki Nishino,
Shuhei Denzumi
Abstract:
Binary decision diagrams (BDDs) and zero-suppressed binary decision diagrams (ZDDs) are data structures that represent a family of (sub)sets compactly, and they can be used as succinct indexes for a family of sets. To build a BDD/ZDD representing a desired family of sets, there are many transformation operations that take BDDs/ZDDs as inputs and output a BDD/ZDD representing the resultant family after performing operations such as set union and intersection. However, except for some basic operations, the worst-case time complexity of such transformations on BDDs/ZDDs has not been extensively studied, and some contradictory statements about it have arisen in the literature. In this paper, we show that many transformation operations on BDDs/ZDDs, including all operations for families of sets that appear in Knuth's book, cannot be performed in worst-case polynomial time in the size of input BDDs/ZDDs. This refutes some of the folklore circulated in past literature and resolves an open problem raised by Knuth. Our results are stronger in that such blow-up of computational time occurs even when the ordering, which has a significant impact on the efficiency of treating BDDs/ZDDs, is chosen arbitrarily.
Submitted 30 September, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Not All Errors Are Made Equal: A Regret Metric for Detecting System-level Trajectory Prediction Failures
Authors:
Kensuke Nakamura,
Ran Tian,
Andrea Bajcsy
Abstract:
Robot decision-making increasingly relies on data-driven human prediction models when operating around people. While these models are known to mispredict in out-of-distribution interactions, only a subset of prediction errors impact downstream robot performance. We propose characterizing such "system-level" prediction failures via the mathematical notion of regret: high-regret interactions are precisely those in which mispredictions degraded closed-loop robot performance. We further introduce a probabilistic generalization of regret that calibrates failure detection across disparate deployment contexts and renders regret compatible with reward-based and reward-free (e.g., generative) planners. In simulated autonomous driving interactions and social navigation interactions deployed on hardware, we showcase that our system-level failure metric can be used offline to automatically extract closed-loop human-robot interactions that state-of-the-art generative human predictors and robot planners previously struggled with. We further find that the very presence of high-regret data during human predictor fine-tuning is highly predictive of robot re-deployment performance improvements. Fine-tuning with the informative but significantly smaller high-regret data (23% of deployment data) is competitive with fine-tuning on the full deployment dataset, indicating a promising avenue for efficiently mitigating system-level human-robot interaction failures. Project website: https://cmu-intentlab.github.io/not-all-errors/
Submitted 9 November, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
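The regret-based failure metric in the abstract above can be illustrated with the textbook definition of regret: the reward of the best decision in hindsight minus the reward of the decision actually executed. The sketch below assumes a finite set of candidate plans and a reward table evaluated against the observed human behavior. The plan names, reward values, and threshold are hypothetical, and the probabilistic generalization mentioned in the abstract is not covered.

```python
def regret(executed_plan, hindsight_rewards):
    """Reward of the best plan in hindsight minus reward of the executed plan.

    hindsight_rewards maps each candidate plan to its reward once the
    human's true behavior is known (illustrative values, not real data).
    """
    best_reward = max(hindsight_rewards.values())
    return best_reward - hindsight_rewards[executed_plan]


def high_regret_interactions(interactions, threshold):
    """Keep the interactions whose regret exceeds a chosen threshold."""
    return [
        name
        for name, (executed, rewards) in interactions.items()
        if regret(executed, rewards) > threshold
    ]


# Two toy interactions: (plan the robot executed, hindsight reward per plan).
interactions = {
    "merge_1": ("proceed", {"yield": 1.0, "proceed": 0.2, "stop": 0.6}),
    "merge_2": ("yield", {"yield": 0.9, "proceed": 0.85, "stop": 0.5}),
}
failures = high_regret_interactions(interactions, threshold=0.5)
```

In the second interaction a misprediction may well have occurred, but the executed plan was still the best one in hindsight, so its regret is zero. This is the sense in which only a subset of prediction errors matter at the system level.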
-
Segmentation of Kidney Tumors on Non-Contrast CT Images using Protuberance Detection Network
Authors:
Taro Hatsutani,
Akimichi Ichinose,
Keigo Nakamura,
Yoshiro Kitamura
Abstract:
Many renal cancers are incidentally found on non-contrast CT (NCCT) images. On contrast-enhanced CT (CECT) images, most kidney tumors, especially renal cancers, have different intensity values compared to normal tissues. However, on NCCT images, some tumors, called isodensity tumors, have intensity values similar to those of the surrounding normal tissues and can only be detected through a change in organ shape. Several deep learning methods that segment kidney tumors from CECT images have been proposed and have shown promising results. However, these methods fail to capture such changes in organ shape on NCCT images. In this paper, we present a novel framework that explicitly captures protruded regions in kidneys to enable better segmentation of kidney tumors. We created a synthetic mask dataset that simulates a protuberance, and trained a segmentation network to separate the protruded regions from the normal kidney regions. To achieve the segmentation of whole tumors, our framework consists of three networks. The first network is a conventional semantic segmentation network which extracts a kidney region mask and an initial tumor region mask. The second network, which we name the protuberance detection network, identifies the protruded regions from the kidney region mask. Given the initial tumor region mask and the protruded region mask, the last network fuses them and predicts the final kidney tumor mask accurately. The proposed method was evaluated on the publicly available KiTS19 dataset, which contains 108 NCCT images, and achieved a higher Dice score of 0.615 (+0.097) and sensitivity of 0.721 (+0.103) compared to 3D-UNet. To the best of our knowledge, this is the first deep learning method that is specifically designed for kidney tumor segmentation on NCCT images.
Submitted 7 December, 2023;
originally announced December 2023.
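The two metrics reported in the kidney tumor segmentation abstract above, Dice score and sensitivity, have standard definitions on binary masks. The sketch below assumes masks flattened to sequences of 0/1 voxel labels; the function name and the convention of returning 1.0 for empty masks are illustrative choices, not taken from the paper.

```python
def dice_and_sensitivity(pred, truth):
    """Compute Dice score and sensitivity for two binary masks.

    pred, truth: equal-length sequences of 0/1 voxel labels.
    Dice = 2*TP / (2*TP + FP + FN); sensitivity = TP / (TP + FN).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    dice_denominator = 2 * tp + fp + fn
    # Convention (an assumption): two empty masks count as a perfect match.
    dice = 2 * tp / dice_denominator if dice_denominator else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, sensitivity


# One true positive, one false positive, one false negative.
example_dice, example_sensitivity = dice_and_sensitivity([1, 1, 0, 0], [1, 0, 1, 0])
```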
-
Visual Grounding of Whole Radiology Reports for 3D CT Images
Authors:
Akimichi Ichinose,
Taro Hatsutani,
Keigo Nakamura,
Yoshiro Kitamura,
Satoshi Iizuka,
Edgar Simo-Serra,
Shoji Kido,
Noriyuki Tomiyama
Abstract:
Building a large-scale training dataset is an essential problem in the development of medical image recognition systems. Visual grounding techniques, which automatically associate objects in images with corresponding descriptions, can facilitate labeling of a large number of images. However, visual grounding of radiology reports for CT images remains challenging, because so many kinds of anomalies are detectable via CT imaging, and the resulting report descriptions are long and complex. In this paper, we present the first visual grounding framework designed for CT image and report pairs covering various body parts and diverse anomaly types. Our framework combines two components: 1) anatomical segmentation of images, and 2) report structuring. The anatomical segmentation provides multiple organ masks of given CT images, and helps the grounding model recognize detailed anatomies. The report structuring helps to accurately extract information regarding the presence, location, and type of each anomaly described in the corresponding reports. Given the two additional image/report features, the grounding model can achieve better localization. In the verification process, we constructed a large-scale dataset with region-description correspondence annotations for 10,410 studies of 7,321 unique patients. We evaluated our framework using grounding accuracy, the percentage of correctly localized anomalies, as a metric and demonstrated that the combination of the anatomical segmentation and the report structuring improves the performance by a large margin over the baseline model (66.0% vs 77.8%). Comparison with prior techniques also showed higher performance of our method.
Submitted 7 December, 2023;
originally announced December 2023.
-
Deception Game: Closing the Safety-Learning Loop in Interactive Robot Autonomy
Authors:
Haimin Hu,
Zixu Zhang,
Kensuke Nakamura,
Andrea Bajcsy,
Jaime F. Fisac
Abstract:
An outstanding challenge for the widespread deployment of robotic systems like autonomous vehicles is ensuring safe interaction with humans without sacrificing performance. Existing safety methods often neglect the robot's ability to learn and adapt at runtime, leading to overly conservative behavior. This paper proposes a new closed-loop paradigm for synthesizing safe control policies that explicitly account for the robot's evolving uncertainty and its ability to quickly respond to future scenarios as they arise, by jointly considering the physical dynamics and the robot's learning algorithm. We leverage adversarial reinforcement learning for tractable safety analysis under high-dimensional learning dynamics and demonstrate our framework's ability to work with both Bayesian belief propagation and implicit learning through large pre-trained neural trajectory predictors.
Submitted 1 November, 2023; v1 submitted 3 September, 2023;
originally announced September 2023.
-
Emergent Coordination through Game-Induced Nonlinear Opinion Dynamics
Authors:
Haimin Hu,
Kensuke Nakamura,
Kai-Chieh Hsu,
Naomi Ehrich Leonard,
Jaime Fernández Fisac
Abstract:
We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. However, choosing the appropriate model parameters automatically in general multi-agent settings remains a challenge. In this paper, we first propose a novel and principled procedure for synthesizing NOD based on the value functions of dynamic games conditioned on agents' intents. In particular, we provide for the two-player two-option case precise stability conditions for equilibria of the game-induced NOD based on the mismatch between agents' opinions and their game values. We then propose an optimization-based trajectory optimization algorithm that computes agents' policies guided by the evolution of opinions. The efficacy of our method is illustrated with a simulated toll station coordination example.
Submitted 5 April, 2023;
originally announced April 2023.
-
Fast and Multi-aspect Mining of Complex Time-stamped Event Streams
Authors:
Kota Nakamura,
Yasuko Matsubara,
Koki Kawabata,
Yuhei Umeda,
Yuichiro Wada,
Yasushi Sakurai
Abstract:
Given a huge, online stream of time-evolving events with multiple attributes, such as online shopping logs: (item, price, brand, time), and local mobility activities: (pick-up and drop-off locations, time), how can we summarize large, dynamic high-order tensor streams? How can we see any hidden patterns, rules, and anomalies? Our answer is to focus on two types of patterns, i.e., "regimes" and "components", for which we present CubeScope, an efficient and effective method over high-order tensor streams. Specifically, it identifies any sudden discontinuity and recognizes distinct dynamical patterns, "regimes" (e.g., weekday/weekend/holiday patterns). In each regime, it also performs multi-way summarization for all attributes (e.g., item, price, brand, and time) and discovers hidden "components" representing latent groups (e.g., item/brand groups) and their relationship. Thanks to its concise but effective summarization, CubeScope can also detect the sudden appearance of anomalies and identify the types of anomalies that occur in practice. Our proposed method has the following properties: (a) Effective: it captures dynamical multi-aspect patterns, i.e., regimes and components, and statistically summarizes all the events; (b) General: it is practical for successful application to data compression, pattern discovery, and anomaly detection on various types of tensor streams; (c) Scalable: our algorithm does not depend on the length of the data stream and its dimensionality. Extensive experiments on real datasets demonstrate that CubeScope finds meaningful patterns and anomalies correctly, and consistently outperforms the state-of-the-art methods as regards accuracy and execution speed.
Submitted 5 July, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System
Authors:
Takenori Yoshimura,
Shinji Takaki,
Kazuhiro Nakamura,
Keiichiro Oura,
Yukiya Hono,
Kei Hashimoto,
Yoshihiko Nankaku,
Keiichi Tokuda
Abstract:
This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis. Since the mel-cepstral synthesis filter is explicitly embedded in neural waveform models in the proposed system, both voice characteristics and the pitch of synthesized speech are highly controllable via a frequency warping parameter and fundamental frequency, respectively. We implement the mel-cepstral synthesis filter as a differentiable and GPU-friendly module to enable the acoustic and waveform models in the proposed system to be simultaneously optimized in an end-to-end manner. Experiments show that the proposed system improves speech quality over a baseline system while maintaining controllability. The core PyTorch modules used in the experiments will be publicly available on GitHub.
Submitted 21 November, 2022;
originally announced November 2022.
-
Generalization Analysis on Learning with a Concurrent Verifier
Authors:
Masaaki Nishino,
Kengo Nakamura,
Norihito Yasuda
Abstract:
Machine learning technologies have been used in a wide range of practical systems. In practical situations, it is natural to expect the input-output pairs of a machine learning model to satisfy some requirements. However, it is difficult to obtain a model that satisfies requirements by just learning from examples. A simple solution is to add a module that checks whether the input-output pairs meet the requirements and then modifies the model's outputs. Such a module, which we call a concurrent verifier (CV), can give a certification, although how the generalizability of the machine learning model changes when using a CV is unclear. This paper gives a generalization analysis of learning with a CV. We analyze how the learnability of a machine learning model changes with a CV and show a condition under which we can obtain a guaranteed hypothesis using a verifier only at inference time. We also show that typical error bounds based on Rademacher complexity will be no larger than those of the original model when using a CV in multi-class classification and structured prediction settings.
Submitted 11 October, 2022;
originally announced October 2022.
-
Online Update of Safety Assurances Using Confidence-Based Predictions
Authors:
Kensuke Nakamura,
Somil Bansal
Abstract:
Robots such as autonomous vehicles and assistive manipulators are increasingly operating in dynamic environments and close physical proximity to people. In such scenarios, the robot can leverage a human motion predictor to predict their future states and plan safe and efficient trajectories. However, no model is ever perfect -- when the observed human behavior deviates from the model predictions, the robot might plan unsafe maneuvers. Recent works have explored maintaining a confidence parameter in the human model to overcome this challenge, wherein the predicted human actions are tempered online based on the likelihood of the observed human action under the prediction model. This has opened up a new research challenge, i.e., how to compute the future human states online as the confidence parameter changes? In this work, we propose a Hamilton-Jacobi (HJ) reachability-based approach to overcome this challenge. Treating the confidence parameter as a virtual state in the system, we compute a parameter-conditioned forward reachable tube (FRT) that provides the future human states as a function of the confidence parameter. Online, as the confidence parameter changes, we can simply query the corresponding FRT, and use it to update the robot plan. Computing parameter-conditioned FRT corresponds to an (offline) high-dimensional reachability problem, which we solve by leveraging recent advances in data-driven reachability analysis. Overall, our framework enables online maintenance and updates of safety assurances in human-robot interaction scenarios, even when the human prediction model is incorrect. We demonstrate our approach in several safety-critical autonomous driving scenarios, involving a state-of-the-art deep learning-based prediction model.
Submitted 5 June, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Parameter-Conditioned Reachable Sets for Updating Safety Assurances Online
Authors:
Javier Borquez,
Kensuke Nakamura,
Somil Bansal
Abstract:
Hamilton-Jacobi (HJ) reachability analysis is a powerful tool for analyzing the safety of autonomous systems. However, the provided safety assurances are often predicated on the assumption that once deployed, the system or its environment does not evolve. Online, however, an autonomous system might experience changes in system dynamics, control authority, external disturbances, and/or the surrounding environment, requiring updated safety assurances. Rather than restarting the safety analysis from scratch, which can be time-consuming and often intractable to perform online, we propose to compute parameter-conditioned reachable sets. Assuming expected system and environment changes can be parameterized, we treat these parameters as virtual states in the system and leverage recent advances in high-dimensional reachability analysis to solve the corresponding reachability problem offline. This results in a family of reachable sets that is parameterized by the environment and system factors. Online, as these factors change, the system can simply query the corresponding safety function from this family to ensure system safety, enabling a real-time update of the safety assurances. Through various simulation studies, we demonstrate the capability of our approach in maintaining system safety despite the system and environment evolution.
Submitted 22 April, 2024; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Decentralized Learning With Limited Communications for Multi-robot Coverage of Unknown Spatial Fields
Authors:
Kensuke Nakamura,
María Santos,
Naomi Ehrich Leonard
Abstract:
This paper presents an algorithm for a team of mobile robots to simultaneously learn a spatial field over a domain and spatially distribute themselves to optimally cover it. Drawing from previous approaches that estimate the spatial field through a centralized Gaussian process, this work leverages the spatial structure of the coverage problem and presents a decentralized strategy where samples are aggregated locally by establishing communications through the boundaries of a Voronoi partition. We present an algorithm whereby each robot runs a local Gaussian process calculated from its own measurements and those provided by its Voronoi neighbors, which are incorporated into the individual robot's Gaussian process only if they provide sufficiently novel information. The performance of the algorithm is evaluated in simulation and compared with centralized approaches.
Submitted 2 August, 2022;
originally announced August 2022.
-
Class-Difficulty Based Methods for Long-Tailed Visual Recognition
Authors:
Saptarshi Sinha,
Hiroki Ohashi,
Katsuyuki Nakamura
Abstract:
Long-tailed datasets are very frequently encountered in real-world use cases where a few classes or categories (known as majority or head classes) have a higher number of data samples compared to the other classes (known as minority or tail classes). Training deep neural networks on such datasets gives results biased towards the head classes. So far, researchers have come up with multiple weighted loss and data re-sampling techniques in efforts to reduce the bias. However, most such techniques assume that the tail classes are always the most difficult classes to learn and therefore need more weight or attention. Here, we argue that this assumption might not always hold true. Therefore, we propose a novel approach to dynamically measure the instantaneous difficulty of each class during the training phase of the model. Further, we use the difficulty measures of each class to design a novel weighted loss technique called "class-wise difficulty based weighted (CDB-W) loss" and a novel data sampling technique called "class-wise difficulty based sampling (CDB-S)". To verify the wide-scale usability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. Results verified that CDB-W loss and CDB-S could achieve state-of-the-art results on many class-imbalanced datasets such as ImageNet-LT, LVIS and EGTEA, that resemble real-world use cases.
Submitted 22 August, 2022; v1 submitted 29 July, 2022;
originally announced July 2022.
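The idea of weighting classes by measured difficulty rather than by sample count, described in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions: difficulty is taken to be one minus the current per-class accuracy, and the weight is that difficulty raised to a hypothetical exponent `tau`. The abstract does not specify either choice, so the function names and formulas here should not be read as the exact CDB-W definition.

```python
import math


def difficulty_weights(class_accuracy, tau=1.0):
    """Map per-class accuracy (measured during training) to loss weights.

    Assumption: difficulty = 1 - accuracy, weight = difficulty ** tau.
    """
    return [(1.0 - acc) ** tau for acc in class_accuracy]


def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy where each sample is scaled by its class weight.

    probs: list of per-class probability lists, one per sample.
    labels: list of integer class indices.
    """
    total = 0.0
    for sample_probs, label in zip(probs, labels):
        total += -weights[label] * math.log(sample_probs[label])
    return total / len(labels)


# Class 0 is currently easy (90% accuracy); class 1 is hard (50% accuracy),
# so class 1 receives the larger weight regardless of its sample count.
weights = difficulty_weights([0.9, 0.5], tau=1.0)
loss = weighted_cross_entropy([[0.5, 0.5]], [1], weights)
```

Because the accuracies are re-measured as training proceeds, the weights change over time, which is what distinguishes this scheme from static frequency-based re-weighting.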
-
C-SENN: Contrastive Self-Explaining Neural Network
Authors:
Yoshihide Sawada,
Keigo Nakamura
Abstract:
In this study, we use a self-explaining neural network (SENN), which learns unsupervised concepts, to acquire concepts that are easy for people to understand automatically. In concept learning, the hidden layer retains verbalizable features relevant to the output, which is crucial when adapting to real-world environments where explanations are required. However, it is known that the interpretability of concepts output by SENN is reduced in general settings, such as autonomous driving scenarios. Thus, this study combines contrastive learning with concept learning to improve the readability of concepts and the accuracy of tasks. We call this model Contrastive Self-Explaining Neural Network (C-SENN).
Submitted 26 June, 2022; v1 submitted 20 June, 2022;
originally announced June 2022.
-
Individual health-disease phase diagrams for disease prevention based on machine learning
Authors:
Kazuki Nakamura,
Eiichiro Uchino,
Noriaki Sato,
Ayano Araki,
Kei Terayama,
Ryosuke Kojima,
Koichi Murashita,
Ken Itoh,
Tatsuya Mikami,
Yoshinori Tamada,
Yasushi Okuno
Abstract:
Early disease detection and prevention methods based on effective interventions are gaining attention. Machine learning technology has enabled precise disease prediction by capturing individual differences in multivariate data. Progress in precision medicine has revealed that substantial heterogeneity exists in health data at the individual level and that complex health factors are involved in the development of chronic diseases. However, it remains a challenge to identify individual physiological state changes in cross-disease onset processes because of the complex relationships among multiple biomarkers. Here, we present the health-disease phase diagram (HDPD), which represents a personal health state by visualizing the boundary values of multiple biomarkers that fluctuate early in the disease progression process. In HDPDs, future onset predictions are represented by perturbing multiple biomarker values while accounting for dependencies among variables. We constructed HDPDs for 11 non-communicable diseases (NCDs) from a longitudinal health checkup cohort of 3,238 individuals, comprising 3,215 measurement items and genetic data. Improvement of biomarker values to the non-onset region in HDPD significantly prevented future disease onset in 7 out of 11 NCDs. Our results demonstrate that HDPDs can represent individual physiological states in the onset process and be used as intervention goals for disease prevention.
Submitted 7 July, 2022; v1 submitted 31 May, 2022;
originally announced May 2022.
-
HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data
Authors:
Kai Nakamura,
Sharon Levy,
Yi-Lin Tuan,
Wenhu Chen,
William Yang Wang
Abstract:
A pressing challenge in current dialogue systems is to successfully converse with users on topics with information distributed across different modalities. Previous work in multiturn dialogue systems has primarily focused on either text or table information. In more realistic scenarios, having a joint understanding of both is critical as knowledge is typically distributed over both unstructured and structured forms. We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables. The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions. We propose retrieval, system state tracking, and dialogue response generation tasks for our dataset and conduct baseline experiments for each. Our results show that there is still ample opportunity for improvement, demonstrating the importance of building stronger dialogue systems that can reason over the complex setting of information-seeking dialogue grounded on tables and text.
Submitted 27 April, 2022;
originally announced April 2022.
-
OpenKBP-Opt: An international and reproducible evaluation of 76 knowledge-based planning pipelines
Authors:
Aaron Babier,
Rafid Mahmood,
Binghao Zhang,
Victor G. L. Alves,
Ana Maria Barragán-Montero,
Joel Beaudry,
Carlos E. Cardenas,
Yankui Chang,
Zijie Chen,
Jaehee Chun,
Kelly Diaz,
Harold David Eraso,
Erik Faustmann,
Sibaji Gaj,
Skylar Gay,
Mary Gronberg,
Bingqi Guo,
Junjun He,
Gerd Heilemann,
Sanchit Hira,
Yuliang Huang,
Fuxin Ji,
Dashan Jiang,
Jean Carlo Jimenez Giraldo,
Hoyeon Lee
, et al. (34 additional authors not shown)
Abstract:
We establish an open framework for developing plan optimization models for knowledge-based planning (KBP) in radiotherapy. Our framework includes reference plans for 100 patients with head-and-neck cancer and high-quality dose predictions from 19 KBP models that were developed by different research groups during the OpenKBP Grand Challenge. The dose predictions were input to four optimization models to form 76 unique KBP pipelines that generated 7600 plans. The predictions and plans were compared to the reference plans via: dose score, which is the average mean absolute voxel-by-voxel difference in dose a model achieved; the deviation in dose-volume histogram (DVH) criterion; and the frequency of clinical planning criteria satisfaction. We also performed a theoretical investigation to justify our dose mimicking models. The range in rank order correlation of the dose score between predictions and their KBP pipelines was 0.50 to 0.62, which indicates that the quality of the predictions is generally positively correlated with the quality of the plans. Additionally, compared to the input predictions, the KBP-generated plans performed significantly better (P<0.05; one-sided Wilcoxon test) on 18 of 23 DVH criteria. Similarly, each optimization model generated plans that satisfied a higher percentage of criteria than the reference plans. Lastly, our theoretical investigation demonstrated that the dose mimicking models generated plans that are also optimal for a conventional planning model. This was the largest international effort to date for evaluating the combination of KBP prediction and optimization models. In the interest of reproducibility, our data and code are freely available at https://github.com/ababier/open-kbp-opt.
Submitted 16 February, 2022;
originally announced February 2022.
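The dose score defined in the abstract above, the average over patients of the mean absolute voxel-by-voxel dose difference, can be sketched as follows. The data layout (one flat list of voxel doses per patient) and the function name are assumptions made for illustration; the project's own code at the linked repository may organize this differently, for example by restricting the comparison to a patient mask.

```python
def dose_score(evaluated_doses, reference_doses):
    """Average, over patients, of the mean absolute voxel-wise dose difference.

    evaluated_doses, reference_doses: lists with one flat list of voxel
    doses per patient, in matching order and voxel layout.
    """
    per_patient_errors = []
    for evaluated, reference in zip(evaluated_doses, reference_doses):
        if len(evaluated) != len(reference):
            raise ValueError("dose arrays for a patient must have equal length")
        absolute_errors = [abs(e - r) for e, r in zip(evaluated, reference)]
        per_patient_errors.append(sum(absolute_errors) / len(absolute_errors))
    return sum(per_patient_errors) / len(per_patient_errors)


# Pipeline counts stated in the abstract: 19 prediction models, each fed to
# 4 optimization models, each pipeline run on 100 patients.
num_pipelines = 19 * 4
num_plans = num_pipelines * 100
```

A lower score means the evaluated dose is closer to the reference plan.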
-
Concept Bottleneck Model with Additional Unsupervised Concepts
Authors:
Yoshihide Sawada,
Keigo Nakamura
Abstract:
With the increasing demands for accountability, interpretability is becoming an essential capability for real-world AI applications. However, most methods utilize post-hoc approaches rather than training the interpretable model. In this article, we propose a novel interpretable model based on the concept bottleneck model (CBM). CBM uses concept labels to train an intermediate layer as the additional visible layer. However, because the number of concept labels restricts the dimension of this layer, it is difficult to obtain high accuracy with a small number of labels. To address this issue, we integrate supervised concepts with unsupervised ones trained with self-explaining neural networks (SENNs). By seamlessly training these two types of concepts while reducing the amount of computation, we can obtain both supervised and unsupervised concepts simultaneously, even for large-sized images. We refer to the proposed model as the concept bottleneck model with additional unsupervised concepts (CBM-AUC). We experimentally confirmed that the proposed model outperformed CBM and SENN. We also visualized the saliency map of each concept and confirmed that it was consistent with the semantic meanings.
Submitted 3 February, 2022;
originally announced February 2022.
-
The Application of Zig-Zag Sampler in Sequential Markov Chain Monte Carlo
Authors:
Yu Han,
Kazuyuki Nakamura
Abstract:
Particle filtering methods are widely applied to sequential state estimation in nonlinear non-Gaussian state space models. However, traditional particle filtering methods suffer from weight degeneracy in high-dimensional state space models. Currently, there are many methods to improve the performance of particle filtering in high-dimensional state space models. Among these, a more advanced method is to construct the Sequential Markov chain Monte Carlo (SMCMC) framework by implementing the Composite Metropolis-Hastings (MH) Kernel. In this paper, we propose to discretize the Zig-Zag Sampler and apply it in the refinement stage of the Composite MH Kernel within the SMCMC framework, which implements the invertible particle flow in the joint draw stage. We evaluate the performance of the proposed method through numerical experiments on challenging, complex high-dimensional filtering examples. Numerical experiments show that in high-dimensional state estimation examples, the proposed method improves estimation accuracy and increases the acceptance ratio compared with state-of-the-art filtering methods.
Submitted 17 November, 2021;
originally announced November 2021.
-
Solving Rep-tile by Computers: Performance of Solvers and Analyses of Solutions
Authors:
Mutsunori Banbara,
Kenji Hashimoto,
Takashi Horiyama,
Shin-ichi Minato,
Kakeru Nakamura,
Masaaki Nishino,
Masahiko Sakai,
Ryuhei Uehara,
Yushi Uno,
Norihito Yasuda
Abstract:
A rep-tile is a polygon that can be dissected into smaller copies (of the same size) of the original polygon. A polyomino is a polygon that is formed by joining one or more unit squares edge to edge. These two notions were first introduced and investigated by Solomon W. Golomb in the 1950s and popularized by Martin Gardner in the 1960s. Since then, dozens of studies have been made in communities of recreational mathematics and puzzles. In this study, we first focus on the specific rep-tiles that have been investigated in these communities. Since the notion of rep-tiles is so simple that it can be formulated mathematically in a natural way, we can apply a representative puzzle solver, a MIP solver, and SAT-based solvers to the rep-tile problem in common. Comparing their performance, we conclude that the puzzle solver is the weakest while the SAT-based solvers are the strongest in the context of simple puzzle solving. We then turn to analyses of the specific rep-tiles. Using some properties of the rep-tile patterns found by a solver, we can complete analyses of specific rep-tiles up to certain sizes. That is, up to certain sizes, we can determine the existence of solutions, clarify the number of solutions, or enumerate all the solutions for each size. In the last case, we find new series of solutions for the rep-tiles which have never been found in the communities.
Submitted 7 October, 2021;
originally announced October 2021.
-
Differentiable Equilibrium Computation with Decision Diagrams for Stackelberg Models of Combinatorial Congestion Games
Authors:
Shinsaku Sakaue,
Kengo Nakamura
Abstract:
We address Stackelberg models of combinatorial congestion games (CCGs); we aim to optimize the parameters of CCGs so that the selfish behavior of non-atomic players attains desirable equilibria. This model is essential for designing such social infrastructures as traffic and communication networks. Nevertheless, computational approaches to the model have not been thoroughly studied due to two difficulties: (I) bilevel-programming structures and (II) the combinatorial nature of CCGs. We tackle them by carefully combining (I) the idea of differentiable optimization and (II) data structures called zero-suppressed binary decision diagrams (ZDDs), which can compactly represent sets of combinatorial strategies. Our algorithm numerically approximates the equilibria of CCGs, which we can differentiate with respect to parameters of CCGs by automatic differentiation. With the resulting derivatives, we can apply gradient-based methods to Stackelberg models of CCGs. Our method is tailored to induce Nesterov's acceleration and can fully utilize the empirical compactness of ZDDs. These technical advantages enable us to deal with CCGs with a vast number of combinatorial strategies. Experiments on real-world network design instances demonstrate the practicality of our method.
Submitted 17 October, 2021; v1 submitted 4 October, 2021;
originally announced October 2021.
-
SHARP: Shielding-Aware Robust Planning for Safe and Efficient Human-Robot Interaction
Authors:
Haimin Hu,
Kensuke Nakamura,
Jaime F. Fisac
Abstract:
Jointly achieving safety and efficiency in human-robot interaction (HRI) settings is a challenging problem, as the robot's planning objectives may be at odds with the human's own intent and expectations. Recent approaches ensure safe robot operation in uncertain environments through a supervisory control scheme, sometimes called "shielding", which overrides the robot's nominal plan with a safety fallback strategy when a safety-critical event is imminent. These reactive "last-resort" strategies (typically in the form of aggressive emergency maneuvers) focus on preserving safety without efficiency considerations; when the nominal planner is unaware of possible safety overrides, shielding can be activated more frequently than necessary, leading to degraded performance. In this work, we propose a new shielding-based planning approach that allows the robot to plan efficiently by explicitly accounting for possible future shielding events. Leveraging recent work on Bayesian human motion prediction, the resulting robot policy proactively balances nominal performance with the risk of high-cost emergency maneuvers triggered by low-probability human behaviors. We formalize Shielding-Aware Robust Planning (SHARP) as a stochastic optimal control problem and propose a computationally efficient framework for finding tractable approximate solutions at runtime. Our method outperforms the shielding-agnostic motion planning baseline (equipped with the same human intent inference scheme) on simulated driving examples with human trajectories taken from the recently released Waymo Open Motion Dataset.
Submitted 10 March, 2022; v1 submitted 2 October, 2021;
originally announced October 2021.
-
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Authors:
Katsuyuki Nakamura,
Hiroki Ohashi,
Mitsuhiro Okada
Abstract:
Automatically describing video, or video captioning, has been widely studied in the multimedia field. This paper proposes a new task of sensor-augmented egocentric-video captioning, a newly constructed dataset for it called MMAC Captions, and a method for the newly proposed task that effectively utilizes multi-modal data of video and motion sensors, or inertial measurement units (IMUs). While conventional video captioning tasks have difficulty in dealing with detailed descriptions of human activities due to the limited view of a fixed camera, egocentric vision has greater potential to be used for generating the finer-grained descriptions of human activities on the basis of a much closer view. In addition, we utilize wearable-sensor data as auxiliary information to mitigate the inherent problems in egocentric vision: motion blur, self-occlusion, and out-of-camera-range activities. We propose a method for effectively utilizing the sensor data in combination with the video data on the basis of an attention mechanism that dynamically determines the modality that requires more attention, taking the contextual information into account. We compared the proposed sensor-fusion method with strong baselines on the MMAC Captions dataset and found that using sensor data as supplementary information to the egocentric-video data was beneficial, and that our proposed method outperformed the strong baselines, demonstrating the effectiveness of the proposed method.
Submitted 7 September, 2021;
originally announced September 2021.
-
Generative Adversarial Networks via a Composite Annealing of Noise and Diffusion
Authors:
Kensuke Nakamura,
Simon Korman,
Byung-Woo Hong
Abstract:
A generative adversarial network (GAN) is a framework for generating fake data using a set of real examples. However, GANs are unstable in the training stage. To stabilize GANs, noise injection has been used to enlarge the overlap of the real and fake distributions at the cost of increasing variance. Diffusion (or smoothing) may reduce the intrinsic underlying dimensionality of data, but it suppresses the capability of GANs to learn high-frequency information in the training procedure. Based on these observations, we propose a data representation for GAN training, called noisy scale-space (NSS), that recursively applies smoothing with balanced noise to data in order to replace the high-frequency information with random data, leading to a coarse-to-fine training of GANs. We experiment with NSS using DCGAN and StyleGAN2 on benchmark datasets, in which the NSS-based GANs outperform the state of the art in most cases.
Submitted 31 July, 2022; v1 submitted 1 May, 2021;
originally announced May 2021.
-
GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback
Authors:
Jie Huang,
Rongshun Juan,
Randy Gomez,
Keisuke Nakamura,
Qixin Sha,
Bo He,
Guangliang Li
Abstract:
Deep reinforcement learning (DRL) has achieved great successes in many simulated tasks. The sample inefficiency problem makes applying traditional DRL methods to real-world robots a great challenge. Generative Adversarial Imitation Learning (GAIL) -- a general model-free imitation learning method, allows robots to directly learn policies from expert trajectories in large environments. However, GAIL shares the limitation of other imitation learning methods that they can seldom surpass the performance of demonstrations. In this paper, to address the limit of GAIL, we propose GAN-Based Interactive Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback by combining the advantages of GAIL and interactive reinforcement learning. We tested our proposed method in six physics-based control tasks, ranging from simple low-dimensional control tasks -- Cart Pole and Mountain Car, to difficult high-dimensional tasks -- Inverted Double Pendulum, Lunar Lander, Hopper and HalfCheetah. Our results suggest that with both optimal and suboptimal demonstrations, a GAIRL agent can always learn a more stable policy with optimal or close to optimal performance, while the performance of the GAIL agent is upper bounded by the performance of demonstrations or even worse than it. In addition, our results indicate the reason that GAIRL is superior over GAIL is the complementary effect of demonstrations and human evaluative feedback.
Submitted 13 April, 2021;
originally announced April 2021.
-
Regularization in network optimization via trimmed stochastic gradient descent with noisy label
Authors:
Kensuke Nakamura,
Bong-Soo Sohn,
Kyoung-Jae Won,
Byung-Woo Hong
Abstract:
Regularization is essential for avoiding over-fitting to training data in network optimization, leading to better generalization of the trained networks. The label noise provides a strong implicit regularization by replacing the target ground truth labels of training examples by uniform random labels. However, it can cause undesirable misleading gradients due to the large loss associated with incorrect labels. We propose a first-order optimization method (Label-Noised Trim-SGD) that uses the label noise with the example trimming in order to remove the outliers based on the loss. The proposed algorithm is simple yet enables us to impose a large label-noise and obtain a better regularization effect than the original methods. The quantitative analysis is performed by comparing the behavior of the label noise, the example trimming, and the proposed algorithm. We also present empirical results that demonstrate the effectiveness of our algorithm using the major benchmarks and the fundamental networks, where our method has successfully outperformed the state-of-the-art optimization methods.
Submitted 2 May, 2022; v1 submitted 20 December, 2020;
originally announced December 2020.
-
Health improvement framework for planning actionable treatment process using surrogate Bayesian model
Authors:
Kazuki Nakamura,
Ryosuke Kojima,
Eiichiro Uchino,
Koichi Murashita,
Ken Itoh,
Shigeyuki Nakaji,
Yasushi Okuno
Abstract:
Clinical decision making regarding treatments based on personal characteristics leads to effective health improvements. Machine learning (ML) has been the primary concern of diagnosis support according to comprehensive patient information. However, the remaining prominent issue is the development of objective treatment processes in clinical situations. This study proposes a novel framework to plan treatment processes in a data-driven manner. A key point of the framework is the evaluation of the "actionability" for personal health improvements by using a surrogate Bayesian model in addition to a high-performance nonlinear ML model. We first evaluated the framework from the viewpoint of its methodology using a synthetic dataset. Subsequently, the framework was applied to an actual health checkup dataset comprising data from 3,132 participants, to improve systolic blood pressure values at the individual level. We confirmed that the computed treatment processes are actionable and consistent with clinical knowledge for lowering blood pressure. These results demonstrate that our framework could contribute toward decision making in the medical field, providing clinicians with deeper insights.
Submitted 13 November, 2020; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Ensemble Kalman Variational Objectives: Nonlinear Latent Trajectory Inference with A Hybrid of Variational Inference and Ensemble Kalman Filter
Authors:
Tsuyoshi Ishizone,
Tomoyuki Higuchi,
Kazuyuki Nakamura
Abstract:
Variational inference (VI) combined with Bayesian nonlinear filtering produces state-of-the-art results for latent time-series modeling. A body of recent work has focused on sequential Monte Carlo (SMC) and its variants, e.g., forward filtering backward simulation (FFBSi). Although these studies have succeeded, serious problems remain in particle degeneracy and biased gradient estimators. In this paper, we propose Ensemble Kalman Variational Objective (EnKO), a hybrid method of VI and the ensemble Kalman filter (EnKF), to infer state space models (SSMs). Our proposed method can efficiently identify latent dynamics because of its particle diversity and unbiased gradient estimators. We demonstrate that our EnKO outperforms SMC-based methods in terms of predictive ability and particle efficiency for three benchmark nonlinear system identification tasks.
Submitted 9 November, 2021; v1 submitted 17 October, 2020;
originally announced October 2020.
-
Class-Wise Difficulty-Balanced Loss for Solving Class-Imbalance
Authors:
Saptarshi Sinha,
Hiroki Ohashi,
Katsuyuki Nakamura
Abstract:
Class-imbalance is one of the major challenges in real world datasets, where a few classes (called majority classes) constitute much more data samples than the rest (called minority classes). Learning deep neural networks using such datasets leads to performances that are typically biased towards the majority classes. Most of the prior works try to solve class-imbalance by assigning more weights to the minority classes in various manners (e.g., data re-sampling, cost-sensitive learning). However, we argue that the number of available training data may not be always a good clue to determine the weighting strategy because some of the minority classes might be sufficiently represented even by a small number of training data. Overweighting samples of such classes can lead to drop in the model's overall performance. We claim that the 'difficulty' of a class as perceived by the model is more important to determine the weighting. In this light, we propose a novel loss function named Class-wise Difficulty-Balanced loss, or CDB loss, which dynamically distributes weights to each sample according to the difficulty of the class that the sample belongs to. Note that the assigned weights dynamically change as the 'difficulty' for the model may change with the learning progress. Extensive experiments are conducted on both image (artificially induced class-imbalanced MNIST, long-tailed CIFAR and ImageNet-LT) and video (EGTEA) datasets. The results show that CDB loss consistently outperforms the recently proposed loss functions on class-imbalanced datasets irrespective of the data type (i.e., video or image).
Submitted 5 October, 2020;
originally announced October 2020.
-
Automatic Segmentation, Localization, and Identification of Vertebrae in 3D CT Images Using Cascaded Convolutional Neural Networks
Authors:
Naoto Masuzawa,
Yoshiro Kitamura,
Keigo Nakamura,
Satoshi Iizuka,
Edgar Simo-Serra
Abstract:
This paper presents a method for automatic segmentation, localization, and identification of vertebrae in arbitrary 3D CT images. Many previous works do not perform the three tasks simultaneously, even though they require a priori knowledge of which part of the anatomy is visible in the 3D CT images. Our method tackles all these tasks in a single multi-stage framework without any assumptions. In the first stage, we train a 3D Fully Convolutional Network to find the bounding boxes of the cervical, thoracic, and lumbar vertebrae. In the second stage, we train an iterative 3D Fully Convolutional Network to segment individual vertebrae in the bounding box. The input to the second network has an auxiliary channel in addition to the 3D CT images. Given the segmented vertebra regions in the auxiliary channel, the network outputs the next vertebra. The proposed method is evaluated in terms of segmentation, localization, and identification accuracy on two public datasets: 15 3D CT images from the MICCAI CSI 2014 workshop challenge and 302 3D CT images with various pathologies introduced in [1]. Our method achieved a mean Dice score of 96%, a mean localization error of 8.3 mm, and a mean identification rate of 84%. In summary, our method achieved better performance than all existing works in all three metrics.
Submitted 29 September, 2020;
originally announced September 2020.
-
Hibikino-Musashi@Home 2019 Team Description Paper
Authors:
Yuichiro Tanaka,
Yutaro Ishida,
Yushi Abe,
Tomohiro Ono,
Kohei Kabashima,
Takuma Sakata,
Masashi Fukuyado,
Fuyuki Muto,
Takumi Yoshii,
Kazuki Kanamaru,
Daichi Kamimura,
Kentaro Nakamura,
Yuta Nishimura,
Takashi Morie,
Hakaru Tamukoh
Abstract:
Our team, Hibikino-Musashi@Home (HMA), was founded in 2010. It is based in the Kitakyushu Science and Research Park, Japan. Since 2010, we have participated in the RoboCup@Home Japan Open competition open platform league annually. We have also participated in the RoboCup 2017 Nagoya as an open platform league and domestic standard platform league teams, and in the RoboCup 2018 Montreal as a domestic standard platform league team. Currently, we have 23 members from seven different laboratories based in Kyushu Institute of Technology. This paper aims to introduce the activities that are performed by our team and the technologies that we use.
Submitted 29 May, 2020;
originally announced June 2020.
-
Hibikino-Musashi@Home 2020 Team Description Paper
Authors:
Tomohiro Ono,
Yuichiro Tanaka,
Yutaro Ishida,
Yushi Abe,
Kazuki Kanamaru,
Daichi Kamimura,
Kentaro Nakamura,
Yuta Nishimura,
Shoshi Tokuno,
Yuya Mii,
Morio Yamauchi,
Yuichiro Uemura,
Takunori Hashimoto,
Yugo Nakamura,
Issei Uchino,
Daiju Kanaoka,
Takeru Hanyu,
Kenta Tsukamoto,
Takashi Morie,
Hakaru Tamukoh
Abstract:
Our team, Hibikino-Musashi@Home (HMA), was founded in 2010. It is based in Japan in the Kitakyushu Science and Research Park. Since 2010, we have annually participated in the RoboCup@Home Japan Open competition in the open platform league (OPL). We participated as an open platform league team in the 2017 Nagoya RoboCup competition and as a domestic standard platform league (DSPL) team in the 2017 Nagoya, 2018 Montreal, and 2019 Sydney RoboCup competitions. We also participated in the World Robot Challenge (WRC) 2018 in the service-robotics category of the partner-robot challenge (real space) and won first place. Currently, we have 20 members from eight different laboratories within the Kyushu Institute of Technology. In this paper, we introduce the activities that have been performed by our team and the technologies that we use.
Submitted 29 May, 2020;
originally announced May 2020.
-
The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge: A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset
Authors:
Arjun D. Desai,
Francesco Caliva,
Claudia Iriondo,
Naji Khosravan,
Aliasghar Mortazi,
Sachin Jambawalikar,
Drew Torigian,
Jutta Ellermann,
Mehmet Akcakaya,
Ulas Bagci,
Radhika Tibrewala,
Io Flament,
Matthew O'Brien,
Sharmila Majumdar,
Mathias Perslev,
Akshay Pai,
Christian Igel,
Erik B. Dam,
Sibaji Gaj,
Mingrui Yang,
Kunio Nakamura,
Xiaojuan Li,
Cem M. Deniz,
Vladimir Juras,
Ravinder Regatte
, et al. (4 additional authors not shown)
Abstract:
Purpose: To organize a knee MRI segmentation challenge for characterizing the semantic and clinical efficacy of automatic segmentation methods relevant for monitoring osteoarthritis progression.
Methods: A dataset partition consisting of 3D knee MRI from 88 subjects at two timepoints with ground-truth articular (femoral, tibial, patellar) cartilage and meniscus segmentations was standardized. Challenge submissions and a majority-vote ensemble were evaluated using Dice score, average symmetric surface distance, volumetric overlap error, and coefficient of variation on a hold-out test set. Similarities in network segmentations were evaluated using pairwise Dice correlations. Articular cartilage thickness was computed per-scan and longitudinally. Correlation between thickness error and segmentation metrics was measured using Pearson's coefficient. Two empirical upper bounds for ensemble performance were computed using combinations of model outputs that consolidated true positives and true negatives.
Results: Six teams (T1-T6) submitted entries for the challenge. No significant differences were observed across all segmentation metrics for all tissues (p=1.0) among the four top-performing networks (T2, T3, T4, T6). Dice correlations between network pairs were high (>0.85). Per-scan thickness errors were negligible among T1-T4 (p=0.99) and longitudinal changes showed minimal bias (<0.03mm). Low correlations (<0.41) were observed between segmentation metrics and thickness error. The majority-vote ensemble was comparable to top performing networks (p=1.0). Empirical upper bound performances were similar for both combinations (p=1.0).
Conclusion: Diverse networks learned to segment the knee similarly where high segmentation accuracy did not correlate to cartilage thickness accuracy. Voting ensembles did not outperform individual networks but may help regularize individual models.
Submitted 26 May, 2020; v1 submitted 29 April, 2020;
originally announced April 2020.
-
Stochastic batch size for adaptive regularization in deep network optimization
Authors:
Kensuke Nakamura,
Stefano Soatto,
Byung-Woo Hong
Abstract:
We propose a first-order stochastic optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework. The adaptive regularization is imposed by stochastic process in determining batch size for each model parameter at each optimization iteration. The stochastic batch size is determined by the update probability of each parameter following a distribution of gradient norms in consideration of their local and global properties in the neural network architecture where the range of gradient norms may vary within and across layers. We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets. The quantitative evaluation indicates that our algorithm outperforms the state-of-the-art optimization algorithms in generalization while providing less sensitivity to the selection of batch size which often plays a critical role in optimization, thus achieving more robustness to the selection of regularity.
Submitted 14 April, 2020;
originally announced April 2020.
-
Deep learning-based topological optimization for representing a user-specified design area
Authors:
Keigo Nakamura,
Yoshiro Suzuki
Abstract:
Presently, topology optimization requires multiple iterations to create an optimized structure for given conditions. Among the conditions for topology optimization, the design area is one of the most important for structural design. In this study, we propose a new deep learning model to generate an optimized structure for a given design domain and other boundary conditions without iteration. For this purpose, we used open-source topology optimization MATLAB code to generate a pair of optimized structures under various design conditions. The resolution of the optimized structure is 32 × 32 pixels, and the design conditions are design area, volume fraction, distribution of external forces, and load value. Our deep learning model is primarily composed of a convolutional neural network (CNN)-based encoder and decoder, trained with datasets generated with the MATLAB code. In the encoder, we use batch normalization (BN) to increase the stability of the CNN model. In the decoder, we use SPADE (spatially adaptive denormalization) to reinforce the design area information. Compared with a CNN model that does not use BN and SPADE, our proposed model yielded smaller values for mean absolute error (MAE), mean compliance error, and volume error relative to the optimized topology structure generated by the MATLAB code, and it represented the design area more precisely. The proposed method generates near-optimal structures reflecting the design area in less computational time than the open-source topology optimization MATLAB code.
Submitted 19 April, 2020; v1 submitted 11 April, 2020;
originally announced April 2020.