-
Measurement of Medial Elbow Joint Space using Landmark Detection
Authors:
Shizuka Akahori,
Shotaro Teruya,
Pragyan Shrestha,
Yuichi Yoshii,
Ryuhei Michinobu,
Satoshi Iizuka,
Itaru Kitahara
Abstract:
Ultrasound imaging of the medial elbow is crucial for the early diagnosis of Ulnar Collateral Ligament (UCL) injuries. Specifically, measuring the elbow joint space in ultrasound images is used to assess the valgus instability of the elbow caused by UCL injuries. To automate this measurement, a model trained on a precisely annotated dataset is necessary; however, no publicly available dataset exis…
▽ More
Ultrasound imaging of the medial elbow is crucial for the early diagnosis of Ulnar Collateral Ligament (UCL) injuries. Specifically, measuring the elbow joint space in ultrasound images is used to assess the valgus instability of the elbow caused by UCL injuries. To automate this measurement, a model trained on a precisely annotated dataset is necessary; however, no publicly available dataset exists to date. This study introduces a novel ultrasound medial elbow dataset to measure the joint space. The dataset comprises 4,201 medial elbow ultrasound images from 22 subjects, with landmark annotations on the humerus and ulna, based on the expertise of three orthopedic surgeons. We evaluated joint space measurement methods on our proposed dataset using heatmap-based, regression-based, and token-based landmark detection methods. While heatmap-based landmark detection methods generally achieve high accuracy, they sometimes produce multiple peaks on a heatmap, leading to incorrect detection. To mitigate this issue and enhance landmark localization, we propose Shape Subspace (SS) landmark refinement by measuring geometrical similarities between the detected and reference landmark positions. The results show that the mean joint space measurement error is 0.116 mm when using HRNet. Furthermore, SS landmark refinement can reduce the mean absolute error of landmark positions by 0.010 mm with HRNet and by 0.103 mm with ViTPose on average. These highlight the potential for high-precision, real-time diagnosis of UCL injuries by accurately measuring joint space. Lastly, we demonstrate point-based segmentation for the humerus and ulna using the detected landmarks as inputs. Our dataset will be publicly available at https://github.com/Akahori000/Ultrasound-Medial-Elbow-Dataset
△ Less
Submitted 25 February, 2025; v1 submitted 17 December, 2024;
originally announced December 2024.
-
Point Cloud Novelty Detection Based on Latent Representations of a General Feature Extractor
Authors:
Shizuka Akahori,
Satoshi Iizuka,
Ken Mawatari,
Kazuhiro Fukui
Abstract:
We propose an effective unsupervised 3D point cloud novelty detection approach, leveraging a general point cloud feature extractor and a one-class classifier. The general feature extractor consists of a graph-based autoencoder and is trained once on a point cloud dataset such as a mathematically generated fractal 3D point cloud dataset that is independent of normal/abnormal categories. The input p…
▽ More
We propose an effective unsupervised 3D point cloud novelty detection approach, leveraging a general point cloud feature extractor and a one-class classifier. The general feature extractor consists of a graph-based autoencoder and is trained once on a point cloud dataset such as a mathematically generated fractal 3D point cloud dataset that is independent of normal/abnormal categories. The input point clouds are first converted into latent vectors by the general feature extractor, and then one-class classification is performed on the latent vectors. Compared to existing methods measuring the reconstruction error in 3D coordinate space, our approach utilizes latent representations where the shape information is condensed, which allows more direct and effective novelty detection. We confirm that our general feature extractor can extract shape features of unseen categories, eliminating the need for autoencoder re-training and reducing the computational burden. We validate the performance of our method through experiments on several subsets of the ShapeNet dataset and demonstrate that our latent-based approach outperforms the existing methods.
△ Less
Submitted 13 October, 2024;
originally announced October 2024.
-
JMultiWOZ: A Large-Scale Japanese Multi-Domain Task-Oriented Dialogue Dataset
Authors:
Atsumoto Ohashi,
Ryu Hirai,
Shinya Iizuka,
Ryuichiro Higashinaka
Abstract:
Dialogue datasets are crucial for deep learning-based task-oriented dialogue system research. While numerous English language multi-domain task-oriented dialogue datasets have been developed and contributed to significant advancements in task-oriented dialogue systems, such a dataset does not exist in Japanese, and research in this area is limited compared to that in English. In this study, toward…
▽ More
Dialogue datasets are crucial for deep learning-based task-oriented dialogue system research. While numerous English language multi-domain task-oriented dialogue datasets have been developed and contributed to significant advancements in task-oriented dialogue systems, such a dataset does not exist in Japanese, and research in this area is limited compared to that in English. In this study, towards the advancement of research and development of task-oriented dialogue systems in Japanese, we constructed JMultiWOZ, the first Japanese language large-scale multi-domain task-oriented dialogue dataset. Using JMultiWOZ, we evaluated the dialogue state tracking and response generation capabilities of the state-of-the-art methods on the existing major English benchmark dataset MultiWOZ2.2 and the latest large language model (LLM)-based methods. Our evaluation results demonstrated that JMultiWOZ provides a benchmark that is on par with MultiWOZ2.2. In addition, through evaluation experiments of interactive dialogues with the models and human participants, we identified limitations in the task completion capabilities of LLMs in Japanese.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
Team Flow at DRC2023: Building Common Ground and Text-based Turn-taking in a Travel Agent Spoken Dialogue System
Authors:
Ryu Hirai,
Shinya Iizuka,
Haruhisa Iseno,
Ao Guo,
Jingjing Jiang,
Atsumoto Ohashi,
Ryuichiro Higashinaka
Abstract:
At the Dialogue Robot Competition 2023 (DRC2023), which was held to improve the capability of dialogue robots, our team developed a system that could build common ground and take more natural turns based on user utterance texts. Our system generated queries for sightseeing spot searches using the common ground and engaged in dialogue while waiting for user comprehension.
At the Dialogue Robot Competition 2023 (DRC2023), which was held to improve the capability of dialogue robots, our team developed a system that could build common ground and take more natural turns based on user utterance texts. Our system generated queries for sightseeing spot searches using the common ground and engaged in dialogue while waiting for user comprehension.
△ Less
Submitted 21 December, 2023;
originally announced December 2023.
-
Visual Grounding of Whole Radiology Reports for 3D CT Images
Authors:
Akimichi Ichinose,
Taro Hatsutani,
Keigo Nakamura,
Yoshiro Kitamura,
Satoshi Iizuka,
Edgar Simo-Serra,
Shoji Kido,
Noriyuki Tomiyama
Abstract:
Building a large-scale training dataset is an essential problem in the development of medical image recognition systems. Visual grounding techniques, which automatically associate objects in images with corresponding descriptions, can facilitate labeling of large number of images. However, visual grounding of radiology reports for CT images remains challenging, because so many kinds of anomalies a…
▽ More
Building a large-scale training dataset is an essential problem in the development of medical image recognition systems. Visual grounding techniques, which automatically associate objects in images with corresponding descriptions, can facilitate labeling of large number of images. However, visual grounding of radiology reports for CT images remains challenging, because so many kinds of anomalies are detectable via CT imaging, and resulting report descriptions are long and complex. In this paper, we present the first visual grounding framework designed for CT image and report pairs covering various body parts and diverse anomaly types. Our framework combines two components of 1) anatomical segmentation of images, and 2) report structuring. The anatomical segmentation provides multiple organ masks of given CT images, and helps the grounding model recognize detailed anatomies. The report structuring helps to accurately extract information regarding the presence, location, and type of each anomaly described in corresponding reports. Given the two additional image/report features, the grounding model can achieve better localization. In the verification process, we constructed a large-scale dataset with region-description correspondence annotations for 10,410 studies of 7,321 unique patients. We evaluated our framework using grounding accuracy, the percentage of correctly localized anomalies, as a metric and demonstrated that the combination of the anatomical segmentation and the report structuring improves the performance with a large margin over the baseline model (66.0% vs 77.8%). Comparison with the prior techniques also showed higher performance of our method.
△ Less
Submitted 7 December, 2023;
originally announced December 2023.
-
Image Synthesis-based Late Stage Cancer Augmentation and Semi-Supervised Segmentation for MRI Rectal Cancer Staging
Authors:
Saeko Sasuga,
Akira Kudo,
Yoshiro Kitamura,
Satoshi Iizuka,
Edgar Simo-Serra,
Atsushi Hamabe,
Masayuki Ishii,
Ichiro Takemasa
Abstract:
Rectal cancer is one of the most common diseases and a major cause of mortality. For deciding rectal cancer treatment plans, T-staging is important. However, evaluating the index from preoperative MRI images requires high radiologists' skill and experience. Therefore, the aim of this study is to segment the mesorectum, rectum, and rectal cancer region so that the system can predict T-stage from se…
▽ More
Rectal cancer is one of the most common diseases and a major cause of mortality. For deciding rectal cancer treatment plans, T-staging is important. However, evaluating the index from preoperative MRI images requires high radiologists' skill and experience. Therefore, the aim of this study is to segment the mesorectum, rectum, and rectal cancer region so that the system can predict T-stage from segmentation results. Generally, shortage of large and diverse dataset and high quality annotation are known to be the bottlenecks in computer aided diagnostics development. Regarding rectal cancer, advanced cancer images are very rare, and per-pixel annotation requires high radiologists' skill and time. Therefore, it is not feasible to collect comprehensive disease patterns in a training dataset. To tackle this, we propose two kinds of approaches of image synthesis-based late stage cancer augmentation and semi-supervised learning which is designed for T-stage prediction. In the image synthesis data augmentation approach, we generated advanced cancer images from labels. The real cancer labels were deformed to resemble advanced cancer labels by artificial cancer progress simulation. Next, we introduce a T-staging loss which enables us to train segmentation models from per-image T-stage labels. The loss works to keep inclusion/invasion relationships between rectum and cancer region consistent to the ground truth T-stage. The verification tests show that the proposed method obtains the best sensitivity (0.76) and specificity (0.80) in distinguishing between over T3 stage and underT2. In the ablation studies, our semi-supervised learning approach with the T-staging loss improved specificity by 0.13. Adding the image synthesis-based data augmentation improved the DICE score of invasion cancer area by 0.08 from baseline.
△ Less
Submitted 7 December, 2023;
originally announced December 2023.
-
Diffusion-based Holistic Texture Rectification and Synthesis
Authors:
Guoqing Hao,
Satoshi Iizuka,
Kensho Hara,
Edgar Simo-Serra,
Hirokatsu Kataoka,
Kazuhiro Fukui
Abstract:
We present a novel framework for rectifying occlusions and distortions in degraded texture samples from natural images. Traditional texture synthesis approaches focus on generating textures from pristine samples, which necessitate meticulous preparation by humans and are often unattainable in most natural images. These challenges stem from the frequent occlusions and distortions of texture samples…
▽ More
We present a novel framework for rectifying occlusions and distortions in degraded texture samples from natural images. Traditional texture synthesis approaches focus on generating textures from pristine samples, which necessitate meticulous preparation by humans and are often unattainable in most natural images. These challenges stem from the frequent occlusions and distortions of texture samples in natural images due to obstructions and variations in object surface geometry. To address these issues, we propose a framework that synthesizes holistic textures from degraded samples in natural images, extending the applicability of exemplar-based texture synthesis techniques. Our framework utilizes a conditional Latent Diffusion Model (LDM) with a novel occlusion-aware latent transformer. This latent transformer not only effectively encodes texture features from partially-observed samples necessary for the generation process of the LDM, but also explicitly captures long-range dependencies in samples with large occlusions. To train our model, we introduce a method for generating synthetic data by applying geometric transformations and free-form mask generation to clean textures. Experimental results demonstrate that our framework significantly outperforms existing methods both quantitatively and quantitatively. Furthermore, we conduct comprehensive ablation studies to validate the different components of our proposed framework. Results are corroborated by a perceptual user study which highlights the efficiency of our proposed approach.
△ Less
Submitted 26 September, 2023;
originally announced September 2023.
-
Controllable Multi-domain Semantic Artwork Synthesis
Authors:
Yuantian Huang,
Satoshi Iizuka,
Edgar Simo-Serra,
Kazuhiro Fukui
Abstract:
We present a novel framework for multi-domain synthesis of artwork from semantic layouts. One of the main limitations of this challenging task is the lack of publicly available segmentation datasets for art synthesis. To address this problem, we propose a dataset, which we call ArtSem, that contains 40,000 images of artwork from 4 different domains with their corresponding semantic label maps. We…
▽ More
We present a novel framework for multi-domain synthesis of artwork from semantic layouts. One of the main limitations of this challenging task is the lack of publicly available segmentation datasets for art synthesis. To address this problem, we propose a dataset, which we call ArtSem, that contains 40,000 images of artwork from 4 different domains with their corresponding semantic label maps. We generate the dataset by first extracting semantic maps from landscape photography and then propose a conditional Generative Adversarial Network (GAN)-based approach to generate high-quality artwork from the semantic maps without necessitating paired training data. Furthermore, we propose an artwork synthesis model that uses domain-dependent variational encoders for high-quality multi-domain synthesis. The model is improved and complemented with a simple but effective normalization method, based on normalizing both the semantic and style jointly, which we call Spatially STyle-Adaptive Normalization (SSTAN). In contrast to previous methods that only take semantic layout as input, our model is able to learn a joint representation of both style and semantic information, which leads to better generation quality for synthesizing artistic images. Results indicate that our model learns to separate the domains in the latent space, and thus, by identifying the hyperplanes that separate the different domains, we can also perform fine-grained control of the synthesized artwork. By combining our proposed dataset and approach, we are able to generate user-controllable artwork that is of higher quality than existing
△ Less
Submitted 19 August, 2023;
originally announced August 2023.
-
P2Net: A Post-Processing Network for Refining Semantic Segmentation of LiDAR Point Cloud based on Consistency of Consecutive Frames
Authors:
Yutaka Momma,
Weimin Wang,
Edgar Simo-Serra,
Satoshi Iizuka,
Ryosuke Nakamura,
Hiroshi Ishikawa
Abstract:
We present a lightweight post-processing method to refine the semantic segmentation results of point cloud sequences. Most existing methods usually segment frame by frame and encounter the inherent ambiguity of the problem: based on a measurement in a single frame, labels are sometimes difficult to predict even for humans. To remedy this problem, we propose to explicitly train a network to refine…
▽ More
We present a lightweight post-processing method to refine the semantic segmentation results of point cloud sequences. Most existing methods usually segment frame by frame and encounter the inherent ambiguity of the problem: based on a measurement in a single frame, labels are sometimes difficult to predict even for humans. To remedy this problem, we propose to explicitly train a network to refine these results predicted by an existing segmentation method. The network, which we call the P2Net, learns the consistency constraints between coincident points from consecutive frames after registration. We evaluate the proposed post-processing method both qualitatively and quantitatively on the SemanticKITTI dataset that consists of real outdoor scenes. The effectiveness of the proposed method is validated by comparing the results predicted by two representative networks with and without the refinement by the post-processing network. Specifically, qualitative visualization validates the key idea that labels of the points that are difficult to predict can be corrected with P2Net. Quantitatively, overall mIoU is improved from 10.5% to 11.7% for PointNet [1] and from 10.8% to 15.9% for PointNet++ [2].
△ Less
Submitted 1 December, 2022;
originally announced December 2022.
-
Team Flow at DRC2022: Pipeline System for Travel Destination Recommendation Task in Spoken Dialogue
Authors:
Ryu Hirai,
Atsumoto Ohashi,
Ao Guo,
Hideki Shiroma,
Xulin Zhou,
Yukihiko Tone,
Shinya Iizuka,
Ryuichiro Higashinaka
Abstract:
To improve the interactive capabilities of a dialogue system, e.g., to adapt to different customers, the Dialogue Robot Competition (DRC2022) was held. As one of the teams, we built a dialogue system with a pipeline structure containing four modules. The natural language understanding (NLU) and natural language generation (NLG) modules were GPT-2 based models, and the dialogue state tracking (DST)…
▽ More
To improve the interactive capabilities of a dialogue system, e.g., to adapt to different customers, the Dialogue Robot Competition (DRC2022) was held. As one of the teams, we built a dialogue system with a pipeline structure containing four modules. The natural language understanding (NLU) and natural language generation (NLG) modules were GPT-2 based models, and the dialogue state tracking (DST) and policy modules were designed on the basis of hand-crafted rules. After the preliminary round of the competition, we found that the low variation in training examples for the NLU and failed recommendation due to the policy used were probably the main reasons for the limited performance of the system.
△ Less
Submitted 17 October, 2022;
originally announced October 2022.
-
Adaptive occlusion sensitivity analysis for visually explaining video recognition networks
Authors:
Tomoki Uchiyama,
Naoya Sogi,
Satoshi Iizuka,
Koichiro Niinuma,
Kazuhiro Fukui
Abstract:
This paper proposes a method for visually explaining the decision-making process of video recognition networks with a temporal extension of occlusion sensitivity analysis, called Adaptive Occlusion Sensitivity Analysis (AOSA). The key idea here is to occlude a specific volume of data by a 3D mask in an input 3D temporal-spatial data space and then measure the change degree in the output score. The…
▽ More
This paper proposes a method for visually explaining the decision-making process of video recognition networks with a temporal extension of occlusion sensitivity analysis, called Adaptive Occlusion Sensitivity Analysis (AOSA). The key idea here is to occlude a specific volume of data by a 3D mask in an input 3D temporal-spatial data space and then measure the change degree in the output score. The occluded volume data that produces a larger change degree is regarded as a more critical element for classification. However, while the occlusion sensitivity analysis is commonly used to analyze single image classification, applying this idea to video classification is not so straightforward as a simple fixed cuboid cannot deal with complicated motions. To solve this issue, we adaptively set the shape of a 3D occlusion mask while referring to motions. Our flexible mask adaptation is performed by considering the temporal continuity and spatial co-occurrence of the optical flows extracted from the input video data. We further propose a novel method to reduce the computational cost of the proposed method with the first-order approximation of the output score with respect to an input video. We demonstrate the effectiveness of our method through various and extensive comparisons with the conventional methods in terms of the deletion/insertion metric and the pointing metric on the UCF101 dataset and the Kinetics-400 and 700 datasets.
△ Less
Submitted 17 August, 2023; v1 submitted 26 July, 2022;
originally announced July 2022.
-
Automatic Segmentation, Localization, and Identification of Vertebrae in 3D CT Images Using Cascaded Convolutional Neural Networks
Authors:
Naoto Masuzawa,
Yoshiro Kitamura,
Keigo Nakamura,
Satoshi Iizuka,
Edgar Simo-Serra
Abstract:
This paper presents a method for automatic segmentation, localization, and identification of vertebrae in arbitrary 3D CT images. Many previous works do not perform the three tasks simultaneously even though requiring a priori knowledge of which part of the anatomy is visible in the 3D CT images. Our method tackles all these tasks in a single multi-stage framework without any assumptions. In the f…
▽ More
This paper presents a method for automatic segmentation, localization, and identification of vertebrae in arbitrary 3D CT images. Many previous works do not perform the three tasks simultaneously even though requiring a priori knowledge of which part of the anatomy is visible in the 3D CT images. Our method tackles all these tasks in a single multi-stage framework without any assumptions. In the first stage, we train a 3D Fully Convolutional Networks to find the bounding boxes of the cervical, thoracic, and lumbar vertebrae. In the second stage, we train an iterative 3D Fully Convolutional Networks to segment individual vertebrae in the bounding box. The input to the second networks have an auxiliary channel in addition to the 3D CT images. Given the segmented vertebra regions in the auxiliary channel, the networks output the next vertebra. The proposed method is evaluated in terms of segmentation, localization, and identification accuracy with two public datasets of 15 3D CT images from the MICCAI CSI 2014 workshop challenge and 302 3D CT images with various pathologies introduced in [1]. Our method achieved a mean Dice score of 96%, a mean localization error of 8.3 mm, and a mean identification rate of 84%. In summary, our method achieved better performance than all existing works in all the three metrics.
△ Less
Submitted 29 September, 2020;
originally announced September 2020.
-
DeepRemaster: Temporal Source-Reference Attention Networks for Comprehensive Video Enhancement
Authors:
Satoshi Iizuka,
Edgar Simo-Serra
Abstract:
The remastering of vintage film comprises of a diversity of sub-tasks including super-resolution, noise removal, and contrast enhancement which aim to restore the deteriorated film medium to its original state. Additionally, due to the technical limitations of the time, most vintage film is either recorded in black and white, or has low quality colors, for which colorization becomes necessary. In…
▽ More
The remastering of vintage film comprises of a diversity of sub-tasks including super-resolution, noise removal, and contrast enhancement which aim to restore the deteriorated film medium to its original state. Additionally, due to the technical limitations of the time, most vintage film is either recorded in black and white, or has low quality colors, for which colorization becomes necessary. In this work, we propose a single framework to tackle the entire remastering task semi-interactively. Our work is based on temporal convolutional neural networks with attention mechanisms trained on videos with data-driven deterioration simulation. Our proposed source-reference attention allows the model to handle an arbitrary number of reference color images to colorize long videos without the need for segmentation while maintaining temporal consistency. Quantitative analysis shows that our framework outperforms existing approaches, and that, in contrast to existing approaches, the performance of our framework increases with longer videos and more reference color images.
△ Less
Submitted 18 September, 2020;
originally announced September 2020.
-
TopNet: Topology Preserving Metric Learning for Vessel Tree Reconstruction and Labelling
Authors:
Deepak Keshwani,
Yoshiro Kitamura,
Satoshi Ihara,
Satoshi Iizuka,
Edgar Simo-Serra
Abstract:
Reconstructing Portal Vein and Hepatic Vein trees from contrast enhanced abdominal CT scans is a prerequisite for preoperative liver surgery simulation. Existing deep learning based methods treat vascular tree reconstruction as a semantic segmentation problem. However, vessels such as hepatic and portal vein look very similar locally and need to be traced to their source for robust label assignmen…
▽ More
Reconstructing Portal Vein and Hepatic Vein trees from contrast enhanced abdominal CT scans is a prerequisite for preoperative liver surgery simulation. Existing deep learning based methods treat vascular tree reconstruction as a semantic segmentation problem. However, vessels such as hepatic and portal vein look very similar locally and need to be traced to their source for robust label assignment. Therefore, semantic segmentation by looking at local 3D patch results in noisy misclassifications. To tackle this, we propose a novel multi-task deep learning architecture for vessel tree reconstruction. The network architecture simultaneously solves the task of detecting voxels on vascular centerlines (i.e. nodes) and estimates connectivity between center-voxels (edges) in the tree structure to be reconstructed. Further, we propose a novel connectivity metric which considers both inter-class distance and intra-class topological distance between center-voxel pairs. Vascular trees are reconstructed starting from the vessel source using the learned connectivity metric using the shortest path tree algorithm. A thorough evaluation on public IRCAD dataset shows that the proposed method considerably outperforms existing semantic segmentation based methods. To the best of our knowledge, this is the first deep learning based approach which learns multi-label tree structure connectivity from images.
△ Less
Submitted 18 September, 2020;
originally announced September 2020.
-
Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval
Authors:
Shuhei Yokoo,
Kohei Ozaki,
Edgar Simo-Serra,
Satoshi Iizuka
Abstract:
We propose an efficient pipeline for large-scale landmark image retrieval that addresses the diversity of the dataset through two-stage discriminative re-ranking. Our approach is based on embedding the images in a feature-space using a convolutional neural network trained with a cosine softmax loss. Due to the variance of the images, which include extreme viewpoint changes such as having to retrie…
▽ More
We propose an efficient pipeline for large-scale landmark image retrieval that addresses the diversity of the dataset through two-stage discriminative re-ranking. Our approach is based on embedding the images in a feature-space using a convolutional neural network trained with a cosine softmax loss. Due to the variance of the images, which include extreme viewpoint changes such as having to retrieve images of the exterior of a landmark from images of the interior, this is very challenging for approaches based exclusively on visual similarity. Our proposed re-ranking approach improves the results in two steps: in the sort-step, $k$-nearest neighbor search with soft-voting to sort the retrieved results based on their label similarity to the query images, and in the insert-step, we add additional samples from the dataset that were not retrieved by image-similarity. This approach allows overcoming the low visual diversity in retrieved images. In-depth experimental results show that the proposed approach significantly outperforms existing approaches on the challenging Google Landmarks Datasets. Using our methods, we achieved 1st place in the Google Landmark Retrieval 2019 challenge and 3rd place in the Google Landmark Recognition 2019 challenge on Kaggle. Our code is publicly available here: \url{https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution}
△ Less
Submitted 25 March, 2020;
originally announced March 2020.
-
Virtual Thin Slice: 3D Conditional GAN-based Super-resolution for CT Slice Interval
Authors:
Akira Kudo,
Yoshiro Kitamura,
Yuanzhong Li,
Satoshi Iizuka,
Edgar Simo-Serra
Abstract:
Many CT slice images are stored with large slice intervals to reduce storage size in clinical practice. This leads to low resolution perpendicular to the slice images (i.e., z-axis), which is insufficient for 3D visualization or image analysis. In this paper, we present a novel architecture based on conditional Generative Adversarial Networks (cGANs) with the goal of generating high resolution ima…
▽ More
Many CT slice images are stored with large slice intervals to reduce storage size in clinical practice. This leads to low resolution perpendicular to the slice images (i.e., z-axis), which is insufficient for 3D visualization or image analysis. In this paper, we present a novel architecture based on conditional Generative Adversarial Networks (cGANs) with the goal of generating high resolution images of main body parts including head, chest, abdomen and legs. However, GANs are known to have a difficulty with generating a diversity of patterns due to a phenomena known as mode collapse. To overcome the lack of generated pattern variety, we propose to condition the discriminator on the different body parts. Furthermore, our generator networks are extended to be three dimensional fully convolutional neural networks, allowing for the generation of high resolution images from arbitrary fields of view. In our verification tests, we show that the proposed method obtains the best scores by PSNR/SSIM metrics and Visual Turing Test, allowing for accurate reproduction of the principle anatomy in high resolution. We expect that the proposed method contribute to effective utilization of the existing vast amounts of thick CT images stored in hospitals.
△ Less
Submitted 1 September, 2019; v1 submitted 29 August, 2019;
originally announced August 2019.
-
Mastering Sketching: Adversarial Augmentation for Structured Prediction
Authors:
Edgar Simo-Serra,
Satoshi Iizuka,
Hiroshi Ishikawa
Abstract:
We present an integral framework for training sketch simplification networks that convert challenging rough sketches into clean line drawings. Our approach augments a simplification network with a discriminator network, training both networks jointly so that the discriminator network discerns whether a line drawing is a real training data or the output of the simplification network, which in turn…
▽ More
We present an integral framework for training sketch simplification networks that convert challenging rough sketches into clean line drawings. Our approach augments a simplification network with a discriminator network, training both networks jointly so that the discriminator network discerns whether a line drawing is a real training data or the output of the simplification network, which in turn tries to fool it. This approach has two major advantages. First, because the discriminator network learns the structure in line drawings, it encourages the output sketches of the simplification network to be more similar in appearance to the training sketches. Second, we can also train the simplification network with additional unsupervised data, using the discriminator network as a substitute teacher. Thus, by adding only rough sketches without simplified line drawings, or only line drawings without the original rough sketches, we can improve the quality of the sketch simplification. We show how our framework can be used to train models that significantly outperform the state of the art in the sketch simplification task, despite using the same architecture for inference. We additionally present an approach to optimize for a single image, which improves accuracy at the cost of additional computation time. Finally, we show that, using the same framework, it is possible to train the network to perform the inverse problem, i.e., convert simple line sketches into pencil drawings, which is not possible using the standard mean squared error loss. We validate our framework with two user tests, where our approach is preferred to the state of the art in sketch simplification 92.3% of the time and obtains 1.2 more points on a scale of 1 to 5.
△ Less
Submitted 27 March, 2017;
originally announced March 2017.