-
TreeCat: Standalone Catalog Engine for Large Data Systems
Authors:
Keonwoo Oh,
Pooja Nilangekar,
Amol Deshpande
Abstract:
With ever increasing volume and heterogeneity of data, advent of new specialized compute engines, and demand for complex use cases, large scale data systems require a performant catalog system that can satisfy diverse needs. We argue that existing solutions, including recent lakehouse storage formats, have fundamental limitations and that there is a strong motivation for a specialized database eng…
▽ More
With ever increasing volume and heterogeneity of data, advent of new specialized compute engines, and demand for complex use cases, large scale data systems require a performant catalog system that can satisfy diverse needs. We argue that existing solutions, including recent lakehouse storage formats, have fundamental limitations and that there is a strong motivation for a specialized database engine, dedicated to serve as the catalog. We present the design and implementation of TreeCat, a database engine that features a hierarchical data model with a path-based query language, a storage format optimized for efficient range queries and versioning, and a correlated scan operation that enables fast query execution. A key performance challenge is supporting concurrent read and write operations from many different clients while providing strict consistency guarantees. To this end, we present a novel MVOCC (multi-versioned optimistic concurrency control) protocol that guarantees serializable isolation. We conduct a comprehensive experimental evaluation comparing our concurrency control scheme with prior techniques, and evaluating our overall system against Hive Metastore, Delta Lake, and Iceberg.
△ Less
Submitted 4 March, 2025;
originally announced March 2025.
-
Autonomous Dissection in Robotic Cholecystectomy
Authors:
Ki-Hwan Oh,
Leonardo Borgioli,
Miloš Žefran,
Valentina Valle,
Pier Cristoforo Giulianotti
Abstract:
Robotic surgery offers enhanced precision and adaptability, paving the way for automation in surgical interventions. Cholecystectomy, the gallbladder removal, is particularly well-suited for automation due to its standardized procedural steps and distinct anatomical boundaries. A key challenge in automating this procedure is dissecting with accuracy and adaptability. This paper presents a vision-b…
▽ More
Robotic surgery offers enhanced precision and adaptability, paving the way for automation in surgical interventions. Cholecystectomy, the gallbladder removal, is particularly well-suited for automation due to its standardized procedural steps and distinct anatomical boundaries. A key challenge in automating this procedure is dissecting with accuracy and adaptability. This paper presents a vision-based autonomous robotic dissection architecture that integrates real-time segmentation, keypoint detection, grasping and stretching the gallbladder with the left arm, and dissecting with the other. We introduce an improved segmentation dataset based on videos of robotic cholecystectomy performed by various surgeons, incorporating a new ``liver bed'' class to enhance boundary tracking after multiple rounds of dissection. Our system employs state-of-the-art segmentation models and an adaptive boundary extraction method that maintains accuracy despite tissue deformations and visual variations. Moreover, we implemented an automated grasping and pulling strategy to optimize tissue tension before dissection upon our previous work. Ex vivo evaluations on porcine livers demonstrate that our framework significantly improves dissection precision and consistency, marking a step toward fully autonomous robotic cholecystectomy.
△ Less
Submitted 1 March, 2025;
originally announced March 2025.
-
Lost in Sequence: Do Large Language Models Understand Sequential Recommendation?
Authors:
Sein Kim,
Hongseok Kang,
Kibum Kim,
Jiwan Kim,
Donghyun Kim,
Minchul Yang,
Kwangjin Oh,
Julian McAuley,
Chanyoung Park
Abstract:
Large Language Models (LLMs) have recently emerged as promising tools for recommendation thanks to their advanced textual understanding ability and context-awareness. Despite the current practice of training and evaluating LLM-based recommendation (LLM4Rec) models under a sequential recommendation scenario, we found that whether these models understand the sequential information inherent in users'…
▽ More
Large Language Models (LLMs) have recently emerged as promising tools for recommendation thanks to their advanced textual understanding ability and context-awareness. Despite the current practice of training and evaluating LLM-based recommendation (LLM4Rec) models under a sequential recommendation scenario, we found that whether these models understand the sequential information inherent in users' item interaction sequences has been largely overlooked. In this paper, we first demonstrate through a series of experiments that existing LLM4Rec models do not fully capture sequential information both during training and inference. Then, we propose a simple yet effective LLM-based sequential recommender, called LLM-SRec, a method that enhances the integration of sequential information into LLMs by distilling the user representations extracted from a pre-trained CF-SRec model into LLMs. Our extensive experiments show that LLM-SRec enhances LLMs' ability to understand users' item interaction sequences, ultimately leading to improved recommendation performance. Furthermore, unlike existing LLM4Rec models that require fine-tuning of LLMs, LLM-SRec achieves state-of-the-art performance by training only a few lightweight MLPs, highlighting its practicality in real-world applications. Our code is available at https://github.com/Sein-Kim/LLM-SRec.
△ Less
Submitted 11 June, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD)
Authors:
Ki-Hwan Oh,
Leonardo Borgioli,
Alberto Mangano,
Valentina Valle,
Marco Di Pangrazio,
Francesco Toti,
Gioia Pozza,
Luciano Ambrosini,
Alvaro Ducas,
Miloš Žefran,
Liaohai Chen,
Pier Cristoforo Giulianotti
Abstract:
In recent years, the application of machine learning to minimally invasive surgery (MIS) has attracted considerable interest. Datasets are critical to the use of such techniques. This paper presents a unique dataset recorded during ex vivo pseudo-cholecystectomy procedures on pig livers using the da Vinci Research Kit (dVRK). Unlike existing datasets, it addresses a critical gap by providing compr…
▽ More
In recent years, the application of machine learning to minimally invasive surgery (MIS) has attracted considerable interest. Datasets are critical to the use of such techniques. This paper presents a unique dataset recorded during ex vivo pseudo-cholecystectomy procedures on pig livers using the da Vinci Research Kit (dVRK). Unlike existing datasets, it addresses a critical gap by providing comprehensive kinematic data, recordings of all pedal inputs, and offers a time-stamped record of the endoscope's movements. This expanded version also includes segmentation and keypoint annotations of images, enhancing its utility for computer vision applications.
Contributed by seven surgeons with varied backgrounds and experience levels that are provided as a part of this expanded version, the dataset is an important new resource for surgical robotics research. It enables the development of advanced methods for evaluating surgeon skills, tools for providing better context awareness, and automation of surgical tasks. Our work overcomes the limitations of incomplete recordings and imprecise kinematic data found in other datasets. To demonstrate the potential of the dataset for advancing automation in surgical robotics, we introduce two models that predict clutch usage and camera activation, a 3D scene reconstruction example, and the results from our keypoint and segmentation models.
△ Less
Submitted 16 December, 2024;
originally announced December 2024.
-
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model
Authors:
Young-Jun Lee,
Dokyong Lee,
Junyoung Youn,
Kyeongjin Oh,
Ho-Jin Choi
Abstract:
To increase social bonding with interlocutors, humans naturally acquire the ability to respond appropriately in a given situation by considering which conversational skill is most suitable for the response - a process we call skill-of-mind. For large language model (LLM)-based conversational agents, planning appropriate conversational skills, as humans do, is challenging due to the complexity of s…
▽ More
To increase social bonding with interlocutors, humans naturally acquire the ability to respond appropriately in a given situation by considering which conversational skill is most suitable for the response - a process we call skill-of-mind. For large language model (LLM)-based conversational agents, planning appropriate conversational skills, as humans do, is challenging due to the complexity of social dialogue, especially in interactive scenarios. To address this, we propose a skill-of-mind-annotated conversation dataset, named Multifaceted Skill-of-Mind, which includes multi-turn and multifaceted conversational skills across various interactive scenarios (e.g., long-term, counseling, task-oriented), grounded in diverse social contexts (e.g., demographics, persona, rules of thumb). This dataset consists of roughly 100K conversations. Using this dataset, we introduce a new family of skill-of-mind-infused LLMs, named Thanos, with model sizes of 1B, 3B, and 8B parameters. With extensive experiments, these models successfully demonstrate the skill-of-mind process and exhibit strong generalizability in inferring multifaceted skills across a variety of domains. Moreover, we show that Thanos significantly enhances the quality of responses generated by LLM-based conversational agents and promotes prosocial behavior in human evaluations.
△ Less
Submitted 7 November, 2024;
originally announced November 2024.
-
IdenBAT: Disentangled Representation Learning for Identity-Preserved Brain Age Transformation
Authors:
Junyeong Maeng,
Kwanseok Oh,
Wonsik Jung,
Heung-Il Suk
Abstract:
Brain age transformation aims to convert reference brain images into synthesized images that accurately reflect the age-specific features of a target age group. The primary objective of this task is to modify only the age-related attributes of the reference image while preserving all other age-irrelevant attributes. However, achieving this goal poses substantial challenges due to the inherent enta…
▽ More
Brain age transformation aims to convert reference brain images into synthesized images that accurately reflect the age-specific features of a target age group. The primary objective of this task is to modify only the age-related attributes of the reference image while preserving all other age-irrelevant attributes. However, achieving this goal poses substantial challenges due to the inherent entanglement of various image attributes within features extracted from a backbone encoder, resulting in simultaneous alterations during the image generation. To address this challenge, we propose a novel architecture that employs disentangled representation learning for identity-preserved brain age transformation called IdenBAT. This approach facilitates the decomposition of image features, ensuring the preservation of individual traits while selectively transforming age-related characteristics to match those of the target age group. Through comprehensive experiments conducted on both 2D and full-size 3D brain datasets, our method adeptly converts input images to target age while retaining individual characteristics accurately. Furthermore, our approach demonstrates superiority over existing state-of-the-art regarding performance fidelity.
△ Less
Submitted 22 October, 2024;
originally announced October 2024.
-
DyMix: Dynamic Frequency Mixup Scheduler based Unsupervised Domain Adaptation for Enhancing Alzheimer's Disease Identification
Authors:
Yooseung Shin,
Kwanseok Oh,
Heung-Il Suk
Abstract:
Advances in deep learning (DL)-based models for brain image analysis have significantly enhanced the accuracy of Alzheimer's disease (AD) diagnosis, allowing for more timely interventions. Despite these advancements, most current DL models suffer from performance degradation when inferring on unseen domain data owing to the variations in data distributions, a phenomenon known as domain shift. To a…
▽ More
Advances in deep learning (DL)-based models for brain image analysis have significantly enhanced the accuracy of Alzheimer's disease (AD) diagnosis, allowing for more timely interventions. Despite these advancements, most current DL models suffer from performance degradation when inferring on unseen domain data owing to the variations in data distributions, a phenomenon known as domain shift. To address this challenge, we propose a novel approach called the dynamic frequency mixup scheduler (DyMix) for unsupervised domain adaptation. Contrary to the conventional mixup technique, which involves simple linear interpolations between predefined data points from the frequency space, our proposed DyMix dynamically adjusts the magnitude of the frequency regions being mixed from the source and target domains. Such an adaptive strategy optimizes the model's capacity to deal with domain variability, thereby enhancing its generalizability across the target domain. In addition, we incorporate additional strategies to further enforce the model's robustness against domain shifts, including leveraging amplitude-phase recombination to ensure resilience to intensity variations and applying self-adversarial learning to derive domain-invariant feature representations. Experimental results on two benchmark datasets quantitatively and qualitatively validated the effectiveness of our DyMix in that we demonstrated its outstanding performance in AD diagnosis compared to state-of-the-art methods.
△ Less
Submitted 2 October, 2024;
originally announced October 2024.
-
Tabular Transfer Learning via Prompting LLMs
Authors:
Jaehyun Nam,
Woomin Song,
Seong Hyeon Park,
Jihoon Tack,
Sukmin Yun,
Jaehyung Kim,
Kyu Hwan Oh,
Jinwoo Shin
Abstract:
Learning with a limited number of labeled data is a central problem in real-world applications of machine learning, as it is often expensive to obtain annotations. To deal with the scarcity of labeled data, transfer learning is a conventional approach; it suggests to learn a transferable knowledge by training a neural network from multiple other sources. In this paper, we investigate transfer lear…
▽ More
Learning with a limited number of labeled data is a central problem in real-world applications of machine learning, as it is often expensive to obtain annotations. To deal with the scarcity of labeled data, transfer learning is a conventional approach; it suggests to learn a transferable knowledge by training a neural network from multiple other sources. In this paper, we investigate transfer learning of tabular tasks, which has been less studied and successful in the literature, compared to other domains, e.g., vision and language. This is because tables are inherently heterogeneous, i.e., they contain different columns and feature spaces, making transfer learning difficult. On the other hand, recent advances in natural language processing suggest that the label scarcity issue can be mitigated by utilizing in-context learning capability of large language models (LLMs). Inspired by this and the fact that LLMs can also process tables within a unified language space, we ask whether LLMs can be effective for tabular transfer learning, in particular, under the scenarios where the source and target datasets are of different format. As a positive answer, we propose a novel tabular transfer learning framework, coined Prompt to Transfer (P2T), that utilizes unlabeled (or heterogeneous) source data with LLMs. Specifically, P2T identifies a column feature in a source dataset that is strongly correlated with a target task feature to create examples relevant to the target task, thus creating pseudo-demonstrations for prompts. Experimental results demonstrate that P2T outperforms previous methods on various tabular learning benchmarks, showing good promise for the important, yet underexplored tabular transfer learning problem. Code is available at https://github.com/jaehyun513/P2T.
△ Less
Submitted 9 August, 2024;
originally announced August 2024.
-
Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
Authors:
Young-Jun Lee,
Dokyong Lee,
Junyoung Youn,
Kyeongjin Oh,
Byungsoo Ko,
Jonghwan Hyeon,
Ho-Jin Choi
Abstract:
Humans share a wide variety of images related to their personal experiences within conversations via instant messaging tools. However, existing works focus on (1) image-sharing behavior in singular sessions, leading to limited long-term social interaction, and (2) a lack of personalized image-sharing behavior. In this work, we introduce Stark, a large-scale long-term multi-modal conversation datas…
▽ More
Humans share a wide variety of images related to their personal experiences within conversations via instant messaging tools. However, existing works focus on (1) image-sharing behavior in singular sessions, leading to limited long-term social interaction, and (2) a lack of personalized image-sharing behavior. In this work, we introduce Stark, a large-scale long-term multi-modal conversation dataset that covers a wide range of social personas in a multi-modality format, time intervals, and images. To construct Stark automatically, we propose a novel multi-modal contextualization framework, Mcu, that generates long-term multi-modal dialogue distilled from ChatGPT and our proposed Plan-and-Execute image aligner. Using our Stark, we train a multi-modal conversation model, Ultron 7B, which demonstrates impressive visual imagination ability. Furthermore, we demonstrate the effectiveness of our dataset in human evaluation. We make our source code and dataset publicly available.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
FIESTA: Fourier-Based Semantic Augmentation with Uncertainty Guidance for Enhanced Domain Generalizability in Medical Image Segmentation
Authors:
Kwanseok Oh,
Eunjin Jeon,
Da-Woon Heo,
Yooseung Shin,
Heung-Il Suk
Abstract:
Single-source domain generalization (SDG) in medical image segmentation (MIS) aims to generalize a model using data from only one source domain to segment data from an unseen target domain. Despite substantial advances in SDG with data augmentation, existing methods often fail to fully consider the details and uncertain areas prevalent in MIS, leading to mis-segmentation. This paper proposes a Fou…
▽ More
Single-source domain generalization (SDG) in medical image segmentation (MIS) aims to generalize a model using data from only one source domain to segment data from an unseen target domain. Despite substantial advances in SDG with data augmentation, existing methods often fail to fully consider the details and uncertain areas prevalent in MIS, leading to mis-segmentation. This paper proposes a Fourier-based semantic augmentation method called FIESTA using uncertainty guidance to enhance the fundamental goals of MIS in an SDG context by manipulating the amplitude and phase components in the frequency domain. The proposed Fourier augmentative transformer addresses semantic amplitude modulation based on meaningful angular points to induce pertinent variations and harnesses the phase spectrum to ensure structural coherence. Moreover, FIESTA employs epistemic uncertainty to fine-tune the augmentation process, improving the ability of the model to adapt to diverse augmented data and concentrate on areas with higher ambiguity. Extensive experiments across three cross-domain scenarios demonstrate that FIESTA surpasses recent state-of-the-art SDG approaches in segmentation performance and significantly contributes to boosting the applicability of the model in medical imaging modalities.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Sensory Glove-Based Surgical Robot User Interface
Authors:
Leonardo Borgioli,
Ki-Hwan Oh,
Valentina Valle,
Alvaro Ducas,
Mohammad Halloum,
Diego Federico Mendoza Medina,
Arman Sharifi,
Paula A L'opez,
Jessica Cassiani,
Milos Zefran,
Liaohai Chen,
Pier Cristoforo Giulianotti
Abstract:
Robotic surgery has reached a high level of maturity and has become an integral part of standard surgical care. However, existing surgeon consoles are bulky, take up valuable space in the operating room, make surgical team coordination challenging, and their proprietary nature makes it difficult to take advantage of recent technological advances, especially in virtual and augmented reality. One po…
▽ More
Robotic surgery has reached a high level of maturity and has become an integral part of standard surgical care. However, existing surgeon consoles are bulky, take up valuable space in the operating room, make surgical team coordination challenging, and their proprietary nature makes it difficult to take advantage of recent technological advances, especially in virtual and augmented reality. One potential area for further improvement is the integration of modern sensory gloves into robotic platforms, allowing surgeons to control robotic arms intuitively with their hand movements. We propose one such system that combines an HTC Vive tracker, a Manus Meta Prime 3 XR sensory glove, and SCOPEYE wireless smart glasses. The system controls one arm of a da Vinci surgical robot. In addition to moving the arm, the surgeon can use fingers to control the end-effector of the surgical instrument. Hand gestures are used to implement clutching and similar functions. In particular, we introduce clutching of the instrument orientation, a functionality unavailable in the da Vinci system. The vibrotactile elements of the glove are used to provide feedback to the user when gesture commands are invoked. A qualitative and quantitative evaluation has been conducted that compares the current device with the dVRK console. The system is shown to have excellent tracking accuracy, and the new interface allows surgeons to perform common surgical training tasks with minimal practice efficiently.
△ Less
Submitted 17 March, 2025; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Context-based Interpretable Spatio-Temporal Graph Convolutional Network for Human Motion Forecasting
Authors:
Edgar Medina,
Leyong Loh,
Namrata Gurung,
Kyung Hun Oh,
Niels Heller
Abstract:
Human motion prediction is still an open problem extremely important for autonomous driving and safety applications. Due to the complex spatiotemporal relation of motion sequences, this remains a challenging problem not only for movement prediction but also to perform a preliminary interpretation of the joint connections. In this work, we present a Context-based Interpretable Spatio-Temporal Graph…
▽ More
Human motion prediction is still an open problem extremely important for autonomous driving and safety applications. Due to the complex spatiotemporal relation of motion sequences, this remains a challenging problem not only for movement prediction but also to perform a preliminary interpretation of the joint connections. In this work, we present a Context-based Interpretable Spatio-Temporal Graph Convolutional Network (CIST-GCN), as an efficient 3D human pose forecasting model based on GCNs that encompasses specific layers, aiding model interpretability and providing information that might be useful when analyzing motion distribution and body behavior. Our architecture extracts meaningful information from pose sequences, aggregates displacements and accelerations into the input model, and finally predicts the output displacements. Extensive experiments on Human 3.6M, AMASS, 3DPW, and ExPI datasets demonstrate that CIST-GCN outperforms previous methods in human motion prediction and robustness. Since the idea of enhancing interpretability for motion prediction has its merits, we showcase experiments towards it and provide preliminary evaluations of such insights here. available code: https://github.com/QualityMinds/cistgcn
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Authors:
Jiwon Song,
Kyungseok Oh,
Taesu Kim,
Hyungjun Kim,
Yulhwa Kim,
Jae-Joon Kim
Abstract:
Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existi…
▽ More
Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning, because LLMs exhibit block-level redundancy with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate that SLEB outperforms previous LLM pruning methods in accelerating LLM inference while also maintaining superior perplexity and accuracy, making SLEB as a promising technique for enhancing the efficiency of LLMs. The code is available at: https://github.com/jiwonsong-dev/SLEB.
△ Less
Submitted 12 December, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Transferring Ultrahigh-Field Representations for Intensity-Guided Brain Segmentation of Low-Field Magnetic Resonance Imaging
Authors:
Kwanseok Oh,
Jieun Lee,
Da-Woon Heo,
Dinggang Shen,
Heung-Il Suk
Abstract:
Ultrahigh-field (UHF) magnetic resonance imaging (MRI), i.e., 7T MRI, provides superior anatomical details of internal brain structures owing to its enhanced signal-to-noise ratio and susceptibility-induced contrast. However, the widespread use of 7T MRI is limited by its high cost and lower accessibility compared to low-field (LF) MRI. This study proposes a deep-learning framework that systematic…
▽ More
Ultrahigh-field (UHF) magnetic resonance imaging (MRI), i.e., 7T MRI, provides superior anatomical details of internal brain structures owing to its enhanced signal-to-noise ratio and susceptibility-induced contrast. However, the widespread use of 7T MRI is limited by its high cost and lower accessibility compared to low-field (LF) MRI. This study proposes a deep-learning framework that systematically fuses the input LF magnetic resonance feature representations with the inferred 7T-like feature representations for brain image segmentation tasks in a 7T-absent environment. Specifically, our adaptive fusion module aggregates 7T-like features derived from the LF image by a pre-trained network and then refines them to be effectively assimilable UHF guidance into LF image features. Using intensity-guided features obtained from such aggregation and assimilation, segmentation models can recognize subtle structural representations that are usually difficult to recognize when relying only on LF features. Beyond such advantages, this strategy can seamlessly be utilized by modulating the contrast of LF features in alignment with UHF guidance, even when employing arbitrary segmentation models. Exhaustive experiments demonstrated that the proposed method significantly outperformed all baseline models on both brain tissue and whole-brain segmentation tasks; further, it exhibited remarkable adaptability and scalability by successfully integrating diverse segmentation models and tasks. These improvements were not only quantifiable but also visible in the superlative visual quality of segmentation masks.
△ Less
Submitted 13 February, 2024;
originally announced February 2024.
-
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA
Authors:
Kaiyuan Yang,
Fabio Musio,
Yihui Ma,
Norman Juchler,
Johannes C. Paetzold,
Rami Al-Maskari,
Luciano Höher,
Hongwei Bran Li,
Ibrahim Ethem Hamamci,
Anjany Sekuboyina,
Suprosanna Shit,
Houjing Huang,
Chinmay Prabhakar,
Ezequiel de la Rosa,
Bastian Wittmann,
Diana Waldmannstetter,
Florian Kofler,
Fernando Navarro,
Martin Menten,
Ivan Ezhov,
Daniel Rueckert,
Iris N. Vos,
Ynte M. Ruigrok,
Birgitta K. Velthuis,
Hugo J. Kuijf
, et al. (88 additional authors not shown)
Abstract:
The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neurovascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two non-invasive angiographic imag…
▽ More
The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neurovascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two non-invasive angiographic imaging modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA), but there exist limited datasets with annotations on CoW anatomy, especially for CTA. Therefore, we organized the TopCoW challenge with the release of an annotated CoW dataset. The TopCoW dataset is the first public dataset with voxel-level annotations for 13 CoW vessel components, enabled by virtual reality technology. It is also the first large dataset using 200 pairs of MRA and CTA from the same patients. As part of the benchmark, we invited submissions worldwide and attracted over 250 registered participants from six continents. The submissions were evaluated on both internal and external test datasets of 226 scans from over five centers. The top performing teams achieved over 90% Dice scores at segmenting the CoW components, over 80% F1 scores at detecting key CoW components, and over 70% balanced accuracy at classifying CoW variants for nearly all test sets. The best algorithms also showed clinical potential in classifying fetal-type posterior cerebral artery and locating aneurysms with CoW anatomy. TopCoW demonstrated the utility and versatility of CoW segmentation algorithms for a wide range of downstream clinical applications with explainability. The annotated datasets and best performing algorithms have been released as public Zenodo records to foster further methodological development and clinical tool building.
△ Less
Submitted 8 July, 2025; v1 submitted 29 December, 2023;
originally announced December 2023.
-
Comprehensive Robotic Cholecystectomy Dataset (CRCD): Integrating Kinematics, Pedal Signals, and Endoscopic Videos
Authors:
Ki-Hwan Oh,
Leonardo Borgioli,
Alberto Mangano,
Valentina Valle,
Marco Di Pangrazio,
Francesco Toti,
Gioia Pozza,
Luciano Ambrosini,
Alvaro Ducas,
Milos Zefran,
Liaohai Chen,
Pier Cristoforo Giulianotti
Abstract:
In recent years, the potential applications of machine learning to Minimally Invasive Surgery (MIS) have spurred interest in data sets that can be used to develop data-driven tools. This paper introduces a novel dataset recorded during ex vivo pseudo-cholecystectomy procedures on pig livers, utilizing the da Vinci Research Kit (dVRK). Unlike current datasets, ours bridges a critical gap by offerin…
▽ More
In recent years, the potential applications of machine learning to Minimally Invasive Surgery (MIS) have spurred interest in data sets that can be used to develop data-driven tools. This paper introduces a novel dataset recorded during ex vivo pseudo-cholecystectomy procedures on pig livers, utilizing the da Vinci Research Kit (dVRK). Unlike current datasets, ours bridges a critical gap by offering not only full kinematic data but also capturing all pedal inputs used during the procedure and providing a time-stamped record of the endoscope's movements. Contributed by seven surgeons, this data set introduces a new dimension to surgical robotics research, allowing the creation of advanced models for automating console functionalities. Our work addresses the existing limitation of incomplete recordings and imprecise kinematic data, common in other datasets. By introducing two models, dedicated to predicting clutch usage and camera activation, we highlight the dataset's potential for advancing automation in surgical robotics. The comparison of methodologies and time windows provides insights into the models' boundaries and limitations.
△ Less
Submitted 16 October, 2024; v1 submitted 2 December, 2023;
originally announced December 2023.
-
Unifying Structure and Language Semantic for Efficient Contrastive Knowledge Graph Completion with Structured Entity Anchors
Authors:
Sang-Hyun Je,
Wontae Choi,
Kwangjin Oh
Abstract:
The goal of knowledge graph completion (KGC) is to predict missing links in a KG using trained facts that are already known. In recent, pre-trained language model (PLM) based methods that utilize both textual and structural information are emerging, but their performances lag behind state-of-the-art (SOTA) structure-based methods or some methods lose their inductive inference capabilities in the p…
▽ More
The goal of knowledge graph completion (KGC) is to predict missing links in a KG using trained facts that are already known. In recent, pre-trained language model (PLM) based methods that utilize both textual and structural information are emerging, but their performances lag behind state-of-the-art (SOTA) structure-based methods or some methods lose their inductive inference capabilities in the process of fusing structure embedding to text encoder. In this paper, we propose a novel method to effectively unify structure information and language semantics without losing the power of inductive reasoning. We adopt entity anchors and these anchors and textual description of KG elements are fed together into the PLM-based encoder to learn unified representations. In addition, the proposed method utilizes additional random negative samples which can be reused in the each mini-batch during contrastive learning to learn a generalized entity representations. We verify the effectiveness of the our proposed method through various experiments and analysis. The experimental results on standard benchmark widely used in link prediction task show that the proposed model outperforms existing the SOTA KGC models. Especially, our method show the largest performance improvement on FB15K-237, which is competitive to the SOTA of structure-based KGC methods.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
A Framework For Automated Dissection Along Tissue Boundary
Authors:
Ki-Hwan Oh,
Leonardo Borgioli,
Miloš Žefran,
Liaohai Chen,
Pier Cristoforo Giulianotti
Abstract:
Robotic surgery promises enhanced precision and adaptability over traditional surgical methods. It also offers the possibility of automating surgical interventions, resulting in reduced stress on the surgeon, better surgical outcomes, and lower costs. Cholecystectomy, the removal of the gallbladder, serves as an ideal model procedure for automation due to its distinct and well-contrasted anatomica…
▽ More
Robotic surgery promises enhanced precision and adaptability over traditional surgical methods. It also offers the possibility of automating surgical interventions, resulting in reduced stress on the surgeon, better surgical outcomes, and lower costs. Cholecystectomy, the removal of the gallbladder, serves as an ideal model procedure for automation due to its distinct and well-contrasted anatomical features between the gallbladder and liver, along with standardized surgical maneuvers. Dissection is a frequently used subtask in cholecystectomy where the surgeon delivers the energy on the hook to detach the gallbladder from the liver. Hence, dissection along tissue boundaries is a good candidate for surgical automation. For the da Vinci surgical robot to perform the same procedure as a surgeon automatically, it needs to have the ability to (1) recognize and distinguish between the two different tissues (e.g. the liver and the gallbladder), (2) understand where the boundary between the two tissues is located in the 3D workspace, (3) locate the instrument tip relative to the boundary in the 3D space using visual feedback, and (4) move the instrument along the boundary. This paper presents a novel framework that addresses these challenges through AI-assisted image processing and vision-based robot control. We also present the ex-vivo evaluation of the automated procedure on chicken and pork liver specimens that demonstrates the effectiveness of the proposed framework.
△ Less
Submitted 20 August, 2024; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Domain Generalization for Medical Image Analysis: A Review
Authors:
Jee Seok Yoon,
Kwanseok Oh,
Yooseung Shin,
Maciej A. Mazurowski,
Heung-Il Suk
Abstract:
Medical image analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, deploying DL models for MedIA in real-world situations remains challenging due to their failure to generalize across the distributional gap bet…
▽ More
Medical image analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, deploying DL models for MedIA in real-world situations remains challenging due to their failure to generalize across the distributional gap between training and testing samples - a problem known as domain shift. Researchers have dedicated their efforts to developing various DL methods to adapt and perform robustly on unknown and out-of-distribution (OOD) data distributions. This article comprehensively reviews domain generalization (DG) studies specifically tailored for MedIA. We provide a holistic view of how DG techniques interact within the broader MedIA system, going beyond methodologies to consider the operational implications on the entire MedIA workflow. Specifically, we categorize DG methods into data-level, feature-level, model-level, and analysis-level methods. We show how those methods can be used in various stages of the MedIA workflow with DL equipped from data acquisition to model prediction and analysis. Furthermore, we critically analyze the strengths and weaknesses of various methods, unveiling future research opportunities.
△ Less
Submitted 7 December, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
A Quantitatively Interpretable Model for Alzheimer's Disease Prediction Using Deep Counterfactuals
Authors:
Kwanseok Oh,
Da-Woon Heo,
Ahmad Wisnu Mulyadi,
Wonsik Jung,
Eunsong Kang,
Kun Ho Lee,
Heung-Il Suk
Abstract:
Deep learning (DL) for predicting Alzheimer's disease (AD) has provided timely intervention in disease progression yet still demands attentive interpretability to explain how their DL models make definitive decisions. Recently, counterfactual reasoning has gained increasing attention in medical research because of its ability to provide a refined visual explanatory map. However, such visual explan…
▽ More
Deep learning (DL) for predicting Alzheimer's disease (AD) has provided timely intervention in disease progression yet still demands attentive interpretability to explain how their DL models make definitive decisions. Recently, counterfactual reasoning has gained increasing attention in medical research because of its ability to provide a refined visual explanatory map. However, such visual explanatory maps based on visual inspection alone are insufficient unless we intuitively demonstrate their medical or neuroscientific validity via quantitative features. In this study, we synthesize the counterfactual-labeled structural MRIs using our proposed framework and transform it into a gray matter density map to measure its volumetric changes over the parcellated region of interest (ROI). We also devised a lightweight linear classifier to boost the effectiveness of constructed ROIs, promoted quantitative interpretation, and achieved comparable predictive performance to DL methods. Throughout this, our framework produces an ``AD-relatedness index'' for each ROI and offers an intuitive understanding of brain status for an individual patient and across patient groups with respect to AD progression.
△ Less
Submitted 5 October, 2023;
originally announced October 2023.
-
Recognizing Intent in Collaborative Manipulation
Authors:
Zhanibek Rysbek,
Ki Hwan Oh,
Milos Zefran
Abstract:
Collaborative manipulation is inherently multimodal, with haptic communication playing a central role. When performed by humans, it involves back-and-forth force exchanges between the participants through which they resolve possible conflicts and determine their roles. Much of the existing work on collaborative human-robot manipulation assumes that the robot follows the human. But for a robot to m…
▽ More
Collaborative manipulation is inherently multimodal, with haptic communication playing a central role. When performed by humans, it involves back-and-forth force exchanges between the participants through which they resolve possible conflicts and determine their roles. Much of the existing work on collaborative human-robot manipulation assumes that the robot follows the human. But for a robot to match the performance of a human partner it needs to be able to take initiative and lead when appropriate. To achieve such human-like performance, the robot needs to have the ability to (1) determine the intent of the human, (2) clearly express its own intent, and (3) choose its actions so that the dyad reaches consensus. This work proposes a framework for recognizing human intent in collaborative manipulation tasks using force exchanges. Grounded in a dataset collected during a human study, we introduce a set of features that can be computed from the measured signals and report the results of a classifier trained on our collected human-human interaction data. Two metrics are used to evaluate the intent recognizer: overall accuracy and the ability to correctly identify transitions. The proposed recognizer shows robustness against the variations in the partner's actions and the confounding effects due to the variability in grasp forces and dynamic effects of walking. The results demonstrate that the proposed recognizer is well-suited for implementation in a physical interaction control scheme.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
Smartpick: Workload Prediction for Serverless-enabled Scalable Data Analytics Systems
Authors:
Anshuman Das Mohapatra,
Kwangsung Oh
Abstract:
Many data analytic systems have adopted a newly emerging compute resource, serverless (SL), to handle data analytics queries in a timely and cost-efficient manner, i.e., serverless data analytics. While these systems can start processing queries quickly thanks to the agility and scalability of SL, they may encounter performance- and cost-bottlenecks based on workloads due to SL's worse performance…
▽ More
Many data analytic systems have adopted a newly emerging compute resource, serverless (SL), to handle data analytics queries in a timely and cost-efficient manner, i.e., serverless data analytics. While these systems can start processing queries quickly thanks to the agility and scalability of SL, they may encounter performance- and cost-bottlenecks based on workloads due to SL's worse performance and more expensive cost than traditional compute resources, e.g., virtual machine (VM). In this project, we introduce Smartpick, a SL-enabled scalable data analytics system that exploits SL and VM together to realize composite benefits, i.e., agility from SL and better performance with reduced cost from VM. Smartpick uses a machine learning prediction scheme, decision-tree based Random Forest with Bayesian Optimizer, to determine SL and VM configurations, i.e., how many SL and VM instances for queries, that meet cost-performance goals. Smartpick offers a knob for applications to allow them to explore a richer cost-performance tradeoff space opened by exploiting SL and VM together. To maximize the benefits of SL, Smartpick supports a simple but strong mechanism, called relay-instances. Smartpick also supports event-driven prediction model retraining to deal with workload dynamics. A Smartpick prototype was implemented on Spark and deployed on live test-beds, Amazon AWS and Google Cloud Platform. Evaluation results indicate 97.05% and 83.49% prediction accuracies respectively with up to 50% cost reduction as opposed to the baselines. The results also confirm that Smartpick allows data analytics applications to navigate the richer cost-performance tradeoff space efficiently and to handle workload dynamics effectively and automatically.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
An IPW-based Unbiased Ranking Metric in Two-sided Markets
Authors:
Keisho Oh,
Naoki Nishimura,
Minje Sung,
Ken Kobayashi,
Kazuhide Nakata
Abstract:
In modern recommendation systems, unbiased learning-to-rank (LTR) is crucial for prioritizing items from biased implicit user feedback, such as click data. Several techniques, such as Inverse Propensity Weighting (IPW), have been proposed for single-sided markets. However, less attention has been paid to two-sided markets, such as job platforms or dating services, where successful conversions requ…
▽ More
In modern recommendation systems, unbiased learning-to-rank (LTR) is crucial for prioritizing items from biased implicit user feedback, such as click data. Several techniques, such as Inverse Propensity Weighting (IPW), have been proposed for single-sided markets. However, less attention has been paid to two-sided markets, such as job platforms or dating services, where successful conversions require matching preferences from both users. This paper addresses the complex interaction of biases between users in two-sided markets and proposes a tailored LTR approach. We first present a formulation of feedback mechanisms in two-sided matching platforms and point out that their implicit feedback may include position bias from both user groups. On the basis of this observation, we extend the IPW estimator and propose a new estimator, named two-sided IPW, to address the position bases in two-sided markets. We prove that the proposed estimator satisfies the unbiasedness for the ground-truth ranking metric. We conducted numerical experiments on real-world two-sided platforms and demonstrated the effectiveness of our proposed method in terms of both precision and robustness. Our experiments showed that our method outperformed baselines especially when handling rare items, which are less frequently observed in the training data.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
Robots Taking Initiative in Collaborative Object Manipulation: Lessons from Physical Human-Human Interaction
Authors:
Zhanibek Rysbek,
Ki Hwan Oh,
Afagh Mehri Shervedani,
Timotej Klemencic,
Milos Zefran,
Barbara Di Eugenio
Abstract:
Physical Human-Human Interaction (pHHI) involves the use of multiple sensory modalities. Studies of communication through spoken utterances and gestures are well established, but communication through force signals is not well understood. In this paper, we focus on investigating the mechanisms employed by humans during the negotiation through force signals, and how the robot can communicate task g…
▽ More
Physical Human-Human Interaction (pHHI) involves the use of multiple sensory modalities. Studies of communication through spoken utterances and gestures are well established, but communication through force signals is not well understood. In this paper, we focus on investigating the mechanisms employed by humans during the negotiation through force signals, and how the robot can communicate task goals, comprehend human intent, and take the lead as needed. To achieve this, we formulate a task that requires active force communication and propose a taxonomy that extends existing literature. Also, we conducted a study to observe how humans behave during collaborative manipulation tasks. An important contribution of this work is the novel features based on force-kinematic signals that demonstrate predictive power to recognize symbolic human intent. Further, we show the feasibility of developing a real-time intent classifier based on the novel features and speculate the role it plays in high-level robot controllers for physical Human-Robot Interaction (pHRI). This work provides important steps to achieve more human-like fluid interaction in physical co-manipulation tasks that are applicable and not limited to humanoid, assistive robots, and human-in-the-loop automation.
△ Less
Submitted 29 July, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Deep Collective Knowledge Distillation
Authors:
Jihyeon Seo,
Kyusam Oh,
Chanho Min,
Yongkeun Yun,
Sungwoo Cho
Abstract:
Many existing studies on knowledge distillation have focused on methods in which a student model mimics a teacher model well.
Simply imitating the teacher's knowledge, however, is not sufficient for the student to surpass that of the teacher.
We explore a method to harness the knowledge of other students to complement the knowledge of the teacher.
We propose deep collective knowledge distill…
▽ More
Many existing studies on knowledge distillation have focused on methods in which a student model mimics a teacher model well.
Simply imitating the teacher's knowledge, however, is not sufficient for the student to surpass that of the teacher.
We explore a method to harness the knowledge of other students to complement the knowledge of the teacher.
We propose deep collective knowledge distillation for model compression, called DCKD, which is a method for training student models with rich information to acquire knowledge from not only their teacher model but also other student models.
The knowledge collected from several student models consists of a wealth of information about the correlation between classes.
Our DCKD considers how to increase the correlation knowledge of classes during training.
Our novel method enables us to create better performing student models for collecting knowledge.
This simple yet powerful method achieves state-of-the-art performances in many experiments.
For example, for ImageNet, ResNet18 trained with DCKD achieves 72.27\%, which outperforms the pretrained ResNet18 by 2.52\%.
For CIFAR-100, the student model of ShuffleNetV1 with DCKD achieves 6.55\% higher top-1 accuracy than the pretrained ShuffleNetV1.
△ Less
Submitted 18 April, 2023;
originally announced April 2023.
-
Evaluating Multimodal Interaction of Robots Assisting Older Adults
Authors:
Afagh Mehri Shervedani,
Ki-Hwan Oh,
Bahareh Abbasi,
Natawut Monaikul,
Zhanibek Rysbek,
Barbara Di Eugenio,
Milos Zefran
Abstract:
We outline our work on evaluating robots that assist older adults by engaging with them through multiple modalities that include physical interaction. Our thesis is that to increase the effectiveness of assistive robots: 1) robots need to understand and effect multimodal actions, 2) robots should not only react to the human, they need to take the initiative and lead the task when it is necessary.…
▽ More
We outline our work on evaluating robots that assist older adults by engaging with them through multiple modalities that include physical interaction. Our thesis is that to increase the effectiveness of assistive robots: 1) robots need to understand and effect multimodal actions, 2) robots should not only react to the human, they need to take the initiative and lead the task when it is necessary. We start by briefly introducing our proposed framework for multimodal interaction and then describe two different experiments with the actual robots. In the first experiment, a Baxter robot helps a human find and locate an object using the Multimodal Interaction Manager (MIM) framework. In the second experiment, a NAO robot is used in the same task, however, the roles of the robot and the human are reversed. We discuss the evaluation methods that were used in these experiments, including different metrics employed to characterize the performance of the robot in each case. We conclude by providing our perspective on the challenges and opportunities for the evaluation of assistive robots for older adults in realistic settings.
△ Less
Submitted 20 December, 2022;
originally announced December 2022.
-
XADLiME: eXplainable Alzheimer's Disease Likelihood Map Estimation via Clinically-guided Prototype Learning
Authors:
Ahmad Wisnu Mulyadi,
Wonsik Jung,
Kwanseok Oh,
Jee Seok Yoon,
Heung-Il Suk
Abstract:
Diagnosing Alzheimer's disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Furthermore, there is a high possibility of getting entangled with normal aging. We propose a novel deep-le…
▽ More
Diagnosing Alzheimer's disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Furthermore, there is a high possibility of getting entangled with normal aging. We propose a novel deep-learning approach through eXplainable AD Likelihood Map Estimation (XADLiME) for AD progression modeling over 3D sMRIs using clinically-guided prototype learning. Specifically, we establish a set of topologically-aware prototypes onto the clusters of latent clinical features, uncovering an AD spectrum manifold. We then measure the similarities between latent clinical features and well-established prototypes, estimating a "pseudo" likelihood map. By considering this pseudo map as an enriched reference, we employ an estimating network to estimate the AD likelihood map over a 3D sMRI scan. Additionally, we promote the explainability of such a likelihood map by revealing a comprehensible overview from two perspectives: clinical and morphological. During the inference, this estimated likelihood map served as a substitute over unseen sMRI scans for effectively conducting the downstream task while providing thorough explainable states.
△ Less
Submitted 26 July, 2022;
originally announced July 2022.
-
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Authors:
Wonyoung Shin,
Jonghun Park,
Taekang Woo,
Yongwoo Cho,
Kwangjin Oh,
Hwanjun Song
Abstract:
Understanding vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shopping platforms and inspired by the recent success in representation learning research, we propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images. We present technique…
▽ More
Understanding vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shopping platforms and inspired by the recent success in representation learning research, we propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images. We present techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges. We study the performance using our pre-trained model as backbones for diverse downstream tasks, including category classification, attribute extraction, product matching, product clustering, and adult product recognition. Experimental results show that our proposed method outperforms the baseline in each downstream task regarding both single modality and multiple modalities.
△ Less
Submitted 22 August, 2022; v1 submitted 1 July, 2022;
originally announced July 2022.
-
Learn-Explain-Reinforce: Counterfactual Reasoning and Its Guidance to Reinforce an Alzheimer's Disease Diagnosis Model
Authors:
Kwanseok Oh,
Jee Seok Yoon,
Heung-Il Suk
Abstract:
Existing studies on disease diagnostic models focus either on diagnostic model learning for performance improvement or on the visual explanation of a trained diagnostic model. We propose a novel learn-explain-reinforce (LEAR) framework that unifies diagnostic model learning, visual explanation generation (explanation unit), and trained diagnostic model reinforcement (reinforcement unit) guided by…
▽ More
Existing studies on disease diagnostic models focus either on diagnostic model learning for performance improvement or on the visual explanation of a trained diagnostic model. We propose a novel learn-explain-reinforce (LEAR) framework that unifies diagnostic model learning, visual explanation generation (explanation unit), and trained diagnostic model reinforcement (reinforcement unit) guided by the visual explanation. For the visual explanation, we generate a counterfactual map that transforms an input sample to be identified as an intended target label. For example, a counterfactual map can localize hypothetical abnormalities within a normal brain image that may cause it to be diagnosed with Alzheimer's disease (AD). We believe that the generated counterfactual maps represent data-driven and model-induced knowledge about a target task, i.e., AD diagnosis using structural MRI, which can be a vital source of information to reinforce the generalization of the trained diagnostic model. To this end, we devise an attention-based feature refinement module with the guidance of the counterfactual maps. The explanation and reinforcement units are reciprocal and can be operated iteratively. Our proposed approach was validated via qualitative and quantitative analysis on the ADNI dataset. Its comprehensibility and fidelity were demonstrated through ablation studies and comparisons with existing methods.
△ Less
Submitted 21 August, 2021;
originally announced August 2021.
-
Trinity: A No-Code AI platform for complex spatial datasets
Authors:
C. V. Krishnakumar Iyer,
Feili Hou,
Henry Wang,
Yonghong Wang,
Kay Oh,
Swetava Ganguli,
Vipul Pandey
Abstract:
We present a no-code Artificial Intelligence (AI) platform called Trinity with the main design goal of enabling both machine learning researchers and non-technical geospatial domain experts to experiment with domain-specific signals and datasets for solving a variety of complex problems on their own. This versatility to solve diverse problems is achieved by transforming complex Spatio-temporal dat…
▽ More
We present a no-code Artificial Intelligence (AI) platform called Trinity with the main design goal of enabling both machine learning researchers and non-technical geospatial domain experts to experiment with domain-specific signals and datasets for solving a variety of complex problems on their own. This versatility to solve diverse problems is achieved by transforming complex Spatio-temporal datasets to make them consumable by standard deep learning models, in this case, Convolutional Neural Networks (CNNs), and giving the ability to formulate disparate problems in a standard way, eg. semantic segmentation. With an intuitive user interface, a feature store that hosts derivatives of complex feature engineering, a deep learning kernel, and a scalable data processing mechanism, Trinity provides a powerful platform for domain experts to share the stage with scientists and engineers in solving business-critical problems. It enables quick prototyping, rapid experimentation and reduces the time to production by standardizing model building and deployment. In this paper, we present our motivation behind Trinity and its design along with showcasing sample applications to motivate the idea of lowering the bar to using AI.
△ Less
Submitted 1 July, 2021; v1 submitted 21 June, 2021;
originally announced June 2021.
-
DUET: Detection Utilizing Enhancement for Text in Scanned or Captured Documents
Authors:
Eun-Soo Jung,
HyeongGwan Son,
Kyusam Oh,
Yongkeun Yun,
Soonhwan Kwon,
Min Soo Kim
Abstract:
We present a novel deep neural model for text detection in document images. For robust text detection in noisy scanned documents, the advantages of multi-task learning are adopted by adding an auxiliary task of text enhancement. Namely, our proposed model is designed to perform noise reduction and text region enhancement as well as text detection. Moreover, we enrich the training data for the mode…
▽ More
We present a novel deep neural model for text detection in document images. For robust text detection in noisy scanned documents, the advantages of multi-task learning are adopted by adding an auxiliary task of text enhancement. Namely, our proposed model is designed to perform noise reduction and text region enhancement as well as text detection. Moreover, we enrich the training data for the model with synthesized document images that are fully labeled for text detection and enhancement, thus overcome the insufficiency of labeled document image data. For the effective exploitation of the synthetic and real data, the training process is separated in two phases. The first phase is training only synthetic data in a fully-supervised manner. Then real data with only detection labels are added in the second phase. The enhancement task for the real data is weakly-supervised with information from their detection labels. Our methods are demonstrated in a real document dataset with performances exceeding those of other text detection methods. Moreover, ablations are conducted and the results confirm the effectiveness of the synthetic data, auxiliary task, and weak-supervision. Whereas the existing text detection studies mostly focus on the text in scenes, our proposed method is optimized to the applications for the text in scanned documents.
△ Less
Submitted 10 June, 2021;
originally announced June 2021.
-
Topology and Routing Problems: The Circular Frame
Authors:
Rak-Kyeong Seong,
Chanho Min,
Sang-Hoon Han,
Jaeho Yang,
Seungwoo Nam,
Kyusam Oh
Abstract:
In this work, we solve the problem of finding non-intersecting paths between points on a plane with a new approach by borrowing ideas from geometric topology, in particular, from the study of polygonal schema in mathematics. We use a topological transformation on the 2-dimensional planar routing environment that simplifies the routing problem into a problem of connecting points on a circle with st…
▽ More
In this work, we solve the problem of finding non-intersecting paths between points on a plane with a new approach by borrowing ideas from geometric topology, in particular, from the study of polygonal schema in mathematics. We use a topological transformation on the 2-dimensional planar routing environment that simplifies the routing problem into a problem of connecting points on a circle with straight line segments that do not intersect in the interior of the circle. These points are either the points that need to be connected by non-intersecting paths or special `reference' points that parametrize the topology of the original environment prior to the transformation. When all the necessary points on the circle are fully connected, the transformation is reversed such that the line segments combine to become the non-intersecting paths that connect the start and end points in the original environment. We interpret the transformed environment in which the routing problem is solved as a new data structure where any routing problem can be solved efficiently. We perform experiments and show that the routing time and success rate of the new routing algorithm outperforms the ones for the A*-algorithm.
△ Less
Submitted 7 May, 2021;
originally announced May 2021.
-
Fully Automated Mitral Inflow Doppler Analysis Using Deep Learning
Authors:
Mohamed Y. Elwazir,
Zeynettin Akkus,
Didem Oguz,
Jae K. Oh
Abstract:
Echocardiography (echo) is an indispensable tool in a cardiologist's diagnostic armamentarium. To date, almost all echocardiographic parameters require time-consuming manual labeling and measurements by an experienced echocardiographer and exhibit significant variability, owing to the noisy and artifact-laden nature of echo images. For example, mitral inflow (MI) Doppler is used to assess left ven…
▽ More
Echocardiography (echo) is an indispensable tool in a cardiologist's diagnostic armamentarium. To date, almost all echocardiographic parameters require time-consuming manual labeling and measurements by an experienced echocardiographer and exhibit significant variability, owing to the noisy and artifact-laden nature of echo images. For example, mitral inflow (MI) Doppler is used to assess left ventricular (LV) diastolic function, which is of paramount clinical importance to distinguish between different cardiac diseases. In the current work we present a fully automated workflow which leverages deep learning to a) label MI Doppler images acquired in an echo study, b) detect the envelope of MI Doppler signal, c) extract early and late filing (E and A wave) flow velocities and E-wave deceleration time from the envelope. We trained a variety of convolutional neural networks (CNN) models on 5544 images of 140 patients for predicting 24 image classes including MI Doppler images and obtained overall accuracy of 0.97 on 1737 images of 40 patients. Automated E and A wave velocity showed excellent correlation (Pearson R 0.99 and 0.98 respectively) and Bland Altman agreement (mean difference 0.06 and 0.05 m/s respectively and SD 0.03 for both) with the operator measurements. Deceleration time also showed good but lower correlation (Pearson R 0.82) and Bland-Altman agreement (mean difference: 34.1ms, SD: 30.9ms). These results demonstrate feasibility of Doppler echocardiography measurement automation and the promise of a fully automated echocardiography measurement package.
△ Less
Submitted 24 November, 2020;
originally announced November 2020.
-
Born Identity Network: Multi-way Counterfactual Map Generation to Explain a Classifier's Decision
Authors:
Kwanseok Oh,
Jee Seok Yoon,
Heung-Il Suk
Abstract:
There exists an apparent negative correlation between performance and interpretability of deep learning models. In an effort to reduce this negative correlation, we propose a Born Identity Network (BIN), which is a post-hoc approach for producing multi-way counterfactual maps. A counterfactual map transforms an input sample to be conditioned and classified as a target label, which is similar to ho…
▽ More
There exists an apparent negative correlation between performance and interpretability of deep learning models. In an effort to reduce this negative correlation, we propose a Born Identity Network (BIN), which is a post-hoc approach for producing multi-way counterfactual maps. A counterfactual map transforms an input sample to be conditioned and classified as a target label, which is similar to how humans process knowledge through counterfactual thinking. For example, a counterfactual map can localize hypothetical abnormalities from a normal brain image that may cause it to be diagnosed with a disease. Specifically, our proposed BIN consists of two core components: Counterfactual Map Generator and Target Attribution Network. The Counterfactual Map Generator is a variation of conditional GAN which can synthesize a counterfactual map conditioned on an arbitrary target label. The Target Attribution Network provides adequate assistance for generating synthesized maps by conditioning a target label into the Counterfactual Map Generator. We have validated our proposed BIN in qualitative and quantitative analysis on MNIST, 3D Shapes, and ADNI datasets, and showed the comprehensibility and fidelity of our method from various ablation studies.
△ Less
Submitted 8 April, 2021; v1 submitted 20 November, 2020;
originally announced November 2020.
-
Toward the Fully Physics-Informed Echo State Network -- an ODE Approximator Based on Recurrent Artificial Neurons
Authors:
Dong Keun Oh
Abstract:
Inspired by recent theoretical arguments, physics-informed echo state network (ESN) is discussed on the attempt to train a reservoir model absolutely in physics-informed manner. As the plainest work on such a purpose, an ODE (ordinary differential equation) approximator is designed to replicate the solution in sequence with respect to the recurrent evaluations. On the principal invariance of diffe…
▽ More
Inspired by recent theoretical arguments, physics-informed echo state network (ESN) is discussed on the attempt to train a reservoir model absolutely in physics-informed manner. As the plainest work on such a purpose, an ODE (ordinary differential equation) approximator is designed to replicate the solution in sequence with respect to the recurrent evaluations. On the principal invariance of differential equations, the constraint in recurrence just takes shape to secure a proper regression method for the ESN-based ODE approximator. After then, the actual training process is established on the idea of two-pass strategy for regression. Aiming at the fully physics-informed reservoir model, a couple of nonlinear dynamical problems are demonstrated as the computations obtained from the proposed method in this study.
△ Less
Submitted 13 November, 2020;
originally announced November 2020.
-
Regional Multi-scale Approach for Visually Pleasing Explanations of Deep Neural Networks
Authors:
Dasom Seo,
Kanghan Oh,
Il-Seok Oh
Abstract:
Recently, many methods to interpret and visualize deep neural network predictions have been proposed and significant progress has been made. However, a more class-discriminative and visually pleasing explanation is required. Thus, this paper proposes a region-based approach that estimates feature importance in terms of appropriately segmented regions. By fusing the saliency maps generated from mul…
▽ More
Recently, many methods to interpret and visualize deep neural network predictions have been proposed and significant progress has been made. However, a more class-discriminative and visually pleasing explanation is required. Thus, this paper proposes a region-based approach that estimates feature importance in terms of appropriately segmented regions. By fusing the saliency maps generated from multi-scale segmentations, a more class-discriminative and visually pleasing map is obtained. We incorporate this regional multi-scale concept into a prediction difference method that is model-agnostic. An input image is segmented in several scales using the super-pixel method, and exclusion of a region is simulated by sampling a normal distribution constructed using the boundary prior. The experimental results demonstrate that the regional multi-scale method produces much more class-discriminative and visually pleasing saliency maps.
△ Less
Submitted 1 August, 2018; v1 submitted 31 July, 2018;
originally announced July 2018.
-
Load-Modulated Single-RF MIMO Transmission for Spatially Multiplexed QAM Signals
Authors:
Seung-Eun Hong,
Kyoung-Sub Oh
Abstract:
Today, MIMO has become an indispensable scheme for providing significant spectral efficiency in wireless communication and for future wireless system, recently, it goes to two extremes: massive MIMO and single-RF MIMO. This paper, which is put in the latter, utilizes load-modulated arrays with only reactance loads for single-RF transmission of spatially multiplexed QAM signals. To alleviate the ne…
▽ More
Today, MIMO has become an indispensable scheme for providing significant spectral efficiency in wireless communication and for future wireless system, recently, it goes to two extremes: massive MIMO and single-RF MIMO. This paper, which is put in the latter, utilizes load-modulated arrays with only reactance loads for single-RF transmission of spatially multiplexed QAM signals. To alleviate the need for iterative processes while considering mutual coupling in the compact antenna, we present a novel design methodology for the loading network, which enables the exact computation of the three reactance loads per antenna element and also the perfect matching to the source with the opportunity to select appropriate analog tunable loads. We verify the design methodology by comparing the calculated values for some key parameters with the values from the circuit simulation. In addition, as an evaluation of the proposed architecture, we perform the bit error rate (BER) comparison which shows that our scheme with ideal loading is comparable to the conventional MIMO.
△ Less
Submitted 7 January, 2015;
originally announced January 2015.