Search | arXiv e-print repository

Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling

Authors: Bryan Wong, Jong Woo Kim, Huazhu Fu, Mun Yong Yi

Abstract: Vision-language models (VLMs) have recently been integrated into multiple instance learning (MIL) frameworks to address the challenge of few-shot, weakly supervised classification of whole slide images (WSIs). A key trend involves leveraging multi-scale information to better represent hierarchical tissue structures. However, existing methods often face two key limitations: (1) insufficient modeling of interactions within the same modalities across scales (e.g., 5x and 20x) and (2) inadequate alignment between visual and textual modalities on the same scale. To address these gaps, we propose HiVE-MIL, a hierarchical vision-language framework that constructs a unified graph consisting of (1) parent-child links between coarse (5x) and fine (20x) visual/textual nodes to capture hierarchical relationships, and (2) heterogeneous intra-scale edges linking visual and textual nodes on the same scale. To further enhance semantic consistency, HiVE-MIL incorporates a two-stage, text-guided dynamic filtering mechanism that removes weakly correlated patch-text pairs, and introduces a hierarchical contrastive loss to align textual semantics across scales. Extensive experiments on TCGA breast, lung, and kidney cancer datasets demonstrate that HiVE-MIL consistently outperforms both traditional MIL and recent VLM-based MIL approaches, achieving gains of up to 4.1% in macro F1 under 16-shot settings. Our results demonstrate the value of jointly modeling hierarchical structure and multimodal alignment for efficient and scalable learning from limited pathology data.
The code is available at https://github.com/bryanwong17/HiVE-MIL

Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.
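
The abstract above reports its gains in macro F1. As a reminder of what that metric measures, here is a minimal sketch of the standard definition (the unweighted mean of per-class F1 scores). The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the HiVE-MIL code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no
        # predictions is scored 0.0 here, which is one common convention.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with two classes (hypothetical labels).
score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

Because every class contributes equally, macro F1 is sensitive to performance on rare classes, which is likely why it is a common choice for few-shot classification benchmarks.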

arXiv:2505.10251 [pdf, ps, other]

SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning

Authors: Ji Woong Kim, Juo-Tung Chen, Pascal Hansen, Lucy X. Shi, Antony Goldenberg, Samuel Schmidgall, Paul Maria Scheikl, Anton Deguet, Brandon M. White, De Ru Tsai, Richard Cha, Jeffrey Jopling, Chelsea Finn, Axel Krieger

Abstract: Research on autonomous robotic surgery has largely focused on simple task automation in controlled environments. However, real-world surgical applications require dexterous manipulation over extended time scales while demanding generalization across diverse variations in human tissue. These challenges remain difficult to address using existing logic-based or conventional end-to-end learning strategies. To bridge this gap, we propose a hierarchical framework for dexterous, long-horizon surgical tasks. Our method employs a high-level policy for task planning and a low-level policy for generating task-space controls for the surgical robot. The high-level planner plans tasks using language, producing task-specific or corrective instructions that guide the robot at a coarse level. Leveraging language as a planning modality offers an intuitive and generalizable interface, mirroring how experienced surgeons instruct trainees during procedures. We validate our framework in ex-vivo experiments on a complex minimally invasive procedure, cholecystectomy, and conduct ablative studies to assess key design choices. Our approach achieves a 100% success rate across n=8 different ex-vivo gallbladders, operating fully autonomously without human intervention. The hierarchical approach greatly improves the policy's ability to recover from suboptimal states that are inevitable in the highly dynamic environment of realistic surgical applications.
This work represents the first demonstration of step-level autonomy, marking a critical milestone toward autonomous surgical systems for clinical studies. By advancing generalizable autonomy in surgical robotics, our approach brings the field closer to real-world deployment.

Submitted 15 May, 2025; originally announced May 2025.

arXiv:2504.11573 [pdf]

X-ray scattering investigation of hydride surface segregation in epitaxial Nb films

Authors: David A. Garcia-Wetten, Philip J. Ryan, Jong Woo Kim, Dominic P Goronzy, Roger J. Reinertsen, Mark C. Hersam, Michael J. Bedzyk

Abstract: Hydride precipitation in niobium-based, superconducting circuits is a damaging side-effect of hydrofluoric acid treatments used to clean and thin the Nb surface oxides and Si oxides. The precipitate microstructure is difficult to probe because of the high hydrogen mobility in the niobium matrix. In particular, destructive techniques used to prepare samples for elemental depth profiling can change the hydride structure. Here, we use X-ray surface scattering to non-destructively probe the depth distribution of precipitates in hydrided, epitaxial, niobium thin films. We find that the niobium hydride is confined within the top ten nm of the surface.

Submitted 15 April, 2025; originally announced April 2025.

Comments: 11 pages, 5 figures

arXiv:2412.20706 [pdf, other]

Effect of disorder on the strain-tuned charge density wave multicriticality in Pd$_x$ErTe$_3$

Authors: Anisha G. Singh, Matthew Krogstad, Maja D. Bachmann, Paul Thompson, Stephan Rosenkranz, Ray Osborn, Alan Fang, Aharon Kapitulnik, Jong Woo Kim, Philip J. Ryan, Steven A. Kivelson, Ian R. Fisher

Abstract: We explore, through a combination of x-ray diffraction and elastoresistivity measurements, the effect of disorder on the strain-tuned charge density wave and associated multicriticality in Pd$_x$ErTe$_3$ (x = 0, 0.01, 0.02 and 0.026). We focus particularly on the behavior near the strain-tuned bicritical point that occurs in pristine ErTe$_3$ (x=0). Our study reveals that while Pd intercalation somewhat broadens the signatures of the CDW phase transitions, the line of first-order transitions at which the CDW reorients as a function of applied strain persists in the presence of disorder and still seemingly terminates at a critical point. The critical point occurs at a lower temperature and a lower strain compared to pristine ErTe$_3$. Similarly, the nematic elastoresistance of Pd$_x$ErTe$_3$, though suppressed in magnitude and broadened relative to that of ErTe$_3$, has a markedly more symmetric response around the critical point. These observations point to disorder driving a reduction in the system's electronic orthorhombicity even while the material remains irrevocably orthorhombic due to the presence of a glide plane in the crystal structure. Disorder, it would appear, reinforces the emergence of a "pseudo-tetragonal" electronic response in this fundamentally orthorhombic material.

Submitted 29 December, 2024; originally announced December 2024.

arXiv:2412.12906 [pdf, other]

CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image

Authors: Wonseok Roh, Hwanhee Jung, Jong Wook Kim, Seunggwan Lee, Innfarn Yoo, Andreas Lugmayr, Seunggeun Chi, Karthik Ramani, Sangpil Kim

Abstract: Recently, generalizable feed-forward methods based on 3D Gaussian Splatting have gained significant attention for their potential to reconstruct 3D scenes using finite resources. These approaches create a 3D radiance field, parameterized by per-pixel 3D Gaussian primitives, from just a few images in a single forward pass. However, unlike multi-view methods that benefit from cross-view correspondences, 3D scene reconstruction with a single-view image remains an underexplored area. In this work, we introduce CATSplat, a novel generalizable transformer-based framework designed to break through the inherent constraints in monocular settings. First, we propose leveraging textual guidance from a visual-language model to complement insufficient information from a single image. By incorporating scene-specific contextual details from text embeddings through cross-attention, we pave the way for context-aware 3D scene reconstruction beyond relying solely on visual cues. Moreover, we advocate utilizing spatial guidance from 3D point features toward comprehensive geometric understanding under single-view settings. With 3D priors, image features can capture rich structural insights for predicting 3D Gaussians without multi-view techniques. Extensive experiments on large-scale datasets demonstrate the state-of-the-art performance of CATSplat in single-view 3D scene reconstruction with high-quality novel view synthesis.

Submitted 3 February, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

arXiv:2411.05727 [pdf]

Cascade hot carriers via broad-band resonant tunneling

Authors: Kamal Kumar Paul, Ashok Mondal, Jae Woo Kim, Ji-Hee Kim, Young Hee Lee

Abstract: Extraction of hot carriers (HCs) over the band-edge is a key to harvest solar energy beyond Shockley-Queisser limit1. Graphene is known as a HC-layered material due to phonon bottleneck effect near Dirac point, but limited by low photocarrier density2. Graphene/transition metal dichalcogenide (TMD) heterostructures circumvent this issue by ultrafast carrier transfer from TMD to graphene2,3. Nevertheless, efficient extraction of photocurrent by means of HCs together with carrier multiplication (CM) is still missing. Here, we introduce an ultrathin broadband resonant tunneling (BRT) barrier, TiOX to efficiently extract photocurrent with simultaneous CM and HC measurements in MoS2/graphene/TiOX heterostructure. The BRT layer gives rise to boosting open circuit voltage which is linearly proportional to incident photon energy. Meanwhile, short circuit current rises rapidly over 2Eg with obvious CM feature. This was explained by defining the joint density of states between graphene and TiOX layer over positive and negative voltage. The broadband resonant tunneling states inherently constructed from oxidation states varying from Ti3+ to Ti4+ allow the ultrafast HCs to efficiently transfer from graphene to TiOX layer. We find that the number of available tunneling states is directly proportional to short circuit current, which is well corroborated with TiOX and MoS2 thickness variance.
We obtained an optimum thickness of BRT layer of ~2.8 nm, yielding cascade open circuit voltage as high as ~0.7 V, two orders of magnitude higher than that without BRT layer to reach a record efficiency of 5.3% with improved fill factor owing to synergistic HC and CM conversion under 1-SUN with long-term stability.

Submitted 8 November, 2024; originally announced November 2024.

arXiv:2410.21276 [pdf, other]

GPT-4o System Card

Authors: OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.

Submitted 25 October, 2024; originally announced October 2024.

arXiv:2410.20026 [pdf, other]

Towards Robust Algorithms for Surgical Phase Recognition via Digital Twin Representation

Authors: Hao Ding, Yuqian Zhang, Wenzheng Cheng, Xinyu Wang, Xu Lian, Chenhao Yu, Hongchao Shu, Ji Woong Kim, Axel Krieger, Mathias Unberath

Abstract: Surgical phase recognition (SPR) is an integral component of surgical data science, enabling high-level surgical analysis. End-to-end trained neural networks that predict surgical phase directly from videos have shown excellent performance on benchmarks. However, these models struggle with robustness due to non-causal associations in the training set. Our goal is to improve model robustness to variations in the surgical videos by leveraging the digital twin (DT) paradigm -- an intermediary layer to separate high-level analysis (SPR) from low-level processing. As a proof of concept, we present a DT representation-based framework for SPR from videos. The framework employs vision foundation models with reliable low-level scene understanding to craft DT representation. We embed the DT representation in place of raw video inputs in the state-of-the-art SPR model. The framework is trained on the Cholec80 dataset and evaluated on out-of-distribution (OOD) and corrupted test samples. Contrary to the vulnerability of the baseline model, our framework demonstrates strong robustness on both OOD and corrupted samples, with a video-level accuracy of 80.3 on a highly corrupted Cholec80 test set, 67.9 on the challenging CRCD dataset, and 99.8 on an internal robotic surgery dataset, outperforming the baseline by 3.9, 16.8, and 90.9 respectively. We also find that using DT representation as an augmentation to the raw input can significantly improve model robustness.
Our findings lend support to the thesis that DT representations are effective in enhancing model robustness. Future work will seek to improve the feature informativeness and incorporate interpretability for a more comprehensive framework.

Submitted 1 March, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

arXiv:2410.02486 [pdf, other]

Encryption-Friendly LLM Architecture

Authors: Donghwan Rho, Taeseong Kim, Minje Park, Jung Woo Kim, Hyunsik Chae, Ernest K. Ryu, Jung Hee Cheon

Abstract: Large language models (LLMs) offer personalized responses based on user interactions, but this use case raises serious privacy concerns. Homomorphic encryption (HE) is a cryptographic protocol supporting arithmetic computations in encrypted states and provides a potential solution for privacy-preserving machine learning (PPML). However, the computational intensity of transformers poses challenges for applying HE to LLMs. In this work, we propose a modified HE-friendly transformer architecture with an emphasis on inference following personalized (private) fine-tuning. Utilizing LoRA fine-tuning and Gaussian kernels, we achieve significant computational speedups -- 6.94x for fine-tuning and 2.3x for inference -- while maintaining performance comparable to plaintext models. Our findings provide a viable proof of concept for offering privacy-preserving LLM services in areas where data protection is crucial. Our code is available on GitHub.

Submitted 20 February, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

Comments: 27 pages

arXiv:2410.00046 [pdf, other]

Mixture of Multicenter Experts in Multimodal Generative AI for Advanced Radiotherapy Target Delineation

Authors: Yujin Oh, Sangjoon Park, Xiang Li, Wang Yi, Jonathan Paly, Jason Efstathiou, Annie Chan, Jun Won Kim, Hwa Kyung Byun, Ik Jae Lee, Jaeho Cho, Chan Woo Wee, Peng Shu, Peilong Wang, Nathan Yu, Jason Holmes, Jong Chul Ye, Quanzheng Li, Wei Liu, Woong Sub Koom, Jin Sung Kim, Kyungsang Kim

Abstract: Clinical experts employ diverse philosophies and strategies in patient care, influenced by regional patient populations. However, existing medical artificial intelligence (AI) models are often trained on data distributions that disproportionately reflect highly prevalent patterns, reinforcing biases and overlooking the diverse expertise of clinicians. To overcome this limitation, we introduce the Mixture of Multicenter Experts (MoME) approach. This method strategically integrates specialized expertise from diverse clinical strategies, enhancing the AI model's ability to generalize and adapt across multiple medical centers. The MoME-based multimodal target volume delineation model, trained with few-shot samples including images and clinical notes from each medical center, outperformed baseline methods in prostate cancer radiotherapy target delineation. The advantages of MoME were most pronounced when data characteristics varied across centers or when data availability was limited, demonstrating its potential for broader clinical applications. Therefore, the MoME framework enables the deployment of AI-based target volume delineation models in resource-constrained medical facilities by adapting to specific preferences of each medical center only using a few sample data, without the need for data sharing between institutions.
Expanding the number of multicenter experts within the MoME framework will significantly enhance the generalizability, while also improving the usability and adaptability of clinical AI applications in the field of precision radiation oncology.

Submitted 26 October, 2024; v1 submitted 27 September, 2024; originally announced October 2024.

Comments: 39 pages

arXiv:2407.12998 [pdf, other]

Surgical Robot Transformer (SRT): Imitation Learning for Surgical Tasks

Authors: Ji Woong Kim, Tony Z. Zhao, Samuel Schmidgall, Anton Deguet, Marin Kobilarov, Chelsea Finn, Axel Krieger

Abstract: We explore whether surgical manipulation tasks can be learned on the da Vinci robot via imitation learning. However, the da Vinci system presents unique challenges which hinder straightforward implementation of imitation learning. Notably, its forward kinematics is inconsistent due to imprecise joint measurements, and naively training a policy using such approximate kinematics data often leads to task failure. To overcome this limitation, we introduce a relative action formulation which enables successful policy training and deployment using its approximate kinematics data. A promising outcome of this approach is that the large repository of clinical data, which contains approximate kinematics, may be directly utilized for robot learning without further corrections. We demonstrate our findings through successful execution of three fundamental surgical tasks, including tissue manipulation, needle handling, and knot-tying.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 8 pages
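
The SRT abstract above argues that relative actions tolerate approximate kinematics better than absolute ones. The sketch below illustrates only the general intuition, under a strong simplifying assumption that the measurement error is a constant position offset. It is not the paper's formulation; `to_relative`, the toy path, and the `bias` value are all hypothetical.

```python
def to_relative(measured_positions):
    """Convert a sequence of measured 3D positions into per-step deltas."""
    return [
        tuple(b - a for a, b in zip(prev, nxt))
        for prev, nxt in zip(measured_positions, measured_positions[1:])
    ]


# Hypothetical ground-truth tool-tip path (arbitrary units).
true_path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]

# Assumed constant kinematic offset corrupting every measurement.
bias = (0.5, -0.25, 0.125)
biased_path = [tuple(p + b for p, b in zip(point, bias)) for point in true_path]

# Absolute targets inherit the offset, while step-to-step deltas do not.
relative_true = to_relative(true_path)
relative_biased = to_relative(biased_path)
```

In this toy setting `relative_biased` equals `relative_true`, because a constant offset cancels when consecutive positions are subtracted. Real kinematic errors need not be constant, so this is only a first-order illustration.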

arXiv:2407.07317 [pdf, other]

Flow-acoustic resonance in deep and inclined cavities

Authors: You Wei Ho, Jae Wook Kim

Abstract: This paper presents numerical investigations of flow-acoustic resonances in deep and inclined cavities using wall-resolved large eddy simulations. The study focuses on cavity configurations with an aspect ratio of $D/L = 2.632$, subjected to two Mach numbers of $0.2$ and $0.3$ at three different inclination angles ($\alpha=30^{\circ}$, $60^{\circ}$, and $90^{\circ}$). Fully turbulent boundary layers generated from independent precursor simulations are employed upstream of the cavities. Initial results highlight distinct aeroacoustic responses between inclined and orthogonal cavities, particularly at $M_{\infty}=0.3$, where inclined cavities exhibit stronger resonances at a lower peak frequency ($St\approx 0.27$) compared to the orthogonal cavity. Further analysis reveals that this lower Strouhal number corresponds to a reduced vortex convection speed linked to large shear-layer oscillations. Additionally, the acoustic input-output analysis indicates that the inclined cavities amplify acoustic responses more effectively and exhibit weaker source-sink cancellations compared to the orthogonal cavity. These mechanisms are identified as the primary contributors to the enhanced aeroacoustic responses in the inclined cavities. Finally, this paper proposes that the ratio between acoustic particle displacement and momentum thickness may be used as a criterion to predict the onset of the distinctive resonance at $St\approx 0.27$.
It is suggested that the amplified resonances may be linked to a nonlinear mode shift of the first hydrodynamic mode through enhanced shear-layer oscillation taking place when the proposed criterion is met.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2405.07650 [pdf, other]

Arrow of Time in Estimation and Control: Duality Theory Beyond the Linear Gaussian Model

Authors: Jin Won Kim, Prashant G. Mehta

Abstract: Duality between estimation and control is a foundational concept in Control Theory. Most students learn about the elementary duality -- between observability and controllability -- in their first graduate course in linear systems theory. Therefore, it comes as a surprise that for a more general class of nonlinear stochastic systems (hidden Markov models or HMMs), duality is incomplete. Our objective in writing this article is two-fold: (i) To describe the difficulty in extending duality to HMMs; and (ii) To discuss its recent resolution by the authors. A key message is that the main difficulty in extending duality comes from time reversal in going from estimation to control. The reason for time reversal is explained with the aid of the familiar linear deterministic and linear Gaussian models. The explanation is used to motivate the difference between the linear and the nonlinear models. Once the difference is understood, duality for HMMs is described based on our recent work. The article also includes a comparison and discussion of the different types of duality considered in literature.

Submitted 8 October, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.02066 [pdf, other]

WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights

Authors: Youngdong Jang, Dong In Lee, MinHyuk Jang, Jong Wook Kim, Feng Yang, Sangpil Kim

Abstract: The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representations. In this work, we introduce an innovative watermarking method that can be employed in both representations of NeRF. This is achieved by fine-tuning NeRF to embed binary messages in the rendering process. In detail, we propose utilizing the discrete wavelet transform in the NeRF space for watermarking. Furthermore, we adopt a deferred back-propagation technique and introduce a combination with the patch-wise loss to improve rendering quality and bit accuracy with minimum trade-offs. We evaluate our method in three different aspects: capacity, invisibility, and robustness of the embedded watermarks in the 2D-rendered images. Our method achieves state-of-the-art performance with faster training speed over the compared state-of-the-art methods.

Submitted 11 July, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.01127 [pdf, other]

Backward Map for Filter Stability Analysis

Authors: Jin Won Kim, Anant A. Joshi, Prashant G. Mehta

Abstract: In this paper, a backward map is introduced for the purposes of analysis of the nonlinear (stochastic) filter stability. The backward map is important because the filter-stability in the sense of $\chi^2$-divergence follows from showing a certain variance decay property for the backward map. To show this property requires additional assumptions on the model properties of the hidden Markov model (HMM). The analysis in this paper is based on introducing a Poincaré Inequality (PI) for HMMs with white noise observations. In finite state-space settings, PI is related to both the ergodicity of the Markov process as well as the observability of the HMM. It is shown that the Poincaré constant is positive if and only if the HMM is detectable.

Submitted 8 October, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: Conference proceeding related to arXiv:2305.12850

arXiv:2404.15779 [pdf, ps, other]

Divergence metrics in the study of Markov and hidden Markov processes

Authors: Jin Won Kim, Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper is divided into two parts. The first part reviews the formulae for f-divergences in the study of continuous-time Markov processes and explores their applications in areas such as stochastic stability, the second law of thermodynamics, and its non-equilibrium extensions. This sets the foundation for the second part, which focuses on f-divergence in the study of hidden Markov processes. In this context, we present analyses of filter stability and stochastic thermodynamics, with the latter being used to illustrate the concept of a Maxwell demon in an over-damped Langevin model with white noise observations. The paper's expository style and unified formalism for both Markov and hidden Markov processes aim to serve as a valuable resource for researchers working across related fields.

Submitted 2 October, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2403.14111 [pdf, other]

HETAL: Efficient Privacy-preserving Transfer Learning with Homomorphic Encryption

Authors: Seewoo Lee, Garam Lee, Jung Woo Kim, Junbum Shin, Mun-Kyu Lee

Abstract: Transfer learning is a de facto standard method for efficiently training machine learning models for data-scarce problems by adding and fine-tuning new classification layers to a model pre-trained on large datasets. Although numerous previous studies proposed to use homomorphic encryption to resolve the data privacy issue in transfer learning in the machine learning as a service setting, most of them only focused on encrypted inference. In this study, we present HETAL, an efficient Homomorphic Encryption based Transfer Learning algorithm, that protects the client's privacy in training tasks by encrypting the client data using the CKKS homomorphic encryption scheme. HETAL is the first practical scheme that strictly provides encrypted training, adopting validation-based early stopping and achieving the accuracy of nonencrypted training. We propose an efficient encrypted matrix multiplication algorithm, which is 1.8 to 323 times faster than prior methods, and a highly precise softmax approximation algorithm with increased coverage. The experimental results for five well-known benchmark datasets show total training times of 567-3442 seconds, which is less than an hour.

Submitted 20 March, 2024; originally announced March 2024.

Comments: ICML 2023, Appendix D includes some updates after official publication

Journal ref: PMLR 202:19010-19035, 2023

arXiv:2403.08187 [pdf, other]

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam

Abstract: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children with SSDs is impractical. We fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced rather than as existing words. The model was fine-tuned with a speech dataset from 137 children with inadequate speech production pronouncing 73 Korean words selected for actual clinical diagnosis. The model's predictions of the pronunciations of the words matched the human annotations with about 90% accuracy. While the model still requires improvement in recognizing unclear pronunciation, this study demonstrates that ASR models can streamline complex pronunciation error diagnostic procedures in clinical fields.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 12 pages, 2 figures

ACM Class: I.2.7

arXiv:2403.05949 [pdf, other]

General surgery vision transformer: A video pre-trained foundation model for general surgery

Authors: Samuel Schmidgall, Ji Woong Kim, Jeffrey Jopling, Axel Krieger

Abstract: The absence of openly accessible data and specialized foundation models is a major barrier for computational research in surgery. Toward this, (i) we open-source the largest dataset of general surgery videos to-date, consisting of 680 hours of surgical videos, including data from robotic and laparoscopic techniques across 28 procedures; (ii) we propose a technique for video pre-training a general surgery vision transformer (GSViT) on surgical videos based on forward video prediction that can run in real-time for surgical applications, toward which we open-source the code and weights of GSViT; (iii) we also release code and weights for procedure-specific fine-tuned versions of GSViT across 10 procedures; (iv) we demonstrate the performance of GSViT on the Cholec80 phase annotation task, displaying improved performance over state-of-the-art single frame predictors.

Submitted 12 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.08113 [pdf, other]

Addressing cognitive bias in medical language models

Authors: Samuel Schmidgall, Carl Harris, Ime Essien, Daniel Olshvang, Tawsifur Rahman, Ji Woong Kim, Rojin Ziaei, Jason Eshraghian, Peter Abadir, Rama Chellappa

Abstract: There is increasing interest in the application of large language models (LLMs) to the medical field, in part because of their impressive performance on medical exam questions. While promising, exam questions do not reflect the complexity of real patient-doctor interactions. In reality, physicians' decisions are shaped by many complex factors, such as patient compliance, personal experience, ethical beliefs, and cognitive bias. Taking a step toward understanding this, our hypothesis posits that when LLMs are confronted with clinical questions containing cognitive biases, they will yield significantly less accurate responses compared to the same questions presented without such biases. In this study, we developed BiasMedQA, a benchmark for evaluating cognitive biases in LLMs applied to medical tasks. Using BiasMedQA we evaluated six LLMs, namely GPT-4, Mixtral-8x70B, GPT-3.5, PaLM-2, Llama 2 70B-chat, and the medically specialized PMC Llama 13B. We tested these models on 1,273 questions from the US Medical Licensing Exam (USMLE) Steps 1, 2, and 3, modified to replicate common clinically-relevant cognitive biases. Our analysis revealed varying effects for biases on these LLMs, with GPT-4 standing out for its resilience to bias, in contrast to Llama 2 70B-chat and PMC Llama 13B, which were disproportionately affected by cognitive bias. Our findings highlight the critical need for bias mitigation in the development of medical LLMs, pointing towards safer and more reliable applications in healthcare.

Submitted 20 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.18006 [pdf, other]

EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation

Authors: Jonathan W. Kim, Ahmed Alaa, Danilo Bernardo

Abstract: In conventional machine learning (ML) approaches applied to electroencephalography (EEG), there is often a limited focus, isolating specific brain activities occurring across disparate temporal scales (from transient spikes in milliseconds to seizures lasting minutes) and spatial scales (from localized high-frequency oscillations to global sleep activity). This siloed approach limits the development of EEG ML models that exhibit multi-scale electrophysiological understanding and classification capabilities. Moreover, typical ML EEG approaches utilize black-box approaches, limiting their interpretability and trustworthiness in clinical contexts. Thus, we propose EEG-GPT, a unifying approach to EEG classification that leverages advances in large language models (LLM). EEG-GPT achieves excellent performance comparable to current state-of-the-art deep learning methods in classifying normal from abnormal EEG in a few-shot learning paradigm utilizing only 2% of training data. Furthermore, it offers the distinct advantages of providing intermediate reasoning steps and coordinating specialist EEG tools across multiple scales in its operation, offering transparent and interpretable step-by-step verification, thereby promoting trustworthiness in clinical contexts.

Submitted 3 February, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.14430 [pdf, other]

A Westervelt equation for acoustic wave propagation through weakly stratified, arbitrary Mach number atmospheres

Authors: Liam J. Tope, Jae Wook Kim, Peter Spence

Abstract: Nonlinear distortion of infrasonic waves through atmospheres up to thermospheric altitudes governs large-range ground-level observations of explosive noise sources, causing large differences between the near and far field. Propagation modelling in this scenario to include realistic nonlinear effects has thus far been limited to high-fidelity, numerically intensive Direct Numerical Simulations of the Navier-Stokes equations, or nonlinear parabolic equations with restrictions on the mean flow Mach number. For the accurate modelling of nonlinear waveform synthesis through realistic atmospheric winds up to thermospheric altitudes, this work presents nonlinear wave equation analysis which results in a Westervelt equation for weakly stratified, arbitrary Mach number atmospheres. This is intended to be used as a benchmark for model development and numerical analyses such that alternative low-fidelity numerical calculations to Direct Numerical Simulations can be sought.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.13836 [pdf, other]

doi 10.1016/j.conengprac.2024.105841

Machine learning for industrial sensing and control: A survey and practical perspective

Authors: Nathan P. Lawrence, Seshu Kumar Damarla, Jong Woo Kim, Aditya Tulsyan, Faraz Amjad, Kai Wang, Benoit Chachuat, Jong Min Lee, Biao Huang, R. Bhushan Gopaluni

Abstract: With the rise of deep learning, there has been renewed interest within the process industries to utilize data on large-scale nonlinear sensing and control problems. We identify key statistical and machine learning techniques that have seen practical success in the process industries. To do so, we start with hybrid modeling to provide a methodological framework underlying core application areas: soft sensing, process optimization, and control. Soft sensing contains a wealth of industrial applications of statistical and machine learning methods. We quantitatively identify research trends, allowing insight into the most successful techniques in practice. We consider two distinct flavors for data-driven optimization and control: hybrid modeling in conjunction with mathematical programming techniques and reinforcement learning. Throughout these application areas, we discuss their respective industrial requirements and challenges. A common challenge is the interpretability and efficiency of purely data-driven methods. This suggests a need to carefully balance deep learning techniques with domain knowledge. As a result, we highlight ways prior knowledge may be integrated into industrial machine learning applications. The treatment of methods, problems, and applications presented here is poised to inform and inspire practitioners and researchers to develop impactful data-driven sensing, optimization, and control solutions in the process industries.

Submitted 24 January, 2024; originally announced January 2024.

Comments: 48 pages

Journal ref: Control Engineering Practice 2024

arXiv:2401.00678 [pdf, other]

General-purpose foundation models for increased autonomy in robot-assisted surgery

Authors: Samuel Schmidgall, Ji Woong Kim, Alan Kuntz, Ahmed Ezzat Ghazi, Axel Krieger

Abstract: The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown promise toward being trained on large collections of diverse and task-agnostic datasets of video demonstrations. These models have shown impressive levels of generalization to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for a few reasons: (1) there is a lack of existing large-scale open-source data to train models, (2) it is challenging to model the soft-body deformations that these robots work with during surgery because simulation cannot match the physical and visual complexity of biological tissue, and (3) surgical robots risk harming patients when tested in clinical trials and require more extensive safety measures. This perspective article aims to provide a path toward increasing robot autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision-language-action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models and provide three guiding actions toward increased autonomy in robot-assisted surgery.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.01631 [pdf, other]

Cooperative vs. Teleoperation Control of the Steady Hand Eye Robot with Adaptive Sclera Force Control: A Comparative Study

Authors: Mojtaba Esfandiari, Ji Woong Kim, Botao Zhao, Golchehr Amirkhani, Muhammad Hadi, Peter Gehlbach, Russell H. Taylor, Iulian Iordachita

Abstract: A surgeon's physiological hand tremor can significantly impact the outcome of delicate and precise retinal surgery, such as retinal vein cannulation (RVC) and epiretinal membrane peeling. Robot-assisted eye surgery technology provides ophthalmologists with advanced capabilities such as hand tremor cancellation, hand motion scaling, and safety constraints that enable them to perform these otherwise challenging and high-risk surgeries with high precision and safety. Steady-Hand Eye Robot (SHER) with cooperative control mode can filter out surgeon's hand tremor, yet another important safety feature, that is, minimizing the contact force between the surgical instrument and sclera surface for avoiding tissue damage cannot be met in this control mode. Also, other capabilities, such as hand motion scaling and haptic feedback, require a teleoperation control framework. In this work, for the first time, we implemented a teleoperation control mode incorporated with an adaptive sclera force control algorithm using a PHANTOM Omni haptic device and a force-sensing surgical instrument equipped with Fiber Bragg Grating (FBG) sensors attached to the SHER 2.1 end-effector. This adaptive sclera force control algorithm allows the robot to dynamically minimize the tool-sclera contact force. Moreover, for the first time, we compared the performance of the proposed adaptive teleoperation mode with the cooperative mode by conducting a vessel-following experiment inside an eye phantom under a microscope.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2309.02706 [pdf, other]

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models

Authors: Guijin Son, Hanwool Lee, Suwan Kim, Huiseo Kim, Jaecheol Lee, Je Won Yeom, Jihyu Jung, Jung Woo Kim, Songseong Kim

Abstract: Large language models (LLMs) trained on massive corpora demonstrate impressive capabilities in a wide range of tasks. While there are ongoing efforts to adapt these models to languages beyond English, the attention given to their evaluation methodologies remains limited. Current multilingual benchmarks often rely on back translations or re-implementations of English tests, limiting their capacity to capture unique cultural and linguistic nuances. To bridge this gap for the Korean language, we introduce the HAE-RAE Bench, a dataset curated to challenge models lacking Korean cultural and contextual depth. The dataset encompasses six downstream tasks across four domains: vocabulary, history, general knowledge, and reading comprehension. Unlike traditional evaluation suites focused on token and sequence classification or mathematical and logical reasoning, the HAE-RAE Bench emphasizes a model's aptitude for recalling Korean-specific knowledge and cultural contexts. Comparative analysis with prior Korean benchmarks indicates that the HAE-RAE Bench presents a greater challenge to non-Korean models by disturbing abilities and knowledge learned from English being transferred.

Submitted 20 March, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: Accepted at LREC-COLING 2024

arXiv:2308.07788 [pdf, ps, other]

GIST-AiTeR Speaker Diarization System for VoxCeleb Speaker Recognition Challenge (VoxSRC) 2023

Authors: Dongkeon Park, Ji Won Kim, Kang Ryeol Kim, Do Hyun Lee, Hong Kook Kim

Abstract: This report describes the submission system by the GIST-AiTeR team for the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23) Track 4. Our submission system focuses on implementing diverse speaker diarization (SD) techniques, including ResNet293 and MFA-Conformer with different combinations of segment and hop length. Then, those models are combined into an ensemble model. The ResNet293 and MFA-Conformer models exhibited the diarization error rates (DERs) of 3.65% and 3.83% on VAL46, respectively. The submitted ensemble model provided a DER of 3.50% on VAL46, and consequently, it achieved a DER of 4.88% on the VoxSRC-23 test set.

Submitted 25 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: VoxSRC 2023 Track4

arXiv:2306.17421 [pdf, other]

Micromanipulation in Surgery: Autonomous Needle Insertion Inside the Eye for Targeted Drug Delivery

Authors: Ji Woong Kim, Peiyao Zhang, Peter Gehlbach, Iulian Iordachita, Marin Kobilarov

Abstract: We consider a micromanipulation problem in eye surgery, specifically retinal vein cannulation (RVC). RVC involves inserting a microneedle into a retinal vein for the purpose of targeted drug delivery. The procedure requires accurately guiding a needle to a target vein and inserting it while avoiding damage to the surrounding tissues. RVC can be considered similar to the reach or push task studied in robotics manipulation, but with additional constraints related to precision and safety while interacting with soft tissues. Prior works have mainly focused on developing robotic hardware and sensors to enhance the surgeons' accuracy, leaving the automation of RVC largely unexplored. In this paper, we present the first autonomous strategy for RVC while relying on a minimal setup: a robotic arm, a needle, and monocular images. Our system exclusively relies on monocular vision to achieve precise navigation, gentle placement on the target vein, and safe insertion without causing tissue damage. Throughout the procedure, we employ machine learning for perception and to identify key surgical events such as needle-vein contact and vein punctures. Detecting these events guides our task and motion planning framework, which generates safe trajectories using model predictive control to complete the procedure. We validate our system through 24 successful autonomous trials on 4 cadaveric pig eyes. We show that our system can navigate to target veins within 22 micrometers of XY accuracy and under 35 seconds, and consistently puncture the target vein without causing tissue damage. Preliminary comparison to a human demonstrates the superior accuracy and reliability of our system.

Submitted 30 June, 2023; originally announced June 2023.

Comments: Experiment-oriented Locomotion and Manipulation Research, RSS 2023 workshop. arXiv admin note: text overlap with arXiv:2306.10133

arXiv:2306.14755 [pdf]

doi 10.1126/sciadv.adk3321

Emergent Tetragonality in a Fundamentally Orthorhombic Material

Authors: Anisha G. Singh, Maja D. Bachmann, Joshua J. Sanchez, Akshat Pandey, Aharon Kapitulnik, Jong Woo Kim, Philip J. Ryan, Steven A. Kivelson, Ian R. Fisher

Abstract: Symmetry plays a key role in determining the physical properties of materials. By Neumann's principle, the properties of a material are invariant under the symmetry operations of the space group to which the material belongs. Continuous phase transitions are associated with a spontaneous reduction in symmetry. (For example, the onset of ferromagnetism spontaneously breaks time reversal symmetry.) Much less common are examples where proximity to a continuous phase transition leads to an increase in symmetry. Here, we find an emergent tetragonal symmetry close to an apparent charge density wave (CDW) bicritical point in a fundamentally orthorhombic material, ErTe$_3$, for which the CDW phase transitions are tuned via anisotropic strain. The underlying structure of the material remains orthorhombic for all applied strains, including at the bicritical point, due to a glide plane symmetry in the crystal structure. Nevertheless, the observation of a divergence in the anisotropy of the in-plane elastoresistivity reveals an emergent electronic tetragonality near the bicritical point.

Submitted 29 May, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.10133 [pdf, other]

Deep Learning Guided Autonomous Surgery: Guiding Small Needles into Sub-Millimeter Scale Blood Vessels

Authors: Ji Woong Kim, Peiyao Zhang, Peter Gehlbach, Iulian Iordachita, Marin Kobilarov

Abstract: We propose a general strategy for autonomous guidance and insertion of a needle into a retinal blood vessel. The main challenges underpinning this task are the accurate placement of the needle-tip on the target vein and a careful needle insertion maneuver to avoid double-puncturing the vein, while dealing with challenging kinematic constraints and depth-estimation uncertainty. Following how surgeons perform this task purely based on visual feedback, we develop a system which relies solely on \emph{monocular} visual cues by combining data-driven kinematic and contact estimation, visual-servoing, and model-based optimal control. By relying on both known kinematic models, as well as deep-learning based perception modules, the system can localize the surgical needle tip and detect needle-tissue interactions and venipuncture events. The outputs from these perception modules are then combined with a motion planning framework that uses visual-servoing and optimal control to cannulate the target vein, while respecting kinematic constraints that consider the safety of the procedure. We demonstrate that we can reliably and consistently perform needle insertion in the domain of retinal surgery, specifically in performing retinal vein cannulation. Using cadaveric pig eyes, we demonstrate that our system can navigate to target veins within 22 $\mu$m XY accuracy and perform the entire procedure in less than 35 seconds on average, and all 24 trials performed on 4 pig eyes were successful. A preliminary comparison study against a human operator shows that our system is consistently more accurate and safer, especially during safety-critical needle-tissue interactions. To the best of the authors' knowledge, this work accomplishes a first demonstration of autonomous retinal vein cannulation at a clinically-relevant setting using animal tissues.

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.10127 [pdf, other]

Towards Deep Learning Guided Autonomous Eye Surgery Using Microscope and iOCT Images

Authors: Ji Woong Kim, Shuwen Wei, Peiyao Zhang, Peter Gehlbach, Jin U. Kang, Iulian Iordachita, Marin Kobilarov

Abstract: Recent advancements in retinal surgery have paved the way for a modern operating room equipped with a surgical robot, a microscope, and intraoperative optical coherence tomography (iOCT), a depth sensor widely used in retinal surgery. Integrating these tools raises the fundamental question of how to effectively combine them to enable surgical autonomy. In this work, we tackle this question by developing a unified framework that facilitates real-time autonomous surgical workflows leveraging these devices. The system features: (1) a novel imaging system that integrates the microscope and iOCT in real-time by dynamically tracking the surgical instrument via a small iOCT scanning region, providing real-time depth feedback; (2) implementation of convolutional neural networks (CNN) that automatically detect and segment task-relevant information for surgical autonomy; (3) intuitive selection of goal waypoints within both the microscope and iOCT views through simple mouse-click interactions; and (4) integration of model predictive control (MPC) for trajectory generation, ensuring patient safety by implementing safety-related kinematic constraints. The system's utility is demonstrated by automating subretinal injection (SI), a challenging procedure with high accuracy and depth perception requirements. We validate our system by conducting 30 successful SI trials on pig eyes, achieving mean needle insertion accuracy of 26 micrometers to various subretinal goals and mean duration of 55 seconds. Preliminary comparisons to a human operator performing SI in robot-assisted mode highlight the enhanced safety of our system. Project website is here: https://sites.google.com/view/eyesurgerymicroscopeoct/home

Submitted 27 July, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: pending submission to a journal

arXiv:2306.06461 [pdf]

Semi-supervsied Learning-based Sound Event Detection using Freuqency Dynamic Convolution with Large Kernel Attention for DCASE Challenge 2023 Task 4

Authors: Ji Won Kim, Sang Won Son, Yoonah Song, Hong Kook Kim, Il Hoon Song, Jeong Eun Lim

Abstract: This report proposes a frequency dynamic convolution (FDY) with a large kernel attention (LKA)-convolutional recurrent neural network (CRNN) with a pre-trained bidirectional encoder representation from audio transformers (BEATs) embedding-based sound event detection (SED) model that employs a mean-teacher and pseudo-label approach to address the challenge of limited labeled data for DCASE 2023 Task 4. The proposed FDY with LKA integrates the FDY and LKA module to effectively capture time-frequency patterns, long-term dependencies, and high-level semantic information in audio signals. The proposed FDY with LKA-CRNN with a BEATs embedding network is initially trained on the entire DCASE 2023 Task 4 dataset using the mean-teacher approach, generating pseudo-labels for weakly labeled, unlabeled, and the AudioSet. Subsequently, the proposed SED model is retrained using the same pseudo-label approach. A subset of these models is selected for submission, demonstrating superior F1-scores and polyphonic SED score performance on the DCASE 2023 Challenge Task 4 validation dataset.

Submitted 10 June, 2023; originally announced June 2023.

Comments: DCASE 2023 Challenge Task 4A, 5 pages

arXiv:2305.12850 [pdf, other]

doi 10.1109/TAC.2024.3413573

Variance Decay Property for Filter Stability

Authors: Jin Won Kim, Prashant G. Mehta

Abstract: This paper is concerned with the problem of nonlinear (stochastic) filter stability of a hidden Markov model (HMM) with white noise observations. A contribution is the variance decay property which is used to conclude filter stability. For this purpose, a new notion of the Poincaré inequality (PI) is introduced for the nonlinear filter. PI is related to both the ergodicity of the Markov process as well as the observability of the HMM. The proofs are based upon a recently discovered minimum variance duality which is used to transform the nonlinear filtering problem into a stochastic optimal control problem for a backward stochastic differential equation (BSDE).

Submitted 26 June, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 16 pages

Journal ref: IEEE Transactions on Automatic Control, 2024

arXiv:2304.12727 [pdf, ps, other]

On forward-backward SDE approaches to continuous-time minimum variance estimation

Authors: Jin Won Kim, Sebastian Reich

Abstract: The work of Kalman and Bucy has established a duality between filtering and optimal estimation in the context of time-continuous linear systems. This duality has recently been extended to time-continuous nonlinear systems in terms of an optimization problem constrained by a backward stochastic partial differential equation. Here we revisit this problem from the perspective of appropriate forward-backward stochastic differential equations. This approach sheds new light on the estimation problem and provides a unifying perspective. It is also demonstrated that certain formulations of the estimation problem lead to deterministic formulations similar to the linear Gaussian case as originally investigated by Kalman and Bucy. Finally, optimal control of partially observed diffusion processes is discussed as an application of the proposed estimators.

Submitted 14 August, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

MSC Class: 90E10; 90E11; 60G35; 62M20; 93E11; 93E20

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2301.11839 [pdf, other]

doi 10.1109/ICRA48891.2023.10161151

Autonomous Needle Navigation in Retinal Microsurgery: Evaluation in ex vivo Porcine Eyes

Authors: Peiyao Zhang, Ji Woong Kim, Peter Gehlbach, Iulian Iordachita, Marin Kobilarov

Abstract: Important challenges in retinal microsurgery include prolonged operating time, inadequate force feedback, and poor depth perception due to a constrained top-down view of the surgery. The introduction of robot-assisted technology could potentially deal with such challenges and improve the surgeon's performance. Motivated by such challenges, this work develops a strategy for autonomous needle navigation in retinal microsurgery aiming to achieve precise manipulation, reduced end-to-end surgery time, and enhanced safety. This is accomplished through real-time geometry estimation and chance-constrained Model Predictive Control (MPC) resulting in high positional accuracy while keeping scleral forces within a safe level. The robotic system is validated using both open-sky and intact (with lens and partial vitreous removal) ex vivo porcine eyes. The experimental results demonstrate that the generation of safe control trajectories is robust to small motions associated with head drift. The mean navigation time and scleral force for MPC navigation experiments are 7.208 s and 11.97 mN, which can be considered efficient and well within acceptable safe limits. The resulting mean errors along lateral directions of the retina are below 0.06 mm, which is below the typical hand tremor amplitude in retinal microsurgery.

Submitted 27 January, 2023; originally announced January 2023.

arXiv:2301.02064 [pdf, other]

Single-round Self-supervised Distributed Learning using Vision Transformer

Authors: Sangjoon Park, Ik-Jae Lee, Jun Won Kim, Jong Chul Ye

Abstract: Despite the recent success of deep learning in the field of medicine, the issue of data scarcity is exacerbated by concerns about privacy and data ownership. Distributed learning approaches, including federated learning, have been investigated to address these issues. However, they are hindered by the need for cumbersome communication overheads and weaknesses in privacy protection. To tackle these challenges, we propose a self-supervised masked sampling distillation method for the vision transformer. This method can be implemented without continuous communication and can enhance privacy by utilizing a vision transformer-specific encryption technique. We conducted extensive experiments on two different tasks, which demonstrated the effectiveness of our method. We achieved superior performance compared to the existing distributed learning strategy as well as the fine-tuning only baseline. Furthermore, since the self-supervised model created using our proposed method can achieve a general semantic understanding of the image, we demonstrate its potential as a task-agnostic self-supervised foundation model for various downstream tasks, thereby expanding its applicability in the medical domain.

Submitted 15 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

arXiv:2212.04356 [pdf, other]

Robust Speech Recognition via Large-Scale Weak Supervision

Authors: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever

Abstract: We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.

Submitted 6 December, 2022; originally announced December 2022.

arXiv:2210.05142 [pdf, ps, other]

doi 10.1016/j.automatica.2023.111371

A Design Method of Distributed Algorithms via Discrete-time Blended Dynamics Theorem

Authors: Jeong Woo Kim, Jin Gyu Lee, Donggil Lee, Hyungbo Shim

Abstract: We develop a discrete-time version of the blended dynamics theorem for the use of designing distributed computation algorithms. The blended dynamics theorem enables to predict the behavior of heterogeneous multi-agent systems. Therefore, once we get a blended dynamics for a particular computational task, design idea of node dynamics for individual heterogeneous agents can easily occur. In the continuous-time case, prediction by blended dynamics was enabled by high coupling gain among neighboring agents. In the discrete-time case, we propose an equivalent action, which we call multi-step coupling in this paper. Compared to the continuous-time case, the blended dynamics can have more variety depending on the coupling matrix. This benefit is demonstrated with three applications; distributed estimation of network size, distributed computation of the PageRank, and distributed computation of the degree sequence of a graph, which correspond to the coupling by doubly-stochastic, column-stochastic, and row-stochastic matrices, respectively.

Submitted 11 October, 2022; originally announced October 2022.

Journal ref: Automatica, vol. 159, pp. 111371, Jan 2024

arXiv:2209.11123 [pdf, other]

doi 10.1016/j.ifacol.2020.12.126

Modern Machine Learning Tools for Monitoring and Control of Industrial Processes: A Survey

Authors: R. Bhushan Gopaluni, Aditya Tulsyan, Benoit Chachuat, Biao Huang, Jong Min Lee, Faraz Amjad, Seshu Kumar Damarla, Jong Woo Kim, Nathan P. Lawrence

Abstract: Over the last ten years, we have seen a significant increase in industrial data, tremendous improvement in computational power, and major theoretical advances in machine learning. This opens up an opportunity to use modern machine learning tools on large-scale nonlinear monitoring and control problems. This article provides a survey of recent results with applications in the process industry.

Submitted 22 September, 2022; originally announced September 2022.

Comments: IFAC World Congress 2020

arXiv:2209.10357 [pdf, other]

GIST-AiTeR System for the Diarization Task of the 2022 VoxCeleb Speaker Recognition Challenge

Authors: Dongkeon Park, Yechan Yu, Kyeong Wan Park, Ji Won Kim, Hong Kook Kim

Abstract: This report describes the submission system of the GIST-AiTeR team at the 2022 VoxCeleb Speaker Recognition Challenge (VoxSRC) Track 4. Our system mainly includes speech enhancement, voice activity detection, multi-scaled speaker embedding, probabilistic linear discriminant analysis-based speaker clustering, and overlapped speech detection models. We first construct four different diarization systems according to different model combinations with the best experimental efforts. Our final submission is an ensemble system of all the four systems and achieves a diarization error rate of 5.12% on the challenge evaluation set, ranked third at the diarization track of the challenge.

Submitted 6 October, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: 2022 VoxSRC Track4

arXiv:2209.01083 [pdf, other]

When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development

Authors: Nghia Duong-Trung, Stefan Born, Jong Woo Kim, Marie-Therese Schermeyer, Katharina Paulick, Maxim Borisyak, Mariano Nicolas Cruz-Bournazou, Thorben Werner, Randolf Scholz, Lars Schmidt-Thieme, Peter Neubauer, Ernesto Martinez

Abstract: Machine learning (ML) is becoming increasingly crucial in many fields of engineering but has not yet played out its full potential in bioprocess engineering. While experimentation has been accelerated by increasing levels of lab automation, experimental planning and data modeling still largely depend on human intervention. ML can be seen as a set of tools that contribute to the automation of the whole experimental cycle, including model building and practical planning, thus allowing human experts to focus on the more demanding and overarching cognitive tasks. First, probabilistic programming is used for the autonomous building of predictive models. Second, machine learning automatically assesses alternative decisions by planning experiments to test hypotheses and conducting investigations to gather informative data that focus on model selection based on the uncertainty of model predictions. This review provides a comprehensive overview of ML-based automation in bioprocess development. On the one hand, the biotech and bioengineering community should be aware of the potential and, most importantly, the limitation of existing ML solutions for their application in biotechnology and biopharma. On the other hand, it is essential to identify the missing links to enable the easy implementation of ML and Artificial Intelligence (AI) tools in valuable solutions for the bio-community.

Submitted 1 November, 2022; v1 submitted 2 September, 2022; originally announced September 2022.

arXiv:2208.09183 [pdf]

Improved Image Classification with Token Fusion

Authors: Keong Hun Choi, Jin Woo Kim, Yao Wang, Jong Eun Ha

Abstract: In this paper, we propose a method using the fusion of CNN and transformer structure to improve image classification performance. In the case of CNN, information about a local area on an image can be extracted well, but there is a limit to the extraction of global information. On the other hand, the transformer has an advantage in relatively global extraction, but has a disadvantage in that it requires a lot of memory for local feature value extraction. In the case of an image, it is converted into a feature map through CNN, and each feature map's pixel is considered a token. At the same time, the image is divided into patch areas and then fused with the transformer method that views them as tokens. For the fusion of tokens with two different characteristics, we propose three methods: (1) late token fusion with parallel structure, (2) early token fusion, (3) token fusion in a layer by layer. In an experiment using ImageNet 1k, the proposed method shows the best classification performance.

Submitted 19 August, 2022; originally announced August 2022.

arXiv:2208.06587 [pdf, other]

Duality for Nonlinear Filtering II: Optimal Control

Authors: Jin Won Kim, Prashant G. Mehta

Abstract: This paper is concerned with the development and use of duality theory for a nonlinear filtering model with white noise observations. The main contribution of this paper is to introduce a stochastic optimal control problem as a dual to the nonlinear filtering problem. The mathematical statement of the dual relationship between the two problems is given in the form of a duality principle. The constraint for the optimal control problem is the backward stochastic differential equation (BSDE) introduced in the companion paper. The optimal control solution is obtained from an application of the maximum principle, and subsequently used to derive the equation of the nonlinear filter. The proposed duality is shown to be an exact extension of the classical Kalman-Bucy duality, and different from other types of optimal control and variational formulations given in literature.

Submitted 13 August, 2022; originally announced August 2022.

arXiv:2208.06586 [pdf, other]

Duality for Nonlinear Filtering I: Observability

Authors: Jin Won Kim, Prashant G. Mehta

Abstract: This paper is concerned with the development and use of duality theory for a hidden Markov model (HMM) with white noise observations. The main contribution of this work is to introduce a backward stochastic differential equation (BSDE) as a dual control system. A key outcome is that stochastic observability (resp. detectability) of the HMM is expressed in dual terms: as controllability (resp. stabilizability) of the dual control system. All aspects of controllability, namely, definition of controllable space and controllability gramian, along with their properties and explicit formulae, are discussed. The proposed duality is shown to be an exact extension of the classical duality in linear systems theory. One can then relate and compare the linear and the nonlinear systems. A side-by-side summary of this relationship is given in a tabular form (Table~II).

Submitted 13 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2207.07709

arXiv:2207.07709 [pdf, other]

Duality for nonlinear filtering

Authors: Jin Won Kim

Abstract: This thesis is concerned with the stochastic filtering problem for a hidden Markov model (HMM) with the white noise observation model. For this filtering problem, we make three types of original contributions: (1) dual controllability characterization of stochastic observability, (2) dual minimum variance optimal control formulation of the stochastic filtering problem, and (3) filter stability analysis using the dual optimal control formulation. For the first contribution of this thesis, a backward stochastic differential equation (BSDE) is proposed as the dual control system. The observability (detectability) of the HMM is shown to be equivalent to the controllability (stabilizability) of the dual control system. For the linear-Gaussian model, the dual relationship reduces to classical duality in linear systems theory. The second contribution is to transform the minimum variance estimation problem into an optimal control problem. The constraint is given by the dual control system. The optimal solution is obtained via two approaches: (1) by an application of maximum principle and (2) by the martingale characterization of the optimal value. The optimal solution is used to derive the nonlinear filter. The third contribution is to carry out filter stability analysis by studying the dual optimal control problem. Two approaches are presented through Chapters 7 and 8. In Chapter 7, conditional Poincaré inequality (PI) is introduced. Based on conditional PI, various convergence rates are obtained and related to literature. In Chapter 8, the stabilizability of the dual control system is shown to be a necessary and sufficient condition for filter stability on certain finite state space model.

Submitted 15 July, 2022; originally announced July 2022.

Comments: Ph.D. Thesis of the author

arXiv:2206.02222 [pdf, other]

How does a Rational Agent Act in an Epidemic?

Authors: S. Yagiz Olmez, Shubham Aggarwal, Jin Won Kim, Erik Miehling, Tamer Başar, Matthew West, Prashant G. Mehta

Abstract: Evolution of disease in a large population is a function of the top-down policy measures from a centralized planner, as well as the self-interested decisions (to be socially active) of individual agents in a large heterogeneous population. This paper is concerned with understanding the latter based on a mean-field type optimal control model. Specifically, the model is used to investigate the role of partial information on an agent's decision-making, and study the impact of such decisions by a large number of agents on the spread of the virus in the population. The motivation comes from the presymptomatic and asymptomatic spread of the COVID-19 virus where an agent unwittingly spreads the virus. We show that even in a setting with fully rational agents, limited information on the viral state can result in an epidemic growth.

Submitted 5 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2111.10422

arXiv:2205.06468 [pdf, other]

Monocular Human Digitization via Implicit Re-projection Networks

Authors: Min-Gyu Park, Ju-Mi Kang, Je Woo Kim, Ju Hong Yoon

Abstract: We present an approach to generating 3D human models from images. The key to our framework is that we predict double-sided orthographic depth maps and color images from a single perspective projected image. Our framework consists of three networks. The first network predicts normal maps to recover geometric details such as wrinkles in the clothes and facial regions. The second network predicts shade-removed images for the front and back views by utilizing the predicted normal maps. The last multi-headed network takes both normal maps and shade-free images and predicts depth maps while selectively fusing photometric and geometric information through multi-headed attention gates. Experimental results demonstrate that our method shows visually plausible results and competitive performance in terms of various evaluation metrics over state-of-the-art methods.

Submitted 15 May, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

Comments: Presented at CVRRW (AI for Content Creation workshop) 2022

arXiv:2204.09176 [pdf]

doi 10.1103/PhysRevLett.129.027203

Controllable emergent spatial spin modulation in Sr2IrO4 by in situ shear strain

Authors: S. Pandey, H. Zhang, J. Yang, A. F. May, J. Sanchez, Z. Liu, J. -H. Chu, J. W. Kim, P. J. Ryan, H. D. Zhou, J. Liu

Abstract: Symmetric anisotropic interaction can be ferromagnetic and antiferromagnetic at the same time but for different crystallographic axes. We show that inducing competition of anisotropic interactions of orthogonal irreducible representations represents a general route to obtain new exotic magnetic states. We demonstrate it here by observing the emergence of a continuously tunable 12-layer spatial spin modulation when distorting the square lattice planes in the quasi-2D antiferromagnetic Sr2IrO4 under in situ shear strain. This translation-symmetry-breaking phase is a result of an unusual strain activated anisotropic interaction which is at the 4th order and competing with the inherent quadratic anisotropic interaction. Such a mechanism of competing anisotropy is distinct from that among the ferromagnetic, antiferromagnetic, and/or the Dzyaloshinskii-Moriya interactions, and it could be widely applicable and highly controllable in low dimensional magnets.

Submitted 19 April, 2022; originally announced April 2022.

arXiv:2203.07211 [pdf, other]

Model predictive control and moving horizon estimation for adaptive optimal bolus feeding in high-throughput cultivation of \textit{E. coli}

Authors: Jong Woo Kim, Niels Krausch, Judit Aizpuru, Tilman Barz, Sergio Lucia, Peter Neubauer, Mariano Nicolas Cruz Bournazou

Abstract: We discuss the application of a nonlinear model predictive control (MPC) and a moving horizon estimation (MHE) to achieve an optimal operation of \textit{E. coli} fed-batch cultivations with intermittent bolus feeding. 24 parallel experiments were considered in a high-throughput microbioreactor platform at a 10 mL scale. The robotic island in question can run up to 48 fed-batch processes in parallel with automated liquid handling and online and at-line analytics. The implementation of the model-based monitoring and control framework reveals that there are mainly three challenges that need to be addressed; First, the inputs are given in an instantaneous pulsed form by bolus injections, second, online and at-line measurement frequencies are severely imbalanced, and third, optimization for the distinctive multiple reactors can be either parallelized or integrated. We address these challenges by incorporating the concept of impulsive control systems, formulating multi-rate MHE with identifiability analysis, and suggesting criteria for deciding the reactor configuration. In this study, we present the key elements and background theory of the implementation with \textit{in silico} simulations for bacterial fed-batch cultivation.

Submitted 6 February, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

Showing 1–50 of 136 results for author: Kim, J W