
Approaching the quantum-limited precision in frequency-comb-based spectral interferometry for length measurements

Authors: Yoon-Soo Jang, Heulbi Ahn, Sunghoon Eom, Jungjae Park, Jonghan Jin

Abstract: Over the last two decades, frequency combs have brought breakthroughs in length metrology with traceability to length standards. In particular, frequency-comb-based spectral interferometry is regarded as a promising technology for next-generation length standards. However, to achieve this, the nanometer-level precision inherent in laser interferometers is required. Here, we report distance measurements by frequency-comb-based spectral interferometry with sub-nm precision close to the standard quantum limit. The measurement precision was confirmed as 0.67 nm at an averaging time of 25 µs. The measurement sensitivity was found to be 4.5 × 10^-12 m/Hz^(1/2), close to the quantum limit. As a practical example of observing precise physical phenomena, we demonstrated measurements of acoustic-wave-induced vibration and laser eavesdropping. Our study will be an important step toward the practical realization of upcoming length standards.

Submitted 17 January, 2025; originally announced January 2025.

arXiv:2408.08591 [pdf, other]

Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation

Authors: Tri Ton, Ji Woo Hong, SooHwan Eom, Jun Yeop Shim, Junyeong Kim, Chang D. Yoo

Abstract: Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods by enabling the identification of both previously seen and unseen objects in real-world scenarios. It leverages a dual-modality approach, utilizing both 3D point clouds and 2D multi-view images to generate class-agnostic object mask proposals. Previous efforts predominantly focused on enhancing 3D mask proposal models; consequently, the information that could come from 2D association to 3D was not fully exploited. This bias towards 3D data, while effective for familiar indoor objects, limits the system's adaptability to new and varied object types, where 2D models offer greater utility. Addressing this gap, we introduce the Zero-Shot Dual-Path Integration Framework, which equally values the contributions of both 3D and 2D modalities. Our framework comprises three components: a 3D pathway, a 2D pathway, and Dual-Path Integration. The 3D pathway generates spatially accurate class-agnostic mask proposals of common indoor objects from 3D point cloud data using a pre-trained 3D model, while the 2D pathway utilizes a pre-trained open-vocabulary instance segmentation model to identify a diverse array of object proposals from multi-view RGB-D images. In Dual-Path Integration, our Conditional Integration process, which operates in two stages, filters and merges the proposals from both pathways adaptively. This process harmonizes output proposals to enhance segmentation capabilities. Our framework, utilizing pre-trained models in a zero-shot manner, is model-agnostic and demonstrates superior performance on both seen and unseen data, as evidenced by comprehensive evaluations on ScanNet200 and qualitative results on the ARKitScenes dataset.

Submitted 16 August, 2024; originally announced August 2024.

Comments: OpenSUN 3D: 2nd Workshop on Open-Vocabulary 3D Scene Understanding (CVPR 2024)

arXiv:2407.16574 [pdf, other]

TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback

Authors: Eunseop Yoon, Hee Suk Yoon, SooHwan Eom, Gunsoo Han, Daniel Wontae Nam, Daejin Jo, Kyoung-Woon On, Mark A. Hasegawa-Johnson, Sungwoong Kim, Chang D. Yoo

Abstract: Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human essence. These human preference data, however, are labeled at the sequence level, creating a mismatch between sequence-level preference labels and tokens, which are autoregressively generated from the language model. Although several recent approaches have tried to provide token-level (i.e., dense) rewards for each individual token, these typically rely on predefined discrete reward values (e.g., positive: +1, negative: -1, neutral: 0), failing to account for varying degrees of preference inherent to each token. To address this limitation, we introduce TLCR (Token-Level Continuous Reward) for RLHF, which incorporates a discriminator trained to distinguish positive and negative tokens, and the confidence of the discriminator is used to assign continuous rewards to each token considering the context. Extensive experiments show that our proposed TLCR leads to consistent performance improvements over previous sequence-level or token-level discrete rewards on open-ended generation benchmarks.

Submitted 8 December, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

Comments: ACL2024 Findings

arXiv:2403.11578 [pdf, other]

AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition

Authors: SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo

Abstract: In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regularization term to mitigate this issue, they employed a constant weighting term on the regularization during training, which we find may not be optimal. In this work, we introduce Adaptive Maximum Entropy Regularization (AdaMER), a technique that can modulate the impact of entropy regularization throughout the training process. This approach not only refines ASR model training but also ensures that as training proceeds, predictions display the desired model confidence.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2308.15273 [pdf, other]

Cross-Modal Retrieval Meets Inference: Improving Zero-Shot Classification with Cross-Modal Retrieval

Authors: Seongha Eom, Namgyu Ho, Jaehoon Oh, Se-Young Yun

Abstract: Contrastive language-image pre-training (CLIP) has demonstrated remarkable zero-shot classification ability, namely image classification using novel text labels. Existing works have attempted to enhance CLIP by fine-tuning on downstream tasks, but these have inadvertently led to performance degradation on unseen classes, thus harming zero-shot generalization. This paper aims to address this challenge by leveraging readily available image-text pairs from an external dataset for cross-modal guidance during inference. To this end, we propose X-MoRe, a novel inference method comprising two key steps: (1) cross-modal retrieval and (2) modal-confidence-based ensemble. Given a query image, we harness the power of CLIP's cross-modal representations to retrieve relevant textual information from an external image-text pair dataset. Then, we assign higher weights to the more reliable modality between the original query image and retrieved text, contributing to the final prediction. X-MoRe demonstrates robust performance across a diverse set of tasks without the need for additional training, showcasing the effectiveness of utilizing cross-modal features to maximize CLIP's zero-shot ability.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.08442 [pdf, other]

Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

Authors: Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo

Abstract: Text-to-Text Transfer Transformer (T5) has recently been considered for Grapheme-to-Phoneme (G2P) transduction. As a follow-up, a tokenizer-free byte-level model based on T5, referred to as ByT5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or paragraph-level G2P can improve usability in real-world applications, as it is better suited to handling heteronyms and linking sounds between words, we find that using ByT5 for these scenarios is nontrivial. Since ByT5 operates on the character level, it requires longer decoding steps, which degrades performance due to the exposure bias commonly observed in auto-regressive generation models. This paper shows that the performance of sentence-level and paragraph-level G2P can be improved by mitigating such exposure bias using our proposed loss-based sampling method.

Submitted 16 August, 2023; originally announced August 2023.

Comments: INTERSPEECH 2023

arXiv:2307.15302 [pdf]

doi 10.1016/j.optlastec.2023.110324

Programmable spectral shaping to improve the measurement precision of frequency comb mode-resolved spectral interferometric ranging

Authors: Yoon-Soo Jang, Sunghoon Eom, Jungjae Park, Jonghan Jin

Abstract: Comb-mode resolved spectral domain interferometry (CORE-SDI), which is capable of measuring lengths of kilometers or more with precision on the order of nanometers, is considered to be a promising technology for next-generation length standards, replacing laser displacement interferometers. In this study, we aim to improve the measurement precision of CORE-SDI using programmable spectral shaping. We report the generation of effectively broad and symmetric light sources through programmable spectral shaping. The light source used here was generated by a spectrally broadened electro-optic comb with a repetition rate of 17.5 GHz. Through programmable spectral shaping, the optical spectrum was flattened to within 1 dB, resulting in a square-shaped optical spectrum. As a result, the 3-dB spectral width was extended from 1.15 THz to 6.7 THz. We compared the measurement results for various spectrum shapes. We confirmed an improvement in the measurement precision from 69 nm to 6 nm, which was also corroborated by numerical simulations. We believe that this study on enhancing the measurement precision of CORE-SDI through the proposed spectral shaping will make a significant contribution to reducing the measurement uncertainty of future CORE-SDI systems, thereby advancing the development of next-generation length standards.

Submitted 28 July, 2023; originally announced July 2023.

Comments: 22 pages, 10 figures

Journal ref: Optics & Laser Technology 170, 110324, 2024

arXiv:2305.03864 [pdf]

Revisiting contrast mechanism of lateral piezoresponse force microscopy

Authors: Jaegyu Kim, Seongwoo Cho, Jiwon Yeom, Seongmun Eom, Seungbum Hong

Abstract: Piezoresponse force microscopy (PFM) has been widely used for nanoscale analysis of piezoelectric properties and ferroelectric domains. Although PFM is useful because of its simple and nondestructive features, PFM measurements can be obscured by non-piezoelectric effects that could affect the PFM signals or lead to ferroelectric-like behaviors in non-ferroelectric materials. Many studies have addressed related technical issues, but they have primarily focused on vertical PFM. Here, we investigate significant discrepancies of lateral PFM signals between the trace and the retrace scans, which are proportional to the scan angle and the cantilever lateral tilting discrepancy. The discrepancies of PFM signals are analyzed based on intrinsic and extrinsic components, including out-of-plane piezoresponse, electrostatic force, and other factors. Our research will contribute to accurate PFM measurements for visualization of ferroelectric in-plane polarization distributions.

Submitted 5 May, 2023; originally announced May 2023.

Comments: 38 pages, 18 figures

arXiv:2303.12968 [pdf, other]

Ambient Intelligence for Next-Generation AR

Authors: Tim Scargill, Sangjun Eom, Ying Chen, Maria Gorlatova

Abstract: Next-generation augmented reality (AR) promises a high degree of context-awareness - a detailed knowledge of the environmental, user, social and system conditions in which an AR experience takes place. This will facilitate both the closer integration of the real and virtual worlds, and the provision of context-specific content or adaptations. However, environmental awareness in particular is challenging to achieve using AR devices alone; not only are these mobile devices' view of an environment spatially and temporally limited, but the data obtained by onboard sensors is frequently inaccurate and incomplete. This, combined with the fact that many aspects of core AR functionality and user experiences are impacted by properties of the real environment, motivates the use of ambient IoT devices, wireless sensors and actuators placed in the surrounding environment, for the measurement and optimization of environment properties. In this book chapter we categorize and examine the wide variety of ways in which these IoT sensors and actuators can support or enhance AR experiences, including quantitative insights and proof-of-concept systems that will inform the development of future solutions. We outline the challenges and opportunities associated with several important research directions which must be addressed to realize the full potential of next-generation AR.

Submitted 24 March, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: This is a preprint of a book chapter which will appear in the Springer Handbook of the Metaverse

arXiv:2212.02059 [pdf, other]

Region-Conditioned Orthogonal 3D U-Net for Weather4Cast Competition

Authors: Taehyeon Kim, Shinhwan Kang, Hyeonjeong Shin, Deukryeol Yoon, Seongha Eom, Kijung Shin, Se-Young Yun

Abstract: The Weather4Cast competition (hosted by NeurIPS 2022) required competitors to predict super-resolution rain movies in various regions of Europe when low-resolution satellite contexts covering wider regions are given. In this paper, we show that a general baseline 3D U-Net can be significantly improved with region-conditioned layers as well as orthogonality regularizations on 1x1x1 convolutional layers. Additionally, we facilitate generalization with a bag of training strategies: mixup data augmentation, self-distillation, and feature-wise linear modulation (FiLM). The presented modifications outperform the baseline algorithm (3D U-Net) by up to 19.54% with less than 1% additional parameters, which won 4th place on the core test leaderboard.

Submitted 5 December, 2022; originally announced December 2022.

Comments: workshop at NeurIPS 2022 Competition Track on Weather4Cast

arXiv:2211.01692 [pdf, other]

Data-efficient End-to-end Information Extraction for Statistical Legal Analysis

Authors: Wonseok Hwang, Saehee Eom, Hanuhl Lee, Hai Jin Park, Minjoon Seo

Abstract: Legal practitioners often face a vast number of documents. Lawyers, for instance, search for appropriate precedents favorable to their clients, while the number of legal precedents is ever-growing. Although legal search engines can assist in finding individual target documents and narrowing down the number of candidates, retrieved information is often presented as unstructured text, and users have to examine each document thoroughly, which could lead to information overload. This also makes statistical analysis of the documents challenging. Here, we present an end-to-end information extraction (IE) system for legal documents. By formulating IE as a generation task, our system can be easily applied to various tasks without domain-specific engineering effort. The experimental results of four IE tasks on Korean precedents show that our IE system can achieve competent scores (-2.3 on average) compared to the rule-based baseline with as few as 50 training examples per task and higher scores (+5.4 on average) with 200 examples. Finally, our statistical analysis on two case categories--drunk driving and fraud--with 35k precedents reveals that the resulting structured information from our IE system faithfully reflects the macroscopic features of the Korean legal system.

Submitted 3 November, 2022; originally announced November 2022.

Comments: NLLP workshop @ EMNLP 2022

arXiv:1801.02782 [pdf, ps, other]

UAV-Aided Wireless Communication Designs With Propulsion Energy Limitations

Authors: Subin Eom, Hoon Lee, Junhee Park, Inkyu Lee

Abstract: This paper studies unmanned aerial vehicle (UAV) aided wireless communication systems where a UAV supports uplink communications of multiple ground nodes (GNs) while flying over the area of interest. In this system, the propulsion energy consumption at the UAV is taken into account so that the UAV's velocity and acceleration should not exceed a certain threshold. We formulate the minimum average rate maximization problem and the energy efficiency (EE) maximization problem by jointly optimizing the trajectory, velocity, and acceleration of the UAV and the uplink transmit power at the GNs. As these problems are non-convex in general, we employ successive convex approximation (SCA) techniques. To this end, proper convex approximations for the non-convex constraints are derived, and iterative algorithms are proposed which converge to a local optimal point. Numerical results demonstrate that the proposed algorithms outperform baseline schemes for both problems. Especially for the EE maximization problem, the proposed algorithm exhibits about 109% gain over the baseline scheme.

Submitted 8 January, 2018; originally announced January 2018.

Comments: 24 pages, 7 figures

arXiv:1801.02781 [pdf, ps, other]

Minimum Throughput Maximization in UAV-Aided Wireless Powered Communication Networks

Authors: Junhee Park, Hoon Lee, Subin Eom, Inkyu Lee

Abstract: This paper investigates unmanned aerial vehicle (UAV)-aided wireless powered communication network (WPCN) systems where a mobile access point (AP) at the UAV serves multiple energy-constrained ground terminals (GTs). Specifically, the UAVs first charge the GTs by transmitting the wireless energy transfer (WET) signals in the downlink. Then, by utilizing the harvested wireless energy from the UAVs, the GTs send their uplink wireless information transmission (WIT) signals to the UAVs. In this paper, depending on the operations of the UAVs, we adopt two different scenarios, namely integrated UAV and separated UAV WPCNs. First, in the integrated UAV WPCN, a UAV acts as a hybrid AP in which both energy transfer and information reception are processed at a single UAV. In contrast, for the separated UAV WPCN, we consider two UAVs each of which behaves as an energy AP and an information AP independently, and thus the energy transfer and the information decoding are separately performed at two different UAVs. For both systems, we jointly optimize the trajectories of the UAVs, the uplink power control, and the time resource allocation for the WET and the WIT to maximize the minimum throughput of the GTs. Since the formulated problems are non-convex, we apply the concave-convex procedure by deriving appropriate convex bounds for non-convex constraints. As a result, we propose iterative algorithms which efficiently identify a local optimal solution for the minimum throughput maximization problems. Simulation results verify the efficiency of the proposed algorithms compared to conventional schemes.

Submitted 8 January, 2018; originally announced January 2018.

Comments: 22 pages, 7 figures

Showing 1–13 of 13 results for author: Eom, S