Search | arXiv e-print repository

Speech Tokenizer is Key to Consistent Representation

Authors: Wonjin Jung, Sungil Kang, Dong-Yeon Cho

Abstract: Speech tokenization is crucial in digital speech processing, converting continuous speech signals into discrete units for various computational tasks. This paper introduces a novel speech tokenizer with broad applicability across downstream tasks. While recent advances in residual vector quantization (RVQ) have incorporated semantic elements, they often neglect critical acoustic features. We propose an advanced approach that simultaneously encodes both linguistic and acoustic information, preserving prosodic and emotional content. Our method significantly enhances speech representation fidelity across diverse applications. Empirical evaluations demonstrate its effectiveness in speech coding, voice conversion, emotion recognition, and multimodal language modeling, without requiring additional training. This versatility underscores its potential as a key tool for advancing AI-driven speech processing.

Submitted 9 July, 2025; originally announced July 2025.

arXiv:2507.06261 [pdf, ps, other]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3278 additional authors not shown)

Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

Submitted 7 July, 2025; originally announced July 2025.

Comments: 72 pages, 17 figures

arXiv:2506.23384 [pdf, ps, other]

Programmable Co-Transcriptional Splicing: Realizing Regular Languages via Hairpin Deletion

Authors: Da-Jung Cho, Szilárd Zsolt Fazekas, Shinnosuke Seki, Max Wiedenhöft

Abstract: RNA co-transcriptionality, where RNA is spliced or folded during transcription from DNA templates, offers promising potential for molecular programming. It enables programmable folding of nano-scale RNA structures and has recently been shown to be Turing universal. While post-transcriptional splicing is well studied, co-transcriptional splicing is gaining attention for its efficiency, though its unpredictability still remains a challenge. In this paper, we focus on engineering co-transcriptional splicing, not only as a natural phenomenon but as a programmable mechanism for generating specific RNA target sequences from DNA templates. The problem we address is whether we can encode a set of RNA sequences for a given system onto a DNA template word, ensuring that all the sequences are generated through co-transcriptional splicing. Given that finding the optimal encoding has been shown to be NP-complete under the various energy models considered, we propose a practical alternative approach under the logarithmic energy model. More specifically, we provide a construction that encodes an arbitrary nondeterministic finite automaton (NFA) into a circular DNA template from which co-transcriptional splicing produces all sequences accepted by the NFA. As all finite languages can be efficiently encoded as NFA, this framework solves the problem of finding small DNA templates for arbitrary target sets of RNA sequences.
The quest to obtain the smallest possible such templates naturally leads us to consider the problem of minimizing NFA and certain practically motivated variants of it, but as we show, those minimization problems are computationally intractable.

Submitted 29 June, 2025; originally announced June 2025.

Comments: 28 pages, 8 Figures, Accepted at the 31st International Conference on DNA Computing and Molecular Programming (2025)

MSC Class: 92-10 ACM Class: F.4.3; J.3; F.1.3

arXiv:2506.19183 [pdf]

A Novel Analysis Framework for Microstructural Characterization of Ferroelectric Hafnia: Experimental Validation and Application

Authors: Yoonsang Park, Jaeduck Jang, Hyangsook Lee, Kihong Kim, Kyooho Jung, Yunseong Lee, Jaewoo Lee, Eunji Yang, Sanghyun Jo, Sijung Yoo, Hyun Jae Lee, Donghoon Kim, Duk-Hyun Choe, Seunggeol Nam

Abstract: Herein, we present a novel analysis framework for grain size profile of ferroelectric hafnia to tackle critical shortcomings inherent in the current microstructural analysis. We vastly enhanced visibility of grains with ion beam treatment and performed accurate grain segmentation using deep neural network (DNN). By leveraging our new method, we discovered unexpected discrepancies that contradict previous results, such as deposition temperature (Tdep) and post-metallization annealing (PMA) dependence of grain size statistics, prompting us to reassess earlier interpretations. Combining microstructural analysis with electrical tests, we found that grain size reduction had both positive and negative outcomes: it caused significant diminishing of die-to-die variation (~68 % decrease in standard deviation) in coercive field (Ec), while triggering an upsurge in leakage current. These uncovered results signify robustness of our method in characterization of ferroelectric hafnia for in-depth examination of both device variability and reliability.

Submitted 23 June, 2025; originally announced June 2025.

Comments: 4 pages (2 pages are text rest are filled with figures)

arXiv:2506.08240 [pdf, ps, other]

Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations

Authors: Dongkyu Cho, Rumi Chunara

Abstract: Data augmentation is a promising tool for enhancing out-of-distribution generalization, where the key is to produce diverse, challenging variations of the source domain via costly targeted augmentations that maximize its generalization effect. Conversely, random augmentation is inexpensive but is deemed suboptimal due to its limited effect. In this paper, we revisit random augmentation and explore methods to address its shortcomings. We show that the stochastic nature of random augmentation can produce a set of colliding augmentations that distorts the learned features, similar to catastrophic forgetting. We propose a simple solution that improves the generalization effect of random augmentation by addressing forgetting, which displays strong generalization performance across various single source domain generalization (sDG) benchmarks.

Submitted 27 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

Comments: 12 pages, 6 figures

arXiv:2506.08228 [pdf, ps, other]

Scaling Laws of Motion Forecasting and Planning -- A Technical Report

Authors: Mustafa Baniodeh, Kratarth Goel, Scott Ettinger, Carlos Fuertes, Ari Seff, Tim Shen, Cole Gulino, Chenjie Yang, Ghassen Jerfel, Dokook Choe, Rui Wang, Vinutha Kallem, Sergio Casas, Rami Al-Rfou, Benjamin Sapp, Dragomir Anguelov

Abstract: We study the empirical scaling laws of a family of encoder-decoder autoregressive transformer models on the task of joint motion forecasting and planning in the autonomous driving domain. Using a 500 thousand hours driving dataset, we demonstrate that, similar to language modeling, model performance improves as a power-law function of the total compute budget, and we observe a strong correlation between model training loss and model evaluation metrics. Most interestingly, closed-loop metrics also improve with scaling, which has important implications for the suitability of open-loop metrics for model development and hill climbing. We also study the optimal scaling of the number of transformer parameters and the training data size for a training compute-optimal model. We find that as the training compute budget grows, optimal scaling requires increasing the model size 1.5x as fast as the dataset size. We also study inference-time compute scaling, where we observe that sampling and clustering the output of smaller models makes them competitive with larger models, up to a crossover point beyond which a larger model becomes more inference-compute efficient. Overall, our experimental results demonstrate that optimizing the training and inference-time scaling properties of motion forecasting and planning models is a key lever for improving their performance to address a wide variety of driving scenarios.
Finally, we briefly study the utility of training on general logged driving data of other agents to improve the performance of the ego-agent, an important research area to address the scarcity of robotics data for large capacity models training.

Submitted 9 June, 2025; originally announced June 2025.
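
Note: one way to read the 1.5x statement in the abstract above is as a ratio of power-law exponents. The sketch below assumes, as is common in language-model scaling analyses, that training compute $C$ is roughly proportional to the product of parameter count $N$ and dataset size $D$. That proportionality and the symbols $a$ and $b$ are assumptions made here for illustration; they are not stated in the abstract.

```latex
\begin{align*}
N_{\mathrm{opt}} &\propto C^{a}, \qquad D_{\mathrm{opt}} \propto C^{b}, \qquad a = 1.5\,b, \\
C &\propto N D \;\Longrightarrow\; a + b = 1 \;\Longrightarrow\; a = 0.6,\quad b = 0.4 .
\end{align*}
```

Under these assumptions, a 10x increase in compute would correspond to a model about $10^{0.6} \approx 4.0$ times larger and a dataset about $10^{0.4} \approx 2.5$ times larger. The exponents reported in the paper itself may differ.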

arXiv:2505.21671 [pdf, ps, other]

Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing

Authors: Davin Choo, Yuqi Pan, Tonghan Wang, Milind Tambe, Alastair van Heerden, Cheryl Johnson

Abstract: We study a sequential decision-making problem on an $n$-node graph $G$ where each node has an unknown label from a finite set $\mathbf{\Sigma}$, drawn from a joint distribution $P$ that is Markov with respect to $G$. At each step, selecting a node reveals its label and yields a label-dependent reward. The goal is to adaptively choose nodes to maximize expected accumulated discounted rewards. We impose a frontier exploration constraint, where actions are limited to neighbors of previously selected nodes, reflecting practical constraints in settings such as contact tracing and robotic exploration. We design a Gittins index-based policy that applies to general graphs and is provably optimal when $G$ is a forest. Our implementation runs in $O(n^2 \cdot |\mathbf{\Sigma}|^2)$ time while using $O(n \cdot |\mathbf{\Sigma}|^2)$ oracle calls to $P$ and $O(n^2 \cdot |\mathbf{\Sigma}|)$ space. Experiments on synthetic and real-world graphs show that our method consistently outperforms natural baselines, including in non-tree, budget-limited, and undiscounted settings. For example, in HIV testing simulations on real-world sexual interaction networks, our policy detects nearly all positive cases with only half the population tested, substantially outperforming other baselines.

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.20868 [pdf, ps, other]

Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech

Authors: Nam-Gyu Kim, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee

Abstract: Recent advances in expressive text-to-speech (TTS) have introduced diverse methods based on style embedding extracted from reference speech. However, synthesizing high-quality expressive speech remains challenging. We propose Spotlight-TTS, which exclusively emphasizes style via voiced-aware style extraction and style direction adjustment. Voiced-aware style extraction focuses on voiced regions highly related to style while maintaining continuity across different speech regions to improve expressiveness. We adjust the direction of the extracted style for optimal integration into the TTS model, which improves speech quality. Experimental results demonstrate that Spotlight-TTS achieves superior performance compared to baseline models in terms of expressiveness, overall speech quality, and style transfer capability. Our audio samples are publicly available.

Submitted 29 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

Comments: Proceedings of Interspeech 2025

arXiv:2505.19693 [pdf, ps, other]

EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification

Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee

Abstract: Speech emotion recognition predicts a speaker's emotional state from speech signals using discrete labels or continuous dimensions such as arousal, valence, and dominance (VAD). We propose EmoSphere-SER, a joint model that integrates spherical VAD region classification to guide VAD regression for improved emotion prediction. In our framework, VAD values are transformed into spherical coordinates that are divided into multiple spherical regions, and an auxiliary classification task predicts which spherical region each point belongs to, guiding the regression process. Additionally, we incorporate a dynamic weighting scheme and a style pooling layer with multi-head self-attention to capture spectral and temporal dynamics, further boosting performance. This combined training strategy reinforces structured learning and improves prediction consistency. Experimental results show that our approach exceeds baseline methods, confirming the validity of the proposed framework.

Submitted 26 May, 2025; originally announced May 2025.

Comments: Proceedings of Interspeech 2025
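
Note: the spherical transformation described in the abstract above can be illustrated with a minimal sketch. It assumes VAD values are treated as Cartesian coordinates around a chosen center and that regions are formed by binning the two angles uniformly. The function names, the `center` argument, and the bin counts are hypothetical; the paper's actual region definition may differ.

```python
import math

def vad_to_spherical(v, a, d, center=(0.0, 0.0, 0.0)):
    """Convert a (valence, arousal, dominance) point to (r, theta, phi).

    `center` is an assumed origin (e.g. a neutral-emotion point).
    """
    x, y, z = v - center[0], a - center[1], d - center[2]
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0  # polar angle in [0, pi]
    phi = math.atan2(y, x)                      # azimuth in [-pi, pi]
    return r, theta, phi

def spherical_region(theta, phi, n_polar=2, n_azimuth=4):
    """Map the two angles to a region index by uniform binning (assumed scheme)."""
    polar_bin = min(int(theta / math.pi * n_polar), n_polar - 1)
    azimuth_bin = min(int((phi + math.pi) / (2 * math.pi) * n_azimuth),
                      n_azimuth - 1)
    return polar_bin * n_azimuth + azimuth_bin

r, theta, phi = vad_to_spherical(1.0, 1.0, 1.0)
region = spherical_region(theta, phi)  # class label for the auxiliary task
```

With the default bin counts this yields eight regions, which an auxiliary classifier could predict alongside the VAD regression targets.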

arXiv:2505.19687 [pdf, ps, other]

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech

Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee

Abstract: Cross-speaker emotion transfer in speech synthesis relies on extracting speaker-independent emotion embeddings for accurate emotion modeling without retaining speaker traits. However, existing timbre compression methods fail to fully separate speaker and emotion characteristics, causing speaker leakage and degraded synthesis quality. To address this, we propose DiEmo-TTS, a self-supervised distillation method to minimize emotional information loss and preserve speaker identity. We introduce cluster-driven sampling and information perturbation to preserve emotion while removing irrelevant factors. To facilitate this process, we propose an emotion clustering and matching approach using emotional attribute prediction and speaker embeddings, enabling generalization to unlabeled data. Additionally, we designed a dual conditioning transformer to integrate style features better. Experimental results confirm the effectiveness of our method in learning speaker-irrelevant emotion embeddings.

Submitted 26 May, 2025; originally announced May 2025.

Comments: Proceedings of Interspeech 2025

arXiv:2505.19252 [pdf, other]

Learning-Augmented Online Bipartite Fractional Matching

Authors: Davin Choo, Billy Jin, Yongho Shin

Abstract: Online bipartite matching is a fundamental problem in online optimization, extensively studied both in its integral and fractional forms due to its theoretical significance and practical applications, such as online advertising and resource allocation. Motivated by recent progress in learning-augmented algorithms, we study online bipartite fractional matching when the algorithm is given advice in the form of a suggested matching in each iteration. We develop algorithms for both the vertex-weighted and unweighted variants that provably dominate the naive "coin flip" strategy of randomly choosing between the advice-following and advice-free algorithms. Moreover, our algorithm for the vertex-weighted setting extends to the AdWords problem under the small bids assumption, yielding a significant improvement over the seminal work of Mahdian, Nazerzadeh, and Saberi (EC 2007, TALG 2012). Complementing our positive results, we establish a hardness bound on the robustness-consistency tradeoff that is attainable by any algorithm. We empirically validate our algorithms through experiments on synthetic and real-world data.

Submitted 25 May, 2025; originally announced May 2025.

arXiv:2505.13376 [pdf, ps, other]

Seeing, Saying, Solving: An LLM-to-TL Framework for Cooperative Robots

Authors: Dan BW Choe, Sundhar Vinodh Sangeetha, Steven Emanuel, Chih-Yuan Chiu, Samuel Coogan, Shreyas Kousik

Abstract: Increased robot deployment, such as in warehousing, has revealed a need for seamless collaboration among heterogeneous robot teams to resolve unforeseen conflicts. To address this challenge, we propose a novel, decentralized framework for robots to request and provide help. The framework begins with robots detecting conflicts using a Vision Language Model (VLM), then reasoning over whether help is needed. If so, it crafts and broadcasts a natural language (NL) help request using a Large Language Model (LLM). Potential helper robots reason over the request and offer help (if able), along with information about impact to their current tasks. Helper reasoning is implemented via an LLM grounded in Signal Temporal Logic (STL) using a Backus-Naur Form (BNF) grammar to guarantee syntactically valid NL-to-STL translations, which are then solved as a Mixed Integer Linear Program (MILP). Finally, the requester robot chooses a helper by reasoning over impact on the overall system. We evaluate our system via experiments considering different strategies for choosing a helper, and find that a requester robot can minimize overall time impact on the system by considering multiple help offers versus simple heuristics (e.g., selecting the nearest robot to help).

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.12745 [pdf, ps, other]

PEER pressure: Model-to-Model Regularization for Single Source Domain Generalization

Authors: Dong Kyu Cho, Inwoo Hwang, Sanghack Lee

Abstract: Data augmentation is a popular tool for single source domain generalization, which expands the source domain by generating simulated ones, improving generalization on unseen target domains. In this work, we show that the performance of such augmentation-based methods in the target domains universally fluctuates during training, posing challenges in model selection under realistic scenarios. We argue that the fluctuation stems from the inability of the model to accumulate the knowledge learned from diverse augmentations, exacerbating feature distortion during training. Based on this observation, we propose a novel generalization method, coined Parameter-Space Ensemble with Entropy Regularization (PEER), that uses a proxy model to learn the augmented data on behalf of the main model. The main model is updated by averaging its parameters with the proxy model, progressively accumulating knowledge over the training steps. Maximizing the mutual information between the output representations of the two models guides the learning process of the proxy model, mitigating feature distortion during training. Experimental results demonstrate the effectiveness of PEER in reducing the OOD performance fluctuation and enhancing generalization across various datasets, including PACS, Digits, Office-Home, and VLCS. Notably, our method with simple random augmentation achieves state-of-the-art performance, surpassing prior approaches on sDG that utilize complex data augmentation strategies.

Submitted 19 May, 2025; originally announced May 2025.

Comments: 21 pages, 9 figures, Accepted at CVPR 2025
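
Note: the parameter-averaging step mentioned in the abstract above can be sketched as follows. This is a minimal illustration that assumes parameters are stored as plain lists of floats keyed by name. The function name `average_parameters` and the `weight` argument are hypothetical, and the sketch omits the mutual-information regularizer and the training loop entirely.

```python
def average_parameters(main_params, proxy_params, weight=0.5):
    """Return main-model parameters averaged with the proxy model's.

    weight=0.5 gives a plain average; other values are an assumed
    generalization, not something the abstract specifies.
    """
    if main_params.keys() != proxy_params.keys():
        raise ValueError("models must have the same parameter names")
    averaged = {}
    for name, main_values in main_params.items():
        proxy_values = proxy_params[name]
        if len(main_values) != len(proxy_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        averaged[name] = [
            (1.0 - weight) * m + weight * p
            for m, p in zip(main_values, proxy_values)
        ]
    return averaged

main = {"w": [1.0, 3.0], "b": [0.0]}
proxy = {"w": [3.0, 5.0], "b": [2.0]}
main = average_parameters(main, proxy)  # main now holds the averaged weights
```

Repeating this update at intervals during training is one way the main model could accumulate what the proxy learns from successive augmentations.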

arXiv:2505.08799 [pdf, other]

Measuring Security in 5G and Future Networks

Authors: Loay Abdelrazek, Rim ElMalki, Filippo Rebecchi, Daniel Cho

Abstract: In today's increasingly interconnected and fast-paced digital ecosystem, mobile networks, such as 5G and future generations such as 6G, play a pivotal role and must be considered as critical infrastructures. Ensuring their security is paramount to safeguard both individual users and the industries that depend on these networks. An essential condition for maintaining and improving the security posture of a system is the ability to effectively measure and monitor its security state. In this work we address the need for an objective measurement of the security state of 5G and future networks. We introduce a state machine model designed to capture the security life cycle of network functions and the transitions between different states within the life cycle. Such a model can be computed locally at each node, or hierarchically, by aggregating measurements into security domains or the whole network. We identify three essential security metrics -- attack surface exposure, impact of system vulnerabilities, and effectiveness of applied security controls -- that collectively form the basis for calculating the overall security score. With this approach, it is possible to provide a holistic understanding of the security posture, laying the foundation for effective security management in the expected dynamic threat landscape of 6G networks.
Through practical examples, we illustrate the real-world application of our proposed methodology, offering valuable insights for developing risk management and informed decision-making strategies in 5G and 6G security operations and laying the foundation for effective security management in the expected dynamic threat landscape of 6G networks.

Submitted 9 May, 2025; originally announced May 2025.

Comments: Accepted and presented in IEEE Future Networks World Forum 2024 conference, This is a pre-print version

arXiv:2505.08230 [pdf, ps, other]

SKiD-SLAM: Robust, Lightweight, and Distributed Multi-Robot LiDAR SLAM in Resource-Constrained Field Environments

Authors: Hogyun Kim, Jiwon Choi, Juwon Kim, Geonmo Yang, Dongjin Cho, Hyungtae Lim, Younggun Cho

Abstract: Distributed LiDAR SLAM is crucial for achieving efficient robot autonomy and improving the scalability of mapping. However, two issues need to be considered when applying it in field environments: one is resource limitation, and the other is inter/intra-robot association. The resource limitation issue arises when the data size exceeds the processing capacity of the network or memory, especially when utilizing communication systems or onboard computers in the field. The inter/intra-robot association issue occurs due to the narrow convergence region of ICP under large viewpoint differences, triggering many false positive loops and ultimately resulting in an inconsistent global map for multi-robot systems. To tackle these problems, we propose a distributed LiDAR SLAM framework designed for versatile field applications, called SKiD-SLAM. Extending our previous work that solely focused on lightweight place recognition and fast and robust global registration, we present a multi-robot mapping framework that focuses on robust and lightweight inter-robot loop closure in distributed LiDAR SLAM. Through various environmental experiments, we demonstrate that our method is more robust and lightweight compared to other state-of-the-art distributed SLAM approaches, overcoming resource limitation and inter/intra-robot association issues. Also, we validated the field applicability of our approach through mapping experiments in real-world planetary emulation terrain and cave environments, which are in-house datasets.
Our code will be available at https://sparolab.github.io/research/skid_slam/.

Submitted 8 June, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

Comments: 8 pages, 10 figures

arXiv:2504.13490 [pdf, other]

Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing

Authors: Joowon Kim, Ziseok Lee, Donghyeon Cho, Sanghyun Jo, Yeonsung Jung, Kyungsu Kim, Eunho Yang

Abstract: Despite recent advances in diffusion models, achieving reliable image generation and editing remains challenging due to the inherent diversity induced by stochastic noise in the sampling process. Instruction-guided image editing with diffusion models offers user-friendly capabilities, yet editing failures, such as background distortion, frequently occur. Users often resort to trial and error, adjusting seeds or prompts to achieve satisfactory results, which is inefficient. While seed selection methods exist for Text-to-Image (T2I) generation, they depend on external verifiers, limiting applicability, and evaluating multiple seeds increases computational complexity. To address this, we first establish a multiple-seed-based image editing baseline using background consistency scores, achieving Best-of-N performance without supervision. Building on this, we introduce ELECT (Early-timestep Latent Evaluation for Candidate Selection), a zero-shot framework that selects reliable seeds by estimating background mismatches at early diffusion timesteps, identifying the seed that retains the background while modifying only the foreground. ELECT ranks seed candidates by a background inconsistency score, filtering unsuitable samples early based on background consistency while preserving editability. Beyond standalone seed selection, ELECT integrates into instruction-guided editing pipelines and extends to Multimodal Large-Language Models (MLLMs) for joint seed and prompt selection, further improving results when seed selection alone is insufficient.
Experiments show that ELECT reduces computational costs (by 41 percent on average and up to 61 percent) while improving background consistency and instruction adherence, achieving around 40 percent success rates in previously failed cases - without any external supervision or training.

Submitted 18 April, 2025; originally announced April 2025.

arXiv:2504.13354 [pdf, ps, other]

A Formalization of Co-Transcriptional Splicing as an Operation on Formal Languages

Authors: Da-Jung Cho, Szilárd Zsolt Fazekas, Shinnosuke Seki, Max Wiedenhöft

Abstract: RNA co-transcriptionality is the process where RNA sequences are spliced while being transcribed from DNA templates. This process holds potential as a key tool for molecular programming. Co-transcriptional folding has been shown to be programmable for assembling nano-scale RNA structures, and recent advances have proven its Turing universality. While post-transcriptional splicing has been extensively studied, co-transcriptional splicing is gaining attention for its potential to save resources and space in molecular systems. However, its unpredictability has limited its practical applications. In this paper, we focus on engineering co-transcriptional splicing, moving beyond natural occurrences to program RNA sequences that produce specific target sequences through DNA templates. We introduce contextual lariat deletion operations under three energy models - linear loop penalty, logarithmic loop penalty, and constantly bounded loop length - as well as bracketed contextual deletion, where deletion occurs solely based on context matching, without any structural constraints from hairpin loops. We examine the complexity of the template constructability problem associated with these operations and study the closure properties of the languages they generate, providing insights for RNA template design in molecular programming systems.

Submitted 17 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

Comments: 35 pages, 2 tables, 4 figures, Updated Long Version, Under revision review in Natural Computing

MSC Class: 68Q45; 68Q17 ACM Class: F.4.3; F.1.3

arXiv:2504.02193 [pdf, other]

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

Authors: Yifan Wang, Runjin Chen, Bolian Li, David Cho, Yihe Deng, Ruqi Zhang, Tianlong Chen, Zhangyang Wang, Ananth Grama, Junyuan Hong

Abstract: Aligning large language models (LLMs) with human values is an increasingly critical step in post-training. Direct Preference Optimization (DPO) has emerged as a simple, yet effective alternative to reinforcement learning from human feedback (RLHF). Synthetic preference data with its low cost and high quality enable effective alignment through single- or multi-model generated preference data. Our study reveals a striking, safety-specific phenomenon associated with DPO alignment: Although multi-model generated data enhances performance on general tasks (ARC, Hellaswag, MMLU, TruthfulQA, Winogrande) by providing diverse responses, it also tends to facilitate reward hacking during training. This can lead to a high attack success rate (ASR) when models encounter jailbreaking prompts. The issue is particularly pronounced when employing stronger models like GPT-4o or larger models in the same family to generate chosen responses paired with target model self-generated rejected responses, resulting in dramatically poorer safety outcomes. Furthermore, with respect to safety, using solely self-generated responses (single-model generation) for both chosen and rejected pairs significantly outperforms configurations that incorporate responses from stronger models, whether used directly as chosen data or as part of a multi-model response pool. We demonstrate that multi-model preference data exhibits high linear separability between chosen and rejected responses, which allows models to exploit superficial cues rather than internalizing robust safety constraints. Our experiments, conducted on models from the Llama, Mistral, and Qwen families, consistently validate these findings.

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2504.00137 [pdf, other]

Performance analysis of metasurface-based spatial multimode transmission for 6G wireless communications

Authors: Ju Yong Lee, Seung-Won Keum, Sang Min Oh, Dang-Oh Kim, Dong-Ho Cho

Abstract: In 6th generation wireless communication technology, it is important to utilize space resources efficiently. Recently, holographic multiple-input multiple-output (HMIMO) and meta-surface technology have attracted attention as technologies that maximize space utilization for 6G mobile communications. However, studies on HMIMO communications are still in an initial stage and its fundamental limits are yet to be unveiled. It is well known that the Fourier transform relationship can be obtained using a lens in the optical field, but research to apply it to the mobile communication field is in the early stages. In this paper, we show that the Fourier transform relationship between signals can be obtained when two metasurfaces are aligned or unaligned, and analyze the transmission and reception power, and the maximum number of spatial multimodes that can be transmitted. In addition, to reduce transmission complexity, we propose a spatial multimode transmission system using three metasurfaces and analyze signal characteristics on the meta-surfaces. In numerical results, we provide the performance of spatial multimode transmission in case of using rectangular and Gaussian signals.

Submitted 31 March, 2025; originally announced April 2025.

arXiv:2503.05841 [pdf]

Low Mach number limit for the diffusion approximation model in radiation hydrodynamics at equilibrium-diffusion regime

Authors: Kwang-Il Choe, Dae-Won Choe, Myong Chol Pak

Abstract: The low Mach number limit for the compressible viscous diffusion approximation model arising in radiation hydrodynamics is rigorously justified. For the 3-D Cauchy problem, the solutions in an equilibrium diffusion regime are shown to converge to the solutions of the incompressible Navier-Stokes equations locally and globally in time as the Mach number goes to zero, when the effect of the small temperature variation upon the limit is taken into account.

Submitted 6 March, 2025; originally announced March 2025.

Comments: 26 pages

arXiv:2502.07274 [pdf, other]

Memory Is Not the Bottleneck: Cost-Efficient Continual Learning via Weight Space Consolidation

Authors: Dongkyu Cho, Taesup Moon, Rumi Chunara, Kyunghyun Cho, Sungmin Cha

Abstract: Continual learning (CL) has traditionally emphasized minimizing exemplar memory usage, assuming that memory is the primary bottleneck. However, in modern computing environments, particularly those involving large foundation models, memory is inexpensive and abundant, while GPU time constitutes the main cost. This paper re-examines CL under a more realistic setting with sufficient exemplar memory, where the system can retain a representative portion of past data. We find that, under this regime, stability improves due to reduced forgetting, but plasticity diminishes as the model becomes biased toward prior tasks and struggles to adapt to new ones. Notably, even simple baselines like naive replay can match or exceed the performance of state-of-the-art methods at a fraction of the computational cost. Building on this insight, we propose a lightweight yet effective method called Weight Space Consolidation, which directly operates in the model's weight space via two core mechanisms: (1) rank-based parameter resets to recover plasticity, and (2) weight averaging to enhance stability. Our approach outperforms strong baselines across class-incremental learning with image classifiers and continual instruction tuning with large language models, while requiring only one-third to one-fourth of the training cost. These findings challenge long-standing CL assumptions and establish a new, cost-efficient baseline for real-world continual learning systems where exemplar memory is no longer the limiting factor.

Submitted 20 May, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

Comments: 23 pages, 11 figures

arXiv:2502.04998 [pdf, other]

On Sequential Fault-Intolerant Process Planning

Authors: Andrzej Kaczmarczyk, Davin Choo, Niclas Boehmer, Milind Tambe, Haifeng Xu

Abstract: We propose and study a planning problem we call Sequential Fault-Intolerant Process Planning (SFIPP). SFIPP captures a reward structure common in many sequential multi-stage decision problems where the planning is deemed successful only if all stages succeed. Such reward structures are different from classic additive reward structures and arise in important applications such as drug/material discovery, security, and quality-critical product design. We design provably tight online algorithms for settings in which we need to pick between different actions with unknown success chances at each stage. We do so both for the foundational case in which the behavior of actions is deterministic, and the case of probabilistic action outcomes, where we effectively balance exploration for learning and exploitation for planning through the usage of multi-armed bandit algorithms. In our empirical evaluations, we demonstrate that the specialized algorithms we develop, which leverage additional information about the structure of the SFIPP instance, outperform our more general algorithm.

Submitted 7 February, 2025; originally announced February 2025.

Comments: 20 pages; 7 figures

arXiv:2501.07809 [pdf, other]

Conformal mapping Coordinates Physics-Informed Neural Networks (CoCo-PINNs): learning neural networks for designing neutral inclusions

Authors: Daehee Cho, Hyeonmin Yun, Jaeyong Lee, Mikyoung Lim

Abstract: We focus on designing and solving the neutral inclusion problem via neural networks. The neutral inclusion problem has a long history in the theory of composite materials, and it is exceedingly challenging to identify the precise condition that precipitates a general-shaped inclusion into a neutral inclusion. Physics-informed neural networks (PINNs) have recently become a highly successful approach to addressing both forward and inverse problems associated with partial differential equations. We found that traditional PINNs perform inadequately when applied to the inverse problem of designing neutral inclusions with arbitrary shapes. In this study, we introduce a novel approach, Conformal mapping Coordinates Physics-Informed Neural Networks (CoCo-PINNs), which integrates complex analysis techniques into PINNs. This method exhibits strong performance in solving forward-inverse problems to construct neutral inclusions of arbitrary shapes in two dimensions, where the imperfect interface condition on the inclusion's boundary is modeled by training neural networks. Notably, we mathematically prove that training with a single linear field is sufficient to achieve neutrality for untrained linear fields in arbitrary directions, given a minor assumption. We demonstrate that CoCo-PINNs offer enhanced performances in terms of credibility, consistency, and stability.

Submitted 13 January, 2025; originally announced January 2025.

arXiv:2501.06246 [pdf, other]

A partition cover approach to tokenization

Authors: Jia Peng Lim, Shawn Tan, Davin Choo, Hady W. Lauw

Abstract: Tokenization is the process of encoding strings into tokens of a fixed vocabulary size, and is widely utilized in Natural Language Processing applications. The leading tokenization algorithm today is Byte-Pair Encoding (BPE), which formulates the tokenization problem as a compression problem and tackles it by performing sequences of merges. In this work, we formulate tokenization as an optimization objective, show that it is NP-hard via a simple reduction from vertex cover, and propose a polynomial-time greedy algorithm GreedTok. Our formulation naturally relaxes to the well-studied weighted maximum coverage problem which has a simple $(1 - 1/e)$-approximation algorithm GreedWMC. Through empirical evaluations on real-world corpora, we show that GreedTok outperforms BPE and Unigram on compression and achieves a covering score comparable to GreedWMC. Finally, our extensive pre-training for two transformer-based language models with 1 billion parameters, comparing the choices of BPE and GreedTok as the tokenizer, shows that GreedTok achieves a lower bit per byte even when we control for either the total dataset proportion or total training tokens.

Submitted 25 May, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

Comments: under review

arXiv:2412.19561 [pdf, other]

Single-qubit quantum gate at an arbitrary speed

Authors: Seongjin Ahn, Kichan Park, Daehee Cho, Mikyoung Lim, Taeyoung Choi, Andrey S. Moskalenko

Abstract: Quantum information processing comprises physical processes, which obey the quantum speed limit (QSL): high speed requires strong driving. Single-qubit gates using Rabi oscillation, which is based on the rotating wave approximation (RWA), satisfy this bound in the form that the gate time $T$ is inversely proportional to the Rabi frequency $\Omega$, characterizing the driving strength. However, if the gate time is comparable to or shorter than the qubit period $T_{0} \equiv 2\pi/\omega_{0}$, the RWA actually breaks down since the Rabi frequency has to be large compared to the qubit frequency $\omega_{0}$ due to the QSL, which is given as $T \gtrsim \pi/\Omega$. We show that it is possible to construct a universal set of single-qubit gates at this strong-coupling and ultrafast regime, by adjusting the central frequency $\omega$ and the Rabi frequency $\Omega$ of the driving pulse. We observe a transition in the scaling behavior of the central frequency from the long-gate time regime ($T \gg T_{0}$) to the short-gate time ($T \ll T_{0}$) regime. In the former, the central frequency is nearly resonant to the qubit, i.e., $\omega \simeq \omega_{0}$, whereas in the latter, the central frequency is inversely proportional to the gate time, i.e., $\omega \sim \pi/T$. We identify the transition gate time at which the scaling exponent $n$ of the optimal central frequency $\omega \sim T^{n}$ changes from $n=0$ to $n=-1$.

Submitted 27 December, 2024; originally announced December 2024.

Comments: 11 pages, 4 figures

arXiv:2412.06192 [pdf, other]

PoLaRIS Dataset: A Maritime Object Detection and Tracking Dataset in Pohang Canal

Authors: Jiwon Choi, Dongjin Cho, Gihyeon Lee, Hogyun Kim, Geonmo Yang, Joowan Kim, Younggun Cho

Abstract: Maritime environments often present hazardous situations due to factors such as moving ships or buoys, which become obstacles under the influence of waves. In such challenging conditions, the ability to detect and track potentially hazardous objects is critical for the safe navigation of marine robots. To address the scarcity of comprehensive datasets capturing these dynamic scenarios, we introduce a new multi-modal dataset that includes image and point-wise annotations of maritime hazards. Our dataset provides detailed ground truth for obstacle detection and tracking, including objects as small as $10 \times 10$ pixels, which are crucial for maritime safety. To validate the dataset's effectiveness as a reliable benchmark, we conducted evaluations using various methodologies, including state-of-the-art (SOTA) techniques for object detection and tracking. These evaluations are expected to contribute to performance improvements, particularly in the complex maritime environment. To the best of our knowledge, this is the first dataset offering multi-modal annotations specifically tailored to maritime environments. Our dataset is available at https://sites.google.com/view/polaris-dataset.

Submitted 19 December, 2024; v1 submitted 8 December, 2024; originally announced December 2024.

arXiv:2411.13955 [pdf, other]

A silicon-based ion trap chip protected from semiconductor charging

Authors: Daun Chung, Kwangyeul Choi, Woojun Lee, Chiyoon Kim, Hosung Shon, Jeonghyun Park, Beomgeun Cho, Kyungmin Lee, Suhan Kim, Seungwoo Yoo, Eui Hwan Jung, Changhyun Jung, Jiyong Kang, Kyunghye Kim, Roberts Berkis, Tracy Northup, Dong-Il "Dan" Cho, Taehyun Kim

Abstract: Silicon-based ion trap chips can benefit from existing advanced fabrication technologies, such as multi-metal layer techniques for two-dimensional architectures and silicon photonics for the integration of on-chip optical components. However, the scalability of these technologies may be compromised by semiconductor charging, where photogenerated charge carriers produce electric potentials that disrupt ion motion. Inspired by recent studies on charge distribution mechanisms in semiconductors, we developed a silicon-based chip with gold coated on all exposed silicon surfaces. This modification significantly stabilized ion motion compared to a chip without such metallic shielding, a result that underscores the detrimental effects of exposed silicon. With the mitigation of background silicon-induced fields to negligible levels, quantum operations such as sideband cooling and two-ion entangling gates, which were previously infeasible with the unshielded chip, can now be implemented.

Submitted 21 November, 2024; originally announced November 2024.

arXiv:2411.12700 [pdf, other]

Learning multivariate Gaussians with imperfect advice

Authors: Arnab Bhattacharyya, Davin Choo, Philips George John, Themis Gouleakis

Abstract: We revisit the problem of distribution learning within the framework of learning-augmented algorithms. In this setting, we explore the scenario where a probability distribution is provided as potentially inaccurate advice on the true, unknown distribution. Our objective is to develop learning algorithms whose sample complexity decreases as the quality of the advice improves, thereby surpassing standard learning lower bounds when the advice is sufficiently accurate. Specifically, we demonstrate that this outcome is achievable for the problem of learning a multivariate Gaussian distribution $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ in the PAC learning setting. Classically, in the advice-free setting, $\tilde{\Theta}(d^2/\varepsilon^2)$ samples are sufficient and worst-case necessary to learn $d$-dimensional Gaussians up to TV distance $\varepsilon$ with constant probability. When we are additionally given a parameter $\tilde{\boldsymbol{\Sigma}}$ as advice, we show that $\tilde{O}(d^{2-\beta}/\varepsilon^2)$ samples suffice whenever $\| \tilde{\boldsymbol{\Sigma}}^{-1/2} \boldsymbol{\Sigma} \tilde{\boldsymbol{\Sigma}}^{-1/2} - \boldsymbol{I_d} \|_1 \leq \varepsilon d^{1-\beta}$ (where $\|\cdot\|_1$ denotes the entrywise $\ell_1$ norm) for any $\beta > 0$, yielding a polynomial improvement over the advice-free setting.

Submitted 31 January, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

arXiv:2411.08141 [pdf, ps, other]

Probably approximately correct high-dimensional causal effect estimation given a valid adjustment set

Authors: Davin Choo, Chandler Squires, Arnab Bhattacharyya, David Sontag

Abstract: Accurate estimates of causal effects play a key role in decision-making across applications such as healthcare, economics, and operations. In the absence of randomized experiments, a common approach to estimating causal effects uses covariate adjustment. In this paper, we study covariate adjustment for discrete distributions from the PAC learning perspective, assuming knowledge of a valid adjustment set $\mathbf{Z}$, which might be high-dimensional. Our first main result PAC-bounds the estimation error of covariate adjustment by a term that is exponential in the size of the adjustment set; it is known that such a dependency is unavoidable even if one only aims to minimize the mean squared error. Motivated by this result, we introduce the notion of an $\varepsilon$-Markov blanket, give bounds on the misspecification error of using such a set for covariate adjustment, and provide an algorithm for $\varepsilon$-Markov blanket discovery; our second main result upper bounds the sample complexity of this algorithm. Furthermore, we provide a misspecification error bound and a constraint-based algorithm that allow us to go beyond $\varepsilon$-Markov blankets to even smaller adjustment sets. Our third main result upper bounds the sample complexity of this algorithm, and our final result combines the first three into an overall PAC bound. Altogether, our results highlight that one does not need to perfectly recover causal structure in order to ensure accurate estimates of causal effects.

Submitted 12 November, 2024; originally announced November 2024.

arXiv:2411.02625 [pdf, other]

doi 10.1109/TAFFC.2025.3561267

EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector

Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee

Abstract: Emotional text-to-speech (TTS) technology has achieved significant progress in recent years; however, challenges remain owing to the inherent complexity of emotions and limitations of the available emotional speech datasets and models. Previous studies typically relied on limited emotional speech datasets or required extensive manual annotations, restricting their ability to generalize across different speakers and emotional styles. In this paper, we present EmoSphere++, an emotion-controllable zero-shot TTS model that can control emotional style and intensity to resemble natural human speech. We introduce a novel emotion-adaptive spherical vector that models emotional style and intensity without human annotation. Moreover, we propose a multi-level style encoder that can ensure effective generalization for both seen and unseen speakers. We also introduce additional loss functions to enhance the emotion transfer performance for zero-shot scenarios. We employ a conditional flow matching-based decoder to achieve high-quality and expressive emotional TTS in a few sampling steps. Experimental results demonstrate the effectiveness of the proposed framework.

Submitted 16 April, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

Journal ref: Published in IEEE Transactions on Affective Computing 2025

arXiv:2410.11894 [pdf, other]

Automated Discovery of Operable Dynamics from Videos

Authors: Kuang Huang, Dong Heon Cho, Boyuan Chen

Abstract: Dynamical systems form the foundation of scientific discovery, traditionally modeled with predefined state variables such as the angle and angular velocity, and differential equations such as the equation of motion for a single pendulum. We introduce a framework that automatically discovers a low-dimensional and operable representation of system dynamics, including a set of compact state variables that preserve the smoothness of the system dynamics and a differentiable vector field, directly from video without requiring prior domain-specific knowledge. The prominence and effectiveness of the proposed approach are demonstrated through both quantitative and qualitative analyses of a range of dynamical systems, including the identification of stable equilibria, the prediction of natural frequencies, and the detection of chaotic and limit cycle behaviors. The results highlight the potential of our data-driven approach to advance automated scientific discovery.

Submitted 23 April, 2025; v1 submitted 13 October, 2024; originally announced October 2024.

arXiv:2410.06583 [pdf, other]

A short note about the learning-augmented secretary problem

Authors: Davin Choo, Chun Kai Ling

Abstract: We consider the secretary problem through the lens of learning-augmented algorithms. As it is known that the best possible expected competitive ratio is $1/e$ in the classic setting without predictions, a natural goal is to design algorithms that are 1-consistent and $1/e$-robust. Unfortunately, [FY24] provided hardness constructions showing that such a goal is not attainable when the candidates' true values are allowed to scale with $n$. Here, we provide a simple and explicit alternative hardness construction showing that such a goal is not achievable even when the candidates' true values are constants that do not scale with $n$.

Submitted 2 November, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

arXiv:2409.15784 [pdf]

doi 10.1038/s41524-025-01569-7

Deep-learning real-time phase retrieval of imperfect diffraction patterns from X-ray free-electron lasers

Authors: Sung Yun Lee, Do Hyung Cho, Chulho Jung, Daeho Sung, Daewoong Nam, Sangsoo Kim, Changyong Song

Abstract: Machine learning is attracting surging interest across nearly all scientific areas by enabling the analysis of large datasets and the extraction of scientific information from incomplete data. Data-driven science is rapidly growing, especially in X-ray methodologies, where advanced light sources and detection technologies accumulate vast amounts of data that exceed meticulous human inspection capabilities. Despite the increasing demands, the full application of machine learning has been hindered by the need for data-specific optimizations. In this study, we introduce a new deep-learning-based phase retrieval method for imperfect diffraction data. This method provides robust phase retrieval for simulated data and performs well on weak-signal single-pulse diffraction data from X-ray free-electron lasers. Moreover, the method significantly reduces data processing time, facilitating real-time image reconstructions that are crucial for high-repetition-rate data acquisition. Thus, this approach offers a reliable solution to the phase problem and is expected to be widely adopted across various research areas.

Submitted 24 September, 2024; originally announced September 2024.

MSC Class: 68T07 ACM Class: J.2

arXiv:2408.16598 [pdf, other]

doi 10.1021/acs.nanolett.4c05650

Signatures of Amorphous Shiba State in FeTe$_{0.55}$Se$_{0.45}$

Authors: Jinwon Lee, Sanghun Lee, Andreas Kreisel, Jens Paaske, Brian M. Andersen, Koen M. Bastiaans, Damianos Chatzopoulos, Genda Gu, Doohee Cho, Milan P. Allan

Abstract: The iron-based superconductor FeTe$_{0.55}$Se$_{0.45}$ is a peculiar material: it hosts a surface state with a Dirac dispersion, is a putative topological superconductor hosting Majorana modes in vortices, and has an unusually low Fermi energy. The superconducting state is generally thought to be characterized by three gaps in different bands, with the usual homogeneous, spatially extended Bogoliubov excitations -- in this work, we uncover evidence that it is instead of a very different nature. Our scanning tunneling spectroscopy data show several peaks in the density of states above a full gap, and by analyzing the spatial and junction-resistance dependence of the peaks, we conclude that the peaks above the first one are not coherence peaks from different bands. Instead, comparisons with our simulations indicate that they originate from generalized Shiba states that are spatially overlapping. This can lead to an amorphous state of Bogoliubov quasiparticles, reminiscent of impurity bands in semiconductors. We discuss the origin and implications of this new state.

Submitted 29 August, 2024; originally announced August 2024.

Comments: 6 pages, 4 figures

Journal ref: Nano Letters 25, 4227-4233 (2025)

arXiv:2407.21678 [pdf]

Charged-impurity free printing-based diffusion doping in molybdenum disulfide field-effect transistors

Authors: Inho Jeong, Jiwoo Yang, Juntae Jang, Daeheum Cho, Deok-Hwang Kwon, Jae-Keun Kim, Takhee Lee, Kyungjune Cho, Seungjun Chung

Abstract: In practical electronic applications, where doping is crucial to exploit large-area two-dimensional (2D) semiconductors, surface charge transfer doping (SCTD) has emerged as a promising strategy to tailor their electrical characteristics. However, impurity scattering caused by resultant ionized dopants, after donating or withdrawing carriers, hinders transport in 2D semiconductor layers, limiting the carrier mobility. Here, we propose a diffusion doping method for chemical vapor deposition (CVD) grown molybdenum disulfide that avoids interference from charged impurities. Selectively inkjet-printed dopants were introduced only on the contact region, allowing excessively donated electrons to diffuse to the channel layer due to the electron density difference. Therefore, diffusion-doped molybdenum disulfide FETs do not have undesirable charged impurities on the channel, exhibiting over two-fold higher field-effect mobility compared with conventional direct-doped ones. Our study paves the way for a new doping strategy that simultaneously suppresses charged impurity scattering and facilitates the tailoring of the SCTD effect.

Submitted 31 July, 2024; originally announced July 2024.

arXiv:2407.03231 [pdf]

doi 10.1021/acs.nanolett.4c01536

Dimensionality Engineering of Magnetic Anisotropy from Anomalous Hall Effect in Synthetic SrRuO3 Crystals

Authors: Seung Gyo Jeong, Seong Won Cho, Sehwan Song, Jin Young Oh, Do Gyeom Jeong, Gyeongtak Han, Hu Young Jeong, Ahmed Yousef Mohamed, Woo-suk Noh, Sungkyun Park, Jong Seok Lee, Suyoun Lee, Young-Min Kim, Deok-Yong Cho, Woo Seok Choi

Abstract: Magnetic anisotropy in atomically thin correlated heterostructures is essential for exploring quantum magnetic phases for next-generation spintronics. Whereas previous studies have mostly focused on van der Waals systems, here, we investigate the impact of dimensionality of epitaxially-grown correlated oxides down to the monolayer limit on structural, magnetic, and orbital anisotropies. By designing oxide superlattices with correlated ferromagnetic SrRuO3 and nonmagnetic SrTiO3 layers, we observed modulated ferromagnetic behavior with the change of the SrRuO3 thickness. In particular, for three-unit-cell-thick layers, we observe a significant 1,500% improvement of coercive field in the anomalous Hall effect, which cannot be solely attributed to the dimensional crossover in ferromagnetism. The atomic-scale heterostructures further reveal the systematic modulation of anisotropy for the lattice structure and orbital hybridization, explaining the enhanced magnetic anisotropy. Our findings provide valuable insights into engineering the anisotropic hybridization of synthetic magnetic crystals, offering a tunable spin order for various applications.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 23 pages

Journal ref: published 2024

arXiv:2407.00927 [pdf, ps, other]

Learnability of Parameter-Bounded Bayes Nets

Authors: Arnab Bhattacharyya, Davin Choo, Sutanu Gayen, Dimitrios Myrisiotis

Abstract: Bayes nets are extensively used in practice to efficiently represent joint probability distributions over a set of random variables and capture dependency relations. In a seminal paper, Chickering et al. (JMLR 2004) showed that given a distribution $\mathbb{P}$, that is defined as the marginal distribution of a Bayes net, it is $\mathsf{NP}$-hard to decide whether there is a parameter-bounded Bayes net that represents $\mathbb{P}$. They called this problem LEARN. In this work, we extend the $\mathsf{NP}$-hardness result of LEARN and prove the $\mathsf{NP}$-hardness of a promise search variant of LEARN, whereby the Bayes net in question is guaranteed to exist and one is asked to find such a Bayes net. We complement our hardness result with a positive result about the sample complexity that is sufficient to recover a parameter-bounded Bayes net that is close (in TV distance) to a given distribution $\mathbb{P}$, that is represented by some parameter-bounded Bayes net, generalizing a degree-bounded sample complexity result of Brustle et al. (EC 2020).

Submitted 4 August, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

Comments: 15 pages, 2 figures

arXiv:2406.09460 [pdf, other]

doi 10.1002/advs.202401348

Origin of Distinct Insulating Domains in the Layered Charge Density Wave Material 1T-TaS2

Authors: Hyungryul Yang, Byeongin Lee, Junho Bang, Sunghun Kim, Dirk Wulferding, Sung-Hoon Lee, Doohee Cho

Abstract: Vertical charge order shapes the electronic properties in layered charge density wave (CDW) materials. Various stacking orders inevitably create nanoscale domains with distinct electronic structures inaccessible to bulk probes. Here, the stacking characteristics of bulk 1$T$-TaS$_2$ are analyzed using scanning tunneling spectroscopy (STS) and density functional theory (DFT) calculations. It is observed that Mott-insulating domains undergo a transition to band-insulating domains restoring vertical dimerization of the CDWs. Furthermore, STS measurements covering a wide terrace reveal two distinct band insulating domains differentiated by band edge broadening. These DFT calculations reveal that the Mott insulating layers preferably reside on the subsurface, forming broader band edges in the neighboring band insulating layers. Ultimately, buried Mott insulating layers believed to harbor the quantum spin liquid phase are identified. These results resolve persistent issues regarding vertical charge order in 1$T$-TaS$_2$, providing a new perspective for investigating emergent quantum phenomena in layered CDW materials.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 26 pages and 13 figures

arXiv:2406.07960 [pdf, other]

doi 10.1103/PhysRevB.109.195170

Charge ordered phases in the hole-doped triangular Mott insulator 4Hb-TaS2

Authors: Junho Bang, Byeongin Lee, Hyungryul Yang, Sunghun Kim, Dirk Wulferding, Doohee Cho

Abstract: 4Hb-TaS2 has been proposed to possess unconventional superconductivity with broken time-reversal symmetry due to its distinctive layered structure, featuring a heterojunction between a 2D triangular Mott insulator and a charge density wave metal. However, since a frustrated spin state in the correlated insulating layer is susceptible to charge ordering with carrier doping, it is necessary to investigate the charge distribution driven by inter-layer charge transfer to understand its superconductivity. Here, we use scanning tunneling microscopy and spectroscopy (STM/S) to investigate the charge ordered phases of 1T-TaS2 layers within 4Hb-TaS2, explicitly focusing on the non-half-filled regime. Our STS results show an energy gap which exhibits an out-of-phase relation with the charge density. We ascribe the competition between on-site and nonlocal Coulomb repulsion as the driving force for the charge-ordered insulating phase of a doped triangular Mott insulator. In addition, we discuss the role of the insulating layer in the enhanced superconductivity of 4Hb-TaS2.

Submitted 17 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 18 pages, 6 figures

Journal ref: Phys. Rev. B 109, 195170 (2024)

arXiv:2406.07803 [pdf, other]

doi 10.21437/Interspeech.2024-398

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech

Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

Abstract: Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressive emotional speech by using a spherical emotion vector to control the emotional style and intensity of the synthetic speech. Without any human annotation, we use the arousal, valence, and dominance pseudo-labels to model the complex nature of emotion via a Cartesian-spherical transformation. Furthermore, we propose a dual conditional adversarial network to improve the quality of generated speech by reflecting the multi-aspect characteristics. The experimental results demonstrate the model's ability to control emotional style and intensity with high-quality expressive speech.

Submitted 4 November, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: Proceedings of Interspeech

arXiv:2405.09784 [pdf, other]

Online bipartite matching with imperfect advice

Authors: Davin Choo, Themis Gouleakis, Chun Kai Ling, Arnab Bhattacharyya

Abstract: We study the problem of online unweighted bipartite matching with $n$ offline vertices and $n$ online vertices where one wishes to be competitive against the optimal offline algorithm. While the classic RANKING algorithm of Karp et al. [1990] provably attains a competitive ratio of $1-1/e > 1/2$, we show that no learning-augmented method can be both 1-consistent and strictly better than $1/2$-robust under the adversarial arrival model. Meanwhile, under the random arrival model, we show how one can utilize methods from distribution testing to design an algorithm that takes in external advice about the online vertices and provably achieves a competitive ratio interpolating between any ratio attainable by advice-free methods and the optimal ratio of 1, depending on the advice quality.

Submitted 23 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted into ICML 2024

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.15714 [pdf, ps, other]

Analytic asymptotic formulas for effective parameters of planar elastic composites

Authors: Daehee Cho, Doosung Choi, Mikyoung Lim

Abstract: We investigate the effective elastic properties of periodic dilute two-phase composites consisting of a homogeneous isotropic matrix and a periodic array of rigid inclusions. We assume the rigid inclusion in a unit cell is a simply connected, bounded domain so that there exists an exterior conformal mapping corresponding to the inclusion. Recently, an analytical series solution method for the elastic problem with a rigid inclusion was developed based on the layer potential technique and the geometric function theory \cite{Mattei:2021:EAS}. In this paper, by using the series solution method, we derive expression formulas for the elastic moment tensors--the coefficients of the multipole expansion associated with an elastic inclusion--of an inclusion of arbitrary shape. These formulas for the elastic moment tensors lead us to analytic asymptotic formulas for the effective parameters of the periodic elastic composites with rigid inclusions in terms of the associated exterior conformal mapping.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.15713 [pdf, ps, other]

Geometric series solution for the plane elastostatic problem in the presence of a cavity

Authors: Daehee Cho, Doosung Choi, Mikyoung Lim

Abstract: This paper presents an analytic series solution method for the elastic inclusion problem in a two-dimensional unbounded isotropic medium with a cavity. Generalizing the work of Mattei and Lim \cite{Mattei:2021:EAS}, this study develops an analytic series solution method for the elastic inclusion problem to encompass a cavity problem. The central mathematical challenge tackled in this research is to deal with the conormal derivative condition. By using the complex-variable formulation for the conormal derivative, we effectively deal with the boundary condition and derive an explicit series solution for the plane elastostatic problem with a cavity of arbitrary shape subject to arbitrary far-field loading. The solution is expressed as a series expansion in terms of the given far-field loading and the exterior conformal mapping associated with the cavity.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.01519 [pdf, other]

Analytic shape recovery of an elastic inclusion from elastic moment tensors

Authors: Daehee Cho, Mikyoung Lim

Abstract: In this paper, we present an analytic non-iterative approach for recovering a planar isotropic elastic inclusion embedded in an unbounded medium from the elastic moment tensors (EMTs), which are coefficients for the multipole expansion of field perturbation caused by the inclusion. EMTs contain information about the inclusion's material and geometric properties and, as is well known, the inclusion can be approximated by a disk from leading-order EMTs. We define the complex contracted EMTs as the linear combinations of EMTs where the expansion coefficients are given from complex-valued background polynomial solutions. By using the layer potential technique for the Lamé system and the theory of conformal mapping, we derive explicit asymptotic formulas in terms of the complex contracted EMTs for the shape of the inclusion, treating the inclusion as a perturbed disk. These formulas lead us to an analytic non-iterative algorithm for elastic inclusion reconstruction using EMTs. We perform numerical experiments to demonstrate the validity and limitations of our proposed method.

Submitted 5 August, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 22 pages, 3 figures

arXiv:2403.01162 [pdf, ps, other]

doi 10.1016/j.orl.2024.107103

Envy-Free House Allocation with Minimum Subsidy

Authors: Davin Choo, Yan Hao Ling, Warut Suksompong, Nicholas Teh, Jian Zhang

Abstract: House allocation refers to the problem where $m$ houses are to be allocated to $n$ agents so that each agent receives one house. Since an envy-free house allocation does not always exist, we consider finding such an allocation in the presence of subsidy. We show that computing an envy-free allocation with minimum subsidy is NP-hard in general, but can be done efficiently if $m$ differs from $n$ by an additive constant or if the agents have identical utilities.

Submitted 2 March, 2024; originally announced March 2024.

Journal ref: Operations Research Letters, 54:107103 (2024)

arXiv:2402.08229 [pdf, other]

Causal Discovery under Off-Target Interventions

Authors: Davin Choo, Kirankumar Shiragur, Caroline Uhler

Abstract: Causal graph discovery is a significant problem with applications across various disciplines. However, with observational data alone, the underlying causal graph can only be recovered up to its Markov equivalence class, and further assumptions or interventions are necessary to narrow down the true graph. This work addresses the causal discovery problem under the setting of stochastic interventions with the natural goal of minimizing the number of interventions performed. We propose the following stochastic intervention model which subsumes existing adaptive noiseless interventions in the literature while capturing scenarios such as fat-hand interventions and CRISPR gene knockouts: any intervention attempt results in an actual intervention on a random subset of vertices, drawn from a distribution dependent on the attempted action. Under this model, we study the two fundamental problems in causal discovery of verification and search, provide approximation algorithms with polylogarithmic competitive ratios, and present some preliminary experimental results.

Submitted 13 February, 2024; originally announced February 2024.

Comments: Accepted into AISTATS 2024

arXiv:2401.08095 [pdf, other]

doi 10.1109/TAFFC.2025.3530920

DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment

Authors: Hyung-Seok Oh, Sang-Hoon Lee, Deok-Hyeon Cho, Seong-Whan Lee

Abstract: Emotional voice conversion (EVC) involves modifying various acoustic characteristics, such as pitch and spectral envelope, to match a desired emotional state while preserving the speaker's identity. Existing EVC methods often rely on text transcriptions or time-alignment information and struggle to handle varying speech durations effectively. In this paper, we propose DurFlex-EVC, a duration-flexible EVC framework that operates without the need for text or alignment information. We introduce a unit aligner that models contextual information by aligning speech with discrete units representing content, eliminating the need for text or speech-text alignment. Additionally, we design a style autoencoder that effectively disentangles content and emotional style, allowing precise manipulation of the emotional characteristics of the speech. We further enhance emotional expressiveness through a hierarchical stylize encoder that applies the target emotional style at multiple hierarchical levels, refining the stylization process to improve the naturalness and expressiveness of the converted speech. Experimental results from subjective and objective evaluations demonstrate that our approach outperforms baseline models, effectively handling duration variability and enhancing emotional expressiveness in the converted speech.

Submitted 20 January, 2025; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: 15 pages, 11 figures, 12 tables

Journal ref: IEEE Transactions on Affective Computing, 2025, pp.1 - 15

arXiv:2401.00265 [pdf, ps, other]

doi 10.1021/acsnano.4c05398

An unconventional platform for two-dimensional Kagome flat bands on semiconductor surfaces

Authors: Jae Hyuck Lee, GwanWoo Kim, Inkyung Song, Yejin Kim, Yeonjae Lee, Sung Jong Yoo, Deok-Yong Cho, Jun-Won Rhim, Jongkeun Jung, Gunn Kim, Changyoung Kim

Abstract: In condensed matter physics, the Kagome lattice and its inherent flat bands have attracted considerable attention for their potential to host a variety of exotic physical phenomena. Despite extensive efforts to fabricate thin films of Kagome materials aimed at modulating the flat bands through electrostatic gating or strain manipulation, progress has been limited. Here, we report the observation of a novel $d$-orbital hybridized Kagome-derived flat band in Ag/Si(111) $\sqrt{3}\times\sqrt{3}$ as revealed by angle-resolved photoemission spectroscopy. Our findings indicate that silver atoms on a silicon substrate form a Kagome-like structure, where a delicate balance in the hopping parameters of the in-plane $d$-orbitals leads to destructive interference, resulting in a flat band. These results not only introduce a new platform for Kagome physics but also illuminate the potential for integrating metal-semiconductor interfaces into Kagome-related research, thereby opening a new avenue for exploring ideal two-dimensional Kagome systems.

Submitted 30 December, 2023; originally announced January 2024.

Comments: 7 pages, 4 figures

arXiv:2312.08986 [pdf, other]

doi 10.1021/acs.nanolett.3c03721

Melting of unidirectional charge density waves across twin domain boundaries in GdTe$_{3}$

Authors: Sanghun Lee, Eunseo Kim, Junho Bang, Jongho Park, Changyoung Kim, Dirk Wulferding, Doohee Cho

Abstract: Solids undergoing a transition from order to disorder experience the proliferation of topological defects. The melting process generates transient quantum states. However, their dynamical nature with femtosecond lifetime hinders exploration with atomic precision. Here, we suggest an alternative approach to the dynamical melting process by focusing on the interface created by competing degenerate quantum states. We use a scanning tunneling microscope (STM) to visualize the unidirectional charge density wave (CDW) and its spatial progression ("static melting") across a twin domain boundary (TDB) in the layered material GdTe$_{3}$. Combining STM with a spatial lock-in technique, we reveal that the order parameter amplitude attenuates with the formation of dislocations and thus two different unidirectional CDWs coexist near the TDB, reducing the CDW anisotropy. Notably, we discover a correlation between this anisotropy and the CDW gap. Our study provides valuable insight into the behavior of topological defects and transient quantum states.

Submitted 14 December, 2023; originally announced December 2023.

Journal ref: Nano Lett. 23, 11219 (2023)

Showing 1–50 of 203 results for author: Choe, D