Search | arXiv e-print repository

PersonaBooth: Personalized Text-to-Motion Generation

Authors: Boeun Kim, Hea In Jeong, JungHoon Sung, Yihua Cheng, Jeongmin Lee, Ju Yong Chang, Sang-Il Choi, Younggeun Choi, Saim Shin, Jungho Kim, Hyung Jin Chang

Abstract: This paper introduces Motion Personalization, a new task that generates personalized motions aligned with text descriptions using several basic motions containing Persona. To support this novel task, we introduce a new large-scale motion dataset called PerMo (PersonaMotion), which captures the unique personas of multiple actors. We also propose a multi-modal finetuning method of a pretrained motion diffusion model called PersonaBooth. PersonaBooth addresses two main challenges: i) a significant distribution gap between the persona-focused PerMo dataset and the pretraining datasets, which lack persona-specific data, and ii) the difficulty of capturing a consistent persona from motions that vary in content (action type). To tackle the dataset distribution gap, we introduce a persona token to accept new persona features and perform multi-modal adaptation for both text and visuals during finetuning. To capture a consistent persona, we incorporate a contrastive learning technique to enhance intra-cohesion among samples with the same persona. Furthermore, we introduce a context-aware fusion mechanism to maximize the integration of persona cues from multiple input motions. PersonaBooth outperforms state-of-the-art motion style transfer methods, establishing a new benchmark for motion personalization.

Submitted 21 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

arXiv:2503.07216 [pdf, other]

FedRand: Enhancing Privacy in Federated Learning with Randomized LoRA Subparameter Updates

Authors: Sangwoo Park, Seanie Lee, Byungjoo Kim, Sung Ju Hwang

Abstract: Federated Learning (FL) is a widely used framework for training models in a decentralized manner, ensuring that the central server does not have direct access to data from local clients. However, this approach may still fail to fully preserve data privacy, as models from local clients are exposed to the central server during the aggregation process. This issue becomes even more critical when training vision-language models (VLMs) with FL, as VLMs can easily memorize training data instances, making them vulnerable to membership inference attacks (MIAs). To address this challenge, we propose the FedRand framework, which avoids disclosing the full set of client parameters. In this framework, each client randomly selects subparameters of Low-Rank Adaptation (LoRA) from the server and keeps the remaining counterparts of the LoRA weights as private parameters. After training both parameters on the client's private dataset, only the non-private client parameters are sent back to the server for aggregation. This approach mitigates the risk of exposing client-side VLM parameters, thereby enhancing data privacy. We empirically validate that FedRand improves robustness against MIAs compared to relevant baselines while achieving accuracy comparable to methods that communicate full LoRA parameters across several benchmark datasets.

Submitted 11 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

Comments: Preprint
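The client-side procedure described in the FedRand abstract above can be pictured with a small sketch. This is not the authors' implementation: it assumes the "subparameters" are the two LoRA factors (called `"A"` and `"B"` here), it replaces local training with a fixed placeholder update, and it assumes the server simply averages what it receives. The function names `client_round` and `aggregate` are hypothetical.

```python
import random


def client_round(server_lora, private_lora, rng):
    """One hypothetical FedRand-style client round (illustrative only)."""
    # Assumption: the "subparameters" are the two LoRA factors, "A" and "B".
    shared_key = rng.choice(["A", "B"])
    private_key = "B" if shared_key == "A" else "A"

    # Combine the server's copy of one factor with the client's private other factor.
    local = {
        shared_key: list(server_lora[shared_key]),
        private_key: list(private_lora[private_key]),
    }

    # Placeholder for local training on private data: nudge every weight.
    local = {k: [w + 0.01 for w in v] for k, v in local.items()}

    # Only the non-private factor leaves the client; the other stays local.
    new_private = dict(private_lora)
    new_private[private_key] = local[private_key]
    return shared_key, local[shared_key], new_private


def aggregate(server_lora, uploads):
    """Average uploaded factors per key (assumed rule); untouched keys stay as-is."""
    new_server = {k: list(v) for k, v in server_lora.items()}
    for key in new_server:
        received = [u for k, u in uploads if k == key]
        if received:
            new_server[key] = [sum(ws) / len(ws) for ws in zip(*received)]
    return new_server
```

In this sketch the server never sees a client's full adapter in a single round, which is the property the abstract attributes to FedRand; whether the real method selects at this granularity is an assumption.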

arXiv:2503.04335 [pdf]

Quantum interference and occupation control in high harmonic generation from monolayer $WS_2$

Authors: Minjeong Kim, Taeho Kim, Anna Galler, Dasol Kim, Alexis Chacon, Xiangxin Gong, Yuhui Yang, Rouli Fang, Kenji Watanabe, Takashi Taniguchi, B. J. Kim, Sang Hoon Chae, Moon-Ho Jo, Angel Rubio, Ofer Neufeld, Jonghwan Kim

Abstract: Two-dimensional hexagonal materials such as transition metal dichalcogenides exhibit valley degrees of freedom, offering fascinating potential for valley-based quantum computing and optoelectronics. In nonlinear optics, the K and K' valleys provide excitation resonances that can be used for ultrafast control of excitons, Bloch oscillations, and Floquet physics. Under intense laser fields, however, the role of coherent carrier dynamics away from the K/K' valleys is largely unexplored. In this study, we observe quantum interferences in high harmonic generation from monolayer $WS_2$ as laser fields drive electrons from the valleys across the full Brillouin zone. In the perturbative regime, interband resonances at the valleys enhance high harmonic generation through multi-photon excitations. In the strong-field regime, the high harmonic spectrum is sensitively controlled by light-driven quantum interferences between the interband valley resonances and intraband currents originating from electrons occupying various points in the Brillouin zone, also away from K/K' valleys such as $Γ$ and M. Our experimental observations are in strong agreement with quantum simulations, validating their interpretation. This work proposes new routes for harnessing laser-driven quantum interference in two-dimensional hexagonal systems and all-optical techniques to occupy and read-out electronic structures in the full Brillouin zone via strong-field nonlinear optics, advancing quantum technologies.

Submitted 9 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

arXiv:2503.02192 [pdf, other]

Design of the Global Reconstruction Logic in the Belle II Level-1 Trigger system

Authors: Y.-T. Lai, T. Koga, Y. Iwasaki, Y. Ahn, H. Bae, M. Campajola, B. G. Cheon, H.-E. Cho, T. Ferber, I. Haide, G. Heine, C.-L. Hsu, C. Kiesling, C.-H. Kim, J. B. Kim, K. Kim, S. H. Kim, I. S. Lee, M. J. Lee, Y. P. Liao, J. Lin, A. Little, H. K. Moon, H. Nakazawa, M. Neu, et al. (10 additional authors not shown)

Abstract: The Belle II experiment is designed to search for physics beyond the Standard Model by investigating rare decays at the SuperKEKB $e^{+}e^{-}$ collider. Owing to the significant beam background at high luminosity, the data acquisition system employs a hardware-based Level-1 Trigger to reduce the readout data throughput by selecting collision events of interest in real time. The Belle II Level-1 Trigger system utilizes FPGAs to reconstruct various detector observables from the raw data for trigger decision-making. The Global Reconstruction Logic receives these processed observables from four sub-trigger systems and provides a global summary for the final trigger decision. Its logic encompasses charged particle tracking, matching between sub-triggers, and the identification of special event topologies associated with low-multiplicity decays. This article discusses the hardware devices, FPGA firmware, integration with peripheral systems, and the design and performance of the trigger algorithms implemented within the Global Reconstruction Logic.

Submitted 3 March, 2025; originally announced March 2025.

Comments: 10 pages, 12 figures

arXiv:2503.01905 [pdf, other]

PaCA: Partial Connection Adaptation for Efficient Fine-Tuning

Authors: Sunghyeon Woo, Sol Namkung, Sunwoo Lee, Inho Jeong, Beomseok Kim, Dongsuk Jeon

Abstract: Prior parameter-efficient fine-tuning (PEFT) algorithms reduce memory usage and computational costs of fine-tuning large neural network models by training only a few additional adapter parameters, rather than the entire model. However, the reduction in computational costs due to PEFT does not necessarily translate to a reduction in training time; although the computational costs of the adapter layers are much smaller than the pretrained layers, it is well known that those two types of layers are processed sequentially on GPUs, resulting in significant latency overhead. LoRA and its variants merge low-rank adapter matrices with pretrained weights during inference to avoid latency overhead, but during training, the pretrained weights remain frozen while the adapter matrices are continuously updated, preventing such merging. To mitigate this issue, we propose Partial Connection Adaptation (PaCA), which fine-tunes randomly selected partial connections within the pretrained weights instead of introducing adapter layers in the model. PaCA not only enhances training speed by eliminating the time overhead due to the sequential processing of the adapter and pretrained layers but also reduces activation memory since only partial activations, rather than full activations, need to be stored for gradient computation. Compared to LoRA, PaCA reduces training time by 22% and total memory usage by 16%, while maintaining comparable accuracy across various fine-tuning scenarios, such as fine-tuning on the MMLU dataset and instruction tuning on the Oasst1 dataset. PaCA can also be combined with quantization, enabling the fine-tuning of large models such as LLaMA3.1-70B. In addition, PaCA enables training with 23% longer sequence and improves throughput by 16% on both NVIDIA A100 GPU and INTEL Gaudi2 HPU compared to LoRA. The code is available at https://github.com/WooSunghyeon/paca.

Submitted 11 March, 2025; v1 submitted 28 February, 2025; originally announced March 2025.
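The idea in the PaCA abstract above, training a random subset of the pretrained connections and storing only the matching activations, can be illustrated with a toy sketch. It assumes, for illustration only, that "partial connections" means a fixed random set of input columns of a weight matrix `W` in a linear layer `y = W x`; the helper names `select_columns` and `partial_update` are hypothetical and are not taken from the linked repository.

```python
import random


def select_columns(n_in, k, rng):
    """Randomly pick k input-column indices to train (assumed fixed for the run)."""
    return sorted(rng.sample(range(n_in), k))


def partial_update(W, x, grad_y, cols, lr):
    """SGD step for y = W x that touches only the selected columns of W."""
    # Only these input activations are needed for the weight gradient,
    # so the rest of x would not have to be kept for the backward pass.
    x_kept = {j: x[j] for j in cols}
    for i, g in enumerate(grad_y):
        for j in cols:
            # dL/dW[i][j] = grad_y[i] * x[j]
            W[i][j] -= lr * g * x_kept[j]
    return W
```

Because the update is applied directly inside `W`, there is no separate adapter layer to run after the pretrained layer, which is the source of the training-time saving the abstract describes.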

arXiv:2503.00652 [pdf]

Relativistic Spin-Lattice Interaction Compatible with Discrete Translation Symmetry in Solids

Authors: Bumseop Kim, Noejung Park, Kyoung-Whan Kim

Abstract: Recent interest in orbital angular momentum has led to a rapid expansion of research on spin-orbit coupling effects in solids, while also highlighting significant technical challenges. The breaking of rotational symmetry renders the orbital angular momentum operator ill-defined, causing conceptual and computational issues in describing orbital motion. To address these issues, here we propose an alternative framework. Based on the Bloch representation of the full relativistic interaction, we derive a field that directly couples to electron spins while preserving discrete translational symmetry, thereby eliminating the need for the position operator. Our approach is fully compatible with existing first-principles computational frameworks for both static and time-dependent density functional theory. We demonstrate that this method offers a more effective description of the Edelstein and spin Hall effects compared to conventional orbital angular momentum formalisms.

Submitted 1 March, 2025; originally announced March 2025.

arXiv:2502.20843 [pdf, other]

Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments

Authors: Yoonyoung Cho, Junhyek Han, Jisu Han, Beomjoon Kim

Abstract: For robots to operate in general environments like households, they must be able to perform non-prehensile manipulation actions such as toppling and rolling to manipulate ungraspable objects. However, prior works on non-prehensile manipulation cannot yet generalize across environments with diverse geometries. The main challenge lies in adapting to varying environmental constraints: within a cabinet, the robot must avoid walls and ceilings; to lift objects to the top of a step, the robot must account for the step's pose and extent. While deep reinforcement learning (RL) has demonstrated impressive success in non-prehensile manipulation, accounting for such variability presents a challenge for the generalist policy, as it must learn diverse strategies for each new combination of constraints. To address this, we propose a modular and reconfigurable architecture that adaptively reconfigures network modules based on task requirements. To capture the geometric variability in environments, we extend the contact-based object representation (CORN) to environment geometries, and propose a procedural algorithm for generating diverse environments to train our agent. Taken together, the resulting policy can zero-shot transfer to novel real-world environments and objects despite training entirely within a simulator. We additionally release a simulation-based benchmark featuring nine digital twins of real-world scenes with 353 objects to facilitate non-prehensile manipulation research in realistic domains.

Submitted 28 February, 2025; originally announced February 2025.

Comments: http://unicorn-hamnet.github.io/

arXiv:2502.18934 [pdf, other]

Kanana: Compute-efficient Bilingual Language Models

Authors: Kanana LLM Team, Yunju Bak, Hojin Lee, Minho Ryu, Jiyeon Ham, Seungjae Jung, Daniel Wontae Nam, Taegyeong Eo, Donghun Lee, Doohae Jung, Boseop Kim, Nayeon Kim, Jaesun Park, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Kyoung-Woon On, Seulye Baeg, Junrae Cho, Sunghee Jung, Jieun Kang, EungGyun Kim, Eunhwa Kim, Byeongil Ko, Daniel Lee, et al. (4 additional authors not shown)

Abstract: We introduce Kanana, a series of bilingual language models that demonstrate exceeding performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high quality data filtering, staged pre-training, depth up-scaling, and pruning and distillation. Furthermore, the report outlines the methodologies utilized during the post-training of the Kanana models, encompassing supervised fine-tuning and preference optimization, aimed at enhancing their capability for seamless interaction with users. Lastly, the report elaborates on plausible approaches used for language model adaptation to specific scenarios, such as embedding, retrieval augmented generation, and function calling. The Kanana model series spans from 2.1B to 32.5B parameters with 2.1B models (base, instruct, embedding) publicly released to promote research on Korean language models.

Submitted 28 February, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

Comments: 40 pages, 15 figures

arXiv:2502.18015 [pdf, other]

$\texttt{SPIN}$: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation

Authors: Haewon Jung, Donguk Lee, Haecheol Park, JunHyeop Kim, Beomjoon Kim

Abstract: Current robots struggle with long-horizon manipulation tasks requiring sequences of prehensile and non-prehensile skills, contact-rich interactions, and long-term reasoning. We present $\texttt{SPIN}$ ($\textbf{S}$kill $\textbf{P}$lanning to $\textbf{IN}$ference), a framework that distills a computationally intensive planning algorithm into a policy via imitation learning. We propose $\texttt{Skill-RRT}$, an extension of RRT that incorporates skill applicability checks and intermediate object pose sampling for solving such long-horizon problems. To chain independently trained skills, we introduce $\textit{connectors}$, goal-conditioned policies trained to minimize object disturbance during transitions. High-quality demonstrations are generated with $\texttt{Skill-RRT}$ and distilled through noise-based replay in order to reduce online computation time. The resulting policy, trained entirely in simulation, transfers zero-shot to the real world and achieves over 80% success across three challenging long-horizon manipulation tasks and outperforms state-of-the-art hierarchical RL and planning methods.

Submitted 7 May, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

Comments: Project website: https://sites.google.com/view/skill-rrt

arXiv:2502.17708 [pdf, other]

A Unified Model of Text and Citations for Topic-Specific Citation Networks

Authors: ByungKoo Kim, Saki Kuzushima, Yuki Shiraito

Abstract: Social scientists analyze citation networks to study how documents influence subsequent work across various domains such as judicial politics and international relations. However, conventional approaches that summarize document attributes in citation networks often overlook the diverse semantic contexts in which citations occur. This paper develops the paragraph-citation topic model (PCTM), which analyzes citation networks and document texts jointly. The PCTM extends conventional topic models by assigning topics to paragraphs of citing documents, allowing citations to share topics with their embedding paragraphs. Our empirical analysis of U.S. Supreme Court opinions in the privacy issue domain, which includes cases on reproductive rights, demonstrates that citations within individual documents frequently span multiple substantive areas, and citations to individual documents show considerable topical diversity.

Submitted 24 February, 2025; originally announced February 2025.

MSC Class: 62P25; 91C20; 62F15

arXiv:2502.17310 [pdf, other]

Hyperfine and Zeeman Optical Pumping and Transverse Laser Cooling of a Thermal Atomic Beam of Dysprosium Using a Single 421 nm Laser

Authors: Rohan Chakravarthy, Jonathan Agil, Arijit Sharma, Jung Bog Kim, Dmitry Budker

Abstract: We demonstrate the effect of Zeeman and hyperfine optical pumping and transverse laser cooling of a dysprosium (Dy) atomic beam on the $4f^{10}6s^2(J = 8) \rightarrow 4f^{10}6s6p(J = 9)$ transition at 421.291 nm. For $^{163}$Dy, an electro-optic modulator is used to generate five frequency sidebands required to pump the atoms to the $F = 10.5$ ground state hyperfine level and the light polarization is chosen to pump the atoms to the $m_F = 10.5$ Zeeman sublevel. The atoms are simultaneously laser-cooled using a standing wave orthogonal to the atomic beam. The resulting polarized and cooled atomic beam will be used in fundamental physics experiments taking advantage of the accidental degeneracy of excited states in Dy including the ongoing measurement of parity violation in this system.

Submitted 25 February, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

Comments: 8 pages, 10 figures

arXiv:2502.17002 [pdf, other]

Neutron multiplicity measurement in muon capture on oxygen nuclei in the Gd-loaded Super-Kamiokande detector

Authors: The Super-Kamiokande Collaboration: S. Miki, K. Abe, S. Abe, Y. Asaoka, C. Bronner, M. Harada, Y. Hayato, K. Hiraide, K. Hosokawa, K. Ieki, M. Ikeda, J. Kameda, Y. Kanemura, R. Kaneshima, Y. Kashiwagi, Y. Kataoka, S. Mine, M. Miura, S. Moriyama, M. Nakahata, S. Nakayama, Y. Noguchi, K. Okamoto, et al. (265 additional authors not shown)

Abstract: In recent neutrino detectors, neutrons produced in neutrino reactions play an important role. Muon capture on oxygen nuclei is one of the processes that produce neutrons in water Cherenkov detectors. We measured neutron multiplicity in the process using cosmic ray muons that stop in the gadolinium-loaded Super-Kamiokande detector. For this measurement, neutron detection efficiency is obtained with the muon capture events followed by gamma rays to be $50.2^{+2.0}_{-2.1}\%$. By fitting the observed multiplicity considering the detection efficiency, we measure neutron multiplicity in muon capture as $P(0)=24\pm3\%$, $P(1)=70^{+3}_{-2}\%$, $P(2)=6.1\pm0.5\%$, $P(3)=0.38\pm0.09\%$. This is the first measurement of the multiplicity of neutrons associated with muon capture without neutron energy threshold.

Submitted 24 February, 2025; originally announced February 2025.

arXiv:2502.16908 [pdf, other]

Design of a low-cost and lightweight 6 DoF bimanual arm for dynamic and contact-rich manipulation

Authors: Jaehyung Kim, Jiho Kim, Dongryung Lee, Yujin Jang, Beomjoon Kim

Abstract: Dynamic and contact-rich object manipulation, such as striking, snatching, or hammering, remains challenging for robotic systems due to hardware limitations. Most existing robots are constrained by high-inertia design, limited compliance, and reliance on expensive torque sensors. To address this, we introduce ARMADA (Affordable Robot for Manipulation and Dynamic Actions), a 6 degrees-of-freedom bimanual robot designed for dynamic manipulation research. ARMADA combines low-inertia, back-drivable actuators with a lightweight design, using readily available components and 3D-printed links for ease of assembly in research labs. The entire system, including both arms, is built for just $6,100. Each arm achieves speeds up to 6.16m/s, almost twice that of most collaborative robots, with a comparable payload of 2.5kg. We demonstrate ARMADA can perform dynamic manipulation like snatching, hammering, and bimanual throwing in real-world environments. We also showcase its effectiveness in reinforcement learning (RL) by training a non-prehensile manipulation policy in simulation and transferring it zero-shot to the real world, as well as human motion shadowing for dynamic bimanual object throwing. ARMADA is fully open-sourced with detailed assembly instructions, CAD models, URDFs, simulation, and learning codes. We highly recommend viewing the supplementary video at https://sites.google.com/view/im2-humanoid-arm.

Submitted 24 February, 2025; originally announced February 2025.

arXiv:2502.16384 [pdf, other]

A muon tagging with Flash ADC waveform baselines

Authors: D. H. Lee, M. K. Cheoun, J. H. Choi, J. Y. Choi, T. Dodo, J. Goh, K. Haga, M. Harada, S. Hasegawa, W. Hwang, T. Iida, H. I. Jang, J. S. Jang, K. K. Joo, D. E. Jung, S. K. Kang, Y. Kasugai, T. Kawasaki, E. M. Kim, S. B. Kim, S. Y. Kim, H. Kinoshita, T. Konno, C. Little, T. Maruyama, et al. (32 additional authors not shown)

Abstract: This manuscript describes an innovative method to tag muons using the baseline information of the Flash ADC (FADC) waveform of PMTs in the JSNS$^2$ (J-PARC Sterile Neutrino Search at J-PARC Spallation Neutron Source) experiment. This experiment is designed to search for sterile neutrinos, and muon tagging is an essential component of the background rejection since the detector is located above ground, on the 3rd floor of the J-PARC Material and Life experimental facility (MLF). In particular, muons stopping inside the detector create Michel electrons, which are an important background to be rejected. Utilizing this method, more than 99.8% of Michel electrons can be rejected even without a detector veto region. This technique can be employed in any experiment that uses a similar detector configuration.

Submitted 22 February, 2025; originally announced February 2025.

Comments: 7 pages, 6 figures

arXiv:2502.14067 [pdf, other]

Towards a global phase diagram of Ce-based dipolar-octupolar pyrochlore magnets under magnetic fields

Authors: Zhengbang Zhou, Yong Baek Kim

Abstract: Recent experiments have established a strong case for Ce$_2$(Zr, Sn, Hf)$_2$O$_7$ to host $π$-flux quantum spin ice (QSI). However, an irrefutable conclusion still requires strong, multifaceted evidence. In dipolar-octupolar (DO) compounds, external magnetic fields only strongly couple with the dipolar component $τ_z$ along its local z-axis in contrast to octupolar components $τ^{x,y}$. This gives rise to the unique ways magnetic fields interact with the system and, in turn, provides us with a variety of tuning knobs to generate comprehensive experimental results. In this work, we focus on magnetic fields along the (110), (111), and (001) directions and present a plethora of remarkable experimental signatures to probe the underlying physics of $π$-flux QSI using gauge mean field theory (GMFT) and Monte Carlo simulations. In particular, we present unique signatures in magnetic field-dependent phase diagrams, equal-time and dynamical structure factors, and magnetostriction.

Submitted 14 March, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

Comments: Main text: 7 pages, 4 figures; Supplemental material: 15 pages, 8 figures

arXiv:2502.11789 [pdf, other]

Personality Editing for Language Models through Relevant Knowledge Editing

Authors: Seojin Hwang, Yumin Kim, Byeongjeong Kim, Hwanhee Lee

Abstract: Large Language Models (LLMs) play a vital role in applications like conversational agents and content creation, where controlling a model's personality is crucial for maintaining tone, consistency, and engagement. However, traditional prompt-based techniques for controlling personality often fall short, as they do not effectively mitigate the model's inherent biases. In this paper, we introduce a novel method PALETTE that enhances personality control through knowledge editing. By generating adjustment queries inspired by psychological assessments, our approach systematically adjusts responses to personality-related queries similar to modifying factual knowledge, thereby achieving controlled shifts in personality traits. Experimental results from both automatic and human evaluations demonstrate that our method enables more stable and well-balanced personality control in LLMs.

Submitted 17 February, 2025; originally announced February 2025.

Comments: 15 pages, 3 figures, 16 tables

arXiv:2502.11438 [pdf, other]

SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL

Authors: Jimin Lee, Ingeol Baek, Byeongjeong Kim, Hwanhee Lee

Abstract: Text-to-SQL aims to convert natural language questions into executable SQL queries. While previous approaches, such as skeleton-masked selection, have demonstrated strong performance by retrieving similar training examples to guide large language models (LLMs), they struggle in real-world scenarios where such examples are unavailable. To overcome this limitation, we propose Self-Augmentation in-context learning with Fine-grained Example selection for Text-to-SQL (SAFE-SQL), a novel framework that improves SQL generation by generating and filtering self-augmented examples. SAFE-SQL first prompts an LLM to generate multiple Text-to-SQL examples relevant to the test input. Then SAFE-SQL filters these examples through three relevance assessments, constructing high-quality in-context learning examples. Using self-generated examples, SAFE-SQL surpasses previous zero-shot and few-shot Text-to-SQL frameworks, achieving higher execution accuracy. Notably, our approach provides additional performance gains in extra hard and unseen scenarios, where conventional methods often fail.

Submitted 16 February, 2025; originally announced February 2025.

Comments: 13 pages, 5 figures, 10 tables

arXiv:2502.09878 [pdf]

doi 10.1038/s41467-025-58024-w

Superconductivity and a van Hove singularity confined to the surface of a topological semimetal

Authors: Md Shafayat Hossain, Rajibul Islam, Zi-Jia Cheng, Zahir Muhammad, Qi Zhang, Zurab Guguchia, Jonas A. Krieger, Brian Casas, Yu-Xiao Jiang, Maksim Litskevich, Xian P. Yang, Byunghoon Kim, Tyler A. Cochran, Ilias E. Perakis, Fei Xue, Mehdi Kargarian, Weisheng Zhao, Luis Balicas, M. Zahid Hasan

Abstract: The interplay between electronic topology and superconductivity is the subject of great current interest in condensed matter physics. For example, superconductivity induced on the surface of topological insulators is predicted to be triplet in nature, while the interplay between electronic correlations and topology may lead to unconventional superconductivity as in twisted bilayer graphene. Here, we unveil an unconventional two-dimensional superconducting state in the recently discovered Dirac nodal line semimetal ZrAs2 which is exclusively confined to the top and bottom surfaces within the crystal's ab plane. As a remarkable consequence of this emergent state, we observe a Berezinskii-Kosterlitz-Thouless (BKT) transition, the hallmark of two-dimensional superconductivity. Notably, this is the first observation of a BKT transition on the surface of a three-dimensional system. Furthermore, employing angle-resolved photoemission spectroscopy and first-principles calculations, we find that these same surfaces also host a two-dimensional van Hove singularity near the Fermi energy. The proximity of van Hove singularity to the Fermi level leads to enhanced electronic correlations contributing to the stabilization of superconductivity at the surface of ZrAs2, a unique phenomenon among topological semimetals. The surface-confined nature of the van Hove singularity, and associated superconductivity, realized for the first time, opens new avenues to explore the interplay between low-dimensional quantum topology, correlations, and superconductivity in a bulk material without resorting to the superconducting proximity effect.

Submitted 13 February, 2025; originally announced February 2025.

Comments: in press

Journal ref: Nature Communications volume 16, Article number: 3998 (2025)

arXiv:2502.08537 [pdf]

doi 10.1038/s41467-025-58262-y

Broken symmetries associated with a Kagome chiral charge order

Authors: Zi-Jia Cheng, Md Shafayat Hossain, Qi Zhang, Sen Shao, Jinjin Liu, Yilin Zhao, Mohammad Yahyavi, Yu-Xiao Jiang, Jia-Xin Yin, Xian Yang, Yongkai Li, Tyler A. Cochran, Maksim Litskevich, Byunghoon Kim, Junyi Zhang, Yugui Yao, Luis Balicas, Zhiwei Wang, Guoqing Chang, M. Zahid Hasan

Abstract: Chirality or handedness manifests in all fields of science, ranging from cell biology, molecular interaction, and catalysis to different branches of physics. In condensed matter physics, chirality is intrinsic to enigmatic quantum phases, such as chiral charge density waves and chiral superconductivity. Here, the underlying chiral response is subtle and leads to broken symmetries in the ground state. Detection of subtle broken symmetries is the key to understanding these quantum states, but they are extremely challenging to expose, leading to debate and controversy. Here, using second-order optical response, we uncover the broken symmetries of a chiral charge density wave in the Kagome lattice KV3Sb5, revealing the relevant broken symmetries of its charge order. KV3Sb5 undergoes a phase transition to a charge-ordered state at low temperatures. Our polarization-dependent mid-infrared photocurrent microscopy reveals an intrinsic, longitudinal helicity-dependent photocurrent associated with the charge order. Our measurements, supported by our theoretical analysis, provide direct evidence for broken inversion and mirror symmetries at the charge order transition, indicating a chiral charge ordered state. On the other hand, we do not observe a circular photogalvanic effect along the direction perpendicular to that of the incident light, imposing stringent constraints on the rotational and point group symmetries of the charge order. Our study not only visualizes the chiral nature of the Kagome charge order, revealing its broken symmetries, but also highlights the nonlinear photogalvanic effect as a sensitive probe for detecting subtle symmetry breakings.

Submitted 12 February, 2025; originally announced February 2025.

Comments: in press

Journal ref: Nature Communications volume 16, Article number: 3782 (2025)

arXiv:2502.07586 [pdf, other]

We Can't Understand AI Using our Existing Vocabulary

Authors: John Hewitt, Robert Geirhos, Been Kim

Abstract: This position paper argues that, in order to understand AI, we cannot rely on our existing vocabulary of human words. Instead, we should strive to develop neologisms: new words that represent precise human concepts that we want to teach machines, or machine concepts that we need to learn. We start from the premise that humans and machines have differing concepts. This means interpretability can be framed as a communication problem: humans must be able to reference and control machine concepts, and communicate human concepts to machines. Creating a shared human-machine language through developing neologisms, we believe, could solve this communication problem. Successful neologisms achieve a useful amount of abstraction: not too detailed, so they're reusable in many contexts, and not too high-level, so they convey precise information. As a proof of concept, we demonstrate how a "length neologism" enables controlling LLM response length, while a "diversity neologism" allows sampling more variable responses. Taken together, we argue that we cannot understand AI using our existing vocabulary, and expanding it through neologisms creates opportunities for both controlling and understanding machines better.

Submitted 11 February, 2025; originally announced February 2025.

Comments: Position paper

arXiv:2502.06516 [pdf, other]

Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation

Authors: Soobin Um, Beomsu Kim, Jong Chul Ye

Abstract: Minority samples are underrepresented instances located in low-density regions of a data manifold, and are valuable in many generative AI applications, such as data augmentation and creative content generation. Unfortunately, existing diffusion-based minority generators often rely on computationally expensive guidance dedicated to minority generation. To address this, here we present a simple yet powerful guidance-free approach called Boost-and-Skip for generating minority samples using diffusion models. The key advantage of our framework is that it requires only two minimal changes to standard generative processes: (i) variance-boosted initialization and (ii) timestep skipping. We highlight that these seemingly trivial modifications are supported by solid theoretical and empirical evidence, thereby effectively promoting emergence of underrepresented minority features. Our comprehensive experiments demonstrate that Boost-and-Skip greatly enhances the capability of generating minority samples, even rivaling guidance-based state-of-the-art approaches while requiring significantly fewer computations.

Submitted 10 February, 2025; originally announced February 2025.

Comments: 29 pages, 11 figures

arXiv:2502.04892 [pdf, other]

A Foundational Brain Dynamics Model via Stochastic Optimal Control

Authors: Joonhyeong Park, Byoungwoo Park, Chang-Bae Bang, Jungwon Choi, Hyungjin Chung, Byung-Hoon Kim, Juho Lee

Abstract: We introduce a foundational model for brain dynamics that utilizes stochastic optimal control (SOC) and amortized inference. Our method features a continuous-discrete state space model (SSM) that can robustly handle the intricate and noisy nature of fMRI signals. To address computational limitations, we implement an approximation strategy grounded in the SOC framework. Additionally, we present a simulation-free latent dynamics approach that employs locally linear approximations, facilitating efficient and scalable inference. For effective representation learning, we derive an Evidence Lower Bound (ELBO) from the SOC formulation, which integrates smoothly with recent advancements in self-supervised learning (SSL), thereby promoting robust and transferable representations. Pre-trained on extensive datasets such as the UKB, our model attains state-of-the-art results across a variety of downstream tasks, including demographic prediction, trait analysis, disease diagnosis, and prognosis. Moreover, evaluating on external datasets such as HCP-A, ABIDE, and ADHD200 further validates its superior abilities and resilience across different demographic and clinical distributions. Our foundational model provides a scalable and efficient approach for deciphering brain dynamics, opening up numerous applications in neuroscience.

Submitted 7 February, 2025; originally announced February 2025.

Comments: The first two authors contributed equally

arXiv:2502.04363 [pdf, other]

On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices

Authors: Bosung Kim, Kyuhwan Lee, Isu Jeong, Jungmin Cheon, Yeojin Lee, Seulki Lee

Abstract: We present On-device Sora, the first model training-free solution for diffusion-based on-device text-to-video generation that operates efficiently on smartphone-grade devices. To address the challenges of diffusion-based text-to-video generation on computation- and memory-limited mobile devices, the proposed On-device Sora applies three novel techniques to pre-trained video generative models. First, Linear Proportional Leap (LPL) reduces the excessive denoising steps required in video diffusion through an efficient leap-based approach. Second, Temporal Dimension Token Merging (TDTM) minimizes intensive token-processing computation in attention layers by merging consecutive tokens along the temporal dimension. Third, Concurrent Inference with Dynamic Loading (CI-DL) dynamically partitions large models into smaller blocks and loads them into memory for concurrent model inference, effectively addressing the challenges of limited device memory. We implement On-device Sora on the iPhone 15 Pro, and the experimental evaluations show that it is capable of generating high-quality videos on the device, comparable to those produced by high-end GPUs. These results show that On-device Sora enables efficient and high-quality video generation on resource-constrained mobile devices. We envision the proposed On-device Sora as a significant first step toward democratizing state-of-the-art generative technologies, enabling video generation on commodity mobile and embedded devices without resource-intensive re-training for model optimization (compression). The code implementation is available at a GitHub repository (https://github.com/eai-lab/On-device-Sora).

Submitted 31 March, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

arXiv:2502.04074 [pdf, other]

3D Prior is All You Need: Cross-Task Few-shot 2D Gaze Estimation

Authors: Yihua Cheng, Hengfei Wang, Zhongqun Zhang, Yang Yue, Bo Eun Kim, Feng Lu, Hyung Jin Chang

Abstract: 3D and 2D gaze estimation share the fundamental objective of capturing eye movements but are traditionally treated as two distinct research domains. In this paper, we introduce a novel cross-task few-shot 2D gaze estimation approach, aiming to adapt a pre-trained 3D gaze estimation network for 2D gaze prediction on unseen devices using only a few training images. This task is highly challenging due to the domain gap between 3D and 2D gaze, unknown screen poses, and limited training data. To address these challenges, we propose a novel framework that bridges the gap between 3D and 2D gaze. Our framework contains a physics-based differentiable projection module with learnable parameters to model screen poses and project 3D gaze into 2D gaze. The framework is fully differentiable and can integrate into existing 3D gaze networks without modifying their original architecture. Additionally, we introduce a dynamic pseudo-labelling strategy for flipped images, which is particularly challenging for 2D labels due to unknown screen poses. To overcome this, we reverse the projection process by converting 2D labels to 3D space, where flipping is performed. Notably, this 3D space is not aligned with the camera coordinate system, so we learn a dynamic transformation matrix to compensate for this misalignment. We evaluate our method on MPIIGaze, EVE, and GazeCapture datasets, collected respectively on laptops, desktop computers, and mobile devices. The superior performance highlights the effectiveness of our approach, and demonstrates its strong potential for real-world applications.

Submitted 24 March, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

Comments: CVPR 2025

arXiv:2502.03966 [pdf, other]

MultiFloodSynth: Multi-Annotated Flood Synthetic Dataset Generation

Authors: YoonJe Kang, Yonghoon Jung, Wonseop Shin, Bumsoo Kim, Sanghyun Seo

Abstract: In this paper, we present a synthetic data generation framework for flood hazard detection systems. For high fidelity and quality, we transfer several real-world properties into a virtual world and simulate the flood situation by controlling them. For the sake of efficiency, recent generative models in image-to-3D and urban city synthesis are leveraged to easily composite flood environments, so that we avoid the data bias introduced by hand-crafted construction. Based on our framework, we build a synthetic flood dataset with 5 levels, dubbed MultiFloodSynth, which contains rich annotation types such as normal maps, segmentation, and 3D bounding boxes for a variety of downstream tasks. In experiments, our dataset demonstrates enhanced flood hazard detection performance, with realism on par with a real dataset.

Submitted 13 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

Comments: 6 pages, 6 figures. Accepted as Oral Presentation to AAAI 2025 Workshop on Good-Data

arXiv:2502.03468 [pdf]

AI Governance in the Context of the EU AI Act: A Bibliometric and Literature Review Approach

Authors: Byeong-Je Kim, Seunghoo Jeong, Bong-Kyung Cho, Ji-Bum Chung

Abstract: The rapid advancement of artificial intelligence (AI) has brought about significant societal changes, necessitating robust AI governance frameworks. This study analyzed the research trends in AI governance within the framework of the EU AI Act. This study conducted a bibliometric analysis to examine the publications indexed in the Web of Science database. Our findings reveal that research on AI governance, particularly concerning AI systems regulated by the EU AI Act, remains relatively limited compared to the broader AI research landscape. Nonetheless, a growing interdisciplinary interest in AI governance is evident, with notable contributions from multi-disciplinary journals and open-access publications. Dominant research themes include ethical considerations, privacy concerns, and the growing impact of generative AI, such as ChatGPT. Notably, education, healthcare, and worker management are prominent application domains. Keyword network analysis highlights education, ethics, and ChatGPT as central keywords, underscoring the importance of these areas in current AI governance research. Subsequently, a comprehensive literature review was undertaken based on the bibliometric analysis findings to identify research trends, challenges, and insights within the categories of the EU AI Act. The findings provide valuable insights for researchers and policymakers, informing future research directions and contributing to developing comprehensive AI governance frameworks beyond the EU AI Act.

Submitted 8 January, 2025; originally announced February 2025.

Comments: 16 pages, 3 figures, 9 tables, submitted to IEEE Access

arXiv:2502.02732 [pdf, other]

Peri-LN: Revisiting Layer Normalization in the Transformer Architecture

Authors: Jeonghoon Kim, Byeongchan Lee, Cheonbok Park, Yeontaek Oh, Beomjun Kim, Taehwan Yoo, Seongjin Shin, Dongyoon Han, Jinwoo Shin, Kang Min Yoo

Abstract: Designing Transformer architectures with the optimal layer normalization (LN) strategy that ensures large-scale training stability and expedites convergence has remained elusive, even in this era of large language models (LLMs). To this end, we present a comprehensive analytical foundation for understanding how different LN strategies influence training dynamics in large-scale Transformer training. Until recently, Pre-LN and Post-LN have long dominated standard practices despite their limitations in large-scale training. However, several open-source large-scale models have recently begun silently adopting a third strategy without much explanation. This strategy places layer normalization (LN) peripherally around sublayers, a design we term Peri-LN. While Peri-LN has demonstrated promising empirical performance, its precise mechanisms and benefits remain almost unexplored. Our in-depth analysis shows that Peri-LN strikes an ideal balance in variance growth -- unlike Pre-LN and Post-LN, which are prone to vanishing gradients and "massive activations." To validate our theoretical insight, we conduct large-scale experiments on Transformers up to 3.2B parameters, showing that Peri-LN consistently achieves more balanced variance growth, steadier gradient flow, and convergence stability. Our results suggest that Peri-LN warrants broader consideration for large-scale Transformer architectures, providing renewed insights into the optimal placement and application of LN.

Submitted 6 February, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

Comments: Preprint
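
A minimal sketch of the three LN placements named in the Peri-LN abstract above, written in plain Python on a single feature vector. The abstract only says that Peri-LN places LN "peripherally around sublayers"; reading that as normalizing both the sublayer input and the sublayer output before the residual add is an assumption here, and the helper names (`layer_norm`, `pre_ln`, `post_ln`, `peri_ln`) and the omission of learnable gain and bias are illustrative choices, not the authors' code.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and (approximately) unit variance.
    # Learnable gain/bias are omitted to keep the sketch short.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def pre_ln(x, sublayer):
    # Pre-LN: normalize the sublayer input only, then add the residual.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_ln(x, sublayer):
    # Post-LN: add the residual first, then normalize the sum.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def peri_ln(x, sublayer):
    # Peri-LN (assumed form): normalize both the input and the output
    # of the sublayer, then add the residual.
    branch = layer_norm(sublayer(layer_norm(x)))
    return [a + b for a, b in zip(x, branch)]
```

With a sublayer that amplifies its input strongly, the Pre-LN residual branch grows with the sublayer's scale, while the assumed Peri-LN branch stays at roughly unit scale; this is only a toy illustration of the variance-growth point the abstract makes, not a reproduction of its analysis.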

arXiv:2502.02640 [pdf, other]

Dynamics and lifetime of geometric excitations in moiré systems

Authors: Yuzhu Wang, Joe Huxford, Dung Xuan Nguyen, Guangyue Ji, Yong Baek Kim, Bo Yang

Abstract: We show that spin-2 geometric excitations, known as graviton modes, generally exhibit vanishing lifetimes in lattice Chern bands, including in moiré systems. In contrast to the Landau levels, we first numerically demonstrate that the prominent graviton peaks in spectral functions diminish rapidly with increasing system sizes. We explore how the choice of interaction affects the strength of these peaks, with short-ranged interactions pushing the graviton mode far into the continuum of excitations, where it can be significantly scattered due to the increased density of states. We also analytically investigate the short lifetime of the graviton mode. In lattice systems, continuous rotational symmetry is broken, leading to highly anisotropic gapped excitations that mix different angular momentum or "spins". This is despite the surprising emergence of a "guiding center" continuous rotational symmetry in the ground state, which is shared by the graviton mode. Consequently, the graviton mode in Chern bands can be strongly scattered by the anisotropic gapped excitations. However, the emergent rotational symmetry implies that gravitons can be robust in principle, and we propose experimental tuning strategies to lower the graviton mode energy below the continuum. We argue this is a necessary condition enabling the observation of graviton modes and geometric excitations in realistic moiré systems.

Submitted 26 March, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

Comments: 10 pages, 6 figures, comments very welcome

arXiv:2502.01070 [pdf, other]

An Inquiry into Datacenter TCO for LLM Inference with FP8

Authors: Jiwoo Kim, Joonhyung Lee, Gunho Park, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee, Youngjoo Lee

Abstract: As large language models (LLMs) continue to scale, their inference demands present significant challenges, particularly due to the high power consumption of AI accelerators in datacenters. These facilities require specialized cooling and power management systems, substantially increasing the total cost of ownership (TCO) for cloud service providers (CSPs). In this work, we analyze the computational characteristics and constraints of LLM inference from a TCO perspective, focusing on two representative accelerators: the Gaudi 2 and NVIDIA H100. We present a generalizable framework that enables CSPs to compare and select AI accelerators according to diverse operational requirements. Using this model, we analyze the impact of FP8 precision and LLM inference workload characteristics as key factors influencing TCO. We investigate FP8 quantization, which is gaining adoption in LLM training, as a technique to improve inference throughput while maintaining cost efficiency. Furthermore, our analysis of LLM inference workloads reveals that performance on thin GEMMs, which dominate the decode phase, can have a greater impact than theoretical hardware peak performance. By studying the interaction between power consumption, quantization strategies, and hardware architecture, we offer insights that support informed deployment decisions and guide future accelerator designs to improve the TCO of LLM inference.

Submitted 29 April, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

arXiv:2501.17683 [pdf, other]

Temperature-Free Loss Function for Contrastive Learning

Authors: Bum Jun Kim, Sang Woo Kim

Abstract: As one of the most promising methods in self-supervised learning, contrastive learning has achieved a series of breakthroughs across numerous fields. A predominant approach to implementing contrastive learning is applying InfoNCE loss: By capturing the similarities between pairs, InfoNCE loss enables learning the representation of data. Albeit its success, adopting InfoNCE loss requires tuning a temperature, which is a core hyperparameter for calibrating similarity scores. Despite its significance and sensitivity to performance being emphasized by several studies, searching for a valid temperature requires extensive trial-and-error-based experiments, which increases the difficulty of adopting InfoNCE loss. To address this difficulty, we propose a novel method to deploy InfoNCE loss without temperature. Specifically, we replace temperature scaling with the inverse hyperbolic tangent function, resulting in a modified InfoNCE loss. In addition to hyperparameter-free deployment, we observed that the proposed method even yielded a performance gain in contrastive learning. Our detailed theoretical analysis discovers that the current practice of temperature scaling in InfoNCE loss causes serious problems in gradient descent, whereas our method provides desirable gradient properties. The proposed method was validated on five benchmarks on contrastive learning, yielding satisfactory results without temperature tuning.

Submitted 29 January, 2025; originally announced January 2025.

Comments: 10 pages, 5 figures
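
The idea in the abstract above, replacing temperature scaling of the similarity score with the inverse hyperbolic tangent, can be sketched for a single anchor as follows. This is a hedged illustration rather than the authors' implementation: the function names, the clamping constant `eps`, and the unit default for `scale` are assumptions introduced here, and the exact form of the modified loss in the paper may differ.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def atanh_logit(cos_sim, scale=1.0, eps=1e-6):
    # Map a cosine similarity in [-1, 1] to an unbounded logit with atanh
    # instead of dividing by a temperature. The clamp avoids the poles at
    # +/-1; `scale` and `eps` are hypothetical knobs for this sketch.
    clamped = max(-1.0 + eps, min(1.0 - eps, cos_sim))
    return scale * math.atanh(clamped)

def info_nce(anchor, positive, negatives, logit_fn=atanh_logit):
    # InfoNCE for one anchor: cross-entropy of the positive pair against
    # the positive plus all negatives, computed with a stable log-sum-exp.
    logits = [logit_fn(cosine(anchor, positive))]
    logits += [logit_fn(cosine(anchor, n)) for n in negatives]
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]
```

Passing `logit_fn=lambda c: c / 0.1` instead gives the usual temperature-scaled variant, which makes it easy to compare the two mappings on the same inputs.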

arXiv:2501.16482 [pdf, other]

Hybrid Hadronization -- A Study of In-Medium Hadronization of Jets

Authors: A. Sengupta, R. J. Fries, M. Kordell II, B. Kim, A. Angerami, R. Arora, S. A. Bass, Y. Chen, R. Datta, L. Du, R. Ehlers, H. Elfner, C. Gale, Y. He, B. V. Jacak, P. M. Jacobs, S. Jeon, Y. Ji, F. Jonas, L. Kasper, A. Kumar, R. Kunnawalkam-Elayavalli, J. Latessa, Y.-J. Lee, R. Lemmon, et al. (28 additional authors not shown)

Abstract: QCD jets are considered important probes for quark gluon plasma created in collisions of nuclei at high energies. Their parton showers are significantly altered if they develop inside of a deconfined medium. Hadronization of jets is also thought to be affected by the presence of quarks and gluons. We present a systematic study of the effects of a thermal bath of partons on the hadronization of parton showers. We use the JETSCAPE framework to create parton showers both in vacuum and in a brick of quark gluon plasma. The brick setup allows important parameters, like the size of the plasma as well as the collective flow of partons, to be varied systematically. We hadronize the parton showers using Hybrid Hadronization, which permits shower partons to form strings with thermal partons, or to recombine directly with thermal partons as well as with each other. We find a sizeable amount of interaction of shower partons with thermal partons during hadronization, indicating a natural continuation of the interaction of jet and medium during this stage. The observed effects grow with the size of the medium. Collective flow easily transfers from the thermal partons onto the emerging jet hadrons. We also see a significant change in hadron chemistry as expected in the presence of quark recombination processes.

Submitted 27 January, 2025; originally announced January 2025.

Comments: 12 pages, 6 figures

arXiv:2501.15076 [pdf, other]

Cryptanalysis via Machine Learning Based Information Theoretic Metrics

Authors: Benjamin D. Kim, Vipindev Adat Vasudevan, Rafael G. L. D'Oliveira, Alejandro Cohen, Thomas Stahlbuhk, Muriel Médard

Abstract: The fields of machine learning (ML) and cryptanalysis share an interestingly common objective of creating a function, based on a given set of inputs and outputs. However, the approaches and methods in doing so vary vastly between the two fields. In this paper, we explore integrating the knowledge from the ML domain to provide empirical evaluations of cryptosystems. Particularly, we utilize information theoretic metrics to perform ML-based distribution estimation. We propose two novel applications of ML algorithms that can be applied in a known plaintext setting to perform cryptanalysis on any cryptosystem. We use mutual information neural estimation to calculate a cryptosystem's mutual information leakage, and a binary cross entropy classification to model an indistinguishability under chosen plaintext attack (CPA). These algorithms can be readily applied in an audit setting to evaluate the robustness of a cryptosystem and the results can provide a useful empirical bound. We evaluate the efficacy of our methodologies by empirically analyzing several encryption schemes. Furthermore, we extend the analysis to novel network coding-based cryptosystems and provide other use cases for our algorithms. We show that our classification model correctly identifies the encryption schemes that are not IND-CPA secure, such as DES, RSA, and AES ECB, with high accuracy. It also identifies the faults in CPA-secure cryptosystems with faulty parameters, such as a reduced counter version of AES-CTR. We also conclude that with our algorithms, in most cases a smaller-sized neural network using less computing power can identify vulnerabilities in cryptosystems, providing a quick check of the sanity of the cryptosystem and helping to decide whether to spend more resources to deploy larger networks that are able to break the cryptosystem.

Submitted 24 January, 2025; originally announced January 2025.
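
The abstract above notes that deterministic schemes such as AES ECB are not IND-CPA secure. The following is a minimal sketch of the underlying reason, not the paper's neural classifier: a hand-written distinguisher that looks for repeated ciphertext blocks. The keyed function `toy_block` (HMAC-SHA256 truncated to 16 bytes) is an assumption standing in for a block cipher so that the example needs only the standard library; it is not invertible and not a real cipher, and all function names here are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes


def toy_block(key: bytes, block: bytes) -> bytes:
    """Stand-in keyed function (truncated HMAC-SHA256); NOT a real block cipher."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ecb_like(key: bytes, plaintext: bytes) -> bytes:
    """Process each block independently and deterministically, as ECB does."""
    assert len(plaintext) % BLOCK == 0
    return b"".join(
        toy_block(key, plaintext[i:i + BLOCK])
        for i in range(0, len(plaintext), BLOCK)
    )


def ctr_like(key: bytes, plaintext: bytes, nonce: bytes) -> bytes:
    """XOR each block with a pad derived from nonce || counter, as CTR does."""
    assert len(nonce) == 8
    out = bytearray()
    for i in range(0, len(plaintext), BLOCK):
        counter = nonce + (i // BLOCK).to_bytes(8, "big")
        pad = toy_block(key, counter)
        out.extend(p ^ k for p, k in zip(plaintext[i:i + BLOCK], pad))
    return bytes(out)


def has_repeated_block(ciphertext: bytes) -> bool:
    """True if any two ciphertext blocks are identical."""
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    return len(set(blocks)) < len(blocks)


def distinguish(ciphertext: bytes) -> int:
    """Guess 0 if the ciphertext has a repeated block, else 1.

    The adversary submits m0 (two equal blocks) and m1 (two different blocks).
    """
    return 0 if has_repeated_block(ciphertext) else 1
```

Against the ECB-like mode this distinguisher identifies which of the two chosen messages was encrypted every time; against the CTR-like mode repeated plaintext blocks no longer show up as repeated ciphertext blocks, so this particular test gives no advantage.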

arXiv:2501.14013 [pdf, other]

Leveraging Multiphase CT for Quality Enhancement of Portal Venous CT: Utility for Pancreas Segmentation

Authors: Xinya Wang, Tejas Sudharshan Mathai, Boah Kim, Ronald M. Summers

Abstract: Multiphase CT studies are routinely obtained in clinical practice for diagnosis and management of various diseases, such as cancer. However, the CT studies can be acquired with low radiation doses, different scanners, and are frequently affected by motion and metal artifacts. Prior approaches have targeted the quality improvement of one specific CT phase (e.g., non-contrast CT). In this work, we hypothesized that leveraging multiple CT phases for the quality enhancement of one phase may prove advantageous for downstream tasks, such as segmentation. A 3D progressive fusion and non-local (PFNL) network was developed. It was trained with three degraded (low-quality) phases (non-contrast, arterial, and portal venous) to enhance the quality of the portal venous phase. Then, the effect of scan quality enhancement was evaluated using a proxy task of pancreas segmentation, which is useful for tracking pancreatic cancer. The proposed approach improved the pancreas segmentation by 3% over the corresponding low-quality CT scan. To the best of our knowledge, we are the first to harness multiphase CT for scan quality enhancement and improved pancreas segmentation.

Submitted 23 January, 2025; originally announced January 2025.

Comments: ISBI 2025

MSC Class: 92C55 ACM Class: I.4.6

arXiv:2501.13665 [pdf, other]

Limits on WIMP dark matter with NaI(Tl) crystals in three years of COSINE-100 data

Authors: G. H. Yu, N. Carlin, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. Franca, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim, Y. J. Ko, D. H. Lee, et al. (34 additional authors not shown)

Abstract: We report limits on WIMP dark matter derived from three years of data collected by the COSINE-100 experiment with NaI(Tl) crystals, achieving an improved energy threshold of 0.7 keV. This lowered threshold enhances sensitivity in the sub-GeV mass range, extending the reach for direct detection of low-mass dark matter. Although no excess of WIMP-like events was observed, the increased sensitivity enabled a model-independent comparison between the expected WIMP signal rate (based on mass limits from our data) and DAMA's reported modulation amplitude. Our findings strongly disfavor the DAMA signal as originating from WIMP interactions, fully excluding DAMA/LIBRA 3$σ$ allowed regions and providing enhanced WIMP mass limits by an order of magnitude in the spin-independent model compared to previous results. In the spin-dependent model, cross-section upper limits were obtained in the mass range [0.1-5.0] GeV/c$^2$, with additional sensitivity to sub-GeV WIMPs through the inclusion of the Migdal effect. These results represent substantial progress in low-mass dark matter exploration and reinforce constraints on the longstanding DAMA claim.

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.13260 [pdf]

Field induced density wave in a kagome superconductor

Authors: Md Shafayat Hossain, Qi Zhang, Julian Ingham, Jinjin Liu, Sen Shao, Yangmu Li, Yuxin Wang, Bal K. Pokharel, Zi-Jia Cheng, Yu-Xiao Jiang, Maksim Litskevich, Byunghoon Kim, Xian Yang, Yongkai Li, Tyler A. Cochran, Yugui Yao, Dragana Popović, Zhiwei Wang, Guoqing Chang, Ronny Thomale, Luis Balicas, M. Zahid Hasan

Abstract: On the kagome lattice, electrons benefit from the simultaneous presence of band topology, flat electronic bands, and van Hove singularities, forming competing or cooperating orders. Understanding the interrelation between these distinct order parameters remains a significant challenge, leaving much of the associated physics unexplored. In the kagome superconductor KV3Sb5, which exhibits a charge density wave (CDW) state below T = 78 K, we uncover an unpredicted field-induced phase transition below 6 K. The observed transition is marked by a hysteretic anomaly in the resistivity, nonlinear electrical transport, and a change in the symmetry of the electronic response as probed via the angular dependence of the magnetoresistivity. These observations surprisingly suggest the emergence of an unanticipated broken symmetry state coexisting with the original CDW. To understand this experimental observation, we developed a theoretical minimal model for the normal state inside the high-temperature parent CDW phase where an incommensurate CDW order emerges as an instability sub-leading to superconductivity. The incommensurate CDW emerges when superconducting fluctuations become fully suppressed by large magnetic fields. Our results suggest that, in kagome superconductors, quantum states can either coexist or be nearly degenerate in energy, indicating that these are rich platforms to expose new correlated phenomena.

Submitted 22 January, 2025; originally announced January 2025.

arXiv:2501.11698 [pdf, other]

AstroPix: A Pixelated HVCMOS Sensor for Space-Based Gamma-Ray Measurement

Authors: Amanda L. Steinhebel, Regina Caputo, Daniel P. Violette, Anthony Affolder, Autumn Bauman, Carolyn Chinatti, Aware Deshmukh, Vitaliy Fadayev, Yasushi Fukazawa, Manoj Jadhav, Carolyn Kierans, Bobae Kim, Jihee Kim, Henry Klest, Olivia Kroger, Kavic Kumar, Shin Kushima, Jean-Marie Lauenstein, Richard Leys, Forest Martinez-Mckinney, Jessica Metcalfe, Zachary Metzler, John W. Mitchell, Norito Nakano, Jennifer Ott, et al. (11 additional authors not shown)

Abstract: A next-generation medium-energy gamma-ray telescope targeting the MeV range would address open questions in astrophysics regarding how extreme conditions accelerate cosmic-ray particles, produce relativistic jet outflows, and more. One concept, AMEGO-X, relies upon the mission-enabling CMOS Monolithic Active Pixel Sensor silicon chip AstroPix. AstroPix is designed for space-based use, featuring low noise, low power consumption, and high scalability. Desired performance of the device includes an energy resolution of 5 keV (or 10% FWHM) at 122 keV and a dynamic range per-pixel of 25-700 keV, enabled by the addition of a high-voltage bias to each pixel which supports a depletion depth of 500 μm. This work reports on the status of the AstroPix development process with emphasis on the current version under test, version three (v3), and highlights of version two (v2). Version 3 achieves energy resolution of 10.4 ± 3.2% at 59.5 keV and 94 ± 6 μm depletion in a low-resistivity test silicon substrate.

Submitted 20 January, 2025; originally announced January 2025.

Comments: 20 pages, 13 figures

arXiv:2501.09993 [pdf, other]

Agent-as-Judge for Factual Summarization of Long Narratives

Authors: Yeonseok Jeong, Minsoo Kim, Seung-won Hwang, Byung-Hak Kim

Abstract: Large Language Models (LLMs) have demonstrated near-human performance in summarization tasks based on traditional metrics such as ROUGE and BERTScore. However, these metrics do not adequately capture critical aspects of summarization quality, such as factual accuracy, particularly for long narratives (>100K tokens). Recent advances, such as LLM-as-a-Judge, address the limitations of metrics based on lexical similarity but still exhibit factual inconsistencies, especially in understanding character relationships and states. In this work, we introduce NarrativeFactScore, a novel "Agent-as-a-Judge" framework for evaluating and refining summaries. By leveraging a Character Knowledge Graph (CKG) extracted from input and generated summaries, NarrativeFactScore assesses the factual consistency and provides actionable guidance for refinement, such as identifying missing or erroneous facts. We demonstrate the effectiveness of NarrativeFactScore through a detailed workflow illustration and extensive validation on widely adopted benchmarks, achieving superior performance compared to competitive methods. Our results highlight the potential of agent-driven evaluation systems to improve the factual reliability of LLM-generated summaries.

Submitted 17 January, 2025; originally announced January 2025.

arXiv:2501.07653 [pdf, ps, other]

Large Language Models for Interpretable Mental Health Diagnosis

Authors: Brian Hyeongseok Kim, Chao Wang

Abstract: We propose a clinical decision support system (CDSS) for mental health diagnosis that combines the strengths of large language models (LLMs) and constraint logic programming (CLP). Having a CDSS is important because of the high complexity of diagnostic manuals used by mental health professionals and the danger of diagnostic errors. Our CDSS is a software tool that uses an LLM to translate diagnostic manuals to a logic program and solves the program using an off-the-shelf CLP engine to query a patient's diagnosis based on the encoded rules and provided data. By giving domain experts the opportunity to inspect the LLM-generated logic program and make modifications when needed, our CDSS ensures that the diagnosis is not only accurate but also interpretable. We experimentally compare it with two baseline approaches of using LLMs: diagnosing patients using the LLM-only approach, and using the LLM-generated logic program but without expert inspection. The results show that, while LLMs are extremely useful in generating candidate logic programs, these programs still require expert inspection and modification to guarantee faithfulness to the official diagnostic manuals. Additionally, ethical concerns arise from the direct use of patient data in LLMs, underscoring the need for a safer hybrid approach like our proposed method.

Submitted 21 February, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

Comments: Accepted at AAAI 2025 Workshop on Large Language Models and Generative AI for Health (GenAI4Health)

arXiv:2501.05980 [pdf]

doi 10.1038/s41467-025-56919-2

Tunable superconductivity coexisting with the anomalous Hall effect in 1T'-WS2

Authors: Md Shafayat Hossain, Qi Zhang, David Graf, Mikel Iraola, Tobias Müller, Sougata Mardanya, Yi-Hsin Tu, Zhuangchai Lai, Martina O. Soldini, Siyuan Li, Yao Yao, Yu-Xiao Jiang, Zi-Jia Cheng, Maksim Litskevich, Brian Casas, Tyler A. Cochran, Xian P. Yang, Byunghoon Kim, Kenji Watanabe, Takashi Taniguchi, Sugata Chowdhury, Arun Bansil, Hua Zhang, Tay-Rong Chang, Mark Fischer, et al. (3 additional authors not shown)

Abstract: Transition metal dichalcogenides are a family of quasi-two-dimensional materials that display a high technological potential due to their wide range of electronic ground states, e.g., from superconducting to semiconducting, depending on the chemical composition, crystal structure, or electrostatic doping. Here, we unveil that by tuning a single parameter, the hydrostatic pressure P, a cascade of electronic phase transitions can be induced in the few-layer transition metal dichalcogenide 1T'-WS2, including superconducting, topological, and anomalous Hall effect phases. Specifically, as P increases, we observe a dual phase transition: the suppression of superconductivity with the concomitant emergence of an anomalous Hall effect at P=1.15 GPa. Remarkably, upon further increasing the pressure above 1.6 GPa, we uncover a reentrant superconducting state that emerges out of a state still exhibiting an anomalous Hall effect. This superconducting state shows a marked increase in superconducting anisotropy with respect to the phase observed at ambient pressure, suggesting a different superconducting state with a distinct pairing symmetry. Via first-principles calculations, we demonstrate that the system concomitantly transitions into a strong topological phase with markedly different band orbital characters and Fermi surfaces contributing to the superconductivity. These findings position 1T'-WS2 as a unique, tunable superconductor, wherein superconductivity, anomalous transport, and band features can be tuned through the application of moderate pressures.

Submitted 10 January, 2025; originally announced January 2025.

Journal ref: Nature Communications volume 16, Article number: 2399 (2025)

arXiv:2501.04896 [pdf, other]

Quantifying Itch and its Impact on Sleep Using Machine Learning and Radio Signals

Authors: Michail Ouroutzoglou, Mingmin Zhao, Joshua Hellerstein, Hariharan Rahul, Asima Badic, Brian S. Kim, Dina Katabi

Abstract: Chronic itch affects 13% of the US population, is highly debilitating, and underlies many medical conditions. A major challenge in clinical care and new therapeutics development is the lack of an objective measure for quantifying itch, leading to reliance on subjective measures like patients' self-assessment of itch severity. In this paper, we show that a home radio device paired with artificial intelligence (AI) can concurrently capture scratching and evaluate its impact on sleep quality by analyzing radio signals bouncing in the environment. The device eliminates the need for wearable sensors or skin contact, enabling monitoring of chronic itch over extended periods at home without burdening patients or interfering with their skin condition. To validate the technology, we conducted an observational clinical study of chronic pruritus patients, monitored at home for one month using both the radio device and an infrared camera. Comparing the output of the device to ground truth data from the camera demonstrates its feasibility and accuracy (ROC AUC = 0.997, sensitivity = 0.825, specificity = 0.997). The results reveal a significant correlation between scratching and low sleep quality, manifested as a reduction in sleep efficiency (R = 0.6, p < 0.001) and an increase in sleep latency (R = 0.68, p < 0.001). Our study underscores the potential of passive, long-term, at-home monitoring of chronic scratching and its sleep implications, offering a valuable tool for both clinical care of chronic itch patients and pharmaceutical clinical trials.

Submitted 8 January, 2025; originally announced January 2025.
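
For readers unfamiliar with the detection metrics quoted in the abstract above, here is a minimal sketch of how sensitivity and specificity follow from confusion-matrix counts. The counts are hypothetical, chosen only so that the ratios match the reported 0.825 and 0.997; the abstract does not give the study's actual counts.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual scratching intervals that the detector flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-scratching intervals that the detector leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not taken from the study).
example_counts = {"tp": 825, "fn": 175, "tn": 9970, "fp": 30}

example_sensitivity = sensitivity(example_counts["tp"], example_counts["fn"])
example_specificity = specificity(example_counts["tn"], example_counts["fp"])
```

With scratching being rare relative to non-scratching time, a high specificity matters as much as sensitivity: even a small false-positive rate applied to many negative intervals can outnumber the true detections.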

arXiv:2501.04428 [pdf, other]

Hypersonic acoustic wave control via hyperuniform phononic nanostructures

Authors: Michele Diego, Jade Hardouin, Gabrielle Mazevet-Schargrod, Matteo Pirro, Byunggi Kim, Roman Anufriev, Masahiro Nomura

Abstract: Controlling hypersonic surface acoustic waves is crucial for advanced phononic devices such as high-frequency filters, sensors, and quantum computing components. While periodic phononic crystals enable precise bandgap engineering, their ability to suppress acoustic waves is limited to specific frequency ranges. Here, we experimentally demonstrate the control of surface acoustic waves using a hyperuniform arrangement of gold nanopillars on a lithium niobate layer. The hyperuniform structure exhibits characteristics of both random and ordered systems, leading to an overall reduction in acoustic transmission and the formation of bandgap-like regions where phonon propagation is strongly suppressed. We further demonstrate effective waveguiding by incorporating linear and S-shaped waveguides into the hyperuniform pattern. Both simulations and experiments confirm high transmission through the waveguides at frequencies within the bandgaps, demonstrating the flexibility of hyperuniform structures to support waveguides of complex shapes. These findings provide a novel approach to overcoming the limitations of traditional phononic crystals and advancing acoustic technologies in applications such as mechanical quantum computing and smartphone filters.

Submitted 8 January, 2025; originally announced January 2025.

Comments: 5 pages, 3 figures

arXiv:2501.04284 [pdf, other]

ContextMRI: Enhancing Compressed Sensing MRI through Metadata Conditioning

Authors: Hyungjin Chung, Dohun Lee, Zihui Wu, Byung-Hoon Kim, Katherine L. Bouman, Jong Chul Ye

Abstract: Compressed sensing MRI seeks to accelerate MRI acquisition processes by sampling fewer k-space measurements and then reconstructing the missing data algorithmically. The success of these approaches often relies on strong priors or learned statistical models. While recent diffusion model-based priors have shown great potential, previous methods typically ignore clinically available metadata (e.g. patient demographics, imaging parameters, slice-specific information). In practice, metadata contains meaningful cues about the anatomy and acquisition protocol, suggesting it could further constrain the reconstruction problem. In this work, we propose ContextMRI, a text-conditioned diffusion model for MRI that integrates granular metadata into the reconstruction process. We train a pixel-space diffusion model directly on minimally processed, complex-valued MRI images. During inference, metadata is converted into a structured text prompt and fed to the model via CLIP text embeddings. By conditioning the prior on metadata, we unlock more accurate reconstructions and show consistent gains across multiple datasets, acceleration factors, and undersampling patterns. Our experiments demonstrate that increasing the fidelity of metadata, ranging from slice location and contrast to patient age, sex, and pathology, systematically boosts reconstruction performance. This work highlights the untapped potential of leveraging clinical context for inverse problems and opens a new direction for metadata-driven MRI reconstruction.

Submitted 8 January, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

Comments: 29 pages, 9 figures. Code is available at https://github.com/DoHunLee1/ContextMRI
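
The abstract above says that metadata is converted into a structured text prompt before being embedded. The sketch below illustrates one way such a conversion could look. The field names, the `key: value` template, and the function name `metadata_to_prompt` are all assumptions for illustration; the actual prompt format is not given in the abstract and should be taken from the linked repository.

```python
def metadata_to_prompt(metadata: dict) -> str:
    """Join available metadata fields into one comma-separated prompt string.

    Fields whose value is None are skipped, so the prompt degrades gracefully
    when only part of the metadata is known. (Hypothetical template.)
    """
    parts = [
        f"{key.replace('_', ' ')}: {value}"
        for key, value in metadata.items()
        if value is not None
    ]
    return ", ".join(parts)


# Hypothetical record; field names are illustrative, not the dataset's schema.
example_prompt = metadata_to_prompt(
    {"anatomy": "brain", "contrast": "T2", "slice_index": 12, "pathology": None}
)
```

Skipping missing fields rather than writing a placeholder keeps the prompt short and makes it easy to compare reconstructions as more metadata is supplied, which mirrors the abstract's observation that richer metadata improves results.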

arXiv:2501.02070 [pdf]

doi 10.1038/s41535-025-00725-y

Magnetoelectric effect in van der Waals magnets

Authors: Kai-Xuan Zhang, Giung Park, Youjin Lee, Beom Hyun Kim, Je-Geun Park

Abstract: The magnetoelectric (ME) effect is a fundamental concept in modern condensed matter physics and represents the electrical control of magnetic polarisations or vice versa. Two-dimensional (2D) van-der-Waals (vdW) magnets have emerged as a new class of materials and exhibit novel ME effects with diverse manifestations. This review emphasizes some important recent discoveries unique to vdW magnets: multiferroicity in two dimensions, spin-charge correlation, atomic ME effect and current-induced intrinsic spin-orbit torque, and electrical gating control and magnetic control of their electronic properties. We also highlight the promising route of utilizing quantum magnetic hetero- or homo-structures to engineer the ME effect and corresponding spintronic and optoelectronic device applications. Due to the intrinsic two-dimensionality, vdW magnets with those ME effects are expected to form a new, exciting research direction.

Submitted 7 January, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

Comments: Accepted by npj Quantum Materials; 27 pages, 6 main figures

Journal ref: npj Quantum Materials 10, 6 (2025)

arXiv:2501.01594 [pdf, other]

PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents

Authors: Jingoo Lee, Kyungho Lim, Young-Chul Jung, Byung-Hoon Kim

Abstract: Recent advances in large language models (LLMs) have accelerated the development of conversational agents capable of generating human-like responses. Since psychiatric assessments typically involve complex conversational interactions between psychiatrists and patients, there is growing interest in developing LLM-based psychiatric assessment conversational agents (PACAs) that aim to simulate the role of psychiatrists in clinical evaluations. However, standardized methods for benchmarking the clinical appropriateness of PACAs' interaction with patients remain underexplored. Here, we propose PSYCHE, a novel framework designed to enable the 1) clinically relevant, 2) ethically safe, 3) cost-efficient, and 4) quantitative evaluation of PACAs. This is achieved by simulating psychiatric patients based on a multi-faceted psychiatric construct that defines the simulated patients' profiles, histories, and behaviors, which PACAs are expected to assess. We validate the effectiveness of PSYCHE through a study with 10 board-certified psychiatrists, supported by an in-depth analysis of the simulated patient utterances.

Submitted 2 January, 2025; originally announced January 2025.

Comments: The first two authors contributed equally

arXiv:2501.00183 [pdf, ps, other]

Parabolic Lipschitz truncation for multi-phase problems: the degenerate case

Authors: Bogi Kim, Jehan Oh, Abhrojyoti Sen

Abstract: This article is devoted to exploring the Lipschitz truncation method for parabolic multi-phase problems. The method is based on Whitney decomposition and covering lemmas with a delicate comparison scheme of appropriate alternatives to distinguish phases, as introduced by the first and the second author in [24].

Submitted 13 April, 2025; v1 submitted 30 December, 2024; originally announced January 2025.

Comments: 39 pages. Revised according to reviewer's comments. To appear in Advances in Calculus of Variations

arXiv:2412.20048 [pdf, other]

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation

Authors: Ji-Hoon Kim, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

Abstract: The goal of this work is to generate natural speech in multiple languages while maintaining the same speaker identity, a task known as cross-lingual speech synthesis. A key challenge of cross-lingual speech synthesis is the language-speaker entanglement problem, which causes the quality of cross-lingual systems to lag behind that of intra-lingual systems. In this paper, we propose CrossSpeech++, which effectively disentangles language and speaker information and significantly improves the quality of cross-lingual speech synthesis. To this end, we break the complex speech generation pipeline into two simple components: language-dependent and speaker-dependent generators. The language-dependent generator produces linguistic variations that are not biased by specific speaker attributes. The speaker-dependent generator models acoustic variations that characterize speaker identity. By handling each type of information in separate modules, our method can effectively disentangle language and speaker representation. We conduct extensive experiments using various metrics, and demonstrate that CrossSpeech++ achieves significant improvements in cross-lingual speech synthesis, outperforming existing methods by a large margin.

Submitted 28 December, 2024; originally announced December 2024.

arXiv:2412.19129 [pdf, other]

doi 10.1002/adem.202401561

The effect of grain boundary misorientation on hydrogen flux using a phase-field based diffusion and trapping model

Authors: Abdelrahman Hussein, Byungki Kim, Kim Verbeken, Tom Depover

Abstract: Understanding hydrogen-grain boundary (GB) interactions is critical to the analysis of hydrogen embrittlement in metals. This work presents a mesoscale fully kinetic model to investigate the effect of GB misorientation on hydrogen diffusion and trapping using phase-field based representative volume elements (RVEs). The flux equation consists of three terms: a diffusive term and two terms for high and low angle grain boundary (H/LAGB) trapping. Uptake simulations showed that decreasing the grain size resulted in higher hydrogen content due to increasing the GB density. Permeation simulations showed that GBs are high flux paths due to their higher enrichment with hydrogen. Since HAGBs have higher enrichment than LAGBs, due to their higher trap-binding energy, they generally have the highest hydrogen flux. Nevertheless, the flux shows a convoluted behavior as it depends on the local concentration, alignment of GB with external concentration gradient as well as the GB network connectivity. Finally, decreasing the grain size resulted in a larger break-through time and a larger steady-state exit flux.

Submitted 30 December, 2024; v1 submitted 26 December, 2024; originally announced December 2024.

arXiv:2412.18974 [pdf, other]

doi 10.1016/j.ijhydene.2023.11.270

Modeling the effect of grain boundary diffusivity and trapping on hydrogen transport using a phase-field compatible formulation

Authors: Abdelrahman Hussein, Byungki Kim, Tom Depover, Kim Verbeken

Abstract: Hydrogen grain boundary (GB) trapping is widely accepted as the main cause for hydrogen induced intergranular failure. Several studies were conducted to unveil the role of GBs on hydrogen transport; however, a clear understanding is yet to be attained. This is due to the limitations of the state-of-the-art experimental procedures for such highly kinetic processes. In this study, we aim at providing a deeper understanding of hydrogen-GB interactions using full-field representative volume elements (RVEs). The phase-field method is chosen for generating RVEs, since it is an appropriate numerical tool to represent GBs. A novel fully-kinetic formulation for hydrogen diffusion and GB trapping is presented, which is compatible with the phase-field based RVEs. GB diffusivity ($D_\mathrm{gb}$) and trap-binding energy ($E_\mathrm{gb}$) were used as parameters to understand the interactions between diffusion and GB trapping. Uptake and permeation simulations were performed with constant and gradient occupancy boundary conditions respectively. In both cases, increasing $E_\mathrm{gb}$ increased the hydrogen GB occupancy. The permeation simulations showed that the hydrogen flux along the GBs increased with increasing both $D_\mathrm{gb}$ and, surprisingly, $E_\mathrm{gb}$. Since trapping increases the hydrogen occupancy along GBs, it also increases the occupancy gradients, resulting in a higher flux. This led to the conclusion that, in the case of an external occupancy gradient, GB trapping and diffusion cooperate, rather than compete, to increase the hydrogen flux. On the other hand, the decisive factor for the retention of hydrogen at the GBs in permeation simulations was $D_\mathrm{gb}$ rather than $E_\mathrm{gb}$.

Submitted 25 December, 2024; originally announced December 2024.

arXiv:2412.18711 [pdf, other]

Measurement of reactor antineutrino oscillation amplitude and frequency using 3800 days of complete data sample of the RENO experiment

Authors: S. Jeon, H. I. Kim, J. H. Choi, H. I. Jang, J. S. Jang, K. K. Joo, D. E. Jung, J. G. Kim, J. H. Kim, J. Y. Kim, S. B. Kim, S. Y. Kim, W. Kim, E. Kwon, D. H. Lee, H. G. Lee, W. J. Lee, I. T. Lim, D. H. Moon, M. Y. Pac, J. S. Park, R. G. Park, H. Seo, J. W. Seo, C. D. Shin, et al. (5 additional authors not shown)

Abstract: We report an updated neutrino mixing angle of $θ_{13}$ obtained from a complete data sample of the RENO experiment. The experiment has measured the amplitude and frequency of reactor anti-electron-neutrino ($\barν_{e}$) oscillations at the Hanbit nuclear power plant, Younggwang, Korea, since August 2011. As of March 2023, the data acquisition was completed after a total of 3800 live days of detector operation. The observed candidates via inverse beta decay (IBD) are 1,211,995 (144,667) in the near (far) detector. Based on an observed energy-dependent reactor neutrino disappearance, neutrino oscillation parameters of $θ_{13}$ and $\lvertΔm_{ee}^2\rvert$ are precisely determined as $\sin^{2}2θ_{13}=0.0920_{-0.0042}^{+0.0044}(\text{stat.})_{-0.0041}^{+0.0041}(\text{syst.})$ and $\lvertΔm_{ee}^2\rvert=\left[2.57_{-0.11}^{+0.10}(\text{stat.})_{-0.05}^{+0.05}(\text{syst.})\right]\times10^{-3}~\text{eV}^{2}$. Compared to the previous RENO results published in Phys. Rev. Lett. 121, 201801, the precision is improved from 7.5% to 6.4% for $\sin^{2}2θ_{13}$ and from 5.2% to 4.5% for $\lvertΔm_{ee}^2\rvert$. The statistical error of the measurement has reached our goal and is hardly improved with additional data-taking.

Submitted 24 December, 2024; originally announced December 2024.

Comments: 13 pages, 11 figures

arXiv:2412.18509 [pdf, other]

The first JSNS$^2$ measurement of electron neutrino flux using $^{12}C(ν_{e},e^{-}) ^{12}N_{g.s.}$ reaction

Authors: T. Dodo, M. K. Cheoun, J. H. Choi, J. Y. Choi, J. Goh, K. Haga, M. Harada, S. Hasegawa, W. Hwang, H. I. Jang, J. S. Jang, K. K. Joo, D. E. Jung, S. K. Kang, Y. Kasugai, T. Kawasaki, E. M. Kim, S. Y. Kim, S. B. Kim, H. Kinoshita, T. Konno, D. H. Lee, C. Little, T. Maruyama, E. Marzec, et al. (26 additional authors not shown)

Abstract: JSNS$^2$ (J-PARC Sterile Neutrino Search at J-PARC Spallation Neutron Source) is an experiment searching for sterile neutrinos through the observation of $\barν_μ \rightarrow \barν_e$ appearance oscillations, using neutrinos produced by muon decay-at-rest. A key aspect of the experiment involves accurately understanding the neutrino flux and the quantities of pions and muons, which are progenitors of (anti-)neutrinos, given that their production rates have yet to be measured. We present the first electron-neutrino flux measurement using the $^{12}\mathrm{C}(ν_{e},e^{-}) ^{12}\mathrm{N}_{g.s.}$ reaction in JSNS$^2$, yielding a flux of (6.7 $\pm$ 1.6 (stat.) $\pm$ 1.7 (syst.)) $\times$ 10$^{-9}$ cm$^{-2}$ proton$^{-1}$ at the JSNS$^2$ detector location, 24 meters from the mercury target. This flux measurement is consistent with predictions from simulations based on hadron models.

Submitted 24 December, 2024; originally announced December 2024.

Showing 51–100 of 2,289 results for author: Kim, B