Search | arXiv e-print repository

arXiv:2505.20065 [pdf, ps, other]

SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety

Authors: Geon-Hyeong Kim, Youngsoo Jang, Yu Jin Kim, Byoungjip Kim, Honglak Lee, Kyunghoon Bae, Moontae Lee

Abstract: As Large Language Models (LLMs) continue to advance and find applications across a growing number of fields, ensuring the safety of LLMs has become increasingly critical. To address safety concerns, recent studies have proposed integrating safety constraints into Reinforcement Learning from Human Feedback (RLHF). However, these approaches tend to be complex, as they encompass complicated procedures in RLHF along with additional steps required by the safety constraints. Inspired by Direct Preference Optimization (DPO), we introduce a new algorithm called SafeDPO, which is designed to directly optimize the safety alignment objective in a single stage of policy learning, without requiring relaxation. SafeDPO introduces only one additional hyperparameter to further enhance safety and requires only minor modifications to standard DPO. As a result, it eliminates the need to fit separate reward and cost models or to sample from the language model during fine-tuning, while still enhancing the safety of LLMs. Finally, we demonstrate that SafeDPO achieves competitive performance compared to state-of-the-art safety alignment algorithms, both in terms of aligning with human preferences and improving safety.

Submitted 26 May, 2025; originally announced May 2025.

Comments: 34 pages

arXiv:2505.09705 [pdf, other]

Search for a dark Higgs boson produced in association with inelastic dark matter at the Belle II experiment

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, M. Angelsmark, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, N. K. Baghel, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, et al. (415 additional authors not shown)

Abstract: Inelastic dark matter models that have two dark matter particles and a massive dark photon can reproduce the observed relic dark matter density without violating cosmological limits. The mass splitting between the two dark matter particles $χ_{1}$ and $χ_{2}$, with $m(χ_{2}) > m(χ_{1})$, is induced by a dark Higgs field and a corresponding dark Higgs boson $h^{\prime}$. We present a search for dark matter in events with two vertices, at least one of which must be displaced from the interaction region, and missing energy. Using a $365\,\mbox{fb}^{-1}$ data sample collected at Belle II, which operates at the SuperKEKB $e^+e^-$ collider, we observe no evidence for a signal. We set upper limits on the product of the production cross section $σ\left(e^+e^- \to h^\prime χ_1 χ_2\right)$, and the product of branching fractions $\mathcal{B}\left(χ_2\toχ_1 e^+ e^-\right)\times\mathcal{B}\left(h^\prime\to x^+x^-\right)$, where $x^+x^-$ indicates $μ^+μ^-, π^+π^-$, or $K^+K^-$, as functions of $h^{\prime}$ mass and lifetime at the level of $10^{-1}\,\mbox{fb}$. We set model-dependent upper limits on the dark Higgs mixing angle at the level of $10^{-5}$ and on the dark photon kinetic mixing parameter at the level of $10^{-3}$. This is the first search for dark Higgs bosons in association with inelastic dark matter.

Submitted 14 May, 2025; originally announced May 2025.

Comments: Submitted for publication with Physical Review Letters

Report number: Belle II Preprint 2025-015, KEK Preprint 2025-14

arXiv:2505.00765 [pdf, other]

doi 10.1364/OL.551624

Experimental and on-sky demonstration of spectrally dispersed wavefront sensing using a photonic lantern

Authors: Jonathan Lin, Michael P. Fitzgerald, Yinzi Xin, Yoo Jung Kim, Olivier Guyon, Barnaby Norris, Christopher Betters, Sergio Leon-Saval, Kyohoon Ahn, Vincent Deo, Julien Lozi, Sébastien Vievard, Daniel Levinstein, Steph Sallum, Nemanja Jovanovic

Abstract: Adaptive optics systems are critical in any application where highly resolved imaging or beam control must be performed through a dynamic medium. Such applications include astronomy and free-space optical communications, where light propagates through the atmosphere, as well as medical microscopy and vision science, where light propagates through biological tissue. Recent works have demonstrated common-path wavefront sensors for adaptive optics using the photonic lantern, a slowly varying waveguide that can efficiently couple multi-moded light into single-mode fibers. We use the SCExAO astrophotonics platform at the 8-m Subaru Telescope to show that spectral dispersion of lantern outputs can improve correction fidelity, culminating with an on-sky demonstration of real-time wavefront control. To the best of our knowledge, this is the first such result for either a spectrally dispersed or a photonic lantern wavefront sensor. Combined with the benefits offered by lanterns in precision spectroscopy, our results suggest the future possibility of a unified wavefront sensing spectrograph using compact photonic devices.

Submitted 1 May, 2025; originally announced May 2025.

Journal ref: Opt. Lett. 50, 2780-2783 (2025)

arXiv:2504.21233 [pdf, other]

Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Authors: Haoran Xu, Baolin Peng, Hany Awadalla, Dongdong Chen, Yen-Chun Chen, Mei Gao, Young Jin Kim, Yunsheng Li, Liliang Ren, Yelong Shen, Shuohang Wang, Weijian Xu, Jianfeng Gao, Weizhu Chen

Abstract: Chain-of-Thought (CoT) significantly enhances formal reasoning capabilities in Large Language Models (LLMs) by training them to explicitly generate intermediate reasoning steps. While LLMs readily benefit from such techniques, improving reasoning in Small Language Models (SLMs) remains challenging due to their limited model capacity. Recent work on DeepSeek-R1 demonstrates that distillation from LLM-generated synthetic data can substantially improve the reasoning ability of SLMs. However, the detailed modeling recipe is not disclosed. In this work, we present a systematic training recipe for SLMs that consists of four steps: (1) large-scale mid-training on diverse distilled long-CoT data, (2) supervised fine-tuning on high-quality long-CoT data, (3) Rollout DPO leveraging a carefully curated preference dataset, and (4) Reinforcement Learning (RL) with Verifiable Reward. We apply our method to Phi-4-Mini, a compact 3.8B-parameter model. The resulting Phi-4-Mini-Reasoning model exceeds, on math reasoning tasks, much larger reasoning models, e.g., outperforming DeepSeek-R1-Distill-Qwen-7B by 3.2 points and DeepSeek-R1-Distill-Llama-8B by 7.7 points on Math-500. Our results validate that a carefully designed training recipe, with large-scale high-quality CoT data, is effective at unlocking strong reasoning capabilities even in resource-constrained small models.

Submitted 29 April, 2025; originally announced April 2025.

arXiv:2504.15745 [pdf, other]

Search for lepton-flavor-violating $τ^- \to \ell^- K_s^0$ decays at Belle and Belle II

Authors: Belle and Belle II Collaborations: I. Adachi, L. Aggarwal, H. Ahmed, Y. Ahn, H. Aihara, N. Akopov, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, M. Angelsmark, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, N. K. Baghel, S. Bahinipati, et al. (397 additional authors not shown)

Abstract: We present the results of a search for charged-lepton-flavor violating decays $τ^{-} \rightarrow \ell^{-}K_{S}^{0}$, where $\ell^{-}$ is either an electron or a muon. We combine $e^+e^-$ data samples recorded by the Belle II experiment at the SuperKEKB collider (428 fb$^{-1}$) with samples recorded by the Belle experiment at the KEKB collider (980 fb$^{-1}$) to obtain a sample of 1.3 billion $e^+e^-\toτ^+τ^-$ events. We observe 0 and 1 events and set $90\%$ confidence level upper limits of $0.8 \times 10^{-8}$ and $1.2 \times 10^{-8}$ on the branching fractions of the decay modes $τ^{-} \rightarrow e^{-}K_{S}^{0}$ and $τ^{-} \rightarrow μ^{-}K_{S}^{0}$, respectively. These are the most stringent upper limits to date.

Submitted 22 April, 2025; originally announced April 2025.

arXiv:2503.24292 [pdf, other]

Implicit Electric Field Conjugation with the Photonic Lantern Nuller

Authors: Yinzi Xin, Daniel Echeverri, Nemanja Jovanovic, Jonathan Lin, Yoo Jung Kim, Dimitri Mawet, Sergio Leon-Saval, Rodrigo Amezcua-Correa, Stephanos Yerolatsitis, Michael P. Fitzgerald, Pradip Gatkine, Suvinay Goyal, Barnaby Norris, Garreth Ruane, Steph Sallum

Abstract: The Photonic Lantern Nuller (PLN) is an instrument concept designed to characterize exoplanets within a single beam-width from its host star. The PLN leverages the spatial symmetry of a mode-selective photonic lantern (MSPL) to create nulled ports, which cancel out on-axis starlight but allow off-axis exoplanet light to couple. The null-depths are limited by wavefront aberrations in the system as well as by imperfections in the lantern. We show that the implicit electric field conjugation algorithm can be used to reduce the stellar coupling through the PLN by orders of magnitude while maintaining the majority of the off-axis light, leading to deeper null depths (~10^{-4}) and thus higher sensitivity to potential planet signals. We discuss a theory for the tradeoff we observed between the different ports, where iEFC improves the nulls of some ports at the expense of others, and show that targeting one port alone can lead to deeper starlight rejection through that port than when targeting all ports at once. We also observe different levels of stability depending on the port and discuss the implications for practically implementing this technique for science observations.

Submitted 10 April, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

Comments: Accepted at JATIS. 23 pages, 7 figures

arXiv:2503.17643 [pdf, other]

Measurements of the branching fractions of $Ξ_{c}^{+}\to Σ^{+}K_{S}^{0}$, $Ξ_{c}^{+}\to Ξ^{0}π^{+}$, and $Ξ_{c}^{+}\to Ξ^{0}K^{+}$ at Belle and Belle II

Authors: Belle and Belle II Collaborations: I. Adachi, J. K. Ahn, Y. Ahn, N. Akopov, S. Alghamdi, M. Alhakami, N. Althubiti, K. Amos, N. Anh Ky, C. Antonioli, D. M. Asner, M. Aversano, R. Ayad, V. Babu, N. K. Baghel, P. Bambade, Sw. Banerjee, M. Barrett, M. Bartl, J. Baudot, A. Beaubien, F. Becherer, et al. (335 additional authors not shown)

Abstract: Using 983.0 $\rm{fb}^{-1}$ and 427.9 $\rm{fb}^{-1}$ data samples collected with the Belle and Belle II detectors at the KEKB and SuperKEKB asymmetric energy $e^+e^-$ colliders, respectively, we present studies of the Cabibbo-favored $Ξ_c^+$ decays ${Ξ_{c}^{+}\to Σ^{+}K_{S}^{0}}$ and $Ξ_{c}^{+}\to Ξ^{0}π^{+}$, and the singly Cabibbo-suppressed decay $Ξ_{c}^{+}\to Ξ^{0}K^{+}$. The ratios of branching fractions of ${Ξ_{c}^{+}\to Σ^{+}K_{S}^{0}}$ and $Ξ_{c}^{+}\to Ξ^{0}K^{+}$ relative to that of $Ξ_{c}^{+}\toΞ^{-}π^{+}π^{+}$ are measured for the first time, while the ratio ${\cal B}(Ξ_{c}^{+}\toΞ^{0}π^{+})/{\cal B}(Ξ_{c}^{+}\toΞ^{-}π^{+}π^{+}) $ is also determined and improved by an order of magnitude in precision. The measured branching fraction ratios are $\frac{\cal{B}(Ξ_{c}^{+} \to Σ^{+}K_{S}^{0})}{\cal{B}(Ξ_{c}^{+}\to Ξ^{-}π^{+}π^+)}= 0.067 \pm 0.007 \pm 0.003$, $\frac{\cal{B}(Ξ_c^{+} \to Ξ^{0}π^{+})}{\cal{B}(Ξ_{c}^{+}\to Ξ^{-}π^{+}π^+)} = 0.248 \pm 0.005 \pm 0.009$, $\frac{\cal{B}(Ξ_c^{+} \to Ξ^{0}K^{+})}{\cal{B}(Ξ_{c}^{+}\to Ξ^{-}π^{+}π^+)} = 0.017 \pm 0.003 \pm 0.001$. Additionally, the ratio ${\cal B}(Ξ_{c}^{+}\toΞ^{0}K^{+})/{\cal B}(Ξ_{c}^{+}\toΞ^{0}π^{+})$ is measured to be $ 0.068 \pm 0.010 \pm 0.004$. Here, the first and second uncertainties are statistical and systematic, respectively. Multiplying the ratios by the branching fraction of the normalization mode, ${\mathcal B}(Ξ_{c}^{+}\toΞ^{-}π^{+}π^+)= (2.9\pm 1.3)\%$, we obtain the following absolute branching fractions ${\cal B}(Ξ_{c}^{+}\toΣ^{+}K^{0}_{S}) = (0.194 \pm 0.021 \pm 0.009 \pm 0.087 )\%$, ${\cal B}(Ξ_{c}^{+}\toΞ^{0}π^{+}) = (0.719 \pm 0.014 \pm 0.024 \pm 0.322 )\%$, ${\cal B}(Ξ_{c}^{+}\toΞ^{0}K^{+}) = (0.049 \pm 0.007 \pm 0.002 \pm 0.022 )\%$.
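The absolute branching fractions in this abstract come from multiplying each measured ratio by the branching fraction of the normalization mode. The following is a minimal sketch of that arithmetic, not the collaboration's analysis code. The variable and function names are illustrative. It recomputes only the central values and the uncertainty inherited from the normalization mode (the third quoted uncertainty), because the statistical and systematic terms were presumably propagated from unrounded inputs that the abstract does not give.

```python
# Values copied from the abstract; all names here are illustrative.
NORM_BF = 2.9       # B(Xi_c+ -> Xi- pi+ pi+), in percent
NORM_BF_ERR = 1.3   # uncertainty on the normalization mode, in percent

# Measured ratios relative to the normalization mode (central values only).
ratios = {
    "Sigma+ K_S0": 0.067,
    "Xi0 pi+": 0.248,
    "Xi0 K+": 0.017,
}


def absolute_bf(ratio):
    """Return (central value, normalization-induced uncertainty), in percent."""
    return ratio * NORM_BF, ratio * NORM_BF_ERR


results = {mode: absolute_bf(r) for mode, r in ratios.items()}

for mode, (central, norm_err) in results.items():
    print(f"{mode}: {central:.3f}% (normalization uncertainty {norm_err:.3f}%)")
```

The normalization term dominates every result because the normalization mode itself is known only to about 45% relative precision (1.3 / 2.9).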

Submitted 22 March, 2025; originally announced March 2025.

Comments: 20 pages, 4 figures, 3 Tables

Report number: Belle II Preprint 2025-005; KEK Preprint 2025-2

arXiv:2503.05202 [pdf, other]

Bridging between reheating and late-time observations in quintessential inflation

Authors: Ok Song An, Jin U Kang, Yong Jin Kim, Ui Ri Mun

Abstract: We propose an idea to build a bridge between reheating and late-time observations in quintessential inflation by backtracking the evolution of the inflaton field from the present time to the end of reheating. This idea is implemented when the potential gradient is negligible compared to the Hubble friction, rendering the inflaton field frozen, till the present time. We find a simple analytic relation between the reheating temperature and the observational parameters for dark energy, and numerically confirm its validity for typical models of quintessential inflation. This relation is universal and can apply to all quintessential inflation models with any reheating mechanism. It also implies that any quintessential inflation model with a successful reheating with the reheating temperature $1\textrm{MeV}\lesssim T_\textrm{re}\lesssim 10^{15}\textrm{GeV}$ predicts the equation of state of dark energy today extremely close to $-1$, i.e. $-1+10^{-60}\lesssim w_0\lesssim -1+10^{-24}$, unless the inflaton field unfreezes before the present time.

Submitted 7 March, 2025; originally announced March 2025.

Comments: 27 pages, 10 figures

arXiv:2503.04371 [pdf, other]

Measurement of the Branching Fraction of $Λ_c^+ \to p K_S^0 π^0$ at Belle

Authors: The Belle and Belle II Collaborations: I. Adachi, L. Aggarwal, H. Ahmed, J. K. Ahn, H. Aihara, N. Akopov, M. Alhakami, A. Aloisio, N. Althubiti, M. Angelsmark, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, N. K. Baghel, S. Bahinipati, P. Bambade, et al. (404 additional authors not shown)

Abstract: We report a precise measurement of the ratio of branching fractions $\mathcal{B}(Λ_c^+\to p K_S^0 π^0)/\mathcal{B}(Λ_c^+\to p K^- π^+)$ using 980 fb$^{-1}$ of $e^+e^-$ data from the Belle experiment. We obtain a value of $\mathcal{B}(Λ_c^+\to p K_S^0 π^0)/\mathcal{B}(Λ_c^+\to p K^- π^+)=0.339\pm 0.002\pm 0.009$, where the first and second uncertainties are statistical and systematic, respectively. This Belle result is consistent with the previous measurement from the CLEO experiment but has a fivefold improvement in precision. By combining our result with the world average $\mathcal{B}(Λ_c^+\to p K^- π^+)$, we obtain the absolute branching fraction $\mathcal{B}(Λ_c^+\to p K_S^0 π^0)=(2.12\pm 0.01\pm 0.05 \pm 0.10)\%$, where the uncertainties are statistical, systematic, and the uncertainty in the absolute branching fraction scale $\mathcal{B}(Λ_c^+\to p K^- π^+)$, respectively. This measurement can shed light on hadronic decay mechanisms in charmed baryon decays.

Submitted 18 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

Comments: 20 pages, 7 figures

Report number: Belle II Preprint: 2024-022, KEK preprint: 2024-20

arXiv:2503.01743 [pdf, other]

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Authors: Microsoft: Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, Jianmin Bao, Alon Benhaim, Martin Cai, Vishrav Chaudhary, Congcong Chen, Dong Chen, Dongdong Chen, Junkun Chen, Weizhu Chen, Yen-Chun Chen, Yi-ling Chen, Qi Dai, Xiyang Dai, Ruchao Fan, Mei Gao, Min Gao, Amit Garg, Abhishek Goswami, et al. (51 additional authors not shown)

Abstract: We introduce Phi-4-Mini and Phi-4-Multimodal, compact yet highly capable language and multimodal models. Phi-4-Mini is a 3.8-billion-parameter language model trained on high-quality web and synthetic data, significantly outperforming recent open-source models of similar size and matching the performance of models twice its size on math and coding tasks requiring complex reasoning. This achievement is driven by a carefully curated synthetic data recipe emphasizing high-quality math and coding datasets. Compared to its predecessor, Phi-3.5-Mini, Phi-4-Mini features an expanded vocabulary size of 200K tokens to better support multilingual applications, as well as group query attention for more efficient long-sequence generation. Phi-4-Multimodal is a multimodal model that integrates text, vision, and speech/audio input modalities into a single model. Its novel modality extension approach leverages LoRA adapters and modality-specific routers to allow multiple inference modes combining various modalities without interference. For example, it now ranks first in the OpenASR leaderboard to date, although the LoRA component of the speech/audio modality has just 460 million parameters. Phi-4-Multimodal supports scenarios involving (vision + language), (vision + speech), and (speech/audio) inputs, outperforming larger vision-language and speech-language models on a wide range of tasks. Additionally, we experiment to further train Phi-4-Mini to enhance its reasoning capabilities. Despite its compact 3.8-billion-parameter size, this experimental version achieves reasoning performance on par with or surpassing significantly larger models, including DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B.

Submitted 7 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

Comments: 39 pages

arXiv:2502.09283 [pdf, other]

Rate-Splitting Multiple Access for 6G: Prototypes, Experimental Results and Link/System level Simulations

Authors: Sundar Aditya, Yong Jin Daniel Kim, David Vargas, David Redgate, Onur Dizdar, Neil Bhushan, Xinze Lyu, Sibo Zhang, Stephen Wang, Bruno Clerckx

Abstract: Rate-Splitting Multiple Access (RSMA) is a powerful and versatile physical layer multiple access technique that generalizes and has better interference management capabilities than 5G-based Space Division Multiple Access (SDMA). It is also a rapidly maturing technology, all of which makes it a natural successor to SDMA in 6G. In this article, we describe RSMA's suitability for 6G by presenting: (i) link and system level simulations of RSMA's performance gains over SDMA in realistic environments, and (ii) pioneering experimental results that demonstrate RSMA's gains over SDMA for key use cases like enhanced Mobile Broadband (eMBB) and Integrated Sensing and Communications (ISAC). We also comment on the status of standardization activities for RSMA.

Submitted 17 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

Comments: Submitted to the IEEE Communications Standards Magazine December 2025 Special Issue on "Wireless Technologies for 6G and Beyond: Applications, Implementations, and Standardization"

arXiv:2501.09985 [pdf, ps, other]

Fully viable DHOST bounce with extra scalar

Authors: Ok Song An, Jin U Kang, Yong Jin Kim, Ui Ri Mun, Un Gyong Ri

Abstract: In this paper we construct a class of Degenerate Higher-Order Scalar-Tensor (DHOST) theories with an extra scalar field, which admits viable solutions of bouncing universe satisfying the following requirements: (i) absence of Belinski-Khalatnikov-Lifshitz (BKL) instability, ghost and gradient instability, (ii) absence of superluminality, (iii) generation of nearly scale-invariant curvature perturbations and very small tensor-to-scalar ratio, and (iv) conventional asymptotics in the distant past and future, where gravity sector is described by General Relativity and the DHOST scalar has a canonical form of Lagrangian. We also expect our models to have sufficiently small non-Gaussianities of primordial curvature perturbations to be compatible with observations. As such, this work exemplifies for the first time the fully viable two-field DHOST bouncing cosmology, which is free of instability and superluminality problems as well as compatible with observations.

Submitted 13 April, 2025; v1 submitted 17 January, 2025; originally announced January 2025.

Comments: 28 pages, two appendices, 12 figures

arXiv:2412.14260 [pdf, other]

Measurement of the branching fraction and $\it CP$-violating asymmetry of the decay $B^{0} \rightarrow π^{0} π^{0}$ using $387$ million bottom-antibottom meson pairs in Belle II data

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, M. Alhakami, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, N. K. Baghel, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, M. Bartl, J. Baudot, A. Baur, et al. (415 additional authors not shown)

Abstract: We measure the branching fraction and $\it CP$-violating flavor-dependent rate asymmetry of $B^{0} \to π^{0} π^{0}$ decays reconstructed using the Belle II detector in an electron-positron collision sample containing $387 \times 10^{6}$ $B\overline{B}$ pairs. Using an optimized event selection, we find $126\pm 20$ signal decays in a fit to background-discriminating and flavor-sensitive distributions. The resulting branching fraction is $(1.25 \pm 0.23)\times 10^{-6}$ and the $\it CP$-violating asymmetry is $0.03 \pm 0.30$.

Submitted 18 December, 2024; originally announced December 2024.

Report number: Belle II Preprint 2024-032, KEK Preprint 2024-34

arXiv:2412.08662 [pdf, other]

doi 10.1088/1748-0221/19/12/P12008

Performance of the prototype beam drift chamber for LAMPS at RAON with proton and Carbon-12 beams

Authors: H. Kim, Y. Bae, C. Heo, J. Seo, J. Hwang, D. H. Moon, D. S. Ahn, J. K. Ahn, J. Bae, J. Bok, Y. Cheon, S. W. Choi, S. Do, B. Hong, S. -W. Hong, J. Huh, S. Hwang, Y. Jang, B. Kang, A. Kim, B. Kim, C. Kim, E. -J. Kim, G. Kim, G. Kim, et al. (23 additional authors not shown)

Abstract: The Beam Drift Chamber (BDC) is designed to reconstruct the trajectories of incident rare isotope beams provided by RAON (Rare isotope Accelerator complex for ON-line experiments) into the experimental target of LAMPS (Large Acceptance Multi-Purpose Spectrometer). To test the performance of the BDC, a prototype BDC (pBDC) was manufactured and evaluated with high-energy ion beams from the HIMAC (Heavy Ion Medical Accelerator in Chiba) facility in Japan. Two kinds of ion beams, 100 MeV protons and 200 MeV/u $^{12}$C, were used for this evaluation, and the track reconstruction efficiency and position resolution were measured as a function of the applied high voltage. This paper introduces the construction details and presents the track reconstruction efficiency and position resolution of the pBDC.

Submitted 6 December, 2024; originally announced December 2024.

Comments: 13 pages, 15 figures

Journal ref: JINST 19 (2024) P12008

arXiv:2411.14032 [pdf, other]

Measurement of the inclusive branching fractions for $B_s^0$ decays into $D$ mesons via hadronic tagging

Authors: Belle and Belle II Collaborations: I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, A. Aloisio, S. Al Said, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, N. K. Baghel, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, et al. (430 additional authors not shown)

Abstract: We report measurements of the absolute branching fractions $\mathcal{B}(B_s^0 \to D_s^{\pm} X)$, $\mathcal{B}(B_s^0 \to D^0/\bar{D}^0 X)$, and $\mathcal{B}(B_s^0 \to D^{\pm} X)$, where the latter is measured for the first time. The results are based on a 121.4\,fb$^{-1}$ data sample collected at the $Υ(10860)$ resonance by the Belle detector at the KEKB asymmetric-energy $e^+ e^-$ collider. We reconstruct one $B_s^0$ meson in $e^+e^- \to Υ(10860) \to B_s^{*} \bar{B}_s^{*}$ events and measure yields of $D_s^+$, $D^0$, and $D^+$ mesons in the rest of the event. We obtain $\mathcal{B}(B_s^0 \to D_s^{\pm} X) = (68.6 \pm 7.2 \pm 4.0)\%$, $\mathcal{B}(B_s^0 \to D^0/\bar{D}^0 X) = (21.5 \pm 6.1 \pm 1.8)\%$, and $\mathcal{B}(B_s^0 \to D^{\pm} X) = (12.6 \pm 4.6 \pm 1.3)\%$, where the first uncertainty is statistical and the second is systematic. Averaging with previous Belle measurements gives $\mathcal{B}(B_s^0 \to D_s^{\pm} X) = (63.4 \pm 4.5 \pm 2.2)\%$ and $\mathcal{B}(B_s^0 \to D^0/\bar{D}^0 X) = (23.9 \pm 4.1 \pm 1.8)\%$. For the $B_s^0$ production fraction at the $Υ(10860)$, we find $f_s = (21.4^{+1.5}_{-1.7})\%$.

Submitted 18 February, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

Comments: 23 pages, 9 figures, submitted to JHEP

Report number: Belle II Preprint 2024-030, KEK Preprint 2024-32

arXiv:2411.10127 [pdf, other]

Measurement of $B \to K{}^{*}(892)γ$ decays at Belle II

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, N. K. Baghel, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, M. Bartl, J. Baudot, et al. (429 additional authors not shown)

Abstract: We present measurements of $B \to K{}^{*}(892)γ$ decays using $365\,{\rm fb}^{-1}$ of data collected from 2019 to 2022 by the Belle II experiment at the SuperKEKB asymmetric-energy $e^+e^-$ collider. The data sample contains $(387 \pm 6) \times 10^6$ $Υ(4S)$ events. We measure branching fractions ($\mathcal{B}$) and $C\!P$ asymmetries ($\mathcal{A}_{C\!P}$) for both $B^{0}\to K{}^{*0}γ$ and $B^{+}\to K{}^{*+}γ$ decays. The difference in $C\!P$ asymmetries ($Δ\mathcal{A}_{C\!P}$) and the isospin asymmetry ($Δ_{0+}$) between these neutral and charged channels are also measured. We obtain the following branching fractions and $C\!P$ asymmetries: $\mathcal{B} (B^{0} \to K{}^{*0}γ) = (4.14 \pm 0.10 \pm 0.11 ) \times 10^{-5}$, $\mathcal{B} (B^{+} \to K{}^{*+}γ) = (4.04 \pm 0.13 {}^{+0.13}_{-0.15} )\times 10^{-5}$, $\mathcal{A}_{C\!P} (B^{0} \to K{}^{*0}γ) = (-3.3 \pm 2.3 \pm 0.4 )\%$, and $\mathcal{A}_{C\!P} (B^{+} \to K{}^{*+}γ) = (-0.7 \pm 2.9 \pm 0.5 )\%$. The measured difference in $C\!P$ asymmetries is $Δ\mathcal{A}_{C\!P} = (+2.6 \pm 3.8 \pm 0.6 )\%$, and the measured isospin asymmetry is $Δ_{0+} = (+4.8 \pm 2.0 \pm 1.8 )\%$. The first uncertainties listed are statistical and the second are systematic. These results are consistent with world-average values and theory predictions.
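The quoted difference in $C\!P$ asymmetries can be cross-checked against the two individual asymmetries. The sketch below assumes a charged-minus-neutral sign convention, which the abstract does not state but which reproduces the quoted central value. It also assumes the two statistical uncertainties are uncorrelated and combines them in quadrature. The variable names are illustrative.

```python
import math

# Central values and statistical uncertainties in percent, copied from the abstract.
acp_neutral, stat_neutral = -3.3, 2.3   # B0 -> K*0 gamma
acp_charged, stat_charged = -0.7, 2.9   # B+ -> K*+ gamma

# Assumed sign convention: charged minus neutral.
delta_acp = acp_charged - acp_neutral

# Naive quadrature sum, assuming uncorrelated statistical uncertainties.
delta_stat = math.hypot(stat_neutral, stat_charged)

print(f"Delta A_CP = {delta_acp:+.1f}% +/- {delta_stat:.1f}% (stat)")
```

The central value matches the quoted +2.6%. The naive quadrature sum comes out slightly below the quoted statistical uncertainty of 3.8%, which may simply reflect rounding of the inputs given in the abstract.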

Submitted 19 March, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

Report number: Belle II Preprint 2024-029; KEK Preprint 2024-31

arXiv:2411.02501 [pdf, other]

Spectral characterization of a 3-port photonic lantern for application to spectroastrometry

Authors: Yoo Jung Kim, Michael P. Fitzgerald, Jonathan Lin, Julien Lozi, Sébastien Vievard, Yinzi Xin, Daniel Levinstein, Nemanja Jovanovic, Sergio Leon-Saval, Christopher Betters, Olivier Guyon, Barnaby Norris, Steph Sallum

Abstract: Spectroastrometry, which measures wavelength-dependent shifts in the center of light, is well-suited for studying objects whose morphology changes with wavelength at very high angular resolutions. Photonic lantern (PL)-fed spectrometers have potential to enable measurement of spectroastrometric signals because the relative intensities between the PL output SMFs contain spatial information on the input scene. In order to use PL output spectra for spectroastrometric measurements, it is important to understand the wavelength-dependent behaviors of PL outputs and develop methods to calibrate the effects of time-varying wavefront errors in ground-based observations. We present experimental characterizations of the 3-port PL on the SCExAO testbed at the Subaru Telescope. We develop spectral response models of the PL and verify the behaviors with lab experiments. We find sinusoidal behavior of astrometric sensitivity of the 3-port PL as a function of wavelength, as expected from numerical simulations. Furthermore, we compare experimental and numerically simulated coupling maps and discuss their potential use for offsetting pointing errors. We then present a method of building PL spectral response models (solving for the transfer matrices as a function of wavelength) using coupling maps, which can be used for further calibration strategies.

Submitted 4 November, 2024; originally announced November 2024.

Comments: Accepted for publication in the Journal of Astronomical Telescopes, Instruments, and Systems (JATIS)

arXiv:2410.09002 [pdf, other]

WaveDiffusion: Exploring Full Waveform Inversion via Joint Diffusion in the Latent Space

Authors: Hanchen Wang, Yinan Feng, Yinpeng Chen, Jeeun Kang, Yixuan Wu, Young Jin Kim, Youzuo Lin

Abstract: Full Waveform Inversion (FWI) reconstructs high-resolution subsurface velocity maps from seismic waveform data governed by partial differential equations (PDEs). Traditional machine learning approaches frame FWI as an image-to-image translation task, mapping seismic data to velocity maps via encoder-decoder architectures. In this paper, we revisit FWI from a new perspective: generating both modalities simultaneously. We found that both modalities can be jointly generated from a shared latent space using a diffusion process. Remarkably, our jointly generated seismic-velocity pairs inherently satisfy the governing PDE without requiring additional constraints. This reveals an interesting insight: the diffusion process inherently learns a scoring mechanism in the latent space, quantifying the deviation from the governing PDE. Specifically, the generated seismic-velocity pairs with higher scores are closer to the solutions of the governing PDEs. Our experiments on the OpenFWI dataset demonstrate that the generated seismic-velocity pairs not only yield high fidelity, diversity and physical consistency, but also can serve as effective augmentation for training data-driven FWI models.

Submitted 11 February, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.08622 [pdf, other]

doi 10.1103/PhysRevD.111.012011

Observation of time-dependent $CP$ violation and measurement of the branching fraction of $B^0 \to J/ψπ^0$ decays

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, N. K. Baghel, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, J. Baudot, A. Baur, A. Beaubien, F. Becherer, et al. (369 additional authors not shown)

Abstract: We present a measurement of the branching fraction and time-dependent charge-parity ($CP$) decay-rate asymmetries in $B^0 \to J/ψπ^0$ decays. The data sample was collected with the Belle~II detector at the SuperKEKB asymmetric $e^+e^-$ collider in 2019-2022 and contains $(387\pm 6)\times 10^6$ $B\overline{B}$ meson pairs from $Υ(4S)$ decays. We reconstruct $392\pm 24$ signal decays and fit the $CP$ parameters from the distribution of the proper-decay-time difference of the two $B$ mesons. We measure the branching fraction to be $B(B^0 \to J/ψπ^0)=(2.00 \pm 0.12 \pm 0.09)\times 10^{-5}$ and the direct and mixing-induced $CP$ asymmetries to be $C_{CP}=0.13 \pm 0.12 \pm 0.03$ and $S_{CP}=-0.88 \pm 0.17 \pm 0.03$, respectively, where the first uncertainties are statistical and the second are systematic. We observe mixing-induced $CP$ violation with a significance of $5.0$ standard deviations for the first time in this mode.

Submitted 27 January, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

Report number: Belle II preprint: 2024-018, KEK preprint: 2024-14

Journal ref: Phys. Rev. D 111, 012011 (2025)

arXiv:2409.19250 [pdf, other]

Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition

Authors: Minseo Kwon, Yaesol Kim, Young J. Kim

Abstract: In robotic task planning, symbolic planners using rule-based representations like PDDL are effective but struggle with long-sequential tasks in complicated environments due to exponentially increasing search space. Meanwhile, LLM-based approaches, which are grounded in artificial neural networks, offer faster inference and commonsense reasoning but suffer from lower success rates. To address the limitations of the current symbolic (slow speed) or LLM-based approaches (low accuracy), we propose a novel neuro-symbolic task planner that decomposes complex tasks into subgoals using an LLM and carries out task planning for each subgoal using either symbolic or MCTS-based LLM planners, depending on the subgoal complexity. This decomposition reduces planning time and improves success rates by narrowing the search space and enabling LLMs to focus on more manageable tasks. Our method significantly reduces planning time while maintaining high success rates across task planning domains, as well as real-world and simulated robotics environments. More details are available at http://graphics.ewha.ac.kr/LLMTAMP/.

Submitted 31 March, 2025; v1 submitted 28 September, 2024; originally announced September 2024.

arXiv:2409.12136 [pdf, other]

GRIN: GRadient-INformed MoE

Authors: Liyuan Liu, Young Jin Kim, Shuohang Wang, Chen Liang, Yelong Shen, Hao Cheng, Xiaodong Liu, Masahiro Tanaka, Xiaoxia Wu, Wenxiang Hu, Vishrav Chaudhary, Zeqi Lin, Chenruidong Zhang, Jilong Xue, Hany Awadalla, Jianfeng Gao, Weizhu Chen

Abstract: Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules. However, sparse computation challenges traditional training practices, as discrete expert routing hinders standard backpropagation and thus gradient-based optimization, which are the cornerstone of deep learning. To better pursue the scaling power of MoE, we introduce GRIN (GRadient-INformed MoE training), which incorporates sparse gradient estimation for expert routing and configures model parallelism to avoid token dropping. Applying GRIN to autoregressive language modeling, we develop a top-2 16$\times$3.8B MoE model. Our model, with only 6.6B activated parameters, outperforms a 7B dense model and matches the performance of a 14B dense model trained on the same data. Extensive evaluations across diverse tasks demonstrate the potential of GRIN to significantly enhance MoE efficacy, achieving 79.4 on MMLU, 83.7 on HellaSwag, 74.4 on HumanEval, and 58.9 on MATH.

Submitted 18 September, 2024; originally announced September 2024.

Comments: 58 pages

arXiv:2409.09120 [pdf, other]

On the Potential of Spectroastrometry with Photonic Lanterns

Authors: Yoo Jung Kim, Michael P. Fitzgerald, Jonathan Lin, Yinzi Xin, Daniel Levinstein, Steph Sallum, Nemanja Jovanovic, Sergio Leon-Saval

Abstract: We investigate the potential of photonic lantern (PL) fiber fed spectrometers for two-dimensional spectroastrometry. Spectroastrometry, a technique for studying small angular scales by measuring centroid shifts as a function of wavelength, is typically conducted using long-slit spectrographs. However, slit-based spectroastrometry requires observations with multiple position angles to measure two-dimensional spectroastrometric signals. In a typical configuration of PL-fed spectrometers, light from the focal plane is coupled into the few-moded PL, which is then split into several single-mode outputs, with the relative intensities containing astrometric information. The single-moded beams can be fed into a high-resolution spectrometer to measure wavelength-dependent centroid shifts. We perform numerical simulations of a standard 6-port PL and demonstrate its capability of measuring spectroastrometric signals. The effects of photon noise, wavefront errors, and chromaticity are investigated. When the PL is designed to have large linear responses to tip-tilts at the wavelengths of interest, the centroid shifts can be efficiently measured. Furthermore, we provide mock observations of detecting accreting protoplanets. PL spectroastrometry is potentially a simple and efficient technique for detecting spectroastrometric signals.

Submitted 13 September, 2024; originally announced September 2024.

Comments: Accepted for publication in the Journal of Astronomical Telescopes, Instruments, and Systems (JATIS)

arXiv:2409.06958 [pdf, other]

doi 10.1051/0004-6361/202450234

Spectroscopy using a visible photonic lantern at the Subaru telescope: Laboratory characterization and first on-sky demonstration on Ikiiki (α Leo) and `Aua (α Ori)

Authors: Sébastien Vievard, Manon Lallement, Sergio Leon-Saval, Olivier Guyon, Nemanja Jovanovic, Elsa Huby, Sylvestre Lacour, Julien Lozi, Vincent Deo, Kyohoon Ahn, Miles Lucas, Steph Sallum, Barnaby Norris, Chris Betters, Rodrygo Amezcua-Correa, Stephanos Yerolatsitis, Michael Fitzgerald, Jon Lin, Yoo Jung Kim, Pradip Gatkine, Takayuki Kotani, Motohide Tamura, Thayne Currie, Harry-Dean Kenchington, Guillermo Martin, et al. (1 additional author not shown)

Abstract: Photonic lanterns are waveguide devices enabling high throughput single mode spectroscopy and high angular resolution. We aim to present the first on-sky demonstration of a photonic lantern (PL) operating in visible light, to measure its throughput and assess its potential for high-resolution spectroscopy of compact objects. We used the SCExAO instrument (a double stage extreme AO system installed at the Subaru telescope) and FIRST mid-resolution spectrograph (R 3000) to test the visible capabilities of the PL on internal source and on-sky observations. The best averaged coupling efficiency over the PL field of view was measured at 51% +/- 10% with a peak at 80%. We also investigate the relationship between coupling efficiency and the Strehl ratio for a PL, comparing them with those of a single-mode fiber (SMF). Findings show that in the AO regime, a PL offers better coupling efficiency performance than a SMF, especially in the presence of low spatial frequency aberrations. We observed Ikiiki (alpha Leo - mR = 1.37) and `Aua (alpha Ori - mR = -1.17) at a frame rate of 200 Hz. Under median seeing conditions (about 1 arcsec measured in H band) and large tip/tilt residuals (over 20 mas), we estimated an average light coupling efficiency of 14.5% +/- 7.4%, with a maximum of 42.8% at 680 nm. We were able to reconstruct both stars' spectra, containing various absorption lines. The successful demonstration of this device opens new possibilities in terms of high throughput single-mode fiber-fed spectroscopy in the visible.
The demonstrated on-sky coupling efficiency performance would not have been achievable with a single SMF injection setup under similar conditions, partly because the residual tip/tilt alone exceeded the field of view of a visible SMF (18 mas at 700 nm). This emphasizes the enhanced resilience of PL technology to such atmospheric disturbances. The additional

Submitted 14 November, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

Comments: Accepted in Astronomy and Astrophysics journal on 9/11/2024

Journal ref: A&A 691, A140 (2024)

arXiv:2407.11276 [pdf, other]

doi 10.1063/5.0228845

A practical approach to calculating magnetic Johnson noise for precision measurements

Authors: N. S. Phan, S. M. Clayton, Y. J. Kim, T. M. Ito

Abstract: Magnetic Johnson noise is an important consideration for many applications involving precision magnetometry, and its significance will only increase in the future with improvements in measurement sensitivity. The fluctuation-dissipation theorem can be utilized to derive analytic expressions for magnetic Johnson noise in certain situations. But when used in conjunction with finite element analysis tools, the combined approach is particularly powerful as it provides a practical means to calculate the magnetic Johnson noise arising from conductors of arbitrary geometry and permeability. In this paper, we demonstrate this method to be one of the most comprehensive approaches presently available to calculate thermal magnetic noise. In particular, its applicability is shown to not be limited to cases where the noise is evaluated at a point in space but also can be expanded to include cases where the magnetic field detector has a more general shape, such as a finite size loop, a gradiometer, or a detector that consists of a polarized atomic species trapped in a volume. Furthermore, some physics insights gained through studies made using this method are discussed.

Submitted 13 September, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

Report number: LA-UR-24-27277

Journal ref: J. Appl. Phys. 136, 124901 (2024)

arXiv:2407.00965 [pdf, other]

doi 10.1088/1674-1137/ad806c

Measurement of the integrated luminosity of data samples collected during 2019-2022 by the Belle II experiment

Authors: The Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, J. K. Ahn, H. Aihara, N. Akopov, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, M. Barrett, J. Baudot, A. Baur, A. Beaubien, et al. (382 additional authors not shown)

Abstract: A series of data samples was collected with the Belle~II detector at the SuperKEKB collider from March 2019 to June 2022. We determine the integrated luminosities of these data samples using three distinct methodologies involving Bhabha ($e^+e^- \to e^+e^-(nγ)$), digamma ($e^+e^- \to γγ(nγ)$), and dimuon ($e^+e^- \to μ^+ μ^- (nγ)$) events. The total integrated luminosity obtained with Bhabha, digamma, and dimuon events is ({426.88} $\pm$ 0.03 $\pm$ {2.61})~fb$^{-1}$, ({429.28} $\pm$ 0.03 $\pm$ {2.62})~fb$^{-1}$, and ({423.99} $\pm$ 0.04 $\pm$ {3.83})~fb$^{-1}$, where the first uncertainties are statistical and the second are systematic. The resulting total integrated luminosity obtained from the combination of the three methods is ({427.87 $\pm$ 2.01})~fb$^{-1}$.

Submitted 19 September, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

Comments: 12 pages, 3 figures; accepted for publication in Chinese Physics C

Report number: Belle II Preprint 2024-019; KEK Preprint 2024-16

Journal ref: Chin. Phys. C 49, 013001 (2025)

arXiv:2406.15965 [pdf, other]

doi 10.1103/PhysRevD.110.032021

Search for charmed baryons in the $Λ_c^+η$ system and measurement of the branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $Λ_c^+η$ and $pD^0$ relative to $Σ_c(2455)π$

Authors: Belle Collaboration, S. X. Li, C. P. Shen, I. Adachi, J. K. Ahn, H. Aihara, D. M. Asner, H. Atmacan, T. Aushev, R. Ayad, Sw. Banerjee, K. Belous, J. Bennett, M. Bessner, T. Bilka, D. Biswas, D. Bodrov, A. Bozek, M. Bračko, P. Branchini, T. E. Browder, A. Budano, M. Campajola, M. -C. Chang, B. G. Cheon, et al. (103 additional authors not shown)

Abstract: We search for excited charmed baryons in the $Λ_c^+η$ system using a data sample corresponding to an integrated luminosity of 980 $\rm fb^{-1}$. The data were collected by the Belle detector at the KEKB $e^{+}$$e^{-}$ asymmetric-energy collider. No significant signals are found in the $Λ_c^+η$ mass spectrum, including the known $Λ_c(2880)^+$ and $Λ_c(2940)^+$. Clear $Λ_c(2880)^+$ and $Λ_c(2940)^+$ signals are observed in the $pD^0$ mass spectrum. We set upper limits at 90\% credibility level on ratios of branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $Λ_c^+η$ relative to $Σ_c(2455)π$ of $<0.13$ for the $Λ_c(2880)^+$ and $<1.11$ for the $Λ_c(2940)^+$. We measure ratios of branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $pD^0$ relative to $Σ_c(2455)π$ of $0.75 \pm 0.03(\text{stat.}) \pm 0.07(\text{syst.})$ for the $Λ_c(2880)^+$ and $3.59 \pm 0.21(\text{stat.}) \pm 0.56(\text{syst.})$ for the $Λ_c(2940)^+$.

Submitted 28 July, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

Comments: 10 pages, 4 figures, accepted for publication as a Regular Article in Physical Review D

Report number: Belle Preprint: 2024-06;KEK Preprint: 2024-15

Journal ref: Phys. Rev. D 110, 032021 (2024)

arXiv:2405.00229 [pdf, other]

Aptly: Making Mobile Apps from Natural Language

Authors: Evan W. Patton, David Y. J. Kim, Ashley Granquist, Robin Liu, Arianna Scott, Jennet Zamanova, Harold Abelson

Abstract: We present Aptly, an extension of the MIT App Inventor platform enabling mobile app development via natural language powered by code-generating large language models (LLMs). Aptly complements App Inventor's block language with a text language designed to allow visual code generation via text-based LLMs. We detail the technical aspects of how the Aptly server integrates LLMs with a realtime collaboration function to facilitate the automated creation and editing of mobile apps given user instructions. The paper concludes with insights from a study of a pilot implementation involving high school students, which examines Aptly's practicality and user experience. The findings underscore Aptly's potential as a tool that democratizes app development and fosters technological creativity.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 11 pages, 7 figures, 2 tables

arXiv:2404.14219 [pdf, other]

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Authors: Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, et al. (104 additional authors not shown)

Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75%, 78% on MMLU, and 8.7, 8.9 on MT-bench). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. The phi-3.5-MoE, a 16 x 3.8B MoE model with 6.6 billion active parameters, achieves superior performance in language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels in reasoning tasks and is adept at handling both single-image and text prompts, as well as multi-image and text prompts.

Submitted 30 August, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 24 pages

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01426 [pdf, other]

doi 10.1117/1.JATIS.10.2.025001

Laboratory demonstration of a Photonic Lantern Nuller in monochromatic and broadband light

Authors: Yinzi Xin, Daniel Echeverri, Nemanja Jovanovic, Dimitri Mawet, Sergio Leon-Saval, Rodrigo Amezcua-Correa, Stephanos Yerolatsitis, Michael P. Fitzgerald, Pradip Gatkine, Yoo Jung Kim, Jonathan Lin, Barnaby Norris, Garreth Ruane, Steph Sallum

Abstract: Photonic lantern nulling (PLN) is a method for enabling the detection and characterization of close-in exoplanets by exploiting the symmetries of the ports of a mode-selective photonic lantern (MSPL) to cancel out starlight. A six-port MSPL provides four ports where on-axis starlight is suppressed, while off-axis planet light is coupled with efficiencies that vary as a function of the planet's spatial position. We characterize the properties of a six-port MSPL in the laboratory and perform the first testbed demonstration of the PLN in monochromatic light (1569 nm) and in broadband light (1450 nm to 1625 nm), each using two orthogonal polarizations. We compare the measured spatial throughput maps with those predicted by simulations using the lantern's modes. We find that the morphologies of the measured throughput maps are reproduced by the simulations, though the real lantern is lossy and has lower throughputs overall. The measured ratios of on-axis stellar leakage to peak off-axis throughput are around 10^(-2), likely limited by testbed wavefront errors. These null-depths are already sufficient for observing young gas giants at the diffraction limit using ground-based observatories. Future work includes using wavefront control to further improve the nulls, as well as testing and validating the PLN on-sky.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 30 pages, 12 figures

Journal ref: Journal of Astronomical Telescopes, Instruments, and Systems, Vol. 10, Issue 2, 025001 (April 2024)

arXiv:2403.13513 [pdf, other]

What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models

Authors: Junho Kim, Yeon Ju Kim, Yong Man Ro

Abstract: This paper presents a way of enhancing the reliability of Large Multi-modal Models (LMMs) in addressing hallucination, where the models generate cross-modal inconsistent responses. Without additional training, we propose Counterfactual Inception, a novel method that implants counterfactual thinking into LMMs using self-generated counterfactual keywords. Our method is grounded in the concept of counterfactual thinking, a cognitive process where humans consider alternative realities, enabling more extensive context exploration. Bridging the human cognition mechanism into LMMs, we aim for the models to engage with and generate responses that span a wider contextual scene understanding, mitigating hallucinatory outputs. We further introduce Plausibility Verification Process (PVP), a simple yet robust keyword constraint that effectively filters out sub-optimal keywords to enable the consistent triggering of counterfactual thinking in the model responses. Comprehensive analyses across various LMMs, including both open-source and proprietary models, corroborate that counterfactual thinking significantly reduces hallucination and helps to broaden contextual understanding based on true visual clues.

Submitted 21 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Project page: https://ivy-lvlm.github.io/Counterfactual-Inception/

arXiv:2403.04340 [pdf, other]

Search for a pentaquark state decaying into $pJ/ψ$ in $Υ(1,2S)$ inclusive decays at Belle

Authors: Belle Collaboration, X. Dong, S. M. Zou, H. Y. Zhang, X. L. Wang, I. Adachi, J. K. Ahn, H. Aihara, S. Al Said, D. M. Asner, H. Atmacan, R. Ayad, S. Bahinipati, Sw. Banerjee, M. Bessner, V. Bhardwaj, D. Biswas, D. Bodrov, A. Bozek, M. Bračko, P. Branchini, T. E. Browder, A. Budano, M. Campajola, D. Červenkov, et al. (140 additional authors not shown)

Abstract: Using the data samples of 102 million $Υ(1S)$ and 158 million $Υ(2S)$ events collected by the Belle detector, we search for a pentaquark state in the $pJ/ψ$ final state from $Υ(1,2S)$ inclusive decays. Here, the charge-conjugate $\bar{p}J/ψ$ is included. We observe clear $pJ/ψ$ production in $Υ(1,2S)$ decays and measure the branching fractions to be $B[Υ(1S) \to pJ/ψ+ anything] = [4.31 \pm 0.28(stat.) \pm 0.20(syst.)] \times 10^{-5}$ and $B[Υ(2S) \to pJ/ψ+ anything] = [2.31 \pm 0.22(stat.) \pm 0.18(syst.)] \times 10^{-5}$. We also measure the cross section of inclusive $pJ/ψ$ production in $e^+e^-$ annihilation to be $σ(e^+e^- \to pJ/ψ+ anything) = [58.5 \pm 5.7 (stat.) \pm 2.8(syst.)]$~fb at $\sqrt{s} = 10.52~\hbox{GeV}$ using an 89.5~fb$^{-1}$ continuum data sample. There is no significant $P_c(4312)^+$, $P_c(4440)^+$ or $P_c(4457)^+$ signal found in the $pJ/ψ$ final states in $Υ(1,2S)$ inclusive decays. We determine the upper limits of $B[Υ(1,2S)\to P_c^{+} + anything] \cdot B(P_c^{+}\to pJ/ψ)$ to be at the $10^{-6}$ level.

Submitted 27 May, 2025; v1 submitted 7 March, 2024; originally announced March 2024.

Report number: Belle Preprint 2024-02, KEK Preprint 2023-54

arXiv:2402.08158 [pdf, other]

Coherent Imaging with Photonic Lanterns

Authors: Yoo Jung Kim, Michael P. Fitzgerald, Jonathan Lin, Steph Sallum, Yinzi Xin, Nemanja Jovanovic, Sergio Leon-Saval

Abstract: Photonic Lanterns (PLs) are tapered waveguides that gradually transition from a multi-mode fiber geometry to a bundle of single-mode fibers (SMFs). They can efficiently couple multi-mode telescope light into a multi-mode fiber entrance at the focal plane and convert it into multiple single-mode beams. Thus, each SMF samples its unique mode (lantern principal mode) of the telescope light in the pupil, analogous to subapertures in aperture masking interferometry (AMI). Coherent imaging with PLs can be enabled by interfering SMF outputs and applying phase modulation, which can be achieved using a photonic chip beam combiner at the backend (e.g., the ABCD beam combiner). In this study, we investigate the potential of coherent imaging by interfering SMF outputs of a PL with a single telescope. We demonstrate that the visibilities that can be measured from a PL are mutual intensities incident on the pupil weighted by the cross-correlation of a pair of lantern modes. From numerically simulated lantern principal modes of a 6-port PL, we find that interferometric observables using a PL behave similarly to separated-aperture visibilities for simple models on small angular scales ($<λ/D$) but with greater sensitivity to symmetries and capability to break phase angle degeneracies. Furthermore, we present simulated observations with wavefront errors and compare them to AMI. Despite the redundancy caused by extended lantern principal modes, spatial filtering offers stability to wavefront errors. Our simulated observations suggest that PLs may offer significant benefits in the photon noise-limited regime and in resolving small angular scales at low contrast regime.

Submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted for publication in ApJ

arXiv:2401.08417 [pdf, other]

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Authors: Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim

Abstract: Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, does not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning for LLMs in the MT task, emphasizing the quality issues present in the reference data, despite being human-generated. Then, in contrast to SFT which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on WMT'21, WMT'22 and WMT'23 test datasets.

Submitted 2 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted at ICML 2024

arXiv:2401.04807 [pdf, other]

Search for Baryon-Number-Violating Processes in $B^-$ Decays to the $\barΞ_{c}^{0} \barΛ_{c}^{-}$ Final State

Authors: Belle Collaboration, T. Gu, V. Savinov, I. Adachi, H. Aihara, D. M. Asner, H. Atmacan, T. Aushev, R. Ayad, Sw. Banerjee, K. Belous, J. Bennett, M. Bessner, V. Bhardwaj, B. Bhuyan, D. Biswas, A. Bobrov, D. Bodrov, J. Borah, A. Bozek, M. Bračko, P. Branchini, T. E. Browder, A. Budano, M. Campajola, et al. (139 additional authors not shown)

Abstract: We report the results of the first search for $B^-$ decays to the $\barΞ_{c}^{0} \barΛ_{c}^{-}$ final state using 711~${\rm fb^{-1}}$ of data collected at the $Υ(4S)$ resonance with the Belle detector at the KEKB asymmetric-energy $e^+ e^-$ collider. The results are interpreted in terms of both direct baryon-number-violating $B^-$ decay and $Ξ_{c}^{0}-\barΞ_{c}^{0}$ oscillations which follow the Standard Model decay $B^- \to Ξ_{c}^{0} \barΛ_{c}^{-}$. We observe no evidence for baryon number violation and set the 95\% confidence-level upper limits on the ratio of baryon-number-violating and Standard Model branching fractions ${\mathcal{B}(B^- \rightarrow \barΞ_{c}^{0} \barΛ_{c}^{-})}/{\mathcal{B}(B^- \rightarrow Ξ_{c}^{0} \barΛ_{c}^{-})}$ to be $< 2.7\%$ and on the $Ξ_{c}^{0} - \barΞ_{c}^{0}$ oscillation angular frequency $ω$ to be $< 0.76\ \mathrm{ps}^{-1}$ (equivalent to $τ_{\rm mix} > 1.3$~ps).

Submitted 11 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: 7 pages, 2 figures, submitted to Phys. Rev. Lett

Report number: Belle Preprint 2024-01, KEK Preprint 2023-48

arXiv:2401.04646 [pdf, ps, other]

Measurements of the branching fraction, polarization, and $CP$ asymmetry for the decay $B^0\rightarrow ωω$

Authors: Belle Collaboration, Y. Guan, A. J. Schwartz, K. Kinoshita, I. Adachi, H. Aihara, S. Al Said, D. M. Asner, H. Atmacan, R. Ayad, S. Bahinipati, Sw. Banerjee, K. Belous, J. Bennett, M. Bessner, V. Bhardwaj, B. Bhuyan, D. Biswas, A. Bobrov, D. Bodrov, J. Borah, A. Bozek, M. Bračko, P. Branchini, A. Budano, et al. (145 additional authors not shown)

Abstract: We present a measurement of $B^{0} \rightarrow ωω$, a charmless decay into two vector mesons, using 772 $\times 10^6$ $B\overline{B}$ pairs collected with the Belle detector at the KEKB $e^+e^-$ collider. The decay is observed with a significance of 7.9 standard deviations. We measure a branching fraction $\mathcal{B} = (1.53 \pm 0.29 \pm 0.17) \times 10^{-6}$, a fraction of longitudinal polarization $f_L = 0.87 \pm 0.13 \pm 0.13$, and a time-integrated $CP$ asymmetry $A_{CP}$ = $-0.44 \pm 0.43 \pm 0.11$, where the first uncertainties listed are statistical and the second are systematic. This is the first observation of $B^{0} \rightarrow ωω$, and the first measurements of $f_L$ and $A_{CP}$ for this decay.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 7 pages, 1 figure, submitted to Phys. Rev. Lett

Report number: Belle Preprint 2023-21, KEK Preprint 2023-43, UCHEP-24-01

arXiv:2312.13381 [pdf, other]

Real-time experimental demonstrations of a photonic lantern wavefront sensor

Authors: Jonathan W. Lin, Michael P. Fitzgerald, Yinzi Xin, Yoo Jung Kim, Olivier Guyon, Barnaby Norris, Christopher Betters, Sergio Leon-Saval, Kyohoon Ahn, Vincent Deo, Julien Lozi, Sébastien Vievard, Daniel Levinstein, Steph Sallum, Nemanja Jovanovic

Abstract: The direct imaging of an Earth-like exoplanet will require sub-nanometric wavefront control across large light-collecting apertures, to reject host starlight and detect the faint planetary signal. Current adaptive optics (AO) systems, which use wavefront sensors that reimage the telescope pupil, face two challenges that prevent this level of control: non-common-path aberrations (NCPAs), caused by differences between the sensing and science arms of the instrument; and petaling modes: discontinuous phase aberrations caused by pupil fragmentation, especially relevant for the upcoming 30-m class telescopes. Such aberrations drastically impact the capabilities of high-contrast instruments. To address these issues, we can add a second-stage wavefront sensor to the science focal plane. One promising architecture uses the photonic lantern (PL): a waveguide that efficiently couples aberrated light into single-mode fibers (SMFs). In turn, SMF-confined light can be stably injected into high-resolution spectrographs, enabling direct exoplanet characterization and precision radial velocity measurements; simultaneously, the PL can be used for focal-plane wavefront sensing. We present a real-time experimental demonstration of the PL wavefront sensor on the Subaru/SCExAO testbed. Our system is stable out to around ~400 nm of low-order Zernike wavefront error, and can correct petaling modes. When injecting ~30 nm RMS of low order time-varying error, we achieve ~10x rejection at 1 s timescales; further refinements to the control law and lantern fabrication process should make sub-nanometric wavefront control possible. In the future, novel sensors like the PLWFS may prove to be critical in resolving the wavefront control challenges posed by exoplanet direct imaging.

Submitted 20 December, 2023; originally announced December 2023.

Comments: Accepted to ApJL

arXiv:2312.01253 [pdf]

On Merits of Faster-than-Nyquist Signaling in the Finite Blocklength Regime

Authors: Yong Jin Daniel Kim

Abstract: We identify potential merits of faster-than-Nyquist (FTN) signaling in the finite blocklength (FBL) regime. A unique aspect of FTN signaling is that it can increase the blocklength by packing more data symbols within the same time and frequency to yield strictly higher number of independent signaling dimensions than that of Nyquist rate signaling. Using the finite-blocklength information theory, we provide tight bounds on the maximum channel coding rate (MCCR) of FTN signaling for any finite time-bandwidth product. The merits are categorized into two operating regions of FTN, i.e., when the time-acceleration factor of FTN, $τ$, is above or below a certain threshold $τ_{0}$. When $τ> τ_{0}$, FTN has both higher channel capacity and MCCR than that of Nyquist rate signaling, when the utilized pulse shape is non-sinc. Since the issues associated with the ideal sinc pulse only get exacerbated when packets are short, the benefit of FTN becomes more significant in the FBL regime. On the other hand, when $τ< τ_{0}$, the channel capacity is fixed but MCCR of FTN can continue to increase to a certain degree, thereby reducing the gap between the capacity and MCCR. This benefit is present regardless of the utilized pulse shape, including the ideal sinc-pulse, and is unique to the FBL regime. Instead of increasing MCCR for fixed block error rates, FTN can alternatively lower the block error rates for fixed channel coding rates. These results imply that FTN can lower the penalty from limited channel coding over short blocklength and can improve the performance and reliability of short packet communications.

Submitted 25 April, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

arXiv:2312.00221 [pdf, other]

Spectroastrometry and Imaging Science with Photonic Lanterns on Extremely Large Telescopes

Authors: Yoo Jung Kim, Michael P. Fitzgerald, Jonathan Lin, Steph Sallum, Yinzi Xin, Nemanja Jovanovic, Sergio Leon-Saval, Christopher Betters, Pradip Gatkine, Olivier Guyon, Julien Lozi, Dimitri Mawet, Barnaby Norris, Sébastien Vievard

Abstract: Photonic lanterns (PLs) are tapered waveguides that gradually transition from a multi-mode fiber geometry to a bundle of single-mode fibers. In astronomical applications, PLs can efficiently couple multi-mode telescope light into a multi-mode fiber entrance and convert it into multiple single-mode beams. The output beams are highly stable and suitable for feeding into high-resolution spectrographs or photonic chip beam combiners. For instance, by using relative intensities in the output cores as a function of wavelength, PLs can enable spectroastrometry. In addition, by interfering beams in the output cores with a beam combiner in the backend, PLs can be used for high-throughput interferometric imaging. When used on an Extremely Large Telescope (ELT), with its increased sensitivity and angular resolution, the imaging and spectroastrometric capabilities of PLs will be extended to higher contrast and smaller angular scales. We study the potential spectroastrometry and imaging science cases of PLs on ELTs, including study of exomoons, broad-line regions of quasars, and inner circumstellar disks.

Submitted 30 November, 2023; originally announced December 2023.

Comments: AO4ELT7 conference proceedings 2023

arXiv:2311.08590 [pdf, other]

PEMA: An Offsite-Tunable Plug-in External Memory Adaptation for Language Models

Authors: HyunJin Kim, Young Jin Kim, JinYeong Bak

Abstract: Pre-trained language models (PLMs) show impressive performance in various downstream NLP tasks. However, pre-training large language models demands substantial memory and training compute. Furthermore, due to the substantial resources required, many PLM weights are confidential. Consequently, users are compelled to share their data with model owners for fine-tuning specific tasks. To overcome the limitations, we introduce Plug-in External Memory Adaptation (PEMA), a Parameter-Efficient Fine-Tuning (PEFT) method, enabling PLM fine-tuning without requiring access to all the weights. PEMA integrates with context representations from test data during inference to perform downstream tasks. It uses external memory to store PLM-generated context representations mapped with target tokens. Our method utilizes weight matrices of LoRA-like bottlenecked adapter in the PLM's final layer to enhance efficiency. Our approach also includes Gradual Unrolling, a novel interpolation strategy to improve generation quality. We validate PEMA's effectiveness through experiments on syntactic and real datasets for machine translation and style transfer. Our findings show that PEMA outperforms other PEFT approaches in memory and latency efficiency for training, and also excels in maintaining sentence meaning and generating appropriate language and styles.

Submitted 29 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: Accepted to NAACL 2024

arXiv:2311.01622 [pdf, other]

Focal-plane wavefront sensing with photonic lanterns II: numerical characterization and optimization

Authors: Jonathan Lin, Michael P. Fitzgerald, Yinzi Xin, Yoo Jung Kim, Olivier Guyon, Sergio Leon-Saval, Barnaby Norris, Nemanja Jovanovic

Abstract: We present numerical characterizations of the wavefront sensing performance for few-mode photonic lantern wavefront sensors (PLWFSs). These characterizations include calculations of throughput, control space, sensor linearity, and an estimate of maximum linear reconstruction range for standard and hybrid lanterns with 3 to 19 ports, at a wavelength of 1550 nm. We additionally consider the impact of beam-shaping optics and a charge-1 vortex mask, placed in the pupil plane. The former is motivated by the application of PLs to high-resolution spectroscopy, which could enable efficient injection into the spectrometer along with simultaneous focal-plane wavefront sensing; similarly, the latter is motivated by the application of PLs to vortex fiber nulling (VFN), which can simultaneously enable wavefront sensing and the nulling of on-axis starlight. Overall, we find that the PLWFS setups tested in this work exhibit good linearity out to ~0.25-0.5 radians of RMS wavefront error (WFE). Meanwhile, we estimate the maximum amount of WFE that can be handled by these sensors, before the sensor response becomes degenerate, to be around ~1-2 radians RMS. In the future, we expect these limits can be pushed further by increasing the number of degrees of freedom, either by adopting higher-mode-count lanterns, dispersing lantern outputs, or separating polarizations. Lastly, we consider optimization strategies for the design of the PLWFS, which involve both modification of the lantern itself and the use of pre- and post-lantern optics like phase masks and interferometric beam recombiners.

Submitted 2 November, 2023; originally announced November 2023.

Comments: Accepted to JOSA B

arXiv:2310.02410 [pdf, other]

Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness

Authors: Young Jin Kim, Raffy Fahim, Hany Hassan Awadalla

Abstract: Large Mixture of Experts (MoE) models could achieve state-of-the-art quality on various language tasks, including machine translation task, thanks to the efficient model scaling capability with expert parallelism. However, it has brought a fundamental issue of larger memory consumption and increased memory bandwidth bottleneck at deployment time. In this paper, we propose Mixture of Quantized Experts (MoQE) which is a simple weight-only quantization method applying ultra low-bit down to 2-bit quantizations only to expert weights for mitigating the increased memory and latency issues of MoE models. We show that low-bit quantization together with the MoE architecture delivers a reliable model performance while reducing the memory size significantly even without any additional training in most cases. In particular, expert layers in MoE models are much more robust to the quantization than conventional feedforward networks (FFN) layers. In our comprehensive analysis, we show that MoE models with 2-bit expert weights can deliver better model performance than the dense model trained on the same dataset. As a result of low-bit quantization, we show the model size can be reduced by 79.6% of the original half precision floating point (fp16) MoE model. Combined with an optimized GPU runtime implementation, it also achieves 1.24X speed-up on A100 GPUs.

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.14741 [pdf, other]

Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

Authors: Hee-Soo Heo, KiHyun Nam, Bong-Jin Lee, Youngki Kwon, Minjae Lee, You Jin Kim, Joon Son Chung

Abstract: In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remains fixed in this training process. This results in two similarity scores: one for the speakers information and one for the session information. The latter score acts as a compensator for the former that might be skewed due to session variations. Our extensive experiments demonstrate that session information can be effectively compensated without retraining of the embedding extractor.

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.12306 [pdf, other]

TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

Authors: Chaeyoung Jung, Suyeon Lee, Kihyun Nam, Kyeongha Rho, You Jin Kim, Youngjoon Jang, Joon Son Chung

Abstract: The goal of this work is Active Speaker Detection (ASD), a task to determine whether a person is speaking or not in a series of video frames. Previous works have dealt with the task by exploring network architectures while learning effective representations has been less explored. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is only applied to part of the full segments where a person on the screen is actually speaking. This encourages the model to learn effective representations through the natural correspondence of speech and facial movements. Our loss can be jointly optimized with the existing objectives for training ASD models without the need for additional supervision or training data. The experiments demonstrate that our loss can be easily integrated into the existing ASD frameworks, improving their performance. Our method achieves state-of-the-art performances on AVA-ActiveSpeaker and ASW datasets.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.11674 [pdf, other]

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Authors: Haoran Xu, Young Jin Kim, Amr Sharaf, Hany Hassan Awadalla

Abstract: Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially those with moderate model sizes (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data. We introduce the LLM developed through this strategy as Advanced Language Model-based trAnslator (ALMA). Based on LLaMA-2 as our underlying model, our results show that the model can achieve an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. The performance is significantly better than all prior work and even superior to the NLLB-54B model and GPT-3.5-text-davinci-003, with only 7B or 13B parameters. This method establishes the foundation for a novel training paradigm in machine translation.

Submitted 6 February, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: Accepted at ICLR 2024

arXiv:2309.08732 [pdf, other]

The path to detecting extraterrestrial life with astrophotonics

Authors: Nemanja Jovanovic, Yinzi Xin, Michael P. Fitzgerald, Olivier Guyon, Peter Tuthill, Barnaby Norris, Pradip Gatkine, Greg Sercel, Svarun Soda, Yoo Jung Kim, Jonathan Lin, Sergio Leon-Saval, Rodrigo Amezcua-Correa, Stephanos Yerolatsitis, Julien Lozi, Sebastien Vievard, Chris Betters, Steph Sallum, Daniel Levinstein, Dimitri Mawet, Jeffrey Jewell, J. Kent Wallace, Nick Cvetojevic

Abstract: Astrophysical research into exoplanets has delivered thousands of confirmed planets orbiting distant stars. These planets span a wide ranges of size and composition, with diversity also being the hallmark of system configurations, the great majority of which do not resemble our own solar system. Unfortunately, only a handful of the known planets have been characterized spectroscopically thus far, leaving a gaping void in our understanding of planetary formation processes and planetary types. To make progress, astronomers studying exoplanets will need new and innovative technical solutions. Astrophotonics -- an emerging field focused on the application of photonic technologies to observational astronomy -- provides one promising avenue forward. In this paper we discuss various astrophotonic technologies that could aid in the detection and subsequent characterization of planets and in particular themes leading towards the detection of extraterrestrial life.

Submitted 15 September, 2023; originally announced September 2023.

Comments: 9 pages, 2 figures, SPIE Optics and Photonics conference

Report number: 12680-17

arXiv:2308.15772 [pdf, other]

Task-Based MoE for Multitask Multilingual Machine Translation

Authors: Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla

Abstract: Mixture-of-experts (MoE) architecture has been proven a powerful method for diverse tasks in training deep models in many applications. However, current MoE implementations are task agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different granular levels with shared dynamic task-based adapters. Our experiments and analysis show the advantages of our approaches over the dense and canonical MoE models on multi-task multilingual machine translations. With task-specific adapters, our models can additionally generalize to new tasks efficiently.

Submitted 24 October, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.14292 [pdf, other]

Photonic spectro-interferometry with SCExAO/FIRST at the Subaru Telescope: towards H-alpha imaging of protoplanets

Authors: Sébastien Vievard, Manon Lallement, Elsa Huby, Sylvestre Lacour, Olivier Guyon, Nemanja Jovanovic, Sergio Leon-saval, Julien Lozi, Vincent Deo, Kyohoon Ahn, Nick Cvetojevic, Kevin Barjot, Guillermo Martin, Harry-Dean Kenchington-Goldsmith, Gaspard Duchêne, Takayuki Kotani, Franck Marchis, Daniel Rouan, Michael Fitzgerald, Steph Sallum, Barnaby Norris, Chris Betters, Pradip Gatkine, John Lin, Yoo Jung Kim, et al. (5 additional authors not shown)

Abstract: FIRST is a post Extreme Adaptive-Optics (ExAO) spectro-interferometer operating in the Visible (600-800 nm, R~400). Its exquisite angular resolution (a sensitivity analysis of on-sky data shows that bright companions can be detected down to 0.25lambda/D) combined with its sensitivity to pupil phase discontinuities (from a few nm up to dozens of microns) makes FIRST an ideal self-calibrated solution for enabling exoplanet detection and characterization in the future. We present the latest on-sky results along with recent upgrades, including the integration and on-sky test of a new spectrograph (R~3,600) optimized for the detection of H-alpha emission from young exoplanets accreting matter.

Submitted 28 August, 2023; originally announced August 2023.

Comments: Proceedings published in SPIE optics + Photonics (2023) Session "Instrumentation for exoplanet"

arXiv:2308.13539 [pdf, other]

Redefining Computer Science Education: Code-Centric to Natural Language Programming with AI-Based No-Code Platforms

Authors: David Y. J. Kim

Abstract: This paper delves into the evolving relationship between humans and computers in the realm of programming. Historically, programming has been a dialogue where humans meticulously crafted communication to suit machine understanding, shaping the trajectory of computer science education. However, the advent of AI-based no-code platforms is revolutionizing this dynamic. Now, humans can converse in their natural language, expecting machines to interpret and act. This shift has profound implications for computer science education. As educators, it's imperative to integrate this new dynamic into curricula. In this paper, we've explored several pertinent research questions in this transformation, which demand continued inquiry and adaptation in our educational strategies.

Submitted 18 August, 2023; originally announced August 2023.

Comments: 7 pages, 1 figure

arXiv:2308.09723 [pdf, other]

FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs

Authors: Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

Abstract: Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements. Furthermore, the latest generative models suffer from high inference costs caused by the memory bandwidth bottleneck in the auto-regressive decoding process. To address these issues, we propose an efficient weight-only quantization method that reduces memory consumption and accelerates inference for LLMs. To ensure minimal quality degradation, we introduce a simple and effective heuristic approach that utilizes only the model weights of a pre-trained model. This approach is applicable to both Mixture-of-Experts (MoE) and dense models without requiring additional fine-tuning. To demonstrate the effectiveness of our proposed method, we first analyze the challenges and issues associated with LLM quantization. Subsequently, we present our heuristic approach, which adaptively finds the granularity of quantization, effectively addressing these problems. Furthermore, we implement highly efficient GPU GEMMs that perform on-the-fly matrix multiplication and dequantization, supporting the multiplication of fp16 or bf16 activations with int8 or int4 weights. We evaluate our approach on large-scale open source models such as OPT-175B and internal MoE models, showcasing minimal accuracy loss while achieving up to 3.65 times higher throughput on the same number of GPUs.

Submitted 16 August, 2023; originally announced August 2023.

Showing 1–50 of 406 results for author: Kim, Y J