-
INDUS: Effective and Efficient Language Models for Scientific Applications
Authors:
Bishwaranjan Bhattacharjee,
Aashka Trivedi,
Masayasu Muraoka,
Muthukumaran Ramasubramanian,
Takuma Udagawa,
Iksha Gurung,
Nishan Pantha,
Rong Zhang,
Bharath Dandala,
Rahul Ramachandran,
Manil Maskey,
Kaylin Bugbee,
Mike Little,
Elizabeth Fancher,
Irina Gerasimov,
Armin Mehrabian,
Lauren Sanders,
Sylvain Costes,
Sergi Blanco-Cuaresma,
Kelly Lockhart,
Thomas Allen,
Felix Grezes,
Megan Ansdell,
Alberto Accomazzi,
Yousef El-Kurdi
, et al. (11 additional authors not shown)
Abstract:
Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this insight, we developed INDUS, a comprehensive suite of LLMs tailored for the closely-related domains of Earth science, biology, physics, heliophysics, planetary sciences and astrophysics, and trained using curated scientific corpora drawn from diverse data sources. The suite of models includes: (1) an encoder model trained using domain-specific vocabulary and corpora to address NLP tasks, (2) a contrastive-learning based text embedding model trained using a diverse set of datasets to address information retrieval tasks and (3) smaller versions of these models created using knowledge distillation for applications which have latency or resource constraints. We also created three new scientific benchmark datasets, CLIMATE-CHANGE NER (entity-recognition), NASA-QA (extractive QA) and NASA-IR (IR) to accelerate research in these multi-disciplinary fields. We show that our models outperform both general-purpose (RoBERTa) and domain-specific (SCIBERT) encoders on these new tasks as well as existing tasks in the domains of interest. Furthermore, we demonstrate the use of these models in two industrial settings -- as a retrieval model for large-scale vector search applications and in automatic content tagging systems.
Submitted 30 October, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Generative Unlearning for Any Identity
Authors:
Juwon Seo,
Sung-Hoon Lee,
Tae-Young Lee,
Seungjun Moon,
Gyeong-Moon Park
Abstract:
Recent advances in generative models trained on large-scale datasets have made it possible to synthesize high-quality samples across various domains. Moreover, the emergence of strong inversion networks enables not only a reconstruction of real-world images but also the modification of attributes through various editing methods. However, in certain domains related to privacy issues, e.g., human faces, advanced generative models along with strong inversion methods can lead to potential misuses. In this paper, we propose an essential yet under-explored task called generative identity unlearning, which steers the model not to generate an image of a specific identity. In the generative identity unlearning, we target the following objectives: (i) preventing the generation of images with a certain identity, and (ii) preserving the overall quality of the generative model. To satisfy these goals, we propose a novel framework, Generative Unlearning for Any Identity (GUIDE), which prevents the reconstruction of a specific identity by unlearning the generator with only a single image. GUIDE consists of two parts: (i) finding a target point for optimization that un-identifies the source latent code and (ii) novel loss functions that facilitate the unlearning procedure while less affecting the learned distribution. Our extensive experiments demonstrate that our proposed method achieves state-of-the-art performance in the generative machine unlearning task. The code is available at https://github.com/KHU-AGI/GUIDE.
Submitted 16 May, 2024;
originally announced May 2024.
-
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
Authors:
JoonHo Lee,
Jae Oh Woo,
Juree Seok,
Parisa Hassanzadeh,
Wooseok Jang,
JuYoun Son,
Sima Didari,
Baruch Gutow,
Heng Hao,
Hankyu Moon,
Wenjun Hu,
Yeong-Dae Kwon,
Taehee Lee,
Seungjai Min
Abstract:
Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for the quality of paired responses based on Bayesian approximation. Trained with preference datasets, our uncertainty-enabled proxy not only scores rewards for responses but also evaluates their inherent uncertainty. Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training. Our method boosts the instruction following capability of language models by refining data curation for training and improving policy optimization objectives, thereby surpassing existing methods by a large margin on benchmarks such as Vicuna and MT-bench. These findings highlight that our proposed approach substantially advances language model training and paves a new way of harnessing uncertainty within language models.
Submitted 31 January, 2025; v1 submitted 10 May, 2024;
originally announced May 2024.
-
The Simons Observatory: Design, integration, and testing of the small aperture telescopes
Authors:
Nicholas Galitzki,
Tran Tsan,
Jake Spisak,
Michael Randall,
Max Silva-Feaver,
Joseph Seibert,
Jacob Lashner,
Shunsuke Adachi,
Sean M. Adkins,
Thomas Alford,
Kam Arnold,
Peter C. Ashton,
Jason E. Austermann,
Carlo Baccigalupi,
Andrew Bazarko,
James A. Beall,
Sanah Bhimani,
Bryce Bixler,
Gabriele Coppi,
Lance Corbett,
Kevin D. Crowley,
Kevin T. Crowley,
Samuel Day-Weiss,
Simon Dicker,
Peter N. Dow
, et al. (55 additional authors not shown)
Abstract:
The Simons Observatory (SO) is a cosmic microwave background (CMB) survey experiment that includes small-aperture telescopes (SATs) observing from an altitude of 5,200 m in the Atacama Desert in Chile. The SO SATs will cover six spectral bands between 27 and 280 GHz to search for primordial B-modes to a sensitivity of $σ(r)=0.002$, with quantified systematic errors well below this value. Each SAT is a self-contained cryogenic telescope with a 35$^\circ$ field of view, 42 cm diameter optical aperture, 40 K half-wave plate, 1 K refractive optics, and $<0.1$ K focal plane that holds $>12,000$ TES detectors. We describe the nominal design of the SATs and present details about the integration and testing for one operating at 93 and 145 GHz.
Submitted 10 May, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
A Method of Measuring TES Complex ETF Response in Frequency-domain Multiplexed Readout by Single Sideband Power Modulation
Authors:
Yu Zhou,
Tijmen de Haan,
Hiroki Akamatsu,
Daisuke Kaneko,
Masashi Hazumi,
Masaya Hasegawa,
Aritoki Suzuki,
Adrian T. Lee
Abstract:
The digital frequency domain multiplexing (DfMux) technique is widely used for astrophysical instruments with large detector arrays. Detailed detector characterization is required for instrument calibration and systematics control. We conduct the TES complex electrothermal-feedback (ETF) response measurement with the DfMux readout system as follows. By injecting a single sideband signal, we induce modulation in TES power dissipation over a frequency range encompassing the detector response. The modulated current signal induced by TES heating effect is measured, allowing for the ETF response characterization of the detector. With the injection of an upper sideband, the TES readout current shows both an upper and a lower sideband. We model the upper and lower sideband complex ETF response and verify the model by fitting to experimental data. The model not only can fit for certain physical parameters of the detector, such as loop gain, temperature sensitivity, current sensitivity, and time constant, but also enables us to estimate the systematic effect introduced by the multiplexed readout. The method is therefore useful for in-situ detector calibration and for estimating systematic effects during astronomical telescope observations, such as those performed by the upcoming LiteBIRD satellite.
Submitted 8 May, 2024;
originally announced May 2024.
-
Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation
Authors:
Yihao Zhou,
Timothy Tin-Yan Lee,
Kelly Ka-Lee Lai,
Chonglin Wu,
Hin Ting Lau,
De Yang,
Chui-Yi Chan,
Winnie Chiu-Wing Chu,
Jack Chun-Yiu Cheng,
Tsz-Ping Lam,
Yong-Ping Zheng
Abstract:
The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of measuring spinal curvature is still carried out manually. Consequently, there is a considerable demand for a fully automatic system that can locate bony landmarks and perform angle measurements. To this end, we introduce an estimation model for automatic ultrasound curve angle (UCA) measurement. The model employs a dual-branch network to detect candidate landmarks and perform vertebra segmentation on ultrasound coronal images. An affinity clustering strategy is utilized within the vertebral segmentation area to illustrate the affinity relationship between candidate landmarks. Subsequently, we can efficiently perform line delineation from a clustered affinity map for UCA measurement. As our method is specifically designed for UCA calculation, this method outperforms other state-of-the-art methods for landmark and line detection tasks. The high correlation between the automatic UCA and Cobb angle (R$^2$=0.858) suggests that our proposed method can potentially replace manual UCA measurement in ultrasound scoliosis assessment.
Submitted 6 May, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
-
Ancient mean curvature flows with finite total curvature
Authors:
Kyeongsu Choi,
Jiuzhou Huang,
Taehun Lee
Abstract:
We construct an $I$-family of ancient graphical mean curvature flows over a minimal hypersurface in $\mathbb{R}^{n+1}$ of finite total curvature with the Morse index $I$ by establishing exponentially fast convergence in terms of $|x|^2-t$. As a corollary, we show that these ancient flows have finite total curvature and finite mass drop. Moreover, one family of these flows is mean convex by a pointwise estimate.
Submitted 2 May, 2024;
originally announced May 2024.
-
Finite distance problem on the moduli of non-Kähler Calabi--Yau $\partial\bar{\partial}$-threefolds
Authors:
Tsung-Ju Lee
Abstract:
In this article, we study the finite distance problem with respect to the period-map metric on the moduli of non-Kähler Calabi--Yau $\partial\bar{\partial}$-threefolds via Hodge theory. We extended C.-L. Wang's finite distance criterion for one-parameter degenerations to the present setting. As a byproduct, we also obtained a sufficient condition for a non-Kähler Calabi--Yau to support the $\partial\bar{\partial}$-lemma which generalizes the results by Friedman and Li. We also proved that the non-Kähler Calabi--Yau threefolds constructed by Hashimoto and Sano support the $\partial\bar{\partial}$-lemma.
Submitted 29 April, 2024;
originally announced April 2024.
-
Low-rank Matrix Bandits with Heavy-tailed Rewards
Authors:
Yue Kang,
Cho-Jui Hsieh,
Thomas C. M. Lee
Abstract:
In stochastic low-rank matrix bandit, the expected reward of an arm is equal to the inner product between its feature matrix and some unknown $d_1$ by $d_2$ low-rank parameter matrix $Θ^*$ with rank $r \ll d_1\wedge d_2$. While all prior studies assume the payoffs are mixed with sub-Gaussian noises, in this work we loosen this strict assumption and consider the new problem of low-rank matrix bandit with heavy-tailed rewards (LowHTR), where the rewards only have finite $(1+δ)$ moment for some $δ\in (0,1]$. By utilizing the truncation on observed payoffs and the dynamic exploration, we propose a novel algorithm called LOTUS attaining the regret bound of order $\tilde O(d^{\frac{3}{2}}r^{\frac{1}{2}}T^{\frac{1}{1+δ}}/\tilde{D}_{rr})$ without knowing $T$, which matches the state-of-the-art regret bound under sub-Gaussian noises \citep{lu2021low,kang2022efficient} with $δ= 1$. Moreover, we establish a lower bound of the order $Ω(d^{\frac{δ}{1+δ}} r^{\frac{δ}{1+δ}} T^{\frac{1}{1+δ}}) = Ω(T^{\frac{1}{1+δ}})$ for LowHTR, which indicates our LOTUS is nearly optimal in the order of $T$. In addition, we improve LOTUS so that it does not require knowledge of the rank $r$ with $\tilde O(dr^{\frac{3}{2}}T^{\frac{1+δ}{1+2δ}})$ regret bound, and it is efficient under the high-dimensional scenario. We also conduct simulations to demonstrate the practical superiority of our algorithm.
Submitted 26 April, 2024;
originally announced April 2024.
-
Generalized boost transformations in finite volumes and application to Hamiltonian methods
Authors:
Yan Li,
Jia-Jun Wu,
T. -S. H. Lee,
R. D. Young
Abstract:
The investigation of hadron interactions within lattice QCD has been facilitated by the well-known quantisation condition, linking scattering phase shifts to finite-volume energies. Additionally, the ability to utilise systems at finite total boosts has been pivotal in smoothly charting the energy-dependent behaviour of these phase shifts. The existing implementations of the quantization condition at finite boosts rely on momentum transformations between rest and moving frames, defined directly in terms of the energy eigenvalues. This energy dependence is unsuitable in the formulation of a Hamiltonian. In this work, we introduce a novel approach to generalise the three-momentum boost prescription, enabling the incorporation of energy-independent finite-volume Hamiltonians within moving frames. We demonstrate the application of our method through numerical comparisons, employing a phenomenological $ππ$ scattering example.
Submitted 25 April, 2024;
originally announced April 2024.
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Authors:
Marah Abdin,
Jyoti Aneja,
Hany Awadalla,
Ahmed Awadallah,
Ammar Ahmad Awan,
Nguyen Bach,
Amit Bahree,
Arash Bakhtiari,
Jianmin Bao,
Harkirat Behl,
Alon Benhaim,
Misha Bilenko,
Johan Bjorck,
Sébastien Bubeck,
Martin Cai,
Qin Cai,
Vishrav Chaudhary,
Dong Chen,
Dongdong Chen,
Weizhu Chen,
Yen-Chun Chen,
Yi-Ling Chen,
Hao Cheng,
Parul Chopra,
Xiyang Dai
, et al. (104 additional authors not shown)
Abstract:
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75%, 78% on MMLU, and 8.7, 8.9 on MT-bench). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. The phi-3.5-MoE, a 16 x 3.8B MoE model with 6.6 billion active parameters, achieves superior performance in language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels in reasoning tasks and is adept at handling both single-image and text prompts, as well as multi-image and text prompts.
Submitted 30 August, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Fidelitous Augmentation of Human Accelerometric Data for Deep Learning
Authors:
Tracey K. M. Lee,
H. W. Chan,
K. H. Leo,
Effie Chew,
L. Zhao,
Saeid Sanei
Abstract:
Time series (TS) data have consistently been in short supply, yet their demand remains high for training systems in prediction, modeling, classification, and various other applications. Synthesis can serve to expand the sample population, yet it is crucial to maintain the statistical characteristics between the synthesized and the original TS: this ensures consistent sampling of data for both training and testing purposes. However, the time domain features of the data may not be maintained. This motivates our work, the objective of which is to preserve the following features in a synthesized TS: its fundamental statistical characteristics and important time domain features like its general shape and prominent transients. In a novel way, we first isolate important TS features into various components using a spectrogram and singular spectrum analysis. The residual signal is then randomized in a way that preserves its statistical properties. These components are then recombined for the synthetic time series. Using accelerometer data in a clinical setting, we use statistical and shape measures to compare our method to others. We show it has higher fidelity to the original signal features, has good diversity and performs better data classification in a deep learning application.
Submitted 22 April, 2024;
originally announced April 2024.
-
Object Remover Performance Evaluation Methods using Class-wise Object Removal Images
Authors:
Changsuk Oh,
Dongseok Shim,
Taekbeom Lee,
H. Jin Kim
Abstract:
Object removal refers to the process of erasing designated objects from an image while preserving the overall appearance, and it is one area where image inpainting is widely used in real-world applications. The performance of an object remover is quantitatively evaluated by measuring the quality of object removal results, similar to how the performance of an image inpainter is gauged. Current works reporting quantitative performance evaluations utilize original images as references. In this letter, to show that the current evaluation methods cannot properly evaluate the performance of an object remover, we create a dataset with object removal ground truth and compare the evaluations made by the current methods using original images to those utilizing object removal ground truth images. The disparities between the two evaluation sets validate that the current methods are not suitable for measuring the performance of an object remover. Additionally, we propose new evaluation methods tailored to gauge the performance of an object remover. The proposed methods evaluate the performance through class-wise object removal results and utilize images without the target class objects as a comparison set. We confirm that the proposed methods can make judgments consistent with human evaluators in the COCO dataset, and that they can produce measurements aligning with those using object removal ground truth in the self-acquired dataset.
Submitted 17 April, 2024;
originally announced April 2024.
-
AutoGFI: Streamlined Generalized Fiducial Inference for Modern Inference Problems in Models with Additive Errors
Authors:
Wei Du,
Jan Hannig,
Thomas C. M. Lee,
Yi Su,
Chunzhe Zhang
Abstract:
The concept of fiducial inference was introduced by R. A. Fisher in the 1930s to address the perceived limitations of Bayesian inference, particularly the need for subjective prior distributions in cases with limited prior information. However, Fisher's fiducial approach lost favor due to complications, especially in multi-parameter problems. With renewed interest in fiducial inference in the 2000s, generalized fiducial inference (GFI) emerged as a promising extension of Fisher's ideas, offering new solutions for complex inference challenges. Despite its potential, GFI's adoption has been hindered by demanding mathematical derivations and complex implementation requirements, such as Markov Chain Monte Carlo (MCMC) algorithms. This paper introduces AutoGFI, a streamlined variant of GFI designed to simplify its application across various inference problems with additive noise. AutoGFI's accessibility lies in its simplicity, requiring only a fitting routine, which makes it a feasible option for a wider range of researchers and practitioners. To demonstrate its efficacy, AutoGFI is applied to three challenging problems: tensor regression, matrix completion, and network cohesion regression. These case studies showcase AutoGFI's competitive performance against specialized solutions, highlighting its potential to broaden the application of GFI in practical domains, ultimately enriching the statistical inference toolkit.
Submitted 24 December, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Hyperparameter Selection in Continual Learning
Authors:
Thomas L. Lee,
Sigrid Passano Hellan,
Linus Ericsson,
Elliot J. Crowley,
Amos Storkey
Abstract:
In continual learning (CL) -- where a learner trains on a stream of data -- standard hyperparameter optimisation (HPO) cannot be applied, as a learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparameter settings. However, this end-of-training HPO is unusable in practice since a learner can only see the stream once. Hence, there is an open question: what HPO framework should a practitioner use for a CL problem in reality? This paper looks at this question by comparing several realistic HPO frameworks. We find that none of the HPO frameworks considered, including end-of-training HPO, perform consistently better than the rest on popular CL benchmarks. We therefore arrive at a twofold conclusion: a) to be able to discriminate between HPO frameworks there is a need to move beyond the current most commonly used CL benchmarks, and b) on the popular CL benchmarks examined, a CL practitioner should use a realistic HPO framework and can select it based on factors separate from performance, for example compute efficiency.
Submitted 14 March, 2025; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Classification of Nasopharyngeal Cases using DenseNet Deep Learning Architecture
Authors:
W. S. H. M. W. Ahmad,
M. F. A. Fauzi,
M. K. Abdullahi,
Jenny T. H. Lee,
N. S. A. Basry,
A Yahaya,
A. M. Ismail,
A. Adam,
Elaine W. L. Chan,
F. S. Abas
Abstract:
Nasopharyngeal carcinoma (NPC) is one of the understudied yet deadliest cancers in South East Asia. In Malaysia, the prevalence is identified mainly in Sarawak, among the Bidayuh ethnic group. NPC is often diagnosed late because it is asymptomatic at the early stage. There are several tissue representations from the nasopharynx biopsy, such as nasopharyngeal inflammation (NPI), lymphoid hyperplasia (LHP), nasopharyngeal carcinoma (NPC) and normal tissue. This paper is our first initiative to identify the difference between NPC, NPI and normal cases. Seven whole slide images (WSIs) with gigapixel resolutions from seven different patients and two hospitals were experimented with using two test setups, each consisting of a different set of images. The tissue regions are patched into smaller blocks and classified using DenseNet architecture with 21 dense layers. Two tests are carried out: a proof of concept (Test 1) and a real-test scenario (Test 2). The accuracy achieved for NPC class is 94.8% for Test 1 and 67.0% for Test 2.
Submitted 4 April, 2024;
originally announced April 2024.
-
Mass calibration of DES Year-3 clusters via SPT-3G CMB cluster lensing
Authors:
B. Ansarinejad,
S. Raghunathan,
T. M. C. Abbott,
P. A. R. Ade,
M. Aguena,
O. Alves,
A. J. Anderson,
F. Andrade-Oliveira,
M. Archipley,
L. Balkenhol,
K. Benabed,
A. N. Bender,
B. A. Benson,
E. Bertin,
F. Bianchini,
L. E. Bleem,
S. Bocquet,
F. R. Bouchet,
D. Brooks,
L. Bryant,
D. L. Burke,
E. Camphuis,
J. E. Carlstrom,
A. Carnero Rosell,
J. Carretero
, et al. (120 additional authors not shown)
Abstract:
We measure the stacked lensing signal in the direction of galaxy clusters in the Dark Energy Survey Year 3 (DES Y3) redMaPPer sample, using cosmic microwave background (CMB) temperature data from SPT-3G, the third-generation CMB camera on the South Pole Telescope (SPT). We estimate the lensing signal using temperature maps constructed from the initial 2 years of data from the SPT-3G 'Main' survey, covering 1500 deg$^2$ of the Southern sky. We then use this signal as a proxy for the mean cluster mass of the DES sample. In this work, we employ three versions of the redMaPPer catalogue: a Flux-Limited sample containing 8865 clusters, a Volume-Limited sample with 5391 clusters, and a Volume&Redshift-Limited sample with 4450 clusters. For the three samples, we find the mean cluster masses to be ${M}_{200{\rm{m}}}=1.66\pm0.13$ [stat.]$\pm0.03$ [sys.], $1.97\pm0.18$ [stat.]$\pm0.05$ [sys.], and $2.11\pm0.20$ [stat.]$\pm0.05$ [sys.]$\times{10}^{14}\ {\rm{M}}_{\odot }$, respectively. This is a factor of $\sim2$ improvement relative to the precision of measurements with previous generations of SPT surveys and the most constraining cluster mass measurements using CMB cluster lensing to date. Overall, we find no significant tensions between our results and masses given by redMaPPer mass-richness scaling relations of previous works, which were calibrated using CMB cluster lensing, optical weak lensing, and velocity dispersion measurements from various combinations of DES, SDSS and Planck data. We then divide our sample into 3 redshift and 3 richness bins, finding no significant tensions with optical weak-lensing calibrated masses in these bins. We forecast a $5.7\%$ constraint on the mean cluster mass of the DES Y3 sample with the complete SPT-3G surveys when using both temperature and polarization data and including an additional $\sim1400$ deg$^2$ of observations from the 'Extended' SPT-3G survey.
Submitted 12 June, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
Authors:
Taeckyung Lee,
Sorn Chottananurak,
Taesik Gong,
Sung-Ju Lee
Abstract:
Test-time adaptation (TTA) has emerged as a viable solution to adapt pre-trained models to domain shifts using unlabeled test data. However, TTA faces challenges of adaptation failures due to its reliance on blind adaptation to unknown test samples in dynamic scenarios. Traditional methods for out-of-distribution performance estimation are limited by unrealistic assumptions in the TTA context, such as requiring labeled data or re-training models. To address this issue, we propose AETTA, a label-free accuracy estimation algorithm for TTA. We propose the prediction disagreement as the accuracy estimate, calculated by comparing the target model prediction with dropout inferences. We then improve the prediction disagreement to extend the applicability of AETTA under adaptation failures. Our extensive evaluation with four baselines and six TTA methods demonstrates that AETTA shows an average of 19.8%p more accurate estimation compared with the baselines. We further demonstrate the effectiveness of accuracy estimation with a model recovery case study, showcasing the practicality of our model recovery based on accuracy estimation. The source code is available at https://github.com/taeckyung/AETTA.
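The prediction-disagreement idea in this abstract can be illustrated with a minimal sketch: compare a model's deterministic predictions with predictions from several dropout-perturbed forward passes, and use the mean agreement rate as a label-free accuracy proxy. The toy linear classifier, the dropout applied to input features, and the names `predict` and `estimate_accuracy_by_disagreement` are assumptions made for illustration. This is not the AETTA implementation, which the authors publish at the repository linked in the abstract.

```python
import numpy as np

def predict(weights, x, rng=None, drop_p=0.0):
    """Toy linear classifier (hypothetical); optionally applies dropout to the inputs."""
    if rng is not None and drop_p > 0.0:
        mask = rng.random(x.shape) >= drop_p      # keep each feature with prob 1 - drop_p
        x = x * mask / (1.0 - drop_p)             # inverted-dropout rescaling
    return np.argmax(x @ weights, axis=1)

def estimate_accuracy_by_disagreement(weights, x, n_dropout=10, drop_p=0.3, seed=0):
    """Mean agreement between the deterministic prediction and dropout inferences.

    No labels are used: high agreement is taken as a proxy for high accuracy.
    """
    rng = np.random.default_rng(seed)
    base = predict(weights, x)
    agreement = [
        np.mean(predict(weights, x, rng, drop_p) == base)
        for _ in range(n_dropout)
    ]
    return float(np.mean(agreement))
```

The estimate lies in [0, 1] and equals 1.0 when dropout is disabled, since every inference then matches the deterministic prediction.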
Submitted 1 April, 2024;
originally announced April 2024.
-
Learning Service Selection Decision Making Behaviors During Scientific Workflow Development
Authors:
Xihao Xie,
Jia Zhang,
Rahul Ramachandran,
Tsengdar J. Lee,
Seungwon Lee
Abstract:
Increasingly, more software services have been published onto the Internet, making it a big challenge to recommend services in the process of a scientific workflow composition. In this paper, a novel context-aware approach is proposed for recommending next services in a workflow development process, through learning service representation and service selection decision making behaviors from workflow provenance. Inspired by natural language sentence generation, the composition process of a scientific workflow is formalized as a step-wise procedure within the context of the goal of the workflow, and the problem of next service recommendation is mapped to next word prediction. Historical service dependencies are first extracted from scientific workflow provenance to build a knowledge graph. Service sequences are then generated based on diverse composition path generation strategies. Afterwards, the generated corpus of composition paths is leveraged to study previous decision making strategies. The resulting goal-oriented next service prediction model is then used to recommend the top K candidate services during the workflow composition process. Extensive experiments on a real-world repository have demonstrated the effectiveness of this approach.
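The mapping from next-service recommendation to next-word prediction can be sketched with a simple bigram count model over service sequences. This is a deliberately simplified stand-in: the paper describes a learned, goal-conditioned model built on a knowledge graph, whereas the sketch below only counts which service followed which. The service names and the functions `train_bigram` and `recommend_next` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(paths):
    """Count how often each service is followed by each other service."""
    counts = defaultdict(Counter)
    for path in paths:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return counts

def recommend_next(counts, current, k=3):
    """Return up to k services most frequently observed after `current`."""
    followers = counts.get(current, Counter())
    return [service for service, _ in followers.most_common(k)]

# Hypothetical composition paths, as might be extracted from workflow provenance.
paths = [
    ["load", "clean", "plot"],
    ["load", "clean", "regrid"],
    ["load", "regrid", "plot"],
]
model = train_bigram(paths)
top_after_load = recommend_next(model, "load", k=1)
```

Here `top_after_load` holds the single service seen most often after `load`. A service that never appeared in the corpus yields an empty recommendation list.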
Submitted 30 March, 2024;
originally announced April 2024.
-
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks
Authors:
Hyunjae Kim,
Hyeon Hwang,
Jiwoo Lee,
Sihyeon Park,
Dain Kim,
Taewhoo Lee,
Chanwoong Yoon,
Jiwoong Sohn,
Donghee Choi,
Jaewoo Kang
Abstract:
While recent advancements in commercial large language models (LM) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving complex medical problems. To address this, we introduce Meerkat, a new family of medical AI systems ranging from 7 to 70 billion parameters. The models were trained using our new synthetic dataset consisting of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets. Our systems achieved remarkable accuracy across six medical benchmarks, surpassing the previous best models such as MediTron and BioMistral, and GPT-3.5 by a large margin. Notably, Meerkat-7B surpassed the passing threshold of the United States Medical Licensing Examination (USMLE) for the first time for a 7B-parameter model, while Meerkat-70B outperformed GPT-4 by an average of 1.3%. Additionally, Meerkat-70B correctly diagnosed 21 out of 38 complex clinical cases, outperforming humans' 13.8 and closely matching GPT-4's 21.8. Our systems offered more detailed free-form responses to clinical queries compared to existing small models, approaching the performance level of large commercial models. This significantly narrows the performance gap with large LMs, showcasing its effectiveness in addressing complex medical challenges.
Submitted 30 June, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
Grid Diffusion Models for Text-to-Video Generation
Authors:
Taegyeong Lee,
Soyeong Kwon,
Taehwan Kim
Abstract:
Recent advances in diffusion models have significantly improved text-to-image generation. However, generating videos from text is a more challenging task than generating images from text, due to the much larger dataset and higher computational cost required. Most existing video generation methods use either a 3D U-Net architecture that considers the temporal dimension or autoregressive generation. These methods require large datasets and are limited in terms of computational costs compared to text-to-image generation. To tackle these challenges, we propose a simple but effective novel grid diffusion for text-to-video generation that requires neither a temporal dimension in the architecture nor a large text-video paired dataset. We can generate a high-quality video using a fixed amount of GPU memory regardless of the number of frames by representing the video as a grid image. Additionally, since our method reduces the dimensions of the video to the dimensions of the image, various image-based methods can be applied to videos, such as text-guided video manipulation from image manipulation. Our proposed method outperforms the existing methods in both quantitative and qualitative evaluations, demonstrating the suitability of our model for real-world video generation.
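Representing a video as a grid image amounts to tiling its frames into one larger image, so that image-only models can process it. The sketch below shows one way to do this and to undo it with NumPy. The row-major tiling order and the function names `frames_to_grid` and `grid_to_frames` are assumptions for illustration, since the abstract does not specify the layout the authors use.

```python
import numpy as np

def frames_to_grid(frames, rows, cols):
    """Tile frames of shape (T, H, W, C) into one image of shape (rows*H, cols*W, C)."""
    t, h, w, c = frames.shape
    if t != rows * cols:
        raise ValueError("number of frames must equal rows * cols")
    grid = frames.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)

def grid_to_frames(grid, rows, cols):
    """Invert frames_to_grid: split a grid image back into rows*cols frames."""
    gh, gw, c = grid.shape
    h, w = gh // rows, gw // cols
    frames = grid.reshape(rows, h, cols, w, c).transpose(0, 2, 1, 3, 4)
    return frames.reshape(rows * cols, h, w, c)

# Four tiny 2x3 RGB frames arranged as a 2x2 grid (row-major order).
video = np.arange(4 * 2 * 3 * 3).reshape(4, 2, 3, 3)
grid_image = frames_to_grid(video, rows=2, cols=2)
restored = grid_to_frames(grid_image, rows=2, cols=2)
```

The grid has a fixed spatial size for a given layout, and the round trip recovers the original frames exactly.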
Submitted 30 December, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
Stable Surface Regularization for Fast Few-Shot NeRF
Authors:
Byeongin Joung,
Byeong-Uk Lee,
Jaesung Choe,
Ukcheol Shin,
Minjun Kang,
Taeyeop Lee,
In So Kweon,
Kuk-Jin Yoon
Abstract:
This paper proposes an algorithm for synthesizing novel views under a few-shot setup. The main concept is to develop a stable surface regularization technique called Annealing Signed Distance Function (ASDF), which anneals the surface in a coarse-to-fine manner to accelerate convergence speed. We observe that the Eikonal loss - which is a widely known geometric regularization - requires a dense training signal to shape different level-sets of the SDF, leading to low-fidelity results under few-shot training. In contrast, the proposed surface regularization successfully reconstructs scenes and produces high-fidelity geometry with stable training. Our method is further accelerated by utilizing grid representation and monocular geometric priors. Finally, the proposed approach is up to 45 times faster than existing few-shot novel view synthesis methods, and it produces comparable results on the ScanNet and NeRF-Real datasets.
Submitted 29 March, 2024;
originally announced March 2024.
-
Break-for-Make: Modular Low-Rank Adaptations for Composable Content-Style Customization
Authors:
Yu Xu,
Fan Tang,
Juan Cao,
Yuxin Zhang,
Oliver Deussen,
Weiming Dong,
Jintao Li,
Tong-Yee Lee
Abstract:
Personalized generation paradigms empower designers to customize visual intellectual properties with the help of textual descriptions by tuning or adapting pre-trained text-to-image models on a few images. Recent works explore approaches for concurrently customizing both content and detailed visual style appearance. However, these existing approaches often generate images where the content and style are entangled. In this study, we reconsider the customization of content and style concepts from the perspective of parameter space construction. Unlike existing methods that utilize a shared parameter space for content and style, we propose a learning framework that separates the parameter space to facilitate individual learning of content and style, thereby enabling disentangled content and style. To achieve this goal, we introduce "partly learnable projection" (PLP) matrices to separate the original adapters into divided sub-parameter spaces. We propose a "break-for-make" customization learning pipeline based on PLP, which is simple yet effective. We break the original adapters into "up projection" and "down projection", train content and style PLPs individually with the guidance of corresponding textual prompts in the separate adapters, and maintain generalization by employing a multi-correspondence projection learning strategy. Based on the adapters broken apart for separately training content and style, we then make the entity parameter space by reconstructing the content and style PLP matrices, followed by fine-tuning the combined adapter to generate the target object with the desired appearance. Experiments on various styles, including textures, materials, and artistic style, show that our method outperforms state-of-the-art single/multiple concept learning pipelines in terms of content-style-prompt alignment.
Submitted 31 March, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Improving the Bit Complexity of Communication for Distributed Convex Optimization
Authors:
Mehrdad Ghadiri,
Yin Tat Lee,
Swati Padmanabhan,
William Swartworth,
David Woodruff,
Guanghao Ye
Abstract:
We consider the communication complexity of some fundamental convex optimization problems in the point-to-point (coordinator) and blackboard communication models. We strengthen known bounds for approximately solving linear regression, $p$-norm regression (for $1\leq p\leq 2$), linear programming, minimizing the sum of finitely many convex nonsmooth functions with varying supports, and low rank approximation; for a number of these fundamental problems our bounds are nearly optimal, as proven by our lower bounds.
Among our techniques, we use the notion of block leverage scores, which have been relatively unexplored in this context, as well as dropping all but the "middle" bits in Richardson-style algorithms. We also introduce a new communication problem for accurately approximating inner products and establish a lower bound using the spherical Radon transform. Our lower bound can be used to show the first separation of linear programming and linear systems in the distributed model when the number of constraints is polynomial, addressing an open question in prior work.
Submitted 28 March, 2024;
originally announced March 2024.
-
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Authors:
Elliot Bolton,
Abhinav Venigalla,
Michihiro Yasunaga,
David Hall,
Betty Xiong,
Tony Lee,
Roxana Daneshjou,
Jonathan Frankle,
Percy Liang,
Michael Carbin,
Christopher D. Manning
Abstract:
Models such as GPT-4 and Med-PaLM 2 have demonstrated impressive performance on a wide variety of biomedical NLP tasks. However, these models have hundreds of billions of parameters, are computationally expensive to run, require users to send their input data over the internet, and are trained on unknown data sources. Can smaller, more targeted models compete? To address this question, we build and release BioMedLM, a 2.7 billion parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles. When fine-tuned, BioMedLM can produce strong multiple-choice biomedical question-answering results competitive with much larger models, such as achieving a score of 57.3% on MedMCQA (dev) and 69.0% on the MMLU Medical Genetics exam. BioMedLM can also be fine-tuned to produce useful answers to patient questions on medical topics. This demonstrates that smaller models can potentially serve as transparent, privacy-preserving, economical and environmentally friendly foundations for particular NLP applications, such as in biomedicine. The model is available on the Hugging Face Hub: https://huggingface.co/stanford-crfm/BioMedLM.
Submitted 27 March, 2024;
originally announced March 2024.
-
Testing the $\mathbf{Λ}$CDM Cosmological Model with Forthcoming Measurements of the Cosmic Microwave Background with SPT-3G
Authors:
K. Prabhu,
S. Raghunathan,
M. Millea,
G. Lynch,
P. A. R. Ade,
E. Anderes,
A. J. Anderson,
B. Ansarinejad,
M. Archipley,
L. Balkenhol,
K. Benabed,
A. N. Bender,
B. A. Benson,
F. Bianchini,
L. E. Bleem,
F. R. Bouchet,
L. Bryant,
E. Camphuis,
J. E. Carlstrom,
T. W. Cecil,
C. L. Chang,
P. Chaubal,
P. M. Chichura,
T. -L. Chou,
A. Coerver
, et al. (76 additional authors not shown)
Abstract:
We forecast constraints on cosmological parameters enabled by three surveys conducted with SPT-3G, the third-generation camera on the South Pole Telescope. The surveys cover separate regions of 1500, 2650, and 6000 ${\rm deg}^{2}$ to different depths, in total observing 25% of the sky. These regions will be measured to white noise levels of roughly 2.5, 9, and 12 $μ{\rm K-arcmin}$, respectively, in CMB temperature units at 150 GHz by the end of 2024. The survey also includes measurements at 95 and 220 GHz, which have noise levels a factor of ~1.2 and 3.5 times higher than 150 GHz, respectively, with each band having a polarization noise level ~$\sqrt{\text{2}}$ times higher than the temperature noise. We use a novel approach to obtain the covariance matrices for jointly and optimally estimated gravitational lensing potential bandpowers and unlensed CMB temperature and polarization bandpowers. We demonstrate the ability to test the $Λ{\rm CDM}$ model via the consistency of cosmological parameters constrained independently from SPT-3G and Planck data, and consider the improvement in constraints on $Λ{\rm CDM}$ extension parameters from a joint analysis of SPT-3G and Planck data. The $Λ{\rm CDM}$ cosmological parameters are typically constrained with uncertainties up to ~2 times smaller with SPT-3G data, compared to Planck, with the two data sets measuring significantly different angular scales and polarization levels, providing additional tests of the standard cosmological model.
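The survey numbers quoted in this abstract can be tabulated with a short back-of-the-envelope calculation. The sketch below applies the stated scaling factors (about 1.2 and 3.5 for 95 and 220 GHz relative to 150 GHz, and about $\sqrt{2}$ for polarization relative to temperature) and checks the quoted sky fraction against the full-sky area. All values are approximate, as in the abstract; the names `noise_table` and `sky_fraction` and the choice to key surveys by their area are assumptions for illustration.

```python
import math

# 150 GHz temperature white-noise levels (uK-arcmin), keyed by survey area (deg^2).
T150 = {1500: 2.5, 2650: 9.0, 6000: 12.0}
# Approximate noise relative to 150 GHz, per band (GHz), as quoted in the abstract.
BAND_FACTOR = {95: 1.2, 150: 1.0, 220: 3.5}
# Polarization noise is roughly sqrt(2) times the temperature noise.
POL_FACTOR = math.sqrt(2.0)

def noise_table():
    """Map (area, band) to (temperature noise, polarization noise) in uK-arcmin."""
    table = {}
    for area, t150 in T150.items():
        for band, factor in BAND_FACTOR.items():
            temperature = t150 * factor
            table[(area, band)] = (temperature, temperature * POL_FACTOR)
    return table

def sky_fraction():
    """Fraction of the full sky covered by the three non-overlapping regions."""
    full_sky_deg2 = 4.0 * math.pi * (180.0 / math.pi) ** 2  # about 41253 deg^2
    return sum(T150.keys()) / full_sky_deg2
```

The three regions sum to 10150 deg$^2$, which is just under a quarter of the full sky, consistent with the 25% figure quoted in the abstract.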
Submitted 9 September, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Calibration of detector time constant with a thermal source for the POLARBEAR-2A CMB polarization experiment
Authors:
S. Takatori,
M. Hasegawa,
M. Hazumi,
D. Kaneko,
N. Katayama,
A. T. Lee,
S. Takakura,
T. Tomaru,
T. Adkins,
D. Barron,
Y. Chinone,
K. T. Crowley,
T. de Haan,
T. Elleflot,
N. Farias,
C. Feng,
T. Fujino,
J. C. Groh,
H. Hirose,
F. Matsuda,
H. Nishino,
Y. Segawa,
P. Siritanasak,
A. Suzuki,
K. Yamada
Abstract:
The Simons Array (SA) project is a ground-based Cosmic Microwave Background (CMB) polarization experiment. The SA observes the sky using three telescopes, and POLARBEAR-2A (PB-2A) is the receiver system on the first telescope. For the ground-based experiment, atmospheric fluctuation is the primary noise source that could cause polarization leakage. In the PB-2A receiver system, a continuously rotating half-wave plate (HWP) is used to mitigate the polarization leakage. However, due to the rapid modulation of the polarization signal, the uncertainty in the time constant of the detector results in an uncertainty in the polarization angle. For PB-2A, the time constant of each bolometer needs to be calibrated at the sub-millisecond level to avoid introducing bias to the polarization signal. We have developed a new calibrator system that can be used to calibrate the time constants of the detectors. In this study, we present the design of the calibration system and the preliminary results of the time constant calibration for PB-2A.
Submitted 25 March, 2024;
originally announced March 2024.
-
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
Authors:
Ziyao Huang,
Fan Tang,
Yong Zhang,
Xiaodong Cun,
Juan Cao,
Jintao Li,
Tong-Yee Lee
Abstract:
Despite the remarkable progress of talking-head-based avatar-creating solutions, directly generating anchor-style videos with full-body motions remains challenging. In this study, we propose Make-Your-Anchor, a novel system necessitating only a one-minute video clip of an individual for training, subsequently enabling the automatic generation of anchor-style videos with precise torso and hand movements. Specifically, we finetune a proposed structure-guided diffusion model on input video to render 3D mesh conditions into human appearances. We adopt a two-stage training strategy for the diffusion model, effectively binding movements with specific appearances. To produce videos of arbitrary temporal length, we extend the 2D U-Net in the frame-wise diffusion model to a 3D style without additional training cost, and a simple yet effective batch-overlapped temporal denoising module is proposed to bypass the constraints on video length during inference. Finally, a novel identity-specific face enhancement module is introduced to improve the visual quality of facial regions in the output videos. Comparative experiments demonstrate the effectiveness and superiority of the system in terms of visual quality, temporal coherence, and identity preservation, outperforming SOTA diffusion/non-diffusion methods. Project page: https://github.com/ICTMCG/Make-Your-Anchor.
Submitted 25 March, 2024;
originally announced March 2024.
-
Collaborative AI Teaming in Unknown Environments via Active Goal Deduction
Authors:
Zuyuan Zhang,
Hanhan Zhou,
Mahdi Imani,
Taeyoung Lee,
Tian Lan
Abstract:
With the advancements of artificial intelligence (AI), we're seeing more scenarios that require AI to work closely with other agents, whose goals and strategies might not be known beforehand. However, existing approaches for training collaborative agents often require defined and known reward signals and cannot address the problem of teaming with unknown agents that often have latent objectives/rewards. In response to this challenge, we propose a teaming with unknown agents framework, which leverages a kernel density Bayesian inverse learning method for active goal deduction and utilizes pre-trained, goal-conditioned policies to enable zero-shot policy adaptation. We prove that unbiased reward estimates in our framework are sufficient for optimal teaming with unknown agents. We further evaluate the framework on redesigned multi-agent particle and StarCraft II micromanagement environments with diverse unknown agents of different behaviors/rewards. Empirical results demonstrate that our framework significantly advances the teaming performance of AI and unknown agents in a wide range of collaborative scenarios.
Submitted 22 March, 2024;
originally announced March 2024.
-
Certain functional identities on division rings of characteristic two
Authors:
Münevver Pınar Eroğlu,
Tsiu-Kwen Lee,
Jheng-Huei Lin
Abstract:
Let $D$ be a noncommutative division ring. In a recent paper, Lee and Lin proved that if $\text{char}\, D\ne 2$, the only solution of additive maps $f, g$ on $D$ satisfying the identity $f(x) = x^n g(x^{-1})$ on $D\setminus \{0\}$ with $n\ne 2$ a positive integer is the trivial case, that is, $f=0$ and $g=0$. Applying Hua's identity and the theory of functional and generalized polynomial identities, we give a complete solution of the same identity for any nonnegative integer $n$ if $\text{char}\, D=2$.
Submitted 20 March, 2024;
originally announced March 2024.
-
Branching algebras for the general linear Lie superalgebra
Authors:
Soo Teck Lee,
Ruibin Zhang
Abstract:
We develop an algebraic approach to the branching of representations of the general linear Lie superalgebra $\mathfrak{gl}_{p|q}({\mathbb C})$, by constructing certain super commutative algebras whose structure encodes the branching rules. Using this approach, we derive the branching rules for restricting any irreducible polynomial representation $V$ of $\mathfrak{gl}_{p|q}({\mathbb C})$ to a regular subalgebra isomorphic to $\mathfrak{gl}_{r|s}({\mathbb C})\oplus \mathfrak{gl}_{r'|s'}({\mathbb C})$, $\mathfrak{gl}_{r|s}({\mathbb C})\oplus\mathfrak{gl}_1({\mathbb C})^{r'+s'}$ or $\mathfrak{gl}_{r|s}({\mathbb C})$, with $r+r'=p$ and $s+s'=q$. In the case of $\mathfrak{gl}_{r|s}({\mathbb C})\oplus\mathfrak{gl}_1({\mathbb C})^{r'+s'}$ with $s=0$ or $s=1$ but general $r$, we also construct a basis for the space of $\mathfrak{gl}_{r|s}({\mathbb C})$ highest weight vectors in $V$; when $r=s=0$, the branching rule leads to explicit expressions for the weight multiplicities of $V$ in terms of Kostka numbers.
Submitted 17 March, 2024;
originally announced March 2024.
-
Towards Embedding Dynamic Personas in Interactive Robots: Masquerading Animated Social Kinematics (MASK)
Authors:
Jeongeun Park,
Taemoon Jeong,
Hyeonseong Kim,
Taehyun Byun,
Seungyoon Shin,
Keunjun Choi,
Jaewoon Kwon,
Taeyoon Lee,
Matthew Pan,
Sungjoon Choi
Abstract:
This paper presents the design and development of an innovative interactive robotic system to enhance audience engagement using character-like personas. Built upon the foundations of persona-driven dialog agents, this work extends the agent's application to the physical realm, employing robots to provide a more captivating and interactive experience. The proposed system, named the Masquerading Animated Social Kinematics (MASK), leverages an anthropomorphic robot which interacts with guests using non-verbal interactions, including facial expressions and gestures. A behavior generation system based upon a finite-state machine structure effectively conditions robotic behavior to convey distinct personas. The MASK framework integrates a perception engine, a behavior selection engine, and a comprehensive action library to enable real-time, dynamic interactions with minimal human intervention in behavior design. Throughout the user subject studies, we examined whether the users could recognize the intended character in both personality- and film-character-based persona conditions. We conclude by discussing the role of personas in interactive agents and the factors to consider for creating an engaging user experience.
Submitted 7 October, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
RECIPE4U: Student-ChatGPT Interaction Dataset in EFL Writing Education
Authors:
Jieun Han,
Haneul Yoo,
Junho Myung,
Minsun Kim,
Tak Yeon Lee,
So-Yeon Ahn,
Alice Oh
Abstract:
The integration of generative AI in education is expanding, yet empirical analyses of large-scale and real-world interactions between students and AI systems still remain limited. Addressing this gap, we present RECIPE4U (RECIPE for University), a dataset sourced from a semester-long experiment with 212 college students in English as a Foreign Language (EFL) writing courses. During the study, students engaged in dialogues with ChatGPT to revise their essays. RECIPE4U includes comprehensive records of these interactions, including conversation logs, students' intent, students' self-rated satisfaction, and students' essay edit histories. In particular, we annotate the students' utterances in RECIPE4U with 13 intention labels based on our coding schemes. We establish baseline results for two subtasks in task-oriented dialogue systems within educational contexts: intent detection and satisfaction estimation. As a foundational step, we explore student-ChatGPT interaction patterns through RECIPE4U and analyze them by focusing on students' dialogue, essay data statistics, and students' essay edits. We further illustrate potential applications of the RECIPE4U dataset for enhancing the incorporation of LLMs in educational frameworks. RECIPE4U is publicly available at https://zeunie.github.io/RECIPE4U/.
Submitted 13 March, 2024;
originally announced March 2024.
-
Physics-Inspired Deep Learning Anti-Aliasing Framework in Efficient Channel State Feedback
Authors:
Yu-Chien Lin,
Yan Xin,
Ta-Sung Lee,
Charlie Zhang,
Zhi Ding
Abstract:
Acquiring downlink channel state information (CSI) at the base station is vital for optimizing performance in massive multiple-input multiple-output (MIMO) frequency-division duplexing (FDD) systems. While deep learning architectures have been successful in facilitating UE-side CSI feedback and gNB-side recovery, the undersampling issue prior to CSI feedback is often overlooked. This issue, which arises from low-density pilot placement in current standards, results in significant aliasing effects in outdoor channels and consequently limits CSI recovery performance. To this end, this work introduces a new CSI upsampling framework at the gNB as a post-processing solution to address the gaps caused by undersampling. Leveraging the physical principles of the discrete Fourier transform shifting theorem and multipath reciprocity, our framework effectively uses uplink CSI to mitigate aliasing effects. We further develop a learning-based method that integrates the proposed algorithm with the Iterative Shrinkage-Thresholding Algorithm Net (ISTA-Net) architecture, enhancing our approach for non-uniform sampling recovery. Our numerical results show that both our rule-based and deep learning methods significantly outperform traditional interpolation techniques and current state-of-the-art approaches.
Submitted 12 March, 2024;
originally announced March 2024.
-
First Constraints on the Epoch of Reionization Using the non-Gaussianity of the Kinematic Sunyaev-Zel'dovich Effect from the South Pole Telescope and Herschel-SPIRE Observations
Authors:
S. Raghunathan,
P. A. R. Ade,
A. J. Anderson,
B. Ansarinejad,
M. Archipley,
J. E. Austermann,
L. Balkenhol,
J. A. Beall,
K. Benabed,
A. N. Bender,
B. A. Benson,
F. Bianchini,
L. E. Bleem,
J. Bock,
F. R. Bouchet,
L. Bryant,
E. Camphuis,
J. E. Carlstrom,
T. W. Cecil,
C. L. Chang,
P. Chaubal,
H. C. Chiang,
P. M. Chichura,
T. -L. Chou,
R. Citron
, et al. (99 additional authors not shown)
Abstract:
We report results from an analysis aimed at detecting the trispectrum of the kinematic Sunyaev-Zel'dovich (kSZ) effect by combining data from the South Pole Telescope (SPT) and Herschel-SPIRE experiments over a 100 ${\rm deg}^{2}$ field. The SPT observations combine data from the previous and current surveys, namely SPTpol and SPT-3G, to achieve depths of 4.5, 3, and 16 $μ{\rm K-arcmin}$ in bands centered at 95, 150, and 220 GHz. For SPIRE, we include data from the 600 and 857 GHz bands. We reconstruct the velocity-induced large-scale correlation of the small-scale kSZ signal with a quadratic estimator that uses two cosmic microwave background (CMB) temperature maps, constructed by optimally combining data from all the frequency bands. We reject the null hypothesis of a zero trispectrum at the $10.3σ$ level. However, the measured trispectrum contains contributions from both the kSZ and other undesired components, such as CMB lensing and astrophysical foregrounds, with kSZ being sub-dominant. We use the Agora simulations to estimate the expected signal from CMB lensing and astrophysical foregrounds. After accounting for the contributions from CMB lensing and foreground signals, we do not detect an excess kSZ-only trispectrum and use this non-detection to set constraints on reionization. By applying a prior based on observations of the Gunn-Peterson trough, we obtain an upper limit on the duration of reionization of $Δz_{\rm re, 50} < 4.5$ (95% C.L.). We find these constraints are fairly robust to foreground assumptions. This trispectrum measurement is independent of, but consistent with, Planck's optical depth measurement. This result is the first constraint on the epoch of reionization using the non-Gaussian nature of the kSZ signal.
Submitted 15 August, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Exploration of the polarization angle variability of the Crab Nebula with POLARBEAR and its application to the search for axion-like particles
Authors:
Shunsuke Adachi,
Tylor Adkins,
Carlo Baccigalupi,
Yuji Chinone,
Kevin T. Crowley,
Josquin Errard,
Giulio Fabbian,
Chang Feng,
Takuro Fujino,
Masaya Hasegawa,
Masashi Hazumi,
Oliver Jeong,
Daisuke Kaneko,
Brian Keating,
Akito Kusaka,
Adrian T. Lee,
Anto I. Lonappan,
Yuto Minami,
Masaaki Murata,
Lucio Piccirillo,
Christian L. Reichardt,
Praween Siritanasak,
Jacob Spisak,
Satoru Takakura,
Grant P. Teply
, et al. (1 additional author not shown)
Abstract:
The Crab Nebula, also known as Tau A, is a polarized astronomical source at millimeter wavelengths. It has been used as a stable light source for polarization angle calibration in millimeter-wave astronomy. However, it is known that its intensity and polarization vary as a function of time at a variety of wavelengths. Thus, it is of interest to verify the stability of the millimeter-wave polarization. If detected, polarization variability may be used to better understand the dynamics of Tau A and to assess the validity of Tau A as a calibrator. One intriguing application of such observations is the search for axion-like particles (ALPs). Ultralight ALPs couple to photons through a Chern-Simons term, and induce a temporal oscillation in the polarization angle of linearly polarized sources. After assessing a number of systematic errors and testing for internal consistency, we evaluate the variability of the polarization angle of the Crab Nebula using 2015 and 2016 observations with the 150 GHz POLARBEAR instrument. We place a median 95% upper bound of polarization oscillation amplitude $A < 0.065^\circ$ over the oscillation frequencies from $0.75~\mathrm{year}^{-1}$ to $0.66~\mathrm{hour}^{-1}$. Assuming that no sources other than ALPs are causing Tau A's polarization angle variation, that the ALP constitutes all the dark matter, and that the ALP field is a stochastic Gaussian field, this bound translates into a median 95% upper bound of ALP-photon coupling $g_{aγγ} < 2.16\times10^{-12}\,\mathrm{GeV}^{-1}\times(m_a/10^{-21} \mathrm{eV})$ in the mass range from $9.9\times10^{-23} \mathrm{eV}$ to $7.7\times10^{-19} \mathrm{eV}$. This demonstrates that this type of analysis using bright polarized sources is as competitive as analyses using the polarization of the cosmic microwave background in constraining ALPs.
Submitted 19 September, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Dynamical Model of $J/Ψ$ photo-production on the nucleon
Authors:
S. Sakinah,
T. -S. H. Lee,
Ho-Meoyng Choi
Abstract:
A dynamical model based on a phenomenological charm quark-nucleon ($c$-N) potential $v_{cN}$ and the Pomeron-exchange mechanism is constructed to investigate the $J/Ψ$ photo-production on the nucleon from threshold to invariant mass $W=300$ GeV. The $J/Ψ$-N potential, $V_{J/ΨN}(r)$, is constructed by folding $v_{cN}$ into the wavefunction $Φ_{J/Ψ}(c\bar{c})$ of $J/Ψ$ within a Constituent Quark Model (CQM) of Ref. [43]. A photo-production amplitude is also generated by $v_{cN}$ by a $c\bar{c}$-loop integration over the $γ\rightarrow c\bar{c}$ vertex function and $Φ_{J/Ψ}(c\bar{c})$. The commonly used Vector Meson Dominance assumption is not used to define this photo-production amplitude, which is needed to describe the data near the threshold. The potential $v_{cN}(r)$ is parameterized in a form such that the predicted $V_{J/ΨN}(r)$ at large distances has the same Yukawa potential form extracted from a Lattice QCD (LQCD) calculation of Ref. [18]. The parameters of $v_{cN}$ are determined by fitting the total cross section data of JLab by performing calculations that include $J/Ψ$-N final state interactions (FSI). The resulting differential cross sections are found to be in good agreement with the data. It is shown that the FSI effects dominate the cross section in the very near threshold region, allowing for sensitive testing of the predicted $J/Ψ$-N scattering amplitudes. By imposing the constraints of the $J/Ψ$-N potential extracted from the LQCD calculation, we have obtained three $J/Ψ$-N potentials which fit the JLab data equally well. The resulting $J/Ψ$-N scattering lengths are in the range of $a=(-0.05$ fm $\sim$ $-0.25$ fm). With the determined $v_{cN}(r)$ and the wavefunctions generated from the same CQM, the constructed model is used to predict the cross sections of photo-production of $η_c(1S)$ and $Ψ(2S)$ mesons for future experimental tests.
Submitted 10 April, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Differentially Private Synthetic Data via Foundation Model APIs 2: Text
Authors:
Chulin Xie,
Zinan Lin,
Arturs Backurs,
Sivakanth Gopi,
Da Yu,
Huseyin A Inan,
Harsha Nori,
Haotian Jiang,
Huishuai Zhang,
Yin Tat Lee,
Bo Li,
Sergey Yekhanin
Abstract:
Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalable solution. However, existing methods necessitate DP finetuning of large language models (LLMs) on private data to generate DP synthetic data. This approach is not viable for proprietary LLMs (e.g., GPT-3.5) and also demands considerable computational resources for open-source LLMs. Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models. In this work, we propose an augmented PE algorithm, named Aug-PE, that applies to the complex setting of text. We use API access to an LLM and generate DP synthetic text without any model training. We conduct comprehensive experiments on three benchmark datasets. Our results demonstrate that Aug-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines. This underscores the feasibility of relying solely on API access of LLMs to produce high-quality DP synthetic texts, thereby facilitating more accessible routes to privacy-preserving LLM applications. Our code and data are available at https://github.com/AI-secure/aug-pe.
Submitted 23 July, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
A far-ultraviolet-driven photoevaporation flow observed in a protoplanetary disk
Authors:
Olivier Berné,
Emilie Habart,
Els Peeters,
Ilane Schroetter,
Amélie Canin,
Ameek Sidhu,
Ryan Chown,
Emeric Bron,
Thomas J. Haworth,
Pamela Klaassen,
Boris Trahin,
Dries Van De Putte,
Felipe Alarcón,
Marion Zannese,
Alain Abergel,
Edwin A. Bergin,
Jeronimo Bernard-Salas,
Christiaan Boersma,
Jan Cami,
Sara Cuadrado,
Emmanuel Dartois,
Daniel Dicken,
Meriem Elyajouri,
Asunción Fuente,
Javier R. Goicoechea
, et al. (121 additional authors not shown)
Abstract:
Most low-mass stars form in stellar clusters that also contain massive stars, which are sources of far-ultraviolet (FUV) radiation. Theoretical models predict that this FUV radiation produces photo-dissociation regions (PDRs) on the surfaces of protoplanetary disks around low-mass stars, impacting planet formation within the disks. We report JWST and Atacama Large Millimeter Array observations of an FUV-irradiated protoplanetary disk in the Orion Nebula. Emission lines are detected from the PDR; modelling their kinematics and excitation allows us to constrain the physical conditions within the gas. We quantify the mass-loss rate induced by the FUV irradiation, finding it is sufficient to remove gas from the disk in less than a million years. This is rapid enough to affect giant planet formation in the disk.
Submitted 29 February, 2024;
originally announced March 2024.
-
The $X$-semiprimeness of Rings
Authors:
Grigore Călugăreanu,
Tsiu-Kwen Lee,
Jerzy Matczuk
Abstract:
For a nonempty subset $X$ of a ring $R$, the ring $R$ is called $X$-semiprime if, given $a\in R$, $aXa=0$ implies $a=0$. This provides a proper class of semiprime rings. First, we clarify the relationship between idempotent semiprime and unit-semiprime rings. Secondly, given a Lie ideal $L$ of a ring $R$, we offer a criterion for $R$ to be $L$-semiprime. For a prime ring $R$, we characterize Lie ideals $L$ of $R$ such that $R$ is $L$-semiprime. Moreover, $X$-semiprimeness of matrix rings, prime rings (with a nontrivial idempotent), semiprime rings, regular rings, and subdirect products are studied.
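The definition above can be unpacked with a few elementary observations. This is a minimal sketch that is not taken from the paper; it assumes $R$ has an identity for the third line and uses only the standard characterization of semiprime rings ($aRa=0$ implies $a=0$).

```latex
% Enlarging X preserves X-semiprimeness:
\text{If } X \subseteq Y \subseteq R \text{ and } R \text{ is } X\text{-semiprime, then }
aYa = 0 \;\Rightarrow\; aXa = 0 \;\Rightarrow\; a = 0,
\text{ so } R \text{ is } Y\text{-semiprime.}

% Taking Y = R recovers ordinary semiprimeness:
R \text{ is } X\text{-semiprime} \;\Rightarrow\; \bigl(aRa = 0 \Rightarrow a = 0\bigr)
\;\Longleftrightarrow\; R \text{ is semiprime.}

% Taking X = \{1\} (R unital) gives reduced rings:
R \text{ is } \{1\}\text{-semiprime} \;\Longleftrightarrow\; \bigl(a^{2} = 0 \Rightarrow a = 0\bigr)
\;\Longleftrightarrow\; R \text{ is reduced.}
```

The second line is why every $X$-semiprime ring is semiprime, which matches the abstract's description of $X$-semiprime rings as a class of semiprime rings.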
Submitted 9 April, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
EmoBench: Evaluating the Emotional Intelligence of Large Language Models
Authors:
Sahand Sabour,
Siyang Liu,
Zheyuan Zhang,
June M. Liu,
Jinfeng Zhou,
Alvionna S. Sunaryo,
Juanzi Li,
Tatia M. C. Lee,
Rada Mihalcea,
Minlie Huang
Abstract:
Recent advances in Large Language Models (LLMs) have highlighted the need for robust, comprehensive, and challenging benchmarks. Yet, research on evaluating their Emotional Intelligence (EI) is considerably limited. Existing benchmarks have two major shortcomings: first, they mainly focus on emotion recognition, neglecting essential EI capabilities such as emotion regulation and thought facilitation through emotion understanding; second, they are primarily constructed from existing datasets, which include frequent patterns, explicit information, and annotation errors, leading to unreliable evaluation. We propose EmoBench, a benchmark that draws upon established psychological theories and proposes a comprehensive definition for machine EI, including Emotional Understanding and Emotional Application. EmoBench includes a set of 400 hand-crafted questions in English and Chinese, which are meticulously designed to require thorough reasoning and understanding. Our findings reveal a considerable gap between the EI of existing LLMs and the average human, highlighting a promising direction for future research. Our code and data are publicly available at https://github.com/Sahandfer/EmoBench.
Submitted 17 July, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Authors:
Jacky Liang,
Fei Xia,
Wenhao Yu,
Andy Zeng,
Montserrat Gonzalez Arenas,
Maria Attarian,
Maria Bauza,
Matthew Bennice,
Alex Bewley,
Adil Dostmohamed,
Chuyuan Kelly Fu,
Nimrod Gileadi,
Marissa Giustina,
Keerthana Gopalakrishnan,
Leonard Hasenclever,
Jan Humplik,
Jasmine Hsu,
Nikhil Joshi,
Ben Jyenis,
Chase Kew,
Sean Kirmani,
Tsang-Wei Edward Lee,
Kuang-Huei Lee,
Assaf Hurwitz Michaely,
Joss Moore
, et al. (25 additional authors not shown)
Abstract:
Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for only as long as it fits within the context size of the LLM, and can be forgotten over longer interactions. In this work, we investigate fine-tuning the robot code-writing LLMs to remember their in-context interactions and improve their teachability, i.e., how efficiently they adapt to human inputs (measured by average number of corrections before the user considers the task successful). Our key observation is that when human-robot interactions are viewed as a partially observable Markov decision process (in which human language inputs are observations, and robot code outputs are actions), then training an LLM to complete previous interactions is training a transition dynamics model -- that can be combined with classic robotics techniques such as model predictive control (MPC) to discover shorter paths to success. This gives rise to Language Model Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments -- improving non-expert teaching success rates of unseen tasks by 26.9% while reducing the average number of human corrections from 2.4 to 1.9. Experiments show that LMPC also produces strong meta-learners, improving the success rate of in-context learning new tasks on unseen robot embodiments and APIs by 31.5%. See videos, code, and demos at: https://robot-teaching.github.io/.
Submitted 31 May, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
GTC Spectroscopic Surveys of Planetary Nebulae in the Milky Way and M31
Authors:
Xuan Fang,
Haomiao Huang,
Martin A. Guerrero,
Letizia Stanghellini,
Ruben Garcia-Benito,
Ting-Hui Lee,
Yong Zhang
Abstract:
We report spectroscopic surveys of planetary nebulae (PNe) in the Milky Way and Andromeda (M31), using the 10.4-m Gran Telescopio Canarias (GTC). The spectra are of high quality and cover the whole optical range, mostly from 3650 Å to beyond 1 micron, enabling detection of nebular emission lines critical for spectral analysis as well as photoionization modeling. We obtained GTC spectra of 24 compact (angular diameter <5 arcsec) PNe located in the Galactic disk, ~3-20 kpc from the Galactic centre; these spectra can be used to constrain stellar evolution models and derive radial abundance gradients of the Milky Way. We have also observed 30 PNe in the outer halo of M31 using the GTC. These halo PNe are uniformly metal-rich and probably all evolved from low-mass stars, consistent with the conjecture that they all formed from the metal-rich gas in the M31 disk but were displaced to their present locations due to galaxy interactions.
Submitted 15 February, 2024;
originally announced February 2024.
-
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Authors:
Soroush Nasiriany,
Fei Xia,
Wenhao Yu,
Ted Xiao,
Jacky Liang,
Ishita Dasgupta,
Annie Xie,
Danny Driess,
Ayzaan Wahid,
Zhuo Xu,
Quan Vuong,
Tingnan Zhang,
Tsang-Wei Edward Lee,
Kuang-Huei Lee,
Peng Xu,
Sean Kirmani,
Yuke Zhu,
Andy Zeng,
Karol Hausman,
Nicolas Heess,
Chelsea Finn,
Sergey Levine,
Brian Ichter
Abstract:
Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs produce only textual outputs, while robotic control and other spatial tasks require outputting continuous coordinates, actions, or trajectories. How can we enable VLMs to handle such settings without fine-tuning on task-specific data?
In this paper, we propose a novel visual prompting approach for VLMs that we call Prompting with Iterative Visual Optimization (PIVOT), which casts tasks as iterative visual question answering. In each iteration, the image is annotated with a visual representation of proposals that the VLM can refer to (e.g., candidate robot actions, localizations, or trajectories). The VLM then selects the best ones for the task. These proposals are iteratively refined, allowing the VLM to eventually zero in on the best available answer. We investigate PIVOT on real-world robotic navigation, real-world manipulation from images, instruction following in simulation, and additional spatial inference tasks such as localization. We find, perhaps surprisingly, that our approach enables zero-shot control of robotic systems without any robot training data, navigation in a variety of environments, and other capabilities. Although current performance is far from perfect, our work highlights potentials and limitations of this new regime and shows a promising approach for Internet-Scale VLMs in robotic and spatial reasoning domains. Website: pivot-prompt.github.io and HuggingFace: https://huggingface.co/spaces/pivot-prompt/pivot-prompt-demo.
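The propose, select, and refine loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the function `select_best` is a hypothetical stand-in for the VLM's choice among annotated proposals (here it simply keeps the candidates nearest a known 2D goal), and the Gaussian sampler, the parameter names, and the defaults are all assumptions made for illustration.

```python
import math
import random


def select_best(candidates, goal, k):
    """Hypothetical stand-in for the VLM: keep the k proposals nearest the goal."""
    return sorted(candidates, key=lambda c: math.dist(c, goal))[:k]


def iterative_refinement(goal, rng, iterations=5, n_proposals=16, k=4):
    """Sample proposals, keep the selected ones, refit the sampler, repeat."""
    mean, spread = (0.0, 0.0), 1.0
    for _ in range(iterations):
        # Draw candidate 2D actions around the current estimate.
        proposals = [(rng.gauss(mean[0], spread), rng.gauss(mean[1], spread))
                     for _ in range(n_proposals)]
        chosen = select_best(proposals, goal, k)
        # Re-centre the sampler on the chosen proposals.
        mean = (sum(c[0] for c in chosen) / k, sum(c[1] for c in chosen) / k)
        # Shrink the sampling spread to the scatter of the chosen proposals.
        spread = max(1e-3, math.sqrt(sum(math.dist(c, mean) ** 2 for c in chosen) / k))
    return mean


if __name__ == "__main__":
    estimate = iterative_refinement((0.5, -0.3), random.Random(0))
    print("refined proposal:", estimate)
```

In the setting the abstract describes, the selection step would be a query to a VLM over an image annotated with the candidates rather than a distance computation; the sketch only shows how repeated selection narrows the proposal distribution.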
Submitted 12 February, 2024;
originally announced February 2024.
-
Eigenmode Decomposition Method for Full-Wave Modeling of Microring Resonators
Authors:
Yuriy Akimov,
Aswin Alexander Eapen,
Shiyang Zhu,
Doris K. T. Ng,
Nanxi Li,
Woon Leng Loh,
Lennon Y. T. Lee,
Alagappan Gandhi,
Aravind P. Anthur
Abstract:
We develop a theoretical predictive model for an all-pass ring resonator that enables the most complete description of linear coupling regimes. The model is based on eigenmode decomposition of Maxwell's equations with full account of the confined and leaky modes, as opposed to the existing phenomenological methods restricted to the confined modes only. This model enables quantitative description of all-pass ring resonators and provides insights into the physics underlying microring-waveguide coupling. We experimentally validate the model using transmission measurements in the linear regime of aluminium nitride resonators. The developed model is then used to explore the field enhancement in microrings crucial for nonlinear photonic applications.
Submitted 6 February, 2024;
originally announced February 2024.
-
Dance-to-Music Generation with Encoder-based Textual Inversion
Authors:
Sifei Li,
Weiming Dong,
Yuxin Zhang,
Fan Tang,
Chongyang Ma,
Oliver Deussen,
Tong-Yee Lee,
Changsheng Xu
Abstract:
The seamless integration of music with dance movements is essential for communicating the artistic intent of a dance piece. This alignment also significantly improves the immersive quality of gaming experiences and animation productions. Although there has been remarkable advancement in creating high-fidelity music from textual descriptions, current methodologies mainly focus on modulating overall characteristics such as genre and emotional tone. They often overlook the nuanced management of temporal rhythm, which is indispensable in crafting music for dance, since it intricately aligns the musical beats with the dancers' movements. Recognizing this gap, we propose an encoder-based textual inversion technique to augment text-to-music models with visual control, facilitating personalized music generation. Specifically, we develop dual-path rhythm-genre inversion to effectively integrate the rhythm and genre of a dance motion sequence into the textual space of a text-to-music model. Contrary to traditional textual inversion methods, which directly update text embeddings to reconstruct a single target object, our approach utilizes separate rhythm and genre encoders to obtain text embeddings for two pseudo-words, adapting to the varying rhythms and genres. We collect a new dataset called In-the-wild Dance Videos (InDV) and demonstrate that our approach outperforms state-of-the-art methods across multiple evaluation metrics. Furthermore, our method is able to adapt to changes in tempo and effectively integrates with the inherent text-guided generation capability of the pre-trained model. Our source code and demo videos are available at https://github.com/lsfhuihuiff/Dance-to-music_Siggraph_Asia_2024.
Submitted 12 September, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Spreading and engulfment of a viscoelastic film onto a Newtonian droplet
Authors:
Chunheng Zhao,
Taehun Lee,
Andreas Carlson
Abstract:
We use the conservative phase-field lattice Boltzmann method to investigate the dynamics when a Newtonian droplet comes in contact with an immiscible viscoelastic liquid film. The dynamics of the three liquid phases are explored through numerical simulations, with a focus on illustrating the contact line dynamics and the viscoelastic effects described by the Oldroyd-B model. The droplet dynamics are contrasted with the case of a Newtonian fluid film. The simulations demonstrate that when the film is viscoelastic, the droplet dynamics become insensitive to the film thickness when the polymer viscosity and relaxation time are large. A viscoelastic ridge forms at the moving contact line, which evolves with a power-law dependence on time. By rescaling the interface profile of the ridge using its height and width, it appears to collapse onto a similar shape. Our findings reveal a strong correlation between the viscoelastic stress and the interface shape near the contact line.
Submitted 31 January, 2024;
originally announced January 2024.
-
Towards Generating Informative Textual Description for Neurons in Language Models
Authors:
Shrayani Mondal,
Rishabh Garodia,
Arbaaz Qureshi,
Taesung Lee,
Youngja Park
Abstract:
Recent developments in transformer-based language models have allowed them to capture a wide variety of world knowledge that can be adapted to downstream tasks with limited resources. However, what pieces of information are understood in these models is unclear, and neuron-level contributions in identifying them are largely unknown. Conventional approaches in neuron explainability either depend on a finite set of pre-defined descriptors or require manual annotations for training a secondary model that can then explain the neurons of the primary model. In this paper, we take BERT as an example, try to remove these constraints, and propose a novel and scalable framework that ties textual descriptions to neurons. We leverage the potential of generative language models to discover human-interpretable descriptors present in a dataset and use an unsupervised approach to explain neurons with these descriptors. Through various qualitative and quantitative analyses, we demonstrate the effectiveness of this framework in generating useful data-specific descriptors with little human involvement in identifying the neurons that encode these descriptors. In particular, our experiment shows that the proposed approach achieves 75% precision@2 and 50% recall@2.
Submitted 29 January, 2024;
originally announced January 2024.
-
Hyperphosphorylation-Induced Phase Transition in Vesicle Delivery Dynamics of Motor Proteins in Neuronal Cells
Authors:
Eunsang Lee,
Donghee Kim,
Yo Han Song,
Kyujin Shin,
Sanggeun Song,
Minho Lee,
Yeongchang Goh,
Mi Hee Lim,
Ji-Hyun Kim,
Jaeyoung Sung,
Kang Taek Lee
Abstract:
Synaptic vesicle transport by motor proteins along microtubules is a crucial active process underlying neuronal communication. It is known that microtubules are destabilized by tau-hyperphosphorylation, which causes tau proteins to detach from microtubules and form neurofibril tangles. However, how tau-phosphorylation affects transport dynamics of motor proteins on the microtubule remains unknown. Here, we discover that long-distance unidirectional motion of vesicle-motor protein multiplexes (VMPMs) in living cells is suppressed under tau-hyperphosphorylation, with the consequent loss of fast vesicle-transport along the microtubule. The VMPMs in hyperphosphorylated cells exhibit seemingly bidirectional random motion, with dynamic properties far different from VMPM motion in normal cells. We establish a parsimonious physicochemical model of VMPM's active motion that provides a unified, quantitative explanation and predictions for our experimental results. Our analysis reveals that, under hyperphosphorylation conditions, motor-protein-multiplexes have both static and dynamic motility fluctuations. The loss of the fast vesicle-transport along the microtubule can be a mechanism of neurodegenerative disorders associated with tau-hyperphosphorylation.
Submitted 23 April, 2024; v1 submitted 27 January, 2024;
originally announced January 2024.
-
Speeding up Fermionic Lattice Calculations with Photonic Accelerated Inverters
Authors:
Felipe Attanasio,
Marc Bauer,
Jelle Dijkstra,
Timoteo Lee,
Jan M. Pawlowski,
Wolfram Pernice
Abstract:
Lattice field theory (LFT) is the standard non-perturbative method to perform numerical calculations of quantum field theory. However, the typical bottleneck of fermionic lattice calculations is the inversion of the Dirac matrix. This inversion is solved by iterative methods, like the conjugate gradient algorithm, where matrix-vector multiplications (MVMs) are the main operation. Photonic integrated circuits excel in performing quick and energy-efficient MVMs, but at the same time, they are known to have low accuracy. This can be overcome by using mixed precision methods. In this paper, we explore the idea of using photonic technology to fulfil the demand for computational power of fermionic lattice calculations. These methods have the potential to reduce computation costs by one order of magnitude. Because of the hybrid nature of these methods, we call these 'photonic accelerated inverters (PAIs)'.
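The mixed-precision idea mentioned above can be sketched as iterative refinement: an inner conjugate gradient solve uses only low-accuracy matrix-vector products, while an outer loop recomputes the residual exactly and corrects the solution. The sketch below is illustrative only and is not the paper's method or hardware model. The tridiagonal test matrix, the Gaussian noise model in `noisy_matvec` (a stand-in for an inexact accelerator), and all parameter values are assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def matvec(v):
    """Exact product with a tridiagonal SPD matrix (diagonal 4, off-diagonals -1)."""
    n = len(v)
    return [4.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def noisy_matvec(v, rel_noise, rng):
    """Assumed low-accuracy product: exact result plus relative Gaussian noise."""
    y = matvec(v)
    scale = rel_noise * norm(y) / math.sqrt(len(y))
    return [yi + scale * rng.gauss(0.0, 1.0) for yi in y]


def noisy_cg(b, rel_noise, rng, iters):
    """Conjugate gradient in which every matrix-vector product is noisy."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(b)
    rs = dot(r, r)
    for _ in range(iters):
        ap = noisy_matvec(p, rel_noise, rng)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def refine(b, rel_noise, rng, outer=30, inner=8, tol=1e-12):
    """Outer loop: exact residual in full precision, noisy inner solve for the correction."""
    x = [0.0] * len(b)
    b_norm = norm(b)
    history = []
    for _ in range(outer):
        r = [bi - yi for bi, yi in zip(b, matvec(x))]
        history.append(norm(r) / b_norm)
        if history[-1] < tol:
            break
        d = noisy_cg(r, rel_noise, rng, inner)
        x = [xi + di for xi, di in zip(x, d)]
    return x, history


if __name__ == "__main__":
    rng = random.Random(0)
    b = [rng.gauss(0.0, 1.0) for _ in range(32)]
    _, history = refine(b, 1e-2, rng)
    print("relative residual per outer step:", history)
```

The design point is that the expensive inner products tolerate noise because each outer step measures the true residual and solves only for the remaining error, so the accuracy of the final answer is set by the high-precision outer loop rather than by the noisy inner solver.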
Submitted 25 January, 2024;
originally announced January 2024.