-
PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry
Authors:
Linqing Chen,
Weilei Wang,
Zilong Bai,
Peng Xu,
Yan Fang,
Jie Fang,
Wentao Wu,
Lizhi Zhou,
Ruiji Zhang,
Yubin Xia,
Chaobo Xu,
Ran Hu,
Licong Xu,
Qijun Cai,
Haoran Hua,
Jing Sun,
Jin Liu,
Tian Qiu,
Haowen Liu,
Meng Hu,
Xiuwen Li,
Fei Gao,
Yufu Wang,
Lin Tie,
Chaochao Wang, et al. (11 additional authors not shown)
Abstract:
Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision, areas where general-purpose LLMs often fall short. In this study, we introduce PharmaGPT, a suite of domain-specialized LLMs with 13 billion and 70 billion parameters, specifically trained on a comprehensive corpus tailored to the Bio-Pharmaceutical and Chemical domains. Our evaluation shows that PharmaGPT surpasses existing general models on domain-specific benchmarks such as NAPLEX, demonstrating its exceptional capability in domain-specific tasks. Remarkably, this performance is achieved with a model that has only a fraction, sometimes just one-tenth, of the parameters of general-purpose large models. This advancement establishes a new benchmark for LLMs in the bio-pharmaceutical and chemical fields, addressing the existing gap in specialized language modeling. It also suggests a promising path for enhanced research and development, paving the way for more precise and effective NLP applications in these areas.
Submitted 9 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
The Limits of Water Maser Kinematics: Insights from High-Mass Protostar AFGL 5142-MM1
Authors:
Zulfazli Rosli,
Ross A. Burns,
Affan Adly Nazri,
Koichiro Sugiyama,
Tomoya Hirota,
Kee-Tae Kim,
Yoshinori Yonekura,
Liu Tie,
Gabor Orosz,
James Okwe Chibueze,
Andrey M. Sobolev,
Ji Hyun Kang,
Chang Won Lee,
Jihye Hwang,
Hafieduddin Mohammad,
Norsiah Hashim,
Zamri Zainal Abidin
Abstract:
Multi-epoch VLBI observations measure 3D water maser motions in protostellar outflows, enabling analysis of inclination and velocity. However, these analyses assume that water masers and shock surfaces within outflows are co-propagating. We compared VLBI data on maser-traced bowshocks in the high-mass protostar AFGL 5142-MM1, from seven epochs of archival data from the VLBI Exploration of Radio Astrometry (VERA), obtained from April 2014 to May 2015, and our newly conducted observations with the KVN and VERA Array (KaVA), obtained in March 2016. We find an inconsistency between the expected displacement of the bowshocks and the motions of individual masers. The separation between two opposing bowshocks in AFGL 5142-MM1 was determined to be $337.17\pm0.07~\rm{mas}$ in the KaVA data, which is less than the expected value of $342.1\pm0.7~\rm{mas}$ based on extrapolation of the proper motions of individual maser features measured by VERA. Our measurements imply that the bowshock propagates at a velocity of $24\pm3~\rm{km~s^{-1}}$, while the individual masing gas clumps move at an average velocity of $55\pm5~\rm{km~s^{-1}}$, i.e. the water masers are moving in the outflow direction at roughly twice the speed at which the bowshocks are propagating. Our results emphasise that individual maser features are best investigated using short-term, high-cadence VLBI monitoring, while long-term monitoring on timescales comparable to the lifetimes of maser features is better suited to tracing the overall evolution of shock surfaces. Observers should be aware that masers and shock surfaces can move relative to each other, and that this can affect the interpretation of protostellar outflows.
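The abstract compares angular quantities (proper motions and separations in mas) with linear speeds (km/s). The bridge between them is the standard small-angle relation v = 4.74047 μ d, with μ in mas/yr, d in kpc and v in km/s (1 mas/yr at 1 kpc is 1 au/yr). The sketch below is a minimal illustration of that conversion, not the authors' analysis pipeline. The distance used in the usage lines is a hypothetical value, not taken from the abstract, and `drift_mas` assumes both features move along the same direction in the plane of the sky.

```python
# Sketch: convert between proper motion and transverse velocity using the
# standard relation v [km/s] = 4.74047 * mu [mas/yr] * d [kpc].

KMS_PER_AU_PER_YR = 4.74047  # 1 au/yr expressed in km/s


def proper_motion_to_kms(mu_mas_per_yr: float, distance_kpc: float) -> float:
    """Transverse velocity (km/s) for a proper motion (mas/yr) at a distance (kpc)."""
    return KMS_PER_AU_PER_YR * mu_mas_per_yr * distance_kpc


def kms_to_proper_motion(v_kms: float, distance_kpc: float) -> float:
    """Proper motion (mas/yr) for a transverse velocity (km/s) at a distance (kpc)."""
    return v_kms / (KMS_PER_AU_PER_YR * distance_kpc)


def drift_mas(v_fast_kms: float, v_slow_kms: float,
              distance_kpc: float, years: float) -> float:
    """Angular offset (mas) accumulated between two features moving in the same
    sky direction at different speeds (simplifying assumption)."""
    return kms_to_proper_motion(v_fast_kms - v_slow_kms, distance_kpc) * years


if __name__ == "__main__":
    d_kpc = 2.0  # hypothetical distance, for illustration only
    print(kms_to_proper_motion(24.0, d_kpc))  # bowshock speed quoted in the abstract
    print(kms_to_proper_motion(55.0, d_kpc))  # mean maser speed quoted in the abstract
    print(drift_mas(55.0, 24.0, d_kpc, 2.0))  # maser-vs-shock offset after two years
```

Because the offset grows linearly with time, a velocity mismatch that is invisible over a few high-cadence epochs can accumulate into a measurable separation discrepancy over a multi-year baseline.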
Submitted 29 November, 2023;
originally announced November 2023.
-
Disorder-induced linear magnetoresistance in Al$_2$O$_3$/SrTiO$_3$ heterostructures
Authors:
Gao Kuang Hong,
Lin Tie,
Ma Xiao Rong,
Li Qiu Lin,
Li Zhi Qing
Abstract:
Unsaturated linear magnetoresistance (LMR) has attracted wide attention because of its potential applications and fundamental interest. By controlling the growth temperature, we realized a metal-to-insulator transition in Al$_2$O$_3$/SrTiO$_3$ heterostructures. The LMR is observed in metallic samples with electron mobility varying over three orders of magnitude. The observed LMR cannot be explained by the guiding center diffusion model, even in samples with very high mobility. The slope of the observed LMR is proportional to the Hall mobility, and the crossover field, which marks the transition from quadratic (low-field) to linear (high-field) dependence, is proportional to the inverse Hall mobility. This indicates that the classical model can explain the observed LMR. More importantly, we derive an analytical expression from effective-medium theory that is equivalent to the classical model. This analytical expression describes the LMR data very well, confirming the validity of the classical model.
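The two scalings in the abstract (high-field slope proportional to mobility, crossover field proportional to inverse mobility) can be illustrated with a toy interpolation. This sketch assumes MR(B) = sqrt(1 + (μB)²) - 1, which is quadratic, (μB)²/2, at low field and approaches μB at high field. It is a hypothetical stand-in chosen for illustration only and is NOT the effective-medium expression derived in the paper. With μ in m²/(V·s) and B in tesla, μB is dimensionless.

```python
import math

# Toy model (assumption, not the paper's expression):
#   MR(B) = sqrt(1 + (mu*B)^2) - 1
# quadratic ~ (mu*B)^2 / 2 for mu*B << 1, linear ~ mu*B for mu*B >> 1.


def toy_mr(b_tesla: float, mobility: float) -> float:
    """Relative magnetoresistance [rho(B) - rho(0)] / rho(0) in the toy model."""
    x = mobility * b_tesla  # dimensionless with mobility in m^2/(V s), B in T
    return math.sqrt(1.0 + x * x) - 1.0


def crossover_field(mobility: float) -> float:
    """Field where mu*B = 1, i.e. where the quadratic and linear regimes meet."""
    return 1.0 / mobility


def high_field_slope(mobility: float, b1: float, b2: float) -> float:
    """Finite-difference slope dMR/dB between two high fields b1 < b2."""
    return (toy_mr(b2, mobility) - toy_mr(b1, mobility)) / (b2 - b1)


if __name__ == "__main__":
    for mu in (0.5, 1.0, 2.0):  # hypothetical mobilities in m^2/(V s)
        print(mu, crossover_field(mu), high_field_slope(mu, 10.0, 20.0))
```

Doubling the mobility roughly doubles the high-field slope and halves the crossover field, which is the qualitative signature the abstract attributes to the classical model.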
Submitted 6 January, 2024; v1 submitted 19 August, 2023;
originally announced August 2023.
-
On Separating Points for Ensemble Controllability
Authors:
Jr-Shin Li,
Wei Zhang,
Lin Tie
Abstract:
Recent years have witnessed a wave of research activities in systems science toward the study of population systems. The driving force behind this shift has been the numerous emerging and ever-changing technologies in the life and physical sciences and engineering, from neuroscience, biology, and quantum physics to robotics, where many control-enabled applications involve manipulating a large ensemble of structurally identical dynamic units, or agents. Analyzing the fundamental properties of ensemble control systems in turn plays a foundational and critical role in enabling and advancing these applications, and such analysis is largely beyond the capability of classical control techniques. In this paper, we consider an ensemble of time-invariant linear systems evolving on an infinite-dimensional space of continuous functions. We exploit the notion of separating points and techniques of polynomial approximation to develop necessary and sufficient ensemble controllability conditions. In particular, we introduce an extended notion of the controllability matrix, called the Ensemble Controllability Gramian. This tool enables the characterization of ensemble controllability through evaluating the controllability of each individual system in the ensemble. As a result, the work provides a unified framework with a systematic procedure for analyzing control systems defined on an infinite-dimensional space using a finite-dimensional approach.
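The abstract reduces ensemble controllability to evaluating each individual system in the ensemble. As a rough illustration of the finite-dimensional ingredient only, the sketch below assumes the ensemble is indexed by a scalar parameter β (hypothetical notation) with dynamics dx/dt = A(β)x + B(β)u, and applies the classical Kalman rank test to each system at a finite set of sampled β values. It is NOT the Ensemble Controllability Gramian introduced in the paper, and checking a finite grid cannot by itself certify controllability over a continuum of parameters. The helper names are illustrative.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rank(m: Matrix, tol: float = 1e-9) -> int:
    """Numerical rank via Gauss-Jordan elimination with partial pivoting."""
    rows = [list(row) for row in m]
    if not rows:
        return 0
    r = 0
    for c in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def kalman_matrix(a: Matrix, b: Matrix) -> Matrix:
    """Classical controllability matrix [B, AB, ..., A^(n-1) B]."""
    n = len(a)
    blocks = [b]
    for _ in range(n - 1):
        blocks.append(matmul(a, blocks[-1]))
    # concatenate the blocks column-wise, row by row
    return [[x for blk in blocks for x in blk[i]] for i in range(n)]


def controllable(a: Matrix, b: Matrix) -> bool:
    return rank(kalman_matrix(a, b)) == len(a)


def uncontrollable_parameters(a_of: Callable[[float], Matrix],
                              b_of: Callable[[float], Matrix],
                              betas: Sequence[float]) -> List[float]:
    """Sampled parameter values whose individual system fails the Kalman rank test."""
    return [beta for beta in betas if not controllable(a_of(beta), b_of(beta))]


if __name__ == "__main__":
    # Hypothetical ensemble of harmonic oscillators with frequency beta.
    a_of = lambda beta: [[0.0, beta], [-beta, 0.0]]
    b_of = lambda beta: [[1.0], [0.0]]
    print(uncontrollable_parameters(a_of, b_of, [-1.0, -0.5, 0.0, 0.5, 1.0]))
```

For this oscillator example the controllability matrix is [[1, 0], [0, -β]], so only the sample β = 0 fails the test.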
Submitted 14 August, 2019;
originally announced August 2019.
-
Cloud G074.11+00.11: a stellar cluster in formation
Authors:
Mika Saajasto,
Jorma Harju,
Mika Juvela,
Liu Tie,
Qizhou Zhang,
Sheng-Yuan Liu,
Naomi Hirano,
Yuefang Wu,
Kee-Tae Kim,
Ken'ichi Tatematsu,
Ke Wang,
Mark Thompson
Abstract:
We present molecular line and dust continuum observations of a Planck-detected cold cloud, G074.11+00.11. The cloud consists of a system of curved filaments and a central star-forming clump. The clump is associated with several infrared sources and H2O maser emission. We aim to determine the mass distribution and gas dynamics within the clump, to investigate whether the filamentary structure seen around the clump repeats itself on a smaller scale, and to estimate the fractions of mass contained in dense cores and filaments. The velocity distribution of pristine dense gas can be used to investigate the global dynamical state of the clump, the role of filamentary inflows, filament fragmentation, and core accretion. We use molecular line and continuum observations from single-dish observatories and interferometric facilities to study the kinematics of the region. The molecular line observations show that the central clump may have formed as a result of a large-scale filament collision. The central clump contains three compact cores. Assuming a distance of 2.3 kpc, based on Gaia observations and a three-dimensional extinction method applied to background stars, the mass of the central clump exceeds 700 solar masses, which is roughly 25% of the total mass of the cloud. Our virial analysis suggests that the central clump and all identified substructures are collapsing. We find no evidence for small-scale filaments associated with the cores. Our observations indicate that the clump is fragmented into three cores with masses in the range 10 to 50 solar masses and that all three are collapsing. The presence of H2O maser emission suggests active star formation. However, the CO lines show only weak signs of outflows. We suggest that the region is young and that any processes leading to star formation have only recently begun.
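A virial analysis typically compares kinetic and gravitational energy through a virial parameter. The sketch below assumes the commonly used uniform-sphere form α_vir = 5σ²R/(GM); the paper's own analysis may use a different form or correction factors. Only the 700 solar-mass clump mass comes from the abstract; the velocity dispersion σ and radius R in the usage lines are hypothetical. The threshold α_vir < 2 is a widely used rule of thumb for a gravitationally bound object, not a value taken from the paper.

```python
# Sketch: uniform-sphere virial parameter alpha_vir = 5 sigma^2 R / (G M).

G_SI = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN_KG = 1.989e30   # solar mass, kg
PC_M = 3.0857e16      # parsec, m


def virial_parameter(sigma_kms: float, radius_pc: float, mass_msun: float) -> float:
    """alpha_vir from 1-D velocity dispersion (km/s), radius (pc) and mass (Msun)."""
    sigma = sigma_kms * 1.0e3
    radius = radius_pc * PC_M
    mass = mass_msun * M_SUN_KG
    return 5.0 * sigma ** 2 * radius / (G_SI * mass)


def looks_bound(alpha: float, threshold: float = 2.0) -> bool:
    """Rule-of-thumb boundedness test (assumed threshold)."""
    return alpha < threshold


if __name__ == "__main__":
    # sigma and R are hypothetical; 700 Msun is the lower limit from the abstract
    alpha = virial_parameter(sigma_kms=1.0, radius_pc=0.5, mass_msun=700.0)
    print(round(alpha, 2), looks_bound(alpha))  # prints 0.83 True
```

Since the abstract gives the clump mass as a lower limit, a larger true mass would only lower α_vir further for the same σ and R.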
Submitted 19 July, 2019;
originally announced July 2019.
-
On Near-controllability, Nearly-controllable Subspaces, and Near-controllability Index of a Class of Discrete-time Bilinear Systems: A Root Locus Approach
Authors:
Lin Tie
Abstract:
This paper studies near-controllability of a class of discrete-time bilinear systems via a root locus approach. A necessary and sufficient criterion for the systems to be nearly controllable is given. In particular, by using the root locus approach, the control inputs that achieve the state transition for the nearly controllable systems can be computed. Furthermore, for the non-nearly controllable systems, nearly-controllable subspaces are derived and a near-controllability index is defined. Accordingly, the controllability properties of this class of discrete-time bilinear systems are fully characterized. Finally, examples are provided to demonstrate the results of the paper.
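To make "control inputs that achieve the state transition" concrete, the sketch below assumes the single-input bilinear form x(k+1) = A x(k) + u(k) B x(k); the abstract does not state the exact class studied, so this form is an assumption. It simulates the system and solves for a single-step input u with (A + uB)x0 = xf when such a scalar exists. This one-step solve is a deliberately simple stand-in, not the root locus method of the paper, which handles multi-step transitions.

```python
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def step(a: Matrix, b: Matrix, x: Sequence[float], u: float) -> Vector:
    """One step of the assumed system x(k+1) = A x(k) + u(k) B x(k)."""
    ax, bx = matvec(a, x), matvec(b, x)
    return [p + u * q for p, q in zip(ax, bx)]


def simulate(a: Matrix, b: Matrix, x0: Sequence[float],
             inputs: Sequence[float]) -> List[Vector]:
    """State trajectory x(0), x(1), ... for a given input sequence."""
    traj = [list(x0)]
    for u in inputs:
        traj.append(step(a, b, traj[-1], u))
    return traj


def one_step_input(a: Matrix, b: Matrix, x0: Sequence[float],
                   xf: Sequence[float], tol: float = 1e-9) -> Optional[float]:
    """Scalar u with (A + u B) x0 = xf, or None if no such u exists."""
    ax, bx = matvec(a, x0), matvec(b, x0)
    r = [f - p for f, p in zip(xf, ax)]  # need u * (B x0) = xf - A x0
    denom = sum(q * q for q in bx)
    if denom <= tol:
        # B x0 = 0: the input has no effect, so xf must already equal A x0
        return 0.0 if all(abs(ri) <= tol for ri in r) else None
    u = sum(q * ri for q, ri in zip(bx, r)) / denom  # least-squares scalar
    residual = max(abs(u * q - ri) for q, ri in zip(bx, r))
    return u if residual <= tol else None


if __name__ == "__main__":
    A = [[1.0, 2.0], [0.0, 1.0]]  # hypothetical example matrices
    B = [[0.0, 0.0], [1.0, 0.0]]
    print(simulate(A, B, [1.0, 1.0], [0.5]))
    print(one_step_input(A, B, [1.0, 1.0], [3.0, 1.5]))
    print(one_step_input(A, B, [1.0, 1.0], [4.0, 1.0]))
```

In one step the reachable set from x0 is only the line A x0 + span{B x0}, which is why most targets need several steps and a more systematic method such as the paper's.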
Submitted 6 February, 2014; v1 submitted 26 January, 2014;
originally announced January 2014.
-
On Controllability and Near-controllability of Multi-input Discrete-time Bilinear Systems in Dimension Two
Authors:
Lin Tie
Abstract:
This paper completely solves the controllability problem for two-dimensional multi-input discrete-time bilinear systems with and without drift. Necessary and sufficient conditions for controllability, which cover the existing results, are obtained by using an algebraic method. Furthermore, for the uncontrollable systems, near-controllability is studied, and necessary and sufficient conditions for the systems to be nearly controllable are also presented. Examples are provided to demonstrate the concepts and results of the paper.
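A small observation helps build intuition for the two-dimensional multi-input case. The sketch assumes the form x(k+1) = A x(k) + Σ u_i(k) B_i x(k), reading "without drift" as A = 0; both are assumptions, since the abstract does not write the system out. From a state x0, one step can reach exactly the affine set A x0 + span{B_1 x0, ..., B_m x0}. When two of the vectors B_i x0 are linearly independent, every target in the plane is reachable in one step, and the inputs follow from Cramer's rule. This is not the algebraic method of the paper, and the function returns None in the degenerate case even though a multi-step transition might still exist.

```python
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def apply_inputs(a: Matrix, bs: Sequence[Matrix], x0: Sequence[float],
                 u: Sequence[float]) -> Vector:
    """Next state A x0 + sum_i u_i B_i x0 of the assumed system."""
    out = matvec(a, x0)
    for ui, b in zip(u, bs):
        out = [o + ui * q for o, q in zip(out, matvec(b, x0))]
    return out


def one_step_inputs_2d(a: Matrix, bs: Sequence[Matrix], x0: Sequence[float],
                       xf: Sequence[float], tol: float = 1e-12) -> Optional[List[float]]:
    """Inputs u with A x0 + sum_i u_i B_i x0 = xf in 2-D, using the first pair
    of independent vectors B_i x0; None if no such pair exists."""
    ax = matvec(a, x0)
    r = [xf[0] - ax[0], xf[1] - ax[1]]
    cols = [matvec(b, x0) for b in bs]
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            p, q = cols[i], cols[j]
            det = p[0] * q[1] - q[0] * p[1]
            if abs(det) > tol:
                u = [0.0] * len(cols)  # unused inputs stay at zero
                u[i] = (r[0] * q[1] - q[0] * r[1]) / det  # Cramer's rule
                u[j] = (p[0] * r[1] - r[0] * p[1]) / det
                return u
    return None


if __name__ == "__main__":
    zero = [[0.0, 0.0], [0.0, 0.0]]   # drift-free case (assumed A = 0)
    eye = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical B_1
    rot = [[0.0, -1.0], [1.0, 0.0]]   # hypothetical B_2, rotation by 90 degrees
    u = one_step_inputs_2d(zero, [eye, rot], [1.0, 0.0], [2.0, 3.0])
    print(u, apply_inputs(zero, [eye, rot], [1.0, 0.0], u))
```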
Submitted 22 January, 2014;
originally announced January 2014.