-
Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach
Authors:
Yuchen Wu,
Edward Sun,
Kaijie Zhu,
Jianxun Lian,
Jose Hernandez-Orallo,
Aylin Caliskan,
Jindong Wang
Abstract:
Large language models (LLMs) typically generate identical or similar responses for all users given the same prompt, posing serious safety risks in high-stakes applications where user vulnerabilities differ widely. Existing safety evaluations primarily rely on context-independent metrics - such as factuality, bias, or toxicity - overlooking the fact that the same response may carry divergent risks…
▽ More
Large language models (LLMs) typically generate identical or similar responses for all users given the same prompt, posing serious safety risks in high-stakes applications where user vulnerabilities differ widely. Existing safety evaluations primarily rely on context-independent metrics - such as factuality, bias, or toxicity - overlooking the fact that the same response may carry divergent risks depending on the user's background or condition. We introduce personalized safety to fill this gap and present PENGUIN - a benchmark comprising 14,000 scenarios across seven sensitive domains with both context-rich and context-free variants. Evaluating six leading LLMs, we demonstrate that personalized user information significantly improves safety scores by 43.2%, confirming the effectiveness of personalization in safety alignment. However, not all context attributes contribute equally to safety enhancement. To address this, we develop RAISE - a training-free, two-stage agent framework that strategically acquires user-specific background. RAISE improves safety scores by up to 31.6% over six vanilla LLMs, while maintaining a low interaction cost of just 2.7 user queries on average. Our findings highlight the importance of selective information gathering in safety-critical domains and offer a practical solution for personalizing LLM responses without model retraining. This work establishes a foundation for safety research that adapts to individual user contexts rather than assuming a universal harm standard.
△ Less
Submitted 29 May, 2025; v1 submitted 24 May, 2025;
originally announced May 2025.
-
Training People to Reward Robots
Authors:
Endong Sun,
Yuqing Zhu,
Matthew Howard
Abstract:
Learning from demonstration (LfD) is a technique that allows expert teachers to teach task-oriented skills to robotic systems. However, the most effective way of guiding novice teachers to approach expert-level demonstrations quantitatively for specific teaching tasks remains an open question. To this end, this paper investigates the use of machine teaching (MT) to guide novice teachers to improve…
▽ More
Learning from demonstration (LfD) is a technique that allows expert teachers to teach task-oriented skills to robotic systems. However, the most effective way of guiding novice teachers to approach expert-level demonstrations quantitatively for specific teaching tasks remains an open question. To this end, this paper investigates the use of machine teaching (MT) to guide novice teachers to improve their teaching skills based on reinforcement learning from demonstration (RLfD). The paper reports an experiment in which novices receive MT-derived guidance to train their ability to teach a given motor skill with only 8 demonstrations and generalise this to previously unseen ones. Results indicate that the MT-guidance not only enhances robot learning performance by 89% on the training skill but also causes a 70% improvement in robot learning performance on skills not seen by subjects during training. These findings highlight the effectiveness of MT-guidance in upskilling human teaching behaviours, ultimately improving demonstration quality in RLfD.
△ Less
Submitted 15 May, 2025;
originally announced May 2025.
-
Enhancing Product Search Interfaces with Sketch-Guided Diffusion and Language Agents
Authors:
Edward Sun
Abstract:
The rapid progress in diffusion models, transformers, and language agents has unlocked new possibilities, yet their potential in user interfaces and commercial applications remains underexplored. We present Sketch-Search Agent, a novel framework that transforms the image search experience by integrating a multimodal language agent with freehand sketches as control signals for diffusion models. Usi…
▽ More
The rapid progress in diffusion models, transformers, and language agents has unlocked new possibilities, yet their potential in user interfaces and commercial applications remains underexplored. We present Sketch-Search Agent, a novel framework that transforms the image search experience by integrating a multimodal language agent with freehand sketches as control signals for diffusion models. Using the T2I-Adapter, Sketch-Search Agent combines sketches and text prompts to generate high-quality query images, encoded via a CLIP image encoder for efficient matching against an image corpus. Unlike existing methods, Sketch-Search Agent requires minimal setup, no additional training, and excels in sketch-based image retrieval and natural language interactions. The multimodal agent enhances user experience by dynamically retaining preferences, ranking results, and refining queries for personalized recommendations. This interactive design empowers users to create sketches and receive tailored product suggestions, showcasing the potential of diffusion models in user-centric image retrieval. Experiments confirm Sketch-Search Agent's high accuracy in delivering relevant product search results.
△ Less
Submitted 21 March, 2025;
originally announced April 2025.
-
A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities
Authors:
Enzhe Sun,
Yongchuan Cui,
Peng Liu,
Jining Yan
Abstract:
Hardware limitations and satellite launch costs make direct acquisition of high temporal-spatial resolution remote sensing imagery challenging. Remote sensing spatiotemporal fusion (STF) technology addresses this problem by merging high temporal but low spatial resolution imagery with high spatial but low temporal resolution imagery to efficiently generate high spatiotemporal resolution satellite…
▽ More
Hardware limitations and satellite launch costs make direct acquisition of high temporal-spatial resolution remote sensing imagery challenging. Remote sensing spatiotemporal fusion (STF) technology addresses this problem by merging high temporal but low spatial resolution imagery with high spatial but low temporal resolution imagery to efficiently generate high spatiotemporal resolution satellite images. STF provides unprecedented observational capabilities for land surface change monitoring, agricultural management, and environmental research. Deep learning (DL) methods have revolutionized the remote sensing spatiotemporal fusion field over the past decade through powerful automatic feature extraction and nonlinear modeling capabilities, significantly outperforming traditional methods in handling complex spatiotemporal data. Despite the rapid development of DL-based remote sensing STF, the community lacks a systematic review of this quickly evolving field. This paper comprehensively reviews DL developments in remote sensing STF over the last decade, analyzing key research trends, method classifications, commonly used datasets, and evaluation metrics. It discusses major challenges in existing research and identifies promising future research directions as references for researchers in this field to inspire new ideas. The specific models, datasets, and other information mentioned in this article have been collected in: https://github.com/yc-cui/Deep-Learning-Spatiotemporal-Fusion-Survey.
△ Less
Submitted 1 April, 2025;
originally announced April 2025.
-
Online Stochastic Matching with Unknown Arrival Order: Beating $0.5$ against the Online Optimum
Authors:
Enze Sun,
Zhihao Gavin Tang,
Yifan Wang
Abstract:
We study the online stochastic matching problem. Against the offline benchmark, Feldman, Gravin, and Lucier (SODA 2015) designed an optimal $0.5$-competitive algorithm. A recent line of work, initiated by Papadimitriou, Pollner, Saberi, and Wajc (MOR 2024), focuses on designing approximation algorithms against the online optimum. The online benchmark allows positive results surpassing the $0.5$ ra…
▽ More
We study the online stochastic matching problem. Against the offline benchmark, Feldman, Gravin, and Lucier (SODA 2015) designed an optimal $0.5$-competitive algorithm. A recent line of work, initiated by Papadimitriou, Pollner, Saberi, and Wajc (MOR 2024), focuses on designing approximation algorithms against the online optimum. The online benchmark allows positive results surpassing the $0.5$ ratio.
In this work, adapting the order-competitive analysis by Ezra, Feldman, Gravin, and Tang (SODA 2023), we design a $0.5+Ω(1)$ order-competitive algorithm against the online benchmark with unknown arrival order. Our algorithm is significantly different from existing ones, as the known arrival order is crucial to the previous approximation algorithms.
△ Less
Submitted 25 March, 2025;
originally announced March 2025.
-
Efficient Continual Adaptation of Pretrained Robotic Policy with Online Meta-Learned Adapters
Authors:
Ruiqi Zhu,
Endong Sun,
Guanhe Huang,
Oya Celiktutan
Abstract:
Continual adaptation is essential for general autonomous agents. For example, a household robot pretrained with a repertoire of skills must still adapt to unseen tasks specific to each household. Motivated by this, building upon parameter-efficient fine-tuning in language models, prior works have explored lightweight adapters to adapt pretrained policies, which can preserve learned features from t…
▽ More
Continual adaptation is essential for general autonomous agents. For example, a household robot pretrained with a repertoire of skills must still adapt to unseen tasks specific to each household. Motivated by this, building upon parameter-efficient fine-tuning in language models, prior works have explored lightweight adapters to adapt pretrained policies, which can preserve learned features from the pretraining phase and demonstrate good adaptation performances. However, these approaches treat task learning separately, limiting knowledge transfer between tasks. In this paper, we propose Online Meta-Learned adapters (OMLA). Instead of applying adapters directly, OMLA can facilitate knowledge transfer from previously learned tasks to current learning tasks through a novel meta-learning objective. Extensive experiments in both simulated and real-world environments demonstrate that OMLA can lead to better adaptation performances compared to the baseline methods. The project link: https://ricky-zhu.github.io/OMLA/.
△ Less
Submitted 27 March, 2025; v1 submitted 24 March, 2025;
originally announced March 2025.
-
TradingAgents: Multi-Agents LLM Financial Trading Framework
Authors:
Yijia Xiao,
Edward Sun,
Di Luo,
Wei Wang
Abstract:
Significant progress has been made in automated problem-solving using societies of agents powered by large language models (LLMs). In finance, efforts have largely focused on single-agent systems handling specific tasks or multi-agent frameworks independently gathering data. However, the multi-agent systems' potential to replicate real-world trading firms' collaborative dynamics remains underexplo…
▽ More
Significant progress has been made in automated problem-solving using societies of agents powered by large language models (LLMs). In finance, efforts have largely focused on single-agent systems handling specific tasks or multi-agent frameworks independently gathering data. However, the multi-agent systems' potential to replicate real-world trading firms' collaborative dynamics remains underexplored. TradingAgents proposes a novel stock trading framework inspired by trading firms, featuring LLM-powered agents in specialized roles such as fundamental analysts, sentiment analysts, technical analysts, and traders with varied risk profiles. The framework includes Bull and Bear researcher agents assessing market conditions, a risk management team monitoring exposure, and traders synthesizing insights from debates and historical data to make informed decisions. By simulating a dynamic, collaborative trading environment, this framework aims to improve trading performance. Detailed architecture and extensive experiments reveal its superiority over baseline models, with notable improvements in cumulative returns, Sharpe ratio, and maximum drawdown, highlighting the potential of multi-agent LLM frameworks in financial trading. TradingAgents is available at https://github.com/TauricResearch/TradingAgents.
△ Less
Submitted 3 June, 2025; v1 submitted 28 December, 2024;
originally announced December 2024.
-
Algorithmic Phase Transitions in Language Models: A Mechanistic Case Study of Arithmetic
Authors:
Alan Sun,
Ethan Sun,
Warren Shepard
Abstract:
Zero-shot capabilities of large language models make them powerful tools for solving a range of tasks without explicit training. It remains unclear, however, how these models achieve such performance, or why they can zero-shot some tasks but not others. In this paper, we shed some light on this phenomenon by defining and investigating algorithmic stability in language models -- changes in problem-…
▽ More
Zero-shot capabilities of large language models make them powerful tools for solving a range of tasks without explicit training. It remains unclear, however, how these models achieve such performance, or why they can zero-shot some tasks but not others. In this paper, we shed some light on this phenomenon by defining and investigating algorithmic stability in language models -- changes in problem-solving strategy employed by the model as a result of changes in task specification. We focus on a task where algorithmic stability is needed for generalization: two-operand arithmetic. Surprisingly, we find that Gemma-2-2b employs substantially different computational models on closely related subtasks, i.e. four-digit versus eight-digit addition. Our findings suggest that algorithmic instability may be a contributing factor to language models' poor zero-shot performance across certain logical reasoning tasks, as they struggle to abstract different problem-solving strategies and smoothly transition between them.
△ Less
Submitted 10 December, 2024;
originally announced December 2024.
-
RNA-GPT: Multimodal Generative System for RNA Sequence Understanding
Authors:
Yijia Xiao,
Edward Sun,
Yiqiao Jin,
Wei Wang
Abstract:
RNAs are essential molecules that carry genetic information vital for life, with profound implications for drug development and biotechnology. Despite this importance, RNA research is often hindered by the vast literature available on the topic. To streamline this process, we introduce RNA-GPT, a multi-modal RNA chat model designed to simplify RNA discovery by leveraging extensive RNA literature.…
▽ More
RNAs are essential molecules that carry genetic information vital for life, with profound implications for drug development and biotechnology. Despite this importance, RNA research is often hindered by the vast literature available on the topic. To streamline this process, we introduce RNA-GPT, a multi-modal RNA chat model designed to simplify RNA discovery by leveraging extensive RNA literature. RNA-GPT integrates RNA sequence encoders with linear projection layers and state-of-the-art large language models (LLMs) for precise representation alignment, enabling it to process user-uploaded RNA sequences and deliver concise, accurate responses. Built on a scalable training pipeline, RNA-GPT utilizes RNA-QA, an automated system that gathers RNA annotations from RNACentral using a divide-and-conquer approach with GPT-4o and latent Dirichlet allocation (LDA) to efficiently handle large datasets and generate instruction-tuning samples. Our experiments indicate that RNA-GPT effectively addresses complex RNA queries, thereby facilitating RNA research. Additionally, we present RNA-QA, a dataset of 407,616 RNA samples for modality alignment and instruction tuning, further advancing the potential of RNA research tools.
△ Less
Submitted 29 October, 2024;
originally announced November 2024.
-
Flexible Thermoelectric Active Cooling Garment to Combat Extreme Heat
Authors:
Tianshi Feng,
Jiedong Wang,
Ethan Sun,
Antonio Di Buono,
Renkun Chen
Abstract:
With the increasing frequency, intensity, and duration of extreme heat events due to climate change, heat-related diseases or even mortality have become more prevalent. An efficient personal cooling strategy can mitigate heat stress by regulating the skin temperature within the thermal comfort zone. However, lightweight, wearable, and sustainable cooling garments are unavailable today. Here, we de…
▽ More
With the increasing frequency, intensity, and duration of extreme heat events due to climate change, heat-related diseases or even mortality have become more prevalent. An efficient personal cooling strategy can mitigate heat stress by regulating the skin temperature within the thermal comfort zone. However, lightweight, wearable, and sustainable cooling garments are unavailable today. Here, we developed a TED-based cooling garment and demonstrated its effectiveness in active personal cooling. The garment is shown to maintain the skin temperature within its thermal comfort zone in a hot environment of up to 40 oC under mild forced convection conditions (air flow speed of 2.2 m s-1). Furthermore, we demonstrated a portable cooling system with less than 700 grams of total weight, which includes the TED-based garment, a battery pack, and a temperature controller. The system showed long-term cooling on the skin with varying ambient temperatures from 35 to 40 oC. With the advantages of lightweight, flexible, controllable and long-term effective cooling, the TED cooling garments described in this work can contribute to enhanced health and comfort in an increasingly hotter climate.
△ Less
Submitted 1 December, 2024; v1 submitted 13 November, 2024;
originally announced November 2024.
-
Reconstructing East Asian Temperatures from 1368 to 1911 Using Historical Documents, Climate Models, and Data Assimilation
Authors:
Eric Sun,
Kuan-hui Elaine Lin,
Wan-Ling Tseng,
Pao K. Wang,
Hsin-Cheng Huang
Abstract:
We propose a novel approach for reconstructing annual temperatures in East Asia from 1368 to 1911, leveraging the Reconstructed East Asian Climate Historical Encoded Series (REACHES). The lack of instrumental data during this period poses significant challenges to understanding past climate conditions. REACHES digitizes historical documents from the Ming and Qing dynasties of China, converting qua…
▽ More
We propose a novel approach for reconstructing annual temperatures in East Asia from 1368 to 1911, leveraging the Reconstructed East Asian Climate Historical Encoded Series (REACHES). The lack of instrumental data during this period poses significant challenges to understanding past climate conditions. REACHES digitizes historical documents from the Ming and Qing dynasties of China, converting qualitative descriptions into a four-level ordinal temperature scale. However, these index-based data are biased toward abnormal or extreme weather phenomena, leading to data gaps that likely correspond to normal conditions. To address this bias and reconstruct historical temperatures at any point within East Asia, including locations without direct historical data, we employ a three-tiered statistical framework. First, we perform kriging to interpolate temperature data across East Asia, adopting a zero-mean assumption to handle missing information. Next, we utilize the Last Millennium Ensemble (LME) reanalysis data and apply quantile mapping to calibrate the kriged REACHES data to Celsius temperature scales. Finally, we introduce a novel Bayesian data assimilation method that integrates the kriged Celsius data with LME simulations to enhance reconstruction accuracy. We model the LME data at each geographic location using a flexible nonstationary autoregressive time series model and employ regularized maximum likelihood estimation with a fused lasso penalty. The resulting dynamic distribution serves as a prior, which is refined via Kalman filtering by incorporating the kriged Celsius REACHES data to yield posterior temperature estimates. This comprehensive integration of historical documentation, contemporary climate models, and advanced statistical methods improves the accuracy of historical temperature reconstructions and provides a crucial resource for future environmental and climate studies.
△ Less
Submitted 18 January, 2025; v1 submitted 29 October, 2024;
originally announced October 2024.
-
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization
Authors:
Jiawei Liu,
Fanrui Zhang,
Jiaying Zhu,
Esther Sun,
Qiang Zhang,
Zheng-Jun Zha
Abstract:
Multimodal Large Language Models (MLLMs), such as GPT4o, have shown strong capabilities in visual reasoning and explanation generation. However, despite these strengths, they face significant challenges in the increasingly critical task of Image Forgery Detection and Localization (IFDL). Moreover, existing IFDL methods are typically limited to the learning of low-level semantic-agnostic clues and…
▽ More
Multimodal Large Language Models (MLLMs), such as GPT4o, have shown strong capabilities in visual reasoning and explanation generation. However, despite these strengths, they face significant challenges in the increasingly critical task of Image Forgery Detection and Localization (IFDL). Moreover, existing IFDL methods are typically limited to the learning of low-level semantic-agnostic clues and merely provide a single outcome judgment. To tackle these issues, we propose ForgeryGPT, a novel framework that advances the IFDL task by capturing high-order forensics knowledge correlations of forged images from diverse linguistic feature spaces, while enabling explainable generation and interactive dialogue through a newly customized Large Language Model (LLM) architecture. Specifically, ForgeryGPT enhances traditional LLMs by integrating the Mask-Aware Forgery Extractor, which enables the excavating of precise forgery mask information from input images and facilitating pixel-level understanding of tampering artifacts. The Mask-Aware Forgery Extractor consists of a Forgery Localization Expert (FL-Expert) and a Mask Encoder, where the FL-Expert is augmented with an Object-agnostic Forgery Prompt and a Vocabulary-enhanced Vision Encoder, allowing for effectively capturing of multi-scale fine-grained forgery details. To enhance its performance, we implement a three-stage training strategy, supported by our designed Mask-Text Alignment and IFDL Task-Specific Instruction Tuning datasets, which align vision-language modalities and improve forgery detection and instruction-following capabilities. Extensive experiments demonstrate the effectiveness of the proposed method.
△ Less
Submitted 6 January, 2025; v1 submitted 14 October, 2024;
originally announced October 2024.
-
Beyond CCDs: Characterization of sCMOS detectors for optical astronomy
Authors:
Aditya Khandelwal,
Sarik Jeram,
Ryan Dungee,
Albert W. K. Lau,
Allison Lau,
Ethen Sun,
Phil Van-Lane,
Shaojie Chen,
Aaron Tohuvavohu,
Ting S. Li
Abstract:
Modern scientific complementary metal-oxide semiconductor (sCMOS) detectors provide a highly competitive alternative to charge-coupled devices (CCDs), the latter of which have historically been dominant in optical imaging. sCMOS boast comparable performances to CCDs with faster frame rates, lower read noise, and a higher dynamic range. Furthermore, their lower production costs are shifting the ind…
▽ More
Modern scientific complementary metal-oxide semiconductor (sCMOS) detectors provide a highly competitive alternative to charge-coupled devices (CCDs), the latter of which have historically been dominant in optical imaging. sCMOS boast comparable performances to CCDs with faster frame rates, lower read noise, and a higher dynamic range. Furthermore, their lower production costs are shifting the industry to abandon CCD support and production in favour of CMOS, making their characterization urgent. In this work, we characterized a variety of high-end commercially available sCMOS detectors to gauge the state of this technology in the context of applications in optical astronomy. We evaluated a range of sCMOS detectors, including larger pixel models such as the Teledyne Prime 95B and the Andor Sona-11, which are similar to CCDs in pixel size and suitable for wide-field astronomy. Additionally, we assessed smaller pixel detectors like the Ximea xiJ and Andor Sona-6, which are better suited for deep-sky imaging. Furthermore, high-sensitivity quantitative sCMOS detectors such as the Hamamatsu Orca-Quest C15550-20UP, capable of resolving individual photoelectrons, were also tested. In-lab testing showed low levels of dark current, read noise, faulty pixels, and fixed pattern noise, as well as linearity levels above $98\%$ across all detectors. The Orca-Quest had particularly low noise levels with a dark current of $0.0067 \pm 0.0003$ e$^-$/s (at $-20^\circ$C with air cooling) and a read noise of $0.37 \pm 0.09$ e$^-$ using its standard readout mode. Our tests revealed that the latest generation of sCMOS detectors excels in optical imaging performance, offering a more accessible alternative to CCDs for future optical astronomy instruments.
△ Less
Submitted 6 December, 2024; v1 submitted 24 September, 2024;
originally announced September 2024.
-
Using Machine Teaching to Boost Novices' Robot Teaching Skill
Authors:
Yuqing Zhu,
Endong Sun,
Matthew Howard
Abstract:
Recent evidence has shown that, contrary to expectations, it is difficult for users, especially novices, to teach robots tasks through LfD. This paper introduces a framework that leverages MT algorithms to train novices to become better teachers of robots, and verifies whether such teaching ability is retained beyond the period of training and generalises such that novices teach robots more effect…
▽ More
Recent evidence has shown that, contrary to expectations, it is difficult for users, especially novices, to teach robots tasks through LfD. This paper introduces a framework that leverages MT algorithms to train novices to become better teachers of robots, and verifies whether such teaching ability is retained beyond the period of training and generalises such that novices teach robots more effectively, even for skills for which training has not been received. A between-subjects study is reported, in which novice teachers are asked to teach simple motor skills to a robot. The results demonstrate that subjects that receive training show average 78.83% improvement in teaching ability (as measured by accuracy of the skill learnt by the robot), and average 63.69% improvement in the teaching of new skills not included as part of the training.
△ Less
Submitted 23 September, 2024;
originally announced September 2024.
-
Target word activity detector: An approach to obtain ASR word boundaries without lexicon
Authors:
Sunit Sivasankaran,
Eric Sun,
Jinyu Li,
Yan Huang,
Jing Pan
Abstract:
Obtaining word timestamp information from end-to-end (E2E) ASR models remains challenging due to the lack of explicit time alignment during training. This issue is further complicated in multilingual models. Existing methods, either rely on lexicons or introduce additional tokens, leading to scalability issues and increased computational costs. In this work, we propose a new approach to estimate w…
▽ More
Obtaining word timestamp information from end-to-end (E2E) ASR models remains challenging due to the lack of explicit time alignment during training. This issue is further complicated in multilingual models. Existing methods, either rely on lexicons or introduce additional tokens, leading to scalability issues and increased computational costs. In this work, we propose a new approach to estimate word boundaries without relying on lexicons. Our method leverages word embeddings from sub-word token units and a pretrained ASR model, requiring only word alignment information during training. Our proposed method can scale-up to any number of languages without incurring any additional cost. We validate our approach using a multilingual ASR model trained on five languages and demonstrate its effectiveness against a strong baseline.
△ Less
Submitted 20 September, 2024;
originally announced September 2024.
-
Stochastic Online Correlated Selection
Authors:
Ziyun Chen,
Zhiyi Huang,
Enze Sun
Abstract:
We study Stochastic Online Correlated Selection (SOCS), a family of online rounding algorithms for Non-IID Stochastic Online Submodular Welfare Maximization and special cases such as Online Stochastic Matching, Stochastic AdWords, and Stochastic Display Ads. At each step, the algorithm sees an online item's type and fractional allocation, then immediately allocates it to an agent. We propose a met…
▽ More
We study Stochastic Online Correlated Selection (SOCS), a family of online rounding algorithms for Non-IID Stochastic Online Submodular Welfare Maximization and special cases such as Online Stochastic Matching, Stochastic AdWords, and Stochastic Display Ads. At each step, the algorithm sees an online item's type and fractional allocation, then immediately allocates it to an agent. We propose a metric called the convergence rate for the quality of SOCS. This is cleaner than most metrics in the OCS literature.
We propose a Type Decomposition that reduces SOCS to the two-way special case. First, we sample a surrogate type with half-integer allocation. The rounding is trivial for a one-way type fully allocated to an agent. For a two-way type split equally between two agents, we round it using two-way SOCS. We design the distribution of surrogate types to get two-way types as often as possible while respecting the original fractional allocation in expectation.
Following this framework, we make progress on numerous problems:
1) Online Stochastic Matching: We improve the state-of-the-art $0.666$ competitive ratio for unweighted/vertex-weighted matching to $0.69$.
2) Query-Commit Matching: We enhance the ratio to $0.705$ in the Query-Commit model, improving the best previous $0.696$ and $0.662$ for unweighted and vertex-weighted matching.
3) Stochastic AdWords: We give a $0.6338$ competitive algorithm, breaking the $1-\frac{1}{e}$ barrier and answering a decade-old open question.
4) AdWords: The framework applies to the adversarial model if the rounding is oblivious to future items' distributions. We get the first multi-way OCS for AdWords, addressing an open question about OCS. This gives a $0.504$ competitive ratio for AdWords, improving the previous $0.501$.
5) Stochastic Display Ads: We design a $0.644$ competitive algorithm, breaking the $1-\frac{1}{e}$ barrier.
△ Less
Submitted 22 August, 2024;
originally announced August 2024.
-
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding
Authors:
Yijia Xiao,
Edward Sun,
Yiqiao Jin,
Qifan Wang,
Wei Wang
Abstract:
Understanding biological processes, drug development, and biotechnological advancements requires a detailed analysis of protein structures and functions, a task that is inherently complex and time-consuming in traditional protein research. To streamline this process, we introduce ProteinGPT, a state-of-the-art multimodal large language model for proteins that enables users to upload protein sequen…
▽ More
Understanding biological processes, drug development, and biotechnological advancements requires a detailed analysis of protein structures and functions, a task that is inherently complex and time-consuming in traditional protein research. To streamline this process, we introduce ProteinGPT, a state-of-the-art multimodal large language model for proteins that enables users to upload protein sequences and/or structures for comprehensive analysis and responsive inquiries. ProteinGPT integrates protein sequence and structure encoders with linear projection layers to ensure precise representation adaptation and leverages a large language model (LLM) to generate accurate, contextually relevant responses. To train ProteinGPT, we constructed a large-scale dataset of 132,092 proteins, each annotated with 20-30 property tags and 5-10 QA pairs per protein, and optimized the instruction-tuning process using GPT-4o. Experiments demonstrate that ProteinGPT effectively generates informative responses to protein-related questions, achieving high performance on both semantic and lexical metrics and significantly outperforming baseline models and general-purpose LLMs in understanding and responding to protein-related queries. Our code and data are available at https://github.com/ProteinGPT/ProteinGPT.
△ Less
Submitted 17 April, 2025; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Authors:
Chun Xu,
En-Wei Sun
Abstract:
An increasing number of Chinese people are troubled by different degrees of visual impairment, which has made the modal conversion between a single image or video frame in the visual field and the audio expressing the same information a research hotspot. Deep learning technologies such as OCR+Vocoder and Im2Wav enable English audio synthesis or image-to-sound matching in a self-supervised manner.…
▽ More
An increasing number of Chinese people are troubled by different degrees of visual impairment, which has made the modal conversion between a single image or video frame in the visual field and the audio expressing the same information a research hotspot. Deep learning technologies such as OCR+Vocoder and Im2Wav enable English audio synthesis or image-to-sound matching in a self-supervised manner. However, the audio data used for training is limited and English is not universal for visually impaired people with different educational levels. Therefore, for the sake of solving the problems of data volume and language applicability to improve the reading efficiency of visually impaired people, a set of image-to-speech framework CLIP-KNN-Fastspeech2 based on the Chinese context was constructed. The framework integrates multiple basic models and adopts the strategy of independent pre-training and joint fine-tuning. First, the Chinese CLIP and Fastspeech2 text-to-speech models were pre-trained on two public datasets, MUGE and Baker, respectively, and their convergence was verified. Subsequently, joint fine-tuning was performed using a self-built Braille image dataset. Experimental results on multiple public datasets such as VGGSound, Flickr8k, ImageHear, and the self-built Braille dataset BIT-DP show that the model has improved objective indicators such as BLEU4,FAD(Fréchet Audio Distance), WER(Word Error Ratio), and even inference speed. This verifies that the constructed model still has the ability to synthesize high-quality speech under limited data, and also proves the effectiveness of the joint training strategy that integrates multiple basic models.
△ Less
Submitted 19 July, 2024;
originally announced July 2024.
-
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts
Authors:
Yijia Xiao,
Edward Sun,
Tianyu Liu,
Wei Wang
Abstract:
We propose LogicVista, an evaluation benchmark that assesses the integrated logical reasoning capabilities of multimodal large language models (MLLMs) in Visual contexts. Recent advancements in MLLMs have demonstrated various fascinating abilities, from crafting poetry based on an image to performing mathematical reasoning. However, there is still a lack of systematic evaluation of MLLMs' proficie…
▽ More
We propose LogicVista, an evaluation benchmark that assesses the integrated logical reasoning capabilities of multimodal large language models (MLLMs) in Visual contexts. Recent advancements in MLLMs have demonstrated various fascinating abilities, from crafting poetry based on an image to performing mathematical reasoning. However, there is still a lack of systematic evaluation of MLLMs' proficiency in logical reasoning tasks, which are essential for activities like navigation and puzzle-solving. Thus we evaluate general logical cognition abilities across 5 logical reasoning tasks encompassing 9 different capabilities, using a sample of 448 multiple-choice questions. Each question is annotated with the correct answer and the human-written reasoning behind the selection, enabling both open-ended and multiple-choice evaluation. A total of 8 MLLMs are comprehensively evaluated using LogicVista. Code and Data Available at https://github.com/Yijia-Xiao/LogicVista.
△ Less
Submitted 6 July, 2024;
originally announced July 2024.
-
Towards Improving Learning from Demonstration Algorithms via MCMC Methods
Authors:
Carl Qi,
Edward Sun,
Harry Zhang
Abstract:
Behavioral cloning, or more broadly, learning from demonstrations (LfD) is a priomising direction for robot policy learning in complex scenarios. Albeit being straightforward to implement and data-efficient, behavioral cloning has its own drawbacks, limiting its efficacy in real robot setups. In this work, we take one step towards improving learning from demonstration algorithms by leveraging impl…
▽ More
Behavioral cloning, or more broadly, learning from demonstrations (LfD) is a priomising direction for robot policy learning in complex scenarios. Albeit being straightforward to implement and data-efficient, behavioral cloning has its own drawbacks, limiting its efficacy in real robot setups. In this work, we take one step towards improving learning from demonstration algorithms by leveraging implicit energy-based policy models. Results suggest that in selected complex robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used neural network-based explicit models, especially in the cases of approximating potentially discontinuous and multimodal functions.
△ Less
Submitted 23 May, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Authors:
Julien Piet,
Maha Alrashed,
Chawin Sitawarin,
Sizhe Chen,
Zeming Wei,
Elizabeth Sun,
Basel Alomair,
David Wagner
Abstract:
Large Language Models (LLMs) are attracting significant research attention due to their instruction-following abilities, allowing users and developers to leverage LLMs for a variety of tasks. However, LLMs are vulnerable to prompt-injection attacks: a class of attacks that hijack the model's instruction-following abilities, changing responses to prompts to undesired, possibly malicious ones. In th…
▽ More
Large Language Models (LLMs) are attracting significant research attention due to their instruction-following abilities, allowing users and developers to leverage LLMs for a variety of tasks. However, LLMs are vulnerable to prompt-injection attacks: a class of attacks that hijack the model's instruction-following abilities, changing responses to prompts to undesired, possibly malicious ones. In this work, we introduce Jatmo, a method for generating task-specific models resilient to prompt-injection attacks. Jatmo leverages the fact that LLMs can only follow instructions once they have undergone instruction tuning. It harnesses a teacher instruction-tuned model to generate a task-specific dataset, which is then used to fine-tune a base model (i.e., a non-instruction-tuned model). Jatmo only needs a task prompt and a dataset of inputs for the task: it uses the teacher model to generate outputs. For situations with no pre-existing datasets, Jatmo can use a single example, or in some cases none at all, to produce a fully synthetic dataset. Our experiments on seven tasks show that Jatmo models provide similar quality of outputs on their specific task as standard LLMs, while being resilient to prompt injections. The best attacks succeeded in less than 0.5% of cases against our models, versus 87% success rate against GPT-3.5-Turbo. We release Jatmo at https://github.com/wagner-group/prompt-injection-defense.
△ Less
Submitted 8 January, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
-
The 4m International Liquid Mirror Telescope: a brief history and some preliminary scientific results
Authors:
Jean Surdej,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Kuntal Misra,
Vibhore Negi,
Anna Pospieszalska-Surdej,
Kumar Pranshu,
Ethen Sun
Abstract:
The present article is based upon an invited talk delivered at the occasion of the inauguration of the 4m International Liquid Mirror Telescope (ILMT) which took place in Devasthal (ARIES, Uttarakhand, India) on 21st of March 2023. We present hereafter a short history of the liquid mirror telescopes and in particular of the 4m ILMT which is the first liquid mirror telescope entirely dedicated to a…
▽ More
The present article is based upon an invited talk delivered at the occasion of the inauguration of the 4m International Liquid Mirror Telescope (ILMT) which took place in Devasthal (ARIES, Uttarakhand, India) on 21st of March 2023. We present hereafter a short history of the liquid mirror telescopes and in particular of the 4m ILMT which is the first liquid mirror telescope entirely dedicated to astrophysical observations. We discuss a few preliminary scientific results and illustrate some direct CCD images taken during the first commissioning phase of the telescope. We invite the reader to refer to the series of ILMT poster papers published in these same proceedings of the BINA3 workshop for more details about the instrument, operation, first observations, performance and scientific results.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
SunPhot: Preparations for an upcoming quasar variability survey with the International Liquid Mirror Telescope
Authors:
Ethen Sun,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Kuntal Misra,
Vibhore Negi,
Kumar Pranshu,
Jean Surdej
Abstract:
Recent research suggests a correlation between the variability and intrinsic brightness of quasars. If calibrated, this could lead to the use of quasars on the cosmic distance ladder, but this work is currently limited by lack of quasar light curve data with high cadence and precision. The Python photometric data pipeline SunPhot is being developed as part of preparations for an upcoming quasar va…
▽ More
Recent research suggests a correlation between the variability and intrinsic brightness of quasars. If calibrated, this could lead to the use of quasars on the cosmic distance ladder, but this work is currently limited by lack of quasar light curve data with high cadence and precision. The Python photometric data pipeline SunPhot is being developed as part of preparations for an upcoming quasar variability survey with the International Liquid Mirror Telescope (ILMT). SunPhot uses aperture photometry to directly extract light curves for a catalogue of sources from calibrated ILMT images. SunPhot v.2.1 is operational, but the project is awaiting completion of ILMT commissioning.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Surface Brightness Properties of LSB Galaxies with the International Liquid Mirror Telescope
Authors:
Jiuyang Fu,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Kuntal Misra,
Vibhore Negi,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
Low surface brightness (LSB) galaxies make up a significant fraction of the luminosity density of the local universe. Their low surface brightness suggests a different formation and evolution process compared to more-typical high-surface-brightness galaxies. This study presents an analysis of LSB galaxies found in images obtained by the International Liquid Mirror Telescope during the observation…
▽ More
Low surface brightness (LSB) galaxies make up a significant fraction of the luminosity density of the local universe. Their low surface brightness suggests a different formation and evolution process compared to more-typical high-surface-brightness galaxies. This study presents an analysis of LSB galaxies found in images obtained by the International Liquid Mirror Telescope during the observation period from October 24 to November 1, 2022. 3,092 LSB galaxies were measured and separated into blue and red LSB categories based on their $g'-i'$ colours. In these samples, the median effective radius is 4.7 arcsec, and the median value of the mean surface brightness within the effective radius is 26.1 mag arcsec$^{-2}$. The blue LSB galaxies are slightly brighter than the red LSB galaxies. No significant difference of ellipticity was found between the blue and the red LSB galaxies.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Survey of Variables with the ILMT
Authors:
Baldeep Grewal,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Paul Hickson,
Kuntal Misra,
Brajesh Kumar,
Vibhore Negi,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
Nestled in the mountains of Northern India, is a 4-metre rotating dish of liquid mercury. Over a 10-year period, the International Liquid Mirror Telescope (ILMT) will survey 117 square degrees of sky, to study the astrometric and photometric variability of all detected objects. One of the scientific programs will be a survey of variable stars. The data gathered will be used to construct a comprehe…
▽ More
Nestled in the mountains of Northern India, is a 4-metre rotating dish of liquid mercury. Over a 10-year period, the International Liquid Mirror Telescope (ILMT) will survey 117 square degrees of sky, to study the astrometric and photometric variability of all detected objects. One of the scientific programs will be a survey of variable stars. The data gathered will be used to construct a comprehensive catalog of light curves. This will be an essential resource for astronomers studying the formation and evolution of stars, the structure and dynamics of our Milky Way galaxy, and the properties of the Universe as a whole. This catalog will be an aid in our advance to understanding the cosmos and provide deeper insights into the fundamental processes that shape our Universe. In this work, we describe the survey and give some examples of variable stars found in the early commissioning data from the ILMT.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Observation of mulitply imaged quasars with the 4-m ILMT
Authors:
Talat Akhunov,
Bhavya Ailawadhi,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Kuntal Misra,
Vibhore Negi,
Anna Pospieszalska-Surdej,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
Gravitationally lensed quasars (GLQs) are known to potentially provide an independent way of determining the value of the Hubble-Lemaître parameter $H_{0}$, to probe the dark matter content of lensing galaxies and to resolve tiny structures in distant active galactic nuclei. That is why multiply imaged quasars are one of the main drivers for a photometric monitoring with the 4-m International Liqu…
▽ More
Gravitationally lensed quasars (GLQs) are known to potentially provide an independent way of determining the value of the Hubble-Lemaître parameter $H_{0}$, to probe the dark matter content of lensing galaxies and to resolve tiny structures in distant active galactic nuclei. That is why multiply imaged quasars are one of the main drivers for a photometric monitoring with the 4-m International Liquid Mirror Telescope (ILMT). We would like to answer the following questions -- how many multiply imaged quasars should we be able to detect with the ILMT? And how to derive accurate magnitudes of the GLQ images? Our estimation of the possible number of multiply imaged quasars is $15$, although optimistic forecasts predict up to $50$ of them. We propose to use the adaptive PSF fitting method for accurate flux measurements of the lensed images. During preliminary observations in spring 2022 we were able to detect the quadruply imaged quasar - SDSS J1251+2935 in the $\it{i}$ and $\it{r}$ spectral bands.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Follow-up strategy of ILMT discovered supernovae
Authors:
Brajesh Kumar,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Kuntal Misra,
Vibhore Negi,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
The 4m International Liquid Mirror Telescope (ILMT) facility continuously scans the same sky strip ($\sim$22$^\prime$ wide) on each night with a fixed pointing towards the zenith direction. It is possible to detect hundreds of supernovae (SNe) each year by implementing an optimal image subtraction technique on consecutive night images. Prompt monitoring of ILMT-detected SNe is planned under the se…
▽ More
The 4m International Liquid Mirror Telescope (ILMT) facility continuously scans the same sky strip ($\sim$22$^\prime$ wide) on each night with a fixed pointing towards the zenith direction. It is possible to detect hundreds of supernovae (SNe) each year by implementing an optimal image subtraction technique on consecutive night images. Prompt monitoring of ILMT-detected SNe is planned under the secured target of opportunity mode using ARIES telescopes (1.3m DFOT and 3.6m DOT). Spectroscopy with the DOT facility will be useful for the classification and detailed investigation of SNe. During the commissioning phase of the ILMT, supernova (SN) 2023af was identified in the ILMT field of view. The SN was further monitored with the ILMT and DOT facilities. Preliminary results based on the light curve and spectral features of SN 2023af are presented.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Astrometric and photometric calibrators for the 4-m International Liquid Mirror Telescope
Authors:
Naveen Dukiya,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Kuntal Misra,
Vibhore Negi,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
The International Liquid Mirror Telescope (ILMT) is a 4-meter class survey telescope. It achieved its first light on 29$^{\rm th}$ April 2022 and is now undergoing the commissioning phase. It scans the sky in a fixed \ang{;22;} wide strip centred at the declination of $+$\ang{29;21;41.4} and works in \emph{Time Delay Integration (TDI)} mode. We present a full catalog of sources in the ILMT strip d…
▽ More
The International Liquid Mirror Telescope (ILMT) is a 4-meter class survey telescope. It achieved its first light on 29$^{\rm th}$ April 2022 and is now undergoing the commissioning phase. It scans the sky in a fixed \ang{;22;} wide strip centred at the declination of $+$\ang{29;21;41.4} and works in \emph{Time Delay Integration (TDI)} mode. We present a full catalog of sources in the ILMT strip derived by crossmatching \textit{Gaia} DR3 with SDSS DR17 and PanSTARRS-1 (PS1) to supplement the catalog with apparent magnitudes of these sources in $g, r$, and $i$ filters. These sources can serve as astrometric calibrators. The release of Gaia DR3 provides synthetic photometry in popular broadband photometric systems, including the SDSS $g, r$, and $i$ bands for $\sim$220 million sources across the sky. We have used this synthetic photometry to verify our crossmatching performance and, in turn, create a subset of the catalog with accurate photometric measurements from two reliable sources.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
A year-long representation of the ILMT observations in different coordinate systems
Authors:
Monalisa Dubey,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Kuntal Misra,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Vibhore Negi,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
The 4m International Liquid Mirror Telescope (ILMT) is the first optical survey telescope in India that performs zenithal observations of a 22$'$ wide strip of the sky. To determine the portion of the sky covered by the ILMT during the entire year, we represent the ILMT Field of View (FoV) in three different coordinate systems - galactic, ecliptic, and equatorial. We adopt a constant declination o…
▽ More
The 4m International Liquid Mirror Telescope (ILMT) is the first optical survey telescope in India that performs zenithal observations of a 22$'$ wide strip of the sky. To determine the portion of the sky covered by the ILMT during the entire year, we represent the ILMT Field of View (FoV) in three different coordinate systems - galactic, ecliptic, and equatorial. We adopt a constant declination of $+29^{\circ}21'41.4"$ and varying right ascension (RA) ranges corresponding to the Local Sidereal Time (LST). The observations from June to September are hampered due to the monsoon season. The handiness of such representations will allow us to locate a transient event in the ILMT FoV. This will enable prompt follow-up observations with other facilities.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
The 4m International Liquid Mirror Telescope project
Authors:
Jean Surdej,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Kuntal Misra,
Vibhore Negi,
Anna Pospieszalska-Surdej,
Kumar Pranshu,
Ethen Sun
Abstract:
The International Liquid Mirror Telescope (ILMT) project is a scientific collaboration in observational astrophysics between the Li{è}ge Institute of Astrophysics and Geophysics (Li{è}ge University, Belgium), the Aryabatta Research Institute of observational sciencES (ARIES, Nainital, India) and several Canadian universities (British Columbia, Laval, Montr{é}al, Toronto, Victoria and York). Meanwh…
▽ More
The International Liquid Mirror Telescope (ILMT) project is a scientific collaboration in observational astrophysics between the Li{è}ge Institute of Astrophysics and Geophysics (Li{è}ge University, Belgium), the Aryabatta Research Institute of observational sciencES (ARIES, Nainital, India) and several Canadian universities (British Columbia, Laval, Montr{é}al, Toronto, Victoria and York). Meanwhile, several other institutes have joined the project: the Royal Observatory of Belgium, the National University of Uzbekistan and the Ulugh Beg Astronomical Institute (Uzbekistan) as well as the Pozna{ń} Observatory (Poland). The Li{è}ge company AMOS (Advanced Mechanical and Optical Systems) has fabricated the telescope structure that has been erected on the ARIES site in Devasthal (Uttarakhand, India). It is the first liquid mirror telescope being dedicated to astronomical observations. First light was obtained on 29 April 2022 and commissioning is being conducted at the present time. In this short article, we describe and illustrate the main components of the ILMT. We also highlight the ILMT papers presented during the third BINA workshop, which discuss various aspects of the ILMT science programs.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Serendipitous Detection of Orbital Debris by the International Liquid Mirror Telescope: First Results
Authors:
Paul Hickson,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Brajesh Kumar,
Kuntal Misra,
Vibhore Negi,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
Orbital debris presents a growing risk to space operations, and is becoming a significant source of contamination of astronomical images. Much of the debris population is uncatalogued, making the impact more difficult to assess. We present initial results from the first ten nights of commissioning observations with the International Liquid Mirror Telescope, in which images were examined for streak…
▽ More
Orbital debris presents a growing risk to space operations, and is becoming a significant source of contamination of astronomical images. Much of the debris population is uncatalogued, making the impact more difficult to assess. We present initial results from the first ten nights of commissioning observations with the International Liquid Mirror Telescope, in which images were examined for streaks produced by orbiting objects including satellites, rocket bodies and other forms of debris. We detected 83 streaks and performed a correlation analysis to attempt to match these with objects in the public database. 48\% of these objects were uncorrelated, indicating substantial incompleteness in the database, even for some relatively-bright objects. We were able to detect correlated objects to an estimated magnitude of 14.5 and possibly about two magnitudes greater for the faintest uncorrelated object.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Detection and Identification of Asteroids with the 4-m ILMT
Authors:
Anna Pospieszalska-Surdej,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Kuntal Misra,
Vibhore Negi,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
A very unique strength of the Devasthal Observatory is its capability of detecting optical transients with the 4-m International Liquid Mirror Telescope (ILMT) and to rapidly follow them up using the 1.3-m Devasthal Fast Optical Telescope (DFOT) and/or the 3.6-m Devasthal Optical Telescope (DOT), installed right next to it. In this context, we have inspected 20 fields observed during 9 consecutive…
▽ More
A very unique strength of the Devasthal Observatory is its capability of detecting optical transients with the 4-m International Liquid Mirror Telescope (ILMT) and to rapidly follow them up using the 1.3-m Devasthal Fast Optical Telescope (DFOT) and/or the 3.6-m Devasthal Optical Telescope (DOT), installed right next to it. In this context, we have inspected 20 fields observed during 9 consecutive nights in October-November 2022 during the first commissioning phase of the ILMT. Each of these fields has an angular extent of $22^\prime$ in declination by $9 \times 22^\prime$ in right ascension. Combining both a visual search for optical transients and an automatic search for these using an image subtraction technique (see the ILMT poster paper by Pranshu et al.), we report a total of 232 significant transient candidates. After consulting the Minor Planet Center database of asteroids, we could identify among these 219 positions of known asteroids brighter than $V=22$. These correspond to the confirmed positions of 78 distinct known asteroids. Analysis of the remaining CCD frames covering 19 more fields (out of 20) should lead to an impressive number of asteroids observed in only 9 nights. The conclusion is that in order to detect and characterize new supernovae, micro-lensing events, highly variable stars, multiply imaged quasars, etc. among the ILMT optical transients, we shall first have to identify all known and new asteroids. Thanks to its large diameter and short focal length (f/D $\sim$ 2.4), the ILMT turns out to be an excellent asteroid hunter.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Accessibility of the ILMT survey data
Authors:
Kuntal Misra,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Vibhore Negi,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
The 4m International Liquid Mirror Telescope (ILMT) continuously scans a 22$'$ wide strip of the zenithal sky and records the images in three broadband filters (g', r' and i') using a 4K$\times$4K CCD camera. In about 10--12 hours of observations during a single night, $\sim$15 GB of data volume is generated. The raw images resulting from the observations in October--November 2022 have been pre-pr…
▽ More
The 4m International Liquid Mirror Telescope (ILMT) continuously scans a 22$'$ wide strip of the zenithal sky and records the images in three broadband filters (g', r' and i') using a 4K$\times$4K CCD camera. In about 10--12 hours of observations during a single night, $\sim$15 GB of data volume is generated. The raw images resulting from the observations in October--November 2022 have been pre-processed and astrometrically calibrated. In order to exploit the scientific capabilities of the ILMT survey data by the larger scientific community, we are disseminating the raw data (along with dark and flat fields) and the astrometrically calibrated data. These data sets can be downloaded by the users to conduct the scientific projects of their interest. In future, the data will be processed in near real-time and will be available via the ARIES data archive portal.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Automated transient detection in the context of the 4m ILMT
Authors:
Kumar Pranshu,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Kuntal Misra,
Vibhore Negi,
Ethen Sun,
Jean Surdej
Abstract:
In the era of sky surveys like Palomar Transient Factory (PTF), Zwicky Transient Facility (ZTF) and the upcoming Vera Rubin Observatory (VRO) and ILMT, a plethora of image data will be available. ZTF scans the sky with a field of view of 48 deg$^{2}$ and VRO will have a FoV of 9.6 deg$^{2}$ but with a much larger aperture. The 4m ILMT covers a 22$'$ wide strip of the sky. Being a zenith telescope,…
▽ More
In the era of sky surveys like Palomar Transient Factory (PTF), Zwicky Transient Facility (ZTF) and the upcoming Vera Rubin Observatory (VRO) and ILMT, a plethora of image data will be available. ZTF scans the sky with a field of view of 48 deg$^{2}$ and VRO will have a FoV of 9.6 deg$^{2}$ but with a much larger aperture. The 4m ILMT covers a 22$'$ wide strip of the sky. Being a zenith telescope, ILMT has several advantages like low observation air mass, best image quality, minimum light pollution and no pointing time loss. Transient detection requires all these imaging data to be processed through a Difference Imaging Algorithm (DIA) followed by subsequent identification and classification of transients. The ILMT is also expected to discover several known and unknown astrophysical objects including transients. Here, we propose a pipeline with an image subtraction algorithm and a convolutional neural network (CNN) based automated transient discovery and classification system. The pipeline was tested on ILMT data and the transients as well as variable candidates were recovered and classified.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
An automated photometric pipeline for the ILMT data
Authors:
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Kuntal Misra,
Vibhore Negi,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
The International Liquid Mirror Telescope (ILMT) is a 4-meter survey telescope continuously observing towards the zenith in the SDSS g', r', and i' bands. This survey telescope is designed to detect various astrophysical transients (for example, supernovae) and very faint objects like multiply-imaged quasars and low surface brightness galaxies. A single scan of a 22$'$ strip of sky contains a larg…
▽ More
The International Liquid Mirror Telescope (ILMT) is a 4-meter survey telescope continuously observing towards the zenith in the SDSS g', r', and i' bands. This survey telescope is designed to detect various astrophysical transients (for example, supernovae) and very faint objects like multiply-imaged quasars and low surface brightness galaxies. A single scan of a 22$'$ strip of sky contains a large amount of photometric information. To process this type of data, it becomes critical to have tools or pipelines that can handle it efficiently and accurately with minimal human biases. We offer a fully automated pipeline generated in Python to perform aperture photometry over the ILMT data acquired with the CCD in Time Delayed Integration (TDI) mode. The instrumental magnitudes are calibrated with respect to the Pan-STARRS-1 catalogue. The light curves generated from the calibrated magnitudes will allows us to characterize the objects as variable stars or rapidly decaying transients.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Necessity of a TDI optical corrector for ILMT observations
Authors:
Vibhore Negi,
Bhavya Ailawadhi,
Talat Akhunov,
Ermanno Borra,
Monalisa Dubey,
Naveen Dukiya,
Jiuyang Fu,
Baldeep Grewal,
Paul Hickson,
Brajesh Kumar,
Kuntal Misra,
Kumar Pranshu,
Ethen Sun,
Jean Surdej
Abstract:
The International Liquid Mirror Telescope (ILMT) has recently become operational at the Devasthal Observatory of ARIES, Nainital, India. The ILMT observes in the Time delay integration (TDI) mode where the images are formed by electronically stepping the charges over the pixels of the CCD, along a column. Observations near the zenith impose certain constraints dependent on the latitude such as ima…
▽ More
The International Liquid Mirror Telescope (ILMT) has recently become operational at the Devasthal Observatory of ARIES, Nainital, India. The ILMT observes in the Time delay integration (TDI) mode where the images are formed by electronically stepping the charges over the pixels of the CCD, along a column. Observations near the zenith impose certain constraints dependent on the latitude such as image deformation due to the star-trail curvature and differential speed. These effects make the stellar trajectories in the focal plane of the ILMT to be hyperbolic, which are corrected for by the introduction of a TDI optical corrector, designed specifically for the ILMT. Here, we report the first results on the effect of this corrector on the trajectories followed by the stars in the ILMT focal plane. Astrometrically calibrating nine nights of data recorded with the ILMT during its first commissioning phase, we find simple (nearly linear) relations between the CCD-y coordinate and the right ascension (RA) of stars and between the CCD-x coordinate and their declination (DEC), respectively, which confirms that the TDI corrector works very fine in converting the stellar trajectories into straight lines.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Knowledge Distilled Ensemble Model for sEMG-based Silent Speech Interface
Authors:
Wenqiang Lai,
Qihan Yang,
Ye Mao,
Endong Sun,
Jiangnan Ye
Abstract:
Voice disorders affect millions of people worldwide. Surface electromyography-based Silent Speech Interfaces (sEMG-based SSIs) have been explored as a potential solution for decades. However, previous works were limited by small vocabularies and manually extracted features from raw data. To address these limitations, we propose a lightweight deep learning knowledge-distilled ensemble model for sEM…
▽ More
Voice disorders affect millions of people worldwide. Surface electromyography-based Silent Speech Interfaces (sEMG-based SSIs) have been explored as a potential solution for decades. However, previous works were limited by small vocabularies and manually extracted features from raw data. To address these limitations, we propose a lightweight deep learning knowledge-distilled ensemble model for sEMG-based SSI (KDE-SSI). Our model can classify a 26 NATO phonetic alphabets dataset with 3900 data samples, enabling the unambiguous generation of any English word through spelling. Extensive experiments validate the effectiveness of KDE-SSI, achieving a test accuracy of 85.9\%. Our findings also shed light on an end-to-end system for portable, practical equipment.
△ Less
Submitted 6 August, 2023;
originally announced August 2023.
-
Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data
Authors:
Rong Ma,
Eric D. Sun,
David Donoho,
James Zou
Abstract:
Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional si…
▽ More
Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features. SMAI provides a statistical test to robustly assess the alignability between datasets to avoid misleading inference, and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.
△ Less
Submitted 29 February, 2024; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Pre-training End-to-end ASR Models with Augmented Speech Samples Queried by Text
Authors:
Eric Sun,
Jinyu Li,
Jian Xue,
Yifan Gong
Abstract:
In end-to-end automatic speech recognition system, one of the difficulties for language expansion is the limited paired speech and text training data. In this paper, we propose a novel method to generate augmented samples with unpaired speech feature segments and text data for model pre-training, which has the advantage of low cost without using additional speech data. When mixing 20,000 hours aug…
▽ More
In end-to-end automatic speech recognition system, one of the difficulties for language expansion is the limited paired speech and text training data. In this paper, we propose a novel method to generate augmented samples with unpaired speech feature segments and text data for model pre-training, which has the advantage of low cost without using additional speech data. When mixing 20,000 hours augmented speech data generated by our method with 12,500 hours original transcribed speech data for Italian Transformer transducer model pre-training, we achieve 8.7% relative word error rate reduction. The pre-trained model achieves similar performance as the model pre-trained with multilingual transcribed 75,000 hours raw speech data. When merging the augmented speech data with the multilingual data to pre-train a new model, we achieve even more relative word error rate reduction of 12.2% over the baseline, which further verifies the effectiveness of our method for speech data augmentation.
△ Less
Submitted 30 July, 2023;
originally announced July 2023.
-
Data Cross-Segmentation for Improved Generalization in Reinforcement Learning Based Algorithmic Trading
Authors:
Vikram Duvvur,
Aashay Mehta,
Edward Sun,
Bo Wu,
Ken Yew Chan,
Jeff Schneider
Abstract:
The use of machine learning in algorithmic trading systems is increasingly common. In a typical set-up, supervised learning is used to predict the future prices of assets, and those predictions drive a simple trading and execution strategy. This is quite effective when the predictions have sufficient signal, markets are liquid, and transaction costs are low. However, those conditions often do not…
▽ More
The use of machine learning in algorithmic trading systems is increasingly common. In a typical set-up, supervised learning is used to predict the future prices of assets, and those predictions drive a simple trading and execution strategy. This is quite effective when the predictions have sufficient signal, markets are liquid, and transaction costs are low. However, those conditions often do not hold in thinly traded financial markets and markets for differentiated assets such as real estate or vehicles. In these markets, the trading strategy must consider the long-term effects of taking positions that are relatively more difficult to change. In this work, we propose a Reinforcement Learning (RL) algorithm that trades based on signals from a learned predictive model and addresses these challenges. We test our algorithm on 20+ years of equity data from Bursa Malaysia.
△ Less
Submitted 18 July, 2023;
originally announced July 2023.
-
Improved Algorithms for Online Rent Minimization Problem Under Unit-Size Jobs
Authors:
Enze Sun,
Zonghan Yang,
Yuhao Zhang
Abstract:
We consider the Online Rent Minimization problem, where online jobs with release times, deadlines, and processing times must be scheduled on machines that can be rented for a fixed length period of $T$. The objective is to minimize the number of machine rents. This problem generalizes the Online Machine Minimization problem where machines can be rented for an infinite period, and both problems hav…
▽ More
We consider the Online Rent Minimization problem, where online jobs with release times, deadlines, and processing times must be scheduled on machines that can be rented for a fixed length period of $T$. The objective is to minimize the number of machine rents. This problem generalizes the Online Machine Minimization problem where machines can be rented for an infinite period, and both problems have an asymptotically optimal competitive ratio of $O(\log(p_{\max}/p_{\min}))$ for general processing times, where $p_{\max}$ and $p_{\min}$ are the maximum and minimum processing times respectively. However, for small values of $p_{\max}/p_{\min}$, a better competitive ratio can be achieved by assuming unit-size jobs. Under this assumption, Devanur et al. (2014) gave an optimal $e$-competitive algorithm for Online Machine Minimization, and Chen and Zhang (2022) gave a $(3e+7)\approx 15.16$-competitive algorithm for Online Rent Minimization. In this paper, we significantly improve the competitive ratio of the Online Rent Minimization problem under unit size to $6$, by using a clean oracle-based online algorithm framework.
△ Less
Submitted 29 June, 2023;
originally announced June 2023.
-
Motion robust MR fingerprinting scan to image neonates with prenatal opioid exposure
Authors:
Dan Ma,
Chaitra Badve,
Jessie EP Sun,
Siyuan Hu,
Xiaofeng Wang,
Yong Chen,
Ameya Nayate,
Michael Wien,
Douglas Martin,
Lynn T Singer,
Jared C. Durieux,
Chris Flask,
Deanne Wilson Costello
Abstract:
Background: A noninvasive and sensitive imaging tool is needed to assess the fast-evolving baby brain. However, using MRI to study non-sedated babies faces roadblocks, including high scan failure rates due to subjects motion and the lack of quantitative measures for assessing potential developmental delays. This feasibility study explores whether MR Fingerprinting scans can provide motion-robust a…
▽ More
Background: A noninvasive and sensitive imaging tool is needed to assess the fast-evolving baby brain. However, using MRI to study non-sedated babies faces roadblocks, including high scan failure rates due to subjects motion and the lack of quantitative measures for assessing potential developmental delays. This feasibility study explores whether MR Fingerprinting scans can provide motion-robust and quantitative brain tissue measurements for non-sedated infants with prenatal opioid exposure, presenting a viable alternative to clinical MR scans. Assessment: MRF image quality was compared to pediatric MRI scans using a fully crossed, multiple reader multiple case study. The quantitative T1 and T2 values were used to assess brain tissue changes between babies younger than one month and babies between one and two months. Statistical Tests: Generalized estimating equations (GEE) model was performed to test the significant difference of the T1 and T2 values from eight white matter regions of babies under one month and those are older. MRI and MRF image quality were assessed using Gwets second order auto-correlation coefficient (AC2) with its confidence levels. We used the Cochran-Mantel-Haenszel test to assess the difference in proportions between MRF and MRI for all features and stratified by the type of features. Results: In infants under one month of age, the T1 and T2 values are significantly higher (p<0.005) compared to those between one and two months. A multiple-reader and multiple-case study showed superior image quality ratings in anatomical features from the MRF images than the MRI images. Conclusions: This study suggested that the MR Fingerprinting scans offer a motion-robust and efficient method for non-sedated infants, delivering superior image quality than clinical MRI scans and additionally providing quantitative measures to assess brain development.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
Strong Interaction Physics at the Luminosity Frontier with 22 GeV Electrons at Jefferson Lab
Authors:
A. Accardi,
P. Achenbach,
D. Adhikari,
A. Afanasev,
C. S. Akondi,
N. Akopov,
M. Albaladejo,
H. Albataineh,
M. Albrecht,
B. Almeida-Zamora,
M. Amaryan,
D. Androić,
W. Armstrong,
D. S. Armstrong,
M. Arratia,
J. Arrington,
A. Asaturyan,
A. Austregesilo,
H. Avagyan,
T. Averett,
C. Ayerbe Gayoso,
A. Bacchetta,
A. B. Balantekin,
N. Baltzell,
L. Barion
, et al. (419 additional authors not shown)
Abstract:
This document presents the initial scientific case for upgrading the Continuous Electron Beam Accelerator Facility (CEBAF) at Jefferson Lab (JLab) to 22 GeV. It is the result of a community effort, incorporating insights from a series of workshops conducted between March 2022 and April 2023. With a track record of over 25 years in delivering the world's most intense and precise multi-GeV electron…
▽ More
This document presents the initial scientific case for upgrading the Continuous Electron Beam Accelerator Facility (CEBAF) at Jefferson Lab (JLab) to 22 GeV. It is the result of a community effort, incorporating insights from a series of workshops conducted between March 2022 and April 2023. With a track record of over 25 years in delivering the world's most intense and precise multi-GeV electron beams, CEBAF's potential for a higher energy upgrade presents a unique opportunity for an innovative nuclear physics program, which seamlessly integrates a rich historical background with a promising future. The proposed physics program encompass a diverse range of investigations centered around the nonperturbative dynamics inherent in hadron structure and the exploration of strongly interacting systems. It builds upon the exceptional capabilities of CEBAF in high-luminosity operations, the availability of existing or planned Hall equipment, and recent advancements in accelerator technology. The proposed program cover various scientific topics, including Hadron Spectroscopy, Partonic Structure and Spin, Hadronization and Transverse Momentum, Spatial Structure, Mechanical Properties, Form Factors and Emergent Hadron Mass, Hadron-Quark Transition, and Nuclear Dynamics at Extreme Conditions, as well as QCD Confinement and Fundamental Symmetries. Each topic highlights the key measurements achievable at a 22 GeV CEBAF accelerator. Furthermore, this document outlines the significant physics outcomes and unique aspects of these programs that distinguish them from other existing or planned facilities. In summary, this document provides an exciting rationale for the energy upgrade of CEBAF to 22 GeV, outlining the transformative scientific potential that lies within reach, and the remarkable opportunities it offers for advancing our understanding of hadron physics and related fundamental phenomena.
△ Less
Submitted 24 August, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training
Authors:
Eric Sun,
Jinyu Li,
Yuxuan Hu,
Yimeng Zhu,
Long Zhou,
Jian Xue,
Peidong Wang,
Linquan Liu,
Shujie Liu,
Edward Lin,
Yifan Gong
Abstract:
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference. Our method incorporates a gating mechanism and LID loss, enabling transformer experts to learn language-specific information. By combining gated transformer experts with shared transformer layers, we const…
▽ More
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference. Our method incorporates a gating mechanism and LID loss, enabling transformer experts to learn language-specific information. By combining gated transformer experts with shared transformer layers, we construct multilingual transformer blocks and utilize linear experts to effectively regularize the joint network. The curriculum training scheme leverages LID to guide the gated experts in improving their respective language performance. Experimental results on a bilingual task involving English and Spanish demonstrate significant improvements, with average relative word error reductions of 12.5% and 7.3% compared to the baseline bilingual and monolingual models, respectively. Notably, our method achieves performance comparable to the upper-bound model trained and inferred with oracle LID. Extending our approach to trilingual, quadrilingual, and pentalingual models reveals similar advantages to those observed in the bilingual models, highlighting its ease of extension to multiple languages.
△ Less
Submitted 7 July, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Authors:
Peidong Wang,
Eric Sun,
Jian Xue,
Yu Wu,
Long Zhou,
Yashesh Gaur,
Shujie Liu,
Jinyu Li
Abstract:
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure. It is thus possible to use a single transducer model to perform both tasks. In real-world applications, such joint ASR and ST models may need to be streaming and do not require source language identification (i.e. language-agnostic). In this paper, we propose LAMASSU, a streaming…
▽ More
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure. It is thus possible to use a single transducer model to perform both tasks. In real-world applications, such joint ASR and ST models may need to be streaming and do not require source language identification (i.e. language-agnostic). In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers. Based on the transducer model structure, we propose four methods, a unified joint and prediction network for multilingual output, a clustered multilingual encoder, target language identification for encoder, and connectionist temporal classification regularization. Experimental results show that LAMASSU not only drastically reduces the model size but also reaches the performances of monolingual ASR and bilingual ST models.
△ Less
Submitted 19 October, 2023; v1 submitted 5 November, 2022;
originally announced November 2022.
-
A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability
Authors:
Jian Xue,
Peidong Wang,
Jinyu Li,
Eric Sun
Abstract:
In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which can transcribe or translate multiple spoken languages into texts of the target language. The backbone of SM2 is Transformer Transducer, which has high streaming capability. Instead of human labeled speech translation (ST) data, SM2 models are trained using weakly supervised data generated by convert…
▽ More
In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which can transcribe or translate multiple spoken languages into texts of the target language. The backbone of SM2 is Transformer Transducer, which has high streaming capability. Instead of human labeled speech translation (ST) data, SM2 models are trained using weakly supervised data generated by converting the transcriptions in speech recognition corpora with a machine translation service. With 351 thousand hours of anonymized speech training data from 25 languages, SM2 models achieve comparable or even better ST quality than some recent popular large-scale non-streaming speech models. More importantly, we show that SM2 has the truly zero-shot capability when expanding to new target languages, yielding high quality ST results for {source-speech, target-text} pairs that are not seen during training.
△ Less
Submitted 5 July, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
A Spectral Method for Assessing and Combining Multiple Data Visualizations
Authors:
Rong Ma,
Eric D. Sun,
James Zou
Abstract:
Dimension reduction and data visualization aim to project a high-dimensional dataset to a low-dimensional space while capturing the intrinsic structures in the data. It is an indispensable part of modern data science, and many dimensional reduction and visualization algorithms have been developed. However, different algorithms have their own strengths and weaknesses, making it critically important…
▽ More
Dimension reduction and data visualization aim to project a high-dimensional dataset to a low-dimensional space while capturing the intrinsic structures in the data. It is an indispensable part of modern data science, and many dimensional reduction and visualization algorithms have been developed. However, different algorithms have their own strengths and weaknesses, making it critically important to evaluate their relative performance for a given dataset, and to leverage and combine their individual strengths. In this paper, we propose an efficient spectral method for assessing and combining multiple visualizations of a given dataset produced by diverse algorithms. The proposed method provides a quantitative measure -- the visualization eigenscore -- of the relative performance of the visualizations for preserving the structure around each data point. Then it leverages the eigenscores to obtain a consensus visualization, which has much improved { quality over the individual visualizations in capturing the underlying true data structure.} Our approach is flexible and works as a wrapper around any visualizations. We analyze multiple simulated and real-world datasets from diverse applications to demonstrate the effectiveness of the eigenscores for evaluating visualizations and the superiority of the proposed consensus visualization. Furthermore, we establish rigorous theoretical justification of our method based on a general statistical framework, yielding fundamental principles behind the empirical success of consensus visualization along with practical guidance.
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
Better Approximation for Interdependent SOS Valuations
Authors:
Pinyan Lu,
Enze Sun,
Chenghan Zhou
Abstract:
Submodular over signal (SOS) defines a family of interesting functions for which there exist truthful mechanisms with constant approximation to the social welfare for agents with interdependent valuations. The best-known truthful auction is of $4$-approximation and a lower bound of 2 was proved. We propose a new and simple truthful mechanism to achieve an approximation ratio of 3.315.
Submodular over signal (SOS) defines a family of interesting functions for which there exist truthful mechanisms with constant approximation to the social welfare for agents with interdependent valuations. The best-known truthful auction is of $4$-approximation and a lower bound of 2 was proved. We propose a new and simple truthful mechanism to achieve an approximation ratio of 3.315.
△ Less
Submitted 12 October, 2022;
originally announced October 2022.
-
Online Ordinal Problems: Optimality of Comparison-based Algorithms and their Cardinal Complexity
Authors:
Nick Gravin,
Enze Sun,
Zhihao Gavin Tang
Abstract:
We consider ordinal online problems, i.e., tasks that only require pairwise comparisons between elements of the input. A classic example is the secretary problem and the game of googol, as well as its multiple combinatorial extensions such as $(J,K)$-secretary, $2$-sided game of googol, ordinal-competitive matroid secretary. A natural approach to these tasks is to use ordinal algorithms that at ea…
▽ More
We consider ordinal online problems, i.e., tasks that only require pairwise comparisons between elements of the input. A classic example is the secretary problem and the game of googol, as well as its multiple combinatorial extensions such as $(J,K)$-secretary, $2$-sided game of googol, ordinal-competitive matroid secretary. A natural approach to these tasks is to use ordinal algorithms that at each step only consider relative ranking among the arrived elements, without looking at the numerical values of the input. We formally study the question of how cardinal algorithms can improve upon ordinal algorithms.
We give first a universal construction of the input distribution for any ordinal online problem, such that the advantage of any cardinal algorithm over the ordinal algorithms is at most $1+\varepsilon$ for arbitrary small $\varepsilon> 0$. As an implication, previous lower bounds for the aforementioned variants of secretary problems hold not only against ordinal algorithms, but also against any online algorithm. However, the value range of the input elements in our construction is huge: $N=O\left(\frac{n^3\cdot n!\cdot n!}{\varepsilon}\right)\uparrow\uparrow(n-1)$ (tower of exponents) for an input sequence of length $n$. As a second result, we identify a class of natural ordinal problems and find cardinal algorithm with a matching advantage of $1+ Ω\left(\frac{1}{\log^{(c)}N}\right),$ where $\log^{(c)}N=\log\ldots\log N$ with $c$ iterative logs and $c$ is an arbitrary constant. Further, we introduce the cardinal complexity for any given ordinal online task: the minimum size $N(\varepsilon)$ of different numerical values in the input such the advantage of cardinal over ordinal algorithms is at most $1+\varepsilon$. As a third result, we show that the game of googol has much lower cardinal complexity of $N=O\left(\left(\frac{n}{\varepsilon}\right)^n\right)$.
△ Less
Submitted 11 October, 2023; v1 submitted 4 April, 2022;
originally announced April 2022.
-
Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition
Authors:
Kenichi Kumatani,
Robert Gmyr,
Felipe Cruz Salinas,
Linquan Liu,
Wei Zuo,
Devang Patel,
Eric Sun,
Yu Shi
Abstract:
The sparsely-gated Mixture of Experts (MoE) can magnify a network capacity with a little computational complexity. In this work, we investigate how multi-lingual Automatic Speech Recognition (ASR) networks can be scaled up with a simple routing algorithm in order to achieve better accuracy. More specifically, we apply the sparsely-gated MoE technique to two types of networks: Sequence-to-Sequence…
▽ More
The sparsely-gated Mixture of Experts (MoE) can magnify a network capacity with a little computational complexity. In this work, we investigate how multi-lingual Automatic Speech Recognition (ASR) networks can be scaled up with a simple routing algorithm in order to achieve better accuracy. More specifically, we apply the sparsely-gated MoE technique to two types of networks: Sequence-to-Sequence Transformer (S2S-T) and Transformer Transducer (T-T). We demonstrate through a set of ASR experiments on multiple language data that the MoE networks can reduce the relative word error rates by 16.3% and 4.6% with the S2S-T and T-T, respectively. Moreover, we thoroughly investigate the effect of the MoE on the T-T architecture in various conditions: streaming mode, non-streaming mode, the use of language ID and the label decoder with the MoE.
△ Less
Submitted 4 January, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.