-
FALSE: False Negative Samples Aware Contrastive Learning for Semantic Segmentation of High-Resolution Remote Sensing Image
Authors:
Zhaoyang Zhang,
Xuying Wang,
Xiaoming Mei,
Chao Tao,
Haifeng Li
Abstract:
Existing self-supervised contrastive learning (SSCL) for remote sensing images (RSIs) is built on constructing positive and negative sample pairs. However, due to the richness of RSI ground objects and the complexity of RSI contextual semantics, positive and negative samples coexist in the same RSI patches and are imbalanced, which causes SSCL to push negative samples far away while also pushing positive samples far away, and vice versa. We call this the sample confounding issue (SCI). To solve this problem, we propose a False negAtive sampLes aware contraStive lEarning model (FALSE) for the semantic segmentation of high-resolution RSIs. Since SSCL pretraining is unsupervised, the lack of definable criteria for false negative samples (FNS) leads to theoretical undecidability. We therefore design two steps to approximately determine FNS: coarse determination of FNS and precise calibration of FNS. We achieve coarse determination of FNS by the FNS self-determination (FNSD) strategy and calibration of FNS by the FNS confidence calibration (FNCC) loss function. Experimental results on three RSI semantic segmentation datasets demonstrate that FALSE effectively improves the accuracy of the downstream RSI semantic segmentation task compared with three current models, which represent three different types of SSCL models. The mean Intersection-over-Union is improved by 0.7% on average on the ISPRS Potsdam dataset, by 12.28% on average on the CVPR DGLC dataset, and by 1.17% on average on the Xiangtan dataset. This indicates that the SSCL model has the ability to self-differentiate FNS and that FALSE effectively mitigates the SCI in self-supervised contrastive learning. The source code is available at https://github.com/GeoX-Lab/FALSE.
Submitted 15 November, 2022;
originally announced November 2022.
-
Dichromatic breather molecules in a mode-locked fiber laser
Authors:
Yudong Cui,
Yusheng Zhang,
Lin Huang,
Aiguo Zhang,
Zhiming Liu,
Cuifang Kuang,
Chenning Tao,
Daru Chen,
Xu Liu,
Boris A. Malomed
Abstract:
Bound states of solitons (molecules) occur in various settings, playing an important role in the operation of fiber lasers, optical emulations, encoding, and communications. Soliton interactions are generally related to breathing dynamics in nonlinear dissipative systems, with potential applications in spectroscopy. In the present work, dichromatic breather molecules (DBMs) are created in a synchronized mode-locked fiber laser. Real-time delay-shifting interference spectra are measured to display the temporal evolution of the DBMs, which cannot be observed by means of the usual real-time spectroscopy. As a result, robust out-of-phase vibrations are found as a typical intrinsic mode of DBMs. The same bound states are produced numerically in the framework of a model combining equations for the population inversion in the mode-locked laser and XPM-coupled complex Ginzburg-Landau equations for amplitudes of the optical fields in the fiber segments of the laser cavity. The results demonstrate that the Q-switching instability induces the onset of breathing oscillations. The findings offer new possibilities for the design of various regimes of the operation of ultrafast lasers.
Submitted 30 March, 2023; v1 submitted 11 November, 2022;
originally announced November 2022.
-
CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification
Authors:
Yang Li,
Canran Xu,
Guodong Long,
Tao Shen,
Chongyang Tao,
Jing Jiang
Abstract:
Recently, prefix-tuning was proposed to efficiently adapt pre-trained language models to a broad spectrum of natural language classification tasks. It leverages soft prefixes as task-specific indicators and language verbalizers as categorical-label mentions to narrow the formulation gap from pre-training language models. However, when the label space increases considerably (i.e., many-class classification), such a tuning technique suffers from a verbalizer ambiguity problem, since the many-class labels are represented by semantically similar verbalizers in short language phrases. To overcome this, inspired by the human decision process in which the most ambiguous classes are mulled over for each instance, we propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix), for many-class classification. Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification. We conduct experiments on many-class benchmark datasets in both the fully supervised setting and the few-shot setting, which indicate that our model outperforms former baselines.
Submitted 12 February, 2024; v1 submitted 10 November, 2022;
originally announced November 2022.
-
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
Authors:
Jiazhan Feng,
Qingfeng Sun,
Can Xu,
Pu Zhao,
Yaming Yang,
Chongyang Tao,
Dongyan Zhao,
Qingwei Lin
Abstract:
Responding with multi-modal content has been recognized as an essential capability for an intelligent conversational agent. In this paper, we introduce the MMDialog dataset to better facilitate multi-modal conversation. MMDialog is composed of a curated set of 1.08 million real-world dialogues with 1.53 million unique images across 4,184 topics. MMDialog has two main and unique advantages. First, it is the largest multi-modal conversation dataset by number of dialogues, by a factor of 88. Second, it covers a massive number of topics to generalize to the open domain. To build an engaging dialogue system with this dataset, we propose and normalize two response-producing tasks based on retrieval and generative scenarios. In addition, we build two baselines for the above tasks with state-of-the-art techniques and report their experimental performance. We also propose a novel evaluation metric, MM-Relevance, to measure the multi-modal responses. Our dataset and scripts are available at https://github.com/victorsungo/MMDialog.
Submitted 21 December, 2022; v1 submitted 10 November, 2022;
originally announced November 2022.
-
Cosmic void exclusion models and their impact on the distance scale measurements from large scale structure
Authors:
Andrei Variu,
Cheng Zhao,
Daniel Forero-Sánchez,
Chia-Hsun Chuang,
Francisco-Shu Kitaura,
Charling Tao,
Amélie Tamone,
Jean-Paul Kneib
Abstract:
Baryonic Acoustic Oscillation (BAO) studies based on the clustering of voids and matter tracers provide important constraints on cosmological parameters related to the expansion of the Universe. However, modelling the void exclusion effect is an important challenge for fully exploiting the potential of this kind of analysis. We thus develop two numerical methods to describe the clustering of cosmic voids. Neither model requires additional cosmological information beyond that assumed within the galaxy de-wiggled model. The models consist of power spectra whose performance we assess in comparison to a parabolic model on Patchy cubic and light-cone mocks. Moreover, we test their robustness against systematic effects and the reconstruction technique. The void model power spectra and the parabolic model with a fixed parameter provide strongly correlated values for the Alcock-Paczynski ($α$) parameter, for boxes and light-cones alike. The resulting $α$ values -- for all three models -- are unbiased and their uncertainties are correctly estimated. However, the numerical models show less variation with the fitting range compared to the parabolic one. The Bayesian evidence suggests that the numerical techniques are often favoured compared to the parabolic model. Moreover, the void model power spectra computed on boxes can describe the void clustering from light-cones as well as from boxes. The same void model power spectra can be used for the study of pre- and post-reconstructed data-sets. Lastly, the two numerical techniques are resilient against the studied systematic effects. Consequently, using either of the two new void models, one can more robustly measure cosmological parameters.
Submitted 16 March, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Potential scientific synergies in weak lensing studies between the CSST and Euclid space probes
Authors:
D. Z. Liu,
X. M. Meng,
X. Z. Er,
Z. H. Fan,
M. Kilbinger,
G. L. Li,
R. Li,
T. Schrabback,
D. Scognamiglio,
H. Y. Shan,
C. Tao,
Y. S. Ting,
J. Zhang,
S. H. Cheng,
S. Farrens,
L. P. Fu,
H. Hildebrandt,
X. Kang,
J. P. Kneib,
X. K. Liu,
Y. Mellier,
R. Nakajima,
P. Schneider,
J. L. Starck,
C. L. Wei, et al. (2 additional authors not shown)
Abstract:
Aims. With the next generation of large surveys coming to the stage of observational cosmology soon, it is important to explore their potential synergies and to maximise their scientific outcomes. In this study, we aim to investigate the complementarity of the two upcoming space missions Euclid and the China Space Station Telescope (CSST), focusing on weak lensing (WL) cosmology. In particular, we analyse the photometric redshifts (photo-zs) and the galaxy blending effects. For Euclid, WL measurements suffer from chromatic PSF effects. For this, CSST can provide valuable information for Euclid to obtain more accurate PSF, and to calibrate the color and color-gradient biases for WL measurements.
Methods. We create image simulations for different surveys, and quantify the photo-z performance. For blending analyses, we employ high-resolution HST/CANDELS data to mock Euclid, CSST, and an LSST-like survey. We analyse the blending fraction for different cases, and the blending effects on galaxy photometry. Furthermore, we demonstrate that CSST can provide a large enough number of high SNR multi-band galaxy images to calibrate the color-gradient biases for Euclid.
Results. The sky coverage of Euclid lies entirely within the CSST footprint. The combination of Euclid with CSST data can be done more uniformly than with the various ground-based data. Our studies show that by combining Euclid and CSST, we can reach a photo-z precision of $σ_{\rm NMAD} \approx 0.04$, and an outlier fraction of $η\approx 2.4\%$. Because of the similarly high resolutions, the data combination of Euclid and CSST can be relatively straightforward for photometry. To include ground-based data, however, sophisticated deblending utilizing priors from high-resolution space data is demanded. The color-gradient biases for Euclid can be well calibrated to the level of 0.1% using galaxies from CSST deep survey.
Submitted 28 October, 2022;
originally announced October 2022.
-
Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding
Authors:
Maximillian Chen,
Alexandros Papangelis,
Chenyang Tao,
Andy Rosenbaum,
Seokhwan Kim,
Yang Liu,
Zhou Yu,
Dilek Hakkani-Tur
Abstract:
Dialogue understanding tasks often necessitate abundant annotated data to achieve good performance and that presents challenges in low-resource settings. To alleviate this barrier, we explore few-shot data augmentation for dialogue understanding by prompting large pre-trained language models and present a novel approach that iterates on augmentation quality by applying weakly-supervised filters. We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue. Models fine-tuned on our augmented data mixed with few-shot ground truth data are able to approach or surpass existing state-of-the-art performance on both datasets. For DailyDialog specifically, using 10% of the ground truth data we outperform the current state-of-the-art model which uses 100% of the data.
Submitted 2 November, 2022; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation
Authors:
Xueliang Zhao,
Yuxuan Wang,
Chongyang Tao,
Chenshuo Wang,
Dongyan Zhao
Abstract:
We study video-grounded dialogue generation, where a response is generated based on the dialogue context and the associated video. The primary challenges of this task lie in (1) the difficulty of integrating video data into pre-trained language models (PLMs) which presents obstacles to exploiting the power of large-scale pre-training; and (2) the necessity of taking into account the complementarity of various modalities throughout the reasoning process. Although having made remarkable progress in video-grounded dialogue generation, existing methods still fall short when it comes to integrating with PLMs in a way that allows information from different modalities to complement each other. To alleviate these issues, we first propose extracting pertinent information from videos and turning it into reasoning paths that are acceptable to PLMs. Additionally, we propose a multi-agent reinforcement learning method to collaboratively perform reasoning on different modalities (i.e., video and dialogue context). Empirical experiment results on two public datasets indicate that the proposed model can significantly outperform state-of-the-art models by large margins on both automatic and human evaluations.
Submitted 22 October, 2022;
originally announced October 2022.
-
There Is No Standard Answer: Knowledge-Grounded Dialogue Generation with Adversarial Activated Multi-Reference Learning
Authors:
Xueliang Zhao,
Tingchen Fu,
Chongyang Tao,
Rui Yan
Abstract:
Knowledge-grounded conversation (KGC) shows excellent potential to deliver an engaging and informative response. However, existing approaches emphasize selecting one golden knowledge given a particular dialogue context, overlooking the one-to-many phenomenon in dialogue. As a result, the existing paradigm limits the diversity of knowledge selection and generation. To this end, we establish a multi-reference KGC dataset and propose a series of metrics to systematically assess the one-to-many efficacy of existing KGC models. Furthermore, to extend the hypothesis space of knowledge selection to enhance the mapping relationship between multiple knowledge and multiple responses, we devise a span-based variational model and optimize the model in a wake-sleep style with an ameliorated evidence lower bound objective to learn the one-to-many generalization. Both automatic and human evaluations demonstrate the efficacy of our approach.
Submitted 22 October, 2022;
originally announced October 2022.
-
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling
Authors:
Dongsheng Chen,
Chaofan Tao,
Lu Hou,
Lifeng Shang,
Xin Jiang,
Qun Liu
Abstract:
Recent large-scale video-language pre-trained models have shown appealing performance on various downstream tasks. However, the pre-training process is computationally expensive due to the requirement of millions of video-text pairs and the redundant data structure of each video. To mitigate these problems, we propose LiteVL, which adapts a pre-trained image-language model BLIP into a video-text model directly on downstream tasks, without heavy pre-training. To enhance the temporal modeling lacking in the image-language model, we propose to add temporal attention modules in the image encoder of BLIP with dynamic temporal scaling. Besides the model-wise adaptation, we also propose a non-parametric pooling mechanism to adaptively reweight the fine-grained video embedding conditioned on the text. Experimental results on text-video retrieval and video question answering show that the proposed LiteVL even outperforms previous video-language pre-trained models by a clear margin, though without any video-language pre-training.
Submitted 21 October, 2022;
originally announced October 2022.
-
ODG-Q: Robust Quantization via Online Domain Generalization
Authors:
Chaofan Tao,
Ngai Wong
Abstract:
Quantizing neural networks to low bitwidth is important for model deployment on resource-limited edge hardware. Although a quantized network has a smaller model size and memory footprint, it is vulnerable to adversarial attacks. However, few methods study the robustness and training efficiency of quantized networks. To this end, we propose a new method by recasting robust quantization as an online domain generalization problem, termed ODG-Q, which generates diverse adversarial data at a low cost during training. ODG-Q consistently outperforms existing works against various adversarial attacks. For example, on the CIFAR-10 dataset, ODG-Q achieves 49.2% average improvements under five common white-box attacks and 21.7% average improvements under five common black-box attacks, with a training cost similar to that of natural training (viz. without adversaries). To the best of our knowledge, this is the first work that trains both quantized and binary neural networks on ImageNet that consistently improve robustness under different attacks. We also provide a theoretical insight into ODG-Q that accounts for the bound of model risk on attacked data.
Submitted 16 October, 2022;
originally announced October 2022.
-
Bump Morphology of the CMAGIC Diagram
Authors:
L. Aldoroty,
L. Wang,
P. Hoeflich,
J. Yang,
N. Suntzeff,
G. Aldering,
P. Antilogus,
C. Aragon,
S. Bailey,
C. Baltay,
S. Bongard,
K. Boone,
C. Buton,
Y. Copin,
S. Dixon,
D. Fouchez,
E. Gangler,
R. Gupta,
B. Hayden,
Mitchell Karmen,
A. G. Kim,
M. Kowalski,
D. Küsters,
P. -F. Léget,
F. Mondon, et al. (16 additional authors not shown)
Abstract:
We apply the color-magnitude intercept calibration method (CMAGIC) to the Nearby Supernova Factory SNe Ia spectrophotometric dataset. The currently existing CMAGIC parameters are the slope and intercept of a straight line fit to the first linear region in the color-magnitude diagram, which occurs over a span of approximately 30 days after maximum brightness. We define a new parameter, $ω_{XY}$, the size of the "bump" feature near maximum brightness for arbitrary filters $X$ and $Y$. We find a significant correlation between the slope of the first linear region, $β_{XY, 1}$, in the CMAGIC diagram and $ω_{XY}$. These results may be used to our advantage, as they are less affected by extinction than parameters defined as a function of time. Additionally, $ω_{XY}$ is computed independently of templates. We find that current empirical templates are successful at reproducing the features described in this work, particularly SALT3, which correctly exhibits the negative correlation between slope and bump size seen in our data. In 1-D simulations, we show that the correlation between the size of the bump feature and $β_{XY, 1}$ can be understood as a result of chemical mixing due to large-scale Rayleigh-Taylor instabilities.
Submitted 22 June, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
GAMA: Generative Adversarial Multi-Object Scene Attacks
Authors:
Abhishek Aich,
Calvin-Khang Ta,
Akash Gupta,
Chengyu Song,
Srikanth V. Krishnamurthy,
M. Salman Asif,
Amit K. Roy-Chowdhury
Abstract:
The majority of methods for crafting adversarial attacks have focused on scenes with a single dominant object (e.g., images from ImageNet). On the other hand, natural scenes include multiple dominant objects that are semantically related. Thus, it is crucial to explore designing attack strategies that look beyond learning on single-object scenes or attacking single-object victim classifiers. Because generative models have the inherent property of strong transferability of perturbations to unknown models, this paper presents the first approach of using generative models for adversarial attacks on multi-object scenes. In order to represent the relationships between different objects in the input scene, we leverage the open-sourced pre-trained vision-language model CLIP (Contrastive Language-Image Pre-training), with the motivation to exploit the encoded semantics in the language space along with the visual space. We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA). GAMA demonstrates the utility of the CLIP model as an attacker's tool to train formidable perturbation generators for multi-object scenes. Using the joint image-text features to train the generator, we show that GAMA can craft potent transferable perturbations in order to fool victim classifiers in various attack settings. For example, GAMA triggers ~16% more misclassification than state-of-the-art generative approaches in black-box settings where both the classifier architecture and data distribution of the attacker are different from the victim. Our code is available here: https://abhishekaich27.github.io/gama.html
Submitted 15 October, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Online Updating Huber Robust Regression for Big Data Streams
Authors:
Chunbai Tao,
Shanshan Wang
Abstract:
Big data streams are attracting increasing attention with the development of modern science and information technology. Because limited computer memory cannot accommodate high volumes of streaming data, real-time methods that do not store historical data are worth investigating. Moreover, outliers may occur as high-velocity data streams are generated, calling for more robust analysis. Motivated by these concerns, a novel Online Updating Huber Robust Regression algorithm is proposed in this paper. By extracting key features of new data subsets, it obtains a computationally efficient online updating estimator without historical data storage. Meanwhile, by integrating Huber regression into the framework, the estimator is robust to contaminated data streams, such as heavy-tailed or heterogeneously distributed ones as well as cases with outliers. Moreover, the proposed online updating estimator is asymptotically equivalent to the Oracle estimator obtained from the entire data and has a lower computational complexity. Extensive numerical simulations and a real data analysis are also conducted to evaluate the estimation and calculation efficiency of the proposed method.
Submitted 28 June, 2023; v1 submitted 4 September, 2022;
originally announced September 2022.
-
LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval
Authors:
Tao Shen,
Xiubo Geng,
Chongyang Tao,
Can Xu,
Xiaolong Huang,
Binxing Jiao,
Linjun Yang,
Daxin Jiang
Abstract:
In large-scale retrieval, the lexicon-weighting paradigm, which learns weighted sparse representations in vocabulary space, has shown promising results with high quality and low latency. Although it deeply exploits the lexicon-representing capability of pre-trained language models, a crucial gap remains between language modeling and lexicon-weighting retrieval -- the former prefers certain or low-entropy words whereas the latter favors pivot or high-entropy words -- which becomes the main barrier to lexicon-weighting performance for large-scale retrieval. To bridge this gap, we propose a brand-new pre-training framework, lexicon-bottlenecked masked autoencoder (LexMAE), to learn importance-aware lexicon representations. Essentially, we present a lexicon-bottlenecked module between a normal language modeling encoder and a weakened decoder, where a continuous bag-of-words bottleneck is constructed to learn a lexicon-importance distribution in an unsupervised fashion. The pre-trained LexMAE is readily transferred to lexicon-weighting retrieval via fine-tuning. On the ad-hoc retrieval benchmark MS-Marco, it achieves 42.6% MRR@10 with 45.8 QPS for the passage dataset and 44.4% MRR@100 with 134.8 QPS for the document dataset, on a CPU machine. LexMAE also shows state-of-the-art zero-shot transfer capability on the BEIR benchmark with 12 datasets.
Submitted 4 June, 2023; v1 submitted 31 August, 2022;
originally announced August 2022.
-
LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval
Authors:
Kai Zhang,
Chongyang Tao,
Tao Shen,
Can Xu,
Xiubo Geng,
Binxing Jiao,
Daxin Jiang
Abstract:
Retrieval models based on dense representations in semantic space have become an indispensable branch for first-stage retrieval. These retrievers benefit from surging advances in representation learning towards compressive global sequence-level embeddings. However, they are prone to overlook local salient phrases and entity mentions in texts, which usually play pivotal roles in first-stage retrieval. To mitigate this weakness, we propose to make a dense retriever align with a well-performing lexicon-aware representation model. The alignment is achieved by weakened knowledge distillations to enlighten the retriever via two aspects -- 1) a lexicon-augmented contrastive objective to challenge the dense encoder and 2) a pair-wise rank-consistent regularization to make the dense model's behavior incline toward the other's. We evaluate our model on three public benchmarks, and the results show that with a comparable lexicon-aware retriever as the teacher, our proposed dense one can bring consistent and significant improvements, and even outdo its teacher. In addition, we find that our improvement on the dense retriever is complementary to the standard ranker distillation, which can further lift state-of-the-art performance.
Submitted 2 March, 2023; v1 submitted 29 August, 2022;
originally announced August 2022.
-
Multiple topological nodal structure in LaSb2 with large linear magnetoresistance
Authors:
Y. X. Qiao,
Z. C. Tao,
F. Y. Wang,
Huaiqiang Wang,
Z. C. Jiang,
Z. T. Liu,
Soohyun Cho,
F. Y. Zhang,
Q. K. Meng,
W. Xia,
Y. C. Yang,
Z. Huang,
J. S. Liu,
Z. H. Liu,
Z. W. Zhu,
S. Qiao,
Y. F. Guo,
Haijun Zhang,
Dawei Shen
Abstract:
Unconventional fermions in the intensively studied topological semimetals are the source of rich exotic topological properties. Here, using symmetry analysis and first-principles calculations, we propose the coexistence of multiple topological nodal structures in LaSb2, including topological nodal surfaces, nodal lines and in particular eightfold degenerate nodal points, which have been scarcely observed in a single material. Further, utilizing high-resolution angle-resolved photoemission spectroscopy in combination with Shubnikov-de Haas quantum oscillation measurements, we confirm the existence of nodal surfaces and eightfold degenerate nodal points in LaSb2, and extract the π Berry phase proving the non-trivial electronic band structure topology therein. The intriguing multiple topological nodal structures might play a crucial role in giving rise to the large linear magnetoresistance. Our work renews the insights into the exotic topological phenomena in LaSb2 and its analogues.
Submitted 22 August, 2022;
originally announced August 2022.
-
Imperfect chirality at exceptional points in optical whispering-gallery microcavities
Authors:
Junda Zhu,
Changqing Wang,
Can Tao,
Zhoutian Fu,
Haitao Liu,
Fang Bo,
Lan Yang,
Guoquan Zhang,
Jingjun Xu
Abstract:
Non-Hermitian systems have attracted considerable attention for their broad impacts on various physical platforms and peculiar applications. In non-Hermitian systems, both eigenvalues and eigenstates simultaneously coalesce at exceptional points (EPs). As one of the remarkable features of EPs, the field chirality is commonly considered perfect, which is utilized as an intriguing feature to control wave propagation and regarded as a criterion of EP. However, in this work, we discover an imperfect chirality of eigenmodes at the EPs in an optical whispering gallery mode (WGM) microcavity perturbed by two strong nanoscatterers. This counterintuitive phenomenon originates from a strong frequency-dependence of the scattering between the counterpropagating waves at an "effective scatterer", which could be explained by a first-principle-based model considering a dynamic multiple-scattering process of the azimuthally propagating modes. We find that the generally imperfect chirality at the EP tends to be globally perfect with the decrease of the scattering effect induced by the nanoscatterers. Furthermore, the chirality also becomes locally perfect with the decrease of the relative azimuthal angle between the two strong nanoscatterers. This work provides a new understanding of the general properties of chirality at EPs. It will benefit the potential applications enabled by the chirality features of non-Hermitian systems at EPs.
Submitted 15 August, 2022;
originally announced August 2022.
-
Void BAO measurements on quasars from eBOSS
Authors:
A. Tamone,
C. Zhao,
D. Forero-Sánchez,
A. Variu,
C. -H. Chuang,
F. -S. Kitaura,
J. -P. Kneib,
C. Tao
Abstract:
We present the clustering of voids based on the quasar (QSO) sample of the extended Baryon Oscillation Spectroscopic Survey Data Release 16 in configuration space. We define voids as overlapping empty circumspheres computed by Delaunay tetrahedra spanned by quartets of quasars, allowing for an estimate of the depth of underdense regions. To maximise the BAO signal-to-noise ratio, we consider only voids with radii larger than 36$h^{-1}$Mpc. Our analysis shows a negative BAO peak in the cross-correlation of QSOs and voids. The joint BAO measurement of the QSO auto-correlation and the corresponding cross-correlation with voids shows an improvement in 70$\%$ of the QSO mocks with an average improvement of $\sim5\%$. However, on the SDSS data, we find no improvement compatible with cosmic variance. For both mocks and data, adding voids does not introduce any bias. We find under the flat $Λ$CDM assumption, a distance joint measurement on data at the effective redshift $z_{\rm eff}=1.48$ of $D_V(z_{\rm eff})=26.297\pm0.547$. A forecast of a DESI-like survey with 1000 boxes with a similar effective volume recovers the same results as for light-cone mocks with an average of 4.8$\%$ improvement in 68$\%$ of the boxes.
Submitted 12 August, 2022;
originally announced August 2022.
-
A Probabilistic Autoencoder for Type Ia Supernovae Spectral Time Series
Authors:
George Stein,
Uros Seljak,
Vanessa Bohm,
G. Aldering,
P. Antilogus,
C. Aragon,
S. Bailey,
C. Baltay,
S. Bongard,
K. Boone,
C. Buton,
Y. Copin,
S. Dixon,
D. Fouchez,
E. Gangler,
R. Gupta,
B. Hayden,
W. Hillebrandt,
M. Karmen,
A. G. Kim,
M. Kowalski,
D. Kusters,
P. F. Leget,
F. Mondon,
J. Nordin
, et al. (15 additional authors not shown)
Abstract:
We construct a physically-parameterized probabilistic autoencoder (PAE) to learn the intrinsic diversity of type Ia supernovae (SNe Ia) from a sparse set of spectral time series. The PAE is a two-stage generative model, composed of an Auto-Encoder (AE) which is interpreted probabilistically after training using a Normalizing Flow (NF). We demonstrate that the PAE learns a low-dimensional latent space that captures the nonlinear range of features that exists within the population, and can accurately model the spectral evolution of SNe Ia across the full range of wavelength and observation times directly from the data. By introducing a correlation penalty term and multi-stage training setup alongside our physically-parameterized network we show that intrinsic and extrinsic modes of variability can be separated during training, removing the need for the additional models to perform magnitude standardization. We then use our PAE in a number of downstream tasks on SNe Ia for increasingly precise cosmological analyses, including automatic detection of SN outliers, the generation of samples consistent with the data distribution, and solving the inverse problem in the presence of noisy and incomplete data to constrain cosmological distance measurements. We find that the optimal number of intrinsic model parameters appears to be three, in line with previous studies, and show that we can standardize our test sample of SNe Ia with an RMS of $0.091 \pm 0.010$ mag, which corresponds to $0.074 \pm 0.010$ mag if peculiar velocity contributions are removed. Trained models and codes are released at https://github.com/georgestein/suPAErnova
Submitted 15 July, 2022;
originally announced July 2022.
-
A multi-cubic-kilometre neutrino telescope in the western Pacific Ocean
Authors:
Z. P. Ye,
F. Hu,
W. Tian,
Q. C. Chang,
Y. L. Chang,
Z. S. Cheng,
J. Gao,
T. Ge,
G. H. Gong,
J. Guo,
X. X. Guo,
X. G. He,
J. T. Huang,
K. Jiang,
P. K. Jiang,
Y. P. Jing,
H. L. Li,
J. L. Li,
L. Li,
W. L. Li,
Z. Li,
N. Y. Liao,
Q. Lin,
F. Liu,
J. L. Liu
, et al. (33 additional authors not shown)
Abstract:
Next-generation neutrino telescopes with significantly improved sensitivity are required to pinpoint the sources of the diffuse astrophysical neutrino flux detected by IceCube and resolve the century-old puzzle of cosmic ray origins. A detector near the equator will provide a unique viewpoint of the neutrino sky, complementing IceCube and other neutrino telescopes in the Northern Hemisphere. Here we present results from an expedition to the north-eastern region of the South China Sea, in the western Pacific Ocean. A favorable neutrino telescope site was found on an abyssal plain at a depth of $\sim$ 3.5km. At depths below 3km, the sea current speed and the water absorption and scattering lengths for Cherenkov light were measured to be $v_{\mathrm{c}}<$10cm/s, $λ_{\mathrm{abs} }\simeq$ 27m and $λ_{\mathrm{sca} }\simeq$ 63m, respectively. Accounting for these measurements, we present the design and expected performance of a next-generation neutrino telescope, TRopIcal DEep-sea Neutrino Telescope (TRIDENT). With its advanced photon-detection technology and large dimensions, TRIDENT expects to observe the IceCube steady source candidate NGC 1068 with 5$σ$ significance within 1 year of operation. This level of sensitivity will open a new arena for diagnosing the origin of cosmic rays and probing fundamental physics over astronomical baselines.
Submitted 13 May, 2024; v1 submitted 10 July, 2022;
originally announced July 2022.
-
Path Integral Methods with Stochastic Control Barrier Functions
Authors:
Chuyuan Tao,
Hyung-Jin Yoon,
Hunmin Kim,
Naira Hovakimyan,
Petros Voulgaris
Abstract:
Safe control designs for robotic systems remain challenging because of the difficulties of explicitly solving optimal control with nonlinear dynamics perturbed by stochastic noise. However, recent technological advances in computing devices enable online optimization or sampling-based methods to solve control problems. For example, Control Barrier Functions (CBFs), a Lyapunov-like control algorithm, have been proposed to numerically solve convex optimizations that determine control input to stay in the safe set. Model Predictive Path Integral (MPPI) uses forward sampling of stochastic differential equations to solve optimal control problems online. Both control algorithms are widely used for nonlinear systems because they avoid calculating the derivatives of the nonlinear dynamic function. In this paper, we utilize Stochastic Control Barrier Functions (SCBFs) constraints to limit sample regions in the sample-based algorithm, ensuring safety in a probabilistic sense and improving sample efficiency with a stochastic differential equation. We provide a sampling complexity analysis for the required sample size of our algorithm and show that our algorithm needs fewer samples than the original MPPI algorithm does. Finally, we apply our algorithm to a path planning problem in a cluttered environment and compare the performance of the algorithms.
Submitted 23 June, 2022;
originally announced June 2022.
-
KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Low-Resource NLP
Authors:
Yufei Wang,
Jiayi Zheng,
Can Xu,
Xiubo Geng,
Tao Shen,
Chongyang Tao,
Daxin Jiang
Abstract:
This paper focuses on data augmentation for low-resource NLP tasks where the training set is limited. The existing solutions either leverage task-independent heuristic rules (e.g., Synonym Replacement) or fine-tune general-purpose pre-trained language models (e.g., GPT2) using the limited training instances to produce new synthetic data. Consequently, they have trivial task-specific knowledge and are limited to yielding low-quality synthetic data. To combat this issue, we propose Knowledge Mixture Data Augmentation Model (KnowDA), which is a Seq2Seq language model pre-trained on a mixture of diverse NLP tasks under a novel framework of Knowledge Mixture Training (KoMT). The goal of KoMT is to condense diverse NLP task-specific knowledge into the single KnowDA model (i.e., all-in-one) such that KnowDA could utilize this knowledge to quickly grasp the inherent synthesis law of the target task through limited training instances. Specifically, KoMT reformulates input examples from various heterogeneous NLP tasks into a unified text-to-text format, and employs denoising training objectives in different granularity to learn to reconstruct partial or complete samples. To the best of our knowledge, this is the first attempt to apply 100+ NLP multi-task training for data augmentation. Extensive experiments show that i) the synthetic data produced by KnowDA successfully improves the performance of strong pre-trained language models (i.e., Bert, ALBert and Deberta) by a large margin on the low-resource NLP benchmarks FewGLUE, CoNLL'03 and WikiAnn; ii) KnowDA successfully transfers the task knowledge to NLP tasks whose types are seen and unseen in KoMT.
Submitted 27 January, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Towards Robust Ranker for Text Retrieval
Authors:
Yucheng Zhou,
Tao Shen,
Xiubo Geng,
Chongyang Tao,
Can Xu,
Guodong Long,
Binxing Jiao,
Daxin Jiang
Abstract:
A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind -- learning from moderate negatives and/or serving as an auxiliary module for a retriever. In this work, we first identify two major barriers to a robust ranker, i.e., inherent label noises caused by a well-trained retriever and non-ideal negatives sampled for a highly capable ranker. We therefore propose using multiple retrievers as negative generators to improve the ranker's robustness, where i) involving extensive out-of-distribution label noises renders the ranker robust against each noise distribution, and ii) diverse hard negatives from a joint distribution are relatively close to the ranker's negative distribution, leading to more challenging and thus more effective training. To evaluate our robust ranker (dubbed R$^2$anker), we conduct experiments in various settings on the popular passage retrieval benchmark, including BM25-reranking, full-ranking, retriever distillation, etc. The empirical results verify the new state-of-the-art effectiveness of our model.
Submitted 16 June, 2022;
originally announced June 2022.
-
Poisson2Sparse: Self-Supervised Poisson Denoising From a Single Image
Authors:
Calvin-Khang Ta,
Abhishek Aich,
Akash Gupta,
Amit K. Roy-Chowdhury
Abstract:
Image enhancement approaches often assume that the noise is signal independent, and approximate the degradation model as zero-mean additive Gaussian. However, this assumption does not hold for biomedical imaging systems where sensor-based sources of noise are proportional to signal strengths, and the noise is better represented as a Poisson process. In this work, we explore a sparsity and dictionary learning-based approach and present a novel self-supervised learning method for single-image denoising where the noise is approximated as a Poisson process, requiring no clean ground-truth data. Specifically, we approximate traditional iterative optimization algorithms for image denoising with a recurrent neural network that enforces sparsity with respect to the weights of the network. Since the sparse representations are based on the underlying image, it is able to suppress the spurious components (noise) in the image patches, thereby introducing implicit regularization for denoising tasks through the network structure. Experiments on two bio-imaging datasets demonstrate that our method outperforms the state-of-the-art approaches in terms of PSNR and SSIM. Our qualitative results demonstrate that, in addition to higher performance on standard quantitative metrics, we are able to recover much more subtle details than other compared approaches. Our code is made publicly available at https://github.com/tacalvin/Poisson2Sparse
Submitted 27 June, 2022; v1 submitted 3 June, 2022;
originally announced June 2022.
-
Siamese Image Modeling for Self-Supervised Vision Representation Learning
Authors:
Chenxin Tao,
Xizhou Zhu,
Weijie Su,
Gao Huang,
Bin Li,
Jie Zhou,
Yu Qiao,
Xiaogang Wang,
Jifeng Dai
Abstract:
Self-supervised learning (SSL) has delivered superior performance on a variety of downstream vision tasks. Two mainstream SSL frameworks have been proposed, i.e., Instance Discrimination (ID) and Masked Image Modeling (MIM). ID pulls together representations from different views of the same image, while avoiding feature collapse. It lacks spatial sensitivity, which requires modeling the local structure within each image. On the other hand, MIM reconstructs the original content given a masked image. It instead does not have good semantic alignment, which requires projecting semantically similar views into nearby representations. To address this dilemma, we observe that (1) semantic alignment can be achieved by matching different image views with strong augmentations; (2) spatial sensitivity can benefit from predicting dense representations with masked images. Driven by these analyses, we propose Siamese Image Modeling (SiameseIM), which predicts the dense representations of an augmented view, based on another masked view from the same image but with different augmentations. SiameseIM uses a Siamese network with two branches. The online branch encodes the first view, and predicts the second view's representation according to the relative positions between these two views. The target branch produces the target by encoding the second view. SiameseIM can surpass both ID and MIM on a wide range of downstream tasks, including ImageNet finetuning and linear probing, COCO and LVIS detection, and ADE20k semantic segmentation. The improvement is more significant in few-shot, long-tail and robustness-concerned scenarios. Code shall be released at https://github.com/fundamentalvision/Siamese-Image-Modeling.
Submitted 16 November, 2022; v1 submitted 2 June, 2022;
originally announced June 2022.
-
UnifieR: A Unified Retriever for Large-Scale Retrieval
Authors:
Tao Shen,
Xiubo Geng,
Chongyang Tao,
Can Xu,
Guodong Long,
Kai Zhang,
Daxin Jiang
Abstract:
Large-scale retrieval aims to recall relevant documents from a huge collection given a query. It relies on representation learning to embed documents and queries into a common semantic encoding space. According to the encoding space, recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms. These two paradigms unveil the PLMs' representation capability in different granularities, i.e., global sequence-level compression and local word-level contexts, respectively. Inspired by their complementary global-local contextualization and distinct representing views, we propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability. Experiments on passage retrieval benchmarks verify its effectiveness in both paradigms. A uni-retrieval scheme is further presented with even better retrieval quality. We lastly evaluate the model on the BEIR benchmark to verify its transferability.
Submitted 4 June, 2023; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Uniform Recalibration of Common Spectrophotometry Standard Stars onto the CALSPEC System using the SuperNova Integral Field Spectrograph
Authors:
David Rubin,
G. Aldering,
P. Antilogus,
C. Aragon,
S. Bailey,
C. Baltay,
S. Bongard,
K. Boone,
C. Buton,
Y. Copin,
S. Dixon,
D. Fouchez,
E. Gangler,
R. Gupta,
B. Hayden,
W. Hillebrandt,
A. G. Kim,
M. Kowalski,
D. Kuesters,
P. -F. Leget,
F. Mondon,
J. Nordin,
R. Pain,
E. Pecontal,
R. Pereira
, et al. (13 additional authors not shown)
Abstract:
We calibrate spectrophotometric optical spectra of 32 stars commonly used as standard stars, referenced to 14 stars already on the HST-based CALSPEC flux system. Observations of CALSPEC and non-CALSPEC stars were obtained with the SuperNova Integral Field Spectrograph over the wavelength range 3300 A to 9400 A as calibration for the Nearby Supernova Factory cosmology experiment. In total, this analysis used 4289 standard-star spectra taken on photometric nights. As a modern cosmology analysis, all pre-submission methodological decisions were made with the flux scale and external comparison results blinded. The large number of spectra per star allows us to treat the wavelength-by-wavelength calibration for all nights simultaneously with a Bayesian hierarchical model, thereby enabling a consistent treatment of the Type Ia supernova cosmology analysis and the calibration on which it critically relies. We determine the typical per-observation repeatability (median 14 mmag for exposures >~ 5 s), the Maunakea atmospheric transmission distribution (median dispersion of 7 mmag with uncertainty 1 mmag), and the scatter internal to our CALSPEC reference stars (median of 8 mmag). We also check our standards against literature filter photometry, finding generally good agreement over the full 12-magnitude range. Overall, the mean of our system is calibrated to the mean of CALSPEC at the level of ~ 3 mmag. With our large number of observations, careful crosschecks, and 14 reference stars, our results are the best calibration yet achieved with an integral-field spectrograph, and among the best calibrated surveys.
Submitted 21 June, 2022; v1 submitted 2 May, 2022;
originally announced May 2022.
-
Euclid: Searching for pair-instability supernovae with the Deep Survey
Authors:
T. J. Moriya,
C. Inserra,
M. Tanaka,
E. Cappellaro,
M. Della Valle,
I. Hook,
R. Kotak,
G. Longo,
F. Mannucci,
S. Mattila,
C. Tao,
B. Altieri,
A. Amara,
N. Auricchio,
D. Bonino,
E. Branchini,
M. Brescia,
J. Brinchmann,
S. Camera,
V. Capobianco,
C. Carbone,
J. Carretero,
M. Castellano,
S. Cavuoti,
A. Cimatti
, et al. (84 additional authors not shown)
Abstract:
Pair-instability supernovae are theorized supernovae that have not yet been observationally confirmed. They are predicted to exist in low-metallicity environments. Because overall metallicity becomes lower at higher redshifts, deep near-infrared transient surveys probing high-redshift supernovae are suitable to discover pair-instability supernovae. The Euclid satellite, which is planned to be launched in 2023, has a near-infrared wide-field instrument that is suitable for a high-redshift supernova survey. The Euclid Deep Survey is planned to make regular observations of three Euclid Deep Fields (40 deg$^2$ in total) spanning Euclid's 6-year primary mission period. While the observations of the Euclid Deep Fields are not frequent, we show that the predicted long duration of pair-instability supernovae would allow us to search for high-redshift pair-instability supernovae with the Euclid Deep Survey. Based on the current observational plan of the Euclid mission, we conduct survey simulations in order to estimate the expected numbers of pair-instability supernova discoveries. We find that up to several hundred pair-instability supernovae at z < ~ 3.5 can be discovered within the Euclid Deep Survey. We also show that pair-instability supernova candidates can be efficiently identified by their duration and color, which can be determined with the current Euclid Deep Survey plan. We conclude that the Euclid mission can lead to the first confirmation of pair-instability supernovae if their event rates are as high as those predicted by recent theoretical studies. We also update the expected numbers of superluminous supernova discoveries in the Euclid Deep Survey based on the latest observational plan.
Submitted 26 August, 2022; v1 submitted 19 April, 2022;
originally announced April 2022.
-
Learning to Express in Knowledge-Grounded Conversation
Authors:
Xueliang Zhao,
Tingchen Fu,
Chongyang Tao,
Wei Wu,
Dongyan Zhao,
Rui Yan
Abstract:
Grounding dialogue generation in extra knowledge has shown great potential towards building a system capable of replying with knowledgeable and engaging responses. Existing studies focus on how to synthesize a response with proper knowledge, yet neglect that the same knowledge could be expressed differently by speakers even under the same context. In this work, we mainly consider two aspects of knowledge expression, namely the structure of the response and the style of the content in each part. We therefore introduce two sequential latent variables to represent the structure and the content style respectively. We propose a segmentation-based generation model and optimize the model by a variational approach to discover the underlying pattern of knowledge expression in a response. Evaluation results on two benchmarks indicate that our model can learn the structure style defined by a few examples and generate responses in the desired content style.
Submitted 12 April, 2022;
originally announced April 2022.
-
TOV: The Original Vision Model for Optical Remote Sensing Image Understanding via Self-supervised Learning
Authors:
Chao Tao,
Ji Qia,
Guo Zhang,
Qing Zhu,
Weipeng Lu,
Haifeng Li
Abstract:
Are we on the right track for remote sensing image understanding (RSIU) by training models in a supervised, data-dependent and task-dependent way, instead of learning as human vision does, in a label-free and task-independent way? We argue that a more desirable RSIU model should be trained with intrinsic structure from data rather than extrinsic human labels to realize generalizability across a wide range of RSIU tasks. According to this hypothesis, we propose The Original Vision model (TOV) in the remote sensing field. Trained on massive unlabeled optical data along a human-like self-supervised learning (SSL) path that goes from general knowledge to specialized knowledge, the TOV model can be easily adapted to various RSIU tasks, including scene classification, object detection, and semantic segmentation, and outperforms the dominant ImageNet supervised pretrained method as well as two recently proposed SSL pretrained methods on the majority of 12 publicly available benchmarks. Moreover, we analyze the influences of two key factors on the performance of building the TOV model for RSIU, including the influence of using different data sampling methods and the selection of learning paths during self-supervised optimization. We believe that a general model trained in a label-free and task-independent way may be the next paradigm for RSIU and hope the insights distilled from this study can help to foster the development of an original vision model for RSIU.
Submitted 10 April, 2022;
originally announced April 2022.
-
There Are a Thousand Hamlets in a Thousand People's Eyes: Enhancing Knowledge-grounded Dialogue with Personal Memory
Authors:
Tingchen Fu,
Xueliang Zhao,
Chongyang Tao,
Ji-Rong Wen,
Rui Yan
Abstract:
Knowledge-grounded conversation (KGC) shows great potential in building an engaging and knowledgeable chatbot, and knowledge selection is a key ingredient in it. However, previous methods for knowledge selection only concentrate on the relevance between knowledge and dialogue context, ignoring the fact that age, hobby, education and life experience of an interlocutor have a major effect on his or her personal preference over external knowledge. Without taking the personalization issue into account, it is difficult to select the proper knowledge and generate persona-consistent responses. In this work, we introduce personal memory into knowledge selection in KGC to address the personalization issue. We propose a variational method to model the underlying relationship between one's personal memory and his or her selection of knowledge, and devise a learning scheme in which the forward mapping from personal memory to knowledge and its inverse mapping is included in a closed loop so that they could teach each other. Experiment results show that our method outperforms existing KGC methods significantly on both automatic evaluation and human evaluation.
Submitted 6 April, 2022;
originally announced April 2022.
-
Model BOSS & eBOSS Luminous Red Galaxies at 0.2 < z < 1.0 using SubHalo Abundance Matching with 3 parameters
Authors:
Jiaxi Yu,
Cheng Zhao,
Chia-Hsun Chuang,
Julian Bautista,
Ginevra Favole,
Jean-Paul Kneib,
Faizan Mohammad,
Ashley Ross,
Anand Raichoor,
Charling Tao,
Kyle Dawson,
Graziano Rossi
Abstract:
SubHalo Abundance Matching (SHAM) is an empirical method for constructing galaxy catalogues based on high-resolution $N$-body simulations. We apply SHAM on the UNIT simulation to simulate SDSS BOSS/eBOSS Luminous Red Galaxies (LRGs) within a wide redshift range of $0.2 < z < 1.0$. Besides the typical SHAM scatter parameter $σ$, we include $v_{\rm smear}$ and $V_{\rm ceil}$ to take into account the redshift uncertainty and the galaxy incompleteness respectively. These two additional parameters are critical for reproducing the observed 2PCF multipoles on 5--25$\,h^{-1}\,{\rm Mpc}$. The redshift uncertainties obtained from the best-fitting $v_{\rm smear}$ agree with those measured from repeat observations for all SDSS LRGs except for the LOWZ sample. We explore several potential systematics but none of them can explain the discrepancy found in LOWZ. Our explanation is that the LOWZ galaxies might contain another type of galaxies which needs to be treated differently. The evolution of the measured $σ$ and $V_{\rm ceil}$ also reveals that the incompleteness of eBOSS galaxies decreases with the redshift. This is the consequence of the magnitude lower limit applied in eBOSS LRG target selection. Our SHAM also set upper limits for the intrinsic scatter of the galaxy--halo relation given a complete galaxy sample: $σ_{\rm int}<0.31$ for LOWZ at $0.2<z<0.33$, $σ_{\rm int}<0.36$ for LOWZ at $0.33<z<0.43$, and $σ_{\rm int}<0.46$ for CMASS at $0.43<z<0.51$. The projected 2PCFs of our SHAM galaxies also agree with the observational ones on the 2PCF fitting range.
Submitted 28 July, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Compression of Generative Pre-trained Language Models via Quantization
Authors:
Chaofan Tao,
Lu Hou,
Wei Zhang,
Lifeng Shang,
Xin Jiang,
Qun Liu,
Ping Luo,
Ngai Wong
Abstract:
The increasing size of generative Pre-trained Language Models (PLMs) has greatly increased the demand for model compression. Despite various methods to compress BERT or its variants, there are few attempts to compress generative PLMs, and the underlying difficulty remains unclear. In this paper, we compress generative PLMs by quantization. We find that previous quantization methods fail on generative tasks due to the \textit{homogeneous word embeddings} caused by reduced capacity, and \textit{varied distribution of weights}. Correspondingly, we propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules. Empirical results on various tasks show that our proposed method outperforms the state-of-the-art compression methods on generative PLMs by a clear margin. With comparable performance with the full-precision models, we achieve 14.4x and 13.4x compression rates on GPT-2 and BART, respectively.
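To make the notion of weight quantization concrete, here is a minimal sketch of generic symmetric uniform quantization. It is not the paper's token-level contrastive distillation or module-wise dynamic scaling; the function names `quantize` and `dequantize` and the per-list max-abs scale are assumptions chosen for illustration only.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization (illustrative; requires bits >= 2).

    Maps each float weight to an integer code in [-qmax, qmax], using a single
    scale derived from the largest absolute weight in the list.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code, then clip to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [-1.0, 0.25, 0.5, 1.0]
codes8, scale8 = quantize(weights, bits=8)
codes2, scale2 = quantize(weights, bits=2)

# Worst-case reconstruction error at each bit width.
err8 = max(abs(w - v) for w, v in zip(weights, dequantize(codes8, scale8)))
err2 = max(abs(w - v) for w, v in zip(weights, dequantize(codes2, scale2)))
```

With 2 bits only the codes -1, 0 and 1 are available, so distinct weights such as 0.25 and 0.5 collapse onto the same code. This is a toy view of how reduced capacity can make representations less distinguishable.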
Submitted 16 July, 2022; v1 submitted 20 March, 2022;
originally announced March 2022.
-
Sampling Complexity of Path Integral Methods for Trajectory Optimization
Authors:
Hyung-Jin Yoon,
Chuyuan Tao,
Hunmin Kim,
Naira Hovakimyan,
Petros Voulgaris
Abstract:
The use of random sampling in decision-making and control has become popular with the ease of access to graphic processing units that can generate and calculate multiple random trajectories for real-time robotic applications. In contrast to sequential optimization, the sampling-based method can take advantage of parallel computing to maintain constant control loop frequencies. Inspired by its wide applicability in robotic applications, we calculate a sampling complexity result applicable to general nonlinear systems considered in the path integral method, which is a sampling-based method. The result determines the required number of samples to satisfy the given error bounds of the estimated control signal from the optimal value with the predefined risk probability. The sampling complexity result shows that the variance of the estimated control value is upper-bounded in terms of the expectation of the cost. Then we apply the result to a linear time-varying dynamical system with quadratic cost and an indicator function cost to avoid constraint sets.
Submitted 18 March, 2022;
originally announced March 2022.
-
What Do Adversarially trained Neural Networks Focus: A Fourier Domain-based Study
Authors:
Binxiao Huang,
Chaofan Tao,
Rui Lin,
Ngai Wong
Abstract:
Although many fields have witnessed the superior performance brought about by deep learning, the robustness of neural networks remains an open issue. Specifically, a small adversarial perturbation on the input may cause the model to produce a completely different output. Such poor robustness implies many potential hazards, especially in security-critical applications, e.g., autonomous driving and mobile robotics. This work studies what information the adversarially trained model focuses on. Empirically, we notice that the differences between the clean and adversarial data are mainly distributed in the low-frequency region. We then find that an adversarially-trained model is more robust than its naturally-trained counterpart due to the reason that the former pays more attention to learning the dominant information in low-frequency components. In addition, we consider two common ways to improve model robustness, namely, by data augmentation and by using stronger network architectures, and understand these techniques from a frequency-domain perspective. We are hopeful this work can shed light on the design of more robust neural networks.
Submitted 16 March, 2022;
originally announced March 2022.
-
TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge
Authors:
Chao-Hong Tan,
Jia-Chen Gu,
Chongyang Tao,
Zhen-Hua Ling,
Can Xu,
Huang Hu,
Xiubo Geng,
Daxin Jiang
Abstract:
Generating natural and informative texts has been a long-standing problem in NLP. Much effort has been dedicated into incorporating pre-trained language models (PLMs) with various open-world knowledge, such as knowledge graphs or wiki pages. However, their ability to access and manipulate the task-specific knowledge is still limited on downstream tasks, as this type of knowledge is usually not well covered in PLMs and is hard to acquire. To address the problem, we propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework. Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively on the basis of PLMs. With the help of these two types of knowledge, our model can learn what and how to generate. Experiments on two text generation tasks of dialogue generation and question generation, and on two datasets show that our method achieves better performance than various baseline models.
Submitted 16 March, 2022;
originally announced March 2022.
-
HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations
Authors:
Jia-Chen Gu,
Chao-Hong Tan,
Chongyang Tao,
Zhen-Hua Ling,
Huang Hu,
Xiubo Geng,
Daxin Jiang
Abstract:
Recently, various response generation models for two-party conversations have achieved impressive improvements, but less effort has been paid to multi-party conversations (MPCs) which are more practical and complicated. Compared with a two-party conversation where a dialogue context is a sequence of utterances, building a response generation model for MPCs is more challenging, since there exist complicated context structures and the generated responses heavily rely on both interlocutors (i.e., speaker and addressee) and history utterances. To address these challenges, we present HeterMPC, a heterogeneous graph-based neural network for response generation in MPCs which models the semantics of utterances and interlocutors simultaneously with two types of nodes in a graph. Besides, we also design six types of meta relations with node-edge-type-dependent parameters to characterize the heterogeneous interactions within the graph. Through multi-hop updating, HeterMPC can adequately utilize the structural knowledge of conversations for response generation. Experimental results on the Ubuntu Internet Relay Chat (IRC) channel benchmark show that HeterMPC outperforms various baseline models for response generation in MPCs.
Submitted 16 March, 2022;
originally announced March 2022.
-
MVD: Memory-Related Vulnerability Detection Based on Flow-Sensitive Graph Neural Networks
Authors:
Sicong Cao,
Xiaobing Sun,
Lili Bo,
Rongxin Wu,
Bin Li,
Chuanqi Tao
Abstract:
Memory-related vulnerabilities constitute severe threats to the security of modern software. Despite the success of deep learning-based approaches to generic vulnerability detection, they are still limited by the underutilization of flow information when applied for detecting memory-related vulnerabilities, leading to high false positives.
In this paper, we propose MVD, a statement-level Memory-related Vulnerability Detection approach based on flow-sensitive graph neural networks (FS-GNN). FS-GNN is employed to jointly embed both unstructured information (i.e., source code) and structured information (i.e., control- and data-flow) to capture implicit memory-related vulnerability patterns. We evaluate MVD on a dataset which contains 4,353 real-world memory-related vulnerabilities, and compare our approach with three state-of-the-art deep learning-based approaches as well as five popular static analysis-based memory detectors. The experiment results show that MVD achieves better detection accuracy, outperforming both state-of-the-art DL-based and static analysis-based approaches. Furthermore, MVD makes a great trade-off between accuracy and efficiency.
Submitted 5 March, 2022;
originally announced March 2022.
-
PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks
Authors:
Yufei Wang,
Can Xu,
Qingfeng Sun,
Huang Hu,
Chongyang Tao,
Xiubo Geng,
Daxin Jiang
Abstract:
This paper focuses on Data Augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose the Prompt-based Data Augmentation model (PromDA), which trains only a small-scale Soft Prompt (i.e., a set of trainable vectors) in frozen Pre-trained Language Models (PLMs). This avoids human effort in collecting unlabeled in-domain data and maintains the quality of generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out the low-quality data using NLU models. Experiments on four benchmarks show that synthetic data produced by PromDA successfully boost the performance of NLU models, which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model using unlabeled in-domain data. The synthetic data from PromDA are also complementary with unlabeled in-domain data. The NLU models can be further improved when they are combined for training.
Submitted 17 March, 2022; v1 submitted 25 February, 2022;
originally announced February 2022.
-
Mining On Alzheimer's Diseases Related Knowledge Graph to Identity Potential AD-related Semantic Triples for Drug Repurposing
Authors:
Yi Nian,
Xinyue Hu,
Rui Zhang,
Jingna Feng,
Jingcheng Du,
Fang Li,
Yong Chen,
Cui Tao
Abstract:
To date, there are no effective treatments for most neurodegenerative diseases. Knowledge graphs can provide comprehensive and semantic representation for heterogeneous data, and have been successfully leveraged in many biomedical applications including drug repurposing. Our objective is to construct a knowledge graph from literature to study relations between Alzheimer's disease (AD) and chemicals, drugs and dietary supplements in order to identify opportunities to prevent or delay neurodegenerative progression. We collected biomedical annotations and extracted their relations using SemRep via SemMedDB. We used both a BERT-based classifier and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The 1,672,110 filtered triples were used to train with knowledge graph completion algorithms (i.e., TransE, DistMult, and ComplEx) to predict candidates that might be helpful for AD treatment or prevention. Among three knowledge graph completion models, TransE outperformed the other two (MR = 13.45, Hits@1 = 0.306). We leveraged the time-slicing technique to further evaluate the prediction results. We found supporting evidence for most highly ranked candidates predicted by our model which indicates that our approach can inform reliable new knowledge. This paper shows that our graph mining model can predict reliable new relationships between AD and other entities (i.e., dietary supplements, chemicals, and drugs). The knowledge graph constructed can facilitate data-driven knowledge discoveries and the generation of novel hypotheses.
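The reported metrics (MR and Hits@1) can be illustrated with a small sketch. It assumes the standard TransE scoring idea, in which a triple (h, r, t) is plausible when the embedding h + r lies close to t, and uses made-up two-dimensional embeddings. The helper names and toy vectors are hypothetical and are not taken from the paper's code or data.

```python
import math


def transe_distance(h, r, t):
    """L2 distance between (h + r) and t; smaller means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_of_true_tail(h, r, candidates, true_index):
    """1-based rank of the true tail among all candidate tails."""
    dists = [transe_distance(h, r, t) for t in candidates]
    true_d = dists[true_index]
    return 1 + sum(1 for d in dists if d < true_d)


def mean_rank(ranks):
    """MR: average rank of the true entity (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Hits@k: fraction of test triples whose true entity ranks in the top k."""
    return sum(1 for rk in ranks if rk <= k) / len(ranks)


# Toy embeddings (hypothetical): one head, one relation, three candidate tails.
head, relation = [0.0, 0.0], [1.0, 0.0]
tails = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]

# Two test triples that share the head and relation but differ in the true tail.
ranks = [
    rank_of_true_tail(head, relation, tails, true_index=0),
    rank_of_true_tail(head, relation, tails, true_index=2),
]
mr = mean_rank(ranks)
hits1 = hits_at_k(ranks, k=1)
```

Here the first true tail is ranked 1st and the second 3rd, so MR is 2.0 and Hits@1 is 0.5. Evaluations in the literature often also filter out other known true triples before ranking, which this sketch omits.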
Submitted 28 November, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Observation of the $π^2σ^2$-bond linear-chain molecular structure in $^{16}$C
Authors:
J. X. Han,
Y. Liu,
Y. L. Ye,
J. L. Lou,
X. F. Yang,
T. Baba,
M. Kimura,
B. Yang,
Z. H. Li,
Q. T. Li,
J. Y. Xu,
Y. C. Ge,
H. Hua,
Z. H. Yang,
J. S. Wang,
Y. Y. Yang,
P. Ma,
Z. Bai,
Q. Hu,
W. Liu,
K. Ma,
L. C. Tao,
Y. Jiang,
L. Y. Hu,
H. L. Zang
, et al. (15 additional authors not shown)
Abstract:
Measurements of the $^2$H($^{16}$C,$^{16}$C$^{*}$$\rightarrow^4$He+$^{12}$Be or $^6$He+$^{10}$Be)$^2$H inelastic excitation and cluster-decay reactions have been carried out at a beam energy of about 23.5 MeV/u. A specially designed detection system, including one multi-layer silicon-strip telescope at around zero degrees, has allowed the high-efficiency three-fold coincident detection and therefore the event-by-event determination of the energy of the unstable nucleus beam. The decay paths from the $^{16}$C resonances to various states of the final $^{10}$Be or $^{12}$Be nucleus are recognized thanks to the well-resolved $Q$-value spectra. The reconstructed resonances at 16.5(1), 17.3(2), 19.4(1) and 21.6(2) MeV are assigned as the $0^+$, $2^+$, $4^+$ and $6^+$ members, respectively, of the positive-parity $(3/2_π^-)^2(1/2_σ^-)^2$-bond linear-chain molecular band in $^{16}$C, based on the angular correlation analysis for the 16.5 MeV state and the excellent agreement of decay patterns between the measurements and theoretical predictions. Moreover, another intriguing high-lying state was observed at 27.2(1) MeV which decays almost exclusively to the $\sim$6 MeV states of $^{10}$Be, in line with the newly predicted pure $σ$-bond linear-chain configuration.
Submitted 11 February, 2022;
originally announced February 2022.
-
Perfectly packing a square by squares of nearly harmonic sidelength
Authors:
Terence Tao
Abstract:
A well known open problem of Meir and Moser asks if the squares of sidelength $1/n$ for $n \geq 2$ can be packed perfectly into a square of area $\sum_{n=2}^\infty \frac{1}{n^2} = \frac{π^2}{6}-1$. In this paper we show that for any $1/2 < t < 1$, and any $n_0$ that is sufficiently large depending on $t$, the squares of sidelength $n^{-t}$ for $n \geq n_0$ can be packed perfectly into a square of area $\sum_{n=n_0}^\infty \frac{1}{n^{2t}}$. This was previously known (if one packs a rectangle instead of a square) for $1/2 < t \leq 2/3$ (in which case one can take $n_0=1$).
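As a quick numerical sanity check of the areas quoted in the abstract, the sketch below sums the areas $n^{-2t}$ of the squares and compares the $t = 1$, $n_0 = 2$ case with $\frac{π^2}{6} - 1$. The function name `tail_area` and the truncation length are assumptions made for illustration. The code only checks the arithmetic of the target area and says nothing about whether a packing exists.

```python
import math


def tail_area(t, n0, terms=200_000):
    """Partial sum of n**(-2t) for n0 <= n < n0 + terms.

    This approximates the total area of squares of sidelength n**(-t);
    the full series converges only for t > 1/2.
    """
    return sum(n ** (-2.0 * t) for n in range(n0, n0 + terms))


# Meir-Moser case: sidelength 1/n for n >= 2, total area pi^2/6 - 1.
meir_moser_area = math.pi ** 2 / 6 - 1
partial = tail_area(t=1.0, n0=2)

# A "nearly harmonic" exponent in the range 1/2 < t < 1 covered by the theorem.
# Convergence is slower here, so this partial sum is only a lower bound.
partial_t = tail_area(t=0.9, n0=2)
```

The truncated sum for $t = 1$ falls short of $\frac{π^2}{6} - 1 \approx 0.6449$ by roughly the reciprocal of the last index summed. For $t$ closer to $1/2$ the tail decays much more slowly.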
Submitted 10 March, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
PCL: Peer-Contrastive Learning with Diverse Augmentations for Unsupervised Sentence Embeddings
Authors:
Qiyu Wu,
Chongyang Tao,
Tao Shen,
Can Xu,
Xiubo Geng,
Daxin Jiang
Abstract:
Learning sentence embeddings in an unsupervised manner is fundamental in natural language processing. Recent common practice is to couple pre-trained language models with unsupervised contrastive learning, whose success relies on augmenting a sentence with a semantically-close positive instance to construct contrastive pairs. Nonetheless, existing approaches usually depend on a mono-augmenting strategy, which causes learning shortcuts towards the augmenting biases and thus corrupts the quality of sentence embeddings. A straightforward solution is resorting to more diverse positives from a multi-augmenting strategy, while an open question remains about how to unsupervisedly learn from the diverse positives but with uneven augmenting qualities in the text field. As one answer, we propose a novel Peer-Contrastive Learning (PCL) with diverse augmentations. PCL constructs diverse contrastive positives and negatives at the group level for unsupervised sentence embeddings. PCL performs peer-positive contrast as well as peer-network cooperation, which offers an inherent anti-bias ability and an effective way to learn from diverse augmentations. Experiments on STS benchmarks verify the effectiveness of PCL against its competitors in unsupervised sentence embeddings.
Submitted 19 October, 2022; v1 submitted 28 January, 2022;
originally announced January 2022.
-
Rubin-Euclid Derived Data Products: Initial Recommendations
Authors:
Leanne P. Guy,
Jean-Charles Cuillandre,
Etienne Bachelet,
Manda Banerji,
Franz E. Bauer,
Thomas Collett,
Christopher J. Conselice,
Siegfried Eggl,
Annette Ferguson,
Adriano Fontana,
Catherine Heymans,
Isobel M. Hook,
Éric Aubourg,
Hervé Aussel,
James Bosch,
Benoit Carry,
Henk Hoekstra,
Konrad Kuijken,
Francois Lanusse,
Peter Melchior,
Joseph Mohr,
Michele Moresco,
Reiko Nakajima,
Stéphane Paltani,
Michael Troxel
, et al. (95 additional authors not shown)
Abstract:
This report is the result of a joint discussion between the Rubin and Euclid scientific communities. The work presented in this report was focused on designing and recommending an initial set of Derived Data products (DDPs) that could realize the science goals enabled by joint processing. All interested Rubin and Euclid data rights holders were invited to contribute via an online discussion forum and a series of virtual meetings. Strong interest in enhancing science with joint DDPs emerged from across a wide range of astrophysical domains: Solar System, the Galaxy, the Local Volume, from the nearby to the primaeval Universe, and cosmology.
Submitted 13 October, 2022; v1 submitted 11 January, 2022;
originally announced January 2022.
-
The inverse theorem for the $U^3$ Gowers uniformity norm on arbitrary finite abelian groups: Fourier-analytic and ergodic approaches
Authors:
Asgar Jamneshan,
Terence Tao
Abstract:
We state and prove a quantitative inverse theorem for the Gowers uniformity norm $U^3(G)$ on an arbitrary finite abelian group $G$; the cases when $G$ was of odd order or a vector space over ${\mathbf F}_2$ had previously been established by Green and the second author and by Samorodnitsky respectively by Fourier-analytic methods, which we also employ here. We also prove a qualitative version of this inverse theorem using a structure theorem of Host--Kra type for ergodic ${\mathbf Z}^ω$-actions of order $2$ on probability spaces established recently by Shalom and the authors.
Submitted 1 August, 2023; v1 submitted 27 December, 2021;
originally announced December 2021.
-
Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
Authors:
Chenxin Tao,
Honghui Wang,
Xizhou Zhu,
Jiahua Dong,
Shiji Song,
Gao Huang,
Jifeng Dai
Abstract:
Self-supervised learning has shown its great potential to extract powerful visual representations without human annotations. Various works are proposed to deal with self-supervised learning from different perspectives: (1) contrastive learning methods (e.g., MoCo, SimCLR) utilize both positive and negative samples to guide the training direction; (2) asymmetric network methods (e.g., BYOL, SimSiam) get rid of negative samples via the introduction of a predictor network and the stop-gradient operation; (3) feature decorrelation methods (e.g., Barlow Twins, VICReg) instead aim to reduce the redundancy between feature dimensions. These methods appear to be quite different in the designed loss functions from various motivations. The final accuracy numbers also vary, where different networks and tricks are utilized in different works. In this work, we demonstrate that these methods can be unified into the same form. Instead of comparing their loss functions, we derive a unified formula through gradient analysis. Furthermore, we conduct fair and detailed experiments to compare their performances. It turns out that there is little gap between these methods, and the use of momentum encoder is the key factor to boost performance. From this unified framework, we propose UniGrad, a simple but effective gradient form for self-supervised learning. It does not require a memory bank or a predictor network, but can still achieve state-of-the-art performance and easily adopt other training strategies. Extensive experiments on linear evaluation and many downstream tasks also show its effectiveness. Code is released at https://github.com/fundamentalvision/UniGrad.
Submitted 5 July, 2022; v1 submitted 9 December, 2021;
originally announced December 2021.
-
Searching Parameterized AP Loss for Object Detection
Authors:
Chenxin Tao,
Zizhang Li,
Xizhou Zhu,
Gao Huang,
Yong Liu,
Jifeng Dai
Abstract:
Loss functions play an important role in training deep-network-based object detectors. The most widely used evaluation metric for object detection is Average Precision (AP), which captures the performance of localization and classification sub-tasks simultaneously. However, due to the non-differentiable nature of the AP metric, traditional object detectors adopt separate differentiable losses for the two sub-tasks. Such a mis-alignment issue may well lead to performance degradation. To address this, existing works seek to design surrogate losses for the AP metric manually, which requires expertise and may still be sub-optimal. In this paper, we propose Parameterized AP Loss, where parameterized functions are introduced to substitute the non-differentiable components in the AP calculation. Different AP approximations are thus represented by a family of parameterized functions in a unified formula. Automatic parameter search algorithm is then employed to search for the optimal parameters. Extensive experiments on the COCO benchmark with three different object detectors (i.e., RetinaNet, Faster R-CNN, and Deformable DETR) demonstrate that the proposed Parameterized AP Loss consistently outperforms existing handcrafted losses. Code is released at https://github.com/fundamentalvision/Parameterized-AP-Loss.
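The non-differentiability mentioned in the abstract can be seen in a minimal sketch of a ranking-based average precision. This is a simplified AP over a single ranked list of detections with binary correctness labels. It is not the COCO evaluation protocol and not the paper's Parameterized AP Loss, and the function name and toy scores are assumptions for illustration.

```python
def average_precision(scores, labels):
    """Mean of precision@rank taken at the rank of each positive item.

    scores: confidence per detection; labels: 1 if the detection is correct.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


labels = [1, 0, 1, 0]
ap_base = average_precision([0.90, 0.80, 0.70, 0.60], labels)

# Nudging a score without changing the ordering leaves AP unchanged,
# so the gradient with respect to that score is zero.
ap_nudged = average_precision([0.90, 0.85, 0.70, 0.60], labels)

# Once a score crosses a neighbour, the ordering changes and AP jumps.
ap_swapped = average_precision([0.90, 0.80, 0.81, 0.60], labels)
```

Because AP depends on the scores only through their sort order, it is piecewise constant in the scores. This is why differentiable surrogates are used during training.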
Submitted 9 December, 2021;
originally announced December 2021.
-
The structure of arbitrary Conze-Lesigne systems
Authors:
Asgar Jamneshan,
Or Shalom,
Terence Tao
Abstract:
Let $Γ$ be a countable abelian group. An (abstract) $Γ$-system $\mathrm{X}$ - that is, an (abstract) probability space equipped with an (abstract) probability-preserving action of $Γ$ - is said to be a Conze-Lesigne system if it is equal to its second Host-Kra-Ziegler factor $\mathrm{Z}^2(\mathrm{X})$. The main result of this paper is a structural description of such Conze-Lesigne systems for arbitrary countable abelian $Γ$, namely that they are the inverse limit of translational systems $G_n/Λ_n$ arising from locally compact nilpotent groups $G_n$ of nilpotency class $2$, quotiented by a lattice $Λ_n$. Results of this type were previously known when $Γ$ was finitely generated, or the product of cyclic groups of prime order. In a companion paper, two of us will apply this structure theorem to obtain an inverse theorem for the Gowers $U^3(G)$ norm for arbitrary finite abelian groups $G$.
Submitted 18 February, 2024; v1 submitted 3 December, 2021;
originally announced December 2021.
-
Learning by Active Forgetting for Neural Networks
Authors:
Jian Peng,
Xian Sun,
Min Deng,
Chao Tao,
Bo Tang,
Wenbo Li,
Guohua Wu,
Qing Zhu,
Yu Liu,
Tao Lin,
Haifeng Li
Abstract:
Remembering and forgetting mechanisms are two sides of the same coin in a human learning-memory system. Inspired by human brain memory mechanisms, modern machine learning systems have sought to endow machines with lifelong learning capability through better remembering, while treating forgetting as an antagonist to overcome. Nevertheless, this view might capture only half the picture. Recently, a growing number of researchers have argued that the brain is born to forget, i.e., forgetting is a natural and active process that supports abstract, rich, and flexible representations. This paper presents a learning model for artificial neural networks based on an active forgetting mechanism. The active forgetting mechanism (AFM) is introduced into a neural network via a "plug-and-play" forgetting layer (P\&PF), which consists of groups of inhibitory neurons with an Internal Regulation Strategy (IRS) that adjusts their own extinction rate via a lateral inhibition mechanism and an External Regulation Strategy (ERS) that adjusts the extinction rate of excitatory neurons via an inhibition mechanism. Experimental studies have shown that the P\&PF offers surprising benefits: self-adaptive structure, strong generalization, long-term learning and memory, and robustness to data and parameter perturbation. This work sheds light on the importance of forgetting in the learning process and offers new perspectives for understanding the underlying mechanisms of neural networks.
Submitted 21 November, 2021;
originally announced November 2021.