Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Authors: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

Abstract: Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral relatiOns. For the first time, our framework explores three types of bilateral entanglements within AVS: pixel entanglement, modality entanglement, and temporal entanglement. Regarding pixel entanglement, we employ a Siam-Encoder Module (SEM) that leverages prior knowledge to generate more precise visual features from the foundational model. For modality entanglement, we design a Bilateral-Fusion Module (BFM), enabling COMBO to align corresponding visual and auditory signals bi-directionally. As for temporal entanglement, we introduce an innovative adaptive inter-frame consistency loss based on inherent temporal regularities. Comprehensive experiments and ablation studies on AVSBench-object (84.7 mIoU on S4, 59.2 mIoU on MS3) and AVSBench-semantic (42.1 mIoU on AVSS) datasets demonstrate that COMBO surpasses previous state-of-the-art methods. Code and more results will be publicly available at https://yannqi.github.io/AVS-COMBO/.

Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: CVPR 2024 Highlight. 13 pages, 10 figures

arXiv:2312.06208 [pdf, other]

Dark solitons and their bound states in a nonlinear fiber with second- and fourth-order dispersion

Authors: Peng Gao, Li-Zheng Lv, Xin Li

Abstract: We study the excitations of dark solitons in a nonlinear optical fiber with second- and fourth-order dispersion, and find the emergence of striped dark solitons (SDSs) and some multi-dark-soliton bound states. The SDSs can exhibit time-domain oscillating structures on a plane wave, and they come in two types: those with and those without a total phase step, while the multi-dark-soliton bound states exhibit different numbers of amplitude humps. Using a modified linear stability analysis, we regard the SDSs as the result of the competition between periodicity and localization, and analytically give their existence condition, oscillation frequency, and propagation stability, which show good agreement with numerical results. We also provide a possible interpretation of the formation of the existing striped bright solitons (SBSs), and find that an SBS becomes the pure-quartic soliton when its periodicity and localization are in balance. Our results provide theoretical support for the experimental observation of striped solitons in nonlinear fibers, and our method can also guide the discovery of striped solitons in other physical systems.

Submitted 7 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 9 pages, 6 figures

arXiv:2312.04547 [pdf, other]

Digital Life Project: Autonomous 3D Characters with Social Intelligence

Authors: Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu

Abstract: In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing themselves through articulated body motions, thereby simulating life in a digital environment. Our framework comprises two primary components: 1) SocioMind: a meticulously crafted digital brain that models personalities with systematic few-shot exemplars, incorporates a reflection process based on psychology principles, and emulates autonomy by initiating dialogue topics; 2) MoMat-MoGen: a text-driven motion synthesis paradigm for controlling the character's digital body. It integrates motion matching, a proven industry technique to ensure motion quality, with cutting-edge advancements in motion generation for diversity. Extensive experiments demonstrate that each module achieves state-of-the-art performance in its respective domain. Collectively, they enable virtual characters to initiate and sustain dialogues autonomously, while evolving their socio-psychological states. Concurrently, these characters can perform contextually relevant bodily movements. Additionally, a motion captioning module further allows the virtual character to recognize and appropriately respond to human players' actions. Homepage: https://digital-life-project.com/

Submitted 7 December, 2023; originally announced December 2023.

Comments: Homepage: https://digital-life-project.com/

arXiv:2312.03700 [pdf, other]

OneLLM: One Framework to Align All Modalities with Language

Authors: Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue

Abstract: Multimodal large language models (MLLMs) have gained significant attention due to their strong multimodal understanding capability. However, existing works rely heavily on modality-specific encoders, which usually differ in architecture and are limited to common modalities. In this paper, we present OneLLM, an MLLM that aligns eight modalities to language using a unified framework. We achieve this through a unified multimodal encoder and a progressive multimodal alignment pipeline. In detail, we first train an image projection module to connect a vision encoder with the LLM. Then, we build a universal projection module (UPM) by mixing multiple image projection modules and dynamic routing. Finally, we progressively align more modalities to the LLM with the UPM. To fully leverage the potential of OneLLM in following instructions, we also curated a comprehensive multimodal instruction dataset, including 2M items from image, audio, video, point cloud, depth/normal map, IMU and fMRI brain activity. OneLLM is evaluated on 25 diverse benchmarks, encompassing tasks such as multimodal captioning, question answering and reasoning, where it delivers excellent performance. Code, data, model and online demo are available at https://github.com/csuhan/OneLLM

Submitted 9 January, 2025; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR 2024. Code: https://github.com/csuhan/OneLLM

arXiv:2311.17963 [pdf, other]

M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation

Authors: Xiaowei Chi, Rongyu Zhang, Zhengkai Jiang, Yijiang Liu, Yatian Wang, Xingqun Qi, Wenhan Luo, Peng Gao, Shanghang Zhang, Qifeng Liu, Yike Guo

Abstract: While current LLM chatbots like GPT-4V bridge the gap between human instructions and visual representations to enable text-image generation, they still lack efficient alignment methods for high-fidelity performance on multiple downstream tasks. In this paper, we propose M$^{2}$Chat, a novel unified multimodal LLM framework for generating interleaved text-image conversation across various scenarios. Specifically, we propose an M$^{3}$Adapter that efficiently integrates granular low-level visual information and high-level semantic features from multi-modality prompts. Upon the well-aligned fused feature, the M$^{3}$Adapter tailors a learnable gating strategy to adaptively balance model creativity and consistency across various tasks. Moreover, to further enhance the effectiveness of the M$^{3}$Adapter while preserving the coherence of semantic context comprehension, we introduce a two-stage M$^{3}$FT fine-tuning strategy. This strategy optimizes disjoint groups of parameters for image-text alignment and visual-instruction, respectively. Extensive experiments demonstrate that M$^{2}$Chat surpasses state-of-the-art counterparts across diverse benchmarks, showcasing its prowess in interleaving generation, storytelling, and multimodal dialogue systems. The demo and code are available at https://mattie-e.github.io/M2Chat.github.io.

Submitted 13 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.16572

Adapting to climate change: Long-term impact of wind resource changes on China's power system resilience

Authors: Jiaqi Ruan, Xiangrui Meng, Yifan Zhu, Gaoqi Liang, Xianzhuo Sun, Huayi Wu, Huijuan Xiao, Mengqian Lu, Pin Gao, Jiapeng Li, Wai-Kin Wong, Zhao Xu, Junhua Zhao

Abstract: Modern society's reliance on power systems is at risk from the escalating effects of wind-related climate change. Yet, failure to identify the intricate relationship between wind-related climate risks and power systems could lead to serious short- and long-term issues, including partial or complete blackouts. Here, we develop a comprehensive framework to assess China's power system resilience across various climate change scenarios, enabling a holistic evaluation of the repercussions induced by wind-related climate change. Our findings indicate that China's current wind projects and planning strategies could be jeopardized by wind-related climate change, with up to a 12% decline in regional wind power availability. Moreover, our results underscore a pronounced vulnerability of power system resilience amidst the rigors of hastened climate change, unveiling a potential amplification of resilience deterioration, even approaching fourfold by 2060 under the most severe scenario, relative to the 2020 benchmark. This work advocates for strategic financial deployment within the power sector aimed at climate adaptation, enhancing power system resilience to avert profound losses from long-term, wind-influenced climatic fluctuations.

Submitted 24 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: Not suitable for publication

arXiv:2311.12381 [pdf]

Room-temperature continuous-wave pumped exciton polariton condensation in a perovskite microcavity

Authors: Jiepeng Song, Sanjib Ghosh, Xinyi Deng, Qiuyu Shang, Xinfeng Liu, Yubin Wang, Xiaoyue Gao, Wenkai Yang, Xianjin Wang, Qing Zhao, Kebin Shi, Peng Gao, Qihua Xiong, Qing Zhang

Abstract: Microcavity exciton polaritons (polaritons), as part-light part-matter quasiparticles, garner significant attention for non-equilibrium Bose-Einstein condensation at elevated temperatures. Recently, halide perovskites have emerged as promising room-temperature polaritonic platforms thanks to their large exciton binding energies and superior optical properties. However, inducing room-temperature non-equilibrium polariton condensation in perovskite microcavities currently requires pulsed optical excitation with high excitation densities. Herein, we demonstrate continuous-wave optically pumped polariton condensation with an exceptionally low threshold of ~0.6 W cm$^{-2}$ and a narrow linewidth of ~1 meV. Polariton condensation is unambiguously demonstrated by characterizing the nonlinear behavior and coherence properties. We also identify a microscopic mechanism involving the potential landscape in the perovskite microcavity, where numerous discretized energy levels arising from the hybridization of adjacent potential minima enhance the polariton relaxation, facilitating polariton condensate formation. Our findings lay the foundation for next-generation energy-efficient polaritonic devices operating at room temperature.

Submitted 14 February, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: 16 pages, 4 figures

arXiv:2311.08626 [pdf, ps, other]

Ratios conjecture of cubic $L$-functions of prime moduli

Authors: Peng Gao, Liangyi Zhao

Abstract: We develop the $L$-functions ratios conjecture with one shift in the numerator and denominator in certain ranges for the family of cubic Hecke $L$-functions of prime moduli over the Eisenstein field using multiple Dirichlet series under the generalized Riemann hypothesis. As applications, we evaluate asymptotically the first moment of central values as well as the one-level density of the same family of $L$-functions.

Submitted 26 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: 14 pages

MSC Class: 11M06; 11M41

arXiv:2311.07575 [pdf, other]

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

Authors: Ziyi Lin, Chris Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Chen Lin, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Hongsheng Li, Yu Qiao

Abstract: We present SPHINX, a versatile multi-modal large language model (MLLM) with a joint mixing of model weights, tuning tasks, and visual embeddings. First, for stronger vision-language alignment, we unfreeze the large language model (LLM) during pre-training, and introduce a weight mix strategy between LLMs trained by real-world and synthetic data. By directly integrating the weights from two domains, the mixed LLM can efficiently incorporate diverse semantics with favorable robustness. Then, to enable multi-purpose capabilities, we mix a variety of tasks for joint visual instruction tuning, and design task-specific instructions to avoid inter-task conflict. In addition to the basic visual question answering, we include more challenging tasks such as region-level understanding, caption grounding, document layout detection, and human pose estimation, contributing to mutual enhancement over different scenarios. Additionally, we propose to extract comprehensive visual embeddings from various network architectures, pre-training paradigms, and information granularity, providing language models with more robust image representations. Based on our proposed joint mixing, SPHINX exhibits superior multi-modal understanding capabilities on a wide range of applications. On top of this, we further propose an efficient strategy aiming to better capture fine-grained appearances of high-resolution images. With a mixing of different scales and high-resolution sub-images, SPHINX attains exceptional visual parsing and reasoning performance on existing evaluation benchmarks. We hope our work may cast a light on the exploration of joint mixing in future MLLM research. Code is released at https://github.com/Alpha-VLLM/LLaMA2-Accessory.

Submitted 13 November, 2023; originally announced November 2023.

Comments: Work in progress. Code and demos are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory

arXiv:2311.07023 [pdf]

Photochemical Upcycling of Ultrastrong Polyethylene Nanomembranes into Fibrous Carbon at Ambient Conditions

Authors: Yuexiang Sun, Xin Ma, Qiao Gu, Ping Gao

Abstract: The escalating global issue of plastic waste accumulation, specifically polyolefins, necessitates an urgent solution for upcycling these materials into beneficial compounds. Yet, achieving such upcycling without introducing carbon dioxide into the environment remains a formidable challenge. In this study, we demonstrate an eco-friendly approach for the photochemical conversion of ultrastrong, ultratransparent, and ultrathin polyethylene membrane into fibrous carbon nanomembrane at ambient conditions. The membrane was sputter-coated with platinum and cuprous oxide nanoparticles and exposed to simulated sunlight, resulting in a porous carbon membrane decorated with Pt nanoparticles. The new carbonized nanomembrane maintained the pristine membrane's morphology. The membrane exhibited high activity (2.11 mA/cm$^{2}$) for electrochemical ethanol oxidation with stability over 1000 cycles. This work holds significance for sustainable plastic waste management and the design of new polyolefin materials in a circular economy.

Submitted 13 January, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

arXiv:2311.05533 [pdf, ps, other]

Building Hamiltonian Cycles in the Semi-Random Graph Process in Less Than $2n$ Rounds

Authors: Alan Frieze, Pu Gao, Calum MacRury, Paweł Prałat, Gregory Sorkin

Abstract: The semi-random graph process is an adaptive random graph process in which an online algorithm is initially presented with an empty graph on $n$ vertices. In each round, a vertex $u$ is presented to the algorithm independently and uniformly at random. The algorithm then adaptively selects a vertex $v$, and adds the edge $uv$ to the graph. For a given graph property, the objective of the algorithm is to force the graph to satisfy this property asymptotically almost surely in as few rounds as possible. We focus on the property of Hamiltonicity. We present an adaptive strategy which creates a Hamiltonian cycle in $αn$ rounds, where $α< 1.81696$ is derived from the solution to a system of differential equations. We also show that achieving Hamiltonicity requires at least $βn$ rounds, where $β> 1.26575$.

Submitted 20 December, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: 29 pages. arXiv admin note: substantial text overlap with arXiv:2205.02350
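
The round structure described in this abstract can be sketched as a short simulation. This is only an illustration of the process itself: the function names, the seed parameter, and the `min_degree_choice` strategy are hypothetical and are not the adaptive Hamiltonicity strategy from the paper.

```python
import random

def semi_random_process(n, rounds, choose_v, seed=0):
    """Run the semi-random graph process on n vertices for a fixed number of rounds."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}   # adjacency sets of the (simple) graph built so far
    edges = []                           # one (u, v) pair per round, in order
    for _ in range(rounds):
        u = rng.randrange(n)             # u is drawn independently and uniformly at random
        v = choose_v(u, adj, rng)        # the online algorithm adaptively selects v
        edges.append((u, v))
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj, edges

def min_degree_choice(u, adj, rng):
    """Hypothetical placeholder strategy: pair u with a minimum-degree vertex other than u."""
    candidates = [w for w in adj if w != u]
    low = min(len(adj[w]) for w in candidates)
    return rng.choice([w for w in candidates if len(adj[w]) == low])

if __name__ == "__main__":
    adj, edges = semi_random_process(n=10, rounds=20, choose_v=min_degree_choice)
    print(len(edges), "rounds played")
```

Any callable with the same `(u, adj, rng)` signature can be passed as `choose_v`, which makes it easy to compare strategies by the number of rounds they need to reach a target property.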

arXiv:2311.01778 [pdf, other]

Quantum scattering treatment on the time-domain diffraction of a matter-wave soliton

Authors: Peng Gao, Jie Liu

Abstract: We study the dynamics of the matter-wave soliton interacting with a vibrating mirror created by an evanescent light and provide a quantum scattering picture for the time-domain diffraction of the matter-wave soliton. Under the Kramers-Henneberger (KH) transformation, i.e., in a vibrating coordinate, the vibration of the mirror can be cast to an effective gauge field. We can then exploit the Dyson series and quantum scattering theory to investigate the dynamics of the soliton that moves in the effective gauge field and is reflected by a static mirror. Our analytical theory can quantitatively deduce the locations and the relative weights of the scattered wave packets, which is consistent with our numerical simulations of directly solving a nonlinear Schrödinger equation. In particular, for a two-frequency vibrating case, our theory predicts some interesting multi-peak sideband structures in the diffracted matter-wave distributions, which can be attributed to the resonance of the two frequencies. Underlying mechanisms and possible applications are discussed.

Submitted 3 November, 2023; originally announced November 2023.

Comments: 10 pages, 4 figures

arXiv:2310.20491 [pdf, other]

Collaborative Decision-Making Using Spatiotemporal Graphs in Connected Autonomy

Authors: Peng Gao, Yu Shen, Ming C. Lin

Abstract: Collaborative decision-making is an essential capability for multi-robot systems, such as connected vehicles, to collaboratively control autonomous vehicles in accident-prone scenarios. Under limited communication bandwidth, capturing comprehensive situational awareness by integrating connected agents' observations is very challenging. In this paper, we propose a novel collaborative decision-making method that efficiently and effectively integrates collaborators' representations to control the ego vehicle in accident-prone scenarios. Our approach formulates collaborative decision-making as a classification problem. We first represent sequences of raw observations as spatiotemporal graphs, which significantly reduce the package size to share among connected vehicles. Then we design a novel spatiotemporal graph neural network based on heterogeneous graph learning, which analyzes spatial and temporal connections of objects in a unified way for collaborative decision-making. We evaluate our approach using a high-fidelity simulator that considers realistic traffic, communication bandwidth, and vehicle sensing among connected autonomous vehicles. The experimental results show that our representation achieves over 100x reduction in the shared data size, which meets the requirements of communication bandwidth for connected autonomous driving. In addition, our approach achieves over 30% improvements in driving safety.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.17043 [pdf, other]

Quantifying the Transit Light Source Effect: Measurements of Spot Temperature and Coverage on the Photosphere of AU Microscopii with High-Resolution Spectroscopy and Multi-Color Photometry

Authors: William Waalkes, Zachory Berta-Thompson, Elisabeth Newton, Andrew Mann, Peter Gao, Hannah Wakeford, Lili Alderson, Peter Plavchan

Abstract: AU Mic is an active 24 Myr pre-main sequence M dwarf in the stellar neighborhood ($d=9.7$ pc) with a rotation period of 4.86 days. The two transiting planets orbiting AU Mic, AU Mic b and c, are warm sub-Neptunes on 8.5 and 18.9 day periods and are targets of interest for atmospheric observations of young planets. Here we study AU Mic's unocculted starspots using ground-based photometry and spectra in order to complement current and future transmission spectroscopy of its planets. We gathered multi-color LCO 0.4m SBIG photometry to study the star's rotational modulations and LCO NRES high-resolution spectra to measure the different spectral components within the integrated spectrum of the star, parameterized by 3 spectral components and their coverage fractions. We find AU Mic's surface has at least 2 spectral components, a $4000\pm15$ K ambient photosphere with cool spots that have a temperature of $3000\pm70$ K and cover $39\pm4\%$ of the surface, increasing and decreasing by 5$\%$ from the average throughout a rotation. We also detect a third flux component with a filling factor less than 0.5$\%$ and a largely uncertain temperature that we attribute to flare flux not entirely omitted in the time-averaged spectra. We include measurements of spot temperature and coverage fraction from both 2- and 3-temperature models, which we find agree with each other strongly. Our expanded use of various techniques to study starspots will help us better understand this system and may have applications for interpreting the transmission spectra for exoplanets transiting stars of a wide range of activity levels.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 25 pages, 13 figures, Accepted to ApJ

arXiv:2310.08358 [pdf, other]

Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges

Authors: Peifeng Gao, Qianqian Xu, Yibo Yang, Peisong Wen, Huiyang Shao, Zhiyong Yang, Bernard Ghanem, Qingming Huang

Abstract: Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT). It is characterized by the collapse of features and classifier into a symmetrical structure, known as simplex equiangular tight frame (ETF). While there have been extensive studies on optimization characteristics showing the global optimality of neural collapse, little research has been done on the generalization behaviors during the occurrence of NC. In particular, the important phenomenon of generalization improvement during TPT has remained an empirical observation lacking a rigorous theoretical explanation. In this paper, we establish the connection between the minimization of CE and a multi-class SVM during TPT, and then derive a multi-class margin generalization bound, which provides a theoretical explanation for why continuing training can still lead to accuracy improvement on the test set, even after the training accuracy has reached 100%. Additionally, our further theoretical results indicate that different alignment between labels and features in a simplex ETF can result in varying degrees of generalization improvement, despite all models reaching NC and demonstrating similar optimization performance on the training set. We refer to this newly discovered property as "non-conservative generalization". In experiments, we also provide empirical observations to verify the indications suggested by our theoretical results.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 20 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2304.08914

arXiv:2310.06311 [pdf, other]

Improving Compositional Text-to-image Generation with Large Vision-Language Models

Authors: Song Wen, Guian Fang, Renrui Zhang, Peng Gao, Hao Dong, Dimitris Metaxas

Abstract: Recent advancements in text-to-image models, particularly diffusion models, have shown significant promise. However, compositional text-to-image models frequently encounter difficulties in generating high-quality images that accurately align with input texts describing multiple objects, variable attributes, and intricate spatial relationships. To address this limitation, we employ large vision-language models (LVLMs) for multi-dimensional assessment of the alignment between generated images and their corresponding input texts. Utilizing this assessment, we fine-tune the diffusion model to enhance its alignment capabilities. During the inference phase, an initial image is produced using the fine-tuned diffusion model. The LVLM is then employed to pinpoint areas of misalignment in the initial image, which are subsequently corrected using the image editing algorithm until no further misalignments are detected by the LVLM. The resultant image is consequently more closely aligned with the input text. Our experimental results validate that the proposed methodology significantly improves text-image alignment in compositional image generation, particularly with respect to object number, attribute binding, spatial relationships, and aesthetic quality.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.04180 [pdf, other]

Degradation-Aware Self-Attention Based Transformer for Blind Image Super-Resolution

Authors: Qingguo Liu, Pan Gao, Kang Han, Ningzhong Liu, Wei Xiang

Abstract: Compared to CNN-based methods, Transformer-based methods achieve impressive image restoration outcomes due to their abilities to model remote dependencies. However, how to apply Transformer-based methods to the field of blind super-resolution (SR) and further make an SR network adaptive to degradation information is still an open problem. In this paper, we propose a new degradation-aware self-attention-based Transformer model, where we incorporate contrastive learning into the Transformer network for learning the degradation representations of input images with unknown noise. In particular, we integrate both CNN and Transformer components into the SR network, where we first use the CNN modulated by the degradation information to extract local features, and then employ the degradation-aware Transformer to extract global semantic features. We apply our proposed model to several popular large-scale benchmark datasets for testing, and achieve the state-of-the-art performance compared to existing methods. In particular, our method yields a PSNR of 32.43 dB on the Urban100 dataset at $\times$2 scale, 0.94 dB higher than DASR, and 26.62 dB on the Urban100 dataset at $\times$4 scale, 0.26 dB improvement over KDSR, setting a new benchmark in this area. Source code is available at: https://github.com/I2-Multimedia-Lab/DSAT/tree/main.

Submitted 6 October, 2023; originally announced October 2023.

Comments: 12 pages

arXiv:2310.01443 [pdf, other]

doi 10.1155/2020/8216874

Quantum-Based Feature Selection for Multi-classification Problem in Complex Systems with Edge Computing

Authors: Wenjie Liu, Junxiu Chen, Yuxiang Wang, Peipei Gao, Zhibin Lei, Xu Ma

Abstract: The complex systems with edge computing require a huge amount of multi-feature data to extract appropriate insights for their decision making, so it is important to find a feasible feature selection method to improve the computational efficiency and save the resource consumption. In this paper, a quantum-based feature selection algorithm for the multi-classification problem, namely, QReliefF, is proposed, which can effectively reduce the complexity of algorithm and improve its computational efficiency. First, all features of each sample are encoded into a quantum state by performing operations CMP and R_y, and then the amplitude estimation is applied to calculate the similarity between any two quantum states (i.e., two samples). According to the similarities, the Grover-Long method is utilized to find the nearest k neighbor samples, and then the weight vector is updated. After a certain number of iterations through the above process, the desired features can be selected with regards to the final weight vector and the threshold τ. Compared with the classical ReliefF algorithm, our algorithm reduces the complexity of similarity calculation from O(MN) to O(M), the complexity of finding the nearest neighbor from O(M) to O(sqrt(M)), and resource consumption from O(MN) to O(MlogN). Meanwhile, compared with the quantum Relief algorithm, our algorithm is superior in finding the nearest neighbor, reducing the complexity from O(M) to O(sqrt(M)). Finally, in order to verify the feasibility of our algorithm, a simulation experiment based on Rigetti with a simple example is performed.

Submitted 30 September, 2023; originally announced October 2023.

Comments: 22 pages, 11 figures

Journal ref: Complexity, 2020.2020:p.8216874

arXiv:2309.16917 [pdf]

Atomic-scale mechanism of enhanced electron-phonon coupling at the interface of MgB$_2$ thin film

Authors: Xiaowen Zhang, Tiequan Xu, Ruochen Shi, Bo Han, Fachen Liu, Zhetong Liu, Xiaoyue Gao, Jinlong Du, Yue Wang, Peng Gao

Abstract: In this study, we explore the heterointerface of MgB$_2$ film on SiC substrate at atomic scale using electron microscopy and spectroscopy. We detect ~1 nm MgO between MgB$_2$ and SiC. Atomic-level electron energy loss spectra (EELS) show MgB$_2$-E2g mode splitting and softening near the MgB$_2$/MgO interface. Orbital-resolved core-level EELS link the phonon softening to in-plane boron-atom electron states' changes. Ab initio calculations confirm this softening enhances electron-phonon coupling at the interface. Our findings highlight interface engineering's potential for superconductivity enhancement.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.16583 [pdf, other]

GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond

Authors: Shen Zheng, Yuyu Zhang, Yijie Zhu, Chenguang Xi, Pengyang Gao, Xun Zhou, Kevin Chen-Chuan Chang

Abstract: With the rapid advancement of large language models (LLMs), there is a pressing need for a comprehensive evaluation suite to assess their capabilities and limitations. Existing LLM leaderboards often reference scores reported in other papers without consistent settings and prompts, which may inadvertently encourage cherry-picking favored settings and prompts for better results. In this work, we introduce GPT-Fathom, an open-source and reproducible LLM evaluation suite built on top of OpenAI Evals. We systematically evaluate 10+ leading LLMs as well as OpenAI's legacy models on 20+ curated benchmarks across 7 capability categories, all under aligned settings. Our retrospective study on OpenAI's earlier models offers valuable insights into the evolutionary path from GPT-3 to GPT-4. Currently, the community is eager to know how GPT-3 progressively improves to GPT-4, including technical details like whether adding code data improves LLM's reasoning capability, which aspects of LLM capability can be improved by SFT and RLHF, how much is the alignment tax, etc. Our analysis sheds light on many of these questions, aiming to improve the transparency of advanced LLMs.

Submitted 1 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: Accepted by NAACL 2024

arXiv:2309.14366 [pdf, other]

doi 10.1109/access.2019.2896316

A Unitary Weights Based One-Iteration Quantum Perceptron Algorithm for Non-Ideal Training Sets

Authors: Wenjie Liu, Peipei Gao, Yuxiang Wang, Wenbin Yu, Maojun Zhang

Abstract: In order to solve the problem of non-ideal training sets (i.e., the less-complete or over-complete sets) and implement one-iteration learning, a novel efficient quantum perceptron algorithm based on unitary weights is proposed, where the singular value decomposition of the total weight matrix from the training set is calculated to make the weight matrix unitary. The example validation of quantum gates {H, S, T, CNOT, Toffoli, Fredkin} shows that our algorithm can accurately implement arbitrary quantum gates within one iteration. The performance comparison between our algorithm and other quantum perceptron algorithms demonstrates the advantages of our algorithm in terms of applicability, accuracy, and availability. To further validate the applicability of our algorithm, a quantum composite gate which consists of several basic quantum gates is also illustrated.

Submitted 23 September, 2023; originally announced September 2023.

Comments: 12 pages, 5 figures

Journal ref: IEEE Access, 2019. 7: p. 36854-36865
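
The SVD step mentioned in the quantum perceptron abstract above can be illustrated with a small classical sketch. This is not the authors' algorithm: the construction of the total weight matrix as a sum of outer products |target><input|, the function name `unitary_from_svd`, and the toy training set are all assumptions made for illustration. It only shows that replacing the singular values of a non-unitary weight matrix with ones (keeping U and V†) yields a unitary matrix, and that for this over-complete toy set the result coincides with the target Hadamard gate.

```python
import numpy as np


def unitary_from_svd(weights: np.ndarray) -> np.ndarray:
    """Return U @ Vh, the unitary factor of the polar decomposition of `weights`."""
    u, _singular_values, vh = np.linalg.svd(weights)
    return u @ vh


# Target gate for the illustration: the Hadamard gate.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# Hypothetical over-complete training set: the input |0> appears twice.
basis = np.eye(2, dtype=complex)
training_inputs = [basis[:, 0], basis[:, 0], basis[:, 1]]

# Assumed construction (not taken from the paper): sum of |target><input|.
total_weights = sum(np.outer(H @ x, x.conj()) for x in training_inputs)

# The raw sum equals H @ diag(2, 1), which is not unitary; the SVD step fixes that.
recovered = unitary_from_svd(total_weights)
```

Because the raw weight matrix here is H multiplied by a positive diagonal matrix, its polar unitary factor is exactly H, which is why a single pass suffices in this toy case.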

arXiv:2309.13811 [pdf, ps, other]

Ratios conjecture for primitive quadratic Hecke $L$-functions

Authors: Peng Gao, Liangyi Zhao

Abstract: We develop the ratios conjecture with one shift in the numerator and denominator in certain ranges for families of primitive quadratic Hecke $L$-functions of imaginary quadratic number fields with class number one using multiple Dirichlet series under the generalized Riemann hypothesis. We also obtain unconditional asymptotic formulas for the first moments of central values of these families of $L$-functions with error terms of size that is the square root of that of the primary main terms.

Submitted 24 September, 2023; originally announced September 2023.

Comments: 16 pages

MSC Class: 11M06; 11M41

arXiv:2309.13193 [pdf, other]

SurrealDriver: Designing LLM-powered Generative Driver Agent Framework based on Human Drivers' Driving-thinking Data

Authors: Ye Jin, Ruoxuan Yang, Zhijie Yi, Xiaoxi Shen, Huiling Peng, Xiaoan Liu, Jingli Qin, Jiayang Li, Jintao Xie, Peizhong Gao, Guyue Zhou, Jiangtao Gong

Abstract: Leveraging advanced reasoning capabilities and extensive world knowledge of large language models (LLMs) to construct generative agents for solving complex real-world problems is a major trend. However, LLMs inherently lack embodiment as humans, resulting in suboptimal performance in many embodied decision-making tasks. In this paper, we introduce a framework for building human-like generative driving agents using post-driving self-report driving-thinking data from human drivers as both demonstration and feedback. To capture high-quality, natural language data from drivers, we conducted urban driving experiments, recording drivers' verbalized thoughts under various conditions to serve as chain-of-thought prompts and demonstration examples for the LLM-Agent. The framework's effectiveness was evaluated through simulations and human assessments. Results indicate that incorporating expert demonstration data significantly reduced collision rates by 81.04% and increased human likeness by 50% compared to a baseline LLM-based agent. Our study provides insights into using natural language-based human demonstration data for embodied tasks. The driving-thinking dataset is available at https://github.com/AIR-DISCOVER/Driving-Thinking-Dataset.

Submitted 21 July, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: 6 pages, 3 figures

MSC Class: H.5.2

arXiv:2309.12719 [pdf, other]

doi 10.1007/s10773-017-3553-x

An Efficient and Secure Arbitrary N-Party Quantum Key Agreement Protocol Using Bell States

Authors: Wen-Jie Liu, Yong Xu, Ching-Nung Yang, Pei-Pei Gao, Wen-Bin Yu

Abstract: Two quantum key agreement protocols using Bell states and Bell measurement were recently proposed by Shukla et al. (Quantum Inf. Process. 13(11), 2391-2405, 2014). However, Zhu et al. pointed out that there are some security flaws and proposed an improved version (Quantum Inf. Process. 14(11), 4245-4254, 2015). In this study, we show that Zhu et al.'s improvement still has some security problems, and its efficiency is not high enough. To solve these problems, we utilize four Pauli operations {I, Z, X, Y} to encode two bits instead of the original two operations {I, X} to encode one bit, and then propose an efficient and secure arbitrary N-party quantum key agreement protocol. In the protocol, the channel checking with decoy single photons is introduced to avoid the eavesdropper's flip attack, and a post-measurement mechanism is used to protect against the collusion attack. The security analysis shows the present protocol can guarantee the correctness, security, privacy and fairness of quantum key agreement.

Submitted 22 September, 2023; originally announced September 2023.

Comments: 13 pages, 5 figures

Journal ref: International Journal of Theoretical Physics, 2018. 57(1): p. 195-207
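
The two-bit encoding mentioned in the quantum key agreement abstract above rests on a standard fact: applying I, Z, X, or Y to one qubit of a Bell state produces four mutually orthogonal Bell states, so a Bell measurement can recover two classical bits. The sketch below simulates only that fact with state vectors. The bit-to-operator mapping, the names `encode` and `decode`, and the overlap-based stand-in for a Bell measurement are assumptions for illustration; the protocol's decoy photons, channel checking, and multi-party steps are not modelled.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)

# Hypothetical mapping from two classical bits to a Pauli operation.
PAULI = {
    "00": I2,
    "01": np.array([[1, 0], [0, -1]], dtype=complex),    # Z
    "10": np.array([[0, 1], [1, 0]], dtype=complex),     # X
    "11": np.array([[0, -1j], [1j, 0]], dtype=complex),  # Y
}

# Shared Bell state (|00> + |11>) / sqrt(2), basis order |00>, |01>, |10>, |11>.
BELL = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)


def encode(bits: str) -> np.ndarray:
    """Apply the Pauli operation for `bits` to the first qubit of the Bell state."""
    return np.kron(PAULI[bits], I2) @ BELL


def decode(state: np.ndarray) -> str:
    """Simulate a Bell measurement by picking the reference state with the largest overlap."""
    overlaps = {bits: abs(np.vdot(encode(bits), state)) for bits in PAULI}
    return max(overlaps, key=overlaps.get)


# Absolute overlaps between the four encoded states; orthogonality gives the identity.
gram = np.array([[abs(np.vdot(encode(a), encode(b))) for b in PAULI] for a in PAULI])
```

With only {I, X}, the same Bell pair would distinguish two states and carry one bit; using all four Pauli operations doubles the capacity per pair, which is the efficiency gain the abstract refers to.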

arXiv:2309.12119 [pdf, other]

Pseudo-Bayesian unit level modeling for small area estimation under informative sampling

Authors: Peter A. Gao, Jon Wakefield

Abstract: When mapping subnational health and demographic indicators, direct weighted estimators of small area means based on household survey data can be unreliable when data are limited. If survey microdata are available, unit level models can relate individual survey responses to unit level auxiliary covariates and explicitly account for spatial dependence and between area variation using random effects. These models can produce estimators with improved precision, but often neglect to account for the design of the surveys used to collect data. Pseudo-Bayesian approaches incorporate sampling weights to address informative sampling when using such models to conduct population inference but credible sets based on the resulting pseudo-posterior distributions can be poorly calibrated without adjustment. We outline a pseudo-Bayesian strategy for small area estimation that addresses informative sampling and incorporates a post-processing rescaling step that produces credible sets with close to nominal empirical frequentist coverage rates. We compare our approach with existing design-based and model-based estimators using real and simulated data.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.10309 [pdf, other]

Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill

Authors: Wenzhe Cai, Siyuan Huang, Guangran Cheng, Yuxing Long, Peng Gao, Changyin Sun, Hao Dong

Abstract: Zero-shot object navigation is a challenging task for home-assistance robots. This task emphasizes visual grounding, commonsense inference and locomotion abilities, where the first two are inherent in foundation models. But for the locomotion part, most works still depend on map-based planning approaches. The gap between RGB space and map space makes it difficult to directly transfer the knowledge from foundation models to navigation tasks. In this work, we propose a Pixel-guided Navigation skill (PixNav), which bridges the gap between the foundation models and the embodied navigation task. It is straightforward for recent foundation models to indicate an object by pixels, and with pixels as the goal specification, our method becomes a versatile navigation policy towards all different kinds of objects. Besides, our PixNav is a pure RGB-based policy that can reduce the cost of home-assistance robots. Experiments demonstrate the robustness of the PixNav which achieves 80+% success rate in the local path-planning task. To perform long-horizon object navigation, we design an LLM-based planner to utilize the commonsense knowledge between objects and rooms to select the best waypoint. Evaluations across both photorealistic indoor simulators and real-world environments validate the effectiveness of our proposed navigation strategy. Code and video demos are available at https://github.com/wzcai99/Pixel-Navigator.

Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: 8 pages, 5 figures

arXiv:2309.08365 [pdf, other]

M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection

Authors: Yao Yuan, Pan Gao, XiaoYang Tan

Abstract: Most existing salient object detection methods use U-Net or feature pyramid structure, which simply aggregates feature maps of different scales, ignoring the uniqueness and interdependence of them and their respective contributions to the final prediction. To overcome these, we propose the M$^3$Net, i.e., the Multilevel, Mixed and Multistage attention network for Salient Object Detection (SOD). Firstly, we propose Multiscale Interaction Block which innovatively introduces the cross-attention approach to achieve the interaction between multilevel features, allowing high-level features to guide low-level feature learning and thus enhancing salient regions. Secondly, considering the fact that previous Transformer based SOD methods locate salient regions only using global self-attention while inevitably overlooking the details of complex objects, we propose the Mixed Attention Block. This block combines global self-attention and window self-attention, aiming at modeling context at both global and local levels to further improve the accuracy of the prediction map. Finally, we propose a multilevel supervision strategy to optimize the aggregated feature stage-by-stage. Experiments on six challenging datasets demonstrate that the proposed M$^3$Net surpasses recent CNN and Transformer-based SOD arts in terms of four metrics. Codes are available at https://github.com/I2-Multimedia-Lab/M3Net.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.08214 [pdf, other]

MTG: Mapless Trajectory Generator with Traversability Coverage for Outdoor Navigation

Authors: Jing Liang, Peng Gao, Xuesu Xiao, Adarsh Jagan Sathyamoorthy, Mohamed Elnoor, Ming C. Lin, Dinesh Manocha

Abstract: We present a novel learning-based trajectory generation algorithm for outdoor robot navigation. Our goal is to compute collision-free paths that also satisfy the environment-specific traversability constraints. Our approach is designed for global planning using limited onboard robot perception in mapless environments while ensuring comprehensive coverage of all traversable directions. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model that is enhanced with traversability constraints and an optimization formulation used for the coverage. We highlight the benefits of our approach over state-of-the-art trajectory generation approaches and demonstrate its performance in challenging and large outdoor environments, including around buildings, across intersections, along trails, and off-road terrain, using a Clearpath Husky and a Boston Dynamics Spot robot. In practice, our approach results in a 6% improvement in coverage of traversable areas and an 89% reduction in trajectory portions residing in non-traversable regions. Our video is here: https://youtu.be/3eJ2soAzXnU

Submitted 4 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 9

arXiv:2309.07688 [pdf]

Nanoscale Cathodoluminescence Spectroscopy Probing the Nitride Quantum Wells in an Electron Microscope

Authors: Zhetong Liu, Bingyao Liu, Dongdong Liang, Xiaomei Li, Xiaomin Li, Li Chen, Rui Zhu, Jun Xu, Tongbo Wei, Xuedong Bai, Peng Gao

Abstract: To gain a deeper understanding of the luminescence of multiquantum wells and the factors affecting it on a microscopic level, cathodoluminescence combined with scanning transmission electron microscopy and spectroscopy was used to reveal the luminescence of In0.15Ga0.85N five-period multiquantum wells. The composition-wave-energy relationship was established in combination with energy-dispersive X-ray spectroscopy, and the bandgaps of In0.15Ga0.85N and GaN in multiple quantum wells were extracted by electron energy loss spectroscopy to understand the features of cathodoluminescence luminescence spectra. The luminescence differences between different periods of multiquantum wells and the effects on the luminescence of multiple quantum wells owing to defects such as composition fluctuation and dislocations were revealed. Our study, establishing the direct correspondence between the atomic structure of InxGa1-xN multiquantum wells and photoelectric properties, provides useful information for nitride applications.

Submitted 14 September, 2023; originally announced September 2023.

Comments: 13 pages, 4 figures

arXiv:2309.06037 [pdf, other]

doi 10.1103/PhysRevD.107.123029

Fast resolving Galactic binaries in LISA data and its ability to study the Milky Way

Authors: Pin Gao, Xi-Long Fan, Zhou-Jian Cao, Xue-Hao Zhang

Abstract: Resolving individual gravitational waves from tens of millions of double white dwarf (DWD) binaries in the Milky Way is a challenge for future space-based gravitational wave detection programs. By using previous data to define the priors for the next search, we propose an accelerated approach of searching the DWD binaries and demonstrate its efficiency based on the GBSIEVER detection pipeline. Compared to the traditional GBSIEVER method, our method can obtain $\sim 50\%$ of sources with 2.5% of the searching time for LDC1-4 data. In addition, we find that both methods have a similar ability to detect the Milky Way structure by their confirmed sources. The relative error of distance and chirp mass is about 20% for DWD binaries whose gravitational wave frequency is higher than $4\times10^{-3}$ Hz, even if they are close to the Galactic center. Finally, we propose a signal-to-noise ratio (SNR) threshold for LISA to confirm the detection of DWD binaries. The threshold should be 16 when the gravitational wave frequency is lower than $4\times10^{-3}$ Hz and 9 when the frequency range is from $4\times10^{-3}$ Hz to $1.5\times10^{-2}$ Hz.

Submitted 12 September, 2023; originally announced September 2023.

Comments: 16 pages, 19 figures

Journal ref: Phys. Rev. D 107, 123029, 2023

arXiv:2309.05881 [pdf, ps, other]

On the pre- and post-positional semi-random graph processes

Authors: Pu Gao, Hidde Koerts

Abstract: We study the semi-random graph process, and a variant process recently suggested by Nick Wormald. We show that these two processes are asymptotically equally fast in constructing a semi-random graph $G$ that has property ${\mathcal P}$, for the following examples of ${\mathcal P}$: - ${\mathcal P}$ is the set of graphs containing a $d$-degenerate subgraph, where $d\ge 1$ is fixed; - ${\mathcal P}$ is the set of $k$-connected graphs, where $k\ge 1$ is fixed. In particular, our result of the $k$-connectedness above settles the open case $k=2$ of the original semi-random graph process. We also prove that there exist properties ${\mathcal P}$ where the two semi-random graph processes do not construct a graph in ${\mathcal P}$ asymptotically equally fast. We further propose some conjectures on ${\mathcal P}$ for which the two processes perform differently.

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.03905 [pdf, other]

ImageBind-LLM: Multi-modality Instruction Tuning

Authors: Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao

Abstract: We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder. Then, the image features transformed by the bind network are added to word tokens of all layers in LLaMA, which progressively injects visual instructions via an attention-free and zero-initialized gating mechanism. Aided by the joint embedding of ImageBind, the simple image-text training enables our model to exhibit superior multi-modality instruction-following capabilities. During inference, the multi-modality inputs are fed into the corresponding ImageBind encoders, and processed by a proposed visual cache model for further cross-modal embedding enhancement. The training-free cache model retrieves from three million image features extracted by ImageBind, which effectively mitigates the training-inference modality discrepancy. Notably, with our approach, ImageBind-LLM can respond to instructions of diverse modalities and demonstrate significant language generation quality. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.

Submitted 11 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: Code is available at https://github.com/OpenGVLab/LLaMA-Adapter

arXiv:2309.02714 [pdf]

doi 10.1038/s41467-024-47688-5

Atomic-scale observation of localized phonons at FeSe/SrTiO3 interface

Authors: Ruochen Shi, Qize Li, Xiaofeng Xu, Bo Han, Ruixue Zhu, Fachen Liu, Ruishi Qi, Xiaowen Zhang, Jinlong Du, Ji Chen, Dapeng Yu, Xuetao Zhu, Jiandong Guo, Peng Gao

Abstract: In single unit-cell FeSe grown on SrTiO3, the superconductivity transition temperature features a significant enhancement. Local phonon modes at the interface associated with electron-phonon coupling may play an important role in the interface-induced enhancement. However, such phonon modes have eluded direct experimental observations. Indeed, the complicated atomic structure of the interface makes it challenging to obtain accurate knowledge of the structure-phonon relation from either experiment or theory, thus hindering our understanding of the enhancement mechanism. Here, we achieve direct characterizations of atomic structure and phonon modes at the FeSe/SrTiO3 interface with atomically resolved imaging and electron energy loss spectroscopy in a scanning transmission electron microscope. We find several phonon modes highly localized (~1.3 nm) at the unique double layer Ti-O termination at the interface, one of which (~83 meV) engages in strong interactions with the electrons in FeSe based on ab initio calculations. The electron-phonon coupling strength for such a localized interface phonon with short-range interactions is comparable to that of Fuchs-Kliewer (FK) phonon mode with long-range interactions. Thus, our atomic-scale study provides new insights into understanding the origin of superconductivity enhancement at the FeSe/SrTiO3 interface.

Submitted 6 September, 2023; originally announced September 2023.

Journal ref: Nat Commun 15, 3418 (2024)

arXiv:2309.00767 [pdf, other]

Physics-informed machine learning of the correlation functions in bulk fluids

Authors: Wenqian Chen, Peiyuan Gao, Panos Stinis

Abstract: The Ornstein-Zernike (OZ) equation is the fundamental equation for pair correlation function computations in the modern integral equation theory for liquids. In this work, machine learning models, notably physics-informed neural networks and physics-informed neural operator networks, are explored to solve the OZ equation. The physics-informed machine learning models demonstrate great accuracy and high efficiency in solving the forward and inverse OZ problems of various bulk fluids. The results highlight the significant potential of physics-informed machine learning for applications in thermodynamic state theory.

Submitted 1 September, 2023; originally announced September 2023.

Comments: 8 figures

Report number: PNNL-SA-189736

arXiv:2309.00615 [pdf, other]

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

Authors: Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Yiwen Tang, Xianzheng Ma, Jiaming Han, Kexin Chen, Peng Gao, Xianzhi Li, Hongsheng Li, Pheng-Ann Heng

Abstract: We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video. Guided by ImageBind, we construct a joint embedding space between 3D and multi-modalities, enabling many promising applications, e.g., any-to-3D generation, 3D embedding arithmetic, and 3D open-world understanding. On top of this, we further present Point-LLM, the first 3D large language model (LLM) following 3D multi-modal instructions. By parameter-efficient fine-tuning techniques, Point-LLM injects the semantics of Point-Bind into pre-trained LLMs, e.g., LLaMA, which requires no 3D instruction data, but exhibits superior 3D and multi-modal question-answering capacity. We hope our work may cast a light on the community for extending 3D point clouds to multi-modality applications. Code is available at https://github.com/ZiyuGuo99/Point-Bind_Point-LLM.

Submitted 1 September, 2023; originally announced September 2023.

Comments: Work in progress. Code is available at https://github.com/ZiyuGuo99/Point-Bind_Point-LLM

arXiv:2308.14482 [pdf, other]

An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation

Authors: Pengzhi Gao, Ruiqing Zhang, Zhongjun He, Hua Wu, Haifeng Wang

Abstract: Consistency regularization methods, such as R-Drop (Liang et al., 2021) and CrossConST (Gao et al., 2023), have achieved impressive supervised and zero-shot performance in the neural machine translation (NMT) field. Can we also boost end-to-end (E2E) speech-to-text translation (ST) by leveraging consistency regularization? In this paper, we conduct empirical studies on intra-modal and cross-modal consistency and propose two training strategies, SimRegCR and SimZeroCR, for E2E ST in regular and zero-shot scenarios. Experiments on the MuST-C benchmark show that our approaches achieve state-of-the-art (SOTA) performance in most translation directions. The analyses prove that regularization brought by the intra-modal consistency, instead of modality gap, is crucial for the regular E2E ST, and the cross-modal consistency could close the modality gap and boost the zero-shot E2E ST performance.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.13137 [pdf, other]

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

Authors: Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo

Abstract: Large language models (LLMs) have revolutionized natural language processing tasks. However, their practical deployment is hindered by their immense memory and computation requirements. Although recent post-training quantization (PTQ) methods are effective in reducing memory footprint and improving the computational efficiency of LLM, they hand-craft quantization parameters, leading to low performance, especially in extremely low-bit quantization. To tackle this issue, we introduce an Omnidirectionally calibrated Quantization (OmniQuant) technique for LLMs, which achieves good performance in diverse quantization settings while maintaining the computational efficiency of PTQ by efficiently optimizing various quantization parameters. OmniQuant comprises two innovative components including Learnable Weight Clipping (LWC) and Learnable Equivalent Transformation (LET). LWC modulates the extreme values of weights by optimizing the clipping threshold. Meanwhile, LET tackles activation outliers by shifting the challenge of quantization from activations to weights. Operating within a differentiable framework using block-wise error minimization, OmniQuant can optimize the quantization process efficiently for both weight-only and weight-activation quantization. For instance, the LLaMA-2 model family size 7-70B can be processed with OmniQuant on a single A100-40G GPU within 1-16 hours using 128 samples.
Extensive experiments validate OmniQuant's superior performance across diverse quantization configurations such as W4A4 (4-bit weight, 4-bit activation), W6A6, W4A16, W3A16, and W2A16. Additionally, OmniQuant demonstrates effectiveness in instruction-tuned models and delivers notable improvements in inference speed and memory reduction on real devices. Codes are available at https://github.com/OpenGVLab/OmniQuant.

Submitted 18 March, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: ICLR 2024 Camera Ready

arXiv:2308.12961 [pdf, other]

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks

Authors: Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao Dong, Peng Gao

Abstract: To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot semantic segmentation methods first pre-train the models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead, but also incurs a significant domain gap on 'unseen' classes. To tackle these issues, we propose an efficient Training-free Few-shot 3D Segmentation network, TFS3D, and a further training-based variant, TFS3D-T. Without any learnable parameters, TFS3D extracts dense representations by trigonometric positional encodings, and achieves comparable performance to previous training-based methods. Due to the elimination of pre-training, TFS3D can alleviate the domain gap issue and save a substantial amount of time. Building upon TFS3D, TFS3D-T only requires training a lightweight query-support transferring attention (QUEST), which enhances the interaction between the few-shot query and support data. Experiments demonstrate TFS3D-T improves previous state-of-the-art methods by +6.93% and +17.96% mIoU respectively on S3DIS and ScanNet, while reducing the training time by 90%, indicating superior effectiveness and efficiency.

Submitted 24 August, 2023; originally announced August 2023.

Comments: Code is available at https://github.com/yangyangyang127/TFS3D

arXiv:2308.11219 [pdf, other]

Controlling the 2D magnetism of CrBr$_3$ by van der Waals stacking engineering

Authors: Shiqi Yang, Xiaolong Xu, Bo Han, Pingfan Gu, Roger Guzman, Yiwen Song, Zhongchong Lin, Peng Gao, Wu Zhou, Jinbo Yang, Zuxin Chen, Yu Ye

Abstract: The manipulation of two-dimensional (2D) magnetic order is of significant importance to facilitate future 2D magnets for low-power and high-speed spintronic devices. Van der Waals stacking engineering holds promise for controllable magnetism via interlayer magnetic coupling. However, directly examining the stacking order changes accompanying magnetic order transitions at the atomic scale and preparing device-ready 2D magnets with controllable magnetic orders remain elusive. Here, we demonstrate effective control of interlayer stacking in exfoliated CrBr$_3$ via thermally assisted strain engineering. The stable interlayer ferromagnetic (FM), antiferromagnetic (AFM), and FM-AFM coexistent ground states confirmed by the magnetic circular dichroism measurements are realized. Combined with the first-principles calculations, the atomically-resolved imaging technique reveals the correlation between magnetic order and interlayer stacking order in the CrBr$_3$ flakes unambiguously. A tunable exchange bias effect is obtained in the mixed phase of FM and AFM states. This work will introduce new magnetic properties by controlling the stacking order and sequence of 2D magnets, providing ample opportunities for their application in spintronic devices.

Submitted 22 August, 2023; originally announced August 2023.

Comments: 7 pages, 4 figures

arXiv:2308.11138 [pdf, ps, other]

NLP-based detection of systematic anomalies among the narratives of consumer complaints

Authors: Peiheng Gao, Ning Sun, Xuefeng Wang, Chen Yang, Ričardas Zitikis

Abstract: We develop an NLP-based procedure for detecting systematic nonmeritorious consumer complaints, simply called systematic anomalies, among complaint narratives. While classification algorithms are used to detect pronounced anomalies, in the case of smaller and frequent systematic anomalies, the algorithms may falter due to a variety of reasons, including technical ones as well as natural limitations of human analysts. Therefore, as the next step after classification, we convert the complaint narratives into quantitative data, which are then analyzed using an algorithm for detecting systematic anomalies. We illustrate the entire procedure using complaint narratives from the Consumer Complaint Database of the Consumer Financial Protection Bureau.

Submitted 26 March, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.10602 [pdf, ps, other]

doi 10.1016/j.jnt.2024.02.007

First moment of central values of some primitive Dirichlet $L$-functions with fixed order characters

Authors: Peng Gao, Liangyi Zhao

Abstract: We evaluate asymptotically the smoothed first moment of central values of families of primitive cubic, quartic and sextic Dirichlet $L$-functions, using the method of double Dirichlet series. Quantitative non-vanishing results for these $L$-values are also proved.

Submitted 21 August, 2023; originally announced August 2023.

Comments: 11 pages. arXiv admin note: text overlap with arXiv:2306.10726

MSC Class: 11M06; 11M41; 11N37; 11L05; 11L40

Journal ref: J. Number Theory, vol. 261, 2024, pp. 125--142

arXiv:2308.07583 [pdf]

Atomic-Scale Tracking Phase Transition Dynamics of Berezinskii-Kosterlitz-Thouless Polar Vortex-Antivortex

Authors: Ruixue Zhu, Sizheng Zheng, Xiaomei Li, Tao Wang, Congbing Tan, Tiancheng Yu, Zhetong Liu, Xinqiang Wang, Jiangyu Li, Jie Wang, Peng Gao

Abstract: Particle-like topologies, such as vortex-antivortex (V-AV) pairs, have garnered significant attention in the field of condensed matter. However, the detailed phase transition dynamics of V-AV pairs, as exemplified by self-annihilation, motion, and dissociation, have yet to be verified in real space due to the lack of suitable experimental techniques. Here, we employ polar V-AV pairs as a model system and track their transition pathways at atomic resolution with the aid of in situ (scanning) transmission electron microscopy and phase field simulations. We demonstrate the absence of a Berezinskii-Kosterlitz-Thouless phase transition between the room-temperature quasi-long-range ordered ground phase and the high-temperature disordered phase. Instead, we observe polarization suppression in bound V-AV pairs as the temperature increases. Furthermore, electric fields can promote the vortex and antivortex to approach each other and annihilate near the interface. The elucidated intermediate dynamic behaviors of polar V-AV pairs under thermal and electrical fields lay the foundation for their potential applications in electronic devices. Moreover, the dynamic behaviors revealed at atomic scale provide new insights into understanding topological phases of matter and their topological phase transitions.

Submitted 15 August, 2023; originally announced August 2023.

Comments: 19 pages and 4 figures

arXiv:2308.04701 [pdf]

Direct and in situ examination of Li+ transport kinetics in isotope labelled solid electrolyte interphase

Authors: Xiaofei Yu, Stefany Angarita-Gomez, Yaobin Xu, Peiyuan Gao, Jun-Gang Wang, Xin Zhang, Hao Jia, Wu Xu, Xiaolin Li, Yingge Du, Zhijie Xu, Janet S. Ho, Kang Xu, Perla B. Balbuena, Chongmin Wang, Zihua Zhu

Abstract: Here, using unique in-situ liquid secondary ion mass spectroscopy on isotope-labelled solid-electrolyte-interphase (SEI), assisted by cryogenic transmission electron microscopy and constrained ab initio molecular dynamics simulation, for the first time we answer the question regarding Li+ transport mechanism across SEI, and quantitatively determine the Li+-mobility therein. We unequivocally unveil that Li+ transport in SEI follows a mechanism of successive displacement, rather than "direct-hopping". We further reveal, in accordance with spatial-dependence of SEI structure across the thickness, the apparent Li+ self-diffusivity varies from $6.7\times10^{-19}$ m$^2$/s to $1.0\times10^{-20}$ m$^2$/s, setting a quantitative gauging of ionic transport behavior of SEI layer against the underlying electrode as well as the rate limiting step of battery operation. This direct study on Li+ kinetics in SEI fills part of the decade-long knowledge gap about the most important component in advanced batteries and provides more precise guidelines to the tailoring of interphasial chemistries for future battery chemistries.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 25 pages, 4 figures

MSC Class: None

ACM Class: I.6.4

arXiv:2308.03729 [pdf, other]

TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models

Authors: Wenqi Shao, Meng Lei, Yutao Hu, Peng Gao, Kaipeng Zhang, Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao, Ping Luo

Abstract: Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated significant progress in tackling complex multimodal tasks. Among these cutting-edge developments, Google's Bard stands out for its remarkable multimodal capabilities, promoting comprehensive comprehension and reasoning across various domains. This work presents an early and holistic evaluation of LVLMs' multimodal abilities, with a particular focus on Bard, by proposing a lightweight variant of LVLM-eHub, named Tiny LVLM-eHub. In comparison to the vanilla version, Tiny LVLM-eHub possesses several appealing properties. Firstly, it provides a systematic assessment of six categories of multimodal capabilities, including visual perception, visual knowledge acquisition, visual reasoning, visual commonsense, object hallucination, and embodied intelligence, through quantitative evaluation of 42 standard text-related visual benchmarks. Secondly, it conducts an in-depth analysis of LVLMs' predictions using the ChatGPT Ensemble Evaluation (CEE), which leads to a robust and accurate evaluation and exhibits improved alignment with human evaluation compared to the word matching approach. Thirdly, it comprises a mere 2.1K image-text pairs, facilitating ease of use for practitioners to evaluate their own offline LVLMs. Through extensive experimental analysis, this study demonstrates that Bard outperforms previous LVLMs in most multimodal capabilities except object hallucination, to which Bard is still susceptible.
Tiny LVLM-eHub serves as a baseline evaluation for various LVLMs and encourages innovative strategies aimed at advancing multimodal techniques. Our project is publicly available at https://github.com/OpenGVLab/Multi-Modality-Arena.

Submitted 10 August, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: accepted to IEEE Transactions on Big Data. Project Page: http://lvlm-ehub.opengvlab.com/

arXiv:2308.01017 [pdf, other]

doi 10.1103/PhysRevD.108.124006

Model-independent search for the quasinormal modes of gravitational wave echoes

Authors: Di Wu, Pengyuan Gao, Jing Ren, Niayesh Afshordi

Abstract: Postmerger gravitational wave echoes provide a unique opportunity to probe the near-horizon structure of astrophysical black holes, which may be modified due to nonperturbative quantum gravity phenomena. However, since the waveform is subject to large theoretical uncertainties, it is necessary to develop search methods that are less reliant on specific models for detecting echoes from observational data. A promising strategy is to identify the characteristic quasinormal modes (QNMs) associated with echoes, in frequency space, which complements existing searches of quasiperiodic pulses in time. In this study, we build upon our previous work targeting these modes by incorporating relative phase information to optimize the Bayesian search algorithm. Using a new phase-marginalized likelihood, the performance can be significantly improved for well-resolved QNMs. This enables an efficient search for QNMs of various shapes, utilizing a simple search template that is independent of specific models. To demonstrate the robustness of the search algorithm, we construct four complementary benchmarks for the echo waveform that span a diverse range of different theoretical possibilities for the near-horizon structure. We then validate our Bayesian search algorithms by injecting the benchmark models into different realizations of Gaussian noise. Using two types of phase-marginalized likelihoods, we find that the search algorithm can efficiently detect the corresponding QNMs.
Therefore, our search strategy provides a concrete Bayesian and model-independent approach to "quantum black hole seismology."

Submitted 20 December, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

Comments: 46 pages, 19 figures, 4 tables. Python code to reproduce figures is available at the link http://github.com/hermione-evans/echomase; v2: typos fixed, matches published version in PRD

Journal ref: Phys.Rev.D 108 (2023) 12, 124006

arXiv:2307.16151 [pdf, other]

StylePrompter: All Styles Need Is Attention

Authors: Chenyi Zhuang, Pan Gao, Aljosa Smolic

Abstract: GAN inversion aims at inverting given images into corresponding latent codes for Generative Adversarial Networks (GANs), especially StyleGAN, where there exists a disentangled latent space that allows attribute-based image manipulation at latent level. As most inversion methods build upon Convolutional Neural Networks (CNNs), we transfer a hierarchical vision Transformer backbone innovatively to predict $\mathcal{W}^+$ latent codes at token level. We further apply a Style-driven Multi-scale Adaptive Refinement Transformer (SMART) in $\mathcal{F}$ space to refine the intermediate style features of the generator. By treating style features as queries to retrieve lost identity information from the encoder's feature maps, SMART can not only produce high-quality inverted images but also surprisingly adapt to editing tasks. We then prove that StylePrompter lies in a more disentangled $\mathcal{W}^+$ and show the controllability of SMART. Finally, quantitative and qualitative experiments demonstrate that StylePrompter can achieve desirable performance in balancing reconstruction quality and editability, and is "smart" enough to fit into most edits, outperforming other $\mathcal{F}$-involved inversion methods.

Submitted 30 July, 2023; originally announced July 2023.

Comments: Some figures in the appendix are compressed because of arXiv submission constraints

arXiv:2307.16144 [pdf, other]

Video Frame Interpolation with Flow Transformer

Authors: Pan Gao, Haoyue Tian, Jie Qin

Abstract: Video frame interpolation has been actively studied with the development of convolutional neural networks. However, due to the intrinsic limitations of kernel weight sharing in convolution, the interpolated frame generated by it may lose details. In contrast, the attention mechanism in Transformer can better distinguish the contribution of each pixel, and it can also capture long-range pixel dependencies, which provides great potential for video interpolation. Nevertheless, the original Transformer is commonly used for 2D images; how to develop a Transformer-based framework with consideration of temporal self-attention for video frame interpolation remains an open issue. In this paper, we propose Video Frame Interpolation Flow Transformer to incorporate motion dynamics from optical flows into the self-attention mechanism. Specifically, we design a Flow Transformer Block that calculates the temporal self-attention in a matched local area with the guidance of flow, making our framework suitable for interpolating frames with large motion while maintaining reasonably low complexity. In addition, we construct a multi-scale architecture to account for multi-scale motion, further improving the overall performance. Extensive experiments on three benchmarks demonstrate that the proposed method can generate interpolated frames with better visual quality than state-of-the-art methods.

Submitted 30 July, 2023; originally announced July 2023.

Comments: Accepted to ACM MM23

arXiv:2307.15685 [pdf, other]

Minors of matroids represented by sparse random matrices over finite fields

Authors: Pu Gao, Peter Nelson

Abstract: Consider a random $n\times m$ matrix $A$ over the finite field of order $q$ where every column has precisely $k$ nonzero elements, and let $M[A]$ be the matroid represented by $A$. In the case that $q=2$, Cooper, Frieze and Pegden (RS&A 2019) proved that given a fixed binary matroid $N$, if $k\ge k_N$ and $m/n\ge d_N$ where $k_N$ and $d_N$ are sufficiently large constants depending on $N$, then a.a.s. $M[A]$ contains $N$ as a minor. We improve their result by determining the sharp threshold (of $m/n$) for the appearance of a fixed matroid $N$ as a minor of $M[A]$, for every $k\ge 3$, and every finite field.

Submitted 18 January, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.15024 [pdf, other]

doi 10.3847/1538-3881/ace536

The Variable Detection of Atmospheric Escape around the young, Hot Neptune AU Mic b

Authors: Keighley E. Rockcliffe, Elisabeth R. Newton, Allison Youngblood, Girish M. Duvvuri, Peter Plavchan, Peter Gao, Andrew W. Mann, Patrick J. Lowrance

Abstract: Photoevaporation is a potential explanation for several features within exoplanet demographics. Atmospheric escape observed in young Neptune-sized exoplanets can provide insight into and characterize which mechanisms drive this evolution and at what times they dominate. AU Mic b is one such exoplanet, slightly larger than Neptune (4.19 Earth radii). It closely orbits a 23 Myr pre-Main Sequence M dwarf with a period of 8.46 days. We obtained two visits of AU Mic b at Lyman-alpha with HST/STIS. One flare within the first HST visit is characterized and removed from our search for a planetary transit. We present a non-detection in our first visit followed by the detection of escaping neutral hydrogen ahead of the planet in our second visit. The outflow absorbed about 30% of the star's Lyman-alpha blue-wing 2.5 hours before the planet's white-light transit. We estimate the highest velocity escaping material has a column density of $10^{13.96}$ cm$^{-2}$ and is moving 61.26 km/s away from the host star. AU Mic b's large high energy irradiation could photoionize its escaping neutral hydrogen in 44 minutes, rendering it temporarily unobservable. Our time-variable Lyman-alpha transit ahead of AU Mic b could also be explained by an intermediate stellar wind strength from AU Mic that shapes the escaping material into a leading tail. Future Lyman-alpha observations of this system will confirm and characterize the unique variable nature of its Lyman-alpha transit, which combined with modeling will tune the importance of stellar wind and photoionization.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 24 pages, 11 figures

Journal ref: The Astronomical Journal, Volume 166, Number 2, 2023

arXiv:2307.14399 [pdf, other]

doi 10.3847/2041-8213/acfee9

Probing reflection from aerosols with the near-infrared dayside spectrum of WASP-80b

Authors: Bob Jacobs, Jean-Michel Désert, Peter Gao, Caroline V. Morley, Jacob Arcangeli, Saugata Barat, Mark S. Marley, Julianne I. Moses, Jonathan J. Fortney, Jacob L. Bean, Kevin B. Stevenson, Vatsal Panwar

Abstract: The presence of aerosols is intimately linked to the global energy budget and the composition of a planet's atmosphere. Their ability to reflect incoming light prevents energy from being deposited into the atmosphere, and they shape spectra of exoplanets. We observed five near-infrared secondary eclipses of WASP-80b with the Wide Field Camera 3 (WFC3) aboard the Hubble Space Telescope to provide constraints on the presence and properties of atmospheric aerosols. We detect a broadband eclipse depth of $34\pm10$ ppm for WASP-80b. We detect a higher planetary flux than expected from thermal emission alone at $1.6\sigma$, which hints toward the presence of reflecting aerosols on this planet's dayside, indicating a geometric albedo of $A_g<0.33$ at $3\sigma$. We paired the WFC3 data with Spitzer data and explored multiple atmospheric models with and without aerosols to interpret this spectrum. Albeit consistent with a clear dayside atmosphere, we found a slight preference for near-solar metallicities and for dayside clouds over hazes. We exclude soot haze formation rates higher than $10^{-10.7}$ g cm$^{-2}$ s$^{-1}$ and tholin formation rates higher than $10^{-12.0}$ g cm$^{-2}$ s$^{-1}$ at $3\sigma$. We applied the same atmospheric models to a previously published WFC3/Spitzer transmission spectrum for this planet and found weak haze formation. A single soot haze formation rate best fits both the dayside and the transmission spectra simultaneously.
However, we emphasize that no models provide satisfactory fits in terms of the chi-square of both spectra simultaneously, indicating longitudinal dissimilarity in the atmosphere's aerosol composition.

Submitted 26 October, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: Published in ApJ Letters (20 Oct 2023)

Journal ref: ApJL 2023 Volume 956, Number 2, page L43

Showing 201–250 of 833 results for author: Gao, P