Search | arXiv e-print repository

GPT-4o System Card

Authors: OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.

Submitted 25 October, 2024; originally announced October 2024.

arXiv:2410.20312 [pdf, other]

Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model

Authors: Jing Zhang, Linjiajie Fang, Kexin Shi, Wenjia Wang, Bing-Yi Jing

Abstract: "Distribution shift" is the main obstacle to the success of offline reinforcement learning. A learning policy may take actions beyond the behavior policy's knowledge, referred to as Out-of-Distribution (OOD) actions. The Q-values for these OOD actions can be easily overestimated. As a result, the learning policy is biased by using incorrect Q-value estimates. One common approach to avoid Q-value overestimation is to make a pessimistic adjustment. Our key idea is to penalize the Q-values of OOD actions associated with high uncertainty. In this work, we propose Q-Distribution Guided Q-Learning (QDQ), which applies a pessimistic adjustment to Q-values in OOD regions based on uncertainty estimation. This uncertainty measure relies on the conditional Q-value distribution, learned through a high-fidelity and efficient consistency model. Additionally, to prevent overly conservative estimates, we introduce an uncertainty-aware optimization objective for updating the Q-value function. The proposed QDQ demonstrates solid theoretical guarantees for the accuracy of Q-value distribution learning and uncertainty measurement, as well as the performance of the learning policy. QDQ consistently shows strong performance on the D4RL benchmark and achieves significant improvements across many tasks.

Submitted 12 January, 2025; v1 submitted 26 October, 2024; originally announced October 2024.

Comments: NeurIPS 2024

arXiv:2410.19872 [pdf, other]

Radar and Camera Fusion for Object Detection and Tracking: A Comprehensive Survey

Authors: Kun Shi, Shibo He, Zhenyu Shi, Anjun Chen, Zehui Xiong, Jiming Chen, Jun Luo

Abstract: Multi-modal fusion is imperative to the implementation of reliable object detection and tracking in complex environments. Exploiting the synergy of heterogeneous modal information endows perception systems the ability to achieve more comprehensive, robust, and accurate performance. As a nucleus concern in wireless-vision collaboration, radar-camera fusion has prompted prospective research directions owing to its extensive applicability, complementarity, and compatibility. Nonetheless, there still lacks a systematic survey specifically focusing on deep fusion of radar and camera for object detection and tracking. To fill this void, we embark on an endeavor to comprehensively review radar-camera fusion in a holistic way. First, we elaborate on the fundamental principles, methodologies, and applications of radar-camera fusion perception. Next, we delve into the key techniques concerning sensor calibration, modal representation, data alignment, and fusion operation. Furthermore, we provide a detailed taxonomy covering the research topics related to object detection and tracking in the context of radar and camera technologies. Finally, we discuss the emerging perspectives in the field of radar-camera fusion perception and highlight the potential areas for future research.

Submitted 24 October, 2024; originally announced October 2024.

arXiv:2410.13271 [pdf, other]

Inductive Gradient Adjustment For Spectral Bias In Implicit Neural Representations

Authors: Kexuan Shi, Hai Chen, Leheng Zhang, Shuhang Gu

Abstract: Implicit Neural Representations (INRs), as a versatile representation paradigm, have achieved success in various computer vision tasks. Due to the spectral bias of the vanilla multi-layer perceptrons (MLPs), existing methods focus on designing MLPs with sophisticated architectures or repurposing training techniques for highly accurate INRs. In this paper, we delve into the linear dynamics model of MLPs and theoretically identify the empirical Neural Tangent Kernel (eNTK) matrix as a reliable link between spectral bias and training dynamics. Based on this insight, we propose a practical Inductive Gradient Adjustment (IGA) method, which could purposefully improve the spectral bias via inductive generalization of eNTK-based gradient transformation matrix. Theoretical and empirical analyses validate impacts of IGA on spectral bias. Further, we evaluate our method on different INRs tasks with various INR architectures and compare to existing training techniques. The superior and consistent improvements clearly validate the advantage of our IGA. Armed with our gradient adjustment method, better INRs with more enhanced texture details and sharpened edges can be learned from data by tailored impacts on spectral bias.

Submitted 25 May, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

Comments: Accepted to ICML 2025. Code available at https://github.com/LabShuHangGU/IGA-INR

arXiv:2410.13202 [pdf, other]

Anatomy of Thermally Interplayed Spin-Orbit Torque Driven Antiferromagnetic Switching

Authors: Wenlong Cai, Zanhong Chen, Yuzhang Shi, Daoqian Zhu, Guang Yang, Ao Du, Shiyang Lu, Kaihua Cao, Hongxi Liu, Kewen Shi, Weisheng Zhao

Abstract: Current-induced antiferromagnetic (AFM) switching remains critical in spintronics, yet the interplay between thermal effects and spin torques still lacks clear clarification. Here we experimentally investigate the thermally interplayed spin-orbit torque induced AFM switching in magnetic tunnel junctions via pulse-width dependent reversal and time-resolved measurements. By introducing the Langevin random field into the AFM precession equation, we establish a novel AFM switching model that anatomically explains the experimental observations. Our findings elucidate the current-induced AFM switching mechanism and offer significant promise for advancements in spintronics.

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.08282 [pdf, other]

FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction

Authors: Irving Fang, Kairui Shi, Xujin He, Siqi Tan, Yifan Wang, Hanwen Zhao, Hung-Jui Huang, Wenzhen Yuan, Chen Feng, Jing Zhang

Abstract: Humans effortlessly integrate common-sense knowledge with sensory input from vision and touch to understand their surroundings. Emulating this capability, we introduce FusionSense, a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. FusionSense addresses three key challenges: (i) How can robots efficiently acquire robust global shape information about the surrounding scene and objects? (ii) How can robots strategically select touch points on the object using geometric and common-sense priors? (iii) How can partial observations such as tactile signals improve the overall representation of the object? Our framework employs 3D Gaussian Splatting as a core representation and incorporates a hierarchical optimization strategy involving global structure construction, object visual hull pruning and local geometric constraints. This advancement results in fast and robust perception in environments with traditionally challenging objects that are transparent, reflective, or dark, enabling more downstream manipulation or navigation tasks. Experiments on real-world data suggest that our framework outperforms previously state-of-the-art sparse-view methods. All code and data are open-sourced on the project website.

Submitted 10 October, 2024; originally announced October 2024.

ACM Class: I.4.5; I.4.8

arXiv:2410.07069 [pdf, other]

ReIFE: Re-evaluating Instruction-Following Evaluation

Authors: Yixin Liu, Kejian Shi, Alexander R. Fabbri, Yilun Zhao, Peifeng Wang, Chien-Sheng Wu, Shafiq Joty, Arman Cohan

Abstract: The automatic evaluation of instruction following typically involves using large language models (LLMs) to assess response quality. However, there is a lack of comprehensive evaluation of these LLM-based evaluators across two dimensions: the base LLMs and the evaluation protocols. Therefore, we present a thorough meta-evaluation of instruction following, including 25 base LLMs and 15 recently proposed evaluation protocols, on 4 human-annotated datasets, assessing the evaluation accuracy of the LLM-evaluators. Our evaluation allows us to identify the best-performing base LLMs and evaluation protocols with a high degree of robustness. Moreover, our large-scale evaluation reveals: (1) Base LLM performance ranking remains largely consistent across evaluation protocols, with less capable LLMs showing greater improvement from protocol enhancements; (2) Robust evaluation of evaluation protocols requires many base LLMs with varying capability levels, as protocol effectiveness can depend on the base LLM used; (3) Evaluation results on different datasets are not always consistent, so a rigorous evaluation requires multiple datasets with distinctive features. We release our meta-evaluation suite ReIFE, which provides the codebase and evaluation result collection for more than 500 LLM-evaluator configurations, to support future research in instruction-following evaluation.

Submitted 9 October, 2024; originally announced October 2024.

Comments: GitHub Repo: https://github.com/yale-nlp/ReIFE, Evaluation Result Collection: https://huggingface.co/datasets/yale-nlp/ReIFE

arXiv:2410.06764 [pdf, other]

An Optimal Algorithm for the Stacker Crane Problem on Fixed Topologies

Authors: Yike Chen, Ke Shi, Chao Xu

Abstract: The Stacker Crane Problem (SCP) is a variant of the Traveling Salesman Problem. In SCP, pairs of pickup and delivery points are designated on a graph, and a crane must visit these points to move objects from each pickup location to its respective delivery point. The goal is to minimize the total distance traveled. SCP is known to be NP-hard, even on tree structures. The only positive results, in terms of polynomial-time solvability, apply to graphs that are topologically equivalent to a path or a cycle. We propose an algorithm that is optimal for each fixed topology, running in near-linear time. This is achieved by demonstrating that the problem is fixed-parameter tractable (FPT) when parameterized by both the cycle rank and the number of branch vertices.

Submitted 9 October, 2024; originally announced October 2024.

arXiv:2410.06714 [pdf]

doi 10.1093/mnras/stae2213

Implications for galaxy property estimation revealed by CO luminosity-FWHM relations in local star-forming galaxies

Authors: Yi-Han Wu, Jun-Feng Wang, Xiao-Hu Li, Xue-Jian Jiang, Chao-Wei Tsai, Jing-Wen Wu, Kun-Peng Shi, Lin Zhu, Wen-Yu Zhong

Abstract: This study explores a relationship between the CO luminosity-full width at half-maximum linewidth linear relation (i.e. the CO LFR) and mean galaxy property of the local star-forming galaxy sample in the xCOLDGASS data base, via a mathematical statement. The whole data base galaxies were separated into two subsamples based on their stellar masses and redshifts, being a help to examine the dependence issue of the CO LFR. Selecting the galaxy data with a stringent requirement was also implemented in order to assure the validity of the CO LFR. An algorithm of the linear regression was conducted with the data of the subsample. An assessment about the linear correlation manifested a valid CO LFR occurs in the selected galaxy of the subsample, and the intercept of the CO LFR may be related with the mean galaxy properties such as the molecular gas fraction and galaxy size. For the finding on the intercept of the CO LFR, we aligned that intercept with those galaxy properties via the involvement of a $ψ$ parameter. On evaluating the $ψ$ value with our local star-forming galaxy sample, we numerically determined a relationship between the statistical result and the galaxy property in a different stellar mass range. It also shows a possible method on estimating galaxy property.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 10 pages, 3 tables, 2 figures; we sincerely appreciate the suggestion of the referee and the acceptance of the MNRAS. Y-HW is greatly grateful to all the co-authors for their works on his articles

arXiv:2409.04851 [pdf, other]

AdaptiveFusion: Adaptive Multi-Modal Multi-View Fusion for 3D Human Body Reconstruction

Authors: Anjun Chen, Xiangyu Wang, Zhi Xu, Kun Shi, Yan Qin, Yuchi Huo, Jiming Chen, Qi Ye

Abstract: Recent advancements in sensor technology and deep learning have led to significant progress in 3D human body reconstruction. However, most existing approaches rely on data from a specific sensor, which can be unreliable due to the inherent limitations of individual sensing modalities. Additionally, existing multi-modal fusion methods generally require customized designs based on the specific sensor combinations or setups, which limits the flexibility and generality of these methods. Furthermore, conventional point-image projection-based and Transformer-based fusion networks are susceptible to the influence of noisy modalities and sensor poses. To address these limitations and achieve robust 3D human body reconstruction in various conditions, we propose AdaptiveFusion, a generic adaptive multi-modal multi-view fusion framework that can effectively incorporate arbitrary combinations of uncalibrated sensor inputs. By treating different modalities from various viewpoints as equal tokens, and our handcrafted modality sampling module by leveraging the inherent flexibility of Transformer models, AdaptiveFusion is able to cope with arbitrary numbers of inputs and accommodate noisy modalities with only a single training network. Extensive experiments on large-scale human datasets demonstrate the effectiveness of AdaptiveFusion in achieving high-quality 3D human body reconstruction in various environments. In addition, our method achieves superior accuracy compared to state-of-the-art fusion methods.

Submitted 13 March, 2025; v1 submitted 7 September, 2024; originally announced September 2024.

Comments: TMM 2025, Project Page: https://chen3110.github.io/adaptivefusion/index.html

arXiv:2409.03635 [pdf, ps, other]

On the Relativistic Zero Knowledge Quantum Proofs of Knowledge

Authors: Kaiyan Shi, Kaushik Chakraborty, Wen Yu Kon, Omar Amer, Marco Pistoia, Charles Lim

Abstract: We initiate the study of relativistic zero-knowledge quantum proof of knowledge systems with classical communication, formally defining a number of useful concepts and constructing appropriate knowledge extractors for all the existing protocols in the relativistic setting which satisfy a weaker variant of the special soundness property due to Unruh (EUROCRYPT 2012). We show that there exist quantum proofs of knowledge with knowledge error 1/2 + negl(η) for all relations in NP via a construction of such a system for the Hamiltonian cycle relation using a general relativistic commitment scheme exhibiting the fairly-binding property due to Fehr and Fillinger (EUROCRYPT 2016). We further show that one can construct quantum proof of knowledge extractors for proof systems which do not exhibit special soundness, and therefore require an extractor to rewind multiple times. We develop a new multi-prover quantum rewinding technique by combining ideas from monogamy of entanglement and gentle measurement lemmas that can break the quantum rewinding barrier. Finally, we prove a new bound on the impact of consecutive measurements and use it to significantly improve the soundness bound of some existing relativistic zero knowledge proof systems, such as the one due to Chailloux and Leverrier (EUROCRYPT 2017).

Submitted 17 December, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

Comments: 38 pages

arXiv:2408.04820 [pdf, other]

doi 10.1145/3696630.3728541

Natural Language Outlines for Code: Literate Programming in the LLM Era

Authors: Kensen Shi, Deniz Altınbüken, Saswat Anand, Mihai Christodorescu, Katja Grünwedel, Alexa Koenings, Sai Naidu, Anurag Pathak, Marc Rasi, Fredde Ribeiro, Brandon Ruffin, Siddhant Sanyam, Maxim Tabachnyk, Sara Toth, Roy Tu, Tobias Welp, Pengcheng Yin, Manzil Zaheer, Satish Chandra, Charles Sutton

Abstract: We propose using natural language outlines as a novel modality and interaction surface for providing AI assistance to developers throughout the software development process. An NL outline for a code function comprises multiple statements written in concise prose, which partition the code and summarize its main ideas in the style of literate programming. Crucially, we find that modern LLMs can generate accurate and high-quality NL outlines in practice. Moreover, NL outlines enable a bidirectional sync between code and NL, where a developer can change either code or NL and have the LLM automatically update the other. We discuss many use cases for NL outlines: they can accelerate understanding and navigation of code and diffs, simplify code maintenance, augment code search, steer code generation, and more. We then propose and compare multiple LLM prompting techniques for generating outlines and ask professional developers to judge outline quality. Finally, we present two case studies applying NL outlines toward code review and malware detection.

Submitted 17 April, 2025; v1 submitted 8 August, 2024; originally announced August 2024.

Comments: Accepted to FSE'25 Industry Track

arXiv:2408.04201 [pdf, ps, other]

doi 10.1016/j.nuclphysb.2024.116777

Exact solution of a quantum integrable system associated with the $G_2$ exceptional Lie algebra

Authors: Guang-Liang Li, Junpeng Cao, Pei Sun, Wen-Li Yang, Kangjie Shi, Yupeng Wang

Abstract: A quantum integrable spin chain model associated with the $G_2$ exceptional Lie algebra is studied. By using the fusion technique, the closed recursive relations among the fused transfer matrices are obtained. These identities allow us to derive the exact energy spectrum and Bethe ansatz equations of the system based on polynomial analysis. The present method provides a unified treatment to investigate the Bethe ansatz solutions for both periodic and non-diagonal open boundary conditions associated with exceptional Lie algebras.

Submitted 16 December, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

Comments: Some numerical checks for small site numbers are added; 36 pages

Journal ref: Nucl. Phys. B 1010 (2025), 116777

arXiv:2407.16396 [pdf, other]

Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

Authors: Wenyuan Zhang, Kanle Shi, Yu-Shen Liu, Zhizhong Han

Abstract: Unsigned distance functions (UDFs) have been a vital representation for open surfaces. With different differentiable renderers, current methods are able to train neural networks to infer a UDF by minimizing the rendering errors on the UDF to the multi-view ground truth. However, these differentiable renderers are mainly handcrafted, which makes them either biased on ray-surface intersections, or sensitive to unsigned distance outliers, or not scalable to large scale scenes. To resolve these issues, we present a novel differentiable renderer to infer UDFs more accurately. Instead of using handcrafted equations, our differentiable renderer is a neural network which is pre-trained in a data-driven manner. It learns how to render unsigned distances into depth images, leading to a prior knowledge, dubbed volume rendering priors. To infer a UDF for an unseen scene from multiple RGB images, we generalize the learned volume rendering priors to map inferred unsigned distances in alpha blending for RGB image rendering. Our results show that the learned volume rendering priors are unbiased, robust, scalable, 3D aware, and more importantly, easy to learn. We evaluate our method on both widely used benchmarks and real scenes, and report superior performance over the state-of-the-art methods.

Submitted 23 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024. Project page: https://wen-yuan-zhang.github.io/VolumeRenderingPriors/

arXiv:2407.11789 [pdf, other]

Large Language Models as Misleading Assistants in Conversation

Authors: Betty Li Hou, Kejian Shi, Jason Phang, James Aung, Steven Adler, Rosie Campbell

Abstract: Large Language Models (LLMs) are able to provide assistance on a wide range of information-seeking tasks. However, model outputs may be misleading, whether unintentionally or in cases of intentional deception. We investigate the ability of LLMs to be deceptive in the context of providing assistance on a reading comprehension task, using LLMs as proxies for human users. We compare outcomes of (1) when the model is prompted to provide truthful assistance, (2) when it is prompted to be subtly misleading, and (3) when it is prompted to argue for an incorrect answer. Our experiments show that GPT-4 can effectively mislead both GPT-3.5-Turbo and GPT-4, with deceptive assistants resulting in up to a 23% drop in accuracy on the task compared to when a truthful assistant is used. We also find that providing the user model with additional context from the passage partially mitigates the influence of the deceptive model. This work highlights the ability of LLMs to produce misleading information and the effects this may have in real-world situations.

Submitted 16 July, 2024; originally announced July 2024.

Comments: Next Generation of AI Safety Workshop, 41st International Conference on Machine Learning (ICML 2024)

arXiv:2406.19545 [pdf, other]

Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations

Authors: Ritam Dutt, Zhen Wu, Kelly Shi, Divyanshu Sheth, Prakhar Gupta, Carolyn Penstein Rose

Abstract: We present a generalizable classification approach that leverages Large Language Models (LLMs) to facilitate the detection of implicitly encoded social meaning in conversations. We design a multi-faceted prompt to extract a textual explanation of the reasoning that connects visible cues to underlying social meanings. These extracted explanations or rationales serve as augmentations to the conversational text to facilitate dialogue understanding and transfer. Our empirical results over 2,340 experimental settings demonstrate the significant positive impact of adding these rationales. Our findings hold true for in-domain classification, zero-shot, and few-shot domain transfer for two different social meaning detection tasks, each spanning two different corpora.

Submitted 27 June, 2024; originally announced June 2024.

Comments: To appear at The Proceedings of the Association for Computational Linguistics, 2024

arXiv:2406.16263 [pdf, ps, other]

Discrete-time Integral Resonant Control of Negative Imaginary Systems: Application to a High-speed Nanopositioner

Authors: Kanghong Shi, Erfan Khodabakhshi, Prosanto Biswas, Ian R. Petersen, S. O. Reza Moheimani

Abstract: We propose a discrete-time integral resonant control (IRC) approach for negative imaginary (NI) systems, which overcomes several limitations of continuous-time IRC. We show that a discrete-time IRC has a step-advanced negative imaginary property. A zero-order hold-sampled NI system can be asymptotically stabilized using a discrete-time IRC with suitable parameters. A hardware experiment is conducted where a high-speed flexure-guided nanopositioner is efficiently damped using the proposed discrete-time IRC with the discrete-time controller being implemented in FPGA hardware at the sampling rate of 1.25 MHz.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 10 pages, 10 figures

arXiv:2406.13179 [pdf, other]

Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting

Authors: Shuai Wang, Dehao Zhang, Kexin Shi, Yuchen Wang, Wenjie Wei, Jibin Wu, Malu Zhang

Abstract: Thanks to Deep Neural Networks (DNNs), the accuracy of Keyword Spotting (KWS) has made substantial progress. However, as KWS systems are usually implemented on edge devices, energy efficiency becomes a critical requirement besides performance. Here, we take advantage of spiking neural networks' energy efficiency and propose an end-to-end lightweight KWS model. The model consists of two innovative modules: 1) Global-Local Spiking Convolution (GLSC) module and 2) Bottleneck-PLIF module. Compared to the hand-crafted feature extraction methods, the GLSC module achieves speech feature extraction that is sparser, more energy-efficient, and yields better performance. The Bottleneck-PLIF module further processes the signals from GLSC with the aim to achieve higher accuracy with fewer parameters. Extensive experiments are conducted on the Google Speech Commands Dataset (V1 and V2). The results show our method achieves competitive performance among SNN-based KWS models with fewer parameters.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.07835 [pdf, other]

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

Authors: David Wadden, Kejian Shi, Jacob Morrison, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan

Abstract: We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks covering five essential scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification. SciRIFF demonstrations are notable for their long input contexts, detailed task specifications, and complex structured outputs. While instruction-following resources are available in specific domains such as clinical medicine and chemistry, SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields. To demonstrate the utility of SciRIFF, we develop a sample-efficient strategy to adapt a general instruction-following model for science by performing additional finetuning on a mix of general-domain and SciRIFF demonstrations. In evaluations on nine held-out scientific tasks, our model -- called SciTulu -- improves over a strong LLM baseline by 28.1% and 6.5% at the 7B and 70B scales respectively, while maintaining general instruction-following performance within 2% of the baseline. We are optimistic that SciRIFF will facilitate the development and evaluation of LLMs to help researchers navigate the ever-growing body of scientific literature. We release our dataset, model checkpoints, and data processing and evaluation code to enable further research.

Submitted 19 August, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: Submitted to NeurIPS Datasets and Benchmarks 2024

arXiv:2406.01643 [pdf, other]

Unified Control of Voltage, Frequency and Angle in Electrical Power Systems: A Passivity and Negative-Imaginary based Approach

Authors: Yijun Chen, Kanghong Shi, Ian R. Petersen, Elizabeth L. Ratnam

Abstract: This paper proposes a unified methodology for voltage regulation, frequency synchronization, and rotor angle control in power transmission systems considering a one-axis generator model with time-varying voltages. First, we formulate an output consensus problem with a passivity and negative-imaginary (NI) based control framework. We establish output consensus results for both networked passive systems and networked NI systems. Next, we apply the output consensus problem by controlling large-scale batteries co-located with synchronous generators -- using real-time voltage phasor measurements. By controlling the battery storage systems so as to dispatch real and reactive power, we enable simultaneous control of voltage, frequency, and power angle differences across a transmission network. Validation through numerical simulations on a four-area transmission network confirms the robustness of our unified control framework.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 8 pages, 7 figures, the 63rd IEEE Conference on Decision and Control. arXiv admin note: text overlap with arXiv:2406.01206

arXiv:2406.01206 [pdf, other]

On the Stability of Networked Nonlinear Negative Imaginary Systems with Applications to Electrical Power Systems

Authors: Yijun Chen, Kanghong Shi, Ian R. Petersen, Elizabeth L. Ratnam

Abstract: In the transition to achieving net zero emissions, it has been suggested that a substantial expansion of electric power grids will be necessary to support emerging renewable energy zones. In this paper, we propose employing battery-based feedback control and nonlinear negative imaginary (NI) systems theory to reduce the need for such expansion. By formulating a novel Luré-Postnikov-like Lyapunov function, stability results are presented for the feedback interconnection of two single nonlinear NI systems, while output feedback consensus results are established for the feedback interconnection of two networked nonlinear NI systems based on a network topology. This theoretical framework underpins our design of battery-based control in power transmission systems. We demonstrate that the power grid can be gradually transitioned into the proposed NI systems, one transmission line at a time.

Submitted 11 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: 8 pages, 2 figures, 26th International Symposium on Mathematical Theory of Networks and Systems

arXiv:2405.20215 [pdf, other]

TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models

Authors: Chen Zhang, Chengguang Tang, Dading Chong, Ke Shi, Guohua Tang, Feng Jiang, Haizhou Li

Abstract: Mainstream approaches to aligning large language models (LLMs) heavily rely on human preference data, particularly when models require periodic updates. The standard process for iterative alignment of LLMs involves collecting new human feedback for each update. However, the data collection process is costly and challenging to scale. To address this issue, we introduce the "TS-Align" framework, which fine-tunes a policy model using pairwise feedback data automatically mined from its outputs. This automatic mining process is efficiently accomplished through the collaboration between a large-scale teacher model and a small-scale student model. The policy fine-tuning process can be iteratively repeated using on-policy generations within our proposed teacher-student collaborative framework. Through extensive experiments, we demonstrate that our final aligned policy outperforms the base policy model with an average win rate of 69.7% across seven conversational or instruction-following datasets. Furthermore, we show that the ranking capability of the teacher is effectively distilled into the student through our pipeline, resulting in a small-scale yet effective reward model for policy model alignment.

Submitted 29 September, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: EMNLP-2024 Findings

arXiv:2405.19299 [pdf, other]

Expert-Guided Extinction of Toxic Tokens for Debiased Generation

Authors: Xueyao Sun, Kaize Shi, Haoran Tang, Guandong Xu, Qing Li

Abstract: Large language models (LLMs) can elicit social bias during generation, especially during inference with toxic prompts. Controlling the sensitive attributes in generation encounters challenges in data distribution, generalizability, and efficiency. Specifically, fine-tuning and retrieval demand an extensive unbiased corpus, while direct prompting requires meticulously curated instructions for correcting the output in multiple rounds of thoughts but poses challenges on memory and inference latency. In this work, we propose the Expert-Guided Extinction of Toxic Tokens for Debiased Generation (EXPOSED) to eliminate the undesired harmful outputs for LLMs without the aforementioned requirements. EXPOSED constructs a debiasing expert based on the abundant toxic corpus to expose and elicit the potentially dangerous tokens. It then processes the output to the LLMs and constructs a fair distribution by suppressing and attenuating the toxic tokens. EXPOSED is evaluated on fairness benchmarks over three LLM families. Extensive experiments demonstrate that compared with other baselines, the proposed EXPOSED significantly reduces the potential social bias while balancing fairness and generation performance.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18067 [pdf, ps, other]

Ekeland-Hofer-Zehnder capacities of lagrangian products with special forms

Authors: Kun Shi

Abstract: In this paper, we give some estimations for Ekeland-Hofer-Zehnder capacities of lagrangian products with special forms through combinatorial formulas. Based on these estimations, we give some interesting corollaries.

Submitted 5 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: 6 pages

MSC Class: 53D05; 53C23 (primary); 70H05; 57R17 (secondary)

arXiv:2405.17659 [pdf, other]

Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba

Authors: Jiahao Huang, Liutao Yang, Fanwen Wang, Yang Nan, Weiwen Wu, Chengyan Wang, Kuangyu Shi, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb, Daoqiang Zhang, Guang Yang

Abstract: Deep learning has been extensively applied in medical image reconstruction, where Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) represent the predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging Mamba has shown superiority in learning visual representation, which combines the advantages of linear scalability and global sensitivity. In this study, we introduce MambaMIR, an Arbitrary-Masked Mamba-based model with wavelet decomposition for joint medical image reconstruction and uncertainty estimation. A novel Arbitrary Scan Masking (ASM) mechanism "masks out" redundant information to introduce randomness for further uncertainty estimation. Compared to the commonly used Monte Carlo (MC) dropout, our proposed MC-ASM provides an uncertainty map without the need for hyperparameter tuning and mitigates the performance drop typically observed when applying dropout to low-level tasks. For further texture preservation and better perceptual quality, we employ the wavelet transformation into MambaMIR and explore its variant based on the Generative Adversarial Network, namely MambaMIR-GAN.
Comprehensive experiments have been conducted for multiple representative medical image reconstruction tasks, demonstrating that the proposed MambaMIR and MambaMIR-GAN outperform other baseline and state-of-the-art methods in different reconstruction tasks, where MambaMIR achieves the best reconstruction fidelity and MambaMIR-GAN has the best perceptual quality. In addition, our MC-ASM provides uncertainty maps as an additional tool for clinicians, while mitigating the typical performance drop caused by the commonly used dropout.

Submitted 25 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.12996 [pdf, ps, other]

Dose-aware Diffusion Model for 3D PET Image Denoising: Multi-institutional Validation with Reader Study and Real Low-dose Data

Authors: Huidong Xie, Weijie Gan, Reimund Bayerlein, Bo Zhou, Ming-Kai Chen, Michal Kulon, Annemarie Boustani, Kuan-Yin Ko, Der-Shiun Wang, Benjamin A. Spencer, Wei Ji, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, Yinchi Zhou, Hui Liu, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Ge Wang , et al. (2 additional authors not shown)

Abstract: Reducing scan times, radiation dose, and enhancing image quality for lower-performance scanners, are critical in low-dose PET imaging. Deep learning techniques have been investigated for PET image denoising. However, existing models have often resulted in compromised image quality when achieving low-count/low-dose PET and have limited generalizability to different image noise-levels, acquisition protocols, and patient populations. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for medical imaging tasks. However, for low-dose PET imaging, existing diffusion models failed to generate consistent 3D reconstructions, unable to generalize across varying noise-levels, often produced visually-appealing but distorted image details, and produced images with biased tracer uptake. Here, we develop DDPET-3D, a dose-aware diffusion model for 3D low-dose PET imaging to address these challenges. Collected from 4 medical centers globally with different scanners and clinical protocols, we evaluated the proposed model using a total of 9,783 18F-FDG studies with low-dose levels ranging from 1% to 50%. With a cross-center, cross-scanner validation, the proposed DDPET-3D demonstrated its potential to generalize to different low-dose levels, different scanners, and different clinical protocols.
As confirmed with reader studies performed by board-certified nuclear medicine physicians, experienced readers judged the images to be similar or superior to the full-dose images and previous DL baselines based on qualitative visual impression. Lesion-level quantitative accuracy was evaluated using a Monte Carlo simulation study and a lesion segmentation network. The presented results show the potential to achieve low-dose PET while maintaining image quality. Real low-dose scans were also included for evaluation.

Submitted 16 June, 2025; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 18 Pages, 16 Figures, 5 Tables. Paper under review. First-place Freek J. Beekman Young Investigator Award at SNMMI 2024. Code available after paper publication. arXiv admin note: substantial text overlap with arXiv:2311.04248

arXiv:2405.03085 [pdf, other]

Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation

Authors: Kaize Shi, Xueyao Sun, Qing Li, Guandong Xu

Abstract: Large Language Models (LLMs) have made significant strides in information acquisition. However, their overreliance on potentially flawed parametric knowledge leads to hallucinations and inaccuracies, particularly when handling long-tail, domain-specific queries. Retrieval Augmented Generation (RAG) addresses this limitation by incorporating external, non-parametric knowledge. Nevertheless, the retrieved long-context documents often contain noisy, irrelevant information alongside vital knowledge, negatively diluting LLMs' attention. Inspired by the supportive role of essential concepts in individuals' reading comprehension, we propose a novel concept-based RAG framework with the Abstract Meaning Representation (AMR)-based concept distillation algorithm. The proposed algorithm compresses the cluttered raw retrieved documents into a compact set of crucial concepts distilled from the informative nodes of AMR by referring to reliable linguistic features. The concepts explicitly constrain LLMs to focus solely on vital information in the inference process. We conduct extensive experiments on open-domain question-answering datasets to empirically evaluate the proposed method's effectiveness. The results indicate that the concept-based RAG framework outperforms other baseline methods, particularly as the number of supporting documents increases, while also exhibiting robustness across various backbone LLMs. This emphasizes that the distilled concepts are informative for augmenting the RAG process by filtering out interference information.
To the best of our knowledge, this is the first work introducing AMR to enhance the RAG, presenting a potential solution to augment inference performance with semantic-based context compression.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.01109 [pdf, other]

Hypergraph $p$-Laplacian regularization on point clouds for data interpolation

Authors: Kehan Shi, Martin Burger

Abstract: As a generalization of graphs, hypergraphs are widely used to model higher-order relations in data. This paper explores the benefit of the hypergraph structure for the interpolation of point cloud data that contain no explicit structural information. We define the $\varepsilon_n$-ball hypergraph and the $k_n$-nearest neighbor hypergraph on a point cloud and study the $p$-Laplacian regularization on the hypergraphs. We prove the variational consistency between the hypergraph $p$-Laplacian regularization and the continuum $p$-Laplacian regularization in a semisupervised setting when the number of points $n$ goes to infinity while the number of labeled points remains fixed. A key improvement compared to the graph case is that the results rely on weaker assumptions on the upper bound of $\varepsilon_n$ and $k_n$. To solve the convex but non-differentiable large-scale optimization problem, we utilize the stochastic primal-dual hybrid gradient algorithm. Numerical experiments on data interpolation verify that the hypergraph $p$-Laplacian regularization outperforms the graph $p$-Laplacian regularization in preventing the development of spikes at the labeled points.

Submitted 17 March, 2025; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 34 pages

MSC Class: 49J55; 35J20; 65N12

arXiv:2404.19689 [pdf, ps, other]

Continuum limit of $p$-biharmonic equations on graphs

Authors: Kehan Shi, Martin Burger

Abstract: This paper studies the $p$-biharmonic equation on graphs, which arises in point cloud processing and can be interpreted as a natural extension of the graph $p$-Laplacian from the perspective of hypergraph. The asymptotic behavior of the solution is investigated when the random geometric graph is considered and the number of data points goes to infinity. We show that the continuum limit is an appropriately weighted $p$-biharmonic equation with homogeneous Neumann boundary conditions. The result relies on the uniform $L^p$ estimates for solutions and gradients of nonlocal and graph Poisson equations. The $L^\infty$ estimates of solutions are also obtained as a byproduct.

Submitted 25 April, 2025; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: 21 pages

MSC Class: 35R02; 35J30; 65N12

arXiv:2404.17994 [pdf]

LeqMod: Adaptable Lesion-Quantification-Consistent Modulation for Deep Learning Low-Count PET Image Denoising

Authors: Menghua Xia, Huidong Xie, Qiong Liu, Bo Zhou, Hanzhong Wang, Biao Li, Axel Rominger, Quanzheng Li, Ramsey D. Badawi, Kuangyu Shi, Georges El Fakhri, Chi Liu

Abstract: Deep learning-based positron emission tomography (PET) image denoising offers the potential to reduce radiation exposure and scanning time by transforming low-count images into high-count equivalents. However, existing methods typically blur crucial details, leading to inaccurate lesion quantification. This paper proposes a lesion-perceived and quantification-consistent modulation (LeqMod) strategy for enhanced PET image denoising, via employing downstream lesion quantification analysis as auxiliary tools. The LeqMod is a plug-and-play design adaptable to a wide range of model architectures, modulating the sampling and optimization procedures of model training without adding any computational burden to the inference phase. Specifically, the LeqMod consists of two components, the lesion-perceived modulation (LeMod) and the multiscale quantification-consistent modulation (QuMod). The LeMod enhances lesion contrast and visibility by allocating higher sampling weights and stricter loss criteria to lesion-present samples determined by an auxiliary segmentation network than lesion-absent ones. The QuMod further emphasizes quantification accuracy for both the mean and maximum standardized uptake value (SUVmean and SUVmax) across multiscale sub-regions throughout the entire image, thereby reducing biases of denoised results relative to high-count references. Experiments conducted on large PET datasets from multiple centers and vendors, and varying noise levels demonstrated the LeqMod efficacy across various denoising frameworks.
Compared to frameworks without LeqMod, the integration of LeqMod reduces the lesion SUVmax bias by 5.92% on average and increases the peak signal-to-noise ratio (PSNR) by 0.36 on average, when denoising images across participating sites.

Submitted 4 March, 2025; v1 submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.14662 [pdf, other]

NExT: Teaching Large Language Models to Reason about Code Execution

Authors: Ansong Ni, Miltiadis Allamanis, Arman Cohan, Yinlin Deng, Kensen Shi, Charles Sutton, Pengcheng Yin

Abstract: A fundamental skill among human developers is the ability to understand and reason about program execution. As an example, a programmer can mentally simulate code execution in natural language to debug and repair code (a.k.a. rubber duck debugging). However, large language models (LLMs) of code are typically trained on the surface textual form of programs, thus may lack a semantic understanding of how programs execute at run-time. To address this issue, we propose NExT, a method to teach LLMs to inspect the execution traces of programs (variable states of executed lines) and reason about their run-time behavior through chain-of-thought (CoT) rationales. Specifically, NExT uses self-training to bootstrap a synthetic training set of execution-aware rationales that lead to correct task solutions (e.g., fixed programs) without laborious manual annotation. Experiments on program repair tasks based on MBPP and HumanEval demonstrate that NExT improves the fix rate of a PaLM 2 model by 26.1% and 14.3% absolute, respectively, with significantly improved rationale quality as verified by automated metrics and human raters. Our model can also generalize to scenarios where program traces are absent at test-time.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 35 pages

arXiv:2404.06851 [pdf, other]

UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion

Authors: Junsheng Zhou, Weiqi Zhang, Baorui Ma, Kanle Shi, Yu-Shen Liu, Zhizhong Han

Abstract: Diffusion models have shown remarkable results for image generation, editing and inpainting. Recent works explore diffusion models for 3D shape generation with neural implicit functions, i.e., signed distance function and occupancy function. However, they are limited to shapes with closed surfaces, which prevents them from generating diverse 3D real-world contents containing open surfaces. In this work, we present UDiFF, a 3D diffusion model for unsigned distance fields (UDFs) which is capable of generating textured 3D shapes with open surfaces from text conditions or unconditionally. Our key idea is to generate UDFs in spatial-frequency domain with an optimal wavelet transformation, which produces a compact representation space for UDF generation. Specifically, instead of selecting an appropriate wavelet transformation which requires expensive manual efforts and still leads to large information loss, we propose a data-driven approach to learn the optimal wavelet transformation for UDFs. We evaluate UDiFF to show our advantages by numerical and visual comparisons with the latest methods on widely used benchmarks. Page: https://weiqi-zhang.github.io/UDiFF.

Submitted 10 April, 2024; originally announced April 2024.

Comments: To appear at CVPR2024. Project page: https://weiqi-zhang.github.io/UDiFF

arXiv:2404.05528 [pdf, other]

NAND-like SOT-MRAM-based Approximate Storage for Error-Tolerant Applications

Authors: Min Wang, Zhengyi Hou, Chenyi Wang, Zhengjie Yan, Shixing Li, Ao Du, Wenlong Cai, Jinhao Li, Hongchao Zhang, Kaihua Cao, Kewen Shi, Bi Wang, Yuanfu Zhao, Qingyi Xiang, Zhaohao Wang, Weisheng Zhao

Abstract: We demonstrate approximate storage based on NAND-like spin-orbit torque (SOT) MRAM, through "device-modeling-architecture" explorations. We experimentally achieve down to 1E-5 level selectivity. Selectivity and low-power solutions are established by numerical calculation workflow. System-level power consumption is evaluated in the 512 KB last-level cache according to 5 quality levels. Error-tolerant applications, such as image processing, alleviate the demand for selectivity down to the 5E-2 level, leading to 54% ~ 61% energy-saving. Our proposal paves the novel and suitable path for high-density and low-power NAND-like SOT-MRAM.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.19276 [pdf, ps, other]

Enhanced Bayesian Personalized Ranking for Robust Hard Negative Sampling in Recommender Systems

Authors: Kexin Shi, Jing Zhang, Linjiajie Fang, Wenjia Wang, Bingyi Jing

Abstract: In implicit collaborative filtering, hard negative mining techniques are developed to accelerate and enhance the recommendation model learning. However, the inadvertent selection of false negatives remains a major concern in hard negative sampling, as these false negatives can provide incorrect information and mislead the model learning. To date, only a small number of studies have been committed to solving the false negative problem, primarily focusing on designing sophisticated sampling algorithms to filter false negatives. In contrast, this paper shifts its focus to refining the loss function. We find that the original Bayesian Personalized Ranking (BPR), initially designed for uniform negative sampling, is inadequate in adapting to hard sampling scenarios. Hence, we introduce an enhanced Bayesian Personalized Ranking objective, named Hard-BPR, which is specifically crafted for dynamic hard negative sampling to mitigate the influence of false negatives. This method is simple yet efficient for real-world deployment. Extensive experiments conducted on three real-world datasets demonstrate the effectiveness and robustness of our approach, along with the enhanced ability to distinguish false negatives.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 9 pages

arXiv:2403.16046 [pdf, ps, other]

Digital control of negative imaginary systems: a discrete-time hybrid integrator-gain system approach

Authors: Kanghong Shi, Ian R. Petersen

Abstract: A hybrid integrator-gain system (HIGS) is a control element that switches between an integrator and a gain, which overcomes some inherent limitations of linear controllers. In this paper, we consider using discrete-time HIGS controllers for the digital control of negative imaginary (NI) systems. We show that the discrete-time HIGS themselves are step-advanced negative imaginary systems. For a minimal linear NI system, there always exists a HIGS controller that can asymptotically stabilize it. An illustrative example is provided, where we use the proposed HIGS control method to stabilize a discrete-time mass-spring system.

Submitted 24 March, 2024; originally announced March 2024.

Comments: To appear in the 2024 European Control Conference. 7 pages, 3 figures

arXiv:2403.15140 [pdf, ps, other]

Hybrid integrator-gain system based integral resonant controllers for negative imaginary systems

Authors: Kanghong Shi, Ian R. Petersen

Abstract: We introduce a hybrid control system called a hybrid integrator-gain system (HIGS) based integral resonant controller (IRC) to stabilize negative imaginary (NI) systems. A HIGS-based IRC has a similar structure to an IRC, with the integrator replaced by a HIGS. We show that a HIGS-based IRC is an NI system. Also, for a SISO NI system with a minimal realization, we show there exists a HIGS-based IRC such that their closed-loop interconnection is asymptotically stable. Also, we propose a proportional-integral-double-integral resonant controller and a HIGS-based proportional-integral-double-integral resonant controller, and we show that both of them can be applied to asymptotically stabilize an NI system. An example is provided to illustrate the proposed results.

Submitted 9 September, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: 9 pages, 9 figures. The 63rd IEEE Conference on Decision and Control (CDC 2024)

arXiv:2403.05769 [pdf]

doi 10.1103/PhysRevMaterials.7.035201

High-rectification near-field radiative thermal diode using Weyl semimetals

Authors: Yang Hu, Haotuo Liu, Bing Yang, Kezhang Shi, Mauro Antezza, Xiaohu Wu, Yasong Sun

Abstract: Thermal diodes, which allow heat transfer in a preferential direction while blocking it in the reverse direction, have numerous applications in thermal management, information processing, energy harvesting, etc. Typical materials of thermal diodes in previous works include phase-change and magneto-optical materials. However, such thermal diodes highly depend on specific working temperatures or external magnetic fields. In this work, we propose a near-field radiative thermal diode (NFRTD) based on two Weyl semimetal (WSM) nanoparticles (NPs) mediated by a WSM planar substrate, which works without an external magnetic field and with flexible temperatures. Numerical results show that the maximum rectification ratio of the NFRTD can be up to 2673 when the emitter is at 200 K and the receiver is at 180 K, which exceeds the maximum value reported in previous works by more than 10 times. The underlying physical mechanism is the strong coupling of the localized plasmon modes in the NPs and nonreciprocal surface plasmon polaritons in the substrate. In addition, we calculate the distribution of the Green function and reflection coefficient to investigate nonreciprocal energy transfer in the NFRTD. Finally, we discuss the effects of momentum-separation on the rectification performance of the NFRTD. This work demonstrates the great potential of WSMs in thermal rectification and paves a novel path in designing high-performance NFRTDs.

Submitted 8 March, 2024; originally announced March 2024.

Journal ref: Phys. Rev. Materials 7, 035201 (2023)

arXiv:2403.03346 [pdf, other]

Enhancing Vision-Language Pre-training with Rich Supervisions

Authors: Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto

Abstract: We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering. Using web screenshots unlocks a treasure trove of visual and textual cues that are not present when using image-text pairs. In S4, we leverage the inherent tree-structured hierarchy of HTML elements and the spatial localization to carefully design 10 pre-training tasks with large-scale annotated data. These tasks resemble downstream tasks across different domains and the annotations are cheap to obtain. We demonstrate that, compared to current screenshot pre-training objectives, our innovative pre-training method significantly enhances the performance of an image-to-text model in nine varied and popular downstream tasks - up to 76.1% improvements on Table Detection, and at least 1% on Widget Captioning.

Submitted 12 March, 2025; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.02249 [pdf, other]

Non-autoregressive Sequence-to-Sequence Vision-Language Models

Authors: Kunyu Shi, Qi Dong, Luis Goncalves, Zhuowen Tu, Stefano Soatto

Abstract: Sequence-to-sequence vision-language models are showing promise, but their applicability is limited by their inference latency due to their autoregressive way of generating predictions. We propose a parallel decoding sequence-to-sequence vision-language model, trained with a Query-CTC loss, that marginalizes over multiple inference paths in the decoder. This allows us to model the joint distribution of tokens, rather than restricting to the conditional distribution as in an autoregressive model. The resulting model, NARVL, achieves performance on par with its state-of-the-art autoregressive counterpart, but is faster at inference time, reducing from the linear complexity associated with the sequential generation of tokens to a paradigm of constant time joint inference.

Submitted 12 March, 2025; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2402.15134 [pdf, other]

Deep Coupling Network For Multivariate Time Series Forecasting

Authors: Kun Yi, Qi Zhang, Hui He, Kaize Shi, Liang Hu, Ning An, Zhendong Niu

Abstract: Multivariate time series (MTS) forecasting is crucial in many real-world applications. To achieve accurate MTS forecasting, it is essential to simultaneously consider both intra- and inter-series relationships among time series data. However, previous work has typically modeled intra- and inter-series relationships separately and has disregarded multi-order interactions present within and between time series data, which can seriously degrade forecasting accuracy. In this paper, we reexamine intra- and inter-series relationships from the perspective of mutual information and accordingly construct a comprehensive relationship learning mechanism tailored to simultaneously capture the intricate multi-order intra- and inter-series couplings. Based on the mechanism, we propose a novel deep coupling network for MTS forecasting, named DeepCN, which consists of a coupling mechanism dedicated to explicitly exploring the multi-order intra- and inter-series relationships among time series data concurrently, a coupled variable representation module aimed at encoding diverse variable patterns, and an inference module facilitating predictions through one forward step. Extensive experiments conducted on seven real-world datasets demonstrate that our proposed DeepCN achieves superior performance compared with the state-of-the-art baselines.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.12692 [pdf, other]

FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning

Authors: Xiao Li, Bolin Zhu, Kaiwen Shi, Sichen Liu, Yin Zhu, Yiwei Liu, Gong Cheng

Abstract: The application of formulas (e.g., physics formulas) is a fundamental ability of humans when solving numerical reasoning problems. Existing numerical reasoning datasets seldom explicitly indicate the formulas employed in reasoning, as their questions rely on implicit commonsense mathematical knowledge. In contrast, in this paper, we introduce FormulaReasoning, a new dataset specifically designed for formula-based numerical reasoning. Each of the 4,751 questions in our dataset requires numerical calculation with external physics formulas, making it a more challenging benchmark for evaluating large language models (LLMs). We offer normalized fine-grained annotations for the questions, available in English and Chinese, including formula structures, parameter names, symbols, numerical values, and units, derived from extensive manual effort with LLM assistance for guaranteed quality. We also provide a consolidated formula database to serve as an external knowledge base accompanying the dataset. We employ FormulaReasoning to evaluate LLMs with 7B to over 100B parameters, and explore retrieval-augmented generation with the formula database. Our evaluation also covers supervised methods that break down the reasoning process into formula generation, parameter extraction, and numerical calculation, as well as direct preference optimization methods based on derived preference data.

Submitted 18 May, 2025; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11558 [pdf, other]

A Temporally Disentangled Contrastive Diffusion Model for Spatiotemporal Imputation

Authors: Yakun Chen, Kaize Shi, Zhangkai Wu, Juan Chen, Xianzhi Wang, Julian McAuley, Guandong Xu, Shui Yu

Abstract: Spatiotemporal data analysis is pivotal across various domains, such as transportation, meteorology, and healthcare. The data collected in real-world scenarios are often incomplete due to device malfunctions and network errors. Spatiotemporal imputation aims to predict missing values by exploiting the spatial and temporal dependencies in the observed data. Traditional imputation approaches based on statistical and machine learning techniques require the data to conform to their distributional assumptions, while graph and recurrent neural networks are prone to error accumulation problems due to their recurrent structures. Generative models, especially diffusion models, can potentially circumvent the reliance on inaccurate, previously imputed values for future predictions; however, diffusion models still face challenges in generating stable results. We propose to address these challenges by designing conditional information to guide the generative process and expedite the training process. We introduce a conditional diffusion framework called C$^2$TSD, which incorporates disentangled temporal (trend and seasonality) representations as conditional information and employs contrastive learning to improve generalizability. Our extensive experiments on three real-world datasets demonstrate the superior performance of our approach compared to a number of state-of-the-art baselines.

Submitted 22 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.08073 [pdf, other]

Grounding Data Science Code Generation with Input-Output Specifications

Authors: Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski, Swarat Chaudhuri, Alex Polozov

Abstract: Large language models (LLMs) have recently demonstrated a remarkable ability to generate code from natural language (NL) prompts. However, in the real world, NL is often too ambiguous to capture the true intent behind programming problems, requiring additional input-output (I/O) specifications. Unfortunately, LLMs can have difficulty aligning their outputs with both the NL prompt and the I/O specification. In this paper, we give a way to mitigate this issue in the context of data science programming, where tasks require explicit I/O specifications for clarity. Specifically, we propose GIFT4Code, a novel approach for the instruction fine-tuning of LLMs with respect to I/O specifications. Our method leverages synthetic data produced by the LLM itself and utilizes execution-derived feedback as a key learning signal. This feedback, in the form of program I/O specifications, is provided to the LLM to facilitate instruction fine-tuning. We evaluated our approach on two challenging data science benchmarks, Arcade and DS-1000. The results demonstrate a significant improvement in the LLM's ability to generate code that is not only executable but also accurately aligned with user specifications, substantially improving the quality of code generation for complex data science tasks.

Submitted 14 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.12606 [pdf]

doi 10.1063/5.0032602

The Young-Laplace equation for a solid-liquid interface

Authors: P. Montero de Hijes, K. Shi, E. G. Noya, E. E. Santiso, K. E. Gubbins, E. Sanz, C. Vega

Abstract: The application of the Young-Laplace equation to a solid-liquid interface is considered. Computer simulations show that the pressure inside a solid cluster of hard spheres is smaller than the external pressure of the liquid (both for small and large clusters). That would suggest a negative value for the interfacial free energy. We show that in a Gibbsian description of the thermodynamics of a curved solid-liquid interface in equilibrium, the choice of the thermodynamic (rather than mechanical) pressure is required, as suggested by Tolman for the liquid-gas scenario. With this definition, the interfacial free energy is positive, and the values obtained are in excellent agreement with previous results from nucleation studies. Although for a curved fluid-fluid interface there is no distinction between mechanical and thermal pressures (for a sufficiently large inner phase), in the solid-liquid they do not coincide, as hypothesized by Gibbs.

Submitted 23 January, 2024; originally announced January 2024.

Journal ref: J. Chem. Phys. 153, 191102 (2020)

arXiv:2401.08224 [pdf, other]

Privacy Preserving Adaptive Experiment Design

Authors: Jiachun Li, Kaining Shi, David Simchi-Levi

Abstract: Adaptive experiments are widely adopted to estimate the conditional average treatment effect (CATE) in clinical trials and many other scenarios. While the primary goal of an experiment is to maximize estimation accuracy, due to the imperative of social welfare, it's also crucial to provide treatment with superior outcomes to patients, which is measured by regret in the contextual bandit framework. These two objectives often lead to contrasting optimal allocation mechanisms. Furthermore, privacy concerns arise in clinical scenarios containing sensitive data like patients' health records. Therefore, it's essential for the treatment allocation mechanism to incorporate robust privacy protection measures. In this paper, we investigate the tradeoff between loss of social welfare and statistical power in contextual bandit experiments. We propose a matched upper and lower bound for the multi-objective optimization problem, and then adopt the concept of Pareto optimality to mathematically characterize the optimality condition. Furthermore, we propose differentially private algorithms which still match the lower bound, showing that privacy is "almost free". Additionally, we derive the asymptotic normality of the estimator, which is essential in statistical inference and hypothesis testing.

Submitted 5 February, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Add a table

arXiv:2401.07402 [pdf, other]

Improved Implicit Neural Representation with Fourier Reparameterized Training

Authors: Kexuan Shi, Xingyu Zhou, Shuhang Gu

Abstract: Implicit Neural Representation (INR), as a powerful representation paradigm, has achieved success in various computer vision tasks recently. Due to the low-frequency bias issue of the vanilla multi-layer perceptron (MLP), existing methods have investigated advanced techniques, such as positional encoding and periodic activation functions, to improve the accuracy of INR. In this paper, we connect the network training bias with the reparameterization technique and theoretically prove that weight reparameterization could provide us a chance to alleviate the spectral bias of MLP. Based on our theoretical analysis, we propose a Fourier reparameterization method which learns a coefficient matrix of fixed Fourier bases to compose the weights of MLP. We evaluate the proposed Fourier reparameterization method on different INR tasks with various MLP architectures, including vanilla MLP, MLP with positional encoding, and MLP with advanced activation functions. The superior approximation results on different MLP architectures clearly validate the advantage of our proposed method. Armed with our Fourier reparameterization method, better INR with more textures and fewer artifacts can be learned from the training data.

Submitted 4 July, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

Comments: CVPR 2024

arXiv:2401.06827 [pdf, other]

APLe: Token-Wise Adaptive for Multi-Modal Prompt Learning

Authors: Guiming Cao, Kaize Shi, Hong Fu, Huaiwen Zhang, Guandong Xu

Abstract: Pre-trained Vision-Language (V-L) models set the benchmark for generalization to downstream tasks among the noteworthy contenders. Many characteristics of the V-L model have been explored in existing research, including the challenge of the sensitivity to text input and the tuning process across multi-modal prompts. With the advanced utilization of V-L models like CLIP, recent approaches deploy learnable prompts instead of hand-crafted prompts to boost the generalization performance and address the aforementioned challenges. Inspired by layer-wise training, which is widely used in image fusion, we note that using a sequential training process to adapt the different modality branches of CLIP efficiently facilitates the improvement of generalization. In the context of addressing the multi-modal prompting challenge, we propose Token-wise Adaptive for Multi-modal Prompt Learning (APLe) for tuning the prompts of both modalities, vision and language, as tokens in a sequential manner. APLe addresses the challenges in V-L models to promote prompt learning across both modalities, which indicates a competitive generalization performance in line with the state-of-the-art. Notably, APLe shows robustness and favourable performance in prompt-length experiments with an absolute advantage in adopting the V-L models.

Submitted 23 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: 7 pages, 3 figures

arXiv:2401.01250 [pdf, other]

doi 10.1103/PhysRevA.109.013324

Floquet topological phases with large winding number

Authors: Kaiye Shi, Xiang Zhang, Wei Zhang

Abstract: Recently, anomalous Floquet topological phases without static counterparts have been observed in different systems, where periodically driven models are realized to support a winding number of 1 and a pair of edge modes in each quasienergy gap. Here, we focus on cold atomic gases in optical lattices and propose a novel driving scheme that breaks rotation symmetry but maintains inversion symmetry of the instantaneous Hamiltonian, and discover a novel type of anomalous Floquet topological phase with winding number larger than 1. By analyzing the condition of band touching under symmetry constraint, we map out the phase diagram exactly by varying the driving parameters and discuss the quasienergy spectra of typical topological phases, which can present multiple pairs of edge modes within a single gap. Finally, we suggest characterizing the topology of such phases by detecting the band inversion surfaces via quench dynamics.

Submitted 2 January, 2024; originally announced January 2024.

Comments: 7 pages, 5 figures

Journal ref: Phys. Rev. A 109, 013324 (2024)

arXiv:2401.00999 [pdf, ps, other]

Possible Meissner effect near room temperature in copper-substituted lead apatite

Authors: Hongyang Wang, Yao Yao, Ke Shi, Yijing Zhao, Hao Wu, Zhixing Wu, Zhihui Geng, Shufeng Ye, Ning Chen

Abstract: With copper-substituted lead apatite below room temperature, we observe diamagnetic dc magnetization under a magnetic field of 25 Oe with remarkable bifurcation between zero-field-cooling and field-cooling measurements, and under 200 Oe it changes to paramagnetism. A glassy memory effect is found during cooling. Typical hysteresis loops for superconductors are detected below 250 K, along with an asymmetry between forward and backward sweeps of the magnetic field. Our experiment suggests that the Meissner effect is possibly present in this material at room temperature.

Submitted 1 January, 2024; originally announced January 2024.

Comments: 7 pages, 4 figures

arXiv:2401.00646 [pdf, ps, other]

doi 10.1103/PhysRevB.108.214108

High magnetic field phase diagram and weak FM breaking in (Ni0.93Co0.07)3V2O8

Authors: Jiating Wu, Minjie Zhang, Ke Shi, Huxin Yin, Yuyan Han, Lansheng Ling, Wei Tong, Chuanying Xi, Li Pi, Zhaosheng Wang

Abstract: We present magnetostriction and thermal expansion measurements on multiferroic (Ni0.93Co0.07)3V2O8. The high field phase diagrams up to 33 T along the a, b and c directions are built. For H//a, as the magnetic field increases, two intermediate phases appear between the incommensurate phase and the paramagnetic phase at about 7 K, and then a magnetically induced phase appears above the paramagnetic phase. For H//b, thermal expansion measurement indicates a mutation in the spin lattice coupling of the high field phases. The interlaced phase boundary suggests a mixed state in the optical high field phase. For H//c, an intermediate phase between the commensurate phase and the incommensurate phase is detected. A nonlinear boundary between the intermediate phase and the low temperature incommensurate phase, and a clear boundary between the commensurate phase and the paramagnetic phase are found. These results indicate that doping Co2+ breaks the weak ferromagnetic moment of the commensurate phase, which exists in the parent compound Ni3V2O8 and (Ni0.9Co0.1)3V2O8. This nonlinear influence reflects complicated spin modulation in Ni3V2O8 by doping Co2+.

Submitted 31 December, 2023; originally announced January 2024.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. B 108, 214108 (2023)

Showing 51–100 of 361 results for author: Shi, K