-
POST: Photonic Swin Transformer for Automated and Efficient Prediction of PCSEL
Authors:
Qi Xin,
Hai Huang,
Chenyu Li,
Kewei Shi,
Zhaoyu Zhang
Abstract:
This work designs a model named POST based on the Vision Transformer (ViT) approach. Across single, double, and even triple lattices, as well as various non-circular complex hole structures, POST enables prediction of multiple optical properties of photonic crystal layers in Photonic Crystal Surface Emitting Lasers (PCSELs) with high speed and accuracy, without requiring manual intervention, and so serves as a comprehensive surrogate for the optical field simulation. In the predictions of Quality Factor (Q) and Surface-emitting Efficiency (SE) for PCSEL, the R-squared values reach 0.909 and 0.779, respectively. Additionally, it achieves nearly 5,000 predictions per second, significantly lowering simulation costs. The precision and speed of POST predictions lay a solid foundation for future ultra-complex model parameter tuning involving dozens of parameters. It can also swiftly meet designers' ad-hoc requirements for evaluating photonic crystal properties. The database used for training the POST model is derived from predictions of different photonic crystal structures using the Coupled-Wave Theory (CWT) model. This dataset will be made publicly available to foster interdisciplinary research advancements in materials science and computer science.
Submitted 1 July, 2025;
originally announced July 2025.
-
Unsupervised deep learning model for fast energy layer pre-selection of delivery-efficient proton arc therapy plan optimization of nasopharyngeal carcinoma
Authors:
Bohan Yang,
Gang Liu,
Rirao Dao,
Yujia Qian,
Ke Shi,
Anke Tang,
Yong Luo,
Jingnan Liu
Abstract:
Objective. Proton arc therapy (PAT) is an emerging and promising modality in radiotherapy, offering several advantages over conventional intensity-modulated proton therapy (IMPT). However, identifying the optimal energy layer (EL) sequence remains computationally intensive due to the large number of possible energy layer transitions. This study proposes an unsupervised deep learning framework for fast and effective EL pre-selection, aiming to minimize energy layer switch time while preserving high plan quality. Approach. We introduce a novel data representation method, spot-count representation, which encodes the number of proton spots intersecting the target and organs at risk (OARs) in a matrix structured by sorted gantry angles and energy layers. This representation is the input of a UNet-based architecture, SPArcdl, which is trained to optimize a tri-objective function: maximizing target coverage, minimizing OAR exposure, and reducing energy switching time. The model is evaluated on 54 nasopharyngeal cancer cases, and its performance is benchmarked against plans generated by SPArc particle swarm. Main results. SPArcdl produces EL pre-selection that significantly improves both plan quality and delivery efficiency. Compared to SPArc particle swarm, it enhances the conformity index by 0.16 (p < 0.01), reduces the homogeneity index by 0.71 (p < 0.01), shortens the energy switching time by 38.4% (p < 0.01), and lowers the mean dose to brainstem by 0.21 (p < 0.01). The results incidentally reveal that employing unchanged ELS is more time-efficient than descended ELS. SPArcdl's inference time is within 1 second. Significance. SPArcdl is a fast and effective tool for generating high-quality PAT plans by strategically pre-selecting energy layers to reduce delivery time while maintaining excellent dosimetric performance.
Submitted 18 June, 2025;
originally announced June 2025.
-
First Positronium Lifetime Imaging with Scandium-44 on a Long Axial Field-of-view PET/CT
Authors:
Lorenzo Mercolli,
William M. Steinberger,
Pascal V. Grundler,
Anzhelika Moiseeva,
Saverio Braccini,
Maurizio Conti,
Paweł Moskal,
Narendra Rathod,
Axel Rominger,
Hasan Sari,
Roger Schibli,
Robert Seifert,
Kuangyu Shi,
Ewa Ł. Stępień,
Nicholas P. van der Meulen
Abstract:
Purpose: 44Sc has been successfully produced, synthesized, labeled and first-in-human studies were conducted some years ago. The decay properties of 44Sc, together with being close to a clinical implementation, make it an ideal candidate for in vivo positronium lifetime measurements. In this study, we investigate the count statistics for ortho-positronium (oPs) measurements with 44Sc.
Method: A NEMA image quality phantom was filled with 41.7 MBq of 44Sc dissolved in water and scanned on a commercial long-axial field-of-view PET/CT. Three-photon events were identified using a prototype feature of the scanner and dedicated software. The lifetime of oPs was determined in the phantom spheres and in 4x4x4 mm^3 voxels.
Results: All measured oPs lifetimes are compatible, within the uncertainties, with the literature values for water. The oPs lifetime is 2.65±0.50, 1.39±0.20 and 1.76±0.18 ns in the three smallest spheres of the phantom and 1.79±0.57 ns for a single voxel in the central region of the largest sphere. The relative standard deviation in the background regions of the time difference distributions, i.e., for time differences smaller than -2.7 ns, is above 20%, even for voxels inside the phantom spheres.
Conclusions: Despite the favorable physical properties of 44Sc, the count statistics of three-photon events remain a challenge. The high prompt-photon energy causes a significant amount of random three-photon coincidences with the given methodology and, therefore, increases the statistical uncertainties on the measured oPs lifetime.
Submitted 17 June, 2025; v1 submitted 16 June, 2025;
originally announced June 2025.
-
Multimodal Large Language Models-Enabled UAV Swarm: Towards Efficient and Intelligent Autonomous Aerial Systems
Authors:
Yuqi Ping,
Tianhao Liang,
Huahao Ding,
Guangyu Lei,
Junwei Wu,
Xuan Zou,
Kuan Shi,
Rui Shao,
Chiya Zhang,
Weizheng Zhang,
Weijie Yuan,
Tingting Zhang
Abstract:
Recent breakthroughs in multimodal large language models (MLLMs) have endowed AI systems with unified perception, reasoning and natural-language interaction across text, image and video streams. Meanwhile, Unmanned Aerial Vehicle (UAV) swarms are increasingly deployed in dynamic, safety-critical missions that demand rapid situational understanding and autonomous adaptation. This paper explores potential solutions for integrating MLLMs with UAV swarms to enhance intelligence and adaptability across diverse tasks. Specifically, we first outline the fundamental architectures and functions of UAVs and MLLMs. Then, we analyze how MLLMs can enhance UAV system performance in terms of target detection, autonomous navigation, and multi-agent coordination, while exploring solutions for integrating MLLMs into UAV systems. Next, we propose a practical case study focused on forest firefighting. To fully reveal the capabilities of the proposed framework, human-machine interaction, swarm task planning, fire assessment, and task execution are investigated. Finally, we discuss the challenges and future research directions for the MLLMs-enabled UAV swarm. A video illustrating the experiment can be found online at https://youtu.be/zwnB9ZSa5A4.
Submitted 14 June, 2025;
originally announced June 2025.
-
Towards Biosignals-Free Autonomous Prosthetic Hand Control via Imitation Learning
Authors:
Kaijie Shi,
Wanglong Lu,
Hanli Zhao,
Vinicius Prado da Fonseca,
Ting Zou,
Xianta Jiang
Abstract:
Limb loss affects millions globally, impairing physical function and reducing quality of life. Most traditional surface electromyographic (sEMG) and semi-autonomous methods require users to generate myoelectric signals for each control, imposing physically and mentally taxing demands. This study aims to develop a fully autonomous control system that enables a prosthetic hand to automatically grasp and release objects of various shapes using only a camera attached to the wrist. When the hand is placed near an object, the system automatically executes grasping actions with a proper grip force in response to the hand's movements and the environment. To release the grasped object, the user simply places it close to the table and the system automatically opens the hand. Such a system would provide individuals with limb loss with a very easy-to-use prosthetic control interface and greatly reduce mental effort during use. To achieve this goal, we developed a teleoperation system to collect human demonstration data for training the prosthetic hand control model using imitation learning, which mimics the prosthetic hand actions demonstrated by a human. Through training the model using only a few objects' data from one single participant, we have shown that the imitation learning algorithm can achieve high success rates, generalizing to more individuals and unseen objects with a variation of weights. The demonstrations are available at https://sites.google.com/view/autonomous-prosthetic-hand
Submitted 10 June, 2025;
originally announced June 2025.
-
First positronium imaging using $^{44}$Sc with the J-PET scanner: a case study on the NEMA-Image Quality phantom
Authors:
Manish Das,
Sushil Sharma,
Aleksander Bilewicz,
Jarosław Choiński,
Neha Chug,
Catalina Curceanu,
Eryk Czerwiński,
Jakub Hajduga,
Sharareh Jalali,
Krzysztof Kacprzak,
Tevfik Kaplanoglu,
Łukasz Kapłon,
Kamila Kasperska,
Aleksander Khreptak,
Grzegorz Korcyl,
Tomasz Kozik,
Karol Kubat,
Deepak Kumar,
Anoop Kunimmal Venadan,
Edward Lisowski,
Filip Lisowski,
Justyna Medrala-Sowa,
Simbarashe Moyo,
Wiktor Mryka,
Szymon Niedźwiecki
, et al. (19 additional authors not shown)
Abstract:
Positronium Lifetime Imaging (PLI), an emerging extension of conventional positron emission tomography (PET) imaging, offers a novel window for probing the submolecular properties of biological tissues by imaging the mean lifetime of the positronium atom. Currently, the method is under rapid development in terms of reconstruction and detection systems. Recently, the first in vivo PLI of the human brain was performed using the J-PET scanner utilizing the $^{68}$Ga isotope. However, this isotope has limitations due to its comparatively low prompt gamma yield, which is crucial for positronium lifetime measurement. Among alternative radionuclides, $^{44}$Sc stands out as a promising isotope for PLI, characterized by a clinically suitable half-life (4.04 hours) and the emission of a 1157 keV prompt gamma in 100% of cases after the emission of the positron. This study reports the first experimental demonstration of PLI with $^{44}$Sc, carried out on a NEMA-Image Quality (IQ) phantom using the Modular J-PET tomograph, the first plastic-scintillator-based PET scanner.
Submitted 8 June, 2025;
originally announced June 2025.
-
UAQFact: Evaluating Factual Knowledge Utilization of LLMs on Unanswerable Questions
Authors:
Chuanyuan Tan,
Wenbiao Shao,
Hao Xiong,
Tong Zhu,
Zhenhua Liu,
Kai Shi,
Wenliang Chen
Abstract:
Handling unanswerable questions (UAQ) is crucial for LLMs, as it helps prevent misleading responses in complex situations. While previous studies have built several datasets to assess LLMs' performance on UAQ, these datasets lack factual knowledge support, which limits the evaluation of LLMs' ability to utilize their factual knowledge when handling UAQ. To address this limitation, we introduce a new unanswerable question dataset, UAQFact, a bilingual dataset with auxiliary factual knowledge created from a Knowledge Graph. Based on UAQFact, we further define two new tasks to measure LLMs' ability to utilize internal and external factual knowledge, respectively. Our experimental results across multiple LLM series show that UAQFact presents significant challenges, as LLMs do not consistently perform well even when they have factual knowledge stored. Additionally, we find that incorporating external knowledge may enhance performance, but LLMs still cannot make full use of the knowledge, which may result in incorrect responses.
Submitted 29 May, 2025;
originally announced May 2025.
-
Perfect Matchings on Doubly Free Boundary Rail-Yard Graph with Macdonald Weights
Authors:
Zhongyang Li,
Kaili Shi
Abstract:
We investigate the asymptotic behavior of perfect matchings on rail-yard graphs with doubly free boundary conditions and Jack weights. While a special case of this model reduces to the half space Macdonald process with Jack weights introduced by Barraquand, Borodin, and Corwin [3], the asymptotic behavior in the general Jack-weighted free boundary setting considered here has, to our knowledge, remained open in the literature; perhaps due to the absence of determinantal structure and the analytic complexity of boundary interactions that distinguish this setting from previously tractable cases. Our analysis is inspired by the asymptotic framework developed around the Negut operator by Gorin, Zhang, and Ahn, but it is adapted in new directions to address the challenges posed by the fully free boundary Jack-weighted regime. In particular, we establish novel identities for Macdonald polynomials and analyze infinite-product expansions not previously studied in this context. These tools enable us to rigorously establish the existence of a limit shape and to prove that the height fluctuations converge to the Gaussian Free Field (GFF) in the liquid region. These results, to the best of our knowledge, provide the first rigorous limit shape and fluctuation analysis in Jack-weighted tiling models with general free boundary conditions. In doing so, we expand the asymptotic theory of symmetric-function-deformed models beyond previously accessible, determinantal frameworks.
Submitted 25 May, 2025; v1 submitted 23 May, 2025;
originally announced May 2025.
-
ReviewInstruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models
Authors:
Jiangxu Wu,
Cong Wang,
TianHuang Su,
Jun Yang,
Haozhi Lin,
Chao Zhang,
Ming Peng,
Kai Shi,
SongPan Yang,
BinQing Pan,
ZiXian Li,
Ni Yang,
ZhenYu Yang
Abstract:
The effectiveness of large language models (LLMs) in conversational AI is hindered by their reliance on single-turn supervised fine-tuning (SFT) data, which limits contextual coherence in multi-turn dialogues. Existing methods for generating multi-turn dialogue data struggle to ensure both diversity and quality in instructions. To address this, we propose Review-Instruct, a novel framework that synthesizes multi-turn conversations through an iterative "Ask-Respond-Review" process involving three agent roles: a Candidate, multiple Reviewers, and a Chairman. The framework iteratively refines instructions by incorporating Reviewer feedback, enhancing dialogue diversity and difficulty. We construct a multi-turn dataset using the Alpaca dataset and fine-tune the LLaMA2-13B model. Evaluations on MT-Bench, MMLU-Pro, and Auto-Arena demonstrate significant improvements, achieving absolute gains of 2.9% on MMLU-Pro and 2% on MT-Bench compared to prior state-of-the-art models based on LLaMA2-13B. Ablation studies confirm the critical role of the Review stage and the use of multiple Reviewers in boosting instruction diversity and difficulty. Our work highlights the potential of review-driven, multi-agent frameworks for generating high-quality conversational data at scale.
Submitted 4 July, 2025; v1 submitted 16 May, 2025;
originally announced May 2025.
-
Nonlinearity Modulation of Auto-oscillations in Three-terminal Magnetic Tunnel Junctions
Authors:
Zixi Wang,
Wenlong Cai,
Ao Du,
Zanhong Chen,
Lei Zhou,
Shiyang Lu,
Kewen Shi,
Weisheng Zhao
Abstract:
Spin torque nano-oscillators (STNOs) hold encouraging promise for nanoscale microwave generators, modulators, and new types of intelligent computing. The nonlinearity, describing the current-induced tunability of oscillating frequency, is a distinctive feature of STNOs, which plays important roles in efficient manipulation of microwave frequencies, rapid spectrum analysis, and the design of neuromorphic devices. However, experimental research on its efficient modulation remains limited. Here, we comprehensively studied the impact of several factors on nonlinearity in nanoscale three-terminal MTJ-STNOs, including the external magnetic field, the thickness of the CoFeB free layer, and the combination of spin-transfer torque (STT) and spin-orbit torque (SOT). Among these factors, nonlinearity can be significantly tuned by the direction of the magnetic field as well as the thickness of the CoFeB free layer. Notably, it reaches zero in 1.1 nm CoFeB, where the oscillation frequency is not affected by the drive current. This property provides a more intrinsic and robust approach to achieving zero nonlinearity in STNOs, which is advantageous for high-quality microwave generators. More importantly, we found that nonlinearity can also be electrically modulated by both STT and SOT currents, and developed a refined model that accounts for the additional contribution of the SOT current to explain the mechanism. This electrical approach is more convenient, energy-efficient, and well-suited for miniaturization. Our findings offer a comprehensive understanding and open up a new dimension for the current tunability of nonlinearity in MTJ-STNOs, benefiting further optimization in nanoscale STNO-based microwave generators and neuromorphic computing devices.
Submitted 10 May, 2025;
originally announced May 2025.
-
Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind
Authors:
Mouad Abrini,
Omri Abend,
Dina Acklin,
Henny Admoni,
Gregor Aichinger,
Nitay Alon,
Zahra Ashktorab,
Ashish Atreja,
Moises Auron,
Alexander Aufreiter,
Raghav Awasthi,
Soumya Banerjee,
Joe M. Barnby,
Rhea Basappa,
Severin Bergsmann,
Djallel Bouneffouf,
Patrick Callaghan,
Marc Cavazza,
Thierry Chaminade,
Sonia Chernova,
Mohamed Chetouan,
Moumita Choudhury,
Axel Cleeremans,
Jacek B. Cywinski,
Fabio Cuzzolin
, et al. (83 additional authors not shown)
Abstract:
This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind held at AAAI 2025 in Philadelphia US on 3rd March 2025. The purpose of this volume is to provide an open access and curated anthology for the ToM and AI research community.
Submitted 28 April, 2025;
originally announced May 2025.
-
Study on impact mechanism and precursor information induced by high intensity mining
Authors:
Kaiwen Shi,
Wenhao Shi,
Shankun Zhao,
Hongfei Duan,
Yuwei Li,
Haojie Xue,
Xueyi Shang,
Wengang Dang,
Peng Li,
Yunfei Zhang,
Binghuo Guan,
Xiang Ma,
Hongke Gao
Abstract:
With heightened mining intensity, the incidence of coal bursts is escalating, necessitating advanced understanding and prediction techniques. This research delves into the intricacies of coal burst mechanisms, proposing a novel theoretical model for the release of coal mass energy founded on the tenets of stress superposition. A significant revelation is that the energy culminating in a coal burst is an amalgamation of intrinsic coal strain energy and perturbations from mining activities. Field investigations scrutinize the microseismic parameters across a spectrum of mining velocities, discerning potential failure regions and precursor hallmarks in high-intensity mining environments. Notably, microseismic energy, in such contexts, experiences an augmentation of approximately 2000 J. Numerical simulations executed via 3DEC elucidate stress distribution patterns and failure modalities of adjacent rock structures in relation to mining velocities. The simulations underscore that an uptick in mining speed diminishes the buffer to high-pressure abutments, intensifying inherent pressures. For mitigation, it's advocated that high-intensity mining advances be capped at 11 m/d. Merging theoretical analysis, experimental data, field assessments, and computational simulations, this study proffers a holistic insight into coal burst dynamics, underscoring its value in refining monitoring and early warning protocols in the domain.
Submitted 28 April, 2025;
originally announced April 2025.
-
Negative Imaginary Neural ODEs: Learning to Control Mechanical Systems with Stability Guarantees
Authors:
Kanghong Shi,
Ruigang Wang,
Ian R. Manchester
Abstract:
We propose a neural control method to provide guaranteed stabilization for mechanical systems using a novel negative imaginary neural ordinary differential equation (NINODE) controller. Specifically, we employ neural networks with desired properties as state-space function matrices within a Hamiltonian framework to ensure the system possesses the NI property. This NINODE system can serve as a controller that asymptotically stabilizes an NI plant under certain conditions. For mechanical plants with colocated force actuators and position sensors, we demonstrate that all the conditions required for stability can be translated into regularity constraints on the neural networks used in the controller. We illustrate the utility, effectiveness, and stability guarantees of the NINODE controller through an example involving a nonlinear mass-spring system.
Submitted 28 April, 2025;
originally announced April 2025.
-
FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement
Authors:
Kangbiao Shi,
Yixu Feng,
Tao Hu,
Yu Cao,
Peng Wu,
Yijin Liang,
Yanning Zhang,
Qingsen Yan
Abstract:
The advent of Deep Neural Networks (DNNs) has driven remarkable progress in low-light image enhancement (LLIE), with diverse architectures (e.g., CNNs and Transformers) and color spaces (e.g., sRGB, HSV, HVI) yielding impressive results. Recent efforts have sought to leverage the complementary strengths of these paradigms, offering promising solutions to enhance performance across varying degradation scenarios. However, existing fusion strategies are hindered by challenges such as parameter explosion, optimization instability, and feature misalignment, limiting further improvements. To overcome these issues, we introduce FusionNet, a novel multi-model linear fusion framework that operates in parallel to effectively capture global and local features across diverse color spaces. By incorporating a linear fusion strategy underpinned by Hilbert space theoretical guarantees, FusionNet mitigates network collapse and reduces excessive training costs. Our method achieved 1st place in the CVPR2025 NTIRE Low Light Enhancement Challenge. Extensive experiments conducted on synthetic and real-world benchmark datasets demonstrate that the proposed method significantly outperforms state-of-the-art methods in terms of both quantitative and qualitative results, delivering robust enhancement under diverse low-light conditions.
Submitted 27 April, 2025;
originally announced April 2025.
-
Narrowband Imaging of a z=3.24 Protocluster: Insights from [O III] Emitting Galaxies
Authors:
Ke Shi,
Jun Toshikawa,
XianZhong Zheng,
Zheng Cai,
DongDong Shi
Abstract:
We present narrowband imaging of a spectroscopically confirmed protocluster "D4UD01" at z=3.24 using CFHT/WIRCam. We identify a sample of 24 [O III] emission line galaxies in the field, which form a large overdensity in the protocluster region. The protocluster is expected to evolve into a Virgo-like cluster by z=0. Utilizing multiwavelength data, we derive the physical properties of these [O III] emitters and find they are medium-mass normal star-forming galaxies ($\sim10^{10}$M$_\odot$) roughly following the star-forming main sequence. The [O III] emitters trace an overdensity spatially offset from that of photometric-redshift and quiescent galaxies, suggesting these distinct galaxy populations may inhabit dark matter halos that formed at different epochs. A comparative analysis of [O III] emitter properties shows similar characteristics in both protocluster and field environments. This protocluster likely represents an evolved structure that has progressed beyond its peak star-formation phase, although our limited sample size may prevent detection of subtle environmental effects.
Submitted 16 April, 2025;
originally announced April 2025.
-
PlugSelect: Pruning Channels with Plug-and-Play Flexibility for Electroencephalography-based Brain Computer Interface
Authors:
Xue Yuan,
Keren Shi,
Ning Jiang,
Jiayuan He
Abstract:
Automatic minimization and optimization of the number of electrodes is essential for the practical application of electroencephalography (EEG)-based brain computer interface (BCI). Previous methods typically require additional training costs or rely on prior knowledge assumptions. This study proposed a novel channel pruning model, plug-and-select (PlugSelect), applicable across a broad range of BCI paradigms with no additional training cost and plug-and-play functionality. It integrates gradients along the input path to globally infer the causal relationships between input channels and outputs, and ranks the contribution sequences to identify the most highly attributed channels. The results showed that for three BCI paradigms, i.e., auditory attention decoding (AAD), motor imagery (MI), and affective computation (AC), PlugSelect could reduce the number of channels by at least half while effectively maintaining decoding performance and improving efficiency. The outcome benefits the design of wearable EEG-based devices, facilitating the practical application of BCI technology.
Submitted 11 April, 2025;
originally announced April 2025.
-
Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
Authors:
Jiaqi Deng,
Kaize Shi,
Zonghan Wu,
Huan Huo,
Dingxian Wang,
Guandong Xu
Abstract:
Knowledge-based Vision Question Answering (KB-VQA) systems address complex visual-grounded questions with knowledge retrieved from external knowledge bases. Knowledge retrieval and answer generation both necessitate precise multimodal understanding of question context and external knowledge. However, existing methods treat these two stages as separate modules with limited interaction during training, which hinders bi-directional parametric knowledge sharing, ultimately leading to suboptimal performance. To fully exploit the cross-task synergy in KB-VQA, we propose a unified retrieval-augmented VQA framework with collaborative parametric knowledge calibration. The proposed framework can effectively adapt general multimodal pre-trained models for fine-grained, knowledge-intensive tasks while enabling the retriever and generator to collaboratively enhance and share their parametric knowledge during both training and inference. To enhance fine-grained understanding of questions and external documents, we also integrate a late interaction mechanism into the proposed training framework. Additionally, we introduce a reflective-answering mechanism that allows the model to explicitly evaluate and refine its knowledge boundary. Our approach achieves competitive performance against state-of-the-art models, delivering a significant 4.7\% improvement in answering accuracy, and brings an average 7.5\% boost in base MLLMs' VQA performance.
Submitted 30 June, 2025; v1 submitted 5 April, 2025;
originally announced April 2025.
-
MMCE: A Framework for Deep Monotonic Modeling of Multiple Causal Effects
Authors:
Juhua Chen,
Karson Shi,
Jialing He,
North Chen,
Kele Jiang
Abstract:
When we plan to use money as an incentive to change a person's behavior (such as getting riders to deliver more orders or consumers to buy more items), the common approach is to adopt a two-stage framework in order to maximize ROI under cost constraints. In the first stage, the individual price response curve is obtained. In the second stage, business goals and resource constraints are formally expressed and modeled as an optimization problem. The first stage is critical because it answers the question of how much incremental result incentives can bring, which is the basis of the second stage. Usually, causal modeling is used to obtain the curve. When only observational data are available, causal modeling and evaluation are very challenging. In some business scenarios, multiple causal effects need to be obtained at the same time. This paper proposes a new observational data modeling and evaluation framework, which can simultaneously model multiple causal effects and greatly improve the modeling accuracy under some abnormal distributions. In the absence of RCT data, evaluation seems impossible. This paper summarizes three priors to illustrate the necessity and feasibility of qualitative evaluation of cognitive testing. At the same time, this paper proposes the conditions under which observational data can be considered as an evaluation dataset. Our approach is the first to propose a modeling framework that simultaneously obtains multiple causal effects. Offline analysis and online experiments show that the approach is effective and significantly improves the effectiveness of the allocation strategies generated in real-world marketing activities.
Submitted 1 April, 2025;
originally announced April 2025.
-
Consistency Trajectory Matching for One-Step Generative Super-Resolution
Authors:
Weiyi You,
Mingyang Zhang,
Leheng Zhang,
Xingyu Zhou,
Kexuan Shi,
Shuhang Gu
Abstract:
Current diffusion-based super-resolution (SR) approaches achieve commendable performance at the cost of high inference overhead. Therefore, distillation techniques are utilized to accelerate the multi-step teacher model into a one-step student model. Nevertheless, these methods significantly raise training costs and constrain the performance of the student model by that of the teacher model. To overcome these challenges, we propose Consistency Trajectory Matching for Super-Resolution (CTMSR), a distillation-free strategy that is able to generate photo-realistic SR results in one step. Concretely, we first formulate a Probability Flow Ordinary Differential Equation (PF-ODE) trajectory to establish a deterministic mapping from low-resolution (LR) images with noise to high-resolution (HR) images. Then we apply the Consistency Training (CT) strategy to directly learn the mapping in one step, eliminating the need for a pre-trained diffusion model. To further enhance the performance and better leverage the ground truth during the training process, we aim to align the distribution of SR results more closely with that of natural images. To this end, we propose to minimize the discrepancy between their respective PF-ODE trajectories from the LR image distribution with our Distribution Trajectory Matching (DTM) loss, resulting in improved realism of our recovered HR images. Comprehensive experimental results demonstrate that the proposed methods can attain comparable or even superior capabilities on both synthetic and real datasets while maintaining minimal inference latency.
Submitted 30 June, 2025; v1 submitted 26 March, 2025;
originally announced March 2025.
-
Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model
Authors:
Leheng Zhang,
Weiyi You,
Kexuan Shi,
Shuhang Gu
Abstract:
Diffusion-based image super-resolution methods have demonstrated significant advantages over GAN-based approaches, particularly in terms of perceptual quality. Building upon a lengthy Markov chain, diffusion-based methods possess remarkable modeling capacity, enabling them to achieve outstanding performance in real-world scenarios. Unlike previous methods that focus on modifying the noise schedule or sampling process to enhance performance, our approach emphasizes the improved utilization of LR information. We find that different regions of the LR image can be viewed as corresponding to different timesteps in a diffusion process, where flat areas are closer to the target HR distribution but edge and texture regions are farther away. In these flat areas, applying slight noise is more advantageous for the reconstruction. We associate this characteristic with uncertainty and propose to apply an uncertainty estimate to guide region-specific noise level control, a technique we refer to as Uncertainty-guided Noise Weighting. Pixels with lower uncertainty (i.e., flat regions) receive reduced noise to preserve more LR information, thereby improving performance. Furthermore, we modify the network architecture of previous methods to develop our Uncertainty-guided Perturbation Super-Resolution (UPSR) model. Extensive experimental results demonstrate that, despite reduced model size and training overhead, the proposed UPSR method outperforms current state-of-the-art methods across various datasets, both quantitatively and qualitatively.
Submitted 24 March, 2025;
originally announced March 2025.
-
MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
Authors:
Wenyuan Zhang,
Yixiao Yang,
Han Huang,
Liang Han,
Kanle Shi,
Yu-Shen Liu,
Zhizhong Han
Abstract:
Monocular depth priors have been widely adopted by neural rendering in multi-view based tasks such as 3D reconstruction and novel view synthesis. However, due to the inconsistent prediction on each view, how to more effectively leverage monocular cues in a multi-view context remains a challenge. Current methods treat the entire estimated depth map indiscriminately and use it as ground truth supervision, while ignoring the inherent inaccuracy and cross-view inconsistency in monocular priors. To resolve these issues, we propose MonoInstance, a general approach that explores the uncertainty of monocular depths to provide enhanced geometric priors for neural rendering and reconstruction. Our key insight lies in aligning the depths of each segmented instance from multiple views within a common 3D space, thereby casting the uncertainty estimation of monocular depths into a density measure within noisy point clouds. For high-uncertainty areas where depth priors are unreliable, we further introduce a constraint term that encourages the projected instances to align with corresponding instance masks on nearby views. MonoInstance is a versatile strategy which can be seamlessly integrated into various multi-view neural rendering frameworks. Our experimental results demonstrate that MonoInstance significantly improves the performance in both reconstruction and novel view synthesis on various benchmarks.
Submitted 30 March, 2025; v1 submitted 24 March, 2025;
originally announced March 2025.
-
NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction
Authors:
Wenyuan Zhang,
Emily Yue-ting Jia,
Junsheng Zhou,
Baorui Ma,
Kanle Shi,
Yu-Shen Liu,
Zhizhong Han
Abstract:
Recently, it has been shown that priors are vital for neural implicit functions to reconstruct high-quality surfaces from multi-view RGB images. However, current priors require large-scale pre-training, and merely provide geometric clues without considering the importance of color. In this paper, we present NeRFPrior, which adopts a neural radiance field as a prior to learn signed distance fields using volume rendering for surface reconstruction. Our NeRF prior can provide both geometric and color clues, and can also be trained quickly on the same scene without additional data. Based on the NeRF prior, we learn a signed distance function (SDF) by explicitly imposing a multi-view consistency constraint on each ray intersection for surface inference. Specifically, at each ray intersection, we use the density in the prior as a coarse geometry estimation, while using the color near the surface as a clue to check its visibility from another view angle. For the textureless areas where the multi-view consistency constraint does not work well, we further introduce a depth consistency loss with confidence weights to infer the SDF. Experimental results show that our method outperforms state-of-the-art methods on widely used benchmarks.
Submitted 30 March, 2025; v1 submitted 24 March, 2025;
originally announced March 2025.
-
Fed-NDIF: A Noise-Embedded Federated Diffusion Model For Low-Count Whole-Body PET Denoising
Authors:
Yinchi Zhou,
Huidong Xie,
Menghua Xia,
Qiong Liu,
Bo Zhou,
Tianqi Chen,
Jun Hou,
Liang Guo,
Xinyuan Zheng,
Hanzhong Wang,
Biao Li,
Axel Rominger,
Kuangyu Shi,
Nicha C. Dvorneka,
Chi Liu
Abstract:
Low-count positron emission tomography (LCPET) imaging can reduce patients' exposure to radiation but often suffers from increased image noise and reduced lesion detectability, necessitating effective denoising techniques. Diffusion models have shown promise in LCPET denoising for recovering degraded image quality. However, training such models requires large and diverse datasets, which are challenging to obtain in the medical domain. To address data scarcity and privacy concerns, we combine diffusion models with federated learning -- a decentralized training approach where models are trained individually at different sites, and their parameters are aggregated on a central server over multiple iterations. The variation in scanner types and image noise levels within and across institutions poses additional challenges for federated learning in LCPET denoising. In this study, we propose a novel noise-embedded federated learning diffusion model (Fed-NDIF) to address these challenges, leveraging a multicenter dataset and varying count levels. Our approach incorporates liver normalized standard deviation (NSTD) noise embedding into a 2.5D diffusion model and utilizes the Federated Averaging (FedAvg) algorithm to aggregate locally trained models into a global model, which is subsequently fine-tuned on local datasets to optimize performance and obtain personalized models. Extensive validation on datasets from the University of Bern, Ruijin Hospital in Shanghai, and Yale-New Haven Hospital demonstrates the superior performance of our method in enhancing image quality and improving lesion quantification. The Fed-NDIF model shows significant improvements in PSNR, SSIM, and NMSE of the entire 3D volume, as well as enhanced lesion detectability and quantification, compared to local diffusion models and federated UNet-based models.
Submitted 20 March, 2025;
originally announced March 2025.
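The abstract above names Federated Averaging (FedAvg) as the step that aggregates locally trained models into a global model. A minimal sketch of that aggregation step follows, assuming each site's parameters are stored as flat lists of floats keyed by name and that weights are proportional to local sample counts (the usual FedAvg weighting; the abstract does not state which weighting Fed-NDIF uses). The `Params` type and `fed_avg` function are hypothetical names for illustration.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # hypothetical container: name -> flat weights


def fed_avg(site_params: Sequence[Params], site_sizes: Sequence[int]) -> Params:
    """Aggregate per-site parameters into global parameters.

    Each site's contribution is weighted by its share of the total number
    of local training samples (an assumption of this sketch).
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need at least one site and one sample count per site")
    total = float(sum(site_sizes))
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name in site_params[0]:
        averaged = [0.0] * len(site_params[0][name])
        for params, n_samples in zip(site_params, site_sizes):
            weight = n_samples / total
            for i, value in enumerate(params[name]):
                averaged[i] += weight * value
        global_params[name] = averaged
    return global_params
```

In a full training loop this aggregation would run once per communication round on the central server, after which the global parameters are sent back to the sites; the local fine-tuning described in the abstract would follow the final round.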
-
Positronium Imaging: History, Current Status, and Future Perspectives
Authors:
Paweł Moskal,
Aleksander Bilewicz,
Manish Das,
Bangyan Huang,
Aleksander Khreptak,
Szymon Parzych,
Jinyi Qi,
Axel Rominger,
Robert Seifert,
Sushil Sharma,
Kuangyu Shi,
William Steinberger,
Rafał Walczak,
Ewa Stępień
Abstract:
Positronium imaging was recently proposed to image the properties of positronium atoms in the patient's body. Positronium properties depend on the size of intramolecular voids and oxygen concentration; therefore, they deliver information different from and complementary to the anatomic, morphological, and metabolic images. Thus far, mean ortho-positronium lifetime imaging has been at the center of research interest. The first ex vivo and in vivo positronium lifetime images of humans have been demonstrated with the dedicated J-PET scanner enabling simultaneous registration of annihilation photons and prompt gamma from ${β^{+} γ}$ emitters. Annihilation photons are used to reconstruct the annihilation place and time while prompt gamma is used to reconstruct the time of positronium formation. This review describes recent achievements in the translation of positronium imaging into clinics. The first measurements of positronium lifetime in humans with commercial PET scanners modernized to register triple coincidences are reported. The in vivo observations of differences in ortho-positronium lifetime between tumor and healthy tissues and between different oxygen concentrations are discussed. So far, the positronium lifetime measurements in humans have been completed with the clinically available ${^{68}\text{Ga}}$, ${^{82}\text{Rb}}$, and ${^{124}\text{I}}$ radionuclides. The status of and challenges in developing positronium imaging on the way to a clinically useful procedure are presented and discussed.
Submitted 1 July, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
Auditing Differential Privacy in the Black-Box Setting
Authors:
Kaining Shi,
Cong Ma
Abstract:
This paper introduces a novel theoretical framework for auditing differential privacy (DP) in a black-box setting. Leveraging the concept of $f$-differential privacy, we explicitly define type I and type II errors and propose an auditing mechanism based on conformal inference. Our approach robustly controls the type I error rate under minimal assumptions. Furthermore, we establish a fundamental impossibility result, demonstrating the inherent difficulty of simultaneously controlling both type I and type II errors without additional assumptions. Nevertheless, under a monotone likelihood ratio (MLR) assumption, our auditing mechanism effectively controls both errors. We also extend our method to construct valid confidence bands for the trade-off function in the finite-sample regime.
Submitted 10 April, 2025; v1 submitted 15 March, 2025;
originally announced March 2025.
-
SDTrack: A Baseline for Event-based Tracking via Spiking Neural Networks
Authors:
Yimeng Shan,
Zhenbang Ren,
Haodi Wu,
Wenjie Wei,
Rui-Jie Zhu,
Shuai Wang,
Dehao Zhang,
Yichen Xiao,
Jieyuan Zhang,
Kexin Shi,
Jingzhinan Wang,
Jason K. Eshraghian,
Haicheng Qu,
Jiqing Zhang,
Malu Zhang,
Yang Yang
Abstract:
Event cameras provide superior temporal resolution, dynamic range, power efficiency, and pixel bandwidth. Spiking Neural Networks (SNNs) naturally complement event data through discrete spike signals, making them ideal for event-based tracking. However, current approaches that combine Artificial Neural Networks (ANNs) and SNNs, along with suboptimal architectures, compromise energy efficiency and limit tracking performance. To address these limitations, we propose the first Transformer-based spike-driven tracking pipeline. Our Global Trajectory Prompt (GTP) method effectively captures global trajectory information and aggregates it with event streams into event images to enhance spatiotemporal representation. We then introduce SDTrack, a Transformer-based spike-driven tracker comprising a Spiking MetaFormer backbone and a simple tracking head that directly predicts normalized coordinates using spike signals. The framework is end-to-end and does not require data augmentation or post-processing. Extensive experiments demonstrate that SDTrack achieves state-of-the-art performance while maintaining the lowest parameter count and energy consumption across multiple event-based tracking benchmarks, establishing a solid baseline for future research in the field of neuromorphic vision.
Submitted 17 June, 2025; v1 submitted 8 March, 2025;
originally announced March 2025.
-
Semi-Supervised Learning for Dose Prediction in Targeted Radionuclide: A Synthetic Data Study
Authors:
Jing Zhang,
Alexandre Bousse,
Laetitia Imbert,
Song Xue,
Kuangyu Shi,
Julien Bert
Abstract:
Targeted Radionuclide Therapy (TRT) is a modern strategy in radiation oncology that aims to administer a potent radiation dose specifically to cancer cells using cancer-targeting radiopharmaceuticals. Accurate radiation dose estimation tailored to individual patients is crucial. Deep learning, particularly with pre-therapy imaging, holds promise for personalizing TRT doses. However, current methods require large time series of SPECT imaging, which is hardly achievable in routine clinical practice, and thus raises issues of data availability. Our objective is to develop a semi-supervised learning (SSL) solution to personalize dosimetry using pre-therapy images. The aim is to develop an approach that achieves accurate results when PET/CT images are available, but are associated with only a few post-therapy dosimetry data provided by SPECT images. In this work, we introduce an SSL method using a pseudo-label generation approach for regression tasks inspired by the FixMatch framework. The feasibility of the proposed solution was preliminarily evaluated through an in-silico study using synthetic data and Monte Carlo simulation. Experimental results for organ dose prediction yielded promising outcomes, showing that the use of pseudo-labeled data provides better accuracy compared to using only labeled data.
Submitted 7 March, 2025;
originally announced March 2025.
-
Geometric Asymmetry-Enhanced Nonreciprocal Supercurrent Transport Revealed by Second-Harmonic Response
Authors:
Yu He,
Zifeng Wang,
Jiaxu Li,
Fenglin Zhong,
Haozhe Yang,
Kewen Shi,
Le Wang,
Guang Yang,
Weisheng Zhao
Abstract:
Nonreciprocal transport in superconducting systems serves as a powerful probe of symmetry-breaking mechanisms, with the superconducting diode effect emerging as a key manifestation enabling cryogenic rectification. While theoretical models have extensively explored superconducting nonreciprocity, experimental verification remains challenging, as conventional transport measurements struggle to disentangle intrinsic and extrinsic contributions. Nonlinear transport analysis, particularly second-harmonic response, offers an alternative approach by providing a sensitive probe for detecting spatial inversion symmetry breaking in the presence of time-reversal symmetry violation. Here, we systematically investigate the influence of geometric symmetry on nonreciprocal transport by comparing two triangular-extended Hall bar configurations with a symmetric Hall bar control. Second-harmonic nonlinear transport measurements reveal that the triangular extension significantly enhances nonreciprocal response, exhibiting a clear dependence on the base angle of the extension. These findings establish a direct connection between mesoscopic geometry and macroscopic nonreciprocity, demonstrating how spatial symmetry and vortex dynamics govern nonlinear transport. This insight offers a guiding principle for designing superconducting rectification architectures.
Submitted 12 April, 2025; v1 submitted 5 March, 2025;
originally announced March 2025.
-
Developing a PET/CT Foundation Model for Cross-Modal Anatomical and Functional Imaging
Authors:
Yujin Oh,
Robert Seifert,
Yihan Cao,
Christoph Clement,
Justin Ferdinandus,
Constantin Lapa,
Alessandro Liebich,
Michelle Amon,
Johanna Enke,
Sifan Song,
Runqi Meng,
Fang Zeng,
Ning Guo,
Xiang Li,
Pedram Heidari,
Axel Rominger,
Kuangyu Shi,
Quanzheng Li
Abstract:
In oncology, Positron Emission Tomography-Computed Tomography (PET/CT) is widely used in cancer diagnosis, staging, and treatment monitoring, as it combines anatomical details from CT with functional metabolic activity and molecular marker expression information from PET. However, existing artificial intelligence-driven PET/CT analyses rely predominantly on task-specific models trained from scratch or on limited datasets, limiting their generalizability and robustness. To address this, we propose a foundation model approach specifically designed for multimodal PET/CT imaging. We introduce the Cross-Fraternal Twin Masked Autoencoder (FratMAE), a novel framework that effectively integrates whole-body anatomical and functional or molecular information. FratMAE employs separate Vision Transformer (ViT) encoders for PET and CT scans, along with cross-attention decoders that enable synergistic interactions between modalities during masked autoencoder training. Additionally, it incorporates textual metadata to enhance PET representation learning. By pre-training on PET/CT datasets, FratMAE captures intricate cross-modal relationships and global uptake patterns, achieving superior performance on downstream tasks and demonstrating its potential as a generalizable foundation model.
Submitted 4 March, 2025;
originally announced March 2025.
-
PET Image Denoising via Text-Guided Diffusion: Integrating Anatomical Priors through Text Prompts
Authors:
Boxiao Yu,
Savas Ozdemir,
Jiong Wu,
Yizhou Chen,
Ruogu Fang,
Kuangyu Shi,
Kuang Gong
Abstract:
Low-dose Positron Emission Tomography (PET) imaging presents a significant challenge due to increased noise and reduced image quality, which can compromise its diagnostic accuracy and clinical utility. Denoising diffusion probabilistic models (DDPMs) have demonstrated promising performance for PET image denoising. However, existing DDPM-based methods typically overlook valuable metadata such as patient demographics, anatomical information, and scanning parameters, which should further enhance the denoising performance if considered. Recent advances in vision-language models (VLMs), particularly the pre-trained Contrastive Language-Image Pre-training (CLIP) model, have highlighted the potential of incorporating text-based information into visual tasks to improve downstream performance. In this preliminary study, we proposed a novel text-guided DDPM for PET image denoising that integrated anatomical priors through text prompts. Anatomical text descriptions were encoded using a pre-trained CLIP text encoder to extract semantic guidance, which was then incorporated into the diffusion process via the cross-attention mechanism. Evaluations based on paired 1/20 low-dose and normal-dose 18F-FDG PET datasets demonstrated that the proposed method achieved better quantitative performance than conventional UNet and standard DDPM methods at both the whole-body and organ levels. These results underscored the potential of leveraging VLMs to integrate rich metadata into the diffusion framework to enhance the image quality of low-dose PET scans.
Submitted 28 February, 2025;
originally announced February 2025.
-
HVI: A New Color Space for Low-light Image Enhancement
Authors:
Qingsen Yan,
Yixu Feng,
Cheng Zhang,
Guansong Pang,
Kangbiao Shi,
Peng Wu,
Wei Dong,
Jinqiu Sun,
Yanning Zhang
Abstract:
Low-Light Image Enhancement (LLIE) is a crucial computer vision task that aims to restore detailed visual information from corrupted low-light images. Many existing LLIE methods are based on the standard RGB (sRGB) space, which often produces color bias and brightness artifacts due to inherent high color sensitivity in sRGB. While converting the images using the Hue, Saturation and Value (HSV) color space helps resolve the brightness issue, it introduces significant red and black noise artifacts. To address this issue, we propose a new color space for LLIE, namely Horizontal/Vertical-Intensity (HVI), defined by polarized HS maps and learnable intensity. The former enforces small distances for red coordinates to remove the red artifacts, while the latter compresses the low-light regions to remove the black artifacts. To fully leverage the chromatic and intensity information, a novel Color and Intensity Decoupling Network (CIDNet) is further introduced to learn an accurate photometric mapping function under different lighting conditions in the HVI space. Comprehensive results from benchmark and ablation experiments show that the proposed HVI color space with CIDNet outperforms the state-of-the-art methods on 10 datasets. The code is available at https://github.com/Fediory/HVI-CIDNet.
Submitted 28 February, 2025; v1 submitted 27 February, 2025;
originally announced February 2025.
-
Enhanced response at exceptional points in multi-qubit systems for sensing
Authors:
Tingting Shi,
Vasilii Smirnov,
Kaiye Shi,
Wei Zhang
Abstract:
Exceptional points featuring enhanced energy response to perturbation hold significant potential in detection and measurement of weak signals. Of particular interest is the existence and property of high-order exceptional points in quantum systems, owing to the capability to provide high-order response to perturbations. We investigate the exceptional points in a system of $n$ identical qubits possessing parity-time-reversal symmetry. We prove that owing to an incomplete coalescence of eigenstates, the highest possible order of exceptional point is $n+1$, which is also the upper bound of the order of energy response to perturbation. More interestingly, by considering an Ising-type interaction, we analytically prove that to achieve an $(m+1)$-th order response for any $m \le n$, the system must acquire a nontrivial $m$-body interaction. Finally, we propose a Floquet driving scheme to implement an effective multi-body Ising-type interaction, which can be realized in trapped ions or superconducting qubits.
Submitted 23 February, 2025;
originally announced February 2025.
-
WRT-SAM: Foundation Model-Driven Segmentation for Generalized Weld Radiographic Testing
Authors:
Yunyi Zhou,
Kun Shi,
Gang Hao
Abstract:
Radiographic testing is a fundamental non-destructive evaluation technique for identifying weld defects and assessing quality in industrial applications due to its high-resolution imaging capabilities. Over the past decade, deep learning techniques have significantly advanced weld defect identification in radiographic images. However, conventional approaches, which rely on training small-scale, task-specific models on single-scenario datasets, exhibit poor cross-scenario generalization. Recently, the Segment Anything Model (SAM), a pre-trained visual foundation model trained on large-scale datasets, has demonstrated exceptional zero-shot generalization capabilities. Fine-tuning SAM with limited domain-specific data has yielded promising results in fields such as medical image segmentation and anomaly detection. To the best of our knowledge, this work is the first to introduce SAM-based segmentation for general weld radiographic testing images. We propose WRT-SAM, a novel weld radiographic defect segmentation model that leverages SAM through an adapter-based integration with a specialized prompt generator architecture. To improve adaptability to grayscale weld radiographic images, we introduce a frequency prompt generator module, which enhances the model's sensitivity to frequency-domain information. Furthermore, to address the multi-scale nature of weld defects, we incorporate a multi-scale prompt generator module, enabling the model to effectively extract and encode defect information across varying scales. Extensive experimental evaluations demonstrate that WRT-SAM achieves a recall of 78.87%, a precision of 84.04%, and an AUC of 0.9746, setting a new state-of-the-art (SOTA) benchmark. Moreover, the model exhibits superior zero-shot generalization performance, highlighting its potential for practical deployment in diverse radiographic testing scenarios.
Submitted 16 February, 2025;
originally announced February 2025.
-
Universal Lesion Segmentation Challenge 2023: A Comparative Research of Different Algorithms
Authors:
Kaiwen Shi,
Yifei Li,
Binh Ho,
Jovian Wang,
Kobe Guo
Abstract:
In recent years, machine learning algorithms have achieved much success in segmenting lesions across various tissues. There is, however, no single satisfactory model that works well on all tissue types. In response to this need, we attempt to train a model that 1) works well on all tissue types, and 2) still performs fast inference. To this end, we design our own architectures, test multiple existing architectures, compare their results, and settle upon SwinUnet. We document our rationales, successes, and failures. Finally, we propose some further directions that we think are worth exploring. Code: https://github.com/KWFredShi/ULS2023NGKD.git
Submitted 14 February, 2025;
originally announced February 2025.
-
Time Parameterized Optimal Transport
Authors:
Kaiwen Shi
Abstract:
Optimal transport has gained significant attention in recent years due to its effectiveness in deep learning and computer vision. Its descendant metric, the Wasserstein distance, has been particularly successful in measuring distribution dissimilarities. While extensive research has focused on optimal transport and its regularized variants (such as entropy, sparsity, and capacity constraints), the role of time has been largely overlooked. However, time is a critical factor in real-world transport problems.
In this work, we introduce a time parameterized formulation of the optimal transport problem, incorporating a time variable t to represent sequential steps and enforcing specific constraints at each step. We propose a systematic method to solve a special subproblem and develop a heuristic search algorithm that achieves nearly optimal solutions while significantly reducing computational time.
Submitted 14 February, 2025;
originally announced February 2025.
-
A Survey of Quantized Graph Representation Learning: Connecting Graph Structures with Large Language Models
Authors:
Qika Lin,
Zhen Peng,
Kaize Shi,
Kai He,
Yiming Xu,
Erik Cambria,
Mengling Feng
Abstract:
Recent years have witnessed rapid advances in graph representation learning, with the continuous embedding approach emerging as the dominant paradigm. However, such methods encounter issues regarding parameter efficiency, interpretability, and robustness. Thus, Quantized Graph Representation (QGR) learning, which represents the graph structure with discrete codes instead of conventional continuous embeddings, has recently gained increasing interest. Because its representation form is analogous to natural language, QGR can also seamlessly integrate graph structures with large language models (LLMs). As this emerging paradigm is still in its infancy yet holds significant promise, we undertake this thorough survey to promote its rapid development. We first present the background of the general quantization methods and their merits. Moreover, we provide an in-depth review of current QGR studies from the perspectives of quantized strategies, training objectives, distinctive designs, knowledge graph quantization, and applications. We further explore the strategies for code dependence learning and integration with LLMs. Finally, we offer a discussion and outline future directions, aiming to provide a comprehensive picture of QGR and inspire future research.
Submitted 2 February, 2025;
originally announced February 2025.
-
Joint System Latency and Data Freshness Optimization for Cache-enabled Mobile Crowdsensing Networks
Authors:
Kexin Shi,
Yaru Fu,
Yongna Guo,
Fu Lee Wang,
Yan Zhang
Abstract:
Mobile crowdsensing (MCS) networks enable large-scale data collection by leveraging the ubiquity of mobile devices. However, frequent sensing and data transmission can lead to significant resource consumption. To mitigate this issue, edge caching has been proposed as a solution for storing recently collected data. Nonetheless, this approach may compromise data freshness. In this paper, we investigate the trade-off between re-using cached task results and re-sensing tasks in cache-enabled MCS networks, aiming to minimize system latency while maintaining information freshness. To this end, we formulate a weighted delay and age of information (AoI) minimization problem, jointly optimizing sensing decisions, user selection, channel selection, task allocation, and caching strategies. The problem is a mixed-integer non-convex programming problem which is intractable. Therefore, we decompose the long-term problem into sequential one-shot sub-problems and design a framework that optimizes system latency, task sensing decision, and caching strategy subproblems. When one task is re-sensing, the one-shot problem simplifies to the system latency minimization problem, which can be solved optimally. The task sensing decision is then made by comparing the system latency and AoI. Additionally, a Bayesian update strategy is developed to manage the cached task results. Building upon this framework, we propose a lightweight and time-efficient algorithm that makes real-time decisions for the long-term optimization problem. Extensive simulation results validate the effectiveness of our approach.
Submitted 24 January, 2025;
originally announced January 2025.
-
AI Chatbots as Professional Service Agents: Developing a Professional Identity
Authors:
Wenwen Li,
Kangwei Shi,
Yidong Chai
Abstract:
With the rapid expansion of large language model (LLM) applications, there is an emerging shift in the role of LLM-based AI chatbots from serving merely as general inquiry tools to acting as professional service agents. However, current studies often overlook a critical aspect of professional service agents: the act of communicating in a manner consistent with their professional identities. This is of particular importance in the healthcare sector, where effective communication with patients is essential for achieving professional goals, such as promoting patient well-being by encouraging healthy behaviors. To bridge this gap, we propose LAPI (LLM-based Agent with a Professional Identity), a novel framework for designing professional service agents tailored for medical question-and-answer (Q&A) services, ensuring alignment with a specific professional identity. Our method includes a theory-guided task planning process that decomposes complex professional tasks into manageable subtasks aligned with professional objectives and a pragmatic entropy method designed to generate professional and ethical responses with low uncertainty. Experiments on various LLMs show that the proposed approach outperforms baseline methods, including few-shot prompting and chain-of-thought prompting, across key metrics such as fluency, naturalness, empathy, patient-centricity, and ROUGE-L scores. Additionally, the ablation study underscores the contribution of each component to the overall effectiveness of the approach.
Submitted 23 January, 2025;
originally announced January 2025.
-
Positronium Lifetime Imaging with the Biograph Vision Quadra using 124I
Authors:
Lorenzo Mercolli,
William M. Steinberger,
Narendra Rathod,
Maurizio Conti,
Paweł Moskal,
Axel Rominger,
Robert Seifert,
Kuangyu Shi,
Ewa Ł. Stępień,
Hasan Sari
Abstract:
Purpose: Measuring the ortho-positronium (oPs) lifetime in human tissue bears the potential of adding clinically relevant information about the tissue microenvironment to conventional positron emission tomography (PET). Through phantom measurements, we investigate the voxel-wise measurement of oPs lifetime using a commercial long-axial field-of-view (LAFOV) PET scanner. Methods: We prepared four samples with mixtures of Amberlite XAD4, a porous polymeric adsorbent, and water and added between 1.12 MBq and 1.44 MBq of $^{124}$I. The samples were scanned in two different setups: once with a couple of centimeters between each sample (15 minutes scan time) and once with all samples taped together (40 minutes scan time). For each scan, we determine the oPs lifetime for the full samples and at the voxel level. The voxel sizes under consideration are $10.0^3$ mm$^3$, $7.1^3$ mm$^3$ and $4.0^3$ mm$^3$. Results: Amberlite XAD4 allows the preparation of samples with distinct oPs lifetimes. Using a Bayesian fitting procedure, the oPs lifetimes in the whole samples are $2.52 \pm 0.03$ ns, $2.37 \pm 0.03$ ns, $2.27 \pm 0.04$ ns and $1.82 \pm 0.02$ ns, respectively. The voxel-wise oPs lifetime fits showed that even with $4.0^3$ mm$^3$ voxels the samples are clearly distinguishable and the central voxels have good count statistics. However, the situation with the samples close together remains challenging with respect to the spatial distinction of regions with different oPs lifetimes. Conclusion: Our study shows that positronium lifetime imaging on a commercial LAFOV PET/CT should be feasible under clinical conditions using $^{124}$I.
Submitted 7 January, 2025;
originally announced January 2025.
-
AADNet: Exploring EEG Spatiotemporal Information for Fast and Accurate Orientation and Timbre Detection of Auditory Attention Based on A Cue-Masked Paradigm
Authors:
Keren Shi,
Xu Liu,
Xue Yuan,
Haijie Shang,
Ruiting Dai,
Hanbin Wang,
Yunfa Fu,
Ning Jiang,
Jiayuan He
Abstract:
Auditory attention decoding from electroencephalogram (EEG) could infer to which source the user is attending in noisy environments. Decoding algorithms and experimental paradigm designs are crucial for the development of technology in practical applications. To simulate real-world scenarios, this study proposed a cue-masked auditory attention paradigm to avoid information leakage before the experiment. To obtain high decoding accuracy with low latency, an end-to-end deep learning model, AADNet, was proposed to exploit the spatiotemporal information from the short time window of EEG signals. The results showed that with a 0.5-second EEG window, AADNet achieved an average accuracy of 93.46% and 91.09% in decoding auditory orientation attention (OA) and timbre attention (TA), respectively. It significantly outperformed five previous methods and did not need the knowledge of the original audio source. This work demonstrated that it was possible to detect the orientation and timbre of auditory attention from EEG signals fast and accurately. The results are promising for the real-time multi-property auditory attention decoding, facilitating the application of the neuro-steered hearing aids and other assistive listening devices.
Submitted 7 January, 2025;
originally announced January 2025.
-
Evaluation of Deep Learning-based Scatter Correction on a Long-axial Field-of-view PET scanner
Authors:
Baptiste Laurent,
Alexandre Bousse,
Thibaut Merlin,
Axel Rominger,
Kuangyu Shi,
Dimitris Visvikis
Abstract:
Objective: Long-axial field-of-view (LAFOV) positron emission tomography (PET) systems allow higher sensitivity, with an increased number of detected lines of response induced by a larger angle of acceptance. However, this extended angle increases the number of multiple scatters and the scatter contribution within oblique planes. As scattering affects both quality and quantification of the reconstructed image, it is crucial to correct this effect with more accurate methods than the state-of-the-art single scatter simulation (SSS) that can reach its limits with such an extended field-of-view (FOV). In this work, which is an extension of our previous assessment of deep learning-based scatter estimation (DLSE) carried out on a conventional PET system, we aim to evaluate the DLSE method performance on LAFOV total-body PET.
Approach: The proposed DLSE method, based on a convolutional neural network (CNN) U-Net architecture, uses emission and attenuation sinograms to estimate the scatter sinogram. The network was trained on Monte-Carlo (MC) simulations of [18F]-FDG PET acquisitions of XCAT phantoms using a Siemens Biograph Vision Quadra scanner model, with multiple morphologies and dose distributions. We first evaluated the method's performance on simulated data in both the sinogram and image domains by comparing it to the MC ground truth and SSS scatter sinograms. We then tested the method on seven [18F]-FDG and seven [18F]-PSMA clinical datasets and compared it to SSS estimations.
Results: DLSE showed superior accuracy on phantom data, greater robustness to patient size and dose variations compared to SSS, and better lesion contrast recovery. It also yielded promising clinical results, improving lesion contrasts in [18F]-FDG datasets and performing consistently with [18F]-PSMA datasets despite no training with [18F]-PSMA.
Submitted 7 February, 2025; v1 submitted 2 January, 2025;
originally announced January 2025.
-
OpenAI o1 System Card
Authors:
OpenAI,
Aaron Jaech,
Adam Kalai,
Adam Lerer,
Adam Richardson,
Ahmed El-Kishky,
Aiden Low,
Alec Helyar,
Aleksander Madry,
Alex Beutel,
Alex Carney,
Alex Iftimie,
Alex Karpenko,
Alex Tachard Passos,
Alexander Neitz,
Alexander Prokofiev,
Alexander Wei,
Allison Tam,
Ally Bennett,
Ananya Kumar,
Andre Saraiva,
Andrea Vallone,
Andrew Duberstein,
Andrew Kondrich
, et al. (238 additional authors not shown)
Abstract:
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
Submitted 21 December, 2024;
originally announced December 2024.
-
Detecting Fake News on Social Media: A Novel Reliability Aware Machine-Crowd Hybrid Intelligence-Based Method
Authors:
Yidong Chai,
Kangwei Shi,
Jiaheng Xie,
Chunli Liu,
Yuanchun Jiang,
Yezheng Liu
Abstract:
Fake news on social media platforms poses a significant threat to societal systems, underscoring the urgent need for advanced detection methods. The existing detection methods can be divided into machine intelligence-based, crowd intelligence-based, and hybrid intelligence-based methods. Among them, hybrid intelligence-based methods achieve the best performance but fail to consider the reliability issue in detection. In light of this, we propose a novel Reliability Aware Hybrid Intelligence (RAHI) method for fake news detection. Our method comprises three integral modules. The first module employs a Bayesian deep learning model to capture the inherent reliability within machine intelligence. The second module uses an Item Response Theory (IRT)-based user response aggregation to account for the reliability in crowd intelligence. The third module introduces a new distribution fusion mechanism, which takes the distributions derived from both machine and crowd intelligence as input, and outputs a fused distribution that provides predictions along with the associated reliability. The experiments on the Weibo dataset demonstrate the advantages of our method. This study contributes to the research field with a novel RAHI-based method, and the code is shared at https://github.com/Kangwei-g/RAHI. This study has practical implications for three key stakeholders: internet users, online platform managers, and the government.
Submitted 6 December, 2024;
originally announced December 2024.
-
Anomalous wave-packet transport on boundaries of Floquet topological systems
Authors:
Xin-Xin Yang,
Kai-Ye Shi,
F. Nur Ünal,
Wei Zhang
Abstract:
A two-dimensional periodically driven (Floquet) system with zero winding number in the absence of time-reversal symmetry is usually considered topologically trivial. Here, we study the dynamics of a Gaussian wave packet placed at the boundary of a two-dimensional driven system with zero winding numbers but multiple valley-protected edge states that can be realized in a square Raman lattice, and investigate the unidirectionally propagating topological edge currents. By carefully tuning the initial parameters of the wave packet, including its spin polarization as well as the initial time of the periodic driving, we control the population of different edge states, where the speed of the resulting propagation establishes a direct correspondence with the target dispersions across different gaps and valleys. Interestingly, we find that the edge states at different valleys in the $\pi$ gap can hybridize and form bowtie-shaped edge bands fully detached from the bulk. This phase not only presents a favorable regime with narrower bulk bands, but also exhibits distinct edge dynamics where the majority of particles bounce back and forth confined to a boundary while a small portion can follow a chiral transport around the sample.
Submitted 4 December, 2024; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Predictive Power of LLMs in Financial Markets
Authors:
Jerick Shi,
Burton Hollifield
Abstract:
Predicting the movement of the stock market and other assets has been valuable over the past few decades. Knowing how the value of a certain sector market may move in the future provides much information for investors, as they use that information to develop strategies to maximize profit or minimize risk. However, market data are quite noisy, and it is challenging to choose the right data or the right model to create such predictions. With the rise of large language models, there are ways to analyze certain data much more efficiently than before.
Our goal is to determine whether the GPT model provides more useful information compared to other traditional transformer models, such as the BERT model. We use data from the Federal Reserve Beige Book, which provides summaries of economic conditions in different districts in the US. Using such data, we then employ the LLMs to make predictions on the correlations. Using these correlations, we then compare the results with well-known strategies and determine whether knowing the economic conditions improves investment decisions. We conclude that the Beige Book does contain information regarding correlations amongst different assets, yet the GPT model has too much look-ahead bias and traditional models still triumph.
Submitted 25 November, 2024;
originally announced November 2024.
-
Hypergraph $p$-Laplacian equations for data interpolation and semi-supervised learning
Authors:
Kehan Shi,
Martin Burger
Abstract:
Hypergraph learning with $p$-Laplacian regularization has attracted a lot of attention due to its flexibility in modeling higher-order relationships in data. This paper focuses on its fast numerical implementation, which is challenging due to the non-differentiability of the objective function and the non-uniqueness of the minimizer. We derive a hypergraph $p$-Laplacian equation from the subdifferential of the $p$-Laplacian regularization. A simplified equation that is mathematically well-posed and computationally efficient is proposed as an alternative. Numerical experiments verify that the simplified $p$-Laplacian equation suppresses spiky solutions in data interpolation and improves classification accuracy in semi-supervised learning. The remarkably low computational cost enables further applications.
Submitted 7 April, 2025; v1 submitted 19 November, 2024;
originally announced November 2024.
-
When Mamba Meets xLSTM: An Efficient and Precise Method with the xLSTM-VMUNet Model for Skin lesion Segmentation
Authors:
Zhuoyi Fang,
Jiajia Liu,
Kexuan Shi,
Qiang Han
Abstract:
Automatic melanoma segmentation is essential for early skin cancer detection, yet challenges arise from the heterogeneity of melanoma, as well as interfering factors like blurred boundaries, low contrast, and imaging artifacts. While numerous algorithms have been developed to address these issues, previous approaches have often overlooked the need to jointly capture spatial and sequential features within dermatological images. This limitation hampers segmentation accuracy, especially in cases with indistinct borders or structurally similar lesions. Additionally, previous models lacked both a global receptive field and high computational efficiency. In this work, we present the xLSTM-VMUNet model, which jointly captures spatial and sequential features within dermatological images. xLSTM-VMUNet not only specializes in extracting spatial features from images, focusing on the structural characteristics of skin lesions, but also enhances contextual understanding, allowing more effective handling of complex medical image structures. Experimental results demonstrate that xLSTM-VMUNet outperforms VMUNet by 4.85% on DSC and 6.41% on IoU on the ISIC2017 dataset, and by 1.25% on DSC and 2.07% on IoU on the ISIC2018 dataset, with faster convergence and consistently high segmentation performance. Our code is available at https://github.com/FangZhuoyi/XLSTM-VMUNet.
Submitted 12 March, 2025; v1 submitted 14 November, 2024;
originally announced November 2024.
-
Linear Spherical Sliced Optimal Transport: A Fast Metric for Comparing Spherical Data
Authors:
Xinran Liu,
Yikun Bai,
Rocío Díaz Martín,
Kaiwen Shi,
Ashkan Shahbazi,
Bennett A. Landman,
Catie Chang,
Soheil Kolouri
Abstract:
Efficient comparison of spherical probability distributions becomes important in fields such as computer vision, geosciences, and medicine. Sliced optimal transport distances, such as spherical and stereographic spherical sliced Wasserstein distances, have recently been developed to address this need. These methods reduce the computational burden of optimal transport by slicing hyperspheres into one-dimensional projections, i.e., lines or circles. Concurrently, linear optimal transport has been proposed to embed distributions into \( L^2 \) spaces, where the \( L^2 \) distance approximates the optimal transport distance, thereby simplifying comparisons across multiple distributions. In this work, we introduce the Linear Spherical Sliced Optimal Transport (LSSOT) framework, which utilizes slicing to embed spherical distributions into \( L^2 \) spaces while preserving their intrinsic geometry, offering a computationally efficient metric for spherical probability measures. We establish the metricity of LSSOT and demonstrate its superior computational efficiency in applications such as cortical surface registration, 3D point cloud interpolation via gradient flow, and shape embedding. Our results demonstrate the significant computational benefits and high accuracy of LSSOT in these applications.
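The idea of embedding distributions so that an \( L^2 \) distance stands in for the transport distance can be illustrated in the simplest setting, the real line. The sketch below is not the LSSOT construction (which works with slices of the sphere); it only assumes two empirical distributions with the same number of equally weighted samples on a line, for which sorting the samples gives a quantile embedding and the root-mean-square difference of the embeddings equals the 2-Wasserstein distance. The function names are hypothetical.

```python
import math


def quantile_embedding(samples):
    """Embed an empirical 1D distribution as its sorted samples.

    For equally weighted samples, the sorted values are the quantile
    function evaluated on a uniform grid.
    """
    return sorted(samples)


def embedded_distance(emb_a, emb_b):
    """Root-mean-square distance between two embeddings of equal length.

    For 1D empirical distributions with the same number of equally
    weighted samples, this equals the 2-Wasserstein distance.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("this sketch assumes equal sample counts")
    n = len(emb_a)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)) / n)


if __name__ == "__main__":
    first = quantile_embedding([2.0, 0.0, 1.0])
    second = quantile_embedding([3.0, 1.0, 2.0])
    print(embedded_distance(first, second))
```

Each distribution is embedded once, so comparing many distributions needs only distances between stored embeddings rather than a separate transport problem for every pair.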
Submitted 8 November, 2024;
originally announced November 2024.
-
Dependence of Electrostatic Patch Force Evaluation on the Lateral Resolution of Kelvin Probe Force Microscopy
Authors:
Kun Shi,
Pengshun Luo,
Jinquan Liu,
Hang Yin,
Zebing Zhou
Abstract:
Kelvin Probe Force Microscopy (KPFM) is widely used to measure the surface potential on samples, from which electrostatic patch force can be calculated. However, since the KPFM measurements represent a weighted average of local potentials on the sample, the accuracy of the evaluation critically depends on the precision and lateral resolution of the method. In this paper, we investigate the influence of this averaging effect on patch force estimations using both analytic and numerical methods. First, we derive the correlation functions of patch potential and establish the formulas for calculating the electrostatic patch forces in the parallel-plate geometry, with and without consideration of the KPFM measurement effect. Thus, an analytic method is established to determine the accuracy of patch force evaluation when the statistical parameters of the patch potential and the lateral resolution of the KPFM are given. Second, numerical simulations are employed to explore the dependence of estimated patch forces on the KPFM's lateral resolution under more realistic conditions. Both analytic and numerical results show a similar dependence of the patch force estimation on the patch characteristic size, potential fluctuation and the lateral resolution of the KPFM. It is also found that the underestimation of the patch force becomes less sensitive to the KPFM's resolution as the separation between plates increases. The results of this study could provide useful guidance for the accurate evaluation of electrostatic patch forces using KPFM.
Submitted 3 November, 2024;
originally announced November 2024.
-
GPT-4o System Card
Authors:
OpenAI,
Aaron Hurst,
Adam Lerer,
Adam P. Goucher,
Adam Perelman,
Aditya Ramesh,
Aidan Clark,
AJ Ostrow,
Akila Welihinda,
Alan Hayes,
Alec Radford,
Aleksander Mądry,
Alex Baker-Whitcomb,
Alex Beutel,
Alex Borzunov,
Alex Carney,
Alex Chow,
Alex Kirillov,
Alex Nichol,
Alex Paino,
Alex Renzin,
Alex Tachard Passos,
Alexander Kirillov,
Alexi Christakis
, et al. (395 additional authors not shown)
Abstract:
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.
Submitted 25 October, 2024;
originally announced October 2024.