Showing 1–50 of 150 results for author: Song, B

Searching in archive cs.
  1. arXiv:2507.04766  [pdf, ps, other]

    cs.LG cs.CL

    ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems

    Authors: Yiming Zhang, Yingfan Ma, Yanmei Gu, Zhengkai Yang, Yihong Zhuang, Feng Wang, Zenan Huang, Yuanyuan Wang, Chao Huang, Bowen Song, Cheng Lin, Junbo Zhao

    Abstract: Large Language Models (LLMs) have shown impressive performance in domains such as mathematics and programming, yet their capabilities in physics remain underexplored and poorly understood. Physics poses unique challenges that demand not only precise computation but also deep conceptual understanding and physical modeling skills. Existing benchmarks often fall short due to limited difficulty, multi…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2506.23107  [pdf, ps, other]

    cs.AI

    Can Large Language Models Capture Human Risk Preferences? A Cross-Cultural Study

    Authors: Bing Song, Jianing Liu, Sisi Jian, Chenyang Wu, Vinayak Dixit

    Abstract: Large language models (LLMs) have made significant strides, extending their applications to dialogue systems, automated content creation, and domain-specific advisory tasks. However, as their use grows, concerns have emerged regarding their reliability in simulating complex decision-making behavior, such as risky decision-making, where a single choice can lead to multiple outcomes. This study inve…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 20 pages, 1 figure

  3. arXiv:2506.15160  [pdf, ps, other]

    cs.CV

    Enhancing point cloud analysis via neighbor aggregation correction based on cross-stage structure correlation

    Authors: Jiaqi Shi, Jin Xiao, Xiaoguang Hu, Boyang Song, Hao Jiang, Tianyou Chen, Baochang Zhang

    Abstract: Point cloud analysis is the cornerstone of many downstream tasks, among which aggregating local structures is the basis for understanding point cloud data. While numerous works aggregate neighbor using three-dimensional relative coordinates, there are irrelevant point interference and feature hierarchy gap problems due to the limitation of local coordinates. Although some works address this limita…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 17 pages, 7 figures

  4. arXiv:2506.06185  [pdf, ps, other]

    cs.LG math.NA stat.CO stat.ML

    Antithetic Noise in Diffusion Models

    Authors: Jing Jia, Sifan Liu, Bowen Song, Wei Yuan, Liyue Shen, Guanyang Wang

    Abstract: We initiate a systematic study of antithetic initial noise in diffusion models. Across unconditional models trained on diverse datasets, text-conditioned latent-diffusion models, and diffusion-posterior samplers, we find that pairing each initial noise with its negation consistently yields strongly negatively correlated samples. To explain this phenomenon, we combine experiments and theoretical an…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 43 pages, 20 figures, 9 tables

  5. arXiv:2505.20871  [pdf, ps, other]

    cs.CL

    Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG

    Authors: Xin Sun, Jianan Xie, Zhongqi Chen, Qiang Liu, Shu Wu, Yuehe Chen, Bowen Song, Weiqiang Wang, Zilei Wang, Liang Wang

    Abstract: Large language models (LLMs) augmented with retrieval systems have significantly advanced natural language processing tasks by integrating external knowledge sources, enabling more accurate and contextually rich responses. To improve the robustness of such systems against noisy retrievals, Retrieval-Augmented Fine-Tuning (RAFT) has emerged as a widely adopted method. However, RAFT conditions model…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: ACL 2025 main

  6. arXiv:2504.04747  [pdf, other]

    cs.CV

    Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models

    Authors: Yoojin Jung, Byung Cheol Song

    Abstract: Deep learning-based computer vision systems adopt complex and large architectures to improve performance, yet they face challenges in deployment on resource-constrained mobile and edge devices. To address this issue, model compression techniques such as pruning, quantization, and matrix factorization have been proposed; however, these compressed models are often highly vulnerable to adversarial at…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR2025

  7. arXiv:2504.03667  [pdf, other]

    cs.DC

    High-Performance Parallelization of Dijkstra's Algorithm Using MPI and CUDA

    Authors: Boyang Song

    Abstract: This paper investigates the parallelization of Dijkstra's algorithm for computing the shortest paths in large-scale graphs using MPI and CUDA. The primary hypothesis is that by leveraging parallel computing, the computation time can be significantly reduced compared to a serial implementation. To validate this, I implemented three versions of the algorithm: a serial version, an MPI-based parallel…

    Submitted 19 March, 2025; originally announced April 2025.

  8. arXiv:2504.01822  [pdf, other]

    cs.SE cs.CR

    Track and Trace: Automatically Uncovering Cross-chain Transactions in the Multi-blockchain Ecosystems

    Authors: Dan Lin, Ziye Zheng, Jiajing Wu, Jingjing Yang, Kaixin Lin, Huan Xiao, Bowen Song, Zibin Zheng

    Abstract: Cross-chain technology enables seamless asset transfer and message-passing within decentralized finance (DeFi) ecosystems, facilitating multi-chain coexistence in the current blockchain environment. However, this development also raises security concerns, as malicious actors exploit cross-chain asset flows to conceal the provenance and destination of assets, thereby facilitating illegal activities…

    Submitted 2 April, 2025; originally announced April 2025.

  9. arXiv:2503.12799  [pdf, other]

    cs.CV cs.MM

    Grounded Chain-of-Thought for Multimodal Large Language Models

    Authors: Qiong Wu, Xiangcong Yang, Yiyi Zhou, Chenxin Fang, Baiyang Song, Xiaoshuai Sun, Rongrong Ji

    Abstract: Despite great progress, existing multimodal large language models (MLLMs) are prone to visual hallucination, greatly impeding their trustworthy applications. In this paper, we study this problem from the perspective of visual-spatial reasoning, and propose a new learning task for MLLMs, termed Grounded Chain-of-Thought (GCoT). Different from recent visual CoT studies, which focus more on visual kn…

    Submitted 24 March, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  10. arXiv:2503.02989  [pdf, other]

    cs.CL cs.AI

    Effectively Steer LLM To Follow Preference via Building Confident Directions

    Authors: Bingqing Song, Boran Han, Shuai Zhang, Hao Wang, Haoyang Fang, Bonan Min, Yuyang Wang, Mingyi Hong

    Abstract: Having an LLM that aligns with human preferences is essential for accommodating individual needs, such as maintaining writing style or generating specific topics of interest. The majority of current alignment methods rely on fine-tuning or prompting, which can be either costly or difficult to control. Model steering algorithms, which modify the model output by constructing specific steering direct…

    Submitted 4 March, 2025; originally announced March 2025.

  11. arXiv:2502.17307  [pdf, ps, other]

    cs.LG cs.GT cs.MA

    Survey on Strategic Mining in Blockchain: A Reinforcement Learning Approach

    Authors: Jichen Li, Lijia Xie, Hanting Huang, Bo Zhou, Binfeng Song, Wanying Zeng, Xiaotie Deng, Xiao Zhang

    Abstract: Strategic mining attacks, such as selfish mining, exploit blockchain consensus protocols by deviating from honest behavior to maximize rewards. Markov Decision Process (MDP) analysis faces scalability challenges in modern digital economics, including blockchain. To address these limitations, reinforcement learning (RL) provides a scalable alternative, enabling adaptive strategy optimization in com…

    Submitted 24 February, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: 10 pages

  12. arXiv:2502.04670  [pdf, other]

    cs.LG cs.AI

    CCS: Controllable and Constrained Sampling with Diffusion Models via Initial Noise Perturbation

    Authors: Bowen Song, Zecheng Zhang, Zhaoxu Luo, Jason Hu, Wei Yuan, Jing Jia, Zhengxu Tang, Guanyang Wang, Liyue Shen

    Abstract: Diffusion models have emerged as powerful tools for generative tasks, producing high-quality outputs across diverse domains. However, how the generated data responds to the initial noise perturbation in diffusion models remains under-explored, which hinders understanding the controllability of the sampling process. In this work, we first observe an interesting phenomenon: the relationship between…

    Submitted 7 February, 2025; originally announced February 2025.

  13. arXiv:2501.09802  [pdf]

    cs.CR

    W3ID: A Quantum Computing-Secure Digital Identity System Redefining Standards for Web3 and Digital Twins

    Authors: Joseph Yun, Eli Lifton, Eunseo Lee, Yohan Yun, Abigail Song, Joshua Lee, Cristian Jimenez-Bert, Benedict Song, Yejun Lee, Alex Seo, Sijung Yun

    Abstract: The rapid advancements in quantum computing present significant threats to existing encryption standards and internet security. Simultaneously, the advent of Web 3.0 marks a transformative era in internet history, emphasizing enhanced data security, decentralization, and user ownership. This white paper introduces the W3ID, an abbreviation of Web3 standard meeting universal digital ID, which is a…

    Submitted 16 January, 2025; originally announced January 2025.

  14. arXiv:2501.07844  [pdf, other]

    quant-ph cs.CR

    Towards A Hybrid Quantum Differential Privacy

    Authors: Baobao Song, Shiva Raj Pokhrel, Athanasios V. Vasilakos, Tianqing Zhu, Gang Li

    Abstract: Quantum computing offers unparalleled processing power but raises significant data privacy challenges. Quantum Differential Privacy (QDP) leverages inherent quantum noise to safeguard privacy, surpassing traditional DP. This paper develops comprehensive noise profiles, identifies noise types beneficial for QDP, and highlights the need for practical implementations beyond theoretical models. Existi…

    Submitted 15 January, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

  15. arXiv:2412.15822  [pdf, other]

    cs.LG cs.AI cs.CL

    S$^2$DN: Learning to Denoise Unconvincing Knowledge for Inductive Knowledge Graph Completion

    Authors: Tengfei Ma, Yujie Chen, Liang Wang, Xuan Lin, Bosheng Song, Xiangxiang Zeng

    Abstract: Inductive Knowledge Graph Completion (KGC) aims to infer missing facts between newly emerged entities within knowledge graphs (KGs), posing a significant challenge. While recent studies have shown promising results in inferring such entities through knowledge subgraph reasoning, they suffer from (i) the semantic inconsistencies of similar relations, and (ii) noisy interactions inherent in KGs due…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: 15 pages

  16. arXiv:2412.03571  [pdf, other]

    cs.CV

    Style3D: Attention-guided Multi-view Style Transfer for 3D Object Generation

    Authors: Bingjie Song, Xin Huang, Ruting Xie, Xue Wang, Qing Wang

    Abstract: We present Style3D, a novel approach for generating stylized 3D objects from a content image and a style image. Unlike most previous methods that require case- or style-specific training, Style3D supports instant 3D object stylization. Our key insight is that 3D object stylization can be decomposed into two interconnected processes: multi-view dual-feature alignment and sparse-view spatial reconst…

    Submitted 4 December, 2024; originally announced December 2024.

  17. arXiv:2411.13609  [pdf, other]

    cs.CV

    What You See Is What Matters: A Novel Visual and Physics-Based Metric for Evaluating Video Generation Quality

    Authors: Zihan Wang, Songlin Li, Lingyan Hao, Xinyu Hu, Bowen Song

    Abstract: As video generation models advance rapidly, assessing the quality of generated videos has become increasingly critical. Existing metrics, such as Fréchet Video Distance (FVD), Inception Score (IS), and ClipSim, measure quality primarily in latent space rather than from a human visual perspective, often overlooking key aspects like appearance and motion consistency to physical laws. In this paper,…

    Submitted 24 November, 2024; v1 submitted 19 November, 2024; originally announced November 2024.

  18. arXiv:2411.08196  [pdf, other]

    cs.CV

    Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing

    Authors: Zitao Shuai, Chenwei Wu, Zhengxu Tang, Bowen Song, Liyue Shen

    Abstract: Diffusion Transformers (DiTs) have recently achieved remarkable success in text-guided image generation. In image editing, DiTs project text and image inputs to a joint latent space, from which they decode and synthesize new images. However, it remains largely unexplored how multimodal information collectively forms this joint space and how they guide the semantics of the synthesized images. In th…

    Submitted 12 November, 2024; originally announced November 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2408.13335

  19. arXiv:2411.07538  [pdf, other]

    cs.LG math.OC

    Unraveling the Gradient Descent Dynamics of Transformers

    Authors: Bingqing Song, Boran Han, Shuai Zhang, Jie Ding, Mingyi Hong

    Abstract: While the Transformer architecture has achieved remarkable success across various domains, a thorough theoretical foundation explaining its optimization dynamics is yet to be fully developed. In this study, we aim to bridge this understanding gap by answering the following two core questions: (1) Which types of Transformer architectures allow Gradient Descent (GD) to achieve guaranteed convergence…

    Submitted 11 November, 2024; originally announced November 2024.

  20. arXiv:2411.06193  [pdf, ps, other]

    cs.IT eess.SP

    Large Language Models and Artificial Intelligence Generated Content Technologies Meet Communication Networks

    Authors: Jie Guo, Meiting Wang, Hang Yin, Bin Song, Yuhao Chi, Fei Richard Yu, Chau Yuen

    Abstract: Artificial intelligence generated content (AIGC) technologies, with a predominance of large language models (LLMs), have demonstrated remarkable performance improvements in various applications, which have attracted great interests from both academia and industry. Although some noteworthy advancements have been made in this area, a comprehensive exploration of the intricate relationship between AI…

    Submitted 12 November, 2024; v1 submitted 9 November, 2024; originally announced November 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  21. arXiv:2410.11730  [pdf, other]

    cs.CV cs.AI eess.IV

    Patch-Based Diffusion Models Beat Whole-Image Models for Mismatched Distribution Inverse Problems

    Authors: Jason Hu, Bowen Song, Jeffrey A. Fessler, Liyue Shen

    Abstract: Diffusion models have achieved excellent success in solving inverse problems due to their ability to learn strong image priors, but existing approaches require a large training dataset of images that should come from the same distribution as the test dataset. When the training and test distributions are mismatched, artifacts and hallucinations can occur in reconstructed images due to the incorrect…

    Submitted 15 October, 2024; originally announced October 2024.

  22. arXiv:2410.05272  [pdf]

    eess.IV cs.CV

    DVS: Blood cancer detection using novel CNN-based ensemble approach

    Authors: Md Taimur Ahad, Israt Jahan Payel, Bo Song, Yan Li

    Abstract: Blood cancer can only be diagnosed properly if it is detected early. Each year, more than 1.24 million new cases of blood cancer are reported worldwide. There are about 6,000 cancers worldwide due to this disease. The importance of cancer detection and classification has prompted researchers to evaluate Deep Convolutional Neural Networks for the purpose of classifying blood cancers. The objective…

    Submitted 12 September, 2024; originally announced October 2024.

  23. arXiv:2409.18695  [pdf, other]

    cs.AI cs.CE cs.CL

    KALE-LM: Unleash The Power Of AI For Science Via Knowledge And Logic Enhanced Large Model

    Authors: Weichen Dai, Yezeng Chen, Zijie Dai, Yubo Liu, Zhijie Huang, Yixuan Pan, Baiyang Song, Chengli Zhong, Xinhe Li, Zeyu Wang, Zhuoying Feng, Yi Zhou

    Abstract: Artificial intelligence is gradually demonstrating its immense potential, and increasing attention is being given to how AI can be harnessed to advance scientific research. In this vision paper, we present our perspectives on how AI can better assist scientific inquiry and explore corresponding technical approach. We have proposed and open-sourced two large models of our KALE-LM model series, KALE…

    Submitted 7 April, 2025; v1 submitted 27 September, 2024; originally announced September 2024.

  24. arXiv:2409.16728  [pdf, other]

    eess.IV cs.CV

    SDCL: Students Discrepancy-Informed Correction Learning for Semi-supervised Medical Image Segmentation

    Authors: Bentao Song, Qingfeng Wang

    Abstract: Semi-supervised medical image segmentation (SSMIS) has been demonstrated the potential to mitigate the issue of limited medical labeled data. However, confirmation and cognitive biases may affect the prevalent teacher-student based SSMIS methods due to erroneous pseudo-labels. To tackle this challenge, we improve the mean teacher approach and propose the Students Discrepancy-Informed Correction Le…

    Submitted 4 October, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

    Comments: Accepted at MICCAI 2024

  25. arXiv:2409.12926  [pdf]

    cs.CV cs.AI

    MaskMol: Knowledge-guided Molecular Image Pre-Training Framework for Activity Cliffs

    Authors: Zhixiang Cheng, Hongxin Xiang, Pengsen Ma, Li Zeng, Xin Jin, Xixi Yang, Jianxin Lin, Yang Deng, Bosheng Song, Xinxin Feng, Changhui Deng, Xiangxiang Zeng

    Abstract: Activity cliffs, which refer to pairs of molecules that are structurally similar but show significant differences in their potency, can lead to model representation collapse and make the model challenging to distinguish them. Our research indicates that as molecular similarity increases, graph-based methods struggle to capture these nuances, whereas image-based approaches effectively retain the di…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 33 pages, 5 figures

  26. arXiv:2409.06689  [pdf]

    eess.IV cs.CV

    A comprehensive study on Blood Cancer detection and classification using Convolutional Neural Network

    Authors: Md Taimur Ahad, Sajib Bin Mamun, Sumaya Mustofa, Bo Song, Yan Li

    Abstract: Over the years in object detection several efficient Convolutional Neural Networks (CNN) networks, such as DenseNet201, InceptionV3, ResNet152v2, SEresNet152, VGG19, Xception gained significant attention due to their performance. Moreover, CNN paradigms have expanded to transfer learning and ensemble models from original CNN architectures. Research studies suggest that transfer learning and ensemb…

    Submitted 10 September, 2024; originally announced September 2024.

  27. arXiv:2409.04937  [pdf, other]

    cs.SE

    CONNECTOR: Enhancing the Traceability of Decentralized Bridge Applications via Automatic Cross-chain Transaction Association

    Authors: Dan Lin, Jiajing Wu, Yuxin Su, Ziye Zheng, Yuhong Nan, Qinnan Zhang, Bowen Song, Zibin Zheng

    Abstract: Decentralized bridge applications are important software that connects various blockchains and facilitates cross-chain asset transfer in the decentralized finance (DeFi) ecosystem which currently operates in a multi-chain environment. Cross-chain transaction association identifies and matches unique transactions executed by bridge DApps, which is important research to enhance the traceability of c…

    Submitted 19 December, 2024; v1 submitted 7 September, 2024; originally announced September 2024.

  28. arXiv:2408.13335  [pdf, other]

    cs.CV

    Latent Space Disentanglement in Diffusion Transformers Enables Zero-shot Fine-grained Semantic Editing

    Authors: Zitao Shuai, Chenwei Wu, Zhengxu Tang, Bowen Song, Liyue Shen

    Abstract: Diffusion Transformers (DiTs) have achieved remarkable success in diverse and high-quality text-to-image(T2I) generation. However, how text and image latents individually and jointly contribute to the semantics of generated images, remain largely unexplored. Through our investigation of DiT's latent space, we have uncovered key findings that unlock the potential for zero-shot fine-grained semantic…

    Submitted 23 August, 2024; originally announced August 2024.

  29. arXiv:2408.10679  [pdf, other]

    cs.CV

    DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba

    Authors: Shuning Xu, Xina Liu, Binbin Song, Xiangyu Chen, Qiubo Chen, Jiantao Zhou

    Abstract: Moire patterns, resulting from the interference of two similar repetitive patterns, are frequently observed during the capture of images or videos on screens. These patterns vary in color, shape, and location across video frames, posing challenges in extracting information from adjacent frames and preserving temporal consistency throughout the restoration process. Existing deep learning methods of…

    Submitted 18 November, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  30. arXiv:2408.05136  [pdf, ps, other]

    cs.LG

    Cycle-Configuration: A Novel Graph-theoretic Descriptor Set for Molecular Inference

    Authors: Bowen Song, Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

    Abstract: In this paper, we propose a novel family of descriptors of chemical graphs, named cycle-configuration (CC), that can be used in the standard "two-layered (2L) model" of mol-infer, a molecular inference framework based on mixed integer linear programming (MILP) and machine learning (ML). Proposed descriptors capture the notion of ortho/meta/para patterns that appear in aromatic rings, which has bee…

    Submitted 9 August, 2024; originally announced August 2024.

  31. arXiv:2408.03704  [pdf, ps, other]

    cs.CR

    BioDeepHash: Mapping Biometrics into a Stable Code

    Authors: Baogang Song, Dongdong Zhao, Jiang Yan, Huanhuan Li, Hao Jiang

    Abstract: With the wide application of biometrics, more and more attention has been paid to the security of biometric templates. However most of existing biometric template protection (BTP) methods have some security problems, e.g. the problem that protected templates leak part of the original biometric data (exists in Cancelable Biometrics (CB)), the use of error-correcting codes (ECC) leads to decodable a…

    Submitted 7 August, 2024; originally announced August 2024.

  32. arXiv:2407.12676  [pdf, other]

    cs.CV eess.IV

    CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems

    Authors: Jiankun Zhao, Bowen Song, Liyue Shen

    Abstract: Diffusion models have been demonstrated as strong priors for solving general inverse problems. Most existing Diffusion model-based Inverse Problem Solvers (DIS) employ a plug-and-play approach to guide the sampling trajectory with either projections or gradients. Though effective, these methods generally necessitate hundreds of sampling steps, posing a dilemma between inference time and reconstruc…

    Submitted 17 July, 2024; originally announced July 2024.

  33. arXiv:2407.09030  [pdf, other]

    eess.IV cs.CV

    CAMP: Continuous and Adaptive Learning Model in Pathology

    Authors: Anh Tien Nguyen, Keunho Byeon, Kyungeun Kim, Boram Song, Seoung Wan Chae, Jin Tae Kwak

    Abstract: There exist numerous diagnostic tasks in pathology. Conventional computational pathology formulates and tackles them as independent and individual image classification problems, thereby resulting in computational inefficiency and high costs. To address the challenges, we propose a generic, unified, and universal framework, called a continuous and adaptive learning model in pathology (CAMP), for pa…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Under review

  34. arXiv:2407.08503  [pdf, other]

    eess.IV cs.CV

    DIOR-ViT: Differential Ordinal Learning Vision Transformer for Cancer Classification in Pathology Images

    Authors: Ju Cheon Lee, Keunho Byeon, Boram Song, Kyungeun Kim, Jin Tae Kwak

    Abstract: In computational pathology, cancer grading has been mainly studied as a categorical classification problem, which does not utilize the ordering nature of cancer grades such as the higher the grade is, the worse the cancer is. To incorporate the ordering relationship among cancer grades, we introduce a differential ordinal learning problem in which we define and learn the degree of difference in th…

    Submitted 10 July, 2024; originally announced July 2024.

  35. arXiv:2406.10744  [pdf, other]

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  36. arXiv:2406.10225  [pdf, other]

    cs.CV

    SatDiffMoE: A Mixture of Estimation Method for Satellite Image Super-resolution with Latent Diffusion Models

    Authors: Zhaoxu Luo, Bowen Song, Liyue Shen

    Abstract: During the acquisition of satellite images, there is generally a trade-off between spatial resolution and temporal resolution (acquisition frequency) due to the onboard sensors of satellite imaging systems. High-resolution satellite images are very important for land crop monitoring, urban planning, wildfire management and a variety of applications. It is a significant yet challenging task to achi…

    Submitted 18 November, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024 Workshop on Advancing Neural Network Training (WANT): Computational Efficiency, Scalability, and Resource Optimization

  37. arXiv:2406.10211  [pdf, other]

    cs.CV

    DiffusionBlend: Learning 3D Image Prior through Position-aware Diffusion Score Blending for 3D Computed Tomography Reconstruction

    Authors: Bowen Song, Jason Hu, Zhaoxu Luo, Jeffrey A. Fessler, Liyue Shen

    Abstract: Diffusion models face significant challenges when employed for large-scale medical image reconstruction in real practice such as 3D Computed Tomography (CT). Due to the demanding memory, time, and data requirements, it is difficult to train a diffusion model directly on the entire volume of high-dimensional data to obtain an efficient 3D diffusion prior. Existing works utilizing diffusion priors o…

    Submitted 14 June, 2024; originally announced June 2024.

  38. arXiv:2406.09716  [pdf, ps, other]

    cs.CR cs.AI cs.DC cs.LG

    Speed-up of Data Analysis with Kernel Trick in Encrypted Domain

    Authors: Joon Soo Yoo, Baek Kyung Song, Tae Min Ahn, Ji Won Heo, Ji Won Yoon

    Abstract: Homomorphic encryption (HE) is pivotal for secure computation on encrypted data, crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, enhancing time performanc…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted as a preprint

  39. arXiv:2406.02462  [pdf, other]

    cs.CV cs.AI

    Learning Image Priors through Patch-based Diffusion Models for Solving Inverse Problems

    Authors: Jason Hu, Bowen Song, Xiaojian Xu, Liyue Shen, Jeffrey A. Fessler

    Abstract: Diffusion models can learn strong image priors from underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires lots of data. Such bottlenecks prevent most existing works from being feasible for high-dimensional and high-resolution data such as 3D images. This paper proposes a method to learn an efficient data prior for th…

    Submitted 30 October, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  40. arXiv:2406.01538  [pdf, other]

    cs.CL cs.AI

    What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores

    Authors: Ebrahim Feghhi, Nima Hadidi, Bryan Song, Idan A. Blank, Jonathan C. Kao

    Abstract: Given the remarkable capabilities of large language models (LLMs), there has been a growing interest in evaluating their similarity to the human brain. One approach towards quantifying this similarity is by measuring how well a model predicts neural signals, also called "brain score". Internal representations from LLMs achieve state-of-the-art brain scores, leading to speculation that they share c…

    Submitted 20 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 figures in the main paper

  41. arXiv:2405.13651  [pdf]

    cs.AI cs.RO

    ConcertoRL: An Innovative Time-Interleaved Reinforcement Learning Approach for Enhanced Control in Direct-Drive Tandem-Wing Vehicles

    Authors: Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang

    Abstract: In control problems for insect-scale direct-drive experimental platforms under tandem wing influence, the primary challenge facing existing reinforcement learning models is their limited safety in the exploration process and the stability of the continuous training process. We introduce the ConcertoRL algorithm to enhance control precision and stabilize the online training process, which consists…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 48 pages, 35 figures

    MSC Class: 68T40; ACM Class: I.2.9

  42. arXiv:2405.09819  [pdf]

    cs.SE cs.LG

    Automating the Training and Deployment of Models in MLOps by Integrating Systems with Machine Learning

    Authors: Penghao Liang, Bo Song, Xiaoan Zhan, Zhou Chen, Jiaqiang Yuan

    Abstract: This article introduces the importance of machine learning in real-world applications and explores the rise of MLOps (Machine Learning Operations) and its importance for solving challenges such as model deployment and performance monitoring. By reviewing the evolution of MLOps and its relationship to traditional software development methods, the paper proposes ways to integrate the system into mac…

    Submitted 16 May, 2024; originally announced May 2024.

  43. arXiv:2405.06655  [pdf]

    q-bio.BM cs.AI cs.LG

    RNA Secondary Structure Prediction Using Transformer-Based Deep Learning Models

    Authors: Yanlin Zhou, Tong Zhan, Yichao Wu, Bo Song, Chenxi Shi

    Abstract: The Human Genome Project has led to an exponential increase in data related to the sequence, structure, and function of biomolecules. Bioinformatics is an interdisciplinary research field that primarily uses computational methods to analyze large amounts of biological macromolecule data. Its goal is to discover hidden biological patterns and related information. Furthermore, analysing additional r…

    Submitted 14 April, 2024; originally announced May 2024.

  44. arXiv:2404.03893  [pdf, other]

    cs.AI

    KGExplainer: Towards Exploring Connected Subgraph Explanations for Knowledge Graph Completion

    Authors: Tengfei Ma, Xiang Song, Wen Tao, Mufei Li, Jiani Zhang, Xiaoqin Pan, Jianxin Lin, Bosheng Song, Xiangxiang Zeng

    Abstract: Knowledge graph completion (KGC) aims to alleviate the inherent incompleteness of knowledge graphs (KGs), which is a critical task for various applications, such as recommendations on the web. Although knowledge graph embedding (KGE) models have demonstrated superior predictive performance on KGC tasks, these models infer missing links in a black-box manner that lacks transparency and accountabili…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 13 pages, 7 figures, 11 tables. Under Review

  45. arXiv:2403.09962  [pdf]

    cs.CV

    ViTCN: Vision Transformer Contrastive Network For Reasoning

    Authors: Bo Song, Yuanhao Xu, Yichao Wu

    Abstract: Machine learning models have achieved significant milestones in various domains: computer vision models achieve exceptional results in object recognition, and in natural language processing, Large Language Models (LLMs) like GPT can hold a conversation with human-like proficiency. However, abstract reasoning remains a challenge for these models. Can AI really think like a huma…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 5 pages, 2 figures, in Proceedings of the 5th International Seminar on Artificial Intelligence, Networking and Information Technology

  46. arXiv:2402.06423  [pdf, other]

    cs.CV

    CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention

    Authors: Yifeng Bai, Zhirong Chen, Pengpeng Liang, Bo Song, Erkang Cheng

    Abstract: In autonomous driving, accurate 3D lane detection using monocular cameras is important for downstream tasks. Recent CNN and Transformer approaches usually apply a two-stage model design. The first stage transforms the image feature from a front image into a bird's-eye-view (BEV) representation. Subsequently, a sub-network processes the BEV feature to generate the 3D detection results. However, the…

    Submitted 16 March, 2025; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2209.07989

  47. Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization

    Authors: Bo Yang, Chen Wang, Xiaoshuang Ma, Beiping Song, Zhuang Liu, Fangde Sun

    Abstract: Effectively and efficiently retrieving images from remote sensing databases is a critical challenge in the realm of remote sensing big data. Utilizing hand-drawn sketches as retrieval inputs offers intuitive and user-friendly advantages, yet the potential of multi-level feature integration from sketches remains underexplored, leading to suboptimal retrieval performance. To address this gap, our st…

    Submitted 15 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: 44 pages, 6 figures

    Journal ref: Remote Sens. 2024, 16, 1653

  48. arXiv:2401.00241  [pdf]

    cs.CV

    Image Super-resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features

    Authors: Yuming Huang, Yingpin Chen, Changhui Wu, Hanrong Xie, Binhui Song, Hui Wang

    Abstract: The Swin Transformer image super-resolution reconstruction network only relies on the long-range relationship of window attention and shifted window attention to explore features. This mechanism has two limitations. On the one hand, it only focuses on global features while ignoring local features. On the other hand, it is only concerned with spatial feature interactions while ignoring channel feat…

    Submitted 5 April, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

  49. arXiv:2312.09063  [pdf, other]

    eess.IV cs.CV

    Image Demoireing in RAW and sRGB Domains

    Authors: Shuning Xu, Binbin Song, Xiangyu Chen, Xina Liu, Jiantao Zhou

    Abstract: Moire patterns frequently appear when capturing screens with smartphones or cameras, potentially compromising image quality. Previous studies suggest that moire pattern elimination in the RAW domain offers greater effectiveness compared to demoireing in the sRGB domain. Nevertheless, relying solely on RAW data for image demoireing is insufficient in mitigating the color cast due to the absence of…

    Submitted 18 November, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted in ECCV'24

  50. Learning to Denoise Biomedical Knowledge Graph for Robust Molecular Interaction Prediction

    Authors: Tengfei Ma, Yujie Chen, Wen Tao, Dashun Zheng, Xuan Lin, Patrick Cheong-lao Pang, Yiping Liu, Yijun Wang, Longyue Wang, Bosheng Song, Xiangxiang Zeng, Philip S. Yu

    Abstract: Molecular interaction prediction plays a crucial role in forecasting unknown interactions between molecules, such as drug-target interaction (DTI) and drug-drug interaction (DDI), which are essential in the field of drug discovery and therapeutics. Although previous prediction methods have yielded promising results by leveraging the rich semantics and topological structure of biomedical knowledge…

    Submitted 22 October, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: 13 pages, Accepted at TKDE