Showing 1–50 of 73 results for author: Feng, A

Searching in archive cs.
  1. arXiv:2506.09114  [pdf, other]

    cs.LG

    TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval

    Authors: Jialin Chen, Ziyu Zhao, Gaukhar Nurbek, Aosong Feng, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, Rex Ying

    Abstract: The ubiquity of dynamic data in domains such as weather, healthcare, and energy underscores a growing need for effective interpretation and retrieval of time-series data. These data are inherently tied to domain-specific contexts, such as clinical notes or weather narratives, making cross-modal retrieval essential not only for downstream tasks but also for developing robust time-series foundation… ▽ More

    Submitted 10 June, 2025; originally announced June 2025.

  2. arXiv:2505.19590  [pdf, other]

    cs.LG cs.CL

    Learning to Reason without External Rewards

    Authors: Xuandong Zhao, Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song

    Abstract: Training large language models (LLMs) for complex reasoning via Reinforcement Learning with Verifiable Rewards (RLVR) is effective but limited by reliance on costly, domain-specific supervision. We explore Reinforcement Learning from Internal Feedback (RLIF), a framework that enables LLMs to learn from intrinsic signals without external rewards or labeled data. We propose Intuitor, an RLIF method… ▽ More

    Submitted 26 May, 2025; originally announced May 2025.

  3. arXiv:2505.16186  [pdf, ps, other]

    cs.AI cs.CL cs.CR

    SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

    Authors: Kaiwen Zhou, Xuandong Zhao, Gaowen Liu, Jayanth Srinivasa, Aosong Feng, Dawn Song, Xin Eric Wang

    Abstract: Large Reasoning Models (LRMs) introduce a new generation paradigm of explicitly reasoning before answering, leading to remarkable improvements in complex tasks. However, they pose great safety risks against harmful queries and adversarial attacks. While recent mainstream safety efforts on LRMs, supervised fine-tuning (SFT), improve safety performance, we find that SFT-aligned models struggle to ge… ▽ More

    Submitted 21 May, 2025; originally announced May 2025.

  4. arXiv:2504.10816  [pdf, other]

    cs.IR cs.CL

    CSPLADE: Learned Sparse Retrieval with Causal Language Models

    Authors: Zhichao Xu, Aosong Feng, Yijun Tian, Haibo Ding, Lin Lee Cheong

    Abstract: In recent years, dense retrieval has been the focus of information retrieval (IR) research. While effective, dense retrieval produces uninterpretable dense vectors, and suffers from the drawback of large index size. Learned sparse retrieval (LSR) has emerged as a promising alternative, achieving competitive retrieval performance while also being able to leverage the classical inverted index data str… ▽ More

    Submitted 16 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  5. arXiv:2504.03909  [pdf, other]

    cs.CR cs.DC cs.ET

    Secure Federated XGBoost with CUDA-accelerated Homomorphic Encryption via NVIDIA FLARE

    Authors: Ziyue Xu, Yuan-Ting Hsieh, Zhihong Zhang, Holger R. Roth, Chester Chen, Yan Cheng, Andrew Feng

    Abstract: Federated learning (FL) enables collaborative model training across decentralized datasets. NVIDIA FLARE's Federated XGBoost extends the popular XGBoost algorithm to both vertical and horizontal federated settings, facilitating joint model development without direct data sharing. However, the initial implementation assumed mutual trust over the sharing of intermediate gradient statistics produced… ▽ More

    Submitted 4 April, 2025; originally announced April 2025.

  6. arXiv:2504.00609  [pdf, other]

    cs.CV cs.LG

    Bi-Grid Reconstruction for Image Anomaly Detection

    Authors: Huichuan Huang, Zhiqing Zhong, Guangyu Wei, Yonghao Wan, Wenlong Sun, Aimin Feng

    Abstract: In image anomaly detection, significant advancements have been made using un- and self-supervised methods with datasets containing only normal samples. However, these approaches often struggle with fine-grained anomalies. This paper introduces GRAD: Bi-Grid Reconstruction for Image Anomaly Detection, which employs two continuous grids to enhance anomaly… ▽ More

    Submitted 1 April, 2025; originally announced April 2025.

  7. arXiv:2503.19786  [pdf, other]

    cs.CL cs.AI

    Gemma 3 Technical Report

    Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin, et al. (191 additional authors not shown)

    Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achie… ▽ More

    Submitted 25 March, 2025; originally announced March 2025.

  8. arXiv:2503.16858  [pdf, other]

    cs.CL cs.AI

    MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering

    Authors: Jialin Chen, Aosong Feng, Ziyu Zhao, Juan Garza, Gaukhar Nurbek, Cheng Qin, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, Rex Ying

    Abstract: Understanding the relationship between textual news and time-series evolution is a critical yet under-explored challenge in applied data science. While multimodal learning has gained traction, existing multimodal time-series datasets fall short in evaluating cross-modal reasoning and complex question answering, which are essential for capturing complex interactions between narrative information an… ▽ More

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 14 pages

  9. arXiv:2503.10497  [pdf, other]

    cs.CL

    MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation

    Authors: Weihao Xuan, Rui Yang, Heli Qi, Qingcheng Zeng, Yunze Xiao, Aosong Feng, Dairui Liu, Yun Xing, Junjue Wang, Fan Gao, Jinghui Lu, Yuang Jiang, Huitao Li, Xin Li, Kunyu Yu, Ruihai Dong, Shangding Gu, Yuekang Li, Xiaofei Xie, Felix Juefei-Xu, Foutse Khomh, Osamu Yoshie, Qingyu Chen, Douglas Teodoro, Nan Liu, et al. (7 additional authors not shown)

    Abstract: Existing large language model (LLM) evaluation benchmarks primarily focus on English, while current multilingual tasks lack parallel questions that specifically assess cross-linguistic reasoning abilities. This dual limitation makes it challenging to comprehensively assess LLMs' performance in the multilingual setting. To fill this gap, we introduce MMLU-ProX, a comprehensive benchmark covering 29… ▽ More

    Submitted 26 May, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  10. arXiv:2503.08933  [pdf, other]

    cs.CV

    PromptGAR: Flexible Promptive Group Activity Recognition

    Authors: Zhangyu Jin, Andrew Feng, Ankur Chemburkar, Celso M. De Melo

    Abstract: We present PromptGAR, a novel framework that addresses the limitations of current Group Activity Recognition (GAR) approaches by leveraging multi-modal prompts to achieve both input flexibility and high recognition accuracy. The existing approaches suffer from limited real-world applicability due to their reliance on full prompt annotations, the lack of long-term actor consistency, and under-explo… ▽ More

    Submitted 11 March, 2025; originally announced March 2025.

  11. arXiv:2502.15786  [pdf, ps, other]

    q-bio.NC cs.AI cs.LG eess.SP

    MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding

    Authors: Weikang Qiu, Zheng Huang, Haoyu Hu, Aosong Feng, Yujun Yan, Rex Ying

    Abstract: Decoding functional magnetic resonance imaging (fMRI) signals into text has been a key challenge in the neuroscience community, with the potential to advance brain-computer interfaces and uncover deeper insights into brain mechanisms. However, existing approaches often struggle with suboptimal predictive performance, limited task variety, and poor generalization across subjects. In response to thi… ▽ More

    Submitted 6 June, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Forty-Second International Conference on Machine Learning (ICML 2025)

  12. arXiv:2502.06146  [pdf, other]

    cs.LG cs.AI

    Guided Exploration for Efficient Relational Model Learning

    Authors: Annie Feng, Nishanth Kumar, Tomas Lozano-Perez, Leslie Pack-Kaelbling

    Abstract: Efficient exploration is critical for learning relational models in large-scale environments with complex, long-horizon tasks. Random exploration methods often collect redundant or irrelevant data, limiting their ability to learn accurate relational models of the environment. Goal-literal babbling (GLIB) improves upon random exploration by setting and planning to novel goals, but its reliance on r… ▽ More

    Submitted 10 May, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

  13. arXiv:2501.18630  [pdf, other]

    cs.CV cs.GR

    Deformable Beta Splatting

    Authors: Rong Liu, Dylan Sun, Meida Chen, Yue Wang, Andrew Feng

    Abstract: 3D Gaussian Splatting (3DGS) has advanced radiance field reconstruction by enabling real-time rendering. However, its reliance on Gaussian kernels for geometry and low-order Spherical Harmonics (SH) for color encoding limits its ability to capture complex geometries and diverse colors. We introduce Deformable Beta Splatting (DBS), a deformable and compact approach that enhances both geometry and c… ▽ More

    Submitted 6 May, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: SIGGRAPH 2025

  14. SplatMAP: Online Dense Monocular SLAM with 3D Gaussian Splatting

    Authors: Yue Hu, Rong Liu, Meida Chen, Peter Beerel, Andrew Feng

    Abstract: Achieving high-fidelity 3D reconstruction from monocular video remains challenging due to the inherent limitations of traditional methods like Structure-from-Motion (SfM) and monocular SLAM in accurately capturing scene details. While differentiable rendering techniques such as Neural Radiance Fields (NeRF) address some of these challenges, their high computational costs make them unsuitable for r… ▽ More

    Submitted 11 April, 2025; v1 submitted 12 January, 2025; originally announced January 2025.

    ACM Class: I.4

    Journal ref: Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D '25) May 2025

  15. arXiv:2412.13163  [pdf, other]

    cs.DC cs.IR

    C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System

    Authors: Parker Addison, Minh-Tuan H. Nguyen, Tomislav Medan, Jinali Shah, Mohammad T. Manzari, Brendan McElrone, Laksh Lalwani, Aboli More, Smita Sharma, Holger R. Roth, Isaac Yang, Chester Chen, Daguang Xu, Yan Cheng, Andrew Feng, Ziyue Xu

    Abstract: Organizations seeking to utilize Large Language Models (LLMs) for knowledge querying and analysis often encounter challenges in maintaining an LLM fine-tuned on targeted, up-to-date information that keeps answers relevant and grounded. Retrieval Augmented Generation (RAG) has quickly become a feasible solution for organizations looking to overcome the challenges of maintaining proprietary models a… ▽ More

    Submitted 18 December, 2024; v1 submitted 17 December, 2024; originally announced December 2024.

  16. arXiv:2412.10902  [pdf]

    cs.CV

    Enhancing Road Crack Detection Accuracy with BsS-YOLO: Optimizing Feature Fusion and Attention Mechanisms

    Authors: Jiaze Tang, Angzehua Feng, Vladimir Korkhov, Yuxi Pu

    Abstract: Effective road crack detection is crucial for road safety, infrastructure preservation, and extending road lifespan, offering significant economic benefits. However, existing methods struggle with varied target scales, complex backgrounds, and low adaptability to different environments. This paper presents the BsS-YOLO model, which optimizes multi-scale feature fusion through an enhanced Path Aggr… ▽ More

    Submitted 14 December, 2024; originally announced December 2024.

    MSC Class: I.2.10

  17. arXiv:2412.06268  [pdf]

    cs.CV

    Open-Vocabulary High-Resolution 3D (OVHR3D) Data Segmentation and Annotation Framework

    Authors: Jiuyi Xu, Meida Chen, Andrew Feng, Zifan Yu, Yangming Shi

    Abstract: In the domain of the U.S. Army modeling and simulation, the availability of high quality annotated 3D data is pivotal to creating virtual environments for training and simulations. Traditional methodologies for 3D semantic and instance segmentation, such as KpConv, RandLA, Mask3D, etc., are designed to train on extensive labeled datasets to obtain satisfactory performance in practical tasks. This… ▽ More

    Submitted 18 December, 2024; v1 submitted 9 December, 2024; originally announced December 2024.

    Journal ref: Interservice/Industry Training, Simulation and Education Conference (2024)

  18. arXiv:2410.20926  [pdf, other]

    cs.CL

    Long Sequence Modeling with Attention Tensorization: From Sequence to Tensor Learning

    Authors: Aosong Feng, Rex Ying, Leandros Tassiulas

    Abstract: As the demand for processing extended textual data grows, the ability to handle long-range dependencies and maintain computational efficiency is more critical than ever. One of the key issues for long-sequence modeling using attention-based models is the mismatch between the limited-range modeling power of full attention and the long-range token dependency in the input sequence. In this work, we pr… ▽ More

    Submitted 22 May, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

  19. arXiv:2410.17600  [pdf, other]

    cs.CL cs.AI cs.DB

    Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective

    Authors: Rui Yang, Boming Yang, Aosong Feng, Sixun Ouyang, Moritz Blum, Tianwei She, Yuang Jiang, Freddy Lecue, Jinghui Lu, Irene Li

    Abstract: Knowledge Graphs (KGs) are crucial in the field of artificial intelligence and are widely used in downstream tasks, such as question-answering (QA). The construction of KGs typically requires significant effort from domain experts. Large Language Models (LLMs) have recently been used for Knowledge Graph Construction (KGC). However, most existing approaches focus on a local perspective, extracting… ▽ More

    Submitted 3 February, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2407.10794

  20. arXiv:2410.04010  [pdf, other]

    cs.LG cs.AI cs.CL cs.NE

    Hyperbolic Fine-tuning for Large Language Models

    Authors: Menglin Yang, Aosong Feng, Bo Xiong, Jihong Liu, Irwin King, Rex Ying

    Abstract: Large language models (LLMs) have demonstrated remarkable performance on various tasks. However, it remains an open question whether the default Euclidean space is the most suitable choice for embedding tokens in LLMs. In this study, we first investigate the non-Euclidean characteristics of LLMs. Our findings reveal that token frequency follows a power-law distribution, with high-frequency tokens… ▽ More

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: The preliminary work was accepted for the ICML 2024 LLM Cognition Workshop, and this version includes new investigations, analyses, experiments, and results

  21. arXiv:2409.16685  [pdf, other]

    cs.CV

    Skyeyes: Ground Roaming using Aerial View Images

    Authors: Zhiyuan Gao, Wenbin Teng, Gonglin Chen, Jinsen Wu, Ningli Xu, Rongjun Qin, Andrew Feng, Yajie Zhao

    Abstract: Integrating aerial imagery-based scene generation into applications like autonomous driving and gaming enhances realism in 3D environments, but challenges remain in creating detailed content for occluded areas and ensuring real-time, consistent rendering. In this paper, we introduce Skyeyes, a novel framework that can generate photorealistic sequences of ground view images using only aerial view i… ▽ More

    Submitted 25 September, 2024; originally announced September 2024.

  22. arXiv:2409.02310  [pdf, other]

    cs.CV

    Geometry-Aware Feature Matching for Large-Scale Structure from Motion

    Authors: Gonglin Chen, Jinsen Wu, Haiwei Chen, Wenbin Teng, Zhiyuan Gao, Andrew Feng, Rongjun Qin, Yajie Zhao

    Abstract: Establishing consistent and dense correspondences across multiple images is crucial for Structure from Motion (SfM) systems. Significant view changes, such as air-to-ground with very sparse view overlap, pose an even greater challenge to the correspondence solvers. We present a novel optimization-based approach that significantly enhances existing feature matching methods by introducing geometry c… ▽ More

    Submitted 12 May, 2025; v1 submitted 3 September, 2024; originally announced September 2024.

  23. arXiv:2407.10794  [pdf, other]

    cs.CL cs.AI

    Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education

    Authors: Rui Yang, Boming Yang, Sixun Ouyang, Tianwei She, Aosong Feng, Yuang Jiang, Freddy Lecue, Jinghui Lu, Irene Li

    Abstract: Knowledge graphs (KGs) are crucial in the field of artificial intelligence and are widely applied in downstream tasks, such as enhancing Question Answering (QA) systems. The construction of KGs typically requires significant effort from domain experts. Recently, Large Language Models (LLMs) have been used for knowledge graph construction (KGC), however, most existing approaches focus on a local pe… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 24 pages, 11 figures, 13 tables. arXiv admin note: substantial text overlap with arXiv:2402.14293

  24. arXiv:2407.04236  [pdf, other]

    cs.LG

    Graph Pooling via Ricci Flow

    Authors: Amy Feng, Melanie Weber

    Abstract: Graph Machine Learning often involves the clustering of nodes based on similarity structure encoded in the graph's topology and the nodes' attributes. On homophilous graphs, the integration of pooling layers has been shown to enhance the performance of Graph Neural Networks by accounting for inherent multi-scale structure. Here, similar nodes are grouped together to coarsen the graph and reduce th… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 32 pages, 7 figures

  25. arXiv:2407.00031  [pdf, other]

    cs.DC cs.SE

    Supercharging Federated Learning with Flower and NVIDIA FLARE

    Authors: Holger R. Roth, Daniel J. Beutel, Yan Cheng, Javier Fernandez Marques, Heng Pan, Chester Chen, Zhihong Zhang, Yuhong Wen, Sean Yang, Isaac Yang, Yuan-Ting Hsieh, Ziyue Xu, Daguang Xu, Nicholas D. Lane, Andrew Feng

    Abstract: Several open-source systems, such as Flower and NVIDIA FLARE, have been developed in recent years while focusing on different aspects of federated learning (FL). Flower is dedicated to implementing a cohesive approach to FL, analytics, and evaluation. Over time, Flower has cultivated extensive strategies and algorithms tailored for FL application development, fostering a vibrant FL community in re… ▽ More

    Submitted 22 July, 2024; v1 submitted 21 May, 2024; originally announced July 2024.

    Comments: Added a figure comparing running a Flower application natively or within FLARE

  26. arXiv:2406.12072  [pdf, other]

    cs.AI cs.LG

    DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs

    Authors: Jiasheng Zhang, Jialin Chen, Menglin Yang, Aosong Feng, Shuang Liang, Jie Shao, Rex Ying

    Abstract: Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios, where each node and edge are associated with text descriptions, and both the graph structure and text descriptions evolve over time. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs, which hinders the potential advancement in many research fields. To address… ▽ More

    Submitted 4 November, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 28 pages, 13 figures, camera-ready version for NeurIPS 2024 Datasets and Benchmarks Track

  27. CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems

    Authors: Yanlin Feng, Sajjadur Rahman, Aaron Feng, Vincent Chen, Eser Kandogan

    Abstract: Compound AI systems (CASs) that employ LLMs as agents to accomplish knowledge-intensive tasks via interactions with tools and data retrievers have garnered significant interest within database and AI communities. While these systems have the potential to supplement typical analysis workflows of data analysts in enterprise data platforms, unfortunately, CASs are subject to the same data discovery c… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Governance, Understanding and Integration of Data for Effective and Responsible AI (GUIDE-AI '24), June 14, 2024, Santiago, AA, Chile

  28. arXiv:2405.12369  [pdf, other]

    cs.CV

    AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field

    Authors: Rong Liu, Rui Xu, Yue Hu, Meida Chen, Andrew Feng

    Abstract: 3D Gaussian Splatting (3DGS) has recently advanced radiance field reconstruction by offering superior capabilities for novel view synthesis and real-time rendering speed. However, its strategy of blending optimization and adaptive density control might lead to sub-optimal results; it can sometimes yield noisy geometry and blurry artifacts due to prioritizing optimizing large Gaussians at the cost… ▽ More

    Submitted 9 November, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: BMVC 2024

  29. arXiv:2404.01340  [pdf, other]

    cs.LG cs.AI

    From Similarity to Superiority: Channel Clustering for Time Series Forecasting

    Authors: Jialin Chen, Jan Eric Lenssen, Aosong Feng, Weihua Hu, Matthias Fey, Leandros Tassiulas, Jure Leskovec, Rex Ying

    Abstract: Time series forecasting has attracted significant attention in recent decades. Previous studies have demonstrated that the Channel-Independent (CI) strategy improves forecasting performance by treating different channels individually, while it leads to poor generalization on unseen instances and ignores potentially necessary interactions between channels. Conversely, the Channel-Dependent (CD) str… ▽ More

    Submitted 6 November, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: NeurIPS 2024

  30. arXiv:2403.10585  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Solving General Noisy Inverse Problem via Posterior Sampling: A Policy Gradient Viewpoint

    Authors: Haoyue Tang, Tian Xie, Aosong Feng, Hanyu Wang, Chenyang Zhang, Yang Bai

    Abstract: Solving image inverse problems (e.g., super-resolution and inpainting) requires generating a high fidelity image that matches the given input (the low-resolution image or the masked image). By using the input image as guidance, we can leverage a pretrained diffusion generative model to solve a wide range of image inverse tasks without task specific model fine-tuning. To precisely estimate the guid… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted and to Appear, AISTATS 2024

  31. arXiv:2403.04882  [pdf, other]

    cs.LG

    Efficient High-Resolution Time Series Classification via Attention Kronecker Decomposition

    Authors: Aosong Feng, Jialin Chen, Juan Garza, Brooklyn Berry, Francisco Salazar, Yifeng Gao, Rex Ying, Leandros Tassiulas

    Abstract: The high-resolution time series classification problem is essential due to the increasing availability of detailed temporal data in various domains. To tackle this challenge effectively, it is imperative that the state-of-the-art attention model is scalable to accommodate the growing sequence lengths typically encountered in high-resolution time series data, while also demonstrating robustness in… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

  32. arXiv:2403.04880  [pdf, other]

    cs.CV

    An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control

    Authors: Aosong Feng, Weikang Qiu, Jinbin Bai, Xiao Zhang, Zhen Dong, Kaicheng Zhou, Rex Ying, Leandros Tassiulas

    Abstract: Building on the success of text-to-image diffusion models (DPMs), image editing is an important application to enable human interaction with AI-generated content. Among various editing methods, editing within the prompt space gains more attention due to its capacity and simplicity of controlling semantics. However, since diffusion models are commonly pretrained on descriptive text captions, direct… ▽ More

    Submitted 26 January, 2025; v1 submitted 7 March, 2024; originally announced March 2024.

  33. arXiv:2402.14293  [pdf, other]

    cs.CL

    Leveraging Large Language Models for Concept Graph Recovery and Question Answering in NLP Education

    Authors: Rui Yang, Boming Yang, Sixun Ouyang, Tianwei She, Aosong Feng, Yuang Jiang, Freddy Lecue, Jinghui Lu, Irene Li

    Abstract: In the domain of Natural Language Processing (NLP), Large Language Models (LLMs) have demonstrated promise in text-generation tasks. However, their educational applications, particularly for domain-specific queries, remain underexplored. This study investigates LLMs' capabilities in educational scenarios, focusing on concept graph recovery and question-answering (QA). We assess LLMs' zero-shot per… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

  34. arXiv:2402.07792  [pdf, other]

    cs.LG cs.DC

    Empowering Federated Learning for Massive Models with NVIDIA FLARE

    Authors: Holger R. Roth, Ziyue Xu, Yuan-Ting Hsieh, Adithya Renduchintala, Isaac Yang, Zhihong Zhang, Yuhong Wen, Sean Yang, Kevin Lu, Kristopher Kersten, Camir Ricketts, Daguang Xu, Chester Chen, Yan Cheng, Andrew Feng

    Abstract: In the ever-evolving landscape of artificial intelligence (AI) and large language models (LLMs), handling and leveraging data effectively has become a critical challenge. Most state-of-the-art machine learning algorithms are data-centric. However, as the lifeblood of model performance, necessary data cannot always be centralized due to various factors such as privacy, regulation, geopolitics, copy… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  35. arXiv:2310.16002  [pdf, other]

    cs.CV

    Integrating View Conditions for Image Synthesis

    Authors: Jinbin Bai, Zhen Dong, Aosong Feng, Xiao Zhang, Tian Ye, Kaicheng Zhou

    Abstract: In the field of image processing, applying intricate semantic modifications within existing images remains an enduring challenge. This paper introduces a pioneering framework that integrates viewpoint information to enhance the control of image editing tasks, especially for interior design scenes. By surveying existing object editing methodologies, we distill three essential criteria -- consistenc… ▽ More

    Submitted 8 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted by IJCAI 2024

  36. arXiv:2309.10011  [pdf, other]

    cs.CV eess.IV

    Universal Photorealistic Style Transfer: A Lightweight and Adaptive Approach

    Authors: Rong Liu, Enyu Zhao, Zhiyuan Liu, Andrew Feng, Scott John Easley

    Abstract: Photorealistic style transfer aims to apply stylization while preserving the realism and structure of input content. However, existing methods often encounter challenges such as color tone distortions, dependency on pair-wise pre-training, inefficiency with high-resolution inputs, and the need for additional constraints in video style transfer tasks. To address these issues, we propose a Universal… ▽ More

    Submitted 20 November, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

  37. arXiv:2308.13420  [pdf, other]

    cs.NE cs.AI cs.LG

    Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities

    Authors: Yanjie Song, Yutong Wu, Yangyang Guo, Ran Yan, P. N. Suganthan, Yue Zhang, Witold Pedrycz, Swagatam Das, Rammohan Mallipeddi, Oladayo Solomon Ajani, Qiang Feng

    Abstract: Evolutionary algorithms (EA), a class of stochastic search methods based on the principles of natural evolution, have received widespread acclaim for their exceptional performance in various real-world optimization problems. While researchers worldwide have proposed a wide variety of EAs, certain limitations remain, such as slow convergence speed and poor generalization capabilities. Consequently,… ▽ More

    Submitted 27 January, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: 28 pages, 16 figures

    Report number: SWEVO-S-2023-00771

  38. arXiv:2307.15208  [pdf, other]

    eess.IV cs.CV

    Generative AI for Medical Imaging: extending the MONAI Framework

    Authors: Walter H. L. Pinaya, Mark S. Graham, Eric Kerfoot, Petru-Daniel Tudosiu, Jessica Dafflon, Virginia Fernandez, Pedro Sanchez, Julia Wolleb, Pedro F. da Costa, Ashay Patel, Hyungjin Chung, Can Zhao, Wei Peng, Zelong Liu, Xueyan Mei, Oeslle Lucena, Jong Chul Ye, Sotirios A. Tsaftaris, Prerna Dogra, Andrew Feng, Marc Modat, Parashkev Nachev, Sebastien Ourselin, M. Jorge Cardoso

    Abstract: Recent advances in generative AI have brought incredible breakthroughs in several areas, including medical imaging. These generative models have tremendous potential not only to help safely share medical data via synthetic datasets but also to perform an array of diverse applications, such as anomaly detection, image-to-image translation, denoising, and MRI reconstruction. However, due to the comp… ▽ More

    Submitted 27 July, 2023; originally announced July 2023.

  39. arXiv:2307.13560  [pdf, other]

    cs.CL

    XDLM: Cross-lingual Diffusion Language Model for Machine Translation

    Authors: Linyao Chen, Aosong Feng, Boming Yang, Zihui Li

    Abstract: Recently, diffusion models have excelled in image generation tasks and have also been applied to natural language processing (NLP) for controllable text generation. However, the application of diffusion models in a cross-lingual setting is less explored. Additionally, while pretraining with diffusion models has been studied within a single language, the potential of cross-lingual pretraining rema… ▽ More

    Submitted 30 July, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

  40. arXiv:2307.05780  [pdf]

    cs.CV

    Automated Artifact Detection in Ultra-widefield Fundus Photography of Patients with Sickle Cell Disease

    Authors: Anqi Feng, Dimitri Johnson, Grace R. Reilly, Loka Thangamathesvaran, Ann Nampomba, Mathias Unberath, Adrienne W. Scott, Craig Jones

    Abstract: Importance: Ultra-widefield fundus photography (UWF-FP) has shown utility in sickle cell retinopathy screening; however, image artifact may diminish quality and gradeability of images. Objective: To create an automated algorithm for UWF-FP artifact classification. Design: A neural network based automated artifact detection algorithm was designed to identify commonly encountered UWF-FP artifacts in… ▽ More

    Submitted 11 July, 2023; originally announced July 2023.

  41. arXiv:2305.10655  [pdf, other]

    eess.IV cs.CV cs.LG

    DeepEdit: Deep Editable Learning for Interactive Segmentation of 3D Medical Images

    Authors: Andres Diaz-Pinto, Pritesh Mehta, Sachidanand Alle, Muhammad Asad, Richard Brown, Vishwesh Nath, Alvin Ihsani, Michela Antonelli, Daniel Palkovics, Csaba Pinter, Ron Alkalay, Steve Pieper, Holger R. Roth, Daguang Xu, Prerna Dogra, Tom Vercauteren, Andrew Feng, Abood Quraini, Sebastien Ourselin, M. Jorge Cardoso

    Abstract: Automatic segmentation of medical images is a key step for diagnostic and interventional tasks. However, achieving this requires large amounts of annotated volumes, which can be a tedious and time-consuming task for expert annotators. In this paper, we introduce DeepEdit, a deep learning-based method for volumetric medical image annotation that allows automatic and semi-automatic segmentation, and…

    Submitted 17 May, 2023; originally announced May 2023.

  42. arXiv:2305.03319  [pdf, other]

    cs.CL

    HiPool: Modeling Long Documents Using Graph Neural Networks

    Authors: Irene Li, Aosong Feng, Dragomir Radev, Rex Ying

    Abstract: Encoding long sequences in Natural Language Processing (NLP) is a challenging problem. Though recent pretrained language models achieve satisfactory performance in many NLP tasks, they are still restricted by a pre-defined maximum length, making them difficult to extend to longer sequences. Some recent works therefore utilize hierarchies to model long sequences. However, most of them apply sequent…

    Submitted 14 May, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Journal ref: ACL 2023 main proceedings

  43. FIREBALL: A Dataset of Dungeons and Dragons Actual-Play with Structured Game State Information

    Authors: Andrew Zhu, Karmanya Aggarwal, Alexander Feng, Lara J. Martin, Chris Callison-Burch

    Abstract: Dungeons & Dragons (D&D) is a tabletop roleplaying game with complex natural language interactions between players and hidden state information. Recent work has shown that large language models (LLMs) that have access to state information can generate higher quality game turns than LLMs that use dialog history alone. However, previous work used game state information that was heuristically created…

    Submitted 25 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 21 pages, 2 figures. Accepted at ACL 2023

    Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 4171-4193

  44. arXiv:2303.12822  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Co-Speech Gesture Synthesis using Discrete Gesture Token Learning

    Authors: Shuhong Lu, Youngwoo Yoon, Andrew Feng

    Abstract: Synthesizing realistic co-speech gestures is an important and yet unsolved problem for creating believable motions that can drive a humanoid robot to interact and communicate with human users. Such capability will improve the impressions of the robots by human users and will find applications in education, training, and medical services. One challenge in learning the co-speech gesture model is tha…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: 8 pages, 3 figures, 3 tables

  45. Author as Character and Narrator: Deconstructing Personal Narratives from the r/AmITheAsshole Reddit Community

    Authors: Salvatore Giorgi, Ke Zhao, Alexander H. Feng, Lara J. Martin

    Abstract: In the r/AmITheAsshole subreddit, people anonymously share first person narratives that contain some moral dilemma or conflict and ask the community to judge who is at fault (i.e., who is "the asshole"). In general, first person narratives are a unique storytelling domain where the author is the narrator (the person telling the story) but can also be a character (the person living the story) and,…

    Submitted 15 March, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: Accepted to the 17th International AAAI Conference on Web and Social Media (ICWSM), 2023

    Journal ref: Proceedings of the International AAAI Conference on Web and Social Media (ICWSM) 2023, 17(1), 233-244

  46. arXiv:2301.06114  [pdf, other]

    eess.IV cs.LG

    Segmenting thalamic nuclei from manifold projections of multi-contrast MRI

    Authors: Chang Yan, Muhan Shao, Zhangxing Bian, Anqi Feng, Yuan Xue, Jiachen Zhuo, Rao P. Gullapalli, Aaron Carass, Jerry L. Prince

    Abstract: The thalamus is a subcortical gray matter structure that plays a key role in relaying sensory and motor signals within the brain. Its nuclei can atrophy or otherwise be affected by neurological disease and injuries including mild traumatic brain injury. Segmenting both the thalamus and its nuclei is challenging because of the relatively low contrast within and around the thalamus in conventional m…

    Submitted 31 January, 2023; v1 submitted 15 January, 2023; originally announced January 2023.

    Comments: 8 pages, 3 figures, 2023 SPIE-MI Image Processing

  47. arXiv:2211.02701  [pdf, other]

    cs.LG cs.AI cs.CV

    MONAI: An open-source framework for deep learning in healthcare

    Authors: M. Jorge Cardoso, Wenqi Li, Richard Brown, Nic Ma, Eric Kerfoot, Yiheng Wang, Benjamin Murrey, Andriy Myronenko, Can Zhao, Dong Yang, Vishwesh Nath, Yufan He, Ziyue Xu, Ali Hatamizadeh, Andriy Myronenko, Wentao Zhu, Yun Liu, Mingxin Zheng, Yucheng Tang, Isaac Yang, Michael Zephyr, Behrooz Hashemian, Sachidanand Alle, Mohammad Zalbagi Darestani, Charlie Budd, et al. (32 additional authors not shown)

    Abstract: Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geo…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: www.monai.io

  48. arXiv:2210.13291  [pdf, other]

    cs.LG cs.AI cs.CV cs.NI cs.SE

    NVIDIA FLARE: Federated Learning from Simulation to Real-World

    Authors: Holger R. Roth, Yan Cheng, Yuhong Wen, Isaac Yang, Ziyue Xu, Yuan-Ting Hsieh, Kristopher Kersten, Ahmed Harouni, Can Zhao, Kevin Lu, Zhihong Zhang, Wenqi Li, Andriy Myronenko, Dong Yang, Sean Yang, Nicola Rieke, Abood Quraini, Chester Chen, Daguang Xu, Nic Ma, Prerna Dogra, Mona Flores, Andrew Feng

    Abstract: Federated learning (FL) enables building robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and…

    Submitted 28 April, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted at the International Workshop on Federated Learning, NeurIPS 2022, New Orleans, USA (https://federated-learning.org/fl-neurips-2022); Revised version v2: added Key Components list, system metrics for homomorphic encryption experiment; Extended v3 for journal submission

    Journal ref: IEEE Data Eng. Bull., Vol. 46, No. 1, 2023

  49. arXiv:2210.11794  [pdf, other]

    cs.LG cs.CL

    Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences

    Authors: Aosong Feng, Irene Li, Yuang Jiang, Rex Ying

    Abstract: Efficient Transformers have been developed for long sequence modeling, due to their subquadratic memory and time complexity. Sparse Transformer is a popular approach to improving the efficiency of Transformers by restricting self-attention to locations specified by predefined sparse patterns. However, leveraging sparsity may sacrifice expressiveness compared to full attention, when important t…

    Submitted 31 January, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

  50. arXiv:2210.03534  [pdf, other]

    cs.NI

    A Quantitative Theory of Bottleneck Structures for Data Networks

    Authors: Jordi Ros-Giralt, Noah Amsel, Sruthi Yellamraju, James Ezick, Richard Lethin, Yuang Jiang, Aosong Feng, Leandros Tassiulas

    Abstract: The conventional view of the congestion control problem in data networks is based on the principle that a flow's performance is uniquely determined by the state of its bottleneck link, regardless of the topological properties of the network. However, recent work has shown that the behavior of congestion-controlled networks is better explained by models that account for the interactions between bot…

    Submitted 6 October, 2022; originally announced October 2022.