Showing 1–50 of 87 results for author: Niu, D

Searching in archive cs.
  1. arXiv:2506.13751  [pdf, ps, other]

    cs.RO cs.AI

    LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction

    Authors: Haoru Xue, Xiaoyu Huang, Dantong Niu, Qiayuan Liao, Thomas Kragerud, Jan Tommy Gravdahl, Xue Bin Peng, Guanya Shi, Trevor Darrell, Koushil Sreenath, Shankar Sastry

    Abstract: Vision-language-action (VLA) models have demonstrated strong semantic understanding and zero-shot generalization, yet most existing systems assume an accurate low-level controller with hand-crafted action "vocabulary" such as end-effector pose or root velocity. This assumption confines prior work to quasi-static tasks and precludes the agile, whole-body behaviors required by humanoid whole-body co…

    Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: https://ember-lab-berkeley.github.io/LeVERB-Website/

  2. arXiv:2505.20325  [pdf, other]

    cs.CL cs.AI

    Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence

    Authors: Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li, Di Niu

    Abstract: Test-Time Scaling (TTS) methods for enhancing Large Language Model (LLM) reasoning often incur substantial computational costs, primarily due to extensive reliance on external Process Reward Models (PRMs) or sampling methods like Best-of-N (BoN). This paper introduces Guided by Gut (GG), an efficient self-guided TTS framework that achieves PRM-level performance without costly external verifier mod…

    Submitted 23 May, 2025; originally announced May 2025.

  3. arXiv:2505.19734  [pdf, ps, other]

    cs.AI cs.AR

    ReChisel: Effective Automatic Chisel Code Generation by LLM with Reflection

    Authors: Juxin Niu, Xiangfeng Liu, Dan Niu, Xi Wang, Zhe Jiang, Nan Guan

    Abstract: Coding with hardware description languages (HDLs) such as Verilog is a time-intensive and laborious task. With the rapid advancement of large language models (LLMs), there is increasing interest in applying LLMs to assist with HDL coding. Recent efforts have demonstrated the potential of LLMs in translating natural language to traditional HDL Verilog. Chisel, a next-generation HDL based on Scala,…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Accepted to DAC 2025

  4. arXiv:2504.14237  [pdf, other]

    cs.LG

    A Novel Frequency-Spatial Domain Aware Network for Fast Thermal Prediction in 2.5D ICs

    Authors: Dekang Zhang, Dan Niu, Zhou Jin, Yichao Dong, Jingweijia Tan, Changyin Sun

    Abstract: In the post-Moore era, 2.5D chiplet-based ICs present significant challenges in thermal management due to increased power density and thermal hotspots. Neural network-based thermal prediction models can perform real-time predictions for many unseen new designs. However, existing CNN-based and GCN-based methods cannot effectively capture the global thermal features, especially for high-frequency co…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: 7 pages, 5 figures, 22nd Design, Automation and Test in Europe Conference (DATE '25)

  5. arXiv:2503.15465  [pdf, ps, other]

    cs.CV

    FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers

    Authors: Ruichen Chen, Keith G. Mills, Di Niu

    Abstract: Diffusion Models (DM) have revolutionized the text-to-image visual generation process. However, the large computational cost and model footprint of DMs hinders practical deployment, especially on edge devices. Post-training quantization (PTQ) is a lightweight method to alleviate these burdens without the need for training or fine-tuning. While recent DM PTQ methods achieve W4A8 on integer-based PT…

    Submitted 26 June, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: The code is available at https://github.com/cccrrrccc/FP4DiT

  6. arXiv:2503.09051  [pdf, ps, other]

    cs.LG cs.AI

    TreeX: Generating Global Graphical GNN Explanations via Critical Subtree Extraction

    Authors: Shengyao Lu, Jiuding Yang, Baochun Li, Di Niu

    Abstract: The growing demand for transparency and interpretability in critical domains has driven increased interests in comprehending the explainability of Message-Passing (MP) Graph Neural Networks (GNNs). Although substantial research efforts have been made to generate explanations for individual graph instances, identifying global explaining concepts for a GNN still poses great challenges, especially wh…

    Submitted 12 March, 2025; originally announced March 2025.

  7. arXiv:2502.13142  [pdf, other]

    cs.RO cs.AI

    Pre-training Auto-regressive Robotic Models with 4D Representations

    Authors: Dantong Niu, Yuvan Sharma, Haoru Xue, Giscard Biamby, Junyi Zhang, Ziteng Ji, Trevor Darrell, Roei Herzig

    Abstract: Foundation models pre-trained on massive unlabeled datasets have revolutionized natural language and computer vision, exhibiting remarkable generalization capabilities, thus highlighting the importance of pre-training. Yet, efforts in robotics have struggled to achieve similar success, limited by either the need for costly robotic annotations or the lack of representations that effectively model t…

    Submitted 17 May, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  8. arXiv:2502.11046  [pdf, other]

    cs.AR

    Enabling Efficient Transaction Processing on CXL-Based Memory Sharing

    Authors: Zhao Wang, Yiqi Chen, Cong Li, Dimin Niu, Tianchan Guan, Zhaoyang Du, Xingda Wei, Guangyu Sun

    Abstract: Transaction processing systems are the crux for modern data-center applications, yet current multi-node systems are slow due to network overheads. This paper advocates for Compute Express Link (CXL) as a network alternative, which enables low-latency and cache-coherent shared memory accesses. However, directly adopting standard CXL primitives leads to performance degradation due to the high cost o…

    Submitted 16 February, 2025; originally announced February 2025.

  9. arXiv:2501.00636  [pdf, other]

    cs.LG cs.CV

    Applying Graph Explanation to Operator Fusion

    Authors: Keith G. Mills, Muhammad Fetrat Qharabagh, Weichen Qiu, Fred X. Han, Mohammad Salameh, Wei Lu, Shangling Jui, Di Niu

    Abstract: Layer fusion techniques are critical to improving the inference efficiency of deep neural networks (DNN) for deployment. Fusion aims to lower inference costs by reducing data transactions between an accelerator's on-chip buffer and DRAM. This is accomplished by grouped execution of multiple operations like convolution and activations together into single execution units - fusion groups. However, o…

    Submitted 31 December, 2024; originally announced January 2025.

    Comments: DAC'23 WIP Poster; 8 pages, 5 Figures 5 Tables

  10. arXiv:2412.14628  [pdf, other]

    cs.CV cs.LG

    Qua$^2$SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models

    Authors: Keith G. Mills, Mohammad Salameh, Ruichen Chen, Negar Hassanpour, Wei Lu, Di Niu

    Abstract: Diffusion Models (DM) have democratized AI image generation through an iterative denoising process. Quantization is a major technique to alleviate the inference cost and reduce the size of DM denoiser networks. However, as denoisers evolve from variants of convolutional U-Nets toward newer Transformer architectures, it is of growing importance to understand the quantization sensitivity of differen…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: AAAI 2025; version includes supplementary material; 22 Pages, 18 Figures, 8 Tables

  11. arXiv:2412.14283  [pdf, other]

    cs.CV cs.AI cs.GR

    PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation

    Authors: Liyao Jiang, Negar Hassanpour, Mohammad Salameh, Mohammadreza Samadi, Jiao He, Fengyu Sun, Di Niu

    Abstract: Recent research explores the potential of Diffusion Models (DMs) for consistent object editing, which aims to modify object position, size, and composition, etc., while preserving the consistency of objects and background without changing their texture and attributes. Current inference-time methods often rely on DDIM inversion, which inherently compromises efficiency and the achievable consistency…

    Submitted 29 January, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: AAAI 2025; version includes supplementary material; 27 Pages, 15 Figures, 6 Tables

  12. arXiv:2412.08210  [pdf, other]

    cs.CV eess.IV

    Unicorn: Unified Neural Image Compression with One Number Reconstruction

    Authors: Qi Zheng, Haozhi Wang, Zihao Liu, Jiaming Liu, Peiye Liu, Zhijian Hao, Yanheng Lu, Dimin Niu, Jinjia Zhou, Minge Jing, Yibo Fan

    Abstract: Prevalent lossy image compression schemes can be divided into: 1) explicit image compression (EIC), including traditional standards and neural end-to-end algorithms; 2) implicit image compression (IIC) based on implicit neural representations (INR). The former is encountering impasses of either leveling off bitrate reduction at a cost of tremendous complexity while the latter suffers from excessiv…

    Submitted 11 December, 2024; originally announced December 2024.

  13. arXiv:2412.02575  [pdf, other]

    cs.CV cs.MM

    Copy-Move Forgery Detection and Question Answering for Remote Sensing Image

    Authors: Ze Zhang, Enyuan Zhao, Di Niu, Jie Nie, Xinyue Liang, Lei Huang

    Abstract: Driven by practical demands in land resource monitoring and national defense security, this paper introduces the Remote Sensing Copy-Move Question Answering (RSCMQA) task. Unlike traditional Remote Sensing Visual Question Answering (RSVQA), RSCMQA focuses on interpreting complex tampering scenarios and inferring relationships between objects. We present a suite of global RSCMQA datasets, comprisin…

    Submitted 22 May, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

    Comments: 11 figs, 7 tables

  14. arXiv:2411.17720  [pdf, other]

    cs.DC cs.AI cs.PF

    MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices

    Authors: Mohammadali Shakerdargah, Shan Lu, Chao Gao, Di Niu

    Abstract: The advent of foundation models have revolutionized various fields, enabling unprecedented task accuracy and flexibility in computational linguistics, computer vision and other domains. Attention mechanism has become an essential component of foundation models, due to their superb capability of capturing correlations in a sequence. However, attention results in quadratic complexity in memory and c…

    Submitted 15 May, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

    Comments: Accepted to MLSys 2025

    ACM Class: C.1.4; I.2.7; I.5.1

  15. arXiv:2410.12782  [pdf, other]

    cs.RO cs.CL

    In-Context Learning Enables Robot Action Prediction in LLMs

    Authors: Yida Yin, Zekai Wang, Yuvan Sharma, Dantong Niu, Trevor Darrell, Roei Herzig

    Abstract: Recently, Large Language Models (LLMs) have achieved remarkable success using in-context learning (ICL) in the language domain. However, leveraging the ICL capabilities within LLMs to directly predict robot actions remains largely unexplored. In this paper, we introduce RoboPrompt, a framework that enables off-the-shelf text-only LLMs to directly predict robot actions through ICL without training.…

    Submitted 17 March, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: Published in ICRA 2025

  16. arXiv:2410.03936  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning Truncated Causal History Model for Video Restoration

    Authors: Amirhosein Ghasemabadi, Muhammad Kamran Janjua, Mohammad Salameh, Di Niu

    Abstract: One key challenge to video restoration is to model the transition dynamics of video frames governed by motion. In this work, we propose TURTLE to learn the truncated causal history model for efficient and high-performing video restoration. Unlike traditional methods that process a range of contextual frames in parallel, TURTLE enhances efficiency by storing and summarizing a truncated history of t…

    Submitted 15 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS 2024. 24 pages

  17. arXiv:2410.02795  [pdf, other]

    cs.CY cs.AI cs.CL

    TaCIE: Enhancing Instruction Comprehension in Large Language Models through Task-Centred Instruction Evolution

    Authors: Jiuding Yang, Shengyao Lu, Weidong Guo, Xiangyang Li, Kaitong Yang, Yu Xu, Di Niu

    Abstract: Large Language Models (LLMs) require precise alignment with complex instructions to optimize their performance in real-world applications. As the demand for refined instruction tuning data increases, traditional methods that evolve simple seed instructions often struggle to effectively enhance complexity or manage difficulty scaling across various domains. Our innovative approach, Task-Centered In…

    Submitted 18 September, 2024; originally announced October 2024.

  18. arXiv:2408.11706  [pdf, other]

    cs.CV

    FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting

    Authors: Liyao Jiang, Negar Hassanpour, Mohammad Salameh, Mohan Sai Singamsetti, Fengyu Sun, Wei Lu, Di Niu

    Abstract: Text-to-image (T2I) diffusion models have demonstrated impressive capabilities in generating high-quality images given a text prompt. However, ensuring the prompt-image alignment remains a considerable challenge, i.e., generating images that faithfully align with the prompt's semantics. Recent works attempt to improve the faithfulness by optimizing the latent code, which potentially could cause th…

    Submitted 6 April, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: TMLR 2025

  19. arXiv:2408.08495  [pdf, other]

    cs.CV

    FunEditor: Achieving Complex Image Edits via Function Aggregation with Diffusion Models

    Authors: Mohammadreza Samadi, Fred X. Han, Mohammad Salameh, Hao Wu, Fengyu Sun, Chunhua Zhou, Di Niu

    Abstract: Diffusion models have demonstrated outstanding performance in generative tasks, making them ideal candidates for image editing. Recent studies highlight their ability to apply desired edits effectively by following textual instructions, yet with two key challenges remaining. First, these models struggle to apply multiple edits simultaneously, resulting in computational inefficiencies due to their…

    Submitted 17 December, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  20. arXiv:2407.21721  [pdf, other]

    cs.MM cs.AI

    Open-Vocabulary Audio-Visual Semantic Segmentation

    Authors: Ruohao Guo, Liao Qu, Dantong Niu, Yanyu Qi, Wenzhen Yue, Ji Shi, Bowei Xing, Xianghua Ying

    Abstract: Audio-visual semantic segmentation (AVSS) aims to segment and classify sounding objects in videos with acoustic cues. However, most approaches operate on the close-set assumption and only identify pre-defined categories from training data, lacking the generalization ability to detect novel categories in practical applications. In this paper, we introduce a new task: open-vocabulary audio-visual se…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024 (Oral)

  21. arXiv:2406.11815  [pdf, other]

    cs.RO cs.CV cs.LG

    LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning

    Authors: Dantong Niu, Yuvan Sharma, Giscard Biamby, Jerome Quenum, Yutong Bai, Baifeng Shi, Trevor Darrell, Roei Herzig

    Abstract: In recent years, instruction-tuned Large Multimodal Models (LMMs) have been successful at several tasks, including image captioning and visual question answering; yet leveraging these models remains an open question for robotics. Prior LMMs for robotics applications have been extensively trained on language and action data, but their ability to generalize in different settings has often been less…

    Submitted 17 June, 2024; originally announced June 2024.

  22. arXiv:2406.11301  [pdf, other]

    cs.AI cs.CL cs.LG

    Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants

    Authors: Jiuding Yang, Weidong Guo, Kaitong Yang, Xiangyang Li, Yu Xu, Di Niu

    Abstract: The effective alignment of Large Language Models (LLMs) with precise instructions is essential for their application in diverse real-world scenarios. Current methods focus on enhancing the diversity and complexity of training and evaluation samples, yet they fall short in accurately assessing LLMs' ability to follow similar instruction variants. We introduce an effective data augmentation techniqu…

    Submitted 15 October, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  23. arXiv:2405.08013  [pdf, other]

    cs.LG cs.AI cs.SI

    CTRL: Continuous-Time Representation Learning on Temporal Heterogeneous Information Network

    Authors: Chenglin Li, Yuanzhen Xie, Chenyun Yu, Lei Cheng, Bo Hu, Zang Li, Di Niu

    Abstract: Inductive representation learning on temporal heterogeneous graphs is crucial for scalable deep learning on heterogeneous information networks (HINs) which are time-varying, such as citation networks. However, most existing approaches are not inductive and thus cannot handle new nodes or edges. Moreover, previous temporal graph embedding methods are often trained with the temporal link prediction…

    Submitted 10 May, 2024; originally announced May 2024.

  24. arXiv:2405.01762  [pdf, ps, other]

    cs.LG

    EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time

    Authors: Shengyao Lu, Bang Liu, Keith G. Mills, Jiao He, Di Niu

    Abstract: Understanding and explaining the predictions of Graph Neural Networks (GNNs), is crucial for enhancing their safety and trustworthiness. Subgraph-level explanations are gaining attention for their intuitive appeal. However, most existing subgraph-level explainers face efficiency challenges in explaining GNNs due to complex search processes. The key challenge is to find a balance between intuitiven…

    Submitted 16 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 19 pages

    Journal ref: ICML 2024

  25. arXiv:2404.19038  [pdf, other]

    cs.CV cs.AI

    Embedded Representation Learning Network for Animating Styled Video Portrait

    Authors: Tianyong Wang, Xiangyu Liang, Wangguandong Zheng, Dan Niu, Haifeng Xia, Siyu Xia

    Abstract: The talking head generation recently attracted considerable attention due to its widespread application prospects, especially for digital avatars and 3D animation design. Inspired by this practical demand, several works explored Neural Radiance Fields (NeRF) to synthesize the talking heads. However, these methods based on NeRF face two challenges: (1) Difficulty in generating style-controllable ta…

    Submitted 29 April, 2024; originally announced April 2024.

  26. arXiv:2403.13293  [pdf, other]

    cs.CV cs.AI cs.LG

    Building Optimal Neural Architectures using Interpretable Knowledge

    Authors: Keith G. Mills, Fred X. Han, Mohammad Salameh, Shengyao Lu, Chunhua Zhou, Jiao He, Fengyu Sun, Di Niu

    Abstract: Neural Architecture Search is a costly practice. The fact that a search space can span a vast number of design choices with each architecture evaluation taking nontrivial overhead makes it hard for an algorithm to sufficiently explore candidate networks. In this paper, we propose AutoBuild, a scheme which learns to align the latent embeddings of operations and architecture modules with the ground-…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: CVPR'24; 18 Pages, 18 Figures, 3 Tables

  27. arXiv:2403.07557  [pdf, other]

    cs.CL cs.LG

    SIFiD: Reassess Summary Factual Inconsistency Detection with LLM

    Authors: Jiuding Yang, Hui Liu, Weidong Guo, Zhuwei Rao, Yu Xu, Di Niu

    Abstract: Ensuring factual consistency between the summary and the original document is paramount in summarization tasks. Consequently, considerable effort has been dedicated to detecting inconsistencies. With the advent of Large Language Models (LLMs), recent studies have begun to leverage their advanced language understanding capabilities for inconsistency detection. However, early attempts have shown tha…

    Submitted 12 March, 2024; originally announced March 2024.

  28. arXiv:2402.11140  [pdf, other]

    cs.CL cs.AI cs.LG

    Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language Models

    Authors: Sijia Chen, Baochun Li, Di Niu

    Abstract: The reasoning performance of Large Language Models (LLMs) on a wide range of problems critically relies on chain-of-thought prompting, which involves providing a few chain of thought demonstrations as exemplars in prompts. Recent work, e.g., Tree of Thoughts, has pointed out the importance of exploration and self-evaluation in reasoning step selection for complex problem solving. In this paper, we…

    Submitted 6 January, 2025; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted as a poster paper by ICLR2024. 27 pages, 5 figures, 18 tables. [Source Code](https://github.com/iQua/llmpebase/tree/main/examples/BoTReasoning)

  29. arXiv:2401.15235  [pdf, other]

    eess.IV cs.CV cs.LG

    CascadedGaze: Efficiency in Global Context Extraction for Image Restoration

    Authors: Amirhosein Ghasemabadi, Muhammad Kamran Janjua, Mohammad Salameh, Chunhua Zhou, Fengyu Sun, Di Niu

    Abstract: Image restoration tasks traditionally rely on convolutional neural networks. However, given the local nature of the convolutional operator, they struggle to capture global information. The promise of attention mechanisms in Transformers is to circumvent this problem, but it comes at the cost of intensive computational overhead. Many recent studies in image restoration have focused on solving the c…

    Submitted 7 May, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Published in Transactions on Machine Learning Research (TMLR), 2024. 20 pages

  30. arXiv:2401.14578  [pdf, ps, other]

    cs.LG

    GOAt: Explaining Graph Neural Networks via Graph Output Attribution

    Authors: Shengyao Lu, Keith G. Mills, Jiao He, Bang Liu, Di Niu

    Abstract: Understanding the decision-making process of Graph Neural Networks (GNNs) is crucial to their interpretability. Most existing methods for explaining GNNs typically rely on training auxiliary models, resulting in the explanations remain black-boxed. This paper introduces Graph Output Attribution (GOAt), a novel method to attribute graph outputs to input graph features, creating GNN explanations tha…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: ICLR 2024 Poster

  31. arXiv:2312.17243  [pdf, other]

    cs.CV

    Unsupervised Universal Image Segmentation

    Authors: Dantong Niu, Xudong Wang, Xinyang Han, Long Lian, Roei Herzig, Trevor Darrell

    Abstract: Several unsupervised image segmentation approaches have been proposed which eliminate the need for dense manually-annotated segmentation masks; current models separately handle either semantic segmentation (e.g., STEGO) or class-agnostic instance segmentation (e.g., CutLER), but not both (i.e., panoptic segmentation). We propose an Unsupervised Universal Segmentation model (U2Seg) adept at perform…

    Submitted 28 December, 2023; originally announced December 2023.

  32. arXiv:2312.15692  [pdf, other]

    cs.AI

    Instruction Fusion: Advancing Prompt Evolution through Hybridization

    Authors: Weidong Guo, Jiuding Yang, Kaitong Yang, Xiangyang Li, Zhuwei Rao, Yu Xu, Di Niu

    Abstract: The fine-tuning of Large Language Models (LLMs) specialized in code generation has seen notable advancements through the use of open-domain coding queries. Despite the successes, existing methodologies like Evol-Instruct encounter performance limitations, impeding further enhancements in code generation tasks. This paper examines the constraints of existing prompt evolution techniques and introduc…

    Submitted 17 June, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

  33. arXiv:2311.17942  [pdf, other]

    cs.CV

    Object-based (yet Class-agnostic) Video Domain Adaptation

    Authors: Dantong Niu, Amir Bar, Roei Herzig, Trevor Darrell, Anna Rohrbach

    Abstract: Existing video-based action recognition systems typically require dense annotation and struggle in environments when there is significant distribution shift relative to the training data. Current methods for video domain adaptation typically fine-tune the model using fully annotated data on a subset of target domain data or align the representation of the two domains using bootstrapping or adversa…

    Submitted 28 November, 2023; originally announced November 2023.

  34. arXiv:2310.18709  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Audio-Visual Instance Segmentation

    Authors: Ruohao Guo, Xianghua Ying, Yaru Chen, Dantong Niu, Guangyao Li, Liao Qu, Yanyu Qi, Jinxing Zhou, Bowei Xing, Wenzhen Yue, Ji Shi, Qixun Wang, Peiliang Zhang, Buwen Liang

    Abstract: In this paper, we propose a new multi-modal task, termed audio-visual instance segmentation (AVIS), which aims to simultaneously identify, segment and track individual sounding object instances in audible videos. To facilitate this research, we introduce a high-quality benchmark named AVISeg, containing over 90K instance masks from 26 semantic categories in 926 long videos. Additionally, we propos…

    Submitted 2 March, 2025; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted by CVPR 2025

  35. arXiv:2309.08159  [pdf, other]

    cs.CV cs.IR cs.LG

    AdSEE: Investigating the Impact of Image Style Editing on Advertisement Attractiveness

    Authors: Liyao Jiang, Chenglin Li, Haolan Chen, Xiaodong Gao, Xinwang Zhong, Yang Qiu, Shani Ye, Di Niu

    Abstract: Online advertisements are important elements in e-commerce sites, social media platforms, and search engines. With the increasing popularity of mobile browsing, many online ads are displayed with visual information in the form of a cover image in addition to text descriptions to grab the attention of users. Various recent studies have focused on predicting the click rates of online advertisements…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted to KDD 2023 Applied Data Science Track

  36. arXiv:2309.07967  [pdf, other]

    cs.IR

    iHAS: Instance-wise Hierarchical Architecture Search for Deep Learning Recommendation Models

    Authors: Yakun Yu, Shi-ang Qi, Jiuding Yang, Liyao Jiang, Di Niu

    Abstract: Current recommender systems employ large-sized embedding tables with uniform dimensions for all features, leading to overfitting, high computational cost, and suboptimal generalizing performance. Many techniques aim to solve this issue by feature selection or embedding dimension search. However, these techniques typically select a fixed subset of features or embedding dimensions for all instances…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted as CIKM23 Long paper

  37. arXiv:2306.15796  [pdf, other]

    cs.AI

    ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis

    Authors: Yakun Yu, Mingjun Zhao, Shi-ang Qi, Feiran Sun, Baoxun Wang, Weidong Guo, Xiaoli Wang, Lei Yang, Di Niu

    Abstract: Multimodal Sentiment Analysis leverages multimodal signals to detect the sentiment of a speaker. Previous approaches concentrate on performing multimodal fusion and representation learning based on general knowledge obtained from pretrained models, which neglects the effect of domain-specific knowledge. In this paper, we propose Contrastive Knowledge Injection (ConKI) for multimodal sentiment anal…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted by ACL Findings 2023

  38. arXiv:2304.12561  [pdf, other]

    cs.CV cs.MM

    TCR: Short Video Title Generation and Cover Selection with Attention Refinement

    Authors: Yakun Yu, Jiuding Yang, Weidong Guo, Hui Liu, Yu Xu, Di Niu

    Abstract: With the widespread popularity of user-generated short videos, it becomes increasingly challenging for content creators to promote their content to potential viewers. Automatically generating appealing titles and covers for short videos can help grab viewers' attention. Existing studies on video captioning mostly focus on generating factual descriptions of actions, which do not conform to video ti…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted by PAKDD23

  39. CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering

    Authors: Mingjun Zhao, Mengzhen Wang, Yinglong Ma, Di Niu, Haijiang Wu

    Abstract: Text clustering, as one of the most fundamental challenges in unsupervised learning, aims at grouping semantically similar text segments without relying on human annotations. With the rapid development of deep learning, deep clustering has achieved significant advantages over traditional clustering methods. Despite the effectiveness, most existing deep text clustering methods rely heavily on repre…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: The Web Conference 2023

  40. arXiv:2304.10316  [pdf, other]

    cs.CV

    Search-Map-Search: A Frame Selection Paradigm for Action Recognition

    Authors: Mingjun Zhao, Yakun Yu, Xiaoli Wang, Lei Yang, Di Niu

    Abstract: Despite the success of deep learning in video understanding tasks, processing every frame in a video is computationally expensive and often unnecessary in real-time applications. Frame selection aims to extract the most informative and representative frames to help a model better understand video content. Existing frame selection methods either individually sample frames based on per-frame importa…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  41. LA3: Efficient Label-Aware AutoAugment

    Authors: Mingjun Zhao, Shan Lu, Zixuan Wang, Xiaoli Wang, Di Niu

    Abstract: Automated augmentation is an emerging and effective technique to search for data augmentation policies to improve generalizability of deep neural network training. Most existing work focuses on constructing a unified policy applicable to all data samples in a given dataset, without considering sample or class variations. In this paper, we propose a novel two-stage data augmentation algorithm, name…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: ECCV 2022

  42. arXiv:2303.17870  [pdf, other]

    cs.CV

    GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation

    Authors: Jian Ma, Mingjun Zhao, Chen Chen, Ruichen Wang, Di Niu, Haonan Lu, Xiaodong Lin

    Abstract: Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate text coherently within images, particularly for compl…

    Submitted 23 May, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: 24 pages, 5 figures

  43. arXiv:2303.02733  [pdf, other]

    cs.LG cs.AI cs.CV

    Reparameterization through Spatial Gradient Scaling

    Authors: Alexander Detkov, Mohammad Salameh, Muhammad Fetrat Qharabagh, Jialin Zhang, Wei Lui, Shangling Jui, Di Niu

    Abstract: Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training. However, there exists a gap in understanding how reparameterization may change and benefit the learning process of neural networks. In this paper, we present a novel spatial gradient scaling method to redistribute learning foc…

    Submitted 6 March, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: Published at ICLR 2023. Code available at https://github.com/Ascend-Research/Reparameterization

  44. A General-Purpose Transferable Predictor for Neural Architecture Search

    Authors: Fred X. Han, Keith G. Mills, Fabian Chudak, Parsa Riahi, Mohammad Salameh, Jialin Zhang, Wei Lu, Shangling Jui, Di Niu

    Abstract: Understanding and modelling the performance of neural architectures is key to Neural Architecture Search (NAS). Performance predictors have seen widespread use in low-cost NAS and achieve high ranking correlations between predicted and ground truth performance in several NAS benchmarks. However, existing predictors are often designed based on network encodings specific to a predefined search space…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted to SDM2023; version includes supplementary material; 12 Pages, 3 Figures, 6 Tables

  45. AIO-P: Expanding Neural Performance Predictors Beyond Image Classification

    Authors: Keith G. Mills, Di Niu, Mohammad Salameh, Weichen Qiu, Fred X. Han, Puyuan Liu, Jialin Zhang, Wei Lu, Shangling Jui

    Abstract: Evaluating neural network performance is critical to deep neural network design but a costly procedure. Neural predictors provide an efficient solution by treating architectures as samples and learning to estimate their performance on a given task. However, existing predictors are task-dependent, predominantly estimating neural network performance on image classification benchmarks. They are also…

    Submitted 24 April, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: AAAI 2023 Oral Presentation; version includes supplementary material; 16 Pages, 4 Figures, 22 Tables

  46. GENNAPE: Towards Generalized Neural Architecture Performance Estimators

    Authors: Keith G. Mills, Fred X. Han, Jialin Zhang, Fabian Chudak, Ali Safari Mamaghani, Mohammad Salameh, Wei Lu, Shangling Jui, Di Niu

    Abstract: Predicting neural architecture performance is a challenging task and is crucial to neural architecture design and search. Existing approaches either rely on neural performance predictors which are limited to modeling architectures in a predefined design space involving specific sets of operators and connection rules, and cannot generalize to unseen architectures, or resort to zero-cost proxies whi…

    Submitted 24 April, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: AAAI 2023 Oral Presentation; includes supplementary materials with more details on introduced benchmarks; 14 Pages, 6 Figures, 10 Tables

  47. One for All, All for One: Learning and Transferring User Embeddings for Cross-Domain Recommendation

    Authors: Chenglin Li, Yuanzhen Xie, Chenyun Yu, Bo Hu, Zang li, Guoqiang Shu, Xiaohu Qie, Di Niu

    Abstract: Cross-domain recommendation is an important method to improve recommender system performance, especially when observations in target domains are sparse. However, most existing techniques focus on single-target or dual-target cross-domain recommendation (CDR) and are hard to be generalized to CDR with multiple target domains. In addition, the negative transfer problem is prevalent in CDR, where the…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: 9 pages, accepted by WSDM 2023

  48. arXiv:2211.10854  [pdf, other]

    cs.CL cs.LG

    Mulco: Recognizing Chinese Nested Named Entities Through Multiple Scopes

    Authors: Jiuding Yang, Jinwen Luo, Weidong Guo, Jerry Chen, Di Niu, Yu Xu

    Abstract: Nested Named Entity Recognition (NNER) has been a long-term challenge to researchers as an important sub-area of Named Entity Recognition. NNER is where one entity may be part of a longer entity, and this may happen on multiple levels, as the term nested suggests. These nested structures make traditional sequence labeling methods unable to properly recognize all entities. While recent researches f…

    Submitted 19 November, 2022; originally announced November 2022.

  49. arXiv:2207.13848  [pdf, other]

    cs.DC cs.LG cs.PF math.NA

    Predicting the Output Structure of Sparse Matrix Multiplication with Sampled Compression Ratio

    Authors: Zhaoyang Du, Yijin Guan, Tianchan Guan, Dimin Niu, Nianxiong Tan, Xiaopeng Yu, Hongzhong Zheng, Jianyi Meng, Xiaolang Yan, Yuan Xie

    Abstract: Sparse general matrix multiplication (SpGEMM) is a fundamental building block in numerous scientific applications. One critical task of SpGEMM is to compute or predict the structure of the output matrix (i.e., the number of nonzero elements per output row) for efficient memory allocation and load balance, which impact the overall performance of SpGEMM. Existing work either precisely calculates the…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: This paper has been submitted to the IEEE International Conference on Parallel and Distributed Systems (ICPADS). 8 pages, 2 figures, 3 tables

    ACM Class: F.2.1; G.3; D.1.3; G.1.3

  50. arXiv:2206.07244  [pdf, other]

    cs.DC

    OpSparse: a Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs

    Authors: Zhaoyang Du, Yijin Guan, Tianchan Guan, Dimin Niu, Linyong Huang, Hongzhong Zheng, Yuan Xie

    Abstract: Sparse general matrix multiplication (SpGEMM) is an important and expensive computation primitive in many real-world applications. Due to SpGEMM's inherent irregularity and the vast diversity of its input matrices, developing high-performance SpGEMM implementation on modern processors such as GPUs is challenging. The state-of-the-art SpGEMM libraries (i.e., $nsparse$ and $spECK$) adopt several alg…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: This paper has been submitted to the IEEE Access since May 7, 2022, and is currently under review by IEEE Access. 20 pages, 11 figures, 5 tables

    MSC Class: 68-02; 68W10; 65F50 ACM Class: D.1.3; G.1.3