Showing 1–50 of 119 results for author: Pechenizkiy, M

Searching in archive cs.
  1. arXiv:2506.19468  [pdf, ps, other]

    cs.CL cs.AI

    MuBench: Assessment of Multilingual Capabilities of Large Language Models Across 61 Languages

    Authors: Wenhan Han, Yifan Zhang, Zhixun Chen, Binbin Liu, Haobin Lin, Bingni Zhang, Taifeng Wang, Mykola Pechenizkiy, Meng Fang, Yin Zheng

    Abstract: Multilingual large language models (LLMs) are advancing rapidly, with new models frequently claiming support for an increasing number of languages. However, existing evaluation datasets are limited and lack cross-lingual alignment, leaving assessments of multilingual capabilities fragmented in both language and skill coverage. To address this, we introduce MuBench, a benchmark covering 61 language…

    Submitted 24 June, 2025; originally announced June 2025.

  2. arXiv:2506.14990  [pdf, ps, other]

    cs.AI

    MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning

    Authors: Tristan Tomilin, Luka van den Boogaard, Samuel Garcin, Bram Grooten, Meng Fang, Mykola Pechenizkiy

    Abstract: Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms, with environment availability strongly impacting research. One particularly underexplored intersection is continual learning (CL) in cooperative multi-agent settings. To remedy this, we introduce MEAL (Multi-agent Environments for Adaptive Learning), the first benchmark tailored for continual…

    Submitted 17 June, 2025; originally announced June 2025.

  3. arXiv:2506.10629  [pdf, ps, other]

    cs.LG cs.AI cs.IT

    Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning

    Authors: Yucheng Yang, Tianyi Zhou, Qiang He, Lei Han, Mykola Pechenizkiy, Meng Fang

    Abstract: Unsupervised reinforcement learning (URL) aims to learn general skills for unseen downstream tasks. Mutual Information Skill Learning (MISL) addresses URL by maximizing the mutual information between states and skills but lacks sufficient theoretical analysis, e.g., how well its learned skills can initialize a downstream task's policy. Our new theoretical analysis in this paper shows that the dive…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Spotlight paper at ICLR 2024. This version includes acknowledgments omitted from the ICLR version and indicates the corresponding authors primarily responsible for the work

    ACM Class: I.2.6; I.2.8; G.3

    Journal ref: International Conference on Learning Representations (ICLR), 2024, Spotlight paper

  4. arXiv:2506.00932  [pdf, other]

    cs.LG

    Addressing the Collaboration Dilemma in Low-Data Federated Learning via Transient Sparsity

    Authors: Qiao Xiao, Boqian Wu, Andrey Poddubnyy, Elena Mocanu, Phuong H. Nguyen, Mykola Pechenizkiy, Decebal Constantin Mocanu

    Abstract: Federated learning (FL) enables collaborative model training across decentralized clients while preserving data privacy, leveraging aggregated updates to build robust global models. However, this training paradigm faces significant challenges due to data heterogeneity and limited local datasets, which often impede effective collaboration. In such scenarios, we identify the Layer-wise Inertia Pheno…

    Submitted 1 June, 2025; originally announced June 2025.

  5. arXiv:2505.24037  [pdf, other]

    cs.AI

    Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

    Authors: Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu

    Abstract: Large language models (LLMs) have achieved remarkable success across various tasks but face deployment challenges due to their massive computational demands. While post-training pruning methods like SparseGPT and Wanda can effectively reduce the model size, they struggle to maintain model performance at high sparsity levels, limiting their utility for downstream tasks. Existing fine-tuning methods,…

    Submitted 29 May, 2025; originally announced May 2025.

  6. arXiv:2505.17909  [pdf, ps, other]

    cs.LG cs.AI

    NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling

    Authors: Bram Grooten, Farid Hasanov, Chenxiang Zhang, Qiao Xiao, Boqian Wu, Zahra Atashgahi, Ghada Sokar, Shiwei Liu, Lu Yin, Elena Mocanu, Mykola Pechenizkiy, Decebal Constantin Mocanu

    Abstract: Model ensembles have long been a cornerstone for improving generalization and robustness in deep learning. However, their effectiveness often comes at the cost of substantial computational overhead. To address this issue, state-of-the-art methods aim to replicate ensemble-class performance without requiring multiple independently trained networks. Unfortunately, these algorithms often still demand…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: Our open-source code is available at https://github.com/bramgrooten/neurotrails

  7. arXiv:2505.16793  [pdf, ps, other]

    cs.CV

    REOBench: Benchmarking Robustness of Earth Observation Foundation Models

    Authors: Xiang Li, Yong Tao, Siyuan Zhang, Siwei Liu, Zhitong Xiong, Chunbo Luo, Lu Liu, Mykola Pechenizkiy, Xiao Xiang Zhu, Tianjin Huang

    Abstract: Earth observation foundation models have shown strong generalization across multiple Earth observation tasks, but their robustness under real-world perturbations remains underexplored. To bridge this gap, we introduce REOBench, the first comprehensive benchmark for evaluating the robustness of Earth observation foundation models across six tasks and twelve types of image corruptions, including bot…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 24 pages

  8. arXiv:2503.08241  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents

    Authors: Tristan Tomilin, Meng Fang, Mykola Pechenizkiy

    Abstract: Advancing safe autonomous systems through reinforcement learning (RL) requires robust benchmarks to evaluate performance, analyze methods, and assess agent competencies. Humans primarily rely on embodied visual perception to safely navigate and interact with their surroundings, making it a valuable capability for RL agents. However, existing vision-based 3D benchmarks only consider simple navigati…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: Accepted to ICLR 2025

  9. arXiv:2502.19557  [pdf, other]

    cs.CL cs.AI

    Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?

    Authors: Yudi Zhang, Lu Wang, Meng Fang, Yali Du, Chenghua Huang, Jun Wang, Qingwei Lin, Mykola Pechenizkiy, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: Distilling large language models (LLMs) typically involves transferring the teacher model's responses through supervised fine-tuning (SFT). However, this approach neglects the potential to distill both data (output content) and reward signals (quality evaluations). Extracting reliable reward signals directly from teacher models is challenging, as LLMs are optimized for generation rather than evalu…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 14 pages, 7 figures

  10. arXiv:2501.00066  [pdf, ps, other]

    cs.CL cs.AI cs.CR cs.LG

    On Adversarial Robustness of Language Models in Transfer Learning

    Authors: Bohdan Turbal, Anastasiia Mazur, Jiaxu Zhao, Mykola Pechenizkiy

    Abstract: We investigate the adversarial robustness of LLMs in transfer learning scenarios. Through comprehensive experiments on multiple datasets (MBIB Hate Speech, MBIB Political Bias, MBIB Gender Bias) and various model architectures (BERT, RoBERTa, GPT-2, Gemma, Phi), we reveal that transfer learning, while improving standard performance metrics, often leads to increased vulnerability to adversarial att…

    Submitted 7 June, 2025; v1 submitted 29 December, 2024; originally announced January 2025.

    Journal ref: Socially Responsible Language Modelling Research (SoLaR) Workshop at NeurIPS 2024

  11. arXiv:2411.09431  [pdf, other]

    cs.CL

    Everyone deserves their voice to be heard: Analyzing Predictive Gender Bias in ASR Models Applied to Dutch Speech Data

    Authors: Rik Raes, Saskia Lensink, Mykola Pechenizkiy

    Abstract: Recent research has shown that state-of-the-art (SotA) Automatic Speech Recognition (ASR) systems, such as Whisper, often exhibit predictive biases that disproportionately affect various demographic groups. This study focuses on identifying the performance disparities of Whisper models on Dutch speech data from the Common Voice dataset and the Dutch National Public Broadcasting organisation. We an…

    Submitted 14 November, 2024; originally announced November 2024.

    Comments: Accepted at ECML PKDD 2024, 4th Workshop on Bias and Fairness in AI (BIAS)

  12. arXiv:2411.03349  [pdf, other]

    cs.AI cs.CL cs.LG

    RuAG: Learned-rule-augmented Generation for Large Language Models

    Authors: Yudi Zhang, Pei Xiao, Lu Wang, Chaoyun Zhang, Meng Fang, Yali Du, Yevgeniy Puzyrev, Randolph Yao, Si Qin, Qingwei Lin, Mykola Pechenizkiy, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: In-context learning (ICL) and Retrieval-Augmented Generation (RAG) have gained attention for their ability to enhance LLMs' reasoning by incorporating external knowledge but suffer from limited contextual window size, leading to insufficient information injection. To this end, we propose a novel framework, RuAG, to automatically distill large volumes of offline data into interpretable first-order…

    Submitted 3 November, 2024; originally announced November 2024.

  13. arXiv:2410.13458  [pdf, other]

    cs.CL

    MedINST: Meta Dataset of Biomedical Instructions

    Authors: Wenhan Han, Meng Fang, Zihan Zhang, Yu Yin, Zirui Song, Ling Chen, Mykola Pechenizkiy, Qingyu Chen

    Abstract: The integration of large language model (LLM) techniques in the field of medical analysis has brought about significant advancements, yet the scarcity of large, diverse, and well-annotated datasets remains a major challenge. Medical data and tasks, which vary in format, size, and other parameters, require extensive preprocessing and standardization for effective use in training LLMs. To address th…

    Submitted 17 October, 2024; originally announced October 2024.

  14. arXiv:2410.03030  [pdf, other]

    cs.CV cs.AI

    Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness

    Authors: Boqian Wu, Qiao Xiao, Shunxin Wang, Nicola Strisciuglio, Mykola Pechenizkiy, Maurice van Keulen, Decebal Constantin Mocanu, Elena Mocanu

    Abstract: It is generally perceived that Dynamic Sparse Training opens the door to a new era of scalability and efficiency for artificial neural networks at, perhaps, some costs in accuracy performance for the classification task. At the same time, Dense Training is widely accepted as being the "de facto" approach to train artificial neural networks if one would like to maximize their robustness against ima…

    Submitted 4 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: Accepted at ICLR 2025

  15. arXiv:2409.09196  [pdf, other]

    cs.CV cs.LG

    Are Sparse Neural Networks Better Hard Sample Learners?

    Authors: Qiao Xiao, Boqian Wu, Lu Yin, Christopher Neil Gadzinski, Tianjin Huang, Mykola Pechenizkiy, Decebal Constantin Mocanu

    Abstract: While deep learning has demonstrated impressive progress, it remains a daunting challenge to learn from hard samples as these samples are usually noisy and intricate. These hard samples play a crucial role in the optimal performance of deep neural networks. Most research on Sparse Neural Networks (SNNs) has focused on standard training data, leaving gaps in understanding their effectiveness on com…

    Submitted 27 December, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted at British Machine Vision Conference (BMVC 2024)

  16. arXiv:2408.14319  [pdf, other]

    cs.LG stat.ML

    Rethinking Knowledge Transfer in Learning Using Privileged Information

    Authors: Danil Provodin, Bram van den Akker, Christina Katsimerou, Maurits Kaptein, Mykola Pechenizkiy

    Abstract: In supervised machine learning, privileged information (PI) is information that is unavailable at inference, but is accessible during training time. Research on learning using privileged information (LUPI) aims to transfer the knowledge captured in PI onto a model that can perform inference without PI. It seems that this extra bit of information ought to make the resulting model better. However, f…

    Submitted 26 August, 2024; originally announced August 2024.

  17. A Probabilistic Framework for Adapting to Changing and Recurring Concepts in Data Streams

    Authors: Ben Halstead, Yun Sing Koh, Patricia Riddle, Mykola Pechenizkiy, Albert Bifet

    Abstract: The distribution of streaming data often changes over time as conditions change, a phenomenon known as concept drift. Only a subset of previous experience, collected in similar conditions, is relevant to learning an accurate classifier for current data. Learning from irrelevant experience describing a different concept can degrade performance. A system learning from streaming data must identify wh…

    Submitted 17 August, 2024; originally announced August 2024.

  18. arXiv:2408.07364  [pdf, other]

    cs.LG

    Robust Active Learning (RoAL): Countering Dynamic Adversaries in Active Learning with Elastic Weight Consolidation

    Authors: Ricky Maulana Fajri, Yulong Pei, Lu Yin, Mykola Pechenizkiy

    Abstract: Despite significant advancements in active learning and adversarial attacks, the intersection of these two fields remains underexplored, particularly in developing robust active learning frameworks against dynamic adversarial threats. The challenge of developing robust active learning frameworks under dynamic adversarial attacks is critical, as these attacks can lead to catastrophic forgetting wit…

    Submitted 14 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  19. arXiv:2408.04583  [pdf, other]

    cs.LG cs.AI

    Unveiling the Power of Sparse Neural Networks for Feature Selection

    Authors: Zahra Atashgahi, Tennison Liu, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu, Mihaela van der Schaar

    Abstract: Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection. Leveraging the dynamic sparse training (DST) algorithms within SNNs has demonstrated promising feature selection capabilities while drastically reducing computational overheads. Despite these advancements, several critical aspects remain insufficiently explored for feature selection. Questions persist reg…

    Submitted 8 August, 2024; originally announced August 2024.

  20. arXiv:2407.17437  [pdf, other]

    cs.LG

    Nerva: a Truly Sparse Implementation of Neural Networks

    Authors: Wieger Wesselink, Bram Grooten, Qiao Xiao, Cassio de Campos, Mykola Pechenizkiy

    Abstract: We introduce Nerva, a fast neural network library under development in C++. It supports sparsity by using the sparse matrix operations of Intel's Math Kernel Library (MKL), which eliminates the need for binary masks. We show that Nerva significantly decreases training time and memory usage while reaching equivalent accuracy to PyTorch. We run static sparse experiments with an MLP on CIFAR-10. On h…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: The Nerva library is available at https://github.com/wiegerw/nerva

  21. arXiv:2407.17412  [pdf, other]

    cs.CV cs.AI

    (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork

    Authors: Tianjin Huang, Fang Meng, Li Shen, Fan Liu, Yulong Pei, Mykola Pechenizkiy, Shiwei Liu, Tianlong Chen

    Abstract: Large-scale neural networks have demonstrated remarkable performance in different domains like vision and language processing, although at the cost of massive computation resources. As illustrated by compression literature, structural model pruning is a prominent algorithm to encourage model efficiency, thanks to its acceleration-friendly sparsity patterns. One of the key questions of structural p…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Under review

  22. arXiv:2406.18373  [pdf, other]

    cs.CL cs.SD eess.AS

    Dynamic Data Pruning for Automatic Speech Recognition

    Authors: Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu

    Abstract: The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application in ASR has been barely explored, and existing works…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  23. arXiv:2406.06495  [pdf, ps, other]

    cs.LG

    Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity

    Authors: Calarina Muslimani, Bram Grooten, Deepak Ranganatha Sastry Mamillapalli, Mykola Pechenizkiy, Decebal Constantin Mocanu, Matthew E. Taylor

    Abstract: To integrate into human-centered environments, autonomous agents must learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) can enable this by learning reward functions from human preferences. However, humans live in a world full of diverse information, most of which is irrelevant to completing any particular task. It then becomes essential that ag…

    Submitted 3 July, 2025; v1 submitted 10 June, 2024; originally announced June 2024.

  24. arXiv:2406.02177  [pdf, other]

    cs.LG cs.AI cs.DC stat.ML

    One-Shot Federated Learning with Bayesian Pseudocoresets

    Authors: Tim d'Hondt, Mykola Pechenizkiy, Robert Peharz

    Abstract: Optimization-based techniques for federated learning (FL) often come with prohibitive communication cost, as high dimensional model parameters need to be communicated repeatedly between server and clients. In this paper, we follow a Bayesian approach allowing to perform FL with one-shot communication, by solving the global inference problem as a product of local client posteriors. For models with…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 10 pages

  25. arXiv:2405.19017  [pdf, other]

    cs.LG

    Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling

    Authors: Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy

    Abstract: We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared to the existing algorithms. Our main theoretical result is a Bayesian regret bound for each cost component of $\tilde{O} (DS\sqrt{AT})$ for an…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: To appear at ICML'24

  26. The Neutrality Fallacy: When Algorithmic Fairness Interventions are (Not) Positive Action

    Authors: Hilde Weerts, Raphaële Xenidis, Fabien Tarissan, Henrik Palmer Olsen, Mykola Pechenizkiy

    Abstract: Various metrics and interventions have been developed to identify and mitigate unfair outputs of machine learning systems. While individuals and organizations have an obligation to avoid discrimination, the use of fairness-aware machine learning interventions has also been described as amounting to 'algorithmic positive action' under European Union (EU) non-discrimination law. As the Court of Just…

    Submitted 18 April, 2024; originally announced April 2024.

    Journal ref: 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24)

  27. arXiv:2404.08006  [pdf, other]

    cs.RO cs.AI cs.LG math.OC

    Learning Efficient and Fair Policies for Uncertainty-Aware Collaborative Human-Robot Order Picking

    Authors: Igor G. Smit, Zaharah Bukhsh, Mykola Pechenizkiy, Kostas Alogariastos, Kasper Hendriks, Yingqian Zhang

    Abstract: In collaborative human-robot order picking systems, human pickers and Autonomous Mobile Robots (AMRs) travel independently through a warehouse and meet at pick locations where pickers load items onto the AMRs. In this paper, we consider an optimization problem in such systems where we allocate pickers to AMRs in a stochastic environment. We propose a novel multi-objective Deep Reinforcement Learni…

    Submitted 9 April, 2024; originally announced April 2024.

  28. arXiv:2403.16101  [pdf, other]

    cs.AI

    Public Perceptions of Fairness Metrics Across Borders

    Authors: Yuya Sasaki, Sohei Tokuno, Haruka Maeda, Kazuki Nakajima, Osamu Sakura, George Fletcher, Mykola Pechenizkiy, Panagiotis Karras, Irina Shklovski

    Abstract: Which fairness metrics are appropriately applicable in your contexts? There may be instances of discordance regarding the perception of fairness, even when the outcomes comply with established fairness metrics. Several questionnaire-based surveys have been conducted to evaluate fairness metrics with human perceptions of fairness. However, these surveys were limited in scope, including only a few h…

    Submitted 8 May, 2025; v1 submitted 24 March, 2024; originally announced March 2024.

  29. arXiv:2402.19226  [pdf, other]

    cs.LG cs.CY

    Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain

    Authors: Pratik Gajane, Sean Newman, Mykola Pechenizkiy, John D. Piette

    Abstract: Chronic pain significantly diminishes the quality of life for millions worldwide. While psychoeducation and therapy can improve pain outcomes, many individuals experiencing pain lack access to evidence-based treatments or fail to complete the necessary number of sessions to achieve benefit. Reinforcement learning (RL) shows potential in tailoring personalized pain management interventions accordin…

    Submitted 14 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  30. arXiv:2401.09334  [pdf, other]

    cs.CL cs.AI

    Large Language Models Are Neurosymbolic Reasoners

    Authors: Meng Fang, Shilong Deng, Yudi Zhang, Zijing Shi, Ling Chen, Mykola Pechenizkiy, Jun Wang

    Abstract: A wide range of real-world applications is characterized by their symbolic nature, necessitating a strong capability for symbolic reasoning. This paper investigates the potential application of Large Language Models (LLMs) as symbolic reasoners. We focus on text-based games, significant benchmarks for agents with natural language capabilities, particularly in symbolic tasks like math, map reading,…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  31. arXiv:2312.15339  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning

    Authors: Bram Grooten, Tristan Tomilin, Gautham Vasan, Matthew E. Taylor, A. Rupam Mahmood, Meng Fang, Mykola Pechenizkiy, Decebal Constantin Mocanu

    Abstract: The visual world provides an abundance of information, but many input pixels received by agents often contain distracting stimuli. Autonomous agents need the ability to distinguish useful information from task-irrelevant perceptions, enabling them to generalize to unseen environments with new distractions. Existing works approach this problem using data augmentation or large auxiliary networks wit…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: Accepted as full-paper (oral) at AAMAS 2024. Code is available at https://github.com/bramgrooten/mask-distractions and see our 40-second video at https://youtu.be/2oImF0h1k48

  32. arXiv:2312.06315  [pdf, other]

    cs.CL cs.CY cs.LG

    GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models

    Authors: Jiaxu Zhao, Meng Fang, Shirui Pan, Wenpeng Yin, Mykola Pechenizkiy

    Abstract: Warning: This paper contains content that may be offensive or upsetting. There has been a significant increase in the usage of large language models (LLMs) in various applications, both in their original form and through fine-tuned adaptations. As a result, LLMs have gained popularity and are being widely adopted by a large user community. However, one of the concerns with LLMs is the potential ge…

    Submitted 11 December, 2023; originally announced December 2023.

  33. arXiv:2312.04727  [pdf, other]

    cs.CV

    E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation

    Authors: Boqian Wu, Qiao Xiao, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Decebal Constantin Mocanu, Maurice Van Keulen, Elena Mocanu

    Abstract: Deep neural networks have evolved as the leading approach in 3D medical image segmentation due to their outstanding performance. However, the ever-increasing model size and computation cost of deep neural networks have become the primary barrier to deploying them on real-world resource-limited hardware. In pursuit of improving performance and efficiency, we propose a 3D medical image segmentation…

    Submitted 19 February, 2025; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted at NeurIPS 2024

  34. arXiv:2312.04307  [pdf, other]

    cs.LG

    A Structural-Clustering Based Active Learning for Graph Neural Networks

    Authors: Ricky Maulana Fajri, Yulong Pei, Lu Yin, Mykola Pechenizkiy

    Abstract: In active learning for graph-structured data, Graph Neural Networks (GNNs) have shown effectiveness. However, a common challenge in these applications is the underutilization of crucial structural information. To address this problem, we propose the Structural-Clustering PageRank method for improved Active learning (SPA) specifically designed for graph-structured data. SPA integrates community det…

    Submitted 7 December, 2023; originally announced December 2023.

  35. arXiv:2312.03044  [pdf, other]

    cs.LG

    REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training

    Authors: Jiaxu Zhao, Lu Yin, Shiwei Liu, Meng Fang, Mykola Pechenizkiy

    Abstract: Deep neural networks (DNNs) have been proven effective in various domains. However, they often struggle to perform well on certain minority groups during inference, despite showing strong performance on the majority of data groups. This is because over-parameterized models learned \textit{bias attributes} from a large number of \textit{bias-aligned} training samples. These bias attributes are str…

    Submitted 8 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  36. arXiv:2312.01397  [pdf, other]

    cs.CV cs.LG

    Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective

    Authors: Can Jin, Tianjin Huang, Yihua Zhang, Mykola Pechenizkiy, Sijia Liu, Shiwei Liu, Tianlong Chen

    Abstract: The rapid development of large-scale deep learning models questions the affordability of hardware platforms, which necessitates pruning to reduce their computational and memory footprints. Sparse neural networks, as the product, have demonstrated numerous favorable benefits like low complexity, undamaged generalization, etc. Most of the prominent pruning strategies are invented from a model-cen…

    Submitted 5 September, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

  37. arXiv:2310.19650  [pdf, other]

    cs.CL

    KeyGen2Vec: Learning Document Embedding via Multi-label Keyword Generation in Question-Answering

    Authors: Iftitahu Ni'mah, Samaneh Khoshrou, Vlado Menkovski, Mykola Pechenizkiy

    Abstract: Representing documents into high dimensional embedding space while preserving the structural similarity between document sources has been an ultimate goal for many works on text representation learning. Current embedding models, however, mainly rely on the availability of label supervision to increase the expressiveness of the resulting embeddings. In contrast, unsupervised embeddings are cheap, b…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Arxiv preprint

  38. arXiv:2310.08725  [pdf, ps, other]

    cs.LG

    Heterophily-Based Graph Neural Network for Imbalanced Classification

    Authors: Zirui Liang, Yuntao Li, Tianjin Huang, Akrati Saxena, Yulong Pei, Mykola Pechenizkiy

    Abstract: Graph neural networks (GNNs) have shown promise in addressing graph-related problems, including node classification. However, conventional GNNs assume an even distribution of data across classes, which is often not the case in real-world scenarios, where certain classes are severely underrepresented. This leads to suboptimal performance of standard GNNs on imbalanced graphs. In this paper, we intr…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted by Twelfth International Conference on Complex Networks & Their Applications

  39. arXiv:2310.05175  [pdf, ps, other]

    cs.LG

    Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

    Authors: Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Gen Li, Ajay Jaiswal, Mykola Pechenizkiy, Yi Liang, Michael Bendersky, Zhangyang Wang, Shiwei Liu

    Abstract: Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size. In response to this challenge, efforts have been directed toward the application of traditional network pruning techniques to LLMs, uncovering a massive number of parameters that can be pruned in one-shot without…

    Submitted 30 June, 2025; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Published at ICML 2024

  40. arXiv:2309.15737  [pdf, other]

    cs.LG

    Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need

    Authors: Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

    Abstract: We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared to the existing algorithms. Our main theoretical result is a Bayesian regret bound for each cost component of $\tilde{O} (HS \sqrt{AT})$ for any…

    Submitted 27 September, 2023; originally announced September 2023.

  41. arXiv:2306.14275  [pdf, other]

    cs.LG cs.AI

    Enhancing Adversarial Training via Reweighting Optimization Trajectory

    Authors: Tianjin Huang, Shiwei Liu, Tianlong Chen, Meng Fang, Li Shen, Vlado Menkovski, Lu Yin, Yulong Pei, Mykola Pechenizkiy

    Abstract: Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization. A number of approaches have been proposed to address these drawbacks such as extra regularization, adversarial weights perturbat… ▽ More

    Submitted 4 February, 2024; v1 submitted 25 June, 2023; originally announced June 2023.

    Comments: Accepted by ECML 2023

    Journal ref: ECML 2023

  42. arXiv:2305.19454  [pdf, other

    cs.LG cs.AI cs.CV

    Dynamic Sparsity Is Channel-Level Sparsity Learner

    Authors: Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang Wang, Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, Shiwei Liu

    Abstract: Sparse training has received an upsurging interest in machine learning due to its tantalizing saving potential for the entire training process as well as inference. Dynamic sparse training (DST), as a leading sparse training approach, can train deep neural networks at high sparsity from scratch to match the performance of their dense counterparts. However, most if not all DST prior arts demonstrat… ▽ More

    Submitted 10 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  43. arXiv:2305.19412  [pdf, other

    cs.CV cs.AI

    Are Large Kernels Better Teachers than Transformers for ConvNets?

    Authors: Tianjin Huang, Lu Yin, Zhenyu Zhang, Li Shen, Meng Fang, Mykola Pechenizkiy, Zhangyang Wang, Shiwei Liu

    Abstract: This paper reveals a new appeal of the recently emerged large-kernel Convolutional Neural Networks (ConvNets): as the teacher in Knowledge Distillation (KD) for small-kernel ConvNets. While Transformers have led state-of-the-art (SOTA) performance in various fields with ever-larger models and labeled data, small-kernel ConvNets are considered more suitable for resource-limited applications due to… ▽ More

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by ICML 2023

    Journal ref: ICML 2023

  44. arXiv:2305.18427  [pdf, other

    cs.LG

    Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach

    Authors: Yudi Zhang, Yali Du, Biwei Huang, Ziyan Wang, Jun Wang, Meng Fang, Mykola Pechenizkiy

    Abstract: A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed. Reward redistribution serves as a solution to re-assign credits for each time step from observed sequences. While the majority of current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contribution… ▽ More

    Submitted 10 November, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 camera-ready version

  45. arXiv:2305.18382  [pdf, other

    cs.LG

    Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers

    Authors: Zahra Atashgahi, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu

    Abstract: Efficient time series forecasting has become critical for real-world applications, particularly with deep neural networks (DNNs). Efficiency in DNNs can be achieved through sparse connectivity and reducing the model size. However, finding the sparsity level automatically during training remains challenging due to the heterogeneity in the loss-sparsity tradeoffs across the datasets. In this paper,… ▽ More

    Submitted 12 June, 2024; v1 submitted 28 May, 2023; originally announced May 2023.

  46. arXiv:2305.13938  [pdf, other

    cs.CY cs.AI cs.LG

    Algorithmic Unfairness through the Lens of EU Non-Discrimination Law: Or Why the Law is not a Decision Tree

    Authors: Hilde Weerts, Raphaële Xenidis, Fabien Tarissan, Henrik Palmer Olsen, Mykola Pechenizkiy

    Abstract: Concerns regarding unfairness and discrimination in the context of artificial intelligence (AI) systems have recently received increased attention from both legal and computer science scholars. Yet, the degree of overlap between notions of algorithmic bias and fairness on the one hand, and legal notions of discrimination and equality on the other, is often unclear, leading to misunderstandings bet… ▽ More

    Submitted 24 May, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Journal ref: 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23)

  47. arXiv:2305.11262  [pdf, other

    cs.CL

    CHBias: Bias Evaluation and Mitigation of Chinese Conversational Language Models

    Authors: Jiaxu Zhao, Meng Fang, Zijing Shi, Yitong Li, Ling Chen, Mykola Pechenizkiy

    Abstract: Warning: This paper contains content that may be offensive or upsetting. Pretrained conversational agents have been exposed to safety issues, exhibiting a range of stereotypical human biases such as gender bias. However, there are still limited bias categories in current research, and most of them only focus on English. In this paper, we introduce a new Chinese d… ▽ More

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023

  48. arXiv:2305.08566  [pdf, other

    cs.CL

    NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist

    Authors: Iftitahu Ni'mah, Meng Fang, Vlado Menkovski, Mykola Pechenizkiy

    Abstract: In this study, we analyze automatic evaluation metrics for Natural Language Generation (NLG), specifically task-agnostic metrics and human-aligned metrics. Task-agnostic metrics, such as Perplexity, BLEU, and BERTScore, are cost-effective and highly adaptable to diverse NLG tasks, yet they correlate only weakly with human judgments. Human-aligned metrics (CTC, CtrlEval, UniEval) improve correlation level by… ▽ More

    Submitted 26 May, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: To appear at ACL 2023 Toronto (main conference). 9 pages (main), 1 page for Limitations and Ethics, 11 pages for Appendix

  49. Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML

    Authors: Hilde Weerts, Florian Pfisterer, Matthias Feurer, Katharina Eggensperger, Edward Bergman, Noor Awad, Joaquin Vanschoren, Mykola Pechenizkiy, Bernd Bischl, Frank Hutter

    Abstract: The field of automated machine learning (AutoML) introduces techniques that automate parts of the development of machine learning (ML) systems, accelerating the process and reducing barriers for novices. However, decisions derived from ML models can reproduce, amplify, or even introduce unfairness in our societies, causing harm to (groups of) individuals. In response, researchers have started to p… ▽ More

    Submitted 20 February, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Journal ref: Journal of Artificial Intelligence Research 79 (2024) 639-677

  50. arXiv:2303.07200  [pdf, other

    cs.NE cs.AI cs.LG

    Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks

    Authors: Zahra Atashgahi, Xuhao Zhang, Neil Kichler, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu

    Abstract: Feature selection that selects an informative subset of variables from data not only enhances the model interpretability and performance but also alleviates the resource demands. Recently, there has been growing attention on feature selection using neural networks. However, existing methods usually suffer from high computational costs when applied to high-dimensional datasets. In this paper, inspi… ▽ More

    Submitted 14 March, 2023; v1 submitted 10 March, 2023; originally announced March 2023.