Showing 1–17 of 17 results for author: Konstantinov, N

Searching in archive cs.
  1. arXiv:2506.20573  [pdf, ps, other]

    stat.ML cs.LG

    LARP: Learner-Agnostic Robust Data Prefiltering

    Authors: Kristian Minchev, Dimitar Iliev Dimitrov, Nikola Konstantinov

    Abstract: The widespread availability of large public datasets is a key factor behind the recent successes of statistical inference and machine learning methods. However, these datasets often contain some low-quality or contaminated data, to which many learning procedures are sensitive. Therefore, the question of whether and how public datasets should be prefiltered to facilitate accurate downstream learnin…

    Submitted 25 June, 2025; originally announced June 2025.

  2. arXiv:2502.02331  [pdf, other]

    stat.ML cs.LG

    On the Impact of Performative Risk Minimization for Binary Random Variables

    Authors: Nikita Tsoy, Ivan Kirev, Negin Rahimiyazdi, Nikola Konstantinov

    Abstract: Performativity, the phenomenon where outcomes are influenced by predictions, is particularly prevalent in social contexts where individuals strategically respond to a deployed model. In order to preserve the high accuracy of machine learning models under distribution shifts caused by performativity, Perdomo et al. (2020) introduced the concept of performative risk minimization (PRM). While this fr…

    Submitted 4 February, 2025; originally announced February 2025.

  3. arXiv:2412.00980  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    Incentivizing Truthful Collaboration in Heterogeneous Federated Learning

    Authors: Dimitar Chakarov, Nikita Tsoy, Kristian Minchev, Nikola Konstantinov

    Abstract: Federated learning (FL) is a distributed collaborative learning method, where multiple clients learn together by sharing gradient updates instead of raw data. However, it is well-known that FL is vulnerable to manipulated updates from clients. In this work we study the impact of data heterogeneity on clients' incentives to manipulate their updates. First, we present heterogeneous collaborative lea…

    Submitted 5 March, 2025; v1 submitted 1 December, 2024; originally announced December 2024.

    Comments: 29 pages, 8 figures

  4. arXiv:2410.07959  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act

    Authors: Philipp Guldimann, Alexander Spiridonov, Robin Staab, Nikola Jovanović, Mark Vero, Velko Vechev, Anna-Maria Gueorguieva, Mislav Balunović, Nikola Konstantinov, Pavol Bielik, Petar Tsankov, Martin Vechev

    Abstract: The EU's Artificial Intelligence Act (AI Act) is a significant step towards responsible AI development, but lacks clear technical interpretation, making it difficult to assess models' compliance. This work presents COMPL-AI, a comprehensive framework consisting of (i) the first technical interpretation of the EU AI Act, translating its broad regulatory requirements into measurable technical requir…

    Submitted 3 February, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

  5. arXiv:2405.17299  [pdf, other]

    stat.ML cs.LG math.OC

    Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data

    Authors: Nikita Tsoy, Nikola Konstantinov

    Abstract: Simplicity bias, the propensity of deep models to over-rely on simple features, has been identified as a potential reason for limited out-of-distribution generalization of neural networks (Shah et al., 2020). Despite the important implications, this phenomenon has been theoretically confirmed and characterized only under strong dataset assumptions, such as linear separability (Lyu et al., 2021). I…

    Submitted 7 November, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: ICML 2024, camera-ready version (expanded related work)

  6. arXiv:2403.06672  [pdf, ps, other]

    stat.ML cs.CR cs.GT cs.LG

    Provable Mutual Benefits from Federated Learning in Privacy-Sensitive Domains

    Authors: Nikita Tsoy, Anna Mihalkova, Teodora Todorova, Nikola Konstantinov

    Abstract: Cross-silo federated learning (FL) allows data owners to train accurate machine learning models by benefiting from each other's private datasets. Unfortunately, the model accuracy benefits of collaboration are often undermined by privacy defenses. Therefore, to incentivize client participation in privacy-sensitive domains, a FL protocol should strike a delicate balance between privacy guarantees an…

    Submitted 7 November, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: AISTATS 2024; Camera-ready version (updated references)

  7. arXiv:2305.16272  [pdf, other]

    cs.LG cs.GT stat.ML

    Incentivizing Honesty among Competitors in Collaborative Learning and Optimization

    Authors: Florian E. Dorner, Nikola Konstantinov, Georgi Pashaliev, Martin Vechev

    Abstract: Collaborative learning techniques have the potential to enable training machine learning models that are superior to models trained on a single entity's data. However, in many cases, potential participants in such collaborative schemes are competitors on a downstream task, such as firms that each aim to attract customers by providing the best recommendations. This can incentivize dishonest updates…

    Submitted 10 February, 2025; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Updated experimental results after fixing a mistake in the code. Previous version published in NeurIPS 2023; 37 pages, 5 figures

  8. arXiv:2305.16052  [pdf, ps, other]

    cs.LG cs.GT

    Strategic Data Sharing between Competitors

    Authors: Nikita Tsoy, Nikola Konstantinov

    Abstract: Collaborative learning techniques have significantly advanced in recent years, enabling private model training across multiple organizations. Despite this opportunity, firms face a dilemma when considering data sharing with competitors -- while collaboration can improve a company's machine learning model, it may also benefit competitors and hence reduce profits. In this work, we introduce a genera…

    Submitted 30 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted to NeurIPS 2023

  9. arXiv:2212.10154  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Human-Guided Fair Classification for Natural Language Processing

    Authors: Florian E. Dorner, Momchil Peychev, Nikola Konstantinov, Naman Goel, Elliott Ash, Martin Vechev

    Abstract: Text classifiers have promising applications in high-stake tasks such as resume screening and content moderation. These classifiers must be fair and avoid discriminatory decisions by being invariant to perturbations of sensitive attributes such as gender or ethnicity. However, there is a gap between human intuition about these perturbations and the formal similarity specifications capturing them.…

    Submitted 16 March, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Published at ICLR 2023 (notable top 25%). 30 pages, 1 figure

  10. arXiv:2206.12395  [pdf, other]

    cs.LG cs.CR cs.DC

    Data Leakage in Federated Averaging

    Authors: Dimitar I. Dimitrov, Mislav Balunović, Nikola Konstantinov, Martin Vechev

    Abstract: Recent attacks have shown that user data can be recovered from FedSGD updates, thus breaking privacy. However, these attacks are of limited practical relevance as federated learning typically uses the FedAvg algorithm. Compared to FedSGD, recovering data from FedAvg updates is much harder as: (i) the updates are computed at unobserved intermediate network weights, (ii) a large number of batches ar…

    Submitted 1 November, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

    ACM Class: I.2.11

  11. arXiv:2106.11732  [pdf, other]

    cs.LG stat.ML

    FLEA: Provably Robust Fair Multisource Learning from Unreliable Training Data

    Authors: Eugenia Iofinova, Nikola Konstantinov, Christoph H. Lampert

    Abstract: Fairness-aware learning aims at constructing classifiers that not only make accurate predictions, but also do not discriminate against specific groups. It is a fast-growing area of machine learning with far-reaching societal impact. However, existing fair learning methods are vulnerable to accidental or malicious artifacts in the training data, which can cause them to unknowingly produce unfair cl…

    Submitted 11 January, 2023; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: 10 pages in main text; 42 pages including bibliography and appendix. Published in Transactions of Machine Learning Research (TMLR), 2022, https://openreview.net/forum?id=XsPopigZX; project website at https://github.com/ISTAustria-CVML/FLEA

  12. arXiv:2102.06004  [pdf, ps, other]

    cs.LG stat.ML

    Fairness-Aware PAC Learning from Corrupted Data

    Authors: Nikola Konstantinov, Christoph H. Lampert

    Abstract: Addressing fairness concerns about machine learning models is a crucial step towards their long-term adoption in real-world automated systems. While many approaches have been developed for training fair models from data, little is known about the robustness of these methods to data corruption. In this work we consider fairness-aware learning under worst-case data manipulations. We show that an adv…

    Submitted 7 June, 2022; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: In Journal of Machine Learning Research (JMLR): http://jmlr.org/papers/v23/21-1189.html

  13. arXiv:2102.05996  [pdf, other]

    cs.LG cs.IR stat.ML

    Fairness Through Regularization for Learning to Rank

    Authors: Nikola Konstantinov, Christoph H. Lampert

    Abstract: Given the abundance of applications of ranking in recent years, addressing fairness concerns around automated ranking systems becomes necessary for increasing the trust among end-users. Previous work on fair ranking has mostly focused on application-specific fairness notions, often tailored to online advertising, and it rarely considers learning as part of the process. In this work, we show how to…

    Submitted 7 June, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: 34 pages

  14. arXiv:2002.10384  [pdf, other]

    cs.LG stat.ML

    On the Sample Complexity of Adversarial Multi-Source PAC Learning

    Authors: Nikola Konstantinov, Elias Frantar, Dan Alistarh, Christoph H. Lampert

    Abstract: We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or even adversarially perturbed. It is known that in the single-so…

    Submitted 30 June, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: International Conference on Machine Learning (ICML) 2020: Camera-ready. Strengthened the definition of adversarial PAC-learnability, added explicit bounds on sample complexity

  15. arXiv:1901.10310  [pdf, other]

    cs.LG stat.ML

    Robust Learning from Untrusted Sources

    Authors: Nikola Konstantinov, Christoph Lampert

    Abstract: Modern machine learning methods often require more data for training than a single expert can provide. Therefore, it has become a standard procedure to collect data from external sources, e.g. via crowdsourcing. Unfortunately, the quality of these sources is not always guaranteed. As additional complications, the data might be stored in a distributed way, or might even have to remain private. In t…

    Submitted 17 May, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: Accepted to International Conference on Machine Learning (ICML), 2019; Camera-ready version

  16. arXiv:1809.10505  [pdf, other]

    cs.LG cs.DC stat.ML

    The Convergence of Sparsified Gradient Methods

    Authors: Dan Alistarh, Torsten Hoefler, Mikael Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli

    Abstract: Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization, large-batch methods, and gradient sparsification, have been proposed. To date, gradient sparsification methods - where each node sorts gradients by magnitude, and only c…

    Submitted 27 September, 2018; originally announced September 2018.

    Comments: NIPS 2018 - Advances in Neural Information Processing Systems; Authors in alphabetic order

  17. arXiv:1803.08841  [pdf, other]

    cs.DC cs.LG stat.ML

    The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

    Authors: Dan Alistarh, Christopher De Sa, Nikola Konstantinov

    Abstract: Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on distributed machine learning, significant work has been dedicated to the convergence properties of this algorithm under the inconsistent and noisy updates arising from ex…

    Submitted 22 June, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: To be published in PODC 2018; 18 pages, 1 figure; Changes: added pseudocode for Algorithm 2, added some references, and corrected typos