
Showing 1–24 of 24 results for author: Meisenbacher, S

  1. arXiv:2503.22379  [pdf, other]

    cs.CR cs.CL

    Spend Your Budget Wisely: Towards an Intelligent Distribution of the Privacy Budget in Differentially Private Text Rewriting

    Authors: Stephen Meisenbacher, Chaeeun Joy Lee, Florian Matthes

    Abstract: The task of $\textit{Differentially Private Text Rewriting}$ is a class of text privatization techniques in which (sensitive) input textual documents are $\textit{rewritten}$ under Differential Privacy (DP) guarantees. The motivation behind such methods is to hide both explicit and implicit identifiers that could be contained in text, while still retaining the semantic meaning of the original text…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 14 pages, 1 figure, 6 tables. Accepted to CODASPY 2025

  2. arXiv:2503.09338  [pdf, other]

    cs.CL cs.HC

    Investigating User Perspectives on Differentially Private Text Privatization

    Authors: Stephen Meisenbacher, Alexandra Klymenko, Alexander Karpp, Florian Matthes

    Abstract: Recent literature has seen a considerable uptick in $\textit{Differentially Private Natural Language Processing}$ (DP NLP). This includes DP text privatization, where potentially sensitive input texts are transformed under DP to achieve privatized output texts that ideally mask sensitive information $\textit{and}$ maintain original semantics. Despite continued work to address the open challenges i…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: 20 pages, 5 figures, 10 tables. Accepted to PrivateNLP 2025

  3. arXiv:2502.04173  [pdf, other]

    cs.CL

    Lexical Substitution is not Synonym Substitution: On the Importance of Producing Contextually Relevant Word Substitutes

    Authors: Juraj Vladika, Stephen Meisenbacher, Florian Matthes

    Abstract: Lexical Substitution is the task of replacing a single word in a sentence with a similar one. This should ideally be one that is not necessarily only synonymous, but also fits well into the surrounding context of the target word, while preserving the sentence's grammatical structure. Recent advances in Lexical Substitution have leveraged the masked token prediction task of Pre-trained Language Mod…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: Accepted to ICAART 2025

  4. arXiv:2501.19022  [pdf, other]

    cs.CL

    On the Impact of Noise in Differentially Private Text Rewriting

    Authors: Stephen Meisenbacher, Maulik Chevli, Florian Matthes

    Abstract: The field of text privatization often leverages the notion of $\textit{Differential Privacy}$ (DP) to provide formal guarantees in the rewriting or obfuscation of sensitive textual data. A common and nearly ubiquitous form of DP application necessitates the addition of calibrated noise to vector representations of text, either at the data- or model-level, which is governed by the privacy parameter…

    Submitted 31 January, 2025; originally announced January 2025.

    Comments: 19 pages, 3 figures, 9 tables. Accepted to NAACL 2025 (Findings)

  5. arXiv:2412.00423  [pdf, other]

    cs.LG eess.SP stat.ML

    On autoregressive deep learning models for day-ahead wind power forecasting with irregular shutdowns due to redispatching

    Authors: Stefan Meisenbacher, Silas Aaron Selzer, Mehdi Dado, Maximilian Beichter, Tim Martin, Markus Zdrallek, Peter Bretschneider, Veit Hagenmeyer, Ralf Mikut

    Abstract: Renewable energies and their operation are becoming increasingly vital for the stability of electrical power grids since conventional power plants are progressively being displaced, and their contribution to redispatch interventions is thereby diminishing. In order to consider renewable energies like Wind Power (WP) for such interventions as a substitute, day-ahead forecasts are necessary to commu…

    Submitted 30 November, 2024; originally announced December 2024.

  6. arXiv:2412.00419  [pdf, other]

    cs.LG eess.SP stat.ML

    AutoPQ: Automating Quantile estimation from Point forecasts in the context of sustainability

    Authors: Stefan Meisenbacher, Kaleb Phipps, Oskar Taubert, Marie Weiel, Markus Götz, Ralf Mikut, Veit Hagenmeyer

    Abstract: Optimizing smart grid operations relies on critical decision-making informed by uncertainty quantification, making probabilistic forecasting a vital tool. Designing such forecasting models involves three key challenges: accurate and unbiased uncertainty quantification, workload reduction for data scientists during the design process, and limitation of the environmental impact of model training. In…

    Submitted 30 November, 2024; originally announced December 2024.

  7. arXiv:2410.00751  [pdf, other]

    cs.CL

    Thinking Outside of the Differential Privacy Box: A Case Study in Text Privatization with Language Model Prompting

    Authors: Stephen Meisenbacher, Florian Matthes

    Abstract: The field of privacy-preserving Natural Language Processing has risen in popularity, particularly at a time when concerns about privacy grow with the proliferation of Large Language Models. One solution consistently appearing in recent literature has been the integration of Differential Privacy (DP) into NLP techniques. In this paper, we take these approaches into critical view, discussing the res…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: 10 pages, 3 tables, Accepted to EMNLP 2024 (Main)

  8. arXiv:2407.14085  [pdf, other]

    cs.CL

    An Improved Method for Class-specific Keyword Extraction: A Case Study in the German Business Registry

    Authors: Stephen Meisenbacher, Tim Schopf, Weixin Yan, Patrick Holl, Florian Matthes

    Abstract: The task of $\textit{keyword extraction}$ is often an important initial step in unsupervised information extraction, forming the basis for tasks such as topic modeling or document classification. While recent methods have proven to be quite effective in the extraction of keywords, the identification of $\textit{class-specific}$ keywords, or only those pertaining to a predefined class, remains chal…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 7 pages, 1 figure, 1 table. Accepted to KONVENS 2024

  9. arXiv:2407.02027  [pdf, other]

    cs.CY

    Privacy Risks of General-Purpose AI Systems: A Foundation for Investigating Practitioner Perspectives

    Authors: Stephen Meisenbacher, Alexandra Klymenko, Patrick Gage Kelley, Sai Teja Peddinti, Kurt Thomas, Florian Matthes

    Abstract: The rise of powerful AI models, more formally $\textit{General-Purpose AI Systems}$ (GPAIS), has led to impressive leaps in performance across a wide range of tasks. At the same time, researchers and practitioners alike have raised a number of privacy concerns, resulting in a wealth of literature covering various privacy risks and vulnerabilities of AI models. Works surveying such risks provide di…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 5 pages. Accepted to SUPA@SOUPS'24

  10. arXiv:2407.00638  [pdf, other]

    cs.CL

    A Collocation-based Method for Addressing Challenges in Word-level Metric Differential Privacy

    Authors: Stephen Meisenbacher, Maulik Chevli, Florian Matthes

    Abstract: Applications of Differential Privacy (DP) in NLP must distinguish between the syntactic level on which a proposed mechanism operates, often taking the form of $\textit{word-level}$ or $\textit{document-level}$ privatization. Recently, several word-level $\textit{Metric}$ Differential Privacy approaches have been proposed, which rely on this generalized DP notion for operating in word embedding spa…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 13 pages, 2 figures, 9 tables. Accepted to PrivateNLP 2024

  11. arXiv:2407.00637  [pdf, other]

    cs.CL

    DP-MLM: Differentially Private Text Rewriting Using Masked Language Models

    Authors: Stephen Meisenbacher, Maulik Chevli, Juraj Vladika, Florian Matthes

    Abstract: The task of text privatization using Differential Privacy has recently taken the form of $\textit{text rewriting}$, in which an input text is obfuscated via the use of generative (large) language models. While these methods have shown promising results in the ability to preserve privacy, these methods rely on autoregressive models which lack a mechanism to contextualize the private rewriting proce…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 15 pages, 2 figures, 8 tables. Accepted to ACL 2024 (Findings)

  12. arXiv:2405.19831  [pdf, other]

    cs.CL

    Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text

    Authors: Stephen Meisenbacher, Florian Matthes

    Abstract: The study of Differential Privacy (DP) in Natural Language Processing often views the task of text privatization as a $\textit{rewriting}$ task, in which sensitive input texts are rewritten to hide explicit or implicit private information. In order to evaluate the privacy-preserving capabilities of a DP text rewriting mechanism, $\textit{empirical privacy}$ tests are frequently employed. In these…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 10 pages, 2 figures, 2 tables. Accepted to ARES 2024 (IWAPS)

  13. arXiv:2405.01678  [pdf, other]

    cs.CL

    1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy

    Authors: Stephen Meisenbacher, Maulik Chevli, Florian Matthes

    Abstract: The study of privacy-preserving Natural Language Processing (NLP) has gained rising attention in recent years. One promising avenue studies the integration of Differential Privacy in NLP, which has brought about innovative methods in a variety of application settings. Of particular note are $\textit{word-level Metric Local Differential Privacy (MLDP)}$ mechanisms, which work to obfuscate potential…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 12 pages, 7 figures, 7 tables, 10th ACM International Workshop on Security and Privacy Analytics (IWSPA 2024)

  14. arXiv:2404.18759  [pdf]

    cs.CL cs.CY

    Towards A Structured Overview of Use Cases for Natural Language Processing in the Legal Domain: A German Perspective

    Authors: Juraj Vladika, Stephen Meisenbacher, Martina Preis, Alexandra Klymenko, Florian Matthes

    Abstract: In recent years, the field of Legal Tech has risen in prevalence, as the Natural Language Processing (NLP) and legal disciplines have combined forces to digitalize legal processes. Amidst the steady flow of research solutions stemming from the NLP domain, the study of use cases has fallen behind, leading to a number of innovative technical methods without a place in practice. In this work, we aim…

    Submitted 2 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 tables, 30th Americas Conference on Information Systems (AMCIS 2024)

  15. arXiv:2404.03324  [pdf, other]

    cs.CL

    A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off

    Authors: Stephen Meisenbacher, Nihildev Nandakumar, Alexandra Klymenko, Florian Matthes

    Abstract: The application of Differential Privacy to Natural Language Processing techniques has emerged in relevance in recent years, with an increasing number of studies published in established NLP outlets. In particular, the adaptation of Differential Privacy for use in NLP tasks has first focused on the $\textit{word-level}$, where calibrated noise is added to word embedding vectors to achieve "noisy" r…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024

  16. arXiv:2306.15497  [pdf]

    cs.CR cs.CY

    Identifying Practical Challenges in the Implementation of Technical Measures for Data Privacy Compliance

    Authors: Oleksandra Klymenko, Stephen Meisenbacher, Florian Matthes

    Abstract: Modern privacy regulations provide a strict mandate for data processing entities to implement appropriate technical measures to demonstrate compliance. In practice, determining what measures are indeed "appropriate" is not trivial, particularly in light of vague guidelines provided by privacy regulations. To exacerbate the issue, challenges arise not only in the implementation of the technical mea…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 10 pages, 2 tables, 29th Americas Conference on Information Systems (AMCIS 2023)

  17. arXiv:2301.08549  [pdf, ps, other]

    cs.CL

    Transforming Unstructured Text into Data with Context Rule Assisted Machine Learning (CRAML)

    Authors: Stephen Meisenbacher, Peter Norlander

    Abstract: We describe a method and new no-code software tools enabling domain experts to build custom structured, labeled datasets from the unstructured text of documents and build niche machine learning text classification models traceable to expert-written rules. The Context Rule Assisted Machine Learning (CRAML) method allows accurate and reproducible labeling of massive volumes of unstructured text. CRA…

    Submitted 20 January, 2023; originally announced January 2023.

  18. AutoPV: Automated photovoltaic forecasts with limited information using an ensemble of pre-trained models

    Authors: Stefan Meisenbacher, Benedikt Heidrich, Tim Martin, Ralf Mikut, Veit Hagenmeyer

    Abstract: Accurate PhotoVoltaic (PV) power generation forecasting is vital for the efficient operation of Smart Grids. The automated design of such accurate forecasting models for individual PV plants includes two challenges: First, information about the PV mounting configuration (i.e. inclination and azimuth angles) is often missing. Second, for new PV plants, the amount of historical data available to tra…

    Submitted 13 December, 2022; originally announced December 2022.

  19. Understanding the Implementation of Technical Measures in the Process of Data Privacy Compliance: A Qualitative Study

    Authors: Oleksandra Klymenko, Oleksandr Kosenkov, Stephen Meisenbacher, Parisa Elahidoost, Daniel Mendez, Florian Matthes

    Abstract: Modern privacy regulations, such as the General Data Protection Regulation (GDPR), address privacy in software systems in a technologically agnostic way by mentioning general "technical measures" for data privacy compliance rather than dictating how these should be implemented. An understanding of the concept of technical measures and how exactly these can be handled in practice, however, is not t…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: The 16th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)

  20. Differential Privacy in Natural Language Processing: The Story So Far

    Authors: Oleksandra Klymenko, Stephen Meisenbacher, Florian Matthes

    Abstract: As the tide of Big Data continues to influence the landscape of Natural Language Processing (NLP), the utilization of modern NLP methods has grounded itself in this data, in order to tackle a variety of text-based tasks. These methods without a doubt can include private or otherwise personally identifiable information. As such, the question of privacy in NLP has gained fervor in recent years, coin…

    Submitted 17 August, 2022; originally announced August 2022.

  21. Review of automated time series forecasting pipelines

    Authors: Stefan Meisenbacher, Marian Turowski, Kaleb Phipps, Martin Rätz, Dirk Müller, Veit Hagenmeyer, Ralf Mikut

    Abstract: Time series forecasting is fundamental for various use cases in different domains such as energy systems and economics. Creating a forecasting model for a specific use case requires an iterative and complex design process. The typical design process includes the five sections (1) data pre-processing, (2) feature engineering, (3) hyperparameter optimization, (4) forecasting method selection, and (5…

    Submitted 3 February, 2022; originally announced February 2022.

    Journal ref: WIREs Data Mining and Knowledge Discovery (2022) e1475

  22. arXiv:2110.13585  [pdf, other]

    cs.LG eess.SP

    Concepts for Automated Machine Learning in Smart Grid Applications

    Authors: Stefan Meisenbacher, Janik Pinter, Tim Martin, Veit Hagenmeyer, Ralf Mikut

    Abstract: Undoubtedly, the increase of available data and competitive machine learning algorithms has boosted the popularity of data-driven modeling in energy systems. Applications are forecasts for renewable energy generation and energy consumption. Forecasts are elementary for sector coupling, where energy-consuming sectors are interconnected with the power-generating sector to address electricity storage…

    Submitted 26 October, 2021; originally announced October 2021.

  23. arXiv:2106.10157  [pdf, other]

    cs.LG

    pyWATTS: Python Workflow Automation Tool for Time Series

    Authors: Benedikt Heidrich, Andreas Bartschat, Marian Turowski, Oliver Neumann, Kaleb Phipps, Stefan Meisenbacher, Kai Schmieder, Nicole Ludwig, Ralf Mikut, Veit Hagenmeyer

    Abstract: Time series data are fundamental for a variety of applications, ranging from financial markets to energy systems. Due to their importance, the number and complexity of tools and methods used for time series analysis is constantly increasing. However, due to unclear APIs and a lack of documentation, researchers struggle to integrate them into their research projects and replicate results. Additiona…

    Submitted 18 June, 2021; originally announced June 2021.

  24. arXiv:2009.12201  [pdf, other]

    eess.SP

    Integrating Battery Aging in the Optimization for Bidirectional Charging of Electric Vehicles

    Authors: Karl Schwenk, Stefan Meisenbacher, Benjamin Briegel, Tim Harr, Veit Hagenmeyer, Ralf Mikut

    Abstract: Smart charging of Electric Vehicles (EVs) reduces operating costs, allows more sustainable battery usage, and promotes the rise of electric mobility. In addition, bidirectional charging and improved connectivity enables efficient power grid support. Today, however, uncoordinated charging, e.g. governed by users' habits, is still the norm. Thus, the impact of upcoming smart charging applications is…

    Submitted 29 April, 2021; v1 submitted 23 September, 2020; originally announced September 2020.

    Comments: Revised and Resubmitted to IEEE Transactions on Smart Grid