Showing 1–6 of 6 results for author: van Kesteren, E

Searching in archive cs.
  1. arXiv:2408.13167

    stat.ML cs.LG

    A density ratio framework for evaluating the utility of synthetic data

    Authors: Thom Benjamin Volker, Peter-Paul de Wolf, Erik-Jan van Kesteren

    Abstract: Synthetic data generation is a promising technique to facilitate the use of sensitive data while mitigating the risk of privacy breaches. However, for synthetic data to be useful in downstream analysis tasks, it needs to be of sufficient quality. Various methods have been proposed to measure the utility of synthetic data, but their results are often incomplete or even misleading. In this paper, we…

    Submitted 23 August, 2024; originally announced August 2024.

  2. arXiv:2405.08203

    physics.soc-ph cs.SI stat.ME

    Community detection in bipartite signed networks is highly dependent on parameter choice

    Authors: Elena Candellone, Erik-Jan van Kesteren, Sofia Chelmi, Javier Garcia-Bernardo

    Abstract: Decision-making processes often involve voting. Human interactions with exogenous entities such as legislations or products can be effectively modeled as two-mode (bipartite) signed networks, where people can either vote positively, negatively, or abstain from voting on the entities. Detecting communities in such networks could help us understand underlying properties: for example ideological camps…

    Submitted 27 January, 2025; v1 submitted 13 May, 2024; originally announced May 2024.

  3. To democratize research with sensitive data, we should make synthetic data more accessible

    Authors: Erik-Jan van Kesteren

    Abstract: For over 30 years, synthetic data has been heralded as a promising solution to make sensitive datasets accessible. However, despite much research effort and several high-profile use-cases, the widespread adoption of synthetic data as a tool for open, accessible, reproducible research with sensitive data is still a distant dream. In this opinion, Erik-Jan van Kesteren, head of the ODISSEI Social Da…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 4 pages, 2 figures

  4. arXiv:2212.06711

    cs.CL cs.CY

    On Text-based Personality Computing: Challenges and Future Directions

    Authors: Qixiang Fang, Anastasia Giachanou, Ayoub Bagheri, Laura Boeschoten, Erik-Jan van Kesteren, Mahdi Shafiee Kamalabad, Daniel L Oberski

    Abstract: Text-based personality computing (TPC) has gained many research interests in NLP. In this paper, we describe 15 challenges that we consider deserving the attention of the research community. These challenges are organized by the following topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, as well as ethics and fairness. When addressing each ch…

    Submitted 22 May, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Findings of ACL 2023. Long paper

  5. arXiv:2003.07621

    stat.ML cs.CY cs.LG

    Fair inference on error-prone outcomes

    Authors: Laura Boeschoten, Erik-Jan van Kesteren, Ayoub Bagheri, Daniel L. Oberski

    Abstract: Fair inference in supervised learning is an important and active area of research, yielding a range of useful methods to assess and account for fairness criteria when predicting ground truth targets. As shown in recent work, however, when target labels are error-prone, potential prediction unfairness can arise from measurement error. In this paper, we show that, when an error-prone proxy target is…

    Submitted 17 March, 2020; originally announced March 2020.

    Comments: Online supplementary code is available at https://dx.doi.org/10.5281/zenodo.3708150

  6. arXiv:1911.03183

    cs.LG cs.CR cs.DC stat.ML

    Privacy-Preserving Generalized Linear Models using Distributed Block Coordinate Descent

    Authors: Erik-Jan van Kesteren, Chang Sun, Daniel L. Oberski, Michel Dumontier, Lianne Ippel

    Abstract: Combining data from varied sources has considerable potential for knowledge discovery: collaborating data parties can mine data in an expanded feature space, allowing them to explore a larger range of scientific questions. However, data sharing among different parties is highly restricted by legal conditions, ethical concerns, and/or data volume. Fueled by these concerns, the fields of cryptogra…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: Fully reproducible code for all results and images can be found at https://github.com/vankesteren/privacy-preserving-glm, and the software package can be found at https://github.com/vankesteren/privreg