Showing 1–4 of 4 results for author: Moslemi, M H

  1. arXiv:2411.01685  [pdf, ps, other]

    cs.LG cs.CY cs.DB

    Reducing Biases in Record Matching Through Scores Calibration

    Authors: Mohammad Hossein Moslemi, Mostafa Milani

    Abstract: Record matching is the task of identifying records that refer to the same real-world entity across datasets. While most existing models optimize for accuracy, fairness has become an important concern due to the potential for unequal outcomes across demographic groups. Prior work typically focuses on binary outcomes evaluated at fixed decision thresholds. However, such evaluations can miss biases i…

    Submitted 25 June, 2025; v1 submitted 3 November, 2024; originally announced November 2024.

  2. arXiv:2409.16410  [pdf, other]

    cs.LG cs.DB

    Evaluating Blocking Biases in Entity Matching

    Authors: Mohammad Hossein Moslemi, Harini Balamurugan, Mostafa Milani

    Abstract: Entity Matching (EM) is crucial for identifying equivalent data entities across different sources, a task that becomes increasingly challenging with the growth and heterogeneity of data. Blocking techniques, which reduce the computational complexity of EM, play a vital role in making this process scalable. Despite advancements in blocking methods, the issue of fairness, where blocking may inadvert…

    Submitted 24 September, 2024; originally announced September 2024.

  3. Threshold-Independent Fair Matching through Score Calibration

    Authors: Mohammad Hossein Moslemi, Mostafa Milani

    Abstract: Entity Matching (EM) is a critical task in numerous fields, such as healthcare, finance, and public administration, as it identifies records that refer to the same entity within or across different databases. EM faces considerable challenges, particularly with false positives and negatives. These are typically addressed by generating matching scores and applying thresholds to balance false positives…

    Submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2403.02372  [pdf, other]

    cs.LG cs.AI cs.DB

    OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport

    Authors: Alireza Pirhadi, Mohammad Hossein Moslemi, Alexander Cloninger, Mostafa Milani, Babak Salimi

    Abstract: Ensuring Conditional Independence (CI) constraints is pivotal for the development of fair and trustworthy machine learning models. In this paper, we introduce OTClean, a framework that harnesses optimal transport theory for data repair under CI constraints. Optimal transport theory provides a rigorous framework for measuring the discrepancy between probability distributions, thereby ensuring control…

    Submitted 4 March, 2024; originally announced March 2024.