Showing 1–9 of 9 results for author: Shahbazi, N

  1. arXiv:2505.18919  [pdf, ps, other]

    cs.DS

    Fair-Count-Min: Frequency Estimation under Equal Group-wise Approximation Factor

    Authors: Nima Shahbazi, Stavros Sintos, Abolfazl Asudeh

    Abstract: Frequency estimation in streaming data often relies on sketches like Count-Min (CM) to provide approximate answers with sublinear space. However, CM sketches introduce additive errors that disproportionately impact low-frequency elements, creating fairness concerns across different groups of elements. We introduce Fair-Count-Min, a frequency estimation sketch that guarantees equal expected approxi…

    Submitted 24 May, 2025; originally announced May 2025.

  2. arXiv:2404.11782  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    REQUAL-LM: Reliability and Equity through Aggregation in Large Language Models

    Authors: Sana Ebrahimi, Nima Shahbazi, Abolfazl Asudeh

    Abstract: The extensive scope of large language models (LLMs) across various domains underscores the critical importance of responsibility in their application, beyond natural language processing. In particular, the randomized nature of LLMs, coupled with inherent biases and historical stereotypes in data, raises critical concerns regarding reliability and equity. Addressing these challenges is necessary b…

    Submitted 17 April, 2024; originally announced April 2024.

  3. arXiv:2404.07354  [pdf, other]

    cs.DB cs.CY cs.LG

    FairEM360: A Suite for Responsible Entity Matching

    Authors: Nima Shahbazi, Mahdi Erfanian, Abolfazl Asudeh, Fatemeh Nargesian, Divesh Srivastava

    Abstract: Entity matching is one of the earliest tasks that occur in the big data pipeline and is alarmingly exposed to unintentional biases that affect the quality of data. Identifying and mitigating the biases that exist in the data or are introduced by the matcher at this stage can contribute to promoting fairness in downstream tasks. This demonstration showcases FairEM360, a framework for 1) auditing the o…

    Submitted 18 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  4. arXiv:2307.11355  [pdf, other]

    cs.DS cs.DB

    A Fair and Memory/Time-efficient Hashmap

    Authors: Abolfazl Asudeh, Nima Shahbazi, Stavros Sintos

    Abstract: Hashmap is a fundamental data structure in computer science. There has been extensive research on constructing hashmaps that minimize the number of collisions, leading to efficient lookup query time. Recently, data-dependent approaches construct hashmaps tailored for a target data distribution that are guaranteed to distribute data uniformly across different buckets and hence minimize the collision…

    Submitted 13 April, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

    Journal ref: SIGMOD 2024

  5. arXiv:2307.02726  [pdf, other]

    cs.DB cs.CY cs.LG

    Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching

    Authors: Nima Shahbazi, Nikola Danevski, Fatemeh Nargesian, Abolfazl Asudeh, Divesh Srivastava

    Abstract: Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic to address machine bias and its societal impacts. Despite extensive research on these two topics, little attention has been paid to the fairness of entity matching. Towards addressing this gap, we perform an extensive experimental evaluation…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: Accepted to VLDB'23

  6. arXiv:2306.13868  [pdf, other]

    cs.DB

    Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach

    Authors: Melika Mousavi, Nima Shahbazi, Abolfazl Asudeh

    Abstract: Existing machine learning models have proven to fail when it comes to their performance for minority groups, mainly due to biases in data. In particular, datasets, especially social data, are often not representative of minorities. In this paper, we consider the problem of representation bias identification on image datasets without explicit attribute values. Using the notion of data coverage for…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: In EDBT 2024, 27th International Conference on Extending Database Technology

  7. arXiv:2305.02204  [pdf, other]

    cs.CY

    PopSim: An Individual-level Population Simulator for Equitable Allocation of City Resources

    Authors: Khanh Duy Nguyen, Nima Shahbazi, Abolfazl Asudeh

    Abstract: Historical systematic exclusionary tactics based on race have forced people of certain demographic groups to congregate in specific urban areas. Aside from the ethical aspects of such segregation, these policies have implications for the allocation of urban resources including public transportation, healthcare, and education within the cities. The initial step towards addressing these issues invol…

    Submitted 25 April, 2023; originally announced May 2023.

    Comments: Published as part of the Workshop on Algorithmic Fairness in Artificial Intelligence, Machine Learning, and Decision Making (AFair-AMLD) at the SIAM International Conference on Data Mining (SDM23)

    ACM Class: E.0

    Journal ref: AFair-AMLD workshop at SIAM International Conference on Data Mining (2023)

  8. arXiv:2204.07682  [pdf, other]

    cs.DB

    Reliability Evaluation of Individual Predictions: A Data-centric Approach

    Authors: Nima Shahbazi, Abolfazl Asudeh

    Abstract: Machine learning models only provide probabilistic guarantees on the expected loss of random samples from the distribution represented by their training data. As a result, a model with high accuracy may or may not be reliable for predicting an individual query point. To address this issue, XAI aims to provide explanations of individual predictions, while approaches such as conformal predictions,…

    Submitted 10 April, 2024; v1 submitted 15 April, 2022; originally announced April 2022.

  9. arXiv:2203.11852  [pdf, other]

    cs.DB cs.LG

    Representation Bias in Data: A Survey on Identification and Resolution Techniques

    Authors: Nima Shahbazi, Yin Lin, Abolfazl Asudeh, H. V. Jagadish

    Abstract: Data-driven algorithms are only as good as the data they work with, while data sets, especially social data, often fail to represent minorities adequately. Representation Bias in data can happen due to various reasons ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods. Given that "bias in, bias out", one cannot expect AI-based so…

    Submitted 18 March, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: Just Accepted ACM Comput. Surv. (March 2023)