Showing 1–4 of 4 results for author: Heidari, A

  1. arXiv:2310.00027  [pdf, ps, other]

    stat.ML cs.LG

    Out-Of-Domain Unlabeled Data Improves Generalization

    Authors: Amir Hossein Saberi, Amir Najafi, Alireza Heidari, Mohammad Hosein Movasaghinia, Abolfazl Motahari, Babak H. Khalaj

    Abstract: We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems, where scenarios involving the minimization of either i) adversarially robust or ii) non-robust loss functions have been considered. Notably, we allow the unlabeled samples to deviate slightly (in total variation sense) from the in-domain distribution. The core idea behind our framework is to…

    Submitted 15 February, 2024; v1 submitted 28 September, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024 (Spotlight), 29 pages, no figures

  2. arXiv:2008.10549  [pdf, other]

    cs.LG cs.DB cs.IR stat.ML

    On sampling from data with duplicate records

    Authors: Alireza Heidari, Shrinu Kushagra, Ihab F. Ilyas

    Abstract: Data deduplication is the task of detecting records in a database that correspond to the same real-world entity. Our goal is to develop a procedure that samples uniformly from the set of entities present in the database in the presence of duplicates. We accomplish this by a two-stage process. In the first step, we estimate the frequencies of all the entities in the database. In the second step, we…

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: 21 pages, 5 figures

  3. arXiv:2006.10208  [pdf, other]

    cs.LG cs.DB cs.IR stat.ML

    Record fusion: A learning approach

    Authors: Alireza Heidari, George Michalopoulos, Shrinu Kushagra, Ihab F. Ilyas, Theodoros Rekatsinas

    Abstract: Record fusion is the task of aggregating multiple records that correspond to the same real-world entity in a database. We can view record fusion as a machine learning problem where the goal is to predict the "correct" value for each attribute for each entity. Given a database, we use a combination of attribute-level, record-level, and database-level signals to construct a feature vector for each ce…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: 18 pages, 9 figures

  4. arXiv:1907.00141  [pdf, other]

    cs.LG cs.DS stat.ML

    Approximate Inference in Structured Instances with Noisy Categorical Observations

    Authors: Alireza Heidari, Ihab F. Ilyas, Theodoros Rekatsinas

    Abstract: We study the problem of recovering the latent ground truth labeling of a structured instance with categorical random variables in the presence of noisy observations. We present a new approximate algorithm for graphs with categorical variables that achieves low Hamming error in the presence of noisy vertex and edge observations. Our main result shows a logarithmic dependency of the Hamming error to…

    Submitted 5 July, 2019; v1 submitted 29 June, 2019; originally announced July 2019.

    Comments: UAI 2019, 33 pages