Showing 1–5 of 5 results for author: Nafa, Y

Searching in archive cs.
  1. arXiv:2012.12960  [pdf, ps, other]

    cs.LG

    Active Deep Learning on Entity Resolution by Risk Sampling

    Authors: Youcef Nafa, Qun Chen, Zhaoqiang Chen, Xingyu Lu, Haiyang He, Tianyi Duan, Zhanhuai Li

    Abstract: While the state-of-the-art performance on entity resolution (ER) has been achieved by deep learning, its effectiveness depends on large quantities of accurately labeled training data. To alleviate the data labeling burden, Active Learning (AL) presents itself as a feasible solution that focuses on data deemed useful for model training. Building upon the recent advances in risk analysis for ER, wh…

    Submitted 23 December, 2020; originally announced December 2020.

    Comments: 13 pages, 6 figures

  2. arXiv:2012.03513  [pdf, other]

    cs.LG cs.AI cs.DB

    Adaptive Deep Learning for Entity Resolution by Risk Analysis

    Authors: Zhaoqiang Chen, Qun Chen, Youcef Nafa, Tianyi Duan, Wei Pan, Lijun Zhang, Zhanhuai Li

    Abstract: The state-of-the-art performance on entity resolution (ER) has been achieved by deep learning. However, deep models are usually trained on large quantities of accurately labeled training data, and can not be easily tuned towards a target workload. Unfortunately, in real scenarios, there may not be sufficient labeled training data, and even worse, their distribution is usually more or less differen…

    Submitted 10 April, 2022; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: 31 pages, 5 figures, 4 tables

  3. arXiv:1810.12125  [pdf, other]

    cs.DB cs.AI cs.LG stat.ML

    Gradual Machine Learning for Entity Resolution

    Authors: Boyi Hou, Qun Chen, Yanyan Wang, Youcef Nafa, Zhanhuai Li

    Abstract: Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most notably deep neural networks), which require lots of accurately labeled training data. Unfortunately, high-quality labeled data usually require expensive manual work,…

    Submitted 13 June, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

  4. arXiv:1803.05714  [pdf, other]

    cs.HC cs.DB

    r-HUMO: A Risk-Aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees

    Authors: Boyi Hou, Qun Chen, Zhaoqiang Chen, Youcef Nafa, Zhanhuai Li

    Abstract: Even though many approaches have been proposed for entity resolution (ER), it remains very challenging to find one with quality guarantees. To this end, we propose a risk-aware HUman-Machine cOoperation framework for ER, denoted by r-HUMO. Built on the existing HUMO framework, r-HUMO similarly enforces both precision and recall levels by partitioning an ER workload between the human and the machine…

    Submitted 25 November, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

    Comments: 12 pages, 7 figures. arXiv admin note: text overlap with arXiv:1710.00204

  5. arXiv:1710.00204  [pdf, other]

    cs.DB

    Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework

    Authors: Zhaoqiang Chen, Qun Chen, Fengfeng Fan, Yanyan Wang, Zhuo Wang, Youcef Nafa, Zhanhuai Li, Hailong Liu, Wei Pan

    Abstract: Even though many machine algorithms have been proposed for entity resolution, it remains very challenging to find a solution with quality guarantees. In this paper, we propose a novel HUman and Machine cOoperation (HUMO) framework for entity resolution (ER), which divides an ER workload between the machine and the human. HUMO enables a mechanism for quality control that can flexibly enforce both p…

    Submitted 2 April, 2018; v1 submitted 30 September, 2017; originally announced October 2017.

    Comments: 12 pages, 11 figures. Camera-ready version of the paper submitted to ICDE 2018; in Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE 2018)