Showing 1–2 of 2 results for author: Zogheib, C

  1. arXiv:2410.22473

    cs.CY

    The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track

    Authors: Eshta Bhardwaj, Harshit Gujral, Siyi Wu, Ciara Zogheib, Tegan Maharaj, Christoph Becker

    Abstract: Data curation is a field with origins in librarianship and archives, whose scholarship and thinking on data issues go back centuries, if not millennia. The field of machine learning is increasingly observing the importance of data curation to the advancement of both applications and fundamental understanding of machine learning models - evidenced not least by the creation of the Datasets and Bench…

    Submitted 3 January, 2025; v1 submitted 29 October, 2024; originally announced October 2024.

    Comments: Accepted as spotlight poster in NeurIPS Datasets & Benchmarks track 2024

  2. Machine Learning Data Practices through a Data Curation Lens: An Evaluation Framework

    Authors: Eshta Bhardwaj, Harshit Gujral, Siyi Wu, Ciara Zogheib, Tegan Maharaj, Christoph Becker

    Abstract: Studies of dataset development in machine learning call for greater attention to the data practices that make model development possible and shape its outcomes. Many argue that the adoption of theory and practices from archives and data curation fields can support greater fairness, accountability, transparency, and more ethical machine learning. In response, this paper examines data practices in m…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: In ACM Conference on Fairness, Accountability, and Transparency 2024. ACM, Rio de Janeiro, Brazil