Showing 1–12 of 12 results for author: Geiger, R S

  1. arXiv:2409.15567  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Asking an AI for salary negotiation advice is a matter of concern: Controlled experimental perturbation of ChatGPT for protected and non-protected group discrimination on a contextual task with no clear ground truth answers

    Authors: R. Stuart Geiger, Flynn O'Sullivan, Elsie Wang, Jonathan Lo

    Abstract: We conducted controlled experimental bias audits for four versions of ChatGPT, which we asked to recommend an opening offer in salary negotiations for a new hire. We submitted 98,800 prompts to each version, systematically varying the employee's gender, university, and major, and tested prompts in voice of each side of the negotiation: the employee versus employer. We find ChatGPT as a multi-model…

    Submitted 8 October, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

  2. arXiv:2107.02278  [pdf, other]

    cs.LG cs.CY cs.SI

    "Garbage In, Garbage Out" Revisited: What Do Machine Learning Application Papers Report About Human-Labeled Training Data?

    Authors: R. Stuart Geiger, Dominique Cope, Jamie Ip, Marsha Lotosh, Aayush Shah, Jenny Weng, Rebekah Tang

    Abstract: Supervised machine learning, in which models are automatically derived from labeled training data, is only as good as the quality of that data. This study builds on prior work that investigated to what extent 'best practices' around labeling training data were followed in applied ML publications within a single domain (social media platforms). In this paper, we expand by studying publications that…

    Submitted 5 July, 2021; originally announced July 2021.

    Journal ref: Quantitative Science Studies 2:2 (2021)

  3. arXiv:1912.08320  [pdf, other]

    cs.CY cs.CL cs.DL cs.LG

    Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?

    Authors: R. Stuart Geiger, Kevin Yu, Yanlai Yang, Mindy Dai, Jie Qiu, Rebekah Tang, Jenny Huang

    Abstract: Many machine learning projects for new application areas involve teams of humans who label data for a particular purpose, from hiring crowdworkers to the paper's authors labeling the data themselves. Such a task is quite similar to (or a form of) structured content analysis, which is a longstanding methodology in the social sciences and humanities, with many established best practices. In this pap…

    Submitted 17 December, 2019; originally announced December 2019.

    Comments: 18 pages, includes appendix

    Journal ref: Proc ACM FAT* 2020

  4. arXiv:1909.05189  [pdf, other]

    cs.HC cs.CY cs.LG

    ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia

    Authors: Aaron Halfaker, R. Stuart Geiger

    Abstract: Algorithmic systems---from rule-based bots to machine learning classifiers---have a long history of supporting the essential work of content moderation and other curation work in peer production projects. From counter-vandalism to task routing, basic machine prediction has allowed open knowledge projects like Wikipedia to scale to the largest encyclopedia in the world, while maintaining quality an…

    Submitted 20 August, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

    Comments: 29 pages + 3 pages appendix. Currently under review

  5. arXiv:1908.10808  [pdf, other]

    cs.HC cs.CY cs.DL cs.SI

    The Rise and Fall of the Note: Changing Paper Lengths in ACM CSCW, 2000-2018

    Authors: R. Stuart Geiger

    Abstract: In this note, I quantitatively examine various trends in the lengths of published papers in ACM CSCW from 2000-2018, focusing on several major transitions in editorial and reviewing policy. The focus is on the rise and fall of the 4-page note, which was introduced in 2004 as a separate submission type to the 10-page double-column "full paper" format. From 2004-2012, 4-page notes of 2,500 to 4,500…

    Submitted 9 September, 2019; v1 submitted 28 August, 2019; originally announced August 2019.

    Comments: 10 pages. To appear in PACMHCI, to be presented at ACM CSCW 2019. v3 fixes typos and updates some statistics

    Journal ref: PACMHCI 3, CSCW (2019) 222

  6. Black-boxing the user: internet protocol over xylophone players (IPoXP)

    Authors: R. Stuart Geiger, Yoon Jung Jeong, Emily Manders

    Abstract: We introduce IP over Xylophone Players (IPoXP), a novel Internet protocol between two computers using xylophone-based Arduino interfaces. In our implementation, human operators are situated within the lowest layer of the network, transmitting data between computers by striking designated keys. We discuss how IPoXP inverts the traditional mode of human-computer interaction, with a computer using th…

    Submitted 6 June, 2019; originally announced June 2019.

    Journal ref: In CHI 2012 Extended Abstracts on Human Factors in Computing Systems (CHI EA 2012, alt.chi). ACM, New York, NY, USA, p. 71-80. DOI: https://doi.org/10.1145/2212776.2212785

  7. arXiv:1810.09590  [pdf]

    cs.CY cs.AI cs.HC cs.SI

    The Lives of Bots

    Authors: R. Stuart Geiger

    Abstract: Automated software agents --- or bots --- have long been an important part of how Wikipedia's volunteer community of editors write, edit, update, monitor, and moderate content. In this paper, I discuss the complex social and technical environment in which Wikipedia's bots operate. This paper focuses on the establishment and role of English Wikipedia's bot policies and the Bot Approvals Group, a vo…

    Submitted 22 October, 2018; originally announced October 2018.

    Comments: Originally published in 2011

    Journal ref: Book chapter in Wikipedia: A Critical Point of View (Institute of Network Cultures, Amsterdam), 2011

  8. arXiv:1810.07273  [pdf]

    cs.CY cs.HC cs.SI

    Operationalizing Conflict and Cooperation between Automated Software Agents in Wikipedia: A Replication and Expansion of 'Even Good Bots Fight'

    Authors: R. Stuart Geiger, Aaron Halfaker

    Abstract: This paper replicates, extends, and refutes conclusions made in a study published in PLoS ONE ("Even Good Bots Fight"), which claimed to identify substantial levels of conflict between automated software agents (or bots) in Wikipedia using purely quantitative methods. By applying an integrative mixed-methods approach drawing on trace ethnography, we place these alleged cases of bot-bot conflict in…

    Submitted 16 October, 2018; originally announced October 2018.

    Comments: 33 pages. In ACM CSCW 2018

    Journal ref: Proc ACM on Human Computer Interaction. 1(2), Article 49. CSCW 2018

  9. The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work

    Authors: R. Stuart Geiger, Nelle Varoquaux, Charlotte Mazel-Cabasse, Chris Holdgraf

    Abstract: Computational research and data analytics increasingly relies on complex ecosystems of open source software (OSS) "libraries" -- curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers/analysts know what libraries are available and how to use them. Yet documentation for open source softwar…

    Submitted 31 May, 2018; originally announced May 2018.

    Journal ref: Computer-Supported Cooperative Work. 2018. doi: 10.1007/s10606-018-9333-1

  10. arXiv:1709.09093  [pdf]

    cs.CY cs.AI cs.HC

    Beyond opening up the black box: Investigating the role of algorithmic systems in Wikipedian organizational culture

    Authors: R. Stuart Geiger

    Abstract: Scholars and practitioners across domains are increasingly concerned with algorithmic transparency and opacity, interrogating the values and assumptions embedded in automated, black-boxed systems, particularly in user-generated content platforms. I report from an ethnography of infrastructure in Wikipedia to discuss an often understudied aspect of this topic: the local, contextual, learned experti…

    Submitted 1 October, 2017; v1 submitted 26 September, 2017; originally announced September 2017.

    Comments: 14 pages, typo fixed in v2

    Journal ref: Big Data & Society 4(2). 2017

  11. arXiv:1706.02777  [pdf, other]

    cs.CY cs.SE cs.SI

    Summary Analysis of the 2017 GitHub Open Source Survey

    Authors: R. Stuart Geiger

    Abstract: This report is a high-level summary analysis of the 2017 GitHub Open Source Survey dataset, presenting frequency counts, proportions, and frequency or proportion bar plots for every question asked in the survey.

    Submitted 8 June, 2017; originally announced June 2017.

    Comments: 58 pages

  12. Report on the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4)

    Authors: Daniel S. Katz, Kyle E. Niemeyer, Sandra Gesing, Lorraine Hwang, Wolfgang Bangerth, Simon Hettrick, Ray Idaszak, Jean Salac, Neil Chue Hong, Santiago Núñez Corrales, Alice Allen, R. Stuart Geiger, Jonah Miller, Emily Chen, Anshu Dubey, Patricia Lago

    Abstract: This report records and discusses the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4). The report includes a description of the keynote presentation of the workshop, the mission and vision statements that were drafted at the workshop and finalized shortly after it, a set of idea papers, position papers, experience papers, demos, and lightning talks, and a pa…

    Submitted 18 May, 2017; v1 submitted 7 May, 2017; originally announced May 2017.