-
Navigating Uncertainties: Understanding How GenAI Developers Document Their Models on Open-Source Platforms
Authors:
Ningjing Tang,
Megan Li,
Amy Winecoff,
Michael Madaio,
Hoda Heidari,
Hong Shen
Abstract:
Model documentation plays a crucial role in promoting transparency and responsible development of AI systems. With the rise of Generative AI (GenAI), open-source platforms have increasingly become hubs for hosting and distributing these models, prompting platforms like Hugging Face to develop dedicated model documentation guidelines that align with responsible AI principles. Despite these growing…
▽ More
Model documentation plays a crucial role in promoting transparency and responsible development of AI systems. With the rise of Generative AI (GenAI), open-source platforms have increasingly become hubs for hosting and distributing these models, prompting platforms like Hugging Face to develop dedicated model documentation guidelines that align with responsible AI principles. Despite these growing efforts, there remains a lack of understanding of how developers document their GenAI models on open-source platforms. Through interviews with 13 GenAI developers active on open-source platforms, we provide empirical insights into their documentation practices and challenges. Our analysis reveals that despite existing resources, developers of GenAI models still face multiple layers of uncertainties in their model documentation: (1) uncertainties about what specific content should be included; (2) uncertainties about how to effectively report key components of their models; and (3) uncertainties in deciding who should take responsibilities for various aspects of model documentation. Based on our findings, we discuss the implications for policymakers, open-source platforms, and the research community to support meaningful, effective and actionable model documentation in the GenAI era, including cultivating better community norms, building robust evaluation infrastructures, and clarifying roles and responsibilities.
△ Less
Submitted 30 March, 2025;
originally announced March 2025.
-
Improving governance outcomes through AI documentation: Bridging theory and practice
Authors:
Amy A. Winecoff,
Miranda Bogen
Abstract:
Documentation plays a crucial role in both external accountability and internal governance of AI systems. Although there are many proposals for documenting AI data, models, systems, and methods, the ways these practices enhance governance as well as the challenges practitioners and organizations face with documentation remain underexplored. In this paper, we analyze 37 proposed documentation frame…
▽ More
Documentation plays a crucial role in both external accountability and internal governance of AI systems. Although there are many proposals for documenting AI data, models, systems, and methods, the ways these practices enhance governance as well as the challenges practitioners and organizations face with documentation remain underexplored. In this paper, we analyze 37 proposed documentation frameworks and 22 empirical studies evaluating their use. We identify several pathways or "theories of change" through which documentation can enhance governance, including informing stakeholders about AI risks and applications, facilitating collaboration, encouraging ethical deliberation, and supporting best practices. However, empirical findings reveal significant challenges for practitioners, such as insufficient incentives and resources, structural and organizational communication barriers, interpersonal and organizational constraints to ethical action, and poor integration with existing workflows. These challenges often hinder the realization of the possible benefits of documentation. We also highlight key considerations for organizations when designing documentation, such as determining the appropriate level of detail and balancing automation in the process. We conclude by discussing how future research can expand on our findings such as by exploring documentation approaches that support governance of general-purpose models and how multiple transparency and documentation methods can collectively improve governance outcomes.
△ Less
Submitted 9 December, 2024; v1 submitted 13 September, 2024;
originally announced September 2024.
-
Minimum Viable Ethics: From Institutionalizing Industry AI Governance to Product Impact
Authors:
Archana Ahlawat,
Amy Winecoff,
Jonathan Mayer
Abstract:
Across the technology industry, many companies have expressed their commitments to AI ethics and created dedicated roles responsible for translating high-level ethics principles into product. Yet it is unclear how effective this has been in leading to meaningful product changes. Through semi-structured interviews with 26 professionals working on AI ethics in industry, we uncover challenges and str…
▽ More
Across the technology industry, many companies have expressed their commitments to AI ethics and created dedicated roles responsible for translating high-level ethics principles into product. Yet it is unclear how effective this has been in leading to meaningful product changes. Through semi-structured interviews with 26 professionals working on AI ethics in industry, we uncover challenges and strategies of institutionalizing ethics work along with translation into product impact. We ultimately find that AI ethics professionals are highly agile and opportunistic, as they attempt to create standardized and reusable processes and tools in a corporate environment in which they have little traditional power. In negotiations with product teams, they face challenges rooted in their lack of authority and ownership over product, but can push forward ethics work by leveraging narratives of regulatory response and ethics as product quality assurance. However, this strategy leaves us with a minimum viable ethics, a narrowly scoped industry AI ethics that is limited in its capacity to address normative issues separate from compliance or product quality. Potential future regulation may help bridge this gap.
△ Less
Submitted 10 September, 2024;
originally announced September 2024.
-
Techno-Utopians, Scammers, and Bullshitters: The Promise and Peril of Web3 and Blockchain Technologies According to Operators and Venture Capital Investors
Authors:
Amy A. Winecoff,
Johannes Lenhard
Abstract:
Proponents and developers of Web3 and blockchain argue that these technologies can revolutionize how people live and work by empowering individuals and distributing decision-making power. While technologists often have expansive hopes for what their technologies will accomplish over the long term, the practical challenges of developing, scaling, and maintaining systems amidst present-day constrain…
▽ More
Proponents and developers of Web3 and blockchain argue that these technologies can revolutionize how people live and work by empowering individuals and distributing decision-making power. While technologists often have expansive hopes for what their technologies will accomplish over the long term, the practical challenges of developing, scaling, and maintaining systems amidst present-day constraints can compromise progress toward this vision. How technologists think about the technological future they hope to enable and how they navigate day-to-day issues impacts the form technologies take, their potential benefits, and their potential harms. In our current work, we aimed to explore the visions of Web3 and blockchain technologists and identify the immediate challenges that could threaten their visions. We conducted semi-structured interviews with 29 operators and professional investors in the Web3 and blockchain field. Our findings revealed that participants supported several ideological goals for their projects, with decentralization being a pivotal mechanism to enable user autonomy, distribute governance power, and promote financial inclusion. However, participants acknowledged the practical difficulties in fulfilling these promises, including the need for rapid technology development, conflicts of interest among stakeholders due to platform financing dynamics, and the challenge of expanding to mainstream users who may not share the "Web3 ethos." If negotiated ineffectively, these challenges could lead to negative outcomes, such as corrupt governance, increased inequality, and increased prevalence of scams and dubious investment schemes. While participants thought education, regulation, and a renewed commitment to the original blockchain ideals could alleviate some problems, they expressed skepticism about the potential of these solutions.
△ Less
Submitted 7 November, 2023; v1 submitted 14 July, 2023;
originally announced July 2023.
-
Upvotes? Downvotes? No Votes? Understanding the relationship between reaction mechanisms and political discourse on Reddit
Authors:
Orestis Papakyriakopoulos,
Severin Engelmann,
Amy Winecoff
Abstract:
A significant share of political discourse occurs online on social media platforms. Policymakers and researchers try to understand the role of social media design in shaping the quality of political discourse around the globe. In the past decades, scholarship on political discourse theory has produced distinct characteristics of different types of prominent political rhetoric such as deliberative,…
▽ More
A significant share of political discourse occurs online on social media platforms. Policymakers and researchers try to understand the role of social media design in shaping the quality of political discourse around the globe. In the past decades, scholarship on political discourse theory has produced distinct characteristics of different types of prominent political rhetoric such as deliberative, civic, or demagogic discourse. This study investigates the relationship between social media reaction mechanisms (i.e., upvotes, downvotes) and political rhetoric in user discussions by engaging in an in-depth conceptual analysis of political discourse theory. First, we analyze 155 million user comments in 55 political subforums on Reddit between 2010 and 2018 to explore whether users' style of political discussion aligns with the essential components of deliberative, civic, and demagogic discourse. Second, we perform a quantitative study that combines confirmatory factor analysis with difference in differences models to explore whether different reaction mechanism schemes (e.g., upvotes only, upvotes and downvotes, no reaction mechanisms) correspond with political user discussion that is more or less characteristic of deliberative, civic, or demagogic discourse. We produce three main takeaways. First, despite being "ideal constructs of political rhetoric," we find that political discourse theories describe political discussions on Reddit to a large extent. Second, we find that discussions in subforums with only upvotes, or both up- and downvotes are associated with user discourse that is more deliberate and civic. Third, social media discussions are most demagogic in subreddits with no reaction mechanisms at all. These findings offer valuable contributions for ongoing policy discussions on the relationship between social media interface design and respectful political discussion among users.
△ Less
Submitted 19 February, 2023;
originally announced February 2023.
-
Artificial Concepts of Artificial Intelligence: Institutional Compliance and Resistance in AI Startups
Authors:
Amy A. Winecoff,
Elizabeth Anne Watkins
Abstract:
Scholars and industry practitioners have debated how to best develop interventions for ethical artificial intelligence (AI). Such interventions recommend that companies building and using AI tools change their technical practices, but fail to wrangle with critical questions about the organizational and institutional context in which AI is developed. In this paper, we contribute descriptive researc…
▽ More
Scholars and industry practitioners have debated how to best develop interventions for ethical artificial intelligence (AI). Such interventions recommend that companies building and using AI tools change their technical practices, but fail to wrangle with critical questions about the organizational and institutional context in which AI is developed. In this paper, we contribute descriptive research around the life of "AI" as a discursive concept and organizational practice in an understudied sphere--emerging AI startups--and with a focus on extra-organizational pressures faced by entrepreneurs. Leveraging a theoretical lens for how organizations change, we conducted semi-structured interviews with 23 entrepreneurs working at early-stage AI startups. We find that actors within startups both conform to and resist institutional pressures. Our analysis identifies a central tension for AI entrepreneurs: they often valued scientific integrity and methodological rigor; however, influential external stakeholders either lacked the technical knowledge to appreciate entrepreneurs' emphasis on rigor or were more focused on business priorities. As a result, entrepreneurs adopted hyped marketing messages about AI that diverged from their scientific values, but attempted to preserve their legitimacy internally. Institutional pressures and organizational constraints also influenced entrepreneurs' modeling practices and their response to actual or impending regulation. We conclude with a discussion for how such pressures could be used as leverage for effective interventions towards building ethical AI.
△ Less
Submitted 14 June, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Qualitative Analysis for Human Centered AI
Authors:
Orestis Papakyriakopoulos,
Elizabeth Anne Watkins,
Amy Winecoff,
Klaudia Jaźwińska,
Tithi Chattopadhyay
Abstract:
Human-centered artificial intelligence (AI) posits that machine learning and AI should be developed and applied in a socially aware way. In this article, we argue that qualitative analysis (QA) can be a valuable tool in this process, supplementing, informing, and extending the possibilities of AI models. We show this by describing how QA can be integrated in the current prediction paradigm of AI,…
▽ More
Human-centered artificial intelligence (AI) posits that machine learning and AI should be developed and applied in a socially aware way. In this article, we argue that qualitative analysis (QA) can be a valuable tool in this process, supplementing, informing, and extending the possibilities of AI models. We show this by describing how QA can be integrated in the current prediction paradigm of AI, assisting scientists in the process of selecting data, variables, and model architectures. Furthermore, we argue that QA can be a part of novel paradigms towards Human Centered AI. QA can support scientists and practitioners in practical problem solving and situated model development. It can also promote participatory design approaches, reveal understudied and emerging issues in AI systems, and assist policy making.
△ Less
Submitted 7 December, 2021;
originally announced December 2021.
-
Simulation as Experiment: An Empirical Critique of Simulation Research on Recommender Systems
Authors:
Amy A. Winecoff,
Matthew Sun,
Eli Lucherini,
Arvind Narayanan
Abstract:
Simulation can enable the study of recommender system (RS) evolution while circumventing many of the issues of empirical longitudinal studies; simulations are comparatively easier to implement, are highly controlled, and pose no ethical risk to human participants. How simulation can best contribute to scientific insight about RS alongside qualitative and quantitative empirical approaches is an ope…
▽ More
Simulation can enable the study of recommender system (RS) evolution while circumventing many of the issues of empirical longitudinal studies; simulations are comparatively easier to implement, are highly controlled, and pose no ethical risk to human participants. How simulation can best contribute to scientific insight about RS alongside qualitative and quantitative empirical approaches is an open question. Philosophers and researchers have long debated the epistemological nature of simulation compared to wholly theoretical or empirical methods. Simulation is often implicitly or explicitly conceptualized as occupying a middle ground between empirical and theoretical approaches, allowing researchers to realize the benefits of both. However, what is often ignored in such arguments is that without firm grounding in any single methodological tradition, simulation studies have no agreed upon scientific norms or standards, resulting in a patchwork of theoretical motivations, approaches, and implementations that are difficult to reconcile. In this position paper, we argue that simulation studies of RS are conceptually similar to empirical experimental approaches and therefore can be evaluated using the standards of empirical research methods. Using this empirical lens, we argue that the combination of high heterogeneity in approaches and low transparency in methods in simulation studies of RS has limited their interpretability, generalizability, and replicability. We contend that by adopting standards and practices common in empirical disciplines, simulation researchers can mitigate many of these weaknesses.
△ Less
Submitted 29 July, 2021;
originally announced July 2021.
-
T-RECS: A Simulation Tool to Study the Societal Impact of Recommender Systems
Authors:
Eli Lucherini,
Matthew Sun,
Amy Winecoff,
Arvind Narayanan
Abstract:
Simulation has emerged as a popular method to study the long-term societal consequences of recommender systems. This approach allows researchers to specify their theoretical model explicitly and observe the evolution of system-level outcomes over time. However, performing simulation-based studies often requires researchers to build their own simulation environments from the ground up, which create…
▽ More
Simulation has emerged as a popular method to study the long-term societal consequences of recommender systems. This approach allows researchers to specify their theoretical model explicitly and observe the evolution of system-level outcomes over time. However, performing simulation-based studies often requires researchers to build their own simulation environments from the ground up, which creates a high barrier to entry, introduces room for implementation error, and makes it difficult to disentangle whether observed outcomes are due to the model or the implementation.
We introduce T-RECS, an open-sourced Python package designed for researchers to simulate recommendation systems and other types of sociotechnical systems in which an algorithm mediates the interactions between multiple stakeholders, such as users and content creators. To demonstrate the flexibility of T-RECS, we perform a replication of two prior simulation-based research on sociotechnical systems. We additionally show how T-RECS can be used to generate novel insights with minimal overhead. Our tool promotes reproducibility in this area of research, provides a unified language for simulating sociotechnical systems, and removes the friction of implementing simulations from scratch.
△ Less
Submitted 27 July, 2021; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Recommendation or Discrimination?: Quantifying Distribution Parity in Information Retrieval Systems
Authors:
Rinat Khaziev,
Bryce Casavant,
Pearce Washabaugh,
Amy A. Winecoff,
Matthew Graham
Abstract:
Information retrieval (IR) systems often leverage query data to suggest relevant items to users. This introduces the possibility of unfairness if the query (i.e., input) and the resulting recommendations unintentionally correlate with latent factors that are protected variables (e.g., race, gender, and age). For instance, a visual search system for fashion recommendations may pick up on features o…
▽ More
Information retrieval (IR) systems often leverage query data to suggest relevant items to users. This introduces the possibility of unfairness if the query (i.e., input) and the resulting recommendations unintentionally correlate with latent factors that are protected variables (e.g., race, gender, and age). For instance, a visual search system for fashion recommendations may pick up on features of the human models rather than fashion garments when generating recommendations. In this work, we introduce a statistical test for "distribution parity" in the top-K IR results, which assesses whether a given set of recommendations is fair with respect to a specific protected variable. We evaluate our test using both simulated and empirical results. First, using artificially biased recommendations, we demonstrate the trade-off between statistically detectable bias and the size of the search catalog. Second, we apply our test to a visual search system for fashion garments, specifically testing for recommendation bias based on the skin tone of fashion models. Our distribution parity test can help ensure that IR systems' results are fair and produce a good experience for all users.
△ Less
Submitted 13 September, 2019;
originally announced September 2019.
-
Assessing Fashion Recommendations: A Multifaceted Offline Evaluation Approach
Authors:
Jake Sherman,
Chinmay Shukla,
Rhonda Textor,
Su Zhang,
Amy A. Winecoff
Abstract:
Fashion is a unique domain for developing recommender systems (RS). Personalization is critical to fashion users. As a result, highly accurate recommendations are not sufficient unless they are also specific to users. Moreover, fashion data is characterized by a large majority of new users, so a recommendation strategy that performs well only for users with prior interaction history is a poor fit…
▽ More
Fashion is a unique domain for developing recommender systems (RS). Personalization is critical to fashion users. As a result, highly accurate recommendations are not sufficient unless they are also specific to users. Moreover, fashion data is characterized by a large majority of new users, so a recommendation strategy that performs well only for users with prior interaction history is a poor fit to the fashion problem. Critical to addressing these issues in fashion recommendation is an evaluation strategy that: 1) includes multiple metrics that are relevant to fashion, and 2) is performed within segments of users with different interaction histories. Here, we present our multifaceted offline strategy for evaluating fashion RS. Using our proposed evaluation methodology, we compare the performance of three different algorithms, a most popular (MP) items strategy, a collaborative filtering (CF) strategy, and a content-based (CB) strategy. We demonstrate that only by considering the performance of these algorithms across multiple metrics and user segments can we determine the extent to which each algorithm is likely to fulfill fashion users' needs.
△ Less
Submitted 5 September, 2019;
originally announced September 2019.