-
Bridging the Gap: Integrating Ethics and Environmental Sustainability in AI Research and Practice
Authors:
Alexandra Sasha Luccioni,
Giada Pistilli,
Raesetje Sefala,
Nyalleng Moorosi
Abstract:
As the possibilities for Artificial Intelligence (AI) have grown, so have concerns regarding its impacts on society and the environment. However, these issues are often raised separately; i.e. carbon footprint analyses of AI models typically do not consider how the pursuit of scale has contributed towards building models that are both inaccessible to most researchers in terms of cost and disproportionately harmful to the environment. On the other hand, model audits that aim to evaluate model performance and disparate impacts mostly fail to engage with the environmental ramifications of AI models and how these fit into their auditing approaches. In this separation, both research directions fail to capture the depth of analysis that can be explored by considering the two in parallel and the potential solutions for making informed choices that can be developed at their convergence. In this essay, we build upon work carried out in AI and in sister communities, such as philosophy and sustainable development, to make more deliberate connections around topics such as generalizability, transparency, evaluation and equity across AI research and practice. We argue that the efforts aiming to study AI's ethical ramifications should be made in tandem with those evaluating its impacts on the environment, and we conclude with a proposal of best practices to better integrate AI ethics and sustainability in AI research and practice.
Submitted 1 April, 2025;
originally announced April 2025.
-
The Human Labour of Data Work: Capturing Cultural Diversity through World Wide Dishes
Authors:
Siobhan Mackenzie Hall,
Samantha Dalal,
Raesetje Sefala,
Foutse Yuehgoh,
Aisha Alaagib,
Imane Hamzaoui,
Shu Ishida,
Jabez Magomere,
Lauren Crais,
Aya Salama,
Tejumade Afonja
Abstract:
This paper provides guidance for building and maintaining infrastructure for participatory AI efforts by sharing reflections on building World Wide Dishes (WWD), a bottom-up, community-led image and text dataset of culinary dishes and associated cultural customs. We present WWD as an example of participatory dataset creation, where community members both guide the design of the research process and contribute to the crowdsourced dataset. This approach incorporates localised expertise and knowledge to address the limitations of web-scraped Internet datasets acknowledged in the Participatory AI discourse. We show that our approach can result in curated, high-quality data that supports decentralised contributions from communities that do not typically contribute to datasets due to a variety of systemic factors. Our project demonstrates the importance of participatory mediators in supporting community engagement by identifying the kinds of labour they performed to make WWD possible. We surface three dimensions of labour performed by participatory mediators that are crucial for participatory dataset construction: building trust with community members, making participation accessible, and contextualising community values to support meaningful data collection. Drawing on our findings, we put forth five lessons for building infrastructure to support future participatory AI efforts.
Submitted 5 May, 2025; v1 submitted 9 February, 2025;
originally announced February 2025.
-
The World Wide Recipe: A community-centred framework for fine-grained data collection and regional bias operationalisation
Authors:
Jabez Magomere,
Shu Ishida,
Tejumade Afonja,
Aya Salama,
Daniel Kochin,
Foutse Yuehgoh,
Imane Hamzaoui,
Raesetje Sefala,
Aisha Alaagib,
Samantha Dalal,
Beatrice Marchegiani,
Elizaveta Semenova,
Lauren Crais,
Siobhan Mackenzie Hall
Abstract:
We introduce the World Wide Recipe, which sets forth a framework for culturally aware and participatory data collection, and the resultant regionally diverse World Wide Dishes evaluation dataset. We also analyse bias operationalisation to highlight how current systems underperform across several dimensions: (in-)accuracy, (mis-)representation, and cultural (in-)sensitivity, with evidence from qualitative community-based observations and quantitative automated tools. We find that these text-to-image (T2I) models generally do not produce quality outputs of dishes specific to various regions. This is true even for the US, which is typically considered better resourced in training data, although the generation of US dishes does outperform that of the investigated African countries. The models demonstrate the propensity to produce inaccurate and culturally misrepresentative, flattening, and insensitive outputs. These representational biases have the potential to further reinforce stereotypes and disproportionately contribute to erasure based on region. The dataset and code are available at https://github.com/oxai/world-wide-dishes.
Submitted 9 February, 2025; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Improving Traffic Safety Through Video Analysis in Jakarta, Indonesia
Authors:
João Caldeira,
Alex Fout,
Aniket Kesari,
Raesetje Sefala,
Joseph Walsh,
Katy Dupre,
Muhammad Rizal Khaefi,
Setiaji,
George Hodge,
Zakiya Aryana Pramestri,
Muhammad Adib Imtiyazi
Abstract:
This project presents the results of a partnership between the Data Science for Social Good fellowship, Jakarta Smart City and Pulse Lab Jakarta to create a video analysis pipeline for the purpose of improving traffic safety in Jakarta. The pipeline transforms raw traffic video footage into databases that are ready to be used for traffic analysis. By analyzing these patterns, the city of Jakarta will better understand how human behavior and built infrastructure contribute to traffic challenges and safety risks. The results of this work should also be broadly applicable to smart city initiatives around the globe as they improve urban planning and sustainability through data science approaches.
Submitted 30 November, 2018;
originally announced December 2018.
-
Rapid Probabilistic Interest Learning from Domain-Specific Pairwise Image Comparisons
Authors:
Michael Burke,
Siyabonga Mbonambi,
Purity Molala,
Raesetje Sefala
Abstract:
A great deal of work aims to discover large general-purpose models of image interest or memorability for visual search and information retrieval. This paper argues that image interest is often domain- and user-specific, and that efficient mechanisms for learning about this domain-specific image interest as quickly as possible, while limiting the amount of data labelling required, are often more useful to end-users. This work uses pairwise image comparisons to reduce the labelling burden on these users, and introduces an image interest estimation approach that performs similarly to recent data-hungry deep learning approaches trained using pairwise ranking losses. Here, we use a Gaussian process model to interpolate image interest inferred using a Bayesian ranking approach over image features extracted using a pre-trained convolutional neural network. Results show that fitting a Gaussian process in high-dimensional image feature space is not only computationally feasible, but also effective across a broad range of domains. The proposed probabilistic interest estimation approach produces image interests paired with uncertainties that can be used to identify images for which additional labelling is required and to measure inference convergence, allowing for sample-efficient active model training. Importantly, the probabilistic formulation allows for effective visual search and information retrieval when limited labelling data is available.
Submitted 22 May, 2020; v1 submitted 19 June, 2017;
originally announced June 2017.