-
Entrofy Your Cohort: A Data Science Approach to Candidate Selection
Authors:
D. Huppenkothen,
B. McFee,
L. Norén
Abstract:
Selecting a cohort from a set of candidates is a common task within and beyond academia. Admitting students, awarding grants, and choosing speakers for a conference are situations where human biases may affect the make-up of the final cohort. We propose a new algorithm, Entrofy, designed to be part of a larger decision-making strategy aimed at making cohort selection as just, quantitative, transparent, and accountable as possible. We suggest this algorithm be embedded in a two-step selection procedure. First, all application materials are stripped of markers of identity that could induce conscious or subconscious bias. During blind review, the committee selects all applicants, submissions, or other entities that meet their merit-based criteria. This often yields a cohort larger than the admissible number. In the second stage, the target cohort can be chosen from this meritorious pool via a new algorithm and software tool. Entrofy optimizes differences across an assignable set of categories selected by the human committee. Criteria could include gender, academic discipline, experience with certain technologies, or other quantifiable characteristics. The Entrofy algorithm yields the computational maximization of diversity by solving the tie-breaking problem with provable performance guarantees. We show how Entrofy selects cohorts according to pre-determined characteristics in simulated sets of applications and demonstrate its use in a case study. This cohort selection process allows human judgment to prevail when assessing merit, but assigns the assessment of diversity to a computational process less likely to be beset by human bias. Importantly, the stage at which diversity assessments occur is fully transparent and auditable with Entrofy. Splitting merit and diversity considerations into their own assessment stages makes it easier to explain why a given candidate was selected or rejected.
Submitted 8 May, 2019;
originally announced May 2019.
-
Introducing Bayesian Analysis with $\text{m&m's}^\circledR$: an active-learning exercise for undergraduates
Authors:
Gwendolyn Eadie,
Daniela Huppenkothen,
Aaron Springford,
Tyler McCormick
Abstract:
We present an active-learning strategy for undergraduates that applies Bayesian analysis to candy-covered chocolate $\text{m&m's}^\circledR$. The exercise is best suited for small class sizes and tutorial settings, after students have been introduced to the concepts of Bayesian statistics. The exercise takes advantage of the non-uniform distribution of $\text{m&m's}^\circledR$ colours, and the difference in distributions made at two different factories. In this paper, we provide the intended learning outcomes, lesson plan and step-by-step guide for instruction, and open-source teaching materials. We also suggest an extension to the exercise for the graduate level, which incorporates hierarchical Bayesian analysis.
Submitted 16 April, 2019;
originally announced April 2019.
-
Hack Weeks as a model for Data Science Education and Collaboration
Authors:
Daniela Huppenkothen,
Anthony Arendt,
David W. Hogg,
Karthik Ram,
Jake VanderPlas,
Ariel Rokem
Abstract:
Across almost all scientific disciplines, the instruments that record our experimental data and the methods required for storage and data analysis are rapidly increasing in complexity. This gives rise to the need for scientific communities to adapt on shorter time scales than traditional university curricula allow for, and therefore requires new modes of knowledge transfer. The universal applicability of data science tools to a broad range of problems has generated new opportunities to foster exchange of ideas and computational workflows across disciplines. In recent years, hack weeks have emerged as an effective tool for fostering these exchanges by providing training in modern data analysis workflows. While there are variations in hack week implementation, all events consist of a common core of three components: tutorials in state-of-the-art methodology, peer learning, and project work in a collaborative environment. In this paper, we present the concept of a hack week in the larger context of scientific meetings and point out similarities and differences to traditional conferences. We motivate the need for such an event and present in detail its strengths and challenges. We find that hack weeks are successful at cultivating collaboration and the exchange of knowledge. Participants self-report that these events help them both in their day-to-day research and in their careers. Based on our results, we conclude that hack weeks present an effective, easy-to-implement, fairly low-cost tool to positively impact data analysis literacy in academic disciplines, foster collaboration, and cultivate best practices.
Submitted 31 October, 2017;
originally announced November 2017.