-
Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants
Authors:
Chloé Sekkat,
Fanny Leroy,
Salima Mdhaffar,
Blake Perry Smith,
Yannick Estève,
Joseph Dureau,
Alice Coucke
Abstract:
Recent works demonstrate that voice assistants do not perform equally well for everyone, but research on demographic robustness of speech technologies is still scarce. This is mainly due to the rarity of large datasets with controlled demographic tags. This paper introduces the Sonos Voice Control Bias Assessment Dataset, an open dataset composed of voice assistant requests for North American English in the music domain (1,038 speakers, 166 hours, 170k audio samples, with 9,040 unique labelled transcripts) with a controlled demographic diversity (gender, age, dialectal region and ethnicity). We also release a statistical demographic bias assessment methodology, at the univariate and multivariate levels, tailored to this specific use case and leveraging spoken language understanding metrics rather than transcription accuracy, which we believe is a better proxy for user experience. To demonstrate the capabilities of this dataset and statistical method to detect demographic bias, we consider a pair of state-of-the-art Automatic Speech Recognition and Spoken Language Understanding models. Results show statistically significant differences in performance across age, dialectal region and ethnicity. Multivariate tests are crucial to shed light on mixed effects between dialectal region, gender and age.
Submitted 14 May, 2024;
originally announced May 2024.
-
Van der Waals epitaxy of Weyl-semimetal Td-WTe$_2$
Authors:
Alexandre Llopez,
Frédéric Leroy,
Calvin Tagne-Kaegom,
Boris Croes,
Adrien Michon,
Chiara Mastropasqua,
Mohamed Al Khalfioui,
Stefano Curiotto,
Pierre Müller,
Andrés Saùl,
Bertrand Kierren,
Geoffroy Kremer,
Patrick Le Fèvre,
François Bertran,
Yannick Fagot-Revurat,
Fabien Cheynis
Abstract:
Epitaxial growth of WTe$_2$ offers significant advantages, including the production of high-quality films, possible long-range in-plane ordering and precise control over layer thicknesses. However, the mean island size of WTe$_2$ grown by molecular beam epitaxy (MBE) in the literature is only a few tens of nanometers, which is not suitable for an implementation of devices at large lateral scales. Here we report the growth of Td-WTe$_2$ ultrathin films by MBE on monolayer (ML) graphene reaching a mean flake size of $\cong$110 nm, which is, on average, more than three times larger than previous results. WTe$_2$ films thicker than 5 nm have been successfully synthesized and exhibit the expected Td-phase atomic structure. We rationalize epitaxial growth of Td-WTe$_2$ and propose a simple model to estimate the mean flake size as a function of growth parameters that can be applied to other transition metal dichalcogenides (TMDCs). Based on nucleation theory and the Kolmogorov-Johnson-Mehl-Avrami (KJMA) equation, our analytical model supports experimental data showing a critical coverage of 0.13 ML above which WTe$_2$ nucleation becomes negligible. The quality of monolayer WTe$_2$ films is demonstrated from electronic band structure analysis using angle-resolved photoemission spectroscopy (ARPES) in agreement with first-principle calculations performed on free-standing WTe$_2$ and previous reports.
Submitted 15 April, 2024;
originally announced April 2024.
-
SCALPEL3: a scalable open-source library for healthcare claims databases
Authors:
Emmanuel Bacry,
Stéphane Gaïffas,
Fanny Leroy,
Maryan Morel,
Dinh Phong Nguyen,
Youcef Sebiat,
Dian Sun
Abstract:
This article introduces SCALPEL3, a scalable open-source framework for studies involving Large Observational Databases (LODs). Its design eases medical observational studies thanks to abstractions allowing concept extraction, high-level cohort manipulation, and production of data formats compatible with machine learning libraries. SCALPEL3 has successfully been used on the SNDS database (see Tuppin et al. (2017)), a huge healthcare claims database that handles the reimbursement of almost all French citizens.
SCALPEL3 focuses on scalability, easy interactive analysis and helpers for data flow analysis to accelerate studies performed on LODs. It consists of three open-source libraries based on Apache Spark. SCALPEL-Flattening allows denormalization of the LOD (only SNDS for now) by joining tables sequentially in a big table. SCALPEL-Extraction provides fast concept extraction from a big table such as the one produced by SCALPEL-Flattening. Finally, SCALPEL-Analysis allows interactive cohort manipulations, monitoring statistics of cohort flows and building datasets to be used with machine learning libraries. The first two provide a Scala API while the last one provides a Python API that can be used in an interactive environment. Our code is available on GitHub.
SCALPEL3 made it possible to successfully extract complex concepts for studies such as Morel et al. (2017), or for studies with 14.5 million patients observed over three years (corresponding to more than 15 billion healthcare events and roughly 15 terabytes of data), in less than 49 minutes on a small 15-node HDFS cluster. SCALPEL3 provides sharp interactive control of data processing through legible code, which helps to build fully reproducible studies and improves the maintainability and auditability of studies performed on LODs.
Submitted 26 August, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
-
ConvSCCS: convolutional self-controlled case series model for lagged adverse event detection
Authors:
Maryan Morel,
Emmanuel Bacry,
Stéphane Gaïffas,
Agathe Guilloux,
Fanny Leroy
Abstract:
With the increased availability of large databases of electronic health records (EHRs) comes the chance of enhancing health risk screening. Most post-marketing detections of adverse drug reactions (ADRs) rely on physicians' spontaneous reports, leading to underreporting. To take up this challenge, we develop a scalable model to estimate the effect of multiple longitudinal features (drug exposures) on a rare longitudinal outcome. Our procedure is based on a conditional Poisson model, also known as the self-controlled case series (SCCS) model. We model the intensity of outcomes using a convolution between exposures and step functions that are penalized using a combination of group-Lasso and total-variation. This approach does not require the specification of precise risk periods, and allows several exposures to be studied at the same time in the same model. We show that this approach improves on the state of the art for the estimation of relative risks, both on simulations and on a cohort of diabetic patients extracted from the large French national health insurance database (SNIIRAM), an SQL database built around medical reimbursements of more than 65 million people. This work has been done in the context of a research partnership between Ecole Polytechnique and CNAMTS (in charge of SNIIRAM).
Submitted 25 January, 2018; v1 submitted 21 December, 2017;
originally announced December 2017.
-
Spatial inhomogeneity and temporal dynamics of a 2D electron gas in interaction with a 2D adatom gas
Authors:
F. Cheynis,
S. Curiotto,
F. Leroy,
P. Müller
Abstract:
Fundamental interest in 2D electron gas (2DEG) systems has recently been renewed with the advent of 2D materials and their potential high-impact applications in optoelectronics. Here, we investigate a 2DEG created by the electron transfer from a Ag adatom gas deposited on a Si(111)$\sqrt{3}\times\sqrt{3}$-Ag surface to an electronic surface state. Using low-energy electron microscopy (LEEM), we measure the Ag adatom gas concentration and the 2DEG-induced charge transfer. We demonstrate a linear dependence of the surface work function change on the Ag adatom gas concentration. A breakdown of the linear relationship is induced by the occurrence of the Ag adatom gas superstructure, identified as Si(111)$\sqrt{21}\times\sqrt{21}$-Ag, which is only observed below room temperature. Below room temperature, we find evidence of a confinement of the 2DEG on atomic terraces, characterised by spatial inhomogeneities of the 2DEG-induced charge transfer along with temporal fluctuations. These variations mirror the Ag adatom gas concentration changes induced by the growth of 3D Ag islands and the occurrence of an Ehrlich-Schwoebel diffusion barrier of 155$\pm$10 meV.
Submitted 13 September, 2017; v1 submitted 17 October, 2016;
originally announced October 2016.
-
Improving the reliability of material databases using multiscale approaches
Authors:
Y. Rollet,
M. Bonnet,
N. Carrère,
F. -H. Leroy,
J. -F. Maire
Abstract:
This article addresses the propagation of constitutive uncertainties between scales occurring in the multiscale modelling of fibre-reinforced composites. The amplification of such uncertainties through upward or downward transitions by a homogenisation model is emphasized and exemplified with the Mori-Tanaka model. In particular, the sensitivity to data uncertainty in the inverse determination of constituent parameters based on downward transitions is highlighted with an example. Then a database improvement method, which simultaneously exploits the available information on constitutive uncertainties at all scales instead of just propagating those associated with one scale, is presented and shown to yield substantial reductions in uncertainty for both the constitutive parameters and the response of structures. The latter finding is demonstrated on two examples of structures, with significant gains in confidence obtained on both.
Submitted 25 November, 2007;
originally announced November 2007.
-
X-ray scattering from stepped and kinked surfaces: An approach with the paracrystal model
Authors:
F. Leroy,
R. Lazzari,
G. Renaud
Abstract:
A general formalism of X-ray scattering from different kinds of surface morphologies is described. Based on a description of the surface morphology at the atomic scale through the use of the paracrystal model and discrete distributions of distances, the intensity scattered by non-periodic surfaces is calculated over the whole reciprocal space. In one dimension, the intensities scattered by a vicinal surface, the two-level model, the N-level model, the faceted surface and the rough surface are addressed. In two dimensions, the previous results are generalized to the kinked vicinal surface, the two-level vicinal surface and the step meandering on a vicinal surface. The concept of crystal truncation rod is generalized by also considering the truncation of a terrace by a step (yielding a terrace truncation rod) and of a step by a kink (yielding a step truncation rod).
Submitted 22 July, 2007;
originally announced July 2007.
-
Self-Organized Growth of Nanoparticles on a Surface Patterned by a Buried Dislocation Network
Authors:
F. Leroy,
G. Renaud,
A. Letoublon,
R. Lazzari,
C. Mottet,
J. Goniakowski
Abstract:
The self-organized growth of Co nanoparticles with 10 nm periodicity was achieved at room temperature on a Ag(001) surface patterned by an underlying dislocation network, as shown by real-time, in situ Grazing Incidence Small and Wide Angle X-ray Scattering. The misfit dislocation network, buried at the interface between a 5 nm-thick Ag thin film and a MgO(001) substrate, induces a periodic strain field on top of the surface. Tensile areas are found to be the most favorable sites for the nucleation and growth of Co, as highlighted by molecular dynamics simulations.
Submitted 22 July, 2007;
originally announced July 2007.
-
In situ GISAXS study of the growth of Pd on MgO(001)
Authors:
F. Leroy,
C. Revenant,
G. Renaud,
R. Lazzari
Abstract:
The morphology of growing Pd nano-particles on MgO(001) surfaces has been investigated in situ, during growth, by grazing incidence small angle x-ray scattering, for different substrate temperatures. The 2D patterns obtained are quantitatively analyzed, and the average morphological parameters (shape, size) deduced. Above 650 K, the aggregates adopt their equilibrium shape of truncated octahedron, and the interfacial energy is deduced.
Submitted 22 July, 2007;
originally announced July 2007.
-
Vicinal silicon surfaces: from step density wave to faceting
Authors:
F. Leroy,
P. Muller,
J. J. Metois,
O. Pierre-Louis
Abstract:
This paper investigates faceting mechanisms induced by electromigration in the regime where atomic steps are transparent. For this purpose we study several vicinal orientations by means of in-situ (optical diffraction, electron microscopy) as well as ex-situ (AFM, microprofilometry) visualization techniques. The data show that faceting proceeds in two stages. The first stage is short and leads to the appearance of a step density wave, with a wavelength roughly independent of the surface orientation. The second stage is much slower, and leads to the formation of a hill-and-valley structure, the period of which depends on the initial surface orientation. A simple continuum model enables us to point out why the wavelength of the step density wave does not depend on the microscale details of the surface. The final wavelength is controlled by the competition between elastic step-step interaction and facet edge energy cost. Finally, the surface stress angular dependence is shown to emerge as a coarse-grained picture from the step model.
Submitted 15 June, 2007;
originally announced June 2007.
-
Nucleosynthesis in evolved stars with the NACRE compilation
Authors:
A. Palacios,
F. Leroy,
C. Charbonnel,
M. Forestini
Abstract:
Nucleosynthesis in evolved (RGB and AGB) low-mass stars is reviewed in light of the reaction rates recommended in the NACRE compilation (Angulo et al. 1999). We use a parametric model of stellar nucleosynthesis to investigate the uncertainties that still exist in the nuclear data and to give a critical point of view on the resulting evolution of the chemical abundances. We discuss in particular (i) the NeNa and MgAl modes of hydrogen burning in the context of the chemical anomalies observed in RGB globular cluster stars, and (ii) the helium combustion in a thermal pulse of an AGB star.
Submitted 15 October, 1999;
originally announced October 1999.