-
Multilingual Disinformation Detection for Digital Advertising
Authors:
Zofia Trstanova,
Nadir El Manouzi,
Maryline Chen,
Andre L. V. da Cunha,
Sergei Ivanov
Abstract:
In today's world, the presence of online disinformation and propaganda is more widespread than ever. Independent publishers are funded mostly via digital advertising, which is unfortunately also the case for those publishing disinformation content. The question of how to remove such publishers from advertising inventory has long been ignored, despite the negative impact on the open internet. In this work, we make the first step towards quickly detecting and red-flagging websites that potentially manipulate the public with disinformation. We build a machine learning model based on multilingual text embeddings that first determines whether the page mentions a topic of interest, then estimates the likelihood of the content being malicious, creating a shortlist of publishers that will be reviewed by human experts. Our system empowers internal teams to proactively, rather than defensively, blacklist unsafe content, thus protecting the reputation of the advertisement provider.
Submitted 4 July, 2022;
originally announced July 2022.
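The two-stage screening described in the abstract above (topic check first, then a maliciousness estimate, then a shortlist for human review) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' system: `Page`, `shortlist_publishers`, the two scoring callables, and both thresholds are hypothetical names introduced here, and the scorers stand in for whatever models over multilingual text embeddings are actually used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Page:
    """Hypothetical container for one crawled page."""
    publisher: str
    text: str


def shortlist_publishers(
    pages: Iterable[Page],
    topic_score: Callable[[str], float],   # assumed: returns a score in [0, 1]
    risk_score: Callable[[str], float],    # assumed: returns a score in [0, 1]
    topic_threshold: float = 0.5,          # illustrative value, not from the paper
    risk_threshold: float = 0.5,           # illustrative value, not from the paper
) -> List[str]:
    """Return publishers to send to human review, highest risk first."""
    worst_risk: Dict[str, float] = {}
    for page in pages:
        # Stage 1: skip pages that do not mention a topic of interest.
        if topic_score(page.text) < topic_threshold:
            continue
        # Stage 2: estimate how likely the on-topic content is malicious.
        risk = risk_score(page.text)
        if risk >= risk_threshold:
            # Keep the highest risk seen for each publisher.
            worst_risk[page.publisher] = max(risk, worst_risk.get(page.publisher, 0.0))
    return sorted(worst_risk, key=worst_risk.get, reverse=True)
```

Running the cheap topic filter before the risk model means the second model only sees on-topic pages; the final decision still rests with the human reviewers mentioned in the abstract.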
-
Machine learning force fields and coarse-grained variables in molecular dynamics: application to materials and biological systems
Authors:
Paraskevi Gkeka,
Gabriel Stoltz,
Amir Barati Farimani,
Zineb Belkacemi,
Michele Ceriotti,
John Chodera,
Aaron R. Dinner,
Andrew Ferguson,
Jean-Bernard Maillet,
Hervé Minoux,
Christine Peter,
Fabio Pietrucci,
Ana Silveira,
Alexandre Tkatchenko,
Zofia Trstanova,
Rafal Wiewiora,
Tony Lelièvre
Abstract:
Machine learning encompasses a set of tools and algorithms which are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning offers promises of extracting valuable information from the enormous amounts of data generated by simulation of complex systems. We provide here a review of our current understanding of goals, benefits, and limitations of machine learning techniques for computational studies on atomistic systems, focusing on the construction of empirical force fields from ab-initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.
Submitted 15 April, 2020;
originally announced April 2020.
-
Transferability of Operational Status Classification Models Among Different Wind Turbine Types
Authors:
Z. Trstanova,
A. Martinsson,
C. Matthews,
S. Jimenez,
B. Leimkuhler,
T. Van Delft,
M. Wilkinson
Abstract:
A detailed understanding of wind turbine performance status classification can improve operations and maintenance in the wind energy industry. Due to different engineering properties of wind turbines, the standard supervised learning models used for classification do not generalize across data sets obtained from different wind sites. We propose two methods to deal with the transferability of the trained models: first, data normalization in the form of power curve alignment, and second, a robust method based on convolutional neural networks and feature-space extension. We demonstrate the success of our methods on real-world data sets with industrial applications.
Submitted 21 March, 2019;
originally announced March 2019.
-
TATi-Thermodynamic Analytics ToolkIt: TensorFlow-based software for posterior sampling in machine learning applications
Authors:
Frederik Heber,
Zofia Trstanova,
Benedict Leimkuhler
Abstract:
With the advent of GPU-assisted hardware and maturing high-efficiency software platforms such as TensorFlow and PyTorch, Bayesian posterior sampling for neural networks becomes plausible. In this article we discuss Bayesian parametrization in machine learning based on Markov Chain Monte Carlo methods, specifically discretized stochastic differential equations such as Langevin dynamics and extended system methods in which an ensemble of walkers is employed to enhance sampling. We provide a glimpse of the potential of the sampling-intensive approach by studying (and visualizing) the loss landscape of a neural network applied to the MNIST data set. Moreover, we investigate how the sampling efficiency itself can be significantly enhanced through an ensemble quasi-Newton preconditioning method. This article accompanies the release of a new TensorFlow software package, the Thermodynamic Analytics ToolkIt, which is used in the computational experiments.
Submitted 3 March, 2020; v1 submitted 20 March, 2019;
originally announced March 2019.
-
Local and Global Perspectives on Diffusion Maps in the Analysis of Molecular Systems
Authors:
Zofia Trstanova,
Ben Leimkuhler,
Tony Lelièvre
Abstract:
Diffusion maps approximate the generator of Langevin dynamics from simulation data. They afford a means of identifying the slowly-evolving principal modes of high-dimensional molecular systems. When combined with a biasing mechanism, diffusion maps can accelerate the sampling of the stationary Boltzmann-Gibbs distribution. In this work, we contrast the local and global perspectives on diffusion maps, based on whether or not the data distribution has been fully explored. In the global setting, we use diffusion maps to identify metastable sets and to approximate the corresponding committor functions of transitions between them. We also discuss the use of diffusion maps within the metastable sets, formalising the locality via the concept of the quasi-stationary distribution and justifying the convergence of diffusion maps within a local equilibrium. This perspective allows us to propose an enhanced sampling algorithm. We demonstrate the practical relevance of these approaches both for simple models and for molecular dynamics problems (alanine dipeptide and deca-alanine).
Submitted 24 November, 2019; v1 submitted 21 January, 2019;
originally announced January 2019.
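For readers unfamiliar with the basic construction that the abstract above builds on, a minimal sketch of a standard diffusion map (in the Coifman-Lafon style) is given below. It is not the algorithm of the paper: the function name `diffusion_map`, the Gaussian kernel convention `exp(-d^2 / (2 * epsilon))`, and the default `alpha = 0.5` (the normalisation usually associated with a Fokker-Planck-type generator) are assumptions made here for illustration.

```python
import numpy as np


def diffusion_map(points, epsilon, n_components=2, alpha=0.5):
    """Return the leading non-trivial eigenvalues and diffusion coordinates.

    points: array of shape (n_samples, n_features).
    epsilon: kernel bandwidth (assumed convention: exp(-d^2 / (2 * epsilon))).
    """
    points = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances.
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * epsilon))

    # Density normalisation: divide by q_i^alpha * q_j^alpha.
    q = kernel.sum(axis=1)
    kernel = kernel / np.outer(q ** alpha, q ** alpha)

    # The Markov matrix P = D^{-1} K is similar to the symmetric matrix
    # D^{-1/2} K D^{-1/2}, so a symmetric eigensolver can be used.
    d = kernel.sum(axis=1)
    sym = kernel / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(sym)

    # Sort in descending order and map back to right eigenvectors of P.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order] / np.sqrt(d)[:, None]

    # Drop the trivial pair (eigenvalue 1, constant eigenvector).
    return eigvals[1:n_components + 1], eigvecs[:, 1:n_components + 1]
```

On data drawn from two well-separated clusters, the first returned coordinate is close to piecewise constant with opposite signs on the two clusters, which is the sense in which the leading modes pick out metastable sets. The dense `n x n` kernel used here is only practical for small samples.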
-
Diffusion maps tailored to arbitrary non-degenerate Ito processes
Authors:
Ralf Banisch,
Zofia Trstanova,
Andreas Bittracher,
Stefan Klus,
Peter Koltai
Abstract:
We present two generalizations of the popular diffusion maps algorithm. The first generalization replaces the drift term in diffusion maps, which is the gradient of the sampling density, with the gradient of an arbitrary density of interest which is known up to a normalization constant. The second generalization allows for a diffusion map type approximation of the forward and backward generators of general Ito diffusions with given drift and diffusion coefficients. We use the local kernels introduced by Berry and Sauer, but allow for arbitrary sampling densities. We provide numerical illustrations to demonstrate that this opens up many new applications for diffusion maps as a tool to organize point cloud data, including biased or corrupted samples, dimension reduction for dynamical systems, detection of almost invariant regions in flow fields, and importance sampling.
Submitted 10 October, 2017;
originally announced October 2017.
-
Langevin dynamics with general kinetic energies
Authors:
Gabriel Stoltz,
Zofia Trstanova
Abstract:
We study Langevin dynamics with a kinetic energy different from the standard, quadratic one in order to accelerate the sampling of Boltzmann-Gibbs distributions. In particular, this kinetic energy can be non-globally Lipschitz, which raises issues for the stability of discretizations of the associated Langevin dynamics. We first prove the exponential convergence of the law of the continuous process to the Boltzmann-Gibbs measure by a hypocoercive approach, and characterize the asymptotic variance of empirical averages over trajectories. We next develop numerical schemes which are stable and of weak order two, by considering splitting strategies where the discretizations of the fluctuation/dissipation are corrected by a Metropolis procedure. We use the newly developed schemes for two applications: optimizing the shape of the kinetic energy for the so-called adaptively restrained Langevin dynamics (which considers perturbations of standard quadratic kinetic energies vanishing around the origin); and reducing the metastability of some toy models using non-globally Lipschitz kinetic energies.
Submitted 12 May, 2018; v1 submitted 9 September, 2016;
originally announced September 2016.
-
Estimating the speed-up of Adaptively Restrained Langevin Dynamics
Authors:
Zofia Trstanova,
Stephane Redon
Abstract:
We consider Adaptively Restrained Langevin dynamics, in which the kinetic energy function vanishes for small velocities. Properly parameterized, this dynamics makes it possible to reduce the computational complexity of updating inter-particle forces, and to accelerate the computation of ergodic averages of molecular simulations. In this paper, we analyze the influence of the method parameters on the total achievable speed-up. In particular, we estimate both the algorithmic speed-up, resulting from incremental force updates, and the influence of the change of the dynamics on the asymptotic variance. This allows us to propose a practical strategy for the parametrization of the method. We validate these theoretical results by representative numerical experiments.
Submitted 27 March, 2017; v1 submitted 6 July, 2016;
originally announced July 2016.
-
Error Analysis of Modified Langevin Dynamics
Authors:
Stephane Redon,
Gabriel Stoltz,
Zofia Trstanova
Abstract:
We consider Langevin dynamics associated with a modified kinetic energy vanishing for small momenta. This allows us to freeze slow particles, and hence avoid the re-computation of inter-particle forces, which leads to computational gains. On the other hand, the statistical error may increase since there are a priori more correlations in time. The aim of this work is first to prove the ergodicity of the modified Langevin dynamics (which fails to be hypoelliptic), and next to analyze how the asymptotic variance on ergodic averages depends on the parameters of the modified kinetic energy. Numerical results illustrate the approach, both for low-dimensional systems where we resort to a Galerkin approximation of the generator, and for more realistic systems using Monte Carlo simulations.
Submitted 27 January, 2016;
originally announced January 2016.