Multilingual Disinformation Detection for Digital Advertising

Authors: Zofia Trstanova, Nadir El Manouzi, Maryline Chen, Andre L. V. da Cunha, Sergei Ivanov

Abstract: In today's world, the presence of online disinformation and propaganda is more widespread than ever. Independent publishers are funded mostly via digital advertising, which is unfortunately also the case for those publishing disinformation content. The question of how to remove such publishers from advertising inventory has long been ignored, despite the negative impact on the open internet. In this work, we make the first step towards quickly detecting and red-flagging websites that potentially manipulate the public with disinformation. We build a machine learning model based on multilingual text embeddings that first determines whether the page mentions a topic of interest, then estimates the likelihood of the content being malicious, creating a shortlist of publishers that will be reviewed by human experts. Our system empowers internal teams to proactively, rather than defensively, blacklist unsafe content, thus protecting the reputation of the advertisement provider.

Submitted 4 July, 2022; originally announced July 2022.

Comments: Disinformation Countermeasures and Machine Learning Workshop at ICML 2022

arXiv:2004.06950 [pdf, other]

Machine learning force fields and coarse-grained variables in molecular dynamics: application to materials and biological systems

Authors: Paraskevi Gkeka, Gabriel Stoltz, Amir Barati Farimani, Zineb Belkacemi, Michele Ceriotti, John Chodera, Aaron R. Dinner, Andrew Ferguson, Jean-Bernard Maillet, Hervé Minoux, Christine Peter, Fabio Pietrucci, Ana Silveira, Alexandre Tkatchenko, Zofia Trstanova, Rafal Wiewiora, Tony Lelièvre

Abstract: Machine learning encompasses a set of tools and algorithms which are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning offers promises of extracting valuable information from the enormous amounts of data generated by simulation of complex systems. We provide here a review of our current understanding of goals, benefits, and limitations of machine learning techniques for computational studies on atomistic systems, focusing on the construction of empirical force fields from ab-initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.

Submitted 15 April, 2020; originally announced April 2020.

arXiv:1903.08901 [pdf, other]

doi 10.1088/1742-6596/1222/1/012041

Transferability of Operational Status Classification Models Among Different Wind Turbine Types

Authors: Z. Trstanova, A. Martinsson, C. Matthews, S. Jimenez, B. Leimkuhler, T. Van Delft, M. Wilkinson

Abstract: A detailed understanding of wind turbine performance status classification can improve operations and maintenance in the wind energy industry. Due to different engineering properties of wind turbines, the standard supervised learning models used for classification do not generalize across data sets obtained from different wind sites. We propose two methods to deal with the transferability of the trained models: first, data normalization in the form of power curve alignment, and second, a robust method based on convolutional neural networks and feature-space extension. We demonstrate the success of our methods on real-world data sets with industrial applications.

Submitted 21 March, 2019; originally announced March 2019.

Comments: 9 pages

arXiv:1903.08640 [pdf, other]

TATi-Thermodynamic Analytics ToolkIt: TensorFlow-based software for posterior sampling in machine learning applications

Authors: Frederik Heber, Zofia Trstanova, Benedict Leimkuhler

Abstract: With the advent of GPU-assisted hardware and maturing high-efficiency software platforms such as TensorFlow and PyTorch, Bayesian posterior sampling for neural networks becomes plausible. In this article we discuss Bayesian parametrization in machine learning based on Markov Chain Monte Carlo methods, specifically discretized stochastic differential equations such as Langevin dynamics and extended system methods in which an ensemble of walkers is employed to enhance sampling. We provide a glimpse of the potential of the sampling-intensive approach by studying (and visualizing) the loss landscape of a neural network applied to the MNIST data set. Moreover, we investigate how the sampling efficiency itself can be significantly enhanced through an ensemble quasi-Newton preconditioning method. This article accompanies the release of a new TensorFlow software package, the Thermodynamic Analytics ToolkIt, which is used in the computational experiments.

Submitted 3 March, 2020; v1 submitted 20 March, 2019; originally announced March 2019.

Comments: 25 pages: textual improvements with results unchanged, sections on TATi architecture and software performance removed for size constraints, extended EQN parts, added MNIST nonlinear perceptron example

arXiv:1901.06936 [pdf, other]

doi 10.1098/rspa.2019.0036

Local and Global Perspectives on Diffusion Maps in the Analysis of Molecular Systems

Authors: Zofia Trstanova, Ben Leimkuhler, Tony Lelièvre

Abstract: Diffusion maps approximate the generator of Langevin dynamics from simulation data. They afford a means of identifying the slowly-evolving principal modes of high-dimensional molecular systems. When combined with a biasing mechanism, diffusion maps can accelerate the sampling of the stationary Boltzmann-Gibbs distribution. In this work, we contrast the local and global perspectives on diffusion maps, based on whether or not the data distribution has been fully explored. In the global setting, we use diffusion maps to identify metastable sets and to approximate the corresponding committor functions of transitions between them. We also discuss the use of diffusion maps within the metastable sets, formalising the locality via the concept of the quasi-stationary distribution and justifying the convergence of diffusion maps within a local equilibrium. This perspective allows us to propose an enhanced sampling algorithm. We demonstrate the practical relevance of these approaches both for simple models and for molecular dynamics problems (alanine dipeptide and deca-alanine).

Submitted 24 November, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

arXiv:1710.03484 [pdf, other]

Diffusion maps tailored to arbitrary non-degenerate Ito processes

Authors: Ralf Banisch, Zofia Trstanova, Andreas Bittracher, Stefan Klus, Peter Koltai

Abstract: We present two generalizations of the popular diffusion maps algorithm. The first generalization replaces the drift term in diffusion maps, which is the gradient of the sampling density, with the gradient of an arbitrary density of interest which is known up to a normalization constant. The second generalization allows for a diffusion map type approximation of the forward and backward generators of general Ito diffusions with given drift and diffusion coefficients. We use the local kernels introduced by Berry and Sauer, but allow for arbitrary sampling densities. We provide numerical illustrations to demonstrate that this opens up many new applications for diffusion maps as a tool to organize point cloud data, including biased or corrupted samples, dimension reduction for dynamical systems, detection of almost invariant regions in flow fields, and importance sampling.

Submitted 10 October, 2017; originally announced October 2017.

arXiv:1609.02891 [pdf, other]

doi 10.1137/16M110575X

Langevin dynamics with general kinetic energies

Authors: Gabriel Stoltz, Zofia Trstanova

Abstract: We study Langevin dynamics with a kinetic energy different from the standard, quadratic one in order to accelerate the sampling of Boltzmann-Gibbs distributions. In particular, this kinetic energy can be non-globally Lipschitz, which raises issues for the stability of discretizations of the associated Langevin dynamics. We first prove the exponential convergence of the law of the continuous process to the Boltzmann-Gibbs measure by a hypocoercive approach, and characterize the asymptotic variance of empirical averages over trajectories. We next develop numerical schemes which are stable and of weak order two, by considering splitting strategies where the discretizations of the fluctuation/dissipation are corrected by a Metropolis procedure. We use the newly developed schemes for two applications: optimizing the shape of the kinetic energy for the so-called adaptively restrained Langevin dynamics (which considers perturbations of standard quadratic kinetic energies vanishing around the origin); and reducing the metastability of some toy models using non-globally Lipschitz kinetic energies.

Submitted 12 May, 2018; v1 submitted 9 September, 2016; originally announced September 2016.

arXiv:1607.01489 [pdf, other]

doi 10.1016/j.jcp.2017.02.010

Estimating the speed-up of Adaptively Restrained Langevin Dynamics

Authors: Zofia Trstanova, Stephane Redon

Abstract: We consider Adaptively Restrained Langevin dynamics, in which the kinetic energy function vanishes for small velocities. Properly parameterized, this dynamics makes it possible to reduce the computational complexity of updating inter-particle forces, and to accelerate the computation of ergodic averages of molecular simulations. In this paper, we analyze the influence of the method parameters on the total achievable speed-up. In particular, we estimate both the algorithmic speed-up, resulting from incremental force updates, and the influence of the change of the dynamics on the asymptotic variance. This allows us to propose a practical strategy for the parametrization of the method. We validate these theoretical results by representative numerical experiments.

Submitted 27 March, 2017; v1 submitted 6 July, 2016; originally announced July 2016.

Journal ref: Journal of Computational Physics, Volume 336, 1 May 2017, Pages 412-428, ISSN 0021-9991

arXiv:1601.07411 [pdf, ps, other]

doi 10.1007/s10955-016-1544-6

Error Analysis of Modified Langevin Dynamics

Authors: Stephane Redon, Gabriel Stoltz, Zofia Trstanova

Abstract: We consider Langevin dynamics associated with a modified kinetic energy vanishing for small momenta. This allows us to freeze slow particles, and hence avoid the re-computation of inter-particle forces, which leads to computational gains. On the other hand, the statistical error may increase since there are a priori more correlations in time. The aim of this work is first to prove the ergodicity of the modified Langevin dynamics (which fails to be hypoelliptic), and next to analyze how the asymptotic variance on ergodic averages depends on the parameters of the modified kinetic energy. Numerical results illustrate the approach, both for low-dimensional systems where we resort to a Galerkin approximation of the generator, and for more realistic systems using Monte Carlo simulations.

Submitted 27 January, 2016; originally announced January 2016.

Showing 1–9 of 9 results for author: Trstanova, Z