Evaluating Word Embedding Hyper-Parameters for Similarity and Analogy Tasks

Authors: Maryam Fanaeepour, Adam Makarucha, Jey Han Lau

Abstract: The versatility of word embeddings for various applications is attracting researchers from various fields. However, the impact of hyper-parameters when training embedding model is often poorly understood. How much do hyper-parameters such as vector dimensions and corpus size affect the quality of embeddings, and how do these results translate to downstream applications? Using standard embedding evaluation metrics and datasets, we conduct a study to empirically measure the impact of these hyper-parameters.

Submitted 11 April, 2018; originally announced April 2018.

Comments: 8 pages, 16 figures

arXiv:1702.05607 [pdf, other]

End-to-End Differentially-Private Parameter Tuning in Spatial Histograms

Authors: Maryam Fanaeepour, Benjamin I. P. Rubinstein

Abstract: Differentially-private histograms have emerged as a key tool for location privacy. While past mechanisms have included theoretical & experimental analysis, it has recently been observed that much of the existing literature does not fully provide differential privacy. The missing component, private parameter tuning, is necessary for rigorous evaluation of these mechanisms. Instead works frequently tune on training data to optimise parameters without consideration of privacy; in other cases selection is performed arbitrarily and independent of data, degrading utility. We address this open problem by deriving a principled tuning mechanism that privately optimises data-dependent error bounds. Theoretical results establish privacy and utility while extensive experimentation demonstrates that we can practically achieve true end-to-end privacy.

Submitted 18 February, 2017; originally announced February 2017.

Comments: 13 pages, 11 figures

arXiv:1609.07983 [pdf, other]

doi 10.1007/s10115-017-1113-6

Differentially-Private Counting of Users' Spatial Regions

Authors: Maryam Fanaeepour, Benjamin I. P. Rubinstein

Abstract: Mining of spatial data is an enabling technology for mobile services, Internet-connected cars, and the Internet of Things. But the very distinctiveness of spatial data that drives utility, can cost user privacy. Past work has focused upon points and trajectories for differentially-private release. In this work, we continue the tradition of privacy-preserving spatial analytics, focusing not on point or path data, but on planar spatial regions. Such data represents the area of a user's most frequent visitation---such as "around home and nearby shops". Specifically we consider the differentially-private release of data structures that support range queries for counting users' spatial regions. Counting planar regions leads to unique challenges not faced in existing work. A user's spatial region that straddles multiple data structure cells can lead to duplicate counting at query time. We provably avoid this pitfall by leveraging the Euler characteristic for the first time with differential privacy. To address the increased sensitivity of range queries to spatial region data, we calibrate privacy-preserving noise using bounded user region size and a constrained inference that uses robust least absolute deviations. Our novel constrained inference reduces noise and promotes covertness by (privately) imposing consistency. We provide a full end-to-end theoretical analysis of both differential privacy and high-probability utility for our approach using concentration bounds. A comprehensive experimental study on several real-world datasets establishes practical validity.

Submitted 5 February, 2018; v1 submitted 26 September, 2016; originally announced September 2016.

Comments: 27 pages, 14 figures

Journal ref: Knowl.Inf.Syst 54 (2018) 5-32

Showing 1–3 of 3 results for author: Fanaeepour, M