Skip to main content

Showing 1–50 of 96 results for author: Phillips, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2506.00051  [pdf, ps, other

    cs.CY

    Beyond Monoliths: Expert Orchestration for More Capable, Democratic, and Safe Large Language Models

    Authors: Philip Quirke, Narmeen Oozeer, Chaithanya Bandi, Amir Abdullah, Jason Hoelscher-Obermaier, Jeff M. Phillips, Joshua Greaves, Clement Neo, Fazl Barez, Shriyash Upadhyay

    Abstract: This position paper argues that the prevailing trajectory toward ever larger, more expensive generalist foundation models controlled by a handful of big companies limits innovation and constrains progress. We challenge this approach by advocating for an "Expert Orchestration" framework as a superior alternative that democratizes LLM advancement. Our proposed framework intelligently selects from th… ▽ More

    Submitted 28 May, 2025; originally announced June 2025.

    Comments: 9 pages, 2 figures

  2. arXiv:2505.02472  [pdf, other

    cs.CG

    Trajectory Minimum Touching Ball

    Authors: Jeff M. Phillips, Jens Kristian Refsgaard Schou

    Abstract: We present algorithms to find the minimum radius sphere that intersects every trajectory in a set of $n$ trajectories composed of at most $k$ line segments each. When $k=1$, we can reduce the problem to the LP-type framework to achieve a linear time complexity. For $k \geq 4$ we provide a trajectory configuration with unbounded LP-type complexity, but also present an almost… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

  3. arXiv:2504.11299  [pdf, other

    stat.CO cs.CG cs.LG

    Efficient and Stable Multi-Dimensional Kolmogorov-Smirnov Distance

    Authors: Peter Matthew Jacobs, Foad Namjoo, Jeff M. Phillips

    Abstract: We revisit extending the Kolmogorov-Smirnov distance between probability distributions to the multidimensional setting and make new arguments about the proper way to approach this generalization. Our proposed formulation maximizes the difference over orthogonal dominating rectangular ranges (d-sided rectangles in R^d), and is an integral probability metric. We also prove that the distance between… ▽ More

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 21 pages, Primary: stat.CO. Secondary: cs.CG, cs.LG

  4. arXiv:2504.05627  [pdf, other

    cs.LG

    Maternal and Fetal Health Status Assessment by Using Machine Learning on Optical 3D Body Scans

    Authors: Ruting Cheng, Yijiang Zheng, Boyuan Feng, Chuhui Qiu, Zhuoxin Long, Joaquin A. Calderon, Xiaoke Zhang, Jaclyn M. Phillips, James K. Hahn

    Abstract: Monitoring maternal and fetal health during pregnancy is crucial for preventing adverse outcomes. While tests such as ultrasound scans offer high accuracy, they can be costly and inconvenient. Telehealth and more accessible body shape information provide pregnant women with a convenient way to monitor their health. This study explores the potential of 3D body scan data, captured during the 18-24 g… ▽ More

    Submitted 7 April, 2025; originally announced April 2025.

  5. arXiv:2502.11324  [pdf, other

    stat.ML cs.LG

    Robust High-Dimensional Mean Estimation With Low Data Size, an Empirical Study

    Authors: Cullen Anderson, Jeff M. Phillips

    Abstract: Robust statistics aims to compute quantities to represent data where a fraction of it may be arbitrarily corrupted. The most essential statistic is the mean, and in recent years, there has been a flurry of theoretical advancement for efficiently estimating the mean in high dimensions on corrupted data. While several algorithms have been proposed that achieve near-optimal error, they all rely on la… ▽ More

    Submitted 16 February, 2025; originally announced February 2025.

    Journal ref: Transactions on Machine Learning Research (TMLR), February 2025, ISSN: 2835-8856

  6. Fast Comparative Analysis of Merge Trees Using Locality Sensitive Hashing

    Authors: Weiran Lyu, Raghavendra Sridharamurthy, Jeff M. Phillips, Bei Wang

    Abstract: Scalar field comparison is a fundamental task in scientific visualization. In topological data analysis, we compare topological descriptors of scalar fields -- such as persistence diagrams and merge trees -- because they provide succinct and robust abstract representations. Several similarity measures for topological descriptors seem to be both asymptotically and practically efficient with polynom… ▽ More

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: IEEE VIS 2024

  7. arXiv:2402.05280  [pdf, ps, other

    cs.LG cs.CG

    No Dimensional Sampling Coresets for Classification

    Authors: Meysam Alishahi, Jeff M. Phillips

    Abstract: We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework. Such coresets seek the smallest possible subsets of input data, so one can optimize a loss function on the coreset and ensure approximation guarantees with respect to the original data. Our analysis provides the first no dimensional coresets, so the size does not depend on the… ▽ More

    Submitted 22 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 42 Pages

  8. arXiv:2402.02746  [pdf, other

    cs.LG stat.ML

    Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

    Authors: Zhitong Xu, Haitao Wang, Jeff M Phillips, Shandian Zhe

    Abstract: A long-standing belief holds that Bayesian Optimization (BO) with standard Gaussian processes (GP) -- referred to as standard BO -- underperforms in high-dimensional optimization problems. While this belief seems plausible, it lacks both robust empirical evidence and theoretical justification. To address this gap, we present a systematic investigation. First, through a comprehensive evaluation acr… ▽ More

    Submitted 11 March, 2025; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ICLR 2025 camera-ready version

  9. arXiv:2311.09394  [pdf, other

    cs.SE cs.PL

    GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production

    Authors: Kostya Serebryany, Chris Kennelly, Mitch Phillips, Matt Denton, Marco Elver, Alexander Potapenko, Matt Morehouse, Vlad Tsyrklevich, Christian Holler, Julian Lettner, David Kilzer, Lander Brandt

    Abstract: Despite the recent advances in pre-production bug detection, heap-use-after-free and heap-buffer-overflow bugs remain the primary problem for security, reliability, and developer productivity for applications written in C or C++, across all major software ecosystems. Memory-safe languages solve this problem when they are used, but the existing code bases consisting of billions of lines of C and C+… ▽ More

    Submitted 13 January, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  10. arXiv:2311.05651  [pdf, other

    cs.CG cs.DS cs.LG

    On Mergable Coresets for Polytope Distance

    Authors: Benwei Shi, Aditya Bhaskara, Wai Ming Tai, Jeff M. Phillips

    Abstract: We show that a constant-size constant-error coreset for polytope distance is simple to maintain under merges of coresets. However, increasing the size cannot improve the error bound significantly beyond that constant.

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Presented in SoCG'19 Young Researchers Forum (CG:YRF)

    ACM Class: I.3.5

  11. arXiv:2311.03393  [pdf, other

    cs.DB cs.AI

    Sketching Multidimensional Time Series for Fast Discord Mining

    Authors: Chin-Chia Michael Yeh, Yan Zheng, Menghai Pan, Huiyuan Chen, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang, Jeff M. Phillips, Eamonn Keogh

    Abstract: Time series discords are a useful primitive for time series anomaly detection, and the matrix profile is capable of capturing discord effectively. There exist many research efforts to improve the scalability of discord discovery with respect to the length of time series. However, there is surprisingly little work focused on reducing the time complexity of matrix profile computation associated with… ▽ More

    Submitted 7 December, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

  12. arXiv:2310.03919  [pdf, other

    cs.IR cs.AI cs.LG

    An Efficient Content-based Time Series Retrieval System

    Authors: Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Junpeng Wang, Vivian Lai, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang, Jeff M. Phillips

    Abstract: A Content-based Time Series Retrieval (CTSR) system is an information retrieval system for users to interact with time series emerged from multiple domains, such as finance, healthcare, and manufacturing. For example, users seeking to learn more about the source of a time series can submit the time series as a query to the CTSR system and retrieve a list of relevant time series with associated met… ▽ More

    Submitted 5 October, 2023; originally announced October 2023.

  13. arXiv:2308.07418  [pdf, other

    cs.LG stat.ML

    Locally Adaptive and Differentiable Regression

    Authors: Mingxuan Han, Varun Shankar, Jeff M Phillips, Chenglong Ye

    Abstract: Over-parameterized models like deep nets and random forests have become very popular in machine learning. However, the natural goals of continuity and differentiability, common in regression models, are now often ignored in modern overparametrized, locally-adaptive models. We propose a general framework to construct a global continuous and differentiable model based on a weighted average of locall… ▽ More

    Submitted 12 October, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

    Journal ref: Journal of Machine Learning for Modeling and Computing 2023

  14. arXiv:2306.16516  [pdf, ps, other

    cs.CG cs.LG

    Dimension-Independent Kernel ε-Covers

    Authors: Jeff M. Phillips, Hasan Pourmahmood-Aghababa

    Abstract: We introduce the notion of an $\varepsilon$-cover for a kernel range space. A kernel range space concerns a set of points $X \subset \mathbb{R}^d$ and the space of all queries by a fixed kernel (e.g., a Gaussian kernel $K(p,\cdot) = \exp(-\|p-\cdot\|^2)$, where $p \in \mathbb{R}^d$). For a point set $X$ of size $n$, a query returns a vector of values $R_p \in \mathbb{R}^n$, where the $i$th coordin… ▽ More

    Submitted 12 June, 2025; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: 27 pages

    Journal ref: Computing in Geometry and Topology, Volume 4(1), 2025, Article 5, pp. 1-28

  15. arXiv:2306.03173  [pdf, other

    cs.LG

    Linear Distance Metric Learning with Noisy Labels

    Authors: Meysam Alishahi, Anna Little, Jeff M. Phillips

    Abstract: In linear distance metric learning, we are given data in one Euclidean metric space and the goal is to find an appropriate linear map to another Euclidean metric space which respects certain distance conditions as much as possible. In this paper, we formalize a simple and elegant method which reduces to a general continuous convex loss optimization problem, and for different noise models we derive… ▽ More

    Submitted 20 December, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: 52 pages

  16. arXiv:2305.18774  [pdf, other

    cs.LG cs.NE

    Bayesian Decision Trees Inspired from Evolutionary Algorithms

    Authors: Efthyvoulos Drousiotis, Alexander M. Phillips, Paul G. Spirakis, Simon Maskell

    Abstract: Bayesian Decision Trees (DTs) are generally considered a more advanced and accurate model than a regular Decision Tree (DT) because they can handle complex and uncertain data. Existing work on Bayesian DTs uses Markov Chain Monte Carlo (MCMC) with an accept-reject mechanism and sample using naive proposals to proceed to the next iteration, which can be slow because of the burn-in time needed. We c… ▽ More

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2301.09090

  17. arXiv:2305.16606  [pdf, other

    cs.IR

    Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach

    Authors: Tao Yang, Cuize Han, Chen Luo, Parth Gupta, Jeff M. Phillips, Qingyao Ai

    Abstract: Ranking is at the core of many artificial intelligence (AI) applications, including search engines, recommender systems, etc. Modern ranking systems are often constructed with learning-to-rank (LTR) models built from user behavior signals. While previous studies have demonstrated the effectiveness of using user behavior signals (e.g., clicks) as both features and labels of LTR algorithms, we argue… ▽ More

    Submitted 25 May, 2023; originally announced May 2023.

  18. arXiv:2210.12704  [pdf, other

    cs.LG

    Batch Multi-Fidelity Active Learning with Budget Constraints

    Authors: Shibo Li, Jeff M. Phillips, Xin Yu, Robert M. Kirby, Shandian Zhe

    Abstract: Learning functions with high-dimensional outputs is critical in many applications, such as physical simulation and engineering design. However, collecting training examples for these applications is often costly, e.g. by running numerical solvers. The recent work (Li et al., 2022) proposes the first multi-fidelity active learning approach for high-dimensional outputs, which can acquire examples at… ▽ More

    Submitted 23 October, 2022; originally announced October 2022.

  19. arXiv:2209.01322  [pdf, other

    cs.CG cs.LG

    Classifying Spatial Trajectories

    Authors: Hasan Pourmahmood-Aghababa, Jeff M. Phillips

    Abstract: We provide the first comprehensive study on how to classify trajectories using only their spatial representations, measured on 5 real-world data sets. Our comparison considers 20 distinct classifiers arising either as a KNN classifier of a popular distance, or as a more general type of classifier using a vectorized representation of each trajectory. We additionally develop new methods for how to v… ▽ More

    Submitted 3 September, 2022; originally announced September 2022.

    Comments: 21 pages, 15 figures

  20. arXiv:2112.10931  [pdf, other

    cs.NI

    Hiding Signal Strength Interference from Outside Adversaries

    Authors: Mingxuan Han, Jeff M. Phillips, Sneha Kumar Kasera

    Abstract: The presence of people can be detected by passively observing the signal strength of Wifi and related forms of communication. This paper tackles the question of how and when can this be prevented by adjustments to the transmitted signal strength, and other similar measures. The main contribution of this paper is a formal framework to analyze this problem, and the identification of several scenario… ▽ More

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: 6 pages, 2 figures

  21. arXiv:2112.01537  [pdf, other

    cs.HC cs.AI cs.LG

    Improving mathematical questioning in teacher training

    Authors: Debajyoti Datta, Maria Phillips, James P Bywater, Jennifer Chiu, Ginger S. Watson, Laura E. Barnes, Donald E Brown

    Abstract: High-fidelity, AI-based simulated classroom systems enable teachers to rehearse effective teaching strategies. However, dialogue-oriented open-ended conversations such as teaching a student about scale factors can be difficult to model. This paper builds a text-based interactive conversational agent to help teachers practice mathematical questioning skills based on the well-known Instructional Qua… ▽ More

    Submitted 6 December, 2021; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: Accepted to appear at the NeurIPS 2021 Human Centered AI Workshop (HCAI). Data collection process for this data is described here arXiv:2112.00985

  22. arXiv:2112.00985  [pdf, other

    cs.AI cs.HC cs.LG

    Evaluation of mathematical questioning strategies using data collected through weak supervision

    Authors: Debajyoti Datta, Maria Phillips, James P Bywater, Jennifer Chiu, Ginger S. Watson, Laura E. Barnes, Donald E Brown

    Abstract: A large body of research demonstrates how teachers' questioning strategies can improve student learning outcomes. However, developing new scenarios is challenging because of the lack of training data for a specific scenario and the costs associated with labeling. This paper presents a high-fidelity, AI-based classroom simulator to help teachers rehearse research-based mathematical questioning skil… ▽ More

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: Accepted to appear at the NeurIPS 2021 Workshop on Math AI for Education (MATHAI4ED)

  23. arXiv:2110.07663  [pdf, other

    math.NA cs.PF

    Tuning Spectral Element Preconditioners for Parallel Scalability on GPUs

    Authors: Malachi Phillips, Stefan Kerkemeier, Paul Fischer

    Abstract: The Poisson pressure solve resulting from the spectral element discretization of the incompressible Navier-Stokes equation requires fast, robust, and scalable preconditioning. In the current work, a parallel scaling study of Chebyshev-accelerated Schwarz and Jacobi preconditioning schemes is presented, with special focus on GPU architectures, such as OLCF's Summit. Convergence properties of the Ch… ▽ More

    Submitted 12 December, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 12 pages, 8 figures, submitted/accepted to SIAM Conference on Parallel Processing for Scientific Computing (PP22) proceedings

    MSC Class: 35-04 ACM Class: D.0; F.2; G.1.3; G.4; I.6

  24. arXiv:2109.05072  [pdf, other

    cs.DC cs.MS math.NA

    GPU Algorithms for Efficient Exascale Discretizations

    Authors: Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Ryan Bleile, Jed Brown, Jean-Sylvain Camier, Robert Carson, Noel Chalmers, Veselin Dobrev, Yohann Dudouit, Paul Fischer, Ali Karakus, Stefan Kerkemeier, Tzanio Kolev, Yu-Hsiang Lan, Elia Merzari, Misun Min, Malachi Phillips, Thilina Rathnayake, Robert Rieben, Thomas Stitt, Ananias Tomboulides, Stanimire Tomov, Vladimir Tomov, Arturo Vargas , et al. (2 additional authors not shown)

    Abstract: In this paper we describe the research and development activities in the Center for Efficient Exascale Discretization within the US Exascale Computing Project, targeting state-of-the-art high-order finite-element algorithms for high-order applications on GPU-accelerated platforms. We discuss the GPU developments in several components of the CEED software stack, including the libCEED, MAGMA, MFEM,… ▽ More

    Submitted 10 September, 2021; originally announced September 2021.

  25. arXiv:2108.12084  [pdf, other

    cs.CL cs.AI cs.LG

    Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies

    Authors: Sunipa Dev, Masoud Monajatipoor, Anaelia Ovalle, Arjun Subramonian, Jeff M Phillips, Kai-Wei Chang

    Abstract: Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understandin… ▽ More

    Submitted 10 September, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

    Journal ref: EMNLP 2021

  26. Practical and Configurable Network Traffic Classification Using Probabilistic Machine Learning

    Authors: Jiahui Chen, Joe Breen, Jeff M. Phillips, Jacobus Van der Merwe

    Abstract: Network traffic classification that is widely applicable and highly accurate is valuable for many network security and management tasks. A flexible and easily configurable classification framework is ideal, as it can be customized for use in a wide variety of networks. In this paper, we propose a highly configurable and flexible machine learning traffic classification method that relies only on st… ▽ More

    Submitted 10 July, 2021; originally announced July 2021.

    Comments: Published in the Springer Cluster Computing journal

  27. arXiv:2106.13851  [pdf, other

    cs.CG cs.LG

    Approximate Maximum Halfspace Discrepancy

    Authors: Michael Matheny, Jeff M. Phillips

    Abstract: Consider the geometric range space $(X, \mathcal{H}_d)$ where $X \subset \mathbb{R}^d$ and $\mathcal{H}_d$ is the set of ranges defined by $d$-dimensional halfspaces. In this setting we consider that $X$ is the disjoint union of a red and blue set. For each halfspace $h \in \mathcal{H}_d$ define a function $Φ(h)$ that measures the "difference" between the fraction of red and fraction of blue point… ▽ More

    Submitted 25 June, 2021; originally announced June 2021.

  28. arXiv:2104.05829  [pdf, other

    cs.PF cs.DC

    NekRS, a GPU-Accelerated Spectral Element Navier-Stokes Solver

    Authors: Paul Fischer, Stefan Kerkemeier, Misun Min, Yu-Hsiang Lan, Malachi Phillips, Thilina Rathnayake, Elia Merzari, Ananias Tomboulides, Ali Karakus, Noel Chalmers, Tim Warburton

    Abstract: The development of NekRS, a GPU-oriented thermal-fluids simulation code based on the spectral element method (SEM) is described. For performance portability, the code is based on the open concurrent compute abstraction and leverages scalable developments in the SEM code Nek5000 and in libParanumal, which is a library of high-performance kernels for high-order discretizations and PDE-based miniapps… ▽ More

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: 14 pages, 8 figures

    MSC Class: 35-04 ACM Class: D.0; F.2; G.2; G.4; I.6

  29. arXiv:2104.02797  [pdf, other

    cs.CL cs.HC

    VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations

    Authors: Archit Rathore, Sunipa Dev, Jeff M. Phillips, Vivek Srikumar, Yan Zheng, Chin-Chia Michael Yeh, Junpeng Wang, Wei Zhang, Bei Wang

    Abstract: Word vector embeddings have been shown to contain and amplify biases in data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these biases in word representations. In this paper, we utilize interactive visualization to increase the interpretability and accessibility of a collection of state-of-the-art debiasing techniques. To aid this,… ▽ More

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 11 pages

  30. arXiv:2010.12710  [pdf, other

    cs.CL cs.CY cs.LG

    Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education

    Authors: Debajyoti Datta, Maria Phillips, Jennifer Chiu, Ginger S. Watson, James P. Bywater, Laura Barnes, Donald Brown

    Abstract: Machine learning techniques applied to the Natural Language Processing (NLP) component of conversational agent development show promising results for improved accuracy and quality of feedback that a conversational agent can provide. The effort required to develop an educational scenario specific conversational agent is time consuming as it requires domain experts to label and annotate noisy data s… ▽ More

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: Preprint: Under Review

    ACM Class: I.2.7

  31. arXiv:2010.06536  [pdf, other

    cs.CV cs.AI

    Kartta Labs: Collaborative Time Travel

    Authors: Sasan Tavakkol, Feng Han, Brandon Mayer, Mark Phillips, Cyrus Shahabi, Yao-Yi Chiang, Raimondas Kiveris

    Abstract: We introduce the modular and scalable design of Kartta Labs, an open source, open data, and scalable system for virtually reconstructing cities from historical maps and photos. Kartta Labs relies on crowdsourcing and artificial intelligence consisting of two major modules: Maps and 3D models. Each module, in turn, consists of sub-modules that enable the system to reconstruct a city from historical… ▽ More

    Submitted 6 October, 2020; originally announced October 2020.

  32. arXiv:2009.02947  [pdf, other

    cs.HC

    Towards a Practical Virtual Office for Mobile Knowledge Workers

    Authors: Eyal Ofek, Jens Grubert, Michel Pahud, Mark Phillips, Per Ola Kristensson

    Abstract: As more people work from home or during travel, new opportunities and challenges arise around mobile office work. On one hand, people may work at flexible hours, independent of traffic limitations, but on the other hand, they may need to work at makeshift spaces, with less than optimal working conditions and decoupled from co-workers. Virtual Reality (VR) has the potential to change the way inform… ▽ More

    Submitted 7 September, 2020; originally announced September 2020.

    Comments: https://www.microsoft.com/en-us/research/event/new-future-of-work/#!publications

    ACM Class: H.5.2

    Journal ref: Microsoft New Future of Work 2020 Symposium

  33. arXiv:2009.00611  [pdf, other

    cs.IR cs.CL cs.DL cs.LG

    Identifying Documents In-Scope of a Collection from Web Archives

    Authors: Krutarth Patel, Cornelia Caragea, Mark Phillips, Nathaniel Fox

    Abstract: Web archive data usually contains high-quality documents that are very useful for creating specialized collections of documents, e.g., scientific digital libraries and repositories of technical reports. In doing so, there is a substantial need for automatic approaches that can distinguish the documents of interest for a collection out of the huge number of documents collected by web archiving inst… ▽ More

    Submitted 2 September, 2020; originally announced September 2020.

    Comments: 10 pages

    Journal ref: In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020)

  34. arXiv:2007.15924  [pdf, other

    cs.CG

    Orientation-Preserving Vectorized Distance Between Curves

    Authors: Jeff M. Phillips, Hasan Pourmahmood-Aghababa

    Abstract: We introduce an orientation-preserving landmark-based distance for continuous curves, which can be viewed as an alternative to the \Frechet or Dynamic Time Warping distances. This measure retains many of the properties of those measures, and we prove some relations, but can be interpreted as a Euclidean distance in a particular vector space. Hence it is significantly easier to use, faster for gene… ▽ More

    Submitted 24 May, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

    Comments: 23 pages, 10 figures, 1 table, accepted for MSML21, this paper is in Primary: CG (Computational Geometry) and Secondary: LG (Machine Learning) categories

    ACM Class: I.3.5

  35. arXiv:2007.00049  [pdf, other

    cs.CL cs.AI cs.LG

    OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings

    Authors: Sunipa Dev, Tao Li, Jeff M Phillips, Vivek Srikumar

    Abstract: Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks. While existing methods are effective at mitigating biases by linear projection, such methods are too aggressive: they not only remove bias, but also erase valuable information from word embeddings. We develop new measures for evaluating specific information retention t… ▽ More

    Submitted 10 September, 2021; v1 submitted 30 June, 2020; originally announced July 2020.

    Journal ref: EMNLP 2021

  36. arXiv:2002.02013  [pdf, other

    cs.LG cs.DS stat.ML

    A Deterministic Streaming Sketch for Ridge Regression

    Authors: Benwei Shi, Jeff M. Phillips

    Abstract: We provide a deterministic space-efficient algorithm for estimating ridge regression. For $n$ data points with $d$ features and a large enough regularization parameter, we provide a solution within $\varepsilon$ L$_2$ error using only $O(d/\varepsilon)$ space. This is the first $o(d^2)$ space deterministic streaming algorithm with guaranteed solution error and risk bound for this classic problem.… ▽ More

    Submitted 10 March, 2021; v1 submitted 5 February, 2020; originally announced February 2020.

    Comments: Fix a few typos. To be published in AISTATS 2021

    Journal ref: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:586-594, 2021

  37. arXiv:1912.07673  [pdf, ps, other

    cs.DS cs.CG

    Finding the Mode of a Kernel Density Estimate

    Authors: Jasper C. H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, Wai Ming Tai

    Abstract: Given points $p_1, \dots, p_n$ in $\mathbb{R}^d$, how do we find a point $x$ which maximizes $\frac{1}{n} \sum_{i=1}^n e^{-\|p_i - x\|^2}$? In other words, how do we find the maximizing point, or mode of a Gaussian kernel density estimation (KDE) centered at $p_1, \dots, p_n$? Given the power of KDEs in representing probability distributions and other continuous functions, the basic mode finding p… ▽ More

    Submitted 16 December, 2019; originally announced December 2019.

  38. arXiv:1910.05862  [pdf, other

    cs.LG stat.ML

    Constrained Non-Affine Alignment of Embeddings

    Authors: Yuwei Wang, Yan Zheng, Yanqing Peng, Chin-Chia Michael Yeh, Zhongfang Zhuang, Das Mahashweta, Bendre Mangesh, Feifei Li, Wei Zhang, Jeff M. Phillips

    Abstract: Embeddings are one of the fundamental building blocks for data analysis tasks. Embeddings are already essential tools for large language models and image analysis, and their use is being extended to many other research domains. The generation of these distributed representations is often a data- and computation-expensive process; yet the holistic analysis and adjustment of them after they have bee… ▽ More

    Submitted 19 November, 2021; v1 submitted 13 October, 2019; originally announced October 2019.

  39. arXiv:1907.02171  [pdf, other

    cs.CG

    Sketched MinDist

    Authors: Jeff M. Phillips, Pingfan Tang

    Abstract: We consider sketch vectors of geometric objects $J$ through the \mindist function \[ v_i(J) = \inf_{p \in J} \|p-q_i\| \] for $q_i \in Q$ from a point set $Q$. Collecting the vector of these sketch values induces a simple, effective, and powerful distance: the Euclidean distance between these sketched vectors. This paper shows how large this set $Q$ needs to be under a variety of shapes and scenar… ▽ More

    Submitted 7 July, 2019; v1 submitted 3 July, 2019; originally announced July 2019.

  40. arXiv:1906.09381  [pdf, other

    stat.ML cs.LG

    The Kernel Spatial Scan Statistic

    Authors: Mingxuan Han, Michael Matheny, Jeff M. Phillips

    Abstract: Kulldorff's (1997) seminal paper on spatial scan statistics (SSS) has led to many methods considering different regions of interest, different statistical models, and different approximations while also having numerous applications in epidemiology, environmental monitoring, and homeland security. SSS provides a way to rigorously test for the existence of an anomaly and provide statistical guarante… ▽ More

    Submitted 9 August, 2019; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: 13 pages, 13 figures

  41. arXiv:1906.04261  [pdf, other

    cs.SI cs.LG

    Examining Untempered Social Media: Analyzing Cascades of Polarized Conversations

    Authors: Arunkumar Bagavathi, Pedram Bashiri, Shannon Reid, Matthew Phillips, Siddharth Krishnan

    Abstract: Online social media, periodically serves as a platform for cascading polarizing topics of conversation. The inherent community structure present in online social networks (homophily) and the advent of fringe outlets like Gab have created online "echo chambers" that amplify the effects of polarization, which fuels detrimental behavior. Recently, in October 2018, Gab made headlines when it was revea… ▽ More

    Submitted 10 June, 2019; originally announced June 2019.

  42. arXiv:1906.01693  [pdf, other

    cs.DS

    Scalable Spatial Scan Statistics for Trajectories

    Authors: Michael Matheny, Dong Xie, Jeff M. Phillips

    Abstract: We define several new models for how to define anomalous regions among enormous sets of trajectories. These are based on spatial scan statistics, and identify a geometric region which captures a subset of trajectories which are significantly different in a measured characteristic from the background population. The model definition depends on how much a geometric region is contributed to by some o… ▽ More

    Submitted 4 June, 2019; originally announced June 2019.

  43. arXiv:1903.08014  [pdf, other

    cs.DS cs.CG

    Independent Range Sampling, Revisited Again

    Authors: Peyman Afshani, Jeff M. Phillips

    Abstract: We revisit the range sampling problem: the input is a set of points where each point is associated with a real-valued weight. The goal is to store them in a structure such that given a query range and an integer $k$, we can extract $k$ independent random samples from the points inside the query range, where the probability of sampling a point is proportional to its weight. This line of work was… ▽ More

    Submitted 19 March, 2019; originally announced March 2019.

  44. arXiv:1903.03211  [pdf, other

    cs.CG

    The VC Dimension of Metric Balls under Fréchet and Hausdorff Distances

    Authors: Anne Driemel, André Nusser, Jeff M. Phillips, Ioannis Psarros

    Abstract: The Vapnik-Chervonenkis dimension provides a notion of complexity for systems of sets. If the VC dimension is small, then knowing this can drastically simplify fundamental computational tasks such as classification, range counting, and density estimation through the use of sampling bounds. We analyze set systems where the ground set $X$ is a set of polygonal curves in $\mathbb{R}^d$ and the sets… ▽ More

    Submitted 15 November, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: 24 pages, 5 figures

  45. arXiv:1811.04136  [pdf, ps, other

    cs.LG cs.CG stat.ML

    The GaussianSketch for Almost Relative Error Kernel Distance

    Authors: Jeff M. Phillips, Wai Ming Tai

    Abstract: We introduce two versions of a new sketch for approximately embedding the Gaussian kernel into Euclidean inner product space. These work by truncating infinite expansions of the Gaussian kernel, and carefully invoking the RecursiveTensorSketch [Ahle et al. SODA 2020]. After providing concentration and approximation properties of these sketches, we use them to approximate the kernel distance betwee… ▽ More

    Submitted 19 June, 2020; v1 submitted 9 November, 2018; originally announced November 2018.

  46. Improved Bounds on Information Dissemination by Manhattan Random Waypoint Model

    Authors: Aria Rezaei, Jie Gao, Jeff M. Phillips, Csaba D. Tóth

    Abstract: With the popularity of portable wireless devices it is important to model and predict how information or contagions spread by natural human mobility -- for understanding the spreading of deadly infectious diseases and for improving delay tolerant communication schemes. Formally, we model this problem by considering $M$ moving agents, where each agent initially carries a \emph{distinct} bit of info… ▽ More

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: 10 pages, ACM SIGSPATIAL 2018, Seattle, US

    Journal ref: 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL 18), 2018, Seattle, WA, USA

  47. arXiv:1806.01330  [pdf, other

    cs.CL stat.ML

    Closed Form Word Embedding Alignment

    Authors: Sunipa Dev, Safia Hassan, Jeff M. Phillips

    Abstract: We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). Our methods are simple and have a closed form to optimally rotate, translate, and scale to minimize root mean squared errors or maximize the average cosine similarity between two embeddings of the same vocabulary into the same… ▽ More

    Submitted 17 November, 2020; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: Accepted ICDM 2019 and KAIS Special Issue

  48. arXiv:1804.11307  [pdf, other

    cs.CG

    Practical Low-Dimensional Halfspace Range Space Sampling

    Authors: Michael Matheny, Jeff M. Phillips

    Abstract: We develop, analyze, implement, and compare new algorithms for creating $\varepsilon$-samples of range spaces defined by halfspaces which have size sub-quadratic in $1/\varepsilon$, and have runtime linear in the input size and near-quadratic in $1/\varepsilon$. The key to our solution is an efficient construction of partition trees. Despite not requiring any techniques developed after the early 1… ▽ More

    Submitted 17 July, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

  49. arXiv:1804.11287  [pdf, other

    cs.CG

    Computing Approximate Statistical Discrepancy

    Authors: Michael Matheny, Jeff M. Phillips

    Abstract: Consider a geometric range space $(X,cA)$ where each data point $x \in X$ has two or more values (say $r(x)$ and $b(x)$). Also consider a function $Φ(A)$ defined on any subset $A \in (X,cA)$ on the sum of values in that range e.g., $r_A = \sum_{x \in A} r(x)$ and $b_A = \sum_{x \in A} b(x)$. The $Φ$-maximum range is $A^* = \arg \max_{A \in (X,cA)} Φ(A)$. Our goal is to find some $\hat{A}$ such tha… ▽ More

    Submitted 27 September, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

  50. arXiv:1804.11284  [pdf, other

    cs.CG cs.LG

    Simple Distances for Trajectories via Landmarks

    Authors: Jeff M. Phillips, Pingfan Tang

    Abstract: We develop a new class of distances for objects including lines, hyperplanes, and trajectories, based on the distance to a set of landmarks. These distances easily and interpretably map objects to a Euclidean space, are simple to compute, and perform well in data analysis tasks. For trajectories, they match and in some cases significantly out-perform all state-of-the-art other metrics, can effortl… ▽ More

    Submitted 11 June, 2019; v1 submitted 30 April, 2018; originally announced April 2018.

    ACM Class: I.3.5; G.3