Showing 1–50 of 55 results for author: Gray, G

  1. arXiv:2506.14095  [pdf, ps, other]

    cs.LG

    Transformers Learn Faster with Semantic Focus

    Authors: Parikshit Ram, Kenneth L. Clarkson, Tim Klinger, Shashanka Ubaru, Alexander G. Gray

    Abstract: Various forms of sparse attention have been explored to mitigate the quadratic computational and memory cost of the attention mechanism in transformers. We study sparse transformers not through a lens of efficiency but rather in terms of learnability and generalization. Empirically studying a range of attention mechanisms, we find that input-dependent sparse attention models appear to converge fas…

    Submitted 18 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  2. arXiv:2505.13738  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training

    Authors: Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness

    Abstract: Efficient LLM pre-training requires well-tuned hyperparameters (HPs), including learning rate η and weight decay λ. We study scaling laws for HPs: formulas for how to scale HPs as we scale model size N, dataset size D, and batch size B. Recent work suggests the AdamW timescale, B/(ηλD), should remain constant across training settings, and we verify the implication that optimal λ scales linearly wi…

    Submitted 19 May, 2025; originally announced May 2025.

  3. arXiv:2504.18704  [pdf, other]

    cs.PL cs.SE

    An Interactive Debugger for Rust Trait Errors

    Authors: Gavin Gray, Will Crichton, Shriram Krishnamurthi

    Abstract: Compiler diagnostics for type inference failures are notoriously bad, and type classes only make the problem worse. By introducing a complex search process during inference, type classes can lead to wholly inscrutable or useless errors. We describe a system, Argus, for interactively visualizing type class inferences to help programmers debug inference failures, applied specifically to Rust's trait…

    Submitted 25 April, 2025; originally announced April 2025.

  4. arXiv:2502.15938  [pdf, other]

    cs.LG cs.AI cs.CL cs.NE

    Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs

    Authors: Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness

    Abstract: LLMs are commonly trained with a learning rate (LR) warmup, followed by cosine decay to 10% of the maximum (10x decay). In a large-scale empirical study, we show that under an optimal peak LR, a simple linear decay-to-zero (D2Z) schedule consistently outperforms other schedules when training at compute-optimal dataset sizes. D2Z is superior across a range of model sizes, batch sizes, datasets, and…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: ICLR 2025

  5. arXiv:2502.12394  [pdf, ps, other]

    math.CO math.NT

    Fixed perimeter analogues of some partition results

    Authors: Gabriel Gray, Emily Payne, Holly Swisher, Ren Watson

    Abstract: Euler's partition identity states that the number of partitions of $n$ into odd parts is equal to the number of partitions of $n$ into distinct parts. Strikingly, Straub proved in 2016 that this identity also holds when counting partitions of any size with largest hook (perimeter) $n$. This has inspired further investigation of partition identities and inequalities in the fixed perimeter setting.…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 15 pages

    MSC Class: 05A17; 05A19; 11P81; 11P84

  6. arXiv:2411.00999  [pdf, other]

    cs.LG stat.ML

    Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

    Authors: Gavia Gray, Aman Tiwari, Shane Bergsma, Joel Hestness

    Abstract: Per-example gradient norms are a vital ingredient for estimating gradient noise scale (GNS) with minimal variance. Observing the tensor contractions required to compute them, we propose a method with minimal FLOPs in 3D or greater tensor regimes by simultaneously computing the norms while computing the parameter gradients. Using this method we are able to observe the GNS of different layers at hig…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: 23 pages, 16 figures, to be published in the proceedings of the 2024 Conference on Neural Information Processing Systems (NeurIPS), code is available at: https://github.com/CerebrasResearch/nanoGNS

    ACM Class: I.2.6

  7. arXiv:2411.00773  [pdf, other]

    cs.AI

    LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation

    Authors: Bowen Li, Zhaoyu Li, Qiwei Du, Jinqi Luo, Wenshan Wang, Yaqi Xie, Simon Stepputtis, Chen Wang, Katia P. Sycara, Pradeep Kumar Ravikumar, Alexander G. Gray, Xujie Si, Sebastian Scherer

    Abstract: Recent years have witnessed the rapid development of Neuro-Symbolic (NeSy) AI systems, which integrate symbolic reasoning into deep neural networks. However, most of the existing benchmarks for NeSy AI fail to provide long-horizon reasoning tasks with complex multi-agent interactions. Furthermore, they are usually constrained by fixed and simplistic logical rules over limited entities, making them…

    Submitted 3 April, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

    Comments: 25 pages, 8 figures, In Advances in Neural Information Processing Systems (NeurIPS) 37 D&B Track (2024): 69840-69864

    Journal ref: Advances in Neural Information Processing Systems, 37, 69840-69864 (2024)

  8. arXiv:2410.17378  [pdf, ps, other]

    math.NT math.CO

    A generalization of Franklin's partition identity and a Beck-type companion identity

    Authors: Gabriel Gray, David Hovey, Brandt Kronholm, Emily Payne, Holly Swisher, Ren Watson

    Abstract: Euler's classic partition identity states that the number of partitions of $n$ into odd parts equals the number of partitions of $n$ into distinct parts. We develop a new generalization of this identity, which yields a previous generalization of Franklin as a special case, and prove an accompanying Beck-type companion identity.

    Submitted 22 October, 2024; originally announced October 2024.

  9. arXiv:2405.02350  [pdf, ps, other]

    cs.LG cs.AI

    What makes Models Compositional? A Theoretical View: With Supplement

    Authors: Parikshit Ram, Tim Klinger, Alexander G. Gray

    Abstract: Compositionality is thought to be a key component of language, and various compositional benchmarks have been developed to empirically probe the compositional generalization of existing sequence processing models. These benchmarks often highlight failures of existing models, but it is not clear why these models fail in this way. In this paper, we seek to theoretically understand the role the compo…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Extended version of the original IJCAI 2024 paper with detailed supplementary materials (27 pages, 7 figures)

  10. arXiv:2309.05137  [pdf, other]

    cs.PL cs.HC

    Debugging Trait Errors as Logic Programs

    Authors: Gavin Gray, Will Crichton

    Abstract: Rust uses traits to define units of shared behavior. Trait constraints build up an implicit set of first-order hereditary Harrop clauses which is executed by a powerful logic programming engine in the trait system. But that power comes at a cost: the number of traits in Rust libraries is increasing, which puts a growing burden on the trait system to help programmers diagnose errors. Beyond a certa…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: 9 pages, 2 figures

  11. arXiv:2309.04134  [pdf, other]

    cs.PL cs.HC

    A Grounded Conceptual Model for Ownership Types in Rust

    Authors: Will Crichton, Gavin Gray, Shriram Krishnamurthi

    Abstract: Programmers learning Rust struggle to understand ownership types, Rust's core mechanism for ensuring memory safety without garbage collection. This paper describes our attempt to systematically design a pedagogy for ownership types. First, we studied Rust developers' misconceptions of ownership to create the Ownership Inventory, a new instrument for measuring a person's knowledge of ownership. We…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: Published at OOPSLA 2023

  12. arXiv:2303.01466  [pdf]

    q-bio.TO

    Investigating biomechanical determinants of endothelial permeability in a modified hollow fibre bioreactor

    Authors: Stephen G Gray, Peter D Weinberg

    Abstract: Effects of mechanical stress on the permeability of vascular endothelium are important to normal physiology and may be critical in the development of atherosclerosis, where they can account for the patchy arterial distribution of the disease. Such properties are frequently investigated in vitro. Here we evaluate and use the hollow fibre bioreactor for this purpose; in this system, endothelial cell…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: 33 pages, 13 figures

  13. arXiv:2301.05131  [pdf, other]

    cs.LG

    Toward Theoretical Guidance for Two Common Questions in Practical Cross-Validation based Hyperparameter Selection

    Authors: Parikshit Ram, Alexander G. Gray, Horst C. Samulowitz, Gregory Bramble

    Abstract: We show, to our knowledge, the first theoretical treatments of two common questions in cross-validation based hyperparameter selection: (1) After selecting the best hyperparameter using a held-out set, we train the final model using {\em all} of the training data -- since this may or may not improve future generalization error, should one do this? (2) During optimization such as via SGD (stochasti…

    Submitted 12 January, 2023; originally announced January 2023.

    Comments: Extended version of the paper appearing at the SIAM International Conference on Data Mining 2023 (SDM23)

  14. Deformation and dislocation evolution in body-centered-cubic single- and polycrystal tantalum

    Authors: Seunghyeon Lee, Hansohl Cho, Curt A. Bronkhorst, Reeju Pokharel, Donald W. Brown, Bjørn Clausen, Sven C. Vogel, Veronica Anghel, George T. Gray III, Jason R. Mayeur

    Abstract: A physically-informed continuum crystal plasticity model is presented to elucidate the deformation mechanisms and dislocation evolution in body-centered-cubic (bcc) tantalum widely used as a key structural material for mechanical and thermal extremes. We show our unified structural modeling framework informed by mesoscopic dislocation dynamics simulations is capable of capturing salient features o…

    Submitted 30 September, 2021; originally announced September 2021.

    Journal ref: Int. J. Plast., 163 (2023), Article 103529

  15. arXiv:2109.03358  [pdf, other]

    physics.chem-ph physics.bio-ph physics.comp-ph

    Integral equation models for solvent in macromolecular crystals

    Authors: Jonathon G. Gray, George M. Giambaşu, David A. Case, Tyler Luchko

    Abstract: Solvent can occupy up to ~70% of macromolecular crystals, and hence models that predict solvent distributions in periodic systems could improve the interpretation of crystallographic data. Yet there are few implicit solvent models applicable to periodic solutes, while crystallographic structures are commonly solved assuming a flat solvent model. Here we present a newly-developed periodic v…

    Submitted 7 September, 2021; originally announced September 2021.

    Comments: 13 pages, 6 figures, 5 tables

  16. arXiv:2106.13367  [pdf, other]

    cs.AI cs.DB

    SeaNet -- Towards A Knowledge Graph Based Autonomic Management of Software Defined Networks

    Authors: Qianru Zhou, Alasdair J. G. Gray, Stephen McLaughlin

    Abstract: Automatic network management driven by Artificial Intelligence technologies has been hotly discussed for decades. However, current reports mainly focus on theoretical proposals and architecture designs; work on practical implementations on real-life networks is yet to appear. This paper presents our effort toward the implementation of a knowledge graph driven approach for autonomic network manage…

    Submitted 27 May, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

  17. SARA -- A Semantic Access Point Resource Allocation Service for Heterogenous Wireless Networks

    Authors: Qianru Zhou, Alasdair J. G. Gray, Dimitrios Pezaros, Stephen McLaughlin

    Abstract: In this paper, we present SARA, a Semantic Access point Resource Allocation service for heterogenous wireless networks with various wireless access technologies existing together. By automatically reasoning on the knowledge base of the full system provided by a knowledge based autonomic network management system -- SEANET, SARA selects the access point providing the best quality of service among t…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: 2019 IEEE Wireless Day

  18. arXiv:2006.09635  [pdf, other]

    cs.LG math.OC stat.ML

    Solving Constrained CASH Problems with ADMM

    Authors: Parikshit Ram, Sijia Liu, Deepak Vijaykeerthi, Dakuo Wang, Djallel Bouneffouf, Greg Bramble, Horst Samulowitz, Alexander G. Gray

    Abstract: The CASH problem has been widely studied in the context of automated configurations of machine learning (ML) pipelines and various solvers and toolkits are available. However, CASH solvers do not directly handle black-box constraints such as fairness, robustness or other domain-specific custom constraints. We present our recent approach [Liu, et al., 2020] that leverages the ADMM optimization fram…

    Submitted 10 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 7th ICML Workshop on Automated Machine Learning (2020)

  19. arXiv:1908.05086  [pdf]

    q-bio.NC

    Improved Hodgkin & Huxley-type model for action potentials in squid

    Authors: P. J. Stiles, C. G. Gray

    Abstract: By extending the crude Goldman-Hodgkin-Katz electrodiffusion model for resting-state membrane potentials in perfused giant axons of squid, we reformulate the Hodgkin-Huxley (HH) phenomenological quantitative model to create a new model which is simpler and based more fundamentally on electrodiffusion principles. Our dynamical system, like that of HH, behaves as a 4-dimensional resonator exhibiting…

    Submitted 5 February, 2020; v1 submitted 14 August, 2019; originally announced August 2019.

    Comments: 29 pages, 7 figures. This replacement version contains a new section on periodic action potentials in low external calcium ion environments. It also includes expanded discussions of temperature dependences and oscillator behaviors of membrane potentials.

    MSC Class: 34A34; 35C07

  20. arXiv:1906.04113  [pdf, other]

    cs.LG stat.ML

    BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget

    Authors: Jack Turner, Elliot J. Crowley, Michael O'Boyle, Amos Storkey, Gavin Gray

    Abstract: The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a large network with cheap alternative blocks. However, not all blocks are created equally; for a required compute budget there may exist a potent combination of many different cheap blocks, though exhaustiv…

    Submitted 23 January, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: ICLR 2020

  21. arXiv:1906.00859  [pdf, other]

    stat.ML cs.LG

    Separable Layers Enable Structured Efficient Linear Substitutions

    Authors: Gavin Gray, Elliot J. Crowley, Amos Storkey

    Abstract: In response to the development of recent efficient dense layers, this paper shows that something as simple as replacing linear components in pointwise convolutions with structured linear decompositions also produces substantial gains in the efficiency/accuracy tradeoff. Pointwise convolutions are fully connected layers and are thus prepared for replacement by structured transforms. Networks using…

    Submitted 3 June, 2019; originally announced June 2019.

  22. arXiv:1903.05372  [pdf, other]

    cs.CY cs.NI

    Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams

    Authors: Qianru Zhou, Stephen McLaughlin, Alasdair J. G. Gray, Shangbin Wu, Chengxiang Wang

    Abstract: Early detection of significant traumatic events, e.g. a terrorist attack or a ship capsizing, is important to ensure that a prompt emergency response can occur. In the modern world telecommunication systems could play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access to the networks. In this paper a methodology is ill…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: 15 pages, 4 figures, WSP ISWC 2017 conference

    Journal ref: ISWC WSP 2017, pp. 33--47

  23. arXiv:1805.11032  [pdf, other]

    physics.flu-dyn cond-mat.soft

    A geometric state function for two-fluid flow in porous media

    Authors: James E. McClure, Ryan T. Armstrong, Mark A. Berrill, Steffen Schlüter, Steffen Berg, William G. Gray, Cass T. Miller

    Abstract: Models that describe two-fluid flow in porous media suffer from a widely-recognized problem that the constitutive relationships used to predict capillary pressure as a function of the fluid saturation are non-unique, thus requiring a hysteretic description. As an alternative to the traditional perspective, we consider a geometrical description of the capillary pressure, which relates the average…

    Submitted 24 May, 2018; originally announced May 2018.

    Journal ref: Phys. Rev. Fluids 3, 084306 (2018)

  24. arXiv:1803.02507  [pdf]

    physics.chem-ph cond-mat.soft

    Nonlinear Electrostatics. The Poisson-Boltzmann Equation

    Authors: C. G. Gray, P. J. Stiles

    Abstract: The description of a conducting medium in thermal equilibrium, such as an electrolyte solution or a plasma, involves nonlinear electrostatics, a subject rarely discussed in the standard electricity and magnetism textbooks. We consider in detail the case of the electrostatic double layer formed by an electrolyte solution near a uniformly charged wall, and we use mean-field or Poisson-Boltzmann (PB)…

    Submitted 6 March, 2018; originally announced March 2018.

    Comments: 70 pages, 4 figures, 3 appendices

  25. arXiv:1711.02613  [pdf, other]

    stat.ML cs.CV cs.LG

    Moonshine: Distilling with Cheap Convolutions

    Authors: Elliot J. Crowley, Gavin Gray, Amos Storkey

    Abstract: Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture…

    Submitted 17 January, 2019; v1 submitted 7 November, 2017; originally announced November 2017.

    Comments: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018)

  26. arXiv:1607.07430  [pdf, other]

    physics.comp-ph cond-mat.stat-mech physics.bio-ph

    Stiff-spring approximation revisited: inertial effects in non-equilibrium trajectories

    Authors: Mostafa Nategholeslam, C. G. Gray, Bruno Tomberli

    Abstract: Use of harmonic guiding potentials is the most common method for implementing steered molecular dynamics (SMD) simulations, performed to obtain potentials of mean force (PMFs) of molecular systems using non-equilibrium work (NEW) theorems. Harmonic guiding potentials are also the natural choice in single molecule force spectroscopy experiments. The stiff spring approximation (SSA) of Schulten and…

    Submitted 25 July, 2016; originally announced July 2016.

  27. arXiv:1406.1523  [pdf, other]

    physics.chem-ph

    McMillan-Mayer Theory of Solutions Revisited: Simplifications and Extensions

    Authors: Shaghayegh Vafaei, Bruno Tomberli, C. G. Gray

    Abstract: McMillan and Mayer (MM) proved two remarkable theorems in their paper on the equilibrium statistical mechanics of liquid solutions. They first showed that the grand canonical partition function for a solution can be reduced to one with an effectively solute-only form, by integrating out the solvent degrees of freedom. The total effective solute potential in the effective solute grand partition f…

    Submitted 18 June, 2014; v1 submitted 5 June, 2014; originally announced June 2014.

    Comments: pdftex, 32 pages, 2 figures. Thermodynamic errata in Section IV.C and Appendix B are corrected. Figure 2 is recalculated at a new temperature to correspond with more recent experimental data

  28. arXiv:1403.4890  [pdf, other]

    stat.ME stat.CO

    Modeling an Augmented Lagrangian for Blackbox Constrained Optimization

    Authors: Robert B. Gramacy, Genetha A. Gray, Sebastien Le Digabel, Herbert K. H. Lee, Pritam Ranjan, Garth Wells, Stefan M. Wild

    Abstract: Constrained blackbox optimization is a difficult problem, with most approaches coming from the mathematical programming literature. The statistical literature is sparse, especially in addressing problems with nontrivial constraints. This situation is unfortunate because statistical methods have many attractive properties: global scope, handling noisy objectives, sensitivity analysis, and so forth.…

    Submitted 3 March, 2015; v1 submitted 19 March, 2014; originally announced March 2014.

    Comments: 22 Pages, 2 additional supplementary, 5 figures

  29. arXiv:1309.6830  [pdf]

    cs.LG stat.ML

    Building Bridges: Viewing Active Learning from the Multi-Armed Bandit Lens

    Authors: Ravi Ganti, Alexander G. Gray

    Abstract: In this paper we propose a multi-armed bandit inspired, pool based active learning algorithm for the problem of binary classification. By carefully constructing an analogy between active learning and multi-armed bandits, we utilize ideas such as lower confidence bounds, and self-concordant regularization from the multi-armed bandit literature to design our proposed algorithm. Our algorithm is a se…

    Submitted 26 September, 2013; originally announced September 2013.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-232-241

  30. PAV ontology: Provenance, Authoring and Versioning

    Authors: Paolo Ciccarese, Stian Soiland-Reyes, Khalid Belhajjame, Alasdair J G Gray, Carole Goble, Tim Clark

    Abstract: Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as DC Terms and the W3C PROV-O are domain-independent and general-purpose and they allow and encourage for extensions to…

    Submitted 6 December, 2013; v1 submitted 26 April, 2013; originally announced April 2013.

    Comments: 22 pages (incl 5 tables and 19 figures). Submitted to Journal of Biomedical Semantics 2013-04-26 (#1858276535979415). Revised article submitted 2013-08-30. Second revised article submitted 2013-10-06. Accepted 2013-10-07. Author proofs sent 2013-10-09 and 2013-10-16. Published 2013-11-22. Final version 2013-12-06. http://www.jbiomedsem.com/content/4/1/37

    Report number: University of Manchester eScholar: uk-ac-man-scw:193385

    ACM Class: I.2.4; H.2.1; H.3.7; I.7.4

    Journal ref: Journal of Biomedical Semantics 2013, 4:37

  31. arXiv:1304.4327  [pdf, ps, other]

    cs.DS

    Tree-Independent Dual-Tree Algorithms

    Authors: Ryan R. Curtin, William B. March, Parikshit Ram, David V. Anderson, Alexander G. Gray, Charles L. Isbell Jr

    Abstract: Dual-tree algorithms are a widely used class of branch-and-bound algorithms. Unfortunately, developing dual-tree algorithms for use with different trees and problems is often complex and burdensome. We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule. We provide a meta-algorithm which allows development of dual-tree algorithms in a tr…

    Submitted 16 April, 2013; originally announced April 2013.

    Comments: accepted in ICML 2013

  32. arXiv:1303.0393  [pdf]

    cond-mat.mtrl-sci

    The effect of shock-wave profile on dynamic brittle failure

    Authors: J. Pablo Escobedo, Eric N. Brown, Carl P. Trujillo, Ellen K. Cerreta, George T. Gray III

    Abstract: The influence of shock-wave-loading profile on the failure processes in a brittle material has been investigated. Tungsten heavy alloy (WHA) specimens have been subjected to two shock-wave loading profiles with a similar peak stress of 15.4 GPa but different pulse durations. Contrary to the strong dependence of strength on wave profile observed in ductile metals, for WHA, specimens subjected to di…

    Submitted 2 March, 2013; originally announced March 2013.

  33. arXiv:1210.6293  [pdf, ps, other]

    cs.MS cs.CV cs.LG

    MLPACK: A Scalable C++ Machine Learning Library

    Authors: Ryan R. Curtin, James R. Cline, N. P. Slagle, William B. March, Parikshit Ram, Nishant A. Mehta, Alexander G. Gray

    Abstract: MLPACK is a state-of-the-art, scalable, multi-platform C++ machine learning library released in late 2011 offering both a simple, consistent API accessible to novice users and high performance and flexibility to expert users by leveraging modern features of C++. MLPACK provides cutting-edge algorithms whose benchmarks exhibit far better performance than other leading machine learning libraries. ML…

    Submitted 23 October, 2012; originally announced October 2012.

    Comments: Submitted to JMLR MLOSS (http://jmlr.csail.mit.edu/mloss/)

    Journal ref: Journal of Machine Learning Research 14 (2013) 801-805

  34. arXiv:1210.6287  [pdf, ps, other]

    cs.DS cs.IR cs.LG

    Fast Exact Max-Kernel Search

    Authors: Ryan R. Curtin, Parikshit Ram, Alexander G. Gray

    Abstract: The wide applicability of kernels makes the problem of max-kernel search ubiquitous and more general than the usual similarity search in metric spaces. We focus on solving this problem efficiently. We begin by characterizing the inherent hardness of the max-kernel search problem with a novel notion of directional concentration. Following that, we present a method to use an $O(n \log n)$ algorithm…

    Submitted 26 October, 2012; v1 submitted 23 October, 2012; originally announced October 2012.

    Comments: Under submission in SIAM Data Mining conference

  35. arXiv:1209.2784  [pdf, other]

    cs.LG stat.ML

    Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL

    Authors: Nishant A. Mehta, Dongryeol Lee, Alexander G. Gray

    Abstract: Since its inception, the modus operandi of multi-task learning (MTL) has been to minimize the task-wise mean of the empirical risks. We introduce a generalized loss-compositional paradigm for MTL that includes a spectrum of formulations as a subfamily. One endpoint of this spectrum is minimax MTL: a new MTL formulation that minimizes the maximum of the tasks' empirical risks. Via a certain relaxat…

    Submitted 13 September, 2012; originally announced September 2012.

    Comments: appearing at NIPS 2012

  36. arXiv:1206.6857  [pdf]

    cs.LG math.NA stat.ML

    Faster Gaussian Summation: Theory and Experiment

    Authors: Dongryeol Lee, Alexander G. Gray

    Abstract: We provide faster algorithms for the problem of Gaussian summation, which occurs in many machine learning methods. We develop two new extensions - an O(Dp) Taylor expansion for the Gaussian kernel with rigorous error bounds and a new error control scheme integrating any arbitrary approximation method - within the best discrete algorithmic framework using adaptive hierarchical data structures. We ri…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

    Report number: UAI-P-2006-PG-281-288

  37. arXiv:1206.5278  [pdf]

    stat.ME cs.LG stat.ML

    Fast Nonparametric Conditional Density Estimation

    Authors: Michael P. Holmes, Alexander G. Gray, Charles Lee Isbell

    Abstract: Conditional density estimation generalizes regression by modeling a full density f(y|x) rather than only the expected value E(y|x). This is important for many tasks, including handling multi-modality and generating prediction intervals. Though fundamental and widely applicable, nonparametric conditional density estimators have received relatively little attention from statisticians and little or n…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

    Report number: UAI-P-2007-PG-175-182

  38. arXiv:1202.6101  [pdf, ps, other]

    cs.CG cs.DS cs.IR

    Maximum Inner-Product Search using Tree Data-structures

    Authors: Parikshit Ram, Alexander G. Gray

    Abstract: The problem of {\em efficiently} finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied in literature. However, a closely related problem of efficiently finding the best match with respect to the inner product has never been explored in the general setting to the best of our knowledge. In this paper we consider…

    Submitted 27 February, 2012; originally announced February 2012.

    Comments: Under submission in KDD 2012

  39. arXiv:1202.4050  [pdf, other]

    cs.LG stat.ML

    On the Sample Complexity of Predictive Sparse Coding

    Authors: Nishant A. Mehta, Alexander G. Gray

    Abstract: The goal of predictive sparse coding is to learn a representation of examples as sparse linear combinations of elements from a dictionary, such that a learned hypothesis linear in the new representation performs well on a predictive task. Predictive sparse coding algorithms recently have demonstrated impressive performance on a variety of supervised tasks, but their generalization properties have…

    Submitted 7 October, 2012; v1 submitted 17 February, 2012; originally announced February 2012.

    Comments: Sparse Coding Stability Theorem from version 1 has been relaxed considerably using a new notion of coding margin. Old Sparse Coding Stability Theorem still in new version, now as Theorem 2. Presentation of all proofs simplified/improved considerably. Paper reorganized. Empirical analysis showing new coding margin is non-trivial on real datasets

  40. IVOA Recommendation: Vocabularies in the Virtual Observatory Version 1.19

    Authors: Sebastien Derriere, Alasdair J G Gray, Norman Gray, Frederic V Hessman, Tony Linde, Andrea Preite Martinez, Rob Seaman, Brian Thomas

    Abstract: This document specifies a standard format for vocabularies based on the W3C's Resource Description Framework (RDF) and Simple Knowledge Organization System (SKOS). By adopting a standard and simple format, the IVOA will permit different groups to create and maintain their own specialised vocabularies while letting the rest of the astronomical community access, use, and combine them. The use of cur…

    Submitted 3 October, 2011; originally announced October 2011.

    Report number: Vocabularies-20091007

  41. arXiv:1105.2769  [pdf, ps, other]

    physics.comp-ph cs.DS

    Multibody Multipole Methods

    Authors: Dongryeol Lee, Arkadas Ozakin, Alexander G. Gray

    Abstract: A three-body potential function can account for interactions among triples of particles which are uncaptured by pairwise interaction functions such as Coulombic or Lennard-Jones potentials. Likewise, a multibody potential of order $n$ can account for interactions among $n$-tuples of particles uncaptured by interaction functions of lower orders. To date, the computation of multibody potential funct…

    Submitted 30 June, 2012; v1 submitted 13 May, 2011; originally announced May 2011.

    Comments: To appear in Journal of Computational Physics

    MSC Class: 68U01 ACM Class: J.2

  42. arXiv:1102.2878  [pdf, ps, other]

    stat.CO cs.DS stat.ML

    Dual-Tree Fast Gauss Transforms

    Authors: Dongryeol Lee, Alexander G. Gray, Andrew W. Moore

    Abstract: Kernel density estimation (KDE) is a popular statistical technique for estimating the underlying density distribution with minimal assumptions. Although they can be shown to achieve asymptotic estimation optimality for any input distribution, cross-validating for an optimal parameter requires significant computation dominated by kernel summations. In this paper we present an improvement to the dua…

    Submitted 14 February, 2011; originally announced February 2011.

    Comments: Extended version of a conference paper. Submitted to a journal

  43. The magnetic fields of forming solar-like stars

    Authors: S. G. Gregory, M. Jardine, C. G. Gray, J. -F. Donati

    Abstract: Magnetic fields play a crucial role at all stages of the formation of low mass stars and planetary systems. In the final stages, in particular, they control the kinematics of in-falling gas from circumstellar discs, and the launching and collimation of spectacular outflows. The magnetic coupling with the disc is thought to influence the rotational evolution of the star, while magnetised stellar wi…

    Submitted 11 August, 2010; originally announced August 2010.

    Comments: 55 pages, review article accepted for publication in Reports on Progress in Physics. Astro-ph version includes additional appendices

  44. arXiv:1005.0188  [pdf, other]

    cs.LG stat.ML

    Generative and Latent Mean Map Kernels

    Authors: Nishant A. Mehta, Alexander G. Gray

    Abstract: We introduce two kernels that extend the mean map, which embeds probability measures in Hilbert spaces. The generative mean map kernel (GMMK) is a smooth similarity measure between probabilistic models. The latent mean map kernel (LMMK) generalizes the non-iid formulation of Hilbert space embeddings of empirical distributions in order to incorporate latent variable models. When comparing certain c…

    Submitted 3 May, 2010; originally announced May 2010.

    Comments: 16 pages, 1 figure, 1 table

  45. Sequential category aggregation and partitioning approaches for multi-way contingency tables based on survey and census data

    Authors: L. Fraser Jackson, Alistair G. Gray, Stephen E. Fienberg

    Abstract: Large contingency tables arise in many contexts but especially in the collection of survey and census data by government statistical agencies. Because the vast majority of the variables in this context have a large number of categories, agencies and users need a systematic way of constructing tables which are summaries of such contingency tables. We propose such an approach in this paper by find…

    Submitted 11 November, 2008; originally announced November 2008.

    Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/08-AOAS175 by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS175

    Journal ref: Annals of Applied Statistics 2008, Vol. 2, No. 3, 955-981

  46. arXiv:0810.4611  [pdf, ps, other]

    cs.LG

    Learning Isometric Separation Maps

    Authors: Nikolaos Vasiloglou, Alexander G. Gray, David V. Anderson

    Abstract: Maximum Variance Unfolding (MVU) and its variants have been very successful in embedding data-manifolds in lower dimensional spaces, often revealing the true intrinsic dimension. In this paper we show how to also incorporate supervised class information into an MVU-like method without breaking its convexity. We call this method the Isometric Separation Map and we show that the resulting kernel m…

    Submitted 15 April, 2009; v1 submitted 25 October, 2008; originally announced October 2008.

    Comments: Submitted to the NIPS workshop on Kernel Learning: Automatic Selection of Kernels, and now presented at MLSP 2009

  47. Eight-Dimensional Mid-Infrared/Optical Bayesian Quasar Selection

    Authors: Gordon T. Richards, Rajesh P. Deo, Mark Lacy, Adam D. Myers, Robert C. Nichol, Nadia L. Zakamska, Robert J. Brunner, W. N. Brandt, Alexander G. Gray, John K. Parejko, Andrew Ptak, Donald P. Schneider, Lisa J. Storrie-Lombardi, Alexander S. Szalay

    Abstract: We explore the multidimensional, multiwavelength selection of quasars from mid-IR (MIR) plus optical data, specifically from Spitzer-IRAC and the Sloan Digital Sky Survey (SDSS). We apply modern statistical techniques to combined Spitzer MIR and SDSS optical data, allowing up to 8-D color selection of quasars. Using a Bayesian selection method, we catalog 5546 quasar candidates to an 8.0 um dept…

    Submitted 25 February, 2009; v1 submitted 20 October, 2008; originally announced October 2008.

    Comments: 49 pages, 14 figures, 7 tables. AJ, accepted

    Journal ref: Astron.J.137:3884,2009

  48. arXiv:0810.2311  [pdf, ps, other]

    cs.AI cs.CV

    Non-Negative Matrix Factorization, Convexity and Isometry

    Authors: Nikolaos Vasiloglou, Alexander G. Gray, David V. Anderson

    Abstract: In this paper we explore avenues for improving the reliability of dimensionality reduction methods such as Non-Negative Matrix Factorization (NMF) as interpretive exploratory data analysis tools. We first explore the difficulties of the optimization problem underlying NMF, showing for the first time that non-trivial NMF solutions always exist and that the optimization problem is actually convex,…

    Submitted 22 April, 2009; v1 submitted 13 October, 2008; originally announced October 2008.

    Comments: Accepted in SIAM Data Mining 2009, 12 pages

  49. Efficient Photometric Selection of Quasars from the Sloan Digital Sky Survey: II. ~1,000,000 Quasars from Data Release Six

    Authors: Gordon T. Richards, Adam D. Myers, Alexander G. Gray, Ryan N. Riegel, Robert C. Nichol, Robert J. Brunner, Alexander S. Szalay, Donald P. Schneider, Scott F. Anderson

    Abstract: We present a catalog of 1,172,157 quasar candidates selected from the photometric imaging data of the Sloan Digital Sky Survey (SDSS). The objects are all point sources to a limiting magnitude of i=21.3 from 8417 sq. deg. of imaging from SDSS Data Release 6 (DR6). This sample extends our previous catalog by using the latest SDSS public release data and probing both UV-excess and high-redshift qu…

    Submitted 23 September, 2008; originally announced September 2008.

    Comments: 54 pages, 19 figures, 4 tables. ApJS in press

    Journal ref: Astrophys.J.Suppl.180:67-83,2009

  50. arXiv:0710.1066  [pdf, ps, other]

    cond-mat.stat-mech

    Detailed Examination of Transport Coefficients in Cubic-Plus-Quartic Oscillator Chains

    Authors: G. R. Lee-Dadswell, B. G. Nickel, C. G. Gray

    Abstract: We examine the thermal conductivity and bulk viscosity of a one-dimensional (1D) chain of particles with cubic-plus-quartic interparticle potentials and no on-site potentials. This system is equivalent to the FPU-alpha beta system in a subset of its parameter space. We identify three distinct frequency regimes which we call the hydrodynamic regime, the perturbative regime and the collisionless r…

    Submitted 4 October, 2007; originally announced October 2007.

    Comments: LaTeX with references in .bib file. 36 pages, 8 figures. Submitted to J. Stat. Phys. on Sept. 27