Search | arXiv e-print repository

arXiv:2010.08676 [pdf, other]

Fast Spatial Autocorrelation

Authors: Anar Amgalan, Lilianne R. Mujica-Parodi, Steven S. Skiena

Abstract: Physical or geographic location proves to be an important feature in many data science models, because many diverse natural and social phenomena have a spatial component. Spatial autocorrelation measures the extent to which locally adjacent observations of the same phenomenon are correlated. Although statistics like Moran's $I$ and Geary's $C$ are widely used to measure spatial autocorrelation, they are slow: all popular methods run in $\Omega(n^2)$ time, rendering them unusable for large data sets, or long time-courses with moderate numbers of points. We propose a new $S_A$ statistic based on the notion that the variance observed when merging pairs of nearby clusters should increase slowly for spatially autocorrelated variables. We give a linear-time algorithm to calculate $S_A$ for a variable with an input agglomeration order (available at https://github.com/aamgalan/spatial_autocorrelation). For a typical dataset of $n \approx 63,000$ points, our $S_A$ autocorrelation measure can be computed in 1 second, versus 2 hours or more for Moran's $I$ and Geary's $C$. Through simulation studies, we demonstrate that $S_A$ identifies spatial correlations in variables generated with a spatially dependent model half an order of magnitude earlier than either Moran's $I$ or Geary's $C$. Finally, we prove several theoretical properties of $S_A$: namely that it behaves as a true correlation statistic, and is invariant under addition or multiplication by a constant.

Submitted 16 October, 2020; originally announced October 2020.

Comments: To be published in ICDM 2020
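
For context on the quadratic cost mentioned in the abstract above, here is a minimal sketch of the two classical statistics, Moran's $I$ and Geary's $C$, in plain Python. It is not the authors' $S_A$ algorithm, which is available at the repository linked in the abstract. The binary distance-threshold weights and the names `neighbor_weights`, `morans_i`, and `gearys_c` are illustrative assumptions, not taken from the paper. The loops over all pairs of points are what make this naive computation quadratic in $n$.

```python
from math import hypot


def neighbor_weights(points, radius):
    """Binary weights: w[i][j] = 1 if points i and j are within `radius` (assumed scheme)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist = hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
                if dist <= radius:
                    w[i][j] = 1.0
    return w


def morans_i(values, w):
    """Moran's I: (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_w = sum(sum(row) for row in w)
    # Quadratic step: every ordered pair (i, j) is visited.
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / total_w) * num / den


def gearys_c(values, w):
    """Geary's C: ((n - 1) / (2 W)) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    total_w = sum(sum(row) for row in w)
    # Quadratic step: every ordered pair (i, j) is visited.
    num = sum(w[i][j] * (values[i] - values[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2.0 * total_w)) * num / den


# Four collinear points; a smooth gradient versus an alternating pattern.
points = [(0, 0), (1, 0), (2, 0), (3, 0)]
w = neighbor_weights(points, radius=1.0)
smooth = [1.0, 2.0, 3.0, 4.0]
alternating = [1.0, 2.0, 1.0, 2.0]
print(morans_i(smooth, w), gearys_c(smooth, w))
print(morans_i(alternating, w), gearys_c(alternating, w))
```

On the smooth gradient, Moran's $I$ is positive and Geary's $C$ is below 1, indicating positive spatial autocorrelation; on the alternating pattern the signs of the deviations flip between neighbors, so $I$ is negative and $C$ exceeds 1.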

arXiv:1703.04746 [pdf, other]

Citation histories of papers: sometimes the rich get richer, sometimes they don't

Authors: Michael J. Hazoglu, Vivek Kulkarni, Steven S. Skiena, Ken A. Dill

Abstract: We describe a simple model of how a publication's citations change over time, based on pure-birth stochastic processes with a linear cumulative advantage effect. The model is applied to citation data from the Physical Review corpus provided by APS. Our model reveals that papers fall into three different clusters: papers that have rapid initial citations and ultimately high impact (fast-hi), fast to rise but quick to plateau (fast-flat), or late bloomers (slow-late), which may either never achieve many citations, or do so many years after publication. In "fast-hi" and "slow-late", there is a rich-get-richer effect: papers that have many citations accumulate additional citations more rapidly, while the "fast-flat" papers do not display this effect. We conclude by showing that only a few years of post-publication statistics are needed to identify high impact ("fast-hi") papers.

Submitted 14 March, 2017; originally announced March 2017.
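
To make the phrase "pure-birth stochastic process with a linear cumulative advantage effect" concrete, the sketch below simulates a generic process of that kind: a paper with $k$ citations gains its next one at rate `base_rate + advantage * k`. This is an illustration under assumed parameters, not the authors' fitted model; the function name `simulate_citations` and the parameters `base_rate`, `advantage`, and `horizon` are hypothetical.

```python
import random


def simulate_citations(base_rate, advantage, horizon, rng):
    """Return the citation arrival times up to `horizon` for one simulated paper.

    With k citations so far, the waiting time to the next citation is
    exponential with rate base_rate + advantage * k (assumed linear form).
    Requires base_rate > 0 so the first citation can occur.
    """
    times = []
    t = 0.0
    k = 0
    while True:
        rate = base_rate + advantage * k
        t += rng.expovariate(rate)  # exponential waiting time at the current rate
        if t > horizon:
            return times
        times.append(t)
        k += 1


def mean_count(base_rate, advantage, horizon, runs, seed):
    """Average number of citations at `horizon` over `runs` simulated papers."""
    rng = random.Random(seed)
    total = sum(len(simulate_citations(base_rate, advantage, horizon, rng)) for _ in range(runs))
    return total / runs


# Same base rate, with and without cumulative advantage.
print(mean_count(1.0, 0.0, 10.0, runs=200, seed=0))
print(mean_count(1.0, 0.3, 10.0, runs=200, seed=0))
```

Setting `advantage` to zero gives a constant-rate (Poisson) process with no rich-get-richer effect; a positive `advantage` makes already-cited papers accumulate citations faster, which is the qualitative behavior the abstract describes.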

arXiv:cs/0210024 [pdf, ps, other]

doi 10.1016/S0890-5401(03)00060-9

The Lazy Bureaucrat Scheduling Problem

Authors: Esther M. Arkin, Michael A. Bender, Joseph S. B. Mitchell, Steven S. Skiena

Abstract: We introduce a new class of scheduling problems in which the optimization is performed by the worker (single ``machine'') who performs the tasks. A typical worker's objective is to minimize the amount of work he does (he is ``lazy''), or more generally, to schedule as inefficiently (in some sense) as possible. The worker is subject to the constraint that he must be busy when there is work that he can do; we make this notion precise both in the preemptive and nonpreemptive settings. The resulting class of ``perverse'' scheduling problems, which we denote ``Lazy Bureaucrat Problems,'' gives rise to a rich set of new questions that explore the distinction between maximization and minimization in computing optimal schedules.

Submitted 26 October, 2002; originally announced October 2002.

Comments: 19 pages, 2 figures, LaTeX. To appear, Information and Computation

ACM Class: F.2.2; I.2.8

arXiv:cs/0011026 [pdf, ps, other]

When Can You Fold a Map?

Authors: Esther M. Arkin, Michael A. Bender, Erik D. Demaine, Martin L. Demaine, Joseph S. B. Mitchell, Saurabh Sethia, Steven S. Skiena

Abstract: We explore the following problem: given a collection of creases on a piece of paper, each assigned a folding direction of mountain or valley, is there a flat folding by a sequence of simple folds? There are several models of simple folds; the simplest one-layer simple fold rotates a portion of paper about a crease in the paper by $\pm 180$ degrees. We first consider the analogous questions in one dimension lower -- bending a segment into a flat object -- which lead to interesting problems on strings. We develop efficient algorithms for the recognition of simply foldable 1D crease patterns, and reconstruction of a sequence of simple folds. Indeed, we prove that a 1D crease pattern is flat-foldable by any means precisely if it is by a sequence of one-layer simple folds. Next we explore simple foldability in two dimensions, and find a surprising contrast: ``map'' folding and variants are polynomial, but slight generalizations are NP-complete. Specifically, we develop a linear-time algorithm for deciding foldability of an orthogonal crease pattern on a rectangular piece of paper, and prove that it is (weakly) NP-complete to decide foldability of (1) an orthogonal crease pattern on an orthogonal piece of paper, (2) a crease pattern of axis-parallel and diagonal (45-degree) creases on a square piece of paper, and (3) crease patterns without a mountain/valley assignment.

Submitted 30 August, 2003; v1 submitted 20 November, 2000; originally announced November 2000.

Comments: 24 pages, 19 figures. Version 3 includes several improvements thanks to referees, including formal definitions of simple folds, more figures, table summarizing results, new open problems, and additional references

ACM Class: F.2.2; G.2.1

Showing 1–4 of 4 results for author: Skiena, S S