-
CompLex: legal systems through the lens of complexity science
Authors:
Pierpaolo Vivo,
Daniel M. Katz,
J. B. Ruhl
Abstract:
While "complexity science" has achieved significant successes in several interdisciplinary fields such as economics and biology, it is only a very recent observation that legal systems -- from the way legal texts are drafted and connected to the rest of the corpus, up to the level of how judges and courts reach decisions under a variety of conflicting inputs -- share several features with standard Complex Adaptive Systems. This review is meant as a gentle introduction to the use of quantitative tools and techniques of complexity science to describe, analyse, and tame the complex web of human interactions that the Law is supposed to regulate. We offer an overview of the main directions of research undertaken so far as well as an outlook for future research, and we argue that statistical physicists and complexity scientists should not ignore the opportunities offered by the cross-fertilisation between legal scholarship and complex-systems modelling.
Submitted 2 October, 2024;
originally announced October 2024.
-
Measuring Law Over Time: A Network Analytical Framework with an Application to Statutes and Regulations in the United States and Germany
Authors:
Corinna Coupette,
Janis Beckedorf,
Dirk Hartung,
Michael Bommarito,
Daniel Martin Katz
Abstract:
How do complex social systems evolve in the modern world? This question lies at the heart of social physics, and network analysis has proven critical in providing answers to it. In recent years, network analysis has also been used to gain a quantitative understanding of law as a complex adaptive system, but most research has focused on legal documents of a single type, and there exists no unified framework for quantitative legal document analysis using network analytical tools. Against this background, we present a comprehensive framework for analyzing legal documents as multi-dimensional, dynamic document networks. We demonstrate the utility of this framework by applying it to an original dataset of statutes and regulations from two different countries, the United States and Germany, spanning more than twenty years (1998-2019). Our framework provides tools for assessing the size and connectivity of the legal system as viewed through the lens of specific document collections as well as for tracking the evolution of individual legal documents over time. Implementing the framework for our dataset, we find that at the federal level, the United States legal system is increasingly dominated by regulations, whereas the German legal system remains governed by statutes. This holds regardless of whether we measure the systems at the macro, the meso, or the micro level.
Submitted 5 April, 2021; v1 submitted 27 January, 2021;
originally announced January 2021.
-
Complex Societies and the Growth of the Law
Authors:
Daniel Martin Katz,
Corinna Coupette,
Janis Beckedorf,
Dirk Hartung
Abstract:
While a large number of informal factors influence how people interact, modern societies rely upon law as a primary mechanism to formally control human behaviour. How legal rules impact societal development depends on the interplay between two types of actors: the people who create the rules and the people to which the rules potentially apply. We hypothesise that an increasingly diverse and interconnected society might create increasingly diverse and interconnected rules, and assert that legal networks provide a useful lens through which to observe the interaction between law and society. To evaluate these propositions, we present a novel and generalizable model of statutory materials as multidimensional, time-evolving document networks. Applying this model to the federal legislation of the United States and Germany, we find impressive expansion in the size and complexity of laws over the past two and a half decades. We investigate the sources of this development using methods from network science and natural language processing. To allow for cross-country comparisons over time, we algorithmically reorganise the legislative materials of the United States and Germany into cluster families that reflect legal topics. This reorganisation reveals that the main driver behind the growth of the law in both jurisdictions is the expansion of the welfare state, backed by an expansion of the tax state.
Submitted 6 August, 2020; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Sensitivity of collective outcomes identifies pivotal components
Authors:
Edward D. Lee,
Daniel M. Katz,
Michael J. Bommarito II,
Paul Ginsparg
Abstract:
A social system is susceptible to perturbation when its collective properties depend sensitively on a few pivotal components. Using the information geometry of minimal models from statistical physics, we develop an approach to identify pivotal components to which coarse-grained, or aggregate, properties are sensitive. As an example, we introduce our approach on a reduced toy model with a median voter who always votes in the majority. The sensitivity of majority-minority divisions to changing voter behaviour pinpoints the unique role of the median. More generally, the sensitivity identifies pivotal components that precisely determine collective outcomes generated by a complex network of interactions. Using perturbations to target pivotal components in the models, we analyse datasets from political voting, finance and Twitter. Across these systems, we find remarkable variety, from systems dominated by a median-like component to those whose components behave more equally. In the context of political institutions such as courts or legislatures, our methodology can help describe how changes in voters map to new collective voting outcomes. For economic indices, differing system response reflects varying fiscal conditions across time. Thus, our information-geometric approach provides a principled, quantitative framework that may help assess the robustness of collective outcomes to targeted perturbation and compare social institutions, or even biological networks, with one another and across time.
Submitted 2 July, 2020; v1 submitted 22 September, 2019;
originally announced September 2019.
-
Crowdsourcing accurately and robustly predicts Supreme Court decisions
Authors:
Daniel Martin Katz,
Michael James Bommarito II,
Josh Blackman
Abstract:
Scholars have increasingly investigated "crowdsourcing" as an alternative to expert-based judgment or purely data-driven approaches to predicting the future. Under certain conditions, scholars have found that crowdsourcing can outperform these other approaches. However, despite interest in the topic and a series of successful use cases, relatively few studies have applied empirical model thinking to evaluate the accuracy and robustness of crowdsourcing in real-world contexts. In this paper, we offer three novel contributions. First, we explore a dataset of over 600,000 predictions from over 7,000 participants in a multi-year tournament to predict the decisions of the Supreme Court of the United States. Second, we develop a comprehensive crowd construction framework that allows for the formal description and application of crowdsourcing to real-world data. Third, we apply this framework to our data to construct more than 275,000 crowd models. We find that in out-of-sample historical simulations, crowdsourcing robustly outperforms the commonly-accepted null model, yielding the highest-known performance for this context at 80.8% case level accuracy. To our knowledge, this dataset and analysis represent one of the largest explorations of recurring human prediction to date, and our results provide additional empirical support for the use of crowdsourcing as a prediction method.
Submitted 11 December, 2017;
originally announced December 2017.
-
Measuring the temperature and diversity of the U.S. regulatory ecosystem
Authors:
Michael J Bommarito II,
Daniel Martin Katz
Abstract:
Over the last 23 years, the U.S. Securities and Exchange Commission has required over 34,000 companies to file over 165,000 annual reports. These reports, the so-called "Form 10-Ks," contain a characterization of a company's financial performance and its risks, including the regulatory environment in which a company operates. In this paper, we analyze over 4.5 million references to U.S. Federal Acts and Agencies contained within these reports to build a mean-field measurement of temperature and diversity in this regulatory ecosystem, where companies are organisms inhabiting the regulatory environment. While individuals across the political, economic, and academic world frequently refer to trends in this regulatory ecosystem, far less attention has been paid to supporting such claims with large-scale, longitudinal data. In this paper, we document an increase in the regulatory energy per filing, i.e., a warming "temperature." We also find that the diversity of the regulatory ecosystem has been increasing over the past two decades, as measured by the dimensionality of the regulatory space and distance between the "regulatory bitstrings" of companies. These findings support the claim that regulatory activity and complexity are increasing, and this measurement framework contributes an important step towards improving academic and policy discussions around legal complexity and regulation.
Submitted 10 January, 2017; v1 submitted 29 December, 2016;
originally announced December 2016.
-
A General Approach for Predicting the Behavior of the Supreme Court of the United States
Authors:
Daniel Martin Katz,
Michael J Bommarito II,
Josh Blackman
Abstract:
Building on developments in machine learning and prior work in the science of judicial prediction, we construct a model designed to predict the behavior of the Supreme Court of the United States in a generalized, out-of-sample context. To do so, we develop a time evolving random forest classifier which leverages some unique feature engineering to predict more than 240,000 justice votes and 28,000 case outcomes over nearly two centuries (1816-2015). Using only data available prior to decision, our model outperforms null (baseline) models at both the justice and case level under both parametric and non-parametric tests. Over nearly two centuries, we achieve 70.2% accuracy at the case outcome level and 71.9% at the justice vote level. More recently, over the past century, we outperform an in-sample optimized null model by nearly 5%. Our performance is consistent with, and improves on, the general level of prediction demonstrated by prior work; however, our model is distinctive because it can be applied out-of-sample to the entire past and future of the Court, not a single term. Our results represent an important advance for the science of quantitative legal prediction and portend a range of other potential applications.
Submitted 17 January, 2017; v1 submitted 11 December, 2016;
originally announced December 2016.
-
Law on the Market? Abnormal Stock Returns and Supreme Court Decision-Making
Authors:
Daniel Martin Katz,
Michael J Bommarito II,
Tyler Soellinger,
James Ming Chen
Abstract:
What happens when the Supreme Court of the United States decides a case impacting one or more publicly-traded firms? While many have observed anecdotal evidence linking decisions or oral arguments to abnormal stock returns, few have rigorously or systematically investigated the behavior of equities around Supreme Court actions. In this research, we present the first comprehensive, longitudinal study on the topic, spanning over 15 years and hundreds of cases and firms. Using both intra- and interday data around decisions and oral arguments, we evaluate the frequency and magnitude of statistically-significant abnormal return events after Supreme Court action. On a per-term basis, we find 5.3 cases and 7.8 stocks that exhibit abnormal returns after decision. In total, across the cases we examined, we find 79 out of the 211 cases (37%) exhibit an average abnormal return of 4.4% over a two-session window with an average $|t|$-statistic of 2.9. Finally, we observe that abnormal returns following Supreme Court decisions materialize over the span of hours and days, not minutes, yielding strong implications for market efficiency in this context. While we cannot causally separate substantive legal impact from mere revision of beliefs, we do find strong evidence that there is indeed a "law on the market" effect as measured by the frequency of abnormal return events, and that these abnormal returns are not immediately incorporated into prices.
Submitted 14 May, 2017; v1 submitted 24 August, 2015;
originally announced August 2015.
-
Predicting the Behavior of the Supreme Court of the United States: A General Approach
Authors:
Daniel Martin Katz,
Michael J Bommarito II,
Josh Blackman
Abstract:
Building upon developments in theoretical and applied machine learning, as well as the efforts of various scholars including Guimera and Sales-Pardo (2011), Ruger et al. (2004), and Martin et al. (2004), we construct a model designed to predict the voting behavior of the Supreme Court of the United States. Using the extremely randomized tree method first proposed in Geurts, et al. (2006), a method similar to the random forest approach developed in Breiman (2001), as well as novel feature engineering, we predict more than sixty years of decisions by the Supreme Court of the United States (1953-2013). Using only data available prior to the date of decision, our model correctly identifies 69.7% of the Court's overall affirm and reverse decisions and correctly forecasts 70.9% of the votes of individual justices across 7,700 cases and more than 68,000 justice votes. Our performance is consistent with the general level of prediction offered by prior scholars. However, our model is distinctive as it is the first robust, generalized, and fully predictive model of Supreme Court voting behavior offered to date. Our model predicts six decades of behavior of thirty Justices appointed by thirteen Presidents. With a more sound methodological foundation, our results represent a major advance for the science of quantitative legal prediction and portend a range of other potential applications, such as those described in Katz (2013).
Submitted 23 July, 2014;
originally announced July 2014.
-
A Mathematical Approach to the Study of the United States Code
Authors:
Michael J. Bommarito II,
Daniel Martin Katz
Abstract:
The United States Code (Code) is a document containing over 22 million words that represents a large and important source of Federal statutory law. Scholars and policy advocates often discuss the direction and magnitude of changes in various aspects of the Code. However, few have mathematically formalized the notions behind these discussions or directly measured the resulting representations. This paper addresses the current state of the literature in two ways. First, we formalize a representation of the United States Code as the union of a hierarchical network and a citation network over vertices containing the language of the Code. This representation reflects the fact that the Code is a hierarchically organized document containing language and explicit citations between provisions. Second, we use this formalization to measure aspects of the Code as codified in October 2008, November 2009, and March 2010. These measurements allow for a characterization of the actual changes in the Code over time. Our findings indicate that in the recent past, the Code has grown in its amount of structure, interdependence, and language.
Submitted 22 March, 2010;
originally announced March 2010.
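The representation described in the abstract above (a hierarchical network and a citation network sharing one set of text-bearing vertices) can be illustrated with a minimal Python sketch. The class name `CodeGraph`, the vertex identifiers, the placeholder texts, and the three simple counts used for "structure", "interdependence", and "language" are all assumptions made for illustration; they are not the authors' implementation or their actual measures.

```python
from dataclasses import dataclass, field


@dataclass
class CodeGraph:
    """Hypothetical toy container: vertices hold text, two edge sets share them."""
    text: dict = field(default_factory=dict)        # vertex id -> provision text
    hierarchy: set = field(default_factory=set)     # (parent, child) edges
    citations: set = field(default_factory=set)     # (citing, cited) edges

    def add_vertex(self, vid, text="", parent=None):
        # Register a vertex and, if given, its place in the hierarchy.
        self.text[vid] = text
        if parent is not None:
            self.hierarchy.add((parent, vid))

    def add_citation(self, src, dst):
        # Record an explicit citation from one provision to another.
        self.citations.add((src, dst))

    def union_edges(self):
        # The combined graph is the union of both edge sets over the same vertices.
        return self.hierarchy | self.citations

    def measures(self):
        # Assumed proxies: vertex count, citation count, total word count.
        return {
            "structure": len(self.text),
            "interdependence": len(self.citations),
            "language": sum(len(t.split()) for t in self.text.values()),
        }


# Hypothetical three-vertex example (identifiers and texts are invented).
g = CodeGraph()
g.add_vertex("T1")
g.add_vertex("T1/s1", "general rule for this toy title", parent="T1")
g.add_vertex("T1/s2", "exceptions to the rule in section one", parent="T1")
g.add_citation("T1/s2", "T1/s1")
m = g.measures()
```

Building one such graph per snapshot and comparing the resulting dictionaries is one simple way to track change across codification dates.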
-
Properties of the United States Code Citation Network
Authors:
Michael J. Bommarito II,
Daniel Martin Katz
Abstract:
The United States Code (Code) is an important source of Federal law that is produced by the interactions of many heterogeneous actors in a complex, dynamic space. The Code can be represented as the union of a hierarchical network and a citation network over the vertices representing the language of the Code. In this paper, we investigate the properties of the Code's citation network by examining the directed degree distributions of the network. We find that the power-law model is a plausible fit for the outdegree distribution but not for the indegree distribution. In order to better understand this result, we construct a model with the assumption that the probability of citation is a per-word rate. We calculate the adjusted degree of each vertex under this model and study the directed adjusted degree distributions. These adjusted degree distributions indicate that both the adjusted indegree and outdegree distributions seem to follow a log-normal form, not a power-law form. Our findings indicate that the power-law is not generally applicable to degree distributions within the United States Code but that the distribution of degree per word is well-described by a log-normal model.
Submitted 23 March, 2010; v1 submitted 9 November, 2009;
originally announced November 2009.
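The per-word adjustment mentioned in the abstract above can be sketched in a few lines of Python. The abstract does not give the exact formula, so this sketch assumes the simplest reading: a vertex's adjusted degree is its raw degree divided by its word count. The function name `adjusted_degrees` and the toy data are hypothetical.

```python
def adjusted_degrees(edges, word_counts):
    """Return (adjusted indegree, adjusted outdegree) per vertex.

    Assumption: adjusted degree = raw degree / number of words in the vertex.
    Vertices with zero words are skipped to avoid division by zero.
    """
    indeg = {v: 0 for v in word_counts}
    outdeg = {v: 0 for v in word_counts}
    for src, dst in edges:  # each edge is (citing vertex, cited vertex)
        outdeg[src] += 1
        indeg[dst] += 1
    adj_in = {v: indeg[v] / n for v, n in word_counts.items() if n > 0}
    adj_out = {v: outdeg[v] / n for v, n in word_counts.items() if n > 0}
    return adj_in, adj_out


# Hypothetical toy network: three provisions with invented word counts.
word_counts = {"a": 100, "b": 50, "c": 200}
edges = [("a", "b"), ("c", "b"), ("a", "c")]
adj_in, adj_out = adjusted_degrees(edges, word_counts)
```

Here vertex `b` receives two citations but is short, so its adjusted indegree (2 / 50) is larger than that of the longer vertex `c` (1 / 200). Fitting candidate distributions to such values would be a separate step not shown here.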
-
Distance Measures for Dynamic Citation Networks
Authors:
Michael J. Bommarito II,
Daniel Martin Katz,
Jon Zelner,
James H. Fowler
Abstract:
Acyclic digraphs arise in many natural and artificial processes. Among the broader set, dynamic citation networks represent a substantively important form of acyclic digraphs. For example, the study of such networks includes the spread of ideas through academic citations, the spread of innovation through patent citations, and the development of precedent in common law systems. The specific dynamics that produce such acyclic digraphs not only differentiate them from other classes of graphs, but also provide guidance for the development of meaningful distance measures. In this article, we develop and apply our sink distance measure together with the single-linkage hierarchical clustering algorithm to both a two-dimensional directed preferential attachment model as well as empirical data drawn from the first quarter century of decisions of the United States Supreme Court. Despite applying the simplest combination of distance measures and clustering algorithms, analysis reveals that more accurate and more interpretable clusterings are produced by this scheme.
Submitted 30 November, 2009; v1 submitted 9 September, 2009;
originally announced September 2009.
-
On the Stability of Community Detection Algorithms on Longitudinal Citation Data
Authors:
Michael James Bommarito II,
Daniel Martin Katz,
Jon Zelner
Abstract:
There are fundamental differences between citation networks and other classes of graphs. In particular, given that citation networks are directed and acyclic, methods developed primarily for use with undirected social network data may face obstacles. This is particularly true for the dynamic development of community structure in citation networks. Namely, it is neither clear when it is appropriate to employ existing community detection approaches nor is it clear how to choose among existing approaches. Using simulated data, we attempt to clarify the conditions under which one should use existing methods and which of these algorithms is appropriate in a given context. We hope this paper will serve as both a useful guidepost and an encouragement to those interested in the development of more targeted approaches for use with longitudinal citation data.
Submitted 17 August, 2009; v1 submitted 4 August, 2009;
originally announced August 2009.