-
Regularisation of CART trees by summation of $p$-values
Authors:
Nils Engler,
Mathias Lindholm,
Filip Lindskog,
Taariq Nazar
Abstract:
The standard procedure to decide on the complexity of a CART regression tree is to use cross-validation with the aim of obtaining a predictor that generalises well to unseen data. The randomness in the selection of folds implies that the selected CART tree is not a deterministic function of the data. We propose a deterministic in-sample method that can be used for stopping the growing of a CART tree based on node-wise statistical tests. This testing procedure is derived using a connection to change point detection, where the null hypothesis corresponds to the absence of a signal. The suggested $p$-value based procedure allows us to consider covariate vectors of arbitrary dimension and allows us to bound the $p$-value of an entire tree from above. Further, we show that the test detects a not-too-weak signal with a high probability, given a not-too-small sample size.
We illustrate our methodology and the asymptotic results on both simulated and real world data. Additionally, we illustrate how our $p$-value based method can be used as an automatic deterministic early stopping procedure for tree-based boosting. The boosting iterations stop when the tree to be added consists only of a root node.
Submitted 24 May, 2025;
originally announced May 2025.
-
Issues with the Smith-Wilson method
Authors:
Andreas Lagerås,
Mathias Lindholm
Abstract:
The objective of the present paper is to analyse various features of the Smith-Wilson method used for discounting under the EU regulation Solvency II, with special attention to hedging. In particular, we show that all key rate duration hedges of liabilities beyond the Last Liquid Point will be peculiar. Moreover, we show that there is a connection between the occurrence of negative discount factors and singularities in the convergence criterion used to calibrate the model. The main tool used for analysing hedges is a novel stochastic representation of the Smith-Wilson method. Further, we provide necessary conditions needed in order to construct similar, but hedgeable, discount curves.
Submitted 5 February, 2016;
originally announced February 2016.
-
Growing networks with preferential addition and deletion of edges
Authors:
Maria Deijfen,
Mathias Lindholm
Abstract:
A preferential attachment model for a growing network incorporating deletion of edges is studied and the expected asymptotic degree distribution is analyzed. At each time step $t=1,2,\ldots$, with probability $π_1>0$ a new vertex with one edge attached to it is added to the network and the edge is connected to an existing vertex chosen proportionally to its degree, with probability $π_2$ a vertex is chosen proportionally to its degree and an edge is added between this vertex and a randomly chosen other vertex, and with probability $π_3=1-π_1-π_2<1/2$ a vertex is chosen proportionally to its degree and a random edge of this vertex is deleted. The model is intended to capture a situation where high-degree vertices are more dynamic than low-degree vertices in the sense that their connections tend to be changing. A recursion formula is derived for the expected asymptotic fraction $p_k$ of vertices with degree $k$, and solving this recursion reveals that, for $π_3<1/3$, we have $p_k\sim k^{-(3-7π_3)/(1-3π_3)}$, while, for $π_3>1/3$, the fraction $p_k$ decays exponentially at rate $(π_1+π_2)/2π_3$. There is hence a non-trivial upper bound for how much deletion the network can incorporate without losing the power-law behavior of the degree distribution. The analytical results are supported by simulations.
Submitted 23 September, 2015;
originally announced September 2015.
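The three update rules described in the abstract above can be sketched as a small simulation. This is only an illustrative sketch, not the authors' code: the function names, the two-vertex starting graph, the use of a multigraph (repeated edges allowed), and the fallback to uniform vertex selection when no edges remain are all assumptions made here.

```python
import random


def simulate_network(steps, pi1, pi2, seed=None):
    """Simulate the addition/deletion dynamics; returns (n_vertices, edges)."""
    if not (pi1 > 0 and pi2 >= 0 and pi1 + pi2 <= 1 and 1 - pi1 - pi2 < 0.5):
        raise ValueError("need pi1 > 0, pi2 >= 0 and pi3 = 1 - pi1 - pi2 < 1/2")
    rng = random.Random(seed)
    # Assumption: start from two vertices joined by a single edge.
    n_vertices = 2
    edges = [(0, 1)]

    def pick_by_degree():
        # A uniform endpoint of a uniform edge is a degree-proportional vertex.
        if edges:
            return rng.choice(rng.choice(edges))
        # Assumption: with no edges left, fall back to a uniform vertex.
        return rng.randrange(n_vertices)

    for _ in range(steps):
        u = rng.random()
        if u < pi1:
            # New vertex with one edge to a degree-proportional existing vertex.
            target = pick_by_degree()
            edges.append((n_vertices, target))
            n_vertices += 1
        elif u < pi1 + pi2:
            # New edge between a degree-proportional vertex and another vertex
            # chosen uniformly among the remaining ones.
            v = pick_by_degree()
            w = rng.randrange(n_vertices - 1)
            if w >= v:
                w += 1
            edges.append((v, w))
        elif edges:
            # Delete a uniformly chosen edge of a degree-proportional vertex.
            v = pick_by_degree()
            incident = [i for i, e in enumerate(edges) if v in e]
            edges.pop(rng.choice(incident))
    return n_vertices, edges


def degrees(n_vertices, edges):
    """Degree of every vertex, counting repeated edges."""
    deg = [0] * n_vertices
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg
```

A call such as `simulate_network(100_000, 0.5, 0.3, seed=1)` followed by `degrees(...)` gives an empirical degree sequence whose tail could be compared with the asymptotic behaviour of $p_k$ stated in the abstract.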
-
A dynamic network in a dynamic population: asymptotic properties
Authors:
Tom Britton,
Mathias Lindholm,
Tatyana Turova
Abstract:
We derive asymptotic properties for a stochastic dynamic network model in a stochastic dynamic population. In the model, nodes give birth to new nodes until they die, each node being equipped with a social index given at birth. During the life of a node it creates edges to other nodes, nodes with high social index at higher rate, and edges disappear randomly in time. For this model we derive a criterion for when a giant connected component exists after the process has evolved for a long period of time, assuming the node population grows to infinity. We also obtain an explicit expression for the degree correlation $ρ$ (of neighbouring nodes) which shows that $ρ$ is always positive irrespective of parameter values in one of the two treated submodels, and may be either positive or negative in the other model, depending on the parameters.
Submitted 1 April, 2011;
originally announced April 2011.
-
A note on the component structure in random intersection graphs with tunable clustering
Authors:
Andreas Nordvall Lagerås,
Mathias Lindholm
Abstract:
We study the component structure in random intersection graphs with tunable clustering, and show that the average degree works as a threshold for a phase transition for the size of the largest component. That is, if the expected degree is less than one, the size of the largest component is a.a.s. of logarithmic order, but if the average degree is greater than one, a.a.s. a single large component of linear order emerges, and the size of the second largest component is at most of logarithmic order.
Submitted 27 May, 2008; v1 submitted 10 September, 2007;
originally announced September 2007.
-
Epidemics on random graphs with tunable clustering
Authors:
Tom Britton,
Maria Deijfen,
Andreas Nordvall Lagerås,
Mathias Lindholm
Abstract:
In this paper, a branching process approximation for the spread of a Reed-Frost epidemic on a network with tunable clustering is derived. The approximation gives rise to expressions for the epidemic threshold and the probability of a large outbreak in the epidemic. It is investigated how these quantities vary with the clustering in the graph and it turns out for instance that, as the clustering increases, the epidemic threshold decreases. The network is modelled by a random intersection graph, in which individuals are independently members of a number of groups and two individuals are linked to each other if and only if they share at least one group.
Submitted 29 August, 2007;
originally announced August 2007.
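The random intersection graph described in the abstract above can be sketched in a few lines. This is a minimal illustration only: the abstract does not specify the membership mechanism, so the code assumes that each individual joins each group independently with a common probability `p`; the function name and parameters are hypothetical.

```python
import random
from itertools import combinations


def random_intersection_graph(n, m, p, seed=None):
    """Return (groups, edges) for n individuals and m groups.

    Assumption: every individual joins every group independently with
    probability p. Two individuals are linked if and only if they share
    at least one group.
    """
    rng = random.Random(seed)
    # groups[g] lists the members of group g in increasing order.
    groups = [[v for v in range(n) if rng.random() < p] for _ in range(m)]
    edges = set()
    for members in groups:
        # Every pair inside a group is linked; the set removes duplicates
        # arising from pairs that share several groups.
        edges.update(combinations(members, 2))
    return groups, edges
```

Because every group forms a clique, larger groups produce more triangles, which is the mechanism behind the clustering that the abstract refers to.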