-
Optimal Placement of Nature-Based Solutions for Urban Challenges
Authors:
Diego Maria Pinto,
Davide Donato Russo,
Antonio M. Sudoso
Abstract:
Increased urbanization and climate change intensify urban heat islands and degrade air quality, making current mitigation strategies insufficient. Nature-based solutions (NBSs), such as parks, green walls, roofs, and street trees, offer a promising means to regulate urban temperatures and enhance air quality. However, determining their optimal placement to maximize environmental benefits remains a…
▽ More
Increased urbanization and climate change intensify urban heat islands and degrade air quality, making current mitigation strategies insufficient. Nature-based solutions (NBSs), such as parks, green walls, roofs, and street trees, offer a promising means to regulate urban temperatures and enhance air quality. However, determining their optimal placement to maximize environmental benefits remains a pressing challenge. Leveraging Operational Research (OR) tools, we propose a Mixed-Integer Linear Programming (MILP) model that integrates multiple factors, including urban challenges, physical constraints, clustering techniques, convolution theory, and fairness considerations. This model determines the optimal placement of NBSs by addressing metrics such as ground temperature, air quality, and accessibility to green spaces. Through several case study analyses, we demonstrate the effectiveness of our approach in improving environmental and social indicators. This research holds implications for policy and practice, empowering urban planners and policymakers to make informed decisions regarding NBS implementation. Such decisions ensure that investments in urban greening yield maximum environmental, social, and economic benefits.
△ Less
Submitted 16 February, 2025;
originally announced February 2025.
-
Strong bounds for large-scale Minimum Sum-of-Squares Clustering
Authors:
Anna Livia Croella,
Veronica Piccialli,
Antonio M. Sudoso
Abstract:
Clustering is a fundamental technique in data analysis and machine learning, used to group similar data points together. Among various clustering methods, the Minimum Sum-of-Squares Clustering (MSSC) is one of the most widely used. MSSC aims to minimize the total squared Euclidean distance between data points and their corresponding cluster centroids. Due to the unsupervised nature of clustering,…
▽ More
Clustering is a fundamental technique in data analysis and machine learning, used to group similar data points together. Among various clustering methods, the Minimum Sum-of-Squares Clustering (MSSC) is one of the most widely used. MSSC aims to minimize the total squared Euclidean distance between data points and their corresponding cluster centroids. Due to the unsupervised nature of clustering, achieving global optimality is crucial, yet computationally challenging. The complexity of finding the global solution increases exponentially with the number of data points, making exact methods impractical for large-scale datasets. Even obtaining strong lower bounds on the optimal MSSC objective value is computationally prohibitive, making it difficult to assess the quality of heuristic solutions. We address this challenge by introducing a novel method to validate heuristic MSSC solutions through optimality gaps. Our approach employs a divide-and-conquer strategy, decomposing the problem into smaller instances that can be handled by an exact solver. The decomposition is guided by an auxiliary optimization problem, the "anticlustering problem", for which we design an efficient heuristic. Computational experiments demonstrate the effectiveness of the method for large-scale instances, achieving optimality gaps below 3% in most cases while maintaining reasonable computational times. These results highlight the practicality of our approach in assessing feasible clustering solutions for large datasets, bridging a critical gap in MSSC evaluation.
△ Less
Submitted 12 February, 2025;
originally announced February 2025.
-
A column generation algorithm with dynamic constraint aggregation for minimum sum-of-squares clustering
Authors:
Antonio M. Sudoso,
Daniel Aloise
Abstract:
The minimum sum-of-squares clustering problem (MSSC), also known as $k$-means clustering, refers to the problem of partitioning $n$ data points into $k$ clusters, with the objective of minimizing the total sum of squared Euclidean distances between each point and the center of its assigned cluster. We propose an efficient algorithm for solving large-scale MSSC instances, which combines column gene…
▽ More
The minimum sum-of-squares clustering problem (MSSC), also known as $k$-means clustering, refers to the problem of partitioning $n$ data points into $k$ clusters, with the objective of minimizing the total sum of squared Euclidean distances between each point and the center of its assigned cluster. We propose an efficient algorithm for solving large-scale MSSC instances, which combines column generation (CG) with dynamic constraint aggregation (DCA) to effectively reduce the number of constraints considered in the CG master problem. DCA was originally conceived to reduce degeneracy in set partitioning problems by utilizing an aggregated restricted master problem obtained from a partition of the set partitioning constraints into disjoint clusters. In this work, we explore the use of DCA within a CG algorithm for MSSC exact solution. Our method is fine-tuned by a series of ablation studies on DCA design choices, and is demonstrated to significantly outperform existing state-of-the-art exact approaches available in the literature.
△ Less
Submitted 8 October, 2024;
originally announced October 2024.
-
A Semidefinite Programming-Based Branch-and-Cut Algorithm for Biclustering
Authors:
Antonio M. Sudoso
Abstract:
Biclustering, also called co-clustering, block clustering, or two-way clustering, involves the simultaneous clustering of both the rows and columns of a data matrix into distinct groups, such that the rows and columns within a group display similar patterns. As a model problem for biclustering, we consider the $k$-densest-disjoint biclique problem, whose goal is to identify $k$ disjoint complete b…
▽ More
Biclustering, also called co-clustering, block clustering, or two-way clustering, involves the simultaneous clustering of both the rows and columns of a data matrix into distinct groups, such that the rows and columns within a group display similar patterns. As a model problem for biclustering, we consider the $k$-densest-disjoint biclique problem, whose goal is to identify $k$ disjoint complete bipartite subgraphs (called bicliques) of a given weighted complete bipartite graph such that the sum of their densities is maximized. To address this problem, we present a tailored branch-and-cut algorithm. For the upper bound routine, we consider a semidefinite programming relaxation and propose valid inequalities to strengthen the bound. We solve this relaxation in a cutting-plane fashion using a first-order method. For the lower bound, we design a maximum weight matching rounding procedure that exploits the solution of the relaxation solved at each node. Computational results on both synthetic and real-world instances show that the proposed algorithm can solve instances approximately 20 times larger than those handled by general-purpose solvers.
△ Less
Submitted 4 December, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Optimization meets Machine Learning: An Exact Algorithm for Semi-Supervised Support Vector Machines
Authors:
Veronica Piccialli,
Jan Schwiddessen,
Antonio M. Sudoso
Abstract:
Support vector machines (SVMs) are well-studied supervised learning models for binary classification. In many applications, large amounts of samples can be cheaply and easily obtained. What is often a costly and error-prone process is to manually label these instances. Semi-supervised support vector machines (S3VMs) extend the well-known SVM classifiers to the semi-supervised approach, aiming at m…
▽ More
Support vector machines (SVMs) are well-studied supervised learning models for binary classification. In many applications, large amounts of samples can be cheaply and easily obtained. What is often a costly and error-prone process is to manually label these instances. Semi-supervised support vector machines (S3VMs) extend the well-known SVM classifiers to the semi-supervised approach, aiming at maximizing the margin between samples in the presence of unlabeled data. By leveraging both labeled and unlabeled data, S3VMs attempt to achieve better accuracy and robustness compared to traditional SVMs. Unfortunately, the resulting optimization problem is non-convex and hence difficult to solve exactly. In this paper, we present a new branch-and-cut approach for S3VMs using semidefinite programming (SDP) relaxations. We apply optimality-based bound tightening to bound the feasible set. Box constraints allow us to include valid inequalities, strengthening the lower bound. The resulting SDP relaxation provides bounds significantly stronger than the ones available in the literature. For the upper bound, instead, we define a local search exploiting the solution of the SDP relaxation. Computational results highlight the efficiency of the algorithm, showing its capability to solve instances with a number of data points 10 times larger than the ones solved in the literature.
△ Less
Submitted 25 November, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Optimizing accuracy and diversity: a multi-task approach to forecast combinations
Authors:
Giovanni Felici,
Antonio M. Sudoso
Abstract:
Forecast combination involves using multiple forecasts to create a single, more accurate prediction. Recently, feature-based forecasting has been employed to either select the most appropriate forecasting models or to optimize the weights of their combination. In this paper, we present a multi-task optimization paradigm that focuses on solving both problems simultaneously and enriches current oper…
▽ More
Forecast combination involves using multiple forecasts to create a single, more accurate prediction. Recently, feature-based forecasting has been employed to either select the most appropriate forecasting models or to optimize the weights of their combination. In this paper, we present a multi-task optimization paradigm that focuses on solving both problems simultaneously and enriches current operational research approaches to forecasting. In essence, it incorporates an additional learning and optimization task into the standard feature-based forecasting approach, focusing on the identification of an optimal set of forecasting methods. During the training phase, an optimization model with linear constraints and quadratic objective function is employed to identify accurate and diverse methods for each time series. Moreover, within the training phase, a neural network is used to learn the behavior of that optimization model. Once training is completed the candidate set of methods is identified using the network. The proposed approach elicits the essential role of diversity in feature-based forecasting and highlights the interplay between model combination and model selection when optimizing forecasting ensembles. Experimental results on a large set of series from the M4 competition dataset show that our proposal enhances point forecast accuracy compared to state-of-the-art methods.
△ Less
Submitted 12 December, 2023; v1 submitted 31 October, 2023;
originally announced October 2023.
-
Fix and Bound: An efficient approach for solving large-scale quadratic programming problems with box constraints
Authors:
Marco Locatelli,
Veronica Piccialli,
Antonio M. Sudoso
Abstract:
In this paper, we propose a branch-and-bound algorithm for solving nonconvex quadratic programming problems with box constraints (BoxQP). Our approach combines existing tools, such as semidefinite programming (SDP) bounds strengthened through valid inequalities, with a new class of optimality-based linear cuts which leads to variable fixing. The most important effect of fixing the value of some va…
▽ More
In this paper, we propose a branch-and-bound algorithm for solving nonconvex quadratic programming problems with box constraints (BoxQP). Our approach combines existing tools, such as semidefinite programming (SDP) bounds strengthened through valid inequalities, with a new class of optimality-based linear cuts which leads to variable fixing. The most important effect of fixing the value of some variables is the size reduction along the branch-and-bound tree, allowing to compute bounds by solving SDPs of smaller dimension. Extensive computational experiments over large dimensional (up to $n=200$) test instances show that our method is the state-of-the-art solver on large-scale BoxQPs. Furthermore, we test the proposed approach on the class of binary QP problems, where it exhibits competitive performance with state-of-the-art solvers.
△ Less
Submitted 2 October, 2024; v1 submitted 16 November, 2022;
originally announced November 2022.
-
Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming
Authors:
Veronica Piccialli,
Antonio M. Sudoso
Abstract:
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound ro…
▽ More
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy which enforces pairwise constraints, we reduce the complexity of the problems arising in the children nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods.
△ Less
Submitted 7 September, 2023; v1 submitted 19 September, 2022;
originally announced September 2022.
-
An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering
Authors:
Veronica Piccialli,
Anna Russo Russo,
Antonio M. Sudoso
Abstract:
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is traditionally considered an unsupervised learning task. In recent years, the use of background knowledge to improve the cluster quality and promote interpretability of the clustering process has become a hot research topic at the intersection of mathematical optimization and machine learning research. The problem of takin…
▽ More
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is traditionally considered an unsupervised learning task. In recent years, the use of background knowledge to improve the cluster quality and promote interpretability of the clustering process has become a hot research topic at the intersection of mathematical optimization and machine learning research. The problem of taking advantage of background information in data clustering is called semi-supervised or constrained clustering. In this paper, we present a branch-and-cut algorithm for semi-supervised MSSC, where background knowledge is incorporated as pairwise must-link and cannot-link constraints. For the lower bound procedure, we solve the semidefinite programming relaxation of the MSSC discrete optimization model, and we use a cutting-plane procedure for strengthening the bound. For the upper bound, instead, by using integer programming tools, we use an adaptation of the k-means algorithm to the constrained case. For the first time, the proposed global optimization algorithm efficiently manages to solve real-world instances up to 800 data points with different combinations of must-link and cannot-link constraints and with a generic number of features. This problem size is about four times larger than the one of the instances solved by state-of-the-art exact algorithms.
△ Less
Submitted 24 July, 2022; v1 submitted 30 November, 2021;
originally announced November 2021.
-
Mixed-Integer Nonlinear Programming for State-based Non-Intrusive Load Monitoring
Authors:
Marco Balletti,
Veronica Piccialli,
Antonio M. Sudoso
Abstract:
Energy disaggregation, known in the literature as Non-Intrusive Load Monitoring (NILM), is the task of inferring the energy consumption of each appliance given the aggregate signal recorded by a single smart meter. In this paper, we propose a novel two-stage optimization-based approach for energy disaggregation. In the first phase, a small training set consisting of disaggregated power profiles is…
▽ More
Energy disaggregation, known in the literature as Non-Intrusive Load Monitoring (NILM), is the task of inferring the energy consumption of each appliance given the aggregate signal recorded by a single smart meter. In this paper, we propose a novel two-stage optimization-based approach for energy disaggregation. In the first phase, a small training set consisting of disaggregated power profiles is used to estimate the parameters and the power states by solving a mixed integer programming problem. Once the model parameters are estimated, the energy disaggregation problem is formulated as a constrained binary quadratic optimization problem. We incorporate penalty terms that exploit prior knowledge on how the disaggregated traces are generated, and appliance-specific constraints characterizing the signature of different types of appliances operating simultaneously. Our approach is compared with existing optimization-based algorithms both on a synthetic dataset and on three real-world datasets. The proposed formulation is computationally efficient, able to disambiguate loads with similar consumption patterns, and successfully reconstruct the signatures of known appliances despite the presence of unmetered devices, thus overcoming the main drawbacks of the optimization-based methods available in the literature.
△ Less
Submitted 22 February, 2022; v1 submitted 16 June, 2021;
originally announced June 2021.
-
SOS-SDP: an Exact Solver for Minimum Sum-of-Squares Clustering
Authors:
Veronica Piccialli,
Antonio M. Sudoso,
Angelika Wiegele
Abstract:
The minimum sum-of-squares clustering problem (MSSC) consists of partitioning $n$ observations into $k$ clusters in order to minimize the sum of squared distances from the points to the centroid of their cluster. In this paper, we propose an exact algorithm for the MSSC problem based on the branch-and-bound technique. The lower bound is computed by using a cutting-plane procedure where valid inequ…
▽ More
The minimum sum-of-squares clustering problem (MSSC) consists of partitioning $n$ observations into $k$ clusters in order to minimize the sum of squared distances from the points to the centroid of their cluster. In this paper, we propose an exact algorithm for the MSSC problem based on the branch-and-bound technique. The lower bound is computed by using a cutting-plane procedure where valid inequalities are iteratively added to the Peng-Wei SDP relaxation. The upper bound is computed with the constrained version of k-means where the initial centroids are extracted from the solution of the SDP relaxation. In the branch-and-bound procedure, we incorporate instance-level must-link and cannot-link constraints to express knowledge about which data points should or should not be grouped together. We manage to reduce the size of the problem at each level preserving the structure of the SDP problem itself. The obtained results show that the approach allows to successfully solve for the first time real-world instances up to 4000 data points.
△ Less
Submitted 23 December, 2021; v1 submitted 23 April, 2021;
originally announced April 2021.