-
Co-Training with Active Contrastive Learning and Meta-Pseudo-Labeling on 2D Projections for Deep Semi-Supervised Learning
Authors:
David Aparco-Cardenas,
Jancarlo F. Gomes,
Alexandre X. Falcão,
Pedro J. de Rezende
Abstract:
A major challenge that prevents the training of deep learning (DL) models is the limited availability of accurately labeled data. This shortcoming is highlighted in areas where data annotation becomes a time-consuming and error-prone task. In this regard, semi-supervised learning (SSL) tackles this challenge by capitalizing on scarce labeled and abundant unlabeled data; however, state-of-the-art (SoTA) methods typically depend on pre-trained features and large validation sets to learn effective representations for classification tasks. In addition, the reduced set of labeled data is often randomly sampled, neglecting the selection of more informative samples. Here, we present active-DeepFA, a method that effectively combines contrastive learning (CL), teacher-student-based meta-pseudo-labeling and active learning (AL) to train non-pretrained CNN architectures for image classification in scenarios of scarcity of labeled and abundance of unlabeled data. It integrates DeepFA into a co-training setup that implements two cooperative networks to mitigate confirmation bias from pseudo-labels. The method starts with a reduced set of labeled samples by warming up the networks with supervised CL. Afterward and at regular epoch intervals, label propagation is performed on the 2D projections of the networks' deep features. Next, the most reliable pseudo-labels are exchanged between networks in a cross-training fashion, while the most meaningful samples are annotated and added into the labeled set. The networks independently minimize an objective loss function comprising supervised contrastive, supervised and semi-supervised loss components, enhancing the representations towards image classification. Our approach is evaluated on three challenging biological image datasets using only 5% of labeled samples, improving on baselines and outperforming six other SoTA methods. In addition, it reduces annotation effort by achieving comparable results to those of its counterparts with only 3% of labeled data.
Submitted 25 April, 2025;
originally announced April 2025.
-
Solving the Graph Burning Problem for Large Graphs
Authors:
Felipe de Carvalho Pereira,
Pedro Jussieu de Rezende,
Tallys Yunes,
Luiz Fernando Batista Morato
Abstract:
We propose an exact algorithm for the Graph Burning Problem ($\texttt{GBP}$), an NP-hard optimization problem that models the spread of influence on social networks. Given a graph $G$ with vertex set $V$, the objective is to find a sequence of $k$ vertices in $V$, namely, $v_1, v_2, \dots, v_k$, such that $k$ is minimum and $\bigcup_{i = 1}^{k} \{u\! \in\! V\! : d(u, v_i) \leq k - i\} = V$, where $d(u,v)$ denotes the distance between $u$ and $v$. We formulate the problem as a set covering integer programming model and design a row generation algorithm for the $\texttt{GBP}$. Our method exploits the fact that a very small number of covering constraints is often sufficient for solving the integer model, allowing the corresponding rows to be generated on demand. To date, the most efficient exact algorithm for the $\texttt{GBP}$, denoted here by $\texttt{GDCA}$, is able to obtain optimal solutions for graphs with up to 14,000 vertices within two hours of execution. In comparison, our algorithm finds provably optimal solutions approximately 236 times faster, on average, than $\texttt{GDCA}$. For larger graphs, memory space becomes a limiting factor for $\texttt{GDCA}$. Our algorithm, however, solves real-world instances with almost 200,000 vertices in less than 35 seconds, increasing the size of graphs for which optimal solutions are known by a factor of 14.
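The covering condition in this abstract can be checked directly for a candidate sequence. The sketch below is only an illustration of the definition, not the row generation algorithm of the paper: it assumes a connected, unweighted graph stored as an adjacency dictionary, and the names `bfs_distances` and `is_burning_sequence` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_burning_sequence(adj, sequence):
    """Check that the balls of radius k - i around v_i (i = 1..k) cover V."""
    k = len(sequence)
    burned = set()
    for i, v in enumerate(sequence, start=1):
        radius = k - i
        dist = bfs_distances(adj, v)
        burned.update(u for u, d in dist.items() if d <= radius)
    return burned == set(adj)


# Path on four vertices: 0 - 1 - 2 - 3 (assumed toy instance).
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_burning_sequence(path4, [1, 3]))
print(is_burning_sequence(path4, [0, 3]))
```

On this toy path, the sequence `[1, 3]` satisfies the condition, while `[0, 3]` leaves vertex 2 uncovered. Running one breadth-first search per sequence element keeps the check simple; it is not meant to scale to the graph sizes reported in the abstract.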
Submitted 25 September, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Solving the Minimum Convex Partition of Point Sets with Integer Programming
Authors:
Allan Sapucaia,
Pedro J. de Rezende,
Cid C. de Souza
Abstract:
The partition of a problem into smaller sub-problems satisfying certain properties is often a key ingredient in the design of divide-and-conquer algorithms. For questions related to location, the partition problem can be modeled, in geometric terms, as finding a subdivision of a planar map -- which represents, say, a geographical area -- into regions subject to certain conditions while optimizing some objective function. In this paper, we investigate one of these geometric problems known as the Minimum Convex Partition Problem (MCPP). A convex partition of a point set $P$ in the plane is a subdivision of the convex hull of $P$ whose edges are segments with both endpoints in $P$ and such that all internal faces are empty convex polygons. The MCPP is an NP-hard problem where one seeks to find a convex partition with the least number of faces.
We present a novel polygon-based integer programming formulation for the MCPP, which leads to better dual bounds than the previously known edge-based model. Moreover, we introduce a primal heuristic, a branching rule and a pricing algorithm. The combination of these techniques leads to the ability to solve instances with twice as many points as previously possible while constrained to identical computational resources. A comprehensive experimental study is presented to show the impact of our design choices.
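The requirement that every internal face be an empty convex polygon can be illustrated with a small geometric test. This is a minimal sketch under stated assumptions, not part of the integer programming formulation in the paper: faces are given as vertex lists in boundary order, "empty" is taken to mean that no other point of $P$ lies strictly inside the face, collinear consecutive vertices are tolerated, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Signed area (times two) of the triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def is_convex(polygon):
    """True if all turns along the boundary have the same orientation."""
    n = len(polygon)
    if n < 3:
        return False
    sign = 0
    for i in range(n):
        c = cross(polygon[i], polygon[(i + 1) % n], polygon[(i + 2) % n])
        if c != 0:
            if sign == 0:
                sign = 1 if c > 0 else -1
            elif (c > 0) != (sign > 0):
                return False
    return sign != 0  # reject fully degenerate (collinear) input


def strictly_inside(polygon, p):
    """True if p lies strictly inside a convex polygon (either orientation)."""
    n = len(polygon)
    signs = [cross(polygon[i], polygon[(i + 1) % n], p) for i in range(n)]
    return all(s > 0 for s in signs) or all(s < 0 for s in signs)


def is_empty_convex_face(face, points):
    """True if face is convex and no other input point is strictly inside it."""
    if not is_convex(face):
        return False
    vertices = set(face)
    return not any(strictly_inside(face, p) for p in points if p not in vertices)


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(is_empty_convex_face(square, square))             # no extra points
print(is_empty_convex_face(square, square + [(1, 1)]))  # (1, 1) is interior
```

With integer coordinates the cross products are exact, which avoids floating-point tolerance issues in the orientation tests.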
Submitted 6 December, 2020;
originally announced December 2020.
-
Engineering Art Galleries
Authors:
Pedro J. de Rezende,
Cid C. de Souza,
Stephan Friedrichs,
Michael Hemmer,
Alexander Kröller,
Davi C. Tozoni
Abstract:
The Art Gallery Problem is one of the most well-known problems in Computational Geometry, with a rich history in the study of algorithms, complexity, and variants. Recently there has been a surge in experimental work on the problem. In this survey, we describe this work, show the chronology of developments, and compare current algorithms, including two unpublished versions, in an exhaustive experiment. Furthermore, we show what core algorithmic ingredients have led to recent successes.
Submitted 15 February, 2016; v1 submitted 31 October, 2014;
originally announced October 2014.