Search | arXiv e-print repository

arXiv:2505.06621 [pdf, ps, other]

Minimizing Risk Through Minimizing Model-Data Interaction: A Protocol For Relying on Proxy Tasks When Designing Child Sexual Abuse Imagery Detection Models

Authors: Thamiris Coelho, Leo S. F. Ribeiro, João Macedo, Jefersson A. dos Santos, Sandra Avila

Abstract: The distribution of child sexual abuse imagery (CSAI) is an ever-growing concern of our modern world; children who suffered from this heinous crime are revictimized, and the growing amount of illegal imagery distributed overwhelms law enforcement agents (LEAs) with the manual labor of categorization. To ease this burden researchers have explored methods for automating data triage and detection of CSAI, but the sensitive nature of the data imposes restricted access and minimal interaction between real data and learning algorithms, avoiding leaks at all costs. In observing how these restrictions have shaped the literature we formalize a definition of "Proxy Tasks", i.e., the substitute tasks used for training models for CSAI without making use of CSA data. Under this new terminology we review current literature and present a protocol for making conscious use of Proxy Tasks together with consistent input from LEAs to design better automation in this field. Finally, we apply this protocol to study -- for the first time -- the task of Few-shot Indoor Scene Classification on CSAI, showing a final model that achieves promising results on a real-world CSAI dataset whilst having no weights actually trained on sensitive data.

Submitted 10 May, 2025; originally announced May 2025.

Comments: ACM Conference on Fairness, Accountability, and Transparency (FAccT 2025)

arXiv:2504.14446 [pdf, other]

Neglected Risks: The Disturbing Reality of Children's Images in Datasets and the Urgent Call for Accountability

Authors: Carlos Caetano, Gabriel O. dos Santos, Caio Petrucci, Artur Barros, Camila Laranjeira, Leo S. F. Ribeiro, Júlia F. de Mendonça, Jefersson A. dos Santos, Sandra Avila

Abstract: Including children's images in datasets has raised ethical concerns, particularly regarding privacy, consent, data protection, and accountability. These datasets, often built by scraping publicly available images from the Internet, can expose children to risks such as exploitation, profiling, and tracking. Despite the growing recognition of these issues, approaches for addressing them remain limited. We explore the ethical implications of using children's images in AI datasets and propose a pipeline to detect and remove such images. As a use case, we built the pipeline on a Vision-Language Model under the Visual Question Answering task and tested it on the #PraCegoVer dataset. We also evaluate the pipeline on a subset of 100,000 images from the Open Images V7 dataset to assess its effectiveness in detecting and removing images of children. The pipeline serves as a baseline for future research, providing a starting point for more comprehensive tools and methodologies. While we leverage existing models trained on potentially problematic data, our goal is to expose and address this issue. We do not advocate for training or deploying such models, but instead call for urgent community reflection and action to protect children's rights. Ultimately, we aim to encourage the research community to exercise - more than an additional - care in creating new datasets and to inspire the development of tools to protect the fundamental rights of vulnerable groups, particularly children.

Submitted 19 April, 2025; originally announced April 2025.

Comments: ACM Conference on Fairness, Accountability, and Transparency (FAccT 2025)

arXiv:2502.11810 [pdf, other]

Exploring Novel 2D Analogues of Goldene: Electronic, Mechanical, and Optical Properties of Silverene and Copperene

Authors: Emanuel J. A. dos Santos, Rodrigo A. F. Alves, Alexandre C. Dias, Marcelo L. Pereira Junior, Douglas S. Galvão, Luiz A. Ribeiro Junior

Abstract: Two-dimensional (2D) materials have garnered significant attention due to their unique properties and broad application potential. Building on the success of goldene, a monolayer lattice of gold atoms, we explore its proposed silver and copper analogs, silverene and copperene, using density functional theory calculations. Our findings reveal that silverene and copperene are energetically stable, with formation energies of -2.3 eV/atom and -3.1 eV/atom, closely matching goldene's -2.9 eV/atom. Phonon dispersion and ab initio molecular dynamics simulations confirm their structural and dynamical stability at room temperature, showing no bond breaking or structural reconfiguration. Mechanical analyses indicate isotropy, with Young's moduli of 73 N/m, 44 N/m, and 59 N/m for goldene, silverene, and copperene, respectively, alongside Poisson's ratios of 0.46, 0.42, and 0.41. These results suggest comparable rigidity and deformation characteristics. Electronic band structure analysis highlights their metallic nature, with variations in the band profiles at negative energy levels. Despite their metallic character, these materials exhibit optical properties akin to semiconductors, pointing to potential applications in optoelectronics.

Submitted 17 February, 2025; originally announced February 2025.

Comments: 24 pages and six figures

MSC Class: 00-xx ACM Class: J.2; I.6

arXiv:2410.09603 [pdf, other]

Exploring the Electronic and Mechanical Properties of the Recently Synthesized Nitrogen-Doped Monolayer Amorphous Carbon

Authors: E. J. A. dos Santos, M. L. Pereira Junior, R. M. Tromer, D. S. Galvão, L. A. Ribeiro Junior

Abstract: The recent synthesis of nitrogen-doped monolayer amorphous carbon (MAC@N) opens new possibilities for multifunctional materials. In this study, we have investigated the nitrogen doping limits and their effects on MAC@N's structural and electronic properties using density functional-based tight-binding simulations. Our results show that MAC@N remains stable up to 35% nitrogen doping, beyond which the lattice becomes unstable. The formation energies of MAC@N are higher than those of nitrogen-doped graphene for all the cases we have investigated. Both undoped MAC and MAC@N exhibit metallic behavior, although only MAC features a Dirac-like cone. MAC has an estimated Young's modulus value of about 410 GPa, while MAC@N's modulus can vary around 416 GPa depending on nitrogen content. MAC displays optical activity in the ultraviolet range, whereas MAC@N features light absorption within the infrared and visible ranges, suggesting potential for distinct optoelectronic applications. Their structural thermal stabilities were addressed through molecular dynamics simulations. MAC melts at approximately 4900 K, while MAC@N loses its structural integrity for temperatures ranging from 300 K to 3300 K, lower than graphene. These results point to potential MAC@N applications in flexible electronics and optoelectronics.

Submitted 12 October, 2024; originally announced October 2024.

Comments: 15 pages

MSC Class: 00-XX ACM Class: J.2; I.6

arXiv:2409.19474 [pdf, other]

FairPIVARA: Reducing and Assessing Biases in CLIP-Based Multimodal Models

Authors: Diego A. B. Moreira, Alef Iury Ferreira, Jhessica Silva, Gabriel Oliveira dos Santos, Luiz Pereira, João Medrado Gondim, Gustavo Bonil, Helena Maia, Nádia da Silva, Simone Tiemi Hashiguti, Jefersson A. dos Santos, Helio Pedrini, Sandra Avila

Abstract: Despite significant advancements and pervasive use of vision-language models, a paucity of studies has addressed their ethical implications. These models typically require extensive training data, often from hastily reviewed text and image datasets, leading to highly imbalanced datasets and ethical concerns. Additionally, models initially trained in English are frequently fine-tuned for other languages, such as the CLIP model, which can be expanded with more data to enhance capabilities but can add new biases. The CAPIVARA, a CLIP-based model adapted to Portuguese, has shown strong performance in zero-shot tasks. In this paper, we evaluate four different types of discriminatory practices within visual-language models and introduce FairPIVARA, a method to reduce them by removing the most affected dimensions of feature embeddings. The application of FairPIVARA has led to a significant reduction of up to 98% in observed biases while promoting a more balanced word distribution within the model. Our model and code are available at: https://github.com/hiaac-nlp/FairPIVARA.

Submitted 4 October, 2024; v1 submitted 28 September, 2024; originally announced September 2024.

Comments: 14 pages, 10 figures. Accepted to 35th British Machine Vision Conference (BMVC 2024), Workshop on Privacy, Fairness, Accountability and Transparency in Computer Vision

arXiv:2409.11880 [pdf, other]

How does Goldene Stack?

Authors: Marcelo Lopes Pereira, Jr, Emanuel J. A. dos Santos, Luiz Antonio Ribeiro, Jr, Douglas Soares Galvão

Abstract: The recent synthesis of Goldene, a 2D atomic monolayer of gold, has opened new avenues in exploring novel materials. However, the question of when multilayer Goldene transitions into bulk gold remains unresolved. This study used density functional theory calculations to address this fundamental question. Our findings reveal that multilayer Goldene retains an AA-like stacking configuration of up to six layers, with no observation of Bernal-like stacking as seen in graphene. Goldene spontaneously transitions to a bulk-like gold structure at seven layers, adopting a rhombohedral (ABC-like) stacking characteristic of bulk face-centered cubic (FCC) gold. The atomic arrangement converges entirely to the bulk gold lattice for more than ten layers. Quantum confinement significantly impacts the electronic properties, with monolayer and bulk Goldene exhibiting a single Dirac cone at the X-point of the Brillouin zone. In contrast, multilayer Goldene shows two Dirac cones at the X- and Y-points. Additionally, monolayer Goldene exhibits anisotropic optical absorption, which is absent in bulk gold. This study provides a deeper understanding of multilayer Goldene's structural and electronic properties and stacked 2D materials in general.

Submitted 1 October, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

Comments: 12 pages

MSC Class: 00-XX ACM Class: J.2; I.6

arXiv:2403.01183 [pdf, other]

Leveraging Self-Supervised Learning for Scene Classification in Child Sexual Abuse Imagery

Authors: Pedro H. V. Valois, João Macedo, Leo S. F. Ribeiro, Jefersson A. dos Santos, Sandra Avila

Abstract: Crime in the 21st century is split into a virtual and real world. However, the former has become a global menace to people's well-being and security in the latter. The challenges it presents must be faced with unified global cooperation, and we must rely more than ever on automated yet trustworthy tools to combat the ever-growing nature of online offenses. Over 10 million child sexual abuse reports are submitted to the US National Center for Missing & Exploited Children every year, and over 80% originate from online sources. Therefore, investigation centers cannot manually process and correctly investigate all imagery. In light of that, reliable automated tools that can securely and efficiently deal with this data are paramount. In this sense, the scene classification task looks for contextual cues in the environment, being able to group and classify child sexual abuse data without requiring to be trained on sensitive material. The scarcity and limitations of working with child sexual abuse images lead to self-supervised learning, a machine-learning methodology that leverages unlabeled data to produce powerful representations that can be more easily transferred to downstream tasks. This work shows that self-supervised deep learning models pre-trained on scene-centric data can reach 71.6% balanced accuracy on our indoor scene classification task and, on average, 2.2 percentage points better performance than a fully supervised version.
We cooperate with Brazilian Federal Police experts to evaluate our indoor classification model on actual child abuse material. The results demonstrate a notable discrepancy between the features observed in widely used scene datasets and those depicted on sensitive materials.

Submitted 26 October, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

Comments: 13 pages, 5 figures, 4 tables. Under review

arXiv:2312.05327 [pdf, other]

doi 10.1109/MGRS.2024.3470986

Better, Not Just More: Data-Centric Machine Learning for Earth Observation

Authors: Ribana Roscher, Marc Rußwurm, Caroline Gevaert, Michael Kampffmeyer, Jefersson A. dos Santos, Maria Vakalopoulou, Ronny Hänsch, Stine Hansen, Keiller Nogueira, Jonathan Prexl, Devis Tuia

Abstract: Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning architectures and models have been proposed, the majority of them have been solely developed on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that a shift from a model-centric view to a complementary data-centric perspective is necessary for further improvements in accuracy, generalization ability, and real impact on end-user applications. Furthermore, considering the entire machine learning cycle, from problem definition to model deployment with feedback, is crucial for enhancing machine learning models that can be reliable in unforeseen situations. This work presents a definition as well as a precise categorization and overview of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples. These examples provide concrete steps to act on geospatial data with data-centric machine learning approaches.

Submitted 13 March, 2025; v1 submitted 8 December, 2023; originally announced December 2023.

Journal ref: IEEE Geoscience and Remote Sensing Magazine, vol. 12, no. 4, pp. 335-355, Dec. 2024

arXiv:2310.10423 [pdf, other]

YOLOv7 for Mosquito Breeding Grounds Detection and Tracking

Authors: Camila Laranjeira, Daniel Andrade, Jefersson A. dos Santos

Abstract: With the looming threat of climate change, neglected tropical diseases such as dengue, zika, and chikungunya have the potential to become an even greater global concern. Remote sensing technologies can aid in controlling the spread of Aedes Aegypti, the transmission vector of such diseases, by automating the detection and mapping of mosquito breeding sites, such that local entities can properly intervene. In this work, we leverage YOLOv7, a state-of-the-art and computationally efficient detection approach, to localize and track mosquito foci in videos captured by unmanned aerial vehicles. We experiment on a dataset released to the public as part of the ICIP 2023 grand challenge entitled Automatic Detection of Mosquito Breeding Grounds. We show that YOLOv7 can be directly applied to detect larger foci categories such as pools, tires, and water tanks and that a cheap and straightforward aggregation of frame-by-frame detection can incorporate time consistency into the tracking process.

Submitted 16 October, 2023; originally announced October 2023.

Comments: Winning paper of ICIP 2023 Grand Challenge - Automatic Detection of Mosquito Breeding Grounds - https://www02.smt.ufrj.br/~tvdigital/mosquito/challenge/

arXiv:2206.02714 [pdf, other]

FuSS: Fusing Superpixels for Improved Segmentation Consistency

Authors: Ian Nunes, Matheus B. Pereira, Hugo Oliveira, Jefersson A. Dos Santos, Marcus Poggi

Abstract: In this work, we propose two different approaches to improve the semantic consistency of Open Set Semantic Segmentation. First, we propose a method called OpenGMM that extends the OpenPCS framework using a Gaussian Mixture of Models to model the distribution of pixels for each class in a multimodal manner. The second approach is a post-processing step that uses superpixels to enforce highly homogeneous regions to behave equally, rectifying erroneously classified pixels within these regions; we also propose a novel superpixel method called FuSS. All tests were performed on the ISPRS Vaihingen and Potsdam datasets, and both methods were capable of improving quantitative and qualitative results for both datasets. Besides that, the post-process with FuSS achieved state-of-the-art results for both datasets. The official implementation is available at: https://github.com/iannunes/FuSS.

Submitted 6 June, 2022; originally announced June 2022.

Comments: submitted to IEEEACCESS. 19 pages

arXiv:2205.10592 [pdf, other]

Facing the Void: Overcoming Missing Data in Multi-View Imagery

Authors: Gabriel Machado, Keiller Nogueira, Matheus Barros Pereira, Jefersson Alex dos Santos

Abstract: In some scenarios, a single input image may not be enough to allow the object classification. In those cases, it is crucial to explore the complementary information extracted from images presenting the same object from multiple perspectives (or views) in order to enhance the general scene understanding and, consequently, increase the performance. However, this task, commonly called multi-view image classification, has a major challenge: missing data. In this paper, we propose a novel technique for multi-view image classification robust to this problem. The proposed method, based on state-of-the-art deep learning-based approaches and metric learning, can be easily adapted and exploited in other applications and domains. A systematic evaluation of the proposed algorithm was conducted using two multi-view aerial-ground datasets with very distinct properties. Results show that the proposed algorithm provides improvements in multi-view image classification accuracy when compared to state-of-the-art methods. Code available at https://github.com/Gabriellm2003/remote_sensing_missing_data.

Submitted 21 May, 2022; originally announced May 2022.

arXiv:2204.14110 [pdf, other]

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets

Authors: Camila Laranjeira, João Macedo, Sandra Avila, Jefersson A. dos Santos

Abstract: The online sharing and viewing of Child Sexual Abuse Material (CSAM) are growing fast, such that human experts can no longer handle the manual inspection. However, the automatic classification of CSAM is a challenging field of research, largely due to the inaccessibility of target data that is - and should forever be - private and in sole possession of law enforcement agencies. To aid researchers in drawing insights from unseen data and safely providing further understanding of CSAM images, we propose an analysis template that goes beyond the statistics of the dataset and respective labels. It focuses on the extraction of automatic signals, provided both by pre-trained machine learning models, e.g., object categories and pornography detection, as well as image metrics such as luminance and sharpness. Only aggregated statistics of sparse signals are provided to guarantee the anonymity of children and adolescents victimized. The pipeline allows filtering the data by applying thresholds to each specified signal and provides the distribution of such signals within the subset, correlations between signals, as well as a bias evaluation. We demonstrated our proposal on the Region-based annotated Child Pornography Dataset (RCPD), one of the few CSAM benchmarks in the literature, composed of over 2000 samples among regular and CSAM images, produced in partnership with Brazil's Federal Police.
Although noisy and limited in several senses, we argue that automatic signals can highlight important aspects of the overall distribution of data, which is valuable for databases that cannot be disclosed. Our goal is to safely publicize the characteristics of CSAM datasets, encouraging researchers to join the field and perhaps other institutions to provide similar reports on their benchmarks.

Submitted 29 April, 2022; originally announced April 2022.

Comments: FAccT 2022 - 5th Conference on Fairness, Accountability and Transparency

MSC Class: 68U99 ACM Class: J.4

arXiv:2203.01368 [pdf, other]

Conditional Reconstruction for Open-set Semantic Segmentation

Authors: Ian Nunes, Matheus B. Pereira, Hugo Oliveira, Jefersson A. dos Santos, Marcus Poggi

Abstract: Open set segmentation is a relatively new and unexplored task, with just a handful of methods proposed to model such tasks. We propose a novel method called CoReSeg that tackles the issue using class conditional reconstruction of the input images according to their pixelwise mask. Our method conditions each input pixel to all known classes, expecting higher errors for pixels of unknown classes. It was observed that the proposed method produces better semantic consistency in its predictions, resulting in cleaner segmentation maps that better fit object boundaries. CoReSeg outperforms state-of-the-art methods on the Vaihingen and Potsdam ISPRS datasets, while also being competitive on the Houston 2018 IEEE GRSS Data Fusion dataset. Official implementation for CoReSeg is available at: https://github.com/iannunes/CoReSeg.

Submitted 2 March, 2022; originally announced March 2022.

arXiv:2109.01693 [pdf, other]

Weakly Supervised Few-Shot Segmentation Via Meta-Learning

Authors: Pedro H. T. Gama, Hugo Oliveira, José Marcato Junior, Jefersson A. dos Santos

Abstract: Semantic segmentation is a classic computer vision task with multiple applications, which includes medical and remote sensing image analysis. Despite recent advances with deep-based approaches, labeling samples (pixels) for training models is laborious and, in some cases, unfeasible. In this paper, we present two novel meta learning methods, named WeaSeL and ProtoSeg, for the few-shot semantic segmentation task with sparse annotations. We conducted extensive evaluation of the proposed methods in different applications (12 datasets) in medical imaging and agricultural remote sensing, which are very distinct fields of knowledge and usually subject to data scarcity. The results demonstrated the potential of our method, achieving suitable results for segmenting both coffee/orange crops and anatomical parts of the human body in comparison with full dense annotation.

Submitted 3 September, 2021; originally announced September 2021.

arXiv:2108.11535 [pdf, other]

ChessMix: Spatial Context Data Augmentation for Remote Sensing Semantic Segmentation

Authors: Matheus Barros Pereira, Jefersson Alex dos Santos

Abstract: Labeling semantic segmentation datasets is a costly and laborious process if compared with tasks like image classification and object detection. This is especially true for remote sensing applications that not only work with extremely high spatial resolution data but also commonly require the knowledge of experts of the area to perform the manual labeling. Data augmentation techniques help to improve deep learning models under the circumstance of few and imbalanced labeled samples. In this work, we propose a novel data augmentation method focused on exploring the spatial context of remote sensing semantic segmentation. This method, ChessMix, creates new synthetic images from the existing training set by mixing transformed mini-patches across the dataset in a chessboard-like grid. ChessMix prioritizes patches with more examples of the rarest classes to alleviate the imbalance problems. The results in three diverse well-known remote sensing datasets show that this is a promising approach that helps to improve the networks' performance, working especially well in datasets with few available data. The results also show that ChessMix is capable of improving the segmentation of objects with few labeled pixels when compared to the most common data augmentation methods widely used.

Submitted 25 August, 2021; originally announced August 2021.

arXiv:2108.05476 [pdf, other]

Learning to Segment Medical Images from Few-Shot Sparse Labels

Authors: Pedro H. T. Gama, Hugo Oliveira, Jefersson A. dos Santos

Abstract: In this paper, we propose a novel approach for few-shot semantic segmentation with sparse labeled images. We investigate the effectiveness of our method, which is based on the Model-Agnostic Meta-Learning (MAML) algorithm, in the medical scenario, where the use of sparse labeling and few-shot can alleviate the cost of producing new annotated datasets. Our method uses sparse labels in the meta-training and dense labels in the meta-test, thus making the model learn to predict dense labels from sparse ones. We conducted experiments with four Chest X-Ray datasets to evaluate two types of annotations (grid and points). The results show that our method is the most suitable when the target domain highly differs from source domains, achieving Jaccard scores comparable to dense labels, using less than 2% of the pixels of an image with labels in few-shot scenarios.

Submitted 21 August, 2021; v1 submitted 11 August, 2021; originally announced August 2021.

arXiv:2105.10013 [pdf, other]

Opening Deep Neural Networks with Generative Models

Authors: Marcos Vendramini, Hugo Oliveira, Alexei Machado, Jefersson A. dos Santos

Abstract: Image classification methods are usually trained to perform predictions taking into account a predefined group of known classes. Real-world problems, however, may not allow for a full knowledge of the input and label spaces, making failures in recognition a hazard to deep visual learning. Open set recognition methods are characterized by the ability to correctly identify inputs of known and unknown classes. In this context, we propose GeMOS: simple and plug-and-play open set recognition modules that can be attached to pretrained Deep Neural Networks for visual recognition. The GeMOS framework pairs pre-trained Convolutional Neural Networks with generative models for open set recognition to extract open set scores for each sample, allowing for failure recognition in object recognition tasks. We conduct a thorough evaluation of the proposed method in comparison with state-of-the-art open set algorithms, finding that GeMOS either outperforms or is statistically indistinguishable from more complex and costly models.

Submitted 29 June, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

arXiv:2011.08325 [pdf, other]

A New Similarity Space Tailored for Supervised Deep Metric Learning

Authors: Pedro H. Barros, Fabiane Queiroz, Flavio Figueredo, Jefersson A. dos Santos, Heitor S. Ramos

Abstract: We propose a novel deep metric learning method. Unlike many works in this area, we define a novel latent space obtained through an autoencoder. The new space, namely S-space, is divided into different regions that describe the positions where pairs of objects are similar/dissimilar. We locate markers to identify these regions. We estimate the similarities between objects through a kernel-based Student's t-distribution to measure the distance between the markers and the new data representation. In our approach, we simultaneously estimate the markers' position in the S-space and represent the objects in the same space. Moreover, we propose a new regularization function to prevent similar markers from collapsing together. We present evidence that our proposal can represent complex spaces, for instance, when groups of similar objects are located in disjoint regions. We compare our proposal to 9 different distance metric learning approaches (four of them based on deep learning) on 28 real-world heterogeneous datasets. According to the four quantitative metrics used, our method outperforms all nine strategies from the literature.

Submitted 18 November, 2020; v1 submitted 16 November, 2020; originally announced November 2020.

Comments: 47 pages, 11 figures

arXiv:2011.05127 [pdf, other]

doi 10.3390/rs12142267

A Soft Computing Approach for Selecting and Combining Spectral Bands

Authors: Juan F. H. Albarracín, Rafael S. Oliveira, Marina Hirota, Jefersson A. dos Santos, Ricardo da S. Torres

Abstract: We introduce a soft computing approach for automatically selecting and combining indices from remote sensing multispectral images that can be used for classification tasks. The proposed approach is based on a Genetic-Programming (GP) framework, a technique successfully used in a wide variety of optimization problems. Through GP, it is possible to learn indices that maximize the separability of samples from two different classes. Once the indices specialized for all the pairs of classes are obtained, they are used in pixelwise classification tasks. We used the GP-based solution to evaluate complex classification problems, such as those that are related to the discrimination of vegetation types within and between tropical biomes. Using time series defined in terms of the learned spectral indices, we show that the GP framework leads to better results than other indices that are used to discriminate and classify tropical biomes.

Submitted 10 November, 2020; originally announced November 2020.

Comments: MDPI Remote Sensing - Special Issue "Current Limits and New Challenges and Opportunities in Soft Computing, Machine Learning and Computational Intelligence for Remote Sensing"

Journal ref: Remote Sens. 2020, 12(14), 2267

arXiv:2008.01133 [pdf, other]

AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification

Authors: Gabriel Machado, Edemir Ferreira, Keiller Nogueira, Hugo Oliveira, Pedro Gama, Jefersson A. dos Santos

Abstract: It is undeniable that aerial/satellite images can provide useful information for a large variety of tasks. But, since these images are always looking from above, some applications can benefit from complementary information provided by other perspective views of the scene, such as ground-level images. Despite a large number of public repositories for both georeferenced photographs and aerial images, there is a lack of benchmark datasets that allow the development of approaches that exploit the benefits and complementarity of aerial/ground imagery. In this paper, we present two new publicly available datasets named AiRound and CV-BrCT. The first one contains triplets of images from the same geographic coordinate with different perspectives of view extracted from various places around the world. Each triplet is composed of an aerial RGB image, a ground-level perspective image, and a Sentinel-2 sample. The second dataset contains pairs of aerial and street-level images extracted from southeast Brazil. We design an extensive set of experiments concerning multi-view scene classification, using early and late fusion. Such experiments were conducted to show that image classification can be enhanced using multi-view data.

Submitted 3 August, 2020; originally announced August 2020.

arXiv:2006.14673 [pdf, other]

Fully Convolutional Open Set Segmentation

Authors: Hugo Oliveira, Caio Silva, Gabriel L. S. Machado, Keiller Nogueira, Jefersson A. dos Santos

Abstract: In semantic segmentation, knowing about all existing classes is essential to yield effective results with the majority of existing approaches. However, these methods trained in a Closed Set of classes fail when new classes are found in the test phase. This means that they are not suitable for Open Set scenarios, which are very common in real-world computer vision and remote sensing applications. In this paper, we discuss the limitations of Closed Set segmentation and propose two fully convolutional approaches to effectively address Open Set semantic segmentation: OpenFCN and OpenPCS. OpenFCN is based on the well-known OpenMax algorithm, configuring a new application of this approach in segmentation settings. OpenPCS is a fully novel approach based on feature-space from DNN activations that serve as features for computing PCA and multi-variate gaussian likelihood in a lower dimensional space. Experiments were conducted on the well-known Vaihingen and Potsdam segmentation datasets. OpenFCN showed little-to-no improvement when compared to the simpler and much more time efficient SoftMax thresholding, while being some orders of magnitude slower. OpenPCS achieved promising results in almost all experiments by outperforming both OpenFCN and SoftMax thresholding. OpenPCS is also a reasonable compromise between the runtime performances of the extremely fast SoftMax thresholding and the extremely slow OpenFCN, being able to run close to real time. Experiments also indicate that OpenPCS is effective, robust and suitable for Open Set segmentation, being able to improve the recognition of unknown class pixels without reducing the accuracy on the known class pixels.

Submitted 25 June, 2020; originally announced June 2020.

Comments: Submitted to the Machine Learning Journal

arXiv:2003.07955 [pdf, other]

An End-to-end Framework For Low-Resolution Remote Sensing Semantic Segmentation

Authors: Matheus Barros Pereira, Jefersson Alex dos Santos

Abstract: High-resolution images for remote sensing applications are often not affordable or accessible, especially when in need of a wide temporal span of recordings. Given the easy access to low-resolution (LR) images from satellites, many remote sensing works rely on this type of data. The problem is that LR images are not appropriate for semantic segmentation, due to the need for high-quality data for accurate pixel prediction for this task. In this paper, we propose an end-to-end framework that unites a super-resolution and a semantic segmentation module in order to produce accurate thematic maps from LR inputs. It allows the semantic segmentation network to conduct the reconstruction process, modifying the input image with helpful textures. We evaluate the framework with three remote sensing datasets. The results show that the framework is capable of achieving a semantic segmentation performance close to native high-resolution data, while also surpassing the performance of a network trained with LR inputs.

Submitted 17 March, 2020; originally announced March 2020.

arXiv:2003.07948 [pdf, other]

BrazilDAM: A Benchmark dataset for Tailings Dam Detection

Authors: Edemir Ferreira, Matheus Brito, Remis Balaniuk, Mário S. Alvim, Jefersson A. dos Santos

Abstract: In this work we present BrazilDAM, a novel public dataset based on Sentinel-2 and Landsat-8 satellite images covering all tailings dams cataloged by the Brazilian National Mining Agency (ANM). The dataset was built using georeferenced images from 769 dams, recorded between 2016 and 2019. The time series were processed in order to produce cloud-free images. The dams contain mining waste from different ore categories and have highly varying shapes, areas and volumes, making BrazilDAM particularly interesting and challenging to be used in machine learning benchmarks. The original catalog contains, besides the dam coordinates, information about the main ore, constructive method, risk category, and associated potential damage. To evaluate BrazilDAM's predictive potential we performed classification experiments using state-of-the-art deep Convolutional Neural Networks (CNNs). In the experiments, we achieved an average classification accuracy of 94.11% in the tailings dam binary classification task. In addition, four other experimental setups were run using the complementary information from the original catalog, exhaustively exploiting the capacity of the proposed dataset.

Submitted 13 May, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

arXiv:2001.10063 [pdf, other]

Towards Open-Set Semantic Segmentation of Aerial Images

Authors: Caio C. V. da Silva, Keiller Nogueira, Hugo N. Oliveira, Jefersson A. dos Santos

Abstract: Classical and, more recently, deep computer vision methods are optimized for visible spectrum images, commonly encoded in grayscale or RGB colorspaces and acquired from smartphones or cameras. A less common source of images, exploited in the remote sensing field, is satellite and aerial imagery. However, the development of pattern recognition approaches for these data is relatively recent, mainly due to the limited availability of this type of image, as until recently they were used exclusively for military purposes. Access to aerial imagery, including spectral information, has been increasing mainly due to the low cost of drones, cheapening of imaging satellite launch costs, and novel public datasets. Usually remote sensing applications employ computer vision techniques strictly modeled for classification tasks in closed set scenarios. However, real-world tasks rarely fit into closed set contexts, frequently presenting previously unknown classes, characterizing them as open set scenarios. Focusing on this problem, this is the first paper to study and develop semantic segmentation techniques for open set scenarios applied to remote sensing images. The main contributions of this paper are: 1) a discussion of related works in open set semantic segmentation, showing evidence that these techniques can be adapted for open set remote sensing tasks; 2) the development and evaluation of a novel approach for open set semantic segmentation. Our method yielded competitive results when compared to closed set methods for the same dataset.

Submitted 27 January, 2020; originally announced January 2020.

arXiv:1907.13025 [pdf, other]

SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition

Authors: Carlos Caetano, Jessica Sena, François Brémond, Jefersson A. dos Santos, William Robson Schwartz

Abstract: Due to the availability of large-scale skeleton datasets, 3D human action recognition has recently attracted the attention of the computer vision community. Many works have focused on encoding skeleton data as skeleton image representations based on the spatial structure of the skeleton joints, in which the temporal dynamics of the sequence is encoded as variations in columns and the spatial structure of each frame is represented as rows of a matrix. To further improve such representations, we introduce a novel skeleton image representation to be used as input of Convolutional Neural Networks (CNNs), named SkeleMotion. The proposed approach encodes the temporal dynamics by explicitly computing the magnitude and orientation values of the skeleton joints. Different temporal scales are employed to compute motion values, aggregating more temporal dynamics into the representation and making it able to capture long-range joint interactions involved in actions as well as to filter noisy motion values. Experimental results demonstrate the effectiveness of the proposed representation on 3D action recognition, outperforming the state-of-the-art on the NTU RGB+D 120 dataset.

Submitted 30 July, 2019; originally announced July 2019.

Comments: 16-th IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS2019)

arXiv:1906.01751 [pdf, other]

An Introduction to Deep Morphological Networks

Authors: Keiller Nogueira, Jocelyn Chanussot, Mauro Dalla Mura, Jefersson A. dos Santos

Abstract: The recent impressive results of deep learning-based methods on computer vision applications brought fresh air to the research and industrial community. This success is mainly due to the process that allows those methods to learn data-driven features, generally based upon linear operations. However, in some scenarios, such operations do not perform well because of their inherent process that blurs edges, losing notions of corners, borders, and geometry of objects. To overcome this, non-linear operations, such as morphological ones, may preserve such properties of the objects, being preferable and even state-of-the-art in some applications. Encouraged by this, in this work, we propose a novel network, called Deep Morphological Network (DeepMorphNet), capable of doing non-linear morphological operations while performing the feature learning process by optimizing the structuring elements. DeepMorphNets can be trained and optimized end-to-end using traditional existing techniques commonly employed in the training of deep learning approaches. A systematic evaluation of the proposed algorithm is conducted using two synthetic and two traditional image classification datasets. Results show that the proposed DeepMorphNets are a promising technique that can learn distinct features when compared to the ones learned by current deep learning methods.

Submitted 9 July, 2021; v1 submitted 4 June, 2019; originally announced June 2019.

arXiv:1903.00774 [pdf, other]

doi 10.1109/LGRS.2019.2903194

Spatio-Temporal Vegetation Pixel Classification By Using Convolutional Networks

Authors: Keiller Nogueira, Jefersson A. dos Santos, Nathalia Menini, Thiago S. F. Silva, Leonor Patricia C. Morellato, Ricardo da S. Torres

Abstract: Plant phenology studies rely on long-term monitoring of life cycles of plants. High-resolution unmanned aerial vehicles (UAVs) and near-surface technologies have been used for plant monitoring, demanding the creation of methods capable of locating and identifying plant species through time and space. However, this is a challenging task given the high volume of data, the data constantly missing from temporal datasets, the heterogeneity of temporal profiles, the variety of plant visual patterns, and the unclear definition of individuals' boundaries in plant communities. In this letter, we propose a novel method, suitable for phenological monitoring, based on Convolutional Networks (ConvNets) to perform spatio-temporal vegetation pixel-classification on high-resolution images. We conducted a systematic evaluation using high-resolution vegetation image datasets associated with the Brazilian Cerrado biome. Experimental results show that the proposed approach is effective, outperforming other spatio-temporal pixel-classification strategies.

Submitted 2 March, 2019; originally announced March 2019.

arXiv:1901.05553 [pdf, other]

Truly Generalizable Radiograph Segmentation with Conditional Domain Adaptation

Authors: Hugo Oliveira, Edemir Ferreira, Jefersson A. dos Santos

Abstract: Digitization techniques for biomedical images yield different visual patterns in radiological exams. These differences may hamper the use of data-driven approaches for inference over these images, such as Deep Neural Networks. Another noticeable difficulty in this field is the lack of labeled data, even though in many cases there is an abundance of unlabeled data available. Therefore an important step in improving the generalization capabilities of these methods is to perform Unsupervised and Semi-Supervised Domain Adaptation between different datasets of biomedical images. In order to tackle this problem, in this work we propose an Unsupervised and Semi-Supervised Domain Adaptation method for segmentation of biomedical images using Generative Adversarial Networks for Unsupervised Image Translation. We merge these unsupervised networks with supervised deep semantic segmentation architectures in order to create a semi-supervised method capable of learning from both unlabeled and labeled data, whenever labeling is available. We compare our method using several domains, datasets, segmentation tasks and traditional baselines, such as unsupervised distance-based methods and reusing pretrained models both with and without Fine-tuning. We perform both quantitative and qualitative analysis of the proposed method and baselines in the distinct scenarios considered in our experimental evaluation. The proposed method shows consistently better results than the baselines in scarce labeled data scenarios, achieving Jaccard values greater than 0.9 and good segmentation quality in most tasks. Unsupervised Domain Adaptation results were observed to be close to the Fully Supervised Domain Adaptation used in the traditional procedure of Fine-tuning pretrained networks.

Submitted 6 December, 2019; v1 submitted 16 January, 2019; originally announced January 2019.

arXiv:1806.02400 [pdf, other]

A Comparative Study on Unsupervised Domain Adaptation Approaches for Coffee Crop Mapping

Authors: Edemir Ferreira, Mário S. Alvim, Jefersson A. dos Santos

Abstract: In this work, we investigate the application of existing unsupervised domain adaptation (UDA) approaches to the task of transferring knowledge between crop regions having different coffee patterns. Given a geographical region with fully mapped coffee plantations, we observe that this knowledge can be used to train a classifier and to map a new county with no need of samples indicated in the target region. Experimental results show that transferring knowledge via some UDA strategies performs better than just applying a classifier trained in a region to predict coffee crops in a new one. However, UDA methods may lead to negative transfer, which may indicate that the domains are so different that transferring knowledge is not appropriate. We also verify that normalization significantly affects some UDA methods; we observe a meaningful complementary contribution between coffee crops data; and a visual behavior suggests the existence of a cluster of samples that are more likely to be drawn from specific data.

Submitted 6 June, 2018; originally announced June 2018.

arXiv:1804.04020 [pdf, other]

doi 10.1109/TGRS.2019.2913861

Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks

Authors: Keiller Nogueira, Mauro Dalla Mura, Jocelyn Chanussot, William R. Schwartz, Jefersson A. dos Santos

Abstract: Semantic segmentation requires methods capable of learning high-level features while dealing with large volumes of data. Toward this goal, Convolutional Networks can learn specific and adaptable features based on the data. However, these networks are not capable of processing a whole remote sensing image, given its huge size. To overcome this limitation, the image is processed using fixed-size patches. The definition of the input patch size is usually performed empirically (evaluating several sizes) or imposed (by network constraint). Both strategies suffer from drawbacks and may not lead to the best patch size. To alleviate this problem, several works exploited multi-context information by combining networks or layers. This process increases the number of parameters, resulting in a model that is more difficult to train. In this work, we propose a novel technique to perform semantic segmentation of remote sensing images that exploits a multi-context paradigm without increasing the number of parameters while defining, at training time, the best patch size. The main idea is to train a dilated network with distinct patch sizes, allowing it to capture multi-context characteristics from heterogeneous contexts. While processing these varying patches, the network provides a score for each patch size, helping in the definition of the best size for the current scenario. A systematic evaluation of the proposed algorithm is conducted using four high-resolution remote sensing datasets with very distinct properties. Our results show that the proposed algorithm provides improvements in pixelwise classification accuracy when compared to state-of-the-art methods.

Submitted 22 April, 2019; v1 submitted 11 April, 2018; originally announced April 2018.

Comments: Accepted to Transactions on Geoscience & Remote Sensing (TGRS)

arXiv:1711.06809 [pdf, other]

A Genetic Algorithm Approach for Image Representation Learning through Color Quantization

Authors: Érico M. Pereira, Ricardo da S. Torres, Jefersson A. dos Santos

Abstract: Over the last decades, hand-crafted feature extractors have been used to encode image visual properties into feature vectors. Recently, data-driven feature learning approaches have been successfully explored as alternatives for producing more representative visual features. In this work, we combine both research avenues, focusing on the color quantization problem. We propose two data-driven approaches to learn image representations through the search for optimized quantization schemes, which lead to more effective feature extraction algorithms and compact representations. Our strategy employs a Genetic Algorithm, a soft-computing apparatus successfully utilized in Information-retrieval-related optimization problems. We hypothesize that changing the quantization affects the quality of image description approaches, leading to effective and efficient representations. We evaluate our approaches in content-based image retrieval tasks, considering eight well-known datasets with different visual properties. Results indicate that the approach focused on representation effectiveness outperformed baselines in all tested scenarios. The other approach, which also considers the size of created representations, produced competitive results while keeping or even reducing the dimensionality of feature vectors by up to 25%.

Submitted 20 November, 2020; v1 submitted 17 November, 2017; originally announced November 2017.

Comments: Submitted to Multimedia Tools and Applications

Report number: MTAP-D-19-02724R2

arXiv:1711.03564 [pdf, other]

doi 10.1109/LGRS.2018.2845549

Exploiting ConvNet Diversity for Flooding Identification

Authors: Keiller Nogueira, Samuel G. Fadel, Ícaro C. Dourado, Rafael de O. Werneck, Javier A. V. Muñoz, Otávio A. B. Penatti, Rodrigo T. Calumby, Lin Tzy Li, Jefersson A. dos Santos, Ricardo da S. Torres

Abstract: Flooding is the world's most costly type of natural disaster in terms of both economic losses and human casualties. A first and essential procedure towards flood monitoring is based on identifying the area most vulnerable to flooding, which gives authorities relevant regions to focus on. In this work, we propose several methods to perform flooding identification in high-resolution remote sensing images using deep learning. Specifically, some proposed techniques are based upon unique networks, such as dilated and deconvolutional ones, while another was conceived to exploit the diversity of distinct networks in order to extract the maximum performance of each classifier. Evaluation of the proposed algorithms was conducted on a high-resolution remote sensing dataset. Results show that the proposed algorithms outperformed several state-of-the-art baselines, providing improvements ranging from 1 to 4% in terms of the Jaccard Index.

Submitted 5 June, 2018; v1 submitted 9 November, 2017; originally announced November 2017.

Comments: Winning work of the Flood-Detection in Satellite Images subtask of the 2017 Multimedia Satellite Task (MediaEval Benchmark). Accepted for publication in the Geoscience and Remote Sensing Letters (GRSL).

arXiv:1708.06637 [pdf, other]

Activity Recognition based on a Magnitude-Orientation Stream Network

Authors: Carlos Caetano, Victor H. C. de Melo, Jefersson A. dos Santos, William Robson Schwartz

Abstract: The temporal component of videos provides an important clue for activity recognition, as a number of activities can be reliably recognized based on motion information. In view of that, this work proposes a novel temporal stream for two-stream convolutional networks based on images computed from the optical flow magnitude and orientation, named Magnitude-Orientation Stream (MOS), to learn the motion in a better and richer manner. Our method applies simple nonlinear transformations to the vertical and horizontal components of the optical flow to generate input images for the temporal stream. Experiments carried out on two well-known datasets (HMDB51 and UCF101) demonstrate that using our proposed temporal stream as input to existing neural network architectures can improve their performance for activity recognition. Results demonstrate that our temporal stream provides complementary information able to improve the classical two-stream methods, indicating the suitability of our approach as a temporal video representation.

Submitted 22 August, 2017; originally announced August 2017.

Comments: 8 pages, SIBGRAPI 2017

arXiv:1611.02260 [pdf]

Meat adulteration detection through digital image analysis of histological cuts using LBP

Authors: João J. de Macedo Neto, Jefersson A. dos Santos, William Robson Schwartz

Abstract: Food fraud has been an area of great concern due to its risk to public health, its reduction of food quality or nutritional value, and its economic consequences. For this reason, it has been the object of regulation in many countries (e.g. [1], [2]). One type of food that has frequently been the object of fraud through the addition of water or an aqueous solution is bovine meat. The traditional methods used to detect this kind of fraud are expensive, time-consuming, and depend on physicochemical analyses that require complex laboratory techniques specific to each added substance. In this paper, based on digital images of histological cuts of adulterated and non-adulterated (normal) bovine meat, we evaluate the use of digital image analysis methods to identify the aforementioned kind of fraud, with a focus on the Local Binary Pattern (LBP) algorithm.

Submitted 7 November, 2016; originally announced November 2016.

arXiv:1602.01517 [pdf, other]

doi 10.1016/j.patcog.2016.07.001

Towards Better Exploiting Convolutional Neural Networks for Remote Sensing Scene Classification

Authors: Keiller Nogueira, Otávio A. B. Penatti, Jefersson A. dos Santos

Abstract: We present an analysis of three possible strategies for exploiting the power of existing convolutional neural networks (ConvNets) in scenarios different from the ones they were trained on: full training, fine tuning, and using ConvNets as feature extractors. In many applications, especially remote sensing, it is not feasible to fully design and train a new ConvNet, as this usually requires a considerable amount of labeled data and demands high computational costs. Therefore, it is important to understand how to get the most out of existing ConvNets. We perform experiments with six popular ConvNets using three remote sensing datasets. We also compare ConvNets in each strategy with existing descriptors and with state-of-the-art baselines. Results indicate that fine tuning tends to be the best performing strategy. In fact, using the features from the fine-tuned ConvNet with a linear SVM obtains the best results. We also achieved state-of-the-art results for the three datasets used.

Submitted 3 February, 2016; originally announced February 2016.

Showing 1–35 of 35 results for author: Santos, J A d