-
YOLO-LAN: Precise Polyp Detection via Optimized Loss, Augmentations and Negatives
Authors:
Siddharth Gupta,
Jitin Singla
Abstract:
Colorectal cancer (CRC), a lethal disease, begins with the abnormal proliferation of mucosal cells in the inner wall of the colon, forming growths called polyps. When left undetected, polyps can become malignant tumors. Colonoscopy is the standard procedure for detecting polyps, as it enables direct visualization and removal of suspicious lesions. However, manual detection during colonoscopy can be inconsistent and is prone to oversight. Object detection based on deep learning therefore offers a better solution for more accurate, real-time diagnosis during colonoscopy. In this work, we propose YOLO-LAN, a YOLO-based polyp detection pipeline trained using M2IoU loss, versatile data augmentations and negative data to replicate real clinical situations. Our pipeline outperformed existing methods on the Kvasir-seg and BKAI-IGH NeoPolyp datasets, achieving mAP$_{50}$ of 0.9619 and mAP$_{50:95}$ of 0.8599 with YOLOv12, and mAP$_{50}$ of 0.9540 and mAP$_{50:95}$ of 0.8487 with YOLOv8 on the Kvasir-seg dataset. The most significant gain is in the mAP$_{50:95}$ score, reflecting precise polyp localization. We also show robustness across polyp sizes and precise location detection, making the pipeline clinically relevant for AI-assisted colorectal screening.
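The abstract singles out mAP$_{50:95}$ as the score that reflects localization precision. The minimal Python sketch below illustrates why: mAP$_{50}$ accepts a detection at a single IoU threshold of 0.5, whereas mAP$_{50:95}$ averages over ten thresholds from 0.50 to 0.95, so only tightly fitting boxes score well across all of them. This is not the YOLO-LAN evaluation code; it simply counts how many of the ten thresholds would accept one predicted box, a simplification of the full per-threshold AP computation. The `(x1, y1, x2, y2)` box format and the function names are assumptions.

```python
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2) corner coordinates in pixels (assumed format).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union (IoU) of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP50:95: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred: Box, gt: Box) -> int:
    """How many of the ten thresholds would accept `pred` as a match for `gt`."""
    score = iou(pred, gt)
    return sum(score >= t for t in THRESHOLDS)


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    print(thresholds_passed((0, 0, 10, 8), gt))  # IoU 0.8
    print(thresholds_passed((0, 0, 10, 6), gt))  # IoU 0.6
```

Both example boxes count as correct under mAP$_{50}$, but the IoU 0.8 box is accepted at 7 of the 10 thresholds and the IoU 0.6 box at only 3, which is why a gain in mAP$_{50:95}$ points to tighter localization.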
Submitted 23 September, 2025;
originally announced September 2025.
-
Population Estimation using Deep Learning over Gandhinagar Urban Area
Authors:
Jai Singla,
Peal Jotania,
Keivalya Pandya
Abstract:
Population estimation is crucial for various applications, from resource allocation to urban planning. Traditional methods such as surveys and censuses are expensive, time-consuming and heavily dependent on human resources, requiring significant manpower for data collection and processing. In this study, a deep learning solution is proposed to estimate population using high-resolution (0.3 m) satellite imagery, Digital Elevation Models (DEMs) of 0.5 m resolution and vector boundaries. The proposed method combines a Convolutional Neural Network (CNN) that classifies buildings as residential or non-residential with an Artificial Neural Network (ANN) that estimates the population. Approximately 48k building footprints over the Gandhinagar urban area, both residential and non-residential, are used, and the residential buildings are further used for building-level population estimation. Experimental results on a large-scale dataset demonstrate the effectiveness of our model, which achieves an overall F1-score of 0.9936. The proposed system applies high-resolution geospatial analysis to estimate the population of Gandhinagar at 278,954. By integrating real-time data updates, standardized metrics and infrastructure planning capabilities, this automated approach addresses critical limitations of conventional census-based methodologies. The framework provides municipalities with a scalable and replicable tool for optimized resource management in rapidly urbanizing cities, showcasing the efficiency of AI-driven geospatial analytics in enhancing data-driven urban governance.
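The classification stage is reported as an F1-score. As a reminder of what that number combines, here is a minimal Python sketch of the binary F1-score (harmonic mean of precision and recall) for the residential class. The label encoding (1 = residential, 0 = non-residential) and the toy labels are illustrative assumptions, not data from the study, and the study's "overall" F1 may be averaged over classes differently.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1-score: harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: 1 = residential, 0 = non-residential.
    truth = [1, 1, 1, 0, 0]
    preds = [1, 1, 0, 0, 1]
    print(round(f1_score(truth, preds), 4))
```

In this toy case precision and recall are both 2/3, so F1 is also 2/3; a score of 0.9936 requires both precision and recall to be very close to 1.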
Submitted 16 September, 2025;
originally announced September 2025.
-
Comparative Evaluation of Traditional and Deep Learning Feature Matching Algorithms using Chandrayaan-2 Lunar Data
Authors:
R. Makharia,
J. G. Singla,
Amitabh,
N. Dube,
H. Sharma
Abstract:
Accurate image registration is critical for lunar exploration, enabling surface mapping, resource localization, and mission planning. Aligning data from diverse lunar sensors -- optical (e.g., Orbital High Resolution Camera, Narrow and Wide Angle Cameras), hyperspectral (Imaging Infrared Spectrometer), and radar (e.g., Dual-Frequency Synthetic Aperture Radar, Selene/Kaguya mission) -- is challenging due to differences in resolution, illumination, and sensor distortion. We evaluate five feature matching algorithms: SIFT, ASIFT, AKAZE, RIFT2, and SuperGlue (a deep learning-based matcher), using cross-modality image pairs from equatorial and polar regions. A preprocessing pipeline is proposed, including georeferencing, resolution alignment, intensity normalization, and enhancements like adaptive histogram equalization, principal component analysis, and shadow correction. SuperGlue consistently yields the lowest root mean square error and fastest runtimes. Classical methods such as SIFT and AKAZE perform well near the equator but degrade under polar lighting. The results highlight the importance of preprocessing and learning-based approaches for robust lunar image registration across diverse conditions.
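The matchers are compared by root mean square error (RMSE). The sketch below shows one common way such an error can be computed: fit a 2D affine transform to the matched keypoints by least squares, then take the RMSE of the residuals after warping. The affine model and the function names are assumptions for illustration; the study may use a different transform model or error definition.

```python
import numpy as np


def fit_affine(src_pts, dst_pts) -> np.ndarray:
    """Least-squares 2x3 affine transform mapping src_pts onto dst_pts (needs >= 3 non-collinear matches)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    homog = np.hstack([src, np.ones((len(src), 1))])  # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(homog, dst, rcond=None)  # shape (3, 2)
    return params.T  # shape (2, 3)


def registration_rmse(src_pts, dst_pts, affine) -> float:
    """Root mean square of the Euclidean residuals after warping src_pts with `affine`."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    homog = np.hstack([src, np.ones((len(src), 1))])
    mapped = homog @ np.asarray(affine, dtype=float).T
    residuals = np.linalg.norm(mapped - dst, axis=1)
    return float(np.sqrt(np.mean(residuals ** 2)))


if __name__ == "__main__":
    # Hypothetical matched keypoints (pixels) between two lunar images.
    src = np.array([[0, 0], [100, 0], [0, 100], [100, 100], [50, 30]], dtype=float)
    true_affine = np.array([[1.01, 0.02, 5.0], [-0.02, 0.99, -3.0]])
    dst = np.hstack([src, np.ones((5, 1))]) @ true_affine.T
    dst[4] += [1.5, -1.0]  # one noisy match
    A = fit_affine(src, dst)
    print(registration_rmse(src, dst, A))
```

Fitting by least squares over all matches means a few poor correspondences inflate the RMSE; in practice an outlier-rejection step such as RANSAC is usually applied before the final fit.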
Submitted 4 September, 2025;
originally announced September 2025.
-
Examining quality of DGNSS derived positioning in data in urban city -- A case study of an urban city in India
Authors:
Jai G Singla
Abstract:
GNSS observations are mainly carried out in either static mode (Differential Global Navigation Satellite System, DGNSS) or dynamic mode (Real-Time Kinematics, RTK). The RTK mode of observation is useful for navigation, whereas the static/DGNSS/DGPS mode is recommended for determining very precise positions. In this study, we examine the quality of a DGNSS survey of an urban city in India over ~300 Ground Control Points (GCPs). The survey was carried out in DGNSS mode with dual-frequency observations. All observations were recorded using GPS, GLONASS, Galileo and BeiDou, with GDOP values in the range of 1.4 to 2.5. BeiDou was used in broadcast-ephemeris mode, whereas for the other constellations, precise orbit ephemerides were obtained from the International GNSS Service (IGS) site for the corresponding observation day and month. All data were then post-processed in the software suite, and planimetric and vertical accuracies ranging from the millimeter to the few-centimeter level were obtained. This paper describes the approach to GCP identification, the surveying methodology, the use of the CORS network and the data post-processing used to achieve this level of accuracy in the urban city.
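The survey quotes GDOP values of 1.4 to 2.5. The sketch below computes GDOP from satellite geometry using the standard textbook definition, $\mathrm{GDOP} = \sqrt{\operatorname{tr}\left((G^\top G)^{-1}\right)}$, where each row of $G$ is the negated unit line-of-sight vector plus a 1 for the receiver clock term. It is an illustration only: it assumes a single common clock term (a real multi-constellation solution such as GPS + GLONASS + Galileo + BeiDou usually estimates inter-system biases as extra unknowns), and the satellite coordinates are made up.

```python
import numpy as np


def gdop(receiver_xyz, satellite_xyz) -> float:
    """Geometric dilution of precision for one receiver and >= 4 satellites.

    Builds the design matrix G with rows [-e_x, -e_y, -e_z, 1], where e is the unit
    line-of-sight vector from receiver to satellite, and returns sqrt(trace((G^T G)^-1)).
    Assumes a single receiver clock term (no inter-system biases).
    """
    r = np.asarray(receiver_xyz, dtype=float)
    sats = np.asarray(satellite_xyz, dtype=float)
    los = sats - r
    unit = los / np.linalg.norm(los, axis=1, keepdims=True)
    G = np.hstack([-unit, np.ones((len(sats), 1))])
    Q = np.linalg.inv(G.T @ G)  # cofactor matrix of (x, y, z, clock)
    return float(np.sqrt(np.trace(Q)))


if __name__ == "__main__":
    receiver = [0.0, 0.0, 0.0]  # local Cartesian frame in metres (illustrative geometry)
    sats = [
        [0.0, 0.0, 2.0e7],
        [2.0e7, 0.0, 1.0e7],
        [-1.0e7, 1.7e7, 1.0e7],
        [-1.0e7, -1.7e7, 1.0e7],
    ]
    print(gdop(receiver, sats))
    print(gdop(receiver, sats + [[0.0, 2.0e7, 5.0e6]]))  # one extra satellite
```

Adding a satellite can only add information to $G^\top G$, so GDOP never increases; this is one reason multi-constellation observations help keep GDOP low in urban canyons.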
Submitted 8 November, 2024;
originally announced November 2024.
-
Semantic segmentation on multi-resolution optical and microwave data using deep learning
Authors:
Jai G Singla,
Bakul Vaghela
Abstract:
Presently, deep learning and convolutional neural networks (CNNs) are widely used in image processing, image classification, object identification and many other fields. In this work, we implemented a CNN-based modified U-Net model and a VGG-UNet model to automatically identify objects in satellite imagery captured by high-resolution Indian remote sensing satellites, and then to classify the satellite data pixel-wise into various classes. Cartosat-2S (~1 m spatial resolution) datasets were used, and deep learning models were implemented to detect building shapes and ships in the test datasets with an accuracy of more than 95%. In another experiment, microwave data (of varying resolution) from RISAT-1 were taken as input, and ships and trees were detected with an accuracy of >96%. For multi-class classification, a deep learning model was trained on multispectral Cartosat images, and the model-generated results were tested against ground truth. Multi-label classification results were obtained with an accuracy (IoU) of better than 95%. In total, six different problems were attempted using deep learning models, and IoU accuracies in the range of 85% to 98% were achieved depending on the degree of complexity.
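Since the segmentation results are reported as IoU accuracies, here is a minimal Python sketch of pixel-wise, per-class IoU for label maps, with the mean over classes (mIoU). The class encoding (0 = background, 1 = building) and the 2x2 maps are illustrative assumptions; the paper's exact averaging scheme is not stated.

```python
import numpy as np


def per_class_iou(gt, pred, num_classes: int) -> list:
    """Pixel-wise IoU for each class label 0..num_classes-1 (NaN if a class is absent from both maps)."""
    gt = np.asarray(gt)
    pred = np.asarray(pred)
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(gt == c, pred == c).sum()
        union = np.logical_or(gt == c, pred == c).sum()
        ious.append(float(inter / union) if union else float("nan"))
    return ious


if __name__ == "__main__":
    gt = [[0, 0], [1, 1]]    # hypothetical: 0 = background, 1 = building
    pred = [[0, 1], [1, 1]]
    ious = per_class_iou(gt, pred, num_classes=2)
    print(ious, np.nanmean(ious))  # mean IoU over the classes present
```

Unlike plain pixel accuracy, IoU penalizes both missed and spurious pixels of a class, which is why it is the stricter measure for thin or small objects such as ships.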
Submitted 12 November, 2024;
originally announced November 2024.
-
Classification of residential and non-residential buildings based on satellite data using deep learning
Authors:
Jai G Singla
Abstract:
Accurate classification of buildings into residential and non-residential categories is crucial for urban planning, infrastructure development, population estimation and resource allocation. Classifying residential and non-residential buildings manually from satellite data is a complex task. In this paper, we propose a novel deep learning approach that combines high-resolution satellite data (50 cm resolution image + 1 m grid interval DEM) and vector data to achieve high-performance building classification. Our architecture uses LeakyReLU and ReLU activations to capture nonlinearities in the data and employs feature-engineering techniques to eliminate highly correlated features, improving computational efficiency. Experimental results on a large-scale dataset demonstrate the effectiveness of our model, which achieves an overall F1-score of 0.9936. The proposed approach offers a scalable and accurate solution for building classification, enabling informed decision-making in urban planning and resource allocation. This research contributes to the field of urban analysis by providing a valuable tool for understanding the built environment and optimizing resource utilization.
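The abstract mentions eliminating highly correlated features without naming the method. One simple, common option is a greedy filter on the absolute Pearson correlation matrix, sketched below in Python. The 0.95 threshold and the keep-the-first-column rule are assumptions for illustration, not the authors' procedure.

```python
import numpy as np


def drop_correlated(X, threshold: float = 0.95) -> list:
    """Greedily keep columns of X whose |Pearson r| with every already-kept column is below `threshold`.

    Returns the indices of the kept columns. Assumes no constant columns
    (their correlation is undefined).
    """
    corr = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


if __name__ == "__main__":
    a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. a footprint-area feature (hypothetical)
    b = np.array([5.0, 3.0, 4.0, 1.0, 2.0])   # e.g. a height feature (hypothetical)
    X = np.column_stack([a, 2 * a + 1, b])    # column 1 is a linear copy of column 0
    print(drop_correlated(X))
```

Column 1 is perfectly correlated with column 0 and is dropped, while column 2 (|r| = 0.8 with column 0) is kept. Because the scan is greedy, the column order decides which member of a correlated pair survives.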
Submitted 11 November, 2024;
originally announced November 2024.
-
Population estimation using 3D city modelling and Carto2S datasets -- A case study
Authors:
Jai G Singla
Abstract:
With the launch of the Carto2S series of satellites, high-resolution images (0.6-1.0 m) are being acquired and are available for use. High-resolution Digital Elevation Models (DEMs) with better accuracy can be generated using C2S multi-view and multi-date datasets. These DEMs are further used as input to derive Digital Terrain Models (DTMs) and to extract accurate heights of objects (buildings and trees) on the Earth's surface. Extracted building heights are validated with ground control points and can be used for city modelling and for resource estimation such as population estimation, health planning, and water and transport resource estimation. In this study, an attempt is made to assess the population of a township using high-resolution Indian remote sensing satellite datasets. We used Carto2S multi-view data to generate a precise DEM and DTM over a city area. Using the DEM and DTM datasets, accurate building heights are extracted and then validated with ground data. These building heights and high-resolution imagery are used to generate an accurate virtual 3D city model and to assess the number of floors and the carpet area of houses/flats/apartments. The population of the area is estimated from the number of houses/flats/apartments derived from the satellite datasets. Further, information about the number of hospitals and schools around the residential area is extracted from OpenStreetMap (OSM). Population estimation using satellite data and information derived from OSM can prove to be a very good tool for local administrators and decision makers.
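The height-to-population chain described above can be sketched as follows. Building height is taken as the median of the normalized surface model (DEM minus DTM) inside a footprint, floors are estimated from an assumed storey height, and population follows from dwelling units. The 3 m storey height, `units_per_floor` and `persons_per_unit` are hypothetical parameters introduced for illustration; the study derives unit counts from its own 3D city model.

```python
import numpy as np


def building_floors(dem, dtm, footprint_mask, floor_height_m: float = 3.0):
    """Return (building height in m, estimated floor count) for one footprint.

    Height is the median of DEM - DTM over the footprint pixels; the storey height
    of 3 m is an assumed default, not a value from the study.
    """
    ndsm = np.asarray(dem, dtype=float) - np.asarray(dtm, dtype=float)
    heights = ndsm[np.asarray(footprint_mask, dtype=bool)]
    height = float(np.median(heights))
    return height, max(1, int(round(height / floor_height_m)))


def estimate_population(floors: int, units_per_floor: int, persons_per_unit: float) -> float:
    """Hypothetical occupancy model: floors x dwelling units per floor x persons per unit."""
    return floors * units_per_floor * persons_per_unit


if __name__ == "__main__":
    dem = [[112.0, 112.0], [112.0, 100.0]]  # surface elevations (m), illustrative
    dtm = [[100.0, 100.0], [100.0, 100.0]]  # bare-earth elevations (m)
    mask = [[1, 1], [1, 0]]                 # pixels belonging to the building footprint
    height, floors = building_floors(dem, dtm, mask)
    print(height, floors, estimate_population(floors, units_per_floor=2, persons_per_unit=4))
```

The median is used rather than the maximum so that rooftop structures such as water tanks do not inflate the height estimate.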
Submitted 7 November, 2024;
originally announced November 2024.
-
Solar potential analysis over Indian cities using high-resolution satellite imagery and DEM
Authors:
Jai Singla
Abstract:
Most research on solar potential analysis uses aerial imagery, LiDAR data or satellite imagery. However, existing studies based on satellite data did not consider parameters such as tree/vegetation shadow, adjacent taller structures and eccentric roof structures in urban areas, and they used relatively coarse-resolution datasets. In this work, we implement a novel approach to estimate rooftop solar potential using high-resolution satellite imagery (50 cm) and a digital elevation model (1 m) as inputs, along with ground station radiation data. Solar radiation analysis is performed using the diffuse proportion and transmissivity derived from ground station data hosted by the India Meteorological Department (IMD). We observed that seasonal variations, environmental effects and technical factors such as solar panel structure can cause a significant loss of electricity generation, of up to 50%. The results also indicate that using a 1 m DEM and 50 cm satellite imagery produces more reliable results over urban areas.
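To show how a loss of up to 50% translates into energy yield, the sketch below uses the widely used rule-of-thumb PV yield formula $E = A \cdot r \cdot H \cdot PR$ (panel area, module efficiency, incident irradiation, performance ratio). This is a generic estimate swapped in for illustration, not the radiation model used in the paper; all input values are made up.

```python
def pv_energy_kwh(area_m2: float, panel_efficiency: float,
                  irradiation_kwh_m2: float, performance_ratio: float) -> float:
    """Rule-of-thumb PV yield E = A * r * H * PR.

    area_m2            usable rooftop panel area A (m^2)
    panel_efficiency   module efficiency r (0-1)
    irradiation_kwh_m2 irradiation H on the panel plane over the period (kWh/m^2)
    performance_ratio  PR (0-1), lumping together shading, temperature, wiring and other losses
    """
    return area_m2 * panel_efficiency * irradiation_kwh_m2 * performance_ratio


if __name__ == "__main__":
    # Hypothetical rooftop: 50 m^2, 18% modules, 5 kWh/m^2/day, PR 0.75.
    print(pv_energy_kwh(50.0, 0.18, 5.0, 0.75))  # kWh per day
    # A 50% combined loss corresponds to PR = 0.5.
    print(pv_energy_kwh(50.0, 0.18, 5.0, 0.5))
```

In this formulation, shading from trees and neighbouring buildings reduces the effective $H$ per rooftop pixel, which is where high-resolution imagery and a fine DEM matter most.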
Submitted 7 November, 2024;
originally announced November 2024.
-
Tree level change detection over Ahmedabad city using very high resolution satellite images and Deep Learning
Authors:
Jai G Singla,
Gautam Jaiswal
Abstract:
In this study, 0.5 m high-resolution satellite datasets over an Indian urban region were used to demonstrate the applicability of deep learning models over Ahmedabad, India. A YOLOv7 instance segmentation model was trained on a well-curated tree canopy dataset (6,500 images) to carry out change detection. During training, bounding-box regression loss, mask regression loss and mean average precision (mAP) were used to evaluate the model, and stochastic gradient descent (SGD) was used to optimize it. After 500 epochs, mAP values of 0.715 and 0.699 were obtained for individual tree detection and tree canopy mask segmentation, respectively. With further hyperparameter tuning, a maximum tree detection accuracy of 80% was obtained, with a false segmentation rate of 2%.
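Once trees are detected on two dates, change can be read off by matching detections across the dates. The sketch below is one simple, hypothetical way to do this, not the paper's procedure: each tree box from the earlier image is greedily matched to the unmatched later box with the highest IoU, and pairs below an assumed IoU of 0.5 are left unmatched. Unmatched earlier trees are counted as lost and unmatched later trees as new. It assumes both images are co-registered in the same pixel grid.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in a shared pixel grid (assumed)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tree_changes(before: Sequence[Box], after: Sequence[Box],
                 thr: float = 0.5) -> Tuple[List[int], List[int]]:
    """Return (indices of trees lost from `before`, indices of new trees in `after`)."""
    unmatched_after = list(range(len(after)))
    lost = []
    for i, b in enumerate(before):
        best, best_j = 0.0, None
        for j in unmatched_after:
            score = iou(b, after[j])
            if score > best:
                best, best_j = score, j
        if best_j is not None and best >= thr:
            unmatched_after.remove(best_j)  # matched: the tree persists
        else:
            lost.append(i)
    return lost, unmatched_after


if __name__ == "__main__":
    t1 = [(0, 0, 10, 10), (20, 20, 30, 30)]  # hypothetical detections, date 1
    t2 = [(1, 1, 11, 11), (50, 50, 60, 60)]  # hypothetical detections, date 2
    print(tree_changes(t1, t2))
```

Greedy matching is order-dependent; an optimal assignment (for example the Hungarian algorithm) would be a more robust choice for dense canopies.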
Submitted 4 November, 2024;
originally announced November 2024.
-
SAPG: Split and Aggregate Policy Gradients
Authors:
Jayesh Singla,
Ananye Agarwal,
Deepak Pathak
Abstract:
Despite extreme sample inefficiency, on-policy reinforcement learning, aka policy gradients, has become a fundamental tool in decision-making problems. With the recent advances in GPU-driven simulation, the ability to collect large amounts of data for RL training has scaled exponentially. However, we show that current RL methods, e.g. PPO, fail to benefit from parallelized environments beyond a certain point, and their performance saturates. To address this, we propose a new on-policy RL algorithm that can effectively leverage large-scale environments by splitting them into chunks and fusing them back together via importance sampling. Our algorithm, termed SAPG, shows significantly higher performance across a variety of challenging environments where vanilla PPO and other strong baselines fail to achieve high performance. Website at https://sapg-rl.github.io/
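The two ingredients named above, splitting environments into chunks and fusing data via importance sampling, can be illustrated with a generic sketch. Environments are partitioned into chunks (one per policy), and a learner evaluates a PPO-style clipped surrogate on transitions collected by another chunk's policy, weighting each by the ratio $\pi_{\text{learner}}(a \mid s) / \pi_{\text{behavior}}(a \mid s)$. This is an assumed, simplified illustration of importance-weighted clipping, not SAPG's actual update rule; see the project website for the real algorithm.

```python
import numpy as np


def split_envs(n_envs: int, n_chunks: int) -> list:
    """Partition environment indices into contiguous chunks, one per policy."""
    return np.array_split(np.arange(n_envs), n_chunks)


def clipped_is_surrogate(logp_learner, logp_behavior, advantages, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate on transitions gathered by a different (behavior) policy.

    ratio = pi_learner(a|s) / pi_behavior(a|s) is the importance weight; clipping it to
    [1 - eps, 1 + eps] limits how far off-policy data can move the learner.
    """
    ratio = np.exp(np.asarray(logp_learner, dtype=float) - np.asarray(logp_behavior, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))  # objective to maximize


if __name__ == "__main__":
    print([len(c) for c in split_envs(10, 3)])
    # Hypothetical log-probabilities of three actions under learner and behavior policies.
    print(clipped_is_surrogate([-0.5, -1.0, -2.0], [-0.7, -1.0, -1.5], [1.0, 0.5, -0.3]))
```

Taking the elementwise minimum keeps the objective pessimistic: large importance weights cannot push the update beyond the clip range when the advantage is positive.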
Submitted 29 July, 2024;
originally announced July 2024.
-
On Learning with LAD
Authors:
C. A. Jothishwaran,
Biplav Srivastava,
Jitin Singla,
Sugata Gangopadhyay
Abstract:
The logical analysis of data, LAD, is a technique that yields two-class classifiers based on Boolean functions having disjunctive normal form (DNF) representation. Although LAD algorithms employ optimization techniques, the resulting binary classifiers or binary rules do not lead to overfitting. We propose a theoretical justification for the absence of overfitting by estimating the Vapnik-Chervonenkis dimension (VC dimension) for LAD models where hypothesis sets consist of DNFs with a small number of cubic monomials. We illustrate and confirm our observations empirically.
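The sketch below makes the hypothesis class concrete. It evaluates a DNF over binary features, where each monomial is a conjunction of literals, and computes the generic finite-class bound $\mathrm{VC}(H) \le \log_2 |H|$ for DNFs with at most $k$ cubic (degree-3) monomials over $n$ variables. There are $\binom{n}{3} \cdot 2^3$ such monomials, so $|H| \le \sum_{i=0}^{k} \binom{M}{i}$. This counting argument is a standard textbook bound used here for illustration; it is not necessarily the estimate derived in the paper.

```python
from math import comb, log2
from typing import Sequence, Tuple

Literal = Tuple[int, bool]  # (variable index, True for x_i, False for NOT x_i)
Term = Tuple[Literal, ...]  # a monomial: conjunction of literals


def eval_dnf(terms: Sequence[Term], x: Sequence[int]) -> bool:
    """A DNF is true if any of its monomials has all literals satisfied by the 0/1 vector x."""
    return any(all(x[i] == int(positive) for i, positive in term) for term in terms)


def vc_upper_bound(n_vars: int, max_terms: int, degree: int = 3) -> float:
    """log2|H| for DNFs with at most `max_terms` distinct monomials of exactly `degree` literals.

    Each monomial picks `degree` distinct variables, each positive or negated:
    M = comb(n, d) * 2**d choices. For a finite class, VC dimension <= log2|H|.
    """
    m = comb(n_vars, degree) * 2 ** degree
    h = sum(comb(m, i) for i in range(max_terms + 1))
    return log2(h)


if __name__ == "__main__":
    rule = [((0, True), (1, False), (2, True))]  # x0 AND NOT x1 AND x2
    print(eval_dnf(rule, [1, 0, 1]), eval_dnf(rule, [1, 1, 1]))
    print(vc_upper_bound(n_vars=20, max_terms=5))
```

The bound grows only with the logarithm of the number of allowed rule sets, which is consistent with the intuition that small DNFs of low-degree monomials have limited capacity to overfit.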
Submitted 28 September, 2023;
originally announced September 2023.
-
Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation
Authors:
Ayush Maheshwari,
Ashim Gupta,
Amrith Krishna,
Atul Kumar Singh,
Ganesh Ramakrishnan,
G. Anil Kumar,
Jitin Singla
Abstract:
We release Sāmayik, a dataset of around 53,000 parallel English-Sanskrit sentences written in contemporary prose. Sanskrit is a classical language that is still in use and has a rich documented heritage. However, due to the limited availability of digitized content, it remains a low-resource language. Existing Sanskrit corpora, whether monolingual or bilingual, have predominantly focused on poetry and offer limited coverage of contemporary written materials. Sāmayik is curated from a diverse range of domains, including language instruction material, textual teaching pedagogy and online tutorials, among others. It stands out as a unique resource that specifically caters to the contemporary usage of Sanskrit, with a primary emphasis on prose writing. Translation models trained on our dataset demonstrate statistically significant improvements when translating out-of-domain contemporary corpora, outperforming models trained on older classical-era poetry datasets. Finally, we also release benchmark models for translating between English and Sanskrit, obtained by adapting four multilingual pre-trained models: three of them had not previously been exposed to Sanskrit, while the fourth is a multilingual pre-trained translation model that includes English and Sanskrit. The dataset and source code are available at https://github.com/ayushbits/saamayik.
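Claims of statistically significant translation improvements are often checked with paired bootstrap resampling over test sentences. The Python sketch below shows the idea on per-sentence scores. It is a generic illustration, not necessarily the test used for Sāmayik: summing sentence-level scores is a simplification (corpus-level BLEU or chrF should be recomputed on each resample), and the score lists are made up.

```python
import random
from typing import Sequence


def paired_bootstrap_p(scores_a: Sequence[float], scores_b: Sequence[float],
                       n_samples: int = 1000, seed: int = 0) -> float:
    """Approximate p-value for 'system A is not better than system B'.

    Resamples test sentences with replacement (the same indices for both systems)
    and counts how often A's total score fails to beat B's.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need equal-length, non-empty paired score lists")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return 1.0 - wins / n_samples


if __name__ == "__main__":
    # Hypothetical per-sentence scores for two translation systems on the same test set.
    a = [0.42, 0.55, 0.38, 0.61, 0.47, 0.52]
    b = [0.40, 0.50, 0.39, 0.55, 0.45, 0.49]
    print(paired_bootstrap_p(a, b))
```

Resampling the same sentence indices for both systems keeps the comparison paired, so sentence difficulty cancels out and only the systems' relative quality is tested.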
Submitted 29 March, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.