-
Enhancing Fruit and Vegetable Detection in Unconstrained Environment with a Novel Dataset
Authors:
Sandeep Khanna,
Chiranjoy Chattopadhyay,
Suman Kundu
Abstract:
Automating the detection of fruits and vegetables using computer vision is essential for modernizing agriculture, improving efficiency, ensuring food quality, and contributing to technologically advanced and sustainable farming practices. This paper presents an end-to-end pipeline for detecting and localizing fruits and vegetables in real-world scenarios. To achieve this, we have curated a dataset named FRUVEG67 that includes images of 67 classes of fruits and vegetables captured in unconstrained scenarios, with only a few manually annotated samples per class. We have developed a semi-supervised data annotation algorithm (SSDA) that generates bounding boxes for objects to label the remaining non-annotated images. For detection, we introduce the Fruit and Vegetable Detection Network (FVDNet), an ensemble version of YOLOv7 featuring three distinct grid configurations. We employ an averaging approach for bounding-box prediction and a voting mechanism for class prediction. We have integrated Jensen-Shannon divergence (JSD) in conjunction with focal loss to better detect smaller objects. Our experimental results highlight the superiority of FVDNet compared to previous versions of YOLO, showcasing remarkable improvements in detection and localization performance. We achieved an impressive mean average precision (mAP) score of 0.78 across all classes. Furthermore, we evaluated the efficacy of FVDNet using open-category refrigerator images, where it demonstrates promising results.
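The averaging-and-voting step described in this abstract can be illustrated with a minimal sketch. It assumes each of the three ensemble members has already produced one matched box `(x1, y1, x2, y2)` and one class label for the same object. The matching step, the function names `fuse_detections` and `jensen_shannon_divergence`, and the sample values are illustrative assumptions, not taken from the paper. The second function is the standard definition of Jensen-Shannon divergence between two discrete distributions; how the paper combines it with focal loss is not stated in the abstract and is not modelled here.

```python
from collections import Counter
from math import log2


def fuse_detections(boxes, labels):
    """Average matched boxes coordinate-wise and pick the majority class label."""
    n = len(boxes)
    fused_box = tuple(sum(b[i] for b in boxes) / n for i in range(4))
    fused_label = Counter(labels).most_common(1)[0][0]
    return fused_box, fused_label


def jensen_shannon_divergence(p, q):
    """Standard JSD (log base 2) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL divergence.
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical outputs of three detectors for one object.
boxes = [(10, 20, 110, 220), (12, 18, 108, 222), (14, 22, 112, 218)]
labels = ["apple", "apple", "tomato"]
fused_box, fused_label = fuse_detections(boxes, labels)
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (disjoint support), which makes it convenient as a bounded penalty term.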
Submitted 20 September, 2024;
originally announced September 2024.
-
INDoRI: Indian Dataset of Recipes and Ingredients and its Ingredient Network
Authors:
Sandeep Khanna,
Chiranjoy Chattopadhyay,
Suman Kundu
Abstract:
Exploring and comprehending the culinary heritage of a nation holds a captivating allure. It offers insights into the structure and qualities of its cuisine. The endeavor becomes more accessible with the availability of a well-organized dataset. In this paper, we introduce INDoRI (Indian Dataset of Recipes and Ingredients), a compilation drawn from seven distinct online platforms, representing 18 regions within the Indian subcontinent. This comprehensive geographical span ensures a portrayal of the rich variety within culinary practices. Furthermore, we introduce a unique collection of stop words, referred to as ISW (Ingredient Stop Words), manually tuned for the culinary domain. We assess the validity of ISW in the context of global cuisines beyond Indian culinary tradition. Subsequently, an ingredient network (InN) is constructed, highlighting interconnections among ingredients sourced from different recipes. We delve into both the defining attributes of INDoRI and the communal dimensions of InN. Additionally, we outline the potential applications that can be developed leveraging this dataset. Addressing one of the applications, we demonstrate a research problem on InN with a simple weighted community detection algorithm. Furthermore, we provide a comparative analysis of the results obtained with this algorithm against those generated by two baselines.
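A minimal sketch of how an ingredient network of this kind could be built is given below. It assumes that an edge connects two ingredients appearing in the same recipe and that the edge weight is the number of recipes they share; the abstract does not state the weighting, so this is an assumption. The stop-word set, the recipes, and the function names are hypothetical and are not taken from INDoRI or ISW.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop words, for illustration only (not the actual ISW list).
INGREDIENT_STOP_WORDS = {"chopped", "fresh", "cup", "tbsp"}


def clean_ingredient(raw):
    """Lower-case an ingredient phrase and drop stop words."""
    words = [w for w in raw.lower().split() if w not in INGREDIENT_STOP_WORDS]
    return " ".join(words)


def build_ingredient_network(recipes):
    """Return edge weights: the number of recipes containing each ingredient pair."""
    edges = Counter()
    for ingredients in recipes:
        cleaned = sorted({clean_ingredient(i) for i in ingredients} - {""})
        for a, b in combinations(cleaned, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical recipes, for illustration only.
recipes = [
    ["chopped onion", "fresh tomato", "cumin"],
    ["onion", "tomato", "paneer"],
    ["cumin", "rice"],
]
network = build_ingredient_network(recipes)
```

The resulting weighted edge list is the kind of input a weighted community detection algorithm would consume.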
Submitted 19 September, 2023;
originally announced September 2023.
-
Knowledge driven Description Synthesis for Floor Plan Interpretation
Authors:
Shreya Goyal,
Chiranjoy Chattopadhyay,
Gaurav Bhatnagar
Abstract:
Image captioning is a widely known problem in the area of AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and providing architectural solutions. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Since a caption alone is insufficient to capture fine-grained details, researchers have also proposed generating descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making it difficult to use them in real-time scenarios. This paper offers two models, Description Synthesis from Image Cue (DSIC) and Transformer Based Description Generation (TBDG), for floor plan image-to-text generation to fill the gaps in existing methods. These two models take advantage of modern deep neural networks for visual feature extraction and text generation. The two models differ in the way they take input from the floor plan image. The DSIC model takes only visual features automatically extracted by a deep neural network, while the TBDG model learns textual captions extracted from input floor plan images with paragraphs. The specific keywords generated in TBDG, learned together with paragraphs, make it more robust on general floor plan images. Experiments were carried out on a large-scale publicly available dataset and compared with state-of-the-art techniques to show the proposed model's superiority.
Submitted 15 March, 2021;
originally announced March 2021.
-
GRIHA: Synthesizing 2-Dimensional Building Layouts from Images Captured using a Smart Phone
Authors:
Shreya Goyal,
Naimul Khan,
Chiranjoy Chattopadhyay,
Gaurav Bhatnagar
Abstract:
Reconstructing an indoor scene and generating a layout/floor plan in 3D or 2D is a widely known problem. Quite a few algorithms have been proposed in the literature recently. However, most existing methods either use RGB-D images, thus requiring a depth camera, or depend on panoramic photos, assuming that there is little to no occlusion in the rooms. In this work, we propose GRIHA (Generating Room Interior of a House using ARCore), a framework for generating a layout using RGB images captured with a simple mobile phone camera. We take advantage of Simultaneous Localization and Mapping (SLAM) to assess the 3D transformations required for layout generation. SLAM technology is built into recent mobile libraries such as ARCore by Google. Hence, the proposed method is fast and efficient. It gives the user freedom to generate a layout by merely taking a few conventional photos, rather than relying on specialized depth hardware or occlusion-free panoramic images. We have compared GRIHA with other existing methods and obtained superior results. Also, the system is tested on multiple hardware platforms to test the dependency and efficiency.
Submitted 15 March, 2021;
originally announced March 2021.
-
Antara: An Interactive 3D Volume Rendering and Visualization Framework
Authors:
Pratik Kalshetti,
Parag Rahangdale,
Dinesh Jangra,
Manas Bundele,
Chiranjoy Chattopadhyay
Abstract:
The goal of 3D visualization is to provide the user with an intuitive interface which enables them to explore the 3D data in an interactive manner. The aim of the exploration is to identify and analyze anomalies or to give proof of the non-anomaly of the visualized organic structures. For 3D medical data, Magnetic Resonance Images (MRI) have been used. To create the 3D model, we used the Direct Volume Rendering technique. In the input 3D data, we have $x, y$ and $z$ coordinates and an intensity value for each voxel. The 3D data is used by Volume Ray Casting to compute 2D projections from 3D volumetric data sets. In ray casting, a ray of light is made to pass through the volume data. The interaction of each voxel with this ray is used to assign RGB and alpha values for every voxel in the volume. As a result, we are able to generate the 3D model of the region of interest using the 3D data. The 3D model is interactive, thus enabling us to visualize the different layers of the 3D volume by adjusting the transfer function.
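The ray-casting step described in this abstract can be sketched for a single ray as follows. This is a generic front-to-back alpha compositing loop under stated assumptions, not the Antara implementation: the grey-scale `transfer_function`, the early-termination threshold, and the sample intensities are all hypothetical.

```python
def transfer_function(intensity):
    """Hypothetical transfer function: map intensity in [0, 1] to (r, g, b, alpha)."""
    return (intensity, intensity, intensity, intensity)


def cast_ray(intensities, tf=transfer_function):
    """Front-to-back alpha compositing of the voxel samples along one ray."""
    r = g = b = 0.0
    alpha = 0.0
    for value in intensities:
        sr, sg, sb, sa = tf(value)
        weight = (1.0 - alpha) * sa  # visibility left after earlier samples
        r += weight * sr
        g += weight * sg
        b += weight * sb
        alpha += weight
        if alpha >= 0.999:  # early ray termination once nearly opaque
            break
    return (r, g, b, alpha)


# Voxel intensities sampled along one ray, nearest sample first.
pixel = cast_ray([0.0, 0.5, 0.5])
```

Swapping in a different transfer function changes which intensity ranges become visible, which is how adjusting the transfer function exposes different layers of the volume.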
Submitted 11 December, 2018;
originally announced December 2018.
-
Automatic Feature Weight Determination using Indexing and Pseudo-Relevance Feedback for Multi-feature Content-Based Image Retrieval
Authors:
Asheet Kumar,
Shivam Choudhary,
Vaibhav Singh Khokhar,
Vikas Meena,
Chiranjoy Chattopadhyay
Abstract:
Content-based image retrieval (CBIR) is one of the most active research areas in multimedia information retrieval. Given a query image, the task is to search for relevant images in a repository. Low-level features of an image, such as color, texture, and shape feature vectors, are considered important attributes in a CBIR system. Thus the performance of the CBIR system can be enhanced by combining these feature vectors. In this paper, we propose a novel CBIR framework that applies indexing using a multiclass SVM and finds the appropriate weights of the individual features automatically using the relevance ratio and mean difference. We have taken four feature descriptors to represent color, texture and shape features. During retrieval, feature vectors of the query image are combined, weighted and compared with feature vectors of images in the database to rank order the results. Experiments were performed on four benchmark datasets and performance is compared with existing techniques to validate the superiority of our proposed framework.
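The retrieval step, in which per-feature comparisons are weighted and combined to rank database images, can be sketched as below. The weights are fixed by hand here; the paper derives them automatically from the relevance ratio and mean difference, which this sketch does not attempt to reproduce. The Euclidean distance, the feature vectors, and all names are assumptions made for illustration.

```python
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_images(query, database, weights):
    """Rank database images by the weighted sum of per-feature distances."""
    scores = []
    for name, features in database.items():
        score = sum(
            weights[f] * euclidean(query[f], features[f]) for f in weights
        )
        scores.append((score, name))
    return [name for _, name in sorted(scores)]


# Hypothetical feature vectors and weights, for illustration only.
weights = {"color": 0.5, "texture": 0.3, "shape": 0.2}
query = {"color": [1.0, 0.0], "texture": [0.0, 1.0], "shape": [1.0, 1.0]}
database = {
    "img_a": {"color": [1.0, 0.0], "texture": [0.0, 1.0], "shape": [1.0, 1.0]},
    "img_b": {"color": [0.0, 0.0], "texture": [0.0, 1.0], "shape": [1.0, 1.0]},
    "img_c": {"color": [1.0, 0.0], "texture": [0.0, 0.0], "shape": [1.0, 1.0]},
}
ranking = rank_images(query, database, weights)
```

Because color carries the largest weight in this example, an image that differs only in color is ranked below one that differs only in texture.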
Submitted 10 December, 2018;
originally announced December 2018.
-
SUGAMAN: Describing Floor Plans for Visually Impaired by Annotation Learning and Proximity based Grammar
Authors:
Shreya Goyal,
Satya Bhavsar,
Shreya Patel,
Chiranjoy Chattopadhyay,
Gaurav Bhatnagar
Abstract:
In this paper, we propose SUGAMAN (Supervised and Unified framework using Grammar and Annotation Model for Access and Navigation). SUGAMAN is a Hindi word meaning "easy passage from one place to another". SUGAMAN synthesizes a textual description from a given floor plan image for the visually impaired. A visually impaired person can navigate in an indoor environment using the textual description generated by SUGAMAN. With the help of text-reader software, the target user can understand the rooms within the building and the arrangement of furniture to navigate. SUGAMAN is the first framework for describing a floor plan and giving direction for obstacle-free movement within a building. We learn $5$ classes of room categories from $1355$ room image samples under a supervised learning paradigm. These learned annotations are fed into a description synthesis framework to yield a holistic description of a floor plan image. We demonstrate the performance of various supervised classifiers on room learning. We also provide a comparative analysis of system-generated and human-written descriptions. SUGAMAN gives state-of-the-art performance on challenging, real-world floor plan images. This work can be applied to areas like understanding floor plans of historical monuments, stability analysis of buildings, and retrieval.
Submitted 14 November, 2018;
originally announced December 2018.
-
Automatic Rendering of Building Floor Plan Images from Textual Descriptions in English
Authors:
Mahak Jain,
Anurag Sanyal,
Shreya Goyal,
Chiranjoy Chattopadhyay,
Gaurav Bhatnagar
Abstract:
Human beings understand natural language descriptions and are able to imagine a corresponding visual. For example, given a description of the interior of a house, we could imagine its structure and arrangements of furniture. Automatic synthesis of real-world images from text descriptions has been explored in the computer vision community. However, there is no such attempt in the area of document images, like floor plans. Floor plan synthesis from sketches, as well as data-driven models, were proposed earlier. Ours is the first attempt to render building floor plan images from textual descriptions automatically. Here, the input is a natural language description of the internal structure and furniture arrangements within a house, and the output is the 2D floor plan image of the same. We have experimented on publicly available benchmark floor plan datasets. We were able to render realistic synthesized floor plan images from descriptions written in English.
Submitted 28 November, 2018;
originally announced November 2018.
-
Siamese LSTM based Fiber Structural Similarity Network (FS2Net) for Rotation Invariant Brain Tractography Segmentation
Authors:
Shreyas Malakarjun Patil,
Aditya Nigam,
Arnav Bhavsar,
Chiranjoy Chattopadhyay
Abstract:
In this paper, we propose a novel deep learning architecture combining stacked Bi-directional LSTM and LSTMs with the Siamese network architecture for segmentation of brain fibers, obtained from tractography data, into anatomically meaningful clusters. The proposed network learns the structural difference between fibers of different classes, which enables it to classify fibers with high accuracy. Importantly, capturing such deep inter- and intra-class structural relationships also ensures that the segmentation is robust to relative rotation between test and training data, and hence can be used with unregistered data. Our extensive experimentation on the order of hundreds of thousands of fibers shows that the proposed model achieves state-of-the-art results, even in cases of large relative rotations between test and training data.
Submitted 28 December, 2017;
originally announced December 2017.
-
An Interactive Medical Image Segmentation Framework Using Iterative Refinement
Authors:
Pratik Kalshetti,
Manas Bundele,
Parag Rahangdale,
Dinesh Jangra,
Chiranjoy Chattopadhyay,
Gaurav Harit,
Abhay Elhence
Abstract:
Image segmentation is often performed on medical images for identifying diseases in clinical evaluation. Hence it has become one of the major research areas. Conventional image segmentation techniques are unable to provide satisfactory segmentation results for medical images as they contain irregularities. They need to be pre-processed before segmentation. In order to obtain the most suitable method for medical image segmentation, we propose a two stage algorithm. The first stage automatically generates a binary marker image of the region of interest using mathematical morphology. This marker serves as the mask image for the second stage which uses GrabCut on the input image thus resulting in an efficient segmented result. The obtained result can be further refined by user interaction which can be done using the Graphical User Interface (GUI). Experimental results show that the proposed method is accurate and provides satisfactory segmentation results with minimum user interaction on medical as well as natural images.
Submitted 4 June, 2016;
originally announced June 2016.
-
Encoding by DNA Relations and Randomization Through Chaotic Sequences for Image Encryption
Authors:
Chiranjoy Chattopadhyay,
Bikramjit Sarkar,
Debaprasad Mukherjee
Abstract:
Researchers in the field of DNA-based chaotic cryptography have recently proposed a set of novel and efficient image encryption algorithms. In this paper, we present a comprehensive summary of those techniques, which are available in the literature. The discussion given in this paper is grouped into three main areas. At first, we give a brief sketch of the backbone architecture and the theoretical foundation of this field, based on which all the algorithms were proposed. Next, we briefly discuss the set of image encryption algorithms based on this architecture and categorize them as either encryption or cryptanalyzing techniques. Finally, we present the different evaluation metrics used to quantitatively measure the performance of such algorithms. We also discuss the characteristic differences among these algorithms. We further highlight the potential advances that are needed to improve the present state-of-the-art image encryption techniques using DNA computing and chaos theory. The primary objective of this survey is to provide researchers in the field of DNA computing and chaos theory based image encryption with a comprehensive summary of the progress achieved so far and to help them identify a few challenging future research areas.
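The two ingredients named in this title, DNA encoding and chaotic randomization, can be illustrated with a toy sketch. It assumes one fixed two-bits-per-base encoding rule (00 to A, 01 to C, 10 to G, 11 to T) and a logistic-map keystream combined with the pixel bytes by XOR. The parameter values and function names are hypothetical, and the sketch does not reproduce any specific algorithm covered by the survey, nor is it meant to be secure.

```python
BASES = "ACGT"  # assumed encoding rule: 00->A, 01->C, 10->G, 11->T


def dna_encode(data):
    """Encode bytes as a DNA string, two bits per base, most significant first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_decode(strand):
    """Invert dna_encode: pack every four bases back into one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def logistic_keystream(x0, r, length):
    """Generate key bytes from the logistic map x -> r * x * (1 - x)."""
    x, stream = x0, bytearray()
    for _ in range(length):
        x = r * x * (1.0 - x)
        stream.append(int(x * 256) % 256)
    return bytes(stream)


def xor_cipher(pixels, x0=0.3141, r=3.99):
    """XOR pixel bytes with the chaotic keystream; applying it twice decrypts."""
    key = logistic_keystream(x0, r, len(pixels))
    return bytes(p ^ k for p, k in zip(pixels, key))


# Hypothetical four-pixel grey-scale image.
pixels = bytes([0x1B, 0x00, 0xFF, 0x7E])
cipher_strand = dna_encode(xor_cipher(pixels))
recovered = xor_cipher(dna_decode(cipher_strand))
```

The initial value `x0` and the map parameter `r` act as the secret key here: the same pair regenerates the same keystream, so decryption is the same XOR applied to the decoded strand.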
Submitted 7 May, 2015;
originally announced May 2015.