-
MemeSense: An Adaptive In-Context Framework for Social Commonsense Driven Meme Moderation
Authors:
Sayantan Adak,
Somnath Banerjee,
Rajarshi Mandal,
Avik Halder,
Sayan Layek,
Rima Hazra,
Animesh Mukherjee
Abstract:
Memes present unique moderation challenges due to their subtle, multimodal interplay of images, text, and social context. Standard systems relying predominantly on explicit textual cues often overlook harmful content camouflaged by irony, symbolism, or cultural references. To address this gap, we introduce MemeSense, an adaptive in-context learning framework that fuses social commonsense reasoning with visually and semantically related reference examples. By encoding crucial task information into a learnable cognitive shift vector, MemeSense effectively balances lexical, visual, and ethical considerations, enabling precise yet context-aware meme intervention. Extensive evaluations on a curated set of implicitly harmful memes demonstrate that MemeSense substantially outperforms strong baselines, paving the way for safer online communities. Code and data available at: https://github.com/sayantan11995/MemeSense
Submitted 16 February, 2025;
originally announced February 2025.
-
Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models
Authors:
Somnath Banerjee,
Sayan Layek,
Hari Shrawgi,
Rajarshi Mandal,
Avik Halder,
Shanu Kumar,
Sagnik Basu,
Parag Agrawal,
Rima Hazra,
Animesh Mukherjee
Abstract:
As LLMs are increasingly deployed in global applications, the importance of cultural sensitivity becomes paramount, ensuring that users from diverse backgrounds feel respected and understood. Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values. This work addresses the challenges of ensuring cultural sensitivity in LLMs, especially in small-parameter models that often lack the extensive training data needed to capture global cultural nuances. We present two key contributions: (1) A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and (2) A culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators. These datasets facilitate the evaluation and enhancement of LLMs, ensuring their ethical and safe deployment across different cultural landscapes. Our results show that integrating culturally aligned feedback leads to a marked improvement in model behavior, significantly reducing the likelihood of generating culturally insensitive or harmful content. Ultimately, this work paves the way for more inclusive and respectful AI systems, fostering a future where LLMs can safely and ethically navigate the complexities of diverse cultural landscapes.
Submitted 24 January, 2025; v1 submitted 15 October, 2024;
originally announced October 2024.
-
New Lower Bound and Algorithms for Online Geometric Hitting Set Problem
Authors:
Minati De,
Ratnadip Mandal,
Satyam Singh
Abstract:
The hitting set problem is one of the fundamental problems in combinatorial optimization and is well studied in the offline setup. We consider the online hitting set problem, where only the set of points is known in advance, and objects are introduced one by one. Our objective is to maintain a minimum-sized hitting set by making irrevocable decisions. Here, we present the study of two variants of the online hitting set problem depending on the point set. In the first variant, we consider the point set to be the entire $\mathbb{Z}^d$, while in the second variant, we consider the point set to be a finite subset of $\mathbb{R}^2$.
When points in $\mathbb{Z}^d$ are used to hit homothetic hypercubes in $\mathbb{R}^d$ with side lengths in $[1,M]$, we show that the competitive ratio of any algorithm, whether deterministic or randomized, is $Ω(d\log M)$. This improves the recently known deterministic lower bound of $Ω(\log M)$ by a factor of $d$. Then, we present an almost tight randomized algorithm with a competitive ratio $O(d^2\log M)$ that significantly improves the best-known competitive ratio of $25^d\log M$. Next, we propose a simple deterministic ${\lfloor\frac{2}α+2\rfloor^d}(\lfloor\log_{2}M\rfloor+1)$-competitive algorithm to hit similarly sized $α$-fat objects in $\mathbb{R}^d$ having diameters in the range $[1, M]$ using points in $\mathbb{Z}^d$. This improves the current best-known upper bound by a factor of at least $5^d$.
Finally, we consider the hitting set problem when the point set consists of $n$ points in $\mathbb{R}^2$, and the objects are homothetic regular $k$-gons having diameters in the range $[1, M]$. We present an $O(\log n\log M)$-competitive randomized algorithm for this setting, whereas no result was previously known even for squares. In particular, our results answer some of the open questions raised by Khan et al. (SoCG'23) and Alefkhani et al. (WAOA'23).
Submitted 1 October, 2024; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance
Authors:
Somnath Banerjee,
Avik Halder,
Rajarshi Mandal,
Sayan Layek,
Ian Soboroff,
Rima Hazra,
Animesh Mukherjee
Abstract:
The integration of pretrained language models (PLMs) like BERT and GPT has revolutionized NLP, particularly for English, but it has also created linguistic imbalances. This paper strategically identifies the need for linguistic equity by examining several knowledge editing techniques in multilingual contexts. We evaluate the performance of models such as Mistral, TowerInstruct, OpenHathi, Tamil-Llama, and Kan-Llama across languages including English, German, French, Italian, Spanish, Hindi, Tamil, and Kannada. Our research identifies significant discrepancies in normal and merged models concerning cross-lingual consistency. We employ strategies like 'each language for itself' (ELFI) and 'each language for others' (ELFO) to stress-test these models. Our findings demonstrate the potential for LLMs to overcome linguistic barriers, laying the groundwork for future research in achieving linguistic inclusivity in AI technologies.
Submitted 18 March, 2025; v1 submitted 16 June, 2024;
originally announced June 2024.
-
Training Deep 3D Convolutional Neural Networks to Extract BSM Physics Parameters Directly from HEP Data: a Proof-of-Concept Study Using Monte Carlo Simulations
Authors:
S. Dubey,
T. E. Browder,
S. Kohani,
R. Mandal,
A. Sibidanov,
R. Sinha
Abstract:
We report on a novel application of computer vision techniques to extract beyond the Standard Model parameters directly from high energy physics flavor data. We propose a simple but novel data representation that transforms the angular and kinematic distributions into "quasi-images", which are used to train a convolutional neural network to perform regression tasks, similar to fitting. As a proof-of-concept, we train a 34-layer Residual Neural Network to regress on these images and determine information about the Wilson Coefficient $C_{9}$ in Monte Carlo simulations of $B^0 \rightarrow K^{*0}μ^{+}μ^{-}$ decays. The method described here can be generalized and may find applicability across a variety of experiments.
Submitted 15 November, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
-
Hallucination Reduction in Long Input Text Summarization
Authors:
Tohida Rehman,
Ronit Mandal,
Abhishek Agarwal,
Debarshi Kumar Sanyal
Abstract:
Hallucination in text summarization refers to the phenomenon where the model generates information that is not supported by the input source document. Hallucination poses significant obstacles to the accuracy and reliability of the generated summaries. In this paper, we aim to reduce hallucinated outputs or hallucinations in summaries of long-form text documents. We have used the PubMed dataset, which contains long scientific research documents and their abstracts. We have incorporated the techniques of data filtering and joint entity and summary generation (JAENS) in the fine-tuning of the Longformer Encoder-Decoder (LED) model to minimize hallucinations and thereby improve the quality of the generated summary. We have used the following metrics to measure factual consistency at the entity level: precision-source, and F1-target. Our experiments show that the fine-tuned LED model performs well in generating the paper abstract. Data filtering techniques based on some preprocessing steps reduce entity-level hallucinations in the generated summaries in terms of some of the factual consistency metrics.
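As a rough illustration of the entity-level metrics named above, the sketch below computes precision-source and F1-target. It assumes the commonly used definitions (precision-source: the fraction of summary entities that also appear in the source; F1-target: the harmonic mean of entity precision and recall against the reference summary) and assumes entities have already been extracted by some NER step. The function names and the simple case-insensitive matching are illustrative choices, not taken from the paper.

```python
def precision_source(summary_entities, source_text):
    """Fraction of distinct summary entities that occur in the source text.

    Assumption: entities are plain strings and matching is a
    case-insensitive substring test; a real pipeline may match differently.
    """
    entities = {e.lower() for e in summary_entities}
    if not entities:
        return 0.0  # convention chosen for this sketch when no entities exist
    source = source_text.lower()
    supported = {e for e in entities if e in source}
    return len(supported) / len(entities)


def f1_target(summary_entities, reference_entities):
    """Harmonic mean of entity precision and recall against the reference summary."""
    hyp = {e.lower() for e in summary_entities}
    ref = {e.lower() for e in reference_entities}
    overlap = len(hyp & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A low precision-source value flags summaries that mention entities absent from the input document, which is the entity-level notion of hallucination the abstract refers to.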
Submitted 28 September, 2023;
originally announced September 2023.
-
Online Class Cover Problem
Authors:
Minati De,
Anil Maheshwari,
Ratnadip Mandal
Abstract:
In this paper, we study the online class cover problem where a (finite or infinite) family $\cal F$ of geometric objects and a set ${\cal P}_r$ of red points in $\mathbb{R}^d$ are given a priori, and blue points from $\mathbb{R}^d$ arrive one after another. Upon the arrival of a blue point, the online algorithm must make an irreversible decision to cover it with objects from $\cal F$ that do not cover any points of ${\cal P}_r$. The objective of the problem is to place a minimum number of objects. When $\cal F$ consists of axis-parallel unit squares in $\mathbb{R}^2$, we prove that the competitive ratio of any deterministic online algorithm is $Ω(\log |{\cal P}_r|)$, and also propose an $O(\log |{\cal P}_r|)$-competitive deterministic algorithm for the problem.
Submitted 3 July, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
-
Online Geometric Hitting Set and Set Cover Beyond Unit Balls in $\mathbb{R}^2$
Authors:
Minati De,
Ratnadip Mandal,
Satyam Singh
Abstract:
We investigate the geometric hitting set problem in the online setup for the range space $Σ=({\cal P},{\cal S})$, where the set ${\cal P}\subset\mathbb{R}^2$ is a collection of $n$ points and the set $\cal S$ is a family of geometric objects in $\mathbb{R}^2$. In the online setting, the geometric objects arrive one by one. Upon the arrival of an object, an online algorithm must maintain a valid hitting set by making an irreversible decision, i.e., once a point is added to the hitting set by the algorithm, it cannot be deleted in the future. The objective of the geometric hitting set problem is to find a hitting set of the minimum cardinality. Even and Smorodinsky (Discret. Appl. Math., 2014) considered an online model (Model-I) in which the range space $Σ$ is known in advance, but the order of arrival of the input objects in $\cal S$ is unknown. They proposed online algorithms having optimal competitive ratios of $Θ(\log n)$ for intervals, half-planes and unit disks in $\mathbb{R}^2$. Whether such an algorithm exists for unit squares remained open for a long time. This paper considers an online model (Model-II) in which the entire range space $Σ$ is not known in advance. We only know the set $\cal P$ but not the set $\cal S$ in advance. Note that any algorithm for Model-II will also work for Model-I, but not vice versa. In Model-II, we obtain an optimal competitive ratio of $Θ(\log(n))$ for unit disks and regular $k$-gons with $k\geq 4$ in $\mathbb{R}^2$. All the above-mentioned results also hold for the equivalent geometric set cover problem in Model-II.
Submitted 13 April, 2023;
originally announced April 2023.
-
Context-based Deep Learning Architecture with Optimal Integration Layer for Image Parsing
Authors:
Ranju Mandal,
Basim Azam,
Brijesh Verma
Abstract:
Deep learning models have been efficient lately on image parsing tasks. However, deep learning models are not fully capable of exploiting visual and contextual information simultaneously. The proposed three-layer context-based deep architecture is capable of integrating context explicitly with visual information. The novel idea here is to have a visual layer to learn visual characteristics from binary class-based learners, a contextual layer to learn context, and then an integration layer to learn from both via genetic algorithm-based optimal fusion to produce a final decision. The experimental outcomes when evaluated on benchmark datasets are promising. Further analysis shows that optimized network weights can improve performance and make stable predictions.
Submitted 13 April, 2022;
originally announced April 2022.
-
Deep Learning Model with GA based Feature Selection and Context Integration
Authors:
Ranju Mandal,
Basim Azam,
Brijesh Verma,
Mengjie Zhang
Abstract:
Deep learning models have been very successful in computer vision and image processing applications. Many top-performing methods for image segmentation are based on deep CNN models. However, deep CNN models fail to integrate global and local context alongside visual features despite having complex multi-layer architectures. We propose a novel three-layered deep learning model that independently learns global and local contextual information alongside visual features. The novelty of the proposed model is that One-vs-All binary class-based learners are introduced to learn Genetic Algorithm (GA) optimized features in the visual layer, followed by the contextual layer that learns global and local contexts of an image, and finally the third layer integrates all the information optimally to obtain the final class label. The Stanford Background and CamVid benchmark image parsing datasets were used for our model evaluation, and our model shows promising results. The empirical analysis reveals that optimized visual features with global and local contextual information play a significant role in improving accuracy and producing stable predictions comparable to state-of-the-art deep CNN models.
Submitted 13 April, 2022;
originally announced April 2022.
-
Exploiting Multi-modal Contextual Sensing for City-bus's Stay Location Characterization: Towards Sub-60 Seconds Accurate Arrival Time Prediction
Authors:
Ratna Mandal,
Prasenjit Karmakar,
Soumyajit Chatterjee,
Debaleen Das Spandan,
Shouvit Pradhan,
Sujoy Saha,
Sandip Chakraborty,
Subrata Nandi
Abstract:
Intelligent city transportation systems are one of the core infrastructures of a smart city. The true ingenuity of such an infrastructure lies in providing commuters with real-time information about citywide transport such as public buses, allowing them to pre-plan their travel. However, providing prior information for transportation systems like public buses in real time is inherently challenging because of the diverse nature of the different stay-locations at which a public bus stops. Although straightforward factors such as stay duration, extracted from unimodal sources like GPS, look erratic at these locations, a thorough analysis of public bus GPS trails for 720 km of bus travel in the city of Durgapur, a semi-urban city in India, reveals that several other fine-grained contextual features can characterize these locations accurately. Accordingly, we develop BuStop, a system for extracting and characterizing the stay locations from multi-modal sensing using commuters' smartphones. Using this multi-modal information, BuStop extracts a set of granular contextual features that allow the system to differentiate among the different stay-location types. A thorough analysis of BuStop using the collected dataset indicates that the system works with high accuracy in identifying different stay locations like regular bus stops, random ad-hoc stops, stops due to traffic congestion, stops at traffic signals, and stops at sharp turns. Additionally, we develop a proof-of-concept setup on top of BuStop to analyze the potential of the framework in predicting expected arrival time, a critical piece of information required to pre-plan travel, at any given bus stop. Subsequent analysis of the PoC framework, through simulation over the test dataset, shows that characterizing the stay-locations indeed helps make more accurate arrival time predictions with deviations of less than 60s from the ground-truth arrival time.
Submitted 24 May, 2021;
originally announced May 2021.
-
New Lower Bounds for the Number of Pseudoline Arrangements
Authors:
Adrian Dumitrescu,
Ritankar Mandal
Abstract:
Arrangements of lines and pseudolines are fundamental objects in discrete and computational geometry. They also appear in other areas of computer science, such as the study of sorting networks. Let $B_n$ be the number of nonisomorphic arrangements of $n$ pseudolines and let $b_n=\log_2{B_n}$. The problem of estimating $B_n$ was posed by Knuth in 1992. Knuth conjectured that $b_n \leq {n \choose 2} + o(n^2)$ and also derived the first upper and lower bounds: $b_n \leq 0.7924 (n^2 +n)$ and $b_n \geq n^2/6 -O(n)$. The upper bound underwent several improvements, $b_n \leq 0.6988\, n^2$ (Felsner, 1997), and $b_n \leq 0.6571\, n^2$ (Felsner and Valtr, 2011), for large $n$. Here we show that $b_n \geq cn^2 -O(n \log{n})$ for some constant $c>0.2083$. In particular, $b_n \geq 0.2083\, n^2$ for large $n$. This improves the previous best lower bound, $b_n \geq 0.1887\, n^2$, due to Felsner and Valtr (2011). Our arguments are elementary and geometric in nature. Further, our constructions are likely to spur new developments and improved lower bounds for related problems, such as in topological graph drawings.
Submitted 7 December, 2018; v1 submitted 10 September, 2018;
originally announced September 2018.
-
Bag-of-Visual-Words for Signature-Based Multi-Script Document Retrieval
Authors:
Ranju Mandal,
Partha Pratim Roy,
Umapada Pal,
Michael Blumenstein
Abstract:
An end-to-end architecture for multi-script document retrieval using handwritten signatures is proposed in this paper. The user supplies a query signature sample and the system exclusively returns a set of documents that contain the query signature. In the first stage, a component-wise classification technique separates the potential signature components from all other components. A bag-of-visual-words powered by SIFT descriptors in a patch-based framework is proposed to compute the features and a Support Vector Machine (SVM)-based classifier was used to separate signatures from the documents. In the second stage, features from the foreground (i.e. signature strokes) and the background spatial information (i.e. background loops, reservoirs etc.) were combined to characterize the signature object to match with the query signature. Finally, three distance measures were used to match a query signature with the signature present in target documents for retrieval. The 'Tobacco' document database and an Indian script database containing 560 documents of Devanagari (Hindi) and Bangla scripts were used for the performance evaluation. The proposed system was also tested on noisy documents and promising results were obtained. A comparative study shows that the proposed method outperforms the state-of-the-art approaches.
Submitted 18 July, 2018;
originally announced July 2018.
-
Assessing fish abundance from underwater video using deep neural networks
Authors:
Ranju Mandal,
Rod M. Connolly,
Thomas A. Schlacherz,
Bela Stantic
Abstract:
Underwater videos are being rapidly adopted by marine biologists to assess the diversity and abundance of fish. Manual processing of videos for quantification by human analysts is time and labour intensive. Automatic processing of videos can be employed to achieve the objectives in a cost- and time-efficient way. The aim is to build an accurate and reliable fish detection and recognition system, which is important for an autonomous robotic platform. However, there are many challenges involved in this task (e.g. complex background, deformation, low resolution and light propagation). Recent advances in deep neural networks have enabled object detection and recognition in real-time scenarios. An end-to-end deep learning-based architecture is introduced, which outperformed the state-of-the-art methods and is the first of its kind for the fish assessment task. A Region Proposal Network (RPN) introduced by an object detector termed Faster R-CNN was combined with three classification networks for detection and recognition of fish species obtained from Remote Underwater Video Stations (RUVS). The accuracy of 82.4% (mAP) obtained from the experiments is much higher than that of previously proposed methods.
Submitted 16 July, 2018;
originally announced July 2018.
-
Monotone Paths in Geometric Triangulations
Authors:
Adrian Dumitrescu,
Ritankar Mandal,
Csaba D. Tóth
Abstract:
(I) We prove that the (maximum) number of monotone paths in a geometric triangulation of $n$ points in the plane is $O(1.7864^n)$. This improves an earlier upper bound of $O(1.8393^n)$; the current best lower bound is $Ω(1.7003^n)$.
(II) Given a planar geometric graph $G$ with $n$ vertices, we show that the number of monotone paths in $G$ can be computed in $O(n^2)$ time.
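To give a feel for how monotone paths can be counted, the sketch below handles the simpler special case of paths that are monotone with respect to one fixed direction (the x-axis), using dynamic programming over vertices sorted by x-coordinate. This is only an illustration under that assumption: it is not the $O(n^2)$ algorithm of the paper, which counts paths monotone in any direction. The function name and input format are hypothetical.

```python
def count_x_monotone_paths(points, edges):
    """Count paths with at least one edge whose x-coordinates strictly increase.

    points: list of (x, y) tuples; edges: list of (u, v) index pairs.
    Assumption: only the fixed direction (1, 0) is considered.
    """
    n = len(points)
    left_neighbors = [[] for _ in range(n)]
    for u, v in edges:
        if points[u][0] < points[v][0]:
            left_neighbors[v].append(u)
        elif points[v][0] < points[u][0]:
            left_neighbors[u].append(v)
        # A vertical edge cannot belong to a strictly x-monotone path.

    # ending[v] = number of x-monotone paths (>= 1 edge) whose rightmost vertex is v.
    ending = [0] * n
    for v in sorted(range(n), key=lambda i: points[i][0]):
        # Each left neighbor u contributes the single edge (u, v)
        # plus every path ending at u extended by that edge.
        ending[v] = sum(1 + ending[u] for u in left_neighbors[v])
    return sum(ending)
```

For a triangle with vertices at (0, 0), (1, 1), (2, 0), this counts the three single edges plus the two-edge path through the middle vertex.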
Submitted 3 October, 2016; v1 submitted 16 August, 2016;
originally announced August 2016.
-
Accurate, fully-automated NMR spectral profiling for metabolomics
Authors:
Siamak Ravanbakhsh,
Philip Liu,
Trent Bjorndahl,
Rupasri Mandal,
Jason R. Grant,
Michael Wilson,
Roman Eisner,
Igor Sinelnikov,
Xiaoyu Hu,
Claudio Luchinat,
Russell Greiner,
David S. Wishart
Abstract:
Many diseases cause significant changes to the concentrations of small molecules (aka metabolites) that appear in a person's biofluids, which means such diseases can often be readily detected from a person's "metabolic profile". This information can be extracted from a biofluid's NMR spectrum. Today, this is often done manually by trained human experts, which means this process is relatively slow, expensive and error-prone. This paper presents a tool, Bayesil, that can quickly, accurately and autonomously produce a complex biofluid's (e.g., serum or CSF) metabolic profile from a 1D $^1$H NMR spectrum. This requires first performing several spectral processing steps, then matching the resulting spectrum against a reference compound library, which contains the "signatures" of each relevant metabolite. Many of these steps are novel algorithms and our matching step views spectral matching as an inference problem within a probabilistic graphical model that rapidly approximates the most probable metabolic profile. Our extensive studies on a diverse set of complex mixtures show that Bayesil can autonomously find the concentration of all NMR-detectable metabolites accurately (~90% correct identification and ~10% quantification error), in <5 minutes on a single CPU. These results demonstrate that Bayesil is the first fully-automatic publicly-accessible system that provides quantitative NMR spectral profiling effectively -- with an accuracy that meets or exceeds the performance of trained experts. We anticipate this tool will usher in high-throughput metabolomics and enable a wealth of new applications of NMR in clinical settings. Available at http://www.bayesil.ca.
Submitted 7 September, 2014; v1 submitted 4 September, 2014;
originally announced September 2014.
-
Greedy is good: An experimental study on minimum clique cover and maximum independent set problems for randomly generated rectangles
Authors:
Ritankar Mandal,
Anirban Ghosh,
Sasanka Roy,
Subhas C. Nandy
Abstract:
Given a set ${\cal R}=\{R_1,R_2,..., R_n\}$ of $n$ randomly positioned axis-parallel rectangles in 2D, the problems of computing the minimum clique cover (MCC) and maximum independent set (MIS) for the intersection graph $G({\cal R})$ of the members in $\cal R$ are both computationally hard \cite{CC05}. For the MCC problem, it is proved that polynomial time constant factor approximation is impossible to obtain \cite{PT11}. Though such a result is not proved yet for the MIS problem, no polynomial time constant factor approximation algorithm exists in the literature. We study the performance of greedy algorithms for computing these two parameters of $G({\cal R})$. Experimental results show that for each of the MCC and MIS problems, the corresponding greedy algorithm produces a solution that is very close to its optimum solution. Scheinerman \cite{Scheinerman80} showed that the size of MIS is tightly bounded by $\sqrt{n}$ for a random instance of the 1D version of the problem (i.e., for the interval graph). Our experiment shows that the size of independent set and the clique cover produced by the greedy algorithm is at least $2\sqrt{n}$ and at most $3\sqrt{n}$, respectively. Thus the experimentally obtained approximation ratio of the greedy algorithm for MIS problem is at most 3/2 and the same for the MCC problem is at least 2/3. Finally, we provide refined greedy algorithms based on a concept of simplicial rectangle. The characteristics of this algorithm may be of interest in getting a provably constant factor approximation algorithm for random instance of both the problems. We believe that the result also holds true for any finite dimension.
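The abstract does not spell out which greedy rule is used, so the following is only a minimal sketch of one common greedy heuristic for an independent set in a rectangle intersection graph: repeatedly pick the remaining rectangle that intersects the fewest other remaining rectangles, then discard it and its neighbors. The minimum-degree rule, the rectangle format `(x1, y1, x2, y2)`, and the function names are assumptions made for illustration.

```python
def intersects(a, b):
    """True if closed axis-parallel rectangles a and b, given as (x1, y1, x2, y2), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def greedy_independent_set(rects):
    """Return indices of pairwise non-intersecting rectangles (min-degree greedy)."""
    n = len(rects)
    # Build the intersection graph explicitly (quadratic, fine for a sketch).
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if intersects(rects[i], rects[j]):
                adj[i].add(j)
                adj[j].add(i)

    alive = set(range(n))
    chosen = []
    while alive:
        # Pick the rectangle with the fewest surviving neighbors; break ties by index.
        v = min(alive, key=lambda i: (len(adj[i] & alive), i))
        chosen.append(v)
        alive -= adj[v] | {v}
    return chosen
```

Running this on uniformly random rectangles and comparing `len(greedy_independent_set(rects))` against $\sqrt{n}$ is one way to reproduce the flavor of the experiments described, though the exact random model used by the authors is not stated here.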
Submitted 4 December, 2012;
originally announced December 2012.
-
Equivalence Checking in Embedded Systems Design Verification
Authors:
S. Bandyopadhyay,
D. Sarkar,
C. R. Mandal
Abstract:
In this report we focus on some aspects related to modeling and formal verification of embedded systems. Many models have been proposed to represent embedded systems. These models encompass a broad range of styles, characteristics, and application domains and include the extensions of finite state machines, data flow graphs, communication processes and Petri nets. In this report, we have used a PRES+ model (Petri net based Representation for Embedded Systems) as an extension of the classical Petri net model that captures the concurrency and timing behaviour of embedded systems; it allows systems to be represented at different levels of abstraction and improves expressiveness by allowing the token to carry information. Modeling using PRES+, as discussed above, may be convenient for specifying the input behaviour because it supports concurrency. However, to the best of our knowledge, no equivalence checking method for PRES+ models has been reported in the literature. In contrast, equivalence checking methods for FSMD models exist. As a first step, therefore, we seek to devise an algorithm to translate PRES+ models to FSMD models.
Submitted 20 August, 2010; v1 submitted 13 July, 2010;
originally announced July 2010.