-
agriFrame: Agricultural framework to remotely control a rover inside a greenhouse environment
Authors:
Saail Narvekar,
Soofiyan Atar,
Vishal Gupta,
Lohit Penubaku,
Kavi Arya
Abstract:
The growing demand for innovation in agriculture is essential for food security worldwide and especially pressing in developing countries. Growing demand also brings pressure to shorten development time. Data collection and analysis are essential in agriculture. However, a given crop's growth cycle comes only once a year, so researchers must wait a few months before collecting more data for that crop. To overcome this hurdle, researchers are venturing into digital twins for agriculture. Toward this effort, we present an agricultural framework (agriFrame). We introduce a simulated greenhouse environment for testing and controlling a robot, and for remotely controlling the robot and deploying the algorithms in the real-world greenhouse setup. This work showcases the importance and interdependence of the network setup, the remotely controllable rover, and the messaging protocol. The sophisticated yet simple-to-use agriFrame has been optimized to run the simulator on laptops and desktops with minimal specifications.
Submitted 12 April, 2025;
originally announced April 2025.
-
Coach: Exploiting Temporal Patterns for All-Resource Oversubscription in Cloud Platforms
Authors:
Benjamin Reidys,
Pantea Zardoshti,
Íñigo Goiri,
Celine Irvene,
Daniel S. Berger,
Haoran Ma,
Kapil Arya,
Eli Cortez,
Taylor Stark,
Eugene Bak,
Mehmet Iyigun,
Stanko Novaković,
Lisa Hsu,
Karel Trueba,
Abhisek Pan,
Chetan Bansal,
Saravan Rajmohan,
Jian Huang,
Ricardo Bianchini
Abstract:
Cloud platforms remain underutilized despite multiple proposals to improve their utilization (e.g., disaggregation, harvesting, and oversubscription). Our characterization of the resource utilization of virtual machines (VMs) in Azure reveals that, while CPU is the main underutilized resource, we need to provide a solution to manage all resources holistically. We also observe that many VMs exhibit complementary temporal patterns, which can be leveraged to improve the oversubscription of underutilized resources.
Based on these insights, we propose Coach: a system that exploits temporal patterns for all-resource oversubscription in cloud platforms. Coach uses long-term predictions and an efficient VM scheduling policy to exploit temporally complementary patterns. We introduce a new general-purpose VM type, called CoachVM, where we partition each resource allocation into a guaranteed and an oversubscribed portion. Coach monitors the oversubscribed resources to detect contention and mitigate any potential performance degradation. We focus on memory management, which is particularly challenging due to memory's sensitivity to contention and the overhead required to reassign it between CoachVMs. Our experiments show that Coach enables platforms to host up to ~26% more VMs with minimal performance degradation.
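The idea that VMs with temporally complementary usage can share oversubscribed capacity can be illustrated with a toy placement check. This is a minimal sketch under stated assumptions, not Coach's actual scheduling policy: the `CoachVMRequest` type and `fits` function are hypothetical, each VM's predicted oversubscribed demand is given as a list of per-window values, and a server is treated as safe if the guaranteed portions plus the summed predictions stay within capacity in every window.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoachVMRequest:
    """Hypothetical placement request (not Azure's or Coach's real API)."""
    name: str
    guaranteed: float               # resource always reserved for the VM
    predicted_oversub: List[float]  # predicted extra demand per time window


def fits(capacity: float, vms: List[CoachVMRequest]) -> bool:
    """True if, in every time window, the guaranteed portions plus the summed
    predicted oversubscribed demand stay within the server's capacity."""
    if not vms:
        return True
    guaranteed_total = sum(vm.guaranteed for vm in vms)
    # Assumes every VM's prediction covers the same number of windows.
    windows = len(vms[0].predicted_oversub)
    return all(
        guaranteed_total + sum(vm.predicted_oversub[t] for vm in vms) <= capacity
        for t in range(windows)
    )


# Two VMs whose busy periods do not overlap (e.g. daytime vs. nightly batch).
day = CoachVMRequest("day", guaranteed=4, predicted_oversub=[6, 6, 0, 0])
night = CoachVMRequest("night", guaranteed=4, predicted_oversub=[0, 0, 6, 6])
day2 = CoachVMRequest("day2", guaranteed=4, predicted_oversub=[6, 6, 0, 0])

print(fits(16, [day, night]))  # True: 8 guaranteed + 6 per window = 14
print(fits(16, [day, day2]))   # False: peaks coincide, 8 + 12 = 20
```

Reserving each VM's peak (4 + 6 = 10 units) would need 20 units for `day` and `night` together; it is only because their predicted peaks fall in different windows that both fit on a 16-unit server in this toy example.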
Submitted 19 March, 2025; v1 submitted 19 January, 2025;
originally announced January 2025.
-
Scalable and low-cost remote lab platforms: Teaching industrial robotics using open-source tools and understanding its social implications
Authors:
Amit Kumar,
Jaison Jose,
Archit Jain,
Siddharth Kulkarni,
Kavi Arya
Abstract:
With recent advancements in industrial robots, educating students in new technologies and preparing them for the future is imperative. However, access to industrial robots for teaching poses challenges, such as the high cost of acquiring these robots, the safety of the operator and the robot, and complicated training material. This paper proposes two low-cost platforms built using open-source tools such as the Robot Operating System (ROS) and its latest version, ROS 2, to help students learn and test algorithms on remotely connected industrial robots. A Universal Robots UR5 arm and a custom mobile rover were deployed in life-size testbeds, a greenhouse and a warehouse, to create an Autonomous Agricultural Harvester System (AAHS) and an Autonomous Warehouse Management System (AWMS). These platforms were deployed for a period of 7 months and were tested for their efficacy with 1,433 and 1,312 students, respectively. Students controlled the AAHS and AWMS hardware remotely for 160 and 355 hours, respectively, over a period of 3 months.
Submitted 19 December, 2024;
originally announced December 2024.
-
Vision-based indoor localization of nano drones in controlled environment with its applications
Authors:
Simranjeet Singh,
Amit Kumar,
Fayyaz Pocker Chemban,
Vikrant Fernandes,
Lohit Penubaku,
Kavi Arya
Abstract:
Navigating unmanned aerial vehicles in environments where GPS signals are unavailable poses a compelling and intricate challenge. This challenge is further heightened for Nano Aerial Vehicles (NAVs) because of their compact size, payload restrictions, and limited computational capabilities. This paper proposes an approach for localization using off-board computing, an off-board monocular camera, and modified open-source algorithms. The proposed method runs three parallel proportional-integral-derivative (PID) controllers on the off-board computer, which provide velocity corrections via wireless communication to stabilize the NAV in a custom-controlled environment. With a 3.1 cm localization error and a modest setup cost of 50 USD, this approach proves optimal for environments where cost considerations are paramount. It is especially well suited to applications such as teaching drone control in academic institutions, where this error margin is acceptable. Various applications are designed to validate the proposed technique, such as landing the NAV on a moving ground vehicle, path planning in 3D space, and localizing multiple NAVs. The created package is openly available at https://github.com/simmubhangu/eyantra_drone to foster research in this field.
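The controller structure described in this abstract (one PID loop per axis, each producing a velocity correction) can be sketched as follows. This is a minimal illustration under assumptions, not code from the eyantra_drone package: the gains are arbitrary, the position measurement is taken as given (in the real system it would come from the off-board camera), and the NAV is idealized as a point whose position integrates the commanded velocity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PID:
    """Textbook discrete PID controller for a single axis."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured
        self.integral += error * dt
        # No derivative on the first call, since there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        # The output is interpreted here as a velocity correction for this axis.
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(target: Tuple[float, float, float], start: Tuple[float, float, float],
             steps: int = 400, dt: float = 0.1) -> Tuple[float, ...]:
    # One independent controller per axis (x, y, z); gains are illustrative only.
    controllers = [PID(kp=1.0, ki=0.5, kd=0.05) for _ in range(3)]
    pos = list(start)
    for _ in range(steps):
        velocities = [c.update(t, p, dt) for c, t, p in zip(controllers, target, pos)]
        # Idealized vehicle model: position integrates the commanded velocity.
        pos = [p + v * dt for p, v in zip(pos, velocities)]
    return tuple(pos)


print(simulate(target=(1.0, 0.5, 1.2), start=(0.0, 0.0, 0.0)))
```

Because the three axes are decoupled in this idealized model, each controller can be tuned independently; a real NAV has coupled dynamics, sensor latency, and wireless delays that this sketch ignores.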
Submitted 11 December, 2024;
originally announced December 2024.
-
MultiFusionNet: Multilayer Multimodal Fusion of Deep Neural Networks for Chest X-Ray Image Classification
Authors:
Saurabh Agarwal,
K. V. Arya,
Yogesh Kumar Meena
Abstract:
Chest X-ray imaging is a critical diagnostic tool for identifying pulmonary diseases. However, manual interpretation of these images is time-consuming and error-prone. Automated systems utilizing convolutional neural networks (CNNs) have shown promise in improving the accuracy and efficiency of chest X-ray image classification. While previous work has mainly focused on using feature maps from the final convolution layer, there is a need to explore the benefits of leveraging additional layers for improved disease classification. Extracting robust features from limited medical image datasets remains a critical challenge. In this paper, we propose a novel deep learning-based multilayer multimodal fusion model that emphasizes extracting features from different layers and fusing them. Our disease detection model considers the discriminatory information captured by each layer. Furthermore, we propose a fusion of different-sized feature maps (FDSFM) module to effectively merge feature maps from diverse layers. The proposed model achieves significantly higher accuracies of 97.21% and 99.60% for three-class and two-class classification, respectively. The proposed multilayer multimodal fusion model, along with the FDSFM module, holds promise for accurate disease classification and can be extended to classifying other diseases in chest X-ray images.
Submitted 1 January, 2024;
originally announced January 2024.
-
Keeping Teams in the Game: Predicting Dropouts in Online Problem-Based Learning Competition
Authors:
Aditya Panwar,
Ashwin T S,
Ramkumar Rajendran,
Kavi Arya
Abstract:
Online learning and MOOCs have become increasingly popular in recent years, and the trend will continue, given the technology boom. There is a dire need to observe learners' behavior in these online courses, similar to what instructors do in a face-to-face classroom. Learners' strategies and activities are crucial to understanding their behavior. One major challenge in online courses is predicting and preventing dropout behavior. While several studies have attempted such analysis, few employ multiple data streams to understand and predict dropout rates. Moreover, studies rarely use a fully online, team-based collaborative environment as their context. Thus, the current study uses an online longitudinal problem-based learning (PBL) collaborative robotics competition as the testbed. Through methodological triangulation, the study aims to predict dropout behavior from the Discourse discussion forum 'activities' of participating teams, along with a self-reported Online Learning Strategies Questionnaire (OSLQ). The study also uses qualitative interviews to strengthen the ground truth and results. The OSLQ data were collected from more than 4000 participants. Furthermore, the study seeks to establish the reliability of the OSLQ to advance research within online environments. Various machine learning algorithms are applied to analyze the data. The findings demonstrate the reliability of the OSLQ with our substantial sample size and reveal promising results for predicting the dropout rate in online competitions.
Submitted 26 December, 2023;
originally announced December 2023.
-
An Efficient Point of Gaze Estimator for Low-Resolution Imaging Systems Using Extracted Ocular Features Based Neural Architecture
Authors:
Atul Sahay,
Imon Mukherjee,
Kavi Arya
Abstract:
A user's eyes are an important modality for Human-Computer Interaction (HCI) research. Ongoing scientific exploration of the eye has already yielded a surge of benefits for HCI applications, from gaze estimation to measuring the attentiveness of a user looking at a screen over a given time period. As an assistive, interactive tool, an eye-tracking system can be used by physically disabled individuals and is best suited to those for whom the eyes are one of only a few means of communication. The objective of this paper is threefold: (1) to introduce a neural network based architecture that predicts a user's gaze at 9 positions displayed within an 11.31° visual range on the screen, in real time, using a low-resolution system such as a webcam, by learning various aspects of the eyes as an ocular feature set; (2) to provide a collection of coarsely supervised features obtained in real time, validated through the user case study presented in the paper for 21 individuals (17 men and 4 women), from whom 35k instances were derived, with an accuracy score of 82.36% and an F1 score of 82.2%; and (3) to present a detailed study of the applicability and underlying challenges of such systems. The experimental results verify the feasibility and validity of the proposed eye gaze tracking model.
Submitted 9 June, 2021;
originally announced June 2021.
-
Unsupervised Learning of Explainable Parse Trees for Improved Generalisation
Authors:
Atul Sahay,
Ayush Maheshwari,
Ritesh Kumar,
Ganesh Ramakrishnan,
Manjesh Kumar Hanawal,
Kavi Arya
Abstract:
Recursive neural networks (RvNN) have been shown to be useful for learning sentence representations and have helped achieve competitive performance on several natural language inference tasks. However, recent RvNN-based models fail to learn simple grammar and meaningful semantics in their intermediate tree representations. In this work, we propose an attention mechanism over Tree-LSTMs to learn more meaningful and explainable parse tree structures. We also demonstrate the superior performance of our proposed model on natural language inference, semantic relatedness, and sentiment analysis tasks and compare it with other state-of-the-art RvNN-based methods. Further, we present a detailed qualitative and quantitative analysis of the learned parse trees and show that the discovered linguistic structures are more explainable, semantically meaningful, and grammatically correct than those of recent approaches. The source code of the paper is available at https://github.com/atul04/Explainable-Latent-Structures-Using-Attention.
Submitted 11 April, 2021;
originally announced April 2021.
-
Detecting Hostile Posts using Relational Graph Convolutional Network
Authors:
Sarthak,
Shikhar Shukla,
Karm Veer Arya
Abstract:
This work is based on our submission to the Hindi Constraint competition conducted at AAAI 2021 for detecting hostile posts in Hindi on social media platforms. We present a model that detects hostile posts and further classifies them as fake, offensive, hate, or defamation, using Relational Graph Convolutional Networks (RGCN). Unlike other existing work, our approach focuses on using semantic meaning along with contextual information for better classification. The results from AAAI 2021 indicate that the proposed model performs on par with Google's XLM-RoBERTa on the given dataset. Our best submission with RGCN achieves an F1 score of 0.97 (7th rank) on coarse-grained evaluation and achieves the best performance on identifying fake posts. Among all submissions to the challenge, our classification system with XLM-RoBERTa secured 2nd rank on fine-grained classification.
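For reference, a relational graph convolution layer in its standard form updates each node's representation by aggregating its neighbours separately per relation type. The abstract does not state which node and relation types the authors' graph uses, so $\mathcal{R}$ (the set of relations), $\mathcal{N}_i^r$ (the neighbours of node $i$ under relation $r$), and the normalization constant $c_{i,r}$ below are the generic notation, not details taken from the paper:

```latex
h_i^{(l+1)} = \sigma\!\left( W_0^{(l)} h_i^{(l)}
  + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^{r}} \frac{1}{c_{i,r}}\, W_r^{(l)} h_j^{(l)} \right)
```

Here $W_r^{(l)}$ is a weight matrix specific to relation $r$ at layer $l$, $W_0^{(l)}$ is a self-connection weight, $\sigma$ is a nonlinearity, and $c_{i,r}$ is commonly set to $|\mathcal{N}_i^{r}|$.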
Submitted 7 April, 2021; v1 submitted 10 January, 2021;
originally announced January 2021.
-
Learning Non-Markovian Quantum Noise from Moiré-Enhanced Swap Spectroscopy with Deep Evolutionary Algorithm
Authors:
Murphy Yuezhen Niu,
Vadim Smelyanskyi,
Paul Klimov,
Sergio Boixo,
Rami Barends,
Julian Kelly,
Yu Chen,
Kunal Arya,
Brian Burkett,
Dave Bacon,
Zijun Chen,
Ben Chiaro,
Roberto Collins,
Andrew Dunsworth,
Brooks Foxen,
Austin Fowler,
Craig Gidney,
Marissa Giustina,
Rob Graff,
Trent Huang,
Evan Jeffrey,
David Landhuis,
Erik Lucero,
Anthony Megrant,
Josh Mutus
, et al. (8 additional authors not shown)
Abstract:
Two-level-system (TLS) defects in amorphous dielectrics are a major source of noise and decoherence in solid-state qubits. Gate-dependent non-Markovian errors caused by TLS-qubit coupling are detrimental to fault-tolerant quantum computation and have not been rigorously treated in the existing literature. In this work, we derive the non-Markovian dynamics between TLS and qubits during a SWAP-like two-qubit gate and the associated average gate fidelity for frequency-tunable Transmon qubits. This gate-dependent error model facilitates using qubits as sensors to simultaneously learn practical imperfections in both the qubit's environment and control waveforms. We combine a state-of-the-art machine learning algorithm with Moiré-enhanced swap spectroscopy to achieve robust learning from noisy experimental data. Deep neural networks are used to represent the functional map from experimental data to TLS parameters and are trained through an evolutionary algorithm. Our method achieves the highest learning efficiency and robustness against experimental imperfections to date, representing an important step towards in-situ quantum control optimization over environmental and control defects.
Submitted 9 December, 2019;
originally announced December 2019.
-
Selection-based Question Answering of an MOOC
Authors:
Atul Sahay,
Smita Gholkar,
Kavi Arya
Abstract:
e-Yantra Robotics Competition (eYRC) is a unique robotics competition hosted by IIT Bombay that is actually an Embedded Systems and Robotics MOOC. Registrations have grown exponentially each year, from 4500 in 2012 to over 34000 in 2019. In this 5-month-long competition, students learn complex skills under severe time pressure and have access to a discussion forum where they can post questions about the learning material. Responding to questions in real time is a challenge for project staff. Here, we illustrate the advantage of Deep Learning for real-time question answering in the eYRC discussion forum. We show the advantage of Transformer-based contextual embedding mechanisms such as Bidirectional Encoder Representations from Transformers (BERT) over word embedding mechanisms such as Word2Vec. We propose a weighted similarity metric as a measure of matching and find it more reliable than Content-Content or Title-Title similarities alone. Automating replies to questions has brought the turnaround response time (TART) down from a minimum of 21 minutes to a minimum of 0.3 seconds.
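One plausible reading of a "weighted similarity metric" combining Title-Title and Content-Content similarity is a convex combination of the two cosine similarities. The sketch below assumes exactly that; the weight `w_title`, the helper names, and the toy 2-d vectors are hypothetical (in practice the vectors would be sentence embeddings such as those produced by BERT), and the paper's actual formula and weights may differ.

```python
import math
from typing import Dict, Sequence, Tuple

Vec = Sequence[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_similarity(query: Tuple[Vec, Vec], cand: Tuple[Vec, Vec],
                        w_title: float = 0.4) -> float:
    """Blend title-title and content-content similarity.
    Each argument is a (title_embedding, content_embedding) pair.
    w_title is a hypothetical weight, not a value from the paper."""
    return (w_title * cosine(query[0], cand[0])
            + (1.0 - w_title) * cosine(query[1], cand[1]))


def best_match(query: Tuple[Vec, Vec], answered: Dict[str, Tuple[Vec, Vec]],
               w_title: float = 0.4) -> str:
    """Return the id of the previously answered post most similar to the query."""
    return max(answered, key=lambda k: weighted_similarity(query, answered[k], w_title))


# Toy 2-d "embeddings" standing in for real sentence embeddings.
query = ((1.0, 0.0), (0.0, 1.0))
answered = {
    "A": ((1.0, 0.0), (1.0, 0.0)),  # identical title, unrelated content
    "B": ((0.6, 0.8), (0.0, 1.0)),  # related title, identical content
}
print(best_match(query, answered))  # B (0.84 vs 0.40)
```

A Title-Title score alone would pick post A here; weighting in the content similarity lets the post whose body actually matches win, which is the kind of behaviour the abstract attributes to the combined metric.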
Submitted 15 November, 2019;
originally announced November 2019.
-
IIITM Face: A Database for Facial Attribute Detection in Constrained and Simulated Unconstrained Environments
Authors:
Raj Kuwar Gupta,
Shresth Verma,
KV Arya,
Soumya Agarwal,
Prince Gupta
Abstract:
This paper addresses the challenges of face attribute detection specifically in the Indian context. While there are numerous face datasets in unconstrained environments, none of them captures emotions in different face orientations. Moreover, people of Indian ethnicity are under-represented in these datasets, since the datasets have been scraped from popular search engines. As a result, the performance of state-of-the-art techniques cannot be evaluated on Indian faces. In this work, we introduce a new dataset, IIITM Face, for the scientific community to address these challenges. Our dataset includes 107 participants who exhibit 6 emotions in 3 different face orientations. Each of these images is further labelled with attributes such as gender, presence of a moustache, beard, or eyeglasses, clothes worn by the subjects, and the density of their hair. Moreover, the images are captured in high resolution with specific background colors that can easily be replaced by cluttered backgrounds to simulate 'in the wild' behaviour. We demonstrate this by constructing IIITM Face-SUE. Both IIITM Face and IIITM Face-SUE have been benchmarked across key multi-label metrics for the research community to compare their results.
Submitted 2 October, 2019;
originally announced October 2019.
-
Learning while Competing -- 3D Modeling & Design
Authors:
Kalind Karia,
Rucmenya Bessariya,
Krishna Lala,
Kavi Arya
Abstract:
The e-Yantra project at IIT Bombay conducts an online competition, the e-Yantra Robotics Competition (eYRC), which uses a Project Based Learning (PBL) methodology to train students to implement a robotics project step by step over a five-month period. Participation is free. The competition provides all resources (robot, accessories, and a problem statement) to each participating team. If a team is selected for the finals, e-Yantra pays for its members to travel to the finals at IIT Bombay. This makes the competition accessible to resource-poor student teams. In this paper, we describe the methodology used in the 6th edition of eYRC, eYRC-2017, where we experimented with a Theme (projects abstracted into rulebooks) involving an advanced topic: 3D design and interfacing with sensors and actuators. We demonstrate that the learning outcomes are consistent with our previous studies [1]. We infer that even 3D design to create a working model can be learned effectively in competition mode through PBL.
Submitted 18 May, 2019;
originally announced May 2019.
-
Transition Watchpoints: Teaching Old Debuggers New Tricks
Authors:
Kapil Arya,
Tyler Denniston,
Ariel Rabkin,
Gene Cooperman
Abstract:
Reversible debuggers and process replay have been developed since at least 1970. This vision enables one to execute backwards in time under a debugger. Two important problems in practice are that, first, current reversible debuggers are slow when reversing over long time periods, and, second, after building one reversible debugger, it is difficult to transfer that achievement to a new programming environment. The user observes a bug when arriving at an error. Searching backwards for the corresponding fault may require many reverse steps. Ultimately, the user prefers to write an expression that will transition to false upon arriving at the fault. The solution is an expression-transition watchpoint facility built on top of snapshots and record/replay. Expression-transition watchpoints are implemented as a binary search through the timeline of a program execution, using the snapshots as landmarks within that timeline. This allows for debugging of subtle bugs that appear only after minutes or more of program execution. When a bug occurs within seconds of program startup, repeated debugging sessions suffice; reversible debugging is preferred for bugs seen only after minutes. This architecture allows for an efficient and easy-to-write snapshot-based reversible debugger on top of a conventional debugger. The validity of this approach was tested by developing four personalities (for GDB, MATLAB, Perl, and Python), each typically requiring just 100 lines of code.
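The binary search over snapshots described in this abstract can be illustrated on a toy, fully deterministic "program". This is a minimal sketch under assumptions, not the paper's implementation: a real system snapshots a live process under GDB or another debugger, whereas here the program is a pure `step` function, snapshots are copies of an immutable state, and `record` and `find_transition` are hypothetical helper names.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def record(initial: S, step: Callable[[S], S], total_steps: int,
           interval: int) -> List[Tuple[int, S]]:
    """Execute `total_steps` steps, keeping (step_index, state) snapshots every
    `interval` steps plus the final state. States must be immutable values."""
    snapshots = [(0, initial)]
    state = initial
    for i in range(1, total_steps + 1):
        state = step(state)
        if i % interval == 0 or i == total_steps:
            snapshots.append((i, state))
    return snapshots


def find_transition(snapshots: List[Tuple[int, S]], step: Callable[[S], S],
                    expr: Callable[[S], bool]) -> int:
    """Return the step index at which `expr` becomes False.

    Precondition: expr is True at the first snapshot and False at the last.
    Binary search over snapshots narrows the transition to one interval;
    deterministic replay from the last True snapshot pinpoints the step.
    """
    lo, hi = 0, len(snapshots) - 1  # invariant: expr True at lo, False at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expr(snapshots[mid][1]):
            lo = mid
        else:
            hi = mid
    index, state = snapshots[lo]
    while expr(state):  # stops by the next snapshot, where expr is False
        state = step(state)
        index += 1
    return index


# Toy "program": a counter; the "fault" is the first n with n * n >= 10**7.
def step(n: int) -> int:
    return n + 1


def expr(n: int) -> bool:
    return n * n < 10**7


snaps = record(0, step, total_steps=10_000, interval=500)
print(find_transition(snaps, step, expr))  # 3163
```

Here the expression is evaluated at only four snapshots before 163 steps of replay locate the transition, instead of stepping through all 3,163. If the expression flips several times, this search finds one True-to-False transition, not necessarily the first.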
Submitted 31 March, 2017;
originally announced March 2017.
-
Adapting the DMTCP Plugin Model for Checkpointing of Hardware Emulation
Authors:
Rohan Garg,
Kapil Arya,
Jiajun Cao,
Gene Cooperman,
Jeff Evans,
Ankit Garg,
Neil A. Rosenberg,
K. Suresh
Abstract:
Checkpoint-restart is now a mature technology. It allows a user to save and later restore the state of a running process. The new plugin model for the upcoming version 3.0 of DMTCP (Distributed MultiThreaded Checkpointing) is described here. This plugin model allows a target application to disconnect from the hardware emulator at checkpoint time and then reconnect to a possibly different hardware emulator at restart time. The DMTCP plugin model is important in allowing three distinct parties to interoperate seamlessly. The three parties are: the EDA designer, who is concerned with formal verification of a circuit design; the DMTCP developers, who are concerned with providing transparent checkpointing during the circuit emulation; and the hardware emulator vendor, who provides a plugin library that responds to checkpoint, restart, and other events.
The new plugin model is an example of process-level virtualization: virtualization of external abstractions from within a process. This capability is motivated by scenarios for testing circuit models with the help of a hardware emulator. The plugin model enables a three-way collaboration: allowing a circuit designer and emulator vendor to each contribute separate proprietary plugins while sharing an open source software framework from the DMTCP developers. This provides a more flexible platform, where different fault injection models based on plugins can be designed within the DMTCP checkpointing framework. After initialization, one restarts from a checkpointed state under the control of the desired plugin. This restart saves the time spent in simulating the initialization phase, while enabling fault injection exactly at the region of interest. Upon restart, one can inject faults or otherwise modify the remainder of the simulation. The work concludes with a brief survey of checkpointing and process-level virtualization.
Submitted 2 March, 2017;
originally announced March 2017.
-
System-level Scalable Checkpoint-Restart for Petascale Computing
Authors:
Jiajun Cao,
Kapil Arya,
Rohan Garg,
Shawn Matott,
Dhabaleswar K. Panda,
Hari Subramoni,
Jérôme Vienne,
Gene Cooperman
Abstract:
Fault tolerance for the upcoming exascale generation has long been an area of active research. One of the components of a fault tolerance strategy is checkpointing. Petascale-level checkpointing is demonstrated through a new mechanism for virtualization of the InfiniBand UD (unreliable datagram) mode, and for updating the remote address on each UD-based send, due to lack of a fixed peer. Note that InfiniBand UD is required to support modern MPI implementations. An extrapolation from the current results to future SSD-based storage systems provides evidence that the current approach will remain practical in the exascale generation. This transparent checkpointing approach is evaluated using a framework of the DMTCP checkpointing package. Results are shown for HPCG (linear algebra), NAMD (molecular dynamics), and the NAS NPB benchmarks. In tests up to 32,752 MPI processes on 32,752 CPU cores, checkpointing of a computation with a 38 TB memory footprint in 11 minutes is demonstrated. Runtime overhead is reduced to less than 1%. The approach is also evaluated across three widely used MPI implementations.
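To put the headline result in perspective, the figures in this abstract imply the following rough rates. This is back-of-the-envelope arithmetic under two assumptions the abstract does not state: TB is taken as a decimal unit ($10^{12}$ bytes), and the checkpoint images are assumed to be about as large as the 38 TB memory footprint (no compression).

```latex
\begin{aligned}
\text{aggregate write rate} &\approx \frac{38 \times 10^{12}\,\text{B}}{11 \times 60\,\text{s}}
  = \frac{38 \times 10^{12}\,\text{B}}{660\,\text{s}} \approx 5.76 \times 10^{10}\,\text{B/s} \approx 57.6\,\text{GB/s},\\
\text{memory per process} &\approx \frac{38 \times 10^{12}\,\text{B}}{32{,}752} \approx 1.16 \times 10^{9}\,\text{B} \approx 1.16\,\text{GB},\\
\text{write rate per process} &\approx \frac{1.16 \times 10^{9}\,\text{B}}{660\,\text{s}} \approx 1.76 \times 10^{6}\,\text{B/s} \approx 1.76\,\text{MB/s}.
\end{aligned}
```

The modest per-process rate suggests that, at this scale, the aggregate bandwidth of the storage system rather than any single process is the limiting resource.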
Submitted 23 September, 2016; v1 submitted 27 July, 2016;
originally announced July 2016.
-
Transparent Checkpoint-Restart over InfiniBand
Authors:
Jiajun Cao,
Gregory Kerr,
Kapil Arya,
Gene Cooperman
Abstract:
InfiniBand is widely used for low-latency, high-throughput cluster computing. Saving the state of the InfiniBand network as part of distributed checkpointing has been a long-standing challenge for researchers. Lacking a solution, typical MPI implementations have included custom checkpoint-restart services that "tear down" the network, checkpoint each node as if it were a standalone computer, and then reconnect the network. We present the first example of transparent, system-initiated checkpoint-restart that directly supports InfiniBand. The new approach is independent of any particular Linux kernel, thus simplifying the current practice of using a kernel-based module, such as BLCR. This direct approach is found to checkpoint faster than a checkpoint-restart service does. The generality of this approach is shown by checkpointing not only an MPI computation but also a native UPC (Berkeley Unified Parallel C) computation, which does not use MPI. Scalability is shown by checkpointing 2,048 MPI processes across 128 nodes (with 16 cores per node). In addition, the approach enables cost-effective debugging, in which a checkpoint image from an InfiniBand-based production cluster is copied to a local Ethernet-based cluster, where it can be restarted and an interactive debugger attached to it. This work is based on a plugin that extends the DMTCP (Distributed MultiThreaded CheckPointing) checkpoint-restart package.
Submitted 30 January, 2014; v1 submitted 13 December, 2013;
originally announced December 2013.
-
Explorations of the viability of ARM and Xeon Phi for physics processing
Authors:
David Abdurachmanov,
Kapil Arya,
Josh Bendavid,
Tommaso Boccali,
Gene Cooperman,
Andrea Dotti,
Peter Elmer,
Giulio Eulisse,
Francesco Giacomini,
Christopher D. Jones,
Matteo Manzali,
Shahzad Muzaffar
Abstract:
We report on our investigations into the viability of the ARM processor and the Intel Xeon Phi co-processor for scientific computing. We describe our experience porting software to these processors and running benchmarks using real physics applications to explore the potential of these processors for production physics processing.
Submitted 21 January, 2014; v1 submitted 5 November, 2013;
originally announced November 2013.
-
Use of checkpoint-restart for complex HEP software on traditional architectures and Intel MIC
Authors:
Kapil Arya,
Gene Cooperman,
Andrea Dotti,
Peter Elmer
Abstract:
Process checkpoint-restart is a technology with great potential for use in HEP workflows. Use cases include debugging; reducing the startup time of applications, both in offline batch jobs and in the High Level Trigger; permitting job preemption in environments where spare CPU cycles are used opportunistically; and efficient scheduling of a mix of multicore and single-threaded jobs. We report on tests of checkpoint-restart technology using CMS software, Geant4-MT (multi-threaded Geant4), and the DMTCP (Distributed Multithreaded Checkpointing) package. We analyze both single- and multi-threaded applications and test on both standard Intel x86 architectures and Intel MIC. The tests with multi-threaded applications on Intel MIC are used to assess scalability and performance. These results are considered an indicator of what the future may hold for many-core computing.
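One of the use cases above, reducing application startup time, can be illustrated on a much smaller scale. The Python sketch below is a hypothetical, application-level analogue. It serializes an already-initialized state with `pickle` and restores it. DMTCP, by contrast, checkpoints the whole process transparently and needs no application changes. The names `expensive_initialization`, `checkpoint`, and `restart` are illustrative assumptions and are not part of CMS software, Geant4-MT, or DMTCP.

```python
import pickle
import time
from typing import Any, Dict


def expensive_initialization() -> Dict[str, Any]:
    """Hypothetical stand-in for a slow startup phase
    (e.g., building large lookup tables before any events are processed)."""
    lookup = {i: i * i for i in range(200_000)}
    return {"lookup": lookup, "initialized": True}


def checkpoint(state: Dict[str, Any]) -> bytes:
    """Serialize the fully initialized state once."""
    return pickle.dumps(state)


def restart(image: bytes) -> Dict[str, Any]:
    """Recreate the initialized state from the image, skipping initialization."""
    return pickle.loads(image)


if __name__ == "__main__":
    t0 = time.perf_counter()
    state = expensive_initialization()
    image = checkpoint(state)
    t1 = time.perf_counter()

    # Several jobs could each start from the same image instead of re-initializing.
    for job in range(3):
        job_state = restart(image)
        assert job_state == state
    t2 = time.perf_counter()
    print(f"initialize + checkpoint: {t1 - t0:.3f} s; 3 restarts: {t2 - t1:.3f} s")
```

The timing printout is only indicative. Whether restarting beats re-initializing depends on how expensive the real startup phase is.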
Submitted 22 January, 2014; v1 submitted 1 November, 2013;
originally announced November 2013.
-
FReD: Automated Debugging via Binary Search through a Process Lifetime
Authors:
Kapil Arya,
Tyler Denniston,
Ana-Maria Visan,
Gene Cooperman
Abstract:
Reversible debuggers have been developed at least since 1970. Such a feature is useful when the cause of a bug is close in time to the bug manifestation. When the cause is far back in time, one resorts to setting appropriate breakpoints in the debugger and beginning a new debugging session. In such cases, where the cause of a bug is far in time from its manifestation, bug diagnosis requires a series of debugging sessions to narrow down the cause.
For such "difficult" bugs, this work presents an automated tool that searches through the process lifetime to locate the cause. For example, the bug could be related to a program invariant failing. A binary search through the process lifetime suffices, since the invariant expression is true at the beginning of the program execution and false when the bug is encountered. An algorithm for such a binary search is presented within the FReD (Fast Reversible Debugger) software. It relies on the ability to checkpoint, restart, and deterministically replay the multiple processes of a debugging session, and it is built on GDB (a debugger), DMTCP (for checkpoint-restart), and a custom deterministic record-replay plugin for DMTCP.
FReD supports complex, real-world multithreaded programs, such as MySQL and Firefox. Further, the binary search is robust. It operates on multi-threaded programs, and takes advantage of multi-core architectures during replay.
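The binary search described above can be sketched under simplifying assumptions. In the Python toy below, a hypothetical `make_replay` function stands in for deterministic replay: re-executing the first n steps always yields the same state. The invariant holds at the start of the search range and fails at its end. This is not FReD's implementation, which replays real processes under GDB and DMTCP. It only shows why O(log n) replays suffice to isolate the step where the invariant flips.

```python
from typing import Callable, List

State = List[int]


def make_replay(buggy_step: int) -> Callable[[int], State]:
    """Hypothetical deterministic program: replay(n) re-executes the first n
    steps from the initial state, so equal n always yields an equal state."""

    def replay(n: int) -> State:
        balance = [100]  # toy program state
        for step in range(n):
            if step == buggy_step:
                balance[0] -= 1000  # the fault: drives the balance negative
            else:
                balance[0] += 1
        return balance

    return replay


def find_transition(
    replay: Callable[[int], State],
    invariant: Callable[[State], bool],
    total_steps: int,
) -> int:
    """Return n such that the invariant holds after n - 1 steps but fails
    after n steps, using O(log total_steps) replays."""
    lo, hi = 0, total_steps
    if not invariant(replay(lo)) or invariant(replay(hi)):
        raise ValueError("invariant must hold at the start and fail at the end")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if invariant(replay(mid)):
            lo = mid  # transition happens later than mid
        else:
            hi = mid  # transition happens at or before mid
    return hi


if __name__ == "__main__":
    replay = make_replay(buggy_step=37)
    n = find_transition(replay, lambda s: s[0] >= 0, total_steps=100)
    print(f"invariant first fails during step index {n - 1}")  # step index 37
```

If the invariant flips more than once during the run, the search returns some true-to-false transition, not necessarily the earliest one.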
Submitted 20 December, 2012;
originally announced December 2012.
-
Single bit full adder design using 8 transistors with novel 3 transistors XNOR gate
Authors:
Manoj Kumar,
Sandeep K. Arya,
Sujata Pandey
Abstract:
In the present work, a new three-transistor XNOR gate is presented, which shows a power dissipation of 550.7272 μW in 0.35 μm technology with a supply voltage of 3.3 V. A minimum high-level output of 2.05 V and a maximum low-level output of 0.084 V have been obtained. A single-bit full adder using eight transistors has been designed with the proposed XNOR cell, which shows a power dissipation of 581.542 μW. For the sum output signal, a minimum high-level output of 1.97 V and a maximum low-level output of 0.24 V are obtained. For the carry signal, a maximum low-level output of 0.32 V and a minimum high-level output of 3.2 V have been achieved. Simulations have been performed using SPICE based on TSMC 0.35 μm CMOS technology. The power consumption of the proposed XNOR gate and full adder has been compared with earlier reported circuits, and the proposed circuits show better performance in terms of power consumption and transistor count.
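At the logic level, a full adder can be built around an XNOR cell. Two cascaded XNORs give the sum, and the output of the first XNOR can select the carry through a 2:1 multiplexer. The Python check below assumes this common decomposition. The abstract does not describe the eight-transistor netlist, and electrical properties such as power dissipation and output voltage levels are not modeled here.

```python
from itertools import product
from typing import Tuple


def xnor(a: int, b: int) -> int:
    """XNOR of two bits: 1 when the inputs are equal."""
    return 1 - (a ^ b)


def full_adder_xnor(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Logic-level XNOR-based full adder (assumed decomposition)."""
    x = xnor(a, b)               # 1 when a == b
    s = xnor(x, cin)             # XNOR(XNOR(a, b), cin) == a XOR b XOR cin
    cout = a if x == 1 else cin  # 2:1 mux: equal inputs carry a, else pass cin
    return s, cout


if __name__ == "__main__":
    # Exhaustively compare against integer addition for all 8 input patterns.
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder_xnor(a, b, cin)
        assert 2 * cout + s == a + b + cin, (a, b, cin)
    print("all 8 input combinations verified")
```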
Submitted 10 January, 2012;
originally announced January 2012.
-
Level Shifter Design for Low Power Applications
Authors:
Manoj Kumar,
Sandeep K. Arya,
Sujata Pandey
Abstract:
With the scaling of Vt, sub-threshold leakage power is increasing and is expected to become a significant part of total power consumption. In the present work, three new configurations of level shifters for low-power applications in 0.35 μm technology are presented. The proposed circuits use the stacking technique to obtain smaller leakage current and reduced leakage power. The conventional level shifter has been improved by adding three NMOS transistors, giving a total power consumption of 402.2264 pW compared with 0.49833 nW for the existing circuit. The single-supply level shifter has been modified by adding two NMOS transistors, giving a total power consumption of 108.641 pW compared with 31.06 nW. The third circuit, a contention mitigated level shifter (CMLS) with three additional transistors, shows a total power consumption of 396.75 pW compared with 0.4937354 nW. All three proposed circuits show better performance in terms of power consumption, with a slight compromise in delay. An output level of 3.3 V has been obtained with a 1.6 V input pulse for all proposed circuits.
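The abstract mixes picowatt and nanowatt figures, so converting them to one unit makes the comparison easier to read. The short Python calculation below uses only the numbers quoted above, with 1 nW = 1000 pW. The resulting percentages are derived arithmetic and were not reported by the authors.

```python
PW_PER_NW = 1000.0  # 1 nW = 1000 pW

# (proposed, existing) total power from the abstract, normalized to pW.
COMPARISONS = {
    "conventional + 3 NMOS": (402.2264, 0.49833 * PW_PER_NW),
    "single supply + 2 NMOS": (108.641, 31.06 * PW_PER_NW),
    "CMLS + 3 transistors": (396.75, 0.4937354 * PW_PER_NW),
}


def reduction_percent(proposed_pw: float, existing_pw: float) -> float:
    """Percentage by which the proposed circuit's power is lower."""
    return 100.0 * (existing_pw - proposed_pw) / existing_pw


if __name__ == "__main__":
    for name, (proposed, existing) in COMPARISONS.items():
        print(f"{name}: {proposed:.2f} pW vs {existing:.2f} pW, "
              f"{reduction_percent(proposed, existing):.1f}% lower")
```

For the figures quoted, this gives reductions of about 19.3% (conventional), 99.7% (single supply), and 19.6% (CMLS).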
Submitted 2 November, 2010;
originally announced November 2010.
-
Routing in Wireless Adhoc Networks: A New Horizon
Authors:
Mano Yadav,
Vinay Rishiwal,
K. V. Arya
Abstract:
Much work has been done on routing protocols for mobile ad hoc networks, but their standardization still requires addressing several issues that existing routing protocols leave largely unaddressed. This paper highlights a new paradigm of maintaining multiple connections in ad hoc routing protocols, which may be crucial for efficient routing in mobile ad hoc networks. The problem of multiple connections has hardly been studied in ad hoc networks. This paper proposes a solution for route maintenance when nodes maintain multiple connections. The idea not only helps solve the multiple-connections problem but also ensures proper bandwidth distribution among connections according to their traffic types. The study has been carried out on a modified version of the existing AODV protocol. Simulation studies have been performed on packet delivery ratio, throughput, and message overhead. Results show that the proposed solution for multiple connections is efficient and worth implementing in existing as well as new protocols.
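The abstract does not specify how bandwidth is divided among a node's connections. As a purely illustrative assumption, the Python sketch below gives each connection a share of the link capacity proportional to a weight for its traffic type. The weights, the `Connection` type, and `allocate_bandwidth` are hypothetical and are not part of AODV or of the authors' protocol.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical per-traffic-type weights; the paper's values are not given.
TRAFFIC_WEIGHTS: Dict[str, float] = {"voice": 3.0, "video": 2.0, "data": 1.0}


@dataclass
class Connection:
    conn_id: str
    traffic_type: str


def allocate_bandwidth(link_kbps: float, conns: List[Connection]) -> Dict[str, float]:
    """Split a node's link capacity across its active connections in
    proportion to each connection's traffic-type weight."""
    total_weight = sum(TRAFFIC_WEIGHTS[c.traffic_type] for c in conns)
    if total_weight == 0:
        return {}  # no active connections to serve
    return {
        c.conn_id: link_kbps * TRAFFIC_WEIGHTS[c.traffic_type] / total_weight
        for c in conns
    }


if __name__ == "__main__":
    conns = [Connection("A->D", "voice"), Connection("B->E", "video"),
             Connection("C->F", "data")]
    print(allocate_bandwidth(600.0, conns))
    # {'A->D': 300.0, 'B->E': 200.0, 'C->F': 100.0}
    conns.pop()  # connection C->F torn down: remaining shares are recomputed
    print(allocate_bandwidth(600.0, conns))
    # {'A->D': 360.0, 'B->E': 240.0}
```

One simple way to keep the shares consistent during route maintenance is to re-run the allocation whenever a connection is added or torn down.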
Submitted 22 December, 2009;
originally announced December 2009.
-
Temporal Debugging using URDB
Authors:
Ana Maria Visan,
Artem Polyakov,
Praveen S. Solanki,
Kapil Arya,
Tyler Denniston,
Gene Cooperman
Abstract:
A new style of temporal debugging is proposed. The new URDB debugger can employ such techniques as temporal search for finding an underlying fault that is causing a bug. This improves on the standard iterative debugging style, which iteratively re-executes a program under debugger control in the search for the underlying fault. URDB acts as a meta-debugger, with current support for four widely used debuggers: gdb, MATLAB, python, and perl. Support for a new debugger can be added in a few hours. Among its points of novelty are: (i) the first reversible debuggers for MATLAB, python, and perl; (ii) support for today's multi-core architectures; (iii) reversible debugging of multi-process and distributed computations; and (iv) temporal search on changes in program expressions. URDB gains its reversibility and temporal abilities through the fast checkpoint-restart capability of DMTCP (Distributed MultiThreaded CheckPointing). The recently enhanced DMTCP also adds ptrace support, enabling one to freeze, migrate, and replicate debugging sessions.
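URDB's reversibility rests on checkpoint-restart. To move backward, a debugger can restart from an earlier checkpoint and re-execute forward to the desired point. The Python toy below sketches that idea and adds a simple temporal search that steps back until a watched expression changes value. It models the program as a deterministic step function and uses `copy.deepcopy` in place of real checkpoints. The class and method names are hypothetical and do not reflect URDB's or DMTCP's interfaces.

```python
import copy
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


class CheckpointReplayDebugger:
    """Toy model of checkpoint-based reversibility (hypothetical API)."""

    def __init__(self, step_fn: Callable[[State], State],
                 initial_state: State, interval: int = 10) -> None:
        self.step_fn = step_fn
        self.interval = interval
        self.step = 0
        self.state = copy.deepcopy(initial_state)
        # Checkpoints keyed by step number; step 0 is the initial state.
        self.checkpoints: Dict[int, State] = {0: copy.deepcopy(initial_state)}

    def forward(self, n: int = 1) -> None:
        """Execute n steps, checkpointing every `interval` steps."""
        for _ in range(n):
            self.state = self.step_fn(self.state)
            self.step += 1
            if self.step % self.interval == 0:
                self.checkpoints[self.step] = copy.deepcopy(self.state)

    def goto(self, target: int) -> None:
        """Restart from the latest checkpoint at or before `target`,
        then deterministically re-execute up to `target`."""
        base = max(s for s in self.checkpoints if s <= target)
        self.state = copy.deepcopy(self.checkpoints[base])
        self.step = base
        self.forward(target - base)

    def reverse_step(self) -> None:
        """Undo one step by restarting and replaying."""
        if self.step > 0:
            self.goto(self.step - 1)

    def reverse_until_change(self, expr: Callable[[State], Any]) -> Optional[int]:
        """Temporal search: move back to the latest earlier step at which
        `expr` had a different value than it has now. If it never differed,
        return to the original step and return None."""
        original, current = self.step, expr(self.state)
        for target in range(original - 1, -1, -1):
            self.goto(target)
            if expr(self.state) != current:
                return target
        self.goto(original)
        return None


if __name__ == "__main__":
    # Toy program: i counts steps, total accumulates 1 + 2 + ... + i.
    def step_fn(s: State) -> State:
        return {"i": s["i"] + 1, "total": s["total"] + s["i"] + 1}

    dbg = CheckpointReplayDebugger(step_fn, {"i": 0, "total": 0}, interval=4)
    dbg.forward(10)
    dbg.reverse_step()
    print(dbg.step, dbg.state)  # 9 {'i': 9, 'total': 45}
    print(dbg.reverse_until_change(lambda s: s["total"] > 20))  # 5
```

Stepping back one step at a time keeps the sketch simple. A binary search over steps, as in FReD, would need fewer replays.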
Submitted 27 October, 2009;
originally announced October 2009.
-
DMTCP: Transparent Checkpointing for Cluster Computations and the Desktop
Authors:
Jason Ansel,
Kapil Arya,
Gene Cooperman
Abstract:
DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart are demonstrated for a wide range of over 20 well-known applications, including MATLAB, Python, TightVNC, MPICH2, OpenMPI, and runCMS. RunCMS runs as a 680 MB image in memory that includes 540 dynamic libraries, and is used for the CMS experiment of the Large Hadron Collider at CERN. DMTCP transparently checkpoints general cluster computations consisting of many nodes, processes, and threads, as well as typical desktop applications. On 128 distributed cores (32 nodes), checkpoint and restart times are typically 2 seconds, with negligible run-time overhead. Typical checkpoint times are reduced to 0.2 seconds when using forked checkpointing. Experimental results show that checkpoint time remains nearly constant as the number of nodes increases on a medium-size cluster.
DMTCP automatically accounts for fork, exec, ssh, mutexes/semaphores, TCP/IP sockets, UNIX domain sockets, pipes, ptys (pseudo-terminals), terminal modes, ownership of controlling terminals, signal handlers, open file descriptors, shared open file descriptors, I/O (including the readline library), shared memory (via mmap), parent-child process relationships, pid virtualization, and other operating system artifacts. By emphasizing an unprivileged, user-space approach, compatibility is maintained across Linux kernels from 2.6.9 through the current 2.6.28. Since DMTCP is unprivileged and does not require special kernel modules or kernel patches, DMTCP can be incorporated and distributed as a checkpoint-restart module within some larger package.
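Pid virtualization, listed above, can be pictured as a translation table. The application only ever sees virtual pids, which stay fixed across checkpoint and restart, while the kernel may assign different real pids after a restart. The Python sketch below is a hypothetical model of such a table. It includes a check for a new process whose real pid collides with an existing virtual pid. It is not DMTCP's internal data structure, and the wrapping of calls such as `getpid()` or `kill()` is only indicated in comments.

```python
from typing import Dict


class PidTable:
    """Hypothetical model of pid virtualization (not DMTCP's internals)."""

    def __init__(self) -> None:
        self._virt_to_real: Dict[int, int] = {}
        self._real_to_virt: Dict[int, int] = {}

    def register(self, real_pid: int) -> int:
        """Assign a virtual pid to a newly created process. The original real
        pid is reused as the virtual pid unless it is already taken."""
        virt = real_pid
        while virt in self._virt_to_real:  # avoid clashing with a live virtual pid
            virt += 1
        self._virt_to_real[virt] = real_pid
        self._real_to_virt[real_pid] = virt
        return virt

    def on_restart(self, virt_pid: int, new_real_pid: int) -> None:
        """After restart the kernel hands out a new real pid; remap it."""
        old_real = self._virt_to_real[virt_pid]
        # Drop the stale reverse entry only if another process has not reused it.
        if self._real_to_virt.get(old_real) == virt_pid:
            del self._real_to_virt[old_real]
        self._virt_to_real[virt_pid] = new_real_pid
        self._real_to_virt[new_real_pid] = virt_pid

    def to_real(self, virt_pid: int) -> int:
        """Translate before passing a pid to the kernel, e.g. in a kill() wrapper."""
        return self._virt_to_real[virt_pid]

    def to_virtual(self, real_pid: int) -> int:
        """Translate a kernel pid for the application, e.g. in a getpid() wrapper."""
        return self._real_to_virt[real_pid]


if __name__ == "__main__":
    table = PidTable()
    a, b = table.register(1234), table.register(1240)
    table.on_restart(a, 5678)
    table.on_restart(b, 1234)  # b's new real pid equals a's virtual pid
    print(table.to_real(a), table.to_real(b))  # 5678 1234
```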
Submitted 24 February, 2009; v1 submitted 6 January, 2007;
originally announced January 2007.