
AI-assisted Automated Workflow for Real-time X-ray Ptychography Data Analysis via Federated Resources

Authors: Anakha V Babu, Tekin Bicer, Saugat Kandel, Tao Zhou, Daniel J. Ching, Steven Henke, Siniša Veseli, Ryan Chard, Antonino Miceli, Mathew Joseph Cherukara

Abstract: We present an end-to-end automated workflow that uses large-scale remote compute resources and an embedded GPU platform at the edge to enable AI/ML-accelerated real-time analysis of data collected for x-ray ptychography. Ptychography is a lensless method that is being used to image samples through a simultaneous numerical inversion of a large number of diffraction patterns from adjacent overlapping scan positions. This acquisition method can enable nanoscale imaging with x-rays and electrons, but this often requires very large experimental datasets and commensurately high turnaround times, which can limit experimental capabilities such as real-time experimental steering and low-latency monitoring. In this work, we introduce a software system that can automate ptychography data analysis tasks. We accelerate the data analysis pipeline by using a modified version of PtychoNN, an ML-based approach to solving the phase retrieval problem that shows two orders of magnitude speedup compared to traditional iterative methods. Further, our system coordinates and overlaps different data analysis tasks to minimize synchronization overhead between different stages of the workflow. We evaluate our workflow system with real-world experimental workloads from the 26ID beamline at the Advanced Photon Source and the ThetaGPU cluster at Argonne Leadership Computing Resources.

Submitted 9 April, 2023; originally announced April 2023.

Comments: 7 pages, 1 figure, to be published in High Performance Computing for Imaging Conference, Electronic Imaging (HPCI 2023)

arXiv:2209.09408 [pdf, other]

Deep learning at the edge enables real-time streaming ptychographic imaging

Authors: Anakha V Babu, Tao Zhou, Saugat Kandel, Tekin Bicer, Zhengchun Liu, William Judge, Daniel J. Ching, Yi Jiang, Sinisa Veseli, Steven Henke, Ryan Chard, Yudong Yao, Ekaterina Sirazitdinova, Geetika Gupta, Martin V. Holt, Ian T. Foster, Antonino Miceli, Mathew J. Cherukara

Abstract: Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials characterization. However, associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real-time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion on X-ray ptychography data streamed directly from a detector at up to 2 kHz. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low dose imaging using orders of magnitude less data than required by traditional methods.

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2008.02189 [pdf, other]

SpinAPS: A High-Performance Spintronic Accelerator for Probabilistic Spiking Neural Networks

Authors: Anakha V Babu, Osvaldo Simeone, Bipin Rajendran

Abstract: We discuss a high-performance and high-throughput hardware accelerator for probabilistic Spiking Neural Networks (SNNs) based on Generalized Linear Model (GLM) neurons that uses binary STT-RAM devices as synapses and digital CMOS logic for neurons. The inference accelerator, termed "SpinAPS" for Spintronic Accelerator for Probabilistic SNNs, implements a principled direct learning rule for first-to-spike decoding without the need for conversion from pre-trained ANNs. The proposed solution is shown to achieve comparable performance with an equivalent ANN on handwritten digit and human activity recognition benchmarks. The inference engine, SpinAPS, is shown through software emulation tools to achieve a 4x performance improvement in terms of GSOPS/W/mm$^2$ when compared to an equivalent SRAM-based design. The architecture leverages probabilistic spiking neural networks that employ a first-to-spike decoding rule to make inference decisions at low latencies, achieving 75% of the test performance in as few as 4 algorithmic time steps on the handwritten digit benchmark. The accelerator also exhibits competitive performance with other memristor-based DNN/SNN accelerators and state-of-the-art GPUs.

Submitted 5 August, 2020; originally announced August 2020.

Comments: 25 pages, 10 figures, Submitted to Elsevier Neural Networks for review

arXiv:1711.03640 [pdf, other]

Stochastic Deep Learning in Memristive Networks

Authors: Anakha V Babu, Bipin Rajendran

Abstract: We study the performance of stochastically trained deep neural networks (DNNs) whose synaptic weights are implemented using emerging memristive devices that exhibit limited dynamic range, resolution, and variability in their programming characteristics. We show that a key device parameter to optimize the learning efficiency of DNNs is the variability in its programming characteristics. DNNs with such memristive synapses, even with dynamic range as low as $15$ and only $32$ discrete levels, when trained based on stochastic updates suffer less than $3\%$ loss in accuracy compared to a floating-point software baseline. We also study the performance of stochastic memristive DNNs when used as inference engines with noise-corrupted data and find that if the device variability can be minimized, the relative degradation in performance for the stochastic DNN is better than that of the software baseline. Hence, our study presents a new optimization corner for memristive devices for building large noise-immune deep learning systems.

Submitted 9 November, 2017; originally announced November 2017.

Comments: 4 pages, 5 figures, accepted at ICECS 2017

arXiv:1509.00693 [pdf]

A Fuzzy Clustering Based Approach for Mining Usage Profiles from Web Log Data

Authors: Zahid Ansari, Mohammad Fazle Azeem, A. Vinaya Babu, Waseem Ahmed

Abstract: The World Wide Web continues to grow at an amazing rate in both the size and complexity of Web sites and is well on its way to being the main reservoir of information and data. Due to this increase in the growth and complexity of the WWW, web site publishers are facing increasing difficulty in attracting and retaining users. To design popular and attractive websites, publishers must understand their users' needs. Therefore, analyzing users' behaviour is an important part of web page design. Web Usage Mining (WUM) is the application of data mining techniques to web usage log repositories in order to discover the usage patterns that can be used to analyze the users' navigational behavior. WUM contains three main steps: preprocessing, knowledge extraction and results analysis. The goal of the preprocessing stage in Web usage mining is to transform the raw web log data into a set of user profiles. Each such profile captures a sequence or a set of URLs representing a user session.

Submitted 1 September, 2015; originally announced September 2015.

Journal ref: International Journal of Computer Science and Information Security, pp. 70-79 Vol. 9, No. 6, June 2011. (ISSN 1947-5500, IJCSIS Publications, United States)

arXiv:1509.00692 [pdf]

Discovery of Web Usage Profiles Using Various Clustering Techniques

Authors: Zahid Ansari, Waseem Ahmed, M. F. Azeem, A. Vinaya Babu

Abstract: The explosive growth of the World Wide Web (WWW) has necessitated the development of Web personalization systems in order to understand user preferences and dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are extensively being applied to the Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups or clusters where inter-cluster similarities are minimized while the intra-cluster similarities are maximized. This paper reviews four of the popularly used clustering techniques: k-Means, k-Medoids, Leader and DBSCAN. These techniques are implemented and tested against the Web user navigational data. Performance and validity results of each technique are presented and compared.

Submitted 1 September, 2015; originally announced September 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1507.03340

Journal ref: International Journal of Computer Information Systems, pp. 18-27 Vol. 1, No. 3, July 2011. (ISSN 2229-5208, Silicon Valley Publishers, United Kingdom)

arXiv:1509.00690 [pdf]

A Fuzzy Approach for Feature Evaluation and Dimensionality Reduction to Improve the Quality of Web Usage Mining Results

Authors: Zahid Ansari, M. F. Azeem, A. Vinaya Babu, Waseem Ahmed

Abstract: Web Usage Mining is the application of data mining techniques to web usage log repositories in order to discover the usage patterns that can be used to analyze the users' navigational behavior. During the preprocessing stage, raw web log data is transformed into a set of user profiles. Each user profile captures a set of URLs representing a user session. Clustering can be applied to this sessionized data in order to capture similar interests and trends among users' navigational patterns. Since the sessionized data may contain thousands of user sessions and each user session may consist of hundreds of URL accesses, dimensionality reduction is achieved by eliminating the low support URLs. Very small sessions are also removed in order to filter out the noise from the data. But direct elimination of low support URLs and small sized sessions may result in the loss of a significant amount of information, especially when the count of low support URLs and small sessions is large. We propose a fuzzy solution to deal with this problem by assigning weights to URLs and user sessions based on a fuzzy membership function. After assigning the weights we apply a Fuzzy c-Means clustering algorithm to discover the clusters of user profiles. In this paper, we describe our fuzzy set theoretic approach to perform feature selection (or dimensionality reduction) and session weight assignment. Finally we compare our soft computing based approach of dimensionality reduction with the traditional approach of direct elimination of small sessions and low support count URLs. Our results show that fuzzy feature evaluation and dimensionality reduction results in better performance and validity indices for the discovered clusters.

Submitted 1 September, 2015; originally announced September 2015.

Journal ref: International Journal on Advanced Science Engineering and Information Technology, pp. 67-73 Vol. 2 No. 6, 2012. (ISSN: 2088-5334, INSIGHT Publishers, Indonesia)

arXiv:1507.03340 [pdf]

Quantitative Evaluation of Performance and Validity Indices for Clustering the Web Navigational Sessions

Authors: Zahid Ansari, M. F. Azeem, Waseem Ahmed, A. Vinaya Babu

Abstract: Clustering techniques are widely used in Web Usage Mining to capture similar interests and trends among users accessing a Web site. For this purpose, web access logs generated at a particular web site are preprocessed to discover the user navigational sessions. Clustering techniques are then applied to group the user session data into user session clusters, where inter-cluster similarities are minimized while the intra-cluster similarities are maximized. Since the application of different clustering algorithms generally results in different sets of cluster formation, it is important to evaluate the performance of these methods in terms of accuracy and validity of the clusters, and also the time required to generate them, using appropriate performance measures. This paper describes various validity and accuracy measures including Dunn's Index, Davies-Bouldin Index, C Index, Rand Index, Jaccard Index, Silhouette Index, Fowlkes-Mallows Index and Sum of the Squared Error (SSE). We conducted the performance evaluation of the following clustering techniques: k-Means, k-Medoids, Leader, Single Link Agglomerative Hierarchical and DBSCAN. These techniques are implemented and tested against the Web user navigational data. Finally their performance results are presented and compared.

Submitted 13 July, 2015; originally announced July 2015.

Journal ref: World of Computer Science and Information Technology Journal pp. 217-226, Vol. 1, No. 5, June 2011. (ISSN: 2221-0741, WCSIT Publisher, United States)

arXiv:1010.3862 [pdf]

A New Non Linear, Time Stamped & Feed Back Model Based Encryption Mechanism with Acknowledgement Support

Authors: A. V. N. Krishna, A. Vinaya Babu

Abstract: In this work we use a model that develops data distributed over an identified value, which is used as a nonce (IV). The model considers an equilibrium equation that is a function of non-linear relationships and of time-variant and nonce-variant values, and takes the feedback of the earlier round as input to the present round. The process is repeated for different timings, which are used as time stamps in the encryption mechanism. Thus this model generates a distributed sequence which is used as a sub key. This model supports very important parameters in symmetric data encryption schemes, such as non-linear relationships between the different values used in the model, variable key length, timeliness of the encryption mechanism, and acknowledgement between the participating parties. It also supports a feedback mode, which provides the necessary strength against cryptanalysis.

Submitted 19 October, 2010; originally announced October 2010.

Comments: 6 pages

Journal ref: IJoAT Vol 1, No 2 (October 2010)

arXiv:1007.0411 [pdf]

Role of Statistical tests in Estimation of the Security of a New Encryption Algorithm

Authors: Addepalli V. N Krishna, A Vinay Babu

Abstract: Encryption study basically deals with three levels of algorithms. The first algorithm deals with the encryption mechanism, the second deals with the decryption mechanism, and the third deals with the generation of keys and sub keys used in the encryption study. In the given study, a new algorithm is discussed. The algorithm executes a series of steps and generates a sequence. This sequence is used as a sub key to be mapped to plain text to generate cipher text. The strength of the encryption and decryption process depends on the strength of the generated sequence against cryptanalysis. In this part of the work, statistical tests such as uniformity tests, universal tests and repetition tests are applied to the generated sequence to test its strength.

Submitted 1 July, 2010; originally announced July 2010.

Comments: http://ijict.org/index.php/ijoat/article/view/statistical-tests-for-encryption-algorithm

Journal ref: International Journal of Advancements in Technology, Vol 1, No 1 (2010)

arXiv:1004.4477 [pdf]

Preserving Privacy and Sharing the Data in Distributed Environment using Cryptographic Technique on Perturbed data

Authors: P. Kamakshi, A. Vinaya Babu

Abstract: The main objective of data mining is to extract previously unknown patterns from large collections of data. With the rapid growth in hardware, software and networking technology, there is outstanding growth in the amount of data collected. Organizations collect huge volumes of data from heterogeneous databases, which also contain sensitive and private information about individuals. Data mining extracts novel patterns from such data, which can be used in various domains for decision making. The problem with data mining output is that it also reveals some information that is considered private and personal. Easy access to such personal data poses a threat to individual privacy. There has been growing concern about the chance of personal information being misused behind the scenes without the knowledge of the actual data owner. Privacy is becoming an increasingly important issue in many data mining applications in distributed environments. Privacy-preserving data mining (PPDM) techniques give a new direction for solving this problem. PPDM gives valid data mining results without learning the underlying data values. The benefits of data mining can be enjoyed without compromising the privacy of the individuals concerned. The original data is modified, or a process is used, in such a way that private data and private knowledge remain private even after the mining process. In this paper we propose a framework that allows systemic transformation of the original data using a randomized data perturbation technique; the modified data is then submitted as the result of the client's query through a cryptographic approach. Using this approach we can achieve confidentiality at the client as well as the data owner sites. This model gives valid data mining results for analysis purposes, but the actual or true data is not revealed.

Submitted 26 April, 2010; originally announced April 2010.

Comments: https://sites.google.com/site/journalofcomputing/

Journal ref: Journal of Computing, Volume 2, Issue 4, April 2010, 115-119

arXiv:1003.3090 [pdf]

doi 10.5121/ijcnc.2010.2202

Node Isolation Probability of Wireless Adhoc Networks in Nagakami Fading Channel

Authors: A. V. Babu, Mukesh Kumar Singh

Abstract: This paper investigates the issue of connectivity of a wireless ad hoc network in the presence of channel impairments. We derive analytical expressions for the node isolation probability in an ad hoc network in the presence of Nakagami-m fading with superimposed lognormal shadowing. The node isolation probability is the probability that a randomly chosen node is unable to communicate with any of the other nodes in the network. An extensive investigation into the impact of path loss exponent, lognormal shadowing, Nakagami fading severity index, node density, and diversity order on the node isolation probability is conducted. The presented results are beneficial for the practical design of ad hoc networks.

Submitted 16 March, 2010; originally announced March 2010.

Comments: 16 pages, IJCNC Journal

Journal ref: International Journal of Computer Networks & Communications 2.2 (2010) 21-36

Showing 1–12 of 12 results for author: Babu, A V