Showing 1–17 of 17 results for author: Yedida, R

  1. arXiv:2502.18002

    cs.LG cs.AI

    A Radon-Nikodým Perspective on Anomaly Detection: Theory and Implications

    Authors: Shlok Mehendale, Aditya Challa, Rahul Yedida, Sravan Danda, Santonu Sarkar, Snehanshu Saha

    Abstract: Which principle underpins the design of an effective anomaly detection loss function? The answer lies in the Radon-Nikodým theorem, a fundamental concept in measure theory. The key insight from this article is: Multiplying the vanilla loss function with the Radon-Nikodým derivative improves the performance across the board. We refer to this as RN-Loss. We prove this using the setting of…

    Submitted 16 May, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  2. arXiv:2402.05025

    cs.LG

    Strong convexity-guided hyper-parameter optimization for flatter losses

    Authors: Rahul Yedida, Snehanshu Saha

    Abstract: We propose a novel white-box approach to hyper-parameter optimization. Motivated by recent work establishing a relationship between flat minima and generalization, we first establish a relationship between the strong convexity of the loss and its flatness. Based on this, we seek to find hyper-parameter configurations that improve flatness by minimizing the strong convexity of the loss. By using th…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: v1

  3. arXiv:2401.09622

    cs.SE cs.LG

    Is Hyper-Parameter Optimization Different for Software Analytics?

    Authors: Rahul Yedida, Tim Menzies

    Abstract: Yes. SE data can have "smoother" boundaries between classes (compared to traditional AI data sets). To be more precise, the magnitude of the second derivative of the loss function found in SE data is typically much smaller. A new hyper-parameter optimizer, called SMOOTHIE, can exploit this idiosyncrasy of SE data. We compare SMOOTHIE and a state-of-the-art AI hyper-parameter optimizer on three tas…

    Submitted 25 February, 2025; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted to TSE

  4. arXiv:2205.10504

    cs.SE cs.LG

    How to Find Actionable Static Analysis Warnings: A Case Study with FindBugs

    Authors: Rahul Yedida, Hong Jin Kang, Huy Tu, Xueqi Yang, David Lo, Tim Menzies

    Abstract: Automatically generated static code warnings suffer from a large number of false alarms. Hence, developers only take action on a small percent of those warnings. To better predict which static code warnings should not be ignored, we suggest that analysts need to look deeper into their algorithms to find choices that better improve the particulars of their specific problem. Specifically, we show he…

    Submitted 23 December, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

    Comments: Accepted to TSE

  5. arXiv:2202.01322

    cs.SE

    How to Improve Deep Learning for Software Analytics (a case study with code smell detection)

    Authors: Rahul Yedida, Tim Menzies

    Abstract: To reduce technical debt and make code more maintainable, it is important to be able to warn programmers about code smells. State-of-the-art code smell detectors use deep learners, without much exploration of alternatives within that technology. One promising alternative for software analytics and deep learning is GHOST (from TSE'21) that relies on a combination of hyper-parameter optimization o…

    Submitted 27 March, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Accepted to MSR 2022

  6. arXiv:2109.14569

    cs.LG cs.SE stat.ML

    An Expert System for Redesigning Software for Cloud Applications

    Authors: Rahul Yedida, Rahul Krishna, Anup Kalia, Tim Menzies, Jin Xiao, Maja Vukovic

    Abstract: Cloud-based software has many advantages. When services are divided into many independent components, they are easier to update. Also, during peak demand, it is easier to scale cloud services (just hire more CPUs). Hence, many organizations are partitioning their monolithic enterprise applications into cloud-based microservices. Recently there has been much work using machine learning to simplif…

    Submitted 27 June, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: version 3

  7. Crowdsourcing the State of the Art(ifacts)

    Authors: Maria Teresa Baldassarre, Neil Ernst, Ben Hermann, Tim Menzies, Rahul Yedida

    Abstract: In any field, finding the "leading edge" of research is an on-going challenge. Researchers cannot appease reviewers and educators cannot teach to the leading edge of their field if no one agrees on what is the state-of-the-art. Using a novel crowdsourced "reuse graph" approach, we propose here a new method to learn this state-of-the-art. Our reuse graphs are less effort to build and verify than…

    Submitted 15 August, 2021; originally announced August 2021.

    Comments: Submitted to Communications ACM

    Journal ref: CACM February 2023 (Vol. 66, No. 2)

  8. arXiv:2106.06652

    cs.SE

    Lessons learned from hyper-parameter tuning for microservice candidate identification

    Authors: Rahul Yedida, Rahul Krishna, Anup Kalia, Tim Menzies, Jin Xiao, Maja Vukovic

    Abstract: When optimizing software for the cloud, monolithic applications need to be partitioned into many smaller microservices. While many tools have been proposed for this task, we warn that the evaluation of those approaches has been incomplete; e.g. minimal prior exploration of hyperparameter optimization. Using a set of open source Java EE applications, we show here that (a) such optimization can si…

    Submitted 10 August, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted to ASE 2021 (industry track, short paper)

  9. arXiv:2101.06319

    cs.SE cs.AI

    Old but Gold: Reconsidering the value of feedforward learners for software analytics

    Authors: Rahul Yedida, Xueqi Yang, Tim Menzies

    Abstract: There has been an increased interest in the use of deep learning approaches for software analytics tasks. State-of-the-art techniques leverage modern deep learning techniques such as LSTMs, yielding competitive performance, albeit at the price of longer training times. Recently, Galke and Scherp [18] showed that at least for image recognition, a decades-old feedforward neural network can match t…

    Submitted 5 February, 2022; v1 submitted 15 January, 2021; originally announced January 2021.

    Comments: v2

  10. arXiv:2011.07959

    cs.IR cs.CL cs.LG

    Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets

    Authors: Rahul Yedida, Saad Mohammad Abrar, Cleber Melo-Filho, Eugene Muratov, Rada Chirkova, Alexander Tropsha

    Abstract: Objective: We aim to learn potential novel cures for diseases from unstructured text sources. More specifically, we seek to extract drug-disease pairs of potential cures to diseases by a simple reasoning over the structure of spoken text. Materials and Methods: We use Google Cloud to transcribe podcast episodes of an NPR radio show. We then build a pipeline for systematically pre-processing the…

    Submitted 22 October, 2020; originally announced November 2020.

    Comments: initial submission

  11. arXiv:2008.03835

    cs.SE

    On the Value of Oversampling for Deep Learning in Software Defect Prediction

    Authors: Rahul Yedida, Tim Menzies

    Abstract: One truism of deep learning is that the automatic feature engineering (seen in the first layers of those networks) excuses data scientists from performing tedious manual feature engineering prior to running DL. For the specific case of deep learning for defect prediction, we show that that truism is false. Specifically, when we preprocess data with a novel oversampling technique called fuzzy sampl…

    Submitted 20 April, 2021; v1 submitted 9 August, 2020; originally announced August 2020.

    Comments: v3, revision 2 (minor revision); submitted to TSE

  12. arXiv:2006.00444

    cs.SE

    Learning to Recognize Actionable Static Code Warnings (is Intrinsically Easy)

    Authors: Xueqi Yang, Jianfeng Chen, Rahul Yedida, Zhe Yu, Tim Menzies

    Abstract: Static code warning tools often generate warnings that programmers ignore. Such tools can be made more useful via data mining algorithms that select the "actionable" warnings; i.e. the warnings that are usually not ignored. In this paper, we look for actionable warnings within a sample of 5,675 actionable warnings seen in 31,058 static code warnings from FindBugs. We find that data mining algori…

    Submitted 10 January, 2021; v1 submitted 31 May, 2020; originally announced June 2020.

    Comments: 24 pages, 5 figures, 7 tables, accepted to Empirical Software Engineering and to appear

  13. arXiv:2005.08442

    cs.LG q-bio.QM stat.ML

    Parsimonious Computing: A Minority Training Regime for Effective Prediction in Large Microarray Expression Data Sets

    Authors: Shailesh Sridhar, Snehanshu Saha, Azhar Shaikh, Rahul Yedida, Sriparna Saha

    Abstract: Rigorous mathematical investigation of learning rates used in back-propagation in shallow neural networks has become a necessity. This is because experimental evidence needs to be endorsed by a theoretical background. Such theory may be helpful in reducing the volume of experimental effort to accomplish desired results. We leveraged the functional property of Mean Square Error, which is Lipschitz…

    Submitted 17 May, 2020; originally announced May 2020.

  14. arXiv:1912.04061

    cs.SE cs.AI cs.LG

    Simpler Hyperparameter Optimization for Software Analytics: Why, How, When?

    Authors: Amritanshu Agrawal, Xueqi Yang, Rishabh Agrawal, Rahul Yedida, Xipeng Shen, Tim Menzies

    Abstract: How can we make software analytics simpler and faster? One method is to match the complexity of analysis to the intrinsic complexity of the data being explored. For example, hyperparameter optimizers find the control settings for data miners that improve the predictions generated via software analytics. Sometimes, very fast hyperparameter optimization can be achieved by "DODGE-ing"; i.e. simply st…

    Submitted 22 April, 2021; v1 submitted 9 December, 2019; originally announced December 2019.

    Comments: 15 pages

    Journal ref: Transactions on Software Engineering, 2021

  15. arXiv:1906.01975

    astro-ph.IM cs.LG physics.data-an stat.ML

    Evolution of Novel Activation Functions in Neural Network Training with Applications to Classification of Exoplanets

    Authors: Snehanshu Saha, Nithin Nagaraj, Archana Mathur, Rahul Yedida

    Abstract: We present an analytical exploration of novel activation functions as a consequence of the integration of several ideas, leading to their implementation and subsequent use in habitability classification of exoplanets. Neural networks, although a powerful engine in supervised methods, often require expensive tuning efforts for optimized performance. Habitability classes are hard to discriminate, especially when at…

    Submitted 1 June, 2019; originally announced June 2019.

    Comments: 41 pages, 11 figures

  16. arXiv:1902.07399

    cs.LG stat.ML

    LipschitzLR: Using theoretically computed adaptive learning rates for fast convergence

    Authors: Rahul Yedida, Snehanshu Saha, Tejas Prashanth

    Abstract: Optimizing deep neural networks is largely thought to be an empirical process, requiring manual tuning of several hyper-parameters, such as learning rate, weight decay, and dropout rate. Arguably, the learning rate is the most important of these to tune, and this has gained more attention in recent works. In this paper, we propose a novel method to compute the learning rate for training deep neura…

    Submitted 31 July, 2020; v1 submitted 19 February, 2019; originally announced February 2019.

    Comments: v4; comparison studies added

  17. arXiv:1806.10480

    stat.ML cs.LG

    Employee Attrition Prediction

    Authors: Rahul Yedida, Rahul Reddy, Rakshit Vahi, Rahul Jana, Abhilash GV, Deepti Kulkarni

    Abstract: We aim to predict whether an employee of a company will leave or not, using the k-Nearest Neighbors algorithm. We use evaluation of employee performance, average monthly hours at work and number of years spent in the company, among others, as our features. Other approaches to this problem include the use of ANNs, decision trees and logistic regression. The dataset was split, using 70% for training…

    Submitted 19 June, 2018; originally announced June 2018.

    Comments: 3 pages, 1 figure