Software hangs are among the most common problems reported by users. Some hangs have a constant cost, that is, they do not depend on the workload. Others do depend on the workload; these are called workload-dependent performance bottlenecks (WDPBs). The main goal of this project is to implement a mechanism that identifies workload-dependent performance bottlenecks in a context-sensitive way. The base paper is "Context-Sensitive Delta Inference for Identifying Workload-Dependent Performance Bottlenecks" by Xiao et al. [1]. The work is extended by considering multiple workload parameters rather than a single one.
The approach works as follows:
1. Scenarios are chosen; for example, reading a file can be a scenario for performance analysis.
2. Corresponding workload parameters are selected; in the example above, the number of lines can be a parameter.
3. A representative value range (RVR) is chosen, and workloads are generated by varying values randomly from an initial point in the RVR.
4. The k-profile graph is built, and regression is performed to predict execution counts from the workload. The complexity model whose R-squared exceeds a given threshold is chosen.
5. The model is validated by generating workloads from a validation range and comparing the k-profile graph generated for these workloads with the predicted counts. If the error is too large, a new workload is generated by averaging the worst-performing workload with its closest workload value in the validation range. This new workload is added to the validation workloads, and the process continues until the error falls below a threshold or no more workloads can be generated.
6. WDPB candidates are identified by searching for complexity transitions, where the complexities are calculated from the regression prediction equations. Finally, the cost of each WDPB candidate is calculated.
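The model-selection step above can be sketched as follows. This is a minimal illustration, not the project's actual code: it fits count = a + b·f(w) for a few candidate complexity shapes f by least squares, computes R-squared for each, and keeps the simplest model that clears the threshold.

```java
import java.util.function.DoubleUnaryOperator;

// Sketch only: choose a complexity model for execution counts by fitting
// count = a + b*f(w) for several candidate shapes f and keeping the first
// (simplest) one whose R-squared reaches the threshold.
public class ComplexityFit {
    // Least-squares fit of y = a + b*x; returns {a, b, rSquared}.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a = (sy - b * sx) / n;
        double ssRes = 0, ssTot = 0, mean = sy / n;
        for (int i = 0; i < n; i++) {
            double pred = a + b * x[i];
            ssRes += (y[i] - pred) * (y[i] - pred);
            ssTot += (y[i] - mean) * (y[i] - mean);
        }
        return new double[]{a, b, 1 - ssRes / ssTot};
    }

    // Try O(n), O(n log n), O(n^2) from simplest to most complex and
    // return the label of the first model above the R-squared threshold.
    static String bestModel(double[] w, double[] count, double threshold) {
        String[] labels = {"O(n)", "O(n log n)", "O(n^2)"};
        DoubleUnaryOperator[] shapes = {
            x -> x, x -> x * Math.log(x), x -> x * x
        };
        for (int m = 0; m < shapes.length; m++) {
            double[] fx = new double[w.length];
            for (int i = 0; i < w.length; i++) fx[i] = shapes[m].applyAsDouble(w[i]);
            if (fit(fx, count)[2] >= threshold) return labels[m];
        }
        return "no model above threshold";
    }
}
```

The candidate shapes and the "simplest first" ordering here are assumptions for illustration; the project may use a different model family.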
The whole workflow is shown in the following figure.
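The validation-refinement step can be sketched as one reusable routine. This is an illustrative reading of the description above (method names are hypothetical): given per-workload prediction errors, it either reports convergence or proposes a new workload as the average of the worst-performing workload and its nearest neighbour.

```java
// Sketch of one refinement step: if the largest prediction error still
// exceeds the threshold, generate a new workload value by averaging the
// worst workload with its closest workload in the validation set.
public class RefineWorkloads {
    // Returns the new workload value, or -1 if the worst error is
    // already within the threshold (i.e. refinement has converged).
    static double refine(double[] workloads, double[] errors, double threshold) {
        int worst = 0;
        for (int i = 1; i < errors.length; i++)
            if (errors[i] > errors[worst]) worst = i;
        if (errors[worst] <= threshold) return -1; // converged
        int nearest = worst == 0 ? 1 : 0;          // closest other workload
        for (int i = 0; i < workloads.length; i++) {
            if (i == worst) continue;
            if (Math.abs(workloads[i] - workloads[worst])
                    < Math.abs(workloads[nearest] - workloads[worst])) nearest = i;
        }
        return (workloads[worst] + workloads[nearest]) / 2.0;
    }
}
```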
In this methodology, the training algorithm uses a single attribute. However, a multi-parameter model is necessary for more accurate prediction, so a multi-parameter training model has been incorporated into the project; that is, the work is extended by considering multiple parameters during training. Incorporating the multi-parameter model has been observed to result in better cost coverage.
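The core of the multi-parameter extension is replacing the single-attribute fit with a multiple regression over several workload parameters at once. A minimal sketch, assuming ordinary least squares via the normal equations (the project's actual training code may differ):

```java
// Sketch: ordinary least squares over multiple workload parameters.
// Solves (X^T X) beta = X^T y by Gauss-Jordan elimination, where X gains
// an implicit intercept column; beta[0] is the intercept.
public class MultiParamFit {
    static double[] ols(double[][] x, double[] y) {
        int n = x.length, p = x[0].length + 1;
        double[][] a = new double[p][p + 1]; // augmented normal equations
        for (int i = 0; i < n; i++) {
            double[] row = new double[p];
            row[0] = 1.0; // intercept term
            System.arraycopy(x[i], 0, row, 1, p - 1);
            for (int r = 0; r < p; r++) {
                for (int c = 0; c < p; c++) a[r][c] += row[r] * row[c];
                a[r][p] += row[r] * y[i];
            }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int piv = col;
            for (int r = col + 1; r < p; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            double[] tmp = a[col]; a[col] = a[piv]; a[piv] = tmp;
            for (int r = 0; r < p; r++) {
                if (r == col || a[r][col] == 0) continue;
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] beta = new double[p];
        for (int r = 0; r < p; r++) beta[r] = a[r][p] / a[r][r];
        return beta;
    }
}
```

With two workload parameters, for instance, this fits count = β0 + β1·w1 + β2·w2 instead of a single-parameter curve, which is what allows the extension to cover costs driven by more than one parameter.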
The prerequisites and the steps for running the code are explained below. The input is a source code directory, which is fixed in the code as the input data, and the output is the set of detected WDPB loops together with a coverage analysis.
- srcML - Converts source code to XML and vice-versa
- R - For Data Analysis (Training and Prediction)
- Eclipse - IDE for running Java Source Code
- JDK8 - Java Development Kit (Version 8 has been used)
- Make sure srcML and R are added to the PATH environment variable
- Load the code in Eclipse and run Main.java
- To run the multi-parameter model, input 1; to run the single-parameter model, input 2
- After the first pass is complete, go to the training folder and run the .bat file
- Now rerun the source code. To run the multi-parameter model, input 3; to run the single-parameter model, input 4
- The results, including the WDPB loops, cost coverage, and total cost, are generated in the corresponding results directory
The data is a simulation of Notepad++'s selected-word color-change feature. It is available in the secondsample directory and contains a single WDPB loop in the NotepadSimulation.java file. The results generated using the base paper's approach are in the results-single directory, and the results from the extension of this work are in the results-multiple directory.
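For intuition, a WDPB loop in such a simulation has roughly the following shape. This is purely illustrative (the method and class names are hypothetical, not taken from NotepadSimulation.java): the loop's iteration count, and therefore its cost, grows with a workload parameter such as the number of words in the document.

```java
// Illustrative WDPB loop: its cost is O(words.length), so it depends on
// the workload parameter (document size), unlike a constant-cost hang.
public class WdpbExample {
    static int recolorSelectedWords(String[] words, String target) {
        int recolored = 0;
        for (String w : words) {   // WDPB candidate: iterates once per word
            if (w.equals(target)) {
                recolored++;       // stands in for the color-change work
            }
        }
        return recolored;
    }
}
```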
- Kishan Kumar Ganguly - Initial work - KKGanguly
1. Xiao, Xusheng, et al. "Context-sensitive delta inference for identifying workload-dependent performance bottlenecks." Proceedings of the 2013 International Symposium on Software Testing and Analysis. ACM, 2013.
2. Collard, Michael L., Michael J. Decker, and Jonathan I. Maletic. "Lightweight transformation and fact extraction with the srcML toolkit." Proceedings of the 11th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 2011.
3. Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning. Vol. 1. New York: Springer Series in Statistics, 2001.