Long-Term Software Fault Prediction Model with Linear Regression and Data Transformation

Begum, Momotaz; Rony, Jahid Hasan; Islam, Md. Rashedul; Uddin, Jia

doi:10.61186/jist.36585.11.43.222

Manuscript ID: 2022041336585 Page: 222-231

20.1001.1.23221437.2023.11.43.6.5

Article Type: Original Research

Subject Areas : Machine learning

Momotaz Begum ¹, Jahid Hasan Rony ², Md. Rashedul Islam ³, Jia Uddin ⁴ *

1 - Department of Computer Science and Engineering, Dhaka University of Engineering & Technology, Gazipur-1707, Dhaka, Bangladesh
2 - Department of Computer Science and Engineering, Dhaka University of Engineering & Technology, Gazipur-1707, Dhaka, Bangladesh
3 - Department of Computer Science and Engineering, International University of Business Agriculture and Technology
4 - AI and Big Data Department, Endicott College, Woosong University, Daejeon, South Korea

Received: 2022-04-13 Accepted: 2022-10-27 Published: 2023-08-20

Keywords: Software Reliability, Software Faults, Forecasting, Long Term Prediction, Relative Error

Abstract :

Validation is essential for ensuring software reliability, because it determines the characteristics of an implemented software system. Ensuring reliability requires not only detecting and fixing the faults that have already occurred but also predicting future faults. Such prediction is performed before the actual testing phase begins, and various works on software fault prediction have therefore been carried out. In this paper, we present a software fault prediction model in which different data transformation methods are applied to Poisson fault count data. To pre-process the data from Poisson to Gaussian, the Box-Cox power transformation (Box-Cox_T), the Yeo-Johnson power transformation (Yeo-Johnson_T), and the Anscombe transformation (Anscombe_T) are used. Linear regression is then applied for long-term software fault prediction. Linear regression models the linear relationship between the dependent and independent variables, here relative error and testing days, respectively. For the analysis, three real software fault count datasets are used, and the proposed approach is compared with the naïve Gauss method, the exponential smoothing time series forecasting model, and conventional software reliability growth models (SRGMs), both with data transformation (With_T) and without it (Non_T). The three datasets record testing days and cumulative software faults, with (days, cumulative faults) of (62, 133), (181, 225), and (114, 189), respectively. The Box-Cox power transformation with linear regression (L_Box-Cox_T) outperformed all other methods in terms of average relative error from the short term to the long term.
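As an illustration of the pipeline outlined in the abstract, the following is a minimal Python sketch, not the authors' implementation. It defines the three transformations in their standard textbook forms, an ordinary least-squares line fit, and the relative error. The function names, the fixed parameter `lam = 0.5`, the fault counts, and the choice to regress transformed cumulative faults on testing days are all assumptions made for illustration; the abstract does not state how the transformation parameter is chosen or how the regression is set up.

```python
import math


def anscombe(x):
    """Anscombe transform: maps Poisson counts towards unit-variance Gaussian data."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)


def box_cox(x, lam):
    """Box-Cox power transform; only defined for strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inv_box_cox(y, lam):
    """Inverse of box_cox for the same lam."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform; defined for any real x."""
    if x >= 0:
        return math.log1p(x) if lam == 0 else ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_error(predicted, actual):
    """Absolute prediction error as a fraction of the actual value."""
    return abs(predicted - actual) / actual


if __name__ == "__main__":
    # Made-up cumulative fault counts per testing day (NOT from the paper's datasets).
    days = list(range(1, 11))
    cumulative = [2, 5, 7, 10, 14, 15, 19, 22, 24, 27]
    lam = 0.5  # assumed fixed value; in practice it would be estimated from the data

    # Fit on the first 7 days in the transformed domain, then extrapolate to day 10.
    train_days = days[:7]
    train_y = [box_cox(c, lam) for c in cumulative[:7]]
    slope, intercept = fit_line(train_days, train_y)
    predicted = inv_box_cox(slope * days[-1] + intercept, lam)

    print("predicted cumulative faults on day 10:", round(predicted, 2))
    print("relative error:", round(relative_error(predicted, cumulative[-1]), 4))
```

The Anscombe and Yeo-Johnson functions can be swapped in for `box_cox` in the demo, provided a matching inverse is supplied to map the prediction back to the original fault-count scale.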
