Prediction of Heart Stroke using A Novel Framework – PySpark
Abstract
Heart disease is one of the most challenging problems faced by health care sectors all over the world, and it is now very common. With the growing number of deaths caused by heart illnesses, there is a need to build a system that can predict heart ailments accurately. The work in this paper focuses on finding the best Machine Learning algorithm for the identification of heart disease. Our study compares the predictive accuracy of three well-known classification algorithms, Decision Tree, Naïve Bayes, and Random Forest, for the prediction of heart disease, using a dataset provided by Kaggle. We used various attributes that are closely related to heart disease to determine which algorithm predicts best. The results of this study indicate that the Random Forest algorithm performs best for the prediction of heart disease, with an accuracy score of 97.17%.
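The comparison described in the abstract comes down to scoring each classifier's predictions against the true labels and ranking the results. The following is a minimal plain-Python sketch of that scoring step only, not the paper's PySpark pipeline. The function names `accuracy` and `rank_models`, the labels, and the predictions are all hypothetical toy values chosen for illustration, and they do not reproduce the reported results.

```python
from typing import Dict, List, Tuple


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(
    y_true: List[int], predictions: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Score every model and return (name, accuracy) pairs, best first."""
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = disease present, 0 = disease absent.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]

# Hypothetical predictions; a real study would obtain these from trained models.
toy_predictions = {
    "decision_tree": [1, 0, 1, 1, 0, 0, 0, 0],
    "naive_bayes": [0, 0, 1, 1, 0, 0, 0, 1],
    "random_forest": [1, 0, 0, 1, 0, 1, 0, 1],
}

ranking = rank_models(y_true, toy_predictions)
for name, score in ranking:
    print(f"{name}: {score:.2%}")
```

Accuracy alone can be misleading when one class is much rarer than the other, so a comparison of this kind is often reported alongside precision and recall.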

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.