Distance Measures Insights For Breast Cancer Analysis Using K-NN Algorithm
DOI:
https://doi.org/10.53555/jaz.v45i3.4902

Keywords:
Breast Cancer, k-Nearest Neighbor, Performance Measures, Topsoe, Average (L_1, L_∞), Lorentzian

Abstract
Breast cancer is one of the most frequently occurring diseases among women and is treatable when detected early. This paper seeks to identify an effective technique for predicting breast cancer. Mammograms can detect abnormal growths, but they do not always identify breast cancer accurately, and at present the disease can only be confirmed by biopsy; this article therefore proposes a prediction approach that does not rely on biopsy. The study uses the k-Nearest Neighbor (k-NN) technique, a method widely applied in machine learning for classification and regression. The workflow involves several steps: importing the dataset, pre-processing the data, and selecting the features to be classified. The k-NN method is then applied with a variety of distance metrics to distinguish between benign and malignant tumours, and the resulting classifications are compared with other published outcomes to demonstrate the effectiveness of the proposed strategy. The findings advance breast cancer prediction through improved distance measures for the k-NN algorithm: the Topsoe, Lorentzian, and Average (L_1, L_∞) distances produced the most reliable overall results, and these outcomes are compared with established measures such as the Euclidean, Clark, and Bray-Curtis distances.
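For illustration, the Python sketch below follows the workflow described in the abstract; it is not the authors' exact pipeline. It assumes the scikit-learn copy of the Wisconsin breast cancer dataset, min-max scaling as the pre-processing step, an 80/20 train/test split, and k = 5, and it uses the standard textbook definitions of the named distance measures (given as comments). Each distance is plugged into the k-NN classifier and the resulting test accuracies are compared.

# Minimal sketch, not the authors' exact pipeline. Assumptions: scikit-learn's
# Wisconsin breast cancer data, min-max scaling, an 80/20 split, and k = 5.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

EPS = 1e-10  # guards against division by zero and log(0)

def euclidean(x, y):        # sqrt(sum (x_i - y_i)^2)
    return np.sqrt(np.sum((x - y) ** 2))

def lorentzian(x, y):       # sum ln(1 + |x_i - y_i|)
    return np.sum(np.log1p(np.abs(x - y)))

def topsoe(x, y):           # sum x_i ln(2x_i/(x_i+y_i)) + y_i ln(2y_i/(x_i+y_i))
    s = x + y + EPS
    return np.sum(x * np.log(2 * (x + EPS) / s) + y * np.log(2 * (y + EPS) / s))

def average_l1_linf(x, y):  # (sum |x_i - y_i| + max |x_i - y_i|) / 2
    d = np.abs(x - y)
    return (np.sum(d) + np.max(d)) / 2.0

def clark(x, y):            # sqrt(sum (|x_i - y_i| / (x_i + y_i))^2)
    return np.sqrt(np.sum((np.abs(x - y) / (x + y + EPS)) ** 2))

def bray_curtis(x, y):      # sum |x_i - y_i| / sum (x_i + y_i)
    return np.sum(np.abs(x - y)) / (np.sum(x + y) + EPS)

# Steps 1-2: import the dataset and pre-process it. Min-max scaling keeps the
# features non-negative, which the Topsoe and Clark distances require.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
scaler = MinMaxScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = np.clip(scaler.transform(X_test), 0.0, None)

# Step 3: classify benign vs. malignant tumours with k-NN under each distance
# measure and compare the resulting test accuracies.
measures = {"Euclidean": euclidean, "Lorentzian": lorentzian, "Topsoe": topsoe,
            "Average (L1, Linf)": average_l1_linf, "Clark": clark,
            "Bray-Curtis": bray_curtis}
for name, dist in measures.items():
    knn = KNeighborsClassifier(n_neighbors=5, metric=dist, algorithm="brute")
    knn.fit(X_train, y_train)
    acc = accuracy_score(y_test, knn.predict(X_test))
    print(f"{name:>20s}: accuracy = {acc:.3f}")

The dataset, the value of k, and the split are illustrative choices; swapping in the data and parameters used in the paper only changes the loading and splitting lines.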
References
Abu Alfeilat HA, Hassanat ABA, Lasassmeh O, et al. Effects of distance measure choice on k-nearest neighbor classifier performance: a review. Big Data. 2019;7:221-248.
Aghdam HH, Heravi EJ. Guide to Convolutional Neural Networks: A Practical Application to Traffic-Sign Detection and Classification. Springer; 2017.
Bajramovic F, Mattern F, Butko N, Denzler J. A comparison of nearest neighbor search algorithms for generic object recognition. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 4179. Berlin, Germany: Springer.
Bramer M. Principles of Data Mining. 2nd ed. London: Springer; 2013.
Clark PJ. An extension of the coefficient of divergence for use with multiple characters. Copeia. 1952;1952:61-64.
Euclid. The Thirteen Books of Euclid’s Elements. Courier Corporation; 1956.
Geng X, Liu T-Y, Qin T, Arnold A, Li H, Shum H-Y. Query dependent ranking using K-nearest neighbor. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Singapore). New York, NY: Association for Computing Machinery; 2008:115-122.
Sharma K, Rodriguez V, Walker D, et al. Breast cancer prediction with k-nearest neighbor algorithm using different distance measurements. 2018.
Khamis HS, Cheruiyot KW, Kimani S. Application of k-nearest neighbour classification in medical data mining. Int J Inform Commun Technol Res. 2014;4:121-128.
Kusmirek W, Szmurlo A, Wiewiorka M, Nowak R, Gambin T. Comparison of kNN and k-means optimization methods of reference set selection for improved CNV callers performance. BMC Bioinform. 2019;20:266.
Larose, D. T., & Larose, C. D. (2015). Data mining and predictive analytics. John Wiley & Sons.
Manne S, Kotha SK, Sameen Fatima S. Text categorization with k-nearest neighbor approach. In: Proceedings of the International Conference on Information Systems Design and Intelligent Applications 2012 (INDIA 2012), Visakhapatnam, India; Berlin, Germany; Heidelberg, Germany: Springer; 2012:413-420
Ehsani R, Drablos F. Robust distance measures for kNN classification of cancer data. 2020.
Roder J, Oliveira C, Net L, Tsypin M, Linstid B, Roder H. A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data. BMC Bioinform. 2019;20:325.
Silverman BW, Jones MC, Fix E, Hodges JL. An important contribution to nonparametric discriminant analysis and density estimation: commentary on Fix and Hodges (1951). Int Stat Rev. 1989;57:233-238.
Sørensen T. A method of establishing groups of equal amplitudes in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Kongelige Danske Videnskabernes Selskab, Biologiske Skrifter. 1948;5:1-34.
Szmidt E. Distances and Similarities in Intuitionistic Fuzzy Sets. Berlin, Germany: Springer; 2013.
Topsoe F. Some inequalities for information divergence and related measures of discrimination. IEEE Trans Inform Theory. 2000;46(4):1602-1609.
Xu S, Wu Y. An algorithm for remote sensing image classification based on artificial immune B-cell network. In: Jun C, Jie J, Cho K, eds. XXIst ISPRS Congress, Youth Forum, Vol. 37. Beijing, China: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; 2008:107-112.
License
Copyright (c) 2024 Dr.S.Bharathi, Krithika.L
This work is licensed under a Creative Commons Attribution 4.0 International License.