Distance Measures Insights For Breast Cancer Analysis Using K-NN Algorithm

Authors

  • Dr.S.Bharathi
  • Krithika.L

DOI:

https://doi.org/10.53555/jaz.v45i3.4902

Keywords:

Breast Cancer, k-Nearest Neighbor, Performance Measures, Topsoe, Average (L_1,L_∞), Lorentzian

Abstract

Breast cancer is one of the most prevalent diseases among women and is treatable when detected early. This paper seeks to identify the most effective technique for predicting breast cancer. Mammograms can detect abnormal growths, but they are not always accurate in identifying breast cancer, and at present the disease cannot be confirmed without a biopsy; this article therefore offers an improved method of prediction that does not require one. The study applies the k-Nearest Neighbor (k-NN) technique, which is widely used in machine learning for regression and classification. The workflow involves several steps: importing the dataset, pre-processing the data, and selecting the features to be classified. The k-NN method then employs a variety of distance metrics to distinguish between benign and malignant tumours, and the resulting answers are contrasted with other outcomes to demonstrate the effectiveness of the proposed strategy. With the improved distance measures enabled by the k-NN algorithm, the study's findings advance our understanding of breast cancer prediction. The Topsoe, Lorentzian, and Average (L_1, L_∞) approaches produced the most reliable overall results; they are compared with established techniques such as the Euclidean, Clark, and Bray-Curtis distances.
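The workflow described above can be sketched in plain Python. The distance functions below follow the standard textbook definitions of the Lorentzian, Topsoe, and Average (L_1, L_∞) measures; the toy two-feature dataset, class labels, and choice of k are illustrative assumptions, not the paper's actual data or parameters.

```python
import math
from collections import Counter

def lorentzian(x, y):
    # Lorentzian distance: sum of ln(1 + |x_i - y_i|) over all features
    return sum(math.log(1 + abs(a - b)) for a, b in zip(x, y))

def topsoe(x, y):
    # Topsoe divergence (assumes non-negative feature values)
    d = 0.0
    for a, b in zip(x, y):
        s = a + b
        if a > 0:
            d += a * math.log(2 * a / s)
        if b > 0:
            d += b * math.log(2 * b / s)
    return d

def average_l1_linf(x, y):
    # Average (L_1, L_inf): mean of the city-block and Chebyshev distances
    diffs = [abs(a - b) for a, b in zip(x, y)]
    return (sum(diffs) + max(diffs)) / 2

def knn_predict(train_X, train_y, query, k=3, dist=lorentzian):
    # Rank training points by distance to the query, then take a
    # majority vote among the labels of the k nearest neighbours
    ranked = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy two-feature example (hypothetical values, not the study's dataset)
X = [(1.0, 2.0), (1.2, 1.9), (8.0, 9.0), (7.5, 8.8)]
y = ["benign", "benign", "malignant", "malignant"]
print(knn_predict(X, y, (1.1, 2.1), k=3, dist=average_l1_linf))  # → benign
```

Swapping the `dist` argument is all that is needed to repeat the classification under a different measure, which mirrors how the paper compares Topsoe, Lorentzian, and Average (L_1, L_∞) against Euclidean, Clark, and Bray-Curtis.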


Author Biographies

Dr.S.Bharathi

Department of Mathematics, Bharathiar University PG Extension and Research Center, Perundurai, Erode, Tamilnadu, India

Krithika.L

Research Scholar, Department of Mathematics, Bharathiar University PG Extension and Research Center, Perundurai, Erode, Tamilnadu, India

References

Abu Alfeilat HA, Hassanat ABA, Lasassmeh O, et al. Effects of distance measure choice on k-nearest neighbor classifier performance: a review. Big Data. 2019;7:221-248.

Aghdam, H. H., & Heravi, E. J. (2017). Guide to convolutional neural networks: a practical application to traffic-sign detection and classification. Springer.

Bajramovic F, Mattern F, Butko N, Denzler J. A Comparison of Nearest Neighbor Search Algorithms for Generic Object Recognition. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 4179. Berlin, Germany: Springer.

Bramer, M. (2013). Principles of data mining, second edition. London: Springer.

Clark PJ. An extension of the coefficient of divergence for use with multiple characters. Copeia. 1952;1952:61-64.

Euclid. (1956). The Thirteen Books of Euclid’s Elements. Courier Corporation.

Geng X, Liu T-Y, Qin T, Arnold A, Li H, Shum H-Y. Query dependent ranking using K-nearest neighbor. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Singapore). New York, NY: Association for Computing Machinery; 2008:115-122.

Sharma, K., Rodriguez, V., Walker, D., et al. (2018). Breast Cancer Prediction with K-Nearest Neighbor Algorithm using Different Distance Measurements.

Khamis HS, Cheruiyot KW, Kimani S. Application of k-nearest neighbour classification in medical data mining. Int J Inform Commun Technol Res. 2014;4:121-128.

Kusmirek W, Szmurlo A, Wiewiorka M, Nowak R, Gambin T. Comparison of kNN and k-means optimization methods of reference set selection for improved CNV callers performance. BMC Bioinform. 2019;20:266.

Larose, D. T., & Larose, C. D. (2015). Data mining and predictive analytics. John Wiley & Sons.

Manne S, Kotha SK, Sameen Fatima S. Text categorization with k-nearest neighbor approach. In: Proceedings of the International Conference on Information Systems Design and Intelligent Applications 2012 (INDIA 2012), Visakhapatnam, India; Berlin, Germany; Heidelberg, Germany: Springer; 2012:413-420.

Ehsani, R., & Drablos, F. (2020). Robust Distance Measures for k-NN Classification of Cancer Data.

Roder J, Oliveira C, Net L, Tsypin M, Linstid B, Roder H. A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data. BMC Bioinform. 2019;20:325.

Silverman BW, Jones MC, Fix E, Hodges JL. An important contribution to nonparametric discriminant analysis and density estimation: commentary on Fix and Hodges (1951). Int Stat Rev. 1989;57:233-238.

Sørensen T. A method of establishing groups of equal amplitudes in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Kongelige Danske Videnskabernes Selskab, Biologiske Skrifter. 1948;5:1-34.

Szmidt E. Distances and Similarities in Intuitionistic Fuzzy Sets. Berlin, Germany: Springer; 2013.

Topsoe, F. (2000). Some inequalities for information divergence and related measures of discrimination. IEEE Transactions on information theory, 46 (4), 1602–1609.

Xu S, Wu Y. An algorithm for remote sensing image classification based on artificial immune B-cell network. In: Jun C, Jie J, Cho K, eds. XXIst ISPRS Congress, Youth Forum, Vol. 37. Beijing, China: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; 2008:107-112.


Published

2024-03-24

Issue

Section

Articles
