Data Pre-processing Issues in Medical Data Classification
Abstract
The digitalization of data and the rise of the World Wide Web have made access to information easy and affordable. The Web and the Internet in particular have boosted research by facilitating access to large, publicly available medical datasets under open access schemes. These developments have led to an explosion of data characterized by its volume, variety and velocity, commonly referred to as big data. The availability of such medical big data has catalyzed research in medical predictive analytics. However, the true value of such data can be realized only after careful processing and analysis, before any inferences are drawn from it. Publicly available medical datasets contain noise in the form of missing values, outliers and data inconsistencies, which may adversely affect results. Pre-processing is therefore essential to eliminate noisy elements and refine the data so that it is suitable for further analysis. This paper highlights the need for data pre-processing and explains the data pre-processing pipeline and its constituent stages. It also presents a comparative analysis of various data pre-processing techniques for handling missing values and outliers in a dataset.
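To make the two problems named in the abstract concrete, the following is a minimal sketch of one common way to handle each: median imputation for missing values and the interquartile-range (IQR) rule for outliers. It is an illustration under stated assumptions, not the method evaluated in the paper; the function names, the blood pressure readings and the multiplier k = 1.5 are all hypothetical choices made for this example.

```python
from statistics import median, quantiles

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outlier_mask(values, k=1.5):
    """Return True for each value outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

# Hypothetical systolic blood pressure readings (made-up data);
# None marks a missing value and 300 is an implausible entry.
readings = [120, 118, None, 125, 122, 300, 119, None, 121]

filled = impute_median(readings)   # step 1: fill missing values
mask = iqr_outlier_mask(filled)    # step 2: flag outliers
cleaned = [v for v, is_outlier in zip(filled, mask) if not is_outlier]
```

The median is used for imputation here because, unlike the mean, it is not pulled toward extreme values that have yet to be removed. Whether flagged values should be dropped, capped or reviewed manually depends on the dataset and the clinical context.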
This work is licensed under a Creative Commons Attribution 4.0 International License.