Data Pre-processing Issues in Medical Data Classification

Ashwini Tuppad; Shantala Devi Patil

doi:10.17762/jaz.v44iS6.2361

Keywords:

Data, Pre-processing, Missing data, Imputation, Outlier, Sampling

Abstract

With the digitalization of data and the rise of the World Wide Web, access to information has become easy and affordable. In particular, the Web and the Internet have boosted research activities by facilitating access to large, publicly available medical datasets under open-access schemes. These developments have produced explosive amounts of data that vary in volume, variety, and velocity, and are therefore referred to as big data. The availability of such medical big data has catalyzed research in medical predictive analytics. However, the true value of such data can be derived only after it has been carefully processed and analyzed, before any inferences are drawn from it. Publicly available medical datasets contain noise in the form of missing values, outliers, and data inconsistencies, which may negatively affect the results. Pre-processing such data is essential to eliminate noisy elements and refine the data so that it is suitable for further analysis and processing. This paper highlights the need for data pre-processing and explains the data pre-processing pipeline along with the stages that constitute it. It also presents a comparative analysis of various data pre-processing techniques for handling missing values and outliers in a dataset.
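As a generic illustration of two of the techniques named above, and not the specific methods compared in this paper, the Python sketch below fills missing values with the median of the observed values and flags outliers with the interquartile-range (IQR) rule. The function names `impute_median` and `iqr_outliers`, the fence factor `k = 1.5`, and the blood-pressure readings are hypothetical choices made only for this example.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values.

    Hypothetical helper for illustration only.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper; k = 1.5 is the conventional Tukey fence factor.
    """
    q1, _, q3 = quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical systolic blood-pressure readings with two missing values
# and one implausible entry (400).
readings = [120, 118, None, 125, 130, None, 122, 400]
filled = impute_median(readings)
print(filled)                # [120, 118, 123.5, 125, 130, 123.5, 122, 400]
print(iqr_outliers(filled))  # [7]
```

The median is used for imputation here because, unlike the mean, it is not pulled toward extreme values such as the 400 reading; other imputation and outlier-handling strategies may suit a given dataset better.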
