Memory Management In Real-Time Mining Of Massive Complex Data Streams

Authors

  • Kavitha N
  • Dr. Y. Kalpana
  • Dr. Kumar V

DOI:

https://doi.org/10.53555/jaz.v44iS8.4355

Keywords:

data mining, data stream, data cleaning, cron, VoltDB, multiple databases, IEP, FFM, cap

Abstract

The terms “real-time mining” and “streaming of data” have gained immense popularity in the data field because they provide access to the fastest and latest data on a real-time basis. Real-Time Mining attempts to develop a real-time framework that minimizes adverse environmental impact and increases resource efficiency. Real-time analysis deals with data that change at a very high rate and therefore must be processed and updated frequently and rapidly. Data mining is a multidisciplinary field that combines several domains, such as artificial intelligence (AI), statistics, machine learning, and database technology. The key objective of data mining is to explain the past and predict the future. This is achieved by exploring and analyzing huge amounts of data from diverse datasets and sources, almost on a real-time basis, a process termed Knowledge Discovery. Data mining endeavors to store the data in local data sets hosted by local computers that are connected to computer networks. In the real world, data have become large and almost unmanageable, arriving as several data streams. The extraction of knowledge structures from continuous, rapidly arriving data records is called data stream mining. A data stream is an ordered sequence of instances; in many data stream mining applications, these instances can be read only once or a small number of times, given the computing and storage capacity available. In the area of real-time distributed mining of complex data streams, considerable research has already been conducted on decreasing computation cost, enhancing data privacy at the distributed sites, and deploying limited resources optimally. The key characteristics of mining complex data streams are a huge volume of continuous, potentially infinite incoming data; fast-changing data that necessitate a fast, real-time response; and multidimensional data.
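The abstract notes that stream instances can often be read only once or a few times. As a generic illustration of that constraint (not the authors' method, which the abstract does not describe), the sketch below computes a running mean and variance in a single pass with constant memory using Welford's online algorithm. The names `RunningStats` and `summarize` are hypothetical.

```python
import math
from typing import Iterable, Tuple


class RunningStats:
    """Single-pass mean and sample variance (Welford's online algorithm).

    Illustrative sketch only; not the method of the paper. Each stream
    instance is seen exactly once and then discarded, so memory use stays
    constant no matter how long the stream runs.
    """

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean            # deviation from the old mean
        self.mean += delta / self.count
        # multiply by the deviation from the *updated* mean; this keeps
        # the accumulation numerically stable
        self._m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # sample variance; undefined (NaN) for fewer than two observations
        return self._m2 / (self.count - 1) if self.count > 1 else math.nan


def summarize(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Consume the stream once and return (count, mean, sample variance)."""
    stats = RunningStats()
    for x in stream:  # one pass; no instance is stored
        stats.update(x)
    return stats.count, stats.mean, stats.variance()


if __name__ == "__main__":
    # a generator stands in for an unbounded sensor feed
    readings = (float(i % 10) for i in range(100_000))
    n, mean, var = summarize(readings)
    print(n, round(mean, 3), round(var, 3))
```

Because only three numbers are kept, the same loop works whether the input is a list, a file iterator, or a network feed that never ends.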
Because the data are complex, one of the challenges to be addressed is unbounded memory requirements. This paper analyses how effectively memory can be managed in real-time data streams.
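Unbounded memory requirements are named above as a central challenge. One standard way to bound memory is to keep a fixed-size summary instead of the raw stream. The sketch below uses reservoir sampling (Vitter's Algorithm R), swapped in here purely as an illustration; the abstract does not say whether the paper uses it. `reservoir_sample` and its parameters are hypothetical names.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown,
    possibly unbounded, length using O(k) memory (Algorithm R).

    Illustrative sketch only; not the method of the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # item i (0-based) enters with probability k / (i + 1),
            # evicting a uniformly chosen current member
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    sample = reservoir_sample(range(1_000_000), k=5, rng=random.Random(42))
    print(sample)
```

For example, `reservoir_sample(range(10_000_000), k=5, rng=random.Random(42))` holds only five items at any time, because `range` yields values lazily. The trade-off is that a uniform sample supports approximate answers (averages, quantiles) rather than exact ones.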


Author Biographies

Kavitha N

Department of Information Technology, Vels Institute of Science, Technology and Advanced Studies,
Chennai, Tamil Nadu, India

Dr. Y. Kalpana

Department of Information Technology, Vels Institute of Science, Technology and Advanced Studies,
Chennai, Tamil Nadu, India

Dr. Kumar V

Former Professor and Head, Agricultural Engineering Department, ACRI, Madurai, Tamil Nadu, India



Published

2023-12-20
