Memory Management In Real-Time Mining Of Massive Complex Data Streams
DOI:
https://doi.org/10.53555/jaz.v44iS8.4355
Keywords:
data mining, data stream, data cleaning, cron, voltDB, multiple databases, IEP, FFM, cap
Abstract
Real-time mining and streaming of data have gained immense popularity in the data field, where practitioners need access to the latest data on a real-time basis. Real-Time-Mining attempts to develop a real-time framework to minimize adverse environmental impact and increase resource efficiency. Real-time analysis deals with data that changes at a high rate and must be processed and updated frequently and rapidly. Data mining is a multidisciplinary field that combines several domains, such as artificial intelligence (AI), statistics, machine learning, and database technology. Its key objective is to explain the past and predict the future. This is achieved by exploring and analyzing huge amounts of data from diverse datasets and sources, almost in real time. This process can be termed knowledge discovery. Data mining traditionally stores data in local data sets hosted on local computers connected to computer networks. In the real world, data has become large and almost unmanageable, arriving in several data streams. Extracting knowledge structures from continuous, rapid data records is called data stream mining. A data stream is an ordered sequence of instances. In many data stream mining applications, these instances can be read only once or a few times using the available computing and storage capabilities. When it comes to real-time distributed mining of complex data streams, ample research has already been conducted on decreasing computation cost, ensuring enhanced data privacy at the distributed sites, and optimal deployment of limited assets. The key characteristics of mining complex data streams are a huge volume of continuous, potentially infinite incoming data; fast-changing data that necessitates a fast real-time response; and data that is multidimensional in nature.
Because the data is complex, one of the challenges to be addressed is the unbounded memory requirement. The current paper analyses how effectively memory can be managed in real-time data streams.
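The single-pass, bounded-memory constraint described above can be illustrated with a small sketch. The example below uses reservoir sampling, a standard technique chosen here purely for illustration; it is not taken from the paper and is not claimed to be the authors' method. The function name `reservoir_sample`, the generator `sensor_readings`, and the parameters `k` and `seed` are all hypothetical. The sketch assumes each instance can be read only once and that memory must stay fixed regardless of stream length.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of at most k items from one pass over stream.

    Memory use is O(k) no matter how many items the stream yields.
    (Illustrative helper; the name and signature are assumptions.)
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def sensor_readings(n: int) -> Iterator[int]:
    """Hypothetical stand-in for a data stream: yields n readings, one at a time."""
    for t in range(n):
        yield t


# Sample 100 readings from a stream of 100,000 without storing the stream.
sample = reservoir_sample(sensor_readings(100_000), k=100, seed=42)
print(len(sample))
```

A sample of this kind trades exactness for a fixed memory footprint; other bounded-memory summaries, such as sliding windows or sketches, make different trade-offs and may suit a given workload better.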
Copyright (c) 2024 Kavitha N, Dr.Y. Kalpana, Dr.Kumar V
This work is licensed under a Creative Commons Attribution 4.0 International License.