Architecture and software tools for Big Data management in computational environments
Abstract
In the digital age, the processing of large volumes of data (Big Data) has become a fundamental challenge for organizations across various sectors. It is projected that by 2026, the global volume of data will reach approximately 200 to 210 zettabytes, driven primarily by the growth of IoT devices, social networks, and corporate information systems. In this context, software infrastructure plays a crucial role in the efficient collection, analysis, and storage of massive datasets.
This article presents a critical review of the leading technologies and architectures used in the Big Data ecosystem, with particular emphasis on widely adopted tools such as Apache Hadoop, Apache Spark, and NoSQL databases (e.g., MongoDB, Cassandra, and HBase). The capabilities of these technologies are examined against the fundamental characteristics of Big Data (volume, velocity, and variety), and their approaches to distributed processing, in-memory computation, and response times are compared across different environments.
Furthermore, reference architectures such as Lambda and Kappa are analyzed, highlighting their contributions to real-time and batch processing. Finally, current industry challenges are addressed, including scalability, the integration of heterogeneous data sources, and security and privacy concerns. The article concludes with a discussion of emerging trends such as Artificial Intelligence, Machine Learning, Edge Computing, and Cloud infrastructures, which are redefining the possibilities of large-scale data analysis.
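As an illustration of the difference between the two reference architectures named above, the following minimal Python sketch models them on a toy event log. It is an assumption-laden simplification, not the implementation reviewed in the article: the function names (`batch_view`, `speed_view`, `lambda_query`, `kappa_view`), the event format, and the `cutoff` index are all hypothetical, and plain in-memory counters stand in for systems such as Hadoop or Spark.

```python
from collections import Counter

def batch_view(events):
    """Batch layer (Lambda): recompute totals from the stored master dataset."""
    return Counter(e["key"] for e in events)

def speed_view(events):
    """Speed layer (Lambda): count only events not yet absorbed by the batch layer."""
    return Counter(e["key"] for e in events)

def lambda_query(batch, speed, key):
    """Serving layer (Lambda): merge the batch result with the recent increments."""
    return batch[key] + speed[key]

def kappa_view(log):
    """Kappa: one streaming path; the view is rebuilt by replaying the whole log."""
    view = Counter()
    for e in log:
        view[e["key"]] += 1
    return view

# Hypothetical event log; `cutoff` marks how far the last batch run has progressed.
log = [{"key": "sensor-a"}, {"key": "sensor-b"}, {"key": "sensor-a"}, {"key": "sensor-a"}]
cutoff = 2

batch = batch_view(log[:cutoff])   # slow, complete recomputation
speed = speed_view(log[cutoff:])   # fast, covers only recent events

lambda_count = lambda_query(batch, speed, "sensor-a")
kappa_count = kappa_view(log)["sensor-a"]
print(lambda_count, kappa_count)
```

In this sketch both approaches yield the same count. The difference lies in the design: Lambda maintains two code paths (batch and speed) whose results are merged at query time, while Kappa keeps a single streaming path and relies on replaying the log when a view must be recomputed.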
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors retain the copyright of their works and grant the journal IDEAS the right of first publication. Articles are published under the Creative Commons Attribution–NonCommercial–NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), which allows reading, downloading, copying, distributing, and sharing the content for non-commercial purposes, provided that proper credit is given to the author(s) and the original publication in the journal, without making modifications or creating derivative works. The journal IDEAS does not charge fees for submission, processing, or publication of manuscripts and guarantees open access to its contents.