
Nagwa Elmobark

Abstract

This paper systematically analyzes Apache Hadoop's technological evolution, tracing its transformation from a web crawling subsystem to a comprehensive enterprise computing platform. Beginning with its origins in Google's foundational papers on the Google File System (GFS) and MapReduce, we examine the critical architectural decisions and technical innovations that shaped Hadoop's development across its major releases. The study examines key technical milestones: the project's emergence from Nutch in 2006, Yahoo!'s production deployment in 2008, the stability-focused 1.0 release in 2011, the introduction of the groundbreaking YARN architecture in 2013, and the security-enhanced 3.0 release in 2017. Our analysis shows how each stage of development addressed a distinct distributed computing challenge while extending Hadoop's applicability beyond its web-crawling origins. We demonstrate how architectural changes in resource management, data storage efficiency, and processing flexibility transformed Hadoop from a specific MapReduce implementation into a flexible distributed computing framework capable of handling a wide range of enterprise workloads. The research provides valuable insights into the technical considerations that drive distributed system evolution and offers lessons for future large-scale computing platforms.


Keywords

Apache Hadoop, Distributed Computing, MapReduce, YARN Architecture, Big Data Processing, Enterprise Computing, Technical Evolution

Section
Articles
How to Cite
THE EVOLUTION OF APACHE HADOOP: A TECHNICAL JOURNEY FROM WEB CRAWLING TO ENTERPRISE COMPUTING. (2025). Journal of Science and Technology, 30(4). https://doi.org/10.20428/jst.v30i4.2615
