Apache Hadoop, the open source framework for building distributed applications, has just been updated to version 1.0.
There is only so much power you can put into a single system, so most popular web applications today require multiple servers to handle the workloads they receive.
How, then, are these large volumes of data stored, managed, processed, and queried? One answer is Apache Hadoop, a Java-based software platform for building highly scalable distributed applications and storage, which runs on readily available commodity hardware.
Apache Hadoop provides two main pieces of functionality: a distributed filesystem (HDFS) and a Map/Reduce computation system for processing data. HDFS stores data redundantly, distributing it in 64 MB chunks across a network of storage nodes, with each chunk kept in three different places by default. HDFS is aware of the location and proximity of each piece of data, so it can route requests for the best efficiency.
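As a rough illustration, here is a minimal sketch of writing a file to HDFS through Hadoop's Java FileSystem API; the path used is a hypothetical placeholder, and it assumes the configuration on the classpath points at a running HDFS namenode.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath,
        // which should point at a running HDFS namenode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // hypothetical path

        // Create a file; HDFS splits it into blocks (64 MB by default)
        // and replicates each block, three copies by default.
        FSDataOutputStream out = fs.create(file);
        out.writeUTF("Hello, HDFS!");
        out.close();

        // The replication factor can also be adjusted per file.
        fs.setReplication(file, (short) 3);

        fs.close();
    }
}
```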
The Map/Reduce model in Apache Hadoop can be used to compute over large amounts of data by breaking a task into small units of work that can be done in parallel.
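The classic illustration of this model is counting words: each mapper emits a (word, 1) pair for every word in its slice of the input, and the reducers sum the counts for each word. Below is a minimal sketch of that job against Hadoop's Java MapReduce API; the class name and input/output paths are placeholders, and minor details may differ between releases.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: each mapper processes one chunk of input and emits (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: all counts for the same word arrive together and are summed.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```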
If you are interested in Apache Hadoop, you can visit the project’s wiki to find out more, see what’s new in the 1.0 release here, and download the project from here.
We’d like you to be a part of our community; join us on Facebook by clicking Like on our page at facebook.com/devworx.in.