Why is Hadoop more popular than other parallel processing frameworks?
With the rise of social networking sites, traditional relational databases faced tough challenges because of their fixed schema design. These limitations gave birth to technologies like Hadoop. The challenges are:
- Massive amounts of incoming unstructured and semi-structured data, with considerably high storage and replication costs.
- CPU performance increasing much faster than spinning disk performance over the years.
- The need for horizontal scaling and parallel execution.
From its inception, Hadoop was intended to be a generalized parallel processing framework. As a result, Hadoop usually performs worse than specialized data processing systems. For instance, an application designed to process only graphical content will be much more efficient than a Hadoop cluster with similar resources. Hadoop also needs a lot of tuning. Even so, Hadoop gained popularity for the following reasons:
- Extremely low cost per byte:
Hadoop/HDFS uses commodity direct-attached storage and shares the cost of the network and computers that it runs on with the MapReduce layers of the Hadoop stack. Hadoop is open source software, so an organization can, if it chooses, use it with zero licensing and support costs. This cost advantage lets organizations store and process orders of magnitude more data per dollar than traditional SAN or NAS systems.
- High bandwidth to support MapReduce workloads:
HDFS can deliver data into the compute infrastructure at a very high rate, which is often a requirement of big data workloads. HDFS can easily exceed 2 gigabits per second per computer into the MapReduce layer, on a very low cost shared network.
- High data reliability and scaling:
Hadoop is designed from the ground up to store and deliver data efficiently while scaling out and tolerating constant component failures. It does this in software, again at low cost. Hadoop has been proven in thousands of different use cases and cluster sizes, from startups to Internet giants and governments, against a variety of production applications at full scale.
- One-size-fits-all approach:
Hadoop can perform almost any type of operation on structured or unstructured data. Whether the application is a transactional database, an analytical data warehouse, or a machine log analysis tool, it can be built on Hadoop. The developer mainly needs to write the map and reduce functions, while the framework handles the shuffle between them. Although the performance of such applications may not yet be enterprise ready, Hadoop shows great promise for improvement.
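To make the map, shuffle, and reduce steps mentioned above more concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API; the function names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical and chosen only for illustration. In a real cluster the map and reduce functions would run in parallel on many machines, and the grouping step would be done by the framework rather than by user code.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all values by key.

    Written out here only for illustration; in Hadoop the framework
    performs this grouping between the map and reduce phases.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    mapped = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    result = word_count(["Hadoop stores data", "Hadoop processes data"])
    print(result)
    # prints {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

The point of the sketch is the division of labor: the application logic lives only in the map and reduce functions, which is why very different workloads can be expressed in the same model.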
These advantages were enough for Hadoop to stand out among the alternatives. Hadoop is also under active development to improve its performance, so in the near future it should become a favorable option for enterprises.