Building a Big Data platform the Red Hat way

Designing a scalable big data platform is one of the key decisions organizations will face in the near future. The platform they choose should enable them to handle data at a scale and rate of growth never seen before. Big data is not just about running MapReduce applications. There are several other factors that enterprises need to consider, making this one of the most important decisions they will make this decade.

As big data crosses the chasm and enters mainstream enterprises, Red Hat offers a suite of products for the big data platform that allows you to address the full spectrum of big data business challenges. Big data deployments are dominated by Linux, and the dominant Linux underneath them is Red Hat Enterprise Linux. Red Hat Enterprise Virtualization is leading the way: its high performance and low-overhead paravirtualized I/O make it a good fit for I/O-intensive big data workloads. Together, Red Hat Enterprise Linux and Red Hat Enterprise Virtualization make a compelling foundation for an organization's big data environment.

Adding to this mix, Red Hat Storage gives enterprises the storage scalability they need to handle big data problems. Red Hat Storage reduces the distance between data silos and serves as a general-purpose data store. Enterprises can run MapReduce and other workloads that exploit data locality directly on Red Hat Storage clusters. Existing MapReduce jobs can run on these clusters without any modification.

On the compute side, Red Hat Grid, the leader in distributed computing, brings big compute to big data. Large enterprises often run multiple Hadoop clusters, which creates islands of data and reduces the return on IT infrastructure. These clusters need to be consolidated into a super cluster or into federated clusters, so that independent pools can use each other's resources. Enterprises also need a common interface layer for submitting, monitoring and reporting on MapReduce and other jobs. Running Hadoop on Red Hat Grid provides this capability. The name node and the data nodes that are part of a Hadoop instance can themselves be encapsulated as jobs. When Hadoop is run in this fashion, all the policies, lifecycle functionality, scalability and migration capability that the grid provides are available to the MapReduce jobs running in it. The grid engine has built-in intelligence to match jobs to appropriate resources in appropriate places. It shares resources fairly, subject to quotas, limits and priorities. It can dispatch jobs and data to resources, handle errors and failures, and report results. It can also store this information, not just for analytics but for other common needs such as metering and capacity planning.
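To make the idea of fair sharing subject to quotas, limits and priorities more concrete, here is a minimal sketch in Python. It is not Red Hat Grid's actual matchmaking logic; the `Pool` class, the `fair_share` function, and the pool names and numbers are all hypothetical, chosen only to illustrate one simple way such a policy could behave. Slots are handed out one at a time to whichever pool is furthest below its priority-weighted share, and a pool stops receiving slots once it reaches its demand or its quota.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    # Hypothetical description of one group of users or one cluster tenant.
    name: str
    demand: int    # slots the pool's queued jobs could use right now
    quota: int     # hard cap on the slots this pool may hold
    weight: float  # priority weight (must be > 0); larger means a bigger share


def fair_share(total_slots: int, pools: list[Pool]) -> dict[str, int]:
    """Distribute slots one by one using weighted fair sharing.

    Illustrative assumption only: a real grid scheduler also considers
    job requirements, machine attributes, preemption and usage history.
    """
    alloc = {p.name: 0 for p in pools}
    for _ in range(total_slots):
        # A pool is eligible while it is below both its demand and its quota.
        eligible = [p for p in pools if alloc[p.name] < min(p.demand, p.quota)]
        if not eligible:
            break  # remaining slots stay idle; nobody can use them
        # Pick the pool furthest below its weighted share (ties broken by name).
        neediest = min(eligible, key=lambda p: (alloc[p.name] / p.weight, p.name))
        alloc[neediest.name] += 1
    return alloc


# Hypothetical pools sharing one consolidated cluster.
pools = [
    Pool("analytics", demand=10, quota=6, weight=2.0),
    Pool("etl", demand=3, quota=10, weight=1.0),
    Pool("reporting", demand=8, quota=8, weight=1.0),
]

result = fair_share(12, pools)
# result == {"analytics": 6, "etl": 3, "reporting": 3}
```

In this example the higher-priority analytics pool receives twice the share of the others but is stopped at its quota of 6, while the etl pool is limited by its own demand of 3. With more slots than the pools can use, the surplus simply remains unallocated.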

Red Hat also has cloud offerings that complement the big data use case: OpenShift provides an open platform for building and deploying modern cloud applications. Infinispan and Hibernate are good examples of middleware technologies that are relevant in this space. Red Hat JBoss Data Grid, a natural fit for the in-memory data grid segment of the real-time big data market, enables companies to scale their applications without adding to their relational database sprawl.

In a nutshell, Red Hat has an array of offerings connected to the big data movement. Combined with the fact that big data is an open source dominated landscape, this makes Red Hat the de facto choice for a big data platform.

Learn more about the Red Hat portfolio of products for the big data platform at http://www.redhat.com

This is a sponsored blog post from Red Hat.

The Fifth Elephant

Building a Big Data platform the Red Hat way 25 July 2012