4 Key Features of a Leading Big Data Hadoop Product Offering

Guest Post: Michele Nemschoff is Vice President of Corporate Marketing at MapR Technologies

 

On February 27, 2014, Forrester Research Inc. published The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014. MapR was among the select companies that Forrester invited to participate in this evaluation, which covered 32 criteria in three divisions. MapR was cited as a Leader, and our overall score of 4.25 out of 5 in the “Current Offering” division was the highest of the nine vendors included.

 

Our Current Offering:

The MapR product offering stands out from our competition for several reasons, one of which is that our Hadoop features are unmatched. MapR is the only distribution built from the ground up for business-critical production applications.

 

The MapR M7 Enterprise Database Edition is the most robust product in our Current Offering. Here’s a look at four M7 product features that set MapR apart from other Hadoop distributions:

 

1. Distributed Metadata

 

The default Hadoop architecture uses a single NameNode to store the metadata. This forces every metadata operation through one bottleneck and limits clusters to 50-200 million files. It also creates a single point of failure (SPOF): if the NameNode fails, the entire cluster is unusable.

 

Other distributions try to sidestep the problem by using a secondary NameNode. A secondary NameNode runs as a slave to the primary NameNode and only replicates data from it on a periodic basis. Anything the primary records between two replication runs is missing from the secondary, so those depending on a secondary NameNode cannot trust that its data is complete.
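To make the gap between periodic replication runs concrete, here is a minimal sketch in Python. It is not Hadoop code: `PrimaryMetadata`, `PeriodicCheckpoint`, and the "copy every N operations" rule are hypothetical stand-ins chosen only to show how many entries a periodic copy can be missing when the primary fails.

```python
class PrimaryMetadata:
    """Hypothetical stand-in for a primary metadata server."""

    def __init__(self):
        self.files = {}  # path -> size

    def create(self, path, size):
        self.files[path] = size


class PeriodicCheckpoint:
    """Hypothetical helper that copies the primary's metadata
    once every `interval` operations (an assumed schedule)."""

    def __init__(self, primary, interval):
        self.primary = primary
        self.interval = interval
        self.copy = {}
        self.ops_seen = 0

    def observe_op(self):
        self.ops_seen += 1
        if self.ops_seen % self.interval == 0:
            # Full copy of whatever the primary holds right now.
            self.copy = dict(self.primary.files)


def missing_after_failure(n_ops, interval):
    """Return the paths the copy lacks if the primary fails after n_ops."""
    primary = PrimaryMetadata()
    checkpoint = PeriodicCheckpoint(primary, interval)
    for i in range(n_ops):
        primary.create(f"/data/file-{i}", size=i)
        checkpoint.observe_op()
    return sorted(set(primary.files) - set(checkpoint.copy))


if __name__ == "__main__":
    lost = missing_after_failure(n_ops=25, interval=10)
    print(f"{len(lost)} entries missing from the periodic copy")
```

In this toy run the last copy happened at operation 20, so the five files created afterwards exist only on the failed primary.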

 

The only real solution to the NameNode problem is to remove it. With the no-NameNode architecture of the MapR Distribution, there are no practical limits on the number of files that can be stored. This foundational change in the Hadoop architecture distributes the metadata among several nodes, as illustrated below.

 

[Figure: MaapRPic.png, metadata distributed across several nodes]

Photo credit: Architectural Overview of the MapR Apache Hadoop Distribution by M.C. Srivas via SlideShare; Slide 58
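The general idea of spreading metadata across nodes can be sketched in a few lines of Python. This is an illustration only: the post does not describe how MapR assigns metadata to nodes, so the hash-based placement, the `owner_node` function, and the `DistributedMetadata` class below are assumptions made for the example.

```python
import hashlib


def owner_node(path, nodes):
    """Pick the node responsible for a path by hashing it.
    Hash-based placement is an assumption made for this sketch."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class DistributedMetadata:
    """Toy metadata store with one table per node and no central server."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.tables = {node: {} for node in self.nodes}

    def put(self, path, info):
        self.tables[owner_node(path, self.nodes)][path] = info

    def get(self, path):
        return self.tables[owner_node(path, self.nodes)].get(path)

    def entries_per_node(self):
        return {node: len(table) for node, table in self.tables.items()}


if __name__ == "__main__":
    store = DistributedMetadata(["node-a", "node-b", "node-c", "node-d"])
    for i in range(1000):
        store.put(f"/data/file-{i}", {"size": i})
    print(store.entries_per_node())
    print(store.get("/data/file-42"))
```

Because each lookup goes straight to the node that owns the path, no single server has to hold or serve every entry. A production design would also replicate each table so that losing one node does not lose its share of the metadata; that part is left out here.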

 

Beyond the dependability benefits, distributed metadata also delivers a remarkable database performance boost. On commodity hardware alone, you can gain 10-20 times the performance of distributions that use a centralized metadata structure.

 

This feature is an architectural improvement to Hadoop that MapR initiated in its infancy. The dependability and performance it adds put our offering out of reach of competing distributions.

 

2. Low Latency

 

Your Hadoop infrastructure needs to be fast. Equally important, it needs to stay that way. A dirty secret among many Hadoop distributions is the staggering volatility in their performance and latency. The MapR M7 disk strategy obviates the compactions and defragmentation that can degrade performance. As a result, MapR M7 achieves 5x better performance, with low 95th and 99th percentile latencies. The graph below compares the performance and latency of the MapR M7 Edition with those of other Hadoop distributions.

 

[Figure: Mapr2.png, latency of the MapR M7 Edition compared with other Hadoop distributions]

 

Notice that M7’s peak latency is much lower than that of the other distributions. The difference in volatility is even more striking. With M7, you can depend on consistently low latency.
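For readers less familiar with percentile latencies, the short Python sketch below shows why the 95th and 99th percentiles matter more than the average. It uses the nearest-rank method, and the two sample sets are invented numbers for illustration, not measurements of any Hadoop distribution.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Invented latencies in milliseconds, for illustration only.
steady = [5] * 100               # every request takes 5 ms
spiky = [2] * 95 + [200] * 5     # usually faster, occasionally stalls

if __name__ == "__main__":
    for name, samples in (("steady", steady), ("spiky", spiky)):
        print(
            name,
            "p50:", percentile(samples, 50),
            "p95:", percentile(samples, 95),
            "p99:", percentile(samples, 99),
        )
```

The spiky system looks better at the median, yet one request in twenty stalls for 200 ms. That tail is what percentile figures expose and what an average can hide.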

 

3. High Availability

 

High availability (HA) refers to the capability of a Hadoop system to continue functioning despite multiple system failures. For companies running mission-critical applications, HA is a necessity.

 

The best way to ensure that your distributed system is highly available is by using an architecture that distributes the metadata. The MapR architecture increases performance and removes the SPOF.

 

The MapR Distribution for Hadoop provides high availability with self-healing and support for multiple failures. This means that your Hadoop infrastructure will be accessible during system failures, system upgrades and data recoveries.

 

4. Snapshots

 

Other distributions use the HDFS snapshot system, which has several downsides compared to snapshots in the MapR Distribution for Hadoop:

 

True Point-In-Time

HDFS snapshots only capture data that is closed at the time the snapshot is taken. If you are using snapshots as an automated recovery system, you have no guarantee that the data is complete. With MapR, you can perform point-in-time recovery of all files and tables, whether they are open or not.

 

Supports All Applications

MapR Snapshots support all Hadoop applications by default.

 

No Data Duplication

MapR snapshots never duplicate your data; they share the same storage as your live information. This allows clients to capture a snapshot of a 1 petabyte cluster in just seconds.
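One common way to take a snapshot without duplicating data is to record references to existing blocks rather than copying them. The Python sketch below illustrates that general technique; the `Volume` class and its methods are hypothetical, and the post does not state that MapR implements snapshots this way.

```python
class Volume:
    """Toy volume whose snapshots record block references, not copies.
    A hypothetical illustration, not MapR's implementation."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes (storage shared by all views)
        self.live = {}       # path -> list of block ids
        self.snapshots = {}  # snapshot name -> {path: tuple of block ids}
        self._next_id = 0

    def write(self, path, chunks):
        """Store new blocks and point the live file at them. Older blocks
        stay in place, so snapshots that reference them remain readable."""
        ids = []
        for chunk in chunks:
            self.blocks[self._next_id] = chunk
            ids.append(self._next_id)
            self._next_id += 1
        self.live[path] = ids

    def snapshot(self, name):
        """Copy only the path-to-block-id table; no data blocks are copied."""
        self.snapshots[name] = {p: tuple(ids) for p, ids in self.live.items()}

    def read(self, path, snapshot=None):
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return b"".join(self.blocks[i] for i in table[path])


if __name__ == "__main__":
    vol = Volume()
    vol.write("/logs/app.log", [b"hello ", b"world"])
    vol.snapshot("before-change")
    vol.write("/logs/app.log", [b"changed"])
    print(vol.read("/logs/app.log"))                   # live view
    print(vol.read("/logs/app.log", "before-change"))  # point-in-time view
```

Taking the snapshot touches only the reference table, which is why its cost tracks the amount of metadata rather than the amount of data. The sketch never frees unreferenced blocks; a real system would need to reclaim them once no snapshot or live file points at them.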

 

As we look at these features that are exclusive to MapR, it seems obvious why our customers are continually excited about our product offering. We feel this was made apparent in our scores in the previously mentioned independent evaluation.

 

In our opinion, the results of that product evaluation are another testament to the caliber of our product offering. We continue to push ourselves to improve it, and we look forward to more recognition like this in the future.

Labels: Hadoop| HP cloud| MapR