GreenButton Simplifies Hadoop

Guest Author: Christian Smith, Solution Architect, GreenButton

 

As the quantity and complexity of unstructured data increase, so does the need to process it.  Businesses are finding clever and innovative ways of turning this data into a source of revenue, not directly, but through a better understanding of their business, their customers and their customers’ habits.

 

Because of this growth, traditional data analysis techniques are often not up to the task, and as a result Hadoop has become a leader in the Big Data and analytics space.

 

While the Hadoop ecosystem offers many valuable tools like MapReduce, HBase, Hive, Pig and Oozie, setting these up and running your jobs can be a daunting task.  Running the Hadoop components yourself requires significant capital for hardware and operations, and raises questions like ‘how much hardware do I need?’  It would be nearly impossible to cater for every potential workload, and if you did you would probably end up with a lot of idle resources.  If you aim too low, you end up with contention over limited resources.  What businesses really need is a managed Hadoop service that scales to their needs, letting them use the resources required for a particular job.  This approach also avoids contention over shared resources.  Need to run more than one job?  No problem, just create a new cluster or scale up an existing one.  When you’re done, you can just delete the cluster.
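
To make the sizing trade-off concrete, here is a minimal sketch in Python.  The hourly demand figures and the two fixed cluster sizes are invented purely for illustration; they are not GreenButton measurements.

```python
# Hypothetical demand, in nodes needed per hour, over an eight-hour day.
hourly_demand = [2, 2, 4, 16, 16, 4, 2, 2]


def fixed_cluster_cost(demand, cluster_size):
    """Return (idle_node_hours, unmet_node_hours) for a fixed-size cluster."""
    idle = sum(max(cluster_size - d, 0) for d in demand)
    unmet = sum(max(d - cluster_size, 0) for d in demand)
    return idle, unmet


def on_demand_node_hours(demand):
    """Node-hours consumed when capacity follows demand exactly."""
    return sum(demand)


for size in (4, 16):
    idle, unmet = fixed_cluster_cost(hourly_demand, size)
    print(f"fixed size {size}: idle={idle} node-hours, unmet={unmet} node-hours")
print(f"on-demand: {on_demand_node_hours(hourly_demand)} node-hours used")
```

In this made-up day, sizing for the peak (16 nodes) leaves 80 of 128 node-hours idle, while sizing for the typical load (4 nodes) leaves 24 node-hours of work waiting.  Capacity that follows demand uses 48 node-hours with neither problem.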

 

Another hurdle is job submission and the handling of job dependencies and data.  Currently the story around Hadoop’s job submission and monitoring is a little rough (although improving).  Typically you would execute MapReduce jobs directly on a Hadoop node or submit a workflow via Oozie, but this requires you to ensure all job dependencies are available on each of the slave nodes, which can be a tedious and error-prone process.

 

Governance is another major problem.  How do I calculate the Hadoop resources used by my marketing department over the last month?  How many hours did Bob use?
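
The two questions above boil down to grouping usage records and summing node-hours.  The following is a rough sketch of that bookkeeping, not how Mission Control does it; the `UsageRecord` shape, the names and the figures are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical record shape; a real usage export may look different.
    user: str
    department: str
    nodes: int
    hours: float


def node_hours_by(records, key):
    """Sum node-hours grouped by the given attribute name."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.nodes * record.hours
    return dict(totals)


# Invented sample data for one month.
records = [
    UsageRecord("bob", "marketing", nodes=8, hours=2.5),
    UsageRecord("alice", "marketing", nodes=4, hours=1.0),
    UsageRecord("bob", "marketing", nodes=2, hours=3.0),
    UsageRecord("carol", "finance", nodes=16, hours=0.5),
]

print(node_hours_by(records, "department"))
print(node_hours_by(records, "user"))
```

The hard part in practice is not the arithmetic but collecting reliable records in the first place, which is difficult when jobs are launched by hand on shared nodes.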

 

This is where GreenButton comes in.  GreenButton has been simplifying HPC applications for years and has now extended this knowledge to the Hadoop stack, offering:

 

  • Easy and automated provisioning of on-demand or permanent clusters
  • Easy job submission, monitoring and gathering of outputs via Mission Control
  • Governance tools to identify and track resource usage
  • Tools to manage and monitor your cluster(s)
  • High performance data synchronization with CloudSync

Provisioning

 

The GreenButton solution simplifies Hadoop provisioning in the cloud of your choice, such as HP’s OpenStack, Azure or Amazon.

 

As a customer you can choose between on-demand and permanent clusters.  On-demand clusters are provisioned by GreenButton when a job is submitted, and removed when that job completes.  All job outputs are persisted in Swift storage and can be accessed at a later time.
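
The on-demand lifecycle described above (create on submit, persist outputs, always tear down) can be sketched as follows.  `FakeCloud` is an in-memory stand-in invented for this example; it is not the GreenButton API or a Swift client.

```python
from contextlib import contextmanager


class FakeCloud:
    """In-memory stand-in for a cloud provider; purely illustrative."""

    def __init__(self):
        self.live_clusters = set()
        self.stored_outputs = {}

    def create_cluster(self, name):
        self.live_clusters.add(name)

    def delete_cluster(self, name):
        self.live_clusters.discard(name)


@contextmanager
def on_demand_cluster(cloud, name):
    """Create a cluster for one job and always remove it afterwards."""
    cloud.create_cluster(name)
    try:
        yield name
    finally:
        # Runs even if the job raises, so no cluster is left behind.
        cloud.delete_cluster(name)


def run_job(cloud, job_id, work):
    with on_demand_cluster(cloud, f"cluster-{job_id}") as cluster:
        result = work(cluster)
        # Persist outputs outside the cluster so they outlive it.
        cloud.stored_outputs[job_id] = result
    return result


cloud = FakeCloud()
run_job(cloud, "wordcount-1", lambda cluster: f"output produced on {cluster}")
print(cloud.live_clusters)
print(cloud.stored_outputs)
```

After the job finishes no cluster remains, but the output is still available from the separate store.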

 

Permanent clusters can be provisioned and deleted via the GreenButton API, as required.  Jobs can then be submitted to a specific cluster, or to any available cluster.  A permanent cluster can also be dynamically scaled up and down to accommodate larger jobs as needed.

 

Job Submission & Monitoring

 

GreenButton has a mature RESTful API[1] that allows you to easily submit jobs and their required assets.  Job progress can be monitored and job outputs can be downloaded via the API or Mission Control.  Mission Control can be used to get an overview of the cluster CPU, memory and IO during the lifetime of a job and its tasks.
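
Monitoring a job through an API usually comes down to polling its status until it reaches a terminal state.  The sketch below shows that pattern only; it does not call the GreenButton API, the status names are assumed, and `get_status` is a hypothetical caller-supplied function that would wrap the real request.

```python
import time


def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or the timeout passes.

    get_status is a caller-supplied function (hypothetical here) that would
    wrap whatever API call reports the job's state.
    """
    terminal_states = {"Complete", "Failed", "Cancelled"}  # assumed names
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)


# Demo with canned statuses instead of a real service.
canned = iter(["Queued", "Running", "Running", "Complete"])
final = wait_for_job(lambda: next(canned), poll_seconds=0.0)
print(final)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without waiting in real time.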

 

greenbutton1b.jpg

 

Governance with Mission Control

 

GreenButton’s Mission Control provides governance controls to manage costs across cloud deployments with spending limits, and allows you to break down costs so that usage can be charged back to departments and projects in your organization.

 

greenbutton2b.png

 

greenbutton3b.png

 

Cluster Management & Monitoring

 

Each Hadoop cluster includes a dedicated Ambari[2] deployment for monitoring the cluster’s health and resources, starting or stopping services, and making changes to the Hadoop configuration.  Ambari includes a web UI that visualizes real-time metrics including memory, CPU, IO, JVM, Map & Reduce slots, and many more.

 

greenbutton4.jpg

 

The Hadoop User Experience - Hue

 

Each cluster is deployed with a dedicated instance of Hue[3], an easy-to-use web application for the Hadoop ecosystem including HDFS, MapReduce, Oozie, Pig, Hive, and HBase.  Hue provides a full-featured user interface that makes it easy to get started with Hadoop, including drag & drop creation of Oozie workflows and one-click submission.

 

greenbutton5b.jpg

 

The Hue website offers good tutorials covering the different components and latest Hue features.

http://gethue.tumblr.com/tagged/tutorial

 

Data Synchronization with CloudSync

 

One of the hurdles with running ‘Big Data’ workloads in the cloud is getting that Big Data to the cloud so that it can be processed.  GreenButton has developed its CloudSync product for just this problem, allowing customers to easily synchronize data between local storage and the cloud, or between cloud services.  It even supports ETL from custom data sources.  CloudSync makes use of the GridFTP protocol to facilitate large, parallel data transfers over UDP.

 

greenbutton6.jpg.png

 

You can read more about GreenButton’s CloudSync product here: http://www.greenbutton.com/blog/index.php/2013/08/19/greenbutton-has-gridftp-support-for-greenbutton...

 

More Information and Feedback

 

If you would like to try out these services or would like more information on them, feel free to contact us.  Additionally, if you have any feedback or suggestions, we’re always happy to hear them.

 

http://www.greenbutton.com/Contact

 

About GreenButton™ Limited

 

GreenButton™ is an award-winning global software company specializing in on-demand cloud computing. GreenButton delivers a turnkey solution for cloud-enablement, synchronizing data and bursting apps to the cloud, enabling enterprises and independent software vendors (ISVs) to move to the cloud and access cloud resources. GreenButton provides a multi-purpose cloud platform for development and delivery of software and services. GreenButton's Cloud Fabric empowers users across all industries, including digital media, engineering, oil and gas, financial and biotech, to leverage supercomputing power. With GreenButton's Mission Control dashboard, cloud-based applications across multiple private and public cloud platforms can be easily managed from one centralized and user-friendly interface with rich usage reporting and governance controls.

 

GreenButton is Microsoft Corp's 2011 Windows Azure ISV Partner of the Year, and the company's offices are located in New Zealand and the US. For more information, please visit www.greenbutton.com or follow GreenButton at https://twitter.com/GreenButton

 

GreenButton is a service mark and trademark of GreenButton Limited. All other product names, service marks, and trademarks mentioned herein are trademarks of their respective owners.

 

[1] Details of the GreenButton Job API are available here: http://developer.greenbutton.com/documentation/cloudfabric/management/jobs/

 

[2] Further details on Ambari can be found here: http://ambari.apache.org/

 

[3] Further information on Hue can be found here: http://gethue.com/

Tags: big data| Partner
About the Author
Stephen Spector is an HP Cloud Evangelist promoting the OpenStack-based clouds at HP for hybrid, public, and private clouds. He was previous...
The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.