Information Faster Blog

Distributed R for Big Data

Data scientists use sophisticated algorithms to obtain insights. However, what usually takes tens of lines of MATLAB or R code is now been rewritten in Hadoop like systems and applied at scale in the industry. Instead of rewriting algorithms in a new model, can we stretch the limits of R and reuse it for analyzing Big Data? We present our early experiences at HP Labs as we attempt to answer this question.

distributed-r-fig1.png

A Deeper Dive on Vertica & R

 

The R programming language is quickly gaining popularity among data scientists to perform statistical analyses. It is extensible and has a large community of users, many of whom contribute packages to extend its capabilities. However, it is single-threaded and limited by the amount of RAM on the machine it is running on, which makes it challenging to run R programs on big data. Continue reading to find out more about the impacts of the R programming language.

 

by Lakshmikant Shrinivas, Pratibha Rana, and Mark Warner

Search
Showing results for 
Search instead for 
Do you mean 
About the Author(s)
  • This account is for guest bloggers. The blog post will identify the blogger.
  • For years I've been doing video and music production back and forth between Boston MA and New Orleans LA. Starting in 2010, I've began working with Vertica (now HP Vertica) in the marketing team, doing customer testimonials, product release videos, and website management. I'm fascinated by Big Data and the amazing things my badass team at HP Vertica has done and continues to do in the industry every day.
HP Blog

HP Software Solutions Blog

Featured


Follow Us
Labels
The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.