HP Vertica General Manager Colin Mahony on the next generation of analytics platforms

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

 

This next edition of the HP Discover Performance Discussion Series welcomes Colin Mahony, General Manager at HP Vertica, on this first day of the inaugural HP Vertica Big Data Conference in Boston.

 

It's been well over two years since HP acquired Vertica, and the analytics platform has become a pillar of HP's recently announced HAVEn Initiative. Now Vertica is poised to advance beyond its MPP column store database origins into a next-generation, anywhere analytics platform. New Vertica benefits include ease in cloud deployments and appliance delivery, as well as new features coming later this year for improved speed, lower cost, and greater ease in data input and access.

 

Learn how Mahony is guiding the future of the HP Vertica Analytics Platform, and how users are finding new ways to leverage its unique speed and attributes. The interview is conducted by Dana Gardner, Principal Analyst at Interarbor Solutions. [Follow Colin on Twitter.] [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

 

Here are some excerpts:

Gardner: One of the things that strikes me about the market nowadays is that there seems to be a sense of tradeoffs going on when organizations are trying to pick their big data engine or platform. They have a set of value on one side, but it’s opposed by value on the other. They can’t have everything. One size does not fit all.

So how are you at Vertica able to help people deal with these tradeoffs that they're facing when it comes to a next-generation data platform?

Mahony: Vertica was founded on the premise that one size does not fit all. Using a single OLTP transactional database to do everything, including analytics, just doesn't make a lot of sense.

If you think about the areas where people have to make tradeoffs, usually it’s scale for performance or analytics functionality for performance. One of the things that I've spent a lot of time looking at, especially over the last couple of years, is some of the alternative platforms, not just for analytics, but for all of the different data needs.

You can take something like Hadoop as an example. Hadoop really is a distributed file system and has capabilities to run rudimentary analytics and transform processed data. But I think what people love about Hadoop is that it's really easy to load data into Hadoop. You don't have to define the schema or anything.

Instead of schema on write or load time, it’s schema on read time. People like that. They also like at least the perception that it is free, and the scalability of it. On the database side, what people love about the database is that you're going to get really good performance, because the data is structured. If you're using a next-gen MPP platform like Vertica, you'll get the performance and the scalability.
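To make the schema-on-write versus schema-on-read distinction concrete, here is a minimal Python sketch. It is an illustration only, not how Hadoop or Vertica is implemented; the sample records, the `SCHEMA` mapping, and every function name are hypothetical. The database-style loader checks each record against the schema as it is loaded, while the file-store-style loader keeps the raw lines and applies the schema only when a query runs.

```python
import json

# Hypothetical raw event lines, as they might arrive from an application log.
RAW_LINES = [
    '{"user": "a", "clicks": "3"}',
    '{"user": "b", "clicks": "5", "referrer": "search"}',
    'not json at all',
]

# Hypothetical schema: field name -> type used to coerce the value.
SCHEMA = {"user": str, "clicks": int}

# Errors raised by malformed JSON, missing fields, or uncastable values.
BAD_RECORD = (ValueError, KeyError, TypeError)


def validate(record):
    """Coerce a parsed record to SCHEMA, dropping fields the schema omits."""
    return {name: cast(record[name]) for name, cast in SCHEMA.items()}


def load_schema_on_write(lines):
    """Database style: enforce the schema at load time, rejecting bad rows."""
    table, rejected = [], []
    for line in lines:
        try:
            table.append(validate(json.loads(line)))
        except BAD_RECORD:
            rejected.append(line)
    return table, rejected


def load_schema_on_read(lines):
    """File-store style: keep every line exactly as it arrived."""
    return list(lines)


def total_clicks_on_read(store):
    """Apply the schema only now, at query time, skipping unusable lines."""
    total = 0
    for line in store:
        try:
            total += validate(json.loads(line))["clicks"]
        except BAD_RECORD:
            continue
    return total
```

In this sketch the schema-on-read load cannot fail, which is the ease of loading described above, but every query pays the parsing cost again. The schema-on-write load does that work once and leaves structured rows behind, which is where the database-side performance comes from.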

Hadoop-like

We've been doing a lot of work in areas like making it easier to get the data into the platform, doing more with it, making it seem much more like a Hadoop-like environment. You can look at our past releases and see that there's been a lot of work done on that and we continue to make those investments.

One thing has been consistent at Vertica since the beginning. What we focus on is to make it really easy for people to get information onto the platform. Then, we make sure we continue to deliver new capabilities, performance, and functionality within the platform.

We make sure we’re enabling our customers and partners to deploy Vertica anywhere and everywhere, whether it’s cloud, appliances, software, or the like. Those are the three tenets of the company. It’s all around this notion of making data matter and helping people make better decisions that lead to better outcomes with superior information.

There's so much that can be done in this space, but I think the key for us is to focus on the things that we know we do really well. The good news is that it's such a large space with so many demands that we know we can make a huge impact without trying to take on the world. We know we can make a huge impact in what we’re doing.

I think you'll continue to see some interesting developments along the lines of what I'm describing, and it's very much in line with where we've been.

Gardner: Do more and more IT functions and business functions begin and end with big data? It seems to be at the center of so many things.

Exponential growth

Mahony: It is. To go back to the founding of Vertica [in 2005], I remember when Mike Stonebraker was giving the early presentations on the need for it. He talked a lot about the exponential growth of data and how that was outpacing any laws like Moore’s law or other hardware laws. So much information was being created that there was no way that just using more parallelized hardware was going to be able to address the issue.

The state of the union back then was there was no such thing as "big data." But I think Mike, as a visionary, knew what was going to happen in the industry. And it has happened.

It wasn’t a long time ago, but I remember that I was trying to find our first sample dataset that was over a terabyte and we had a difficult time finding it. When we would talk to the early customers, they looked at us like we were crazy when we were asking about a terabyte.

We have an easy time now finding terabytes of data. The state of the union today is that what's driving so much around big data is that you have obviously the volume, variety, and velocity that we talk about often, but what's really driving those three things is human information, whether it's social media, tweets, or expressive content that’s just so prevalent right now, as well as machine information.

If you look at the traditional structured database market by any number, it’s a small percentage of the amount of data that’s out there. The strength of Vertica, and really the strength of HP overall, is that we have the best assets for the unstructured human information in Autonomy, as well as the best assets when it comes to machine information and large data.

That has some structure. It’s semi-structured information, but it’s not your traditional transaction system. The power of all of that data comes together when you can have an engine that applies some structure to it and then is able to deliver the analytics that the organization needs. It's both IT as well as line of business, and even this new category we often talk about, which is the data scientist.

One of the great things about this show here is that we’ve got Billy Beane of Moneyball fame as our keynote speaker. The reason that we wanted Billy to come speak here is that Moneyball is exactly what’s happening right now in the world when it comes to big data.

You have the data scientist or the statistician, you have the line of business folks, and you have IT. They all have a part to play in the success of how information is used in companies. By bringing them together and by making the software that much easier for them to come together and solve these problems, you can create very real and differentiated value within an organization.

So Moneyball is exactly what’s happening, certainly in corporate America, but also in government and in many other institutions that want to leverage information to be more efficient and create a competitive advantage.

Gardner: Colin, what about the notion of big data as an agent for business transformation? We've been hearing about this for 30 years. It's been a big part of the academic work in business schools. Process re-engineering has evolved into balanced scorecards. Getting more detailed information in real time about the customers and the marketplace probably has as much or more of an opportunity to transform businesses than just about anything else that's happened over the past 20 years.

More than technology

Mahony: It's an enormous opportunity for business transformation, and definitely the whole is greater than the sum of the parts. What makes companies really successful with information is not trying to boil the ocean, not trying to do a traditional enterprise data warehouse project that's going to take 24 months, if you're lucky, 36 most likely.

They’ll end up with some monolithic inflexible platform that will probably be outdated by the time it gets deployed. What is making a lot of companies successful is they find a particular use, they find a problem area that they want to drill down on, and they mobilize to do it.

For that, they need a solution that is quickly deployed, but also has that capability to become something much larger. Whether it's Vertica, Talend, or any of the other portfolios that we offer, we strive to make sure that somebody can get up and running quickly, whether it's Autonomy and human information analytics, Vertica and machine data or other types of transactional structured data.

The most important thing is that you find that business case, you focus on it, and you prove it very quickly. There's something we refer to as “Time to Terabyte,” which is less than a month, typically for Vertica. You get a return on investment (ROI) in less than a month for the investments that you made. If you prove that out, then everybody in the organization is happy: the line of business, the technology folks in IT, even the statisticians and data scientists.

From there, you start expanding the project, and that's exactly how we win most of our customers. We very rarely go in and say, "Buy an enterprise license for our product across the company." We certainly do those, but more typically we get into a business unit, we find the acute pain, and we solve that problem.

What they're betting on is the ability for us to expand and for them to expand in this platform. That's why we are, on the one hand, all about the platform and the integration, but on the other hand, not about to lose the flexibility and the modularity of what we do, because that's also a huge differentiator for HP's portfolio.

I think that this is a wonderful time in the world of business transformation, and I think, unlike what has been talked about for the last 30 years, you now have the data that can back it up and prove it in real-time to the organization.

That's the big difference. You gave the balanced scorecard as an example. If you look at the balanced scorecard methodology, you can take that methodology and drill down into a thousand fields of detail and be able to get that information in real time. That's the opportunity here, and that's, I think, why this market is so huge.

It's not just about faster speeds and feeds. It's about fundamentally stepping back and asking how we're running this business. What assets, especially information assets, do we have that could dramatically boost productivity to the same extent that computers, when they were first introduced, boosted productivity? That's the goal that everybody is looking for when it comes to information.

Gardner: Tell our listeners and readers a bit more about yourself and your background.

Mahony: I've been with Vertica since the beginning. In fact, long before Vertica, my background has always been databases. I've always loved computer science, and had a minor in computer science in my undergraduate degree. In my first job out of school, I was taking databases and working with civilian US Government clients, and getting a lot of information published up to the web in the earliest days of the web.

I had a couple of other roles, but they were always very technology focused. Then I got my MBA on the business side and went into venture capital for seven years. That's where I met Mike Stonebraker, the founder of Vertica.

I just loved the idea, everything I knew about databases and the challenges of traditional database and everything I knew about the new world order of information -- at the time we didn’t even talk about the term big data -- it just seemed to align really well.

So I decided to leave the dark side of venture capital and I jumped into something that I have been incredibly passionate about. If you look at that lifecycle, even my own background with Vertica and where we’ve come, it’s just been great. The timing was great, and as always it takes a lot more than just great technology and great people.

Gardner: It's been well over two years since HP acquired Vertica and, as we begin the inaugural 2013 Big Data Conference, how would you best characterize how Vertica has evolved since its founding back in 2005?

Mahony: Yes, this is our first user conference. It’s ironic that we've never had one before, but I think also this is a testament to the scale that HP can bring. We have wanted a user conference since the beginning. Obviously, it takes some critical mass to get there, which we now have, but it also takes the support of an organization that knows how to do these conferences and understands the value of them.

And we’ve evolved quite a bit. It’s been a busy couple of years here, certainly post the HP acquisition. But I think at a high level, we’ve really shifted and expanded from being an MPP column store, very narrowly-focused database company, really into an analytic platform company.

With that comes several developments, obviously on the product side, but also as an organization, going through that maturation in terms of being able to operate at a global scale across the spectrum of what you would expect an analytics provider to offer.

Gardner: And how do you characterize the difference between a store and a platform? Are there many ecosystem players or is this an organic evolution of your capabilities or both?

Mahony: It’s both, the ecosystem and the tools that you interact with. And of course, we support a very rich and vibrant ecosystem of business intelligence (BI) tools, extract, transform and load (ETL) tools, and other types of management tools. It's not just the ecosystem around it, but also looking within our own products.

So it's adding a lot of the capabilities like backup and recovery, additional analytics capabilities beyond just standard SQL with the SDKs that Vertica supports, the ability to run both the procedural and the other types of code within the product, being able to express things like MapReduce beyond what a traditional database system would do.

Since the founding of the company, we've tried to take the best part of the database world and the best parts of the SQL world, but address the most challenging issues that traditional databases have had. So whether it is scalability or it’s being able to run things beyond SQL or it’s just the performance, those are all the things that we have taken into account while we built Vertica, and I think we have always been on the fast track to a platform.

We knew it would be a journey and we knew that building a product and a platform from the bottom up is not an easy thing, but we also knew that once we got there, once we sort of crossed that chasm, if you will, then all those decisions that we made in the beginning about this product and building an engine from the bottom up would pay off.

Platform modularity

For probably the last year, that's where we’ve been. Right now, we're seeing that it’s easy to add functionality to the platform because of the modularity of the platform, and we can add that functionality without giving up any of the performance.

For me, it’s probably the most exciting time. Being part of HP offers us so many things that make it a lot easier to become a platform, not only on the development side, but a much greater ecosystem, a global scale, being able to support customers globally 24/7.

Gardner: It’s only been a few months since the HP Discover 2013 Conference in Las Vegas where the HAVEn Initiative was announced. This puts Vertica in a very prominent place among other HP properties, technologies, platforms and approaches to solving this big data issue. Recap for us, if you would, what HAVEn is and why Vertica formed such an important pillar for this larger HP initiative?

Big-data lake

Mahony: What companies are looking for is this notion of the big-data lake. To me, it can mean many different things, but at the end of the day, companies want to take all the information assets that they have and they want to put them into a safe place, but a place where that information can be accessed and used by many different constituencies, whether it's IT, line of business, or data scientists.

So the notion of having a safe place, a harbor, or a port is what we announced as HP HAVEn, which is HP’s big data platform. It is primarily for analytics, but it can be used for just about anything when it comes to information and data.

What's so important about information right now is that there are different constituencies in the companies that want to take the information. First of all, they want to capture all the information, not just structured, not just unstructured, but 100 percent of their information.

They want to get it to a place where they can leverage it and use it for a lot of different use cases, but the first part is to get that information into the right place. For us, that is one of three components of HAVEn, which is the connectors.

We have over 700 connectors as part of HAVEn coming from Autonomy, coming from our Enterprise Security Group, the ArcSight core Logger and those connectors. That can be human information, extreme log information, or traditional database structured information.

At Discover, we announced some of our own internal applications, which are powered by the HAVEn platforms. We announced our HP Analytics offering, which is built using Hadoop, Vertica, Enterprise Security, and Autonomy assets.

So it's an incredibly exciting time, and we're looking forward to having many more of these user conferences and are certainly going to enjoy the rest of the show this week.

 

About the Author
Dana Gardner is president and principal analyst at Interarbor Solutions, an enterprise IT analysis, market research, and consulting firm. Ga...