Information Faster Blog

Can Vertica Climb a Tree?

big_basin_0939_mg_1143

 

The answer is YES if it is the right king of tree. Here “tree” refers to a common data structure that consists of parent-child hierarchical relationship such as an org chart. Traditionally this kind of hierarchical data structure can be modeled and stored in tables but is usually not simple to navigate and use in a relational database (RDBMS). Some other RDBMS (e.g.. Oracle) has a built-in CONNECT_BY function that can be used to find the level of a given node and navigate the tree. However if you take a close look at its syntax, you will realize that it is quite complicated and not at all easy to understand or use.

 

For a complex hierarchical tree with 10+ levels and large number of nodes, any meaningful business questions that require joins to the fact tables, aggregate and filter on multiple levels will result in SQL statements that look extremely unwieldy and can perform poorly. The reason is that such kind of procedural logic may internally scan the same tree multiple times, wasting precious machine resources. Also this kind of approach flies in the face of some basic SQL principles, simple, intuitive and declarative. Another major issue is the integration with third-party BI reporting tools which may often not recognize vendor-specific variants such as CONNECT_BY.

 

Other implementations include ANSI SQL’s recursive SQL syntax using WITH and UNION ALL, special graph based algorithms and enumerated path technique. These solutions tend to follow an algorithmic approach and as such, they can be long on theory but short on practical applications. Since SQL derives its tremendous power and popularity from its declarative nature, specifying clearly WHAT you want to get out of a RDBMS but not HOW you can get it, a fair question to ask is: Is there a simple and intuitive approach to the modeling and navigating of such kind of hierarchical (recursive) data structures in a RDBMS? Thankfully the answer is yes.

 

In the following example, I will discuss a design that focuses on “flattening” out such kind of hierarchical parent-child relationship in a special way. The output is a wide sparsely populated table that has extra columns that will hold the node-ids at various levels on a tree and the number of these extra columns is dependent upon the depth of a tree. For simplicity, I will use one table with one hierarchy as an example. The same design principles can be applied to tables with multiple hierarchies embedded in them. The following is a detailed outline of how this can be done in a program/script:

 

  1. Capture the (parent, child) pairs in a table (table_source).
  2. Identify the root node by following specific business rules and store this info in a new temp_table_1. Example: parent_id=id.
  3. Next find the 1st level of nodes and store them in a temp_table_2. Join condition: temp_table_1.id=table_source.parent_id.
  4. Continue to go down the tree and at the end of each step (N), store data in temp_table_N. Join condition: temp_table_M.parent_id=temp_table_N.id, where M=N+1.
  5. Stop at a MAX level (Mevel) when there is no child for any node at this level (leaf nodes).
  6. Create a flattened table: table_flat by adding in total (Mlevel+1) columns named as LEVEL, LEVEL_1_ID,….LEVEL_Mlevel_ID.
  7. A SQL insert statement can be generated to join all these temp tables together to load into the final flat table: table_flat.
  8. When there are multiple hierarchies in one table, the above procedures can be repeated for each hierarchy to arrive at a flattened table in the end.
 

This design is general and is not specific to any particular RDBMS architecture, row or column or hybrid. However the physical implementation of this design naturally favors columnar databases such as Vertica. Why? The flattened table is usually wide with many extra columns and these extra columns tend to be sparsely populated and they can be very efficiently stored in compressed format in Vertica. Another advantage is that when a small set of these columns are included in the select clause of an SQL, because of Vertica’s columnar nature, the other columns (no matter how many there are) will not introduce any performance overhead. This is as close to “free lunch” as you can get in a RDBMS. Let’s consider the following simple hierarchical tree structure:

Vertica Tree diagram

There are four levels and the root node has an ID of 1. Each node is assumed to have one and only one parent (except for the root node) and each parent node may have zero to many child nodes. The above structure can be loaded into a table (hier_tab) having two columns: Parent_ID and Node_ID, which represent all the (parent, child) pairs in the above hierarchical tree:

CHart 1

 

It is possible to develop a script to “flatten” out this table by starting from the root node, going down the tree recursively one level at a time and stopping when there is no data left (i.e. reaching the max level or depth of the tree). The final

output is a new table (hier_tab_flat):

 

Chart 2

 

What’s so special above this “flattened” table? First, this table has the same key (Node_ID) as the original table; Second, this table has several extra columns named as LEVEL_N_ID and the number of these columns is equal to the max number of levels (4 in this case) plus one extra LEVEL column; Third, for each node in this table, there is a row that includes the ID’s of all of its parents up to the root (LEVEL=1) and itself. This represents a path starting from a node and going all the way up to the root level.The power of this new “flattened” table is that it has encoded all the hierarchical tree info in the original table. Questions such as finding a level of a node and all the nodes that are below a give node, etc. can be translated into relatively simple SQL statements by applying predicates to the proper columns.

 

Example 1:Find all the nodes that are at LEVEL=3.Select Node_ID From hier_tab_flat Where LEVEL=3;

Example 2:Find all the nodes that are below node= 88063633.

This requires two logical steps (which can be handled in a front-end application to generate the proper SQL).


Step 2.1. Find the LEVEL of node= 88063633 (which is 3).


Select LEVEL From hier_tab_flat Where Node_ID=88063633;


Step 2.2. Apply predicates to the column LEVE_3_ID:


Select Node_ID From hier_tab_flat Where LEVE_3_ID =88063633;

 

By invoking the script that flattens one hierarchy repeatedly, you can also flatten a table with multiple hierarchies using the same design. With this flattened table in your Vertica tool box, you can climb up and down any hierarchical tree using nothing but SQL.

 

 

Po Hong is a senior pre-sales engineer in HP Vertica’s Corporate Systems Engineering (CSE) group with a broad range of experience in various relational databases such as Vertica, Neoview, Teradata and Oracle

HP Vertica Tutorials You Asked, We Listened.

Over recent months, we’ve heard our community request short, instructional videos and tutorials to help them learn more about the rich and powerful features of the HP Vertica Analytics Platform. We've heard you loud and clear, and have put together some videos to help you maximise your potential with HP Vertica.

Gartner Magic Quadrant Released – HP Vertica Enters the Leader’s Quadrant

Gartner released the 2014 Magic Quadrant for Data Warehouse and Database Management Systems, and we are very proud to announce that the HP Vertica Analytics Platform has entered the Leaders Quadrant!

Labels: gartner MQ

Enhancing Big Data Analytics with the HP Vertica Marketplace

HP Vertica Marketplace.jpgAt the O’Reilly Strata Conference, we demonstrated the just-announced HP Vertica Marketplace, an online destination for developers, HP Vertica users, and technology partners to create and share innovative big data analytics solutions built for the HP Vertica Analytics Platform.

 

Keep reading to find out what this announcement means for you.

Labels: Big Data| Vertica

Our users validate the value of Vertica

HP Vertica Software rocks.pngWe asked the questions and you answered them. We recently asked TechValidate, a trusted authority for creating customer evidence content, to survey the HP Vertica customer base.

 

Keep reading to find out what the survey showed and how customers are seeing the benefits of the HP Vertica Analytics Platform.

Labels: Analytics| Big Data

Welcoming Facebook to the growing family of HP Vertica customers!

FacebookDiscover-1024x576.jpgThis week at HP Discover Barcelona, we were thrilled to welcome one of our newest Vertica customers – Tim Campos, CIO of Facebook, during George Kadifa’s keynote.

 

Facebook selected the HP Vertica Analytics Platform as one component of its big data infrastructure.  Vertica’s value to Facebook can be found in its ability to provide business insights with incredible speed and flexibility. HP Vertica supports Facebook’s business analysts and helps the company be more productive through dramatically reduced query time. It is also valuable for providing accurate forecasting and aiding data driven decisions.

 

Guest post by Chris Selland, VP Marketing HP Vertica

Labels: Big Data| Vertica

HP Vertica 7, uniting and simplifying the worlds of data exploration and analysis

crane-flex-table-05-300x168.jpgToday, we launched HP Vertica 7 “Crane,” our latest release of the HP Vertica Analytics Platform.  This release is a major milestone, as not only does it deliver exciting new platform enhancements, but it also marks HP Vertica’s entry into the data exploration space. In fact, we believe HP Vertica 7 will revolutionize many organizations’ approach to data exploration. With HP Vertica 7, we are providing the most open and comprehensive SQL-based solution to storing, exploring, and serving up big data deployments that span structured and, now, semi-structured data.

 

Keep reading to find out what HP Vertica 7 will mean for you.

 

Guest Post by

Luis Maldonado

Director of Product Management

HP Vertica

Big Data analytics: what’s old is new again

Big Data Analytics.jpgMany consider Big Data analytics to be a new paradigm. In reality, the analytics of massive amounts of data has been in practice for years, particularly in the financial services, communications and manufacturing industries. Interestingly, one of the early pioneers was UPS, which used analytics in the '50s to improve operations.

 

Find out how companies are looking to ideas of the past to gather information for the future.

 

Guest Post by Kevin McConnell, Analytics Solutions & SI Alliances Global Leader at HP Software, Analytic Industry Solutions at Vertica Systems and Jeff Healey, Director of Product Marketing, HP Vertica

CIOs vs CMOs: Who’s biggest on Big Data?

alec_b&w.jpg

Alec Wagner is an associate editor for the Discover Performance blog.

 

Much like BYOD and SaaS, Big Data is infiltrating enterprises by making an end-run around IT. IDC forecasts that by 2016, LOB executives will be directly involved in 80 percent of new IT investments. As vendors make inroads by selling to LOBs—and getting chief marketing officers (CMOs) to sign the contracts—Big Data initiatives are as likely to be born in marketing as they are in IT.

 

To find out how this new environment is changing the roles of IT and marketing/LOBs, HP Discover Performance is hosting a webcast (with an introduction from Vertica VP Chris Selland) that brings a CIO and CMO together for a frank discussion of Big Data. Who should have control? 

Labels: Big Data

How MZI HealthCare identifies big data patient productivity gems using HP Vertica

As part of ourpodcast series, Dana Gardner, president and principal analyst for Interarbor Solutions, recently conducted an interview with Greg Gootee, product manager at MZI HealthCare.   MZI HealthCare develops and provides sophisticated software solutions that are flexible, reliable, cost effective and help reduce the complexities of the healthcare industry.

 

Guest Post by Chuck Smith, Customer Marketing Manager, HP Vertica

Labels: Big Data| healthcare

HP Vertica in private cloud deployments-- how does it work?

Private_Cloud3.pngWhat is the role of Big Data in cloud deployments?  Is a private cloud deployment a viable option for a Big Data implementation? What are the impacts that Big Data will have on the deployment, maintenance and performance of the system?

 

Continue reading to find out how Big Data and cloud deployments work effectively together.

 

Guest post by John Margaglione, Vice President of Systems Engineering at HP Vertica

Doing everything with Big Data

chris-wegryzyn-300x172.jpgBig Data and the presidency. For the 2012 presidential election, big data played a key role in determing voter data and opinions. Chris Wegryzyn, director of Data Architecture for the Democratic National Committee presented on this topic at the HP Vertica Big Data Conference in August.

 

Keep reading to find out how Wegryzyn and the DNC were powered by HP Vertica and the power of Big Data 

Labels: data analytics

Harnessing All Of Your Data

by2020_thumb.jpg

Agility is a key factor in today’s Big Data landscape. Having a strategy in place to deal with the volume of data, while crucial, is only a piece of the puzzle. A Coleman Parkes Research study, commissioned by HP, revealed that barely 50 percent of survey respondents indicated that they use all sources of structured, semi-structured, and unstructured data to analyze and act. A 50 percent gap!  

Using Big Data Analytics for Stronger Customer Support

blog-img.jpgDeloitte University Press just released the key findings from the 2013 social business global executive study and research project, focused on the increasing emphasis businesses are placing on social media and why some companies are lagging behind. 

 

While social business has many facets, sentiment analysis and marketing intelligence is a growing trend in how we engage with customers. The report notes that “companies are connecting with dissatisfied customers before their complaints spread and providing support wherever customers gather online.”

Visualizing Big Data

HP-BigData_Infographic-solution_thumbnail.jpgIt’s no surprise that the staggering amount of information being generated by humans and machines is transforming businesses of all shapes and sizes, but sometimes visualizing the numbers can help us better understand the Big Data landscape and its many challenges. Check out this great infographic capturing the richness and complexity of Big Data by breaking it into 3 sections:

  • Big data challenges
  • Harnessing big data
  • Generating Return On Information (ROI)

Making more of your information management with OMi Management Pack for Vertica

OMi 1.pngDatabase management is huge concern for many organizations. The HP Vertica Analytics Platform has become the chosen solution when organizations are looking to get their information under control.

 

The HP Vertica Analytics Platform was designed for critical business applications and has built in fault tolerance and high availability. With this functionality, people might wonder if they still need a separate monitoring solution. Keep reading to find out if it is vital for your environment.

 

Guest post by Ian Bromehead, Sr. Product Marketing Manager

BSM Service and Operations Bridge

Data-driven decision making with the Vertica Analytics Platform

Vertica HP Discover video.pngPhysicians need access to a wealth of critical information from multiple systems in order to make life-saving decisions on a daily basis. Greg Gootee, Product Manager, MZI Healthcare, discusses how their new application, powered by the Vertica analytics platform, helps deliver better patient care through data-driven decision making.

 

Keep reading to find out how MZI Heathcare helps physicans make more accurate point-of-care decisions.

The results are in: HP Vertica Analytics Platform is a game changer

HP Vertica.jpgAre you curious about what the HP Vertica Analytics Platform can do for you, in your individualized environment? Now you have the opportunity to hear what other organizations have said about the platform.

 

The results are in from a  recent TechValidate survey as well as from a Forrester study. Keep reading to find out what these studies revealed.

 

Guest Post by Chuck Smith, Customer Marketing Manager, HP Vertica

Two new case studies – Yota and Kokubu

Problems with Big Data affects organizations of all sizes. Our HP Vertica Analytics Platform was designed to meet the needs of organizations--regardless of size.

 

Keep reading to find out how we helped create a competitive advantage for two organizations--Russia's Yota, and Japan's Kokubu.

 

Written by Chuck Smith, Customer Marketing Manager, HP Vertica

A new approach to Big Data analytics

Discover2013.pngWere you there? Did you hear the big news? With all of the announcements made final from HP Discover 2013, it’s safe to say that Big Data was a hot topic this year.

 

Colin Mahoney, Senior Vice President and General Manager of HP Vertica discusses all the big announcements we made at HP Discover. Keep reading to find out more.

 

Guest blog post by Dennis Ludvino,

Senior Digital Marketing, HP Vertica

Six degrees of Big Data

BigData.jpgWhen you think Big Data you most certainly don’t think about acclaimed actor Kevin Bacon (most famous for the game six degrees of Kevin Bacon), but maybe you should. Bacon introduced Meg Whitman's HP Discover 2013 keynote adress.

 

Keep reading to find out the big news about Big Data that Whitman announced.

 

Guest blog post by Dennis Ludvino,

Senior Digital Marketing, HP Vertica

Labels: Autonomy| Vertica

Vertica’s talking Big Data at HP Discover

HP vertica.jpgAre you struggling under the weight of Big Data? Do you want to power your data to work for you, instead of fighting to contain it?

 

If you have answered “yes” to either of these questions, I have a few sessions at HP Discover that will interest you. I am presenting at a few of them, and others are being let by my colleagues. Continue reading to find out what you will learn about the hottest topics in Big Data.

 

Guest post by Chris Selland, VP Marketing HP Vertica

How did Guess? and Kansys turn Big Data into a competitive differentiator?

Big Data has a big impact over multiple industries and HP Vertica is there to support them all. We have recently posted two new case studies and the enterprises couldn't be more different. One case study is from the fashion company Guess? and the other is from consulting and professional services firm Kansys.

 

Keep reading to find out how these companies used Big Data as a competitive differentiator instead of a debilitator.

 

 Written by Chuck Smith, Customer Marketing Manager, HP Vertica

Comparing pattern mining on a billion records with HP Vertica and Hadoop

Pattern mining can help analysts discover hidden structures in data. Pattern mining has many applications—from retail and marketing to security management.

 

How can you utilize pattern mining? Will it allow you to better understand your customers and anticipate their needs?

 

Continue reading to find out how to make the most of the power within pattern mining.

 

Guest post by by Kyungyong Lee, Indrajit Roy, and Vanish Talwar

Taking a Moonshot at Big Data analytics for everyone

HP's Moonshot is exciting news. From our point of view here at HP Vertica it offers customers the avility to rapidly deply, scale and manage specialized workloads with lower space and energy contrastraints.

 

Keep reading to find out what Moonshot means for you.

 

Guest post by Chris Selland, VP Marketing HP Vertica

Labels: HP Moonshot| Vertica

Big Data analytics without Big Data complexity

New analytics deployments can be complex, taking up to 18 months to implement and optimize. The complexity of maintaining and integrating these environments often results in missed deadlines, incomplete projects, increased costs, and lost opportunities. In fact, only 32 percent* of application deployments are rated as “’successful”’ by organizations.

 

Guest Post by Jeff Healey, Director of Product Marketing at Vertica Systems

The disruptive power of big data

Aside from the sheer quantity of digital data created every day—about 2.5 exabytes1 —there’s more to Big Data than volume. Big Data offers enterprise leaders the opportunity to dramatically change the way their organizations operate to gain competitive advantage and find new revenue opportunities. But realizing the value Big Data promises requires a new approach. Traditional data warehouses and business intelligence tools weren’t built for the scale of Big Data, and can’t provide insight quickly enough to be useful or even keep up.

 

Keep reading to find out what will happen if Big Data is let out of its cage...

 

Guest post by Chris Selland, VP Marketing HP Vertica

A Method to the March Madness?

ScreenShot001-1024x614.jpgThe NCAA 2013 Men’s Basketball March Madness Tournament officially tiped off on Thursday, March 21st.  For those of you unfamiliar with the tournament, 64 teams from colleges and universities across the United States compete for the championship, awarded to just one winner in early April. Buzzer-beating upsets are as common as fan face paint and schools from parts unknown, making it challenging to choose the winner in your office tournament bracket.

To give you a sense of the tournament’s popularity and appeal, according to USA Today “Last year’s championship game alone had about 20 million TV viewers. The overall tournament had 52 million visits across March Madness on Demand’s broadband and mobile platforms.”

 

Guest Post by Jeff Healey, Director of Product Marketing at Vertica Systems

No, You Do Not Need One Projection Per Query in Vertica

Projections are powerful.

 

Projections are the Vertica Analytic Databases’s only data structure. Every row from every table stored in Vertica is stored in a projection.

 

Continue reading to find out why there is no such thing as a query which “does not hit the projections.”

 

Guest post by Andrew Lamb, Vertica Senior Engineer

Distributed R for Big Data

Data scientists use sophisticated algorithms to obtain insights. However, what usually takes tens of lines of MATLAB or R code is now been rewritten in Hadoop like systems and applied at scale in the industry. Instead of rewriting algorithms in a new model, can we stretch the limits of R and reuse it for analyzing Big Data? We present our early experiences at HP Labs as we attempt to answer this question.

distributed-r-fig1.png

Search
About the Author(s)
  • This account is for guest bloggers. The blog post will identify the blogger.
  • For years I've been doing video and music production back and forth between Boston MA and New Orleans LA. Starting in 2010, I've began working with Vertica (now HP Vertica) in the marketing team, doing customer testimonials, product release videos, and website management. I'm fascinated by Big Data and the amazing things my badass team at HP Vertica has done and continues to do in the industry every day.
Follow Us
Twitter Stream


HP Blog

HP Software Solutions Blog

Labels
The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation