Benefits of Using a Column Store DBMS in SHR

This posting was written by Balasubramanya Ramananda of the SHR R&D team.

 

Service Health Reporter (SHR) uses Sybase IQ, a Column based Database Management System (DBMS), to implement its Data Warehouse. The choice of DBMS technology is motivated by the primary requirements of a Data Warehouse. A Data Warehouse should:

 

  • Provide fast and efficient querying
  • Aggregate and summarize data over large periods of time
  • Store large amounts of historical data
  • Support OLAP
  • Load sizable chunks of operational data periodically

Benefits

Column based DBMS store data as columns on the disk as opposed to traditional row based DBMS. Typical Data Warehouse queries involve selection of a relatively small subset of columns across a large number of rows. In such cases it helps for data to be stored as columns as the DBMS doesn’t have to do an expensive table search. In addition, column stores do not require external indexes as the data itself is an index. With this strategy every column is indexed by default at no additional cost. Traditional row based DBMS avoid expensive table searches by creating exhaustive indexes. These indexes allow for efficient and fast queries as long as the indexed columns are involved.  SHR supports ad-hoc report creation and queries cannot always be foreseen to create indexes. Hence default column indexes prove beneficial to SHR.

 

Aggregations and summarizations form a majority of the queries handled by SHR. With a columnar arrangement, aggregations are greatly accelerated. Column stores are capable of aggregating millions of rows of data in a few seconds. Most column stores execute aggregations at a rate that is orders of magnitude quicker than traditional row based systems. This greatly aids statistical and trend analysis as well. Traditional row based aggregations are constrained by external indexes and have to work across partitions. SHR has a limited number of pre-aggregated data – a large number of aggregations are done on demand.

 

Column based DBMS’s storage strategy places homogeneous data types in a contiguous manner. This uniformity of data lends itself to very good compression rates. In addition, storage of homogeneous data types together allows the system to select a compression algorithm based both on data type and value/range. Column store vendors often claim a database size equal to or less than the stored data. Several row based DBMS vendors have developed compression techniques to handle heterogeneous data types but arguably it would be difficult to match compression rates supported by column stores. This is particularly relevant to SHR as most of its data consist of performance metrics and these metrics are retained for several months without any partitioning.

 

Rarely can all queries be confined to a single table and in many cases, SHR queries data across tables using join statements. The extent of joins is primarily determined by the Data Warehouse schema - a high degree of normalization points to extensive joins across tables. Both row and column based DBMS require creation of external indexes like B-Tree to support joins. These indexes tend to be expensive to create and maintain and while they are an integral part of row based DBMS, column store based systems often avoid them.

 

One reason column stores discourage external indexes in favor of simpler schema is because column stores don’t partition data. This allows a faster data access but maintenance of an external index tends to be very expensive as the column’s cardinality grows. One strategy to avoid joins is to create very wide de-normalized tables with hundreds of columns! This also results in simplification of queries. SHR, however, does use external index to support Star and Snow-flaked schema without sacrificing other benefits. This is achieved by creating external indexes on dimensions which have cardinality in tens of thousands rather than on facts that run into millions. For instance, the number of hosts reported upon is a dimension and is in the range of a few thousands.

 

Some of the other benefits of column based DBMS include ease of maintenance; due to the nature of indexes created, there is no need to run index maintenance jobs periodically. As mentioned earlier, column stores do not need to create partitions to hold billions of rows of data. Deleting rows is easy enough and doesn’t require re-partition of data. This eases maintenance of the DBMS and reduces overall cost of ownership. SHR provides a set of tools with its Administration module that automate a few routine maintenance jobs required by Sybase IQ.

 

Drawbacks

Column based DBMS have limited transaction support. They often don’t support row-based locking and this leads to queuing of updates on the same table for different records by multiple clients. Hence concurrent update queries don’t always yield the best results. This situation may be further exacerbated if large external indexes need to be updated. Column stores appear to perform best with inserts than with frequent updates. On the other hand, batch loads of large amounts of data are well supported and don’t require any special handling.

 

SHR uses MySQL for all its non-Data Warehouse (OLTP) needs.

 

Conclusion

Column based DBMS prove to be an advantageous choice for data analytics, storage and aggregations. They work well in the Data Warehouse and BI space. However they cannot be treated as a one-stop Data Management solution. Traditional row based DBMS are still the preferred choice for OLTP and transaction needs.

Labels: Analytics
Leave a Comment

We encourage you to share your comments on this post. Comments are moderated and will be reviewed
and posted as promptly as possible during regular business hours

To ensure your comment is published, be sure to follow the Community Guidelines.

Be sure to enter a unique name. You can't reuse a name that's already in use.
Be sure to enter a unique email address. You can't reuse an email address that's already in use.
Type the characters you see in the picture above.Type the words you hear.
Search
Showing results for 
Search instead for 
Do you mean 
About the Author
Product Marketing Manager for HP Application Performance Management suite of software products. Before this role, I worked in the HP Storag...


Follow Us
The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation