Information Faster Blog

How to optimize application data with a new approach to structured data management

By Phil Nguyen, HP Software Community Director


As the volume of structured data continues to explode, organizations need to find ways to manage data that systematically minimize their database footprint. A new podcast by HP Storage Guy Calvin Zito tells listeners about new software from HP Autonomy that makes it easier to manage the overall data lifecycle.

Save $100 when you register now for HP's Information Management sessions at Software Universe

By Patrick Eitenbichler

HP Software and Solutions’ Information Management suite will be featured at the upcoming HP Software Universe 2010 in Washington DC, June 15 – 18th, 2010.

The IM suite, including HP Data Protector, HP Email, Database and Medical Archiving IAP, and HP TRIM records management software, will be represented in two tracks:

  • Data Protection

  • Information Management for Governance and E-Discovery

Customer case studies and presentations from product experts will highlight how HP’s Information Management solutions provide outcomes that matter. For more information about this event, or to register, please use code INSIDER at and get $100 off the conference rate.

How three very different companies are managing rapid database growth

By Patrick Eitenbichler

Wanted to share three great customer success stories. The companies are very different from each other, but they’re all grappling with business challenges posed by surging data growth: meeting compliance obligations, controlling storage costs, and optimizing performance. The companies turned to HP Database Archiving software to solve these problems, and more.

Tektronix, a U.S.-based provider of test and measurement solutions to the electronics industry, improved application and database performance by more than 47%, and aced compliance tests in 29 countries, despite data growth of 1.25 GB per month.

Tong Yang Group, a Taiwanese automotive parts manufacturer, experienced data growth at a rate of 30-40 GB on average per month, impacting database performance and causing user-related issues. Tong Yang saw an immediate 10% increase in efficiency in handling orders, and they gained the ability to support 7% business growth in 2009 despite the economic recession.

Turkey’s Central Registry Agency is both a private financial services company and the country’s central depository for dematerialized securities. The agency’s database grew 1000 times in a one-year period, due in part to industry regulations requiring financial services firms to store more data for longer periods of time. With HP Database Archiving software, the agency met its growing data archiving needs while reducing storage costs by 50%.

To learn more about how these companies overcame their database growth challenges, click on their corresponding names above.

Structured Records Management - Taking control of the structured data

In my last post I spoke about how the transfer of structured data from the source system into the records management system works. Now that we have covered this step, let's look at some of the special features that you want in order to manage structured data as records.

Like any other record, you want to be able to preserve the authenticity, reliability, integrity and usability of the data.  The authenticity is maintained by the system storing an audit trail of the whole transfer process and any subsequent actions taken on the records. The reliability is based on the collaboration of application owners and records managers in the definition and classification of the structured records model, which means that the transferred data is based on a design by people who know all the facts about its source and usage. 

That leaves me to elaborate a bit more about the integrity and usability. 

The structured records get transferred into the records management environment as XML files. Each transfer batch is a self-contained group, consisting of a number of XML files that contain the data and a summary XML file that contains a detailed description of what the data files contain. To be able to use the data and the summary file in future, each of them is described by an XML schema definition. All of these files together form a single package and the records management rules are applied at the package level, meaning that the same security and retention rules apply to all files of a single transfer. The integrity of the individual files can be proved at any stage based on hash comparison technology between the summary and the data files.
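To make the hash comparison idea concrete, here is a minimal Python sketch. It is not the HP TRIM implementation: the summary layout, the file names and the choice of SHA-256 are all assumptions made for illustration. The summary file stores one digest per data file, and a later check recomputes the digests and compares them.

```python
import hashlib
import xml.etree.ElementTree as ET


def sha256_hex(data: bytes) -> str:
    """Hex digest used as the fingerprint of one file."""
    return hashlib.sha256(data).hexdigest()


def build_summary(data_files: dict[str, bytes]) -> bytes:
    """Create a summary XML listing each data file with its digest.

    The <summary>/<file> layout is hypothetical, not a product format.
    """
    root = ET.Element("summary")
    for name, content in sorted(data_files.items()):
        ET.SubElement(root, "file", name=name, sha256=sha256_hex(content))
    return ET.tostring(root, encoding="utf-8")


def verify_package(summary_xml: bytes, data_files: dict[str, bytes]) -> list[str]:
    """Return the names of files that are missing or no longer match."""
    failed = []
    for entry in ET.fromstring(summary_xml).findall("file"):
        name = entry.get("name")
        content = data_files.get(name)
        if content is None or sha256_hex(content) != entry.get("sha256"):
            failed.append(name)
    return failed


# A made-up transfer batch with two data files.
batch = {
    "orders_001.xml": b"<orders><order id='123' customer='XYZ'/></orders>",
    "orders_002.xml": b"<orders><order id='124' customer='ABC'/></orders>",
}
summary = build_summary(batch)
```

Calling verify_package(summary, batch) returns an empty list as long as nothing has changed; altering any data file makes its name show up in the result.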

Usability means that the structured data is not lost once it resides in the records management environment. Text indexing can be used to provide searching across the contents of the XML files to find batches that include data pertinent to a particular circumstance, e.g. all batches that contain customer number XYZ or order number 123. This is the kind of full text searching that people use across all machine readable formats as part of early searches in the e-discovery or freedom of information processes. However, structured records should also be available to other methods of searching, e.g. for reporting engines. Having the data in XML format with a full schema description allows us to use our Record Query Server to create an ODBC data source pointing to the XML files, which can then be used by a whole variety of SQL query tools - this is a distinct advantage that you get from storing structured records as XML data, rather than as flat text files or PDF formatted report output. If the original application still exists, and its algorithms are desirable in the analysis of the data, the records management system provides a re-load function to send the XML based data back to the original source database schema.
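To illustrate why XML with a known structure lends itself to SQL querying, here is a small stand-in written in Python. It does not use the Record Query Server or ODBC; instead it loads a made-up XML batch into an in-memory SQLite table, which is purely an assumption for demonstration, and then runs an ordinary SQL query over it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data file from a transfer batch.
XML_BATCH = """
<orders>
  <order id="123" customer="XYZ" total="250.00"/>
  <order id="124" customer="ABC" total="99.50"/>
  <order id="125" customer="XYZ" total="10.00"/>
</orders>
"""


def load_orders(xml_text: str) -> sqlite3.Connection:
    """Copy the <order> elements into an in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    rows = [
        (int(o.get("id")), o.get("customer"), float(o.get("total")))
        for o in ET.fromstring(xml_text).iter("order")
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


conn = load_orders(XML_BATCH)
# A reporting-style query that a flat text or PDF export could not answer directly.
totals = conn.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

The same query would be impractical against a PDF report, which is the point made above about keeping structured records in a structured format.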

In all our design of HP TRIM functionality we pay attention to the characteristics of records as prescribed by ISO 15489: authenticity, reliability, integrity and usability, and as you can see,  structured records management is no exception.  By adhering to this principle we are able to create a truly unified records management environment, encompassing all formats of information, physical, electronic, unstructured and structured, meaning that you can apply a single set of consistent records management policies across all your enterprise content.


Structured Records Management - Transferring the records

Once you have defined what the records that you extract from your structured business application look like and how they are classified in the target records management system, you are ready to start transferring them.

There are different things to consider depending on what type of transfer you are performing. If you are transferring the records as a one-off exercise, for example because you want to retire the application that contains them, you will want to perform a "move".  If you are transferring the records as part of an ongoing structured records management regime, it is possible that you want to "copy" them into the records management system and defer the deletion to a later point in time. This deferred deletion option allows you to collect records as soon as they conform with the selection criteria, but still keep a copy in the originating system for fast access. If you implement a deferred delete policy you want to make sure that the records management system can initiate the deferred delete based on predefined or rules-based dates, as well as implementing a feature that ensures that the data in the source system is never retained for any longer than the controlled records. When records are destroyed in accordance with their retention schedule, the system must also initiate the deferred deletion of the original data.
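The deferred delete rule can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the class and field names are hypothetical, not product settings): the source copy is due for deletion on its planned date, but never later than the destruction date of the controlled record.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TransferredBatch:
    batch_id: str
    planned_source_delete: Optional[date]  # predefined or rule-based date, if any
    record_destruction: date               # from the retention schedule


def source_delete_due(batch: TransferredBatch) -> date:
    """Date by which the copy in the source system must be gone.

    The source copy must never outlive the controlled record, so the
    record's destruction date acts as an upper bound.
    """
    if batch.planned_source_delete is None:
        return batch.record_destruction
    return min(batch.planned_source_delete, batch.record_destruction)


def batches_to_purge(batches: list[TransferredBatch], today: date) -> list[str]:
    """Batch ids whose source data should be deleted on or before today."""
    return [b.batch_id for b in batches if source_delete_due(b) <= today]
```

A scheduled job in the records management system could call batches_to_purge once a day and trigger the deletion in the source application for each id returned.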

Whatever approach the transfer process takes, you want to make sure that you have an audit trail that covers the selection and extraction from the source application and the creation of the records in the target records management system. Ideally the whole process uses digital signatures and hash algorithms to ensure the integrity of the transfer end to end. This allows you to provide an unbroken chain of custody for your structured records.
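One way to picture an unbroken chain of custody is a tamper-evident audit trail in which every entry is bound to the one before it. The Python sketch below is an assumption-laden illustration, not how any HP product does it: it uses an HMAC (a keyed hash) as a simple stand-in for a real digital signature, and the key and event names are made up.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; a real system would use managed keys or certificates.
SECRET_KEY = b"demo-key-not-for-production"


def _mac(action: str, detail: dict, prev_mac: str) -> str:
    """Keyed hash over this event and the MAC of the previous event."""
    payload = json.dumps(
        {"action": action, "detail": detail, "prev": prev_mac}, sort_keys=True
    )
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append_event(trail: list, action: str, detail: dict) -> None:
    """Add an audit entry chained to the previous one."""
    prev_mac = trail[-1]["mac"] if trail else ""
    trail.append(
        {"action": action, "detail": detail, "prev": prev_mac,
         "mac": _mac(action, detail, prev_mac)}
    )


def trail_is_intact(trail: list) -> bool:
    """Recompute every MAC; any edited, removed or reordered entry breaks the chain."""
    prev_mac = ""
    for entry in trail:
        expected = _mac(entry["action"], entry["detail"], prev_mac)
        if entry["prev"] != prev_mac or not hmac.compare_digest(entry["mac"], expected):
            return False
        prev_mac = entry["mac"]
    return True
```

Logging the selection, extraction and ingestion steps with append_event gives a trail where a later change to any earlier entry is detected by trail_is_intact.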

Once your transfer is complete and the structured records are stored and classified in your records management system, they will be managed in accordance with your corporate records management policy and in context with all the unstructured records that you capture from other systems or users' desktops. I will tell you about some additional features that can be useful particularly for structured records management in a later post...

Structured Records Management - Classifying the records

Today I want to write about the second step in the process of structured records management, the classification of structured records.

Structured business applications are built to perform consistent tasks as part of well defined day-to-day business processes. This, combined with the predictable nature of the structured data that they use, makes it possible to automate the classification of the structured records we extract from them.

In my last post I talked about how the database administrator and the records manager work together to define the model and extraction rules for the creation of structured records. The classification step falls into the same design time activities. It needs to be done once, and after that will be applied to every structured records transfer of the same type automatically.

When I talk about classification, I talk about assigning a variety of metadata values, which will be created as descriptive and management metadata in the target records management system to enable retention management, security and access management, as well as high accuracy searching by metadata tags. The classification also allows you to bring structured records into context with unstructured records.

Our structured records management solution allows you to access the records management business classification directly from within the design environment, so that you can browse it and navigate to select the correct branch for your structured records definition. Other metadata fields are imported into the designer from a central configuration area, where they are mapped to the fields in the records management repository. For any metadata field you can decide whether you want to define a "static" value or whether you want to derive the value from the structured record itself at run-time. You can either read data values into the field or generate values based on a combination of data and rules. For example, the base date for the record retention could be read from the structured data directly, or it could be calculated based on a rule, such as a country code in support of country-specific retention rules.
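Here is a rough Python sketch of what such a metadata profile could look like. Everything in it is hypothetical (field names, the classification path, the retention periods per country); it only shows the three kinds of value described above: static, read from the record, and calculated by a rule.

```python
from datetime import date

# Hypothetical country-specific retention periods, in years.
RETENTION_YEARS = {"DE": 10, "US": 7}
DEFAULT_RETENTION_YEARS = 6


def add_years(start: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


# Plain values are static; callables are evaluated per record at run-time.
METADATA_PROFILE = {
    "classification": "Finance/Orders/Fulfilled",                # static value
    "title": lambda rec: f"Order {rec['order_id']}",             # read from the data
    "retention_base_date": lambda rec: rec["fulfilled_on"],      # read from the data
    "destroy_after": lambda rec: add_years(                      # rule based on country
        rec["fulfilled_on"],
        RETENTION_YEARS.get(rec["country"], DEFAULT_RETENTION_YEARS),
    ),
}


def build_metadata(record: dict) -> dict:
    """Apply the profile to one structured record."""
    return {
        field: rule(record) if callable(rule) else rule
        for field, rule in METADATA_PROFILE.items()
    }
```

Because the profile is defined once at design time, every transfer of the same type gets its metadata without anyone filling in a form.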

All of this flexibility allows you to create a pre-defined metadata profile that allows you to transfer structured records in a completely automated fashion and still get accurate, dynamically created metadata describing them.

I look forward to meeting you again in my next post, when I will talk about the extraction of records from the source system and their ingestion into the records management environment.

Structured Records Management - Defining what constitutes a record

After last week's overview of the structured records management process, I want to start giving you some details of what each step in the process involves, beginning with the definition step.

Records stored in relational databases, as the name suggests, are made up from individual data items stored in multiple tables, linked to each other using relational links.  This means that not every record is a neat package that is redundantly stored with clear boundaries.  Some data items are shared between many records, others are uniquely stored for each record.

If you want to extract and store the data as long term records without reliance on the source application to maintain their usability, you need to extract it in a format that allows you to execute SQL queries across the data at any point in future. You also need to model the records so that they include all the data required to represent a complete and accurate picture of the data as it was at the time of extracting the record.
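As a small illustration of modeling a complete record, the Python sketch below pulls one order out of a toy relational schema and copies the shared customer data into the record, so that the result stands on its own. The schema, table names and XML layout are assumptions made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_source_db() -> sqlite3.Connection:
    """Toy source application: customers are shared between many orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
        CREATE TABLE order_lines (order_id INTEGER, product TEXT, qty INTEGER);
        INSERT INTO customers VALUES (1, 'Acme Ltd');
        INSERT INTO orders VALUES (123, 1, 'fulfilled');
        INSERT INTO order_lines VALUES (123, 'Probe', 2), (123, 'Cable', 5);
    """)
    return conn


def extract_order(conn: sqlite3.Connection, order_id: int) -> ET.Element:
    """Build a self-contained XML record for one order.

    The customer row is shared in the database, so its values are copied
    into the record to capture the data as it was at extraction time.
    """
    row = conn.execute(
        "SELECT o.id, o.status, c.id, c.name "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "WHERE o.id = ?",
        (order_id,),
    ).fetchone()
    order = ET.Element("order", id=str(row[0]), status=row[1])
    ET.SubElement(order, "customer", id=str(row[2]), name=row[3])
    lines = conn.execute(
        "SELECT product, qty FROM order_lines WHERE order_id = ? ORDER BY product",
        (order_id,),
    )
    for product, qty in lines:
        ET.SubElement(order, "line", product=product, qty=str(qty))
    return order


record = extract_order(make_source_db(), 123)
```

If the customer is later renamed in the source system, the extracted record still shows the name that applied when the order was captured.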

Structured records management is a discipline that brings together database administrators and records managers. In our solution we use a definition tool that allows graphical browsing and modeling of the data structures to define the records. This allows the database administrator to visualize the data in a way that is easily understandable to people without specialist RDBMS knowledge, such as many records managers.

While you are modeling the data you also want to be able to create rules that you can use to select which records to extract from the system at what point. These may be selection rules such as "All fulfilled orders" or exclusion rules such as "Product is not recalled". This is where the records manager can provide valuable input as to what constitutes a record.  Our definition tool shows you what data is available to build the rules upon and allows you to formulate them right from within the data model.
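The two example rules can be expressed as query conditions. The following Python sketch uses an in-memory SQLite database with a made-up schema; it is not how our definition tool stores rules, just a way to show selection and exclusion rules combining into one extraction query.

```python
import sqlite3

# Hypothetical rules written as SQL conditions over the modeled tables.
SELECTION_RULES = ["o.status = 'fulfilled'"]   # "All fulfilled orders"
EXCLUSION_RULES = ["p.recalled = 1"]           # skip orders for recalled products


def build_query(selection: list[str], exclusion: list[str]) -> str:
    """Combine the rules: every selection rule must hold, no exclusion rule may."""
    where = [f"({rule})" for rule in selection]
    where += [f"NOT ({rule})" for rule in exclusion]
    return (
        "SELECT o.id FROM orders o "
        "JOIN products p ON p.id = o.product_id "
        "WHERE " + " AND ".join(where) + " ORDER BY o.id"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, recalled INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER, status TEXT);
    INSERT INTO products VALUES (10, 0), (11, 1);
    INSERT INTO orders VALUES (1, 10, 'fulfilled'), (2, 10, 'open'), (3, 11, 'fulfilled');
""")
eligible_ids = [row[0] for row in conn.execute(build_query(SELECTION_RULES, EXCLUSION_RULES))]
```

Only order 1 qualifies here: order 2 is not yet fulfilled and order 3 belongs to a recalled product.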

Once you have defined the data model for the structured records and the rules that you want to use in their creation, you are ready to move to the next step in the process, the classification of the records.

To be continued....

Structured Records Management - from data to record

In my last post I started on the subject of Structured Records Management, an area of records management that is re-gaining a lot of relevance because ALL electronically stored information is discoverable in e-Discovery and FOIA, not just unstructured documents. 

In this and some subsequent posts I would like to introduce some of the concepts involved in structured records management. To start with, let's have a look at the steps involved to turn data in structured applications into records managed according to corporate policy:

1. Definition - this step allows us to identify and model the records in the source system

2. Classification - this step allows us to model some descriptive metadata around the records to apply our records management context

3. Extraction - this step allows us to extract records from the source system, based on the modeling done previously

4. Ingestion - this step brings the structured records under the control of the corporate records management environment

5. Management - this step allows us to access, retrieve, query, and verify the structured data under the control of the records management environment

6. Dispose - this step allows us to manage the retention and legal holds of structured data under the control of the records management environment

As I mentioned before, in HP IM we have created a solution that covers all the steps listed above, using our Database Archiving and TRIM products.  It is a truly exciting project to be working on! Throughout the process I was amazed how well the two products complement each other. I will let you know more about some of the details for each step in follow-up posts...

What happened to all the structured data we used to manage as records?

I started my professional life when computers were big machines that filled rooms and only large corporations could afford them. In smaller businesses many of the administrative tasks, such as accounting, keeping customer registers, product catalogs, managing personnel, leave control, payroll etc. were done on paper in big ledgers.  These ledgers were managed as records with very well defined access controls and retention schedules.

With the advent of personal computing, or affordable computing, most of these well defined administrative processes started to use specialized applications that stored the data in some form of database. The focus of these applications was the day-to-day business process and the focus of the underlying databases was the storage, linking, and retrieval of the data, as a service to the applications. Neither the application nor the database technology looked at the requirements of records management.

At the same time, records management systems moved from index card systems to computerized metadata catalogs, and pretty soon moved on to also capture electronic records directly from users' desktops. The focus of electronic records management was on unstructured documents, which proved to be a real nightmare to manage in environments where information could be created by anyone, virtually anywhere and anytime. The information of the structured line of business application was seemingly managed - at least it had a recognizable structure and was stored in a controlled environment.

It is only now, when e-discovery and freedom of information legislation includes all electronically stored information (ESI), that businesses start to realize that very large parts of their ESI reside in structured databases and are not managed as business records.

In a post a couple of months ago I wrote about how some of our HP TRIM customers use metadata-only records to at least recognize the existence of records in structured systems within the records management environment; the next step is now to talk about how to start taking control of the structured records at a more detailed level.

The combination of HP Information Management's Database Archiving and HP TRIM technologies makes it possible to manage structured records right from their definition in the source system to their management and destruction as part of TRIM's classification and retention policies.  This allows us to bring back into the fold of records management all that information that somehow got overlooked during the rapid change from paper to electronic environments.  Stay tuned for more...


Go Green; Retire Those Old Energy-Hogging Apps!

by Mary Caplice

I was reading an article from Forrester today (‘Q&A: The Economics Of Green IT’) about how companies can save money by going green, and how government incentives and utility company programs may help them do it. In some regions they may even incur penalties for not going green in certain areas. The article suggests that there are very compelling reasons for IT leaders to educate themselves on local incentives and penalties. Some green projects require an upfront cost that will pay off later; some require none. For example, GE expects to save millions just by turning on Windows features like standby and hibernate! By retiring applications that are being kept alive in case they’re needed down the road for regulatory and compliance reasons, IT can save capital and operating expenses, cooling costs, DBA time, facility square footage, and license fees for both hardware and software. There’s even a secondary market for that retired hardware! One way to go about application retirement is to invest in HP Database Archiving software (excellent ROI potential is discussed in my recent blog ‘Death and Taxes- Maybe one can be avoided’).


Ignore it but it won’t go away!

By Mary Caplice

Although we’re all experiencing the effects of a worldwide downturn in the economy, organizations are finding that there are certain things they can’t ignore while they wait for the economy to improve. They’re finding that they have no choice but to invest time and technology in reducing the costs and risks associated with both increasing data retention regulations and the need to answer legal discovery requests quickly and efficiently. This problem is of course most concentrated in highly regulated industries such as insurance, financial services and pharmaceuticals.

HP Database Archiving customers are finding that investing in our technology in these areas can really pay off!

Email Archiving: Choose Carefully…Very Carefully (Part 4)


By André Franklin

In part 3, we discussed seven principles. If you observe them, you are unlikely to need to migrate to a different archiving platform any time soon.

The seven principles are:

  1. Thoroughly understand your email environment

  2. Set clear archiving goals that will still make sense in 5 years or more

  3. Examine scalability in all dimensions

  4. Don’t treat email archiving as a silo. Consider other applications that need (or will need) data archiving

  5. Favor solutions built on standard interfaces for investment protection

  6. Backup and/or replication is more important than any other single feature

  7. Seek references of companies that have similar needs

We examined in detail principles 1 through 3 in part 3. Let’s examine a couple more principles in this post…

Don’t treat email archiving as a silo

We have heard from many users that email is the biggest pain with regard to implementing archiving. This applies to email archiving for compliance purposes, or simply to lighten the load on mail servers. As such, email is often the first archiving problem to be tackled. It’s a noble deed to take on the toughest problem first, but it’s not a wise deed if future archiving needs are not taken into consideration.

What will you need to archive in the future? Most environments have files. Many use Microsoft SharePoint to share departmental and corporate information and content. Then there are instant messaging systems, text messages, voicemail, and so on. There is also database data that can be selectively archived for improved database performance. To complicate matters, information management systems want to control what is stored, for how long it is stored, and who has access to the stored information. All of this must be taken into account when implementing an archive.

In an ideal world, one can perform a single search across a massively scalable archive to retrieve data of various types from email to media files to financial records, etc.

All future archiving needs should be considered at the time the first archiving problem is tackled. If an archiving solution does not address the breadth of application data that you want or will need to archive, you run the risk of having to migrate your archive data to a new, more scalable archiving platform in the future. As we have discussed in previous posts, “it ain’t gonna be pretty”, so make the right choices upfront.

Favor solutions built on standard interfaces for investment protection

Solutions built around standard interfaces mitigate certain risks with regard to data interchange if a migration ever becomes necessary. In addition to standard interfaces, solutions that expose well-documented APIs also mitigate risks. This allows you to roll your own solution and/or interface with other solutions and add-ons. You never really know everything you will want or need in the future, nor can you know of future products that will add value to your existing archiving investment. Standards and APIs help put the odds in your favor.

We’ll examine the remaining two of the seven principles in part 5 of this series.

Death and Taxes- Maybe one can be avoided!

by Mary Caplice

Benjamin Franklin first said "In this world nothing is certain but death and taxes" but we’ve come across recent evidence from an HP customer that one can be avoided, sort of (but it’s not death, sorry!). This customer has thousands of old, de-commissioned applications that they’re keeping on ‘life support’ for compliance and regulatory reasons.

A conservative estimate is that it costs them approximately $10,000 per year in overhead alone to keep each one going, which adds up to $10 million per year per thousand old applications. They compare it to paying property taxes on an empty house with nobody living in it. How do they plan to avoid paying needless ‘taxes’? They plan to offload the data residing in these old apps to an XML archive using HP Database Archiving software, then shut down the old applications.

The ROI for the first year alone is HUGE! So, how high are your taxes?

Data Archival = Data Survival

by Ali Elkortobi

Data needs to survive technological evolutions because data may still be needed many years after its creation and active usage. Structured data is particularly challenging since it must be stored in a format that survives:
• Database evolutions and migrations
• Application versions and migrations
• Operating system evolutions and migrations
• CPU/Storage evolutions & migrations

We believe that XML is a great format for storing long-term, survivable data. XML has the advantage of being pure text, where data and its metadata (description of the data) are kept together. XML has become a widely used format adopted by many open source programs and utilities.

• XML is searchable via XQuery and/or text retrieval. Because we are storing structured data as XML, it can also be queried through SQL
• XML is verbose, but can be compressed by significant ratios. Ideally, queries should be possible on compressed XML
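To show that compressed XML can still be searched, here is a minimal Python sketch using only the standard library. It is not the mechanism inside HP Database Archiving; gzip and the element names are assumptions chosen for illustration. The XML is decompressed as a stream and parsed incrementally, so the full uncompressed document never has to be written back to disk.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def compress_xml(xml_text: str) -> bytes:
    """Store the verbose XML text in compressed form."""
    return gzip.compress(xml_text.encode("utf-8"))


def find_orders(compressed: bytes, customer: str) -> list[str]:
    """Return the ids of orders for one customer, reading the gzip stream directly."""
    matches = []
    with gzip.open(io.BytesIO(compressed), "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == "order":
                if elem.get("customer") == customer:
                    matches.append(elem.get("id"))
                elem.clear()  # release the element once it has been inspected
    return matches


# Made-up archive: 1000 orders, every hundredth one for customer XYZ.
parts = ["<orders>"]
for i in range(1000):
    owner = "XYZ" if i % 100 == 0 else "ABC"
    parts.append(f'<order id="{i}" customer="{owner}"/>')
parts.append("</orders>")
xml_text = "".join(parts)

blob = compress_xml(xml_text)
xyz_orders = find_orders(blob, "XYZ")
```

Because XML repeats its tag and attribute names so often, the compressed blob is much smaller than the original text, yet find_orders still answers the question directly from it.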

HP Database Archiving software fully complies with our proposal for the data life cycle and enables SQL queries on compressed XML. All you need is the data and an enterprise reporting tool. Encapsulating complex business object data inside one XML file opens an opportunity to apply records management to structured data as well as unstructured data. These XML files can be managed by a specialized records management tool such as HP TRIM.

Long live your data.

Using Replication with the HP Integrated Archiving Platform

By Linda Zhou

The HP Integrated Archiving Platform (IAP) supports local and remote replication. There are two replication methods for copying data between two IAP systems: one-way replication and cross replication. These techniques are in addition to the disk-level mirroring built into IAP.

To illustrate how the replication works, let's look at two examples. Consider two IAP systems: one IAP system, IAP-USA, is in New York City, and the other IAP system, IAP-UK, is in London, UK.  We designate IAP-USA as the master and IAP-UK as the slave. User permissions are maintained at the master and replicated to the slave, for both one-way and cross replication.

One-way Replication

In this scenario, IAP-USA archives emails, but IAP-UK is dedicated only to replicated data from IAP-USA.

IAP-USA first archives emails into its Smartcells. These Smartcells are grouped in primary and secondary pairs. IAP-USA then sends its Smartcell data to IAP-UK to replicate the archived emails. IAP-UK has two options to store the replication data: one Smartcell or a pair of mirrored Smartcells. Email owners and compliance users can search emails in both IAP-USA and IAP-UK. This replication method is also called active-passive replication because IAP-USA is actively ingesting new emails and IAP-UK is passively replicating IAP-USA.

Cross Replication

In this scenario, both systems are archiving new emails, each replicating to the other system. For example, IAP-USA might archive the emails of North American users, while IAP-UK is responsible for archiving European users’ emails. The archived emails are stored in the primary and secondary Smartcells in IAP-USA and IAP-UK. Both email owners and compliance users can search their emails in IAP-USA and IAP-UK. This replication method is also called active-active replication because both IAP-USA and IAP-UK are actively ingesting new emails.
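The difference between the two methods can be pictured with a toy model in Python. This is only a conceptual sketch with made-up class and method names; it says nothing about how IAP Smartcells actually move data.

```python
class Archive:
    """Toy model of one archive site (not the IAP API)."""

    def __init__(self, name: str):
        self.name = name
        self.local = set()    # emails archived at this site
        self.replica = set()  # emails replicated from the peer site

    def ingest(self, message_id: str) -> None:
        self.local.add(message_id)

    def replicate_to(self, peer: "Archive") -> None:
        """Copy everything archived here into the peer's replica store."""
        peer.replica |= self.local

    def searchable(self) -> set:
        return self.local | self.replica


# One-way (active-passive): only IAP-USA ingests, IAP-UK just receives.
usa, uk = Archive("IAP-USA"), Archive("IAP-UK")
usa.ingest("na-001")
usa.replicate_to(uk)
one_way_uk = uk.searchable()

# Cross (active-active): both sites ingest and replicate to each other.
usa2, uk2 = Archive("IAP-USA"), Archive("IAP-UK")
usa2.ingest("na-001")
uk2.ingest("eu-001")
usa2.replicate_to(uk2)
uk2.replicate_to(usa2)
```

In both cases a search at either site finds the replicated emails; the methods differ only in which sites accept new emails.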

Replication Rates

The effective rate of replication is dependent on the rate of new emails being archived and the available network bandwidth between the two IAP systems. Because the peak time of new emails is during business hours, and less bandwidth may be available during that time, replication may fall behind. Generally, this should not be cause for alarm, as IAP will catch up during periods of reduced network traffic and email volume (e.g. overnight). However, if the replication backlog is consistently growing, the administrator should consider increasing the network bandwidth available for replication.
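A quick way to reason about replication lag is to simulate the backlog hour by hour. The Python sketch below uses invented numbers (emails per hour and replication capacity per hour are assumptions, not IAP measurements) to show the two situations described above: a backlog that clears overnight and one that keeps growing.

```python
def simulate_backlog(archived_per_hour, capacity_per_hour, start=0):
    """Return the replication backlog after each hour.

    Each hour the backlog grows by the emails archived and shrinks by
    what the available bandwidth can replicate; it never drops below zero.
    """
    backlog = start
    history = []
    for archived, capacity in zip(archived_per_hour, capacity_per_hour):
        backlog = max(0, backlog + archived - capacity)
        history.append(backlog)
    return history


# Hypothetical day: 8 busy business hours, then 16 quiet hours.
archived = [1000] * 8 + [100] * 16

# Enough overnight bandwidth: the backlog built up during the day is cleared.
healthy = simulate_backlog(archived, [600] * 8 + [800] * 16)

# Too little overnight bandwidth: some backlog carries over into the next day.
constrained = simulate_backlog(archived, [600] * 8 + [250] * 16)
```

In the healthy case the backlog peaks at the end of the business day and returns to zero overnight. In the constrained case a remainder is left every day, which is the signal to add bandwidth.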


Archiving for the Clouds!

by Janani Mahalingam

What does Cloud Computing mean? What does it mean for the archiving world?

Cloud computing is the concept where you get IT resources just by turning a knob, the same way you get hot water or cooking gas whenever you want. These resources are always available and you are charged only for the amount that you use. Hmm... isn’t that an interesting concept for IT? It may be storage, servers, databases, software, number of disks, number of CPUs, anything that IT has to deal with. Now it is done in the cloud, and dealing with it is the headache of the cloud maintainers. A few providers, such as HP AIaaS (Adaptive Infrastructure as a Service), Amazon, Google and others, have come up in the last couple of years.

What does cloud computing mean for the archiving world? Once everything is centralized, cloud providers will need to look into archiving in order to ease their maintenance pain. Archiving provides a number of advantages such as performance improvement, cost control of expensive storage, compliance, control of legal hold issues and so on. Every cloud will look into an archiving solution once it starts building its customer base.

Next step - Archiving solution for the Clouds...

Inquiring Minds want to know about Database Archiving - Part 1




by Kevin O'Malley


Recently I spoke at several Oracle user conferences and had a lot of inquiries about what it takes to implement a database archiving project. Here are some of the most frequently asked questions with answers. Please feel free to submit questions of your own.


Q1- What packaged applications do you support?

A1- HP has pre-built integrations for Oracle E-Business and PeopleSoft Enterprise. We cover the major transactional areas and have the broadest coverage in the industry for these applications.


Q2- What about custom or third-party applications?

A2- The HP platform can be used for any Oracle-based application, including custom and third-party applications. Rapid implementation is facilitated by using the Designer interface to easily model tables and apply business rules. Anyone with the application knowledge, in-house personnel or consultants, can rapidly build archive modules. We also support applications running on SQL Server with a common interface and platform.


Q3- I have extensive customizations in my vendor application – doesn’t this make it difficult to archive?

A3- Whether you’re building custom archive modules or extending pre-packaged modules, it’s easy to handle customizations with Designer. Designer provides a simple-to-use visual interface for modeling data, adding custom tables, and building and testing archive/eligibility rules.


Q4- How long does it take to implement a database archiving project?

A4- HP Database Archiving provides a wealth of features to facilitate rapid deployment, including analysis tools, the Designer interface, and a robust, scalable run-time environment. With that said, most of our customers deploy archiving in their production environments within 6 to 12 weeks – this includes a complete install, testing in a UAT environment, and sign-off by the business users before rolling it into production. Implementation times can vary by application complexity, number of module areas, and end-user access requirements, in addition to your data volumes.


Q5- We have many years of data build-up - won’t it take too long to get ‘caught up’ and reap the benefits of archiving?

A5- This is a very common situation - customers can have 8,

9, 10 years or more of data in their production databases. Depending on the type
of application and usage most customers only need to keep 3 years or less of
data in their production systems. So, archiving 5 or more years of data in
initial archive runs can seem daunting. Firstly, the HP Database Archiving
platform is highly scalable. Using multi-threading techniques it handles large
volumes of data, always in a safe manner. In addition, you can run archive jobs
online with users on the system. This allows you to run jobs whenever you like
to fit them into your schedule. HP also supports a one-time, bulk data movement
option – this is an offline activity that archives data and re-organizes the
source data tables.  In any case, HP
fully supports your ability to get ‘caught up’ and then you can archive on
regular cycles (weekly, monthly, etc.).
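
As a rough illustration of how a multi-year backlog might be split into manageable initial runs, the sketch below lists the years that fall outside the retention window, oldest first, so that each year can be archived as its own job. The function name `catch_up_years` and its parameters are hypothetical and are not part of HP Database Archiving.

```python
def catch_up_years(oldest_year: int, current_year: int, keep_years: int) -> list[int]:
    """Return the years eligible for the initial archive runs, oldest first.

    keep_years counts the current year, so keep_years=3 in 2010 keeps 2008-2010.
    """
    first_kept_year = current_year - keep_years + 1
    return list(range(oldest_year, first_kept_year))


# Ten years of data (2001-2010) with three years kept in production.
backlog = catch_up_years(oldest_year=2001, current_year=2010, keep_years=3)
for year in backlog:
    print(f"archive run for {year}")
```

Working oldest-first means each completed run immediately shrinks the production database, and the remaining runs can be scheduled around business activity.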


For more information go to - Thanks for the great questions. Keep them coming.

Implementing a Global Database Archiving Strategy

by Kevin O'Malley 


Recently, HP hosted a customer case study webinar with Tektronix that detailed their global database archiving strategy. The following is an interview-style post with Lois Hughes of Tektronix on some of the key elements of their implementation. If you'd like to view the complete webinar, please go to the Tektronix Case Study Webinar.


Lois, what are the guiding principles for Tektronix when it comes to information management?


We have three core tenets when it comes to this. First, business data must be viewed as an information asset and, like any asset, it must be managed according to its 'useful life'. The useful life, or retention period, must be documented and well understood by both business and IT. These assets can become liabilities if they are kept beyond their useful life - in other words, don't keep data longer than legally required.


So, sounds good - how do you accomplish this?


At Tektronix we take our application data through three phases to manage its lifecycle. The three phases are called Current State, Long-term Retention and Final Form. Depending on the application and data type, all transactions must flow through these phases in succession and ultimately to deletion. Current State is the only part of the lifecycle in which data is transactional and actively updated - the other phases are read-only/reporting phases to meet business and audit requirements.


How does establishing these retentions help Tektronix?


We do business in many countries around the world - 29 to be exact. Understanding regulations down to the country level is critical. We created a Central Retention Document that lays out our retention policies for every country and region around the world. For example, we need to keep 7 years of financial data in the US whereas in China the requirement is 10 years. We keep track of changes to laws and regulations in this document.
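
To make the idea of country-level retention concrete, here is a minimal sketch of a retention lookup. The two retention periods come from the example above; the `RETENTION_YEARS` table and the `past_retention` function are hypothetical stand-ins and do not describe how Tektronix or HP actually store or evaluate these rules.

```python
from datetime import date

# Retention periods in years for financial data. The values are the two
# examples quoted in the interview; the table itself is a hypothetical
# stand-in for a central retention document.
RETENTION_YEARS = {"US": 7, "CN": 10}


def past_retention(country: str, record_date: date, today: date) -> bool:
    """Return True when a record is older than its country's retention period."""
    years = RETENTION_YEARS[country]
    # Compare (year, month, day) tuples so that no invalid calendar date
    # (such as Feb 29 in a non-leap year) is ever constructed.
    expiry = (record_date.year + years, record_date.month, record_date.day)
    return (today.year, today.month, today.day) > expiry


today = date(2010, 6, 15)
print(past_retention("US", date(2002, 6, 1), today))
print(past_retention("CN", date(2002, 6, 1), today))
```

The same record can be past retention in one country and still inside it in another, which is why the rules need to be tracked per country rather than globally.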


How do you enforce your retention policies?


All of the business owners must understand how retention impacts their areas and access needs. Ultimately, however, we needed a solution to help us enforce policies and manage data through the entire lifecycle. HP Database Archiving software was a great fit for us. HP's solution allows us to migrate database transactions to secondary archive databases as well as long-term XML data stores. The partnership has been a great fit for Tektronix and we're very pleased with the results.


To find out more about the Tektronix implementation and HP's solution, register for the webcast replay (Tektronix Case Study Webinar). You can post questions/comments to Lois or me here as well.


Honey I shrunk my database!

By Janani Mahalingam


Many enterprise customers are looking for solutions to reduce the storage footprint and administration required for their development and test environments. Creating ten or more clone databases on a regular basis for a single mission-critical application is not uncommon. There are software solutions that shrink production databases and create smaller clones. These can be useful in development environments where developers do not need the whole production volume, in test environments used for unit testing, or for security purposes where one group of employees is not allowed to see all of the organization's data.


Several vendors, including HP, provide subsetting technology. The main criterion in shrinking a database is that the remaining transactions are complete and application integrity is preserved. For complex data models the challenge is twofold: moving all the data types without breaking integrity, and performing the ‘shrinking process’ quickly enough. Shrinking the database reduces the overall storage requirement, and getting a handle on the needs of your test and development environments can keep you from cloning full-size terabyte databases.
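
The integrity requirement can be illustrated with a deliberately small sketch: when a parent row is dropped from the subset, its child rows must go with it. The two-table model (orders and order lines), the column names and the function `subset_orders` are assumptions made for illustration; real application schemas involve many more tables.

```python
def subset_orders(orders: list[dict], lines: list[dict], keep) -> tuple[list[dict], list[dict]]:
    """Keep the orders selected by `keep` and only the lines that belong to them."""
    kept_orders = [o for o in orders if keep(o)]
    kept_ids = {o["order_id"] for o in kept_orders}
    # Child rows are filtered by the surviving parent keys, so no line is
    # left pointing at an order that is no longer in the subset.
    kept_lines = [line for line in lines if line["order_id"] in kept_ids]
    return kept_orders, kept_lines


orders = [{"order_id": 1, "year": 2008}, {"order_id": 2, "year": 2010}]
lines = [
    {"line_id": 10, "order_id": 1},
    {"line_id": 11, "order_id": 2},
    {"line_id": 12, "order_id": 2},
]
small_orders, small_lines = subset_orders(orders, lines, lambda o: o["year"] >= 2010)
```

Filtering children by the surviving parent keys, rather than by an independent rule, is what keeps each remaining transaction complete.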


The other way to address these issues is to look at the snapshot and cloning technologies provided by storage vendors. Database cloning times can be significantly improved and, depending on how the clone is used, the storage requirement can be reduced. Whenever you are considering a subsetting solution, it is highly recommended to evaluate both hardware and software options before making a decision. The answer could be a combined solution as well - HP provides both software subsetting and storage technology to assist you.


Keep Lawyers out of your Databases

By Mary Caplice

E-Discovery and Records Management for ‘unstructured’ information (such as email, and electronic and physical documents, etc.) is a well-established practice for most organizations. There is now an emerging trend towards enabling structured information from databases to be available to legal and compliance in the same manner so that organizations can prove or defend a legal position in a timely manner.

Here are just a few of the problems an organization might face when its legal team is required to respond to a subpoena within a fixed amount of time:

  • Identifying and then retrieving evidence from databases is usually a manual, time-consuming, error- and risk-prone process

  • Legal teams tend not to have the technical background to understand the intense complexities of databases, particularly for packaged ERP systems such as Oracle E-Business, PeopleSoft and SAP - this makes it difficult to define a ‘record’

  • The answer to records retention could be to back it all up to tape - not very user-friendly for legal teams

  • IT teams are concerned with purging unused data from databases. Investigations usually result in a legal hold on the entire ‘responsive’ database, which postpones purging, so performance and availability suffer

Particularly for financial services organizations, these problems may become more of a legal issue because of the recent dramatic changes in the economic climate. Does all this sound familiar to your organization?


Test and Dev Databases - Is the Fox Watching the Henhouse?

By Mary Caplice

There is growing worldwide concern for the privacy of certain information held in corporate databases, such as national identifiers and employees’ salaries, and the organizations that hold that data are being held legally responsible for keeping it private for both their customers and employees. One part of this problem arises when production databases are cloned for use in test and development systems – the data is made available to IT department employees. Can all those people resist looking up their coworkers’ salaries? Are any of those with bad intentions accessing customer credit card numbers? Most people don't have malicious intent, but it's best to keep them from temptation or from accidentally running across sensitive data.

Gartner estimates that 70% of unauthorized access to information is committed by internal employees - who are also responsible for more than 95% of intrusions that result in significant financial losses. What can add to the problem is that when it comes to choosing software to help combat this, the fox may be watching the henhouse. IT tends to choose software, but for most organizations, Legal, Security, Audit and Compliance are the groups held responsible by customers and the law for keeping data safe. Who’s watching your data?


Eeny Meeny Miney Mo

by Janani Mahalingam

How do you pick the right archiving solution for your databases? Now that you know archiving is what you need for your databases, how do you go about choosing the right vendor? There are several vendors who provide database archiving solutions. The marketing hype can sound very similar but ‘the devil is in the details’ as they say. Here are a few things you should look for when choosing the right vendor for your database archiving solution:

1) Do they have a comprehensive product and platform or do they customize their solution for each customer?
2) How long has their solution been in the market?
3) Have they archived a complicated database like an Oracle E-Business database?
4) Do they have many referenceable customers?
5) Can they easily build and document data models for the objects to be archived?
6) Do they have solutions for long-term and short-term retention?
7) How easy is it to access the data after it has been archived with their solution?
8) How are customizations handled with their solution?
9) Do they have flexibility in their product to work with database optimizers?
10) Can they work on partitioned tables as part of the archiving strategy? Can partitioned objects be moved transactionally and as a whole?
11) Can they readily handle chained transactions as complete sets of data (e.g. invoices linked by many payments)?
12) Do they have reporting and analysis capabilities to show which transactions will pass or be rejected before running archive processes?
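
To show what handling chained transactions as complete sets (item 11) can involve, the sketch below groups invoices and payments that are linked to each other and marks a group as eligible only when every document in it is closed. It is a generic connected-components approach with hypothetical names (`chained_sets`, `eligible_sets`) and made-up IDs, not a description of any vendor's algorithm.

```python
from collections import defaultdict


def chained_sets(links: list[tuple[str, str]]) -> list[set[str]]:
    """Group invoice and payment IDs into connected sets from (invoice, payment) links."""
    graph = defaultdict(set)
    for invoice, payment in links:
        graph[invoice].add(payment)
        graph[payment].add(invoice)
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from this document.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


def eligible_sets(links: list[tuple[str, str]], closed: set[str]) -> list[set[str]]:
    """A chain is archived only if every document in it is closed."""
    return [group for group in chained_sets(links) if group <= closed]


# PAY1 settles two invoices, so INV1, INV2 and PAY1 must move together.
links = [("INV1", "PAY1"), ("INV2", "PAY1"), ("INV3", "PAY2")]
closed = {"INV1", "PAY1", "INV3", "PAY2"}  # INV2 is still open
result = eligible_sets(links, closed)
```

Here the first chain is held back in its entirety because one invoice is still open, which is the behavior the checklist item asks vendors about.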

Is your database data secure?

Corporate data theft makes huge headlines, like the TJ Maxx incident in which 45.7 million credit and debit card numbers were stolen. In addition to debit and credit card numbers, about half a million customers had their personal information (SSN, address, phone etc.) stolen. This was a premeditated crime by outside hackers who went out of their way to breach security, including hacking through encrypted data over several months. While these events will always make the headlines, the lack of internal security policies and controls is by far the weakest link in your data security defense. Forrester estimates that 80% of security breaches are from insiders – this includes employees and others with access from within the organization.


What kind of data is being managed in your enterprise databases? Employee personal data is typically stored in HR/Payroll systems, customer data in billing systems, AR and order management systems just to name a few. IT staff have the responsibility of managing the infrastructure and in some cases have direct database level access to perform system management functions. Database Administrators (DBAs) in particular can be given ‘the keys to the kingdom’ if the right checks and balances are not in place.

Most of these internal security weaknesses are overcome by narrowly defining roles combined with the right controls and oversight.


What’s wrong with this picture? Most of the effort is focused on production systems/databases, while security can be very lax in test and development environments. In some cases developers and testers need sample data that exactly mirrors production data. The easiest way to reproduce a production system is to clone (copy) the entire database. Passwords and access in dev/test systems tend to be much more open than in production. The scariest part is that most breaches here can go undetected. For example, what if someone can’t fight the temptation of looking up their manager’s salary or that of another employee? No one will ever know.


Non-production databases used for test, development, training etc. require just as much oversight as their production counterparts (if not more). When clones of production environments are required, the best thing to do is incorporate data subsetting and data masking into the database creation process. Subsetting reduces data set volumes in such a way that data and application integrity are maintained and the sample of data still allows all the required tests to be performed. Subsetting doesn’t sound like a security function but it may be integral to your overall strategy. For example, financial information or sales order data might be very sensitive, especially for public companies. Removing current-year transactions from non-production databases is a very valuable way to subset away potential breaches. Masking is the process of changing or substituting certain data values so that they become meaningless. In the salary example above, not only would the employee’s salary be masked, the employee’s personally identifiable information (PII) would be changed as well.
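
A minimal sketch of masking, under stated assumptions, might look like the following. The row layout, the `mask_employee` function and the choice of a hash-derived pseudonym are all hypothetical; production masking tools typically offer many more substitution strategies.

```python
import hashlib


def mask_employee(row: dict, secret: str) -> dict:
    """Return a copy of an employee row with salary and PII made meaningless."""
    # A keyed hash of the employee ID gives a stable pseudonym, so the same
    # employee is masked the same way every time a clone is created.
    token = hashlib.sha256(f"{secret}:{row['employee_id']}".encode()).hexdigest()
    return {
        **row,
        "name": f"Employee {token[:8]}",
        "ssn": "000-00-0000",  # fixed dummy value, no real digits retained
        "salary": None,
    }


original = {"employee_id": 42, "name": "Pat Smith", "ssn": "123-45-6789", "salary": 90000}
masked = mask_employee(original, secret="clone-2010")
```

The employee ID is left intact so that joins between tables in the clone still work, while the values a curious coworker might look for are gone.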


HP has the products and services to help you maintain your production and non-production databases as well as meet compliance requirements. Please check out HP Database Archiving on the Information Management Hub.


Database Partitioning vs. Database Archiving

By Kevin O'Malley

I intentionally chose a blog title that pits partitioning against archiving to get your attention. In reality it’s not really one versus the other but rather how and where these fit into an overall management strategy for structured data. By itself database partitioning does not meet long-term archiving and compliance requirements. Transaction-based archiving alone may not be able to keep up with the volume demands of certain applications. Together they can be a strong force and highly complementary. Let’s explore how…

Database partitioning has been around for some time, supported in some fashion by the major database vendors. Database Administrators (DBAs) have been using table partitioning to increase database performance, freeing ‘hot spots’ from read/write contention, and also to improve manageability (backups, cloning operations etc.). Some applications lend themselves to partitioning better than others. Data warehouse applications are generally good partitioning candidates because date is usually a key dimension and can be used to spread the data out evenly (by years, quarters, months etc.). This segregation can be used to improve query performance and to take offline partitions that are no longer needed for day-to-day operations. Other high-volume applications can be good candidates for partitioning if there’s a suitable ‘partitioning key’ that distributes the data optimally for better performance and manageability.

The advent of storage tiering and Information Lifecycle Management (ILM) techniques has breathed new life into database partitioning. Why not move ‘older partitions’ to lower-class storage tiers and even archive older partitions that are rarely accessed? This seems to make a lot of sense since, as data ages beyond certain thresholds, usage typically goes down dramatically. For example, three- to five-year-old sales orders are probably rarely reported on individually but they may occasionally be included in trending reports etc. Segregating by age and tiering partitions helps to focus performance on ‘current’ data while lowering storage costs.

A lot of confusion arises from whether or not partitioning can be an archiving strategy in and of itself. Let’s take the same example of a sales order as discussed above. In most sales order systems, sales order data will span many tables, possibly as many as 50 or 60. This is the nature of relational databases – they are designed to minimize data redundancy and optimize querying and reporting. But this elegance also comes with a price - data model complexity. So, some of the sales order tables may be partitioned. But how can they be synchronized so all related data moves together? What happens if I take offline or delete an individual partition that has related data in other tables/partitions? This raises compliance concerns for corporations since pieces of individual transactions are ‘split’ and need to be migrated together in order to have a true archiving strategy.

What to do?

HP Database Archiving provides the bridge between traditional archiving and partitioning strategies. The key is to leverage the data relationships and archiving rules so both partitioned data and the related transactions are understood and data can be moved in concert. HP provides built-in analytics to identify which partitions are eligible based on your company’s archiving rules. Partitioned data movement can then be used to move whole partitions along with all related data across non-partitioned tables. This is very powerful and takes the guesswork out of the process and frees DBAs from compliance concerns. Additionally, if only some data in a partition is eligible it can still be archived. HP Database Archiving algorithms determine the fastest method to move only ‘archive eligible’ data across partitions and related tables. Complete sets of ineligible data are preserved in the production system.
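
The decision between moving a whole partition and archiving only part of it can be sketched as follows. This is an illustrative simplification with hypothetical names (`plan_partitions`, the `closed` and `year` fields): it looks at a single partitioned table only and ignores the related non-partitioned tables, and it is not HP's actual algorithm.

```python
def plan_partitions(partitions: dict[str, list[dict]], is_eligible) -> dict[str, str]:
    """Decide, per partition, how its rows would be handled under an archive rule."""
    plan = {}
    for name, rows in partitions.items():
        flags = [is_eligible(row) for row in rows]
        if rows and all(flags):
            plan[name] = "move whole partition"
        elif any(flags):
            plan[name] = "archive eligible rows only"
        else:
            plan[name] = "keep in production"
    return plan


partitions = {
    "p2005": [{"year": 2005, "closed": True}, {"year": 2005, "closed": True}],
    "p2007": [{"year": 2007, "closed": True}, {"year": 2007, "closed": False}],
    "p2010": [{"year": 2010, "closed": False}],
}
# Hypothetical archive rule: closed transactions from 2007 or earlier.
plan = plan_partitions(partitions, lambda r: r["closed"] and r["year"] <= 2007)
```

Evaluating the business rule row by row before touching a partition is what prevents an open transaction from being swept away with an otherwise old partition.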

If you would like to know more about HP Database Archiving support for partitioning please weigh in or you can find more information on HP’s Information Management Digital Hub.




"Database Archiving" or "Structured Information Management"?

By Mary Caplice

HP Database Archiving software may be underserved by the term ‘archive’.  A better name might be something along the lines of ‘HP Structured Information Management’ since the product does more than archiving.  Even the term ‘archive’ itself is often misunderstood, ill-defined and confused with ‘backup’ (archived data needs to be backed up itself).

‘Archive’ is a generic term that includes

  • HSM (Hierarchical Storage Management)

  • Migration (transparent, or not)

  • Data Retention (policies to govern how long to keep data around, in what form and where)

  • Data Disposition (policies about when and how to destroy or make no longer useful, either selectively or wholesale)

‘Database Archiving’ products in the market also include the following capabilities in addition to archive:

  • Subsetting of production databases, usually for testing purposes but also for compliance reasons (branch offices only need their data)

  • Masking of data for privacy reasons (SSN, salary, etc.), usually for test database subsets

  • Sunsetting -- Organizations retire old applications but need to save the data from them for compliance reasons

  • Data Transformation -- for example, Oracle or SQL Server to XML

Do they belong together?  What other areas should be included in ‘Structured Information Management’ that are typically not today?  Let us know what you think!


Database Archiving goes “mainstream”


By Kevin O'Malley

Having been in the database archiving business for close to 10 years now, I’ve seen several shifts in the market along the way. In the ‘old days’ database archiving was considered an act of last resort. Companies only took on database archiving projects after the production database still had problems after it was tuned, tweaked and put on the most expensive hardware. IT professionals considered allowing continued growth of data in the production database to be the ‘easiest’ mode of operation. Not necessarily the least expensive but the easiest.

A lot has changed over the years. CIOs and Infrastructure VPs started to see hardware budgets escalate out of control. They’ve heard complaints from the business that their applications are not supporting the business. How could this be, since the cost of hardware continues to decline? The main reason is the accumulation effect of data and associated hidden costs.

Hidden costs you say – it’s just about throwing more servers and storage at the problem, right? Not by a long shot. No production database is immune from performance degradation or operational issues – it happens to all of them at some point. The impact on the business will vary based on the type of business processes the applications support, tolerance for outages, missed support agreements etc. Examples of mission-critical processes include order processing, payroll, financial close and SEC reporting. What happens if these take longer and longer to process? What if the system is down for 2 hours or more for maintenance? Can you afford to lose customers? What happens if financial numbers or reports are not filed on time? The problem is that most of the time this creeps up on you and before you know it, ‘tomorrow’ is here. The costs of avoiding the problem far outweigh the cost of implementing a data archiving strategy upfront.

After years of evangelizing the benefits of database archiving it appears that it’s finally top of mind, no longer the final act. CIOs and VPs of IT are budgeting for information management strategies, including database, email and file archiving. These are strategic projects, not just to manage performance and service levels, but to manage information for the long-term (and eventually delete data when it’s over the retention period).

If you are ready to take a closer look at database archiving, HP makes it easy to identify and classify database records using the platform modeling tool. Data is relocated to target destinations based on access and compliance requirements. Archived data can remain highly accessible in an online archive, even to the point where end-users can access data using their native application screens and reports. This includes a transparent access capability, Combined Reporting™, that allows current and archive data to be queried in a single report. Alternatively, or as a next step, data can be archived to XML for long-term retention while retaining its original format for reload and SQL query access. Customer implementations include transactional systems as well as data warehouses. The market has adopted database archiving as the long-term approach for managing information and the time is right to jump in.
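
The general idea of querying current and archived data in a single report can be sketched with a view that unions the two stores. The example below uses SQLite from Python's standard library purely for illustration; the table names, columns and sample rows are hypothetical, and this is not how Combined Reporting™ is implemented.

```python
import sqlite3


def combined_report() -> list[tuple]:
    """Total order amounts per year across a current table and an archive table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders_current (order_id INTEGER, order_year INTEGER, amount REAL);
        CREATE TABLE orders_archive (order_id INTEGER, order_year INTEGER, amount REAL);
        INSERT INTO orders_current VALUES (3, 2010, 50.0);
        INSERT INTO orders_archive VALUES (1, 2005, 20.0), (2, 2006, 30.0);
        -- One view presents both stores as a single logical table.
        CREATE VIEW orders_all AS
            SELECT *, 'current' AS source FROM orders_current
            UNION ALL
            SELECT *, 'archive' AS source FROM orders_archive;
        """
    )
    rows = conn.execute(
        "SELECT order_year, SUM(amount) FROM orders_all "
        "GROUP BY order_year ORDER BY order_year"
    ).fetchall()
    conn.close()
    return rows


report = combined_report()
```

Because the report is written against the view, it does not need to change when rows move from the current table to the archive table.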

Please weigh in on future blog topics. Potential topics include -- how to leverage partitioning with database archiving, modeling data and archive rules, application retirement and performing archive management/operations.

The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.