PDFs and image-based documents are a pain for test automators. Is there a remedy?

Have you ever had the honor or the privilege of trying to get information or data from a PDF file using QuickTest Professional? If you have, you will understand my pain. Many software platforms use image or PDF formats to present data as a formatted report. Over the years, this has caused automators a large number of headaches. From an automation tool's point of view, PDFs are effectively image-based documents, and it has been close to impossible to work with them and extract the information we need.

 

One of my favorite steps in a test case to automate has always been “print report”. This simple statement implies that the automated testing tool can read printed documents. I always imagine a scenario where R2-D2 from Star Wars rolls over to a printer, beeping the whole way in disgust. Then he has C-3PO read the binary document for Jabba the Hutt. The whole time R2 is thinking "I should have read the darn test case all the way through before starting the automation process."

 

The available options

 

PDFs are not impossible for automators to decipher. Optical Character Recognition (OCR) software has always been an option for dealing with PDF files when automating tests. However, because this capability was not built into QuickTest Professional, it has been more trouble than it’s worth in the past: it caused lengthy delays and was, for the most part, very unreliable.

In fact, reading PDF or image-based reports has been the example I gave my customers for why 100 percent automation of regression testing was virtually impossible. While 100 percent automation for regression testing is still a pipe dream today, with Unified Functional Testing (UFT) 11.5 it has come one step closer to reality.

 

As a result of UFT 11.5, I will need to come up with a new example to describe the impossibility of achieving 100 percent automation. Or on second thought, I could tell the client the truth—that it is just too boring and really not any fun to automate everything.

 

Is 100 percent an attainable goal?

 

Before going any further, I would like to say that I don’t believe 100 percent automation is reasonable for most automated testing. But when you’re talking about ROI in the near future, I do believe that we can automate a much higher percentage than we currently do. I think this is especially true when you’re talking about tools like UFT 11.5.

 

I can hear your question: how much closer can UFT 11.5 bring us to a goal of 90ish percent? I can’t really tell you this time, but this will definitely be a subject of later blog posts because it is a personal interest of mine.

 

UFT 11.5 vs. PDF. The ultimate champion is…

 

But the topic of this discussion isn’t the amount of automation that a person can achieve. It’s about the ability to read, validate and even extract information from a PDF file using UFT 11.5.

I’ve included a video that demonstrates how simple it is to use a base PDF file for checkpoint validation, and even to use regular expressions to enhance this new functionality. We can now also extract information from the PDF itself, such as invoice numbers, addresses or even a list of items, to be used in the current automated script or stored in an external data file. Again, this may seem like child’s play to some people. Over the years, however, reading a PDF file accurately has sometimes been a roadblock for automators unless they were willing to spend additional time on code that, for the most part, would go unnoticed.
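
As a rough illustration of the kind of extraction described above, here is a minimal Python sketch. It is not UFT code and does not use UFT's file content checkpoint. It assumes the report text has already been pulled out of the PDF by some other means, and the `INV-` invoice format, the sample text and the function name `extract_invoice_numbers` are all made up for the example.

```python
import re

# Hypothetical invoice format: "INV-" followed by exactly six digits.
INVOICE_PATTERN = re.compile(r"\bINV-\d{6}\b")


def extract_invoice_numbers(report_text):
    """Return every invoice number found in the text, in order of appearance."""
    return INVOICE_PATTERN.findall(report_text)


# Made-up sample standing in for text already extracted from a PDF report.
sample_text = """Invoice: INV-004217
Bill to: 12 Example Street
Related invoice: INV-004218
"""

invoice_numbers = extract_invoice_numbers(sample_text)
print(invoice_numbers)
```

The extracted values could then be fed into later steps of a script or written to an external data file, which is the workflow the paragraph above describes.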

 

More and more companies are using PDF image-based technology to address the legal issues of editable documents. UFT 11.5’s file content interface allows users to easily add or edit the contents within the image-based document. In my testing, file-content checkpoints were very reliable and easy to troubleshoot, even when I used incorrect expressions. If you’re wondering what the performance hit is when reading a PDF file as a checkpoint, I found it minimal, even with files averaging 25 pages or more.
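
To make the idea of checking a report against a base file more concrete, here is a small Python sketch of the general technique. It is only an assumption about how such a comparison could work, not a description of how UFT implements file content checkpoints. The function name `find_mismatches` and the sample lines are hypothetical. Baseline entries are either literal strings or compiled regular expressions for lines that change from run to run.

```python
import re


def find_mismatches(baseline_lines, actual_lines):
    """Compare actual lines to a baseline.

    Returns a list of (line_no, expected, actual) tuples. A line_no of 0
    reports a difference in the number of lines.
    """
    mismatches = []
    if len(baseline_lines) != len(actual_lines):
        mismatches.append(
            (0, f"{len(baseline_lines)} lines", f"{len(actual_lines)} lines")
        )
    pairs = zip(baseline_lines, actual_lines)
    for line_no, (expected, actual) in enumerate(pairs, start=1):
        if isinstance(expected, re.Pattern):
            # Regex entries must match the whole line.
            matched = expected.fullmatch(actual) is not None
            shown = expected.pattern
        else:
            matched = expected == actual
            shown = expected
        if not matched:
            mismatches.append((line_no, shown, actual))
    return mismatches


# Made-up baseline: the date line varies, so it is a regular expression.
baseline = [
    "Quarterly Report",
    re.compile(r"Generated on \d{4}-\d{2}-\d{2}"),
    "Total: 42",
]
actual = ["Quarterly Report", "Generated on 2012-11-30", "Total: 41"]

for line_no, expected, found in find_mismatches(baseline, actual):
    print(f"line {line_no}: expected {expected!r}, found {found!r}")
```

Reporting the expected pattern next to the text actually found is what makes an incorrect expression quick to spot when a check fails.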

There are two things I’d like to ask you as a reader of this article.  First, how do you plan to use this new functionality? Second, as I stated earlier in this article, I have lost my primary example of why 100 percent regression automation is not feasible:

 

  1. Do you believe that you can reach 100 percent, not just in your environment but in others as well?
  2. What is the example that you give clients when attempting to explain why it is impossible to automate all your regression scripts?

Hopefully your answers will help me come up with a new example. Otherwise I will have to spend more time thinking about old movies, cartoons and the like—and you don’t want me doing that.

Also, I will be attending HP Discover in Frankfurt from December 4-6. I would love to meet with you and discuss your ideas over a bratwurst, or maybe a pretzel.

 

On the down low: from now on, “pretzel” is the code word for a good German beer, so I can slip it by my editor and HP’s censor team, also known as the HP Software Marketing Group. #HPSoftware #HaveAPretzel

 

If you like this subject you may like these articles:

What does it mean to an Automator to have a True IDE?

HP Communities - Webinar announcement – “What’s new in HP’s automat...

My dream functionality is now a reality with HP UF...

“What does a name change mean to me?” A look at th...

 


Thanks

@wh4tsup_Doc

About the Author
Michael Deady is a Pr. Consultant & Solution Architect for HP Professional Service and HP's ALM Evangelist for IT Experts Community. He spec...
The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.