Re: Profiling a program built with and without STLport (330 Views)
Regular Advisor
blackwater
Posts: 129
Registered: ‎05-14-2009
Message 1 of 10 (330 Views)
Accepted Solution

Profiling a program built with and without STLport

This question is about a significant difference in performance of the same application when built with and without STLport.

For profiling, I built my application on HP-UX 11.31 ia64 both with STLport 5.1.7 and without STLport. So there were two configurations:

1) gcc 4.3.1, STLport 5.1.7, level of optimization O2
2) gcc 4.3.1, own gcc STL, level of optimization O2

It is probably important to mention that we have been developing the application for a long time using STLport. We have now decided to build it without STLport, compare performance and, if it is the same or better, switch to gcc's own STL library.

So I ran a simple test in which 500 000 requests are sent to the application, processed in one worker thread, and 500 000 responses are sent back.

What is interesting is that the version with STLport processes requests 25% faster. So for the time being it is not practical to give up STLport. However, I would like some assistance in finding out why STLport gives such a significant boost and what can be done to get the same speed in the no-STLport version.

I profiled both builds, and the problem seems to be that the no-STLport version calls pthread_mutex_unlock() too often.

The STLport version (profiled with Caliper) showed 57308 calls to pthread_mutex_unlock(). The no-STLport version (also profiled with Caliper) showed 124941 calls.

After some searching, I found a recommendation to assign std::string variables by adding .c_str() to the right-hand side: string_var_1 = string_var_2.c_str(). I changed the code in frequently called functions and profiled again, which gave 70000 pthread_mutex_unlock() calls. That is definitely an improvement, but it is still worse than the STLport version and, more importantly, it is not feasible to find every place where .c_str() would need to be added.

So what recommendations could you give for this situation?
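To make the two assignment forms concrete, here is a minimal sketch. The helper names assign_shared and assign_deep are made up for illustration. The working assumption, not confirmed anywhere in this thread, is that the gcc 4.3 std::string is reference counted (copy-on-write), so a plain assignment may share the buffer and update a shared count, while going through c_str() forces an independent copy.

```cpp
#include <string>

// Plain assignment. With a copy-on-write std::string (assumed here for
// gcc 4.3's libstdc++), this may share src's buffer and only update a
// reference count instead of copying the characters.
inline void assign_shared(std::string& dst, const std::string& src) {
    dst = src;
}

// Assignment through c_str(). The right-hand side is a plain const char*,
// so dst has to build its own buffer and copy the characters into it.
// Caveats: the length is recomputed by scanning for the terminating NUL,
// and anything after an embedded '\0' in src is silently dropped.
inline void assign_deep(std::string& dst, const std::string& src) {
    dst = src.c_str();
}
```

Note that the c_str() form is not a drop-in replacement: it changes behavior for strings containing '\0' and it always pays for a full copy, so whether it helps overall depends on how the library implements sharing.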

By the way I attached both profile reports.
Acclaimed Contributor
Dennis Handly
Posts: 25,089
Registered: ‎03-06-2006
Message 2 of 10 (330 Views)

Re: Profiling a program built with and without STLport

>So there were two configurations:

Any reason you haven't looked into using aC++, even with STLport? g++ can't compete with the code generated by aC++'s +O2 and above.
Of course you may lose more than you gain when using the aC++ runtime.

>What is interesting is that the version with STLport processes requests 25% faster.

Hard to argue with that.

>the fact that the no-STLport version calls too often pthread_mutex_unlock.

(Actually since lock/unlock come in pairs, it is calling pthread_mutex_lock too often.)

>found recommendation to assign variables of type std::string in this way (add .c_str() to a right-parameter): string_var_1 = string_var_2.c_str()

For aC++'s reference counted strings, this would hurt things.

>So what recommendations for this situations could you give?

Change the g++ STL to optimize your cases? Perhaps STLport is using its own allocator and bypassing malloc?
libstlport.so.5.1::stlp_std::__malloc_alloc::allocate

Or use const char* instead of strings.

If you look closely at your caliper outputs you'll see that for STLport 56.37% of the calls to lock come from __thread_mutex_lock, which comes from other libc functions:
real_free
localtime_r
__gmtime_r_posix
real_malloc

So don't call these that often. ;-)
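One way to call localtime_r less often is to cache its result, for example when many requests are timestamped within the same second. This is only a sketch: the class name CachedLocalTime is hypothetical, and it assumes one instance per worker thread since it does no locking of its own.

```cpp
#include <time.h>

// Caches the last localtime_r() result and reuses it while the requested
// time_t value is unchanged. Not thread safe: keep one instance per thread.
class CachedLocalTime {
public:
    CachedLocalTime() : valid_(false), last_(0), tm_(), conversions_(0) {}

    // Returns the broken-down local time for t. localtime_r() is called
    // only when t differs from the value seen on the previous call.
    const struct tm& get(time_t t) {
        if (!valid_ || t != last_) {
            localtime_r(&t, &tm_);
            last_ = t;
            valid_ = true;
            ++conversions_;
        }
        return tm_;
    }

    // Number of real localtime_r() calls made so far (for measurement).
    unsigned long conversions() const { return conversions_; }

private:
    bool valid_;
    time_t last_;
    struct tm tm_;
    unsigned long conversions_;
};
```

With a one-second timestamp resolution, a burst of requests handled in the same second costs one conversion instead of one per request.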
You might get help by using MallocNG.

For the no STLport case, with 86.91%, you are doing about 5 times as many calls to malloc.
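Since each malloc call takes the lock, cutting the number of allocations per request is another angle. The sketch below is an illustration only: CountingAllocator, Buffer and the two functions are invented names, and the request loop is a stand-in for the real worker thread. It counts allocations made through a container and compares a fresh buffer per request with one reused buffer.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Global counter of allocations made through CountingAllocator.
static std::size_t g_allocations = 0;

// Minimal allocator that counts every allocate() call.
template <typename T>
struct CountingAllocator {
    typedef T value_type;
    CountingAllocator() {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return false;
}

typedef std::vector<char, CountingAllocator<char> > Buffer;

// Builds a new buffer for every request: at least one allocation each.
std::size_t fresh_buffer_per_request(int requests, std::size_t payload) {
    g_allocations = 0;
    for (int i = 0; i < requests; ++i) {
        Buffer b;
        b.assign(payload, 'x');
    }
    return g_allocations;
}

// Reuses one buffer: clear() keeps the capacity, so only the first
// request has to allocate.
std::size_t reused_buffer(int requests, std::size_t payload) {
    g_allocations = 0;
    Buffer b;
    for (int i = 0; i < requests; ++i) {
        b.clear();
        b.assign(payload, 'x');
    }
    return g_allocations;
}
```

The same counting trick can be applied to the real request path to see which containers or strings account for the extra malloc calls before changing any code.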
Regular Advisor
blackwater
Posts: 129
Registered: ‎05-14-2009
Message 3 of 10 (330 Views)

Re: Profiling a program built with and without STLport

Dennis,

Thanks for your answer.

>Any reason you haven't looked into using aC++, even with STLport?

It's a good idea. We'll give it a try. We don't have any compelling reasons to use only gcc.

>Perhaps STLport is using its own allocator and bypassing malloc?
STLport does have such a way of allocating memory, but we don't use it. We built STLport with #define _STLP_USE_MALLOC 1. As you can see in [79], stlp_std::__malloc_alloc::allocate spends 96% of its time in libc.so.1::malloc.

> If you look closely at your caliper outputs you'll see that for STLport 56.37% of the calls to lock come from __thread_mutex_lock, which comes from other libc functions:

Yes, that's true. I have already realized that. It can probably be optimized.

Regular Advisor
blackwater
Posts: 129
Registered: ‎05-14-2009
Message 4 of 10 (330 Views)

Re: Profiling a program built with and without STLport

I am going to build the application
1) with aCC and STLport
2) with aCC and its own STL
and measure the time taken to process requests.
Regular Advisor
blackwater
Posts: 129
Registered: ‎05-14-2009
Message 5 of 10 (330 Views)

Re: Profiling a program built with and without STLport

One more point.

Are you sure that MallocNG is available for download?

I went to this page: http://www.docs.hp.com/en/5992-4174/ch02s07.html, then I went to http://hp.com/go/softwaredepot, then I searched for SWPACKv3 and finally got (quote): "0 products shown below matched your search on "SWPACKv3"".
Acclaimed Contributor
James R. Ferguson
Posts: 21,184
Registered: ‎07-06-2000
Message 6 of 10 (330 Views)

Re: Profiling a program built with and without STLport

Hi:

> Are you sure that MallocNG is available for download?

Yes, see:

http://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=MallocNextGen

Regards!

...JRF...
Regular Advisor
blackwater
Posts: 129
Registered: ‎05-14-2009
Message 7 of 10 (330 Views)

Re: Profiling a program built with and without STLport

James,

Thanks, I will try to install it.
Acclaimed Contributor
Dennis Handly
Posts: 25,089
Registered: ‎03-06-2006
Message 8 of 10 (330 Views)

Re: Profiling a program built with and without STLport

>As you can see in [79] 96% of time stlp_std::__malloc_alloc::allocate spends in malloc

Then the decision to call malloc is made higher in the call tree. Perhaps if you turn off inlining, it may be easier to find.

If you use MallocNG, your STLport version should also be faster.
Regular Advisor
blackwater
Posts: 129
Registered: ‎05-14-2009
Message 9 of 10 (330 Views)

Re: Profiling a program built with and without STLport

>>Any reason you haven't looked into using aC++, even with STLport?

>It's a good idea. We'll give it a try. We don't have any compelling reasons to use only gcc.

I can build Boost 1.38 with aCC, but it seems I can't build recent versions of the Boost library with aCC. Unfortunately, this is a problem for my project.
Regular Advisor
blackwater
Posts: 129
Registered: ‎05-14-2009
Message 10 of 10 (330 Views)

Re: Profiling a program built with and without STLport

My mistake. I checked again and 1.43 compiles.