Using profile-guided optimization with shared libraries and multiple executables
Occasional Advisor
Alan Lehotsky
Posts: 6
Registered: 09-16-2008
Message 1 of 11

Using profile-guided optimization with shared libraries and multiple executables

I have a complex application with a large number of executables sharing several shared libraries.

The root executable will exec(2) other executables, which will then call into the shared library.

Do I need to build EACH executable's main program with +Oprofile=collect, or does building just the root executable with profile collection enabled cause every subsequent caller of the shared library to collect profiling information within the shared library?
Acclaimed Contributor
Dennis Handly
Posts: 24,784
Registered: 03-06-2006
Message 2 of 11

Re: Using profile-guided optimization with shared libraries and multiple executables

You need to compile everything with +Oprofile=collect to instrument everything (that's important for performance).

Then, after collecting your flow.data file(s), recompile with +Oprofile=use.
Occasional Advisor
Alan Lehotsky
Posts: 6
Registered: 09-16-2008
Message 3 of 11

Re: Using profile-guided optimization with shared libraries and multiple executables

BTW, I compiled TWO of the executables with +Oprofile=collect and ran my training data.

When it finished, there were a bunch of flow.data. and flow.data.lock files left in addition to a 9MB flow.data file.

Attempting to compile the shared library with +Oprofile=use and that flow.data file resulted in aCC errors complaining that flow.data was locked.

And, I am only trying to optimize the heavily travelled execution paths in my shared library, so I don't care if I don't have profiling information from EVERY executable that uses the shared library.
Acclaimed Contributor
Dennis Handly
Posts: 24,784
Registered: 03-06-2006
Message 4 of 11

Re: Using profile-guided optimization with shared libraries and multiple executables

What version of aC++ are you using? The latest is A.06.20.

>When it finished, there were a bunch of flow.data. and flow.data.lock files left in addition to a 9MB flow.data file.

What's in those flow.data. files? They may be created because you fork and exec the same process?

>that flow.data file resulted in aCC errors complaining that flow.data was locked.

If the application is finished, you can remove the .lock files.

>I am only trying to optimize the heavily traveled execution paths in my shared library, so I don't care if I don't have profiling information from EVERY executable

Ok, that should work.
Occasional Advisor
Alan Lehotsky
Posts: 6
Registered: 09-16-2008
Message 5 of 11

Re: Using profile-guided optimization with shared libraries and multiple executables

Using A.06.15.

We definitely fork our executables. This is a parallel processing application. The flow.data. files are large (600kb+), and the flow.data..lock files are all of the form

hp8

where is a process id of a process that's gone. There's also a flow.data.log file; it's full of complaints about ffw not being able to write to the flow.data file because it's locked.

Should I delete the temp files, append them to the flow.data file or what?
Acclaimed Contributor
Dennis Handly
Posts: 24,784
Registered: 03-06-2006
Message 6 of 11

Re: Using profile-guided optimization with shared libraries and multiple executables

>The flow.data. files are large (600kb+)

Then these probably have useful data in them.

>the flow.data..lock files are all of the form

I think you can remove these.

>There's also a flow.data.log file; it's full of complaints about ffw not being able to write to the flow.data file because it's locked.

Oh boy. Perhaps the log will tell you which are valid? And for which executable.

>Should I delete the temp files, append them to the flow.data file or what?

You probably need to merge them with the flow.data file, see fdm(1).
You probably want to create a new combined file:
fdm -o flow.data_BIG flow.data flow.data. ...

And use flow.data_BIG when recompiling:
+Oprofile=use:flow.data_BIG
Occasional Visitor
Nathaniel McIntosh
Posts: 1
Registered: 09-16-2008
Message 7 of 11

Re: Using profile-guided optimization with shared libraries and multiple executables

Collecting flow.data files for multiprocess/multithreaded applications with shared libraries is tricky, no question about it. Here are some suggestions on how you might be able to manage the process more effectively.

The first thing you can do to make your life easier is to take advantage of a feature we call flow path qualifiers. When a +Oprofile=collect application finishes execution, it checks the environment variable FLOW_DATA. If FLOW_DATA is set to a file name or path name, the +Oprofile=collect runtime will try to write the accumulated data to that file or path. You can also append the following qualifiers to your FLOW_DATA setting:

Suffix        Effect
------------  -----------------------------------------
,per-process  qualifies flow files with executable name
,unique       qualifies flow files with process ID

Here is an example that should illustrate:

% /opt/ansic/bin/cc himom.c +Oprofile=collect -o first.exe
% cp first.exe second.exe
% setenv FLOW_DATA "myflow,per-process"
% ./first.exe
hi mom!
% ./second.exe
hi mom!
% ls -ltr myflow*
-rw-rw-r-- 1 me 0 May 13 13:48 myflow,per-process.err
-rw-rw-r-- 1 me 1700 May 13 13:48 myflow.first.exe
-rw-rw-r-- 1 me 1924 May 13 13:48 myflow.first.exe.log
-rw-rw-r-- 1 me 1700 May 13 13:48 myflow.second.exe
-rw-rw-r-- 1 me 2396 May 13 13:48 myflow.log
-rw-rw-r-- 1 me 1931 May 13 13:48 myflow.second.exe.log
% setenv FLOW_DATA "anotherflow,per-process,unique"
% ./first.exe
hi mom!
% ./first.exe
hi mom!
% ./second.exe
hi mom!
%
-rw-rw-r-- 1 me 0 May 13 13:50 anotherflow,per-process,unique.err
-rw-rw-r-- 1 me 1700 May 13 13:50 anotherflow.first.exe-22030
-rw-rw-r-- 1 me 1946 May 13 13:50 anotherflow.first.exe-22030.log
-rw-rw-r-- 1 me 1700 May 13 13:50 anotherflow.first.exe-22036
-rw-rw-r-- 1 me 1946 May 13 13:50 anotherflow.first.exe-22036.log
-rw-rw-r-- 1 me 3594 May 13 13:50 anotherflow.log
-rw-rw-r-- 1 me 1700 May 13 13:50 anotherflow.second.exe-22042
-rw-rw-r-- 1 me 1953 May 13 13:50 anotherflow.second.exe-22042.log
%

As Dennis suggests, you can merge the resulting flow files after the fact, picking and choosing the ones you are interested in. For example, if I know that "first.exe" is performance-critical and "second.exe" is not, I can select only the "first.exe" flow files for merging into my final destination.

With regard to locked flow files: the tools that create flow files will never intentionally leave a flow file locked; if you wind up with flow.data.lock files after the run is complete, then it means that something went wrong somewhere along the line (some process somewhere got interrupted or encountered an error during a flow.data update).

Let me know if this helps. There are other more arcane things you can try if this doesn't do the trick.



Occasional Advisor
Alan Lehotsky
Posts: 6
Registered: 09-16-2008
Message 8 of 11

Re: Using profile-guided optimization with shared libraries and multiple executables

Thanks. That worked like a charm; and merging all the flow data boosted my PGO performance to 22% faster than without profiling.
Occasional Advisor
Alan Lehotsky
Posts: 6
Registered: 09-16-2008
Message 9 of 11

Re: Using profile-guided optimization with shared libraries and multiple executables

It would be a good thing to extend the documentation in the user's guide to cover profiling shared libraries.
Acclaimed Contributor
Dennis Handly
Posts: 24,784
Registered: 03-06-2006
Message 10 of 11

Re: Using profile-guided optimization with shared libraries and multiple executables

>That worked like a charm; and merging all the flow data boosted my PGO performance to 22% faster than without profiling.

Have you looked into -ipo?

>It would be a good thing to extend the documentation in the user's guide to cover profiling shared libraries.

Which document was this?
aC++ Online Help:
http://docs.hp.com/en/14487/options.htm#optprofilebasedoptopt
Optimizing Itanium-based applications:
http://h21007.www2.hp.com/portal/site/dspp/menuitem.863c3e4cbcdc3f3515b49c108973a801/?ciid=c208dd324...
Linker: Profile-Based Optimization
http://docs.hp.com/en/14640/OnlineHelp/applicationperformance.htm#OPTPBO
Occasional Advisor
Alan Lehotsky
Posts: 6
Registered: 09-16-2008
Message 11 of 11

Re: Using profile-guided optimization with shared libraries and multiple executables

I tried building the entire library with -ipo, but the result SEGVs on execution. I also selectively built the 3 sources that were most heavily represented in the Caliper scgprof profile; that ran perhaps 1% faster on one of my benchmarks and 2.5% SLOWER on the other.

As for the documentation, I was using the A.06.15 online user's guide. I can't find ANY mention of unique or per-process in there, or of fdm, for that matter.

The 22% improvement exceeds my goal of 20%, so I'm satisfied at this point.
The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation