10-02-2013 05:48 AM - edited 10-02-2013 05:49 AM
I posted an earlier tale focused on a performance issue on a "new" BL860c i2 blade, which I suspected might be an FC issue. Based on a response to that post (from Maurizio De Tommaso, thank you), I turned my attention to the RAD setup.
I should point out that 1) the blade in question is actually a "loaner" from HP, and 2) I have no experience at all with RADs.
As I mentioned, this is a BL860c i2. It is configured with 2 x quad-core CPUs (8 cores total) and 32GB of memory.
I have included a text file containing the output from
1) SYS$EXAMPLES:RAD.COM (Does this look like an OK configuration? I'm wondering whether this blade might have previously been used in an HP-UX virtual host setup?)
2) Running the RADCHECK utility on the problem process (currently running). HOME RAD = 1
i.e. $ RADCHECK -PROCESS 20204BB7
3) Show Process /all /ID=20204BB7
4) Show CPU/full
I am really interested in whether this is an appropriate RAD configuration.
I am considering changing the memory config to MaxUMA, i.e. disabling RAD support on this blade and just having 32GB of interleaved memory. Any comments on that?
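For reference, my understanding is that disabling RAD support on the OpenVMS side would look something like this, assuming the usual MODPARAMS/AUTOGEN workflow (RAD_SUPPORT being the relevant SYSGEN parameter - I'd verify the details against the V8.4 docs before touching it):

$ MCR SYSGEN SHOW RAD_SUPPORT          ! check the current value first
$ ! To disable RAD support, add the line
$ !     RAD_SUPPORT = 0
$ ! to SYS$SYSTEM:MODPARAMS.DAT, then regenerate parameters and reboot:
$ @SYS$UPDATE:AUTOGEN GETDATA REBOOT NOFEEDBACK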
P.S. My thanks to Volker; I have not dropped the FC considerations and will follow up on them in the next couple of days.
10-02-2013 06:03 AM
Still can't see the attachment, so I'm including the text here.
Node: TABBUD Version: V8.4 System: HP BL860c i2 (1.73GHz/6.0MB)
RAD Memory (GB) CPUs
=== =========== ===============
0 14.00 0-3
1 14.00 4-7
2 3.99 0-7
$ radcheck :== "$SYS$SYSDEVICE:[SYS0.SYSCOMMON.SYSTEST]radcheck.exe"
$ show user end_night3/full
OpenVMS User Processes at 2-OCT-2013 08:27:48.17
Total number of users = 1, number of processes = 1
Username Node Process Name PID Terminal
END_NIGHT3 TABBUD BATCH_1230 20204BB7 (Batch)
$ RADCHECK -PROCESS 20204BB7
System pages seen from RAD 0: (2162810 pages in 3 RADs)
RAD Total Private Galaxy Shared
0 1642344 ( 76%) 1642344 0
1 13069 ( 1%) 13069 0
2 507397 ( 23%) 507397 0
Global pages: (4804 pages in 3 RADs)
RAD Total Private Galaxy Shared
0 914 ( 19%) 914 0
1 0 ( 0%) 0 0
2 3890 ( 81%) 3890 0
Process pages for process 20204bb7 with Home RAD 1: (4262 pages in 3 RADs)
RAD Total Private Galaxy Shared Global
0 0 ( 0%) 0 0 0
1 3260 ( 76%) 3260 0 0
2 1002 ( 24%) 11 0 991
$ SHOW PROCESS /ALL /ID=20204BB7
2-OCT-2013 08:40:42.05 User: END_NIGHT3 Process ID: 20204BB7
Node: TABBUD Process name: "BATCH_1230"
User Identifier: [END_NIGHT3]
Base priority: 3
Default file spec: Not available
Number of Kthreads: 1 (System-wide limit: 8)
Devices allocated: BG43828:
Account name: 141
CPU limit: Infinite Direct I/O limit: 4096
Buffered I/O byte count quota: 723824 Buffered I/O limit: 128
Timer queue entry quota: 99 Open file quota: 270
Paging file quota: 1429696 Subprocess quota: 10
Default page fault cluster: 64 AST quota: 4093
Enqueue quota: 2169 Shared file limit: 0
Max detached processes: 0 Max active jobs: 0
Buffered I/O count: 32093 Peak working set size: 70032
Direct I/O count: 69481 Peak virtual size: 340960
Page faults: 20477 Mounted volumes: 0
Images activated: 27
Elapsed CPU time: 0 00:00:35.46
Connect time: 0 00:40:42.05
ACNT ALLSPOOL ALTPRI AUDIT BUGCHK BYPASS
CMEXEC CMKRNL DIAGNOSE DOWNGRADE EXQUOTA GROUP
GRPNAM GRPPRV IMPERSONATE IMPORT LOG_IO MOUNT
NETMBX OPER PFNMAP PHY_IO PRMCEB PRMGBL
PRMMBX PSWAPM READALL SECURITY SETPRV SHARE
SHMEM SYSGBL SYSLCK SYSNAM SYSPRV TMPMBX
UPGRADE VOLPRO WORLD
Image Dump: off
Soft CPU Affinity: off
Parse Style: Traditional
Case Lookup: Blind
Symlink search mode: No wildcard
Token Size: Traditional
Home RAD: 1
Scheduling class name: none
There is 1 process in this job:
10-02-2013 07:12 AM - edited 10-02-2013 07:20 AM
Some technical problems with the forum, I suspect...
OpenVMS V8.4 introduced support for Resource Affinity Domain (RAD) for Integrity servers with Non-Uniform Memory Architecture (NUMA). Cell-based Integrity servers (rx7620, rx7640, rx8620, rx8640 and Superdomes) and Integrity i2 servers (BL860c i2, BL870c i2, BL890c i2, rx2800 i2) are all based on the NUMA architecture.
Integrity i2 servers are based on the quad-core or dual-core Tukwila processors. Each CPU socket is coupled with specific memory Dual Inline Memory Modules (DIMMs) through its integrated memory controllers. This memory is termed as Socket Local Memory (SLM). Access to memory local to a socket is faster when compared to access to memory in a remote socket (other socket in the same blade or another blade).
Depending on your specific hardware configuration, the RAD configuration might affect the blade's performance. Before analyzing other technical aspects of your configuration, I suggest you also check the EFI memory configuration.
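For example, on the i2 family the memory interleave setting can be inspected and changed from the EFI shell. The memconfig command below is what I remember from the rx2800 i2/BL8x0c i2; verify the exact keywords with "help memconfig" on your firmware revision:

Shell> memconfig                   (show the current memory configuration)
Shell> memconfig -mi Balanced      (select MaxUMA/MostlyUMA/Balanced/MostlyNUMA/MaxNUMA)
Shell> reset                       (the new setting takes effect at the next boot)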
From the OpenVMS side, further information is available in OpenVMS Technical Journal Volume 16, "OpenVMS RAD Support on Integrity Servers".
I would also suggest checking the firmware compatibility matrix for the BL860c i2, HBA, Virtual Connect, SAN switch and storage, as well as the OpenVMS patch level.
10-03-2013 12:52 AM
>Seems to have missed the attachment.
Attachments need to have known suffixes like .txt.
You can edit your post and add them by using Post Options > Edit Reply
01-05-2014 04:35 AM
Have just regained access to this new (to me) forum setup ... and no, I still don't like it!
NUMA forces you to think about how the OS and your applications run on the box and what kind of memory they're using - process private or shared between processes. You also need to think about the OS data structures and IO device locality as well.
The 860c-i2 is much the same as the rx2800 with 2x processor sockets populated - same underlying physical layout of processor sockets, memory controllers, IO controllers and so on. Memory local to a socket is accessed faster by a processor in that socket than memory local to the other socket. There is plenty of documentation out there describing the memory technology and layout.
VMS implicitly uses a lot of memory that is probably better in the shared (ILM) region than the per-socket (SLM) regions - for example the XFC, RMS global buffers, DECram devices, application-specific global sections, etc. You also need SLM memory for process-private data. However, much of what's best for you is extremely dependent on how your applications are written and how they work.
So, a workable start for a VMS machine is to set the NUMA layout to balanced, then place the XFC in the ILM region by using memory reservations. It depends on how much memory you have to play with - you may choose to restrict the XFC to less than its default 50% of total memory. If there's a lot of stuff that's best in ILM shared memory with fair access from all CPUs, then mostly UMA might be better - that gives you some SLM for process-private stuff, and you can place what you want in ILM by using memory reservations, command-line qualifiers, arguments to system services, or whatever.
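As a rough sketch of such a reservation on this box - RAD 2 is the ILM region per the RAD.COM output above, VCC$MIN_CACHE_DATA is the documented reserved-memory name for the XFC, and the /RAD qualifier arrived with V8.4, but check all of that (and the placeholder sizes) against your documentation:

$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> RESERVED_MEMORY ADD VCC$MIN_CACHE_DATA /SIZE=4096 /ALLOCATE /RAD=2
SYSMAN> EXIT
$ ! Optionally cap the XFC below its default 50% of memory via MODPARAMS.DAT:
$ !     VCC_MAX_CACHE = 8192       ! megabytes
$ ! then run AUTOGEN and reboot for the changes to take effect.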
You might find that there's little difference in behaviour between balanced, mostly UMA or max UMA, in which case max UMA is the simplest.
Setting the fast path CPU for devices can be useful - say, CPU 1 for all FC devices and CPU 2 for all Ethernet devices plus the TCP/IP packet processing engine (PPE). You might find that a dedicated CPU for the lock manager is useful, but that's going to be very dependent on workload.
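A sketch, using this box's device names (FGA0/FGB0 for the FC ports, EWA0 for the LAN device - substitute your own, and the CPU numbers are just examples):

$ SET DEVICE FGA0: /PREFERRED_CPUS=1   ! FC ports to CPU 1
$ SET DEVICE FGB0: /PREFERRED_CPUS=1
$ SET DEVICE EWA0: /PREFERRED_CPUS=2   ! Ethernet to CPU 2
$ SHOW DEVICE FGA0: /FULL              ! confirm the fast path CPU took effect
$ ! For a dedicated lock manager CPU, set the SYSGEN parameters LCKMGR_MODE
$ ! (roughly, the number of active CPUs required before the dedicated lock
$ ! manager runs) and LCKMGR_CPUID in MODPARAMS.DAT, then run AUTOGEN.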
You might also want to think about CPUs and hyperthreads (co-threads). Enabling them might be useful, or it might not. Enabling hyperthreading and turning co-thread CPUs on or off can be a useful tuning technique; if you do that, be careful about the effect you can have on the primary CPU and the fast path CPUs.
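If you experiment with that, a rough outline (hyperthreading itself is switched at EFI with "cpuconfig threads on" and a reboot; the CPU number below is just an example):

$ SHOW CPU                ! identify the primary, active and co-thread CPUs
$ STOP/CPU 9              ! take a co-thread CPU offline
$ START/CPU 9             ! ...and bring it back
$ ! Don't stop the primary CPU, and think twice about CPUs you have
$ ! made fast path preferred CPUs.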
You can also control what runs where by techniques such as using affinity to tie a process to a CPU (or set of CPUs) and by associating batch queues with specific RADs.
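For example (the queue name and the CPU/RAD numbers here are invented; check the exact qualifier syntax in the DCL help):

$ SET PROCESS /AFFINITY /SET=(4,5,6,7) /PERMANENT  ! tie to RAD 1's CPUs
$ SET PROCESS /RAD=HOME=1                          ! or just move its home RAD
$ INITIALIZE /QUEUE /BATCH /RAD=1 BATCH_RAD1       ! jobs in this queue get home RAD 1
$ START /QUEUE BATCH_RAD1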
It's like a lot of performance-related work - how much effort are you prepared to put in, and do you have a problem to solve anyway? If performance in general is good enough, and provided you understand the system well enough to know what to do if a problem develops, how far do you need to go in setting the machine up by tweaking everything you can?
There are a number of slide sets and webinars out there by several people around this stuff.