Honored Contributor
The Brit
Posts: 1,291
Registered: 06-18-2007
Message 1 of 5 (710 Views)

RAD Configuration on "new" BL860c I2 (8-core, 32GB Mem)


I posted an earlier tale about a performance issue on a "new" BL860c i2 blade, which I suspected might be an FC issue. Based on a response to that post (from Maurizio De Tommaso, thank you), I turned my attention to the RAD setup.

 

I should point out that 1) the blade in question is actually a "loaner" from HP, and 2) I have no experience at all with RADs.

 

As I mentioned, this is a BL860c i2. It is configured with 2 x quad-core CPUs (8 cores total) and 32GB of memory.

 

I have included a text file containing the output from

 

1)    SYS$EXAMPLES:RAD.COM        (Does this look like an OK configuration? I'm wondering whether this blade might previously have been used in an HP-UX virtual host setup.)

 

2)   Running the RADCHECK utility on the problem process (currently running).    HOME RAD = 1

i.e.    $  RADCHECK -PROCESS 20204BB7

 

3)  Show Process /all  /ID=20204BB7

 

4)  Show CPU/full

 

I am really interested in whether this is an appropriate RAD configuration.

 

I am considering changing the memory config to MaxUMA, i.e. disabling RAD support on this blade, and just having 32GB of Interleaved Memory.

 

Any comments on that?

 

Dave

 

P.S. my thanks to Volker, I have not dropped the FC considerations and will follow up on that in the next couple of days.

 

Honored Contributor
The Brit
Posts: 1,291
Registered: 06-18-2007
Message 2 of 5 (699 Views)

Re: RAD Configuration on "new" BL860c I2 (8-core, 32GB Mem)

Still can't see the attachment, so I am including the text here.

 

 

              CONFIGURATION INFORMATION.

------------------------------------------------------------
$  @sys$examples:rad.com

 

Node: TABBUD Version: V8.4      System: HP BL860c i2  (1.73GHz/6.0MB)

 

RAD   Memory (GB)   CPUs
===   ===========   ===============
  0       14.00     0-3
  1       14.00     4-7
  2        3.99     0-7
 

------------------------------------------------------------

 

$  radcheck :== "$SYS$SYSDEVICE:[SYS0.SYSCOMMON.SYSTEST]radcheck.exe"

 

$  show user end_night3/full

 

      OpenVMS User Processes at  2-OCT-2013 08:27:48.17
    Total number of users = 1,  number of processes = 1

 

 Username     Node     Process Name   PID        Terminal
 END_NIGHT3   TABBUD   BATCH_1230     20204BB7   (Batch)

 

$  RADCHECK -PROCESS 20204BB7

 

System pages seen from RAD 0:   (2162810 pages in 3 RADs)

 RAD          Total          Private   Galaxy Shared
  0      1642344 ( 76%)      1642344               0
  1        13069 (  1%)        13069               0
  2       507397 ( 23%)       507397               0

 

Global pages:   (4804 pages in 3 RADs)

 RAD          Total          Private   Galaxy Shared
  0          914 ( 19%)          914               0
  1            0 (  0%)            0               0
  2         3890 ( 81%)         3890               0

 

Process pages for process 20204bb7 with Home RAD 1:   (4262 pages in 3 RADs)

 RAD          Total          Private   Galaxy Shared      Global
  0            0 (  0%)            0               0           0
  1         3260 ( 76%)         3260               0           0
  2         1002 ( 24%)           11               0         991

-------------------------------------------------------------
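
To make the process figures above easier to read, here is a minimal sketch that recomputes the percentages and asks how much of the process's private memory sits in its home RAD. The page counts are copied from the RADCHECK output; the dictionary layout and the `percent` helper are illustrative names, not part of RADCHECK.

```python
# Page counts per RAD for process 20204BB7, copied from the RADCHECK output above.
# Tuple layout: (total, private, galaxy_shared, global)
process_pages = {
    0: (0, 0, 0, 0),
    1: (3260, 3260, 0, 0),
    2: (1002, 11, 0, 991),
}
HOME_RAD = 1


def percent(part, whole):
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


total_pages = sum(row[0] for row in process_pages.values())
private_pages = sum(row[1] for row in process_pages.values())

# Share of all process pages that live in the home RAD.
home_share = percent(process_pages[HOME_RAD][0], total_pages)
# Share of private pages only that live in the home RAD.
private_local = percent(process_pages[HOME_RAD][1], private_pages)

print(f"Pages in home RAD {HOME_RAD}: {home_share:.0f}% of {total_pages}")
print(f"Private pages in home RAD {HOME_RAD}: {private_local:.1f}% of {private_pages}")
```

Read this way, nearly all of the process's private pages are already in its home RAD; the pages counted against RAD 2 are almost entirely global pages.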

 

SHOW PROC /ID=20204BB7/all

 

 2-OCT-2013 08:40:42.05   User: END_NIGHT3       Process ID:   20204BB7
                          Node: TABBUD           Process name: "BATCH_1230"

 

Terminal:          
User Identifier:    [END_NIGHT3]
Base priority:      3
Default file spec:  Not available
Number of Kthreads: 1 (System-wide limit: 8)

 

Devices allocated:  BG43828:
                    BG45808:

 

Process Quotas:

 Account name: 141    
 CPU limit:                      Infinite  Direct I/O limit:      4096
 Buffered I/O byte count quota:    723824  Buffered I/O limit:     128
 Timer queue entry quota:              99  Open file quota:        270
 Paging file quota:               1429696  Subprocess quota:        10
 Default page fault cluster:           64  AST quota:             4093
 Enqueue quota:                      2169  Shared file limit:        0
 Max detached processes:                0  Max active jobs:          0

 

Accounting information:

 

 Buffered I/O count:     32093      Peak working set size:      70032
 Direct I/O count:       69481      Peak virtual size:         340960
 Page faults:            20477      Mounted volumes:                0
 Images activated:          27


 Elapsed CPU time:          0 00:00:35.46
 Connect time:              0 00:40:42.05
 
Authorized privileges:
 ACNT         ALLSPOOL     ALTPRI       AUDIT        BUGCHK       BYPASS
 CMEXEC       CMKRNL       DIAGNOSE     DOWNGRADE    EXQUOTA      GROUP
 GRPNAM       GRPPRV       IMPERSONATE  IMPORT       LOG_IO       MOUNT
 NETMBX       OPER         PFNMAP       PHY_IO       PRMCEB       PRMGBL
 PRMMBX       PSWAPM       READALL      SECURITY     SETPRV       SHARE
 SHMEM        SYSGBL       SYSLCK       SYSNAM       SYSPRV       TMPMBX
 UPGRADE      VOLPRO       WORLD
 
Process rights:
 END_NIGHT3                        resource
 BATCH                            
 OES_DATA_READ                    
 OES_APP_RUN                      
 
Subsystem rights:
 OES_DATA_WRITE                   
 
System rights:
 SYS$NODE_TABBUD                  
 
Auto-unshelve: on
 
Image Dump: off
 
Soft CPU Affinity: off
 
Parse Style: Traditional
 
Case Lookup: Blind
 
Symlink search mode: No wildcard
 
Units: Blocks
 
Token Size: Traditional
 
Home RAD: 1
 
Scheduling class name: none

There is 1 process in this job:

  BATCH_1230 (*)

Frequent Advisor
Maurizio De Tommaso_
Posts: 41
Registered: 03-11-2004
Message 3 of 5 (680 Views)

Re: RAD Configuration on "new" BL860c I2 (8-core, 32GB Mem)


Some technical problems with forum, I suspect....

 

....

OpenVMS V8.4 introduced support for Resource Affinity Domain (RAD) for Integrity servers with Non-Uniform Memory Architecture (NUMA). Cell-based Integrity servers (rx7620, rx7640, rx8620, rx8640 and Superdomes) and Integrity i2 servers (BL860c i2, BL870c i2, BL890c i2, rx2800 i2) are all based on the NUMA architecture.

 

Integrity i2 servers are based on the quad-core or dual-core Tukwila processors. Each CPU socket is coupled with specific Dual Inline Memory Modules (DIMMs) through its integrated memory controllers. This memory is termed Socket Local Memory (SLM). Access to memory local to a socket is faster than access to memory attached to a remote socket (the other socket in the same blade, or a socket in another blade).

 

Depending on your specific hardware configuration, the RAD configuration might affect the blade's performance. Before analyzing other technical aspects of your configuration, I suggest you also check the EFI memory configuration.

 

From the OpenVMS side:

 

$ @sys$examples:rad.com

 

Further information: OpenVMS Technical Journal Volume 16 - "OpenVMS RAD Support on Integrity Servers"

 

http://h71000.www7.hp.com/openvms/journal/v16/rad.pdf

 

I would also suggest checking the firmware compatibility matrix for the BL860c i2, HBA, VC, SAN switch and storage, as well as the OpenVMS patch level.

Acclaimed Contributor
Dennis Handly
Posts: 24,953
Registered: 03-06-2006
Message 4 of 5 (633 Views)

Re: RAD Configuration on "new" BL860c I2 (8-core, 32GB Mem)

>Seems to have missed the attachment.

 

Attachments need to have known suffixes like .txt.

You can edit your post and add them by using Post Options > Edit Reply

Esteemed Contributor
Colin Butcher
Posts: 356
Registered: 11-28-2003
Message 5 of 5 (382 Views)

Re: RAD Configuration on "new" BL860c I2 (8-core, 32GB Mem)

Hi,

 

Have just regained access to this new (to me) forum setup ... and no, I still don't like it!

 

---

 

NUMA forces you to think about how the OS and your applications run on the box and what kind of memory they're using - process private or shared between processes. You also need to think about the OS data structures and IO device locality.

 

The 860c-i2 is much the same as the rx2800 with 2x processor sockets populated - same underlying physical layout of processor sockets, memory controllers, IO controllers and so on. Memory local to a socket is accessed faster by a processor in that socket than memory local to the other socket. There is plenty of documentation out there describing the memory technology and layout.

 

VMS implicitly uses a lot of memory that is probably better in the shared (ILM) region than the per-socket (SLM) regions. For example the XFC, RMS global buffers, DECram devices, application specific global sections, etc. You also need SLM memory for process private data etc. However, much of what's best for you is extremely dependent on how your applications are written and how they work.

 

So, a workable start for a VMS machine is to set the NUMA layout to balanced, then place the XFC in the ILM region by using memory reservations. How far you go depends on how much memory you have to play with - you may choose to restrict the XFC to less than the default 50% of total memory. If there's a lot of stuff that's best in ILM shared memory with fair access from all CPUs, then mostly UMA might be better - it gives you some SLM for process private stuff, and you place what you can in ILM by using memory reservations, command line options, arguments in the call to system services, or whatever.
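
To put rough numbers on that trade-off for a 32GB, two-socket blade, here is a small sketch. The interleaved (ILM) fraction assigned to each firmware setting is an assumption recalled from HP's i2 documentation and should be checked against your own firmware; the 50% XFC figure is the default mentioned above; `layout` and the other names are purely illustrative.

```python
TOTAL_GB = 32.0
SOCKETS = 2
XFC_DEFAULT_FRACTION = 0.50  # default XFC ceiling as a fraction of total memory

# ASSUMPTION: interleaved (ILM) fraction for each firmware memory setting.
# Verify these against your own firmware before relying on them.
ILM_FRACTION = {
    "Max UMA": 1.0,
    "Mostly UMA": 0.875,
    "Balanced": 0.5,
    "Mostly NUMA": 0.125,
    "Max NUMA": 0.0,
}


def layout(total_gb, ilm_fraction, sockets=SOCKETS):
    """Return (ILM size, SLM size per socket) in GB for a given ILM fraction."""
    ilm = total_gb * ilm_fraction
    slm_per_socket = (total_gb - ilm) / sockets
    return ilm, slm_per_socket


xfc_gb = TOTAL_GB * XFC_DEFAULT_FRACTION

for name, fraction in ILM_FRACTION.items():
    ilm, slm = layout(TOTAL_GB, fraction)
    headroom = ilm - xfc_gb  # ILM left over if a default-sized XFC lived there
    fits = "yes" if headroom >= 0 else "no"
    print(f"{name:<12} ILM {ilm:5.1f} GB  SLM/socket {slm:5.1f} GB  "
          f"default XFC fits in ILM: {fits} ({headroom:+.1f} GB)")
```

Under those assumptions, a balanced layout on 32GB has exactly enough ILM for a default-sized XFC and nothing else, which is one reason to cap the XFC or to look at mostly UMA if other shared structures also belong in ILM.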

 

You might find that there's little difference in behaviour between balanced, mostly UMA or max UMA, in which case max UMA is the simplest.

 

Setting the fastpath CPU for devices can be useful - say CPU 1 for all FC devices, CPU 2 for all Ethernet devices plus the TCPIP packet processing engine (PPE). You might find that a dedicated CPU for the lock manager is useful, but that's going to be very dependent on workload.

 

You might also want to think about CPUs and hyperthreads (co-threads). Enabling that might be useful, it might not. Enabling hyperthreading and turning co-thread CPUs on or off can be a useful technique. If you do that, be careful about the effect you can have on the primary CPU and fastpath CPUs.

 

You can also control what runs where by techniques such as using affinity to tie a process to a CPU (or set of CPUs) and by associating batch queues with specific RADs.

 

It's like a lot of performance related work - how much effort are you prepared to put in, and do you have a problem to solve anyway?  If performance in general is good enough, provided that you understand it enough and know what to do if a problem develops, how far do you need to go in setting the machine up by tweaking everything you can?

 

There are a number of slide sets and webinars out there by several people around this stuff.

 

Cheers, Colin.

Entia non sunt multiplicanda praeter necessitatem (Occam's razor).