03-06-2012 07:02 AM
We are using SSH logins and noticed that the TCPIP$SSH priority was 8, yet all user accounts are priority 4. Any idea why it's so high? We were thinking of lowering it to 6. Has anyone done this? VMS V8.3-1H1, IA64.
03-06-2012 08:20 AM
There's a wrinkle here with looking at process priorities. Are you looking at the running process priority -- the running priority will typically receive a priority boost for interactive users -- or at the process base priority?
I'm seeing priority 4 as the base priority for an ssh session connecting to an OpenVMS Alpha V8.4 box with TCP/IP Services V5.7-13 ECO2 installed.
A command such as SHOW PROCESS will show the base priority.
Commands including SHOW SYSTEM will show the running priority; that'll be equal to or higher than the base priority.
There are other ways to look at the base and running priorities.
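For instance, a quick way to compare the two from DCL is the F$GETJPI lexical function; the PID below is just the one quoted later in this thread, used as an illustration:

```
$! Base priority (static, from the UAF) vs. current priority
$! (which may carry a transient boost) for the current process:
$ WRITE SYS$OUTPUT "Base priority:    ", F$GETJPI("","PRIB")
$ WRITE SYS$OUTPUT "Current priority: ", F$GETJPI("","PRI")
$! For some other process, pass its PID (WORLD privilege may be needed):
$ WRITE SYS$OUTPUT F$GETJPI("20C905AE","PRIB")
```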
03-06-2012 08:56 AM
Well, that's a good question. We run this .COM file and it lists out all the processes with a priority above 5, and then when I do a SHOW PROCESS it shows the same base priority.
6-MAR-2012 11:53:37.93 User: TCPIP$SSH Process ID: 20C905AE
Node: FACS1 Process name: "TCPIP$SS_BG3347"
User Identifier: [TCPIP$AUX,TCPIP$SSH]
Base priority: 8
Default file spec: Not available
Number of Kthreads: 1
Devices allocated: BG3347:
03-06-2012 09:34 AM - edited 03-06-2012 09:40 AM
That site$specific:baseprio_user.com tool is a site-specific DCL command procedure, and site-specific DCL command procedures can contain bugs.
That site-specific procedure does contain a bug, too.
That's an ssh server process, and not an interactive user. (Note the username and the UIC values there.) That particular server process is the connection between an interactive ssh user process, and the network; it's a server associated with an interactive user process, but it isn't something that the interactive user can cause to execute commands or otherwise more directly control.
Whatever that DCL command procedure is doing here (DCL details not being in evidence, etc) is not correctly detecting an interactive user process.
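A hypothetical sketch of what such a procedure ought to be checking (this is not the site's actual code): keying on the process mode via F$GETJPI's MODE item filters out NETWORK-mode server processes such as the TCPIP$SS_BGnnnn processes.

```
$! Hypothetical sketch: report only INTERACTIVE-mode processes
$! with a base priority above 5; NETWORK-mode ssh server
$! processes (TCPIP$SS_BGnnnn) are skipped.
$ ctx = ""
$ loop:
$   pid = F$PID(ctx)
$   IF pid .EQS. "" THEN GOTO done
$   IF F$GETJPI(pid,"MODE") .NES. "INTERACTIVE" THEN GOTO loop
$   IF F$GETJPI(pid,"PRIB") .LE. 5 THEN GOTO loop
$   WRITE SYS$OUTPUT F$GETJPI(pid,"PRCNAM"), "  ", F$GETJPI(pid,"PRIB")
$   GOTO loop
$ done:
```

WORLD privilege is needed for the F$PID loop to see other users' processes.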
03-06-2012 10:10 AM
OK, well, I'll pass along the info to my manager who wants to lower it... :-(
I do know we see the TCPIP$SSH processes show up in the top I/O quite a bit, as shown below...
OpenVMS Monitor Utility
TOP BUFFERED I/O RATE PROCESSES
on node FACS1
0 250 500 750 1000
+ - - - - + - - - - + - - - - + - - - - +
20C7F815 _FTA9790: 210 aaaaaaaa
20C81181 TCPIP$S_BG56022 111 aaaa
20CB9990 _FTA60: 95 aaa
20C0245C JOB.=g %SYS.SER 56 aa
20C75400 TCPIP$SS_BG5869 53 aa
20C60C8F TCPIP$SS_BG2668 47 a
20C572AA TCPIP$S_BG64387 45 a
20CAD03E TCPIP$S_BG14472 45 a
20C09A45 TCPIP$S_BG13255 42 a
20C81A18 JOB.W= ZUTPERFH 41 a
20C5DA8F TCPIP$S_BG28846 38 a
20C59A94 TCPIP$S_BG36883 37 a
20DD9C7B TCPIP$SS_BG5557 35 a
20CDCABB TCPIP$S_BG14209 33 a
20D9B847 TCPIP$S_BG32482 33 a
+ - - - - + - - - - + - - - - + - - - - +
03-06-2012 12:36 PM
Those server processes are tossing I/O around as part of the user ssh sessions. There'll be I/O shown.
Log in under ssh, start up monitoring, and slam a terminal session with ordinary terminal I/O.
You should see the associated ssh server process pick up its I/O activities, too.
And out of curiosity, why is your manager even looking at this level of detail? I can infer that there is far more here than is in evidence. (This covers both the line-management and staffing-related aspects, and probably also system load and performance. Managers seldom ask these sorts of questions in isolation.)
And assuming that your manager is not placated with the responses here, whack the priority of a few of the sessions. (Managers that approach these problems in the way that can be inferred here might require empirical evidence, after all.)
Dropping the ssh server process priority probably won't cure whatever performance issue(s) you're targeting, though it can potentially adversely affect the terminal I/O activity. In general, on a stable system you don't want a higher-priority interactive ssh client process (one that's had a normal interactive-process priority boost, for instance) getting throttled by a lower-priority server process. That would be somewhere between slow and bad. These sorts of priority inversions can make for bad operations.
03-06-2012 12:47 PM
She is knowledgeable... but I personally hate touching priorities. We are experiencing ODBC timeouts in queries to the server running Caché, hence the thinking that lowering SSH may help. The system for most of the day is at 1200% CPU utilization and moves along quite well. We did a reboot on the 19th after AUTOGEN was run, and the memory allocated to cache was increased.
we are going to try it on the test system.
03-06-2012 01:06 PM
My advice is: don't mess with it. The priorities were chosen for a reason, and unless you're seeing some real symptom you're trying to fix... "Si valet, non est emendandum" (if it works, don't fix it).
If you're curious, here's what's going on. As you're aware, SSH involves encrypted communication.
On OpenVMS when you establish an SSH connection, a non-privileged server process is started, running under the username TCPIP$SSH. It runs the image TCPIP$SSH_SSHD2 to deal with the SSH protocol and perform encryption and decryption of the data stream. It has network channels to and from the remote SSH client, and channels to a pseudo terminal (FTA device), the other side of which is connected to your interactive process.
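You can see that pairing from DCL; the FTA unit number below is hypothetical:

```
$ SHOW TERMINAL              ! from the ssh login: the device is an FTAnnnn: pseudo terminal
$ SHOW DEVICE/FULL FTA9790:  ! shows the process that owns that pseudo terminal
```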
Since encryption tends to be compute intensive, there is a risk that starving the SSH server will have a magnified performance impact on the attached interactive process, or worse, in a single CPU environment, competition for CPU between the SSH server and the interactive process could lead to deadlocks (imagine a runaway process hogging the CPU and not letting the user send a ^Y - sure priority boosting won't let that happen, but it illustrates the kinds of interactions that can occur).
Long experience has shown that in this kind of architecture, the best approach is to give the communications server a higher priority than the interactive process. Remember that the server is self limiting since it's constrained by the flow of data. The running image TCPIP$SSH_SSHD2 is well tested and well understood, so there's little risk of it "abusing" a higher than normal base priority. Research by the TCPIP engineers has determined that priority 8 achieves the objectives without interfering with the rest of the system.
So, now that you understand why there is a priority difference, what kind of problem are you attempting to solve by changing it?
The reality is that reducing the base priority of TCPIP$SSH from 8 to 6 probably won't have much impact, as there probably isn't anything running with base priorities between 4 and 8. Taking it down to, or lower than, 4 might not be such a good idea. On a busy system I'd expect that to cause lumpy response from SSH sessions, or possibly even packet retransmissions if the server starts dropping TCP/IP packets.
Is this supported? Definitely NOT, again, ask yourself what problem you're trying to solve.
If you insist on messing, it looks to be fairly simple to play with:
$ MCR AUTHORIZE MODIFY TCPIP$SSH/PRIORITY=6
This will take immediate effect on subsequent SSH sessions, and can be reversed just as easily. A privileged user can also set the priority of a running server process with SET PROCESS/PRIORITY.
Remember that potential issues will only be seen when the CPU(s) are saturated, so if you want to experiment, make sure you do so with some "ballast" processes soaking up CPU, at least one per CPU on your system.
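A minimal ballast procedure might be nothing more than a compute loop; run one copy per CPU. This is an illustrative sketch (the file name and spawn approach are mine, not anything shipped with the system):

```
$! BALLAST.COM - hypothetical CPU soaker for the experiment.
$! Spawn one copy per CPU, e.g.:  $ SPAWN/NOWAIT @BALLAST
$ SET PROCESS/PRIORITY=4    ! run at a normal user base priority
$ spin:
$   x = 123 * 456           ! busy work; never terminates
$   GOTO spin
```

Remember to delete the subprocesses when you're done measuring.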
Don't complain if you break something!
03-06-2012 01:43 PM
>she is knowledgable...
That's not the question. The question is why a manager is looking at this level of detail; this is more a question of the relative job descriptions of a "manager" and a "system manager". In most organizations, the latter group usually concerns itself with performance tuning and related tasks, while the former concerns itself with the latter group. That a manager is looking at this level of detail implies that there's rather more going on here: that this problem has been escalated to the attention of management, for whatever reason.
>but I personally hate touching priorities.. we are experiencing ODBC timeouts in queries to the server running cache. hence thinking that lowering SSH may help. the system for most of the day is at 1200% cpu utilization and moves along quite well. we did a reboot on the 19th after autogen was run and memory was increased to cache.
Whether 1200% CPU utilization is good or bad depends highly on what those 12 (or more) processors are doing here; whether that's real work, or busy work, and whether any other processors are blocked by this activity.
If you're down to the level of looking at ssh I/O rates, and assuming that the normal tuning sequence (system monitoring, bottleneck identification, changes made and tested) has been followed and that the rest of the low-hanging fruit has been collected, then this Itanium box is just somewhere between loaded and overloaded.
And FWIW, InterSystems Caché reportedly does some rather weird stuff within its process exit handlers around ACID; there have been cases of runaway processes with that package. There have been various reports of this looping over the years; I don't know if it's been resolved.
03-07-2012 04:21 AM
I would have to agree with Hoff and others here. If you are attempting to address performance problems with priority changes, there are other things that should be done. I personally have seen run-away processes appear to be OK, but consume major portions of CPU time and significantly impact overall performance. Sometimes, this can be seen as a complete system hang, others just slow response. More information is required here. I would suggest a more detailed performance evaluation be done. Use Monitor to see the processes consuming the majority of CPU and/or direct I/O. These are the areas most likely to hit bottlenecks. Also, seeing ODBC timeouts can be a result of network issues rather than the machine itself. Check into that as well. Perhaps you are hitting a limit of the network card? (Not likely, but check anyway.)
03-07-2012 06:24 AM
Thanks everyone. With the ODBC timeouts there is nothing from a VMS point of view to suggest that anything is wrong; it performs very well. What we see is that the application/Caché is slower than molasses. I have a Nagios graph of the network to the box and it too isn't showing anything out of the ordinary; I also have T4 running and have not been able to discern any issues. Most of the time these happen Tuesday mornings, which is really weird. Wireshark just shows disconnects.
Yes, I realize Caché isn't perfect and we can have runaway processes.
The system for the most part operates at 1200% during business hours; 12-CPU rx8640.
It's frustrating not to be able to find anything... Yes, we have logged cases with InterSystems and HP for help and they too cannot locate anything.
03-07-2012 06:52 AM
If that's 1200% of mostly-user-mode activity, then you probably need a bigger Integrity, a processor upgrade, or you need to reduce the application load at peak times, or you'll want to redistribute the load across servers.
That's also a cell-based box, so there can also arise issues around locality of memory access; see the RAD features of OpenVMS.
You can also investigate where the CPU time is going within your own tools that are active and consuming CPU.
With packaged applications or without access to the application source code, that investigation of CPU use usually involves the vendor(s).
Given that the box is an Itanium, also take a look around for alignment faults. Alignment faults are very, very expensive on Itanium. A few faults can be ignored, but a fault blizzard will vaporize performance.
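A first look at alignment faults is easy on I64. This is a sketch, assuming a recent release where the MONITOR ALIGN class and the SDA FLT extension are present:

```
$ MONITOR ALIGN      ! system-wide alignment-fault rates
$ ANALYZE/SYSTEM     ! SDA can trace which PCs/images are faulting
SDA> FLT LOAD
SDA> FLT START TRACE
SDA> FLT SHOW TRACE
```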