07-31-2012 08:58 AM
OpenVMS 7.3-2, TCP/IP Services 5.4 ECO 7
The problem is solved (I think - explanation of this caveat forthcoming), but I am posting here for the sake of others having similar issues
Was getting the following error
sftp> open node
Opening connection to node
Disconnected; connection lost (Connection closed.).
Warning: child process (/sys$system/tcpip$ssh_ssh2) exited with code 131.
%TCPIP-E-SSH_FC_ERROR, error in ssh file transfer operation
I then started Googling and reading and Googling and reading
Found the following thread here that indicated the problem was solved by increasing certain sysgen parameters - yet other references say people were getting the problem without any NPAGEDYN memory exhaustion - hmmm. Still others tried that solution and had no success
Looked at memory allocated etc. and could find no evidence of exhaustion at all
When I looked at the tcp logs I noted the following about Quota exceeded - oh no, the dreaded quota exhaustion hunt again :-(
$ set NOverify
Wed 01 00:24:35 WARNING: Starting image in auxiliary server mode.
Wed 01 00:24:35 INFORMATIONAL: OpenVMS$gl_sockfd = 0
Wed 01 00:24:35 INFORMATIONAL: connection from "xx.xx.xx.xx"
Wed 01 00:24:39 NOTICE: User account's local password accepted.
Wed 01 00:24:39 NOTICE: Password authentication for user account accepted.
Wed 01 00:24:39 NOTICE: User account, coming from node, authenticated.
/sys$system/tcpip$ssh_sftp-server2: non-translatable vms error code: 0x224C
%system-f-exlnmquota, logical name table is full
%TCPIP-E-SSH_ERROR, non-specific error condition
TCPIP$SSH job terminated at 1-AUG-2012 00:24:39.18
Logical name table is full? hmmm
Checked the account I ran the scp command from, only to find its parameters are already set very high - can't see an issue there
No npage exhaustion even remotely on the table
Don't know why but to start with I totally missed what I bolded a few paragraphs above - the account having the issue is tcpip$ssh - not my account at all (talk about the bleeding obvious)
The jtquota on this account was set at 1024, probably the default from install?
Upped it to 4096 and voilà - suddenly scp / sftp works
I backed out the change and it started failing again, upped it once again and it started behaving properly
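For anyone wanting to repeat the change, here is a minimal sketch of how the quota can be checked and raised with AUTHORIZE. It is not a transcript from the system above: it assumes the account is named TCPIP$SSH as in the log, that you are logged in to a suitably privileged account, and that 4096 (simply the value that worked here) suits your system. The quota is in the JTquota field of the SHOW output, and a new value should only apply to SSH processes created after the change.

```dcl
$ ! Assumes a privileged account with write access to SYSUAF.
$ ! 4096 is only the value that worked in this post, not a recommendation.
$ SET DEFAULT SYS$SYSTEM
$ RUN AUTHORIZE
UAF> SHOW TCPIP$SSH
UAF> MODIFY TCPIP$SSH /JTQUOTA=4096
UAF> EXIT
```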
The point of this post is to record what I did to solve the problem, in case others out there hit the same thing and find the npage solution doesn't do it for them
my2c worth :-)
07-31-2012 06:32 PM - edited 07-31-2012 06:36 PM
There are lots of things this could depend on. In this day and age, a quota of 1024 (bytes!) on any resource would seem to me to be woefully low and eagerly looking for a place to hit the wall. On the other hand, for a common or garden system, I'd guess that 1024 would be sufficient, but your system may put extra stuff into the job table which pushes you over the limit. (or perhaps SCP stores things like the NCB there?)
On one of my systems, a randomly chosen SSH process has a JOB table using 624 bytes. Just the SYS$LOGIN, SYS$REM* and SYS$SCRATCH logical names.
To see what your processes are doing, first find your processes:
$ SHOW SYSTEM/PROCESS=TCPIP$S*
Possibly more accurately:
$ PIPE SHOW DEVICE/FILE TCPIP$SSH_DEVICE | SEARCH SYS$PIPE TCPIP$SSH
TCPIP$SS_BG3749 2246E890 [TCPIP$SSH]TCPIP$SSH_RUN.LOG;277
TCPIP$S_BG17440 224A92CD [TCPIP$SSH]TCPIP$SSH_RUN.LOG;278
Look for processes with open TCPIP$SSH_RUN.LOG files, as above, and note their PIDs. Now use SDA (started with ANALYZE/SYSTEM) to determine the JIB address from the PID:
SDA> SHOW PROCESS/INDEX=2246E890
Process index: 0090 Name: TCPIP$SS_BG3749 Extended PID: 2246E890
Process status: 00240001 RES,PHDRES,NETWRK
status2: 00000001 QUANTUM_RESCHED
PCB address 82E3C980 JIB address 8374D980
Use the JIB address to display the job logical name table for the SSH process. Add /FULL to show the quota, which is displayed as (remaining,limit).
$ show logical/table=*8374D980*/full
(LNM$JOB_8374D980) [kernel] [shareable] [Quota=(7568,8192)]
"SYS$LOGIN" [exec] = "TCPIP$SSH_DEVICE:[TCPIP$SSH]"
"SYS$LOGIN_DEVICE" [exec] = "TCPIP$SSH_DEVICE:"
"SYS$REM_ID" [exec] = "TCPIP$SSH"
"SYS$REM_NODE" [exec] = "192.168.192.254::" [terminal]
"SYS$REM_NODE_FULLNAME" [exec] = "192.168.192.254::" [terminal]
"SYS$SCRATCH" [exec] = "TCPIP$SSH_DEVICE:[TCPIP$SSH]"
My guess is you'll see some system specific logical names, or perhaps some very long node or device names? In any case, you should be able to see exactly how much logical name table space is consumed so you can decide if your current quota allocation is appropriate.
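As a worked example of reading the quota field, assuming the (remaining,limit) layout described above and using the figures from the sample output, subtracting the remaining bytes from the limit gives the space in use. The symbol names below are purely illustrative.

```dcl
$ ! Figures taken from the sample output: [Quota=(7568,8192)]
$ remaining = 7568
$ limit = 8192
$ used = limit - remaining
$ WRITE SYS$OUTPUT "Job table bytes in use: ", used
```

For the sample process that comes to 624 bytes, matching the usage quoted earlier. The same subtraction against a limit of 1024 shows how little headroom a failing system has.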
Remember, quotas protect you against malicious or accidental consumption of a resource. Unless you've been downloading dodgy versions from some dubious web site, I think you can assume neither applies to SSH, so you needn't be too concerned about over-allocation.
Another means at your disposal for investigating this process is to modify the LOGIN procedure. See LOGIN.COM in the home directory of TCPIP$SSH. It should be empty (or perhaps just a comment). Anything you put in there will write its output to the log file. It will be executed before the SSH code, and needs to exit - though you could put in something that SPAWNs an async subprocess with a timer. Maybe something like this:
$ SHOW LOGICAL/JOB
$ SPAWN/NOWAIT PIPE (WAIT 00:01:00.00 ; SHOW LOGICAL/JOB)
So you'll see the table before SSH runs, then 1 minute later. Output will be in TCPIP$SSH_RUN.LOG