AIX error messages

Some error messages with their possible solutions

NFS: gssd fails to start via SRC

When issuing startsrc, gssd fails to start, and the following gets logged in errpt:

SRC_TRYX (1BA7DF4E) or SRC_RSTRT (CB4A951F)

with the following details:

SYMPTOM CODE
       65280
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'234'
FAILING MODULE
gssd

When looking at the gssd binary:

# ldd /usr/sbin/gssd
/usr/sbin/gssd needs:
...
Cannot find libgssapi_krb5.a(libgssapi_krb5.a.so)
Cannot find libkrb5.a(libkrb5.a.so)

Solution: install the Kerberos (aka NDAF) client fileset krb5.client.rte and the Modern Cryptographic Library filesets modcrypt.base.lib and modcrypt.base.includes. The lib fileset includes the KRB5 kernel extension.
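The missing dependencies can be picked out of the ldd output mechanically. This is only a minimal sketch, assuming the "Cannot find ..." wording shown above; missing_libs is a hypothetical helper name, not an AIX command.

```shell
# Hypothetical helper: reads ldd output on stdin and prints only the
# libraries reported as missing (assumes the "Cannot find <lib>" wording).
missing_libs() {
    sed -n 's/^Cannot find \(.*\)$/\1/p'
}

# Usage on the affected host (not run here; 2>&1 in case the messages
# are written to stderr):
#   ldd /usr/sbin/gssd 2>&1 | missing_libs
```

An empty result after installing the filesets suggests that all dependencies of the binary can be resolved.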

Maximum number of mirror pools reached.

Don't allocate PVs in a certain VG to a fourth mirror pool. Even if you don't copy onto them, mirrorvg / mklvcopy will refuse to mirror LVs.

# extendvg -p mp1new datavg hdisk19
# extendvg -p mp2new datavg hdisk25
# lsvg -P datavg
Physical Volume   Mirror Pool
hdisk4            mp1
hdisk0            mp2
hdisk19           mp1new
hdisk25           mp2new
# mirrorvg -S -c 3 -p copy1=mp1 -p copy2=mp2 -p copy3=mp1new datavg hdisk19
0516-1828 mklvcopy: Maximum number of mirror pools reached.
0516-842 mklvcopy: Unable to make logical partition copies for
        logical volume.
0516-1199 mirrorvg: Failed to create logical partition copies
        for logical volume loglv02.
0516-1200 mirrorvg: Failed to mirror the volume group.

Solution: make sure PVs are only from a maximum of three mirror pools in a given VG at the same time.

In the above example, extend to mp1new first, remove PVs of mp1, add PVs of mp2new and extend mirror to them, and finally remove mp2 copies.
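Before extending a VG, it may help to count how many mirror pools it already has. A minimal sketch, assuming the two-column "lsvg -P" layout with one header line as shown above; count_mirror_pools is a hypothetical helper, and any placeholder value printed for PVs without a pool would be counted as a pool too.

```shell
# Hypothetical helper: reads "lsvg -P <vg>" output on stdin and prints the
# number of distinct names found in the second (mirror pool) column.
# Assumes one header line, then "<hdisk> <pool>" per line.
count_mirror_pools() {
    awk 'NR > 1 && NF >= 2 { print $2 }' | sort -u | wc -l | tr -d ' '
}

# Usage on the affected host (not run here):
#   lsvg -P datavg | count_mirror_pools
```

A result above three matches the situation in which mklvcopy reports 0516-1828.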

mklv: Number of LPs should be between 1 and 32768.

# mklv -y foobar2lv -t jfs2 -x 81920 -p copy1=site1 foobarvg 1
0516-1390 mklv: Number of LPs should be between 1 and 32768.

The message refers to the maximum number of LPs (-x), not to the last parameter (the number of LPs to allocate), as the wording "number of LPs" suggests. Either reduce the -x value to a valid number, or omit the -x option and adjust it on the existing LV afterwards:

# mklv -y foobar2lv -t jfs2 -p copy1=site1 foobarvg 1
foobar2lv
# chlv -x 81920 foobar2lv
0516-1389 chlv: The -x parameter for MaxLPs must be between 1 and 32768.
        It cannot be smaller than the existing number of LP's.
# chlv -x 32768 foobar2lv

installp: The specified device is not a valid device or file.

# installp -ld <path to existing directory with filesets>
installp: The specified device <directory>
        is not a valid device or file.

The .toc file exists in the directory but is not readable for the user executing the installp command (in our case, root trying to install from an NFS share).
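The state of the .toc file can be checked as the user who runs installp. A minimal sketch; check_toc is a hypothetical helper, and on an NFS share the result of the read test may depend on how the server maps the calling user.

```shell
# Hypothetical helper: reports the state of the .toc file in the
# fileset directory given as $1.
check_toc() {
    toc="$1/.toc"
    if [ ! -e "$toc" ]; then
        echo "missing"
    elif [ ! -r "$toc" ]; then
        echo "unreadable"
    else
        echo "ok"
    fi
}

# Usage (not run here):
#   check_toc /path/to/filesets
```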

Audit: failed setting kernel audit objects

Audit does not start and gives a rather obscure error:

# audit start
** failed setting kernel audit objects

In our case, the audit configuration referenced an invalid user: the user name had been changed manually in /etc/passwd for a certain user ID.

Listing user's crontab fails with 'not authorized'

# crontab -l adm
crontab: you are not authorized to use cron.  Sorry.

Solution: /var/adm/cron/cron.allow restricts access. If the target user is not listed there, not even root can list that user's crontab.

NFS: RPC: 1832-019 Program not registered

Trying to mount an NFS v4 share from an AIX client:

mount -o vers=4 srv:/x/y /mnt

fails with:

RPC: 1832-019 Program not registered
Verify the NFS local domain has been set, and the nfsrgyd process is running

Solution: Create the local_domain file.

mkdir /etc/nfs
echo mycompany.tld > /etc/nfs/local_domain
startsrc -s nfsrgyd
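The file creation step can be wrapped so that it is safe to repeat. A minimal sketch; set_nfs_local_domain is a hypothetical helper, and the optional second argument exists only so the function can be tried against a scratch directory instead of /etc/nfs.

```shell
# Hypothetical helper: writes the NFS local domain file.
# $1 = domain name, $2 = configuration directory (defaults to /etc/nfs).
set_nfs_local_domain() {
    domain=$1
    dir=${2:-/etc/nfs}
    if [ -z "$domain" ]; then
        echo "usage: set_nfs_local_domain domain [dir]" >&2
        return 1
    fi
    # mkdir -p does not fail if the directory already exists
    mkdir -p "$dir" && printf '%s\n' "$domain" > "$dir/local_domain"
}

# Usage on the client (not run here):
#   set_nfs_local_domain mycompany.tld && startsrc -s nfsrgyd
```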

NFS: vmount: A file or directory in the path name does not exist

Trying to mount an NFS v4 share from an AIX client after NFS local domain has been set and NFS registry daemon is started:

mount -o vers=4 srv:/x/y /mnt
mount: 1831-008 giving up on:
srv:/x/y
vmount: A file or directory in the path name does not exist.

Solution: Different mount syntax.

mount -o vers=4 -n srv /x/y /mnt

Also make sure the remote path does not contain symlinks! It did not work for me with links.

NFS mount: nfsmnthelp: System call error number -1

Mounting a share generates this error:

nfsmnthelp: 1831-019 <ip address>: System call error number -1.
mount: 1831-008 giving up on:
<ip address>:<mount point>
System call error number -1.

The IP address of the client cannot be resolved from the server. Correct the DNS, or add it to /etc/hosts on the server.

NFS: vmount: Operation not permitted

When mounting a server share which otherwise looks fine via showmount and rpcinfo, and getting this error:

# mount nfsserv:/share /localmnt
mount: 1831-008 giving up on:
nfsserv:/share
vmount: Operation not permitted.

suspect that the problem is with the NFS client using random ports. The following NFS tunable should be adjusted:

# nfso -p -o nfs_use_reserved_ports=1
Setting nfs_use_reserved_ports to 1
Setting nfs_use_reserved_ports to 1 in nextboot file

syslog: fopen on /dev/null failed

Any activity on the system (here on oslevel 5300-12)

syslog: somecommand[somepid]: syslog: fopen on /dev/null failed, errno 23

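Solution: no file can be opened because the system-wide open file limit has been exceeded (errno 23 is ENFILE). Check the file table usage with 'sar -v'. The AIX system-wide limit is reportedly around 3,000,000 open files (an unverified figure from a web search).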

crontab: 0481-124 Cannot create the cron file...

While trying to edit crontabs, with any user including root:

crontab: 0481-124 Cannot create the cron file in the /usr/spool/cron/crontabs directory.

Solution: /tmp FS corruption (needs fsck and/or reboot...)

crontab: you are not authorized to use cron. Sorry.

$ crontab -e
crontab: you are not authorized to use cron.  Sorry.

Check these:

# lsuser -a daemon myuser
myuser daemon=true
# grep -w myuser /usr/lib/cron/cron.allow

If the file exists but the user is not there, add it.
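The grep -w check above also matches a name that is only one word of a longer entry. A stricter variant is sketched below; in_cron_allow is a hypothetical helper that assumes one user name per line in the allow file.

```shell
# Hypothetical helper: succeeds if user $1 is listed on a line of its own
# in the allow file $2. A missing file is reported with exit status 2.
in_cron_allow() {
    [ -f "$2" ] || return 2
    # -x: whole-line match, -F: treat the name as a fixed string
    grep -qxF -- "$1" "$2"
}

# Usage (not run here):
#   in_cron_allow myuser /usr/lib/cron/cron.allow || echo "myuser is not allowed"
```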

arp: Namelist is not valid.

Running 'arp -a' as user:

0821-240 arp: Namelist is not valid.

Reason: a non-root user runs the arp command on a system where fpm is set to high. The setppriv() system call fails.

stty: tcgetattr: A specified file does not support the ioctl system call

Starting/using a (login) shell, you get this error:

stty: tcgetattr: A specified file does not support the ioctl system call.

Solution: check for (unnecessary) stty settings in the profile files (stty erase etc).

scp: protocol error: expected control record

During scp:

protocol error: expected control record

Solution: there is a newline character in a filename.
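Such file names are hard to spot in a normal listing. The sketch below searches a directory tree for them; list_newline_names is a hypothetical helper, and it prints each embedded newline as "?" so that every match stays on one line.

```shell
# Hypothetical helper: lists paths under $1 whose file name contains a
# newline character, printing the newline as "?".
list_newline_names() {
    nl='
'
    find "$1" -name "*${nl}*" -exec sh -c '
        for p in "$@"; do
            printf "%s" "$p" | tr "\n" "?"
            printf "\n"
        done' sh {} +
}

# Usage (not run here):
#   list_newline_names /path/to/copy
```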

kern:crit unix: bpf: lo0 attached

R: the following message is seen on the console

E: kern:crit unix: bpf: lo0 attached

S: prngd is running, not necessary on AIX 5L+

installp: The build date requisite check failed

R: installp -agXYd. foobar

E: installp: The build date requisite check failed for fileset ...

S: 5.3 TL8 issue, see IZ25154 - base level and update in the same dir causes the error

TCP port 9090 conflict/in use

R: start application using port 9090

E: port 9090 is in use

S: 5.3: check for a running WebSM (wsmserver): lssrc -ls inetd

crontab for $user not loaded due to authentication error

E: ! crontab for $user not loaded due to authentication error.

S: User does not exist on localhost (LDAP user)

ssh: No xauth program; cannot forward with spoofing

R: ssh -vvv -X somehost

E: debug1: Remote: No xauth program; cannot forward with spoofing.

S: add "XAuthLocation /usr/bin/X11/xauth" to sshd_config

cu: 0835-028 The connection failed

R: cu -l /dev/tty1

E: cu: 0835-028 The connection failed. NO DEVICES AVAILABLE.

S: Uncomment this line in /etc/uucp/Devices:

Direct tty1 - 9600 direct

mirrorvg: getlvodm: Unable to find device id...

R: mirrorvg rootvg hdisk1

E:

0516-304 getlvodm: Unable to find device id 00000000000000000000000000000000 in the Device Configuration Database.
0516-014 mirrorvg: []: The physical volume appears to belong to another volume group.
0516-1200 mirrorvg: Failed to mirror the volume group.

S: extendvg was forgotten. ;-)

cron: 0481-087 The c queue maximum run limit has been reached

R: cron error message in syslog

E: cron: 0481-087 The c queue maximum run limit has been reached.

S: modify /var/adm/cron/queuedefs (raise the limit on cron event queue, eg 'c.200j20n60w')
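To see what an entry such as 'c.200j20n60w' sets, it can be split into its fields. A minimal sketch, based on my reading of the queuedefs format (j = maximum concurrent jobs, n = nice value, w = seconds to wait before retrying); parse_queuedef is a hypothetical helper and only handles entries in which all three fields are present.

```shell
# Hypothetical helper: splits a queuedefs entry of the form
# <queue>.<jobs>j<nice>n<wait>w into labelled fields.
parse_queuedef() {
    printf '%s\n' "$1" |
        sed -n 's/^\([a-z]\)\.\([0-9]*\)j\([0-9]*\)n\([0-9]*\)w$/queue=\1 jobs=\2 nice=\3 wait=\4/p'
}

# Usage:
#   parse_queuedef c.200j20n60w
```

An entry that does not match the assumed form produces no output.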

vi: ex: 0602-101 Out of memory saving lines for undo

R: vi largefile

E: ex: 0602-101 Out of memory saving lines for undo.

S: From man vi: "1,048,560 lines silently enforced". Set the line limit (-y) to slightly more than twice the number of lines in the file.

For example, with 2000000 lines: vi -y 4100000 largefile

See also: ~/.exrc, linelimit parameter and the EXINIT variable, man vi
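The -y value can be derived from the file itself. A minimal sketch; vi_linelimit is a hypothetical helper, and the margin of 100000 lines is an assumption taken from the example above (2000000 lines giving 4100000).

```shell
# Hypothetical helper: prints a line limit of twice the number of lines
# in file $1 plus an assumed safety margin of 100000.
vi_linelimit() {
    [ -r "$1" ] || return 1
    lines=$(wc -l < "$1" | tr -d ' ')
    echo $(( lines * 2 + 100000 ))
}

# Usage (not run here):
#   vi -y "$(vi_linelimit largefile)" largefile
```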

ls is slow

R: ls -la

E: file listing is slow

S: there are user/group IDs in the listing which cannot be resolved (numeric values appear). chown the affected files to existing users/groups and the listing will be fast again.

Atape: Method error: 0514-022 The specified connection is not valid

R: cfgmgr

E: cfgmgr drops the following error. Tape devices remain in Defined state.

Method error (/etc/methods/cfgAtape -l rmt0 ):
        0514-022 The specified connection is not valid.

S: Atape driver was installed but there was no reboot. To confirm, check install date:

# lslpp -hcq Atape*
/usr/lib/objrepos:Atape.driver:11.1.3.0:COMMIT:COMPLETE:
05/10/11:16;32;59
# who -b
   .        system boot May 09 10:18

Xlib: PuTTY X11 proxy: MIT-MAGIC-COOKIE-1 data did not match

R: putty -X myuser@somehost; su - otheruser; export DISPLAY=localhost:11.0; xterm

E:

Xlib: connection to "localhost:11.0" refused by server
Xlib: PuTTY X11 proxy: MIT-MAGIC-COOKIE-1 data did not match
xterm Xt error: Can't open display: localhost:11.0

S: It depends on how sshd_config is set; the error appears only when 'X11UseLocalhost yes' is set.

1.) X11UseLocalhost yes

After exporting DISPLAY, add the Xauth cookie of the login user, for example:

myuser@somehost $ xauth list
somehost/unix:0  MIT-MAGIC-COOKIE-1  11359df8cb4d7ab9d72b03973a6e823b
myuser@somehost $ echo $DISPLAY
localhost:11.0
myuser@somehost $ su - otheruser
otheruser@somehost $ export DISPLAY=localhost:11.0
otheruser@somehost $ xauth add somehost/unix:0  MIT-MAGIC-COOKIE-1  11359df8cb4d7ab9d72b03973a6e823b
otheruser@somehost $ xterm

2.) X11UseLocalhost no

DISPLAY must be set to the source IP

myuser@somehost $ su - otheruser
otheruser@somehost $ export DISPLAY=<ssh client IP>:0
otheruser@somehost $ xterm

Here is a bit more elaborate permanent solution which can be used in the target user's profile.

This was created for use with sudo, but it should work with su as well. If the su/sudo user has no DISPLAY set, xauth won't work anyway, so make the xauth function exit if setting DISPLAY fails.

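# Take DISPLAY from the environment of the parent (su/sudo) process
sudo_display="$(ps eww $PPID | tr -s " " "\n" | grep '^DISPLAY=')"
[ -n "$sudo_display" ] && export "$sudo_display" || echo "Failed to set DISPLAY"
# The owner of the parent's tty is the user who ran su/sudo
sudo_user="$(ls -l /dev/$(ps -p $PPID -o tty= | xargs) | awk '{print $3}')"
sudo_xauth="$(su - "$sudo_user" -c '/usr/bin/X11/xauth list')"
# Unquoted on purpose: xauth add expects display, protocol and key as separate arguments
[ -n "$sudo_xauth" ] && xauth add $sudo_xauth || echo "Failed to set X authority"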

A file, file system or message queue is no longer available

Running any command on a certain file or files results in a hang. Signals (Ctrl-c, kill) don't work on these processes anymore.

# touch /var/adm/ras/syslog
touch /var/adm/ras/syslog:A file, file system or message queue is no longer available.
touch: 0652-048 Cannot change the modification time on /var/adm/ras/syslog.

Solution: In our case, a single file in /var was affected and a fsck was required.

This was related to a Virtual I/O configuration problem on our host, so suspect I/O problems underneath the filesystem when this error message appears. To confirm that this is the case:

# find /var >/dev/null
find: 0652-019 The status on /var/adm/ras/syslog is not valid.

fsck: File system is currently mounted.

How to fsck /var? On a running system, you cannot fsck because the FS is mounted.

On a default system, only the following daemons use /var, and all of them can be stopped to allow the umount: errdemon, syslogd, sendmail, cron, and the rsct daemons.

Note that (at least) cron and pcmsrv have a 'respawn' entry in inittab - to get rid of them, you must temporarily set these entries to 'off' and remember to restore 'respawn' after the fsck has been run!

# lsitab cron
cron:23456789:respawn:/usr/sbin/cron
# chitab "cron:23456789:off:/usr/sbin/cron"

Stopping cron may cause severe problems on systems where scheduled jobs rely on cron.

Make sure to umount other filesystems mounted over /var as well.

But on a production system, there may be more processes:

# ps -p $(fuser -cux /var 2>/dev/null | tr -s " " ",")

Note that setting "check=true" doesn't help either, probably because the boot-time fsck of /var runs with the -f flag ("don't check if it was unmounted successfully") when the file system is first mounted. This is not obvious from the output:

# chfs -a check=true /var
# shutdown -Fr
...
The current volume is: /dev/hd9var
File system is currently mounted.
...
Inode 83 is linked as: /adm/ras/syslog
Directory inode 64 has an invalid reference to inode 83 in entry syslog (NOT REMOVED)
...

If there is no way to get rid of the processes and unmount the FS, do the following.

Boot to SMS, select your boot disk and Service Mode Boot - Single User Mode. Login as root and stop errdemon, the only process using /var:

# /usr/lib/errstop
# umount /var
# fsck -y /var
# sync
# reboot

This should resolve the filesystem errors.

lsdev: Cannot find information in the predefined device configuration database...

# lsdev
lsdev: 0514-521 Cannot find information in the predefined device
        configuration database for the customized device rcm0.

Device stuck in ODM. Probably the driver fileset is missing.

Save the ODM just to be sure it doesn't get corrupted, then:

# odmdelete -q name=rcm0 -o CuDv
0518-307 odmdelete: 1 objects deleted.

WebSphere MQ "2539 (09EB) (RC2539): MQRC_CHANNEL_CONFIG_ERROR"

This error appears when connecting to a qmgr running on an AIX host.

Install the bos.loc.adt.iconv fileset. It includes source files for conversion tables and the uconvdef command.

An example would be:

# uconvdef -f /usr/lib/nls/loc/uconvTable/IBM-852.ucmap IBM-852
# mv IBM-852 /usr/lib/nls/loc/uconvTable/

See the manual for further details.

mklv: The genminor function failed

# mklv -t jfs2 -y foo rootvg 2
0516-350 lvgenminor: The genminor function failed.
0516-822 mklv: Unable to create logical volume.

Check free space in the / filesystem.

mv: ...: A file or directory in the path name does not exist.

Check maximum file size with ulimit/lsuser. The file size may be larger than the current fsize limit.
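The comparison can be scripted. A minimal sketch; fits_fsize is a hypothetical helper, and it assumes the limit is expressed in 512-byte blocks (as I understand the AIX fsize attribute to be), with "unlimited" or -1 meaning no limit.

```shell
# Hypothetical helper: succeeds if file $1 fits within a file size limit
# of $2 blocks of 512 bytes ("unlimited" or -1 always fits).
fits_fsize() {
    case $2 in
        unlimited|-1) return 0 ;;
    esac
    bytes=$(wc -c < "$1" | tr -d ' ')
    # Round up to whole 512-byte blocks
    blocks=$(( (bytes + 511) / 512 ))
    [ "$blocks" -le "$2" ]
}

# Usage (not run here):
#   fits_fsize bigfile "$(ulimit -f)" || echo "bigfile exceeds the fsize limit"
```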

nimsh fails to start and/or machine cannot be managed from NIM server

A NIM operation on the host shows this error message:

0042-006 c_rsh: (exec_nimsh_cmd) exec_cmd Error 0
poll: setup failure
0505-159 nimadm: WARNING, unexpected result from remote command to <client>.
0505-153 nimadm: Unable to execute remote client commands.

The nimsh service is not running and fails to start. The following details can be seen with errpt:

LABEL:          SRC_RSTRT
IDENTIFIER:     CB4A951F
SYMPTOM CODE
         512
SOFTWARE ERROR CODE
       -9035

Suspect a port conflict. If you have lsof, check nimsh and/or nimaux ports (both TCP and UDP)

# lsof -Pni:3901,3902 | grep LISTEN

Chris Gibson suggests that a modified /etc/services file (affecting the above mentioned ports) may also be the reason.
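The services file can be checked for such modifications. A minimal sketch; check_nim_services is a hypothetical helper, and the mapping of nimsh to port 3901 and nimaux to port 3902 is an assumption based on the ports mentioned above.

```shell
# Hypothetical helper: prints nimsh/nimaux entries of the services file $1
# whose port differs from the assumed defaults (nimsh 3901, nimaux 3902).
# Exit status is non-zero if any such entry is found.
check_nim_services() {
    awk '
        $1 == "nimsh"  && $2 !~ /^3901\// { print "unexpected: " $1 " " $2; bad = 1 }
        $1 == "nimaux" && $2 !~ /^3902\// { print "unexpected: " $1 " " $2; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage (not run here):
#   check_nim_services /etc/services
```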

restore: 0511-126 Cannot open ...: A file or directory in the path name does not exist.

Upon installing filesets:

restore: 0511-126 Cannot open <some fileset>: A file or directory in the path name does not exist.

The .toc file is outdated. Delete it, and optionally run inutoc in the same directory.

0505-139 alt_disk_install: A wake up cannot be performed...

# alt_rootvg_op -W -d hdisk2
Waking up altinst_rootvg volume group ...
0505-139 alt_disk_install: A wake up cannot be performed on volume
group: altinst_rootvg.  This volume group contains the operating system release 7.1
that is later than 6.1, the operating system release of the running system.

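This is an unsupported operation: as the message says, the running system (6.1) cannot wake up a volume group that contains a later operating system release (7.1).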

passwd: 3004-616 User does not exist.

# mkuser myuser
# passwd myuser
Changing password for "myuser"
3004-616 User "myuser" does not exist.
3004-709 Error changing password for "myuser".

Reason: The default user registry on this system is set to use LDAP, then local, therefore passwd tries to change the password in LDAP.

# lssec -f /etc/security/user -s default -a SYSTEM
default SYSTEM="LDAP or compat"

Solution: Create the user with the correct registry:

# mkuser SYSTEM=compat registry=files myuser
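Whether the default registry will get in the way can be checked before creating users. A minimal sketch; first_auth_method is a hypothetical helper that assumes the lssec output format shown above and prints the first word of the SYSTEM value.

```shell
# Hypothetical helper: reads "lssec ... -a SYSTEM" output on stdin and
# prints the first method named in the SYSTEM attribute.
first_auth_method() {
    sed -n 's/.*SYSTEM="*\([^" ]*\).*/\1/p'
}

# Usage (not run here):
#   lssec -f /etc/security/user -s default -a SYSTEM | first_auth_method
```

If this prints LDAP, a plain mkuser followed by passwd is likely to fail as shown.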

POWER7 integrated multifunction card 1GE Ethernet: entstat.elxent command Unable to connect

entstat.elxent: 0909-003 Unable to connect to device ent8, errno = 19

entstat.elxent works only on this type of adapter: Int Multifunction Card w/ Base-TX 10/100/1000 1GbE Adapter (a21910071410d203)

See 'lsdev -Ct a21910070904000'

sudo: parse error in /etc/sudoers near line -1

I admit this isn't really AIX specific. Reproduction: whichever sudo command you want (sudo su -; sudo -l; etc.)

Solution: User is not in sudoers. This is a bug in sudo 1.6.9.

dsh: 2617-011 No hosts in node list

One possible solution: This error comes when trying to use a node list from a file (-N flag).

The DSH context is not set. Use either

# export DSH_CONTEXT=DSH

or

# dsh -C DSH -N /path/to/file ...

Very slow copy over NFS v3 whereas other protocols are OK

Check the NFS mount options. Specifically, we have found that 'noac' (no attribute caching) seems to cause a serious degradation in throughput (by a factor of 25!).

Options we use that are not expected to have negative effects on performance: bg/fg, soft/hard, nointr, rsize=/wsize= (32768), proto, vers= (3, 4), timeo (600), sec= (sys)
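A comma-separated option string can be tested for a single option such as noac. A minimal sketch; has_mount_option is a hypothetical helper, and how the option string is obtained (for example from the mount output or /etc/filesystems) is left to the reader.

```shell
# Hypothetical helper: succeeds if the comma-separated option list $1
# contains the exact option $2.
has_mount_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage:
#   has_mount_option "rw,bg,hard,noac,vers=3" noac && echo "attribute caching is off"
```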

Python: Illegal instruction (SIGILL) core dump when using hashlib

Using python-2.6.7-1 or earlier from perzl.org on AIX 7.1:

>>> import hashlib
>>> m = hashlib.md5()
>>> m.update("text for md5")
Illegal instruction (core dumped)

No solution; probably depends on something set at compile time, however, python-2.6.8-1 from perzl.org works fine.

LVM operations fail on EMC VMAX LUNs

This is our example:

# mkvg -y testvg hdisk0
0516-1254 mkvg: Changing the PVID in the ODM.
0516-1397 mkvg: The physical volume hdisk0, will not be added to
the volume group.
0516-862 mkvg: Unable to create volume group.

Here, hdisk0 is an 'EMC Symmetrix FCP MPIO VRAID' device, type MSYMM_VRAID, the EMC.Symmetrix.* filesets are installed.

WWNs, multipathing etc. are OK, LUN information is available but LVM tools cannot write the VGDAs and assign a PVID to the disk.

Check the detailed LUN parameters with the EMC 'inq' tool and look for unusual values, comparing them with a LUN known to work. In our case, the incorrect parameter was DevNotRdy showing the value 1 (should be 0).

# /usr/lpp/EMC/Symmetrix/bin/inq.aix64_51 -page0 -dev /dev/rhdisk0
Inquiry utility, Version V7.3-1214 (Rev 0.1)      (SIL Version V7.3.0.1 (Edit Level 1214)
Copyright (C) by EMC Corporation, all rights reserved.
For help type inq -h.
(output edited)
-----------------------------------------------------------------------------
BYTE 109:  Vdev   DevNotRdy  VCMState  VCMDevice  GKDevice  MetaDevice  Shared
SA State     0      1          1         0          0         0           1
-----------------------------------------------------------------------------

There was a missing pool binding on the EMC storage.

df shows invalid/negative values

/dev/fs00lv    4184866816 1688992296   60% 18446744073709551387    -1% /fs00

File system corruption. Look for LABEL: J2_TXN_CORRUPT or IDENTIFIER: 4B6BA416 in errpt (at least in our case these came up).
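Lines like the one above can be filtered out of a long df listing. A minimal sketch; df_suspicious is a hypothetical helper that flags any line with a negative number or negative percentage in a column other than the first.

```shell
# Hypothetical helper: reads df output on stdin and prints the lines in
# which any column after the first holds a negative value such as "-1%".
df_suspicious() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^-[0-9]+%?$/) { print; next }
    }'
}

# Usage (not run here):
#   df | df_suspicious
```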