Data integrity issues are created when using Hercules-390 Version 4 (Hyperion) with DOS/360 when DASD file protection support is included in the DOS/360 supervisor.
This seems to be due to the fact that Hercules-390 does not return a channel busy response to a Test Channel instruction. The problem has been seen in the available V4 (Hyperion) build dated October of 2015 and a new build I did based on commits through 3 Feb. Hyperion 3.12, which has the same code for emulating the Test Channel instruction in io.c, does not seem to expose the problem.
DOS/360 DASD File Protection Implementation
DOS/360 implements DASDFP by creating and prepending a CCW string to the caller's disk I/O CCW string. A typical caller CCW string looks like this:
seek: X'07' Caller Long Seek
side: X'31' Search ID Equal (could be search key equal as well)
stic: X'08' TIC *-8 to repeat the search until a match.
...A match returns "status modifier" which results in the
...TIC being skipped; the next CCW executed follows the TIC
wckd: X'1D' Write count key data
The DOS/360 supervisor copies the caller's long seek into a 3-CCW prefix string in the supervisor. This CCW string is:
X'07' Caller's long seek
X'1F' Set file mask appropriate to the operation. Typically seeks are disabled.
X'08' TIC to caller's CCW string at "side:' above.
The DOS/360 supervisor contains one 3-CCW string per channel on which DASDFP is enabled and uses the channel busy condition code from Test Channel to serialize access to those strings. This works fine for selector channels and for byte multiplexer channels operating in burst mode--as they would for disk I/O operations on a byte multiplexer. Block multiplexer channels would appear to complicate this, but they do not.
DOS/VS DASD File Protection Implementation
DOS/VS uses the same logic and 3-CCW prefix string to insert a Set File Mask between the caller's seek and the rest of the caller's channel program. However, DOS/VS allocates a number of 3-CCW prefix strings (25 in a sample supervisor assembly) and maintains a list of free and in-use 3-CCW prefix strings. When an I/O request is made, the DOS/VS supervisor allocates a free 3-CCW prefix to it, alters it for the operation, and leaves it undisturbed until the I/O operation completes.
Channel Operation on Hardware Supported by DOS/360
IBM System/360 models include byte multiplexer and selector channels. Selector channels are busy from the start of an I/O operation through channel end on the last CCW in the started CCW chain. When disk and tape devices are attached to byte multiplexer channels, they operate in burst mode and and the channel remains busy from the start of an I/O operation through channel end on the last CCW.
Two System/370 models were supported by DOS/360: the 145 and 155. Currently available DOS/360 system generation manuals are for release 25 and do not describe the System/370 support, but the models are supported based on a review of the release 26.2 system generation macros.
The 145 included byte multiplexer and selector channels, and the selector channels could be changed to block multiplexer channels. The 155 included only byte and block multiplexer channels. The system guides for each system say that when I/O was started on non-RPS disks (2311, 2314) or then-currently announced tapes (2400 and 3400 series), the block multiplexer channels operated in selector mode. (Sources: for the 145, GC20-1734-0, A Guide to the IBM System/370 Model 145, page 29; for the 155, GC20-1720-0, A Guide to the IBM System/370 Model 155, page 27.) This device specific operation of a block multiplexer channel appears to be independent of the operating system running on the hardware.
Further, both models operated their block multiplexer channels (if equipped) as selector channels unless control register zero bit zero (CR0.0) was set to one. CR0.0 was initialized by the hardware at system reset and IPL to zero. So operating systems that were not control register-aware, like DOS/360, would leave CR0.0 at zero, and all 145/155 channels ran in selector mode.
So DOS/360 saw channel busy from start I/O through channel end on the last CCW in the chain.
Because Hercules does not return channel busy to Test Channel, the DOS/360 supervisor cannot serialize access to the channel's single 3-CCW prefix, and can alter it before the current I/O operation is completed. Because Hercules does I/O asynchronously with dispatch of the emulated CPU, we can expect intermittent problems to arise.
Failure Trace Analysis
The following excerpts were taken from a single trace of a large assembler job, assemble, link, and execute. The job is the BTAM sample program provided by Bill Carlborg, chosen because it reliably failed. I believe it generated the failure because it's a large assembly job that generates significant printout as it generates object code on IJSYSLN; with POWER II, this creates lots of concurrent disk I/O.
An instruction trace is done on the first instruction in the DOS/360 supervisor that alters the 3-CCW prefix, and an CCW trace is done on a single 2314 disk volume mounted on X'192'. Assembler workfiles and IJSYSLN are located there starting at cylinder 70 (track 1400). I have removed the time-stamp, a constant "13:54:23" to save space.
Hercules Hyperion CCW tracing was modified to include the main storage address of the CCW being executed (the "At xxxxxx" in message HHC01315I). This was a minor change affecting three statements in io.c and one line in msgenu.h.
You can find the complete trace here.
Normal Disk I/O Operation with DASD File Protection
Here's how it's supposed to look. R7 contains the address of the application program CCW following the seek CCW, and R4 points to the 3-CCW prefix in the supervisor.
HHC02324I PSW=00040000A0000C5C INST=50740010 ST 7,16(4,0)
HHC02326I R:000017E0:K:06=08181AD0 19601961 070017E8 40000006 ...}.-./...Y ...
HHC02269I GR00=000070AC GR01=000060B8 GR02=00000004 GR03=00003938
HHC02269I GR04=000017D0 GR05=00003801 GR06=30006100 GR07=00006108
HHC02269I GR08=000035F0 GR09=0000025A GR10=000035F0 GR11=00001000
HHC02269I GR12=00002000 GR13=00003000 GR14=00004000 GR15=00000010
HHC00396I 0:0192 start i/o file[0] bufcur 1707 cache[19]
HHC01315I 0:0192 CHAN: At 0017D8 ccw 070060EE 40000006=>00000064 0000 ......
HHC01320I 0:0192 CHAN: start I/O S/370 conversion to asynchronous operation started
HHC01321I 0:0192 CHAN: start I/O S/370 conversion to asynchronous operation successful
HHC01315I 0:0192 CHAN: At 0017D8 ccw 070060EE 40000006=>00000064 0000 ......
HHC00431I 0:0192 CKD file dasd/wrk14a.2314: seeking to cyl 100 head 0
HHC00396I 0:0192 read trk 2000 (asynchronous)
HHC00396I 0:0192 0 rdtrk 2000
HHC00396I 0:0192 0 rdtrk[166] 2000 cache hit buf 03D05ED8:0000640000
HHC01312I 0:0192 CHAN: stat 0C00, count 0000
HHC01315I 0:0192 CHAN: At 0017E0 ccw 1F001960 40000001=>10 .
HHC00442I 0:0192 CKD file dasd/wrk14a.2314: set file mask 10
HHC01312I 0:0192 CHAN: stat 0C00, count 0000
HHC01315I 0:0192 CHAN: At 0017E8 ccw 08006108 19601961
HHC01315I 0:0192 CHAN: At 006110 ccw 310060F0 40000005=>00640000 01 .....
HHC00434I 0:0192 CKD file dasd/wrk14a.2314: read count orientation index
HHC00435I 0:0192 CKD file dasd/wrk14a.2314: cyl 100 head 0 record 0 kl 0 dl 8 of 0
HHC01312I 0:0192 CHAN: stat 0C00, count 0000
HHC01315I 0:0192 CHAN: At 006118 ccw 08006108 00000000
HHC01315I 0:0192 CHAN: At 006110 ccw 310060F0 40000005=>00640000 01 .....
HHC00434I 0:0192 CKD file dasd/wrk14a.2314: read count orientation count
HHC00435I 0:0192 CKD file dasd/wrk14a.2314: cyl 100 head 0 record 1 kl 0 dl 18 of 0
HHC01312I 0:0192 CHAN: stat 4C00, count 0000
HHC01315I 0:0192 CHAN: At 006120 ccw 1D006148 A0000008=>00640000 02000400 ........
HHC00438I 0:0192 CKD file dasd/wrk14a.2314: writing cyl 100 head 0 record 2 kl 0 dl 1024
HHC00396I 0:0192 updt trk 2000 offset 47 length 1032
HHC00396I 0:0192 updt trk 2000 offset 1079 length 8
HHC01315I 0:0192 CHAN: At 006120 ccw 0000BF58 20000400=>00000000 00000100 00000057 0E0C1512 ................
HHC01312I 0:0192 CHAN: stat 0C00, count 0000
HHC00396I 0:0192 end i/o bufcur 2000 cache[166] waiters 0
HHC00396I 0:0192 end i/o bufcur 2000 cache[166] waiters 0 (end of prior I/O)
HHC01316I 0:0192 CHAN: csw 10, stat 0C00, count 0000, ccw 006128
Analysis of a Failure Event
Here's a trace excerpt of a failing event.
HHC02324I PSW=00040000A0000C5C INST=50740010 ST 7,16(4,0)
HHC02326I R:000017E0:K:06=08006108 19601961 070017E8 40000006 ../..-./...Y ...
HHC02269I GR00=0000CBEC GR01=FF006150 GR02=00000004 GR03=00003938
HHC02269I GR04=000017D0 GR05=00003800 GR06=10006198 GR07=000061A0
HHC02269I GR08=000035F0 GR09=000001AC GR10=000035E0 GR11=00001000
HHC02269I GR12=00002000 GR13=00003000 GR14=00004000 GR15=00000010
I/O started. First CCW is in the supervisor and points to a seek address in the BG partition. The next two CCW's are the SFM and a TIC, both in the supervisor. The TIC should point to a SIDE CCW in the BG partition.
HHC00396I 0:0192 start i/o file[0] bufcur 2000 cache[166]
HHC01315I 0:0192 CHAN: At 0017D8 ccw 07006186 40000006=>00000073 0000 ......
HHC01320I 0:0192 CHAN: start I/O S/370 conversion to asynchronous operation started
HHC01321I 0:0192 CHAN: start I/O S/370 conversion to asynchronous operation successful
HHC01315I 0:0192 CHAN: At 0017D8 ccw 07006186 40000006=>00000073 0000 ......
I/O converted to asynchronous and the DOS/360 guest is dispatched.
Before the SFM and TIC are done, DOS updates the TIC for an I/O operation requested, based on the I/O address, from POWER running in the F1 partition.
HHC02324I PSW=00040000A0000C5C INST=50740010 ST 7,16(4,0) store
HHC02326I R:000017E0:K:06=080061A0 19601961 070017E8 40000006 ../..-./...Y ...
HHC02269I GR00=00000006 GR01=00181AE8 GR02=00000004 GR03=00003930
HHC02269I GR04=000017D0 GR05=00003801 GR06=10181AC8 GR07=00181AD0
HHC02269I GR08=00003610 GR09=0000025A GR10=00003610 GR11=00001000
HHC02269I GR12=00002000 GR13=00003000 GR14=00004000 GR15=00000030
(The BG partition ends at 0x019B33, F1 starts at 0x180800, and an idle F2 occupies the memory between.) The I/O request from the F1 partition is to a disk volume on X'191', which was not traced. Because channel busy was not returned by Test Channel, DOS/360 assumes that the CCW string is available and prepares it for the next I/O.
This store instruction breaks the I/O operation requested by BG.
The I/O trace continues with entries for the on-going BG I/O operation.
HHC00431I 0:0192 CKD file dasd/wrk14a.2314: seeking to cyl 115 head 0
HHC00396I 0:0192 read trk 2300 (asynchronous)
HHC00396I 0:0192 0 rdtrk 2300
HHC00396I 0:0192 0 rdtrk[154] 2300 cache hit buf 0245C538:0000730000
HHC01312I 0:0192 CHAN: stat 0C00, count 0000
HHC01315I 0:0192 CHAN: At 0017E0 ccw 1F001960 40000001=>10 .
HHC00442I 0:0192 CKD file dasd/wrk14a.2314: set file mask 10
HHC01312I 0:0192 CHAN: stat 0C00, count 0000
We've reached the TIC in the supervisor. Note that it continues the BG-requested I/O operation by TIC'ing to a CCW string in the F1 partition. Note also that the Search CCW points to a count field that is not the same as the above seek.
HHC01315I 0:0192 CHAN: At 0017E8 ccw 08181AD0 19601961
HHC01315I 0:0192 CHAN: At 181AD8 ccw 31181B00 40000005=>00010006 03 .....
HHC00434I 0:0192 CKD file dasd/wrk14a.2314: read count orientation index
Ultimately, the above search fails No Record Found because there are no count fields CCCCHHHHRR on X'192' Cylinder 115 head 0 that have as their content 00010006xx. They should all start 00730000xx
Another alteration of the Channel 1 DASDFP 3-CCW prefix, also from an I/O requested by the F1 partition. The first I/O from BG is not complete, and there have been two alterations to the 3-CCW prefix.
HHC02324I PSW=00040000A0000C5C INST=50740010 ST 7,16(4,0) store
HHC02326I R:000017E0:K:06=08181AD0 19601961 070017E8 40000006 ...}.-./...Y ...
HHC02269I GR00=00000006 GR01=00181AE8 GR02=00000004 GR03=00003930
HHC02269I GR04=000017D0 GR05=00003801 GR06=30181AC8 GR07=00181AD0
HHC02269I GR08=00003610 GR09=0000025A GR10=00003610 GR11=00001000
HHC02269I GR12=00002000 GR13=00003000 GR14=00004000 GR15=00000030
Back to the original I/O operation requested by BG. Because the current SIDE count field has nothing to do with the original seek, the search is doomed and the write will never be executed.
HHC00435I 0:0192 CKD file dasd/wrk14a.2314: cyl 115 head 0 record 0 kl 0 dl 8 of 0
HHC01312I 0:0192 CHAN: stat 0C00, count 0000
HHC01315I 0:0192 CHAN: At 181AE0 ccw 08181AD0 20000001
HHC01315I 0:0192 CHAN: At 181AD8 ccw 31181B00 40000005=>00010006 04
HHC00434I 0:0192 CKD file dasd/wrk14a.2314: read count orientation count
HHC00435I 0:0192 CKD file dasd/wrk14a.2314: cyl 115 head 0 record 1 kl 0 dl 18 of 0
HHC02324I PSW=00040000A0000C5C INST=50740010 ST 7,16(4,0) store
HHC02326I R:000017E0:K:06=08181AD0 19601961 070017E8 40000006 ...}.-./...Y ...
HHC02269I GR00=00000006 GR01=00181AE8 GR02=00000004 GR03=00003930
HHC02269I GR04=000017D0 GR05=00003801 GR06=30181AC8 GR07=00181AD0
HHC02269I GR08=00003610 GR09=0000025A GR10=00003610 GR11=00001000
HHC02269I GR12=00002000 GR13=00003000 GR14=00004000 GR15=00000030
HHC01312I 0:0192 CHAN: stat 0C00, count 0000
HHC01315I 0:0192 CHAN: At 181AE0 ccw 08181AD0 20000001
HHC01315I 0:0192 CHAN: At 181AD8 ccw 31181B00 40000005=>00010006 05 .....