When it comes to attaching storage to computers, ATA (AT attachment) is the most prevalent method. The characters AT, which stand for "advanced technology," come from the name of the first 80286-based IBM PC. ATA drives are found mostly on desktop and laptop systems. Higher-end systems, often called "servers," utilize a connection technique based on the SCSI (Small Computer System Interface) parallel bus architecture. These systems may have several such SCSI buses attached to them. The more SCSI buses that can be effectively connected to a system, the higher the data input/output (I/O) capability of that system.
A SCSI bus permits hard disks, tape drives, tape libraries, printers, scanners, CD-ROMs, DVDs, and the like to be connected to server systems. It can be considered a general interconnection technique that permits devices of many different types to interoperate with computer systems.
The protocol used on the SCSI bus is the SCSI protocol. It defines how a SCSI device can be addressed, commanded to perform some operation, and directed to transfer data to or from the (host) computing system. The operational commands are defined by a data structure called a command descriptor block (CDB). For example, a read command would have a CDB containing an "opcode" defined by the protocol to mean "read." It would also contain information about where to get the data (e.g., the block location on the disk) and miscellaneous flags that further define the operation.
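To make the CDB concrete, here is a minimal Python sketch of how a standard READ(10) CDB is laid out. It is illustrative only: a real initiator hands the CDB to a SCSI driver or host bus adapter rather than building it by hand.

```python
import struct

def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Pack a 10-byte READ(10) CDB.

    Byte 0 carries the opcode (0x28 means READ(10)); bytes 2-5 carry the
    logical block address and bytes 7-8 the transfer length, both
    big-endian. The remaining bytes hold flags and control (zero here).
    """
    return struct.pack(
        ">BBIBHB",
        0x28,        # opcode: READ(10)
        0x00,        # flags (DPO/FUA, etc.), unused in this sketch
        lba,         # 32-bit starting logical block address
        0x00,        # group number
        num_blocks,  # 16-bit transfer length, in blocks
        0x00,        # control byte
    )

cdb = build_read10_cdb(lba=2048, num_blocks=8)  # read 8 blocks starting at block 2048
assert len(cdb) == 10 and cdb[0] == 0x28
```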
The protocol that defines how a SCSI bus is operated also defines how to address the various units to which the CDB will be delivered. Generally, addressing is performed by presenting the address on the hardware lines of the SCSI bus. This addressing technique calls out a particular SCSI device, which may then be subdivided into one or more logical units (LUs). An LU is an abstract concept that can represent various real objects such as disks, tapes, printers, and scanners.
Each LU is given an address in the form of a simple number called the logical unit number (LUN). Thus, the SCSI protocol handles the addressing of both the SCSI device and the LU. (Note: "LUN," though technically incorrect, will often be used when "LU" is meant.) Servers may connect to many SCSI buses; in turn, each SCSI bus can connect to a number of SCSI devices, and each SCSI device can contain a number of LUs (8, 16, 32, etc.). Therefore, the total number of SCSI entities (LUs) attached to a system can be very large.
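A small sketch of the scale this addressing hierarchy allows (the per-host and per-device counts below are illustrative assumptions, not limits from the standard; the 15-devices-per-bus figure reflects a wide SCSI bus, where one of the 16 IDs is taken by the host adapter):

```python
# Hosts see buses; each bus addresses SCSI devices (targets);
# each device exposes some number of logical units (LUs).
buses_per_host = 4     # assumed: a server with four SCSI host adapters
devices_per_bus = 15   # wide SCSI: 16 IDs, one used by the adapter itself
lus_per_device = 8     # devices commonly expose 8, 16, 32, ... LUs

total_lus = buses_per_host * devices_per_bus * lus_per_device
print(f"Addressable LUs: {total_lus}")  # 4 * 15 * 8 = 480
```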
Issues with SCSI
Taking storage from one system's SCSI bus and moving it to another system's SCSI bus can be a major, disruptive undertaking, often requiring a reboot of the systems involved. Users want a pool of storage that can be assigned to servers in a non-disruptive manner as needs require.
Another issue with the SCSI bus is that it has distance limitations varying from 1.5 to 25 meters, depending on the bus type (yes, there are multiple types). The bus type has to be matched with the requirements of the host and the SCSI (storage) devices (often called storage controllers), which seriously limits the amount of pooling a SCSI bus can provide.
Further, many SCSI bus storage devices can have no more than one bus connected to them, and unless high-end storage devices are used, one generally has at most two SCSI bus connections per storage device. In that case, at most two different host systems can share the various LUs within a given storage device.
Often the critical host systems want a primary and a secondary connection to the storage devices so that they have an alternate path in case of connection or bus failure. This creates additional problems for systems that want alternate paths to the storage while, at the same time, sharing the storage controllers with other hosts (which might be part of a fail-over-capable cluster).
Often an installation requires a cluster made up of more than two hosts, sharing data via a shared file system (e.g., Veritas Clustered File System) or a shared database system (e.g., Oracle Cluster Database). Often this is not possible without the expense of a mainframe/enterprise-class storage controller, which usually permits many SCSI bus connections but brings the installation into a whole new price range.
Understanding the problems with SCSI led a number of vendors to create a new interconnection type known as Fibre Channel. In this technology the SCSI CDBs are created in the host system, as they were in SCSI bus systems; however, the SCSI bus is replaced with a physical "fibre channel" connection and a logical connection to the target storage controller.
The term "logical connection" is used because Fibre Channel (FC) components can be interconnected via hubs and switches. These interconnections make up a network and thus have many of the characteristics found in any network. The FC network is referred to as an FC storage area network (SAN).
Issues with Fibre Channel
Unlike an Internet Protocol (IP) network, Fibre Channel lacks basic management capability. This is being rectified, but the administrator of an IP network cannot now, and probably never will be able to, use the same network management tools on an FC network that are used on an IP network. This means duplicate training costs for the FC network administrator and the IP network administrator, in addition to the costs associated with the actual storage management duties of the storage administrator.
The total cost of ownership (TCO) with Fibre Channel is very high compared to that with IP networks. This applies not only to the price of FC components, which are significantly more expensive than corresponding IP components, but also to operation and maintenance. The cost of training personnel internally or hiring a service company to operate and maintain the FC network is a significant addition to the TCO.
The iSCSI (Internet SCSI) protocol was created to reduce the TCO of shared storage solutions by reducing the initial outlay for networking, training, and fabric management software. To this end a working group was established within the IETF (Internet Engineering Task Force).
iSCSI has the capability to tie together a company's systems and storage, which may be spread across a campus-wide environment, using the company's interconnected local area networks (LANs), also known as intranets. This applies not only to the company's collection of servers but also to their desktop and laptop systems.
Desktops and laptops can operate with iSCSI on a normal 100-megabit-per-second (Mb/s) Ethernet link in a manner that is often better than "sawing across"[*] their own single-disk systems. Additionally, many desktop systems can exploit new "gigabit copper" connections such as 10/100/1000BaseT Ethernet links. The existing wiring infrastructure in most companies is Category 5 (Cat. 5) Ethernet cable, and the new 1000BaseT network interface cards (NICs) are able to support gigabit speeds on those existing Cat. 5 cables. It is expected that customers will, over time, replace or upgrade their desktop systems to include 1000BaseT NICs. In this environment, if the desktops can operate effectively at even 300 Mb/s, the customer will generally see better response than is possible today with normal desktop ATA drives, without having to operate at full gigabit speeds.
Data suggest that 500MHz Pentium systems can operate the normal host TCP/IP (Transmission Control Protocol over Internet Protocol) stacks at 100 Mb/s using less than 10% of CPU resources. These resources will hardly be missed if the I/O arrives in a timely manner. Likewise we can expect the desktop systems shipping in the coming year and beyond to be on the order of 1.5 to 3 GHz. This means that, for 30 megabyte-per-second (MB/s) I/O requirements (approximately 300 Mb/s), desktop systems will use about the same, or less, processor time as they previously consumed on 500MHz desktop systems using 100Mb/s links (less than 10%). Most users would be very happy if their desktops could sustain an I/O rate of 30 MB/s. (Currently desktops average less than 10 MB/s.)
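A back-of-the-envelope check of these figures may help. This is a rough sketch; the allowance of 10 link bits per payload byte for TCP/IP and Ethernet overhead is an assumption, not a measured value.

```python
# 30 MB/s of payload at roughly 10 link bits per payload byte
# (8 data bits plus TCP/IP and Ethernet framing overhead) needs ~300 Mb/s.
payload_mb_per_s = 30                 # target desktop I/O rate, in MB/s
bits_per_byte_on_wire = 10            # rough allowance for protocol overhead
link_mbit_per_s = payload_mb_per_s * bits_per_byte_on_wire
print(link_mbit_per_s)                # 300 (Mb/s)

# If a 500-MHz CPU spends <10% of its cycles driving 100 Mb/s, then a
# 1.5-GHz CPU has 3x the cycles for 3x the traffic: about the same share.
cycles_scale = 1500 / 500             # CPU speedup
traffic_scale = 300 / 100             # traffic increase
print(cycles_scale >= traffic_scale)  # True: the <10% estimate still holds
```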
The important point here is that iSCSI for desktops and laptops makes sense even if no special hardware is dedicated to its use. This is a significant plus for iSCSI versus Fibre Channel, since Fibre Channel requires special hardware and is therefore unlikely to be deployed on desktop and laptop systems.
Even though iSCSI HBAs and chips will be able to operate at link speed, it is expected that their latency will be slightly higher than that of Fibre Channel. This difference is considered to be less than 10 microseconds, which is negligible compared to the time for I/O processing: a typical disk I/O takes several milliseconds, so an extra 10 microseconds adds well under 1% to the total. iSCSI's greater latency is caused by the greater amount of processing that must be done within the iSCSI chip to support TCP. Thus, there is some impact from the additional work needed, even when it is supported by a chip. A key future vendor value-add will be how well a chip is able to parallelize its processing and thus reduce the latency. This is not to say that the latency of iSCSI chips will be unacceptable. In fact, it is believed that it will be small enough not to be noticeable in most normal operations.
Another important capability of iSCSI is that it will be able to send I/O commands across the Internet or a customer's dedicated wide area networks (WANs). This will be significant for applications that require tape.
An odd thing about tape is that almost everyone wants to be able to use it (usually for backup), but almost no one wants the tape library nearby. iSCSI provides interconnection to tape libraries at a great distance from the hosts that write data to them. This permits customers to place their tape libraries in secure backup centers, such as "Iron Mountain." A number of people have said that this "at-distance" tape backup will be iSCSI's killer app.
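To show what "sending SCSI commands over an IP network" means in practice, here is a minimal sketch of the 48-byte header of an iSCSI SCSI Command PDU as RFC 3720 defines it. This is illustrative only: a real initiator must first complete the iSCSI login negotiation (on TCP port 3260) before sending any command, and details such as digests and sequence-number rules are omitted.

```python
import struct

def scsi_command_pdu_header(lun: int, cdb: bytes, read_len: int) -> bytes:
    """Pack the 48-byte Basic Header Segment of an iSCSI SCSI Command PDU."""
    return struct.pack(
        ">BB2xB3s8sIIII16s",
        0x01,                          # opcode: SCSI Command
        0x80 | 0x40,                   # flags: Final + Read (data flows to host)
        0,                             # TotalAHSLength: no additional headers
        b"\x00\x00\x00",               # DataSegmentLength: no immediate data
        struct.pack(">Q", lun << 48),  # 8-byte LUN field, peripheral addressing
        1,                             # Initiator Task Tag (illustrative value)
        read_len,                      # Expected Data Transfer Length, in bytes
        1,                             # CmdSN (illustrative value)
        1,                             # ExpStatSN (illustrative value)
        cdb.ljust(16, b"\x00"),        # the SCSI CDB, padded to its 16-byte field
    )

# READ(10) of 8 blocks at logical block 2048 (the CDB from the earlier sketch):
read10 = b"\x28\x00\x00\x00\x08\x00\x00\x00\x08\x00"
header = scsi_command_pdu_header(lun=0, cdb=read10, read_len=8 * 512)
assert len(header) == 48  # ready to be written to a TCP socket after login
```

The key point the sketch makes is that the CDB itself is unchanged from the SCSI bus case; iSCSI simply wraps it in a header suitable for delivery over any TCP/IP connection, local or across a WAN.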
The bottom line is that iSCSI gives customers the type of interconnect to storage that they have been requesting: a network-connected storage configuration made up of components that can be bought from many different places, whose purchase price is low, and whose operation is familiar to many people (especially computer science graduates). Customers also get a network they can configure and operate via standard network management tools, thereby keeping the TCO low. They do not have to invest in a totally new wiring installation, and they appreciate the fact that they can use Cat. 5 cable, which is already installed. They like the way that iSCSI can seamlessly operate not only from server to local storage devices but also across campuses and remotely via WANs.
These customers can use iSCSI to interconnect remote sites, which permits mirrored backup and recovery capability, as well as a remote connection to their tape libraries. On top of all that, iSCSI will be operating on low-end systems and on high-end systems with performance as good as what FC networks can provide. If that is not enough, it also comes with built-in Internet Protocol security (IPsec), which the customer can enable whenever using unsecured networks.
Hard Drive
There are two main hard drive types available today:
ATA (used in desktop and laptop systems)
SCSI (used in server-class systems)
SCSI
SCSI drives are connected to a host via a SCSI bus and use the SCSI protocol.
The SCSI command descriptor block (CDB) is a key element of the SCSI protocol.
The real or logical disk drive that the host talks to is a logical unit (LU).
The SCSI protocol gives each addressable LU a number, or LUN.
SCSI bus distance limitations vary from 1.5 to 25 meters, depending on the bus type required by the host and drives.
Non-enterprise storage controllers usually have only one or two SCSI bus connections.
Enterprise storage controllers usually have more than two SCSI bus connections.
Clustering servers without using enterprise-class storage systems is often difficult (especially if each host wants to have more than one connection to a storage controller).
Fibre Channel
Fibre Channel (FC) connections solve many interconnection problems, but bring their own management problems.
Fibre Channel requires its own fabric management software and cannot use the standard IP network management tools.
Fibre Channel needs software to manage the shared storage pool to prevent systems from stepping on each other.
FC networks are generally considered to have a high TCO (total cost of ownership).
FC HBAs, chips, and switches are generally considered to be expensive (especially when compared to IP network NICs and switches).
Personnel trained in Fibre Channel are scarce, and companies often poach trained employees from each other.
Universities in general are not teaching Fibre Channel.
iSCSI
iSCSI offers the same interconnectivity and the same pooled-storage approach that Fibre Channel does, but over the more familiar IP network infrastructure.
iSCSI offers the same fabric management capabilities that normal IP networks do.
iSCSI can use much of the storage management software that was developed for Fibre Channel.
iSCSI can utilize IP-trained personnel to manage the iSCSI-based SAN.
iSCSI is believed to offer the promise of lower TCO.
iSCSI will work not only with server systems but also with desktop and laptop systems via currently installed Cat. 5 cables and 10/100BaseT as well as 10/100/1000BaseT NICs.
Desktop and laptop users will probably be very satisfied even if their systems utilize only 300 Mb/s of the 1000-Mb/s capacity of Cat. 5 cable.
The prices of iSCSI HBAs are currently significantly less than those of FC HBAs.
FC prices won't fall significantly until iSCSI becomes a threat at the high end of the market.
iSCSI can be expected to enable "at-distance" computing, where storage is located beyond the local environment.
Tape backup is likely to become the killer app for iSCSI in the future.
File servers (NAS devices) can be located on IP networks; therefore, the fabric management that fits NAS fits iSCSI, and vice versa.
NAS and iSCSI storage can be located on the same network; though they have overlapping capabilities, they also each have capabilities that the other does not.