Measurements of a Distributed File System

Abstract:

We analyzed the user-level file access patterns and caching behavior of the Sprite distributed file system. The first part of our analysis repeated a study done in 1985 of the BSD UNIX file system. We found that file throughput has increased by a factor of 20 to an average of 8 Kbytes per second per active user over 10-minute intervals, and that the use of process migration for load sharing increased burst rates by another factor of six. Also, many more very large (multi-megabyte) files are in use today than in 1985. The second part of our analysis measured the behavior of Sprite’s main-memory file caches. Client-level caches average about 7 Mbytes in size (about one-quarter to one-third of main memory) and filter out about 50% of the traffic between clients and servers. Paging accounts for 35% of the remaining server traffic, even on workstations with large memories. We found that client cache consistency is needed to prevent stale data errors, but that it is not invoked often enough to degrade overall system performance.

Introduction:

In 1985 a group of researchers at the University of California at Berkeley performed a trace-driven analysis of the UNIX 4.2 BSD file system [11]. That study, which we call “the BSD study,” showed that average file access rates were only a few hundred bytes per second per user for engineering and office applications, and that many files had lifetimes of only a few seconds. It also reinforced commonly-held beliefs that file accesses tend to be sequential, and that most file accesses are to short files but the majority of bytes transferred belong to long files. Lastly, it used simulations to predict that main-memory file caches of a few megabytes could substantially reduce disk I/O (and server traffic in a networked environment). The results of this study have been used to justify several network file system designs over the last six years.

In this paper we repeat the analysis of the BSD study and report additional measurements of file caching in a distributed file system. Two factors motivated us to make the new measurements. First, computing environments have changed dramatically over the last six years, from relatively slow time-shared machines (VAX-11/780s in the BSD study) to today’s much faster personal workstations. Second, several network-oriented operating systems and file systems have been developed during the last decade, e.g., AFS [4], Amoeba [7], Echo [3], Locus [14], NFS [16], Sprite [9], and V [1]; they provide transparent network file systems and, in some cases, the ability for a single user to harness many workstations to work on a single task. Given these changes in computers and the way they are used, we hoped to learn how file system access patterns have changed, and what the important factors are in designing file systems for the future.

We made our measurements on a collection of about forty 10-MIPS workstations, all running the Sprite operating system [9, 12]. Four of the workstations served as file servers, and the rest were diskless clients. Our results are presented in two groups. The first group of results parallels the analysis of the BSD study. We found that file throughput per user has increased substantially (by at least a factor of 20) and has also become more bursty. Our measurements agree with the BSD study that the vast majority of file accesses are to small files; however, large files have become an order of magnitude larger, so that they account for an increasing fraction of bytes transferred. Many of the changes in our measurements can be explained by these large files. In most other respects our measurements match those of the BSD study: file accesses are largely sequential, files are typically open for only a fraction of a second, and file lifetimes are short.

Our second set of results analyzes the main-memory file caches in the Sprite system. Sprite’s file caches change size dynamically in response to the needs of the file and virtual memory systems; we found that client cache sizes varied substantially over time, averaging about 7 Mbytes out of an average of 24 Mbytes of main memory. About 60% of all data bytes read by applications are retrieved from client caches without contacting file servers. Sprite’s 30-second delayed-write policy allows about 10% of newly-written bytes to be deleted or overwritten without being written back from the client cache to the server.
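As a concrete illustration of the delayed-write policy, the C sketch below shows one way such a cache might be structured. It is our own simplification, not Sprite source: the block layout, the server_write RPC, and the flusher interface are all assumed for the example. A background daemon writes back only those dirty blocks that have aged past 30 seconds, so blocks that are deleted or overwritten within the window never generate server traffic.

    /* Hypothetical 30-second delayed-write cache (a sketch, not Sprite code). */
    #include <stdbool.h>
    #include <stddef.h>
    #include <time.h>

    #define BLOCK_SIZE    4096
    #define CACHE_BLOCKS  1024
    #define WRITEBACK_AGE 30          /* seconds before a dirty block is flushed */

    struct cache_block {
        int    file_id;               /* which file this block belongs to */
        long   offset;                /* byte offset of the block within the file */
        bool   valid;
        bool   dirty;
        time_t dirty_time;            /* when the block was first dirtied */
        char   data[BLOCK_SIZE];
    };

    static struct cache_block cache[CACHE_BLOCKS];

    /* Assumed RPC to the file server; the real interface is more involved. */
    extern void server_write(int file_id, long offset, const char *data, size_t len);

    /* A write dirties a block in the cache but sends nothing to the server. */
    void cache_write(struct cache_block *b, const char *src, size_t len)
    {
        for (size_t i = 0; i < len && i < BLOCK_SIZE; i++)
            b->data[i] = src[i];
        if (!b->dirty) {
            b->dirty = true;
            b->dirty_time = time(NULL);   /* age from the first dirtying write */
        }
    }

    /* Deleting a file just invalidates its blocks: dirty data that dies
     * within the 30-second window generates no server traffic at all. */
    void cache_delete_file(int file_id)
    {
        for (int i = 0; i < CACHE_BLOCKS; i++)
            if (cache[i].valid && cache[i].file_id == file_id)
                cache[i].valid = cache[i].dirty = false;
    }

    /* Run periodically (e.g., every few seconds) by a background daemon. */
    void cache_flush_old_blocks(void)
    {
        time_t now = time(NULL);
        for (int i = 0; i < CACHE_BLOCKS; i++) {
            struct cache_block *b = &cache[i];
            if (b->valid && b->dirty && now - b->dirty_time >= WRITEBACK_AGE) {
                server_write(b->file_id, b->offset, b->data, BLOCK_SIZE);
                b->dirty = false;
            }
        }
    }

The 30-second window trades a bounded amount of data lost after a crash for the chance that short-lived bytes die in the cache before ever reaching the server.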

Sprite guarantees the consistency of data cached on different clients. We found that many users would be affected if Sprite’s consistency guarantees were weaker, but write-sharing occurs infrequently enough that the overheads of implementing consistency have little impact on average system performance. We compared Sprite’s consistency implementation with two other approaches and found that even the best approach, a token-based mechanism, does not significantly reduce the consistency overheads for our workload.
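To make the consistency discussion concrete, the simplified C sketch below captures the flavor of open-time consistency checking, in which the server tracks opens and disables client caching when a file becomes write-shared. This is our own reconstruction under assumed RPC interfaces (client_flush_dirty, client_disable_cache), not Sprite’s actual implementation.

    /* Simplified open-time consistency check (our reconstruction, not Sprite
     * source): the server tracks which clients have each file open and
     * disables client caching when an open would create write-sharing. */
    #include <stdbool.h>

    #define MAX_CLIENTS 64

    struct file_state {
        bool open_for_read[MAX_CLIENTS];
        bool open_for_write[MAX_CLIENTS];
        int  last_writer;                /* client holding dirty data, or -1 */
        bool cacheable;                  /* false while the file is write-shared */
    };

    /* Assumed server-to-client RPCs; the real interfaces differ. */
    extern void client_flush_dirty(int client, struct file_state *f);
    extern void client_disable_cache(int client, struct file_state *f);

    void server_handle_open(struct file_state *f, int client, bool writing)
    {
        /* If another client holds dirty cached data, recall it first so
         * this open observes the most recently written bytes. */
        if (f->last_writer >= 0 && f->last_writer != client)
            client_flush_dirty(f->last_writer, f);

        /* Detect concurrent write-sharing: a writer plus any other user. */
        bool shared = false;
        for (int c = 0; c < MAX_CLIENTS; c++) {
            if (c == client)
                continue;
            if (f->open_for_write[c] || (writing && f->open_for_read[c]))
                shared = true;
        }

        if (shared && f->cacheable) {
            /* Expensive but rare path: turn off caching on every client
             * using the file, so all reads and writes go through the
             * server, which keeps them consistent. */
            f->cacheable = false;
            for (int c = 0; c < MAX_CLIENTS; c++)
                if (f->open_for_read[c] || f->open_for_write[c])
                    client_disable_cache(c, f);
        }
        if (!f->cacheable)
            client_disable_cache(client, f);  /* the new opener as well */

        if (writing) {
            f->open_for_write[client] = true;
            f->last_writer = client;
        } else {
            f->open_for_read[client] = true;
        }
    }

Because write-sharing is rare in our workload, the expensive path (disabling caches) is almost never taken, which is why the consistency overheads remain low in practice.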

Sprite allows users to take advantage of many workstations simultaneously by migrating processes onto idle machines. Process migration increased the burst rates of file throughput by a factor of six relative to overall file throughput. Fortunately, we found that process migration does not reduce the effectiveness of file caches. Migrated processes actually had higher cache hit ratios than non-migrated processes, and process migration also had little impact on the cache consistency mechanism.
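As a hypothetical illustration of why migration raises burst rates, the sketch below shows a dispatcher, in the style of a parallel make, that farms each job out to an idle host. The helpers host_is_idle and migrate_and_run are assumed for the example, and Sprite’s real migration mechanism is considerably more elaborate; the point is only that a single user can briefly drive file traffic from dozens of workstations at once.

    /* Hypothetical load-sharing dispatcher (our illustration only).  A
     * parallel build can call dispatch_job() once per compile, briefly
     * turning one user into a source of file traffic from many hosts. */
    #include <stdbool.h>

    #define MAX_HOSTS 40

    extern bool host_is_idle(int host);                     /* assumed helper */
    extern void migrate_and_run(int host, const char *cmd); /* assumed helper */
    extern void run_locally(const char *cmd);

    void dispatch_job(const char *cmd)
    {
        for (int h = 0; h < MAX_HOSTS; h++) {
            if (host_is_idle(h)) {
                migrate_and_run(h, cmd);  /* the job's file I/O now comes from host h */
                return;
            }
        }
        run_locally(cmd);                 /* no idle host: run the job locally */
    }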

The rest of the paper is structured as follows: Section 2 describes the system that was measured and its workload, and Section 3 describes how we collected data. Section 4 contains the measurements that repeat the BSD study, and Section 5 presents our measurements of Sprite caches. Section 6 is a summary.

Summary:

Our measurements of application-level file accesses show many of the same trends found by the BSD study six years ago. Average throughput per user is relatively low, most files are short and are opened for only brief periods of time, most accesses are sequential, and most bytes do not live more than a few minutes. We found two substantial changes, however. First, file throughput has increased by a factor of 20 overall and has become much more bursty. Second, typical large files used today are more than an order of magnitude larger than typical large files used in 1985. Large files account for much of the increase in throughput and burstiness, and they stress many parts of the system, such as the file caches.

In our measurements of the Sprite file caches, we found that increases in cache size have led to increases in read hit ratios, but the improvements have been much smaller than we expected. We suspect that the increasing size of large files accounts for this discrepancy. We found almost no improvement in the effectiveness of caches at reducing write traffic: about 90% of all new bytes are eventually written to the server in order to safeguard the data. If read hit ratios continue to improve, then writes will eventually dominate file system performance and new approaches, such as longer writeback intervals, non-volatile cache memories, and log-structured file systems [15], will become attractive.

We found that many users access file data in a way that assumes cache consistency among workstations, and that they would be inconvenienced on a daily basis if full consistency were not provided. Fortunately, we also found that the overheads for implementing consistency are very low, since write-sharing only occurs for about one percent of file accesses. Our simulations of cache consistency mechanisms showed no clear winner: the mechanisms had comparable overheads, and where there were differences they depended strongly on the application mix. Without specific information about application behavior, it seems wisest to choose a consistency mechanism based on the simplicity of its implementation.

Lastly, we found that process migration increases the burstiness of file traffic by an order of magnitude. For example, users with migrated processes generated file traffic at a short-term rate 40 times the medium-term average rate for all users. Fortunately, we found that migration does not seem to degrade the performance of file caches or increase cache consistency overheads. In fact, we found that file caches worked better for migrated processes than for processes in general.