filesystems
Filesystems page for ClusterGateOrg
Local and distributed filesystems
BeeGFS (formerly FhGFS) is a leading parallel cluster file system, developed with a strong focus on performance and designed for very easy installation and management. If I/O-intensive workloads are your problem, BeeGFS is the solution.
File Systems and Storage Lab (FSL) Researchers and students in the FSL group perform research in operating systems with a focus on file systems, storage, security, and networking. Emphasis is given to new methods, interfaces, and APIs that significantly increase system security, usability, and performance, improve the portability of operating system code, speed the development of new code, and more.
File systems HOWTO This small HOWTO is about filesystems and accessing filesystems. It is not a Linux- or Unix-specific document, as you might expect; you can also find there a lot of interesting information about non-Unix (file)systems, but Unix is my primary interest :-). More information and the latest version of this document can be found at http://martin.hinner.info/fs/.
File systems Linux has many file system types; here is one attempt to review them.
CloudStore Web-scale applications require a scalable storage infrastructure to process vast amounts of data. CloudStore (formerly, Kosmos filesystem) is an open-source high performance distributed filesystem designed to meet such an infrastructure need:
CloudStore is implemented in C++ using standard system components such as STL, boost libraries, aio, log4cpp.
CloudStore is integrated with Hadoop and Hypertable. This enables applications built on those systems to seamlessly use CloudStore as the underlying data store.
CloudStore is deployed on Solaris and Linux platforms for storing web log data, crawler data, etc.
CloudStore source code is released under the terms of the Apache License Version 2.0.
XFS is one of the popular file systems for Linux, with good performance characteristics.
ReiserFS journalling file system
ZFS is a new kind of file system that provides simple administration, transactional semantics, end-to-end data integrity, and immense scalability. ZFS is not an incremental improvement to existing technology; it is a fundamentally new approach to data management. We've blown away 20 years of obsolete assumptions, eliminated complexity at the source, and created a storage system that's actually a pleasure to use.
ZFS on Linux - Native ZFS for Linux! - ZFS is an advanced file system and volume manager which was originally developed for Solaris. It has been successfully ported to multiple platforms and now there is a functional Linux ZFS kernel port. The port includes a functional and stable SPA, DMU, ZVOL, and POSIX Layer (ZPL).
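The end-to-end data integrity that ZFS advertises rests on one idea: every block is checksummed when written and re-verified when read, so silent on-disk corruption is detected rather than returned to the application. A toy Python sketch of that idea (names and structure are invented for illustration; real ZFS keeps each checksum in the parent block pointer, forming a self-validating Merkle tree):

```python
import hashlib

class ChecksummedStore:
    """Toy illustration of ZFS-style end-to-end block checksums.

    Each block is stored together with a SHA-256 digest computed at
    write time; every read re-verifies the digest, so silent bit rot
    is detected instead of being handed back to the caller."""

    def __init__(self):
        self.blocks = {}          # block id -> (data, digest)

    def write(self, blkid, data):
        self.blocks[blkid] = (data, hashlib.sha256(data).hexdigest())

    def read(self, blkid):
        data, digest = self.blocks[blkid]
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError("checksum mismatch in block %r" % blkid)
        return data

store = ChecksummedStore()
store.write(0, b"important data")
assert store.read(0) == b"important data"

# Simulate silent bit rot: flip the payload behind the store's back.
store.blocks[0] = (b"imp0rtant data", store.blocks[0][1])
try:
    store.read(0)
except IOError as e:
    print(e)          # the corruption is caught, not silently returned
```

In real ZFS the same mechanism also lets a mirrored pool self-heal: when one copy fails its checksum, the block is re-read from a good replica and rewritten.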
Virtual, Distributed and Parallel file systems
GnomeVFS is a library that allows applications to transparently access various types of filesystems through a uniform interface. GnomeVFS modules include support for WebDAV, FTP, the local filesystem, gzip, bzip2, cdda, and others. GNOME VFS is currently used as one of the foundations of the Nautilus file manager.
FUSE is a mechanism for creating new file systems with desired features. With FUSE it is possible to implement a fully functional filesystem in a userspace program. Features include:
Simple library API
Simple installation (no need to patch or recompile the kernel)
Secure implementation
Very efficient userspace-to-kernel interface
Usable by non-privileged users
Runs on Linux kernels 2.4.X and 2.6.X
Has proven very stable over time
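The core of the FUSE model is a set of callbacks: the kernel forwards each syscall (stat, readdir, read, ...) to a method of a userspace object. A minimal sketch of that callback shape, with method names mirroring the common FUSE operations but not wired to the real libfuse here (with a binding such as fusepy, a class like this would be passed to the mount call):

```python
import errno, stat

class HelloFS:
    """Sketch of the callback style a FUSE filesystem implements:
    one read-only file, /hello, served entirely from userspace.
    Illustrative only; not connected to the kernel FUSE module."""

    FILES = {"/hello": b"Hello, FUSE!\n"}

    def getattr(self, path):
        # Called for stat(): return basic attributes or ENOENT.
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_size": 0}
        if path in self.FILES:
            return {"st_mode": stat.S_IFREG | 0o444,
                    "st_size": len(self.FILES[path])}
        raise OSError(errno.ENOENT, path)

    def readdir(self, path):
        # Called for ls: list directory entries.
        return [".", ".."] + [p.lstrip("/") for p in self.FILES]

    def read(self, path, size, offset):
        # Called for read(): return at most `size` bytes from `offset`.
        return self.FILES[path][offset:offset + size]

fs = HelloFS()
print(fs.readdir("/"))             # ['.', '..', 'hello']
print(fs.read("/hello", 4096, 0))  # b'Hello, FUSE!\n'
```

Because the logic lives in an ordinary process, it can be written in any language, debugged with normal tools, and run by unprivileged users, which is exactly what the feature list above promises.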
SSHFS - SSH Filesystem.
Based on FUSE (the best userspace filesystem framework for Linux ;-)
Multithreading: more than one request can be on its way to the server
Allowing large reads (max 64k)
Caching directory contents
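Caching directory contents matters for SSHFS because every listing otherwise costs a round trip over the SSH link. A toy sketch of such a cache with a time-to-live (structure and names are invented; the real SSHFS cache is implemented in C inside the client):

```python
import time

class DirCache:
    """Toy sketch of directory-contents caching as SSHFS does it:
    a listing fetched over the (slow) remote link is reused for a
    short time-to-live instead of re-requested on every lookup."""

    def __init__(self, fetch, ttl=20.0):
        self.fetch = fetch        # function: path -> listing (remote call)
        self.ttl = ttl
        self.cache = {}           # path -> (timestamp, listing)

    def listdir(self, path):
        now = time.monotonic()
        hit = self.cache.get(path)
        if hit and now - hit[0] < self.ttl:
            return hit[1]                 # served from cache, no round trip
        listing = self.fetch(path)        # expensive remote request
        self.cache[path] = (now, listing)
        return listing

calls = []
def fake_remote_ls(path):
    calls.append(path)                    # count remote round trips
    return ["a.txt", "b.txt"]

cache = DirCache(fake_remote_ls)
cache.listdir("/home")
cache.listdir("/home")                    # second call hits the cache
print(len(calls))                         # 1
```

The trade-off is the usual one for network filesystems: within the TTL window, changes made on the server are not visible to the client.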
Andrew File System (OpenAFS) is the oldest distributed file system, and it is still in wide use today.
The Distributed-Parallel Storage System (DPSS) The Distributed-Parallel Storage System (DPSS) at LBL is a scalable, high-performance, distributed-parallel data storage system originally developed as part of the DARPA-funded MAGIC Testbed, with additional support from the U.S. Dept. of Energy, Energy Research Division, Mathematical, Information, and Computational Sciences Office.
The DPSS is a data block server, which provides high-performance data handling and architecture for building high-performance storage systems from low-cost commodity hardware components. This technology has been quite successful in providing an economical, high-performance, widely distributed, and highly scalable architecture for caching large amounts of data that can potentially be used by many different users.
Parallel Virtual File System -- PVFS. The goals of the PVFS project are to provide both a testbed for parallel I/O research and a freely available, production-level parallel file system for use in the cluster community.
Red Hat Global File System Red Hat GFS allows a cluster of Linux servers to share data in a common pool of storage, allowing you to:
Greatly simplify your data infrastructure:
Install and patch applications once, for the entire cluster
Reduce the need for redundant copies of data
Simplify back-up and disaster recovery tasks
Maximize use of storage resources and minimize your storage costs:
Manage your storage capacity as a whole vs. by partition
Decrease your overall storage needs by reducing data duplication
Scale clusters seamlessly, adding storage or servers on the fly:
No more partitioning storage with complicated techniques
Add servers simply by mounting them to the common file system
Achieve maximum application uptime:
Red Hat Cluster Suite is included with Red Hat GFS
Lustre is another parallel file system.
Another parallel file system is Panasas. By combining a parallel file system with object-based storage, the Panasas Storage Cluster delivers scalable bandwidth and random I/O to accelerate application throughput, and streamlines operations within a single scalable namespace to accelerate productivity.
IBM General Parallel File System GPFS provides high-performance I/O by "striping" blocks of data from individual files across multiple disks (on multiple storage devices) and reading/writing these blocks in parallel. In addition, GPFS can read or write large blocks of data in a single I/O operation, thereby minimizing overhead.
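The striping described above can be sketched in a few lines: in the simplest round-robin layout, block i of a file lands on disk i mod N, so consecutive blocks can be transferred from different disks in parallel. (Real GPFS layout policy is more sophisticated; this only illustrates the principle, and all names are invented.)

```python
def stripe_layout(file_size, block_size, n_disks):
    """Round-robin striping sketch: map each block index of a file
    to the disk that holds it, so block i goes to disk i % n_disks."""
    n_blocks = -(-file_size // block_size)      # ceiling division
    return {i: i % n_disks for i in range(n_blocks)}

# A 1 MiB file in 256 KiB blocks over 3 disks: 4 blocks on disks 0,1,2,0.
layout = stripe_layout(1 << 20, 256 * 1024, 3)
print(layout)   # {0: 0, 1: 1, 2: 2, 3: 0}
```

With B-byte blocks on N disks, a sequential read can keep all N disks busy at once, which is where the "high-performance I/O" claim comes from.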
OCFS Oracle Cluster File System (OCFS) presents a consistent file system image across the servers in a cluster. OCFS allows administrators to take advantage of a file system for the Oracle database files (data files, control files, and archive logs) and configuration files. This eases administration of Oracle Real Application Clusters.
Project Description: Oracle Clustered File System License: GPL
GlusterFS is a clustered file system capable of scaling to several petabytes. It aggregates various storage bricks over an InfiniBand RDMA or TCP/IP interconnect into one large parallel network file system. Storage bricks can be made of any commodity hardware, such as x86-64 servers with SATA-II RAID and InfiniBand HBAs.
CEPH - Ceph’s object storage system offers a significant feature compared to many object storage systems available today: Ceph provides a traditional file system interface with POSIX semantics. Object storage systems are a significant innovation, but they complement rather than replace traditional file systems. As storage requirements grow for legacy applications, organizations can configure their legacy applications to use the Ceph file system too! This means you can run one storage cluster for object, block and file-based data storage.
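Under the POSIX interface, CephFS cuts each file body into fixed-size objects stored in the object layer, which is how one cluster can serve file, block, and object workloads. A toy sketch of that file-range-to-objects mapping (the object-size value and naming pattern here are illustrative assumptions, not the authoritative Ceph layout, which is configurable per file):

```python
def file_range_to_objects(ino, offset, length, object_size=4 * 1024 * 1024):
    """Map a byte range of a file (identified by inode number `ino`)
    to the list of (object_name, offset_in_object, byte_count) pieces
    a Ceph-like store would touch.  Objects are fixed-size slices of
    the file, named by inode number and object index."""
    objects = []
    pos, end = offset, offset + length
    while pos < end:
        idx = pos // object_size                  # which object
        start_in_obj = pos % object_size          # where inside it
        take = min(object_size - start_in_obj, end - pos)
        objects.append(("%x.%08d" % (ino, idx), start_in_obj, take))
        pos += take
    return objects

# Reading 7 MiB starting at offset 2 MiB touches three 4 MiB objects:
for name, off, ln in file_range_to_objects(0x1099, 2 << 20, 7 << 20):
    print(name, off, ln)
```

Because each object lives on whichever storage node the cluster maps it to, a large sequential read fans out across many nodes, much like the striping of the parallel file systems above.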
Disk storage at CERN: Handling LHC data and beyond - IOPscience - paper 2013 [EOS is an open source distributed disk storage system in production since 2011 at CERN. Development focus has been on low-latency analysis use cases for LHC and non-LHC experiments and life-cycle management using JBOD hardware for multi-PB storage installations. The EOS design implies a split of hot and cold storage and introduced a change of the traditional HSM-functionality-based workflows at CERN.]
EOS as the present and future solution for data storage at CERN - paper 2015 [EOS is an open source distributed disk storage system in production since 2011 at CERN. Development focus has been on low-latency analysis use cases for LHC and non-LHC experiments and life-cycle management using JBOD hardware for multi-PB storage installations. The EOS design implies a split of hot and cold storage and introduced a change of the traditional HSM-functionality-based workflows at CERN. The 2015 deployment brings storage at CERN to a new scale and foresees breaching 100 PB of disk storage in a distributed environment using tens of thousands of (heterogeneous) hard drives. EOS has brought to CERN major improvements compared to past storage solutions by allowing quick changes in the quality of service of the storage pools. This allows the data centre to quickly meet the changing performance and reliability requirements of the LHC experiments with minimal data movements and dynamic reconfiguration.]
© 2009-2024
Andrey Ye. Shevel