Current research on DNA storage usually focuses on improving storage density by developing effective encoding and decoding schemes, while giving little consideration to the uncertainty of ultra-long-term data storage and retention. Consequently, current DNA storage systems are often not self-contained: they have to resort to external tools to restore the stored DNA data. This carries a high risk of data loss, since the required tools might not be available in the far future. To address this issue, we propose in this paper a self-contained DNA storage system that makes its stored data self-explanatory without relying on any external tool. To this end, we design a specific DNA file format in which a separate storage scheme reduces data redundancy and an effective indexing scheme supports random read operations on the stored data file. We verified through experimental data that the proposed self-contained and self-explanatory method not only removes the reliance on external tools for data restoration but also minimises the data redundancy introduced once the amount of data to be stored reaches a certain scale.

Although they improve storage density, these schemes carry a high risk of data loss in ultra-long-term retention, because the DNA storage system is no longer self-contained: restoring the stored DNA data relies on external tools (the decompression program in our example), which might not be available in the far future given the high uncertainty. As a result, unless this external reliance issue is solved, DNA storage is unlikely to become a viable option for storing and retaining data over ultra-long terms.





In this work, to address the external reliance issue while improving the storage density, we propose a self-contained DNA storage system that makes its stored data self-explanatory without relying on any external tool. To this end, we allow the external tool (the corresponding decompression program) to be encoded together with the compressed file into a unified DNA sequence payload. However, unlike the traditional one-to-one mapping between a compressed file and its decompression program, we deliberately allow a single decompression program to be shared among a set of compressed files to minimise information redundancy. Although this strategy is not difficult to implement in traditional storage, it is hard to implement in DNA storage, given the difficulty of random reads and the high sequencing cost of data restoration. To address these issues, we design a specific DNA file format in which a non-continuous storage scheme reduces data redundancy on the one hand and, on the other, is combined with traditional storage media to obtain effective indexing for random read operations on the stored data file, while minimising the cost with a one-off sequencing.
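The saving from sharing one decompression program can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the function name and the byte sizes are hypothetical, and it only compares the payload that would have to be synthesised when every compressed file carries its own copy of the tool against the case where one copy is shared.

```python
def total_payload_bytes(compressed_sizes, tool_size, shared):
    """Bytes to store when each file carries its own tool copy vs. sharing one copy."""
    n = len(compressed_sizes)
    # Shared: at most one tool copy; one-to-one: one copy per compressed file.
    copies = min(1, n) if shared else n
    return sum(compressed_sizes) + copies * tool_size


sizes = [40_000, 55_000, 30_000]  # hypothetical compressed file sizes (bytes)
tool = 20_000                     # hypothetical decompression program size (bytes)

one_to_one = total_payload_bytes(sizes, tool, shared=False)
shared = total_payload_bytes(sizes, tool, shared=True)
print(one_to_one, shared, one_to_one - shared)
```

With n files the shared layout saves (n - 1) copies of the tool, so the benefit grows with the number of files that use the same program.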

To realise the concept of self-containment, we take a data compression tool as an example and develop a DNA-based storage method that not only minimises the cost of DNA synthesis and sequencing but also supports random read operations on the DNA data.

To take full advantage of DNA's ability to store data for an ultra-long time, we propose the concept of self-contained and self-explanatory DNA storage and design a method to implement it.

Since data compression is an important tool for cost-efficiency in DNA storage, this research focuses on compression and self-extraction as the use case for the proposed technology. In fact, the compression tool can also be stored together with other data-related information, such as encoding parameters and the file storage format. We describe our methodology in three steps. We first give an overview of the DNA storage process in Fig. 1, and then introduce the data self-containment technology in detail. Finally, we describe the data self-explanation technique by defining the formats of the DNA file and the DNA fragment that support the functions presented in Fig. 2.

Figure 1 depicts the storage process, in which the input binary data file is often compressed to minimise data redundancy and save synthesis cost. To achieve data self-containment, we store both the compressed data and the decompression program as payloads in the DNA file. The binary data in brown represents the compressed data, and that in blue represents the decompression program. Both types of data are segmented and uniformly encoded as synthetic DNA sequences. Since the data file and the program file must be distinguishable in the DNA fragments, we deliberately add a bp-length flag to the DNA fragments. The data self-explanatory process is reflected in the data restoration process and is supported by the defined data format.
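A minimal sketch of the segmentation and flagging step follows. Several details are assumptions made only for illustration and are not stated in the text: a plain two-bits-per-base mapping, a flag of one base ("A" for data, "C" for the program), and a 16-byte segment size. Fragment ordering, primers and error correction are left out.

```python
BASES = "ACGT"                      # assumed mapping: 00->A, 01->C, 10->G, 11->T
FLAG = {"data": "A", "tool": "C"}   # assumed one-base flag values


def bytes_to_dna(payload: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in payload for shift in (6, 4, 2, 0))


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna; expects a length that is a multiple of four."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def make_fragments(payload: bytes, kind: str, chunk_size: int = 16) -> list[str]:
    """Segment the payload and prefix every fragment with its type flag."""
    return [FLAG[kind] + bytes_to_dna(payload[i:i + chunk_size])
            for i in range(0, len(payload), chunk_size)]


def split_by_flag(fragments: list[str]) -> tuple[bytes, bytes]:
    """Separate data fragments from program fragments using the leading flag."""
    data, tool = bytearray(), bytearray()
    for frag in fragments:
        target = data if frag[0] == FLAG["data"] else tool
        target.extend(dna_to_bytes(frag[1:]))
    return bytes(data), bytes(tool)


fragments = make_fragments(b"compressed-bytes" * 3, "data") \
    + make_fragments(b"unzip-program", "tool")
data, tool = split_by_flag(fragments)
```

A practical encoder would also have to respect biochemical constraints such as homopolymer runs and GC content, which this mapping ignores.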

Existing DNA storage systems are in general not self-contained, as they always resort to external tools to back up and restore the data. For example, if compressed data needs to be restored, the corresponding decompression program must be available. Although the chance that the external tool is unavailable is very small, we still risk losing the stored data if the required tool cannot be found owing to the uncertainties of an ultra-long DNA storage period, say over a hundred years. Therefore, we want the data and related tools stored in DNA to be as complete as possible, allowing the data to be self-contained.

The purpose of our self-containment and self-explanation is to make the DNA file carry more information: not only the encoded data itself but also the index information for the stored data. As shown in Figs. 3, 4 and 5, we take data compression in DNA storage as an example to explain the proposed self-contained and self-explanatory technology. In this example, the decompression program is our tool. As usual, the DNA fragments are stored in pools without any specified order, and all the stored files are identified using primers.

The goal of data self-containment is to be able to restore the compressed data stored in DNA storage without relying on external decompression tools. In particular, when reading a data file, the system first finds the primer sequences corresponding to the file and to its decompression tool, and then obtains the data and tool files at the same time through PCR and DNA sequencing. After decoding, the decompression tool can automatically restore the data file to its original form, realising the self-explanatory function of the data. Clearly, to achieve this goal, one has to embed many different kinds of information into the DNA file in such a way that the self-contained data is also sufficient for self-explanation, which requires a well-defined file format.
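The read path described above can be mimicked in software. The sketch below treats PCR selection as a simple prefix match on a primer and is only an illustration: the primer strings, the four-base in-strand index and the strand layout (primer, index, payload) are hypothetical choices, not the format defined in the paper.

```python
import random

BASES = "ACGT"
INDEX_LEN = 4  # assumed: four bases give 4**4 = 256 fragment positions


def index_to_dna(i: int) -> str:
    """Encode a fragment position (0..255) as four bases."""
    return "".join(BASES[(i >> shift) & 0b11] for shift in (6, 4, 2, 0))


def dna_to_index(seq: str) -> int:
    """Decode a base-4 index back to an integer."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def write_file(pool: list[str], primer: str, payload_fragments: list[str]) -> None:
    """Add strands of the form primer + index + payload to the unordered pool."""
    for i, frag in enumerate(payload_fragments):
        pool.append(primer + index_to_dna(i) + frag)


def read_file(pool: list[str], primer: str) -> list[str]:
    """Select strands starting with the primer (stand-in for PCR), then reorder."""
    hits = [s[len(primer):] for s in pool if s.startswith(primer)]
    hits.sort(key=lambda s: dna_to_index(s[:INDEX_LEN]))
    return [s[INDEX_LEN:] for s in hits]


def read_with_tool(pool: list[str], data_primer: str, tool_primer: str):
    """Fetch a data file and its decompression tool from the same pool."""
    return read_file(pool, data_primer), read_file(pool, tool_primer)


pool: list[str] = []
write_file(pool, "ACGTACGTAC", ["AAAA", "CCCC", "GGGG"])  # hypothetical data file
write_file(pool, "TTGCATTGCA", ["TTTT", "ACAC"])          # hypothetical tool file
random.Random(0).shuffle(pool)                            # pools keep no order
data_frags, tool_frags = read_with_tool(pool, "ACGTACGTAC", "TTGCATTGCA")
```

Because strands are selected by primer and reordered by index, the order in which they sit in the pool does not matter.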

The data format is defined to support multiple implementations of data self-containment with the ability of self-interpretation. The data format should include two levels of metadata. The first level describes the format of the binary compressed file in the data pre-processing step, while the second level defines the format of the small fragments produced after the file is encoded into a DNA sequence.

DNA File Format: The last steps in Figs. 4 and 5 select the specified data files and their associated tool files. If multiple files are selected to be read at the same time, a certain data format is required to support data self-extraction for these files without compromising the others. To this end, we define the format of the data file as shown in Table 1.

We realise data self-explanation mainly through the definition of the file format. The logic of file writing is relatively simple: the fields are written in sequence, in the order defined by the file format. The reading process is performed after the DNA data is decoded into binary data. First, the data files and the tool files are separated according to the first field FT; the data files are put in the \(datafiles\) array and the tool files in the \(toolfiles\) array. Then, Algorithm 1 for file reading is performed, which first finds the needed file in the \(datafiles\) array according to FID (Lines 4-5), and then determines the storage method of the file according to SM. If SM is OF, the next field is the data field, and the subsequent field is data D, which can be read directly and returned (Lines 6-7). Otherwise, if SM is CPF, the data field D is obtained based on the data length DL given in the next field, and the subsequent data is the tool field TD, which is used to restore the processed data to the original data before returning it (Lines 8-12). Otherwise, if SM is SF, the FID of the tool file is obtained from the field TFID, and the \(toolfiles\) array is then traversed to find TD, with which the processed data is restored to the original data (Lines 14-20).
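The three branches of the reading logic can be sketched as follows. This is an illustration built from the description above, not a transcription of Algorithm 1: the fields are assumed to be already parsed into records (so DL is not needed), the FT values "data" and "tool" are assumed encodings, and "running the tool" is replaced by a lookup of a codec name stored in TD, since executing a stored program is outside the scope of a short example.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

# Stand-in for executing the stored tool: TD holds a codec name rather than a
# real decompression program (an assumption that keeps the sketch runnable).
CODECS = {b"zlib": zlib.decompress}


@dataclass
class Record:
    FT: str                      # "data" or "tool" (assumed encoding of FT)
    FID: int                     # file identifier
    SM: Optional[str] = None     # "OF", "CPF" or "SF"; data files only
    D: bytes = b""               # data field
    TD: bytes = b""              # tool field
    TFID: Optional[int] = None   # FID of the shared tool file when SM == "SF"


def run_tool(td: bytes, data: bytes) -> bytes:
    """Restore processed data with the tool identified by TD."""
    return CODECS[td](data)


def read_record(records: list[Record], fid: int) -> bytes:
    # Separate data files from tool files according to FT.
    datafiles = [r for r in records if r.FT == "data"]
    toolfiles = [r for r in records if r.FT == "tool"]
    for f in datafiles:
        if f.FID != fid:
            continue
        if f.SM == "OF":          # data can be returned directly
            return f.D
        if f.SM == "CPF":         # tool travels inside the same file
            return run_tool(f.TD, f.D)
        if f.SM == "SF":          # tool lives in a separate, shareable file
            for t in toolfiles:
                if t.FID == f.TFID:
                    return run_tool(t.TD, f.D)
            raise LookupError(f"tool file {f.TFID} not found")
        raise ValueError(f"unknown storage method {f.SM!r}")
    raise KeyError(fid)


raw = b"hello DNA " * 20
records = [
    Record("data", 1, "OF", D=raw),
    Record("data", 2, "CPF", D=zlib.compress(raw), TD=b"zlib"),
    Record("data", 3, "SF", D=zlib.compress(raw), TFID=100),
    Record("tool", 100, TD=b"zlib"),
]
restored = [read_record(records, fid) for fid in (1, 2, 3)]
```

All three files restore to the same original bytes, and only the SF case needs the tool file to be present in the decoded set.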

The overall architecture of the DNA storage system is illustrated in Fig. 7. The data files stored on a traditional storage medium are first pre-processed. The pre-processing procedure consists of data compression, data deduplication, and data formatting, and follows the defined DNA segment format for self-containment. The proposed algorithm then encodes the data into a DNA sequence for DNA synthesis. The synthesised DNA is stored in the DNA storage medium and later sequenced to recover the DNA sequence. The DNA sequence is then decoded into binary files. Finally, the decompression program that is self-contained in the sequence is used to extract the file content, which makes the data self-explanatory.
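The pre-processing stage can be illustrated with a small sketch. The choices here are assumptions for the sake of a runnable example and are not specified in the text: deduplication is done at whole-file level using SHA-256 digests, zlib stands in for the compression program, and the formatting step is reduced to a name-to-digest manifest.

```python
import hashlib
import zlib


def preprocess(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Return (blobs, manifest): unique compressed contents and a name -> digest map."""
    blobs: dict[str, bytes] = {}
    manifest: dict[str, str] = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in blobs:               # store each distinct content once
            blobs[digest] = zlib.compress(content)
        manifest[name] = digest
    return blobs, manifest


def restore(blobs: dict[str, bytes], manifest: dict[str, str], name: str) -> bytes:
    """Look up a file's blob through the manifest and decompress it."""
    return zlib.decompress(blobs[manifest[name]])


files = {
    "a.txt": b"genome notes " * 50,
    "b.txt": b"genome notes " * 50,   # duplicate content of a.txt
    "c.txt": b"lab protocol " * 50,
}
blobs, manifest = preprocess(files)
```

Only the blobs would go on to DNA encoding; duplicated content is synthesised once, however many file names point to it.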

Data self-containment inevitably brings some storage overhead. To evaluate this impact, we first define compression efficiency, denoted by e, as the metric for measuring the space impact of the selected compression programs in the worst case; the program with the best performance is then selected as the main test program to evaluate our proposed methods.
