Parallel archiver

Parallel archiver version 4.76

Author: Amine Moulay Ramdane

Email: aminer@videotron.ca

Description: Parallel archiver using my Parallel LZO, Parallel LZ4, Parallel Zlib, Parallel Bzip, Parallel LZMA and Parallel Zstd compression algorithms.

Supported features:

- Opens and creates archives using my Parallel LZ4, Parallel LZO, Parallel Zlib, Parallel Bzip, Parallel LZMA or Parallel Zstd compression algorithms.

- Wide range of parallel compression algorithms: Parallel LZ4, Parallel LZO, Parallel Zlib, Parallel Bzip, Parallel LZMA and Parallel Zstd, with different compression levels.

- 64-bit support - lets you create archive files over 4 GB, supports archives up to 2^63 bytes, and compresses and decompresses files up to 2^63 bytes.

- Now my Parallel Zlib gives 5% better performance than Pigz.

- Supports memory and file streams, adds compressed data directly from streams, and extracts archived files to streams without creating temp files.

- Save/load the archive to/from a stream.

- Supports in-memory archives

- You can use it as a hashtable on the hard disk or in memory.

- Fault tolerant to power failures and similar failures.

- Creates encrypted archives using Parallel AES encryption with 256 bit keys.

- Fastest compression levels are extremely fast

- Good balanced compression levels provide both good compression ratio and high speed

- Maximum compression levels provide a much better compression ratio than Zip, RAR and Bzip, and the same as 7Zip.

- It supports both compression and decompression rate indicators.

- Now it supports processor groups on Windows, so it can use more than 64 logical processors and it scales well.

- It's NUMA-aware and NUMA efficient.

- It efficiently minimizes contention so that it scales well.

- Now it uses only two threads for the IO (and they do not contend with each other), which keeps contention to a minimum so that it scales well.

- You can test the integrity of your archive

- Easy object programming interface

- Full source code available.

- Platform: Win32, Win64

Now my Parallel archiver library is optimized for NUMA, supports processor groups on Windows, and uses only two threads for the IO (and they do not contend with each other), which keeps contention to a minimum so that it scales well. The CRC calculation is also much more optimized and fast, and testing the integrity of the archive is fast.

And you have to know that the fastest zstd level is only about 2x slower than LZ4 level 1. On the other end of the spectrum, zstd level 22 runs ~1 MB/s slower than LZMA at level 9 and produces a file that is only 2.3% larger.

Read here:

https://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/

You have to know that directories are recorded with a * character at the end; for example, the directory c:\amine will be recorded as c:\amine\*.

Also, using .. in paths to return to the parent directory is not supported. Paths must not contain a /: a path like c:/amine is not accepted, you have to write it as c:\amine. UNC paths are not supported either; you have to use the Windows mklink command to work around that.

In this version, when you add a stream, the stream's name must not contain a directory path.

My Parallel archiver now uses nedmalloc, which is a very fast, very scalable, multithreaded memory allocator with little memory fragmentation.

I have also added a method called AddDirectory() to add a directory, and a new property called AddFullPath that controls whether the full path is added or not.

If you want to add a directory in the current directory with AddDirectory(), the path must start with .\
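Here is a minimal sketch of adding a directory from the current directory (the exact AddDirectory() parameter list is abbreviated in the method list below, so the single directory-path argument is an assumption):

===

// Minimal sketch (assumed single-argument AddDirectory() call):
// add a subdirectory of the current directory to an archive.
// pzr: TPLZ4Archiver;
pzr := TPLZ4Archiver.Create('amine.z', 1000, 4);
pzr.AddFullPath := false;       // do not store full paths
pzr.AddRecurse := true;         // recurse into subdirectories
pzr.AddDirectory('.\mydata');   // the path must start with .\
pzr.free;

===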

I have added the following methods: CopyArchive(), which copies the archive to a stream, and a new ExtractAll() overload that extracts from a stream. I have documented the methods; please read the readme file inside the zip file.

Here is an example of how to use the above new methods; it extracts an archive from a stream without loading the index, and it copies the archive to another stream:

===

PROGRAM test_pl;

uses cmem, PLZ4Archiver, system.classes, system.sysutils, findfile;

var
  pzr: TPLZ4Archiver;
  fstream1, fstream2: TFileStream;

Begin
  fstream1 := TFileStream.create('amine.z', fmOpenReadWrite);
  fstream2 := TFileStream.create('amine1.z', fmCreate);
  pzr := TPLZ4Archiver.Create('', 1000, 4);
  pzr.indicator := true; // to show the compression and decompression rate
  pzr.password := 'amine';
  // Extract the whole archive from the stream without loading the index
  if not pzr.ExtractAll(fstream1, 'c:\tmp1000') then writeln('not ok...');
  // Copy the archive to the second stream
  pzr.CopyArchive(fstream2);
  pzr.free;
  fstream1.free; // free the source stream as well
  fstream2.free;
End.

===

I have thoroughly tested and stabilized my Parallel archiver over many years, and I now think it is more stable and efficient, so you can be more confident with it.

Important note:

Notice that I have not used a global password; instead, every file can be encrypted with a different password using my Parallel AES encryption with 256 bit keys. So the security level is this:

- When you encrypt, the content of the files will be encrypted but the names of the files and directories will not. If you also want to encrypt the names of the files and directories using Parallel AES encryption with 256 bit keys, first compress into an archive, and after that encrypt the whole archive into another archive.

- When you encrypt, you can still update the files with the Update() method and delete files without giving the right password, but the security guarantee is that you cannot access the encrypted data without giving the right password.
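As a minimal sketch (the AddFiles() parameter list is abbreviated in the method list below, so the single TStringList argument is an assumption), encrypting two files with two different passwords could look like this:

===

// Minimal sketch (assumed AddFiles(TStringList) call): each file is encrypted
// with whatever password is set at the moment it is added to the archive.
// st: TStringList; pzr: TPLZ4Archiver;
st := TStringList.Create;
pzr := TPLZ4Archiver.Create('secure.z', 1000, 4);
st.Add('c:\data\report1.txt');
pzr.password := 'password1';   // first file encrypted with password1
pzr.AddFiles(st);
st.Clear;
st.Add('c:\data\report2.txt');
pzr.password := 'password2';   // second file encrypted with password2
pzr.AddFiles(st);
pzr.free;
st.free;

===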

And here are the correct definitions of the AnalyzeArchive() methods:

function AnalyzeArchive(filename:UTF8String):TypeError;overload;

- Analyzes the format of the archive file; the returned TypeError is ctCorrupt if the format of the file is corrupt, or ctUnknown if the format of the file is unknown.

function AnalyzeArchive(Stream:TStream):TypeError;overload;

- Analyzes the format of the archive stream; the returned TypeError is ctCorrupt if the format of the stream is corrupt, or ctUnknown if the format of the stream is unknown.

If AnalyzeArchive() returns ctCorrupt, you have to fix the archive with FixArchive(), which will fix the format of the archive, and then you have to test the integrity of the archive and delete any corrupted files.
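For example, a recovery sketch could look like this (the Test() parameter list is abbreviated in the method list below, so the no-argument call is an assumption):

===

// Minimal recovery sketch: analyze the archive format, fix it if it is
// corrupt, then test the integrity of the files (assumed no-argument Test()).
if pzr.AnalyzeArchive('amine.z') = ctCorrupt then
begin
  pzr.FixArchive('amine.z');    // fixes the format of the archive
  if not pzr.Test then
    writeln('some files are corrupted, mark them with DeleteFiles() and then Clean()...');
end;

===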

Please notice that when you are using Parallel LZ4 compression, you have to give a compression level of clLZ4None, clLZ4Fast or clLZ4Max. clLZ4Fast is the default compression level of LZ4 and clLZ4Max is the default compression level of LZ4HC; this is how I have implemented it in my Parallel archiver.

Now ExtractFiles() and ExtractAll() methods return false when the given password is not correct.

Please look at the test_pzlib.pas, test_plzo.pas, test_plz4.pas, test_pbzip.pas and test_plzma.pas demos inside the zip file; compile and execute them.

I have done a quick scalability prediction calculation for my Parallel archiver, and I think it's good: it can scale beyond 100X on NUMA systems.

When you want to delete files inside the archive, call the DeleteFiles() method. DeleteFiles() will not actually delete the files; it will only mark them as deleted. When you want to delete the files completely, call the DeletedItems() method to see how many files are marked as deleted, and after that use the Clean() method to completely remove the files from the archive. I have implemented it like that because it's better in my opinion.
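Here is a minimal sketch of that workflow (DeleteFiles() takes the TStringList of key names to mark as deleted, as described in the method list below):

===

// Minimal sketch: mark files as deleted, check how many are marked,
// then physically remove them from the archive with Clean().
// st: TStringList;
st := TStringList.Create;
st.Add('c:\amine\old_report.txt');
pzr.DeleteFiles(st);                            // only marks the files as deleted
writeln('marked deleted: ', pzr.DeletedItems);  // number of items marked deleted
if pzr.DeletedItems > 0 then pzr.Clean;         // completely removes them from the archive
st.free;

===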

And my Parallel archiver uses a hashtable to store the file names and their corresponding file positions, so that you have direct access to files inside the archive when decompressing, deleting and so on, which makes it very fast.

Please look at the test_pzlib.pas, test_plzo.pas, test_plz4.pas, test_pbzip.pas and test_plzma.pas demos inside the zip file to see how to use my Parallel archiver.

And please don't directly use the ParallelZlib.pas that I have included inside the Parallel archiver zip file, because I have modified it to work correctly with my Parallel archiver.

If you want to use my ParallelZlib library just download it from my website, or download my other Parallel compression library.

You can now use my Parallel archiver as a hashtable on the hard disk with O(1) access. For example, you can stream a database row with my ParallelVarFiler into a memory stream or into a string and store it with my Parallel archiver into an archive; after that you can access your rows on the hard disk as a hashtable with O(1) access. You can use it like that as a database if, for example, you have id keys that you want to map to database rows; in that case it is a good idea to use my Parallel archiver as a hashtable.
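Here is a minimal sketch of that idea (the AddStream() and Extract() parameter lists are abbreviated in the method list below, so the key-name-plus-TStream calls are assumptions):

===

// Minimal sketch: use the archive as an on-disk hashtable that maps
// id keys to database rows serialized into memory streams.
// row: TMemoryStream;
row := TMemoryStream.Create;
// ... stream your database row into 'row', for example with ParallelVarFiler ...
row.Position := 0;
pzr.AddStream('row_1024', row);   // store under the key 'row_1024' (assumed signature)
row.Clear;
pzr.Extract('row_1024', row);     // O(1) direct access back into the stream (assumed signature)
row.free;

===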

Question:

What are the newest ideas behind your Parallel archiver?

Answer:

Of course my Parallel archiver supports parallel compression and so on, but the newest ideas behind it are the following:

I have played with WinZip and 7Zip, but if you give them some files to extract or to test their integrity, they both use sequential access, and I think that's bad. So I decided to implement very fast O(1) access for extraction, integrity testing and so on in my Parallel archiver, and for that I used an in-memory hashtable that maintains the file names and their corresponding file positions. My second idea is that my Parallel archiver is fault tolerant to power failures and also to file corruption when your hard disk is full, and so on; 7Zip and WinZip, I think, are not fault tolerant to those kinds of problems.

I have just played with 7Zip: I compressed 3 files into an archive, then opened the archive with an editor, deleted some bytes and saved the file. After that, when I tried to open the archive, 7Zip responded that the file is corrupted, so 7Zip is not fault tolerant, and I think it's the same with WinZip. I have done the same test with my Parallel archiver, and it recovers from the file damage, so it is fault tolerant to this kind of damage, such as power failures and file corruption when the disk is full. I have implemented this kind of fault tolerance in my Parallel archiver.

I have updated my Parallel archiver and added the Update() method. It is overloaded: in the first version you pass a key name and a TStream, and in the second version you pass a key name and a filename. Please look at the test_pzlib.pas demo inside the zip file to see how to use those methods.

So now you have all the methods to use my Parallel archiver as a hashtable on the hard disk, with very fast O(1) direct access to the compressed and/or encrypted data: DeleteFiles(), ExtractFiles(), Extract(), GetInfo(), AddFiles() and Test() all have O(1) time complexity. So now it's extremely fast.
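Here is a minimal sketch of the two Update() overloads described above:

===

// Minimal sketch of the two Update() overloads: key name + TStream,
// and key name + filename.
// ms: TMemoryStream;
ms := TMemoryStream.Create;
// ... fill 'ms' with the new content ...
ms.Position := 0;
pzr.Update('c:\amine\report.txt', ms);                   // key name and TStream
pzr.Update('c:\amine\report.txt', 'c:\tmp\report.txt');  // key name and filename
ms.free;

===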

When you want to do solid compression with my Parallel archiver using Bzip, you can use the same method as with Tar: first archive your files with compression level 0, and after that compress the whole archive file using Bzip. When you want to encrypt your data with Parallel AES encryption, just give a password by setting the password property; when you don't want to encrypt, just set the password property to an empty string or don't set it at all.

Parallel archiver supports the storing and restoring of the following file attributes:

Hidden, Archive, System, and Read only attributes.

To store and restore them just set the AddAttributes property like this:

pzr.AddAttributes:=[ffArchive,ffReadOnly,ffHidden,ffSystem,ffDirectory];

I have added in-memory archive support, because this way the Parallel archiver will be much faster than with disk archives, and you will be able to further lower the response time and the load on your server.

If you want to use an in-memory archive, pass an empty string to the file name in the constructor, like this:

pzr :=TPLZ4Archiver.Create('',1000,4);

And if you want to read your in-memory archive, read from the exposed Stream property (a TStream) like this:

pzr.stream.position:=0;

A_Memory_Stream.copyfrom(pzr.stream,pzr.stream.size);

You can also load your archive from a file or memory stream just by assigning your file or memory stream to the Stream property (a TStream).
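Putting these pieces together, a minimal in-memory sketch could look like this (the AddFiles(TStringList) call is an assumption, since its parameter list is abbreviated in the method list below):

===

// Minimal sketch: create an in-memory archive, add a file, then copy the
// whole archive image into a memory stream through the Stream property.
// pzr: TPLZ4Archiver; st: TStringList; ms: TMemoryStream;
pzr := TPLZ4Archiver.Create('', 1000, 4);    // empty file name = in-memory archive
st := TStringList.Create;
st.Add('c:\amine\report.txt');
pzr.AddFiles(st);                            // assumed AddFiles(TStringList) call
ms := TMemoryStream.Create;
pzr.stream.position := 0;
ms.copyfrom(pzr.stream, pzr.stream.size);    // read the archive image from memory
ms.free;
st.free;
pzr.free;

===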

I have overloaded the GetKeys() method; now you can use wildcards. You pass the wildcard in the first argument and the TStringList in the second argument, like this: pzr.getkeys('*.pas',st);

and after that call the ExtractFiles() method and pass it the TStringList.
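Here is a minimal sketch of that (the destination-directory argument of ExtractFiles() is an assumption, since its parameter list is abbreviated in the method list below):

===

// Minimal sketch: collect all the .pas keys with a wildcard, then extract them.
// st: TStringList;
st := TStringList.Create;
pzr.getkeys('*.pas', st);             // fills 'st' with the matching key names
pzr.ExtractFiles(st, 'c:\tmp1000');   // assumed second argument: destination directory
st.free;

===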

As you have noticed, the programming interface of my Parallel archiver is very easy to use.

And read this:

"We're a video sharing site located in China. We rewrote the PHP memcached client extension by replacing zlib with QuickLZ. Then our server loads were dramatically reduced by up to 50%, the page response time was also boosted. Thanks for your great work!

Jiang Hong"

http://www.quicklz.com/testimonials.html

http://www.quicklz.com/

So as you have noticed, like QuickLZ or Qpress, I have implemented the Parallel archiver to be very fast as well.

By using my Parallel Zlib, Parallel LZ4 or Parallel LZO compression algorithms, my Parallel archiver will be very fast, and as I have written above, all the main methods (DeleteFiles(), ExtractFiles(), Extract(), GetInfo(), AddFiles() and Test()) have O(1) time complexity, so it is extremely fast.

You can even use my Parallel archiver as a hashtable database on the hard disk to further lower the load on your server (on the internet or an intranet) and boost the response time.

I have used solid compression, as with the tar.lzma format, and I have found that my Parallel archiver with the maximum compression level, clLZMAMax, compresses to the same size as 7Zip at maximum compression, compresses 13% better than WinRAR at maximum compression, and has a much better compression ratio than WinZip.

How do you use solid compression with my Parallel archiver?

Just archive your files with clLZMANone and after that compress the archive with clLZMAMax. The Parallel archiver will then compress to the same size as 7Zip at maximum compression, 13% better than WinRAR at maximum compression, and much better than WinZip at maximum compression.
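Here is a minimal sketch of this two-pass method (the class name TPLZMAArchiver and the AddFiles(TStringList) call are assumptions; see the test_plzma.pas demo inside the zip file for the exact names):

===

// Minimal solid-compression sketch: first archive the files without
// compression (clLZMANone), then compress the whole archive with clLZMAMax.
// pzr: TPLZMAArchiver (assumed class name); st, st2: TStringList;
pzr := TPLZMAArchiver.Create('stage1.z', 1000, 4);    // first pass: store only
pzr.CompressionLevel := clLZMANone;
pzr.AddFiles(st);                                     // 'st' holds the original file names
pzr.free;

pzr := TPLZMAArchiver.Create('final.z', 1000, 4);     // second pass: solid compression
pzr.CompressionLevel := clLZMAMax;
st2 := TStringList.Create;
st2.Add('stage1.z');
pzr.AddFiles(st2);                                    // compress the whole first archive
pzr.free;
st2.free;

===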

If you ask me a question such as:

Amine, how can we be more confident before deleting the files on the hard disk after archiving them?

Answer:

Before deleting the files on your hard disk after archiving them, you can, for example, compare the total number of uncompressed bytes and the number of files in your archive with the number of files and the total size in bytes of all the files on your hard disk. You can do this easily by programming against the Parallel archiver interface, and of course you can also test the integrity of the files in your archive. This applies easily if you are archiving a directory and all its files.
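Here is a minimal verification sketch along those lines (the no-argument Test() call is an assumption, since its parameter list is abbreviated in the method list below):

===

// Minimal verification sketch before deleting the originals: compare the
// number of archived files with the number of files on disk, and test the
// integrity of the archive (assumed no-argument Test() call).
// sr: TSearchRec; diskCount: integer;
diskCount := 0;
if FindFirst('c:\amine\*', faAnyFile, sr) = 0 then
begin
  repeat
    if (sr.Attr and faDirectory) = 0 then inc(diskCount);
  until FindNext(sr) <> 0;
  FindClose(sr);
end;
if (pzr.Count = diskCount) and pzr.Test then
  writeln('safe to delete the originals')
else
  writeln('do not delete: counts differ or the integrity test failed');

===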

I have updated my Parallel archiver to a new version and decided to include the Parallel LZ4 compression algorithm (one of the fastest in the world), so to compress bigger data, such as terabytes of data, you can use my Parallel LZO or my Parallel LZ4 compression algorithms with my Parallel archiver. I have also added the high compression mode to the Parallel LZ4 compression algorithm: for the fast mode use clLZ4Fast and for the high compression mode use clLZ4Max. The Parallel LZ4 high compression mode is also interesting; it compresses much better than LZO and it is very fast on decompression, faster than Parallel LZO. I have included a test_plz4.pas demo inside my Parallel archiver zip file to show you how to use the Parallel LZ4 algorithm with my Parallel archiver.

Here is the LZ4 website if you want to read about it:

http://code.google.com/p/lz4/

I have also downloaded the IHCA compression algorithm from the following website:

http://objectegypt.com/

And I have written a Parallel IHCA and begun testing it against my Parallel LZO and my Parallel LZ4. They say on the IHCA website that it has the same performance as the LZO algorithm, but I have noticed in my benchmarks that Parallel IHCA (which I wrote) is much slower than my Parallel LZO and my Parallel LZ4, so I think the IHCA compression algorithm is poor quality software that you should avoid. So please use my Parallel archiver and Parallel compression library, because with my Parallel LZO and my Parallel LZ4 they are now among the fastest in the world.

I have also downloaded the QuickLZ algorithm from:

http://www.quicklz.com/

and I have written a Parallel QuickLZ and tested it against my Parallel LZO and Parallel LZ4. I have noticed that Parallel QuickLZ is slower than my Parallel LZ4 algorithm. Other than that, with QuickLZ you have to pay for a commercial license, but with my Parallel archiver and my Parallel compression library you pay $0 for a commercial license.

My Parallel archiver was updated: I have ported the Parallel LZ4 compression algorithm (one of the fastest in the world) to Windows 64 bit, and the Parallel LZ4 compression algorithm now works perfectly on Windows 32 bit and 64 bit. If you want to use the Windows 64 bit Parallel LZ4, just copy the lz4_2.dll inside the LZ4_64 directory (which you will find inside the zip file) to your current directory or to the c:\windows\SysWow64 directory, and if you want to use the Windows 32 bit Parallel LZ4, use the lz4_2.dll inside the LZ4_32 directory.

If you want to use the Windows 64 bit Parallel LZMA, just copy the LZMAStream1.dll inside the LZMA_fpc64 directory and the LZMAStream2.dll inside the LZMA_dcc64 directory to your current directory or to the c:\windows\SysWow64 directory, and if you want to use the Windows 32 bit Parallel LZMA, copy the LZMAStream1.dll inside the LZMA_fpc32 directory and the LZMAStream2.dll inside the LZMA_dcc32 directory to your current directory or to the c:\windows\system32 directory.

Here is more information about my Parallel archiver:

Parallel LZO supports Windows 32 bit and 64 bit

Parallel Zlib supports Windows 32 bit and 64 bit

Parallel LZ4 supports Windows 32 bit and 64 bit

Parallel LZMA supports Windows 32 bit and 64 bit

Parallel Bzip supports Windows 32 bit and 64 bit

Parallel ZSTD supports Windows 32 bit and 64 bit

But even if you compile it for 32 bit, my Parallel archiver will support terabyte files and your archive can grow to terabytes in size even with 32 bit Windows executables, and that's good.

And look also at the prices of the XCEED products:

XCEED Streaming compression library:

http://xceed.com/Streaming_ActiveX_Intro.html

and the XCEED Zip compression library:

http://xceed.com/Zip_ActiveX_Intro.html

http://xceed.com/pages/TopMenu/Products/ProductSearch.aspx?Lang=EN-CA

I don't think the XCEED products support parallel compression as my Parallel archiver and my Parallel compression library do.

And also look at the Easy Compression Library, for example; as you may have noticed, it is not a parallel compression library either.

http://www.componentace.com/ecl_features.htm

And look at its pricing:

http://www.componentace.com/order/order_product.php?id=4

My Parallel archiver and Parallel compression library cost you $0, and they are parallel compression libraries that are very fast and very easy to use. They support Parallel LZ, Parallel LZ4, Parallel LZO, Parallel Zlib, Parallel Bzip and Parallel LZMA, and they come with the source code and much more.

In my Parallel archiver version 3.21, I added an indicator of the percentage of the index processed when you are loading your index.

I also want to advise you to use an Intel SSD drive, because it's an excellent drive that doesn't have problems with power outages and because loading the index will be much faster.

Please read the following:

===

Conclusion

Right now, there is only one reliable SSD manufacturer: Intel. That really is the end of the discussion. It would appear that Intel is the only manufacturer of SSDs that provide sufficiently large on-board temporary power (probably in the form of supercapacitors) to cover writing back the entire cache when power is pulled, even when the on-board cache is completely full.

Read here: http://lkcl.net/reports/ssd_analysis.html

===

Hope you will enjoy my Parallel archiver.

Here are the public methods that I have implemented:

Constructor Create(file1:string;size:integer;nbrprocs:integer;processorgroups:boolean=false);

- Creates a new TPZArchiver ready to use. size is the hashtable size for the index (the key file names and the corresponding file positions), file1 is the archive file, nbrprocs is the number of cores you specify to run Zlib, LZ4, LZO, Bzip and LZMA in parallel, and the boolean parameter processorgroups enables processor group support on Windows; if it is set to true it will enable you to scale beyond 64 logical processors and it will be NUMA efficient. The exceptions raised by the constructor are ELoadIndex if it cannot load the index, or EUnknownFileFormat if it isn't the right archive file format.

Destructor Destroy;

- Destroys the TPZArchiver object and cleans up.

function AddFiles;

- Adds the files to the archive.

function AddDirectory;

- Adds a directory to the archive.

function AddStream;

- Adds the stream to the archive.

function DeleteFiles;

- Deletes the TStringList content from the archive.

function Erase;

- Erases the data inside the archive and inside the hashtable.

function Update;

- Updates the file or the stream inside the archive

function ExtractFiles;

- Extracts the TStringList content from the archive.

function ExtractAll;

- Extracts all the files from the archive or from the stream.

function Extract;

- Extracts the file to the stream.

procedure CopyArchive;

- Copies the archive to a stream.

function Test;

- Tests the integrity of the files inside the archive.

function GetInfo;

- Gets the file info that is returned in a TZSearchRec record.

function ClearFile;

- Deletes all contents of the archive.

function Clean:boolean

- Cleans the marked deleted items from the file.

function DeletedItems:integer

- Returns the number of items marked deleted.

function Exists(Name : String) : Boolean;

- Returns True if the file Name exists.

procedure GetKeys(Strings : Tstrings);

- Fills up a TStrings descendant with all the keys names.

procedure GetFiles(Strings : Tstrings);

- Fills up a TStrings descendant with all the file names.

function FixArchive(filename:string):boolean;

- To fix the archive

function FixArchive(Stream:TStream):boolean;

- To fix the stream

function AnalyzeArchive(filename:string):TypeError;overload;

- Analyzes the archive file; the returned TypeError is ctCorrupt if the file is corrupt, or ctUnknown if the file is of an unknown format.

function AnalyzeArchive(Stream:TStream):TypeError;overload;

- Analyzes the archive stream; the returned TypeError is ctCorrupt if the stream is corrupt, or ctUnknown if the stream is of an unknown format.

function Count : Integer;

- Returns the number of files inside the archive.

function compressionType(name:string):string

- Returns the compression type of the file.

PUBLIC PROPERTIES:

Indicator : boolean

- To show the compression and decompression indicator.

CompressionLevel;

- Sets and reads the compression level.

Overwrite:boolean

- To update and overwrite the file without asking.

Freshen: boolean

- Adds newer files to the archive and extracts newer files from the archive.

AddRecurse: boolean

- AddFiles() method will recurse on subdirectories.

Stream:TStream

- The archive is exposed as a TStream, use it for in-memory archive or disk archive.

AddAttributes: TAttrOptions

- FindFile attributes for the AddFiles() method, look inside FindFile component.

ExtractFullPath:boolean

- Extract full path or not.

AddFullPath:boolean

- Add full path or not.

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd is for Delphi mode.

Required Delphi switches: -$H+ -DDelphi

For Delphi XE-XE7 or Delphi 10.2 tokyo, use the -DXE switch

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems
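For example, a typical FPC compile command with the switches listed above could look like this (test_plz4.pas is just one of the demos mentioned earlier):

fpc -O3 -Sd -dFPC -dFreePascal test_plz4.pas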

Please click on the small arrow on the right of the zip file below to download.