Hardlinks & Softlinks


This short article aims to explain in simple terms what hard links and soft links are. If you want a more technically accurate picture of how things work, see the references at the end. The goal here is to understand the principle.


Put simply, a hard disk volume or partition (C:, D:, ...) is split into two parts: the index and the data.

Every time you write a file to the hard disk:

- The content of the file is written in the data part; this is called the data chunk.

- The index (the MFT) stores where the data chunk was written in the data part, along with the filename(s) and path(s) assigned to it, the security attributes, etc.

In an NTFS file system, the connection between the index and the user, which appears before our eyes as a filename and an icon, is either a hard link or a soft link. Every file you see in the file browser is therefore a hard link or a symbolic link.

Hard Links

The hard link is the one everybody uses without knowing it: each time you save a file, you create a hard link to the data chunk. A lesser-known fact is that you can assign up to 1023 file names to one data chunk, each with its own location, but always on the same partition.


All roads lead to Rome, which is always in the same place, so each sign to Rome is a reference to the same location. Depending on where you are in Italy, the path to Rome will vary, but not the destination.

The signs to Rome correspond to the references in the index, and Rome corresponds to the data chunk in the data part.

Concretely, what are hard links useful for?

Suppose you have 50 project folders on one hard disk, and each must contain presentation.txt, which presents your company with exactly the same content for every project.

Rather than copying and pasting this file into each of the 50 project folders, you create 50 hard links to the same data chunk that contains your text. Instead of occupying 50 times the size of presentation.txt, it occupies that size only once.
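As a rough illustration of the space saving, here is a minimal Python sketch. Python is not part of the original article, and the folder names, the file content and the `disk_usage` helper are assumptions made up for the example. It creates one presentation.txt plus 49 hard links in a temporary directory, then compares the total size as seen name by name with the size counted once per underlying data chunk.

```python
import os
import tempfile


def disk_usage(paths):
    """Return (apparent_bytes, unique_bytes) for the given file paths.

    apparent_bytes adds up the size shown for every name;
    unique_bytes counts each underlying data chunk only once.
    """
    apparent = 0
    unique = 0
    seen = set()
    for path in paths:
        info = os.stat(path)
        apparent += info.st_size
        key = (info.st_dev, info.st_ino)  # identifies the index entry / data
        if key not in seen:
            seen.add(key)
            unique += info.st_size
    return apparent, unique


with tempfile.TemporaryDirectory() as root:
    # Hypothetical content standing in for the company presentation.
    original = os.path.join(root, "presentation.txt")
    with open(original, "wb") as f:
        f.write(b"Our company in a few words.\n" * 100)

    paths = [original]
    for i in range(1, 50):
        project = os.path.join(root, f"project{i:02d}")
        os.mkdir(project)
        link = os.path.join(project, "presentation.txt")
        os.link(original, link)  # new name, same data chunk
        paths.append(link)

    apparent, unique = disk_usage(paths)
    print("apparent:", apparent, "bytes; actually stored:", unique, "bytes")
```

With 50 names pointing at one data chunk, the apparent total is 50 times the amount actually stored.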

The other advantage, besides the size on disk, is that if you edit any of these files (hard links), all of them appear to be amended at once, since they all reference the same data chunk. In fact the links themselves are not changed: they simply point to the same data chunk, so if it changes, opening any hard link to it gives access to the new content.

Rights are defined on the data chunk (and not on each of the paths that lead to it), so changing the rights through one hard link is reflected in all the others. If you want only the user "Boss" to be able to modify this file, you only need to set that up once.
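The shared-rights behaviour can be approximated with the small Python sketch below. The file names are hypothetical, and note the assumption: Python's `os.chmod` sets POSIX permission bits, and on Windows it only toggles the read-only flag rather than editing NTFS security attributes, so this is an analogy to the "Boss" example rather than a faithful reproduction of it.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "presentation.txt")
    second = os.path.join(root, "presentation_link.txt")
    with open(first, "w") as f:
        f.write("hello")
    os.link(first, second)  # second name for the same data

    # Make the file read-only, going through the first name only.
    os.chmod(first, 0o400)

    mode_first = stat.S_IMODE(os.stat(first).st_mode)
    mode_second = stat.S_IMODE(os.stat(second).st_mode)
    print(oct(mode_first), oct(mode_second))

    # Restore write access so the temporary directory can be cleaned up.
    os.chmod(first, 0o600)
```

Both names report the same permission bits, because the change was recorded once, on the shared entry, not on each path.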

To sum up:

    • When you create a file and save it to your hard disk, the data is recorded on the disk and the index references this file once.

    • If you copy and paste the file, the data is recorded again at another location on the hard drive and a new reference is indexed.

    • If you create a hard link to that file, nothing is written in the data part of the hard disk; the new path is simply added to the index.
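The three cases in this summary can be sketched in Python as follows. This is only an illustration under assumptions: the file names are invented, and `os.link` and `shutil.copy` stand in for "create a hard link" and "copy and paste" in the file browser.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "original.txt")
    copy = os.path.join(root, "copy.txt")
    hard = os.path.join(root, "hardlink.txt")

    # Case 1: a new file, referenced once by the index.
    with open(original, "w") as f:
        f.write("version 1")

    # Case 2: a copy, i.e. a second data chunk with its own reference.
    shutil.copy(original, copy)

    # Case 3: a hard link, i.e. one more name for the existing data chunk.
    os.link(original, hard)

    link_count = os.stat(original).st_nlink         # names for this data
    same_as_hard = os.path.samefile(original, hard)
    same_as_copy = os.path.samefile(original, copy)

    # Edit the content in place through the hard link name.
    with open(hard, "w") as f:
        f.write("version 2")

    with open(original) as f:
        seen_via_original = f.read()
    with open(copy) as f:
        seen_via_copy = f.read()

    print(link_count, same_as_hard, same_as_copy)
    print(seen_via_original, "|", seen_via_copy)
```

The sketch relies on the file being rewritten in place. A program that saves by writing a new file and swapping it in would leave the other names pointing at the old data.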

Symbolic Links

The principle is much the same, but instead of recording the new path to the data chunk in the file entries of the index (at the beginning of the MFT), it is recorded at the end of the MFT, in a part reserved for reparse points. A classic index entry is also created, meaning that a symlink is a kind of file, but it only contains an indicator telling the file system that it is linked to a reparse point; the path is then read from the reparse point data.

Reparse points are not built like file entries in the MFT. This allows them to point to a network location or another partition, but prevents extended attributes from being set on the reparse point file. There is also a theoretical limit of 31 links on the same path. A symlink is a "false" file or folder that can only point to another location: no data can be stored in a file linked to a reparse point apart from the path it redirects to. Reparse points are also used to make junctions and to mount volumes.


It is as if, by opening the door of your company's New York office, you walked into the Rome office.

With reparse points you are redirected seamlessly: by opening your file from New York, you open the one located in Rome. If you have the rights, any change you make to the file will be seen by all users who have access to it, wherever they are on the network.

What is it useful for?

To return to the previous example with 50 hard links, imagine that this hard drive is on the network and that the teams working on the projects are spread all over the world.

In this case, each user creates a symbolic link on their own hard drive that points to the folder(s) of the project(s) they are working on. When they open this folder, they have access to the linked files and can work on them seamlessly, as if the files were on their own hard drive.

To sum up:

    • Symbolic links are redirections to a file or folder located on the same computer or on another one.

    • A symlink is a file or folder with an indicator saying it is linked to a reparse point, with the linked path stored as data (in the reparse point data).

    • In theory, there is a limit of 31 links on the same path. It doesn't seem to exist on Windows 7. It is possible, though I'm not sure (Microsoft isn't consistent on this point), that the number of links on the same path is limited by the 16 KB technical limit.

    • If UAC is active and you create symlinks inside a system-protected folder (Program Files or any folder that requires elevation to be modified), most of the time those links won't work correctly and you will get an error saying that the link doesn't exist.

    • If you copy-paste or move a symlink, it is the target that gets copied, which shows that even the file browser is fooled by the seamless redirection symlinks offer (which is their purpose!).
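The redirection described in this summary can be sketched in Python. The folder names below are hypothetical and borrowed from the New York / Rome image, and both folders live in one temporary directory rather than on two machines. On Windows, creating symlinks may require an elevated prompt or Developer Mode, so treat this as a sketch rather than a guaranteed recipe.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # The real folder, standing in for the Rome office.
    target_dir = os.path.join(root, "rome_office")
    os.mkdir(target_dir)
    with open(os.path.join(target_dir, "report.txt"), "w") as f:
        f.write("written in Rome")

    # The symbolic link, standing in for the New York door.
    link_dir = os.path.join(root, "new_york_door")
    os.symlink(target_dir, link_dir, target_is_directory=True)

    is_link = os.path.islink(link_dir)    # the entry is flagged as a link
    stored_path = os.readlink(link_dir)   # the only data it holds: a path

    # Opening a file "through the door" reaches the file in the target.
    with open(os.path.join(link_dir, "report.txt")) as f:
        content = f.read()

    print(is_link, stored_path)
    print(content)
```

The link entry carries nothing but the path it redirects to; the report itself exists only once, in the target folder.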

So what does it all come down to?

The two link types (soft and hard) are similar in that both let you target a file seamlessly; the nuances come from their structure, which allows different things. The Windows shortcut brings more functionality but is seen by other applications as a file, and thus isn't transparent.

Links and allowed targets

The following diagram assumes that each link tries to point to a file or folder on hard drive 1 of computer 1, or to the printer attached to computer 2.

A ! means that the file or folder must exist at the time the hard or soft link is created.

Links and their uses

A hard link is a clever way to avoid useless redundancy, which often wastes space on the system hard drive. It can also serve to organize files.

A junction is a mix between hard and soft links: the target folder must exist for the link to be created, but if you delete the folder afterwards, the link still exists but no longer works.

A symbolic link is a handy way of organizing folders and files. It can serve the same purposes as a hard link to a limited extent, but is more powerful in that it can point to a network location or another partition.

A Windows shortcut (shell link) lets you assign complex parameters to the target file, and can also link to peripherals such as printers or scanners.
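What happens when the target disappears can be sketched in Python. Assumptions: the file names are hypothetical, and a symbolic link is used here to show the "link still exists but no longer works" behaviour described above for junctions, since Python's standard library has no call for creating junctions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "target.txt")
    soft = os.path.join(root, "soft.txt")
    hard = os.path.join(root, "hard.txt")
    with open(target, "w") as f:
        f.write("data")

    os.symlink(target, soft)  # redirection to the path of target.txt
    os.link(target, hard)     # second name for the same data

    os.remove(target)         # delete the original name

    soft_entry_exists = os.path.lexists(soft)  # the link entry is still there
    soft_resolves = os.path.exists(soft)       # but it leads nowhere
    with open(hard) as f:
        hard_content = f.read()                # the data is still reachable

    print(soft_entry_exists, soft_resolves, hard_content)
```

The symbolic link survives as a dangling entry, while the hard link keeps the data alive because it is still a reference to it.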

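Summary sheet

Criteria compared for each link type: seamless link; what editing the link edits; targeting a folder; targeting a file; targeting a file on another partition; targeting a network file or a file on another hard drive; targeting a printer; using a relative path; whether deleting the link deletes the target; whether deleting the target renders the link non-functional; number of links to the same target; launch options for the targeted file; type; approximate size (data part)***.

Shell link (.lnk)

    • Editing the link edits: the link

    • Number of links to the same target: infinite

    • Type: file

    • Approximate size (data part)***: 1 KB

Hard link

    • Editing the link edits: the target

    • Deleting the link deletes the target: if last link

    • Deleting the target renders the link non-functional: see *

    • Number of links to the same target: 1023

    • Type: file

    • Approximate size (data part)***: 0 KB

Junction

    • Editing the link edits: the target

    • Deleting the link deletes the target: see **

    • Number of links to the same target: 31

    • Type: reparse point

    • Approximate size (data part)***: 1 KB

Symbolic link

    • Editing the link edits: the target

    • Number of links to the same target: 31

    • Type: reparse point

    • Approximate size (data part)***: 1 KB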

* Unless it was the last reference, but then there is no link left once you delete it.

** Before Vista (Windows 2000/XP/2003), deleting a junction from the file explorer with Shift+Delete also deletes the target.

*** The size of a reparse point in the data part is generally the length of the targeted path.

To go further

NTFS Hard Links, Directory Junctions, and Windows Shortcuts

NTFS link behavior

References

Documentation

NTFS.com

Junctions Points and Symbolic Links

Technical documentation

NTFS file system

Symbolic Link effects on file systems functions

Reparse points

Reparse points and file operations

Hard Links and Junctions

Creating Symbolic links

Code

Programming considerations

Creating and opening files

File attribute constants

Determining Whether a directory is a mounted folder (or a symbolic link)

BY_HANDLE_FILE_INFORMATION Structure

CreateSymbolicLink Function

Utilities

Link Shell Extension - lets you view and manipulate hard links, symbolic links and junctions through Explorer

Junction Link Magic - lets you manipulate junctions, and list junctions and symbolic links