25 November 2009

Links

I just gave an impromptu lesson on symbolic links (symlinks) and hard links, complete with ASCII art, in #ubuntu-offtopic, and Topyli commented that simple explanations of this for beginners are hard to find, so here's a summary.

The purpose of a link is to allow you to have two (or more) paths to access the same data without having the data exist on disk multiple times, thus giving convenience without sacrificing disk space. So why are there two kinds of links and how do they work?

Symlinks (ln -s REALPATH LINK) work like this:

LINK --> REALPATH --> DATA

While hard links (ln PATH1 PATH2) work like this:

PATH1 --> DATA <-- PATH2

See what's happening here? In the symlink case, your link points to another path, which points to the data. In the hard link case, two paths point to the same data directly. I think I could get a lesson on pointers in C out of this ASCII art if I wanted to.

If you want a bit more background, your hard disk's filesystem contains a table of inode numbers, which is just like the Index at the back of a book. Symlinks are when you get "(see also: rubber ducky)" and hard links are when you get "Rubber ducky: 5" and "Sesame Street: 5" both showing up in the Index.

Since we can have multiple filesystems mounted on one machine (for example, my /home is on a separate partition), it is important to note that while a symlink can point to something located on another disk (or in a book, "Further reading: Little Red Riding Hood"), a hard link only knows about data on its own filesystem (i.e. the same partition). So, if you want to link from your hard disk to a flash drive, you need to use a symlink. This makes sense since your hard disk can't know if your flash drive rearranges things while it's plugged into another computer.
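If you'd like to see the Index entries for yourself, here is a rough sketch you can paste into a terminal. The throwaway directory and the "quack" contents are made up for the demo; PATH1, PATH2 and LINK are the same names as in the ASCII art. I'm assuming a typical Linux shell where ls -i and readlink are available.

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1

echo "quack" > PATH1     # an ordinary file: one name pointing at DATA
ln PATH1 PATH2           # hard link: a second name for the same DATA
ln -s PATH1 LINK         # symlink: a small file that just stores the path "PATH1"

# ls -i prints the inode number in front of each name.
# PATH1 and PATH2 should show the same number; LINK gets its own.
ls -i PATH1 PATH2 LINK

# readlink prints the path stored inside the symlink.
readlink LINK
```

The two hard-linked names share one inode number (both say "page 5"), while the symlink has its own inode whose only content is the "see also" path.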

How do these show up in ls? Hard links look like normal files. For symlinks, ls -l --color will show LINK -> REALPATH. If REALPATH is deleted, this will typically be highlighted as red text on a black background (the exact colors depend on your color settings).
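Here's a small sketch of telling them apart from the command line. Again, the temporary directory and file contents are invented for the example. One extra hint that isn't obvious from the colors: the second column of ls -l is the number of names pointing at the file, so a hard-linked file shows 2 (or more) there.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

echo "quack" > PATH1
ln PATH1 PATH2
ln -s PATH1 LINK

# In ls -l, a symlink's line starts with "l" and ends with "LINK -> PATH1".
# PATH1 and PATH2 start with "-" like any regular file, and their
# second column (the link count) reads 2.
ls -l

# [ -L name ] is true only when name itself is a symlink.
for name in PATH1 PATH2 LINK; do
    if [ -L "$name" ]; then
        echo "$name is a symlink to $(readlink "$name")"
    else
        echo "$name is not a symlink"
    fi
done
```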

Speaking of deletion, how does that work? Well, if you remove LINK, REALPATH and DATA will still exist. If you remove REALPATH, DATA goes away too (assuming REALPATH was the only name it had) and LINK just points at nothing (though if you add REALPATH back, LINK will start working again, as it only goes by filename). As for hard links, DATA goes away once no more filenames point to it. As mentioned before, hard links point directly to the data, so this means removing all links, including the original filename. So if I remove the original filename (PATH1), PATH2 will still point to DATA.
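The deletion rules are easy to try out. This is only a sketch in a scratch directory, with "quack" and "moo" as made-up file contents:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# Symlink case: LINK --> REALPATH --> DATA
echo "quack" > REALPATH
ln -s REALPATH LINK
rm REALPATH                    # the only real name is gone, so DATA goes too
cat LINK 2>/dev/null || echo "LINK is dangling"
echo "moo" > REALPATH          # put a new file back under the same name
cat LINK                       # LINK works again and shows the new contents

# Hard link case: PATH1 --> DATA <-- PATH2
echo "quack" > PATH1
ln PATH1 PATH2
rm PATH1                       # remove the "original" name
cat PATH2                      # DATA is still reachable through PATH2
```

Note that the revived LINK shows "moo", not "quack": the symlink never knew about the old data, only the name.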

I hope that's a straightforward enough explanation of how it works.


9 comments:

Jonathan Blackhall said...

That was a good explanation. Do you know offhand whether links created in Nautilus are hard or soft links?

Mackenzie said...

I *think* they're soft links. I don't use GNOME (and I'm in a school computer lab right now with my laptop disassembled), but you can always make a link and check with "ls -l" then report back.

qense said...

Informative post! I always vaguely knew the differences between the two, but now I really know it!

Could you say that all directories are hard links, because they are pointers to certain inodes?

By the way, Nautilus indeed creates softlinks.

Mackenzie said...

No, directories point to lists of other inode numbers.

According to jdong, OSX is the only UNIX with support for directory hardlinks, which is what makes Time Machine so fast. When a directory hasn't changed since an earlier snapshot, it just hardlinks the new snapshot's directory to that existing one. He says this means Linux gets unhappy when you try to use it for data recovery on a Time Machine drive.

Nigel said...

I had this doubt for quite some time, thanks for explaining it easily :)

MrCorey said...

I think that this is a great explanation, Mackenzie. As usual, you've provided a clear explanation for a tool that many people could use but might be a bit intimidated by.

Nice mention in The Fridge, BTW. You'll have to put Daniel on the spot now. ;)

Mackenzie said...

Put him on the spot? I don't get it. I knew the interview request would come since he sends the questions out immediately when a new MOTU is voted in.

Boarderpatrol said...

Good job, simple and to the point. Your example made a lot of sense.

chris* said...

Thank you very much for this straightforward explanation. Exactly what I was looking for.