2. Everything is a File

The Starter Guide introduced the concepts of file ownership and access permissions, but really understanding the UNIX® file system (and this also applies to Linux file systems) requires that we redefine what a file is.

Here, “everything” really means everything. A hard disk, a partition on a hard disk, a parallel port, a connection to a web site, an Ethernet card: all these are files. Even directories are files. Linux recognizes many types of files in addition to the standard files and directories. Note that by file type here, we do not mean the type of content of a file: for GNU/Linux and any UNIX® system, a file, whether it is a PNG image, a binary file or anything else, is just a stream of bytes. Differentiating files according to their contents is left to applications.
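To make the “stream of bytes” idea concrete, here is a minimal sketch that reads the first four bytes of a regular file and of the device file /dev/zero with exactly the same command. The helper name first_bytes_hex and the temporary file are assumptions made for this illustration only.

```shell
# Print the first 4 bytes of any file as hex, whatever kind of file it is.
# first_bytes_hex is a hypothetical helper name used only for this example.
first_bytes_hex() {
    dd if="$1" bs=1 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n'
    echo
}

# A regular file holding the four characters "ABCD".
tmpfile=$(mktemp)
printf 'ABCD' > "$tmpfile"

first_bytes_hex "$tmpfile"    # a regular file
first_bytes_hex /dev/zero     # a device file, read in the same way

rm -f "$tmpfile"
```

Neither dd nor od needs to know what kind of file it is reading: both simply consume bytes.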

2.1. The Different File Types

When you issue ls -l, the character before the access rights identifies the file type. We have already seen two types of files: regular files (-) and directories (d). You can also find other types if you wander through the file tree and list the contents of directories:

  1. Character mode files: they are either special system files (such as /dev/null, which we have already discussed), or peripherals (serial or parallel ports), which share the trait that their contents (if they have any) are not buffered (meaning they are not kept in memory). Such files are identified by the letter c.

  2. Block mode files: these files are peripherals, and unlike character files, their contents are buffered. This category includes hard disks, partitions on a hard disk, floppy drives, CD-ROM drives and other storage devices. /dev/hda and /dev/sda5 are examples of block-mode files. Such files are identified by the letter b.

  3. Symbolic links: these files are very common and heavily used in the Mandriva Linux system start-up procedure (see Chapter 11, The Start-Up Files: init sysv). As their name implies, their purpose is to link files in a symbolic way, which means that they are files whose content is the path to a different file. The file they point to does not have to exist. They are very frequently called soft links, and such files are identified by the letter l.

  4. Named pipes: in case you were wondering, yes, these are very similar to the pipes used in shell commands, with the difference that these actually have names. However, they are rare, and you are unlikely to see one during your journey through the file tree. Such files are identified by the letter p. See Section 4, ““Anonymous” Pipes and Named Pipes”.

  5. Sockets: this is the file type for all network connections, but only a few of them have names. What's more, there are different types of sockets and only one type (UNIX domain sockets) can be linked into the file tree, but this is way beyond the scope of this book. Such files are identified by the letter s.

Here is a sample of each file:

$ ls -l /dev/null /dev/sda  /etc/rc.d/rc3.d/S20random /proc/554/maps \
/tmp/ssh-queen/ssh-510-agent
crw-rw-rw-    1 root     root       1,   3 May  5  1998 /dev/null
brw-rw----    1 root     disk       8,   0 May  5  1998 /dev/sda
lrwxrwxrwx    1 root     root           16 Dec  9 19:12 /etc/rc.d/rc3.d/
  S20random -> ../init.d/random*
pr--r--r--    1 queen  queen         0 Dec 10 20:23 /proc/554/maps|
srwx------    1 queen  queen         0 Dec 10 20:08 /tmp/ssh-queen/
  ssh-510-agent=
$
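The same type letters can be recovered from a script with the shell's file test operators. The following is a minimal sketch, not a replacement for ls: the function name file_type_letter and the files created in the temporary directory are assumptions made for this example.

```shell
# Print the ls -l type letter for a path, using the shell's file tests.
# file_type_letter is a hypothetical helper, not a standard command.
file_type_letter() {
    # -L must be tested first: the other tests follow symbolic links.
    if   [ -L "$1" ]; then printf '%s\n' l
    elif [ -c "$1" ]; then printf '%s\n' c
    elif [ -b "$1" ]; then printf '%s\n' b
    elif [ -p "$1" ]; then printf '%s\n' p
    elif [ -S "$1" ]; then printf '%s\n' s
    elif [ -d "$1" ]; then printf '%s\n' d
    elif [ -f "$1" ]; then printf '%s\n' -
    else printf '%s\n' '?'
    fi
}

workdir=$(mktemp -d)
mkfifo "$workdir/pipe"                          # a named pipe
ln -s /dev/null "$workdir/link"                 # a symbolic link to a device file
ln -s "$workdir/missing" "$workdir/dangling"    # a link whose target does not exist

for f in /dev/null "$workdir" "$workdir/pipe" "$workdir/link" "$workdir/dangling"; do
    printf '%s %s\n' "$(file_type_letter "$f")" "$f"
done

rm -rf "$workdir"
```

Note that the dangling link is still reported as l: a symbolic link is a file in its own right, whether or not its target exists.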

2.2. Inodes

Inodes are, along with the “Everything Is a File” paradigm, a fundamental part of any UNIX® file system. The word inode is most commonly explained as short for “index node”, although the exact origin of the name is uncertain.

Inodes are stored on disk in an inode table. They exist for all types of files which may be stored on a file system, including directories, named pipes, character-mode files and so on. This leads to another famous saying: “The inode is the file”. Inodes are how UNIX® identifies a file in a unique way.

No, you didn't misread that: in UNIX®, you do not identify a file by its name, but by its inode number[4]. The reason for this is that the same file may have several names, or even no name. In UNIX®, a file name is just an entry in a directory inode. Such an entry is called a link. Let us look at links in more detail.
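As a quick preview, the following sketch gives one file two names and shows that both names refer to the same inode. The helper inode_of and the file names first and second are assumptions made for this illustration; ls -i is the standard option that prints inode numbers.

```shell
# Print the inode number of a path (first column of ls -di output).
# inode_of is a hypothetical helper name used only for this example.
inode_of() {
    ls -di "$1" | awk '{print $1}'
}

workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/first"

# Give the same file a second name, that is, a second link to its inode.
ln "$workdir/first" "$workdir/second"

# Both names report the same inode number.
inode_of "$workdir/first"
inode_of "$workdir/second"

# Removing one name does not remove the file: the inode is still
# reachable through the remaining name.
rm "$workdir/first"
cat "$workdir/second"

rm -rf "$workdir"
```

The actual inode numbers printed depend on the file system, so they will differ from one machine to another.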



[4] Important: note that inode numbers are unique per file system, which means that an inode with the same number can exist on another file system. This leads to the difference between on-disk inodes and in-memory inodes. While two on-disk inodes may have the same number if they are on two different file systems, in-memory inodes have a number that is unique across the whole system. One way to obtain uniqueness, for example, is to hash the on-disk inode number together with the block device identifier.
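From user space, the same idea can be seen by pairing the device number with the inode number: the pair identifies a file across the whole system, while the inode number alone does not. This is only a sketch of the principle, not of how the kernel builds its in-memory inodes. It assumes the stat command from GNU coreutils, where -c takes a format string, %d is the device number and %i is the inode number; file_identity is a hypothetical helper name.

```shell
# Print "device:inode" for a path.
# Assumes GNU coreutils stat; file_identity is a hypothetical helper name.
file_identity() {
    stat -c '%d:%i' "$1"
}

workdir=$(mktemp -d)
: > "$workdir/a"
ln "$workdir/a" "$workdir/b"     # a second name for the same inode
: > "$workdir/c"                 # an unrelated file on the same file system

file_identity "$workdir/a"
file_identity "$workdir/b"       # same pair as a
file_identity "$workdir/c"       # same device, different inode

rm -rf "$workdir"
```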