Linux - Files

Path is an illusion

In Linux, the "path" is an illusion created for humans. The kernel does not identify a file as /home/user/photo.jpg; it resolves that path to an inode number and works with the inode.

inode

inode = index node: a data structure that describes a file or a directory. For a user, a file is a path like /home/yourname/whatever.txt; for the operating system, the inode stores the file's metadata, such as file type, owner, group, access permissions, size, and last-modified time. The file name itself is not stored in the inode; it lives in the directory entry that points to the inode.
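To see the metadata an inode holds, you can ask stat for it directly. This is a minimal sketch, assuming GNU coreutils stat (the -c format option; BSD/macOS stat uses -f with different format codes) and a throwaway file in a temporary directory:

```shell
# Assumption: GNU coreutils stat. The file lives in a temp dir so nothing real is touched.
tmpdir=$(mktemp -d)
echo "hello" > "$tmpdir/whatever.txt"

# %i inode number, %F file type, %U/%G owner and group,
# %a permissions (octal), %s size in bytes, %y last-modified time
stat -c 'inode=%i type=%F owner=%U group=%G mode=%a size=%s mtime=%y' "$tmpdir/whatever.txt"

# ls -i prints the inode number in front of the file name.
ls -i "$tmpdir/whatever.txt"

# Keep two fields in variables for later use.
inode=$(stat -c '%i' "$tmpdir/whatever.txt")
size=$(stat -c '%s' "$tmpdir/whatever.txt")
echo "inode $inode holds $size bytes"
```

Note that the file name does not appear among the stat fields; it is only the argument used to find the inode.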

Manage Files

Create Files

$ touch a.txt             # Create an empty file
$ echo "hello" > b.txt    # Create a file with "hello"
$ echo "world" >> b.txt   # Append "world" to the existing file

Remove Files

$ rm a.txt

Manage Directories

$ mkdir dir1      # Create a directory
$ rmdir dir1      # Remove an empty directory
$ rm -r dir2      # Remove a directory and everything inside it

Manage Links

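ln -s creates a symbolic link (also called a soft link or symlink) through the symlink system call; without -s, ln creates a hard link through the link system call.

The first argument is the existing file or directory to point to; the second is the name of the link to create. sudo is only needed if you cannot write to the directory where the link will live.

$ ln -s <target_path> <link_path>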

For example:

$ ln -s 0.1.0-SNAPSHOT/ snapshot
$ ls -l
... snapshot -> 0.1.0-SNAPSHOT/
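The difference between the two kinds of link shows up in the inode numbers. The sketch below assumes GNU coreutils stat and uses made-up file names (original.txt, hard.txt, soft.txt) in a temporary directory:

```shell
# Assumption: GNU coreutils stat. All file names here are illustrative.
tmpdir=$(mktemp -d)
echo "hello" > "$tmpdir/original.txt"

ln "$tmpdir/original.txt" "$tmpdir/hard.txt"   # hard link: a second name for the same inode
ln -s original.txt "$tmpdir/soft.txt"          # symlink: its own inode, storing the path "original.txt"

orig_inode=$(stat -c '%i' "$tmpdir/original.txt")
hard_inode=$(stat -c '%i' "$tmpdir/hard.txt")
soft_inode=$(stat -c '%i' "$tmpdir/soft.txt")   # stat does not follow symlinks unless -L is given
link_count=$(stat -c '%h' "$tmpdir/original.txt")
echo "original=$orig_inode hard=$hard_inode soft=$soft_inode links=$link_count"

# Remove the original name. The data survives through the hard link,
# while the symlink now points at a path that no longer exists.
rm "$tmpdir/original.txt"
hard_content=$(cat "$tmpdir/hard.txt")
if [ -e "$tmpdir/soft.txt" ]; then soft_state="ok"; else soft_state="dangling"; fi
echo "hard link still reads: $hard_content; symlink is $soft_state"
```

The hard link shares the original's inode, so the data is only freed when the last name is removed. The symlink just remembers a path.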

File Info

Use ls -l to check file info:

$ ls -l
-rw-rw-rw- 1 ubuntu ubuntu    13 Mar 12 16:56 a.txt

Type

The first character is the type of the file:

  • -: regular file
  • d: directory
  • l: symbolic link

Permissions

Next are 3 groups of rwx describing the permissions:

  • 1st rwx: permissions for owner of the file
  • 2nd rwx: permissions for members of the file's group
  • 3rd rwx: permissions for all other users

where

  • r: permission to read
  • w: permission to write
  • x: permission to execute
  • -: no permission

Instead of rwx, we can also use a number between 0 and 7 to describe the permissions: map rwx to the binary form of the number, for example,

  • 7 => 111 => rwx: you have all the permissions
  • 6 => 110 => rw-: you can read and write but not execute
  • 5 => 101 => r-x: you can read and execute but not write
  • 4 => 100 => r--: read only
  • 0 => 000 => ---: no permission at all

Then a file's permission can be described by 3 numbers, e.g. 755 is equivalent to rwxr-xr-x.
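The digit-to-rwx mapping is plain bit testing: 4 is the read bit, 2 the write bit, 1 the execute bit. Here is a small sketch in POSIX sh; digit_to_rwx and mode_to_rwx are hypothetical helper names made up for this note, not standard commands:

```shell
# Hypothetical helpers for illustration only.

# Turn one octal digit (0-7) into its rwx triplet by testing each bit.
digit_to_rwx() {
    d=$1
    out=""
    if [ $((d & 4)) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $((d & 2)) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $((d & 1)) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
    printf '%s' "$out"
}

# Turn a mode such as 755 into the 9-character string, one digit at a time.
mode_to_rwx() {
    mode=$1
    result=""
    while [ -n "$mode" ]; do
        first=${mode%"${mode#?}"}                     # first character of what is left
        result="${result}$(digit_to_rwx "$first")"
        mode=${mode#?}                                # drop that character
    done
    printf '%s\n' "$result"
}

mode_to_rwx 755   # prints rwxr-xr-x
mode_to_rwx 640   # prints rw-r-----
```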

To change a file's permission, use chmod:

$ chmod 755 a.txt

or use something like this if you do not like the numbers:

$ chmod a+rw a.txt

where a is for all users, + is to add permissions, rw is for read and write. Check $ man chmod for all available options.
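To check what chmod actually did, read the mode back with stat. A minimal sketch, assuming GNU coreutils stat (%a prints the octal mode, %A the rwx form) and a scratch file in a temporary directory:

```shell
# Assumption: GNU coreutils stat. The file is a scratch file, safe to modify.
tmpdir=$(mktemp -d)
f="$tmpdir/a.txt"
touch "$f"

chmod 755 "$f"                      # numeric form: set all nine bits at once
mode_numeric=$(stat -c '%a' "$f")
stat -c '%a %A' "$f"

chmod 600 "$f"                      # start from owner-only read/write
chmod a+rw "$f"                     # symbolic form: add read and write for everyone
mode_added=$(stat -c '%a' "$f")
stat -c '%a %A' "$f"

chmod go-w "$f"                     # remove write from group (g) and others (o)
mode_removed=$(stat -c '%a' "$f")
stat -c '%a %A' "$f"

echo "$mode_numeric -> $mode_added -> $mode_removed"   # prints 755 -> 666 -> 644
```

The numeric form replaces the whole mode, while the symbolic form only changes the bits you name, which is handy when you do not know the current mode.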

Owners

-rw-rw-rw- 1 ubuntu ubuntu    13 Mar 12 16:56 a.txt
             ------ ------
                |      |
                |      |---- group owner of the file
                |----------- owner of the file

To change owners, use chown:

$ sudo chown root:root a.txt

The first root is the OWNER while the second root is the GROUP.

List Files

Show file sizes in different units

$ ls -l --block-size=M
$ ls -l --block-size=K

Set Color

$ ls --color=auto
$ ls --color=tty
$ ls --color=none

Search Files

Find files with .txt suffix in home directory

$ find ~ -name "*.txt"

Or use locate

$ locate passwd

locate searches a prebuilt database (refreshed by updatedb) rather than walking the directory tree, so it is fast but can miss files created since the last update.

which locates executables on your PATH; whereis lists locations for binaries, sources, and man pages.

grep is used to search inside the content.
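Searching by name and searching by content combine well. The sketch below builds a tiny made-up directory tree in a temporary directory and assumes a grep that supports -r (GNU and BSD grep both do, though -r is not in POSIX):

```shell
# Assumption: grep supports -r. Directory and file names are illustrative.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/docs/notes"
echo "hello world"  > "$tmpdir/docs/a.txt"
echo "nothing here" > "$tmpdir/docs/notes/b.txt"
echo "hello again"  > "$tmpdir/docs/notes/c.md"

# By name: every regular file ending in .txt, at any depth.
txt_files=$(find "$tmpdir" -type f -name "*.txt" | sort)
echo "$txt_files"

# By content: -r recurses into directories, -l prints only the matching file names.
hello_files=$(grep -rl "hello" "$tmpdir" | sort)
echo "$hello_files"

# Both: let find pick the .txt files, then let grep look inside them.
hello_txt=$(find "$tmpdir" -type f -name "*.txt" -exec grep -l "hello" {} +)
echo "$hello_txt"
```

Only a.txt passes both filters: b.txt has the right name but the wrong content, and c.md has the right content but the wrong name.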

File Magic

While Windows relies heavily on file extensions (like .exe or .txt) to identify files, Linux tools usually look at the content instead, a technique known as File Magic.

What is a "Magic Number"?

A magic number is a constant numerical or text value placed at the very beginning of a file so that the operating system can quickly identify its format.

For example:

$ hexdump -C -n 4 /bin/ls

The options:

  • -C: Canonical hex+ASCII display.
  • -n 4: Read only the first 4 bytes.

The output 7f 45 4c 46 translates to:

  • 7f: A non-printable DEL character (used to prevent the file from being mistaken for a plain text file).
  • 45: ASCII E
  • 4c: ASCII L
  • 46: ASCII F

This .ELF signature tells the Linux kernel: "This is a binary executable that I know how to run."

The file Command and libmagic

The file command is the standard tool for reading these signatures. It doesn't care what the file is named; it only cares about the content.

If you copy /bin/ls to ls.txt, Windows might try to open the copy in Notepad. But Linux will still know exactly what it is. (Work on a copy; renaming the real /bin/ls would break the ls command.)

$ cp /bin/ls /tmp/ls.txt
$ file /tmp/ls.txt
# Output: /tmp/ls.txt: ELF 64-bit LSB shared object, x86-64...

The file command is built on a library called libmagic, which other programs can also link against to detect file types from content rather than trusting the file name.

The Magic Database (magic.mgc)

These signatures are stored in a database.

  • The Compiled Database: /usr/share/misc/magic.mgc
    • This is a binary (compiled) version of the definitions. It is used because searching through thousands of file signatures in plain text would be too slow.
  • The Source Files: /usr/share/misc/magic/ (or similar directory)
    • In many distributions, you can find the human-readable source files here. They contain the logic used to identify files.

How a "Magic" rule looks (Simplified):

Inside the source files, a rule looks something like this:

# Offset    Type       Test Value    Message
0           string     \x7fELF       ELF Binary
>4          byte       1             32-bit
>4          byte       2             64-bit

This tells the engine: "Go to the start (0), look for \x7fELF. If found, it's an ELF. Then go to byte 4; if it's a 1, it's 32-bit; if it's a 2, it's 64-bit."
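The same two-step check can be written by hand. This is only a sketch of the simplified rule above, not how libmagic is implemented; elf_class is a hypothetical helper name, and the sample files are fabricated with printf so the example does not depend on any particular binary:

```shell
# Hypothetical helper: apply the simplified ELF rule to one file.
elf_class() {
    # od -An: no offset column, -tx1: one hex byte per item, -N5: read only the first 5 bytes
    bytes=$(od -An -tx1 -N5 "$1" | tr -d ' \n')
    case "$bytes" in
        7f454c4601) echo "ELF Binary 32-bit" ;;   # bytes 0-3 match, byte 4 is 1
        7f454c4602) echo "ELF Binary 64-bit" ;;   # bytes 0-3 match, byte 4 is 2
        7f454c46*)  echo "ELF Binary" ;;          # signature matches, class byte is something else
        *)          echo "not ELF" ;;
    esac
}

# Fabricated 5-byte samples (octal escapes: \177 is 0x7f).
tmpdir=$(mktemp -d)
printf '\177ELF\001' > "$tmpdir/fake32"
printf '\177ELF\002' > "$tmpdir/fake64"
echo "plain text"    > "$tmpdir/note.txt"

elf_class "$tmpdir/fake32"     # prints ELF Binary 32-bit
elf_class "$tmpdir/fake64"     # prints ELF Binary 64-bit
elf_class "$tmpdir/note.txt"   # prints not ELF
```

You can also point it at a real binary, for example elf_class /bin/ls, if that path exists on your system.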

Common Magic Numbers to Know

Aside from ELF, here are some common signatures you will see in the wild:

File Type    Hex Signature   ASCII / Note
JPEG         ff d8 ff        Standard image
PNG          89 50 4e 47     .PNG
PDF          25 50 44 46     %PDF
ZIP          50 4b 03 04     PK.. (after Phil Katz, creator of ZIP)
Java Class   ca fe ba be     CAFEBABE (classic Easter egg)
Script       23 21           #! (the "shebang")
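These signatures are enough for a toy detector. The sketch below is a lookup over the first four bytes only; guess_type is a hypothetical helper name, and the real file command checks far more than this:

```shell
# Hypothetical helper: map the first 4 bytes of a file to a type name.
guess_type() {
    hex=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$hex" in
        ffd8ff*)  echo "JPEG" ;;
        89504e47) echo "PNG" ;;
        25504446) echo "PDF" ;;
        504b0304) echo "ZIP" ;;
        cafebabe) echo "Java class" ;;
        7f454c46) echo "ELF" ;;
        2321*)    echo "script" ;;
        *)        echo "unknown" ;;
    esac
}

# Fabricated samples: only the leading bytes are real, the rest is filler.
tmpdir=$(mktemp -d)
printf '%%PDF-1.7\n'       > "$tmpdir/report.bin"
printf '\211PNG\r\n'       > "$tmpdir/picture.bin"
printf 'PK\003\004'        > "$tmpdir/archive.bin"
printf '\312\376\272\276'  > "$tmpdir/Main.bin"
printf '#!/bin/sh\n'       > "$tmpdir/run.bin"

for f in report picture archive Main run; do
    echo "$f.bin: $(guess_type "$tmpdir/$f.bin")"
done
```

Every sample carries the same .bin extension on purpose: the name tells the detector nothing, the leading bytes tell it everything.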

Why file --version is Important

Running file --version tells you exactly which database your system is using:

$ file --version
file-5.46
magic file from /etc/magic:/usr/share/misc/magic

This is useful for:

  1. Debugging: If file is misidentifying a new file format, you can check whether your database is outdated.
  2. Custom Magic: You can add your own signatures to /etc/magic (the local override file). If you invent a new file format for a custom app, you can tell your Linux system how to recognize it by adding a line there.
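Before editing /etc/magic, you can try a rule out with file -m, which loads a magic file for that one run. A minimal sketch, assuming the file command is installed; the MYFMT signature, the message text, and the file names are all invented for this example:

```shell
# Assumption: the file command is installed. MYFMT is a made-up format.
tmpdir=$(mktemp -d)

# A sample file whose first bytes are the ASCII string "MYFMT".
printf 'MYFMT\001\002\003' > "$tmpdir/sample.bin"

# One rule: offset, type, test value, message (tab-separated).
printf '0\tstring\tMYFMT\tMy custom format data\n' > "$tmpdir/my.magic"

# -m: use this magic file for this run only; -b: leave the file name out of the output.
detected=$(file -b -m "$tmpdir/my.magic" "$tmpdir/sample.bin")
echo "$detected"
```

Once the rule behaves as expected, the same line can go into /etc/magic so that a plain file call picks it up.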