Linux - Files
Path is an illusion
In Linux, the "path" is an illusion created for humans. The kernel doesn't think in terms of /home/user/photo.jpg; it resolves the path to an inode number and works with that.
inode
inode = index node, a data structure that describes a file or a directory. For a user, a file is a path like /home/yourname/whatever.txt; for the operating system, the inode stores the file's metadata: file type, owner, group, access permissions, size, last modified time, etc. The file name itself is not stored in the inode; it lives in the directory entry that points to the inode.
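To make the inode fields concrete, here is a minimal Python sketch that reads them through os.stat, which wraps the stat system call. The describe helper and the temporary file name are made up for illustration.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a few of the fields the kernel keeps in the file's inode."""
    info = os.stat(path)  # wraps the stat() system call
    return {
        "inode": info.st_ino,                          # inode number
        "type_and_mode": stat.filemode(info.st_mode),  # e.g. '-rw-r--r--'
        "owner_uid": info.st_uid,
        "group_gid": info.st_gid,
        "size_bytes": info.st_size,
        "modified": time.ctime(info.st_mtime),
        "hard_links": info.st_nlink,
    }


# Work in a throwaway directory so nothing on the real system is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "whatever.txt")
    with open(path, "w") as f:
        f.write("hello\n")
    details = describe(path)
    for key, value in details.items():
        print(f"{key}: {value}")
```

Note that the file name does not appear among these fields; only the path passed in by the caller knows it.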
Manage Files
Create Files
$ touch a.txt # Create an empty file
$ echo "hello" > b.txt # Create a file with "hello"
$ echo "world" >> b.txt # append content to the existing file
Remove Files
$ rm a.txt
Manage Directories
$ mkdir dir1 # Create a directory
$ rmdir dir1 # Remove an empty directory
$ rm -r dir2 # Remove a directory and everything inside it
Manage Links
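ln -s creates symbolic links (also called soft links or symlinks) via the symlink system call; without -s, ln creates hard links via the link system call. The first argument is the existing file or directory to point to; the second is the name of the link to create. sudo is only needed when the link lives in a directory you cannot write to.
$ ln -s <target_path> <link_path>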
For example:
$ ln -s 0.1.0-SNAPSHOT/ snapshot
$ ls -l
... snapshot -> 0.1.0-SNAPSHOT/
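The difference between the two kinds of link is easiest to see through inode numbers. The following Python sketch, run in a temporary directory with made-up file names, calls os.link and os.symlink (thin wrappers over the link and symlink system calls) and then deletes the original file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")
    hard = os.path.join(tmp, "hard.txt")
    soft = os.path.join(tmp, "soft.txt")

    with open(original, "w") as f:
        f.write("hello\n")

    os.link(original, hard)     # like `ln original.txt hard.txt`
    os.symlink(original, soft)  # like `ln -s original.txt soft.txt`

    # A hard link is a second name for the same inode.
    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink

    # A symlink is a separate small file that stores a path.
    # lstat() inspects the link itself instead of following it.
    soft_is_link = os.path.islink(soft)
    soft_has_own_inode = os.lstat(soft).st_ino != os.stat(original).st_ino
    soft_target = os.readlink(soft)

    os.remove(original)
    hard_survives = os.path.exists(hard)     # data still reachable by the other name
    soft_dangles = not os.path.exists(soft)  # the stored path no longer resolves

    print("hard link shares inode:", same_inode)
    print("link count before removal:", link_count)
    print("hard link survives removal:", hard_survives)
    print("symlink dangles after removal:", soft_dangles)
```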
File Info
Use ls -l to check file info:
$ ls -l
-rw-rw-rw- 1 ubuntu ubuntu 13 Mar 12 16:56 a.txt
Type
The first char is the type of the file:
- -: regular file
- d: directory
- l: symbolic link
Permissions
Next are 3 groups of rwx describing the permissions:
- 1st rwx: permissions for owner of the file
- 2nd rwx: permissions for group owners of the file
- 3rd rwx: permissions for all other users
where
- r: permission to read
- w: permission to write
- x: permission to execute
- -: no permission
Instead of rwx, we can also use a number between 0 and 7 to describe the permissions: map rwx to the binary form of the number, for example,
- 7=>111=>rwx: you have all the permissions
- 6=>110=>rw-: you can read and write but not execute
- 5=>101=>r-x: you can read and execute but not write
- 4=>100=>r--: read only
- 0=>000=>---: no permission at all
Then a file's permission can be described by 3 numbers, e.g. 755 is equivalent to rwxr-xr-x.
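The digit-to-rwx mapping is plain bit arithmetic, which a short Python sketch can show. The function names digit_to_rwx and mode_to_rwx are invented for this example.

```python
def digit_to_rwx(digit):
    """Map one octal digit (0-7) to its rwx form, e.g. 5 -> 'r-x'."""
    bits = format(digit, "03b")  # 5 -> '101'
    return "".join(
        letter if bit == "1" else "-"
        for letter, bit in zip("rwx", bits)
    )


def mode_to_rwx(mode):
    """Map a three-digit octal mode to nine characters, e.g. 0o755 -> 'rwxr-xr-x'."""
    # Shift out the owner, group, and other digits in turn (3 bits each).
    return "".join(digit_to_rwx((mode >> shift) & 0b111) for shift in (6, 3, 0))


print(mode_to_rwx(0o755))
print(mode_to_rwx(0o640))
```

The 0o prefix matters in Python: 755 without it would be a decimal number with different bits.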
To change a file's permission, use chmod
$ chmod 755 a.txt
or use something like this if you do not like the numbers:
$ chmod a+rw a.txt
where a stands for all users (owner, group, and others), + adds permissions, and rw means read and write. See man chmod for all available options.
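Both chmod forms can be sketched in Python with os.chmod. The numeric form sets the mode outright, while the a+rw form is modelled here as OR-ing extra bits into the current mode; that is an approximation of what the chmod command does, not its actual implementation. The file is a throwaway one in a temporary directory.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    open(path, "w").close()

    # Numeric form, like `chmod 755 a.txt`: replace the whole mode.
    os.chmod(path, 0o755)
    after_numeric = stat.filemode(os.stat(path).st_mode)

    # Symbolic form, like `chmod a+rw a.txt`: add read and write bits
    # for owner, group, and others to whatever is already set.
    rw_for_all = (
        stat.S_IRUSR | stat.S_IWUSR
        | stat.S_IRGRP | stat.S_IWGRP
        | stat.S_IROTH | stat.S_IWOTH
    )
    current = stat.S_IMODE(os.stat(path).st_mode)  # permission bits only
    os.chmod(path, current | rw_for_all)
    after_symbolic = stat.filemode(os.stat(path).st_mode)

    print(after_numeric)
    print(after_symbolic)
```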
Owners
-rw-rw-rw- 1 ubuntu ubuntu 13 Mar 12 16:56 a.txt
             ------ ------
               |      |
               |      |---- group owner of the file
               |----------- owner of the file
To change owners, use chown:
$ sudo chown root:root a.txt
The first root is the OWNER while the second root is the GROUP.
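The inode stores the owner and group as numeric IDs; ls -l translates them to names. A minimal Python sketch of that lookup follows, with a hypothetical helper owner_and_group that falls back to the raw number when the ID has no entry in the user or group database (this can happen in containers).

```python
import grp
import os
import pwd
import tempfile


def owner_and_group(path):
    """Return (owner name, group name) for a path, like the two columns in ls -l."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:  # UID without a passwd entry
        owner = str(info.st_uid)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:  # GID without a group entry
        group = str(info.st_gid)
    return owner, group


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    open(path, "w").close()
    file_uid = os.stat(path).st_uid
    owner, group = owner_and_group(path)
    print(owner, group)
```

A newly created file is owned by the user who created it, which is why changing the owner to someone else normally requires root.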
List Files
Show file sizes in different units
$ ls -l --block-size=M
$ ls -l --block-size=K
Set Color
$ ls --color=auto
$ ls --color=tty
$ ls --color=none
Search Files
Find files with .txt suffix in home directory
$ find ~ -name "*.txt"
Or use locate
$ locate passwd
locate searches a prebuilt index (maintained by updatedb) rather than walking the directory tree, so it is fast but can miss files created since the last update.
which locates executables on your PATH; whereis lists locations for binaries, sources, and man pages.
grep searches inside file contents rather than file names.
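The difference between searching by name and searching by content can be sketched in Python. The helpers find_by_suffix and grep below are rough stand-ins, not reimplementations: the first only matches a name suffix, and the second does plain substring matching rather than regular expressions. The directory tree is created on the fly for the demonstration.

```python
import tempfile
from pathlib import Path


def find_by_suffix(root, suffix):
    """Rough equivalent of `find <root> -name "*<suffix>"`: walk the tree, match names."""
    return sorted(Path(root).rglob(f"*{suffix}"))


def grep(paths, needle):
    """Rough equivalent of `grep -n <needle> <paths>`: (path, line number, line)."""
    hits = []
    for path in paths:
        with open(path, errors="replace") as f:
            for number, line in enumerate(f, start=1):
                if needle in line:
                    hits.append((path, number, line.rstrip("\n")))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("hello\nworld\n")
    (root / "sub" / "b.txt").write_text("nothing here\n")
    (root / "c.log").write_text("world\n")

    txt_files = find_by_suffix(root, ".txt")
    found_names = [p.name for p in txt_files]
    matches = [(p.name, n, line) for p, n, line in grep(txt_files, "world")]

    print(found_names)
    print(matches)
```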
File Magic
While Windows relies heavily on file extensions (like .exe or .txt) to identify files, Linux tools largely identify files by their content, using what is known as file magic.
What is a "Magic Number"?
A magic number is a constant numerical or text value placed at the very beginning of a file so that the operating system can quickly identify its format.
For example:
$ hexdump -C -n 4 /bin/ls
The options:
- -C: Canonical hex+ASCII display.
- -n 4: Read only the first 4 bytes.
The output 7f 45 4c 46 translates to:
- 7f: A non-printable DEL character (used to prevent the file from being mistaken for a plain text file).
- 45: ASCII E
- 4c: ASCII L
- 46: ASCII F
This .ELF signature tells the Linux kernel: "This is a binary executable that I know how to run."
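Reading a magic number needs nothing more than opening the file in binary mode. Here is a small Python sketch; read_magic and as_hex are illustrative names, and the sample file is a fake written just for the demo (it holds only the first few ELF header bytes and is not a runnable program).

```python
import os
import tempfile


def read_magic(path, count=4):
    """Return the first `count` bytes of a file, like `hexdump -n <count>`."""
    with open(path, "rb") as f:
        return f.read(count)


def as_hex(data):
    """Format bytes the way hexdump prints them: '7f 45 4c 46'."""
    return " ".join(f"{byte:02x}" for byte in data)


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "fake-elf")
    with open(sample, "wb") as f:
        f.write(b"\x7fELF\x02\x01\x01")  # only the start of an ELF header
    sample_magic = read_magic(sample)
    print(as_hex(sample_magic))

# On a typical Linux system the same call works on a real binary.
if os.path.exists("/bin/ls"):
    print(as_hex(read_magic("/bin/ls")))
```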
The file Command and libmagic
The file command is the standard tool for reading these signatures. It doesn't care what the file is named; it only cares about the content.
If you copy /bin/ls to /tmp/ls.txt, Windows might try to open the copy in Notepad. But the file command will still know exactly what it is:
$ cp /bin/ls /tmp/ls.txt
$ file /tmp/ls.txt
# Output: /tmp/ls.txt: ELF 64-bit LSB shared object, x86-64...
The file command is built on a library called libmagic, which other programs can also use to detect file types from content rather than from names.
The Magic Database (magic.mgc)
These signatures are stored in a database.
- The Compiled Database: /usr/share/misc/magic.mgc
  - This is a binary (compiled) version of the definitions. It is used because searching through thousands of file signatures in plain text would be too slow.
- The Source Files: /usr/share/misc/magic/ (or similar directory)
  - In many distributions, you can find the human-readable source files here. They contain the logic used to identify files.
How a "Magic" rule looks (Simplified):
Inside the source files, a rule looks something like this:
# Offset Type Test Value Message
0 string \x7fELF ELF Binary
>4 byte 1 32-bit
>4 byte 2 64-bit
This tells the engine: "Go to the start (0), look for \x7fELF. If found, it's an ELF. Then go to byte 4; if it's a 1, it's 32-bit; if it's a 2, it's 64-bit."
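The simplified rule above translates almost line for line into code. This Python sketch mirrors only those three rule lines; classify_elf is a hypothetical name, and the real libmagic engine supports far more test types than this.

```python
def classify_elf(header):
    """Apply the simplified ELF rule to the first bytes of a file.

    Returns None when the top-level test fails, otherwise the message
    built from the matching rule lines.
    """
    # Offset 0, type string: the file must start with \x7fELF.
    if header[0:4] != b"\x7fELF":
        return None
    message = "ELF Binary"
    # Continuation lines (">4 byte ..."): only checked after the parent matched.
    if len(header) > 4:
        width = {1: "32-bit", 2: "64-bit"}.get(header[4])
        if width:
            message += " " + width
    return message


print(classify_elf(b"\x7fELF\x02\x01\x01"))
print(classify_elf(b"#!/bin/sh\n"))
```

The nesting is the important part: the byte at offset 4 is only meaningful once the test at offset 0 has succeeded.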
Common Magic Numbers to Know
Aside from ELF, here are some common signatures you will see in the wild:
| File Type | Hex Signature | ASCII / Note |
|---|---|---|
| JPEG | ff d8 ff | Standard Image |
| PNG | 89 50 4e 47 | .PNG |
| PDF | 25 50 44 46 | %PDF |
| ZIP | 50 4b 03 04 | PK.. (After Phil Katz, creator of Zip) |
| Java Class | ca fe ba be | CAFEBABE (Classic Easter Egg) |
| Script | 23 21 | #! (The "Shebang") |
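As a toy version of what file does, the signatures in the table can be put into a lookup list and compared against the start of some data. This is only a sketch under the assumption that a prefix match is enough; the names SIGNATURES and identify are invented, and real detection checks many more bytes and offsets.

```python
# (signature bytes, file type) pairs taken from the table, plus ELF.
SIGNATURES = [
    (bytes.fromhex("ffd8ff"), "JPEG"),
    (bytes.fromhex("89504e47"), "PNG"),
    (bytes.fromhex("25504446"), "PDF"),
    (bytes.fromhex("504b0304"), "ZIP"),
    (bytes.fromhex("cafebabe"), "Java Class"),
    (bytes.fromhex("2321"), "Script"),
    (bytes.fromhex("7f454c46"), "ELF"),
]


def identify(data):
    """Return the first file type whose signature prefixes `data`, or None."""
    for signature, file_type in SIGNATURES:
        if data.startswith(signature):
            return file_type
    return None


print(identify(b"%PDF-1.7\n"))
print(identify(b"#!/bin/sh\n"))
print(identify(b"just some text"))
```

Plain text has no signature at all, which is why the last call finds nothing.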
Why file --version is Important
Running file --version tells you exactly which database your system is using:
$ file --version
file-5.46
magic file from /etc/magic:/usr/share/misc/magic
This is useful for:
- Debugging: If file is misidentifying a new file format, you can check whether your database is outdated.
- Custom Magic: You can add your own signatures to /etc/magic (the local override file). If you invent a new file format for a custom app, you can tell your Linux system how to recognize it by adding a line there.