Programming Languages - Bits and Bytes
1 byte = 8-bit integer, in the range 0 to 255
Bit vs Byte
- bit: a single 0 or 1. 2 different values. The most basic unit of computing.
- byte: 1 byte = 8 bits = 2 hex digits = 256 different values. A.k.a. "octet". Still "naked" 0s and 1s, which can be interpreted in different ways.
- characters:
  - historically, 1 byte (8 bits) was used to encode a single character: ASCII uses 7 bits, giving 128 code points, more than enough for English characters (both lowercase and uppercase). The one extra bit can be used as a parity bit.
  - now one character may need more than one byte to store, depending on the encoding.
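As a small illustration of the points above, the following Python sketch shows the value range of a byte, one byte value viewed in different ways, and how the encoded size of a character depends on the encoding. The sample characters are chosen only for illustration.

```python
# One byte holds 8 bits, so it has 2**8 = 256 possible values (0..255).
BITS_PER_BYTE = 8
values_per_byte = 2 ** BITS_PER_BYTE

# The same "naked" byte, interpreted as a number, as hex, and as ASCII text.
value = 0b01100001                          # decimal 97
as_hex = format(value, "02x")               # two hex digits per byte
as_char = bytes([value]).decode("ascii")

# Encoded length depends on both the character and the encoding.
samples = ("a", "\u00e9", "\u20ac")         # 'a', e with acute accent, euro sign
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
utf16_len_a = len("a".encode("utf-16-le"))

print(values_per_byte, as_hex, as_char)
print(utf8_lengths, utf16_len_a)
```

In UTF-8 the three sample characters take 1, 2 and 3 bytes respectively, while even a plain "a" takes 2 bytes in UTF-16.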
Multiple-byte Units
| Abbrev. | Unit | Bytes | Abbrev. | Unit | Bytes |
|---|---|---|---|---|---|
| kB | kilobyte | 1000 | KiB | kibibyte | 1024 |
| MB | megabyte | 1000² | MiB | mebibyte | 1024² |
| GB | gigabyte | 1000³ | GiB | gibibyte | 1024³ |
| TB | terabyte | 1000⁴ | TiB | tebibyte | 1024⁴ |
| PB | petabyte | 1000⁵ | PiB | pebibyte | 1024⁵ |
| EB | exabyte | 1000⁶ | EiB | exbibyte | 1024⁶ |
| ZB | zettabyte | 1000⁷ | ZiB | zebibyte | 1024⁷ |
| YB | yottabyte | 1000⁸ | YiB | yobibyte | 1024⁸ |
Note whether there is an i in the abbreviation: with an i the unit is binary (a power of 1024), otherwise it is decimal (a power of 1000).
For example:
- terabyte (TB): 10¹², or 1000⁴, or 1,000,000,000,000 bytes
- tebibyte (TiB): 2⁴⁰, or 1024⁴, or 1,099,511,627,776 bytes; roughly 1 TiB ≈ 1.1 TB
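The arithmetic behind this example can be checked with a few lines of Python. This is only a sketch; the constant names are chosen here for illustration.

```python
TB = 1000 ** 4     # decimal unit: 10**12 bytes
TIB = 1024 ** 4    # binary unit: 2**40 bytes

ratio = TIB / TB   # how much larger the binary unit is

print(f"{TB:,}")           # 1,000,000,000,000
print(f"{TIB:,}")          # 1,099,511,627,776
print(round(ratio, 2))     # 1.1
```

The gap between the binary and decimal unit grows with each prefix: about 2.4% at the kilo level and about 10% at the tera level.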
Real world examples
- 3.5 inch Floppy Disk: 1,440 KiB = 1.47 MB = 1.41 MiB
- CD: up to 700 MB
- DVD: 4.7 GB = 4.38 GiB for a single-layered, single-sided disc
- Blu-ray: 25 GB for single-layer
- The Complete Works of William Shakespeare would occupy about 5,600,000 bytes when written in plain text without formatting.
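The conversions in this list can be reproduced as follows. This is a minimal sketch; the constant and variable names are illustrative.

```python
KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3   # binary units
MB, GB = 1000 ** 2, 1000 ** 3                # decimal units

# 3.5 inch floppy disk: 1,440 KiB
floppy_bytes = 1440 * KIB
floppy_mb = floppy_bytes / MB
floppy_mib = floppy_bytes / MIB

# Single-layer DVD: 4.7 GB expressed in GiB
dvd_gib = 4.7 * GB / GIB

# Complete works of Shakespeare as plain text, expressed in MiB
shakespeare_mib = 5_600_000 / MIB

print(floppy_bytes, round(floppy_mb, 2), round(floppy_mib, 2))
print(round(dvd_gib, 2), round(shakespeare_mib, 2))
```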
Bit Manipulation
AND(&)
| AND | 0 | 1 |
| --- | --- | --- |
| 0 | 0 | 0 |
| 1 | 0 | 1 |
OR(|)
| OR | 0 | 1 |
| --- | --- | --- |
| 0 | 0 | 1 |
| 1 | 1 | 1 |
XOR(^)
| XOR | 0 | 1 |
| --- | --- | --- |
| 0 | 0 | 1 |
| 1 | 1 | 0 |
NOT(~)
| NOT | |
| --- | --- |
| 0 | 1 |
| 1 | 0 |
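The single-bit truth tables can be generated directly with Python's bitwise operators. One assumption to note in this sketch: Python integers are not fixed-width, so the result of ~ is masked with 1 to keep a single bit.

```python
bits = (0, 1)

# Each table maps a pair of input bits to the result bit.
and_table = {(a, b): a & b for a in bits for b in bits}
or_table = {(a, b): a | b for a in bits for b in bits}
xor_table = {(a, b): a ^ b for a in bits for b in bits}

# ~a on a Python int is -a - 1; masking with 1 keeps only the lowest bit.
not_table = {a: ~a & 1 for a in bits}

for name, table in (("AND", and_table), ("OR", or_table), ("XOR", xor_table)):
    print(name, table)
print("NOT", not_table)
```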
Basic Operations
- Bitwise AND: &
- Bitwise exclusive OR: ^
- Bitwise inclusive OR: |
- Unary bitwise complement: ~
- Signed left shift: <<
- Signed right shift: >>
- Unsigned right shift: >>>
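Most of these operators exist in Python with the same symbols, but Python has no >>> operator because its integers are not fixed-width. The sketch below shows the basic operators and one way to emulate a 32-bit unsigned right shift by masking first. The helper names unsigned_right_shift32 and to_signed32 are hypothetical, introduced only for this example.

```python
MASK32 = 0xFFFFFFFF


def unsigned_right_shift32(x: int, n: int) -> int:
    """Emulate >>> on a 32-bit int: shift right, filling with zeros."""
    return (x & MASK32) >> n


def to_signed32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


print(5 & 3, 5 ^ 3, 5 | 3)              # 1 6 7
print(~5)                               # -6, because ~x == -x - 1
print(1 << 4)                           # 16
print(-16 >> 2)                         # -4, the sign is preserved
print(unsigned_right_shift32(-16, 2))   # 1073741820
print(to_signed32(1 << 31))             # -2147483648
```

Masking before shifting discards the conceptually infinite run of sign bits that a negative Python integer carries, which is what makes the zero-filling shift possible.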
By Language
Java
int bitmask = 0x000F;
int val = 0x2222;
System.out.println(val & bitmask);
// 2 (the mask 0x000F keeps only the lowest 4 bits of 0x2222)
System.out.println(~256);
// -257 (in two's complement, ~x equals -x - 1)
Python
- bytes is an immutable array of bytes (PyBytes in the CPython C API)
- bytearray is a mutable array of bytes (PyByteArray)
- memoryview is a bytes view on another object (PyMemoryView)
bytes literal: b'...'
- str objects: hold character data
- bytes objects: hold raw bytes
Indexing a bytes object returns an integer:
>>> a = b'asdf'
>>> a
b'asdf'
>>> a[0]
97
while indexing a str returns a one-character string:
>>> b = 'asdf'
>>> b
'asdf'
>>> b[0]
'a'
- Assigning an object that is not an integer to a bytearray element raises a TypeError exception.
- Assigning a value outside the range 0 to 255 to a bytearray element raises a ValueError exception.
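The two rules above can be observed on a bytearray, which is the mutable type. The following is a minimal sketch; the variable names are illustrative.

```python
buf = bytearray(b"asdf")
buf[0] = 65                      # valid: an integer in the range 0..255

errors = []
for bad in ("A", 256, -1):       # wrong type, too large, negative
    try:
        buf[0] = bad
    except (TypeError, ValueError) as exc:
        errors.append(type(exc).__name__)

# bytes is immutable, so any item assignment raises a TypeError.
frozen = bytes(buf)
try:
    frozen[0] = 66
except TypeError as exc:
    errors.append(type(exc).__name__)

print(buf)       # bytearray(b'Asdf')
print(errors)    # ['TypeError', 'ValueError', 'ValueError', 'TypeError']
```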
A str must come with an encoding when it is converted to bytes or bytearray.
bytearray:
>>> a = bytearray("123", 'utf-8')
>>> a[0]
49
>>> a[1]
50
bytes:
>>> a = bytes('abc', 'utf-8')
>>> a
b'abc'
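To illustrate why the encoding argument matters, the sketch below converts the same str with two encodings and shows that omitting the encoding fails. The sample text is chosen only for illustration.

```python
text = "na\u00efve"              # 5 characters, one of them non-ASCII

# Without an encoding, Python cannot know how to turn characters into bytes.
try:
    bytes(text)
    missing_encoding_error = None
except TypeError as exc:
    missing_encoding_error = type(exc).__name__

utf8 = bytes(text, "utf-8")      # equivalent to text.encode("utf-8")
latin1 = text.encode("latin-1")

print(missing_encoding_error)    # TypeError
print(len(text), len(utf8), len(latin1))   # 5 6 5
print(utf8.decode("utf-8") == text)        # True
```

The same five characters occupy six bytes in UTF-8 but five in Latin-1, so bytes can only be decoded correctly with the encoding that produced them.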
Endianness
Endianness: the order in which the bytes of a multi-byte value are stored or transmitted
Big vs Little
- big-endian: the most significant byte first
- little-endian: the least significant byte first
Other names:
- Network byte order: big-endian
Example:
0A 0B 0C 0D
- Big-endian: stored as 0A 0B 0C 0D in memory
- Little-endian: stored as 0D 0C 0B 0A in memory
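The same example can be reproduced with int.to_bytes, which takes the byte order as an argument. This is a sketch; the separator argument to bytes.hex requires Python 3.8 or later.

```python
value = 0x0A0B0C0D

big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(big.hex(" "))      # 0a 0b 0c 0d
print(little.hex(" "))   # 0d 0c 0b 0a

# Reading the bytes back with the matching order recovers the value.
restored = int.from_bytes(little, byteorder="little")
print(hex(restored))     # 0xa0b0c0d
```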
Why Little-endian
On a little-endian system, a 32-bit memory location with content 4A 00 00 00 can be read at the same address as either 8-bit (value = 4A), 16-bit (004A), 24-bit (00004A), or 32-bit (0000004A), all of which retain the same numeric value.
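This property can be simulated by reading a growing prefix of the same four bytes. The sketch below uses int.from_bytes on a bytes object as a stand-in for reading memory at a fixed address, which is an assumption made for illustration.

```python
memory = bytes.fromhex("4a000000")   # content at the address, lowest byte first
widths = (1, 2, 3, 4)                # 8-, 16-, 24- and 32-bit reads

little_reads = [int.from_bytes(memory[:n], "little") for n in widths]
big_reads = [int.from_bytes(memory[:n], "big") for n in widths]

print(little_reads)   # [74, 74, 74, 74], always 0x4A
print(big_reads)      # [74, 18944, 4849664, 1241513984]
```

With a little-endian interpretation every read width yields 0x4A, whereas a big-endian interpretation of the same bytes changes value with the width.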
Systems
- big-endian: Java, IPv6 (network byte order), IBM z/Architecture mainframes
- little-endian: Intel x86 processors
In Action
In Python:
>>> import struct
>>> struct.pack("<I", 1)
b'\x01\x00\x00\x00'
>>> struct.pack(">I", 1)
b'\x00\x00\x00\x01'
Here < means little-endian and > means big-endian; I stands for a 32-bit unsigned integer, so it takes 4 bytes. In little-endian, the byte \x01 is stored first, while in big-endian it is stored last.
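The byte order also matters when reading data back. As a sketch, unpacking the same four bytes with the wrong order marker gives a very different number:

```python
import struct

raw = struct.pack("<I", 1)                # b'\x01\x00\x00\x00'

(as_little,) = struct.unpack("<I", raw)   # same order as it was written
(as_big,) = struct.unpack(">I", raw)      # bytes read as 01 00 00 00

size = struct.calcsize("<I")              # number of bytes used by "I"

print(as_little, as_big, size)            # 1 16777216 4
```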
Check system byte order:
>>> import sys
>>> sys.byteorder
'little'
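The reported byte order can be cross-checked against struct, whose = prefix means native byte order with standard sizes. This is a sketch, and its output depends on the machine it runs on.

```python
import struct
import sys

native = struct.pack("=I", 1)   # native byte order, standard size

# Pack again with the order that sys.byteorder reports.
fmt = "<I" if sys.byteorder == "little" else ">I"
explicit = struct.pack(fmt, 1)

# On a little-endian machine the least significant byte comes first.
first_byte_is_lsb = native[0] == 1

print(sys.byteorder, native == explicit, first_byte_is_lsb)
```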