AI - LLM

What is an LLM?

An LLM is a type of Artificial Intelligence trained on massive amounts of text data to understand and generate human-like language.

The Core Mechanism: At its heart, an LLM is a "Next Token Predictor." If you give it the sentence "The cat sat on the...", it doesn't "know" anything about the cat; it calculates a probability for each possible next word, and "mat" scores far higher than "moon."
The "Large" Part: It is "Large" because it has billions of Parameters (learned numbers attached to the connections in its neural network) and has been trained on trillions of words of text (books, code, internet).
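To make the "Next Token Predictor" idea concrete, here is a minimal sketch. The scores below are made-up numbers for illustration only; a real model produces a score for every token in its vocabulary, and `softmax` is just the standard way to turn scores into probabilities.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical scores for the word after "The cat sat on the..."
# (invented for this example, not taken from any real model).
scores = {"mat": 4.0, "sofa": 3.0, "moon": 0.5}

probs = softmax(scores)
next_word = max(probs, key=probs.get)  # the single most likely candidate

print(probs)
print(next_word)
```

The model never looks anything up; it only ranks candidates by probability, which is why "mat" wins here.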

Key Vocabulary

Tokens

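LLMs don't read words; they read Tokens. A token is a chunk of text: often a whole short word or a piece of a longer one (about 4 characters of English on average).

Example: The word "hamburger" might be one token, while "intercontinental" might be three. The exact split depends on the model's tokenizer.
Rule of thumb: 1,000 tokens is roughly 750 words of English text.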
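The rule of thumb is easy to turn into a quick estimator. This is a rough sketch that assumes the 750-words-per-1,000-tokens ratio holds; the function names are made up for this example, and real counts vary by tokenizer and language.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb: 1,000 tokens is roughly 750 words

def estimate_tokens(word_count):
    """Rough token count for a given number of English words."""
    return round(word_count / WORDS_PER_TOKEN)

def estimate_words(token_count):
    """Rough word count that fits in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)

print(estimate_tokens(750))      # 1000
print(estimate_words(128_000))   # 96000
```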

Weights and Parameters

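These are the learned numbers inside the model, a bit like the strengths of the connections between its "brain cells." In everyday usage the two terms mean nearly the same thing.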

When you see "Llama-3-70B," the 70B means 70 Billion parameters.
Generally, more parameters let the model capture subtler patterns, but the model also needs more memory and more expensive hardware (GPUs) to run.
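A back-of-the-envelope calculation shows why parameter count drives hardware cost. This sketch only counts the memory needed to hold the weights themselves, assuming each parameter is stored in a fixed number of bytes (2 bytes for 16-bit numbers); it ignores everything else a running model needs, so treat the result as a lower bound.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter=2):
    """Memory needed just to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

params_70b = 70_000_000_000

print(weight_memory_gb(params_70b))        # 16-bit weights: 140.0 GB
print(weight_memory_gb(params_70b, 4))     # 32-bit weights: 280.0 GB
print(weight_memory_gb(params_70b, 0.5))   # 4-bit weights:  35.0 GB
```

Even at 16 bits per parameter, a 70B model is too big for a single typical consumer GPU, which is why large models are split across several GPUs or stored at lower precision.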

Context Window

This is the model's Short-Term Memory.

It is the maximum amount of text the model can "see" and think about at one time.
If a model has a context window of 128k tokens (roughly 96,000 words), you can feed it a 300-page book, and it can answer questions about the very first page. If the conversation grows past that window, the oldest text has to be dropped, so the model "forgets" the beginning of the conversation.
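One common way an application handles an overflowing window is to keep only the most recent tokens. This is a minimal sketch of that idea, using whole words as stand-ins for tokens; `fit_to_context` is a made-up helper, and real applications may summarize or reject long input instead.

```python
def fit_to_context(tokens, context_window):
    """Keep only the most recent tokens that fit in the window."""
    if context_window <= 0:
        return []
    return tokens[-context_window:]

# Words stand in for tokens here to keep the example readable.
conversation = "page1 page2 page3 page4 page5".split()

visible = fit_to_context(conversation, 3)
print(visible)  # ['page3', 'page4', 'page5']
```

Anything cut off this way is invisible to the model: "page1" is not just hard to recall, it was never sent.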

Inference

This is the act of actually running the model.

Training is when the AI learns (for the largest models, this can take weeks or months and cost millions of dollars).
Inference is when you ask the AI a question and it generates an answer (takes seconds).
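During inference the model produces its answer one token at a time, feeding each new token back in as input. The sketch below shows only the shape of that loop: `NEXT_WORD` is a made-up lookup table standing in for a trained model, which would compute probabilities instead.

```python
# Hypothetical stand-in for a trained model: a fixed "what comes next" table.
NEXT_WORD = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}

def generate(prompt_words, max_new_words):
    """Repeatedly predict the next word and append it to the text so far."""
    words = list(prompt_words)
    for _ in range(max_new_words):
        next_word = NEXT_WORD.get(words[-1])
        if next_word is None:  # nothing to predict, so stop early
            break
        words.append(next_word)
    return words

print(generate(["the"], 3))  # ['the', 'cat', 'sat', 'on']
```

Nothing is learned in this loop. The table (or, in a real model, the weights) stays fixed, which is the difference between inference and training.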

Temperature

This is the "Creativity Knob."

Low Temperature (e.g., 0.1): The model is predictable and focused. It almost always picks the most likely next word (at a temperature of 0 it always does). (Good for coding/math).
High Temperature (e.g., 0.8): The model takes more risks and picks less likely words more often. (Good for creative writing/brainstorming).
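Under the hood, temperature is usually applied by dividing the model's raw scores by the temperature before turning them into probabilities. Here is a minimal sketch of that step; the three scores are made-up numbers and `apply_temperature` is a name chosen for this example.

```python
import math

def apply_temperature(scores, temperature):
    """Divide scores by the temperature, then convert to probabilities."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate words

low = apply_temperature(scores, 0.1)
high = apply_temperature(scores, 0.8)

print(low)   # nearly all probability on the first word
print(high)  # probability spread more evenly across the words
```

A low temperature exaggerates the gap between the top choice and the rest, while a higher one flattens it, so unlikely words get picked more often.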

Hallucination

Since LLMs are just predicting the "next most likely word," they sometimes sound very confident while being completely wrong. This is called a Hallucination. The model doesn't "lie"; it simply assigns a high probability to words that read fluently but are factually incorrect.

Fine-Tuning

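This is "Extra Schooling" for a model. You take a general model (like GPT-4) and continue training it on, say, 10,000 examples of legal documents. The result is a model "Fine-Tuned" for legal work: it is better at that domain's vocabulary and style, though it is not literally a lawyer.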

What is a Checkpoint?

Think of an LLM as a student spending 10 years studying for an exam. A Checkpoint is like a "Save Game" file of that student's brain at a specific moment in their studies.

Technical Definition: A checkpoint is a snapshot of the model's Weights (the numerical values of all its connections) during or after the training process.
Why it matters: When a company like Meta releases "Llama 3," they are giving you a checkpoint (usually one or more large files). You load it on your computer, and the AI "wakes up" with all the knowledge it learned during training.
Base vs. Instruct Checkpoints:
- Base Checkpoint: The model just predicts the next word (it's not good at chatting yet).
- Instruct Checkpoint: The model has been further trained to follow commands and act as a helpful assistant.
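The "Save Game" idea can be sketched in a few lines. This is a toy illustration only: the weight names and values are invented, and JSON is used just because it ships with Python. Real checkpoints hold billions of numbers in binary formats.

```python
import json
import os
import tempfile

# Hypothetical toy "model": a handful of named weights.
weights = {"layer1.weight": [0.12, -0.53], "layer1.bias": [0.01]}

def save_checkpoint(weights, path):
    """Write a snapshot of the current weights to disk."""
    with open(path, "w") as f:
        json.dump(weights, f)

def load_checkpoint(path):
    """Read a snapshot back, restoring the saved weights."""
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "toy_checkpoint.json")
save_checkpoint(weights, path)
restored = load_checkpoint(path)

print(restored == weights)  # True
```

Loading a released checkpoint is the same idea at scale: the file restores the numbers, and the model picks up exactly where training left it.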