NextGen AI Learn

Lessons

Every lesson, in one place.

72 lessons across 7 courses. Search a topic, filter by course, jump straight to the one you want.

  1. L1

    What a prompt actually is

    A prompt is a program — written in English (or any language). Treat it that way.

    8 min

  2. L2

    Anatomy of a great prompt

    The five slots that turn a vague prompt into a reliable one.

    12 min

  3. L3

    Constraints: the secret weapon

    Telling the model what NOT to do is often more powerful than telling it what to do.

    10 min

  4. L4

    Few-shot prompting

    Show, don't tell. The fastest way to lock in a format.

    10 min

  5. L5

    Chain-of-thought & reasoning

    "Think step by step" — and when it actually helps.

    11 min

  6. L6

    Structured output (JSON & schemas)

    How to make the model produce JSON that actually parses.

    10 min

  7. L7

    Personas, roles & tone

    Roles do real work — when they're specific. "You are a helpful assistant" does almost nothing.

    8 min

  8. L8

    Temperature, top-p, and sampling

    The three knobs that control how "random" the output is.

    9 min

  9. L9

    Prompt injection & safety

    The vulnerability every LLM app has — and how to actually defend against it.

    11 min

  10. L10

    Evaluating prompts (the part nobody does)

    You're not done when it works once. You're done when it works on a held-out test set.

    12 min

  11. L11

    Production patterns: caching, fallback, retries

    The infrastructure tricks that turn a prompt demo into a real product.

    11 min

  12. L12

    Capstone: ship a prompt system

    Pull it all together. Design a prompt + eval set + production wrapper for a real task.

    18 min
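
Lesson 6's promise, JSON that actually parses, usually needs a defensive parse on the receiving side too. A minimal sketch in JavaScript (the function name and fallback behavior are illustrative, not the lesson's code):

```javascript
// Defensive JSON extraction from model output. Models often wrap JSON
// in markdown code fences or surround it with prose; strip that first,
// then try/catch the parse. (Illustrative pattern, not the lesson's code.)
function parseModelJson(text) {
  // Prefer the contents of a fenced code block if one is present.
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : text;
  try {
    return JSON.parse(candidate.trim());
  } catch {
    return null; // caller decides whether to retry or fall back
  }
}
```

Returning `null` instead of throwing keeps the retry-or-fallback decision with the caller.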

  1. L1

    Tokens — what models actually see

    Models do not read characters or words. They read tokens. This one reframe explains a lot of weird behavior.

    9 min

  2. L2

    Embeddings — words as coordinates

    Once a token is an integer, it becomes a vector in a high-dimensional space. The geometry of that space is where meaning lives.

    10 min

  3. L3

    Attention — the trick that made LLMs work

    For every token at every layer, the model looks back at every other token and decides what to focus on. That's attention.

    12 min

  4. L4

    Inside a transformer block

    Attention is one piece. The transformer block stacks it with norms, residuals, and a feed-forward layer.

    10 min

  5. L5

    Positional encoding — why order matters

    Self-attention is order-blind. We have to inject "where am I in the sequence" by hand.

    8 min

  6. L6

    Sampling — how the next token gets picked

    The model outputs a distribution. Picking from it is a separate (and tunable) step.

    10 min

  7. L7

    Reading a model card

    Pick the right model — and stop guessing — by reading the card like an engineer.

    10 min

  8. L8

    Context windows, KV cache & long context

    Why a 1M-token context is impressive — and expensive — and slower than you think.

    11 min

  9. L9

    Reasoning models & test-time compute

    Why "thinking out loud" before answering makes the model smarter — and when it doesn't.

    10 min

  10. L10

    Capstone: pick the right model for the job

    Combine everything: parameters, context, latency, cost, license, and your eval. Decide.

    12 min
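
Lesson 6's distinction (and the temperature knob from the prompting course) fits in a few lines of JavaScript. A sketch with illustrative names: temperature divides the logits before softmax, and sampling is a separate draw from the resulting distribution.

```javascript
// Softmax with temperature: divide logits by T before exponentiating.
// T < 1 sharpens the distribution, T > 1 flattens it.
function softmax(logits, temperature = 1.0) {
  const scaled = logits.map(z => z / temperature);
  const maxZ = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map(z => Math.exp(z - maxZ));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

// Sampling is a separate, tunable step: draw an index from the distribution.
function sample(probs) {
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1; // guard against floating-point leftovers
}
```

With `temperature` near zero the distribution collapses onto the top logit, which is why low temperature reads as "deterministic".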

  1. L1

    What RAG actually is — and when not to use it

    RAG is a retrieval system that feeds an LLM. That's it. The hard parts are everything except the LLM.

    9 min

  2. L2

    Cosine similarity in 5 lines of code

    Retrieval is just "find the closest vectors". The math is one dot product and two norms.

    9 min

  3. L3

    Chunking — the most important boring decision

    Bad chunking is the #1 cause of bad RAG. There's no universally right strategy — but there are clear wrong ones.

    11 min

  4. L4

    Vector databases — what they actually do

    A vector DB is a specialized index for "find the k nearest vectors" at scale. Pick one once you actually need scale.

    10 min

  5. L5

    Build a tiny end-to-end RAG

    Put it together: chunks, vectors, retrieval, prompt assembly. All in 50 lines of JavaScript.

    13 min

  6. L6

    Hybrid search & rerankers

    Pure vector search misses keyword-precise queries. Pure keyword search misses paraphrases. Use both.

    10 min

  7. L7

    The five most common RAG failure modes

    Diagnosing a broken RAG is half the job. Here's the field guide.

    10 min

  8. L8

    Evaluating a RAG pipeline

    Measure retrieval and generation separately. Aggregate metrics hide everything.

    12 min

  9. L9

    RAG in production: cost, latency, freshness

    A working RAG demo is 10% of the work. The rest is keeping it healthy.

    10 min

  10. L10

    Capstone — design RAG for support tickets

    A realistic system design exercise. Pick chunking, retrieval, eval, and ops choices.

    15 min
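
Lesson 2's claim is literal. A sketch in plain JavaScript (function names are mine): cosine similarity is one dot product and two norms.

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Euclidean norm (length) of a vector.
function norm(a) {
  return Math.sqrt(dot(a, a));
}

// Cosine similarity: dot product divided by the product of the norms.
function cosineSimilarity(a, b) {
  return dot(a, b) / (norm(a) * norm(b));
}
```

Identical directions score 1, orthogonal vectors score 0; retrieval is ranking candidates by this number.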
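
For lesson 3, the usual baseline is fixed-size chunking with overlap. A sketch with illustrative names and defaults (500 characters, 100 overlap), not the lesson's recommended settings:

```javascript
// Fixed-size chunking with overlap. The overlap keeps a sentence that
// straddles a boundary retrievable from at least one chunk.
function chunkText(text, chunkSize = 500, overlap = 100) {
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Character counts are a crude proxy for tokens; the point is the sliding window, not the numbers.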

  1. L1

    Should you fine-tune?

    Fine-tuning is rarely the answer. This lesson is a decision tree for when it actually is.

    10 min

  2. L2

    LoRA, QLoRA, and PEFT

    You don't fine-tune the whole model. You train a tiny adapter and freeze everything else.

    11 min

  3. L3

    Building a training dataset

Bad data destroys good models. Good data is half the work — and where you should spend most of your time.

    12 min

  4. L4

    A QLoRA training run, end-to-end

    The practical recipe. From dataset → fine-tuned adapter → merged inference.

    13 min

  5. L5

    Hyperparameters that actually matter

    Most hyperparameters don't matter much. A few do — a lot.

    9 min

  6. L6

    Evaluating a fine-tuned model

    Train loss going down means *something* is happening. Whether it's the right thing is a separate question.

    11 min

  7. L7

    RLHF, DPO, and "alignment" — briefly

    Why "instruct" models exist, and why you probably shouldn't do RLHF yourself.

    10 min

  8. L8

    Catastrophic forgetting

    The fine-tune learns the new task — and forgets things it used to do well. Here's how to avoid it.

    9 min

  9. L9

    Hosting your fine-tune

    Once you have an adapter, where does it run? Three honest options.

    9 min

  10. L10

    Capstone: fine-tune for a specific task

    Pull it together. Make the call: prompt? RAG? Fine-tune? Then design the run.

    14 min

  1. L1

    From notebook to production — the gap

    A working notebook is 10% of the work. The other 90% is what nobody photographs.

    9 min

  2. L2

    Inference servers — vLLM, TGI, Triton, SGLang

    Don't serve LLMs from raw Hugging Face Transformers. The good engines exist for a reason.

    12 min

  3. L3

    Cloud GPUs — picking the right machine

GPU choice is 50% of cost. Pick wrong and you waste money. Pick right and you save serious cash.

    10 min

  4. L4

    Containers & immutable deployments

    Reproducible builds. Same image runs locally, in CI, in prod. No "works on my machine".

    10 min

  5. L5

    Autoscaling & traffic patterns

    Bursty traffic + slow GPU cold-starts = the canonical MLOps headache.

    10 min

  6. L6

    Cost optimization that actually moves the needle

90% of LLM cost wins come from five patterns. Skip the obscure ones until you've done all five.

    11 min

  7. L7

    Monitoring — what to actually watch

    Prometheus dashboards lie. The right four metrics catch 90% of incidents.

    10 min

  8. L8

    Shadow traffic, canaries, and A/B tests

    Three rollout patterns, each appropriate for a different kind of risk.

    11 min

  9. L9

    CI/CD for ML pipelines

    Pipelines that ship models like code: tested, versioned, reviewable, rollback-able.

    9 min

  10. L10

    Capstone: design a production stack

    Make every choice. Stack, GPU, rollout, monitoring, cost.

    14 min

  1. L1

    Python for ML in 30 minutes

    You don't need 10 years of Python. You need NumPy, lists, dicts, and iterators. Here's the survival kit.

    10 min

  2. L2

    Vectors and dot products — the intuition

    Three things to internalize: vectors are arrows, dot products measure alignment, distances measure dissimilarity.

    10 min

  3. L3

    Matrices and matrix multiplication

    A neural network is, mostly, a sequence of matrix multiplications.

    10 min

  4. L4

    Probability for ML, briefly

    You don't need to be a probabilist. You need: distributions, expectation, log-probs, entropy.

    9 min

  5. L5

    Gradient descent — how models actually learn

    Pick a loss. Compute its gradient. Step downhill. Repeat. That's every neural network ever trained.

    11 min

  6. L6

    Train / val / test — how to not fool yourself

    Models that memorize their training data look great on it. The whole game is honest evaluation.

    9 min

  7. L7

    Loss functions — picking the right one

    Different problems need different losses. Three you'll meet 90% of the time.

    9 min

  8. L8

    Overfitting and regularization

    The model that fits the training data perfectly is rarely the best model. Six tools to keep it honest.

    9 min

  9. L9

    Build a tiny neural net from scratch

    Forward pass, loss, gradient, weight update. A real (tiny) classifier in 60 lines of plain JavaScript.

    14 min

  10. L10

    Capstone — diagnose a training run

    You're handed a broken run. What's wrong, and what do you check first?

    12 min
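
Lesson 5's recipe, taken literally, is a few lines of JavaScript. A sketch on a toy 1-D loss (names are illustrative): pick L(w) = (w - 3)^2, compute its gradient, step downhill, repeat.

```javascript
// Gradient descent: repeatedly step against the gradient of the loss.
function gradientDescent(grad, w0, learningRate, steps) {
  let w = w0;
  for (let i = 0; i < steps; i++) {
    w -= learningRate * grad(w); // step downhill
  }
  return w;
}

// dL/dw for L(w) = (w - 3)^2, whose minimum is at w = 3.
const grad = w => 2 * (w - 3);
const wStar = gradientDescent(grad, 0, 0.1, 100); // converges toward 3
```

Every neural network swaps in a bigger `w` (millions of weights) and a gradient computed by backpropagation, but the loop is the same.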

AI Agents

10 lessons

  1. L1

    What an AI agent actually is — and what isn't

    Most "AI agents" in production are 2-step pipelines. Real agents loop, decide, and act. Knowing the difference saves you weeks.

    9 min

  2. L2

    ReAct — Reason + Act + Observe

    The pattern under almost every modern agent. Surprisingly simple, surprisingly effective.

    11 min

  3. L3

    Tool use done right

    Tools are the agent's hands. Bad tool design wrecks more agents than bad models.

    11 min

  4. L4

    Planning — single-step vs multi-step

    ReAct is reactive. For long tasks you also want a plan. The hybrid wins.

    10 min

  5. L5

    MCP and the rise of agent protocols

    MCP is to agents what HTTP is to web apps. Worth understanding even if you don't use it directly.

    10 min

  6. L6

    Memory — short-term, long-term, none

    Most "memory" features in agents are over-engineered. Three simple patterns cover 90% of needs.

    9 min

  7. L7

Multi-agent — when it's worth it

    Most multi-agent demos are a single agent with extra latency. Sometimes it's genuinely the right tool.

    9 min

  8. L8

    Agent safety and guardrails

    Agents are LLMs with the ability to act. The blast radius is bigger. Defense is layered.

    11 min

  9. L9

    Evaluating agents

    Agents don't have a single right answer. Eval is about success rates and trace quality.

    11 min

  10. L10

    Capstone — design a research agent

    Pull it together. Build a research agent that scopes, researches, and writes.

    14 min