Integrating LLMs into Production SaaS: A Guide to Quiz Generation
Overview
Building a quiz generator on top of an LLM involves four main pieces: extracting text from source material (web pages or YouTube videos), forcing the model to return structured output, working within token limits, and defending against unpredictable responses. This guide walks through each step.
Prerequisites
To follow this guide, you should be comfortable with Python, making HTTP requests, and the basics of calling an LLM API.
Key Libraries & Tools

- Beautiful Soup: Parses HTML to extract body text from URLs.
- YouTube Transcript API: Retrieves captions from YouTube videos.
- LangChain: Orchestrates the LLM workflow and integrates with OpenAI.
- Pydantic: Defines data schemas and parses LLM outputs into strictly typed objects.
Code Walkthrough
Data Extraction
We first extract the source content. For websites, we use requests and Beautiful Soup to grab the body text; for YouTube videos, we fetch the transcript with the YouTube Transcript API.
# Basic scraping logic
from bs4 import BeautifulSoup
import requests

def get_text(url):
    res = requests.get(url, timeout=10)
    res.raise_for_status()
    soup = BeautifulSoup(res.text, "html.parser")
    # Separator keeps words from adjacent tags from running together
    return soup.body.get_text(separator=" ", strip=True)
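For YouTube sources, the transcript is fetched by video ID. A minimal sketch, assuming the youtube-transcript-api package and its classic get_transcript interface (the exact API may differ between versions); the URL-parsing helper is our own illustration:

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Pull the video ID out of a standard or short YouTube URL."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    return parse_qs(parsed.query).get("v", [None])[0]

def get_transcript_text(url):
    """Fetch captions and join them into one string (network call)."""
    from youtube_transcript_api import YouTubeTranscriptApi  # assumed installed
    video_id = extract_video_id(url)
    entries = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(entry["text"] for entry in entries)
```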
Structured Output with Pydantic
To ensure the AI returns a valid quiz, we define a schema using Pydantic. This forces the LLM to adhere to a specific structure: every response must parse into the typed models below.
from pydantic import BaseModel
from typing import List

class Question(BaseModel):
    text: str
    options: List[str]
    answer: str

class Quiz(BaseModel):
    title: str
    questions: List[Question]
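With the schema in place, the model's JSON reply parses into typed objects, and anything malformed raises a validation error instead of silently corrupting the quiz. A sketch assuming Pydantic v2 (v1 uses parse_raw instead of model_validate_json); the sample JSON is illustrative:

```python
from typing import List
from pydantic import BaseModel, ValidationError

class Question(BaseModel):
    text: str
    options: List[str]
    answer: str

class Quiz(BaseModel):
    title: str
    questions: List[Question]

# A well-formed LLM reply parses into a typed Quiz object.
raw = '{"title": "Rivers", "questions": [{"text": "Longest river?", "options": ["Nile", "Amazon"], "answer": "Nile"}]}'
quiz = Quiz.model_validate_json(raw)

# A malformed reply (missing "answer") raises ValidationError.
try:
    Quiz.model_validate_json('{"title": "Bad", "questions": [{"text": "?", "options": []}]}')
except ValidationError:
    pass
```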
Handling Token Limits
LLMs have strict context windows. To process long content, we implement a chunking system: we split the text into segments, send them to the model concurrently, and merge the generated questions into a single quiz.
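The split-and-dispatch step can be sketched as a character-based splitter with overlap (so context isn't cut mid-sentence) plus a thread pool for the concurrent calls. The chunk sizes are illustrative rather than tied to any particular model, and generate_questions stands in for the per-chunk LLM call:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_text(text, chunk_size=4000, overlap=200):
    """Split text into overlapping segments that fit the context window."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

def process_chunks(chunks, generate_questions, max_workers=4):
    """Send chunks to the LLM concurrently and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(generate_questions, chunks))
    return [q for qs in results for q in qs]
```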
Syntax Notes
We rely heavily on the Pydantic output parser within LangChain, which injects format instructions into the prompt and converts the model's raw reply into validated Pydantic objects.
Practical Examples
This architecture powers the quiz-generation feature end to end: a user submits an article URL or a YouTube link, and the pipeline above returns a fully structured quiz.
Tips & Gotchas
AI is unpredictable. Unlike traditional code, an LLM can return different output for identical input, and it will occasionally produce malformed or invalid responses. Validate every response against the schema and retry on failure rather than trusting the first answer.
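The validate-and-retry loop can be sketched as follows; call_llm and parse_quiz are hypothetical stand-ins for the model call and the schema validation step:

```python
def generate_with_retries(call_llm, parse_quiz, prompt, max_attempts=3):
    """Retry the LLM call until the output parses, up to max_attempts."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            return parse_quiz(raw)
        except ValueError as err:  # parse/validation failure
            last_error = err
    raise RuntimeError(f"No valid quiz after {max_attempts} attempts") from last_error
```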

Fancy watching it?
Watch the full video for the complete walkthrough and context.