Integrating AI into Laravel: A Guide to RAG, Embeddings, and Function Calling
Beyond the Chatbox: Practical AI in Laravel
Most developers treat AI as a standalone product, but the real power lies in making it a core feature of your web applications. Integrating large language models (LLMs) like those from OpenAI directly into a Laravel application unlocks semantic search, retrieval-augmented answers, and tool-driven automation without leaving the framework you already know.
Prerequisites
To follow this guide, you should be comfortable with:
- PHP 8.2+ and the Laravel framework.
- PostgreSQL with a basic understanding of database migrations.
- Fundamental API concepts (REST, JSON payloads).
- A basic understanding of the Inertia.js or Vue.js stack is helpful for front-end implementation but not strictly required for the backend logic.
Key Libraries & Tools
- OpenAI PHP Laravel: A community-supported wrapper developed by Nuno Maduro and Sandro Gehri that provides a clean facade for API interactions.
- PG Vector: A PostgreSQL extension that allows for storing and querying vector embeddings.
- Eloquent PG Vector: A package that adds vector support directly to Laravel’s Eloquent ORM, allowing you to treat high-dimensional data like standard attributes.
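Assuming a standard Composer setup, pulling in these tools looks roughly like the following; verify the exact package names against Packagist before relying on them, as they are stated here from memory:

```shell
# Install the OpenAI client with its Laravel integration
composer require openai-php/laravel

# Install the pgvector PHP bindings (includes Eloquent support)
composer require pgvector/pgvector

# Publish the OpenAI config, then set OPENAI_API_KEY in your .env
php artisan vendor:publish --provider="OpenAI\Laravel\ServiceProvider"
```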
Mastering the Prompt: Identity and Instructions
Everything begins with the prompt. In a programmatic context, we use the Chat Completions API. This isn't just a single string; it's an array of messages with specific roles.
```php
$response = OpenAI::chat()->create([
    'model' => 'gpt-4o',
    'messages' => [
        ['role' => 'system', 'content' => 'You are an energetic Laravel expert.'],
        ['role' => 'user', 'content' => 'How do I add TFA to my app?'],
    ],
    'max_tokens' => 1024,
    'temperature' => 0,
]);
```
The System Role acts as the director, setting the persona and behavioral boundaries. The User Role represents the actual query, while the Assistant Role is the model's response. Setting the temperature to 0 is a best practice for technical tools; it minimizes "creativity" and ensures the model provides the most likely, factual response rather than hallucinating varied answers.
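Because each call is stateless, a follow-up question must replay the earlier turns with their roles. A minimal sketch of that pattern; the `$reply` string here is a hard-coded stand-in for what a real API response would return:

```php
<?php
// Build the conversation history turn by turn. Each prior message is
// replayed with its role so the model keeps the full context.
$messages = [
    ['role' => 'system', 'content' => 'You are an energetic Laravel expert.'],
    ['role' => 'user', 'content' => 'How do I add TFA to my app?'],
];

// In a real app this would come from the API:
// $reply = OpenAI::chat()->create([...])->choices[0]->message->content;
$reply = 'Start with a TOTP package and a confirmation middleware.';

// Append the assistant's answer, then the user's next question.
$messages[] = ['role' => 'assistant', 'content' => $reply];
$messages[] = ['role' => 'user', 'content' => 'Show me the middleware step.'];
```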
Vector Embeddings and Semantic Search
Traditional search relies on keywords. Semantic search relies on meaning. We achieve this through Vector Embeddings—giant arrays of floating-point numbers (usually 1,536 dimensions) that represent the "essence" of a text snippet.
When a user asks a question, you first convert that question into a vector using the embeddings endpoint:
```php
$result = OpenAI::embeddings()->create([
    'model' => 'text-embedding-3-small',
    'input' => $userQuestion,
]);

$queryVector = $result->embeddings[0]->embedding;
```
You then store these vectors in a database like PostgreSQL using the PG Vector extension, which adds a dedicated vector column type and similarity operators.
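A hypothetical migration for a sections table that holds both the text and its embedding; the `vector()` column type comes from the PG Vector Eloquent integration, and 1536 matches the output size of text-embedding-3-small:

```php
<?php
// Migration sketch: a sections table with a 1,536-dimension vector column.
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::create('sections', function (Blueprint $table) {
            $table->id();
            $table->text('content');
            $table->vector('embedding', 1536); // pgvector column type
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('sections');
    }
};
```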
Implementing RAG: Knowledge Augmentation
Retrieval-Augmented Generation (RAG) grounds the model's answers in your own data. The flow has three steps:
- Embed the user's query.
- Query your database for snippets with similar vectors.
- Inject those snippets into the System Message of your next API call.
```php
use Pgvector\Laravel\Vector;

// Querying similar documentation sections: order by negative inner
// product (pgvector's <#> operator), so the smallest value is the best match
$sections = Section::query()
    ->orderByRaw('embedding <#> ?', [new Vector($queryVector)])
    ->limit(5)
    ->get();

$context = $sections->pluck('content')->implode("\n");

// Send context + question to the model
$finalResponse = OpenAI::chat()->create([
    'model' => 'gpt-4o',
    'messages' => [
        ['role' => 'system', 'content' => "Use this context to answer: $context"],
        ['role' => 'user', 'content' => $userQuestion],
    ],
]);
```
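The retrieved snippets are typically joined into a single grounded system prompt. A sketch with a hypothetical `buildGroundedPrompt()` helper; the "say you don't know" escape hatch guards against hallucination, as discussed in the tips below:

```php
<?php
// Hypothetical helper: join retrieved snippets into one grounded prompt,
// with an explicit "out" so the model can admit ignorance.
function buildGroundedPrompt(array $snippets): string
{
    $context = implode("\n---\n", $snippets);

    return "Answer using ONLY the context below. "
        . "If the context does not contain the answer, say you don't know.\n\n"
        . "Context:\n" . $context;
}

// Example snippet contents are invented for illustration.
$prompt = buildGroundedPrompt([
    'Laravel supports TFA via a first-party starter kit.',
    'API tokens are issued through a token guard.',
]);
```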
Agentic Control and Function Calling
While RAG provides information, Function Calling (or Tools) provides action. This shifts your app from linear control to Agentic Program Control. Instead of the developer defining every step, the model decides which tool to use.
You define your functions in JSON schema within the API request. If the model determines it needs a tool—like get_weather—it won't return a text answer. Instead, it returns a finish_reason of tool_calls.
```php
'tools' => [[
    'type' => 'function',
    'function' => [
        'name' => 'get_weather',
        'description' => 'Get current weather for a location',
        'parameters' => [
            'type' => 'object',
            'properties' => [
                'location' => ['type' => 'string'],
                'unit' => ['type' => 'string', 'enum' => ['celsius', 'fahrenheit']],
            ],
        ],
    ],
]]
```
Your Laravel application catches this request, executes the actual PHP logic (e.g., calling a weather API), and sends the results back to the model in a new message with the tool role. This loop continues until the model has enough data to provide a final response to the user.
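One iteration of that loop can be sketched as follows. The `getWeather()` implementation is hypothetical, and `$toolCall` is a hand-built array mirroring the shape of a `tool_calls` entry in the API response:

```php
<?php
// Hypothetical local implementation of the declared tool.
function getWeather(string $location, string $unit = 'celsius'): array
{
    return ['location' => $location, 'temp' => 21, 'unit' => $unit]; // stubbed
}

// Stand-in for one entry of choices[0]->message->toolCalls.
$toolCall = [
    'id' => 'call_123',
    'function' => [
        'name' => 'get_weather',
        'arguments' => '{"location":"Lisbon","unit":"celsius"}',
    ],
];

// The model sends arguments as a JSON string: decode, then dispatch.
$args = json_decode($toolCall['function']['arguments'], true);
$result = getWeather($args['location'], $args['unit']);

// Feed the result back with the tool role and the matching tool_call_id;
// the model then produces a final answer or requests another tool.
$toolMessage = [
    'role' => 'tool',
    'tool_call_id' => $toolCall['id'],
    'content' => json_encode($result),
];
```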
Syntax Notes & Best Practices
- Role Discipline: Use the system role for instructions and the tool role for data output. Never mix user input with system instructions to avoid prompt injection.
- Token Management: Always monitor `usage` in the API response. High-density vectors and large context windows increase costs rapidly.
- Identity Predictability: Give your AI a specific job title in the prompt. An "Expert Code Auditor" provides better feedback than a generic "AI Assistant."
Tips & Gotchas
- Hallucinations: Always provide an "out" in your prompt. Tell the model: "If you do not know the answer based on the context provided, say you don't know."
- Statelessness: Remember that each API call is independent. If you want a conversation, you must pass the entire history of messages (User, Assistant, and Tool) back to the model with every new request.
- Distance Metrics: When using PG Vector, understand your distance operators. `<#>` is for negative inner product, while `<->` is for Euclidean distance. For OpenAI embeddings, inner product is the recommended similarity metric.
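The difference between the two metrics can be illustrated in plain PHP on toy 3-D vectors (real embeddings have 1,536 dimensions); note that `<#>` returns the *negative* inner product so that "smaller = more similar" works with an ascending `ORDER BY`:

```php
<?php
// Inner product: larger means more similar (for normalized vectors).
function innerProduct(array $a, array $b): float
{
    return array_sum(array_map(fn ($x, $y) => $x * $y, $a, $b));
}

// Euclidean distance: smaller means more similar.
function euclidean(array $a, array $b): float
{
    return sqrt(array_sum(array_map(fn ($x, $y) => ($x - $y) ** 2, $a, $b)));
}

$query = [1.0, 0.0, 0.0];
$close = [0.9, 0.1, 0.0];
$far   = [0.0, 1.0, 0.0];

// The closer vector scores a higher inner product and a smaller distance.
$moreSimilar = innerProduct($query, $close) > innerProduct($query, $far);
$nearer      = euclidean($query, $close) < euclidean($query, $far);
```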
