Mastering High-Performance CSV Imports: From Collections to MySQL Native Loaders
Overview: The Challenge of the Million-Row Import
Importing data is a deceptive task. It starts simple: read a file, loop through the rows, and save them to a database. But as the row count climbs from hundreds to millions, the "simple" way becomes a catastrophic failure. Memory limits are breached, database connections time out, and processing times stretch from seconds into hours. This guide explores how to scale a CSV import from a naive collection-based loop all the way to MySQL's native loader, taking a 1,000,000-row import from minutes down to seconds.
We will operate under three strict constraints to simulate a real-world production environment: a memory limit of 256MB, a refusal to use background queues/jobs for this specific exercise, and a focus on vanilla Laravel and PHP tooling rather than third-party import packages.
Prerequisites
Before diving in, you should have a solid grasp of the following:
- PHP 8.x: Familiarity with closures, arrays, and file handles.
- Laravel 11: Basic understanding of Eloquent and the Laravel framework.
- MySQL: Knowledge of tables and basic SQL syntax.
- CLI: Comfort running Laravel commands.
Key Libraries & Tools
- Laravel: Our primary environment, specifically using the new Concurrency facade.
- PDO: Used for low-level database interactions to bypass Eloquent overhead.
- MySQL: The target database engine.
- fgetcsv(): A native PHP function that reads a line from a file pointer and parses it into CSV fields.
Code Walkthrough: Evolution of an Import
Phase 1: The Collection Pitfall
The most intuitive approach is to load the whole file, wrap it in a collection, and let Eloquent save each row:
$rows = file($filePath);

$data = collect($rows)->skip(1)->map(function ($line) {
    $row = str_getcsv($line);

    return [
        'name' => $row[1],
        'email' => $row[2],
        // ... more fields
    ];
});

foreach ($data as $customer) {
    Customer::create($customer);
}
Why it fails: The file() function reads the entire file into memory at once. At 100,000 rows, you hit the 256MB limit. Furthermore, Customer::create() triggers a separate INSERT for every row, making it incredibly slow.
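Even before leaving collections and Eloquent behind, the memory half of the problem can be fixed by streaming: a generator yields one parsed row at a time instead of holding the whole file. A minimal sketch (the readCustomers name is illustrative, and the field indexes follow the example above):

```php
<?php

// Stream the CSV one row at a time instead of using file(),
// which loads the entire file into memory at once.
function readCustomers(string $filePath): \Generator
{
    $handle = fopen($filePath, 'r');
    fgetcsv($handle); // skip the header row

    while (($row = fgetcsv($handle)) !== false) {
        yield [
            'name'  => $row[1],
            'email' => $row[2],
        ];
    }

    fclose($handle);
}
```

Iterating this with foreach and Customer::create() is still slow on the write side, but peak memory stays flat no matter how large the file grows.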
Phase 2: Streamed File Pointers and PDO Chunks
To scale, we must stop loading the whole file. Instead, we use fopen() and fgetcsv() to stream the file line-by-line. To improve database performance, we move away from Eloquent and talk to the database through a raw PDO prepared statement.
$handle = fopen($filePath, 'r');
fgetcsv($handle); // Skip the header row

$pdo = DB::connection()->getPdo();
$stmt = $pdo->prepare('INSERT INTO customers (name, email) VALUES (?, ?)');

while (($row = fgetcsv($handle)) !== false) {
    $stmt->execute([$row[1], $row[2]]);
}

fclose($handle);
Streaming plus a single prepared statement drops the memory footprint to nearly zero, even though we still execute 1,000,000 individual queries. The import now takes approximately 3 minutes. To go faster, we must bundle these into "multi-insert" statements.
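One way to bundle rows is to build a multi-row INSERT whose placeholder count is computed from the chunk size. A sketch under assumptions: buildMultiInsert is a hypothetical helper, and the customers columns follow the earlier examples:

```php
<?php

// Build "INSERT INTO t (c1, c2) VALUES (?, ?), (?, ?), ..." for a
// chunk of rows. Keep $rowCount * count($columns) below MySQL's
// placeholder limit of 65,535 (see Tips & Gotchas).
function buildMultiInsert(string $table, array $columns, int $rowCount): string
{
    $placeholderRow = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';

    return sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, $rowCount, $placeholderRow))
    );
}

// Usage sketch: flatten the chunk and run one statement per chunk.
// $stmt = $pdo->prepare(buildMultiInsert('customers', ['name', 'email'], count($chunk)));
// $stmt->execute(array_merge(...$chunk));
```

One round-trip per chunk instead of one per row is where most of the speedup comes from.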
Phase 3: Parallel Processing with Laravel Concurrency
With memory solved, the remaining bottleneck is the single process. Laravel's Concurrency facade lets us fan the work out across ten worker processes, each claiming every tenth line of the same file:

$tasks = [];

for ($i = 0; $i < 10; $i++) {
    $tasks[] = function () use ($filePath, $i) {
        $handle = fopen($filePath, 'r');
        $currentLine = 0;

        while (($line = fgets($handle)) !== false) {
            if ($currentLine % 10 !== $i) {
                $currentLine++;
                continue;
            }

            // Process and insert the line here...
            $currentLine++;
        }

        fclose($handle);
    };
}

Concurrency::run($tasks);
Performance Note: This method slashed the 1,000,000-row import time down to 4 seconds. Parallelization allows the CPU to utilize multiple cores to handle the parsing and I/O simultaneously.
Syntax Notes: The Power of fgetcsv vs str_getcsv
During high-volume processing, the choice of function matters. fgetcsv() is designed to work with file handles, making it perfect for streaming. On the other hand, str_getcsv() is better when you already have a string in memory (like when using fgets()). In the parallel approach, we use fgets() first to quickly skip rows we don't own, and only call str_getcsv() on the rows we intend to process. This saves significant CPU cycles.
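The division of labor described above can be sketched as a small helper: fgets() reads every line cheaply, and str_getcsv() runs only on the lines this worker owns. The ownedRows name and the worker count are illustrative:

```php
<?php

// Return only the rows assigned to this worker, CSV-parsing just those.
function ownedRows(string $filePath, int $workerIndex, int $workers = 10): array
{
    $handle = fopen($filePath, 'r');
    $rows = [];
    $currentLine = 0;

    while (($line = fgets($handle)) !== false) {
        if ($currentLine++ % $workers !== $workerIndex) {
            continue; // cheap skip: no CSV parsing for other workers' rows
        }

        $rows[] = str_getcsv(rtrim($line, "\r\n"));
    }

    fclose($handle);

    return $rows;
}
```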
Practical Examples
- E-commerce Migration: Moving millions of customer records or legacy orders from an old system to a new Laravel app.
- Financial Reporting: Importing daily transaction logs for analysis where speed is critical to meet processing windows.
- Data Warehousing: Seeding local development environments with massive datasets to test performance at scale.
Tips & Gotchas
- The PDO Placeholder Limit: MySQL via PDO has a limit on the number of placeholders (?) in a single query, usually 65,535. If you chunk 10,000 rows with 10 columns each, you will hit this limit. Always calculate your chunk size as limit / number of columns.
- MySQL local-infile: The absolute fastest method is the native LOAD DATA LOCAL INFILE command. However, this is often disabled for security. If you use it, you must enable it on both the PDO client and the MySQL server (SET GLOBAL local_infile = 1).
- Database Reconnection: When using the Concurrency facade, each process may need to explicitly reconnect to the database to avoid sharing a socket, which can cause intermittent connection errors.
- Date Formats: Databases are picky. Format your CSV dates (e.g., Y-m-d H:i:s) before inserting, or use SQL functions like NOW() for timestamps to reduce PHP memory usage.
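For the LOAD DATA LOCAL INFILE route mentioned in the tips, the statement can be assembled roughly as below. A sketch, not a hardened implementation: the table, columns, and buildLoadDataSql helper are assumptions, and the var_export() quoting is only acceptable for a trusted, server-controlled path:

```php
<?php

// Assemble a LOAD DATA LOCAL INFILE statement for a headered CSV.
function buildLoadDataSql(string $filePath, string $table, array $columns): string
{
    return sprintf(
        'LOAD DATA LOCAL INFILE %s INTO TABLE %s ' .
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' " .
        "LINES TERMINATED BY '\\n' IGNORE 1 LINES (%s)",
        var_export($filePath, true), // naive quoting; fine for a trusted path
        $table,
        implode(', ', $columns)
    );
}

// The PDO client must also opt in to local-infile when connecting:
// $pdo = new PDO($dsn, $user, $pass, [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);
// $pdo->exec(buildLoadDataSql('/tmp/customers.csv', 'customers', ['name', 'email']));
```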
