FlyokaiFujinShuttle — products import performance

Benchmarks and tuning notes for the wide-column products import. All figures are for 50k all-new products with --reindex=none on the sw6710 dev box (MySQL 8.4 on localhost, 8 GB buffer pool, innodb_autoinc_lock_mode=2, binlog off, 28 cores).

TL;DR

| scenario | throughput | vs the original ~65 rows/s |
| --- | --- | --- |
| products, media off | ~5,300 rows/s | 80× (9× the 500–600 Rapidflow target) |
| products + media, raw mode | ~670–1,175 rows/s | 10–18× |
| products + media, worker mode | ~414–573 rows/s (scales with workers) | 6–9× |
| products + media, filesaver mode (default) | ~133–150 rows/s | |

The original ~65 rows/s was bounded by media import, not by the insert path. The old MediaImport downloaded each image with a blocking fopen($url), then created it through the DAL + FileSaver, serially, in Pass A: ~14 ms/image × 50k ≈ 700 s (11:40), in line with the reported 12:41.
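
As a sanity check, the estimate above can be replayed in a few lines. This is only arithmetic on figures quoted in this document (14 ms/image, 50k images, the reported 12:41); nothing here is a new measurement.

```php
<?php
// Replays the estimate above; every input is a figure quoted in this document.
$images     = 50_000;        // 50k images, as in the estimate
$msPerImage = 14;            // serial download + DAL/FileSaver cost per image
$reported   = 12 * 60 + 41;  // the reported 12:41, in seconds (761)

$serialSeconds = $images * $msPerImage / 1000; // 700 s of serial media work
$rowsPerSecond = $images / $reported;          // throughput implied by 12:41

printf(
    "serial media time: %d s (%d:%02d)\n",
    $serialSeconds,
    intdiv((int) $serialSeconds, 60),
    (int) $serialSeconds % 60
);
printf("throughput implied by 12:41: %.1f rows/s\n", $rowsPerSecond);
```

The second figure lands at roughly 65 rows/s, which is why the old number is attributed to serial media handling rather than to the INSERTs.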

Measuring (built-in profiler)

Set FUJIN_PROF=1 and the deputy prints a per-stage breakdown to the error log at the end of a run, plus a live in-flight gauge (real async concurrency):

FUJIN_PROF=1 php -d error_log=/tmp/prof.log bin/console fujin:shuttle:run <csv> --type=product --reindex=none

Labels: passA/passB, batch.build/batch.resolveIds, send.* (per request kind), drv.build (SQL string build) / drv.wait (DB round-trip), bulk.insert, media.*. No-op when the env var is unset.
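
For readers who want the shape of such a profiler, here is a minimal sketch of an env-gated stage timer. The class name StageProfiler and its methods are hypothetical and not the plugin's actual code; only the FUJIN_PROF variable and the label names come from the text above.

```php
<?php
/**
 * Hypothetical env-gated stage profiler (not the plugin's actual class).
 * Accumulates wall time per label and does no bookkeeping when disabled.
 */
final class StageProfiler
{
    /** @var array<string, float> total seconds per label */
    private array $totals = [];
    /** @var array<string, int> number of samples per label */
    private array $counts = [];

    public function __construct(private readonly bool $enabled)
    {
    }

    public static function fromEnv(): self
    {
        // getenv() returns false when the variable is not set.
        return new self(getenv('FUJIN_PROF') === '1');
    }

    /** Runs $fn, charges its wall time to $label, and returns its result. */
    public function measure(string $label, callable $fn): mixed
    {
        if (!$this->enabled) {
            return $fn(); // disabled: no timing, no bookkeeping
        }
        $start = hrtime(true); // monotonic clock, nanoseconds
        try {
            return $fn();
        } finally {
            $elapsed = (hrtime(true) - $start) / 1e9;
            $this->totals[$label] = ($this->totals[$label] ?? 0.0) + $elapsed;
            $this->counts[$label] = ($this->counts[$label] ?? 0) + 1;
        }
    }

    /** One line per label, slowest first; empty string when nothing was recorded. */
    public function report(): string
    {
        arsort($this->totals);
        $lines = [];
        foreach ($this->totals as $label => $seconds) {
            $lines[] = sprintf('%-20s %8.3f s  x%d', $label, $seconds, $this->counts[$label]);
        }
        return implode("\n", $lines);
    }
}

$prof = new StageProfiler(true);
$prof->measure('drv.build', fn () => str_repeat('?,', 15000));
$prof->measure('drv.wait', fn () => usleep(2000));
error_log($prof->report());
```

One caveat with this style of timer: it measures wall time, so a stage that suspends on an await is charged for the time it spent suspended.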

What was fixed (insert path)

  1. O(n²) SQL param binding: AsyncMysqli\Statement::bindParametersFromContainer used array_shift() in a loop over ~15k ? placeholders per 500-row INSERT. This is pure CPU with no await, so it serialised every batch fiber. Replaced with an index pointer plus an all-positional fast path. drv.build went from 7.3 s to 1.0 s on 50k. (Shared driver code; output is byte-identical.)
  2. Per-INSERT SET FOREIGN_KEY_CHECKS removed: the bulk writer toggled FK checks 0/1 around every INSERT (55 % of all statements, ~31 % of DB round-trip time). FK checks are now switched off once per pooled async link (the links are importer-dedicated and Pass A pre-creates all FK targets); InsertOnDuplicate skips the per-statement toggle on async connections. The sync/PDO path is unchanged. unique_checks stays on because ON DUPLICATE KEY needs it.
  3. Pipeline buffering: the row queue + emitter (setBufferSize(concurrency)) buffer enough ahead that runBatches can fill its concurrency slots.
  4. Config bool bug: (bool)"false" is true, so a CLI-set "disable media" (and every other boolean toggle) silently stayed on. The deputy now coerces via boolOption().
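
The array_shift() problem from fix 1 is easy to reproduce in isolation. The sketch below is not the driver's code: the function names are hypothetical, placeholders are counted naively with substr_count() (which would also count a ? inside a string literal), and the 30-column row width is an assumption chosen to reach ~15k placeholders.

```php
<?php
/**
 * Slow variant: array_shift() re-indexes the remaining elements on every
 * call, so consuming n parameters costs O(n^2) overall.
 */
function bindWithShift(string $sql, array $params): array
{
    $bound = [];
    $count = substr_count($sql, '?');
    for ($i = 0; $i < $count; $i++) {
        $bound[] = array_shift($params); // O(n) per call
    }
    return $bound;
}

/**
 * Fast variant: an all-positional fast path, otherwise an index pointer.
 * Both branches are O(n) overall.
 */
function bindWithPointer(string $sql, array $params): array
{
    $count = substr_count($sql, '?');
    if (array_is_list($params) && count($params) === $count) {
        return $params; // already in placeholder order, nothing to do
    }
    $values = array_values($params);
    $bound = [];
    $next = 0;
    for ($i = 0; $i < $count; $i++) {
        $bound[] = $values[$next++]; // O(1) per parameter, no re-indexing
    }
    return $bound;
}

$rows = 500;
$cols = 30; // assumed width: 500 x 30 = 15,000 placeholders
$tuple = '(' . implode(',', array_fill(0, $cols, '?')) . ')';
$sql = 'INSERT INTO t VALUES ' . implode(',', array_fill(0, $rows, $tuple));
$params = range(1, $rows * $cols);

var_dump(bindWithShift($sql, $params) === bindWithPointer($sql, $params)); // bool(true)
```

Both variants produce the same bound list, which mirrors the "output is byte-identical" note: only the cost of walking the parameters changes.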

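The boolean pitfall from fix 4 comes down to PHP's string-to-bool cast. The function below is a hypothetical stand-in for boolOption(), whose real body is not shown in this document; it illustrates one common way to coerce CLI and config strings safely.

```php
<?php
/**
 * Hypothetical stand-in for the boolOption() helper named above.
 * Accepts real booleans plus the usual string spellings from CLI/config.
 */
function boolOption(mixed $value, bool $default = false): bool
{
    if (is_bool($value)) {
        return $value;
    }
    if ($value === null) {
        return $default;
    }
    // FILTER_VALIDATE_BOOLEAN maps "1", "true", "on", "yes" to true and
    // "0", "false", "off", "no", "" to false. With FILTER_NULL_ON_FAILURE,
    // anything else comes back as null and falls through to the default.
    $parsed = filter_var($value, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
    return $parsed ?? $default;
}

var_dump((bool) 'false');      // bool(true): every non-empty string except "0" is truthy
var_dump(boolOption('false')); // bool(false)
var_dump(boolOption('0'));     // bool(false)
var_dump(boolOption('yes'));   // bool(true)
```
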
On localhost the insert path is MySQL-insert-bound at ~5,300 rows/s regardless of concurrency or batch size (the sweep below is flat). Async works correctly and there are no cross-fiber locks; extra concurrency only pays off on a remote or otherwise high-latency database.

| concurrency | batch size | throughput |
| --- | --- | --- |
| 8 | 500 | 5,358 rows/s |
| 8 | 1000 | 5,179 rows/s |
| 16 | 500 | 5,379 rows/s |
| 16 | 2000 | 5,532 rows/s |
| 32 | 1000 | 4,824 rows/s |

Media import strategies

All three share a concurrent, non-blocking amphp/http-client download front-end (DOWNLOAD_CONCURRENCY=32) and a single batched dedup lookup. Concurrent downloading is the dominant win for real remote image URLs (latency hidden across requests); on localhost the download itself becomes the bottleneck and the numbers are noisy/pessimistic.
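
The "single batched dedup lookup" can be pictured as one partition step in front of the downloader. The sketch below is an illustration under assumptions: the function name is hypothetical, the lookup callable stands in for a single SELECT ... WHERE file_name IN (...) query, and keying on the file name without its extension is assumed rather than taken from the plugin.

```php
<?php
/**
 * Splits candidate image URLs into fresh ones (to download) and names that
 * already exist, using exactly one call to $findExisting for the whole batch.
 *
 * @param list<string> $urls
 * @param callable(list<string>): list<string> $findExisting returns the subset of names already stored
 * @return array{fresh: array<string, string>, skipped: list<string>}
 */
function partitionFreshSources(array $urls, callable $findExisting): array
{
    // Key each URL by file name; repeats inside the same batch collapse here.
    $byName = [];
    foreach ($urls as $url) {
        $path = (string) parse_url($url, PHP_URL_PATH);
        $name = pathinfo($path, PATHINFO_FILENAME);
        $byName[$name] ??= $url;
    }

    // One lookup for the batch instead of one query per image.
    $names = array_map('strval', array_keys($byName));
    $existing = array_flip($findExisting($names));

    return [
        'fresh'   => array_diff_key($byName, $existing),
        'skipped' => array_map('strval', array_keys(array_intersect_key($byName, $existing))),
    ];
}

$urls = [
    'https://cdn.example.test/img/a.jpg',
    'https://cdn.example.test/img/b.png',
    'https://cdn.example.test/other/a.jpg', // same name as the first entry
];
// Fake lookup: pretend only "b" is already in the media table.
$lookup = fn (array $names): array => array_values(array_intersect($names, ['b']));

$result = partitionFreshSources($urls, $lookup);
print_r($result);
```

Only the entries under fresh would be handed to the concurrent downloader; everything under skipped costs no HTTP request at all.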

Select via admin config FlyokaiFujinShuttle.config.mediaImportMode (+ mediaWorkers), or per run with --media-mode=<m> / --media-workers=N:

  • filesaver (default, safe) — Shopware's FileSaver pipeline per file. Full metadata, media_type, translations, thumbnail-capable. Per-file blocking persist (~5 ms) is the floor.
  • raw (fastest) — bypasses FileSaver: downloads concurrently, computes metadata with getimagesizefromstring, places the file via Shopware's own AbstractMediaPathStrategy + shopware.filesystem.public, and raw bulk-INSERTs media + media_translation through the async pool. No thumbnails. Media verified served + DAL-compatible. Processes in 2k batches to cap RAM. Note: media.file_hash is a generated column (meta_data->'$.hash') — the content md5 goes into meta_data.hash, never the column.
  • worker (safe + parallel) — shards the fresh sources across N subprocesses (fujin:shuttle:media-worker), each running the filesaver pipeline. Reuses Shopware's pipeline for full correctness while parallelising; scales with mediaWorkers up to core count.
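
To make the raw-mode note about meta_data concrete, here is a minimal sketch of building one row's payload from downloaded bytes. The function name and the exact meta_data keys other than hash are assumptions for illustration; the point it shows is that the content md5 goes into the JSON and file_hash is left for MySQL to generate.

```php
<?php
/**
 * Builds the payload for one downloaded image without going through FileSaver.
 * The width/height keys in meta_data are assumed for illustration.
 *
 * @return array{mime_type: string, file_size: int, meta_data: string}|null
 */
function rawMediaRow(string $bytes): ?array
{
    $info = @getimagesizefromstring($bytes);
    if ($info === false) {
        return null; // not a recognisable image; this sketch simply skips it
    }

    return [
        'mime_type' => $info['mime'],
        'file_size' => strlen($bytes),
        // The content hash lives inside the JSON. file_hash is generated from
        // it by MySQL and must not appear in the INSERT column list.
        'meta_data' => json_encode([
            'width'  => $info[0],
            'height' => $info[1],
            'hash'   => md5($bytes),
        ], JSON_THROW_ON_ERROR),
    ];
}

// A 1x1 GIF stands in for a downloaded image.
$gif = base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
$row = rawMediaRow($gif);
print_r($row);

// Work proceeds in fixed-size batches so memory stays bounded.
foreach (array_chunk(range(1, 4500), 2000) as $batch) {
    printf("batch of %d\n", count($batch));
}
```

Listing file_hash explicitly in the INSERT would be rejected by MySQL, since values cannot be written to a generated column.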

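Worker mode's sharding step can be sketched as a round-robin split. How the plugin actually assigns sources to fujin:shuttle:media-worker subprocesses is not described here, so the function below is a hypothetical illustration of one even split, not the real scheme.

```php
<?php
/**
 * Splits sources into at most $workers shards of near-equal size.
 *
 * @param list<string> $sources
 * @return list<list<string>>
 */
function shardSources(array $sources, int $workers): array
{
    if ($sources === []) {
        return [];
    }
    // Never create more shards than there are sources, and at least one.
    $workers = max(1, min($workers, count($sources)));
    $shards = array_fill(0, $workers, []);
    foreach (array_values($sources) as $i => $source) {
        $shards[$i % $workers][] = $source; // round-robin assignment
    }
    return $shards;
}

$sources = array_map(fn (int $i): string => "img-$i.jpg", range(1, 10));
foreach (shardSources($sources, 4) as $n => $shard) {
    printf("worker %d: %d files\n", $n, count($shard));
}
```
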
Tunables

| where | knob | default | note |
| --- | --- | --- | --- |
| admin / CLI | mediaImportMode / --media-mode | filesaver | raw fastest, worker safe+parallel |
| admin / CLI | mediaWorkers / --media-workers | 8 | worker subprocesses; raise toward core count |
| admin / CLI | concurrency / --concurrency | 8 | batches in flight (matters on remote DB) |
| CLI | --batch-size | 500 | rows per INSERT |
| code | MediaImport::DOWNLOAD_CONCURRENCY | 32 | raise for high-latency remote image hosts |
| code | MediaImport::RAW_BATCH | 2000 | raw-mode download+insert batch (caps RAM) |

Reproducing

Cold runs need fresh product numbers (re-importing existing numbers is a no-op upsert) and, for media, fresh image names (existing names dedupe and skip the download). The install-local helpers under var/fujin-shuttle/sample/ cover both:

  • gen_cold.php <n> <PREFIX> <out>: reuses existing FK targets and exercises the cold INSERT path.
  • gen_media_cold.php <n> <PREFIX> <out>: builds a fresh-name symlink farm, which forces real downloads.
  • cleanup_test_data.php: removes the throwaway rows afterwards (keep FOREIGN_KEY_CHECKS on so ON DELETE CASCADE fires).