FlyokaiFujinShuttle: products import performance
Benchmarks + tuning for the wide-column products import. All figures are 50k all-new
products, --reindex=none, on the sw6710 dev box (MySQL 8.4 localhost, 8 GB buffer pool,
innodb_autoinc_lock_mode=2, binlog off, 28 cores).
TL;DR
| scenario | throughput | vs the original ~65 rows/s |
|---|---|---|
| products, media off | ~5,300 rows/s | 80× — 9× the 500–600 Rapidflow target |
| products + media, `raw` mode | ~670–1,175 rows/s | 10–18× |
| products + media, `worker` mode | ~414–573 rows/s (scales with workers) | 6–9× |
| products + media, `filesaver` mode (default) | ~133–150 rows/s | 2× |
The original ~65 rows/s was bounded by media import, not by the insert path. The old
`MediaImport` downloaded each image with a blocking `fopen($url)`, then created it through the
DAL + `FileSaver`, serially, in Pass A: ~14 ms/image × 50k ≈ 700 s, in line with the reported
12:41 (761 s).
Measuring (built-in profiler)
Set FUJIN_PROF=1 and the deputy prints a per-stage breakdown to the error log at the end of
a run, plus a live in-flight gauge (real async concurrency):
FUJIN_PROF=1 php -d error_log=/tmp/prof.log bin/console fujin:shuttle:run <csv> --type=product --reindex=none
Labels: passA/passB, batch.build/batch.resolveIds, send.* (per request kind),
drv.build (SQL string build) / drv.wait (DB round-trip), bulk.insert, media.*.
No-op when the env var is unset.
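
A minimal sketch of how an env-gated stage profiler of this kind can be built. The class `StageProfiler` and its methods are hypothetical and are not the extension's actual profiler; only the `FUJIN_PROF` variable and the label style come from the description above. Start tokens are passed back by the caller rather than stored per label, so overlapping fibers timing the same label do not clobber each other.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical env-gated stage profiler (illustration only).
 * When disabled, start()/stop() return immediately, so calls can stay in hot paths.
 */
final class StageProfiler
{
    /** @var array<string, float> accumulated seconds per label */
    private array $totals = [];
    /** @var array<string, int> number of completed spans per label */
    private array $counts = [];

    public function __construct(private readonly bool $enabled)
    {
    }

    public static function fromEnv(): self
    {
        // Enabled only for FUJIN_PROF=1; unset or any other value means no-op.
        return new self(getenv('FUJIN_PROF') === '1');
    }

    /** Returns a start token in seconds, or null when profiling is off. */
    public function start(): ?float
    {
        return $this->enabled ? hrtime(true) / 1e9 : null;
    }

    public function stop(string $label, ?float $startedAt): void
    {
        if ($startedAt === null) {
            return; // profiler disabled
        }
        $elapsed = hrtime(true) / 1e9 - $startedAt;
        $this->totals[$label] = ($this->totals[$label] ?? 0.0) + $elapsed;
        $this->counts[$label] = ($this->counts[$label] ?? 0) + 1;
    }

    /** One line per label, slowest first; empty string if nothing was recorded. */
    public function report(): string
    {
        arsort($this->totals);
        $lines = [];
        foreach ($this->totals as $label => $seconds) {
            $lines[] = sprintf('%-20s %8.3f s  (%d calls)', $label, $seconds, $this->counts[$label]);
        }
        return implode("\n", $lines);
    }
}
```

Typical use would wrap a stage as `$t = $prof->start(); /* build SQL */ $prof->stop('drv.build', $t);` and pass `$prof->report()` to `error_log()` at the end of the run.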
What was fixed (insert path)
- O(n²) SQL param binding: `AsyncMysqli\Statement::bindParametersFromContainer` used `array_shift()` in a loop over ~15k `?` placeholders per 500-row INSERT. Pure CPU with no `await`, so it serialised every batch fiber. Replaced with an index pointer + an all-positional fast path. `drv.build` 7.3 s → 1.0 s on 50k. (Shared driver code; output is byte-identical.)
- Per-INSERT `SET FOREIGN_KEY_CHECKS` removed: the bulk writer toggled FK checks 0/1 around every INSERT (55 % of all statements, ~31 % of DB round-trip time). FK checks are now switched off once per pooled async link (the links are importer-dedicated and Pass A pre-creates all FK targets); `InsertOnDuplicate` skips the per-statement toggle on async connections. The sync/PDO path is unchanged. `unique_checks` stays on because `ON DUPLICATE KEY` needs it.
- Pipeline buffering: the row queue + emitter (`setBufferSize(concurrency)`) buffer enough ahead that `runBatches` can fill its concurrency slots.
- Config bool bug: `(bool)"false"` is `true`, so a CLI-set "disable media" (and every other boolean toggle) silently stayed on. The deputy now coerces via `boolOption()`.
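
Two of the fixes above are easy to show in isolation. The sketch below is illustrative only: `bindWithShift`, `bindWithPointer`, `quote` and this version of `boolOption` are hypothetical stand-ins, not the driver's or the extension's actual code. It contrasts consuming parameters with `array_shift()` (which reindexes the remaining array on every call, so a loop over n placeholders does O(n²) work) with reading through an index pointer, and shows one way to coerce string config values to booleans with PHP's `FILTER_VALIDATE_BOOLEAN`.

```php
<?php
declare(strict_types=1);

/** Toy quoting for the demo only; a real driver must use the connection's escaping. */
function quote(mixed $value): string
{
    if ($value === null) {
        return 'NULL';
    }
    if (is_int($value) || is_float($value)) {
        return (string) $value;
    }
    return "'" . addslashes((string) $value) . "'";
}

/** Slow variant: every array_shift() renumbers the rest of the list. */
function bindWithShift(string $sql, array $params): string
{
    // Naive split on '?'; assumes no literal '?' inside the SQL text.
    $parts = explode('?', $sql);
    $out = $parts[0];
    for ($k = 1, $n = count($parts); $k < $n; $k++) {
        $out .= quote(array_shift($params)) . $parts[$k];
    }
    return $out;
}

/** Fast variant: identical output, parameters read through an index pointer. */
function bindWithPointer(string $sql, array $params): string
{
    $params = array_values($params); // all-positional: keys 0..n-1
    $parts = explode('?', $sql);
    $out = $parts[0];
    $i = 0;
    for ($k = 1, $n = count($parts); $k < $n; $k++) {
        $out .= quote($params[$i++]) . $parts[$k];
    }
    return $out;
}

/** Coerces "false", "0", "off", "no" to false instead of trusting a (bool) cast. */
function boolOption(mixed $value, bool $default = false): bool
{
    if (is_bool($value)) {
        return $value;
    }
    // Returns null for values that are neither truthy nor falsy keywords.
    $parsed = filter_var($value, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
    return $parsed ?? $default;
}
```

Both binders produce the same string for the same input, which mirrors the "output is byte-identical" property noted above; only the cost per placeholder differs.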
On localhost the insert path is MySQL-insert-bound at ~5,300 rows/s regardless of concurrency or batch size (the sweep below is flat). Async works correctly and there are no cross-fiber locks; extra concurrency only pays off against a remote or otherwise high-latency database.
conc=8 bs=500 5,358 rows/s conc=8 bs=1000 5,179 rows/s
conc=16 bs=500 5,379 rows/s conc=16 bs=2000 5,532 rows/s
conc=32 bs=1000 4,824 rows/s
Media import strategies
All three share a concurrent, non-blocking amphp/http-client download front-end
(DOWNLOAD_CONCURRENCY=32) and a single batched dedup lookup. Concurrent downloading is the
dominant win for real remote image URLs (latency hidden across requests); on localhost the
download itself becomes the bottleneck and the numbers are noisy/pessimistic.
Select via admin config FlyokaiFujinShuttle.config.mediaImportMode (+ mediaWorkers), or per
run with --media-mode=<m> / --media-workers=N:
- `filesaver` (default, safe): Shopware's `FileSaver` pipeline per file. Full metadata, `media_type`, translations, thumbnail-capable. Per-file blocking persist (~5 ms) is the floor.
- `raw` (fastest): bypasses `FileSaver`. It downloads concurrently, computes metadata with `getimagesizefromstring`, places the file via Shopware's own `AbstractMediaPathStrategy` + `shopware.filesystem.public`, and raw bulk-INSERTs `media` + `media_translation` through the async pool. No thumbnails. Media verified served + DAL-compatible. Processes in 2k batches to cap RAM. Note: `media.file_hash` is a generated column (`meta_data->'$.hash'`), so the content md5 goes into `meta_data.hash`, never the column.
- `worker` (safe + parallel): shards the fresh sources across N subprocesses (`fujin:shuttle:media-worker`), each running the `filesaver` pipeline. Reuses Shopware's pipeline for full correctness while parallelising; scales with `mediaWorkers` up to core count.
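
As a rough illustration of the `raw` mode's metadata step, the sketch below derives image dimensions and a content hash from already-downloaded bytes using `getimagesizefromstring` and `md5`. The function name `buildRawMediaMeta` and the key layout of the returned array are assumptions made for this example; the only rule taken from the notes above is that the md5 is stored under `meta_data.hash` and never written to the generated `file_hash` column.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: derive the values a raw media INSERT would need
 * from image bytes. Returns null when the bytes are not a recognisable image.
 *
 * @return array{mime_type: string, file_size: int, meta_data: string}|null
 */
function buildRawMediaMeta(string $bytes): ?array
{
    // Reads only the image header; returns false for non-image data.
    $info = @getimagesizefromstring($bytes);
    if ($info === false) {
        return null;
    }

    $metaData = [
        'width'  => $info[0],
        'height' => $info[1],
        // The content hash lives inside the JSON; media.file_hash is generated from it.
        'hash'   => md5($bytes),
    ];

    return [
        'mime_type' => $info['mime'],
        'file_size' => strlen($bytes),
        'meta_data' => json_encode($metaData, JSON_THROW_ON_ERROR),
    ];
}
```

Because nothing here touches the database or the filesystem, a step like this can run inside the download fiber and leave only the file placement and the bulk INSERT for the batch stage.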
Tunables
| where | knob | default | note |
|---|---|---|---|
| admin / CLI | `mediaImportMode` / `--media-mode` | `filesaver` | `raw` fastest, `worker` safe+parallel |
| admin / CLI | `mediaWorkers` / `--media-workers` | 8 | worker subprocesses; raise toward core count |
| admin / CLI | `concurrency` / `--concurrency` | 8 | batches in flight (matters on remote DB) |
| CLI | `--batch-size` | 500 | rows per INSERT |
| code | `MediaImport::DOWNLOAD_CONCURRENCY` | 32 | raise for high-latency remote image hosts |
| code | `MediaImport::RAW_BATCH` | 2000 | raw-mode download+insert batch (caps RAM) |
Reproducing
Cold runs need fresh product numbers (re-importing existing numbers is a no-op upsert) and,
for media, fresh image names (existing names dedupe and skip the download). The install-local
helpers under var/fujin-shuttle/sample/ build these: gen_cold.php <n> <PREFIX> <out> (reuses
existing FK targets, exercises the cold INSERT path) and gen_media_cold.php <n> <PREFIX> <out>
(fresh-name symlink farm → forces real downloads). cleanup_test_data.php removes the throwaway
rows afterwards (keep FOREIGN_KEY_CHECKS on so ON DELETE CASCADE fires).
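
For readers without the sample helpers at hand, the idea of a cold data set can be sketched as follows. This is not the code of `gen_cold.php`: the function `writeColdCsv` is hypothetical, and the two-column layout is a placeholder rather than the importer's real wide-column schema. The only point it demonstrates is that a fresh prefix yields product numbers that cannot already exist, so every row takes the INSERT path.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a cold-run generator: writes $n rows whose
 * product numbers all start with $prefix, so none of them exist yet.
 * Column names are placeholders, not the importer's actual header.
 */
function writeColdCsv(int $n, string $prefix, string $outPath): void
{
    $fh = fopen($outPath, 'wb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $outPath for writing");
    }
    try {
        // Explicit delimiter, enclosure and (empty) escape character.
        fputcsv($fh, ['productNumber', 'name'], ',', '"', '');
        for ($i = 1; $i <= $n; $i++) {
            $number = sprintf('%s-%06d', $prefix, $i);
            fputcsv($fh, [$number, "Cold product $i"], ',', '"', '');
        }
    } finally {
        fclose($fh);
    }
}
```

Using a new prefix per benchmark run (a date or a counter, for example) keeps consecutive runs cold without cleaning up in between.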