gpt-5.4 thinking mode — 8-frame coherence

Eight separate full-resolution illustrated images of the same person across one quiet day, returned from a single /v1/responses call using model: "gpt-5.4" with reasoning.effort=high and the image_generation tool. The reasoning model decomposed the 8-beat prompt and invoked the image tool 8 times before assembling the response.

Reference photo

female_asian headshot from the eval cast pool, attached as the input image on the call.

Endpoint: POST https://api.openai.com/v1/responses
Model: gpt-5.4 (reasoning model)
Tool: image_generation
Reasoning: effort: "high" · 516 reasoning tokens consumed
Tool budget: max_tool_calls: 8 · parallel_tool_calls: true
Per-image size: 1024×1024 each, full resolution
Wall-clock: ~665 s (~11 min) for all 8
Cost: ~$0.59 ($0.05 reasoning text + 8 × $0.067 image-gen tool)
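The cost line can be rechecked with a few lines of arithmetic. This is only a sketch using the figures quoted above; the variable names are illustrative and the prices are the ones reported for this run, not a general price list.

```python
# Figures quoted in the run summary above; names are illustrative only.
REASONING_TEXT_USD = 0.05   # reasoning text for the whole call
IMAGE_TOOL_USD = 0.067      # one image_generation tool call
N_IMAGES = 8
WALL_CLOCK_S = 665

total_usd = REASONING_TEXT_USD + N_IMAGES * IMAGE_TOOL_USD
per_image_usd = total_usd / N_IMAGES
seconds_per_image = WALL_CLOCK_S / N_IMAGES

print(f"total: ${total_usd:.2f}")
print(f"per image: ${per_image_usd:.3f}, {seconds_per_image:.0f} s")
```

The total rounds to the ~$0.59 reported, so the image tool calls account for nearly all of the spend.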

What this proves

  1. The thinking-mode 8-image capability lives on /v1/responses with model: gpt-5.4, not on /v1/images/edits. Azure deployments using the older images API don't expose it.
  2. gpt-5.5 silently ignores reasoning.effort (0 reasoning tokens). gpt-5.4 actually engages reasoning (516 tokens this run).
  3. Identity is locked across all 8 panels from a single reference photo — the strongest cross-panel face / hair / age coherence in the whole experiment family.
  4. Default behavior is 1 composite image; the model only emits 8 separate image_generation_calls when the prompt explicitly forbids composites and max_tool_calls is raised.
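Points 2 and 4 can be checked mechanically on any saved response. The sketch below assumes the response has already been parsed into a dict and that it follows the Responses API shape as understood here: an `output` list whose image items have `type` equal to `image_generation_call`, and a `usage.output_tokens_details.reasoning_tokens` counter. The function name `summarize_response` is hypothetical.

```python
def summarize_response(response: dict) -> dict:
    """Count image tool calls and reasoning tokens in a parsed response."""
    output = response.get("output") or []
    image_calls = [
        item for item in output
        if item.get("type") == "image_generation_call"
    ]
    usage = response.get("usage") or {}
    details = usage.get("output_tokens_details") or {}
    return {
        "image_calls": len(image_calls),
        # 0 here would match the "silently ignores reasoning.effort" case.
        "reasoning_tokens": details.get("reasoning_tokens", 0),
    }
```

For the run described here, the expectation would be 8 image calls and 516 reasoning tokens; a single composite would show up as 1 image call.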

The eight beats

A single quiet day, chronological. Each panel is a separate image_generation_call in the response.

Beat 1 · dawn

Sitting on the edge of a low bed in pajamas, looking down, hair tousled, soft blue pre-dawn light.

Beat 2 · early morning

At the kitchen counter pouring coffee from a small kettle, floral pajamas, sunlit window.

Beat 3 · mid-morning

On a quiet train in a jacket, head leaned slightly toward the glass, countryside outside.

Beat 4 · midday

Eating noodles at a tiny cafe counter, chopsticks, blue apron over cream shirt.

Beat 5 · warm afternoon

Walking through a small public garden with autumn trees, beige jacket, dappled afternoon light.

Beat 6 · late afternoon

Cross-legged on a window seat at home, an open book in hand, warm afternoon light on the page.

Beat 7 · early evening

At a small home stove with a wooden spoon stirring a pot, blue apron, warm overhead light, steam rising.

Beat 8 · dusk

At a balcony railing in a soft cardigan, hands resting, looking out over city lights coming on.
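Because each beat arrives as its own image_generation_call, the panels can be pulled out of the parsed response one by one. A minimal sketch, assuming each such item carries the image as base64 in a `result` field, and assuming the items appear in beat order (which may not hold when tool calls run in parallel, so the order is worth checking by eye). The function name `extract_panels` is hypothetical.

```python
import base64


def extract_panels(response: dict) -> list[bytes]:
    """Return decoded image bytes for every image_generation_call item."""
    panels = []
    for item in response.get("output") or []:
        if item.get("type") != "image_generation_call":
            continue
        result = item.get("result")
        if result:  # skip calls that produced no image payload
            panels.append(base64.b64decode(result))
    return panels
```

Each element can then be written to disk, for example as `beat_1.png` through `beat_8.png`.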

The API call (verbatim)

POST https://api.openai.com/v1/responses
Authorization: Bearer <OPENAI_API_KEY>
Content-Type: application/json

{
  "model": "gpt-5.4",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text",  "text": "...prompt commanding 8 separate image_generation calls..." },
        { "type": "input_image", "image_url": "data:image/png;base64,<reference photo>" }
      ]
    }
  ],
  "tools": [ { "type": "image_generation" } ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "max_tool_calls": 8,
  "reasoning": { "effort": "high" }
}
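The same request could be assembled from Python with only the standard library. This is a sketch, not the script used for the run: `build_payload` and `send` are hypothetical names, the prompt text is left to the caller, reading the key from an `OPENAI_API_KEY` environment variable is an assumption, and the 900 s timeout is simply chosen to sit above the ~665 s observed here.

```python
import base64
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/responses"


def build_payload(prompt: str, reference_png: bytes, n_images: int = 8) -> dict:
    """Mirror the request body shown above for a given prompt and photo."""
    data_url = "data:image/png;base64," + base64.b64encode(reference_png).decode("ascii")
    return {
        "model": "gpt-5.4",
        "input": [{
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt},
                {"type": "input_image", "image_url": data_url},
            ],
        }],
        "tools": [{"type": "image_generation"}],
        "tool_choice": "auto",
        "parallel_tool_calls": True,
        "max_tool_calls": n_images,
        "reasoning": {"effort": "high"},
    }


def send(payload: dict, timeout_s: float = 900.0) -> dict:
    """POST the payload and return the parsed JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

Keeping payload construction separate from the network call makes it possible to inspect or log the body before spending on a run.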

Comparison to every other multi-image mode tested

Mode | Calls | Latency | Identity hold | Per-image res | Cost
--- | --- | --- | --- | --- | ---
8 separate Azure /edits (today's prod) | 8 | ~5 min seq · ~45 s parallel | varies panel-to-panel (text identity_lock only) | 1024² | $0.54
1 composite call (2×4 grid, Mode E) | 1 | 45 s | very strong (single canvas) | ~384×512 sub-panels | $0.067
1 composite call + split_panels.py | 1 + split | 45 s + ms | same as above | ~384×512 | $0.067
gpt-5.4 thinking · 8 tool calls in 1 /v1/responses | 1 | ~11 min | best yet: locked across full-size panels | 1024² each | ~$0.59
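The ~384×512 sub-panel figure for the composite modes is consistent with a 1536×1024 canvas cut into 4 columns and 2 rows. That canvas size and grid orientation are assumptions made here for illustration, and the sketch below is not the actual split_panels.py; it only computes the crop boxes such a splitter would need. The function name `panel_boxes` is hypothetical.

```python
def panel_boxes(width: int, height: int, cols: int, rows: int) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) crop boxes in row-major order."""
    panel_w = width // cols
    panel_h = height // rows
    return [
        (c * panel_w, r * panel_h, (c + 1) * panel_w, (r + 1) * panel_h)
        for r in range(rows)
        for c in range(cols)
    ]


# Assumed composite: 1536x1024, 4 columns by 2 rows.
boxes = panel_boxes(1536, 1024, cols=4, rows=2)
panel_pixels = (boxes[0][2] - boxes[0][0]) * (boxes[0][3] - boxes[0][1])
full_pixels = 1024 * 1024
print(len(boxes), boxes[0], f"{panel_pixels / full_pixels:.2%} of a 1024² panel")
```

Under these assumptions each sub-panel carries under a fifth of the pixels of a 1024² image, which is the resolution trade-off the table records against the composite's lower cost and latency.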

What this means for Pikumo

For the production wizard (90-second user-facing budget), the ~11-min latency rules this out as the default per-panel renderer. Where it earns its place:

Generated 2026-05-26. Subject: female_asian headshot from the eval cast pool. Single OpenAI direct API call (not Azure). Reference photo and all 8 PNGs are served from this Cloudflare Pages site — the R2 bucket is private.