gpt-5.4 thinking mode — 8-frame coherence

Eight separate full-resolution illustrated images of the same person across one quiet day, returned from a single /v1/responses call using model: "gpt-5.4" with reasoning.effort=high and the image_generation tool. The reasoning model decomposed the 8-beat prompt and invoked the image tool 8 times before assembling the response.

Reference photo

female_asian headshot from the eval cast pool, attached as the input image on the call.

Endpoint: POST https://api.openai.com/v1/responses
Model: gpt-5.4 (reasoning model)
Tool: image_generation
Reasoning: effort: "high" · 516 reasoning tokens consumed
Tool budget: max_tool_calls: 8 · parallel_tool_calls: true
Per-image size: 1024×1024 each, full resolution
Wall-clock: ~665 s (~11 min) for all 8
Cost: ~$0.59 (0.05 reasoning text + 8 × $0.067 image-gen tool)

What this proves

The thinking-mode 8-image capability lives on /v1/responses with model: gpt-5.4, not on /v1/images/edits. Azure deployments using the older images API don't expose it.
gpt-5.5 silently ignores reasoning.effort (0 reasoning tokens). gpt-5.4 actually engages reasoning (516 tokens this run).
Identity is locked across all 8 panels from a single reference photo — the strongest cross-panel face / hair / age coherence in the whole experiment family.
Default behavior is 1 composite image; the model only emits 8 separate image_generation_calls when the prompt explicitly forbids composites and max_tool_calls is raised.

The eight beats

A single quiet day, chronological. Each panel is a separate image_generation_call in the response.

Beat 1dawn

Sitting on the edge of a low bed in pajamas, looking down, hair tousled, soft blue pre-dawn light.

Beat 2early morning

At the kitchen counter pouring coffee from a small kettle, floral pajamas, sunlit window.

Beat 3mid-morning

On a quiet train in a jacket, head leaned slightly toward the glass, countryside outside.

Beat 4midday

Eating noodles at a tiny cafe counter, chopsticks, blue apron over cream shirt.

Beat 5warm afternoon

Walking a small public garden with autumn trees, beige jacket, dappled afternoon light.

Beat 6late afternoon

Cross-legged on a window seat at home, an open book in hand, warm afternoon light on the page.

Beat 7early evening

At a small home stove with a wooden spoon stirring a pot, blue apron, warm overhead light, steam rising.

Beat 8dusk

At a balcony railing in a soft cardigan, hands resting, looking out over city lights coming on.

The API call (verbatim)

POST https://api.openai.com/v1/responses
Authorization: Bearer <OPENAI_API_KEY>
Content-Type: application/json

{
  "model": "gpt-5.4",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text",  "text": "...prompt commanding 8 separate image_generation calls..." },
        { "type": "input_image", "image_url": "data:image/png;base64,<reference photo>" }
      ]
    }
  ],
  "tools": [ { "type": "image_generation" } ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "max_tool_calls": 8,
  "reasoning": { "effort": "high" }
}

Comparison to every other multi-image mode tested

Mode	Calls	Latency	Identity hold	Per-image res	Cost
8 separate Azure /edits (today's prod)	8	~5 min seq · ~45 s parallel	varies panel-to-panel (text identity_lock only)	1024²	$0.54
1 composite call (2×4 grid, Mode E)	1	45 s	very strong (single canvas)	~384×512 sub-panels	$0.067
1 composite call + `split_panels.py`	1 + split	45 s + ms	same as above	~384×512	$0.067
gpt-5.4 thinking · 8 tool calls in 1 /v1/responses	1	~11 min	best yet — locked across full-size panels	1024² each	~$0.59

What this means for Pikumo

For the production wizard (90-second user-facing budget), the ~11-min latency rules this out as the default per-panel renderer. Where it earns its place:

Album re-render path. A user requests a higher-quality rerender of their existing 4-panel story. Latency budget is forgiving (background job, email when done). Identity locks far more tightly than production today.
Email teaser / share-card surface. A 4-panel "story preview" card sent in a follow-up email. Cap max_tool_calls: 4, latency scales to ~5 min, well within email-send timing.
"Premium tier" story output. If a paid tier ships, this is the differentiator — visibly tighter identity hold than the free tier's per-panel pipeline.

Generated 2026-05-26. Subject: female_asian headshot from the eval cast pool. Single OpenAI direct API call (not Azure). Reference photo and all 8 PNGs are served from this Cloudflare Pages site — the R2 bucket is private.