Image Pipeline

The image pipeline generates and validates visual assets embedded in section content. Images appear inside HTML sections as <img> tags with an alt description. The pipeline converts that description into a full image generation prompt, evaluates the result, and optionally localizes the text in the generated image.


Position in the section content pipeline

Images are generated after section HTML is produced, as a sub-step of section content production. Each <img> tag triggers the image pipeline independently.

Section content generation
  └── For each section with images:
        ├── Select image type (photo / illustration / diagram / portrait)
        ├── Generate image prompt  →  run image model  →  check quality
        │     └── If score < 7: use improved prompt → retry
        └── (optional) Translate image text
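
The flow above can be sketched as a small retry loop. This is a minimal sketch under stated assumptions, not the pipeline's actual code: `generate_prompt`, `run_image_model`, and `check_image` are hypothetical callables standing in for the type-specific prompt, the image model, and tool_image_checker, and `max_attempts` is an assumed cap that this page does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional

ACCEPT_THRESHOLD = 7  # from the flow above: scores below 7 trigger a retry


@dataclass
class CheckResult:
    score: int                      # 0-10, as returned by the quality check
    improved_prompt: Optional[str]  # expected when score < 7


def produce_image(
    alt_text: str,
    generate_prompt: Callable[[str], str],            # hypothetical: alt text -> prompt
    run_image_model: Callable[[str], str],            # hypothetical: prompt -> image URL
    check_image: Callable[[str, str], CheckResult],   # hypothetical: (image URL, prompt) -> result
    max_attempts: int = 3,                            # assumed cap, not from this page
) -> tuple[str, int]:
    """Return (image_url, score) of the first accepted or the last attempted image."""
    prompt = generate_prompt(alt_text)
    image_url, score = "", -1
    for _ in range(max_attempts):
        image_url = run_image_model(prompt)
        result = check_image(image_url, prompt)
        score = result.score
        if score >= ACCEPT_THRESHOLD or not result.improved_prompt:
            break
        prompt = result.improved_prompt  # retry with the checker's improved prompt
    return image_url, score
```

Passing the three steps in as callables keeps the loop independent of any particular model or workflow engine.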

Image types

Four types cover all visual scenarios in section content. The type is selected before generation based on the section context and the alt description.

Type         | When to use                                                        | Prompt
photo        | People, workplaces, real-world scenarios with narrative characters | flow_image_photo_prompt
illustration | Abstract concepts, processes, diverse non-narrative characters     | flow_image_illustration_prompt
diagram      | Structured relationships, processes, data layouts                  | flow_image_diagram_prompt
portrait     | Known people, individual faces                                     | flow_image_portrait_prompt
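
Once a type has been selected, resolving it to its prompt is a plain lookup. The sketch below mirrors the table above; the function name `prompt_for` and the choice to raise on unknown types are assumptions, and the selection logic itself is not shown here.

```python
# Type -> prompt name, copied from the table above.
PROMPT_BY_TYPE = {
    "photo": "flow_image_photo_prompt",
    "illustration": "flow_image_illustration_prompt",
    "diagram": "flow_image_diagram_prompt",
    "portrait": "flow_image_portrait_prompt",
}


def prompt_for(image_type: str) -> str:
    """Return the prompt name for an image type (hypothetical helper)."""
    try:
        return PROMPT_BY_TYPE[image_type.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown image type: {image_type!r}") from None
```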

Photo

Photorealistic, stock-photo style, soft daylight, blue color scheme. No brand system.

Narrative characters (course.narrative_characters) are passed in with name and appearance descriptions to keep the generated characters visually consistent across sections. The prompt uses a Jinja2 {% for %} loop — rendered by Dify, not the YAML loader.
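
For illustration, the snippet below shows in plain Python roughly what such a loop renders. In the pipeline the loop is a Jinja2 block rendered by Dify; here the field names `name` and `appearance` follow the description above, while the output wording, the function name, and the sample characters are made up for the example.

```python
def render_characters(characters: list[dict[str, str]]) -> str:
    """Render one line per character, similar to what a template loop would emit."""
    return "\n".join(f"- {c['name']}: {c['appearance']}" for c in characters)


# Made-up sample data, for illustration only.
characters = [
    {"name": "Maya", "appearance": "short dark hair, green jacket"},
    {"name": "Tom", "appearance": "tall, round glasses, grey sweater"},
]
print(render_characters(characters))
```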

Illustration

Brand-compliant flat vector illustration (see Brand system below). Diverse characters in casual or professional settings — simplified, no realistic rendering. No drop shadows, no heavy strokes, no decorative textures.

Diagram

Same brand system as illustration, but abstract and structure-focused. No characters. Uses commanding, imperative language in the generation prompt ("consists of exactly", "must be labeled") rather than descriptive phrases. Horizontal layout is optimized with explicit padding instructions to prevent cropping.

Portrait

Neutral daylight, cinematic, stock-photo style. No brand colors.

For well-known people: use the person's name directly in the prompt — do not describe their appearance. For people interacting with screens or paper: define a single, specific camera point of view. The prompt's focus should be on the human interaction, not on showing both the face and the screen content simultaneously.


Generation pipeline

Each image type follows the same two-step loop:

  1. Prompt generation: one of the four type-specific prompts reads the alt attribute from the <img> tag in section HTML and produces a plain-text image generation prompt. Output is unformatted text with no quotation marks, ready for DALL·E, Midjourney, or Flux.

  2. Quality check: tool_image_checker evaluates the generated image against the original prompt and the section HTML context. It uses a two-step evaluation: visual interpretation first (without comparing to the prompt), then evaluation against the brief.
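
Step 1 starts from the alt attribute of each <img> tag. How the pipeline extracts it is not described here; the sketch below shows one way to do it with Python's standard `html.parser`, with `ImgAltCollector` and `extract_alts` as hypothetical names.

```python
from html.parser import HTMLParser


class ImgAltCollector(HTMLParser):
    """Collect the alt text of every <img> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.alts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # skip images with a missing or empty alt
                self.alts.append(alt.strip())


def extract_alts(section_html: str) -> list[str]:
    """Return the alt descriptions found in one section's HTML."""
    collector = ImgAltCollector()
    collector.feed(section_html)
    collector.close()
    return collector.alts
```

Each returned description would then be handed to the prompt for the selected image type.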

Quality check scoring

Score | Meaning                         | Action
7–10  | Solid or excellent              | Accept
5–6   | Correct meaning, weak execution | Accept or retry
0–4   | Semantic violation              | Reject; use improved prompt

Critical violations (score 0–4): required words missing, typos, words replaced or duplicated in a meaning-changing way, wrong chart data, new semantic content added, elements in wrong positions in diagrams or quadrants.

Not violations: line breaks, punctuation variants, bullet vs inline layout, typographic apostrophe differences, minor spacing differences.

If score < 7, the checker outputs an improved prompt for re-generation.
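
The score bands can be expressed as a small classifier. This is a sketch of the scoring table only: the `Action` enum and `action_for` are hypothetical names, and how the pipeline decides between accepting and retrying in the 5–6 band is not specified here.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    ACCEPT_OR_RETRY = "accept_or_retry"
    REJECT = "reject"


def action_for(score: int) -> Action:
    """Map a 0-10 checker score to the action listed in the scoring table."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score >= 7:
        return Action.ACCEPT
    if score >= 5:
        return Action.ACCEPT_OR_RETRY
    return Action.REJECT  # 0-4: semantic violation
```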


Translation pipeline

Localizes text inside generated images. Used after image generation when the course language is not English.

  1. flow_image_translation_prompt — analyzes the original image and section context, produces plain-text instructions for an image editor: what to translate and how. Output is instructions, not a translated image.

  2. tool_image_translation_checker — binary YES/NO evaluation: compares the original image, translated image, and the localization instructions. Strict on meaning and grammar, lenient on non-literal phrasing, loanwords, and minor layout differences from language length.
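
Downstream code needs to turn the checker's YES/NO answer into a pass/fail decision. The exact output format is not documented here, so the sketch below assumes the response begins with YES or NO, optionally followed by a justification; `parse_verdict` is a hypothetical helper.

```python
import re

# Assumed format: "YES" or "NO" first, then optional punctuation and a justification.
_VERDICT = re.compile(r"^\s*(YES|NO)\b[\s:.,\-]*(.*)$", re.IGNORECASE | re.DOTALL)


def parse_verdict(checker_output: str) -> tuple[bool, str]:
    """Split the checker output into (passed, justification)."""
    match = _VERDICT.match(checker_output)
    if match is None:
        raise ValueError("checker output does not start with YES or NO")
    return match.group(1).upper() == "YES", match.group(2).strip()
```

Raising on anything that does not start with a clear verdict avoids silently accepting a malformed response.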

What is not translated

  • Code blocks, error messages, programming-context snippets
  • English acronyms (translate the expansion, not the acronym itself)
  • Brand names
  • Widely accepted English industry terms (may stay in English if commonly used in the target language)

Brand system

Applies to illustration, diagram, and the tool_image_checker retry prompt. Does not apply to photo or portrait.

Defined in prompts/injections/image_brand.yaml. Referenced via {{ image_brand }} in the three prompts above.

Property          | Value
Colors            | #212D56 #056DFF #F2F5F8 #FFFFFF #D6EAFE
Font              | Aeonik (geometric sans-serif, near-circular letterforms, uniform stroke weight)
Font weight       | Extra-heavy for titles, regular for labels. Titles ≥ 2× the size of labels.
Ratio             | 3:2 (horizontal)
Color temperature | Neutral daylight. No sepia.
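
To make the `{{ image_brand }}` reference concrete, the sketch below substitutes a brand description into a prompt template. It is an illustration only: the real injection is resolved by the prompt tooling, the structure of image_brand.yaml is not shown on this page, and the dictionary keys, output wording, and function names are assumptions. The values come from the table above.

```python
import re

# Values from the brand table; the key names are assumptions.
IMAGE_BRAND = {
    "colors": ["#212D56", "#056DFF", "#F2F5F8", "#FFFFFF", "#D6EAFE"],
    "font": "Aeonik",
    "ratio": "3:2",
}

# Matches {{ image_brand }} with or without inner spaces.
_PLACEHOLDER = re.compile(r"\{\{\s*image_brand\s*\}\}")


def brand_text(brand: dict) -> str:
    """Render the brand properties as one plain-text sentence group."""
    return (
        f"Colors: {', '.join(brand['colors'])}. "
        f"Font: {brand['font']}. "
        f"Aspect ratio: {brand['ratio']} (horizontal)."
    )


def inject_brand(template: str, brand: dict = IMAGE_BRAND) -> str:
    """Replace every image_brand placeholder in the template."""
    return _PLACEHOLDER.sub(lambda _: brand_text(brand), template)
```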

Prompts

Prompt                         | Type    | Input                                                 | Output
flow_image_photo_prompt        | Dynamic | Section HTML, narrative characters                    | Image generation prompt
flow_image_illustration_prompt | Dynamic | Section HTML                                          | Image generation prompt
flow_image_diagram_prompt      | Dynamic | Section HTML                                          | Image generation prompt
flow_image_portrait_prompt     | Dynamic | Section HTML                                          | Image generation prompt
tool_image_checker             | Dynamic | Image URL, section HTML, original prompt              | Score + improved prompt if < 7
flow_image_translation_prompt  | Dynamic | Image URL, section HTML, language                     | Localization instructions
tool_image_translation_checker | Dynamic | Original image, translated image, localization prompt | YES/NO + justification

All prompts live in prompts/image/ and prompts/image_translation/.


Known limitations

Checker retry uses illustration style — when tool_image_checker generates an improved prompt (score < 7), it always applies the illustration/diagram brand style regardless of the original image type. If the original was a photo or portrait, the retry will produce a style mismatch. Fix: pass the original image type as a variable to the checker.
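
The suggested fix could look like the sketch below, which adds the original image type to the checker's inputs so the improved prompt can keep the matching style. The variable names `image_type` and `apply_brand` are hypothetical and not part of the current checker.

```python
# Types the brand system applies to, per the Brand system section.
BRANDED_TYPES = frozenset({"illustration", "diagram"})
KNOWN_TYPES = BRANDED_TYPES | {"photo", "portrait"}


def checker_inputs(
    image_url: str,
    section_html: str,
    original_prompt: str,
    image_type: str,
) -> dict[str, object]:
    """Build the checker's variables, including the original image type."""
    kind = image_type.strip().lower()
    if kind not in KNOWN_TYPES:
        raise ValueError(f"unknown image type: {image_type!r}")
    return {
        "image_url": image_url,
        "section_html": section_html,
        "original_prompt": original_prompt,
        "image_type": kind,                    # hypothetical new variable
        "apply_brand": kind in BRANDED_TYPES,  # hypothetical flag: brand style only where it applies
    }
```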

Narrative character consistency: flow_image_photo_prompt passes character descriptions as text. Future improvement: pass a character reference image as image_input to the generation model for better visual consistency across sections.

Related injection