Image Pipeline
The image pipeline generates and validates visual assets embedded in section content.
Images appear inside HTML sections as <img> tags with an alt description. The pipeline
converts that description into a full image generation prompt, evaluates the result,
and optionally localizes the text in the generated image.
Position in the section content pipeline
Images are generated after section HTML is produced, as a sub-step of section content
production. Each <img> tag triggers the image pipeline independently.
Section content generation
└── For each section with images:
    ├── Select image type (photo / illustration / diagram / portrait)
    ├── Generate image prompt → run image model → check quality
    │   └── If score < 7: use improved prompt → retry
    └── (optional) Translate image text
Image types
Four types cover all visual scenarios in section content. The type is selected before
generation based on the section context and the alt description.
| Type | When to use | Prompt |
|---|---|---|
| photo | People, workplaces, real-world scenarios with narrative characters | flow_image_photo_prompt |
| illustration | Abstract concepts, processes, diverse non-narrative characters | flow_image_illustration_prompt |
| diagram | Structured relationships, processes, data layouts | flow_image_diagram_prompt |
| portrait | Known people, individual faces | flow_image_portrait_prompt |
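The table above amounts to a lookup from image type to prompt name. A minimal sketch of that lookup follows; the dictionary and the function name `prompt_for` are illustrative assumptions, not part of the pipeline's code.

```python
# Mapping copied from the image types table; the names below are hypothetical.
PROMPT_BY_TYPE = {
    "photo": "flow_image_photo_prompt",
    "illustration": "flow_image_illustration_prompt",
    "diagram": "flow_image_diagram_prompt",
    "portrait": "flow_image_portrait_prompt",
}


def prompt_for(image_type: str) -> str:
    """Return the prompt name for an image type, rejecting unknown types."""
    key = image_type.strip().lower()
    if key not in PROMPT_BY_TYPE:
        raise ValueError(f"unknown image type: {image_type!r}")
    return PROMPT_BY_TYPE[key]
```

Failing loudly on an unknown type keeps a mistyped value from silently falling back to the wrong style.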
Photo
Photorealistic, stock-photo style, soft daylight, blue color scheme. No brand system.
Narrative characters (course.narrative_characters) are passed in with name and appearance
descriptions to keep the generated characters visually consistent across sections. The prompt
iterates over them with a Jinja2 {% for %} loop, which is rendered by Dify rather than by the YAML loader.
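To make the character hand-off concrete, here is a plain-Python equivalent of what such a loop could produce. The real template is Jinja2 rendered by Dify and its exact output format is not documented here, so the `NarrativeCharacter` class, the function name, and the bullet layout are all assumptions; only the `name` and `appearance` fields come from the description above.

```python
from dataclasses import dataclass


@dataclass
class NarrativeCharacter:
    # Fields named in the text; the class itself is a hypothetical container.
    name: str
    appearance: str


def render_character_block(characters):
    """Build one line per character, as a for-loop in the prompt might.

    The "- name: appearance" layout is an assumed format for illustration.
    """
    return "\n".join(f"- {c.name}: {c.appearance}" for c in characters)
```

Because the same descriptions are rendered into every photo prompt, each section asks the image model for the same visual traits.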
Illustration
Brand-compliant flat vector illustration (see Brand system below). Diverse characters in casual or professional settings — simplified, no realistic rendering. No drop shadows, no heavy strokes, no decorative textures.
Diagram
Same brand system as illustration, but abstract and structure-focused. No characters. Uses commanding, imperative language in the generation prompt ("consists of exactly", "must be labeled") rather than descriptive phrases. Horizontal layout is optimized with explicit padding instructions to prevent cropping.
Portrait
Neutral daylight, cinematic, stock-photo style. No brand colors.
For well-known people: use the person's name directly in the prompt — do not describe their appearance. For people interacting with screens or paper: define a single, specific camera point of view. The prompt's focus should be on the human interaction, not on showing both the face and the screen content simultaneously.
Generation pipeline
Each image type follows the same two-step loop:
- Prompt generation: one of the four type-specific prompts reads the alt attribute from the <img> tag in section HTML and produces a plain-text image generation prompt. Output is unformatted text with no quotation marks, ready for DALL·E, Midjourney, or Flux.
- Quality check: tool_image_checker evaluates the generated image against the original prompt and the section HTML context. It uses a two-step evaluation: visual interpretation first (without comparing to the prompt), then evaluation against the brief.
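The first step depends on pulling alt descriptions out of the section HTML. A minimal sketch using Python's standard `html.parser` is shown below; the class and function names are hypothetical, and skipping images with an empty alt is an assumption rather than documented pipeline behavior.

```python
from html.parser import HTMLParser


class ImgAltCollector(HTMLParser):
    """Collect the alt text of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... /> via handle_startendtag.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt and alt.strip():
                self.alts.append(alt.strip())


def extract_alts(section_html: str) -> list:
    """Return the non-empty alt descriptions found in the section HTML."""
    collector = ImgAltCollector()
    collector.feed(section_html)
    collector.close()
    return collector.alts
```

Each returned description would then be handed to the type-specific prompt independently, matching the one-pipeline-run-per-image behavior described earlier.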
Quality check scoring
| Score | Meaning | Action |
|---|---|---|
| 7–10 | Solid or excellent | Accept |
| 5–6 | Correct meaning, weak execution | Accept or retry |
| 0–4 | Semantic violation | Reject — use improved prompt |
Critical violations (score 0–4): required words missing, typos, words replaced or duplicated in a meaning-changing way, wrong chart data, new semantic content added, elements in wrong positions in diagrams or quadrants.
Not violations: line breaks, punctuation variants, bullet vs inline layout, typographic apostrophe differences, minor spacing differences.
For any score below 7, the checker outputs an improved prompt for re-generation. Per the table above, retrying is optional at 5–6 and required at 0–4.
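The generate, check, and retry cycle can be sketched as a small loop. This is an illustration under stated assumptions: `generate` and `check` are hypothetical callables standing in for the image model and tool_image_checker, `CheckResult` is an invented container, and the `max_attempts` cap and "return the best attempt" policy are not specified by the pipeline. For simplicity the sketch retries on every score below 7, although scores of 5–6 may also be accepted as-is.

```python
from dataclasses import dataclass
from typing import Optional

ACCEPT_THRESHOLD = 7  # scores below 7 come with an improved prompt


@dataclass
class CheckResult:
    # Hypothetical shape of the checker output: a score plus an optional retry prompt.
    score: int
    improved_prompt: Optional[str] = None


def generate_with_retry(prompt, generate, check, max_attempts=3):
    """Run generate -> check, retrying with the improved prompt while score < 7.

    Returns the (image, CheckResult) pair with the highest score seen.
    """
    best = None
    current = prompt
    for _ in range(max_attempts):
        image = generate(current)
        result = check(image, current)
        if best is None or result.score > best[1].score:
            best = (image, result)
        # Stop on an acceptable score, or when there is no prompt to retry with.
        if result.score >= ACCEPT_THRESHOLD or not result.improved_prompt:
            break
        current = result.improved_prompt
    return best
```

Capping the attempts keeps a persistently failing image from looping forever, and keeping the best attempt means a later, worse retry does not replace an earlier usable result.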
Translation pipeline
Localizes text inside generated images. Used after image generation when the course language is not English.
- flow_image_translation_prompt: analyzes the original image and section context, then produces plain-text instructions for an image editor covering what to translate and how. Output is instructions, not a translated image.
- tool_image_translation_checker: binary YES/NO evaluation that compares the original image, the translated image, and the localization instructions. Strict on meaning and grammar, lenient on non-literal phrasing, loanwords, and minor layout differences from language length.
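Downstream code needs to turn the checker's YES/NO reply into a decision. The sketch below assumes the reply starts with YES or NO followed by the justification; that reply layout and the function name `parse_verdict` are assumptions, since the exact output format is not given here.

```python
import re

# Assumed layout: "YES" or "NO" first, then optional punctuation, then the justification.
_VERDICT = re.compile(r"^\s*(YES|NO)\b[\s:.,\-]*(.*)$", re.IGNORECASE | re.DOTALL)


def parse_verdict(raw: str):
    """Split a checker reply into (passed, justification)."""
    match = _VERDICT.match(raw)
    if match is None:
        raise ValueError("reply does not start with YES or NO")
    passed = match.group(1).upper() == "YES"
    return passed, match.group(2).strip()
```

Raising on anything that does not start with a clear verdict avoids treating an ambiguous reply as a pass.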
What is not translated
- Code blocks, error messages, programming-context snippets
- English acronyms (translate the expansion, not the acronym itself)
- Brand names
- Widely accepted English industry terms (may stay in English if commonly used in the target language)
Brand system
Applies to illustration, diagram, and the tool_image_checker retry prompt.
Does not apply to photo or portrait.
Defined in prompts/injections/image_brand.yaml. Referenced via {{ image_brand }}
in the three prompts above.
| Property | Value |
|---|---|
| Colors | #212D56 #056DFF #F2F5F8 #FFFFFF #D6EAFE |
| Font | Aeonik — geometric sans-serif, near-circular letterforms, uniform stroke weight |
| Font weight | Extra-heavy for titles, regular for labels. Titles ≥ 2× the size of labels. |
| Ratio | 3:2 (horizontal) |
| Color temperature | Neutral daylight. No sepia. |
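As a rough illustration of how the brand block reaches a prompt, the sketch below substitutes a `{{ image_brand }}` placeholder with brand text. The real substitution is performed by the prompt loading machinery; the function names, the regular expression, and the example brand text are assumptions, and the actual contents of image_brand.yaml are not reproduced here.

```python
import re

# Types that receive the brand block, per the text; photo and portrait do not.
BRANDED_TYPES = {"illustration", "diagram"}

# Matches {{ image_brand }} with any amount of inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*image_brand\s*\}\}")


def uses_brand(image_type: str) -> bool:
    """Return True when the image type follows the brand system."""
    return image_type in BRANDED_TYPES


def inject_brand(template: str, brand_text: str) -> str:
    """Replace every {{ image_brand }} placeholder with the brand text."""
    # A function replacement avoids backslash escapes being interpreted.
    return _PLACEHOLDER.sub(lambda _match: brand_text, template)
```

For example, `inject_brand("Style: {{ image_brand }}", "Colors: #212D56 #056DFF. Ratio 3:2.")` yields a prompt line carrying values from the table above.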
Prompts
| Prompt | Type | Input | Output |
|---|---|---|---|
| flow_image_photo_prompt | Dynamic | Section HTML, narrative characters | Image generation prompt |
| flow_image_illustration_prompt | Dynamic | Section HTML | Image generation prompt |
| flow_image_diagram_prompt | Dynamic | Section HTML | Image generation prompt |
| flow_image_portrait_prompt | Dynamic | Section HTML | Image generation prompt |
| tool_image_checker | Dynamic | Image URL, section HTML, original prompt | Score + improved prompt if < 7 |
| flow_image_translation_prompt | Dynamic | Image URL, section HTML, language | Localization instructions |
| tool_image_translation_checker | Dynamic | Original image, translated image, localization prompt | YES/NO + justification |
All prompts live in prompts/image/ and prompts/image_translation/.
Known limitations
Checker retry uses illustration style — when tool_image_checker generates an improved
prompt (score < 7), it always applies the illustration/diagram brand style regardless of
the original image type. If the original was a photo or portrait, the retry will
produce a style mismatch. Fix: pass the original image type as a variable to the checker.
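The proposed fix could look roughly like the sketch below, where the checker's retry prompt picks its style block from the original image type. The function name and the `brand_text` parameter are hypothetical; the photo and portrait style strings are taken from the type descriptions earlier in this document.

```python
BRANDED_TYPES = {"illustration", "diagram"}

# Style summaries copied from the Photo and Portrait sections.
UNBRANDED_STYLES = {
    "photo": "Photorealistic, stock-photo style, soft daylight, blue color scheme.",
    "portrait": "Neutral daylight, cinematic, stock-photo style. No brand colors.",
}


def retry_style(image_type: str, brand_text: str) -> str:
    """Choose the style block for an improved prompt from the original image type."""
    if image_type in BRANDED_TYPES:
        return brand_text
    if image_type in UNBRANDED_STYLES:
        return UNBRANDED_STYLES[image_type]
    raise ValueError(f"unknown image type: {image_type!r}")
```

With the type passed through, a rejected photo is regenerated as a photo instead of drifting into the flat vector brand style.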
Narrative character consistency — flow_image_photo_prompt passes character descriptions
as text. Future improvement: pass a character reference image as image_input to the
generation model for better visual consistency across sections.