Workflow
Full description of the Synthesia Dify workflow — inputs, internal logic, and node-by-node flow.
Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| text | paragraph | — | Source content for the video |
| course_title | string | — | Localized course title |
| video_title | string | — | Title of the current video section |
| avatar_id | string | — | Synthesia avatar UUID (from trainer asset) |
| tone_of_voice | paragraph | — | Trainer voice profile (from trainer asset) |
| sections_position | string | — | Section number prefix, e.g. "1.1.2" |
| background_image | string | — | URL of the slide background image |
| video_length | select | normal | `short` or `normal`; short = 2–3 slide template, 100–150 words total |
| test_mode | select | true | "true" or "false"; test = does not publish the video in Synthesia |
| fake_mode | select | false | "true" or "false"; see below |
| language | select | Default | Default (auto detection), fr, or nl |
| format | select | VideoPresentation | VideoPresentation or VideoTalkingHead |
Full pipeline
```
Start
 ├─► IF fake_mode = true ──► fake UUID ──► End (no API call made)
 │
 └─► Synthesia GET /v2/templates
      │
      └─► Filter by curated allowlist from repo
          (34 supported template IDs; see templates.md)
           │
           └─► Split by format
               (VideoPresentation or VideoTalkingHead)
                │
                └─► LLM Step 1 — template selection (Gemini 2.5 Pro)
                    (choose best template, generate image prompts if needed)
                     │
                     └─► IF generate_image = true?
                          ├─ true ──► Iteration (parallel ×4):
                          │             for each image prompt →
                          │               IF aspect_ratio = horizontal?
                          │                 ├─ true ──► Nano Banana 1536×1024
                          │                 └─ false ─► Nano Banana 1024×1536
                          │             → collect {variable_name, image_url}
                          └─ false ─┐
                                    └─► Template-transform (variables + image URLs)
                                         │
                                         └─► Loop (up to 10 retries):
                                               LLM Step 2 — build payload (Gemini 2.5 Pro)
                                                │
                                                └─► Validate JSON (Python)
                                                     └─► IF valid?
                                                          ├─ true ──► Exit loop
                                                          └─ false ─► retry
                                               │
                                               └─► POST https://api.synthesia.io/v2/videos/fromTemplate
                                                    │
                                                    └─► IF status_code = 6000?
                                                         ├─ yes ──► fallback POST (hardcoded payload)
                                                         └─► Extract video_id ──► End
```
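The terminal POST at the bottom of the diagram can be sketched as follows. `build_create_request` is a hypothetical helper (the workflow node does the equivalent inline); it only assembles the request, leaving the actual send to any HTTP client, and the required top-level fields mirror the validator's checklist.

```python
import json

SYNTHESIA_URL = "https://api.synthesia.io/v2/videos/fromTemplate"
REQUIRED_FIELDS = {"test", "templateData", "visibility", "templateId", "title"}

def build_create_request(payload: dict, api_key: str) -> dict:
    """Assemble the fromTemplate HTTP request without sending it.

    Raises if a required top-level field is missing, mirroring the
    validator's checklist later in this document.
    """
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"payload missing required fields: {sorted(missing)}")
    return {
        "url": SYNTHESIA_URL,
        "headers": {"Authorization": api_key, "Content-Type": "application/json"},
        "body": json.dumps(payload),
    }
```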
Template sources and filtering
`available_templates` is not passed in by the caller. The workflow builds it internally at runtime from two different sources with different roles:
- Repository allowlist — the pipeline supports a curated set of 34 template IDs. This curated list is documented in templates.md. The repo is the source of truth for which templates are allowed in Fraya.
- Synthesia API — `GET /v2/templates?limit=100&offset=0&source=workspace` fetches all templates currently present in the Synthesia workspace, including their live metadata: `id`, `title`, `description`, and `variables`.
At runtime, the workflow:
- fetches the full template catalog from Synthesia;
- filters that catalog by the curated allowlist of 34 IDs supported by Fraya;
- keeps the live Synthesia metadata for the matching templates;
- splits the result into `presentations` vs `talking_head`;
- passes only the subset matching the `format` input to Step 1.
This distinction matters:
- The repo decides which template IDs are allowed.
- Synthesia provides the current metadata for those IDs.
- The workflow does not treat the repo as the source of variable definitions or template descriptions at runtime.
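The filter step can be sketched as below. The allowlist is modeled here as a hypothetical dict mapping template ID to format (the real list of 34 IDs lives in templates.md); only one real entry, taken from the worked example, is shown.

```python
# Hypothetical in-code shape for the repo allowlist: template ID → format.
# The authoritative list of 34 entries lives in templates.md.
ALLOWLIST = {
    "9c23e3c3-401b-4373-9ca2-e15b669974a3": "VideoTalkingHead",
}

def select_templates(catalog: list, fmt: str) -> list:
    """Filter the live Synthesia catalog by the repo allowlist, then keep
    only the subset matching the requested format.

    Live metadata (id, title, description, variables) is passed through
    untouched: the repo decides eligibility, Synthesia provides metadata.
    """
    allowed = [t for t in catalog if t["id"] in ALLOWLIST]
    return [t for t in allowed if ALLOWLIST[t["id"]] == fmt]
```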
Transition note: Google Sheets
Historically, the allowlist was maintained in Google Sheets and later copied into the repo. Some Dify workflow exports may still contain a Google Sheets node from that earlier setup.
For documentation purposes, treat Google Sheets as legacy migration residue, not as the intended long-term source of truth. The target model is:
repo allowlist + live Synthesia metadata
If the workflow implementation still reads the old sheet in some environment, it should be understood as transitional behavior rather than the architectural design of the system.
Prompt sourcing
The Synthesia workflow should not treat prompt text as manually copied workflow content. The intended pattern is:
- prompt definitions live in `prompts/video/`
- Dify fetches prompt metadata from `GET /api/prompts/:name` when it needs the contract
- Dify renders prompt text from `POST /api/prompts/:name` at runtime
- the prompt API always returns `system` + `user`, even if the underlying YAML file was authored as a single-message prompt
For the video workflow, this means Step 1 and Step 2 prompt text should be loaded from the Fraya API rather than maintained as separate prompt copies inside Dify.
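A minimal sketch of consuming the render endpoint's response (the HTTP call itself is omitted, and the function name is illustrative); it assumes the `system` + `user` response shape described above:

```python
def to_messages(rendered: dict) -> list:
    """Turn a POST /api/prompts/:name response into a chat message list.

    The prompt API always returns both `system` and `user`, even for
    prompts authored as a single message, so no fallback branch is needed.
    """
    return [
        {"role": "system", "content": rendered["system"]},
        {"role": "user", "content": rendered["user"]},
    ]
```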
fake_mode
When fake_mode = "true", the workflow skips the entire pipeline and immediately returns a
randomly generated fake video ID. No LLM calls, no API calls, no image generation.
Use this to test the calling workflow (e.g. section content pipeline) without spending Synthesia credits or waiting for actual rendering.
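The short-circuit behaves roughly like this sketch (the function name is hypothetical; the real pipeline runs where the exception is raised):

```python
import uuid

def run_video_workflow(inputs: dict) -> dict:
    """Return a randomly generated fake video ID immediately in fake_mode.

    Sketch of the branch only: no LLM calls, no API calls, no image
    generation happen on this path.
    """
    if inputs.get("fake_mode") == "true":
        return {"video_id": str(uuid.uuid4()), "json": None}
    raise NotImplementedError("full pipeline not sketched here")
```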
Image generation
If Step 1 decides images are needed (generate_image = true), the workflow runs an Iteration
node (4 parallel threads) before Step 2.
For each image prompt returned by Step 1:
- `aspect_ratio = horizontal` → Nano Banana at 1536×1024
- `aspect_ratio = vertical` → Nano Banana at 1024×1536

Results are collected as `{variable_name, image_url}` pairs and passed to Step 2.
Two image styles — see content_rules.md for prompting guidelines.
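The aspect-ratio mapping can be sketched as below; the job dict shape and the `image_jobs` name are illustrative, and the actual Nano Banana generation call is not shown.

```python
NANO_BANANA_SIZES = {
    "horizontal": (1536, 1024),
    "vertical": (1024, 1536),
}

def image_jobs(prompts: list) -> list:
    """Map Step 1 image prompts onto render jobs at the right resolution.

    Each prompt is assumed to carry variable_name, prompt, aspect_ratio.
    """
    jobs = []
    for p in prompts:
        width, height = NANO_BANANA_SIZES[p["aspect_ratio"]]
        jobs.append({
            "variable_name": p["variable_name"],
            "prompt": p["prompt"],
            "width": width,
            "height": height,
        })
    return jobs
```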
JSON validation loop
After Step 2 generates the payload, a Python validator runs inside a loop (up to 10 retries):
What it checks:
- Required top-level fields present: `test`, `templateData`, `visibility`, `templateId`, `title`
- `templateData` is parseable (array of `"key: value"` strings → converts to dict)
- `avatar_id` is in `templateData` (injected from workflow input if missing)
- `background_image` is in `templateData` (injected from workflow input if missing)
- All required template variables from Step 1 are present (no missing keys)
- No empty values (empty string → replaced with single space `" "`)
- Whitespace normalized (`\u00A0`, `\u200B` removed)
If valid = false, the LLM retries with the same instructions. In practice, 1–2 iterations suffice.
The validated dict (not the original array-of-strings) is what gets POSTed to Synthesia.
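A condensed sketch of those checks, assuming only the details stated above (the real Python node may differ in structure and error reporting):

```python
TOP_FIELDS = {"test", "templateData", "visibility", "templateId", "title"}

def validate_payload(payload: dict, required_vars: set, inputs: dict):
    """Return (valid, normalized_payload) following the checklist above."""
    if not TOP_FIELDS <= payload.keys():
        return False, payload
    data = {}
    for entry in payload["templateData"]:  # array of "key: value" strings
        key, sep, value = entry.partition(": ")
        if not sep:
            return False, payload  # entry not in "key: value" form
        # Strip invisible whitespace; empty values become a single space.
        value = value.replace("\u00a0", " ").replace("\u200b", "")
        data[key] = value or " "
    # Inject avatar/background from workflow inputs if Step 2 omitted them.
    data.setdefault("avatar_id", inputs["avatar_id"])
    data.setdefault("background_image", inputs["background_image"])
    if required_vars - data.keys():
        return False, payload  # a required template variable is missing
    return True, {**payload, "templateData": data}
```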
Worked example
A real short-form talking-head run, simplified from production output.
1. Workflow input
```json
{
  "text": "<p>Hai costruito le tue fondamenta interiori... Esploriamo come formare queste connessioni forti e interdipendenti.</p>",
  "course_title": "Le 7 abitudini per essere più efficace",
  "video_title": "La vittoria pubblica: dall'io al noi",
  "avatar_id": "49dc8f46-8c08-45f1-8608-57069c173827",
  "sections_position": "R:171272 - 2.1.1(1787)",
  "tone_of_voice": "Tone: Clear, insightful, minimalist, guiding...",
  "test_mode": false,
  "fake_mode": false,
  "background_image": "synthesia.3f9b74bc-f870-4cc6-881e-7bff2394b686",
  "video_length": "short",
  "language": "Default (auto detection)",
  "format": "VideoTalkingHead"
}
```
2. What the pipeline does with it
- Fetches all workspace templates from Synthesia.
- Filters them by Fraya's curated allowlist of supported template IDs.
- Keeps only the `VideoTalkingHead` subset.
- Step 1 selects template `9c23e3c3-401b-4373-9ca2-e15b669974a3`: `[de] Talking head (A-Q-A-F)`.
- No image generation is needed for this template.
- Step 2 writes the slide scripts and text variables.
- The validator normalizes the payload and converts `templateData` into a dict before the API call.
3. Final payload sent to Synthesia
```json
{
  "test": false,
  "templateData": {
    "course_title": "Le 7 abitudini per essere più efficace",
    "video_title": "La vittoria pubblica: dall'io al noi",
    "slide_0_script_intro": "Ora, passiamo dall'io al noi.",
    "slide_1_script": "Hai costruito le tue fondamenta interiori. Ma l'autosufficienza non è il punto d'arrivo. È il terreno stabile di cui hai bisogno per connetterti con gli altri.",
    "slide_2_key_idea_max_50": "Dall'autosufficienza alla collaborazione.",
    "slide_3_script": "Per raggiungere una vera collaborazione, devi prima costruire la fiducia, la valuta di ogni relazione. <break time=\"1s\" /> In questo modo, la tua mente si allontana dalla competizione e inizi ad ascoltare profondamente per comprendere.",
    "slide_4_title_max_30": "Connessioni interdipendenti",
    "slide_4_textblock": "🔹 Costruire fiducia\n🔹 Ascoltare per capire\n🔹 Creare soluzioni insieme",
    "slide_4_script": "Alla fine, impari a fondere punti di vista diversi per creare soluzioni che nessuna singola persona potrebbe costruire da sola. Questo è il potere delle connessioni interdipendenti.",
    "avatar_id": "49dc8f46-8c08-45f1-8608-57069c173827",
    "background_image": "synthesia.3f9b74bc-f870-4cc6-881e-7bff2394b686"
  },
  "visibility": "public",
  "templateId": "9c23e3c3-401b-4373-9ca2-e15b669974a3",
  "title": "R:171272 - 2.1.1(1787) La vittoria pubblica: dall'io al noi ([de] Talking head (A-Q-A-F))"
}
```
4. Workflow output
```json
{
  "video_id": "701d1437-d1ec-4d19-8598-29a3366fe19b",
  "json": {
    "test": false,
    "templateData": {
      "course_title": "Le 7 abitudini per essere più efficace",
      "video_title": "La vittoria pubblica: dall'io al noi",
      "slide_0_script_intro": "Ora, passiamo dall'io al noi.",
      "slide_1_script": "Hai costruito le tue fondamenta interiori. Ma l'autosufficienza non è il punto d'arrivo. È il terreno stabile di cui hai bisogno per connetterti con gli altri.",
      "slide_2_key_idea_max_50": "Dall'autosufficienza alla collaborazione.",
      "slide_3_script": "Per raggiungere una vera collaborazione, devi prima costruire la fiducia, la valuta di ogni relazione. <break time=\"1s\" /> In questo modo, la tua mente si allontana dalla competizione e inizi ad ascoltare profondamente per comprendere.",
      "slide_4_title_max_30": "Connessioni interdipendenti",
      "slide_4_textblock": "🔹 Costruire fiducia\n🔹 Ascoltare per capire\n🔹 Creare soluzioni insieme",
      "slide_4_script": "Alla fine, impari a fondere punti di vista diversi per creare soluzioni che nessuna singola persona potrebbe costruire da sola. Questo è il potere delle connessioni interdipendenti.",
      "avatar_id": "49dc8f46-8c08-45f1-8608-57069c173827",
      "background_image": "synthesia.3f9b74bc-f870-4cc6-881e-7bff2394b686"
    },
    "visibility": "public",
    "templateId": "9c23e3c3-401b-4373-9ca2-e15b669974a3",
    "title": "R:171272 - 2.1.1(1787) La vittoria pubblica: dall'io al noi ([de] Talking head (A-Q-A-F))"
  }
}
```
5. Why this example is useful
- It shows the difference between the LLM output contract of Step 2 and the validated runtime payload used by the workflow.
- It demonstrates a real `VideoTalkingHead` selection for a `short` video.
- It shows that the final workflow result returns both the created `video_id` and the normalized JSON payload that was sent to Synthesia.
Production design rules
These rules apply when creating templates in Synthesia, not during the LLM pipeline.
Backgrounds and transitions:
- Use environmental backgrounds on talking-head (avatar-only) slides.
- Use blue solid backgrounds on presentation (list/text) slides.
- Background changes between slides → set transition to fade.
- Background stays the same between slides → no transition.
Layout:
- Logo in the bottom-right corner; if that area is occupied, move it to the top right.
- Use in-between title slides (`Q`-type) in presentation templates.
- Use different zoom levels across slides for visual rhythm.
Slide structure:
- Every template starts with `slide_0_script_intro` + avatar + `video_title` + `course_title`.
- `Q` slides display a framing question or key statement in the centre, with no voiceover.
- Limit continuous `I` (diagram) slides to 4. For larger models, use grouping.