# Knowledge Depository
## Why this exists
Before Fraya can generate course content, it needs a deep knowledge foundation for the topic. Without it, generated content risks being shallow, historically thin, or missing the conceptual frameworks that make e-learning meaningful.
A Knowledge Depository is that foundation — an exhaustive, structured document covering a topic from multiple angles: definitions, history, theory, evidence, case studies, terminology, and cultural context. It is generated once per topic and reused across all content generation steps downstream.
## How it works
The depository is built in two sequential steps. The output of step 1 is the input of step 2.
```
flow_depository_plan   ← step 1: analyze topic, define structure
        ↓ JSON
flow_depository_fill   ← step 2: fill each section with content
        ↓ Markdown
Knowledge Depository
```
### Step 1 — Plan (`flow_depository_plan`)
Analyzes the topic and produces a JSON structure that defines:
- Which sections to include (selected from the section taxonomy below)
- Whether external research is needed (RAG or web search) — see note below
- Search queries to fill gaps, if applicable
Inputs: `topic`, optional `industry`, and optionally `level_1` / `level_2` / `level_3` for hierarchical context (e.g., course → module → lesson).

Output: JSON with `depository_structure`, `web_search_need`, `web_search_query`, `rag_needed`, and `rag_queries`.
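To make the output fields concrete, here is an illustrative sketch of what a step-1 plan might look like, plus a minimal sanity check before handing it to step 2. The field names come from the list above; the values and the check itself are hypothetical, not a verbatim schema from the workflow:

```python
# Hypothetical example of a step-1 plan (illustrative values, not real output).
plan = {
    "depository_structure": [
        "Foundational Concepts and Definitions",
        "Historical Context",
        "Recommended Training Approaches",
        "Key Resources and References",
        "Synthesis",
    ],
    "web_search_need": False,
    "web_search_query": None,
    "rag_needed": True,
    "rag_queries": ["GDPR core principles", "GDPR lawful bases for processing"],
}

# Minimal validation before passing the plan downstream (assumed, not part
# of the current workflow): all expected keys present, non-empty structure.
required = {"depository_structure", "web_search_need", "web_search_query",
            "rag_needed", "rag_queries"}
assert required <= plan.keys()
assert isinstance(plan["depository_structure"], list) and plan["depository_structure"]
```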
### Step 2 — Fill (`flow_depository_fill`)
Takes the topic and the JSON structure from step 1 and writes the full depository content. Target length: 3000+ words. Each section is populated with definitions, evidence, case studies, resources, and cross-references.
Inputs: `topic`, optional `industry`, `depository_structure` (JSON from step 1), and `files` (optional research content).
When industry context is provided, step 2 keeps the final depository industry-aware as well, adapting terminology, examples, practical references, and constraints to that domain rather than applying the industry only during section planning.
`files` is a text variable containing research material converted to Markdown — e.g., a web search result or a RAG-retrieved document. It is passed directly in the user prompt and integrated into the relevant sections.
Output: Long-form Markdown document.
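The two-step handoff described above can be sketched as a thin wrapper. Everything here is an assumption for illustration: `run_flow` is a placeholder for whatever client invokes a Dify workflow, and the input keys mirror the variable names documented in this section:

```python
# Hypothetical chaining of the two flows. `run_flow` is a stand-in, not a
# real Dify API: it would call the named workflow and return its raw output.
import json
from typing import Optional

def run_flow(name: str, inputs: dict) -> str:
    """Placeholder: invoke the named workflow and return its raw text output."""
    raise NotImplementedError

def build_depository(topic: str, industry: Optional[str] = None,
                     files: Optional[str] = None) -> str:
    # Step 1: plan — returns a JSON string with the depository structure.
    plan_raw = run_flow("flow_depository_plan",
                        {"topic": topic, "industry": industry})
    plan = json.loads(plan_raw)

    # Step 2: fill — consumes the structure from step 1 plus optional research.
    return run_flow("flow_depository_fill", {
        "topic": topic,
        "industry": industry,
        "depository_structure": json.dumps(plan["depository_structure"]),
        "files": files or "",
    })
```

The point of the sketch is only the data flow: step 1's JSON is parsed, and its `depository_structure` field is forwarded to step 2 alongside any manually gathered research in `files`.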
## Section taxonomy
The following sections are available. Step 1 selects only those relevant to the topic.
| Section | When to include |
|---|---|
| Foundational Concepts and Definitions | Always |
| Historical Context | Always |
| Theoretical Frameworks and Models | When established models exist |
| Legal and Regulatory Frameworks | Legal/compliance topics only |
| Risk Areas and Challenges | When risks are central to the topic |
| Best Practices and Strategies | When actionable guidance is the goal |
| Practical Case Studies and Examples | When real-world illustration adds value |
| Recommended Training Approaches | Always |
| Key Resources and References | Always |
| Leading Experts | When the field has recognized thought leaders |
| Implementation Tools and Templates | When the topic is operational/practical |
| Relevant Terminology | When the topic has dense or specialized vocabulary |
| Cultural and Contextual Considerations | Cross-cultural or context-sensitive topics |
| Evidence/Research | When empirical backing is available |
| Synthesis | Always — closes the depository with integrated insights |
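The taxonomy above implies a simple selection rule: the "Always" rows form a fixed baseline and Synthesis closes the document, while the remaining rows are toggled by topic traits. The sketch below encodes that rule as data; the trait flags are hypothetical — in the actual workflow, step 1 makes these judgments via the LLM, not boolean inputs:

```python
# Baseline sections included for every topic ("Always" rows, minus Synthesis,
# which the taxonomy says must close the depository).
BASELINE = [
    "Foundational Concepts and Definitions",
    "Historical Context",
    "Recommended Training Approaches",
    "Key Resources and References",
]

# Conditional rows, keyed by hypothetical trait flags (names assumed here).
CONDITIONAL = {
    "has_established_models": "Theoretical Frameworks and Models",
    "is_legal_topic": "Legal and Regulatory Frameworks",
    "risks_central": "Risk Areas and Challenges",
    "guidance_is_goal": "Best Practices and Strategies",
    "examples_add_value": "Practical Case Studies and Examples",
    "has_thought_leaders": "Leading Experts",
    "is_operational": "Implementation Tools and Templates",
    "dense_vocabulary": "Relevant Terminology",
    "context_sensitive": "Cultural and Contextual Considerations",
    "has_empirical_backing": "Evidence/Research",
}

def select_sections(traits: dict) -> list:
    """Baseline + triggered conditional sections, with Synthesis last."""
    extra = [sec for flag, sec in CONDITIONAL.items() if traits.get(flag)]
    return BASELINE + extra + ["Synthesis"]
```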
## RAG / Web Search logic
⚠️ Not yet implemented. This logic is defined in the system prompt but is not active in the current Dify workflow. Treat it as a nice-to-have for a future iteration.
The plan step classifies topics into four categories to decide whether external research is needed:
| Category | Examples | Research |
|---|---|---|
| Universal / general | Work-life balance, time management, emotional intelligence | None — use LLM knowledge |
| Legal / regulatory | GDPR, AGG, cybersecurity compliance | RAG — fetch primary sources |
| Evolving / tech | AI & Automation, prompting, LLMs | Web search — recent trends and reports |
| Interdisciplinary | Personality types, leadership styles | None — use LLM knowledge |
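The routing table above can be expressed as data, mapping each category to a research strategy and from there to the `rag_needed` / `web_search_need` fields of the step-1 output. This is a sketch of the intended (not yet implemented) logic; the category keys are assumed names:

```python
# Category → research strategy, mirroring the table above.
# None means "rely on LLM knowledge" — no external research.
RESEARCH_STRATEGY = {
    "universal": None,               # e.g., work-life balance
    "legal_regulatory": "rag",       # e.g., GDPR — fetch primary sources
    "evolving_tech": "web_search",   # e.g., LLMs — recent trends and reports
    "interdisciplinary": None,       # e.g., leadership styles
}

def research_plan(category: str) -> dict:
    """Translate a topic category into the step-1 research flags."""
    strategy = RESEARCH_STRATEGY[category]
    return {
        "rag_needed": strategy == "rag",
        "web_search_need": strategy == "web_search",
    }
```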
When research is needed, step 1 generates search queries. These are currently executed manually, and the results are passed to step 2 via the `files` variable. Full automation (RAG pipeline or web search integration) is a future milestone.
## Future: Depository trimming
⚠️ Not yet implemented. A placeholder prompt exists at `prompts/flow_depository_trim.yaml`.
The full depository is broad by design — it covers the topic exhaustively regardless of the specific angle the course will take. But once the course concept is defined (scope, audience, learning objectives), most of that knowledge becomes irrelevant context that inflates token usage in every downstream step.
The idea: after the course concept is defined, run a trimming step that strips the depository down to only the sections and content that are actually relevant to this specific course. The trimmed depository then replaces the full one for all subsequent content generation.
Why this matters:
- Reduces context size passed to section content, quiz, and artifact prompts
- Improves output focus and relevance
- Lowers token costs across the entire pipeline
Planned inputs:
- `course_concept` — the defined course concept (scope, audience, objectives, angle)
- `depository` — the full depository Markdown to be trimmed
Planned output: Trimmed Markdown — same structure, reduced scope.
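To illustrate the intended input/output shape only: a trimmed depository keeps the sections deemed relevant to the course concept and drops the rest. The real step will be a prompt (`prompts/flow_depository_trim.yaml`), not string filtering — the sketch below just demonstrates the contract, assuming `##` headings mark depository sections:

```python
# Illustrative only: keep the "## " sections whose titles appear in a
# relevance set derived from the course concept; same structure, reduced scope.
def trim_depository(depository_md: str, relevant_sections: set) -> str:
    kept, keep = [], True
    for line in depository_md.splitlines():
        if line.startswith("## "):
            # A heading flips the keep flag for everything until the next one.
            keep = line[3:].strip() in relevant_sections
        if keep:
            kept.append(line)
    return "\n".join(kept)
```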
## Related
- Audience analysis — runs in parallel; provides learner context
- Outline pipeline — consumes depository output (Steps 1–5)
- Section content — depository is passed to every section author