Knowledge Depository

Why this exists

Before Fraya can generate course content, it needs a deep knowledge foundation for the topic. Without it, generated content risks being shallow, historically thin, or missing the conceptual frameworks that make e-learning meaningful.

A Knowledge Depository is that foundation — an exhaustive, structured document covering a topic from multiple angles: definitions, history, theory, evidence, case studies, terminology, and cultural context. It is generated once per topic and reused across all content generation steps downstream.

How it works

The depository is built in two sequential steps. The output of step 1 is the input of step 2.

flow_depository_plan   ← step 1: analyze topic, define structure
              ↓ JSON
flow_depository_fill   ← step 2: fill each section with content
              ↓ Markdown
         Knowledge Depository

Step 1 — Plan (flow_depository_plan)

Analyzes the topic and produces a JSON structure that defines:

  • Which sections to include (selected from the section taxonomy below)
  • Whether external research is needed (RAG or web search) — see note below
  • Search queries to fill gaps, if applicable

Inputs: topic, optional industry, and optionally level_1 / level_2 / level_3 for hierarchical context (e.g., course → module → lesson).

Output: JSON with depository_structure, web_search_need, web_search_query, rag_needed, and rag_queries.
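
The field names above are the real output keys; the topic and values in this example are purely illustrative:

```json
{
  "depository_structure": [
    "Foundational Concepts and Definitions",
    "Historical Context",
    "Legal and Regulatory Frameworks",
    "Key Resources and References",
    "Synthesis"
  ],
  "web_search_need": false,
  "web_search_query": "",
  "rag_needed": true,
  "rag_queries": [
    "GDPR core principles primary sources",
    "data privacy compliance obligations overview"
  ]
}
```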

Step 2 — Fill (flow_depository_fill)

Takes the topic and the JSON structure from step 1 and writes the full depository content. Target length: 3000+ words. Each section is populated with definitions, evidence, case studies, resources, and cross-references.

Inputs: topic, optional industry, depository_structure (JSON from step 1), files (optional research content).

When industry context is provided, it should shape the final depository as well, not only the section plan: step 2 adapts terminology, examples, practical references, and constraints to that domain.

files is a text variable containing research material converted to markdown — e.g., a web search result or a RAG-retrieved document. It is passed directly in the user prompt, and step 2 integrates it into the relevant sections.

Output: Long-form Markdown document.


Section taxonomy

The following sections are available. Step 1 selects only those relevant to the topic.

Section                                | When to include
---------------------------------------|------------------------------------------------
Foundational Concepts and Definitions  | Always
Historical Context                     | Always
Theoretical Frameworks and Models      | When established models exist
Legal and Regulatory Frameworks        | Legal/compliance topics only
Risk Areas and Challenges              | When risks are central to the topic
Best Practices and Strategies          | When actionable guidance is the goal
Practical Case Studies and Examples    | When real-world illustration adds value
Recommended Training Approaches        | Always
Key Resources and References           | Always
Leading Experts                        | When the field has recognized thought leaders
Implementation Tools and Templates     | When the topic is operational/practical
Relevant Terminology                   | When the topic has dense or specialized vocabulary
Cultural and Contextual Considerations | Cross-cultural or context-sensitive topics
Evidence/Research                      | When empirical backing is available
Synthesis                              | Always — closes the depository with integrated insights
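
A small validation helper (hypothetical, not part of the current workflow) could sanity-check a step 1 plan against this taxonomy, flagging unknown sections and missing "Always" sections:

```python
# Sections the taxonomy marks as "Always"; everything else is conditional.
ALWAYS_SECTIONS = {
    "Foundational Concepts and Definitions",
    "Historical Context",
    "Recommended Training Approaches",
    "Key Resources and References",
    "Synthesis",
}

CONDITIONAL_SECTIONS = {
    "Theoretical Frameworks and Models",
    "Legal and Regulatory Frameworks",
    "Risk Areas and Challenges",
    "Best Practices and Strategies",
    "Practical Case Studies and Examples",
    "Leading Experts",
    "Implementation Tools and Templates",
    "Relevant Terminology",
    "Cultural and Contextual Considerations",
    "Evidence/Research",
}


def validate_structure(sections: list[str]) -> list[str]:
    """Return a list of problems: unknown sections or missing 'Always' ones."""
    known = ALWAYS_SECTIONS | CONDITIONAL_SECTIONS
    problems = [f"unknown section: {s}" for s in sections if s not in known]
    problems += [f"missing required section: {s}"
                 for s in ALWAYS_SECTIONS if s not in sections]
    return problems
```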

RAG / Web Search logic

⚠️ Not yet implemented. This logic is defined in the system prompt but is not active in the current Dify workflow. Treat as nice-to-have for a future iteration.

The plan step classifies topics into four categories to decide whether external research is needed:

Category            | Examples                                                    | Research
--------------------|-------------------------------------------------------------|----------------------------------------
Universal / general | Work-life balance, time management, emotional intelligence  | None — use LLM knowledge
Legal / regulatory  | GDPR, AGG, cybersecurity compliance                         | RAG — fetch primary sources
Evolving / tech     | AI & Automation, prompting, LLMs                            | Web search — recent trends and reports
Interdisciplinary   | Personality types, leadership styles                        | None — use LLM knowledge

When research is needed, step 1 generates search queries. These are currently executed manually and the results passed to step 2 via the files variable. Full automation (RAG pipeline or web search integration) is a future milestone.
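
The category → research mapping from the table can be sketched as follows; the category itself would be produced by the LLM in step 1, and this only encodes the decision table:

```python
from enum import Enum


class TopicCategory(Enum):
    UNIVERSAL = "universal_general"
    LEGAL = "legal_regulatory"
    EVOLVING = "evolving_tech"
    INTERDISCIPLINARY = "interdisciplinary"


def research_flags(category: TopicCategory) -> dict:
    """Map a topic category to research flags, mirroring the table above."""
    if category == TopicCategory.LEGAL:
        return {"rag_needed": True, "web_search_need": False}
    if category == TopicCategory.EVOLVING:
        return {"rag_needed": False, "web_search_need": True}
    # Universal and interdisciplinary topics rely on LLM knowledge alone.
    return {"rag_needed": False, "web_search_need": False}
```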


Future: Depository trimming

⚠️ Not yet implemented. Placeholder prompt exists at prompts/flow_depository_trim.yaml.

The full depository is broad by design — it covers the topic exhaustively regardless of the specific angle the course will take. But once the course concept is defined (scope, audience, learning objectives), most of that knowledge becomes irrelevant context that inflates token usage in every downstream step.

The idea: after the course concept is defined, run a trimming step that strips the depository down to only the sections and content that are actually relevant to this specific course. The trimmed depository then replaces the full one for all subsequent content generation.

Why this matters:

  • Reduces context size passed to section content, quiz, and artifact prompts
  • Improves output focus and relevance
  • Lowers token costs across the entire pipeline

Planned inputs:

  • course_concept — the defined course concept (scope, audience, objectives, angle)
  • depository — the full depository Markdown to be trimmed

Planned output: Trimmed Markdown — same structure, reduced scope.