Knowledge Depository

Why this exists

Before Fraya can generate course content, it needs a deep knowledge foundation for the topic. Without it, generated content risks being shallow, historically thin, or missing the conceptual frameworks that make e-learning meaningful.

A Knowledge Depository is that foundation — an exhaustive, structured document covering a topic from multiple angles: definitions, history, theory, evidence, case studies, terminology, and cultural context. It is generated once per topic and reused across all content generation steps downstream.

How it works

The depository is built in two sequential steps. The output of step 1 is the input of step 2.

flow_depository_plan   ← step 1: analyze topic, define structure
              ↓ JSON
flow_depository_fill   ← step 2: fill each section with content
              ↓ Markdown
         Knowledge Depository

Step 1 — Plan (flow_depository_plan)

Analyzes the topic and produces a JSON structure that defines:

  • Which sections to include (selected from the section taxonomy below)
  • Whether external research is needed (RAG or web search) — see note below
  • Search queries to fill gaps, if applicable

Inputs: topic, optional industry, and optionally level_1 / level_2 / level_3 for hierarchical context (e.g., course → module → lesson).

Output: JSON with depository_structure, web_search_need, web_search_query, rag_needed, and rag_queries.
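
The field names above are the real output keys; the topic and values in this example are purely illustrative:

```json
{
  "depository_structure": [
    "Foundational Concepts and Definitions",
    "Historical Context",
    "Legal and Regulatory Frameworks",
    "Key Resources and References",
    "Synthesis"
  ],
  "web_search_need": false,
  "web_search_query": "",
  "rag_needed": true,
  "rag_queries": [
    "GDPR core principles primary sources",
    "data privacy compliance obligations overview"
  ]
}
```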

Step 2 — Fill (flow_depository_fill)

Takes the topic and the JSON structure from step 1 and writes the full depository content. Target length: 3000+ words. Each section is populated with definitions, evidence, case studies, resources, and cross-references.

Inputs: topic, optional industry, depository_structure (JSON from step 1), files (optional research content).

When industry context is provided, it should shape the final depository as well, not only the section plan: step 2 adapts terminology, examples, practical references, and constraints to that domain.

files is a text variable containing research material converted to markdown — e.g., a web search result or a RAG-retrieved document. It is passed directly in the user prompt, and step 2 integrates it into the relevant sections.

Output: Long-form Markdown document.


Section taxonomy

The following sections are available. Step 1 selects only those relevant to the topic.

Section                                | When to include
---------------------------------------|------------------------------------------------
Foundational Concepts and Definitions  | Always
Historical Context                     | Always
Theoretical Frameworks and Models      | When established models exist
Legal and Regulatory Frameworks        | Legal/compliance topics only
Risk Areas and Challenges              | When risks are central to the topic
Best Practices and Strategies          | When actionable guidance is the goal
Practical Case Studies and Examples    | When real-world illustration adds value
Recommended Training Approaches        | Always
Key Resources and References           | Always
Leading Experts                        | When the field has recognized thought leaders
Implementation Tools and Templates     | When the topic is operational/practical
Relevant Terminology                   | When the topic has dense or specialized vocabulary
Cultural and Contextual Considerations | Cross-cultural or context-sensitive topics
Evidence/Research                      | When empirical backing is available
Synthesis                              | Always — closes the depository with integrated insights
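
A small validation helper (hypothetical, not part of the current workflow) could sanity-check a step 1 plan against this taxonomy, flagging unknown sections and missing "Always" sections:

```python
# Sections the taxonomy marks as "Always"; everything else is conditional.
ALWAYS_SECTIONS = {
    "Foundational Concepts and Definitions",
    "Historical Context",
    "Recommended Training Approaches",
    "Key Resources and References",
    "Synthesis",
}

CONDITIONAL_SECTIONS = {
    "Theoretical Frameworks and Models",
    "Legal and Regulatory Frameworks",
    "Risk Areas and Challenges",
    "Best Practices and Strategies",
    "Practical Case Studies and Examples",
    "Leading Experts",
    "Implementation Tools and Templates",
    "Relevant Terminology",
    "Cultural and Contextual Considerations",
    "Evidence/Research",
}


def validate_structure(sections: list[str]) -> list[str]:
    """Return a list of problems: unknown sections or missing 'Always' ones."""
    known = ALWAYS_SECTIONS | CONDITIONAL_SECTIONS
    problems = [f"unknown section: {s}" for s in sections if s not in known]
    problems += [f"missing required section: {s}"
                 for s in ALWAYS_SECTIONS if s not in sections]
    return problems
```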

RAG / Web Search logic

⚠️ Not yet implemented. This logic is defined in the system prompt but is not active in the current Dify workflow. Treat as nice-to-have for a future iteration.

The plan step classifies topics into four categories to decide whether external research is needed:

Category            | Examples                                                    | Research
--------------------|-------------------------------------------------------------|----------------------------------------
Universal / general | Work-life balance, time management, emotional intelligence  | None — use LLM knowledge
Legal / regulatory  | GDPR, AGG, cybersecurity compliance                         | RAG — fetch primary sources
Evolving / tech     | AI & Automation, prompting, LLMs                            | Web search — recent trends and reports
Interdisciplinary   | Personality types, leadership styles                        | None — use LLM knowledge

When research is needed, step 1 generates search queries. These are currently executed manually and the results passed to step 2 via the files variable. Full automation (RAG pipeline or web search integration) is a future milestone.
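
The category → research mapping from the table can be sketched as follows; the category itself would be produced by the LLM in step 1, and this only encodes the decision table:

```python
from enum import Enum


class TopicCategory(Enum):
    UNIVERSAL = "universal_general"
    LEGAL = "legal_regulatory"
    EVOLVING = "evolving_tech"
    INTERDISCIPLINARY = "interdisciplinary"


def research_flags(category: TopicCategory) -> dict:
    """Map a topic category to research flags, mirroring the table above."""
    if category == TopicCategory.LEGAL:
        return {"rag_needed": True, "web_search_need": False}
    if category == TopicCategory.EVOLVING:
        return {"rag_needed": False, "web_search_need": True}
    # Universal and interdisciplinary topics rely on LLM knowledge alone.
    return {"rag_needed": False, "web_search_need": False}
```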


Future: Depository trimming

⚠️ Not yet implemented. Placeholder prompt exists at prompts/flow_depository_trim.yaml.

The full depository is broad by design — it covers the topic exhaustively regardless of the specific angle the course will take. But once the course concept is defined (scope, audience, learning objectives), most of that knowledge becomes irrelevant context that inflates token usage in every downstream step.

The idea: after the course concept is defined, run a trimming step that strips the depository down to only the sections and content that are actually relevant to this specific course. The trimmed depository then replaces the full one for all subsequent content generation.

Why this matters:

  • Reduces context size passed to section content, quiz, and artifact prompts
  • Improves output focus and relevance
  • Lowers token costs across the entire pipeline

Planned inputs:

  • course_concept — the defined course concept (scope, audience, objectives, angle)
  • depository — the full depository Markdown to be trimmed

Planned output: Trimmed Markdown — same structure, reduced scope.