What is a Multilayered AI Architecture?
And how can it supercharge the power of your app?
- If a single prompt takes an input, runs it through a series of filters and rules, fundamentally transforms the data, and outputs a result, then multiple coordinated prompts compound those transformations substantially.
- The separation of concerns between AI layers not only makes these systems more manageable, but also keeps the AI from being confused by a million tasks at once, bolstering performance for the most important functionalities.
- This approach is for systems that are aiming for near-perfect performance, or just more consistent results.
An Example from Emstrata
The Emstrata Cycle
The Emstrata Cycle is a standardized series of prompts that run on every turn in an Emstrata simulation.
This cycle retains a comprehensive memory of all entities in the simulation, plans/positions entities on an interactive coordinate plane, writes prose according to exacting instruction, captures secrets and memories, and corrects all continuity errors after the narrative is written.
No single prompt or backend wizardry would be able to accomplish this by itself.
These are the layers (simplified for the example):
- Groundskeeper (system memory)
- Discovery (planning/consequence handling)
- Narration (writing the narrative)
- Chron-Con (correcting any minor errors)
Think Architecturally
Strategize on the best ways to achieve great results for your platform
- Consider your actual goal and then break it down into steps. If you were to perform this action yourself, what steps would you need to follow? Write them down. That's your workflow.
- After formalizing your workflow, identify the data transformations you would need throughout that process, build prompts to automate each one, then chain them together.
- Illustrative example: If your platform depends on conversation history for context, your token count and performance can take a hit; a conversation consolidator prompt might benefit you. And if you want a truly random number to be used in determining something in your system, have the backend serve it up to your AI rather than assuming that the LLM's training data can produce anything close to pure randomness.
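The chaining idea above can be sketched as a simple pipeline. The `call_llm` helper here is a hypothetical stand-in for a real model API (it just echoes its prompts so the sketch runs offline); `run_workflow` shows how each layer's output becomes the next layer's input.

```python
def call_llm(system_prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for a real model call; echoes for illustration."""
    return f"[{system_prompt}] {user_input}"

def run_workflow(user_input: str, layers: list[str]) -> str:
    """Feed each layer's output into the next, compounding transformations."""
    data = user_input
    for system_prompt in layers:
        data = call_llm(system_prompt, data)
    return data

result = run_workflow("attack the goblin", ["Consolidate", "Reason", "Write"])
```

The layer names here are placeholders; the point is only that the transformations accumulate in order.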
Correction Layers
The referee of your platform
- Correction layers catch errors after other layers have done their work. They're your quality control layers. They spot continuity breaks, logical inconsistencies, or constraint violations that slipped through.
- In Emstrata: The Chron-Con layer runs after the narrative is written. It checks for things like: Did a character who was in the tavern suddenly appear in the forest without traveling? Did someone use an item they don't have? Are the spatial coordinates consistent with the described action?
- When you need one: when your platform has complex requirements and expectations it needs to meet. Correcting output before revealing the final answer lowers the chance of bad responses.
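As a minimal sketch of one such check (not Emstrata's actual schema; the `uses_item` and inventory fields are illustrative), a correction layer can validate structured output from earlier layers before anything is shown to the user:

```python
def find_violations(action: dict, inventory: set[str]) -> list[str]:
    """Flag constraint violations in a proposed action, e.g. items not owned."""
    violations = []
    item = action.get("uses_item")
    if item and item not in inventory:
        violations.append(f"character used '{item}' they don't have")
    return violations

errors = find_violations({"uses_item": "silver key"}, {"rope", "torch"})
ok = find_violations({"uses_item": "torch"}, {"rope", "torch"})
```

A real correction layer would typically be a prompt, not hard-coded rules, but deterministic checks like this are worth keeping in the backend where possible.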
Reasoning/Strategy Layers
The decision-maker of your platform
- Reasoning layers make decisions before content gets generated. They evaluate the current state, consider available options, assess consequences, and choose a direction. Think of them as the 'planning brain' of your system.
- In Emstrata: Discovery handles this - it looks at what the participant wants to do, considers the simulation state, evaluates what outcomes make narrative sense, and determines how the action should resolve. It's not writing the story yet; it's deciding what should happen.
- When you need one: If you find yourself asking an LLM to both 'figure out what should happen AND write it beautifully,' you're overloading a single prompt. Split it. Reason first, write second.
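The reason-first, write-second split might look like the sketch below. `call_llm` is again a hypothetical stub; in a real system each call would hit your model with its own dedicated system prompt.

```python
def call_llm(system_prompt: str, payload: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"{system_prompt}: {payload}"

def resolve_turn(user_action: str) -> str:
    decision = call_llm("Decide the outcome", user_action)  # reasoning layer
    prose = call_llm("Write the scene", decision)           # content layer
    return prose

out = resolve_turn("open the door")
```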
Memory Consolidation Layers
The stenographer of your platform
- Memory consolidation layers distill what just happened into something retrievable later. They extract the important details from verbose content and store them in a format your system can efficiently query or format into future inputs.
- In Emstrata: Groundskeeper serves this function. After Discovery determines what happens and Narration writes it, Groundskeeper updates the comprehensive memory of all entities and the emergent narrative. It's maintaining the source of truth about the simulation state.
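A consolidation step, reduced to its essence: extract only the fields your system will query later instead of storing full prose. The `entity -> location` line convention below is a toy assumption; a real system would use a dedicated extraction prompt.

```python
def consolidate(narrative: str, memory: dict) -> dict:
    """Fold a turn's events into compact per-entity memory."""
    updated = dict(memory)
    for line in narrative.splitlines():
        if "->" in line:  # toy convention: "entity -> new location"
            entity, location = [part.strip() for part in line.split("->", 1)]
            updated[entity] = {"location": location}
    return updated

memory = consolidate(
    "Mira -> forest\nThe wind howled.",
    {"Bram": {"location": "tavern"}},
)
```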
Content Layers
The performer of your platform
- Content layers generate the actual output users experience - the prose, dialogue, descriptions, or interface text. These layers take decisions from reasoning layers and context from memory layers, then craft the experience.
- Emstrata's Narration layer does this. It receives Discovery's decisions about what happened, checks Groundskeeper's simulation state, and writes the actual narrative text that players read. It's optimizing for atmosphere, pacing, and emotional resonance - not logic or consistency (that's handled elsewhere).
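One way to picture the content layer's inputs (the field names below are illustrative, not Emstrata's schema): its prompt is assembled from upstream decisions and state, never from raw user input alone.

```python
def build_narration_prompt(decision: dict, state: dict) -> str:
    """Assemble a content-layer prompt from reasoning output and memory state."""
    return (
        f"Outcome to narrate: {decision['outcome']}\n"
        f"Scene: {state['location']}; present: {', '.join(state['present'])}\n"
        "Write vivid prose. Do not change the outcome."
    )

prompt = build_narration_prompt(
    {"outcome": "the lock gives way"},
    {"location": "cellar", "present": ["Mira", "Bram"]},
)
```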
Catch-All/Connector Layers
The clean-up crew of your platform
- Not every layer fits a clean category. Catch-all layers are hybrids that do complementary work for multiple other layers. They handle tasks that don't belong to any single specialized layer but are essential for the system to function cohesively.
- These layers often emerge when you discover gaps: two layers need to work together but speak different 'languages,' or several layers all need the same preprocessing that none of them should be responsible for individually.
- In Emstrata: The Chron-Con does more than just error correction. It also tracks secrets and memories from the narrative, explicitly tagging them for Groundskeeper to integrate into system memory. You don't want Narration burdened with the unrelated task of extracting and categorizing secrets while it's trying to write high-quality prose. And Groundskeeper needs these pieces explicitly labeled as 'secrets' or 'memories' to properly integrate them into the simulation history. The Chron-Con bridges this gap.
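The tagging handoff described above can be sketched like this (the tag format is an assumption, not Emstrata's actual payload): the connector labels each extraction so the downstream memory layer can route it without re-reading the prose.

```python
def tag_extractions(secrets: list[str], memories: list[str]) -> list[dict]:
    """Label extracted items so a memory layer can integrate them by type."""
    tagged = [{"type": "secret", "text": s} for s in secrets]
    tagged += [{"type": "memory", "text": m} for m in memories]
    return tagged

payload = tag_extractions(["the mayor is a spy"], ["Mira saved Bram"])
```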
Cyclical vs. Circumstantial Systems
And everything in-between
Cyclical systems
Cyclical systems run the same prompts every time. Emstrata follows this pattern: every turn runs Discovery, then Narration, then Chron-Con, then Groundskeeper, in that exact order. The flow is predictable and consistent regardless of what happens in the simulation. You always know what's executing next, which makes debugging straightforward and cost estimation more reliable.
Circumstantial systems
Circumstantial systems determine the pathway based on outcomes or AI direction. The route through your architecture changes depending on what happened in previous steps. Maybe an error detection layer decides whether correction is needed. Maybe a routing layer examines user intent and sends the request down completely different processing paths. The system adapts its own execution flow based on runtime conditions.
Hybrid systems
Hybrid systems are mostly cyclical at their base, but circumstantial at times when specific conditions warrant different handling. You might always run your core cycle, but branch to specialized subsystems when certain triggers fire. Many real-world systems end up here. It's a reliable backbone with conditional branches for edge cases. Emstrata has a number of circumstantial offshoots as well.
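A hybrid flow in miniature: the core cycle always runs, and a specialized subsystem is inserted only when a trigger fires. The layer names and the `needs_repair` trigger are hypothetical stubs.

```python
def run_turn(user_input: str, needs_repair) -> list[str]:
    """Return the sequence of layers executed for this turn."""
    executed = ["discovery", "narration"]  # cyclical backbone
    if needs_repair(user_input):           # circumstantial branch
        executed.append("repair")
    executed.append("groundskeeper")
    return executed

plain = run_turn("look around", lambda s: False)
branched = run_turn("contradictory action", lambda s: True)
```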
Agnostic Backend Interaction
What happens between AI layers
- Data Persistence and Utility: Between AI layers, persist transformed data to the backend for future retrieval, debugging, rerunning after an error, and so on.
- Data Reusability and Presentation: Saving data also allows you to present that data in interesting ways later or feed that data into other layers in the future.
- Unbiased Decision-Making: Also, when you need an unbiased judge, the backend is the place to go. The backend is 'agnostic' to outcome, whereas the AI may or may not have a strong preference and display it.
- Emstrata Example (Weighted Randomness): In Emstrata, consequences are rolled using weighted randomness. The Discovery layer determines the likelihood that something happens, and then the backend returns a random number out of 1000. If that number falls within the set likelihood range, the backend serves the confirmed consequence to the Narration layer; if it falls outside of the range, it sends the failure outcome.
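The weighted-randomness handoff above reduces to a few lines of backend code; function names here are illustrative. The key point is that the roll happens outside the AI, which stays agnostic to the outcome.

```python
import random

def resolve_consequence(likelihood_out_of_1000: int, rng: random.Random) -> str:
    """Backend-side roll: succeed if the roll lands within the likelihood range."""
    roll = rng.randint(1, 1000)
    return "success" if roll <= likelihood_out_of_1000 else "failure"

rng = random.Random(42)  # seeded only so the sketch is reproducible
outcome = resolve_consequence(700, rng)
```

The string result stands in for the confirmed-consequence or failure-outcome payload the backend would hand to the next layer.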
Randomness Injection
A jolt of creativity
- If you grow tired of tropes and clichés in your responses, I have an answer: Random Concept Injection.
- This is something I do for parts of the system that do creative heavy lifting. Oftentimes, AIs reach for tried-and-true answers to creative questions, which is great for reasoning well, but not so much for surprising an audience.
- I use this to get names for characters that aren't baked into the training data, inject interesting concepts into simulations, and build out characters based on character archetypes.
- It can be used for any list of random strings you'd like to be potentially incorporated into a particular decision-making process.
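Minimal random-concept injection might look like the sketch below: sample from a backend-held list and splice the picks into the creative prompt. The concept list and prompt wording are illustrative.

```python
import random

def inject_concepts(base_prompt: str, concepts: list[str],
                    k: int, rng: random.Random) -> str:
    """Append k randomly sampled concepts to steer a creative prompt."""
    picks = rng.sample(concepts, k)
    return f"{base_prompt}\nWeave in these concepts: {', '.join(picks)}"

rng = random.Random(7)
concepts = ["tidal locking", "beekeeping", "forgery", "glassblowing"]
prompt = inject_concepts("Invent a tavern patron.", concepts, 2, rng)
```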
Cost Considerations
Usage costs will likely increase
- Multilayered architectures cost more than single-prompt systems. Each layer is an API call, and those add up. If you're running a four-layer cycle on every user interaction, you're potentially paying 4x what a single prompt would cost (depending on usage costs and tokens). That's the trade you're making for better results.
- But they work better when properly configured. The question isn't 'should I add more layers to save money'; it's 'does the quality improvement justify the cost for my use case?' A customer service bot might not need four layers. A narrative engine generating premium content probably does.
- Optimization strategies exist. Use cheaper models for simpler layers (correction doesn't need the most expensive model), cache aggressively for cyclical systems, and be honest about cutting layers that aren't pulling their weight. Every layer should earn its spot.
Performance Considerations
Speed vs quality
- More layers means more latency. If you need fast responses, waiting on three consecutive prompts to complete is probably a bad solution.
- But parallelization can help. Some layers don't depend on each other and can run simultaneously. If your reasoning layer and your memory retrieval layer both only need the user input, run them in parallel.
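The parallelization point can be sketched with `asyncio`, assuming both layers only need the user input. The sleeps stand in for model latency, and the layer functions are hypothetical stubs.

```python
import asyncio

async def reasoning_layer(user_input: str) -> str:
    await asyncio.sleep(0.01)  # stands in for model latency
    return f"plan for: {user_input}"

async def memory_layer(user_input: str) -> str:
    await asyncio.sleep(0.01)  # stands in for retrieval latency
    return f"context for: {user_input}"

async def run_parallel(user_input: str) -> list[str]:
    # Both layers depend only on user_input, so they can run concurrently.
    return await asyncio.gather(reasoning_layer(user_input),
                                memory_layer(user_input))

plan, context = asyncio.run(run_parallel("open the vault"))
```

With real model calls, the wall-clock win is roughly the slower of the two calls instead of their sum.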
- Performance can be helped and hurt by layering. Adding a layer isn't always the answer. Sometimes consolidating two weak layers into one strong prompt improves both speed and quality.
Hallucination Considerations
Avoiding architectures that tend to compound hallucinations
- In multilayered systems, hallucinations compound. One layer's mistake becomes the next layer's input. If your reasoning layer hallucinates a fact and your content layer writes it beautifully, you've just produced confidently wrong output. The more layers, the more opportunities for errors to slip through and get amplified.
- Correction layers should come before memory consolidation. If you don't catch errors before they enter your system's permanent memory, those minor mistakes slip into history and slowly expand. They reintroduce themselves ad infinitum, compounding with each cycle until your system's "source of truth" is corrupted.
Major Takeaways
What to remember
- Multilayered architectures compound transformations. Each layer takes input, transforms it, and passes it forward. The power comes from coordinating these transformations to achieve results no single prompt could accomplish.
- Layer types provide a vocabulary for building. Correction, reasoning, memory consolidation, content, and catch-all layers each serve distinct purposes. Understanding these patterns helps you architect intentionally rather than intuitively.
- Cyclical systems run the same flow every time. Circumstantial systems adapt their pathway based on outcomes. Most production systems end up somewhere in-between.
- Backend integration handles what LLMs can't. True randomness, deterministic calculations, unbiased judgment, and data persistence belong outside the AI layers.
System Prompt Generator Tool
Reminding you that this exists and is a great way to get started
- Available now on https://nicholasmgoldstein.com/system-prompt-generator
- Prebuilt modular system prompt skeleton that can give you a basis to build upon
- Feel free to copy/paste this into Notion, Google Docs, Microsoft Word, or whatever you plan to use and add your own modules/rulesets and logic