Here’s how we propose to structure the platform:
1. Custom Data Understanding Layer
This is the foundation. It captures the structure, semantics, and business logic of your data estate.
Key components:
- Schema metadata registry: All tables, columns, relationships, data types, update frequencies.
- Business ontology & semantic model: Define entities (e.g., Customer, Product, Region), dimensions (Time, Geography), metrics (Revenue, Active Users) and their business definitions.
- Synonyms & domain language mapping: Map how users say things (“my area”, “territory”, “geo”) to semantic entities.
- Business-case catalog: Pre-defined use-cases (e.g., “why did churn increase?”, “top accounts by growth”) with context around metrics, time-frames, dimensions.
- Data quality and access metadata: Flags for freshness, reliability, and governance (who can access what, what’s masked).
Why this matters: Without this layer your agents will misinterpret user language, use the wrong joins, pick the wrong metrics, and produce results users cannot trust. Tellius emphasises that building a semantic layer is non-optional for enterprise agentic analytics. (Tellius)
Implementation tip: Start small (one business domain) and iterate the ontology. Provide a searchable dictionary for users. Ensure the layer drives autocomplete and user interaction.
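To make the registry, synonym mapping, and autocomplete ideas concrete, here is a minimal sketch in Python. The class and method names (SemanticLayer, resolve, suggest) and the sample metric definition are hypothetical illustrations, not part of any specific product; a real layer would persist this metadata and carry far more of it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Metric:
    name: str
    definition: str   # business definition shown to users
    expression: str   # illustrative SQL expression for the metric


@dataclass
class SemanticLayer:
    metrics: dict[str, Metric] = field(default_factory=dict)
    dimensions: set[str] = field(default_factory=set)
    synonyms: dict[str, str] = field(default_factory=dict)  # user phrase -> canonical name

    def add_synonyms(self, canonical: str, *phrases: str) -> None:
        for phrase in phrases:
            self.synonyms[phrase.lower()] = canonical

    def resolve(self, phrase: str) -> Optional[str]:
        """Map a user phrase to a canonical metric or dimension, if known."""
        key = phrase.strip().lower()
        canonical = self.synonyms.get(key, key)
        known = {m.lower(): m for m in self.metrics}
        known.update({d.lower(): d for d in self.dimensions})
        return known.get(canonical.lower())

    def suggest(self, prefix: str) -> list[str]:
        """Autocomplete over canonical names and registered synonyms."""
        p = prefix.lower()
        names = set(self.synonyms)
        names.update(m.lower() for m in self.metrics)
        names.update(d.lower() for d in self.dimensions)
        return sorted(n for n in names if n.startswith(p))


# Assumed sample content for one small business domain.
layer = SemanticLayer()
layer.metrics["Revenue"] = Metric(
    "Revenue", "Recognized revenue net of refunds", "SUM(amount)"
)
layer.dimensions.update({"Region", "Time"})
layer.add_synonyms("Region", "my area", "territory", "geo")
```

With this in place, layer.resolve("territory") maps the user's wording to the Region dimension, and layer.suggest can back a search box or autocomplete. Returning None for unknown phrases gives the caller a clear signal to ask the user for clarification rather than guess.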
2. Learning & Retrieval Layer
Once you have the semantic base, you need the system to learn from interactions and retrieve context-aware content.
Key components:
- User intent parser: Parses query types (descriptive, diagnostic, predictive, prescriptive) and extracts entities/timeframes/filters. Inspired by Tellius’s “planner” step. (Tellius)
- Contextual memory & session history: Maintain conversational context so follow-up questions (“What about Q4?”) are understood.
- Learning module / feedback loop: Tracks which responses users accepted, corrected, or ignored; refines synonyms, mappings, plan defaults.
- Retrieval module: Maps parsed intent + semantic layer → candidate data sources, business views, metrics; includes ranking/selection of best view.
- Plan generator: Creates a typed execution plan (AST) that orchestrates data retrieval, analytics, transforms, joins, filters — before any SQL or execution.
- Validator: Ensures the plan aligns with business rules, semantic definitions, governance policies, and rejects unsafe queries.
Why this matters: This layer is what lets the system learn and adapt, select the right business view, and hand the analytics engine only the relevant data. It bridges user intent and data execution.
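The intent parser, typed plan, session context, and validator can be sketched together as follows. This is an illustrative outline only: the keyword-based classify function stands in for whatever parser you actually use, and the allow-lists, Plan fields, and follow_up helper are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from enum import Enum


class QueryType(Enum):
    DESCRIPTIVE = "descriptive"
    DIAGNOSTIC = "diagnostic"
    PREDICTIVE = "predictive"
    PRESCRIPTIVE = "prescriptive"


@dataclass(frozen=True)
class Filter:
    column: str
    op: str
    value: str


@dataclass(frozen=True)
class Plan:
    """Typed plan built before any SQL is generated."""
    query_type: QueryType
    metric: str
    group_by: tuple[str, ...] = ()
    filters: tuple[Filter, ...] = ()


# Hypothetical catalog; in practice this comes from the semantic layer.
ALLOWED_METRICS = {"Revenue", "Active Users"}
ALLOWED_DIMENSIONS = {"Region", "Time", "Product"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def classify(question: str) -> QueryType:
    """Naive keyword classifier, standing in for a real intent parser."""
    q = question.lower()
    if q.startswith("why"):
        return QueryType.DIAGNOSTIC
    if "forecast" in q or "predict" in q:
        return QueryType.PREDICTIVE
    if "should" in q or "recommend" in q:
        return QueryType.PRESCRIPTIVE
    return QueryType.DESCRIPTIVE


def validate(plan: Plan) -> list[str]:
    """Return a list of rule violations; an empty list means the plan may run."""
    errors = []
    if plan.metric not in ALLOWED_METRICS:
        errors.append(f"unknown metric: {plan.metric}")
    for dim in plan.group_by:
        if dim not in ALLOWED_DIMENSIONS:
            errors.append(f"unknown dimension: {dim}")
    for f in plan.filters:
        if f.column not in ALLOWED_DIMENSIONS:
            errors.append(f"unknown filter column: {f.column}")
        if f.op not in ALLOWED_OPS:
            errors.append(f"unsupported operator: {f.op}")
    return errors


def follow_up(previous: Plan, override: Filter) -> Plan:
    """Reuse the previous plan, replacing any filter on the same column."""
    kept = tuple(f for f in previous.filters if f.column != override.column)
    return replace(previous, filters=kept + (override,))


q3 = Plan(QueryType.DESCRIPTIVE, "Revenue", ("Region",), (Filter("Time", "=", "Q3"),))
q4 = follow_up(q3, Filter("Time", "=", "Q4"))  # "What about Q4?"
```

Keeping the plan as immutable typed data means the validator can inspect it before anything touches the warehouse, and a follow-up question becomes a small, auditable change to the previous plan rather than a fresh parse.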
3. Secured Retrieval & Execution Stage
At this stage the system runs the plan securely and efficiently, and produces the answer and narrative, with transparency and governance built in.
Key components:
- Access control & policy enforcement: Role-based access, column/row masking, audit logs, query budget limits.
- Query compilation & optimization: Translate plan into optimized SQL (or appropriate dialect) with partition pruning, caching, reuse of results.
- Execution engine: This could be your warehouse (Snowflake, BigQuery, etc.) or a specialized analytics engine. It should deliver deterministic, mathematically correct results. Tellius emphasises: “LLMs are not sufficient for deterministic analytics.” (Tellius)
- Narrative & explanation generator: Generate human-friendly explanation of results, include definitions, assumptions, lineage, and potential caveats.
- Monitoring, observability & audit trail: Track latency, bytes scanned, cache hits, failures, versioning of semantic models, drift detection.
- Fallback and retry logic: Handle missing data, ambiguous queries, system failures gracefully (ask user clarifications, suggest alternatives).
Why this matters: Execution is where the rubber meets the road. You must guarantee performance, correctness, security, and traceability — otherwise users will not trust the system.
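As a rough sketch of policy enforcement plus query compilation, the example below compiles a request into parameterized SQL, applies a role's column restrictions and row filter, records an audit entry, and runs the result against an in-memory SQLite database. SQLite stands in for the warehouse here, and the Policy shape, the sales table, and the compile_query function are all hypothetical names chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    role: str
    masked_columns: frozenset[str]              # columns this role may not see
    row_filter: Optional[tuple[str, str]] = None  # (column, required value)


# Hypothetical allow-lists; identifiers only ever come from these.
METRICS = {"revenue": "SUM(amount)"}
DIMENSIONS = {"region", "product"}

audit_log: list[dict] = []


def compile_query(metric: str, group_by: str, policy: Policy) -> tuple[str, list[str]]:
    """Compile a request into parameterized SQL after policy checks."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric}")
    if group_by not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {group_by}")
    if group_by in policy.masked_columns:
        raise PermissionError(f"{policy.role} may not access {group_by}")

    sql = f"SELECT {group_by}, {METRICS[metric]} AS {metric} FROM sales"
    params: list[str] = []
    if policy.row_filter is not None:
        column, value = policy.row_filter
        sql += f" WHERE {column} = ?"   # value is bound, never interpolated
        params.append(value)
    sql += f" GROUP BY {group_by} ORDER BY {group_by}"

    audit_log.append({"role": policy.role, "sql": sql, "params": list(params)})
    return sql, params


# Stand-in warehouse with assumed sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EMEA", "A", 100.0), ("EMEA", "B", 50.0), ("APAC", "A", 70.0)],
)

emea_analyst = Policy("emea_analyst", frozenset({"product"}), ("region", "EMEA"))
sql, params = compile_query("revenue", "region", emea_analyst)
rows = conn.execute(sql, params).fetchall()
```

The design choice worth noting is that the numbers come from the database engine, not from a language model: the model's job ends at producing a validated plan, and the compiler only emits identifiers from allow-lists and binds user-supplied values as parameters.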