Ontology in the Loop - AI-Assisted Knowledge Graph Evolution

Abstract

Knowledge graphs require continuous ontology maintenance as new data reveals concepts not captured in the original schema. Manual evolution is labor-intensive; fully automated approaches risk inconsistencies. This paper presents a Human-in-the-Loop framework for AI-assisted ontology evolution, where large language models detect schema gaps and generate structured proposals for human review. The framework employs domain-based schema slicing to manage LLM context efficiently and embedding-based validation to prevent redundant proposals. We describe the complete workflow from gap detection through human approval to batch application, balancing automation efficiency with human oversight for quality control.

I. Introduction

1.1 Motivation

Knowledge graphs have become essential for representing structured information in finance, healthcare, and e-commerce. A knowledge graph consists of entities (nodes representing real-world objects), relationships (edges connecting entities), and an ontology (schema defining valid types).

The ontology determines what can be represented. If a concept exists in the real world but not in the ontology, it cannot be captured in the knowledge graph. This creates a fundamental limitation: as domains evolve and new data arrives, ontologies become outdated.

Core Problem: When new data doesn't fit the current ontology, organizations must either discard valuable information, force-fit it incorrectly, or manually update the schema—a slow and expensive process.

1.2 A Concrete Example

Consider a financial knowledge graph with this simple ontology:

Entity types: COMPANY, COUNTRY, REGION
Relationship: MEMBER_OF (Country → Region)

When processing a document stating:

"Saudi Arabia is a founding member of OPEC since 1960"

The system encounters a problem:

"OPEC" is not a COMPANY, COUNTRY, or REGION—it's an international organization
The membership relationship between a Country and an Organization doesn't exist

Without schema evolution, this information is lost. The ontology evolution problem is: how can we systematically identify such gaps and update the schema while maintaining data quality?

1.3 Existing Approaches

Current approaches have significant limitations:

Manual Evolution: Domain experts review data and update ontologies by hand. This ensures quality but doesn't scale—experts cannot review every document, and schema updates become bottlenecks.

Fully Automated Evolution: Machine learning systems modify ontologies without human oversight. This scales well but risks creating redundant types, semantic inconsistencies, and errors that propagate undetected.

1.4 Our Approach

We propose a middle path: AI-assisted evolution with human-in-the-loop approval. The key insight is to separate proposal generation (which AI can do at scale) from decision-making (which requires human judgment).

AI-assisted ontology evolution workflow — Fig. 1. Core workflow: AI generates proposals, humans review, system applies approved changes.

This combines AI's ability to process documents at scale with human expertise for quality control. The hypothesis is simple:

AI + Human Review > Pure Manual or Pure Automation

1.5 Technical Challenges

Implementing this approach requires solving several problems:

Challenge 1: Context Limits. Large language models have bounded context windows. A production ontology with hundreds of types exceeds these limits. We address this through domain-based schema slicing—partitioning the ontology into semantic domains and providing only relevant portions to the LLM.

Challenge 2: Gap vs. Error. When extraction fails, it might indicate a true schema gap, a data error, or simply wrong domain selection. The system must classify failures accurately to avoid generating spurious proposals.

Challenge 3: Proposal Conflicts. Different documents may trigger similar proposals. Without validation, the system might propose "PARTNERS_WITH" and "COOPERATES_WITH" for the same concept. We use embedding-based similarity detection to identify duplicates.

Challenge 4: Review Backlog. Humans aren't always available. Proposals accumulate between review sessions, requiring batch processing while maintaining consistency.

1.6 Contributions

This paper makes four contributions:

Framework Design — A complete architecture for AI-assisted ontology evolution with human oversight
Domain-Based Slicing — A method for managing LLM context through ontology partitioning
Proposal Workflow — A structured process from gap detection through human review to batch application
Embedding Validation — Similarity-based duplicate detection for proposal quality

1.7 Scope

This work presents a conceptual framework with implementation guidance. We note that:

This is a framework design, not a production system
Experimental validation is planned for future work
Human review remains essential—this is not fully autonomous

The goal is to establish foundations for AI-assisted knowledge graph maintenance.

1.8 Paper Organization

Section II provides background on knowledge graphs and related work. Section III presents system architecture. Sections IV and V detail gap detection and human workflow. Section VI discusses implementation. Sections VII and VIII cover limitations and conclusions.