Ontology in the Loop

A Framework for AI-Assisted Knowledge Graph Evolution

Niran Pravithana

VII. Limitations & Future Work

This section acknowledges the framework's limitations and outlines directions for future research.

7.1 Current Limitations

Conceptual Framework Only

This paper presents a design framework, not a production-tested system. Key unknowns include:

  • Actual approval rates in practice
  • Real-world gap detection accuracy
  • Human reviewer throughput
  • Long-term ontology quality

Experimental validation with real data is needed.

LLM Dependence

The framework relies heavily on LLM capabilities:

  • Domain detection accuracy — Wrong domain selection leads to false gaps
  • Gap classification — Distinguishing true gaps from errors
  • Proposal quality — Generating semantically appropriate types
  • Consistency — LLM outputs may vary across runs

As LLM capabilities evolve, the framework's effectiveness will change with them.

Human Bottleneck

Human review remains a potential bottleneck. If proposal generation outpaces review capacity, backlogs grow. This limits the framework's ability to handle extremely high-volume scenarios.

CORE Domain Scaling

The always-include CORE domain simplifies slicing but may become problematic if it grows large. The current design assumes CORE remains small (5-10 elements); alternative approaches may be needed for larger core schemas.
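
As a minimal sketch of why this matters, assume a hypothetical slicing step that unions the detected domains with CORE; every element added to CORE then appears in every slice sent to the LLM. The domain names and elements below are illustrative only, not part of the framework's actual schema.

# Sketch: assemble an ontology slice from detected domains plus the
# always-included CORE. All domain contents here are hypothetical.
ONTOLOGY = {
    "CORE":    {"PERSON", "ORGANIZATION", "LOCATED_IN", "PART_OF"},
    "FINANCE": {"ACCOUNT", "TRANSACTION", "OWNS_ACCOUNT"},
    "LEGAL":   {"CONTRACT", "PARTY_TO", "GOVERNED_BY"},
}

def build_slice(detected_domains):
    """Union the detected domains' elements with CORE."""
    elements = set(ONTOLOGY["CORE"])  # CORE is always included
    for domain in detected_domains:
        elements |= ONTOLOGY.get(domain, set())
    return elements

# Every slice pays for CORE's size: a 50-element CORE would bloat every prompt.
print(sorted(build_slice(["FINANCE"])))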

7.2 Future Directions

Learning from Feedback

Rejection feedback could be used to improve proposal generation:

Feedback collected:
  "COOPERATES_WITH rejected as duplicate of PARTNERS_WITH"

Future behavior:
  When similar pattern detected, suggest using PARTNERS_WITH instead of proposing new type

This could improve approval rates over time.
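
A minimal sketch of this loop, assuming a hypothetical in-memory rejection log and a plain string-similarity check; the names rejection_log and suggest_existing_type, and the 0.7 threshold, are illustrative rather than part of the framework.

from difflib import SequenceMatcher

# Hypothetical record of past rejections: proposed type -> approved type it duplicated.
rejection_log = {"COOPERATES_WITH": "PARTNERS_WITH"}

def suggest_existing_type(candidate, threshold=0.7):
    """Before proposing a new relationship type, check whether a similar
    proposal was already rejected as a duplicate and reuse the approved type."""
    for rejected, approved in rejection_log.items():
        score = SequenceMatcher(None, candidate.lower(), rejected.lower()).ratio()
        if score >= threshold:
            return approved  # suggest the existing type instead of a new proposal
    return None  # no precedent; proceed with a fresh proposal

print(suggest_existing_type("CO_OPERATES_WITH"))  # -> PARTNERS_WITH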

Auto-Merging Proposals

Similar pending proposals could be automatically merged:

PROP-042: Add ORGANIZATION
PROP-047: Add INTERNATIONAL_ORG

Similarity: 0.92

System suggests: Merge into single proposal for human review

This reduces review burden for semantically equivalent proposals.
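
One possible screening pass is sketched below; a simple token-overlap score stands in for whatever semantic similarity measure the real system would use, and the proposal records and the 0.5 threshold are illustrative.

from itertools import combinations

# Hypothetical pending proposals: id -> short description.
pending = {
    "PROP-042": "add entity type ORGANIZATION for companies and institutions",
    "PROP-047": "add entity type INTERNATIONAL_ORG for multinational institutions",
    "PROP-051": "add relationship type SIGNED_BY between contract and person",
}

def similarity(a, b):
    """Token-overlap (Jaccard) stand-in for a semantic similarity score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Pairs above the threshold become a single merged item in the review queue.
for (id_a, text_a), (id_b, text_b) in combinations(pending.items(), 2):
    score = similarity(text_a, text_b)
    if score >= 0.5:
        print(f"Suggest merging {id_a} and {id_b} (similarity {score:.2f})")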

Confidence-Based Routing

Different confidence levels could receive different treatment:

  • High confidence (>90%) — Fast-track review queue
  • Medium (70-90%) — Standard review
  • Low (<70%) — Detailed expert review

This optimizes reviewer time allocation.
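
As a sketch, routing could be a simple thresholding step over each proposal's confidence score; the queue names below are illustrative, and the cut-offs mirror the tiers above.

def route_proposal(confidence):
    """Map a proposal's confidence score (0.0-1.0) to a review queue."""
    if confidence > 0.90:
        return "fast-track"   # quick sanity check
    if confidence >= 0.70:
        return "standard"     # normal review queue
    return "expert"           # detailed review by a domain expert

for score in (0.95, 0.82, 0.55):
    print(score, "->", route_proposal(score))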

Ontology Health Metrics

A dashboard could track ontology quality over time, showing:

  • Entity type usage frequency
  • Relationship coverage
  • Gaps detected per domain
  • Proposal approval rates
  • Schema growth trajectory

This would help identify areas needing attention.
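
A minimal sketch of how a few of these metrics could be computed, assuming hypothetical event logs with the field names shown; the record shapes are assumptions for illustration.

from collections import Counter

# Hypothetical logs the dashboard would aggregate.
entity_mentions = ["PERSON", "PERSON", "ORGANIZATION", "CONTRACT", "PERSON"]
proposals = [
    {"domain": "LEGAL", "status": "approved"},
    {"domain": "LEGAL", "status": "rejected"},
    {"domain": "FINANCE", "status": "approved"},
]

usage = Counter(entity_mentions)                           # entity type usage frequency
gaps_per_domain = Counter(p["domain"] for p in proposals)  # gaps detected per domain
approval_rate = sum(p["status"] == "approved" for p in proposals) / len(proposals)

print(usage.most_common(), dict(gaps_per_domain), f"{approval_rate:.0%}")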

Multi-Language Support

The framework could be extended to handle documents in multiple languages, with appropriate entity resolution across languages.

Restrictive Evolution

The current framework focuses on additive changes. Future work could address:

  • Deprecating unused types
  • Merging redundant types
  • Tightening constraints

These operations require more careful consistency checking than additive changes, since existing graph data may already reference the affected types.
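
For example, deprecation could be gated on a usage check so that no existing data is orphaned; the sketch below assumes a caller-supplied instance-count lookup (count_instances is hypothetical, standing in for a graph query).

def can_deprecate(type_name, count_instances):
    """Allow deprecation only when no graph data still uses the type.
    count_instances is a caller-supplied lookup, e.g. a graph query."""
    remaining = count_instances(type_name)
    if remaining:
        return False, f"{type_name} still has {remaining} instances; migrate or merge first"
    return True, f"{type_name} is unused and safe to deprecate"

# Illustrative lookup standing in for a real graph query.
instance_counts = {"COOPERATES_WITH": 0, "PARTNERS_WITH": 1204}
print(can_deprecate("COOPERATES_WITH", lambda t: instance_counts.get(t, 0)))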

7.3 Research Questions

Open questions for future investigation:

  1. What approval rate indicates good proposal quality?
  2. How does domain granularity affect gap detection?
  3. Can rejection feedback measurably improve proposals?
  4. What is the optimal human review batch size?
  5. How do different LLMs compare for this task?