VII. Limitations & Future Work
This section acknowledges the framework's limitations and outlines directions for future research.
7.1 Current Limitations
Conceptual Framework Only
This paper presents a design framework, not a production-tested system. Key unknowns include:
- Actual approval rates in practice
- Real-world gap detection accuracy
- Human reviewer throughput
- Long-term ontology quality
Experimental validation with real data is needed.
LLM Dependence
The framework relies heavily on LLM capabilities:
- Domain detection accuracy — Wrong domain selection leads to false gaps
- Gap classification — Distinguishing true gaps from errors
- Proposal quality — Generating semantically appropriate types
- Consistency — LLM outputs may vary across runs
As LLM capabilities evolve, framework effectiveness will change.
Human Bottleneck
Human review remains a potential bottleneck. If proposal generation outpaces review capacity, backlogs grow. This limits the framework's ability to handle extremely high-volume scenarios.
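The backlog dynamic is easy to sketch. The weekly figures below are hypothetical, chosen only to illustrate how a persistent gap between generation and review capacity compounds:

```python
def backlog_after(weeks: int, generated_per_week: int, review_capacity: int,
                  initial_backlog: int = 0) -> int:
    """Project the pending-proposal backlog when generation and
    review capacity are roughly constant week to week."""
    backlog = initial_backlog
    for _ in range(weeks):
        # Backlog cannot go negative: reviewers cannot pre-review proposals.
        backlog = max(0, backlog + generated_per_week - review_capacity)
    return backlog

# If 60 proposals arrive weekly but reviewers clear only 40,
# the backlog grows by 20 per week:
print(backlog_after(weeks=4, generated_per_week=60, review_capacity=40))  # 80
```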
CORE Domain Scaling
The always-include CORE domain simplifies slicing but may become problematic if it grows large. The current design assumes CORE remains small (5-10 elements); alternative approaches may be needed for larger core schemas.
7.2 Future Directions
Learning from Feedback
Rejection feedback could be used to improve future proposal generation:
Feedback collected: "COOPERATES_WITH rejected as duplicate of PARTNERS_WITH"
Future behavior: when a similar pattern is detected, suggest reusing PARTNERS_WITH instead of proposing a new type.
This could improve approval rates over time.
Auto-Merging Proposals
Similar pending proposals could be automatically merged:
PROP-042: Add ORGANIZATION
PROP-047: Add INTERNATIONAL_ORG
Similarity: 0.92
System suggests: merge into a single proposal for human review
This reduces review burden for semantically equivalent proposals.
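A sketch of the merge-candidate detection, using string similarity as a stand-in for whatever metric the system actually uses (the example names and IDs are hypothetical; a real system would likely compare embeddings of full proposal text):

```python
from difflib import SequenceMatcher
from itertools import combinations

def name_similarity(a: str, b: str) -> float:
    # Placeholder metric; swap in embedding similarity for production use.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def merge_candidates(proposals: dict[str, str], threshold: float = 0.8):
    """Return pairs of pending proposals similar enough to be merged
    into a single item for human review."""
    return [(id1, id2)
            for (id1, n1), (id2, n2) in combinations(proposals.items(), 2)
            if name_similarity(n1, n2) >= threshold]

pending = {
    "PROP-042": "ORGANIZATION",
    "PROP-047": "ORGANISATION",  # hypothetical near-duplicate spelling
    "PROP-051": "PERSON",
}
print(merge_candidates(pending))  # [('PROP-042', 'PROP-047')]
```

Note that the merge is only suggested; the combined proposal still goes to a human, preserving the review guarantee while halving the queue entries.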
Confidence-Based Routing
Different confidence levels could receive different treatment:
- High confidence (>90%) — Fast-track review queue
- Medium (70-90%) — Standard review
- Low (<70%) — Detailed expert review
This optimizes reviewer time allocation.
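The routing itself is a simple threshold function over the tiers above (the queue names are illustrative):

```python
def route_proposal(confidence: float) -> str:
    """Map gap-detection confidence to a review queue,
    using the three tiers described above."""
    if confidence > 0.90:
        return "fast_track"      # high confidence: lightweight review
    if confidence >= 0.70:
        return "standard"        # medium confidence: normal queue
    return "expert_review"       # low confidence: detailed scrutiny

print(route_proposal(0.95))  # fast_track
print(route_proposal(0.80))  # standard
print(route_proposal(0.50))  # expert_review
```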
Ontology Health Metrics
A dashboard could track ontology quality over time:
- Entity type usage frequency
- Relationship coverage
- Gaps detected per domain
- Proposal approval rates
- Schema growth trajectory
This would help identify areas needing attention.
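Two of these metrics can be computed directly from a proposal decision log. The log entries below are hypothetical, as are the domain names:

```python
from collections import Counter

# Hypothetical decision log: (domain, decision) per proposal.
decisions = [
    ("LEGAL", "approved"), ("LEGAL", "rejected"),
    ("FINANCE", "approved"), ("FINANCE", "approved"),
]

def health_metrics(log):
    """Compute gaps detected per domain and the overall approval rate."""
    gaps_per_domain = Counter(domain for domain, _ in log)
    approved = sum(1 for _, decision in log if decision == "approved")
    return {
        "gaps_per_domain": dict(gaps_per_domain),
        "approval_rate": approved / len(log),
    }

print(health_metrics(decisions))
# {'gaps_per_domain': {'LEGAL': 2, 'FINANCE': 2}, 'approval_rate': 0.75}
```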
Multi-Language Support
Future work could extend the framework to handle documents in multiple languages, with appropriate entity resolution across languages.
Restrictive Evolution
The current focus is on additive changes. Future work could address:
- Deprecating unused types
- Merging redundant types
- Tightening constraints
These require more careful consistency checking.
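One such consistency check can be sketched: a type should only be deprecated if nothing still depends on it. The data structures and example names here are illustrative assumptions, not part of the framework as specified:

```python
def can_deprecate(type_name: str,
                  usage_counts: dict[str, int],
                  relationship_schema: dict[str, tuple[str, str]]) -> bool:
    """A type is safe to deprecate only if no instances use it and no
    relationship type references it as a source or target endpoint."""
    if usage_counts.get(type_name, 0) > 0:
        return False
    return all(type_name not in endpoints
               for endpoints in relationship_schema.values())

# Hypothetical schema state:
usage = {"ORGANIZATION": 1200, "GUILD": 0}
schema = {"EMPLOYS": ("ORGANIZATION", "PERSON")}

print(can_deprecate("GUILD", usage, schema))         # True
print(can_deprecate("ORGANIZATION", usage, schema))  # False
```

Merging and constraint-tightening would need analogous checks plus a migration path for existing instances, which is why these changes demand more care than additions.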
7.3 Research Questions
Open questions for future investigation:
- What approval rate indicates good proposal quality?
- How does domain granularity affect gap detection?
- Can rejection feedback measurably improve proposals?
- What is the optimal human review batch size?
- How do different LLMs compare for this task?