VII. Limitations & Future Work
This section acknowledges the framework's limitations and outlines directions for future research.
7.1 Current Limitations
Conceptual Framework Only
This paper presents a design framework, not a production-tested system. Key unknowns include:
- Actual approval rates in practice
- Real-world gap detection accuracy
- Human reviewer throughput
- Long-term ontology quality
Experimental validation with real data is needed.
LLM Dependence
The framework relies heavily on LLM capabilities:
- Domain detection accuracy — Wrong domain selection leads to false gaps
- Gap classification — Distinguishing true gaps from errors
- Proposal quality — Generating semantically appropriate types
- Consistency — LLM outputs may vary across runs
As LLM capabilities evolve, framework effectiveness will change.
Human Bottleneck
Human review remains a potential bottleneck. If proposal generation outpaces review capacity, backlogs grow. This limits the framework's ability to handle extremely high-volume scenarios.
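The backlog dynamic is easy to sketch. The weekly figures below are hypothetical, chosen only to illustrate how a persistent gap between generation and review capacity compounds:

```python
def backlog_after(weeks: int, generated_per_week: int, review_capacity: int,
                  initial_backlog: int = 0) -> int:
    """Project the pending-proposal backlog when generation and
    review capacity are roughly constant week to week."""
    backlog = initial_backlog
    for _ in range(weeks):
        # Backlog cannot go negative: reviewers cannot pre-review proposals.
        backlog = max(0, backlog + generated_per_week - review_capacity)
    return backlog

# If 60 proposals arrive weekly but reviewers clear only 40,
# the backlog grows by 20 per week:
print(backlog_after(weeks=4, generated_per_week=60, review_capacity=40))  # 80
```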
CORE Domain Scaling
The always-include CORE domain simplifies slicing but may become problematic if it grows large. The current design assumes CORE remains small (5-10 elements); alternative approaches may be needed for larger core schemas.
7.2 Future Directions
Learning from Feedback
Rejection feedback could be used to improve future proposal generation:
Feedback collected: "COOPERATES_WITH rejected as duplicate of PARTNERS_WITH"
Future behavior: when a similar pattern is detected, suggest reusing PARTNERS_WITH instead of proposing a new type.
This could improve approval rates over time.
Auto-Merging Proposals
Similar pending proposals could be automatically merged:
PROP-042: Add ORGANIZATION
PROP-047: Add INTERNATIONAL_ORG
Similarity: 0.92
System suggests: merge into a single proposal for human review
This reduces review burden for semantically equivalent proposals.
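A sketch of the merge-candidate detection, using string similarity as a stand-in for whatever metric the system actually uses (the example names and IDs are hypothetical; a real system would likely compare embeddings of full proposal text):

```python
from difflib import SequenceMatcher
from itertools import combinations

def name_similarity(a: str, b: str) -> float:
    # Placeholder metric; swap in embedding similarity for production use.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def merge_candidates(proposals: dict[str, str], threshold: float = 0.8):
    """Return pairs of pending proposals similar enough to be merged
    into a single item for human review."""
    return [(id1, id2)
            for (id1, n1), (id2, n2) in combinations(proposals.items(), 2)
            if name_similarity(n1, n2) >= threshold]

pending = {
    "PROP-042": "ORGANIZATION",
    "PROP-047": "ORGANISATION",  # hypothetical near-duplicate spelling
    "PROP-051": "PERSON",
}
print(merge_candidates(pending))  # [('PROP-042', 'PROP-047')]
```

Note that the merge is only suggested; the combined proposal still goes to a human, preserving the review guarantee while halving the queue entries.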
Confidence-Based Routing
Different confidence levels could receive different treatment:
- High confidence (>90%) — Fast-track review queue
- Medium (70-90%) — Standard review
- Low (<70%) — Detailed expert review
This optimizes reviewer time allocation.
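The routing itself is a simple threshold function over the tiers above (the queue names are illustrative):

```python
def route_proposal(confidence: float) -> str:
    """Map gap-detection confidence to a review queue,
    using the three tiers described above."""
    if confidence > 0.90:
        return "fast_track"      # high confidence: lightweight review
    if confidence >= 0.70:
        return "standard"        # medium confidence: normal queue
    return "expert_review"       # low confidence: detailed scrutiny

print(route_proposal(0.95))  # fast_track
print(route_proposal(0.80))  # standard
print(route_proposal(0.50))  # expert_review
```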
Ontology Health Metrics
A dashboard could track ontology quality over time:
- Entity type usage frequency
- Relationship coverage
- Gaps detected per domain
- Proposal approval rates
- Schema growth trajectory
This would help identify areas needing attention.
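Two of these metrics can be computed directly from a proposal decision log. The log entries below are hypothetical, as are the domain names:

```python
from collections import Counter

# Hypothetical decision log: (domain, decision) per proposal.
decisions = [
    ("LEGAL", "approved"), ("LEGAL", "rejected"),
    ("FINANCE", "approved"), ("FINANCE", "approved"),
]

def health_metrics(log):
    """Compute gaps detected per domain and the overall approval rate."""
    gaps_per_domain = Counter(domain for domain, _ in log)
    approved = sum(1 for _, decision in log if decision == "approved")
    return {
        "gaps_per_domain": dict(gaps_per_domain),
        "approval_rate": approved / len(log),
    }

print(health_metrics(decisions))
# {'gaps_per_domain': {'LEGAL': 2, 'FINANCE': 2}, 'approval_rate': 0.75}
```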
Multi-Language Support
Future work could extend the framework to handle documents in multiple languages, with appropriate entity resolution across languages.
Restrictive Evolution
The current focus is on additive changes. Future work could address:
- Deprecating unused types
- Merging redundant types
- Tightening constraints
These require more careful consistency checking.
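One such consistency check can be sketched: a type should only be deprecated if nothing still depends on it. The data structures and example names here are illustrative assumptions, not part of the framework as specified:

```python
def can_deprecate(type_name: str,
                  usage_counts: dict[str, int],
                  relationship_schema: dict[str, tuple[str, str]]) -> bool:
    """A type is safe to deprecate only if no instances use it and no
    relationship type references it as a source or target endpoint."""
    if usage_counts.get(type_name, 0) > 0:
        return False
    return all(type_name not in endpoints
               for endpoints in relationship_schema.values())

# Hypothetical schema state:
usage = {"ORGANIZATION": 1200, "GUILD": 0}
schema = {"EMPLOYS": ("ORGANIZATION", "PERSON")}

print(can_deprecate("GUILD", usage, schema))         # True
print(can_deprecate("ORGANIZATION", usage, schema))  # False
```

Merging and constraint-tightening would need analogous checks plus a migration path for existing instances, which is why these changes demand more care than additions.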
7.3 Research Questions
Open questions for future investigation:
- What approval rate indicates good proposal quality?
- How does domain granularity affect gap detection?
- Can rejection feedback measurably improve proposals?
- What is the optimal human review batch size?
- How do different LLMs compare for this task?