# Flavored Markdown Parser Service
## User-Defined Extended Markdown Syntax

### Core Extended Syntax Elements

The Flavored Markdown Parser supports several categories of extended Markdown syntax for content creation and cross-referencing:

#### 1. Wikilinks/Backlinks

**Basic Syntax**: `[[path/to/file]]`

**With Display Text**: `[[path/to/file|Display Text]]`

**Examples**:

```markdown
[[tooling/Software Development/Frameworks/Next.js]]
[[concepts/Data Augmentation Workflow|Data Workflows]]
[[organizations/Meta|Meta Platforms]]
```

**Features**:

- Automatic file resolution across content collections
- Support for nested directory structures
- Display text override capability
- Automatic URL generation from frontmatter
- Integration with content management systems
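
As a rough illustration of the wikilink syntax, the sketch below extracts links from a string with a regular expression. The `WikilinkMatch` type and the `extractWikilinks` function are hypothetical names used only for this example; falling back to the last path segment when no display text is given is an assumption that mirrors the implementation sketch later in this document.

```typescript
// Hypothetical shape for one extracted wikilink (illustration only).
interface WikilinkMatch {
  filePath: string;
  displayText: string;
}

function extractWikilinks(markdown: string): WikilinkMatch[] {
  // Matches [[path]] and [[path|Display Text]].
  const pattern = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g;
  const matches: WikilinkMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(markdown)) !== null) {
    const filePath = m[1].trim();
    // Without explicit display text, fall back to the last path segment.
    const fallback = filePath.split('/').pop() || filePath;
    matches.push({ filePath, displayText: (m[2] || fallback).trim() });
  }
  return matches;
}

const found = extractWikilinks(
  'See [[organizations/Meta|Meta Platforms]] and [[tooling/Software Development/Frameworks/Next.js]].'
);
console.log(found);
```

A regular expression is enough for flat text like this; inside a real pipeline the same pattern would be applied to text nodes only, so that wikilink-like strings inside code blocks are left alone.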

#### 2. Custom Callouts

**Basic Syntax**:

```markdown
> [!<Class-Definition>] <Heading in Callout>
>
> <content>
```

**Supported Classes**:

- `[!info]` - Information callouts with blue styling
- `[!warning]` - Warning callouts with yellow/orange styling
- `[!error]` - Error callouts with red styling
- `[!success]` - Success callouts with green styling
- `[!note]` - General note callouts with neutral styling
- `[!tip]` - Tip callouts with helpful styling

**Examples**:

```markdown
> [!info] Integration Notice
>
> This component integrates with the shared authentication service.

> [!warning] Breaking Changes
>
> Version 2.0 introduces breaking changes to the API interface.

> [!tip] Performance Optimization
>
> Use the `--no-cache` flag for development builds.
```

#### 3. Content Directives

**Leaf Directive Syntax**: `::directive-name{attribute="value"}`

**Container Directive Syntax**:

```markdown
:::directive-name
content
:::
```

**Supported Directives**:

- `::figma-embed{src="url"}` - Embed Figma objects
- `:::tool-showcase` - Display tool galleries
- `:::slides` - Embed slide presentations
- `::mermaid` - Render Mermaid diagrams
- `::youtube{id="video-id"}` - Embed YouTube videos
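
To make the leaf directive syntax concrete, here is a minimal sketch that parses a single `::name{key="value"}` line. It is a simplified stand-in for illustration: the actual pipeline described later relies on the `remark-directive` plugin, which supports a richer attribute syntax than this. `LeafDirective` and `parseLeafDirective` are hypothetical names.

```typescript
// Hypothetical parsed form of a leaf directive (illustration only).
interface LeafDirective {
  name: string;
  attributes: Record<string, string>;
}

function parseLeafDirective(line: string): LeafDirective | null {
  // Exactly two colons, a name, then an optional {key="value" ...} block.
  const match = /^::([A-Za-z][\w-]*)(?:\{([^}]*)\})?\s*$/.exec(line.trim());
  if (match === null) return null;

  const attributes: Record<string, string> = {};
  const attrPattern = /([\w-]+)="([^"]*)"/g;
  let attr: RegExpExecArray | null;
  while ((attr = attrPattern.exec(match[2] || '')) !== null) {
    attributes[attr[1]] = attr[2];
  }
  return { name: match[1], attributes };
}

console.log(parseLeafDirective('::youtube{id="video-id"}'));
console.log(parseLeafDirective(':::slides')); // container opener, not a leaf directive
```

Lines that open a container directive (three colons) are deliberately rejected here, since their content spans several lines and needs block-level parsing.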

#### 4. Content Collections Integration

**Tag References**: `tag: [[concept/Tag-Name]]`

**Organization Links**: `[[organizations/Company-Name]]`

**Tool References**: `[[tooling/Category/Tool-Name]]`

#### 5. Specialized Code Blocks

**Tool Gallery Syntax**:

````markdown
```toolingGallery small
- tag: [[AI-Toolkit]]
- [[tooling/AI-Toolkit/OpenAI]]
- [[tooling/AI-Toolkit/Anthropic]]
```
````

**Mermaid Diagrams**:

````markdown
```mermaid
graph TD
    A[Start] --> B[Process]
    B --> C[End]
```
````

## 1. Executive Summary
The Flavored Markdown Parser Service is the content processing system for the Augment-It platform's content ecosystem. It extends standard Markdown with bidirectional links (wikilinks), styled callouts, directives for embedding interactive components, and content collection integration. Built on the remark/rehype ecosystem, it provides parsing, content validation, link resolution, and component rendering in support of knowledge management and content creation workflows.
## 2. Background & Motivation
### Problem Statement
The Augment-It platform requires advanced markdown processing capabilities that go beyond standard markdown to support knowledge management, content cross-referencing, and interactive component embedding across thousands of interconnected documents.
### Current Limitations
- **Standard Markdown Constraints**: Basic markdown lacks semantic linking and content organization features
- **Manual Cross-Referencing**: No automated way to link related content across collections
- **Static Content**: Limited ability to embed dynamic or interactive components
- **Inconsistent Styling**: No standardized way to create styled content blocks
- **Content Isolation**: Documents exist in isolation without semantic relationships
### Why This Solution
- **Knowledge Graph Integration**: Enables bidirectional linking and content discovery
- **Component Ecosystem**: Supports rich, interactive content through directives
- **Content Collections**: Seamless integration with organized content taxonomies
- **Extensible Architecture**: Plugin-based system for custom syntax extensions
- **Performance Optimized**: Efficient processing of large content repositories
## 3. Goals & Non-Goals
### Goals
1. **Extended Syntax Support**: Comprehensive parsing of wikilinks, callouts, and directives
2. **Link Resolution**: Automatic resolution and validation of internal content links
3. **Component Integration**: Seamless embedding of interactive components via directives
4. **Content Collections**: Deep integration with taxonomized content organization
5. **Performance**: Efficient processing of large content repositories
6. **Extensibility**: Plugin architecture for custom syntax extensions
7. **Error Handling**: Graceful handling of malformed syntax and missing references
### Non-Goals
1. **WYSIWYG Editing**: Focus on parsing, not visual editing interfaces
2. **Real-time Collaboration**: Batch processing focus, not collaborative editing
3. **Version Control**: Markdown processing only, not content versioning
4. **Content Management**: Parsing service, not full CMS functionality
## 4. Technical Design
### High-Level Architecture
```mermaid
graph TD
    A[Markdown Input] --> B[Flavored Markdown Parser]
    B --> C[Syntax Analyzer]
    C --> D[Wikilink Resolver]
    C --> E[Callout Processor]
    C --> F[Directive Handler]
    D --> G[Content Collections]
    E --> H[Styled Components]
    F --> I[Interactive Components]
    G --> J[Link Validation]
    H --> K[Rendered Output]
    I --> K
    J --> K
    L[Remark Plugins] --> C
    M[Rehype Plugins] --> K
    N[Component Registry] --> F
```
### Core Components
#### 1. Extended Syntax Parser
- **Responsibility**: Parse extended markdown syntax elements
- **Features**:
  - Wikilink pattern recognition and parsing
  - Custom callout block processing
  - Directive syntax analysis
  - Content collection reference resolution
#### 2. Link Resolution Engine
- **Responsibility**: Resolve and validate internal content links
- **Features**:
  - Cross-collection link resolution
  - Automatic URL generation from frontmatter
  - Broken link detection and reporting
  - Display text override handling
#### 3. Directive Processing System
- **Responsibility**: Transform directives into renderable components
- **Features**:
  - Component registry lookup
  - Attribute parsing and validation
  - Authentication handling for external services
  - Error fallback rendering
#### 4. Callout Styling Engine
- **Responsibility**: Process custom callout blocks with styling
- **Features**:
  - Multiple callout types (info, warning, error, etc.)
  - Custom icon and color schemes
  - Nested content support
  - Responsive design integration
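
As one example of what a component in this list has to do, the sketch below shows how the callout engine might recognize a callout from the first line of a blockquote. It assumes the leading `> ` markers have already been stripped by the Markdown parser. `CalloutHeader` and `parseCalloutHeader` are hypothetical names, and rejecting unknown types (rather than rendering them with a default style) is an assumption, not documented behavior.

```typescript
type CalloutType = 'info' | 'warning' | 'error' | 'success' | 'note' | 'tip';

const CALLOUT_TYPES: CalloutType[] = ['info', 'warning', 'error', 'success', 'note', 'tip'];

// Hypothetical parsed header of a callout block (illustration only).
interface CalloutHeader {
  type: CalloutType;
  title: string;
}

function parseCalloutHeader(firstLine: string): CalloutHeader | null {
  // "[!type] Optional title", with the type matched case-insensitively.
  const match = /^\[!([^\]]+)\]\s*(.*)$/.exec(firstLine.trim());
  if (match === null) return null;

  const requested = match[1].toLowerCase();
  for (const known of CALLOUT_TYPES) {
    if (known === requested) {
      return { type: known, title: match[2].trim() };
    }
  }
  return null; // Assumption: unknown types stay ordinary blockquotes.
}

console.log(parseCalloutHeader('[!WARNING] Breaking Changes'));
console.log(parseCalloutHeader('Just a quoted sentence.'));
```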
### API Specifications

#### Primary Interfaces

```typescript
interface FlavoredMarkdownOptions {
  enableWikilinks?: boolean;        // Default: true
  enableCallouts?: boolean;         // Default: true
  enableDirectives?: boolean;       // Default: true
  strictLinkValidation?: boolean;   // Default: false
  baseUrl?: string;                 // For absolute URL generation
  contentCollections?: string[];    // Available collections
  componentRegistry?: ComponentRegistry;
  customSyntax?: CustomSyntaxPlugin[];
}

interface ParseResult {
  success: boolean;
  ast?: any;                        // Markdown AST
  html?: string;                    // Rendered HTML
  metadata: {
    wikilinks: WikilinkInfo[];
    callouts: CalloutInfo[];
    directives: DirectiveInfo[];
    errors: ParseError[];
    warnings: ParseWarning[];
    processingTime: number;
  };
}

interface WikilinkInfo {
  originalText: string;
  filePath: string;
  displayText?: string;
  resolved: boolean;
  resolvedUrl?: string;
  collection?: string;
  line: number;
  column: number;
}

interface CalloutInfo {
  type: 'info' | 'warning' | 'error' | 'success' | 'note' | 'tip';
  title?: string;
  content: string;
  line: number;
}

interface DirectiveInfo {
  type: 'leaf' | 'container';
  name: string;
  attributes: Record<string, any>;
  content?: string;
  component?: string;
  resolved: boolean;
  line: number;
}

// Main parsing functions (signatures only).
// ParseError, ParseWarning, LinkValidationResult, and ContentCollection are
// referenced here but are not defined in this document.
declare function parseFlavoredMarkdown(content: string, options?: FlavoredMarkdownOptions): Promise<ParseResult>;
declare function resolveWikilinks(content: string, collections: ContentCollection[]): Promise<WikilinkInfo[]>;
declare function validateLinks(content: string, options?: FlavoredMarkdownOptions): Promise<LinkValidationResult>;
declare function extractDirectives(content: string): DirectiveInfo[];
declare function renderToHtml(content: string, options?: FlavoredMarkdownOptions): Promise<string>;
```

#### Core Implementation

```typescript
// Based on existing implementations from AstroMarkdown.astro and remark plugins.
// Several helpers called below (applyTextReplacements, getTextContent,
// extractCalloutContent, getCalloutIcon, parseDirectiveAttributes,
// attributesToString, generateUrl, extractMetadata) are omitted from this listing.
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkGfm from 'remark-gfm';
import remarkFrontmatter from 'remark-frontmatter';
import remarkDirective from 'remark-directive';
import remarkRehype from 'remark-rehype';
import rehypeRaw from 'rehype-raw';
import rehypeStringify from 'rehype-stringify';
import { visit } from 'unist-util-visit';

// Escape text before interpolating it into generated HTML.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

class FlavoredMarkdownParser {
  private options: Required<FlavoredMarkdownOptions>;
  private remarkProcessor: any;
  private rehypeProcessor: any;
  private componentRegistry: ComponentRegistry;
  private contentCollections = new Map<string, any[]>();

  constructor(options: FlavoredMarkdownOptions = {}) {
    this.options = {
      enableWikilinks: true,
      enableCallouts: true,
      enableDirectives: true,
      strictLinkValidation: false,
      baseUrl: '',
      contentCollections: [],
      componentRegistry: new ComponentRegistry(),
      customSyntax: [],
      ...options
    };
    this.componentRegistry = this.options.componentRegistry;
    this.initializeProcessors();
  }

  private initializeProcessors() {
    // Markdown side: parse to mdast and apply the syntax plugins
    this.remarkProcessor = unified()
      .use(remarkParse)
      .use(remarkGfm)                                   // GitHub Flavored Markdown
      .use(remarkFrontmatter)                           // YAML frontmatter
      .use(remarkDirective)                             // Directive support
      .use(this.remarkWikilinks.bind(this))             // Custom wikilink plugin
      .use(this.remarkCallouts.bind(this))              // Custom callout plugin
      .use(this.remarkDirectiveToComponent.bind(this)); // Custom directive plugin

    // HTML side: convert mdast to hast, re-parse embedded raw HTML, serialize
    this.rehypeProcessor = unified()
      .use(remarkRehype, { allowDangerousHtml: true })
      .use(rehypeRaw)                                   // Allow raw HTML
      .use(rehypeStringify);                            // Convert to HTML

    // Apply plugins added through registerCustomSyntax()
    for (const { type, plugin, options } of this.options.customSyntax) {
      const target = type === 'remark' ? this.remarkProcessor : this.rehypeProcessor;
      target.use(plugin, options);
    }
  }

  // Wikilink processing plugin
  private remarkWikilinks() {
    return (tree: any) => {
      visit(tree, 'text', (node: any, index: number, parent: any) => {
        if (!this.options.enableWikilinks) return;
        const wikilinkRegex = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g;
        let match;
        const replacements = [];
        while ((match = wikilinkRegex.exec(node.value)) !== null) {
          const [fullMatch, filePath, displayText] = match;
          const resolvedLink = this.resolveWikilink(filePath, displayText);
          replacements.push({
            start: match.index,
            end: match.index + fullMatch.length,
            replacement: resolvedLink
          });
        }
        if (replacements.length > 0) {
          this.applyTextReplacements(node, parent, index, replacements);
        }
      });
    };
  }

  // Callout processing plugin
  private remarkCallouts() {
    return (tree: any) => {
      visit(tree, 'blockquote', (node: any) => {
        if (!this.options.enableCallouts) return;
        // Check if this is a callout blockquote
        const firstChild = node.children[0];
        if (firstChild && firstChild.type === 'paragraph') {
          const firstText = this.getTextContent(firstChild);
          const calloutMatch = firstText.match(/^\[!([^\]]+)\]\s*(.*)/);
          if (calloutMatch) {
            const [, type, title] = calloutMatch;
            this.transformToCallout(node, type.toLowerCase(), title);
          }
        }
      });
    };
  }

  // Directive processing plugin
  private remarkDirectiveToComponent() {
    return (tree: any) => {
      visit(tree, ['leafDirective', 'containerDirective'], (node: any) => {
        if (!this.options.enableDirectives) return;
        const directiveName = node.name;
        const component = this.componentRegistry.getComponent(directiveName);
        if (component) {
          // Render first (the renderer inspects node.type), then swap the node for raw HTML
          const html = this.renderDirectiveAsHtml(node, component);
          node.type = 'html';
          node.value = html;
        } else {
          // Log warning for unknown directive
          console.warn(`Unknown directive: ${directiveName}`);
        }
      });
    };
  }

  // Wikilink resolution
  private resolveWikilink(filePath: string, displayText?: string): any {
    // Clean up the file path
    const cleanPath = filePath.trim();
    const linkText = displayText || cleanPath.split('/').pop() || cleanPath;
    // Try to resolve against content collections
    const resolvedUrl = this.findInContentCollections(cleanPath);
    if (resolvedUrl) {
      return {
        type: 'link',
        url: resolvedUrl,
        children: [{ type: 'text', value: linkText }]
      };
    }
    // Return broken link with warning styling; escape author-supplied text
    return {
      type: 'html',
      value: `<span class="broken-link" title="Link not found: ${escapeHtml(cleanPath)}">${escapeHtml(linkText)}</span>`
    };
  }

  // Content collection search
  private findInContentCollections(filePath: string): string | null {
    for (const [collectionName, items] of this.contentCollections.entries()) {
      for (const item of items) {
        if (item.id === filePath || item.slug === filePath) {
          return this.generateUrl(collectionName, item);
        }
      }
    }
    return null;
  }

  // Callout transformation
  private transformToCallout(node: any, type: string, title: string) {
    // Extract content after the title
    const content = this.extractCalloutContent(node);
    // Transform to custom callout HTML; type and title come from the document, so escape them
    node.type = 'html';
    node.value = `
<div class="callout callout-${escapeHtml(type)}">
  ${title ? `<div class="callout-title">
    <span class="callout-icon">${this.getCalloutIcon(type)}</span>
    <span class="callout-title-text">${escapeHtml(title)}</span>
  </div>` : ''}
  <div class="callout-content">
    ${content}
  </div>
</div>
`;
  }

  // Directive rendering
  private renderDirectiveAsHtml(node: any, component: ComponentInfo): string {
    const attributes = this.parseDirectiveAttributes(node.attributes || {});
    // Handle different directive types
    if (node.type === 'leafDirective') {
      // HTML parsers ignore "/>" on non-void elements, so close the tag explicitly
      return `<${component.tagName} ${this.attributesToString(attributes)}></${component.tagName}>`;
    } else if (node.type === 'containerDirective') {
      const content = escapeHtml(this.getTextContent(node));
      return `<${component.tagName} ${this.attributesToString(attributes)}>${content}</${component.tagName}>`;
    }
    return '';
  }

  // Main parsing method
  public async parse(content: string): Promise<ParseResult> {
    const startTime = Date.now();
    const metadata: ParseResult['metadata'] = {
      wikilinks: [],
      callouts: [],
      directives: [],
      errors: [],
      warnings: [],
      processingTime: 0
    };
    try {
      // Parse to mdast and run the remark plugins
      const mdast = await this.remarkProcessor.run(this.remarkProcessor.parse(content));
      // Extract metadata from the transformed tree
      this.extractMetadata(mdast, metadata);
      // Convert to HTML
      const hast = await this.rehypeProcessor.run(mdast);
      const html = this.rehypeProcessor.stringify(hast);
      metadata.processingTime = Date.now() - startTime;
      return {
        success: true,
        ast: mdast,
        html: String(html),
        metadata
      };
    } catch (error) {
      metadata.errors.push({
        message: error instanceof Error ? error.message : 'Unknown parsing error',
        line: -1,
        column: -1,
        code: 'PARSE_ERROR',
        severity: 'error'
      });
      metadata.processingTime = Date.now() - startTime;
      return {
        success: false,
        metadata
      };
    }
  }

  // Content collection integration
  public loadContentCollections(collections: Record<string, any[]>) {
    this.contentCollections = new Map(Object.entries(collections));
  }

  // Custom syntax plugin registration
  public registerCustomSyntax(plugin: CustomSyntaxPlugin) {
    this.options.customSyntax.push(plugin);
    // Rebuild both processors so the new plugin takes effect
    this.initializeProcessors();
  }
}

// Supporting interfaces and classes
class ComponentRegistry {
  private components = new Map<string, ComponentInfo>();

  register(name: string, component: ComponentInfo) {
    this.components.set(name, component);
  }

  getComponent(name: string): ComponentInfo | null {
    return this.components.get(name) || null;
  }
}

interface ComponentInfo {
  tagName: string;
  attributes: Record<string, any>;
  requiredAuth?: boolean;
}

interface CustomSyntaxPlugin {
  name: string;
  type: 'remark' | 'rehype';
  plugin: any;
  options?: any;
}
```

### Integration Points
#### 1. Content Management System
- **Content Collections**: Integration with taxonomized content organization
- **Link Resolution**: Automatic resolution of internal content references
- **Metadata Extraction**: Extract and index linked content for discovery
#### 2. Component System
- **Directive Registry**: Register and manage available directives
- **Authentication Integration**: Handle service authentication for external embeds
- **Fallback Rendering**: Graceful degradation for missing components
#### 3. Development Tools
- **Syntax Highlighting**: Enhanced highlighting for extended syntax
- **Link Validation**: Real-time validation of internal links
- **Error Reporting**: Detailed error messages with line/column information
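
The directive registry and fallback rendering points can be sketched together. In this illustrative version, a lookup never throws: an unregistered directive yields a fallback result that the renderer can display instead of a component. `DirectiveRegistry`, `RenderPlan`, and the `youtube-embed` tag name are all hypothetical and chosen only for the example.

```typescript
// Hypothetical result type: either a component to render or a visible fallback.
type RenderPlan =
  | { kind: 'component'; tagName: string; attributes: Record<string, string> }
  | { kind: 'fallback'; message: string };

class DirectiveRegistry {
  // Maps a directive name to the tag name of the component that renders it.
  private tagNames = new Map<string, string>();

  register(directiveName: string, tagName: string): void {
    this.tagNames.set(directiveName, tagName);
  }

  plan(directiveName: string, attributes: Record<string, string>): RenderPlan {
    const tagName = this.tagNames.get(directiveName);
    if (tagName === undefined) {
      // Graceful degradation: report the problem instead of failing the parse.
      return { kind: 'fallback', message: `Unknown directive: ${directiveName}` };
    }
    return { kind: 'component', tagName, attributes };
  }
}

const registry = new DirectiveRegistry();
registry.register('youtube', 'youtube-embed'); // tag name is made up for this example
console.log(registry.plan('youtube', { id: 'video-id' }));
console.log(registry.plan('slides', {}));
```

Returning a tagged union keeps the decision about how a fallback looks in the rendering layer rather than in the registry.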
### Error Handling
#### Expected Error Cases
- **Link Resolution Errors**
  - Broken internal links
  - Missing content collections
  - Invalid file paths
  - Circular reference detection
- **Directive Processing Errors**
  - Unknown directive names
  - Missing required attributes
  - Authentication failures
  - Component rendering errors
- **Syntax Parsing Errors**
  - Malformed wikilink syntax
  - Invalid callout formatting
  - Nested directive conflicts
  - Unsupported markdown combinations
#### Error Recovery Strategies
- **Graceful Degradation**: Render fallback content for failed components
- **Link Preservation**: Maintain original link text when resolution fails
- **Warning Generation**: Provide detailed warnings without breaking parsing
- **Partial Success**: Continue processing valid content despite errors
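
The warning generation and partial success strategies amount to collecting problems instead of throwing on the first one. The sketch below resolves a batch of link targets against a lookup table and records a warning for each miss while still returning every link that did resolve. `resolveAll`, `ResolutionReport`, and the example URLs are hypothetical.

```typescript
interface LinkWarning {
  filePath: string;
  message: string;
}

// Hypothetical report type: resolved links plus warnings for the rest.
interface ResolutionReport {
  resolved: Record<string, string>;
  warnings: LinkWarning[];
}

function resolveAll(filePaths: string[], knownUrls: Record<string, string>): ResolutionReport {
  const report: ResolutionReport = { resolved: {}, warnings: [] };
  for (const filePath of filePaths) {
    if (Object.prototype.hasOwnProperty.call(knownUrls, filePath)) {
      report.resolved[filePath] = knownUrls[filePath];
    } else {
      // Record the failure and keep going; one bad link should not stop the batch.
      report.warnings.push({ filePath, message: `Link not found: ${filePath}` });
    }
  }
  return report;
}

const report = resolveAll(
  ['organizations/Meta', 'concepts/Missing Page'],
  { 'organizations/Meta': '/organizations/meta' } // example URL, not a real route
);
console.log(report);
```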
### Performance Considerations
- **Lazy Loading**: Load content collections and components on-demand
- **Caching**: Cache resolved links and parsed content
- **Streaming**: Process large documents in chunks
- **Parallel Processing**: Resolve links and directives concurrently
- **Memory Management**: Efficient AST processing and cleanup
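
For the caching point, one simple option is to memoize link resolution so that a target referenced from many documents is looked up only once. The sketch below wraps an arbitrary resolver function; `CachedLinkResolver` is a hypothetical name, and the sketch assumes the content collections do not change while the cache is alive (it has no invalidation).

```typescript
class CachedLinkResolver {
  // null is stored for known misses so that broken links are not retried.
  private cache = new Map<string, string | null>();
  private resolve: (filePath: string) => string | null;

  constructor(resolve: (filePath: string) => string | null) {
    this.resolve = resolve;
  }

  get(filePath: string): string | null {
    const cached = this.cache.get(filePath);
    if (cached !== undefined) return cached; // hit, including cached misses
    const url = this.resolve(filePath);
    this.cache.set(filePath, url);
    return url;
  }
}

let underlyingLookups = 0;
const resolver = new CachedLinkResolver((filePath) => {
  underlyingLookups += 1;
  return filePath === 'organizations/Meta' ? '/organizations/meta' : null;
});

resolver.get('organizations/Meta');
resolver.get('organizations/Meta');
console.log(underlyingLookups); // the second call is served from the cache
```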
### Security Considerations
- **Link Validation**: Prevent malicious internal link exploitation
- **Component Sandboxing**: Secure rendering of external content
- **Authentication**: Secure handling of service credentials
- **Input Sanitization**: Prevent XSS through malformed syntax
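
This document does not define what counts as a malicious internal link, so the following is only one plausible reading of the link validation point: reject wikilink targets that carry a URL scheme, are absolute, or try to climb out of the content root. `isSafeWikilinkTarget` is a hypothetical helper and the rules it applies are assumptions.

```typescript
function isSafeWikilinkTarget(filePath: string): boolean {
  const path = filePath.trim();
  if (path === '') return false;

  // Reject absolute paths and anything that starts with a URL scheme,
  // such as "javascript:" or "https:".
  if (path.charAt(0) === '/' || /^[A-Za-z][A-Za-z0-9+.-]*:/.test(path)) return false;

  // Reject empty segments and "." / ".." traversal.
  for (const segment of path.split('/')) {
    if (segment === '' || segment === '.' || segment === '..') return false;
  }
  return true;
}

console.log(isSafeWikilinkTarget('tooling/AI-Toolkit/OpenAI'));
console.log(isSafeWikilinkTarget('../../etc/passwd'));
console.log(isSafeWikilinkTarget('javascript:alert(1)'));
```

A check like this complements output escaping rather than replacing it: it filters what may be resolved, while escaping protects whatever text ends up in the rendered HTML.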
## 5. Implementation Plan
### Phase 1: Core Parsing Infrastructure (Week 1-2)
- **Basic Parser Setup**
  - Remark/Rehype pipeline configuration
  - Extended syntax detection and parsing
  - AST manipulation utilities
- **Wikilink Processing**
  - Pattern recognition and parsing
  - Basic link resolution
  - Content collection integration
### Phase 2: Advanced Features (Week 3-4)
- **Callout System**
  - Multiple callout types with styling
  - Nested content support
  - Icon and theme integration
- **Directive Processing**
  - Component registry system
  - Authentication handling
  - Error fallback rendering
### Phase 3: Integration & Optimization (Week 5)
- **Performance Optimization**
  - Caching strategies
  - Parallel processing
  - Memory optimization
- **Developer Experience**
  - Error reporting improvements
  - Debugging tools
  - Documentation generation
### Dependencies
- **Internal**: Content collections, component registry, authentication services
- **External**: Remark/Rehype ecosystem, content processing libraries
- **Development**: TypeScript 5+, Jest for testing, performance profiling
### Testing Strategy
- **Unit Tests**
  - Syntax parsing accuracy
  - Link resolution correctness
  - Component rendering validation
  - Error handling scenarios
- **Integration Tests**
  - End-to-end content processing
  - Content collection integration
  - Component system integration
  - Performance benchmarks
- **Content Tests**
  - Real-world markdown processing
  - Large repository handling
  - Cross-reference validation
## 6. Alternatives Considered
### MDX Processing
- **MDX**: JSX in markdown with component support
- **Pros**: Rich component integration, React ecosystem
- **Cons**: Complex build process, JSX syntax learning curve
- **Decision**: Directive-based approach provides similar benefits with simpler syntax
### Wiki-style Systems
- **MediaWiki Syntax**: Established wiki linking patterns
- **Pros**: Proven syntax, extensive features
- **Cons**: Complex syntax, not markdown-compatible
- **Decision**: Simplified wikilink syntax maintains markdown compatibility
### Notion-style Blocks
- **Block-based Editing**: Structured content blocks
- **Pros**: Rich editing experience, structured data
- **Cons**: Complex implementation, not text-based
- **Decision**: Markdown-first approach with directive enhancements
## 7. Open Questions
- **Syntax Evolution**: How should we handle syntax changes across existing content?
- **Performance Scaling**: What are the limits for real-time processing of large repositories?
- **Plugin Ecosystem**: Should we support third-party syntax extensions?
- **Caching Strategy**: How should we cache parsed content and resolved links?
- **Collaboration**: How should multiple users handle conflicting link updates?
- **Mobile Optimization**: Should we provide mobile-specific rendering optimizations?
## 8. Appendix
### Glossary
- **Wikilink**: Double-bracketed link syntax for internal content references
- **Directive**: Special syntax for embedding components or interactive content
- **Callout**: Styled content block for highlighting information
- **Content Collection**: Organized group of related content (tools, concepts, etc.)
- **AST**: Abstract Syntax Tree representing parsed markdown structure
### References
### Revision History
- **v0.1.0** (2025-08-12): Initial comprehensive specification with user-defined syntax
- **v0.0.0.1** (2025-08-09): Initial file creation