# Core Components

The DevFlow backend is built with modular components that work together to provide code indexing, search, and AI features.
## Code Parser (`app/services/code_parser.py`)

The code parser analyzes source code files and extracts structured information.

### Features
- Language Detection: Automatically detects programming languages from file extensions.
- Function Extraction: Identifies and extracts function definitions with metadata.
- Class Extraction: Finds class definitions and their methods.
- Documentation Parsing: Extracts docstrings and comments.
### Supported Languages

- Python, JavaScript, TypeScript, Java, Go, C++, C#, Rust, PHP, and more.
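Extension-based language detection can be sketched as below. The mapping is illustrative only; the real lookup table lives in `app/services/code_parser.py` and covers more extensions.

```python
from pathlib import Path

# Illustrative extension-to-language table (not the parser's actual table).
EXTENSION_MAP = {
    ".py": "python", ".js": "javascript", ".ts": "typescript",
    ".java": "java", ".go": "go", ".cpp": "c++", ".cs": "c#",
    ".rs": "rust", ".php": "php",
}

def detect_language(path: str) -> str:
    """Detect a language from the file extension, defaulting to 'unknown'."""
    return EXTENSION_MAP.get(Path(path).suffix.lower(), "unknown")
```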
### Example Output

```json
{
  "functions": [
    {
      "name": "authenticate_user",
      "code": "def authenticate_user(username, password):\n ...",
      "start_line": 10,
      "end_line": 25
    }
  ],
  "classes": [
    {
      "name": "UserManager",
      "code": "class UserManager:\n ...",
      "start_line": 30,
      "end_line": 50
    }
  ]
}
```
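For Python sources, the function-extraction step that produces this shape can be sketched with the standard-library `ast` module. This is a simplified stand-in: the real parser also handles the other listed languages and extracts classes and docstrings.

```python
import ast

def extract_functions(source: str) -> list[dict]:
    """Extract top-level function names, code, and line ranges,
    mirroring the example output shape above (a sketch only)."""
    tree = ast.parse(source)
    return [
        {
            "name": node.name,
            "code": ast.get_source_segment(source, node),
            "start_line": node.lineno,
            "end_line": node.end_lineno,
        }
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    ]
```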
## Code Embedder (`app/services/embedder.py`)

Generates vector embeddings for code snippets using language models.

### Features
- Semantic Embeddings: Creates high-dimensional vectors that capture code meaning.
- Batch Processing: Efficiently processes multiple code snippets.
- Model Selection: Configurable embedding models.
- Caching: Caches embeddings for improved performance.
### Embedding Process

1. Code is preprocessed and cleaned.
2. The text is tokenized and sent to the embedding model.
3. High-dimensional vectors are generated and stored.
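The preprocess–embed–cache flow can be sketched as follows. The `_embed_one` helper is a deterministic placeholder standing in for the real model call, so the flow is runnable without an API key; the actual embedder calls the configured model.

```python
import hashlib

_cache: dict[str, list[float]] = {}

def _embed_one(text: str, dim: int = 8) -> list[float]:
    """Placeholder for the real model call: derives a deterministic
    vector from a hash (illustrative only, not a semantic embedding)."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def embed_batch(snippets: list[str]) -> list[list[float]]:
    """Preprocess, embed, and cache a batch of code snippets."""
    vectors = []
    for snippet in snippets:
        cleaned = " ".join(snippet.split())  # preprocessing: collapse whitespace
        if cleaned not in _cache:            # only embed on a cache miss
            _cache[cleaned] = _embed_one(cleaned)
        vectors.append(_cache[cleaned])
    return vectors
```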
## Vector Store (`app/db/vector_store.py`)

Manages the storage and retrieval of code embeddings using ChromaDB.

### Features
- Persistent Storage: Embeddings are stored on disk for persistence.
- Similarity Search: Fast vector similarity search across the codebase.
- Metadata Storage: Stores additional information with each embedding.
- Collection Management: Organizes embeddings by project or workspace.
### Operations

- `add_vectors()`: Store new embeddings.
- `search()`: Find similar code snippets.
- `clear()`: Remove all embeddings.
- `get_stats()`: Retrieve storage statistics.
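The operations above can be sketched with a minimal in-memory store using cosine similarity. This is a stand-in for illustration; the real class wraps ChromaDB and persists to disk.

```python
import math

class VectorStoreSketch:
    """Minimal in-memory stand-in for the vector store (illustrative;
    method names mirror the operations listed above)."""

    def __init__(self):
        self._vectors: dict[str, tuple[list[float], dict]] = {}

    def add_vectors(self, ids, embeddings, metadatas):
        for i, vec, meta in zip(ids, embeddings, metadatas):
            self._vectors[i] = (vec, meta)

    def search(self, query: list[float], top_k: int = 3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0
        # Rank stored vectors by similarity to the query, best first.
        ranked = sorted(self._vectors.items(),
                        key=lambda kv: cosine(query, kv[1][0]), reverse=True)
        return [(i, meta) for i, (vec, meta) in ranked[:top_k]]

    def clear(self):
        self._vectors.clear()

    def get_stats(self):
        return {"count": len(self._vectors)}
```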
## Metadata Store (`app/db/metadata_store.py`)

Manages file and chunk metadata in a SQLite database.

### Features
- File Tracking: Records all indexed files with timestamps.
- Chunk Management: Tracks individual code chunks and their relationships.
- Embedding Mapping: Links chunks to their vector embeddings.
- Feedback System: Stores user feedback on code chunks.
### Database Schema

```sql
-- Files table
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    path TEXT UNIQUE,
    created_at TIMESTAMP
);

-- Chunks table
CREATE TABLE chunks (
    id TEXT PRIMARY KEY,
    file_id INTEGER,
    type TEXT,
    name TEXT,
    start_line INTEGER,
    end_line INTEGER,
    embedding_id TEXT
);
```
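The schema above can be exercised with the standard-library `sqlite3` module. This is a sketch of typical usage, not the metadata store's actual code; only the DDL comes from the schema shown.

```python
import sqlite3

def init_metadata_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the files/chunks tables from the schema above."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS files (
            id INTEGER PRIMARY KEY,
            path TEXT UNIQUE,
            created_at TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS chunks (
            id TEXT PRIMARY KEY,
            file_id INTEGER,
            type TEXT,
            name TEXT,
            start_line INTEGER,
            end_line INTEGER,
            embedding_id TEXT
        );
    """)
    return conn

# Record a file, then link a chunk to it and to its embedding.
conn = init_metadata_db()
conn.execute("INSERT INTO files (path, created_at) VALUES (?, datetime('now'))",
             ("app/main.py",))
file_id = conn.execute("SELECT id FROM files WHERE path = ?",
                       ("app/main.py",)).fetchone()[0]
conn.execute("INSERT INTO chunks VALUES (?, ?, ?, ?, ?, ?, ?)",
             ("chunk-1", file_id, "function", "authenticate_user", 10, 25, "emb-1"))
```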
## RAG Engine (`app/services/rag_engine.py`)

Retrieval-Augmented Generation for AI-powered code answers.

### Features
- Context Retrieval: Finds relevant code snippets for questions.
- AI Integration: Connects with OpenAI for natural language responses.
- Prompt Engineering: Optimized prompts for code-related questions.
- Response Generation: Creates context-aware answers.
### Process

1. A user question is received.
2. Relevant code chunks are retrieved from the vector store.
3. The context is formatted and sent to OpenAI.
4. The AI generates a comprehensive answer.
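The context-formatting step (step 3) can be sketched as below. The chunk fields match the parser's example output; the prompt template itself is hypothetical, since the engine's real prompts live in `app/services/rag_engine.py`.

```python
def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Format retrieved chunks into a prompt for the language model."""
    context = "\n\n".join(
        f"# {c['name']} ({c['path']}:{c['start_line']}-{c['end_line']})\n{c['code']}"
        for c in chunks
    )
    return (
        "Answer the question using only the code context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```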
## Cache Service (`app/services/cache.py`)

Provides a caching layer for improved performance.

### Features
- In-Memory Caching: Fast access to frequently used data.
- TTL Support: Automatic expiration of cached items.
- Cache Invalidation: Smart cache management.
- Performance Monitoring: Track cache hit rates.
### Cached Data
- Search results
- Embeddings
- File metadata
- API responses
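The TTL and hit-rate behaviour described above can be sketched as a small in-memory cache. This is an illustrative stand-in, not the service's actual implementation.

```python
import time

class TTLCache:
    """In-memory cache with per-item expiry and hit/miss counters
    (a sketch of the behaviour described above)."""

    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self._store: dict[str, tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # drop expired items lazily
            self.misses += 1
            return None
        self.hits += 1
        return entry[1]
```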
## Rate Limiter (`app/services/rate_limiter.py`)

Prevents API abuse and ensures fair usage.

### Features
- Request Limiting: Limits requests per time window.
- IP-based Tracking: Tracks requests by client IP.
- Configurable Limits: Adjustable rate limits via environment variables.
- Graceful Degradation: Returns appropriate error responses.
### Configuration

```python
RATE_LIMIT_REQUESTS = 100  # requests per window
RATE_LIMIT_WINDOW = 3600   # seconds
```
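A per-IP limiter with these settings can be sketched as below. This uses a sliding window over request timestamps for illustration; the real service's exact strategy may differ.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window limiter keyed by client IP (an illustrative
    sketch of the behaviour described above)."""

    def __init__(self, max_requests: int = 100, window: float = 3600.0):
        self.max_requests = max_requests
        self.window = window
        self._requests = defaultdict(deque)  # ip -> recent request timestamps

    def allow(self, ip, now=None) -> bool:
        now = time.monotonic() if now is None else now
        q = self._requests[ip]
        while q and q[0] <= now - self.window:  # evict timestamps outside window
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # caller should respond with an error (e.g. HTTP 429)
        q.append(now)
        return True
```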
## How Components Work Together

```mermaid
graph TD
    A[API Request] --> B[Rate Limiter]
    B --> C[Code Parser]
    C --> D[Code Embedder]
    D --> E[Vector Store]
    E --> F[Metadata Store]
    G[Search Query] --> H[RAG Engine]
    H --> E
    H --> I[OpenAI API]
    I --> J[Response]
```
### Indexing Flow

1. The API receives an indexing request.
2. The Code Parser analyzes the source files.
3. The Code Embedder generates vectors.
4. The Vector Store stores the embeddings.
5. The Metadata Store records file and chunk info.
### Search Flow

1. The API receives a search query.
2. The RAG Engine retrieves relevant chunks.
3. The Vector Store performs a similarity search.
4. OpenAI generates an answer (for AI queries).
5. The API returns the results.
## Configuration

All components can be configured via environment variables:

```bash
# Embedding model
EMBEDDING_MODEL=text-embedding-ada-002

# Vector store
CHROMA_DB_PATH=./data/vector_store

# Cache settings
CACHE_TTL=3600

# Rate limiting
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_WINDOW=3600
```
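Reading these variables might look like the sketch below. The values above are used as defaults; the function name and dictionary shape are illustrative, not the backend's actual settings loader.

```python
import os

def load_settings() -> dict:
    """Read component settings from the environment, falling back to
    the defaults listed above (a sketch only)."""
    return {
        "embedding_model": os.getenv("EMBEDDING_MODEL", "text-embedding-ada-002"),
        "chroma_db_path": os.getenv("CHROMA_DB_PATH", "./data/vector_store"),
        "cache_ttl": int(os.getenv("CACHE_TTL", "3600")),
        "rate_limit_requests": int(os.getenv("RATE_LIMIT_REQUESTS", "100")),
        "rate_limit_window": int(os.getenv("RATE_LIMIT_WINDOW", "3600")),
    }
```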
The modular design allows for easy customization and extension of individual components.