Customization

The DevFlow backend can be customized through environment variables, alternative embedding models, and extension points for new languages, search logic, deployment, and monitoring.

Supported Languages

The backend supports indexing and parsing for multiple programming languages:

Fully Supported

Python (.py) - Functions, classes, docstrings
JavaScript (.js) - Functions, classes, JSDoc
TypeScript (.ts) - Functions, classes, interfaces, types
Java (.java) - Methods, classes, interfaces
Go (.go) - Functions, structs, methods
C++ (.cpp, .cc, .cxx) - Functions, classes
C# (.cs) - Methods, classes, interfaces
Rust (.rs) - Functions, structs, traits
PHP (.php) - Functions, classes
C (.c) - Functions, structs

Language Mapping

File extensions are mapped to language identifiers as shown below. The map also contains extensions (rb, kt, scala, swift) for languages that are not listed under Fully Supported.

LANGUAGE_MAP = {
    'py': 'python', 'js': 'javascript', 'ts': 'typescript', 
    'java': 'java', 'go': 'go', 'rb': 'ruby',
    'cpp': 'cpp', 'cc': 'cpp', 'cxx': 'cpp', 'cs': 'csharp', 
    'kt': 'kotlin', 'php': 'php', 'c': 'c',
    'rs': 'rust', 'scala': 'scala', 'swift': 'swift'
}
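
As a minimal sketch of how a file path might be resolved against this map, the example below looks up the lowercased extension without its leading dot. The helper name `detect_language` is hypothetical and is not taken from the backend; the actual lookup in the codebase may differ.

```python
from pathlib import Path
from typing import Optional

# Same mapping as above: extension (without the dot) -> language identifier.
LANGUAGE_MAP = {
    'py': 'python', 'js': 'javascript', 'ts': 'typescript',
    'java': 'java', 'go': 'go', 'rb': 'ruby',
    'cpp': 'cpp', 'cc': 'cpp', 'cxx': 'cpp', 'cs': 'csharp',
    'kt': 'kotlin', 'php': 'php', 'c': 'c',
    'rs': 'rust', 'scala': 'scala', 'swift': 'swift',
}


def detect_language(path: str) -> Optional[str]:
    """Return the language for a file path, or None if the extension is unmapped."""
    # Path.suffix includes the dot (".py"); strip it and normalize case.
    ext = Path(path).suffix.lstrip('.').lower()
    return LANGUAGE_MAP.get(ext)
```

Returning None for unmapped extensions lets a caller skip files such as README.md instead of raising an error during indexing.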

Embedding Models

Default Model

text-embedding-ada-002: OpenAI embedding model, since succeeded by the text-embedding-3 family
Dimensions: 1536
Performance: Fast and accurate for code embeddings

Alternative Models

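Set EMBEDDING_MODEL to select a different embedding model. Models can differ in output dimension (text-embedding-3-large defaults to 3072 rather than 1536), so vectors indexed with one model generally cannot be mixed with vectors from another, and switching usually means re-indexing.

# OpenAI models (set exactly one)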
EMBEDDING_MODEL=text-embedding-ada-002
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_MODEL=text-embedding-3-large

# Custom Models (if supported)
EMBEDDING_MODEL=custom-model-endpoint
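
The sketch below shows one way the model name could be read from the environment together with its default vector size, which is useful for checking that an existing index matches the configured model. The `EMBEDDING_DIMENSIONS` table and the `resolve_embedding_model` helper are illustrative assumptions, not part of the backend.

```python
import os
from typing import Mapping, Optional, Tuple

# Default output dimensions of the OpenAI embedding models listed above.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

DEFAULT_MODEL = "text-embedding-ada-002"


def resolve_embedding_model(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[int]]:
    """Return (model_name, dimension); dimension is None for unknown models."""
    env = os.environ if env is None else env
    model = env.get("EMBEDDING_MODEL", DEFAULT_MODEL)
    return model, EMBEDDING_DIMENSIONS.get(model)
```

A None dimension signals a custom model whose vector size has to be determined some other way, for example by embedding a single test string.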

Advanced Configuration

Environment Variables

# API Configuration
API_HOST=127.0.0.1
API_PORT=8000
DEBUG=false

# OpenAI Configuration
OPENAI_API_KEY=your_api_key
OPENAI_MODEL=gpt-4.1-nano
OPENAI_TOKEN_LIMIT=2048
OPENAI_TIMEOUT=30

# Embedding Configuration
EMBEDDING_MODEL=text-embedding-ada-002
EMBEDDING_BATCH_SIZE=100
EMBEDDING_TIMEOUT=60

# Database Configuration
CHROMA_DB_PATH=./data/vector_store
METADATA_DB_PATH=./data/metadata.db

# Cache Configuration
CACHE_TTL=3600
CACHE_MAX_SIZE=1000

# Rate Limiting
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_WINDOW=3600

# Logging
LOG_LEVEL=INFO
LOG_FORMAT=json

# Workspace
WORKSPACE_ROOT=/path/to/workspace
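
Environment variables arrive as strings, so numeric and boolean settings need to be converted before use. The following is a minimal sketch of such a loader for a subset of the variables above, falling back to the example values shown. The `Settings` dataclass and `load_settings` function are hypothetical names; the backend's real settings module may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_host: str
    api_port: int
    debug: bool
    embedding_batch_size: int
    cache_ttl: int
    log_level: str


def _as_bool(value: str) -> bool:
    # Treat common truthy spellings as True; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables with typed fallbacks."""
    env = os.environ if env is None else env
    return Settings(
        api_host=env.get("API_HOST", "127.0.0.1"),
        api_port=int(env.get("API_PORT", "8000")),
        debug=_as_bool(env.get("DEBUG", "false")),
        embedding_batch_size=int(env.get("EMBEDDING_BATCH_SIZE", "100")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dictionary instead of relying on os.environ makes the loader easy to exercise in tests.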

Performance Tuning

# Increase batch size for faster indexing
EMBEDDING_BATCH_SIZE=200

# Adjust cache settings
CACHE_TTL=7200
CACHE_MAX_SIZE=2000

# Modify rate limits
RATE_LIMIT_REQUESTS=500
RATE_LIMIT_WINDOW=3600
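
To illustrate why a larger EMBEDDING_BATCH_SIZE can speed up indexing, the sketch below splits inputs into batches and counts the resulting embedding requests. The function names are hypothetical, and the actual gain depends on the provider's per-request limits and latency.

```python
import math
from typing import Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each holding at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def request_count(num_texts: int, batch_size: int) -> int:
    """Number of embedding requests needed for `num_texts` inputs."""
    return math.ceil(num_texts / batch_size)
```

With 1,000 code chunks, a batch size of 100 needs 10 requests, while a batch size of 200 needs 5.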

Extending the Backend

Adding New Languages

Update Language Map:

# In app/api/v1/endpoints.py
LANGUAGE_MAP['new_ext'] = 'new_language'

Extend Code Parser:

# In app/services/code_parser.py
def parse_new_language(self, code: str):
    # Implement language-specific parsing
    pass

Add Language Support:

def extract_functions_new_language(self, parsed_code, source_code):
    # Extract functions for the new language
    pass
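
As an illustration of the extraction step, here is a minimal regex-based sketch for Ruby, whose extension (rb) already appears in the language map. It is written as a standalone function for brevity, and `extract_functions_ruby` is a hypothetical name. A regex covers only simple `def` lines; a production parser would more likely rely on a real grammar.

```python
import re
from typing import Dict, List

# Matches lines such as "def greet(name)" or "  def self.build".
# This is a heuristic, not a full Ruby grammar.
_RUBY_DEF = re.compile(
    r"^[ \t]*def[ \t]+(?:self\.)?([A-Za-z_]\w*[?!=]?)", re.MULTILINE
)


def extract_functions_ruby(source_code: str) -> List[Dict[str, object]]:
    """Return the name and 1-based line number of each `def` in the source."""
    functions = []
    for match in _RUBY_DEF.finditer(source_code):
        # Count the newlines before the method name to get its line number.
        line = source_code.count("\n", 0, match.start(1)) + 1
        functions.append({"name": match.group(1), "line": line})
    return functions
```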

Custom Embedding Models

Create Custom Embedder:

from typing import List


class CustomEmbedder:
    def embed(self, texts: List[str]) -> List[List[float]]:
        # Implement custom embedding logic
        raise NotImplementedError

Update Configuration:

# In app/services/embedder.py
# The string compared here must match the value you set for EMBEDDING_MODEL.
if settings.EMBEDDING_MODEL == "custom":
    embedder = CustomEmbedder()
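
For local testing without network access, an embedder with the same `embed` signature can be as simple as the toy below. It hashes tokens into a fixed-size vector and is not semantically meaningful. The class name `HashingEmbedder` and its `dimensions` parameter are assumptions made for illustration.

```python
import hashlib
import math
from typing import List


class HashingEmbedder:
    """Toy embedder: hashes whitespace-separated tokens into an L2-normalized vector."""

    def __init__(self, dimensions: int = 64) -> None:
        self.dimensions = dimensions

    def embed(self, texts: List[str]) -> List[List[float]]:
        return [self._embed_one(text) for text in texts]

    def _embed_one(self, text: str) -> List[float]:
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            # A stable hash (unlike built-in hash()) keeps vectors reproducible across runs.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.dimensions
            vector[index] += 1.0
        norm = math.sqrt(sum(v * v for v in vector))
        # Empty input yields an all-zero vector rather than dividing by zero.
        return [v / norm for v in vector] if norm else vector
```

Because the output is deterministic, the same text always maps to the same vector, which keeps tests for the indexing pipeline repeatable.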

Custom Search Logic

Extend Vector Store:

class CustomVectorStore(VectorStore):
    def custom_search(self, query: str, **kwargs):
        # Implement custom search logic
        pass

Add Custom Endpoints:

@router.post("/custom_search")
async def custom_search(request: CustomSearchRequest):
    # Implement custom search endpoint
    pass
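
One possible form of custom search logic is similarity ranking restricted by metadata. The sketch below ranks in-memory records by cosine similarity and optionally filters by language. It does not use the backend's VectorStore class, and the `Record` type and `filtered_search` function are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Record:
    id: str
    language: str
    vector: List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(
    query_vector: Sequence[float],
    records: List[Record],
    language: Optional[str] = None,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank records by cosine similarity, optionally restricted to one language."""
    scored = [
        (record.id, cosine_similarity(query_vector, record.vector))
        for record in records
        if language is None or record.language == language
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Filtering before scoring avoids computing similarities for records that would be discarded anyway.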

Deployment Customization

Docker Configuration

# Custom Dockerfile
FROM python:3.9-slim

# Install custom dependencies
RUN pip install custom-package

# Copy custom configuration
COPY custom_config.py /app/

# Set custom environment
ENV CUSTOM_SETTING=value

Production Settings

# Production environment variables
DEBUG=false
LOG_LEVEL=WARNING
API_HOST=0.0.0.0
API_PORT=8000

# Security settings
ENABLE_AUTH=true
API_KEY=your_secure_api_key

# Performance settings
WORKERS=4
MAX_CONNECTIONS=1000
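
How the backend enforces ENABLE_AUTH and API_KEY is not shown here. As a sketch under that caveat, a key check could compare the presented key with the configured one in constant time. The function name `is_authorized` and its parameters are assumptions.

```python
import hmac
from typing import Optional


def is_authorized(
    presented_key: Optional[str],
    expected_key: str,
    auth_enabled: bool = True,
) -> bool:
    """Return True if a request may proceed under the given auth settings."""
    if not auth_enabled:
        return True
    if not presented_key or not expected_key:
        return False
    # compare_digest takes time independent of where the inputs differ,
    # which avoids leaking key prefixes through response timing.
    return hmac.compare_digest(
        presented_key.encode("utf-8"), expected_key.encode("utf-8")
    )
```

Rejecting an empty expected key prevents an unset API_KEY from silently matching an empty request header.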

Monitoring and Logging

Custom Logging

# Custom logging configuration
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('devflow.log'),
        logging.StreamHandler()
    ]
)
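
The configuration above emits plain-text lines. For structured output along the lines of LOG_FORMAT=json, a formatter can serialize each record as one JSON object per line. This is a sketch using only the standard library; `JsonFormatter` and `build_logger` are hypothetical names, and the field set is an assumption.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "name": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str = "devflow", level: str = "INFO") -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```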

Metrics Collection

# Add custom metrics
from prometheus_client import Counter, Histogram

search_requests = Counter('search_requests_total', 'Total search requests')
search_duration = Histogram('search_duration_seconds', 'Search request duration')

Best Practices

Performance

Use appropriate batch sizes for embeddings
Configure caching for frequently accessed data
Monitor memory usage with large codebases
Use SSD storage for the vector database

Security

Keep API keys secure and rotate them regularly
Use HTTPS in production
Implement proper authentication
Validate all input data

Maintenance

Regularly update dependencies
Monitor disk space for vector storage
Back up the metadata database
Clear old embeddings periodically
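
If the metadata database is a SQLite file, which the METADATA_DB_PATH example suggests but does not confirm, a backup can be taken with SQLite's online backup API. The sketch below is written under that assumption; `backup_sqlite` is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(source_path: str, backup_path: str) -> None:
    """Copy a SQLite database to backup_path using the online backup API."""
    Path(backup_path).parent.mkdir(parents=True, exist_ok=True)
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        # Connection.backup copies a consistent snapshot even if the source is in use.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

Unlike copying the file directly, this approach does not risk capturing a half-written database while the backend is running.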

The backend's modular architecture makes it easy to customize and extend for specific requirements.