Rover MCP Metrics
This document describes the OpenTelemetry metrics implemented in the rover-mcp service.
Overview
Rover MCP uses OpenTelemetry (OTel) for metrics collection and observability. All metrics are exported via OTLP and can be consumed by monitoring systems like Prometheus, Grafana, or AWS CloudWatch.
Configuration
Metrics are controlled via environment variables:

    TELEMETRY_ENABLED=true   # Set to "false" to disable telemetry
    ENVIRONMENT=dev          # Environment name (local, dev, stage, prod)

OpenTelemetry is initialized automatically on startup using the fetchotel package for consistent configuration across Fetch services.
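As a rough illustration of how these two variables might be read, the sketch below parses them with the standard library only. It is not the fetchotel implementation: `telemetryConfig` and `loadTelemetryConfig` are hypothetical names, and both the "enabled unless explicitly false" rule and the `local` fallback are assumptions inferred from the comments above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// telemetryConfig holds the two settings described above.
// The type is illustrative and not part of rover-mcp.
type telemetryConfig struct {
	Enabled     bool
	Environment string
}

// loadTelemetryConfig reads the settings through getenv so that callers can
// pass a fake lookup in tests. Telemetry stays on unless the variable is
// explicitly "false"; the "local" fallback is an assumption.
func loadTelemetryConfig(getenv func(string) string) telemetryConfig {
	cfg := telemetryConfig{Enabled: true, Environment: "local"}
	if strings.EqualFold(strings.TrimSpace(getenv("TELEMETRY_ENABLED")), "false") {
		cfg.Enabled = false
	}
	if env := strings.TrimSpace(getenv("ENVIRONMENT")); env != "" {
		cfg.Environment = env
	}
	return cfg
}

func main() {
	cfg := loadTelemetryConfig(os.Getenv)
	fmt.Printf("telemetry enabled=%t environment=%s\n", cfg.Enabled, cfg.Environment)
}
```

Passing the lookup function as a parameter keeps the parsing logic testable without mutating the process environment.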
Available Metrics
HTTP Request Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| http_request_duration | Histogram | Duration of HTTP requests in milliseconds | method, path, status_code, service_type |
| http_request_count | Counter | Total number of HTTP requests | method, path, status_code, service_type |
| http_response_count | Counter | Total number of HTTP responses | method, path, status_code, service_type |
MCP Tool Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| mcp_tool_call_count | Counter | Total number of MCP tool calls | tool_name, service_type |
| mcp_tool_success_count | Counter | Total number of successful MCP tool calls | tool_name, service_type |
| mcp_tool_failure_count | Counter | Total number of failed MCP tool calls | tool_name, service_type |
| mcp_tool_duration | Histogram | Duration of MCP tool calls in milliseconds | tool_name, service_type |
API Client Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| api_call_count | Counter | Total number of API calls to external services | api_service, operation, service_type |
| api_call_duration | Histogram | Duration of API calls in milliseconds | api_service, operation, service_type |
| api_call_success_count | Counter | Total number of successful API calls | api_service, operation, service_type |
| api_call_failure_count | Counter | Total number of failed API calls | api_service, operation, service_type |
Cache Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| cache_hit_count | Counter | Total number of cache hits | operation, service_type |
| cache_miss_count | Counter | Total number of cache misses | operation, service_type |
| cache_eviction_count | Counter | Total number of cache evictions | operation, service_type |
| cache_load_duration | Histogram | Duration of cache load operations in milliseconds | operation, service_type |
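The hit and miss counters are usually combined into a hit ratio. The sketch below shows that arithmetic in isolation, assuming the two inputs are totals read from `cache_hit_count` and `cache_miss_count` over the same window; `cacheHitRatio` is a hypothetical helper, and in practice this calculation would more likely live in a dashboard query.

```go
package main

import "fmt"

// cacheHitRatio returns hits / (hits + misses), or 0 when there was no traffic.
func cacheHitRatio(hits, misses uint64) float64 {
	total := hits + misses
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

func main() {
	// 90 hits and 10 misses give a ratio of 0.90.
	fmt.Printf("%.2f\n", cacheHitRatio(90, 10))
}
```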
Business Logic Metrics
Offer Search Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| offer_search_count | Counter | Total number of offer searches | service_type |
| offer_search_duration | Histogram | Duration of offer searches in milliseconds | service_type |
| offers_returned_count | Counter | Total number of offers returned from searches | service_type |
Location Search Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| location_search_count | Counter | Total number of location-based searches | service_type |
| location_search_duration | Histogram | Duration of location searches in milliseconds | service_type |
| retailers_found_count | Counter | Total number of retailers found in location searches | service_type |
Web Fetch Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| web_fetch_count | Counter | Total number of web fetch operations | service_type |
| web_fetch_duration | Histogram | Duration of web fetch operations in milliseconds | service_type |
| web_fetch_bytes_count | Counter | Total bytes fetched from web pages | service_type |
Web Search Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| web_search_count | Counter | Total number of web search operations | service_type |
| web_search_duration | Histogram | Duration of web search operations in milliseconds | service_type |
| web_search_results_count | Counter | Total number of results returned from web searches | service_type |
Token Usage Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| tool_result_tokens_count | Counter | Total number of tokens in tool results | tool_name, service_type |
| tool_result_bytes_count | Counter | Total bytes in tool results | tool_name, service_type |
| token_count_errors_count | Counter | Total errors during token counting | tool_name, error_type, service_type |
These metrics track the token usage of MCP tool results, which drives OpenAI API costs when tool results are sent to LLMs. Token counting uses the tiktoken-go library with the GPT-4o Mini encoding, so the counts should closely track the tokens OpenAI bills for that content.
Purchase History Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| purchase_history_count | Counter | Total number of purchase history requests | service_type |
| purchase_history_duration | Histogram | Duration of purchase history requests in milliseconds | service_type |
| purchases_found_count | Counter | Total number of purchases found in history | service_type |
Product Search Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| product_search_count | Counter | Total number of product searches | service_type |
| product_search_duration | Histogram | Duration of product searches in milliseconds | service_type |
| products_found_count | Counter | Total number of products found in searches | service_type |
Usage Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| requests_by_environment | Counter | Total requests by environment | environment, service_type |
| requests_by_user_id | Counter | Total requests by user ID | user_id, service_type |
| requests_by_tool | Counter | Total requests by MCP tool | tool_name, service_type |
Error Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| error_count | Counter | Total number of errors | error_type, service_type |
| validation_error_count | Counter | Total number of validation errors | error_type, service_type |
| marshal_error_count | Counter | Total number of marshaling errors | error_type, service_type |
| api_error_count | Counter | Total number of API errors | error_type, service_type |
| auth_error_count | Counter | Total number of authentication errors | error_type, service_type |
LLM Feedback Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| llm_feedback_count | Counter | Total number of LLM feedback messages | service_type |
Server Lifecycle Metrics
| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| server_startup_duration | Histogram | Duration of server startup in milliseconds | service_type |
| server_shutdown_duration | Histogram | Duration of server shutdown in milliseconds | service_type |
Metric Attributes
Common Attributes
- `service_type`: Always set to `"mcp"` for rover-mcp
- `error_type`: Categories include `validation`, `marshal`, `api`, `auth`
- `environment`: Deployment environment (`local`, `dev`, `stage`, `prod`)
Tool Attributes
- `tool_name`: MCP tool names (`search_offers`, `search_nearby_offers`, `llm_feedback`, `fetch_webpage`, `web_search`, `get_user_purchase_history`, `search_products`)
API Service Attributes
- `api_service`: External service names (`offer_guardian`, `lidar`, `offer_search`, `neli`, `fido_product`, `fido_search`, `purchase_history`, `web_fetch`, `web_search`)
Business Logic Attributes
- `user_id`: User identifier (when available)
- `operation`: Specific operation within a service
Helper Functions
The metrics package provides convenient helper functions:

    // Record HTTP request metrics
    metrics.RecordHTTPRequest(ctx, "POST", "/", 200, 150.5)

    // Record MCP tool call metrics
    metrics.RecordMCPToolCall(ctx, "search_offers", true, 2500.0)

    // Record API call metrics
    metrics.RecordAPICall(ctx, "offer_search", "search_eligible_offers", true, 1500.0)

    // Record cache operation metrics
    metrics.RecordCacheOperation(ctx, "hit", 5.2)

    // Record offer search metrics
    metrics.RecordOfferSearch(ctx, 2000.0, 15)

    // Record location search metrics
    metrics.RecordLocationSearch(ctx, 1800.0, 3)

    // Record web fetch metrics
    metrics.RecordWebFetch(ctx, 3000.0, 1024)

    // Record web search metrics
    metrics.RecordWebSearch(ctx, 2200.0, 10)

    // Record purchase history metrics
    metrics.RecordPurchaseHistory(ctx, 1200.0, 25)

    // Record product search metrics
    metrics.RecordProductSearch(ctx, 1500.0, 8)

    // Record business metrics
    metrics.RecordBusinessMetrics(ctx, "prod", "user123")

    // Record error metrics
    metrics.RecordError(ctx, "validation")

    // Record LLM feedback
    metrics.RecordLLMFeedback(ctx)

    // Record server lifecycle metrics
    metrics.RecordServerStartup(ctx, startTime)
    metrics.RecordServerShutdown(ctx, shutdownStartTime)

    // Record tool result token usage (automatically called in tool handlers)
    metrics.RecordToolResult(ctx, "search_offers", toolResult)

    // Count tokens in text (uses tiktoken-go for GPT-4o Mini)
    tokenCount := metrics.CountTokens("Sample text to count tokens")

    // Count tokens in structured data
    tokenCount, err := metrics.CountToolResultTokens(toolResult)

Integration Points
1. HTTP Middleware
HTTP request/response metrics are automatically recorded in the `LoggingMiddleware` in `pkg/logger/middleware.go`.
2. MCP Tool Handlers
All MCP tool handlers record metrics for:
- Tool call start/completion
- Success/failure rates
- Duration timing
- Business metrics (user activity, tool usage)
- Token usage (result size in tokens and bytes)
3. API Client Integration
API call metrics are recorded during external service calls to:
- Offer Guardian
- Lidar API
- Offer Search Service
- NELI (eligibility)
- Fido Product Service
- Fido Search Service
- Purchase History Service
- Web Fetch Service
- Web Search Service
4. Error Handling
Error metrics are recorded for:
- Validation errors (parameter validation)
- Marshal/unmarshal errors (JSON processing)
- API errors (external service failures)
- Authentication errors
5. Server Lifecycle
Startup and shutdown timing metrics are recorded in `cmd/mcp-server/main.go`.
Development
Adding New Metrics
- Define the metric in the `pkg/metrics/metrics.go` Counter struct
- Initialize the metric in the `Initialize()` function
- Add helper functions for recording the metric
- Update this documentation with the new metric details
Testing Metrics
Metrics are automatically enabled based on the `TELEMETRY_ENABLED` environment variable:

    // Metrics will be recorded if telemetry is enabled
    if metrics.IsEnabled() {
        metrics.RecordMCPToolCall(ctx, toolName, success, duration)
    }

Monitoring Setup
Prometheus Integration
Metrics are exported via OTLP, which is a push protocol. Prometheus can ingest them through its OTLP receiver endpoint, or scrape them from an OpenTelemetry Collector that re-exposes them with a Prometheus exporter.
Grafana Dashboards
Create dashboards using the metric names and attributes defined above. Key dashboards should include:
- MCP tool usage and performance
- API call success rates and latency
- Error rates by type and tool
- Business metrics (user activity, popular tools)
Alerting
Set up alerts based on:
- High error rates (>5% failure rate)
- Slow response times (p99 latency)
- API call failures
- Tool execution failures
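The two numeric conditions in this list reduce to simple arithmetic. The sketch below computes a failure rate and a nearest-rank p99 over raw samples, purely to show what the thresholds mean. `failureRate` and `percentile` are hypothetical helpers; a real alert would normally be expressed as a query over the exported counters and histogram buckets in the monitoring backend, not in application code.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// failureRate returns failures / total, or 0 when total is 0.
func failureRate(failures, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(failures) / float64(total)
}

// percentile returns the nearest-rank percentile p (0 < p <= 100) of samples.
// It returns 0 for an empty input and does not modify the caller's slice.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// 6 failures out of 100 calls is a 6% failure rate, above the 5% threshold.
	fmt.Println(failureRate(6, 100) > 0.05)

	// With five samples, the nearest-rank p99 is the largest value.
	durationsMs := []float64{120, 80, 2500, 95, 110}
	fmt.Println(percentile(durationsMs, 99))
}
```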
Troubleshooting
Metrics Not Appearing
- Check that the `TELEMETRY_ENABLED` environment variable is set to `true`
- Verify that the OTLP endpoint is correctly configured in fetchotel
- Check server logs for OpenTelemetry initialization errors
- Ensure fetchotel is properly initialized
High Cardinality Issues
Be careful with dynamic attribute values that could create high cardinality:
- User IDs are tracked but should be monitored for cardinality
- Tool names are predefined to prevent explosion
- API service names are normalized
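One common way to keep an attribute such as `tool_name` bounded is to check values against an allow-list before attaching them to a metric. The sketch below illustrates that idea with the tool names documented earlier. `normalizeToolName` is a hypothetical helper and the `other` fallback label is an assumption, not something rover-mcp is known to emit.

```go
package main

import "fmt"

// knownTools holds the tool names listed under Tool Attributes.
var knownTools = map[string]bool{
	"search_offers":             true,
	"search_nearby_offers":      true,
	"llm_feedback":              true,
	"fetch_webpage":             true,
	"web_search":                true,
	"get_user_purchase_history": true,
	"search_products":           true,
}

// normalizeToolName keeps documented names and folds everything else into a
// single "other" value, so unexpected input cannot create new time series.
func normalizeToolName(name string) string {
	if knownTools[name] {
		return name
	}
	return "other"
}

func main() {
	fmt.Println(normalizeToolName("web_search"))
	fmt.Println(normalizeToolName("tool-7f3a91c2"))
}
```

The same approach does not work for user IDs, which are unbounded by nature, so that attribute is the one to watch most closely.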
Related Documentation
- OpenTelemetry Go Documentation
- FetchOTel Package
- Rover Agent Metrics - Related metrics from rover-agent
- CLAUDE.md - Development guidelines including telemetry standards