
Rover MCP Metrics

This document describes the OpenTelemetry metrics implemented in the rover-mcp service.

Rover MCP uses OpenTelemetry (OTel) to collect metrics for observability. All metrics are exported via OTLP and can be consumed by monitoring systems such as Prometheus, Grafana, or AWS CloudWatch.

Metrics are controlled via environment variables:

TELEMETRY_ENABLED=true # Set to "false" to disable telemetry
ENVIRONMENT=dev # Environment name (local, dev, stage, prod)
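As a rough illustration of how these two variables might be interpreted, the sketch below parses them into a small struct. The `Config` type, the `loadConfig` helper, and the `local` default are assumptions made for this example; the actual handling lives in rover-mcp and fetchotel and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Config holds the two telemetry-related settings described above.
// (Hypothetical type, used only in this sketch.)
type Config struct {
	Enabled     bool
	Environment string
}

// knownEnvironments lists the environment names mentioned in this document.
var knownEnvironments = map[string]bool{"local": true, "dev": true, "stage": true, "prod": true}

// loadConfig reads settings through getenv so it can be exercised without
// touching the real process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		// Assumption: telemetry stays on unless the value is "false"
		// (case-insensitive, surrounding whitespace ignored).
		Enabled:     !strings.EqualFold(strings.TrimSpace(getenv("TELEMETRY_ENABLED")), "false"),
		Environment: strings.TrimSpace(getenv("ENVIRONMENT")),
	}
	if cfg.Environment == "" {
		cfg.Environment = "local" // assumed default, not confirmed by the source
	}
	if !knownEnvironments[cfg.Environment] {
		return cfg, fmt.Errorf("unknown ENVIRONMENT %q", cfg.Environment)
	}
	return cfg, nil
}

func main() {
	env := map[string]string{"TELEMETRY_ENABLED": "false", "ENVIRONMENT": "dev"}
	cfg, err := loadConfig(func(k string) string { return env[k] })
	fmt.Println(cfg.Enabled, cfg.Environment, err) // prints: false dev <nil>
}
```

In a real process the lookup function would simply be `os.Getenv`; passing it in as a parameter keeps the parsing logic easy to test.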

OpenTelemetry is initialized automatically on startup using the fetchotel package, which keeps configuration consistent across Fetch services.

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| http_request_duration | Histogram | Duration of HTTP requests in milliseconds | method, path, status_code, service_type |
| http_request_count | Counter | Total number of HTTP requests | method, path, status_code, service_type |
| http_response_count | Counter | Total number of HTTP responses | method, path, status_code, service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| mcp_tool_call_count | Counter | Total number of MCP tool calls | tool_name, service_type |
| mcp_tool_success_count | Counter | Total number of successful MCP tool calls | tool_name, service_type |
| mcp_tool_failure_count | Counter | Total number of failed MCP tool calls | tool_name, service_type |
| mcp_tool_duration | Histogram | Duration of MCP tool calls in milliseconds | tool_name, service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| api_call_count | Counter | Total number of API calls to external services | api_service, operation, service_type |
| api_call_duration | Histogram | Duration of API calls in milliseconds | api_service, operation, service_type |
| api_call_success_count | Counter | Total number of successful API calls | api_service, operation, service_type |
| api_call_failure_count | Counter | Total number of failed API calls | api_service, operation, service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| cache_hit_count | Counter | Total number of cache hits | operation, service_type |
| cache_miss_count | Counter | Total number of cache misses | operation, service_type |
| cache_eviction_count | Counter | Total number of cache evictions | operation, service_type |
| cache_load_duration | Histogram | Duration of cache load operations in milliseconds | operation, service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| offer_search_count | Counter | Total number of offer searches | service_type |
| offer_search_duration | Histogram | Duration of offer searches in milliseconds | service_type |
| offers_returned_count | Counter | Total number of offers returned from searches | service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| location_search_count | Counter | Total number of location-based searches | service_type |
| location_search_duration | Histogram | Duration of location searches in milliseconds | service_type |
| retailers_found_count | Counter | Total number of retailers found in location searches | service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| web_fetch_count | Counter | Total number of web fetch operations | service_type |
| web_fetch_duration | Histogram | Duration of web fetch operations in milliseconds | service_type |
| web_fetch_bytes_count | Counter | Total bytes fetched from web pages | service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| web_search_count | Counter | Total number of web search operations | service_type |
| web_search_duration | Histogram | Duration of web search operations in milliseconds | service_type |
| web_search_results_count | Counter | Total number of results returned from web searches | service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| tool_result_tokens_count | Counter | Total number of tokens in tool results | tool_name, service_type |
| tool_result_bytes_count | Counter | Total bytes in tool results | tool_name, service_type |
| token_count_errors_count | Counter | Total errors during token counting | tool_name, error_type, service_type |

These metrics track the token usage of MCP tool results, which drives OpenAI API costs when tool results are sent to LLMs. Token counting uses the tiktoken-go library with the GPT-4o Mini encoding, so the counts should closely track the tokens OpenAI bills for.
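To make the relationship between the byte and token counters concrete, here is a minimal sketch of measuring a tool result. The `Tokenizer` interface, `ResultSize` type, and `measureToolResult` function are hypothetical names introduced for this example. The whitespace tokenizer is only a dependency-free stand-in and is not how tiktoken-go counts tokens.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Tokenizer is a hypothetical interface. In rover-mcp the real counting is
// done with tiktoken-go, whose API is not shown in this document.
type Tokenizer interface {
	Count(text string) int
}

// whitespaceTokenizer is a stand-in that counts whitespace-separated words.
// It is NOT a GPT tokenizer; it only keeps the example self-contained.
type whitespaceTokenizer struct{}

func (whitespaceTokenizer) Count(text string) int { return len(strings.Fields(text)) }

// ResultSize carries the values that would feed tool_result_bytes_count and
// tool_result_tokens_count.
type ResultSize struct {
	Bytes  int
	Tokens int
}

// measureToolResult serializes a tool result to JSON and measures it.
// A marshal failure is the kind of event token_count_errors_count could track.
func measureToolResult(tok Tokenizer, result any) (ResultSize, error) {
	payload, err := json.Marshal(result)
	if err != nil {
		return ResultSize{}, fmt.Errorf("marshal tool result: %w", err)
	}
	return ResultSize{Bytes: len(payload), Tokens: tok.Count(string(payload))}, nil
}

func main() {
	size, err := measureToolResult(whitespaceTokenizer{}, map[string]string{"title": "two for one"})
	fmt.Println(size.Bytes, size.Tokens, err)
}
```

Keeping the tokenizer behind an interface is one way to test the measurement path without loading a real encoding.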

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| purchase_history_count | Counter | Total number of purchase history requests | service_type |
| purchase_history_duration | Histogram | Duration of purchase history requests in milliseconds | service_type |
| purchases_found_count | Counter | Total number of purchases found in history | service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| product_search_count | Counter | Total number of product searches | service_type |
| product_search_duration | Histogram | Duration of product searches in milliseconds | service_type |
| products_found_count | Counter | Total number of products found in searches | service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| requests_by_environment | Counter | Total requests by environment | environment, service_type |
| requests_by_user_id | Counter | Total requests by user ID | user_id, service_type |
| requests_by_tool | Counter | Total requests by MCP tool | tool_name, service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| error_count | Counter | Total number of errors | error_type, service_type |
| validation_error_count | Counter | Total number of validation errors | error_type, service_type |
| marshal_error_count | Counter | Total number of marshaling errors | error_type, service_type |
| api_error_count | Counter | Total number of API errors | error_type, service_type |
| auth_error_count | Counter | Total number of authentication errors | error_type, service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| llm_feedback_count | Counter | Total number of LLM feedback messages | service_type |

| Metric Name | Type | Description | Attributes |
| --- | --- | --- | --- |
| server_startup_duration | Histogram | Duration of server startup in milliseconds | service_type |
| server_shutdown_duration | Histogram | Duration of server shutdown in milliseconds | service_type |

  • service_type: Always set to "mcp" for rover-mcp
  • error_type: Categories include validation, marshal, api, auth
  • environment: Deployment environment (local, dev, stage, prod)
  • tool_name: MCP tool names (search_offers, search_nearby_offers, llm_feedback, fetch_webpage, web_search, get_user_purchase_history, search_products)
  • api_service: External service names (offer_guardian, lidar, offer_search, neli, fido_product, fido_search, purchase_history, web_fetch, web_search)
  • user_id: User identifier (when available)
  • operation: Specific operation within a service

The metrics package provides convenient helper functions:

// Record HTTP request metrics
metrics.RecordHTTPRequest(ctx, "POST", "/", 200, 150.5)
// Record MCP tool call metrics
metrics.RecordMCPToolCall(ctx, "search_offers", true, 2500.0)
// Record API call metrics
metrics.RecordAPICall(ctx, "offer_search", "search_eligible_offers", true, 1500.0)
// Record cache operation metrics
metrics.RecordCacheOperation(ctx, "hit", 5.2)
// Record offer search metrics
metrics.RecordOfferSearch(ctx, 2000.0, 15)
// Record location search metrics
metrics.RecordLocationSearch(ctx, 1800.0, 3)
// Record web fetch metrics
metrics.RecordWebFetch(ctx, 3000.0, 1024)
// Record web search metrics
metrics.RecordWebSearch(ctx, 2200.0, 10)
// Record purchase history metrics
metrics.RecordPurchaseHistory(ctx, 1200.0, 25)
// Record product search metrics
metrics.RecordProductSearch(ctx, 1500.0, 8)
// Record business metrics
metrics.RecordBusinessMetrics(ctx, "prod", "user123")
// Record error metrics
metrics.RecordError(ctx, "validation")
// Record LLM feedback
metrics.RecordLLMFeedback(ctx)
// Record server lifecycle metrics
metrics.RecordServerStartup(ctx, startTime)
metrics.RecordServerShutdown(ctx, shutdownStartTime)
// Record tool result token usage (automatically called in tool handlers)
metrics.RecordToolResult(ctx, "search_offers", toolResult)
// Count tokens in text (uses tiktoken-go for GPT-4o Mini)
tokenCount := metrics.CountTokens("Sample text to count tokens")
// Count tokens in structured data
tokenCount, err := metrics.CountToolResultTokens(toolResult)

HTTP request/response metrics are automatically recorded in the LoggingMiddleware in pkg/logger/middleware.go.

All MCP tool handlers record metrics for:

  • Tool call start/completion
  • Success/failure rates
  • Duration timing
  • Business metrics (user activity, tool usage)
  • Token usage (result size in tokens and bytes)

API call metrics are recorded during external service calls to:

  • Offer Guardian
  • Lidar API
  • Offer Search Service
  • NELI (eligibility)
  • Fido Product Service
  • Fido Search Service
  • Purchase History Service
  • Web Fetch Service
  • Web Search Service

Error metrics are recorded for:

  • Validation errors (parameter validation)
  • Marshal/unmarshal errors (JSON processing)
  • API errors (external service failures)
  • Authentication errors

Startup and shutdown timing metrics are recorded in cmd/mcp-server/main.go.

To add a new metric:

  1. Define the metric in the Counter struct in pkg/metrics/metrics.go
  2. Initialize the metric in the Initialize() function
  3. Add helper functions for recording the metric
  4. Update this documentation with the new metric details

Metrics are automatically enabled based on the TELEMETRY_ENABLED environment variable:

// Metrics will be recorded if telemetry is enabled
if metrics.IsEnabled() {
	metrics.RecordMCPToolCall(ctx, toolName, success, duration)
}
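A possible variation on the guard above is to perform the check inside each helper, so call sites cannot forget it. The sketch below is an assumption about how such a switch could be structured, not a description of the metrics package. The `enabled` flag, the `recorded` counter, and `recordToolCall` are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// enabled is a hypothetical package-level switch, set once at startup.
var enabled atomic.Bool

// IsEnabled reports whether metrics should be recorded.
func IsEnabled() bool { return enabled.Load() }

// recorded counts calls that passed the gate. A real helper would update
// an OpenTelemetry instrument here instead.
var recorded atomic.Int64

// recordToolCall checks the switch itself, so callers need no explicit guard.
func recordToolCall(toolName string) {
	if !IsEnabled() {
		return
	}
	recorded.Add(1)
}

func main() {
	recordToolCall("search_offers") // dropped: the switch starts out off in this sketch
	enabled.Store(true)
	recordToolCall("search_offers") // counted
	fmt.Println(recorded.Load())    // prints: 1
}
```

An explicit `IsEnabled()` check at the call site is still useful when computing the arguments is itself expensive, for example when counting tokens.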

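Metrics are exported via OTLP, which is push-based. Prometheus can ingest them through its OTLP receiver endpoint, or scrape them from an OpenTelemetry Collector that receives OTLP and re-exposes the metrics through a Prometheus exporter.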

Create dashboards using the metric names and attributes defined above. Key dashboards should include:

  • MCP tool usage and performance
  • API call success rates and latency
  • Error rates by type and tool
  • Business metrics (user activity, popular tools)
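Success-rate and latency panels are typically derived from the paired counters and the duration histograms. The sketch below shows the underlying arithmetic on plain numbers; `failureRate` and `percentile` are hypothetical helpers, and a real backend would estimate percentiles from histogram buckets rather than raw samples.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// failureRate derives a rate from paired success/failure counters
// (for example api_call_success_count and api_call_failure_count).
func failureRate(successes, failures uint64) float64 {
	total := successes + failures
	if total == 0 {
		return 0 // no traffic: report 0 rather than dividing by zero
	}
	return float64(failures) / float64(total)
}

// percentile returns the nearest-rank percentile of raw duration samples.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	fmt.Printf("failure rate: %.1f%%\n", 100*failureRate(940, 60))
	fmt.Println("p99 ms:", percentile([]float64{120, 80, 2500, 95, 110}, 99))
}
```

With only five samples the 99th percentile is simply the largest value, which is why tail latency needs a reasonable sample volume before it is meaningful.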

Set up alerts based on:

  • High error rates (>5% failure rate)
  • Slow response times (p99 latency)
  • API call failures
  • Tool execution failures

If metrics are not appearing:

  1. Check that the TELEMETRY_ENABLED environment variable is set to true
  2. Verify that the OTLP endpoint is correctly configured in fetchotel
  3. Check server logs for OpenTelemetry initialization errors
  4. Ensure fetchotel is properly initialized

Be careful with dynamic attribute values that could create high cardinality:

  • User IDs are tracked (requests_by_user_id), so each distinct user adds a time series; monitor this metric's cardinality
  • Tool names are predefined to prevent explosion
  • API service names are normalized
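The points above can be sketched as two small guards: an allow-list that collapses unexpected tool names into one bucket, and a tracker that flags when the number of distinct values (such as user IDs) passes a limit. The `other` label, the `distinctTracker` type, and the limit are assumptions made for illustration. The tracker is not safe for concurrent use as written.

```go
package main

import "fmt"

// knownTools is copied from the tool_name values listed earlier in this document.
var knownTools = map[string]struct{}{
	"search_offers": {}, "search_nearby_offers": {}, "llm_feedback": {},
	"fetch_webpage": {}, "web_search": {}, "get_user_purchase_history": {},
	"search_products": {},
}

// normalizeToolName collapses anything unexpected into a single "other"
// bucket so a bad caller cannot create unbounded attribute values.
func normalizeToolName(name string) string {
	if _, ok := knownTools[name]; ok {
		return name
	}
	return "other"
}

// distinctTracker is a hypothetical helper that counts distinct values
// and reports when a limit is exceeded.
type distinctTracker struct {
	limit int
	seen  map[string]struct{}
}

func newDistinctTracker(limit int) *distinctTracker {
	return &distinctTracker{limit: limit, seen: make(map[string]struct{})}
}

// Observe records value and returns false once more than limit distinct
// values have been seen.
func (t *distinctTracker) Observe(value string) bool {
	t.seen[value] = struct{}{}
	return len(t.seen) <= t.limit
}

func main() {
	fmt.Println(normalizeToolName("search_offers"), normalizeToolName("made_up_tool"))
	users := newDistinctTracker(2)
	for _, id := range []string{"user1", "user2", "user1", "user3"} {
		fmt.Println(id, users.Observe(id))
	}
}
```

In this run the first three observations stay within the limit and the fourth, which introduces a third distinct user, is flagged.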