Rover MCP Metrics

This document describes the OpenTelemetry metrics implemented in the rover-mcp service.

Overview

Rover MCP uses OpenTelemetry (OTel) for comprehensive metrics collection and observability. All metrics are exported via OTLP and can be consumed by monitoring systems like Prometheus, Grafana, or AWS CloudWatch.

Configuration

Metrics are controlled via environment variables:

TELEMETRY_ENABLED=true  # Set to "false" to disable telemetry
ENVIRONMENT=dev         # Environment name (local, dev, stage, prod)

OpenTelemetry is initialized automatically on startup using the fetchotel package for consistent configuration across Fetch services.
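As a minimal sketch of how these two variables might be interpreted: the real parsing happens during fetchotel initialization and is not shown in this document, so the type telemetryConfig, the function loadTelemetryConfig, the case-insensitive comparison, and the "local" fallback are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// telemetryConfig is a hypothetical holder for the two documented variables.
type telemetryConfig struct {
	Enabled     bool
	Environment string
}

// loadTelemetryConfig reads the variables through getenv so it can be
// exercised without touching the real process environment.
func loadTelemetryConfig(getenv func(string) string) telemetryConfig {
	cfg := telemetryConfig{Enabled: true, Environment: "local"}
	// Assumption: only an explicit "false" disables telemetry.
	if strings.EqualFold(strings.TrimSpace(getenv("TELEMETRY_ENABLED")), "false") {
		cfg.Enabled = false
	}
	// Assumption: an unset ENVIRONMENT falls back to "local".
	if env := strings.TrimSpace(getenv("ENVIRONMENT")); env != "" {
		cfg.Environment = env
	}
	return cfg
}

func main() {
	cfg := loadTelemetryConfig(os.Getenv)
	fmt.Printf("telemetry enabled=%t environment=%s\n", cfg.Enabled, cfg.Environment)
}
```

Passing the lookup function as a parameter keeps the sketch testable; production code would simply pass os.Getenv.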

Available Metrics

HTTP Request Metrics

Metric Name	Type	Description	Attributes
`http_request_duration`	Histogram	Duration of HTTP requests in milliseconds	`method`, `path`, `status_code`, `service_type`
`http_request_count`	Counter	Total number of HTTP requests	`method`, `path`, `status_code`, `service_type`
`http_response_count`	Counter	Total number of HTTP responses	`method`, `path`, `status_code`, `service_type`

MCP Tool Metrics

Metric Name	Type	Description	Attributes
`mcp_tool_call_count`	Counter	Total number of MCP tool calls	`tool_name`, `service_type`
`mcp_tool_success_count`	Counter	Total number of successful MCP tool calls	`tool_name`, `service_type`
`mcp_tool_failure_count`	Counter	Total number of failed MCP tool calls	`tool_name`, `service_type`
`mcp_tool_duration`	Histogram	Duration of MCP tool calls in milliseconds	`tool_name`, `service_type`

API Client Metrics

Metric Name	Type	Description	Attributes
`api_call_count`	Counter	Total number of API calls to external services	`api_service`, `operation`, `service_type`
`api_call_duration`	Histogram	Duration of API calls in milliseconds	`api_service`, `operation`, `service_type`
`api_call_success_count`	Counter	Total number of successful API calls	`api_service`, `operation`, `service_type`
`api_call_failure_count`	Counter	Total number of failed API calls	`api_service`, `operation`, `service_type`

Cache Metrics

Metric Name	Type	Description	Attributes
`cache_hit_count`	Counter	Total number of cache hits	`operation`, `service_type`
`cache_miss_count`	Counter	Total number of cache misses	`operation`, `service_type`
`cache_eviction_count`	Counter	Total number of cache evictions	`operation`, `service_type`
`cache_load_duration`	Histogram	Duration of cache load operations in milliseconds	`operation`, `service_type`

Business Logic Metrics

Offer Search Metrics

Metric Name	Type	Description	Attributes
`offer_search_count`	Counter	Total number of offer searches	`service_type`
`offer_search_duration`	Histogram	Duration of offer searches in milliseconds	`service_type`
`offers_returned_count`	Counter	Total number of offers returned from searches	`service_type`

Location Search Metrics

Metric Name	Type	Description	Attributes
`location_search_count`	Counter	Total number of location-based searches	`service_type`
`location_search_duration`	Histogram	Duration of location searches in milliseconds	`service_type`
`retailers_found_count`	Counter	Total number of retailers found in location searches	`service_type`

Web Fetch Metrics

Metric Name	Type	Description	Attributes
`web_fetch_count`	Counter	Total number of web fetch operations	`service_type`
`web_fetch_duration`	Histogram	Duration of web fetch operations in milliseconds	`service_type`
`web_fetch_bytes_count`	Counter	Total bytes fetched from web pages	`service_type`

Web Search Metrics

Metric Name	Type	Description	Attributes
`web_search_count`	Counter	Total number of web search operations	`service_type`
`web_search_duration`	Histogram	Duration of web search operations in milliseconds	`service_type`
`web_search_results_count`	Counter	Total number of results returned from web searches	`service_type`

Token Usage Metrics

Metric Name	Type	Description	Attributes
`tool_result_tokens_count`	Counter	Total number of tokens in tool results	`tool_name`, `service_type`
`tool_result_bytes_count`	Counter	Total bytes in tool results	`tool_name`, `service_type`
`token_count_errors_count`	Counter	Total errors during token counting	`tool_name`, `error_type`, `service_type`

These metrics track the token usage of MCP tool results, which drives OpenAI API cost when tool results are sent to LLMs. Token counting uses the tiktoken-go library with the GPT-4o Mini encoding, so the counts are intended to match the token counts OpenAI bills for.

Purchase History Metrics

Metric Name	Type	Description	Attributes
`purchase_history_count`	Counter	Total number of purchase history requests	`service_type`
`purchase_history_duration`	Histogram	Duration of purchase history requests in milliseconds	`service_type`
`purchases_found_count`	Counter	Total number of purchases found in history	`service_type`

Product Search Metrics

Metric Name	Type	Description	Attributes
`product_search_count`	Counter	Total number of product searches	`service_type`
`product_search_duration`	Histogram	Duration of product searches in milliseconds	`service_type`
`products_found_count`	Counter	Total number of products found in searches	`service_type`

Usage Metrics

Metric Name	Type	Description	Attributes
`requests_by_environment`	Counter	Total requests by environment	`environment`, `service_type`
`requests_by_user_id`	Counter	Total requests by user ID	`user_id`, `service_type`
`requests_by_tool`	Counter	Total requests by MCP tool	`tool_name`, `service_type`

Error Metrics

Metric Name	Type	Description	Attributes
`error_count`	Counter	Total number of errors	`error_type`, `service_type`
`validation_error_count`	Counter	Total number of validation errors	`error_type`, `service_type`
`marshal_error_count`	Counter	Total number of marshaling errors	`error_type`, `service_type`
`api_error_count`	Counter	Total number of API errors	`error_type`, `service_type`
`auth_error_count`	Counter	Total number of authentication errors	`error_type`, `service_type`

LLM Feedback Metrics

Metric Name	Type	Description	Attributes
`llm_feedback_count`	Counter	Total number of LLM feedback messages	`service_type`

Server Lifecycle Metrics

Metric Name	Type	Description	Attributes
`server_startup_duration`	Histogram	Duration of server startup in milliseconds	`service_type`
`server_shutdown_duration`	Histogram	Duration of server shutdown in milliseconds	`service_type`

Metric Attributes

Common Attributes

service_type: Always set to "mcp" for rover-mcp
error_type: Categories include validation, marshal, api, auth
environment: Deployment environment (local, dev, stage, prod)

Tool Attributes

tool_name: MCP tool names (search_offers, search_nearby_offers, llm_feedback, fetch_webpage, web_search, get_user_purchase_history, search_products)

API Service Attributes

api_service: External service names (offer_guardian, lidar, offer_search, neli, fido_product, fido_search, purchase_history, web_fetch, web_search)

Business Logic Attributes

user_id: User identifier (when available)
operation: Specific operation within a service

Helper Functions

The metrics package provides convenient helper functions:

// Record HTTP request metrics
metrics.RecordHTTPRequest(ctx, "POST", "/", 200, 150.5)

// Record MCP tool call metrics
metrics.RecordMCPToolCall(ctx, "search_offers", true, 2500.0)

// Record API call metrics
metrics.RecordAPICall(ctx, "offer_search", "search_eligible_offers", true, 1500.0)

// Record cache operation metrics
metrics.RecordCacheOperation(ctx, "hit", 5.2)

// Record offer search metrics
metrics.RecordOfferSearch(ctx, 2000.0, 15)

// Record location search metrics
metrics.RecordLocationSearch(ctx, 1800.0, 3)

// Record web fetch metrics
metrics.RecordWebFetch(ctx, 3000.0, 1024)

// Record web search metrics
metrics.RecordWebSearch(ctx, 2200.0, 10)

// Record purchase history metrics
metrics.RecordPurchaseHistory(ctx, 1200.0, 25)

// Record product search metrics
metrics.RecordProductSearch(ctx, 1500.0, 8)

// Record business metrics
metrics.RecordBusinessMetrics(ctx, "prod", "user123")

// Record error metrics
metrics.RecordError(ctx, "validation")

// Record LLM feedback
metrics.RecordLLMFeedback(ctx)

// Record server lifecycle metrics
metrics.RecordServerStartup(ctx, startTime)
metrics.RecordServerShutdown(ctx, shutdownStartTime)

// Record tool result token usage (automatically called in tool handlers)
metrics.RecordToolResult(ctx, "search_offers", toolResult)

// Count tokens in text (uses tiktoken-go for GPT-4o Mini)
tokenCount := metrics.CountTokens("Sample text to count tokens")

// Count tokens in structured data
tokenCount, err := metrics.CountToolResultTokens(toolResult)

Integration Points

1. HTTP Middleware

HTTP request/response metrics are automatically recorded in the LoggingMiddleware in pkg/logger/middleware.go.

2. MCP Tool Handlers

All MCP tool handlers record metrics for:

Tool call start/completion
Success/failure rates
Duration timing
Business metrics (user activity, tool usage)
Token usage (result size in tokens and bytes)

3. API Client Integration

API call metrics are recorded during external service calls to:

Offer Guardian
Lidar API
Offer Search Service
NELI (eligibility)
Fido Product Service
Fido Search Service
Purchase History Service
Web Fetch Service
Web Search Service

4. Error Handling

Error metrics are recorded for:

Validation errors (parameter validation)
Marshal/unmarshal errors (JSON processing)
API errors (external service failures)
Authentication errors

5. Server Lifecycle

Startup and shutdown timing metrics are recorded in cmd/mcp-server/main.go.

Development

Adding New Metrics

1. Define the metric in the Counter struct in pkg/metrics/metrics.go
2. Initialize the metric in the Initialize() function
3. Add a helper function for recording the metric
4. Update this documentation with the new metric's name, type, description, and attributes

Testing Metrics

Metrics are automatically enabled based on the TELEMETRY_ENABLED environment variable:

// Metrics will be recorded if telemetry is enabled
if metrics.IsEnabled() {
    metrics.RecordMCPToolCall(ctx, toolName, success, duration)
}

Monitoring Setup

Prometheus Integration

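Metrics are exported via OTLP, which is a push protocol, so Prometheus does not scrape rover-mcp directly. Prometheus can either ingest the metrics through its own OTLP receiver endpoint or scrape an OpenTelemetry Collector that receives OTLP and re-exposes the metrics through a Prometheus exporter.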

Grafana Dashboards

Create dashboards using the metric names and attributes defined above. Key dashboards should include:

MCP tool usage and performance
API call success rates and latency
Error rates by type and tool
Business metrics (user activity, popular tools)

Alerting

Set up alerts based on:

High error rates (>5% failure rate)
Slow response times (p99 latency)
API call failures
Tool execution failures
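The 5% failure-rate threshold can be made concrete with a short worked example. Because the success and failure metrics are cumulative counters, the inputs below are assumed to be increases over one alerting window (for instance, deltas of mcp_tool_success_count and mcp_tool_failure_count); the function names are hypothetical.

```go
package main

import "fmt"

// failureRate returns failures / (successes + failures), or 0 when no calls
// were observed in the window.
func failureRate(successes, failures int64) float64 {
	total := successes + failures
	if total == 0 {
		return 0
	}
	return float64(failures) / float64(total)
}

// shouldAlert applies the >5% failure-rate threshold.
func shouldAlert(successes, failures int64) bool {
	return failureRate(successes, failures) > 0.05
}

func main() {
	fmt.Println(shouldAlert(970, 30))  // 3% failure rate: false
	fmt.Println(shouldAlert(900, 100)) // 10% failure rate: true
}
```

In a real deployment the same ratio would normally be expressed in the monitoring system's query language rather than in application code.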

Troubleshooting

Metrics Not Appearing

Check that the TELEMETRY_ENABLED environment variable is set to true
Verify that the OTLP endpoint is correctly configured in fetchotel
Check the server logs for OpenTelemetry initialization errors
Ensure that fetchotel initialization completes on startup

High Cardinality Issues

Be careful with dynamic attribute values that could create high cardinality:

User IDs are recorded as the user_id attribute on requests_by_user_id; each distinct ID creates a new time series, so monitor its cardinality
Tool names are predefined to prevent explosion
API service names are normalized
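A common way to keep an attribute such as tool_name bounded is to check it against an allow-list before recording. The sketch below uses the tool names listed under Tool Attributes; the function normalizeToolName and the "unknown" placeholder are assumptions, not the service's actual code.

```go
package main

import "fmt"

// knownTools lists the tool_name values documented for rover-mcp.
var knownTools = map[string]struct{}{
	"search_offers":             {},
	"search_nearby_offers":      {},
	"llm_feedback":              {},
	"fetch_webpage":             {},
	"web_search":                {},
	"get_user_purchase_history": {},
	"search_products":           {},
}

// normalizeToolName returns name unchanged when it is a documented tool and
// the placeholder "unknown" otherwise (the placeholder is an assumption).
func normalizeToolName(name string) string {
	if _, ok := knownTools[name]; ok {
		return name
	}
	return "unknown"
}

func main() {
	fmt.Println(normalizeToolName("web_search"))      // web_search
	fmt.Println(normalizeToolName("made_up_tool_42")) // unknown
}
```

With this guard the number of distinct tool_name values can never exceed the size of the allow-list plus one, no matter what callers send.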

Related Documentation

OpenTelemetry Go Documentation
FetchOTel Package
Rover Agent Metrics - Related metrics from rover-agent
CLAUDE.md - Development guidelines including telemetry standards
