JSON Component Fencing Fix: Evaluation Study
Date: 2025-12-08
Status: Complete
Author: Prakash Chaudhary
Jira: PLT-301
Table of Contents
- Problem Statement
- Solution Overview
- Benchmark Results
- Recommendations
- Testing Methodology
- Conclusion
Executive Summary
This study evaluates the effectiveness of prompt engineering changes to fix JSON component fencing compliance in the consumer-agent. The mobile app requires JSON components (like prompt-suggestion) to be wrapped in markdown code fences for correct parsing.
Key Findings
| Metric | Original Prompt | Fixed Prompt | Improvement |
|---|---|---|---|
| Success Rate | 85.94% | 100% | +14.06pp |
| Failures | 9/64 | 0/64 | -9 failures |
| Failure Rate | 14.06% | 0% | Eliminated |
Recommendation
Deploy the improved fencing logic prompt to Opik for stage/production environments. In the benchmark, the fix achieved 100% compliance, with no failures observed across 64 test requests spanning 8 locations.
Problem Statement
The mobile app displays malformed content when the LLM outputs JSON components without proper markdown code fences. This manifests as raw JSON appearing in the chat interface instead of rendered UI components.
Symptoms
- Raw JSON like `{"component":"prompt-suggestion","props":{...}}` appearing in responses
- Broken UI components on mobile
- Inconsistent behavior across responses (non-deterministic failures)
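To illustrate why a missing fence surfaces as raw JSON, here is a minimal sketch of a fence-based segment splitter. The mobile app's actual parser is not shown in this study, so `split_segments`, `FENCED_JSON`, and the segment shape are assumptions made purely for illustration.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this example's own fence stays intact

# Assumed fence shape: opening fence + "json", a newline, the payload, a closing fence.
FENCED_JSON = re.compile(FENCE + r"json\s*\n(.*?)\n?" + FENCE, re.DOTALL)


def split_segments(response: str) -> list[tuple[str, object]]:
    """Split a response into ("text", str) and ("component", dict) segments."""
    segments: list[tuple[str, object]] = []
    cursor = 0
    for match in FENCED_JSON.finditer(response):
        before = response[cursor:match.start()].strip()
        if before:
            segments.append(("text", before))
        try:
            segments.append(("component", json.loads(match.group(1))))
        except json.JSONDecodeError:
            # In this sketch, malformed payloads fall back to plain text.
            segments.append(("text", match.group(0)))
        cursor = match.end()
    tail = response[cursor:].strip()
    if tail:
        segments.append(("text", tail))
    return segments
```

Under these assumptions, an unfenced component never matches the fence pattern, so it stays inside a text segment and would be displayed verbatim.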
Root Cause Analysis
The original Opik prompt had a single, weak instruction for JSON fencing:

* User-visible text must be **GitHub-Flavored Markdown**. No HTML. No code fences around prose. JSON components go in `json` blocks.

This instruction was:
- Too subtle (buried in a list of other rules)
- Not reinforced elsewhere in the prompt
- Missing explicit examples of correct vs incorrect format
Solution Overview
The fix applies the “primacy-recency” principle from cognitive psychology: information at the beginning and end of a sequence is better remembered than information in the middle. The improved fencing logic reinforces JSON formatting requirements in three strategic positions.
Three Strategic Insertions
1. Primacy Position (Lines 13-14 - Quick Rules)

* User-visible text must be **GitHub-Flavored Markdown**. No HTML. No code fences around prose.
* **JSON components MUST be wrapped in ```json fences** - raw JSON breaks the mobile app.

2. Middle Position (Line 238 - Follow-Ups section)
**CRITICAL - JSON Component Format (non-negotiable):**
* **ALWAYS** wrap each JSON component in triple-backtick markdown code fences with the `json` language identifier.
* **NEVER** output raw JSON like `{"component":"..."}` without the triple-backtick markdown fences.
* The mobile app parser REQUIRES markdown fences to correctly identify and render components.
* Each component must be in its **own** fenced block, not combined.
* Outputting unfenced JSON will cause the mobile app to display broken/malformed content to users.
**Correct format (REQUIRED):**

```json
{"component":"prompt-suggestion","props":{"text":"Check nearby stores","type":"recommended"}}
```

Wrong format (breaks mobile app - NEVER do this):

`{"component":"prompt-suggestion","props":{"text":"Check nearby stores","type":"recommended"}}` (emitted as raw text, with no surrounding json fence)
3. Recency Position (End of prompt)
## FINAL REMINDER - JSON Formatting
Before completing any response that includes prompt-suggestion or other JSON components:
1. Verify each JSON component is wrapped in ```json code fences
2. Never output bare JSON objects in your response
3. The mobile app will malfunction if JSON is not properly fenced

Benchmark Results
Test Configuration
- Model: gpt-5-mini with low reasoning effort (conversational agent configuration)
- Tools: MCP tools enabled (rover_mcp, web_search)
- Components: All enabled (general-instructions, offer-list, prompt-suggestion, offer-shelf)
- Test queries: 8 shopping-related queries designed to trigger prompt-suggestion components
- Locations: 8 US cities (6 English, 2 Spanish)
- Total requests per test: 64 (8 iterations x 8 locations)
- Concurrency: 8 parallel requests
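The configuration above can be sketched as a bounded-concurrency test matrix. This is a minimal illustration, not the study's runner: it assumes one request per query-location pair (which matches the stated total of 64), and `QUERIES`, `LOCATIONS`, `send_request`, and `run_matrix` are hypothetical placeholders, since the actual queries, cities, and API client are not listed in this document.

```python
import asyncio
import itertools

# Placeholder inputs: the study's real queries and cities are not listed here.
QUERIES = [f"query-{i}" for i in range(8)]
LOCATIONS = [f"city-{i}" for i in range(8)]
CONCURRENCY = 8


async def send_request(query: str, location: str) -> str:
    """Hypothetical stand-in for one streaming API call; returns response text."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"response for {query} in {location}"


async def run_matrix() -> list[tuple[str, str, str]]:
    """Run every (query, location) pair with at most CONCURRENCY in flight."""
    semaphore = asyncio.Semaphore(CONCURRENCY)

    async def bounded(query: str, location: str) -> tuple[str, str, str]:
        async with semaphore:
            return query, location, await send_request(query, location)

    pairs = itertools.product(QUERIES, LOCATIONS)
    return list(await asyncio.gather(*(bounded(q, loc) for q, loc in pairs)))


if __name__ == "__main__":
    print(len(asyncio.run(run_matrix())))
```

A semaphore is used here because it caps in-flight requests without changing the order of the gathered results.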
Results Summary
| Prompt Version | Total Tests | Passed | Failed | Success Rate |
|---|---|---|---|---|
| Original (weak fencing) | 64 | 55 | 9 | 85.94% |
| Fixed (improved fencing logic) | 64 | 64 | 0 | 100% |
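The percentages in the table follow directly from the pass counts. The sketch below reproduces them and, as an additional sanity check that is not part of the original study, computes an exact one-sided upper bound on the true failure rate when zero failures are observed. The bound assumes the 64 runs are independent, which may not hold exactly; the function names are illustrative.

```python
def success_rate(passed: int, total: int) -> float:
    """Success rate as a percentage."""
    return 100.0 * passed / total


def zero_failure_upper_bound(total: int, confidence: float = 0.95) -> float:
    """Upper bound (in %) on the failure rate when 0 of `total` runs failed.

    Solves (1 - p) ** total = 1 - confidence for p, assuming independent runs.
    """
    return 100.0 * (1.0 - (1.0 - confidence) ** (1.0 / total))


original = success_rate(55, 64)      # 85.9375, reported as 85.94%
fixed = success_rate(64, 64)         # 100.0
improvement_pp = fixed - original    # 14.0625 percentage points
bound = zero_failure_upper_bound(64) # between 4.5 and 4.6
```

Under that independence assumption, a clean run of 64 is still consistent with a true failure rate of a few percent, which is one reason to keep monitoring after deployment.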
Recommendations
- Deploy the improved fencing prompt to stage and production via Opik.
- Monitor JSON component rendering in mobile for at least one release cycle to confirm sustained 100% fencing compliance.
- Keep the primacy/middle/recency reminders in future prompt revisions to prevent regression.
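One way to guard against the regression mentioned above is a simple check that a prompt revision still mentions the fence marker near the start, somewhere in the body, and near the end. This is a hypothetical sketch: `FENCE_MARKER`, the `edge` window of 20 lines, and both function names are assumptions, not part of the existing tooling.

```python
FENCE_MARKER = "`" * 3 + "json"  # assumed marker that every fencing reminder mentions


def reminder_lines(prompt: str, marker: str = FENCE_MARKER) -> list[int]:
    """Return 1-based line numbers of prompt lines that mention the marker."""
    return [n for n, line in enumerate(prompt.splitlines(), start=1) if marker in line]


def has_three_positions(prompt: str, edge: int = 20) -> bool:
    """True if reminders appear near the start, in the body, and near the end.

    `edge` is a hypothetical window size (in lines) defining "near".
    """
    total = len(prompt.splitlines())
    hits = reminder_lines(prompt)
    early = any(n <= edge for n in hits)
    late = any(n > total - edge for n in hits)
    middle = any(edge < n <= total - edge for n in hits)
    return early and middle and late
```

A check like this could run in CI against the prompt text before it is published, though the window size would need tuning to the real prompt length.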
Testing Methodology
Test Approach
Testing was performed using a parallel async test runner that:
- Sends streaming requests to the API server
- Collects full response text from `text` events
- Uses regex to detect:
  - Properly fenced JSON: `` ```json\n{...}\n``` ``
  - Unfenced JSON components: `{"component":"...","props":{...}}`
- Reports success/failure rates by location, locale, and query type
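The per-dimension reporting in the last step could look like the following sketch. The `Outcome` record and `failure_rates` helper are hypothetical; the study does not show its actual result schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One test result; field names are assumed for illustration."""
    location: str
    locale: str
    query_type: str
    passed: bool


def failure_rates(outcomes: list[Outcome], dimension: str) -> dict[str, float]:
    """Failure rate (0.0 to 1.0) per distinct value of `dimension`."""
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for outcome in outcomes:
        key = getattr(outcome, dimension)
        totals[key] += 1
        if not outcome.passed:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}
```

Calling `failure_rates(outcomes, "locale")` or `failure_rates(outcomes, "query_type")` reuses the same grouping logic for each reporting dimension.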
Detection Logic
The test runner identifies properly fenced JSON components by looking for markdown code fences with the json language identifier, and detects unfenced components by finding raw JSON component patterns in the response text after removing fenced blocks.
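The detection logic described above can be sketched as follows. The regular expressions here are assumptions that approximate the described behavior; the test runner's actual patterns are not reproduced in this document, and `classify` and `passes` are illustrative names.

```python
import re

FENCE = "`" * 3  # built programmatically so this example's own fence stays intact

# Assumed patterns; the study's actual regexes are not shown.
FENCED_JSON = re.compile(FENCE + r"json\s*\n.*?" + FENCE, re.DOTALL)
RAW_COMPONENT = re.compile(r'\{\s*"component"\s*:')


def classify(response: str) -> dict[str, int]:
    """Count fenced JSON blocks and unfenced component objects in a response."""
    fenced = FENCED_JSON.findall(response)
    remainder = FENCED_JSON.sub("", response)  # remove fenced blocks first
    unfenced = RAW_COMPONENT.findall(remainder)
    return {"fenced": len(fenced), "unfenced": len(unfenced)}


def passes(response: str) -> bool:
    """A response passes when no component appears outside a json fence."""
    return classify(response)["unfenced"] == 0
```

Removing fenced blocks before searching for raw components matters: otherwise every correctly fenced component would also match the raw pattern and be counted as a failure.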
Conclusion
The improved fencing logic eliminated JSON fencing failures in this benchmark:
| Metric | Before | After | Change |
|---|---|---|---|
| Success Rate | 85.94% | 100% | +14.06pp |
| Failure Rate | 14.06% | 0% | Eliminated |
| Failures | 9 | 0 | -9 |
The improved fencing logic applies the primacy-recency principle from cognitive psychology: information at the beginning and end of a sequence is better remembered than information in the middle. The fix reinforces JSON formatting requirements in three strategic positions: (1) Primacy effect: the first instruction establishes the rule early in the prompt; (2) Detailed examples: the middle section provides concrete correct and incorrect examples; (3) Recency effect: the final reminder keeps the rule top-of-mind when generating output. In addition, each insertion states the consequence explicitly (unfenced JSON “breaks the mobile app”), which adds urgency.