Autonomous Adaptive Analytics: Safe Agents that Analyze, Execute, and Explain
TL;DR: Build an autonomous AI agent that detects issues, writes code, self-heals from errors, and delivers production-ready analysis, all inside secure, isolated sandboxes.
Section I: Why AI Analytics Still Falls Short
The Problem: It's 3 AM. Your ML pipeline fails on row 4,732 (NULL value). A data scientist wakes up, debugs for 90 minutes, adds .dropna(), and restarts. The analysis completes, but required manual intervention.
Current AI systems can't adapt autonomously. When analyses hit unexpected issues (distribution drift, missing data, failed assumptions), they stop and wait for humans. Static pipelines lack decision-making capabilities.
Our Solution: A closed-loop AI agent running in secure, isolated sandboxes that:
- Detects issues: Missing data, errors, invalid assumptions
- Decides & adapts: Retry with fixes, pivot methods, adjust parameters
- Executes autonomously: Writes code, self-heals, continues to completion
This is adaptive intelligence, not just automation.
How This Compares to Traditional Approaches
| | Manual Analysis | Static ML Pipeline | Autonomous AI Analysis |
|---|---|---|---|
| Adaptability | High but slow | Low, brittle to change | High and fast |
| When Errors Occur | Manual debugging | Pipeline stops, needs code fix | Self-heals, continues |
| Time to Results | Hours to days | Minutes (best case), hours (with issues) | 10-15 minutes regardless of issues |
| Best For | Novel research | Known data patterns | Exploratory analysis, iteration |
Demonstration: We'll walk through one complete analysis that uses Naive Bayes to classify 100 newsgroup posts into 20 topic categories. The same autonomous system can run Random Forest or other ML methods by changing a single configuration variable, demonstrating true method-agnostic adaptability.
Watch the agent explore data, write code, handle errors, and deliver production-ready results without human intervention.
📓 Try It Yourself
We've published a Jupyter notebook that demonstrates this autonomous analysis system with configurable ML methods.
🔧 Choose Your Classification Method:
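The configuration cell looks roughly like this (the CLASSIFICATION_METHOD variable name comes from the notebook; the exact accepted string values shown here are illustrative):

```python
# Single configuration switch for the autonomous agent's classification method.
# Variable name from the notebook; the accepted values shown here are assumptions.
CLASSIFICATION_METHOD = "naive_bayes"  # switch to "random_forest" to change methods
```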
This simple one-line configuration lets you switch between classification approaches without any code changes—the autonomous agent adapts its implementation automatically. We'll focus on Naive Bayes in this walkthrough.
💡 Tip: For the easiest experience, run this notebook in Google Colab:
- Download the notebook from GitHub
- Go to colab.research.google.com
- Click "Upload" and select the downloaded notebook
This gives you a free cloud environment for running the demo.
Section II: Minimum Design Requirements for Our Solution
After experimenting with different approaches, we settled on four principles that make autonomous AI analysis safe, effective, and reliable in production:
1. Security & Tenant Isolation
Per-tenant, short-lived sandboxes with no shared state or persistent credentials. Everything cleans up automatically. This gives customers hard isolation without operational drag.
2. Closed-Loop Decision Engine
The agent owns the how: it chooses data exploration steps, analysis methods, and visualizations rather than following a prewritten script.
3. Autonomous Error Recovery
When runtime issues arise (OOM, package conflicts), the agent adapts and continues from the last good checkpoint, just as a human would.
4. Observability & Governance
Every decision is logged with rationale; runs are bounded by policy (steps/time/cost); outputs are reproducible artifacts (charts, scripts, reports, decision log).
What You Need to Know Before We Begin
| Component | Purpose | Type | Key Features |
|---|---|---|---|
| 🏗️ Daytona | Ephemeral sandboxes | Product (requires registration) | • Per-tenant isolation • Fast spin-up & reproducibility |
| 🤖 Claude Code SDK | Autonomous coding agent | SDK/Library | • Read/Write/Edit/Bash/Glob/Grep tools • Closed-loop pattern (detect, decide, act, evaluate) • Structured logging & runnable code generation |
| 📤 Data Transfer | Secure file handling | Function | • Direct upload via sandbox.fs.upload_file() • No embedded credentials or persistent storage • API keys injected as environment variables |
| 📥 Result Extraction | Output management | Function | • Direct filesystem download • Categorized by type for clean handoff |
| 🔑 API Keys Required | Authentication | Registration required | • Anthropic API key for Claude Code • Daytona API key for sandbox access • Insert both in the Jupyter notebook placeholders |
Section III: Building the Analysis Flow - Step by Step
In this section, we walk through the process from start to finish, from question to agent to answer, using the 20 Newsgroups data with Naive Bayes classification as the anchor example. The same flow applies to any classification method you configure.

Step 1/6: Prepare Dataset Locally
Download the 20 Newsgroups dataset and sample 1,000 posts to balance speed with statistical power. The dataset is exported to CSV format for sandbox compatibility.
Output: 20newsgroups_1000.csv (1,000 rows × 3 columns: text, target, category_name)
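A minimal sketch of this preparation step, assuming scikit-learn's fetch_20newsgroups loader and a balanced 50-posts-per-category sample (the notebook's exact sampling code may differ):

```python
import pandas as pd
from sklearn.datasets import fetch_20newsgroups

# Fetch the 20 Newsgroups training split; strip headers/footers/quotes to reduce leakage
news = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
df = pd.DataFrame({
    "text": news.data,
    "target": news.target,
    "category_name": [news.target_names[t] for t in news.target],
})

# Balanced sample: 50 posts per category x 20 categories = 1,000 posts
sample = df.groupby("category_name").sample(n=50, random_state=42)
sample.to_csv("20newsgroups_1000.csv", index=False)
```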
💡 Key Highlights
✓ Standard multi-class dataset (20 categories)
✓ Balanced sampling (1,000 posts)
✓ CSV format for universal compatibility
Step 2/6: Start Daytona Sandbox
Create an isolated, ephemeral sandbox environment with automatic cleanup.
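A minimal sketch, assuming the Daytona Python SDK (client and method names can differ between SDK versions, so treat this as illustrative):

```python
from daytona_sdk import Daytona, DaytonaConfig

# Create an ephemeral, per-tenant sandbox; nothing survives after cleanup
daytona = Daytona(DaytonaConfig(api_key="YOUR_DAYTONA_API_KEY"))
sandbox = daytona.create()

# ... upload data, run the agent, download results ...

daytona.remove(sandbox)  # explicit teardown when the run completes
```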
💡 Key Highlights
✓ Session-based caching for warm reuse ✓ Complete tenant isolation ✓ Zero persistent state

Step 3/6: Upload Dataset to Sandbox
Transfer the prepared CSV file securely to the isolated sandbox environment.
For multiple files, concurrent uploads improve performance:
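A sketch of the concurrent pattern, assuming upload_file takes the file bytes followed by the destination path (verify the argument order against your SDK version):

```python
from concurrent.futures import ThreadPoolExecutor, wait

def upload(local_path: str, remote_path: str) -> None:
    # Assumed signature: upload_file(file_bytes, remote_path)
    with open(local_path, "rb") as f:
        sandbox.fs.upload_file(f.read(), remote_path)

files = [
    ("20newsgroups_1000.csv", "/workspace/coding_agent/20newsgroups_1000.csv"),
    # add more (local, remote) pairs here
]
with ThreadPoolExecutor() as pool:
    wait([pool.submit(upload, src, dst) for src, dst in files])
```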
💡 Key Highlights
✓ Direct file transfer via sandbox.fs.upload_file() ✓ Concurrent uploads for multiple files ✓ No persistent credentials or storage
Step 4/6: Build Analysis Prompt & Execute Autonomously
This is the core autonomous execution step where we hand off control to the coding agent running inside the Daytona sandbox.
What Happens Here:
A. Create the Generation Script
We build a Python script (generate_analysis.py) that embeds the Claude Code SDK and the complete analysis prompt. Here's the actual script creation from the notebook:
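A condensed, hypothetical sketch of generate_analysis.py, assuming the claude-code-sdk package's query and ClaudeCodeOptions, with an abbreviated stand-in prompt (the real notebook embeds the full prompt from its prompt builders):

```python
import anyio
from claude_code_sdk import query, ClaudeCodeOptions

# Abbreviated stand-in; the real prompt comes from the notebook's prompt builders
ANALYSIS_PROMPT = (
    "Classify 100 posts from 20newsgroups_1000.csv with Naive Bayes, "
    "generate a markdown report, and create visualizations."
)

async def main():
    options = ClaudeCodeOptions(
        max_turns=60,  # the 60-turn limit described below
        allowed_tools=["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
        cwd="/workspace/coding_agent",
    )
    async for message in query(prompt=ANALYSIS_PROMPT, options=options):
        print(type(message).__name__)  # simple progress tracking by message type

anyio.run(main)
```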
This script packages:
- The analysis prompt with your business requirements (classify 100 posts, generate report, create visualizations)
- Claude Code SDK configured with 60-turn limit and tool permissions
- Tool allowlist: Read, Write, Edit, Bash, Glob, Grep for autonomous file and code operations
- Simple progress tracking showing message types as they're processed
🔧 Method-Agnostic Architecture
The script builder shown above works with any classification method. The notebook includes two prompt builders:
- build_naive_bayes_prompt() - Naive Bayes implementation (used in this demo)
- build_random_forest_prompt() - Random Forest implementation

Both share identical analysis/reporting sections via get_shared_analysis_instructions(), ensuring consistent outputs regardless of method. The autonomous agent receives the appropriate prompt based on your CLASSIFICATION_METHOD config, with no manual intervention required.
B. Upload Script to Daytona Sandbox
The generated script is uploaded to /workspace/coding_agent/generate_analysis.py in the Daytona environment:
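A one-call sketch, reusing the same assumed upload_file signature from Step 3:

```python
# Place the generated script at the path the execution step expects
with open("generate_analysis.py", "rb") as f:
    sandbox.fs.upload_file(f.read(), "/workspace/coding_agent/generate_analysis.py")
```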
C. Execute Analysis with Simple Error Handling
We trigger the script remotely inside Daytona and handle results:
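A hypothetical sketch of the trigger, assuming the SDK exposes a process.exec call with cwd, env, and timeout parameters (names may differ by version):

```python
# Run the agent script inside the sandbox; the CSV is already in the working directory
response = sandbox.process.exec(
    "python generate_analysis.py",
    cwd="/workspace/coding_agent",
    env={"ANTHROPIC_API_KEY": anthropic_api_key},  # injected, never baked into the sandbox
    timeout=25 * 60,  # 25-minute ceiling for up to 60 tool-using turns
)
if response.exit_code == 0:
    print("Analysis complete.")
else:
    print(f"Run failed (exit {response.exit_code}); fix the issue and re-run this cell.")
```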
Key details:
- Runs inside the sandbox: The Python process executes in Daytona's isolated environment
- Environment injection: ANTHROPIC_API_KEY passed securely via an environment variable
- Working directory: /workspace/coding_agent/, where the dataset CSV already exists
- Simple error handling: Clear user messages with retry instructions
- Timeout: 25 minutes to handle 60 tool-using turns with API calls
D. Autonomous Agent Execution Inside Daytona
Once started, Claude Code runs completely autonomously:
- Read tool: Explores 20newsgroups_1000.csv structure and samples content
- Write tool: Creates Python scripts for Naive Bayes training and prediction
- Bash tool: Executes classification code, generates visualizations, processes results
- Edit tool: Refines code based on errors or intermediate results
- Self-directed iteration: Agent decides its own implementation approach (no hardcoded steps)
Example of Claude Code execution logs showing autonomous decision-making and tool usage
This is the Heart of the System
Step 4 represents the core autonomous execution, where control is fully handed off to the AI agent. The agent explores data, writes code, fixes errors, and generates complete analysis outputs without human intervention. Everything happens inside the secure Daytona sandbox with no access to external systems.
Now that the agent has completed its work, we move to the final phase: retrieving the generated artifacts and organizing them for human review.
Step 5/6: Download and Parse Results
Extract all generated files from the sandbox filesystem.
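A sketch of the extraction loop, assuming list_files and download_file methods on the sandbox filesystem API (a bytes-returning download keeps images intact):

```python
# Pull every generated artifact, skipping the uploaded dataset and hidden/temp files
results = {}
for entry in sandbox.fs.list_files("/workspace/coding_agent"):
    if entry.name == "20newsgroups_1000.csv" or entry.name.startswith("."):
        continue
    try:
        # Raw bytes keep charts, CSVs, and markdown binary-safe
        results[entry.name] = sandbox.fs.download_file(
            f"/workspace/coding_agent/{entry.name}"
        )
    except Exception as exc:
        print(f"Skipping {entry.name}: {exc}")  # per-file errors degrade gracefully
```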
💡 Key Highlights
✓ Direct filesystem download from sandbox ✓ Automatic filtering (excludes temp files & uploaded data) ✓ Per-file error handling for graceful degradation ✓ Binary-safe content handling for images/charts

Step 6/6: Save Results Locally
Organize all outputs in a timestamped directory for reproducibility and audit trails.
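A minimal sketch, assuming the results dict of filename-to-bytes from the previous step:

```python
import datetime
import pathlib

# Timestamped directory gives each run a reproducible, auditable home
stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = pathlib.Path(f"analysis_results_{stamp}")
out_dir.mkdir(parents=True, exist_ok=True)

for name, content in results.items():
    (out_dir / name).write_bytes(content)  # binary-safe for images, CSVs, markdown

print(f"Saved {len(results)} artifacts to {out_dir}/")
```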
💡 Key Highlights
✓ Timestamped directories for reproducibility ✓ Binary-safe file handling (images, CSVs, markdown) ✓ Clean handoff ready for downstream systems ✓ Complete audit trail preserved
Section IV: Actual Results - Naive Bayes Classification
The autonomous agent successfully completed the Naive Bayes analysis end-to-end. Here's what we learned:
🎯 Key Finding
The agent achieved 55% overall accuracy across 20 diverse newsgroup categories, with perfect (100%) classification on the politically focused talk.politics.guns category, while technical and religious categories struggled and would require more sophisticated approaches.
⚡ Autonomous Recovery Highlight
Without human intervention, the agent:
- Detected and removed 15 entries with missing values (NaN in text field)
- Applied stratified sampling to ensure category representation in test set
- Installed the missing seaborn library when visualization failed
- Continued from the last checkpoint and delivered complete results
Performance Breakdown
Overall Metrics:
- Accuracy: 55.00% (55/100 correct predictions)
- Test Set: 100 posts sampled from 985 valid posts
- Training Set: 885 posts
- Categories: 20 distinct newsgroup topics
Category Performance Analysis
| Performance Tier | Categories | Example |
|---|---|---|
| Perfect (100%) | 1 category | talk.politics.guns (6/6) |
| Strong (80-90%) | 4 categories | misc.forsale (83%), rec.autos (83%), sci.crypt (80%) |
| Medium (50-79%) | 8 categories | rec.sport.hockey (75%), sci.space (67%) |
| Poor (<50%) | 7 categories | sci.electronics (17%), alt.atheism (0%) |
Key Insights from Analysis Report
What Worked Well:
- Distinctive vocabulary topics (politics.guns, forsale, autos, cryptography) achieved 80%+ accuracy
- Clear thematic separation enabled strong classification in well-defined domains
- TF-IDF with bigrams effectively captured topic-specific terminology patterns
What Struggled:
- Religious/political overlap: alt.atheism and talk.religion.misc completely misclassified (0% accuracy)
- Technical category confusion: Computer/science categories shared vocabulary, causing cross-classification
- Small test samples: Some categories had only 3-4 test instances, limiting statistical confidence
Snapshot of the generated analysis.md report showing accuracy by category
Generated Artifacts:
The autonomous agent produced 8 production-ready outputs:
- Executive summary and methodology report
- 4 visualization charts (category distribution, accuracy breakdown, confusion matrix, error patterns)
- Complete CSV audit trail with all 100 predictions
- Executable Python code with full implementation
- Detailed execution trace log
Total output size: ~2.0MB across all artifacts
🔄 Want to Try Random Forest?
Remember, the exact same autonomous system can run Random Forest classification by changing one line:
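The switch, with the value string assumed from the prompt-builder names:

```python
CLASSIFICATION_METHOD = "random_forest"  # was "naive_bayes" in this walkthrough
```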
The agent will:
- Adapt its implementation to use RandomForestClassifier instead of MultinomialNB
- Configure appropriate hyperparameters (n_estimators=100, random_state=42)
- Apply the same error recovery logic
- Generate identical output artifacts with Random Forest results
This demonstrates the true power of method-agnostic autonomous analytics—the infrastructure, error handling, and reporting remain constant regardless of your chosen ML approach.
Section V: What Did We Achieve & Future Improvements
Achieved:
- End-to-end autonomous coding agent that takes in a business question and chosen ML method, then produces complete analysis without human intervention
- Method-agnostic architecture: Demonstrated with both Naive Bayes and Random Forest configurations—switch algorithms with one config variable, zero pipeline changes
- Closed-loop execution in tenant-isolated Daytona sandboxes with automatic cleanup
- Self-directed code generation via Claude Code SDK: agent autonomously decides data exploration, classification implementation, visualization strategy, and reporting format based on configured method
- Autonomous error handling: agent self-heals inside the sandbox (missing libraries, NaN values, stratification issues), adapting to runtime problems and continuing from checkpoints
- Production-ready artifacts including executive summary (6.4KB markdown), statistical analysis, 4 visualizations, audit trail CSV, and executable Python code—all generated and downloaded automatically
What Can Be Improved:
- Pre-flight knowledge injection: Provide the agent with a domain-specific "known issues" registry (e.g., "politics/religion categories often overlap") before analysis starts to guide decision-making
- Mid-run human checkpoints: After initial data exploration, pause for human feedback ("focus on these 5 confusing categories") before proceeding with full classification
