AXON — AI Automation Tools for Creators & Builders

Manual A/B testing? Too slow and inefficient.

Problems:

Only one test live at a time.
Weeks waiting for results.
Spreadsheet hell for analysis.
3-4 tests a month max.
Everything moves super slow.

So I built an AI-powered A/B testing engine.

Now:

Running 20+ tests simultaneously.
Using multi-armed bandit for allocation.
Winners picked automatically.
15–20 tests/month without breaking a sweat.
Continuous, compounding optimization.

Result? About 5× more tests per month, each one finishing roughly 3× faster. Saw a 3.5× jump in conversion in ~90 days.

Let’s break down what I actually built.

The AI A/B Testing Engine

Fully automated split testing, top to bottom.

# Uses Azure GPT-4.1 for analysis and OpenAI for variant generation
from datetime import datetime
import json
import openai

class AIABTestingEngine:
    def __init__(self):
        self.client = openai.OpenAI()
        self.active_tests = {}
        self.completed_tests = []
        self.winning_variants = {}

    def create_test_variants(self, element_to_test, base_version, num_variants=3):
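        # AI generates test copy/variants with hypothesis & rationale.
        # Prompt elided. JSON mode requires the prompt to mention "JSON", and the
        # parsing below expects a top-level "variants" key in the reply.
        prompt = f"... "
        response = self.client.chat.completions.create(
            model="gpt-4.1",  # JSON mode is not supported on the base "gpt-4" model
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8,
            response_format={"type": "json_object"}
        )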
        result = json.loads(response.choices[0].message.content)
        return result.get('variants', [])

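    def setup_multivariate_test(self, test_name, variants, traffic_split='even', min_sample_size=100):
        # Set up test, auto-allocate traffic.
        # The original elides this dict; the fields below are illustrative and
        # simply record the arguments plus an ID and a start time.
        started = datetime.now()
        test = {
            'test_id': f"{test_name}-{started:%Y%m%d%H%M%S}",
            'name': test_name,
            'variants': variants,
            'traffic_split': traffic_split,
            'min_sample_size': min_sample_size,
            'started_at': started.isoformat(),
        }
        self.active_tests[test['test_id']] = test
        return test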

# Usage: Fire off ideas, generate variants, simulate traffic, log stats.

Real stack: Azure GPT-4.1 for analysis, Playwright+CDP for tracking, brand_cron.py triggers new tests every Monday.
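
For the default traffic_split='even', one common way to allocate visitors is deterministic hashing, so the same visitor always lands on the same variant. This is a minimal sketch, not the engine's actual allocator: assign_variant, visitor_id, and the variant names are all made up for illustration.

```python
import hashlib


def assign_variant(visitor_id, test_id, variant_names):
    """Map a visitor to one variant, evenly and repeatably.

    Hashing test_id together with visitor_id keeps assignments
    independent across tests. (Hypothetical helper, not from the engine.)
    """
    key = f"{test_id}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variant_names)
    return variant_names[bucket]


if __name__ == "__main__":
    variants = ["control", "variant_a", "variant_b"]
    for visitor in ["visitor-1", "visitor-2", "visitor-3"]:
        print(visitor, "->", assign_variant(visitor, "headline-test", variants))
```

No state to store: re-running the function for a returning visitor gives the same answer.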

Testing Strategy

What to test? Start with the high-impact stuff:

Headlines
- Unique value, emotion, numbers, questions.
CTAs
- Button copy, color, urgency, position.
Copy
- Pain points, benefits, length, direct/quirky tone.
Social Proof
- Testimonials, trust logos, case studies.
Page Layout
- What’s visible, section order, spacing, mobile.

Multi-Armed Bandit vs Old A/B Testing

Old A/B is a fixed 50/50 split, so it burns time and traffic on losers until the test ends. A multi-armed bandit (UCB or Thompson sampling) sends more visitors to whatever is converting right now, re-balances on every run, and lets you run a lot of tests at once.

If you’ve got 4 winning headlines and 6 winning CTAs, stack them and keep iterating: the lifts compound. Compared with a fixed split, the bandit gets to a winner 40–60% faster, easy.
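
To make the bandit idea concrete, here is a minimal Thompson sampling sketch using only the standard library. It is an illustration under assumptions, not the engine's code: the function names, the uniform Beta(1, 1) prior, and the "true" conversion rates in the simulation are all made up.

```python
import random


def thompson_pick(stats, rng=random):
    """Pick the variant to show the next visitor.

    stats maps variant name -> (conversions, visitors). Each variant's
    conversion rate gets a Beta posterior; we draw once from each and
    show the variant with the highest draw.
    """
    best_name, best_draw = None, -1.0
    for name, (conversions, visitors) in stats.items():
        # Beta(1, 1) prior, i.e. uniform over conversion rates (assumption).
        draw = rng.betavariate(1 + conversions, 1 + visitors - conversions)
        if draw > best_draw:
            best_name, best_draw = name, draw
    return best_name


def simulate(true_rates, visitors=5000, seed=42):
    """Run simulated traffic against made-up true conversion rates."""
    rng = random.Random(seed)
    stats = {name: (0, 0) for name in true_rates}
    for _ in range(visitors):
        name = thompson_pick(stats, rng)
        conversions, seen = stats[name]
        converted = rng.random() < true_rates[name]
        stats[name] = (conversions + int(converted), seen + 1)
    return stats


if __name__ == "__main__":
    # Hypothetical rates, for illustration only.
    result = simulate({"control": 0.03, "variant_a": 0.05, "variant_b": 0.04})
    for name, (conversions, seen) in result.items():
        print(f"{name}: {seen} visitors, {conversions} conversions")
```

Early on every variant gets traffic because the posteriors are wide. As data piles up, the draws for weak variants rarely win, so they quietly stop receiving visitors.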

Testing Workflow

How I actually ship 15–20 solid tests a month:

Monday: Review tests, roll out any winners (15min).
Tuesday: Launch 2–3 new tests, variants by AI in minutes (30min).
Wed–Fri: Check significance, course correct (10min/day).
Saturday: Weekly report + roadmap. What’s working, what sucked, what’s next (30min).

All tracked and auto-prioritized via social_poster.py, brand_cron.py, and a mess of Python glue.
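
For the Wed–Fri significance check, a two-proportion z-test is one standard option. This is a sketch under assumptions, not the engine's analysis step: the function name is made up, and the test assumes a fixed traffic split between the two variants being compared.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate.

    Returns (z, p_value). A positive z means B converts better than A.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no conversions (or all conversions): nothing to compare
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


if __name__ == "__main__":
    # Made-up counts: 30/1000 for control, 60/1000 for the variant.
    z, p = two_proportion_z_test(30, 1000, 60, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Checking this every day and stopping at the first p < 0.05 inflates false positives, so treat it as a progress check until the planned sample size is reached.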

Test Prioritization — ICE Framework

Impact: Will it actually move conversion? Headlines/CTAs = high, button shape = who cares.
Confidence: What does your data say about the likelihood it’ll work?
Ease: Can you change it in 10 minutes or does it need a dev sprint?

Multiply: Impact × Confidence × Ease. Highest number wins.
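
A quick sketch of that scoring in Python. The 1–10 scale and the example ideas and scores are assumptions for illustration; the text only specifies multiplying the three factors.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: int      # 1-10 (assumed scale)
    confidence: int  # 1-10
    ease: int        # 1-10

    @property
    def ice(self):
        # ICE score as described: Impact x Confidence x Ease.
        return self.impact * self.confidence * self.ease


def prioritize(ideas):
    """Return ideas sorted by ICE score, highest first."""
    return sorted(ideas, key=lambda idea: idea.ice, reverse=True)


# Made-up backlog for illustration.
ideas = [
    Idea("Headline: lead with a number", impact=9, confidence=6, ease=9),
    Idea("CTA: add urgency", impact=8, confidence=7, ease=10),
    Idea("Button shape", impact=2, confidence=3, ease=9),
]

for idea in prioritize(ideas):
    print(f"{idea.ice:4d}  {idea.name}")
```

With these example scores the CTA test (560) edges out the headline test (486), and button shape (54) lands at the bottom where it belongs.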

Stack and Costs

GPT-4.1 via OpenAI or Azure (variant gen, analysis): $20–50/month.
VWO/Optimizely or my own Playwright+CDP runner (Chrome on debug port 9222): $50–200/month.
Google Analytics/Hotjar for tracking.

$70–250/month total, and the ROI is stupid:

Conversion 3.2% → 11.2% in three months.
Cost per test: around $15 (less if you DIY).
Average lift: +15% per decent test.
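
How a +15% average lift turns into 3.2% → 11.2%: the wins multiply. The arithmetic below is a back-of-envelope sketch that assumes every shipped winner adds the same relative lift and that lifts stack multiplicatively, which is an idealization. The function names are made up.

```python
from math import ceil, log


def compounded_rate(start_rate, lift_per_win, wins):
    """Conversion rate after `wins` shipped winners, each adding a relative lift."""
    return start_rate * (1 + lift_per_win) ** wins


def wins_needed(start_rate, target_rate, lift_per_win):
    """Smallest number of compounding wins that reaches the target rate."""
    return ceil(log(target_rate / start_rate) / log(1 + lift_per_win))


if __name__ == "__main__":
    wins = wins_needed(0.032, 0.112, 0.15)
    rate = compounded_rate(0.032, 0.15, wins)
    print(f"{wins} wins at +15% each: 3.2% -> {rate:.1%}")
```

Under those assumptions it takes nine shipped winners at +15% each to go from 3.2% to about 11.3%.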

Before vs After

Old way:

3–4 tests/month.
2–3 weeks/test.
Write variants by hand, set up manually, analyze in Sheets.
12% conversion gain in 6 months.

AI Engine:

15–20 tests/month.
3–7 days/test.
Variants+winners auto-generated.
250% conversion gain in 3 months.

Speed: 5× more tests, 90% less time spent.
Uplift: 3×–3.5× conversion (actual numbers, not “marketing”).
Revenue: Up 250% over a quarter.

Get Started

Want to copy my setup? Do this:

Day 1: Pick your A/B tool. Plug in tracking. Log baseline.
Day 2: Let GPT or Claude Sonnet (or local Ollama for the nerds) spit out 10 test ideas. ICE score and shortlist.
Day 3: Use my script or ChatGPT to draft 3–4 solid variants per test. Launch first batch.
Rest of week: Watch, don’t get trigger-happy—let the numbers decide. Roll out winners. Repeat.

Mess-Ups to Avoid

Don’t change multiple things in one test. Isolate variables.
Don’t stop tests early. Get your sample size.
Don’t forget to ship the winner.
Don’t randomly test dumb stuff. High-impact first.
Don’t just wing it. Use a 12-week roadmap.
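
On "get your sample size": here is a sketch of the standard two-proportion sample size formula, so you know the number before you launch. The function name and the defaults (5% significance, 80% power) are assumptions, not settings from the engine.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.8):
    """Visitors needed per variant to detect a relative lift over baseline.

    Uses the normal-approximation formula for comparing two proportions
    with a two-sided test.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Baseline 3.2%, hoping to detect a +15% relative lift.
    print(sample_size_per_variant(0.032, 0.15))
```

At a 3.2% baseline, detecting a +15% lift takes tens of thousands of visitors per variant. Bigger lifts need far fewer, which is one more reason to test high-impact changes first.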

Bottom Line

Manual A/B? Painfully slow. AI-driven? Real learning, rapid lift, compounding wins—at scale.

Test everything, always.
AI auto-generates and analyzes.
Multi-armed bandit means you don’t waste time on duds.

Result: 5× more tests, each 3× faster, conversion up from 3.2% to 11.2% in 90 days. The revenue spike is real.

If you want the engine, DM @billy_kennedy_bmx or just get started at axon.nepa-ai.com.