2026-03-22

Lead Generation with AI: How to Find and Enrich 300 Leads Per Month

Manual lead research takes hours. AI-assisted scraping and enrichment can generate 300+ qualified leads per month with minimal human time. Full pipeline: target identification → scraping → enrichment → CRM export.

An SDR can easily sink half the workday into prospecting: manually searching LinkedIn, copying info into a spreadsheet, looking up email formats, writing personalized openers. The work matters, but most of it is mechanical, and AI can largely automate it, freeing reps to focus on actual conversations.

Here's a practical, full-stack lead generation pipeline that runs daily, enriches leads with company context, and exports to your CRM — targeting 300 qualified leads/month on the low end.

The Pipeline Architecture

Target Definition → Source Scraping → Dedup → Enrichment → Scoring → Export

Each stage:

  1. Target Definition: ICP (Ideal Customer Profile) parameters — industry, size, job title, tech stack, growth signals
  2. Source Scraping: LinkedIn Sales Nav, Apollo.io, company job boards, G2, Crunchbase
  3. Dedup: cross-reference against existing CRM records, previous outreach log
  4. Enrichment: company financials, tech stack, news/signals, decision-maker contacts
  5. Scoring: rank leads by fit and buying intent signals
  6. Export: push to CRM (HubSpot, Salesforce) or generate outreach queue
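Stage 6 eventually comes down to field mapping. As a hedged sketch of the HubSpot side (using HubSpot's default contact property names; the POST itself goes to the CRM v3 `/crm/v3/objects/contacts` endpoint with a private-app token, and Salesforce would need its own mapping):

```python
def to_hubspot_contact(lead: dict) -> dict:
    """Map a pipeline lead record onto HubSpot's contact-properties payload shape."""
    return {
        "properties": {
            "email": lead.get("email") or "",
            "firstname": lead.get("first_name") or "",
            "lastname": lead.get("last_name") or "",
            "jobtitle": lead.get("title") or "",
            "company": lead.get("company") or "",
        }
    }
```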

Defining Your ICP Programmatically

Your ICP is a set of filters. The more specific, the higher the quality:

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ICPCriteria:
    # Company filters
    industries: list[str] = field(default_factory=lambda: [
        "SaaS", "E-commerce", "Digital Marketing Agency"
    ])
    employee_range: tuple[int, int] = (10, 500)        # company size
    revenue_range_usd: tuple[int, int] = (1_000_000, 50_000_000)
    locations: list[str] = field(default_factory=lambda: ["United States", "Canada"])
    
    # Contact filters
    job_titles: list[str] = field(default_factory=lambda: [
        "CEO", "Founder", "Head of Marketing", 
        "VP of Marketing", "Director of Content"
    ])
    
    # Tech stack signals (company uses these tools = warmer lead)
    tech_stack_signals: list[str] = field(default_factory=lambda: [
        "HubSpot", "Webflow", "Shopify", "Notion"
    ])
    
    # Growth signals (recently hired, funding, job postings)
    require_growth_signal: bool = True
    
    # Exclusions
    exclude_domains: list[str] = field(default_factory=list)
    exclude_companies: list[str] = field(default_factory=list)

# Your ICP
ICP = ICPCriteria(
    industries=["Digital Agency", "Content Studio", "Media Company"],
    employee_range=(5, 200),
    job_titles=["Founder", "CEO", "Creative Director", "Head of Content"],
    tech_stack_signals=["Adobe Premiere", "Final Cut", "CapCut"],
)
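Once scraped records start flowing, these criteria become a hard pass/fail filter (scoring handles the softer signals later). A standalone sketch, taking the relevant ICP fields as plain arguments since the exact field names depend on your scraper's output:

```python
def matches_icp(
    lead: dict,
    job_titles: list[str],
    employee_range: tuple[int, int],
    exclude_domains: set[str] = frozenset(),
) -> bool:
    """Hard filter: drop leads that can't possibly fit the ICP."""
    if (lead.get("company_domain") or "").lower() in exclude_domains:
        return False
    employees = lead.get("employee_count")
    if employees is not None:  # unknown size passes; out-of-range fails
        lo, hi = employee_range
        if not lo <= employees <= hi:
            return False
    title = (lead.get("title") or "").lower()
    return any(t.lower() in title for t in job_titles)

# Example:
matches_icp(
    {"title": "Founder & CEO", "employee_count": 40, "company_domain": "acme.co"},
    job_titles=["Founder", "CEO"],
    employee_range=(5, 200),
)  # True
```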

Scraping Apollo.io via API

Apollo has a generous API tier (free: 50 exports/month, paid: unlimited). This is the fastest source for contact data:

import os
import time
from typing import Generator

import httpx

APOLLO_API_KEY = os.environ.get("APOLLO_API_KEY", "")  # set in your shell; never hardcode keys

def search_apollo_people(icp: ICPCriteria, pages: int = 10) -> Generator[dict, None, None]:
    """
    Search Apollo for contacts matching ICP. Yields contact records.
    Free tier: 50 exports/month. Paid: unlimited.
    """
    base_url = "https://api.apollo.io/v1/mixed_people/search"
    
    for page in range(1, pages + 1):
        payload = {
            "api_key": APOLLO_API_KEY,
            "page": page,
            "per_page": 25,
            "person_titles": icp.job_titles,
            "organization_industry_tag_ids": [],  # map from industry names
            "organization_num_employees_ranges": [
                f"{icp.employee_range[0]},{icp.employee_range[1]}"
            ],
            "person_locations": icp.locations,
            "contact_email_status": ["verified", "likely to engage"],
        }
        
        resp = httpx.post(base_url, json=payload, timeout=30)
        resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
        data = resp.json()
        
        if not data.get("people"):
            break
            
        for person in data["people"]:
            yield {
                "first_name": person.get("first_name"),
                "last_name": person.get("last_name"),
                "email": person.get("email"),
                "title": person.get("title"),
                "company": person.get("organization", {}).get("name"),
                "linkedin_url": person.get("linkedin_url"),
                "company_domain": person.get("organization", {}).get("primary_domain"),
                "employee_count": person.get("organization", {}).get("num_employees"),
                "source": "apollo"
            }
        
        time.sleep(0.5)  # respect rate limits
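Apollo (and every other source in this pipeline) will occasionally return a 429 or a transient 5xx. A generic backoff wrapper (my own helper, not part of Apollo's API) keeps a nightly run from dying on the first hiccup:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(); on failure, sleep base_delay * 2^attempt and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)

# e.g. resp = with_retries(lambda: httpx.post(base_url, json=payload, timeout=30))
```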

LinkedIn Scraping with Playwright

For leads not in Apollo, or for enriching with LinkedIn-specific signals:

from playwright.async_api import async_playwright
import asyncio

async def scrape_linkedin_search(query: str, session_file: str, max_results: int = 50):
    """
    Search LinkedIn and extract profile data.
    Requires a saved LinkedIn session (log in manually once).
    """
    leads = []
    
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(storage_state=session_file)
        page = await context.new_page()
        
        # LinkedIn people search
        search_url = f"https://www.linkedin.com/search/results/people/?keywords={query}&origin=GLOBAL_SEARCH_HEADER"
        await page.goto(search_url)
        await page.wait_for_load_state("networkidle")
        
        results_seen = 0
        while results_seen < max_results:
            # Extract current page results
            profiles = await page.query_selector_all('.entity-result__item')
            
            for profile in profiles:
                name_el = await profile.query_selector('.entity-result__title-text')
                title_el = await profile.query_selector('.entity-result__primary-subtitle')
                company_el = await profile.query_selector('.entity-result__secondary-subtitle')
                link_el = await profile.query_selector('a.app-aware-link')
                
                if name_el:
                    leads.append({
                        "name": await name_el.inner_text(),
                        "title": await title_el.inner_text() if title_el else "",
                        "company": await company_el.inner_text() if company_el else "",
                        "linkedin_url": await link_el.get_attribute('href') if link_el else "",
                        "source": "linkedin"
                    })
                    results_seen += 1
            
            # Next page
            next_btn = page.locator('button[aria-label="Next"]')
            if not await next_btn.is_visible():
                break
            await next_btn.click()
            await asyncio.sleep(2)  # avoid rate limiting
        
        await browser.close()
    
    return leads
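The session file the scraper expects (e.g. `sessions/linkedin_main.json`) can be created once, interactively. A sketch using Playwright's `storage_state` export; run it headful, log in by hand, and the saved cookies are reused on every later run:

```python
import asyncio

async def save_linkedin_session(path: str = "sessions/linkedin_main.json"):
    """One-time interactive login; saves cookies/localStorage for reuse."""
    from playwright.async_api import async_playwright  # needs `pip install playwright`

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)  # headful so you can log in
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto("https://www.linkedin.com/login")
        input("Log in in the browser window, then press Enter here... ")
        await context.storage_state(path=path)  # writes session state to JSON
        await browser.close()

# Run once: asyncio.run(save_linkedin_session())
```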

AI Enrichment with Company Context

Raw lead data isn't enough. You need context to write a relevant opener. Use an LLM to generate company summaries from web searches:

import openai
import httpx

client = openai.AsyncOpenAI()  # async client, so the API call doesn't block the event loop

async def enrich_lead(lead: dict) -> dict:
    """Add AI-generated context to a lead record."""
    
    company = lead.get("company", "")
    domain = lead.get("company_domain", "")
    
    # 1. Fetch recent news / company description
    search_results = await web_search_company(company, domain)
    
    # 2. Generate context summary
    prompt = f"""
    Company: {company}
    Website: {domain}
    Contact: {lead.get('first_name')} {lead.get('last_name')}, {lead.get('title')}
    
    Search results:
    {search_results[:2000]}
    
    Write 2-3 sentences summarizing:
    1. What this company does
    2. What pain points our AI automation service could solve for them
    3. One specific hook for an outreach email
    
    Be specific and factual. No fluff.
    """
    
    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # cheap model, runs thousands of times
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
        temperature=0.3
    )
    
    lead["ai_context"] = response.choices[0].message.content
    lead["enriched"] = True
    return lead


async def web_search_company(company: str, domain: str) -> str:
    """Quick web search for company context."""
    # Use a search API (Brave, SerpAPI, etc.)
    async with httpx.AsyncClient() as http:  # context manager so the connection is closed
        resp = await http.get(
            "https://api.search.brave.com/res/v1/web/search",
            params={"q": f"{company} {domain} about services"},
            headers={"Accept": "application/json", "X-Subscription-Token": "YOUR_KEY"},
            timeout=10
        )
    
    results = resp.json().get("web", {}).get("results", [])
    return "\n".join(r.get("description", "") for r in results[:3])

Lead Scoring

Rank leads by fit and intent signals:

def score_lead(lead: dict, icp: ICPCriteria) -> int:
    score = 0
    
    # Title match
    title = (lead.get("title") or "").lower()
    for preferred_title in icp.job_titles:
        if preferred_title.lower() in title:
            score += 30
            break
    
    # Company size fit
    employees = lead.get("employee_count", 0) or 0
    if icp.employee_range[0] <= employees <= icp.employee_range[1]:
        score += 20
    
    # Email verified
    if lead.get("email") and lead.get("email_status") == "verified":
        score += 25
    
    # LinkedIn URL present
    if lead.get("linkedin_url"):
        score += 10
    
    # AI context generated (enrichment succeeded)
    if lead.get("ai_context"):
        score += 15
    
    return score


def rank_leads(leads: list[dict]) -> list[dict]:
    for lead in leads:
        lead["score"] = score_lead(lead, ICP)
    return sorted(leads, key=lambda x: x["score"], reverse=True)

Full Pipeline Run

import csv
from datetime import datetime

async def run_lead_pipeline():
    all_leads = []
    
    # Source 1: Apollo
    print("Scraping Apollo...")
    for lead in search_apollo_people(ICP, pages=12):  # 12 pages × 25 = 300 leads
        all_leads.append(lead)
    
    # Source 2: LinkedIn  
    print("Scraping LinkedIn...")
    li_leads = await scrape_linkedin_search("content marketing agency founder", 
                                             "sessions/linkedin_main.json", max_results=100)
    all_leads.extend(li_leads)
    
    # Dedup by email + LinkedIn URL
    seen = set()
    deduped = []
    for lead in all_leads:
        key = lead.get("email") or lead.get("linkedin_url")
        if key and key not in seen:
            seen.add(key)
            deduped.append(lead)
    
    print(f"After dedup: {len(deduped)} leads")
    
    # Enrich (batch with rate limiting)
    print("Enriching leads...")
    enriched = []
    for i, lead in enumerate(deduped[:100]):  # cap at 100/run to control API cost
        enriched_lead = await enrich_lead(lead)
        enriched.append(enriched_lead)
        if i % 10 == 0:
            print(f"  Enriched {i+1}/100...")
        await asyncio.sleep(0.3)
    
    # Score and rank
    ranked = rank_leads(enriched)
    
    # Export to CSV
    filename = f"leads_{datetime.now().strftime('%Y-%m-%d')}.csv"
    with open(filename, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=[
            "first_name", "last_name", "email", "title", "company",
            "linkedin_url", "score", "ai_context"
        ], extrasaction="ignore")  # leads carry extra keys; drop them instead of raising
        writer.writeheader()
        writer.writerows(ranked)
    
    print(f"✓ Exported {len(ranked)} ranked leads to {filename}")
    return ranked

asyncio.run(run_lead_pipeline())

Running this daily gives you a fresh batch of scored, enriched leads in your outreach queue — ready for personalized emails or LinkedIn connection requests.
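The daily cadence can be a single cron entry (paths and filenames here are placeholders for your own setup):

```shell
# Every weekday at 07:00: run the pipeline, append output to a log
0 7 * * 1-5 cd /path/to/leadgen && /usr/bin/python3 lead_pipeline.py >> logs/leadgen.log 2>&1
```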


The NEPA AI Lead Gen Machine packages this complete pipeline with Apollo and LinkedIn connectors, automatic CRM export, dedup against your Obsidian vault leads database, and AI-generated personalized openers for the top 20 leads.

→ Get the Lead Gen Machine at /shop/lead-gen-machine

300 leads/month. Minimal manual work.