Text-to-3D Model Locally: No API Key, No Cloud, No Cost

Generate 3D meshes from text prompts entirely on your local GPU using OpenAI's Shap-E model, then export to OBJ, GLB, or STL. No API key is required, and no internet connection is needed once the model weights have been downloaded.

By NEPA AI
NEPA AI · Building autonomous systems for creators and businesses
#text-to-3d #shap-e #3d generation #python #open source #local ai

You can generate 3D models straight from text prompts on your own machine, no API key needed. OpenAI's Shap-E runs locally and, on a recent GPU, produces a textured 3D mesh in under 60 seconds (see the hardware table below; CPU-only runs take minutes).

What Is Shap-E?

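Shap-E (from OpenAI Research) generates 3D assets from text or images in a single sampling pass. It does not produce a mesh directly: it generates the parameters of an implicit neural representation, which is then decoded into an explicit textured mesh. Two modes: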

  • Text-to-3D: prompt → mesh
  • Image-to-3D: single image → mesh

Output formats: PLY and OBJ directly from Shap-E; GLB and STL by converting the mesh with trimesh.

Hardware Requirements

| Setup | Gen Time | Notes |
|---|---|---|
| RTX 3090 (24GB) | ~15–25s | Best choice |
| RTX 3060 (12GB) | ~30–45s | Works well |
| RTX 3070 (8GB) | ~45–60s | Use fp16 for VRAM savings |
| CPU only | 5–15min | Slow but functional |
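
To budget a multi-prompt run, the ranges in the table can simply be multiplied by the number of prompts. The sketch below does that arithmetic; the tier names are hypothetical labels, the figures are copied from the table rather than measured, and one-time model loading is not included.

```python
from typing import Tuple

# Per-model generation time ranges in seconds, copied from the table above.
# The tier names are hypothetical labels chosen for this sketch.
GEN_TIME_SECONDS = {
    "rtx3090": (15, 25),
    "rtx3060": (30, 45),
    "rtx3070": (45, 60),
    "cpu": (5 * 60, 15 * 60),
}

def estimate_batch_seconds(tier: str, n_prompts: int) -> Tuple[int, int]:
    """Return (best_case, worst_case) total seconds for n_prompts sequential runs."""
    low, high = GEN_TIME_SECONDS[tier]
    return low * n_prompts, high * n_prompts

print(estimate_batch_seconds("rtx3060", 3))  # (90, 135)
print(estimate_batch_seconds("cpu", 3))      # (900, 2700)
```

So three prompts on an RTX 3060 should land somewhere between 1.5 and 2.25 minutes, while the same batch on CPU could take 15 to 45 minutes.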

Installation

git clone https://github.com/openai/shap-e.git
cd shap-e
pip install -e .
pip install trimesh open3d

First run downloads the model (~2.6GB).
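
Before the first run, it can help to confirm that the packages installed above are importable from the interpreter you plan to use. This is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, and it only checks that each package can be found, not that it works with your GPU.

```python
import importlib.util

def check_environment(packages=("torch", "shap_e", "trimesh", "open3d")):
    """Map each package name to True if it can be found by this interpreter."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

status = check_environment()
for name, ok in status.items():
    print(f"{name:10s} {'ok' if ok else 'MISSING'}")
```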

Text-to-3D: Basic Generation

import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 'transmitter' decodes latents into meshes; 'text300M' is the text-conditional model
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=['a red leather armchair'] * 1),  # one prompt per batch item
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    # Required noise-schedule arguments; values follow Shap-E's example notebooks
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

print(f"Generated {len(latents)} latent(s)")

Exporting to OBJ / GLB

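import os
import numpy as np
import trimesh
from shap_e.util.notebooks import decode_latent_mesh

def export_3d_model(latent, output_path: str, format="obj"):
    """Decode one latent with the transmitter `xm` and write <output_path>.<format>."""
    t = decode_latent_mesh(xm, latent).tri_mesh()
    # Stack the per-vertex R, G, B channels into an (N, 3) array for trimesh
    colors = np.stack([t.vertex_channels[c] for c in "RGB"], axis=1)
    mesh = trimesh.Trimesh(vertices=t.verts, faces=t.faces, vertex_colors=colors)
    mesh.fix_normals()
    mesh.fill_holes()

    if format not in ("obj", "glb", "stl"):
        format = "obj"  # unknown formats fall back to OBJ
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    mesh.export(f"{output_path}.{format}")
    print(f"Exported to: {output_path}.{format}")
    return mesh  # returned so callers can inspect it (e.g. face count)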
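
For readers curious what ends up in an OBJ file: the format is plain text, with one `v` line per vertex and one `f` line per triangle, using 1-based vertex indices. The sketch below is a hand-rolled writer for illustration only, not what trimesh does internally. `write_obj` is a hypothetical helper, and appending `r g b` to a `v` line is a widely supported but unofficial extension of the format.

```python
import io

def write_obj(verts, faces, colors=None) -> str:
    """Serialize a triangle mesh to OBJ text.

    verts:  list of (x, y, z) floats
    faces:  list of (i, j, k) zero-based vertex indices
    colors: optional list of (r, g, b) floats in [0, 1], one per vertex
    """
    out = io.StringIO()
    for n, (x, y, z) in enumerate(verts):
        line = f"v {x:.6f} {y:.6f} {z:.6f}"
        if colors is not None:
            r, g, b = colors[n]
            line += f" {r:.4f} {g:.4f} {b:.4f}"
        out.write(line + "\n")
    for i, j, k in faces:
        # OBJ indices start at 1, not 0
        out.write(f"f {i + 1} {j + 1} {k + 1}\n")
    return out.getvalue()

# A single red triangle
text = write_obj(
    verts=[(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    faces=[(0, 1, 2)],
    colors=[(1, 0, 0)] * 3,
)
print(text)
```

Opening an exported file in a text editor and comparing it against this layout is a quick way to check whether vertex colors survived the export.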

Image-to-3D Generation

from shap_e.util.image_util import load_image

# Works best with a single object on a plain background
image = load_image("product_photo.png")
image_model = load_model('image300M', device=device)

latents = sample_latents(
    batch_size=1,
    model=image_model,
    diffusion=diffusion,
    guidance_scale=3.0,  # image conditioning uses a much lower scale than text
    model_kwargs=dict(images=[image]),
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    # Required noise-schedule arguments; values follow Shap-E's example notebooks
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

export_3d_model(latents[0], "./output/from_photo", format="glb")

Batch Generation

prompts = [
    "a wooden coffee table",
    "a sci-fi helmet with glowing visor",
    "a ceramic coffee mug"
]

for i, prompt in enumerate(prompts):
    print(f"\n[{i+1}/{len(prompts)}] Generating: {prompt}")

    latents = sample_latents(
        batch_size=1,
        model=model,
        diffusion=diffusion,
        guidance_scale=15.0,
        model_kwargs=dict(texts=[prompt]),
        clip_denoised=True,
        use_fp16=True,
        use_karras=True,
        karras_steps=64,
        # Required noise-schedule arguments; values follow Shap-E's example notebooks
        sigma_min=1e-3,
        sigma_max=160,
        s_churn=0,
    )

    # File name: prompt with spaces replaced, truncated to 40 characters
    filename = f"./output/{prompt.replace(' ', '_')[0:40]}"
    mesh = export_3d_model(latents[0], filename, format="glb")
    print(f"  ✓ {filename} → {len(mesh.faces)} faces")

# Release cached GPU memory once the whole batch is done
torch.cuda.empty_cache()
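
The batch loop builds file names by replacing spaces in the prompt, which breaks if a prompt contains characters such as `/` or `"`. A stricter variant is sketched below, using only the standard library. `prompt_to_filename` is a hypothetical helper, and unlike the loop above it also turns hyphens into underscores.

```python
import re

def prompt_to_filename(prompt: str, max_len: int = 40) -> str:
    """Turn a prompt into a filesystem-safe stem (no extension)."""
    # Collapse every run of characters other than a-z and 0-9 into one underscore
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return slug[:max_len].rstrip("_") or "untitled"

print(prompt_to_filename("a sci-fi helmet with glowing visor"))  # a_sci_fi_helmet_with_glowing_visor
print(prompt_to_filename('a 1/2" steel bolt'))                   # a_1_2_steel_bolt
```

Two different prompts can still map to the same stem after truncation, so a long batch may want an index prefix as well.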

Tips for Better Quality

  • Be specific: "a wooden oak dining chair" > "a chair"
  • Include scale hints: "a small ceramic vase"
  • Use real object categories
  • Avoid abstract concepts
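
When generating many variations, the first two tips can be applied mechanically by assembling prompts from parts. This is a small sketch; `build_prompt` and its parameters are hypothetical, and the a/an choice is a naive first-letter check.

```python
def build_prompt(obj: str, material: str = "", size: str = "", detail: str = "") -> str:
    """Compose 'a/an [size] [material] <obj> [detail]' from optional parts."""
    words = [w for w in (size, material, obj) if w]
    phrase = " ".join(words)
    # Naive article choice: looks only at the first letter
    article = "an" if phrase[0].lower() in "aeiou" else "a"
    tail = f" {detail}" if detail else ""
    return f"{article} {phrase}{tail}"

print(build_prompt("vase", material="ceramic", size="small"))  # a small ceramic vase
print(build_prompt("dining chair", material="oak"))            # an oak dining chair
print(build_prompt("helmet", material="sci-fi", detail="with glowing visor"))
```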

Quality Settings:

guidance_scale = 15.0   # Default — good balance
karras_steps = 64       # Fast, decent quality
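
To switch between quick drafts and slower runs without editing every call, the sampling arguments can be collected in one place. In this sketch only the "default" preset mirrors the settings used in this post. The "draft" and "high" step counts are illustrative assumptions rather than tuned values, and `sampling_kwargs` is a hypothetical helper.

```python
# "default" mirrors this post's settings; "draft" and "high" are
# illustrative assumptions, not tuned or benchmarked values.
PRESETS = {
    "draft":   {"karras_steps": 32,  "guidance_scale": 15.0},
    "default": {"karras_steps": 64,  "guidance_scale": 15.0},
    "high":    {"karras_steps": 128, "guidance_scale": 15.0},
}

def sampling_kwargs(preset: str = "default", **overrides) -> dict:
    """Return keyword arguments to splat into sample_latents(...)."""
    kwargs = {
        "clip_denoised": True,
        "use_fp16": True,
        "use_karras": True,
        # Noise-schedule values as used in Shap-E's example notebooks
        "sigma_min": 1e-3,
        "sigma_max": 160,
        "s_churn": 0,
    }
    kwargs.update(PRESETS[preset])
    kwargs.update(overrides)  # explicit overrides win over the preset
    return kwargs

print(sampling_kwargs("draft"))
```

A call would then look like `sample_latents(batch_size=1, model=model, diffusion=diffusion, model_kwargs=dict(texts=[prompt]), **sampling_kwargs("draft"))`. For image-to-3D, pass `guidance_scale=3.0` as an override.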

Post-Processing in Blender

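import bpy

# Blender 4.x removed import_scene.obj / export_scene.obj; on older 3.x
# releases without wm.obj_import / wm.obj_export, use those operators instead.
bpy.ops.wm.obj_import(filepath="./output/armchair.obj")
obj = bpy.context.selected_objects[0]          # the importer leaves the new object selected
bpy.context.view_layer.objects.active = obj    # modifier operators act on the active object
mod = obj.modifiers.new(name="Decimate", type='DECIMATE')
mod.ratio = 0.5                                # keep roughly half of the faces
bpy.ops.object.modifier_apply(modifier=mod.name)
bpy.ops.wm.obj_export(filepath="./output/armchair_optimized.obj")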
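
The Blender script does not have to be pasted into the UI: Blender can run a Python file headlessly with its `--background` and `--python` command-line options. The sketch below builds and runs that command from regular Python. It assumes a `blender` executable on your PATH and that the script is saved as `decimate.py`, a hypothetical file name.

```python
import shutil
import subprocess

def blender_command(script_path: str, blender: str = "blender") -> list:
    """Build the argv for running a Python script in Blender without the UI."""
    return [blender, "--background", "--python", script_path]

def run_blender_script(script_path: str, blender: str = "blender") -> int:
    """Run the script headlessly and return Blender's exit code."""
    if shutil.which(blender) is None:
        raise FileNotFoundError(f"'{blender}' not found on PATH")
    return subprocess.run(blender_command(script_path, blender), check=False).returncode

print(blender_command("decimate.py"))
```

Calling `run_blender_script("decimate.py")` after a batch run makes it possible to decimate every exported mesh without opening Blender.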

Limitations

  • Best for furniture, vehicles, food, common objects
  • Complex characters are rough; use as base mesh
  • Textures are moderate; game-ready assets need re-texturing
  • Max resolution is ~128³

Check out my real AI tools at axon.nepa-ai.com