You can crank out 3D models straight from text prompts on your own machine, no API key needed. OpenAI's Shap-E runs locally and generates a textured 3D mesh in under 60 seconds on a recent GPU (CPU-only runs take minutes).
What Is Shap-E?
Shap-E (from OpenAI Research) generates 3D assets from text or images in a single sampling run, with no per-prompt optimization. It produces the parameters of an implicit neural representation, which are then decoded into an explicit mesh. Two modes:
- Text-to-3D: prompt → mesh
- Image-to-3D: single image → mesh
Output formats: PLY and OBJ through Shap-E's own mesh writers; GLB and STL through trimesh, as in the export code below.
Hardware Requirements
| Setup | Gen Time | Notes |
|---|---|---|
| RTX 3090 (24GB) | ~15–25s | Best choice |
| RTX 3060 (12GB) | ~30–45s | Works well |
| RTX 3070 (8GB) | ~45–60s | Use fp16 for VRAM savings |
| CPU only | 5–15min | Slow but functional |
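To see roughly which row applies to your machine, a sketch like the one below could help. `expected_gen_time` and `detect_vram_gb` are hypothetical helper names, the tiers are simply the table rows keyed by VRAM (an assumption, since speed also depends on the GPU model), and torch is imported lazily so the snippet still runs where it is not installed.

```python
from typing import Optional

# Rows copied from the table above: (minimum VRAM in GB, expected generation time).
GPU_TIERS = [(24.0, "~15-25s"), (12.0, "~30-45s"), (8.0, "~45-60s")]


def expected_gen_time(vram_gb: Optional[float]) -> str:
    """Map detected VRAM to the closest row of the table (rough guide only)."""
    if vram_gb is None:
        return "5-15min (CPU only)"
    for min_gb, gen_time in GPU_TIERS:
        if vram_gb >= min_gb:
            return gen_time
    return "not covered by the table"


def detect_vram_gb() -> Optional[float]:
    """Total memory of CUDA device 0 in decimal GB, or None without CUDA or torch."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


vram = detect_vram_gb()
print(f"Detected VRAM: {vram} GB, expected time: {expected_gen_time(vram)}")
```

Treat the result as a ballpark figure; the timings in the table are the only data behind it.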
Installation
git clone https://github.com/openai/shap-e.git
cd shap-e
pip install -e .
pip install trimesh open3d
First run downloads the model (~2.6GB).
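Before triggering the model download, it can be worth checking that the packages from the install step are importable. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper, and the module names are assumed to match the packages installed above.

```python
import importlib.util

# Top-level module names for the packages installed above (assumed names).
REQUIRED = ["torch", "shap_e", "trimesh", "open3d"]


def missing_packages(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("All set" if not missing else f"Missing: {', '.join(missing)}")
```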
Text-to-3D: Basic Generation
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))
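latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=['a red leather armchair']),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    # sample_latents requires these three; values follow the upstream example notebook
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)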
print(f"Generated {len(latents)} latent(s)")
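The same sampler settings recur in every `sample_latents` call in this post, so one option is to collect them in a small helper. This is only a sketch: `sampler_kwargs` is a hypothetical name, and the `sigma_min`, `sigma_max` and `s_churn` values are assumptions taken from the upstream example notebook.

```python
from typing import Any, Dict, List


def sampler_kwargs(texts: List[str], guidance_scale: float = 15.0,
                   karras_steps: int = 64, use_fp16: bool = True) -> Dict[str, Any]:
    """Collect keyword arguments for sample_latents in one place."""
    if not texts:
        raise ValueError("at least one prompt is required")
    return dict(
        batch_size=len(texts),  # one latent per prompt
        guidance_scale=guidance_scale,
        model_kwargs=dict(texts=list(texts)),
        progress=True,
        clip_denoised=True,
        use_fp16=use_fp16,
        use_karras=True,
        karras_steps=karras_steps,
        # Assumed values, copied from the upstream example notebook.
        sigma_min=1e-3,
        sigma_max=160,
        s_churn=0,
    )


kwargs = sampler_kwargs(["a red leather armchair", "a blue leather armchair"])
print(kwargs["batch_size"], kwargs["model_kwargs"]["texts"])
```

With the model and diffusion objects loaded as above, the call would then look like `sample_latents(model=model, diffusion=diffusion, **kwargs)`.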
Exporting to OBJ / GLB
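import os

import numpy as np
import trimesh
from shap_e.util.notebooks import decode_latent_mesh

def export_3d_model(latent, output_path: str, format="obj"):
    # Decode the latent into a triangle mesh using the transmitter model (xm)
    t = decode_latent_mesh(xm, latent).tri_mesh()
    # trimesh expects an (n_vertices, 3) color array, so stack R/G/B column-wise
    colors = np.stack([t.vertex_channels[c] for c in 'RGB'], axis=1)
    mesh = trimesh.Trimesh(
        vertices=t.verts,
        faces=t.faces,
        # Channels are floats in roughly [0, 1]; convert to 8-bit RGB
        vertex_colors=(np.clip(colors, 0.0, 1.0) * 255).astype(np.uint8),
    )
    mesh.fix_normals()
    mesh.fill_holes()
    # Fall back to OBJ for any format other than GLB or STL
    if format not in ("glb", "stl"):
        format = "obj"
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    mesh.export(f"{output_path}.{format}")
    print(f"Exported to: {output_path}.{format}")
    # Return the mesh so callers can inspect it (e.g. count faces)
    return mesh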
Image-to-3D Generation
from PIL import Image
from shap_e.util.image_util import load_image
image = load_image("product_photo.png")
image_model = load_model('image300M', device=device)
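latents = sample_latents(
    batch_size=1,
    model=image_model,
    diffusion=diffusion,
    guidance_scale=3.0,
    model_kwargs=dict(images=[image]),
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    # sample_latents requires these three; values follow the upstream example notebook
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)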
export_3d_model(latents[0], "./output/from_photo", format="glb")
Batch Generation
prompts = [
"a wooden coffee table",
"a sci-fi helmet with glowing visor",
"a ceramic coffee mug"
]
for i, prompt in enumerate(prompts):
    print(f"\n[{i+1}/{len(prompts)}] Generating: {prompt}")
    latents = sample_latents(
        batch_size=1,
        model=model,
        diffusion=diffusion,
        guidance_scale=15.0,
        model_kwargs=dict(texts=[prompt]),
        clip_denoised=True,
        use_fp16=True,
        use_karras=True,
        karras_steps=64,
        # sample_latents requires these three; values follow the upstream example notebook
        sigma_min=1e-3,
        sigma_max=160,
        s_churn=0,
    )
    filename = f"./output/{prompt.replace(' ', '_')[0:40]}"
    mesh = export_3d_model(latents[0], filename, format="glb")
    print(f" ✓ {filename} → {len(mesh.faces)} faces")
    # Release cached GPU memory before the next prompt
    torch.cuda.empty_cache()
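The loop builds filenames by swapping spaces for underscores, which can break on prompts containing slashes, colons or other punctuation. A slightly more defensive version might look like this; `prompt_to_filename` is a hypothetical helper and not part of Shap-E.

```python
import re


def prompt_to_filename(prompt: str, max_len: int = 40) -> str:
    """Turn a prompt into a lowercase, filesystem-safe file stem."""
    # Collapse every run of characters other than a-z and 0-9 into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    # Truncate, drop a trailing underscore left by the cut, and avoid empty names.
    return slug[:max_len].rstrip("_") or "untitled"


print(prompt_to_filename("a sci-fi helmet with glowing visor"))
```

In the loop above, the filename line would become `filename = f"./output/{prompt_to_filename(prompt)}"`.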
Tips for Better Quality
- Be specific: "a wooden oak dining chair" > "a chair"
- Include scale hints: "a small ceramic vase"
- Use real object categories
- Avoid abstract concepts
Quality Settings:
guidance_scale = 15.0  # Used for text-to-3D above: good balance of prompt adherence and variety
karras_steps = 64      # Fast, decent quality
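To find settings that suit a particular prompt, you could generate the same prompt under a small grid of values and compare the meshes side by side. The sketch below only builds the grid; the specific values are hypothetical starting points rather than recommendations from the Shap-E authors, and `sweep_configs` is an illustrative name.

```python
from itertools import product

GUIDANCE_SCALES = [10.0, 15.0, 20.0]  # hypothetical values to compare
KARRAS_STEPS = [32, 64]               # hypothetical values to compare


def sweep_configs(scales, steps):
    """Yield one settings dict per (guidance_scale, karras_steps) pair."""
    for scale, n_steps in product(scales, steps):
        yield {
            "guidance_scale": scale,
            "karras_steps": n_steps,
            # Short tag for naming the output file of each run.
            "tag": f"gs{scale:g}_ks{n_steps}",
        }


for cfg in sweep_configs(GUIDANCE_SCALES, KARRAS_STEPS):
    print(cfg["tag"])
```

Each dict can be fed into a `sample_latents` call, with the tag appended to the output filename so the runs do not overwrite each other.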
Post-Processing in Blender
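import bpy

# Blender 4.0 removed the legacy import_scene.obj / export_scene.obj operators;
# wm.obj_import / wm.obj_export replace them (also available in recent 3.x releases).
bpy.ops.wm.obj_import(filepath="./output/armchair.obj")

# The importer leaves the new object selected; make it active so the modifier targets it
obj = bpy.context.selected_objects[0]
bpy.context.view_layer.objects.active = obj

# Halve the face count with a Decimate modifier
mod = obj.modifiers.new(name="Decimate", type='DECIMATE')
mod.ratio = 0.5
bpy.ops.object.modifier_apply(modifier=mod.name)

bpy.ops.wm.obj_export(filepath="./output/armchair_optimized.obj")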
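The Blender script above can also run without opening the UI, using Blender's `--background` and `--python` command-line flags. The sketch below only builds the command; `blender_command` is a hypothetical helper, `decimate.py` is an assumed filename for the saved script, and the `blender` executable is assumed to be on your PATH.

```python
import shlex
from typing import List


def blender_command(script_path: str, blender_bin: str = "blender") -> List[str]:
    """Build the argument list for a headless Blender run of a Python script."""
    return [blender_bin, "--background", "--python", script_path]


cmd = blender_command("decimate.py")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it and raise if Blender exits with an error.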
Limitations
- Best for furniture, vehicles, food, common objects
- Complex characters are rough; use as base mesh
- Textures are moderate; game-ready assets need re-texturing
- Max resolution is ~128³
Check out my real AI tools at axon.nepa-ai.com



