Configuring AI/LLM Features in reNgine Cloud
Last updated April 7, 2026
What AI Features Does reNgine Offer?
reNgine Cloud includes optional AI/LLM-powered capabilities that enhance your reconnaissance workflow:
- Vulnerability Analysis Summaries — Automatically generates detailed technical descriptions, business impact assessments, remediation steps, and reference links for discovered vulnerabilities.
- Attack Surface Insights — Analyzes your recon data (subdomains, open ports, technologies, HTTP responses) and suggests prioritized attack vectors mapped to the MITRE ATT&CK framework.
- Enhanced Report Generation — Adds AI-driven context and analysis to scan reports, making them more actionable for both technical and non-technical stakeholders.
These features use either the OpenAI API (cloud-hosted) or Ollama (local/self-hosted) as the LLM backend.
Option A: OpenAI API (Cloud)
Best for teams that want fast results with minimal infrastructure setup.
Setup
- Navigate to API Vault: in reNgine, go to Scan Engine Settings > API Vault.
- Add your OpenAI API key: enter your key in the OpenAI API Key field and save. You can generate a key at platform.openai.com/api-keys.
- Select a model: reNgine supports multiple OpenAI models; choose one from the settings panel.
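To confirm a key works before saving it in API Vault, you can query OpenAI's model-listing endpoint (GET https://api.openai.com/v1/models) directly. The sketch below uses only the Python standard library. The function names are our own, and it is a standalone check that is not part of reNgine. A 401 response means the key itself is at fault rather than the reNgine configuration.

```python
import json
import urllib.request

# OpenAI endpoint that lists the models an API key is allowed to use.
MODELS_URL = "https://api.openai.com/v1/models"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the GET request, stripping whitespace picked up while pasting."""
    return urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {api_key.strip()}"},
        method="GET",
    )


def list_accessible_models(api_key: str, timeout: float = 10.0) -> list:
    """Return sorted model IDs; urllib raises HTTPError (status 401) for a bad key."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=timeout) as resp:
        payload = json.load(resp)
    return sorted(model["id"] for model in payload.get("data", []))
```

For example, assuming you exported the key as OPENAI_API_KEY (a common convention, not something reNgine reads), `"gpt-4o-mini" in list_accessible_models(os.environ["OPENAI_API_KEY"])` tells you whether the key can use the model you plan to select.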
Recommended Models
| Model | Best For | Context Window |
|---|---|---|
| GPT-4o | Best quality analysis, complex targets | 128k tokens |
| GPT-4o-mini | Cost savings with good quality | 128k tokens |
| GPT-4 Turbo | High-quality analysis, large scans | 128k tokens |
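The 128k-token context window limits how much text a single request can carry. To gauge whether a chunk of recon output would fit, you can use a rough estimate. This sketch assumes the common rule of thumb of about four characters of English text per token. That is an approximation, not a tokenizer. The 4,096-token reserve for the model's reply is a hypothetical budget.

```python
# Rough rule of thumb for English text: about 4 characters per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count (ceiling); use a real tokenizer when accuracy matters."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, context_tokens: int = 128_000, reserve_for_output: int = 4_096) -> bool:
    """True if the prompt plus a reserved output budget stays within the context window."""
    return estimate_tokens(text) + reserve_for_output <= context_tokens
```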
Expected API Costs
Costs depend on scan size and how many vulnerabilities trigger LLM analysis. Ballpark estimates per scan:
- Small scan (single target, <50 findings): $0.05 to $0.50
- Medium scan (multiple subdomains, 50-200 findings): $0.50 to $3.00
- Large scan (broad recon, 200+ findings): $3.00 to $15.00
At OpenAI's published per-token rates, GPT-4o-mini costs more than 80% less than GPT-4o. Reports are cached in the database, so re-viewing a previously analyzed vulnerability incurs no additional cost.
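You can turn these ranges into a figure for your own scans with simple arithmetic: billable findings × (input tokens × input price + output tokens × output price). Cached findings cost nothing. In the sketch below, the per-finding token counts (1,500 in, 800 out) are hypothetical placeholders. The prices are parameters you copy from OpenAI's current pricing page, not values taken from this guide.

```python
from dataclasses import dataclass


@dataclass
class ModelPricing:
    """USD per 1M tokens; copy current values from OpenAI's pricing page."""
    input_per_million: float
    output_per_million: float


def estimate_scan_cost(
    findings: int,
    already_analyzed: int,
    pricing: ModelPricing,
    input_tokens_per_finding: int = 1_500,   # hypothetical prompt size
    output_tokens_per_finding: int = 800,    # hypothetical response size
) -> float:
    """Estimate USD spent on LLM analysis for one scan.

    Findings whose analysis is already cached are not billed again,
    so only the remainder counts toward the total.
    """
    billable = max(findings - already_analyzed, 0)
    per_finding = (
        input_tokens_per_finding * pricing.input_per_million
        + output_tokens_per_finding * pricing.output_per_million
    ) / 1_000_000
    return billable * per_finding
```

Run it once with GPT-4o rates and once with GPT-4o-mini rates for a typical scan size to see the difference for your workload.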
Option B: Ollama (Local/Self-Hosted)
Best for teams that require data to stay on-premises or want to eliminate ongoing API costs.
Why Local?
- No data leaves your VM — all LLM inference runs locally.
- No per-token API costs — after initial setup, usage is free.
- Full control — choose your model, tune performance, and run offline.
Installing Ollama
If Ollama is not pre-installed on your reNgine Cloud VM:
```shell
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable ollama
sudo systemctl start ollama
```
Verify it is running:
```shell
curl http://localhost:11434/api/tags
```
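The /api/tags response is JSON with a "models" array, and each entry carries a "name" such as llama3:8b. A minimal sketch that reads it from Python follows. The function names are our own. An empty list means the server is up but no model has been pulled yet.

```python
import json
import urllib.request


def parse_model_names(tags_json: str) -> list:
    """Extract model names (e.g. "llama3:8b") from an /api/tags response body."""
    payload = json.loads(tags_json)
    return [model["name"] for model in payload.get("models", [])]


def installed_models(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> list:
    """Query a running Ollama server and return the models it has downloaded."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))
```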
Configuring reNgine for Ollama
reNgine connects to Ollama at http://ollama:11434 by default (the Docker service name). If Ollama runs on the host machine instead of in Docker, set the OLLAMA_INSTANCE environment variable in your reNgine configuration:
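```
OLLAMA_INSTANCE=http://host.docker.internal:11434
```
On Linux hosts, host.docker.internal does not resolve inside containers unless you map it, for example with extra_hosts: ["host.docker.internal:host-gateway"] on the reNgine containers in docker-compose. Ollama also listens only on 127.0.0.1 by default. Set OLLAMA_HOST=0.0.0.0 for the Ollama service (for a system install, via sudo systemctl edit ollama) and keep port 11434 closed to outside traffic with your firewall or security group.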
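The connection rule described above amounts to this: use OLLAMA_INSTANCE if it is set, otherwise fall back to the Docker service name. The sketch below mirrors that rule so you can see what a given environment resolves to. It illustrates the documented behavior and is not reNgine's own code. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional

# Default used when Ollama runs as the "ollama" service on the Docker network.
DEFAULT_OLLAMA_INSTANCE = "http://ollama:11434"


def resolve_ollama_instance(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OLLAMA_INSTANCE if set and non-blank, otherwise the Docker default."""
    source = os.environ if env is None else env
    value = source.get("OLLAMA_INSTANCE", "").strip()
    return (value or DEFAULT_OLLAMA_INSTANCE).rstrip("/")
```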
In the reNgine UI, open the Ollama settings panel to select and download models.
GPU Instance Types for Good Performance
A GPU dramatically improves local inference speed. Recommended instance types:
| Provider | Instance Type | GPU | VRAM |
|---|---|---|---|
| AWS | g4dn.xlarge | NVIDIA T4 | 16 GB |
| AWS | g5.xlarge | NVIDIA A10G | 24 GB |
| Azure | NC4as_T4_v3 | NVIDIA T4 | 16 GB |
| Azure | NC6s_v3 | NVIDIA V100 | 16 GB |
CPU-only: Ollama works without a GPU, but expect significantly slower inference (minutes per analysis instead of seconds). This is suitable for small targets or infrequent scans.
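To see where your instance falls between "seconds" and "minutes", time a single generation. Ollama's non-streaming /api/generate response includes eval_count (tokens generated) and eval_duration (in nanoseconds). Their ratio gives tokens per second. The sketch below assumes Ollama is reachable on localhost and that llama3:8b is already pulled. The prompt and function names are our own.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generation speed from eval_count (tokens) and eval_duration (nanoseconds)."""
    return response["eval_count"] * 1e9 / response["eval_duration"]


def benchmark(model: str = "llama3:8b", base_url: str = "http://localhost:11434") -> float:
    """Run one short, non-streaming generation and report its tokens/second."""
    body = json.dumps({"model": model, "prompt": "List three common web ports.", "stream": False})
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: the first call also loads the model, and CPU-only hosts are slow.
    with urllib.request.urlopen(req, timeout=600) as resp:
        return tokens_per_second(json.load(resp))
```

Run it once on a CPU-only VM and once on a GPU instance to compare the two for your model of choice.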
Recommended Models by Available RAM
| RAM | Recommended Models | Notes |
|---|---|---|
| 8 GB | llama3:8b, mistral:7b | Good baseline performance |
| 16 GB | llama3:8b (larger context), codellama:13b | Better for detailed vulnerability analysis |
| 32 GB+ | llama3:70b (quantized), mixtral:8x7b | Best local quality, approaches cloud model output |
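If you provision instances from a script, you can encode the table as a simple tier lookup. This is a sketch: the tiers are copied from the table, the function name is hypothetical, and you pass the instance's advertised RAM (e.g. 16 for a 16 GB VM). The OS reports slightly less than the advertised figure.

```python
# Tiers from the table: (minimum advertised RAM in GB, suggested Ollama models).
# The table specifies a quantized build for the 70b model.
RAM_TIERS = [
    (32, ["llama3:70b", "mixtral:8x7b"]),
    (16, ["llama3:8b", "codellama:13b"]),
    (8, ["llama3:8b", "mistral:7b"]),
]


def recommend_models(ram_gb: float) -> list:
    """Return suggestions for the largest tier the machine meets, or [] below 8 GB."""
    for min_ram, models in RAM_TIERS:
        if ram_gb >= min_ram:
            return list(models)  # copy so callers cannot mutate the table
    return []
```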
Download a model from the reNgine UI or via CLI:
```shell
ollama pull llama3:8b
```
Choosing Between Cloud and Local
| Consideration | OpenAI (Cloud) | Ollama (Local) |
|---|---|---|
| Setup complexity | Minimal — just add an API key | Moderate — install Ollama, download models |
| Data privacy | Data sent to OpenAI servers | All data stays on your VM |
| Ongoing cost | Pay per token | Free after setup (GPU instance cost applies) |
| Output quality | Best (GPT-4o) | Good to very good (depends on model and size) |
| Speed | Fast (cloud infrastructure) | Fast with GPU, slow on CPU-only |
| Offline capable | No | Yes |
Recommendation: Start with OpenAI using GPT-4o-mini to evaluate the features. If data residency or cost is a concern, switch to Ollama with a GPU-backed instance and llama3:8b or larger.
Troubleshooting
“API key invalid”
Regenerate your key at platform.openai.com/api-keys. Check for leading or trailing whitespace when pasting, and make sure the key has not been revoked or expired.
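Whitespace is not the only thing that can come along when you copy a key. Quotation marks and invisible Unicode characters, such as zero-width spaces from rich-text editors, can also make an otherwise valid key fail. The minimal check below runs on the pasted value before you save it. The function name and messages are our own.

```python
import unicodedata


def diagnose_pasted_key(raw: str) -> list:
    """List common copy-paste problems in an API key string (empty list = none found)."""
    problems = []
    if raw != raw.strip():
        problems.append("leading or trailing whitespace")
    for ch in raw.strip():
        # Category "Cf" covers invisible format characters such as U+200B (zero-width space).
        if ch.isspace() or unicodedata.category(ch) == "Cf":
            problems.append(f"unexpected character U+{ord(ch):04X} inside the key")
        elif ch in "\"'\u201c\u201d\u2018\u2019":
            problems.append(f"quote character {ch!r} pasted with the key")
    if not raw.strip():
        problems.append("key is empty")
    return problems
```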
Ollama not responding
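Check whether the service is running: sudo systemctl status ollama (system install) or docker ps | grep ollama (Docker). Then confirm the endpoint is reachable from inside the reNgine container, not just from the host: run curl http://ollama:11434/api/tags there (for example via docker exec), substituting your OLLAMA_INSTANCE URL if you set one.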
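When you are unsure which address works from a given place (host shell or container), a small probe can narrow it down by trying each endpoint mentioned in this guide. reNgine's web container is a Django application, so Python 3 should already be available there. Running the probe inside the container shows the container's view of the network. The function names are our own.

```python
import urllib.request

# Base URLs used in this guide: Docker service name, host gateway, loopback.
CANDIDATES = [
    "http://ollama:11434",
    "http://host.docker.internal:11434",
    "http://localhost:11434",
]


def ollama_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """True if GET {base_url}/api/tags answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError (OSError subclasses), DNS failures, refused connections,
        # timeouts and malformed URLs all count as "not reachable".
        return False


def probe(candidates=CANDIDATES) -> dict:
    """Check every candidate URL and report which ones respond."""
    return {url: ollama_reachable(url) for url in candidates}
```

Whichever URL reports True from inside the container is the value to use for OLLAMA_INSTANCE.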
Out of memory with local model
Use a smaller model (e.g., llama3:8b instead of llama3:70b) or increase your VM RAM.
AI features not appearing in the UI
These features require reNgine 2.0 or later. Check your version in the reNgine dashboard and update if needed.
Next Steps
Explore more configuration guides and tutorials at hailbytes.com/tutorials.
Still need help? Open a ticket at support.hailbytes.com.