divi-ai: AI Coding Assistant¶

Important

divi-ai is experimental. It runs on CPU with small local models, so answers may be inaccurate and knowledge is limited to what was indexed at build time. Always verify code against the official documentation.

divi-ai is a coding assistant for Divi that runs directly in your terminal. It answers questions, generates code examples, and explains APIs — all using a local LLM on your machine. No API keys required. After the first launch (which downloads the model), divi-ai works fully offline.

Installation¶

pip install qoro-divi[ai]

If installation fails due to llama-cpp-python, see Troubleshooting below.

Choosing a Model¶

On first launch, an interactive selector lets you pick a model. Choose one before launching so you know what to expect:

Key	Model	Download	Est. RAM	Context
`1.5b`	Qwen 2.5 Coder 1.5B	1.0 GB	~1.2 GB	8K
`3b`	Qwen 2.5 Coder 3B	1.9 GB	~2.3 GB	8K
`7b` (default)	Qwen 2.5 Coder 7B	4.5 GB	~5.4 GB	16K
`14b`	Qwen 2.5 Coder 14B	8.4 GB	~10.1 GB	16K
`e2b`	Gemma 4 E2B	2.9 GB	~3.5 GB	8K
`e4b`	Gemma 4 E4B	4.6 GB	~5.5 GB	8K

The Qwen Coder models are code-specialized and generally give better results for code generation. The Gemma models are general-purpose alternatives that work well for explanations and conceptual questions. Larger models produce better answers but need more RAM and run slower.

Hardware recommendations:

Apple Silicon with 16+ GB RAM: 7b or 14b
x86 with 32+ GB RAM: 7b or 14b
x86 with 16+ GB RAM: e4b or 7b
Less than 16 GB RAM: 1.5b, 3b, or e2b

First Launch¶

divi-ai

On the first run:

The interactive model selector opens (arrow keys to navigate, Enter to confirm).
The selected model is downloaded from HuggingFace (~1–9 GB depending on your choice). This requires an internet connection and may take a few minutes.
The model and search index are loaded into memory. This can take 30–60 seconds depending on your hardware.
The TUI opens and you can start asking questions.

Subsequent launches skip the download step and are much faster. Models are cached locally (the exact location is platform-dependent, determined by platformdirs). Delete model folders from the cache directory to free disk space.

Using the Chat Interface¶

Type a question and press Enter to get an answer. The header bar tracks how much of the model’s context window your conversation has used.

Press Escape to cancel generation mid-stream.
Use /reset before switching to a new topic to free context.
Use /retry if the answer seems incomplete or off.

Slash Commands¶

Command	Description
`/save <file>`	Save the last code block to a file (relative to your working directory). Automatically runs a syntax check on the saved file.
`/copy`	Copy the last code block to the clipboard. Requires `xclip` or `xsel` on Linux.
`/check`	Syntax-check all Python code blocks from the last response.
`/retry`	Re-run the query (including retrieval) to get a different response.
`/reset`	Clear conversation history and free context window space.
`/clear`	Clear the screen and reset history.
`/quit`, `/exit`	Exit the TUI.

CLI Options¶

divi-ai [OPTIONS]

--reselect-model: Forget the saved model preference and re-prompt for selection.
--top-k N: Number of documentation chunks retrieved per query (default: 8). Higher values give the model more context but use more of the context window. Lower values are faster but may miss relevant information.
--max-tokens N: Maximum tokens the model can generate per response (default: 1024).
--debug: Show index loading info and library messages.
--dev: Developer mode: show retrieved chunks, FAISS scores, sources, and token generation speed after each response.

Troubleshooting¶

llama-cpp-python fails to install

On Windows this is the most common installation failure. llama-cpp-python has no prebuilt wheels on PyPI, so pip install always downloads the source and compiles it. On Linux and macOS this usually succeeds silently when a C++ toolchain is present; on Windows it frequently fails. Try these in order:

Install a prebuilt wheel from abetlen’s index (works on Windows, Linux, and macOS):
```
pip install "llama-cpp-python==0.3.19" \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu \
    --only-binary=:all:
pip install "qoro-divi[ai]"
```
The explicit ==0.3.19 is required: PyPI hosts no wheels for this package, so without a version pin pip downloads the latest source release and compiles it. --only-binary=:all: causes pip to report an error immediately if no wheel matches your Python version and architecture, instead of silently compiling from source.
If you must build from source, install a C++ toolchain:
- Linux: build-essential (Debian/Ubuntu) or equivalent.
- macOS: Xcode Command Line Tools (xcode-select --install).
- Windows: Visual Studio Build Tools with the Desktop development with C++ workload. Run pip install from an x64 Native Tools Command Prompt for VS so MSVC’s cl.exe is on PATH.

Note

Windows: Strawberry Perl on PATH. If a Windows source build fails and the build log shows C:/Strawberry/c/bin/gcc.exe or any path under C:\Strawberry, CMake is picking up the MinGW compiler bundled with Strawberry Perl instead of MSVC. The vendored llama.cpp sources do not build cleanly with that toolchain. Either remove C:\Strawberry\c\bin from PATH in the current Command Prompt window, or launch an x64 Native Tools Command Prompt for VS before running pip.

“Context window exceeded” / answers cut off mid-sentence

The conversation has filled the model’s context window. Use /reset to clear history. If this happens frequently, switch to a model with a 16K context window (7b or 14b).

Slow or unusable on my machine

Try 1.5b or e2b — they run acceptably on most hardware. If even those are too slow, divi-ai may not be practical on your system.

Answers seem wrong or hallucinated

Try a larger model if your hardware allows it. See the important notice at the top of this page.

For Contributors¶

These commands are for Divi contributors rebuilding or evaluating the search index. Install the AI dependencies first with uv sync --extra ai to pull in only the AI stack, or uv sync --all-extras if you are also working on docs, tests, or other areas that need the full development environment.

python -m divi.ai help       # Show commands and workflow overview
python -m divi.ai build      # Rebuild the FAISS index from source
python -m divi.ai search     # Interactive search against the index
python -m divi.ai inspect    # Inspect assembled prompts (no LLM)
python -m divi.ai eval       # Run eval queries, save results
python -m divi.ai compare    # Compare two eval runs side-by-side

Typical development workflow:

Change source code or docs.
python -m divi.ai build to rebuild the index.
python -m divi.ai search or inspect to verify retrieval quality.
divi-ai to test end-to-end.

Note

If you run out of memory during build, reduce the batch size: python -m divi.ai build --batch-size 4