LLM Provider

Ollama (Cloud)

Gemma 3, Qwen 3, Mistral Large, Kimi, DeepSeek V4 and more via Ollama Cloud

Ollama Cloud is a hosted inference service offering a wide catalogue of top open-weight models — from Google's Gemma 3 and Alibaba's Qwen 3 to Mistral Large, Kimi K2, DeepSeek V4 Pro/Flash, MiniMax, and more. Connect with an API token and select any model from the list without managing your own GPU infrastructure.
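For reference, here is a minimal sketch of what an authenticated chat request against Ollama Cloud might look like. It assumes the standard Ollama `/api/chat` route hosted at ollama.com and a Bearer token auth scheme; verify both against the official documentation before relying on them.

```python
import json
import urllib.request

# Assumed endpoint: the standard Ollama chat route on the hosted service.
OLLAMA_CLOUD_URL = "https://ollama.com/api/chat"

def build_chat_request(api_token: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble an authenticated chat request (without sending it)."""
    payload = {
        "model": model,   # any model name from the catalogue below
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response rather than a stream
    }
    return urllib.request.Request(
        OLLAMA_CLOUD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_API_TOKEN", "gemma3:27b", "Hello!")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would then return the model's reply as JSON; the payload shape above is the same for every model in the list.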

Official documentation

Available Models

gemma4:e2b

128k tokens

Gemma 4 models are designed to deliver frontier-level performance at each size. They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.

Best for: General tasks, reasoning

gemma4:31b

128k tokens

Gemma 4 models are designed to deliver frontier-level performance at each size. They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.

Best for: General tasks, reasoning

gemma3:27b

128k tokens

Google's Gemma 3 27B. Strong general-purpose performance and instruction following.

Best for: General tasks, reasoning

gemma3:12b

128k tokens

Gemma 3 12B — balanced size and capability for production use.

Best for: Production workflows

gemma3:4b

128k tokens

Gemma 3 4B — the smallest and fastest Gemma 3 variant.

Best for: Low-latency, lightweight tasks

qwen3.5:397b

128k tokens

Alibaba's Qwen 3.5 397B flagship. Massive capacity for the most demanding tasks.

Best for: Most complex reasoning and analysis

qwen3-vl:235b

128k tokens

Qwen 3 Vision-Language 235B. Multimodal model handling text and images.

Best for: Multimodal workflows

mistral-large-3:675b

128k tokens

Mistral Large 3 675B. Mistral's most powerful open-weight model.

Best for: Complex reasoning, enterprise tasks

ministral-3:8b

128k tokens

Ministral 3 8B — compact and fast for high-throughput inference.

Best for: Fast, cost-efficient inference

kimi-k2:1t

128k tokens

Moonshot AI's Kimi K2 1T parameter model. Strong long-context reasoning.

Best for: Long-context tasks, complex reasoning

kimi-k2-thinking

128k tokens

Kimi K2 Thinking variant with extended chain-of-thought capabilities.

Best for: Deep reasoning, math, science

qwen3-coder:480b

128k tokens

Qwen 3 Coder 480B — specialised for code generation and technical tasks.

Best for: Code generation, debugging, review

deepseek-v4-pro

128k tokens

DeepSeek V4 Pro — DeepSeek's latest flagship model with significantly improved reasoning, coding, and instruction-following over V3.

Best for: Advanced reasoning, code generation, complex analysis

deepseek-v4-flash

128k tokens

DeepSeek V4 Flash — a faster, cost-efficient variant of DeepSeek V4 Pro optimised for high-throughput production workloads.

Best for: High-volume pipelines, cost-efficient inference

deepseek-v3.1:671b

128k tokens

DeepSeek V3.1 671B. Powerful general-purpose model with strong reasoning.

Best for: General tasks, cost-efficient reasoning

minimax-m2

1M tokens

MiniMax M2 — large-scale multimodal model from MiniMax AI.

Best for: Long-context, multimodal tasks

devstral-2:123b

128k tokens

Devstral 2 123B — Mistral's code-focused model for agentic development tasks.

Best for: Agentic coding, software engineering

glm-5

128k tokens

Zhipu AI's GLM-5 — strong multilingual and reasoning capabilities.

Best for: Multilingual tasks, general reasoning

gpt-oss:120b

128k tokens

GPT OSS 120B open-source model — high capability for open-weight inference.

Best for: Advanced general-purpose tasks

Setup

1. Create an Ollama Cloud account

Sign up at ollama.com and navigate to the Cloud section to get started with hosted inference.

2. Generate an API token

In your Ollama Cloud account settings, generate an API token. Copy the token value — it is only shown once.

3. Add the provider in CipherSense

Go to Organization Settings > LLM Providers > Add Provider. Select Ollama (Cloud), paste your API token, select a model from the list, and click Save.
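After saving the provider, you can sanity-check the token outside CipherSense by listing the models it can access. The sketch below assumes the standard Ollama `/api/tags` listing route and its response shape (`{"models": [{"name": ...}, ...]}`); confirm both against the official documentation.

```python
import json
import urllib.request

def list_models_request(api_token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the model listing (not sent here)."""
    return urllib.request.Request(
        "https://ollama.com/api/tags",  # assumed listing endpoint
        headers={"Authorization": f"Bearer {api_token}"},
    )

def model_names(tags_json: str) -> list[str]:
    """Extract model names from a /api/tags-style response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Parsing a response of the expected shape:
sample = '{"models": [{"name": "gemma3:27b"}, {"name": "qwen3-coder:480b"}]}'
print(model_names(sample))  # → ['gemma3:27b', 'qwen3-coder:480b']
```

If the request succeeds and returns a non-empty list, the token is valid and the provider should work in any workflow.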

Connection Fields

Fields required when adding this provider in Organization Settings › LLM Providers.

Field | Required | Description
API Token | Yes | Your Ollama Cloud API token.

Common Use Cases

Run frontier open-weight models without GPU infrastructure
Access Qwen, Mistral, Kimi, and DeepSeek in one place
Cost-efficient inference for open-source models
Agentic coding and multi-agent workflows

Ready to add Ollama (Cloud)?

Configure this provider in your Organization Settings and use it in any workflow.