Ollama (Cloud)
Gemma 3, Qwen 3, Mistral Large, Kimi, DeepSeek V4 and more via Ollama Cloud
Ollama Cloud is a hosted inference service offering a wide catalogue of top open-weight models — from Google's Gemma 3 and Alibaba's Qwen 3 to Mistral Large, Kimi K2, DeepSeek V4 Pro/Flash, MiniMax, and more. Connect with an API token and select any model from the list without managing your own GPU infrastructure.
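Since Ollama Cloud speaks the standard Ollama HTTP API, a chat call reduces to a POST with a bearer token. The sketch below only builds the request headers and payload; the endpoint URL, the `Bearer` auth scheme, and the model name are assumptions for illustration, not confirmed values from this page.

```python
# Assumed chat endpoint; Ollama Cloud exposes an Ollama-compatible HTTP API.
OLLAMA_CLOUD_URL = "https://ollama.com/api/chat"

def build_chat_request(model: str, prompt: str, token: str) -> tuple[dict, dict]:
    """Return (headers, payload) for a non-streaming chat call.

    The Authorization scheme shown here is an assumption; check your
    account settings for the exact header format.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,                                   # e.g. any ID from the list below
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,                                  # ask for a single JSON response
    }
    return headers, payload

headers, payload = build_chat_request("gemma3:27b", "Hello!", token="YOUR_TOKEN")
```

From here you would send `payload` to `OLLAMA_CLOUD_URL` with any HTTP client (e.g. `requests.post(OLLAMA_CLOUD_URL, headers=headers, json=payload)`).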
Official documentation

Available Models
gemma4:e2b
Gemma 4 models are designed to deliver frontier-level performance at each size. They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.
Best for: General tasks, reasoning
gemma4:31b
Gemma 4 models are designed to deliver frontier-level performance at each size. They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.
Best for: General tasks, reasoning
gemma3:27b
Google's Gemma 3 27B. Strong general-purpose performance and instruction following.
Best for: General tasks, reasoning
gemma3:12b
Gemma 3 12B — balanced size and capability for production use.
Best for: Production workflows
gemma3:4b
Gemma 3 4B — the smallest and fastest Gemma 3 variant.
Best for: Low-latency, lightweight tasks
qwen3.5:397b
Alibaba's Qwen 3.5 397B flagship. Massive capacity for the most demanding tasks.
Best for: Most complex reasoning and analysis
qwen3-vl:235b
Qwen 3 Vision-Language 235B. Multimodal model handling text and images.
Best for: Multimodal workflows
mistral-large-3:675b
Mistral Large 3 675B. Mistral's most powerful open-weight model.
Best for: Complex reasoning, enterprise tasks
ministral-3:8b
Ministral 3 8B — compact and fast for high-throughput inference.
Best for: Fast, cost-efficient inference
kimi-k2:1t
Moonshot AI's Kimi K2 1T parameter model. Strong long-context reasoning.
Best for: Long-context tasks, complex reasoning
kimi-k2-thinking
Kimi K2 Thinking variant with extended chain-of-thought capabilities.
Best for: Deep reasoning, math, science
qwen3-coder:480b
Qwen 3 Coder 480B — specialised for code generation and technical tasks.
Best for: Code generation, debugging, review
deepseek-v4-pro
DeepSeek V4 Pro — DeepSeek's latest flagship model with significantly improved reasoning, coding, and instruction-following over V3.
Best for: Advanced reasoning, code generation, complex analysis
deepseek-v4-flash
DeepSeek V4 Flash — a faster, cost-efficient variant of DeepSeek V4 Pro optimised for high-throughput production workloads.
Best for: High-volume pipelines, cost-efficient inference
deepseek-v3.1:671b
DeepSeek V3.1 671B. Powerful general-purpose model with strong reasoning.
Best for: General tasks, cost-efficient reasoning
minimax-m2
MiniMax M2 — large-scale multimodal model from MiniMax AI.
Best for: Long-context, multimodal tasks
devstral-2:123b
Devstral 2 123B — Mistral's code-focused model for agentic development tasks.
Best for: Agentic coding, software engineering
glm-5
Zhipu AI's GLM-5 — strong multilingual and reasoning capabilities.
Best for: Multilingual tasks, general reasoning
gpt-oss:120b
GPT OSS 120B open-source model — high capability for open-weight inference.
Best for: Advanced general-purpose tasks
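The "Best for" notes above suggest a simple routing pattern: map a task category to a model ID and fall back to a general-purpose default. The model IDs below come from this catalogue; the mapping keys and the helper itself are a hypothetical sketch, not part of the product.

```python
# Illustrative task-to-model routing based on the "Best for" notes above.
BEST_FOR = {
    "multimodal": "qwen3-vl:235b",
    "coding": "qwen3-coder:480b",
    "long-context": "kimi-k2:1t",
    "low-latency": "gemma3:4b",
    "general": "gemma3:27b",
}

def pick_model(task: str, default: str = "gemma3:27b") -> str:
    """Return the model ID for a task category, or a default when unknown."""
    return BEST_FOR.get(task, default)
```

For example, `pick_model("coding")` returns `"qwen3-coder:480b"`, while an unrecognised task falls back to `"gemma3:27b"`.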
Setup
Create an Ollama Cloud account
Sign up at ollama.com and navigate to the Cloud section to get started with hosted inference.
Generate an API token
In your Ollama Cloud account settings, generate an API token. Copy the token value — it is only shown once.
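Because the token is shown only once, store it somewhere durable, such as an environment variable, rather than hardcoding it. A minimal sketch, assuming a variable name of your choosing (`OLLAMA_CLOUD_API_TOKEN` here is illustrative):

```python
import os

def load_ollama_token(var: str = "OLLAMA_CLOUD_API_TOKEN") -> str:
    """Read the API token from the environment, failing loudly if unset.

    The variable name is illustrative; use whatever your deployment
    conventions dictate.
    """
    token = os.environ.get(var)
    if not token:
        raise RuntimeError(f"Set {var} before connecting to Ollama Cloud")
    return token
```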
Add the provider in CipherSense
Go to Organization Settings > LLM Providers > Add Provider. Select Ollama (Cloud), paste your API token, select a model from the list, and click Save.
Connection Fields
Fields required when adding this provider in Organization Settings › LLM Providers.
| Field | Required | Description |
|---|---|---|
| API Token | Required | Your Ollama Cloud API token. |
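If you script provider setup rather than using the UI, the table above implies a single required field. A hypothetical validation helper (the field key `api_token` is an assumption, not a documented config schema):

```python
# Required fields per the connection table above; the key name is assumed.
REQUIRED_FIELDS = {"api_token"}

def validate_provider_config(config: dict) -> list[str]:
    """Return sorted names of missing required fields (empty when valid)."""
    return sorted(f for f in REQUIRED_FIELDS if not config.get(f))
```

`validate_provider_config({"api_token": "tok"})` returns an empty list, signalling the config is complete.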
Ready to add Ollama (Cloud)?
Configure this provider in your Organization Settings and use it in any workflow.