HuggingFace
Llama 3.x, Mistral, Qwen, Phi, Gemma and more via the HuggingFace Inference API
HuggingFace is the leading hub for open-source AI models. Their Inference API lets you run curated community and commercially-licensed models — including Llama 3, Mistral, Qwen, Phi, and Gemma — via a simple REST interface without managing GPU infrastructure.
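For example, a text-generation call is a single authenticated POST to the model's serverless endpoint. A minimal sketch in Python (the token value is a placeholder; the model ID is one from the list below):

```python
import requests

# Serverless Inference API: one endpoint per model ID
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-8B-Instruct"
HEADERS = {"Authorization": "Bearer hf_your_token_here"}  # placeholder token

payload = {
    "inputs": "Summarise the benefits of serverless inference in one sentence.",
    "parameters": {"max_new_tokens": 100, "temperature": 0.7},
}

response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # typically [{"generated_text": "..."}]
```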
Official documentation

Available Models
meta-llama/Meta-Llama-3.1-8B-Instruct
Meta's Llama 3.1 8B instruction-tuned model. Efficient and capable for general tasks.
Best for: General tasks, low-latency inference
meta-llama/Meta-Llama-3.1-70B-Instruct
Llama 3.1 70B instruction model. Strong reasoning and instruction following.
Best for: Complex reasoning, document analysis
meta-llama/Llama-3.3-70B-Instruct
Latest Llama 3.3 70B. Improved performance over Llama 3.1 at the same size.
Best for: Advanced reasoning, code generation
mistralai/Mistral-7B-Instruct-v0.3
Mistral AI's compact 7B instruction model. Fast and efficient.
Best for: Cost-efficient, fast inference
mistralai/Mixtral-8x7B-Instruct-v0.1
Sparse mixture-of-experts model: eight 7B experts with two routed per token. Strong quality at moderate cost.
Best for: High-quality open-source inference
mistralai/Mistral-Small-3.1-24B-Instruct-2503
Mistral Small 3.1 24B — a capable mid-size model with multimodal support.
Best for: Balanced performance and cost
Qwen/Qwen2.5-72B-Instruct
Alibaba's Qwen 2.5 72B. Strong multilingual capabilities and instruction following.
Best for: Multilingual workflows, general tasks
Qwen/Qwen2.5-Coder-32B-Instruct
Qwen's code-specialised 32B model. State-of-the-art open-source coding performance.
Best for: Code generation, review, debugging
microsoft/Phi-3.5-mini-instruct
Microsoft's small but capable Phi-3.5 mini. Great on limited hardware.
Best for: Edge inference, low-resource environments
microsoft/phi-4
Microsoft's Phi-4 model. Strong reasoning in a compact size.
Best for: Reasoning tasks on constrained hardware
google/gemma-2-9b-it
Google's Gemma 2 9B instruction-tuned model. Efficient and well-rounded.
Best for: General-purpose inference
google/gemma-2-27b-it
Google's Gemma 2 27B instruction model. Higher capability than the 9B variant.
Best for: More demanding general tasks
google/gemma-3-4b-pt
Google's Gemma 3 4B pretrained (base) model. A compact newer-generation Gemma; the -pt checkpoint is not instruction-tuned.
Best for: Lightweight inference, fine-tuning base
google/gemma-3-27b-it
Google's Gemma 3 27B instruction-tuned model. The largest Gemma 3 variant, with multimodal support.
Best for: Advanced reasoning, large-scale tasks
HuggingFaceH4/zephyr-7b-beta
HuggingFace's Zephyr 7B fine-tuned for helpful and harmless dialogue.
Best for: Conversational AI, instruction following
Setup
Create a HuggingFace account
Sign up at huggingface.co. A free account gives access to many public models.
Generate an access token
Go to your profile Settings > Access Tokens. Click New token, select Read permissions, and copy the token.
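To confirm the token is valid before entering it anywhere, you can call HuggingFace's whoami endpoint. A quick sketch (the token value is a placeholder):

```python
import requests

TOKEN = "hf_your_token_here"  # placeholder - paste your real token

resp = requests.get(
    "https://huggingface.co/api/whoami-v2",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["name"])  # the account the token belongs to
```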
Choose a model
Browse models at huggingface.co/models. Note the model ID (e.g. meta-llama/Meta-Llama-3.1-8B-Instruct). Ensure it supports the Inference API — look for the Inference API badge on the model page.
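The same check can be done programmatically. A sketch using the huggingface_hub client (assumes `pip install huggingface_hub`):

```python
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("meta-llama/Meta-Llama-3.1-8B-Instruct")

print(info.pipeline_tag)  # e.g. "text-generation" - suitable for chat-style use
print(info.gated)         # truthy if the model requires an access request first
```

Note that gated models (most Llama and some Gemma checkpoints) require accepting the license on the model page before your token can use them.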
Add the provider in CipherSense
Go to Organization Settings > LLM Providers > Add Provider. Select HuggingFace, enter your access token and model ID, and click Save.
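Before saving, it can help to run one end-to-end generation with the exact token and model ID you are about to enter. A sketch using huggingface_hub's InferenceClient (placeholder token):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    token="hf_your_token_here",  # placeholder - the token you will enter in CipherSense
)

reply = client.chat_completion(
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    max_tokens=16,
)
print(reply.choices[0].message.content)
```

If this returns a completion, the same credentials should work once saved in CipherSense.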
Connection Fields
Fields required when adding this provider in Organization Settings › LLM Providers.
| Field | Required | Description |
|---|---|---|
| Access Token | Required | Your HuggingFace access token from huggingface.co/settings/tokens. |
| Model ID | Required | The HuggingFace model ID in owner/model-name format. |
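A model ID that is not in owner/model-name form will fail at request time, so it can be worth validating the field early. A minimal, illustrative check (the regex approximates, rather than exactly matches, HuggingFace's repo naming rules):

```python
import re

# Rough shape check: "owner/model-name" with common repo-name characters
MODEL_ID_PATTERN = re.compile(r"^[\w.-]+/[\w.-]+$")

for candidate in ("meta-llama/Meta-Llama-3.1-8B-Instruct", "Meta-Llama-3.1-8B-Instruct"):
    status = "ok" if MODEL_ID_PATTERN.match(candidate) else "missing owner/ prefix"
    print(f"{candidate}: {status}")
```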
Ready to add HuggingFace?
Configure this provider in your Organization Settings and use it in any workflow.