HuggingFace
Llama 3.x, Mistral, Qwen, Phi, Gemma and more via the HuggingFace Inference API
HuggingFace is the leading hub for open-source AI models. Their Inference API lets you run curated community and commercially-licensed models — including Llama 3, Mistral, Qwen, Phi, and Gemma — via a simple REST interface without managing GPU infrastructure.
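For example, a text-generation call is a single authenticated POST to the model's serverless endpoint. A minimal sketch in Python (the token value is a placeholder; the model ID is one from the list below):

```python
import requests

# Serverless Inference API: one endpoint per model ID
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-8B-Instruct"
HEADERS = {"Authorization": "Bearer hf_your_token_here"}  # placeholder token

payload = {
    "inputs": "Summarise the benefits of serverless inference in one sentence.",
    "parameters": {"max_new_tokens": 100, "temperature": 0.7},
}

response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # typically [{"generated_text": "..."}]
```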
Official documentation

Available Models
meta-llama/Meta-Llama-3.1-8B-Instruct
Meta's Llama 3.1 8B instruction-tuned model. Efficient and capable for general tasks.
Best for: General tasks, low-latency inference
meta-llama/Meta-Llama-3.1-70B-Instruct
Llama 3.1 70B instruction model. Strong reasoning and instruction following.
Best for: Complex reasoning, document analysis
meta-llama/Llama-3.3-70B-Instruct
Latest Llama 3.3 70B. Improved performance over Llama 3.1 at the same size.
Best for: Advanced reasoning, code generation
mistralai/Mistral-7B-Instruct-v0.3
Mistral AI's compact 7B instruction model. Fast and efficient.
Best for: Cost-efficient, fast inference
mistralai/Mixtral-8x7B-Instruct-v0.1
Sparse mixture-of-experts model: eight 7B experts with two routed per token. Strong quality at moderate cost.
Best for: High-quality open-source inference
mistralai/Mistral-Small-3.1-24B-Instruct-2503
Mistral Small 3.1 24B — a capable mid-size model with multimodal support.
Best for: Balanced performance and cost
Qwen/Qwen2.5-72B-Instruct
Alibaba's Qwen 2.5 72B. Strong multilingual capabilities and instruction following.
Best for: Multilingual workflows, general tasks
Qwen/Qwen2.5-Coder-32B-Instruct
Qwen's code-specialised 32B model. State-of-the-art open-source coding performance.
Best for: Code generation, review, debugging
microsoft/Phi-3.5-mini-instruct
Microsoft's small but capable Phi-3.5 mini. Great on limited hardware.
Best for: Edge inference, low-resource environments
microsoft/phi-4
Microsoft's Phi-4 model. Strong reasoning in a compact size.
Best for: Reasoning tasks on constrained hardware
google/gemma-2-9b-it
Google's Gemma 2 9B instruction-tuned model. Efficient and well-rounded.
Best for: General-purpose inference
google/gemma-2-27b-it
Google's Gemma 2 27B instruction model. Higher capability than the 9B variant.
Best for: More demanding general tasks
google/gemma-3-4b-pt
Google's Gemma 3 4B pretrained (base) model. A compact newer-generation Gemma; the -pt checkpoint is not instruction-tuned.
Best for: Lightweight inference, fine-tuning base
google/gemma-3-27b-it
Google's Gemma 3 27B instruction-tuned model. The largest Gemma 3 variant, with multimodal support.
Best for: Advanced reasoning, large-scale tasks
HuggingFaceH4/zephyr-7b-beta
HuggingFace's Zephyr 7B fine-tuned for helpful and harmless dialogue.
Best for: Conversational AI, instruction following
Setup
Create a HuggingFace account
Sign up at huggingface.co. A free account gives access to many public models.
Generate an access token
Go to your profile Settings > Access Tokens. Click New token, select Read permissions, and copy the token.
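To confirm the token is valid before entering it anywhere, you can call HuggingFace's whoami endpoint. A quick sketch (the token value is a placeholder):

```python
import requests

TOKEN = "hf_your_token_here"  # placeholder - paste your real token

resp = requests.get(
    "https://huggingface.co/api/whoami-v2",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["name"])  # the account the token belongs to
```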
Choose a model
Browse models at huggingface.co/models. Note the model ID (e.g. meta-llama/Meta-Llama-3.1-8B-Instruct). Ensure it supports the Inference API — look for the Inference API badge on the model page.
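The same check can be done programmatically. A sketch using the huggingface_hub client (assumes `pip install huggingface_hub`):

```python
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("meta-llama/Meta-Llama-3.1-8B-Instruct")

print(info.pipeline_tag)  # e.g. "text-generation" - suitable for chat-style use
print(info.gated)         # truthy if the model requires an access request first
```

Note that gated models (most Llama and some Gemma checkpoints) require accepting the license on the model page before your token can use them.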
Add the provider in CipherSense
Go to Organization Settings > LLM Providers > Add Provider. Select HuggingFace, enter your access token and model ID, and click Save.
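Before saving, it can help to run one end-to-end generation with the exact token and model ID you are about to enter. A sketch using huggingface_hub's InferenceClient (placeholder token):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    token="hf_your_token_here",  # placeholder - the token you will enter in CipherSense
)

reply = client.chat_completion(
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    max_tokens=16,
)
print(reply.choices[0].message.content)
```

If this returns a completion, the same credentials should work once saved in CipherSense.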
Connection Fields
Fields required when adding this provider in Organization Settings › LLM Providers.
| Field | Required | Description |
|---|---|---|
| Access Token | Required | Your HuggingFace access token from huggingface.co/settings/tokens. |
| Model ID | Required | The HuggingFace model ID in owner/model-name format. |
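A model ID that is not in owner/model-name form will fail at request time, so it can be worth validating the field early. A minimal, illustrative check (the regex approximates, rather than exactly matches, HuggingFace's repo naming rules):

```python
import re

# Rough shape check: "owner/model-name" with common repo-name characters
MODEL_ID_PATTERN = re.compile(r"^[\w.-]+/[\w.-]+$")

for candidate in ("meta-llama/Meta-Llama-3.1-8B-Instruct", "Meta-Llama-3.1-8B-Instruct"):
    status = "ok" if MODEL_ID_PATTERN.match(candidate) else "missing owner/ prefix"
    print(f"{candidate}: {status}")
```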
Ready to add HuggingFace?
Configure this provider in your Organization Settings and use it in any workflow.