QUICK INFO
| Difficulty | Beginner |
|---|---|
| Time Required | 15-20 minutes |
| Prerequisites | Terminal/command line basics, 8GB RAM minimum (16GB recommended), 3GB free disk space |
| Tools Needed | Windows 10/11 (x64/ARM), Windows Server 2025, or macOS with Apple silicon |
What You'll Learn:
- Install Foundry Local via package manager or manual download
- Run your first local AI model with a single command
- Integrate local models into applications using Python, JavaScript, or C# SDKs
- Manage cached models and control the Foundry service
Foundry Local runs generative AI models directly on your hardware with no Azure subscription, no API keys, and no data leaving your device. This guide covers installation, running models via CLI, and integrating with applications through the OpenAI-compatible API.
Getting Started
System Requirements
Operating Systems:
- Windows 10 (x64), Windows 11 (x64/ARM), Windows Server 2025
- macOS (Apple silicon only)
Hardware:
- Minimum: 8GB RAM, 3GB free disk space
- Recommended: 16GB RAM, 15GB free disk space
Optional Hardware Acceleration:
- NVIDIA GPU (RTX 2000 series or newer)
- AMD GPU (6000 series or newer)
- Intel iGPU, Intel NPU (requires 32GB+ system memory)
- Qualcomm Snapdragon X Elite (8GB+ memory)
- Apple silicon (GPU acceleration included)
NPU support on Windows requires version 24H2 or later. Intel NPU users should install the Intel NPU driver separately.
Installation
Windows (via WinGet):
Open PowerShell or Command Prompt and run:
winget install Microsoft.FoundryLocal
macOS (via Homebrew):
brew tap microsoft/foundrylocal
brew install foundrylocal
Verify the installation:
foundry --help
Expected result: A list of available commands and their descriptions.
Manual Installation (Alternative):
Download installers from the GitHub releases page. On Windows, download the .msix package matching your architecture (x64 or arm64) and install via PowerShell:
Add-AppxPackage .\FoundryLocal.msix
Running Your First Model
Start an Interactive Chat
Run a model with a single command:
foundry model run phi-3.5-mini
Foundry Local downloads the model variant optimized for your hardware automatically. On NVIDIA systems, it fetches the CUDA version. On Qualcomm NPUs, it downloads the NPU-optimized variant. Without GPU or NPU, it defaults to the CPU version.
Expected result: After download completes, an interactive chat session starts. Type your prompt and press Enter to receive responses.
Available Models
List all models in the catalog:
foundry model list
The output displays model aliases, supported devices (CPU/GPU/NPU), file sizes, and licenses.
Common models include:
| Model | Parameters | Use Case |
|---|---|---|
| phi-3.5-mini | 3.8B | General chat, coding assistance |
| phi-4 | 14B | Complex reasoning, analysis |
| qwen2.5-0.5b | 0.5B | Lightweight tasks, low-resource devices |
| qwen2.5-coder-14b | 14B | Code generation |
| mistral-7b-v0.2 | 7B | General purpose, multilingual |
| deepseek-r1 | Various | Reasoning tasks |
Model Information
Get details about a specific model before running:
foundry model info phi-3.5-mini
This shows available variants, hardware requirements, license terms, and download size.
CLI Command Reference
Model Commands
| Command | Description |
|---|---|
| foundry model list | List all available models |
| foundry model run <model> | Download (if needed) and start interactive chat |
| foundry model download <model> | Download model without running |
| foundry model load <model> | Load model into service memory |
| foundry model unload <model> | Remove model from memory |
| foundry model info <model> | Display model details |
Service Commands
| Command | Description |
|---|---|
| foundry service start | Start the Foundry Local service |
| foundry service stop | Stop the service |
| foundry service restart | Restart the service |
| foundry service status | Check service status and endpoint |
| foundry service ps | List currently loaded models |
Cache Commands
| Command | Description |
|---|---|
| foundry cache list | List downloaded models |
| foundry cache location | Show cache directory path |
| foundry cache remove <model> | Delete a model from cache |
| foundry cache cd <path> | Change cache directory |
Filtering Models
Filter the model list by hardware or task:
foundry model list --filter device=GPU
foundry model list --filter task=chat-completion
foundry model list --filter alias=phi*
Negate filters with !:
foundry model list --filter device=!CPU
Integrating with Applications
Foundry Local exposes an OpenAI-compatible REST API. After loading a model, applications can send requests to the local endpoint.
Check the API Endpoint
foundry service status
The output shows the endpoint URL, typically http://127.0.0.1:5273/v1.
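To verify the endpoint programmatically, query the service's model listing. This is a minimal sketch, assuming the default port shown above and the standard OpenAI-style /v1/models route; substitute the endpoint that foundry service status reports on your machine:
import requests

# Endpoint reported by foundry service status; adjust if yours differs
endpoint = "http://127.0.0.1:5273/v1"

resp = requests.get(f"{endpoint}/models")
resp.raise_for_status()

# Print the exact model IDs the service currently exposes
for model in resp.json()["data"]:
    print(model["id"])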
Python Integration
Install the SDK:
pip install foundry-local-sdk openai
Example script:
import openai
from foundry_local import FoundryLocalManager

alias = "phi-3.5-mini"
manager = FoundryLocalManager(alias)

client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key
)

response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "Explain recursion in programming."}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
The FoundryLocalManager starts the service if not running and loads the specified model automatically.
JavaScript/Node.js Integration
Install dependencies:
npm install foundry-local-sdk openai
Example:
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

const alias = "phi-3.5-mini";
const manager = new FoundryLocalManager();
const modelInfo = await manager.init(alias);

const openai = new OpenAI({
  baseURL: manager.endpoint,
  apiKey: manager.apiKey,
});

const stream = await openai.chat.completions.create({
  model: modelInfo.id,
  messages: [{ role: "user", content: "What is the golden ratio?" }],
  stream: true,
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
C# Integration
Add the NuGet package:
dotnet add package Microsoft.AI.Foundry.Local.WinML
The C# SDK runs entirely in-process without requiring the Foundry CLI or HTTP calls to the local service.
using Microsoft.AI.Foundry.Local;

var config = new Configuration { AppName = "my-app" };
await FoundryLocalManager.CreateAsync(config);
var mgr = FoundryLocalManager.Instance;

var catalog = await mgr.GetCatalogAsync();
var model = await catalog.GetModelAsync("qwen2.5-0.5b");

await model.DownloadAsync(progress => Console.Write($"\rDownloading: {progress:F1}%"));
await model.LoadAsync();

var chatClient = await model.GetChatClientAsync();
var messages = new List<ChatMessage>
{
    new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
};

var response = chatClient.CompleteChatStreamingAsync(messages, CancellationToken.None);
await foreach (var chunk in response)
{
    Console.Write(chunk.Choices[0].Message.Content);
}

await model.UnloadAsync();
Direct REST API Usage
Without SDKs, send requests directly to the endpoint:
curl http://127.0.0.1:5273/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Phi-3.5-mini-instruct-generic-gpu",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
Use the exact model ID from foundry model list, not the alias, when calling the API directly.
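The same call works from Python without any SDK. A minimal sketch, assuming the default endpoint and reusing the example model ID from the curl request above; replace it with an ID from your own foundry model list output:
import requests

endpoint = "http://127.0.0.1:5273/v1"

payload = {
    # Exact model ID, not the alias -- see foundry model list
    "model": "Phi-3.5-mini-instruct-generic-gpu",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
}

resp = requests.post(f"{endpoint}/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])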
Troubleshooting
Symptom: foundry: command not found after installation
Fix: Close and reopen your terminal to refresh the PATH. On Windows, try opening a new PowerShell window as Administrator.
Symptom: Exception: Request to local service failed when running model list
Fix: The service may have crashed. Run foundry service restart and try again.
Symptom: Model download stalls or fails
Fix: Check your internet connection. If resuming a partial download, run foundry cache remove <model> then retry the download.
Symptom: Out of memory errors during inference
Fix: Try a smaller model (e.g., qwen2.5-0.5b instead of phi-4). Close other memory-intensive applications. On systems with less than 16GB RAM, avoid models larger than 7B parameters.
Symptom: GPU not being utilized (inference running on CPU)
Fix: Verify GPU drivers are up to date. NVIDIA requires driver version 32.0.15.5585 or newer with CUDA 12.5+. Run foundry model list to confirm GPU variants appear.
Symptom: NPU model not available on Windows ARM
Fix: NPU support requires Windows 24H2 or later. Check Windows Update for the latest version. Intel NPU users must install the Intel NPU driver separately.
What's Next
You now have Foundry Local running AI models locally with full privacy. For production deployments, see the Foundry Local documentation for converting custom Hugging Face models to ONNX format using Microsoft Olive.
PRO TIPS
- Use foundry model download <model> to pre-cache models before going offline
- Set a custom cache location on a larger drive with foundry cache cd /path/to/drive
- Run foundry service set --port 8081 to change the default API port (5273) if it conflicts with other services; a client sketch for a custom port follows this list
- Models stay in memory for 10 minutes by default after the last request (TTL). Load explicitly with foundry model load for persistent availability
- On multi-GPU systems, use foundry service set --gpu 1 to specify which GPU to use
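If you move the service to a custom port, point your client at the new address. A minimal sketch using the plain openai package, assuming port 8081 and that the local service accepts a placeholder API key (common for local OpenAI-compatible endpoints):
import openai

# After foundry service set --port 8081 and a service restart,
# the API should live at the new port (path assumed to stay /v1)
client = openai.OpenAI(
    base_url="http://127.0.0.1:8081/v1",
    api_key="local",  # placeholder; assumption: the local service does not validate keys
)

response = client.chat.completions.create(
    model="Phi-3.5-mini-instruct-generic-gpu",  # exact ID from foundry model list
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)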
COMMON MISTAKES
- Using the model alias instead of the full model ID when calling the REST API directly: The alias (e.g., phi-3.5-mini) works in CLI commands, but REST calls require the exact model ID (e.g., Phi-3.5-mini-instruct-generic-gpu). Get IDs from foundry model list.
- Forgetting to load the model before API calls: The SDKs handle this automatically, but direct REST API users must run foundry model load <model> first or the service returns 404.
- Installing on unsupported macOS (Intel): Foundry Local requires Apple silicon. Intel Macs are not supported.
- Running multiple large models simultaneously: Each loaded model consumes RAM/VRAM. Unload unused models with foundry model unload <model> before loading another.
PROMPT TEMPLATES
System Prompt for Structured Output
You are a data extraction assistant. Extract information from user text and return valid JSON only. No explanations. No markdown formatting.
Schema: {"name": string, "date": string, "amount": number}
Customize by: Modify the JSON schema to match your data structure.
Example output:
{"name": "Invoice #4521", "date": "2025-03-15", "amount": 1250.00}
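To try this template with the Python SDK setup from earlier, send it as the system message and parse the reply. A sketch, assuming the phi-3.5-mini alias and the foundry-local-sdk package shown above; small local models sometimes wrap output in extra text, so guard the JSON parse:
import json
import openai
from foundry_local import FoundryLocalManager

alias = "phi-3.5-mini"
manager = FoundryLocalManager(alias)
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

system_prompt = (
    "You are a data extraction assistant. Extract information from user text "
    "and return valid JSON only. No explanations. No markdown formatting.\n"
    'Schema: {"name": string, "date": string, "amount": number}'
)

response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Paid invoice #4521 for $1,250.00 on March 15, 2025."},
    ],
)

# Fail loudly if the model added stray text around the JSON
try:
    data = json.loads(response.choices[0].message.content)
    print(data["name"], data["date"], data["amount"])
except json.JSONDecodeError:
    print("Model did not return valid JSON:", response.choices[0].message.content)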
Code Review Assistant
Review this code for bugs, security issues, and performance problems. List findings as:
- [SEVERITY] Location: Description
Severity levels: CRITICAL, HIGH, MEDIUM, LOW
Customize by: Add language-specific rules or focus areas (e.g., "Focus on SQL injection vulnerabilities").
Example output:
- [HIGH] Line 23: User input passed directly to SQL query without sanitization
- [MEDIUM] Line 45: Unnecessary database call inside loop, move outside
- [LOW] Line 12: Variable 'temp' declared but never used
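The review template follows the same pattern. A sketch of a small helper, assuming the client and model ID from the Python example above; the file name in the usage comment is hypothetical:
def review_code(client, model_id, source):
    """Ask the local model to review source code using the template above."""
    prompt = (
        "Review this code for bugs, security issues, and performance problems. "
        "List findings as:\n"
        "- [SEVERITY] Location: Description\n"
        "Severity levels: CRITICAL, HIGH, MEDIUM, LOW\n\n" + source
    )
    response = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Usage with a hypothetical file:
# print(review_code(client, model_id, open("app.py").read()))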
FAQ
Q: Does Foundry Local send any data to Microsoft or the cloud?
A: No. All inference happens locally. An internet connection is only needed to download models initially. After caching, models run entirely offline.
Q: Can I use models from Hugging Face that aren't in the catalog?
A: Yes. Convert models to ONNX format using Microsoft Olive, then place them in a custom directory structure that Foundry Local can read. See the custom models documentation.
Q: What's the difference between the alias and model ID?
A: The alias (e.g., phi-3.5-mini) is a friendly name that auto-selects the best hardware variant. The model ID (e.g., Phi-3.5-mini-instruct-generic-gpu) specifies an exact variant. Use aliases in CLI commands; use IDs for direct API calls.
Q: How do I update Foundry Local?
A: On Windows: winget upgrade --id Microsoft.FoundryLocal. On macOS: brew upgrade foundrylocal.
Q: Can I run Foundry Local in a Docker container or CI/CD pipeline?
A: Foundry Local is designed for desktop/edge devices with direct hardware access. Container support is limited. For server deployments, consider Azure AI Foundry instead.
Q: What licenses apply to the models?
A: Each model has its own license (MIT, Apache 2.0, etc.) shown in foundry model info <model>. Review terms before commercial use.
RESOURCES
- Foundry Local GitHub Repository: Source code, issue tracking, release downloads
- Microsoft Learn Documentation: Official guides, architecture overview, SDK reference
- Foundry Local Discord: Community support and discussions
- Model Compilation Guide: Convert Hugging Face models to ONNX format