Self-hosted GitHub Actions runners on Azure: when, and how
When to move off GitHub-hosted runners onto your own Azure VMs or container apps and how to do it without inheriting an ops nightmare.
GitHub-hosted runners are great. Until they aren't. For most teams, they should be the default but a few specific scenarios make self-hosted runners worth the operational cost.
Here's when I switch, and the patterns I use to avoid the obvious traps.
When to switch (and when not to)
Switch to self-hosted when:
- You need access to a private network. Your tests need to reach a private database, a peered VNet, or an on-prem service. GitHub-hosted runners are public-internet only.
- You need bigger machines or GPUs. You can scale GitHub-hosted runners now, but for very large builds or GPU jobs, your own machine is cheaper.
- You have very high CI volume. Past several hundred CI hours per week, self-hosted starts paying back.
- You need specific OS / hardware not on the GitHub menu. ARM workloads, Windows-on-ARM, niche Linux distros.
Stay on GitHub-hosted when:
- You're a small team running standard builds.
- Your security policy doesn't require private networks.
- You don't have an SRE or platform engineer to run the runners.
The default should be GitHub-hosted. Self-hosted is an operational commitment.
The architecture I deploy on Azure
[GitHub Actions Workflow]
│ (queues a job)
▼
[GitHub Actions Runner Controller]
│ (provisions a runner pod just-in-time)
▼
[AKS or Azure Container App with managed identity]
│
├─ Private endpoint to ACR (pull build images)
├─ Access to peered VNets (your private resources)
└─ OIDC federated credential → Azure subscriptions
Three principles:
- Ephemeral runners. Each job runs on a fresh runner that's destroyed afterwards. No state sneaking between builds.
- Just-in-time provisioning. Runners exist only when there's work. Scale to zero when idle.
- Identity, not keys. Runners authenticate via managed identity to Azure resources, never with stored credentials.
Implementation paths
Option A: ARC (Actions Runner Controller) on AKS.
The mature, recommended path for organizations that already run AKS. ARC manages runner lifecycles via Kubernetes operators. Pros: scales well, mature, well-documented. Cons: you need an AKS cluster.
Option B: Azure Container Apps with the GitHub Actions runner image.
Lighter-weight than AKS. Container Apps has built-in autoscaling and serverless billing. Pros: less infrastructure. Cons: less mature for this use case, slightly more complex networking.
Option C: Plain Azure VM Scale Set with the runner agent.
Old-school but works. A VMSS with the runner installed via custom script. Cons: scaling is slower and you're managing VMs.
For most teams I default to ARC on AKS if they already have AKS, Container Apps if they don't.
The security checklist
Self-hosted runners are sensitive. They run untrusted code (your CI is essentially trusted, but a malicious PR could try to abuse it). Mitigations:
- Don't run on public repositories without
pull_request_targetrestrictions. A forked PR shouldn't get to execute on your runner. - Use ephemeral runners only. Fresh OS image every time.
- Limit what the runner's identity can do. Federated credentials scoped per workflow / branch / environment.
- Network isolation. Runners in their own subnet, with explicit NSG rules. They should reach only what they need.
- Image hardening. Use a minimal runner image (e.g., from
actions/runner-imagesrepo) and keep it patched. Don't use community runner images you can't audit. - Audit the secrets in scope. Self-hosted runners can see workflow secrets. Make sure your secrets are scoped per-environment, not all dumped in repo secrets.
Cost — the part that surprises people
Self-hosted runners can be cheaper or more expensive depending on usage:
- High volume + always-on: GitHub-hosted billing dominates. Self-hosted wins.
- Bursty + low utilization: GitHub-hosted wins. Idle self-hosted infrastructure still costs money.
- Specialized hardware: Self-hosted wins almost always.
Run the numbers before migrating. A spreadsheet with average jobs/day, average job duration, runner specs, and Azure compute pricing will tell you in 20 minutes whether the move pays back.
What I always set up on day one
- Diagnostic settings on the runner infra → Log Analytics workspace.
- Auto-shutdown for any non-AKS infrastructure during off-hours.
- A dashboard showing: jobs queued, jobs running, runners idle, runners failed.
- An alert if a runner has been "running" for >2 hours (likely a stuck job consuming budget).
- A runbook for "all runners are dead", the on-call playbook.
What to watch for after you migrate
Stale caches. Self-hosted runners persist disks across jobs by default. Useful for node_modules caching, but watch out for stale state. Use actions/cache explicitly rather than relying on filesystem persistence.
Secret drift. When you move workflows from GitHub-hosted to self-hosted, the secrets they need may differ (managed identity replaces some, but not all). Audit each workflow.
Concurrency limits. AKS clusters have node-pool quotas. Container Apps have replica limits. Hit these and jobs queue without obvious diagnostics. Set up alerting on queue depth.
The honest summary
Self-hosted runners are an operational choice. The technical setup is a couple of weeks; the running of them is forever. If your team has SRE or platform muscle, fine. If not, GitHub-hosted is almost always the better answer.
When in doubt, stay on GitHub-hosted. The day you genuinely need self-hosted, you'll know.