Overview
This repository is a monorepo for the infrastructure and Kubernetes cluster powering my homelab. The project follows Infrastructure as Code (IaC) and GitOps principles so that infrastructure management is declarative, repeatable, and version-controlled. It relies on Ansible, Terraform, Kubernetes, Flux, NixOS, Renovate, Talos, and GitHub Actions, and is structured for automation, consistency, and maintainability. All configuration is declared as code, which keeps the homelab reproducible and makes updates and scaling straightforward.
Kubernetes
The Kubernetes cluster is deployed using Talos, with M.2 NVMe SSDs in every node for high-performance storage. The cluster uses OpenEBS Mayastor as its storage solution, providing low-latency, reliable block storage for workloads.
Core Components
- actions-runner-controller: Self-hosted GitHub runners, also used to pre-pull images for spegel.
- cert-manager: Creates SSL certificates for services in my cluster.
- cilium: Internal Kubernetes Container Network Interface (CNI).
- cloudflared: Enables secure Cloudflare access to certain ingresses.
- external-dns: Automatically syncs ingress DNS records to a DNS provider.
- external-secrets: Manages Kubernetes secrets using 1Password Connect.
- ingress-nginx: Kubernetes ingress controller using NGINX as a reverse proxy and load balancer.
- openebs-mayastor: Distributed block storage for persistent storage.
- spegel: Stateless, cluster-local OCI registry mirror.
- volsync: Backup and recovery of persistent volume claims.
GitOps
Flux watches the clusters in my kubernetes folder (see Directories below) and ensures that my clusters are updated based on the state of the corresponding Git repository.
In my setup, Flux operates by recursively scanning the kubernetes/apps folder until it identifies the top-level kustomization.yaml file within each directory. This file serves as the entry point for Flux, and it lists all the resources to be applied to the cluster. Typically, the kustomization.yaml contains a namespace resource and one or more Flux kustomizations (ks.yaml). These kustomizations govern the deployment of specific resources, including HelmRelease resources or other application-specific resources, which Flux subsequently applies to the cluster.
Renovate continuously monitors my entire repository for dependency updates. When an update is detected, Renovate automatically creates a pull request. Upon merging these pull requests, Flux is triggered to apply the changes to my clusters, ensuring that my environments are always aligned with the latest desired state as defined in Git.
This GitOps workflow enables a fully automated and declarative approach to managing both the infrastructure and application deployments across my Kubernetes clusters. By relying on Flux and Renovate, I can ensure that updates are consistent, repeatable, and seamlessly applied, maintaining the integrity and reliability of the cluster without manual intervention.
Directories
This Git repository contains the following directories under kubernetes.
📁 kubernetes
├── 📁 apps           # applications
├── 📁 bootstrap      # bootstrap procedures
├── 📁 components     # reusable components
├── 📁 flux           # flux system configuration
└── 📁 talos          # talos configuration
Flux Workflow
This is a high-level look at how Flux deploys my applications with dependencies. In most cases a HelmRelease will depend on other HelmReleases, in other cases a Kustomization will depend on other Kustomizations, and in rare situations an app can depend on both a HelmRelease and a Kustomization. The example below shows that searxng won't be deployed or upgraded until the onepassword-store Helm release is installed and in a healthy state.
graph TD
    A>Kustomization:external-secrets] -->|Creates| B[HelmRelease:external-secrets]
    C>Kustomization:onepassword-connect] -->|Creates| D[HelmRelease:onepassword-connect]
    E>Kustomization:onepassword-store] -->|Creates| F[HelmRelease:onepassword-store]
    F>HelmRelease:onepassword-store] -->|Depends on| B>HelmRelease:external-secrets]
    F>HelmRelease:onepassword-store] -->|Depends on| D>HelmRelease:onepassword-connect]
    G>Kustomization:gatus] -->|Creates| H[HelmRelease:gatus]
    I>Kustomization:searxng] -->|Creates| J[HelmRelease:searxng]
    J>HelmRelease:searxng] -->|Depends on| F>HelmRelease:onepassword-store]
    J>HelmRelease:searxng] -->|Creates| K>Kustomization:gatus-components]
    K>Kustomization:gatus-components] -->|Depends on| H>HelmRelease:gatus]
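The "Depends on" edge from searxng to onepassword-store in the graph above could be expressed with the dependsOn field of a Flux HelmRelease. The fragment below is a hedged sketch only: the namespaces, chart name, chart source, and interval are assumptions and may not match the real manifests in this repository.

```yaml
# Hypothetical HelmRelease for searxng; only the dependsOn entry is taken
# from the dependency graph, everything else is illustrative.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: searxng
  namespace: default                  # assumed
spec:
  interval: 30m                       # assumed
  chart:
    spec:
      chart: app-template             # assumed chart name
      sourceRef:
        kind: HelmRepository
        name: example-charts          # hypothetical chart source
        namespace: flux-system        # assumed
  dependsOn:
    # Flux holds back install and upgrade until this release is Ready.
    - name: onepassword-store
      namespace: external-secrets     # assumed
```

Flux Kustomization resources accept a dependsOn list of the same shape, which is how a Kustomization such as gatus-components can be ordered after its own prerequisites.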
Networking
The homelab network is anchored by a datacenter-grade access switch, primarily operating in Layer 2 mode with BGP for routing. Outside the cluster, a dedicated industrial PC configured with nix-config provides essential Kubernetes infrastructure services, including NTP, external-dns, HTTP proxy, and discovery services.
While the majority of my infrastructure and workloads are self-hosted, I rely on the cloud for certain critical components of my setup. Offloading these applications significantly reduces maintenance complexity and addresses three concerns: (1) avoiding chicken-and-egg scenarios, (2) keeping mission-critical services available regardless of the status of my Kubernetes cluster, and (3) the "hit by a bus" factor, that is, ensuring that vital applications such as email, password managers, and photo storage remain accessible and functional even if I am unexpectedly absent.
One could theoretically resolve the first two issues by hosting a Kubernetes cluster in the cloud and deploying critical services like HCVault, Keycloak, and Ntfy, but maintaining another cluster and monitoring a separate set of workloads would incur additional overhead. Moreover, the effort and cost of managing a cloud-based Kubernetes cluster would likely match, if not exceed, the cost of simply delegating these responsibilities to the cloud services listed below.
Service | Use | Cost |
---|---|---|
1Password | Secrets with External Secrets | ~$36/yr |
Cloudflare | Domain, S3 and ZeroTrust | Free |
Discord | Private channel that notifies me of cluster alerts | Free |
GitHub | Hosting this repository and continuous integration/deployments | Free |
Total | | ~$3/mo |
Hardware
Device | Quantity | OS Disk Size | Data Disk Size | RAM | OS | Function |
---|---|---|---|---|---|---|
Minisforum MS-01 | 3 | 256GB SSD | 2TB SSD (Mayastor) | 96GB | Talos | Kubernetes |
N100-6L | 1 | 1TB SSD | - | 32GB | Proxmox | OPNsense and Nix-Infra |
H3C S6300-48S | 1 | - | - | - | - | Network Switch |
SANTAK TG-Box 850 | 1 | - | - | - | - | UPS |
Minisforum MS-01
Component | Model | Specifications | Quantity | Notes |
---|---|---|---|---|
CPU | Intel | Core i9-13900H | 1 | |
RAM | Crucial | DDR5 5600MHz 48GB | 2 | |
OS Disk | Advantech | A+E 2230 SSD 256GB | 1 | Replaces the Wi-Fi card |
Data Disk | SK Hynix | P41 2TB | 1 | Max. 3 |