Category: kubernetes

  • Minisforum’s MS-01 BIOS Tweaks

    I’m just outlining the tweaks to my MS-01s’ BIOS for posterity. This is based on the 1.26 BIOS released in October (see the previous article about updating it).

    As a note, I’m using these machines as my K3s cluster nodes, so I don’t need some of the onboard devices I’m disabling. Also, only changes from the defaults are listed below.

    • Main
      • Update System Date and Time if required
    • Advanced
      • Onboard Devices settings
        • VMD setup menu
          • Enable VMD controller: Disabled (not using RAID)
        • HD Audio: Disabled
        • SA-PCIE PORT
          • PCIE4.0x4 SSD ASPM: Disabled
        • PCH-PCIE PORT
          • I226-V NIC ASPM: Disabled*
          • I226-LM ASPM: Disabled*
          • WIFI: Disabled
          • WIFI ASPM: Disabled

    *I’m disabling ASPM on the NICs because of flaky networking behavior; based on a Reddit thread, turning it off definitely fixed the problem for me.
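
    If you want to sanity check from the OS side that ASPM is actually off after the change, here’s a quick sketch (assuming a Linux node where the i226 NICs are visible to lspci; the grep pattern is just for readability):

    # list each Ethernet controller along with its link ASPM capability/state
    sudo lspci -vv | grep -E 'Ethernet controller|ASPM'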

  • Minisforum’s MS-01 BIOS Update

    You’d think this would be obvious, but it’s not – hence the article.

    Get the BIOS Update

    Visit the support page and download the latest BIOS.

    Prepare the USB Drive

    Using diskpart, run the following commands:

    diskpart
    list disk
    select disk X  # Replace X with the USB Drive
    clean
    create partition primary
    format fs=fat32 quick

    Copy the files from the downloaded zip into the root of the drive.

    Download the UEFI Shell

    Extract UefiShell/X64/Shell.efi from the download and copy it to the root of the drive.

    Your USB drive should look something like the following

    D:\>dir
     Volume in drive D has no label.
     Volume Serial Number is 267D-A88E
    
     Directory of D:\
    
    07/25/2023  11:00 PM         1,886,624 Fpt.efi
    05/25/2023  12:32 PM         2,444,336 FPTW64.exe
    10/14/2024  02:55 PM             7,398 Release_Note.txt
    10/14/2024  02:57 PM               203 WinFlash.bat
    10/14/2024  02:57 PM                56 AfuEfiFlash.nsh
    04/18/2023  11:16 AM           629,680 AfuEfix64.efi
    10/14/2024  02:57 PM               207 AfuWinFlash.bat
    10/18/2023  01:15 AM         1,127,536 AFUWINx64.exe
    10/14/2024  02:22 PM        33,554,432 AHWSA.1.26.bin
    04/20/2023  05:20 PM            36,064 amigendrv64.sys
    10/14/2024  02:57 PM                44 EfiFlash.nsh
    03/05/2020  09:01 AM           939,648 Shell.efi
                  12 File(s)     40,626,228 bytes
                   0 Dir(s)   3,956,436,992 bytes free

    Apply the Update

    Enter the BIOS (DEL key) and disable Secure Boot – Security>Secure Boot>Secure Boot: Disabled

    Either enter the BIOS (DEL key) and select UEFI Shell, or enter the UEFI Shell directly (F7)

    Press any key to skip the autorun of startup.nsh

    Change to the USB Drive by entering FS0:

    Run ls to make sure it’s the correct drive; if not, switch to another listed drive

    Run the AfuEfiFlash.nsh script to apply the update
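
    Put together, the shell session looks roughly like this (the USB drive may map to FS1: or another number on your system):

    Shell> FS0:
    FS0:\> ls
    FS0:\> AfuEfiFlash.nsh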

    It will reboot on its own

    Enter the BIOS (DEL key) and re-enable Secure Boot – Security>Secure Boot>Secure Boot: Enabled

  • K3s, kube-vip, and metallb

    Man, this blog seems to be all my trials and tribulations with kubernetes at this point. Well, to add to it, here’s another issue I stumbled into…

    Tl;dr – when using metallb with kube-vip, do not use the --services switch when generating the daemonset manifest, as kube-vip will then fight with metallb over load balancing your services.


    I built a new cluster (based on Minisforum’s awesome MS-01) about two months ago. As part of this new build, I wanted to load balance the control plane instead of the DNS round-robin I had been using. This led me to kube-vip.

    (Un)Fortunately, kube-vip can also do what metallb does: provide load balancing for services running in the cluster on bare metal, without an external load balancer. However, I was happy with metallb in the old cluster and didn’t want to change that part.
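
    For context, the metallb side of this is just an address pool plus an L2 advertisement. Here’s a minimal sketch, assuming metallb v0.13+ with its CRD-based config; the pool name and address range are made-up examples:

    # define the pool of IPs metallb may hand out and advertise them via L2/ARP
    kubectl apply -f - <<EOF
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: default-pool
      namespace: metallb-system
    spec:
      addresses:
        - 192.168.1.240-192.168.1.250
    ---
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: default-l2
      namespace: metallb-system
    spec:
      ipAddressPools:
        - default-pool
    EOF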

    So I installed k3s with a manifest created per the kube-vip k3s instructions, which link to their manifest-generation instructions. I even looked at other, similar articles, which gave basically the same steps.

    All was pretty good until I started having some weird issues where my ingresses would just sort of go offline. When I’d try to hit a website (like this one), I’d never see the request make it to the ingress, but I could still ping the IP. Not seeing the request in the ingress logs made me think metallb wasn’t doing its L2 advertisement correctly. This seemed to happen whenever I had to restart the nodes for any reason (usually patches).

    Since the only real difference in this part of the stack was kube-vip, I knew something was going on between kube-vip and metallb; I just didn’t know what. I tried upgrading kube-vip and downgrading metallb, but nothing seemed to help. Figuring the two were fighting, I disabled kube-vip’s handling of services (even though I never wanted it touching them in the first place). Thinking I had fixed it, I left it alone. Not more than 3 hours later, the ingresses went down again. In fact, I had actually made it worse: every 2-4 hours the ingresses would go down for 20 minutes and then fix themselves. It was incredibly nerve-wracking.

    Metallb even has a whole troubleshooting section on its website for this exact issue. Sadly, nothing there really helped, but there were some weird symptoms, like arping returning multiple MAC addresses for the first few replies before it settled on the right one. And then yesterday, while the ingresses were down, I cleared the ARP cache on my router on a whim and the problem immediately went away. Hmmm, could it be something with the router?!

    In a fit of frustration, I deleted the kube-vip daemonset from the cluster. Surely, that would fix it?! No, 2 hours later it was flapping again!

    Thinking through the router angle, the only explanation I could come up with was that the router was getting conflicting information, and the only way that happens is if there are duplicate IPs on the network. I logged into each of the servers and ran ip -o -f inet addr show. Lo and behold, on two different servers I saw the same IP address. Metallb doesn’t bind the service IP to a network interface, but kube-vip does, so it was kube-vip causing the issues! Good thing I had deleted it, but I still needed to restart the servers to remove the leftover IP bindings. Thankfully, after the restart the IPs were gone.
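
    If you ever need to hunt for the same thing, here’s a quick sketch (the node names and subnet are placeholders for your environment):

    # run the same address query on every node and eyeball the output for duplicates
    for node in node1 node2 node3; do
        echo "== $node =="
        ssh "$node" ip -o -f inet addr show | grep '192.168.1.'
    done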

    However, I really like having the control plane load balanced instead of pointing at an individual node or relying on round-robin DNS. Digging into the configuration a bit more, I saw that there are two main features: --controlplane and --services. Sadly, the default instructions include --services, which duplicates what metallb was already doing for me. So I updated the manifest generation to the following:

    kube-vip manifest daemonset \
        --interface $INTERFACE \
        --address $VIP \
        --inCluster \
        --taint \
        --controlplane \
        --arp \
        --leaderElection
    # note: --services intentionally omitted so metallb keeps handling service load balancing
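
    For completeness, on k3s both the kube-vip RBAC manifest and the generated daemonset end up in the server’s auto-deploy directory. A sketch based on my reading of the kube-vip k3s docs (double-check the URL and paths against the current instructions):

    # kube-vip's RBAC plus the generated daemonset go into k3s's auto-deploy directory
    curl -sL https://kube-vip.io/manifests/rbac.yaml \
        | sudo tee /var/lib/rancher/k3s/server/manifests/kube-vip-rbac.yaml >/dev/null
    kube-vip manifest daemonset --interface $INTERFACE --address $VIP \
        --inCluster --taint --controlplane --arp --leaderElection \
        | sudo tee /var/lib/rancher/k3s/server/manifests/kube-vip.yaml >/dev/null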
    

    Redeployed, and over 24 hours later all is resolved! Man, a rough two months dealing with that…

  • Kustomization with Remote Resources

    I was today years old when I learned you can reference remote locations in your kustomization file.

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin?ref=v0.30.0
    patches:
      - path: add-args.yaml

    It can then be applied with kubectl apply -k intel/.
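
    For reference, that assumes the kustomization and its patch live side by side in an intel/ directory:

    # layout expected by 'kubectl apply -k intel/'
    ls intel/
    # add-args.yaml  kustomization.yaml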

  • Longhorn, backups, and version control

    **Update as of 1/05/24** I’ve moved away from Longhorn. When it works, it works well, but when it doesn’t, it’s insanely complex to troubleshoot. Plus, I don’t have a lot of storage on my nodes right now. Maybe I’ll revisit it when I do a node hardware refresh.

    I’ve been doing a bit of housekeeping on the home k8s cluster, and one of the things I’m doing is moving from microk8s to k3s. This isn’t really a post about that, but long story short, it’s because of how microk8s does a non-existent job of updating addons, and because you’re basically forced to use its DNS (CoreDNS) addon; I could never get CoreDNS working as a normal helm chart (even after updating the kubelet config).

    Anyways, as part of that change I needed to create a new cluster, get Longhorn running, and restore the volumes from the old cluster. Thankfully, I had tested most of this before becoming reliant on Longhorn, so I knew the backup and restore process worked well – just point the backupTarget setting on the new cluster at the same location the old cluster used and magic happens. Unfortunately, I ran into a snag.
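
    Before getting to the snag, here’s roughly what pointing at the old backup target looks like with the helm chart (assuming the longhorn helm repo is already added; the NFS URL is a placeholder for wherever the old cluster’s backups live):

    # point the new cluster's backup target at the same location the old cluster used
    # (the chart version matters here; more on that below)
    helm upgrade --install longhorn longhorn/longhorn \
        --namespace longhorn-system --create-namespace \
        --set defaultSettings.backupTarget="nfs://nas.example.com:/volume1/longhorn-backups"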

    The volume restored properly, and I was able to recreate the PVC with the same name, but the deployment kept complaining about it and my InfluxDB wouldn’t mount the volume. It kept throwing the error:

    Attach failed for volume : CSINode does not contain driver driver.longhorn.io

    This was super odd, though: I could create a new PVC with the same Longhorn StorageClass and it would mount just fine. WTF?!

    Well, lo and behold, it was because when I built the new cluster I decided to use the newest version of Longhorn – 1.4.1 – as you do. However, the old cluster was still on 1.4.0, as were the backups. After any Longhorn upgrade, you must also upgrade the engine on each volume. Needless to say, the backups were on the 1.4.0 engine (driver), but I only had the 1.4.1 engine (driver) running, and I was never prompted to upgrade the engine on the volume when restoring it. So yes, the error message was factual, if incredibly frustrating.
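
    One quick way to see which engine image versions are actually deployed (assuming kubectl access to the longhorn-system namespace, where Longhorn installs its CRDs):

    # list the engine images Longhorn has deployed, and the volumes themselves
    kubectl -n longhorn-system get engineimages.longhorn.io
    kubectl -n longhorn-system get volumes.longhorn.io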

    So, note to self (and others) – when restoring a Longhorn volume from backup, make sure you are running the same Longhorn version the backup was taken with. Once the volume is successfully restored and running, you can then upgrade to the latest version via the usual update steps and upgrade the engine on the volume. Sadly, there didn’t appear to be a way to do that after the restore, and tbh I didn’t check which version was listed as the Engine Image after restoring. I’m just thankful it’s back up and running!