Common Server issues – FAQs and answers from those in the know
How can I remove two failed newly-added SSDs from an HPE Smart Array RAID10 and return to the original 4-disk layout
9 April 2026 @ 10:12 pm
I have an HPE Smart Array P440ar controller.
Originally I had a 4x SSD RAID 10 array and everything was working correctly.
I then tried to expand the array by adding 2 more SSDs so that I could later increase the capacity. Unfortunately, those two SSDs turned out to be incompatible with the HPE server/controller setup, and I had to remove them physically.
As a result:
the controller marked the two newly added drives as failed / hot removed
the array pulled in the hot spare
the logical drive is now in degraded / interim recovery mode
the logical drive itself was not expanded, so the usable logical drive size never changed
My goal is to return to the original 4-SSD RAID10 configuration until I get proper compatible SSDs for expansion.
Current relevant situation:
original RA
ZFS on LUKS: zpool operations hang indefinitely after LUKS device disappears
9 April 2026 @ 5:40 pm
I am experimenting with ZFS on top of LUKS and ran into a situation where the pool becomes completely unmanageable if the underlying block device disappears unexpectedly. Note that the operating system (Ubuntu) is itself running on ZFS.
Setup:
Format a USB drive (chosen because it makes the removal easy to simulate; in principle even internal disks can fail this way!) with a LUKS layer, and initialize a ZFS pool on top of it.
Unlock the LUKS device (let's say the mapper name is testmapper)
Import the ZFS pool (let's call it testpool) from the testmapper.
Unplug the USB drive to simulate a disconnect.
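The setup steps above can be sketched as a command sequence. This is only an illustration under stated assumptions: the device path /dev/sdX is a hypothetical placeholder, and the export before the import is an assumed intermediate step that the description does not spell out. The helper only builds the commands and never runs them, since luksFormat destroys the data on the target device.

```python
def reproduction_commands(device, mapper="testmapper", pool="testpool"):
    """Build (but do not run) the commands for the setup described above.

    `device` is a hypothetical placeholder such as /dev/sdX.
    """
    mapped = f"/dev/mapper/{mapper}"
    return [
        ["cryptsetup", "luksFormat", device],        # create the LUKS layer (destructive)
        ["cryptsetup", "luksOpen", device, mapper],  # unlock it as the named mapper
        ["zpool", "create", pool, mapped],           # initialize the pool on the mapper
        ["zpool", "export", pool],                   # assumed step: release the pool
        ["zpool", "import", "-d", "/dev/mapper", pool],  # import it from the mapper
    ]


if __name__ == "__main__":
    for argv in reproduction_commands("/dev/sdX"):
        print(" ".join(argv))
```

Unplugging the drive is a physical step and is therefore not part of the sequence.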
Now, the testmapper will remain in use due to the zpool still being active.
Thus, reconnecting the USB drive will not immediately fix this:
The original mapper name is still in use and cryptsetup luksOpen for testmapper fails wit
Migrate Letsencrypt certificate to different server but retain existing certificate for other domain
9 April 2026 @ 2:28 pm
I have two servers, server A hosts example.com, and server B hosts subdomain.example.com. Server B needs to host both example.com and subdomain.example.com.
Is it possible to copy the Letsencrypt certificate on server A for example.com over to server B such that server B contains both certificates?
Normally I would copy the /etc/letsencrypt directory from server A to server B, but I believe that would wipe out the certificate that is already there.
Thank you in advance!
(running Apache on server A and nginx on server B if that's relevant)
WiFi to particular network closes after less than a minute [migrated]
9 April 2026 @ 10:31 am
I have an HP EliteBook with an "Intel Wi-Fi 7 BE201 320MHz" adapter.
I work at a coworking space, and the WiFi connection is established after a reboot. The signal strength shows maximum, and I am able to open a browser and browse the web.
Within one minute, the WiFi signal strength indicator drops to very weak and the connection is lost.
My phone works on the same WiFi without a problem, and so do the devices of the people around me. When I switch on the hotspot on my phone, the laptop connects to it without any problem. The same laptop also works fine on other WiFi networks.
I thought something was wrong with the built-in WiFi adapter, so I bought an external USB Realtek WiFi adapter. I disabled the internal adapter and used only the USB one. Same problem.
I would understand if the WiFi router rejected the connection, but I do not understand why the signal strength indicator changes.
How can I troubleshoot this kind of WiFi connectivity problem?
Please note, I cannot access the WiFi access point.
Dovecot LDAP authentication with Active Directory times out when using domain root as base DN
9 April 2026 @ 8:01 am
PROBLEM:
I am integrating Dovecot 2.3.21 with Active Directory using LDAP authentication.
I’ve been going through the documentation and several forum threads, but I can’t figure out the root cause of this issue.
The problem appears to be related to the base setting in /etc/dovecot/dovecot-ldap.conf.ext.
LDAP authentication works correctly when I specify a specific OU as the base DN, but it times out when I use the domain root DN instead.
This configuration works correctly:
hosts = ldapserver.net.domain.local
ldap_version = 3
dn = CN=connector,CN=Users,DC=net,DC=domain,DC=local
dnpass = connectorpass
auth_bind = yes
base = OU=VDI Users,DC=net,DC=domain,DC=local
scope = subtree
user_filter = (&(objectClass=user)(sAMAccountName=%n))
pass_filter = (&(objectClass=user)(sAMAccountName=%n))
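To make the filters above concrete, here is a minimal sketch of how the %n variable turns a login into the search filter sent to the directory. It assumes only that %n stands for the user part of user@domain; the function name and the sample login are hypothetical, and unlike Dovecot this sketch does no LDAP escaping and handles no other variables.

```python
def expand_filter(template: str, login: str) -> str:
    """Replace %n with the part of the login before the first '@'.

    Illustration only: no LDAP escaping, no other % variables.
    """
    user = login.split("@", 1)[0]
    return template.replace("%n", user)


if __name__ == "__main__":
    template = "(&(objectClass=user)(sAMAccountName=%n))"
    # "alice@net.domain.local" is a made-up login for illustration.
    print(expand_filter(template, "alice@net.domain.local"))
```

The resulting filter does not depend on the base setting, so the same search is issued against the OU and against the domain root; only its starting point differs.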
However, the following configuration does not work:
hosts = ldapserver.net.domain.local
ldap_version = 3
How safe is it to fully replace a GCP Network Load Balancer backend at once?
8 April 2026 @ 6:49 pm
Let's say that I have an internal passthrough network load balancer with a backend named instance-group-1, no session affinity, and 300s connection draining timeout. For maintenance, I want to temporarily change the backend to instance-group-2.
Usually my procedure is:
Add instance-group-2 and wait for load balancer update.
Remove instance-group-1 and wait for load balancer update. From what I understand there is a connection draining process before backend instance-group-1 is removed.
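The two-step procedure above might look roughly like the following gcloud invocations. This is a sketch under assumptions: the backend service name, zone, and region are hypothetical placeholders, the instance groups are assumed to be zonal, and the wait for the load balancer update between the two steps is not modelled. The helper only builds the argument lists and does not execute anything.

```python
def swap_backend_commands(service, old_group, new_group, zone, region):
    """Build the add-then-remove gcloud commands for a regional backend service.

    All argument values are placeholders supplied by the caller.
    """
    def backend_cmd(action, group):
        return [
            "gcloud", "compute", "backend-services", action, service,
            f"--instance-group={group}",
            f"--instance-group-zone={zone}",
            f"--region={region}",
        ]

    return [
        backend_cmd("add-backend", new_group),     # step 1: add the new group
        backend_cmd("remove-backend", old_group),  # step 2: remove the old group
    ]


if __name__ == "__main__":
    for argv in swap_backend_commands(
        "my-backend-service",  # hypothetical backend service name
        "instance-group-1", "instance-group-2",
        "us-central1-a", "us-central1",  # hypothetical zone and region
    ):
        print(" ".join(argv))
```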
How "safe" is it if I just replace instance-group-1 with instance-group-2 in one step?
By "safe", I mean from the client perspective:
Will the clients' connections be fully drained?
Will the clients experience connection timeout or other form of connection error?
Will there be any downtime?
etc.
Traefik ignores containers with multiple routers?
8 April 2026 @ 4:11 pm
I am fairly new to Traefik, but have managed to set up multiple containers behind it. I am now running into an issue where if I create a container that has more than one router, Traefik doesn't process it. Is this something that should work, or does 3.6 not support it?
According to the documentation, this is how I should set up the labels when I have an internal host only route and an external host + prefix route, where I need to strip out the prefix:
labels:
- "traefik.enable=true"
# Local access, host only
- "traefik.http.routers.foundry14lan.rule=Host(`vtt.homelab.lan`)"
- "traefik.http.routers.foundry14lan.entrypoints=websecure"
- "traefik.http.routers.foundry14lan.tls=true"
- "traefik.http.services.foundry14lan.loadbalancer.server.port=30000"
# External access, https, with path that needs to be stripped
- "traefik.http.routers.foundry14web.rule=Host(<redacted>) && PathPref
1U server for L40s [closed]
8 April 2026 @ 2:35 pm
We are looking to build a server for AI workloads in a non-production environment.
We would like to get a 1U server and fit an L40s into it.
However, size restrictions seem to be a major limitation here, as the L40s is a full width, full height card, and the Dell R650, for instance, accepts only cards up to 3/4 length.
Has anyone built a 1U configuration with an L40s? The R650, for instance, seems to be able to handle L4/L4s, but not the L40.
L40 capacity is a must; 1U is a preference, not a hard requirement. If it is not an option, we will go for 2U, but I would preferably go for a 1U solution.
Anyone with experience on this willing to share some thoughts?
Thank you.
FreeRADIUS RadSec TCP connections plateau around ~500 per instance
8 April 2026 @ 12:03 pm
We are benchmarking RadSec (TCP/TLS) connections against a single FreeRADIUS instance and observing a consistent connection ceiling.
FreeRADIUS (version 3.2.8)
Running in Kubernetes (single pod)
~1000 proxy clients, each establishing 1 TLS connection
Proxy-only setup (no heavy backend processing)
CPU ~0.1 core, Memory ~250MB
Open files limit: 65536
Established TCP connections plateau around ~505:
ss -tn state established '( sport = :2083 )' | wc -l
→ ~505
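One detail worth checking when reading that number: ss prints a column header unless it is run with -H, so piping its output to wc -l can report one line more than the number of connections. A minimal sketch of a counter that skips the header, assuming the header line starts with "Recv-Q" as it does when a state filter is used (the function name and sample addresses are made up):

```python
def count_established(ss_output: str) -> int:
    """Count connection rows in the text output of `ss -tn state established ...`."""
    rows = [line for line in ss_output.splitlines() if line.strip()]
    # Skip the column header if present (it is absent when ss runs with -H).
    if rows and rows[0].lstrip().startswith("Recv-Q"):
        rows = rows[1:]
    return len(rows)


if __name__ == "__main__":
    sample = (
        "Recv-Q Send-Q Local Address:Port Peer Address:Port\n"
        "0      0      10.0.0.5:2083      10.0.1.7:40112\n"
        "0      0      10.0.0.5:2083      10.0.1.8:51820\n"
    )
    print(count_established(sample))
```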
Additional connection attempts fail with:
(TLS) System call (I/O) error (-1)
Failed to insert request into the proxy list
We tried:
Increased thread pool:
max_servers = 256
Verified CPU/memory are not bottlenecks
Verified file descriptor limits are sufficient
Is
Best way to write ExecStop for multiple child processes (systemd)?
8 April 2026 @ 12:53 am
I've written my first systemd user service and I'm wondering what the best way is to kill it.
When the script starts the service, the processes look like this (output from systemctl status):
...
CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/myservice.service
├─2203135 /bin/sh /home/.../start-myservice-systemd
├─2203136 /home/.../python3 /home/.../bin/the-server --port 8200
└─2203137 /usr/bin/multilog s1000000 n10 /home/.../logs
When stopping the service I want all of the processes to die and I've found that using pkill in the service config file seems to do the job:
ExecStop=/usr/bin/pkill -P $MAINPID
If I use /bin/kill $MAINPID then the python and the multilog processes stick around after a systemctl stop command.
Is there perhaps a preferred way to send a signal to all of the child proce