serverfault.com


Common Server issues – FAQs and answers from those in the know

How can I remove two failed newly-added SSDs from an HPE Smart Array RAID10 and return to the original 4-disk layout?

9 April 2026 @ 10:12 pm

I have an HPE Smart Array P440ar controller. Originally I had a 4x SSD RAID 10 array and everything was working correctly. I then tried to expand the array by adding 2 more SSDs so that I could later increase the capacity. Unfortunately, those two SSDs turned out to be incompatible with the HPE server/controller setup, and I had to remove them physically. As a result:

- the controller marked the two newly added drives as failed / hot removed
- the array pulled in the hot spare
- the logical drive is now in degraded / interim recovery mode
- the logical drive itself was not expanded, so the usable logical drive size never changed

My goal is to return to the original 4-SSD RAID10 configuration until I get properly compatible SSDs for expansion. Current relevant situation: original RA
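Before changing anything, it helps to capture exactly how the controller currently sees the array. A hedged sketch using HPE's ssacli tool (assuming the controller is in slot 0; adjust the slot number to match your system):

```shell
# Logical drive state (expect something like "Interim Recovery Mode" here)
ssacli ctrl slot=0 ld all show status

# All physical drives, including any still marked failed after removal
ssacli ctrl slot=0 pd all show status

# Detailed array view: spare assignment and any rebuild progress
ssacli ctrl slot=0 array all show detail
```

The output of these read-only commands is also useful to include when asking for help, since it shows whether the spare rebuild has completed and whether the controller still remembers the two removed drives.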

ZFS on LUKS: zpool operations hang indefinitely after LUKS device disappears

9 April 2026 @ 5:40 pm

I am experimenting a bit with ZFS on top of LUKS, and I encountered a situation where the pool becomes completely unmanageable if the underlying block device disappears unexpectedly. Note that here, the operating system (Ubuntu) is itself running on ZFS. Setup:

1. Format a USB drive (to best simulate removal later; in principle even internal disks can fail this way!) to contain a LUKS layer, under which a ZFS pool is initialized.
2. Unlock the LUKS device (let's say the mapper name is testmapper).
3. Import the ZFS pool (let's call it testpool) from testmapper.
4. Unplug the USB drive to simulate a disconnect.

Now, testmapper will remain in use because the zpool is still active. Thus, reconnecting the USB drive will not immediately fix this: the original mapper name is still in use, and cryptsetup luksOpen for testmapper fails wit
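For reference, the usual way to tear such a stuck stack down by hand is from the top of the stack downward. A sketch using the names from the setup above (whether each step succeeds depends on how wedged the kernel state is; /dev/sdX is a placeholder for the reattached drive):

```shell
# Ask ZFS to let go of the mapper device first
zpool export -f testpool

# If the export hangs or the mapper is still pinned, force-remove the
# stale device-mapper node so the name becomes free again
cryptsetup close testmapper   # or: dmsetup remove --force testmapper

# After reconnecting the USB drive, unlock and re-import as usual
cryptsetup luksOpen /dev/sdX testmapper
zpool import testpool
```

Note that if the zpool I/O is already hung in the kernel, the export itself can block uninterruptibly, which is the core of the problem being described here.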

Migrate Letsencrypt certificate to different server but retain existing certificate for other domain

9 April 2026 @ 2:28 pm

I have two servers, server A hosts example.com, and server B hosts subdomain.example.com. Server B needs to host both example.com and subdomain.example.com. Is it possible to copy the Letsencrypt certificate on server A for example.com over to server B such that server B contains both certificates? Normally I would copy the /etc/letsencrypt directory from server A to server B, but I believe that would wipe out the certificate that is already there. Thank you in advance! (running Apache on server A and nginx on server B if that's relevant)
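One hedged approach, rather than copying all of /etc/letsencrypt: copy only the pieces belonging to the example.com certificate, so the existing subdomain.example.com material on server B is untouched. A certificate lives in three places under /etc/letsencrypt (live/ holds symlinks into archive/, plus a renewal config), so all three need to travel together. A sketch, run as root:

```shell
# On server A: bundle just the example.com certificate.
# -p preserves permissions; tar keeps the live/ -> archive/ symlinks intact.
tar -cpf cert.tar -C /etc/letsencrypt \
    live/example.com archive/example.com renewal/example.com.conf

# Copy to server B and unpack alongside the existing certificates
scp cert.tar serverB:/tmp/
ssh serverB 'tar -xpf /tmp/cert.tar -C /etc/letsencrypt'
```

Caveat: renewal/example.com.conf will still reference the Apache authenticator from server A, so renewal on server B (nginx) will likely fail until you edit that file or simply reissue on B with certbot once DNS for example.com points there.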

WiFi to particular network closes after less than a minute [migrated]

9 April 2026 @ 10:31 am

I have an HP EliteBook with an "Intel Wi-Fi 7 BE201 320MHz" adapter. I work at a coworking place, and the WiFi connection establishes after a reboot. Signal strength shows maximum. I am able to open a browser and browse the net. Within one minute, the WiFi signal strength indicator changes to very weak and the connection is lost. I use my phone on the same WiFi without a problem. People around me use the same WiFi. If I switch on the hotspot on my phone, the laptop connects without any problem. The same laptop works fine on different WiFi networks. I thought something was wrong with the built-in WiFi adapter, so I bought an external USB Realtek WiFi adapter. I disabled the internal WiFi adapter and used only the USB WiFi adapter. Same problem. I would understand it if the WiFi router rejected the connection; I do not understand why the signal strength indicator changes. How do I troubleshoot this kind of WiFi connectivity problem? Please note, I cannot access the WiFi access point.
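Assuming the laptop runs Windows (not stated in the question), the built-in wireless diagnostics are a reasonable first step, since they record the reason code for each disconnect:

```shell
# Run in an elevated prompt: generates an HTML wireless report
# (written under C:\ProgramData\Microsoft\Windows\WlanReport\)
# with a timeline of connects, disconnects, and reason codes
netsh wlan show wlanreport

# Live adapter state: signal, channel, and negotiated parameters
netsh wlan show interfaces
```

A disconnect reason code in the report (e.g. deauthentication by the AP versus a roaming decision by the client) distinguishes "the network kicked me off" from "my driver gave up", which is exactly the ambiguity described above.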

Dovecot LDAP authentication with Active Directory times out when using domain root as base DN

9 April 2026 @ 8:01 am

PROBLEM: I am integrating Dovecot 2.3.21 with Active Directory using LDAP authentication. I’ve been going through the documentation and several forum threads, but I can’t figure out the root cause of this issue. The problem appears to be related to the base setting in /etc/dovecot/dovecot-ldap.conf.ext. LDAP authentication works correctly when I specify a specific OU as the base DN, but it times out when I use the domain root DN instead.

This configuration works correctly:

hosts = ldapserver.net.domain.local
ldap_version = 3
dn = CN=connector,CN=Users,DC=net,DC=domain,DC=local
dnpass = connectorpass
auth_bind = yes
base = OU=VDI Users,DC=net,DC=domain,DC=local
scope = subtree
user_filter = (&(objectClass=user)(sAMAccountName=%n))
pass_filter = (&(objectClass=user)(sAMAccountName=%n))

However, the following configuration does not work:

hosts = ldapserver.net.domain.local
ldap_version = 3
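One common cause of exactly this symptom (not a confirmed diagnosis for this setup): a subtree search from the AD domain root returns referrals for DomainDnsZones/ForestDnsZones, and a client that tries to chase them can hang until it times out, while a search based at a specific OU returns no referrals. This can be checked outside Dovecot with ldapsearch ("someuser" is a placeholder account name):

```shell
# Search from the domain root on port 389: watch for "ref:" entries
# in the output, which are the referrals a client may try to chase
ldapsearch -x -H ldap://ldapserver.net.domain.local \
    -D 'CN=connector,CN=Users,DC=net,DC=domain,DC=local' -W \
    -b 'DC=net,DC=domain,DC=local' '(sAMAccountName=someuser)'

# The same search against the Global Catalog port returns no such
# referrals, so it is a useful point of comparison
ldapsearch -x -H ldap://ldapserver.net.domain.local:3268 \
    -D 'CN=connector,CN=Users,DC=net,DC=domain,DC=local' -W \
    -b 'DC=net,DC=domain,DC=local' '(sAMAccountName=someuser)'
```

If the port-389 root search stalls on referrals while the port-3268 one returns promptly, that points at referral chasing rather than at Dovecot's configuration syntax.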

How safe is it to fully replace a GCP Network Load Balancer backend at once?

8 April 2026 @ 6:49 pm

Let's say that I have an internal passthrough network load balancer with a backend named instance-group-1, no session affinity, and a 300s connection draining timeout. For maintenance, I want to temporarily change the backend to instance-group-2. Usually my procedure is:

1. Add instance-group-2 and wait for the load balancer update.
2. Remove instance-group-1 and wait for the load balancer update.

From what I understand, there is a connection draining process before backend instance-group-1 is removed. How "safe" is it if I just replace instance-group-1 with instance-group-2 in one step? By "safe", I mean from the client perspective:

- Will the clients' connections be fully drained?
- Will the clients experience connection timeouts or other forms of connection errors?
- Will there be any downtime? etc.
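The two-step procedure above can be expressed with gcloud as follows (a sketch; the backend service name, region, and zone are illustrative placeholders):

```shell
# Step 1: add the new backend so the LB starts sending it new connections
gcloud compute backend-services add-backend my-backend-service \
    --region=us-central1 \
    --instance-group=instance-group-2 \
    --instance-group-zone=us-central1-a

# Step 2: remove the old backend; from this point the configured
# connection draining window (300s here) applies to established
# connections before instances stop receiving traffic
gcloud compute backend-services remove-backend my-backend-service \
    --region=us-central1 \
    --instance-group=instance-group-1 \
    --instance-group-zone=us-central1-a
```

Doing both in a single update would collapse these into one configuration change, which is the crux of the question: whether draining semantics for the removed group are the same when its replacement arrives in the same update.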

Traefik ignores containers with multiple routers?

8 April 2026 @ 4:11 pm

I am fairly new to Traefik, but have managed to set up multiple containers behind it. I am now running into an issue where if I create a container that has more than one router, Traefik doesn't process it. Is this something that should work, or does 3.6 not support it? According to the documentation, this is how I should set up the labels when I have an internal host-only route and an external host + prefix route, where I need to strip out the prefix:

labels:
  - "traefik.enable=true"
  # Local access, host only
  - "traefik.http.routers.foundry14lan.rule=Host(`vtt.homelab.lan`)"
  - "traefik.http.routers.foundry14lan.entrypoints=websecure"
  - "traefik.http.routers.foundry14lan.tls=true"
  - "traefik.http.services.foundry14lan.loadbalancer.server.port=30000"
  # External access, https, with path that needs to be stripped
  - "traefik.http.routers.foundry14web.rule=Host(<redacted>) && PathPref
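Multiple routers on one container are supported by Traefik's Docker provider. A hypothetical complete label set, shown as a docker run sketch for self-containment (the foundry14* names, example.com host, and /vtt prefix are illustrative, not taken from the question):

```shell
docker run -d --name foundry \
  --label "traefik.enable=true" \
  --label "traefik.http.routers.foundry14lan.rule=Host(\`vtt.homelab.lan\`)" \
  --label "traefik.http.routers.foundry14lan.entrypoints=websecure" \
  --label "traefik.http.routers.foundry14lan.tls=true" \
  --label "traefik.http.routers.foundry14web.rule=Host(\`example.com\`) && PathPrefix(\`/vtt\`)" \
  --label "traefik.http.routers.foundry14web.entrypoints=websecure" \
  --label "traefik.http.routers.foundry14web.tls=true" \
  --label "traefik.http.routers.foundry14web.middlewares=foundry14strip" \
  --label "traefik.http.middlewares.foundry14strip.stripprefix.prefixes=/vtt" \
  --label "traefik.http.services.foundry14svc.loadbalancer.server.port=30000" \
  myimage
```

Note that with exactly one service defined on the container, both routers attach to it automatically. If Traefik skips the container entirely, check its logs: a single malformed label can cause the whole container's configuration to be rejected, which matches the "doesn't process it" symptom.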

1U server for L40s [closed]

8 April 2026 @ 2:35 pm

We are looking to build a server for AI workloads in a non-production environment. We would like to get a 1U server and put an L40S into it. However, size restrictions seem to be a major limitation here, as the L40S is a full-width, full-height card, and the Dell R650, for instance, can only take cards of up to 3/4 length. Has anyone built a 1U configuration with an L40S? The R650, for instance, seems to be able to handle the L4, but not the L40. L40-class capacity is a must; 1U is a preference, not a hard requirement. If it is not an option, we will go for a 2U option, but I would preferably go for a 1U solution. Anyone with experience on this willing to share some thoughts? Thank you.

FreeRADIUS RadSec TCP connections plateau around ~500 per instance

8 April 2026 @ 12:03 pm

We are benchmarking RadSec (TCP/TLS) connections against a single FreeRADIUS instance and observing a consistent connection ceiling.

- FreeRADIUS (version 3.2.8)
- Running in Kubernetes (single pod)
- ~1000 proxy clients, each establishing 1 TLS connection
- Proxy-only setup (no heavy backend processing)
- CPU ~0.1 core, Memory ~250MB
- Open files limit: 65536

Established TCP connections plateau around ~505:

ss -tn state established '( sport = :2083 )' | wc -l → ~505

Additional connection attempts fail with:

(TLS) System call (I/O) error (-1)
Failed to insert request into the proxy list

We tried:

- Increasing the thread pool: max_servers = 256
- Verifying CPU/memory are not bottlenecks
- Verifying file descriptor limits are sufficient

Is
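One ceiling worth ruling out (a guess, not a confirmed diagnosis for this ~505 plateau): FreeRADIUS 3.x TLS listeners carry a per-listener connection limit in their limit subsection, and the shipped tls site sets it quite low. A hypothetical fragment of the listener config:

```
# sites-enabled/tls -- sketch; only the limit block is the point here
listen {
    type = auth
    proto = tcp
    port = 2083
    tls {
        # ... certificate configuration as before ...
    }
    limit {
        max_connections = 2048   # shipped default is far lower
        lifetime = 0
        idle_timeout = 30
    }
}
```

If raising max_connections does not move the ceiling, the "Failed to insert request into the proxy list" error suggests looking instead at the outbound proxy side, where each source port can only carry 256 outstanding RADIUS IDs.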

Best way to write ExecStop for multiple child processes (systemd)?

8 April 2026 @ 12:53 am

I've written my first systemd user service and I'm wondering what the best way is to kill it. When the script starts it up, the processes look like this (output from systemctl status):

...
CGroup: /user.slice/user-1000.slice/[email protected]/app.slice/myservice.service
  ├─2203135 /bin/sh /home/.../start-myservice-systemd
  ├─2203136 /home/.../python3 /home/.../bin/the-server --port 8200
  └─2203137 /usr/bin/multilog s1000000 n10 /home/.../logs

When stopping the service I want all of the processes to die, and I've found that using pkill in the service config file seems to do the job:

ExecStop=/usr/bin/pkill -P $MAINPID

If I use /bin/kill $MAINPID, then the python and multilog processes stick around after a systemctl stop command. Is there perhaps a preferred way to send a signal to all of the child proce
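For context: with its default KillMode=control-group, systemd already sends the stop signal to every process remaining in the unit's cgroup (and SIGKILL after TimeoutStopSec), so an ExecStop is normally not needed to reap children. A sketch of the relevant unit options (the ExecStart path is a placeholder matching the tree above):

```
# myservice.service -- sketch; these are the stock defaults spelled out
[Service]
ExecStart=%h/start-myservice-systemd
KillMode=control-group   # the default: signal the whole cgroup on stop
KillSignal=SIGTERM
TimeoutStopSec=30
```

If the python and multilog processes survive a plain systemctl stop, it is worth checking whether the unit overrides KillMode (e.g. KillMode=process, which signals only $MAINPID) or whether the children ignore SIGTERM, rather than working around it with pkill.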