Which Container Is Eating All My Swap? A Podman Debugging Walkthrough

Swap usage is one of those problems that sneaks up on you. Everything runs fine for a while, then one morning you notice the system is sluggish, iowait is through the roof, and top shows that roughly 13.5 GB of your 16 GB swap partition is consumed. The question isn’t whether something is wrong; it’s which of the dozens of containerized processes you’re running is responsible.

We ran into this on a production server running Podman containers for a browser-automation workload. The fix itself was straightforward once we found the culprit, but the finding part, tracing a host-level swap problem through PID namespaces, process trees, and containers, was the interesting bit. Here’s how we did it, step by step.

No Obvious Villain

The server was running rootless Podman, with each container running a Python process.

When we opened top, the swap line told the story immediately:

MiB Swap:  16384.0 total,   2854.0 free,  13530.0 used.

We also noticed 8.1% iowait on the CPU line, indicating the kernel was spending real time shuffling pages between RAM and disk. The task summary also showed 24 zombie processes, which is a common sign of containers that aren’t reaping child processes properly (more on that later).
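To see which parents are leaving zombies behind, a small filter over ps output can help. This is a minimal sketch, assuming procps ps is available; the function name zombies_by_parent is our own and not part of any tool.

```shell
# Hypothetical helper: count zombie (state Z) processes per parent PID.
# Expects "STAT PPID PID COMM" rows on stdin and prints "COUNT PPID" rows,
# highest count first.
zombies_by_parent() {
  awk '$1 ~ /^Z/ { count[$2]++ }
       END { for (ppid in count) print count[ppid], ppid }' |
    sort -k1,1nr -k2,2n
}

# Usage on a live host (assumes procps ps):
#   ps -eo stat=,ppid=,pid=,comm= | zombies_by_parent
```

Each parent PID in the output can then be traced to its container the same way as a swap-heavy PID.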

We could see swap was consumed, but top doesn’t tell you which container owns the offending process. On a bare-metal box, you’d just sort by memory and call it a day. With containers, there’s an extra layer.

Find the PIDs holding swap

First, we needed a list of processes actually sitting in swap, sorted by how much they were using. There are a few ways to get this.

The quickest is to add the SWAP column in top (press f to manage fields) or start it pre-sorted:

top -o SWAP

That gave us a clear picture:

| PID       | Command       | SWAP     |
|-----------|---------------|----------|
| 1521091   | python        | ~771 MB  |
| 2538      | slirp4netns   | ~661 MB  |
| 1511015   | python        | ~260 MB  |
| 1510112   | python        | ~36 MB   |

PID 1521091 was the biggest single contributor at 771 MB. The slirp4netns process at 661 MB was interesting too, but that’s Podman’s rootless networking layer, which is shared across all of a user’s containers, so you can’t pin it to a specific workload.

If you don’t want to fiddle with the top field manager, you can scan /proc directly:

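# Scan every numeric /proc entry and report processes with more than ~1 MB in swap
for dir in /proc/[0-9]*; do
  pid=${dir#/proc/}
  # VmSwap is reported in kB; kernel threads have no VmSwap line
  swap=$(awk '/^VmSwap:/{print $2}' "$dir/status" 2>/dev/null)
  if [ "${swap:-0}" -gt 1000 ]; then
    name=$(cat "$dir/comm" 2>/dev/null)
    echo "PID=$pid  SWAP=${swap}kB  CMD=$name"
  fi
done | sort -t= -k3 -n -r | head -20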

This works on any Linux system without installing anything. It walks every process, checks VmSwap from /proc/<pid>/status, and prints the ones using more than ~1 MB of swap, sorted descending.
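When dozens of processes share the same command name, it can also help to total swap per command before drilling into individual PIDs. The following is a rough sketch of that idea. The function name swap_by_command and its optional proc-root argument are our own inventions; the argument exists only so the function can be pointed at a test directory.

```shell
# Hypothetical helper: total VmSwap (kB) per command name.
# $1 is the proc root; it defaults to /proc.
# Only the first word of a multi-word process name is kept.
swap_by_command() {
  root=${1:-/proc}
  for status in "$root"/[0-9]*/status; do
    [ -r "$status" ] || continue
    # Emit "NAME SWAP_KB" for processes that have something in swap
    awk '/^Name:/ { name = $2 } /^VmSwap:/ { swap = $2 }
         END { if (swap > 0) print name, swap }' "$status" 2>/dev/null
  done |
    awk '{ total[$1] += $2 }
         END { for (n in total) print total[n], n }' |
    sort -k1,1nr
}

# Usage: swap_by_command | head
```

A single large total spread over many small processes points to a different problem than one process holding most of it.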

Confirm the process

Before going further, it’s a good idea to make sure the process is still alive:

ps -p 1521091 -o pid,ppid,user,%mem,%cpu,etime,args

In our case it was a long-running Python process with over four hours of CPU time. That was technically expected for our workload, but the process should have been freeing swap space.

Walking the process tree

On a containerized system, knowing a PID isn’t enough; you need to trace it back through the process hierarchy to figure out which container owns it. pstree does this cleanly:

pstree -sp 1521091

systemd(1)
 └─systemd(1944)
    └─conmon(1520548)
       └─entrypoint.sh(1520675)
          └─python(1521091)
             └─4×[{python}(...)]

Reading this bottom-up:

python is our swap consumer.
entrypoint.sh is the container’s entrypoint script, which launches the Python process.
conmon is Podman’s container monitor. It bridges the host and the container, managing stdio and keeping the container alive.
systemd(1944) is the user-level systemd instance that the rootless Podman containers run under.

The PID we need is 1520675, which belongs to the entrypoint script, the container’s init process (PID 1 inside the container).
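If pstree isn’t installed, the same walk can be done by following the PPid field in /proc. This is a minimal sketch; the function name ancestors and its optional proc-root argument are hypothetical, and the second argument exists only so the function can be exercised against a test directory.

```shell
# Hypothetical helper: print a PID and each of its ancestors, one per line,
# as "PID NAME", by following PPid in <proc_root>/<pid>/status.
# $1 is the starting PID; $2 is the proc root and defaults to /proc.
ancestors() {
  pid=$1
  root=${2:-/proc}
  # Stop at PPid 0 (the parent of PID 1) or when the process has gone away
  while [ "${pid:-0}" -gt 0 ] && [ -r "$root/$pid/status" ]; do
    name=$(awk '/^Name:/ { print $2 }' "$root/$pid/status")
    echo "$pid $name"
    pid=$(awk '/^PPid:/ { print $2 }' "$root/$pid/status")
  done
}

# Usage: ancestors 1521091
```

In the output, the line printed just before conmon is the container’s init process.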

Finding the container

Here’s a gotcha that cost us a few minutes: Podman’s {{.Pid}} field reports the container’s init PID (the first process inside the container), not the conmon PID. So if you grep for the conmon PID, you’ll get nothing back.

First we tried:

podman ps --format "{{.ID}} {{.Names}} {{.Pid}}" | grep 1520548

No results because 1520548 is conmon, not the init process. The correct search uses the entrypoint PID:

podman ps --format "{{.ID}} {{.Names}} {{.Pid}}" | grep 1520675

a1b2c3d4e5f6 myapp_container 1520675

The swap consumer lives inside myapp_container.

You can also use the brute-force method:

for cid in $(podman ps -q); do
  pid=$(podman inspect "$cid" --format '{{.State.Pid}}')
  name=$(podman inspect "$cid" --format '{{.Name}}')
  echo "Container=$name  InitPID=$pid"
done | sort

Then match against the PIDs found in pstree.
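A possible shortcut is to read the container ID straight from the process’s cgroup path. This sketch assumes the path contains libpod- followed by the full 64-character container ID, which is what we would expect with Podman’s systemd cgroup manager; check /proc/<pid>/cgroup on your own host before relying on it. The function name container_id_from_cgroup is hypothetical.

```shell
# Hypothetical helper: extract a 64-hex-character container ID from cgroup
# text on stdin. ASSUMPTION: the cgroup path looks like ".../libpod-<id>.scope/...".
# Prints nothing if no such ID is present.
container_id_from_cgroup() {
  sed -n 's/.*libpod-\([0-9a-f]\{64\}\).*/\1/p' | head -n 1
}

# Usage:
#   container_id_from_cgroup < /proc/1521091/cgroup
#   podman ps --no-trunc --format "{{.ID}} {{.Names}}" | grep <id printed above>
```

If the function prints nothing, fall back to matching init PIDs as described above.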

Confirm the root cause

Now we can look inside the container:

podman exec -it myapp_container bash
ps aux
cat /proc/1/status | grep -i swap
cat /proc/1/cmdline | tr '\0' ' '

This confirmed what pstree had already told us: the container was running long-running Python processes. But that was only the workload pattern, not the actual explanation for why swap stayed full. The real issue was that the Python runtimes were not returning memory pages to the host OS, so the kernel kept treating that memory as belonging to the process. There may also still be a memory management issue within one of the third-party libraries we are using. We implemented code changes to try to mitigate it, and we may still have to restructure our processes, but now that we know where the problem lives, we can apply targeted fixes without guesswork.
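To check whether a code change actually helps, one option is to sample the process’s resident and swapped memory over time and compare runs. This is a minimal sketch rather than what we ran in production; the function name mem_snapshot and its optional proc-root argument are hypothetical.

```shell
# Hypothetical helper: print VmRSS and VmSwap (both in kB) for one PID.
# $1 is the PID; $2 is the proc root and defaults to /proc.
# Returns non-zero once the process no longer exists.
mem_snapshot() {
  status_file="${2:-/proc}/$1/status"
  [ -r "$status_file" ] || return 1
  awk '/^VmRSS:/ { rss = $2 } /^VmSwap:/ { swap = $2 }
       END { printf "rss_kb=%d swap_kb=%d\n", rss, swap }' "$status_file"
}

# Usage: log one sample per minute until the process exits
#   while mem_snapshot 1521091; do sleep 60; done
```

A swap_kb value that keeps climbing while the workload is steady suggests memory is still not being released.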

Quick Reference

Here’s a quick breakdown of the process:

top -o SWAP  (spot the high-swap PIDs)
  → ps -p   (confirm it's alive)
    → pstree -sp   (walk up to conmon/entrypoint)
      → podman ps --format "{{.ID}} {{.Names}} {{.Pid}}" | grep 
        → container name
          → podman exec to inspect inside
