Massive kswapd0 CPU spikes?

Hey everyone! I’m running a large, multi-threaded import job into one Dgraph instance: millions of triples from about eight different threads.

I tried running echo 1 > /proc/sys/vm/drop_caches, as some Stack Overflow answers suggested. It helped for a bit, but then kswapd0's CPU usage climbed right back up.

kswapd0 oscillates between very high CPU usage (>90%) and 0%, which is odd: I'd have expected steady swap usage at least.

(NOTE: I looked at the Dgraph Deploy HOWTO, but I didn't see any swap recommendations for Dgraph itself. Only for the bulk loader.)

top - 14:10:56 up 145 days, 22:43,  1 user,  load average: 17.65, 13.81, 9.35
Tasks: 213 total,   2 running, 211 sleeping,   0 stopped,   0 zombie
%Cpu(s): 16.7 us, 58.3 sy,  0.0 ni, 23.1 id,  1.7 wa,  0.0 hi,  0.1 si,  0.1 st
KiB Mem : 65807368 total,   330764 free, 20462152 used, 45014452 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 44600528 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26716 root      20   0  158.7g  51.7g  35.5g S  1006 82.4   8560:58 dgraph
  104 root      20   0       0      0      0 R  95.7  0.0 267:05.69 kswapd0
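For what it's worth, the top output above shows zero swap configured (KiB Swap: 0 total), so kswapd0 has nothing to swap to; all it can do is scan and evict page cache, which would fit the burst-then-idle pattern. One way to watch this, a sketch assuming a Linux /proc interface (counter names vary a bit by kernel version), is:

```shell
# Watch kswapd reclaim activity. If the pgscan*/pgsteal* counters keep
# climbing while swap stays at zero, kswapd is evicting page cache,
# not swapping anything out.
if [ -r /proc/vmstat ]; then
  grep -E '^(pgscan|pgsteal)' /proc/vmstat
else
  echo "no /proc/vmstat here (not Linux?)"
fi
```

Re-running the grep a few seconds apart shows the reclaim rate; a large delta during the >90% CPU bursts would confirm cache eviction as the cause.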

Anyone have any ideas?

SYSTEM SPECS

  • CentOS 7
  • 4 cores, 64GB of RAM
  • Linux some-services.test.net 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue May 14 21:24:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> docker info
Containers: 7
 Running: 4
 Paused: 0
 Stopped: 3
Images: 6
Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.14.4.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.76GiB
Name: twosixlabs-clickhouse1.datareservoir.net
ID: HXPF:5VJ2:XUDD:OAF6:PBVU:35QU:OLE2:RYAI:YMEY:2ELH:MAXP:LFZQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
  • my docker-compose.yml
version: "3.2"
services:
  zero:
    image: dgraph/dgraph:latest
    volumes:
      - /mnt/dgraph-volume/data:/dgraph
    ports:
      - 5080:5080
      - 6080:6080
    restart: on-failure
    command: dgraph zero --my=zero:5080
  alpha:
    image: dgraph/dgraph:latest
    volumes:
      - /mnt/dgraph-volume/data:/dgraph
    ports:
      - 8080:8080
      - 9080:9080
    restart: on-failure
    command: dgraph alpha --my=alpha:7080 --lru_mb=10240 --zero=zero:5080
  ratel:
    image: dgraph/dgraph:latest
    ports:
      - 8000:8000
    command: dgraph-ratel
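
One mitigation worth trying (an assumption on my part, not something from the Dgraph docs): cap the alpha container's memory so its working set can't push the whole host into reclaim. Note that the version "3.2" file format only honors limits under swarm's deploy.resources, so this hypothetical fragment uses the 2.4 format, where mem_limit works with plain docker-compose:

```yaml
# Hypothetical fragment: 2.4 file format so mem_limit applies without swarm.
version: "2.4"
services:
  alpha:
    image: dgraph/dgraph:latest
    command: dgraph alpha --my=alpha:7080 --lru_mb=10240 --zero=zero:5080
    mem_limit: 24g       # hard cap; pick a value well under host RAM
    memswap_limit: 24g   # equal to mem_limit => no swap for the container
```

The exact cap is a guess; the point is to leave comfortable headroom for the host's page cache so kswapd0 isn't constantly fighting to stay above its watermarks.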

Maybe it’s worth mentioning that this isn’t bare-metal. It’s an OpenStack instance.

Thanks for any tips, ideas, or suggestions!


Hi @the-alchemist,

I'm wondering if this issue is still current. It would be nice if you could re-run your test using our latest Dgraph Docker image, since we have made important improvements to the product since Jul '19.

While testing, it would also help to collect CPU profiles and memory profiles.
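To make that concrete: an alpha serves Go's standard pprof endpoints under /debug/pprof on its HTTP port (8080 in the compose file above). A hedged sketch, assuming the alpha is reachable on localhost (adjust ALPHA for your environment):

```shell
# Pull Go pprof profiles from a running alpha.
# ALPHA is an assumption based on the compose file's published ports.
ALPHA="${ALPHA:-http://localhost:8080}"

fetch() {  # fetch <pprof-path> <outfile>
  if curl -sf --max-time 35 -o "$2" "$ALPHA$1"; then
    echo "fetch: saved $2"
  else
    echo "fetch: could not reach $ALPHA$1"
  fi
}

fetch "/debug/pprof/profile?seconds=30" cpu.pprof   # 30-second CPU profile
fetch "/debug/pprof/heap"               heap.pprof  # heap snapshot
```

The resulting files can be inspected with go tool pprof, and attaching them to a report makes the investigation much easier.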

Also, you could try reproducing the test on Slash GraphQL, our managed GraphQL service.

If the issue is still happening, we will be more than happy to investigate.

Best,
Omar