Problem with Dgraph helm chart serviceAccount creation

What I want to do

I want a Kubernetes service account to be created by setting the proper values in the Dgraph Helm chart's values.yaml file.

What I did

I used the Helm chart to install Dgraph in an EKS cluster. The serviceAccount config in values.yaml (Dgraph Helm Chart Values) doesn't create the K8s service account when the chart is installed, even though the create attribute is set to true. Both versions, v21.12.0 and v23.0.1, failed.

Here are my settings.

serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxxxxxxxxxx:role/demo-iam-role
  name: demo-service-account
  automountServiceAccountToken: true
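
For the record, this is the kind of check I used to confirm that the service account is missing (a minimal sketch, assuming the release is installed into the dgraph namespace):

# Check whether the chart created the service account (namespace is an assumption)
kubectl get serviceaccount demo-service-account -n dgraph

# If it exists, the IRSA annotation should show up here
kubectl describe serviceaccount demo-service-account -n dgraph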

Dgraph metadata

Impacted Dgraph versions: v21.12.0 and v23.0.1

I have tested this feature using IRSA (thus the annotation) with the most recent chart (example below). What Kubernetes version are you using?

Let me try this out with those versions and see if I can find anything.

For example, here’s an installation I did recently.

cat <<-EOF > my_dgraph_config.yaml
fullnameOverride: my-dgraph
image:
  tag: v23.1.0
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::$ACCOUNT_ID:role/$ROLE_NAME
  name: dgraph
  automountServiceAccountToken: false
zero:
  automountServiceAccountToken: false
  persistence:
    storageClass: $DGRAPH_STORAGE_CLASS
    size: 32Gi
alpha:
  automountServiceAccountToken: true
  persistence:
    storageClass: $DGRAPH_STORAGE_CLASS
  resources:
    limits:
      memory: 32Gi
    requests:
      cpu: 13600m
      memory: 29Gi
  configFile:
    config.yaml: |
      security:
        whitelist: ${EKS_CIDR}
EOF

helm install my-dgraph --namespace dgraph \
  --create-namespace --values my_dgraph_config.yaml dgraph/dgraph
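
Once the pods are up, one way I sanity-check that IRSA is wired in is to look for the environment variables the EKS pod identity webhook injects (a rough sketch; the pod name is just an example based on the fullnameOverride above):

# The EKS pod identity webhook injects these variables when it picks up the
# eks.amazonaws.com/role-arn annotation on the service account
# (pod name is an example; adjust it to your release)
kubectl exec -n dgraph my-dgraph-alpha-0 -- \
  env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'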

Hey Joaquin, here are the details of the K8s client and EKS server versions I'm using. Let me adjust my config to align with yours and get back to you with the result.

Client Version: v1.27.1
Kustomize Version: v5.0.1
Server Version: v1.27.4-eks-2d98532

Using the values above as test.yaml on K8s (EKS) 1.27, with both helm template and helm install --debug --dry-run, I could not see any issue.

# template test
helm template dgraph/dgraph --namespace dgraph \
  --values test.yaml | grep -A11 -B3 ServiceAccount

# ---
# # Source: dgraph/templates/serviceaccount.yaml
# apiVersion: v1
# kind: ServiceAccount
# metadata:
#   name: demo-service-account
#   labels:
#     app: dgraph
#     chart: dgraph-0.2.1
#     component: alpha
#     release: release-name
#     heritage: Helm
#   annotations:
#     eks.amazonaws.com/role-arn: arn:aws:iam::xxxxxxxxxxx:role/demo-iam-role
# ---

# debug dry-run install test
helm install mydg dgraph/dgraph --namespace dgraph --debug --dry-run \
  --create-namespace --values test.yaml | grep serviceAccountName
#       serviceAccountName: demo-service-account
#       serviceAccountName: demo-service-account

I have also installed this on K8s (EKS) 1.26, with an IAM role for S3 bucket access for backups, and have not encountered any issues. Though with that one, I used the default dgraph sa name.

As for the dgraph sa name, do you mean the super admin name?

I just followed your example; unfortunately, no serviceAccountName gets returned.
Here is my test file service-acct.yaml. Except for the version difference in the chart label, the rest is aligned with yours.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-service-account
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxxxxxxxxxx:role/demo-iam-role
  labels:
    app: dgraph
    chart: dgraph-0.0.19
    component: alpha
    release: mydg
    heritage: Helm

I ran:

helm install mydg dgraph/dgraph --namespace dgraph \
  --debug --dry-run --create-namespace \
  --values deploy/dev/dgraph/service-acct.yaml | grep serviceAccountName

It returns no service account name. :disappointed_relieved:

install.go:178: [debug] Original chart version: ""
install.go:195: [debug] CHART PATH: /Users/Ann.Zhang/Library/Caches/helm/repository/dgraph-0.0.19.tgz

What's odd is that when I use kubectl to create the service account with the same YAML file, there's no problem at all.

# create service account
$ kubectl apply -f deploy/dev/dgraph/service-acct.yaml
# check service account
$ kubectl get serviceaccount -n dgraph
# it returns:
# NAME                           SECRETS   AGE
# demo-service-account           0         107m

@joaquin Did you use eksctl to set up your K8s cluster on the cloud? I'm using Terraform to set up all the infrastructure. At first I wondered whether the difference between your testing and mine was caused by the different tools we're using, but that turned out to be false: I packaged a Helm chart for my own app with a serviceAccount configuration in its values.yaml that is almost identical to what I set in Dgraph's values.yaml, and when it was installed with helm, the service account was created immediately and successfully. At least this tells me that there's nothing wrong with the way I use helm install.
This is what I set in my own app's chart values.yaml.

serviceAccount:
  create: true
  # Annotations to add to the service account
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxxxxxxxxxx:role/demo-iam-role
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: "myapp-service-account"

# Install app chart
helm install app-release appchart \
  --namespace myapp --debug --dry-run \
  --values deploy/dev/app/values.yaml | grep serviceAccountName

install.go:178: [debug] Original chart version: ""
install.go:195: [debug] CHART PATH: /Users/Ann.Zhang/Work/app/appchart

# it returns
# serviceAccountName: myapp-service-account

When you grep'd for serviceAccountName, you got two entries, one for the alpha StatefulSet and one for the zero StatefulSet? That would indicate that the service account is indeed created and that the StatefulSets will use it.

The permissions themselves may still not work; that would be a different issue, and could be related to how you provisioned things with Terraform (see the quick check below). For comparison, further down is the automation I used with eksctl. eksctl does a lot of hand-holding automation under the hood, so you may need to compare many resources to get equivalency. It follows the reference infra-as-code (AWS CloudFormation) as its guideline, so it will also create the tags and such needed by other drivers and resources.
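
To rule out the IRSA side, the role's trust policy is the first thing I would look at; a rough sketch, using the role name from your earlier values:

# Show the trust policy of the IRSA role; it should reference the cluster's
# OIDC provider and condition on system:serviceaccount:<namespace>:<sa-name>
aws iam get-role --role-name demo-iam-role \
  --query 'Role.AssumeRolePolicyDocument' --output json

# List the OIDC providers registered in the account
aws iam list-open-id-connect-providers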

For reference, this is how I provision the resources with eksctl.

eksctl create cluster \
  --version $EKS_VERSION \
  --region $EKS_REGION \
  --name $EKS_CLUSTER_NAME \
  --nodes 3

eksctl utils associate-iam-oidc-provider \
  --cluster $EKS_CLUSTER_NAME \
  --region $EKS_REGION \
  --approve

eksctl create iamserviceaccount \
  --name "ebs-csi-controller-sa" \
  --namespace "kube-system" \
  --cluster $EKS_CLUSTER_NAME \
  --region $EKS_REGION \
  --attach-policy-arn $POLICY_ARN_ECSI \
  --role-only \
  --role-name $ROLE_NAME_ECSI \
  --approve

# Install Addon
eksctl create addon \
  --name "aws-ebs-csi-driver" \
  --cluster $EKS_CLUSTER_NAME \
  --region $EKS_REGION \
  --service-account-role-arn $ACCOUNT_ROLE_ARN_ECSI \
  --force

# Pause here until STATUS=ACTIVE
ACTIVE=""; while [[ -z "$ACTIVE" ]]; do
  if eksctl get addon \
       --name "aws-ebs-csi-driver" \
       --region $EKS_REGION \
       --cluster $EKS_CLUSTER_NAME \
    | tail -1 \
    | awk '{print $3}' \
    | grep -q "ACTIVE"
  then
    ACTIVE="1"
  fi
done

# create storage class using driver
cat <<EOF | kubectl apply --filename -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: $STORAGE_CLASS_NAME
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF

# make ebs-sc the new storage class
kubectl patch storageclass gp2 --patch \
 '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass $STORAGE_CLASS_NAME --patch \
 '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

I've just double-checked the infrastructure Terraform code; it looks fine, and I destroyed everything and re-applied it. Here is the complete dgraph-values.yaml file I'm using; excuse me, I have to hide some sensitive information.

global:
  ingress:
    enabled: true
    ingressClassName: alb
    alpha_hostname: "dgraph.alpha.dev.annzapp.com"
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internal
      alb.ingress.kubernetes.io/group.name: dgraph
      alb.ingress.kubernetes.io/subnets: subnet-XXXXXXXXX,subnet-YYYYYYYYY
      alb.ingress.kubernetes.io/load-balancer-name: neosight-alb-dev
      alb.ingress.kubernetes.io/inbound-cidrs: aaaaaaa,bbbbbbbb,ccccccccc
      alb.ingress.kubernetes.io/tags: TargetGroupName=k8s-neosight-alb-dev

image: &image
  tag: v21.12.0

serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXX:role/neosight-iam-role-dev
  namespace: dgraph
  name: neosight-service-account-dev
  automountServiceAccountToken: false

alpha:
  automountServiceAccountToken: true
  replicaCount: 1 # 1 is minimal 3 is stable
  extraEnvs:
    - name: lambda
      value: url=http://dgraph-lambda-dgraph-lambda.neosight.svc:8686/graphql-worker
  configFile:
    config.yaml: |
      security:
        whitelist: 0.0.0.0/0
      telemetry:
        sentry: false
        reports: false
  service:
    type: LoadBalancer
    name: private-lb-dgraph-alpha-dev
    namespace: dgraph
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-subnets: "subnet-xxxxxxxx,subnet-yyyyyyyy"
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    loadBalancerSourceRanges:
    - "0.0.0.0/0"
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: role
          operator: In
          values:
          - dgraph
  persistence:
    storageClass: "gp2"
  livenessProbe:
    enabled: true
    port: 8080
    path: /health?live=1
    initialDelaySeconds: 15
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 6
    successThreshold: 1
  readinessProbe:
    enabled: true
    port: 8080
    path: /probe/graphql
    initialDelaySeconds: 15
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 6
    successThreshold: 1
zero:
  automountServiceAccountToken: false
  replicaCount: 1 # 1 is minimal 3 is stable
  configFile:
    config.yaml: |
      telemetry:
        sentry: false
        reports: false
  service:
    type: LoadBalancer
    name: private-lb-dgraph-zero-dev
    namespace: dgraph
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-subnets: "subnet-xxxxxxx,subnet-yyyyyyyy"
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    loadBalancerSourceRanges:
    - "0.0.0.0/0"
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: role
          operator: In
          values:
          - dgraph
  persistence:
    storageClass: "gp2"
  livenessProbe:
    enabled: true
    port: 6080
    path: /health
    initialDelaySeconds: 15
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 6
    successThreshold: 1
  readinessProbe:
    enabled: true
    port: 6080
    path: /state
    initialDelaySeconds: 15
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 6
    successThreshold: 1

Helm Install:

helm install dgraph-release dgraph/dgraph --namespace dgraph \
  --create-namespace --debug --dry-run --values deploy/dev/dgraph/dgraph-values.yaml | grep -A11 -B3 ServiceAccount

# it returns:
install.go:178: [debug] Original chart version: ""
install.go:195: [debug] CHART PATH: /Users/Ann.Zhang/Library/Caches/helm/repository/dgraph-0.0.19.tgz

TEST SUITE: None
USER-SUPPLIED VALUES:
alpha:
  automountServiceAccountToken: true
  configFile:
    config.yaml: |
      security:
        whitelist: 0.0.0.0/0
      telemetry:
        sentry: false
        reports: false
  extraEnvs:
  - name: lambda
    value: url=http://dgraph-lambda-dgraph-lambda.neosight.svc:8686/graphql-worker
  livenessProbe:
--
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXX:role/neosight-iam-role-dev
  automountServiceAccountToken: false
  create: true
  name: neosight-service-account-dev
  namespace: dgraph
zero:
  automountServiceAccountToken: false
  configFile:
    config.yaml: |
      telemetry:
        sentry: false
        reports: false
  livenessProbe:
    enabled: true
    failureThreshold: 6
    initialDelaySeconds: 15
    path: /health
    periodSeconds: 10
--
  acl:
    enabled: false
  antiAffinity: soft
  automountServiceAccountToken: true
  configFile:
    config.yaml: |
      security:
        whitelist: 0.0.0.0/0
      telemetry:
        sentry: false
        reports: false
  customLivenessProbe: {}
  customReadinessProbe: {}
  customStartupProbe: {}
  encryption:
--
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXX:role/neosight-iam-role-dev
  automountServiceAccountToken: false
  create: true
  name: neosight-service-account-dev
  namespace: dgraph
zero:
  antiAffinity: soft
  automountServiceAccountToken: false
  configFile:
    config.yaml: |
      telemetry:
        sentry: false
        reports: false
  customLivenessProbe: {}
  customReadinessProbe: {}
  customStartupProbe: {}
  extraAnnotations: {}
  extraEnvs: []
  extraFlags: ""

Then:

helm install dgraph-release dgraph/dgraph --namespace dgraph \
  --create-namespace --debug --dry-run --values deploy/dev/dgraph/dgraph-values.yaml | grep ServiceAccountName 

# it returns:
install.go:178: [debug] Original chart version: ""
install.go:195: [debug] CHART PATH: /Users/Ann.Zhang/Library/Caches/helm/repository/dgraph-0.0.19.tgz

This looks problematic, as the default driver doesn't work out of the box on Kubernetes 1.23+. Amazon no longer supports the in-tree driver and recommends using the EBS CSI driver for persistent volume support.
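
A quick way to confirm the CSI driver is actually running in the cluster (a rough sketch; the label may differ depending on how the driver was installed):

# The CSIDriver object is registered cluster-wide once the driver is installed
kubectl get csidriver ebs.csi.aws.com

# Controller and node pods (label as used by the upstream chart;
# it may differ for the EKS managed add-on)
kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-ebs-csi-driver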

I have the EBS CSI driver added as an add-on to EKS, and I have also given the nodes permission to obtain EBS volumes as needed.
Here is a screenshot of the driver added to the cluster, taken from the AWS console page.

As far as I've observed, some volumes are dynamically attached to the currently only node, dgraph-node-dev, including a snapshot for Dgraph to use, and I haven't noticed any exceptions.

Since I've already added the driver, what would you suggest for

  persistence:
    storageClass: "gp2"

Should it be left as is, or what is the correct storageClass to declare here?

Maybe the old one works with the driver installed, but gp2 is really slow. I would install a storage class that uses the driver and provisions gp3 volumes.
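
For example, a gp3 class backed by the CSI driver could look roughly like this (the class name is just a placeholder):

cat <<EOF | kubectl apply --filename -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-sc               # placeholder name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                  # EBS volume type handled by the CSI driver
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF

Then point persistence.storageClass at that class in the values file.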

gp2 is cheaper though, :laughing:

I'll try to clean everything up tomorrow; I mean uninstall Dgraph and all my services and destroy the infrastructure completely. Then I'll start from a blank slate and try Dgraph v23.0.1 with gp3 as the storage class in this dgraph-values.yaml. Hopefully I'm lucky tomorrow and get the service account created successfully.

@joaquin Problem solved by running helm repo update. :laughing: I should have thought about it earlier! argggg… :rofl: Cheers! :beers:

Now I can see the serviceAccountName correctly returned.

helm install dgraph-release dgraph/dgraph \
  --namespace neosight --debug --dry-run \
  --values deploy/dev/dgraph/dgraph-values.yaml | grep serviceAccountName

install.go:178: [debug] Original chart version: ""
install.go:195: [debug] CHART PATH: /Users/Ann.Zhang/Library/Caches/helm/repository/dgraph-0.2.2.tgz

      serviceAccountName: neosight-service-account-dev
      serviceAccountName: neosight-service-account-dev
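
For anyone else hitting this: the local repo cache was serving an old chart (dgraph-0.0.19) that apparently predates the serviceAccount support, and refreshing the index is what picked up 0.2.2. A rough sketch of the check:

# List the chart versions the local repo index knows about
helm search repo dgraph/dgraph --versions | head -5

# Refresh the index so helm install picks up the newer chart
helm repo update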

That is great to hear. Congrats.

I have been busy adding features and bug fixes. To get early access, you can download the Helm chart and then reference it by its file path. Eventually, the features will be rolled into a new release.
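
For example, a rough sketch of that local-path workflow (paths are placeholders):

# Download and unpack the chart locally
helm pull dgraph/dgraph --untar

# Install (or dry-run) from the local path instead of the repo
helm install my-dgraph ./dgraph --namespace dgraph \
  --create-namespace --values my_dgraph_config.yaml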
