[BUG] OpenSearch SSL transport error, master not discovered or elected yet

This issue has been tracked since 2021-09-20.

Describe the bug
Can't get the default demo setup working on Kubernetes.

To Reproduce
Steps to reproduce the behavior:

  1. Install the Helm chart with defaults (optional) from https://github.com/opensearch-project/helm-charts
  2. Copy all configuration YAML files from /usr/share/opensearch/plugins/opensearch-security/securityconfig to a local directory
  3. Paste their contents into the securityConfig.config.data file templates in the chart values
  4. See the error:
SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
[opensearch-cluster-master-0] master not discovered or elected yet
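For reference, step 3 corresponds to a values.yaml override shaped roughly like the sketch below. It assumes the chart's securityConfig.config.data map, in which each key becomes one file in the securityconfig directory. Only the _meta header of each file is shown; the rest of each file is assumed to be pasted verbatim from the copies taken in step 2. The nine file names are the ones listed in the securityconfig directory later in this thread.

```yaml
# values.yaml (sketch): one key per security config file.
# Each value should hold the COMPLETE file copied in step 2;
# only the _meta header is shown here.
securityConfig:
  enabled: true
  config:
    data:
      action_groups.yml: |-
        _meta:
          type: "actiongroups"
          config_version: 2
      audit.yml: |-
        _meta:
          type: "audit"
          config_version: 2
      config.yml: |-
        _meta:
          type: "config"
          config_version: 2
      internal_users.yml: |-
        _meta:
          type: "internalusers"
          config_version: 2
      nodes_dn.yml: |-
        _meta:
          type: "nodesdn"
          config_version: 2
      roles.yml: |-
        _meta:
          type: "roles"
          config_version: 2
      roles_mapping.yml: |-
        _meta:
          type: "rolesmapping"
          config_version: 2
      tenants.yml: |-
        _meta:
          type: "tenants"
          config_version: 2
      whitelist.yml: |-
        _meta:
          type: "whitelist"
          config_version: 2
```

As noted later in the thread, all of these files are needed, so leaving any key out is likely to produce errors.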

Expected behavior
The cluster reaches GREEN state.

Plugins
Please list all plugins currently enabled.

    cluster.name: opensearch-cluster

    # Bind to all interfaces because we don't know what IP address Docker will assign to us.
    network.host: 0.0.0.0

    # # minimum_master_nodes need to be explicitly set when bound on a public IP
    # # set to 1 to allow single node clusters
    discovery.zen.minimum_master_nodes: 1
    plugins:
      security:
        ssl:
          transport:
            pemcert_filepath: esnode.pem
            pemkey_filepath: esnode-key.pem
            pemtrustedcas_filepath: root-ca.pem
            enforce_hostname_verification: false
          http:
            enabled: false
            pemcert_filepath: esnode.pem
            pemkey_filepath: esnode-key.pem
            pemtrustedcas_filepath: root-ca.pem
        allow_unsafe_democertificates: true
        allow_default_init_securityindex: true
        authcz:
          admin_dn:
            - CN=kirk,OU=client,O=client,L=test, C=de
        audit.type: internal_opensearch
        enable_snapshot_restore_privilege: true
        check_snapshot_restore_write_privileges: true
        restapi:
          roles_enabled: ["all_access", "security_rest_api_access"]
        system_indices:
          enabled: true
          indices:
            [
              ".opendistro-alerting-config",
              ".opendistro-alerting-alert*",
              ".opendistro-anomaly-results*",
              ".opendistro-anomaly-detector*",
              ".opendistro-anomaly-checkpoints",
              ".opendistro-anomaly-detection-state",
              ".opendistro-reports-*",
              ".opendistro-notifications-*",
              ".opendistro-notebooks",
              ".opendistro-asynchronous-search-response*",
            ]

peterzhuamazon wrote this answer on 2021-09-21

I have never seen this issue before. @DandyDeveloper @TheAlgo, any idea about this issue from @alborotogarcia?
Thanks.

DandyDeveloper wrote this answer on 2021-09-22

That specific error (Insufficient buffer remaining for AEAD cipher fragment) can be ignored. It's a known issue in Search Guard: https://bugs.openjdk.java.net/browse/JDK-8221218

It shouldn't have any impact on whether the cluster works.

[opensearch-cluster-master-0] master not discovered or elected yet

Is this actually causing a problem? You mention the cluster being green.

If you are just trying to run a single-node cluster:

    # # minimum_master_nodes need to be explicitly set when bound on a public IP
    # # set to 1 to allow single node clusters
    # discovery.zen.minimum_master_nodes: 1

    # Setting network.host to a non-loopback address enables the annoying bootstrap checks. "Single-node" mode disables them again.
    #discovery.type: single-node

Uncomment these and it'll work.
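Uncommented inside the chart's config.opensearch.yml value, the single-node variant might look like the sketch below. The replicas: 1 override is an assumption, not part of the original suggestion (the env values quoted later in this thread show three master pods by default). A hedged caveat: like Elasticsearch 7.x, OpenSearch may refuse to start in single-node mode if cluster.initial_master_nodes is also set, so this is only expected to work when the chart does not inject that variable.

```yaml
# values.yaml (sketch): single-node variant of the default opensearch.yml
replicas: 1   # assumption: run a single OpenSearch pod
config:
  opensearch.yml: |
    cluster.name: opensearch-cluster
    network.host: 0.0.0.0
    # set to 1 to allow single node clusters
    discovery.zen.minimum_master_nodes: 1
    # "Single-node" mode disables the bootstrap checks again
    discovery.type: single-node
    # keep the plugins: security: block from the default config here,
    # since this value replaces the whole opensearch.yml file
```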

If not, we need the full log.

alborotogarcia wrote this answer on 2021-09-22

I meant GREEN as the desired state; unfortunately it is never reached, because the security config never gets initialized.
I've created a gist with the values.yaml and the full logs of the three nodes here. Could you please take a look, @DandyDeveloper?
If there's something else I'm missing let me know :)

Thanks for the help @peterzhuamazon @DandyDeveloper !

alborotogarcia wrote this answer on 2021-09-22

FWIW @DandyDeveloper @peterzhuamazon, I forgot to mention: the internal users and other added configuration, including LDAP users, do work if they are kept in their volumes and I redeploy the Helm chart once more without securityConfig.config.data.

smlx wrote this answer on 2021-09-22

This seems to be the problem:

opensearch-cluster-master-0 opensearch java.nio.file.FileSystemException: /usr/share/opensearch/data/nodes/0/.opensearch_temp_file: Read-only file system

alborotogarcia wrote this answer on 2021-09-22

@smlx I see. Since Kubernetes 1.9.6, secret, configMap, downwardAPI and projected volumes are mounted read-only by default, as stated in kubernetes/kubernetes#62099. But I don't understand why, when I just leave the default chart template, it doesn't complain about a read-only filesystem. Is the process started by another UID? The current fsGroup is set to 1000, the same as in #9.
How can this be solved?

DandyDeveloper wrote this answer on 2021-09-24

@alborotogarcia

I just deployed locally with your exact values, and it's working for me; it is able to write to that directory.

[… ~]$ cd data/
[[email protected] data]$ ls -l
total 20
-rw-rw-r-- 1 opensearch opensearch    5 Sep 24 01:29 batch_metrics_enabled.conf
-rw-rw-r-- 1 opensearch opensearch    5 Sep 24 01:29 logging_enabled.conf
drwxrwxr-x 3 opensearch opensearch 4096 Sep 24 01:29 nodes
-rw-rw-r-- 1 opensearch opensearch    5 Sep 24 01:29 performance_analyzer_enabled.conf
-rw-rw-r-- 1 opensearch opensearch    5 Sep 24 01:29 rca_enabled.conf
[… data]$ ls -l nodes/
total 4
drwxrwxr-x 3 opensearch opensearch 4096 Sep 24 01:39 0
[… data]$ ls -l nodes/0
total 4
drwxrwxr-x 2 opensearch opensearch 4096 Sep 24 01:29 _state
-rw-rw-r-- 1 opensearch opensearch    0 Sep 24 01:29 node.lock

Edit: I had a bunch of info here that was redundant and incorrect. I misread volumes :)

What k8s version are you running? I'm running the latest version in my test cluster.

alborotogarcia wrote this answer on 2021-09-26

Sorry for the delay, @DandyDeveloper; I had some issues with my IdP and had to spend time on them. I am running a 3-node k3s cluster with Longhorn as the storage class, and yes, I am aware that all config files are needed, otherwise it will complain. IMHO, switching to subPath mounts for each file may be less error-prone since, as you said earlier, the mount may affect the folder it is mounted on. Will report back.
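The per-file subPath idea could look roughly like the pod spec fragment below: each file is mounted individually inside the image's securityconfig directory instead of replacing the whole directory, so files that are not mounted keep the image defaults. This is a sketch only; the volume name security-config and the Secret name opensearch-security-config are hypothetical, and the chart may not expose these fields directly.

```yaml
# Pod spec fragment (sketch): mount single files with subPath
# rather than mounting the whole Secret over securityconfig/.
containers:
  - name: opensearch
    volumeMounts:
      - name: security-config
        mountPath: /usr/share/opensearch/plugins/opensearch-security/securityconfig/config.yml
        subPath: config.yml
      - name: security-config
        mountPath: /usr/share/opensearch/plugins/opensearch-security/securityconfig/internal_users.yml
        subPath: internal_users.yml
volumes:
  - name: security-config
    secret:
      secretName: opensearch-security-config   # hypothetical Secret name
```

One trade-off: subPath mounts do not receive updates when the underlying Secret or ConfigMap changes, so config edits require a pod restart.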

mprimeaux wrote this answer on 2021-09-26

@DandyDeveloper We are running into what I perceive as the same or a similar issue with the 1.0.0 charts and a config similar to @alborotogarcia's, though we are using Keycloak as our IdP.

Would you mind reviewing the permissions in the /usr/share/opensearch/plugins/opensearch-security folder? It appears the securityconfig directory is owned by root rather than opensearch, which might be the cause.

-rw-r--r-- 1 opensearch opensearch  452868 Jul  8 22:32 saaj-impl-1.5.2.jar
drwxrwsrwt 3 root       opensearch     260 Sep 26 12:24 securityconfig
-rw-r--r-- 1 opensearch opensearch   41203 Jul  8 22:32 slf4j-api-1.7.25.jar
[… securityconfig]$ ls -l
total 0
lrwxrwxrwx 1 root opensearch 24 Sep 26 12:24 action_groups.yml -> ..data/action_groups.yml
lrwxrwxrwx 1 root opensearch 16 Sep 26 12:24 audit.yml -> ..data/audit.yml
lrwxrwxrwx 1 root opensearch 17 Sep 26 12:24 config.yml -> ..data/config.yml
lrwxrwxrwx 1 root opensearch 25 Sep 26 12:24 internal_users.yml -> ..data/internal_users.yml
lrwxrwxrwx 1 root opensearch 19 Sep 26 12:24 nodes_dn.yml -> ..data/nodes_dn.yml
lrwxrwxrwx 1 root opensearch 16 Sep 26 12:24 roles.yml -> ..data/roles.yml
lrwxrwxrwx 1 root opensearch 24 Sep 26 12:24 roles_mapping.yml -> ..data/roles_mapping.yml
lrwxrwxrwx 1 root opensearch 18 Sep 26 12:24 tenants.yml -> ..data/tenants.yml
lrwxrwxrwx 1 root opensearch 20 Sep 26 12:24 whitelist.yml -> ..data/whitelist.yml

Not sure if this is the issue but the content of each of the above files looks correct as per these examples.

Of note, we have an older version of the OpenSearch charts that does work with the same values file, the material difference being this block.

mprimeaux wrote this answer on 2021-09-26

It seems my previous assumption is incorrect. Applying an older version of the OpenSearch Helm chart with the same values file works even with the same folder and file permissions as above.

-rw-r--r-- 1 opensearch opensearch  452868 Jul  8 22:32 saaj-impl-1.5.2.jar
drwxrwsrwt 3 root       opensearch     260 Sep 26 12:44 securityconfig
-rw-r--r-- 1 opensearch opensearch   41203 Jul  8 22:32 slf4j-api-1.7.25.jar
[… securityconfig]$ ls -l
total 0
lrwxrwxrwx 1 root opensearch 24 Sep 26 12:44 action_groups.yml -> ..data/action_groups.yml
lrwxrwxrwx 1 root opensearch 16 Sep 26 12:44 audit.yml -> ..data/audit.yml
lrwxrwxrwx 1 root opensearch 17 Sep 26 12:44 config.yml -> ..data/config.yml
lrwxrwxrwx 1 root opensearch 25 Sep 26 12:44 internal_users.yml -> ..data/internal_users.yml
lrwxrwxrwx 1 root opensearch 19 Sep 26 12:44 nodes_dn.yml -> ..data/nodes_dn.yml
lrwxrwxrwx 1 root opensearch 16 Sep 26 12:44 roles.yml -> ..data/roles.yml
lrwxrwxrwx 1 root opensearch 24 Sep 26 12:44 roles_mapping.yml -> ..data/roles_mapping.yml
lrwxrwxrwx 1 root opensearch 18 Sep 26 12:44 tenants.yml -> ..data/tenants.yml
lrwxrwxrwx 1 root opensearch 20 Sep 26 12:44 whitelist.yml -> ..data/whitelist.yml
[… securityconfig]$

I'll continue debugging.

mprimeaux wrote this answer on 2021-09-26

When using the latest version of the OpenSearch chart with the same values file as above, these are the exceptions we receive, which prevent securityadmin.sh from succeeding:

Error

opensearch [2021-09-26T12:26:49,688][DEBUG][o.o.s.c.ConfigurationRepository] [opensearch-cluster-master-0] Try to load config ...
opensearch [2021-09-26T12:26:49,689][DEBUG][o.o.s.c.ConfigurationRepository] [opensearch-cluster-master-0] security index not exists (yet)
opensearch [2021-09-26T12:26:49,689][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [opensearch-cluster-master-0] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
opensearch org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
opensearch     at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:203) ~[opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:189) ~[opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:72) ~[opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:53) ~[opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:192) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:141) [opensearch-index-management-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:190) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:234) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:154) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:190) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:99) [opensearch-performance-analyzer-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:190) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.action.support.TransportAction.execute(TransportAction.java:168) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.action.support.TransportAction.execute(TransportAction.java:96) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:99) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:88) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:428) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:546) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:211) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:102) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:375) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:321) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:306) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:166) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at java.lang.Thread.run(Thread.java:832) [?:?]

If this turns out to be a different issue than the issue that's the topic of this thread then I'll open a separate issue.

mprimeaux wrote this answer on 2021-09-26

I believe I found the issue or, at least, a workaround.

It appears the behavior of the majorVersion chart value changed with the merge of PR #21: it used to compute the value 7 and now computes 1. The workaround (for me, anyway) was to explicitly set majorVersion to 7 in the values file, i.e. majorVersion: 7

If the majorVersion attribute remains at its default of "", then the StatefulSet renders the env: stanza as:

- name: discovery.zen.minimum_master_nodes
  value: "1"
- name: discovery.zen.ping.unicast.hosts
  value: "opensearch-cluster-master-headless"

...rather than:

- name: cluster.initial_master_nodes
  value: "opensearch-cluster-master-0,opensearch-cluster-master-1,opensearch-cluster-master-2,"
- name: discovery.seed_hosts
  value: "opensearch-cluster-master-headless"

When the discovery.zen.* settings are used, the failures above are present and the security indexes are never created, resulting in a red cluster status.

I am not very familiar with Zen discovery, but would likely prefer it so that new nodes can discover the cluster state. However, it does not appear to work.
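Expressed as a values.yaml override, the workaround is a single key. The comments only restate the env: entries reported above; they are not generated by the chart.

```yaml
# values.yaml: pin majorVersion so the StatefulSet renders
# cluster.initial_master_nodes and discovery.seed_hosts
# instead of discovery.zen.minimum_master_nodes and
# discovery.zen.ping.unicast.hosts
majorVersion: 7
```

On the command line, Helm's --set majorVersion=7 flag should have the same effect.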

All thoughts and support are welcome.

UPDATE 1: It appears that we should be using discovery.seed_hosts rather than discovery.zen.ping.unicast.hosts as per SettingsBasedSeedHostsProvider.java.

UPDATE 2: I modified the StatefulSet to use discovery.seed_hosts and discovery.seed_providers and it still fails with majorVersion: "". Regardless, the workaround of specifying majorVersion: 7 still succeeds.
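For comparison, the non-deprecated discovery settings from the env: stanza above could also be written directly into the chart's opensearch.yml value. This sketch assumes the default three-pod opensearch-cluster-master StatefulSet and its headless service; whether it avoids the failure on its own with majorVersion: "" was not confirmed in this thread (UPDATE 2 suggests it may not).

```yaml
# values.yaml (sketch): 7.x-style discovery settings in opensearch.yml
config:
  opensearch.yml: |
    cluster.name: opensearch-cluster
    network.host: 0.0.0.0
    # hosts contacted during discovery (replaces discovery.zen.ping.unicast.hosts)
    discovery.seed_hosts: opensearch-cluster-master-headless
    # master-eligible nodes used only when bootstrapping a brand-new cluster
    cluster.initial_master_nodes:
      - opensearch-cluster-master-0
      - opensearch-cluster-master-1
      - opensearch-cluster-master-2
```

discovery.zen.minimum_master_nodes is left out on purpose: with this style of cluster coordination the setting is deprecated and has no effect.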

alborotogarcia wrote this answer on 2021-09-26

@mprimeaux @DandyDeveloper I followed your suggestions, and here's what worked for me:

discovery.zen.minimum_master_nodes: 1
discovery.seed_hosts: "opensearch-cluster-master-headless"

and left majorVersion: ""

though I still can't log in with my IdP.

TheAlgo wrote this answer on 2021-09-26

@mprimeaux @alborotogarcia I haven't tried the config and installation, as I am away from work for some time, but I am thinking out loud: could this be related to the core engine rather than the chart? We might need to look at the security repository to understand more, because ideally setting 7 should not fix the issue, since OpenSearch versions start at 1.

mprimeaux wrote this answer on 2021-09-26

@TheAlgo Here is the logic. It appears to be, in part, an issue with the chart logic, since we SHOULD be using different discovery env: values.

However, I agree with you that something deeper might be going on and so I will also research the security repository.

Regarding the OpenDistro docs, they seem stale, given that the discovery attributes are discovery.zen.ping.unicast.hosts and discovery.seed_hosts as per this page of the OpenSearch docs.

mprimeaux wrote this answer on 2021-09-26

@alborotogarcia Thanks, mate. I will try your suggestion in the above reply.

TheAlgo wrote this answer on 2021-09-26

@mprimeaux We definitely need to change the Helm logic. As part of #21 we changed it in one place but not in the others, which seems to be what is breaking.

As for the OpenDistro docs, yes, they are stale, and we should follow the official OpenSearch docs as much as possible.

mprimeaux wrote this answer on 2021-09-26

It appears the setting discovery.zen.minimum_master_nodes, used in the env: stanza, is being deprecated in OpenSearch (see ZenDiscoveryUnitTests.java).

See the cluster settings logic here. I believe this might be a point of focus for the chart logic.

alborotogarcia wrote this answer on 2021-09-26

@DandyDeveloper Also, the Ingress API needs upgrading from networking.k8s.io/v1beta1 to networking.k8s.io/v1 for Kubernetes 1.22+.

mprimeaux wrote this answer on 2021-09-26

@alborotogarcia Coincidentally, I noticed this too. In addition, it seems ingressClassName support was dropped from the latest OpenSearch charts when they were migrated from the old repository. The ingress template should add it back, as supported by Kubernetes 1.18+:

  {{- if .Values.ingress.ingressClassName }}
  ingressClassName: {{ .Values.ingress.ingressClassName | quote }}
  {{- end }}

I will create a new issue and related PR today for the IngressClassName to be supported. But this is unrelated to this current issue.
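Combining the networking.k8s.io/v1 upgrade with ingressClassName, a rendered Ingress might look like the sketch below. The Ingress name, host, class name nginx, and the backend Service name and port (opensearch-cluster-master:9200) are assumptions for illustration. Compared with v1beta1, the backend moves from serviceName/servicePort to service.name and service.port.number, and each path must declare a pathType.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: opensearch-cluster-master        # hypothetical name
spec:
  ingressClassName: nginx                # assumption: an installed IngressClass
  rules:
    - host: opensearch.example.com       # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix             # required in networking.k8s.io/v1
            backend:
              service:
                name: opensearch-cluster-master   # assumed Service name
                port:
                  number: 9200
```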

peterzhuamazon wrote this answer on 2022-02-20

Closing this for now, as it seems to have been resolved by the community.
Please feel free to re-open if you still have questions.

Thanks.
