Using SOS collect to retrieve sos reports from an OCP cluster
What is SOS collect?
If you have worked with Red Hat products for a while, you should know about the sos tool. sos is a diagnostic data collection utility, used by system administrators, support representatives, and the like to assist in troubleshooting issues with a system or group of systems. The most well-known function is sos report. It generates an archive of system information including configuration files and command output.
What is less commonly known is the subcommand sos collect, which collects sosreports from multiple nodes and packages them in a single useful tar archive. Using this tool, you don’t need to follow the cumbersome process defined in this KCS: How to provide an sosreport from a RHEL CoreOS OpenShift 4 node.
What do I need to know?
In my personal experience, sos collect, as part of the sos command, is a fairly new feature that is subject to constant updates and improvements. Therefore, I strongly recommend updating to the latest version of the tool, which is v4.6.1 at the time of writing this blog.
sos collect --verbose
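To check whether the installed version is recent enough, a minimal sketch along these lines may help. It assumes an RPM-based host where the tool was installed from the sos package, and GNU sort with its -V flag; the version_at_least helper is a hypothetical name introduced here for illustration.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes GNU sort, whose -V flag orders version strings numerically.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: sos was installed from the "sos" RPM package.
if command -v rpm >/dev/null 2>&1; then
    # rpm -q exits non-zero when the package is not installed.
    if installed="$(rpm -q --queryformat '%{VERSION}' sos 2>/dev/null)"; then
        if version_at_least "$installed" "4.6.1"; then
            echo "sos $installed is recent enough"
        else
            echo "sos $installed is older than 4.6.1; consider updating it"
        fi
    else
        echo "the sos package does not appear to be installed"
    fi
fi
```

On hosts where sos was installed in another way, only the comparison helper applies and the installed version has to be obtained differently.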
Authenticating against the cluster
This tool uses kubeconfig as the preferred authentication method. In case you are more focused on the development side of OpenShift and not that familiar with how a kubeconfig works, let’s summarize it and go over a couple of ways of retrieving one.
Warning: sos collect requires a kubeconfig with a cluster-admin role binding to work.
The two easiest ways of retrieving a kubeconfig are:
Use the kubeconfig provided during installation. This only works if you didn’t change the certificates of the API after installation: sos collect will not use the --insecure-skip-tls-verify flag, so with changed certificates the command will not work properly.
Generate a kubeconfig on demand using a cluster-admin user account. It is as simple as executing the following command:
oc login --token=$USER_TOKEN --server=$OCP_API_URL:6443 --kubeconfig=/tmp/kubeconfig
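Before handing the generated file to sos collect, it can be worth confirming that it exists and carries enough privileges. The following is a sketch only: check_kubeconfig is a hypothetical helper name, and the wildcard oc auth can-i query is used here as an approximation of the cluster-admin requirement, not as an exact test of the role binding.

```shell
# Hypothetical helper: verifies that a kubeconfig file exists and is not empty
# and, when the oc client is installed, asks the API whether this identity is
# allowed to perform every verb on every resource in all namespaces.
check_kubeconfig() {
    kubeconfig_path="$1"
    if [ ! -s "$kubeconfig_path" ]; then
        echo "kubeconfig not found or empty: $kubeconfig_path" >&2
        return 1
    fi
    if command -v oc >/dev/null 2>&1; then
        # Prints "yes" or "no"; the wildcard check approximates cluster-admin.
        oc --kubeconfig="$kubeconfig_path" auth can-i '*' '*' --all-namespaces
    fi
}
```

For the file created above, the call would be check_kubeconfig /tmp/kubeconfig; a "no" answer suggests the account lacks the permissions sos collect needs.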
Tip: Rescuing kubeconfig. Oh! Your openshift-authentication pods are gone and you cannot authenticate with a token? And your installer kubeconfig is missing, too? No worries, these KCSs will help you:
Generating the sos report
Okay, there we go! It is time to retrieve the sos reports. The only thing that you need to do is to export the current KUBECONFIG. Then execute the following command:
export KUBECONFIG=/tmp/kubeconfig
sos collect --no-local --nopasswd-sudo --batch --clean \
--cluster-type=ocp -c ocp.role=master:worker
As you can see, this command has quite a few parameters. Let’s discuss them one by one:
--no-local: Avoids also collecting the sos report of your local machine.
--nopasswd-sudo: Needed because the core user on the OCP nodes is a passwordless sudoer.
--batch: Runs in non-interactive mode, skipping all the prompts for user input.
--clean: Obfuscates confidential values such as IPs, certificates, etc.
--cluster-type=ocp: Makes sure that it uses the ocp profile and authenticates with the KUBECONFIG.
-c ocp.role=master:worker: A colon-delimited list of node roles to collect from. Without it, sos collect only gathers reports from the nodes matching the default role (master, as far as I can tell from the upstream ocp profile), so both roles are listed here explicitly.
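The flags above can be assembled by a small wrapper when the set of roles changes between runs. This is a minimal sketch: build_sos_collect_args is a hypothetical helper name, and it only prints the argument string used in this post, joining the roles with a colon as in the master:worker example.

```shell
# Hypothetical helper: prints the sos collect arguments for the given node roles.
# Roles are joined with ":" to match the ocp.role=master:worker form used above.
build_sos_collect_args() {
    if [ "$#" -eq 0 ]; then
        echo "usage: build_sos_collect_args ROLE [ROLE...]" >&2
        return 1
    fi
    # "$*" joins the arguments with the first character of IFS.
    roles="$(IFS=:; echo "$*")"
    echo "--no-local --nopasswd-sudo --batch --clean --cluster-type=ocp -c ocp.role=${roles}"
}
```

With KUBECONFIG exported, it could be used as sos collect $(build_sos_collect_args master worker), leaving the substitution unquoted so that the shell splits it into separate arguments.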
If you want to customize this command in other ways, you can check all the configuration options available under the -c flag by running the following command:
sos collect -l
Additional resources
Ok, this post was great, but I need more information about sos report and sos collect. Then, this is your section!! Please, check the following Red Hat KCSs:
Still, this is not enough, I’m missing something and this is not working for me! Then, it is time. You cannot wait more. Click here to see the source code and understand how this amazing tool works!!
Annex: Structure of the kubeconfig
The kubeconfig, by default located under ~/.kube/config, is the file that oc and kubectl use to authenticate against the cluster. It basically contains three sections:
clusters: A list of all the clusters that it has connected to before.
users: For each user, the latest credentials that it has used before.
contexts: A relationship between a cluster and a user.
It also contains the key current-context, which stores the context you are currently logged in to.
apiVersion: v1
kind: Config
current-context: <namespace>/<ref-cluster>/<ref-user>
clusters:
- cluster:
    server: ""
  name: ""
users:
- name: ""
  user:
    token: ""
contexts:
- context:
    cluster: ""
    user: ""
  name: ""
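To see which context a given kubeconfig currently points at, the current-context key can be read straight from the file. The sketch below is a plain-text lookup rather than a YAML parser: kubeconfig_current_context is a hypothetical helper name, and it assumes the key and its value sit on a single line, as in the structure above.

```shell
# Hypothetical helper: prints the value of the first current-context key found
# in the kubeconfig passed as $1. Plain-text matching only, not a YAML parser.
kubeconfig_current_context() {
    awk '$1 == "current-context:" { print $2; exit }' "$1"
}
```

For example, kubeconfig_current_context /tmp/kubeconfig prints the context that sos collect would use with that file. When the oc client is available, oc config current-context should report the same value for the active kubeconfig.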