Using sos collect to retrieve sos reports from an OCP cluster

What is sos collect?
If you have worked with Red Hat products for a while, you should know about the sos tool. sos is a diagnostic data collection utility, used by system administrators, support representatives, and the like to assist in troubleshooting issues with a system or group of systems. The most well known function is sos report. It generates an archive of system information including configuration files and command output.
What is less commonly known is the subcommand sos collect, which collects sosreports from multiple nodes and packages them in a single useful tar archive. Using this tool, you don’t need to follow the cumbersome process defined in this KCS: How to provide an sosreport from a RHEL CoreOS OpenShift 4 node.
What do I need to know?
In my personal experience, sos collect is a fairly new feature of the sos command and is subject to constant updates and improvements. Therefore, I strongly recommend updating to the latest version of sos, which is v4.6.1 at the time of writing this blog.
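To see which version you have installed, a minimal sketch (assuming an RPM-based system such as RHEL or Fedora; the fallback message is just an example):

```shell
# Check the installed sos version; print a hint if it is missing.
if command -v rpm >/dev/null 2>&1 && rpm -q sos >/dev/null 2>&1; then
  sos_version="$(rpm -q sos)"
else
  sos_version="sos not installed; install or update with: sudo dnf install sos"
fi
echo "$sos_version"
```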
sos collect --verbose

Authenticating against the cluster
This tool uses kubeconfig as the preferred authentication method. In case you are more focused on the development side of OpenShift and not that aware of how kubeconfig works, let's summarize it and cover a couple of ways of retrieving one.
Warning: sos collect requires a kubeconfig with a cluster-admin role binding to work.
The two easiest ways of retrieving a kubeconfig are:

1. Use the kubeconfig provided during installation. This will work if you didn't change the certificates of the API after installation: sos collect will not use the --insecure-skip-tls-verify flag, so with changed certificates the command will not work properly.
2. Generate a kubeconfig on demand using a cluster-admin user account. It is as simple as executing the following command:
oc login --token=$USER_TOKEN --server=$OCP_API_URL:6443 --kubeconfig=/tmp/kubeconfig

Tip: Rescuing kubeconfig. Oh! Your openshift-authentication pods are gone and you cannot authenticate with a token? And your installer kubeconfig is missing, too? No worries, these KCSs will help you:
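Before moving on, it can save a failed run to confirm the kubeconfig file is actually in place. A minimal sketch, assuming the /tmp/kubeconfig path used in the command above:

```shell
# Sanity check: make sure the kubeconfig exists and is non-empty
# before handing it to sos collect.
KUBECONFIG="${KUBECONFIG:-/tmp/kubeconfig}"
if [ -s "$KUBECONFIG" ]; then
  kubeconfig_status="kubeconfig found at $KUBECONFIG"
else
  kubeconfig_status="kubeconfig missing or empty at $KUBECONFIG; run oc login first"
fi
echo "$kubeconfig_status"
```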
Generating the sos report
Okay, there we go! It is time to retrieve the sos reports. The only thing you need to do is export the KUBECONFIG environment variable and then execute the following command:
export KUBECONFIG=/tmp/kubeconfig
sos collect --no-local --nopasswd-sudo --batch --clean \
--cluster-type=ocp -c ocp.role=master:worker

As you can see, this command has quite a few parameters, so let's discuss them one by one:
- --no-local: Avoid also collecting an sos report from your local machine.
- --nopasswd-sudo: The core user on the OCP nodes is a passwordless sudoer.
- --batch: Run in non-interactive mode, skipping all prompts for user input.
- --clean: Obfuscate any confidential data such as IPs, certificates, etc.
- --cluster-type=ocp: Make sure that it uses the ocp profile as well as the KUBECONFIG to authenticate.
- -c ocp.role=master:worker: By default, it only collects reports from the worker nodes. This flag takes a colon-separated list of node labels to analyze.
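As an illustration, here is a variation of the command above that collects only from the control-plane nodes, reusing the same flags. The guard is just to make the sketch harmless on a machine where sos is not installed:

```shell
# Collect sos reports from master nodes only, with the flags
# explained above; skip gracefully if sos is unavailable.
if command -v sos >/dev/null 2>&1; then
  sos collect --no-local --nopasswd-sudo --batch --clean \
      --cluster-type=ocp -c ocp.role=master
  collect_status="sos collect ran"
else
  collect_status="sos is not installed"
fi
echo "$collect_status"
```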
If you want to customize this command in other ways, you can check all the configuration options available under the -c flag by running the following command:
sos collect -l

Additional resources
Ok, this post was great, but you need more information about sos report and sos collect? Then this is your section! Please check the following Red Hat KCSs:
Still not enough? You're missing something and this is not working for you? Then it is time: you cannot wait any longer. Click here to see the source code and understand how this amazing tool works!
Annex: Structure of the kubeconfig
The kubeconfig, by default located at ~/.kube/config, is the file that oc and kubectl use to authenticate against the cluster. It basically contains three sections:
- clusters: A list of all the clusters that it has connected to before.
- users: For each user, the latest credentials used to authenticate.
- contexts: A relationship between a cluster and a user.
It also contains the key current-context, which stores the context you are currently using.
apiVersion: v1
kind: Config
current-context: <namespace>/<ref-cluster>/<ref-user>
clusters:
- cluster:
    server: ""
  name: ""
users:
- name: ""
  user:
    token: ""
contexts:
- context:
    cluster: ""
    user: ""
  name: ""
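To see the structure in action, here is a minimal sketch that writes a sample kubeconfig (the field names match the layout above; the server, user, and token values are made up) and reads its current-context without oc or kubectl:

```shell
# Write a sample kubeconfig and extract the current-context key.
cat > /tmp/sample-kubeconfig <<'EOF'
apiVersion: v1
kind: Config
current-context: default/api-example-com:6443/admin
clusters:
- cluster:
    server: https://api.example.com:6443
  name: api-example-com:6443
users:
- name: admin
  user:
    token: sha256~REDACTED
contexts:
- context:
    cluster: api-example-com:6443
    user: admin
  name: default/api-example-com:6443/admin
EOF

# Print only the value of current-context.
current_context="$(sed -n 's/^current-context: //p' /tmp/sample-kubeconfig)"
echo "$current_context"
```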