The DC/OS CLI provides commands to debug services that are not deploying or behaving as expected. To see full logs, append --log-level=debug to any DC/OS CLI command. For example, to troubleshoot an HDFS package installation, add the flag when you run the package install command.
For more information about log levels, consult the CLI command reference or run dcos --help.
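As a sketch of the HDFS case, the install command with debug logging might look like the following. It assumes the package is published under the name hdfs in your package repository. The flag is placed directly after dcos as a global option, which is an assumption about where your CLI version accepts it, so check dcos --help if the command is rejected. The guard is added for illustration only, so that the snippet does nothing on a machine without the CLI.

```shell
# Assumption: the package name is "hdfs"; adjust it to match your repository.
if command -v dcos >/dev/null 2>&1; then
  # Debug-level logging applies to this one invocation only.
  dcos --log-level=debug package install hdfs
else
  echo "dcos CLI not found on PATH; nothing to run" >&2
fi
```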
Debug Subcommands for Stuck Deployments
The DC/OS CLI provides a set of debugging subcommands to troubleshoot a stuck service or pod deployment. You can also debug services and pods from the DC/OS UI.
Prerequisites
- A DC/OS cluster
- The DC/OS CLI installed
- A service or pod that is stuck in deployment
Sample application definitions
If you do not currently have a service or pod that is stuck in deployment, you can use the following two Marathon application definitions to test the instructions in this section.
- mem-app.json: This service creates an infinite deployment by requesting more memory than is available.
- stuck-sleep.json: This service requests too many instances.
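As a minimal sketch, definitions along the following lines would match the two descriptions. They are illustrative, not the original files, and every value is an assumption: the 12000 MiB memory request only stalls on agents with less free memory than that, and 10 instances combined with the hostname UNIQUE constraint only stall on clusters with fewer than 10 agents. The snippet writes both files to the current directory and submits them when the CLI is available.

```shell
# mem-app.json: requests more memory per instance than an agent is assumed to offer.
cat > mem-app.json <<'EOF'
{
  "id": "/mem-app",
  "cmd": "sleep 1000",
  "cpus": 0.1,
  "mem": 12000,
  "instances": 2
}
EOF

# stuck-sleep.json: requests more instances than can be placed, because the
# UNIQUE hostname constraint allows at most one instance per agent.
cat > stuck-sleep.json <<'EOF'
{
  "id": "/stuck-sleep",
  "cmd": "sleep 1000",
  "cpus": 0.1,
  "mem": 32,
  "instances": 10,
  "constraints": [["hostname", "UNIQUE"]]
}
EOF

# Submit both definitions to Marathon when the DC/OS CLI is available.
if command -v dcos >/dev/null 2>&1; then
  dcos marathon app add mem-app.json
  dcos marathon app add stuck-sleep.json
fi
```

When you are done testing, remove the services with dcos marathon app remove /mem-app and dcos marathon app remove /stuck-sleep.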
dcos marathon debug list
The dcos marathon debug list command shows you all the services that are in a waiting state. This enables you to see only the services that are not running.
The output of the command shows:
- How many instances of the service or pod are waiting to launch.
- How many Mesos resource offers have been processed.
- How many Mesos resource offers are unused.
- The time when the user created or updated the service or pod.
This output can quickly show you which services or pods are stuck in deployment and how long they have been stuck.
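A minimal invocation, assuming the CLI is already attached to a cluster. The command takes no service or pod ID because it reports on everything that is waiting. The guard is illustrative and only skips the call when the CLI is missing.

```shell
if command -v dcos >/dev/null 2>&1; then
  # List every service or pod that is waiting to launch.
  dcos marathon debug list
else
  echo "dcos CLI not found on PATH; nothing to run" >&2
fi
```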
dcos marathon debug summary
Once you know which services or pods are stuck in deployment, use the dcos marathon debug summary /<app-id>|/<pod-id> command to learn more about a particular stuck service or pod.
For each resource, the output of the command shows what the service or pod requested, how many offers were matched, and the percentage of offers that were matched. This command can quickly show you which resource requests are not being met.
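For instance, a summary for a single service could be requested as sketched below. The ID /mem-app is a placeholder that assumes a service with that ID exists; substitute an ID reported by dcos marathon debug list.

```shell
# Placeholder ID; replace it with a service or pod ID from your cluster.
APP_ID="/mem-app"

if command -v dcos >/dev/null 2>&1; then
  # Summarize requested resources and offer matches for this one service.
  dcos marathon debug summary "$APP_ID"
fi
```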
dcos marathon debug details
The dcos marathon debug details /<app-id>|/<pod-id> command lets you learn exactly how your service or pod definition should be changed.
The output of the command shows:
- Which hosts are running the service or pod.
- The status of the role, constraints, CPUs, memory, disk, and ports the service or pod has requested.
- When the last resource offer was received.
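To compare several stuck services, the details command can be run once per ID, as in this sketch. The two IDs are placeholders that assume services named /mem-app and /stuck-sleep exist in the cluster.

```shell
# Placeholder IDs; substitute the services or pods you are investigating.
for app_id in /mem-app /stuck-sleep; do
  if command -v dcos >/dev/null 2>&1; then
    echo "== details for ${app_id} =="
    # Show per-host match status for each requested resource.
    dcos marathon debug details "$app_id"
  fi
done
```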
For example, in the output for the sample /mem-app service, one instance has a status of ok in all categories except memory. The other instance had fewer successful resource matches, with role, CPUs, memory, and ports having no match.
More information about this command can be found in the CLI Command Reference section.