Troubleshooting service issues
Some of the following troubleshooting techniques require familiarity with OpenStack. For OpenStack-specific troubleshooting, you need to know how to access the OpenStack CLI and be familiar with its command structure.
This section contains:
Using auto clean
If a service or resource fails to create, auto clean (when enabled) automatically removes any partially created services or resources. This helps prevent stranded resources resulting from an incomplete or failed creation.
If you disable auto clean (leave it unchecked in the General tab), Blue Planet does not automatically remove any resources, failed or active. This leaves the resources in the system so that you can troubleshoot; an error message describes what went wrong during creation.
For information on where to enable/disable auto clean, see Creating services.
Collecting logs
Blue Planet provides several ways for users to collect logs. Depending on what type of log information you are collecting, use one of the following:
Blue Planet Orchestration logs
To download and inspect bpocore logs from the Blue Planet Orchestration interface, select System > System Health > Logging, then select More > System Troubleshooting > Download Log Pack. You can inspect the log files for errors and exceptions that might give clues as to why the resource failed to create.
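As a quick first pass, you can scan the extracted log pack for errors from the command line. The sketch below uses a scratch directory and a simulated log file for illustration; in practice, extract the real downloaded log pack and point the search at it.

```shell
# Scratch directory standing in for the extracted log pack (illustrative path).
pack=/tmp/bp-logpack-demo
mkdir -p "$pack"

# Simulated log content; in practice, extract the downloaded log pack here.
printf '%s\n' \
  '2016-01-16T03:55:05Z INFO  starting plan' \
  '2016-01-16T03:55:06Z ERROR resource create failed: quota exceeded' \
  > "$pack/bpocore.log"

# List every error or exception line, with file name and line number.
grep -rniE 'error|exception' "$pack"
```

The case-insensitive, recursive search surfaces both logged errors and stack-trace exception lines in one pass.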
While Blue Planet log files provide considerable information about your system, monitor log file size to ensure the logs do not consume excessive disk space. Two strategies are available. On the Blue Planet host, install a scheduler job that monitors the Blue Planet logs directory and trims any script-log files that are older than a certain date.
Note: Do not delete log files if you want to retain them for troubleshooting.
An example scheduled-job command from the bpocore container follows:
find /bp2/<log_name>/plan-log/* -mtime +10 -type f -delete
From the host, an example scheduled-job command follows:
find /bp2/bpocore_*/log/plan-log/* -mtime +10 -type f -delete
where you substitute the current bpocore version in the path. This command finds all of the files (type f) in the plan-log sub-directory with a modification time older than 10 days (mtime +10) and deletes them.
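Before scheduling the deletion, you can dry-run the same find expression with -print in place of -delete to confirm exactly which files would be removed. The directory below is a scratch stand-in for the real plan-log path.

```shell
# Create a scratch directory that stands in for /bp2/bpocore_x.x.x/log/plan-log.
dir=/tmp/plan-log-demo
mkdir -p "$dir"

# One old file (modification time pushed back 15 days) and one recent file.
touch -d '15 days ago' "$dir/plan-script-old.log"
touch "$dir/plan-script-new.log"

# Dry run: -print shows which files -delete would remove (only the old one here).
find "$dir"/* -mtime +10 -type f -print

# Once satisfied, swap -print for -delete to actually remove them.
find "$dir"/* -mtime +10 -type f -delete
```

The recent file survives because its modification time is newer than the 10-day threshold.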
The following is an example of a plan-log file name that the scheduled-job command matches:
/bp2/bpocore_x.x.x/log/plan-log/plan-script-5699be7e-3bc2-4519-8662-f4d4fc5e0125-terminate-2016-01-16T03:55:05.104Z
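Plan-log file names encode the plan script ID, the operation, and a timestamp. A small shell sketch can pull those fields apart, which is handy when correlating a log file with a failed service; the parsing below assumes the naming pattern shown above, with a standard 36-character UUID.

```shell
# Example plan-log file name from above.
name='plan-script-5699be7e-3bc2-4519-8662-f4d4fc5e0125-terminate-2016-01-16T03:55:05.104Z'

# Strip the fixed prefix; the plan UUID is always 36 characters long.
rest=${name#plan-script-}
id=${rest:0:36}

# The remainder is "<operation>-<timestamp>"; the timestamp starts with a digit.
tail=${rest:37}
op=${tail%%-[0-9]*}
ts=${tail#"$op"-}

echo "plan id:   $id"
echo "operation: $op"
echo "timestamp: $ts"
```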
To create a crontab entry that runs the command as the root user, enter:
crontab -u root -e
Enter the command as a crontab entry. For example, to run it every day at 1:05 a.m., enter:
5 1 * * * find /bp2/bpocore_*/log/plan-log/* -mtime +10 -type f -delete
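If you prefer to script the installation rather than edit interactively, you can build the crontab from the command line. This sketch writes the combined crontab to a temporary file for review first; the review-file path is an arbitrary choice, and the final install step would need to run as root.

```shell
# The cron entry: minute hour day month weekday, then the command to run.
entry='5 1 * * * find /bp2/bpocore_*/log/plan-log/* -mtime +10 -type f -delete'

# Combine root's current crontab (if any) with the new entry into a review file.
{ crontab -u root -l 2>/dev/null || true; printf '%s\n' "$entry"; } > /tmp/root-crontab

cat /tmp/root-crontab

# After reviewing, install the file as root's crontab (commented out here
# so this sketch is non-destructive):
# crontab -u root /tmp/root-crontab
```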
Blue Planet core or platform solution logs
The system rotates these logs automatically on a schedule.
Creating an auto heal service
Auto heal provides automated recovery for OpenStack virtual machines (VMs) when their compute node suffers from a critical, service-affecting failure. In this scenario, auto heal moves each virtual machine to a healthy host, if such a host is available. Auto heal does not perform the move; rather, it delegates this operation to OpenStack by invoking its evacuate API.
This section describes:
About auto heal
The following describes the auto heal workflow in Blue Planet.
-
Deploy your OpenStack RA. See your installation guide or the RA Functional Reference Guide for details.
-
When you deploy the raopenstack_stautoheal solution, an imperative template with auto heal configured is automatically onboarded. It displays as an OpenStackAutohealManager product in the Blue Planet Orchestration domain.
-
Create a single OpenStackAutoheal service instance. A single instance of the service handles all OpenStack domains attached to the same Blue Planet Orchestration server.
-
Enable auto heal for any new VMs created by the OpenStack RA that require protection. Create VMs from the UI or from a custom service template. You must instantiate the auto heal service before creating the VMs for auto heal to protect them.
-
If the new VMs require protection, set the Enable auto heal property to true.
Configuring auto heal
To configure and activate auto heal in Blue Planet, perform the same steps as creating a service, keeping the following in mind:
-
When you create a new OpenStack project (using the OpenStack Horizon dashboard), assign a user with the admin role to that project; auto heal success depends on this.
-
Provide a value for the Timeout property, which specifies how long the service waits before reacting to a compute node failure. The Timeout value is added to the detection and reporting latencies already present in the system.
It might take OpenStack several minutes to detect that the compute node hypervisor is down, and several additional minutes for the RA to send the appropriate events to the policy monitor in Blue Planet Orchestration (because the hypervisor is a polled resource). Even with the timeout set to zero, the true reaction time might be up to six minutes; with the default timeout of five minutes, the actual reaction time for this use case is up to eleven minutes.
-
Ensure you create only one instance of the auto heal service. Only one service is required to handle compute node monitoring for all OpenStack domains managed by a single Blue Planet orchestrator. If you create a second instance, Blue Planet marks that instance as failed. The feature also depends on accompanying support in the OpenStack RA. The auto heal service creates a Policy Manager monitor that listens for alarms published to the Kafka bus by the RA when a compute node goes down.
-
When you create a virtual machine, ensure that you check the Enable auto heal checkbox. If you create a virtual machine from your own service template, set the auto heal property for that machine to true.
Note: Enable auto heal on a per-virtual-machine basis; the system performs recovery only for VMs whose auto heal property is set to true.
-
Auto heal leverages the OpenStack evacuate mechanism and is bound by OpenStack rules. One of these rules is that a virtual machine is rebuilt on a healthy compute node in the same availability zone as the failed node.
-
Ensure you comply with any other OpenStack evacuation prerequisites so that evacuations are successfully performed. See your OpenStack documentation about evacuating instances.
Note: The auto heal feature does not require that Ceilometer be part of the OpenStack deployment. The event raised when a compute node fails is generated by the RA and not by OpenStack.
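The timing discussion above reduces to simple arithmetic: the worst-case reaction time is the configured Timeout value plus the fixed detection and reporting latency, which in this scenario is up to about six minutes. A sketch with the numbers from this section:

```shell
# Detection + reporting latency already present in the system (minutes).
detect_latency=6

# Default value of the Timeout property (minutes).
timeout=5

# Worst-case reaction time is simply the sum of the two.
echo "default timeout:  $((timeout + detect_latency)) minutes"  # up to 11
echo "timeout set to 0: $((0 + detect_latency)) minutes"        # still up to 6
```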
Viewing auto heal alarm activity
To view auto heal alarms, select the Blue Planet logo and click Active Alarms on the Blue Planet Orchestration dashboard. For details on using the alarms viewer, see Viewing and managing alarms in the alarm viewer.
The compute node fail event triggers auto heal and raises an alarm visible in the alarms viewer. You can search for the alarm name event::compute.node.failed or the node hostname. These are the only visible indications that something has gone wrong with a compute node. Currently there is no list displayed of the virtual machines that were evacuated and their current states.
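If you prefer the command line, you can also search an extracted log pack for the same alarm name. The directory and file below are illustrative stand-ins, since the actual log pack layout depends on your deployment.

```shell
# Scratch stand-in for an extracted log pack (illustrative path and content).
mkdir -p /tmp/bp-alarm-demo
printf 'event::compute.node.failed host=compute-03\n' > /tmp/bp-alarm-demo/alarms.log

# Search for the compute node failure alarm by name.
grep -rn 'event::compute.node.failed' /tmp/bp-alarm-demo
```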
If the Alarms pane on the Blue Planet dashboard does not display alarms, you might need to deploy the alarms solution, as well as the optional logging solution on which it depends.
Service or resource changes performed outside of Blue Planet
If an underlying controller alters a service or resource that Blue Planet manages, the representation in Blue Planet can fall out of sync with the managed system, causing errors.