Mirantis OpenStack 8.0 Operations Guide
version 8.0
Contents
Preface
Intended Audience
Documentation History
Introduction
Accessing the shell on the nodes
Uploading Public Keys
SSH to the Fuel Master Node
SSH to target nodes
How To: Exclude some drives from RAID-1 array
How To: Modify Kernel Parameters
Using the Cobbler web UI to set kernel parameters
Using the dockerctl command to set kernel parameters
HowTo: Create an XFS disk partition
HowTo: Enable/Disable Galera Cluster Autorebuild Mechanism
HowTo: Backport Galera Pacemaker OCF script
HowTo: Backport Memcached backend fixes
HowTo: Backport RabbitMQ Pacemaker OCF script
HowTo: Manage OpenStack services
Adding, Redeploying, and Replacing Nodes
Redeploy a Non-Controller Node
Add a Non-Controller Node
Add a MongoDB node
Add a controller node
Remove a Controller node
Configuring an Operating System node
How To: Safely remove a Ceph OSD node
How To: Adjust Placement Groups when adding additional Ceph OSD node(s)
HowTo: Shut down the whole cluster
Starting up the cluster
Creating and Configuring ML2 Drivers for Neutron
Using YAML configuration files
Customizing Passwords
Adding new modules
Docker Containers and Dockerctl
Container types
Command reference
Basic usage
Dockerctl
System changes for Docker affecting Fuel 5.0 and later
Fuel Master architecture changes for Docker
Enable Experimental Features
Fuel Access Control
Managing your Ceph cluster
Accessing the Puppet manifest for Ceph
Verify the deployment
Missing OSD instances
Ceph pools
Test that Ceph works with OpenStack components
Glance
Cinder
Rados GW
Swift
Reset the Ceph cluster
S3 API in Ceph RADOS Gateway
Introduction
Getting started
User authentication
Migrate workloads from a compute node for maintenance
Disable virtual machine scheduling
Migrate instances
Monitor the migration process
Restore a compute node after maintenance
Maintenance Mode
Overview
Using the umm command
Configuring the UMM.conf file
Example of using MM on one node
Example of putting all nodes into the maintenance mode at the same time
Running vCenter
Nova-compute and vSphere clusters mapping
Performance Notes
Keystone Token Cleanup
HowTo: Backup and restore Fuel Master
Running the backup
Restoring Fuel Master
How slave nodes choose the interface to use for PXE booting
Horizon Deployment Notes
Overview
Details of Health Checks
Sanity tests description
Functional tests description
Network issues
HA tests description
Configuration tests description
Cloud validation tests description
Notes on Corosync and Pacemaker
Troubleshooting
Logs and messages
Screen notifications
Viewing logs through Fuel
Viewing the Fuel Master node logs
Viewing logs for target nodes ("Other servers")
syslog
/var/log
atop logs
Preface
This documentation provides information on how to use Fuel to deploy OpenStack environments. The
information is for reference purposes and is subject to change.
Intended Audience
This documentation is intended for OpenStack administrators and developers; it assumes that you have
experience with network and cloud concepts.
Documentation History
The following table lists the released revisions of this documentation:
Introduction
This is a collection of useful procedures for using and managing your Mirantis OpenStack environment. The
information given here supplements the information in:
1. Generate an SSH key pair if you do not already have one:
ssh-keygen -t rsa
2. Paste the public key to the Public key field. Fuel uploads the public key to each Fuel Slave node it deploys.
However, the key is not uploaded to the Fuel Slave nodes that have been already deployed.
3. To upload the SSH key to the Fuel Master node or any deployed node, use the following command
sequence:
ssh-agent
ssh-copy-id -i .ssh/id_rsa.pub root@<ip-addr>
<ip-addr> is the IP address for the Fuel Master node, which is the same IP address you use to access the Fuel
console.
You can use this same command to add a public key to a deployed target node. See SSH to target nodes for
information about getting the <ip-addr> values for the target nodes.
You can instead add the content of your key (stored in the .ssh/id_rsa.pub file) to the node's
/root/.ssh/authorized_keys file.
You can ssh to any of the nodes using the IP address. For example, to ssh to the Cinder node:
ssh 10.110.0.4
You can also use the "id" shown in the first column, for example:
ssh node-6
- size: 300
type: boot
- file_system: ext2
mount: /boot
name: Boot
size: 200
type: raid
3. Run deployment
• Use the Fuel Welcome screen to define kernel parameters that will be set for the Fuel Master node when it
is installed.
• Use the Initial parameters field on the Settings tab to define kernel parameters that Fuel will set on the
target nodes. This only affects target nodes that will be deployed in the future, not those that have already
been deployed.
• Use the Cobbler web UI (see Using the Cobbler web UI to set kernel parameters) to change kernel parameters
for all nodes or for specific nodes. The nodes appear in Cobbler only after they are deployed; to
change parameters before deployment, stop the deployment, change the parameters, then proceed.
• Issue the dockerctl command on each node where you want to set kernel parameters; see Using the dockerctl
command to set kernel parameters.
Any kernel parameter supported by Ubuntu can be set for the target nodes and the Fuel Master node.
• Use the https://<ip-addr>/cobbler_web URL to access the Cobbler web UI; replace <ip-addr> with the IP
address for your Fuel Master Node.
• Log in, using the user name and password defined in the cobbler section of the /etc/fuel/astute.yaml file.
• Select Systems from the menu on the right. This lists the nodes that are deployed in your environment.
Select the node(s) for which you want to set new parameters and click "Edit". The following screen is
displayed:
• Add the kernel parameters and values to the Kernel Options (Post-install) field, then click the Save button.
Note
Replace /dev/sdb with the appropriate block device you wish to configure.
fdisk /dev/sdb
n (for new)
p (for partition)
<enter> (to accept the defaults)
<enter> (to accept the defaults)
w (to save changes)
3. For a standard swift install, all data drives are mounted directly under /srv/node, so first create the mount
point
mkdir -p /srv/node/sdb1
4. Finally, add the new partition to fstab so it mounts automatically, then mount all current partitions
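A minimal sketch of this step, assuming the new partition is /dev/sdb1 and that an XFS filesystem still has to be created on it (the mount options shown are illustrative only):
mkfs.xfs -f /dev/sdb1
echo "/dev/sdb1 /srv/node/sdb1 xfs noatime,nodiratime 0 0" >> /etc/fstab
mount -a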
• The OCF Galera script checks every node in the Galera Cluster for the SEQNO position. This allows it to
find the node with the most recent data.
• The script checks the status of the current node; if it is synchronized with the quorum, the
procedure stops. Otherwise, the SEQNO is obtained and stored in the Corosync CIB as a variable.
• The script sleeps for 300 seconds, allowing other nodes to join the Corosync quorum and push their
UUIDs and SEQNOs, too.
• For every node in the quorum, the script compares the UUID and SEQNO. If at least one node has a
higher SEQNO, it bootstraps the node as the Primary Component, allowing other nodes to join the
newly formed cluster later;
• The Primary Component node is started with the --wsrep-new-cluster option, forming a new quorum.
To disable the autorebuild feature, run:
To check the GTID and SEQNO values saved in the Corosync CIB for all nodes, run:
To remove all GTIDs and SEQNOs from the Corosync CIB and allow the OCF script to reread the data from the
grastate.dat file, run:
Warning
Before performing any operations with Galera, you should schedule the maintenance window, perform
backups of all databases, and stop all MySQL related services.
crm_mon -1
3. Download the latest OCF script from the fuel-library repository to the Fuel Master node:
wget --no-check-certificate -O \
/etc/puppet/modules/galera/files/ocf/mysql-wss \
https://raw.githubusercontent.com/stackforge/fuel-library/master/deployment/puppet/galera/files/ocf/mysql-wss
4. The OCF script requires some modification as it was originally designed for MySQL 5.6:
6. Configure the p_mysql resource for the new Galera OCF script:
Note
During this operation, the MySQL/Galera cluster will be restarted. This may take up to 5 minutes.
8. The IP addresses of all nodes should be present in the Galera cluster. This guarantees that all nodes participate
in cluster operations and act properly.
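A minimal way to verify this, assuming the MySQL client on a controller node can authenticate (for example via /root/.my.cnf), is to query the Galera status variables:
mysql -e "SHOW STATUS LIKE 'wsrep_incoming_addresses';"
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"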
Warning
Before performing any operations with Keystone, you should schedule a maintenance window, perform
backups of Keystone configuration files, and stop all Keystone related services.
1. Download the related fixes for puppet modules from the fuel-library repository to the Fuel Master node:
Note
This step assumes the environment id is "1" and the controller node names have the standard Fuel
notation, like "node-1", "node-42", and so on.
3. Update the /etc/keystone/keystone.conf configuration file on all of the controller nodes as follows:
[revoke]
driver = keystone.contrib.revoke.backends.sql.Revoke
[cache]
memcache_dead_retry = 30
memcache_socket_timeout = 1
memcache_pool_maxsize = 1000
[token]
driver = keystone.token.persistence.backends.memcache_pool.Token
Note
The OCF script in the Fuel 6.1 release also distributes and ensures a consistent Erlang cookie file
among all the controller nodes. For backports to older Fuel versions, this feature is disabled by default
in the OCF script. If you want to enable it, carefully read the details below.
Warning
Before performing any operations with RabbitMQ, you should schedule a maintenance window,
perform backups of all RabbitMQ mnesia files and OCF scripts, and stop all OpenStack services on all
environment nodes; see HowTo: Manage OpenStack services for details.
Mnesia files are located at /var/lib/rabbitmq/mnesia/ and OCF files can be found at
/usr/lib/ocf/resource.d/mirantis/.
2. Inside the maintenance window, put the p_rabbitmq-server primitive into the unmanaged state on one of the
controller nodes:
Note
Normally, the crm tool can be installed from the crmsh package with the following commands:
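The installation commands themselves are not reproduced here; a sketch, assuming the distribution repositories provide the package:
apt-get install crmsh   (Ubuntu)
yum install crmsh       (CentOS)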
crm_mon -1
4. Download the latest OCF script from the fuel-library repository to the Fuel Master node:
Note
For the Fuel 5.1 release, update the link to use a "5.1" version in the download path.
Note
This step assumes the environment id is "1" and the controller node names have the standard Fuel
notation, like "node-1", "node-42", and so on.
6. Update the configuration of the p_rabbitmq-server resource for the new RabbitMQ OCF script at any
controller node:
or in an XML notation:
<operations>
  <op id="p_rabbitmq-server-monitor-30" interval="30" name="monitor" timeout="60"/>
  <op id="p_rabbitmq-server-monitor-27" interval="27" name="monitor" role="Master" timeout="60"/>
  <op id="p_rabbitmq-server-start-0" interval="0" name="start" timeout="60"/>
  <op id="p_rabbitmq-server-stop-0" interval="0" name="stop" timeout="60"/>
  <op id="p_rabbitmq-server-promote-0" interval="0" name="promote" timeout="120"/>
  <op id="p_rabbitmq-server-demote-0" interval="0" name="demote" timeout="60"/>
  <op id="p_rabbitmq-server-notify-0" interval="0" name="notify" timeout="60"/>
</operations>
<instance_attributes id="p_rabbitmq-server-instance_attributes">
  <nvpair id="p_rabbitmq-server-instance_attributes-node_port" name="node_port" value="5673"/>
</instance_attributes>
<meta_attributes id="p_rabbitmq-server-meta_attributes">
  <nvpair id="p_rabbitmq-server-meta_attributes-migration-threshold" name="migration-threshold" value="INFINITY"/>
  <nvpair id="p_rabbitmq-server-meta_attributes-failure-timeout" name="failure-timeout" value="60s"/>
</meta_attributes>
</primitive>
#vim:set syntax=pcmk
Note
The command_timeout parameter value is given for Ubuntu OS.
<nvpair id="p_rabbitmq-server-instance_attributes-some_param" \
name="some_param" value="some_value"/>
Note
If you want to allow the OCF script to manage the Erlang cookie files, provide the existing
Erlang cookie from /var/lib/rabbitmq/.erlang.cookie as the erlang_cookie parameter;
otherwise, set this parameter to false. Note that a different Erlang cookie also requires
erasing the mnesia files on all controller nodes.
Warning
Erasing the mnesia files will also erase all custom users, vhosts, queues, and other
RabbitMQ entities, if any.
Note
Ignore messages like "Error: Unable to find operation matching:"
Note
You cannot add resource attributes with the pcs tool; install the crmsh package and use the crm tool
to update the command_timeout and erlang_cookie parameters. See the details above.
The output may also be in XML notation and look similar to the following:
Note
During this operation, the RabbitMQ cluster will be restarted. This may take from 1 to 20
minutes. If there are any issues, see crm - Cluster Resource Manager.
rabbitmqctl cluster_status
rabbitmqctl list_users
services=$(curl http://git.openstack.org/cgit/openstack/governance/plain/reference/projects.yaml | \
egrep -v 'Security|Documentation|Infrastructure' | \
perl -n -e'/^(\w+):$/ && print "openstack-",lc $1,".*\$|",lc $1,".*\$|"')
Now you can start, stop, or restart the OpenStack services, see details about recommended order below.
Warning
Fuel configures some services, like Neutron agents, Heat engine, Ceilometer agents, to be managed
by Pacemaker instead of generic init scripts. These services should be managed only with the pcs or
crm tools!
To find out which services are managed by Pacemaker, first list the disabled or not-running
services with the following command:
On Ubuntu:
Next, you should inspect the output of command pcs resource (or crm resource list) and find the
corresponding services listed, if any.
The Pacemaker resources list is:
Stack: corosync
Current DC: node-1.domain.tld (1) - partition with quorum
Version: 1.1.12-561c4cf
3 Nodes configured
43 Resources configured
You may notice that only the heat-engine service is managed by Pacemaker and disabled in the OS. On
any controller node, use the following command to start or stop it cluster-wide:
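A sketch only; check pcs resource (or crm resource list) first, since the actual resource name for heat-engine varies by release and is shown here as a placeholder:
pcs resource disable <heat-engine-resource-name>
pcs resource enable <heat-engine-resource-name>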
Note
Use pcs or crm tools for corresponding services, when managed by Pacemaker
• Start/stop/restart the remaining OpenStack services on each Controller and Storage node, in any order.
Note
Use pcs or crm tools for corresponding services, when managed by Pacemaker
1. Use live migration to move instances from the Compute nodes you are going to redeploy.
2. If appropriate, back up or copy information from the Operating System nodes being redeployed.
3. Remove the node from the environment: select the node(s) to be deleted, then click the "Delete Nodes" button.
4. Click on the "Deploy Changes" button on that screen.
5. Wait for the node to become available as an unallocated node.
6. Use the same Fuel screen to assign an appropriate role to each node being redeployed.
7. Click on the "Deploy Changes" button.
8. Wait for the environment to be deployed.
After redeploying an Operating System node, you will have to manually apply any configuration changes you
made and reinstall the software that was running on the node or restore the system from the backup you made
before redeploying the node.
Additional MongoDB roles can be added to an existing deployment by using shell commands. Any number of
MongoDB roles (or standalone nodes) can be deployed into an OpenStack environment using the Fuel Web UI
during the initial deployment but you cannot use the Fuel Web UI to add MongoDB nodes to an existing
environment.
Fuel installs MongoDB as a backend for Ceilometer. Ideally, you should configure one MongoDB node for each
Controller node in the environment so, if you add Controller nodes, you should also add MongoDB nodes.
To add one or more MongoDB nodes to the environment:
1. Add an entry for each new MongoDB node to the connection parameter in the ceilometer.conf file on
each Controller node. This entry needs to specify the new node's IP address for the Management logical
network.
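For illustration only, such an entry in the [database] section of ceilometer.conf typically looks like the following; the IP addresses and password here are placeholders:
connection = mongodb://ceilometer:<password>@192.168.0.5:27017,192.168.0.6:27017/ceilometer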
2. Open the astute.yaml file on any deployed MongoDB node and determine which node has the
primary-mongo role. Write down the value of the fqdn parameter that you will use to connect to this
node.
For more information, see: MongoDB nodes configuration
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/file-ref/astute-yaml-target.html> in the
Fuel User Guide
3. Retrieve the db_password value from the Ceilometer configuration section in the astute.yaml file. You
will use this password to access the primary MongoDB node.
For more information, see: Ceilometer configuration
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/file-ref/astute-yaml-target.html> in the
Fuel User Guide.
4. Connect to the MongoDB node that has the primary-mongo role and log into Mongo:
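The connection command is elided above; one possible form, using the fqdn and db_password values collected in the previous steps:
mongo <fqdn>:27017/admin -u admin -p <db_password>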
• The OpenStack environment has only one controller node. To make the environment highly available, add at
least two additional controller nodes.
• The resources of your existing controller nodes are being exhausted and you want to supplement them.
• A controller node has failed and requires replacement. In this case, you must first remove the failed
controller node as described in Remove a Controller node.
Note
Each OpenStack environment must include an odd number of controller nodes. For example, you can
deploy one, three, or five controller nodes, and so on.
1. Remove the controller(s) from the environment by going to the Nodes tab in the Fuel web UI, selecting the node(s)
to be deleted, and clicking the "Delete Nodes" button.
Puppet removes the controller(s) from the configuration files and re-triggers the services.
2. Physically remove the controller from the configuration.
• Create file systems on partitions you created and populate the fstab file so they will mount automatically.
• Configure additional logical networks you need; Fuel only configures the Admin/PXE network.
• Set up any monitoring facilities you want to use, such as monit and atop; configure syslog to send error
messages to a centralized syslog server.
• Tune kernel resources to optimize performance for the particular applications you plan to run here.
You are pretty much free to install and configure this node any way you like. By default, all the repositories from
the Fuel Master node are configured so you can install packages from these repositories by running the apt-get
install <package-name> command. You can also use scp to copy other software packages to this node and then
install them using apt-get or yum.
1. Determine which OSD processes are running on the target node (node-35 in this case):
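The command and its output are not reproduced above; two possible ways to obtain this information (a sketch):
ps aux | grep ceph-osd
ceph osd tree
The first is run on the target node itself; the second can be run from any node with an admin keyring and lists OSDs per host.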
From this output we can see that OSDs 0 and 1 are running on this node.
2. Remove the OSDs from the Ceph cluster:
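The removal commands are elided above; a minimal sketch for OSDs 0 and 1, marking them out so that their data is migrated to other OSDs:
ceph osd out 0
ceph osd out 1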
This will trigger a rebalance. Placement groups will be moved to other OSDs. Once that has completed we
can finish removing the OSD. This process can take minutes or hours depending on the amount of data to
be rebalanced. While the rebalance is in progress the cluster state will look something like:
# ceph -s
cluster 7fb97281-5014-4a39-91a5-918d525f25a9
health HEALTH_WARN recovery 2/20 objects degraded (10.000%)
monmap e1: 1 mons at {node-33=10.108.2.4:6789/0}, election epoch 1, quorum 0 node-33
osdmap e172: 7 osds: 7 up, 5 in
pgmap v803: 960 pgs, 6 pools, 4012 MB data, 10 objects
10679 MB used, 236 GB / 247 GB avail
2/20 objects degraded (10.000%)
1 active
959 active+clean
# ceph -s
cluster 7fb97281-5014-4a39-91a5-918d525f25a9
health HEALTH_OK
monmap e1: 1 mons at {node-33=10.108.2.4:6789/0}, election epoch 1, quorum 0 node-33
osdmap e172: 7 osds: 7 up, 5 in
pgmap v804: 960 pgs, 6 pools, 4012 MB data, 10 objects
When the cluster is in state HEALTH_OK the OSD(s) can be removed from the CRUSH map:
On Ubuntu hosts:
# stop ceph-osd id=0
After all OSDs have been deleted the host can be removed from the CRUSH map:
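The CRUSH-related commands are elided above; a hedged sketch for osd.0 and the host node-35 (repeat the OSD commands for every OSD that was on the node):
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0
ceph osd crush remove node-35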
3. This node can now be deleted from the environment with Fuel.
Caveats:
• Ensure that at least as many hosts as the replica count remain in the cluster. If the replica count is 3, there should
always be at least 3 hosts in the cluster.
• Do not let the cluster reach its full ratio. By default, when the cluster reaches 95% utilization, Ceph
prevents clients from writing data to it. See the Ceph documentation for additional details.
1. Determine the current values of pg_num and pgp_num for each pool that will be touched. The pools that
may need to be adjusted are the 'backups', 'images', 'volumes', and 'compute' pools.
First, get the list of pools that are currently configured. You can then query each pool individually to see its
current pg_num and pgp_num values.
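A minimal sketch of these queries, using the volumes pool as an example:
ceph osd lspools
ceph osd pool get volumes pg_num
ceph osd pool get volumes pgp_num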
2. Calculate the correct value for pg_num and pgp_num that should be used based on the number of OSDs the
cluster has. See the Ceph Placement Groups documentation for additional details on how to properly
calculate these values. Ceph.com has a Ceph PGs Per Pool Calculator that can be helpful in this calculation.
3. Adjust the pg_num and pgp_num for each of the pools as necessary.
Note that this causes a cluster rebalance, which may have performance impacts on
services consuming Ceph.
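A sketch of the adjustment for a single pool; the value 512 is purely illustrative (use the number calculated in the previous step), and pg_num must be raised before pgp_num:
ceph osd pool set volumes pg_num 512
ceph osd pool set volumes pgp_num 512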
Caveats:
• It is also not advisable to set pg_num and pgp_num to large values unless necessary, as this has an
impact on the amount of resources required. See the Choosing the Number of Placement Groups
documentation for additional details.
• Computes
• Controllers (one by one or all in one)
• Cinder/Ceph/Swift
• Other/Neutron (if separate Neutron node exists)
quantum_settings:
  server:
    service_plugins: 'neutron.services.l3_router.l3_router_plugin.L3RouterPlugin,
      neutron.services.firewall.fwaas_plugin.FirewallPlugin,
      neutron.services.metering.metering_plugin.MeteringPlugin'
  L2:
    provider: 'ml2'
    mechanism_drivers: 'openvswitch'
    type_drivers: "local,flat,l2[:segmentation_type]"
    tenant_network_types: "local,flat,l2[:segmentation_type]"
    flat_networks: '*'
    segmentation_type: 'vlan'
    tunnel_types: l2[:segmentation_type]
    tunnel_id_ranges: l2[:tunnel_id_ranges]
    vxlan_group: 'None'
    vni_ranges: l2[:tunnel_id_ranges]
• The following values should be set only if L2[enable_tunneling] is true: tunnel_types, tunnel_id_ranges,
vxlan_group, vni_ranges.
• The l2[:item] settings refer to values that are already in the quantum_settings.
• This only shows new items that are related to ml2 configuration. The values shown are the defaults that are
used if no other value is set.
• All Neutron component packages are available to download.
Warning
Be very careful when modifying the configuration files. A simple typo when editing these files may
severely damage your environment. When you modify the YAML files, you will receive a warning that
some attributes were modified from the outside. Some features may become inaccessible from the UI
after you do this.
To do this:
where --env 1 points to the specific environment (id=1 in this example).
To dump deployment information, the command is:
For a full list of configuration types that can be dumped, see the Fuel section.
These commands create directories called provisioning_1 or deployment_1, which include a number of
YAML files that correspond to the roles currently assigned to nodes. Each file includes the parameters for the
current role, so you can freely modify and save them.
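For reference, hedged sketches of the dump commands; treat the exact flags as an assumption, since they differ slightly between Fuel client versions:
fuel --env 1 provisioning --default
fuel --env 1 deployment --default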
Customizing Passwords
A few service passwords are not managed by the Keystone based access control facility. You may want to set your
own passwords in some cases.
You can edit files and modify password values. For example, you can set the MySQL root password in this block:
"mysql": {
"root_password": "mynewpassword"
},
packages/
packages/manifests
packages/manifests/init.pp
class profile {
$tools = $::fuel_settings['tools']
package { $tools :
ensure => installed,
}
}
"tools": [
  "htop",
  "tmux"
]
Provisioned nodes will have this addition in their parameters and our 'profile' module will be able to access
their values and install the given list of packages during node deployment.
4. Upload the modified configuration:
You can also use the --dir option to set a directory from which to load the parameters.
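A hedged sketch of the upload, under the same assumption about the Fuel client flags:
fuel --env 1 deployment --upload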
• Parameters that are about to be sent to the orchestrator are replaced completely with the ones you
specified.
• The cluster sets the is_customized flag, which is checked by the UI so you will get a message about
attributes customization.
Container types
Application Container
An application container is the most common type of container. It usually runs a single process in the
foreground and writes logs to stdout/stderr. Docker traps these logs and saves them automatically in its
database.
Storage Container
A storage container is a minimalistic container that runs Busybox and shares one or more
directories. It needs to run only once and then spends the majority of its existence in the Exited state.
Command reference
Below is a list of commands that are useful when managing Docker containers on the Fuel Master node.
Basic usage
Get a list of available commands:
docker help
List running containers:
docker ps
List all containers, including stopped ones:
docker ps -a
Note
The storage containers used for sharing files among application containers are usually in the Exited state.
Exited state means that the container exists, but no processes inside it are running.
Example: The command below creates a temporary postgres container that is ephemeral and not tied to any other
containers. This is useful for testing without impacting production containers.
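The command itself is not shown above; a minimal sketch (the image name is illustrative):
docker run --rm -i -t postgres /bin/bash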
Load a Docker image in one of the following formats: .tar, .tar.gz, .tar.xz. lrz is not supported.
Save a Docker image to a file:
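Hedged examples of both operations, with placeholder file and image names:
docker load -i /tmp/my_image.tar
docker save -o /tmp/my_image.tar <image-name>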
Dockerctl
Build and run storage containers, then run application containers:
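On the Fuel Master node this is normally done with the following command (a sketch):
dockerctl build all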
Note
This can take a few minutes, depending on your hardware.
Launch a container from its image with the necessary options. If the container already exists, this ensures that the
container is in a running state:
Optionally, the --attach option can be used to monitor the process and view its stdout and stderr.
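A sketch, with the application name as a placeholder:
dockerctl start <appname> --attach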
Display the entire container log for /app/. Useful for troubleshooting:
Note
The container must be running first in order to use this feature. Additionally, quotes must be escaped if
your command requires them.
Note
This is not reversible, so use with caution.
dockerctl list
• Starting with Fuel 6.1, Docker containers with host networking are used. This means that dhcrelay is not
used anymore because cobbler/dnsmasq are bound to the host system.
• Application logs are inside /var/log/docker-logs, including astute, nailgun, cobbler, and others.
• Supervisord configuration is located inside /etc/supervisord.d/(CurrentRelease)/
• Containers are automatically restarted by supervisord. If you need to stop a container for any reason, first
run supervisorctl stop /app/, and then dockerctl stop /app/
VERSION:
...
feature_groups:
- mirantis
- experimental
For more details about configuring the Nailgun settings see Fuel Developer documentation
<http://docs.openstack.org/developer/fuel-docs/>.
Alternatively, you can build a custom ISO with the experimental features enabled:
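A hedged sketch of such a build, assuming the fuel-main build system is used:
make FEATURE_GROUPS=experimental iso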
The default value of <your_master_ip> is 10.20.0.2. The port number, 35357, never changes.
The <admin_token> is stored in the /etc/fuel/astute.yaml file on the Fuel Master node.
To run or disable authentication, modify /etc/nailgun/settings.yaml (AUTHENTICATION_METHOD) in the Nailgun
container.
All endpoints except the agent updates and version endpoint are protected by an authentication token, obtained
from Keystone by logging into Fuel as the admin user with the appropriate password.
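For illustration, a token can be requested directly from the Fuel Keystone with a standard v2.0 API call; the IP, port, and credentials below are placeholders:
curl -s -X POST http://<your_master_ip>:5000/v2.0/tokens \
  -H 'Content-Type: application/json' \
  -d '{"auth": {"tenantName": "admin", "passwordCredentials": {"username": "admin", "password": "<password>"}}}'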
Services such as Astute, Cobbler, Postgres, MCollective, and Keystone, which used to be protected with the default
password, are now each protected by a user/password pair that is unique for each Fuel installation.
Beginning with release 6.0, the Nailgun and OSTF service endpoints are added to Keystone, and it is now possible
to use the Keystone service catalog to obtain the URLs of those services instead of hardcoding them.
Fuel Authentication is implemented by a dedicated Keystone instance that Fuel installs in a new docker container
on the Fuel Master.
• Fuel Menu generates passwords for fresh installations; the upgrade script generates passwords when
upgrading. The password is stored in the Keystone database.
• The nailgun and ostf users are created in the services project with admin roles. They are used to authenticate
requests in middleware, rather than requiring that each request by middleware be validated using the
Keystone admin token as was done in Release 5.1.
• Some Nailgun URLs are not protected; they are defined in nailgun/middleware/keystone.py in the public_url
section.
• The authentication token does not expire for 24 hours so it is not necessary to store the username and
password in the browser cache.
• A cron script runs daily in the Keystone container to delete outdated tokens using the keystone-manage
token_flush command. It can be seen using the crontab -l command in the Keystone container.
• Support for storing authentication token in cookies is added in releases 5.1.1 and later; this allows the API
to be tested from the browser.
• The keystonemiddleware python package replaces the deprecated keystoneclient.middleware package; this
is an internal change that makes the implementation more stable. All recent fixes and changes are made to
keystonemiddleware, which was extracted from keystoneclient.middleware in earlier releases.
Beginning with releases 5.1.1 and later, the user must supply a password when upgrading Fuel from an earlier
release. This password can be supplied on the command line when running the installation script or in response
to the prompt (this is the same password that is used to access Fuel UI).
node 'default' {
...
}
The class section configures components for all Ceph-OSD nodes in the environment:
class { 'ceph::deploy':
auth_supported => 'cephx',
osd_journal_size => '2048',
osd_mkfs_type => 'xfs',
}
You can modify the authentication type, the journal size (specified in MB), and the filesystem type to use.
root@fuel-ceph-02:~# ceph -s
health HEALTH_OK
monmap e1: 2 mons at {fuel-ceph-01=10.0.0.253:6789/0,fuel-ceph-02=10.0.0.252:6789/0}, el
osdmap e23: 4 osds: 4 up, 4 in
pgmap v275: 448 pgs: 448 active+clean; 9518 bytes data, 141 MB used, 28486 MB / 28627 MB
mdsmap e4: 1/1/1 up {0=fuel-ceph-02.local.try=up:active}
root@fuel-ceph-01:~# ceph -s
health HEALTH_WARN 63 pgs peering; 54 pgs stuck inactive; 208 pgs stuck unclean; recovery 2
...
• Check the links in /root/ceph*.keyring. There should be one for each of the admin, osd, and mon roles that are
configured. If any are missing, this may be the cause of the error. To correct this, use soft links:
Place the downloaded keys in /etc/ceph/. Remove the original files from /root and create symlinks with the
same names in /root, pointing to the actual files in /etc/ceph/.
• Try to run the following command:
If this lists a python process running for ceph-create-keys, it usually indicates that the Ceph-MON processes
are unable to communicate with each other.
• Check the network and firewall for each Ceph-MON. Ceph-MON defaults to port 6789.
• If public_network is defined in the ceph.conf file, mon_host and DNS names must be inside the
public_network or ceph-deploy does not create the Ceph-MON processes.
Ceph pools
To see which Ceph pools have been created, use the ceph osd lspools command:
By default, two pools -- images and volumes -- are created. In this case, we also have the data, metadata, and rbd pools.
source ~/openrc
glance image-create --name cirros --container-format bare \
--disk-format qcow2 --is-public yes --location \
https://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-disk.img
+------------------+--------------------------------------+
| Property | Value |
+------------------+--------------------------------------+
| checksum | None |
| container_format | bare |
| created_at | 2013-08-22T19:54:28 |
| deleted | False |
| deleted_at | None |
| disk_format | qcow2 |
| id | f52fb13e-29cf-4a2f-8ccf-a170954907b8 |
| is_public | True |
| min_disk | 0 |
| min_ram | 0 |
| name | cirros |
| owner | baa3187b7df94d9ea5a8a14008fa62f5 |
| protected | False |
| size | 0 |
| status | active |
| updated_at | 2013-08-22T19:54:30 |
+------------------+--------------------------------------+
rbd ls images
rados -p images df
Cinder
Create a small volume and see if it is saved in Cinder:
source openrc
cinder create 1
This will instruct Cinder to create a 1 GB volume. This should return something similar to the following:
+=====================+======================================+
| Property | Value |
+=====================+======================================+
| attachments | [] |
+---------------------+--------------------------------------+
| availability_zone | nova |
+---------------------+--------------------------------------+
| bootable | false |
+---------------------+--------------------------------------+
| created_at | 2013-08-30T00:01:39.011655 |
+---------------------+--------------------------------------+
| display_description | None |
+---------------------+--------------------------------------+
| display_name | None |
+---------------------+--------------------------------------+
| id | 78bf2750-e99c-4c52-b5ca-09764af367b5 |
+---------------------+--------------------------------------+
| metadata | {} |
+---------------------+--------------------------------------+
| size | 1 |
+---------------------+--------------------------------------+
| snapshot_id | None |
+---------------------+--------------------------------------+
| source_volid | None |
+---------------------+--------------------------------------+
| status | creating |
+---------------------+--------------------------------------+
| volume_type | None |
+---------------------+--------------------------------------+
Check the status of the image using its id with the cinder show <id> command:
+==============================+======================================+
| Property | Value |
+==============================+======================================+
| attachments | [] |
+------------------------------+--------------------------------------+
| availability_zone | nova |
+------------------------------+--------------------------------------+
| bootable | false |
+------------------------------+--------------------------------------+
| created_at | 2013-08-30T00:01:39.000000 |
+------------------------------+--------------------------------------+
| display_description | None |
+------------------------------+--------------------------------------+
| display_name | None |
+------------------------------+--------------------------------------+
| id | 78bf2750-e99c-4c52-b5ca-09764af367b5 |
+------------------------------+--------------------------------------+
| metadata | {} |
+------------------------------+--------------------------------------+
| os-vol-host-attr:host | controller-19.domain.tld |
+------------------------------+--------------------------------------+
| os-vol-tenant-attr:tenant_id | b11a96140e8e4522b81b0b58db6874b0 |
+------------------------------+--------------------------------------+
| size | 1 |
+------------------------------+--------------------------------------+
| snapshot_id | None |
+------------------------------+--------------------------------------+
| source_volid | None |
+------------------------------+--------------------------------------+
| status | available |
+------------------------------+--------------------------------------+
| volume_type | None |
+------------------------------+--------------------------------------+
If the image shows status available, it was successfully created in Ceph. You can check this with the rbd ls volumes
command.
rbd ls volumes
volume-78bf2750-e99c-4c52-b5ca-09764af367b5
Rados GW
First confirm that the cluster is HEALTH_OK using ceph -s or ceph health detail. If the cluster is not healthy most of
these tests will not function.
Note
RedHat distros: mod_fastcgi's /etc/httpd/conf.d/fastcgi.conf must have FastCgiWrapper Off or rados calls
will return 500 errors.
Rados GW relies on the radosgw (Debian) or ceph-radosgw (RHEL) service to run and create a socket for the webserver's
script service to talk to. If the radosgw service is not running, or does not stay running, you need to inspect it
more closely.
The service script for radosgw might exit 0 without actually starting the service. An easy way to test this is to simply run
service ceph-radosgw restart; if the service script cannot stop the service, it was not running in the first place.
You can also check whether the radosgw service is running with the ps axu | grep radosgw command, but this
might also show the webserver's script server processes.
Most commands from radosgw-admin will work regardless of whether the radosgw service is running or not.
Swift
Create a new user:
Swift authentication works with subusers. In OpenStack this will be tenant:user, so you need to mimic it:
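The radosgw-admin commands themselves are not reproduced above; a hedged sketch that matches the user shown in the output below (the JSON corresponds to the state after the subuser has been created, before a Swift key is generated):
radosgw-admin user create --uid=test --display-name="username" --email=username@domain.com
radosgw-admin subuser create --uid=test --subuser=test:swift --access=full
radosgw-admin key create --subuser=test:swift --key-type=swift --gen-secret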
{ "user_id": "test",
"display_name": "username",
"email": "username@domain.com",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [
{ "id": "test:swift",
"permissions": "full-control"}],
"keys": [
{ "user": "test",
"access_key": "CVMC8OX9EMBRE2F5GA8C",
"secret_key": "P3H4Ilv8Lhx0srz8ALO\/7udwkJd6raIz11s71FIV"}],
"swift_keys": [],
"caps": []}
Note
--gen-secret is required in Cuttlefish and newer.
Getting started
It is assumed that you possess a working Ceph cluster and that radosgw is able to access the cluster.
If the cephx security system is used, which is the default scenario, radosgw and the cluster must authenticate
to each other. Please note this is not related to any user-layer authentication mechanism used in radosgw, such as
Keystone, TempURL, or TempAuth. If radosgw is deployed with Fuel, cephx should work out of the box. In the case of
a manual deployment, the official documentation will be helpful.
To enable or just verify whether S3 has been properly configured, the configuration file used by radosgw (usually
/etc/ceph/ceph.conf) should be inspected. Please take a look at radosgw's section (usually client.radosgw.gateway)
and consider the following options:
User authentication
The component providing S3 API implementation inside radosgw actually supports two methods of user
authentication: Keystone-based and RADOS-based (internal). Each of them may be separately enabled or disabled
with an appropriate configuration option. The first one takes precedence over the second. That is, if both
methods are enabled and Keystone authentication fails for any reason (wrong credentials, connectivity problems,
etc.), the RADOS-based method is used as a fallback.
Keystone-based
Configuration
Keystone authentication for S3 is not enabled by default, even when using Fuel. Please take a look at the appropriate
section (usually client.radosgw.gateway) in radosgw's configuration file and consider the following options:
Option name: rgw_s3_auth_use_keystone
  Default value in rgw: false
  Comment: must be present and set to true
  Present in Fuel-deployed configuration: no
Option name: rgw_keystone_url
  Default value in rgw: empty
  Comment: must be present and set to point to the admin interface of Keystone (usually port 35357; some versions of Fuel wrongly set the port to 5000)
  Present in Fuel-deployed configuration: yes
Option name: rgw_keystone_admin_token
  Default value in rgw: empty
  Comment: must be present and match the admin token set in the Keystone configuration
  Present in Fuel-deployed configuration: yes
Option name: rgw_keystone_accepted_roles
  Default value in rgw: Member, admin
  Comment: should correspond to the Keystone schema; the Fuel setting seems to be OK
  Present in Fuel-deployed configuration: yes
In case of using Keystone-based authentication, user management is fully delegated to Keystone. You may use
the keystone CLI command to do that. Please be aware that the EC2/S3 <AccessKeyId>:<secret> credentials pair does not
map well into Keystone (there is no direct way to specify a tenant), so a special compatibility layer has been
introduced. Practically, it means you need to tell Keystone about the mapping parameters manually. For example:
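The command is elided above; a sketch that matches the output shown below (the IDs are the ones from that output and will differ in your environment):
keystone ec2-credentials-create --user-id 2ccdd07ae153484296d308eab10c85dd --tenant-id 68a23e70b5854263ab64f2ddc16c2a38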
WARNING: Bypassing authentication using a token & endpoint (authentication credentials are
+-----------+----------------------------------+
| Property | Value |
+-----------+----------------------------------+
| access | 3862b51ecc6a43a78ffca23a05e7c0ad |
| secret | a0b4cb375d5a409893b05e36812fb811 |
| tenant_id | 68a23e70b5854263ab64f2ddc16c2a38 |
| trust_id | |
| user_id | 2ccdd07ae153484296d308eab10c85dd |
+-----------+----------------------------------+
access and secret are the parameters needed to authenticate a client to S3 API.
Performance Impact
Please be aware that Keystone's PKI tokens are not available together with S3 API. Moreover, radosgw doesn't
cache Keystone responses while using S3 API. This could lead to authorization service overload.
RADOS-based (internal)
Configuration
The RADOS-based authentication mechanism should work out of the box. It is enabled by default in radosgw, and
Fuel does not change this setting. However, if you need to disable it, the rgw_s3_auth_use_rados option may be
set to false.
User management can be performed with the radosgw-admin command line utility provided with Ceph. For
example, to create a new user, execute the following command:
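A hedged sketch, with a placeholder uid and display name:
radosgw-admin user create --uid="s3_user" --display-name="S3 test user"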
access_key and secret_key are the parameters needed to authenticate a client to S3 API.
Verification
To check whether everything works fine, a low-level S3 API client might be very useful, especially if it can assist
with authentication signature generation. The S3 authentication model requires that the client
provides a key identifier (AccessKeyId) and an HMAC-based authentication signature, which is calculated against a
user key (secret) and some HTTP headers present in the request. The well-known solution is the s3curl application.
However, unpatched versions contain severe bugs (see LP1446704). We have already fixed them and sent a pull
request to the author. However, until it is merged, we recommend trying this version of s3curl.
Step-by-step instruction
4. Create a .s3curl file in your home directory. This file should contain your AccessKeyId and SecretAccessKey
pairs.
%awsSecretAccessKeys = (
# your account
ant => {
id => '9TEP7FTSYTZF2HZD284A',
key => '8uNAjUZ+u0CcpbJsQBgpoVgHkm+PU8e3cXvyMclY',
},
);
my @endpoints = ('172.16.0.2');
Example:
Note
You can get your S3 endpoint using the keystone CLI command as follows:
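One possible form of that command, matching the output fragment below:
keystone endpoint-get --service s3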
| s3.publicURL | http://172.16.0.2:8080 |
+--------------+------------------------+
• To get an object
Example:
• Upload a file
Example:
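The s3curl invocations themselves are not shown above; hedged sketches, assuming the 'ant' credentials and endpoint configured earlier and a hypothetical bucket and object name:
To get an object:
./s3curl.pl --id=ant -- http://172.16.0.2:8080/my-bucket/my-object
To upload a file:
./s3curl.pl --id=ant --put=/tmp/my-file -- http://172.16.0.2:8080/my-bucket/my-file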
Note
Known issues: LP1477457 in Fuel 6.1.
nova service-list
+---+-----------------+-----+--------+-------+-------+----------+--------+
| Id| Binary |Host | Zone |Status | State |Updated_at|Disabled|
| | | | | | | |Reason |
+---+-----------------+-----+--------+-------+-------+----------+--------+
| 1 | nova-conductor |node1|internal|enabled| up |2015-11-16| - |
| 3 | nova-cert |node1|internal|enabled| up |2015-11-16| - |
| 4 | nova-network |node1|internal|enabled| up |2015-11-16| - |
| 5 | nova-scheduler |node1|internal|enabled| up |2015-11-16| - |
| 6 | nova-consoleauth|node1|internal|enabled| up |2015-11-16| - |
| 7 | nova-compute |node1|nova |enabled| up |2015-11-16| - |
| 8 | nova-network |node2|internal|enabled| up |2015-11-16| - |
| 9 | nova-compute |node2|nova |enabled| up |2015-11-16| |
+---+-----------------+-----+--------+-------+-------+----------+--------+
Example:
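The disabling command itself is elided; a sketch (node2 is the example host):
nova service-disable node2 nova-compute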
This command puts the compute node node2 into maintenance mode and prevents the
nova-compute service from booting instances on this compute node.
System response:
+-------+--------------+----------+
| Host | Binary | Status |
+-------+--------------+----------+
| node2 | nova-compute | disabled |
+-------+--------------+----------+
Migrate instances
After you disable virtual machine scheduling as described in Disable virtual machine scheduling, you must migrate
the virtual machine instances.
To migrate instances:
Example:
Example:
• Migrate all instances from the node under maintenance to other available compute nodes:
Example:
Host live migration with shared storage:
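The command producing this output is elided; a hedged sketch for live-migrating all instances off node2:
nova host-evacuate-live node2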
+--------------------------+-------------------------+--------------+
| Server UUID | Live Migration Accepted |Error Message |
+--------------------------+-------------------------+--------------+
| 4c379e02-f474-4fc3-929d- | True | |
| f917b5d3bca0 | | |
+--------------------------+-------------------------+--------------+
Example:
Host live evacuation without shared storage:
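A hedged sketch for the case without shared storage:
nova host-evacuate-live --block-migrate node2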
• Cold migration:
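A hedged sketch of cold-migrating all instances off the host:
nova host-servers-migrate node2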
3. Confirm migration:
Example:
+-------------------------------+--------------------+---------------+
| Server UUID | Migration Accepted | Error Message |
+-------------------------------+--------------------+---------------+
| 4c379e02-f474-4fc3-929d-f917b | True | |
| 86b5e13a-35c4-434e-aaac-5941f | True | |
+-------------------------------+--------------------+---------------+
2. After migration completes, start the maintenance procedure.
3. When you complete maintenance, proceed to Restore a compute node after maintenance.
+----------------+-------------------+---------+------------------------+
| Action | Request_ID | Message | Start_Time |
+----------------+-------------------+---------+------------------------+
| create | req-16c835a0-ce12-| - | 2015-10-21T13:16:37.000|
| live-migration | req-fe6f0edc-e0ef | - | 2015-10-21T13:22:59.000|
+----------------+-------------------+---------+------------------------+
Example:
+-------------------+------------------------------------------+
| Property | Value |
+-------------------+------------------------------------------+
| action | live-migration |
| events | [] |
| instance_uuid | 6e891799-2180-4b22-99a3-565143a001ea |
| message | - |
| project_id | 7909fc66b9254c38955c5a1bcc75f918 |
| request_id | req-7c1e7b16-3a0b-4083-b740-53997211e441 |
| start_time | 2015-11-10T17:04:11.000000 |
| user_id | b9357b628a774a9eb79ae40aedf9add6 |
+-------------------+------------------------------------------+
Example:
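The enabling command is elided; a sketch:
nova service-enable node2 nova-compute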
System response:
+-------+--------------+----------+
| Host | Binary | Status |
+-------+--------------+----------+
| node2 | nova-compute | enabled |
+-------+--------------+----------+
Maintenance Mode
During maintenance mode, an operating system runs only a critical set of working services needed for basic
network and disk operations. When in maintenance mode, a node is still reachable using SSH.
You can put your system in Maintenance Mode for system repair or other service operations.
Typically, Mirantis OpenStack maintenance updates do not cause service downtime. However, when you replace
hardware or update kernel packages, your virtual machines become unavailable to users. Therefore, if you run
critical applications that cannot tolerate downtime, you must move your virtual machines from the compute node
under maintenance to an operational compute node as described in Migrate workloads from a compute node for
maintenance before enabling maintenance mode.
Overview
Maintenance mode is enforced by using the umm command in one of the following ways:
Note
If you manually start a service in the maintenance mode, it will not be automatically restarted when you
put the system back in the normal mode with the umm off command.
• umm on [cmd] - enter the maintenance mode, and execute cmd when MM is reached;
• umm status - check the mode status. There are three statuses:
• UMM=yes
• REBOOT_COUNT=2
• COUNTER_RESET_TIME=10
where:
UMM
tells the system to go into the maintenance mode based on the REBOOT_COUNT and
COUNTER_RESET_TIME values. If the value is anything other than yes (or if the UMM.conf file is missing),
the system will go into the native Ubuntu recovery mode.
REBOOT_COUNT
determines the number of unclean reboots that trigger the system to go into the maintenance mode.
COUNTER_RESET_TIME
determines the period of time (in minutes) before the unclean reboot counter is reset.
1 root@node-1:~#umm on
2 umm-gr start/running, process 6657
3
4 Broadcast message from root@node-1
5 (/dev/pts/0) at 14:29 ...
6
7 The system is going down for reboot NOW!
8 root@node-1:~# umm status
9 rebooting
10 root@node-1:~# Connection to node-1 closed by remote host.
11 Connection node-1:~# closed.
12 root@fuel:~#:~$
13
14 root@node-1:~#ssh
15
16 root@node-1:~# umm status
17 umm
18 root@node-1:~#ps -Af
1 root@node-1:~#umm off
1 root@node-1:~#umm status
2 runlevel N 2
3 root@node-1:~#/etc/init.d/apache2 status
4 Apache2 is running (pid 1907).
We can see that the service was not restarted when switching from maintenance mode to working mode.
• Check the state of the OpenStack services:
1 root@node-1:~#crm status
• If you want to reach working mode by reboot, you should use the following command:
Example of putting all nodes into the maintenance mode at the same time
The following maintenance mode sequence is called Last In, First Out. It guarantees that the Cloud Infrastructure
Controller (CIC) that comes back first has the most recent data.
1 [root@fuel ~]# ssh node-1 umm on ssh node-2 umm on ssh node-3 umm on
2 Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
3 umm-gr start/running, process 24318
4 Connection to node-1 closed by remote host.
5 Connection to node-1 closed.
6 [root@fuel ~]#
7 [root@fuel ~]# ssh -tt node-1 ssh -tt node-2 ssh -tt node-3 sleep 1
8 Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
9 ECDSA key fingerprint is 84:17:0d:ea:27:1f:4e:08:f7:54:b2:8c:fe:8a:13:1a.
10 Are you sure you want to continue connecting (yes/no)? yes
11 Warning: Permanently added 'node-2,10.20.0.4' (ECDSA)
12 to the list of known hosts. established.
13 ECDSA key fingerprint is
14 c3:c6:ca:7d:11:d3:53:01:15:64:20:f7:c7:44:fb:d1.
15 Are you sure you want to continue connecting (yes/no)? yes
16 Warning: Permanently added 'node-3,192.168.0.6' (ECDSA)
17 to the list of known hosts.
18 Connection to node-3 closed.
19 Connection to node-2 closed.
20 Connection to node-1 closed. [root@fuel ~]#
Running vCenter
After the OpenStack environment that is integrated with the vCenter server is deployed, you can manage the
VMware cluster using the Horizon dashboard and/or the Nova CLI:
Performance Notes
Keystone Token Cleanup
The Keystone service creates new tokens in the Keystone database each time an external query is run against
OpenStack but it does not automatically clean up expired tokens because they may be required for forensics work
such as that required after a security breach. However, the accumulation of expired tokens in the database
can seriously degrade the performance of the entire OpenStack environment.
Beginning with version 5.0, Mirantis OpenStack includes the pt-archiver command from the Percona Toolkit. We
recommend using pt-archiver to set up a cleanup job that runs periodically; the cleanup-keystone-tokens.sh
script from TripleO is a good example:
It is better to use pt-archiver instead of deleting the expired tokens using standard database manipulation
commands because it prevents the Keystone database from being blocked for significant time periods while the
rows with expired tokens are deleted.
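For illustration only, a pt-archiver invocation in the spirit of that script might look like the following; the connection parameters and the token table name are assumptions based on a default Keystone MySQL schema:
pt-archiver --source h=localhost,u=keystone,p=<password>,D=keystone,t=token \
  --where "expires < NOW()" --purge --primary-key-only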
Note
If you make further changes to your environment after a backup, you should make a new backup.
How slave nodes choose the interface to use for PXE booting
Fuel configures the NIC name/order based on the data seen by the Nailgun agent (/opt/nailgun/bin/agent) on the
discovered nodes. This is in turn the result of how the NICs are named/ordered by the bootstrap node.
The device used by the Admin (PXE) network will be the interface that is directly attached to this network. If one
is not available, it will fall back to the interface with the default gateway.
For example:
If physical device 0 is connected to the Admin (PXE) network, then eth0 will be the admin interface in Fuel.
If instead physical device 1 is connected to the Admin (PXE) network, then eth1 will be the admin interface in
Fuel.
A common issue here is that physical device 0 may not always be assigned device eth0. You may see:
In this case, having physical device 0 connected to the Admin (PXE) network will result in the eth2 interface
being used as the admin interface in Fuel.
You can confirm that the right interface is in use because the MAC address did not change even though the
device name did.
• Keypair creation
• Security group creation
• Check networks parameters
• Launch instance
• Assign floating IP
• Check that VM is accessible via floating IP address
• Check network connectivity from instance via floating IP
• Check network connectivity from instance without floating IP
• Launch instance with file injection
• Launch instance and perform live migration
• User creation and authentication in Horizon
Network issues
Fuel has the built-in capability to run a network check before or after an OpenStack environment deployment.
The connectivity check includes tests for connectivity between nodes through configured VLANs on configured
host interfaces. Additionally, the checks for an unexpected DHCP server are performed to verify that outside
DHCP servers will not interfere with a deployment. If the verification does not succeed, the Connectivity Check
screen displays the details of a failure.
HA tests description
HA tests verify the High Availability (HA) architecture. The following is a description of each HA test available:
• Check the usage of the default credentials (password) for the root user for SSH on the Fuel Master node. If the
default password was not changed, the test fails with a recommendation to change it.
• Check the usage of the default credentials for the OpenStack cluster. If the default values are used for the admin
user, the test fails with a recommendation to change the password/username for the OpenStack user
with the admin role.
• If you restart the Corosync service, you will need to restart the Pacemaker as well.
• Corosync 1.x cannot be upgraded to 2.x without full cluster downtime. All Pacemaker resources, such as
Neutron agents, DB and AMQP clusters and others, will be hit by the downtime as well.
• All location constraints for the cloned Pacemaker resources must be removed as a part of the Corosync
version upgrade procedure.
Troubleshooting
Logs and messages
A number of logs are available to help you understand and troubleshoot your OpenStack environment.
Screen notifications
To view the latest updates on the nodes statuses, see the notifications drop-down list using the rightmost icon in
the upper right corner of the Fuel web UI. Click a notification to get a summary configuration view of a particular
node.
Many of the same logs shown for the Fuel Master node are also shown for the Fuel Slave nodes. The difference is
in the nodes given for bootstrap logs. Additionally, the controller node includes a set of OpenStack logs that
displays logs for services that run on the Controller node.
The "Bootstrap logs" for "Other servers" are:
Bootstrap logs
dmesg
Standard Linux dmesg log that displays log messages from the most recent system startup.
messages
Logs all kernel messages for the node.
mcollective
Logs activities of MCollective.
agent
Logs activities of the Nailgun agent.
syslog
OpenStack uses the standard Linux syslog/rsyslog facilities to manage logging in the environment. Fuel puts the
appropriate templates into the /etc/rsyslog.d directory on each target node.
By default, Fuel sets up the Fuel Master node to be the remote syslog server. See the Fuel User Guide for
instructions on how to configure the environment to use a different server as the rsyslog server. Note that Fuel
configures all the files required for rsyslog when you use the Fuel Master node as the remote server; if you
specify another server, you must configure that server to handle messages from the OpenStack environment.
/var/log
Logs for each node are written to the node's /var/log directory and can be viewed there. Under this directory, you
will find subdirectories for the major services that run on that node such as nova, cinder, glance, and heat.
On the Fuel Master node, /var/log/remote is a symbolic link to the /var/log/docker-logs/remote directory.
Logging events from Fuel OCF agents are collected both locally in /var/log/daemon.log and remotely. The naming
convention for a remote log file is the following:
ocf-<AGENT-NAME>.log
For example:
ocf-foo-agent.log
Note
RabbitMQ logs its OCF events to the lrmd.log file for backwards compatibility.
atop logs
Fuel installs and runs atop on all deployed nodes in the environment. The atop service uses screen to display
detailed information about resource usage on each node. The data shows usage of the hardware resources that
are most important from a performance standpoint: CPU, memory, disk, and network interfaces and can be used
to troubleshoot performance and scalability issues.
The implementation is:
• By default, atop takes a snapshot of system status every 20 seconds and stores this information in binary
form.
• The binary data is stored locally on each node in the /var/log/atop directory.
• Data is kept for seven days; a logrotate job deletes logs older than seven days.
• Data is stored locally on each target node; it is not aggregated for the whole environment.
• The data consumes no more than 2GB of disk space on each node with the default configuration settings.
• The atop service can be disabled from the Linux shell. A Puppet run (either done manually or when patching
OpenStack) re-enables atop.
• The Diagnostic Logs snapshot includes the atop data for the current day only.
To view the atop data, run the atop(1) command on the shell of the node you are analyzing:
atop -r /var/log/atop/<filename>
See the atop(1) man page for a description of the atop options and commands.
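For example, a minimal sketch that replays part of the current day's data for a specific time window, assuming the default daily.log output file shown in the configuration below:
# Replay today's atop data between 10:00 and 10:30
atop -r /var/log/atop/daily.log -b 10:00 -e 10:30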
Each target node has a configuration file that controls the behavior of atop on that node. On Ubuntu nodes, this
is the /etc/default/atop file. The contents of the file are:
INTERVAL=20
LOGPATH="/var/log/atop"
OUTFILE="$LOGPATH/daily.log"`
Modifying the value of the INTERVAL parameter or the logrotate settings affects the size of the logs maintained.
For the most efficient log size, use a larger interval setting and a smaller rotate setting. To modify the rotate
setting, edit the /etc/logrotate.d/atop file and make both of the following modifications; the value of X should be
the same in both cases:
• By default, the logrotate script calls /etc/logrotate.d/fuel.nodaily file every 15 minutes to verify whether one
of the following conditions is met:
• age - if the log has not been rotated in more than the specified period of time (weekly by default), and
the file is larger than 10MB on the Fuel Master node or 5MB on slave nodes;
• size - if a log file exceeds 100MB on the Fuel Master node or 20MB on slave nodes.
Warning
Fuel enforces the following global changes to the logrotate configuration: delaycompress and
copytruncate.
• You can run a quick test to check whether the logrotate script works.
Find the biggest file in /var/log/ and check the date and time stamps in its first and last lines (see the sketch after this list).
If the file is older than your rotation schedule and bigger than maxsize, then logrotate is not working correctly.
To debug, type:
logrotate -v -d /etc/logrotate.d/fuel.nodaily
The output of this command is a list of files examined by logrotate, including whether they should be
rotated or not.
• You can find an example of a writing rate evaluation for the neutron-server log file: LP1382515.
• When backporting the reworked logrotate configuration to the older Fuel releases, purge old template files:
rm /etc/logrotate.d/{1,2}0-fuel*
The /usr/bin/fuel-logrotate script is needed as well as a new cron job to perform the rotation with it.
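For the quick logrotate test described in the list above, a minimal sketch (the file name is a placeholder):
# Find the largest log file under /var/log (prints the file name first)
ls -S /var/log/*.log | head -n 1
# Check the time stamps of the first and the last line of that file
head -n 1 /var/log/<largest-file>.log
tail -n 1 /var/log/<largest-file>.log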
Most OpenStack services use the same configuration options to control the log level and logging method. If you need
to troubleshoot a specific service, locate its configuration file under /etc (e.g. /etc/nova/nova.conf) and set the
debug and use_syslog options as follows:
debug = True
use_syslog = False
Disabling syslog protects the Fuel Master node from being overloaded by the flood of debug messages that the node
would otherwise send to the rsyslog server. Do not forget to revert both options to their original values when you
are done troubleshooting.
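Configuration changes take effect only after the corresponding services are restarted. For example, a minimal sketch for Nova on an Ubuntu controller node; the service names are illustrative and depend on the component you are troubleshooting:
# Restart the services that read the modified configuration file
service nova-api restart
service nova-scheduler restart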
HTTP Error 400: Bad Request (This Session's transaction has been rolled back
due to a previous exception during flush. To begin a new transaction with
this Session, first issue Session.rollback(). Original exception was:
(InternalError) index "notifications_pkey" contains unexpected zero page at
block 26
HINT: Please REINDEX it.
Solution
The postgres container should still be running, so you simply need to run an SQL command to correct the fault.
Before attempting to fix the database, make a quick backup of it:
dockerctl shell postgres su - postgres -c 'pg_dumpall --clean' \
> /root/postgres_backup_$(date +"%F-%T").sql
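A hedged sketch of the REINDEX suggested by the error hint, assuming the Fuel database is named nailgun:
# Rebuild the corrupted index inside the postgres container
dockerctl shell postgres su - postgres -c 'psql nailgun -c "REINDEX INDEX notifications_pkey;"'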
Note
You may need to restart the postgres, keystone, nailgun, nginx or ostf Docker container using the
dockerctl restart CONTAINERNAME command.
Solution
This solution requires data recovery, described in the Summary above. It is necessary to recover data manually
using the dmsetup and mount commands.
First, you need the full ID of the Docker container that was lost. In the log message above, the ID is
273c9b19ea61414d8838772aa3aeb0f6f1b982a74555fb6631adb6232459fe80. If you do not have such a
message, you can find the ID this way:
container=postgres
container_id=$(sqlite3 /var/lib/docker/linkgraph.db \
"select entity_id from edge where name like '%$container%'")
echo $container_id
#should look like:
273c9b19ea61414d8838772aa3aeb0f6f1b982a74555fb6631adb6232459fe80
Once you have the container ID, you need to get the devicemapper block device ID for the container:
• for PostgreSQL:
ls -la /mnt/${container}_recovery/rootfs/var/lib/pgsql/9.3/data/
• for Astute:
ls -la /mnt/${container}_recovery/rootfs/var/lib/astute
• for Cobbler:
ls -la /mnt/${container}_recovery/rootfs/var/lib/cobbler
Next, it is necessary to purge the container record from the Docker sqlite database. You may see an issue when
running dockerctl start CONTAINER:
Run this command before trying to restore the container data or if you are simply destroying and recreating it:
Now perform the following recovery actions, which vary depending on whether you need to recover data from
Cobbler, Astute, or PostgreSQL:
For Cobbler:
For PostgreSQL:
To recover a corrupted PostgreSQL database, you can import the dump into another PostgreSQL installation of the
same version as on the Fuel Master node (in 6.0 it is 9.3.5). There you can get a clean dump that you then import
into your PostgreSQL container:
For Astute:
umount "/mnt/${container}_recovery"
dmsetup clear $device_id
Read-only containers
Symptoms
Solution
If the container affected is stateful, it is necessary to recover the data. Otherwise, you can simply destroy and
recreate stateless containers.
For stateless containers:
container=<container_name>   # for example, postgres
container_id=$(sqlite3 /var/lib/docker/linkgraph.db \
"select entity_id from edge where name like '%$container%'")
echo $container_id
# Unmount the container's devicemapper volume, repair it, and start the container again
umount -l /dev/mapper/docker-*$container_id
fsck -y /dev/mapper/docker-*$container_id
dockerctl start $container
Note
In Mirantis OpenStack 6.0 and later, multiple L3 agents are configured as clones, one on each Controller.
When troubleshooting Corosync and Pacemaker, the clone_p_neutron-l3-agent resource (new in 6.0) is
used to act on all L3 agent clones in the environment. The p_neutron-l3-agent resource is still provided
to act on a specific resource on a specific Controller node.
root@node-1:~# crm
crm(live)#
crm(live)# status
============
Last updated: Tue Jun 23 08:47:23 2015
Last change: Mon Jun 22 17:24:32 2015
Stack: corosync
Current DC: node-1.domain.tld (1) - partition with quorum
Version: 1.1.12-561c4cf
3 Nodes configured
43 Resources configured
============
============
Last updated: Tue Jun 23 08:47:52 2015
Last change: Mon Jun 22 17:24:32 2015
Stack: corosync
Current DC: node-1.domain.tld (1) - partition with quorum
Version: 1.1.12-561c4cf
3 Nodes configured
43 Resources configured
============
crm(live)# resource
Here you can enter resource-specific commands:
root@node-1:~# crm
crm(live)# resource
crm(live)resource# status
In this case, crm found residual OpenStack agent processes that had been started by Pacemaker because of a
network failure and cluster partitioning. After connectivity was restored, Pacemaker saw these duplicate
resources running on different nodes. You can let it clean up this situation automatically or, if you do not want to
wait, clean them up manually.
For more information, see crm interactive help and documentation.
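For example, a minimal sketch that manually cleans up the duplicated L3 agent resources mentioned in the note above:
# Clear the failed/duplicate state so Pacemaker re-evaluates the resource
crm resource cleanup clone_p_neutron-l3-agent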
Sometimes a cluster gets split into several parts. In this case, crm status shows something like this:
On ctrl1
============
….
Online: [ ctrl1 ]
On ctrl2
============
….
Online: [ ctrl2 ]
On ctrl3
============
….
Online: [ ctrl3 ]
You can troubleshoot this by checking connectivity between nodes. Look for the following:
1. By default Fuel configures corosync over UDP. Security Appliances shouldn't block UDP traffic for 5404,
5405 ports. Deep traffic inspection should be turned off for these ports. These ports should be accepted on
the management network between all controllers.
2. Corosync should start after the network interfaces are activated.
3. bindnetaddr should be located in the management network or at least in the same reachable segment.
corosync-cfgtool -s
This command displays the cluster connectivity status:
runtime.totem.pg.mrp.srp.members.134245130.ip=r(0) ip(10.107.0.8)
runtime.totem.pg.mrp.srp.members.134245130.join_count=1
...
runtime.totem.pg.mrp.srp.members.201353994.ip=r(0) ip(10.107.0.12)
runtime.totem.pg.mrp.srp.members.201353994.join_count=1
runtime.totem.pg.mrp.srp.members.201353994.status=joined
If the IP of the node is 127.0.0.1, it means that Corosync started when only the loopback interface was available
and bound to it.
If the members list contains only one IP address or is incomplete, it indicates that there is a Corosync
connectivity issue because this node does not see the other ones.
Since no-quorum-policy is set to stop on a fully functioning cluster, Pacemaker stops all resources on a
quorumless partition. If quorum is present, the cluster functions normally and the minority set of controllers is
dropped. This eliminates split-brain scenarios in which nodes do not have quorum or cannot see each other.
In some scenarios, such as manual cluster recovery, no-quorum-policy can be set to ignore. This setting allows
the operator to start operations on a single controller rather than waiting for quorum.
Once quorum or the cluster is restored, no-quorum-policy should be set back to its previous value.
Fuel also temporarily sets no-quorum-policy to ignore when the cloud operator adds or removes a controller node.
This is required for scenarios in which the cloud operator adds more controller nodes than the cluster currently
consists of. Once the addition or removal of controller nodes is done, Fuel sets no-quorum-policy back to the stop
value.
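A minimal sketch of checking and changing the property with the crm shell, following the behavior described above:
# Show the current cluster-wide policy
crm configure show | grep no-quorum-policy
# Relax the policy for manual recovery on a quorumless partition
crm configure property no-quorum-policy=ignore
# ... perform the manual recovery ...
# Restore the normal behavior once quorum is back
crm configure property no-quorum-policy=stop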
It is also recommended to configure fencing (STONITH) for the Pacemaker cluster. This can be done manually or
with the help of the Fencing plugin for Fuel. When STONITH is enabled, no-quorum-policy can be set to suicide as
well. When set to suicide, a node will shoot itself and any other nodes in the partition without quorum, but it
will not try to shoot the nodes it cannot see. When set to ignore (or when it has quorum), it will shoot any node
it cannot see. For any other value, it will not shoot anyone when it does not have quorum.
Furthermore, Corosync always tries to automatically restore the cluster back into a single partition and start all
of the resources, if any were stopped, unless some controller nodes are damaged (for example, cannot run the
Corosync service). Such nodes cannot rejoin the cluster and must be fenced by the STONITH daemon. That is why a
production cluster should always have fencing enabled.
# ip link show
You can also check ovs-vsctl show output to see that all corresponding tunnels/bridges/interfaces are
created and connected properly:
ce754a73-a1c4-4099-b51b-8b839f10291c
    Bridge br-mgmt
        Port br-mgmt
            Interface br-mgmt
                type: internal
        Port "eth1"
            Interface "eth1"
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
        Port "eth0"
            Interface "eth0"
        Port "qg-814b8c84-8f"
            Interface "qg-814b8c84-8f"
                type: internal
    Bridge br-int
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
        Port "tap7b4ded0e-cb"
            tag: 1
            Interface "tap7b4ded0e-cb"
                type: internal
        Port "qr-829736b7-34"
            tag: 1
            Interface "qr-829736b7-34"
                type: internal
    Bridge br-tun
        Port "gre-1"
            Interface "gre-1"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="10.107.0.8"}
        Port "gre-2"
            Interface "gre-2"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="10.107.0.5"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "gre-3"
            Interface "gre-3"
                type: gre
5. To verify that this host is a member of the cluster and that p_mysql does not contain any "Failed actions",
run the following commands:
crm status
crm_mon -fotAW -1
If there are some RabbitMQ resource failures, they will be shown in the command output with the time stamps,
so you can search for the events in the logs around that moment.
Note
If there are split clusters of Corosync running, you should first fix your Corosync cluster, because you
cannot resolve issues with the Pacemaker resources, including RabbitMQ cluster, when there is a split
brain in the Corosync cluster.
How to recover
It is recommended to clean up and restart the master_p_rabbitmq-server Pacemaker resource, see crm - Cluster
Resource Manager.
Note
Restarting the RabbitMQ Pacemaker resource will introduce a full downtime for the AMQP cluster and
OpenStack applications. The downtime may take from a few to up to 20 minutes.
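A minimal sketch of the cleanup and restart, run from any controller node (see the crm section referenced above):
# Clean up stale state and restart the RabbitMQ master/slave resource
crm resource cleanup master_p_rabbitmq-server
crm resource restart master_p_rabbitmq-server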
partitions detected by the rabbitmqctl tool, or even some list channels/queues requests may hang. Note that the
rabbitmqctl report command issued on all controllers should be enough to gather all the required
information, but there is also a special group of Fuel OSTF HA health checks available in the Fuel UI and CLI. See
also RabbitMQ OSTF replication tests <https://blueprints.launchpad.net/fuel/+spec/ostf-rabbit-replication-tests>.
How to recover
It is recommended to clean up and restart the master_p_rabbitmq-server Pacemaker resource, see crm - Cluster
Resource Manager.
How to recover
It is recommended to restart the affected OpenStack service or services, see HowTo: Manage OpenStack services.
Note
Restarting the OpenStack service will introduce a short (near to zero) downtime for the related OpenStack
application.
Check if there are AMQP problems with any of the OpenStack components
Note that normally an OpenStack application should be able to reconnect to the AMQP host and eventually restore
its operations. But if it cannot for some reason, there may be "down" status reports, failures of CLI commands,
and AMQP/messaging related records in the log files of the services belonging to the affected OpenStack
component under verification. For Nova, for example, there may be records in the log files in the
/var/log/nova/ directory similar to the following ones:
How to recover
It is recommended to restart all instances of the OpenStack services related to the affected OpenStack
component, see HowTo: Manage OpenStack services. For example, for Nova Compute component, you may want to
restart all instances of the Nova services on the Controllers and Compute nodes affected by the AMQP issue.
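For example, a hedged sketch for the Nova component on Ubuntu nodes; the exact list of services depends on your deployment:
# On each affected controller node
service nova-api restart
service nova-conductor restart
service nova-scheduler restart
# On each affected compute node
service nova-compute restart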
Replace N with the number of timeouts. For instance, if it is set to 3, the OCF script will tolerate two rabbitmqctl
timeouts in a row, but fail if the third one occurs.
By default, the parameter is set to 1, which means rabbitmqctl timeout is not tolerated at all. The downside
of increasing the parameter is a delay in restarting RabbitMQ when it is down. For example, if a real issue occurs
and causes a rabbitmqctl timeout, the OCF script will detect that only after N monitor runs and then restart
RabbitMQ, which might fix the issue.
To understand whether a RabbitMQ restart was caused by a rabbitmqctl timeout, examine the lrmd.log of the
corresponding Controller on the Fuel Master node in the /var/log/docker-logs/remote/ directory for the presence of
the following lines:
This indicates a rabbitmqctl timeout. The next line explains whether it caused a restart or not. For example:
"rabbitmqctl timed out 2 of max. 3 time(s) in a row. Doing nothing for now."
Even though the initial connection was made to 192.168.0.2, the client tries to access the public network for the
Nova API. The reason is that Keystone returns the list of OpenStack service URLs, and for production-grade
deployments it is required to access the services over the public network.
HA testing scenarios
Currently, several testing scenarios are provided to check an HA environment.
• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Set up a cluster to use Network VLAN manager with 8 networks.
• Deploy the cluster.
• Make sure that the cluster is configured correctly: there should be no dead services and no errors in the
logs. Also, check that all nova services are running and in the up state, TestVM is present in Glance, and
only one nova-network instance is present.
• Run network verification test.
• Run OSTF.
2. Deploy a cluster in HA mode with nova-network and Flat DHCP manager enabled. Steps to perform:
• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Deploy the cluster.
• Make sure that the cluster is configured correctly: there should be no dead services and no errors in the
logs. Also, check that all nova services are running and in the up state, TestVM is present in Glance, and
only one nova-network instance is present.
• Run network verification test.
• Perform a security check: verify that it is impossible to access TCP or UDP unused ports.
• Run OSTF.
3. Add a compute node to a cluster in HA mode with nova-network with Flat DHCP manager enabled. Steps to
perform:
• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Deploy the cluster.
• Make sure that the cluster is configured correctly: there should be no dead services and no errors in the
logs. Also, check that all nova services are running and in the up state, TestVM is present in Glance, and
only one nova-network instance is present.
• Add one node with compute role.
• Re-deploy the cluster.
• Make sure that the cluster is configured correctly: there should be no dead services and no errors in the
logs. Also, check that all nova services are running and in the up state, TestVM is present in Glance, and
only one nova-network instance is present.
• Run network verification test.
• Run OSTF.
4. Deploy an HA cluster with Ceph and nova-network. Steps to perform:
• Create a cluster.
• Add 3 nodes with controller role.
• Start cluster deployment.
• Stop deployment.
• Reset settings.
• Add 2 nodes with compute role.
• Re-deploy the cluster.
• Run OSTF.
• Run network verification test.
6. Deploy nova-network cluster in HA mode with Ceilometer. Steps to perform:
• Create a cluster. On Settings tab of the Fuel web UI, select Install Ceilometer option.
• Add 3 nodes with controller role.
• Add one node with compute role.
• Add one node with MongoDB role.
• Deploy the cluster.
• Check that partitions on MongoDB node are the same as those selected on the Fuel web UI.
• Make sure that Ceilometer API is running (it must be present in ps ax output).
• Run OSTF.
7. Check HA mode on scalability. Steps to perform:
• Create a cluster.
• Add 1 controller node.
• Deploy the cluster.
• Add 2 controller nodes.
• Deploy the changes.
• Check Pacemaker status: all nodes must be online after running crm_mon -1 command.
• Run network verification test.
• Add 2 controller nodes.
• Deploy the changes.
• Check that public and management vIPs have started after running crm_mon -1 command.
• Run network verification test.
• Run OSTF.
8. Backup/restore Fuel Master node with HA cluster. Steps to perform:
Neutron
You can run the following tests on the supported operating system.
• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• On Settings tab of the Fuel web UI, select Assign public networks to all nodes option.
• Deploy the cluster.
• Check that public network is assigned to all nodes.
• Run network verification test.
• Perform a security check: verify that it is impossible to access TCP or UDP unused ports.
• Run OSTF.
3. Deploy a cluster in HA mode with Neutron VLAN. Steps to perform:
• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Deploy the cluster.
• Run network verification test.
• Run OSTF.
4. Deploy cluster in HA mode with Neutron VLAN and public network assigned to all nodes. Steps to perform:
• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• On Settings tab of the Fuel web UI, select Assign public networks to all nodes option.
• Deploy the cluster.
• Check that public network is assigned to all nodes.
• Run network verification test.
• Perform a security check: verify that it is impossible to access TCP or UDP unused ports.
• Run OSTF.
5. Deploy a cluster in HA mode with Murano and Neutron GRE segmentation. Steps to perform:
• Create a cluster. On Settings tab of the Fuel web UI, select Install Murano option.
• Create a cluster.
• Add 3 nodes with controller role.
• Add one node with compute role.
• Deploy the cluster.
• Verify that Heat services are up and running (check that heat-api is present in 'ps ax' output on every
controller).
• Run OSTF.
• Register Heat image.
• Run OSTF platform tests.
7. Deploy a new Neutron GRE cluster in HA mode after Fuel Master is upgraded. Steps to perform:
• Create a cluster with 1 controller with Ceph, 2 compute nodes with Ceph; Ceph for volumes and images
should also be enabled.
• Run upgrade on Fuel Master node.
• Check that upgrade has succeeded.
• Deploy a new upgraded cluster with HA Neutron VLAN manager, 3 controllers, 2 compute nodes and 1
Cinder.
• Run OSTF.
Bonding
You can run the following tests on the supported operating system:
1. Deploy cluster in HA mode for Neutron VLAN with bonding. Steps to perform:
• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Set up bonding for all interfaces in active-backup mode.
• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Set up bonding for all interfaces in balance-slb mode.
• Deploy the cluster.
• Run network verification test.
• Run OSTF.
Warning
These scenarios are destructive and you should not try to reproduce them.
• Run OSTF.
6. Shut down non-primary controller. Steps to perform:
• Run OSTF.
7. Shut down management interface on the primary controller.
Note
When you use ifdown, ifup, or commands that call them, the Corosync service can update the cluster state,
which in most cases leads to a so-called split brain and the test fails. Instead, use ip link
set down <ethX> or physically disconnect the interface.
Steps to perform:
Rally
1. Run Rally for generating typical activity on a cluster (for example, create or delete instance and/or
volumes). Shut down the primary controller and start Rally:
• Reload all HAProxy instances on all controllers in a cluster (this temporarily stops the services) by running
the crm resource restart p_haproxy command.
2. On the Fuel Master node, run the fuel nodes | grep controller command. If the node that you are going to
back up is a host for a Neutron agent, you can move the agent to a different controller with the following
command:
ssh node-1
pcs resource move agent_name node_name
where "node-1" is the name of the node from which you would like to move.
3. For every controller in the cluster, put the MySQL service into maintenance mode by running the following
command from the Fuel Master node:
ssh node-1
crm node maintenance
where "node-1" is the name of the node from which you would like to move.
5. Stop data replication on the selected MySQL instance:
10. Take the MySQL service out of maintenance mode with the following command for every controller in the
cluster:
export OCF_RESOURCE_INSTANCE=p_mysql
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_socket=/var/run/mysqld/mysqld.sock
export OCF_RESKEY_additional_parameters="--wsrep-new-cluster"
/usr/lib/ocf/resource.d/fuel/mysql-wss start
/usr/lib/ocf/resource.d/fuel/mysql-wss start
# dd if=/way/to/your/ISO of=/way/to/your/USB/stick
where /way/to/your/ISO is the path to your Fuel ISO, and /way/to/your/USB/stick is the path to your USB drive.
For example, if your Fuel ISO is in the /home/user/fuel-isos/ folder and your USB drive is at /dev/sdc, issue the
following:
# dd if=/home/user/fuel-isos/fuel-7.0.iso of=/dev/sdc
Note
This operation will wipe all the data you have on the USB drive and will place a bootable Fuel ISO on it. You
also have to write the ISO to the USB drive itself, not to a partition on it.
Note down the numbers under the id column. You will need these later.
Check the existing nodes:
---|--------|------|------------|------------|-------------------
4 | new | test | ha_compact | 1 | None
Note down the id of the environment. You will need this later.
Check the existing roles:
Your node with an empty role has been added to the cluster.
Configuring repositories
You may need to configure repositories to:
To change the list of repositories, amend the fields that contain the required information for the repositories
configuration depending on the distribution you install.
For Ubuntu
|repo-name|apt-sources-list-string|repo-priority|
Repository priorities
The process of setting up repositories and repository priorities is the same one you normally do on your Linux
distribution.
For more information, see the documentation to your Linux distribution.
Note
To be able to download Ubuntu system packages from the official Ubuntu mirrors and Mirantis packages
from the Mirantis mirrors, make sure that your Fuel Master node and Slave nodes have Internet
connectivity.
To change the Ubuntu system package repositories from the official ones to your company's local ones, do the
following:
1. In Fuel web UI, navigate to the Settings tab and then scroll down to the Repositories section.
2. Change the path under URI.
Note
You can also change the repositories after a node is deployed, but the new repository paths will only be
used for the new nodes that you are going to add to a cluster.
Note
The script supports only rsync mirrors. Please refer to the official upstream Ubuntu mirrors list.
The script uses a Docker container with Ubuntu to support dependency resolution.
The script can be installed on any Red Hat based or Debian based system. On a Debian based system it requires
only bash and rsync. On a Red Hat based system it also requires docker-io, dpkg, and dpkg-devel packages (from
Fedora).
When run on the Fuel Master node, the script will attempt to set the created Mirantis OpenStack and/or Ubuntu
local repositories as the default ones for new environments, and apply these repositories to all the existing
environments in the "new" state. This behavior can be changed by using the command line options described
below.
The script supports running behind an HTTP proxy as long as the proxy is configured to allow proxying to Port
873 (rsync). The following environment variables can be set either system-wide (via ~/.bashrc), or in the script
configuration file (see below):
http_proxy=http://username:password@host:port/
RSYNC_PROXY=username:password@host:port
You may also want to configure Docker to use the proxy to download the Ubuntu image needed to resolve the
packages dependencies. Add the above environment variables to the file /etc/sysconfig/docker, and export them:
http_proxy=http://username:password@host:port/
RSYNC_PROXY=username:password@host:port
export http_proxy RSYNC_PROXY
fuel-createmirror -h
OR
fuel-createmirror --help
fuel-createmirror -M
OR
fuel-createmirror --mos
fuel-createmirror -U
OR
fuel-createmirror --ubuntu
If no parameters are specified, the script will create/update both Mirantis OpenStack and Ubuntu mirrors.
Note
Options -M/--mos and -U/--ubuntu can't be used simultaneously.
fuel-createmirror -d
OR
fuel-createmirror --no-default
To disable applying the created repositories to all environments in the "new" state, issue:
fuel-createmirror -a
OR
fuel-createmirror --no-apply
Note
If you change the default password (admin) in Fuel web UI, you will need to run the utility with the
--password switch, or it will fail.
The following configuration file can be used to modify the script behavior:
/etc/fuel-createmirror/common.cfg
In this file you can redefine the upstream mirrors, set local paths for repositories, configure the upstream
packages mirroring mode, set proxy settings, enable or disable using Docker, and set a path for logging. Please
refer to the comments inside the file for more information.
The following configuration file contains the settings related to Fuel:
/etc/fuel-createmirror/fuel.cfg
If you run the script outside of the Fuel Master node, you may need to redefine the FUEL_VERSION and
FUEL_SERVER parameters.
Debian-based server
1. Configure MOS DEB repository:
package1
package2
...
packageN
You can also look up the package names at the official Ubuntu website.
Having done that, rerun the script. This will download all the missing packages and recreate a local partial
mirror.
Applying patches
This section describes how to apply, rollback, and verify the patches applied to the Fuel Master node and the Fuel
Slave nodes.
Introduction
Patching in brief:
• The patching feature was introduced in Mirantis OpenStack 6.1 and will not work in older releases.
• There are two types of patches: bug-fixes and security updates.
• Patches are downloaded from the Mirantis public repositories.
• You can always check what patches are available and get instructions on how to apply them at the
Maintenance Update section of the Release Notes.
• The changes that the patches introduce will be applied to the new OpenStack environments.
Usage scenarios
Default scenario
• Check each patching item and proceed with the instructions (plan accordingly: for example, schedule a
maintenance slot to run the update).
Note
Use the instructions listed here only for Mirantis OpenStack 6.1 and 7.0.
Note
The rollback instructions listed here are for advanced administrators. If you are not sure how to plan and
execute the rollbacks, your best option is to contact Mirantis support.
• Roll back the packages on the Fuel Master node. Refer to this article as an example.
• Roll back all the changes to the configuration you made when applying the patching instructions.
• Run dockerctl destroy all.
• Run dockerctl start all.
• Wait for bootstrap to complete.
• Figure out where to get the old package version. Run apt-cache policy.
• Figure out if the old package version is available locally.
• If it is, install these versions using dpkg. Otherwise, check the snapshots of previous repositories on
http://mirror.fuel-infra.org/mos/snapshots and pick the repository that contains the packages you need.
• Add this repository to the environment configuration.
• On the Fuel Master node run:
Note
This set of actions should be applied carefully and with consideration. It is strongly recommended that
you do this on your test staging environment before applying the updates to production.
It is a good practice to apply the updates node by node so that you can stop the update procedure whenever an
issue occurs. It is also strongly recommended to back up all sensitive data that can be altered continuously
during the whole lifetime of your environment and the Fuel Master node.
These instructions assume that if you add any custom repositories to your environment configuration, these
commands will update your environment taking packages from these repositories.
• Back up your data with dockerctl backup. This will save the data to /var/backup/fuel/.
• Run yum update.
• Run dockerctl destroy all.
• Run dockerctl start all.
• Wait for the new containers deployment to finish.
• Run apt-get update.
• Run apt-get upgrade.
• Apply all the additional configuration options as described in the supporting documentation.
• Reboot the node.
Note
The tasks rsync_core_puppet, hiera, and globals are required for processing any Puppet changes.
apt-get update
apt-get upgrade
• Two bare metal nodes. Alternatively, you can have one virtual machine (with Fuel installed on it) and one
bare metal node.
• Fuel 7.0 ISO.
Deployment flow:
<interface type='bridge'>
  <source bridge='br-prv'/>
  <virtualport type='openvswitch'/>
  <model type='virtio'/>
</interface>
6. If you use tagged VLANs (VLAN segmentation or 'Use VLAN tagging' in the "Networks" tab), you should
upload a network template. For details see Using Networking Templates. See also network template samples
for reduced footprint:
• VLAN segmentation
• VLAN tagging
7. Assign the "virt" role to the discovered node.
8. Upload the virtual machine configuration to Fuel.
9. Provision the bare metal node with the "virt" role. This will also spawn the virtual machines.
10. Assign roles to the spawned and discovered virtual machines.
VERSION:
  feature_groups:
    - mirantis
    - advanced
Having added "advanced" to the yaml file, issue the following commands:
where <NODE_ID> is the ID (a number) of a specific node, which you can get by issuing the fuel nodes
command, and <ENV_ID> is the environment ID, which you can get by issuing the fuel environment
command.
For example:
6. Upload the virtual machine configuration to Fuel. On the Fuel Master node, issue the following command:
For example:
where <NODE_ID> is "virt" node ID, <VM_ID> is VM ID that should be unique on that "virt" node,
<MEMORY_SIZE> is the memory amount in gigabytes, and <CPU_CORE_COUNT> is the number of CPUs.
7. Provision the bare metal node with the virtual role and spawn virtual machines. At this point you can go
back to the Fuel UI. On the Dashboard there you will see the Provision VMs button that you need to click.
Alternatively, you can do this through Fuel CLI on the Fuel Master node by issuing the following command:
For example:
8. Assign controller roles to the spawned virtual machines. Alternatively, you can do this through Fuel CLI by
issuing the following command:
You can specify several nodes with the --node-id parameter. For example:
9. Deploy the environment using Fuel UI or Fuel CLI. If you deploy the OpenStack environment using Fuel CLI,
type:
You can specify several nodes with the --node-id parameter. For example:
10. Use the fuel-migrate script to migrate the Fuel Master node into a virtual machine on a compute node.
This allows for reduced resource use in small environments and lets the Fuel Master node run on physical
or virtual machines by essentially making it host agnostic.
To run the script issue the following command:
fuel-migrate
Note
Running the command without arguments displays all the available parameters for performing the
migration with the fuel-migrate script.
1. Identify the node with the compute role by issuing the following command on the Fuel Master node
(and checking its output):
fuel node
fuel-migrate <DESTINATION_COMPUTE>
where <DESTINATION_COMPUTE> is the name or IP address of the destination compute node where
the virtual machine will be created.
For example:
fuel-migrate node-1
Or:
fuel-migrate 192.168.116.1
Note
You can get the node name or the IP address by issuing the fuel node command.
1. Create a blank disk image on the destination node, define the virtual machine, start the virtual
machine, and boot with Fuel PXE server.
2. Partition the disk on the destination node.
3. Reboot the Fuel Master node into maintenance mode and synchronize the data.
4. Swap the IP address on the source and destination Fuel Master nodes. It will then reboot the
destination virtual machine.
An indication that the script has run successfully is the following message (with additional details on how to
proceed), shown after you log in to the Fuel Master node via SSH:
Additional notes:
• You can define the destination disk size in gigabytes with the --fvm_disk_size parameter.
For example:
• By default, the destination node will use the admin network interface. If you need to create additional
interfaces, you can do so with the --other_net_bridges parameter.
For example:
Note
Note that --other_net_bridges takes three parameters; if you skip one of them, as in this example,
you still need to keep the separating commas (,,).
• By default, the migration log file is /var/log/fuel-migrate.log. If the migration fails, check the log for
errors.
Custom usage example:
Note
With HTTPS enabled, you cannot access the Horizon dashboard through plain HTTP; you are
automatically redirected to HTTPS port 8443.
The TLS for OpenStack public endpoints checkbox enables TLS termination on HAProxy for OpenStack services.
Note
With TLS for OpenStack public endpoints enabled, you are not able to access the public endpoints through
plain HTTP.
After enabling one or both of the secure access options, you will need to generate or upload a certificate and
update your DNS entries:
• I have my own keypair with certificate -- You will need to upload a file with the certificate information
and a private key that can be consumed by HAProxy. For detailed information read HOWTO SSL
NATIVE IN HAPROXY.
2. Update your DNS entries -- Set the DNS hostname for public TLS endpoints. This hostname will be used in
the two following cases:
Additional information
• Changing keypairs for a cluster -- There is currently no automated way to do this. You can manually change
the keypairs in /var/lib/astute/haproxy/public_haproxy.pem on the Controller and Compute
nodes. Make sure you restart the HAProxy service after you edit the file (see the restart sketch below).
• Changing keypairs for the Fuel Master node -- You need to write the key to
/var/lib/fuel/keys/master/nginx/nginx.key and the certificate to
/var/lib/fuel/keys/master/nginx/nginx.crt. Make sure you restart nginx after that.
• Making access to the Fuel Master node HTTPS only -- Edit the /etc/fuel/astute.yaml file so that it
contains the following:
SSL:
force_https: true
Note
Currently, the Fuel CLI does not support HTTPS. You will also need to update fuel-nailgun-agent
on all nodes deployed with releases older than 7.0; otherwise, they will be reported as inactive.
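For the keypair changes described in the list above, a minimal restart sketch, assuming the resource and container names used elsewhere in this guide:
# On each controller node, restart HAProxy under Pacemaker control
crm resource restart p_haproxy
# On the Fuel Master node, restart the nginx container
dockerctl restart nginx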
• Ability to create additional networks (e.g. an extra network for Swift) and delete networks.
• Ability to have a specific set of network roles.
• Ability to create a network only if a relevant node role is present on the node.
• Ability to provide custom networking topologies (e.g. subinterface bonding).
Note
If you delete the template, Fuel's default network solution will automatically become live and all network
related sections in Fuel Web UI will become available again.
network_template_<ENV_ID>.yaml
where <ENV_ID> is the ID of your OpenStack environment that you can get by issuing the fuel environment
command.
For example, if the ID of your environment is 1, the name of the template must be
network_template_1.yaml to operate with the template via Fuel CLI.
Note
Fuel does not provide a default or automatically generated template out of the box.
• adv_net_template -- This is the network configuration template for the environment. The template
operates with node groups. Sample:
adv_net_template:
  default: # name of the node group
    nic_mapping:
      ...
    templates_for_node_role:
      ...
    network_assignments:
      ...
    network_scheme:
      ...
  group_11: # name of the node group
    nic_mapping:
    templates_for_node_role:
    network_assignments:
    network_scheme:
The following four sections are defined for each node group in the environment. Definitions from the
default node group will be used for the node groups not listed in the template.
• nic_mapping -- The mapping of aliases to NIC names is set here.
• templates_for_node_role -- The list of template names for every node role used in the environment.
• network_assignments -- Endpoints used in the template body. This is where the mapping is set
between endpoints and network names to set the L3 configuration for the endpoints.
• network_scheme -- Template bodies for every template listed under templates_for_node_role.
nic_mapping:
  default:
    if1: eth0
    if2: eth1
    if3: eth2
    if4: eth3
  node-33:
    if1: eth1
    if2: eth3
    if3: eth2
    if4: eth0
NIC aliases (e.g. "if1") are used in templates in the topology description in the transformations section. With
nic_mapping you can set mapping of aliases to NIC names for different nodes.
The default mapping is set for all nodes that do not have name aliases. Custom mapping can be set for any
particular node (by node name).
The number of NICs for any node may vary. It depends on the topologies defined for the nodes in templates in
the transformations section.
Use of aliases in templates is optional. You can use NIC names if all nodes have the same set of NICs and they
are connected in the same way.
templates_for_node_role:
  controller:
    - public
    - private
    - storage
    - common
  compute:
    - common
    - private
    - storage
  ceph-osd:
    - common
    - storage
This is where you provide the list of template names for every node role used in the environment.
The order of templates matters. The description of the topology that is in the transformations section of the
template is executed by Puppet in the order provided on its input. Also, the order of creating the networking
objects cannot be arbitrary. For example, a bridge should be created first, and the subinterface that will carry its
traffic should be created after that.
While templates can be reused for different node roles, each template is executed once for every node.
When several roles are mixed on one node, an alphabetical order of node roles is used to determine the final
order of the templates.
Sample:
network_assignments:
  storage:
    ep: br-storage
  private:
    ep: br-prv
  public:
    ep: br-ex
  management:
    ep: br-mgmt
  fuelweb_admin:
    ep: br-fw-admin
Endpoints are used in the template body. The mapping between endpoints and network names is set here so that
the networks' L3 configuration can be applied to the endpoints.
The sample above shows the default mapping, which is used when no template is set. The set of networks can be
changed using the API: networks can be created or deleted via the API.
network_scheme:
  storage: # template name
    transformations:
      ...
    endpoints:
      ...
    roles:
      ...
  private:
    transformations:
      ...
    endpoints:
      ...
    roles:
      ...
  ...
Each template has a name which is referenced in the sections above and consists of the three following sections:
• transformations -- A sequence of actions to build proper network topology is defined here. The
"transformation" from physical interface to endpoint is described here. The transformations are applied by
the Puppet l23network module and must be compatible with it.
• endpoints -- All endpoints introduced by the template.
• roles -- The mapping of network roles to endpoints. When several templates are used for one node there
should be no contradictions in this mapping.
Note
The order in which you add or remove networks and load the template does not matter. However,
adding or removing networks will not make sense if a template is not uploaded for the environment at all,
because the default network solution takes into account only the networks created by default.
To upload a networking template, on the Fuel Master node issue the following command:
where <ENV_ID> is the ID of your OpenStack environment, which you can get by issuing the
fuel environment command, and <PATH> is the path to your template.
For example:
To download a networking template to the current directory, on the Fuel Master node issue the following
command:
For example:
To delete an existing networking template, on the Fuel Master node issue the following command:
For example:
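As an illustration only, assuming the network-template subcommand syntax of the Fuel CLI in this release (verify against your Fuel CLI reference), the upload, download, and delete operations for environment 1 would look like this:
# Upload the template network_template_1.yaml from /root/
fuel --env 1 network-template --upload --dir /root/
# Download the current template to the current directory
fuel --env 1 network-template --download
# Delete the template
fuel --env 1 network-template --delete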
where <GROUP_ID> is the ID of your Node group that you can get by issuing the fuel nodegroup command;
<GROUP_NAME> is the name that you would like to assign to your group; <RELEASE_ID> is the ID of your release;
<VLAN_ID> is the VLAN ID; <NETWORK_CIDR> is an IP address with an associated routing prefix.
For example:
For example:
For example:
This parameter is required by the Fuel library. The Fuel library requires a value called
internal_address for each node. This value is set to the node's IP address from a network group which
has render_addr_mask set to internal in its metadata. Therefore, update render_addr_mask for this
network.
3. Save network template for two networks as network_template_<env id>.yaml.
Note
Verify that nic_mapping matches your configuration.
# ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
Configure Neutron
After you deploy your environment, allocate the correct floating IP pool to the network.
To allocate the correct floating IP pool:
Index
F
Fuel UI: Network Issues
H
Horizon
HowTo: Backport Galera Pacemaker OCF script
HowTo: Backport Memcached backend fixes
HowTo: Backport RabbitMQ Pacemaker OCF script
HowTo: Backup and Restore Fuel Master
HowTo: Create an XFS disk partition
HowTo: Functional tests for HA
HowTo: Galera Cluster Autorebuild
HowTo: Manage OpenStack services
HowTo: Troubleshoot AMQP issues
HowTo: Troubleshoot Corosync/Pacemaker