We have been using OpenStack at Grid Dynamics for more than a year. It is the basis of our private infrastructure, originally named Cloud For Grid Dynamics (C4GD) and now known as Altai. C4GD provided cheap and fast VM management for our developers’ needs, with reliable support. We were using the Diablo release and were happy with it.
On 5 April 2012, the shiny new Essex release came out, and Grid Dynamics had a hand in it (you can even find me in the list of contributors). I was tasked with investigating Essex and preparing migration scripts for our cloud.
I started from my old scripts for installing Diablo and began writing a set of tools that make both migration and from-scratch installation easy for different releases. You can see the result of my work on GitHub. These scripts work with OpenStack as packaged into RPMs at Grid Dynamics.
Generally, OpenStack migrates rather well. It has a set of scripts for the appropriate database upgrades, and its configuration file format is unchanged (except for rare cases; see below). But there are several inconsistencies and dramatic changes between Diablo and Essex, and they made migration a laborious task.
The greatest change in Essex (from the migration point of view, of course) is the use of UUIDs where Diablo used sequential numeric IDs. And it’s an inconsistent change, really! Just imagine: we had images in Glance with numeric IDs and instances in Nova with both UUIDs and numeric IDs (though it was the numeric IDs that API calls used to reference instances). Then Essex comes along and says: ‘Well, we prefer UUIDs, so Glance has to reformat its table and rename all the images. But Nova should just use UUIDs more often. Frankly speaking, it will use UUIDs in the API and good old IDs as foreign keys in the database, do you understand me? And Keystone must forget about its previous database, tables, and fields, because there is nothing better than UUIDs and omnipotent JSON stored in a relational DBMS!’
What are the consequences of these database changes?
- Nova and Glance databases can be updated by a single call, like `nova-manage db sync`, since the changes are small. The Keystone database is completely rewritten, so you have to create a brand new database (set the new URI in the config file and run `keystone-manage db_sync`) and then perform a special call: `keystone-manage import_legacy [OLD DATABASE URI]`.
- Glance images are referenced by IDs. That’s not dangerous when you are migrating the Glance database: it can deal even with filesystem images that reference their kernels and RAM disks. But Nova instances reference the images they were created from. This information is extremely important if you want to snapshot an instance: Nova needs some image metadata, so it performs a call to Glance during snapshotting. Migration breaks the reference, so old instances cannot be snapshotted anymore. A quick and dirty fix was to leave Glance image numeric IDs as-is and to patch Nova and Glance to persuade them that decimal numbers are appropriate image IDs (Essex Nova and Glance perform ID validation and refuse to work with non-UUIDs). The correct approach is to save an `old ID -> new UUID` map during Glance migration and then fix the Nova image references.
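The map-then-fix approach can be sketched as a small shell helper. This is a hypothetical illustration, not the code from the migration scripts mentioned above: it assumes the `old ID -> new UUID` map was saved as a text file with one `<old id> <new uuid>` pair per line, and that Nova keeps the image reference in the `instances.image_ref` column (check this against your own schema). The helper only prints SQL, so it can be reviewed before anything touches the Nova database.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a saved Glance "old ID -> new UUID" map into SQL
# that rewrites Nova image references. Assumptions (not from the original
# scripts): the map file holds "<old numeric id> <new uuid>" per line, and
# Nova stores the reference in the instances.image_ref column.

gen_image_ref_sql() {
    local map_file=$1 old_id new_uuid
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while read -r old_id new_uuid || [ -n "$old_id" ]; do
        case $old_id in
            ''|\#*) continue ;;            # skip blank lines and comments
            *[!0-9]*)                      # old Glance IDs are plain integers
                echo "bad old id: $old_id" >&2; return 1 ;;
        esac
        case $new_uuid in
            ''|*[!0-9a-fA-F-]*)            # only hex digits and dashes allowed
                echo "bad uuid for $old_id: $new_uuid" >&2; return 1 ;;
        esac
        printf "UPDATE instances SET image_ref = '%s' WHERE image_ref = '%s';\n" \
            "$new_uuid" "$old_id"
    done < "$map_file"
}
```

For example, `gen_image_ref_sql image_map.txt | mysql nova` would apply the updates, where `image_map.txt` and the `nova` database name are placeholders. Rejecting anything that is not a plain integer or a hex-and-dash UUID keeps stray text out of the generated SQL.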
Unfortunately, UUIDs are just one problem out of a big sack.
- The names of the DB migration commands differ: `nova-manage db sync` vs `glance-manage db_sync`. It’s annoying, but we can live with it.
- Be very careful and never execute database migration tools as root: try calling `glance-manage db_sync` as root and you will end up with a `/var/log/glance/glance-registry.log` that is owned by root and is not writable by the `glance` user.
- Keystone persistence is more flexible than ever: you can choose between an RDBMS, memcached, key-value storage (KVS) in dynamic memory, and even text files for service endpoints. Keystone’s default config prescribes storing tokens and EC2 credentials in KVS. Imagine how much fun it is to lose a hundred users’ credentials after a restart of the Keystone daemon! The solution is simple: just run `sed` on the Keystone config file to replace `kvs` with `sql` and enjoy the persistence. But look at the next problem!
- Originally, Essex Keystone did not provide EC2 credential migration. As we know, the credentials are lost after a Keystone restart anyway, so it is quite logical 😉 So Dmitry Khovyakov implemented EC2 credential migration at my request, and I had to double-check that Keystone uses MySQL and not dynamic memory.
- Keystone’s default configuration file refers to mystical files like `./etc/default_catalog.templates`. Its FHS compliance is doubtful, so these paths must be updated to appropriate locations (like `/etc/keystone/default_catalog.templates`) during Essex RPMization.
- Keystone doesn’t store service endpoints in the database. The preferred way is to put them in a text file. Writing one is not a big deal, but, as you can guess, the endpoints are not migrated from the old database by default.
- Keystone’s initial configuration process has changed. In Diablo, we called `keystone-manage` to create groups and initial users, and this tool manipulated the database directly. In Essex, we store a magic admin token in `keystone.conf` and then use the keystoneclient utility, which communicates with the Keystone server through API calls.
- Keystone doesn’t support global roles anymore. In Diablo, we had two kinds of roles: tenant-specific and global. For example, user `aababilov` can be a member of `c4gd` and also an `Admin` of the whole system. In Essex, the situation is ridiculous: if I want to be an `Admin` of the cloud (e.g., to be able to create new tenants), I have to grant myself the `Admin` role on _any_ tenant, and Keystone will treat me as a global `Admin`: I am able to manipulate any tenant as `Admin` (e.g., drop users or associate them with tenants). So we decided to introduce a special `systenant` that is used specifically for holding former global roles. Again, Keystone cannot migrate global roles out of the box. On the other hand, Nova will not consider me an `Admin` until I add myself as an `Admin` to the tenant I’m working with. What is the reason? Suppose I want to work with the `c4gd` tenant. I ask Keystone for a token for it (let it be token `XXX-YYY-ZZZ`). Then I perform an API call to Nova and provide `XXX-YYY-ZZZ` as my token. Nova calls Keystone to validate `XXX-YYY-ZZZ`, and Keystone returns only those of my roles that belong to `c4gd` (that’s by Keystone’s design). Since I’m an `Admin` only in `systenant`, Nova will treat me as an ordinary user of `c4gd`.
- OpenStack roles are even more inconsistent. By default, Keystone and Nova roles are case-insensitive, so it doesn’t matter whether your role is `admin`, `Admin`, or even `AdMiN`. Glance, on the contrary, cares about character case, so if you write
admin_role = admin
in its config, your usual Keystone role `Admin` will not help: Glance will treat you as an unprivileged user. Period. I decided to fix it rather than just write
admin_role = Admin
- The Nova configuration file format has changed from a command-line-arguments-like one:
--verbose=false
--ec2_url=http://127.0.0.1:8773/services/Cloud
--s3_host=127.0.0.1
--cc_host=127.0.0.1
to an INI-like one:
[DEFAULT]
logdir = /var/log/nova
state_path = /var/lib/nova
lock_path = /var/lock/nova
dhcpbridge = /usr/bin/nova-dhcpbridge
There was no converter out of the box. Sad but true.
- Last but not least, the OpenStack client tools are uncomfortable to use: they force the user to provide UUIDs of objects (e.g., users, roles, or tenants) rather than names. So you have to write something like
keystone --token 999888777666 --endpoint 'http://127.0.0.1:5001/v2.0/' user-role-add --user 4dc495fd-0407-4e05-9650-2c513198f745 --role 273cf740-3108-49f9-b759-bd529299261b --tenant_id 4d4e8c69-cf5d-4398-80dd-a8d2397c2a96
to grant the `Admin` role on `systenant`. That’s why I had to create some helpers in my scripts so that I could write a more readable line like
keystone_client user-role-add --user "$ADMIN_USER_ID" --role "$(get_id role Admin)" --tenant_id "$(get_id tenant systenant)"
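The `get_id` helper used in that line is not shown in the post, so here is one possible way to write it. It is a sketch, not the helper from my scripts: it parses the ASCII tables printed by the keystone CLI’s `*-list` commands, finds the `id` and `name` columns by their header, and assumes `keystone_client` is a wrapper that already supplies `--token` and `--endpoint`.

```bash
#!/usr/bin/env bash
# Sketch of a name -> id lookup over the ASCII tables printed by the keystone
# CLI. keystone_client is assumed to be a wrapper that adds --token and
# --endpoint; it is not defined here.

# id_by_name NAME: read a table on stdin, print the id of the first row whose
# "name" column equals NAME.
id_by_name() {
    awk -F'|' -v want="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF < 3 { next }                  # skip +-----+ border lines
        !idcol {                         # first row with "|" is the header
            for (i = 2; i < NF; i++) {
                h = trim($i)
                if (h == "id") idcol = i
                if (h == "name") namecol = i
            }
            next
        }
        namecol && trim($namecol) == want { print trim($idcol); exit }
    '
}

# get_id KIND NAME, e.g. "get_id role Admin" or "get_id tenant systenant".
get_id() {
    keystone_client "$1-list" | id_by_name "$2"
}
```

Looking the columns up by header rather than by position matters because the tables differ: a user list, for instance, has more columns than a role list.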
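Back to the warning about running migration tools as root: one way to follow it is to switch to the service user before calling the tool, and then to check that no files under the service’s log directory belong to someone else. The sketch below is hypothetical: `run_as` and `foreign_files` are illustrative names, and it assumes a Linux `su` that accepts `-s` and `-c`.

```bash
#!/usr/bin/env bash
# Sketch: run a DB migration tool as the service user instead of root, then
# check log ownership. User names and paths are assumptions for illustration.

# run_as USER CMD [ARGS...]: run CMD as USER. When switching users, the
# arguments are joined into one string for "sh -c", so keep them free of
# spaces and shell metacharacters.
run_as() {
    local user=$1; shift
    if [ "$(id -un)" = "$user" ]; then
        "$@"                                   # already the right user
    elif [ "$(id -u)" -eq 0 ]; then
        su -s /bin/sh -c "$*" "$user"          # root: drop to USER
    else
        echo "run_as: need to be root or $user" >&2
        return 1
    fi
}

# foreign_files DIR USER: print regular files under DIR not owned by USER.
foreign_files() {
    find "$1" -type f ! -user "$2" -print
}

# Example (not executed here):
#   run_as glance glance-manage db_sync
#   foreign_files /var/log/glance glance    # should print nothing
```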
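The two `keystone.conf` fixes from the list above (switching tokens and EC2 credentials from KVS to SQL, and moving the catalog template to an FHS-friendly path) fit into one `sed` call. This sketch assumes Essex-style driver lines such as `driver = keystone.token.backends.kvs.Token` and a `template_file = ./etc/default_catalog.templates` line. Both formats are assumptions, so adjust the patterns if your sample config differs.

```bash
#!/usr/bin/env bash
# Sketch of the keystone.conf fixups; the driver and template_file line
# formats are assumptions based on the Essex sample config.

fix_keystone_conf() {
    local conf=$1
    # -i.bak edits in place and keeps conf.bak (works with GNU and BSD sed).
    # 1) any "...backends.kvs..." driver becomes "...backends.sql...";
    # 2) a relative ./etc/ template path moves to /etc/keystone/.
    sed -i.bak \
        -e 's/\.backends\.kvs\./.backends.sql./g' \
        -e 's|^\(template_file[[:space:]]*=[[:space:]]*\)\./etc/|\1/etc/keystone/|' \
        "$conf"
}

# still_volatile FILE: print driver lines that still use in-memory KVS,
# which is the double-check needed before migrating EC2 credentials.
still_volatile() {
    grep -n 'backends\.kvs' "$1"
}
```

Matching `.backends.kvs.` rather than a bare `kvs` keeps the substitution from touching unrelated text that happens to contain those three letters.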
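For the Glance case-sensitivity trap described above, a cheap pre-flight check is to compare Glance’s `admin_role` with the Keystone role name exactly, the way Glance does, and to flag values that differ only in case. This is a hypothetical helper: it assumes an `admin_role = value` line in a Glance config file such as `glance-api.conf`, and it ignores INI sections.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for Glance's case-sensitive admin_role.
# Assumes an "admin_role = value" line; INI sections are ignored.

# conf_value FILE KEY: print the first "KEY = value" value, trimmed.
conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p" "$1" |
        head -n 1
}

lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# check_admin_role FILE KEYSTONE_ROLE: exact comparison, as Glance does it.
check_admin_role() {
    local configured
    configured=$(conf_value "$1" admin_role)
    if [ "$configured" = "$2" ]; then
        echo "ok"
    elif [ "$(lower "$configured")" = "$(lower "$2")" ]; then
        echo "case mismatch: admin_role = $configured, Keystone role is $2"
        return 1
    else
        echo "different role: admin_role = $configured, Keystone role is $2"
        return 1
    fi
}
```

For example, `check_admin_role /etc/glance/glance-api.conf Admin` reports a case mismatch for the `admin_role = admin` setting quoted above. The path is a placeholder.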
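Since Essex shipped no converter for the Nova configuration file, here is a minimal sketch of one. It assumes a Diablo flag file with one `--key=value` flag per line. It treats a bare `--flag` as `true` and `--noflag` as `false`, which is an assumed gflags-style convention worth reviewing in the output. It keeps comments and puts everything into `[DEFAULT]`, and it does not rename flags whose names changed between releases.

```bash
#!/usr/bin/env bash
# Hypothetical Diablo -> Essex nova.conf converter (a sketch, not an
# official tool). Input: one flag per line. Output: INI text on stdout.

flags_to_ini() {
    local line key
    echo "[DEFAULT]"
    # Plain "read -r" trims leading/trailing whitespace from each line.
    while read -r line || [ -n "$line" ]; do
        case $line in
            '')    echo ;;                          # keep blank lines
            \#*)   echo "$line" ;;                  # keep comments
            --*=*) key=${line%%=*}                  # "--key=value"
                   echo "${key#--} = ${line#*=}" ;;
            --no*) echo "${line#--no} = false" ;;   # assumed gflags negation
            --*)   echo "${line#--} = true" ;;      # bare boolean flag
            *)     echo "# unrecognized: $line" ;;
        esac
    done < "$1"
}
```

A run such as `flags_to_ini nova.conf.diablo > nova.conf` (both file names are placeholders) turns `--verbose=false` into `verbose = false` and `--ec2_url=http://127.0.0.1:8773/services/Cloud` into `ec2_url = http://127.0.0.1:8773/services/Cloud`. Only the first `=` splits key from value, so URLs containing `=` survive intact.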
So, OpenStack migration is a painful process that is not fully supported by OpenStack out of the box. However, OpenStack is designed so that you can write custom scripts that make migration possible without unreasonable difficulty. And Grid Dynamics is now running Essex in its private cloud!