OpenStack Migration from Diablo to Essex

We have been using OpenStack at Grid Dynamics for more than a year. It is the basis of our private infrastructure, originally named Cloud For Grid Dynamics (C4GD) and now known as Altai. C4GD provided cheap and fast VM management for our developers’ needs with reliable support. We were using the Diablo release and were happy with it.

On 5 April 2012 the shiny new Essex release came out, not without Grid Dynamics’ involvement (you can even find me in the list of contributors). I was challenged to investigate the upgrade and prepare migration scripts for our cloud.

I started from my old scripts for installing Diablo and began writing a set of tools that simplify both migration and installation from scratch for different releases. You can see the result of my work on GitHub. These scripts work with OpenStack packaged as RPMs at Grid Dynamics.

Generally, OpenStack migrates rather well. It ships scripts for upgrading its databases, and its configuration file format is unchanged (except for rare cases; see below). But there are several inconsistencies and dramatic changes between Diablo and Essex, and they made migration a laborious task.

The greatest change in Essex (from the migration point of view, of course) is the use of UUIDs where sequential numeric IDs were used in Diablo. And it’s an inconsistent change, really! Just imagine: we had images in Glance with numeric IDs and instances in Nova with both UUIDs and numeric IDs (however, it was the numeric IDs that were used in API calls to reference instances). Then Essex comes and says: ‘Well, we prefer UUIDs, so Glance has to reformat its table and rename all the images. But Nova should just use UUIDs more frequently. Frankly speaking, it will use UUIDs in the API and good old IDs as foreign keys in the database, do you understand me? And Keystone must forget about its previous database, tables, or fields because there is nothing better than UUIDs and omnipotent JSON stored in a relational DBMS!’

What are the consequences of these database changes?

  1. The Nova and Glance databases can each be updated by a single call, like nova-manage db sync, since the changes are small. The Keystone database is completely rewritten, so you have to create a brand new database (set the new URI in the config file and run keystone-manage db_sync) and then perform a special call: keystone-manage import_legacy [OLD DATABASE URI].
  2. Glance images are referenced by IDs. That’s not dangerous when you are migrating the Glance database: it can deal even with filesystem images that reference their kernels and RAM disks. But Nova instances reference the images they were created from. This information is extremely important if you want to snapshot an instance: Nova must know some image metadata, and it performs a call to Glance during snapshotting. Migration breaks the reference, so old instances cannot be snapshotted anymore. A quick and dirty fix was to leave the numeric Glance image IDs as-is and to patch Nova and Glance to persuade them that decimal numbers are appropriate image IDs (Essex Nova and Glance perform ID validation and refuse to work with non-UUIDs). The correct approach is to save a map old ID -> new UUID during Glance migration and then fix the Nova image references.
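
The "correct approach" from item 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from my scripts: it assumes the old ID -> new UUID map has already been captured during the Glance migration, that instance records are available as plain dictionaries, and that the image reference lives in a field called image_ref (the function name and the record layout are hypothetical).

```python
def remap_image_refs(instances, id_map):
    """Return copies of instance records with numeric image refs replaced.

    instances: iterable of dicts with an "image_ref" key (assumed layout).
    id_map: dict mapping old numeric Glance image IDs (int) to new UUID strings.
    References that are missing from the map, or that are not plain decimal
    numbers (for example, values that are already UUIDs), are left untouched.
    """
    remapped = []
    for inst in instances:
        fixed = dict(inst)  # do not mutate the caller's record
        old_ref = str(inst["image_ref"])
        if old_ref.isdigit() and int(old_ref) in id_map:
            fixed["image_ref"] = id_map[int(old_ref)]
        remapped.append(fixed)
    return remapped


if __name__ == "__main__":
    # Hypothetical map saved during the Glance migration.
    id_map = {3: "4d4e8c69-cf5d-4398-80dd-a8d2397c2a96"}
    old = [{"id": 42, "image_ref": "3"}, {"id": 43, "image_ref": "9"}]
    for record in remap_image_refs(old, id_map):
        print(record)
```

In a real migration the same mapping would be applied with UPDATE statements against the Nova database; unmapped references are left alone on purpose so they can be reported and checked by hand.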

Unfortunately, UUIDs are just one problem out of a whole sackful.

  1. Names of DB migration commands differ: nova-manage db sync vs keystone-manage db_sync and glance-manage db_sync. It’s annoying, but we can live with it.
  2. You should be very careful and never execute database migration tools as root: try to call glance-manage db_sync and you will end up with a /var/log/glance/glance-registry.log that is owned by root and is not writable by the glance user, and hence by the glance-registry daemon.
  3. Keystone persistence is more flexible than ever: you can choose between an RDBMS, memcached, key-value storage (KVS) in dynamic memory, and even text files for service endpoints. Keystone’s default config prescribes storing tokens and EC2 credentials in KVS. Imagine how fun it is to lose a hundred user credentials after a restart of the Keystone daemon! The solution is simple: just run sed on the Keystone config file to replace kvs with sql and enjoy the persistence. But look at the next problem!
  4. Originally, Essex Keystone did not provide EC2 credential migration. As we know, they are lost after a Keystone restart, so it is quite logical 😉 So, Dmitry Khovyakov implemented EC2 migration at my request, and I had to double-check that Keystone uses MySQL and not dynamic memory.
  5. The default Keystone configuration file refers to mystical files like ./etc/default_catalog.templates. Its FHS compliance is doubtful, so such paths must be updated to appropriate locations (like /etc/keystone/default_catalog.templates) during Essex RPMization.
  6. Keystone doesn’t store service endpoints in the database. The preferred way is to put them in a text file. It’s not a big deal to write one, but it is not generated from the old database by default, as you can guess.
  7. The initial Keystone configuration process has changed. In Diablo, we called keystone-manage to create groups and initial users, and this tool manipulated the database directly. In Essex, we store a magic admin token in keystone.conf and then use the keystone client utility, which communicates with the Keystone server through API calls.
  8. Keystone doesn’t support global roles anymore. In Diablo, we had two kinds of roles: tenant-specific and global. For example, user aababilov can be a Member in tenant c4gd and also an Admin of the whole system. In Essex, the situation is ridiculous: if I want to be an Admin of the cloud (e.g., to be able to create new tenants), I have to be granted the Admin role on _any_ tenant, and Keystone will treat me as a global Admin: I am able to manipulate any tenant as Admin (e.g., drop users or associate them with tenants). So, we decided to introduce a special tenant, systenant, that is used specifically for holding former global roles. Again, Keystone cannot migrate global roles out of the box.

    On the other hand, Nova will not consider me an Admin until I add myself to the tenant I’m working with. What is the reason? Suppose I want to work with the c4gd tenant. I ask Keystone for a token for it (let it be token XXX-YYY-ZZZ). Then I perform an API call to Nova and provide XXX-YYY-ZZZ as my token. Nova calls Keystone to validate XXX-YYY-ZZZ, and Keystone sends only those of my roles that belong to c4gd (that’s by Keystone’s design). Since I’m an Admin only in systenant, Nova will treat me as an ordinary user of c4gd.
  9. OpenStack roles are even more inconsistent. By default, Keystone and Nova roles are case-insensitive, so it doesn’t matter whether your role is admin, Admin, or even AdMiN. Glance, on the contrary, cares about character case, so if you write
    admin_role = admin

    in its config, your usual Keystone role Admin will not help: Glance will treat you as an unprivileged user. Period. I decided to fix it rather than just write

    admin_role = Admin
  10. The Nova configuration file format has changed from a command-line-arguments style:
    --verbose=false
    --ec2_url=http://127.0.0.1:8773/services/Cloud
    --s3_host=127.0.0.1
    --cc_host=127.0.0.1

    to INI:

    [DEFAULT]
    logdir = /var/log/nova
    state_path = /var/lib/nova
    lock_path = /var/lock/nova
    dhcpbridge = /usr/bin/nova-dhcpbridge

    There was no converter out of the box. Sad but true.

  11. Last but not least, the OpenStack client tools are uncomfortable: they force the user to provide UUIDs of objects (e.g., users, roles, or tenants) rather than names. So you have to write something like
    keystone --token 999888777666 --endpoint 'http://127.0.0.1:5001/v2.0/' user-role-add --user 4dc495fd-0407-4e05-9650-2c513198f745 --role 273cf740-3108-49f9-b759-bd529299261b --tenant_id 4d4e8c69-cf5d-4398-80dd-a8d2397c2a96

    to grant role Admin to user admin on tenant systenant. That’s why I had to create some helpers in my scripts to write a more readable line like

    keystone_client user-role-add --user "$ADMIN_USER_ID" --role `get_id role Admin` --tenant_id `get_id tenant systenant`
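
Since item 10 notes that no converter existed for the Nova configuration file, here is a minimal sketch of what one could look like. It is not the converter from my scripts, and it rests on a few assumptions: every option goes into the [DEFAULT] section, lines starting with # are comments, and a bare flag without a value (such as --verbose) is treated as true. The function name flagfile_to_ini is hypothetical.

```python
def flagfile_to_ini(text):
    """Convert Diablo-style '--key=value' flag lines into INI text.

    All options are placed under [DEFAULT] (an assumption of this sketch).
    Blank lines and '#' comments are dropped; a bare '--flag' becomes
    'flag = true', which is also an assumption.
    """
    out = ["[DEFAULT]"]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not line.startswith("--"):
            raise ValueError("not a flag line: %r" % raw)
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line[2:].partition("=")
        if not sep:
            value = "true"
        out.append("%s = %s" % (key.strip(), value.strip()))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    old_config = (
        "--verbose=false\n"
        "--ec2_url=http://127.0.0.1:8773/services/Cloud\n"
        "--s3_host=127.0.0.1\n"
        "--cc_host=127.0.0.1\n"
    )
    print(flagfile_to_ini(old_config))
```

The sketch only changes the syntax. Options that were renamed or dropped between releases would still have to be reviewed by hand.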

So, OpenStack migration is a painful process that is not fully supported by OpenStack out of the box. However, OpenStack is built in such a way that you can write custom scripts that help you migrate without unreasonable difficulty. And Grid Dynamics is using Essex at the moment in its private cloud!
