How do database migrations fail?

Sometimes database migrations may fail when they are executed during package installation or upgrade.

This can be caused by problems with configuration, such as missing usernames or passwords in /etc/xroad.properties or /etc/xroad/db.properties

Excerpt from a package upgrade output, where xroad-opmonitor database migrations failed due to missing op-monitor.database.admin_password from /etc/xroad.properties:

...
Setting up xroad-opmonitor (6.25.0-0.20201013130750git2e3558f.ubuntu18.04) ...
Installing new version of config file /etc/xroad/conf.d/op-monitor-logback.xml ...
Installing new version of config file /etc/xroad/services/opmonitor.conf ...
Database and user exists, skipping database creation.
Unexpected error running Liquibase: ERROR: must be owner of relation operational_data [Failed SQL: ALTER TABLE public.operational_data ADD service_type VARCHAR(255)]
Connection to database has failed, please check database availability and configuration in /etc/xroad/db.properties file

In this case installation finished successfully and the installed packages look healthy: 

# dpkg -l | grep opmon
ii xroad-addon-opmonitoring 6.25.0-0.20201013130750git2e3558f.ubuntu18.04 all X-Road AddOn: operations monitoring service
ii xroad-opmonitor 6.25.0-0.20201013130750git2e3558f.ubuntu18.04 all X-Road operations monitoring daemon

However, /var/log/xroad/op-monitor.log shows (after some messages have been sent through security server) signs of database access trouble due to missing migrations: 

2020-10-14T09:56:50.813Z [qtp1348115836-43] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - ERROR: column "request_size" of relation "operational_data" does not exist
Position: 559
2020-10-14T09:56:50.814Z [qtp1348115836-43] ERROR e.r.x.o.OpMonitorDaemonRequestHandler - Error while handling data store request
javax.persistence.PersistenceException: org.hibernate.exception.SQLGrammarException: could not execute batch
at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:154)

In this example problem was with migrations for op-monitor database. Similar issues can occur also for serverconf and messagelog database migrations. 

Finding the root cause and fixing it

To fix migrations, first you need to find the root cause that prevented migrations from being run. In the earlier example, error message "Connection to database has failed, please check database availability and configuration in /etc/xroad/db.properties file" provided some important hints on this. Typically the problem is with either of these configuration files:

  • /etc/xroad/db.properties
    • relevant parameters for regular database users
      • <database name>.hibernate.connection.url
      • <database name>.hibernate.connection.username
      • <database name>.hibernate.connection.password
  • /etc/xroad.properties
    • relevant parameters for database admin users
      • <database name>.database.admin_user
      • <database name>.database.admin_password

Naturally the root cause might be something else as well, such as problems with network connectivity, etc.

Use of regular and admin database user depends on the installed / upgraded security server version and the used configuration (local or remote database). In some configurations only regular db user exists, and this user does both migrations and data access for the application. In other configurations regular user does data access for the application, and admin user does the migrations. 

As a first step, you should check whether the configuration in db.properties and xroad.properties is correct, and verify that the usernames and passwords match to existing users. 

Different error messages point to different problems. A couple of examples:

Example 1: Wrong opmonitor_admin password (Ubuntu)

Setting up xroad-opmonitor (6.24.0-1.ubuntu18.04) ...
Installing new version of config file /etc/xroad/conf.d/op-monitor-logback.xml ...
Installing new version of config file /etc/xroad/services/opmonitor.conf ...
Database and user exists, skipping database creation.
Unexpected error running Liquibase: org.postgresql.util.PSQLException: FATAL: password authentication failed for user "opmonitor_admin"

Error message "password authentication failed for user "opmonitor_admin"" tells us that

  • login to database was attempted as user "opmonitor_admin"
  • login did not succeed. Maybe the password is wrong (or missing)?

To fix, verify that /etc/xroad.properties looks healthy, and maybe test configured admin username and password with e.g 

# psql -d op-monitor -U opmonitor_admin --host=localhost

In case password is wrong and you don't know the right one, you may need to change the password of relevant database user, and update property files accordingly. 

Example of a healthy /etc/xroad.properties: 

op-monitor.database.admin_password = <your_password>
op-monitor.database.admin_user = opmonitor_admin

Example 2: Missing opmonitor_admin user (Ubuntu)

Setting up xroad-opmonitor (6.25.0-0.20201013130750git2e3558f.ubuntu18.04) ... 
Installing new version of config file /etc/xroad/conf.d/op-monitor-logback.xml ... 
Installing new version of config file /etc/xroad/services/opmonitor.conf ... 
Database and user exists, skipping database creation. 
Unexpected error running Liquibase: ERROR: must be owner of relation operational_data [Failed SQL: ALTER TABLE public.operational_data ADD service_type VARCHAR(255)]
Connection to database has failed, please check database availability and configuration in /etc/xroad/db.properties file

Error message "must be owner of relation operational_data" tells us that

  • login to database was successful as some database user (output does not tell us which one)
  • this database user did not own relation operational_data, and hence could not make changes to it

This could occur if opmonitor_admin user exists but /etc/xroad.properties contains neither admin_password nor admin_user properties. In this case migrations are attempted as regular user "opmonitor" (or whichever username is configured in db.properties), which may not have correct permissions.

To fix, add correct op-monitor.database.admin_password and op-monitor.database.admin_user properties to /etc/xroad.properties

Example 3: Wrong serverconf password (RHEL)

Updating : xroad-proxy-6.25.0-0.20201013130750git2e3558f.el7.x86_64 5/29 
client-fastest-connecting-ssl-use-uri-cache not present in local.ini, use default value
ERROR: database "serverconf" already exists
Creating database 'serverconf' on '127.0.0.1:5432' failed.
Updating : xroad-addon-messagelog-6.25.0-0.20201013130750git2e3558f.el7.x86_64 6/29 

In this case the error message from installer is not very helpful.

Here, update from 6.23 to 6.25 was attempted when db.properies had wrong serverconf password. There was no admin user for serverconf, and no admin user configuration in xroad.properties.

As a result package upgrade was successful, but serverconf database does not have the latest migrations, which causes problems when application is run.

Installer has also attempted to create serverconf admin user. It has failed, but xroad.properties has been modified to contain (invalid) admin user name and password.

op-monitor.database.admin_password = _SY-uxT-ozHEU7ry5T8twYNy5xBWXx-g
op-monitor.database.initialized = true
op-monitor.database.admin_user = opmonitor_admin
serverconf.database.admin_user = serverconf_admin
serverconf.database.admin_password = DQvaFxZneWZgzmzZIJdLtfFrAnmyyser

Looking at the database, user serverconf_admin does not exist

serverconf=# \du
List of roles
Role name | Attributes | Member of 
-----------------+------------------------------------------------+-------------------
messagelog | | {}
opmonitor | | {}
opmonitor_admin | | {}
postgres | Superuser, Create role, Create DB, Replication | {opmonitor_admin}
serverconf | | {}

Now, fixing regular serverconf password in db.properties and then running yum reinstall xroad-proxy fails. Installer tries to use serverconf_admin details from xroad.properties. To fix migrations successfully,

  • update correct regular user password in db.properties
  • remove serverconf_admin properties from xroad.properties
  • sudo yum reinstall xroad-proxy

Main steps for fixing failed database migrations

All in all, steps to recover from failed migrations are (adapt to the correct database opmonitor/serverconf/messagelog):

  1.  Find the root cause for failed migrations, and fix it
  2. Repair broken package installation
    1. for op-monitor db migrations 
      1. Ubuntu
        1. try to run "sudo apt install -f"
        2. If that does not help, running "sudo apt install --reinstall xroad-opmonitor" should do the trick
      2. RHEL
        1. running "sudo yum reinstall xroad-opmonitor" should do the trick
    2. for serverconf db migrations: follow op-monitor instuctions and replace xroad-opmonitor with xroad-proxy in the reinstall commands
    3. for messagelog db migrations: follow op-monitor instuctions and replace xroad-opmonitor with xroad-addon-messagelog in the reinstall commands
  3. If completing the steps 2 does not fix the problem for some reason, as a last resort, it is possible to run the database setup/migration manually
    1. op-monitor 

      # sudo /usr/share/xroad/scripts/setup_opmonitor_db.sh
    2. serverconf 

      # sudo /usr/share/xroad/scripts/setup_serverconf_db.sh
    3. messagelog 

      # sudo /usr/share/xroad/scripts/setup_messagelog_db.sh

Your Rating: Results: 1 Star 2 Star 3 Star 4 Star 5 Star 3 rates