Creating seed data in a flask-migrate or alembic migration

Question

How can I insert some seed data in my first migration? If the migration is not the best place for this, then what is the best practice?

Accepted Answer

Alembic has, as one of its operations, bulk_insert(). The documentation gives the following example (with some fixes I’ve included):

    from datetime import date
    from sqlalchemy.sql import table, column
    from sqlalchemy import String, Integer, Date
    from alembic import op

    def upgrade():
        # Create an ad-hoc table to use for the insert statement.
        accounts_table = table('account',
            column('id', Integer),
            column('name', String),
            column('create_date', Date)
        )

        op.bulk_insert(accounts_table,
            [
                {'id': 1, 'name': 'John Smith',
                        'create_date': date(2010, 10, 5)},
                {'id': 2, 'name': 'Ed Williams',
                        'create_date': date(2007, 5, 27)},
                {'id': 3, 'name': 'Wendy Jones',
                        'create_date': date(2008, 8, 15)},
            ]
        )

Note too that Alembic has an execute() operation, which works just like the normal execute() function in SQLAlchemy: you can run any SQL you wish, as the documentation example shows:

    from sqlalchemy.sql import table, column
    from sqlalchemy import String
    from alembic import op

    def upgrade():
        account = table('account',
            column('name', String)
        )
        op.execute(
            account.update()
            .where(account.c.name == op.inline_literal('account 1'))
            .values({'name': op.inline_literal('account 2')})
        )

Notice that the table used to build the update statement is defined directly in the migration script. This might seem like it breaks DRY (isn’t the table already defined in your application?), but it is actually quite necessary. If you were to use the table or model definition that is part of your application, you would break this migration when you make changes to that table/model in your application. Your migration scripts should be set in stone: a change to a future version of your models should not change migration scripts. Using the application models would mean that the definitions change depending on which version of the models you have checked out (most likely the latest).
Therefore, you need the table definition to be self-contained in the migration script.

Another thing to discuss is whether you should put your seed data into a script that runs as its own command (such as a Flask-Script command, as shown in the other answer). This can work, but you should be careful with it. If the data you’re loading is test data, that’s one thing. But I’ve understood “seed data” to mean data that is required for the application to work correctly, for example the “admin” and “user” records in the “roles” table. This data SHOULD be inserted as part of the migrations. Remember that a script will only work with the latest version of your database, whereas a migration works with the specific version you are migrating to or from. If you wanted a script to load the roles info, you could need a script for every version of the database that has a different schema for the “roles” table. Also, by relying on a script, you would make it more difficult to run the script between migrations (say migration 3->4 requires the seed data from the initial migration to be in the database). You would then need to modify Alembic’s default way of running migrations to run these scripts. And that still ignores the problem that these scripts would have to change over time, and that there is no telling which version of your application you have checked out from source control.


Answer