
Validating parameter dictionaries before insertion in a DataJoint schema

We’d like to validate parameters before anyone can insert a row into a table, as shown below, but this code doesn’t work.

Is there a way to implement pre-insertion parameter checking?

@schema
class AutomaticCurationParameters(dj.Manual):
    definition = """
    auto_curation_params_name: varchar(200)   # name of this parameter set
    ---
    merge_params: blob   # dictionary of params to merge units
    label_params: blob   # dictionary params to label units
    """
    def insert1(key, **kwargs):
        # validate the labels and then insert
        #TODO: add validation for merge_params
        for metric in key['label_params']:
            if metric not in _metric_name_to_func:
                raise Exception(f'{metric} not in list of available metrics')
            comparison_list = key['label_params'][metric]
            if comparison_list[0] not in _comparison_to_function:
                raise Exception(f'{metric}: {comparison_list[0]} not in list of available comparisons')
            if type(comparison_list[1]) != int and type(comparison_list) != float:
                raise Exception(f'{metric}: {comparison_list[1]} not a number')
            for label in comparison_list[2]:
                if label not in valid_labels:
                  raise Exception(f'{metric}: {comparison_list[2]} not a valid label: {valid_labels}')               
        super().insert1(key, **kwargs)


Answer

This is a great question that has come up many times for us.

Most likely the issue is one of two things: your insert1 is missing the self parameter, or it does not handle the case where the row is passed as a keyword argument (DataJoint names that parameter row, not key).
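To make the first failure mode concrete, here is a minimal sketch that runs without a database. FakeTable is a hypothetical stand-in, not part of DataJoint; it only mimics the insert1(self, row, **kwargs) signature. Without self, Python binds the table instance to key, and the row arrives as an unexpected extra positional argument.

```python
class FakeTable:
    """Hypothetical stand-in for a DataJoint table; it only records rows."""

    def __init__(self):
        self.rows = []

    def insert1(self, row, **kwargs):
        self.rows.append(row)


class Broken(FakeTable):
    # Missing `self`: the instance is bound to `key`, so the row has nowhere to go.
    def insert1(key, **kwargs):
        super().insert1(key, **kwargs)


class Fixed(FakeTable):
    def insert1(self, *args, **kwargs):
        # Accept the row either positionally or as the `row` keyword.
        row = kwargs["row"] if "row" in kwargs else args[0]
        if "name" not in row:
            raise ValueError("row must contain 'name'")
        super().insert1(*args, **kwargs)


def try_insert(table, *args, **kwargs):
    """Return the name of the exception raised by insert1, or None on success."""
    try:
        table.insert1(*args, **kwargs)
    except Exception as err:
        return type(err).__name__
    return None
```

Calling try_insert(Broken(), {"name": "a"}) reports a TypeError, while Fixed accepts the same row both positionally and as row=... .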

Here is a simple example that shows how to inject validation code; you can adapt it to do what you intend above.

Suppose we want to track file paths in a dj.Manual table, and we want to ensure that only file paths with a certain extension are inserted.

As you’ve already discovered, we can achieve this by overriding insert1, like so:

import datajoint as dj

schema = dj.Schema('rguzman_insert_validation')

@schema
class FilePath(dj.Manual):
    definition = '''
    file_id: int
    ---
    file_path: varchar(100)
    '''
    def insert1(self, *args, **kwargs):  # Note the self parameter: the method needs the instance
        key = kwargs['row'] if 'row' in kwargs else args[0]  # Row may arrive as arg or kwarg
        if not key['file_path'].endswith('.md'):  # Check the extension, not just a substring
            raise ValueError('Sorry, we only support Markdown files...')
        super().insert1(*args, **kwargs)
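Applied to the question, the checks on label_params could be pulled into a standalone function, roughly as sketched below. The question uses _metric_name_to_func, _comparison_to_function and valid_labels without showing their contents, so the values here are placeholder assumptions. The sketch also checks comparison_list[1] in both halves of the number test, which the original condition did not.

```python
import operator
from numbers import Real

# Placeholder lookups: the question references these names but does not show
# their contents, so the values below are assumptions for illustration only.
_metric_name_to_func = {"snr": None, "isi_violation": None}
_comparison_to_function = {"<": operator.lt, ">": operator.gt}
valid_labels = ["accept", "reject", "noise", "mua"]


def validate_label_params(label_params):
    """Raise ValueError if any entry of label_params is malformed.

    Each value is expected to look like [comparison, threshold, [labels]].
    """
    for metric, comparison_list in label_params.items():
        if metric not in _metric_name_to_func:
            raise ValueError(f"{metric} not in list of available metrics")
        if len(comparison_list) != 3:
            raise ValueError(f"{metric}: expected [comparison, threshold, labels]")
        comparison, threshold, labels = comparison_list
        if comparison not in _comparison_to_function:
            raise ValueError(f"{metric}: {comparison} not in list of available comparisons")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(threshold, bool) or not isinstance(threshold, Real):
            raise ValueError(f"{metric}: {threshold} not a number")
        for label in labels:
            if label not in valid_labels:
                raise ValueError(f"{metric}: {label} not a valid label: {valid_labels}")
```

Inside the table class, insert1(self, *args, **kwargs) would extract the row as in the FilePath example, call validate_label_params(row['label_params']), and then call super().insert1(*args, **kwargs). This guards insert1 only; rows passed through insert would need the same check.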

P.S. This example is only meant to illustrate the concept. If you are using MySQL 8 (CHECK constraints are enforced from 8.0.16 onward), there is a better way to do the above: a MySQL CHECK constraint allows simple validation that DataJoint will respect. In that case, you can simplify it to:

import datajoint as dj

schema = dj.Schema('rguzman_insert_validation')

@schema
class FilePath(dj.Manual):
    definition = '''
    file_id: int
    ---
    file_path: varchar(100) CHECK(REGEXP_LIKE(file_path, '^.*[.]md$', 'c'))
    '''
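For reference, the intent of that pattern can be sketched in Python with the re module. This only mirrors the check and says nothing about how MySQL evaluates it; is_markdown_path is a hypothetical helper name. The pattern uses [.] so that the dot is matched literally, whereas a bare . would match any character.

```python
import re

# A literal dot followed by "md" at the end of the string, case-sensitive.
MD_PATTERN = re.compile(r"^.*[.]md$")


def is_markdown_path(path):
    """Return True if path ends with the literal extension '.md'."""
    return MD_PATTERN.match(path) is not None
```

With this pattern, notes/readme.md is accepted, while readmemd, readme.md.bak and README.MD are rejected.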
User contributions licensed under: CC BY-SA