Skip to content
Advertisement

Modeling a dictionary as a queryable data object in python

I have a simple book catalog dictionary as the following

JavaScript

Each book is a string key in the dictionary. A book contains a single title, and possibly has many authors (often just one). An author is made of two strings, firstname and lastname. Also we can associate many tags to a book as novel, literature, art, 1900s, etc. Each book as a blob field that contains additional data. (often the book itself). I want to be able to search for a given entry (or a group of them) based on data, as by author, by tag.

My main workflow would be:

  1. Given a query, return all blob fields associated to each entry.

My question is how to model this, which libraries or formats to use keeping the given constraints:

  • Minimize the number of data objects (preference for a single data object to simplify queries).
  • Small size of columns (create a new column for every possible tag is probably insane and lead to a very sparse dataset)
  • Do not duplicate blob field (since it can be large).

My first idea was to create multiple rows for each author, for example:

JavaScript

Would turn, initially into two entries as

idx key Title authors_firstname authors_lastname tags blob
0 123 Sample Book John Smith [‘tag1’, ‘tag2’, ‘tag3’]
1 123 Sample Book Foos M. Bar [‘tag1’, ‘tag2’, ‘tag3’]

But this still duplicates the blob, and still need to figure out what to do with the unknown number of tags (as the database grows).

Advertisement

Answer

You can use TinyDB to accomplish what you want.

First, convert your dict to a database:

JavaScript

Now you can make queries:

JavaScript

Given a query, return all blob fields associated to each entry.

JavaScript
JavaScript
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement