Skip to content
Advertisement

Mongo append to standardized string field based on mapping of multiple fields (including nested)

I have a large collection with fields like:

{
    'class': 'apple'
},
{
    'class': 'appl'
},
{
    'class': 'orange',
    'nested': [
        {'classification': 'app'},
        {'classification': 'A',
        {'classification': 'orang'}
    ]
},
{
    'nested': [
        {'classification': 'O'},
        {'classification': 'unknown'}
    ]
}

I also have a Python dictionary mapping field values like:

{
    'class': {
        'apple': 'a',
        'appl': 'a',
        'orange': 'o'
    },
    'nested.classification': {
        'app': 'a',
        'A': 'a',
        'orang': 'o',
        'O': 'o',
        'unknown': 'u'
    }
}

I’m trying to (in PyMongo) update my MongoDB collection so that a string field of mapped characters is accumulated, from both the top-level class field and the nested nested.classification fields.

In the above, this would produce the following updates:

{
    'class': 'apple'
    'standard': 'a'
},
{
    'class': 'appl'
    'standard': 'a'
},
{
    'class': 'orange',
    'nested': [
        {'classification': 'app'},
        {'classification': 'A',
        {'classification': 'orang'}
    ]
    'standard': 'oaao'
},
{
    'nested': [
        {'classification': 'O'},
        {'classification': 'unknown'}
    ]
    'standard': 'ou'
}

How can I effectively do this at scale? Within an aggregation framework?

Advertisement

Answer

You may get the desired result in 3 steps

Note: MongoDB can only iterate arrays, so we need to transform your dictionaries into {k:"key", v: "value"} array (we can use $objectToArray, but it’s not worth it)

  1. We map class field by iterating Python class dictionary
  2. We map nested classification values by iterating Python nested.classification dictionary
  3. We concat mapped values into a single value
  4. (Optional) If you need to persist it, run $merge stage

Disclamer: MongoDB >=4.2 + I am not sure if this solution scales good

db.collection.aggregate([
  {
    "$addFields": {
      standard: {
        $reduce: {
          input: [
            { k: "apple",  v: "a" },
            { k: "appl",   v: "a" },
            { k: "orange", v: "o" }
          ],
          initialValue: "",
          in: {
            $cond: [
              {
                $eq: ["$$this.k", "$class"]
              },
              "$$this.v",
              "$$value"
            ]
          }
        }
      }
    }
  },
  {
    "$addFields": {
      standard: {
        $reduce: {
          input: {
            "$ifNull": [ "$nested", [] ]
          },
          initialValue: [ { v: "$standard" } ],
          in: {
            $concatArrays: [
              "$$value",
              {
                $filter: {
                  input: [
                    { k: "app",     v: "a" },
                    { k: "A",       v: "a" },
                    { k: "orang",   v: "o" },
                    { k: "O",       v: "o" },
                    { k: "unknown", v: "u" }
                  ],
                  as: "nested",
                  cond: {
                    $eq: [ "$$this.classification", "$$nested.k" ]
                  }
                }
              }
            ]
          }
        }
      }
    }
  },
  {
    "$addFields": {
      "standard": {
        $reduce: {
          input: "$standard.v",
          initialValue: "",
          in: {
            "$concat": [ "$$value", "$$this" ]
          }
        }
      }
    }
  },
  //Optional - If you need to persist it
  { 
    $merge: {
      into: "collection",
      on: "_id",
      whenMatched: "replace"
    }
  }
])

MongoPlayground

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement