
How to sort by year using a date string in PyMongo Aggregate

Another programmer and I have gone around in circles on this and cannot find an answer online.

We have a MongoDB collection into which we have inserted documents which have been pulled from an API. Each document has a field called attributes.created which is a string.

The format of the string is like this: “2019-04-25T07:06:02Z”

Our aim is to get a list of counts per year. We can do this on a different collection which just has an int value for year… but we cannot figure it out here as we need to convert a string to a date.

An example of our aggregate function is:

publicationsByYear = collection.aggregate([
         {"$group": {"_id": {"$dateFromString": {"dateString": "attributes.created", "format": "%Y-%m-%d %H:%M:%S"}}, "num_publications": {"$sum": 1}}}
])

We have also tried $toDate instead of $dateFromString, but the results are the same. No matter what we try, we get one of the same two errors:

  1. Error parsing date, a four digit year could not be found. (this is the error the above code throws)
  2. passing a time zone identifier as part of the string is not allowed. (this is thrown if we remove the ‘format’ parameter and go with the default format).

We’ve tried various uses of $toDate and $dateFromString with different values for the format parameter, but we are getting the same errors. We’ve also tried various format specifications, but can’t seem to get it right.

What are we missing?


Answer

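The field reference is missing its $ prefix: "attributes.created" is treated as a literal string, whereas "$attributes.created" refers to the value of that field. The format "%Y-%m-%d %H:%M:%S" also does not match the data, which has a T between date and time and a trailing Z. Omitting format works for ISO 8601 strings like this one.

Using the $dateFromString operator:

  • convert the date string to a date with $dateFromString
  • extract the year from that date with $year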
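publicationsByYear = collection.aggregate([
  {
    "$group": {
      "_id": {
        # $dateFromString takes a document; "$" marks attributes.created as a field path
        "$year": {"$dateFromString": {"dateString": "$attributes.created"}}
      },
      "num_publications": {"$sum": 1}
    }
  }
])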
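
If you prefer to state the format explicitly, the one matching strings like 2019-04-25T07:06:02Z is "%Y-%m-%dT%H:%M:%SZ". The following is a minimal sketch rather than a definitive implementation: the helper name year_count_pipeline is hypothetical, and the $sort stage is an addition that returns the years in ascending order.

```python
def year_count_pipeline(field="attributes.created"):
    """Build a pipeline that counts documents per year of a date-string field."""
    return [
        {
            "$group": {
                "_id": {
                    "$year": {
                        "$dateFromString": {
                            # "$" + field makes this a field path, not a literal string
                            "dateString": "$" + field,
                            # literal "T" and "Z" match strings like 2019-04-25T07:06:02Z
                            "format": "%Y-%m-%dT%H:%M:%SZ",
                        }
                    }
                },
                "num_publications": {"$sum": 1},
            }
        },
        # order the grouped result by year, ascending
        {"$sort": {"_id": 1}},
    ]


pipeline = year_count_pipeline()
# With a PyMongo collection at hand (assumed, not created here):
# publicationsByYear = list(collection.aggregate(pipeline))
```

Building the pipeline as a plain list keeps it inspectable before it is sent to the server, which helps when a missing "$" or quote is the suspected problem.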


Using the $toDate operator:

  • convert the date string to a date with $toDate
  • extract the year from that date with $year
publicationsByYear = collection.aggregate([
  {
    "$group": {
      "_id": {
        # operator and field names must be quoted strings in a Python dict
        "$year": {"$toDate": "$attributes.created"}
      },
      "num_publications": {"$sum": 1}
    }
  }
])
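
To sanity-check the aggregation on a small sample, the same per-year count can be computed client-side with the standard library. This is only a sketch: the sample documents and the function name count_by_year are made up for illustration, and the output is shaped like the $group result from the pipelines above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample documents shaped like the ones in the question
docs = [
    {"attributes": {"created": "2019-04-25T07:06:02Z"}},
    {"attributes": {"created": "2019-11-02T18:30:00Z"}},
    {"attributes": {"created": "2020-01-15T09:00:00Z"}},
]


def count_by_year(documents):
    """Count documents per year of attributes.created, sorted by year."""
    years = Counter(
        # "T" and "Z" in the format are literals that must appear in the string
        datetime.strptime(d["attributes"]["created"], "%Y-%m-%dT%H:%M:%SZ").year
        for d in documents
    )
    return [{"_id": year, "num_publications": n} for year, n in sorted(years.items())]


result = count_by_year(docs)
```

For large collections the server-side aggregation is the better fit, since this version has to pull every document to the client first.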


User contributions licensed under: CC BY-SA