
Pandas select rows from a DataFrame based on column values?

I have the JSON string below loaded into a DataFrame, and I want to filter the records by ossId.

The condition I am using raises an error. What is the correct way to filter by ossId?

import pandas as pd

data = """
{
  "components": [
    {
      "ossId": 3946,
      "project": "OALX",
      "licenses": [
        {
          "name": "BSD 3",
          "status": "APPROVED"
        }
      ]
    },
    {
      "ossId": 3946,
      "project": "OALX",
      "version": "OALX.client.ALL",
      "licenses": [
        {
          "name": "GNU Lesser General Public License v2.1 or later",
          "status": "APPROVED"
        }
      ]
    },
    {
      "ossId": 2550,
      "project": "OALX",
      "version": "OALX.webservice.ALL" ,
      "licenses": [
        {
          "name": "MIT License",
          "status": "APPROVED"
        }
      ]
    }
  ]
}
"""

df = pd.read_json(data)
print(df)

df1 = df[df["components"]["ossId"] == 2550]


Answer

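I think your issue is due to the JSON structure. The top-level object has a single key, "components", so pd.read_json gives you a DataFrame with one column named components, where each cell holds a whole component dict. There is no ossId column, and df["components"]["ossId"] looks for a row labelled "ossId" in that column, which does not exist.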
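To see what is going on, it can help to inspect what read_json actually produced. The snippet below is a small sketch on a shortened copy of your JSON (the name raw is mine, and I wrap the string in StringIO because recent pandas versions warn about passing literal JSON strings to read_json):

```python
from io import StringIO

import pandas as pd

# Shortened copy of the JSON from the question (assumption: same shape, fewer fields).
raw = """
{
  "components": [
    {"ossId": 3946, "project": "OALX"},
    {"ossId": 2550, "project": "OALX"}
  ]
}
"""

df = pd.read_json(StringIO(raw))

# The only column is "components"; ossId is buried inside each cell.
print(df.columns.tolist())
print(df.shape)

# Each cell is a plain dict, not a set of columns.
first_cell = df["components"].iloc[0]
print(type(first_cell))

# The row labels are just 0, 1, ... so looking up "ossId" as a label fails.
print("ossId" in df["components"].index)
```

So the lookup by "ossId" on that column has nothing to match against.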

Instead, build the DataFrame from the list of records stored under "components". Something like:

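import json

# Parse the JSON text, then build the DataFrame from the list under "components"
json_data = json.loads(data)
df = pd.DataFrame(json_data["components"])

# Each component is now a row, so ossId is an ordinary column
filtered_data = df[df["ossId"] == 2550]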
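If you also need the nested licenses as columns, pd.json_normalize may be worth a look. This is only a sketch on a shortened copy of your data; the names components, flat and filtered are mine, and I left version out of meta because not every component has it:

```python
import json

import pandas as pd

# Shortened copy of the JSON from the question, so the sketch runs on its own.
data = """
{
  "components": [
    {"ossId": 3946, "project": "OALX",
     "licenses": [{"name": "BSD 3", "status": "APPROVED"}]},
    {"ossId": 2550, "project": "OALX", "version": "OALX.webservice.ALL",
     "licenses": [{"name": "MIT License", "status": "APPROVED"}]}
  ]
}
"""

components = json.loads(data)["components"]

# One row per license; ossId and project are copied down from the parent component.
flat = pd.json_normalize(components, record_path="licenses", meta=["ossId", "project"])

# Same boolean-mask filter as before, now on the flattened frame.
filtered = flat[flat["ossId"] == 2550]
print(filtered)
```

With this layout a component that has several licenses becomes several rows, so the filter returns every license of the matching ossId.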