In the above dataframe, all I want is to create a line plot that shows the trend per year for each of the columns. I’ve read about pivot-table in related posts, but when I try it, it says there are no numbers to aggregate. I don’t want to aggregate anything. I just need the y-axis in terms of
Tag: pandas
Pandas datetime filter
I want to get a subset of my dataframe where the date is before 2022-04-22. The original df is like below: I checked the data types with df.dtypes, and it told me the 'date' column is 'object'. So I checked an individual cell using df['date'][0], and it is datetime.date(2022, 4, 21). Also, df['date'][0] < datetime.date(2022, 4, 22) gave me True. However, when I wanted
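The excerpt cuts off before the failing line, so the following is only a minimal sketch of one common approach, not the accepted answer. It assumes a hypothetical frame whose 'date' column holds datetime.date objects (hence the 'object' dtype), converts that column with pd.to_datetime, and then filters with a vectorized comparison. The column name 'value' and the sample rows are made up for illustration.

```python
import datetime

import pandas as pd

# Hypothetical data: the original frame is not shown in the excerpt.
df = pd.DataFrame(
    {
        "date": [
            datetime.date(2022, 4, 21),
            datetime.date(2022, 4, 22),
            datetime.date(2022, 4, 23),
        ],
        "value": [1, 2, 3],
    }
)

# A column of datetime.date objects is stored with dtype 'object'.
# Converting it gives a datetime64 column that supports vectorized comparison.
df["date"] = pd.to_datetime(df["date"])

# Keep only the rows strictly before the cutoff.
cutoff = pd.Timestamp("2022-04-22")
before = df[df["date"] < cutoff]
```

Converting once up front also makes later operations such as sorting or resampling by date behave predictably.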
How to extract ids and rows only if a column has all of the designated values
I have the following dataframe. I want to group by id and keep only the ids that contain all of the designated values (i.e. 2019Q4, 2020Q4, 2021Q4), then extract the rows that correspond to those values. isin() alone won’t work because it won’t drop C and D. Desired output: Answer: You can use set operations to filter the id and isin for
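The answer is truncated, but the idea it names (set operations to pick the ids, then isin to pick the rows) can be sketched roughly as follows. The frame, the column name 'period', and the extra 2018Q4 row are assumptions made for illustration, since the original data is not shown.

```python
import pandas as pd

# Hypothetical data: ids C and D each lack at least one required period.
df = pd.DataFrame(
    {
        "id": ["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"],
        "period": [
            "2018Q4", "2019Q4", "2020Q4", "2021Q4",
            "2019Q4", "2020Q4", "2021Q4",
            "2019Q4", "2020Q1",
            "2021Q4",
        ],
    }
)
required = {"2019Q4", "2020Q4", "2021Q4"}

# Set operation: an id qualifies when the required set is a subset
# of the periods observed for that id.
has_all = df.groupby("id")["period"].apply(lambda s: required <= set(s))
keep_ids = has_all[has_all].index

# isin is used twice: once for the qualifying ids, once for the wanted periods.
result = df[df["id"].isin(keep_ids) & df["period"].isin(required)]
```

Here C and D are dropped entirely, and the 2018Q4 row of A is dropped because it is not one of the designated values.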
Index must be DatetimeIndex when filtering dataframe
I then have a function which looks for a specific date (in this case, 2022-01-26): Which returns: When I then try to look for only times between 09:00 and 09:30 like so: I get the following error: Full code: What am I doing wrong? Answer: between_time is only valid if your index is a DatetimeIndex. As your string time is
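As a minimal sketch of the point made in the answer, the example below parses a string time column, sets it as the index so that it becomes a DatetimeIndex, and only then calls between_time. The column names 'time' and 'price' and the sample rows are hypothetical, because the asker's code is not included in the excerpt.

```python
import pandas as pd

# Hypothetical data with timestamps stored as strings.
df = pd.DataFrame(
    {
        "time": [
            "2022-01-26 08:55:00",
            "2022-01-26 09:00:00",
            "2022-01-26 09:15:00",
            "2022-01-26 09:45:00",
            "2022-01-27 09:10:00",
        ],
        "price": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# Parse the strings and move them into the index: this yields a DatetimeIndex.
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")

# Select one day by partial string indexing, then a time-of-day window.
day = df.loc["2022-01-26"]
window = day.between_time("09:00", "09:30")
```

By default between_time includes both endpoints, so the 09:00 row is part of the result.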
How to create dictionary from multiple dataframes?
I have a folder with several csv files. Example of the dataframes from the csv files in the directory: I need to write a function that accepts the path to the file directory and returns a site frequency dictionary (one for all sites in the directory) with unique site names, of the following kind: {'site_string': [site_id, site_freq]} For our example it will be: {'vk.com': (1, 2),
loop over rows of csv and put inside code
I am trying to read 5 columns from a 6-column csv file, use each row in a formula, and iterate over all the rows. The csv file is something like this, with hundreds of rows: When I put the values in by hand, it all works well. However, when I put the df columns to do it
How to deal with “ValueError: array must not contain infs or NaNs” while running regressions in python
I have a df with growth variables, and often some initial values are 0, which produces infinite values when the value moves from zero to non-zero. When I run PanelOLS, I get an error message: Is there a way to ignore these entries to continue with the regression without having to drop them and create a
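No answer is included in this excerpt, so what follows is only a hedged sketch of one common preprocessing step, not a confirmed fix for PanelOLS. It marks infinite values as missing with replace and builds a filtered view just for fitting, which leaves the original frame untouched. The column names 'growth' and 'x' are hypothetical, and whether a given estimator skips missing rows on its own is not stated in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical data: growth from a zero base shows up as +/- infinity.
df = pd.DataFrame(
    {
        "growth": [0.1, np.inf, -np.inf, 0.3],
        "x": [1.0, 2.0, 3.0, 4.0],
    }
)

# Turn infinities into NaN so they are treated as ordinary missing values.
marked = df.replace([np.inf, -np.inf], np.nan)

# Build a view for fitting only; the original df keeps all of its rows.
fit_data = marked.dropna(subset=["growth"])
```

The filtered frame is a temporary input for the regression call, so the entries are ignored during estimation without being deleted from the source data.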
How can I get automatical features with dfs, using featuretools, when I have only one dataframe?
I am trying to figure out how Featuretools works and I am testing it on the Housing Prices dataset on Kaggle. Because the dataset is huge, I’ll work here with only a subset of it. The dataframe is: I set the dataframe properties: Then call the dfs method: I get the following warning: UnusedPrimitiveWarning: Some specified primitives were not used
ValueError: Could not interpret input '0' with Seaborn?
I’m using the Kepler exoplanet dataset. After loading it, and running a simple transpose() on it, in order to get the rows as columns, I try a seaborn boxplot, as follows: This returns: I also attempted this: and got a KeyError: '0' instead. What am I doing wrong? As far as I can tell, I’m doing the exact same thing
How can I transform a pandas df to this kind of json structure?
Let’s say I have a pandas df like this: I would like to transform it into a json format like this: How can I do this? The nearest match I had was using df.to_json(orient="index"), but what I got was a structure like this: I would be really thankful for any help! Answer: Use df.to_dict('records')
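To illustrate the difference, here is a small sketch that contrasts orient="index" with the records orientation suggested in the answer. The frame and its column names are hypothetical, since neither the original df nor the target structure is shown in the excerpt.

```python
import json

import pandas as pd

# Hypothetical data: the original frame is not shown.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# orient="index" nests each row under its index label.
by_index = json.loads(df.to_json(orient="index"))

# 'records' gives a flat list with one dict per row.
records = df.to_dict("records")

# The same layout as a JSON string, produced directly by pandas.
records_json = df.to_json(orient="records")
```

If a JSON string is needed rather than Python objects, to_json(orient="records") avoids a separate serialization step.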