I want to take the column labeled ‘length’ and make it my second column. It is currently the 5th column. I have tried: I see the following error: TypeError: must be str, not list. I’m not sure how to interpret this error, because it actually should be a list, right? Also, is there a general method to move any
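A minimal sketch of one general way to move any column. The frame and its columns a to d are placeholders, and move_column is a hypothetical helper, not a pandas API. It rebuilds the column order as a list of labels and selects with that list:

```python
import pandas as pd


def move_column(df: pd.DataFrame, col: str, pos: int) -> pd.DataFrame:
    """Return df with column `col` moved to integer position `pos` (hypothetical helper)."""
    cols = [c for c in df.columns if c != col]  # all other columns, original order
    cols.insert(pos, col)                       # put the target column at `pos`
    return df[cols]                             # select columns in the new order


# Placeholder frame: 'length' starts out as the 5th column.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4], "length": [5]})
df = move_column(df, "length", 1)
print(list(df.columns))  # ['a', 'length', 'b', 'c', 'd']
```

For a one-off in-place move, df.insert(1, 'length', df.pop('length')) gives the same order, because pop removes the column and returns it as a Series that insert then places at position 1.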
Tag: dataframe
Delete rows that do not contain specific text
I have a tabular file that looks like this: I’m trying to create a script that goes through it and deletes the entire row if column 2 (‘KEGG_KOs’) does not begin with ‘K0’. I’m trying to create an output of: Previous responses have pointed people to pandas DataFrames, but I’ve had no luck applying those responses here. Any would be
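A sketch assuming the file is tab-separated. The sample rows and the gene column are made up; only the KEGG_KOs column name comes from the question. Using str.startswith with na=False treats missing values as non-matching, so those rows are dropped as well:

```python
import io

import pandas as pd

# Made-up tab-separated sample; in practice pass the file path to read_csv.
raw = "gene\tKEGG_KOs\ng1\tK00001\ng2\tNA\ng3\tko:K00002\ng4\tK01234\n"
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Keep only rows whose KEGG_KOs value starts with "K0"; NaN counts as no match.
kept = df[df["KEGG_KOs"].str.startswith("K0", na=False)]

# Render the result as TSV text; pass a filename to to_csv to write a file instead.
print(kept.to_csv(sep="\t", index=False))
```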
Pandas: Conditionally replace values based on other columns values
I have a dataframe (df) that looks like this: My goal is that, for each add_rd in the event column, the associated NaN value in the environment column is replaced with the string RD. What I did so far: I stumbled across df['environment'] = df['environment'].fillna('RD'), which replaces every NaN (not what I am looking for), and pd.isnull(df['environment']), which is
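A minimal sketch with made-up rows, assuming the column names event and environment from the question. The idea is to build one Boolean mask that combines both conditions and assign only where it is True, instead of calling fillna on the whole column:

```python
import numpy as np
import pandas as pd

# Made-up sample data using the question's column names.
df = pd.DataFrame({
    "event": ["add_rd", "add_rd", "other", "add_rd"],
    "environment": [np.nan, "lab", np.nan, np.nan],
})

# True only where the event is add_rd AND the environment is missing.
mask = df["event"].eq("add_rd") & df["environment"].isna()
df.loc[mask, "environment"] = "RD"
print(df)
```

Row 2 keeps its NaN because its event is not add_rd, and row 1 keeps "lab" because it was not missing.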
Compare two dataframes with same index using one column
I have the following two dataframes (samples). I’d like to know which companies had their sales change between the two dataframes. For example, AAPL is different in the second dataframe. Answer You can do this using ne (not equal).
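A sketch of the ne approach with made-up figures; only the AAPL ticker comes from the question, and the sales column name is an assumption. Series.ne aligns the two columns on their shared index and returns True wherever the values differ:

```python
import pandas as pd

# Made-up samples sharing the same ticker index.
df1 = pd.DataFrame({"sales": [100, 200, 300]}, index=["AAPL", "MSFT", "GOOG"])
df2 = pd.DataFrame({"sales": [150, 200, 300]}, index=["AAPL", "MSFT", "GOOG"])

changed = df1["sales"].ne(df2["sales"])     # Boolean Series, True where sales differ
changed_tickers = changed[changed].index.tolist()
print(changed_tickers)  # ['AAPL']
```

Note that ne treats NaN compared with NaN as "not equal", so rows missing in both frames would also be flagged.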
How to add values to a new column in pandas dataframe?
I want to create a new named column in a pandas DataFrame, insert a first value into it, and then add more values to the same column. Something like: How do I do that? Answer Don’t do it, because it’s slow: updating an empty frame one row at a time. I have seen this method used WAY too much. It is by far the slowest.
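A minimal sketch of the faster alternative: accumulate the values in a plain Python list, where appends are cheap, and build the DataFrame column once at the end. The loop body and the column name my_col are placeholders:

```python
import pandas as pd

# Collect values in a list first; each append is cheap.
values = []
for i in range(5):
    values.append(i * i)  # placeholder for whatever produces each value

# Build the DataFrame (and its named column) in a single step.
df = pd.DataFrame({"my_col": values})
print(df)
```

The same list can also be assigned to an existing frame with df["my_col"] = values, as long as its length matches the number of rows.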
ValueError: DataFrame constructor not properly called
I am trying to create a DataFrame with Python, which works fine with the following command: but when I try to get the data from a variable instead of hard-coding it into the data argument, e.g.: I expect this to be the same, so shouldn’t it work? But I get: Answer Reason for the error: It seems a string representation
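A sketch assuming, as the truncated answer suggests, that the variable holds a string representation of the data rather than a dict. The value of data_str is made up. Passing a plain string to pd.DataFrame raises "DataFrame constructor not properly called!", whereas ast.literal_eval safely parses the literal back into a real dict first:

```python
import ast

import pandas as pd

# Hypothetical string holding a dict literal, e.g. read from a file or user input.
data_str = "{'col1': [1, 2], 'col2': [3, 4]}"

# pd.DataFrame(data_str) would raise ValueError, so parse the string first.
data = ast.literal_eval(data_str)  # evaluates Python literals only, no arbitrary code
df = pd.DataFrame(data)
print(df)
```

If the string is JSON rather than a Python literal, json.loads would be the analogous choice.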
Delete rows with dates before the required date point based on key value
I have a pd.DataFrame that looks like this: Based on the key_value, I want to drop all the rows whose date column value is before 2018-04-01. I want to have an end output like this: Answer You can just filter your DataFrame using Boolean indexing; there is no groupwise operation here. Just remember to convert your series
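A sketch of the Boolean-indexing answer with made-up rows, assuming the columns are named key_value and date as in the question. The date strings are converted with pd.to_datetime before comparing against the cutoff:

```python
import pandas as pd

# Made-up sample with dates stored as strings.
df = pd.DataFrame({
    "key_value": ["a", "a", "b", "b"],
    "date": ["2018-03-15", "2018-04-01", "2018-02-01", "2018-05-20"],
})

df["date"] = pd.to_datetime(df["date"])          # convert to datetime64 first
cutoff = pd.Timestamp("2018-04-01")
result = df[df["date"] >= cutoff]                 # keep rows on or after the cutoff
print(result)
```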
pivot_table requires more memory if dtype is category (MemoryError)
I have the following strange error with pandas (pandas==0.23.1): I am wondering if this is expected and I am doing something wrong, or if this is a bug in pandas. Shouldn’t the category dtype for str be fully transparent (for this use case)? Answer This is not a bug. What’s happening is pandas.pivot_table is calculating the Cartesian product of grouper
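A sketch assuming a pandas release newer than the 0.23.1 in the question, since the observed keyword of pivot_table is not available there. With observed=True, only category combinations that actually occur in the data become groups, rather than every combination of category levels. The column names and values are made up:

```python
import pandas as pd

# Made-up frame with two categorical grouping columns of 3 levels each.
df = pd.DataFrame({
    "a": pd.Categorical(["x1", "x2", "x3"]),
    "b": pd.Categorical(["y1", "y2", "y3"]),
    "val": [1, 2, 3],
})

# observed=True: group only on combinations present in the data (3 here),
# instead of the full Cartesian product of levels (3 * 3 = 9).
table = df.pivot_table(index=["a", "b"], values="val", aggfunc="sum", observed=True)
print(table)
```

With many high-cardinality categorical columns, that Cartesian product grows multiplicatively, which is consistent with the MemoryError described in the question.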
Pandas query function not working with spaces in column names
I have a DataFrame with spaces in its column names. I am trying to use the query method to get the results. It works fine with the ‘c’ column but gives an error for ‘a b’. For this I am getting this error: I don’t want to replace the spaces with other characters like ‘_’. There is one hack using pandasql to
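A minimal sketch, assuming pandas 0.25 or later, where query() accepts column names wrapped in backticks. That lets the expression refer to ‘a b’ without renaming the column. The data is made up:

```python
import pandas as pd

# Made-up frame with a column name containing a space.
df = pd.DataFrame({"a b": [1, 2, 3], "c": [4, 5, 6]})

# Backticks quote the column name inside the query expression.
result = df.query("`a b` > 1")
print(result)
```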
How to increase process speed using read_excel in pandas?
I need to use pd.read_excel to process every sheet in one Excel file. But in most cases, I do not know the sheet names. So I use this to determine how many sheets are in the Excel file: During processing, I found that it is quite slow. So, can read_excel read only a limited number of rows to improve the speed? I tried nrows but it did not work..still