Tag: pandas

Prevent pandas read_csv treating first row as header of column names

I’m reading in a pandas DataFrame using pd.read_csv. I want to keep the first row as data, but it keeps getting converted to column names. I tried header=False, but this just deleted it entirely. (Note on my input data: I have a string (st = '\n'.join(lst)) that I convert to a file-like object (io.StringIO(st)), then build the csv from that
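A minimal sketch of one way to keep the first row as data, assuming the input is built the way the question describes. The contents of lst are hypothetical sample rows, not the asker's data. Passing header=None tells read_csv that the file has no header row, so pandas numbers the columns itself.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real list is not shown in the question.
lst = ["1,2,3", "4,5,6"]
st = "\n".join(lst)

# header=None: no line is consumed as column names, every line is data.
df = pd.read_csv(io.StringIO(st), header=None)
print(df)
```

If real column names are known, they can be supplied through the names parameter of read_csv in the same call.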

Merging results from model.predict() with original pandas DataFrame?

pandas python scikit-learn

I am trying to merge the results of a predict method back with the original data in a pandas.DataFrame object. To merge these predictions back with the original df, I try this: But that raises: ValueError: Length of values does not match length of index I know I could split the df into train_df and test_df and this problem would
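The code from the question is not shown, so the following is only a sketch of one common way around that ValueError. It assumes the predictions cover a subset of the rows; test_df and the preds array are hypothetical stand-ins (preds plays the role of the model.predict() output). Wrapping the predictions in a Series that carries the index of the predicted rows lets pandas align them with the original frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [0, 1, 0, 1]})

# Hypothetical held-out rows and a stand-in for model.predict(test_df[["x"]]).
test_df = df.iloc[2:]
preds = np.array([1, 1])

# Assignment aligns on the index; rows without a prediction become NaN.
df["pred"] = pd.Series(preds, index=test_df.index)
print(df)
```

Assigning the bare array would fail because its length (2) differs from the length of df (4); the indexed Series avoids that mismatch.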

Jupyter python3 notebook cannot recognize pandas

anaconda jupyter-notebook pandas python

I am using the Jupyter notebook with Python 3 selected. On the first line of a cell I am entering: The error I get from the notebook is, ImportError: No module named ‘pandas’. How can I install pandas to the jupyter notebook? The computer I launched the Jupyter notebook from definitely has pandas. I tried doing: And it says it

Retrieve list of training features names from classifier

pandas python random-forest scikit-learn

Is there a way to retrieve the list of feature names used for training of a classifier, once it has been trained with the fit method? I would like to get this information before applying to unseen data. The data used for training is a pandas DataFrame and in my case, the classifier is a RandomForestClassifier. Answer Based on the

Dropping Multiple Columns from a dataframe

dataframe pandas python

I know how to drop columns from a data frame using Python. But for my problem the data set is vast, the columns I want to drop are grouped together or are basically singularly spread out across the column heading axis. Is there a shorter way to slice or drop all the columns with fewer lines of code rather than
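One possible approach, sketched under the assumption that the columns to drop can be described by position: numpy's np.r_ builds a single array from a mix of single positions and slices, which can then index df.columns. The frame and the chosen positions below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with eight columns named A to H.
df = pd.DataFrame([[0] * 8], columns=list("ABCDEFGH"))

# Position 0 on its own plus the contiguous block 3, 4, 5.
positions = np.r_[0, 3:6]
trimmed = df.drop(columns=df.columns[positions])
print(trimmed.columns.tolist())
```

drop returns a new frame here, so df itself is left unchanged.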

pandas concat generates nan values

concatenation dataframe nan pandas python

I am curious why a simple concatenation of two dataframes in pandas: of the same shape and both without NaN values can result in a lot of NaN values if joined. How can I fix this problem and prevent NaN values being introduced? Trying to reproduce it like failed e.g. worked just fine as no NaN values were introduced. Answer
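The asker's frames are not shown, so this is only a guess at a common cause: when concatenating along columns (axis=1), pandas aligns rows by index label, and frames of the same shape but with different index labels produce NaN where labels do not match. The frames a and b below are hypothetical.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
b = pd.DataFrame({"y": [3, 4]}, index=[2, 3])  # same shape, different labels

# Labels do not overlap, so the result has four rows padded with NaN.
misaligned = pd.concat([a, b], axis=1)

# Resetting both indexes makes the rows line up by position instead.
aligned = pd.concat(
    [a.reset_index(drop=True), b.reset_index(drop=True)], axis=1
)
print(misaligned)
print(aligned)
```

Resetting the index discards the original labels, so it only fits cases where row order, not label, is what ties the two frames together.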

Stop Pandas from converting int to float due to an insertion in another column

pandas python type-conversion type-inference

I have a DataFrame with two columns: a column of int and a column of str. I understand that if I insert NaN into the int column, Pandas will convert all the int into float because there is no NaN value for an int. However, when I insert None into the str column, Pandas converts all my int to float

quickest way to swap index with values

pandas python

Consider the pd.Series s. What is the quickest way to swap index and values and get the following? Answer One possible solution is to swap keys and values by: Another, the fastest: Timings: If length of Series is 1M:
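The answer's code is missing from this excerpt, so the following is not necessarily what it showed. It is a sketch of one straightforward way to do the swap: build a new Series whose values are the old index and whose index is the old values. The sample Series is hypothetical, and no timing claim is made for it.

```python
import pandas as pd

# Hypothetical sample Series.
s = pd.Series(list("abc"), index=[10, 20, 30])

# Old index becomes the values, old values become the index.
swapped = pd.Series(s.index.values, index=s.values)
print(swapped)
```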

Keeping ‘key’ column when using groupby with transform in pandas

pandas python

Finding a normalized dataframe removes the column being used to group by, so that it can’t be used in subsequent groupby operations. For example (edit: updated): Now, with most operations on groups the ‘missing’ column becomes a new index (which can then be adjusted using reset_index, or by setting as_index=False), but when using transform it just disappears, leaving the original index
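One way to keep the key column, sketched with hypothetical column names key and val: since transform returns a result with the same index as the input, it can be written back into a copy of the original frame instead of replacing it.

```python
import pandas as pd

# Hypothetical data; "key" is the grouping column.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1.0, 3.0, 5.0]})

normalized = df.copy()
# Normalize within each group; the result lines up with df's rows.
normalized["val"] = df.groupby("key")["val"].transform(lambda x: x / x.sum())
print(normalized)
```

Because key is still a column of normalized, it stays available for any later groupby.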

How to plot multiple lines in one figure in Pandas Python based on data from multiple columns? [duplicate]

matplotlib pandas plot python

I have a dataframe with 3 columns, like this: how can I plot a line for A, B and C, where it shows how their weight develops through the years. So I tried this: However, I get multiple plots
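The asker's frame is not shown, so the column names year, name and weight below are assumptions. A sketch of one common approach: pivot the long table so that each of A, B, ... becomes its own column indexed by year, then call plot once on the wide frame.

```python
import pandas as pd

# Hypothetical long-format data with assumed column names.
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "name": ["A", "B", "A", "B"],
    "weight": [10, 20, 12, 18],
})

# One column per name, one row per year.
wide = df.pivot(index="year", columns="name", values="weight")
print(wide)

# With matplotlib installed, wide.plot() draws one line per column
# on a single set of axes.
```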