Tag: pandas

How do I find the position of a value in a pandas.DataFrame?

I want to search for ‘row3’ in the index of the DataFrame below; its position should be 2, counting from zero. Is there a function that returns the row number of ‘row3’? Thanks in advance for any help. Answer You could use Index.get_loc (docs): But this is a somewhat unusual access pattern; I have never needed it myself.
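
The excerpt above names Index.get_loc but its code did not survive. A minimal sketch of that call, assuming a small hypothetical DataFrame (the original data is not shown here):

```python
import pandas as pd

# Hypothetical data; the DataFrame from the original question is not available.
df = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=["row1", "row2", "row3", "row4"],
)

# For a unique label, Index.get_loc returns its zero-based integer position.
position = df.index.get_loc("row3")
print(position)
```

If the index contains duplicate labels, get_loc may return a slice or a boolean mask instead of a single integer, so this sketch assumes unique labels.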

Python Pandas Setting Dataframe index and Column names from an array

Let’s say I have data loaded from a spreadsheet: and I have the names for the rows and the columns in another dataframe, for example Is there a way I can set the values in colnames[‘Names’].value as the index for df? And is there a way to do this for column names? Answer How about df.index = colnames[‘Names’] for
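
The assignment suggested in the answer can be sketched as follows. The data and the contents of colnames are hypothetical, and using the same names for rows and columns is an assumption made only to keep the example short:

```python
import pandas as pd

# Hypothetical stand-ins for the spreadsheet data and the names dataframe.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
colnames = pd.DataFrame({"Names": ["alpha", "beta", "gamma"]})

# Assign the plain values so the labels are set by position,
# without carrying over the Series name or its own index.
df.index = colnames["Names"].to_numpy()
df.columns = colnames["Names"].to_numpy()

print(df)
```

The number of names must match the number of rows (or columns) being relabeled, otherwise pandas raises a ValueError.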

Pandas dataframe.dot division method

I am trying to divide two series of different length to return the matrix product dataframe of them. I can multiply them using the dot method (from this answer): I’ve tried the div method, but this just fills the dataframe with NaNs: Likewise the standard division operator also returns the same result: So I’m a bit stumped as to what
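
The answer to this question is not part of the excerpt. One possible approach, offered as a sketch rather than as the thread's accepted answer: pandas arithmetic aligns operands on their labels, so labels that do not match yield NaN, whereas NumPy's outer division works purely by position. The series below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical series of different lengths with non-overlapping labels.
a = pd.Series([2.0, 4.0, 6.0], index=["x", "y", "z"])
b = pd.Series([1.0, 2.0], index=["p", "q"])

# np.divide.outer computes a[i] / b[j] for every pair (i, j).
quotient = pd.DataFrame(
    np.divide.outer(a.to_numpy(), b.to_numpy()),
    index=a.index,
    columns=b.index,
)

print(quotient)
```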

Ordered Logit in Python?

I’m interested in running an ordered logit regression in Python (using pandas, numpy, sklearn, or something in that ecosystem). But I cannot find any way to do this. Is my google-skill lacking? Or is this not something that’s been implemented in a standard package? Answer Update: Logit and Probit Ordinal regression models are now built in to statsmodels. https://www.statsmodels.org/devel/examples/notebooks/generated/ordinal_regression.html Examples are

How to loop over grouped Pandas dataframe?

DataFrame: Code: I’m trying to just loop over the aggregated data, but I get the error: ValueError: too many values to unpack @EdChum, here’s the expected output: The output is not the problem, I wish to loop over every group. Answer df.groupby(‘l_customer_id_i’).agg(lambda x: ‘,’.join(x)) already returns a dataframe, so you cannot loop over the groups anymore. In general: df.groupby(…)
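
A short sketch of looping over the groups themselves, before any aggregation. The column l_customer_id_i comes from the excerpt; the data and the second column name (item) are hypothetical:

```python
import pandas as pd

# Hypothetical data; only the grouping column name is taken from the question.
df = pd.DataFrame(
    {
        "l_customer_id_i": ["c1", "c1", "c2"],
        "item": ["A", "B", "C"],
    }
)

joined = {}
# Iterating a GroupBy object yields (group key, sub-DataFrame) pairs.
for customer_id, group in df.groupby("l_customer_id_i"):
    joined[customer_id] = ",".join(group["item"])

print(joined)
```

Once .agg(...) has been applied, the result is an ordinary DataFrame, which can be walked row by row with methods such as iterrows() or itertuples().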

How do you represent missing data in a Pandas DataFrame?

Does Pandas have an equivalent of R’s NA (meaning not available)? If not, what is the convention for representing a missing value, as opposed to NaN which represents a mathematically impossible value such as a divide by zero? Answer At the time this answer was written, there was no NA value available in Pandas or NumPy. From the section “Working with missing data” in the Pandas
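
For illustration, a sketch of how missing values are commonly represented and detected. The series are hypothetical, and the last one assumes pandas 1.0 or later, where pd.NA and the nullable Int64 dtype were introduced as experimental features:

```python
import numpy as np
import pandas as pd

# Float data: NaN is the conventional missing-value marker.
s_float = pd.Series([1.0, np.nan, 3.0])

# Text data: None is also treated as missing.
s_text = pd.Series(["a", None, "c"])

# Nullable integer data (assumes pandas >= 1.0): pd.NA marks the gap
# without forcing the column to float.
s_int = pd.Series([1, pd.NA, 3], dtype="Int64")

# isna() reports all three markers as missing.
missing_counts = [int(s.isna().sum()) for s in (s_float, s_text, s_int)]
print(missing_counts)
```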

Plot multiple boxplot in one graph in pandas or matplotlib?

I have two boxplots But I want to place them in one graph to compare them. Do you have any advice to solve this problem? Thanks! Answer Use return_type=’axes’ to get a1.boxplot to return a matplotlib Axes object. Then pass that axes to the second call to boxplot using ax=ax. This will cause both boxplots to be drawn on the

Check if dataframe column is Categorical

I can’t seem to get a simple dtype check working with Pandas’ improved Categoricals in v0.15+. Basically I just want something like is_categorical(column) -> True/False. We can see that the dtype for the categorical column is ‘category’: And normally we can do a dtype check by just comparing to the name of the dtype: But this doesn’t seem to work
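
The excerpt ends before any answer. One common way to write the is_categorical(column) check the question asks for, given as a sketch and not necessarily the thread's answer, is to test the dtype's type. It assumes a pandas version that exposes pd.CategoricalDtype at the top level; the data is hypothetical.

```python
import pandas as pd

# Hypothetical frame with one categorical and one integer column.
df = pd.DataFrame(
    {
        "kind": pd.Categorical(["a", "b", "a"]),
        "count": [1, 2, 3],
    }
)


def is_categorical(column: pd.Series) -> bool:
    """Return True if the column is stored with a categorical dtype."""
    return isinstance(column.dtype, pd.CategoricalDtype)


print(is_categorical(df["kind"]), is_categorical(df["count"]))
```

Comparing column.dtype.name to the string "category" is an alternative that avoids referencing the dtype class.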
