I would like to extract some information from a column in my dataframe: Example: I was using str.contains to extract the first part (i.e., all the information before the first dash, where there is one). I am still getting the same original column (so no extraction). My output should consist of two columns, one withou…
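A possible approach, sketched under assumptions: `str.contains` only returns a boolean mask, so it cannot extract substrings on its own. Splitting once on the first dash with `str.split` yields the two parts. The column name `info` and the sample values are hypothetical, since the original example is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical data; the column from the question is not shown in the excerpt.
df = pd.DataFrame({"info": ["abc-def-ghi", "no dash here", "x-y"]})

# Split at most once, on the first dash. Column 0 holds everything before
# the dash, column 1 everything after it.
parts = df["info"].str.split("-", n=1, expand=True)
df["before_dash"] = parts[0]
df["after_dash"] = parts[1]  # missing where the value contains no dash
```

Rows without a dash keep their full text in `before_dash` and get a missing value in `after_dash`.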
Tag: pandas
Plotting average linear regression of data set consisting of missing values
I was trying to plot a linear fit using m, b = np.polyfit(x0, y0, 1); however, when I print m2, b2, m3, b3 I get nan because of the empty values. How do I fix this? Answer You seem to have a typo in It would probably help to rename the variables idxy12,idxy13 and idxy14 or so. You also could write all this wi…
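As a minimal sketch of one common fix (not necessarily the one given in the answer, which is cut off here): `np.polyfit` returns nan coefficients when the input contains nan, so the missing points can be masked out before fitting. The sample arrays are made up for illustration.

```python
import numpy as np

# Illustrative data with one missing y value (not from the question).
x0 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y0 = np.array([1.0, 3.0, np.nan, 7.0, 9.0])

# Keep only the points where both coordinates are finite.
mask = np.isfinite(x0) & np.isfinite(y0)

# Fit a degree-1 polynomial (slope m, intercept b) to the valid points.
m, b = np.polyfit(x0[mask], y0[mask], 1)
```

The same mask would need to be built separately for each x/y pair being fitted.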
Multiple dates in a pandas column
I am trying to make the dates in a Pandas DataFrame all of the same format. Currently I have the DataFrame storing the dates in two formats. “6/08/2017 2:15:00 AM” & 2016-01-01T00:05:00 The column name which these dates are stored under is INTERVAL_END. As you can see, one of the dates is a st…
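One way this could be handled, as a sketch: parse the column once per known format with `errors="coerce"`, then fill the gaps of one result with the other. The day-first reading of "6/08/2017" is an assumption; the excerpt does not say whether the day or the month comes first.

```python
import pandas as pd

# The two sample strings quoted in the question.
df = pd.DataFrame(
    {"INTERVAL_END": ["6/08/2017 2:15:00 AM", "2016-01-01T00:05:00"]}
)

# Parse each format explicitly; strings that do not match become NaT.
iso = pd.to_datetime(
    df["INTERVAL_END"], format="%Y-%m-%dT%H:%M:%S", errors="coerce"
)
# ASSUMPTION: day/month/year order with a 12-hour clock.
other = pd.to_datetime(
    df["INTERVAL_END"], format="%d/%m/%Y %I:%M:%S %p", errors="coerce"
)

# Combine: take the ISO result where it parsed, the other format elsewhere.
df["INTERVAL_END"] = iso.fillna(other)
```

If the data is actually month-first, swapping `%d` and `%m` in the second format string would be the only change needed.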
Multiple XML files in directory Python
I am fairly new to Python and this community has been a great help! I am learning a lot. I’m trying to use this existing code to loop through multiple XML files in the same directory. Currently, the code is looking at one specific file. Any help is greatly appreciated! Answer This should help you…
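The answer's code is cut off in this excerpt. A minimal sketch of the general idea, using only the standard library, might look like the following. The helper name `parse_xml_dir` is hypothetical, and the per-file processing from the question's existing code would go inside the loop.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def parse_xml_dir(directory):
    """Parse every *.xml file in `directory`; return {file name: root element}."""
    roots = {}
    # glob("*.xml") matches only files directly inside the directory.
    for path in sorted(Path(directory).glob("*.xml")):
        roots[path.name] = ET.parse(path).getroot()
    return roots
```

Calling `parse_xml_dir(".")` would process the XML files in the current working directory.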
After a groupby, create a new column with a list of unique values from another column of the grouped values
So I have a dataframe with two columns, artistID and genre: And what I want to do is to group by the column artistID (so the resulting dataframe has as many rows as there are distinct artistID values in this dataframe), and the second column of the new dataframe I want it to be like a list or an array or whatever
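A minimal sketch of one way to get there: group by `artistID` and aggregate `genre` into a list of its unique values. The sample rows are invented, since the question's data is not shown.

```python
import pandas as pd

# Invented sample data with the two columns named in the question.
df = pd.DataFrame(
    {
        "artistID": [1, 1, 2, 2, 2],
        "genre": ["rock", "pop", "jazz", "jazz", "blues"],
    }
)

# One row per artistID; genre becomes a list of that artist's unique genres,
# in order of first appearance.
genres = (
    df.groupby("artistID")["genre"]
    .agg(lambda s: list(s.unique()))
    .reset_index()
)
```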
Converting a dataframe with a line separator
I wrote a function that accepts a dataframe as input: And returns a dataframe, where a certain delimiter number (in the example, it is 6) is the passed parameter: Here’s what I got: How can I simplify the function and make it more versatile? How do I make the function faster? Thanks. Answer You can do th…
Get the max value from each group with pandas.DataFrame.groupby
I need to aggregate two columns of my dataframe, count the values of the second column and then take only the row with the highest value in the “count” column, let me show: so far so good, but now I need to get only the row of each ‘col1’ group that has the maximum ‘count’…
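A sketch of one common pattern for this, assuming a second column named `col2` (only `col1` and `count` are named in the excerpt) and invented data: count the pairs, then use `idxmax` to pick the row with the largest count within each `col1` group.

```python
import pandas as pd

# Invented data; "col2" is an assumed name for the second column.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "a", "b", "b"],
        "col2": ["x", "x", "y", "z", "z"],
    }
)

# Step 1: count occurrences of each (col1, col2) pair.
counts = df.groupby(["col1", "col2"]).size().reset_index(name="count")

# Step 2: for each col1 group, keep the row whose count is largest.
top = counts.loc[counts.groupby("col1")["count"].idxmax()]
```

On ties, `idxmax` keeps only the first row with the maximum count in each group.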
I’m getting a float axis even with MaxNLocator(integer=True)
I have this df called normales: With this code I’m plotting time series and bars: As you can see, I’m using ax.yaxis.set_major_locator(MaxNLocator(integer=True)) on every axis to force integer tick values. Although I’m using ax.yaxis.set_major_locator(MaxNLocator(integer=True…
python transform 1d array of probabilities to 2d array
I have an array of probabilities: and I want to make it 2d array: What is the best way to do so? Answer One idea is use numpy.hstack: Or use numpy.c_:
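The arrays from the question and answer are not shown in this excerpt, so the following is only a sketch under a loud assumption: that the desired 2D array pairs each probability `p` with its complement `1 - p`. Both calls named in the answer, `numpy.hstack` and `numpy.c_`, are shown.

```python
import numpy as np

# Invented probabilities; the original array is not shown.
p = np.array([0.1, 0.8, 0.5])

# ASSUMPTION: the target layout is one row per probability: [1 - p, p].

# Option 1: reshape both to column vectors, then stack them side by side.
two_d_hstack = np.hstack([(1 - p)[:, None], p[:, None]])

# Option 2: np.c_ places 1D arrays next to each other as columns.
two_d_c = np.c_[1 - p, p]
```

Both options produce an array of shape `(3, 2)` here.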
How to replace the ‘,’ between two numbers like X,X% into X.X% in all the dataframe python
I have a column in a pandas data frame like below. The column name is ‘ingredients_text’. Now I want to replace all the values like 5,5% with 5.5% in this column throughout the dataframe. Answer We can use str.replace here: The pattern \b(\d+),(\d+)% matches in the first and second capture groups, respectively, t…
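The answer's code is not included in this excerpt; a sketch consistent with the pattern it describes might look like this. The sample strings are invented, and the replacement string `\1.\2%` is an assumption about how the two capture groups are rejoined.

```python
import pandas as pd

# Invented sample values for the column named in the question.
df = pd.DataFrame(
    {"ingredients_text": ["sugar 5,5%, salt, water 12,25%", "flour, eggs"]}
)

# \b(\d+),(\d+)% captures the digits on each side of the comma; the
# replacement rejoins them with a dot and keeps the percent sign.
df["ingredients_text"] = df["ingredients_text"].str.replace(
    r"\b(\d+),(\d+)%", r"\1.\2%", regex=True
)
```

Commas that are not between digits and followed by a percent sign, such as list separators, are left untouched.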