Tag: pandas

Looking for the difference between occurrences in a dataframe

I have a dataframe like this (the real one has 7 million records and 345 features). The following image shows only a small fraction, indicating whether a client made an operation in a given month. What I want to do is create a column at the end with the mean difference between each operation. For example in the first record

Update column based on other column condition

pandas python

I need to update vid, or maybe create a new column, based on the change column: df = [{'vid': 14, 'change': 0}, {'vid': 15, 'change': 1}, {'vid': 16, 'change': 0}, {'vid': 16, 'change': 0}, {'vid': 17, …

How to split parallel corpora while keeping alignment?

dataset pandas python scikit-learn unix

I have two text files containing parallel text in two languages (potentially millions of lines). I am trying to generate random train/validate/test files from that single file, as train_test_split does in sklearn. However, when I try to import it into pandas using read_csv, I get errors from many of the lines b…
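One way to keep the two languages aligned while splitting is to pair the lines first and shuffle the pairs, so each source line travels with its translation. The following is a minimal sketch, not the approach from the original thread: the function name `split_parallel`, the 80/10/10 ratios, and the in-memory sample lists are all assumptions for illustration, and loading the two files into lists is left out.

```python
import random

def split_parallel(src_lines, tgt_lines, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle aligned line pairs and cut them into train/validate/test parts."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("both sides must have the same number of lines")
    pairs = list(zip(src_lines, tgt_lines))  # pairing preserves alignment
    random.Random(seed).shuffle(pairs)       # seeded, so the split is reproducible
    n_train = int(len(pairs) * ratios[0])
    n_valid = int(len(pairs) * ratios[1])
    train = pairs[:n_train]
    valid = pairs[n_train:n_train + n_valid]
    test_set = pairs[n_train + n_valid:]     # whatever remains
    return train, valid, test_set

# Hypothetical sample data standing in for the two text files.
src = [f"src {i}" for i in range(10)]
tgt = [f"tgt {i}" for i in range(10)]
train, valid, test_set = split_parallel(src, tgt)
```

Each part can then be written back to two files by unzipping the pairs, for example `zip(*train)`.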

Pandas: Remove Column Based on Threshold Criteria

dataframe excel numpy pandas python

I have to solve this problem: Objective: Drop columns in which most rows are missing. Inputs: 1. Dataframe df: pandas dataframe. 2. threshold: determines which columns will be dropped. If threshold is .9, the columns with 90% missing values will be dropped. Outputs: 1. Dataframe df with dropped columns (if no column…
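The task described above can be sketched with `DataFrame.isna().mean()`, which gives the fraction of missing values per column. This is only an illustration: the function name `drop_sparse_columns` and the sample data are hypothetical, and it assumes "90% missing" means a missing share greater than or equal to the threshold.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Return df without columns whose share of missing values is >= threshold."""
    missing_share = df.isna().mean()  # fraction of NaN values in each column
    return df.loc[:, missing_share < threshold]

# Hypothetical sample data.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],  # 75% missing
    "b": [1.0, 2.0, np.nan, 4.0],        # 25% missing
    "c": [np.nan] * 4,                   # 100% missing
})
result = drop_sparse_columns(df, threshold=0.5)
```

If no column reaches the threshold, the mask keeps every column and the dataframe comes back unchanged.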

How to create subplots from each column in a pandas dataframe

pandas plotly plotly-python python

I have a dataframe ‘df’ with 36 columns, these columns are plotted onto a single plotly chart and displayed in html format using the code below. I want to iterate through each column and create a subplot for each one. I have tried; I created 6 rows and columns as that would give 36 plots and tried…

Remove timezone (+01:00) from DateTime

datetime pandas python

I would like to delete the timezone from my datetime object. Currently I have: 2019-02-21 15:31:37+01:00 Expected output: 2019-02-21 15:31:37 The code I have converts it to: 2019-02-21 14:31:37. Answer: In the first line, the parameter utc=True is not necessary as it converts the input to UTC (subtracting one …
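The difference can be sketched as follows, assuming the values arrive as strings in a pandas Series (the original code is not shown in the excerpt, so the variable names here are illustrative). Parsing without `utc=True` keeps the +01:00 wall-clock time, and `Series.dt.tz_localize(None)` then drops the offset without shifting the time.

```python
import pandas as pd

s = pd.Series(["2019-02-21 15:31:37+01:00"])

# Parsing without utc=True keeps the original +01:00 offset.
local = pd.to_datetime(s)
naive_local = local.dt.tz_localize(None)   # drops the offset, keeps the wall time

# With utc=True the value is first converted to UTC (one hour earlier here).
as_utc = pd.to_datetime(s, utc=True)
naive_utc = as_utc.dt.tz_localize(None)
```

Here `naive_local` holds 2019-02-21 15:31:37, while `naive_utc` holds 2019-02-21 14:31:37.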

Pandas – Duplicate Rows and Slice String

duplicates pandas python string

I’m trying to create duplicate rows in a dataframe based on conditions. For example, I have this Dataframe. And I would like to get the following output: Answer: For pandas 0.25+, it is possible to use DataFrame.explode with values split by Series.str.split, and for the remark column a list comprehension with filteri…
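The `Series.str.split` plus `DataFrame.explode` combination mentioned in the answer can be sketched like this. The column names `id` and `codes`, the comma separator, and the sample rows are assumptions, since the original dataframe is not shown, and the remark-column step is left out because the excerpt is cut off before describing it.

```python
import pandas as pd

# Hypothetical input: one row per id, several values packed into one string.
df = pd.DataFrame({"id": [1, 2], "codes": ["a,b", "c"]})

out = (
    df.assign(codes=df["codes"].str.split(","))  # string -> list of parts
      .explode("codes")                          # one row per list element (pandas 0.25+)
      .reset_index(drop=True)                    # explode repeats the original index
)
```

The other columns are duplicated for every element of the list, which produces the repeated rows.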

Is it possible to display pandas styles in the IPython console?

console ipython pandas pandas-styles python

Is it possible to display pandas styles in an IPython console? The following code in a Jupyter notebook correctly produces In the console I only get Is it possible to achieve a similar result here, or is the style engine dependent on an html frontend? Thanks in advance for any help. Answer: I believe that the …

Drop rows that contain data between specific dates

dataframe numpy pandas python

The file contains data by date and time: All I want is to drop the rows that fall between these dates, including the start and end dates: Any idea? Answer: Sample: Use boolean indexing to filter by condition, chaining the conditions with | for bitwise OR: Or filter by Series.between and invert the mask with ~:
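Both approaches named in the answer can be sketched as below. The column names, sample dates, and boundaries are assumptions, since the original data and code are not included in the excerpt.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-05", "2020-01-10", "2020-01-15", "2020-01-20"]
    ),
    "value": [1, 2, 3, 4, 5],
})
start, end = pd.Timestamp("2020-01-05"), pd.Timestamp("2020-01-15")

# Option 1: keep rows strictly before start OR strictly after end.
kept_or = df[(df["date"] < start) | (df["date"] > end)]

# Option 2: Series.between includes both endpoints by default; ~ inverts the mask.
kept_between = df[~df["date"].between(start, end)]
```

Both variants drop the rows dated 2020-01-05 through 2020-01-15 inclusive and keep the rest.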