Tag: pandas

Python project – Writing contents of .txt file to Pandas dataframe

I’m currently working on a Python project where I want to: loop through subdirectories of a root directory; find .txt files with names starting with ‘memory_’ (the .txt files are newline-separated, and each line consists of a ‘colName: Value’ pair, like this); append the contents of the .txt…
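The excerpt is cut off, so the following is only a minimal sketch of the steps it lists, not the asker's code. The function names `parse_memory_file` and `load_memory_files` are hypothetical, and the sketch assumes one record per file with every value kept as a string.

```python
from pathlib import Path

import pandas as pd


def parse_memory_file(path):
    """Parse one file of 'colName: Value' lines into a dict (hypothetical helper)."""
    record = {}
    for line in Path(path).read_text().splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so values may contain colons themselves.
        key, value = line.split(":", 1)
        record[key.strip()] = value.strip()
    return record


def load_memory_files(root):
    """Collect every memory_*.txt under root into one DataFrame, one row per file."""
    # rglob searches recursively; it also matches files directly inside root.
    paths = sorted(Path(root).rglob("memory_*.txt"))
    return pd.DataFrame([parse_memory_file(p) for p in paths])
```

Building the list of dicts first and constructing the DataFrame once tends to be simpler and faster than appending to a DataFrame inside the loop.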

How to add prefix to the selected records in python pandas df

dataframe pandas python

I have a df where some of the records in a column contain a prefix and some do not. I would like to update the records without the prefix. Unfortunately, my script adds the desired prefix to every record in the df. How can I omit the records that already have the prefix? I’ve tried with df[df[‘ids’].str.contains(prefix)…
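One possible approach, sketched under assumptions: the column name `ids` comes from the excerpt, while the prefix `"ID_"`, the sample values, and the helper name `add_missing_prefix` are made up for illustration. It also assumes the column holds strings with no missing values.

```python
import pandas as pd


def add_missing_prefix(df, column, prefix):
    """Return a copy of df where `prefix` is added only to values lacking it."""
    out = df.copy()
    # startswith does a literal match anchored at the start of the string,
    # whereas str.contains treats its argument as a regex and matches anywhere.
    missing = ~out[column].str.startswith(prefix)
    out.loc[missing, column] = prefix + out.loc[missing, column]
    return out


# Hypothetical sample data.
df = pd.DataFrame({"ids": ["ID_1", "2", "ID_3", "4"]})
result = add_missing_prefix(df, "ids", "ID_")
```

Assigning through `.loc` with a boolean mask updates only the selected rows, which avoids prefixing every record.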

Remove a substring from a list of string using Pandas

pandas python substring

I’ve searched for posts about removing a substring, but unfortunately those solutions don’t work in my case. Could you help me? Input: Output: Answer: You don’t even need pandas for that. A simple list comprehension should be enough.
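The input and output from the original post are not shown here, so this sketch of the list-comprehension idea uses made-up strings and a hypothetical helper name, `remove_substring`.

```python
def remove_substring(strings, substring):
    """Remove every occurrence of `substring` from each string in the list."""
    return [s.replace(substring, "") for s in strings]


# Hypothetical sample data; the question's real input is not available.
cleaned = remove_substring(["key_alpha", "beta_key", "gamma"], "key")
```

Strings that do not contain the substring pass through unchanged.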

Plotting row based data

pandas pandas-groupby python

I have some data that looks something like this: What I would like to do is to plot the values of the header dates, but grouped by id and period. So essentially this would become 6 line plots, with the x-axis given as the dates. However, maybe I’m just tired, but this data set is weirdly put together im…

Group by column on basis of occurrences

pandas python

This is my dataframe: This is what my dataframe looks like: I first counted the number of items for each user using the following code: This is how it looks: I want to remove any user who appears only once and whose observation is also 1, for example users nemo and cokie. But I don’t want to remove user alex, even

pandas datetime doesn’t convert the dates properly in python

pandas python

I have a dataframe, data. I want to convert the dat column, which is currently in dd-mm-yy format, to “YYYY-MM-DD” format. Code used: The output comes out wrong. Problems: the year was supposed to come out as 1968, not 2068; the months and days are also not in the proper order. R…
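A likely cause of the 2068 result is the two-digit year: under the usual `%y` convention, values 00 to 68 map to 2000 to 2068. The sketch below is one way to handle it, not the accepted answer. The helper name `parse_ddmmyy` and the `latest_year` cutoff are assumptions for illustration. Passing an explicit `format` also pins the day and month order.

```python
import pandas as pd


def parse_ddmmyy(series, latest_year):
    """Parse dd-mm-yy strings, moving years after `latest_year` back a century."""
    # An explicit format stops pandas from guessing day/month order.
    parsed = pd.to_datetime(series, format="%d-%m-%y")
    in_future = parsed.dt.year > latest_year
    # Keep in-range dates as they are; shift the others back by 100 years.
    return parsed.where(~in_future, parsed - pd.DateOffset(years=100))


# Hypothetical sample values for the `dat` column.
dates = parse_ddmmyy(pd.Series(["15-03-68", "01-12-05"]), latest_year=2024)
formatted = dates.dt.strftime("%Y-%m-%d")
```

The cutoff is a judgment call about the data: any parsed year later than `latest_year` is assumed to belong to the previous century.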

How can I select top k rows based on another dataframe in python?

dataframe numpy pandas python

I have data as follows. Users are 1001 to 1004 (but actual data has one million users). Each user has corresponding probabilities for the variables AT1 to AT6. I would like to select the top 3 users for each choice based on the following data. In the output, top1 to top3 are the top 3 users based on probabili…

Accessing pandas cell value using df.itertuples() and column name gives AttributeError

dataframe loops pandas python

I have the following dataframe from which I want to retrieve the cell values using index and column names. The left column indicates the index values whereas the column names are from 1 to 5. This is a dummy dataframe which looks small but going forward I will be using this code to access a dataframe with 100…
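A plausible explanation, offered as an assumption since the question is truncated: `itertuples()` builds namedtuples, and column labels such as 1 to 5 are not valid attribute names, so they are replaced with positional names and attribute lookup by the original label fails. The sketch below sidesteps attribute access by pairing plain tuples with `df.columns`. The sample frame and the helper name `cell_values` are hypothetical.

```python
import pandas as pd


def cell_values(df):
    """Map (index label, column label) to the cell value using itertuples."""
    out = {}
    # name=None yields plain tuples, so no attribute names are involved.
    for idx, *values in df.itertuples(index=True, name=None):
        for col, value in zip(df.columns, values):
            out[(idx, col)] = value
    return out


# Hypothetical frame with integer column labels, as described in the question.
df = pd.DataFrame([[10, 20], [30, 40]], index=["a", "b"], columns=[1, 2])
lookup = cell_values(df)
```

For a single cell, `df.at["b", 2]` reads the value directly by index and column label without any loop.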

Pandas: Unable to merge on two date columns

dataframe numpy pandas python python-datetime

I have two dataframes that look like: df1: df2: Both date columns have been made using the pd.to_datetime() method, and they both supposedly have <M8[ns] data types when using df1.Date.dtype and df2.Date.dtype. However when trying to merge the dataframes with pd.merge(df,hpi,how=”left”,on=”…

pandas consecutive Boolean event rollup time series

pandas python

Here’s some made-up time series data at 1-minute intervals: This is just some code to create some Boolean columns. On my screen this prints: What I am trying to figure out is how to roll up cumulative events (True or 1) per hour, but if there is no 0 between events, it’s the same event! Hopefully that makes…
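One reading of the requirement, sketched under assumptions: an event is a run of consecutive True values, and each run is counted once, in the hour where it starts. The helper name `hourly_event_counts` and the sample series are hypothetical, and the original data is not shown.

```python
import pandas as pd


def hourly_event_counts(flags):
    """Count runs of consecutive True values per hour, by the hour each run starts."""
    # A run starts where the current value is True and the previous one is not.
    starts = flags & ~flags.shift(fill_value=False)
    return starts.resample(pd.Timedelta(hours=1)).sum()


# Hypothetical Boolean series on a datetime index (20-minute steps for brevity).
idx = pd.date_range("2024-01-01", periods=6, freq="20min")
flags = pd.Series([True, True, False, True, False, True], index=idx)
counts = hourly_event_counts(flags)
```

A run that continues across an hour boundary is counted only in the hour where it began, which may or may not match what the asker intended.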