I have two DataFrames of 20 rows and 4 columns. The names and value types of the columns are the same. One of the columns is the title, the other 3 are values. Now I would like to create 3 separate tables/lists subtracting each value of df1.col1 – df2.col1 | df1.col2 – df2.col2 | df1.col3 – …
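A minimal sketch of the column-wise subtraction described above. The column names (`title`, `col1`, `col2`, `col3`) and the sample values are assumptions for illustration, and the sketch assumes each title appears once in each frame:

```python
import pandas as pd

# Hypothetical data: the column names and values are assumptions, not from the question.
df1 = pd.DataFrame({"title": ["a", "b", "c"],
                    "col1": [10, 20, 30],
                    "col2": [1.0, 2.0, 3.0],
                    "col3": [5, 5, 5]})
df2 = pd.DataFrame({"title": ["a", "b", "c"],
                    "col1": [1, 2, 3],
                    "col2": [0.5, 0.5, 0.5],
                    "col3": [5, 4, 3]})

value_cols = ["col1", "col2", "col3"]

# Index both frames by title so rows are matched by name rather than by position.
left = df1.set_index("title")[value_cols]
right = df2.set_index("title")[value_cols]
diff = left - right

# One list of differences per value column.
diff_lists = {col: diff[col].tolist() for col in value_cols}
```

If the rows are already in the same order in both frames, subtracting `df1[value_cols] - df2[value_cols]` directly would likely give the same result.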
Tag: pandas
How to label multi-word entities?
I’m quite new to data analysis (and Python in general), and I’m currently a bit stuck in my project. For my NLP task I need to create training data, i.e. find specific entities in sentences and label them. I have multiple CSV files containing the entities I am trying to find, many of them consisti…
Delete characters on python dataframe, the number of characters removed per line varies
I want to remove 0 characters from the right of the first line, 8 characters from the right of the second line. The resulting data will have the following form: Thank you very much, everyone. I am a newbie and my English is not very good. I hope everyone can help me. Answer: You can use Pandas’ string methods…
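The answer's code is cut off above, so the following is only a plain-Python sketch of one way to drop a different number of trailing characters per row. The column names `text` and `n_remove` and the sample values are assumptions:

```python
import pandas as pd

# Hypothetical data: "text" holds the strings, "n_remove" the per-row count to drop.
df = pd.DataFrame({"text": ["abcdefghij", "abcdefghij", "abc"],
                   "n_remove": [0, 8, 1]})

def drop_right(s, n):
    # s[:-0] would return an empty string, so compute the end index explicitly.
    return s[: max(len(s) - n, 0)]

df["trimmed"] = [drop_right(s, n) for s, n in zip(df["text"], df["n_remove"])]
```

Computing the end index explicitly keeps the zero-characters case (the first row) intact.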
Extracting features from dataframe
I have a pandas dataframe like this: For example, if “ex” starts with 533, 535, or 545, the new variable should be: Sample output: How can I do that? Answer: You can use np.where: Update: You can also use your Phone column directly: Note: If the Phone column contains strings, you can safely remove .astype(str).
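The answer's code is not shown in the excerpt. As a rough sketch of the `np.where` idea on the Phone column, assuming numeric phone values and a hypothetical 1/0 output column named `flag`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the phone numbers and the "flag" column name are assumptions.
df = pd.DataFrame({"Phone": [5331234567, 5359876543, 5051112233, 5450001122]})

prefixes = ["533", "535", "545"]

# Convert to string, take the first three characters, and test membership.
is_match = df["Phone"].astype(str).str[:3].isin(prefixes)

# np.where picks 1 where the prefix matches and 0 elsewhere.
df["flag"] = np.where(is_match, 1, 0)
```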
Group values of a chosen column into a list when creating a dictionary from a pandas data frame with a non-unique index
I have a dataframe that looks like this: I want to get a dictionary structure that looks as follows: I have seen this answer, but it seems like overkill for what I want to do, as it converts every value of the key inside the nested dictionary into a list. I would only like to convert col1 into a list.
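The desired dictionary is not shown in the excerpt, so this sketch guesses at its shape: one entry per index value, with `col1` collected into a list and every other column kept as a single value. The column `col2` and the sample data are assumptions:

```python
import pandas as pd

# Hypothetical data with a non-unique index.
df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": ["x", "x", "y"]},
    index=["a", "a", "b"],
)

# Group by the index; gather col1 into a list, keep the first value of col2.
result = (
    df.groupby(level=0)
      .agg({"col1": list, "col2": "first"})
      .to_dict(orient="index")
)
```

Using `"first"` assumes the other columns are constant within each index value; if they are not, a different aggregation would be needed.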
Create new dataframe that contain the average value from some of the columns in the old dataframe
I have a dataframe extracted from a csv file. I want to iterate a data process where only some of the columns’ data is the mean of n rows, while the rest of the columns take the first row for each iteration. For example, the data extracted from the csv consisted of 100 rows and 6 columns. I have a
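One possible way to do this without an explicit loop is to group consecutive blocks of n rows and aggregate each column differently. The column names, the value of `n`, and the choice of which columns to average are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 6 rows stand in for the 100 rows mentioned in the question.
df = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f"],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [10, 20, 30, 40, 50, 60],
})

n = 3                    # rows per block (assumed)
mean_cols = ["x", "y"]   # columns to average (assumed)

# Average the chosen columns, take the first row's value for the rest.
agg_map = {col: ("mean" if col in mean_cols else "first") for col in df.columns}

# Integer division of the row position assigns each row to a block of n rows.
block = np.arange(len(df)) // n
out = df.groupby(block).agg(agg_map).reset_index(drop=True)
```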
Majority of my column headers are dates in my dataframe, not able to use the loc function – how do I fix this?
I have a dataframe that shows the number of downloads for each show, where every month is a column, with the actual start of each month being the data column name. df looks like this below:
Show    2017-08-01 00:00:00  2017-09-01 00:00:00  2017-10-01 00:00:00
Show 1  23004                50320                450320
Show 2  30418                74021                92103
Howe…
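The question is cut off, so the exact problem with loc is unknown. Assuming the date headers are pandas Timestamp objects rather than strings, two common workarounds can be sketched as follows:

```python
import pandas as pd

# Rebuild the sample table; storing the headers as Timestamps is an assumption.
df = pd.DataFrame(
    [["Show 1", 23004, 50320, 450320], ["Show 2", 30418, 74021, 92103]],
    columns=["Show",
             pd.Timestamp("2017-08-01"),
             pd.Timestamp("2017-09-01"),
             pd.Timestamp("2017-10-01")],
)

# Option 1: pass a Timestamp to loc so the label type matches the header type.
aug = df.loc[:, pd.Timestamp("2017-08-01")]

# Option 2: rename the date headers to strings, then select by string.
renamed = df.rename(
    columns=lambda c: c.strftime("%Y-%m") if isinstance(c, pd.Timestamp) else c
)
sep = renamed.loc[:, "2017-09"]
```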
How to create a list that groups columns according to their type?
I have a dataframe that contains over 200 columns. I need to create two lists that group columns by type. Rather than creating these lists manually, I tried different methods that did not work: or: or: Can anyone help me solve this problem? Answer: I believe that for this question you can use df.select_dtypes li…
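The answer is truncated, but a minimal sketch of how `select_dtypes` can produce the two lists might look like this. The split into numeric and non-numeric columns, and the sample columns, are assumptions:

```python
import pandas as pd

# Hypothetical data: four columns stand in for the 200+ in the question.
df = pd.DataFrame({
    "age": [30, 41],
    "score": [0.5, 0.9],
    "name": ["a", "b"],
    "city": ["x", "y"],
})

# Numeric columns (ints and floats) go in one list, everything else in the other.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()
```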
Rename a row with X unknown characters
If I have the following dataframe:
ID       other
219218   34
823#32   47
unknown  42
8#3#32   32
1#3#5#   97
6#3###   27
I want to obtain the following result:
ID       other
219218   34
823#32   47
unknown  42
8#3#32   32
unknown  97
unknown  27
I am using the following code which works. Is there a way to make it more optimal, bearing in
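The asker's code is not shown. Judging only from the sample rows, IDs with three or more `#` characters become "unknown"; that threshold is inferred, not stated. A vectorized sketch under that assumption:

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "ID": ["219218", "823#32", "unknown", "8#3#32", "1#3#5#", "6#3###"],
    "other": [34, 47, 42, 32, 97, 27],
})

max_hashes = 2  # assumed threshold: more than 2 '#' characters means "unknown"

# Count '#' per ID and overwrite the rows that exceed the threshold.
too_many = df["ID"].str.count("#") > max_hashes
df.loc[too_many, "ID"] = "unknown"
```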
Python: convert a single JSON column to multiple columns
I have a data frame with one JSON column and I want to split it into multiple columns. Here is the df I’ve got. I want the output as below: I’ve tried Both didn’t work. Can someone please tell me how to get the output that I want? Answer: One way using pandas.DataFrame.explode: Output:
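Neither the data nor the answer's explode-based code is shown, so the following is a different, simpler sketch rather than the answer's method. It assumes each cell holds a JSON string encoding a flat object; the column names `id` and `payload` are hypothetical:

```python
import json
import pandas as pd

# Hypothetical data: each "payload" cell is a JSON string for a flat object.
df = pd.DataFrame({
    "id": [1, 2],
    "payload": ['{"a": 1, "b": "x"}', '{"a": 2, "b": "y"}'],
})

# Parse each JSON string into a dict.
parsed = df["payload"].apply(json.loads)

# Turn the list of dicts into a frame with one column per key.
expanded = pd.json_normalize(parsed.tolist())

# Attach the new columns to the remaining original columns.
out = pd.concat([df.drop(columns="payload"), expanded], axis=1)
```

If each cell held a list of objects instead, calling `explode` on the parsed column first, as the answer suggests, would likely be the better starting point.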