I have a dataframe with the following structure: I’d like to know, grouping by group, how many nulls there are in each column. In this case, the output should be: I don’t control how many columns there are or what they are named. Thanks! Answer Convert the group column to the index, test all other values f…
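A minimal sketch of the idea in that answer, assuming it continues along the usual lines (move the group column to the index, flag missing values, sum per group). The sample frame and the column names x and y are hypothetical, since the original data is not shown in the excerpt:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the frame from the question is not shown here.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "x": [1.0, np.nan, np.nan, np.nan],
    "y": [np.nan, 2.0, 3.0, 4.0],
})

# Move the grouping column into the index so isna() only tests value columns,
# then sum the boolean mask within each group to count the nulls.
null_counts = df.set_index("group").isna().groupby(level="group").sum()
print(null_counts)
```

Because the value columns are never named explicitly, this should work regardless of how many columns there are or what they are called.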
Tag: python
New column based on values from other columns AND respecting pre-established rules
I’m looking for an algorithm to create a new column based on values from other columns AND respecting pre-established rules. Here’s an example: artificial data The goal is to create a new_column based on the values of col_1, col_2, and col_3. For that, the rules are: If the value ‘Yes…
Pandas – How to use multiple cols for mapping (without merging)?
I have a dataframe like the one below. I would like to do the following: a) Attach the location column from key_df to data_df based on two fields, p_id and company. So I tried the below, but this resulted in an error like the following: KeyError: "None of [Index(['p_id', 'company'], dtype='o…
Element-wise “contains” in Python
Say I have an array: then arr>3 results in an array of type bool with shape (20,). How can I most efficiently do the same thing with the “contains” operator? The simple approach results in “The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()”. Is t…
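One common way to get an element-wise membership test in NumPy is np.isin. The sketch below is not taken from the thread; the array contents and the allowed values are hypothetical:

```python
import numpy as np

arr = np.arange(20)        # hypothetical array of shape (20,)
allowed = [3, 7, 11]       # hypothetical values to test membership against

# np.isin tests each element of arr against allowed and returns a
# boolean array with the same shape as arr, much like arr > 3 does.
mask = np.isin(arr, allowed)
print(mask.shape, mask.dtype)
```

The resulting mask can be used for boolean indexing, for example arr[mask].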
Apply T-Test test per group
I have a dataframe like this: I want to calculate the p-value from a t-test for each variable between groups. I can manually calculate each p-value like this: So the question is, how can I get a result dataframe like the one shown below for all variables automatically? Answer There are several ways, the core idea is to us…
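The answer is cut off, so the following is only one possible sketch and not necessarily the approach it describes. It assumes exactly two groups, a column named group, and independent samples tested with scipy.stats.ttest_ind; the data and the names var1, var2, A, and B are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: two groups with ten observations each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": ["A"] * 10 + ["B"] * 10,
    "var1": rng.normal(0, 1, 20),
    "var2": rng.normal(0, 1, 20),
})

# Every column except the grouping column is treated as a variable to test.
variables = [c for c in df.columns if c != "group"]
a = df[df["group"] == "A"]
b = df[df["group"] == "B"]

# Run one independent two-sample t-test per variable and collect the p-values.
result = pd.DataFrame({
    "variable": variables,
    "p_value": [stats.ttest_ind(a[v], b[v]).pvalue for v in variables],
})
print(result)
```

With more than two groups, the same loop would need to run over pairs of groups or be replaced by a different test.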
Set random labels for images in tf.data.Dataset
I have a tf.data dataset of images with the signature seen below: All the labels in this dataset are 0. What I would like to do is change each of these labels to a random number from 0 to 3. My code is: This, however, just assigns 1 as the label for all images. The strange
Java must be installed on this system to use this when using dataflow flex template python
I’m using the SQL transform of the apache_beam Python SDK and deploying to Dataflow with a Flex Template. The pipeline shows the error: Java must be installed on this system to use. I know the SQL transform in Beam Python uses Java. I researched ways to add Java to the pipeline, but all of them failed. Can you give any advice on ho…
Sklearn – Best estimator from GridSearchCV with refit = True
I’m trying to find the best estimator using GridSearchCV, and I’m using refit = True, which is the default. Given that the documentation states: Should I do .fit on the training data afterwards as such: Or should I do it like this instead: Answer You should do it like your first version. You need to alwa…
Add additional timestamp to Pandas DataFrame items based on item timestamp/index
I have a large time-indexed Pandas DataFrame with time-series data from a couple of devices. The structure of this DataFrame (self._combined_data_frame in the code below) looks like this: The DatetimeIndex and device_name are filled for every row; the other columns contain NaN values. Sample data is available on Go…
Prediction with keras embedding leads to indices not in list
I have a model that I trained with: For the embedding, I use GloVe as a pre-trained embedding dictionary. I first build the tokenizer and text sequence with: t = Tokenizer() t.fit_on_texts(all_text) and then calculate the embedding matrix with: Now I’m using a new dataset for the predict…