Tag: python

Pandas groupby column and sum nulls of all other columns

I have a dataframe with the following structure: I’d like to know, grouping by group, how many nulls there are in each column. In this case, the output should be: I don’t have control over how many columns I have or their names. Thanks! Answer: Convert the group column to the index, test all other values f…
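
A minimal sketch of the idea the answer excerpt starts to describe: move the grouping column into the index, flag missing values in the remaining columns, and sum the flags per group. The column names and data below are hypothetical, since the original frame is not shown in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "x": [1.0, np.nan, np.nan, np.nan, 5.0],
    "y": [np.nan, 2.0, 3.0, np.nan, 6.0],
})

# With "group" in the index, isna() only tests the other columns,
# however many there are and whatever they are called.
null_counts = df.set_index("group").isna().groupby(level=0).sum()
print(null_counts)
```

Because the other columns are never named explicitly, the same line should work when their number or names change.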

New column based on values from other columns AND respecting pre-established rules

data-wrangling python r

I’m looking for an algorithm to create a new column based on values from other columns AND respecting pre-established rules. Here’s an example: artificial data The goal is to create a new_column based on the values of col_1, col_2, and col_3. For that, the rules are: If the value ‘Yes…

Element-wise “contains” in Python

python

Say I have an array: then arr>3 results in an array of type bool with shape (20,). How can I most efficiently do the same thing with the “contains” operator? The simple will result in “The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()”. Is t…
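
One possible approach, assuming the array is a NumPy array and the goal is a boolean mask like the one `arr>3` produces: `np.isin` tests each element for membership separately. The array contents and the `allowed` list below are hypothetical, as the original array is not shown in the excerpt.

```python
import numpy as np

arr = np.arange(20)       # hypothetical input with shape (20,)
allowed = [2, 3, 5, 7]    # hypothetical values to test membership against

# An expression such as `arr in allowed` raises the "truth value is
# ambiguous" error. np.isin checks every element on its own and returns
# a boolean array with the same shape as arr.
mask = np.isin(arr, allowed)
print(mask.shape, mask.dtype)
```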

Apply T-test per group

pandas pandas-apply python

I have a dataframe like this: And I want to calculate the p-value from a T-test for each variable between groups. I can manually calculate each p-value like this: So the question is how can I get a result dataframe like the one shown below for all variables automatically? Answer There are several ways, the core idea is to us…
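
A rough sketch of one way to automate this, assuming exactly two groups and using `scipy.stats.ttest_ind`. It is not necessarily the method the truncated answer goes on to describe. The column names (`group`, `var1`, `var2`), the group labels, and the layout of the result frame are all hypothetical, since the original data and expected output are not shown in the excerpt.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: two groups, two numeric variables.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": ["A"] * 10 + ["B"] * 10,
    "var1": rng.normal(size=20),
    "var2": rng.normal(size=20),
})

a = df[df["group"] == "A"]
b = df[df["group"] == "B"]
variables = [c for c in df.columns if c != "group"]

# One independent two-sample t-test per variable, collected into a frame.
result = pd.DataFrame({
    "variable": variables,
    "p_value": [stats.ttest_ind(a[c], b[c]).pvalue for c in variables],
})
print(result)
```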

Set random labels for images in tf.data.Dataset

python python-3.x tensorflow tensorflow-datasets tensorflow2.0

I have a tf.data dataset of images with a signature as seen below: All the labels in this dataset are 0. What I would like to do is change each of these labels to a random number from 0 to 3. My code is: This however just assigns 1 to all images as a label. The strange

“Java must be installed on this system to use this” when using Dataflow Flex Template with Python

apache-beam java python sql

I’m using the SQL transform of the apache_beam Python SDK and deploying to Dataflow with a Flex Template. The pipeline shows the error: Java must be installed on this system to use. I know the SQL transform in Beam Python uses Java. I researched ways to add Java to the pipeline, but all of them failed. Can you give any advice on ho…

Sklearn – Best estimator from GridSearchCV with refit = True

python scikit-learn

I’m trying to find the best estimator using GridSearchCV and I’m using refit = True, which is the default. Given that the documentation states: Should I do .fit on the training data afterwards as such: Or should I do it like this instead: Answer You should do it like your first version. You need to alwa…
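
For context, a small sketch of what `refit=True` does in scikit-learn: after `GridSearchCV.fit`, `best_estimator_` has already been refit on all the data passed to `fit`, and `grid.predict` delegates to it. The dataset, estimator, and parameter grid below are hypothetical stand-ins, and the sketch does not reconstruct either of the two variants from the question, which are not shown in the excerpt.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Hypothetical setup: iris data and an SVC with a small grid over C.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)  # refit=True is the default
grid.fit(X_train, y_train)

# best_estimator_ was refit on the whole of X_train with the best parameters,
# so predicting through the grid or through the estimator gives the same result.
best = grid.best_estimator_
same = bool((grid.predict(X_test) == best.predict(X_test)).all())
print(grid.best_params_, same)
```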

Add additional timestamp to Pandas DataFrame items based on item timestamp/index

dataframe pandas python

I have a large time-indexed Pandas DataFrame with time-series data of a couple of devices. The structure of this DataFrame (in code below self._combined_data_frame) looks like this: The DatetimeIndex and device_name are filled for every row; the other columns contain NaN values. Sample data is available on Go…

Prediction with keras embedding leads to indices not in list

keras python tensorflow word-embedding

I have a model that I trained with For the embedding I use GloVe as a pre-trained embedding dictionary. Where I first build the tokenizer and text sequence with: t = Tokenizer() t.fit_on_texts(all_text) and then I’m calculating the embedding matrix with: Now I’m using a new dataset for the predict…