
pyspark – How to define MapType for when/otherwise

I have a PySpark DataFrame with a MapType column that contains either a map<string, int> or None. I need to perform some calculations using collect_list, but collect_list excludes None values. I am trying to find a workaround by transforming None to a string, similar to Include null values …

Grouping asynchronous functions to run

I have code that outputs the numbers from 1 to 10: Output: 1 2 3 4 5 6 7 8 9 10 In the example above, all 10 functions are launched at once. How can I fix the code so that the number of concurrently running main() functions is equal to count_group? That is, immediately the
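One common way to cap concurrency in asyncio is a semaphore, sketched below. The original code is not shown in this excerpt, so everything here is an assumption: `main()` is a hypothetical stand-in that sleeps briefly and returns its number, and `run_limited`, `guarded`, and the `peak` counter are illustrative names, not part of the question.

```python
import asyncio


async def main(number: int) -> int:
    # Hypothetical stand-in for the question's main(): simulate a little work.
    await asyncio.sleep(0.01)
    return number


async def run_limited(count: int, count_group: int) -> tuple[list[int], int]:
    # At most count_group coroutines may hold the semaphore at the same time.
    semaphore = asyncio.Semaphore(count_group)
    running = 0  # how many main() calls are in flight right now
    peak = 0     # highest value of `running` observed (for illustration only)

    async def guarded(number: int) -> int:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            try:
                return await main(number)
            finally:
                running -= 1

    # gather() returns results in the order the awaitables were passed in.
    results = await asyncio.gather(*(guarded(n) for n in range(1, count + 1)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_limited(10, 3))
    print(*results)
    print("peak concurrency:", peak)
```

All ten tasks are still created up front, but only `count_group` of them run `main()` at any moment; the rest wait on the semaphore. If strict batches are wanted instead (finish one group entirely before starting the next), a loop that calls `asyncio.gather` on successive slices of the work would be the alternative.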

Count how often values in a 2D array appear in a 3D array

I have one 2-dimensional numpy array and one 3-dimensional array. For each number in the first array, I would like to count how often this value or a more extreme one appears in the second array (taking the 3rd dimension as the comparison vector for each element in the first array). For 0 values the function sho…

Pandas slow to merge and convert to datetime

I have two columns of data in a DataFrame containing a date and a time. Both start as strings. I want them to end up merged as a single column in datetime format. The head of the DataFrame is: The data is in a DataFrame called df_flattened, which has about 20k rows, and the code I am currently using is: However,
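A minimal sketch of one vectorized approach: concatenate the two string columns and parse them in a single `pd.to_datetime` call with an explicit `format`. The question's data and code are not shown in this excerpt, so the column names `date` and `time`, the sample rows, and the format string are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real df_flattened is not shown in the excerpt.
df_flattened = pd.DataFrame(
    {
        "date": ["2021-03-01", "2021-03-02"],  # assumed column name and format
        "time": ["08:15:00", "17:45:30"],      # assumed column name and format
    }
)

# Join the strings column-wise, then parse the whole Series in one call.
# Passing an explicit format spares pandas from inferring it from the data.
df_flattened["datetime"] = pd.to_datetime(
    df_flattened["date"] + " " + df_flattened["time"],
    format="%Y-%m-%d %H:%M:%S",
)

print(df_flattened.dtypes)
```

Parsing the whole column at once with a known format is usually much faster than converting row by row (for example with `apply`), though the actual gain depends on the pandas version and on how the strings are formatted.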