Tag: data-analysis

Polars equivalent of pandas expression df.groupby(['col1', 'col2'])['col3'].sum().unstack()

analytics data-analysis dataframe python python-polars

How can I create an equivalent truth table in polars? I want to turn something like the table below into a truth table. The efficiency of the code is important because the dataset is very large (I am using it with the apriori algorithm). The unstack function in polars is different; a polars alternative to pd.crosstab would also work. Answer It seems like you want to do

Why is the p-value for highly correlated data 1? What is wrong?

correlation data-analysis p-value python

I am trying to filter a correlation matrix by p-value for the following matrix, using the following code. But the result I get is strange: given the unfiltered correlation matrix, all the values in the p-value matrix should be zero, yet they are not. I do not know what the reason could be; has anyone had the same problem before? Answer
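The answer did not survive, so this is only a hedged sketch of one documented pitfall, assuming the code built the p-value matrix with `DataFrame.corr(method=callable)` and `scipy.stats.pearsonr`. pandas documents that a callable `corr` always returns 1 on the diagonal, regardless of what the callable computes, so every column-versus-itself "p-value" shows up as 1. That would explain p = 1 on the diagonal, though not necessarily elsewhere. The data below is hypothetical; the explicit pairwise loop avoids the forced diagonal.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical data: "b" is nearly a linear function of "a"; "c" is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
df = pd.DataFrame(
    {"a": a, "b": 2 * a + rng.normal(scale=0.01, size=50), "c": rng.normal(size=50)}
)

# Pitfall: with a callable, DataFrame.corr writes 1.0 on the diagonal itself,
# so this "p-value matrix" reports p = 1 for every column against itself.
pvals_callable = df.corr(method=lambda x, y: pearsonr(x, y)[1])

# Explicit pairwise loop: every cell, diagonal included, comes from pearsonr.
cols = df.columns
pvals = pd.DataFrame(np.nan, index=cols, columns=cols)
for i in cols:
    for j in cols:
        pvals.loc[i, j] = pearsonr(df[i], df[j])[1]

# Keep correlations only where p < 0.05 (the others become NaN).
corr_filtered = df.corr().where(pvals < 0.05)
print(pvals_callable.round(4), pvals.round(4), corr_filtered.round(3), sep="\n\n")
```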

Pandas: creating a column by comparing values across different sheets

data-analysis pandas python

My Excel file contains the user IDs in the current sheet (the user sheet), and the IDs and names of the users in another sheet (the name sheet). I need to match the IDs and add the users' names to the user sheet, just as shown in the figure. Answer Assuming sheet1 is 's1', sheet2 is 's2', and the column names are user_id and names, you can use a dictionary to do this
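A minimal sketch of the dictionary approach from the answer. The two DataFrames are hypothetical stand-ins for the sheets `s1` and `s2` (in practice they would come from `pd.read_excel(path, sheet_name=...)`), using the column names `user_id` and `names` assumed in the answer.

```python
import pandas as pd

# Hypothetical stand-ins for the two sheets.
s1 = pd.DataFrame({"user_id": [3, 1, 2, 3]})  # user sheet: IDs only
s2 = pd.DataFrame({"user_id": [1, 2, 3], "names": ["Ann", "Bob", "Cy"]})  # name sheet

# Build an id -> name lookup from the second sheet.
id_to_name = dict(zip(s2["user_id"], s2["names"]))

# map looks up each ID; IDs missing from the dictionary become NaN.
s1["names"] = s1["user_id"].map(id_to_name)
print(s1)
```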

How to count data in a certain column in Python (pandas)?

data-analysis data-science dataframe pandas python

Hope you're doing well. I tried counting green rows that come right after another green row in the table below: df = pd.DataFrame([['green'], ['red'], ['red']], columns=['A']) The code I tried to count green-green pairs: but it didn't work. Hope you can help. Note: I'm new to data science. Answer You can use: As a one-liner (Python ≥ 3.8): example input:
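The answer's code is missing here, so this is a hedged sketch, not the answer's one-liner: it counts rows where column `A` is "green" and the previous row is also "green", by comparing a boolean mask with a one-row `shift` of itself. The data is hypothetical and extends the question's example.

```python
import pandas as pd

# Hypothetical column of colours (the question's example was green, red, red).
df = pd.DataFrame(
    [["green"], ["green"], ["red"], ["green"], ["green"], ["green"]], columns=["A"]
)

is_green = df["A"].eq("green")
# shift(1) moves each flag down one row, so row i sees row i-1's flag;
# the first row has no predecessor, so it gets False.
prev_green = is_green.shift(1, fill_value=False)
green_after_green = int((is_green & prev_green).sum())
print(green_after_green)  # prints 3
```

With the question's original three rows (green, red, red) the same code gives 0.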

How to create a frequency table of each subject from a given timetable using pandas?

data-analysis dataframe numpy pandas python

This is a timetable: columns = hour, rows = weekday, data = subject [weekday × hour]. How do you generate a pandas.DataFrame where rows = weekday, columns = subject, and data = the frequency of each subject on the corresponding weekday? Required table: [weekday × subject] Answer Use melt to flatten your DataFrame, then pivot_table to reshape it: Output:
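A minimal sketch of the melt + pivot_table approach described in the answer, on a hypothetical two-day timetable (hours as string column labels, weekdays as the index).

```python
import pandas as pd

# Hypothetical timetable: rows = weekday, columns = hour, cells = subject.
tt = pd.DataFrame(
    {"9": ["math", "bio"], "10": ["math", "chem"], "11": ["art", "bio"]},
    index=pd.Index(["Mon", "Tue"], name="weekday"),
)

# melt: one row per (weekday, hour, subject) slot.
long = tt.reset_index().melt(id_vars="weekday", var_name="hour", value_name="subject")

# pivot_table: count the slots of each subject per weekday; absent subjects get 0.
freq = long.pivot_table(
    index="weekday", columns="subject", values="hour", aggfunc="count", fill_value=0
)
print(freq)
```

`pd.crosstab(long["weekday"], long["subject"])` produces the same counts from the melted frame.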

Summing up all repeated values in a dataset

data-analysis dataset python r

I have a dataset in which one column holds the name of a person and another column holds the amount she was paid for a given service. I'd like to build a list with the names of all people, ordered by the total amount they were paid regardless of the service they performed. Example: I figured
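The question's example is cut off, so this is a hedged pandas sketch with hypothetical column names `name`, `service` and `amount`: group by person, sum the amounts regardless of service, and sort in descending order.

```python
import pandas as pd

# Hypothetical payments: one row per service performed.
payments = pd.DataFrame(
    {
        "name": ["Ana", "Bea", "Ana", "Caio", "Bea"],
        "service": ["cleaning", "repair", "repair", "cleaning", "cleaning"],
        "amount": [100, 50, 30, 80, 70],
    }
)

# Total per person (service ignored), highest first.
totals = payments.groupby("name")["amount"].sum().sort_values(ascending=False)
ranked_names = totals.index.tolist()
print(totals)
```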

Pandas: How to fill NaN values in a column with part of the value in another column

data-analysis exploratory-data-analysis pandas python python-3.x

I want the missing values in the city column to be filled with the first word of the venue column. I tried using df.city.fillna(value=df.venue.str.split()[0]), but it takes the first row's values to fill them. Thank you in advance. Answer From your DataFrame: after the split() you used, we can use map to assign the first list element to the NaN values in the city column
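Why the original attempt misbehaves: `df.venue.str.split()` is a Series of word lists, and `[0]` then selects the entry with index label 0, i.e. the first row's whole list, rather than the first word of every row. A minimal sketch of the map-based fix the answer describes, on hypothetical venue data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two cities are missing.
df = pd.DataFrame(
    {
        "venue": ["Riverside Arena", "Oak Park Stadium", "Harbor Field"],
        "city": ["Springfield", np.nan, np.nan],
    }
)

# str.split() gives a list of words per row; map picks word 0 of each list.
first_word = df["venue"].str.split().map(lambda words: words[0])

# fillna with a Series aligns on the index, so each NaN gets its own row's word
# and existing values are left untouched.
df["city"] = df["city"].fillna(first_word)
print(df)
```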

Function plotting with matplotlib

data-analysis errorbar matplotlib python

I am trying to model an equation that depends on T and the parameters xi, mu and sig. I have inferred the parameters and the spread (standard deviation) of those parameters for different durations (1h, 3h, etc.). In the example code, the parameters are for the 1h duration. I need to create a for loop to create a cloud of zp with the array of xi, mu
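The equation itself is not shown, so the following is a hedged sketch under a loud assumption: that zp is the GEV return level for return period T, which the parameter names xi, mu, sig suggest but the question does not confirm. The parameter means and standard deviations are hypothetical. Instead of a for loop, the sketch draws parameter samples and uses NumPy broadcasting to evaluate every draw against every T at once, then summarizes the cloud with percentiles.

```python
import numpy as np


def return_level(T, xi, mu, sig):
    """ASSUMED form: GEV return level for return period T (requires xi != 0)."""
    y = -np.log(1.0 - 1.0 / T)
    return mu - (sig / xi) * (1.0 - y ** (-xi))


rng = np.random.default_rng(42)
n_draws = 1000
# Hypothetical 1h estimates: normal(mean, standard deviation) for each parameter.
xi = rng.normal(0.1, 0.02, n_draws)
mu = rng.normal(20.0, 1.0, n_draws)
sig = rng.normal(5.0, 0.5, n_draws)

T = np.array([2.0, 5.0, 10.0, 50.0, 100.0])
# Broadcasting replaces the loop: rows = parameter draws, columns = T values.
cloud = return_level(T[None, :], xi[:, None], mu[:, None], sig[:, None])

# Summarize the cloud per T, e.g. for error bars or a shaded band.
lower, median, upper = np.percentile(cloud, [2.5, 50, 97.5], axis=0)
print(median)
```

The `lower`/`upper` arrays can be passed to matplotlib (for example `plt.fill_between(T, lower, upper)`) to draw the cloud as a band around `median`.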

Does it make sense? If yes, then how should I handle the MSE?

data-analysis data-science linear-regression python scikit-learn

Can we apply a log transform to one variable and a sqrt transform to another for LinearRegression? If yes, what should be done when computing the MSE? Should I exponentiate or square y_test and the predictions? Answer If you transform the feature variables in the training and test sets, you don't need to adjust your evaluation metric. In case you transform your target variable (with the log
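A hedged scikit-learn sketch on synthetic data: the log and sqrt transforms are applied to the features identically for all rows, the target is modelled on the log scale, and predictions are mapped back with `np.exp` before computing the MSE in the original units. Whether to report the MSE on the original or the log scale depends on which error matters for the use case, so both are computed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data: log(y) is linear in log(x0) and sqrt(x1), plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(200, 2))
y = np.exp(0.3 * np.log(X[:, 0]) + 0.2 * np.sqrt(X[:, 1]) + rng.normal(0, 0.05, 200))

# Feature transforms (log on one, sqrt on the other), applied the same way to all rows.
X_t = np.column_stack([np.log(X[:, 0]), np.sqrt(X[:, 1])])
X_train, X_test = X_t[:150], X_t[150:]
y_train, y_test = y[:150], y[150:]

# The target is fitted on the log scale...
model = LinearRegression().fit(X_train, np.log(y_train))
pred_log = model.predict(X_test)

# ...so exponentiate predictions to compare with y_test in its original units.
mse_original = mean_squared_error(y_test, np.exp(pred_log))
# Alternatively, evaluate entirely on the log scale.
mse_log = mean_squared_error(np.log(y_test), pred_log)
print(mse_original, mse_log)
```

Feature transforms need no undoing at evaluation time; only the target transform has to be inverted (here `exp` for `log`, and squaring would be the inverse for a sqrt-transformed target). When the log-scale errors are roughly normal, `exp` of a prediction estimates the conditional median rather than the mean.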

module ‘networkx’ has no attribute ‘from_pandas_edgelist’

data-analysis networkx pandas python

Here is my code: and there is an error: AttributeError: module 'networkx' has no attribute 'from_pandas_edgelist'. However, in the networkx documentation we can see that networkx has this attribute. Here is the link to the documentation: from_pandas_edgelist. Why does this error happen? Answer Are you defining the alias nx as follows: If yes, try calling the required function as follows:
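The answer's snippets are missing, so this is a hedged sketch. `from_pandas_edgelist` was introduced in networkx 2.0 (1.x releases provided `from_pandas_dataframe` instead), so checking the installed version is a sensible first step. A local file named `networkx.py` that shadows the installed package produces the same AttributeError, which `nx.__file__` reveals. The edge list below is hypothetical.

```python
import networkx as nx
import pandas as pd

print(nx.__version__)  # from_pandas_edgelist needs networkx >= 2.0
print(nx.__file__)     # should point into site-packages, not a local networkx.py

# Hypothetical weighted edge list.
edges = pd.DataFrame(
    {"source": ["a", "b", "c"], "target": ["b", "c", "a"], "weight": [1.0, 2.0, 0.5]}
)

# Build an (undirected, by default) graph, keeping "weight" as an edge attribute.
G = nx.from_pandas_edgelist(edges, source="source", target="target", edge_attr="weight")
print(G.edges(data=True))
```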