Tag: pandas

Writing a pickle file to an s3 bucket in AWS

amazon-s3 amazon-web-services pandas python

I’m trying to write a pandas dataframe as a pickle file to an S3 bucket in AWS. I know that I can write the dataframe new_df as a CSV to an S3 bucket as follows: I’ve tried using the same code as above with to_pickle(), but with no success. Answer: I’ve found the solution; the buffer needs to be a BytesIO object.
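The answer's code is not included in the excerpt, so the following is only a minimal sketch of the BytesIO idea, not the poster's exact solution. It assumes an S3 client object with a `put_object` method (for example one created with `boto3.client("s3")`); the function names, bucket name and key are hypothetical.

```python
import io
import pickle

import pandas as pd


def dataframe_to_pickle_buffer(df):
    """Serialize a DataFrame into an in-memory bytes buffer."""
    buffer = io.BytesIO()
    pickle.dump(df, buffer, protocol=pickle.HIGHEST_PROTOCOL)
    buffer.seek(0)  # rewind so a later reader starts at the first byte
    return buffer


def upload_dataframe_pickle(df, s3_client, bucket, key):
    """Upload the pickled DataFrame through an S3 client (assumed: boto3)."""
    buffer = dataframe_to_pickle_buffer(df)
    # put_object accepts the raw bytes of the pickle as the object body
    s3_client.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())
```

With boto3 installed and credentials configured, a call might look like `upload_dataframe_pickle(new_df, boto3.client("s3"), "my-bucket", "new_df.pkl")`, where the bucket and key are placeholders.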

Web scraping with Python (Beautiful Soup): multiple pages and subpages

beautifulsoup pandas python web-scraping

I create my soup with: I’m trying to create a dataframe by scraping the site “https://myanimelist.net”. As a first step, I would like to get each anime’s title, eps and type. As a second step, from the detail page of each anime (a page like https://myanimelist.net/anime/2928/hack__GU_Returner), I would like to gather the score that users assigned, which is contained in (for example:

How can I create the minimum size executable with pyinstaller?

anaconda pandas pyinstaller python virtualenv

I am on Windows 10 and have Anaconda installed, but I want to create an executable independently in a new, clean, minimal environment using Python 3.5. So I did some tests. TEST1: I created a Python script test1.py in the folder testenv with only: Then I created the environment, installed PyInstaller and created the executable And it creates my test1.exe

How to select rows in Pandas dataframe where value appears more than once

pandas python

Let’s say I have a pandas dataframe with columns of different measurement attributes and corresponding measurement values. How can I filter this dataframe to keep only the measurements that appear more than X times? For example, for this dataframe I want to get all rows with more than 5 measurements (let’s say only parameters ‘A’ and ‘B’ appear more
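One way to do this kind of filtering, sketched here with made-up data, is to compute each row's group size and compare it with the threshold. The column names `parameter` and `value` and the sample values are assumptions, since the original dataframe is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical data: 'A' appears 6 times, 'B' 7 times, 'C' only twice
df = pd.DataFrame({
    "parameter": ["A"] * 6 + ["B"] * 7 + ["C"] * 2,
    "value": range(15),
})

min_count = 5  # keep parameters that appear more than this many times

# For every row, the number of rows sharing the same parameter
counts = df.groupby("parameter")["parameter"].transform("size")

# Boolean mask keeps only rows whose parameter is frequent enough
filtered = df[counts > min_count]
```

`df.groupby("parameter").filter(lambda g: len(g) > min_count)` expresses the same idea, usually at some cost in speed on large frames.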

Convert pandas DataFrame to list of JSON-strings

elasticsearch json pandas python

I need to know how to implement a to_json_string_list() function in this case: to get output like: {"rec1" : "val1", "rec2" : "val4"} {"rec1" : "val3", "rec2" : "val4"} I know that there is a function to_json(orient='records'), but it is not what I need, because I get: [{"rec1" : "val1", "rec2" : "val4"}, {"rec1" : "val3", "rec2" : "val4"}] Printing is not
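A possible implementation, given as a sketch rather than the accepted answer (which the excerpt does not include), is to turn each row into a dict and serialize it on its own. The sample dataframe is reconstructed from the output shown in the question and is therefore an assumption.

```python
import json

import pandas as pd


def to_json_string_list(df):
    """Return one JSON string per row instead of a single JSON array."""
    return [json.dumps(record) for record in df.to_dict(orient="records")]


# Assumed input, inferred from the desired output in the question
df = pd.DataFrame({"rec1": ["val1", "val3"], "rec2": ["val4", "val4"]})
json_strings = to_json_string_list(df)
```

This works as long as every cell is JSON-serializable by the standard library. For timestamps or other pandas-specific types, `df.to_json(orient="records", lines=True).splitlines()` may be a more forgiving alternative.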

Python – Calculating Percent of Grand Total in Pivot Tables

pandas percentage pivot-table python

I have a dataframe that I converted to a pivot table using the pd.pivot_table method and a sum aggregate function: I have received an output like this: I would like to add another pivot table that displays the percent of the grand total calculated in the previous pivot table for each of the categories. All these should add up to 100% and should
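Since the original pivot table is not shown, here is a small sketch with invented data: the percent table is the pivot divided by its grand total. The column names `category` and `amount` are hypothetical.

```python
import pandas as pd

# Hypothetical input data
df = pd.DataFrame({
    "category": ["a", "a", "b", "c"],
    "amount": [10, 30, 40, 20],
})

# Pivot table with a sum aggregate, as described in the question
pivot = pd.pivot_table(df, index="category", values="amount", aggfunc="sum")

# Each category's share of the grand total, in percent
percent = pivot / pivot["amount"].sum() * 100
```

For a pivot with several value columns, dividing by `pivot.to_numpy().sum()` instead gives shares of the total across all cells.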

python pandas merge multiple csv files

csv datetime pandas python

I have around 600 CSV datasets, all with the very same column names ['DateTime', 'Actual', 'Consensus', 'Previous', 'Revised']; all are economic indicators and all are time-series datasets. The aim is to merge them all together into one CSV file, with 'DateTime' as the index. The way I want this file to be indexed is the timeline way, which means
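The excerpt stops before the desired layout is spelled out, so the sketch below shows just one possible reading: stack all files, remember which file each row came from, and sort by 'DateTime'. The in-memory CSV text, the `Indicator` column and the indicator names are hypothetical stand-ins for the real files.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two of the CSV files on disk
CSV_A = """DateTime,Actual,Consensus,Previous,Revised
2020-01-02,1.0,1.1,0.9,
2020-01-04,1.2,1.1,1.0,
"""
CSV_B = """DateTime,Actual,Consensus,Previous,Revised
2020-01-03,5.0,5.2,4.8,4.9
"""
csv_sources = {"indicator_a": io.StringIO(CSV_A), "indicator_b": io.StringIO(CSV_B)}

frames = []
for name, source in csv_sources.items():
    frame = pd.read_csv(source, parse_dates=["DateTime"])
    frame["Indicator"] = name  # record which dataset the row belongs to
    frames.append(frame)

# Stack everything, then index and order the rows chronologically
merged = pd.concat(frames, ignore_index=True).set_index("DateTime").sort_index()
```

With real files, the dictionary could be built from `glob.glob("*.csv")`, and `merged.to_csv("merged.csv")` would write the combined file; both the pattern and the output name are placeholders.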

Sort a pandas dataframe series by month name

dataframe date pandas python sorting

I have a Series object that has: Problem statement: I want to make it appear by month, compute the mean price for each month, and present it sorted by month. Desired Output: I thought of making a list and passing it to a sort function: but sort_values doesn’t support that for a Series. One big problem
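One common approach, shown here as a sketch on invented data, is to make the month names an ordered categorical so that grouping and sorting follow calendar order instead of alphabetical order. The column names `month` and `price` and the values are assumptions, because the original Series is not shown.

```python
import pandas as pd

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Hypothetical data with month names in arbitrary order
df = pd.DataFrame({
    "month": ["March", "January", "March", "February", "January"],
    "price": [30.0, 10.0, 50.0, 20.0, 30.0],
})

# An ordered categorical carries the calendar order with it
df["month"] = pd.Categorical(df["month"], categories=MONTHS, ordered=True)

# Mean price per month; observed=True drops months that have no rows
monthly_mean = df.groupby("month", observed=True)["price"].mean().sort_index()
```

If the months are already the index of a Series of means, `series.reindex(MONTHS)` would be a shorter route to the same ordering.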

Pandas finding local max and min

dataframe numpy pandas python time-series

I have a pandas data frame with two columns: one is temperature, the other is time. I would like to make third and fourth columns called min and max. Each of these columns would be filled with NaNs except where there is a local min or max, in which case it would hold the value of that extremum. Here is a sample
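A simple way to mark local extrema, sketched on made-up data because the sample is cut off, is to compare each value with its two neighbours using `shift`. The column names `time` and `temperature` and the values are assumptions; strict comparisons mean that plateaus and the first and last rows are not flagged.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "time": range(7),
    "temperature": [3.0, 1.0, 2.0, 5.0, 4.0, 0.5, 6.0],
})

temp = df["temperature"]

# A local minimum is strictly below both neighbours, a maximum strictly above.
# Comparisons against the NaN produced by shift are False at the edges.
is_min = (temp < temp.shift(1)) & (temp < temp.shift(-1))
is_max = (temp > temp.shift(1)) & (temp > temp.shift(-1))

# where() keeps the value where the mask is True and puts NaN elsewhere
df["min"] = temp.where(is_min)
df["max"] = temp.where(is_max)
```

For noisy series, comparing against a wider window (for example with a rolling minimum and maximum) may give more useful extrema than this nearest-neighbour test.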

AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis', using pandas eval

apply eval pandas python

I have a series of the form: Note that its elements are strings: I’m trying to use pd.eval to parse this string into a column of lists. This works for this sample data. However, on much larger data (order of 10K), this fails miserably! What am I missing here? Is there something wrong with the function or my data? Answer
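The excerpt ends before the answer, so the following is not necessarily the accepted solution. It sketches a commonly used alternative to pd.eval for this task: parsing each string with `ast.literal_eval` from the standard library, element by element. The sample Series is hypothetical.

```python
import ast

import pandas as pd

# Hypothetical Series whose elements are strings that look like lists
s = pd.Series(["[1, 2, 3]", "[4, 5]", "[]"])

# literal_eval safely parses Python literals and is applied to one string at a time
parsed = s.apply(ast.literal_eval)
```

Because each element is parsed independently, this approach does not depend on the length of the Series, though it may be slower than a vectorized parser on very large data.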