Tag: pandas

Convert SAS data to a python dataframe

I have a small piece of code to import a SAS file into a dataframe in Python. The code runs forever without any output. The SAS file I’m trying to import is 1.5 GB. Answer You should use the native pandas function pandas.read_sas; it’s faster than iterating through the file as you did. Here is the documentation of the pandas.read_sas function. This
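A minimal sketch of the pandas.read_sas suggestion, assuming the file is too large to read comfortably in one call. The helper name load_sas_in_chunks, the chunk size and the file name are hypothetical; only pandas.read_sas and its chunksize parameter come from pandas itself. Note that concatenating every chunk still needs enough memory for the full table, so filtering inside the loop may be preferable for very large files.

```python
import pandas as pd


def load_sas_in_chunks(path, chunksize=100_000):
    """Read a SAS file piece by piece and return one combined DataFrame."""
    # With chunksize set, read_sas returns an iterator of DataFrames
    # instead of loading the whole file in a single call.
    reader = pd.read_sas(path, chunksize=chunksize)
    chunks = [chunk for chunk in reader]
    return pd.concat(chunks, ignore_index=True)


# Hypothetical usage; "data.sas7bdat" is a placeholder file name.
# df = load_sas_in_chunks("data.sas7bdat")
```

For a file that fits in memory, a plain pd.read_sas(path) call without chunksize is the simpler option.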

Python pandas – new column’s value if the item is in the list

conditional-statements numpy pandas python

I want to create a new column in a pandas dataframe. The first column contains names of countries. The list contains countries I am interested in (e.g. in the EU). The new column should indicate whether each country from the dataframe is in the list or not. Below is the shortened version of the code: The error I get is: ValueError: The truth value
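One common way to build such a column is Series.isin, sketched below. The column names country and in_eu and the sample data are assumptions made for illustration; the original code is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's dataframe and list.
df = pd.DataFrame({"country": ["France", "Norway", "Spain", "Japan"]})
eu_countries = ["France", "Spain", "Germany"]

# isin checks each row against the list and returns a boolean Series,
# so no single truth value of a whole Series is ever needed.
df["in_eu"] = df["country"].isin(eu_countries)
```

The result is an element-wise True/False column, one entry per row.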

AttributeError: ‘module’ object has no attribute ‘plt’ – Seaborn

matplotlib pandas python seaborn

I’m very new to these libraries and I’m having trouble plotting this: And I’m getting this output: I’m running this in my Jupyter Notebook with Python 2.7.12. Any ideas? Answer sns.plt.show() works fine for me using seaborn 0.7.1. It could be that this is different in other versions. However, if you import matplotlib.pyplot as plt anyway, you may as well
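A small sketch of the approach the answer points toward: import matplotlib.pyplot directly and call its show() rather than going through sns.plt. The sample data and column names are hypothetical, and the Agg backend line is only there so the sketch runs without a display; in a Jupyter notebook it would normally be omitted.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, for running outside a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data; the original plot code is not shown in the excerpt.
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": [1, 2, 3, 4]})

ax = sns.boxplot(x="group", y="value", data=df)

# Call show() on pyplot itself instead of relying on a seaborn attribute.
plt.show()
```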

How to read a list of parquet files from S3 as a pandas dataframe using pyarrow?

boto3 dataframe pandas pyarrow python

I have a hacky way of achieving this using boto3 (1.4.4), pyarrow (0.4.1) and pandas (0.20.3). First, I can read a single parquet file locally like this: I can also read a directory of parquet files locally like this: Both work like a charm. Now I want to achieve the same remotely with files stored in an S3 bucket. I

Update patch edge colours in Geopandas plot

geopandas matplotlib pandas python

I’ve plotted a GeoDataFrame as a choropleth using the following code (geopandas 0.2.1, matplotlib 2.0.2, in a Jupyter notebook, using %matplotlib inline): Which gives me a map with edges around the polygons: I’d like to remove these. So far, I’ve tried cycling through the patches, setting the edge colours to the face colours: But it has no effect, even if I

Pandas: Resample dataframe column, get discrete feature that corresponds to max value

argmax pandas python resampling

Sample data: gives: I want to resample by ‘2D’ and get the max value, something like: The expected result should be: Can anyone help me? Answer You can resample to get the arg max of value and then use it to extract names and value
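The idea in the answer can be sketched as follows: find the timestamp of the maximum value in each 2-day bin, then use those timestamps to look up the full rows. The sample data is hypothetical, and this version bins with groupby and pd.Grouper(freq="2D") rather than calling resample directly, which is a substitution made here to keep the idxmax call straightforward.

```python
import pandas as pd

# Hypothetical sample data: a discrete feature ("name") and a numeric "value".
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "value": [1, 5, 3, 2]},
    index=pd.date_range("2017-01-01", periods=4, freq="D"),
)

# Group rows into 2-day bins and find the timestamp of the max value in each bin.
idx_of_max = df.groupby(pd.Grouper(freq="2D"))["value"].idxmax()

# Use those timestamps to pull out the full rows (name and value),
# then relabel them with the start of each 2-day bin.
result = df.loc[idx_of_max.to_numpy()].set_index(idx_of_max.index)
```

Here the first bin keeps the row named b (value 5) and the second keeps the row named c (value 3). Empty bins would need separate handling, since there is no maximum to locate in them.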

Join pandas dataframes based on column values

dataframe mysql pandas python sql

I’m quite new to pandas dataframes, and I’m having some trouble joining two tables. The first df has just 3 columns: DF1: And the second has exactly the same two columns (and plenty of others): DF2: What I need is to perform an operation which, in SQL, would look as follows: And, as a result, I want to see DF2, complemented
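A hedged sketch of one way to express such a join in pandas, assuming the goal is to keep every row of DF2 and attach the matching extra column from DF1 (a SQL-style left join on the two shared columns). All column names and values below are hypothetical, since the excerpt does not show the tables or the SQL.

```python
import pandas as pd

# Hypothetical stand-ins for DF1 (3 columns) and DF2 (same two keys plus others).
df1 = pd.DataFrame({"key_a": [1, 2], "key_b": ["x", "y"], "extra": [10.0, 20.0]})
df2 = pd.DataFrame({"key_a": [1, 2, 3], "key_b": ["x", "y", "z"], "other": ["p", "q", "r"]})

# Left join: every row of df2 is kept; rows with no match in df1 get NaN in "extra".
merged = df2.merge(df1, on=["key_a", "key_b"], how="left")
```

Switching how to "inner" would instead drop the rows of DF2 that have no counterpart in DF1.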

python pandas how to compute on rows with same index values

dataframe pandas python

I have a dataframe called resulttable that looks like: where df Index values are the index values when resulttable is printed or exported to xls, Tag = str, and Exp. m/z, Intensity, and Norm_Intensity are float64. The tag values will be coming from the file names in a specified folder, so they can vary. As you can see, each tag

How to read only visible sheets from Excel using Pandas?

excel pandas python

I receive assorted Excel files and want to read only the visible sheets from them. Consider one file at a time: let’s say I have Mapping_Doc.xls, which contains 2 visible sheets and 2 hidden sheets. As there are only a few sheets here, I can parse them by name like this: Code : Output: How can I get only the

Plotting multiple boxplots in seaborn

boxplot pandas python seaborn

I want to plot boxplots using seaborn in pandas because it is a nicer way to visualize data, but I am not too familiar with it. I have three dataframes that are different metrics, and I want to compare the different metrics. I will loop through the file paths to access them. The dfs for each of the metrics are