I have this small piece of code to import a SAS file into a DataFrame in Python. The code runs forever without producing any output. The SAS file I’m trying to import is 1.5 GB. Answer: You should use the native pandas function pandas.read_sas; it’s faster than iterating through the file as you did. Here is the documentation of the pandas.read_sas function. This
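Below is a minimal sketch of the suggested approach. It assumes pandas is installed and the file is a .sas7bdat or .xpt file; pandas infers the format from the extension. The helper name load_sas and its parameters are illustrative and not part of pandas. When chunksize is set, pd.read_sas returns a reader that yields DataFrames of at most that many rows, so you can shrink each piece before keeping it.

```python
import pandas as pd


def load_sas(path, chunksize=None, encoding=None):
    """Hypothetical helper: read a SAS file (.sas7bdat or .xpt) into a DataFrame.

    chunksize=None reads the whole file in a single pd.read_sas call.
    With a chunksize, pd.read_sas returns a reader that yields DataFrames
    of at most `chunksize` rows each.
    """
    if chunksize is None:
        return pd.read_sas(path, encoding=encoding)

    reader = pd.read_sas(path, chunksize=chunksize, encoding=encoding)
    chunks = []
    try:
        for chunk in reader:
            # Placeholder: filter rows or drop columns here to save memory.
            chunks.append(chunk)
    finally:
        # Release the underlying file handle even if a chunk fails.
        reader.close()
    return pd.concat(chunks, ignore_index=True)
```

Example call with a hypothetical path: `df = load_sas("my_data.sas7bdat", chunksize=100_000, encoding="latin-1")`. Passing an encoding decodes text columns to str; without one, pandas keeps them as raw bytes. Concatenating every chunk still needs memory for the full result, so the chunked path mainly helps when each chunk is reduced first.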
Python pandas – new column’s value if the item is in the list
I want to create a new column in a pandas DataFrame. The first column contains country names. A list contains the countries I am interested in (e.g. those in the EU). The new column should indicate whether the country in each row is in the list or not. Below is the shortened version of the code: The error I get is: ValueError: The truth value
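Here is a minimal sketch of the usual vectorized fix. It assumes the country names live in a column called country; that name and the list contents are hypothetical. Series.isin tests every row against the list in one call. The "truth value is ambiguous" ValueError typically appears when a whole Series ends up in a Python if statement or an `in` test against a list, because both need a single True/False.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a country column and a list of countries of interest.
df = pd.DataFrame({"country": ["Germany", "Norway", "France", "Japan"]})
eu_countries = ["Germany", "France", "Italy"]

# isin compares each row against the list and returns a boolean Series,
# so no single truth value of the whole Series is ever requested.
df["in_eu"] = df["country"].isin(eu_countries)

# Optional: turn the boolean flag into readable labels.
df["group"] = np.where(df["in_eu"], "EU", "non-EU")
print(df)
```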
AttributeError: ‘module’ object has no attribute ‘plt’ – Seaborn
I’m very new to these libraries and I’m having trouble plotting this: And I’m getting this output: I’m running this in my Jupyter Notebook with Python 2.7.12. Any ideas? Answer: sns.plt.show() works fine for me using seaborn 0.7.1. It could be that this is different in other versions. However, if you import matplotlib.pyplot as plt anyway, you may as well
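Here is a minimal sketch of that suggestion, assuming a recent seaborn and matplotlib. The plot is drawn with seaborn and displayed with matplotlib.pyplot.show() directly, rather than through the sns.plt alias, which newer seaborn releases appear not to provide. The DataFrame and its day/sales columns are made-up sample data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample data.
df = pd.DataFrame({"day": ["Mon", "Tue", "Wed"], "sales": [3, 5, 2]})

# Seaborn draws onto a matplotlib Axes and returns it.
ax = sns.barplot(x="day", y="sales", data=df)
ax.set_title("Sales per day")

# Display through pyplot itself instead of sns.plt.
plt.show()
```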
How to read a list of parquet files from S3 as a pandas dataframe using pyarrow?
I have a hacky way of achieving this using boto3 (1.4.4), pyarrow (0.4.1) and pandas (0.20.3). First, I can read a single Parquet file locally like this: I can also read a directory of Parquet files locally like this: Both work like a charm. Now I want to achieve the same remotely with files stored in an S3 bucket. I
Update patch edge colours in Geopandas plot
I’ve plotted a GeoDataFrame as a choropleth using the following code (geopandas 0.2.1, matplotlib 2.0.2, in a Jupyter notebook using %inline): This gives me a map with edges around the polygons: I’d like to remove these. So far, I’ve tried cycling through the patches and setting the edge colours to the face colours: But it has no effect, even if I
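One option worth trying is sketched below. It assumes a recent geopandas (with shapely) rather than 0.2.1. GeoDataFrame.plot forwards extra style keywords to the matplotlib collection that draws the polygons, so linewidth=0 can remove the outlines at draw time instead of editing patches afterwards. The two square polygons and the value column are made-up data.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Made-up data: two adjacent squares with a value to colour by.
gdf = gpd.GeoDataFrame(
    {"value": [1, 2]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Choropleth without outlines: linewidth=0 is passed through to the
# matplotlib collection that draws the polygons.
ax = gdf.plot(column="value", cmap="OrRd", linewidth=0)
plt.show()
```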
Pandas: Resample dataframe column, get discrete feature that corresponds to max value
Sample data: gives: I want to resample by ‘2D’ and get the max value, something like: The expected result should be: Can anyone help me? Answer: You can resample to get the argmax of value and then use it to extract names and value
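Here is a minimal sketch of that idea on made-up daily data with hypothetical names and value columns. It uses groupby with pd.Grouper(freq="2D"), which bins a DatetimeIndex the same way resample("2D") does, and SeriesGroupBy.idxmax to get the timestamp of each bin's maximum. .loc then pulls the full rows, including names.

```python
import pandas as pd

# Made-up daily data with a discrete feature (names) and a numeric value.
idx = pd.date_range("2020-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {"names": ["a", "b", "c", "d", "e", "f"], "value": [1, 5, 3, 2, 9, 4]},
    index=idx,
)

# For each 2-day bin, idxmax gives the timestamp of the row with the largest value.
max_idx = df.groupby(pd.Grouper(freq="2D"))["value"].idxmax()

# Pull the matching rows (names and value together),
# then label each row by the start of its 2-day bin.
result = df.loc[max_idx.values]
result.index = max_idx.index
print(result)
```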
Join pandas dataframes based on column values
I’m quite new to pandas DataFrames, and I’m having some trouble joining two tables. The first df has just 3 columns: DF1: And the second has exactly the same two columns (and plenty of others): DF2: What I need is to perform an operation which, in SQL, would look as follows: And, as a result, I want to see DF2, complemented
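The SQL statement itself is not shown, so this sketch assumes it is a LEFT JOIN of DF2 against DF1 on the two shared columns. The column names a, b, c and other are hypothetical. DataFrame.merge with how="left" keeps every DF2 row and adds DF1's extra column, with NaN where no match exists.

```python
import pandas as pd

# Hypothetical tables: df1 has two key columns plus one extra column,
# df2 shares the two key columns and has other columns of its own.
df1 = pd.DataFrame({"a": [1, 2], "b": ["x", "y"], "c": [10.0, 20.0]})
df2 = pd.DataFrame(
    {"a": [1, 1, 2, 3], "b": ["x", "x", "y", "z"], "other": ["p", "q", "r", "s"]}
)

# Roughly: SELECT df2.*, df1.c FROM df2 LEFT JOIN df1
#          ON df2.a = df1.a AND df2.b = df1.b
result = df2.merge(df1, on=["a", "b"], how="left")
print(result)
```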
python pandas how to compute on rows with same index values
I have a DataFrame called resulttable that looks like: where the df index values are the index values shown when resulttable is printed or exported to xls; Tag is str, and Exp. m/z, Intensity, and Norm_Intensity are float64. The tag values come from the file names in a specified folder, so they can vary. As you can see, each tag
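The question is cut off before it says which computation is needed, so this is only a sketch of the general pattern, on a made-up miniature of resulttable. groupby(level=0) groups rows that share an index value. .agg collapses each group to one row, while .transform returns a result aligned with the original rows. The Share column is a hypothetical example computation.

```python
import pandas as pd

# Made-up miniature of resulttable: index values repeat, one row per Tag.
resulttable = pd.DataFrame(
    {
        "Tag": ["s1", "s2", "s1", "s2"],
        "Exp. m/z": [100.1, 100.1, 200.2, 200.2],
        "Intensity": [50.0, 150.0, 30.0, 10.0],
    },
    index=[0, 0, 1, 1],
)

groups = resulttable.groupby(level=0)["Intensity"]

# Aggregate: one output row per distinct index value.
summary = groups.agg(["mean", "max"])

# Transform: same length as resulttable, each row divided by its group's total.
resulttable["Share"] = resulttable["Intensity"] / groups.transform("sum")

print(summary)
print(resulttable)
```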
How to read only visible sheets from Excel using Pandas?
I receive arbitrary Excel files, and I want to read only the visible sheets from them. Consider one file at a time: say I have Mapping_Doc.xls, which contains 2 visible sheets and 2 hidden sheets. As there are only a few sheets here, I can parse them by name like this: Code: Output: How can I get only the
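Here is a minimal sketch for .xlsx workbooks, assuming openpyxl is installed. openpyxl reports each worksheet's sheet_state ("visible", "hidden" or "veryHidden"). The visible names can then be passed to pd.read_excel as a list, which returns a dict of DataFrames. read_visible_sheets is a hypothetical helper. openpyxl cannot open legacy .xls files like Mapping_Doc.xls; for those, xlrd's Sheet objects expose a visibility attribute (0 means visible) that could be used the same way, though that variant is not shown here.

```python
import openpyxl
import pandas as pd


def read_visible_sheets(path):
    """Hypothetical helper: return {sheet_name: DataFrame} for visible sheets only."""
    # Load the workbook only to inspect each sheet's visibility state.
    wb = openpyxl.load_workbook(path)
    visible = [ws.title for ws in wb.worksheets if ws.sheet_state == "visible"]
    # A list for sheet_name makes read_excel return a dict keyed by sheet name.
    return pd.read_excel(path, sheet_name=visible, engine="openpyxl")
```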
Plotting multiple boxplots in seaborn
I want to plot boxplots using seaborn with pandas because it is a nicer way to visualize data, but I am not too familiar with it. I have three DataFrames holding different metrics, and I want to compare the metrics. I will loop through the file paths to access them. The dfs for each of the metrics are
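Here is a minimal sketch of one common approach. It assumes each of the three metric DataFrames holds a single value column; the metric names and numbers below are made-up stand-ins for the files read in the loop. The frames are stacked into one long DataFrame with a column naming the metric, and seaborn then draws one box per metric. If each file has several columns, DataFrame.melt can produce the same long layout.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-ins for the three metric DataFrames; in practice these
# would be filled inside the loop over file paths.
frames = {
    "metric_a": pd.DataFrame({"value": [1.0, 2.0, 3.0]}),
    "metric_b": pd.DataFrame({"value": [2.0, 4.0, 6.0]}),
    "metric_c": pd.DataFrame({"value": [0.5, 1.5, 2.5]}),
}

# Stack into one long DataFrame; the dict keys become the "metric" column.
long_df = pd.concat(frames, names=["metric", "row"]).reset_index()

# One box per metric, side by side on the same axes.
ax = sns.boxplot(x="metric", y="value", data=long_df)
plt.show()
```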