I’m trying to configure my IPython output in my OS X terminal, but none of the changes I set seem to take effect. I want the display settings configured so that wide outputs, such as a big DataFrame, are printed in full rather than truncated or collapsed into the summary info. After importing pandas into my script, I
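A minimal sketch of the usual fix, assuming the goal is an untruncated DataFrame repr: these are pandas display options (not IPython or terminal settings), and the option names below are the standard ones.

    import pandas as pd

    # Show every column and let pandas auto-detect the terminal width
    pd.set_option("display.max_columns", None)
    pd.set_option("display.width", None)
    # Print wide frames on one long line instead of wrapping across lines
    pd.set_option("display.expand_frame_repr", False)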
Pandas: getting the name of the minimum column
I have a Pandas dataframe as below: I want to append a reason column that gives some standard text + the column name of the minimum value of that row. In other words, the desired output is: I can do incomplete_df.apply(lambda x: min(x), axis=1), but this does not ignore NaNs and, more importantly, returns the value rather than the name of the column.
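A minimal sketch of the usual approach, using a made-up stand-in for incomplete_df: DataFrame.idxmin(axis=1) skips NaNs by default and returns the column name of each row’s minimum.

    import numpy as np
    import pandas as pd

    # Hypothetical stand-in for incomplete_df
    incomplete_df = pd.DataFrame({"a": [2.0, np.nan], "b": [1.0, 5.0], "c": [3.0, 4.0]})

    # idxmin(axis=1) ignores NaNs and yields the column *name* of the row minimum
    incomplete_df["reason"] = "standard text " + incomplete_df.idxmin(axis=1)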
Pandas fillna on datetime object
I’m trying to run fillna on a column of type datetime64[ns]. When I run something like: df['date'].fillna(datetime("2000-01-01")) I get: TypeError: an integer is required Any way around this? Answer This should work in 0.12 and 0.13 (just released). @DSM points out that datetimes are constructed like: datetime.datetime(2012, 1, 1) So the error comes from failing to construct the value that you are
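A short sketch of the fix being described, with a made-up frame: pass a real datetime (or pd.Timestamp) as the fill value instead of a string.

    import datetime
    import pandas as pd

    df = pd.DataFrame({"date": pd.to_datetime(["2013-01-05", None, "2013-03-01"])})

    # Construct the fill value properly instead of datetime("2000-01-01")
    df["date"] = df["date"].fillna(datetime.datetime(2000, 1, 1))
    # df["date"] = df["date"].fillna(pd.Timestamp("2000-01-01"))  # equivalent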
Format / Suppress Scientific Notation from Pandas Aggregation Results
How can one modify the format of the output from a groupby operation in pandas that produces scientific notation for very large numbers? I know how to do string formatting in Python, but I’m at a loss when it comes to applying it here. This suppresses the scientific notation if I convert to string, but now I’m just wondering how
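A minimal sketch of the two common options, using a made-up frame: set pandas’ global float format for display, or format the aggregated column itself as strings.

    import pandas as pd

    df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1.2e10, 3.4e10, 5.6e10]})
    summed = df.groupby("group").sum()

    # Option 1: change how all floats are displayed
    pd.set_option("display.float_format", "{:,.0f}".format)

    # Option 2: format just this result (converts the numbers to strings)
    formatted = summed["value"].map("{:,.0f}".format)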
Compute daily climatology using pandas python
I am trying to use pandas to compute daily climatology. My code is: cum_data is the data frame containing daily dates from 1st Jan 1950 to 31st Dec 1953. I want to create a new vector of length 365 with the first element containing the average of rand_data for January 1st for 1950, 1951, 1952 and 1953. And so on
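A minimal sketch with synthetic data standing in for cum_data and rand_data: grouping by (month, day) averages each calendar day across the years. Note this yields 366 entries when a leap year is present, so Feb 29 can be dropped if a strict 365-day climatology is wanted.

    import numpy as np
    import pandas as pd

    # Synthetic stand-in for cum_data with a daily DatetimeIndex, 1950-1953
    idx = pd.date_range("1950-01-01", "1953-12-31", freq="D")
    cum_data = pd.DataFrame({"rand_data": np.random.rand(len(idx))}, index=idx)

    # Average each calendar day (month, day) across all years
    climatology = cum_data["rand_data"].groupby(
        [cum_data.index.month, cum_data.index.day]
    ).mean()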
ImportError: No module named dateutil.parser
I am receiving the following error when importing pandas in a Python program. Also, here’s the program: Answer On Ubuntu you may need to install the package manager pip first: Then install the python-dateutil package with:
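The commands the answer refers to are the standard ones; on a system where pip is already available, the second line alone should suffice.

    sudo apt-get install python-pip
    sudo pip install python-dateutil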
Parsing a JSON string which was loaded from a CSV using Pandas
I am working with CSV files where several of the columns contain a simple JSON object (several key-value pairs) while other columns are normal. Here is an example: After using df = pandas.read_csv('file.csv'), what’s the most efficient way to parse and split the stats column into additional columns? After about an hour, the only thing I could come up
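A minimal sketch of one common approach, assuming the JSON column is named stats as in the question: parse each string with json.loads, expand the resulting dicts into columns, and join them back on.

    import json
    import pandas as pd

    df = pd.read_csv("file.csv")

    # Parse each JSON string once, then expand the dicts into their own columns
    stats = df["stats"].apply(json.loads).apply(pd.Series)
    df = df.drop(columns="stats").join(stats)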
Constructing a co-occurrence matrix in python pandas
I know how to do this in R. But is there any function in pandas that transforms a dataframe into an n×n co-occurrence matrix containing the counts of two aspects co-occurring? For example, a matrix df: would yield: Since the matrix is mirrored on the diagonal, I guess there would be a way to optimize the code. Answer It’s a simple
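A small sketch of the usual linear-algebra trick, with a made-up 0/1 incidence frame (rows are observations, columns are aspects): the product of the transpose with the frame itself gives the co-occurrence counts.

    import pandas as pd

    df = pd.DataFrame({"a": [1, 0, 1], "b": [1, 1, 0], "c": [0, 1, 1]})

    # X^T X: off-diagonal entries count how often two aspects occur together,
    # the diagonal counts each aspect on its own
    cooc = df.T.dot(df)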
What is the most efficient way of counting occurrences in pandas?
I have a large (about 12M rows) DataFrame df: The following ran in a timely fashion: However, this is taking an unexpectedly long time to run: What am I doing wrong here? Is there a better way to count occurrences in a large DataFrame? The earlier call ran pretty well, so I really did not expect this Occurrences_of_Words DataFrame to take very long
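A brief sketch of the usual fast path, with a tiny made-up frame: Series.value_counts is typically the quickest way to get occurrence counts for a single column.

    import pandas as pd

    df = pd.DataFrame({"word": ["a", "b", "a", "c", "a"]})

    # Fast occurrence counts, sorted descending
    counts = df["word"].value_counts()

    # Roughly equivalent, but often slower on very large frames:
    # counts = df.groupby("word").size()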
Pandas get topmost n records within each group
Suppose I have a pandas DataFrame like this: which looks like: I want to get a new DataFrame with the top 2 records for each id, like this: I can do it by numbering the records within each group after groupby: which looks like: then for the desired output: Output: But is there a more effective/elegant approach to do this? And also, is there a more
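A short sketch of the idiomatic answer, assuming a made-up frame with id and value columns and that "top" means the largest values: sort, group by id, and take the head of each group.

    import pandas as pd

    df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 2],
                       "value": [3, 1, 2, 9, 8, 7]})

    # Sort so the biggest values come first, then keep the first 2 rows per id
    top2 = df.sort_values("value", ascending=False).groupby("id").head(2)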