I’m trying to configure my IPython output in my OS X terminal, but none of the changes I set seem to take effect. I want the display settings configured so that wide outputs, such as a big DataFrame, are printed in full rather than truncated or collapsed into the summary info. After importing pandas into my script, I
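A minimal sketch of the usual fix, assuming the goal is an untruncated DataFrame repr: these are pandas display options (not IPython or terminal settings), and the option names below are the standard ones.

    import pandas as pd

    # Show every column and let pandas auto-detect the terminal width
    pd.set_option("display.max_columns", None)
    pd.set_option("display.width", None)
    # Print wide frames on one long line instead of wrapping across lines
    pd.set_option("display.expand_frame_repr", False)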
Pandas: getting the name of the minimum column
I have a Pandas dataframe as below: I want to append a reason column that gives some standard text + the column name of the minimum value of that row. In other words, the desired output is: I can do incomplete_df.apply(lambda x: min(x), axis=1), but this does not ignore NaNs and, more importantly, returns the value rather than the name of the column.
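A minimal sketch of the usual approach, using a made-up stand-in for incomplete_df: DataFrame.idxmin(axis=1) skips NaNs by default and returns the column name of each row’s minimum.

    import numpy as np
    import pandas as pd

    # Hypothetical stand-in for incomplete_df
    incomplete_df = pd.DataFrame({"a": [2.0, np.nan], "b": [1.0, 5.0], "c": [3.0, 4.0]})

    # idxmin(axis=1) ignores NaNs and yields the column *name* of the row minimum
    incomplete_df["reason"] = "standard text " + incomplete_df.idxmin(axis=1)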
Pandas fillna on datetime object
I’m trying to run fillna on a column of type datetime64[ns]. When I run something like: df['date'].fillna(datetime("2000-01-01")) I get: TypeError: an integer is required Any way around this? Answer This should work in 0.12 and 0.13 (just released). @DSM points out that datetimes are constructed like: datetime.datetime(2012, 1, 1) So the error comes from failing to construct the value that you are
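A short sketch of the fix being described, with a made-up frame: pass a real datetime (or pd.Timestamp) as the fill value instead of a string.

    import datetime
    import pandas as pd

    df = pd.DataFrame({"date": pd.to_datetime(["2013-01-05", None, "2013-03-01"])})

    # Construct the fill value properly instead of datetime("2000-01-01")
    df["date"] = df["date"].fillna(datetime.datetime(2000, 1, 1))
    # df["date"] = df["date"].fillna(pd.Timestamp("2000-01-01"))  # equivalent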
Format / Suppress Scientific Notation from Pandas Aggregation Results
How can one modify the format of the output from a groupby operation in pandas that produces scientific notation for very large numbers? I know how to do string formatting in Python, but I’m at a loss when it comes to applying it here. This suppresses the scientific notation if I convert to string, but now I’m just wondering how
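A minimal sketch of the two common options, using a made-up frame: set pandas’ global float format for display, or format the aggregated column itself as strings.

    import pandas as pd

    df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1.2e10, 3.4e10, 5.6e10]})
    summed = df.groupby("group").sum()

    # Option 1: change how all floats are displayed
    pd.set_option("display.float_format", "{:,.0f}".format)

    # Option 2: format just this result (converts the numbers to strings)
    formatted = summed["value"].map("{:,.0f}".format)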
Compute daily climatology using pandas python
I am trying to use pandas to compute daily climatology. My code is: cum_data is the data frame containing daily dates from 1st Jan 1950 to 31st Dec 1953. I want to create a new vector of length 365 with the first element containing the average of rand_data for January 1st for 1950, 1951, 1952 and 1953. And so on
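A minimal sketch with synthetic data standing in for cum_data and rand_data: grouping by (month, day) averages each calendar day across the years. Note this yields 366 entries when a leap year is present, so Feb 29 can be dropped if a strict 365-day climatology is wanted.

    import numpy as np
    import pandas as pd

    # Synthetic stand-in for cum_data with a daily DatetimeIndex, 1950-1953
    idx = pd.date_range("1950-01-01", "1953-12-31", freq="D")
    cum_data = pd.DataFrame({"rand_data": np.random.rand(len(idx))}, index=idx)

    # Average each calendar day (month, day) across all years
    climatology = cum_data["rand_data"].groupby(
        [cum_data.index.month, cum_data.index.day]
    ).mean()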
ImportError: No module named dateutil.parser
I am receiving the following error when importing pandas in a Python program. Also, here’s the program: Answer On Ubuntu you may need to install the package manager pip first: Then install the python-dateutil package with:
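The commands the answer refers to are the standard ones; on a system where pip is already available, the second line alone should suffice.

    sudo apt-get install python-pip
    sudo pip install python-dateutil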
Parsing a JSON string which was loaded from a CSV using Pandas
I am working with CSV files where several of the columns contain a simple JSON object (several key-value pairs) while other columns are normal. Here is an example: After using df = pandas.read_csv('file.csv'), what’s the most efficient way to parse and split the stats column into additional columns? After about an hour, the only thing I could come up
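A minimal sketch of one common approach, assuming the JSON column is named stats as in the question: parse each string with json.loads, expand the resulting dicts into columns, and join them back on.

    import json
    import pandas as pd

    df = pd.read_csv("file.csv")

    # Parse each JSON string once, then expand the dicts into their own columns
    stats = df["stats"].apply(json.loads).apply(pd.Series)
    df = df.drop(columns="stats").join(stats)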
Constructing a co-occurrence matrix in python pandas
I know how to do this in R. But is there any function in pandas that transforms a dataframe into an n×n co-occurrence matrix containing the counts of two aspects co-occurring? For example, a matrix df: would yield: Since the matrix is mirrored on the diagonal, I guess there would be a way to optimize the code. Answer It’s a simple
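A small sketch of the usual linear-algebra trick, with a made-up 0/1 incidence frame (rows are observations, columns are aspects): the product of the transpose with the frame itself gives the co-occurrence counts.

    import pandas as pd

    df = pd.DataFrame({"a": [1, 0, 1], "b": [1, 1, 0], "c": [0, 1, 1]})

    # X^T X: off-diagonal entries count how often two aspects occur together,
    # the diagonal counts each aspect on its own
    cooc = df.T.dot(df)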
What is the most efficient way of counting occurrences in pandas?
I have a large (about 12M rows) DataFrame df: The following ran in a timely fashion: However, this is taking an unexpectedly long time to run: What am I doing wrong here? Is there a better way to count occurrences in a large DataFrame? The earlier call ran pretty well, so I really did not expect this Occurrences_of_Words DataFrame to take very long
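A brief sketch of the usual fast path, with a tiny made-up frame: Series.value_counts is typically the quickest way to get occurrence counts for a single column.

    import pandas as pd

    df = pd.DataFrame({"word": ["a", "b", "a", "c", "a"]})

    # Fast occurrence counts, sorted descending
    counts = df["word"].value_counts()

    # Roughly equivalent, but often slower on very large frames:
    # counts = df.groupby("word").size()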
Pandas get topmost n records within each group
Suppose I have a pandas DataFrame like this: which looks like: I want to get a new DataFrame with the top 2 records for each id, like this: I can do it by numbering the records within each group after groupby: which looks like: then for the desired output: Output: But is there a more effective/elegant approach to do this? And also, is there a more
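A short sketch of the idiomatic answer, assuming a made-up frame with id and value columns and that "top" means the largest values: sort, group by id, and take the head of each group.

    import pandas as pd

    df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 2],
                       "value": [3, 1, 2, 9, 8, 7]})

    # Sort so the biggest values come first, then keep the first 2 rows per id
    top2 = df.sort_values("value", ascending=False).groupby("id").head(2)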