I am using pyspark to read a parquet file like below:
my_df = sqlContext.read.parquet('hdfs://myPath/myDB.db/myTable/**')
Then when I do my_df.take(5), it shows [Row(...)] instead of a table format like a pandas DataFrame.
Is it possible to display the DataFrame in a table format like a pandas DataFrame? Thanks!
Answer
The show method does what you’re looking for.
For example, given the following dataframe of 3 rows, I can print just the first two rows like this:
df = sqlContext.createDataFrame([("foo", 1), ("bar", 2), ("baz", 3)], ('k', 'v'))
df.show(n=2)
which yields:
+---+---+
|  k|  v|
+---+---+
|foo|  1|
|bar|  2|
+---+---+
only showing top 2 rows
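Applied to the DataFrame from the question, a minimal sketch (assuming my_df has been read from parquet as shown above) would be:

# Print the first 5 rows as an ASCII table instead of a list of Row objects.
# truncate=False prevents long column values from being cut off at 20 characters.
my_df.show(n=5, truncate=False)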