
pyspark matplotlib integration with Zeppelin

I'm trying to draw a histogram using pyspark in a Zeppelin notebook. Here is what I have tried so far:

%pyspark

import matplotlib.pyplot as plt
import pandas
...
# Convert to pandas once instead of once per column
pdf = dateDF.toPandas()
x = pdf["year(CAST(_c0 AS DATE))"].tolist()
y = pdf["count(year(CAST(_c0 AS DATE)))"].tolist()
plt.plot(x,y)
plt.show()

This code runs without errors, but it does not display the expected plot. So I searched and found this documentation: [image: screenshot of the documentation]

Following it, I tried to enable the angular flag as follows:

# Convert to pandas once instead of once per column
pdf = dateDF.toPandas()
x = pdf["year(CAST(_c0 AS DATE))"].tolist()
y = pdf["count(year(CAST(_c0 AS DATE)))"].tolist()
plt.close()
z.configure_mpl(angular=True,close=False)
plt.plot(x,y)
plt.show()

But now I'm getting the error No module named 'mpl_config', and I have no idea how to enable angular without it. Any suggestions on how to resolve this would be greatly appreciated.

Answer

After struggling for some time, I found that this is a major bug in Zeppelin, flagged in November 2020 by @Ruslan Dautkhanov. According to him:

mpl_config is part of core Zeppelin. The old Python Interpreter was copying it manually here https://github.com/apache/zeppelin/blob/0d746fa2e2787a661db70d74035120ae3516ace3/python/src/main/java/org/apache/zeppelin/python/PythonInterpreter.java#L179

But the new IPythonInterpreter doesn't do this.
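To confirm that this is what is happening in a given notebook, a quick check is to ask Python whether mpl_config can be found on the import path of the running interpreter. This is only a minimal diagnostic sketch; the helper name module_available is made up for illustration and is not part of Zeppelin.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found on the import path."""
    return importlib.util.find_spec(name) is not None


# In an affected Zeppelin paragraph this would be expected to print False,
# matching the "No module named 'mpl_config'" error.
print("mpl_config importable:", module_available("mpl_config"))
```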

I hope this will be fixed in the future. I'm keeping the question here for future reference.
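In the meantime, one possible workaround that avoids mpl_config entirely is to render the figure to PNG in memory and print it through Zeppelin's %html display output. The sketch below is an assumption-laden illustration, not an official fix: the helper name figure_to_zeppelin_html is hypothetical, and it assumes the paragraph's printed output is interpreted by Zeppelin's display system.

```python
import base64
import io


def figure_to_zeppelin_html(fig):
    """Render a matplotlib figure to PNG and wrap it in a %html <img> string.

    `fig` is any object with a matplotlib-style savefig(buf, format=...) method.
    Hypothetical helper for illustration; not part of Zeppelin or matplotlib.
    """
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    # Output that starts with %html is rendered as HTML by Zeppelin's display system.
    return "%html <img src='data:image/png;base64," + encoded + "'/>"


# Usage inside a %pyspark paragraph (x and y built as in the question):
#   import matplotlib
#   matplotlib.use("Agg")            # headless backend, no GUI needed
#   import matplotlib.pyplot as plt
#   fig, ax = plt.subplots()
#   ax.plot(x, y)
#   print(figure_to_zeppelin_html(fig))
#   plt.close(fig)
```

The helper only relies on savefig, so it does not depend on z.configure_mpl or the angular flag.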
