import matplotlib.pyplot as plt #for graphing data
import numpy as np
plt.figure()
x = col1 = [2011.005, 2012.6543, 2013.3456, 2014.7821, 2015.3421, 2016.7891, 2017.0173, 2018.1974]
col2 = [1.4356, "", 5.32245, 6.542, 7.567, .77558, "", ""]
col3 = [1.3345, 2.345, "", 5.356, 3.124, 6.12, "", ""]
col4 = [0.67, 4.235, "", 6.78, "", "", 9.56, ""]
plt.plot(col1, col2, label="Sample 1")
plt.plot(col1, col3, label="Sample 2")
plt.plot(col1, col4, label="Sample 3")
When I plot this graph, the y-axis looks very off. Realising I needed to remove the empty strings ("") from the lists, I tried this method:
x1 = []
y1 = []
for index in range(len(col2)):
    if (col2[index] != ""):
        y1.append(col2[index])
        x1.append(col1[index])
x2 = []
y2 = []
for index in range(len(col3)):
    if (col3[index] != ""):
        y2.append(col3[index])
        x2.append(col1[index])
x3 = []
y3 = []
for index in range(len(col4)):
    if (col4[index] != ""):
        y3.append(col4[index])
        x2.append(col1[index])
print(x2) #showed that there were 9 values for x2 and 5 values for x1
print(y2)
plt.plot(x1, y1, "b.", linewidth = 1, label="Sample 1")
plt.plot(x2, y2, "g.", linewidth = 1, label="Sample 2")
plt.plot(x3, y3, "k.", linewidth = 1, label="Sample 3")
plt.title("Testing", fontsize=16)
plt.show()
This gave me a dimension error. I don't know how to extract only the x-values that correspond to the y-values.
Answer
You can use pandas' pd.to_numeric(..., errors='coerce') to convert each of the strings in the lists to NaN. (NumPy's np.genfromtxt(np.array(..., dtype=str)) does something similar, but also removes the empty strings.) NaN values are skipped while plotting. Matplotlib pairs each x-value with the corresponding y-value, e.g. 2011.005, 1.4356 for the first pair and 2012.6543, np.nan for the second. Any pair in which one or both values are NaN is not plotted.
Here is some example code:
import matplotlib.pyplot as plt
import pandas as pd
col1 = [2011.005, 2012.6543, 2013.3456, 2014.7821, 2015.3421, 2016.7891, 2017.0173, 2018.1974]
col2 = [1.4356, "", 5.32245, 6.542, 7.567, .77558, "", ""]
col3 = [1.3345, 2.345, "", 5.356, 3.124, 6.12, "", ""]
col4 = [0.67, 4.235, "", 6.78, "", "", 9.56, ""]
col1 = pd.to_numeric(col1, errors='coerce')
col2 = pd.to_numeric(col2, errors='coerce')
col3 = pd.to_numeric(col3, errors='coerce')
col4 = pd.to_numeric(col4, errors='coerce')
plt.figure()
plt.plot(col1, col2, "b.", linewidth=1, label="Sample 1")
plt.plot(col1, col3, "g.", linewidth=1, label="Sample 2")
plt.plot(col1, col4, "r.", linewidth=1, label="Sample 3")
plt.legend()
plt.show()
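If you would rather drop the empty entries than convert them (for example to draw connected lines, since a NaN leaves a gap in a line), the loop from the question can be written once as a helper. This is a minimal sketch in plain Python; drop_blank is a hypothetical name, not a library function. Reusing one helper also avoids the slip in the question's third loop, which appends to x2 instead of x3 and is why x2 ended up with 9 values.

```python
def drop_blank(xs, ys, blank=""):
    """Return the x- and y-values of every pair whose y-value is not blank.

    Hypothetical helper for illustration only.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if y != blank]
    # zip(*pairs) transposes the kept pairs back into two sequences
    kept_x, kept_y = zip(*pairs) if pairs else ((), ())
    return list(kept_x), list(kept_y)


col1 = [2011.005, 2012.6543, 2013.3456, 2014.7821, 2015.3421, 2016.7891, 2017.0173, 2018.1974]
col2 = [1.4356, "", 5.32245, 6.542, 7.567, .77558, "", ""]
col3 = [1.3345, 2.345, "", 5.356, 3.124, 6.12, "", ""]
col4 = [0.67, 4.235, "", 6.78, "", "", 9.56, ""]

x1, y1 = drop_blank(col1, col2)
x2, y2 = drop_blank(col1, col3)
x3, y3 = drop_blank(col1, col4)

# Each x-list now has the same length as its y-list
print(len(x1), len(y1))  # 5 5
print(len(x2), len(y2))  # 5 5
print(len(x3), len(y3))  # 4 4
```

Each pair (x1, y1), (x2, y2), (x3, y3) can then be passed to plt.plot without a dimension mismatch.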
It is unclear what your CSV file looks like. The following example supposes the file looks like csv_as_str. (StringIO lets you mimic a file with a string, which makes the example easier to include in a post. Reading from a real file would just be df = pd.read_csv('your_file.csv').)
import pandas as pd
from io import StringIO
csv_as_str ='''
col1,col2,col3,col4
2011.005,1.4356,1.3345,0.67
2012.6543,,2.345,4.235
2013.3456,5.32245,,
2014.7821,6.542,5.356,6.78
2015.3421,7.567,3.124,
2016.7891,0.77558,6.12,
2017.0173,,,9.56
2018.1974,,,
'''
df = pd.read_csv(StringIO(csv_as_str))
Then the dataframe already has NaN for the empty spots:
        col1     col2    col3   col4
0  2011.0050  1.43560  1.3345  0.670
1  2012.6543      NaN  2.3450  4.235
2  2013.3456  5.32245     NaN    NaN
3  2014.7821  6.54200  5.3560  6.780
4  2015.3421  7.56700  3.1240    NaN
5  2016.7891  0.77558  6.1200    NaN
6  2017.0173      NaN     NaN  9.560
7  2018.1974      NaN     NaN    NaN
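To get, for each y-column, only the x-values that belong to it, the rows holding NaN can be dropped one column pair at a time. A minimal sketch, assuming the CSV layout shown above; valid_pairs is a hypothetical helper name, not part of pandas:

```python
import pandas as pd
from io import StringIO

csv_as_str = '''
col1,col2,col3,col4
2011.005,1.4356,1.3345,0.67
2012.6543,,2.345,4.235
2013.3456,5.32245,,
2014.7821,6.542,5.356,6.78
2015.3421,7.567,3.124,
2016.7891,0.77558,6.12,
2017.0173,,,9.56
2018.1974,,,
'''
df = pd.read_csv(StringIO(csv_as_str))


def valid_pairs(frame, x_col, y_col):
    """Return the x and y columns restricted to rows where neither is NaN.

    Hypothetical helper for illustration only.
    """
    pair = frame[[x_col, y_col]].dropna()
    return pair[x_col], pair[y_col]


for y_col in ["col2", "col3", "col4"]:
    x, y = valid_pairs(df, "col1", y_col)
    # x and y always have equal length, so they can be plotted together
    print(y_col, len(x), len(y))
```

Each returned pair can be passed to plt.plot(x, y, ...) in the same way as the lists in the plotting code earlier in this answer.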