Creating scatterplot / regression line using python

I am stuck on this problem and cannot figure out why it isn’t working as intended. I have a text file with a bunch of x and y coordinates, and I need the average of all x and y values in order to calculate the slope of my regression line. Stamping the individual coordinates seems to work, but appending each x or y value to my lists apparently does not, because the error I am getting is “ZeroDivisionError: division by zero”.

Here’s my code:

import turtle
t = turtle.Turtle()
wn = turtle.Screen()
turtle.setworldcoordinates(-100, -100, 100, 100)
wn.bgcolor('lightblue')
t.pencolor('red')
filename = open('data.txt', 'r')

def plotregression():
    sum_of_x = []
    mean_of_x = sum(sum_of_x) / len(sum_of_x)  #doesnt work as intended
    sum_of_y = []
    mean_of_y = sum(sum_of_y) / len(sum_of_x)   #doesnt work as intended
    #slope =
    for line in filename:
        values = line.split()
        sum_of_x = sum_of_x.append(values[1])
        sum_of_y = sum_of_y.append(values[1])
        t.up()
        t.goto(int(values[0]), int(values[1]))
        t.down()
        t.stamp()
        t.down()

plotregression()
filename.close()
wn.exitonclick()

I really appreciate any input.

Answer

I tried out your code. The “division by zero” error occurs because the mean values are calculated immediately after the “sum_of_x” and “sum_of_y” lists are defined. At that point the lists are still empty, so both the sum and the length are zero. As a test, I moved the mean calculations so that they run after each data point is read from the file, as shown in the following code snippet. I also converted the values to integers before appending them, and called append without assigning its result back to the list, because append returns None.

def plotregression():
    sum_of_x = []  # x values read so far
    sum_of_y = []  # y values read so far
    #slope =
    for line in filename:
        values = line.split()
        sum_of_x.append(int(values[0]))
        sum_of_y.append(int(values[1]))
        # Both lists hold at least one value here, so the division is safe
        mean_of_x = sum(sum_of_x) / len(sum_of_x)
        mean_of_y = sum(sum_of_y) / len(sum_of_y)
        print('mean_of_x ', mean_of_x, 'mean_of_y ', mean_of_y)
        t.up()
        t.goto(int(values[0]), int(values[1]))
        t.down()
        t.stamp()
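
The snippet above also drops the “sum_of_x = sum_of_x.append(...)” assignment used in the question and converts each value with int(). Here is a small sketch of why both changes matter. The variable names are made up for illustration and are not part of the original program.

```python
points = []
returned = points.append(5)  # append changes the list in place...
print(returned)              # ...and returns None
print(points)                # [5]

# Assigning the result back would replace the list with None, so the
# next append would fail:  points = points.append(5)

# Values read from a text file are strings, so convert them before
# doing arithmetic; sum() cannot add strings to its starting value 0.
values = "12 -7".split()
x, y = int(values[0]), int(values[1])
print(x + y)                 # 5
```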

I just used some made-up data points and placed them in a file named “data.txt” to see if the program would run, and it did. Not a very impressive image, but it did produce output.
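
For the “#slope =” line that is still open, one possible approach is the usual least-squares formula, which needs the two means computed from the complete lists. The following is only a sketch under assumptions: the helper names parse_points and fit_line and the sample data are invented for illustration, and each line is assumed to hold two whitespace-separated integers.

```python
def parse_points(lines):
    """Turn lines like '10 21' into a list of (x, y) integer tuples."""
    points = []
    for line in lines:
        values = line.split()
        if len(values) < 2:
            continue  # skip blank or incomplete lines
        points.append((int(values[0]), int(values[1])))
    return points


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample data standing in for the lines of data.txt
sample = ["0 1", "10 21", "20 41", ""]
slope, intercept = fit_line(parse_points(sample))
print(slope, intercept)  # 2.0 1.0
```

With the slope and intercept in hand, the line could be drawn in the turtle window by moving to (-100, slope * -100 + intercept) with the pen up and then to (100, slope * 100 + intercept) with the pen down, which matches the world coordinates set in the question.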

Sample Window

Hope that helps you out.

Regards.

User contributions licensed under: CC BY-SA