Linear regression doesn't return the expected number of $\beta_i$

I have a dataset of precincts and party results from several elections. After reading this article, I really wanted to use linear regression to answer the question: how did voters change their minds since the last election?

|     | Unnamed: 0 | Map Level | Precinct ID | Precinct Name | Election | Invalid Ballots (%) | More Ballots Than Votes (#) | More Votes Than Ballots (#) | Total Voter Turnout (#) | Total Voter Turnout (%) | ... | Average votes per minute (17:00-20:00) | CDM | ED | FG | GD | LP | NR | UNM | Results | others |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 0   | 0   | Precinct | 1  | 63-1  | 2008 Parliamentary | 0.0  | 0.0 | 0.0 | 749 | 62.11 | ... | 1.01 | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  | 77.17 | United National Movement | 22.83 |
| 1   | 1   | Precinct | 10 | 63-10 | 2008 Parliamentary | 0.0  | 0.0 | 0.0 | 419 | 70.42 | ... | 0.61 | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  | 71.12 | United National Movement | 28.87 |
| ... | ... | ...      | ...| ...   | ...                | ...  | ... | ... | ... | ...   | ... | ...  | ...  | ...  | ...  | ...  | ...  | ...  | ...   | ...                      | ...   |
| 136 | 159 | Precinct | 8  | 63-1  | 2013 Presidential  | 1.75 | 0.0 | 0.0 | 506 | 50.75 | ... | 0.52 | 2.96 | 0.20 | 0.00 | 0.00 | 1.19 | 0.00 | 0.00  | Giorgi Margvelashvili    | 95.65 |
| 137 | 160 | Precinct | 9  | 63-10 | 2013 Presidential  | 2.50 | 0.0 | 0.0 | 625 | 48.04 | ... | 0.66 | 1.92 | 0.80 | 0.00 | 0.00 | 1.60 | 0.00 | 0.00  | Giorgi Margvelashvili    | 95.68 |

A given precinct is identified by its Precinct Name.

To understand which voters changed their minds, one can build a very simple model. You can reduce the elections to an N-party system by dropping all parties that are not of interest (or that got fewer than some threshold of votes in both elections). Then you make the assumption that all people who voted the same way in 2008 will change their minds in the same way in 2013. More specifically, people who voted for party $P_i$ in 2008 all have the same probability of voting for party $P_r$ in 2013 (I call this probability $X_{ir}$).

So, for a given precinct, in order to "explain" or "predict" the number of votes $V_r^{2013}$ for party $P_r$ in 2013 based on the 2008 results, I can use the probabilities $X_{ir}$ (or $\beta_i$ in the more classic notation) as follows:

$$V_r^{2013} = \sum_i V_i^{2008}\times X_{ir} $$
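For concreteness, here is a toy numeric sketch of the model (the vote shares and the transition probabilities $X_{ir}$ below are made up purely for illustration):

# Toy illustration of V_r^2013 = sum_i V_i^2008 * X_ir,
# with 3 parties and made-up transition probabilities.

v_2008 = [60.0, 30.0, 10.0]  # 2008 vote shares (%) for parties A, B, C

# X[i][r]: probability that a 2008 voter of party i picks party r in 2013
X = [
    [0.7, 0.2, 0.1],  # party A voters: 70% stay, 20% go to B, 10% to C
    [0.1, 0.8, 0.1],  # party B voters
    [0.0, 0.3, 0.7],  # party C voters
]

# predicted 2013 share of party r
v_2013 = [sum(v_2008[i] * X[i][r] for i in range(3)) for r in range(3)]
print(v_2013)  # [45.0, 39.0, 16.0]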

It is a simple linear regression. So, since we have 7 parties, the result for each $X_r$ (the set of probabilities of switching to party $r$) should be an array of size 7. However, with the linear regression model I show just after, that is not the case.

So I tried to implement the model in Python 3:

import random

def error(x_i, y_i, beta):
    return y_i - predict(x_i, beta)

def squared_error(x_i, y_i, beta):
    return error(x_i, y_i, beta)**2

def squared_error_gradient(x_i, y_i, beta):
    """the gradient (with respect to beta)
    corresponding to the ith squared error term"""
    return [-2 * x_ij * error(x_i, y_i, beta)
            for x_ij in x_i]

def predict(x_i, beta):
    """assumes that the first element of each x_i is 1"""
    # x_i.insert(0, 1)
    return dot(x_i, beta)

def dot(v, w):
    """v_1 * w_1 + ... + v_n * w_n"""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def vector_subtract(v, w):
    """element-wise difference v - w (helper used by minimize_stochastic)"""
    return [v_i - w_i for v_i, w_i in zip(v, w)]

def scalar_multiply(c, v):
    """multiply every element of v by the scalar c (helper used by minimize_stochastic)"""
    return [c * v_i for v_i in v]

def in_random_order(data):
    """generator that returns the elements of data in random order"""
    indexes = [i for i, _ in enumerate(data)] # create a list of indexes
    random.shuffle(indexes) # shuffle them
    for i in indexes: # return the data in that order
        yield data[i]

def minimize_stochastic(target_fn, gradient_fn, x, y, theta_0, alpha_0=0.01):
    data = list(zip(x, y))  # materialize: a zip iterator would be exhausted after one pass
    theta = theta_0 # initial guess
    alpha = alpha_0 # initial step size
    min_theta, min_value = None, float("inf") # the minimum so far
    iterations_with_no_improvement = 0

    # if we ever go 100 iterations with no improvement, stop
    while iterations_with_no_improvement < 100:
        value = sum(target_fn(x_i, y_i, theta) for x_i, y_i in data)
        if value < min_value:
            # if we've found a new minimum, remember it
            # and go back to the original step size
            min_theta, min_value = theta, value
            iterations_with_no_improvement = 0
            alpha = alpha_0
        else:
            # otherwise we're not improving, so try shrinking the step size
            iterations_with_no_improvement += 1
            alpha *= 0.9

        # and take a gradient step for each of the data points
        for x_i, y_i in in_random_order(data):
            gradient_i = gradient_fn(x_i, y_i, theta)
            theta = vector_subtract(theta, scalar_multiply(alpha, gradient_i))
    return min_theta

def estimate_beta(x, y):
    beta_initial = [random.random() for x_i in x[0]]
    return minimize_stochastic(squared_error,
                               squared_error_gradient,
                               x, y,
                               beta_initial,
                               0.001)

For instance, let's say we have one election in 2008 and one election in 2013:

x = [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [77.17], [22.83]]  # each inner list is the % of votes one party got in 2008
y = [[0.35], [0.35], [0.0], [0.0], [2.43], [0.0], [0.0], [96.87]]  # each inner list is the % of votes one party got in 2013
random.seed(0)
probabilities = [estimate_beta(x, y_i) for y_i in y]
print(probabilities)

It returns:

[[0.8444218515250481], [0.7579544029403025], [0.420571580830845], [0.25891675029296335], [0.5112747213686085], [0.4049341374504143], [0.7837985890347726], [0.30331272607892745]]

I was expecting each array to contain as many values as there are parties.



It is a little hard to understand your question without seeing the data. However, let's recall how linear regression works. A simple model with one independent variable $x_1$ looks like:

$$ y = \beta_0 + \beta_1 x_{1,i}+u_i.$$

Here we have one independent variable $x_1$, and $u_i$ is the error term. For this model we have two coefficients: $\beta_0$ is the intercept and $\beta_1$ is the coefficient for $x_1$. So for one predictor (independent variable), we have two coefficients.

Of course, one can add more $x$ variables to the regression. For $n$ independent variables, we get $n+1$ coefficients, as the sketch below shows.
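Here is a minimal sketch of that count using numpy and scikit-learn (my choice for illustration; they are not part of your post). Fitting on $n = 7$ predictors yields 7 slope coefficients plus one intercept:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((100, 7))             # 100 observations, n = 7 predictors
y = X @ np.arange(1.0, 8.0) + 0.5    # made-up true coefficients plus an intercept of 0.5

model = LinearRegression().fit(X, y)
print(model.coef_.shape)   # (7,)  -> n slope coefficients
print(model.intercept_)    # ~0.5  -> plus the intercept beta_0, i.e. n + 1 in total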

Note that if you use indicator variables (aka dummies), which are equal to $1$ if true and $0$ otherwise, you need to define a base category. A model with one continuous variable $x_1$ and one indicator $x_2$ would look like:

$$ y = \beta_0 + \beta_1 x_{1,i}+\beta_2 x_{2,i}+u_i.$$

In this case $\beta_0$ is the intercept when $x_2=0$ and $\beta_0 + \beta_2$ is the intercept when $x_2=1$. However, you still have $n+1$ coefficients in the model.
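A quick sketch of that intercept shift, again with made-up data (same numpy/scikit-learn assumptions as above):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x1 = rng.random(200)             # continuous predictor
x2 = rng.integers(0, 2, 200)     # indicator (dummy), base category x2 = 0
y = 1.0 + 2.0 * x1 + 3.0 * x2    # beta_0 = 1, beta_1 = 2, beta_2 = 3

model = LinearRegression().fit(np.column_stack([x1, x2]), y)
print(model.intercept_)                   # ~1.0 -> intercept when x2 = 0
print(model.intercept_ + model.coef_[1])  # ~4.0 -> intercept when x2 = 1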
