Predicting Continuous Outcomes
Linear Regression
Linear regression is a statistical method that models the relationship between a dependent variable (also known as the response variable) and one or more independent variables (also known as predictor variables).
- Dependent Variable ($Y$): The outcome we want to predict
- Independent Variable ($X$): The input used to make predictions
The goal is to find a linear equation that best predicts the dependent variable based on the values of the independent variables.
The Linear Regression Equation
In its simplest form, the linear regression equation is:
$$ Y = \beta_0 + \beta_1X + \epsilon $$
- $Y$: The dependent variable
- $X$: The independent variable
- $\beta_0$: The intercept, representing the value of $Y$ when $X$ is zero
- $\beta_1$: The slope, indicating how much $Y$ changes for a one-unit change in $X$
- $\epsilon$: The error term, accounting for the variation in $Y$ not explained by $X$
- Linear regression assumes a linear relationship between the dependent and independent variables.
- This means a one-unit change in $X$ is associated with a constant change of $\beta_1$ in $Y$, regardless of the value of $X$.
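As a minimal sketch of how $\beta_0$ and $\beta_1$ can be estimated, the closed-form ordinary least squares formulas for simple linear regression can be applied with NumPy (the data below is synthetic, with hypothetical true values $\beta_0 = 2$ and $\beta_1 = 3$):

```python
import numpy as np

# Synthetic data: Y = 2 + 3X + noise (hypothetical true coefficients)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 3.0 * X + rng.normal(0, 1, size=50)

# Closed-form OLS estimates for simple linear regression:
#   beta1 = cov(X, Y) / var(X)
#   beta0 = mean(Y) - beta1 * mean(X)
beta1 = np.cov(X, Y, bias=True)[0, 1] / np.var(X)
beta0 = Y.mean() - beta1 * X.mean()

print(f"intercept ≈ {beta0:.2f}, slope ≈ {beta1:.2f}")
```

The estimates recover values close to the true coefficients; the remaining gap reflects the error term $\epsilon$ added to the synthetic data.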
Relationship Between Independent and Dependent Variables
- Independent Variables: These are the predictors. Their values are assumed to influence the dependent variable but are not influenced by it.
- Dependent Variable: This is the response. Its values are assumed to depend on the independent variables.
- Intuitively, the independent variable plays the role of the cause and the dependent variable the effect, although regression by itself measures association, not causation.
- For example, in predicting house prices, the size of the house (independent variable) influences the price (dependent variable).
Significance of the Slope and Intercept
The Intercept: $\beta_0$
- Represents the expected value of the dependent variable when the independent variable is zero.
- It is where the regression line crosses the y-axis.
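To make the intercept concrete, the following sketch fits a line to hypothetical house data (size in square meters, price in thousands of dollars) and confirms that the model's prediction at $X = 0$ is exactly $\beta_0$:

```python
import numpy as np

# Hypothetical house data: size (m^2) vs. price ($1000s)
size = np.array([50, 75, 100, 125, 150], dtype=float)
price = np.array([150, 210, 260, 330, 380], dtype=float)

# np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(size, price, 1)

# The prediction at X = 0 is the intercept: where the line crosses the y-axis
pred_at_zero = intercept + slope * 0.0
print(f"intercept = {intercept:.1f}, prediction at X=0 = {pred_at_zero:.1f}")
```

Note that $X = 0$ lies outside the observed sizes here, so the intercept is an extrapolation (no house has zero area); this is a common situation where $\beta_0$ anchors the line mathematically without having a meaningful real-world interpretation.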