Using the MLB data set, we will be studying logistic regression using a binary Y variable (what we are predicting) using two X predictor variables. First, I needed to create a binary variable because this data set has no binary variables. I decided to name the binary variable “fivehundred” to showcase teams with a batting average above .270. Using the formulas necessary to create this binary variable, I found 121 instances where a team hit above .270 in a season.

Now using this binary variable I was able to run a logistic regression equation to predict whether a team will hit above .270 based on hits and RBI’s (runs batted in).

The image bellow shows that all my coefficients used in the logistic regression equation are significant, meaning that hits and RBI’s are good predictors of whether a team will hit above .270.

Using this data I am able to create a model equation of log(p/(1-p))=-54.54+.015*Hits+.04*RBI. This means that if a team where to have no hits or no RBI’s (0), then batting average would be 54.54 bellow .270, or .215. However, if a team were to get 400 more hits and 400 more RBI’s batting average would be expected to increase by 21.46. This means that a teams batting average would be .291 rather than .270.