Loglinear Models: Cargo Ship Damage This data set concerns a type of damage caused by waves to the forward section of certain cargo-carrying vessels. For the purpose of setting standards for hull construction we need to know the risk of damage associated with three classifying variables: 1. Ship Type: A-E. Labelled as 'type' in datafile. 2. Year of Construction: 1960-64, 1965-69, 1970-74, 1975-79. Labelled as 'cons' in datafile. 3. Period of Operation: 1960-74, 1975-79. Labelled as 'op' in datafile. We are also given another explanatory variable: the aggregate number of months service, or total period at risk, for each ship. Labelled as 'ags' in datafile. The response variable is the number of damage incidents for each ship. Labelled as 'damage' in datafile. Some of these values are missing, and are labelled as NA. Can we come up with any models that estimate the number of damage incidents as a function of the other variables? > ships.dat[1:3,] type cons op ags damage 1 A 60 60 127 0 2 A 60 75 63 0 3 A 65 60 1095 3 To deal with the year groupings, we will treat 'cons' and 'op' as qualitative variables. To make sure R thinks the same way, we use a command called factor to make these variables qualitative: > ships.dat$cons <- factor(ships.dat$cons) > ships.dat$op <- factor(ships.dat$op) # Fit a loglinear model. Need to use family = poisson option. > ships.fit <- glm(formula = damage ~ type + cons + op + ags, family = poisson, data = ships.dat) > summary(ships.fit) # Some output deleted Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.786e-01 2.771e-01 0.645 0.519201 typeB 6.701e-01 2.172e-01 3.085 0.002034 ** typeC -1.192e+00 3.275e-01 -3.638 0.000275 *** typeD -8.294e-01 2.877e-01 -2.883 0.003941 ** typeE -1.493e-01 2.349e-01 -0.636 0.524853 cons65 1.087e+00 1.792e-01 6.067 1.30e-09 *** cons70 1.500e+00 2.247e-01 6.673 2.50e-11 *** cons75 8.545e-01 2.759e-01 3.097 0.001955 ** op75 7.284e-01 1.357e-01 5.369 7.93e-08 *** ags 6.697e-05 8.523e-06 7.857 3.92e-15 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 614.539 on 33 degrees of freedom Residual deviance: 70.498 on 24 degrees of freedom (6 observations deleted due to missingness) AIC: 188.36 Number of Fisher Scoring iterations: 5 # If overdispersion suspected, we can estimate the dispersion # and use it in "summary" > summary(ships.fit, dispersion = ships.fit$deviance/ships.fit$df.residual) Estimate Std. Error z value Pr(>|z|) (Intercept) 1.786e-01 4.749e-01 0.376 0.70685 typeB 6.701e-01 3.722e-01 1.800 0.07184 . typeC -1.192e+00 5.613e-01 -2.123 0.03377 * typeD -8.294e-01 4.931e-01 -1.682 0.09256 . typeE -1.493e-01 4.025e-01 -0.371 0.71062 cons65 1.087e+00 3.072e-01 3.540 0.00040 *** cons70 1.500e+00 3.852e-01 3.894 9.87e-05 *** cons75 8.545e-01 4.729e-01 1.807 0.07077 . op75 7.284e-01 2.325e-01 3.132 0.00173 ** ags 6.697e-05 1.461e-05 4.584 4.55e-06 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 2.937403) Null deviance: 614.539 on 33 degrees of freedom Residual deviance: 70.498 on 24 degrees of freedom # Remainder of output deleted Now, compare the predicted number of damage incidents for types B and E, where both are constructed in the 1975 period, operated in the 1975 period, and had 2000 months service. > ship.est <- data.frame(type = c("B", "E"), cons = factor(c(75,75)), op = factor(c(75,75)), ags = c(2000, 2000)) > ship.est type cons op ags 1 B 75 75 2000 2 E 75 75 2000 > predict(ships.fit, newdata = ship.est, se.fit = T, dispersion = ships.fit$deviance/ships.fit$df.residual, type="response") $fit: 1 2 13.006401 5.731907 # B predicted to have about 13 incidents, E about 6 $se.fit: 1 2 4.429120 2.341936 $residual.scale: [1] 1.713885 ####################################################### Can still use commands like fitted, residuals, anova, etc.