Spatial models of county-level roadway crashes for Pennsylvania

 

Thesis Proposal

 

Prepared by

        Jonathan Aguero

     

 

For

Course CE 600 Thesis Research

Professor: Dr. Paul P. Jovanis

 

October 15, 2004


Contents

1 Introduction

1.1 Background

1.2 Literature Review

1.3 Research Summary

2 Methodology

2.1 The Poisson distribution

2.2 The negative binomial distribution

2.3 The spatial approach

2.4 The Gaussian model

2.5 The auto Poisson model

2.6 The auto negative binomial model

3 Data Description

3.1 Crash Data

3.2 Socioeconomic Factors

3.2.1 Age

3.2.2 Sex

3.2.3 Percent of Urban Population

3.2.4 Percent of persons living under poverty

3.2.5 Driving under the influence

3.3 Transportation infrastructure related Factors

3.3.1 Vehicle-Miles Traveled

3.3.2 Number of miles by functional class

3.3.3 Travel Time

3.4 Environmental Factors

4 Analysis of Preliminary results

5 Future tasks

5.1 Data Collection

5.2 Weather surfaces creation

5.3 Variables and functional forms selection

5.4 Modeling Process

5.5 Project Plan Schedule

6 References

 

Tables

Table 1 Description of variables

Table 2 Linear models for natural log of fatal crashes (sample size = 67)

Table 3 Spatial Log-linear CAR model of Fatal Crashes 2002

Table 4 Analysis of Parameter Estimates, Negative Binomial Model

Table 5 Availability of data by year at October 15, 2004

Table 6 Project Schedule

 

Figures

Figure 1 Number of Fatal Crashes reported by County in the State of Pennsylvania Year 2000

Figure 2 Predicted Total Precipitation Surface Year 2000

 

 


Introduction

Background

According to a World Health Organization/World Bank report, “The Global Burden of Disease” (Murray and Lopez, 1996), deaths from non-communicable diseases are expected to climb 77% from 1990 to 2020 (from 28.1 million to 49.7 million), and traffic accidents are the main cause of this rise. Road traffic injuries are expected to take third place in the rank order of disease burden by the year 2020.

Another World Health Organization publication, “Injury: A Leading Cause of the Global Burden of Disease” (1999), reports that traffic injuries are the leading injury-related cause of death among people aged 15 to 44 years. Of the 5.8 million people who died of injuries in 1998, 1,170,694 died as a direct result of injuries sustained in a motor vehicle accident.

Deaths due to car crashes are clearly one of the biggest public health problems around the world, and developed countries are no exception. In 2002 alone, 42,815 persons died in 38,309 crashes in the USA (FARS, 2004), 1,614 of them in the state of Pennsylvania.

Motor-vehicle crashes cause not only deaths but also injuries ranging from minor to major or severe. In 2001, 117,860 persons were injured in reported crashes in the state of Pennsylvania alone (PennDOT, 2002). Of this total, 5,039 persons had major injuries, 23,292 moderate injuries, 76,796 minor injuries, and 12,733 injuries of unknown severity.

Given these facts, traffic safety must play an important role for any transportation policy-making body, and the U.S. Department of Transportation (USDOT) is no exception. Safety is one of the five current strategic goals of the USDOT. In 2002 alone, the budget of the National Highway Traffic Safety Administration (NHTSA), one of several agencies under the USDOT concerned with traffic safety, reached $424 million (NHTSA, 2004). However, these resources are scarce; therefore, a better understanding of road crashes is needed in order to direct resources to the most vulnerable areas in space (e.g., counties or districts), by type of facility (e.g., intersections, rural highways, and ramps), and by user group (e.g., pedestrians and drivers).

Good engineering begins by understanding the problem at hand. As Haight (1988) commented on traffic safety, “Many of us have heard demands that we ‘do something,’ but it is only recently that there have been suggestions that we should ‘know what we are doing’ before we begin to do it.” Consequently, traffic safety modeling may be used as a tool to increase our understanding of traffic crashes, with the ultimate aim of reducing their impact on society as a whole and on individual users.

Although roadway crashes are by nature determined by individuals (e.g., the driver), it is practically impossible to study at the individual level the influence of spatially defined factors such as land use, demographic characteristics, and traffic volume. In most roadway crash studies, the data are grouped into spatial units that range from intersections or road sections to zip codes or counties. Several studies of road crashes at different area levels, ranging from local census tracts to counties, have been performed in recent years (Amoros et al, 2003; Miaou et al, 2003; Noland and Oh, 2004; Noland and Quddus, 2004; and MacNab, 2004). However, of these studies, only those by Miaou et al and MacNab explicitly model spatial autocorrelation, using Bayesian modeling techniques.

The recent development of spatial modeling techniques has enabled researchers to investigate important issues related to risk estimation, unmeasured confounding variables, and spatial dependence (Richardson, 1992). In general, spatial correlation is analogous to temporal correlation: positive dependence among observations increases the true variance of the estimates, so standard errors computed under an independence assumption are underestimated if the dependence is not recognized and addressed.
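The size of this effect can be illustrated with the simplest dependence structure one can write down. The sketch below assumes, purely for illustration, n observations with common variance σ² and the same correlation ρ between every pair (a deliberately crude stand-in for spatial dependence, not one of the models proposed here). The exact variance of their mean is then σ²[1 + (n − 1)ρ]/n, while the usual independence formula reports σ²/n.

```python
import math


def var_of_mean_equicorrelated(sigma2: float, n: int, rho: float) -> float:
    """Exact variance of the mean of n observations with variance sigma2 and common pairwise correlation rho.

    Var(mean) = [n * sigma2 + n * (n - 1) * rho * sigma2] / n**2
              = sigma2 * (1 + (n - 1) * rho) / n
    """
    return sigma2 * (1 + (n - 1) * rho) / n


def naive_var_of_mean(sigma2: float, n: int) -> float:
    """Variance of the mean when the observations are (wrongly) treated as independent."""
    return sigma2 / n


# Hypothetical values: 67 units (one per county), unit variance, weak positive correlation.
n, sigma2, rho = 67, 1.0, 0.05
true_se = math.sqrt(var_of_mean_equicorrelated(sigma2, n, rho))
naive_se = math.sqrt(naive_var_of_mean(sigma2, n))
print(f"correct SE = {true_se:.4f}, naive SE = {naive_se:.4f}, ratio = {true_se / naive_se:.2f}")
```

With these hypothetical values the correct standard error is √4.3, or about 2.07 times the naive one, so even weak positive correlation roughly halves the reported standard errors.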

Another important advantage of spatial models is that spatial effects may reflect unmeasured confounding variables. This is particularly useful for unmeasured confounders that vary in space, such as weather and population. More importantly, as MacNab (2004) mentioned, “The methods also facilitate spatial smoothing and data pooling when regions under investigation involve small-population areas.” Here the term ‘small-population areas’ refers to areas that record very few events, as is typical of rare events such as roadway crashes.

Literature Review

Previous research has dealt with the spatial component of road crashes in different ways. For example, Levine et al (1995a) and Jones et al (1996) modeled crashes as point events. Other recent studies, such as Shankar et al (1996), Amoros et al (2003), Miaou et al (2003), Noland and Oh (2004), and MacNab (2004), have modeled road crashes at different area levels, ranging from road sections to local census tracts or counties.

In the study by Levine et al (1995a), the crashes were geocoded to the nearest intersection or ramp. Once the location of each crash was assigned, several ‘spatial’ statistics were calculated from the x and y coordinates of the crashes, including the mean center, the standard distance deviation based on “great circle” distance, the standard deviational ellipse (1st and 2nd principal components), and the Nearest Neighbor Index. The work concentrated on developing spatial probability ellipses for different categories of crashes, e.g., all crashes or alcohol-related crashes, and crashes involving one, two, or three or more vehicles. The main shortcoming of this analysis is its descriptive rather than predictive approach. In addition, the statistical assumptions on which the work is tacitly based are commonly violated. For example, it assumes a normal spatial distribution of points in the x and y coordinates, which implies the absence of clustering, an assumption clearly violated by crash data.
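The descriptive statistics listed above are straightforward to compute. The sketch below is a minimal illustration, not a reproduction of Levine et al's procedure: it assumes planar (x, y) coordinates and Euclidean distance rather than great-circle distance, omits the deviational ellipse, and uses the Clark-Evans form of the Nearest Neighbor Index, in which values well below 1 indicate clustering.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mean_center(pts: List[Point]) -> Point:
    """Arithmetic mean of the x and y coordinates."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def standard_distance(pts: List[Point]) -> float:
    """Root-mean-square distance of the points from their mean center (planar version)."""
    cx, cy = mean_center(pts)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts) / len(pts))


def nearest_neighbor_index(pts: List[Point], area: float) -> float:
    """Clark-Evans ratio R = observed mean nearest-neighbor distance / (0.5 / sqrt(n / area)).

    R < 1 suggests clustering, R close to 1 randomness, R > 1 regular spacing.
    """
    n = len(pts)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
        for i, p in enumerate(pts)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical crash locations forming two tight clusters in a 10 x 10 study area.
crashes = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (8.0, 8.0), (8.1, 7.9), (7.8, 8.2)]
print(mean_center(crashes), standard_distance(crashes), nearest_neighbor_index(crashes, 100.0))
```

For these clustered points the index falls far below 1, whereas a perfectly regular grid of points gives a value of 2.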

The work by Jones et al (1996) is another example of the use of spatial point pattern analysis for traffic accidents. This paper presents a classical K-function analysis of the residuals of a logit model in which the log-odds were those of a fatality as opposed to a serious injury. The variables of the model were age, type of user (pedestrian, bicyclist, motor vehicle driver), and number of casualties. With this ad hoc approach, the authors found that, once the trend was removed from the data, the residuals presented clustering. Although this study, unlike the work by Levine et al (1995a), includes the analysis of certain contributing factors, it still failed to incorporate the spatial correlation directly into the coefficient estimation.
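For readers unfamiliar with the K-function, the sketch below shows the basic estimator applied to point locations. It is an assumption-laden illustration: it applies no edge correction (which biases K downward near the study-area boundary) and works on raw point coordinates rather than on model residuals as Jones et al (1996) did. Under complete spatial randomness K(d) is approximately πd², so values well above πd² indicate clustering at distance d.

```python
import math
from typing import List, Tuple


def ripley_k(points: List[Tuple[float, float]], d: float, area: float) -> float:
    """Naive Ripley's K at distance d, with no edge correction.

    K(d) = area / (n * (n - 1)) * (number of ordered pairs i != j with distance <= d).
    Under complete spatial randomness, K(d) is approximately pi * d**2.
    """
    n = len(points)
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= d
    )
    return area * close_pairs / (n * (n - 1))


def l_function(points: List[Tuple[float, float]], d: float, area: float) -> float:
    """L(d) = sqrt(K(d) / pi); L(d) - d > 0 suggests clustering at scale d."""
    return math.sqrt(ripley_k(points, d, area) / math.pi)


# Hypothetical two-cluster pattern in a 10 x 10 study area.
pts = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (8.0, 8.0), (8.1, 7.9), (7.8, 8.2)]
for d in (0.5, 1.0, 5.0):
    print(d, round(ripley_k(pts, d, 100.0), 2), round(math.pi * d ** 2, 2))
```

At short distances K is far above πd² for this pattern, which is the clustering signature the K-function is designed to detect.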

Levine et al (1995b) also estimated a spatial model at the census block level, a so-called “spatial lag” model, defined as:

$$Y_i = \rho\, W(Y_j) + X\beta + \varepsilon$$

where Yi is an N by 1 vector of observations on the dependent variable for all locations i; W(Yj) is the N by 1 vector obtained by applying a spatial weights matrix W to the values of the dependent variable summed over all locations j, where i ≠ j (the “spatial lag”); ρ is the coefficient of the spatial lag (the spatial autoregressive term); X is an N by K matrix of observations on the explanatory variables; β is a K by 1 vector of regression coefficients; and ε is an N by 1 vector of normally distributed random error terms, with mean 0 and constant variance σ². The explanatory variables included in the model were: a dummy for a freeway crossing the block, miles of arterials or highways, miles of minor roads, miles of freeways, population, and employment. Even though the model takes into account the spatial correlation of the data, it assumes that the number of crashes is normally distributed rather than Poisson or negative binomial distributed.
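Because Y appears on both sides of the spatial lag model, its implied mean comes from the reduced form Y = (I − ρW)⁻¹(Xβ + ε). The NumPy sketch below assumes a hypothetical line of four units and a row-standardized binary weights matrix (the exact weighting used by Levine et al (1995b) is not stated here); it shows how a covariate value at one unit spills over to its neighbors through ρ.

```python
import numpy as np


def row_standardize(adj: np.ndarray) -> np.ndarray:
    """Divide each row of a binary adjacency matrix by its row sum (assumes no isolated units)."""
    return adj / adj.sum(axis=1, keepdims=True)


def spatial_lag_solve(W: np.ndarray, rho: float, rhs: np.ndarray) -> np.ndarray:
    """Solve Y = rho * W @ Y + rhs, i.e. Y = (I - rho W)^{-1} rhs."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho * W, rhs)


# Hypothetical units on a line: 0-1, 1-2 and 2-3 are neighbors.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W = row_standardize(adj)
X = np.column_stack([np.ones(4), [1.0, 0.0, 0.0, 0.0]])  # intercept + covariate nonzero only at unit 0
beta = np.array([1.0, 2.0])
rho = 0.5

mean_y = spatial_lag_solve(W, rho, X @ beta)          # E[Y] = (I - rho W)^{-1} X beta
rng = np.random.default_rng(0)
eps = rng.normal(0.0, 1.0, size=4)                    # normal errors, as in the model
y = spatial_lag_solve(W, rho, X @ beta + eps)         # one simulated realization
print(mean_y)
```

Although the covariate is nonzero only at unit 0, every unit's expected value rises above the intercept-only level of 1/(1 − ρ) = 2, with the effect decaying with distance along the line.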

Another researcher who has analyzed crashes taking into consideration the spatial component is Robert B. Noland. In his work “A spatially disaggregate analysis of road casualties in England” (Noland and Quddus, 2004), the analysis is performed at the “ward” (census tract) level. Negative binomial models were estimated with total fatalities, serious injuries, and slight injuries as dependent variables. The independent variables were classified into four categories: land use indicator variables (employment and population density), road characteristics, demographic characteristics (age cohorts), and traffic flow proxies (proximate and total employment). Some of the limitations of this study are the use of cross-sectional data, the use of proxy variables for traffic flow, and the lack of spatial correlation analysis.

In other work with county-level data, Noland and Oh (2004) estimated the expected number of crashes in a negative binomial model using infrastructure characteristics and demographic indicators as independent variables. Although this article improved on the former study by using four years of data, the absence of traffic flow data and of spatial correlation analysis persists. Similarly, Amoros et al (2003) present negative binomial models of road crash incidence and severity at the county level, using road type and its interactions with county as independent variables.

The most advanced work in terms of spatial modeling of traffic crashes, to the author's knowledge, was developed by Miaou et al (2003). The authors developed a series of spatial models of crashes at the county level using data from the state of Texas. Poisson-based full hierarchical Bayes models of fatal (K), incapacitating (A), and non-incapacitating (B) injuries were estimated using both frequencies and rates (using VMT as an offset term). A Conditional Autoregressive (CAR) model was used to capture spatial correlation, and Markov Chain Monte Carlo (MCMC) was used to sample from the posterior distribution. The main drawback of this work is the use of surrogate variables: percent of time that the road is wet, sharp horizontal curves, and roadside hazards.

These variables were estimated as proportions of crashes. For example, the percent of time that the road is wet was estimated by dividing the number of crashes that occurred on wet pavement by the total number of crashes. Such estimators are clearly biased in the direction of the effect, since they are computed from the very outcome being modeled. Given the poor definition of contributing factors in the model, it is likely that the spatial correlation is overestimated.

This project aims to estimate spatial models while controlling for known contributing factors of traffic crashes. It is expected that better explanatory variables, like the ones proposed in this project, will result in a better estimation of spatial interactions. In addition, hypotheses about possible contributing factors that have not been tested before, such as mean travel time to work or the number of Driving Under the Influence (DUI) arrests, will be tested. Another research question addressed in this work is the effect that including spatially related variables, such as environmental and population-related variables, has on the spatial correlation of the data. In other words, the question is how much spatial correlation, if any, can be detected once spatially distributed variables like population and weather are included in the model.

Research Summary

In this study the contributing factors are divided into three main categories: socioeconomic, transportation infrastructure-related, and environmental factors. The first category attempts to describe user characteristics, while the transportation infrastructure category seeks to explain system characteristics. User characteristics involve drivers and other vehicle occupants, as well as pedestrians and other users of the transportation facilities. Transportation-related factors involve the characteristics of the transportation infrastructure, such as vehicle-miles traveled and road mileage. Environmental factors include weather-related variables such as the number of days with snow and total precipitation.

The purpose of this research is to develop spatial models of road crashes for the State of Pennsylvania at the county level while controlling for socioeconomic, transportation-related, and environmental factors. Different combinations of contributing factors and possible interaction terms will be tested. The work is organized as follows: the next section presents the methodology, the following section describes the sources and nature of the data analyzed in this study, the preliminary results are then presented, and finally the future tasks and a schedule of activities are shown.


Methodology

The Poisson distribution

When data arise as counts, the Poisson distribution is typically used to model them. Traffic crashes are a clear example of count data; therefore, the Poisson distribution is a useful starting point. In the words of Shankar et al (1995), it presents two important advantages: “(i) it lends itself well to modeling of count data by virtue of its discrete, nonnegative integer-distribution characteristics and (ii) can be generalized to more flexible distributional forms.”

The probability function for a Poisson distribution is:

$$\Pr(z) = \frac{e^{-\lambda}\,\lambda^{z}}{z!}, \quad \lambda > 0, \quad z = 0, 1, \ldots \qquad (1)$$

where Pr(z) is the probability of observing z events and λ is the expected number of events.

Now, a model based on a Poisson distribution can be written as

$$\ln \lambda = X\beta \qquad (2)$$

where X is the vector of covariates and β is the vector of coefficients.

An important characteristic of the Poisson distribution is that its variance is equal to its mean.
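Equations (1) and (2) and the equal mean-variance property can be checked numerically. The sketch below assumes the usual log link for equation (2) and uses hypothetical covariate values and coefficients; it evaluates the Poisson probabilities on the log scale and recovers the mean and variance by summing over a truncated support (z up to 200, ample for the small means used here).

```python
import math
from typing import Callable, Sequence, Tuple


def poisson_pmf(z: int, lam: float) -> float:
    """Equation (1): Pr(Z = z) = exp(-lam) * lam**z / z!, evaluated on the log scale."""
    return math.exp(-lam + z * math.log(lam) - math.lgamma(z + 1))


def poisson_mean(x: Sequence[float], beta: Sequence[float]) -> float:
    """Log link: ln(lam) = x . beta, so lam = exp(x . beta)."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))


def moments(pmf: Callable[[int, float], float], lam: float, zmax: int = 200) -> Tuple[float, float]:
    """Mean and variance from the pmf summed over z = 0..zmax (ample truncation for small lam)."""
    probs = [pmf(z, lam) for z in range(zmax + 1)]
    mean = sum(z * p for z, p in enumerate(probs))
    var = sum((z - mean) ** 2 * p for z, p in enumerate(probs))
    return mean, var


# Hypothetical covariates (intercept term and one regressor) and coefficients.
lam = poisson_mean([1.0, 0.5], [1.2, 0.8])   # exp(1.6), about 4.95 expected crashes
mean, var = moments(poisson_pmf, lam)
print(lam, mean, var)                        # mean and variance agree with lam to numerical precision
```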

The negative binomial distribution

Several authors, including Shankar et al (1995), Noland and Quddus (2004), and Lord et al (2003), have argued that car crashes are best represented by negative binomial distributions. A negative binomial distribution is a nonnegative count distribution that arises from a Poisson process whose mean is itself random, so that its variance is greater than its mean. This key feature makes the negative binomial distribution preferable to the Poisson distribution, whose variance equals its mean, when the data are overdispersed. In the words of Shankar et al (1995), “It is well known, based on the findings of many previous research efforts, that accident frequency data tend to be overdispersed,…”

A negative binomial distribution can be viewed as a Poisson distribution whose mean λ is itself a gamma random variable. The model can be written as

$$\ln \lambda = X\beta + \varepsilon \qquad (3)$$

where X is the vector of covariates, β is the vector of coefficients, and ε is the error term, with exp(ε) gamma distributed so that λ is a gamma random variable.

 

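The marginal probability distribution, after mixing over λ, is:

$$\Pr(z) = \int_0^\infty \frac{e^{-\lambda}\lambda^{z}}{z!}\,\frac{\beta^{\mu}\lambda^{\mu-1}e^{-\beta\lambda}}{\Gamma(\mu)}\,d\lambda = \frac{\Gamma(z+\mu)}{\Gamma(\mu)\,z!}\left(\frac{\beta}{1+\beta}\right)^{\mu}\left(\frac{1}{1+\beta}\right)^{z}, \quad z = 0, 1, \ldots \qquad (4)$$

where Γ(·) is the gamma function and μ and β are the shape and rate parameters of the gamma random variable λ (note that the parameter β in equation (4) is different from the vector of coefficients β in equations (2) and (3)).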
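The mixing argument behind equation (4) can be verified numerically. The sketch below assumes λ ~ Gamma(shape μ, rate β) with hypothetical values μ = 2 and β = 0.5; it compares the closed form with a direct numerical integration of the Poisson probability against the gamma density, and confirms that the mean μ/β is smaller than the variance μ/β + μ/β², i.e., that the distribution is overdispersed.

```python
import math


def nb_pmf(z: int, mu: float, beta: float) -> float:
    """Closed form of the Poisson-gamma mixture with lam ~ Gamma(shape=mu, rate=beta):
    Gamma(z + mu) / (Gamma(mu) * z!) * (beta / (1 + beta))**mu * (1 / (1 + beta))**z
    """
    log_p = (math.lgamma(z + mu) - math.lgamma(mu) - math.lgamma(z + 1)
             + mu * math.log(beta / (1 + beta)) - z * math.log(1 + beta))
    return math.exp(log_p)


def mixture_pmf_numeric(z: int, mu: float, beta: float, upper: float = 80.0, steps: int = 80_000) -> float:
    """Midpoint-rule integral over lam of Poisson(z | lam) times the Gamma(mu, beta) density."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        log_poisson = -lam + z * math.log(lam) - math.lgamma(z + 1)
        log_gamma_pdf = mu * math.log(beta) + (mu - 1) * math.log(lam) - beta * lam - math.lgamma(mu)
        total += math.exp(log_poisson + log_gamma_pdf)
    return total * h


mu, beta = 2.0, 0.5   # hypothetical gamma shape and rate, so E[lam] = mu / beta = 4
probs = [nb_pmf(z, mu, beta) for z in range(400)]
mean = sum(z * p for z, p in enumerate(probs))
var = sum((z - mean) ** 2 * p for z, p in enumerate(probs))
print(nb_pmf(3, mu, beta), mixture_pmf_numeric(3, mu, beta))  # closed form vs numerical mixing
print(mean, var)  # theory: mean = mu/beta = 4, variance = mu/beta + mu/beta**2 = 12
```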

The spatial approach

A general spatial model can be described as:

$$\{Z(s) : s \in D\} \qquad (5)$$

where s = (x, y) denotes the coordinates of a sample site, Z(s) denotes the variable of interest at location s, and D is the set of spatial locations at which data can be obtained.

A lattice process is one in which D is a finite collection of n sites, D = {s1, s2, …, sn}.

The objective is then to build a model for the joint distribution of the data Z(s1), Z(s2), …, Z(sn). For that purpose, a class of auto-models will be considered.

Just as conditional distributions are used to model time series data, conditional distributions will be used here to model spatial data on a lattice. Assume:

$$\Pr\big(Z(s_i) = z_i \mid \{Z(s_j) = z_j : j \neq i\}\big) = \Pr\big(Z(s_i) = z_i \mid \{Z(s_j) = z_j : j \in N_i\}\big), \quad i = 1, \ldots, n \qquad (6)$$

that is, the conditional probability that Z(si) = zi, given the realization of the data at all remaining sites, depends only on the data at the sites sj belonging to the collection Ni.

A site sj is defined to be a neighbor of site si if the conditional distribution of Z(si) given the data at all remaining sites depends on the realization zj of  Z(sj):

$$N_i = \{ j : s_j \text{ is a neighbor of } s_i \}$$

The neighborhood structure can be built in multiple ways, depending on the research objectives. In this study, given the irregular nature of the lattice (counties), it is convenient to define as a neighbor of county si any county sj that shares a boundary with si.

Then, {Z(si) : i = 1, 2, …, n} is a Markov Random Field if the conditional distribution of the data at any given site, given the realization of the data at the remaining sites, depends only on the realization of the data at the neighbor sites.
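The shared-boundary neighbor structure can be encoded directly as neighbor sets Ni and a binary connectivity matrix. The sketch below uses hypothetical county labels and border pairs, not actual Pennsylvania geography; in practice the pairs would be extracted from county boundary polygons with GIS software.

```python
from typing import Dict, List, Set, Tuple


def neighbor_sets(shared_borders: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build N_i from pairs of counties that share a boundary (symmetric, no self-neighbors)."""
    nbrs: Dict[str, Set[str]] = {}
    for a, b in shared_borders:
        if a == b:
            continue
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def connectivity_matrix(nbrs: Dict[str, Set[str]], order: List[str]) -> List[List[int]]:
    """Binary n x n matrix: entry (i, j) is 1 if county order[j] is a neighbor of county order[i]."""
    index = {county: k for k, county in enumerate(order)}
    n = len(order)
    matrix = [[0] * n for _ in range(n)]
    for county, ns in nbrs.items():
        for other in ns:
            matrix[index[county]][index[other]] = 1
    return matrix


# Hypothetical counties A-D and their shared borders (not real Pennsylvania geography).
borders = [("A", "B"), ("B", "C"), ("A", "D"), ("C", "D")]
N = neighbor_sets(borders)
C = connectivity_matrix(N, ["A", "B", "C", "D"])
for row in C:
    print(row)
```

Because adjacency is recorded in both directions, the resulting matrix is symmetric with a zero diagonal, matching the requirement that a site is not its own neighbor.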

The Gaussian model

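Assuming that, conditional on the data at all other sites, Z(si) is Gaussian distributed, and that there are only pairwise spatial interactions, the conditional distribution of Z(si) is:

$$Z(s_i) \mid \{z(s_j) : j \neq i\} \sim N\Big(\mu_i + \sum_{j \neq i} c_{ij}\,\big(z(s_j) - \mu_j\big),\; \tau_i^{2}\Big) \qquad (7)$$

where μi is the mean of Z(si), τi² is its conditional variance, and the cij are the spatial autocorrelation coefficients (cij = 0 unless sj is a neighbor of si).

Maximum likelihood estimation requires the joint distribution of the data vector z. This joint distribution is:

$$Z \sim N\big(\mu,\; (I - C)^{-1} M\big) \qquad (8)$$

where μ = (μ1, …, μn)′, I is the identity matrix, M = diag(τ1², …, τn²), and C = [cij] is the connectivity matrix based on the defined neighboring structure. For (I − C)⁻¹M to be a valid covariance matrix, it must be symmetric, which requires cij τj² = cji τi², and positive definite.

In a multiple regression model,

$$\mu = X\beta \qquad (9)$$

where X is the vector of covariates and β is the vector of coefficients.

Then

$$Z \sim N\big(X\beta,\; (I - C)^{-1} M\big) \qquad (10)$$

and the conditional expected value of Z(si) is

$$E\big[Z(s_i) \mid \{z(s_j) : j \neq i\}\big] = x_i'\beta + \sum_{j \neq i} c_{ij}\,\big(z(s_j) - x_j'\beta\big) \qquad (11)$$

where xi′ is the row of X corresponding to site si.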
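A useful sanity check on equations (8) through (11) is that the CAR conditional mean, E[Z(si) | rest] = xi′β + Σj cij (z(sj) − xj′β), agrees with ordinary multivariate-normal conditioning on the joint covariance. The NumPy sketch below assumes hypothetical values throughout: four counties on a line, cij = ρ for neighbors (0 otherwise), a common conditional variance τ², and made-up covariates and observations. It builds Σ = (I − C)⁻¹M and confirms that both routes give the same conditional mean.

```python
import numpy as np

# Hypothetical setting: four counties on a line (0-1, 1-2, 2-3 share boundaries).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rho, tau2 = 0.4, 1.5               # assumed: c_ij = rho for neighbors, equal conditional variances
C = rho * A
M = tau2 * np.eye(4)
I = np.eye(4)

# Joint covariance (I - C)^{-1} M; with equal tau^2 this requires I - C to be positive definite.
assert np.all(np.linalg.eigvalsh(I - C) > 0)
Sigma = np.linalg.solve(I - C, M)

X = np.column_stack([np.ones(4), [0.2, 0.5, 0.1, 0.9]])  # hypothetical covariates
beta = np.array([1.0, 2.0])
mu = X @ beta                       # large-scale mean X beta
z = np.array([2.0, 1.0, 3.0, 2.5])  # hypothetical observed values


def car_conditional_mean(i: int) -> float:
    """CAR form: x_i' beta + sum_j c_ij (z_j - x_j' beta); C has a zero diagonal."""
    return float(mu[i] + C[i] @ (z - mu))


def gaussian_conditional_mean(i: int) -> float:
    """The same quantity from generic multivariate-normal conditioning on Sigma."""
    rest = [j for j in range(4) if j != i]
    s_ir = Sigma[i, rest]
    s_rr = Sigma[np.ix_(rest, rest)]
    return float(mu[i] + s_ir @ np.linalg.solve(s_rr, z[rest] - mu[rest]))


for i in range(4):
    print(i, car_conditional_mean(i), gaussian_conditional_mean(i))
```

The agreement follows because the precision matrix M⁻¹(I − C) has diagonal 1/τi² and off-diagonal entries −cij/τi², which is exactly what the conditional specification encodes.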

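Now, recalling the model proposed by Levine et al (1995b):

$$Y_i = \rho\, W(Y_j) + X\beta + \varepsilon \qquad (12)$$

where W(Yj) is the N by 1 vector of values of the dependent variable, weighted by the spatial weights matrix W and summed over all neighboring locations j.

If cij = ρ for every pair of neighboring sites, so that a single spatial autocorrelation coefficient is used, the conditional mean in equation (11) and equation (12) share the same autoregressive structure: the value at site si depends on ρ times a sum of the values at its neighbors. The method proposed by Levine et al (1995b) is therefore closely related to the Conditional Autoregressive (CAR) model, although the two are not identical, since in the spatial lag model the covariates also enter the mean through (I − ρW)⁻¹Xβ.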

The auto Poisson model

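The auto Poisson conditional specification (assuming pairwise-only dependence between sites) is (Cressie, 1993):

$$\Pr\big(Z(s_i) = z_i \mid \{z(s_j) : j \neq i\}\big) = \frac{e^{-\lambda_i}\,\lambda_i^{z_i}}{z_i!}, \quad z_i = 0, 1, \ldots \qquad (13)$$

where

$$\lambda_i = \exp\Big(\alpha_i + \sum_{j \neq i} \theta_{ij}\, z(s_j)\Big) \qquad (14)$$

where θij = θji, θii = 0, and the θ's are the spatial autocorrelation coefficients. These conditionals define a valid joint distribution only if θij ≤ 0 for all pairs, so the auto Poisson model can represent only negative spatial dependence.

If trend or large-scale variation is introduced, then:

$$\alpha = X\beta \qquad (15)$$

where, as in equation (2), X is the vector of covariates and β is the vector of coefficients.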
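Because the auto Poisson model is defined only through its conditional distributions, a natural way to simulate from it is a single-site Gibbs sampler that repeatedly redraws each count from its conditional Poisson distribution, with λi = exp(αi + Σj θij zj). The sketch below is illustrative only: the three-site layout, the covariates, and the coefficients are hypothetical, and the sampler rejects any θ that is asymmetric or positive, since with positive θij the conditional specification does not define a proper joint distribution.

```python
import math
import random
from typing import List, Sequence


def conditional_lambda(i: int, z: Sequence[int], alpha: Sequence[float],
                       theta: Sequence[Sequence[float]]) -> float:
    """lambda_i = exp(alpha_i + sum_{j != i} theta_ij * z_j)."""
    return math.exp(alpha[i] + sum(theta[i][j] * z[j] for j in range(len(z)) if j != i))


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def gibbs_auto_poisson(alpha: Sequence[float], theta: Sequence[Sequence[float]],
                       sweeps: int, seed: int = 0) -> List[List[int]]:
    """Single-site Gibbs sampler: redraw each Z_i from Poisson(lambda_i) given current neighbors."""
    n = len(alpha)
    for i in range(n):
        for j in range(n):
            if theta[i][j] != theta[j][i] or theta[i][j] > 0:
                raise ValueError("theta must be symmetric with theta_ij <= 0")
    rng = random.Random(seed)
    z = [0] * n
    draws = []
    for _ in range(sweeps):
        for i in range(n):
            z[i] = sample_poisson(conditional_lambda(i, z, alpha, theta), rng)
        draws.append(list(z))
    return draws


# Hypothetical example: three sites on a line, alpha = X beta with made-up X and beta.
X = [[1.0, 0.2], [1.0, 0.8], [1.0, 0.5]]
beta = [0.5, 1.0]
alpha = [sum(x * b for x, b in zip(row, beta)) for row in X]
theta = [[0.0, -0.1, 0.0],
         [-0.1, 0.0, -0.1],
         [0.0, -0.1, 0.0]]
draws = gibbs_auto_poisson(alpha, theta, sweeps=2000)
print([sum(d[i] for d in draws) / len(draws) for i in range(3)])  # Monte Carlo means per site
```

Simulation of this kind is one way to explore what a fitted auto model implies; it is a sketch, not the estimation procedure proposed in this work.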

The auto negative binomial model

An auto negative binomial model is a conditionally specified spatial model that, like a negative binomial model, describes count processes with overdispersion, but also takes into account the spatial correlation among different sites. The conditional specification of an auto negative binomial model can be defined as (Cressie, 1993):

$$\Pr\big(Z(s_i) = z_i \mid \{z(s_j) : j \neq i\}\big) = \frac{\Gamma(z_i + r_i)}{\Gamma(r_i)\, z_i!}\,(1 - p_i)^{r_i}\, p_i^{z_i}, \quad z_i = 0, 1, 2, \ldots \qquad (16)$$

where ri > 0 and 0 < pi < 1. If pairwise-only dependence between sites is assumed, then

$$p_i = \exp\Big(\alpha_i + \sum_{j \neq i} \theta_{ij}\, z(s_j)\Big) \qquad (17)$$

where θij = θji, θii = 0, and the θ's are the spatial autocorrelation coefficients.

Finally, if trend or large-scale variation is introduced, then α = Xβ, where, as in equation (2), X is the vector of covariates and β is the vector of coefficients.

Given these conditional distributions, one can derive the joint distribution required for maximum likelihood estimation of the model parameters.

Data Description

As mentioned before, in this study the risk factors are divided into three main categories: socioeconomic, transportation infrastructure-related, and environmental factors.

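Among socioeconomic factors, the following will be studied in this work: age, sex, percent of urban population, percent of persons living under poverty, and driving under the influence.

The transportation-related factors are vehicle-miles traveled, number of miles by functional class, and travel time to work.

The environmental factors are weather related: the amount of rain and snow and the number of days with rain and snow.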

The data will be collected from many different sources, including the US Census Bureau, the Pennsylvania Department of Transportation, and the National Climatic Data Center (NCDC) of NOAA. Each of the variables considered in the analysis is explained below, including its specific source.

Crash Data

The crash data consist of fatal crash reports from the FARS database. FARS, the Fatality Analysis Reporting System, is part of the National Center for Statistics and Analysis of the National Highway Traffic Safety Administration. The system is presented in a Web-based encyclopedia format, accessible from the Internet for querying for the years 1994 to 2002, and also in database format for download for the years 1974 to 1993. The database includes records for each crash, vehicle, and person involved in a fatal crash. The system defines a fatal crash as “A police-reported crash involving a motor vehicle in transport on a trafficway in which at least one person dies within 30 days of the crash” (FARS, 2004). Figure 1 presents the map of fatal crashes by county in the year 2000.

Socioeconomic Factors

These factors try to explain differences in the risk of car crashes to which persons are subject, given their individual social and economic differences. They may be described as the ‘human factors’ that contribute to car crashes. Authors such as Shinar (1978) and Evans (1991) have suggested factors including age, sex, and personality to explain car crash risk. In this work, the socioeconomic factors taken into consideration try to reflect these individual differences at the county aggregate level.

Age

Different authors, including Evans (1991) and Kam (2003), have shown that young and old drivers have a higher risk of car crashes. Therefore, the variables percent of persons between 16 and 24 and percent of persons over 65 will be included in the analysis. It should be noted, however, that these variables correspond to the total population rather than to the population of drivers, but these are the only data available.

Younger population groups are often associated with a higher risk of road crashes (Noland and Quddus, 2004). This higher risk is attributed to greater exposure, given their lack of awareness of the dangers of the road and their condition as pedestrians. A similar situation is expected for elderly pedestrians, which may be reflected in the variable percent of persons over 65.

The source of data for percent of persons under each age cohort is the US Census Bureau.

Sex

Sex has been found to be a key factor in car crash risk by Evans (1991) and Kam (2003). According to their findings, male drivers have a higher crash risk than female drivers. In this study, the percent of males will be used to capture this effect. Again, it must be highlighted that the variable is based on the whole population instead of the population of drivers.

The source of data for percent of males is the US Census Bureau.

Percent of Urban Population

Although the percent of urban population is clearly a socioeconomic variable, it indicates the level of urbanization of the county and is therefore an indicator of land use intensity, which is more closely related to transportation infrastructure. Higher land use intensities are normally associated with higher car crash risk. The results of Noland and Quddus (2004) showed that land use is associated with car crashes.

The source of data for percent of urban population is the US Census Bureau.

Figure 1 Number of Fatal Crashes reported by County in the State of Pennsylvania Year 2000

 


 

Percent of persons living under poverty

In this study, the percent of persons living under poverty is used as an indicator of area deprivation. Area deprivation has been found to be positively related to car crashes by Chichester et al (1998), Abdalla et al (1997), and Noland and Quddus (2004).

The source of data for percent of persons living under poverty is also the US Census Bureau, specifically, Small Area Income & Poverty Estimates Office.   

Driving under the influence

According to Evans (1991), “From the earliest days of motorization, alcohol has been recognized as a factor leading to increased crash risk.” Gary et al (2003) found that in 1998 dry counties in Kentucky (those where alcohol sales are prohibited) had fewer alcohol-related traffic crashes and fewer driving under the influence (DUI) arrests per 1,000 licensed drivers. In this work, the number of DUI arrests is used to relate alcohol consumption to crash frequency.

The information on the number of DUI arrests was supplied by the Uniform Crime Reporting Unit of the Pennsylvania State Police.

Transportation infrastructure related Factors

Among transportation-related factors, vehicle-miles traveled is often used as an exposure indicator (Miaou et al, 2003), along with the number of miles of different functional classes per county (Noland and Quddus, 2004). In addition to these factors, the mean travel time to work will be tested. The hypothesis is that higher travel times to work result in higher exposure and therefore higher risk.

Vehicle-Miles Traveled

Vehicle-Miles Traveled (VMT) is a performance measure related to the level of usage of a particular section or group of highways. The expected number of crashes increases with VMT because of the increase in exposure. Miaou et al (2003) used VMT by county, among other variables, to predict crash frequency in the state of Texas. A different approach is to use VMT as a denominator or normalizing variable for the dependent variable (i.e., number of crashes per VMT). In this study, however, VMT will be modeled explicitly as one of the risk factors, to quantify its contribution to crash risk relative to the other risk factors analyzed in the model.
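The difference between the two treatments of exposure can be stated in one line each. As an offset, ln λ = ln(DVMT) + Xβ, which forces crashes to scale proportionally with travel; as a covariate, ln λ = γ ln(DVMT) + Xβ, where γ is estimated. The sketch below uses hypothetical values for DVMT, the covariates, and the coefficients to show that the two coincide only when γ = 1.

```python
import math
from typing import Sequence


def linear_predictor(x: Sequence[float], beta: Sequence[float]) -> float:
    """x . beta for one county."""
    return sum(a * b for a, b in zip(x, beta))


def crashes_offset(dvmt: float, x: Sequence[float], beta: Sequence[float]) -> float:
    """Exposure as an offset: ln(lambda) = ln(DVMT) + x . beta (coefficient on ln DVMT fixed at 1)."""
    return math.exp(math.log(dvmt) + linear_predictor(x, beta))


def crashes_covariate(dvmt: float, gamma: float, x: Sequence[float], beta: Sequence[float]) -> float:
    """Exposure as a covariate: ln(lambda) = gamma * ln(DVMT) + x . beta (gamma estimated)."""
    return math.exp(gamma * math.log(dvmt) + linear_predictor(x, beta))


# Hypothetical county: DVMT of 2 or 4 million, one intercept-like covariate, made-up coefficient.
x, beta = [1.0], [-13.0]
for dvmt in (2.0e6, 4.0e6):
    print(dvmt, crashes_offset(dvmt, x, beta), crashes_covariate(dvmt, 0.5, x, beta))
```

Doubling DVMT doubles the expected count under the offset form but multiplies it by 2^γ (about 1.41 for the hypothetical γ = 0.5) when DVMT enters as a covariate; estimating γ is what allows the contribution of exposure to be compared with that of the other risk factors.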

The Daily Vehicle-Miles Traveled (DVMT) was obtained from the Annual Highway Statistics Report published by PennDOT.

Number of miles by functional class

Different functional classes have different design and operational standards, with higher standards for higher functional classes. Therefore, higher functional classes are expected to be associated with fewer crashes. Noland and Quddus (2004) tested this effect for roads in England; however, they could not find statistically significant differences in crash frequency attributable to roadway functional classification. This work will test whether the number of miles of higher functional class roads is significantly related to crash risk.

The number of miles by functional category was also obtained from the Annual Highway Statistics Report published by PennDOT.

Travel Time

The mean travel time to work by county is expected to be related to higher crash risk because of increased exposure. In addition, higher mean travel times to work are associated with more intensive land uses, since dense urban areas often present high levels of traffic congestion, which leads to higher travel times. To the author's knowledge, travel time has not previously been used as a predictor of crash risk; therefore, this variable is one of the most interesting in the analysis.

The data on travel time to work were derived from answers to the long-form questionnaire of the US Census, which was applied to one in every six households in 1990 and 2000. The elapsed time includes time spent waiting for public transportation, picking up passengers in carpools, and time spent in other activities related to getting to work. The dataset is available on the US Census Bureau web page (US Census Bureau, 2004).

Environmental Factors

Many environmental factors can be associated with higher crash risk, for example rain, snow, and darkness. The higher risk may be associated with reduced driver performance (e.g., shorter sight distance) and reduced vehicle performance (e.g., on wet pavement).

Given the data accessibility constraints of the project, the environmental factors taken into consideration are solely weather related. Weather-related factors have been investigated in the past by Shankar et al (1995) and Edwards (1996). Both studies found a positive correlation between weather hazards and crash frequency.

The amount of rain and snow and the number of days with rain and snow will be included in the model. The data source for these variables is the National Climatic Data Center (NCDC) of the National Oceanic and Atmospheric Administration (NOAA). Data from hundreds of weather stations will be used to generate predicted surfaces for each variable, and each surface will then be summarized at the county level so that the database contains a single value per county. Figure 2 shows an example: the predicted total precipitation surface for the year 2000.
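The interpolation method for the weather surfaces is not specified here, so the sketch below uses inverse distance weighting (IDW) purely as a stand-in; kriging or another spatial interpolator could equally be used. It assumes hypothetical station coordinates and values, and represents a county by the centers of the grid cells that fall inside it, so that the county value is the average of the interpolated surface over those cells.

```python
import math
from typing import List, Sequence, Tuple

Station = Tuple[float, float, float]  # (x, y, observed value), e.g. annual precipitation


def idw(x: float, y: float, stations: Sequence[Station], power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (x, y); returns a station's value at its own location."""
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value
        w = d ** -power
        num += w * value
        den += w
    return num / den


def county_mean(cells: Sequence[Tuple[float, float]], stations: Sequence[Station],
                power: float = 2.0) -> float:
    """Average of the interpolated surface over the grid-cell centers that fall inside one county."""
    values = [idw(cx, cy, stations, power) for cx, cy in cells]
    return sum(values) / len(values)


# Hypothetical stations (coordinates in km, precipitation in inches) and one county's cell centers.
stations = [(0.0, 0.0, 38.0), (50.0, 0.0, 42.0), (0.0, 50.0, 45.0), (50.0, 50.0, 40.0)]
county_cells: List[Tuple[float, float]] = [
    (x + 2.5, y + 2.5) for x in range(10, 30, 5) for y in range(10, 30, 5)
]
print(round(county_mean(county_cells, stations), 2))
```

Because IDW is a weighted average, the county value always lies between the smallest and largest station values, which is a convenient check when the surfaces are generated in bulk.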

 

Figure 2 Predicted Total Precipitation Surface Year 2000


Analysis of Preliminary results

A previous unpublished work by the author (Aguero, 2004) developed a log-linear relationship between fatal crash frequency and some of the predictor variables mentioned above. The variables included in the analysis were:

 

Table 1 Description of variables.

| Group | Variable | Description |
|---|---|---|
| Dependent variable | Lcrashes | Ln of total fatal crashes in 2002 |
| Socioeconomic | P_pov | Percent of population under poverty in 2000 |
| Socioeconomic | P16 | Percent of population under 16 in 2000 |
| Socioeconomic | P16_24 | Percent of population between 16 and 24 in 2000 |
| Socioeconomic | P65 | Percent of population over 65 |
| Socioeconomic | Pmales | Percent of males in 2000 |
| Socioeconomic | P_urban | Percent of urban population in 2000 |
| Socioeconomic | LDUI | Ln of Driving Under the Influence arrests in 2002 |
| Transportation related | LDVMT | Ln of Daily Vehicle-Miles Traveled in 2002 |
| Transportation related | Lfed_aid | Ln of miles of federal aid roads in 2002 |
| Transportation related | Lnonfed_aid | Ln of miles of non-federal aid roads in 2002 |
| Transportation related | Ltravel_t | Ln of mean travel time to work (minutes), workers age 16+, 2000 |
| Transportation related | Ltotal | Ln of miles of roads (federal and non-federal aid) in 2002 |
| Transportation related | Pfed_aid | Percent of miles of federal aid roads in 2002 |

 

Although biased estimators are expected given the misspecification of the model (log-normal linear regression instead of a count data model such as Poisson or negative binomial), this model can be seen as a good first approximation of the problem. Table 2 presents the results.

The results in Table 2 are promising. Not only do the models fit the data well, but many variables of interest are also statistically significant. However, it seems necessary to incorporate more data: the current sample size is only 67 (one observation per Pennsylvania county) because just one year of data was used. The goal is to include data for at least one or two more years, depending on data availability.
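Because the dependent variable in Table 2 is the natural log of fatal crashes, the estimates can be read as proportional effects: a coefficient b on an untransformed regressor such as P_pov multiplies expected crashes by exp(b) per unit change, while a coefficient on a logged regressor such as LDVMT is an elasticity. The sketch below applies these rules to two Model 1 estimates from Table 2; it takes the log-linear fit at face value and ignores the misspecification issues noted above.

```python
import math


def pct_change_semilog(coef: float, delta: float = 1.0) -> float:
    """ln(y) = ... + coef * x: changing x by delta multiplies y by exp(coef * delta)."""
    return 100.0 * (math.exp(coef * delta) - 1.0)


def pct_change_loglog(elasticity: float, pct_increase: float) -> float:
    """ln(y) = ... + elasticity * ln(x): raising x by pct_increase % multiplies y by (1 + pct/100)**elasticity."""
    return 100.0 * ((1.0 + pct_increase / 100.0) ** elasticity - 1.0)


# Model 1 estimates from Table 2.
print(pct_change_semilog(0.0564))       # P_pov: one more percentage point of population under poverty
print(pct_change_loglog(0.4555, 10.0))  # LDVMT: 10% more daily vehicle-miles traveled
```

Read this way, Model 1 implies roughly a 5.8% increase in fatal crashes per additional percentage point of population under poverty, and roughly a 4.4% increase for a 10% increase in DVMT.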

 

 


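Table 2 Linear models for natural log of fatal crashes (sample size = 67)

| Variable | Statistic | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|---|
| Intercept | Estimate | -12.2579 | -9.6169 | -9.5382 | -9.6221 |
| | S.E. | 4.7518 | 1.4727 | 1.4641 | 1.4311 |
| | p-value | 0.0126 | 0.0000 | 0.0000 | 0.0000 |
| P_pov | Estimate | 0.0564 | 0.0430 | 0.0423 | 0.0412 |
| | S.E. | 0.0198 | 0.0174 | 0.0173 | 0.0168 |
| | p-value | 0.0062 | 0.0163 | 0.0178 | 0.0177 |
| P16 | Estimate | 0.0165 | | | |
| | S.E. | 0.0601 | | | |
| | p-value | 0.7851 | | | |
| P16_24 | Estimate | -0.0095 | | | |
| | S.E. | 0.0352 | | | |
| | p-value | 0.7874 | | | |
| P65 | Estimate | -0.0250 | | | |
| | S.E. | 0.0485 | | | |
| | p-value | 0.6072 | | | |
| Pmales | Estimate | 0.0595 | | | |
| | S.E. | 0.0433 | | | |
| | p-value | 0.1753 | | | |
| P_urban | Estimate | 0.0020 | | | |
| | S.E. | 0.0044 | | | |
| | p-value | 0.6545 | | | |
| LDUI | Estimate | 0.1884 | 0.2473 | 0.2420 | 0.2325 |
| | S.E. | 0.1272 | 0.1064 | 0.1058 | 0.1011 |
| | p-value | 0.1443 | 0.0238 | 0.0261 | 0.0253 |
| LDVMT | Estimate | 0.4555 | 0.4547 | 0.4481 | 0.4376 |
| | S.E. | 0.1701 | 0.1374 | 0.1373 | 0.1326 |
| | p-value | 0.0097 | 0.0017 | 0.0019 | 0.0017 |
| Lfed_aid | Estimate | 0.0717 | -0.0459 | | |
| | S.E. | 0.2758 | 0.2446 | | |
| | p-value | 0.7958 | 0.8516 | | |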

Lnonfed_aid

0.2669

0.3132