In this section, you learn to:
Install and activate the R packages needed for the specific computations of this lab, such as fitdistrplus and actuar. Some additional generic R packages are needed, such as knitr, data.table, etc. Recall that this R lab is written in a Bookdown setting, so installing R packages should be done directly from RStudio. If you run the R code in the usual fashion, use e.g. install.packages("knitr") to install the R package knitr. R packages can be activated individually, e.g. library(knitr), or several can be activated in one go, e.g.
libraries(c("fitdistrplus", "actuar", "knitr", "data.table")), but the latter requires a previous installation of the R package easypackages.
#install.packages("knitr") ## example of individual R package installation
#library(knitr) ## example of individual R package activation
#suppressWarnings(library(easypackages))
#library(easypackages)
#libraries(c("fitdistrplus", "actuar", "knitr", "data.table", "MASS", "survival")) ## example of multiple R package activation in one go
suppressWarnings(library(actuar, warn.conflicts = FALSE)) ## warning messages from activating "actuar" package are ignored
suppressWarnings(library(MASS))
suppressWarnings(library(survival))
suppressWarnings(library(fitdistrplus))
suppressWarnings(library(grDevices))
suppressWarnings(library(knitr))
suppressWarnings(library(data.table))
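If you prefer not to install packages one by one, a small convenience sketch such as the one below can be used; the package vector is only illustrative and an internet connection is assumed for install.packages.
#Sketch: install any packages that are not yet available, then activate them in one go
pkgs <- c("fitdistrplus", "actuar", "knitr", "data.table", "easypackages")
missing_pkgs <- pkgs[!pkgs %in% installed.packages()[, "Package"]]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
#easypackages::libraries(pkgs) ## one-go activation, once easypackages is installed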
The purpose of this section is to describe the data to a general audience. Our insurance data are given as 'Building insurance claim size' and consist of a sample of size \(1,502\) of \[(x_{i1}, x_{i2}, x_{i3},y_{i1},y_{i2}) \quad \text{with}\; 1\le i \le 1,502,\] where \((x_{i1}, x_{i2}, x_{i3})\) are the covariates sampled from \((X_1,X_2,X_3)\) that help to predict the target variables \((Y_1,Y_2)\) via the observed sample \((y_{i1}, y_{i2})\). Recall that \((Y_1,Y_2)\) represents the building and content claims that are recorded as the Building Insurance Claim Size and Content Insurance Claim Size variables. Moreover, \((X_1,X_2,X_3)\) provides redundant information, since \(X_1\) is the Recorded time variable that contains the information given by \(X_2\) and \(X_3\), which record the claim occurrence time (year and month, respectively). That is,
The feature values \(\textbf{x}\)'s include information about the following: \(X_1\) is the recorded claim time; \(X_2\) is the year of the claim occurrence, ranging from 1980 to 1990; \(X_3\) is the month of the claim occurrence, with 1 representing January and 12 representing December. The data file is named Bivariate_Fire_Insurance_Clean_Data.csv and a small snapshot of the data is given in Table 1.1. A full Exploratory Data Analysis is provided in Section 1.3, which explains the sample in greater detail.
Our insurance data are given as 'Building insurance claim size' and have the following structure:
Bivariate_Fire_Insurance_Clean_Data <- read.table("Bivariate_Fire_Insurance_Clean_Data.csv",header=TRUE,as.is=TRUE,sep=",")
kable(head(Bivariate_Fire_Insurance_Clean_Data[,1:5]), caption="Fire Insurance Data Snapshot",align = "rrrrr", format.args = list(big.mark = ","))
Recorded.time | Building.Insurance.Claim.Size | Content.Insurance.Claim.Size | Year | Month |
---|---|---|---|---|
03/01/1980 | 1,098,096.6 | 585,651.5 | 1,980 | 1 |
04/01/1980 | 1,756,954.6 | 336,749.6 | 1,980 | 1 |
07/01/1980 | 1,244,509.5 | 3,367,496.0 | 1,980 | 1 |
10/01/1980 | 4,452,039.5 | 4,273,234.0 | 1,980 | 1 |
10/01/1980 | 2,494,875.5 | 3,543,192.0 | 1,980 | 1 |
16/01/1980 | 775,689.6 | 993,117.1 | 1,980 | 1 |
Bivariate_Fire_Insurance_Clean_Data_toy <- read.table("Bivariate_Fire_Insurance_Clean_Data.csv",header=TRUE,as.is=TRUE,sep=",")
colnames(Bivariate_Fire_Insurance_Clean_Data_toy) <- c("X1", "Y1", "Y2","X2", "X3")
#kable(head(Bivariate_Fire_Insurance_Clean_Data_toy), caption="Fire Insurance Data Snapshoot",align = "rrrrr", format.args = list(big.mark = ","))
summary(Bivariate_Fire_Insurance_Clean_Data_toy[,2:5])
Y1 Y2 X2 X3
Min. : 23191 Min. : 825 Min. :1980 Min. : 1.000
1st Qu.: 878753 1st Qu.: 268458 1st Qu.:1983 1st Qu.: 4.000
Median : 1224291 Median : 482160 Median :1986 Median : 7.000
Mean : 1871507 Mean : 1629436 Mean :1986 Mean : 6.525
3rd Qu.: 1928640 3rd Qu.: 1188155 3rd Qu.:1988 3rd Qu.: 9.000
Max. :95168375 Max. :132013200 Max. :1990 Max. :12.000
Note that this R lab aims to provide a prediction model only for the Building claims (\(Y_1\)), since the Content claims (\(Y_2\)) could be modelled in the same fashion. Keep in mind that \(Y_1\) and \(Y_2\) are dependent random variables.
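A quick numerical check of this dependence, before looking at the yearly and monthly breakdowns below, is to compute the overall correlation between the two claim sizes; the following is a minimal sketch that only reuses the data frame defined above.
#Sketch: overall association between Building (Y1) and Content (Y2) claims
cor(Bivariate_Fire_Insurance_Clean_Data_toy$Y1, Bivariate_Fire_Insurance_Clean_Data_toy$Y2)                      ## Pearson correlation
cor(Bivariate_Fire_Insurance_Clean_Data_toy$Y1, Bivariate_Fire_Insurance_Clean_Data_toy$Y2, method = "spearman") ## rank-based alternative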
suppressWarnings(library(corrplot))
par(mfrow=c(1,2))
### yearly correlation plot
coryear <- c()
for(i in 1980:1990){coryear <- c(coryear, cor(Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X2==i),c("Y1","Y2")])[1,2])}
names(coryear) <- 1980:1990
plot(1980:1990,coryear, main="Yearly Correlation Plot", ylab="Corr", xlab="year", type="p", col="red")
### monthly correlation plot
cormonth <- c()
nmonth <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
for(i in 1:12){cormonth <- c(cormonth, cor(Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X3==i),c("Y1","Y2")])[1,2])}
names(cormonth) <- nmonth
plot(1:12,cormonth, main="Monthly Correlation Plot",ylab="Corr", xlab="month", type="p", col="red",xaxt="n")
axis(side=1,at=1:12,labels=nmonth)
It is quite difficult to explain the yearly correlations from Figure 1.1, but the monthly correlations are quite intuitive. The Building and Content claims are highly correlated during the hot weather – as can be seen for July and August – when concomitant large claims are more likely to occur.
The purpose of this section is to explore the trends in the data through descriptive tools, which could help with selecting the most appropriate data analysis, backed by statistical evidence, as is done in Section 3.
Recall that the strength of association between the Building and Content claims has been examined in Section 1.2, namely Figure 1.1. Since our real data analysis from Section 3 will focus on Building claims, less detailed qualitative data explorations are made for Content claims in Section 1.3. We mainly look at how homogeneous the claims are over time, though statistical evidence would be required to confirm the findings from this section.
First, we look at two timeplots, namely Figures 1.2 and 1.3, which show the evolution over time of the Building and Content claims, respectively.
Bivariate_Fire_Insurance_Clean_Data_toy$X1 <- as.Date(Bivariate_Fire_Insurance_Clean_Data_toy$X1, "%d/%m/%Y")
library(ggplot2)
library(dplyr)
library(plotly)
library(hrbrthemes)
pp1 <- Bivariate_Fire_Insurance_Clean_Data_toy %>%
ggplot( aes(x=X1, y=Y1)) +
geom_area(fill="#00ffff", alpha=0.5) +
geom_line(color="#69b3a2") +
ylab("Building Insurance Claim Size (Y1)") +
xlab("Recorded time (X1)")+
theme_ipsum()
pp1 <- ggplotly(pp1)
pp1
#Bivariate_Fire_Insurance_Clean_Data_toy$X1 <- as.Date(Bivariate_Fire_Insurance_Clean_Data_toy$X1, "%d/%m/%Y")
library(ggplot2)
library(dplyr)
library(plotly)
library(hrbrthemes)
pp1 <- Bivariate_Fire_Insurance_Clean_Data_toy %>%
ggplot( aes(x=X1, y=Y2)) +
geom_area(fill="#84ff00", alpha=0.5) +
geom_line(color="#69b3a2") +
ylab("Content Insurance Claim Size (Y2)") +
xlab("Recorded time (X1)")+
theme_ipsum()
pp1 <- ggplotly(pp1)
pp1
Figures 1.2 and 1.3 give an overall understanding of the claims, and we observe some outliers, i.e. very large claims compared to the typical ones. We should be mindful of such claims when fitting the claim amount distribution, as such atypical claims may affect the robustness of our estimation.
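To identify these outliers explicitly, a short sketch such as the one below lists the largest recorded claims; it only reuses the data frame defined above.
#Sketch: the three largest Building and Content claims, with their recorded dates
head(Bivariate_Fire_Insurance_Clean_Data_toy[order(-Bivariate_Fire_Insurance_Clean_Data_toy$Y1), c("X1", "Y1")], 3)
head(Bivariate_Fire_Insurance_Clean_Data_toy[order(-Bivariate_Fire_Insurance_Clean_Data_toy$Y2), c("X1", "Y2")], 3)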
The next step is to look at some classical histograms to understand whether the claims show unusual patterns in some years and/or months. Figures 1.4 and 1.5 show the yearly histograms for the Building and Content claims, respectively.
par(mfrow=c(3,4))
for(i in 1980:1990){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X2==i),]
hist(xxx$Y1,main="", xlab=i, border="red", col="green")}
hist(Bivariate_Fire_Insurance_Clean_Data_toy$Y1, main="", xlab="All Years", border="red", col="green")
par(mfrow=c(3,4))
for(i in 1980:1990){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X2==i),]
hist(xxx$Y2,main="", xlab=i, border="red", col="green")}
hist(Bivariate_Fire_Insurance_Clean_Data_toy$Y2, main="", xlab="All Years", border="red", col="green")
Figures 1.6 and 1.7 show the monthly histograms for the Building and Content claims, respectively.
par(mfrow=c(3,4))
nmonth <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
for(i in 1:12){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X3==i),]
hist(xxx$Y1,main="", xlab=nmonth[i], border="red", col="green")}
par(mfrow=c(3,4))
nmonth <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
for(i in 1:12){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X3==i),]
hist(xxx$Y2,main="", xlab=nmonth[i], border="red", col="green")}
It is quite clear that interpreting numerous histograms – displayed as Figures 1.4 to 1.7 – is not convenient, and thus, we now report summary statistics in some radar plots that help to gather and compare the yearly and monthly risks with ease.
The following radar plots are provided for the Building Insurance Claim Size (Y1):
Note that the claims are divided by 1,000,000, i.e. all measures of risks – mean, standard deviation and quantiles – are scaled down by a factor of 1,000,000.
#Bivariate_Fire_Insurance_Clean_Data_toy <- read.table("Bivariate_Fire_Insurance_Clean_Data.csv",header=TRUE,as.is=TRUE,sep=",")
#colnames(Bivariate_Fire_Insurance_Clean_Data_toy) <- c("X1", "Y1", "Y2","X2", "X3")
meanyear <- numeric(11)
sdyear <- numeric(11)
for(i in 1980:1990){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X2==i),]
meanyear[i-1979] <- mean(xxx$Y1)/1000000
sdyear[i-1979] <- sd(xxx$Y1)/1000000}
library(plotly)
fig1eY1 <- plot_ly(
type = 'scatterpolar',
r = c(meanyear, meanyear[1]),
theta=paste0("year ",as.character(c(1980:1990,1980))), #the labels are forced to be characters, so the setting from below does not work
# theta = c("1980", "1981", "1982", "1983", "1984", "1985", "1986'", "1987", "1988", "1989", "1990", "1980"),
#fill = 'toself',
linetype = I("dot"),
name = 'Average Claim/1,000,000'
)
fig1eY1 <- fig1eY1 %>% add_trace(
type = 'scatterpolar',
r = c(sdyear, sdyear[1]),
theta=paste0("year ",as.character(c(1980:1990,1980))),
#fill = 'toself',
linetype = I("dotdash"),
name = 'Standard Deviation Claim/1,000,000'
)
fig1eY1
#Bivariate_Fire_Insurance_Clean_Data_toy <- read.table("Bivariate_Fire_Insurance_Clean_Data.csv",header=TRUE,as.is=TRUE,sep=",")
#colnames(Bivariate_Fire_Insurance_Clean_Data_toy) <- c("X1", "Y1", "Y2","X2", "X3")
q50year <- numeric(11)
q75year <- numeric(11)
q95year <- numeric(11)
for(i in 1980:1990){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X2==i),]
q50year[i-1979] <- quantile(xxx$Y1,0.50)/1000000
q75year[i-1979] <- quantile(xxx$Y1,0.75)/1000000
q95year[i-1979] <- quantile(xxx$Y1,0.95)/1000000}
library(plotly)
fig1qY1 <- plot_ly(
type = 'scatterpolar',
r = c(q50year, q50year[1]),
theta=paste0("year ",as.character(c(1980:1990,1980))),
#fill = 'toself',
linetype = I("dot"),
name = 'q(50%)/1,000,000'
)
fig1qY1 <- fig1qY1 %>% add_trace(
type = 'scatterpolar',
r = c(q75year, q75year[1]),
theta=paste0("year ",as.character(c(1980:1990,1980))),
#fill = 'toself',
linetype = I("dashdot"),
name = 'q(75%)/1,000,000'
)
fig1qY1 <- fig1qY1 %>% add_trace(
type = 'scatterpolar',
r = c(q95year, q95year[1]),
theta=paste0("year ",as.character(c(1980:1990,1980))),
#fill = 'toself',
linetype = I("dash"),
name = 'q(95%)/1,000,000'
)
fig1qY1
#Bivariate_Fire_Insurance_Clean_Data_toy <- read.table("Bivariate_Fire_Insurance_Clean_Data.csv",header=TRUE,as.is=TRUE,sep=",")
#colnames(Bivariate_Fire_Insurance_Clean_Data_toy) <- c("X1", "Y1", "Y2","X2", "X3")
meanmonth <- numeric(12)
sdmonth <- numeric(12)
for(i in 1:12){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X3==i),]
meanmonth[i] <- mean(xxx$Y1)/1000000
sdmonth[i] <- sd(xxx$Y1)/1000000}
library(plotly)
fig1eY1 <- plot_ly(
type = 'scatterpolar',
r = c(meanmonth, meanmonth[1]),
theta = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"),
#fill = 'toself',
linetype = I("dot"),
name = 'Average Claim/1,000,000'
)
fig1eY1 <- fig1eY1 %>% add_trace(
type = 'scatterpolar',
r = c(sdmonth, sdmonth[1]),
theta = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"),
#fill = 'toself',
linetype = I("dotdash"),
name = 'Standard Deviation Claim/1,000,000'
)
fig1eY1
#Bivariate_Fire_Insurance_Clean_Data_toy <- read.table("Bivariate_Fire_Insurance_Clean_Data.csv",header=TRUE,as.is=TRUE,sep=",")
#colnames(Bivariate_Fire_Insurance_Clean_Data_toy) <- c("X1", "Y1", "Y2","X2", "X3")
q50month <- numeric(12)
q75month <- numeric(12)
q95month <- numeric(12)
for(i in 1:12){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X3==i),]
q50month[i] <- quantile(xxx$Y1,0.50)/1000000
q75month[i] <- quantile(xxx$Y1,0.75)/1000000
q95month[i] <- quantile(xxx$Y1,0.95)/1000000}
library(plotly)
fig1qY1 <- plot_ly(
type = 'scatterpolar',
r = c(q50month, q50month[1]),
theta = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"),
#fill = 'toself',
linetype = I("dot"),
name = 'q(50%)/1,000,000'
)
fig1qY1 <- fig1qY1 %>% add_trace(
type = 'scatterpolar',
r = c(q75month, q75month[1]),
theta = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"),
#fill = 'toself',
linetype = I("dashdot"),
name = 'q(75%)/1,000,000'
)
fig1qY1 <- fig1qY1 %>% add_trace(
type = 'scatterpolar',
r = c(q95month, q95month[1]),
theta = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"),
#fill = 'toself',
linetype = I("dash"),
name = 'q(95%)/1,000,000'
)
fig1qY1
Building claims show an interesting trend. While Figure 1.8 shows large variation – a coefficient of variation larger than 1 – on the yearly basis, the monthly claims from Figure 1.10 show the same behaviour only for July, August and October. This peculiar behaviour could be explained by a lack of homogeneity on the annual basis, i.e. the iid (independent and identically distributed) assumption may not hold for the individual claims (sampled annually). This is only our initial intuition, though statistical testing could confirm or reject this behaviour. At the same time, the timeplot from Figure 1.2 shows three extreme claims; the most extreme claim appears in 1980, and the second (third) most extreme claim is recorded in 1985 (1988), which is in accordance with Figure 1.8. Quantiles could better explain the tail behaviour, and thus, the \(50\%\), \(75\%\) and \(95\%\) quantiles are plotted in Figures 1.9 and 1.11. The same outliers – explained before – tend to explain the large difference between the \(75\%\) and \(95\%\) quantiles, especially in year 1985 and the month of October.
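The "coefficient of variation larger than 1" remark can be verified with a one-line sketch that reuses the yearly Y1 summaries (meanyear and sdyear) computed above; the common 1,000,000 scaling cancels in the ratio.
#Sketch: yearly coefficient of variation (sd/mean) for the Building claims (Y1)
round(setNames(sdyear/meanyear, 1980:1990), 2)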
Content claims show an even more extreme behaviour – see Figures 1.12 to 1.15 – which is in accordance with the multiple unusually large claims observed in the timeplot from Figure 1.3. The last visualisation tools of this section are now given for the Content Insurance Claim Size (Y2), i.e. the following radar plots:
Note that the claims are divided by 1,000,000, i.e. all measures of risks – mean, standard deviation and quantiles – are scaled down by a factor of 1,000,000.
#Bivariate_Fire_Insurance_Clean_Data_toy <- read.table("Bivariate_Fire_Insurance_Clean_Data.csv",header=TRUE,as.is=TRUE,sep=",")
#colnames(Bivariate_Fire_Insurance_Clean_Data_toy) <- c("X1", "Y1", "Y2","X2", "X3")
meanyear <- numeric(11)
sdyear <- numeric(11)
for(i in 1980:1990){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X2==i),]
meanyear[i-1979] <- mean(xxx$Y2)/1000000
sdyear[i-1979] <- sd(xxx$Y2)/1000000}
library(plotly)
fig1eY2 <- plot_ly(
type = 'scatterpolar',
r = c(meanyear, meanyear[1]),
theta=paste0("year ",as.character(c(1980:1990,1980))), #the labels are forced to be characters, so the setting from below does not work
# theta = c("1980", "1981", "1982", "1983", "1984", "1985", "1986'", "1987", "1988", "1989", "1990", "1980"),
#fill = 'toself',
linetype = I("dot"),
name = 'Average Claim/1,000,000'
)
fig1eY2 <- fig1eY2 %>% add_trace(
type = 'scatterpolar',
r = c(sdyear, sdyear[1]),
theta=paste0("year ",as.character(c(1980:1990,1980))),
#fill = 'toself',
linetype = I("dotdash"),
name = 'Standard Deviation Claim/1,000,000'
)
fig1eY2
#Bivariate_Fire_Insurance_Clean_Data_toy <- read.table("Bivariate_Fire_Insurance_Clean_Data.csv",header=TRUE,as.is=TRUE,sep=",")
#colnames(Bivariate_Fire_Insurance_Clean_Data_toy) <- c("X1", "Y1", "Y2","X2", "X3")
q50year <- numeric(11)
q75year <- numeric(11)
q95year <- numeric(11)
for(i in 1980:1990){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X2==i),]
q50year[i-1979] <- quantile(xxx$Y2,0.50)/1000000
q75year[i-1979] <- quantile(xxx$Y2,0.75)/1000000
q95year[i-1979] <- quantile(xxx$Y2,0.95)/1000000}
library(plotly)
fig1qY2 <- plot_ly(
type = 'scatterpolar',
r = c(q50year, q50year[1]),
theta=paste0("year ",as.character(c(1980:1990,1980))),
#fill = 'toself',
linetype = I("dot"),
name = 'q(50%)/1,000,000'
)
fig1qY2 <- fig1qY2 %>% add_trace(
type = 'scatterpolar',
r = c(q75year, q75year[1]),
theta=paste0("year ",as.character(c(1980:1990,1980))),
#fill = 'toself',
linetype = I("dashdot"),
name = 'q(75%)/1,000,000'
)
fig1qY2 <- fig1qY2 %>% add_trace(
type = 'scatterpolar',
r = c(q95year, q95year[1]),
theta=paste0("year ",as.character(c(1980:1990,1980))),
#fill = 'toself',
linetype = I("dash"),
name = 'q(95%)/1,000,000'
)
fig1qY2
#Bivariate_Fire_Insurance_Clean_Data_toy <- read.table("Bivariate_Fire_Insurance_Clean_Data.csv",header=TRUE,as.is=TRUE,sep=",")
#colnames(Bivariate_Fire_Insurance_Clean_Data_toy) <- c("X1", "Y1", "Y2","X2", "X3")
meanmonth <- numeric(12)
sdmonth <- numeric(12)
for(i in 1:12){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X3==i),]
meanmonth[i] <- mean(xxx$Y2)/1000000
sdmonth[i] <- sd(xxx$Y2)/1000000}
library(plotly)
fig1eY2 <- plot_ly(
type = 'scatterpolar',
r = c(meanmonth, meanmonth[1]),
theta = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"),
#fill = 'toself',
linetype = I("dot"),
name = 'Average Claim/1,000,000'
)
fig1eY2 <- fig1eY2 %>% add_trace(
type = 'scatterpolar',
r = c(sdmonth, sdmonth[1]),
theta = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"),
#fill = 'toself',
linetype = I("dotdash"),
name = 'Standard Deviation Claim/1,000,000'
)
fig1eY2
#Bivariate_Fire_Insurance_Clean_Data_toy <- read.table("Bivariate_Fire_Insurance_Clean_Data.csv",header=TRUE,as.is=TRUE,sep=",")
#colnames(Bivariate_Fire_Insurance_Clean_Data_toy) <- c("X1", "Y1", "Y2","X2", "X3")
q50month <- numeric(12)
q75month <- numeric(12)
q95month <- numeric(12)
for(i in 1:12){
xxx <- Bivariate_Fire_Insurance_Clean_Data_toy[which(Bivariate_Fire_Insurance_Clean_Data_toy$X3==i),]
q50month[i] <- quantile(xxx$Y2,0.50)/1000000
q75month[i] <- quantile(xxx$Y2,0.75)/1000000
q95month[i] <- quantile(xxx$Y2,0.95)/1000000}
library(plotly)
fig1qY2 <- plot_ly(
type = 'scatterpolar',
r = c(q50month, q50month[1]),
theta = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"),
#fill = 'toself',
linetype = I("dot"),
name = 'q(50%)/1,000,000'
)
fig1qY2 <- fig1qY2 %>% add_trace(
type = 'scatterpolar',
r = c(q75month, q75month[1]),
theta = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"),
#fill = 'toself',
linetype = I("dashdot"),
name = 'q(75%)/1,000,000'
)
fig1qY2 <- fig1qY2 %>% add_trace(
type = 'scatterpolar',
r = c(q95month, q95month[1]),
theta = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan"),
#fill = 'toself',
linetype = I("dash"),
name = 'q(95%)/1,000,000'
)
fig1qY2
This ends our EDA, which is a preamble to the real data analysis provided in Section @ref{RDA}. We next provide a detailed simulation study that relies on the same loss data estimation procedures as Section @ref{RDA}. The main aim of the simulation study is to illustrate the possible shortcomings of the statistical models used to fit our insurance data.
In this section, you learn to:
This section provides elementary simulations from univariate (loss) distributions. We first simulate samples of size \(n=500\) – though samples of size \(n=\{25, 50, 100, 250, 500\}\) will be generated later – from three parametric distributions, namely LogNormal(\(\mu=10,\sigma=1\)), Weibull(\(a=0.5,b=1\)) and Weibull(\(a=2,b=1\)).
Note that our LogNormal distribution has the usual parametrisation, i.e. its expected value is \(\exp\left(\mu+\frac{\sigma^2}{2}\right)\). Further, our Weibull parametrisation, which is in terms of \(a,b>0\), has the following cumulative distribution function:
\[ F_{Weibull}(x;a,b)=\Pr(X\le x)=1-\exp\left\{-\left(\frac{x}{b}\right)^a\right\},\quad\text{for all}\;x\ge0,\;\text{i.e.}\;\mathbb{E}[X]=b\cdot\Gamma\left(1+\frac{1}{a}\right), \] where \(\Gamma(\cdot)\) is the usual Gamma function.
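A quick numerical sanity check of these parametrisations, with illustrative argument values, can be done directly in R:
#Sketch: checking the stated LogNormal mean, Weibull mean and Weibull CDF parametrisations
exp(10 + 1^2/2)                        ## LogNormal(mu=10, sigma=1) mean, approx. 36,315.50
1*gamma(1 + 1/0.5)                     ## Weibull(a=0.5, b=1) mean = b*Gamma(1+1/a) = 2
pweibull(1.2, shape = 0.5, scale = 1)  ## should equal 1 - exp(-(1.2/1)^0.5) below
1 - exp(-1.2^0.5)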
The summary statistics for the three samples are given below:
set.seed(123); #set your seed so you could replicate the same sample every time you run the R script
n=500;
#Simulate Sample 1: logNormal($\mu=10, \sigma=1$) moderate heavy tail
Sample1 <- rlnorm(n, meanlog=10, sdlog=1)
summary(Sample1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1539 12399 22488 36912 43705 563003
#Simulate Sample 2: Weibull($a=0.5, b=1$) moderate heavy tail
Sample2 <- rweibull(n, shape=0.5 , scale=1)
#Note that the scale parameter b=1
summary(Sample2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.09372 0.48676 2.01320 1.98672 45.73743
#Simulate Sample 3: Weibull($a=2, b=1$) very light tail
Sample3 <- rweibull(n, shape=2 , scale=1)
summary(Sample3)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.06029 0.53292 0.83407 0.88164 1.16460 2.31943
Let us plot the histograms of the three samples, which are given in Figure 2.1 below.
par(mfrow=c(2,2))
hist(Sample1, prob=TRUE, border="red", col="green");
hist(log(Sample1), prob=TRUE, border="red", col="green");
hist(Sample2, prob=TRUE, border="red", col="green");
hist(Sample3, prob=TRUE, border="red", col="green");
As expected, the histograms of Samples 1 and 2 are right skewed in Figure 2.1 due to their heavy-tailedness, while Sample 3 shows a very different pattern. At first glance, Sample 2 seems to come from a less heavy-tailed distribution than Sample 1, since one may say the Sample 1 histogram fades away after the threshold \(3\times10^5\), while the Sample 2 histogram fades away after the threshold \(30\); this argument is faulty since the two samples come from distributions with very different moments, e.g. the theoretical expected values of Sample 1 and Sample 2 are \(\exp(10.5)=36,315.50\) and \(1\cdot\Gamma\left(1+\frac{1}{0.5}\right)=2\), respectively. In fact, one could show that the theoretical Weibull(\(a=0.5\),\(b=1\)) distribution exhibits a heavier tail than LogNormal(\(\mu=10\),\(\sigma=1\)).
Finally, the log-transformation of the Sample 1 data leads to a symmetric histogram due to the fact that \(\log(X)\sim N\big(\mu,\sigma^2\big)\) whenever \(X\sim LogN\big(\mu,\sigma^2\big)\).
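A normal Q-Q plot of the log-transformed data offers a quick visual confirmation of this; the sketch below uses the Sample1 object generated above.
#Sketch: log(Sample1) should be approximately normally distributed
qqnorm(log(Sample1)); qqline(log(Sample1))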
We are now ready to parametrically fit our sample data from Section 2.1, and two methods are chosen:
MLE and MOM estimation are possible via the mledist and mmedist functions that are available in the fitdistrplus R package. There are numerous numerical issues with both the MLE and MOM implementations in R, especially when fitting distributions with more than one parameter. Why?
MLE requires solving non-linear optimisation problems, and R (and Python) rely on basic optimisation tools that are designed for general use, and bespoke software like MATLAB would be more computationally stable; this issue is more acute when the dimension of the optimisation problem, i.e. the number of parameters, gets larger (two or more parameters).
MOM usually requires solving a non-linear system of equations, which is computationally feasible even in R (and Python) – as an application of the Fixed Point Theorem, though other methods are available – when a one-parameter distribution is fitted, but MOM is computationally more challenging (not only in R and Python) when the number of parameters gets larger (two or more parameters); a minimal one-parameter illustration is sketched below.
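As that minimal illustration, the exp family has \(\mathbb{E}[X]=1/\text{rate}\), so its MOM estimate (which here coincides with the MLE) is available in closed form without any numerical solver; the sketch below applies it to Sample 1.
#Sketch: closed-form MOM estimate of the exp rate parameter for Sample 1
1/mean(Sample1)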
Eight parametric families are used to fit our three samples: lnorm, exp, gamma, weibull, llogis, invweibull, pareto and trgamma. The standard R notation – as used in the fitdistrplus and actuar packages – signifies which parametric family we refer to.
Recall that pareto is defined on \((0, \infty)\), which is not true for Pareto Type 1, and thus, we prefer using pareto so that all eight parametric families are defined on the same range.
MLE estimation requires a 'good' starting point for its numerical algorithms, and thus, one should be careful when performing MLE estimation for distributions with two or more parameters; local optima or even saddle points could be obtained, which are far from the desired global optimum point/points (if this/these exist), as the likelihood function is not always a convex function. MOM faces similar numerical shortcomings when estimating distributions with two or more parameters. Numerical issues are signified as NA in the R output.
Sample 1 (simulated from LogNormal) is fitted via MLE (assuming lnorm family) as follows:
MLE.lnorm=mledist(Sample1, "lnorm")
MLE.lnorm$estimate
## meanlog sdlog
## 10.0345904 0.9717961
Sample 1 (simulated from LogNormal) is fitted via MLE assuming exp family as follows:
MLE.exp=mledist(Sample1/1000, "exp")
#Large valued data (not large sized samples) require scaling down the data by a factor
#we chose here to scale down by a factor of 1,000 since the expected value is around \exp(10.5)=36,315.50
#we should scale back the MLE estimate, and since we actually fit the rate, we divide the MLE estimate by 1,000
#recall that expected value is equal to 1/rate
MLE.exp$estimate/1000
## rate
## 2.709161e-05
Sample 1 (simulated from LogNormal) is fitted via MLE (assuming gamma family) as follows:
MLE.gamma=mledist(Sample1, "gamma",lower=c(0, 0), start=list(shape =1, rate=1))
#A `good' starting point is needed here
MLE.gamma$estimate #rate=1/scale
## shape rate
## 1.177195e+00 3.188515e-05
Sample 1 (simulated from LogNormal) is fitted via MLE (assuming weibull family) as follows:
MLE.weibull=mledist(Sample1, "weibull")
MLE.weibull$estimate
## shape scale
## 1.016727 37225.585178
Sample 1 (simulated from LogNormal) is fitted via MLE (assuming llogis family) as follows:
MLE.llogis=mledist(Sample1, "llogis")
MLE.llogis$estimate
## shape scale
## 1.802915 22563.980389
Sample 1 (simulated from LogNormal) is fitted via MLE (assuming invweibull family) as follows:
MLE.invweibull=mledist(Sample1, "invweibull")
MLE.invweibull$estimate
## shape scale
## 0.4334861 1036.7107921
Sample 1 (simulated from LogNormal) is fitted via MLE (assuming pareto family) as follows:
MLE.pareto=mledist(Sample1, "pareto",lower=c(0, 0), start=list(shape =1, scale=1))
#A `good' starting point is needed here
MLE.pareto$estimate
## shape scale
## 7.484903e+00 2.382607e+05
Sample 1 (simulated from LogNormal) is fitted via MLE (assuming trgamma family) as follows:
MLE.trgamma=mledist(Sample1, "trgamma")
MLE.trgamma$estimate
## shape1 shape2 rate
## 6.922385543 0.391949647 0.004997469
MLE fitted distributions have their density plots in Figure 2.2, though there is not much to say about these plots.
par(mfrow=c(2,4))
curve(dlnorm(x,MLE.lnorm$estimate[1],MLE.lnorm$estimate[2]),from=0,to=max(Sample1),ylab="Density", col=1, main="lnorm", xlab = "")
curve(dexp(x,MLE.exp$estimate/1000),from=0,to=max(Sample1),ylab="Density", col=2, main="exp", xlab = "")
curve(dgamma(x,MLE.gamma$estimate[1],MLE.gamma$estimate[2]),from=0,to=max(Sample1),ylab="Density", col=3, main="gamma", xlab = "")
curve(dweibull(x,MLE.weibull$estimate[1],MLE.weibull$estimate[2]),from=0, to = max(Sample1), ylab="Density",col=4,main="weibull", xlab = "")
curve(dllogis(x,MLE.llogis$estimate[1],MLE.llogis$estimate[2]),from=0, to = max(Sample1), ylab="Density",col=5,main="llogis", xlab = "")
curve(dinvweibull(x,MLE.invweibull$estimate[1],MLE.invweibull$estimate[2]),from=0, to = max(Sample1), ylab="Density",col=6,main="invweibull", xlab = "")
curve(dpareto(x,MLE.pareto$estimate[1],MLE.pareto$estimate[2]),from=0, to = max(Sample1), ylab="Density",col=7,main="pareto", xlab = "")
curve(dtrgamma(x,MLE.trgamma$estimate[1],MLE.trgamma$estimate[2],MLE.trgamma$estimate[3]),xlim=c(0,max(Sample1)),ylab="Density",col=8, main="trgamma", xlab = "")
MOM usually requires solving a non-linear system of equations, which is often computationally infeasible when MOM is applied to parametric families with two or more parameters – as also mentioned at the beginning of Section 2.2. The Sample 1 data are fitted via MOM through the mmedist function, and you can see that the MOM estimation fails in some cases, as anticipated.
Sample 1 (simulated from LogNormal) is fitted via MOM (assuming lnorm family) as follows:
MME.lnorm=mmedist(Sample1, "lnorm")
MME.lnorm$estimate
## meanlog sdlog
## 10.042359 0.973578
Sample 1 (simulated from LogNormal) is fitted via MOM (assuming exp family) as follows:
MME.exp=mmedist(Sample1, "exp")
MME.exp$estimate
## rate
## 2.709161e-05
Sample 1 (simulated from LogNormal) is fitted via MOM (assuming gamma family) as follows:
MME.gamma=mmedist(Sample1, "gamma")
MME.gamma$estimate
## shape rate
## 6.328446e-01 1.714478e-05
Sample 1 (simulated from LogNormal) is fitted via MOM (assuming weibull family) as follows:
memp <- function(x, order) mean(x^order)
MME.weibull=mmedist(Sample1, "weibull", order=c(1, 2), memp=memp)
## checkstartfix is carried out
MME.weibull$estimate
## shape scale
## 9.504584e-01 3.991438e+04
Sample 1 (simulated from LogNormal) is fitted via MOM (assuming llogis family) as follows:
memp <- function(x, order) mean(x^order)
MME.llogis=mmedist(Sample1, "llogis", order=c(1, 2), memp=memp)
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
MME.llogis$estimate
## [1] NA NA
Sample 1 (simulated from LogNormal) is fitted via MOM (assuming invweibull family) as follows:
memp <- function(x, order) mean(x^order)
MME.invweibull=mmedist(Sample1, "invweibull", order=c(1, 2), memp=memp)
## checkstartfix is carried out
MME.invweibull$estimate
## shape scale
## 7379.188 59279.946
Sample 1 (simulated from LogNormal) is fitted via MOM (assuming pareto family) as follows:
memp <- function(x, order) mean(x^order)
MME.pareto=mmedist(Sample1, "pareto", order=c(1, 2), memp=memp)
## checkstartfix is carried out
MME.pareto$estimate
## shape scale
## 5.447284e+00 1.641572e+05
Sample 1 (simulated from LogNormal) is fitted via MOM (assuming trgamma family) as follows:
memp <- function(x, order) mean(x^order)
MME.trgamma=mmedist(Sample1, "trgamma", order=c(1, 2, 3), memp=memp)
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
MME.trgamma$estimate
## [1] NA NA NA
There are no obvious concerns with the MOM estimates when fitting the lnorm, exp and gamma families. There is no doubt that MOM fails when fitting the llogis and trgamma families, and there is no immediate remedy. MOM estimation when fitting the weibull, invweibull and pareto families produces some caution messages, and thus, we check whether the first two empirical and theoretical moments are matched as they are supposed to be.
xxx <- c(mean(Sample1),mean(Sample1^2))
xxxW<-c(mweibull(1, shape=MME.weibull$estimate[1], scale = MME.weibull$estimate[2]),mweibull(2, shape=MME.weibull$estimate[1], scale = MME.weibull$estimate[2]))
xxxIW<-c(minvweibull(1, shape=MME.invweibull$estimate[1], scale = MME.invweibull$estimate[2]),minvweibull(2, shape=MME.invweibull$estimate[1], scale = MME.invweibull$estimate[2]))
xxxP<-c(mpareto(1, shape=MME.pareto$estimate[1], scale = MME.pareto$estimate[2]),mpareto(2, shape=MME.pareto$estimate[1], scale = MME.pareto$estimate[2]))
cat("First two (raw) moments of Samples 1 -- empirical estimates")
## First two (raw) moments of Samples 1 -- empirical estimates
xxx
## [1] 3.691179e+04 3.515427e+09
cat("First two (raw) moments of Samples 1 -- theoretical MOM weibull")
## First two (raw) moments of Samples 1 -- theoretical MOM weibull
xxxW
## [1] 4.083912e+04 3.515497e+09
cat("First two (raw) moments of Samples 1 -- theoretical MOM invweibull")
## First two (raw) moments of Samples 1 -- theoretical MOM invweibull
xxxIW
## [1] 5.928458e+04 3.514662e+09
cat("First two (raw) moments of Samples 1 -- theoretical MOM pareto")
## First two (raw) moments of Samples 1 -- theoretical MOM pareto
xxxP
## [1] 3.691179e+04 3.515427e+09
cat("The diferences between the empirical and theoretical MOM for weibull, invweibull and pareto")
## The diferences between the empirical and theoretical MOM for weibull, invweibull and pareto
xxx-xxxW
## [1] -3927.331 -70056.150
xxx-xxxIW
## [1] -22372.79 764842.04
xxx-xxxP
## [1] 3.953755e-07 0.000000e+00
This suggests that the weibull and invweibull MOM estimates are very off, while the pareto MOM estimation is fine as it is.
One could help R to better estimate the weibull MOM estimates from first principles. That is, we have to solve the following system of equations: \[\mathbb{E}[X]=b\cdot\Gamma\left(1+\frac{1}{a}\right)=m_1\quad\text{and}\quad\mathbb{E}\big[X^2\big]=b^2\cdot\Gamma\left(1+\frac{2}{a}\right)=m_2,\] where \(m_1\) and \(m_2\) are the first and second empirical moment estimates. That is, we need to find the MOM estimate of the shape parameter \(\hat{a}_{MOM}>0\) such that \[\frac{m_2}{(m_1)^2}\left(\Gamma\left(1+\frac{1}{a}\right)\right)^2-\Gamma\left(1+\frac{2}{a}\right)=0,\] which is a root problem that could be solved in R with the help of the uniroot function. Clearly, the MOM estimate of the scale parameter is given by \[\hat{b}_{MOM}=\frac{m_1}{\Gamma\left(1+\frac{1}{\hat{a}_{MOM}}\right)}.\]
Now, let us solve the root equation defined earlier, and we get that
xxx <- c(mean(Sample1),mean(Sample1^2))
fW <- function(x) {(xxx[2]/xxx[1]^2)*(gamma(1+(1/x)))^2-gamma(1+(2/x))}
asol<-uniroot(fW, c(0.1,1))$root
bsol<-xxx[1]/gamma(1+(1/asol))
c(asol,bsol)
## [1] 8.020448e-01 3.263813e+04
xxxW<-c(mweibull(1, shape=asol, scale = bsol),mweibull(2, shape=asol, scale = bsol))
cat("The diferences between the empirical and theoretical MOM for weibull")
## The diferences between the empirical and theoretical MOM for weibull
xxx - xxxW
## [1] 0.00 16711.37
cat("Weibull MOM estimates -- (a=shape, b=scale)")
## Weibull MOM estimates -- (a=shape, b=scale)
c(asol,bsol)
## [1] 8.020448e-01 3.263813e+04
As you can see, the default tolerance in the uniroot function does not lead to reasonable weibull MOM estimates. Therefore, we reduce the tolerance to \(tol=10^{-15}\) and narrow down the search interval to \((0.7,0.9)\) from the previous search interval of \((0.1,1)\), and we get that
#fW <- function(x) {(xxx[2]/xxx[1]^2)*(gamma(1+(1/x)))^2-gamma(1+(2/x))}
asol1<-uniroot(fW, c(0.7,0.9), tol=10^(-15))$root
bsol1<-xxx[1]/gamma(1+(1/asol1))
xxxW1<-c(mweibull(1, shape=asol1, scale = bsol1),mweibull(2, shape=asol1, scale = bsol1))
cat("The diferences between the empirical and theoretical MOM for weibull")
## The diferences between the empirical and theoretical MOM for weibull
xxx -xxxW1
## [1] 0.000000e+00 -4.768372e-07
cat("Weibull MOM estimates -- (a=shape, b=scale)")
## Weibull MOM estimates -- (a=shape, b=scale)
c(asol1,bsol1)
## [1] 8.020419e-01 3.263804e+04
The above estimates are reasonable since the empirical and theoretical moments now agree up to an error of \(10^{-7}\), even though there are small differences between the two MOM outputs, \(\big(\hat{a}_{MOM},\hat{b}_{MOM}\big)\), obtained with the default and the reduced tolerance.
It is a good exercise for you to try finding a remedy for the invweibull MOM estimation, which is similar to the weibull remedy from above.
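One possible starting sketch (left for you to verify) is given below. It assumes the actuar invweibull parametrisation, for which \(\mathbb{E}\big[X^k\big]=b^k\,\Gamma(1-k/a)\) whenever \(a>k\), so the shape estimate solves a root problem analogous to the weibull case and must satisfy \(a>2\); the search interval below is only a guess.
#Sketch: invweibull MOM from first principles (shape a > 2 needed for a finite second moment)
m1 <- mean(Sample1); m2 <- mean(Sample1^2)
fIW <- function(a) {(m2/m1^2)*(gamma(1 - 1/a))^2 - gamma(1 - 2/a)}
aIW <- uniroot(fIW, c(2.001, 50), tol = 10^(-15))$root  ## the search interval may need adjusting
bIW <- m1/gamma(1 - 1/aIW)
c(aIW, bIW)
c(minvweibull(1, shape = aIW, scale = bIW), minvweibull(2, shape = aIW, scale = bIW)) - c(m1, m2)  ## moment check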
As a final note, you should be aware that some ad-hoc MOM estimations are based on dispersion measures as well. For example, if one fits the gamma family, which has two parameters, one may try matching the first moment and the variance instead of matching the first and second moments. This is not a big issue, though the variance could be estimated via the unbiased or the asymptotically unbiased estimator – each having its pros and cons depending on the sample size – and the two choices lead to different sets of estimates; recall that even a small difference in parameter estimates could lead to massive model error, as we noticed in the previous weibull MOM estimation.
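For concreteness, a minimal sketch of this ad-hoc gamma MOM is given below; it matches the sample mean and variance of Sample 1, using the fact that the gamma mean is shape/rate and the gamma variance is shape/rate^2.
#Sketch: ad-hoc gamma MOM matching the sample mean and variance of Sample 1
m <- mean(Sample1); v <- var(Sample1)  ## var() uses the unbiased (n-1) denominator; (n-1)/n*var() is the asymptotically unbiased alternative
c(shape = m^2/v, rate = m/v)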
In this section, you learn to:
So far, Sample 1 has been drawn from a LogNormal distribution – a moderately heavy-tailed distribution – with parameters \(\mu=10\) and \(\sigma=1\). The size of Sample 1 has been \(n=500\); we now take five samples of sizes \(n=\{25, 50, 100, 250, 500\}\) as follows:
set.seed(1234) #Generate Sample1 and save in a list
n=c(25, 50, 100, 250, 500)
lnorm.meanlog=10
lnorm.sdlog=1
Sample1.list=list(n25=rep(0,25),n50=rep(0,50),n100=rep(0,100),n250=rep(0,250),n500=rep(0,500))
for (i in 1:length(n)) {
Sample1=rlnorm(n[i], meanlog=lnorm.meanlog, sdlog=lnorm.sdlog)
Sample1.list[[i]]=Sample1
}
We then perform a parametric estimation – via MLE and MOM – for all five samples when the underlying distribution is one of the eight parametric families considered before.
Note that our MLE estimates from this section are as given by the R function mledist without checking whether these estimates are points of global maxima, which is beyond the scope of this session. Section 2.2 has detailed what one should do to enhance our confidence in our MLE estimates.
Note also that MOM estimation is as given by the R function mmedist, which – as we have already seen – can be quite off for some parametric choices; specifically, when \(n=500\), the weibull and invweibull MOM estimates are not acceptable, though the pareto estimates turned out to be fine. We will not repeat the same exercise as in Section 2.3 to rectify these shortcomings, but one should keep these possible drawbacks in mind.
Here is the MLE and MOM estimation for all samples and parametric assumptions.
#MLE and MME for 8 different distributions and saved in each matrix
M1.MLE.lnorm=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("meanlog","sdlog")))
M2.MLE.exp=matrix(0,length(n),1,dimnames=list(c("n25","n50","n100","n250","n500"),c("rate")))
M3.MLE.gamma=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape","rate")))
M4.MLE.weibull=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape","scale")))
M5.MLE.llogis=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape","scale")))
M6.MLE.invweibull=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape","scale")))
M7.MLE.pareto=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape","scale")))
M8.MLE.trgamma=matrix(0,length(n),3,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape1","shape2","rate")))
M1.MME.lnorm=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("meanlog","sdlog")))
M2.MME.exp=matrix(0,length(n),1,dimnames=list(c("n25","n50","n100","n250","n500"),c("rate")))
M3.MME.gamma=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape","rate")))
M4.MME.weibull=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape","scale")))
M5.MME.llogis=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape","scale")))
M6.MME.invweibull=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape","scale")))
M7.MME.pareto=matrix(0,length(n),2,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape","scale")))
M8.MME.trgamma=matrix(0,length(n),3,dimnames=list(c("n25","n50","n100","n250","n500"),c("shape1","shape2","rate")))
for (i in 1:length(n)) {
#lnorm
M1.MLE.lnorm[i,]=mledist(Sample1.list[[i]], "lnorm")$estimate
M1.MME.lnorm[i,]=mmedist(Sample1.list[[i]], "lnorm")$estimate
#exp
MLE.exp=mledist(Sample1.list[[i]]/1000, "exp")
#if (is.na(MLE.exp$estimate)==is.na(NA)) {MLE.exp$estimate=1/mean(Sample1.list[[i]])}
M2.MLE.exp[i,]=MLE.exp$estimate/1000
M2.MME.exp[i,]=mmedist(Sample1.list[[i]], "exp")$estimate
#gamma
MLE.gamma=mledist(Sample1.list[[i]], "gamma",lower=c(0, 0), start=list(shape =1, rate=1)) #when the meanlog is too large, become NA
M3.MLE.gamma[i,]=MLE.gamma$estimate
#if (is.na(MLE.gamma$estimate[1])==is.na(NA)) {
# MLE.gamma$estimate=gammaMLE(Sample1.list[[i]])
# MLE.gamma=c(MLE.gamma$estimate[1],1/MLE.gamma$estimate[2])
# M3.MLE.gamma[i,]=MLE.gamma}
M3.MME.gamma[i,]=mmedist(Sample1.list[[i]], "gamma")$estimate
#weibull
M4.MLE.weibull[i,]=mledist(Sample1.list[[i]], "weibull")$estimate
memp <- function(x, order) mean(x^order)
MME.weibull=mmedist(Sample1.list[[i]], "weibull", order=c(1, 2), memp=memp)
M4.MME.weibull[i,]=MME.weibull$estimate
#llogis
M5.MLE.llogis[i,]=mledist(Sample1.list[[i]], "llogis")$estimate
memp <- function(x, order) mean(x^order)
MME.llogis=mmedist(Sample1.list[[i]], "llogis", order=c(1, 2), memp=memp)
M5.MME.llogis[i,]=MME.llogis$estimate
#invweibull
M6.MLE.invweibull[i,]=mledist(Sample1.list[[i]], "invweibull")$estimate
memp <- function(x, order) mean(x^order)
MME.invweibull=mmedist(Sample1.list[[i]], "invweibull", order=c(1, 2), memp=memp)
M6.MME.invweibull[i,]=MME.invweibull$estimate
#pareto
M7.MLE.pareto[i,]=mledist(Sample1.list[[i]], "pareto")$estimate
memp <- function(x, order) mean(x^order)
MME.pareto=mmedist(Sample1.list[[i]], "pareto", order=c(1, 2), memp=memp)
M7.MME.pareto[i,]=MME.pareto$estimate
#trgamma
M8.MLE.trgamma[i,]=mledist(Sample1.list[[i]], "trgamma")$estimate
memp <- function(x, order) mean(x^order)
MME.trgamma=mmedist(Sample1.list[[i]], "trgamma", order=c(1, 2, 3), memp=memp)
M8.MME.trgamma[i,]=MME.trgamma$estimate
}
## checkstartfix is carried out
## checkstartfix is carried out
## checkstartfix is carried out
## checkstartfix is carried out
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
## checkstartfix is carried out
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
## checkstartfix is carried out
## checkstartfix is carried out
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
## checkstartfix is carried out
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
## checkstartfix is carried out
## checkstartfix is carried out
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
## checkstartfix is carried out
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
## checkstartfix is carried out
## checkstartfix is carried out
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
## checkstartfix is carried out
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
## checkstartfix is carried out
## checkstartfix is carried out
## checkstartfix is carried out
## <simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
Sample 1 (simulated from LogNormal) of sizes \(n=\{25, 50, 100, 250, 500\}\) are fitted via MLE and MOM (assuming lnorm family) as follows:
#print table
kable(cbind(M1.MLE.lnorm, M1.MME.lnorm), digits = 7, format.args = list(big.mark = ","), caption = "MLE estimates (left) and MOM estimates (right)")
| | meanlog | sdlog | meanlog | sdlog |
---|---|---|---|---|
n25 | 9.758218 | 0.9048491 | 9.643222 | 1.1298040 |
n50 | 9.773453 | 1.0790497 | 9.872765 | 1.0785899 |
n100 | 10.052040 | 0.8846443 | 10.091675 | 0.8532116 |
n250 | 10.035170 | 1.0340745 | 10.039688 | 1.0131995 |
n500 | 9.954018 | 0.9791639 | 9.952332 | 0.9837266 |
Sample 1 (simulated from LogNormal) of sizes \(n=\{25, 50, 100, 250, 500\}\) are fitted via MLE and MOM (assuming exp family) as follows:
#print table
kable(cbind(M2.MLE.exp, M2.MME.exp), digits = 7, format.args = list(big.mark = ","), caption = "MLE estimates (left) and MOM estimates (right)")
| | rate | rate |
---|---|---|
n25 | 3.43e-05 | 3.43e-05 |
n50 | 2.88e-05 | 2.88e-05 |
n100 | 2.88e-05 | 2.88e-05 |
n250 | 2.61e-05 | 2.61e-05 |
n500 | 2.94e-05 | 2.94e-05 |
Sample 1 (simulated from LogNormal) of sizes \(n=\{25, 50, 100, 250, 500\}\) are fitted via MLE and MOM (assuming gamma family) as follows:
#print table
kable(cbind(M3.MLE.gamma, M3.MME.gamma), digits = 7, format.args = list(big.mark = ","), caption = "MLE estimates (left) and MOM estimates (right)")
| | shape | rate | shape | rate |
---|---|---|---|---|
n25 | 1.0918842 | 3.74e-05 | 0.3870090 | 1.33e-05 |
n50 | 0.8632702 | 2.49e-05 | 0.4544097 | 1.31e-05 |
n100 | 1.3814064 | 3.98e-05 | 0.9338181 | 2.69e-05 |
n250 | 1.1027760 | 2.88e-05 | 0.5581965 | 1.46e-05 |
n500 | 1.1761067 | 3.45e-05 | 0.6127707 | 1.80e-05 |
Sample 1 (simulated from LogNormal) of sizes \(n=\{25, 50, 100, 250, 500\}\) are fitted via MLE and MOM (assuming weibull family) as follows:
#print table
kable(cbind(M4.MLE.weibull, M4.MME.weibull), digits = 7, format.args = list(big.mark = ","), caption = "MLE estimates (left) and MOM estimates (right)")
| | shape | scale | shape | scale |
---|---|---|---|---|
n25 | 0.9437471 | 29,543.90 | 0.7860005 | 29,567.97 |
n50 | 0.8482235 | 31,171.88 | 0.7734185 | 32,456.36 |
n100 | 1.1193522 | 36,437.53 | 1.1269249 | 39,020.15 |
n250 | 0.9823712 | 37,948.02 | 0.9101338 | 41,143.93 |
n500 | 1.0165764 | 34,341.51 | 0.9437451 | 36,941.55 |
Sample 1 (simulated from LogNormal) of sizes \(n=\{25, 50, 100, 250, 500\}\) are fitted via MLE and MOM (assuming llogis family) as follows:
#print table
kable(cbind(M5.MLE.llogis, M5.MME.llogis), digits = 7, format.args = list(big.mark = ","), caption = "MLE estimates (left) and MOM estimates (right)")
| | shape | scale | shape | scale |
---|---|---|---|---|
n25 | 0.1879159 | 17,280.25 | 126,073,923 | 2,071,502 |
n50 | 1.6328594 | 15,686.76 | NA | NA |
n100 | 0.1950570 | 23,373.39 | NA | NA |
n250 | 1.7470879 | 23,175.25 | NA | NA |
n500 | 1.7940411 | 20,968.28 | NA | NA |
Sample 1 (simulated from LogNormal) of sizes \(n=\{25, 50, 100, 250, 500\}\) are fitted via MLE and MOM (assuming invweibull family) as follows:
#print table
kable(cbind(M6.MLE.invweibull, M6.MME.invweibull), digits = 7, format.args = list(big.mark = ","), caption = "MLE estimates (left) and MOM estimates (right)")
| | shape | scale | shape | scale |
---|---|---|---|---|
n25 | 0.5934289 | 1,714.3955 | 6,404.055 | 55,248.86 |
n50 | 0.5975857 | 1,616.6989 | 7,926.824 | 62,073.97 |
n100 | 0.6110728 | 2,850.8647 | 6,417.635 | 49,987.98 |
n250 | 0.3123593 | 365.3364 | 8,418.296 | 63,971.32 |
n500 | 0.4255051 | 902.3304 | 6,379.283 | 55,269.19 |
Sample 1 (simulated from LogNormal) of sizes \(n=\{25, 50, 100, 250, 500\}\) are fitted via MLE and MOM (assuming pareto family) as follows:
#print table
kable(cbind(M7.MLE.pareto, M7.MME.pareto), digits = 7, format.args = list(big.mark = ","), caption = "MLE estimates (left) and MOM estimates (right)")
| | shape | scale | shape | scale |
---|---|---|---|---|
n25 | 3.930348 | 82,776.96 | 3.262691 | 66,039.26 |
n50 | 2.395978 | 49,999.88 | 3.665754 | 92,496.85 |
n100 | 30.655653 | 1,030,842.25 | 30.219733 | 1,015,106.90 |
n250 | 6.199372 | 197,403.65 | 4.526900 | 135,049.08 |
n500 | 7.722460 | 227,712.70 | 5.164899 | 141,900.73 |
Sample 1 (simulated from LogNormal) of sizes \(n=\{25, 50, 100, 250, 500\}\) are fitted via MLE and MOM (assuming trgamma family) as follows:
#print table
kable(cbind(M8.MLE.trgamma, M8.MME.trgamma), digits = 7, format.args = list(big.mark = ","), caption = "MLE estimates (left) and MOM estimates (right)")
| | shape1 | shape2 | rate | shape1 | shape2 | rate |
---|---|---|---|---|---|---|
n25 | 9.115813 | 0.3457418 | 0.0288253 | NA | NA | NA |
n50 | 7.066169 | 0.3303238 | 0.0168226 | NA | NA | NA |
n100 | 7.186013 | 0.4164912 | 0.0040745 | NA | NA | NA |
n250 | 6.720832 | 0.3822265 | 0.0052258 | NA | NA | NA |
n500 | 8.476387 | 0.3523861 | 0.0171844 | NA | NA | NA |
There is a lot to absorb from these estimates, and we will not analyse every single set of estimates, but we note the following:
Let us check the MOM estimates for pareto family when \(n=100\), though the MLE counterpart has the same issue.
xxx <- c(mean(Sample1.list[[3]]),mean(Sample1.list[[3]]^2))
xxxP<-c(mpareto(1, shape=M7.MME.pareto[3,1], scale = M7.MME.pareto[3,2]),mpareto(1, shape=M7.MME.pareto[3,1], scale = M7.MME.pareto[3,2]))
cat("First two (raw) moments of Samples 1 with n=100 -- empirical estimates")
## First two (raw) moments of Samples 1 with n=100 -- empirical estimates
xxx
## [1] 3.474046e+04 2.499335e+09
cat("Pareto MOM estimates of Samples 1 with n=100 -- (a=shape, b=scale)")
## Pareto MOM estimates of Samples 1 with n=100 -- (a=shape, b=scale)
xxxP
## [1] 34740.46 34740.46
cat("The diferences between the empirical and theoretical MOM for weibull")
## The diferences between the empirical and theoretical MOM for weibull
xxx-xxxP
## [1] -1.383093e-05 2.499300e+09
Thus, we need to find a bespoke remedy for this issue. One could help R to better estimate the pareto MOM estimates from first principles. That is, we have to solve the following system of equations: \[\mathbb{E}[X]=\frac{\lambda}{\alpha-1}=m_1\quad\text{and}\quad\mathbb{E}\big[X^2\big]=\frac{2\lambda^2}{(\alpha-1)(\alpha-2)}=m_2,\] where \(m_1\) and \(m_2\) are the first and second empirical moment estimates. Recall that the shape parameter should satisfy \(\alpha>2\) so that the first two moments are finite, and \(\lambda>0\) must always hold, since our pareto parametrisation has the following cumulative distribution function: \[ F_{Pareto}(x;\alpha,\lambda)=\Pr(X\le x)=1-\left(\frac{\lambda}{\lambda+x}\right)^\alpha\quad\text{for all}\;x\ge0\quad\text{with}\;\alpha,\lambda>0. \] Our system of equations requires that \(\lambda=m_1(\alpha-1)\), which in turn gives that \[ m_2=2m_1^2\cdot\frac{\alpha-1}{\alpha-2}\implies \hat{\alpha}_{MOM}=2+\left(\frac{m_2}{2m_1^2}-1\right)^{-1}\quad\text{and}\quad \hat{\lambda}_{MOM}=m_1(\hat{\alpha}_{MOM}-1). \] Recall that \(\hat{\alpha}_{MOM}>2\) and \(\hat{\lambda}_{MOM}>0\) should hold, which is true as long as \(\frac{m_2}{2m_1^2}-1>0\); for this particular sample \(\frac{m_2}{2m_1^2}=1.035436>1\), so the MOM estimates are feasible. We can now check whether we indeed obtained what we need, not only what we hoped for:
xxx <- c(mean(Sample1.list[[3]]),mean(Sample1.list[[3]]^2))
alpha1MOM <- 2+(xxx[2]/(2*xxx[1]^2)-1)^(-1)
lambda1MOM <- (alpha1MOM-1)*xxx[1]
cat("Pareto MOM estimates of Samples 1 with n=100 -- (alpha1=shape, lambda1=scale)")
## Pareto MOM estimates of Samples 1 with n=100 -- (alpha1=shape, lambda1=scale)
c(alpha1MOM,lambda1MOM)
## [1] 3.021973e+01 1.015107e+06
xxxP1<-c(mpareto(1, shape=alpha1MOM, scale = lambda1MOM),mpareto(2, shape=alpha1MOM, scale = lambda1MOM))
cat("First two (raw) moments of Samples 1 with n=100 -- empirical estimates")
## First two (raw) moments of Samples 1 with n=100 -- empirical estimates
xxx
## [1] 3.474046e+04 2.499335e+09
cat("First two (raw) moments of Samples 1 with n=100 -- theoretical MOM pareto")
## First two (raw) moments of Samples 1 with n=100 -- theoretical MOM pareto
xxxP1
## [1] 3.474046e+04 2.499335e+09
cat("The diferences between the empirical and theoretical MOM for pareto")
## The diferences between the empirical and theoretical MOM for pareto
xxx-xxxP1
## [1] 1.964509e-10 4.959106e-05
and we notice that the empirical and theoretical moments are extremely close to each other, with an error of up to \(10^{-5}\). These MOM estimates for pareto will not be used further so that the R code does not become more cumbersome than it is now, but keep in mind that some MOM sets of estimates are not very accurate, particularly those obtained when fitting the weibull, invweibull and pareto families.
Q-Q plots and P-P plots are intuitive graphical tools for comparing two given distributions. Q-Q plots tend to be more useful, but for completeness, both types of plots are provided as follows:
Note that the five samples (of sizes \(n=\{25, 50, 100, 250, 500\}\)) are drawn/simulated from LogNormal, i.e. the true distribution is LogNormal, but these samples are fitted via MLE/MOM by assuming the lnorm and exp families. Clearly, the other six parametric assumptions could be checked in the same fashion.
par(mfrow = c(2, 5))
mle.fln <- fitdist(Sample1.list[[1]], "lnorm", method="mle")
qqcomp(list(mle.fln), legendtext = "lnorm (mle, n=25)")
mle.fln <- fitdist(Sample1.list[[2]], "lnorm", method="mle")
qqcomp(list(mle.fln), legendtext = "lnorm (mle, n=50)")
mle.fln <- fitdist(Sample1.list[[3]], "lnorm", method="mle")
qqcomp(list(mle.fln), legendtext = "lnorm (mle, n=100)")
mle.fln <- fitdist(Sample1.list[[4]], "lnorm", method="mle")
qqcomp(list(mle.fln), legendtext = "lnorm (mle, n=250)")
mle.fln <- fitdist(Sample1.list[[5]], "lnorm", method="mle")
qqcomp(list(mle.fln), legendtext = "lnorm (mle, n=500)")
mme.fln <- fitdist(Sample1.list[[1]], "lnorm", method="mme")
qqcomp(list(mme.fln), legendtext = "lnorm (mme, n=25)")
mme.fln <- fitdist(Sample1.list[[2]], "lnorm", method="mme")
qqcomp(list(mme.fln), legendtext = "lnorm (mme, n=50)")
mme.fln <- fitdist(Sample1.list[[3]], "lnorm", method="mme")
qqcomp(list(mme.fln), legendtext = "lnorm (mme, n=100)")
mme.fln <- fitdist(Sample1.list[[4]], "lnorm", method="mme")
qqcomp(list(mme.fln), legendtext = "lnorm (mme, n=250)")
mme.fln <- fitdist(Sample1.list[[5]], "lnorm", method="mme")
qqcomp(list(mme.fln), legendtext = "lnorm (mme, n=500)")
par(mfrow = c(2, 5))
mle.fln <- fitdist(Sample1.list[[1]], "lnorm", method="mle")
ppcomp(list(mle.fln), legendtext = "lnorm (mle, n=25)")
mle.fln <- fitdist(Sample1.list[[2]], "lnorm", method="mle")
ppcomp(list(mle.fln), legendtext = "lnorm (mle, n=50)")
mle.fln <- fitdist(Sample1.list[[3]], "lnorm", method="mle")
ppcomp(list(mle.fln), legendtext = "lnorm (mle, n=100)")
mle.fln <- fitdist(Sample1.list[[4]], "lnorm", method="mle")
ppcomp(list(mle.fln), legendtext = "lnorm (mle, n=250)")
mle.fln <- fitdist(Sample1.list[[5]], "lnorm", method="mle")
ppcomp(list(mle.fln), legendtext = "lnorm (mle, n=500)")
mme.fln <- fitdist(Sample1.list[[1]], "lnorm", method="mme")
ppcomp(list(mme.fln), legendtext = "lnorm (mme, n=25)")
mme.fln <- fitdist(Sample1.list[[2]], "lnorm", method="mme")
ppcomp(list(mme.fln), legendtext = "lnorm (mme, n=50)")
mme.fln <- fitdist(Sample1.list[[3]], "lnorm", method="mme")
ppcomp(list(mme.fln), legendtext = "lnorm (mme, n=100)")
mme.fln <- fitdist(Sample1.list[[4]], "lnorm", method="mme")
ppcomp(list(mme.fln), legendtext = "lnorm (mme, n=250)")
mme.fln <- fitdist(Sample1.list[[5]], "lnorm", method="mme")
ppcomp(list(mme.fln), legendtext = "lnorm (mme, n=500)")
#Here, in order to use fitdist with exp, we re-scale the data sample (only here)
set.seed(1234) #Re-generating Sample1 with the same seed and saving the re-scaled values in a list
n=c(25, 50, 100, 250, 500)
lnorm.meanlog=10
lnorm.sdlog=1
Sample1.list.rescaled=list(n25=rep(0,25),n50=rep(0,50),n100=rep(0,100),n250=rep(0,250),n500=rep(0,500))
for (i in 1:length(n)) {
Sample1=rlnorm(n[i], meanlog=lnorm.meanlog, sdlog=lnorm.sdlog)
Sample1.list.rescaled[[i]]=Sample1/1000
}
par(mfrow = c(2, 5))
mle.fe <- fitdist(Sample1.list.rescaled[[1]], "exp", method="mle")
qqcomp(list(mle.fe), legendtext = "exp (mle, n=25)")
mle.fe <- fitdist(Sample1.list.rescaled[[2]], "exp", method="mle")
qqcomp(list(mle.fe), legendtext = "exp (mle, n=50)")
mle.fe <- fitdist(Sample1.list.rescaled[[3]], "exp", method="mle")
qqcomp(list(mle.fe), legendtext = "exp (mle, n=100)")
mle.fe <- fitdist(Sample1.list.rescaled[[4]], "exp", method="mle")
qqcomp(list(mle.fe), legendtext = "exp (mle, n=250)")
mle.fe <- fitdist(Sample1.list.rescaled[[5]], "exp", method="mle")
qqcomp(list(mle.fe), legendtext = "exp (mle, n=500)")
mme.fe <- fitdist(Sample1.list.rescaled[[1]], "exp", method="mme")
qqcomp(list(mme.fe), legendtext = "exp (mme, n=25)")
mme.fe <- fitdist(Sample1.list.rescaled[[2]], "exp", method="mme")
qqcomp(list(mme.fe), legendtext = "exp (mme, n=50)")
mme.fe <- fitdist(Sample1.list.rescaled[[3]], "exp", method="mme")
qqcomp(list(mme.fe), legendtext = "exp (mme, n=100)")
mme.fe <- fitdist(Sample1.list.rescaled[[4]], "exp", method="mme")
qqcomp(list(mme.fe), legendtext = "exp (mme, n=250)")
mme.fe <- fitdist(Sample1.list.rescaled[[5]], "exp", method="mme")
qqcomp(list(mme.fe), legendtext = "exp (mme, n=500)")
par(mfrow = c(2, 5))
mle.fe <- fitdist(Sample1.list.rescaled[[1]], "exp", method="mle")
ppcomp(list(mle.fe), legendtext = "exp (mle, n=25)")
mle.fe <- fitdist(Sample1.list.rescaled[[2]], "exp", method="mle")
ppcomp(list(mle.fe), legendtext = "exp (mle, n=50)")
mle.fe <- fitdist(Sample1.list.rescaled[[3]], "exp", method="mle")
ppcomp(list(mle.fe), legendtext = "exp (mle, n=100)")
mle.fe <- fitdist(Sample1.list.rescaled[[4]], "exp", method="mle")
ppcomp(list(mle.fe), legendtext = "exp (mle, n=250)")
mle.fe <- fitdist(Sample1.list.rescaled[[5]], "exp", method="mle")
ppcomp(list(mle.fe), legendtext = "exp (mle, n=500)")
mme.fe <- fitdist(Sample1.list.rescaled[[1]], "exp", method="mme")
ppcomp(list(mme.fe), legendtext = "exp (mme, n=25)")
mme.fe <- fitdist(Sample1.list.rescaled[[2]], "exp", method="mme")
ppcomp(list(mme.fe), legendtext = "exp (mme, n=50)")
mme.fe <- fitdist(Sample1.list.rescaled[[3]], "exp", method="mme")
ppcomp(list(mme.fe), legendtext = "exp (mme, n=100)")
mme.fe <- fitdist(Sample1.list.rescaled[[4]], "exp", method="mme")
ppcomp(list(mme.fe), legendtext = "exp (mme, n=250)")
mme.fe <- fitdist(Sample1.list.rescaled[[5]], "exp", method="mme")
ppcomp(list(mme.fe), legendtext = "exp (mme, n=500)")
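As an illustration of checking one of the remaining parametric assumptions, a minimal sketch below fits gamma by MLE to the \(n=100\) sample and draws the corresponding Q-Q plot; the object name mle.fg.100 is introduced only for this sketch, and the starting values mirror those used later in this section.
## sketch: Q-Q check for the gamma assumption on the n=100 LogNormal sample
mle.fg.100 <- fitdist(Sample1.list[[3]], "gamma", method="mle", lower=c(0, 0), start=list(shape=1, rate=1))
qqcomp(list(mle.fg.100), legendtext = "gamma (mle, n=100)")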
Quantiles play an important role in fitting the best possible distribution. In addition, the real data analysis from Section 3 requires estimating quantiles at levels \(\{50\%, 75\%, 95\%\}\). Therefore, we next estimate the same quantiles so that we understand the estimation error in a controlled setting where the data are simulated.
The following quantile-related computations are tabulated: i) `True’ quantile values and ii) MLE and MOM quantile estimates for various sample sizes.
Quantile.true=qlnorm(c(0.5,0.75,0.95), meanlog=lnorm.meanlog, sdlog=lnorm.sdlog, lower.tail = TRUE, log.p = FALSE)
Quantile.true=matrix(Quantile.true,1,3,dimnames=list(c("Quantile value"),c("(50%)","(75%)","(95%)")))
kable(Quantile.true, digits = 2, format.args = list(big.mark = ","), caption="Quantile values of the true LogNormal distribution at levels 50%, 75% and 95%")
(50%) | (75%) | (95%) | |
---|---|---|---|
Quantile value | 22,026.47 | 43,238.64 | 114,102.6 |
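As a quick sanity check on the table above, the LogNormal quantile has the closed form \(q(p)=\exp(\mu+\sigma\,\Phi^{-1}(p))\), which can be evaluated directly (a sketch, using the lnorm.meanlog and lnorm.sdlog values set earlier):
## closed-form check of the true LogNormal quantiles
exp(lnorm.meanlog + lnorm.sdlog * qnorm(c(0.5, 0.75, 0.95)))
## reproduces (approximately) 22,026.47, 43,238.64 and 114,102.6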
suppressWarnings(library(stats))
Table1.1=matrix(0,8,6,dimnames=list(c("lnorm","exp","gamma","weibull","llogis","invwei","Pareto","trgamma"),c("MOM(50%)","MLE(50%)","MOM(75%)","MLE(75%)","MOM(95%)","MLE(95%)")))
#lnorm
mle.fln.n25 <- fitdist(Sample1.list[[1]], "lnorm", method="mle")
Quantile.lnorm.mle.n25=quantile(mle.fln.n25, probs = c(0.5, 0.75, 0.95))
mme.fln.n25 <- fitdist(Sample1.list[[1]], "lnorm", method="mme")
Quantile.lnorm.mme.n25=quantile(mme.fln.n25, probs = c(0.5, 0.75, 0.95))
#exp
mle.fe.n25 <- fitdist(Sample1.list.rescaled[[1]], "exp", method="mle") #Use re-scaled value
Quantile.exp.mle.n25=quantile(mle.fe.n25, probs = c(0.5, 0.75, 0.95))
mme.fe.n25 <- fitdist(Sample1.list[[1]], "exp", method="mme")
Quantile.exp.mme.n25=quantile(mme.fe.n25, probs = c(0.5, 0.75, 0.95))
#gamma
mle.fg.n25 <- fitdist(Sample1.list[[1]], "gamma", method="mle",lower=c(0, 0), start=list(shape =1, rate=1))
Quantile.gamma.mle.n25=quantile(mle.fg.n25, probs = c(0.5, 0.75, 0.95))
mme.fg.n25 <- fitdist(Sample1.list[[1]], "gamma", method="mme")
Quantile.gamma.mme.n25=quantile(mme.fg.n25, probs = c(0.5, 0.75, 0.95))
#weibull
mle.fw.n25 <- fitdist(Sample1.list[[1]], "weibull", method="mle")
Quantile.weibull.mle.n25=quantile(mle.fw.n25, probs = c(0.5, 0.75, 0.95))
memp <- function(x, order) mean(x^order)
mme.fw.n25 <- fitdist(Sample1.list[[1]], "weibull", method="mme", order=c(1, 2), memp=memp)
Quantile.weibull.mme.n25=quantile(mme.fw.n25, probs = c(0.5, 0.75, 0.95))
#llogis
mle.fl.n25 <- fitdist(Sample1.list[[1]], "llogis", method="mle")
Quantile.llogis.mle.n25=quantile(mle.fl.n25, probs = c(0.5, 0.75, 0.95))
#mme.fl.n25 <- fitdist(Sample1.list[[1]], "llogis", method="mme", order=c(1, 2), memp=memp)
#Quantile.llogis.mme.n25=quantile(mme.fl.n25, probs = c(0.5, 0.75, 0.95))
#invweibull
mle.fi.n25 <- fitdist(Sample1.list[[1]], "invweibull", method="mle")
Quantile.invweibull.mle.n25=quantile(mle.fi.n25, probs = c(0.5, 0.75, 0.95))
mme.fi.n25 <- fitdist(Sample1.list[[1]], "invweibull", method="mme", order=c(1, 2), memp=memp)
Quantile.invweibull.mme.n25=quantile(mme.fi.n25, probs = c(0.5, 0.75, 0.95))
#pareto
mle.fp.n25 <- fitdist(Sample1.list[[1]], "pareto", method="mle")
Quantile.pareto.mle.n25=quantile(mle.fp.n25, probs = c(0.5, 0.75, 0.95))
mme.fp.n25 <- fitdist(Sample1.list[[1]], "pareto", method="mme", order=c(1, 2), memp=memp)
Quantile.pareto.mme.n25=quantile(mme.fp.n25, probs = c(0.5, 0.75, 0.95))
#trgamma
mle.ft.n25 <- fitdist(Sample1.list.rescaled[[1]], "trgamma", method="mle",optim.method="SANN") #Use re-scaled
Quantile.trgamma.mle.n25=quantile(mle.ft.n25, probs = c(0.5, 0.75, 0.95))
memp <- function(x, order) mean(x^order)
mme.ft.n25 <- fitdist(Sample1.list[[1]], "trgamma", method="mme", order=c(1, 2, 3), memp=memp)
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
Quantile.trgamma.mme.n25=quantile(mme.ft.n25, probs = c(0.5, 0.75, 0.95))
#Table
Table1.1[1,c(2,4,6)]=abs(as.numeric(Quantile.lnorm.mle.n25$quantiles)-Quantile.true)
Table1.1[1,c(1,3,5)]=abs(as.numeric(Quantile.lnorm.mme.n25$quantiles)-Quantile.true)
Table1.1[2,c(2,4,6)]=abs(as.numeric(Quantile.exp.mle.n25$quantiles)*1000-Quantile.true) #Use re-scaled
Table1.1[2,c(1,3,5)]=abs(as.numeric(Quantile.exp.mme.n25$quantiles)-Quantile.true)
Table1.1[3,c(2,4,6)]=abs(as.numeric(Quantile.gamma.mle.n25$quantiles)-Quantile.true)
Table1.1[3,c(1,3,5)]=abs(as.numeric(Quantile.gamma.mme.n25$quantiles)-Quantile.true)
Table1.1[4,c(2,4,6)]=abs(as.numeric(Quantile.weibull.mle.n25$quantiles)-Quantile.true)
Table1.1[4,c(1,3,5)]=abs(as.numeric(Quantile.weibull.mme.n25$quantiles)-Quantile.true)
Table1.1[5,c(2,4,6)]=abs(as.numeric(Quantile.llogis.mle.n25$quantiles)-Quantile.true)
#Table1.1[5,c(1,3,5)]=abs(as.numeric(Quantile.llogis.mme.n25$quantiles)-Quantile.true)
Table1.1[5,c(1,3,5)]=c(NA,NA,NA)
Table1.1[6,c(2,4,6)]=abs(as.numeric(Quantile.invweibull.mle.n25$quantiles)-Quantile.true)
#Table1.1[6,c(1,3,5)]=abs(as.numeric(Quantile.invweibull.mme.n25$quantiles)-Quantile.true)
Table1.1[6,c(1,3,5)]=c(NA,NA,NA)
Table1.1[7,c(2,4,6)]=abs(as.numeric(Quantile.pareto.mle.n25$quantiles)-Quantile.true)
Table1.1[7,c(1,3,5)]=abs(as.numeric(Quantile.pareto.mme.n25$quantiles)-Quantile.true)
Table1.1[8,c(2,4,6)]=abs(as.numeric(Quantile.trgamma.mle.n25$quantiles)*1000-Quantile.true) #Use re-scaled
Table1.1[8,c(1,3,5)]=abs(as.numeric(Quantile.trgamma.mme.n25$quantiles)-Quantile.true)
#print table
kable(Table1.1, digits = 3, format.args = list(big.mark = ","), caption="MLE and MOM quantile estimation errors at levels 50%, 75% and 95% for the LogNormal sample of size n=25")
MOM(50%) | MLE(50%) | MOM(75%) | MLE(75%) | MOM(95%) | MLE(95%) | |
---|---|---|---|---|---|---|
lnorm | 6,609.535 | 4,730.688 | 10,205.622 | 11,397.029 | 15,230.399 | 3.748657e+04 |
exp | 1,796.157 | 1,796.158 | 2,778.020 | 2,778.021 | 26,668.697 | 2.666870e+04 |
gamma | 11,845.148 | 1,114.471 | 6,734.219 | 2,821.699 | 8,550.559 | 2.933522e+04 |
weibull | 3,477.938 | 1,990.717 | 1,563.530 | 1,476.883 | 5,312.393 | 1.961538e+04 |
llogis | NA | 4,746.214 | NA | 5,934,882.525 | NA | 1.102743e+11 |
invwei | NA | 18,847.107 | NA | 29,245.701 | NA | 1.416476e+05 |
Pareto | 6,394.976 | 6,061.714 | 8,275.686 | 8,230.097 | 14,731.784 | 1.948882e+04 |
trgamma | NA | 3,392.109 | NA | 8,640.248 | NA | 3.673115e+04 |
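The same computation is repeated below for the remaining sample sizes. As a side note, a small hypothetical helper along the following lines could reduce the repetition; quantile.errors and its scale.back argument (which undoes the re-scaling used for exp and trgamma) are not part of the code above.
## hypothetical helper: absolute quantile errors of a fitdist object vs the true LogNormal quantiles
quantile.errors <- function(fit, scale.back = 1) {
  q.hat <- as.numeric(quantile(fit, probs = c(0.5, 0.75, 0.95))$quantiles) * scale.back
  abs(q.hat - Quantile.true)
}
## e.g. quantile.errors(mle.fln.n25) or quantile.errors(mle.fe.n25, scale.back = 1000)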
Table1.2=matrix(0,8,6,dimnames=list(c("lnorm","exp","gamma","Weibull","llogis","invWeibull","Pareto","trgamma"),c("MOM(50%)","MLE(50%)","MOM(75%)","MLE(75%)","MOM(95%)","MLE(95%)")))
#lnorm
mle.fln.n50 <- fitdist(Sample1.list[[2]], "lnorm", method="mle")
Quantile.lnorm.mle.n50=quantile(mle.fln.n50, probs = c(0.5, 0.75, 0.95))
mme.fln.n50 <- fitdist(Sample1.list[[2]], "lnorm", method="mme")
Quantile.lnorm.mme.n50=quantile(mme.fln.n50, probs = c(0.5, 0.75, 0.95))
#exp
mle.fe.n50 <- fitdist(Sample1.list.rescaled[[2]], "exp", method="mle") #Use re-scaled
Quantile.exp.mle.n50=quantile(mle.fe.n50, probs = c(0.5, 0.75, 0.95))
mme.fe.n50 <- fitdist(Sample1.list[[2]], "exp", method="mme")
Quantile.exp.mme.n50=quantile(mme.fe.n50, probs = c(0.5, 0.75, 0.95))
#gamma
mle.fg.n50 <- fitdist(Sample1.list[[2]], "gamma", method="mle",lower=c(0, 0), start=list(shape =1, rate=1))
Quantile.gamma.mle.n50=quantile(mle.fg.n50, probs = c(0.5, 0.75, 0.95))
mme.fg.n50 <- fitdist(Sample1.list[[2]], "gamma", method="mme")
Quantile.gamma.mme.n50=quantile(mme.fg.n50, probs = c(0.5, 0.75, 0.95))
#weibull
mle.fw.n50 <- fitdist(Sample1.list[[2]], "weibull", method="mle")
Quantile.weibull.mle.n50=quantile(mle.fw.n50, probs = c(0.5, 0.75, 0.95))
memp <- function(x, order) mean(x^order)
mme.fw.n50 <- fitdist(Sample1.list[[2]], "weibull", method="mme", order=c(1, 2), memp=memp)
Quantile.weibull.mme.n50=quantile(mme.fw.n50, probs = c(0.5, 0.75, 0.95))
#llogis
mle.fl.n50 <- fitdist(Sample1.list[[2]], "llogis", method="mle")
Quantile.llogis.mle.n50=quantile(mle.fl.n50, probs = c(0.5, 0.75, 0.95))
mme.fl.n50 <- fitdist(Sample1.list[[2]], "llogis", method="mme", optim.method="Nelder-Mead", order=c(1, 2), memp=memp)
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Quantile.llogis.mme.n50=quantile(mme.fl.n50, probs = c(0.5, 0.75, 0.95))
#invweibull
mle.fi.n50 <- fitdist(Sample1.list[[2]], "invweibull", method="mle")
Quantile.invweibull.mle.n50=quantile(mle.fi.n50, probs = c(0.5, 0.75, 0.95))
mme.fi.n50 <- fitdist(Sample1.list[[2]], "invweibull", method="mme", order=c(1, 2), memp=memp)
Quantile.invweibull.mme.n50=quantile(mme.fi.n50, probs = c(0.5, 0.75, 0.95))
#pareto
mle.fp.n50 <- fitdist(Sample1.list[[2]], "pareto", method="mle")
Quantile.pareto.mle.n50=quantile(mle.fp.n50, probs = c(0.5, 0.75, 0.95))
mme.fp.n50 <- fitdist(Sample1.list[[2]], "pareto", method="mme", order=c(1, 2), memp=memp)
Quantile.pareto.mme.n50=quantile(mme.fp.n50, probs = c(0.5, 0.75, 0.95))
#trgamma
mle.ft.n50 <- fitdist(Sample1.list.rescaled[[2]], "trgamma", method="mle",optim.method="SANN") #Use re-scaled
Quantile.trgamma.mle.n50=quantile(mle.ft.n50, probs = c(0.5, 0.75, 0.95))
memp <- function(x, order) mean(x^order)
mme.ft.n50 <- fitdist(Sample1.list[[2]], "trgamma", method="mme",optim.method="SANN", order=c(1, 2, 3), memp=memp)
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
Quantile.trgamma.mme.n50=quantile(mme.ft.n50, probs = c(0.5, 0.75, 0.95))
#Table
Table1.2[1,c(2,4,6)]=abs(as.numeric(Quantile.lnorm.mle.n50$quantiles)-Quantile.true)
Table1.2[1,c(1,3,5)]=abs(as.numeric(Quantile.lnorm.mme.n50$quantiles)-Quantile.true)
Table1.2[2,c(2,4,6)]=abs(as.numeric(Quantile.exp.mle.n50$quantiles)*1000-Quantile.true) #Use re-scaled
Table1.2[2,c(1,3,5)]=abs(as.numeric(Quantile.exp.mme.n50$quantiles)-Quantile.true)
Table1.2[3,c(2,4,6)]=abs(as.numeric(Quantile.gamma.mle.n50$quantiles)-Quantile.true)
Table1.2[3,c(1,3,5)]=abs(as.numeric(Quantile.gamma.mme.n50$quantiles)-Quantile.true)
Table1.2[4,c(2,4,6)]=abs(as.numeric(Quantile.weibull.mle.n50$quantiles)-Quantile.true)
Table1.2[4,c(1,3,5)]=abs(as.numeric(Quantile.weibull.mme.n50$quantiles)-Quantile.true)
Table1.2[5,c(2,4,6)]=abs(as.numeric(Quantile.llogis.mle.n50$quantiles)-Quantile.true)
#Table1.2[5,c(1,3,5)]=abs(as.numeric(Quantile.llogis.mme.n25$quantiles)-Quantile.true)
Table1.2[5,c(1,3,5)]=c(NA,NA,NA)
Table1.2[6,c(2,4,6)]=abs(as.numeric(Quantile.invweibull.mle.n50$quantiles)-Quantile.true) #use the n=50 fit
#Table1.2[6,c(1,3,5)]=abs(as.numeric(Quantile.invweibull.mme.n25$quantiles)-Quantile.true)
Table1.2[6,c(1,3,5)]=c(NA,NA,NA)
Table1.2[7,c(2,4,6)]=abs(as.numeric(Quantile.pareto.mle.n50$quantiles)-Quantile.true)
Table1.2[7,c(1,3,5)]=abs(as.numeric(Quantile.pareto.mme.n50$quantiles)-Quantile.true)
Table1.2[8,c(2,4,6)]=abs(as.numeric(Quantile.trgamma.mle.n50$quantiles)*1000-Quantile.true) #Use re-scaled
Table1.2[8,c(1,3,5)]=abs(as.numeric(Quantile.trgamma.mme.n50$quantiles)-Quantile.true)
#print table
kable(Table1.2, digits = 3, format.args = list(big.mark = ","), caption="MLE and MOM quantile estimation errors at levels 50%, 75% and 95% for the LogNormal sample of size n=50")
MOM(50%) | MLE(50%) | MOM(75%) | MLE(75%) | MOM(95%) | MLE(95%) | |
---|---|---|---|---|---|---|
lnorm | 2,631.579 | 4,465.168 | 3,093.259 | 6,877.322 | 232.280 | 10,498.602 |
exp | 2,024.488 | 2,024.488 | 4,863.270 | 4,863.270 | 10,156.143 | 10,156.143 |
gamma | 7,596.424 | 541.577 | 1,872.793 | 4,812.937 | 23,784.094 | 4,591.929 |
Weibull | 1,819.864 | 1,791.306 | 6,273.690 | 2,575.692 | 19,988.347 | 463.243 |
llogis | NA | 6,339.707 | NA | 12,496.572 | NA | 18,895.216 |
invWeibull | NA | 18,847.107 | NA | 29,245.701 | NA | 141,647.586 |
Pareto | 2,773.589 | 5,252.152 | 725.469 | 4,062.445 | 2,831.833 | 10,469.517 |
trgamma | NA | 3,351.292 | NA | 3,175.781 | NA | 7,042.718 |
Table1.3=matrix(0,8,6,dimnames=list(c("lnorm","exp","gamma","Weibull","llogis","invWeibull","Pareto","trgamma"),c("MOM(50%)","MLE(50%)","MOM(75%)","MLE(75%)","MOM(95%)","MLE(95%)")))
#lnorm
mle.fln.n100 <- fitdist(Sample1.list[[3]], "lnorm", method="mle")
Quantile.lnorm.mle.n100=quantile(mle.fln.n100, probs = c(0.5, 0.75, 0.95))
mme.fln.n100 <- fitdist(Sample1.list[[3]], "lnorm", method="mme")
Quantile.lnorm.mme.n100=quantile(mme.fln.n100, probs = c(0.5, 0.75, 0.95))
#exp
mle.fe.n100 <- fitdist(Sample1.list.rescaled[[3]], "exp", method="mle")
Quantile.exp.mle.n100=quantile(mle.fe.n100, probs = c(0.5, 0.75, 0.95))
mme.fe.n100 <- fitdist(Sample1.list[[3]], "exp", method="mme")
Quantile.exp.mme.n100=quantile(mme.fe.n100, probs = c(0.5, 0.75, 0.95))
#gamma
mle.fg.n100 <- fitdist(Sample1.list[[3]], "gamma", method="mle",lower=c(0, 0), start=list(shape =1, rate=1))
Quantile.gamma.mle.n100=quantile(mle.fg.n100, probs = c(0.5, 0.75, 0.95))
mme.fg.n100 <- fitdist(Sample1.list[[3]], "gamma", method="mme")
Quantile.gamma.mme.n100=quantile(mme.fg.n100, probs = c(0.5, 0.75, 0.95))
#weibull
mle.fw.n100 <- fitdist(Sample1.list[[3]], "weibull", method="mle")
Quantile.weibull.mle.n100=quantile(mle.fw.n100, probs = c(0.5, 0.75, 0.95))
memp <- function(x, order) mean(x^order)
mme.fw.n100 <- fitdist(Sample1.list[[3]], "weibull", method="mme", order=c(1, 2), memp=memp)
Quantile.weibull.mme.n100=quantile(mme.fw.n100, probs = c(0.5, 0.75, 0.95))
#llogis
mle.fl.n100 <- fitdist(Sample1.list[[3]], "llogis", method="mle")
Quantile.llogis.mle.n100=quantile(mle.fl.n100, probs = c(0.5, 0.75, 0.95))
mme.fl.n100 <- fitdist(Sample1.list[[3]], "llogis", method="mme", optim.method="Nelder-Mead", order=c(1, 2), memp=memp)
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Quantile.llogis.mme.n100=quantile(mme.fl.n100, probs = c(0.5, 0.75, 0.95))
#invweibull
mle.fi.n100 <- fitdist(Sample1.list[[3]], "invweibull", method="mle")
Quantile.invweibull.mle.n100=quantile(mle.fi.n100, probs = c(0.5, 0.75, 0.95))
mme.fi.n100 <- fitdist(Sample1.list[[3]], "invweibull", method="mme", order=c(1, 2), memp=memp)
Quantile.invweibull.mme.n100=quantile(mme.fi.n100, probs = c(0.5, 0.75, 0.95))
#pareto
mle.fp.n100 <- fitdist(Sample1.list[[3]], "pareto", method="mle")
Quantile.pareto.mle.n100=quantile(mle.fp.n100, probs = c(0.5, 0.75, 0.95))
mme.fp.n100 <- fitdist(Sample1.list[[3]], "pareto", method="mme", order=c(1, 2), memp=memp)
Quantile.pareto.mme.n100=quantile(mme.fp.n100, probs = c(0.5, 0.75, 0.95))
#trgamma
mle.ft.n100 <- fitdist(Sample1.list.rescaled[[3]], "trgamma", method="mle",optim.method="SANN")
Quantile.trgamma.mle.n100=quantile(mle.ft.n100, probs = c(0.5, 0.75, 0.95))
memp <- function(x, order) mean(x^order)
mme.ft.n100 <- fitdist(Sample1.list[[3]], "trgamma", method="mme", order=c(1, 2, 3), memp=memp)
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
Quantile.trgamma.mme.n100=quantile(mme.ft.n100, probs = c(0.5, 0.75, 0.95))
#Table
Table1.3[1,c(2,4,6)]=abs(as.numeric(Quantile.lnorm.mle.n100$quantiles)-Quantile.true)
Table1.3[1,c(1,3,5)]=abs(as.numeric(Quantile.lnorm.mme.n100$quantiles)-Quantile.true)
Table1.3[2,c(2,4,6)]=abs(as.numeric(Quantile.exp.mle.n100$quantiles)*1000-Quantile.true)
Table1.3[2,c(1,3,5)]=abs(as.numeric(Quantile.exp.mme.n100$quantiles)-Quantile.true)
Table1.3[3,c(2,4,6)]=abs(as.numeric(Quantile.gamma.mle.n100$quantiles)-Quantile.true)
Table1.3[3,c(1,3,5)]=abs(as.numeric(Quantile.gamma.mme.n100$quantiles)-Quantile.true)
Table1.3[4,c(2,4,6)]=abs(as.numeric(Quantile.weibull.mle.n100$quantiles)-Quantile.true)
Table1.3[4,c(1,3,5)]=abs(as.numeric(Quantile.weibull.mme.n100$quantiles)-Quantile.true)
Table1.3[5,c(2,4,6)]=abs(as.numeric(Quantile.llogis.mle.n100$quantiles)-Quantile.true)
#Table1.3[5,c(1,3,5)]=abs(as.numeric(Quantile.llogis.mme.n25$quantiles)-Quantile.true)
Table1.3[5,c(1,3,5)]=c(NA,NA,NA)
Table1.3[6,c(2,4,6)]=abs(as.numeric(Quantile.invweibull.mle.n100$quantiles)-Quantile.true) #use the n=100 fit
#Table1.3[6,c(1,3,5)]=abs(as.numeric(Quantile.invweibull.mme.n25$quantiles)-Quantile.true)
Table1.3[6,c(1,3,5)]=c(NA,NA,NA)
Table1.3[7,c(2,4,6)]=abs(as.numeric(Quantile.pareto.mle.n100$quantiles)-Quantile.true)
Table1.3[7,c(1,3,5)]=abs(as.numeric(Quantile.pareto.mme.n100$quantiles)-Quantile.true)
Table1.3[8,c(2,4,6)]=abs(as.numeric(Quantile.trgamma.mle.n100$quantiles)*1000-Quantile.true)
Table1.3[8,c(1,3,5)]=abs(as.numeric(Quantile.trgamma.mme.n100$quantiles)-Quantile.true)
#print table
kable(Table1.3, digits = 3, format.args = list(big.mark = ","), caption="MLE and MOM quantile estimation errors at levels 50%, 75% and 95% for the LogNormal sample of size n=100")
MOM(50%) | MLE(50%) | MOM(75%) | MLE(75%) | MOM(95%) | MLE(95%) | |
---|---|---|---|---|---|---|
lnorm | 2,114.735 | 1,176.609 | 315.869 | 1,099.864 | 15,871.023 | 1.467848e+04 |
exp | 2,053.784 | 2,053.784 | 4,921.863 | 4,921.863 | 10,029.525 | 1.002952e+04 |
gamma | 1,383.222 | 4,788.592 | 4,923.187 | 4,528.691 | 7,471.584 | 2.104357e+04 |
Weibull | 6,160.097 | 4,236.670 | 8,900.913 | 5,545.530 | 10,797.083 | 1.699704e+04 |
llogis | NA | 1,346.923 | NA | 6,484,788.997 | NA | 8.404552e+10 |
invWeibull | NA | 18,847.107 | NA | 29,245.701 | NA | 1.416476e+05 |
Pareto | 1,526.024 | 1,547.151 | 4,412.807 | 4,447.684 | 8,316.650 | 8.280262e+03 |
trgamma | NA | 2,812.880 | NA | 1,555.266 | NA | 1.756407e+04 |
Table1.4=matrix(0,8,6,dimnames=list(c("lnorm","exp","gamma","weibull","llogis","invwei","Pareto","trgamma"),c("MOM(50%)","MLE(50%)","MOM(75%)","MLE(75%)","MOM(95%)","MLE(95%)")))
#lnorm
mle.fln.n250 <- fitdist(Sample1.list[[4]], "lnorm", method="mle")
Quantile.lnorm.mle.n250=quantile(mle.fln.n250, probs = c(0.5, 0.75, 0.95))
mme.fln.n250 <- fitdist(Sample1.list[[4]], "lnorm", method="mme")
Quantile.lnorm.mme.n250=quantile(mme.fln.n250, probs = c(0.5, 0.75, 0.95))
#exp
mle.fe.n250 <- fitdist(Sample1.list.rescaled[[4]], "exp", method="mle")
Quantile.exp.mle.n250=quantile(mle.fe.n250, probs = c(0.5, 0.75, 0.95))
mme.fe.n250 <- fitdist(Sample1.list[[4]], "exp", method="mme")
Quantile.exp.mme.n250=quantile(mme.fe.n250, probs = c(0.5, 0.75, 0.95))
#gamma
mle.fg.n250 <- fitdist(Sample1.list[[4]], "gamma", method="mle",lower=c(0, 0), start=list(shape =1, rate=1))
Quantile.gamma.mle.n250=quantile(mle.fg.n250, probs = c(0.5, 0.75, 0.95))
mme.fg.n250 <- fitdist(Sample1.list[[4]], "gamma", method="mme")
Quantile.gamma.mme.n250=quantile(mme.fg.n250, probs = c(0.5, 0.75, 0.95))
#weibull
mle.fw.n250 <- fitdist(Sample1.list[[4]], "weibull", method="mle")
Quantile.weibull.mle.n250=quantile(mle.fw.n250, probs = c(0.5, 0.75, 0.95))
memp <- function(x, order) mean(x^order)
mme.fw.n250 <- fitdist(Sample1.list[[4]], "weibull", method="mme", order=c(1, 2), memp=memp)
Quantile.weibull.mme.n250=quantile(mme.fw.n250, probs = c(0.5, 0.75, 0.95))
#llogis
mle.fl.n250 <- fitdist(Sample1.list[[4]], "llogis", method="mle")
Quantile.llogis.mle.n250=quantile(mle.fl.n250, probs = c(0.5, 0.75, 0.95))
mme.fl.n250 <- fitdist(Sample1.list[[4]], "llogis", method="mme", optim.method="Nelder-Mead", order=c(1, 2), memp=memp)
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Quantile.llogis.mme.n250=quantile(mme.fl.n250, probs = c(0.5, 0.75, 0.95))
#invweibull
mle.fi.n250 <- fitdist(Sample1.list[[4]], "invweibull", method="mle")
Quantile.invweibull.mle.n250=quantile(mle.fi.n250, probs = c(0.5, 0.75, 0.95))
mme.fi.n250 <- fitdist(Sample1.list[[4]], "invweibull", method="mme", order=c(1, 2), memp=memp)
Quantile.invweibull.mme.n250=quantile(mme.fi.n250, probs = c(0.5, 0.75, 0.95))
#pareto
mle.fp.n250 <- fitdist(Sample1.list[[4]], "pareto", method="mle")
Quantile.pareto.mle.n250=quantile(mle.fp.n250, probs = c(0.5, 0.75, 0.95))
mme.fp.n250 <- fitdist(Sample1.list[[4]], "pareto", method="mme", order=c(1, 2), memp=memp)
Quantile.pareto.mme.n250=quantile(mme.fp.n250, probs = c(0.5, 0.75, 0.95))
#trgamma
mle.ft.n250 <- fitdist(Sample1.list.rescaled[[4]], "trgamma", method="mle",optim.method="SANN")
Quantile.trgamma.mle.n250=quantile(mle.ft.n250, probs = c(0.5, 0.75, 0.95))
memp <- function(x, order) mean(x^order)
mme.ft.n250 <- fitdist(Sample1.list[[4]], "trgamma", method="mme", order=c(1, 2, 3), memp=memp)
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
Quantile.trgamma.mme.n250=quantile(mme.ft.n250, probs = c(0.5, 0.75, 0.95))
#Table
Table1.4[1,c(2,4,6)]=abs(as.numeric(Quantile.lnorm.mle.n250$quantiles)-Quantile.true)
Table1.4[1,c(1,3,5)]=abs(as.numeric(Quantile.lnorm.mme.n250$quantiles)-Quantile.true)
Table1.4[2,c(2,4,6)]=abs(as.numeric(Quantile.exp.mle.n250$quantiles)*1000-Quantile.true)
Table1.4[2,c(1,3,5)]=abs(as.numeric(Quantile.exp.mme.n250$quantiles)-Quantile.true)
Table1.4[3,c(2,4,6)]=abs(as.numeric(Quantile.gamma.mle.n250$quantiles)-Quantile.true)
Table1.4[3,c(1,3,5)]=abs(as.numeric(Quantile.gamma.mme.n250$quantiles)-Quantile.true)
Table1.4[4,c(2,4,6)]=abs(as.numeric(Quantile.weibull.mle.n250$quantiles)-Quantile.true)
Table1.4[4,c(1,3,5)]=abs(as.numeric(Quantile.weibull.mme.n250$quantiles)-Quantile.true)
Table1.4[5,c(2,4,6)]=abs(as.numeric(Quantile.llogis.mle.n250$quantiles)-Quantile.true)
#Table1.4[5,c(1,3,5)]=abs(as.numeric(Quantile.llogis.mme.n25$quantiles)-Quantile.true)
Table1.4[5,c(1,3,5)]=c(NA,NA,NA)
Table1.4[6,c(2,4,6)]=abs(as.numeric(Quantile.invweibull.mle.n250$quantiles)-Quantile.true) #use the n=250 fit
#Table1.4[6,c(1,3,5)]=abs(as.numeric(Quantile.invweibull.mme.n25$quantiles)-Quantile.true)
Table1.4[6,c(1,3,5)]=c(NA,NA,NA)
Table1.4[7,c(2,4,6)]=abs(as.numeric(Quantile.pareto.mle.n250$quantiles)-Quantile.true)
Table1.4[7,c(1,3,5)]=abs(as.numeric(Quantile.pareto.mme.n250$quantiles)-Quantile.true)
Table1.4[8,c(2,4,6)]=abs(as.numeric(Quantile.trgamma.mle.n250$quantiles)*1000-Quantile.true)
Table1.4[8,c(1,3,5)]=abs(as.numeric(Quantile.trgamma.mme.n250$quantiles)-Quantile.true)
#print table
kable(Table1.4, digits = 3, format.args = list(big.mark = ","), caption="MLE and MOM quantile estimation errors at levels 50%, 75% and 95% for the LogNormal sample of size n=250")
MOM(50%) | MLE(50%) | MOM(75%) | MLE(75%) | MOM(95%) | MLE(95%) | |
---|---|---|---|---|---|---|
lnorm | 891.759 | 788.462 | 2,152.874 | 2,589.015 | 7,225.328 | 10,897.672 |
exp | 4,514.943 | 4,514.943 | 9,844.182 | 9,844.181 | 607.428 | 607.427 |
gamma | 2,936.292 | 5,506.932 | 8,254.973 | 9,777.674 | 27,308.961 | 3,275.622 |
weibull | 5,478.712 | 4,104.661 | 15,668.511 | 9,677.747 | 23,256.793 | 1,839.958 |
llogis | NA | 1,148.783 | NA | 224.350 | NA | 10,911.390 |
invwei | NA | 18,847.107 | NA | 29,245.701 | NA | 141,647.586 |
Pareto | 319.000 | 1,326.292 | 5,149.616 | 6,229.499 | 12,601.131 | 8,545.292 |
trgamma | NA | 2,392.174 | NA | 4,916.701 | NA | 1,890.435 |
Table1.5=matrix(0,8,6,dimnames=list(c("lnorm","exp","gamma","Weibull","llogis","invWeibull","Pareto","trgamma"),c("MOM(50%)","MLE(50%)","MOM(75%)","MLE(75%)","MOM(95%)","MLE(95%)")))
#lnorm
mle.fln.n500 <- fitdist(Sample1.list[[5]], "lnorm", method="mle")
Quantile.lnorm.mle.n500=quantile(mle.fln.n500, probs = c(0.5, 0.75, 0.95))
mme.fln.n500 <- fitdist(Sample1.list[[5]], "lnorm", method="mme")
Quantile.lnorm.mme.n500=quantile(mme.fln.n500, probs = c(0.5, 0.75, 0.95))
#exp
mle.fe.n500 <- fitdist(Sample1.list.rescaled[[5]], "exp", method="mle")
Quantile.exp.mle.n500=quantile(mle.fe.n500, probs = c(0.5, 0.75, 0.95))
mme.fe.n500 <- fitdist(Sample1.list[[5]], "exp", method="mme")
Quantile.exp.mme.n500=quantile(mme.fe.n500, probs = c(0.5, 0.75, 0.95))
#gamma
mle.fg.n500 <- fitdist(Sample1.list[[5]], "gamma", method="mle",lower=c(0, 0), start=list(shape =1, rate=1))
Quantile.gamma.mle.n500=quantile(mle.fg.n500, probs = c(0.5, 0.75, 0.95))
mme.fg.n500 <- fitdist(Sample1.list[[5]], "gamma", method="mme")
Quantile.gamma.mme.n500=quantile(mme.fg.n500, probs = c(0.5, 0.75, 0.95))
#weibull
mle.fw.n500 <- fitdist(Sample1.list[[5]], "weibull", method="mle")
Quantile.weibull.mle.n500=quantile(mle.fw.n500, probs = c(0.5, 0.75, 0.95))
memp <- function(x, order) mean(x^order)
mme.fw.n500 <- fitdist(Sample1.list[[5]], "weibull", method="mme", order=c(1, 2), memp=memp)
Quantile.weibull.mme.n500=quantile(mme.fw.n500, probs = c(0.5, 0.75, 0.95))
#llogis
mle.fl.n500 <- fitdist(Sample1.list[[5]], "llogis", method="mle")
Quantile.llogis.mle.n500=quantile(mle.fl.n500, probs = c(0.5, 0.75, 0.95))
mme.fl.n500 <- fitdist(Sample1.list[[5]], "llogis", method="mme", optim.method="Nelder-Mead", order=c(1, 2), memp=memp)
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Quantile.llogis.mme.n500=quantile(mme.fl.n500, probs = c(0.5, 0.75, 0.95))
#invweibull
mle.fi.n500 <- fitdist(Sample1.list[[5]], "invweibull", method="mle")
Quantile.invweibull.mle.n500=quantile(mle.fi.n500, probs = c(0.5, 0.75, 0.95))
mme.fi.n500 <- fitdist(Sample1.list[[5]], "invweibull", method="mme", order=c(1, 2), memp=memp)
Quantile.invweibull.mme.n500=quantile(mme.fi.n500, probs = c(0.5, 0.75, 0.95))
#pareto
mle.fp.n500 <- fitdist(Sample1.list[[5]], "pareto", method="mle")
Quantile.pareto.mle.n500=quantile(mle.fp.n500, probs = c(0.5, 0.75, 0.95))
mme.fp.n500 <- fitdist(Sample1.list[[5]], "pareto", method="mme", order=c(1, 2), memp=memp)
Quantile.pareto.mme.n500=quantile(mme.fp.n500, probs = c(0.5, 0.75, 0.95))
#trgamma
mle.ft.n500 <- fitdist(Sample1.list.rescaled[[5]], "trgamma", method="mle",optim.method="SANN")
Quantile.trgamma.mle.n500=quantile(mle.ft.n500, probs = c(0.5, 0.75, 0.95))
memp <- function(x, order) mean(x^order)
mme.ft.n500 <- fitdist(Sample1.list[[5]], "trgamma", method="mme", order=c(1, 2, 3), memp=memp)
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, mdistnam = mdistname, memp = memp, hessian = TRUE, method = opt.meth, lower = lower, upper = upper, ...): non-finite finite-difference value [3]>
Quantile.trgamma.mme.n500=quantile(mme.ft.n500, probs = c(0.5, 0.75, 0.95))
#Table
Table1.5[1,c(2,4,6)]=abs(as.numeric(Quantile.lnorm.mle.n500$quantiles)-Quantile.true)
Table1.5[1,c(1,3,5)]=abs(as.numeric(Quantile.lnorm.mme.n500$quantiles)-Quantile.true)
Table1.5[2,c(2,4,6)]=abs(as.numeric(Quantile.exp.mle.n500$quantiles)*1000-Quantile.true)
Table1.5[2,c(1,3,5)]=abs(as.numeric(Quantile.exp.mme.n500$quantiles)-Quantile.true)
Table1.5[3,c(2,4,6)]=abs(as.numeric(Quantile.gamma.mle.n500$quantiles)-Quantile.true)
Table1.5[3,c(1,3,5)]=abs(as.numeric(Quantile.gamma.mme.n500$quantiles)-Quantile.true)
Table1.5[4,c(2,4,6)]=abs(as.numeric(Quantile.weibull.mle.n500$quantiles)-Quantile.true)
Table1.5[4,c(1,3,5)]=abs(as.numeric(Quantile.weibull.mme.n500$quantiles)-Quantile.true)
Table1.5[5,c(2,4,6)]=abs(as.numeric(Quantile.llogis.mle.n500$quantiles)-Quantile.true)
#Table1.5[5,c(1,3,5)]=abs(as.numeric(Quantile.llogis.mme.n25$quantiles)-Quantile.true)
Table1.5[5,c(1,3,5)]=c(NA,NA,NA)
Table1.5[6,c(2,4,6)]=abs(as.numeric(Quantile.invweibull.mle.n500$quantiles)-Quantile.true) #use the n=500 fit
#Table1.5[6,c(1,3,5)]=abs(as.numeric(Quantile.invweibull.mme.n25$quantiles)-Quantile.true)
Table1.5[6,c(1,3,5)]=c(NA,NA,NA)
Table1.5[7,c(2,4,6)]=abs(as.numeric(Quantile.pareto.mle.n500$quantiles)-Quantile.true)
Table1.5[7,c(1,3,5)]=abs(as.numeric(Quantile.pareto.mme.n500$quantiles)-Quantile.true)
Table1.5[8,c(2,4,6)]=abs(as.numeric(Quantile.trgamma.mle.n500$quantiles)*1000-Quantile.true)
Table1.5[8,c(1,3,5)]=abs(as.numeric(Quantile.trgamma.mme.n500$quantiles)-Quantile.true)
#print table
kable(Table1.5, digits = 3, format.args = list(big.mark = ","), caption="MLE and MOM quantile estimation errors at levels 50%, 75% and 95% for the LogNormal sample of size n=500")
MOM(50%) | MLE(50%) | MOM(75%) | MLE(75%) | MOM(95%) | MLE(95%) | |
---|---|---|---|---|---|---|
lnorm | 1,025.327 | 989.887 | 2,462.778 | 2,519.477 | 8,184.871 | 8,799.418 |
exp | 1,589.495 | 1,589.495 | 3,993.285 | 3,993.284 | 12,036.150 | 12,036.151 |
gamma | 3,830.746 | 3,012.984 | 3,063.139 | 3,854.470 | 7,568.605 | 17,698.725 |
Weibull | 3,026.117 | 1,919.941 | 8,980.098 | 4,115.913 | 4,044.073 | 13,048.882 |
llogis | NA | 1,058.191 | NA | 4,556.497 | NA | 5,874.291 |
invWeibull | NA | 18,847.107 | NA | 29,245.701 | NA | 141,647.586 |
Pareto | 1,645.910 | 642.249 | 449.642 | 1,537.962 | 2,558.748 | 6,185.344 |
trgamma | NA | 362.050 | NA | 351.373 | NA | 14,517.625 |
The Akaike Information Criterion (AIC) is a useful tool for model selection that applies only to MLE and not to MOM: the lowest AIC value, as reported by R, indicates the best MLE-based model choice. From now on, the best distribution chosen through AIC is denoted as Dist00 (written DISToo in some of the table captions).
Since MLE for exp and trgamma requires re-scaling the data, these two distributions are excluded from the AIC-based comparison, though something more clever could have been done.
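The Table.AIC entries below are assumed to be collected from the aic component that fitdist stores with each MLE fit; a minimal sketch for the \(n=25\) column, reusing the MLE fits from the quantile computations above:
## AIC values of the n=25 MLE fits (smaller is better)
c(lnorm = mle.fln.n25$aic, gamma = mle.fg.n25$aic, weibull = mle.fw.n25$aic,
  llogis = mle.fl.n25$aic, invweibull = mle.fi.n25$aic, pareto = mle.fp.n25$aic)
## the best (Dist00) family is then the one attaining the minimum, e.g. via which.min()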
#print table
kable(Table.AIC, digits = 3, format.args = list(big.mark = ","), caption="AIC summary for LogNormal samples of size n=25, n=50, n=100, n=250 and n=500")
(MLE,n=25) | (MLE,n=50) | (MLE,n=100) | (MLE,n=250) | (MLE,n=500) | |
---|---|---|---|---|---|
lnorm | 557.858 | 1,130.847 | 2,273.682 | 5,747.808 | 11,355.90 |
exp | NA | NA | NA | NA | NA |
gamma | 567.952 | 1,148.695 | 2,289.276 | 5,779.020 | 11,432.33 |
Weibull | 567.856 | 1,146.482 | 2,292.847 | 5,780.328 | 11,439.93 |
llogis | 645.173 | 1,131.705 | 2,620.043 | 5,742.489 | 11,362.76 |
invWeibull | 601.066 | 1,204.042 | 2,433.059 | 6,394.136 | 12,438.07 |
Pareto | 564.503 | 1,140.397 | 2,295.020 | 5,768.745 | 11,424.99 |
trgamma | NA | NA | NA | NA | NA |
#print table
kable(Table.AIC.min, caption="Best (or DISToo) distribution chosen through AIC for LogNormal samples of size n=25, n=50, n=100, n=250 and n=500")
(MLE,n=25) | (MLE,n=50) | (MLE,n=100) | (MLE,n=250) | (MLE,n=500) | |
---|---|---|---|---|---|
Min (AIC) | lnorm | lnorm | lnorm | llogis | lnorm |
It looks like AIC does make the right choice, except for the LogNormal sample of size \(n=250\), where llogis only marginally surpasses the `true’ choice, namely lnorm.
Point estimates – such as our previous quantile estimates – are sensitive to sampling error, and thus bootstrapping is often used to reduce the uncertainty of point estimation. That is, we draw \(n_b=25\) bootstrap samples with replacement from the corresponding sample and fit only the parametric family chosen before as the best AIC distribution, which means that we fit i) lnorm for \(n=25\), ii) lnorm for \(n=50\), iii) lnorm for \(n=100\), iv) llogis for \(n=250\), and v) lnorm for \(n=500\).
Recall that AIC identified lnorm as the best choice for our initial samples of sizes \(n\in\{25,50,100,500\}\), while llogis was the best choice for our initial sample of size \(n=250\).
The bootstrap sampling for cases i) to v) – defined above – is now given.
#fix the seed inside the loop -- e.g. set.seed(567*b) -- so the bootstrap sampling generates the same output every time the code is run
#Sample1: the case of n=25, **lnorm** based on min(AIC), samples with replacement=25
MLE.lnorm.boot.n25=matrix(0,25,2,dimnames=list(c(1:25),c("meanlog","sdlog")))
for (b in 1:25) {
set.seed(567*b)#!!!set the seed inside the loop!!!
Sample1.boot=sample(Sample1.list[[1]], replace = TRUE)
MLE.lnorm=mledist(Sample1.boot, "lnorm")
MLE.lnorm.boot.n25[b,]=MLE.lnorm$estimate
}
MLE.lnorm.boot.mean.n25=colMeans(MLE.lnorm.boot.n25)
#Sample1: the case of n=50, **lnorm** based on min(AIC), samples with replacement=25
MLE.lnorm.boot.n50=matrix(0,25,2,dimnames=list(c(1:25),c("meanlog","sdlog")))
for (b in 1:25) {
set.seed(567*b)#!!!set the seed inside the loop!!!
Sample1.boot=sample(Sample1.list[[2]], replace = TRUE)
MLE.lnorm=mledist(Sample1.boot, "lnorm")
MLE.lnorm.boot.n50[b,]=MLE.lnorm$estimate
}
MLE.lnorm.boot.mean.n50=colMeans(MLE.lnorm.boot.n50)
#Sample1: the case of n=100, **lnorm** based on min(AIC), samples with replacement=25
MLE.lnorm.boot.n100=matrix(0,25,2,dimnames=list(c(1:25),c("meanlog","sdlog")))
for (b in 1:25) {
set.seed(567*b)#!!!set the seed inside the loop!!!
Sample1.boot=sample(Sample1.list[[3]], replace = TRUE)
MLE.lnorm=mledist(Sample1.boot, "lnorm")
MLE.lnorm.boot.n100[b,]=MLE.lnorm$estimate
}
MLE.lnorm.boot.mean.n100=colMeans(MLE.lnorm.boot.n100)
#Sample1: the case of n=250, **llogis** based on min(AIC), samples with replacement=25
MLE.llogis.boot.n250=matrix(0,25,2,dimnames=list(c(1:25),c("shape","scale")))
for (b in 1:25) {
set.seed(567*b)#!!!set the seed inside the loop!!!
Sample1.boot=sample(Sample1.list[[4]], replace = TRUE)
MLE.llogis=mledist(Sample1.boot, "llogis")
MLE.llogis.boot.n250[b,]=MLE.llogis$estimate
}
MLE.llogis.boot.mean.n250=colMeans(MLE.llogis.boot.n250)
#Sample1: the case of n=500, **lnorm** based on min(AIC), samples with replacement=25
MLE.lnorm.boot.n500=matrix(0,25,2,dimnames=list(c(1:25),c("meanlog","sdlog")))
for (b in 1:25) {
set.seed(567*b)#!!!set the seed inside the loop!!!
Sample1.boot=sample(Sample1.list[[5]], replace = TRUE)
MLE.lnorm=mledist(Sample1.boot, "lnorm")
MLE.lnorm.boot.n500[b,]=MLE.lnorm$estimate
}
MLE.lnorm.boot.mean.n500=colMeans(MLE.lnorm.boot.n500)
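As an aside, fitdistrplus also provides bootdist(), which automates this kind of non-parametric bootstrap for a fitdist object; a minimal sketch (not used for the tables that follow), applied to the \(n=25\) lnorm fit mle.fln.n25 from earlier:
set.seed(567) ## for reproducibility of the resampling inside bootdist
boot.n25 <- bootdist(mle.fln.n25, bootmethod = "nonparam", niter = 25)
colMeans(boot.n25$estim) ## bootstrap means of (meanlog, sdlog), comparable to MLE.lnorm.boot.mean.n25
summary(boot.n25)        ## also reports bootstrap medians and 95% CIs of the parameters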
The sets of \(n_b=25\) MLE estimates are now averaged to obtain the so-called Bootstrap MLE estimates, which are tabulated below.
kable(MLE.lnorm.boot.mean.n25, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming lnorm (n=25)")
x | |
---|---|
meanlog | 9.727 |
sdlog | 0.842 |
kable(MLE.lnorm.boot.mean.n50, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming lnorm (n=50)")
x | |
---|---|
meanlog | 9.777 |
sdlog | 1.076 |
kable(MLE.lnorm.boot.mean.n100, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming lnorm (n=100)")
x | |
---|---|
meanlog | 10.104 |
sdlog | 0.883 |
kable(MLE.llogis.boot.mean.n250, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming llogis (n=250)")
x | |
---|---|
shape | 1.569 |
scale | 23,511.935 |
kable(MLE.lnorm.boot.mean.n500, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming lnorm (n=500)")
x | |
---|---|
meanlog | 9.959 |
sdlog | 0.974 |
We now compute the bootstrap quantile estimates, having agreed upon the parametric model that is the ‘best’ option. Recall that Dist00 is lnorm for all \(n=\{25, 50, 100, 500\}\) and llogis for \(n=250\), while the true distribution is lnorm with \((\mu=10,\sigma=1)\).
We report the following:
\[\begin{eqnarray*} &&(1)\;\text{Compute} \;q\big(xx; \hat{\theta}_{MLE}(n,i), Dist00\big) \quad \text{for all}\; i=1,\ldots, 25\quad\text{and}\; n=\{25, 50, 100, 250, 500\},\\ &&(2)\;\text{Compute} \;\bar{\bar{q}}_{boot} (xx;n) =\frac{1}{25} \sum_{i=1}^{25} q\big(xx; \hat{\theta}_{MLE}(n,i), Dist00\big),\quad \text{for all}\;n=\{25, 50, 100, 250, 500\},\\ &&(3)\;\text{Report} \;\big|\bar{\bar{q}}_{boot} (xx;n)-xx\%\mbox{quantiles of the 'true' distribution}\big|\quad \text{for all}\;n=\{25, 50, 100, 250, 500\}, \end{eqnarray*}\] where (1) are the quantile estimates for (individual) bootstrap samples, (2) are quantile bootstrap estimates, and (3) evaluate the estimation errors with respect to (2). All computations are reported for the three quantile levels \(xx=\{50\%, 75\%, 95\%\}\).
The background computations – for (1), (2) and (3) – are now given.
#(best n=25) lnorm
Quantile.best.mle.n25.each=matrix(0,25,3,dimnames=list(NULL,c("q_boot(50%)","q_boot(75%)","q_boot(95%)")))
for (i in 1:25) {
Quantile.best.mle.n25.each[i,]=qlnorm(c(0.5,0.75,0.95), meanlog=MLE.lnorm.boot.n25[i,1], sdlog=MLE.lnorm.boot.n25[i,2])
}
Quantile.best.mle.n25.barbar=colMeans(Quantile.best.mle.n25.each)
#(best n=50) lnorm
Quantile.best.mle.n50.each=matrix(0,25,3,dimnames=list(NULL,c("q_boot(50%)","q_boot(75%)","q_boot(95%)")))
for (i in 1:25) {
Quantile.best.mle.n50.each[i,]=qlnorm(c(0.5,0.75,0.95), meanlog=MLE.lnorm.boot.n50[i,1], sdlog=MLE.lnorm.boot.n50[i,2])
}
Quantile.best.mle.n50.barbar=colMeans(Quantile.best.mle.n50.each)
#(best n=100) lnorm
Quantile.best.mle.n100.each=matrix(0,25,3,dimnames=list(NULL,c("q_boot(50%)","q_boot(75%)","q_boot(95%)")))
for (i in 1:25) {
Quantile.best.mle.n100.each[i,]=qlnorm(c(0.5,0.75,0.95), meanlog=MLE.lnorm.boot.n100[i,1], sdlog=MLE.lnorm.boot.n100[i,2])
}
Quantile.best.mle.n100.barbar=colMeans(Quantile.best.mle.n100.each)
#(best n=250) llogis
Quantile.best.mle.n250.each=matrix(0,25,3,dimnames=list(NULL,c("q_boot(50%)","q_boot(75%)","q_boot(95%)")))
for (i in 1:25) {
Quantile.best.mle.n250.each[i,]=qllogis(c(0.5,0.75,0.95), shape=MLE.llogis.boot.n250[i,1], scale=MLE.llogis.boot.n250[i,2])
}
Quantile.best.mle.n250.barbar=colMeans(Quantile.best.mle.n250.each)
#(best n=500) lnorm
Quantile.best.mle.n500.each=matrix(0,25,3,dimnames=list(NULL,c("q_boot(50%)","q_boot(75%)","q_boot(95%)")))
for (i in 1:25) {
Quantile.best.mle.n500.each[i,]=qlnorm(c(0.5,0.75,0.95), meanlog=MLE.lnorm.boot.n500[i,1], sdlog=MLE.lnorm.boot.n500[i,2])
}
Quantile.best.mle.n500.barbar=colMeans(Quantile.best.mle.n500.each)
# Table2a
Table2a=matrix(0,5,3,dimnames=list(c("n=25 (lnorm)","n=50 (lnorm)","n=100 (lnorm)","n=250 (llogis)","n=500 (lnorm)"),c("boot(50%)","boot(75%)","boot(95%)")))
Table2a[1,]=abs(Quantile.best.mle.n25.barbar-Quantile.true) #lnorm
Table2a[2,]=abs(Quantile.best.mle.n50.barbar-Quantile.true) #lnorm
Table2a[3,]=abs(Quantile.best.mle.n100.barbar-Quantile.true)#lnorm
Table2a[4,]=abs(Quantile.best.mle.n250.barbar-Quantile.true)#llogis
Table2a[5,]=abs(Quantile.best.mle.n500.barbar-Quantile.true)#lnorm
The estimation errors based on (3) are summarised below.
kable(Table2a, digits = 3, format.args = list(big.mark = ","), caption="Quantile bootstrap estimation errors for LogNormal samples of size n=25, n=50, n=100, n=250 and n=500")
boot(50%) | boot(75%) | boot(95%) | |
---|---|---|---|
n=25 (lnorm) | 5,118.359 | 13,267.308 | 4.505281e+04 |
n=50 (lnorm) | 4,140.955 | 5,900.422 | 5.205544e+03 |
n=100 (lnorm) | 2,521.460 | 1,348.219 | 8.685972e+03 |
n=250 (llogis) | 1,485.469 | 529,456.344 | 4.167706e+09 |
n=500 (lnorm) | 863.649 | 2,414.141 | 8.968366e+03 |
The \(n=250\) sample leads to enormous quantile bootstrap estimates at levels \(xx=\{75\%, 95\%\}\), and we need to understand why we have such an anomaly. Recall that MLE estimation in R relies on optimisation solvers that may not identify a global maximum (if one exists); we ignore this possible issue once again, and look into the quantile estimates for the (individual) bootstrap samples – see (1) – when \(n=250\).
kable(Quantile.best.mle.n250.each[order(Quantile.best.mle.n250.each[,3]),], digits = 3, format.args = list(big.mark = ","), caption="Quantile estimates for (individual) bootstrap samples coming from the LogNormal sample of size n=250")
q_boot(50%) | q_boot(75%) | q_boot(95%) |
---|---|---|
25,210.59 | 43,962.93 | 1.119045e+05 |
22,750.66 | 41,324.46 | 1.126478e+05 |
24,990.67 | 44,010.39 | 1.138929e+05 |
23,310.87 | 42,398.51 | 1.158347e+05 |
23,419.48 | 42,528.22 | 1.158784e+05 |
24,020.06 | 43,434.30 | 1.175072e+05 |
23,370.40 | 42,751.53 | 1.179313e+05 |
22,292.95 | 41,610.82 | 1.187381e+05 |
21,818.53 | 41,420.38 | 1.216037e+05 |
22,837.61 | 42,674.95 | 1.220027e+05 |
22,618.70 | 42,687.85 | 1.240936e+05 |
22,803.86 | 43,248.68 | 1.267631e+05 |
22,314.19 | 42,897.05 | 1.286263e+05 |
21,452.17 | 41,930.58 | 1.292864e+05 |
23,463.28 | 44,368.43 | 1.294032e+05 |
21,035.65 | 41,706.86 | 1.317143e+05 |
24,879.30 | 46,867.56 | 1.358218e+05 |
24,408.74 | 46,340.64 | 1.360636e+05 |
23,751.00 | 45,733.55 | 1.375070e+05 |
23,153.26 | 45,209.63 | 1.391594e+05 |
24,539.01 | 47,225.02 | 1.418608e+05 |
28,568.35 | 54,309.21 | 1.598137e+05 |
23,500.18 | 2,856,906.46 | 9.092886e+09 |
23,597.05 | 4,567,632.67 | 3.176145e+10 |
23,691.79 | 5,924,193.83 | 6.333839e+10 |
Note that the peculiarly large quantile bootstrap estimates for \(n=250\) (llogis) are due to a few – three in this case – extraordinarily large quantile estimates for (individual) bootstrap samples amongst the \(n_b=25\) estimates. If we exclude these three bootstrap samples, then we can re-compute (2) based on the remaining 22 individual bootstrap samples, and we get the following:
#Check this later
Reorder <- Quantile.best.mle.n250.each[order(Quantile.best.mle.n250.each[,3]),]
Quantile.best.mle.n250.barbar.no.outlier <- colMeans(Reorder[-c(23,24,25),])
No.outlier <- matrix(0,2,3)
No.outlier[1,] <- abs(Quantile.best.mle.n250.barbar.no.outlier-Quantile.true)
No.outlier[2,] <- abs(Quantile.best.mle.n250.barbar-Quantile.true)#llogis; as computed before
colnames(No.outlier) <- c("q_boot(50%)","q_boot(75%)","q_boot(95%)")
rownames(No.outlier) <- c("n=250 (llogis) No outliers (22 bootstrap samples)","n=250 (llogis) With outliers (25 bootstrap samples)")
kable(No.outlier, digits = 3, format.args = list(big.mark = ","), caption="Quantile bootstrap estimation errors for LogNormal samples of size n=250 without outliers")
q_boot(50%) | q_boot(75%) | q_boot(95%) | |
---|---|---|---|
n=250 (llogis) No outliers (22 bootstrap samples) | 1,473.960 | 790.525 | 1.262712e+04 |
n=250 (llogis) With outliers (25 bootstrap samples) | 1,485.469 | 529,456.344 | 4.167706e+09 |
The estimation errors have been reduced by eliminating the outliers induced by bootstrap sampling error, and the bootstrap estimates are more reasonable. The summary of the quantile bootstrap errors is given below:
Table2aF <- matrix(0,6,3,dimnames=list(c("n=25 (lnorm)","n=50 (lnorm)","n=100 (lnorm)", "n=250 (llogis) With outliers (25 bootstrap samples)", "n=250 (llogis) No outliers (22 bootstrap samples)", "n=500 (lnorm)"),c("boot(50%)","boot(75%)","boot(95%)")))
Table2aF[1,] <- abs(Quantile.best.mle.n25.barbar-Quantile.true) #lnorm
Table2aF[2,] <- abs(Quantile.best.mle.n50.barbar-Quantile.true) #lnorm
Table2aF[3,] <- abs(Quantile.best.mle.n100.barbar-Quantile.true)#lnorm
Table2aF[4,] <- abs(Quantile.best.mle.n250.barbar-Quantile.true)#llogis
Table2aF[5,] <- No.outlier[1,]#llogis
Table2aF[6,] <- abs(Quantile.best.mle.n500.barbar-Quantile.true)#lnorm
kable(Table2aF, digits = 3, format.args = list(big.mark = ","), caption="Quantile bootstrap estimation errors based on (3) for all LogNormal samples with and without outliers")
boot(50%) | boot(75%) | boot(95%) | |
---|---|---|---|
n=25 (lnorm) | 5,118.359 | 13,267.308 | 4.505281e+04 |
n=50 (lnorm) | 4,140.955 | 5,900.422 | 5.205544e+03 |
n=100 (lnorm) | 2,521.460 | 1,348.219 | 8.685972e+03 |
n=250 (llogis) With outliers (25 bootstrap samples) | 1,485.469 | 529,456.344 | 4.167706e+09 |
n=250 (llogis) No outliers (22 bootstrap samples) | 1,473.960 | 790.525 | 1.262712e+04 |
n=500 (lnorm) | 863.649 | 2,414.141 | 8.968366e+03 |
So far, we have reported the bootstrap errors – see (3) – and two additional sets of estimates and their associated estimation errors are now provided. Specifically, we report:
\[\begin{eqnarray*} &&(3)\;\text{Report} \;\big|\bar{\bar{q}}_{boot} (xx;n)-xx\%\mbox{quantiles of the 'true' distribution}\big|\quad \text{for all}\;n=\{25, 50, 100, 250, 500\},\\ &&(4)\;\text{Report} \;\big|\bar{q}^{MLE}_{boot} (xx;n)-xx\%\mbox{quantiles of the 'true' distribution}\big|\quad \text{for all}\;n=\{25, 50, 100, 250, 500\},\\ &&(5)\;\text{Report} \;\big|\bar{q}^{MLE}_{sample} (xx;n)-xx\%\mbox{quantiles of the 'true' distribution}\big|\quad \text{for all}\;n=\{25, 50, 100, 250, 500\},\\ \end{eqnarray*}\] where \(\bar{\bar{q}}_{boot} (xx;n)\) is the quantile bootstrap estimator \[\bar{\bar{q}}_{boot} (xx;n) =\frac{1}{25} \sum_{i=1}^{25} q\big(xx; \hat{\theta}_{MLE}(n,i), Dist00\big),\quad \text{for all}\;n=\{25, 50, 100, 250, 500\},\] \(\bar{q}^{MLE}_{boot} (xx;n)\) is the quantile based on Dist00 with parameters estimated via the bootstrap parameter estimates \[\bar{q}^{MLE}_{boot} (xx;n) = q\bigg(xx; \widehat{\Theta}^{boot}_{MLE}(n), Dist00\bigg),\quad\text{where}\;\widehat{\Theta}^{boot}_{MLE}(n)=\frac{1}{25}\sum^{25}_{i=1} \hat{\Theta}^{boot}_{MLE}(n,i),\quad \text{for all}\;n=\{25, 50, 100, 250, 500\},\] and \(\bar{q}^{MLE}_{sample} (xx;n)\) is the quantile based on Dist00 with parameters estimated from the original sample (i.e. without bootstrapping) \[\bar{q}^{MLE}_{sample} (xx;n) = q\bigg(xx; \widehat{\Theta}^{sample}_{MLE}(n), Dist00\bigg),\quad\text{where}\;\widehat{\Theta}^{sample}_{MLE}(n)\;\text{is the MLE computed from the original sample of size}\;n,\quad \text{for all}\;n=\{25, 50, 100, 250, 500\}.\] Note that the \(\bar{q}^{MLE}_{sample} (xx;n)\) estimates were already computed in the quantile tables preceding the bootstrap computations. All computations are reported for the three quantile levels \(xx=\{50\%, 75\%, 95\%\}\).
Table2b <- matrix(0,5,6,dimnames=list(c("n=25","n=50","n=100","n=250","n=500"),c("boot(50%)","boot(75%)","boot(95%)","table1(50%)","table1(75%)","table1(95%)")))
#Quantile.true
#################n=25, best model: lnorm
Quantile.best.mle.n25 <- qlnorm(c(0.5,0.75,0.95), meanlog = MLE.lnorm.boot.mean.n25[1], sdlog = MLE.lnorm.boot.mean.n25[2])
#Quantile.best.mle.n25
Table2b[1,c(1,2,3)] <- abs(Quantile.best.mle.n25-Quantile.true)
#################n=50, best model: lnorm
Quantile.best.mle.n50 <- qlnorm(c(0.5,0.75,0.95), meanlog = MLE.lnorm.boot.mean.n50[1], sdlog = MLE.lnorm.boot.mean.n50[2])
#Quantile.best.mle.n50
Table2b[2,c(1,2,3)] <- abs(Quantile.best.mle.n50-Quantile.true)
#################n=100, best model: lnorm
Quantile.best.mle.n100 <- qlnorm(c(0.5,0.75,0.95), meanlog = MLE.lnorm.boot.mean.n100[1], sdlog = MLE.lnorm.boot.mean.n100[2])
#Quantile.best.mle.n100
Table2b[3,c(1,2,3)] <- abs(Quantile.best.mle.n100-Quantile.true)
#################n=250, best model: llogis
Quantile.best.mle.n250 <- qllogis(c(0.5,0.75,0.95), shape=MLE.llogis.boot.mean.n250[1], scale =MLE.llogis.boot.mean.n250[2])
#Quantile.best.mle.n250
Table2b[4,c(1,2,3)] <- abs(Quantile.best.mle.n250-Quantile.true)
#################n=500, best model: lnorm
Quantile.best.mle.n500 <- qlnorm(c(0.5,0.75,0.95), meanlog = MLE.lnorm.boot.mean.n500[1], sdlog = MLE.lnorm.boot.mean.n500[2])
#Quantile.best.mle.n500
Table2b[5,c(1,2,3)] <- abs(Quantile.best.mle.n500-Quantile.true)
#From Table 1 before
Table2b[1,c(4,5,6)] <- Table1.1[1,c(2,4,6)] #lnorm, i.e first row
Table2b[2,c(4,5,6)] <- Table1.2[1,c(2,4,6)] #lnorm, i.e. first row
Table2b[3,c(4,5,6)] <- Table1.3[1,c(2,4,6)] #lnorm, i.e first row
Table2b[4,c(4,5,6)] <- Table1.4[5,c(2,4,6)] #llogis, i.e fifth row
Table2b[5,c(4,5,6)] <- Table1.5[1,c(2,4,6)] #lnorm, i.e first row
colnames(Table2b) <- c("q_boot(50%)","q_boot(75%)","q_boot(95%)", "q_sample(50%)","q_sample(75%)","q_sample(95%)")
rownames(Table2b) <- c("n=25 (lnorm)", "n=50 (lnorm)", "n=100 (lnorm)", "n=250 (llogis)", "n=500 (lnorm)")
kable(Table2b, digits = 3, format.args = list(big.mark = ","), caption="Quantile bootstrap estimation errors based on (4) and (5) for all LogNormal samples with outliers")
q_boot(50%) | q_boot(75%) | q_boot(95%) | q_sample(50%) | q_sample(75%) | q_sample(95%) | |
---|---|---|---|---|---|---|
n=25 (lnorm) | 5,263.810 | 13,660.436 | 47,146.892 | 4,730.688 | 11,397.029 | 37,486.570 |
n=50 (lnorm) | 4,398.817 | 6,809.925 | 10,592.784 | 4,465.168 | 6,877.322 | 10,498.602 |
n=100 (lnorm) | 2,422.207 | 1,108.441 | 9,650.478 | 1,176.609 | 1,099.864 | 14,678.478 |
n=250 (llogis) | 1,485.469 | 4,115.184 | 39,440.623 | 1,148.783 | 224.350 | 10,911.390 |
n=500 (lnorm) | 880.347 | 2,446.697 | 9,127.131 | 989.887 | 2,519.477 | 8,799.418 |
The estimation errors show a reasonable pattern: the larger the sample, the smaller the error. Note that outliers are not removed, and in particular the sample of size \(n=250\) shows an odd behaviour. This is very likely due to the sensitivity of the R optimisation routine used in the MLE computations, which amplified the effect of outliers in individual bootstrap samples. Therefore, one should first check the optimality of the MLE estimates computed by R, and then detect possible outliers.
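One simple check, sketched below under the same bootstrap seeds used above, is to inspect the convergence code that optim returns through mledist for each \(n=250\) llogis bootstrap fit; a non-zero code (or an estimate that changes markedly when refitted from different starting values) flags a suspicious fit.
## flag bootstrap fits where optim did not report successful convergence
for (b in 1:25) {
  set.seed(567*b)
  Sample1.boot <- sample(Sample1.list[[4]], replace = TRUE)
  fit <- mledist(Sample1.boot, "llogis")
  if (fit$convergence != 0) cat("bootstrap sample", b, "- convergence code:", fit$convergence, "\n")
}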
We now redo the computations from Sections 2.2, 2.3 and 2.4 for Sample 2, which is drawn from the Weibull distribution with parameters \(a=0.5\) and \(b=1\). Note that this particular Weibull distribution is a moderately heavy tailed distribution since \(a<1\). We only report the very final R outputs, since the intermediate steps are very similar to those from Sample 1 computations.
Note that set.seed(1234) is set for generating the samples, but no seed is set for the bootstrapping computations.
set.seed(1234) #Generating Sample2 and saving it in a list
n=c(25, 50, 100, 250, 500)
weibull.shape=0.5
weibull.scale=1
Sample2.list=list(n25=rep(0,25),n50=rep(0,50),n100=rep(0,100),n250=rep(0,250),n500=rep(0,500))
for (i in 1:length(n)) {
Sample2=rweibull(n[i], shape=weibull.shape , scale=weibull.scale)
Sample2.list[[i]]=Sample2
}
kable(Table.AIC, digits = 3, format.args = list(big.mark = ","), caption=("AIC summary for moderately heavy tailed Weibull samples of size n=25, n=50, n=100, n=250 and n=500"))
(MLE,n=25) | (MLE,n=50) | (MLE,n=100) | (MLE,n=250) | (MLE,n=500) | |
---|---|---|---|---|---|
lnorm | 86.957 | 153.419 | 254.780 | 517.470 | 1,107.682 |
exp | 96.413 | 185.928 | 302.191 | 804.243 | 1,748.775 |
gamma | 87.293 | 148.174 | 209.015 | 489.321 | 1,087.917 |
Weibull | 85.648 | 145.817 | 216.315 | 469.088 | 1,042.992 |
llogis | 87.223 | 150.473 | 243.450 | 499.986 | 1,098.118 |
invWeibull | 94.412 | 180.450 | 314.424 | 646.758 | 1,315.670 |
Pareto | 87.122 | 154.147 | 269.159 | 544.106 | 1,187.851 |
trgamma | 87.257 | 147.829 | 210.463 | 471.080 | 1,044.309 |
kable(Table.AIC.min, digits = 3, format.args = list(big.mark = ","), caption=("Best (or DISToo) distribution chosen through AIC for moderately heavy tailed Weibull samples of size n=25, n=50, n=100, n=250 and n=500"))
(MLE,n=25) | (MLE,n=50) | (MLE,n=100) | (MLE,n=250) | (MLE,n=500) | |
---|---|---|---|---|---|
Min (AIC) | Weibull | Weibull | gamma | Weibull | Weibull |
It looks like AIC does make the right choice, except for the Weibull sample of size \(n=100\), where gamma only marginally surpasses the true choice, i.e. weibull.
We now perform the usual bootstrap sampling.
The sets of \(n_b=25\) MLE estimates are now averaged to obtain the so-called Bootstrap MLE estimates, which are tabulated below.
kable(MLE.weibull.boot.mean.n25, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming weibull (n=25)")
x | |
---|---|
shape | 0.616 |
scale | 1.715 |
kable(MLE.weibull.boot.mean.n50, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming weibull (n=50)")
x | |
---|---|
shape | 0.539 |
scale | 1.345 |
kable(MLE.gamma.boot.mean.n100, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming gamma (n=100)")
x | |
---|---|
shape | 0.39 |
rate | 0.24 |
kable(MLE.weibull.boot.mean.n250, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming weibull (n=250)")
x | |
---|---|
shape | 0.481 |
scale | 0.826 |
kable(MLE.weibull.boot.mean.n500, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming weibull (n=500)")
x | |
---|---|
shape | 0.479 |
scale | 0.928 |
As before, we report three sets of estimation errors as follows:
\[\begin{eqnarray*} &&(3)\;\text{Report} \;\big|\bar{\bar{q}}_{boot} (xx;n)-xx\%\mbox{quantiles of the 'true' distribution}\big|\quad \text{for all}\;n=\{25, 50, 100, 250, 500\},\\ &&(4)\;\text{Report} \;\big|\bar{q}^{MLE}_{boot} (xx;n)-xx\%\mbox{quantiles of the 'true' distribution}\big|\quad \text{for all}\;n=\{25, 50, 100, 250, 500\},\\ &&(5)\;\text{Report} \;\big|\bar{q}^{MLE}_{sample} (xx;n)-xx\%\mbox{quantiles of the 'true' distribution}\big|\quad \text{for all}\;n=\{25, 50, 100, 250, 500\}. \end{eqnarray*}\]
The summary of the quantile bootstrap errors – without any outlier detection – is given below:
kable(Table2a, digits = 3, format.args = list(big.mark = ","), caption="Quantile bootstrap estimation errors based on (3) for all moderately heavy tailed Weibull samples")
boot(50%) | boot(75%) | boot(95%) | |
---|---|---|---|
n=25 (weibull) | 0.450 | 1.055 | 2.223 |
n=50 (weibull) | 0.203 | 0.548 | 1.552 |
n=100 (gamma) | 0.096 | 0.140 | 2.028 |
n=250 (weibull) | 0.094 | 0.294 | 0.863 |
n=500 (weibull) | 0.049 | 0.086 | 0.230 |
colnames(Table2b) <- c("q_boot(50%)","q_boot(75%)","q_boot(95%)", "q_sample(50%)","q_sample(75%)","q_sample(95%)")
rownames(Table2b) <- c("n=25 (weibull)", "n=50 (weibull)", "n=100 (gamma)", "n=250 (weibull)", "n=500 (weibull)")
kable(Table2b, digits = 3, format.args = list(big.mark = ","), caption="Quantile bootstrap estimation errors based on (4) and (5) for all moderately heavy tailed Weibull samples")
q_boot(50%) | q_boot(75%) | q_boot(95%) | q_sample(50%) | q_sample(75%) | q_sample(95%) | |
---|---|---|---|---|---|---|
n=25 (weibull) | 0.465 | 0.992 | 1.204 | 0.430 | 0.849 | 0.576 |
n=50 (weibull) | 0.201 | 0.544 | 1.327 | 0.198 | 0.509 | 1.058 |
n=100 (gamma) | 0.093 | 0.118 | 2.160 | 0.088 | 0.136 | 2.014 |
n=250 (weibull) | 0.095 | 0.294 | 0.905 | 0.090 | 0.281 | 0.885 |
n=500 (weibull) | 0.049 | 0.087 | 0.203 | 0.052 | 0.078 | 0.358 |
The estimation errors show a reasonable pattern: the larger the sample, the smaller the error. Note that outliers are not removed in this section. You should also note that the true quantile values are \(q(50\%)=0.480453\), \(q(75\%)=1.921812\) and \(q(95\%)=8.974412\), so the estimation errors are small relative to, and roughly proportional to, the true values.
We now redo the computations from Sections 2.2, 2.3 and 2.4 for Sample 3, which is drawn from the Weibull distribution with parameters \(a=2\) and \(b=1\). Note that this particular Weibull distribution is very light tailed since \(a>1\). We report only the final R outputs, since the intermediate steps are very similar to those for Sample 1.
Note that set.seed(1234) is the seed used to generate Sample 3, but no seed is set for the bootstrapping computations.
set.seed(1234) # Generating Sample 3 and saving it in a list
n=c(25, 50, 100, 250, 500)
weibull.shape=2
weibull.scale=1
Sample3.list=list(n25=rep(0,25),n50=rep(0,50),n100=rep(0,100),n250=rep(0,250),n500=rep(0,500))
for (i in 1:length(n)) {
Sample3=rweibull(n[i], shape=weibull.shape , scale=weibull.scale)
Sample3.list[[i]]=Sample3
}
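The AIC table below summarises the eight candidate fits. A minimal sketch of how such a comparison could be computed with fitdist() for a few of the candidates on one of the samples is given here; the starting values for llogis are illustrative assumptions, and Table.AIC.sketch is our own object name.
x <- Sample3.list$n100
fits <- list(
  lnorm   = fitdist(x, "lnorm"),
  gamma   = fitdist(x, "gamma"),
  Weibull = fitdist(x, "weibull"),
  llogis  = fitdist(x, "llogis", start = list(shape = 1, scale = 1))
)
Table.AIC.sketch <- sapply(fits, function(f) f$aic)  # smaller AIC = preferred model
names(which.min(Table.AIC.sketch))                   # the 'best' AIC-based choice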
kable(Table.AIC, digits = 3, format.args = list(big.mark = ","), caption=("AIC summary for very light tailed Weibull samples of size n=25, n=50, n=100, n=250 and n=500"))
(MLE,n=25) | (MLE,n=50) | (MLE,n=100) | (MLE,n=250) | (MLE,n=500) | |
---|---|---|---|---|---|
lnorm | 32.430 | 71.909 | 161.961 | 339.342 | 677.589 |
exp | 52.137 | 96.946 | 178.449 | 419.694 | 862.573 |
gamma | 30.941 | 65.430 | 135.475 | 299.051 | 620.399 |
Weibull | 31.122 | 64.306 | 123.496 | 290.960 | 612.898 |
llogis | 32.697 | 68.962 | 150.631 | 321.859 | 668.024 |
invWeibull | 39.886 | 98.940 | 221.605 | 468.631 | 885.576 |
Pareto | 87.122 | 154.147 | 269.159 | 544.106 | 1,187.851 |
trgamma | 32.752 | 67.400 | 117.892 | 292.960 | 618.699 |
kable(Table.AIC.min, digits = 3, format.args = list(big.mark = ","), caption=("Best (or DISToo) distribution chosen through AIC for very light tailed Weibull samples of size n=25, n=50, n=100, n=250 and n=500"))
(MLE,n=25) | (MLE,n=50) | (MLE,n=100) | (MLE,n=250) | (MLE,n=500) | |
---|---|---|---|---|---|
Min (AIC) | gamma | Weibull | trgamma | Weibull | Weibull |
It looks like AIC makes the right choice, except for the Weibull samples of size \(n=25\) and \(n=100\), where gamma and trgamma marginally surpass the `true' choice, namely weibull. Since MLE estimation for trgamma is not very stable -- there are three parameters to estimate -- we further assume that the `best' AIC choice for the Weibull sample of size \(n=100\) is weibull, which is the second `best' AIC-based choice and is not far from the `best' one.
We now perform the usual bootstrap sampling.
The sets of \(n_b=25\) MLE estimates are now averaged to obtain the so-called Bootstrap MLE estimates, which are tabulated below.
kable(MLE.gamma.boot.mean.n25, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming gamma (n=25)")
x | |
---|---|
shape | 5.028 |
rate | 4.946 |
kable(MLE.weibull.boot.mean.n50, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming weibull (n=50)")
x | |
---|---|
shape | 2.245 |
scale | 1.063 |
kable(MLE.weibull.boot.mean.n100, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming weibull (n=100)")
x | |
---|---|
shape | 2.104 |
scale | 0.996 |
kable(MLE.weibull.boot.mean.n250, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming weibull (n=250)")
x | |
---|---|
shape | 1.927 |
scale | 0.937 |
kable(MLE.weibull.boot.mean.n500, digits = 3, format.args = list(big.mark = ","), caption="Bootstrap MLE estimates assuming weibull (n=500)")
x | |
---|---|
shape | 1.898 |
scale | 0.978 |
As before, we report three sets of estimation errors as follows:
\[\begin{eqnarray*} &&(3)\;\text{Report} \;\big|\bar{\bar{q}}_{boot} (xx;n)-xx\%\mbox{quantiles of the 'true' distribution}\big|\quad \text{for all}\;n=\{25, 50, 100, 250, 500\},\\ &&(4)\;\text{Report} \;\big|\bar{q}^{MLE}_{boot} (xx;n)-xx\%\mbox{quantiles of the 'true' distribution}\big|\quad \text{for all}\;n=\{25, 50, 100, 250, 500\},\\ &&(5)\;\text{Report} \;\big|\bar{q}^{MLE}_{sample} (xx;n)-xx\%\mbox{quantiles of the 'true' distribution}\big|\quad \text{for all}\;n=\{25, 50, 100, 250, 500\}. \end{eqnarray*}\]
The summary of the quantile bootstrap errors – without any outlier detection – is given below:
kable(Table2a, digits = 3, format.args = list(big.mark = ","), caption="Quantile bootstrap estimation errors based on (3) for all very light tailed Weibull samples")
boot(50%) | boot(75%) | boot(95%) | |
---|---|---|---|
n=25 (gamma) | 0.116 | 0.105 | 0.157 |
n=50 (weibull) | 0.069 | 0.054 | 0.013 |
n=100 (weibull) | 0.004 | 0.013 | 0.047 |
n=250 (weibull) | 0.058 | 0.067 | 0.073 |
n=500 (weibull) | 0.027 | 0.016 | 0.013 |
colnames(Table2) <- c("q_boot(50%)","q_boot(75%)","q_boot(95%)", "q_sample(50%)","q_sample(75%)","q_sample(95%)")
rownames(Table2) <- c("n=25 (gamma)", "n=50 (weibull)", "n=100 (weibull)", "n=250 (weibull)", "n=500 (weibull)")
kable(Table2, digits = 3, format.args = list(big.mark = ","), caption="Quantile bootstrap estimation errors based on (4) and (5) for all very light tailed Weibull samples")
q_boot(50%) | q_boot(75%) | q_boot(95%) | q_sample(50%) | q_sample(75%) | q_sample(95%) | |
---|---|---|---|---|---|---|
n=25 (gamma) | 0.118 | 0.098 | 0.128 | 0.105 | 0.079 | 0.096 |
n=50 (weibull) | 0.070 | 0.052 | 0.002 | 0.075 | 0.071 | 0.049 |
n=100 (weibull) | 0.004 | 0.014 | 0.052 | 0.003 | 0.009 | 0.034 |
n=250 (weibull) | 0.058 | 0.067 | 0.075 | 0.042 | 0.046 | 0.044 |
n=500 (weibull) | 0.027 | 0.016 | 0.012 | 0.023 | 0.012 | 0.017 |
The estimation errors show a reasonable pattern: the larger the sample, the smaller the error. Note that outliers are not removed in this section. You should also note that the true quantile values are \(q(50\%)=0.8325546\), \(q(75\%)=1.1774100\) and \(q(95\%)=1.7308184\), so the estimation errors are small relative to, and roughly proportional to, the true values.
In this section, you learn to:
The real data analysis in this section uses the dataset described in Section 1.2, for which a detailed exploratory data analysis is provided in Section 1.3. We focus in particular on the Building Insurance Claim Size variable, i.e. the claims recorded in 1990.
We first fit the `best’ MLE distribution from the eight possible choices, and report
\[ \hat{q}^{MLE}_{50\%},\quad\hat{q}^{MLE}_{75\%}\quad\text{and}\quad \hat{q}^{MLE}_{95\%}, \]
i.e. the theoretical quantiles of the fitted distribution; these are three point estimates.
Note that the claims are quite large; therefore, we scale all claims down by a factor of \(100,000\), and the final outputs are later scaled back up by the same factor.
#View(Bivariate_Fire_Insurance_Clean_Data)
Sample.RealData.raw=Bivariate_Fire_Insurance_Clean_Data[[2]][1336:1502]
#head(Bivariate_Fire_Insurance_Clean_Data)
#head(Sample.RealData.raw)
#tail(Sample.RealData.raw)
#In order to use fitdist() for exp and trgamma, we scale the data by 100,000
Sample.RealData=Sample.RealData.raw/100000
summary(Sample.RealData)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.8251 8.1848 10.9736 15.4773 16.5017 116.9554
kable(Table.AIC, digits = 3, format.args = list(big.mark = ","), caption=("AIC summary for Building Insurance Claim Size recorded in 1990"))
AIC (MLE,Sample.RealData) | |
---|---|
lnorm | 1,203.430 |
exp | 1,250.951 |
gamma | 1,222.947 |
Weibull | 1,238.022 |
llogis | 1,186.759 |
invWeibull | 1,260.543 |
Pareto | 1,252.947 |
trgamma | 1,208.347 |
kable(Table.AIC.min, digits = 3, format.args = list(big.mark = ","), caption=("Best (or DISToo) distribution chosen through AIC for Building Insurance Claim Size recorded in 1990"))
AIC(MLE,Best dist) | |
---|---|
Min (AIC) | llogis |
Thus, llogis is the best AIC-based parametric choice for the Building claims that occurred in 1990. The point estimates are provided in Table 3.3, which gives both the non-parametric and the `best' AIC-based parametric quantile point estimates.
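A minimal sketch of how these two rows could be obtained is given below; the non-parametric row uses the empirical quantile() function, the parametric row uses qllogis() from actuar with the fitted parameters, the llogis starting values are illustrative assumptions, and fit.llogis is our own object name. Both rows are scaled back up by the factor of \(100,000\).
p <- c(0.50, 0.75, 0.95)
q.nonpar <- quantile(Sample.RealData, probs = p) * 100000  # non-parametric quantiles, rescaled
fit.llogis <- fitdist(Sample.RealData, "llogis", start = list(shape = 1, scale = 1))
q.param <- qllogis(p, shape = fit.llogis$estimate["shape"],
                      scale = fit.llogis$estimate["scale"]) * 100000  # llogis quantiles, rescaled
rbind("Non-parametric" = q.nonpar, "AIC Parametric - llogis" = q.param)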
colnames(TableB.1) <- c("q(50%)","q(75%)","q(95%)")
rownames(TableB.1) <- c("Non-parametric", "AIC Parametric - llogis")
kable(TableB.1, format.args = list(big.mark = ","), caption=("Quantile point estimates for Building Insurance Claim Size recorded in 1990"))
q(50%) | q(75%) | q(95%) | |
---|---|---|---|
Non-parametric | 1,097,360 | 1,650,165 | 3,770,627 |
AIC Parametric - llogis | 1,143,608 | 1,767,316 | 3,672,177 |
It is interesting to note that the non-parametric high quantile point estimate – at level \(95\%\) – is larger than its parametric estimate. We now use bootstrapping to understand the variability of our point estimates.
Our first setting uses 100 bootstrap samples, i.e. we generate 100 samples with replacement from the original sample of size 167: \[ \tilde{x}_1,\ldots,\tilde{x}_{100}\quad \text{with}\quad \dim(\tilde{x}_i)=167 \quad\text{for all}\;1\le i \le 100. \] We then fit the parameter(s) \(\hat{\Theta}_{MLE,i}\) of DISToo, i.e. llogis in this case, based on the bootstrap sample \(\tilde{x}_{i}\) for all \(1\le i \le 100\). Further, we compute \[ \hat{q}^{MLE,i}_{xx} \quad\text{for all}\quad 1\le i \le 100\quad\text{and}\quad xx\in \{50\%, 75\%, 95\%\}, \] where \(\hat{q}^{MLE,i}_{xx}\) is the \(xx\%\) quantile of DISToo with \(\Theta=\hat{\Theta}_{MLE,i}\). Furthermore, we report
\[\begin{eqnarray*} &&(i)\;\text{Bootstrap Estimate} \quad\hat{q}^{MLE,boot}_{xx}=\frac{1}{100}\sum^{100}_{i=1} \hat{q}^{MLE,i}_{xx},\\ &&(ii)\;\text{Bootstrap Variance} \quad\widehat{Var}\left(\hat{q}^{MLE,boot}_{xx}\right)=\frac{1}{100}\sum^{100}_{i=1}\left(\hat{q}^{MLE,i}_{xx}-\hat{q}^{MLE,boot}_{xx}\right)^2,\\ &&(iii)\;\text{MSE Estimate} \quad\widehat{MSE}\left(\hat{q}^{MLE,boot}_{xx}\right)=\frac{1}{100}\sum^{100}_{i=1}\left(\hat{q}^{MLE,i}_{xx}-\hat{q}^{MLE}_{xx}\right)^2, \end{eqnarray*}\] for all \(xx\in \{50\%, 75\%, 95\%\}\).
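A minimal sketch of (i)-(iii) is given below, assuming the fit.llogis object and starting values from the earlier sketch; the loop and object names are ours. The results are on the scaled data, so the estimates would be multiplied back by \(100,000\) (and the variance and MSE by \(100,000^2\)) for Tables 3.4 and 3.5.
B <- 100
p <- c(0.50, 0.75, 0.95)
q.boot <- matrix(NA, nrow = B, ncol = length(p))
for (i in 1:B) {
  xb <- sample(Sample.RealData, replace = TRUE)  # bootstrap sample of size 167
  fb <- fitdist(xb, "llogis", start = list(shape = 1, scale = 1))
  q.boot[i, ] <- qllogis(p, shape = fb$estimate["shape"], scale = fb$estimate["scale"])
}
q.mle <- qllogis(p, shape = fit.llogis$estimate["shape"],
                    scale = fit.llogis$estimate["scale"])  # point estimates, as in Table 3.3 (scaled data)
boot.est <- colMeans(q.boot)                        # (i) Bootstrap Estimate
boot.var <- colMeans(sweep(q.boot, 2, boot.est)^2)  # (ii) Bootstrap Variance
boot.mse <- colMeans(sweep(q.boot, 2, q.mle)^2)     # (iii) MSE Estimate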
The bootstrap samples are drawn without fixing the sampling seed, as is appropriate for a randomised experiment.
We also bootstrap our claim data 150 times and perform the same computations. All results are reported in Tables 3.4 and 3.5.
colnames(TableB.2a) <- c("q(50%)","q(75%)","q(95%)")
rownames(TableB.2a) <- c("Bootstrap Estimate", "Bootstrap Variance" , "MSE Estimate", "Bootstrap Variance / Bootstrap Estimate", "MSE Estimate / Bootstrap Estimate")
kable(TableB.2a, format.args = list(big.mark = ","), caption=("Summary of bootstrap estimation -- 100 bootstrap samples"))
q(50%) | q(75%) | q(95%) | |
---|---|---|---|
Bootstrap Estimate | 1.150928e+06 | 1.779085e+06 | 3.709717e+06 |
Bootstrap Variance | 2.744746e+09 | 1.250897e+10 | 1.882482e+11 |
MSE Estimate | 2.770871e+09 | 1.252238e+10 | 1.877750e+11 |
Bootstrap Variance / Bootstrap Estimate | 2.384812e+03 | 7.031123e+03 | 5.074461e+04 |
MSE Estimate / Bootstrap Estimate | 2.407510e+03 | 7.038661e+03 | 5.061705e+04 |
colnames(TableB.2b) <- c("q(50%)","q(75%)","q(95%)")
rownames(TableB.2b) <- c("Bootstrap Estimate", "Bootstrap Variance" , "MSE Estimate", "Bootstrap Variance / Bootstrap Estimate", "MSE Estimate / Bootstrap Estimate")
kable(TableB.2b, format.args = list(big.mark = ","), caption=("Summary of bootstrap estimation -- 150 bootstrap samples"))
q(50%) | q(75%) | q(95%) | |
---|---|---|---|
Bootstrap Estimate | 1.147084e+06 | 1.769867e+06 | 3.677304e+06 |
Bootstrap Variance | 3.309178e+09 | 1.125144e+10 | 1.468273e+11 |
MSE Estimate | 3.299195e+09 | 1.118293e+10 | 1.458747e+11 |
Bootstrap Variance / Bootstrap Estimate | 2.392803e+03 | 7.067744e+03 | 5.119189e+04 |
MSE Estimate / Bootstrap Estimate | 2.415578e+03 | 7.075322e+03 | 5.106321e+04 |