The linear model remains a reference point for more advanced modeling of many datasets and a foundation for Machine Learning, Data Science and Artificial Intelligence, in spite of some of its weaknesses. A major task in modeling is to compare candidate models before selecting one, either for use as it is or as the basis for more advanced modeling. Often, trial and error is used to decide which model to select. This function addresses that step: it estimates 14 different linear models and presents their coefficients in a formatted table for quick comparison, saving time and effort. The function is also simple to use, requiring only one line of code.
Arguments
- y
Vector of the dependent variable. This must be numeric.
- x
Data frame of the explanatory variables.
- mod
The group of linear models to be estimated. It takes a value from 0 to 6: 0 = EDA (correlation, summary tables, visual means); 1 = linear systems; 2 = power models; 3 = polynomial models; 4 = root models; 5 = inverse models; 6 = all 14 models.
- limit
Number of variables to be included in the coefficient plots.
- Test
Test data to be used to predict y. If not supplied, the fitted y is used, so the predictions may be identical to the fitted values. Be cautious when dividing the data into train and test subsets to train and test the model: if the sample size is too small to leave enough data for the test subset, errors are thrown.
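The caution about small samples can be checked before the function is called. The following is a minimal base-R sketch, not part of the package: the helper name split_train_test, the 80/20 split, and the minimum of 5 test rows are all assumptions chosen for illustration, and the built-in mtcars data stands in for a real dataset.

```r
# Hypothetical helper (not part of the package): split a data frame into
# train and test rows, stopping early if the test subset would be too small.
# The default fraction and minimum test size are illustrative assumptions.
split_train_test <- function(data, train_frac = 0.8, min_test = 5, seed = 1) {
  stopifnot(is.data.frame(data), train_frac > 0, train_frac < 1)
  n_train <- floor(train_frac * nrow(data))
  n_test  <- nrow(data) - n_train
  if (n_test < min_test) {
    stop("Only ", n_test, " test rows available; at least ", min_test, " required.")
  }
  set.seed(seed)  # make the random split reproducible
  idx <- sample(seq_len(nrow(data)), n_train)
  list(train = data[idx, , drop = FALSE], test = data[-idx, , drop = FALSE])
}

parts <- split_train_test(mtcars)  # 32 rows: 25 train, 7 test
nrow(parts$train)
nrow(parts$test)
```

The train part would then supply y and x, and the test part would be passed as Test, assuming Test is expected to hold the same columns as the training data.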
Value
A list with the following components:
Visual means of the numeric variable
Plot of the means of the numeric variables.
Correlation plot
Plot of the Correlation Matrix of the numeric variables. To recover the plot, please use this canonical form: object$`Correlation plot`$plot().
Linear
The full estimates of the Linear Model.
Linear with interaction
The full estimates of the Linear Model with full interaction among the numeric variables.
Semilog
The full estimates of the Semilog Model. Here the independent variable(s) is/are log-transformed.
Growth
The full estimates of the Growth Model. Here the dependent variable is log-transformed.
Double Log
The full estimates of the double-log Model. Here both the dependent and independent variables are log-transformed.
Mixed-power model
The full estimates of the Mixed-power Model. This is a combination of linear and double log models. It has significant gains over the two models separately.
Translog model
The full estimates of the Translog Model, that is, the double-log Model with full interaction of the numeric variables.
Quadratic
The full estimates of the Quadratic Model. Here the square of numeric independent variable(s) is/are included as independent variables.
Cubic model
The full estimates of the Cubic Model. Here the third-power (x^3) of numeric independent variable(s) is/are included as independent variables.
Inverse y
The full estimates of the Inverse Model. Here the dependent variable is inverse-transformed (1/y).
Inverse x
The full estimates of the Inverse Model. Here the independent variable is inverse-transformed (1/x).
Inverse y & x
The full estimates of the Inverse Model. Here the dependent and independent variables are inverse-transformed (1/y & 1/x).
Square root
The full estimates of the Square root Model. Here the independent variable is square root-transformed (x^0.5).
Cubic root
The full estimates of the cubic root Model. Here the independent variable is cubic root-transformed (x^(1/3)).
Significant plot of Linear
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Linear with interaction
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Semilog
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Growth
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Double Log
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Mixed-power model
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Translog model
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Quadratic
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Cubic model
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Inverse y
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Inverse x
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Inverse y & x
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Square root
Plots of the order of importance and significance of the estimated coefficients of the model.
Significant plot of Cubic root
Plots of the order of importance and significance of the estimated coefficients of the model.
Model Table
Formatted Tables of the coefficient estimates of all the models.
Machine Learning Metrics
Metrics (47) for assessing model performance and metrics for diagnostic analysis of the error in estimation.
Table of Marginal effects
Tables of marginal effects of each model. Because of computational limitations, if you choose to estimate all 14 models, the Tables are produced separately for the major transformations. They can easily be compiled into one.
Fitted plots long format
Plots of the fitted estimates from each of the models.
Fitted plots wide format
Plots of the fitted estimates from each of the models.
Prediction plots long format
Plots of the predicted estimates from each of the models.
Prediction plots wide format
Plots of the predicted estimates from each of the models.
Naive effects plots long format
Plots of the lm effects. May be identical with plots of marginal effects if performed.
Naive effects plots wide format
Plots of the lm effects. May be identical with plots of marginal effects if performed.
Summary of numeric variables
Summary of the numeric variables of the dataset.
Summary of character variables
Summary of the character variables of the dataset.
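To make the transformations listed among the components above concrete, here is a rough base-R sketch that writes several of them as lm() formulas. It is not the package's internal code: the built-in mtcars data, the choice of mpg, wt and hp as variables, and the exact terms in each formula (for example, whether the quadratic model keeps the linear terms) are assumptions. The mixed-power and cubic models are left out because their exact specification is not given here.

```r
# Illustration only: base-R lm() formulas for several of the model forms
# described above, fitted to the built-in mtcars data. The variable choices
# and exact terms are assumptions, not the package's own specification.
dat <- mtcars[, c("mpg", "wt", "hp")]  # all values are strictly positive

fits <- list(
  linear      = lm(mpg ~ wt + hp, data = dat),
  interaction = lm(mpg ~ wt * hp, data = dat),
  semilog     = lm(mpg ~ log(wt) + log(hp), data = dat),        # x log-transformed
  growth      = lm(log(mpg) ~ wt + hp, data = dat),             # y log-transformed
  double_log  = lm(log(mpg) ~ log(wt) + log(hp), data = dat),   # both log-transformed
  translog    = lm(log(mpg) ~ log(wt) * log(hp), data = dat),   # double log with interaction
  quadratic   = lm(mpg ~ wt + hp + I(wt^2) + I(hp^2), data = dat),
  inverse_y   = lm(I(1 / mpg) ~ wt + hp, data = dat),
  inverse_x   = lm(mpg ~ I(1 / wt) + I(1 / hp), data = dat),
  square_root = lm(mpg ~ sqrt(wt) + sqrt(hp), data = dat),
  cubic_root  = lm(mpg ~ I(wt^(1 / 3)) + I(hp^(1 / 3)), data = dat)
)

# R-squared of each fit, measured on that model's own response scale, so
# values are not directly comparable across different transformations of y.
r2 <- sapply(fits, function(m) summary(m)$r.squared)
print(round(r2, 3))
```

Log, root and inverse transformations need strictly positive values, which is one plausible reason the package examples are marked as producing NaNs on some data.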
Examples
## Without test data (not run)
# library(dplyr) # select() here is assumed to be dplyr::select; uncomment to run all the exercises
# y <- linearsystems$MKTcost
# x <- select(linearsystems, -MKTcost)
# Linearsystems(y, x, 6, 15) # NaNs produced if run
## Without test data (not run)
# x <- sampling[, -1]
# y <- sampling$qOutput
# Linearsystems(y, x, 3, 15) # NaNs produced if run
## With test data (not run)
# x <- sampling[, -1]
# y <- sampling$qOutput
# Data <- cbind(y, x)
# idx <- sample(1:nrow(Data), 0.8 * nrow(Data)) # 80% of the rows are sampled for training the model
# train <- Data[idx, ]
# Test <- Data[-idx, ] # the remaining 20% are reserved for testing (predicting) the model
# y <- train$y
# x <- train[, -1]
# Linearsystems(y, x, 4, 15, Test) # NaNs produced if run