In spite of some of its weaknesses, the linear model remains a reference point for advanced modeling of datasets and a foundation for Machine Learning, Data Science and Artificial Intelligence. A major task in modeling is comparing candidate models before one is selected for use or for further modeling, and trial-and-error methods are often used to decide which model to pick. This is where this function is unique: it estimates 14 different linear models and presents their coefficients in a formatted table for quick comparison, saving time and energy. The attraction of the function is its simplicity; it is a one-line call.
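To see what the function automates, here is a base-R sketch (not the package function) that fits a few of the same transformations by hand and lines their coefficients up for comparison. It uses the built-in mtcars data, with mpg standing in for y and wt for x; the model names mirror those used in the Value section below.

```r
# Fit several transformations of the same relationship with lm() and
# collect their coefficients side by side -- the manual version of what
# Linearsystems() does for 14 models at once.
models <- list(
  Linear       = lm(mpg ~ wt, data = mtcars),            # y = a + b*x
  Semilog      = lm(mpg ~ log(wt), data = mtcars),       # y = a + b*log(x)
  Growth       = lm(log(mpg) ~ wt, data = mtcars),       # log(y) = a + b*x
  `Double Log` = lm(log(mpg) ~ log(wt), data = mtcars)   # log(y) = a + b*log(x)
)
# One row per model: intercept, slope, and R-squared
comparison <- t(sapply(models, function(m)
  c(coef(m), R2 = summary(m)$r.squared)))
print(round(comparison, 3))
```

Repeating this by hand for 14 transformations, their interaction variants, and their diagnostics is exactly the tedium the one-line call removes.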
Arguments
- y
Vector of the dependent variable. This must be numeric.
- x
Data frame of the explanatory variables.
- mod
The group of linear models to be estimated. It takes values from 0 to 6: 0 = EDA (correlation, summary tables, visual means); 1 = linear systems; 2 = power models; 3 = polynomial models; 4 = root models; 5 = inverse models; 6 = all 14 models.
- limit
Number of variables to be included in the coefficient plots.
- Test
Test data to be used to predict y. If not supplied, the fitted y is used and so may be identical with the fitted values. Be cautious when dividing the data into train and test subsets in order to train and test the model: if the sample size is not sufficient to provide enough data for the test subset, errors are thrown.
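Given the sample-size caution for the Test argument, a hedged sketch of a split helper follows. `make_split` is a hypothetical name (not part of the package); it simply refuses to produce a test subset that is too small, under the assumption that the first column of the data frame is y, as in the examples below.

```r
# Split a data frame into train and test subsets, guarding against a
# test subset too small to predict on (the error case the Test argument
# warns about). make_split is an illustrative helper, not a package API.
make_split <- function(Data, train_frac = 0.8, min_test = 5) {
  n_test <- floor((1 - train_frac) * nrow(Data))
  if (n_test < min_test)
    stop("Too few rows left for the test subset; add data or lower train_frac.")
  idx <- sample(nrow(Data), nrow(Data) - n_test)   # rows kept for training
  list(train = Data[idx, ], test = Data[-idx, ])
}

split <- make_split(mtcars)   # 32 rows -> 26 train, 6 test
```

The train component would then supply y and x, and the test component the Test argument.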
Value
A list with the following components:
- Visual means of the numeric variables: Plot of the means of the numeric variables.
- Correlation plot: Plot of the correlation matrix of the numeric variables. To recover the plot, please use the canonical form object$`Correlation plot`$plot().
- Linear: The full estimates of the Linear Model.
- Linear with interaction: The full estimates of the Linear Model with full interaction among the numeric variables.
- Semilog: The full estimates of the Semilog Model. Here the independent variable(s) is/are log-transformed.
- Growth: The full estimates of the Growth Model. Here the dependent variable is log-transformed.
- Double Log: The full estimates of the Double-Log Model. Here both the dependent and independent variables are log-transformed.
- Mixed-power model: The full estimates of the Mixed-power Model. This is a combination of the linear and double-log models and has significant gains over the two models separately.
- Translog model: The full estimates of the Double-Log Model with full interaction of the numeric variables.
- Quadratic: The full estimates of the Quadratic Model. Here the square of the numeric independent variable(s) is/are included as independent variables.
- Cubic model: The full estimates of the Cubic Model. Here the third power (x^3) of the numeric independent variable(s) is/are included as independent variables.
- Inverse y: The full estimates of the Inverse Model. Here the dependent variable is inverse-transformed (1 / y).
- Inverse x: The full estimates of the Inverse Model. Here the independent variable is inverse-transformed (1 / x).
- Inverse y & x: The full estimates of the Inverse Model. Here the dependent and independent variables are inverse-transformed (1 / y & 1 / x).
- Square root: The full estimates of the Square Root Model. Here the independent variable is square-root-transformed (x^0.5).
- Cubic root: The full estimates of the Cube Root Model. Here the independent variable is cube-root-transformed (x^(1/3)).
- Significant plot of Linear: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Linear with interaction: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Semilog: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Growth: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Double Log: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Mixed-power model: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Translog model: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Quadratic: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Cubic model: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Inverse y: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Inverse x: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Inverse y & x: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Square root: Plots of the order of importance and significance of the estimated coefficients of the model.
- Significant plot of Cubic root: Plots of the order of importance and significance of the estimated coefficients of the model.
- Model Table: Formatted tables of the coefficient estimates of all the models.
- Machine Learning Metrics: 47 metrics for assessing model performance, together with metrics for diagnostic analysis of the estimation error.
- Table of Marginal effects: Tables of the marginal effects of each model. Because of computational limitations, if you choose to estimate all 14 models, the tables are produced separately for the major transformations; they can easily be compiled into one.
- Fitted plots long format: Plots of the fitted estimates from each of the models.
- Fitted plots wide format: Plots of the fitted estimates from each of the models.
- Prediction plots long format: Plots of the predicted estimates from each of the models.
- Prediction plots wide format: Plots of the predicted estimates from each of the models.
- Naive effects plots long format: Plots of the lm effects. May be identical with the plots of marginal effects if performed.
- Naive effects plots wide format: Plots of the lm effects. May be identical with the plots of marginal effects if performed.
- Summary of numeric variables: Summary of the numeric variables of the dataset.
- Summary of character variables: Summary of the character variables of the dataset.
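Because the component names above contain spaces, they must be quoted with backticks when extracted with $. A minimal sketch, using a plain list to mimic the structure (the real object would come from a Linearsystems() call, and the illustrative contents here are invented):

```r
# Mimic the returned list's naming scheme with a mock object; the
# component names match the Value section, the contents are placeholders.
fit <- list(
  `Model Table`              = data.frame(term = "x", estimate = 1.2),
  `Machine Learning Metrics` = c(RMSE = 0.5, MAE = 0.4)
)
fit$`Model Table`$estimate               # backticks needed: name has spaces
names(fit$`Machine Learning Metrics`)    # "RMSE" "MAE"
```

The same pattern gives the canonical form for the correlation plot noted above, object$`Correlation plot`$plot().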
Examples
## Without test data (not run)
# library(tidyverse)
# y <- linearsystems$MKTcost # to run all the exercises, uncomment.
# x <- select(linearsystems, -MKTcost)
# Linearsystems(y, x, 6, 15) # NaNs produced if run
## Without test data (not run)
# x <- sampling[, -1]
# y <- sampling$qOutput
# limit <- 20
# mod <- 3
# Test <- NA
# Linearsystems(y, x, 3, 15) # NaNs produced if run
# # with test data
# x <- sampling[, -1]
# y <- sampling$qOutput
# Data <- cbind(y, x)
# # 80% of the data is sampled for training the model
# index <- sample(1 : nrow(Data), 0.8 * nrow(Data))
# train <- Data[index, ]
# # the remaining 20% is reserved for testing (predicting) the model
# Test <- Data[-index, ]
# y <- train$y
# x <- train[, -1]
# mod <- 4
# Linearsystems(y, x, 4, 15, Test) # NaNs produced if run
