Linear Model and various Transformations for Efficiency

The linear model remains a reference point for more advanced modeling and a foundation for Machine Learning, Data Science and Artificial Intelligence, in spite of its weaknesses. A major task in modeling is to compare candidate models before one is selected, either for use as it is or as a basis for more advanced modeling. Often, trial and error is used to decide which model to select. This function removes much of that effort: it estimates 14 different linear models and provides their coefficients in a formatted Table for quick comparison, so that time and energy are saved. The function is also simple to use, since it is called with one line of code.

Usage

Linearsystems(y, x, mod, limit, Test = NA)

Arguments

y: Vector of the dependent variable. This must be numeric.
x: Data frame of the explanatory variables.
mod: The group of linear models to be estimated. It takes an integer value from 0 to 6: 0 = EDA (correlation, summary tables, visual means); 1 = linear systems; 2 = power models; 3 = polynomial models; 4 = root models; 5 = inverse models; 6 = all the 14 models.
limit: Number of variables to be included in the coefficients plots.
Test: Test data to be used to predict y. If not supplied, the data used for fitting are used for prediction, so the predicted values may be identical to the fitted values. Be cautious when dividing the data into train and test subsets in order to train and test the model: if the sample is too small to leave enough data for the test subset, errors are thrown.
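The behaviour described for Test can be illustrated with base R alone. The sketch below is an analogy and not the internals of Linearsystems(): it assumes an ordinary lm() fit on the built-in mtcars data, and the names dat, idx, train and test are hypothetical. It shows that predicting without new data returns the fitted values, while predicting on held-out rows returns one value per test row.

```r
# Base R analogy only; Linearsystems() itself is not called here.
set.seed(1)                                   # reproducible split
dat <- mtcars[, c("mpg", "wt", "hp")]         # built-in example data
idx <- sample(seq_len(nrow(dat)), size = floor(0.8 * nrow(dat)))
train <- dat[idx, ]                           # 80% of rows for fitting
test  <- dat[-idx, ]                          # remaining rows held out

fit <- lm(mpg ~ wt + hp, data = train)

# No new data supplied: the predictions are the fitted values
pred_train <- predict(fit)
same_as_fitted <- isTRUE(all.equal(unname(pred_train), unname(fitted(fit))))

# Held-out data supplied: one prediction per test row
pred_test <- predict(fit, newdata = test)
```

With very small samples the held-out subset may contain too few rows to be useful, which is the situation the warning on Test refers to.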

Value

A list with the following components:

Visual means of the numeric variable: Plot of the means of the numeric variables.
Correlation plot: Plot of the Correlation Matrix of the numeric variables. To recover the plot, use the form object$`Correlation plot`$plot(). The backticks are needed because the component name contains a space.
Linear: The full estimates of the Linear Model.
Linear with interaction: The full estimates of the Linear Model with full interaction among the numeric variables.
Semilog: The full estimates of the Semilog Model. Here the independent variable(s) is/are log-transformed.
Growth: The full estimates of the Growth Model. Here the dependent variable is log-transformed.
Double Log: The full estimates of the double-log Model. Here both the dependent and independent variables are log-transformed.
Mixed-power model: The full estimates of the Mixed-power Model. This is a combination of linear and double log models. It has significant gains over the two models separately.
Translog model: The full estimates of the double-log Model with full interaction of the numeric variables.
Quadratic: The full estimates of the Quadratic Model. Here the square of numeric independent variable(s) is/are included as independent variables.
Cubic model: The full estimates of the Cubic Model. Here the third-power (x^3) of numeric independent variable(s) is/are included as independent variables.
Inverse y: The full estimates of the Inverse Model. Here the dependent variable is inverse-transformed (1 / y).
Inverse x: The full estimates of the Inverse Model. Here the independent variable is inverse-transformed (1 / x).
Inverse y & x: The full estimates of the Inverse Model. Here the dependent and independent variables are inverse-transformed (1 / y & 1 / x).
Square root: The full estimates of the Square root Model. Here the independent variable is square root-transformed (x^0.5).
Cubic root: The full estimates of the cubic root Model. Here the independent variable is cube root-transformed (x^(1 / 3)).
Significant plot of Linear: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Linear with interaction: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Semilog: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Growth: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Double Log: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Mixed-power model: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Translog model: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Quadratic: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Cubic model: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Inverse y: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Inverse x: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Inverse y & x: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Square root: Plots of order of importance and significance of estimated coefficients of the model.
Significant plot of Cubic root: Plots of order of importance and significance of estimated coefficients of the model.
Model Table: Formatted Tables of the coefficient estimates of all the models.
Model Table: Formatted Tables of the coefficient estimates of all the models
Machine Learning Metrics: Metrics (47) for assessing model performance and metrics for diagnostic analysis of the error in estimation.
Table of Marginal effects: Tables of marginal effects of each model. Because of computational limitations, if you choose to estimate all the 14 models, the Tables are produced separately for the major transformations. They can easily be compiled into one.
Fitted plots long format: Plots of the fitted estimates from each of the models.
Fitted plots wide format: Plots of the fitted estimates from each of the models.
Prediction plots long format: Plots of the predicted estimates from each of the models.
Prediction plots wide format: Plots of the predicted estimates from each of the models.
Naive effects plots long format: Plots of the lm effects. May be identical with plots of marginal effects if performed.
Naive effects plots wide format: Plots of the lm effects. May be identical with plots of marginal effects if performed.
Summary of numeric variables: Summary of the numeric variables of the dataset.
Summary of character variables: Summary of the character variables of the dataset.
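To make the transformations behind the model names concrete, the following base R sketch fits several of them with lm() on the built-in mtcars data. The formulas are assumptions about what each name means, based on the descriptions in this section, and are not taken from the package source. The objects dat, models, coef_list and dl are hypothetical names used only for illustration.

```r
# Illustrative base R only; the formulas are assumptions about what each
# model name means, not code taken from the package.
dat <- mtcars[, c("mpg", "wt", "hp")]   # all values are strictly positive

models <- list(
  Linear        = lm(mpg ~ wt + hp, data = dat),
  Semilog       = lm(mpg ~ log(wt) + log(hp), data = dat),        # log of x
  Growth        = lm(log(mpg) ~ wt + hp, data = dat),             # log of y
  `Double Log`  = lm(log(mpg) ~ log(wt) + log(hp), data = dat),   # log of both
  Quadratic     = lm(mpg ~ wt + hp + I(wt^2) + I(hp^2), data = dat),
  `Inverse y`   = lm(I(1 / mpg) ~ wt + hp, data = dat),
  `Inverse x`   = lm(mpg ~ I(1 / wt) + I(1 / hp), data = dat),
  `Square root` = lm(mpg ~ sqrt(wt) + sqrt(hp), data = dat),
  `Cubic root`  = lm(mpg ~ I(wt^(1 / 3)) + I(hp^(1 / 3)), data = dat)
)

# Coefficients of every model, ready for side-by-side comparison
coef_list <- lapply(models, coef)

# List components whose names contain spaces need backticks with `$`
dl <- models$`Double Log`
```

Log, inverse and root transformations require strictly positive values, which may explain NaN results when a variable contains zeros or negative numbers.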

Examples

## Without test data, all 14 models (not run)
# library(tidyverse)
# y <- linearsystems$MKTcost # to run all the exercises, uncomment.
# x <- select(linearsystems, -MKTcost)
# Linearsystems(y, x, 6, 15) # NaNs produced if run
## Without test data, polynomial models only (not run)
# x <- sampling[, -1]
# y <- sampling$qOutput
# limit <- 20
# mod <- 3
# Linearsystems(y, x, mod, limit) # NaNs produced if run
## With test data (not run)
# x <- sampling[, -1]
# y <- sampling$qOutput
# Data <- cbind(y, x)
# # 80% of the rows are sampled for training the model
# idx <- sample(1 : nrow(Data), 0.8 * nrow(Data))
# train <- Data[idx, ]
# # the remaining 20% are reserved for testing (predicting) the model
# Test  <- Data[-idx, ]
# y <- train$y
# x <- train[, -1]
# mod <- 4
# Linearsystems(y, x, mod, 15, Test) # NaNs produced if run
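When models with different transformations of y are compared on test data, their predictions have to be on the same scale. The base R sketch below is an illustration under stated assumptions, not the package's own metric code: it uses the built-in mtcars data, a hypothetical helper rmse(), and a simple exp() back-transformation for the log-dependent (Growth) model.

```r
# Base R sketch; does not call Linearsystems() or reproduce its metrics.
set.seed(42)                                  # reproducible split
dat <- mtcars[, c("mpg", "wt", "hp")]
idx <- sample(seq_len(nrow(dat)), size = floor(0.8 * nrow(dat)))
train <- dat[idx, ]
test  <- dat[-idx, ]

# Root mean squared error (hypothetical helper for this example)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

linear <- lm(mpg ~ wt + hp, data = train)
growth <- lm(log(mpg) ~ wt + hp, data = train)

pred_linear <- predict(linear, newdata = test)
# The growth model predicts log(mpg); exp() returns it to the mpg scale
pred_growth <- exp(predict(growth, newdata = test))

# Both errors are now measured in the original units of y
scores <- c(
  Linear = rmse(test$mpg, pred_linear),
  Growth = rmse(test$mpg, pred_growth)
)
```

The plain exp() back-transformation is the simplest choice and ignores retransformation bias, so the comparison should be read as indicative only.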