Dyn4cast

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Introduction

Dyn4cast is designed to be a lightweight package. The philosophy behind it is the need to provide quick methods for the estimation, prediction, forecasting, updating and visualization of data. The package was launched as a single function during the COVID-19 lockdown as a means of providing live updates and forecasts of Nigeria's cases. That function was converted to a package in February 2021 with just one function, DynamicForecast. From that single function, the number of functions has grown to more than 10, because supporting functions were needed to make forecasting easy. The package prides itself on one-line technology, providing various machine learning functions with excellent functionality in a single line of code. At inception, the package had heavy dependencies on other software (e.g. Java), packages and functions, which made installation quite cumbersome. The number of dependencies has since been reduced to a minimum, and the functions work optimally.

Installation

Dyn4cast is not yet on CRAN, so only the development version is available. However, the package is very functional and stable and is actively monitored for issues. To install the development version of Dyn4cast from GitHub, use the following canonical form:

# install.packages("devtools")
devtools::install_github("JobNmadu/Dyn4cast")

The development version can also be installed through r-universe. Use the form:

install.packages("Dyn4cast", repos = c("https://jobnmadu.r-universe.dev",
                                       "https://cloud.r-project.org"))

Outlined below are the various functions and their capacities as provided by the package.

constrainedforecast: Constrained Forecast of One-sided Integer Response Model

This function constrains the forecast of one-sided integer data to ensure that the forecast remains within the limits of the data. It is used in conjunction with two other functions: scaledlogit and invscaledlogit. The basic usage of the function is as follows:

constrainedforecast(model10, lower, upper)

With the following arguments:

`model10`   The exponentiated values from the `invscaledlogit` function.

`lower` The lower limit of the forecast

`upper` The upper limit of the forecast
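For intuition, the scaled-logit pair on which this workflow rests maps a bounded series onto the whole real line and back. Below is a minimal sketch assuming the standard textbook definitions; the helper names are illustrative, and the package's own scaledlogit and invscaledlogit may differ in detail.

scaled_logit_sketch <- function(x, lower, upper) {
  log((x - lower) / (upper - x))  # maps (lower, upper) onto the real line
}
inv_scaled_logit_sketch <- function(x, lower, upper) {
  (upper - lower) * exp(x) / (1 + exp(x)) + lower  # maps back within the limits
}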

Example

library(Dyn4cast)
library(splines)
library(forecast)
library(readr)
lower <- 1
upper <- 37
Model <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)
FitModel <- scaledlogit(x = fitted.values(Model), lower = lower,
                        upper = upper)
ForecastModel <- forecast(FitModel, h = 200)  # forecast horizon
constrainedforecast(model10 = ForecastModel, lower, upper)

The output contains both the unconstrained and constrained forecasts. The constrained forecast remains within the limits of the data.

data_transform: Standardize data.frame for comparable Machine Learning prediction and visualization

This function transforms a dataset to uniform units to facilitate estimation, interpretation or visualization. It is simple and straightforward to use, as outlined below. The basic usage is as follows:

data_transform(data, method, margin = 2)

With the arguments: 

`data`  A data.frame with numeric data for transformation. All columns in the data are transformed

`method`    The type of transformation. There are three options: 1 is for min-max transformation, 2 is for log transformation and 3 is for mean-SD transformation.

`margin`    Option to transform the data column-wise (2) or row-wise (1). Defaults to column-wise transformation if no option is indicated.
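For reference, here is a minimal sketch of the three transformations under their conventional definitions; the helper names are illustrative and are not part of the package.

min_max <- function(x) (x - min(x)) / (max(x) - min(x))  # rescales to [0, 1]
mean_sd <- function(x) (x - mean(x)) / sd(x)             # mean 0, SD 1
log_tr  <- function(x) log(x)                            # compresses large values
toy <- data.frame(a = 1:5, b = (1:5)^2)
apply(toy, 2, min_max)  # margin = 2: column-wise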

Examples

View the data without transformation

library(dplyr)
library(tidyr)
library(ggplot2)
data0 <- Transform %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data0, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

Transformation by min-max method

data1 <- data_transform(Transform[, -1], 1)
data1 <- cbind(Transform[, 1], data1)
data1 <- data1 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data1, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

Log transformation

data2 <- data_transform(Transform[, -1], 2)
data2 <- cbind(Transform[, 1], data2)
data2 <- data2 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data2, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

Mean-SD transformation

data3 <- data_transform(Transform[, -1], 3)
data3 <- cbind(Transform[, 1], data3)
data3 <- data3 %>%
  pivot_longer(!X, names_to = "Factors", values_to = "Data")

ggplot(data = data3, aes(x = X, y = Data, fill = Factors, color = Factors)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Data", x = "Series", color = "Factors") +
  theme_bw(base_size = 12)

From the various transformations, the patterns in the dataset become clearer. The choice of method will depend on the nature of the data.

DynamicForecast: Dynamic Forecast of Five Models and their Ensembles

The function estimates, predicts and forecasts data, either for the entire trend of the series or for a subset, with various ensembles. The main attraction of this function is the use of the newly introduced equal-length trend (days, months, years) forecast. The basic usage is as follows:

DynamicForecast(date, series, Trend, Type, MaximumDate, x = 0, BREAKS = 0,
 ORIGIN = NULL, origin = "1970-01-01", Length = 0, ...)

With the arguments

`date`  A vector containing the dates for which the data is collected. Must be the same length as series. The date must be in 'YYYY-MM-DD' format. If the data is a monthly series, the recognized date format is the last day of the month of the dataset, e.g. 2021-02-28. If the data is a yearly series, the recognized date format is the last day of the year of the dataset, e.g. 2020-12-31. There is no format for quarterly data for now.

`series`    A vector containing the data for estimation and forecasting. Must be the same length as date.

`x`  An optional vector of data to be added to the model for forecasting. Modeling and forecasting are still carried out if it is not provided. Must be the same length as series.

`BREAKS`    A vector of numbers indicating points of breaks for estimation of the spline models.

`MaximumDate`    The maximum (last) date in the data frame, meaning that forecasting starts from the date following it. The date must be in a recognized date format. Note that for forecasting, the date origin is set to 1970-01-01.

`Trend` The type of trend. There are three options: Day, Month and Year.

`Type`  The type of response variable. There are two options: Continuous and Integer. For an integer variable, the forecasts are **constrained** between the minimum and maximum values of the response variable.

`Length`    The length for which the forecast is to be made. If not given, it defaults to the length of the dataset, i.e. the sample size.

`origin`    The default date origin, 1970-01-01, used to position the dates of the data so that the forecasts are dated properly.

`ORIGIN`    The date origin of the dataset; if different from **origin**, it must be in the format "YYYY-MM-DD". This is used to position the dates of the data so that the forecasts are dated properly.

`...`   Additional arguments that may be passed to the function. If the maximum date is NULL, which is the default, it is set to the last date of the series.

Example

Twenty-eight points less than the full length of the data

library(readr)
COVID19$Date <- zoo::as.Date(COVID19$Date, format = '%m/%d/%Y')

LEN <- length(COVID19$Case)
Dss <- seq(COVID19$Date[1], by = "day", length.out = LEN)
ORIGIN <- "2020-02-29"
lastdayfo21 <- Dss[length(Dss)]
gggg <- COVID19[COVID19$Date <= lastdayfo21 - 28, ]
BREAKS <- c(70, 131, 173, 228, 274)
DynamicForecast(date = gggg$Date, series = gggg$Case,
                BREAKS = BREAKS, MaximumDate = "2021-02-10",
                Trend = "Day", Length = 0, Type = "Integer")

Fourteen points less than the full length of the data

lastdayfo21 <- Dss[length(Dss)]
ddy <- COVID19[COVID19$Date <= lastdayfo21 - 14, ]
BREAKS <- c(70, 131, 173, 228, 274)
suppressWarnings(DynamicForecast(date = ddy$Date, series = ddy$Case,
                BREAKS = BREAKS, MaximumDate = "2021-02-10",
                Trend = "Day", Length = 0, Type = "Integer"))

Both the constrained and unconstrained forecasts are provided. It can be observed that the unconstrained forecast falls below zero and exceeds the maximum observed value in the data.

formattedcut: Convert continuous vector variable to formatted factors

This is a wrapper that improves the output of the base R `cut` function, presenting it in a more user-friendly, formatted manner. It is an easy function to implement. The basic usage is:

formattedcut(data, breaks, cut = FALSE)

With the arguments:

`data`  A vector of the data to be converted to factors, if not already cut, or the vector of cut data.

`breaks`    Number of classes to break the data into

`cut`   Logical indicating whether the cut function has already been applied to the data; defaults to FALSE.

If the data is not from a data frame, the frequency distribution is required and can be obtained as follows (here DDK denotes the formatted cut data):

library(dplyr)
as.data.frame(DDK %>%
  group_by(`Lower class`, `Upper class`, `Class interval`) %>%
  tally())

Example

DD <- rnorm(100000)
formattedcut(DD, 12, FALSE)  # cut and format in one step
DD1 <- cut(DD, 12)
formattedcut(DD1, 12, TRUE)  # format a vector already cut with base R

garrett_ranking: Garrett Ranking of Categorical Data

This function applies fractional ranking, in which the data points are ordered and given an ordinal number/rank. The ordering and ranking provide additional information which may not be available from a frequency distribution. Ranking enables ease of comparison and makes grouping more meaningful. The basic usage is:

garrett_ranking(data, num_rank, ranking = NULL, m_rank = c(2:15))

With the arguments:

`data`  The data for the Garrett Ranking, must be a data.frame.

`num_rank`  A vector representing the number of ranks applied to the data. If the data is five-point Likert-type data, then the number of ranks is 5.

`ranking`    A vector of labels representing the ranks applied to the data. If not supplied, positional ranks are applied.

`m_rank`    The scope of the ranking methods, which is between 2 and 15.
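As a quick illustration of fractional ranking, the idea underlying Garrett ranking, tied values share the average of their ordinal positions; base R's rank shows the behaviour.

rank(c(10, 20, 20, 30))  # 1.0 2.5 2.5 4.0: the tied values share ranks 2 and 3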

Examples

Ranking is supplied

library(readr)
garrett_data <- data.frame(garrett_data)
ranking <- c("Serious constraint", "Constraint",
             "Not certain it is a constraint", "Not a constraint",
             "Not a serious constraint")

garrett_ranking(garrett_data, 5, ranking)

Ranking not supplied

garrett_ranking(garrett_data, 5)

Rank a subset of the data

garrett_ranking(garrett_data, 8)

garrett_ranking(garrett_data, 4)

gender: Create Gender Variable

This function combines Age and Sex to create a Gender variable. It also helps to clarify the distinction between the sex assigned at birth and the gender of an individual. The basic usage is:

gender(data)

With argument: 

`data`  data frame containing Age and Sex variables

Example

df <- data.frame(Age = c(49, 30, 44, 37, 29, 56, 28, 26, 33, 45, 45, 19,
                         32, 22, 19, 28, 28, 36, 56, 34),
                 Sex = c("male", "female", "female", "male", "male", "male",
                         "female", "female", "Prefer not to say", "male",
                         "male", "female", "female", "male",
                         "Non-binary/third gender", "male", "female",
                         "female", "male", "male"))
gender(df)

Linearsystems: Linear Model and various Transformations for Efficiency

This function is unique and versatile, as it estimates 14 different transformations of the linear regression model. The appeal of this function is its simplicity: a one-line call that outputs more than 20 different objects. The function incorporates two other functions: corplot and estimate_plot. The basic usage is:

Linearsystems(y, x, mod, limit, Test = NA)

With the arguments:

`y` Vector of the dependent variable. This must be numeric.

`x` Data frame of the explanatory variables.

`mod`   The group of linear models to be estimated. It takes a value from 0 to 6: 0 = EDA (correlation, summary tables, visual means); 1 = linear systems; 2 = power models; 3 = polynomial models; 4 = root models; 5 = inverse models; 6 = all 14 models.

`limit` Number of variables to be included in the coefficient plots.

`Test`  Test data to be used to predict y. If not supplied, the fitted y is used, and hence the prediction may be identical to the fitted values. Be cautious if the data is to be divided into train and test subsets in order to train and test the model: if the sample size is not sufficient to provide enough data for the test, errors are thrown.
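To fix ideas, here is a minimal sketch of two of the transformations under conventional assumptions: a linear system and a power (log-log) model. The data and names are illustrative; the package fits many more forms in one call.

set.seed(42)
d <- data.frame(x = runif(30, 1, 10))
d$y <- 2 * d$x^1.5 * exp(rnorm(30, 0, 0.1))
linear_fit <- lm(y ~ x, data = d)            # linear system: y = a + bx
power_fit  <- lm(log(y) ~ log(x), data = d)  # power model: log(y) = a + b log(x)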

Example

Estimation Without test data, 14 models

library(dplyr)
y <- linearsystems$MKTcost
x <- select(linearsystems, -MKTcost)
Linearsystems(y, x, 6, 15)

Estimation Without test data, polynomial models

x <- sampling[, -1]
y <- sampling$qOutput
Linearsystems(y, x, 3, 15)

Estimation Without test data, linear models

x <- sampling[, -1]
y <- sampling$qOutput
Linearsystems(y, x, 1, 15)

Estimation Without test data, power models

x <- sampling[, -1]
y <- sampling$qOutput
Linearsystems(y, x, 2, 15)

Estimation With test data, root models

x <- sampling[, -1]
y <- sampling$qOutput
ddc <- cbind(y, x)
index <- sample(1:nrow(ddc), 0.8 * nrow(ddc))  # 80/20 train-test split
train <- ddc[index, ]
Test  <- ddc[-index, ]
y <- train$y
x <- train[, -1]
Linearsystems(y, x, 4, 15, Test)

Estimation Without test data, inverse models

y <- linearsystems$MKTcost
x <- select(linearsystems, -MKTcost)
Linearsystems(y, x, 5, 15)

mdpi: Sequential Computation of Dynamic Multidimensional Poverty Indices (MDPI)

The function, along with plot_mdpi, computes and visualises dynamic multidimensional poverty and all associated measures of multidimensional poverty.

The main selling points of this function are:

  1. Computations are made for between three and nine dimensions and can be grouped by factors such as region, sex, gender, marital status or any other suitable variable. Earlier algorithms provided for only three dimensions.

  2. In addition to the MDPI, six additional indices are computed.

  3. The computations can be done at national or sub-national levels or by suitable factors, as illustrated in the sketch after this list.
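
A minimal sketch of the Alkire-Foster index that mdpi generalises, assuming the standard definitions (MDPI = H × A); the function itself computes this and six further indices, with grouping.

af_mdpi <- function(scores, bar = 0.4) {
  # scores: weighted deprivation score per person; bar: poverty cut-off
  poor <- scores >= bar
  H <- mean(poor)          # incidence: headcount ratio of the poor
  A <- mean(scores[poor])  # intensity: average deprivation among the poor
  c(H = H, A = A, MDPI = H * A)
}
af_mdpi(c(0.2, 0.5, 0.7, 0.1, 0.45))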

The basic code usage is:

mdpi(
  data,
  dm,
  Bar = 0.4,
  id_addn = NULL,
  Factor = NULL,
  plots = NULL,
  id = c("Health", "Education", "Living standard"),
  id_add = "Social security",
  id_add1 = "Employment and Income")
  
With the arguments

`data` Data frame containing all the variables for the computation. Note that the variables to be used for the computation must be coded (0, 1).

`dm` A list of vectors of indicators making up each dimension to be computed.

`Bar` An optional cut-off used to divide the population into those in the poverty category and those who are not. Defaults to 0.4 if not supplied.

`id_addn` An optional vector of additional dimensions to be used for the computation, up to a maximum of four.

`Factor` An optional grouping factor for the computation, which must be a variable in the data. If not supplied, only the national MDPI is computed.

`plots` Plots of the various measures. For this to be possible, the number of levels of the Factor argument must be less than 41. The default is NULL; to produce the plots, any character string will overwrite the default.

`id` a vector of the first three dimensions used in the computation given as _Health, Education and Living standard_. Can be redefined but must match the indicators and cannot be NULL.

`id_add` a vector of the fourth dimension in the computation given as _Social security_. Can be re-defined but never NULL.

`id_add1` a vector of the fifth dimension in the computation given as _Employment and Income_. Can be re-defined but never NULL.

Examples

With three dimensions and factor

data <- mdpi1   # data from the MPI package
data1 <- mdpi2  # data from the mpitbR package

dm <- list(d1 = c("Child.Mortality", "Access.to.health.care"),
             d2 = c("Years.of.education", "School.attendance", "School.lag"),
             d3 = c("Cooking.Fuel", "Access.to.clean.source.of.water",
                    "Access.to.an.improve.sanatation", "Electricity",
                    "Housing.Materials", "Asset.ownership"))
  mdpi(data, dm, plots = "t", Factor = "Region")

With three dimensions, no factor

mdpi(data, dm, plots = "t")

With four dimensions and factor

dm <- list(d1 = c("Child.Mortality", "Access.to.health.care"),
             d2 = c("Years.of.education", "School.attendance", "School.lag"),
             d3 = c("Cooking.Fuel", "Electricity", "Housing.Materials",
                    "Asset.ownership"),
             d4 = c("Access.to.clean.source.of.water",
                    "Access.to.an.improve.sanatation"))

  mdpi(data, dm, id_add = c("Water and Sanitation"),
                plots = "t", Factor = "Region")

With five dimensions and factor

dm <- list(d1 = c("d_cm"),
             d2 = c("d_satt","d_educ"),
             d3 = c("d_elct", "d_hsg","d_ckfl","d_asst"),
             d4 = c("d_sani","d_wtr"),
             d5 = c("d_nutr"))

  mdpi(data1, dm, id_add = "Water and Sanitation", id_add1 = "Nutrition",
       plots = "t", Factor = "region")

With five dimensions, no plot

mdpi(data1, dm, id_add = "Water and Sanitation", id_add1 = "Nutrition",
       Factor = "region")

With five dimensions, no factor, no plot

mdpi(data1, dm, id_add = "Water and Sanitation", id_add1 = "Nutrition")

With six dimensions and factor

dm <- list(d1 = c("d_cm"),
             d2 = c("d_satt","d_educ"),
             d3 = c("d_elct", "d_ckfl"),
             d4 = c("d_sani","d_wtr"),
             d5 = c("d_nutr"),
             d6 = c("d_hsg","d_asst"))

  mdpi(data1, dm, id_add = "Water and Sanitation", id_add1 = "Nutrition",
                id_addn = "Housing and Assets", plots = "t", Factor = "region")

With seven dimensions and factor

dm <- list(d1 = c("d_cm"),
             d2 = c("d_satt","d_educ"),
             d3 = c("d_elct", "d_ckfl"),
             d4 = c("d_sani","d_wtr"),
             d5 = c("d_nutr"),
             d6 = c("d_hsg"),
             d7 = c("d_asst"))

  mdpi(data1, dm, id_add = "Water and Sanitation",
                id_add1 = "Nutrition", id_addn = c("Housing", "Assets"),
                plots = "t", Factor = "region")

With eight dimensions and factor

dm <- list(d1 = c("d_cm"),
             d2 = c("d_satt","d_educ"),
             d3 = c("d_elct", "d_ckfl"),
             d4 = c("d_sani"),
             d5 = c("d_nutr"),
             d6 = c("d_hsg"),
             d7 = c("d_asst"),
             d8 = c("d_wtr"))

  mdpi(data1, dm, id_add = "Sanitation",
                 id_add1 = "Nutrition",
                 id_addn = c("Housing", "Assets", "Water"),
                 plots = "t", Factor = "region")

With nine dimensions and factor

  dm <- list(d1 = c("d_cm"),
             d2 = c("d_satt","d_educ"),
             d3 = c("d_ckfl"),
             d4 = c("d_sani"),
             d5 = c("d_nutr"),
             d6 = c("d_hsg"),
             d7 = c("d_asst"),
             d8 = c("d_wtr"),
             d9 = c("d_elct"))

  mdpi(data1, dm, id_add = "Sanitation",
                 id_add1 = "Nutrition",
                 id_addn = c("Housing", "Assets", "Water", "Electricity"),
                 plots = "t", Factor = "region")

MLMetrics: Collection of Machine Learning Model Metrics for Easy Reference

This function estimates over 40 metrics for assessing the quality of machine learning models. The purpose is to provide a wrapper that brings all the metrics to one table and makes model selection easy.

MLMetrics(Observed, yvalue, modeli, K, Name, Form, kutuf, TTy)

with the arguments:

`Observed`  The Observed data in a data frame format

`yvalue`    The Response variable of the estimated Model

`modeli`    The Estimated Model (Model = a + bx)

`K` The number of variables in the estimated Model to consider

`Name`  The name of the model, which must be specified. The options are *ARIMA*; *Values*, if the model computes the fitted values without estimation, as with ensembles; *SMOOTH* (smooth.spline); *Logit*; *EssemWet*, for ensembles based on weight; *QUADRATIC* polynomial; and *SPLINE* polynomial.

`Form`  Form of the Model Estimated (LM, ALM, GLM, N-LM, ARDL)

`kutuf` Cutoff for the Estimated values (defaults to 0.5 if not specified)

`TTy`   Type of response variable (Numeric or Response - like binary)
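For a sense of what is collected, here is a hedged illustration of three accuracy measures in this family, assuming their conventional definitions; MLMetrics itself reports over 40 such measures.

rmse <- function(obs, fit) sqrt(mean((obs - fit)^2))           # root mean squared error
mae  <- function(obs, fit) mean(abs(obs - fit))                # mean absolute error
mape <- function(obs, fit) 100 * mean(abs((obs - fit) / obs))  # mean absolute percentage error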

Example

library(splines)
library(readr)
Model <- lm(states ~ bs(sequence, knots = c(30, 115)), data = Data)
suppressWarnings(MLMetrics(Observed = Data, yvalue = Data$states,
                           modeli = Model, K = 2, Name = "Linear",
                           Form = "LM", kutuf = 0, TTy = "Number"))

model_factors: Latent Factors Recovery from Variables Loadings

This function retrieves the latent factors as a data.frame, together with their variable loadings, which can be used as R objects for further analysis. The function can retrieve up to nine factors.

model_factors(data, DATA)

With the arguments:

`data`  An R object obtained from exploratory factor analysis (EFA) using the fa function in the psych package.

`DATA`  A data.frame of the raw data used to carry out the parallel analysis to obtain the data object.

Example

library(psych)
library(readr)
ddd <- Quicksummary
GGx <- c(paste0("x0", 1:9), paste0("x", 10:ncol(ddd)))
names(ddd) <- GGx
lll <- fa.parallel(ddd, fm = "minres", fa = "fa")
dat <- fa(ddd, nfactors = lll[["nfact"]], rotate = "varimax", fm = "minres")

model_factors(data = dat, DATA = ddd)

Percent: Attach Per Cent Sign to Data

This function is a wrapper for easy affixing of the per cent sign (%) to a value, a vector or a data frame of values. The basic usage is:

Percent(Data, Type, format = "f", ...)

With the arguments:

`Data`  The data to which the per cent sign is to be affixed. The data must be in raw form because, for the Frame option, the per cent value of each cell is calculated before the sign is affixed.

`Type`  The type of data. The options are Value, for a single numeric value, and Frame, for a numeric vector or data frame. In the case of a vector or data frame, the per cent value of each cell is calculated before the per cent sign is affixed.

`format`    The format of the output, which is handled internally; the default "f" gives formatted character output.

`...`   Additional arguments that may be passed to the function
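The idea can be sketched with base R formatting; this one-liner is illustrative, not the package's internals.

paste0(formatC(0.1501 * 100, format = "f", digits = 2), "%")  # "15.01%"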

Example

A data vector

Data <- c(1.2, 0.5, 0.103, 7, 0.1501)
Percent(Data = Data, Type = "Frame")
Data <- 1.2
Percent(Data = Data, Type = "Value")

Data frame

df <- data.frame(
  A = c(2320, 5760, 4800, 2600, 5700, 7800, 3000, 6300, 2400, 10000, 2220, 3740),
  B = c(0, 0, 1620, 3600, 1200, 1200, 1200, 4250, 14000, 10000, 1850, 1850),
  C = c(3000, 3000, 7800, 5400, 3900, 7800, 1950, 2400, 2400, 7000, 1850, 1850),
  D = c(2900, 5760, 3750, 5400, 4095, 3150, 2080, 7800, 1920, 1200, 5000, 1950),
  E = c(2900, 2030, 0, 5400, 5760, 1800, 2000, 1950, 1850, 3600, 5200, 5760),
  F = c(2800, 5760, 1820, 4340, 7500, 2400, 2300, 1680, 1850, 0, 2800, 8000),
  G = c(5760, 4600, 13000, 7800, 6270, 1200, 1440, 8000, 1200, 2025, 4800, 2600),
  H = c(2100, 5760, 8250, 3900, 1800, 1200, 4800, 1800, 7800, 2035, 8000, 3000))
Percent(Data = df, Type = "Frame")

quicksummary: Quick Formatted Summary of Machine Learning Data

This function summarises a dataset and outputs a formatted table. It provides very handy and user-friendly summaries. The basic usage is:

quicksummary(x, Type, Cut, Up, Down, ci = 0.95)

With the arguments:

`x` The data to be summarised. Only numeric data is allowed.

`Type`  The type of data to be summarised. There are two options: 1 = Continuous and 2 = Likert-type.

`Cut`   The cut-off point for Likert-type data.

`Up`    The upper Likert-type scale label, for example Agree, Constraint, etc., which will appear in the remark column.

`Down`  The lower Likert-type scale label, for example Disagree, Not a constraint, etc., which will appear in the remark column.

`ci`    Confidence interval, which defaults to 0.95.
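For intuition, here is a hedged sketch of how such a remark column is typically derived (assumed logic, not necessarily the package's exact rule): item means at or above Cut receive the Up label, all others the Down label.

means <- c(3.10, 2.20, 2.95)
ifelse(means >= 2.60, "Constraint", "Not a constraint")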

Examples

Likert-type data

Up <- "Constraint"
Down <- "Not a constraint"
quicksummary(x = Quicksummary, Type = 2, Cut = 2.60, Up = Up, Down = Down)

Continuous data

library(dplyr)
x <- select(linearsystems, 1:6)
quicksummary(x = x, Type = 1)

treatment_model: Enhanced Estimation of Treatment Effects of Binary Data from Randomized Experiments

Unlike the practice of determining only the treatment effects on the treated, or only the treatment effects on the untreated, this function is unique because five different average treatment effects are estimated simultaneously with a one-line call. The basic usage is:

treatment_model(Treatment, x_data)

With the arguments:

`Treatment` Vector of binary data (0 = control population, 1 = treated population) forming the LHS of the treatment-effects estimation.

`x_data`    Data frame of explanatory variables for the RHS of the estimation.
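As background, here is a minimal, hypothetical sketch of the propensity-score step that commonly underlies such estimators; the data and names are illustrative, and treatment_model itself goes further to report five average treatment effects.

set.seed(1)
toy <- data.frame(treat  = rbinom(50, 1, 0.5),
                  age    = rnorm(50, 40, 10),
                  income = rnorm(50, 3, 1))
ps <- glm(treat ~ age + income, data = toy, family = binomial)
toy$pscore <- fitted(ps)  # each unit's estimated probability of treatment
head(toy)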

Example

library(readr)
Treatment <- treatments$treatment
data <- treatments[, 2:3]
treatment_model(Treatment, data)