Title: | Visualizing Classification Results |
---|---|
Description: | Tools to visualize the results of a classification of cases. The graphical displays include stacked plots, silhouette plots, quasi residual plots, and class maps. Implements the techniques described and illustrated in Raymaekers, Rousseeuw and Hubert (2021), Class maps for visualizing classification results, Technometrics, appeared online. <doi:10.1080/00401706.2021.1927849> (open access) and Raymaekers and Rousseeuw (2021), Silhouettes and quasi residual plots for neural nets and tree-based classifiers, <arXiv:2106.08814>. Examples can be found in the vignettes: "Discriminant_analysis_examples","K_nearest_neighbors_examples", "Support_vector_machine_examples", "Rpart_examples", "Random_forest_examples", and "Neural_net_examples". |
Authors: | Jakob Raymaekers [aut, cre], Peter Rousseeuw [aut] |
Maintainer: | Jakob Raymaekers <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.3 |
Built: | 2024-11-16 04:39:13 UTC |
Source: | https://github.com/cran/classmap |
Draw the class map to visualize classification results, based on the output of one of the vcr.*.* functions in this package. The vertical axis of the class map shows each case's PAC, the conditional probability that it belongs to an alternative class. The farness on the horizontal axis is the probability of a member of the given class being at most as far from the class as the case itself.
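In formulas (a paraphrase of the two axes just described, writing fig(i, g) for the distance of case i from class g as in the vcr.*.* output):

PAC(i)     = P[ alternative class | case i ]
farness(i) = P[ fig(X, g) <= fig(i, g) ]   for a random member X of the given class g of case i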
classmap(vcrout, whichclass, classLabels = NULL, classCols = NULL,
         main = NULL, cutoff = 0.99, plotcutoff = TRUE, identify = FALSE,
         cex = 1, cex.main = 1.2, cex.lab = NULL, cex.axis = NULL,
         opacity = 1, squareplot = TRUE, maxprob = NULL, maxfactor = NULL)
vcrout | output of vcr.*.train or vcr.*.newdata.
whichclass | the number or level of the class to be displayed. Required.
classLabels | the labels (levels) of the classes. If NULL, they are taken from vcrout.
classCols | a list of colors for the class labels. There should be at least as many as there are levels. If NULL, default colors are used.
main | title for the plot.
cutoff | cases with overall farness ofarness > cutoff are flagged as outliers.
plotcutoff | if TRUE, plots the cutoff on the farness values as a vertical line.
identify | if TRUE, left-click on a point to get its number, then press ESC to exit.
cex | passed on to plot.
cex.main | same, for title.
cex.lab | same, for labels on horizontal and vertical axes.
cex.axis | same, for axes.
opacity | determines opacity of plotted dots. Value between 0 and 1, where 0 is transparent and 1 is opaque.
squareplot | if TRUE, the horizontal and vertical axes of the plot are given the same length.
maxprob | draws the farness axis at least up to probability maxprob. If NULL, the limit is determined automatically.
maxfactor | if not NULL, a factor slightly larger than 1 to create some extra room at the right hand side of the plot, e.g. for marking points.
Executing the function plots the class map and returns
coordinates | a matrix with 2 columns containing the coordinates of the plotted points. The first coordinate is the quantile of the farness probability. This makes it easier to add text next to interesting points. If identify = TRUE, the attribute ids of coordinates contains the row numbers of the identified points.
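The returned coordinates make it easy to annotate points of interest. A minimal sketch (assuming, as described above, that the rows of coordinates correspond to the plotted cases of the displayed class, in order):

vcrout <- vcr.da.train(iris[, 1:4], iris[, 5])
coords <- classmap(vcrout, "versicolor", classCols = 2:4)
j <- which.max(vcrout$PAC[iris[, 5] == "versicolor"])  # most doubtful versicolor case
text(coords[j, 1], coords[j, 2], labels = "high PAC", pos = 2, cex = 0.8)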
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. arXiv:2106.08814 (open access).
vcr.da.train, vcr.da.newdata, vcr.knn.train, vcr.knn.newdata, vcr.svm.train, vcr.svm.newdata, vcr.rpart.train, vcr.rpart.newdata, vcr.forest.train, vcr.forest.newdata, vcr.neural.train, vcr.neural.newdata
vcrout <- vcr.da.train(iris[, 1:4], iris[, 5])
classmap(vcrout, "setosa", classCols = 2:4)     # tight class
classmap(vcrout, "versicolor", classCols = 2:4) # less tight
# The cases misclassified as virginica are shown in blue.
classmap(vcrout, "virginica", classCols = 2:4)
# The case misclassified as versicolor is shown in green.
# For more examples, we refer to the vignettes:
## Not run:
vignette("Discriminant_analysis_examples")
vignette("K_nearest_neighbors_examples")
vignette("Support_vector_machine_examples")
vignette("Rpart_examples")
vignette("Random_forest_examples")
vignette("Neural_net_examples")
## End(Not run)
Build a confusion matrix from the output of a function vcr.*.*. Optionally, a separate column for outliers can be added to the confusion matrix.
confmat.vcr(vcrout, cutoff = 0.99, showClassNumbers = FALSE, showOutliers = TRUE, silent = FALSE)
vcrout | output of vcr.*.train or vcr.*.newdata.
cutoff | cases with overall farness ofarness > cutoff are flagged as outliers.
showClassNumbers | if TRUE, the row and column names are the number of each level instead of the level itself. Useful for long level names.
showOutliers | if TRUE and some points were flagged as outliers, an extra column is added on the right of the confusion matrix for these outliers.
silent | if FALSE, the confusion matrix and accuracy are shown on the screen.
A confusion matrix.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
vcr.da.train, vcr.da.newdata, vcr.knn.train, vcr.knn.newdata, vcr.svm.train, vcr.svm.newdata, vcr.rpart.train, vcr.rpart.newdata, vcr.forest.train, vcr.forest.newdata, vcr.neural.train, vcr.neural.newdata
vcrout <- vcr.knn.train(scale(iris[, 1:4]), iris[, 5], k = 5)
# The usual confusion matrix:
confmat.vcr(vcrout, showOutliers = FALSE)
# Cases with ofarness > cutoff are flagged as outliers:
confmat.vcr(vcrout, cutoff = 0.98)
# With the default cutoff = 0.99 only one case is flagged here:
confmat.vcr(vcrout)
# Note that the accuracy is computed before any cases
# are flagged, so it does not depend on the cutoff.
confmat.vcr(vcrout, showClassNumbers = TRUE)
# Shows class numbers instead of labels. This option can
# be useful for long level names.
# For more examples, we refer to the vignettes:
## Not run:
vignette("Discriminant_analysis_examples")
vignette("K_nearest_neighbors_examples")
vignette("Support_vector_machine_examples")
vignette("Rpart_examples")
vignette("Random_forest_examples")
vignette("Neural_net_examples")
## End(Not run)
This is a subset of the data used in the paper, which was assembled by Prettenhofer and Stein (2010). It contains 1000 reviews of books on Amazon, of which 500 were selected from the original training data and 500 from the test data.
The full dataset has been used for a variety of things, including classification using svm. The subset was chosen small enough to keep the computation time low, while still containing the examples in the paper.
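A hedged sketch of how such text can enter a kernel-based classifier, via a spectrum string kernel from the kernlab package (the actual workflow is in the Support_vector_machine_examples vignette; the kernel type and substring length here are illustrative):

library(kernlab)
data(data_bookReviews)
# Spectrum kernel on substrings of length 7; only 20 reviews to keep this fast.
sk <- stringdot(type = "spectrum", length = 7)
kmat <- kernelMatrix(sk, data_bookReviews$review[1:20])
dim(kmat)  # 20 by 20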
data("data_bookReviews")
data("data_bookReviews")
A data frame with 1000 observations on the following 2 variables.
review | the review in text format (character)
sentiment | factor indicating the sentiment of the review: negative (1) or positive (2)
Prettenhofer, P., Stein, B. (2010). Cross-language text classification using structural correspondence learning. Proceedings of the 48th annual meeting of the association for computational linguistics, 1118-1127.
data(data_bookReviews)
# Example review:
data_bookReviews[5, 1]
# The data are used in:
## Not run:
vignette("Support_vector_machine_examples")
## End(Not run)
This data on floral pear bud detection was first described by Wouters et al. (2015). The goal is to classify the instances into buds, branches, scales and support. The numeric vectors resulted from a multispectral vision sensor and describe the scanned images.
data("data_floralbuds")
data("data_floralbuds")
A data frame with 550 observations on the following 7 variables.
X1 | numeric vector
X2 | numeric vector
X3 | numeric vector
X4 | numeric vector
X5 | numeric vector
X6 | numeric vector
y | a factor with levels branch, bud, scales, support
Wouters, N., De Ketelaere, B., Deckers, T., De Baerdemaeker, J., Saeys, W. (2015). Multispectral detection of floral buds for automated thinning of pear. Comput. Electron. Agric. 113, C, 93–103. <doi:10.1016/j.compag.2015.01.015>
data("data_floralbuds") str(data_floralbuds) summary(data_floralbuds) # The data are used in: ## Not run: vignette("Discriminant_analysis_examples") vignette("Neural_net_examples") ## End(Not run)
data("data_floralbuds") str(data_floralbuds) summary(data_floralbuds) # The data are used in: ## Not run: vignette("Discriminant_analysis_examples") vignette("Neural_net_examples") ## End(Not run)
This dataset contains information on fake (spam) accounts on Instagram. The original source is https://www.kaggle.com/free4ever1/instagram-fake-spammer-genuine-accounts by Bardiya Bakhshandeh.
The data contains information on 696 Instagram accounts. For each account, 11 variables were recorded describing its characteristics. The goal is to detect fake Instagram accounts, which are used for spamming.
data("data_instagram")
data("data_instagram")
A data frame with 696 observations on the following variables.
binary, indicates whether profile has picture.
ratio of number of numerical chars in username to its length.
number of words in full name.
ratio of number of numerical characters in full name to its length.
binary, indicates whether the name and username of the profile are the same.
length of the description/biography of the profile (in number of characters).
binary, indicates whether profile has external url.
binary, indicates whether profile is private or not.
number of posts made by profile.
number of followers.
number of follows.
whether profile is fake or not.
vector taking the values “train” or “test” indicating whether the observation belongs to the training or the test data.
https://www.kaggle.com/free4ever1/instagram-fake-spammer-genuine-accounts
data(data_instagram)
str(data_instagram)
# The data are used in:
## Not run:
vignette("Random_forest_examples")
## End(Not run)
This dataset contains information on 1309 passengers of the RMS Titanic. The goal is to predict survival based on 11 characteristics such as the travel class, age and sex of the passengers.
The original data source is https://www.kaggle.com/c/titanic/data
The data are split into a training set of 891 observations and a test set of 418 observations. The response in the test set was obtained by combining information from other data files, and has been verified by submitting it as a 'prediction' to Kaggle and getting perfect marks.
data("data_titanic")
data("data_titanic")
A data frame with 1309 observations on the following variables.
PassengerId | a unique identifier for each passenger.
Pclass | travel class of the passenger.
Name | name of the passenger.
Sex | sex of the passenger.
Age | age of the passenger.
SibSp | number of siblings and spouses traveling with the passenger.
Parch | number of parents and children traveling with the passenger.
Ticket | ticket number of the passenger.
Fare | fare paid for the ticket.
Cabin | cabin number of the passenger.
Embarked | port of embarkation. Takes the values C (Cherbourg), Q (Queenstown) and S (Southampton).
y | factor indicating casualty or survivor.
dataType | vector taking the values "train" or "test" indicating whether the observation belongs to the training or the test data.
https://www.kaggle.com/c/titanic/data
data("data_titanic") traindata <- data_titanic[which(data_titanic$dataType == "train"), -13] testdata <- data_titanic[which(data_titanic$dataType == "test"), -13] str(traindata) table(traindata$y) # The data are used in: ## Not run: vignette("Rpart_examples") ## End(Not run)
data("data_titanic") traindata <- data_titanic[which(data_titanic$dataType == "train"), -13] testdata <- data_titanic[which(data_titanic$dataType == "test"), -13] str(traindata) table(traindata$y) # The data are used in: ## Not run: vignette("Rpart_examples") ## End(Not run)
Constructs feature vectors from a kernel matrix.
makeFV(kmat, transfmat = NULL, precS = 1e-12)
kmat | a kernel matrix. If transfmat is NULL, kmat should be the square kernel matrix of the training data. If transfmat is not NULL, kmat is the rectangular kernel matrix of a test set, as described in the details below.
transfmat | transformation matrix. If not NULL, it has to be the transfmat component of the output of a previous run of makeFV on the training data. It is then applied to the kernel matrix kmat of a test set.
precS | if not NA, eigenvalues of kmat below precS are set equal to precS.
If transfmat is non-NULL, we are dealing with a test set. Denote the number of cases in the test set by m and the number of cases in the training set by n. Each row of kmat of the test set then must contain the kernel values of a new case with all cases in the training set. Therefore the kernel matrix kmat must have dimensions m by n. The matrix kmat can e.g. be produced by makeKernel. It can also be obtained by running kernlab::kernelMatrix on the union of the training set and the test set, yielding an (n+m) by (n+m) matrix, from which one then takes the m by n submatrix with the kernel values between the test cases and the training cases.
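A hedged sketch of this test-set workflow, with illustrative names Xtrain (n cases) and Xtest (m cases) and an svm fit as in the example below:

Ktrain <- makeKernel(Xtrain, svfit = svmfit)             # n by n
outFV  <- makeFV(Ktrain)                                 # feature vectors + transfmat
Ktest  <- makeKernel(Xtest, Xtrain, svfit = svmfit)      # m by n
Zf     <- makeFV(Ktest, transfmat = outFV$transfmat)$Xf  # feature vectors of the test set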
A list with components:
Xf | the feature vectors. When makeFV is applied to the training set, Xf contains the feature vectors of the training data, which can then be used as a data matrix in other functions. When makeFV is applied to a test set, Xf contains the feature vectors of the test data. The rows of Xf correspond to the rows of kmat.
transfmat | square matrix for transforming kmat to Xf, to be used for new data.
Raymaekers J., Rousseeuw P.J., Hubert M.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
library(e1071)
set.seed(1)
X <- matrix(rnorm(200 * 2), ncol = 2)
X[1:100, ] <- X[1:100, ] + 2
X[101:150, ] <- X[101:150, ] - 2
y <- as.factor(c(rep("blue", 150), rep("red", 50)))
cols <- c("deepskyblue3", "red")
plot(X, col = cols[as.numeric(y)], pch = 19)
# We now fit an SVM with radial basis kernel to the data:
svmfit <- svm(y~., data = data.frame(X = X, y = y), scale = FALSE,
              kernel = "radial", cost = 10, gamma = 1, probability = TRUE)
Kxx <- makeKernel(X, svfit = svmfit)
outFV <- makeFV(Kxx)
Xf <- outFV$Xf # The data matrix in this feature space.
dim(Xf) # The feature vectors are high dimensional.
# The inner products of Xf match the kernel matrix:
max(abs(as.vector(Kxx - crossprod(t(Xf), t(Xf))))) # 3.005374e-13 # tiny, OK
range(rowSums(Xf^2)) # all points in Xf lie on the unit sphere.
pairs(Xf[, 1:5], col = cols[as.numeric(y)])
# In some of these we see spherical effects, e.g.
plot(Xf[, 1], Xf[, 5], col = cols[as.numeric(y)], pch = 19)
# The data look more separable here than in the original
# two-dimensional space.
# For more examples, we refer to the vignette:
## Not run:
vignette("Support_vector_machine_examples")
## End(Not run)
Computes a kernel value or kernel matrix, where the kernel type is extracted from an svm trained by e1071::svm.
makeKernel(X1, X2 = NULL, svfit)
X1 | first matrix (or vector) of coordinates.
X2 | if not NULL, a second matrix or vector of coordinates. If NULL, X2 is set equal to X1.
svfit | output from e1071::svm, from which the kernel type and its parameters are extracted.
The kernel matrix, of dimensions nrow(X1) by nrow(X2). When both X1 and X2 are vectors, the result is a single number.
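For instance, the rectangular kernel matrix between new cases and the training cases is obtained by passing both arguments (a minimal sketch; Xnew is an illustrative name):

Kxz <- makeKernel(Xnew, X, svfit = svmfit)  # nrow(Xnew) by nrow(X)
makeKernel(X[1, ], X[2, ], svfit = svmfit)  # two vectors: a single number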
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
library(e1071)
set.seed(1)
X <- matrix(rnorm(200 * 2), ncol = 2)
X[1:100, ] <- X[1:100, ] + 2
X[101:150, ] <- X[101:150, ] - 2
y <- as.factor(c(rep("blue", 150), rep("red", 50))) # two classes
# We now fit an SVM with radial basis kernel to the data:
set.seed(1) # to make the result of svm() reproducible.
svmfit <- svm(y~., data = data.frame(X = X, y = y), scale = FALSE,
              kernel = "radial", cost = 10, gamma = 1, probability = TRUE)
Kxx <- makeKernel(X, svfit = svmfit)
# The result is a square kernel matrix:
dim(Kxx) # 200 200
Kxx[1:5, 1:5]
# For more examples, we refer to the vignette:
## Not run:
vignette("Support_vector_machine_examples")
## End(Not run)
Draw a quasi residual plot to visualize classification results. The vertical axis of the quasi residual plot shows each case's probability of alternative class (PAC). The horizontal axis shows the feature given as the second argument in the function call.
qresplot(PAC, feat, xlab = NULL, xlim = NULL, main = NULL,
         identify = FALSE, gray = TRUE, opacity = 1, squareplot = FALSE,
         plotLoess = FALSE, plotErrorBars = FALSE, plotQuantiles = FALSE,
         grid = NULL, probs = c(0.5, 0.75), cols = NULL, fac = 1,
         cex = 1, cex.main = 1.2, cex.lab = 1, cex.axis = 1, pch = 19)
PAC | vector with the PAC values of a classification, typically the PAC component of the output of a vcr.*.* function.
feat | the PAC will be plotted versus this data feature. Note that feat does not have to be one of the explanatory variables of the model. It can be another variable, a combination of variables (like a sum or a principal component score, as in the sketch below the arguments), the row number of the cases if they were recorded successively, etc.
xlab | label for the horizontal axis, i.e. the name of the variable feat.
xlim | limits for the horizontal axis. If NULL, the range of feat is used.
main | title for the plot.
identify | if TRUE, left-click on a point to get its number, then press ESC to exit.
gray | logical; if TRUE, the region of the plot with PAC below 0.5 gets a gray background.
opacity | determines opacity of plotted dots. Value between 0 and 1, where 0 is transparent and 1 is opaque.
squareplot | if TRUE, the horizontal and vertical axes of the plot are given the same length.
plotLoess | if TRUE, a loess curve of PAC versus feat is superimposed on the plot. At most one of plotLoess, plotErrorBars and plotQuantiles can be selected.
plotErrorBars | if TRUE, the average PAC and its standard error are computed on the intervals of a grid (see grid) and drawn as error bars.
plotQuantiles | if TRUE, quantiles of the PAC are computed on the intervals of a grid (see grid) and plotted as curves.
grid | only used when plotErrorBars or plotQuantiles is selected: a vector of increasing feat values determining the intervals. If NULL, a default grid based on quantiles of feat is used.
probs | only used when plotQuantiles is selected: the probabilities determining the quantiles. Defaults to c(0.5, 0.75).
cols | only used when plotQuantiles is selected. A vector with the colors of the quantile curves. If NULL, default colors are used.
fac | only used when plotLoess, plotErrorBars or plotQuantiles is selected: a factor affecting the degree of smoothing.
cex | passed on to plot.
cex.main | same, for title.
cex.lab | same, for labels on horizontal and vertical axes.
cex.axis | same, for axes.
pch | plot character for the points, defaults to 19.
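As an illustration of the flexibility of feat mentioned above, the PAC can for instance be plotted against a principal component score (a minimal sketch using the floral buds data from this package):

data(data_floralbuds)
vcrout <- vcr.da.train(data_floralbuds[, 1:6], data_floralbuds[, 7])
pc1 <- prcomp(data_floralbuds[, 1:6])$x[, 1]  # feat need not be a model variable
qresplot(vcrout$PAC, pc1, xlab = "first principal component score", plotLoess = TRUE)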
coordinates | a matrix with 2 columns containing the coordinates of the plotted points. This makes it easier to add text next to interesting points. If identify = TRUE, the attribute ids of coordinates contains the row numbers of the identified points.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. arXiv:2106.08814 (open access).
library(rpart)
data("data_titanic")
traindata <- data_titanic[which(data_titanic$dataType == "train"), -13]
set.seed(123) # rpart is not deterministic
rpart.out <- rpart(y ~ Pclass + Sex + SibSp + Parch + Fare + Embarked,
                   data = traindata, method = "class", model = TRUE)
mytype <- list(nominal = c("Name", "Sex", "Ticket", "Cabin", "Embarked"),
               ordratio = c("Pclass"))
x_train <- traindata[, -12]
y_train <- traindata[, 12]
vcrtrain <- vcr.rpart.train(x_train, y_train, rpart.out, mytype)
# Quasi residual plot versus age, for males only:
PAC <- vcrtrain$PAC[which(x_train$Sex == "male")]
feat <- x_train$Age[which(x_train$Sex == "male")]
qresplot(PAC, feat, xlab = "Age (years)", opacity = 0.5,
         main = "quasi residual plot for male passengers", plotLoess = TRUE)
text(x = 14, y = 0.60, "loess curve", col = "red", cex = 1)
Draw the silhouette plot to visualize classification results, based on the output of one of the vcr.*.* functions in this package. The horizontal axis of the silhouette plot shows each case's silhouette width s(i).
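A hedged sketch of the relation between s(i) and the PAC, assuming the cited paper's definition s(i) = 1 - 2*PAC(i) (so that s(i) lies in [-1, 1] and is positive when the given class fits the case better than the best alternative):

vcrout <- vcr.da.train(iris[, 1:4], iris[, 5])
s <- 1 - 2 * vcrout$PAC  # assumed definition of the silhouette width
summary(s)               # values in [-1, 1]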
silplot(vcrout, classLabels = NULL, classCols = NULL, showLegend = TRUE,
        showClassNumbers = FALSE, showCases = FALSE, drawLineAtAverage = FALSE,
        topdown = TRUE, main = NULL, summary = TRUE)
vcrout | output of vcr.*.train or vcr.*.newdata.
classLabels | the labels (levels) of the classes. If NULL, they are taken from vcrout.
classCols | a list of colors for the classes. There should be at least as many as there are levels. If NULL, default colors are used.
showLegend | if TRUE, a legend is shown to the right of the plot.
showClassNumbers | if TRUE, the legend shows the class numbers instead of the class labels.
showCases | if TRUE, the plot shows the numbers of the individual cases. These are only legible when the number of cases is relatively small.
topdown | if TRUE (the default), the silhouettes are plotted from top to bottom. Otherwise they are plotted from left to right.
drawLineAtAverage | if TRUE, draws a line at the average silhouette width.
main | title for the plot. If NULL, a default title is used.
summary | if TRUE, a summary table with the average silhouette width per class is printed.
A ggplot object containing the silhouette plot.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. arXiv:2106.08814 (open access).
vcr.da.train, vcr.da.newdata, vcr.knn.train, vcr.knn.newdata, vcr.svm.train, vcr.svm.newdata, vcr.rpart.train, vcr.rpart.newdata, vcr.forest.train, vcr.forest.newdata, vcr.neural.train, vcr.neural.newdata
vcrout <- vcr.da.train(iris[, 1:4], iris[, 5])
silplot(vcrout)
# For more examples, we refer to the vignettes:
## Not run:
vignette("Discriminant_analysis_examples")
vignette("K_nearest_neighbors_examples")
vignette("Support_vector_machine_examples")
vignette("Rpart_examples")
vignette("Random_forest_examples")
vignette("Neural_net_examples")
## End(Not run)
Make a vertically stacked mosaic plot of class predictions from the output of vcr.*.train or vcr.*.newdata. Optionally, the outliers for each class can be shown as a gray rectangle at the top.
stackedplot(vcrout, cutoff = 0.99, classCols = NULL, classLabels = NULL,
            separSize = 1, minSize = 1.5, showOutliers = TRUE,
            showLegend = FALSE, main = NULL, htitle = NULL, vtitle = NULL)
vcrout | output of vcr.*.train or vcr.*.newdata.
cutoff | cases with overall farness ofarness > cutoff are flagged as outliers.
classCols | user-specified colors for the classes. If NULL, default colors are used.
classLabels | names of the given labels. If NULL, they are taken from vcrout.
separSize | how much white space to leave between rectangles.
minSize | rectangles describing less than minSize percent of the data are shown as minSize percent, so they remain visible.
showOutliers | if TRUE, the outliers of each class are shown as a gray rectangle at the top.
showLegend | if TRUE, a legend is shown to the right of the plot. The default is FALSE, since the color of each class is already visible at the bottom of its stack.
main | title for the plot.
htitle | title for the horizontal axis (given labels). If NULL, a default title is shown.
vtitle | title for the vertical axis (predicted labels). If NULL, a default title is shown.
A ggplot object.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
vcr.da.train, vcr.da.newdata, vcr.knn.train, vcr.knn.newdata, vcr.svm.train, vcr.svm.newdata, vcr.rpart.train, vcr.rpart.newdata, vcr.forest.train, vcr.forest.newdata, vcr.neural.train, vcr.neural.newdata
data("data_floralbuds") X <- data_floralbuds[, 1:6]; y <- data_floralbuds[, 7] vcrout <- vcr.da.train(X, y) cols <- c("saddlebrown", "orange", "olivedrab4", "royalblue3") stackedplot(vcrout, classCols = cols, showLegend = TRUE) # The legend is not really needed, since we can read the # color of a class from the bottom of its vertical bar: stackedplot(vcrout, classCols = cols, main = "Stacked plot of QDA on foral buds data") # If we do not wish to show outliers: stackedplot(vcrout, classCols = cols, showOutliers = FALSE) # For more examples, we refer to the vignettes: ## Not run: vignette("Discriminant_analysis_examples") vignette("K_nearest_neighbors_examples") vignette("Support_vector_machine_examples") vignette("Rpart_examples") vignette("Random_forest_examples") vignette("Neural_net_examples") ## End(Not run)
data("data_floralbuds") X <- data_floralbuds[, 1:6]; y <- data_floralbuds[, 7] vcrout <- vcr.da.train(X, y) cols <- c("saddlebrown", "orange", "olivedrab4", "royalblue3") stackedplot(vcrout, classCols = cols, showLegend = TRUE) # The legend is not really needed, since we can read the # color of a class from the bottom of its vertical bar: stackedplot(vcrout, classCols = cols, main = "Stacked plot of QDA on foral buds data") # If we do not wish to show outliers: stackedplot(vcrout, classCols = cols, showOutliers = FALSE) # For more examples, we refer to the vignettes: ## Not run: vignette("Discriminant_analysis_examples") vignette("K_nearest_neighbors_examples") vignette("Support_vector_machine_examples") vignette("Rpart_examples") vignette("Random_forest_examples") vignette("Neural_net_examples") ## End(Not run)
Predicts class labels for new data by discriminant analysis, using the output of vcr.da.train on the training data. For new data cases whose label in ynew is non-missing, additional output is produced for constructing graphical displays such as the classmap.
vcr.da.newdata(Xnew, ynew = NULL, vcr.da.train.out)
Xnew | data matrix of the new data, with the same number of columns as in the training data. Missing values are not allowed.
ynew | factor with class membership of each new case. Can be NA for some or all cases.
vcr.da.train.out | output of vcr.da.train on the training data.
A list with components:
yintnew | number of the given class of each case. Can contain NA's.
ynew | given class label of each case. Can contain NA's.
levels | levels of the response, from vcr.da.train.out.
predint | predicted class number of each case. Always exists.
pred | predicted label of each case.
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose ynew is missing.
altlab | label of the alternative class. Is NA for cases whose ynew is missing.
PAC | probability of the alternative class. Is NA for cases whose ynew is missing.
fig | distance of each case i from each class g. Always exists.
farness | farness of each case from its given class. Is NA for cases whose ynew is missing.
ofarness | for each case i, its lowest fig[i, g] over all classes g, so it can also be used to flag outliers. Always exists.
classMS | list with center and covariance matrix of each class, from vcr.da.train.out.
lCurrent | log of mixture density of each case in its given class. Is NA for cases whose ynew is missing.
lPred | log of mixture density of each case in its predicted class. Always exists.
lAlt | log of mixture density of each case in its alternative class. Is NA for cases whose ynew is missing.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
vcr.da.train, classmap, silplot, stackedplot
vcr.train <- vcr.da.train(iris[, 1:4], iris[, 5])
inds <- c(51:150) # a subset, containing only 2 classes
iris2 <- iris[inds, ] # fake "new" data
iris2[c(1:10, 51:60), 5] <- NA
vcr.test <- vcr.da.newdata(iris2[, 1:4], iris2[, 5], vcr.train)
vcr.test$PAC[1:25] # between 0 and 1. Is NA where the response is NA.
plot(vcr.test$PAC, vcr.train$PAC[inds]); abline(0, 1) # match
plot(vcr.test$farness, vcr.train$farness[inds]); abline(0, 1) # match
confmat.vcr(vcr.train) # for comparison
confmat.vcr(vcr.test)
stackedplot(vcr.train) # for comparison
stackedplot(vcr.test)
classmap(vcr.train, "versicolor", classCols = 2:4) # for comparison
classmap(vcr.test, "versicolor", classCols = 2:4) # has fewer points
# For more examples, we refer to the vignette:
## Not run:
vignette("Discriminant_analysis_examples")
## End(Not run)
Custom DA function which prepares for graphical displays such as the classmap. The discriminant analysis itself is carried out by the maximum a posteriori (MAP) rule, which maximizes the density of the mixture.
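To make the rule concrete, here is a minimal sketch of the MAP rule with Gaussian class models and empirical priors (not the package's internal code; it assumes the mvtnorm package is available):

library(mvtnorm)
X <- as.matrix(iris[, 1:4]); y <- iris[, 5]
dens <- sapply(levels(y), function(g) {  # prior times Gaussian density, per class
  Xg <- X[y == g, , drop = FALSE]
  mean(y == g) * dmvnorm(X, colMeans(Xg), cov(Xg))
})
predMAP <- levels(y)[max.col(dens)]      # class maximizing the mixture term
mean(predMAP == y)                       # training accuracy of this MAP rule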
vcr.da.train(X, y, rule = "QDA", estmethod = "meancov")
X | a numerical matrix containing the predictors in its columns. Missing values are not allowed.
y | a factor with the given class labels.
rule | either "QDA" for quadratic discriminant analysis or "LDA" for linear discriminant analysis.
estmethod | function for location and covariance estimation. Should return a list with the center $m and the covariance matrix $S. The default "meancov" uses the classical mean and covariance matrix.
A list with components:
yint | number of the given class of each case. Can contain NA's.
y | given class label of each case. Can contain NA's.
levels | levels of y.
predint | predicted class number of each case. For each case this is the class with the highest posterior probability. Always exists.
pred | predicted label of each case.
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose y is missing.
altlab | label of the alternative class. Is NA for cases whose y is missing.
PAC | probability of the alternative class. Is NA for cases whose y is missing.
figparams | parameters used for computing fig, can be used for new data.
fig | distance of each case i from each class g. Always exists.
farness | farness of each case from its given class. Is NA for cases whose y is missing.
ofarness | for each case i, its lowest fig[i, g] over all classes g, so it can also be used to flag outliers. Always exists.
classMS | list with center and covariance matrix of each class.
lCurrent | log of mixture density of each case in its given class. Is NA for cases whose y is missing.
lPred | log of mixture density of each case in its predicted class. Always exists.
lAlt | log of mixture density of each case in its alternative class. Is NA for cases whose y is missing.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
vcr.da.newdata, classmap, silplot, stackedplot
data("data_floralbuds") X <- data_floralbuds[, 1:6]; y <- data_floralbuds[, 7] vcrout <- vcr.da.train(X, y, rule = "QDA") # For linear discriminant analysis, put rule = "LDA". confmat.vcr(vcrout) # There are a few outliers cols <- c("saddlebrown", "orange", "olivedrab4", "royalblue3") stackedplot(vcrout, classCols = cols) classmap(vcrout, "bud", classCols = cols) # For more examples, we refer to the vignette: ## Not run: vignette("Discriminant_analysis_examples") ## End(Not run)
data("data_floralbuds") X <- data_floralbuds[, 1:6]; y <- data_floralbuds[, 7] vcrout <- vcr.da.train(X, y, rule = "QDA") # For linear discriminant analysis, put rule = "LDA". confmat.vcr(vcrout) # There are a few outliers cols <- c("saddlebrown", "orange", "olivedrab4", "royalblue3") stackedplot(vcrout, classCols = cols) classmap(vcrout, "bud", classCols = cols) # For more examples, we refer to the vignette: ## Not run: vignette("Discriminant_analysis_examples") ## End(Not run)
Produces output for the purpose of constructing graphical displays such as the classmap on new data. Requires the output of vcr.forest.train as an argument.
vcr.forest.newdata(Xnew, ynew = NULL, vcr.forest.train.out, LOO = FALSE)
Xnew | data matrix of the new data, with the same number of columns as in the training data.
ynew | factor with class membership of each new case. Can be NA for some or all cases.
vcr.forest.train.out | output of vcr.forest.train on the training data.
LOO | leave one out. Only used when testing this function on a subset of the training data. Default is LOO = FALSE.
A list with components:
yintnew | number of the given class of each case. Can contain NA's.
ynew | given class label of each case. Can contain NA's.
levels | levels of the response, from vcr.forest.train.out.
predint | predicted class number of each case. Always exists.
pred | predicted label of each case.
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose ynew is missing.
altlab | label of the alternative class when ynew was given, else NA.
PAC | probability of the alternative class. Is NA for cases whose ynew is missing.
fig | distance of each case i from each class g. Always exists.
farness | farness of each case from its given class. Is NA for cases whose ynew is missing.
ofarness | for each case i, its lowest fig[i, g] over all classes g, so it can also be used to flag outliers. Always exists.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. arXiv:2106.08814 (open access).
vcr.forest.train, classmap, silplot, stackedplot
library(randomForest)
data("data_instagram")
traindata <- data_instagram[which(data_instagram$dataType == "train"), -13]
set.seed(71) # randomForest is not deterministic
rfout <- randomForest(y ~ ., data = traindata, keep.forest = TRUE)
mytype <- list(symm = c(1, 5, 7, 8)) # These 4 columns are
# (symmetric) binary variables. The variables that are not
# listed are interval-scaled by default.
x_train <- traindata[, -12]
y_train <- traindata[, 12]
vcrtrain <- vcr.forest.train(X = x_train, y = y_train,
                             trainfit = rfout, type = mytype)
testdata <- data_instagram[which(data_instagram$dataType == "test"), -13]
Xnew <- testdata[, -12]
ynew <- testdata[, 12]
vcrtest <- vcr.forest.newdata(Xnew, ynew, vcrtrain)
confmat.vcr(vcrtest)
stackedplot(vcrtest, classCols = c(4, 2))
silplot(vcrtest, classCols = c(4, 2))
classmap(vcrtest, "genuine", classCols = c(4, 2))
classmap(vcrtest, "fake", classCols = c(4, 2))
# For more examples, we refer to the vignette:
## Not run:
vignette("Random_forest_examples")
## End(Not run)
Produces output for the purpose of constructing graphical displays such as the classmap and silplot. The user first needs to train a random forest on the data by randomForest::randomForest. This then serves as an argument to vcr.forest.train.
vcr.forest.train(X, y, trainfit, type = list(), k = 5, stand = TRUE)
X | a rectangular matrix or data frame, where the columns (variables) may be of mixed type.
y | factor with the given class labels. It is crucial that X and y correspond to the data on which the random forest was trained.
trainfit | the output of a randomForest::randomForest training run on these data.
k | the number of nearest neighbors used in the farness computation.
type | list for specifying some (or all) of the types of the variables (columns) in X. Possible components include "nominal", "ordratio" and "symm" (symmetric binary); variables that are not listed are interval-scaled by default (see the examples).
stand | whether or not to standardize numerical (interval-scaled) variables by their range in the farness computation. Defaults to TRUE.
A list with components:
X | the data used to train the forest.
yint | number of the given class of each case. Can contain NA's.
y | given class label of each case. Can contain NA's.
levels | levels of y.
predint | predicted class number of each case. For each case this is the class with the highest posterior probability. Always exists.
pred | predicted label of each case.
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose y is missing.
altlab | label of the alternative class. Is NA for cases whose y is missing.
PAC | probability of the alternative class. Is NA for cases whose y is missing.
figparams | parameters used for computing fig, can be used for new data.
fig | distance of each case i from each class g. Always exists.
farness | farness of each case from its given class. Is NA for cases whose y is missing.
ofarness | for each case i, its lowest fig[i, g] over all classes g, so it can also be used to flag outliers. Always exists.
trainfit | the trained random forest which was given as an input to this function.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. arXiv:2106.08814 (open access).
vcr.forest.newdata, classmap, silplot, stackedplot
library(randomForest)
data("data_instagram")
traindata <- data_instagram[which(data_instagram$dataType == "train"), -13]
set.seed(71) # randomForest is not deterministic
rfout <- randomForest(y ~ ., data = traindata, keep.forest = TRUE)
mytype <- list(symm = c(1, 5, 7, 8)) # These 4 columns are
# (symmetric) binary variables. The variables that are not
# listed are interval-scaled by default.
x_train <- traindata[, -12]
y_train <- traindata[, 12]
# Prepare for visualization:
vcrtrain <- vcr.forest.train(X = x_train, y = y_train,
                             trainfit = rfout, type = mytype)
confmat.vcr(vcrtrain)
stackedplot(vcrtrain, classCols = c(4, 2))
silplot(vcrtrain, classCols = c(4, 2))
classmap(vcrtrain, "genuine", classCols = c(4, 2))
classmap(vcrtrain, "fake", classCols = c(4, 2))
# For more examples, we refer to the vignette:
## Not run:
vignette("Random_forest_examples")
## End(Not run)
Predicts class labels for new data by k nearest neighbors, using the output of vcr.knn.train on the training data. For cases in the new data whose given label ynew is not NA, additional output is produced for constructing graphical displays such as the classmap.
vcr.knn.newdata(Xnew, ynew = NULL, vcr.knn.train.out, LOO = FALSE)
Xnew | if the training data was a matrix of coordinates, Xnew must be such a matrix with the same number of columns. If the training data was a set of dissimilarities, each row of Xnew must contain the dissimilarities of a new case to all cases in the training set.
ynew | factor with class membership of each new case. Can be NA for some or all cases.
vcr.knn.train.out | output of vcr.knn.train on the training data.
LOO | leave one out. Only used when testing this function on a subset of the training data. Default is LOO = FALSE.
A list with components:
yintnew | number of the given class of each case. Can contain NA's.
ynew | given class label of each case. Can contain NA's.
levels | levels of the response, from vcr.knn.train.out.
predint | predicted class number of each case. Always exists.
pred | predicted label of each case.
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose ynew is missing.
altlab | label of the alternative class. Is NA for cases whose ynew is missing.
PAC | probability of the alternative class. Is NA for cases whose ynew is missing.
fig | distance of each case i from each class g. Always exists.
farness | farness of each case from its given class. Is NA for cases whose ynew is missing.
ofarness | for each case i, its lowest fig[i, g] over all classes g, so it can also be used to flag outliers. Always exists.
k | the requested number of nearest neighbors, from vcr.knn.train.out.
ktrues | for each case this contains the actual number of elements in its neighborhood. This can be higher than k due to ties.
counts | a matrix with 3 columns, each row representing a case. For the neighborhood of each case it says how many members it has from the given class, the predicted class, and the alternative class. The first and third entries are NA for cases whose ynew is missing.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
vcr.knn.train, classmap, silplot, stackedplot
data("data_floralbuds") X <- data_floralbuds[, 1:6]; y <- data_floralbuds[, 7] set.seed(12345); trainset <- sample(1:550, 275) vcr.train <- vcr.knn.train(X[trainset, ], y[trainset], k = 5) vcr.test <- vcr.knn.newdata(X[-trainset, ], y[-trainset], vcr.train) confmat.vcr(vcr.train) # for comparison confmat.vcr(vcr.test) cols <- c("saddlebrown", "orange", "olivedrab4", "royalblue3") stackedplot(vcr.train, classCols = cols) # for comparison stackedplot(vcr.test, classCols = cols) classmap(vcr.train, "bud", classCols = cols) # for comparison classmap(vcr.test, "bud", classCols = cols) # For more examples, we refer to the vignette: ## Not run: vignette("K_nearest_neighbors_examples") ## End(Not run)
data("data_floralbuds") X <- data_floralbuds[, 1:6]; y <- data_floralbuds[, 7] set.seed(12345); trainset <- sample(1:550, 275) vcr.train <- vcr.knn.train(X[trainset, ], y[trainset], k = 5) vcr.test <- vcr.knn.newdata(X[-trainset, ], y[-trainset], vcr.train) confmat.vcr(vcr.train) # for comparison confmat.vcr(vcr.test) cols <- c("saddlebrown", "orange", "olivedrab4", "royalblue3") stackedplot(vcr.train, classCols = cols) # for comparison stackedplot(vcr.test, classCols = cols) classmap(vcr.train, "bud", classCols = cols) # for comparison classmap(vcr.test, "bud", classCols = cols) # For more examples, we refer to the vignette: ## Not run: vignette("K_nearest_neighbors_examples") ## End(Not run)
Carries out a k-nearest neighbor classification on the training data. Various additional output is produced for the purpose of constructing graphical displays such as the classmap.
vcr.knn.train(X, y, k)
X | this can be a rectangular matrix or data frame of (already standardized) measurements, or a dist object obtained from stats::dist or cluster::daisy.
y | factor with the given (observed) class labels. There need to be non-missing y in order to be able to train the classifier.
k | the number of nearest neighbors used. It can be selected by running cross-validation using a different package; a sketch follows below.
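As mentioned under k, here is a hedged sketch of such a selection with leave-one-out cross-validation from the class package (one option among many; not part of this package):

library(class)
X <- scale(iris[, 1:4]); y <- iris[, 5]
ks <- 1:15
acc <- sapply(ks, function(k) mean(knn.cv(X, y, k = k) == y))  # LOO-CV accuracy
ks[which.max(acc)]  # a candidate k for vcr.knn.train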
A list with components:
yint | number of the given class of each case. Can contain NA's.
y | given class label of each case. Can contain NA's.
levels | levels of y.
predint | predicted class number of each case. Always exists.
pred | predicted label of each case.
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose y is missing.
altlab | label of the alternative class. Is NA for cases whose y is missing.
PAC | probability of the alternative class. Is NA for cases whose y is missing.
figparams | parameters used to compute fig, can be used for new data.
fig | distance of each case i from each class g. Always exists.
farness | farness of each case from its given class. Is NA for cases whose y is missing.
ofarness | for each case i, its lowest fig[i, g] over all classes g, so it can also be used to flag outliers. Always exists.
k | the requested number of nearest neighbors, from the arguments. Will also be used for classifying new data.
ktrues | for each case this contains the actual number of elements in its neighborhood. This can be higher than k due to ties.
counts | a matrix with 3 columns, each row representing a case. For the neighborhood of each case it says how many members it has from the given class, the predicted class, and the alternative class. The first and third entries are NA for cases whose y is missing.
X | if the argument X was a matrix or data frame of coordinates, it is stored here for classifying new data.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
vcr.knn.newdata, classmap, silplot, stackedplot
vcrout <- vcr.knn.train(iris[, 1:4], iris[, 5], k = 5)
confmat.vcr(vcrout)
stackedplot(vcrout)
classmap(vcrout, "versicolor", classCols = 2:4)
# The cases misclassified as virginica are shown in blue.
# For more examples, we refer to the vignette:
## Not run:
vignette("K_nearest_neighbors_examples")
## End(Not run)
Prepares graphical display of new data fitted by a neural net that was modeled on the training data, using the output of vcr.neural.train on the training data.
vcr.neural.newdata(Xnew, ynew = NULL, probs, vcr.neural.train.out)
Xnew | data matrix of the new data, with the same number of columns as in the training data. Missing values in Xnew are not allowed.
ynew | factor with class membership of each new case. Can be NA for some or all cases.
probs | posterior probabilities obtained by running the neural net on the new data.
vcr.neural.train.out | output of vcr.neural.train on the training data.
A list with components:
yintnew | number of the given class of each case. Can contain NA's.
ynew | given class label of each case. Can contain NA's.
levels | levels of the response, from vcr.neural.train.out.
predint | predicted class number of each case. Always exists.
pred | predicted label of each case.
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose ynew is missing.
altlab | label of the alternative class when ynew was given, else NA.
PAC | probability of the alternative class. Is NA for cases whose ynew is missing.
fig | distance of each case i from each class g. Always exists.
farness | farness of each case from its given class. Is NA for cases whose ynew is missing.
ofarness | for each case i, its lowest fig[i, g] over all classes g, so it can also be used to flag outliers. Always exists.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. arXiv:2106.08814 (open access).
vcr.neural.train, classmap, silplot, stackedplot
# For examples, we refer to the vignette:
## Not run:
vignette("Neural_net_examples")
## End(Not run)
Produces output for the purpose of constructing graphical displays such as the classmap. The user first needs to train a neural network. The representation of the data in a given layer (e.g. the final layer before applying the softmax function) then serves as the argument X to vcr.neural.train.
vcr.neural.train(X, y, probs, estmethod = meancov)
X | the coordinates of the n objects of the training data in the layer chosen by the user. Missing values are not allowed.
y | factor with the given class labels of the objects. Make sure that the levels are in the same order as used in the neural net, i.e. the columns of its binary "one-hot encoded" response vectors.
probs | posterior probabilities obtained by the neural net, e.g. in keras. For each case (row of X), it contains the posterior probabilities of all classes.
estmethod | function for location and covariance estimation. Should return a list with the center $m and the covariance matrix $S. The default meancov uses the classical mean and covariance matrix.
A list with components:
X | the coordinates of the n objects of the training data in the layer chosen by the user.
yint | number of the given class of each case. Can contain NA's.
y | given class label of each case. Can contain NA's.
levels | levels of y.
predint | predicted class number of each case. For each case this is the class with the highest posterior probability. Always exists.
pred | predicted label of each case.
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose y is missing.
altlab | label of the alternative class. Is NA for cases whose y is missing.
ncolX | number of columns in X. Kept for use on new data.
PAC | probability of the alternative class. Is NA for cases whose y is missing.
computeMD | whether or not the farness is computed using the Mahalanobis distance.
classMS | list with center and covariance matrix of each class.
PCAfits | if not NULL, the PCA fits to each class, estimated from the training data; also used for new data.
figparams | parameters used for computing fig, can be used for new data.
fig | distance of each case i from each class g. Always exists.
farness | farness of each case from its given class. Is NA for cases whose y is missing.
ofarness | for each case i, its lowest fig[i, g] over all classes g, so it can also be used to flag outliers. Always exists.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. arXiv:2106.08814 (open access).
vcr.neural.newdata, classmap, silplot, stackedplot
# For examples, we refer to the vignette:
## Not run:
vignette("Neural_net_examples")
## End(Not run)
Produces output for the purpose of constructing graphical displays such as the classmap on new data. Requires the output of vcr.rpart.train as an argument.
vcr.rpart.newdata(Xnew, ynew = NULL, vcr.rpart.train.out, LOO = FALSE)
Xnew | data matrix of the new data, with the same number of columns as in the training data.
ynew | factor with class membership of each new case. Can be NA for some or all cases.
vcr.rpart.train.out | output of vcr.rpart.train on the training data.
LOO | leave one out. Only used when testing this function on a subset of the training data. Default is LOO = FALSE.
A list with components:
yintnew | number of the given class of each case. Can contain NA's.
ynew | given class label of each case. Can contain NA's.
levels | levels of the response, from vcr.rpart.train.out.
predint | predicted class number of each case. Always exists.
pred | predicted label of each case.
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose ynew is missing.
altlab | label of the alternative class when ynew was given, else NA.
PAC | probability of the alternative class. Is NA for cases whose ynew is missing.
fig | distance of each case i from each class g. Always exists.
farness | farness of each case from its given class. Is NA for cases whose ynew is missing.
ofarness | for each case i, its lowest fig[i, g] over all classes g, so it can also be used to flag outliers. Always exists.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. arXiv:2106.08814 (open access).
vcr.rpart.train, classmap, silplot, stackedplot
library(rpart)
data("data_titanic")
traindata <- data_titanic[which(data_titanic$dataType == "train"), -13]
str(traindata); table(traindata$y)
set.seed(123) # rpart is not deterministic
rpart.out <- rpart(y ~ Pclass + Sex + SibSp + Parch + Fare + Embarked,
                   data = traindata, method = "class", model = TRUE)
y_train <- traindata[, 12]
x_train <- traindata[, -12]
mytype <- list(nominal = c("Name", "Sex", "Ticket", "Cabin", "Embarked"),
               ordratio = c("Pclass"))
# These are 5 nominal columns, and one ordinal.
# The variables not listed are by default interval-scaled.
vcrtrain <- vcr.rpart.train(x_train, y_train, rpart.out, mytype)
testdata <- data_titanic[which(data_titanic$dataType == "test"), -13]
dim(testdata)
x_test <- testdata[, -12]
y_test <- testdata[, 12]
vcrtest <- vcr.rpart.newdata(x_test, y_test, vcrtrain)
confmat.vcr(vcrtest)
silplot(vcrtest, classCols = c(2, 4))
classmap(vcrtest, "casualty", classCols = c(2, 4))
classmap(vcrtest, "survived", classCols = c(2, 4))
# For more examples, we refer to the vignette:
## Not run:
vignette("Rpart_examples")
## End(Not run)
Produces output for the purpose of constructing graphical displays such as the classmap. The user first needs to train a classification tree on the data by rpart::rpart. This then serves as an argument to vcr.rpart.train.
vcr.rpart.train(X, y, trainfit, type = list(), k = 5, stand = TRUE)
X | a rectangular matrix or data frame, where the columns (variables) may be of mixed type and may contain NA's.
y | factor with the given class labels. It is crucial that X and y correspond to the data on which the tree was trained.
k | the number of nearest neighbors used in the farness computation.
trainfit | the output of an rpart::rpart training run on these data, carried out with model = TRUE as in the examples.
type | list for specifying some (or all) of the types of the variables (columns) in X. Possible components include "nominal", "ordratio" and "symm" (symmetric binary); variables that are not listed are interval-scaled by default (see the examples).
stand | whether or not to standardize numerical (interval-scaled) variables by their range in the farness computation. Defaults to TRUE.
A list with components:
X | the input data X.
yint | number of the given class of each case. Can contain NA's.
y | given class label of each case. Can contain NA's.
levels | levels of y.
predint | predicted class number of each case. For each case this is the class with the highest posterior probability. Always exists.
pred | predicted label of each case.
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose y is missing.
altlab | label of the alternative class. Is NA for cases whose y is missing.
PAC | probability of the alternative class. Is NA for cases whose y is missing.
figparams | parameters used for computing fig, can be used for new data.
fig | distance of each case i from each class g. Always exists.
farness | farness of each case from its given class. Is NA for cases whose y is missing.
ofarness | for each case i, its lowest fig[i, g] over all classes g, so it can also be used to flag outliers. Always exists.
trainfit | the trainfit used to build the VCR object.
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. arXiv:2106.08814 (open access).
vcr.rpart.newdata, classmap, silplot, stackedplot
library(rpart)
data("data_titanic")
traindata <- data_titanic[which(data_titanic$dataType == "train"), -13]
str(traindata); table(traindata$y)
set.seed(123) # rpart is not deterministic
rpart.out <- rpart(y ~ Pclass + Sex + SibSp + Parch + Fare + Embarked,
                   data = traindata, method = "class", model = TRUE)
y_train <- traindata[, 12]
x_train <- traindata[, -12]
mytype <- list(nominal = c("Name", "Sex", "Ticket", "Cabin", "Embarked"),
               ordratio = c("Pclass"))
# These are 5 nominal columns, and one ordinal.
# The variables not listed are by default interval-scaled.
vcrtrain <- vcr.rpart.train(x_train, y_train, rpart.out, mytype)
confmat.vcr(vcrtrain)
silplot(vcrtrain, classCols = c(2, 4))
classmap(vcrtrain, "casualty", classCols = c(2, 4))
classmap(vcrtrain, "survived", classCols = c(2, 4))
# For more examples, we refer to the vignette:
## Not run:
vignette("Rpart_examples")
## End(Not run)
Carries out a support vector machine classification of new data using the output of `vcr.svm.train` on the training data, and computes the quantities needed for its visualization.
vcr.svm.newdata(Xnew, ynew = NULL, vcr.svm.train.out)
Xnew |
data matrix of the new data, with the same number of columns as in the training data. Missing values in Xnew are not allowed. |
ynew |
factor with class membership of each new case. Can be NA for some or all cases. If NULL, is assumed to be NA everywhere. |
vcr.svm.train.out |
output of vcr.svm.train on the training data. |
A list with components:
yintnew |
number of the given class of each case. Can contain NA's. |
ynew |
given class label of each case. Can contain NA's. |
levels |
levels of the response, from vcr.svm.train.out. |
predint |
predicted class number of each case. Always exists. |
pred |
predicted label of each case. |
altint |
number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose ynew is missing. |
altlab |
alternative label if yintnew was given, else NA. |
PAC |
probability of the alternative class. Is NA for cases whose ynew is missing. |
fig |
distance of each case i from each class g. Always exists. |
farness |
farness of each case from its given class. Is NA for cases whose ynew is missing. |
ofarness |
for each case i, its lowest fig[i, g] over all classes g. Always exists. |
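When `ynew` was supplied, the given labels can be compared with the predictions. A minimal sketch (assuming the `vcr.test` object from the example below; `confmat.vcr` produces the same confusion table with additional formatting):

# Accuracy and confusion table on the labeled new cases:
mean(vcr.test$ynew == vcr.test$pred, na.rm = TRUE)
table(given = vcr.test$ynew, predicted = vcr.test$pred)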
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
`vcr.svm.train`, `classmap`, `silplot`, `stackedplot`, `e1071::svm`
library(e1071)
set.seed(1); X <- matrix(rnorm(200 * 2), ncol = 2)
X[1:100, ] <- X[1:100, ] + 2
X[101:150, ] <- X[101:150, ] - 2
y <- as.factor(c(rep("blue", 150), rep("red", 50)))
# We now fit an SVM with radial basis kernel to the data:
set.seed(1) # to make the result of svm() reproducible.
svmfit <- svm(y ~ ., data = data.frame(X = X, y = y), scale = FALSE,
              kernel = "radial", cost = 10, gamma = 1, probability = TRUE)
vcr.train <- vcr.svm.train(X, y, svfit = svmfit)
# As "new" data we take a subset of the training data:
inds <- c(1:25, 101:125, 151:175)
vcr.test <- vcr.svm.newdata(X[inds, ], y[inds], vcr.train)
plot(vcr.test$PAC, vcr.train$PAC[inds]); abline(0, 1) # match
plot(vcr.test$farness, vcr.train$farness[inds]); abline(0, 1)
confmat.vcr(vcr.test)
cols <- c("deepskyblue3", "red")
stackedplot(vcr.test, classCols = cols)
classmap(vcr.train, "blue", classCols = cols) # for comparison
classmap(vcr.test, "blue", classCols = cols)
classmap(vcr.train, "red", classCols = cols) # for comparison
classmap(vcr.test, "red", classCols = cols)
# For more examples, we refer to the vignette:
## Not run: vignette("Support_vector_machine_examples")
## End(Not run)
Produces output for the purpose of constructing graphical displays such as the `classmap`. The user first needs to run a support vector machine classification on the data by `e1071::svm`, with the option `probability = TRUE`. This classification can be with two or more classes. The output of `e1071::svm` is then an argument to `vcr.svm.train`. As `e1071::svm` does not output the data itself, it needs to be given as well, in the arguments `X` and `y`.
vcr.svm.train(X, y, svfit, ortho = FALSE)
X |
matrix of data coordinates, as used in e1071::svm. |
y |
factor with the given (observed) class labels. It is crucial that X and y are exactly the same as in the call to e1071::svm. |
svfit |
an object returned by e1071::svm, trained with probability = TRUE. |
ortho |
If TRUE, computes farness in the orthogonal complement of the vector beta given by the SVM. Only possible for two classes, else there would be several beta vectors. |
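For two-class data such as the example below, farness can thus also be computed in the orthogonal complement (a sketch, assuming `X`, `y`, and `svmfit` as constructed there):

# ortho = TRUE is only available for two classes:
vcr.ortho <- vcr.svm.train(X, y, svfit = svmfit, ortho = TRUE)
stackedplot(vcr.ortho, classCols = c("deepskyblue3", "red"))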
A list with components:
yint |
number of the given class of each case. Can contain NA's. |
y |
given class label of each case. Can contain NA's. |
levels |
levels of the response y. |
predint |
predicted class number of each case. Always exists. |
pred |
predicted label of each case. |
altint |
number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose y is missing. |
altlab |
label of the alternative class. Is NA for cases whose y is missing. |
PAC |
probability of the alternative class. Is NA for cases whose y is missing. |
figparams |
parameters used in computing fig, can be used for new data. |
fig |
distance of each case i from each class g. Always exists. |
farness |
farness of each case from its given class. Is NA for cases whose y is missing. |
ofarness |
for each case i, its lowest fig[i, g] over all classes g. Always exists. |
svfit |
as it was input, will be useful for new data. |
X |
the matrix of data coordinates from the arguments. This is useful for classifying new data. |
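Because the output stores both `svfit` and `X`, it can be passed straight to `vcr.svm.newdata` to process test data. A minimal sketch (assuming the `vcr.train` object from the example below and a hypothetical matrix `Xnew` with the same columns as `X`):

# Classify unlabeled new data with the stored training fit:
vcr.new <- vcr.svm.newdata(Xnew, ynew = NULL, vcr.svm.train.out = vcr.train)
vcr.new$pred # predicted labels; PAC and farness are NA when ynew is unknown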
Raymaekers J., Rousseeuw P.J.
Raymaekers J., Rousseeuw P.J., Hubert M. (2021). Class maps for visualizing classification results. Technometrics, appeared online. doi:10.1080/00401706.2021.1927849 (open access).
`vcr.svm.newdata`, `classmap`, `silplot`, `stackedplot`, `e1071::svm`
library(e1071)
set.seed(1); X <- matrix(rnorm(200 * 2), ncol = 2)
X[1:100, ] <- X[1:100, ] + 2
X[101:150, ] <- X[101:150, ] - 2
y <- as.factor(c(rep("blue", 150), rep("red", 50)))
cols <- c("deepskyblue3", "red")
plot(X, col = cols[as.numeric(y)], pch = 19)
# We now fit an SVM with radial basis kernel to the data:
set.seed(1) # to make the result of svm() reproducible.
svmfit <- svm(y ~ ., data = data.frame(X = X, y = y), scale = FALSE,
              kernel = "radial", cost = 10, gamma = 1, probability = TRUE)
plot(svmfit$decision.values, col = cols[as.numeric(y)]); abline(h = 0)
# so the decision values separate the classes reasonably well.
plot(svmfit, data = data.frame(X = X, y = y), X.2 ~ X.1, col = cols)
# The boundary is far from linear (but in feature space it is).
vcr.train <- vcr.svm.train(X, y, svfit = svmfit)
confmat.vcr(vcr.train)
stackedplot(vcr.train, classCols = cols)
classmap(vcr.train, "blue", classCols = cols)
classmap(vcr.train, "red", classCols = cols)
# For more examples, we refer to the vignette:
## Not run: vignette("Support_vector_machine_examples")
## End(Not run)