mlr: Machine Learning in R

Bernd Bischl, Michel Lang, Jakob Richter, Jakob Bossek, Leonard Judt, Tobias Kuehn, Erich Studerus, Lars Kotthoff

2017-03-14

This vignette gives you a short introductory glance at the key features of mlr. A more detailed, in-depth and continuously updated tutorial can be found on the GitHub project page.

Purpose

The main goal of mlr is to provide a unified interface for machine learning tasks such as classification, regression, cluster analysis and survival analysis in R. Without a common interface, carrying out standard procedures such as cross-validation and hyperparameter tuning with different learners becomes a hassle, since every underlying modeling package has its own conventions for fitting, predicting and setting parameters. mlr hides these differences behind one consistent API, as the sketch below illustrates.
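
As a minimal sketch of what this buys you (not part of the original quick start), the following loop evaluates two learners with identical resampling code; only the learner id changes. It assumes the MASS and rpart packages, which back these two learners, are installed:

library(mlr)

## One task, two learners from different underlying packages;
## the resampling code is identical for both.
task = makeClassifTask(data = iris, target = "Species")
rdesc = makeResampleDesc("CV", iters = 5L)
for (id in c("classif.lda", "classif.rpart")) {
  lrn = makeLearner(id)
  r = resample(learner = lrn, task = task, resampling = rdesc)
  print(r$aggr)
}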

Quick Start

To highlight the main principles of mlr, we give a quick introduction to the package. We demonstrate how to perform a classification analysis with stratified cross-validation, which illustrates some of the major building blocks of the mlr workflow, namely tasks and learners.

library(mlr)
## Loading required package: ParamHelpers
data(iris)

## Define the task:
task = makeClassifTask(id = "tutorial", data = iris, target = "Species")
print(task)
## Supervised task: tutorial
## Type: classif
## Target: Species
## Observations: 150
## Features:
## numerics  factors  ordered 
##        4        0        0 
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Classes: 3
##     setosa versicolor  virginica 
##         50         50         50 
## Positive class: NA
## Define the learner:
lrn = makeLearner("classif.lda")
print(lrn)
## Learner classif.lda from package MASS
## Type: classif
## Name: Linear Discriminant Analysis; Short name: lda
## Class: classif.lda
## Properties: twoclass,multiclass,numerics,factors,prob
## Predict-Type: response
## Hyperparameters:
## Define the resampling strategy:
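## ("CV" performs 10-fold cross-validation by default; stratify = TRUE
## keeps the class proportions of Species roughly equal in each fold)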
rdesc = makeResampleDesc(method = "CV", stratify = TRUE)

## Do the resampling:
r = resample(learner = lrn, task = task, resampling = rdesc)
## [Resample] cross-validation iter 1:
## mmce.test.mean=   0
## [Resample] cross-validation iter 2:
## mmce.test.mean=0.0667
## [Resample] cross-validation iter 3:
## mmce.test.mean=   0
## [Resample] cross-validation iter 4:
## mmce.test.mean=   0
## [Resample] cross-validation iter 5:
## mmce.test.mean=0.0667
## [Resample] cross-validation iter 6:
## mmce.test.mean=0.0667
## [Resample] cross-validation iter 7:
## mmce.test.mean=   0
## [Resample] cross-validation iter 8:
## mmce.test.mean=   0
## [Resample] cross-validation iter 9:
## mmce.test.mean=   0
## [Resample] cross-validation iter 10:
## mmce.test.mean=   0
## [Resample] Aggr. Result: mmce.test.mean=0.02
print(r)
## Resample Result
## Task: tutorial
## Learner: classif.lda
## Aggr perf: mmce.test.mean=0.02
## Runtime: 0.150228
## Get the mean misclassification error:
r$aggr
## mmce.test.mean 
##           0.02
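
Note that resample() fits and evaluates the model in one call. As a minimal sketch of the train/predict steps it wraps (reusing the task and learner from above; predicting on the training data purely for illustration):

## Fit the learner on the complete data set:
mod = train(lrn, task)
## Predict on the same task and compute the misclassification error:
pred = predict(mod, task = task)
performance(pred, measures = mmce)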

Detailed Tutorial

The previous example demonstrated only a tiny fraction of the capabilities of mlr. Many more features are covered in the tutorial, which can be found online on the mlr project page. Among other topics, it covers benchmarking, preprocessing, imputation, feature selection, ROC analysis, how to implement your own learner, and the list of all supported learners. Reading it is highly recommended!
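
As a small taste of the benchmarking functionality, the sketch below compares two learners on the iris task in a single benchmark() call (again assuming the rpart package is installed):

## Compare two learners with 5-fold cross-validation:
lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
task = makeClassifTask(data = iris, target = "Species")
bmr = benchmark(lrns, task, makeResampleDesc("CV", iters = 5L))
## Aggregated performances as a data.frame:
getBMRAggrPerformances(bmr, as.df = TRUE)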

Thanks

We would like to thank the authors of all the packages that mlr uses under the hood.