User guide for speaq package version <= 1.2.3

Trung Nghia Vu, Charlie Beirnaert, et al.

2018-07-03

Preface

This introduction was written for the speaq package up until version 1.2.3. Since version 2.0 a lot of functionality is added but the original functionality is maintained. This vignette can therefor still be used as it decribes one part of the package dealing with spectral alignment and quantitation.

Introduction

We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA).

The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data.

We are going to introduce step-by-step how part of speaq works for a specific dataset, this includes

For any issue reports or discussions about speaq feel free to contact us via the developing website at github (https://github.com/beirnaert/speaq).

Data input

We randomly generate an NMR spectral dataset of two different groups (15 spectra for each group). Each spectrum has two peaks slightly shifted cross over spectra. More details are described in the manual document of function makeSimulatedData().

library(speaq)
#Generate a simulated NMR data set for this experiment
res=makeSimulatedData();
X=res$data;
groupLabel=res$label;

Now, we draw a spectral plot to observe the dataset before alignment.

drawSpec(X)

Landmark peak detection

This section makes use of MassSpecWavelet package to detect peak lists of the dataset.

cat("\n detect peaks....")
## 
##  detect peaks....
startTime <- proc.time()
peakList <- detectSpecPeaks(X, nDivRange = c(128), scales = seq(1, 16, 2), baselineThresh = 50000, 
    SNR.Th = -1, verbose = FALSE)
## [1] "j =1 peak =189 savedpeak = 189" "j =1 peak =297 savedpeak = 297"
## [1] "j =2 peak =169 savedpeak = 297"
## [1] "j =3 peak =330 savedpeak = 586" "j =3 peak =351 savedpeak = 607"
## [1] "j =4 peak =202 savedpeak = 586" "j =4 peak =223 savedpeak = 607"
## [1] "j =1 peak =124 savedpeak = 124" "j =1 peak =232 savedpeak = 232"
## [1] "j =2 peak =104 savedpeak = 232" "j =2 peak =393 savedpeak = 521"
## [3] "j =2 peak =414 savedpeak = 542"
## [1] "j =3 peak =265 savedpeak = 521" "j =3 peak =286 savedpeak = 542"
## [1] "j =4 peak =137 savedpeak = 521" "j =4 peak =158 savedpeak = 542"
## [1] "j =1 peak =222 savedpeak = 222"
## [1] "j =2 peak =94 savedpeak = 222"  "j =2 peak =383 savedpeak = 511"
## [3] "j =2 peak =404 savedpeak = 532"
## [1] "j =3 peak =255 savedpeak = 511" "j =3 peak =276 savedpeak = 532"
## [1] "j =4 peak =127 savedpeak = 511" "j =4 peak =148 savedpeak = 532"
## [1] "j =1 peak =213 savedpeak = 213"
## [1] "j =2 peak =85 savedpeak = 213"  "j =2 peak =374 savedpeak = 502"
## [3] "j =2 peak =395 savedpeak = 523"
## [1] "j =3 peak =246 savedpeak = 502" "j =3 peak =267 savedpeak = 523"
## [1] "j =4 peak =118 savedpeak = 502" "j =4 peak =139 savedpeak = 523"
## [1] "j =1 peak =128 savedpeak = 128" "j =1 peak =236 savedpeak = 236"
## [1] "j =2 peak =108 savedpeak = 236" "j =2 peak =397 savedpeak = 525"
## [3] "j =2 peak =418 savedpeak = 546"
## [1] "j =3 peak =269 savedpeak = 525" "j =3 peak =290 savedpeak = 546"
## [1] "j =4 peak =141 savedpeak = 525" "j =4 peak =162 savedpeak = 546"
## [1] "j =1 peak =175 savedpeak = 175" "j =1 peak =285 savedpeak = 285"
## [1] "j =2 peak =157 savedpeak = 285"
## [1] "j =3 peak =318 savedpeak = 574" "j =3 peak =339 savedpeak = 595"
## [1] "j =4 peak =190 savedpeak = 574" "j =4 peak =211 savedpeak = 595"
## [1] "j =1 peak =134 savedpeak = 134" "j =1 peak =242 savedpeak = 242"
## [1] "j =2 peak =114 savedpeak = 242" "j =2 peak =403 savedpeak = 531"
## [3] "j =2 peak =424 savedpeak = 552"
## [1] "j =3 peak =275 savedpeak = 531" "j =3 peak =296 savedpeak = 552"
## [1] "j =4 peak =147 savedpeak = 531" "j =4 peak =168 savedpeak = 552"
## [1] "j =1 peak =207 savedpeak = 207"
## [1] "j =2 peak =79 savedpeak = 207"  "j =2 peak =372 savedpeak = 500"
## [3] "j =2 peak =389 savedpeak = 517"
## [1] "j =3 peak =244 savedpeak = 500" "j =3 peak =261 savedpeak = 517"
## [1] "j =4 peak =116 savedpeak = 500" "j =4 peak =133 savedpeak = 517"
## [1] "j =1 peak =154 savedpeak = 154" "j =1 peak =264 savedpeak = 264"
## [1] "j =2 peak =136 savedpeak = 264" "j =2 peak =425 savedpeak = 553"
## [1] "j =3 peak =297 savedpeak = 553" "j =3 peak =318 savedpeak = 574"
## [1] "j =4 peak =169 savedpeak = 553" "j =4 peak =190 savedpeak = 574"
## [1] "j =1 peak =171 savedpeak = 171" "j =1 peak =281 savedpeak = 281"
## [1] "j =2 peak =153 savedpeak = 281"
## [1] "j =3 peak =314 savedpeak = 570" "j =3 peak =335 savedpeak = 591"
## [1] "j =4 peak =186 savedpeak = 570" "j =4 peak =207 savedpeak = 591"
## [1] "j =1 peak =178 savedpeak = 178" "j =1 peak =288 savedpeak = 288"
## [1] "j =2 peak =160 savedpeak = 288"
## [1] "j =3 peak =321 savedpeak = 577" "j =3 peak =342 savedpeak = 598"
## [1] "j =4 peak =193 savedpeak = 577" "j =4 peak =214 savedpeak = 598"
## [1] "j =1 peak =170 savedpeak = 170" "j =1 peak =280 savedpeak = 280"
## [1] "j =2 peak =152 savedpeak = 280"
## [1] "j =3 peak =313 savedpeak = 569" "j =3 peak =334 savedpeak = 590"
## [1] "j =4 peak =185 savedpeak = 569" "j =4 peak =206 savedpeak = 590"
## [1] "j =1 peak =221 savedpeak = 221"
## [1] "j =2 peak =93 savedpeak = 221"  "j =2 peak =382 savedpeak = 510"
## [3] "j =2 peak =403 savedpeak = 531"
## [1] "j =3 peak =254 savedpeak = 510" "j =3 peak =275 savedpeak = 531"
## [1] "j =4 peak =126 savedpeak = 510" "j =4 peak =147 savedpeak = 531"
## [1] "j =1 peak =122 savedpeak = 122" "j =1 peak =230 savedpeak = 230"
## [1] "j =2 peak =102 savedpeak = 230" "j =2 peak =391 savedpeak = 519"
## [3] "j =2 peak =412 savedpeak = 540"
## [1] "j =3 peak =263 savedpeak = 519" "j =3 peak =284 savedpeak = 540"
## [1] "j =4 peak =135 savedpeak = 519" "j =4 peak =156 savedpeak = 540"
## [1] "j =1 peak =125 savedpeak = 125" "j =1 peak =233 savedpeak = 233"
## [1] "j =2 peak =105 savedpeak = 233" "j =2 peak =394 savedpeak = 522"
## [3] "j =2 peak =415 savedpeak = 543"
## [1] "j =3 peak =266 savedpeak = 522" "j =3 peak =287 savedpeak = 543"
## [1] "j =4 peak =138 savedpeak = 522" "j =4 peak =159 savedpeak = 543"
## [1] "j =1 peak =142 savedpeak = 142" "j =1 peak =250 savedpeak = 250"
## [1] "j =2 peak =122 savedpeak = 250" "j =2 peak =411 savedpeak = 539"
## [3] "j =2 peak =432 savedpeak = 560"
## [1] "j =3 peak =168 savedpeak = 424" "j =3 peak =221 savedpeak = 477"
## [3] "j =3 peak =283 savedpeak = 539" "j =3 peak =304 savedpeak = 560"
## [5] "j =3 peak =386 savedpeak = 642"
## [1] "j =4 peak =93 savedpeak = 477"  "j =4 peak =155 savedpeak = 539"
## [3] "j =4 peak =176 savedpeak = 560" "j =4 peak =219 savedpeak = 603"
## [5] "j =4 peak =258 savedpeak = 642" "j =4 peak =352 savedpeak = 736"
## [7] "j =4 peak =383 savedpeak = 767" "j =4 peak =405 savedpeak = 789"
## [9] "j =4 peak =427 savedpeak = 811"
## [1] "j =1 peak =186 savedpeak = 186" "j =1 peak =294 savedpeak = 294"
## [1] "j =2 peak =166 savedpeak = 294"
## [1] "j =3 peak =327 savedpeak = 583" "j =3 peak =348 savedpeak = 604"
## [1] "j =4 peak =78 savedpeak = 462"  "j =4 peak =113 savedpeak = 497"
## [3] "j =4 peak =199 savedpeak = 583" "j =4 peak =220 savedpeak = 604"
## [5] "j =4 peak =302 savedpeak = 686"
## [1] "j =1 peak =169 savedpeak = 169" "j =1 peak =279 savedpeak = 279"
## [1] "j =2 peak =151 savedpeak = 279"
## [1] "j =3 peak =312 savedpeak = 568" "j =3 peak =333 savedpeak = 589"
## [1] "j =4 peak =92 savedpeak = 476"  "j =4 peak =184 savedpeak = 568"
## [3] "j =4 peak =205 savedpeak = 589" "j =4 peak =242 savedpeak = 626"
## [5] "j =4 peak =287 savedpeak = 671" "j =4 peak =381 savedpeak = 765"
## [7] "j =4 peak =412 savedpeak = 796" "j =4 peak =434 savedpeak = 818"
## [1] "j =1 peak =168 savedpeak = 168" "j =1 peak =276 savedpeak = 276"
## [1] "j =2 peak =148 savedpeak = 276" "j =2 peak =437 savedpeak = 565"
## [1] "j =3 peak =309 savedpeak = 565" "j =3 peak =330 savedpeak = 586"
## [1] "j =4 peak =114 savedpeak = 498" "j =4 peak =181 savedpeak = 565"
## [3] "j =4 peak =202 savedpeak = 586" "j =4 peak =230 savedpeak = 614"
## [5] "j =4 peak =239 savedpeak = 623" "j =4 peak =284 savedpeak = 668"
## [7] "j =4 peak =378 savedpeak = 762" "j =4 peak =409 savedpeak = 793"
## [9] "j =4 peak =431 savedpeak = 815"
## [1] "j =1 peak =173 savedpeak = 173" "j =1 peak =283 savedpeak = 283"
## [1] "j =2 peak =155 savedpeak = 283"
## [1] "j =3 peak =316 savedpeak = 572" "j =3 peak =337 savedpeak = 593"
## [1] "j =4 peak =188 savedpeak = 572" "j =4 peak =209 savedpeak = 593"
## [3] "j =4 peak =291 savedpeak = 675" "j =4 peak =385 savedpeak = 769"
## [5] "j =4 peak =416 savedpeak = 800"
## [1] "j =1 peak =157 savedpeak = 157" "j =1 peak =265 savedpeak = 265"
## [1] "j =2 peak =137 savedpeak = 265" "j =2 peak =364 savedpeak = 492"
## [3] "j =2 peak =426 savedpeak = 554"
## [1] "j =3 peak =236 savedpeak = 492" "j =3 peak =298 savedpeak = 554"
## [3] "j =3 peak =319 savedpeak = 575"
## [1] "j =4 peak =108 savedpeak = 492" "j =4 peak =170 savedpeak = 554"
## [3] "j =4 peak =191 savedpeak = 575" "j =4 peak =228 savedpeak = 612"
## [5] "j =4 peak =273 savedpeak = 657" "j =4 peak =367 savedpeak = 751"
## [7] "j =4 peak =398 savedpeak = 782" "j =4 peak =420 savedpeak = 804"
## [1] "j =1 peak =177 savedpeak = 177" "j =1 peak =285 savedpeak = 285"
## [1] "j =2 peak =157 savedpeak = 285"
## [1] "j =3 peak =318 savedpeak = 574" "j =3 peak =339 savedpeak = 595"
## [1] "j =4 peak =98 savedpeak = 482"  "j =4 peak =190 savedpeak = 574"
## [3] "j =4 peak =211 savedpeak = 595" "j =4 peak =248 savedpeak = 632"
## [5] "j =4 peak =293 savedpeak = 677" "j =4 peak =387 savedpeak = 771"
## [7] "j =4 peak =418 savedpeak = 802"
## [1] "j =1 peak =132 savedpeak = 132" "j =1 peak =240 savedpeak = 240"
## [1] "j =2 peak =112 savedpeak = 240" "j =2 peak =401 savedpeak = 529"
## [3] "j =2 peak =422 savedpeak = 550"
## [1] "j =3 peak =138 savedpeak = 394" "j =3 peak =158 savedpeak = 414"
## [3] "j =3 peak =211 savedpeak = 467" "j =3 peak =273 savedpeak = 529"
## [5] "j =3 peak =294 savedpeak = 550" "j =3 peak =363 savedpeak = 619"
## [7] "j =3 peak =376 savedpeak = 632"
## [1] "j =4 peak =83 savedpeak = 467"  "j =4 peak =145 savedpeak = 529"
## [3] "j =4 peak =166 savedpeak = 550" "j =4 peak =235 savedpeak = 619"
## [5] "j =4 peak =248 savedpeak = 632" "j =4 peak =342 savedpeak = 726"
## [7] "j =4 peak =373 savedpeak = 757" "j =4 peak =395 savedpeak = 779"
## [9] "j =4 peak =417 savedpeak = 801"
## [1] "j =1 peak =152 savedpeak = 152" "j =1 peak =262 savedpeak = 262"
## [1] "j =2 peak =134 savedpeak = 262" "j =2 peak =423 savedpeak = 551"
## [1] "j =3 peak =295 savedpeak = 551" "j =3 peak =316 savedpeak = 572"
## [1] "j =4 peak =105 savedpeak = 489" "j =4 peak =167 savedpeak = 551"
## [3] "j =4 peak =188 savedpeak = 572" "j =4 peak =225 savedpeak = 609"
## [5] "j =4 peak =270 savedpeak = 654" "j =4 peak =364 savedpeak = 748"
## [7] "j =4 peak =417 savedpeak = 801"
## [1] "j =1 peak =145 savedpeak = 145" "j =1 peak =253 savedpeak = 253"
## [1] "j =2 peak =125 savedpeak = 253" "j =2 peak =352 savedpeak = 480"
## [3] "j =2 peak =414 savedpeak = 542" "j =2 peak =435 savedpeak = 563"
## [1] "j =3 peak =224 savedpeak = 480" "j =3 peak =286 savedpeak = 542"
## [3] "j =3 peak =307 savedpeak = 563" "j =3 peak =350 savedpeak = 606"
## [5] "j =3 peak =389 savedpeak = 645"
## [1] "j =4 peak =96 savedpeak = 480"  "j =4 peak =158 savedpeak = 542"
## [3] "j =4 peak =179 savedpeak = 563" "j =4 peak =222 savedpeak = 606"
## [5] "j =4 peak =261 savedpeak = 645" "j =4 peak =355 savedpeak = 739"
## [7] "j =4 peak =386 savedpeak = 770" "j =4 peak =408 savedpeak = 792"
## [9] "j =4 peak =430 savedpeak = 814"
## [1] "j =1 peak =177 savedpeak = 177" "j =1 peak =285 savedpeak = 285"
## [1] "j =2 peak =157 savedpeak = 285"
## [1] "j =3 peak =318 savedpeak = 574" "j =3 peak =339 savedpeak = 595"
## [1] "j =4 peak =98 savedpeak = 482"  "j =4 peak =190 savedpeak = 574"
## [3] "j =4 peak =211 savedpeak = 595" "j =4 peak =248 savedpeak = 632"
## [5] "j =4 peak =293 savedpeak = 677" "j =4 peak =387 savedpeak = 771"
## [7] "j =4 peak =418 savedpeak = 802"
## [1] "j =1 peak =145 savedpeak = 145" "j =1 peak =253 savedpeak = 253"
## [1] "j =2 peak =125 savedpeak = 253" "j =2 peak =352 savedpeak = 480"
## [3] "j =2 peak =414 savedpeak = 542" "j =2 peak =435 savedpeak = 563"
## [1] "j =3 peak =224 savedpeak = 480" "j =3 peak =286 savedpeak = 542"
## [3] "j =3 peak =307 savedpeak = 563" "j =3 peak =350 savedpeak = 606"
## [1] "j =4 peak =96 savedpeak = 480"  "j =4 peak =158 savedpeak = 542"
## [3] "j =4 peak =179 savedpeak = 563" "j =4 peak =222 savedpeak = 606"
## [5] "j =4 peak =261 savedpeak = 645" "j =4 peak =355 savedpeak = 739"
## [7] "j =4 peak =386 savedpeak = 770" "j =4 peak =408 savedpeak = 792"
## [9] "j =4 peak =430 savedpeak = 814"
## [1] "j =1 peak =118 savedpeak = 118" "j =1 peak =226 savedpeak = 226"
## [1] "j =2 peak =98 savedpeak = 226"  "j =2 peak =370 savedpeak = 498"
## [3] "j =2 peak =387 savedpeak = 515" "j =2 peak =408 savedpeak = 536"
##  [1] "j =3 peak =77 savedpeak = 333"  "j =3 peak =96 savedpeak = 352" 
##  [3] "j =3 peak =124 savedpeak = 380" "j =3 peak =144 savedpeak = 400"
##  [5] "j =3 peak =197 savedpeak = 453" "j =3 peak =242 savedpeak = 498"
##  [7] "j =3 peak =259 savedpeak = 515" "j =3 peak =280 savedpeak = 536"
##  [9] "j =3 peak =349 savedpeak = 605" "j =3 peak =362 savedpeak = 618"
## [1] "j =4 peak =114 savedpeak = 498" "j =4 peak =131 savedpeak = 515"
## [3] "j =4 peak =152 savedpeak = 536" "j =4 peak =221 savedpeak = 605"
## [5] "j =4 peak =234 savedpeak = 618" "j =4 peak =328 savedpeak = 712"
## [7] "j =4 peak =359 savedpeak = 743" "j =4 peak =381 savedpeak = 765"
## [9] "j =4 peak =403 savedpeak = 787"
## [1] "j =1 peak =178 savedpeak = 178" "j =1 peak =288 savedpeak = 288"
## [1] "j =2 peak =160 savedpeak = 288"
## [1] "j =3 peak =321 savedpeak = 577" "j =3 peak =345 savedpeak = 601"
## [1] "j =4 peak =78 savedpeak = 462"  "j =4 peak =193 savedpeak = 577"
## [3] "j =4 peak =217 savedpeak = 601" "j =4 peak =251 savedpeak = 635"
## [5] "j =4 peak =296 savedpeak = 680" "j =4 peak =390 savedpeak = 774"
## [7] "j =4 peak =421 savedpeak = 805"
## [1] "j =1 peak =126 savedpeak = 126" "j =1 peak =236 savedpeak = 236"
## [1] "j =2 peak =108 savedpeak = 236" "j =2 peak =397 savedpeak = 525"
## [3] "j =2 peak =418 savedpeak = 546"
## [1] "j =3 peak =87 savedpeak = 343"  "j =3 peak =106 savedpeak = 362"
## [3] "j =3 peak =134 savedpeak = 390" "j =3 peak =154 savedpeak = 410"
## [5] "j =3 peak =207 savedpeak = 463" "j =3 peak =269 savedpeak = 525"
## [7] "j =3 peak =290 savedpeak = 546" "j =3 peak =349 savedpeak = 605"
## [9] "j =3 peak =372 savedpeak = 628"
## [1] "j =4 peak =79 savedpeak = 463"  "j =4 peak =141 savedpeak = 525"
## [3] "j =4 peak =162 savedpeak = 546" "j =4 peak =221 savedpeak = 605"
## [5] "j =4 peak =244 savedpeak = 628" "j =4 peak =338 savedpeak = 722"
## [7] "j =4 peak =369 savedpeak = 753" "j =4 peak =391 savedpeak = 775"
## [9] "j =4 peak =413 savedpeak = 797"
endTime <- proc.time()
cat("Peak detection time:", (endTime[3] - startTime[3])/60, " minutes")
## Peak detection time: 0.02083333  minutes

Reference finding

Next, We find the reference for other spectra align to.

cat("\n Find the spectrum reference...")
## 
##  Find the spectrum reference...
resFindRef <- findRef(peakList)
refInd <- resFindRef$refInd

# The ranks of spectra
for (i in 1:length(resFindRef$orderSpec)) {
    cat(paste(i, ":", resFindRef$orderSpec[i], sep = ""), " ")
    if (i%%10 == 0) 
        cat("\n")
}
## 1:24  2:25  3:27  4:9  5:21  6:16  7:7  8:23  9:19  10:18  
## 11:5  12:30  13:12  14:10  15:15  16:20  17:2  18:6  19:22  20:26  
## 21:14  22:11  23:29  24:28  25:3  26:13  27:17  28:1  29:4  30:8
cat("\n The reference is: ", refInd)
## 
##  The reference is:  24

Spectral alignment

For spectral alignment, function dohCluster() is used to implement hierarchical Cluster-based Peak Alignment [1] (CluPA) algorithm. In this function maxShift is set by 100 by default which is suitable with many NMR datasets. Experienced users can set select more proper for their dataset. For example:

# Set maxShift
maxShift = 50

Y <- dohCluster(X, peakList = peakList, refInd = refInd, maxShift = maxShift, 
    acceptLostPeak = TRUE, verbose = FALSE)

Automatically detect the optimal maxShift

If users are not confident when selecting a value for the maxShift, just set the value to NULL. Then, the software will automatically learn to select the optimal value based on the median Pearson correlation coefficient between spectra. It is worth noting that this metric is significantly effected by high peaks in the spectra [2], so it might not be the best measure for evaluating alignment performances. However, it is fast for the purpose of detecting the suitable maxShift value. This mode also takes more time since CluPA implements extra alignment for few maxShift values. If set verbose=TRUE, a plot of performances of CluPA with different values of maxShift will be displayed. For example:

Y <- dohCluster(X, peakList = peakList, refInd = refInd, maxShift = NULL, acceptLostPeak = TRUE, 
    verbose = TRUE)
## 
##  --------------------------------
##  maxShift=NULL, thus CluPA will automatically detect the optimal value of maxShift.
##  --------------------------------
## 
##  maxShift= 2
##  Median Pearson correlation coefficent: -0.02429686 , the best result: -1
##  maxShift= 4
##  Median Pearson correlation coefficent: 0.006180729 , the best result: -0.02429686
##  maxShift= 8
##  Median Pearson correlation coefficent: 0.0452733 , the best result: 0.006180729
##  maxShift= 16
##  Median Pearson correlation coefficent: 0.7615448 , the best result: 0.0452733
##  maxShift= 32
##  Median Pearson correlation coefficent: 0.9317428 , the best result: 0.7615448
##  maxShift= 64
##  Median Pearson correlation coefficent: 0.9317428 , the best result: 0.9317428
##  maxShift= 128
##  Median Pearson correlation coefficent: 0.9317428 , the best result: 0.9317428
##  maxShift= 256
##  Median Pearson correlation coefficent: 0.9317428 , the best result: 0.9317428
## Optimal maxShift= 32 with median Pearson correlation of aligned spectra= 0.9317428

## 
##  Alignment time:  0.008116667  minutes

In this example, the best maxShift=32 which is highlighted by a red star in the plot achieves the highest median Pearson correlation coefficient (0.93).

Spectral alignment with selected segments

If users just want to align in specific segments or prefer to use different parameter settings for different segments. speaq allows users to do that by intervene into the process. To do that, users need to create a segment information matrix as the example in Table 1.

begin end forAlign ref maxShift
100 200 0 0 0
450 680 1 0 50

Each row contains the following information corresponding to the columns:

It is worth to note that only segments with forAlign=1 (column 3) will be taken into account for spectral alignment.

Now, simply run dohClusterCustommedSegments with the input from the information file.

segmentInfoMat = matrix(data = c(100, 200, 0, 0, 0, 450, 680, 1, 0, 50), nrow = 2, 
    ncol = 5, byrow = TRUE)
colnames(segmentInfoMat) = c("begin", "end", "forAlign", "ref", "maxShift")
segmentInfoMat
##      begin end forAlign ref maxShift
## [1,]   100 200        0   0        0
## [2,]   450 680        1   0       50
Yc <- dohClusterCustommedSegments(X, peakList = peakList, refInd = refInd, segmentInfoMat = segmentInfoMat, 
    minSegSize = 128, verbose = FALSE)

Spectral plots

We could draw a segment to see the performance of the alignement.

drawSpec(Y)

We could limit the heights of spectra to easily check the alignment performance.

drawSpec(Y, startP = 450, endP = 680, highBound = 5e+05, lowBound = -100)

We achieved similar results with Yc but the region of the first peak was not aligned because the segment information just allows align the region 450-680.

drawSpec(Yc)

Quantitative analysis

This section presents the quantitative analysis for wine data that was used in our paper [1]. To save time, we just do permutation 100 times to create null distribution.

N = 100
alpha = 0.05

# find the BW-statistic
BW = BWR(Y, groupLabel)

# create sampled H0 and export to file
H0 = createNullSampling(Y, groupLabel, N = N, verbose = FALSE)

# compute percentile of alpha
perc = double(ncol(Y))
alpha_corr = alpha/sum(returnLocalMaxima(Y[2, ])$pkMax > 50000)
for (i in 1:length(perc)) {
    perc[i] = quantile(H0[, i], 1 - alpha_corr, type = 3)
}

Now, some figures are plotting. Read the publication to understand more about these figures.

drawBW(BW, perc, Y, groupLabel = groupLabel)

drawBW(BW, perc, Y, startP = 450, endP = 680, groupLabel = groupLabel)

References

[1] Vu, Trung Nghia, Dirk Valkenborg, Koen Smets, Kim A. Verwaest, Roger Dommisse, Filip Lemiere, Alain Verschoren, Bart Goethals, and Kris Laukens. “An Integrated Workflow for Robust Alignment and Simplified Quantitative Analysis of NMR Spectrometry Data.” BMC Bioinformatics 12, no. 1 (October 20, 2011): 405.

[2] Vu, Trung Nghia, and Kris Laukens. “Getting Your Peaks in Line: A Review of Alignment Methods for NMR Spectral Data.” Metabolites 3, no. 2 (April 15, 2013): 259-76.