Getting Started with NNS: Clustering and Regression

require(NNS)
require(knitr)
require(rgl)
require(data.table)

Clustering and Regression

Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for a more thorough description and definition, please view the References.

NNS Partitioning NNS.part

NNS.part is both a partitional and hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants, and then assigns a quadrant identification (1:4) at each partition.

NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.
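The quadrant assignment at each iteration can be illustrated with a minimal base-R sketch. Note this is illustrative only: the labeling below follows one plausible sign convention (deviations from the means), and NNS's internal convention may differ.

```r
# Assign each (x, y) pair to a quadrant of the joint distribution based on
# the signs of deviations from the means. Labels are illustrative only;
# NNS's internal labeling convention may differ.
quadrant_id <- function(x, y) {
  mx <- mean(x); my <- mean(y)
  ifelse(x > mx  & y > my,  1L,          # upper-right
  ifelse(x <= mx & y > my,  2L,          # upper-left
  ifelse(x > mx  & y <= my, 3L, 4L)))    # lower-right / lower-left
}

x <- seq(-5, 5, .05); y <- x ^ 3
table(quadrant_id(x, y))   # for the cubic, points fall only in quadrants 1 and 4
```

NNS.part applies this idea recursively within each quadrant, which is why the identifications in the output below are strings such as q4444.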

x = seq(-5, 5, .05); y = x ^ 3

for(i in 1 : 4){NNS.part(x, y, order = i, noise.reduction = "off", Voronoi = TRUE)}

## $order
## [1] 4
##
## $dt
##         x         y quadrant prior.quadrant
##   1: -5.00 -125.0000    q4444           q444
##   2: -4.95 -121.2874    q4444           q444
##   3: -4.90 -117.6490    q4444           q444
##   4: -4.85 -114.0841    q4444           q444
##   5: -4.80 -110.5920    q4444           q444
##  ---
## 197:  4.80  110.5920    q1111           q111
## 198:  4.85  114.0841    q1111           q111
## 199:  4.90  117.6490    q1111           q111
## 200:  4.95  121.2874    q1111           q111
## 201:  5.00  125.0000    q1111           q111
##
## $regression.points
##     quadrant      x          y
##  1:     q111  4.600  98.164000
##  2:     q113  4.150  71.473375
##  3:     q114  3.650  49.448375
##  4:     q131  3.025  27.746813
##  5:     q134  2.700  19.764000
##  6:     q141  2.050   9.076375
##  7:     q143  1.425   2.924813
##  8:     q144  0.650   0.528125
##  9:     q411 -0.600  -0.450000
## 10:     q412 -1.400  -2.786000
## 11:     q414 -2.025  -8.712562
## 12:     q421 -2.650 -18.689125
## 13:     q424 -3.000 -27.090000
## 14:     q441 -3.625 -48.366563
## 15:     q442 -4.125 -70.197187
## 16:     q444 -4.600 -98.164000

X-only Partitioning

NNS.part offers a partitioning based on $$x$$ values only, using the entire bandwidth in its regression point derivation, and shares the same limit condition as partitioning via both $$x$$ and $$y$$ values.

for(i in 1 : 4){NNS.part(x, y, order = i, type = "XONLY", Voronoi = TRUE)}

Note the partition identifications are limited to 1’s and 2’s (left and right of the partition, respectively), not the 4 values per the $$x$$ and $$y$$ partitioning.

## $order
## [1] 4
##
## $dt
##         x         y quadrant prior.quadrant
##   1: -5.00 -125.0000    q1111           q111
##   2: -4.95 -121.2874    q1111           q111
##   3: -4.90 -117.6490    q1111           q111
##   4: -4.85 -114.0841    q1111           q111
##   5: -4.80 -110.5920    q1111           q111
##  ---
## 197:  4.80  110.5920    q2222           q222
## 198:  4.85  114.0841    q2222           q222
## 199:  4.90  117.6490    q2222           q222
## 200:  4.95  121.2874    q2222           q222
## 201:  5.00  125.0000    q2222           q222
##
## $regression.points
##    quadrant      x          y
## 1:     q111 -4.375 -85.585938
## 2:     q112 -3.100 -31.000000
## 3:     q121 -1.850  -7.053125
## 4:     q122 -0.600  -0.450000
## 5:     q211  0.650   0.528125
## 6:     q212  1.900   7.600000
## 7:     q221  3.150  32.484375
## 8:     q222  4.400  86.900000

Clusters Used in Regression

The right column of plots shows the regression corresponding to each order of NNS partitioning.

for(i in 1 : 3){NNS.part(x, y, order = i, Voronoi = TRUE) ; NNS.reg(x, y, order = i, ncores = 1)}

NNS Regression NNS.reg

NNS.reg can fit any $$f(x)$$, in both the univariate and multivariate case, and returns the list of values shown below.

Univariate:

NNS.reg(x, y, order = 4, noise.reduction = "off", ncores = 1)

## $R2
## [1] 0.9998899
##
## $SE
## [1] 0.7461974
##
## $Prediction.Accuracy
## NULL
##
## $equation
## NULL
##
## $x.star
## NULL
##
## $derivative
##     Coefficient X.Lower.Range X.Upper.Range
##  1:    67.09000        -5.000        -4.600
##  2:    58.87750        -4.600        -4.125
##  3:    43.66125        -4.125        -3.625
##  4:    34.04250        -3.625        -3.000
##  5:    24.00250        -3.000        -2.650
##  6:    15.96250        -2.650        -2.025
##  7:     9.48250        -2.025        -1.400
##  8:     2.92000        -1.400        -0.600
##  9:     0.78250        -0.600         0.650
## 10:     3.09250         0.650         1.425
## 11:     9.84250         1.425         2.050
## 12:    16.44250         2.050         2.700
## 13:    24.56250         2.700         3.025
## 14:    34.72250         3.025         3.650
## 15:    44.05000         3.650         4.150
## 16:    59.31250         4.150         4.600
## 17:    67.09000         4.600         5.000
##
## $Point
## NULL
##
## $Point.est
## NULL
##
## $regression.points
##         x           y
##  1: -5.000 -125.000000
##  2: -4.600  -98.164000
##  3: -4.125  -70.197187
##  4: -3.625  -48.366563
##  5: -3.000  -27.090000
##  6: -2.650  -18.689125
##  7: -2.025   -8.712562
##  8: -1.400   -2.786000
##  9: -0.600   -0.450000
## 10:  0.650    0.528125
## 11:  1.425    2.924813
## 12:  2.050    9.076375
## 13:  2.700   19.764000
## 14:  3.025   27.746813
## 15:  3.650   49.448375
## 16:  4.150   71.473375
## 17:  4.600   98.164000
## 18:  5.000  125.000000
##
## $Fitted
##          y.hat
##   1: -125.0000
##   2: -121.6455
##   3: -118.2910
##   4: -114.9365
##   5: -111.5820
##  ---
## 197:  111.5820
## 198:  114.9365
## 199:  118.2910
## 200:  121.6455
## 201:  125.0000
##
## $Fitted.xy
##          x         y     y.hat NNS.ID gradient
##   1: -5.00 -125.0000 -125.0000  q4444    67.09
##   2: -4.95 -121.2874 -121.6455  q4444    67.09
##   3: -4.90 -117.6490 -118.2910  q4444    67.09
##   4: -4.85 -114.0841 -114.9365  q4444    67.09
##   5: -4.80 -110.5920 -111.5820  q4444    67.09
##  ---
## 197:  4.80  110.5920  111.5820  q1111    67.09
## 198:  4.85  114.0841  114.9365  q1111    67.09
## 199:  4.90  117.6490  118.2910  q1111    67.09
## 200:  4.95  121.2874  121.6455  q1111    67.09
## 201:  5.00  125.0000  125.0000  q1111    67.09

Multivariate:

Multivariate regressions return a plot of $$y$$ and $$\hat{y}$$, as well as the regression points ($RPM) and partitions ($rhs.partitions) for each regressor.

f = function(x, y) x ^ 3 + 3 * y - y ^ 3 - 3 * x
y = x ; z = expand.grid(x, y)
g = f(z[ , 1], z[ , 2])
NNS.reg(z, g, order = "max", ncores = 1)

## $R2
## [1] 1
##
## $rhs.partitions
##         Var1 Var2
##     1: -5.00   -5
##     2: -4.95   -5
##     3: -4.90   -5
##     4: -4.85   -5
##     5: -4.80   -5
##    ---
## 40397:  4.80    5
## 40398:  4.85    5
## 40399:  4.90    5
## 40400:  4.95    5
## 40401:  5.00    5
##
## $RPM
##        Var1  Var2         y.hat
##     1: -4.8 -4.80 -7.105427e-15
##     2: -4.8 -2.55 -8.726063e+01
##     3: -4.8 -2.50 -8.806700e+01
##     4: -4.8 -2.45 -8.883587e+01
##     5: -4.8 -2.40 -8.956800e+01
##    ---
## 40397: -2.6 -2.80  3.776000e+00
## 40398: -2.6 -2.75  2.770875e+00
## 40399: -2.6 -2.70  1.807000e+00
## 40400: -2.6 -2.65  8.836250e-01
## 40401: -2.6 -2.60  1.776357e-15
##
## $Point.est
## NULL
##
## $Fitted
##             y.hat
##     1:   0.000000
##     2:   3.562625
##     3:   7.051000
##     4:  10.465875
##     5:  13.808000
##    ---
## 40397: -13.808000
## 40398: -10.465875
## 40399:  -7.051000
## 40400:  -3.562625
## 40401:   0.000000
##
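Returning to the univariate fit: each row of $derivative is simply the slope between consecutive $regression.points. As a check, the first coefficient can be recomputed in base R from the first two regression points reported above, (-5, -125) and (-4.6, -98.164):

```r
# First two regression points from the univariate NNS.reg output above.
rp <- data.frame(x = c(-5.000, -4.600), y = c(-125.000, -98.164))

# Rise over run between consecutive regression points reproduces the
# first $derivative coefficient.
slope <- diff(rp$y) / diff(rp$x)
slope
# [1] 67.09
```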

Classification

For a classification problem, we simply set NNS.reg(x, y, type = "CLASS", ...).

NNS.reg(iris[ , 1 : 4], iris[ , 5], type = "CLASS", point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est

##  [1] 0.9915350 0.9923555 1.0000000 0.9908216 1.0000000 1.0000000 1.0000000
##  [8] 0.9908216 1.0000000 0.9923555

NNS Dimension Reduction Regression

NNS.reg also provides a dimension reduction regression via the parameter setting NNS.reg(x, y, dim.red.method = "cor", ...), reducing all regressors to a single dimension using the returned equation $equation.

NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", location = "topleft", ncores = 1)$equation

##        Variable Coefficient
## 1: Sepal.Length   0.7825612
## 2:  Sepal.Width  -0.4266576
## 3: Petal.Length   0.9490347
## 4:  Petal.Width   0.9565473
## 5:  DENOMINATOR   4.0000000

Thus, our model for this regression would be: $Species = \frac{0.7825612*Sepal.Length - 0.4266576*Sepal.Width + 0.9490347*Petal.Length + 0.9565473*Petal.Width}{4}$

Threshold

NNS.reg(x, y, dim.red.method = "cor", threshold = ...) offers a method of reducing regressors further by controlling the absolute value of required correlation.

NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, location = "topleft", ncores = 1)$equation

##        Variable Coefficient
## 1: Sepal.Length   0.7825612
## 2:  Sepal.Width   0.0000000
## 3: Petal.Length   0.9490347
## 4:  Petal.Width   0.9565473
## 5:  DENOMINATOR   3.0000000

Thus, our model for this further reduced dimension regression would be: $Species = \frac{0.7825612*Sepal.Length -0*Sepal.Width + 0.9490347*Petal.Length + 0.9565473*Petal.Width}{3}$
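The reduced single regressor implied by this equation can be evaluated directly in base R. A sketch using the coefficients reported above (with Sepal.Width zeroed out by the threshold); NNS.reg then regresses Species against this single synthetic regressor rather than the four originals:

```r
# Coefficients from the threshold = .75 equation above (Sepal.Width dropped).
coefs <- c(Sepal.Length = 0.7825612, Sepal.Width = 0,
           Petal.Length = 0.9490347, Petal.Width  = 0.9565473)
denominator <- 3

# Weighted combination of the regressors, reduced to one dimension.
reduced <- as.matrix(iris[ , 1 : 4]) %*% coefs / denominator
reduced[1]   # ~1.837 for the first iris observation
```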

and point.est = (...) operates in the same manner as the full regression above, again accessed via $Point.est.

NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est

##  [1] 1 1 1 1 1 1 1 1 1 1
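These point estimates are expressed on the internal integer coding of the Species factor, so mapping them back to class labels is a one-liner in base R (a sketch, assuming the usual level order of iris$Species):

```r
# Point estimates returned for the first 10 observations (all 1's above);
# round and index into the factor levels to recover the class labels.
pt <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
levels(iris$Species)[round(pt)]   # all "setosa"
```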

References

If the user is so motivated, detailed arguments and further examples are provided within the following: