Data

##                                                                           species name
## ABIEALB                                                                     Abies alba
## ABIEGRA                                                                  Abies grandis
## ABIENOR                                                             Abies nordmanniana
## ACERGRA                           Large Maples (Acer platanoides, Acer pseudoplatanus)
## ACERPET Small Maples (Acer campestre, Acer monspessulanum, Acer negundo, Acer opalus) 
## ALNUGLU                                                                Alnus glutinosa

Poisson stochastic blockmodel (SBM)

Assuming \(K\) clusters, \[\begin{align*} \{Z_i\}_{1 \leq i \leq n} & \text{ iid} & Z_i & \sim \mathcal{M}(1, \pi) \\ \{Y_{ij}\}_{1 \leq i, j \leq n} & \text{ indep.} \mid Z & (Y_{ij} \mid Z_i=k, Z_j = \ell) & \sim \mathcal{P}(\lambda_{k\ell}) \end{align*}\]
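The generative model above can be sketched as a small simulation. This is only an illustration, not the code that produced the results in this document: the function names are hypothetical, and it additionally assumes an undirected network without self-loops (symmetric \(Y\), zero diagonal), which the model statement does not specify.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_poisson_sbm(n, pi, lam, seed=0):
    """Simulate labels Z and a count matrix Y from a Poisson SBM.

    Assumption (not stated in the text): Y is symmetric with a zero diagonal.
    """
    rng = random.Random(seed)
    # Z_i iid M(1, pi): one cluster label (0-based) per node
    z = rng.choices(range(len(pi)), weights=pi, k=n)
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Y_ij | Z_i = k, Z_j = l ~ Poisson(lambda_kl)
            y[i][j] = y[j][i] = sample_poisson(lam[z[i]][z[j]], rng)
    return z, y

# Toy parameters chosen for illustration only
z, y = simulate_poisson_sbm(8, [0.5, 0.5], [[5.0, 0.5], [0.5, 2.0]], seed=1)
print(z)
print(y[0])
```

Knuth's sampler is adequate for the small rates used here; for large \(\lambda_{k\ell}\) a library sampler would be the more sensible choice.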

Choosing the number of clusters

Parameter estimates
## pi =
## [1] 0.1569772 0.1763561 0.1569772 0.1957351 0.1375982 0.1763561
## lambda =
##            [,1]      [,2]       [,3]      [,4]       [,5]       [,6]
## [1,] 0.54121406 1.8546421 0.14089332 0.4999912 1.46245649 0.02263218
## [2,] 1.85464212 9.0286270 1.06850270 3.2910355 6.04727443 0.11062572
## [3,] 0.14089332 1.0685027 0.82166374 2.1761007 0.40263458 0.07793391
## [4,] 0.49999123 3.2910355 2.17610068 5.9646020 1.33637565 0.26547703
## [5,] 1.46245649 6.0472744 0.40263458 1.3363757 3.94065914 0.07080814
## [6,] 0.02263218 0.1106257 0.07793391 0.2654770 0.07080814 0.02940543
Classification
##             tau1        tau2        tau3        tau4        tau5        tau6
## [1,] 0.001945525 0.990272374 0.001945525 0.001945525 0.001945525 0.001945525
## [2,] 0.001945525 0.990272374 0.001945525 0.001945525 0.001945525 0.001945525
## [3,] 0.001945525 0.001945525 0.001945525 0.001945525 0.990272374 0.001945525
## [4,] 0.001945525 0.001945525 0.001945525 0.990272374 0.001945525 0.001945525
## [5,] 0.001945525 0.001945525 0.990272374 0.001945525 0.001945525 0.001945525
## [6,] 0.001945525 0.001945525 0.990272374 0.001945525 0.001945525 0.001945525
##         entropy
## [1,] 0.07040218
## [2,] 0.07040218
## [3,] 0.07040218
## [4,] 0.07040218
## [5,] 0.07040218
## [6,] 0.07040218
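The two tables above are linked: each row of `tau` gives the estimated probabilities that a species belongs to each cluster, and the entropy summarises how uncertain that assignment is. The sketch below (helper names are hypothetical) recovers the most probable cluster and the entropy for the first row; the reported value 0.0704 is consistent with Shannon entropy computed with the natural logarithm.

```python
import math

def map_label(tau_row):
    """Return the 1-based index of the most probable cluster."""
    return max(range(len(tau_row)), key=lambda k: tau_row[k]) + 1

def entropy(tau_row):
    """Shannon entropy (in nats) of one row of classification probabilities."""
    return -sum(t * math.log(t) for t in tau_row if t > 0)

# First row of the tau table above
tau_first = [0.001945525, 0.990272374, 0.001945525,
             0.001945525, 0.001945525, 0.001945525]

print(map_label(tau_first))          # 2
print(round(entropy(tau_first), 4))  # 0.0704
```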
Content of the clusters

Taxonomy of the tree species

Comparison with SBM clusters

##                 clusterSBM
## treeGroup         1  2  3  4  5  6
##   Conipherophyta  7  9  0  0  7  2
##   Magnoliphyta    1  0  8 10  0  7

Poisson regression stochastic blockmodel (SBM)

Accounting for the taxonomic distance

Denote \[ x_{ij} = \text{taxonomic distance between species $i$ and $j$} \]

Assuming \(K\) clusters, \[\begin{align*} \{Z_i\}_{1 \leq i \leq n} & \text{ iid} & Z_i & \sim \mathcal{M}(1, \pi) \\ \{Y_{ij}\}_{1 \leq i, j \leq n} & \text{ indep.} \mid Z & (Y_{ij} \mid Z_i=k, Z_j = \ell) & \sim \mathcal{P}(\lambda_{k\ell}\exp(\beta x_{ij})) \end{align*}\]

Choosing the number of clusters

Parameter estimates
## pi =
## [1] 0.3715916 0.1575904 0.2548637 0.2159544
## beta =
##            [,1]
## [1,] -0.4220965
## lambda =
##            [,1]       [,2]      [,3]       [,4]
## [1,] 23.2956770 0.81937063 4.4070148 11.9195905
## [2,]  0.8193706 0.01170769 0.3393714  0.2412392
## [3,]  4.4070148 0.33937136 1.0924972  2.7247345
## [4,] 11.9195905 0.24123923 2.7247345  7.2684830
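To read these estimates, recall that the Poisson mean is \(\lambda_{k\ell}\exp(\beta x_{ij})\), so each additional unit of taxonomic distance multiplies the expected count by \(\exp(\beta)\). The sketch below evaluates this with the estimates printed above; the function name and the distances 0, 1, 2 are hypothetical, since the scale of \(x_{ij}\) is not given here.

```python
import math

def expected_count(lam_kl, beta, x_ij):
    """Poisson mean lambda_kl * exp(beta * x_ij) for one pair (i, j)."""
    return lam_kl * math.exp(beta * x_ij)

beta_hat = -0.4220965   # estimate of beta reported above
lam_11 = 23.2956770     # lambda[1, 1] reported above

# Factor applied to the mean for each additional unit of taxonomic distance
print(round(math.exp(beta_hat), 3))   # 0.656

# Expected counts within cluster 1 at hypothetical distances 0, 1, 2
for x in (0, 1, 2):
    print(x, expected_count(lam_11, beta_hat, x))
```

Because \(\hat\beta\) is negative, the expected count decreases as taxonomic distance grows, by roughly a third per unit of distance.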

Comparison with SBM clusters and taxonomy

##              clusterSBM
## clusterSBMreg  1  2  3  4  5  6
##             1  0  9  0 10  0  0
##             2  0  0  0  0  0  8
##             3  8  0  4  0  0  1
##             4  0  0  4  0  7  0
##                 clusterSBMreg
## treeGroup         1  2  3  4
##   Conipherophyta  9  2  7  7
##   Magnoliphyta   10  6  6  4
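One way to summarise a cross-table such as the first one above (`clusterSBMreg` versus `clusterSBM`) in a single number is the adjusted Rand index, which equals 1 when two partitions coincide and is close to 0 for unrelated partitions. The source does not report this index; the sketch below is an added illustration with a hypothetical function name.

```python
from math import comb

def adjusted_rand_index(table):
    """Adjusted Rand index from a contingency table (list of rows of counts).

    Undefined (division by zero) when both partitions have a single cluster.
    """
    n = sum(sum(row) for row in table)
    pairs_cells = sum(comb(c, 2) for row in table for c in row)
    pairs_rows = sum(comb(sum(row), 2) for row in table)
    pairs_cols = sum(comb(sum(col), 2) for col in zip(*table))
    expected = pairs_rows * pairs_cols / comb(n, 2)
    maximum = (pairs_rows + pairs_cols) / 2
    return (pairs_cells - expected) / (maximum - expected)

# clusterSBMreg (rows) versus clusterSBM (columns), copied from the table above
sbmreg_vs_sbm = [
    [0, 9, 0, 10, 0, 0],
    [0, 0, 0, 0, 0, 8],
    [8, 0, 4, 0, 0, 1],
    [0, 0, 4, 0, 7, 0],
]
print(adjusted_rand_index(sbmreg_vs_sbm))
```

The same function applies unchanged to the `treeGroup` tables, which makes it possible to compare how strongly each clustering aligns with the taxonomy.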

Including more covariates

Denote

  • \(x^1_{ij} =\) taxonomic distance between species \(i\) and \(j\),
  • \(x^2_{ij} =\) geographic distance between species \(i\) and \(j\),
  • \(x^3_{ij} =\) (log10-)genetic distance between species \(i\) and \(j\),
  • \(x_{ij} = [x^1_{ij} \; x^2_{ij} \; x^3_{ij}]^\intercal\),

and suppose \[\begin{align*} \{Z_i\}_{1 \leq i \leq n} & \text{ iid} & Z_i & \sim \mathcal{M}(1, \pi) \\ \{Y_{ij}\}_{1 \leq i, j \leq n} & \text{ indep.} \mid Z & (Y_{ij} \mid Z_i=k, Z_j = \ell) & \sim \mathcal{P}(\lambda_{k\ell}\exp(x_{ij}^\intercal \beta)) \end{align*}\]
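Taking logarithms of the Poisson mean makes the role of each covariate explicit. Writing \(\beta = [\beta_1 \; \beta_2 \; \beta_3]^\intercal\) (component names introduced here for illustration), the model is log-linear given the cluster memberships: \[ \log \mathbb{E}[Y_{ij} \mid Z_i = k, Z_j = \ell] = \log \lambda_{k\ell} + \beta_1 x^1_{ij} + \beta_2 x^2_{ij} + \beta_3 x^3_{ij} \] so \(\log \lambda_{k\ell}\) acts as a block-specific intercept and each \(\beta_m\) is the change in the log expected count per unit of the corresponding distance.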

##                   1         2         3         4         5         6
## ICLreg    -2895.342 -2079.961 -1969.226 -1930.106 -1946.129 -1966.994
## ICLregAll -2747.774 -2066.117 -1960.134 -1931.177 -1947.083 -1969.800

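Including geographic and genetic distances does not improve on the model that uses taxonomic distance alone: both criteria are maximal at 4 clusters, where ICLreg (-1930.106) is slightly higher than ICLregAll (-1931.177).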
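The selection rule implied by the table is to keep the number of clusters with the largest ICL. A minimal sketch follows, using the values printed above and reading `ICLreg` as the taxonomic-distance model and `ICLregAll` as the three-covariate model, as the names suggest; `best_k` is a hypothetical helper.

```python
icl = {
    "ICLreg":    [-2895.342, -2079.961, -1969.226, -1930.106, -1946.129, -1966.994],
    "ICLregAll": [-2747.774, -2066.117, -1960.134, -1931.177, -1947.083, -1969.800],
}

def best_k(values):
    """Return (K, ICL) for the largest ICL; K is 1-based."""
    k = max(range(len(values)), key=lambda i: values[i])
    return k + 1, values[k]

for name, values in icl.items():
    print(name, best_k(values))
# ICLreg (4, -1930.106)
# ICLregAll (4, -1931.177)
```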

Parameters and clustering

## beta =
## [1] -0.4176394 -0.3261760  0.0868714
##              clusterSBMregAll
## clusterSBMreg  1  2  3  4
##             1  0  0 19  0
##             2  0  0  0  8
##             3 12  0  0  1
##             4  0 11  0  0