
Each of these functions creates a single subsampling trial dataset with the chosen subsampling method.

Usage

subtrialCR(
  x,
  q,
  bin = NULL,
  unit = NULL,
  keep = NULL,
  useFailed = FALSE,
  showFailed = FALSE
)

subtrialOXW(
  x,
  q,
  bin = NULL,
  coll = NULL,
  xexp = 1,
  keep = NULL,
  useFailed = FALSE,
  showFailed = FALSE
)

subtrialSQS(
  x,
  tax,
  q,
  bin = NULL,
  coll = NULL,
  ref = NULL,
  singleton = "occ",
  excludeDominant = FALSE,
  largestColl = FALSE,
  fcorr = "good",
  byList = FALSE,
  keep = NULL,
  useFailed = FALSE,
  showFailed = FALSE,
  appr = "under"
)

Arguments

x

(data.frame): Occurrence dataset, with bin, tax and coll as column names.

q

(numeric): Subsampling level (mandatory). Its meaning depends on the subsampling function: for CR it is the number of occurrences, for O^x^W it is the desired number of occurrences raised to the power of xexp, and for SQS it is the quorum.

bin

(character): The name of the subsetting variable (has to be integer). For time series, this is the time-slice variable. Rows with NA entries in this column will be omitted.

unit

(character): Argument of the CR subsampling type. The name of the variable that designates the subsampling units. In every bin, CR selects a certain number (quota) of entries from the dataset. By default (unit=NULL), the units are the rows, and q rows are selected in each bin. However, the unit can also be a higher-level category that has multiple entries in each bin. If unit is a valid column of the dataset x, then CR selects q entries of this variable and returns all the corresponding rows.

keep

(numeric): The bins that will not be subsampled but will be added to the subsampling trials. If the number of occurrences in a bin does not reach the subsampling quota, by default the bin will not be represented in the subsampling trials. You can force the inclusion of individual bins with the keep argument (to include all failed bins, see the useFailed argument). Only applicable when bin!=NULL.

useFailed

(logical): If the bin does not reach the subsampling quota, should the bin be used? If bin!=NULL and useFailed=TRUE then only TRUE values will be output (indicating the use of the full dataset).

showFailed

(logical): Toggles the output of the function. If set to TRUE the output will be a list, including both the default output (logical vector of rows) and the numeric vector of bins that did not have enough entries to reach the quota q. Only applicable when bin!=NULL.

coll

(character): The variable name of the collection identifiers.

xexp

(numeric): Argument of the OxW type. The exponent of by-list subsampling, by default it is 1.

tax

(character): The name of the taxon variable.

ref

(character): The name of the reference variable, optional - depending on the subsampling method.

singleton

(character): A parameter of SQS. Either "ref", "occ" or FALSE. If set to "occ", the coverage estimator (e.g. Good's u) will be calculated based on the number of single-occurrence taxa. If set to "ref" the number of occurrences belonging to single-reference taxa will be used instead. In case of the inexact algorithm, if set to FALSE then coverage corrections of frequencies will not be applied.

excludeDominant

(logical): Argument of SQS. This parameter sets whether the dominant taxon should be excluded from all calculations involving frequencies (this is the second correction of Alroy, 2010).

largestColl

(logical): Parameter of SQS. This parameter sets whether the occurrences of taxa only ever found in the most diverse collection should be excluded from the count of single-publication occurrences (this is the third correction of Alroy, 2010). Note that largestColl=TRUE depends on excludeDominant=TRUE: setting excludeDominant to FALSE turns this correction off.

fcorr

(character): Parameter of the inexact method of SQS. Either "good" or "alroy". This argument changes the frequency correction procedure of the inexact version of SQS (Alroy 2010). As not all taxa are present in the samples, the sampled frequencies of taxa tend to overestimate their frequencies in the sampling pool. In Alroy (2010) these are corrected using Good's u ("good", default); later versions of SQS use a different correction based on single-occurrence and double-occurrence taxa ("alroy").

byList

(logical): A parameter of the inexact method of SQS. Sets whether occurrences should be subsampled with (FALSE) or without (TRUE) breaking the collection integrity.

appr

(character): A parameter of the inexact method of SQS. Either "under" (the default in the usage above) or "over". The current version is not concerned with small fluctuations around the drawn subsampling quorum. Therefore, in the inexact algorithm, sampling stops when the subset is either immediately below the quorum ("under") or immediately above it ("over").
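The interplay of the coverage estimator (singleton="occ"), the frequency correction and the appr stopping rule can be illustrated with a toy sketch in base R. This is a simplified illustration under stated assumptions, not the package's implementation: the helper sqs_sketch is hypothetical, it handles a single bin only, and scaling the raw frequencies by Good's u is an assumed form of the correction.

```r
# Hypothetical helper: one inexact, SQS-like trial on a single bin.
sqs_sketch <- function(tax, q, appr = "under") {
  tax <- as.character(tax)
  n <- length(tax)
  counts <- table(tax)
  # Good's u: 1 - (number of single-occurrence taxa / number of occurrences)
  u <- 1 - sum(counts == 1) / n
  # Assumed correction: scale the raw taxon frequencies by u
  freq <- (counts / n) * u
  ord <- sample.int(n)          # random draw order of the rows
  drawn <- tax[ord]
  # A taxon adds its frequency to the coverage only when it is first drawn
  gain <- ifelse(duplicated(drawn), 0, as.numeric(freq[drawn]))
  cover <- cumsum(gain)
  keep <- rep(FALSE, n)
  reached <- which(cover >= q)
  if (length(reached) == 0) return(keep)   # quorum cannot be reached
  stop_at <- reached[1]
  # "under": step back one draw so that the coverage stays below the quorum
  if (appr == "under" && cover[stop_at] > q) stop_at <- stop_at - 1
  keep[ord[seq_len(stop_at)]] <- TRUE
  keep
}

tax <- c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d")
set.seed(1); under <- sqs_sketch(tax, q = 0.5, appr = "under")
set.seed(1); over  <- sqs_sketch(tax, q = 0.5, appr = "over")
```

With the same random seed, the "over" subset contains every row of the "under" subset plus, at most, the one draw that crosses the quorum.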

Value

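A logical vector with one element per row of x, indicating which rows belong to the subsampling trial. If showFailed=TRUE, a list that contains this vector and the numeric vector of bins that did not reach the quota.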
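A short base R sketch of how such a logical vector is typically applied. The data frame occ and the vector rows are made-up stand-ins (rows is written by hand here rather than returned by one of the functions above), so the example runs without the package.

```r
# Toy occurrence table: two bins (stg) and a taxon column (genus)
occ <- data.frame(
  stg   = c(1, 1, 1, 2, 2, 2),
  genus = c("A", "B", "C", "A", "A", "D")
)

# Hand-written stand-in for the logical vector of a subsampling trial
rows <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)

# Keep only the rows selected by the trial
trial <- occ[rows, ]

# Example summary: number of distinct genera per bin within the trial
richness <- tapply(trial$genus, trial$stg, function(g) length(unique(g)))
```

Because the vector is aligned with the rows of the original dataset, it can be reused to index any column of x.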

Details

The essence of these functions is present within the subsampling wrapper function subsample. Each function implements a certain subsampling type. By default, the return value of the functions is a logical vector indicating which rows of the original dataset should be present in the subsample. The inexact method for SQS is implemented here as it is computationally less demanding.
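To make the logic of one subsampling type concrete, here is a minimal base R sketch of classical rarefaction with an optional unit column, as described for the unit argument. The helper cr_sketch and the toy data are hypothetical and only mimic the documented behaviour (bins below the quota are left out, rows with NA in the bin column are ignored); the package's own code may differ.

```r
# Hypothetical helper: one classical rarefaction (CR) trial, returns a logical vector.
cr_sketch <- function(x, q, bin, unit = NULL) {
  keep <- rep(FALSE, nrow(x))
  for (b in unique(na.omit(x[[bin]]))) {
    idx <- which(x[[bin]] == b)            # rows of this bin (NA rows drop out)
    if (is.null(unit)) {
      if (length(idx) < q) next            # bin fails the quota: not represented
      keep[idx[sample.int(length(idx), q)]] <- TRUE
    } else {
      units <- unique(x[[unit]][idx])
      if (length(units) < q) next          # too few units in this bin
      chosen <- units[sample.int(length(units), q)]
      # Return every row that belongs to one of the chosen units
      keep[idx[x[[unit]][idx] %in% chosen]] <- TRUE
    }
  }
  keep
}

set.seed(42)
toy <- data.frame(
  stg          = rep(c(1, 2, 3), times = c(6, 6, 2)),
  reference_no = c(1, 1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 9, 9)
)
rows <- cr_sketch(toy, q = 2, bin = "stg", unit = "reference_no")
perBin <- table(unique(toy[rows, c("stg", "reference_no")])$stg)
```

In this toy dataset bin 3 holds a single reference, so it cannot reach q = 2 and none of its rows are selected.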

References:

Alroy, J., Marshall, C. R., Bambach, R. K., Bezusko, K., Foote, M., Fürsich, F. T., … Webber, A. (2001). Effects of sampling standardization on estimates of Phanerozoic marine diversification. Proceedings of the National Academy of Sciences, 98(11), 6261-6266.

Alroy, J. (2010). The Shifting Balance of Diversity Among Major Marine Animal Groups. Science, 329, 1191-1194. https://doi.org/10.1126/science.1189910

Raup, D. M. (1975). Taxonomic Diversity Estimation Using Rarefaction. Paleobiology, 1, 333-342. https://doi.org/10.2307/2400135

Examples

# one classical rarefaction trial
data(corals)
# return 5 references for each stage
bRows <- subtrialCR(corals, bin="stg", unit="reference_no", q=5)
# control: count the selected references in each stage
unCor <- unique(corals[bRows, c("stg", "reference_no")])
table(unCor$stg)
#> 
#> 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 
#>  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5 
#> 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 
#>  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5