These functions create one subsampling trial of a dataset with the desired subsampling method.
Usage
subtrialCR(
x,
q,
bin = NULL,
unit = NULL,
keep = NULL,
useFailed = FALSE,
showFailed = FALSE
)
subtrialOXW(
x,
q,
bin = NULL,
coll = NULL,
xexp = 1,
keep = NULL,
useFailed = FALSE,
showFailed = FALSE
)
subtrialSQS(
x,
tax,
q,
bin = NULL,
coll = NULL,
ref = NULL,
singleton = "occ",
excludeDominant = FALSE,
largestColl = FALSE,
fcorr = "good",
byList = FALSE,
keep = NULL,
useFailed = FALSE,
showFailed = FALSE,
appr = "under"
)
Arguments
- x
(data.frame): Occurrence dataset, with bin, tax and coll as column names.
- q
(numeric): Subsampling level argument (mandatory). Its meaning depends on the subsampling function: it is the number of occurrences (or units) for CR, and the number of desired occurrences to the power of xexp for OxW. It is also the quorum of the SQS method.
- bin
(character): The name of the subsetting variable (has to be integer). For time series, this is the time-slice variable. Rows with NA entries in this column will be omitted.
- unit
(character): Argument of the CR subsampling type. The name of the variable that designates the subsampling units. In every bin, CR selects a certain number (quota) of entries from the dataset. By default (unit=NULL), the units are the rows, and q rows will be selected in each bin. However, this can be a higher-level category that has multiple entries in each bin. If unit is a valid column of the dataset x, then CR will select q entries of this variable and will return all the corresponding rows.
- keep
(numeric): The bins that will not be subsampled but will still be added to the subsampling trials. If the number of occurrences in a bin does not reach the subsampling quota, by default that bin will not be represented in the subsampling trials. You can force the inclusion of individual bins with the keep argument (for all bins, see the useFailed argument). Only applicable when bin!=NULL.
- useFailed
(logical): If a bin does not reach the subsampling quota, should the bin be used? If bin!=NULL and useFailed=TRUE, then only TRUE values will be output (indicating the use of the full dataset).
- showFailed
(logical): Toggles the output of the function. If set to TRUE, the output will be a list including both the default output (logical vector of rows) and the numeric vector of bins that did not have enough entries to reach the quota q. Only applicable when bin!=NULL.
- coll
(character): The variable name of the collection identifiers.
- xexp
(numeric): Argument of the OxW type. The exponent of by-list subsampling; by default it is 1.
- tax
(character): The name of the taxon variable.
- ref
(character): The name of the reference variable; optional, depending on the subsampling method.
- singleton
(character): A parameter of SQS. Either "ref", "occ" or FALSE. If set to "occ", the coverage estimator (e.g. Good's u) will be calculated based on the number of single-occurrence taxa. If set to "ref", the number of occurrences belonging to single-reference taxa will be used instead. In the case of the inexact algorithm, if set to FALSE, coverage corrections of frequencies will not be applied.
- excludeDominant
(logical): Argument of SQS. This parameter sets whether the dominant taxon should be excluded from all calculations involving frequencies (this is the second correction of Alroy, 2010).
- largestColl
(logical): Parameter of SQS. This parameter sets whether the occurrences of taxa only ever found in the most diverse collection should be excluded from the count of single-publication occurrences (this is the third correction of Alroy, 2010). Note that largestColl=TRUE depends on excludeDominant=TRUE: setting excludeDominant to FALSE will turn this correction off.
- fcorr
(character): Parameter for the inexact method of SQS. Either "good" or "alroy". This argument changes the frequency correction procedure of the 'inexact' version of SQS (Alroy, 2010). As not all taxa are present in the samples, the sampled frequencies of taxa tend to overestimate their frequencies in the sampling pool. In Alroy (2010) these are corrected using Good's u ("good", default); later versions of SQS replace this with a correction based on the numbers of single-occurrence and double-occurrence taxa ("alroy").
- byList
(logical): A parameter of the "inexact" method of SQS. Sets whether occurrences should be subsampled with (FALSE) or without (TRUE) breaking the collection integrity.
- appr
(character): A parameter of the inexact method of SQS. Either "under" (default) or "over". The current version is not concerned with small fluctuations around the drawn subsampling quorum. Therefore, in the inexact algorithm, sampling is finished when the subset is either immediately below the quorum ("under") or just above it ("over").
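To make the coverage estimator behind the q and singleton arguments more concrete, the following base-R sketch computes Good's u for the occurrences of one bin as u = 1 - n1/N, where N is the number of occurrences. With singleton = "occ", n1 is the number of single-occurrence taxa; with singleton = "ref", n1 is the number of occurrences belonging to single-reference taxa. This is an illustration under stated assumptions, not the divDyn implementation: the helper goodsU() and its arguments are hypothetical, and the data are assumed to have one row per occurrence.

```r
# Hypothetical helper (not part of divDyn): Good's u for the occurrences of one bin.
# Assumes one row per occurrence; 'tax' and 'ref' are column names.
goodsU <- function(dat, tax, ref = NULL, singleton = "occ") {
  n <- nrow(dat)
  if (singleton == "occ") {
    # number of taxa that occur exactly once
    n1 <- sum(table(dat[[tax]]) == 1)
  } else if (singleton == "ref") {
    # number of distinct references per taxon
    refPerTax <- tapply(dat[[ref]], dat[[tax]], function(r) length(unique(r)))
    singleRefTaxa <- names(refPerTax)[refPerTax == 1]
    # occurrences belonging to single-reference taxa
    n1 <- sum(dat[[tax]] %in% singleRefTaxa)
  } else {
    stop("singleton must be 'occ' or 'ref' in this sketch")
  }
  1 - n1 / n
}

# toy bin: A occurs 3 times, B twice, C and D once each
toy <- data.frame(
  genus = c("A", "A", "A", "B", "B", "C", "D"),
  reference_no = c(1, 2, 2, 1, 1, 3, 1)
)
goodsU(toy, tax = "genus")
goodsU(toy, tax = "genus", ref = "reference_no", singleton = "ref")
```

With singleton = "occ", taxa C and D are the singletons, so u = 1 - 2/7. With singleton = "ref", B, C and D each come from a single reference, and their four occurrences give u = 1 - 4/7.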
Details
The essence of these functions is present within the subsampling wrapper function subsample. Each function implements one subsampling type.
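The by-list (OxW) type can be pictured with the following base-R sketch of a single trial in one bin. It is not the divDyn code: the function oxwTrialSketch() is hypothetical, and it assumes that whole collections are drawn in random order until the sum of their occurrence counts raised to xexp reaches q, and that a bin unable to reach q is left out entirely. With xexp = 1, this reduces to drawing collections until q occurrences are reached.

```r
# Hypothetical sketch of one OxW (by-list) trial within a single bin.
# Assumption: collections are drawn whole, in random order, until the sum of
# (occurrences per collection)^xexp reaches q.
oxwTrialSketch <- function(dat, coll, q, xexp = 1) {
  ids <- unique(dat[[coll]])
  # occurrences per collection, in the order of 'ids'
  sizes <- as.numeric(table(factor(dat[[coll]], levels = ids)))
  weights <- sizes^xexp
  # assumed behaviour: a bin that cannot reach the quota is left out
  if (sum(weights) < q) return(rep(FALSE, nrow(dat)))
  ord <- sample.int(length(ids))          # random drawing order of collections
  nDrawn <- which(cumsum(weights[ord]) >= q)[1]
  chosen <- ids[ord][seq_len(nDrawn)]
  dat[[coll]] %in% chosen                 # TRUE for every row of a drawn collection
}

# toy bin: collections 1-4 with 3, 2, 1 and 4 occurrences
toyColl <- data.frame(collection_no = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4))
set.seed(1)
oxwRows <- oxwTrialSketch(toyColl, coll = "collection_no", q = 20, xexp = 2)
```

Because collections are kept whole, the weighted total usually overshoots q slightly; raising xexp favours stopping after a few large collections.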
By default, the return value of the functions is a logical vector indicating which rows of the original dataset should be present in the subsample.
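A minimal base-R sketch of one CR trial shows how such a logical vector can be built: in each bin, q units are drawn without replacement, and every row belonging to a drawn unit is flagged TRUE. The function crTrialSketch() is hypothetical and is not the divDyn implementation. It ignores keep and showFailed, and it follows the argument descriptions for unit and useFailed: rows are the default units, rows with NA bins are omitted, and bins that miss the quota are dropped unless useFailed = TRUE.

```r
# Hypothetical sketch of one classical rarefaction (CR) trial (not the divDyn code).
crTrialSketch <- function(dat, bin, q, unit = NULL, useFailed = FALSE) {
  # by default, every row is its own unit
  units <- if (is.null(unit)) seq_len(nrow(dat)) else dat[[unit]]
  keepRow <- rep(FALSE, nrow(dat))
  bins <- unique(dat[[bin]][!is.na(dat[[bin]])])  # rows with NA bins are ignored
  for (b in bins) {
    inBin <- !is.na(dat[[bin]]) & dat[[bin]] == b
    binUnits <- unique(units[inBin])
    if (length(binUnits) < q) {
      # quota not reached: use the whole bin only if useFailed = TRUE
      if (useFailed) keepRow[inBin] <- TRUE
      next
    }
    # draw q units without replacement (index-based, so length-1 input is safe)
    drawn <- binUnits[sample.int(length(binUnits), q)]
    keepRow[inBin & units %in% drawn] <- TRUE
  }
  keepRow
}

toyOcc <- data.frame(
  stg = c(1, 1, 1, 1, 2, 2, NA),
  reference_no = c(10, 10, 11, 12, 20, 20, 30)
)
set.seed(42)
crRows <- crTrialSketch(toyOcc, bin = "stg", q = 2, unit = "reference_no")
```

In this toy case, stage 1 keeps the rows of two of its three references, while stage 2 has only one reference and is dropped.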
The inexact method for SQS is implemented here as it is computationally less demanding.
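The idea of the inexact SQS draw can be sketched in base R for a single bin: occurrences are drawn in random order, and each time a taxon is drawn for the first time, its coverage-corrected frequency is added to a running total until the quorum q is met. This is a simplified illustration, not the divDyn implementation. The function sqsTrialSketch() is hypothetical and assumes byList = FALSE, singleton = "occ", fcorr = "good", excludeDominant = FALSE and largestColl = FALSE; the appr argument mirrors the "under"/"over" stopping rule described above.

```r
# Hypothetical sketch of one inexact SQS trial in a single bin (not the divDyn code).
# Simplifications: occurrences drawn one by one, Good's u from single-occurrence
# taxa, Good's-u frequency correction, no dominant-taxon exclusion.
sqsTrialSketch <- function(dat, tax, q, appr = "under") {
  taxa <- as.character(dat[[tax]])
  N <- length(taxa)
  counts <- table(taxa)
  u <- 1 - sum(counts == 1) / N            # Good's u
  adjFreq <- counts / N * u                # corrected taxon frequencies (sum to u)
  if (u < q) return(rep(FALSE, N))         # the quorum cannot be reached
  ord <- sample.int(N)                     # random drawing order of occurrences
  drawnTaxa <- taxa[ord]
  # a taxon adds its corrected frequency only the first time it is drawn
  gain <- ifelse(duplicated(drawnTaxa), 0, as.numeric(adjFreq[drawnTaxa]))
  coverage <- cumsum(gain)
  if (appr == "under") {
    # stop immediately before the draw that would exceed the quorum
    above <- which(coverage > q)
    k <- if (length(above) > 0) above[1] - 1 else N
  } else {
    # stop at the first draw that reaches the quorum
    reached <- which(coverage >= q)
    k <- if (length(reached) > 0) reached[1] else N
  }
  keepRow <- rep(FALSE, N)
  keepRow[ord[seq_len(k)]] <- TRUE
  keepRow
}

# toy bin: N = 10, C and D are singletons, so u = 0.8
toyTax <- data.frame(genus = c(rep("A", 5), rep("B", 3), "C", "D"))
set.seed(3)
underRows <- sqsTrialSketch(toyTax, tax = "genus", q = 0.5, appr = "under")
overRows <- sqsTrialSketch(toyTax, tax = "genus", q = 0.5, appr = "over")
```

In this sketch, appr only decides whether the draw of the new taxon that pushes the running coverage past q is kept ("over") or discarded ("under").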
References:
Alroy, J., Marshall, C. R., Bambach, R. K., Bezusko, K., Foote, M., Fürsich, F. T., … Webber, A. (2001). Effects of sampling standardization on estimates of Phanerozoic marine diversification. Proceedings of the National Academy of Sciences, 98(11), 6261-6266.
Alroy, J. (2010). The Shifting Balance of Diversity Among Major Marine Animal Groups. Science, 329, 1191-1194. https://doi.org/10.1126/science.1189910
Raup, D. M. (1975). Taxonomic Diversity Estimation Using Rarefaction. Paleobiology, 1, 333-342. https://doi.org/10.2307/2400135
Examples
# one classical rarefaction trial
library(divDyn)
data(corals)
# return 5 references for each stage
bRows<-subtrialCR(corals, bin="stg", unit="reference_no", q=5)
# check: number of distinct references retained in each stage
unCor<-unique(corals[bRows,c("stg", "reference_no")])
table(unCor$stg)
#>
#> 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
#> 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#> 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
#> 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5