This basic function replaces groups of values in a vector with single values with the help of a key object.
Arguments
- x
(vector)
Object containing the values to be replaced.
- key
(list)
A list of vectors. Each vector includes the possible elements that will be replaced in a group; the names of the vectors will be the replacement values. The list also has to include an element named 'default' with a single value (see examples).
- incbound
(character)
Either "lower" or "higher". Interval identifiers will be treated with different interval rules. "lower" treats the lower boundary of each interval as included, "higher" treats the upper boundary as included instead. The argument will be renamed to 'include.lowest' to make the interface easier to remember.
Details
Online datasets usually contain overly detailed information, as enterers intend to conserve as much data as possible in the entry process. In analyses, however, several distinct values are often treated as representing the same, less-detailed information, which is then used in further procedures. The categorize function allows users to do this type of multiple replacement using a specific object called a 'key'.
A key is an informal class and is essentially a list of vectors. In the case of character vectors as x, each vector element in the list corresponds to a set of entries in x. These will be replaced by the name of the vector in the list, to indicate their assumed identity.
In the case of numeric x vectors, if a list element of the key is a numeric vector with 2 values, then this vector will be treated as an interval. The same value will be assigned to the entries that are in this interval (Example 2). If x contains values that form the boundaries of an interval, then only one of the two boundary values is considered to be in the interval (see the incbound argument to set which of the two).
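The boundary rule can be sketched in base R as follows. This is only an illustration of the behaviour described above, not the package's own code: the helper name in_interval is hypothetical, and it assumes that "lower" means a half-open interval closed at the lower end, and "higher" one closed at the upper end.

```r
# Hypothetical helper (not part of the package): flags which values of x
# fall inside the two-value interval `bounds` under either boundary rule.
in_interval <- function(x, bounds, incbound = "lower") {
  lo <- min(bounds)
  hi <- max(bounds)
  if (incbound == "lower") {
    x >= lo & x < hi    # lower boundary included, upper boundary excluded
  } else {
    x > lo & x <= hi    # upper boundary included, lower boundary excluded
  }
}

x <- 4:11
x[in_interval(x, c(5, 10), incbound = "lower")]   # 5 6 7 8 9
x[in_interval(x, c(5, 10), incbound = "higher")]  # 6 7 8 9 10
```

Under the "lower" rule the value 10 falls outside the interval, which matches Example 2, where 10 is assigned the default value.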
The elements of key are looped through in sequence. If values of x occur in multiple elements of key, then the last one will be used (Example 3).
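The "last one wins" behaviour follows directly from visiting the key elements in order and overwriting earlier assignments. A minimal base R sketch for character input, assuming this overwrite logic (the function name replace_by_key is hypothetical and the package's actual implementation may differ):

```r
# Hypothetical re-implementation for character input only.
replace_by_key <- function(x, key) {
  # start from the default value everywhere
  out <- rep(as.character(key$default), length(x))
  # visit the remaining key elements in order; later ones overwrite earlier ones
  for (nm in setdiff(names(key), "default")) {
    out[x %in% key[[nm]]] <- nm
  }
  out
}

x <- c("a", "b", "c", "d")
key <- list(first = c("a", "b"), second = c("a", "d"), default = NA)
replace_by_key(x, key)
# "second" "first" NA "second"
```

Here "a" is first labelled 'first' and then relabelled 'second', because 'second' comes later in the key.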
Examples of this data type (keys) have been included to help process Paleobiology Database occurrences.
Examples
# Example 1
# x, as character
set.seed(1000)
toReplace <- sample(letters[1:6], 15, replace=TRUE)
# a and b should mean 'first', c and d 'second', others: NA
key<-list(first=c("a", "b"), second=c("c", "d"), default=NA)
# do the replacement
categorize(toReplace, key)
#> [1] "second" "second" NA "second" NA "second" NA "first"
#> [9] NA NA NA NA "first" "first" NA
# Example 2 - numeric entries and mixed types
# basic vector to be grouped
toReplace2<-1:16
# replacement rules: 5,6,7,8,9 should be "more", 11 should be "eleven", the rest: "other"
key2<-list(default="other", more=c(5,10),eleven=11)
categorize(toReplace2, key2)
#> [1] "other" "other" "other" "other" "more" "more" "more" "more"
#> [9] "more" "other" "eleven" "other" "other" "other" "other" "other"
# Example 3 - multiple occurrences of same values
# a and b should mean 'first', a and d should mean 'second', others: NA
key3<-list(first=c("a", "b"), second=c("a", "d"), default=NA)
# do the replacement (all "a" entries will be replaced with "second")
categorize(toReplace, key3)
#> [1] "second" NA NA NA NA NA NA "first"
#> [9] NA NA NA NA "second" "second" NA