duplicated {base} | R Documentation |
Determines which elements of a vector or data frame are duplicates of elements with smaller subscripts, and returns a logical vector indicating which elements (rows) are duplicates.
```r
duplicated(x, incomparables = FALSE, ...)

## Default S3 method:
duplicated(x, incomparables = FALSE, fromLast = FALSE, ...)

## S3 method for class 'array':
duplicated(x, incomparables = FALSE, MARGIN = 1L,
           fromLast = FALSE, ...)
```
| Argument | Description |
|---|---|
| x | a vector, a data frame, an array or NULL. |
| incomparables | a vector of values that cannot be compared. FALSE is a special value, meaning that all values can be compared, and may be the only value accepted for methods other than the default. It will be coerced internally to the same type as x. |
| fromLast | logical indicating whether duplication should be considered from the reverse side, i.e., the last (or rightmost) of identical elements would correspond to duplicated = FALSE. |
| ... | arguments for particular methods. |
| MARGIN | the array margin to be held fixed: see apply. |
This is a generic function with methods for vectors (including lists), data frames and arrays (including matrices).
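A minimal illustration of the generic applied to an atomic vector and to a list; for the list, each element is compared as a whole object. The sample vectors v and l are arbitrary data chosen for this sketch.

```r
# Character vector: the second "b" has an equal element with a smaller subscript
v <- c("b", "a", "b")
dup_v <- duplicated(v)
print(dup_v)  # FALSE FALSE TRUE

# List: elements are compared as whole objects, so the second 1:2 is flagged
l <- list(1:2, "a", 1:2, 3)
dup_l <- duplicated(l)
print(dup_l)  # FALSE FALSE TRUE FALSE
```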
duplicated(x, fromLast = TRUE) is equivalent to, but faster than, rev(duplicated(rev(x))).
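A short check of that equivalence on an arbitrary numeric vector x (sample data for this sketch). The reversing version allocates reversed copies of the input and of the result, which is one reason the direct fromLast = TRUE call is preferable.

```r
x <- c(3, 1, 3, 2, 1)

# Keep the *last* occurrence of each value as the non-duplicate
fast <- duplicated(x, fromLast = TRUE)
# Same result by reversing, scanning forwards, and reversing back
slow <- rev(duplicated(rev(x)))

print(fast)  # TRUE TRUE FALSE FALSE FALSE
stopifnot(identical(fast, slow))
```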
The data frame method works by pasting together a character representation of each row, with fields separated by \r, so it may be imperfect if the data frame contains character strings with embedded carriage returns or columns that do not reliably map to characters.
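The sketch below imitates the paste-based row comparison described above. paste_rows_duplicated is a hypothetical helper written for illustration only; it is not the source of R's data frame method, whose implementation may differ between R versions. The second data frame shows how embedded carriage returns can make two different rows collapse to the same key.

```r
# Hypothetical helper: imitates the paste-based approach, not R's actual code
paste_rows_duplicated <- function(df, fromLast = FALSE) {
  # Collapse each row into one string, fields separated by "\r";
  # unname() keeps column names from being matched to paste() arguments
  keys <- do.call(paste, c(unname(as.list(df)), sep = "\r"))
  duplicated(keys, fromLast = fromLast)
}

df_demo <- data.frame(a = c(1, 2, 1, 1), b = c("x", "y", "x", "z"))
print(paste_rows_duplicated(df_demo))  # FALSE FALSE TRUE FALSE
print(duplicated(df_demo))             # FALSE FALSE TRUE FALSE

# Both rows paste to "p\rq\rr", so the helper wrongly flags row 2
df_cr <- data.frame(a = c("p\rq", "p"), b = c("r", "q\rr"))
print(paste_rows_duplicated(df_cr))    # FALSE TRUE
```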
The array method determines, for each element of the sub-array specified by MARGIN, whether the remaining dimensions are identical to those of an earlier (or later, when fromLast = TRUE) element (in row-major order). This would most commonly be used to find duplicated rows (the default) or columns (with MARGIN = 2).
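A small sketch of the array method on matrices, finding a repeated row with the default MARGIN and a repeated column with MARGIN = 2. The matrices m and m2 are sample data; as.vector() is used only to drop any dim or names attributes so the results compare as plain logical vectors.

```r
# Rows of m are (1,4), (2,5), (1,4), (3,6); row 3 repeats row 1
m <- matrix(c(1, 2, 1, 3,
              4, 5, 4, 6), nrow = 4)
rows_dup      <- as.vector(duplicated(m))                   # MARGIN = 1: rows
rows_dup_last <- as.vector(duplicated(m, fromLast = TRUE))  # keep last copy

# Column "c" repeats column "a"
m2 <- cbind(a = 1:3, b = 4:6, c = 1:3)
cols_dup <- as.vector(duplicated(m2, MARGIN = 2))

print(rows_dup)       # FALSE FALSE TRUE FALSE
print(rows_dup_last)  # TRUE FALSE FALSE FALSE
print(cols_dup)       # FALSE FALSE TRUE
```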
Missing values are regarded as equal, but NaN is not equal to NA_real_.
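A quick illustration of that rule on a sample numeric vector y: repeated NAs are flagged, a NaN is not treated as a copy of NA, and a second NaN is flagged as a copy of the first.

```r
# NA (coerced to NA_real_) and NA_real_ are the same missing value;
# NaN is a distinct value that only matches another NaN
y <- c(NA, NA, NaN, NA_real_, NaN)
dup_y <- duplicated(y)
print(dup_y)  # FALSE TRUE FALSE TRUE TRUE
```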
Values in incomparables will never be marked as duplicated. This is intended to be used for a fairly small set of values and will not be efficient for a very large set.
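A minimal sketch of incomparables with the default method, using a sample vector z: declaring NA incomparable stops the repeated NA from being flagged, while ordinary repeated values are still detected.

```r
z <- c(1, NA, 2, NA, 1)

# By default the second NA counts as a duplicate of the first
dup_default <- duplicated(z)
# With NA declared incomparable, no NA is ever flagged
dup_incomp <- duplicated(z, incomparables = NA)

print(dup_default)  # FALSE FALSE FALSE TRUE TRUE
print(dup_incomp)   # FALSE FALSE FALSE FALSE TRUE
```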
For a vector input, a logical vector of the same length as x. For a data frame, a logical vector with one element for each row. For a matrix or array, a logical array with the same dimensions and dimnames.
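Because the data frame result has one element per row, its negation can index rows directly. A common idiom, sketched here on a sample data frame df_rows, keeps the first occurrence of each distinct row.

```r
df_rows <- data.frame(id = c(1, 2, 2, 3), grp = c("a", "b", "b", "c"))

keep <- !duplicated(df_rows)   # one logical per row of df_rows
stopifnot(length(keep) == nrow(df_rows))

deduped <- df_rows[keep, ]     # drops row 3, which repeats row 2
print(deduped)
```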
Using this for lists is potentially slow, especially if the elements are not atomic vectors (see vector) or differ only in their attributes. In the worst case it is O(n^2), where n is the length of the list.
```r
x <- c(9:20, 1:5, 3:7, 0:8)
## extract unique elements
(xu <- x[!duplicated(x)])
## similar, but not the same:
(xu2 <- x[!duplicated(x, fromLast = TRUE)])

## xu == unique(x) but unique(x) is more efficient
stopifnot(identical(xu,  unique(x)),
          identical(xu2, unique(x, fromLast = TRUE)))

duplicated(iris)[140:143]

duplicated(iris3, MARGIN = c(1, 3))
```