I have a huge vector which has a couple of NA
values, and I'm trying to find the max value in that vector (the vector is all numbers), but I can't do this because of the NA
values.
How can I remove the NA
values so that I can compute the max?
I have a huge vector which has a couple of NA
values, and I'm trying to find the max value in that vector (the vector is all numbers), but I can't do this because of the NA
values.
How can I remove the NA
values so that I can compute the max?
Trying ?max
, you'll see that it actually has a na.rm =
argument, set by default to FALSE
. (That's the common default for many other R functions, including sum()
, mean()
, etc.)
Setting na.rm=TRUE
does just what you're asking for:
d <- c(1, 100, NA, 10)
max(d, na.rm=TRUE)
If you do want to remove all of the NA
s, use this idiom instead:
d <- d[!is.na(d)]
A final note: Other functions (e.g. table()
, lm()
, and sort()
) have NA
-related arguments that use different names (and offer different options). So if NA
's cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.
-Inf
for a d
of all NAs.
Aug 1 '19 at 23:27
max()
behaves (as, for instance, when doing max(c(NA, NA)
). Personally, I think its behavior is reasonable; I expect it was constructed that way so that you get the expected result when doing things like a <- c(NA, NA); b <- 1:4; max(c(max(a, na.rm = TRUE), max(b, na.rm = TRUE)))
Aug 2 '19 at 20:23
NA
-handling facilities in Python's excellent NumPy package.)
Aug 2 '19 at 20:24
NA
s from a vector of NA
s, you would expect an empty vector, not -∞.
Jan 29 '20 at 22:57
The na.omit
function is what a lot of the regression routines use internally:
vec <- 1:1000
vec[runif(200, 1, 1000)] <- NA
max(vec)
#[1] NA
max( na.omit(vec) )
#[1] 1000
Use discard
from purrr (works with lists and vectors).
discard(v, is.na)
The benefit is that it is easy to use pipes; alternatively use the built-in subsetting function [
:
v %>% discard(is.na)
v %>% `[`(!is.na(.))
Note that na.omit
does not work on lists:
> x <- list(a=1, b=2, c=NA)
> na.omit(x)
$a
[1] 1
$b
[1] 2
$c
[1] NA
?max
shows you that there is an extra parameter na.rm
that you can set to TRUE
.
Apart from that, if you really want to remove the NA
s, just use something like:
myvec[!is.na(myvec)]
You can call max(vector, na.rm = TRUE)
. More generally, you can use the na.omit()
function.
Just in case someone new to R wants a simplified answer to the original question
How can I remove NA values from a vector?
Here it is:
Assume you have a vector foo
as follows:
foo = c(1:10, NA, 20:30)
running length(foo)
gives 22.
nona_foo = foo[!is.na(foo)]
length(nona_foo)
is 21, because the NA values have been removed.
Remember is.na(foo)
returns a boolean matrix, so indexing foo
with the opposite of this value will give you all the elements which are not NA.
I ran a quick benchmark comparing the two base
approaches and it turns out that x[!is.na(x)]
is faster than na.omit
. User qwr
suggested I try purrr::dicard
also - this turned out to be massively slower (though I'll happily take comments on my implementation & test!)
microbenchmark::microbenchmark(
purrr::map(airquality,function(x) {x[!is.na(x)]}),
purrr::map(airquality,na.omit),
purrr::map(airquality, ~purrr::discard(.x, .p = is.na)),
times = 1e6)
Unit: microseconds
expr min lq mean median uq max neval cld
purrr::map(airquality, function(x) { x[!is.na(x)] }) 66.8 75.9 130.5643 86.2 131.80 541125.5 1e+06 a
purrr::map(airquality, na.omit) 95.7 107.4 185.5108 129.3 190.50 534795.5 1e+06 b
purrr::map(airquality, ~purrr::discard(.x, .p = is.na)) 3391.7 3648.6 5615.8965 4079.7 6486.45 1121975.4 1e+06 c
For reference, here's the original test of x[!is.na(x)]
vs na.omit
:
microbenchmark::microbenchmark(
purrr::map(airquality,function(x) {x[!is.na(x)]}),
purrr::map(airquality,na.omit),
times = 1000000)
Unit: microseconds
expr min lq mean median uq max neval cld
map(airquality, function(x) { x[!is.na(x)] }) 53.0 56.6 86.48231 58.1 64.8 414195.2 1e+06 a
map(airquality, na.omit) 85.3 90.4 134.49964 92.5 104.9 348352.8 1e+06 b