Finding expected values of random variables in R

Today, I answered a StackOverflow question where the author was implementing a function for finding the mean of a continuous random variable, given its probability density function (PDF).

In the process of writing up my answer, I ended up applying some functional programming techniques (specifically function factories). I also found myself exploring the problem quite far outside the scope of the original question, so I thought the full story would make more sense as a blog post – so here we are!

We’ll be going through a simple R implementation for finding the expected value for any transformation of a random variable.

(Feel free to skip ahead if you’re allergic to maths! It’s not long though.)

Expected values

I’m going to assume that you are already familiar with the concepts of random variables and probability density functions, so I’m not going to go over them here. However, as expected values are at the core of this post, I think it’s worth refreshing the mathematical definition of an expected value.

Let \(X\) be a continuous random variable with a probability density function \(f_X: S \to \mathbb{R}\) where \(S \subseteq \mathbb{R}\). Now, the expected value of \(X\) is defined as:

\[ \mathbb{E}(X) = \int_S x f_X(x) dx. \] For a transformation of \(X\) given by the function \(g\) this generalises to:

\[ \mathbb{E}(g(X)) = \int_S g(x) f_X(x) dx. \]
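For instance (a worked example of my own, not from the original question), if \(X \sim \text{Exp}(\lambda)\) with PDF \(f_X(x) = \lambda e^{-\lambda x}\) on \(S = [0, \infty)\), then

\[ \mathbb{E}(X) = \int_0^\infty x \, \lambda e^{-\lambda x} \, dx = \frac{1}{\lambda}, \]

which is exactly the value we’ll recover numerically later on.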

The key point here is that finding expected values involves integrating the PDF of the random variable, scaled in some way.

Moments

Moments in maths are defined with a strikingly similar formula to that of expected values of transformations of random variables. The \(n\)th moment of a real-valued function \(f\) about point \(c\) is given by:

\[ \int_\mathbb{R} (x - c)^n f(x) dx. \]

In fact, moments are especially useful in the context of random variables: recalling that \(\text{Var}(X) = \mathbb{E}((X-\mu)^2)\)[1], it’s easy to see that the mean \(\mu\) and variance \(\sigma^2\) of a random variable \(X\) are given by the first moment and the second central moment[2] of its PDF \(f_X\). That is:

\[ \mu = \int_\mathbb{R} (x - 0)^1 f_X(x) dx, \] and \[ \sigma^2 = \int_\mathbb{R} (x - \mu)^2 f_X(x) dx. \]
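To make this concrete (again, a worked example of my own), for \(X \sim \text{Exp}(\lambda)\) the second central moment of \(f_X\) is

\[ \sigma^2 = \int_0^\infty \left(x - \tfrac{1}{\lambda}\right)^2 \lambda e^{-\lambda x} \, dx = \frac{1}{\lambda^2}, \]

so with \(\lambda = 1/4\) we’d expect a variance of 16 – a number we’ll meet again below.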

Other properties of distributions (such as skewness) can also be defined with moments, but they’re not that interesting, really. You can read up on that, though, if you’re into that sort of thing.

Analytically

Yes, this can, of course, be done! (For many distributions at least.) But that’s not what we’re here for today. So let’s just… move right along, in an orderly fashion.

Numerically

Like we covered in the maths bit, finding expected values involves finding values of definite integrals. That means that the problem can be solved computationally with the use of numerical integration methods.

Numerical integration

We could write our own function to do just that. In R, a bare-bones implementation of numerical integration would look something like this:

integrate_numerically <- function(f, a, b, n = 20) {
  dx <- (b - a) / n
  x <- seq(a, b - dx, dx)
  sum(f(x) * dx)
}

This function finds the area under a curve \(f\), between points \(a\) and \(b\), by splitting the interval \([a,b]\) into \(n\) smaller “sub-intervals”, and then approximating the area in each sub-interval with the area of a rectangle.

For each sub-interval, the approximating rectangle has width equal to the width of the sub-interval, or “\(dx\)”, and height equal to the value of the function \(f\) evaluated at the starting point of the sub-interval.

Here’s a quick diagram[3] to illustrate:

[Figure: the area under a curve approximated by one rectangle per sub-interval]

integrate_numerically(dnorm, -1.96, 1.96)
## [1] 0.9492712
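As a quick sanity check (this snippet is my own addition, not part of the original answer), cranking up the number of sub-intervals should nudge the approximation closer to the true value of about 0.95:

# More rectangles: the approximation should creep towards the true value,
# which is roughly 0.95 for a standard normal on [-1.96, 1.96].
integrate_numerically(dnorm, -1.96, 1.96, n = 1000)

With a thousand rectangles the estimate should land very close to 0.95, though only by brute force.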

Fortunately we don’t have to be content with that. Since numerical integration is an important computational tool that comes up in many applications, smarter people already thought about it more carefully, and implemented the integrate function. (It does numerical integration too, but better.) Our implementation isn’t awful for this specific problem, but it could be a lot more efficient.

integrate(dnorm, -1.96, 1.96)
## 0.9500042 with absolute error < 1e-11

Expected values with numerical integration

Well, we eventually made it here! The point we’ve been slowly approaching is this: based on the formulae presented earlier in the maths bit, we can use integrate to find the expected value of a random variable, or of a transformation of one, given its PDF. Here’s how:

integrate(function(x) x * dnorm(x, mean = 5), -Inf, Inf)
## 5 with absolute error < 6e-05

We could wrap this in a function for finding the mean:

find_mean <- function(f, ..., from = -Inf, to = Inf) {
  integrate(function(x) x * f(x, ...), from, to)
}

And then try it out with some simple distributions:

find_mean(dexp, rate = 2)
## 0.5 with absolute error < 8.6e-06

But it could also be useful to generalise a bit, and create a function factory instead. That would be a good way to avoid duplicating code if we wanted to find other moments, or indeed expected values of transformations. The idea is to make a function that, given a transformation function, will return another function that finds the expected value of that transformation of a random variable:

ev_finder <- function(transform = identity) {
  function(f, ..., from = -Inf, to = Inf) {
    integrate(function(x) transform(x) * f(x, ...), from, to)
  }
}
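To see the factory in action before specialising it (an illustrative example of my own; the name find_second_raw_moment is just a helper I made up), we can manufacture a function for the second raw moment, \(\mathbb{E}(X^2)\):

# ev_finder() returns a new function; this one computes E(X^2).
find_second_raw_moment <- ev_finder(function(x) x ^ 2)

# For a standard normal, E(X^2) = Var(X) + E(X)^2 = 1 + 0 = 1.
find_second_raw_moment(dnorm)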

Since we know that finding moments of PDFs can be seen as a special case of expected values of transformations, we can wrap ev_finder here to define another function factory, this time for easy generation of functions to find moments.

moment_finder <- function(n, c = 0) {
  ev_finder(function(x) (x - c) ^ n)
}

Then, using moment_finder, we could define find_mean from before in one line. But moment_finder also makes it simple to define a function to find the variance (i.e. the second central moment):

find_mean <- moment_finder(1)

find_variance <- function(f, ...) {
  mu <- find_mean(f, ...)$value
  moment_finder(2, mu)(f, ...)
}

And again, we can try it out on some distributions:

find_variance(dnorm, mean = 2, sd = 2)
## 4 with absolute error < 2.5e-06
find_variance(dexp, rate = 1 / 4)
## 16 with absolute error < 0.00051

There we go! Expected values for random variables and transformations – sorted.

Or are they…

Now to be clear, this implementation of finding expected values isn’t perfect. To be honest, it’s actually kind of rubbish. Among other issues, it fails quite quickly with even slightly larger means[4]:

find_mean(dnorm, mean = 20)
## 1.148684e-05 with absolute error < 2.1e-05
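One crude workaround (my own suggestion, not something the original implementation does) is to pass finite integration limits that still cover essentially all of the probability mass, so that integrate’s adaptive quadrature doesn’t miss the narrow bump of density far from zero:

# Restricting the range to a region containing (virtually) all of the
# density lets integrate() find the bump it otherwise skips over.
find_mean(dnorm, mean = 20, from = 0, to = 40)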

So, it’s clear that we won’t be using this exact implementation of this method for any serious applications. But I think the process illustrates the benefits of function factories for generalisations quite well. And I had a lot of fun writing this post!

  1. So given \(g\) such that \(g(x) = (x - \mu)^2\), we can write \(\text{Var}(X)\) as the expected value of a transformed \(X\): \(\text{Var}(X) = \mathbb{E}(g(X))\).

  2. Moments where \(c = \mathbb{E}(X)\) are called central moments.

  3. If you want to see the R code I used to create this plot, check out the R Markdown source document for this post on GitHub!

  4. Actually, the issue seems to pop up when the mean and variance are too far apart, as find_mean(dnorm, mean = 20, sd = 2) works fine.

