drsp (Dylan R. S. Poulsen)

Comparing, Contrasting, and Unifying the Discrete Time and Continuous Time Fourier Transforms

2023-01-06T00:00:00+00:00

In doing research on the Fourier Transform, I have had some interesting insights that I would like to share here.

Transform Definitions

Classical Definitions

The (classical) Continuous Time Fourier Transform (cCTFT) \({\cal X}_{\mathbb{R}}\) of a function \(f:\mathbb{R} \rightarrow \mathbb{R}\) is given by

\[{\cal X}_{\mathbb{R}}\{f\}(\omega) := \int_{-\infty}^{\infty} f(t) e^{-i \omega t} \; dt.\]

The (classical) Discrete Time Fourier Transform (cDTFT) \({\cal X}_{\mathbb{Z}}\) of a function \(f:\mathbb{Z} \rightarrow \mathbb{R}\) is usually given by

\[{\cal X}_{\mathbb{Z}}\{f\}(\omega) := \sum_{t=-\infty}^{\infty} f(t) e^{-i \omega t}.\]

Moving Towards a Unified Definition

The cDTFT definition feels and looks correct, since it looks exactly like the continuous time definition, but with the integral replaced by a sum. I would tend to agree, but I study time scales calculus, which seeks to unify the continuous and discrete under one analytic framework. One issue I have with this definition is that the domain of both transforms is the same: \(\omega\) can be any real number. In reality, the cCTFT and cDTFT can both be thought of as being evaluated on the boundary between the unstable region and the stable region of the Laplace Transform and the Z-Transform, respectively. This means the domain of the cCTFT is \(\mathbb{R}\), as we already have, but the domain of the cDTFT should really be the unit circle. Let’s make this more explicit by defining the cDTFT in terms of the variable \(e^{i \omega}\), which is always on the unit circle.

\[{\cal X}_{\mathbb{Z}}\{f\}(e^{i \omega}) = \sum_{t=-\infty}^{\infty} f(t) \frac{1}{( e^{i \omega} )^t }.\]

Now, the exponential function in the denominator of the summand above is looking like the delta exponential on the integers, but without the shift of plus one. Define the new variable \(\xi = (e^{i \omega} -1)/i\). Notice that since \(e^{i \omega}\) is always on the unit circle, \(\xi\) is always on the unit circle shifted left by one, then rotated by \(-\pi/2\) radians (this is the same as saying \(\xi\) is on the unit circle shifted up by one unit). This leads to the same cDTFT in terms of the new variable \(\xi\),

\[{\cal X}_{\mathbb{Z}}\{f\}(\xi) = \sum_{t=-\infty}^{\infty} f(t) \frac{1}{(1+i \xi)^{t}}.\]

Notice nothing has really changed from the original cDTFT definition. I have simply recontextualized the definition and made one change of variable (notice \(1+ i \xi = e^{i \omega}\), so the denominator has remained the same).

Now, for reasons that will be revealed later, I actually want to alter the definition of the cCTFT and cDTFT to arrive at the (unified) Continuous Time Fourier Transform (uCTFT) and (unified) Discrete Time Fourier Transform (uDTFT).

Unified Continuous Time Fourier Transform Definition

For the cCTFT, let \(\xi = \omega\) and define the uCTFT of \(f\), \({\cal F}_{\mathbb{R}}\), as

\[{\cal F}_{\mathbb{R}}\{f\}(\xi) := {\cal X}_{\mathbb{R}}\{f\}(\xi).\]

That is, the cCTFT and uCTFT are exactly the same, except I am renaming \(\omega\) as \(\xi\) (thrilling, I know).

Unified Discrete Time Fourier Transform Definition

For the cDTFT, let \(\xi = (e^{i \omega} - 1)/i\) and define the uDTFT of \(f\), \({\cal F}_{\mathbb{Z}}\), as

\[\begin{aligned} {\cal F}_{\mathbb{Z}}\{f\}(\xi) & := \frac{1}{1+i \xi} {\cal X}_{\mathbb{Z}}\{f\}(\xi) \\ & = \sum_{t=-\infty}^{\infty} f(t) \frac{1}{(1+i \xi)^{t+1}}. \end{aligned}\]

I have added in a forward shift to preserve operational properties.

Operational Properties

The differential operator of \(\mathbb{R}\) is the derivative. The differential operator I want to use on \(\mathbb{Z}\) is the forward difference \(\Delta f(t):= f(t+1)-f(t).\) There is a key property that uCTFT and uDTFT share with respect to their respective differential operators.

uCTFT of a Derivative

Using integration by parts and the fact that acceptable signals must go to zero as time approaches infinity in either direction gives us

\[\begin{aligned} {\cal F}_{\mathbb{R}}\{f'\}(\xi) & = \int_{-\infty}^{\infty} f'(t) e^{-i \xi t} \; dt \\ & = f(t) e^{-i \xi t} \Big\rvert_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f(t) \left( - i \xi \right) e^{-i \xi t} \; dt \\ & = 0 + i \xi \int_{-\infty}^{\infty} f(t) e^{-i \xi t} \; dt \\ & = i \xi {\cal F}_{\mathbb{R}}\{f\}(\xi). \end{aligned}\]

uDTFT of a Forward Difference

Note that

\[\begin{aligned} \frac{1}{1+i \xi} \left( {\cal F}_{\mathbb{Z}}\{\Delta f\}(\xi) - i \xi {\cal F}_{\mathbb{Z}}\{f\}(\xi) \right) &= \sum_{t=-\infty}^{\infty} \frac{(f(t+1) - f(t)) - i \xi f(t)}{(1+i \xi)^{t+2}} \\ & = \sum_{t=-\infty}^{\infty} \frac{f(t+1)-(1+i \xi) f(t)}{(1+i \xi)^{t+2}} \\ & = \sum_{t=-\infty}^{\infty} \left( \frac{f(t+1)}{(1+i \xi)^{t+2}} - \frac{f(t)}{(1+i \xi)^{t+1}} \right) \\ & = \sum_{t=-\infty}^{\infty} \frac{f(t+1)}{(1+i \xi)^{t+2}} - \sum_{t=-\infty}^{\infty} \frac{f(t)}{(1+i \xi)^{t+1}} \\ & = {\cal F}_{\mathbb{Z}}\{f\}(\xi) - {\cal F}_{\mathbb{Z}}\{f\}(\xi) \\ & = 0. \end{aligned}\]

Thus \({\cal F}_{\mathbb{Z}}\{\Delta f\}(\xi) = i \xi {\cal F}_{\mathbb{Z}}\{f\}(\xi).\) This matches how the unified transform interacted with the differential operator on \(\mathbb{R}\).

Okay, I have to be honest here that we didn’t need the extra \((1+i \xi)\) in the denominator for this to work out on \(\mathbb{Z}\). But, the shift forward is absolutely essential when trying to make this work on arbitrary time domains.
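The forward difference property can also be checked numerically. Below is a minimal Python sketch, assuming a signal that is zero outside a finite set of times so that the infinite sum becomes a finite one. The helper names `udtft` and `forward_difference` and the test signal are made up for illustration.

```python
import cmath

def udtft(f, xi):
    """uDTFT of a finitely supported signal given as a dict {t: f(t)}."""
    return sum(val / (1 + 1j * xi) ** (t + 1) for t, val in f.items())

def forward_difference(f):
    """Delta f(t) = f(t+1) - f(t), with f taken to be zero off its support."""
    times = set(f) | {t - 1 for t in f}
    return {t: f.get(t + 1, 0.0) - f.get(t, 0.0) for t in times}

# Hypothetical test signal, zero outside t = -2, ..., 3.
f = {-2: 1.0, -1: -0.5, 0: 2.0, 1: 0.25, 2: -1.0, 3: 0.75}

# Pick a point of the domain: xi = (e^{i omega} - 1) / i.
omega = 0.9
xi = (cmath.exp(1j * omega) - 1) / 1j

lhs = udtft(forward_difference(f), xi)   # transform of the difference
rhs = 1j * xi * udtft(f, xi)             # i * xi times the transform
print(abs(lhs - rhs))                    # should be at rounding-error level
```

Any other finitely supported signal and any other \(\omega\) should give the same agreement, up to floating point error.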

Domains

The uCTFT is defined on the real line. The uDTFT is defined on the unit circle shifted up by one, so it is tangent to the real axis and in the upper-half plane.

Let’s think about the time domain \(h \mathbb{Z} = \{\ldots, -3h, -2h, -h, 0 , h, 2h, 3h, \ldots\}.\) For \(s \in h \mathbb{Z}\), one can perform the change of variables \(t= s/h\) and perform a uDTFT. However, looking at the domain of this transformation in the original domain, we see the domain of the transform is the circle of radius \(1/h\) that is tangent to the real axis and in the upper-half plane. This lets us see a beautiful unity in our approach. As \(h \rightarrow 0\), the domain of the Fourier transform becomes a bigger and bigger circle, so big that the bottom part of the circle becomes almost a straight line: the real axis. This shows that the domain of the uDTFT becomes the domain of the uCTFT in the limit as \(h \rightarrow 0\) (which means we’re making a better and better discrete-time approximation of continuous time).
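To see the limit in numbers, here is a small Python sketch. It assumes the change of variables on \(h \mathbb{Z}\) takes the rescaled form \(\xi = (e^{i \omega h} - 1)/(i h)\), which is the \(\mathbb{Z}\) formula with step size \(h\) put back in. The function name `xi_h` is made up for illustration.

```python
import cmath

def xi_h(omega, h):
    """Point of the domain on h*Z reached at frequency omega (assumed form)."""
    return (cmath.exp(1j * omega * h) - 1) / (1j * h)

omega = 2.0  # arbitrary test frequency
for h in (1.0, 0.1, 0.01, 0.001):
    xi = xi_h(omega, h)
    radius = abs(xi - 1j / h)   # distance from the centre i/h of the circle
    print(f"h={h:<6} xi={xi:.5f} radius={radius:.3f} |xi - omega|={abs(xi - omega):.5f}")
```

As \(h\) shrinks, the radius grows like \(1/h\) while \(\xi\) settles onto the real number \(\omega\), which is the flattening of the circle described above.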


Revisiting the Domain

Let’s think a little more deeply about the domain of these Fourier Transforms. In order for the Transform to be well-defined, the kernels must be bounded in time as \(t \rightarrow \pm \infty\) (the kernels being the functions that \(f(t)\) is multiplied by in the Fourier Transform). If they were not, then the integrals/sums in the Fourier Transforms would diverge.

Continuous Time Kernel

The kernel of the uCTFT is \(K(t,\xi) = e^{-i \xi t}\). Fix \(\xi \in \mathbb{C}\). For which values of \(\xi\) does \(K(t,\xi)\) remain bounded as \(t \rightarrow \pm \infty\)? Well, as long as \(\xi \in \mathbb{R}\) then \(\lvert e^{-i \xi t} \rvert = 1\). However, if \(\xi \not \in \mathbb{R}\), then the modulus of \(K\) will grow arbitrarily large either as \(t \rightarrow \infty\) or as \(t \rightarrow -\infty\). Why? Assume \(\xi = a + bi\), where \(b>0\), for example. Then \(\lvert e^{-i \xi t} \rvert = \lvert e^{-i(a+bi)t} \rvert= \lvert e^{-ait}e^{bt} \rvert = \lvert e^{bt} \rvert \rightarrow \infty\) as \(t \rightarrow \infty\).

Discrete Time Kernel

The kernel of the uDTFT is \(K(t,\xi) = 1/(1+i \xi)^{t+1}\). Fix \(\xi \in \mathbb{C}\). For which values of \(\xi\) does \(K(t,\xi)\) remain bounded as \(t \rightarrow \pm \infty\)? Well, we need \(\lvert 1/(1+i \xi)^{t+1} \rvert = 1/\lvert 1+i \xi\rvert^{t+1}\) to be bounded. This requires the base of the exponential function to have modulus one, so \(\lvert 1+i \xi \rvert=1\). Note that this is the equation for a circle of radius one centered at \(i\), which is exactly the unit circle shifted up by one that we have previously discussed.
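A quick numerical illustration of this boundedness condition, as a rough Python sketch. The two sample values of \(\xi\) are arbitrary choices: one on the shifted circle and one on the real axis, where \(\lvert 1 + i \xi \rvert = \sqrt{2}\).

```python
import cmath

def kernel(t, xi):
    """Kernel of the uDTFT, K(t, xi) = 1 / (1 + i xi)^(t + 1)."""
    return 1 / (1 + 1j * xi) ** (t + 1)

on_circle = (cmath.exp(1j * 0.7) - 1) / 1j   # satisfies |1 + i xi| = 1
off_circle = 1.0                             # real xi, so |1 + i xi| = sqrt(2)

for t in (-40, -1, 0, 40):
    print(t, abs(kernel(t, on_circle)), abs(kernel(t, off_circle)))
```

On the circle the modulus stays at one for every \(t\). Off the circle it blows up in one time direction (here as \(t \rightarrow -\infty\)) and decays in the other.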

Understanding the DFT in this new light.

Pedagogically, I find it difficult to motivate the Discrete Fourier Transform (DFT). The DFT is obtained from the cDTFT by sampling \(\omega\) uniformly. To me, this feels like we are treating \(\omega\) as being on a line, and chopping it up into equally-sized pieces. But, we have seen the issue with thinking about \(\omega\) as being the variable in the cDTFT, in that it tends to cover up the fact that the domain is a circle and that the variable is really best thought of as \(e^{i \omega}\). I think our visualization of the domain can help.

For the cDTFT, the domain is a complete, continuous unit circle. For the DFT, the domain becomes points equally distributed about the circle, with \(e^{-i 0 t}=1\) as the anchor point.

If we label the zeroth point to be the anchor point, then proceed around the circle naming the first point, second point, third point, etc., then we recover the domain of the DFT, \(\{0,1,2,3,\ldots,N-1\}\) for \(N\) sample points. This is acceptable in practice because the DFT is a sequence, but it perhaps obscures what is going on and can lead to confusion.

In our unified view, the domain of the uDTFT is the unit circle shifted up one. The unified Discrete Fourier Transform would also be points equally distributed about the circle, but with the origin as the anchor point. That feels better!
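The two sets of sample points are easy to generate side by side. This is only a sketch: the function names are made up, and it assumes the unified points are obtained by pushing the usual \(N\)-th roots of unity through the same change of variables \(\xi = (z - 1)/i\) used for the uDTFT.

```python
import cmath
import math

def dft_points(n):
    """n equally spaced points on the unit circle, anchored at 1."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]

def unified_points(n):
    """The same points after xi = (z - 1) / i, anchored at the origin."""
    return [(z - 1) / 1j for z in dft_points(n)]

for k, (z, xi) in enumerate(zip(dft_points(8), unified_points(8))):
    print(k, f"{z:.3f}", f"{xi:.3f}")
```

The index \(k\) is the usual DFT index. The point with \(k = 0\) sits at \(1\) in the classical picture and at \(0\) in the unified one.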

The Inverse Fourier Transform

I have always found the inverse cDTFT to be puzzling, since it is an integral in \(\omega\):

\[{\cal X}_{\mathbb{Z}}^{-1}\{F\}(t) = \frac{1}{2 \pi} \int_{-\pi}^{\pi} F(e^{i \omega}) e^{i \omega t} \; d \omega.\]

The view that the domain of the cDTFT is actually just a circle helps this integral make more sense. Really, the integral is a contour integral around the unit circle. The reason this integral looks like a real integral is that the unit circle has been parameterized and this form is a result of that parameterization.
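To make the parameterization explicit, write \(z = e^{i \omega}\) (the symbol \(z\) is introduced here only for this step), so that \(dz = i z \; d\omega\). Under that substitution the integral over \(\omega\) becomes a contour integral over the unit circle:

\[\frac{1}{2 \pi} \int_{-\pi}^{\pi} F(e^{i \omega}) e^{i \omega t} \; d \omega = \frac{1}{2 \pi i} \oint_{\lvert z \rvert = 1} F(z) z^{t-1} \; dz.\]

The right-hand side has the same form as the usual inverse Z-Transform, evaluated on the unit circle.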

The Nyquist Frequency

If the sampling rate is given by \(v\), the so-called Nyquist frequency is given by \(v/2\). If a signal has all its frequencies below the Nyquist frequency, then the signal can be perfectly reconstructed from the cDTFT. What does this mean in our scenario? If the sampling rate is \(v\) Hertz then the time scale is \(\frac{1}{v} \mathbb{Z}\) (and it means that the angular velocity is \(2 \pi v\)). Therefore the region that the uDTFT is defined over is a circle of radius \(v\) centered at \(v i\). Remember, we have the view that frequency \(\omega\) is mapped to a point \(\xi\) on this domain (one could write \(\xi(\omega)\) to emphasize that \(\xi\) is a function of \(\omega\)).

The Fourier Transform on this time scale is

\[{\cal F}_{\frac{1}{v} \mathbb{Z}}\{f\}(\xi) = \sum_{t=-\infty}^{\infty} f(t) \frac{1/v}{(1+i \xi/v)^{t+1}}.\]

The relationship between \(\xi\) and \(\omega\) on this time scale is

\[\xi(\omega) = \frac{v}{i} (e^{i \omega/v} -1),\]

which, again, can be thought of as taking the unit circle, shifting it left by one, rotating it by \(-\pi/2\) radians, then scaling up by a factor of \(v\).

Notice that this relationship between \(\xi\) and \(\omega\) is \(2 \pi v\) periodic. One would think this would mean that there would not be overlap and confusion unless the frequency exceeded \(2 \pi v.\) However, there are symmetries (time reversal and conjugation) which essentially induce this overlap early - when the frequency exceeds \(\pi v\).

What I mean by overlap and confusion is that all frequencies that map to the same value of \(\xi\) will contribute to the Fourier transform evaluated at \(\xi\). This is called aliasing.
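Here is a short Python sketch of that overlap, using the map \(\xi(\omega) = \frac{v}{i}(e^{i \omega/v} - 1)\) from above. The sampling rate and test frequency are arbitrary illustrative values. The mirror relation checked in the last line is an extra observation that follows directly from the formula.

```python
import cmath
import math

def xi_of_omega(omega, v):
    """Map a frequency omega to a point xi of the domain for sampling rate v."""
    return (v / 1j) * (cmath.exp(1j * omega / v) - 1)

v = 4.0                            # hypothetical sampling rate
omega = 3.0                        # hypothetical signal frequency
alias = omega + 2 * math.pi * v    # one full period further along

print(xi_of_omega(omega, v))
print(xi_of_omega(alias, v))       # same point: the two frequencies alias

# Negating omega reflects xi across the imaginary axis.
print(xi_of_omega(-omega, v), -xi_of_omega(omega, v).conjugate())
```

Both of the first two lines print the same point up to rounding, so a transform evaluated there cannot tell the two frequencies apart.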

Perhaps what I like about this viewpoint is that there are two frequencies at work here. There is the sampling rate, which determines the size of the circle (the radius), and the variable frequency \(\omega\) which determines the angle from the center of the circle. So, the two frequencies act as the two components of a polar representation of the domain.

On a general time scale, there are many more frequencies at work (the time between samples is a function of time). But, this separation is still at play. The sampling frequencies determine the shape of the domain (they won’t be circles anymore!) and the variable frequency \(\omega\) determines the location on the domain.

Two time steps.

Suppose that we have a signal that is sampled non-uniformly in time. For the sake of this example, let’s say the signal is sampled with a one-second gap, then a two-second gap, then a one-second gap, then a two-second gap, and so on. The sampling times form a time scale \(\mathbb{T}_{1,2} = \{\ldots, -7,-6,-4,-3,-1,0,1,3,4,6,7,\ldots\}.\) The unified Fourier Transform my colleagues and I have developed becomes, in this instance

\[\begin{aligned} {\cal F}_{\mathbb{T}_{1,2}}\{f\}(\xi) & = \ldots f(-7) (1+i \xi)^2 (1+2 i \xi)^2 + 2 f(-6) (1+i \xi)^2 (1+ 2 i \xi) \\ & + f(-4) (1+i \xi) (1+2 i \xi) + 2 f(-3) (1+i \xi) \\ & + f(-1) + \frac{f(0)}{(1+ i \xi)} + 2 \frac{f(1)}{(1+i \xi)(1+2 i \xi)} \\ & + \frac{f(3)}{(1+i \xi)^2 (1+2 i \xi)} + 2 \frac{f(4)}{(1+i \xi)^2 (1+2 i \xi)^2} + \ldots \end{aligned}\]

In order for the kernel to be bounded as \(t \rightarrow \pm \infty,\) it is necessary and sufficient that

\[|(1+i \xi)(1 + 2 i \xi)| = 1.\]

The set of \(\xi \in \mathbb{C}\) for which this is true is shown below. This is the domain of this Fourier Transform. Note that the region is still tangent to the real axis and entirely in the upper-half plane.

This region is in some sense the average of the unit circle tangent to the real axis and the circle of radius 1/2 tangent to the real axis, i.e., the average of the domain of the Fourier Transform on \(\mathbb{Z}\) and the Fourier Transform on \(2 \mathbb{Z}\).

The mapping from frequency \(\omega\) to point in the complex plane \(\xi\) is complicated, even in this basic case. While the domain looks elliptical, it is not an ellipse. This mapping is, however, periodic with period \(4 \pi/3.\) This is again in agreement with the Nyquist frequency, as it has been proven that a signal can be reconstructed if the average sampling rate satisfies the Nyquist criterion and the signal has a finite bandwidth. Here, the average is two samples in three seconds, so \(v = 2/3\), and the Nyquist frequency should be \(2 \pi/3\), which is exactly half the period yet again!
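The domain itself can be traced without knowing the map from \(\omega\). The sketch below solves \((1+i \xi)(1+2 i \xi) = e^{i \varphi}\) for \(\xi\) with the quadratic formula. The parameter \(\varphi\) just sweeps the unit-modulus values of the product; it is not the frequency \(\omega\), and the function name is made up for illustration.

```python
import cmath
import math

def domain_points(phi):
    """Both solutions xi of (1 + i xi)(1 + 2 i xi) = exp(i phi)."""
    # With z = i xi the equation reads 2 z^2 + 3 z + 1 - exp(i phi) = 0,
    # whose discriminant is 9 - 8 (1 - exp(i phi)) = 1 + 8 exp(i phi).
    root = cmath.sqrt(1 + 8 * cmath.exp(1j * phi))
    return [((-3 + sign * root) / 4) / 1j for sign in (1, -1)]

points = [xi for k in range(360) for xi in domain_points(2 * math.pi * k / 360)]

lowest = min(xi.imag for xi in points)
highest = max(xi.imag for xi in points)
print(lowest, highest)
```

In this sketch the lowest point is the origin and the highest is \(1.5 i\), halfway between the tops of the two circles (\(2i\) and \(i\)), in line with the averaging picture above.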

Antarctic Ice Sheet Example

Consider the Vostok Ice Core CO2 Data, which shows CO2 concentrations over the past 419,000 years, sampled non-uniformly in time via an ice core sample. This gives us a signal \(f(t)\), where \(f\) is CO2 concentration, and \(t\) is defined on the time scale \(\mathbb{T} = \{-413649, -411959, -410653, -407331, -405523, -405523,\ldots\}\), measured in years, with \(t=0\) being the age of the ice at the last data point, 5679 years ago.

The time between samples is 1142.68 years on average, so we do expect the region where the Fourier Transform is defined to be relatively smaller than the regions we have encountered so far.

Region where the Fourier Transform is defined for the Vostok Ice Core CO2 Data. It is almost lemon-shaped.

Again, we see that the sampling rate determines the domain of the Fourier Transform.

Below is the plot of the signal versus time in years.

Just as with the Discrete Fourier Transform, in order to work with this signal effectively, we need to assume the signal is periodic. Moreover, our work assumes that the time domain is symmetric about the origin. This raises a question: what is the time step between the last data point and the copy of the first data point when the signal repeats? We propose using the average time step in the signal, so as to not change the average and hence to not change the Nyquist frequency. Now the graph of the signal looks like the graph below, continuing on in either direction to infinity.

We are not saying that we know the CO2 levels 800,000 years into the future, we are just augmenting the signal to allow the theory to analyze the signal.

The Nyquist frequency for this example is \(\pi/1142.68\) radians per year, which corresponds to a period of \(2 \times 1142.68 \approx 2285.4\) years. We should be able to detect patterns at this time scale and larger in the data.

Takeaways

When considering Fourier Transforms, there are two frequencies that play a role: the sampling rate and the target signal frequency (the variable \(\omega\) in the cCTFT).
The sampling rate of a signal manifests in the shape of the domain of the Fourier Transform.
The domain of the Fourier Transform is parameterized by the target signal frequency \(\omega.\) The Nyquist frequency manifests in the periodicity of this parameterization.
The cCTFT and cDTFT are really two manifestations of the same process. A simple change of variables helps us to see how the cDTFT becomes the cCTFT as the sampling rate goes to infinity.

The Math of Shiny Hunting in Pokemon

2022-12-15T00:00:00+00:00

When I am not teaching or writing about mathematics, one of my hobbies is playing Pokemon games. In particular, I have been playing Pokemon Go for the past six years, and have recently started playing Pokemon Scarlet and Violet.

Pokemon Go has encouraged me to be active and to meet people both in my local community and around the world. I know people from my town that I wouldn’t otherwise know from the game. I have traveled to events and met people who I have befriended online. I find the games to be very relaxing, and a way to redirect my thoughts away from work.

A Group Photo at a meetup in Washington DC for members of a Discord server I administrate. I am third from the left.

A Group Photo at a Pokemon Go Tournament in Baltimore. I am in the middle in the yellow and blue shirt with the Ho-Oh on my shoulder. I know all these people from a Discord server that I administrate.

That said, I have a tendency to look for the math in everything (although math is NOT everywhere, as I tell my History of Mathematics students). I have joked with colleagues that I could teach a whole introductory mathematics course on the math of Pokemon. One subject in that course would be probability theory via shiny hunting.

A shiny Pokemon is a differently-colored version of a Pokemon that appears randomly and very rarely. Although they are purely cosmetic, they are highly sought after trophies, with people dedicating hours, days, even months, trying to find a shiny Pokemon. So many Twitch and Youtube streams are dedicated to shiny hunting. So. Many.

Shiny Pokemon hunting is a wonderful lens through which to understand some of the most important concepts in probability theory: Independence, Bernoulli trials, the binomial distribution, the geometric distribution, the Poisson distribution, the exponential distribution, and the memoryless property. Let’s take each of these in turn.

Independence

In Pokemon Scarlet and Violet, the probability that an individual Pokemon will be shiny is 1/4096. With certain methods in the game, this probability can be increased up to about 1/512. Whether a given Pokemon is shiny or not does not depend on whether another Pokemon is shiny or not. In probability, we say that the shininess of one Pokemon is independent of the shininess of another.

When calculating the probability that two independent things happen, one simply multiplies the probability that the first thing happens by the probability that the second thing happens. For example, the probability of 1) flipping a fair coin and getting a head and 2) rolling a five on a fair six-sided die is \((1/2) (1/6) = 1/12\).

Bernoulli Trial

A Bernoulli trial is a fancy way of saying “flipping a weighted coin.” Technically, a Bernoulli trial is an experiment where “success” occurs with probability \(p\), and “failure” occurs with probability \(1-p\). There are no other options (here failure is an option).

Let’s recast this definition in terms of Pokemon. Checking whether one Pokemon is shiny (hereafter known as a shiny check) is a Bernoulli trial where “success” means the Pokemon is shiny, and “failure” means the Pokemon is not shiny. The value for \(p\) is \(1/4096\) (without any boosts).

Binomial Distribution

When hunting for shinies, people do not just check one Pokemon and call it a day. The name of the game is to check as many Pokemon as possible as quickly as possible. This means the Bernoulli trial is repeated, and each trial is independent. Let’s say a streamer was going to check 1000 Pokemon, then give up. They are curious to know the probability that they see zero shinies, one shiny, two shinies, and so on. The binomial distribution would sate their curiosity.

For \(p=1/4096\), the probability mass function of the binomial distribution is plotted below. On the \(x\)-axis is the number of shinies, and on the \(y\)-axis is the probability.

The probability that the streamer fails the entire shiny hunt is almost \(80\%\), while the probability of getting one shiny is almost \(20\%\). There is a small probability of getting two (or more!) shinies.

Let’s break down one of these probabilities. If the streamer were to get one shiny, then they would have 999 failures and one success. The probability of this is \((1/4096)^{1} (4095/4096)^{999}\). But, this does not account for the order that the streamer finds a shiny. They could find the shiny on the first check, or the second check, or the third check, and so on until the 1000th check, which means there are 1000 different orders. The probability of getting exactly one shiny in 1000 checks is then

\(1000 (1/4096)^{1} (4095/4096)^{999} \approx 0.191...,\) or, about 19.1%.
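The same count-the-orders reasoning works for any number of shinies, with the number of orders given by a binomial coefficient. A minimal Python sketch follows (the function name is made up for illustration).

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    # comb(n, k) counts the orders in which the k successes can occur.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 1000, 1 / 4096   # a 1000-check hunt at full odds
for k in range(4):
    print(k, round(binomial_pmf(k, n, p), 4))
```

For \(k=1\), `comb(1000, 1)` is just the 1000 possible positions of the single shiny, so this reproduces the calculation above.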

If the streamer were to use all the available methods (complete the Pokedex to get the shiny charm, go to a mass outbreak and clear sixty or more Pokemon, and make a level three sparkling power sandwich), and make \(p=1/512.44\) ¹ during the 1000 Pokemon hunt, the probability mass function for the binomial distribution would instead look like the graph below.

The probability of failing the hunt (getting zero shinies) has decreased from nearly \(80\%\) to about \(14\%\). There is also a much higher probability of two or more shinies during the hunt. The extra effort to increase \(p\) seems to be worth it.

Side Note About the 1/512.44 Probability

The way the boosts in shiny rate work is that instead of doing one Bernoulli trial with \(p=1/4096\) to determine shininess, the game does more than one Bernoulli trial to determine shininess (with the number of trials determined by the boost). If any of these trials are successful, then the Pokemon is shiny. A shiny charm changes the number of Bernoulli trials to three, while the 1/512.44 rate comes from the number changing to eight trials. Why? Let’s work it out!

Let’s look at the binomial distribution with \(p=1/4096\) as before, but with \(8\) trials instead of \(1000\).

That looks like the probability of zero successes in eight rolls is one, but this is why graphs can be misleading. Let’s do the math.

The probability of zero successes in eight independent trials is (by the multiplication rule) \((4095/4096)^8 \approx 0.99804854\). So the probability of at least one success (and therefore of a shiny Pokemon being produced) is about \(p_{\text{shiny}} = 1-0.99804854 \approx 0.00195146\). Converting this to a fraction with one in the numerator gives \(1/512.4376602...\), which the Bulbapedia ¹ rounds to \(1/512.44\).
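The calculation is short enough to script. This sketch takes the roll counts described above (one, three, and eight) at face value, and the function name is made up for illustration.

```python
def boosted_shiny_probability(rolls, base=1 / 4096):
    """Chance that at least one of `rolls` independent shiny rolls succeeds."""
    return 1 - (1 - base) ** rolls

for rolls in (1, 3, 8):
    p = boosted_shiny_probability(rolls)
    print(rolls, p, f"1/{1 / p:.2f}")
```

With eight rolls this gives the rate of roughly 1 in 512.44 quoted above.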

The Geometric Distribution

Oftentimes, a shiny hunter will not be interested in multiple shinies of the same Pokemon. They will stop the hunt if and only if they find one shiny. Here, the shiny hunter is not interested in the number of shinies they get for a fixed number of checks, but instead is interested in the number of checks until a shiny is found. The geometric distribution addresses this idea.

The geometric distribution gives, for each possible number of trials, the probability that the first success occurs on exactly that trial. If a person is full-odds shiny hunting (\(p=1/4096\)), the geometric distribution for the number of checks until a shiny is found is shown below.

First, notice how the \(x\)-axis is now “Number of Shiny Checks.” Also, the probability axis has very small numbers. This is because the probability (which adds to one) is spread out over many, many possibilities. In this context, it doesn’t make much sense to talk about the probability of it taking exactly, say, 4000 checks to get a shiny. Instead, it is more informative to look at the probability that it takes at most, say, 4000 checks to get the shiny. This idea is called the cumulative distribution function. For a geometric distribution with \(p=1/4096\), the cumulative distribution function has the graph plotted below.

Some features I notice really quickly:

1) If a shiny hunter does 4096 checks, the shiny is not guaranteed. In fact, there is only about a 63% ² chance that the shiny will appear in the first 4096 checks. Even worse, as the number of checks gets large, the function levels off but never actually equals one. This means a shiny is never guaranteed.

2) It takes about 2838 checks in order for there to be a 50% chance of getting a shiny. I think most people would guess 2048 checks.

Some other features are less apparent. One of my favorite facts is that if someone has already checked some Pokemon without finding a shiny, the cumulative distribution function for how many more checks until they encounter a shiny has exactly the graph above. This is called the memoryless property. This should feel both obvious and strange. I like to think about coin flipping. If I flip five heads in a row, but I know the coin is fair, the probability that the next flip is heads has not changed from \(1/2\). The past results have no impact on the future in this regard. So, if someone has already checked 10,000 Pokemon for shininess and has come up empty-handed, the probability that they’ll find a shiny in the next 4096 checks is still only 63%. There’s no credit earned from the universe from those first 10,000 checks.
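These features can be reproduced with a few lines of Python. This is a sketch with made-up function names. It uses the fact that the chance of no shiny in \(n\) checks is \((1-p)^n\), which is the multiplication rule again.

```python
from math import log

p = 1 / 4096

def geometric_cdf(n, p):
    """Probability that the first shiny shows up within the first n checks."""
    return 1 - (1 - p) ** n

def conditional_cdf(m, already_failed, p):
    """Probability of a shiny within the next m checks, given earlier failures."""
    no_shiny = lambda k: (1 - p) ** k
    return 1 - no_shiny(already_failed + m) / no_shiny(already_failed)

print(geometric_cdf(4096, p))              # chance within 4096 checks
print(log(0.5) / log(1 - p))               # checks needed for a 50% chance
print(conditional_cdf(4096, 10_000, p))    # same as the first line: memoryless
```

The first and third printed values agree, which is the memoryless property in numbers.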

The Poisson Distribution

In Pokemon, the shiny checks, which are discrete events, still happen in continuous time. Instead of thinking about the number of shinies in a certain number of checks, as in the Binomial Distribution, one could instead consider the number of shinies in a certain amount of time.

Let’s work with an example. Yesterday, while home sick, I set aside 30 minutes to shiny hunt Magnemite in a mass outbreak with \(p=1/512.44\). Using a technique known as “picnic resetting” I was able to shiny check 15 Magnemite in, on average, 30 seconds (yes, I kept track). This means in 30 minutes, I would do approximately 900 checks. If I had hours and hours to play, I might expect that I would get \(900/512.44 \approx 1.756\) shinies per 30 minute period. But, due to randomness, the amount I get in just one 30 minute period might be larger or smaller. The amount of shinies I get in a 30 minute period with an average amount of 1.756 can be modeled by a Poisson Distribution, a distribution that models the number of rare events that occur in a unit of time.

The probability mass function for the Poisson distribution with an average value of 1.756 is shown below.

We can compare this to a Binomial distribution with \(p=1/512.44\) over 900 shiny checks.

They look very similar, and this is hopefully not a surprise (they are modeling the same thing, after all)! This is a wonderful illustration of the Poisson Paradigm, which states that the Poisson Distribution and the Binomial Distribution are very similar when the probability of success is small and the number of trials is large. One advantage in using the Poisson distribution is that the probabilities are easier to calculate. Another advantage is that we have shifted our thinking from discrete events that exist outside of time to thinking of these checks as existing in time.
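To put numbers on "very similar", here is a small Python sketch comparing the two probability mass functions for the Magnemite hunt. The function names are made up for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average number of events is lam."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 900, 1 / 512.44   # 900 checks at boosted odds
lam = n * p              # average shinies per 30 minutes, about 1.756

for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

The two columns should agree to about three decimal places, which is why the plots are hard to tell apart.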

The Exponential Distribution

The geometric distribution models the number of shiny checks until the first success. Given the shift to continuous time in the Poisson Distribution, it makes sense to think about the amount of time until the first shiny appears. A big shift has occurred here, since time is continuous, not discrete like the previous distributions. The name of the distribution that models the time between rare events with a given average number of events is the exponential distribution.

Continuing with the example in the previous section, I ask how long should it take to find a shiny Magnemite in a mass outbreak (remember that I only have 30 minutes)? I need an average, which I know is 1.756 per 30 minutes. But, if I want a plot where the \(x\)-axis is time in minutes, I should adjust this average to be 1.756/30 per minute. Below is the cumulative distribution function for an exponential distribution with average 1.756/30 versus time in minutes.

According to this model, there is about an 82% chance that I will find the shiny in the thirty minutes I set aside for shiny hunting. If I give myself an hour, the probability increases to 97%. But, cruelly, if I don’t find one in the first 30 minutes, the probability that I find one in the next thirty minutes is only 82%, not 97%. Again, the universe doesn’t give credit for past effort in these regards. The memoryless property strikes again!
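The exponential model is easy to evaluate directly, since its cumulative distribution function is \(1 - e^{-\lambda t}\) for a rate of \(\lambda\) events per minute. A short Python sketch, with made-up names:

```python
from math import exp

rate = 1.756 / 30   # average number of shinies per minute

def exponential_cdf(t, rate):
    """Probability that the first shiny appears within t minutes."""
    return 1 - exp(-rate * t)

within_30 = exponential_cdf(30, rate)
within_60 = exponential_cdf(60, rate)

# Chance of a shiny in minutes 30 to 60, given none in the first 30 minutes.
second_half_given_miss = (within_60 - within_30) / (1 - within_30)

print(within_30, within_60, second_half_given_miss)
```

The first and last printed values are equal up to rounding, which is the memoryless property again.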

The Memoryless Property

Both the Geometric Distribution and the Exponential Distribution have the memoryless property. In fact, they are the only two distributions that have this property: the geometric is the only discrete one, and the exponential is the only continuous one.

Conclusion

The Binomial Distribution and the Poisson Distribution are intricately linked, as are the Geometric Distribution and the Exponential Distribution. In fact, I wrote an academic paper that makes this link explicit, showing that they are the result of the same process on different time domains. ³

Even the silly and recreational things in life can lead to interesting and deep ideas if pursued (that is perhaps the entire premise of this blog). That said, getting at deep and interesting ideas should not always be the goal in life. Sometimes, it’s just fun to kick back, relax, and look for differently colored digital monsters.

¹ Bulbapedia Shiny Pokemon Page
² \(.63 \approx 1-\frac{1}{e}\), where \(e\) is Euler’s number, \(e \approx 2.71828...\) The fact that this shows up here is not a coincidence.
³ The Poisson process and Associated Probability Distributions on Time Scales

Sharing (is) a Piece of Cake

2022-11-27T00:00:00+00:00

When my wife and I get takeout from a local Italian restaurant, we like to order one slice of cake to split for dessert. The slice of cake is very tall, but very skinny, so it comes in a container laying on its side. The slice is thin enough that we do not want to split the cake by turning it upright and creating two equal slices. Instead, we keep the cake on its side and cut it in a way that creates a wedge as one piece, and the rest of the cake as the other piece.

Where should the slice be made to result in an even split?

Quick Remark

As long as people have had to share limited resources, fair sharing has been a topic of interest. For sharing food, a common technique is the “I cut, you choose” method, where one person makes a cut that they deem fair, and the second person chooses which portion to take. This ensures both parties are happy with the result.

This particular problem is potentially deceptive, since it is hard to estimate volumes of different shapes. The analysis to follow can at least set a good baseline for an “I cut, you choose” strategy. That said, it does not account for frosting distribution, amongst other factors.

Get Out Your Protractor and Ruler, We’re Having Cake!

It’s time to discuss mathematics, so by convention I will switch to using “we” (as in, you and I, dear reader). First, we realize that the answer may depend on the angular size of the cake, since as the angle gets extremely small the slice looks more and more like a thin triangle, for which the cut belongs at \(1/\sqrt{2} \approx 0.71\) of the length of the slice.

So, we will model the slice of cake as a sector of a cylinder of radius one, with angle of \(\theta\). To simplify things more, we recognize that the volume of the slice is the area of the top of the slice times the height of the cake, so we can just consider a sector of a circle of radius one with angle \(\theta\) in the “standard position” for an angle.

Now, we will cut the cake perpendicular to the \(x\)-axis, at \(x=c\) for some \(0<c<1\). This creates a right triangle with base \(c\) and height \(c \tan(\theta),\) since the tangent of an angle is the ratio of the opposite side to the adjacent side of a right triangle.

The area of the wedge shape is \(A_{\text{wedge}} = \frac{1}{2} c^2 \tan(\theta).\) In order to share fairly, \(A_{\text{wedge}}\) must be half the area of the entire slice, \(A_{\text{slice}}=\frac{1}{2} \theta.\) That is,

\[\frac{1}{2} c^2 \tan(\theta) = \frac{1}{4} \theta.\]

Solving for \(c\), we find \(c=\sqrt{\frac{1}{2} \theta \cot(\theta)}\) (ignoring the negative solution, since it doesn’t make sense in the context of the problem).

Some Examples

\(\theta = \pi/4\)

According to our house rules, a slice is legally defined to be one eighth of the whole. Therefore, for a slice of cake, the cut should be made at

\[c=\sqrt{\frac{1}{2} \left( \frac{\pi}{4} \right) \cot \left(\frac{\pi}{4} \right)} = \sqrt{ \frac{\pi}{8} } \approx 0.6266570686.\]

\(\theta = \pi/6\)

Restaurants are not bound by house rules, so another likely definition of a slice is one twelfth of the whole. In this case, the cut should be made at

\[c=\sqrt{\frac{1}{2} \left( \frac{\pi}{6} \right) \cot \left(\frac{\pi}{6} \right)} \approx 0.6733868435.\]

This is a really nice result, since for any realistic application this means the cake should be cut at about 2/3 of its base length, which is easy to estimate.
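The formula is easy to evaluate for any slice angle. Here is a minimal Python sketch (the function name `fair_cut` is only for illustration) that reproduces the two examples above. It also includes a sanity check that is not part of the analysis above: the triangle model assumes the cut meets the straight edge of the slice, which requires the corner \((c, c\tan(\theta))\) to lie inside the unit circle, that is, \(c \leq \cos(\theta)\).

```python
import math


def fair_cut(theta):
    """Cut position c that halves a unit-radius slice with angle theta (radians)."""
    c = math.sqrt(0.5 * theta / math.tan(theta))
    # Assumption added for this sketch: the wedge is a triangle only if the
    # cut meets the straight edge, i.e. the corner (c, c*tan(theta)) is
    # inside the unit circle.
    if c > math.cos(theta):
        raise ValueError("cut would cross the curved edge; triangle model does not apply")
    return c


for label, theta in [("pi/4", math.pi / 4), ("pi/6", math.pi / 6)]:
    print(label, round(fair_cut(theta), 4))
```

Both house-rule and restaurant slices pass the check, so the formula applies to them as stated.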

Serious mathematics has been inspired by sharing cake. Steinhaus posed to his students the problem of whether there was a strategy like the “I cut, you choose” method for sharing a cake amongst \(n\) people. These students, Banach and Knaster (yes, the Banach of Banach spaces, the Banach-Tarski paradox, the Banach fixed-point theorem, etc.), solved this problem ¹, which will be the subject of a future blog post.

Martin L. Jones. “A Note on a Cake Cutting Algorithm of Banach and Knaster,” The American Mathematical Monthly ↩

MARP for Mathematical Slides in Markdown

2022-11-21T00:00:00+00:00

I recently discovered MARP, which allows the creation of slides from simple markdown. The slides can be exported as HTML, PDF, or PowerPoint once written in markdown.

I decided to write my slides for my colloquium talk about cutting an onion optimally (see also my blog post about this topic, Onion Blog Post) with MARP to test it out. I am very happy with the results, especially being able to export the slides as HTML and embed a YouTube video within the slides.

If you want to check out the code for the slides, please see my GitHub repository

If you would like me to give a colloquium talk about this, please contact me via email or leave a reply on Mastodon.

A solution to the Onion Problem of J. Kenji López-Alt

2021-11-13T00:00:00+00:00

Note: This blog post originally appeared on my Medium blog. I am reproducing the article here.

I first became interested in the problem of cutting onions in a way that reduces the variance of the volumes of the slices at a gathering with friends. One of my friends and colleagues, Dr. Gabe Feinberg, also a mathematician, pointed me to the YouTube video below.

In the video, Chef Kenji López-Alt says he has a friend who is a mathematician, who claims that you should cut radially towards a point 60% of the radius below the center of the onion, and mentions that this might be related to the reciprocal of the golden ratio, \(1/\phi = 0.61803398875...\)

I was intrigued by this, and even began cutting onions at home with this technique, just because it made me happy.

Each time I cut an onion for dinner, my mind would wander. I would think about why this is true, and what techniques I could use to approach the problem. While this was meditative for me, these musings did not lead anywhere substantial over the span of two months.

Last weekend, my thoughts actually led me towards a solution. Within two days I had found the “true onion constant”, which, spoiler alert, is not the reciprocal of the golden ratio. The depth to which you have to aim your knife for radial cuts depends on the number of layers. You can see this by thinking of how to cut an onion with one layer versus an onion with ten layers to keep the pieces as similar as possible. For one layer you would aim towards the center of the onion, but for ten layers you would aim somewhere below the center of the onion. To simplify matters, I therefore thought of an onion with infinitely many layers (or, as Gabe called it, “the great onion in the sky,” which I love). These kinds of abstractions are common in mathematics, and make problems tractable. Once there are infinitely many layers, it makes sense to think of infinitely many cuts. This moves the problem into the realm of continuous mathematics, where calculus can be used to great effect.

Here’s the technical part of the post. You probably need to know multivariable calculus to follow from here. I’m going to switch to using “we” instead of “I” to match mathematical writing conventions, and to indicate that we (you, dear reader, and I) are walking down this mathematical path together.

First, we model the onion as half of a disc of radius one, with its center at the origin and existing entirely in the first two quadrants in a rectangular (Cartesian) coordinate system. This ignores a dimension, and perhaps also some geometry of actual onions (are cross sections actually circles?) but makes the problem tractable and is still a good approximation.

The insight that leads to a solution comes from the Jacobian. When we change from rectangular coordinates to polar coordinates in integration, small rectangular pieces of area \(dx dy\) are transformed into small pieces of area \(r dr d \theta\), where \(x = r \cos(\theta)\) and \(y = r \sin(\theta)\). The idea of the Jacobian applies to all changes in coordinate systems. We can calculate the Jacobian as

\[J(r,\theta) = \frac{\partial x}{\partial r} \frac{\partial y}{\partial \theta} - \frac{\partial x}{\partial \theta} \frac{\partial y}{\partial r} = r \cos^2(\theta) + r \sin^2(\theta) = r.\]

Below is a diagram showing the change of coordinates and the Jacobian in this setting.

Notice that the coordinate system cuts the onion, much as usual grid lines cut the plane into rectangles in the Cartesian coordinate system. The radial part of the coordinate system cuts the onion radially (which of course nature does by default, but we need to model this mathematically), while the angular part of the coordinate system cuts the onion as our knife would if we were making straight cuts towards the center of the onion. Even though every piece of the onion is infinitely small (there are infinitely many layers, and infinitely many cuts), the Jacobian \(r dr d\theta\) gives a measure of how big the infinitely small pieces are relative to each other. Pieces near the center of the onion are smaller than pieces near the edge of the onion, since \(r\) is smaller towards the center of the onion and larger towards the edge of the onion.

We can find the average value of the function \(f(r,\theta) = r\) over the part of the plane that defines the onion to find the average weight of the infinitesimal area, \(\overline{A}\).

\[\overline{A} = \frac{\int_{0}^{\pi/2} \int_{0}^{1} r \; dr \; d \theta}{\int_{0}^{\pi/2} \int_{0}^{1} 1 \; dr \; d \theta} = \frac{1}{2}\]

Once we have the average, we can find the variance, \(\sigma^2\), of the weight of the infinitesimal area by calculating

\[\sigma^2 = \frac{\int_{0}^{\pi/2} \int_{0}^{1} (r-\overline{A})^2 \; dr \; d \theta}{\int_{0}^{\pi/2} \int_{0}^{1} 1 \; dr \; d \theta} = \frac{\int_{0}^{\pi/2} \int_{0}^{1} (r-1/2)^2 \; dr \; d \theta}{\int_{0}^{\pi/2} \int_{0}^{1} 1 \; dr \; d \theta}=\frac{1}{12}\]
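These two values are simple enough to check numerically. The sketch below is only an illustration (the function name `polar_weight_stats` is made up for it) and uses a plain midpoint rule. Since the integrand does not depend on \(\theta\), the \(\theta\) integrals cancel between numerator and denominator, so the sketch integrates over \(r\) alone.

```python
def polar_weight_stats(n=1000):
    """Midpoint-rule mean and variance of the weight J = r for 0 <= r <= 1."""
    dr = 1.0 / n
    midpoints = [(i + 0.5) * dr for i in range(n)]
    mean = sum(r * dr for r in midpoints)
    variance = sum((r - mean) ** 2 * dr for r in midpoints)
    return mean, variance


mean, variance = polar_weight_stats()
print(mean, variance, 1 / 12)
```

The printed mean and variance should sit very close to \(1/2\) and \(1/12\), with a small discretization error in the variance.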

The variance is a good measure of the uniformity of the pieces. If the variance is large, the pieces are not very uniform, and vice-versa.

The problem with this analysis, of course, is that we are cutting towards the center of the onion. We want to cut towards a point below the center of the onion. To accomplish this, we need a new coordinate system.

We make a coordinate system for cutting towards a point a distance \(h>0\) below the center of the onion. In this coordinate system, we measure the angle \(\theta\) from the point \((0,-h)\), while we measure the radius from the origin \((0,0)\) (both points in the rectangular coordinate system). The radial part of the coordinate system cuts the onion radially from the origin as before, while the angular part of the coordinate system cuts the onion as our knife would if we were making straight cuts towards the point \((0,-h)\), below the onion.

This coordinate system only works for the upper half plane, as there are now technically two points in the plane for a given point \((r,\theta)\). Luckily, our onion is entirely in the upper-half plane!

In this coordinate system, the region of the plane that we model as the onion is defined by \(0 \leq \theta \leq \arctan(1/h)\), and \(h \tan(\theta) \leq r \leq 1\). Notice that we are using symmetry. Usually we would think of the onion as a half-onion in the upper half of the plane. But, since the left side of the onion is a mirror image of the right side of the onion, and therefore both sides would have the same variance in area, we can perform this analysis just in the first quadrant.

The relation between \((x,y)\) and \((r,\theta)\) is less clear. Given we know \(r\), \(h\), and \(\theta\), we can draw the following triangle, with a new variable \(c\) which represents the distance from the point \((0,-h)\) to a given point \((x,y)\) (both in the rectangular coordinate system).

First, using the law of cosines, we can calculate

\[c=h \cos(\theta)+\sqrt{r^2-h^2\sin^2(\theta)}.\]
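For completeness, here is the intermediate step, written for the triangle with vertices \((0,-h)\), \((0,0)\) and \((x,y)\), whose sides have lengths \(h\), \(r\) and \(c\), with the angle \(\theta\) at \((0,-h)\). The law of cosines gives a quadratic in \(c\):

\[\begin{aligned} r^2 &= h^2 + c^2 - 2 h c \cos(\theta) \\ c &= h \cos(\theta) \pm \sqrt{h^2 \cos^2(\theta) - h^2 + r^2} = h \cos(\theta) \pm \sqrt{r^2 - h^2 \sin^2(\theta)}. \end{aligned}\]

The minus root gives \(c \leq h \cos(\theta)\), so \(y = c \cos(\theta) - h \leq 0\), which is the second point mentioned earlier that lies outside the upper half plane. The plus root is the one we keep.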

Using this, we can find the relationship between \((x,y)\) and \((r,\theta)\) as

\[\begin{aligned} x & = c \sin(\theta) \\ y & = c \cos(\theta)-h \end{aligned}\]

From this, for a given depth \(h\), we can calculate the Jacobian as

\[\begin{aligned} \scriptscriptstyle J(r,\theta) =& \scriptscriptstyle \frac{r \cos (\theta ) \left(\sin (\theta ) \left(-\frac{h^2 \sin (\theta ) \cos (\theta )}{\sqrt{r^2-h^2 \sin ^2(\theta )}}-h \sin (\theta )\right)+\cos (\theta ) \left(\sqrt{r^2-h^2 \sin ^2(\theta )}+h \cos (\theta )\right)\right)}{\sqrt{r^2-h^2 \sin ^2(\theta )}}\\ & \scriptscriptstyle -\frac{r \sin (\theta ) \left(\cos (\theta ) \left(-\frac{h^2 \sin (\theta ) \cos (\theta )}{\sqrt{r^2-h^2 \sin ^2(\theta )}}-h \sin (\theta )\right)-\sin (\theta ) \left(\sqrt{r^2-h^2 \sin ^2(\theta )}+h \cos (\theta )\right)\right)}{\sqrt{r^2-h^2 \sin ^2(\theta )}}. \end{aligned}\]
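As a side check that is not part of the original computation, expanding the brackets suggests a much shorter equivalent form. Write \(S = \sqrt{r^2-h^2\sin^2(\theta)}\), so that \(c = h\cos(\theta) + S\) and \(\partial c / \partial r = r/S\), and write \(c_\theta = \partial c/\partial \theta\) for the bracket \(-\frac{h^2 \sin(\theta)\cos(\theta)}{S} - h \sin(\theta)\). The terms containing \(c_\theta\) cancel:

\[\begin{aligned} J(r,\theta) &= \frac{r}{S}\Big[\cos(\theta)\big(\sin(\theta)\, c_\theta + \cos(\theta)\, c\big) - \sin(\theta)\big(\cos(\theta)\, c_\theta - \sin(\theta)\, c\big)\Big] \\ &= \frac{r\, c}{S} = r \left(1 + \frac{h \cos(\theta)}{\sqrt{r^2-h^2\sin^2(\theta)}}\right). \end{aligned}\]

Setting \(h=0\) recovers the polar Jacobian \(J = r\). The full expression above is what comes out of taking the partial derivatives directly.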

This is, to put it mildly, complicated. Nevertheless, we have done fairly straightforward calculus computations to get here, which shows the power of making this problem continuous. Mimicking what we did before, given a depth \(h\), we can find the average weight of the infinitesimal area, \(\overline{A}(h)\), by calculating the integral of the Jacobian over the onion region divided by the integral of 1 over the same region:

\[\begin{aligned} \overline{A}(h) &= \frac{\int_{0}^{\arctan(1/h)} \int_{h \tan(\theta)}^{1} J(r,\theta) \; dr \; d\theta}{\int_{0}^{\arctan(1/h)} \int_{h \tan(\theta)}^{1} 1 \; dr \; d \theta}. \end{aligned}\]

And the variance of the weight of the infinitesimal area, \(\sigma^2(h)\), is found by calculating the integral of the square of the difference between the Jacobian and \(\overline{A}(h)\) over the onion region, divided by the integral of 1 over the same region:

\[\sigma^2(h) = \frac{\int_{0}^{\arctan(1/h)} \int_{h \tan(\theta)}^{1} (J(r,\theta)-\overline{A}(h))^2 \; dr \; d\theta}{\int_{0}^{\arctan(1/h)} \int_{h \tan(\theta)}^{1} 1 \; dr \; d \theta}.\]

Yikes! Integrating this by hand looks really difficult, if not impossible. We should use a computer to help us. Using the power of numerical integration in Mathematica, we can plot the variance versus \(h\), the depth of the point we are cutting towards.

We can see the minimum variance is around \(h=.55\). We can use a numerical minimization technique to find the \(h\) that minimizes the variance.

I am only confident of this number to 7 decimal places, but the “true onion constant” for the “onion in the sky” is given by 0.5573066…
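The numerical experiment can also be sketched without Mathematica. The following Python is a minimal illustration, not the code used for this post, and it leans on a rearrangement that is not part of the derivation above. Since \(J \, dr \, d\theta\) is the area element \(dx \, dy\), the integral of any \(g\) with respect to \(dr \, d\theta\) over the onion region equals the integral of \(g/J\) over the quarter disc. Working through the formulas for \(x\), \(y\) and \(c\), the Jacobian can be written in terms of position as \(J = r\left(x^2+(y+h)^2\right)/(r^2+hy)\) with \(r^2 = x^2+y^2\). The sketch integrates with a midpoint rule in ordinary polar coordinates \((\rho,\varphi)\) about the origin and minimizes with a golden-section search. The names `jacobian`, `variance` and `golden_section_min` are made up for the sketch.

```python
import math


def jacobian(x, y, h):
    """Weight J at the point (x, y) for cut depth h (rearranged form, see text)."""
    r = math.hypot(x, y)
    c_squared = x * x + (y + h) ** 2  # squared distance from the aim point (0, -h)
    return r * c_squared / (r * r + h * y)


def variance(h, n=150):
    """Midpoint-rule estimate of sigma^2(h), integrating over the quarter disc."""
    d_rho = 1.0 / n
    d_phi = (math.pi / 2) / n
    i0 = i1 = i2 = 0.0  # integrals of 1, J and J^2 with respect to dr dtheta
    for i in range(n):
        rho = (i + 0.5) * d_rho
        for k in range(n):
            phi = (k + 0.5) * d_phi
            j = jacobian(rho * math.cos(phi), rho * math.sin(phi), h)
            d_area = rho * d_rho * d_phi
            i0 += d_area / j  # dr dtheta = dA / J
            i1 += d_area
            i2 += d_area * j
    mean = i1 / i0
    return i2 / i0 - mean * mean


def golden_section_min(f, lo, hi, tol=1e-4):
    """Minimize a function assumed unimodal on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return (a + b) / 2


h_star = golden_section_min(variance, 0.1, 0.9)
print(h_star, variance(h_star))
```

On a grid this coarse the minimizer should land close to 0.557, with the later digits depending on the resolution `n` and the search tolerance. Setting `h = 0` in `variance` recovers the value \(1/12\) from the centered cut.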

To get the most even cuts of an onion by making radial cuts, one should aim towards a point 55.73066% of the radius of the onion below the center. This is close to, but different from, the 61.803% suggested in the YouTube video at the top. Also, this number will be different for onions with finitely many layers (that is to say, all onions). Nevertheless, I find this answer to be beautiful, and I will forever treasure the true onion constant.

I think it would be interesting to consider the effect of the number of layers on this answer. Since with one layer the best strategy is to cut towards the center, I suspect that the best depth \(h\) to cut towards increases from zero with one layer, with 0.5573066… as the upper bound on the depth. So, the best depth for an onion with ten layers would be somewhere between 0 and 0.5573066. I have not investigated this in depth, but this seems like a fun next step.

I hope we all now know enough about onions to object.

Exo Comics 685

Update: I actually was able to evaluate \(\sigma^2(h)\) in a closed form. The techniques used to do it are really fun, and I am hoping to write them up for a recreational mathematics journal.

As calculus students know, if you want to minimize a function, you should take the derivative and set it equal to zero. Here, the derivative of \(\sigma^2(h)\) is given by

\[[\sigma^2(h)]' = \frac{k(h)}{48 \left(\cot ^{-1}(h)-\frac{1}{2} h \log \left(\frac{1}{h^2}+1\right)\right)^3},\]

where

\[\begin{aligned} &\scriptscriptstyle k(h)= \scriptscriptstyle -3 \pi ^2 \log \left(\frac{1}{h^2}+1\right) \\ & \scriptscriptstyle + 6 \left(h \log \left(\frac{1}{h^2}+1\right)-2 \cot ^{-1}(h)\right)^2 \left(h \left(h \log \left(4 h^2\right)+4 \sqrt{1-h^2} \left(\tan ^{-1}\left(\frac{h+1}{\sqrt{1-h^2}}\right)-\sin ^{-1}(h)\right)\right)+1\right)\\ & \scriptscriptstyle -2 \log \left(\frac{1}{h^2}+1\right) \left(h \log \left(\frac{1}{h^2}+1\right)-2 \cot ^{-1}(h)\right) \\ & \scriptscriptstyle \times \left(4 \left(1-h^2\right)^{3/2} \sin ^{-1}(h)-4 \left(1-h^2\right)^{3/2} \tan ^{-1}\left(\frac{h+1}{\sqrt{1-h^2}}\right)+h^3 \log \left(4 h^2\right)+h+2 \pi \right). \end{aligned}\]

The unique root of the above expression in the interval \((0,1)\) is the onion constant, since it is a critical point for the function \(\sigma^2(h)\) and the sign of \([\sigma^2(h)]'\) changes from negative to positive at this point, as seen in the graph of \([\sigma^2(h)]'\) below.

With this, I can calculate the onion constant to arbitrary precision. Here it is to 1000 decimal places:

0.55730669298566447885109305914592718083200030207273275933982921319 4698135127210458697529556348892779238421515729764144366026144985585 4165046873271472618959107816152780606384065758548635804885244580180 0007394442805906736214054844087432881741438971785006588976790490992 3546045053996637979358236569783223477190862479127621607686248472908 3731336235000704236891376747519710815301807822317779086701048122723 0239150930543232987021503400654503271867566236420521560986469125085 8159370220537524022076834487502663198536347064463252552885622069125 8227307037720900190873707797080215945078389222941122441664099620992 6654693052663485088353188368234518499463417515539540122160704233743 5539919306999218795184234750992607153483541905867849402571200687099 2663407278202945110198402208378584410140122892631419360798953694134 2227610384234804380488890547391245831871629728678785899984149264095 1979084439023291773013425234306472822863355983488650721455375797473 6357343027167265972675903577598983959532796594227162648681839040…
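The leading digits can be reproduced in ordinary double precision. The sketch below is an illustration rather than the computation behind the digits above: it transcribes \(k(h)\) from the formula given earlier, takes \(\cot^{-1}(h) = \arctan(1/h)\) for \(h>0\) (an assumption about the branch convention), and locates the sign change by bisection. The names `k` and `bisect_root` are made up for the sketch.

```python
import math


def k(h):
    """Numerator k(h) of the derivative of sigma^2(h), for 0 < h < 1."""
    log_term = math.log(1 / h**2 + 1)
    acot = math.atan(1 / h)  # assumed convention: cot^{-1}(h) = arctan(1/h) for h > 0
    root = math.sqrt(1 - h**2)
    atan_term = math.atan((h + 1) / root)
    asin_term = math.asin(h)
    log_4h2 = math.log(4 * h**2)
    u = h * log_term - 2 * acot
    first = -3 * math.pi**2 * log_term
    second = 6 * u**2 * (h * (h * log_4h2 + 4 * root * (atan_term - asin_term)) + 1)
    third = -2 * log_term * u * (
        4 * root**3 * asin_term
        - 4 * root**3 * atan_term
        + h**3 * log_4h2
        + h
        + 2 * math.pi
    )
    return first + second + third


def bisect_root(f, lo, hi, iterations=60):
    """Bisection, assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


onion_constant = bisect_root(k, 0.1, 0.9)
print(onion_constant)
```

Double precision limits this to roughly the first dozen or so digits. Going further needs arbitrary-precision arithmetic.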

Such a beautiful mathematical constant deserves a name. I choose to use the Hebrew character samekh, because it looks particularly like an onion.