Skip to article frontmatterSkip to article content

Weak lensing

Convergence field

Let’s introduce what can only be defined as the protagonist cosmological field of this thesis: the convergence field Dodelson & Schmidt (2021) Blandford et al. (1991) Kilbinger (2015) Bartelmann & Maturi (2017). As explained in the introduction weak lensing happens due to the presence of matter between us and a distant source. In mathematical terms we write this as an integral over the line of sight of the matter overdensity field, weighted accordingly.

κ(θ)=0χdχW(χ)δm(χθ,χ),\kappa(\theta) = \int_0^{\chi*}d\chi W(\chi)\delta_m(\chi \theta, \chi),
W(χ)=32H02Ωm(1+z(χ))χχχdχn(z(χ))(1χχ).W(\chi)=\tfrac{3}{2}H_0^2\Omega_m(1+z(\chi))\chi \int_\chi^{\chi*}d\chi' n(z(\chi'))\left(1-\frac{\chi}{\chi'}\right).

With W(χ)W(\chi) being a measure of the lensing weights; W(χ)W(\chi) incorporates all of the relevant cosmological parameters, as well as knowledge of the redshift distribution of the source galaxies n(z)n(z).

2-point statistics

2-point statistics such as correlation functions and power spectra are arguably the most powerful tool of analysis in cosmology. They work as a way to summarise raw data into something simpler, while still retaining most of the relevant information; for examples, allowing us to extract constraints on cosmological parameters.

Full sky

Let us start by introducing the statistics of a 3D spherically symmetric field, also known as a full sky field. The 2-point autocorrelation function is defined as the expectation value E\mathbb{E} of the product of a random field with its complex conjugate

w(x,y)E[ϕ(x)ϕ(y)].w(\bm{x}, \bm{y}) \equiv \mathbb{E}[\phi(\bm{x})\phi^*(\bm{y})].

Where ϕ(x)\phi(\bm{x}) is the 2D projection of a spherically symmetric 3D field. We can now perform a decomposition in spherical harmonics YmY_{\ell m}. Such a decomposition gives rise to the correlation function associated to the harmonic coefficients ϕm\phi_{\ell m}, the 2D power spectrum CC_\ell,

ϕ(x)=mϕmYm(x^),CδmδmE[ϕmϕm].\begin{gather} \phi(\bm{x}) = \sum_{\ell m}\phi_{\ell m} Y_{\ell m}(\hat{\bm{x}}),\\ C_\ell \equiv \delta_{\ell m}\delta_{\ell'm'}\mathbb{E}[ \phi_{\ell m}\phi^*_{\ell'm'}]. \end{gather}

In this setup correlation function and power spectrum are therefore related by,

E[ϕ(x)ϕ(y)]=mmYm(x^)Ym(y^)E[ϕmϕm]=mYm(x^)Ym(y^)C=2+14πCP(x^y^),\begin{align*} & \mathbb{E}[\phi(\bm{x})\phi^*(\bm{y})] \\ =& \sum_{\ell m}\sum_{\ell'm'}Y_{\ell m}(\hat{\bm{x}})Y^*_{\ell'm'}(\hat{\bm{y}}) \mathbb{E}[ \phi_{\ell m}\phi^*_{\ell'm'} ]\\ =& \sum_{\ell m}Y_{\ell m}(\hat{\bm{x}})Y^*_{\ell m}(\hat{\bm{y}}) C_\ell\\ =& \sum_{\ell}\frac{2\ell+1}{4\pi}C_\ell P_\ell(\hat{\bm{x}}\cdot\hat{\bm{y}}), \end{align*}

where we have used the identity relating Legendre polynomials to spherical harmonics P(x^y^)=4π2+1mYm(x^)Ym(y^)P_\ell(\hat{\bm{x}}\cdot\hat{\bm{y}}) = \frac{4\pi}{2\ell+1} \sum_m Y_{\ell m}(\hat{\bm{x}})Y^*_{\ell m}(\hat{\bm{y}}). Cleaning up the equations we are just left with the well known full sky equation

w(θ)=2+14πCP(cos(θ)),w(\theta) = \sum_{\ell}\frac{2\ell+1}{4\pi}C_\ell P_\ell (\cos(\theta)),

which relates the angular correlation function to the angular power spectrum.

Flat-sky

Most analysis of weak lensing data actually use the flat-sky approximation Nicola et al. (2021) Gao et al. (2023) Matthewson & Durrer (2021) Gao et al. (2023) García-García et al. (2019) Schneider, P. et al. (2002), our work will not be an exception. Such an approximation essentially changes our setup to a flat-sky 2D field, instead of the 2D projection of a 3D one, allowing for simpler analytical expressions. In this setup, harmonic decomposition will not work, we will have to Fourier transform our space instead. To do so, we can assume the existence of a function C(l)C(l) which is the continuous extension of CC_\ell. Then we say that C(l)C(l) is the result of a 2D inverse Fourier transformation F\mathcal{F},

{F1C}(θ)=d2l4π2eilθC(l).\{\mathcal{F}^{-1}C\}(\theta)=\int\frac{d^2l}{4\pi^2}e^{i\bm{l}\cdot\bm{\theta}}C(l).

To prove that {F1C}(θ)\{\mathcal{F}^{-1}C\}(\theta) is none other than the angular correlation function, we make use of the radial symmetry of the cosmological field. Which simplifies the equation into a 1D integral. Consider the polar coordinate substitution from (lx,ly)(l_x,l_y) to (l,ϕ)(l,\phi), the integral becomes

dl2πlC(l)dϕ2πeilθcosϕ.\int \frac{dl}{2\pi} l C(l) \int\frac{d\phi}{2\pi}e^{il\theta \cos{\phi}}.

Lastly, we make use of the identity dϕeilθcosϕ=2πJ0(lθ)\int d\phi e^{il\theta \cos{\phi}} = 2\pi J_0(l\theta), where J0J_0 is the zeroth order Bessel function. Which leaves us with

{F1C}(θ)=dl2πlC(l)J0(lθ).\{\mathcal{F}^{-1}C\}(\theta) = \int \frac{dl}{2\pi} l C(l) J_0(l\theta).

In this form it is clear that Eq. (6) and Eq. (8) are asymptotically equivalent, since it is known that J0(θ)P(cos(θ))J_0(\ell\theta) \xrightarrow{\quad} P_\ell (\cos(\theta)) as \ell \rightarrow \infty. This concludes our heuristic proof that the angular correlation function is recovered from an inverse Fourier transformation of the angular power spectrum w(θ){F1C}(θ)w(\theta) \sim \{\mathcal{F}^{-1}C\}(\theta) in the flat-sky approximation. Physically speaking this approximation makes sense when we consider a patch of sky of size L. The wavenumber of the flat-sky angular power spectrum is related to its dimensionless counterpart p by

l=2πLp,l = \frac{2\pi}{L}p,

in other words, it is inversely proportional to the size of the map. Just as we would expect the geometry of a sphere to become flat when zooming in, the flat-sky approximation holds as L0L \rightarrow 0.

Limber approximation

The way we computationally obtain κ\kappa’s 2-point statistics is with the Limber approximation. We use the code jaxcosmo Campagne et al. (2023) to compute the 2D power spectrum C(l)C(l) from the 3D matter power spectrum Pδ(k)P_\delta(k) with the efficient Limber approximation,

C(l)=Cκκ(l)=0χdχχW(χ)W(χ)Pδ(k=lχ,χ).C(l)=C_{\kappa\kappa}(l)=\int_0^{\chi*}\frac{d\chi}{\chi} W(\chi)W(\chi)P_\delta(k=\frac{l}{\chi},\chi).

The assumptions of the Limber approximation are:

Such assumptions allow to simplify the relation between C(l)C(l) and Pδ(k)P_\delta(k) to a one line integral, Eq. (10). It is noteworthy to mention that the Limber approximation introduces significant errors only for modes l<10l<10, as explained in detail in Gao et al. (2023). As far as this work is concerned, we only deal with patches of size 1010^{\circ}, which means we have l>36l>36 according to Eq. (9), well within the range for a good approximation.

Field generation

Gaussian random field

In order to simulate cosmological fields with a specific power spectrum, we make use of Gaussian random fields. They are fast to generate and only need information about the 2-point power spectrum of the field. The algorithm we use for the generation of GRFs is common to most packages and is explained in detail in Bertschinger (2001) Lang & Potthoff (2011). In 1D, assume a pair of functions functions ξ\xi and PP, related by a Fourier transformation

ξ(x,y)=dk2πeik(xy)P(k).\xi(x,y)=\int \frac{dk}{2\pi} e^{i k(x-y)} P(k).

Let W(x)W(x) be a Gaussian white noise field and let F\mathcal{F} denote a Fourier transformation, we then define

ϕ(x)(F1P1/2FW)(x)\phi(x) \equiv (\mathcal{F}^{-1} P^{1/2}\mathcal{F}W)(x)

to be a Gaussian random field. Such a procedure ensures that the covariance E[ϕ(x)ϕ(y)]\mathbb{E}\left[\phi(x)\phi(y)\right] of the GRF recovers the correlation function ξ(x,y)\xi(x,y),

Undefined control sequence: \iiiint at position 61: …y)\right] \\
=&\̲i̲i̲i̲i̲n̲t̲ ̲dx' dy' \frac{d…

\begin{align*}
 &\mathbb{E}\left[\phi(x)\phi(y)\right] \\
=&\iiiint dx' dy' \frac{dk}{2\pi} \frac{dl}{2\pi}  \, e^{i(kx + ly)} P(k)^{1/2} P(l)^{1/2} e^{-i(kx' + ly')} \mathbb{E}[W(x')W(y')] \\
=& \iint \frac{dk}{2\pi} \frac{dl}{2\pi} \, e^{i(kx + ly)} P(k)^{1/2} P(l)^{1/2} \int dx' \, e^{-i(k+l)x'}   \\
=& \int \frac{dk}{2\pi} \, e^{i k(x-y)} P(k)  \\
=& \, \xi(x, y).
\end{align*}

Where we have used E[W(x)W(y)]=δ(xy)\mathbb{E}[W(x')W(y')] = \delta(x'-y'), as white noise is defined by a constant power spectrum.

Rayleigh distribution

Since we are dealing with 2D maps, our algorithm will have to be implemented in two dimensions. To do so, we decide to implement the algorithm in Eq. (11) with the use of the Rayleigh distribution R(σ)\mathcal{R}(\sigma). Given two independent Gaussian random variables XX and YY, the random variable RR given by

R=X2+Y2,R=\sqrt{X^2+Y^2},

is said to be Rayleigh distributed. If we then multiply RR by the complex exponential eiθe^{i\theta} of a uniformly distributed random variable θU(0,2π)\theta \sim \mathcal{U}(0,2\pi), we obtain a map of Gaussianly distributed complex numbers, which will substitute the FW\mathcal{F}W term in Eq. (11). We showcase this equivalency in Fig. 2 for distributions of μ=0\mu=0 and σ=1\sigma=1. The sequence of transformations in 2D therefore becomes,

ϕ(x)(F1P1/2Reiθ)(x).\phi(x) \equiv (\mathcal{F}^{-1} P^{1/2} R e^{i\theta})(x).
Sampling a 2D Gaussian against a Rayleigh distributed amplitude with uniform complex phase. In purple is the complex number X+iY with X,Y \sim \mathcal{N}(0,1), in cyan is Re^{i\theta} with R\sim\mathcal{R}(1) and \theta \sim \mathcal{U}(0,2\pi).

Figure 2:Sampling a 2D Gaussian against a Rayleigh distributed amplitude with uniform complex phase. In purple is the complex number X+iYX+iY with X,YN(0,1)X,Y \sim \mathcal{N}(0,1), in cyan is ReiθRe^{i\theta} with RR(1)R\sim\mathcal{R}(1) and θU(0,2π)\theta \sim \mathcal{U}(0,2\pi).

Lognormal field

The universe today is not Gaussian. GRFs are able to capture the 2-point statistics of cosmological fields, but they cannot capture their skewed distribution. A possible way to get closer to the true distribution of the convergence field is with a lognormal transformation Boruah et al. (2022) Leclercq & Heavens (2021) Hilbert, S. et al. (2011) Zhou et al. (2023). In this work we will denote lognormal transformations as L\mathcal{L} and adopt the following convention for both the field and the correlation function,

Lw1(wL,a)log(wLa2+1)=wG,Lκ(κG,a)a(exp(κG12Var(κG))1)=κL.\begin{gather} \mathcal{L}_w^{-1}(w^{L}, a) \equiv \log \left(\frac{w^{L}}{a^2}+1\right) = w^G,\\ \mathcal{L}_\kappa(\kappa^G, a) \equiv a\left(\exp(\kappa^G-\tfrac{1}{2}\text{Var}(\kappa^G)) -1\right) = \kappa^{L}. \end{gather}

Where the superscripts L and G respectively stand for lognormal and Gaussian, Var(\cdot) stands for the variance of the field and aa is the so-called shift parameter, which is an indicator of the non-Gaussianity of the resulting field.

Gaussian process

One way to think of Gaussian processes is as an extension of random vectors to infinite dimensions. Following this train of thought, let’s begin with the concept of a random variable following a normal distribution. We say,

XN(μ,σ2),X \sim \mathcal{N}\left(\mu, \sigma^2\right),

to mean that XX is a sample of a Gaussian of mean μ\mu and variance σ2\sigma^2. If we were to get enough samples XX, we would eventually recover its distribution. The generalisation of this concept to n-dimensions is a collection of random variables, described by a so-called multivariate normal distribution,

XN(μ,K).\bm{X} \sim \mathcal{N}\left(\bm{\mu}, \bm{K}\right).

Where X=(X0,X1,...)\bm{X}=(X_0,X_1,...) is a vector of random variables, μ\bm{\mu} is the mean vector and K\bm{K} the covariance matrix. For a zero mean field, the covariance matrix is formed by the variance of each of the random variables on its diagonal, while the cross correlation terms populate the rest of the matrix.

Definition

Adopting the philosophy of Rasmussen Rasmussen & Williams (2005), functions can be thought of as very long vectors of the form (f(x1),f(x2),...)\left(f(x_1), f(x_2), ...\right). Such a view allows us to extend the definition of multivariate Gaussians, to functions. Defining GP\mathcal{GP} a Gaussian process, a sample function ff will be given by:

f(x)GP(m(x),k(x,x))f(\bm{x}) \sim \mathcal{GP}\left(m(\bm{x}), k(\bm{x}, \bm{x'})\right)

with m(x)m(\bm{x}) and k(x,x)k(\bm{x}, \bm{x'}) defined as,

m(x)=E[f(x)],k(x,x)=E[(f(x)m(x))(f(x)m(x))],\begin{gather} m(\bm{x})=\mathbb{E}[f(\bm{x})],\\ k(\bm{x}, \bm{x'})=\mathbb{E}[(f(\bm{x})-m(\bm{x}))(f(\bm{x}')-m(\bm{x}'))], \end{gather}

Mathematically a GP is defined for a continuous function. Computationally this is not possible and we must treat space as a discrete grid.

Prior and posterior samples

Given some m()m(\cdot) and k(,)k(\cdot,\cdot) which define a GP, a random sample from said GP would be a function f\bm{f}_* defined on a domain DD_*, which is our grid.

fN(m,K).\bm{f}_* \sim \mathcal{N}(\bm{m}, \bm{K}_{**}).

Here we adopt the convention K=k(D,D)\bm{K} = k(D_*,D_*). When drawing a sample function from Eq. (17), computationally the operation is equivalent to drawing a vector from a multivariate Gaussian. What we obtain is a so-called prior, or priors, see Fig. 3.

Prior samples of a GP with mean m(\bm{x})=0 and squared exponential kernel k(\bm{x},\bm{x'})=e^{-(\bm{x}-\bm{x'})^2}.

Figure 3:Prior samples of a GP with mean m(x)=0m(\bm{x})=0 and squared exponential kernel k(x,x)=e(xx)2k(\bm{x},\bm{x'})=e^{-(\bm{x}-\bm{x'})^2}.

Let’s now see how we can introduce knowledge of data points in this system. We divide the grid in training points DD and test points DD_*. To each training point is associated a known value y\bm{y} and variance σn2\sigma_n^2, whereas the values of the function at the test points f\bm{f}_* are unknown. We can summarise this as,

[yf]N(m,[K+σn2IKKTK])\begin{bmatrix} \bm{y} \\ \bm{f}_* \\ \end{bmatrix} \sim \mathcal{N}\left(\bm{m}, \begin{bmatrix} \bm{K} + \sigma_n^2 I & \bm{K}_* \\ \bm{K}^T_* & \bm{K}_{**} \\ \end{bmatrix}\right)

where we have once again adopted the notation K=k(D,D)\bm{K}=k(D,D), K=k(D,D)\bm{K_{*}}=k(D,D*), K=k(D,D)\bm{K_{**}}=k(D_*,D_*). At this point, one way to find samples that follow the data would be to blindly draw priors until we get something that goes through all data points. This would be inefficient and computationally wasteful. Instead, we make a better guess for the test function values. This operation is called conditioning, because we condition the joint Gaussians on the training points, this gives

fD,D,yN(KT[K+σn2I]1y,KKT[K+σn2I]1K).\bm{f}_* \mid D_*, D, \bm{y} \sim \mathcal{N}\left( \bm{K}^T_* [\bm{K} + \sigma_n^2 I]^{-1} \bm{y} , \bm{K}_{**} - \bm{K}^T_* [\bm{K} + \sigma_n^2 I]^{-1} \bm{K}_* \right).

Conditioning can therefore give rise to what is called a posterior sample, Fig. 4. The result is still a multivariate Gaussian, but the mean and variance given by Eq. (19) generate samples that are a better guess of the behaviour of the function outside of the training points.

Summary plot of a GP conditioned to some data. The cyan line is the mean of the GP and the filled region corresponds to 1\sigma. The purple lines are posterior samples, which are distributed Gaussianly around the mean. The data points are clearly marked in black, they are also the points where all samples converge to.

Figure 4:Summary plot of a GP conditioned to some data. The cyan line is the mean of the GP and the filled region corresponds to 1σ1\sigma. The purple lines are posterior samples, which are distributed Gaussianly around the mean. The data points are clearly marked in black, they are also the points where all samples converge to.

References
  1. Dodelson, S., & Schmidt, F. (2021). 13 - Probes of structure: lensing. In S. Dodelson & F. Schmidt (Eds.), Modern Cosmology (Second Edition) (Second Edition, pp. 373–399). Academic Press. https://doi.org/10.1016/B978-0-12-815948-4.00019-X
  2. Blandford, R. D., Saust, A. B., Brainerd, T. G., & Villumsen, J. V. (1991). The distortion of distant galaxy images by large scale structure. AIP Conference Proceedings, 222(1), 455–458. 10.1063/1.40414
  3. Kilbinger, M. (2015). Cosmology with cosmic shear observations: a review. Reports on Progress in Physics, 78(8), 086901. 10.1088/0034-4885/78/8/086901
  4. Bartelmann, M., & Maturi, M. (2017). Weak gravitational lensing. Scholarpedia, 12(1), 32440. 10.4249/scholarpedia.32440
  5. Nicola, A., García-García, C., Alonso, D., Dunkley, J., Ferreira, P. G., Slosar, A., & Spergel, D. N. (2021). Cosmic shear power spectra in practice. Journal of Cosmology and Astroparticle Physics, 2021(03), 067. 10.1088/1475-7516/2021/03/067
  6. Gao, Z., Raccanelli, A., & Vlah, Z. (2023). Asymptotic connection between full- and flat-sky angular correlators. Phys. Rev. D, 108(4), 043503. 10.1103/PhysRevD.108.043503
  7. Matthewson, W. L., & Durrer, R. (2021). The flat sky approximation to galaxy number counts. Journal of Cosmology and Astroparticle Physics, 2021(02), 027. 10.1088/1475-7516/2021/02/027
  8. Gao, Z., Vlah, Z., & Challinor, A. (2023). Flat-sky Angular Power Spectra Revisited.
  9. García-García, C., Alonso, D., & Bellini, E. (2019). Disconnected pseudo-C\ell covariances for projected large-scale structure data. Journal of Cosmology and Astroparticle Physics, 2019(11), 043. 10.1088/1475-7516/2019/11/043
  10. Schneider, P., van Waerbeke, L., Kilbinger, M., & Mellier, Y. (2002). Analysis of two-point statistics of cosmic shear - I. Estimators and covariances. A&A, 396(1), 1–19. 10.1051/0004-6361:20021341
  11. Campagne, J.-E., Lanusse, F., Zuntz, J., Boucaud, A., Casas, S., Karamanis, M., Kirkby, D., Lanzieri, D., Peel, A., & Li, Y. (2023). JAX-COSMO: An End-to-End Differentiable and GPU Accelerated Cosmology Library. The Open Journal of Astrophysics, 6. 10.21105/astro.2302.05163
  12. Bertschinger, E. (2001). Multiscale Gaussian Random Fields and Their Application to Cosmological Simulations. The Astrophysical Journal Supplement Series, 137(1), 1. 10.1086/322526
  13. Lang, A., & Potthoff, J. (2011). Monte Carlo Methods and Applications, 17(3), 195–214. doi:10.1515/mcma.2011.009
  14. Boruah, S. S., Rozo, E., & Fiedorowicz, P. (2022). Map-based cosmology inference with lognormal cosmic shear maps.
  15. Leclercq, F., & Heavens, A. (2021). On the accuracy and precision of correlation functions and field-level inference in cosmology. Monthly Notices of the Royal Astronomical Society: Letters, 506(1), L85–L90. 10.1093/mnrasl/slab081