
Weak lensing map inference: a physics informed Gaussian process approach

Abstract

In this work we propose the use of physically informed Gaussian processes (GPs) to analyse cosmological fields at the map level. We show that GPs can capture the statistical behaviour of cosmological fields, providing us with a likelihood as a function of the cosmological parameters conditioned on the map. In practice, we set the Gaussian process kernel to be the 2-point autocorrelation function associated with a 2D discrete flat-sky convergence map. We find that a GP in this setup is not only able to generate maps with the desired 2-point statistics, but also to reconstruct masked data with an associated uncertainty. Additionally, we perform a Bayesian inference analysis to test the ability of Gaussian processes to recover cosmological parameters. We find that we consistently recover the \sigma_8-\Omega_m degeneracy, recovering S_8 within two sigma. The data is simulated as a Gaussian random field realisation of a convergence map of size (10^{\circ},10^{\circ}), on a 64\times64 grid, with a \sim 10\% mask and noise given by a galaxy density of n_g=10\text{ galaxies}/\text{arcmin}^2.

keywords: Gaussian processes, weak lensing.


Matter bends light. The theory of general relativity predicts that the presence of matter or energy changes the geometry of space and time, which in turn can cause what would otherwise be the straight path of a beam of light to curve. Take a distant source of light. If we assume the universe is not empty, then between us and that source there exists a non-trivial arrangement of matter. As light travels through everything in between us and the source, it gets blocked, bent and distorted. We call this phenomenon weak lensing. In practice, what we measure is the shape of distant galaxies. These images do not show a distinct lensing feature individually; such tiny changes can only be seen with a large number of sources. For example, we observe that galaxies have a tendency to align along a preferred axis, causing a statistical discrepancy in an otherwise seemingly isotropic universe. For further details on lensing, please refer to Kilbinger (2015) and Bartelmann & Maturi (2017); for weak lensing, Blandford et al. (1991). The image of a distant galaxy can change shape or size. Changes in shape are fully characterised by the shear distortion vector field \vec{\gamma}, while the change in size is given by its magnitude, the convergence field \kappa. We lay out a theoretical framework for weak lensing in Sec. ?? as well as details on the cosmology used in this thesis in Sec. ??.

A lot of work goes into translating a measurement of distant galaxies into its mathematically friendly counterpart \vec{\gamma}. This step lies outside of our scope, which is one of the reasons we will not deal with it in this thesis; instead, we simulate our own convergence fields. We use a Gaussian random field (GRF) algorithm to create Gaussianly distributed data, as well as lognormal transformations to create fields with a distribution that more closely resembles that of our universe. These transformations are listed in Sec. ??; we then use them to simulate our data as explained in Sec. ??. We also verify that the generated fields recover the fiducial power spectrum in Sec. ??.

Figure 1: Simplified steps taken by the model to go from cosmological parameters \Theta to likelihood \mathcal{L} using a GP. Here C and w stand for the power spectrum and correlation function respectively, and y is the data.

A Gaussian process (GP) usually assumes little prior knowledge about the data it is applied to. Current research in cosmology mostly views GPs as a machine learning tool to be trained: they are used to accelerate and optimise models Mootoovaloo et al. (2020) Boruah et al. (2022) Karchev et al. (2022), as well as for their interpolation qualities, applied to the reconstruction of functions determining the evolution of the universe Shafieloo et al. (2012) Seikel et al. (2012) Holsclaw et al. (2010). Our work, however, is based on a different approach. We apply our prior knowledge of 2-point statistics in cosmology to create a fully informed GP. Restricting ourselves to 2D flat-sky weak lensing convergence fields, as shown in Fig. 1, we can:

1. compute the angular power spectrum C from the cosmological parameters \Theta,
2. transform it into the angular correlation function w,
3. use w as the kernel of a GP, and
4. evaluate the GP likelihood \mathcal{L} of the map y.

With a Bayesian approach, we make use of this pipeline to infer the values of the cosmological parameters. Running a Markov chain Monte Carlo (MCMC), we can sample the posterior distribution of the cosmological parameters; in particular, we obtain contours for \Omega_m, \sigma_8 and S_8. Beyond inference, GPs have several other interesting properties at the field level. They are not only able to generate fields that recover the fiducial 2-point statistics, but are also able to reconstruct masked fields, a task that usually brings many challenges to C_\ell estimation Chon et al. (2004) Brown et al. (2005). In weak lensing in particular, foreground objects like bright stars or galaxies can contaminate measurements, making it necessary to mask such regions, essentially removing the signal.

Here we list the advantages of our method:

Whilst some of the disadvantages:

GPs in this thesis are presented in a general introduction in Sec. ??, followed by a detailed account of the computational methods used to recover a working kernel for GPs in Sec. ??. In our results we show their ability to create maps that follow the desired statistics (Sec. ??) and to reconstruct data (Sec. ??). We also present our attempt at cosmological parameter inference with GPs in Sec. ??.

It is important to note that throughout the thesis we follow the extremely useful guidelines set by the Miko pipeline Zhou et al. (2023) on how to deal with discrete maps in weak lensing.

Weak lensing

Convergence field

Let’s introduce what can only be defined as the protagonist cosmological field of this thesis: the convergence field Dodelson & Schmidt (2021) Blandford et al. (1991) Kilbinger (2015) Bartelmann & Maturi (2017). As explained in the introduction, weak lensing happens due to the presence of matter between us and a distant source. In mathematical terms, we write this as an integral over the line of sight of the matter overdensity field, weighted accordingly.

\kappa(\bm{\theta}) = \int_0^{\chi_*}d\chi\, W(\chi)\,\delta_m(\chi \bm{\theta}, \chi),
W(\chi)=\tfrac{3}{2}H_0^2\Omega_m(1+z(\chi))\,\chi \int_\chi^{\chi_*}d\chi'\, n(z(\chi'))\left(1-\frac{\chi}{\chi'}\right).

Here W(\chi) is a measure of the lensing weights; it incorporates all of the relevant cosmological parameters, as well as knowledge of the redshift distribution n(z) of the source galaxies.

2-point statistics

2-point statistics such as correlation functions and power spectra are arguably the most powerful analysis tools in cosmology. They summarise raw data into something simpler, while still retaining most of the relevant information, for example allowing us to extract constraints on cosmological parameters.

Full sky

Let us start by introducing the statistics of a 3D spherically symmetric field, also known as a full-sky field. The 2-point autocorrelation function is defined as the expectation value \mathbb{E} of the product of a random field with its complex conjugate,

w(\bm{x}, \bm{y}) \equiv \mathbb{E}[\phi(\bm{x})\phi^*(\bm{y})].

Here \phi(\bm{x}) is the 2D projection of a spherically symmetric 3D field. We can now perform a decomposition in spherical harmonics Y_{\ell m}. Such a decomposition gives rise to the correlation function associated with the harmonic coefficients \phi_{\ell m}, the 2D power spectrum C_\ell,

\begin{gather} \phi(\bm{x}) = \sum_{\ell m}\phi_{\ell m} Y_{\ell m}(\hat{\bm{x}}),\\ C_\ell\, \delta_{\ell\ell'}\delta_{mm'} \equiv \mathbb{E}[ \phi_{\ell m}\phi^*_{\ell'm'}]. \end{gather}

In this setup correlation function and power spectrum are therefore related by,

\begin{align*} \mathbb{E}[\phi(\bm{x})\phi^*(\bm{y})] &= \sum_{\ell m}\sum_{\ell'm'}Y_{\ell m}(\hat{\bm{x}})Y^*_{\ell'm'}(\hat{\bm{y}})\, \mathbb{E}[ \phi_{\ell m}\phi^*_{\ell'm'} ]\\ &= \sum_{\ell m}Y_{\ell m}(\hat{\bm{x}})Y^*_{\ell m}(\hat{\bm{y}})\, C_\ell\\ &= \sum_{\ell}\frac{2\ell+1}{4\pi}C_\ell\, P_\ell(\hat{\bm{x}}\cdot\hat{\bm{y}}), \end{align*}

where we have used the identity relating Legendre polynomials to spherical harmonics, P_\ell(\hat{\bm{x}}\cdot\hat{\bm{y}}) = \frac{4\pi}{2\ell+1} \sum_m Y_{\ell m}(\hat{\bm{x}})Y^*_{\ell m}(\hat{\bm{y}}). Cleaning up the equations, we are left with the well-known full-sky relation

w(\theta) = \sum_{\ell}\frac{2\ell+1}{4\pi}C_\ell P_\ell (\cos\theta),

which relates the angular correlation function to the angular power spectrum.
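As a concrete check of the full-sky relation above, the sum can be evaluated with numpy's Legendre series; a minimal sketch, using a toy power spectrum (illustrative only, not the fiducial one):

```python
import numpy as np
from numpy.polynomial.legendre import legval

def w_full_sky(theta, cl):
    """Evaluate w(theta) = sum_l (2l+1)/(4 pi) C_l P_l(cos theta)."""
    ell = np.arange(len(cl))
    coeffs = (2 * ell + 1) / (4 * np.pi) * cl
    # legval evaluates sum_l coeffs[l] * P_l(x) via Clenshaw recursion
    return legval(np.cos(theta), coeffs)

# Toy power spectrum, illustrative only: C_l = 1/(l+1)^2
cl = 1.0 / (np.arange(100) + 1.0) ** 2
# At theta = 0 every P_l(1) = 1, so w(0) = sum_l (2l+1) C_l / (4 pi)
w0 = w_full_sky(0.0, cl)
```

The check at \theta=0 is exact because P_\ell(1)=1 for every \ell, so the sum collapses to a plain weighted sum of the C_\ell.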

Flat-sky

Most analyses of weak lensing data use the flat-sky approximation Nicola et al. (2021) Gao et al. (2023) Matthewson & Durrer (2021) Gao et al. (2023) García-García et al. (2019) Schneider, P. et al. (2002), and our work is no exception. This approximation essentially changes our setup to a flat 2D field, instead of the 2D projection of a 3D one, allowing for simpler analytical expressions. In this setup the harmonic decomposition no longer applies; we have to Fourier transform our space instead. To do so, we assume the existence of a function C(l) which is the continuous extension of C_\ell. We then consider the 2D inverse Fourier transformation \mathcal{F} of C(l),

\{\mathcal{F}^{-1}C\}(\theta)=\int\frac{d^2l}{4\pi^2}\,e^{i\bm{l}\cdot\bm{\theta}}\,C(l).

To prove that \{\mathcal{F}^{-1}C\}(\theta) is none other than the angular correlation function, we make use of the radial symmetry of the cosmological field, which simplifies the expression to a 1D integral. Consider the polar coordinate substitution from (l_x,l_y) to (l,\phi); the integral becomes

\int \frac{dl}{2\pi}\, l\, C(l) \int\frac{d\phi}{2\pi}\,e^{il\theta \cos\phi}.

Lastly, we make use of the identity \int_0^{2\pi} d\phi\, e^{il\theta \cos\phi} = 2\pi J_0(l\theta), where J_0 is the zeroth-order Bessel function. This leaves us with

\{\mathcal{F}^{-1}C\}(\theta) = \int \frac{dl}{2\pi}\, l\, C(l)\, J_0(l\theta).

In this form it is clear that Eq. (6) and Eq. (8) are asymptotically equivalent, since J_0(\ell\theta) \rightarrow P_\ell (\cos\theta) as \ell \rightarrow \infty. This concludes our heuristic proof that, in the flat-sky approximation, the angular correlation function is recovered from an inverse Fourier transformation of the angular power spectrum, w(\theta) \sim \{\mathcal{F}^{-1}C\}(\theta). Physically, this approximation makes sense when we consider a patch of sky of size L. The wavenumber of the flat-sky angular power spectrum is related to its dimensionless counterpart p by

l = \frac{2\pi}{L}p,

in other words, it is inversely proportional to the size of the map. Just as we would expect the geometry of a sphere to become flat when zooming in, the flat-sky approximation holds as L \rightarrow 0.
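The Bessel identity used in the derivation above, \int_0^{2\pi} d\phi\, e^{il\theta\cos\phi} = 2\pi J_0(l\theta), is easy to verify numerically; a quick sketch using scipy (the value of l\theta is arbitrary):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import j0

x = 1.7  # an arbitrary value of l * theta

# Real and imaginary parts of the angular integral of e^{i x cos(phi)}
re, _ = quad(lambda phi: np.cos(x * np.cos(phi)), 0.0, 2.0 * np.pi)
im, _ = quad(lambda phi: np.sin(x * np.cos(phi)), 0.0, 2.0 * np.pi)
# Expected: re = 2*pi*J_0(x), im = 0 (the odd part integrates away)
```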

Limber approximation

The way we computationally obtain \kappa's 2-point statistics is with the Limber approximation. We use the code jax-cosmo Campagne et al. (2023) to compute the 2D power spectrum C(l) from the 3D matter power spectrum P_\delta(k),

C(l)=C_{\kappa\kappa}(l)=\int_0^{\chi_*}\frac{d\chi}{\chi^2}\, W(\chi)^2\, P_\delta\!\left(k=\frac{l}{\chi},\chi\right).

The assumptions of the Limber approximation are:

These assumptions allow us to simplify the relation between C(l) and P_\delta(k) to the one-line integral of Eq. (10). It is worth mentioning that the Limber approximation introduces significant errors only for modes l<10, as explained in detail in Gao et al. (2023). As far as this work is concerned, we only deal with patches of size 10^{\circ}, which means we have l>36 according to Eq. (9), well within the range of validity of the approximation.

Field generation

Gaussian random field

In order to simulate cosmological fields with a specific power spectrum, we make use of Gaussian random fields. They are fast to generate and only need information about the 2-point power spectrum of the field. The algorithm we use for the generation of GRFs is common to most packages and is explained in detail in Bertschinger (2001) Lang & Potthoff (2011). In 1D, assume a pair of functions \xi and P, related by a Fourier transformation

\xi(x,y)=\int \frac{dk}{2\pi}\, e^{i k(x-y)} P(k).

Let W(x) be a Gaussian white noise field and let \mathcal{F} denote a Fourier transformation. We then define

\phi(x) \equiv (\mathcal{F}^{-1} P^{1/2}\mathcal{F}W)(x)

to be a Gaussian random field. This procedure ensures that the covariance \mathbb{E}\left[\phi(x)\phi(y)\right] of the GRF recovers the correlation function \xi(x,y),

\begin{align*}
&\mathbb{E}\left[\phi(x)\phi(y)\right] \\
=&\iiiint dx'\, dy'\, \frac{dk}{2\pi} \frac{dl}{2\pi}\, e^{i(kx + ly)} P(k)^{1/2} P(l)^{1/2} e^{-i(kx' + ly')}\, \mathbb{E}[W(x')W(y')] \\
=& \iint \frac{dk}{2\pi} \frac{dl}{2\pi}\, e^{i(kx + ly)} P(k)^{1/2} P(l)^{1/2} \int dx'\, e^{-i(k+l)x'} \\
=& \int \frac{dk}{2\pi}\, e^{i k(x-y)} P(k) \\
=& \, \xi(x, y).
\end{align*}

Here we have used \mathbb{E}[W(x')W(y')] = \delta(x'-y'), as white noise is defined by a constant power spectrum.
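The 1D algorithm of Eq. (11) can be sketched in a few lines of numpy. One deterministic sanity check of the sketch: for a flat spectrum P(k)=1, the transformation must return the white noise unchanged, since \mathcal{F}^{-1}\mathcal{F} is the identity:

```python
import numpy as np

def grf_1d(white, power):
    """phi = F^{-1} P^{1/2} F W (Eq. 11), using numpy's FFT conventions."""
    return np.fft.ifft(np.sqrt(power) * np.fft.fft(white)).real

rng = np.random.default_rng(0)
W = rng.normal(size=256)        # Gaussian white noise field

# Flat spectrum: white noise must stay white (and unchanged)
phi_white = grf_1d(W, np.ones(256))
```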

Rayleigh distribution

Since we are dealing with 2D maps, our algorithm has to be implemented in two dimensions. To do so, we implement the algorithm of Eq. (11) with the use of the Rayleigh distribution \mathcal{R}(\sigma). Given two independent Gaussian random variables X and Y, the random variable R given by

R=\sqrt{X^2+Y^2},

is said to be Rayleigh distributed. If we then multiply R by the complex exponential e^{i\theta} of a uniformly distributed random variable \theta \sim \mathcal{U}(0,2\pi), we obtain a map of Gaussianly distributed complex numbers, which substitutes the \mathcal{F}W term in Eq. (11). We showcase this equivalence in Fig. 2 for distributions with \mu=0 and \sigma=1. The sequence of transformations in 2D therefore becomes

\phi(x) \equiv (\mathcal{F}^{-1} P^{1/2} R e^{i\theta})(x).

Figure 2: Sampling a 2D Gaussian against a Rayleigh-distributed amplitude with uniform complex phase. In purple is the complex number X+iY with X,Y \sim \mathcal{N}(0,1); in cyan is Re^{i\theta} with R\sim\mathcal{R}(1) and \theta \sim \mathcal{U}(0,2\pi).
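The equivalence shown in Fig. 2 is easy to reproduce; a sketch comparing the two sampling routes (seed and sample size arbitrary), checking that both give unit-variance real and imaginary parts:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Route 1: complex Gaussian built directly, X + iY with X, Y ~ N(0, 1)
z_gauss = rng.normal(size=n) + 1j * rng.normal(size=n)

# Route 2: amplitude/phase form, R ~ Rayleigh(1), theta ~ U(0, 2*pi)
R = rng.rayleigh(scale=1.0, size=n)
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
z_polar = R * np.exp(1j * theta)
```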

Lognormal field

The universe today is not Gaussian. GRFs are able to capture the 2-point statistics of cosmological fields, but they cannot capture their skewed distribution. A possible way to get closer to the true distribution of the convergence field is with a lognormal transformation Boruah et al. (2022) Leclercq & Heavens (2021) Hilbert, S. et al. (2011) Zhou et al. (2023). In this work we denote lognormal transformations by \mathcal{L} and adopt the following convention for both the field and the correlation function,

\begin{gather} \mathcal{L}_w^{-1}(w^{L}, a) \equiv \log \left(\frac{w^{L}}{a^2}+1\right) = w^G,\\ \mathcal{L}_\kappa(\kappa^G, a) \equiv a\left(\exp\left(\kappa^G-\tfrac{1}{2}\mathrm{Var}(\kappa^G)\right) -1\right) = \kappa^{L}. \end{gather}

Here the superscripts L and G stand for lognormal and Gaussian respectively, \mathrm{Var}(\cdot) is the variance of the field, and a is the so-called shift parameter, an indicator of the non-Gaussianity of the resulting field.
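The two transformations above can be sketched directly. In the sketch below the forward w-transform (the inverse of \mathcal{L}_w^{-1}) is written out only to check the round trip, and all numerical values are illustrative:

```python
import numpy as np

def Lw_inv(w_ln, a):
    """Gaussian correlation function from the lognormal one."""
    return np.log(w_ln / a**2 + 1.0)

def Lw(w_g, a):
    """Forward w-transform: inverse of Lw_inv, used here as a check."""
    return a**2 * (np.exp(w_g) - 1.0)

def Lk(kappa_g, a):
    """Lognormal field from a Gaussian field (shift parameter a)."""
    return a * (np.exp(kappa_g - 0.5 * np.var(kappa_g)) - 1.0)

a = 0.05
w = np.linspace(0.0, 1e-3, 50)        # toy correlation values
roundtrip = Lw(Lw_inv(w, a), a)       # should reproduce w exactly

rng = np.random.default_rng(1)
kappa_g = rng.normal(0.0, 0.01, size=100_000)
kappa_ln = Lk(kappa_g, a)             # skewed, bounded below by -a, mean ~ 0
```

The -\tfrac{1}{2}\mathrm{Var}(\kappa^G) term in the exponent is what keeps the transformed field mean-zero, and the output can never drop below -a since the exponential is positive.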

Gaussian process

One way to think of Gaussian processes is as an extension of random vectors to infinite dimensions. Following this train of thought, let’s begin with the concept of a random variable following a normal distribution. We say,

X \sim \mathcal{N}\left(\mu, \sigma^2\right),

to mean that X is a sample of a Gaussian of mean \mu and variance \sigma^2. If we were to draw enough samples X, we would eventually recover its distribution. The generalisation of this concept to n dimensions is a collection of random variables, described by a so-called multivariate normal distribution,

\bm{X} \sim \mathcal{N}\left(\bm{\mu}, \bm{K}\right).

Here \bm{X}=(X_0,X_1,...) is a vector of random variables, \bm{\mu} is the mean vector and \bm{K} the covariance matrix. For a zero-mean field, the covariance matrix holds the variance of each of the random variables on its diagonal, while the cross-correlation terms populate the rest of the matrix.

Definition

Adopting the philosophy of Rasmussen & Williams (2005), functions can be thought of as very long vectors of the form \left(f(x_1), f(x_2), ...\right). This view allows us to extend the definition of multivariate Gaussians to functions. Denoting by \mathcal{GP} a Gaussian process, a sample function f is given by

f(\bm{x}) \sim \mathcal{GP}\left(m(\bm{x}), k(\bm{x}, \bm{x'})\right)

with m(\bm{x}) and k(\bm{x}, \bm{x'}) defined as

\begin{gather} m(\bm{x})=\mathbb{E}[f(\bm{x})],\\ k(\bm{x}, \bm{x'})=\mathbb{E}[(f(\bm{x})-m(\bm{x}))(f(\bm{x}')-m(\bm{x}'))]. \end{gather}

Mathematically, a GP is defined over a continuous domain. Computationally this is not possible, and we must treat space as a discrete grid.

Prior and posterior samples

Given some mean m(\cdot) and kernel k(\cdot,\cdot) defining a GP, a random sample from said GP is a function \bm{f}_* defined on a domain D_*, which is our grid,

\bm{f}_* \sim \mathcal{N}(\bm{m}, \bm{K}_{**}).

Here we adopt the convention \bm{K}_{**} = k(D_*,D_*). Drawing a sample function from Eq. (17) is computationally equivalent to drawing a vector from a multivariate Gaussian. What we obtain are so-called prior samples, see Fig. 3.


Figure 3: Prior samples of a GP with mean m(\bm{x})=0 and squared exponential kernel k(\bm{x},\bm{x'})=e^{-(\bm{x}-\bm{x'})^2}.

Let’s now see how we can introduce knowledge of data points into this system. We divide the grid into training points D and test points D_*. To each training point is associated a known value \bm{y} and variance \sigma_n^2, whereas the values of the function at the test points \bm{f}_* are unknown. We can summarise this as

[yf]N(m,[K+σn2IKKTK])\begin{bmatrix} \bm{y} \\ \bm{f}_* \\ \end{bmatrix} \sim \mathcal{N}\left(\bm{m}, \begin{bmatrix} \bm{K} + \sigma_n^2 I & \bm{K}_* \\ \bm{K}^T_* & \bm{K}_{**} \\ \end{bmatrix}\right)

where we have once again adopted the notation \bm{K}=k(D,D), \bm{K}_{*}=k(D,D_*), \bm{K}_{**}=k(D_*,D_*). At this point, one way to find samples that follow the data would be to blindly draw priors until we get something that goes through all data points. This would be inefficient and computationally wasteful. Instead, we make a better guess for the test function values. This operation is called conditioning, because we condition the joint Gaussian on the training points; this gives

fD,D,yN(KT[K+σn2I]1y,KKT[K+σn2I]1K).\bm{f}_* \mid D_*, D, \bm{y} \sim \mathcal{N}\left( \bm{K}^T_* [\bm{K} + \sigma_n^2 I]^{-1} \bm{y} , \bm{K}_{**} - \bm{K}^T_* [\bm{K} + \sigma_n^2 I]^{-1} \bm{K}_* \right).

Conditioning therefore gives rise to what is called a posterior sample, Fig. 4. The result is still a multivariate Gaussian, but the mean and covariance given by Eq. (19) generate samples that are a better guess of the behaviour of the function outside of the training points.
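Eq. (19) translates into a few lines of linear algebra. A sketch with the same squared exponential toy kernel as in Fig. 3 (grid and data values illustrative): with a tiny noise variance, the posterior mean interpolates the training data and the posterior variance vanishes there:

```python
import numpy as np

def gp_condition(K, Ks, Kss, y, sigma_n):
    """Posterior mean and covariance of Eq. (19)."""
    A = K + sigma_n**2 * np.eye(len(y))       # K + sigma_n^2 I
    mean = Ks.T @ np.linalg.solve(A, y)
    cov = Kss - Ks.T @ np.linalg.solve(A, Ks)
    return mean, cov

def sqexp(a, b):
    """Squared exponential kernel k(x, x') = exp(-(x - x')^2)."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2))

D  = np.array([0.0, 1.0, 2.5])      # training points
Ds = np.linspace(0.0, 3.0, 31)      # test grid (contains the training points)
y  = np.array([0.3, -0.2, 0.8])     # training values

mean, cov = gp_condition(sqexp(D, D), sqexp(D, Ds), sqexp(Ds, Ds), y,
                         sigma_n=1e-6)
```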


Figure 4: Summary plot of a GP conditioned on some data. The cyan line is the mean of the GP and the filled region corresponds to 1\sigma. The purple lines are posterior samples, distributed Gaussianly around the mean. The data points are marked in black; they are also the points to which all samples converge.

Data simulation

Cosmology

This work follows in the general footsteps of data analysis of weak lensing surveys like HSC Hikage et al. (2019) and KiDS Köhlinger et al. (2017) Asgari, Marika et al. (2021), as we aim to replicate their methodologies. Throughout the work, we assume a fiducial cosmology of fixed parameter values as shown in Tab. ??. In particular, all data maps will be generated from a power spectrum following this cosmology. We will refer to this power spectrum as the fiducial power spectrum C(l). A leading modelling choice comes with the redshift distribution n(z). We model it as a Smail-type distribution Smail et al. (1995) Kirk et al. (2012),

n(z)=z^\alpha \exp\left[-\left(\frac{z}{z_0}\right)^\beta\right].

The choice of parameters has been made to emulate bin 5 of the KiDS1000 survey Hildebrandt, H. et al. (2021), with parameters \alpha=3.5, \beta=4.5, z_0=1.
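As a small check of this parameter choice, setting dn/dz=0 gives a peak at z_{\text{peak}} = z_0(\alpha/\beta)^{1/\beta} \approx 0.95; a sketch locating the peak numerically:

```python
import numpy as np

def smail_nz(z, alpha=3.5, beta=4.5, z0=1.0):
    """Smail-type redshift distribution (unnormalised)."""
    return z**alpha * np.exp(-((z / z0) ** beta))

z = np.linspace(0.0, 3.0, 30_001)
nz = smail_nz(z)
z_peak = z[np.argmax(nz)]
# Analytic peak: d/dz [z^a e^{-(z/z0)^b}] = 0  =>  z = z0 * (a/b)^(1/b)
```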


Figure 5: Smail-type redshift distribution for the chosen parameters: \alpha=3.5, \beta=4.5, z_0=1.

Map making pipeline

When dealing with maps of finite size (L, L) and pixel resolution (N,N), Fourier space is going to have boundaries, just as real space does. These limits are given by

\begin{gather*} l_{\text{min}} = \frac{2\pi}{L},\\ l_{\text{max}} = \frac{2\pi}{L} N. \end{gather*}

Therefore, a map of size (10^{\circ},10^{\circ}) with a 64\times64 grid has limits (l_{\text{min}}, l_{\text{max}}) = (36, 2304)\,\text{rad}^{-1}. Now that we have the physical range for the power spectrum, we can generate data.
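These limits follow directly from the patch size; a quick check of the quoted numbers:

```python
import numpy as np

L = np.deg2rad(10.0)   # patch size: 10 degrees, in radians
N = 64                 # grid resolution

l_min = 2.0 * np.pi / L       # smallest resolvable multipole
l_max = 2.0 * np.pi / L * N   # largest resolvable multipole
# For a 10-degree patch: l_min = 360/10 = 36, l_max = 36 * 64 = 2304
```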

Figure 6: Sequence of transformations used to generate Gaussian and lognormal maps starting from the fiducial angular power spectrum. The top branch shows how to generate a GRF which, when transformed to a lognormal field, follows the fiducial C(l). The bottom branch is a standard GRF realisation starting from the fiducial C(l). L and G stand for lognormal and Gaussian respectively, \mathcal{H} is the Hankel transformation, and tildes refer to intermediate results.

To generate a GRF, we employ the algorithm of Eq. (13). Making a lognormal field is a bit more complicated. Any GRF to which we apply the lognormal transformation \mathcal{L}_\kappa of Eq. (15) becomes a lognormal field. The reason we do not simply transform any GRF is that the result would not recover the fiducial power spectrum C(l). Our goal is therefore to find a transfer GRF \tilde{\kappa}^G which, when transformed lognormally, gives rise to a lognormal field \kappa^L that recovers C(l). To do so, we follow the sequence of transformations illustrated in the top branch of Fig. 6.

Noise and mask


Figure 7: Approximately 10\% mask applied to the data. Size (10^{\circ},10^{\circ}) and grid 64\times64.

As all data is simulated, we only take into account the so-called shape noise, due to the intrinsic distribution of galaxy ellipticities and of the angles they form with respect to us. GPs treat each pixel of the map as a Gaussianly distributed random variable with standard deviation given by Croft et al. (2017),

\sigma_\text{noise} = \frac{\sigma_e}{\sqrt{n_g A_{px}}}.

We use values of \sigma_e=0.26, n_g= 4, 10, 30, 100 \text{ galaxies}/\text{arcmin}^2, and a pixel area given by the pixel resolution squared, A_{px}=(L/N)^2. For our highest-resolution run, we have N=64 and use n_g = 10 \text{ galaxies}/\text{arcmin}^2, which results in a noise standard deviation of \sigma_\text{noise} \sim 0.0088. The mask being used, as seen in Fig. 7, covers approximately 10\% of the patch Grewal et al. (2024). We keep the noise and mask random seeds fixed throughout the work.
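The quoted \sigma_\text{noise} \sim 0.0088 can be reproduced directly from the shape-noise formula above:

```python
import numpy as np

sigma_e = 0.26              # intrinsic ellipticity dispersion
n_g = 10.0                  # galaxy density, galaxies per arcmin^2
L_arcmin = 10.0 * 60.0      # 10-degree patch in arcmin
N = 64                      # grid resolution
A_px = (L_arcmin / N) ** 2  # pixel area in arcmin^2

sigma_noise = sigma_e / np.sqrt(n_g * A_px)   # per-pixel shape noise
```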

Kernel

The kernel of a Gaussian process is a function of the form k(x,y): it takes two points, x and y, and returns the value of their correlation. In our case, specifically, x and y are two points on a grid of shape (N,N), and the kernel function is the convergence angular autocorrelation function. As all code used for this paper is written in JAX Bradbury et al. (2018), we opt for the library tinygp Foreman-Mackey et al. (2024) for all Gaussian process computations. tinygp allows for the use of a custom kernel with a custom evaluate method, which takes two points on the grid and returns their correlation. We follow with Python pseudocode for our two kernel implementations.

class kernel_Hankel:
    def __init__(self, cl, N, L):
        # Hankel-transform the power spectrum into a 1D callable w(theta)
        self.w   = Hankel(cl, N, L)
        self.r   = L / N          # pixel resolution

    def evaluate(self, x, y):
        # Euclidean distance between the two grid points, in physical units
        theta    = self.r * sqrt(sum((x - y) ** 2))
        return self.w(theta)

Program 1: `kernel_Hankel` uses the helper function `Hankel`, which returns a 1D callable correlation function `w`. It then finds the Euclidean distance between `x` and `y` and evaluates the correlation function at that point. We will discuss how we perform the Hankel transformation in the following section.

class kernel_FFT:
    def __init__(self, cl2D, N, L):
        M        = sqrt(cl2D.size)           # side of the 2D spectrum grid
        # inverse 2D FFT of the power spectrum, scaled to physical units
        self.w2D = abs(ifft2(cl2D)) * M**2 / L**2
        self.r   = M / N                     # grid-mismatch renormalisation

    def evaluate(self, x, y):
        # component-wise separations give the indices into w2D
        d0       = self.r * abs(x[0] - y[0])
        d1       = self.r * abs(x[1] - y[1])
        return self.w2D[int(d0)][int(d1)]

Program 2: `kernel_FFT` performs a 2D Fourier transform of the power spectrum `cl2D`, which returns a 2D array that is the correlation function. To evaluate `w2D` we need two indices, given by the component-wise difference of `x` and `y`. Furthermore, we have to be careful about possible mismatches between the shape of `cl2D` and the grid of our map; for this we have the renormalisation factor `r`. We will discuss further how we obtain `cl2D` in the following section.
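The normalisation used by `kernel_FFT` can be sanity-checked on a case with a known answer: a flat 2D spectrum of amplitude C_0 must produce a delta-like correlation, with all the power at zero lag, w(0) = C_0 M^2/L^2. A sketch in plain numpy (grid values illustrative):

```python
import numpy as np

M = 64                    # side of the 2D power spectrum grid
L = np.deg2rad(10.0)      # patch size in radians
C0 = 1e-9                 # flat (white) spectrum amplitude, illustrative

cl2D = np.full((M, M), C0)
# Same normalisation as kernel_FFT: |ifft2| * M^2 / L^2
w2D = np.abs(np.fft.ifft2(cl2D)) * M**2 / L**2
# Expected: w2D[0, 0] = C0 * M^2 / L^2, everything else ~ 0
```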

Hankel transform

In order to build a Gaussian process kernel, we need to find the correlation function that best describes the data. We do this by computing the angular power spectrum and transforming it into the corresponding angular correlation function. Let’s explore the previously discussed flat-sky relation between angular power spectrum and correlation function given by Eq. (8). This particular integral of a Bessel function is also known as a zeroth-order Hankel transformation,

w(\theta) = \int \frac{dl}{2\pi}\, l\, C(l)\, J_0(l\theta).

We will explore two methods for computing this integral: the integration method Sec. ?? and the FFTlog method Sec. ??.

Integration

Integration is the most straightforward way to evaluate the integral, but it requires implementing an algorithm for the approximation of the Bessel function J_0. The advantages of this method:

The disadvantages:

FFTlog

The FFTlog method Simonović et al. (2018) is a fast implementation of the Hankel transformation. It simplifies Eq. (23) by assuming a power-law decomposition of C(l). Such a decomposition is achieved by taking the fast Fourier transform (FFT) in \log k, hence the name FFTlog. The power spectrum becomes

C(l)=\sum_\alpha c_\alpha l^{\upsilon + i\eta_\alpha}.

Substituting in Eq. (23),

w(\theta) = \sum_\alpha c_\alpha \int_0^\infty \frac{dl}{2\pi}\, l^{\upsilon + i\eta_\alpha+1} J_0(l\theta).

Taking x=l\theta and s_\alpha-1=\upsilon + i\eta_\alpha+1,

w(\theta) = \sum_\alpha c_\alpha \theta^{-s_\alpha} \int_0^\infty \frac{dx}{2\pi}\, x^{s_\alpha-1} J_0(x).

Lastly, we recognise \int_0^\infty dx\, x^{s-1}J_0(x) = \frac{2^{s-1}}{\pi}\sin(\pi s/2)\, [\Gamma(s/2)]^2, a Mellin transform. Using this tabulated result, we conclude that the correlation function is given by the sum

w(\theta) = \frac{1}{2\pi^2}\sum_\alpha c_\alpha \theta^{-s_\alpha}\, 2^{s_\alpha-1}\sin(\pi s_\alpha/2)\, [\Gamma(s_\alpha/2)]^2.
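The tabulated Mellin transform is often quoted in the equivalent form 2^{s-1}\Gamma(s/2)/\Gamma(1-s/2); the form used above follows from the reflection formula \Gamma(x)\Gamma(1-x)=\pi/\sin(\pi x), which is easy to check numerically at a sample value of s:

```python
from math import gamma, sin, pi

s = 0.5  # any real s in the convergence strip works for this identity

# Mellin transform of J_0, in its two equivalent forms
lhs = 2 ** (s - 1) * gamma(s / 2) / gamma(1 - s / 2)
rhs = 2 ** (s - 1) / pi * sin(pi * s / 2) * gamma(s / 2) ** 2
# Equal by the reflection formula Gamma(x) * Gamma(1-x) = pi / sin(pi x)
```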

For the Mellin transform identity to hold analytically, the integral bounds have to go from 0 to \infty. Although computationally we do not need to consider such a wide range, we still have to broaden the integration limits beyond our l-range, so as to avoid ringing effects. Experimentally, we have found that extending the l-range to between l_{\text{min}}/4 and l_{\text{max}} is enough to compensate for such effects. The advantages of this method:

The disadvantages:

Fourier transform

The second transformation that we can use to go from power spectrum to correlation function is the Fourier transformation discussed in Eq. (7). It is a more fundamental relation than the Hankel transformation, as it does not assume that C(\bm{l}) is radially symmetric. In our case, low resolution coupled with a square map means that the radial assumption might not apply. Note that the lowest-resolution grid we use is 32\times32, with L/N \sim 19 arcmin, one order of magnitude worse than today's weak lensing data, around \sim 3 arcmin according to Zhou et al. (2023). In the following Sec. ?? we will find experimentally that the Hankel methods described above do not work well with our GP setup. For this reason, we explore this method of conversion between angular power spectrum and correlation function,

w(\theta)=\int\frac{d^2l}{4\pi^2}\,e^{i\bm{l}\cdot\bm{\theta}}\,C(l).

As computers can only deal with discrete functions, it is important to note that we will be performing a discrete Fourier transformation (DFT). We will use the widely known FFT algorithm to compute DFTs. In particular we use JAX’s implementation of the 2D FFT algorithm, jax.numpy.fft.ifft2.

Let us find the relation between continuous Fourier transformation and DFT. The definitions of continuous and discrete Fourier transformations in 1D, are respectively:

\begin{align} \mathcal{F}^{-1}&=\int\frac{dk}{2\pi}\,e^{ikx}, \\ \text{F}^{-1}&=\frac{1}{N}\sum_p e^{ik_px}. \end{align}

First of all, a DFT is dimensionless. Secondly, it is discrete and bounded. We can therefore rewrite Eq. (27) using the substitution k(p)=2\pi \frac{p}{L} to discretise k-space,

\int\frac{dk}{2\pi}\,e^{ikx} = \int\frac{dp}{L}\, e^{ik(p)x} = \frac{1}{L}\sum_p e^{ik_px}.

Applying the definition of the DFT as seen in Eq. (28), it follows that in 1D

\mathcal{F}^{-1}=\frac{N}{L}\,\text{F}^{-1}.

This means that the correlation function is given by a backwards-normalised inverse DFT, scaled by a factor $(N/L)^d$, where $d$ is the dimension of the considered space. In our case, $d=2$.
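As a concrete check of this scaling, the sketch below compares a backwards-normalised inverse DFT, rescaled by $N/L$, against the analytic continuous inverse Fourier transform of a Gaussian. The grid values and the test function are illustrative choices, not taken from the text.

```python
import numpy as np

N, L = 256, 40.0          # grid points and box size (arbitrary units)
a = 1.0                   # width of the test Gaussian in k-space
x = np.arange(N) * L / N  # real-space grid

# sample the function on the DFT frequency grid k_p = 2*pi*p/L
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
f_k = np.exp(-0.5 * a * k**2)

# backwards-normalised inverse DFT, rescaled by (N/L)^d with d = 1
f_x = (N / L) * np.fft.ifft(f_k).real

# analytic continuous inverse Fourier transform of the Gaussian
f_exact = np.exp(-0.5 * x**2 / a) / np.sqrt(2 * np.pi * a)

# agreement near x = 0, far from the periodic wrap-around at x ~ L
print(np.max(np.abs(f_x[: N // 4] - f_exact[: N // 4])))
```

With a well-resolved, band-limited function the two agree to machine precision; the residual difference comes from the periodic images the DFT implicitly sums over.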

In order to perform an FFT in 2D, we need a two-dimensional extension of the angular power spectrum. We make use of its radial symmetry, $C(\bm{l})=C(|\bm{l}|)$ with $\bm{l}=(l_x,l_y)$, and create a 2D grid of shape $(M,M)$ as shown in Fig. 8.

Figure 8:1D to 2D extension of the power spectrum C(l)C(l)

Now that we have a 2D power spectrum we can take the inverse two dimensional fast Fourier transformation to obtain a 2D correlation function. In practice, we will test two grids, which we name full-range FFT method and half-range FFT method.
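A minimal sketch of this construction, assuming a callable `cl_interp` (a hypothetical interpolant of the 1D angular power spectrum) and the $(N/L)^2$ rescaling derived earlier:

```python
import jax.numpy as jnp

def correlation_from_cl(cl_interp, N, L):
    """Sketch: 2D correlation function from an isotropic C(l).

    cl_interp : callable returning C(l) for scalar/array l (assumed given)
    N, L      : grid points per side and map side length (radians)
    """
    # 2D grid of Fourier modes l = (lx, ly), shape (N, N)
    lx = 2 * jnp.pi * jnp.fft.fftfreq(N, d=L / N)
    lmod = jnp.sqrt(lx[:, None] ** 2 + lx[None, :] ** 2)

    # isotropic extension: C evaluated at the modulus |l|
    cl_2d = cl_interp(lmod)

    # inverse DFT with the (N/L)^2 continuum rescaling
    return (N / L) ** 2 * jnp.fft.ifft2(cl_2d).real
```

The grid above corresponds to the half-range case; the full-range variant would carry out the same construction on a $(2N,2N)$ grid.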

Full-range FFT

The full-range FFT is defined by a grid of shape $(2N,2N)$. Since the power spectrum is radially symmetric, the grid is centred on $\bm{l}=0$. This implies that if we want to keep the information of all $N$ modes,

\frac{2\pi}{L},\ \frac{4\pi}{L},\ \dots,\ \frac{2\pi}{L}N,

the grid has to be at least of shape $M=2N$. The advantages of this method:

The disadvantages:

Half-range FFT

The half-range FFT is defined by a grid of shape $(N,N)$. The advantages of this method:

The disadvantages:

Power spectrum recovery

Gaussian and lognormal fields

We begin by testing the consistency of our Gaussian and lognormal map generation pipeline.
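As a sketch of the Gaussian part of this pipeline, one standard way to draw a GRF with a given gridded 2D power spectrum `cl_2d` is to colour white noise in Fourier space; the $(N/L)$ factor carries the continuum normalisation discussed above. This is an illustrative implementation, not necessarily the exact one used in this work.

```python
import jax
import jax.numpy as jnp

def gaussian_map(key, cl_2d, N, L):
    """Draw a GRF realisation with 2D power spectrum cl_2d (sketch).

    key : jax PRNG key; N, L : grid points per side and map side length.
    """
    white = jax.random.normal(key, (N, N))   # unit-variance white noise
    amp = (N / L) * jnp.sqrt(cl_2d)          # continuum normalisation
    # colour the noise in Fourier space and transform back
    return jnp.fft.ifft2(jnp.fft.fft2(white) * amp).real
    # a lognormal map could then be obtained from a Gaussian field g via a
    # shifted exponential, e.g. kappa = lam * (exp(g - var_g / 2) - 1),
    # with lam an assumed shift parameter (convention varies in the literature)
```

The per-pixel variance of the result matches the continuum integral $\int d^2l\, C(l)/(2\pi)^2$ discretised on the grid.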


Figure 9:Comparison of a Gaussian map, on the left, with a lognormal map on the right. Both maps arise from the same random seed. The colorbar has been adjusted to enhance the differences between the two. The histogram plot shows the clear difference in the map distributions.

We show example realisations of the two fields in Fig. 9, Gaussian on the left and lognormal on the right. There is a visible difference between the two, as can be seen clearly from the distribution plot. The main check to perform is whether the generated fields recover the theoretical power spectrum. Fig. 10 shows that this is the case for the Gaussian fields: they recover the fiducial $C(l)$ within a few percent error, with larger deviations of $\sim 5\%$ at the low and high ends of the $l$-range. Lognormal fields, instead, present deviations $\gtrsim 10\%$. As the lognormal transformations we use have been reported by different sources Zhou et al. (2023), Boruah et al. (2022), the issue most likely lies with our JAX implementation of the Hankel transformation; resolving it is left to future iterations of this work. In this work we restrict ourselves to Gaussian fields, as they are enough to prove our thesis and show that Gaussian processes can be applied to cosmological fields.


Figure 10:Power spectrum estimation from Gaussian and lognormal maps. Mean and standard deviation are calculated with 500 realisations of both fields.

Gaussian process priors

First, we test the ability of the kernels we have built in Sec. ?? to recover the power spectrum of our cosmology.


Figure 11:Reconstructed power spectrum from prior samples of GPs with the five proposed kernels: integration, FFTlog, full-range FFT, half-range FFT and sinc FFTlog. Mean and standard deviation are calculated with 500 prior samples from each GP.

We test five models in Fig. 11: integration, FFTlog, full-range FFT, half-range FFT and sinc FFTlog. The first four methods are described in the Gaussian process kernels Sec. ??, whereas the sinc FFTlog refers to an FFTlog model to which we applied smoothing, multiplying the power spectrum by a factor of $\text{sinc}^4\!\left(l\frac{L}{2\pi N}\right)$. The recovered power spectra are plotted against the fiducial power spectrum, or the smoothed power spectrum for the sinc FFTlog. Mean and standard deviation associated with the plots are calculated from 500 samples. As expected, the integration, FFTlog and full-range FFT methods perform similarly, as they all contain the same amount of information. As these models deviate so strongly from the fiducial power spectrum, we tried applying smoothing, which helps to recover half of the $l$-range at large scales. The only method that consistently recovers the fiducial power spectrum is the half-range FFT. One could argue that, due to the inherent discreteness and boundedness of the fields we are working with, using FFTs is the most natural choice; moreover, the half-range FFT uses the only grid that recovers a correlation function of the same shape as the field without having to perform binning.
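The smoothing factor of the sinc FFTlog variant can be written as follows. This is a sketch assuming the mathematical convention $\mathrm{sinc}(x)=\sin(x)/x$; note that `np.sinc` uses the normalised convention $\sin(\pi x)/(\pi x)$, hence the division by $\pi$.

```python
import numpy as np

def sinc_smooth(cl, l, N, L):
    """Apply the sinc^4(l * L / (2 pi N)) smoothing to a power spectrum.

    cl : power spectrum value(s) C(l); l : multipole(s);
    N, L : grid points per side and map side length (radians).
    """
    x = l * L / (2 * np.pi * N)
    return cl * np.sinc(x / np.pi) ** 4   # np.sinc(x) = sin(pi x)/(pi x)
```

At $l=0$ the factor is 1, and it suppresses power towards the grid's Nyquist scale, mimicking a pixel window.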


Figure 12:Reconstructed power spectrum from prior samples of a GP, as a function of {σ8,S8}\{\sigma_8, S_8\}. Mean and standard deviation are calculated with 500 prior samples for each different cosmology.

We have also tested the efficacy of the half-range FFT model for different cosmologies, with $\{\sigma_8, S_8\}$ equal to $\{0.4,0.2\}$, $\{1.2,1.5\}$ and, for our fiducial cosmology, $\{0.8,0.8\}$. As Fig. 12 shows, the model is independent of the choice of cosmology. From here on, results will be presented assuming a kernel built with the half-range FFT model.

Gaussian process map reconstruction

Armed with a reliable kernel, let us embark upon the journey of reconstructing a heavily masked cosmological field. We proceed as follows: create a noiseless GRF in the fiducial cosmology of Tab. %s, the True map; apply a mask to obtain the Data map; and condition a Gaussian process that assumes the fiducial cosmology. Fig. 13 shows the result of this operation, with the resulting mean $\mu$ and standard deviation $\sigma$ of the conditioned GP. We also plot the squared ratio between the residuals $\Delta=\mu-\text{True}$ and the standard deviation, to test the goodness of fit of our model; the values of this map sum up to $\chi^2 \sim 2495$. With the mask covering $\nu=2353$ pixels, we obtain $\chi^2/\nu = 1.06$. Of course, this is just a noiseless application, which is unrealistic for real data.
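The conditioning step can be sketched with the standard GP posterior formulas. This is a minimal dense-matrix implementation, not the exact code used here; `obs_idx` denotes the unmasked pixel indices of the flattened map, and the noiseless case corresponds to `noise_var=0`.

```python
import jax.numpy as jnp

def condition_gp(K, y_obs, obs_idx, noise_var=0.0):
    """Condition a zero-mean GP on the unmasked pixels (sketch).

    K       : (P, P) prior covariance between all flattened map pixels
    y_obs   : observed (unmasked) pixel values
    obs_idx : integer indices of the unmasked pixels in the flattened map
    """
    K_oo = K[jnp.ix_(obs_idx, obs_idx)] + noise_var * jnp.eye(len(obs_idx))
    K_ao = K[:, obs_idx]
    mu = K_ao @ jnp.linalg.solve(K_oo, y_obs)            # posterior mean map
    cov = K - K_ao @ jnp.linalg.solve(K_oo, K_ao.T)      # posterior covariance
    sigma = jnp.sqrt(jnp.clip(jnp.diag(cov), 0.0))       # per-pixel uncertainty
    return mu, sigma
```

At observed pixels the posterior mean interpolates the data exactly (for zero noise) and the uncertainty vanishes, while masked pixels inherit a larger $\sigma$, matching the behaviour seen in Fig. 13.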


Figure 13:Summary of field reconstruction abilities of a Gaussian process conditioned on data. The left column shows the masked GRF, which is our data. The middle column shows the true GRF without masks and a posterior sample drawn from the conditioned GP. The right column shows maps of the mean, standard deviation and residuals over standard deviation squared of the conditioned GP. Regions of higher uncertainty correspond to the masked regions. The residuals over standard deviation map also shows how regions with low mask recover the data.

Inference of cosmological parameters

To test the ability of Gaussian processes to recover cosmological parameters without any prior knowledge except a noisy and masked map, we perform an MCMC simulation to infer the posterior distributions of $\sigma_8$ and $S_8$. We use the convention

S_8 = \sigma_8 \sqrt{\frac{\Omega_m}{0.3}},

to infer deterministically a posterior for $\Omega_m$. Such a reparametrisation is needed due to the strong degeneracy between $\sigma_8$ and $\Omega_m$: Eq. (30) breaks this degeneracy, changing the geometry of the sampling space and making the sampling more consistent. The model assumes uninformative flat priors for the cosmological parameters, as shown in Tab. %s; such prior bounds are also in accordance with the jaxcosmo release Campagne et al. (2023). The likelihood of the model is given by a Gaussian process distribution conditioned on the Data, with a standard deviation equal to the noise applied to the map. The analysis is coded with numpyro Phan et al. (2019) Bingham et al. (2019), using the No-U-Turn Sampler (NUTS) with max_tree_depth=16 and target_accept_prob=0.8. We simulate 8 chains for the $32\times32$ grid and 4 chains for the $64\times64$ grid. Each chain performs 1000 warmup steps and draws 3000 samples.

One parameter

As a first step and as a consistency check, we run the inference model for one cosmological parameter at a time, keeping all others fixed, using a $64\times64$ grid with $n_g=10 \text{ galaxies}/\text{arcmin}^2$. In Fig. 15 we show the inferred distribution for both $\sigma_8$ and $\Omega_m$. We find that we are able to recover the true value of both parameters within two sigma, $\sigma_8 = 0.776\pm0.015$ and $\Omega_m = 0.284\pm0.010$. We notice a slight tendency of the inferred distribution to be biased low; a tendency we also observe below for both sampled parameters, $S_8$ and $\sigma_8$.


Figure 15:Inferred posterior distribution of σ8\sigma_8 on the left and Ωm\Omega_m on the right. Dotted lines indicate the 1σ1\sigma level. Truth values corresponding to the fiducial cosmology are indicated in blue.

Two parameters

Effect of noise

Table 1:Cosmological parameters inferred by the model with a small $32\times32$ grid and for a fixed true GRF realisation, as the noise level increases; columns correspond to $n_g = 4$, 10, 30 and $100 \text{ galaxies}/\text{arcmin}^2$.

| | $n_g=4$ | $n_g=10$ | $n_g=30$ | $n_g=100$ |
| --- | --- | --- | --- | --- |
| $\sigma_\text{noise}$ | 0.0069 | 0.0044 | 0.0025 | 0.0014 |
| $S_8$ | $0.716\pm0.043$ | $0.733\pm0.040$ | $0.747\pm0.041$ | $0.752\pm0.042$ |
| $\sigma_8$ | $0.645\pm0.157$ | $0.628\pm0.158$ | $0.632\pm0.166$ | $0.631\pm0.170$ |
| $\Omega_m$ | $0.423\pm0.164$ | $0.469\pm0.175$ | $0.485\pm0.186$ | $0.497\pm0.193$ |

We perform some tests on low-resolution $32\times32$ grids to see the effect that the noise level has on the recovered parameters, see Tab. 1. Here we report the inferred cosmological parameters for one data realisation and different noise levels, corresponding respectively to $n_g = 4$, 10, 30 and $100 \text{ galaxies}/\text{arcmin}^2$, see Eq. (22). The inferred value of $S_8$ can vary by as much as a full $\sigma$ between the high- and low-noise runs. Keeping in mind that $\sigma_8$ and $\Omega_m$ are extremely unreliable here, with relative uncertainties of $\sim 25-30\%$ caused by the degeneracy, we notice a general trend: as the noise decreases, $\Omega_m$ gets bigger while $\sigma_8$ gets smaller.
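The noise levels in Tab. 1 are consistent with the standard shape-noise scaling $\sigma_\text{noise}=\sigma_\epsilon/\sqrt{n_g A_\text{pix}}$. The sketch below reproduces them assuming an intrinsic ellipticity dispersion $\sigma_\epsilon \approx 0.26$, a value we inferred from the quoted numbers rather than one stated in the text.

```python
import numpy as np

def pixel_noise_sigma(n_g, N, L_arcmin=600.0, sigma_e=0.26):
    """Per-pixel shape-noise level for an N x N grid over a 10 deg side.

    sigma_e ~ 0.26 is an assumed intrinsic ellipticity dispersion.
    """
    pix_area = (L_arcmin / N) ** 2          # pixel area in arcmin^2
    return sigma_e / np.sqrt(n_g * pix_area)

for n_g in (4, 10, 30, 100):
    print(n_g, round(pixel_noise_sigma(n_g, 32), 4))
```

The same formula with $N=64$ and $n_g=10$ also reproduces the $\sigma_\text{noise}\sim 0.0088$ quoted for the larger grid.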

Inferred cosmological parameters

Running the model on a larger $64\times64$ grid with $n_g=10 \text{ galaxies}/\text{arcmin}^2$ gives much better constraints on the cosmological parameters. We present the values recovered from the posterior distributions in Tab. 2.

Table 2:Mean and sigma values recovered from the inferred distributions of the cosmological parameters.

| $S_8$ | $\sigma_8$ | $\Omega_m$ |
| --- | --- | --- |
| $0.762\pm0.028$ | $0.745\pm0.151$ | $0.353\pm0.143$ |

Fig. 16 shows the inferred posterior distributions and contours for the three cosmological parameters $\sigma_8$, $\Omega_m$ and $S_8$. Looking at the contours, we obtain the well-known banana-shaped degeneracy between $\sigma_8$ and $\Omega_m$. The $S_8$-$\Omega_m$ contour presents sharp cuts at high and low $\Omega_m$, indicating an issue with the bounds of the imposed uniform priors. Unfortunately, the jaxcosmo package does not allow the priors to be wider than what is shown in Tab. %s, as the model then starts to produce divergent samples.


Figure 16:Inferred posterior distributions of S8S_8, σ8\sigma_8 and Ωm\Omega_m. For noise level σnoise0.0088\sigma_\text{noise}\sim 0.0088. Contours indicate the 1σ1\sigma and 2σ2\sigma credible interval respectively. Dotted lines indicate the 1σ1\sigma level. Truth values corresponding to the fiducial cosmology are indicated in blue.

Posterior checks

Following the two-parameter inference model, we perform some posterior checks at the map level Porqueres et al. (2021). Fig. 18 sums up the ability of the model to recover the true map, noiseless and unmasked. Here we present the run with noise level $\sigma_\text{noise}\sim 0.0088$ and a $64\times64$ grid. We show the mean and standard deviation for the sample with the highest likelihood out of the 12000. The mean field $\mu$ is visibly different from the true field in the masked regions and seems to be of overall lower amplitude. The sample map is comparable to the noisy data, which is to be expected, as the internal noise given to the Gaussian process is the same as the noise level of the data. The standard deviation map $\sigma$ presents an overall amplitude comparable to the noise level $\sim 0.010$, with higher values in the masked regions. Summing up the map values of the residuals divided by the standard deviation squared, we obtain $\chi^2\sim 1297.4$, to be compared with the number of free parameters $\nu$ in our inference model, which for a $10\%$ mask and a $64\times64$ grid is $\nu=3689$. The value of $\chi^2$ therefore seems low, indicating that the noise level assumed by the GP is overestimated. This is supported by the fact that the sample map looks just as noisy as the data: according to Fig. 17, its distribution is in fact just as wide as the noise.
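The map-level goodness-of-fit check described above amounts to the following simple computation, where `n_free` plays the role of $\nu$:

```python
import numpy as np

def reduced_chi2(true_map, mu, sigma, n_free):
    """Sum of squared residuals over sigma^2, and its reduced value.

    true_map, mu, sigma : flattened true map, posterior mean and
    per-pixel standard deviation; n_free : number of free parameters.
    """
    chi2 = np.sum(((true_map - mu) / sigma) ** 2)
    return chi2, chi2 / n_free
```

A reduced value well below 1, as found here, suggests the assumed noise variance is larger than the actual scatter of the residuals.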


Figure 17:Residual distributions of the mean and sample compared to noise. The mean is less spread, whereas the sample is wider.

Residuals


Figure 18:Summary of the two parameter inference at the map level. The left column shows the masked and noisy GRF realisation used, which is our data. The middle column shows the true GRF and a sample from the conditioned GP. The right column shows maps of the mean, standard deviation and residuals over standard deviation squared resulting from the numpyro model sample with highest likelihood. Regions of higher uncertainty correspond to the masked regions.

We have hereby introduced the tool of Gaussian processes to the landscape of map inference of cosmological fields, in particular weak lensing convergence. We considered how the 2-point statistics of cosmological fields change when dealing with bounded and discrete maps in Sec. ??. We discussed the realisation of Gaussian and lognormal fields in Sec. ??, and showed their ability to recover the 2-point statistics that they encode in Sec. ??. We added masking and noise to the data to simulate realistic maps in Sec. ??. We have shown that it is possible to apply physical knowledge of the 2-point correlation function of a cosmological field in order to set up a Gaussian process able to produce a Gaussian realisation of such a field. We considered different setups for the Gaussian process kernel in Sec. ?? and showed how they fare against one another in Sec. ??, ultimately finding empirically that the half-range FFT model performs best. In Sec. ?? we presented an application of Gaussian processes to a noiseless masked convergence map, in order to showcase their ability to reconstruct a heavily masked map. Finally, we presented our results for the inference of cosmological parameters with GPs, conditioning on noisy and masked data. When running the inference model on one cosmological parameter at a time, we recover both parameters within two sigma, $\sigma_8 = 0.776\pm0.015$ and $\Omega_m = 0.284\pm0.010$. For the two-parameter inference we observe the well-known banana-shaped degeneracy between $\sigma_8$ and $\Omega_m$, and recover $S_8 = 0.762\pm0.028$ within two sigma.

In future studies GPs could be tested on maps with larger grids. In order to achieve a resolution of $\sim 3$ arcmin with a map of size $(10^\circ,10^\circ)$, a $200\times200$ grid is needed, which is too large for an exact GP. The bottleneck is the inversion of the kernel matrix, see Eq. (19). Approximations of this operation could enable the use of GPs on larger grids, which could in turn lead to applying this method to current weak lensing catalogues and perhaps even full-sky catalogues. Another pathway to explore is lognormal fields, as they do a much better job at simulating data than GRFs. Since GPs are Gaussian, their associated likelihood is not suitable for lognormal fields; a modified likelihood could therefore unlock a correct application of GPs to lognormal fields.
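A back-of-envelope estimate makes the scaling problem concrete: storing the dense kernel matrix in float64 and Cholesky-factorising it (the usual way to perform the inversion) grow as $d^2$ and $\sim d^3/3$ in the number of pixels $d=N^2$.

```python
def gp_cost(N):
    """Memory (GB) and flops of exact GP conditioning on an N x N map."""
    d = N * N                    # number of pixels = kernel matrix size
    mem_gb = d * d * 8 / 1e9     # float64 dense kernel matrix
    flops = d**3 / 3             # ~d^3/3 for a Cholesky factorisation
    return mem_gb, flops

for N in (32, 64, 200):
    mem, fl = gp_cost(N)
    print(f"{N}x{N}: {mem:.2f} GB, {fl:.1e} flops")
```

A $200\times200$ map already needs about 13 GB for the kernel matrix alone, which is why approximate or structured solvers would be required at that resolution.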

Acknowledgments

This project wouldn’t have been possible without the author of the idea, Dr. Tilman Tröster. You guided me through my first real research experience and I am grateful. Thank you Veronika Oehl for always being there and for the very helpful discussions. I also express my gratitude to the Cosmology group at ETH, led by Prof. Alexandre Réfrégier. Hearing about my every progress every Monday morning for six months, couldn’t have been easy, thank you. It has been an incredible experience, long and grinding, which I embarked upon with my friends and colleagues Tommaso and Pietro. Thank you for making these past few months memorable. Thanks to Guido van Rossum for giving us Python. Thank you Mia, for your unconditional support, you keep me grounded. Non sarei qua senza di te mamma, grazie.

Thank you, reader.

References
  1. Kilbinger, M. (2015). Cosmology with cosmic shear observations: a review. Reports on Progress in Physics, 78(8), 086901. 10.1088/0034-4885/78/8/086901
  2. Bartelmann, M., & Maturi, M. (2017). Weak gravitational lensing. Scholarpedia, 12(1), 32440. 10.4249/scholarpedia.32440
  3. Blandford, R. D., Saust, A. B., Brainerd, T. G., & Villumsen, J. V. (1991). The distortion of distant galaxy images by large scale structure. AIP Conference Proceedings, 222(1), 455–458. 10.1063/1.40414
  4. Mootoovaloo, A., Heavens, A. F., Jaffe, A. H., & Leclercq, F. (2020). Parameter inference for weak lensing using Gaussian Processes and MOPED. Monthly Notices of the Royal Astronomical Society, 497(2), 2213–2226. 10.1093/mnras/staa2102
  5. Boruah, S. S., Eifler, T., Miranda, V., & Krishanth, P. M. S. (2022). Accelerating cosmological inference with Gaussian processes and neural networks – an application to LSST Y1 weak lensing and galaxy clustering. Monthly Notices of the Royal Astronomical Society, 518(4), 4818–4831. 10.1093/mnras/stac3417
  6. Karchev, K., Coogan, A., & Weniger, C. (2022). Strong-lensing source reconstruction with variationally optimized Gaussian processes. Monthly Notices of the Royal Astronomical Society, 512(1), 661–685. 10.1093/mnras/stac311
  7. Shafieloo, A., Kim, A. G., & Linder, E. V. (2012). Gaussian process cosmography. Phys. Rev. D, 85(12), 123530. 10.1103/PhysRevD.85.123530
  8. Seikel, M., Clarkson, C., & Smith, M. (2012). Reconstruction of dark energy and expansion dynamics using Gaussian processes. Journal of Cosmology and Astroparticle Physics, 2012(06), 036. 10.1088/1475-7516/2012/06/036
  9. Holsclaw, T., Alam, U., Sansó, B., Lee, H., Heitmann, K., Habib, S., & Higdon, D. (2010). Nonparametric reconstruction of the dark energy equation of state. Phys. Rev. D, 82(10), 103502. 10.1103/PhysRevD.82.103502
  10. Chon, G., Challinor, A., Prunet, S., Hivon, E., & Szapudi, I. (2004). Fast estimation of polarization power spectra using correlation functions. Monthly Notices of the Royal Astronomical Society, 350(3), 914–926. 10.1111/j.1365-2966.2004.07737.x
  11. Brown, M. L., Castro, P. G., & Taylor, A. N. (2005). Cosmic microwave background temperature and polarization pseudo-C\ell estimators and covariances. Monthly Notices of the Royal Astronomical Society, 360(4), 1262–1280. 10.1111/j.1365-2966.2005.09111.x
  12. Zhou, A. J., Li, X., Dodelson, S., & Mandelbaum, R. (2023). Accurate field-level weak lensing inference for precision cosmology.
  13. Dodelson, S., & Schmidt, F. (2021). 13 - Probes of structure: lensing. In S. Dodelson & F. Schmidt (Eds.), Modern Cosmology (Second Edition) (Second Edition, pp. 373–399). Academic Press. https://doi.org/10.1016/B978-0-12-815948-4.00019-X
  14. Nicola, A., García-García, C., Alonso, D., Dunkley, J., Ferreira, P. G., Slosar, A., & Spergel, D. N. (2021). Cosmic shear power spectra in practice. Journal of Cosmology and Astroparticle Physics, 2021(03), 067. 10.1088/1475-7516/2021/03/067
  15. Gao, Z., Raccanelli, A., & Vlah, Z. (2023). Asymptotic connection between full- and flat-sky angular correlators. Phys. Rev. D, 108(4), 043503. 10.1103/PhysRevD.108.043503