Abstract¶
In this work we propose the use of physically informed Gaussian processes (GPs) to analyse cosmological fields at the map level. We show that GPs can capture the statistical behaviour of cosmological fields, providing us with a likelihood as a function of the cosmological parameters conditioned on the map. In practice, we set the Gaussian process kernel to be the 2-point autocorrelation function associated with a 2D discrete flat-sky convergence map. We find that a GP in this setup is not only able to generate maps with the desired 2-point statistics, but also to reconstruct masked data with an associated uncertainty. Additionally, we perform a Bayesian inference analysis in order to test the ability of Gaussian processes to recover cosmological parameters. We find that we are able to consistently recover the $\sigma_8$-$\Omega_m$ degeneracy, recovering $S_8$ within two sigma uncertainty. The data is simulated by a Gaussian random field realisation of a convergence map of given size and grid, with a mask and with noise set by the galaxy density.
keywords: Gaussian processes, weak lensing.
Matter bends light. The theory of general relativity predicts that the presence of matter or energy changes the geometry of space and time, which in turn can cause what would otherwise be the straight path of a beam of light to curve. Take a distant source of light. If we assume the universe is not empty, then between us and said source there exists a non-trivial matter distribution. As light travels through everything that lies between us and the source, it gets blocked, bent and distorted. We call this phenomenon weak lensing. In practice, what we measure is the shape of distant galaxies. These images do not show a distinct lensing feature individually, as such tiny changes can only be seen with a large number of sources. For example, we observe that galaxies have a tendency to align along a preferred axis, causing a statistical discrepancy in an otherwise seemingly isotropic universe. For further details on lensing, please refer to Kilbinger (2015), Bartelmann & Maturi (2017); for weak lensing, Blandford et al. (1991). The image of a distant galaxy can change shape or size. Changes in shape are fully characterised by the shear distortion \Vec{\gamma} vector field, while the change in size is given by its magnitude, the convergence field $\kappa$. We lay out a theoretical framework for weak lensing in Sec. ?? as well as details on the cosmology used in this thesis in Sec. ??.
A lot of work goes into translating a measurement of distant galaxies into its mathematically friendly counterpart \Vec{\gamma}. This is one of the reasons why we will not be dealing with it in this thesis, as it is outside of our scope. Instead we simulate our own convergence fields. We use a Gaussian random field (GRF) algorithm for the creation of Gaussianly distributed data, as well as lognormal transformations to create fields with a distribution that more closely resembles that of our universe. These transformations are listed in Sec. ??; we then use them to simulate our data as explained in Sec. ??. We also verify that the generated fields recover the fiducial power spectrum in Sec. ??.
Figure 1:Simplified steps taken by the model to go from cosmological parameters to likelihood using a GP. Here $C_\ell$ and $w$ stand for the power spectrum and correlation function respectively, and $\mathcal{D}$ is the data.
A Gaussian process (GP) usually assumes little prior knowledge about the data it is applied to. Current research in cosmology views GPs as a machine learning tool to be trained. They are used to accelerate and optimise models Mootoovaloo et al. (2020), Boruah et al. (2022), Karchev et al. (2022), as well as for their interpolation qualities, applied to the reconstruction of functions determining the evolution of the universe Shafieloo et al. (2012), Seikel et al. (2012), Holsclaw et al. (2010). Our work, however, is based on a different approach. We apply our prior knowledge of 2-point statistics in cosmology to create a fully informed GP. Restricting ourselves to 2D flat-sky weak lensing convergence fields, as shown in Fig. 1, we can:
compute the angular power spectrum $C_\ell$ from a set of cosmological parameters,
transform it into the convergence angular autocorrelation function $w(\theta)$,
create a zero-mean GP with kernel given by said correlation function,
evaluate the likelihood of the cosmological parameters given a set of data points.
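The last two steps can be sketched at a toy level: a zero-mean multivariate Gaussian whose covariance is a correlation function evaluated at pixel separations. The correlation function below is an illustrative stand-in, not the cosmological one.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy stand-in for the convergence autocorrelation function w(theta);
# in the real pipeline it would come from the cosmology-dependent C_ell.
def corr(theta, amp=1.0, scale=1.5):
    return amp * np.exp(-0.5 * (theta / scale) ** 2)

# Pixel positions of a small (hypothetical) 4x4 map
N = 4
xx, yy = np.meshgrid(np.arange(N), np.arange(N))
pts = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)

# Kernel matrix: the correlation function evaluated at pairwise separations
dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
K = corr(dists) + 1e-8 * np.eye(N * N)  # jitter for numerical stability

# Log-likelihood of a map d under the zero-mean GP
rng = np.random.default_rng(0)
d = rng.multivariate_normal(np.zeros(N * N), K)
loglike = multivariate_normal(mean=np.zeros(N * N), cov=K).logpdf(d)
```

Varying the parameters that enter the correlation function then traces out a likelihood surface over those parameters.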
With a Bayesian approach we make use of this pipeline to infer the values of the cosmological parameters. Running a Markov chain Monte Carlo (MCMC) we can sample the posterior distribution of the cosmological parameters; in particular we obtain contours for $\Omega_m$, $\sigma_8$ and $S_8$. Beyond inference, GPs have several other interesting properties at the field level. They are not only able to generate fields that recover the fiducial 2-point statistics, but are also able to reconstruct masked fields, a task that usually brings many challenges to estimation Chon et al. (2004), Brown et al. (2005). In the field of weak lensing in particular, foreground objects like bright stars or galaxies can contaminate measurements, leading to the need to mask such regions, essentially removing the signal.
Here we list the advantages of our method:
minimal information loss: as it is a map-based method, we use all available data points,
it can be used as a likelihood for cosmological parameter inference,
easily deals with masked fields, providing estimates with an associated uncertainty for the masked points.
And some of the disadvantages:
the conditioning process depends on the inverse of a correlation matrix, with a computational cost that grows as $\mathcal{O}(n^3)$ for $n$ data points, indicating possible scaling issues;
due to the intrinsic Gaussianity of GPs, the distribution of the samples will be Gaussian, making it hard to apply them to other fields, for example lognormal fields.
GPs in this thesis are presented in a general introduction in Sec. ??, followed by a detailed account of the computational methods used to recover a working kernel in Sec. ??. In our results we show their ability to create maps that follow the desired statistics (Sec. ??) and to reconstruct data (Sec. ??). We also present our attempt at cosmological parameter inference with GPs in Sec. ??.
It is important to note that throughout the thesis we follow the extremely useful guidelines set by the Miko pipeline Zhou et al. (2023) on how to deal with discrete maps in weak lensing.
Weak lensing¶
Convergence field¶
Let’s introduce what can only be described as the protagonist cosmological field of this thesis: the convergence field Dodelson & Schmidt (2021), Blandford et al. (1991), Kilbinger (2015), Bartelmann & Maturi (2017). As explained in the introduction, weak lensing happens due to the presence of matter between us and a distant source. In mathematical terms, we write this as an integral over the line of sight of the matter overdensity field, weighted accordingly.
Here the weight function is a measure of the lensing weights; it incorporates all of the relevant cosmological parameters, as well as knowledge of the redshift distribution of the source galaxies $n(z)$.
2-point statistics¶
2-point statistics such as correlation functions and power spectra are arguably the most powerful tools of analysis in cosmology. They work as a way to summarise raw data into something simpler, while still retaining most of the relevant information, for example allowing us to extract constraints on cosmological parameters.
Full sky¶
Let us start by introducing the statistics of a 3D spherically symmetric field, also known as a full-sky field. The 2-point autocorrelation function is defined as the expectation value of the product of a random field with its complex conjugate
where the field in question is the 2D projection of a spherically symmetric 3D field. We can now perform a decomposition in spherical harmonics. Such a decomposition gives rise to the correlation function associated with the harmonic coefficients, the 2D power spectrum $C_\ell$,
In this setup correlation function and power spectrum are therefore related by,
where we have used the identity relating Legendre polynomials to spherical harmonics. Cleaning up the equations, we are left with the well-known full-sky equation
which relates the angular correlation function to the angular power spectrum.
Flat-sky¶
Most analyses of weak lensing data actually use the flat-sky approximation Nicola et al. (2021), Gao et al. (2023), Matthewson & Durrer (2021), García-García et al. (2019), Schneider, P. et al. (2002), and our work is no exception. Such an approximation essentially changes our setup to a flat-sky 2D field, instead of the 2D projection of a 3D one, allowing for simpler analytical expressions. In this setup, harmonic decomposition will not work; we have to Fourier transform our space instead. To do so, we can assume the existence of a function which is the continuous extension of $C_\ell$. Then we say that the correlation function is the result of a 2D inverse Fourier transformation,
To prove that this is none other than the angular correlation function, we make use of the radial symmetry of the cosmological field, which simplifies the equation into a 1D integral. Consider the substitution to polar coordinates; the integral becomes
Lastly, we make use of the identity $J_0(x) = \frac{1}{2\pi}\int_0^{2\pi} e^{ix\cos\varphi}\,d\varphi$, where $J_0$ is the zeroth order Bessel function. This leaves us with
In this form it is clear that Eq. (6) and Eq. (8) are asymptotically equivalent, since it is known that the Legendre polynomials approach $J_0$ at large $\ell$. This concludes our heuristic proof that the angular correlation function is recovered from an inverse Fourier transformation of the angular power spectrum in the flat-sky approximation. Physically speaking, this approximation makes sense when we consider a patch of sky of size $L$. The wavenumber $\ell$ of the flat-sky angular power spectrum is related to its dimensionless counterpart $p$ by
in other words, it is inversely proportional to the size of the map. Just as we would expect the geometry of a sphere to become flat when zooming in, the flat-sky approximation holds in the limit of small $L$.
Limber approximation¶
Computationally, we obtain the convergence field's 2-point statistics with the Limber approximation. We use the code jaxcosmo Campagne et al. (2023) to compute the 2D power spectrum from the 3D matter power spectrum with the efficient Limber approximation,
The assumptions of the Limber approximation are:
flat-sky, as described previously, and
the matter power spectrum depends only on modes transverse to the line of sight, essentially setting the modes parallel to the line of sight to zero.
Such assumptions allow us to simplify the relation between the two power spectra to a one-line integral, Eq. (10). It is worth noting that the Limber approximation introduces significant errors only at low multipoles, as explained in detail in Gao et al. (2023). As far as this work is concerned, we only deal with small patches, which according to Eq. (9) places our modes well within the range for a good approximation.
Field generation¶
Gaussian random field¶
In order to simulate cosmological fields with a specific power spectrum, we make use of Gaussian random fields. They are fast to generate and only need information about the 2-point power spectrum of the field. The algorithm we use for the generation of GRFs is common to most packages and is explained in detail in Bertschinger (2001), Lang & Potthoff (2011). In 1D, assume a pair of functions $\xi(x)$ and $P(k)$, related by a Fourier transformation
Let $W$ be a Gaussian white noise field and let $\mathcal{F}$ denote a Fourier transformation; we then define
to be a Gaussian random field. Such a procedure ensures that the covariance of the GRF recovers the correlation function $\xi$,
\begin{align*}
&\mathbb{E}\left[\phi(x)\phi(y)\right] \\
=&\iiiint dx' dy' \frac{dk}{2\pi} \frac{dl}{2\pi} \, e^{i(kx + ly)} P(k)^{1/2} P(l)^{1/2} e^{-i(kx' + ly')} \mathbb{E}[W(x')W(y')] \\
=& \iint \frac{dk}{2\pi} \frac{dl}{2\pi} \, e^{i(kx + ly)} P(k)^{1/2} P(l)^{1/2} \int dx' \, e^{-i(k+l)x'} \\
=& \int \frac{dk}{2\pi} \, e^{i k(x-y)} P(k) \\
=& \, \xi(x, y).
\end{align*}where we have used $\mathbb{E}[W(x')W(y')] = \delta(x' - y')$, as white noise is defined by a constant power spectrum.
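The construction above can be verified numerically in 1D; a sketch with a toy power spectrum (the box size and spectrum are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
N, L = 256, 1.0                       # grid points and (hypothetical) box size
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
P = 1.0 / (1.0 + k**2)                # toy power spectrum stand-in

# phi = F^{-1}[ P^{1/2} F[W] ] with W Gaussian white noise, many realisations
nreal = 2000
W = rng.normal(size=(nreal, N))
fields = np.fft.ifft(np.sqrt(P) * np.fft.fft(W, axis=1), axis=1).real

# The sample power spectrum of the realisations should recover P
P_est = (np.abs(np.fft.fft(fields, axis=1)) ** 2).mean(axis=0) / N
```

With a few thousand realisations the recovered spectrum agrees with the input at the percent level, as the derivation predicts.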
Rayleigh distribution¶
Since we are dealing with 2D maps, our algorithm will have to be implemented in two dimensions. To do so, we decide to implement the algorithm in Eq. (11) with the use of the Rayleigh distribution. Given two independent Gaussian random variables, the random variable given by
is said to be Rayleigh distributed. If we then multiply it by the complex exponential of a uniformly distributed random variable, we obtain a map of Gaussianly distributed complex numbers, which substitutes the white noise term in Eq. (11). We showcase this equivalence in Fig. 2. The sequence of transformations in 2D therefore becomes,
Figure 2:Sampling a 2D Gaussian against a Rayleigh distributed amplitude with a uniform complex phase. In purple is the complex number drawn directly from a 2D Gaussian; in cyan, the one built from a Rayleigh amplitude and a uniform phase.
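The equivalence can be checked numerically; a sketch with unit-variance Gaussians:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Direct draw: complex numbers with independent unit Gaussian parts
z_gauss = rng.normal(size=n) + 1j * rng.normal(size=n)

# Equivalent draw: Rayleigh amplitude times a uniform complex phase
amp = rng.rayleigh(scale=1.0, size=n)
phase = rng.uniform(0.0, 2.0 * np.pi, size=n)
z_ray = amp * np.exp(1j * phase)
```

Both constructions have the same second moments, which is what the GRF algorithm relies on.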
Lognormal field¶
The universe today is not Gaussian. GRFs are able to capture the 2-point statistics of cosmological fields, but they cannot capture their skewed distribution. A possible way to get closer to the true distribution of the convergence field is with a lognormal transformation Boruah et al. (2022), Leclercq & Heavens (2021), Hilbert, S. et al. (2011), Zhou et al. (2023). In this work we adopt the following convention for the lognormal transformation of both the field and the correlation function,
where the superscripts L and G respectively stand for lognormal and Gaussian, Var($\cdot$) stands for the variance of the field and $\lambda$ is the so-called shift parameter, which is an indicator of the non-Gaussianity of the resulting field.
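As an illustration, a sketch of a pointwise lognormal transformation. The convention below is one common choice and an assumption here; the text fixes its own convention in the equations of this section.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 1.0                                  # shift parameter (assumed value)
g = rng.normal(0.0, 0.5, size=100_000)     # zero-mean Gaussian field values

# One common convention (an assumption here):
# kappa_L = lam * (exp(g - Var(g)/2) - 1), which keeps the mean near zero
kappa_L = lam * (np.exp(g - np.var(g) / 2.0) - 1.0)
```

The resulting field is bounded below by the (negative) shift parameter and is positively skewed, in contrast to the symmetric Gaussian input.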
Gaussian process¶
One way to think of Gaussian processes is as an extension of random vectors to infinite dimensions. Following this train of thought, let’s begin with the concept of a random variable following a normal distribution. We write,
to mean that the variable is a sample of a Gaussian of mean $\mu$ and variance $\sigma^2$. If we were to draw enough samples, we would eventually recover its distribution. The generalisation of this concept to $n$ dimensions is a collection of random variables, described by a so-called multivariate normal distribution,
where $\Vec{x}$ is a vector of random variables, $\Vec{\mu}$ is the mean vector and $\Sigma$ the covariance matrix. For a zero-mean field, the covariance matrix is formed by the variance of each of the random variables on its diagonal, while the cross-correlation terms populate the rest of the matrix.
Definition¶
Adopting the philosophy of Rasmussen & Williams (2005), functions can be thought of as very long vectors. Such a view allows us to extend the definition of multivariate Gaussians to functions, defining a Gaussian process. A sample function will be given by:
with the mean function and kernel defined as,
Mathematically, a GP is defined for a continuous function. Computationally this is not possible, and we must treat space as a discrete grid.
Prior and posterior samples¶
Given a mean function and a kernel which define a GP, a random sample from said GP is a function defined on a domain $X$, which is our grid.
Here we adopt the convention $\Vec{f} \equiv f(X)$. When drawing a sample function from Eq. (17), the operation is computationally equivalent to drawing a vector from a multivariate Gaussian. What we obtain is a so-called prior sample, see Fig. 3.
Figure 3:Prior samples of a GP with zero mean and a squared exponential kernel.
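Prior samples like those in Fig. 3 can be drawn as follows (the lengthscale and grid are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 100)

# Squared exponential kernel; the lengthscale is assumed for illustration
def sq_exp(xa, xb, ell=1.0):
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell**2)

K = sq_exp(x, x) + 1e-6 * np.eye(x.size)    # jitter for numerical stability
priors = rng.multivariate_normal(np.zeros(x.size), K, size=5)
```

Each row of `priors` is one smooth random function whose smoothness is set by the kernel lengthscale.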
Let’s now see how we can introduce knowledge of data points into this system. We divide the grid into training points and test points. To each training point is associated a known value and variance, whereas the values of the function at the test points are unknown. We can summarise this as,
where we have once again adopted the shorthand notation for the function values on the respective grids. At this point, one way to find samples that follow the data would be to blindly draw priors until we get something that goes through all data points. This would be inefficient and computationally wasteful. Instead, we make a better guess for the test function values. This operation is called conditioning, because we condition the joint Gaussians on the training points; this gives
Conditioning can therefore give rise to what is called a posterior sample, Fig. 4. The result is still a multivariate Gaussian, but the mean and variance given by Eq. (19) generate samples that are a better guess of the behaviour of the function outside of the training points.
Figure 4:Summary plot of a GP conditioned on some data. The cyan line is the mean of the GP and the filled region corresponds to its standard deviation band. The purple lines are posterior samples, which are distributed Gaussianly around the mean. The data points are clearly marked in black; they are also the points to which all samples converge.
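The conditioning step can be sketched with the standard posterior mean and covariance formulas (the training values, noise level and kernel below are illustrative):

```python
import numpy as np

def sq_exp(xa, xb, ell=1.0):
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell**2)

# Training points with known values and noise variance, plus test points
x_tr = np.array([1.0, 3.0, 6.0, 8.0])
y_tr = np.sin(x_tr)
sigma2 = 0.01
x_te = np.linspace(0.0, 10.0, 201)

K_tr = sq_exp(x_tr, x_tr) + sigma2 * np.eye(x_tr.size)
K_cross = sq_exp(x_tr, x_te)
K_te = sq_exp(x_te, x_te)

# Standard conditioning formulas for the posterior mean and covariance
mean_post = K_cross.T @ np.linalg.solve(K_tr, y_tr)
cov_post = K_te - K_cross.T @ np.linalg.solve(K_tr, K_cross)
std_post = np.sqrt(np.clip(np.diag(cov_post), 0.0, None))
```

Near the training points the posterior mean tracks the data and the uncertainty shrinks towards the noise level; far away it reverts to the prior.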
Data simulation¶
Cosmology¶
This work follows in the general footsteps of the data analysis of weak lensing surveys like HSC Hikage et al. (2019) and KiDS Köhlinger et al. (2017), Asgari, Marika et al. (2021), as we aim to replicate their methodologies. Throughout the work, we assume a fiducial cosmology of fixed parameter values, as shown in Tab. ??. In particular, all data maps will be generated from a power spectrum following this cosmology. We will refer to this power spectrum as the fiducial power spectrum. A leading modelling choice comes with the redshift distribution $n(z)$. We model it as a Smail-type distribution Smail et al. (1995), Kirk et al. (2012),
The choice of parameters has been made to emulate bin 5 of the KiDS1000 survey Hildebrandt, H. et al. (2021).
Figure 5:Smail-type redshift distribution for the chosen parameters.
Map making pipeline¶
When dealing with maps of finite size and finite pixel resolution, Fourier space is going to have boundaries, just as real space does. These limits are given by
Therefore, a map of given size and grid will have corresponding limits. Now that we have the physical range for the power spectrum, we can generate data.
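For concreteness, a sketch of these Fourier-space limits for a hypothetical patch, assuming the usual fundamental-mode and Nyquist conventions (the size and grid values are illustrative, not those of this work):

```python
import numpy as np

# Hypothetical map: angular side of 5 degrees with N = 128 pixels
L_rad = np.deg2rad(5.0)
N = 128

# Usual convention (an assumption here): the fundamental and Nyquist
# frequencies of the box bound the accessible multipole range
ell_min = 2.0 * np.pi / L_rad
ell_max = np.pi * N / L_rad
```

The lower limit depends only on the patch size, the upper one also on the number of pixels.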
Figure 6:Sequence of transformations used to generate Gaussian and lognormal maps starting from the fiducial angular power spectrum. The top branch shows how to generate a GRF which when transformed to a lognormal field, follows the fiducial . The bottom branch is a standard GRF realisation starting from the fiducial . and stand for lognormal and Gaussian respectively. is the Hankel transformation, and tilde refers to intermediate results.
To generate a GRF, we employ the algorithm mentioned in Eq. (13). Making a lognormal field, however, is a bit more complicated. Any GRF to which we apply the lognormal transformation from Eq. ?? becomes a lognormal field. The reason we do not simply lognormal transform any field is that the result would not recover the fiducial power spectrum. Our goal is therefore to find a transfer GRF which, when transformed lognormally, gives rise to a lognormal field that recovers the fiducial power spectrum. To do so, we follow the sequence of transformations illustrated in the top branch of Fig. 6.
Noise and mask¶
Figure 7:Mask applied to the data, shown for our map size and grid.
As all data is being simulated, we only take into account the so-called shape noise, which is due to the intrinsic distribution of galaxy ellipticities and of the angles they form with respect to us. GPs will treat each pixel of the map as a Gaussianly distributed random variable with standard deviation given by Croft et al. (2017),
We use standard values for the intrinsic ellipticity dispersion and galaxy density, and a pixel area given by the pixel resolution squared. For our highest-resolution run, this results in a small noise standard deviation. The mask being used, as seen in Fig. 7, covers a sizeable fraction of the patch Grewal et al. (2024). We keep the noise and mask random seeds fixed throughout the work.
Kernel¶
The kernel of a Gaussian process is given by a function of the form $k(x, y)$. It takes in two points, $x$ and $y$, and returns the value of their correlation. In our case specifically, $x$ and $y$ will be two points on a grid, and the kernel function will be the convergence angular autocorrelation function. As all code used for this paper is written in JAX Bradbury et al. (2018), we opt for the library tinygp Foreman-Mackey et al. (2024) for all Gaussian process computations. tinygp allows for the use of a custom kernel with a custom evaluate method, which takes two points on the grid and returns the correlation value. We follow with two Python pseudocodes of our kernel implementations.
class kernel_Hankel:
    def __init__(self, cl, N, L):
        self.w = Hankel(cl, N, L)  # 1D callable correlation function
        self.r = L / N             # pixel resolution
    def evaluate(self, x, y):
        # physical separation between the two grid points
        theta = self.r * sqrt(sum((x - y)**2))
        return self.w(theta)
Program 1:\code{kernel_Hankel} uses the helper function \code{Hankel}, which returns a 1D callable correlation function \code{w}. It then finds the Euclidean distance between $x$ and $y$ and evaluates the correlation function at that point. We will discuss how we perform the Hankel transformation in the following section.
class kernel_FFT:
    def __init__(self, cl2D, N, L):
        M = sqrt(cl2D.size)
        # 2D correlation function from an inverse FFT of the power spectrum
        self.w2D = abs(ifft2(cl2D)) * M**2 / L**2
        self.r = M / N             # grid-mismatch renormalisation factor
    def evaluate(self, x, y):
        d0 = self.r * abs(x[0] - y[0])
        d1 = self.r * abs(x[1] - y[1])
        return self.w2D[int(d0)][int(d1)]
Program 2:\code{kernel_FFT} performs a 2D inverse Fourier transform on the power spectrum \code{cl2D}, which returns a 2D array that is the correlation function. To evaluate \code{w2D} we need two indices, given by the component-wise difference of $x$ and $y$. Furthermore, we have to be careful about possible mismatches between the shape of \code{cl2D} and the grid of our map; for this we have the renormalisation factor \code{r}. We will discuss further how we get \code{cl2D} in the following section.
Hankel transform¶
In order to build a Gaussian process kernel, we need to find the correlation function that best describes the data. We do this by computing the angular power spectrum and transforming it into the corresponding angular correlation function. Let’s explore the previously discussed flat-sky relation between angular power spectrum and correlation function given by Eq. (8). This particular integral of a Bessel function is also known as a zeroth order Hankel transformation,
We will explore two methods for computing this integral: the integration method Sec. ?? and the FFTlog method Sec. ??.
Integration¶
Integration is the most straightforward way to evaluate the integral, but it requires implementing an algorithm for the approximation of the Bessel function $J_0$. The advantages of this method:
it is easy to integrate over the correct $\ell$-range, a freedom that we do not have with the FFTlog.
The disadvantages:
integration is computationally slow, especially when dealing with the highly oscillatory behaviour introduced by the Bessel function, which requires fine sampling.
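As a sanity check of the integration route, here is a minimal sketch with a toy Gaussian spectrum whose zeroth order Hankel transform is known in closed form (the spectrum, sampling and angle are illustrative assumptions):

```python
import numpy as np
from scipy.special import j0

# Toy Gaussian "power spectrum" whose Hankel transform is known analytically
sigma = 0.01
ell = np.linspace(0.0, 2000.0, 200_001)  # fine sampling for the oscillations
dl = ell[1] - ell[0]
cl = np.exp(-0.5 * (ell * sigma) ** 2)

def hankel0(theta):
    # w(theta) = (1 / 2 pi) * integral dl l C(l) J0(l theta)
    return np.sum(ell * cl * j0(ell * theta)) * dl / (2.0 * np.pi)

theta = 0.02
w_num = hankel0(theta)
w_ana = np.exp(-0.5 * (theta / sigma) ** 2) / (2.0 * np.pi * sigma**2)
```

The fine $\ell$ sampling needed here is exactly the computational cost mentioned above.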
FFTlog¶
The FFTlog method Simonović et al. (2018) is a fast implementation of the Hankel transformation. It simplifies Eq. (23) by assuming a power-law decomposition of the power spectrum. Such a decomposition is achievable by taking the fast Fourier transformation (FFT) in $\log \ell$, hence the name FFTlog. The power spectrum becomes,
Substituting in Eq. (23),
With the appropriate change of variables,
Lastly, we recognise a Mellin transform. Using this tabulated result, we conclude that the correlation function is given by the sum,
For the Mellin transform identity to hold analytically, the integral bounds have to go from 0 to $\infty$. Although computationally we do not need to consider such a wide range, we still have to broaden the integration limits beyond our $\ell$-range, so as to avoid ringing effects. Experimentally, we have found that a suitably extended $\ell$-range is enough to compensate for such effects. The advantages of this method:
the FFTlog method is ultimately much faster than the integration method.
The disadvantages:
the widening of the $\ell$-range needed to avoid ringing effects inevitably adds power to the correlation function, resulting in a higher variance and overall amplitude.
Fourier transform¶
The second transformation that we can use to go from power spectrum to correlation function is the Fourier transformation discussed in Eq. (7). It is a more fundamental relation than the Hankel transformation, as it does not assume that the power spectrum is radially symmetric. In our case, low resolution coupled with a square map means that the radial assumption might not apply. Note that the lowest-resolution grid we use is one order of magnitude coarser than today's weak lensing data, according to Zhou et al. (2023). In the following Sec. ?? we will find experimentally that the Hankel methods described above do not work well with our GP setup. For this reason, we explore this method of conversion between angular power spectrum and correlation function,
As computers can only deal with discrete functions, it is important to note that we will be performing a discrete Fourier transformation (DFT). We use the widely known FFT algorithm to compute DFTs, in particular JAX's implementation of the 2D inverse FFT, jax.numpy.fft.ifft2.
Let us find the relation between the continuous Fourier transformation and the DFT. The definitions of continuous and discrete Fourier transformations in 1D are, respectively:
First of all, a DFT is dimensionless. Secondly, it is discrete and bounded. We can therefore rewrite Eq. (27) by discretising $k$-space,
Applying the definition of the DFT as seen in Eq. ??, it follows that in 1D
This means that the correlation function is given by a backwards-normalised inverse DFT scaled by a factor $(N/L)^d$, where $d$ is the dimension of the considered space. In our case, $d = 2$.
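The scaling can be checked in 1D; this sketch compares the scaled inverse DFT against a direct Riemann sum of the continuous transform (the toy spectrum and box size are assumptions):

```python
import numpy as np

N, L = 512, 10.0
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
P = np.exp(-0.5 * k**2)              # toy power spectrum

# Correlation function from a backwards-normalised inverse DFT,
# scaled by (N / L)^d with d = 1 in this sketch
xi_fft = np.fft.ifft(P).real * N / L

# Direct Riemann sum of the continuous inverse Fourier transform,
# xi(x) = int dk/(2 pi) e^{ikx} P(k), with dk = 2 pi / L
x = np.arange(N) * L / N
xi_direct = np.array([np.sum(P * np.exp(1j * k * xj)).real for xj in x]) / L
```

The two agree to machine precision, which is the content of the scaling relation above.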
In order to perform an FFT in 2D, we will need a two-dimensional extension of the angular power spectrum. We make use of its radial symmetry and create a 2D grid, as shown in Fig. 8.
Figure 8:1D to 2D extension of the power spectrum
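The radial extension can be sketched as follows (the patch size, grid and toy spectrum are assumptions):

```python
import numpy as np

N = 64
L = np.deg2rad(5.0)                     # hypothetical patch size
ell_1d = np.linspace(1.0, 5000.0, 500)
cl_1d = 1.0 / ell_1d**2                 # toy 1D power spectrum

# 2D Fourier-space grid and its radial multipole magnitudes
lx = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
LX, LY = np.meshgrid(lx, lx)
ell_mag = np.sqrt(LX**2 + LY**2)

# Radially symmetric 2D extension: interpolate the 1D spectrum onto |ell|
cl_2d = np.interp(ell_mag, ell_1d, cl_1d)
```

The resulting array depends only on the magnitude of the 2D mode, so it is symmetric under swapping the two axes.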
Now that we have a 2D power spectrum, we can take the inverse two-dimensional fast Fourier transformation to obtain a 2D correlation function. In practice, we will test two grids, which we name the full-range FFT method and the half-range FFT method.
Full-range FFT¶
The full-range FFT is defined by a grid covering the full range of modes. Since the grid is radial, it will be centered. This implies that if we want to keep information on all modes, the grid will have to be larger than the field's own shape. The advantages of this method:
it keeps information on the full range of modes.
The disadvantages:
it introduces rounding errors, as not all of the possible distance combinations of the field's grid are covered by the correlation function's grid.
Half-range FFT¶
The half-range FFT is defined by a grid of the same shape as the field. The advantages of this method:
it has no shape mismatch between field and correlation function.
The disadvantages:
it loses half of the $\ell$-range, missing information on small scales.
Power spectrum recovery¶
Gaussian and lognormal fields¶
We begin by testing the consistency of our Gaussian and lognormal map-generation pipeline.
Figure 9:Comparison of a Gaussian map, on the left, with a lognormal map on the right. Both maps arise from the same random seed. The colorbar has been adjusted to enhance the differences between the two. The histogram plot shows the clear difference in the map distributions.
We show example realisations of the two fields in Fig. 9, Gaussian on the left and lognormal on the right. There is a visible difference between the two, as can be seen clearly from the distribution plot. The main check to perform is whether the generated fields recover the theoretical power spectrum. Fig. 10 shows that this is the case for the Gaussian fields. They recover the fiducial power spectrum within a few percent error, with larger deviations at the low and high ends of the $\ell$-range. The lognormal fields, instead, present larger deviations. As the lognormal transformations we use have been reported by different sources Zhou et al. (2023), Boruah et al. (2022), the issue must lie with our JAX implementation of the Hankel transformation. We leave resolving this issue to future iterations of this work. Here, we restrict ourselves to the use of Gaussian fields, as they are enough to prove our thesis and show that Gaussian processes can be applied to cosmological fields.
Figure 10:Power spectrum estimation from Gaussian and lognormal maps. Mean and standard deviation are calculated with 500 realisations of both fields.
Gaussian process priors¶
First, we test the ability of the kernels we have built in Sec. ?? to recover the power spectrum of our cosmology.
Figure 11:Reconstructed power spectrum from prior samples of a GP with the five proposed kernels: integration, FFTlog, full-range FFT, half-range FFT and sinc FFTlog. Mean and standard deviation are calculated with 500 prior samples from each GP.
We test five models in Fig. 11: integration, FFTlog, full-range FFT, half-range FFT and sinc FFTlog. The first four methods are described in the Gaussian process kernels Sec. ??, whereas the sinc FFTlog refers to an FFTlog model on which we applied smoothing, by multiplying the power spectrum by a sinc factor. The recovered power spectra are plotted against the fiducial power spectrum, or the smoothed power spectrum for the sinc FFTlog. The mean and standard deviation associated with the plots are calculated from 500 samples. As expected, the integration, FFTlog and full-range FFT methods perform similarly, as they all contain the same amount of information. As these models deviate so strongly from the fiducial power spectrum, we tried applying smoothing, which helps to recover half of the $\ell$-range at large scales. The only method that consistently recovers the fiducial power spectrum is the half-range FFT. One could argue that due to the inherent discreteness and boundedness of the fields we are working with, using FFTs is the most natural choice; moreover, the half-range FFT uses the only grid that recovers a correlation function of the same shape as the field without having to perform binning.
Figure 12:Reconstructed power spectrum from prior samples of a GP, for different cosmologies. Mean and standard deviation are calculated with 500 prior samples for each cosmology.
We have also tested the efficacy of the half-range FFT model for different cosmologies, including our fiducial one. As Fig. 12 shows, the model is independent of the choice of cosmology. From here on, the results are presented assuming a kernel built with the half-range FFT model.
Gaussian process map reconstruction¶
Armed with a reliable kernel, let’s embark upon the journey of reconstructing a heavily masked cosmological field. What we do is: create a noiseless GRF in the fiducial cosmology of Tab. ??, the True map; apply a mask to obtain the Data map; and condition a Gaussian process which assumes the fiducial cosmology. Fig. 13 shows the result of this operation, with the resulting mean and standard deviation of the conditioned GP. We also plot the ratio between the residuals with respect to the True map and the standard deviation squared, to test the goodness of fit of our model; the values of this map, summed and compared with the number of masked pixels, provide a goodness-of-fit measure. Of course, this is just a noiseless application, which is unrealistic for real data.
Figure 13:Summary of the field reconstruction abilities of a Gaussian process conditioned on data. The left column shows the masked GRF, which is our data. The middle column shows the true GRF without masks and a posterior sample drawn from the conditioned GP. The right column shows maps of the mean, standard deviation and residuals over standard deviation squared of the conditioned GP. Regions of higher uncertainty correspond to the masked regions. The residuals map also shows how lightly masked regions recover the data.
Inference of cosmological parameters¶
To test the ability of Gaussian processes to recover cosmological parameters without any prior knowledge except a noisy and masked map, we perform a MCMC simulation to infer the posterior distributions of and . We use the convention
to infer deterministically a posterior for . Such a reparametrisation is needed due to the strong degeneracy between and . Eq. (30) breaks this degeneracy, changing the geometry of the sampling space and making the sampling more efficient. The model assumes uninformative flat priors for the cosmological parameters, as shown in Tab. %s; these prior bounds are also in accordance with the jaxcosmo release Campagne et al. (2023). The likelihood of the model is given by a Gaussian process distribution conditioned on the Data, with a standard deviation equal to the noise applied to the map. The analysis is coded with numpyro Phan et al. (2019); Bingham et al. (2019), using the No-U-Turn Sampler (NUTS) with max_tree_depth=16 and target_accept_prob=0.8. We simulate 8 chains for the grid and 4 chains for the . Each chain performs 1000 warmup steps and draws 3000 samples.
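The core of this likelihood is the log-density of a zero-mean multivariate normal with covariance K(θ) + σₙ²I evaluated on the observed pixels. A minimal numpy sketch with an illustrative 1D exponential kernel follows; the numpyro priors and NUTS wiring of the text are not reproduced here:

```python
import numpy as np

def gp_log_likelihood(data, K, sigma_n):
    """log N(data | 0, K + sigma_n^2 I), via a Cholesky factorisation."""
    n = data.size
    L = np.linalg.cholesky(K + sigma_n**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, data))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (data @ alpha + log_det + n * np.log(2.0 * np.pi))

# toy check: the likelihood of a draw peaks near the true kernel amplitude
rng = np.random.default_rng(2)
n = 50
d = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
K_true = np.exp(-d / 5.0)
data = np.linalg.cholesky(K_true + 1e-10 * np.eye(n)) @ rng.standard_normal(n)
lls = [gp_log_likelihood(data, a * K_true, 0.1) for a in (0.1, 1.0, 10.0)]
```

In the actual analysis the kernel depends on the cosmological parameters through the 2-point function, and this log-density is what the sampler explores.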
One parameter¶
As a first step and as a consistency check, we run the inference model for one cosmological parameter, keeping all others fixed, using a grid with . In Fig. 15 we show the inferred distribution for both and . We find that we are able to recover the true value for both parameters within two sigmas, and . We notice a slight tendency of the inferred distribution to be biased low, a tendency we also observe next when both parameters, and , are sampled.
Figure 15: Inferred posterior distribution of on the left and on the right. Dotted lines indicate the level. True values corresponding to the fiducial cosmology are indicated in blue.
Two parameters¶
Effect of noise¶
Table 1: Cosmological parameters inferred by the model with a small grid and a fixed true GRF realisation. We present the inferred parameters as the noise level increases, corresponding to , 10, 30 and .
| 0.0069 | 0.0044 | 0.0025 | 0.0014 | |
We perform some tests on low-resolution grids to see the effect that the noise level has on the recovered parameters, see Tab. 1. Here we report the inferred cosmological parameters for one data realisation and different noise levels, corresponding respectively to , 10, 30 and , see Eq. (22). The inferred value of can vary by as much as a full between the high- and low-noise runs. Keep in mind that and are extremely unreliable due to relative uncertainties of caused by the degeneracy; as a general trend, we notice that gets bigger as gets smaller with less noise.
Inferred cosmological parameters¶
Running the model for a larger grid with gives much better constraints on the cosmological parameters. We present the values recovered from the posterior distributions in Tab. 2.
Table 2: Mean and sigma values recovered from the inferred distributions of the cosmological parameters.
Fig. 16 shows the inferred posterior distributions and contours for the three cosmological parameters , and . Looking at the contours, we obtain the well-known banana-shaped degeneracy between and . The and contour presents sharp cuts at high and low , indicating an issue with the bounds of the uniform priors imposed. Unfortunately, the jaxcosmo package does not allow the priors to be wider than what is shown in Tab. %s, as the model then starts to produce divergent samples.
Figure 16: Inferred posterior distributions of , and , for noise level . Contours indicate the and credible intervals respectively. Dotted lines indicate the level. True values corresponding to the fiducial cosmology are indicated in blue.
Posterior checks¶
Following the two parameter inference model, we perform some posterior checks at the map level Porqueres et al. (2021). Fig. 18 summarises the ability of the model to recover the true map, noiseless and unmasked. Here we present the run with noise level and a grid. We show the mean and standard deviation for the sample with the highest likelihood out of the 12000. The mean field is visibly different from the true field in the masked regions and appears to have an overall lower amplitude. The sample map is comparable to the noisy data, which is to be expected, as the internal noise given to the Gaussian process is the same as the noise level of the data. The standard deviation map presents an overall amplitude comparable to the noise level , with higher values in the masked regions. Summing up the map values of the residuals divided by the standard deviation squared, we obtain . Compared to the number of free parameters in our inference model, which for a mask and a grid is , this value of seems low, indicating that the noise level assumed by the GP is overestimated. This is supported by the fact that the sample map looks just as noisy as the data: according to Fig. 17, its distribution is in fact just as wide as that of the noise.
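The χ² check above can be sketched as follows, with illustrative shapes and values in place of the actual maps. A well-calibrated model yields χ² close to the number of degrees of freedom (reduced χ² near one), while an overestimated noise level drives the reduced χ² below one, as observed in the text:

```python
import numpy as np

def reduced_chi2(true_map, mean_map, std_map):
    """chi^2 = sum((true - mean)^2 / std^2), and chi^2 per degree of freedom."""
    chi2 = np.sum((true_map - mean_map) ** 2 / std_map**2)
    return chi2, chi2 / true_map.size

rng = np.random.default_rng(3)
truth = rng.standard_normal(32 * 32)
sigma = 0.5
recon = truth + sigma * rng.standard_normal(truth.size)   # well-calibrated "mean"
chi2, red = reduced_chi2(truth, recon, np.full(truth.size, sigma))
```

Replacing `sigma` in `std_map` with a larger value than the true residual scatter reproduces the low reduced χ² discussed above.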
Figure 17: Residual distributions of the mean and of a sample, compared to the noise. The mean distribution is narrower than the noise, whereas the sample distribution is wider.
Figure 18: Summary of the two parameter inference at the map level. The left column shows the masked and noisy GRF realisation used, which is our data. The middle column shows the true GRF and a sample from the conditioned GP. The right column shows maps of the mean, standard deviation and residuals over standard deviation squared resulting from the numpyro model sample with highest likelihood. Regions of higher uncertainty correspond to the masked regions.
We have hereby introduced the tool of Gaussian processes to the landscape of map-level inference of cosmological fields, in particular the weak lensing convergence. We considered how the 2-point statistics of cosmological fields change when dealing with bounded and discrete maps in Sec. ??. We discussed the realisation of Gaussian and lognormal fields in Sec. ??, and showed in Sec. ?? their ability to recover the 2-point statistics that they encode. We included masking and noise in the data to simulate realistic maps in Sec. ??. We have shown that it is possible to apply physical knowledge of the 2-point correlation function of a cosmological field in order to set up a Gaussian process able to produce a Gaussian realisation of such a field. We considered different set-ups for the Gaussian process kernel in Sec. ?? and showed how they fare against one another in Sec. ??, ultimately finding empirically that the half-range FFT model performs best. In Sec. ?? we presented an application of Gaussian processes to a noiseless masked convergence map, showcasing their ability to reconstruct a heavily masked map. Finally, we presented our results for cosmological parameter inference with GPs, conditioning on noisy and masked data. When running the inference model on one cosmological parameter, we recover both parameters within two sigmas, and . For the two-parameter inference we observe the well-known banana-shaped degeneracy between and , and recover within two sigmas.
In future studies, GPs could be tested on maps with larger grids. In order to achieve a resolution of arcmin with a map of size , a grid is needed, which is too big for a GP. The bottleneck is the inversion of the kernel matrix, see Eq. (19). Approximations of this operation could enable the use of GPs on larger grids, opening up the possibility of applying this method to current weak lensing catalogues and perhaps even full-sky catalogues. Another pathway to explore is lognormal fields, as they do a much better job of simulating data than GRFs. Because GPs are Gaussian, their associated likelihood is not suitable for lognormal fields; a modified likelihood could therefore unlock a correct application of GPs to lognormal fields.
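To see why the kernel matrix is the bottleneck, a back-of-the-envelope estimate helps (the grid sizes below are illustrative, not the exact grids of the text): a dense GP on an n-pixel map needs O(n²) memory for the kernel and O(n³) time to factorise it.

```python
def gp_cost(npix_side):
    """Dense-GP cost estimate for an npix_side x npix_side map."""
    n = npix_side**2                      # pixels = kernel-matrix dimension
    mem_gb = n * n * 8 / 1e9              # float64 kernel matrix in memory
    flops = n**3 / 3                      # Cholesky factorisation ~ n^3 / 3
    return n, mem_gb, flops

for side in (64, 256, 1024):
    n, mem, fl = gp_cost(side)
    print(f"{side}x{side} grid: n = {n}, kernel ~ {mem:,.0f} GB, Cholesky ~ {fl:.1e} flops")
```

Already at a 256² grid the dense kernel needs tens of gigabytes, and at 1024² it reaches the terabyte scale, which is what motivates the approximation schemes mentioned above.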
Acknowledgments¶
This project wouldn’t have been possible without the author of the idea, Dr. Tilman Tröster. You guided me through my first real research experience and I am grateful. Thank you Veronika Oehl for always being there and for the very helpful discussions. I also express my gratitude to the Cosmology group at ETH, led by Prof. Alexandre Réfrégier. Hearing about my every progress every Monday morning for six months, couldn’t have been easy, thank you. It has been an incredible experience, long and grinding, which I embarked upon with my friends and colleagues Tommaso and Pietro. Thank you for making these past few months memorable. Thanks to Guido van Rossum for giving us Python. Thank you Mia, for your unconditional support, you keep me grounded. Non sarei qua senza di te mamma, grazie.
Thank you, reader.
- Kilbinger, M. (2015). Cosmology with cosmic shear observations: a review. Reports on Progress in Physics, 78(8), 086901. 10.1088/0034-4885/78/8/086901
- Bartelmann, M., & Maturi, M. (2017). Weak gravitational lensing. Scholarpedia, 12(1), 32440. 10.4249/scholarpedia.32440
- Blandford, R. D., Saust, A. B., Brainerd, T. G., & Villumsen, J. V. (1991). The distortion of distant galaxy images by large scale structure. AIP Conference Proceedings, 222(1), 455–458. 10.1063/1.40414
- Mootoovaloo, A., Heavens, A. F., Jaffe, A. H., & Leclercq, F. (2020). Parameter inference for weak lensing using Gaussian Processes and MOPED. Monthly Notices of the Royal Astronomical Society, 497(2), 2213–2226. 10.1093/mnras/staa2102
- Boruah, S. S., Eifler, T., Miranda, V., & Krishanth, P. M. S. (2022). Accelerating cosmological inference with Gaussian processes and neural networks – an application to LSST Y1 weak lensing and galaxy clustering. Monthly Notices of the Royal Astronomical Society, 518(4), 4818–4831. 10.1093/mnras/stac3417
- Karchev, K., Coogan, A., & Weniger, C. (2022). Strong-lensing source reconstruction with variationally optimized Gaussian processes. Monthly Notices of the Royal Astronomical Society, 512(1), 661–685. 10.1093/mnras/stac311
- Shafieloo, A., Kim, A. G., & Linder, E. V. (2012). Gaussian process cosmography. Phys. Rev. D, 85(12), 123530. 10.1103/PhysRevD.85.123530
- Seikel, M., Clarkson, C., & Smith, M. (2012). Reconstruction of dark energy and expansion dynamics using Gaussian processes. Journal of Cosmology and Astroparticle Physics, 2012(06), 036. 10.1088/1475-7516/2012/06/036
- Holsclaw, T., Alam, U., Sansó, B., Lee, H., Heitmann, K., Habib, S., & Higdon, D. (2010). Nonparametric reconstruction of the dark energy equation of state. Phys. Rev. D, 82(10), 103502. 10.1103/PhysRevD.82.103502
- Chon, G., Challinor, A., Prunet, S., Hivon, E., & Szapudi, I. (2004). Fast estimation of polarization power spectra using correlation functions. Monthly Notices of the Royal Astronomical Society, 350(3), 914–926. 10.1111/j.1365-2966.2004.07737.x
- Brown, M. L., Castro, P. G., & Taylor, A. N. (2005). Cosmic microwave background temperature and polarization pseudo-Cℓ estimators and covariances. Monthly Notices of the Royal Astronomical Society, 360(4), 1262–1280. 10.1111/j.1365-2966.2005.09111.x
- Zhou, A. J., Li, X., Dodelson, S., & Mandelbaum, R. (2023). Accurate field-level weak lensing inference for precision cosmology.
- Dodelson, S., & Schmidt, F. (2021). 13 - Probes of structure: lensing. In S. Dodelson & F. Schmidt (Eds.), Modern Cosmology (Second Edition) (Second Edition, pp. 373–399). Academic Press. https://doi.org/10.1016/B978-0-12-815948-4.00019-X
- Nicola, A., García-García, C., Alonso, D., Dunkley, J., Ferreira, P. G., Slosar, A., & Spergel, D. N. (2021). Cosmic shear power spectra in practice. Journal of Cosmology and Astroparticle Physics, 2021(03), 067. 10.1088/1475-7516/2021/03/067
- Gao, Z., Raccanelli, A., & Vlah, Z. (2023). Asymptotic connection between full- and flat-sky angular correlators. Phys. Rev. D, 108(4), 043503. 10.1103/PhysRevD.108.043503