Weak lensing map inference: a physics informed Gaussian process approach

Data simulation¶

Cosmology¶

This work follows in the general footsteps of data analysis of weak lensing surveys like HSC Hikage et al. (2019) and KiDS Köhlinger et al. (2017)Asgari, Marika et al. (2021), as we aim to replicate their methodologies. Throughout the work, we assume a fiducial cosmology of fixed parameter values as shown in Tab. %s. In particular, all data maps will be generated from a power spectrum following this cosmology. We will refer to this power spectrum as the fiducial power spectrum $C(l)$ . A leading modelling choice comes with the redshift distribution $n(z)$ . We model it as a Smail-type distribution Smail et al. (1995)Kirk et al. (2012),

n(z)=z^\alpha \exp{\left[-\left(\frac{z}{z_0}\right)^\beta\right]}.

(1)

The choice of parameters has been made to emulate bin 5 of the KiDS1000 survey Hildebrandt, H. et al. (2021), with parameters $\alpha=3.5$ , $\beta=4.5$ , $z_0=1$ .

Smail-type redshift distribution for the chosen parameters: \alpha=3.5, \beta=4.5, z_0=1. — Figure 5:Smail-type redshift distribution for the chosen parameters: $\alpha=3.5$ , $\beta=4.5$ , $z_0=1$ .

Map making pipeline¶

When dealing with maps of finite size $(L, L)$ and pixel resolution $(N,N)$ , Fourier space is going to have boundaries, just as real space does. These limits are given by

\begin{gather*} l_{\text{min}} = \frac{2\pi}{L},\\ l_{\text{max}} = \frac{2\pi}{L} N. \end{gather*}

(2)

Therefore, a map of size $(10^{\circ},10^{\circ})$ and a grid $64\times64$ will have limits $(l_{\text{min}}, l_{\text{max}}) = (36, 2304)\text{rad}^{-1}$ . Now that we have the physical range for the power spectrum, we can generate data.

Figure 6:Sequence of transformations used to generate Gaussian and lognormal maps starting from the fiducial angular power spectrum. The top branch shows how to generate a GRF which when transformed to a lognormal field, follows the fiducial $C(l)$ . The bottom branch is a standard GRF realisation starting from the fiducial $C(l)$ . $L$ and $G$ stand for lognormal and Gaussian respectively. $\mathcal{H}$ is the Hankel transformation, and tilde refers to intermediate results.

To generate a GRF, we employ the algorithm mentioned in Eq. (13). However, in making a lognormal field, the matter is a bit more complicated. Any GRF on which we apply the lognormal transformation $\mathcal{L}_\kappa$ , from Eq. eq:L_k, becomes a lognormal field. The reason we do not just lognormal transform any field is given by the fact that they would not recover the fiducial power spectrum $C(l)$ . Our goal is therefore to find a transfer GRF $\tilde{\kappa}^G$ which, when transformed lognormally, gives rise to a lognormal field $\kappa^L$ that recovers $C(l)$ . To do so, we follow the sequence of transformations illustrated in the top branch of Fig. 6.

Noise and mask¶

Approximately 10\% mask applied to the data. Size (10^{\circ},10^{\circ}) and grid 64\times64. — Figure 7:Approximately $10\%$ mask applied to the data. Size $(10^{\circ},10^{\circ})$ and grid $64\times64$ .

As all data is being simulated, we only take into account the so-called shape noise, which is due to the intrinsic distribution of ellipticities and angle formed with respect to us. GPs will treat each pixel of the map as a random variable Gaussianly distributed with standard deviation given by Croft et al. (2017),

\sigma_\text{noise} = \frac{\sigma_e}{\sqrt{n_g A_{px}}}.

(3)

We use values of $\sigma_e=0.26$ , $n_g= 4, 10, 30, 100 \text{ galaxies}/\text{arcmin}^2$ , and a pixel area given by the pixel resolution squared, $A_{px}=(L/N)^2$ . For our highest-resolution run, we have $N=64$ and use $n_g = 10 \text{ galaxies}/\text{arcmin}^2$ , which results in a noise standard deviation of $\sigma_\text{noise} \sim 0.0088$ . The mask being used as seen in Fig. 7, covers approximately $10\%$ of the patch Grewal et al. (2024). We keep the noise and mask random seeds fixed throughout the work.

Kernel¶

The kernel of a Gaussian process is given by a function of the form $k(x,y)$ . It takes in two points, $x$ and $y$ , and returns the value of their correlation. In our case, specifically, $x$ and $y$ will be two points in a grid of shape $(N,N)$ , and the kernel function will be the convergence angular autocorrelation function. As all code used for this paper is written in JAX Bradbury et al. (2018) we opt for the use of the library tinygp Foreman-Mackey et al. (2024) for all Gaussian process computations. tinygp allows for the use of a custom kernel with a custom evaluate method, which takes two points on the grid and returns the correlation value. We follow with two Python pseudocodes of our kernel implementations.

class kernel_Hankel:
    def __init__(self, cl, N, L):
        self.w   = Hankel(cl, N, L)
        self.r   = L / N
        
    def evaluate(self, x, y):
        theta    = self.r * sqrt(sum(x - y))
        return self.w(theta)

Program 1:\code{kernel_Hankel} uses the helper function \code{Hankel} which returns a 1D callable correlation function \code{w}. Then finds the euclidean distance between $x$ and $y$ and evaluates the correlation function at that point. We will discuss how we perform the Hankel transformation in the following section.

class kernel_FFT:
    def __init__(self, cl2D, N, L):
        M        = sqrt(cl2D.size)
        self.w2D = abs(ifft2(cl2D)) * M**2 / L**2
        self.r   = M / N
        
    def evaluate(self, x, y):
        d0       = self.r * abs(x[0] - y[0])
        d1       = self.r * abs(x[1] - y[1])
        return self.w2D[int(d0)][int(d1)]

Program 2:\code{kernel_FFT} performs a 2D Fourier transform on the power spectrum \code{cl2D}. This returns a 2D array that is the correlation function. Then to evaluate \code{w2D} we need two indices. These are given by the difference of $x$ and $y$ component wise. Furthermore we have to be careful about possible mismatches between the shape of \code{cl2D} and the grid of our map. For this we have the renormalisation factor \code{r}. We will discuss further how we get \code{cl2D} in the following section.

Hankel transform¶

In order to build a Gaussian process kernel, we need to find the correlation function that best describes the data. The way we do this is by computing the angular power spectrum and transforming it in the corresponding angular correlation function. Let’s explore the previously discussed flat-sky relation between angular power spectrum and correlation function given by Eq. (8). This particular integral of a Bessel function is also known as a zeroth order Hankel transformation,

w(\theta) = \int \frac{dl}{2\pi} l C(l) J_0(l\theta).

(4)

We will explore two methods for computing this integral: the integration method Sec. ?? and the FFTlog method Sec. ??.

Integration¶

Integration is the most straight forward way to evaluate the integral, but it requires to implement an algorithm for the approximation of the Bessel function $J_0$ . The advantages of this method:

it is easy to integrate over the correct $l$ -range, from $l_{\text{min}}$ and $l_{\text{max}}$ , a freedom that we do not have with the FFTlog.

The disadvantages:

integration is computationally slow, especially when dealing with highly oscillatory behaviour introduced by the Bessel function, which requires fine sampling.

FFTlog¶

The FFTlog method Simonović et al. (2018) is a fast implementation of the Hankel transformation. In fact it simplifies Eq. (23) by assuming a power decomposition of $C(l)$ . Such a decomposition is achievable by taking the fast Fourier transformation (FFT) in $\log k$ , hence the name FFTlog. The power spectrum becomes,

C(l)=\sum_\alpha c_\alpha l^{\upsilon + i\eta_\alpha}.

(5)

Substituting in Eq. (23),

w(\theta) = \sum_\alpha c_\alpha \int_0^\infty \frac{dl}{2\pi} l^{\upsilon + i\eta_\alpha+1} J_0(l\theta).

Take $x=l\theta$ and $s_\alpha-1=\upsilon + i\eta_\alpha+1$ ,

w(\theta) = \sum_\alpha c_\alpha \theta^{-s_\alpha} \int_0^\infty \frac{dx}{2\pi} x^{s_\alpha-1} J_0(x).

Lastly we recognise $\int_0^\infty dx x^{s-1}J_0(x) = \frac{2^{s-1}}{\pi}\sin(\pi s/2) [\Gamma(s/2)]^2$ , a Mellin transform. Using this tabulated result we conclude that the correlation is given by the sum,

w(\theta) = \frac{1}{2\pi^2}\sum_\alpha c_\alpha \theta^{-s_\alpha}2^{s_\alpha-1}\sin(\pi s_\alpha/2) [\Gamma(s_\alpha/2)]^2.

(6)

For the Mellin transform identity to hold analytically, the integral bounds have to go from 0 to $\infty$ . Although computationally we don’t need to consider such a wide range, we still have to broaden the integration limits to something larger than our $l$ -range; so as to avoid ringing effects. Experimentally we have found that extending the $l$ -range between $l_{\text{min}}/4$ and $l_{\text{max}}$ , is enough to compensate for such effects. The advantages of this method:

the FFTlog method is ultimately much faster than the integration method.

The disadvantages:

the widening of the $l$ -range needed to avoid ringing effects inevitably adds power to the correlation function resulting in a higher variance and overall amplitude.

Fourier transform¶

The second transformation that we can use to go from power spectrum to correlation function, is the Fourier transformation discussed in Eq. (7). It is a more fundamental relation than the Hankel transformation as it does not assume that $l$ is radially symmetric. In our case, low resolution, coupled with a square map, means that the radial assumption might not apply. Note that, the lowest resolution grid we use is $32\times32$ , with $L/N \sim 19$ arcmin; which is one order of magnitude worse than today’s weak lensing data, around $\sim 3$ arcmin according to Zhou et al. (2023). In the following Sec. ?? we will find experimentally that the Hankel methods described above do not work well with our GP setup. For this reason, we explore this method of conversion between angular power spectrum and correlation function,

w(\theta)=\int\frac{d^2l}{4\pi^2}e^{-i\bm{l}\cdot\bm{\theta}}C(l).

(7)

As computers can only deal with discrete functions, it is important to note that we will be performing a discrete Fourier transformation (DFT). We will use the widely known FFT algorithm to compute DFTs. In particular we use JAX’s implementation of the 2D FFT algorithm, jax.numpy.fft.ifft2.

Let us find the relation between continuous Fourier transformation and DFT. The definitions of continuous and discrete Fourier transformations in 1D, are respectively:

\begin{align} \mathcal{F}^{-1}&=\int\frac{dk}{2\pi}e^{ikx}, \\ \text{F}^{-1}&=\frac{1}{N}\sum_p e^{ik_px}. \end{align}

(8)

First of all, a DFT is dimensionless. Secondly, it is discrete and bounded. We can therefore rewrite Eq. (27) using the substitution $k(p)=2\pi \frac{p}{L}$ to discretise k-space,

\int\frac{dk}{2\pi}e^{ikx} = \int\frac{dp}{L} e^{ik(p)x} = \frac{1}{L}\sum_p e^{ik_px}.

Applying the definition of DFT as seen in Eq. eq:DFT, it follows that in 1D

\mathcal{F}^{-1}=\frac{N}{L}\text{F}^{-1}.

(9)

Which means that the correlation function is given by a backwards normalised inverse DFT to be scaled by a factor $(N/L)^d$ , where $d$ is the dimension of the considered space. In our case, $d=2$ .

In order to perform a FFT in 2D, we will need a two dimensional extension of the angular power spectrum. We make use of its radial symmetry with respect to $\bm{l}=(l_x,l_y)$ and create a 2D grid of shape $(M,M)$ as shown in Fig. 8.

Figure 8:1D to 2D extension of the power spectrum $C(l)$

Now that we have a 2D power spectrum we can take the inverse two dimensional fast Fourier transformation to obtain a 2D correlation function. In practice, we will test two grids, which we name full-range FFT method and half-range FFT method.

Full-range FFT¶

The full-range FFT is defined by a grid of shape $(2N,2N)$ . Since the grid is radial, it will be centered. This implies that if we want to keep information of all $N$ modes,

\frac{2\pi}{L},\, \frac{4\pi}{L},\,\cdot\cdot\cdot\,,\, \frac{2\pi}{L}N,

(10)

the grid will have to be at least of shape $M=2N$ . The advantages of this method:

it keeps information on the full range of modes.

The disadvantages:

it introduces rounding errors, as the field has shape $(N,N)$ and not all of the possible distance combinations of such a grid are covered bya grid of shape $(2N,2N)$ .

Half-range FFT¶

The half-range FFT is defined by a grid of shape $(N,N)$ . The advantages of this method:

has the perk of having no shape mismatch between field and correlation function.

The disadvantages:

it loses half of the $l$ -range, missing information on small scales.

References¶

Hikage, C., Oguri, M., Hamana, T., More, S., Mandelbaum, R., Takada, M., Köhlinger, F., Miyatake, H., Nishizawa, A. J., Aihara, H., Armstrong, R., Bosch, J., Coupon, J., Ducout, A., Ho, P., Hsieh, B.-C., Komiyama, Y., Lanusse, F., Leauthaud, A., … Yamada, Y. (2019). Cosmology from cosmic shear power spectra with Subaru Hyper Suprime-Cam first-year data. Publications of the Astronomical Society of Japan, 71(2), 43. 10.1093/pasj/psz010
Köhlinger, F., Viola, M., Joachimi, B., Hoekstra, H., van Uitert, E., Hildebrandt, H., Choi, A., Erben, T., Heymans, C., Joudaki, S., Klaes, D., Kuijken, K., Merten, J., Miller, L., Schneider, P., & Valentijn, E. A. (2017). KiDS-450: the tomographic weak lensing power spectrum and constraints on cosmological parameters. Monthly Notices of the Royal Astronomical Society, 471(4), 4412–4435. 10.1093/mnras/stx1820
Asgari, Marika, Lin, Chieh-An, Joachimi, Benjamin, Giblin, Benjamin, Heymans, Catherine, Hildebrandt, Hendrik, Kannawadi, Arun, Stölzner, Benjamin, Tröster, Tilman, van den Busch, Jan Luca, Wright, Angus H., Bilicki, Maciej, Blake, Chris, de Jong, Jelte, Dvornik, Andrej, Erben, Thomas, Getman, Fedor, Hoekstra, Henk, Köhlinger, Fabian, … Valentijn, Edwin. (2021). KiDS-1000 cosmology: Cosmic shear constraints and comparison between two point statistics. A&A, 645, A104. 10.1051/0004-6361/202039070
Smail, I., Hogg, D. W., Yan, L., & Cohen, J. G. (1995). Deep Optical Galaxy Counts with the Keck Telescope*. The Astrophysical Journal, 449(2), L105. 10.1086/309647
Kirk, D., Rassat, A., Host, O., & Bridle, S. (2012). The cosmological impact of intrinsic alignment model choice for cosmic shear. Monthly Notices of the Royal Astronomical Society, 424(3), 1647–1657. 10.1111/j.1365-2966.2012.21099.x
Hildebrandt, H., van den Busch, J. L., Wright, A. H., Blake, C., Joachimi, B., Kuijken, K., Tröster, T., Asgari, M., Bilicki, M., de Jong, J. T. A., Dvornik, A., Erben, T., Getman, F., Giblin, B., Heymans, C., Kannawadi, A., Lin, C.-A., & Shan, H.-Y. (2021). KiDS-1000 catalogue: Redshift distributions and their calibration. A&A, 647, A124. 10.1051/0004-6361/202039018
Croft, R. A. C., Freeman, P. E., Schuster, T. S., & Schafer, C. M. (2017). Prediction of galaxy ellipticities and reduction of shape noise in cosmic shear measurements. Monthly Notices of the Royal Astronomical Society, 469(4), 4422–4427. 10.1093/mnras/stx1206
Grewal, N., Zuntz, J., & Tröster, T. (2024). Comparing Mass Mapping Reconstruction Methods with Minkowski Functionals.
Bradbury, J., Frostig, R., Hawkins, P., Johnson, M. J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., & Zhang, Q. (2018). JAX: composable transformations of Python+NumPy programs (0.3.13) [Computer software]. http://github.com/google/jax
Foreman-Mackey, D., Yu, W., Yadav, S., Becker, M. R., Caplar, N., Huppenkothen, D., Killestein, T., Tronsgaard, R., Rashid, T., & Schmerler, S. (2024). dfm/tinygp: The tiniest of Gaussian Process libraries (v0.3.0) [Computer software]. Zenodo. 10.5281/zenodo.10463641
Simonović, M., Baldauf, T., Zaldarriaga, M., Carrasco, J. J., & Kollmeier, J. A. (2018). Cosmological perturbation theory using the FFTLog: formalism and connection to QFT loop integrals. Journal of Cosmology and Astroparticle Physics, 2018(04), 030. 10.1088/1475-7516/2018/04/030
Zhou, A. J., Li, X., Dodelson, S., & Mandelbaum, R. (2023). Accurate field-level weak lensing inference for precision cosmology.

Weak lensing map inference: a physics informed Gaussian process approach

Theoretical Framework

Weak lensing map inference: a physics informed Gaussian process approach

Results