kNNpy.HelperFunctions

View Source

  1####################################################################################################
  2
  3#-------------------  These libraries are required for evaluating the functions  -------------------
  4
  5import numpy as np
  6from scipy import interpolate
  7import time
  8import pyfftw
  9import warnings
 10import smoothing_library as SL
 11import MAS_library as MASL
 12
 13####################################################################################################
 14
 15#--------------------------------------  Function Definitions  -------------------------------------
 16
 17####################################################################################################
 18
 19def cdf_vol_knn(vol):
 20    r'''
 21    Returns interpolating functions for empirical CDFs of the given $k$-nearest neighbour distances.
 22    
 23    Parameters
 24    ----------
 25    vol : numpy float array of shape ``(n_query, n_kNN)``
 26        Sorted array of nearest neighbour distances, where 'n_query' is the number of query points and 'n_kNN' is the number of nearest neighbours queried.
 27
 28    Returns
 29    -------
 30    cdf: list of function objects
 31        list of interpolated empirical CDF functions that can be evaluated at desired distance bins.
 32    '''
 33    
 34    #-----------------------------------------------------------------------------------------------
 35
 36    #Initialising a list to contain the interpolating functions
 37    cdf = []
 38
 39    #-----------------------------------------------------------------------------------------------
 40
 41    #Inferring the number of query points and nearest neighbours
 42    n = vol.shape[0]
 43    l = vol.shape[1]
 44
 45    #-----------------------------------------------------------------------------------------------
 46
 47    #Calculating the empirical CDF
 48    gof = ((np.arange(0, n) + 1) / (n*1.0))
 49    for c in range(l):
 50        ind = np.argsort(vol[:, c])
 51        s_vol= vol[ind, c]
 52        #Calculating the interpolating function
 53        cdf.append(interpolate.interp1d(s_vol, gof, kind='linear', bounds_error=False))
 54        
 55    return cdf
 56
 57####################################################################################################
 58
 59def calc_kNN_CDF(vol, bins):
 60    r'''
 61    Returns the kNN-CDFs for the given nearest-neighbour distances, evaluated at the given distance bins.
 62
 63    Parameters
 64    ----------
 65    vol : numpy float array of shape ``(n_query, n_kNN)``
 66        2D array containing sorted 1D arrays of nearest-neighbour distances, where 'n_query' is the number of query points and 'n_kNN' is the number of nearest-neighbours queried. `vol[:, i]` should be the array with the sorted $k_i^{th}$ nearest-neighbour distances.
 67    bins : list of numpy float array
 68        list of distance scale arrays at which the CDFs need to be evaluated (units must be same as in `vol`).
 69
 70    Returns
 71    -------
 72    data : list of numpy float array
 73        kNN-CDFs evaluated at the desired distance bins. ``data[i]`` is the $k_i$NN-CDF if ``vol[:, i]`` containts the $k_i^{th}$ nearest-neigbour distances.
 74    '''
 75
 76    #-----------------------------------------------------------------------------------------------
 77
 78    #Initialising the list of kNN-CDFs
 79    data = []
 80
 81    #-----------------------------------------------------------------------------------------------
 82
 83    #Computing the interpolated empirical CDFs using the nearest-neighbour distances
 84    cdfs = cdf_vol_knn(vol)
 85
 86    #-----------------------------------------------------------------------------------------------
 87
 88    #Looping over the nearest-neighbour indices
 89    for i in range(vol.shape[1]):
 90
 91        #-------------------------------------------------------------------------------------------
 92
 93        #Finding the minimum and maximum values of the NN distances
 94        min_dist = np.min(vol[:, i])
 95        max_dist = np.max(vol[:, i])
 96
 97        #-------------------------------------------------------------------------------------------
 98
 99        #Finding if any of the user-input bins lie outside the range spanned by the NN distances
100        bin_mask = np.searchsorted(bins[i], [min_dist, max_dist])
101        if bin_mask[1]!=len(bins[i]):
102            if bins[i][bin_mask[1]] == max_dist:
103                bin_mask[1] += 1
104
105        #-------------------------------------------------------------------------------------------
106                
107        NNcdf = np.zeros(len(bins[i]))
108        
109        #Setting the value of the CDFs at scales smaller than the smallest NN distance to 0
110        NNcdf[:bin_mask[0]] = 0
111        
112        NNcdf[bin_mask[0]:bin_mask[1]] = cdfs[i](bins[i][bin_mask[0]:bin_mask[1]])
113        
114        #Setting the value of the CDFs at scales larger than the largest NN distance to 1
115        NNcdf[bin_mask[1]:] = 1
116        
117        data.append(NNcdf)
118        
119    return data
120
121####################################################################################################
122
123def create_query_3D(query_type, query_grid, BoxSize):
124    '''
125    Generates an array of query points; can be either randomly drawn from a uniform distribution defined over the box or put on a uniform grid.
126
127    Parameters
128    ----------
129    query_type : {'grid', 'random'}, str
130        the type of query points to be generated; should be 'grid' for query points defined on a uniform grid and 'random' for query points drawn from a uniform random distribution.
131    query_grid : int
132        the 1D size of the query points array; the total number of query points generated will be ``query_grid**3``.
133    BoxSize : float
134        the size of the 3D box of the input density field, in Mpc/h. Must be a positive real number, and must not be ``np.inf`` or ``np.nan``.
135
136    Returns
137    -------
138    query_pos : numpy float array of shape ``(query_grid**3, 3)``
139        array of query point positions. For each query point in the array, the first, second and third entries are the x, y and z coordinates respectively, in Mpc/h.
140
141    Raises
142    ------
143    ValueError
144        if `BoxSize` is not a positive real number less than infinity.
145    ValueError
146        if an unknown query type is provided.
147        
148    See Also
149    --------
150    kNNpy.HelperFunctions.create_query_2DA : generates query points in 2D angular coordinates.
151    '''
152
153    #Validating inputs
154
155    if np.isnan(BoxSize) or np.isinf(BoxSize) or BoxSize<=0.0:
156        raise ValueError(f"Invalid box size: {BoxSize}; please provide a positive real number less than infinity!")
157
158    #Creating the query points
159
160    if query_type == 'grid':
161
162        #Creating a grid of query points
163        x_ = np.linspace(0., BoxSize, query_grid, endpoint=False)
164        y_ = np.linspace(0., BoxSize, query_grid, endpoint=False)
165        z_ = np.linspace(0., BoxSize, query_grid, endpoint=False)
166
167        x, y, z = np.array(np.meshgrid(x_, y_, z_, indexing='xy'))
168
169        query_pos = np.zeros((query_grid**3, 3))
170        query_pos[:, 0] = np.reshape(x, query_grid**3)
171        query_pos[:, 1] = np.reshape(y, query_grid**3)
172        query_pos[:, 2] = np.reshape(z, query_grid**3)
173
174    elif query_type == 'random':
175
176        #Creating a set of randomly distributed query points
177        query_pos = np.random.rand(query_grid**3, 3)*BoxSize
178
179    else:   
180        raise ValueError(f"Unknown query type: {query_type}; please provide a valid query type")
181    
182    return query_pos
183    
184####################################################################################################
185
186def smoothing_3D(field, Filter, grid, BoxSize, R=None, kmin=None, kmax=None, thickness=None, Verbose=False):
187    r'''
188    Smooths the given map at the given scale using a window function of choice in real or k-space. 
189    
190    Parameters
191    ----------
192    field : numpy float array
193        the 3D array of the continuous field that needs to be smoothed. 
194    Filter : string
195        the filter to be used for smoothing. 'Top-Hat', 'Gaussian', 'Shell' are for real space, and 'Top-Hat-k' is a top-hat filter in k-space.
196    grid : int
197        the grid size of the input density field, which should be field.shape[0] assuming a cubical box.
198    BoxSize : float
199        the size of the 3D box of the input density field, in Mpc/h.
200    R : float, optional
201        radial scale (in Mpc/h) at which the field is to be smoothed. Only use this parameter for real space smoothing.
202    kmin : float, optional
203        the minimum value of the wavenumber. Only use this parameter when 'Top-Hat-k' filter is used.
204    kmax : float, optional
205        the maximum value of the wavenumber. Only use this parameter when 'Top-Hat-k' filter is used.
206    thickness : float, optional
207        the thickness of the shell used for smoothing. Only use this parameter when 'Shell' filter is used. The smoothing is done using a shell with inner radius R-thickness/2 and outer radius R+thickness/2.
208    Verbose : bool, optional
209        if set to True, the time taken to complete each step of the calculation will be printed, by default False.
210
211    Returns
212    -------
213    smoothed_field : numpy float array of shape field.shape
214        the smoothed field.
215
216    Raises
217    ------
218    ValueError
219        If required parameters (like R, kmin, kmax, or thickness) are missing for the specified filter type.
220    ValueError
221        If the input field dimensions do not form a cubical box.
222    ValueError
223        If the grid size does not match the field dimensions.
224    ValueError
225        If an unknown filter name is provided.
226
227    Notes
228    -----
229    - For real-space filters ('Top-Hat', 'Gaussian', 'Shell'), the radial scale R must be specified.
230    - For the 'Shell' filter, thickness must also be specified.
231    - For the 'Top-Hat-k' filter in Fourier space, kmin and kmax must be specified, while R and thickness are ignored.
232    - Any unused parameters will trigger warnings but not stop execution.
233    '''
234    
235    #-----------------------------------------------------------------------------------------------
236    
237    if Verbose:
238        total_start_time = time.perf_counter()
239        print("\nStarting smoothing ...")
240        
241    #-----------------------------------------------------------------------------------------------
242    
243    if not (field.shape[0] == field.shape[1] == field.shape[2]):
244        raise ValueError("The box provided is not cubical.")
245    elif field.shape[0] != grid:
246        raise ValueError("Grid size provided does not match with dimensions of the cubical box.")
247        
248    #-----------------------------------------------------------------------------------------------
249    
250    if Filter in ['Top-Hat', 'Gaussian']:
251        if R is None:
252            raise ValueError(f"R must be provided for {Filter} filter.")
253        if kmin is not None or kmax is not None:
254            warnings.warn("kmin and kmax are not used for real-space filters and will be ignored.")
255        if thickness is not None:
256            warnings.warn("thickness is not used for real-space filters and will be ignored.")
257        
258        W_k = SL.FT_filter(BoxSize, R, grid, Filter, threads=1)
259        field_k = pyfftw.interfaces.numpy_fft.rfftn(field)
260        smoothed_field_k = field_k * W_k
261        smoothed_field = pyfftw.interfaces.numpy_fft.irfftn(smoothed_field_k)
262        
263    #-----------------------------------------------------------------------------------------------
264    
265    elif Filter == 'Top-Hat-k':
266        if kmin is None or kmax is None:
267            raise ValueError("Both kmin and kmax must be provided for 'Top-Hat-k' filter.")
268        if R is not None:
269            warnings.warn("R is not used for 'Top-Hat-k' filter and will be ignored.")
270        if thickness is not None:
271            warnings.warn("thickness is not used for 'Top-Hat-k' filter and will be ignored.")
272        
273        R = 0.0
274        W_k = SL.FT_filter(BoxSize, R, grid, Filter, kmin=kmin, kmax=kmax, threads=1)
275        field_k = pyfftw.interfaces.numpy_fft.rfftn(field)
276        smoothed_field_k = field_k * W_k
277        smoothed_field = pyfftw.interfaces.numpy_fft.irfftn(smoothed_field_k)
278        
279    #-----------------------------------------------------------------------------------------------
280    
281    elif Filter == 'Shell':
282        if R is None or thickness is None:
283            raise ValueError("Both R and thickness must be provided for 'Shell' filter.")
284        if kmin is not None or kmax is not None:
285            warnings.warn("kmin and kmax are not used for 'Shell' filter and will be ignored.")
286        
287        if Verbose:
288            print("\nGenerating shell-smoothed field ...")
289        
290        grid_cell_size = BoxSize / grid
291        field_k = pyfftw.interfaces.numpy_fft.rfftn(field)
292
293        x = np.fft.fftfreq(grid) * grid
294        y = np.fft.fftfreq(grid) * grid
295        z = np.fft.fftfreq(grid) * grid
296        X, Y, Z = np.meshgrid(x, y, z, indexing='ij')
297        r_grid = grid_cell_size * np.sqrt(X**2 + Y**2 + Z**2)
298
299        W = np.zeros((grid, grid, grid), dtype=np.float32)
300        W[(r_grid >= R - thickness/2) & (r_grid <= R + thickness/2)] = 1.0
301        W_k = pyfftw.interfaces.numpy_fft.rfftn(W)
302
303        smoothed_field_k = field_k * W_k
304        smoothed_field = pyfftw.interfaces.numpy_fft.irfftn(smoothed_field_k) / np.sum(W)
305        
306    #-----------------------------------------------------------------------------------------------
307    
308    else:
309        raise ValueError(f"Unknown filter: {Filter}")
310        
311    #-----------------------------------------------------------------------------------------------
312    
313    if Verbose:
314        print("Smoothing completed.")
315        print(f'Total time taken: {time.perf_counter() - total_start_time:.2e} s.')
316        
317    #-----------------------------------------------------------------------------------------------
318    
319    return smoothed_field
320
321####################################################################################################
322
323def create_smoothed_field_dict_3D(field, Filter, grid, BoxSize, bins, thickness=None, Verbose=False):
324    r'''
325    Creates a dictionary containing the continuous field smoothed at various radial distance scales.
326
327    Parameters
328    ----------
329    field : numpy float array
330        the 3D array of the continuous field that needs to be smoothed. 
331    Filter : string
332        the filter to be used for smoothing. Valid filter types are: 'Top-Hat', 'Gaussian', 'Shell'. 
333    grid : int
334        the grid size of the input density field, which should be field.shape[0] assuming a cubical box.
335    BoxSize : float
336        the size of the 3D box of the input density field, in Mpc/h.
337    bins : list of numpy float array
338        list of distances for each nearest neighbour. The $i^{th}$ element of the list should contain a numpy array of the desired distance scales for the $k_i^{th}$ nearest neighbour.
339    thickness : float, optional
340        the thickness of the shell used for smoothing. Only use this parameter when 'Shell' filter is used. The smoothing is done using a shell with inner radius R-thickness/2 and outer radius R+thickness/2.
341    Verbose : bool, optional
342        if set to True, the time taken to complete each step of the calculation will be printed, by default False.
343
344    Returns
345    -------
346    SmoothedFieldDict : dict
347        dictionary containing the continuous field smoothed at various radial distance scales. For example, `SmoothedFieldDict['50.0']`  represents the continuous map smoothed at a scale of 50 Mpc/h.
348
349    Raises
350    ------
351    ValueError
352        If required parameters (like bins or thickness) are missing for the specified filter type.
353    ValueError
354        If the input field dimensions do not form a cubical box.
355    ValueError
356        If the grid size does not match the field dimensions.
357    ValueError
358        If an unknown filter name is provided.
359
360    Notes
361    -----
362    - This function only works for the real space filters, so 'Top-Hat-k' is not a valid filter for this function.
363    - For the 'Shell' filter, thickness must be specified.
364    - Any unused parameters will trigger warnings but not stop execution.
365    '''
366    
367    #-----------------------------------------------------------------------------------------------
368    
369    # This function is only for smoothing in real space
370    
371    if Filter == 'Top-Hat-k':
372        raise ValueError(f"Unknown filter: {Filter}")
373        
374    kmin = None
375    kmax = None
376    
377    #-----------------------------------------------------------------------------------------------
378    
379    if Verbose: 
380        total_start = time.perf_counter()
381        print(f'\nSmoothing the density field over the radial distance scales...\n')
382
383    #-----------------------------------------------------------------------------------------------
384
385    #Initializing the dictionary
386    SmoothedFieldDict = {}
387
388    #-----------------------------------------------------------------------------------------------
389    
390    #Looping over the nearest neighbour indices as inferred from the length of 'bins'
391    for i in range(len(bins)):
392
393        #-------------------------------------------------------------------------------------------
394        
395        if Verbose: start = time.perf_counter()
396
397        #-------------------------------------------------------------------------------------------
398        
399        for j, R in enumerate(bins[i]):
400            SmoothedFieldDict[str(R)] = \
401            smoothing_3D(field, Filter, grid, BoxSize, R, kmin, kmax, thickness, Verbose)
402
403    #-----------------------------------------------------------------------------------------------
404    
405    if Verbose: print('\nTotal time taken for all scales: {:.2e} s.'.format(time.perf_counter()-total_start))
406
407    return SmoothedFieldDict
408
409####################################################################################################
410
411def CIC_3D_Interp(pos, field, Boxsize):
412    r'''
413    Interpolates a 3D field onto particle positions using Cloud-In-Cell (CIC) interpolation.
414
415    Parameters
416    ----------
417    field : numpy.ndarray of shape ``(Ng, Ng, Ng)``
418        The 3D scalar field defined on a cubic grid with resolution 'Ng^3'.
419
420    pos : numpy.ndarray of shape ``(Np, 3)``
421        The positions of 'Np' particles. The columns represent x, y, and z coordinates. Units in Mpc/h
422    
423    Boxsize: float
424            The side length of the cubic volume in the same units as `pos`.
425    Returns
426    -------
427    fieldI : numpy.ndarray of shape ``(Np,)``
428        The interpolated field values at the given particle positions.
429    '''
430    
431    #-----------------------------------------------------------------------------------------------
432
433    # define the array containing the value of the density field at positions pos
434    density_interpolated = np.zeros(pos.shape[0], dtype=np.float32)
435
436    #-----------------------------------------------------------------------------------------------
437
438    # find the value of the density field at the positions pos
439    MASL.CIC_interp(field, Boxsize, pos, density_interpolated)
440
441    #-----------------------------------------------------------------------------------------------
442
443    return density_interpolated
444
445####################################################################################################
446
447def kNN_excess_cross_corr(auto_cdf_list_1, auto_cdf_list_2, joint_cdf_list, k1_k2_list=None):
448    r'''
449    Computes the excess spatial cross-correlation (Banerjee & Abel 2023)[^1] between two tracers (discrete or continuous) from their joint kNN distributions (`joint_cdf_list`) and their respective kNN-CDFs (`auto_cdf_list_1`, `auto_cdf_list_2`).
450
451    Parameters
452    ----------
453    auto_cdf_list_1 : list of numpy float array
454        auto kNN-CDFs of the first set of tracers. If `k1_k2_list` is not ``None``, The $i^{th}$ element should be the $k_1^i$NN-CDF if the $i^{th}$ element of `k1_k2_list` is ($k_1^i$, $k_2^i$).
455    auto_cdf_list_2 : list of numpy float array
456        auto kNN-CDFs of the second set of tracers. If `k1_k2_list` is not ``None``, The $i^{th}$ element should be the $k_2^i$NN-CDF if where the $i^{th}$ element of `k1_k2_list` is ($k_1^i$, $k_2^i$).
457    joint_cdf_list : list of numpy float array
458        joint kNN distributions of the two tracer sets. If `k1_k2_list` is not ``None``, The $i^{th}$ element should be the joint {$k_1^i$, $k_2^i$}NN-CDF, where the $i^{th}$ element of `k1_k2_list` is ($k_1^i$, $k_2^i$).
459        
460    k1_k2_list : list of int tuples
461        describes the kind of cross-correlations being computed (see notes for more details), by default `None`. Should be not None only if dealing with tracer-tracer cross-correlations
462
463    Returns
464    -------
465    psi_list : list of numpy float array
466        excess spatial cross-correlation between the two tracer sets.
467
468    Raises
469    ------
470    ValueError
471        if `k1_k2_list` is not `None` and `len(joint_cdf_list)!=len(k1_k2_list)`
472    ValueError
473        if `k1_k2_list` is `None` and `len(joint_cdf_list)!=len(auto_cdf_list_1) or len(joint_cdf_list)!=len(auto_cdf_list_2)`
474
475    Notes
476    -----
477    The parameter `k1_k2_list` describes the kind of cross-correlations being computed. It should be set to `None` for every scenario other than tracer-tracer cross-correlation, in which case it should provide the combinations of NN indices for the two tracers sets being cross-correlated. 
478    
479    For example, if you wish to compute the excess cross correlation for the joint {1,1}, {1,2} and {2,1}NN-CDFs, then set
480            
481        k1_k2_list = [(1,1), (1,2), (2,1)]
482
483    Note that the inputs must be self-consistent, which means the following must be ``True``
484
485        len(joint_cdf_list)==len(auto_cdf_list_1) and len(joint_cdf_list)==len(auto_cdf_list_2) and len(joint_cdf_list)==len(k1_k2_list)
486        
487    For example, if
488
489        k1_k2_list = [(1,1), (1,2)]
490
491    then
492        
493        len(auto_cdf_list_1) == 2 and len(auto_cdf_list_2) == 2 and len(joint_cdf_list) == 2
494
495    must hold, and the first (second) element of `joint_cdf_list` should be the joint {1,1}NN-CDF ({1,2}NN-CDF).
496        
497    If `None` is passed for tracer-tracer cross-correlations, the correlations are assumed to be between the same NN indices (eg. {1,1}NN-CDF, {2,2}NN-CDF).
498
499    References
500    ----------
501    [^1]: Arka Banerjee, Tom Abel, Tracer-field cross-correlations with k-nearest neighbour   distributions, [Monthly Notices of the Royal Astronomical Society](https://doi.org/10.1093/mnras/stac3813), Volume 519, Issue 4, March 2023, Pages 4856–4868
502    '''
503
504    #-------------------------------------------------------------------------------------------
505
506    psi_list = []
507
508    #-------------------------------------------------------------------------------------------
509    
510    #Check for consistency:
511    if k1_k2_list:
512        if len(joint_cdf_list)!=len(k1_k2_list) or len(joint_cdf_list)!=len(auto_cdf_list_1) or len(joint_cdf_list)!=len(auto_cdf_list_2): 
513            raise ValueError('Inconsistent input shapes')
514    else:
515        if len(joint_cdf_list)!=len(auto_cdf_list_1) or len(joint_cdf_list)!=len(auto_cdf_list_2): 
516            raise ValueError('Inconsistent input: shapes not compatible with each other')
517    
518    for k in range(len(joint_cdf_list)):
519        psi_list.append(joint_cdf_list[k]/(auto_cdf_list_1[k]*auto_cdf_list_2[k]))
520
521    return psi_list
522        
523####################################################################################################
524
525#----------------------------------------  END OF PROGRAM!  ----------------------------------------
526
527####################################################################################################

def cdf_vol_knn(vol): View Source

20def cdf_vol_knn(vol):
21    r'''
22    Returns interpolating functions for empirical CDFs of the given $k$-nearest neighbour distances.
23    
24    Parameters
25    ----------
26    vol : numpy float array of shape ``(n_query, n_kNN)``
27        Sorted array of nearest neighbour distances, where 'n_query' is the number of query points and 'n_kNN' is the number of nearest neighbours queried.
28
29    Returns
30    -------
31    cdf: list of function objects
32        list of interpolated empirical CDF functions that can be evaluated at desired distance bins.
33    '''
34    
35    #-----------------------------------------------------------------------------------------------
36
37    #Initialising a list to contain the interpolating functions
38    cdf = []
39
40    #-----------------------------------------------------------------------------------------------
41
42    #Inferring the number of query points and nearest neighbours
43    n = vol.shape[0]
44    l = vol.shape[1]
45
46    #-----------------------------------------------------------------------------------------------
47
48    #Calculating the empirical CDF
49    gof = ((np.arange(0, n) + 1) / (n*1.0))
50    for c in range(l):
51        ind = np.argsort(vol[:, c])
52        s_vol= vol[ind, c]
53        #Calculating the interpolating function
54        cdf.append(interpolate.interp1d(s_vol, gof, kind='linear', bounds_error=False))
55        
56    return cdf

Returns interpolating functions for empirical CDFs of the given $k$-nearest neighbour distances.

Parameters

vol (numpy float array of shape (n_query, n_kNN)): Sorted array of nearest neighbour distances, where 'n_query' is the number of query points and 'n_kNN' is the number of nearest neighbours queried.

Returns

cdf (list of function objects): list of interpolated empirical CDF functions that can be evaluated at desired distance bins.

def calc_kNN_CDF(vol, bins): View Source

 60def calc_kNN_CDF(vol, bins):
 61    r'''
 62    Returns the kNN-CDFs for the given nearest-neighbour distances, evaluated at the given distance bins.
 63
 64    Parameters
 65    ----------
 66    vol : numpy float array of shape ``(n_query, n_kNN)``
 67        2D array containing sorted 1D arrays of nearest-neighbour distances, where 'n_query' is the number of query points and 'n_kNN' is the number of nearest-neighbours queried. `vol[:, i]` should be the array with the sorted $k_i^{th}$ nearest-neighbour distances.
 68    bins : list of numpy float array
 69        list of distance scale arrays at which the CDFs need to be evaluated (units must be same as in `vol`).
 70
 71    Returns
 72    -------
 73    data : list of numpy float array
 74        kNN-CDFs evaluated at the desired distance bins. ``data[i]`` is the $k_i$NN-CDF if ``vol[:, i]`` containts the $k_i^{th}$ nearest-neigbour distances.
 75    '''
 76
 77    #-----------------------------------------------------------------------------------------------
 78
 79    #Initialising the list of kNN-CDFs
 80    data = []
 81
 82    #-----------------------------------------------------------------------------------------------
 83
 84    #Computing the interpolated empirical CDFs using the nearest-neighbour distances
 85    cdfs = cdf_vol_knn(vol)
 86
 87    #-----------------------------------------------------------------------------------------------
 88
 89    #Looping over the nearest-neighbour indices
 90    for i in range(vol.shape[1]):
 91
 92        #-------------------------------------------------------------------------------------------
 93
 94        #Finding the minimum and maximum values of the NN distances
 95        min_dist = np.min(vol[:, i])
 96        max_dist = np.max(vol[:, i])
 97
 98        #-------------------------------------------------------------------------------------------
 99
100        #Finding if any of the user-input bins lie outside the range spanned by the NN distances
101        bin_mask = np.searchsorted(bins[i], [min_dist, max_dist])
102        if bin_mask[1]!=len(bins[i]):
103            if bins[i][bin_mask[1]] == max_dist:
104                bin_mask[1] += 1
105
106        #-------------------------------------------------------------------------------------------
107                
108        NNcdf = np.zeros(len(bins[i]))
109        
110        #Setting the value of the CDFs at scales smaller than the smallest NN distance to 0
111        NNcdf[:bin_mask[0]] = 0
112        
113        NNcdf[bin_mask[0]:bin_mask[1]] = cdfs[i](bins[i][bin_mask[0]:bin_mask[1]])
114        
115        #Setting the value of the CDFs at scales larger than the largest NN distance to 1
116        NNcdf[bin_mask[1]:] = 1
117        
118        data.append(NNcdf)
119        
120    return data

Returns the kNN-CDFs for the given nearest-neighbour distances, evaluated at the given distance bins.

Parameters

vol (numpy float array of shape (n_query, n_kNN)): 2D array containing sorted 1D arrays of nearest-neighbour distances, where 'n_query' is the number of query points and 'n_kNN' is the number of nearest-neighbours queried. vol[:, i] should be the array with the sorted $k_i^{th}$ nearest-neighbour distances.
bins (list of numpy float array): list of distance scale arrays at which the CDFs need to be evaluated (units must be same as in vol).

Returns

data (list of numpy float array): kNN-CDFs evaluated at the desired distance bins. data[i] is the $k_i$NN-CDF if vol[:, i] containts the $k_i^{th}$ nearest-neigbour distances.

def create_query_3D(query_type, query_grid, BoxSize): View Source

124def create_query_3D(query_type, query_grid, BoxSize):
125    '''
126    Generates an array of query points; can be either randomly drawn from a uniform distribution defined over the box or put on a uniform grid.
127
128    Parameters
129    ----------
130    query_type : {'grid', 'random'}, str
131        the type of query points to be generated; should be 'grid' for query points defined on a uniform grid and 'random' for query points drawn from a uniform random distribution.
132    query_grid : int
133        the 1D size of the query points array; the total number of query points generated will be ``query_grid**3``.
134    BoxSize : float
135        the size of the 3D box of the input density field, in Mpc/h. Must be a positive real number, and must not be ``np.inf`` or ``np.nan``.
136
137    Returns
138    -------
139    query_pos : numpy float array of shape ``(query_grid**3, 3)``
140        array of query point positions. For each query point in the array, the first, second and third entries are the x, y and z coordinates respectively, in Mpc/h.
141
142    Raises
143    ------
144    ValueError
145        if `BoxSize` is not a positive real number less than infinity.
146    ValueError
147        if an unknown query type is provided.
148        
149    See Also
150    --------
151    kNNpy.HelperFunctions.create_query_2DA : generates query points in 2D angular coordinates.
152    '''
153
154    #Validating inputs
155
156    if np.isnan(BoxSize) or np.isinf(BoxSize) or BoxSize<=0.0:
157        raise ValueError(f"Invalid box size: {BoxSize}; please provide a positive real number less than infinity!")
158
159    #Creating the query points
160
161    if query_type == 'grid':
162
163        #Creating a grid of query points
164        x_ = np.linspace(0., BoxSize, query_grid, endpoint=False)
165        y_ = np.linspace(0., BoxSize, query_grid, endpoint=False)
166        z_ = np.linspace(0., BoxSize, query_grid, endpoint=False)
167
168        x, y, z = np.array(np.meshgrid(x_, y_, z_, indexing='xy'))
169
170        query_pos = np.zeros((query_grid**3, 3))
171        query_pos[:, 0] = np.reshape(x, query_grid**3)
172        query_pos[:, 1] = np.reshape(y, query_grid**3)
173        query_pos[:, 2] = np.reshape(z, query_grid**3)
174
175    elif query_type == 'random':
176
177        #Creating a set of randomly distributed query points
178        query_pos = np.random.rand(query_grid**3, 3)*BoxSize
179
180    else:   
181        raise ValueError(f"Unknown query type: {query_type}; please provide a valid query type")
182    
183    return query_pos

Generates an array of query points; can be either randomly drawn from a uniform distribution defined over the box or put on a uniform grid.

Parameters

query_type ({'grid', 'random'}, str): the type of query points to be generated; should be 'grid' for query points defined on a uniform grid and 'random' for query points drawn from a uniform random distribution.
query_grid (int): the 1D size of the query points array; the total number of query points generated will be query_grid**3.
BoxSize (float): the size of the 3D box of the input density field, in Mpc/h. Must be a positive real number, and must not be np.inf or np.nan.

Returns

query_pos (numpy float array of shape (query_grid**3, 3)): array of query point positions. For each query point in the array, the first, second and third entries are the x, y and z coordinates respectively, in Mpc/h.

Raises

ValueError: if BoxSize is not a positive real number less than infinity.
ValueError: if an unknown query type is provided.

Parameters

field (numpy float array): the 3D array of the continuous field that needs to be smoothed.
Filter (string): the filter to be used for smoothing. 'Top-Hat', 'Gaussian', 'Shell' are for real space, and 'Top-Hat-k' is a top-hat filter in k-space.
grid (int): the grid size of the input density field, which should be field.shape[0] assuming a cubical box.
BoxSize (float): the size of the 3D box of the input density field, in Mpc/h.
R (float, optional): radial scale (in Mpc/h) at which the field is to be smoothed. Only use this parameter for real space smoothing.
kmin (float, optional): the minimum value of the wavenumber. Only use this parameter when 'Top-Hat-k' filter is used.
kmax (float, optional): the maximum value of the wavenumber. Only use this parameter when 'Top-Hat-k' filter is used.
thickness (float, optional): the thickness of the shell used for smoothing. Only use this parameter when 'Shell' filter is used. The smoothing is done using a shell with inner radius R-thickness/2 and outer radius R+thickness/2.
Verbose (bool, optional): if set to True, the time taken to complete each step of the calculation will be printed, by default False.

Returns

smoothed_field (numpy float array of shape field.shape): the smoothed field.

Raises

ValueError: If required parameters (like R, kmin, kmax, or thickness) are missing for the specified filter type.
ValueError: If the input field dimensions do not form a cubical box.
ValueError: If the grid size does not match the field dimensions.
ValueError: If an unknown filter name is provided.

Notes

For real-space filters ('Top-Hat', 'Gaussian', 'Shell'), the radial scale R must be specified.
For the 'Shell' filter, thickness must also be specified.
For the 'Top-Hat-k' filter in Fourier space, kmin and kmax must be specified, while R and thickness are ignored.
Any unused parameters will trigger warnings but not stop execution.

def create_smoothed_field_dict_3D(field, Filter, grid, BoxSize, bins, thickness=None, Verbose=False): View Source

324def create_smoothed_field_dict_3D(field, Filter, grid, BoxSize, bins, thickness=None, Verbose=False):
325    r'''
326    Creates a dictionary containing the continuous field smoothed at various radial distance scales.
327
328    Parameters
329    ----------
330    field : numpy float array
331        the 3D array of the continuous field that needs to be smoothed. 
332    Filter : string
333        the filter to be used for smoothing. Valid filter types are: 'Top-Hat', 'Gaussian', 'Shell'. 
334    grid : int
335        the grid size of the input density field, which should be field.shape[0] assuming a cubical box.
336    BoxSize : float
337        the size of the 3D box of the input density field, in Mpc/h.
338    bins : list of numpy float array
339        list of distances for each nearest neighbour. The $i^{th}$ element of the list should contain a numpy array of the desired distance scales for the $k_i^{th}$ nearest neighbour.
340    thickness : float, optional
341        the thickness of the shell used for smoothing. Only use this parameter when 'Shell' filter is used. The smoothing is done using a shell with inner radius R-thickness/2 and outer radius R+thickness/2.
342    Verbose : bool, optional
343        if set to True, the time taken to complete each step of the calculation will be printed, by default False.
344
345    Returns
346    -------
347    SmoothedFieldDict : dict
348        dictionary containing the continuous field smoothed at various radial distance scales. For example, `SmoothedFieldDict['50.0']`  represents the continuous map smoothed at a scale of 50 Mpc/h.
349
350    Raises
351    ------
352    ValueError
353        If required parameters (like bins or thickness) are missing for the specified filter type.
354    ValueError
355        If the input field dimensions do not form a cubical box.
356    ValueError
357        If the grid size does not match the field dimensions.
358    ValueError
359        If an unknown filter name is provided.
360
361    Notes
362    -----
363    - This function only works for the real space filters, so 'Top-Hat-k' is not a valid filter for this function.
364    - For the 'Shell' filter, thickness must be specified.
365    - Any unused parameters will trigger warnings but not stop execution.
366    '''
367    
368    #-----------------------------------------------------------------------------------------------
369    
370    # This function is only for smoothing in real space
371    
372    if Filter == 'Top-Hat-k':
373        raise ValueError(f"Unknown filter: {Filter}")
374        
375    kmin = None
376    kmax = None
377    
378    #-----------------------------------------------------------------------------------------------
379    
380    if Verbose: 
381        total_start = time.perf_counter()
382        print(f'\nSmoothing the density field over the radial distance scales...\n')
383
384    #-----------------------------------------------------------------------------------------------
385
386    #Initializing the dictionary
387    SmoothedFieldDict = {}
388
389    #-----------------------------------------------------------------------------------------------
390    
391    #Looping over the nearest neighbour indices as inferred from the length of 'bins'
392    for i in range(len(bins)):
393
394        #-------------------------------------------------------------------------------------------
395        
396        if Verbose: start = time.perf_counter()
397
398        #-------------------------------------------------------------------------------------------
399        
400        for j, R in enumerate(bins[i]):
401            SmoothedFieldDict[str(R)] = \
402            smoothing_3D(field, Filter, grid, BoxSize, R, kmin, kmax, thickness, Verbose)
403
404    #-----------------------------------------------------------------------------------------------
405    
406    if Verbose: print('\nTotal time taken for all scales: {:.2e} s.'.format(time.perf_counter()-total_start))
407
408    return SmoothedFieldDict

Creates a dictionary containing the continuous field smoothed at various radial distance scales.

Parameters

field (numpy float array): the 3D array of the continuous field that needs to be smoothed.
Filter (string): the filter to be used for smoothing. Valid filter types are: 'Top-Hat', 'Gaussian', 'Shell'.
grid (int): the grid size of the input density field, which should be field.shape[0] assuming a cubical box.
BoxSize (float): the size of the 3D box of the input density field, in Mpc/h.
bins (list of numpy float array): list of distances for each nearest neighbour. The $i^{th}$ element of the list should contain a numpy array of the desired distance scales for the $k_i^{th}$ nearest neighbour.
thickness (float, optional): the thickness of the shell used for smoothing. Only use this parameter when 'Shell' filter is used. The smoothing is done using a shell with inner radius R-thickness/2 and outer radius R+thickness/2.
Verbose (bool, optional): if set to True, the time taken to complete each step of the calculation will be printed, by default False.

Returns

SmoothedFieldDict (dict): dictionary containing the continuous field smoothed at various radial distance scales. For example, SmoothedFieldDict['50.0'] represents the continuous map smoothed at a scale of 50 Mpc/h.

Raises

ValueError: If required parameters (like bins or thickness) are missing for the specified filter type.
ValueError: If the input field dimensions do not form a cubical box.
ValueError: If the grid size does not match the field dimensions.
ValueError: If an unknown filter name is provided.

Notes

This function only works for the real space filters, so 'Top-Hat-k' is not a valid filter for this function.
For the 'Shell' filter, thickness must be specified.
Any unused parameters will trigger warnings but not stop execution.

def CIC_3D_Interp(pos, field, Boxsize): View Source

412def CIC_3D_Interp(pos, field, Boxsize):
413    r'''
414    Interpolates a 3D field onto particle positions using Cloud-In-Cell (CIC) interpolation.
415
416    Parameters
417    ----------
418    field : numpy.ndarray of shape ``(Ng, Ng, Ng)``
419        The 3D scalar field defined on a cubic grid with resolution 'Ng^3'.
420
421    pos : numpy.ndarray of shape ``(Np, 3)``
422        The positions of 'Np' particles. The columns represent x, y, and z coordinates. Units in Mpc/h
423    
424    Boxsize: float
425            The side length of the cubic volume in the same units as `pos`.
426    Returns
427    -------
428    fieldI : numpy.ndarray of shape ``(Np,)``
429        The interpolated field values at the given particle positions.
430    '''
431    
432    #-----------------------------------------------------------------------------------------------
433
434    # define the array containing the value of the density field at positions pos
435    density_interpolated = np.zeros(pos.shape[0], dtype=np.float32)
436
437    #-----------------------------------------------------------------------------------------------
438
439    # find the value of the density field at the positions pos
440    MASL.CIC_interp(field, Boxsize, pos, density_interpolated)
441
442    #-----------------------------------------------------------------------------------------------
443
444    return density_interpolated

Interpolates a 3D field onto particle positions using Cloud-In-Cell (CIC) interpolation.

Parameters

field (numpy.ndarray of shape (Ng, Ng, Ng)): The 3D scalar field defined on a cubic grid with resolution 'Ng^3'.
pos (numpy.ndarray of shape (Np, 3)): The positions of 'Np' particles. The columns represent x, y, and z coordinates. Units in Mpc/h
Boxsize (float): The side length of the cubic volume in the same units as pos.

Returns

fieldI (numpy.ndarray of shape (Np,)): The interpolated field values at the given particle positions.

def kNN_excess_cross_corr(auto_cdf_list_1, auto_cdf_list_2, joint_cdf_list, k1_k2_list=None): View Source

448def kNN_excess_cross_corr(auto_cdf_list_1, auto_cdf_list_2, joint_cdf_list, k1_k2_list=None):
449    r'''
450    Computes the excess spatial cross-correlation (Banerjee & Abel 2023)[^1] between two tracers (discrete or continuous) from their joint kNN distributions (`joint_cdf_list`) and their respective kNN-CDFs (`auto_cdf_list_1`, `auto_cdf_list_2`).
451
452    Parameters
453    ----------
454    auto_cdf_list_1 : list of numpy float array
455        auto kNN-CDFs of the first set of tracers. If `k1_k2_list` is not ``None``, The $i^{th}$ element should be the $k_1^i$NN-CDF if the $i^{th}$ element of `k1_k2_list` is ($k_1^i$, $k_2^i$).
456    auto_cdf_list_2 : list of numpy float array
457        auto kNN-CDFs of the second set of tracers. If `k1_k2_list` is not ``None``, The $i^{th}$ element should be the $k_2^i$NN-CDF if where the $i^{th}$ element of `k1_k2_list` is ($k_1^i$, $k_2^i$).
458    joint_cdf_list : list of numpy float array
459        joint kNN distributions of the two tracer sets. If `k1_k2_list` is not ``None``, The $i^{th}$ element should be the joint {$k_1^i$, $k_2^i$}NN-CDF, where the $i^{th}$ element of `k1_k2_list` is ($k_1^i$, $k_2^i$).
460        
461    k1_k2_list : list of int tuples
462        describes the kind of cross-correlations being computed (see notes for more details), by default `None`. Should be not None only if dealing with tracer-tracer cross-correlations
463
464    Returns
465    -------
466    psi_list : list of numpy float array
467        excess spatial cross-correlation between the two tracer sets.
468
469    Raises
470    ------
471    ValueError
472        if `k1_k2_list` is not `None` and `len(joint_cdf_list)!=len(k1_k2_list)`
473    ValueError
474        if `k1_k2_list` is `None` and `len(joint_cdf_list)!=len(auto_cdf_list_1) or len(joint_cdf_list)!=len(auto_cdf_list_2)`
475
476    Notes
477    -----
478    The parameter `k1_k2_list` describes the kind of cross-correlations being computed. It should be set to `None` for every scenario other than tracer-tracer cross-correlation, in which case it should provide the combinations of NN indices for the two tracers sets being cross-correlated. 
479    
480    For example, if you wish to compute the excess cross correlation for the joint {1,1}, {1,2} and {2,1}NN-CDFs, then set
481            
482        k1_k2_list = [(1,1), (1,2), (2,1)]
483
484    Note that the inputs must be self-consistent, which means the following must be ``True``
485
486        len(joint_cdf_list)==len(auto_cdf_list_1) and len(joint_cdf_list)==len(auto_cdf_list_2) and len(joint_cdf_list)==len(k1_k2_list)
487        
488    For example, if
489
490        k1_k2_list = [(1,1), (1,2)]
491
492    then
493        
494        len(auto_cdf_list_1) == 2 and len(auto_cdf_list_2) == 2 and len(joint_cdf_list) == 2
495
496    must hold, and the first (second) element of `joint_cdf_list` should be the joint {1,1}NN-CDF ({1,2}NN-CDF).
497        
498    If `None` is passed for tracer-tracer cross-correlations, the correlations are assumed to be between the same NN indices (eg. {1,1}NN-CDF, {2,2}NN-CDF).
499
500    References
501    ----------
502    [^1]: Arka Banerjee, Tom Abel, Tracer-field cross-correlations with k-nearest neighbour   distributions, [Monthly Notices of the Royal Astronomical Society](https://doi.org/10.1093/mnras/stac3813), Volume 519, Issue 4, March 2023, Pages 4856–4868
503    '''
504
505    #-------------------------------------------------------------------------------------------
506
507    psi_list = []
508
509    #-------------------------------------------------------------------------------------------
510    
511    #Check for consistency:
512    if k1_k2_list:
513        if len(joint_cdf_list)!=len(k1_k2_list) or len(joint_cdf_list)!=len(auto_cdf_list_1) or len(joint_cdf_list)!=len(auto_cdf_list_2): 
514            raise ValueError('Inconsistent input shapes')
515    else:
516        if len(joint_cdf_list)!=len(auto_cdf_list_1) or len(joint_cdf_list)!=len(auto_cdf_list_2): 
517            raise ValueError('Inconsistent input: shapes not compatible with each other')
518    
519    for k in range(len(joint_cdf_list)):
520        psi_list.append(joint_cdf_list[k]/(auto_cdf_list_1[k]*auto_cdf_list_2[k]))
521
522    return psi_list

Computes the excess spatial cross-correlation (Banerjee & Abel 2023)¹ between two tracers (discrete or continuous) from their joint kNN distributions (joint_cdf_list) and their respective kNN-CDFs (auto_cdf_list_1, auto_cdf_list_2).

Parameters

auto_cdf_list_1 (list of numpy float array): auto kNN-CDFs of the first set of tracers. If k1_k2_list is not None, The $i^{th}$ element should be the $k_1^i$NN-CDF if the $i^{th}$ element of k1_k2_list is ($k_1^i$, $k_2^i$).
auto_cdf_list_2 (list of numpy float array): auto kNN-CDFs of the second set of tracers. If k1_k2_list is not None, The $i^{th}$ element should be the $k_2^i$NN-CDF if where the $i^{th}$ element of k1_k2_list is ($k_1^i$, $k_2^i$).
joint_cdf_list (list of numpy float array): joint kNN distributions of the two tracer sets. If k1_k2_list is not None, The $i^{th}$ element should be the joint {$k_1^i$, $k_2^i$}NN-CDF, where the $i^{th}$ element of k1_k2_list is ($k_1^i$, $k_2^i$).
k1_k2_list (list of int tuples): describes the kind of cross-correlations being computed (see notes for more details), by default None. Should be not None only if dealing with tracer-tracer cross-correlations

Returns

psi_list (list of numpy float array): excess spatial cross-correlation between the two tracer sets.

Raises

ValueError: if k1_k2_list is not None and len(joint_cdf_list)!=len(k1_k2_list)
ValueError: if k1_k2_list is None and len(joint_cdf_list)!=len(auto_cdf_list_1) or len(joint_cdf_list)!=len(auto_cdf_list_2)

Notes

The parameter k1_k2_list describes the kind of cross-correlations being computed. It should be set to None for every scenario other than tracer-tracer cross-correlation, in which case it should provide the combinations of NN indices for the two tracers sets being cross-correlated.

For example, if you wish to compute the excess cross correlation for the joint {1,1}, {1,2} and {2,1}NN-CDFs, then set

k1_k2_list = [(1,1), (1,2), (2,1)]

Note that the inputs must be self-consistent, which means the following must be True

len(joint_cdf_list)==len(auto_cdf_list_1) and len(joint_cdf_list)==len(auto_cdf_list_2) and len(joint_cdf_list)==len(k1_k2_list)

For example, if

k1_k2_list = [(1,1), (1,2)]

then

len(auto_cdf_list_1) == 2 and len(auto_cdf_list_2) == 2 and len(joint_cdf_list) == 2

must hold, and the first (second) element of joint_cdf_list should be the joint {1,1}NN-CDF ({1,2}NN-CDF).

If None is passed for tracer-tracer cross-correlations, the correlations are assumed to be between the same NN indices (eg. {1,1}NN-CDF, {2,2}NN-CDF).

References

Arka Banerjee, Tom Abel, Tracer-field cross-correlations with k-nearest neighbour distributions, Monthly Notices of the Royal Astronomical Society, Volume 519, Issue 4, March 2023, Pages 4856–4868 ↩