ism3d.maths package

Submodules

ism3d.maths.stats module

ism3d.maths.stats.cdf2rv(x, cdf, size=100000, seed=None)[source]

* only here for backward compatibility *

Generate a pseudo-random variable set following a target distribution described by the CDF, using the inverse transform / pseudo-random number sampling method

The input CDF is presumed to be already sorted and monotonically increasing.

The result should stay within the range of cdf_x

For backward compatibility, make sure that cdf=0 at x[0] and cdf=1 at x[-1].

icdf=interpolate.interp1d(y_cdf,x,kind='linear')
# icdf: inverse cumulative distribution function
# a.k.a. percent point function (inverse of the CDF)
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous
# It seems that scipy.interpolate messes up units when x.type is QUANTITY
return icdf(pick)
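The fragment above can be read as a self-contained sketch (the function name `cdf2rv_sketch` and the demo distribution, a pdf proportional to x on [0,1], are illustrative assumptions, not the module's actual code):

```python
import numpy as np
from scipy import interpolate

def cdf2rv_sketch(x, cdf, size=100000, seed=None):
    # assumes cdf is monotonically increasing with cdf[0]=0 and cdf[-1]=1
    rng = np.random.default_rng(seed)
    # icdf: inverse CDF, a.k.a. percent point function (PPF)
    icdf = interpolate.interp1d(cdf, x, kind='linear')
    pick = rng.uniform(size=size)   # uniform draws in [0, 1)
    return icdf(pick)               # samples stay within [x[0], x[-1]]

# demo: pdf proportional to x on [0, 1], so cdf = x**2;
# the sample mean should approach 2/3
xg = np.linspace(0.0, 1.0, 101)
samples = cdf2rv_sketch(xg, xg**2, size=10000, seed=1)
```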

ism3d.maths.stats.custom_pdf(func, x, sersic_n=1)[source]
note:

the pdf values are not normalized: int(pdf*dx) is not one!

Although we could use scipy.stats, we write customized code here to improve performance.

ism3d.maths.stats.custom_ppf(func, q, sersic_n=1)[source]

* we also wrap the random generator into an rv_continuous class * but this inline function may have better performance without the error-checking overhead of the rv_continuous class

To build a fast pseudo-random generator for an analytical function, the key is to calculate the inverse cumulative distribution function, or percent point function (PPF), to high precision. This gives the parameter realization values as a function of the CDF.

Sometimes this function will have an analytical form, which will be good in terms of performance and precision.

The implementation should support vector input. Although this is 1D, any fancy function can be further manipulated.

https://reference.wolfram.com/language/ref/InverseCDF.html

Calculate the PPF from a percent cutoff for common (regularized/normalized) PDFs; this can be used for inverse transform sampling.

If “u” is a uniform random number from (0,1), then the output will be a sample following the desired distribution.

see also the appendix of Xue+2020

see also: inverseCDF.nb in MM12

If “-ss” is in the function name, we will use the scipy.stats built-in RV class

  • typically, using scipy.special will be faster

to use this function as an inverse survival function (ISF), try:

sp = 1 - q
x = custom_ppf(func, sp)

sersic_n only works for func=”sersic2d”

The interpolation sample should stay in:

x <-> SF or x <->log(SF) space

As it’s difficult to keep numerical accuracy in an x <-> CDF table for large x where the CDF is close to one: when you set up an x grid, for large x the corresponding CDF will round to one due to finite numerical precision.
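A minimal numerical illustration of why the table is kept in x <-> SF space (a unit 1D exponential is assumed here purely for simplicity; the module's own profiles differ):

```python
import numpy as np

x = 40.0
sf = np.exp(-x)    # survival function of a unit exponential, ~4.2e-18
cdf = 1.0 - sf     # collapses to exactly 1.0 in float64 (eps ~ 2.2e-16)
print(cdf == 1.0)  # True: the information about x is lost in CDF space
print(-np.log(sf)) # ~40.0: the SF value still recovers x accurately
```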

ism3d.maths.stats.custom_rvs(func, size=100000, scale=1.0, interp=(0.02, 20), sersic_n=1, seed=None)[source]

https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.RandomState.html#numpy.random.RandomState

use_nprng==True: use the numpy built-in random generator rather than the hardcoded uniform + ITS formula

more PPF/RVS implementations can be found here:

https://github.com/scipy/scipy/blob/master/scipy/stats/_continuous_distns.py (scipy tries to use special.* functions as much as possible due to the performance gain)

performance:

scipy.stats .rvs is generally the slowest due to the wrapper overhead; a good inline ITS is usually fast and comparable to rng.dist (which still uses C under the hood)


When size is large, the PPF calculation can slow down the ITS (ur->x). If interp is specified as a two-element tuple, we will use a pre-calculated ur->x table to perform the transform. interp=(a,b); (a,b) determines the random-variable PDF accuracy, in units of “scale”:

a: resolution (the uncertainty in x should be smaller than a*scale); b: beyond b*scale, the transform is truncated and we lose the PDF accuracy in the random variable.

A smaller a or a larger b will increase the computation time.

If interp is None, the custom_ppf method is used and the computation time will increase with size; if interp is left at its default, the computing time will flatten at some point as size increases.

When using interp for the inverse transform, the achievable accuracy in x within a limited dynamic range is determined by which part of the sampling table is usable for interpolation. For one-sided or two-sided functions, this is typically x-SF, as saving the numerical value of SF can capture x changes at large x (where SF is close to zero) with good enough precision.

note: np.interp(x,xp,fp) works faster if x is sorted; however, there is the cost of sorting the random numbers

the bottleneck here is still np.interp(): https://github.com/numpy/numpy/issues/10937
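The table-based transform described above might look like the following sketch (a unit 1D exponential stands in for the module's expensive 'expon2d'/'sersic2d' cases, and all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# interp=(a, b): a sets the table resolution, b the truncation, in units of scale
a, b = 0.02, 20.0
x_grid = np.arange(0.0, b + a, a)
sf_grid = np.exp(-x_grid)   # SF of a unit exponential (monotonically decreasing)

# inverse transform via the pre-calculated table: treat u as an SF value
u = rng.uniform(size=100000)
# np.interp requires ascending xp, so feed the table reversed
x = np.interp(u, sf_grid[::-1], x_grid[::-1])
# beyond b*scale the transform is truncated: u < exp(-b) clamps to x = b
```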

func: string / built-in function formula
    dict / {'x':x,'cdf':cdf}: cdf-x is the CDF function
    dict / {'x':x,'sf':sf}: sf-x is the survival function

this feature is used for the case where the PPF of the distribution is not analytical

ism3d.maths.stats.custom_sf(func, x, sersic_n=1)[source]

The relation between x and q (i.e. the CDF) is used to transform uniform standard random variables into the desired sampling for a specified PDF. If .ppf can be written in an analytical form, we can call functions to calculate it directly. However, if special functions are involved, the computation cost can be high and a pre-calculated transform table can be used.

To keep numerical precision and meet the sampling gridding error requirement (that is, the sampling transfer error in X_{i} is smaller than the pixel size / bin size), it's easier to build the table from a regular oversampled grid in x and then calculate the corresponding p value (within 0~1). Of course the regular oversampled x values cannot go to +infinity, and an upper limit should be given:

e.g. x_grid=np.linspace(0,100,10000) may be good enough for “expon2d”;

this means the transferred x from interpolation should have error(x) <= 0.01 Re (as we use linear interpolation to replace the actual curve) within 100 Re (more than enough for our modeling accuracy)
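Under the assumption that 'expon2d' has the radial pdf p(r) proportional to r*exp(-r/Re), i.e. an axisymmetric exponential disk (check the module source; this form is an assumption for illustration), that grid and the resulting precision problem look like:

```python
import numpy as np

# assumed radial pdf for 'expon2d' (Re = 1): p(r) ~ r * exp(-r),
# which gives the survival function SF(r) = (1 + r) * exp(-r)
x_grid = np.linspace(0.0, 100.0, 10000)   # ~0.01 Re resolution out to 100 Re
sf_grid = (1.0 + x_grid) * np.exp(-x_grid)

# in CDF space the tail is unrepresentable: 1 - SF rounds to exactly 1
cdf_grid = 1.0 - sf_grid
print((cdf_grid == 1.0).any())            # True: tail information is lost
print(sf_grid[-1] > 0.0)                  # True: SF still resolves the tail
logsf_grid = np.log(sf_grid)              # store the table as x <-> log(SF)
```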

However, another numerical problem may occur: a large x may lead to a CDF very close to 1 in this case, and numerical accuracy may degrade so that it rounds to exactly 1. See the note on _logsf at https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html and test_custom_ppf_learn(): x=100 will lead to q=1 exactly, rather than 1-q being a small nonzero value.

The solution is to calculate sf=1-q=1-CDF, or logsf=log(1-q)=log(1-CDF); then the small deviation from 1 can still be accurately represented numerically. We can then use an x vs. logsf table to transform ur to the x-domain, accurately enough for gridding purposes.

The computer is good at saving a small number close to zero, rather than a number very close to one.

https://scicomp.stackexchange.com/questions/20629/when-should-log1p-and-expm1-be-used
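A quick demonstration of the same point with the functions from that link (pure numpy):

```python
import numpy as np

q = 1e-18               # deviation far below float64 eps (~2.2e-16)
print(np.log(1.0 + q))  # 0.0: the deviation is lost before log is taken
print(np.log1p(q))      # 1e-18: preserved
print(np.exp(q) - 1.0)  # 0.0
print(np.expm1(q))      # 1e-18
```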

note: the interpolation method is only used for transforms in which the calculation becomes too expensive

The inverse function x=isf(sf):

if sf is a uniform (0,1) random variable, then x should follow the desired distribution

Using sf instead of the CDF is for numerical-precision purposes.

We only use this to build interpolation tables for expensive cases, so only func='expon2d' and 'sersic2d' are implemented.

class ism3d.maths.stats.expon2d_gen(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, extradoc=None, seed=None)[source]

Bases: scipy.stats._distn_infrastructure.rv_continuous

Radial (rho) distribution of an axisymmetric 2D distribution with an exponential radial profile

class ism3d.maths.stats.laplace_gen(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, extradoc=None, seed=None)[source]

Bases: scipy.stats._distn_infrastructure.rv_continuous

Radial (rho) distribution of an axisymmetric 2D distribution with an exponential radial profile

class ism3d.maths.stats.norm2d_gen(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, extradoc=None, seed=None)[source]

Bases: scipy.stats._distn_infrastructure.rv_continuous

Radial (rho) distribution of an axisymmetric 2D distribution with a Gaussian radial profile

ism3d.maths.stats.pdf2rv(x, pdf, size=100000, seed=None)[source]

Generate a pseudo-random variable set following a target distribution described by the PDF, using the inverse transform / pseudo-random number sampling method

The result should stay within the range of pdf_x

ism3d.maths.stats.pdf2rv_nd(pdf, size=100000, sort=False, interp=True, seed=None)[source]

Generate a pseudo-random sample (approximately) following a target PDF specified by an n-dimensional array. The inverse transform sampling approach is used without expensive MC. See more details here:

Note:
  • An over-sampled PDF grid (with pdf_sort/pdf_interp=True) is preferred for accuracy.

  • For an input PDF with a shape of (ny,nx), the output will be:

    xpos=sample[0,:] ypos=sample[1,:]

    xpos and ypos will be in 0-based pixel index units (i.e. xypos=[0,0] means the first pixel center). This convention follows the FITS image format layout.
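The convention above can be sketched as follows (`pdf2rv_nd_sketch` is a hypothetical, simplified stand-in: flat-index inverse transform sampling without the sorting/interpolation refinements the real function offers):

```python
import numpy as np

def pdf2rv_nd_sketch(pdf, size=100000, seed=None):
    # flatten the n-D PDF, sample flat indices from its discrete CDF,
    # then unravel back to per-axis 0-based pixel indices
    rng = np.random.default_rng(seed)
    cdf = np.cumsum(pdf.ravel(), dtype=float)
    cdf /= cdf[-1]
    idx = np.searchsorted(cdf, rng.uniform(size=size), side='right')
    # reverse the axis order so, for a (ny,nx) input,
    # row 0 is xpos and row 1 is ypos
    return np.vstack(np.unravel_index(idx, pdf.shape)[::-1])

# delta-function PDF at pixel (y=2, x=3) on a (ny=4, nx=5) grid
pdf = np.zeros((4, 5)); pdf[2, 3] = 1.0
sample = pdf2rv_nd_sketch(pdf, size=100, seed=0)
# sample[0,:] are all 3 (xpos); sample[1,:] are all 2 (ypos)
```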

ism3d.maths.stats.rng_seeded(seed=None)[source]

return a seeded RNG, plus is_mkl; mkl_random is preferred over numpy.random

ism3d.maths.stats.sech = <scipy.stats._distn_infrastructure.rv_frozen object>

logistic = exp(-x)/(1+exp(-x))^2 = 1/(2+exp(-x)+exp(x))
norm(sech) = 2/(2+exp(-2*x)+exp(2*x))
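These two identities can be checked numerically (pure numpy; `np.cosh` provides the reference sech, and the variable names are illustrative):

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 11)
logistic = np.exp(-x) / (1.0 + np.exp(-x))**2
logistic_alt = 1.0 / (2.0 + np.exp(-x) + np.exp(x))
print(np.allclose(logistic, logistic_alt))            # True

# normalized sech^2 profile: sech(x)^2 / 2 integrates to one
sech2 = 2.0 / (2.0 + np.exp(-2.0*x) + np.exp(2.0*x))
print(np.allclose(sech2, (1.0/np.cosh(x))**2 / 2.0))  # True
```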

class ism3d.maths.stats.sechsq_gen(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, extradoc=None, seed=None)[source]

Bases: scipy.stats._distn_infrastructure.rv_continuous

Radial (rho) distribution of an axisymmetric 2D distribution with an exponential radial profile. This is a direct implementation for testing; one should use sech2 instead.

class ism3d.maths.stats.sersic2d_gen(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, extradoc=None, seed=None)[source]

Bases: scipy.stats._distn_infrastructure.rv_continuous

Radial (rho) distribution of an axisymmetric 2D distribution with a Sersic radial profile. template from:

Module contents

subpackage maths