Pycorrelate examples¶

This notebook shows howto use pycorrelate as well as comparisons with other implementations.

In [1]:

import numpy as np
import h5py

/home/docs/checkouts/readthedocs.org/user_builds/pycorrelate/conda/latest/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)

In [2]:

# Tweak here matplotlib style
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['font.sans-serif'].insert(0, 'Arial')
mpl.rcParams['font.size'] = 14
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [3]:

import pycorrelate as pyc
print('pycorrelate version: ', pyc.__version__)

pycorrelate version:  0.3+24.gf9e109e

Load Data¶

We start by downloading some timestamps data:

In [4]:

url = 'http://files.figshare.com/2182601/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5'
pyc.utils.download_file(url, save_dir='data')

URL:  http://files.figshare.com/2182601/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5
File: 0023uLRpitc_NTP_20dT_0.5GndCl.hdf5

File already on disk: data/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5
Delete it to re-download.

In [5]:

fname = './data/' + url.split('/')[-1]
h5 = h5py.File(fname)
unit = 12.5e-9

In [6]:

num_ph = int(3e6)
detectors = h5['photon_data']['detectors'][:num_ph]
timestamps = h5['photon_data']['timestamps'][:num_ph]
t = timestamps[detectors == 0]
u = timestamps[detectors == 1]

In [7]:

t.shape, u.shape, t[0], u[0]

Out[7]:

((839592,), (1844370,), 146847, 188045)

In [8]:

t.max()*unit, u.max()*unit

Out[8]:

(599.9994419125, 599.9984722875)

Timestamps need to be monotonic, let’s test it:

In [9]:

assert (np.diff(t) > 0).all()
assert (np.diff(u) > 0).all()

Log-scale bins (base 10)¶

Here we compute the cross-correlation on log10-spaced bins.

First we compute the array of lag bins using the function make_loglags:

In [10]:

# compute lags in sec. then convert to timestamp units
bins = pyc.make_loglags(-7, 0, 10, return_int=False) / unit

Then, we compute the cross-correlation using the function pcorrelate:

In [11]:

G = pyc.pcorrelate(t, u, bins)

/home/docs/checkouts/readthedocs.org/user_builds/pycorrelate/conda/latest/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)

In [12]:

fig, ax = plt.subplots(figsize=(10, 6))
plt.plot(bins*unit, np.hstack((G[:1], G)), drawstyle='steps-pre')
plt.xlabel('Time (s)')
#for x in bins[1:]: plt.axvline(x*unit, lw=0.2)  # to mark bins
plt.grid(True); plt.grid(True, which='minor', lw=0.3)
plt.xscale('log')
plt.xlim(30e-9, 2)

Out[12]:

(3e-08, 2)

../_images/notebooks_pycorrelate-examples_17_1.png

Log-scale bins (base 2)¶

Here we compute the same cross-correlation on log2-spaced bins.

First we compute the array of lag bins using the function make_loglags:

In [13]:

# compute lags directly in timestamp units
bins = pyc.make_loglags(0, 28, 1, base=2)

Then, we compute the cross-correlation using the function pcorrelate:

In [14]:

G = pyc.pcorrelate(t, u, bins)

In [15]:

fig, ax = plt.subplots(figsize=(10, 6))
plt.plot(bins*unit, np.hstack((G[:1], G)), drawstyle='steps-pre')
plt.xlabel('Time (s)')
#for x in bins[1:]: plt.axvline(x*unit, lw=0.2)  # to mark bins
plt.grid(True); plt.grid(True, which='minor', lw=0.3)
plt.xscale('log')
plt.xlim(30e-9, 2)

Out[15]:

(3e-08, 2)

../_images/notebooks_pycorrelate-examples_22_1.png

Multi-tau bins¶

Finally, we compute the cross-correlation on arbitrarly-spaced bins. Similar to the multi-tau bins, here we use constant bin size for a number of bins (n_group), then we double the bin size and we keep it constant for another n_group and so on:

In [16]:

n_group = 4
bin_widths = []
for i in range(26):
    bin_widths += [2**i]*n_group
np.array(bin_widths)
bins = np.hstack(([0], np.cumsum(bin_widths)))

Then, we compute the cross-correlation using the function pcorrelate:

In [17]:

G = pyc.pcorrelate(t, u, bins)

In [18]:

fig, ax = plt.subplots(figsize=(10, 6))
plt.plot(bins*unit, np.hstack((G[:1], G)), drawstyle='steps-pre')
plt.xlabel('Time (s)')
#for x in bins[1:]: plt.axvline(x*unit, lw=0.2)  # to mark bins
plt.grid(True); plt.grid(True, which='minor', lw=0.3)
plt.xscale('log')
plt.xlim(30e-9, 2)

Out[18]:

(3e-08, 2)

../_images/notebooks_pycorrelate-examples_27_1.png

Test: comparison with np.histogram¶

For testing alternative (slower) implementations we use smaller input arrays:

In [19]:

tt = t[:5000]
uu = u[:5000]

The algoritm implemented in pycorrelate.pcorrelate can be re-written in a very simple way using numpy.histogram:

In [20]:

# compute lags in sec. then convert to timestamp units
bins = pyc.make_loglags(-7, 0, 10, return_int=False) / unit

In [21]:

Y = np.zeros(bins.size - 1, dtype=np.int64)
for ti in tt:
    Yc, _ = np.histogram(uu - ti, bins=bins)
    Y += Yc
G = Y / np.diff(bins)

In [22]:

assert (G == pyc.pcorrelate(tt, uu, bins)).all()

Test passed! Here we demonstrated that the logic of the algorithm is implemented as described in the paper (and in the few lines of code above).

Tests: comparison with np.correlate¶

The comparison with np.correlate is a little tricky. First we need to bin our input to create timetraces that can be correlated by linear convolution. For testing purposes, let’s use a small portion of the timetraces:

In [23]:

binwidth = 50e-6
bins_tt = np.arange(0, tt.max()*unit, binwidth) / unit
bins_uu = np.arange(0, uu.max()*unit, binwidth) / unit

In [24]:

bins_tt.max()*unit, bins_tt.size

Out[24]:

(4.13725, 82746)

In [25]:

bins_uu.max()*unit, bins_uu.size

Out[25]:

(1.8020999999999998, 36043)

In [26]:

tx, _ = np.histogram(tt, bins=bins_tt)
ux, _ = np.histogram(uu, bins=bins_uu)

plt.figure(figsize=(10, 6))
plt.plot(bins_tt[1:]*unit, tx)
plt.plot(bins_uu[1:]*unit, ux)
plt.xlabel('Time (s)')

Out[26]:

Text(0.5,0,'Time (s)')

../_images/notebooks_pycorrelate-examples_39_1.png

The plots above are the two curves we are going to feed to np.correlate:

In [27]:

C = np.correlate(ux, tx, mode='full')

We need to trim the result to obtain a proper alignment with the 0-time lag:

In [28]:

Gn = C[tx.size-1:]  # trim to positive time lags

Now, we can check that both numpy.correlate and pycorrelate.ucorrelate give the same result:

In [29]:

Gu = pyc.ucorrelate(tx, ux)
assert (Gu == Gn).all()

Now, let’s compute the correlation also with pycorrelate.pcorrelate:

In [30]:

maxlag_sec = 3.9
lagbins = (np.arange(0, maxlag_sec, binwidth) / unit).astype('int64')

In [31]:

Gp = pyc.pcorrelate(tt, uu, lagbins) * int(binwidth / unit)

Let’s plot a comparison:

In [32]:

fig, ax = plt.subplots(figsize=(10, 6))
Gn_t = np.arange(1, Gn.size+1) * binwidth * 1e3
Gu_t = np.arange(1, Gu.size+1) * binwidth * 1e3
Gp_t = lagbins[1:] * unit * 1e3
plt.plot(Gn_t, Gn, alpha=1, lw=2, label='numpy.correlate')
plt.plot(Gu_t, Gu, alpha=0.6, lw=2, label='pycorrelate.ucorrelate')
plt.plot(Gp_t, Gp, alpha=0.7, lw=2, label='pycorrelate.pcorrelate')
plt.xlabel('Time (ms)', fontsize='large')
plt.grid(True)
plt.xlim(30e-3, 500)
plt.xscale('log')
plt.title('pycorrelate.correlate vs numpy.correlate', fontsize='x-large')
plt.legend(loc='best', fontsize='x-large');

../_images/notebooks_pycorrelate-examples_50_0.png

Conclusion¶

numpy.correlate and pycorrelate.ucorrelate give identical results, with the latter being much faster. Note that the inputs are swapped between the two functions.
pycorrelate.ucorrelate and pycorrelate.pcorrelate agree when using uniform time-lag bins.