Modelling K2-24 with simppler#

This is a reproduction of the RadVel tutorial on the same dataset. Hopefully this can provide a useful comparison between how to implement similar models with the two packages (note that simppler models have a .to_radvel() method to easily convert models).

Importing the data#

Let us first load the K2-24 observations directly from the RadVel repository, extract the relevant columns, and display it.

from pandas import read_csv

url = "https://raw.githubusercontent.com/California-Planet-Search/radvel/refs/heads/master/example_data/epic203771098.csv"
df = read_csv(url, index_col=0)
df.head()
errvel t vel
0 1.593725 2364.819580 6.959066
1 1.600745 2364.825101 5.017650
2 1.658815 2364.830703 13.811799
3 1.653224 2366.827579 1.151030
4 1.639095 2367.852646 9.389273
from matplotlib import rcParams
import matplotlib.pyplot as plt
rcParams["font.size"] = 12.0

t = df.t.values
vel = df.vel.values
errvel = df.errvel.values

def plot_data():
    plt.figure(figsize=(12, 4))
    plt.errorbar(t, vel, yerr=errvel, fmt="k.", capsize=2, mfc="w", label="Data")
    plt.xlabel("Time [d]")
    plt.ylabel("RV [m/s]")
plot_data()
plt.title("RVs of K2-24")
plt.show()
../_images/3d45d68f19a736ddf2afd7d3c42f4da9d950330059a7273b21914a9f8d57f4ef.png

Building the model#

To build a simppler model, we must first specify our parameters as prior distributions.

To follow the RadVel tutorial, we will fix some parameters. simpple allows us to do this with a Fixed distribution in the prior. Parameters with a fixed distribution will not be included in the model dimensions or keys by default, but are registered and passed to the forward model when needed. See the dedicated simpple tutorial on this topic for more info.

We will use a builder function to easily create models with varying subsets of fixed parameters.

A few things to note:

  • Models are created via RVModel

  • The RVModel requires dictionary of parameter distributions

  • The RVModel also requires:

    • The number of planets (2 in this case)

    • t, rv and erv: The RV data

    • The basis to be used for orbital parameters

    • Model times tmod, to be used when plotting model curves (optional, will be set to t by default)

    • time_base, to be used as reference point for the trend component of the model (optional, 0 by default)

import numpy as np
import simppler.model as smod
from simpple import distributions as sdist
periods = [20.8851, 42.3633]
period_errs = [0.0003, 0.0006]
t0s = [2072.7948, 2082.6251]
t0_errs = [0.0007, 0.0004]
def build_model(vary):
    # TODO: Eccentricity constraint
    if vary == "all":
        parameters = {
            "per1": sdist.Normal(periods[0], period_errs[0]),
            "tc1": sdist.Normal(t0s[0], t0_errs[0]),
            "secosw1": sdist.Uniform(-1, 1),
            "sesinw1": sdist.Uniform(-1, 1),
            "logk1": sdist.Normal(np.log(5), 10),
            "per2": sdist.Normal(periods[1], period_errs[1]),
            "tc2": sdist.Normal(t0s[1], t0_errs[1]),
            "secosw2": sdist.Uniform(-1, 1),
            "sesinw2": sdist.Uniform(-1, 1),
            "logk2": sdist.Normal(np.log(5), 10),
            "gamma": sdist.Normal(0, 10.0),
            "dvdt": sdist.Normal(0, 1.0),
            "curv": sdist.Normal(0, 1e-1),
            "jit": sdist.Normal(np.log(3), 0.5),
        }
    elif vary == "ecc":
        parameters = {
            "per1": sdist.Fixed(periods[0]),
            "tc1": sdist.Fixed(t0s[0]),
            "secosw1": sdist.Uniform(-1, 1),
            "sesinw1": sdist.Uniform(-1, 1),
            "logk1": sdist.Normal(np.log(5), 10),
            "per2": sdist.Fixed(periods[1]),
            "tc2": sdist.Fixed(t0s[1]),
            "secosw2": sdist.Uniform(-1, 1),
            "sesinw2": sdist.Uniform(-1, 1),
            "logk2": sdist.Normal(np.log(5), 10),
            "gamma": sdist.Normal(0, 10.0),
            "dvdt": sdist.Normal(0, 1.0),
            "curv": sdist.Normal(0, 1e-1),
            "jit": sdist.Normal(np.log(3), 0.5),
        }
    else:
        parameters = {
            "per1": sdist.Fixed(periods[0]),
            "tc1": sdist.Fixed(t0s[0]),
            "secosw1": sdist.Fixed(0.01),
            "sesinw1": sdist.Fixed(0.01),
            "logk1": sdist.Normal(np.log(5), 10),
            "per2": sdist.Fixed(periods[1]),
            "tc2": sdist.Fixed(t0s[1]),
            "secosw2": sdist.Fixed(0.01),
            "sesinw2": sdist.Fixed(0.01),
            "logk2": sdist.Normal(np.log(5), 10),
            "gamma": sdist.Normal(0, 10.0),
            "dvdt": sdist.Normal(0, 1.0),
            "curv": sdist.Normal(0, 1e-1),
            "jit": sdist.Normal(np.log(3), 0.5),
        }
    tmod = np.linspace(t.min() - 5, t.max() + 5, num=1000)
    time_base = 2420
    return smod.RVModel(parameters, 2, t, vel, errvel, basis="per tc secosw sesinw logk", tmod=tmod, time_base=time_base)

Circular model#

Let us start by building a circular model and plotting it with test parameter values. The simppler.plot module has a few utility functions to plot RV data and models from an RVModel.

from simppler.plot import plot_rv, plot_phase

model = build_model("circular")
test_p = {"per1": periods[0], "tc1": t0s[0], "secosw1": 0.01, "sesinw1": 0.01, "logk1": 1.1}
test_p |= {"per2": periods[1], "tc2": t0s[1], "secosw2": 0.01, "sesinw2": 0.01, "logk2": 1.1}
test_p |= {"gamma": -10, "dvdt": -0.02, "curv": 0.01, "jit": 1.0}
plot_rv(model, parameters=test_p, residuals=False)
plt.title("K2-24 RVs with test model")
plt.show()
../_images/fc92904ea490f620e7362115818afba05d1fce6fa0e2a3fd5174b3e9a0afae4c.png
model.log_likelihood(test_p)
model.log_prob(test_p)
np.float64(-418.48109031025376)

Optimization#

Let us start by doing a simple maximum a posteriori (MAP) estimate.

from scipy.optimize import minimize

vary_p = {p: v for p, v in test_p.items() if p in model.vary_p}
res = minimize(lambda p: - model.log_prob(p), np.array(list(vary_p.values())), method="Nelder-Mead")
opt_p = dict(zip(model.keys(), res.x))
fig, axs = plot_rv(model, opt_p)
axs[0].set_title("K2-24 RVs with the optimized circular model")
plt.show()
../_images/d40356d89d162f3d6ba86d1c1c3c35e2c32dfb3bf3e70fbe5b5d8703327e5c89.png
fig, axs = plot_phase(model, opt_p)
axs[0].set_title("K2-24 phase folds for the optimized circular model")
plt.show()
../_images/c9da0eb86b0027c7cc5d29a437294e8d9d40c85914cc9704e7a313d89bc62e2c.png

Sampling#

We can also do MCMC sampling for our circular model.

import emcee

nwalkers = 50
nsteps = 10_000
ndim = model.ndim
sampler = emcee.EnsembleSampler(nwalkers, ndim, model.log_prob)
rng = np.random.default_rng()
p0 = res.x + 1e-4 * rng.normal(size=(nwalkers, ndim))
_ = sampler.run_mcmc(p0, nsteps, progress=True)
from simpple.plot import chainplot
chainplot(sampler.get_chain(), labels=model.keys())
plt.show()
../_images/016cdf89448deb142ae7d9935c2031dff8154f6c79557246d616af742007d818.png
import corner
chains = sampler.get_chain(discard=2000, flat=True, thin=10)
corner.corner(chains, labels=model.keys(), show_titles=True)
plt.show()
../_images/3c1220706c68a16d6f10fb872fa8426a5cba524bac82e318644b516e5e6dd160.png

The plot_rv and plot_phase functions also accept MCMC chains as input. By default they will display 100 samples from the chain.

fig, axs = plot_rv(model, chains.T, n_samples=100)
axs[0].set_title("Posterior samples for the circular model")
plt.show()
../_images/5a2018863388e83dcab76802b649aa3e625be604d5e633a9b249fb6755c2c086.png
plot_phase(model, chains.T, n_samples=100)
axs[0].set_title("Phase-folded posterior samples for the circular model")

plt.show()
../_images/99291eb2d7dc851a27abce3af0424b2b94c816fa00884bd5c37874f5aebc13e2.png

Eccentric orbits#

Let us now repeat all the steps we did for the circular model, but for an eccentric model.

model = build_model("ecc")

Optimization#

vary_p = {p: v for p, v in test_p.items() if p in model.vary_p}
res = minimize(lambda p: - model.log_prob(p), np.array(list(vary_p.values())), method="Powell")
invalid value encountered in scalar multiply
opt_p = dict(zip(model.keys(), res.x))
fig, axs = plot_rv(model, opt_p)
axs[0].set_title("K2-24 RVs with the optimized eccentric model")
plt.show()
../_images/7e49d825efed05def53e35fe19ab2ab13e606f3243e135444b2db01c969ba08f.png
fig, axs = plot_phase(model, opt_p)
axs[0].set_title("K2-24 pase folds with the optimized eccentric model")
plt.show()
../_images/1931bfe947a6c457422c89fa0d8384773a7ce127421f45853d0929ae9edabde4.png

The optimized model looks like it provides a better fit to the data. Let us now explore the posterior a bit more with MCMC.

Sampling#

import emcee

nwalkers = 50
nsteps = 10_000
ndim = model.ndim
sampler = emcee.EnsembleSampler(nwalkers, ndim, model.log_prob)
rng = np.random.default_rng()
p0 = res.x + 1e-4 * rng.normal(size=(nwalkers, ndim))
_ = sampler.run_mcmc(p0, nsteps, progress=True)
chainplot(sampler.get_chain(), labels=model.keys())
plt.show()
../_images/8faebbd408e8b65d209f0e5b827105743ccaa2ee79b5f1cc83fc5697096fe247.png
import corner
chains = sampler.get_chain(discard=1000, flat=True, thin=10)
corner.corner(chains, labels=model.keys(), show_titles=True)
plt.show()
../_images/fcb6842f99a339192a9ed1ea899ff1403c9e366997b5cab1cfefd8b226c85802.png
fig, axs = plot_rv(model, chains.T)
axs[0].set_title("Posterior samples for the eccentric model")
plt.show()
../_images/c0ce531bfbde9b94699ae64736febe6454418f53e7e85a68cb323f0b1138f29f.png
fig,axs = plot_phase(model, chains.T)
axs[0].set_title("Phase-folded posterior samples for the eccentric model")
plt.show()
../_images/d13708cde94e0db957683c1c855379600a6d11c1b192716c7bdf7f71a65987cf.png

Orbit without fixed parameters#

Instead of freezing some parameters, let us see what happens if we let them all vary. For parameters with good external contraints, we will use Gaussian priors, instead of fixing them.

model = build_model("all")

Optimization#

vary_p = {p: v for p, v in test_p.items() if p in model.vary_p}
res = minimize(lambda p: - model.log_prob(p), np.array(list(vary_p.values())), method="Powell")
invalid value encountered in scalar multiply
opt_p = dict(zip(model.keys(), res.x))
fig, axs = plot_rv(model, opt_p)
axs[0].set_title("K2-24 RVs with the optimized model")
plt.show()
../_images/18a9126f3837a06101ee5b3d6978ecc34ed6176f6f161c9fee808f431a012b3e.png
fig, axs = plot_phase(model, opt_p)
axs[0].set_title("K2-24 phase folds for the optimized model")
plt.show()
../_images/6b6979db1e71cd24cc487b15d409525dbe6b81546aefb8978e7a3de8333dfba7.png

Sampling#

import emcee

nwalkers = 50
nsteps = 10_000
ndim = model.ndim
sampler = emcee.EnsembleSampler(nwalkers, ndim, model.log_prob)
rng = np.random.default_rng()
p0 = res.x + 1e-4 * rng.normal(size=(nwalkers, ndim))
_ = sampler.run_mcmc(p0, nsteps, progress=True)
chainplot(sampler.get_chain(), labels=model.keys())
plt.show()
../_images/c4e708a1fe7ca1c16d9ef01a271ec8ae99605a1eaeae1d980fa0ef39b8bd94e7.png
import corner
chains = sampler.get_chain(discard=1000, flat=True, thin=10)
corner.corner(chains, labels=model.keys(), show_titles=True)
plt.show()
../_images/74a7263e567c2231d82bbda41634b4fefbbcc539baf7804fdd006398d7e66638.png
fig, axs = plot_rv(model, chains.T)
axs[0].set_title("Posterior samples for the full model")
plt.show()
../_images/b82729982b136b5e0876fc00d6c6a34480c678822da5a7bc07cadcd91333274e.png
fig, axs = plot_phase(model, chains.T)
axs[0].set_title("Phase-folded posterior samples for the full model")
plt.show()
../_images/a8704a5ebfbf0c32c8f1bbc3f7792b1f218739c382dbcc64fd678c653285b6f8.png