Sept. 16, 2012, 10:04 p.m.

One Week Of Experiments

Detoxification Diet

I spent the last two weeks cleansing my body of years of accumulated waste and sludge. I was assigned a very, very specific detoxification diet, designed to remove toxins naturally. More on that in a second. My friend Chelsea Schreiner, a soon-to-be naturopathic doctor, both gave me the idea for this OWO and mentored me throughout the process. Ideally, this should be a 3-6 week regimen, but as I have better things to do and try, I compromised with a two-week regimen. It was more than enough.

So, what is a detoxification diet? It is a diet designed to naturally support your body's detox instruments, namely your liver. By ingesting some foods, according to the theory, your liver will increase its detox process, but in this new active state it must be supported. And, to let your liver sweep up everything, it must not be bogged down by extraneous toxins. What do I mean by extraneous toxins? Almost everything delicious. Let's examine what I couldn't eat:

  • red meat
  • dairy products
  • peanuts
  • soy
  • processed sugar
  • corn and corn byproducts
  • wheat
  • gluten
  • oats, barley etc.
  • alcohol =(
  • vegetables from the Nightshade family, eg: tomatoes, potatoes, eggplant, peppers
  • bananas, citrus, strawberries
  • and finally, caffeine. f**k

It was this last item that was most detrimental to my well-being: two weeks without caffeine is an OWO in itself, and here it was merely a side constraint. As a thrice-daily coffee drinker, I have built up a strong tolerance and, one may say, dependence on coffee. Little did I understand how serious the relationship was. More later.

So looking at the set of all foods, what I could eat was a pretty small set. The idea of the diet is to take in large amounts of vegetables, so having a good hand at cooking vegetables is key to doing this diet. Luckily, I'm fucking awesome. But this only solves part of the problem: vegetables are not enough! And without wheat and potatoes, you have to get creative with substantial foods to eat. Enter quinoa: the most delicious, detox-friendly grain known to man. I created a wonderful quinoa dish that I would be happy to eat whether detoxing or not (the recipe is at the bottom of this article). What about breakfast - what did I eat for breakfast? Typically it was light: an apple, a glass of almond milk + a pea-derived protein powder. Lunch was usually a soup I prepared: commonly split pea soup or carrot-ginger soup. I made a really nice spinach curry soup that I would make again. Supper was again soup, curried cabbage or quinoa salad. On a diet like this, it is important to snack often, lest you have 0 energy, so I supplemented my meals with carrots + hummus, celery + tahini, nuts, fruit and daydreams of doughnuts.

Ok, so what happened? How did the two weeks go? Bad, then good. I underestimated what caffeine withdrawal was like. What an unpleasant feeling! Head and body-aches, and constant feelings of lethargy. For four days I suffered like this. The only relief was my constant naps in classrooms. In my experience, a brain without caffeine, or some other stimulant, is a dull brain. I constantly felt unmotivated, and even apathetic at times. Contrast this to now where all I can think about is 'how can I get more shit done in less time--oh and my coffee cup is empty'. Layered on top of my caffeine withdrawal was withdrawal from my regular diet. Chelsea, my mentor, recommends that her patients who embark on a detox diet eliminate foods gradually. Not me. I don't have time for that. So I literally went from gorging on junk (in preparation for a two-week fast) to a clean, sugar-free diet. The shock layered my sufferings. So if the first week was bad, the second week was much better. I found my cooking and diet stride. Preparing food was easier. Even my palate changed, and I found sweetness and flavour where I otherwise never did. But still the issue of lethargy hung over me. Over the two weeks, I could not dispel it. My body - no, my brain - is powered by coffee.

I did feel better eating all these vegetables and avoiding nasty foods. Whether the feeling was an actual response to fewer bodily toxins or purely psychosomatic is debatable. I slept better. I was more regular (in fact my whole digestive tract changed for the better). I lost a few pounds (actually I'm not sure, but probably). But I knew this diet was temporary. I had my entire day's diet planned for the day after this experiment finished: coffee + cinnamon buns in the morning, burger and fries for lunch, and Indian food for dinner. And a beer. Waking up that morning was like Christmas morning!

What are my thoughts on the diet now? I think it is an interesting diet for those for whom it is recommended. I entered the diet only because it was a challenge, not for any health benefits or allergen profiling.

So I know this write-up was late, and in fact I have already done another OWO in between when I finished this and now (it was a private one, sorry, can't say what it was), but next week I will be bullying my morals around. Stay tuned and stay away from me next week. =)

Quinoa salad

  • 1 cup of quinoa grain
  • tsp of cumin
  • splash of olive oil
  • splash of balsamic vinegar
  • chopped spinach
  • diced red onion
  • diced green onion
  • optional: chopped radishes, chopped almonds, or grated carrots
  • salt and pepper to taste

Wash the quinoa in cold water. Add two cups of water, some salt and the cumin. Bring to a boil. Reduce to a low simmer. Once all the liquid is absorbed, let it sit until it reaches room temp. Put into fridge to cool (or freezer, w/e). As the quinoa is cooling, prepare the vegetables. Combine remaining ingredients once quinoa is cool enough. Eat.

  • Detoxification Diet
  • Eating Tim Hortons
  • Being Roman Catholic
  • Sleeping 4 hours a day
  • Meditation
  • Bulk Barn diet
  • Not looking at mirrors
  • Veganism
  • No Internet


March 2nd, 2014

Generating exponential survival data

TLDR: Suppose we are interested in generating exponential survival times with scale parameter $\lambda$, and having $\alpha$ probability of censorship ($0 < \alpha < 1$). This is actually, at least from what I tried, a non-trivial problem. Here's the algorithm, and below I'll go through what doesn't work too:


December 27th, 2013

Deriving formulas for the expected sample size needed in A/B tests

Often an estimate of the number of samples needed in an A/B test is asked for. Now I've sat down and tried to work out a formula (being dissatisfied with other formulas' missing derivations). The derivation below starts off with Bayesian A/B, but uses frequentist methods to derive a single estimate (God help an individual interested in a posterior sample size distribution!)


December 19th, 2013

lifelines: survival analysis in Python

The lifelines library provides a powerful tool to data analysts and statisticians looking for methods to solve a common problem:

How do I predict durations?

This question seems very vague and abstract, but that's only because we can be so general in this space. Some more specific questions lifelines will help you solve are:


October 3rd, 2013

Evolutionary Group Theory

We construct a dynamical population whose individuals are assigned elements from an algebraic group \(G\) and subject them to sexual reproduction. We investigate the relationship between the dynamical system and the underlying group and present three dynamical properties equivalent to the standard group properties.


August 25th, 2013

Videos about the Bayesian Methods for Hackers project

  1. New York Tech Meetup, July 2013: This one is about 2/3 the way through, under the header "Hack of the month"

    Available via MLB Media player
  2. PyData Boston, July 2013: Slides available here

    Video available here.


March 2, 2014, 5:24 p.m.

Latest Blog

Generating exponential survival data

TLDR: Suppose we are interested in generating exponential survival times with scale parameter $\lambda$, and having $\alpha$ probability of censorship ($0 < \alpha < 1$). This is actually, at least from what I tried, a non-trivial problem. I've derived a few algorithms:

Algorithm 1

  1. Generate $T \sim \text{Exp}( \lambda )$. If $\alpha = 0$, return $(T, 1)$.
  2. Solve $\frac{ \lambda h }{ \exp (\lambda h) -1 } = \alpha $ for $h$.
  3. Generate $E \sim \text{TruncExp}( \lambda, h )$, where $\text{TruncExp}$ is the truncated exponential distribution with max value $h$.
  4. $C = (T + E) < h$ (it's a boolean)
  5. $T = \min ( h - E, T )$
  6. return $(T,C)$
Yes, it is actually that hard (unless I am missing something and there is a super simple solution). Here's the Python:
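
A minimal sketch of Algorithm 1, not the lifelines implementation. It assumes $\lambda$ is treated as a rate (which is what the formula in step 2 implies), uses SciPy's `brentq` as the root-finder, and draws the truncated exponential by inverting its CDF. The function name `exponential_survival_alg1` and its arguments are made up for illustration.

```python
import numpy as np
from scipy.optimize import brentq


def exponential_survival_alg1(n, lam, alpha, rng=None):
    """Sketch of Algorithm 1. Returns (durations, observed); lam is a rate."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: true lifetimes.
    T = rng.exponential(scale=1.0 / lam, size=n)
    if alpha == 0:
        return T, np.ones(n, dtype=bool)

    # Step 2: solve lam*h / (exp(lam*h) - 1) = alpha for h.
    def f(h):
        x = lam * h
        return x / np.expm1(x) - alpha

    # The left-hand side falls from ~1 to ~0 on this bracket.
    h = brentq(f, 1e-9 / lam, 700.0 / lam)

    # Step 3: E ~ Exp(lam) truncated to [0, h), via the inverse CDF.
    U = rng.uniform(size=n)
    p_h = -np.expm1(-lam * h)          # P(Exp(lam) < h)
    E = -np.log1p(-U * p_h) / lam

    # Step 4: True when the death is seen before the observation time h.
    observed = (T + E) < h
    # Step 5: otherwise we only see the time elapsed since birth.
    durations = np.minimum(h - E, T)
    return durations, observed
```

Calling `exponential_survival_alg1(10000, lam=2.0, alpha=0.3)` should give a censored fraction (`1 - observed.mean()`) close to 0.3.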

Algorithm 2

  1. Generate $T \sim \text{Exp}( \lambda )$. If $\alpha = 0$, return $(T, 1)$.
  2. Generate $T_c \sim \text{Exp}( \frac{ \alpha \lambda}{1- \alpha} )$.
  3. $C = (T < T_c)$ (it's a boolean: true when the death is observed, matching the $1$ returned in step 1)
  4. $T = \min ( T_c, T )$
  5. return $(T,C)$
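
Algorithm 2 is short enough to sketch directly. As before, this assumes $\lambda$ is a rate, and the function name `exponential_survival_alg2` is hypothetical. The flag returned here is True when the death is observed, so that it agrees with the $(T, 1)$ returned when $\alpha = 0$.

```python
import numpy as np


def exponential_survival_alg2(n, lam, alpha, rng=None):
    """Sketch of Algorithm 2. Returns (durations, observed); lam is a rate."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: true lifetimes.
    T = rng.exponential(scale=1.0 / lam, size=n)
    if alpha == 0:
        return T, np.ones(n, dtype=bool)

    # Step 2: independent censorship times with rate alpha*lam / (1 - alpha).
    censor_rate = alpha * lam / (1.0 - alpha)
    T_c = rng.exponential(scale=1.0 / censor_rate, size=n)

    # Steps 3-4: the death is observed only if it happens before censoring.
    observed = T < T_c
    durations = np.minimum(T, T_c)
    return durations, observed
```

Because the censorship times are drawn independently of the lifetimes, this version keeps the independence that Kaplan-Meier relies on.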

The long

Here's what doesn't work, which I rudely found out today (why? This fails the independence assumptions when using Kaplan-Meier):

  1. Generate exponentials,
  2. randomly pick $\alpha$ of them, and scale their magnitude by a $\text{Uni}(0,1)$.
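
To see the dependence concretely, here is a rough sketch of the naive approach. It assumes each observation is picked for censoring independently with probability $\alpha$ (one reading of "randomly pick $\alpha$ of them") and that $\lambda$ is a rate; `naive_censoring` is a made-up name.

```python
import numpy as np


def naive_censoring(n, lam, alpha, rng=None):
    """Naive scheme: shrink a random alpha-fraction of lifetimes by Uni(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    T = rng.exponential(scale=1.0 / lam, size=n)   # true lifetimes
    censored = rng.uniform(size=n) < alpha         # who gets censored
    U = rng.uniform(size=n)
    # Censored individuals are seen at U * T, the rest at their true lifetime.
    durations = np.where(censored, U * T, T)
    return T, durations, censored


T, durations, censored = naive_censoring(100_000, lam=1.0, alpha=0.3)
# The censoring time U * T is built from the lifetime itself,
# so the two are strongly correlated rather than independent.
corr = np.corrcoef(durations[censored], T[censored])[0, 1]
```

The correlation comes out clearly positive, which is exactly the dependence between censoring and lifetime that Kaplan-Meier assumes away.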

Details on Algorithm 1

Instead, I visualised the problem as a mini real-world situation. That is, given a randomly staggered birth and an individual having an exponential lifetime, at what time should I observe the individual so that there is an $\alpha$ probability that I have censored them (that is, they have not died yet)? To make things mathematically easier, I assumed that the staggered births came from the same distribution as the lifetimes, independently of them. Call the time before birth $S$ and the lifetime of an individual $L$ (so $S$ and $L$ are iid exponentials). I am curious about the time I should observe the individual, call this time $h$, so that there is an $\alpha$ probability they have not died yet. I also need that $S < h$, so that I at least see the birth of the individual. Thus, I need to solve:

$$ P( S+ L > h | S < h ) = \alpha $$

for $h$. This involved lots of fun integrals, and the left-hand side actually reduced to the amazingly simple formula:

$$ \frac{ \lambda h }{ \exp (\lambda h) -1 } $$

This expression, as a function of $h$, looks like this:

After solving for $h$, the next step was simple simulation: generate $S | S < h$ and $L$, and determine if witnessing the individual at time $h$ would be a censorship or not. This last step was just some inequalities and algebra.
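
For reference, the integrals behind the formula can be sketched as follows, assuming $\lambda$ is a rate, so that $S$ has density $\lambda e^{-\lambda s}$ and $P(L > t) = e^{-\lambda t}$:

$$
\begin{aligned}
P(S < h,\ S + L > h) &= \int_0^h \lambda e^{-\lambda s}\, P(L > h - s)\, ds = \int_0^h \lambda e^{-\lambda s} e^{-\lambda (h - s)}\, ds = \lambda h\, e^{-\lambda h}, \\
P(S < h) &= 1 - e^{-\lambda h}, \\
P(S + L > h \mid S < h) &= \frac{\lambda h\, e^{-\lambda h}}{1 - e^{-\lambda h}} = \frac{\lambda h}{\exp(\lambda h) - 1}.
\end{aligned}
$$

The ratio decreases from 1 (as $h \to 0$) to 0 (as $h \to \infty$), so any $0 < \alpha < 1$ has exactly one solution $h$.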

Details on Algorithm 2

This one is pretty simple: suppose censorship times follow an exponential too, but what is the parameter? Well, we want $P( T > T_c ) = \alpha$, where $T$ is the lifetime and $T_c$ the censorship time. Computing this integral and solving for the parameter implies that the correct parameter should be $\frac{ \alpha \lambda } { 1 - \alpha}$.
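
Spelled out, writing $T$ for the lifetime and $T_c$ for the censorship time as in Algorithm 2, with $\lambda$ taken as a rate and $\mu$ a symbol introduced here for the unknown censorship rate:

$$
P(T > T_c) = \int_0^\infty \mu e^{-\mu t}\, P(T > t)\, dt = \int_0^\infty \mu e^{-(\lambda + \mu) t}\, dt = \frac{\mu}{\lambda + \mu} = \alpha
\quad\Longrightarrow\quad \mu = \frac{\alpha \lambda}{1 - \alpha}.
$$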


So that's it! This will be implemented in lifelines shortly.

PyProcess Library

An open-source library for stochastic processes in Python

PyProcess is an open source library meant for quick, useful and robust simulation of stochastic processes in the Python programming language. From the homepage:

PyProcess is a Python library for generating random processes. What is a random process? A random process is a sequence of numbers, none of which are known with certainty. For example, the price of a particular stock or commodity in financial markets is a random process, as its value in the future is random. Another example is the number of cars in a parking lot: you don't know how many cars will be present in the future. PyProcess is a tool used to generate possible realizations of the stock market, or number of cars in a parking lot, or any other random process.

PyProcess Online

For an even quicker start, or to see a demo of what processes and options are available, one can use the PyProcess web-app.

PyProcess is an open-source project, hence will never be completed 100%. That being said, I welcome any bug reports, assistance, or features that one would like to see added.



Altruistic Programmer

Specifically, I love to code. More specifically, I love to code tools that make my life, or others' lives, easier/more interesting/happier, especially when the original task is not even possible for humans (machine learning, web searching, etc.) I'm an advocate for open source technologies and tools like GitHub (that make open source cool).

The languages and tools I am most familiar with are:

  1. Python

    I've been coding in Python for 5 years, and consider myself a Python enthusiast. I am very experienced in the many sides of Python: i) statistical and computing work, ii) web technologies, and iii) as the server's glue. I have contributed to a number of open-source Python projects and am the main author of the repository Bayesian Methods For Hackers: Using Python and PyMC, written in the IPython Notebook format.

    I have been to a few international PyCons, and attended Advanced Scientific Programming in Python (2012).

  2. MATLAB

    If Python is running too slow to move large matrices around, I switch to MATLAB. I've been using MATLAB for 5 years, and know when to use it and when not to use it. I've created some interesting financial apps using (the surprisingly good) MATLAB GUI-DE.

  3. Javascript (including jQuery and data vis. libraries)

    My personal website's subway system is written in Javascript, and so are a few other projects I have worked on. I like Javascript.

  4. Linux and Windows environments

    I'm comfortable with both, and can transition between the two quickly (though I really prefer Linux.)

  5. MySQL and Redis

    I am using Redis for some large-scale jobs, and I think it's great!

  6. HTML(5)/CSS, R, Perl, Excel, Git, LaTeX

    all tools I use to get the job done.

I should also mention I am learning/experimenting/fighting with Scala, MongoDB and Hadoop.

I'm on GitHub at CamDavidsonPilon.


My academic background is mathematics and statistics, with applications to finance and biology (exclusively, much to my disappointment). I graduated from the University of Waterloo, 2012, with a Master's Degree in Quantitative Finance, and graduated from Wilfrid Laurier University with a BSc in Mathematical Finance and Biology.

I have experience in the following:

  • predictive analytics,
  • machine-learning,
  • time series modelling,
  • data cleaning/wrangling,
  • Bayesian methods,
  • A/B and MAB testing,
  • and data visualizations and web-based interfaces.

I believe that statistics is uninteresting without computers and novel applications. This motivates my interest in Bayesian statistics and machine learning. I've completed some interesting projects that combine these two fields (see my blog, for details):

Broad Experience

Aside from the open source and on-the-side projects, below is my more formal experience. Starting from the most recent:

  1. Canada Pension Plan Investment Board: Quantitative Analyst

    May 2012 - Current: The CPPIB is Canada's largest pension fund, controlling and investing over 160 billion dollars.

    • Create and implement new trading strategies based on market dislocations or statistical patterns. We use machine-learning algorithms and predictive statistical models of time series to generate trades.

    • Developed a web-based GUI to display real-time analytics about the market.
    • Create efficient infrastructure to capture and store real-time market data.
    1. Tools: Python, Redis, Linux/Windows, MATLAB, SQL, C++, Bloomberg, Excel, Javascript
    2. Techniques: data visualization, machine-learning, data scraping, model creation and validation.
  2. Advanced Scientific Programming in Python: Participant

    September 2012: An amazing one-week program based on Python's numerical power.

    1. Tools: Python, IPython, Linux
    2. Techniques: data viz, parallel programming, pair programming, Cython, memory considerations
  3. Graduate Teaching Assistant at University of Waterloo

    September 2011-December 2012: (Tried to) inspire undergraduates in differential equations and linear algebra.

  4. Math in Moscow: Student

    September 2011-December 2011: an international math program for invited students to study mathematics in Moscow, Russia.

    1. Techniques: calculus on manifolds, advanced linear algebra, group theory, Russian Language
  5. Undergraduate Researcher at Fields Institute

    Summer 2011: Part of an international program that invites students to work closely with researchers at the Fields Institute for Research in Mathematical Sciences. I worked on epidemiology and evolutionary dynamics.

    1. Techniques: probability, dynamical systems, evolutionary models, game theory.
  6. NSERC Undergraduate Researcher

    Summer 2011: Worked with Dr. Roman Makarov on pricing exotic financial derivatives through Monte Carlo methods. Started development on PyProcess.

  7. NSERC Undergraduate Researcher

    Summer 2010: Worked with Dr. Manuel Santoprete on the N-body problem.

Contact me

I am a very pleasant and excited person to be around. I like dogs, biking, and coffee. I love science, hacking, and making people unexpectedly happy. As a student of evolution, I am constantly trying new experiments. Most fail, but then they don't.

I can be reached at Thanks for taking the time to consider me ;)


Many of my projects involve hacking on the Twitter API and using said API to construct games, re-tell stories or otherwise just troll people. Check out some of my Twitter related projects to the right.

Twitter, if you are seeing this, I'd love to hack on your end. Please allow me an interview for a data analytics position. So much of my data and research involves you guys. Also, it's so cool you visited my site.


Twittxor is a Twitter-based game that challenges a user's knowledge of the people they follow on Twitter. The best way to explain it is to actually just play it, so click below to visit the game.
It was built in Django/Python with the assistance of python-twitter. The code for the web app is available on GitHub.

I built this game at an overnight UWaterloo hackathon. Note: My explanations can be so vague on the page (I just assume the UI is so intuitive that the game explains itself haha).

The Twitter Projects

@EnglishTeach85 - all she wants to do is love and troll.
@My6thGradeNovel - The shittiest writing I've ever done.
@HootBack - Tweet to me, @Hootback, for an anonymous tweet from the previous tweeter.

More Projects

Probabilistic Programming and Bayesian Methods for Hackers

A book, of which I am the main author, introducing Bayesian methods to the non-mathematical. Awesome stuff! See the Subway station on the North of the map.

Responsive art project (unnamed)

I'm designing a kinetic-responsive art project that will display a chaotic system in response to my phone's movements.

Interactive Subway maps

I've got a great library to build subway maps [1], [2]. Now what?


Typing passwords into mobile devices is a pain. What if you could swipe in your password for mobile sites? Check out my in-development jPattern test.

Hey! If you want to see my latest projects in real time, follow me on Twitter @cmrn_dp

Probabilistic Programming and Bayesian Methods for Hackers

From the repo on Github:

Probabilistic Programming and Bayesian Methods for Hackers

Using Python and PyMC

The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then gets into what Bayesian inference is. Unfortunately, due to the mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author's own prior opinion.

After some recent success of Bayesian methods in machine-learning competitions, I decided to investigate the subject again. Even with my mathematical background, it took me three straight days of reading examples and trying to put the pieces together to understand the methods. There was simply not enough literature bridging theory to practice. The problem with my misunderstanding was the disconnect between Bayesian mathematics and probabilistic programming. That being said, I suffered then so the reader would not have to now. This book attempts to bridge the gap.

If Bayesian inference is the destination, then mathematical analysis is a particular path to it. On the other hand, computing power is cheap enough that we can afford to take an alternate route via probabilistic programming. The latter path is much more useful, as it denies the necessity of mathematical intervention at each step, that is, we remove often-intractable mathematical analysis as a prerequisite to Bayesian inference. Simply put, this computational path proceeds via small intermediate jumps from beginning to end, whereas the first path proceeds by enormous leaps, often landing far away from our target. Furthermore, without a strong mathematical background, the analysis required by the first path cannot even take place.

Probabilistic Programming and Bayesian Methods for Hackers is designed as an introduction to Bayesian inference from a computational/understanding-first, and mathematics-second, point of view. Of course as an introductory book, we can only leave it at that: an introductory book. The mathematically trained may cure the curiosity this text generates with other texts designed with mathematical analysis in mind. For the enthusiast with less mathematical background, or one who is not interested in the mathematics but simply the practice of Bayesian methods, this text should be sufficient and entertaining.

The choice of PyMC as the probabilistic programming language is two-fold. As of this writing, there is currently no central resource for examples and explanations in the PyMC universe. The official documentation assumes prior knowledge of Bayesian inference and probabilistic programming. We hope this book encourages users at every level to look at PyMC. Secondly, with recent core developments and popularity of the scientific stack in Python, PyMC is likely to become a core component soon enough.

PyMC does have dependencies to run, namely NumPy and (optionally) SciPy. To not limit the user, the examples in this book will rely only on PyMC, NumPy and SciPy.

Examples from the book:

  1. Inferring human behaviour from SMS message rates, Chapter 1.
  2. Solving the Price is Right Showdown, Chapter 5.
  3. Implementing Kaggle winning solutions, Chapter 5.
  4. Exploring Github's social datasets, Chapter 10.
  5. Aerospace data, specifically the Challenger Spacecraft explosion, Chapter 2.
  6. Financial models with non-linear payoffs, Chapter 5.


Kaggle Competitions / Machine Learning posts

Kaggle is a site that provides companies and researchers with the global data science department. They host machine learning competitions for professional and amateur data scientists and machine-learnists to attempt, and winners receive sometimes quite large prizes (from \$500 to possibly \$3 million). I've been attempting a few competitions I found interesting, and overall I am doing well. More importantly I'm learning what doesn't work, and how to generalize to solve other data science problems. I guess one can call this human-learning =\

Similarly, I've been creating really nice machine learning posts/tutorials. Some are more mathematical than others, but they always have a nice example to demonstrate the subject matter.

This site

This site is always a work in progress. This is currently the second implementation of my personal website; the previous one is now dead. If you like the interactive subway component, you can download the Javascript code from my GitHub account to build your own.


During a hackathon, I put together a fun little Twitter game, Twittxor. Soon to go viral. BTW, if you know any comp sci or mathematics, the name Twittxor is a concatenation of Twitter and XOR (exclusive-or), which describes the game well.


I made a nice online graphical interface for exact Simulation of Stochastic Processes via Python (see below) using Django.

The Golden Retrieber

The only algorithm in the world that tries to find Justin Bieber, introducing The Golden Retrieber. This was a project for a stat. learning class, but it will hopefully evolve into a web app!

OWO: One Week Of

OWO is an attempt to perturb my mental understanding into a new, higher state by exploring my own, and society's, psychology, physiology, energy, prejudices, biases, spirituality, motivations, affections, all of it. Keep a close eye on this project.

PyProcess: Exact Simulation of Stochastic Processes via Python

The project PyProcess is a Python implementation of the research I did on exactly simulating stochastic processes. With the PyProcess library, you can exactly simulate many diffusion processes, jump processes (finite and infinite activity) and combinations of processes. Here is a link to the project's GitHub main page and the documentation. Please abuse my code.

Marketing Director and Webmaster

I was the marketing director and webmaster of the UW Cabaret, a small and terrific cabaret troupe. Check out our sweet website.

AMMCS 2013

Webmaster for the conference.

AMMCS 2011

To celebrate its centennial year, Wilfrid Laurier is hosting the 2011 Applied Mathematics, Modeling and Computational Science Conference. The conference is held from July 25th to the 29th. I was recruited to create the Book of Abstracts for the conference. It took me 100+ volunteer hours, so I might as well show it off. You can view it below:

2011 AMMCS Book of Abstracts

Modeling human password creation

I used Python and 14 million leaked passwords to create a pretty cool machine-learning model of how humans create passwords.


Visit my github account [here]


  1. Press and hold SHIFT and navigate the subway lines with your arrow keys. If you're a L33T gamer, you can also use your WASD keys.
  2. You can use your mouse to click on the different stations.
You can also drag and drop the subway stations to create new maps.

Who am I?

My name is Cam Davidson-Pilon. I spend my time on awesome subjects like machine learning, coding Twitter trolls, web design, data science and biology. I do stats consulting at


Machine and Statistical Learning

"There are wavelengths that people cannot see, there are sounds that people cannot hear, and maybe computers have thoughts that people cannot think."
-R. Hamming

Machine learning has really taken over my time. With huge web-inspired data sets becoming more and more available over the past decade, it seems to be taking over everyone's time.

Also check out my blog, where I discuss and practice with common/rare machine-learning algorithms.

Evolutionary Systems and Biology

"A nest, like a bird, is a gene's way of making another gene." -R.Dawkins

In biology I study population dynamics, game theory and evolution, disease modeling (WITH stochastics) and applications to genetic algorithms. I recently started investigating host-parasite dynamics with sex. I believe that many sexual species' immune systems have evolved a quasigroup-like structure.

Probability and Finance

"Don't sell them 30% fat, sell them 70% fat free."-N.Taleb