The encapsulation of beads together with cells in droplets is the basis of microfluidics-based single-cell RNA-seq technologies. Ideally, each droplet contains exactly one bead and one cell; in practice, however, the numbers of beads and cells in droplets are stochastic, and encapsulating cells in droplets produces an approximately Poisson distribution of the number of cells per droplet.

Specifically, the probability of observing *k* cells in a droplet is approximated by

$$P(k) = \frac{\lambda^k e^{-\lambda}}{k!}.$$

The rate parameter $\lambda$ can be controlled, and the average number of cells per droplet is equal to it. Therefore, setting $\lambda$ to be much less than 1 ensures that two or more cells are rarely encapsulated in a single droplet. A consequence of this is that the fraction of empty droplets, given by $e^{-\lambda}$, is large. Importantly, one of the properties of the Poisson distribution is that the variance is equal to the mean, so the variance of the number of cells per droplet is also equal to $\lambda$.
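As a rough illustration of this trade-off, the Python sketch below evaluates the Poisson probabilities for a hypothetical loading rate λ = 0.1. That value is chosen only for illustration and is not taken from any particular protocol. The sketch reports the fractions of empty, single-cell and multi-cell droplets, and checks numerically that the mean and the variance both come out equal to λ.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet contains exactly k cells under Poisson(lam) loading."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def occupancy_summary(lam: float) -> dict:
    """Fractions of droplets that are empty, hold one cell, or hold two or more cells."""
    empty = poisson_pmf(0, lam)        # equals exp(-lam)
    single = poisson_pmf(1, lam)
    multiplet = 1.0 - empty - single   # P(k >= 2)
    return {"empty": empty, "single": single, "multiplet": multiplet}


def truncated_moments(lam: float, kmax: int = 100) -> tuple[float, float]:
    """Mean and variance of Poisson(lam), computed by summing the pmf up to kmax."""
    probs = [poisson_pmf(k, lam) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    lam = 0.1  # hypothetical loading rate, for illustration only
    for name, frac in occupancy_summary(lam).items():
        print(f"{name:>9}: {frac:.4f}")
    mean, var = truncated_moments(lam)
    print(f"mean={mean:.4f} variance={var:.4f}")
```

With λ = 0.1, roughly 90.5% of droplets are empty and about 9.0% hold a single cell. Fewer than 0.5% hold two or more cells.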

Along with cells, beads must also be captured in droplets, and when plastic beads are used the occupancy statistics follow a Poisson distribution as well. This means that with technologies such as Drop-seq (Macosko et al. 2015), which uses polystyrene beads, many droplets are empty, contain a bead but no cell, or contain a cell but no bead. The last situation (a cell but no bead) leads to a low “capture rate”, i.e., only a small fraction of the cells in an experiment are assayed.
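The capture-rate problem can be sketched by treating cells and beads as loading independently, each with Poisson statistics. This independence is an idealizing assumption. Under it, a cell is assayed only if its droplet also holds at least one bead, so the capture rate is $1 - e^{-\lambda_b}$, where $\lambda_b$ is a hypothetical bead loading rate. The function names below are illustrative only.

```python
import math


def droplet_composition(cell_rate: float, bead_rate: float) -> dict:
    """Fractions of droplet types, assuming cells and beads load independently,
    each with Poisson statistics (an idealization, not a measured model)."""
    p_cell = 1.0 - math.exp(-cell_rate)  # droplet has at least one cell
    p_bead = 1.0 - math.exp(-bead_rate)  # droplet has at least one bead
    return {
        "empty": (1 - p_cell) * (1 - p_bead),
        "bead_no_cell": (1 - p_cell) * p_bead,
        "cell_no_bead": p_cell * (1 - p_bead),
        "cell_and_bead": p_cell * p_bead,
    }


def cell_capture_rate(bead_rate: float) -> float:
    """Fraction of cells whose droplet also holds a bead.
    Under independence this is simply P(at least one bead) = 1 - exp(-bead_rate)."""
    return 1.0 - math.exp(-bead_rate)


if __name__ == "__main__":
    # Hypothetical rates chosen only to illustrate the effect.
    comp = droplet_composition(cell_rate=0.1, bead_rate=0.1)
    for kind, frac in comp.items():
        print(f"{kind:>13}: {frac:.4f}")
    print(f"capture rate: {cell_capture_rate(0.1):.4f}")
```

With a bead loading rate of 0.1, fewer than 10% of cells end up in a droplet with a bead. Most cell-containing droplets are of the "cell and no bead" kind.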

One of the advantages of the inDrops method (Klein et al. 2015) over other single-cell RNA-seq methods is that it uses hydrogel beads, which allow for a reduction in the variance of the number of beads per droplet. In an important paper, Abate et al. (2009) showed that close packing of hydrogel beads allows for an almost degenerate distribution in which the number of beads per droplet is exactly one 98% of the time. The video below shows how close to degeneracy the distribution can be squeezed (in the example, two beads are being encapsulated per droplet).
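To put the 98% figure in perspective, this sketch compares Poisson loading that averages one bead per droplet with a near-degenerate distribution. Only the 98% single-bead fraction comes from the text. Splitting the remaining 2% evenly between empty and two-bead droplets is a hypothetical assumption made here so that the mean stays at one.

```python
import math


def moments(pmf: dict[int, float]) -> tuple[float, float]:
    """Mean and variance of a distribution given as {count: probability}."""
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, var


# Poisson loading with on average one bead per droplet, truncated at 30 beads.
poisson_beads = {k: math.exp(-1.0) / math.factorial(k) for k in range(31)}

# Hypothetical close-packed loading: 98% single-bead droplets; the remaining 2%
# is split evenly between empty and two-bead droplets (an assumption).
close_packed_beads = {0: 0.01, 1: 0.98, 2: 0.01}

if __name__ == "__main__":
    for name, pmf in [("Poisson", poisson_beads), ("close-packed", close_packed_beads)]:
        mean, var = moments(pmf)
        print(f"{name:>12}: P(1 bead)={pmf[1]:.3f} mean={mean:.3f} variance={var:.3f}")
```

Both distributions have mean 1. Under Poisson loading only about 37% of droplets (e^-1) hold exactly one bead. Under the assumed close-packed distribution 98% do, and the variance falls from 1 to 0.02.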

A discrete distribution defined over the non-negative integers with variance *less* than the mean is called *sub-Poisson*. Similarly, a discrete distribution defined over the non-negative integers with variance *greater* than the mean is called *super-Poisson*. This terminology dates back to at least the 1940s (e.g., Berkson et al. 1942) and is standard in many fields, from physics (e.g., Rodionov and Cherkin 2004) to biology (e.g., Pitchiaya et al. 2014).

Figure 5.26 from Adrian Jeantet, Cavity quantum electrodynamics with carbon nanotubes, 2017.

Using this terminology, **the close packing of hydrogel beads can be said to enable sub-Poisson loading of beads into droplets** because the variance of the number of beads per droplet is reduced in comparison to the Poisson statistics of plastic beads.
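The definitions can be turned into a simple check on observed per-droplet counts by comparing the sample variance to the sample mean (the variance-to-mean ratio, or Fano factor). The sketch below is a minimal illustration. The `tol` threshold is a hypothetical cut-off, not a statistical test. The ratio also looks only at the first two moments, so a value near 1 does not by itself show that counts are Poisson.

```python
from statistics import mean, pvariance


def dispersion_index(counts: list[int]) -> float:
    """Variance-to-mean ratio (Fano factor) of observed non-negative integer counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return pvariance(counts, mu=m) / m


def classify(counts: list[int], tol: float = 0.05) -> str:
    """Label counts as sub-Poisson (variance < mean), super-Poisson (variance > mean),
    or Poisson-like. `tol` is a hypothetical tolerance, not a significance test."""
    d = dispersion_index(counts)
    if d < 1 - tol:
        return "sub-Poisson"
    if d > 1 + tol:
        return "super-Poisson"
    return "Poisson-like"


if __name__ == "__main__":
    # Hypothetical close-packed loading: 98 single-bead droplets, one empty, one with two beads.
    print(classify([1] * 98 + [0, 2]))
    # Hypothetical clumped loading: same mean of 1 bead, but highly variable.
    print(classify([0] * 90 + [10] * 10))
```

The first sample has mean 1 and variance 0.02, so it is sub-Poisson. The second also has mean 1 but variance 9, so it is super-Poisson. Calling the first case "super-Poisson" gets the direction of the comparison backwards.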

Unfortunately, in a 2015 paper, Bose et al. used the term “super-Poisson” instead of “sub-Poisson” in discussing an approach to reducing bead occupancy variance in the single-cell RNA-seq context. This sign error in terminology has subsequently been propagated, and recently appeared in a single-cell RNA-seq review (Zhang et al. 2018) and in 10x Genomics advertising materials.

When it comes to single-cell RNA-seq, we already have people referring to the number of reads sequenced as “the library size” and calling trees “one-dimensional manifolds”. Now sub-Poisson is mistaken for super-Poisson. Before you know it, we’ll have professors teaching students that cell clusters obtained by k-means clustering are “cell types”…

Supper poisson (not to be confused with super-Poisson (not to be confused with sub-Poisson)).

## 6 comments


February 7, 2019 at 5:53 am

**amoeba**: What _should_ be called “library size”?

February 8, 2019 at 4:20 pm

**Warren McGee**: Lior can correct me if I’m wrong, but the term “library size” ought to refer to the number of RNA molecules in a sample after the library preparation steps. Since only an aliquot of any library is submitted to the sequencer, one never measures the library size. The actual term that should be used is “sequencing depth” or “depth of sequencing”, analogous to “coverage depth” in genomic sequencing (i.e., how many times a particular base in the genome is measured by different reads). It would be an interesting historical research project to figure out the origin of the term “library size”, just as this blog post and Peter Sims’s comment below have done for “super-Poisson” as a description of distributions with reduced variation in the number of beads per droplet.

February 7, 2019 at 7:29 am

**Allen Knutson**: Ooh! Do you have an opinion on which way is tropicalization and which way is detropicalization?

February 12, 2019 at 12:58 pm

**Lior Pachter**: 😱

February 7, 2019 at 7:55 am

**Peter Sims**: Hi Lior,

I completely agree that I started an unfortunate trend with the improper use of the term “super-Poisson” in Bose et al. I’m writing to point out that the problem is even worse than you may realize. To my knowledge, this bad habit dates back about a decade, to when fluorescence-based sequencing technologies were under development, and sadly, I also helped to propagate the improper use of “super-Poisson” back then, for example here:

https://patents.google.com/patent/US20130053252A1

However, I cannot claim total responsibility for starting this trend. In fact, I appropriated the term from PacBio because I thought it sounded cool:

https://patents.google.com/patent/US20100009872A1

Even Illumina fell victim to this problem as it reported on patterned flow cell technology:

https://patents.google.com/patent/US8895249B2

I did manage to break this habit in subsequent reports at least for myself, but not until around 2016. Anyway, I really appreciate your post – if anyone can reverse this trend, it would probably be you.

-Peter

February 7, 2019 at 10:42 am

**Lior Pachter**: Thanks! I really appreciate *your* comment!