{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Calculation of multiplet frequency from cell-type mixing in `Python`\n", "Here we implement the simple function to calculate the multiplet frequency from single-cell RNA sequencing experiments where we have mixed cells of two types (e.g., human and mouse), and know the number of observed droplets that contain cells of each type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Function to compute multiplet frequency\n", "The multiplet frequency is computed in terms of the following three experimental observables:\n", " - the number of droplets that contain at least one cell of type 1, which we denote as $N_1$\n", " - the number of droplets that contain at least one of type 2, which we denote as $N_2$\n", " - the number of droplets containing cells of both type 1 and type 2, which we denote as $N_{1,2}$\n", "\n", "The multiplet frequency $M$ is\n", "$$M = 1 - \\frac{\\left(\\mu_1 + \\mu_2\\right)e^{-\\mu_1 - \\mu_2}}{1 - e^{-\\mu_1 - \\mu_2}}$$\n", "where\n", "$$\\mu_1 = -\\ln\\left(\\frac{N - N_1}{N}\\right)$$\n", "and\n", "$$\\mu_2 = -\\ln\\left(\\frac{N - N_2}{N}\\right),$$\n", "and where\n", "$$N = \\frac{N_1 N_2}{N_{12}}.$$\n", "\n", "## Python function\n", "Here is the calculation implemented as Python function:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy\n", "\n", "def multipletFreq(n1, n2, n12):\n", " \"\"\"Estimated multiplet frequency from cell-type mixing experiment.\n", "\n", " `n1`, `n2`, `n12` (`int` or `numpy.ndarray` of integers)\n", " Number of droplets with at least one cell of type 1,\n", " at least one cell of type 2, or cells of both types.\n", " \"\"\"\n", " n = numpy.array(n1 * n2 / n12).astype('float')\n", " mu1 = -numpy.log((n - n1) / n)\n", " mu2 = -numpy.log((n - n2) / n)\n", " mu = mu1 + mu2\n", " return 1 - mu * numpy.exp(-mu) / (1 - numpy.exp(-mu))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example when cell types mixed in equal proportion\n", "First we demonstrate the calculations in the case when the cell types are mixed in equal proportions.\n", "\n", "Let's create some hypothetical data.\n", "We'll imagine that these data come from the [10X cellranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) software analysis of a multi-species experiment that mixed mouse and human cells equally.\n", "The current version (2.1.1) of the [10X cellranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) pipeline for this type of study would return the *human Estimated Number of Cell Partitions*, the *mouse Estimated Number of Cell Partitions*, and the number of *GEMs with > 0 Cells* (this last quantity is also reported as the *Estimated Number of Cells*).\n", "These statistics give the numbers of non-empty GEMs with cells of each type (GEMs is the term that 10X uses to refer to their droplets).\n", "\n", "Here is a data frame of some hypothetical data from three different experiments, each with 4000 cells total but with different numbers of cross-celltype droplets:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/jbloom/Documents/software/conda/envs/BloomLab/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88\n", " return f(*args, **kwds)\n", "/Users/jbloom/Documents/software/conda/envs/BloomLab/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88\n", " return f(*args, **kwds)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
human_dropletsmouse_dropletsnonempty_droplets
experiment
1200520054000
2205020504000
3250025004000
\n", "
" ], "text/plain": [ " human_droplets mouse_droplets nonempty_droplets\n", "experiment \n", "1 2005 2005 4000\n", "2 2050 2050 4000\n", "3 2500 2500 4000" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas\n", "\n", "df_equal = (pandas.DataFrame({\n", " 'human_droplets':[2005, 2050, 2500],\n", " 'mouse_droplets':[2005, 2050, 2500],\n", " 'nonempty_droplets':[4000, 4000, 4000]\n", " },\n", " index=numpy.arange(3) + 1)\n", " .rename_axis('experiment')\n", " )\n", "\n", "df_equal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We calculate the number of droplets with cells of **both** types (human and mouse) simply as the sum of the human and mouse droplets minus the total number of non-empty droplets, since these cross-celltype droplets are double counted in the tally of human and mouse droplets.\n", "As is apparent after this calculation, in this hypothetical example the cross-celltype droplets represent 0.25%, 2.5%, and 25% of the total non-empty droplets in the three examples." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
human_dropletsmouse_dropletsnonempty_dropletshuman_and_mouse_droplets
experiment
120052005400010
2205020504000100
32500250040001000
\n", "
" ], "text/plain": [ " human_droplets mouse_droplets nonempty_droplets \\\n", "experiment \n", "1 2005 2005 4000 \n", "2 2050 2050 4000 \n", "3 2500 2500 4000 \n", "\n", " human_and_mouse_droplets \n", "experiment \n", "1 10 \n", "2 100 \n", "3 1000 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_equal = df_equal.assign(human_and_mouse_droplets=lambda x:\n", " x.human_droplets + x.mouse_droplets - x.nonempty_droplets)\n", "\n", "df_equal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now calculate the multiplet frequency in two ways.\n", "The first way is to precisely calculate the multiplet frequency using the exact Poisson derivation as implemented in the `multipletFreq` function above.\n", "\n", "The second way is to do the simple calculation that has commonly been used in paper that do equal-proportion mixing.\n", "This method is to simply estimate the multiplet frequency as twice the frequency of cross-cell-type droplets among all non-empty droplets (that is, as $\\frac{2 \\times N_{1,2}}{N_1 + N_2 + N_{12}}$)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
human_dropletsmouse_dropletsnonempty_dropletshuman_and_mouse_dropletsmultiplet_freqtwice_cross_celltype_freq
experiment
1200520054000100.0050.005
22050205040001000.0490.050
325002500400010000.4250.500
\n", "
" ], "text/plain": [ " human_droplets mouse_droplets nonempty_droplets \\\n", "experiment \n", "1 2005 2005 4000 \n", "2 2050 2050 4000 \n", "3 2500 2500 4000 \n", "\n", " human_and_mouse_droplets multiplet_freq \\\n", "experiment \n", "1 10 0.005 \n", "2 100 0.049 \n", "3 1000 0.425 \n", "\n", " twice_cross_celltype_freq \n", "experiment \n", "1 0.005 \n", "2 0.050 \n", "3 0.500 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_equal = (df_equal\n", " .assign(multiplet_freq=lambda x: \n", " multipletFreq(x.human_droplets,\n", " x.mouse_droplets,\n", " x.human_and_mouse_droplets))\n", " .assign(twice_cross_celltype_freq=lambda x:\n", " 2 * x.human_and_mouse_droplets / x.nonempty_droplets)\n", " )\n", "\n", "df_equal.round(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen above, the two methods give virtually identical results as long as the number of cross-celltype droplets is small relative to the total number of droplets.\n", "The reason that the two methods aren't identical as simply calculating the multiplet frequency as twice the cross-celltype frequency neglects to account for droplets that have more than two cells.\n", "However, this difference only becomes appreciable when the multiplet frequency is high.\n", "So for the examples above, we can see that it only really matters in the case when the true multiplet frequency is $\\approx$0.425; in that case, simply taking twice the cross-celltype frequency slightly overestimates the true multiplet frequency" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example when cell types are mixed unequally\n", "Now we repeat the example above, but for experiments where the cell types are mixed unequally.\n", "\n", "Below we give some example calculations for various numbers. \n", "An interesting (and initially non-intuitve aspect) of the results is that when the cells are mixed highly unequally and multiplets are common, the multiplet frequency is actually substantially **less** than the fraction of droplets with the rare cell type that are multiplets. \n", "The reason is that multiplets are more likely than singlets to have a cell of the rarer type, and become progressively more likely to have a cell of the rare type as the number of cells in the multiplet increases." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
human_dropletsmouse_dropletsnonempty_dropletshuman_and_mouse_dropletsmultiplet_freq
experiment
12050205040001000.049
23050105040001000.065
3355055040001000.110
4385025040001000.245
5395015040001000.459
\n", "
" ], "text/plain": [ " human_droplets mouse_droplets nonempty_droplets \\\n", "experiment \n", "1 2050 2050 4000 \n", "2 3050 1050 4000 \n", "3 3550 550 4000 \n", "4 3850 250 4000 \n", "5 3950 150 4000 \n", "\n", " human_and_mouse_droplets multiplet_freq \n", "experiment \n", "1 100 0.049 \n", "2 100 0.065 \n", "3 100 0.110 \n", "4 100 0.245 \n", "5 100 0.459 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_unequal = (pandas.DataFrame({\n", " 'human_droplets':[2050, 3050, 3550, 3850, 3950],\n", " 'mouse_droplets':[2050, 1050, 550, 250, 150],\n", " 'nonempty_droplets':[4000, 4000, 4000, 4000, 4000]\n", " },\n", " index=numpy.arange(5) + 1)\n", " .rename_axis('experiment')\n", " .assign(human_and_mouse_droplets=lambda x:\n", " x.human_droplets + x.mouse_droplets - x.nonempty_droplets)\n", " .assign(multiplet_freq=lambda x:\n", " multipletFreq(x.human_droplets,\n", " x.mouse_droplets,\n", " x.human_and_mouse_droplets))\n", " )\n", "\n", "df_unequal.round(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write results to LaTex tables for paper\n", "Finally, we write the example results to LaTex tables to be included in the paper:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing equal example data frame to equal_example.tex\n", "Writing unequal example data frame to unequal_example.tex\n" ] } ], "source": [ "for (df, dfname) in [(df_equal, 'equal'), (df_unequal, 'unequal')]:\n", " f = '{0}_example.tex'.format(dfname)\n", " print(\"Writing {0} example data frame to {1}\".format(dfname, f))\n", " ncol = len(df.columns) + 1\n", " column_format = 'C{0.66in}' * ncol\n", " (df.round(3)\n", " .rename(columns={col:col.replace('_', ' ') for col in df.columns})\n", " .reset_index()\n", " .to_latex(f, index=False, column_format=column_format)\n", " )" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "toc": { "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "toc_cell": true, "toc_position": {}, "toc_section_display": "none", "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }