Functions | Variables

bbcflib::rnaseq Namespace Reference

Functions

def positive
def lsqnonneg
def fetch_mappings
def fetch_labels
def build_pileup
def fusion
def save_results
def genes_expression
def transcripts_expression
def estimate_size_factors
def to_rpkm
def rnaseq_workflow
def run_glm
def clean_before_deseq
def clean_deseq_output
def differential_analysis

Variables

 n

Detailed Description

======================
Module: bbcflib.rnaseq
======================

Methods of the bbcflib's RNA-seq worflow. The main function is ``rnaseq_workflow()``,
and is usually called by bbcfutils' ``run_rnaseq.py``, e.g. command-line:

``python run_rnaseq.py -v local -c gapkowt.txt -d rnaseq -p transcripts,genes``

From a BAM file produced by an alignement on the **exonome**, gets counts of reads
on the exons, add them to get counts on genes, and uses least-squares to infer
counts on transcripts, avoiding to map on either genome or transcriptome.
Note that the resulting counts on transcripts are approximate.
The annotation of the bowtie index has to be consistent to that of the database (same assembly version).

Function Documentation

def bbcflib::rnaseq::build_pileup (   bamfile,
  labels 
)
From a BAM file, returns a dictionary of the form {feature ID: number of reads that mapped to it}.

:param bamfile: name of a BAM file.
:param labels: index references - as returned by **fetch_labels()**
:type bamfile: string
:type exons: list
def bbcflib::rnaseq::clean_before_deseq (   filename  ) 
Delete all lines of *filename* where counts are 0 in every run.
def bbcflib::rnaseq::clean_deseq_output (   filename  ) 
Delete all lines of *filename* with NA's everywhere, add 0.5 to zero scores
before recalculating fold change, and remove rows' ids.
def bbcflib::rnaseq::differential_analysis (   ex,
  result,
  rpath,
  design = None,
  contrast = None 
)
For each file in *result*, launches an analysis of differential expression on the count
values, and saves the output in the MiniLIMS.

:param ex: the bein's execution.
:param result: dictionary {type:filename} as returned by rnaseq_workflow.
:param rpath: path to the R scripts (negbin.test.R).
:param design: name of the file containing the design matrix.
:param contrast: name of the file containing the contrasts matrix.
def bbcflib::rnaseq::estimate_size_factors (   counts  ) 
The total number of reads may be different between conditions or replicates.
This treatment makes different count sets being comparable. If rows are features
and columns are different conditions/replicates, each column is divided by the
geometric mean of the rows.
The median of these ratios is used as the size factor for this column. Size
factors may be used for further variance sabilization.

:param counts: an array of counts, each line representing a transcript, each
               column a different sample.
def bbcflib::rnaseq::fetch_labels (   bamfile  ) 
Returns a list of the exons/transcripts labels in the header of *bamfile*.
def bbcflib::rnaseq::fetch_mappings (   assembly  ) 
Given an assembly ID, returns a tuple
``(gene_mapping, transcript_mapping, exon_mapping, trans_in_gene, exons_in_trans)``

* [0] gene_mapping is a dict ``{gene ID: (gene name,start,end,length,chromosome)}``
* [1] transcript_mapping is a dictionary ``{transcript ID: (gene ID,start,end,length,chromosome)}``
* [2] exon_mapping is a dictionary ``{exon ID: ([transcript IDs],gene ID,start,end,chromosome)}``
* [3] trans_in_gene is a dict ``{gene ID: [IDs of the transcripts it contains]}``
* [4] exons_in_trans is a dict ``{transcript ID: [IDs of the exons it contains]}``

'Lenghts' are always the sum of the lengths of the exons in the gene/transcript.

:param path_or_assembly_id: can be a numeric or nominal ID for GenRep
(e.g. 11, 76 or 'hg19' for H.Sapiens), or a path to a file containing a
pickle object which is read to get the mapping.
def bbcflib::rnaseq::fusion (   X  ) 
Takes a track-like generator with items of the form (chromosome,start,end,score)
and returns a merged track-like generator with summed scores.
This is to avoid having overlapping coordinates of features from both DNA strands,
which some genome browsers cannot handle for quantitative tracks.
def bbcflib::rnaseq::genes_expression (   exons_data,
  gene_mapping,
  exon_to_gene,
  ncond,
  nreads 
)
Get gene counts from exons counts.

Returns two dictionaries, one for counts and one for rpkm, of the form ``{gene ID: score}``.

:param exons_data: list of lists ``[exonsIDs,counts,rpkm,starts,ends,geneIDs,geneNames,strands,chrs]``.
:param exon_lengths: dictionary ``{exon ID: length}``
:param gene_mapping: dictionary ``{gene ID: (gene name,start,end,length,strand,chromosome)}``
:param exon_to_gene: dictionary ``{exon ID: gene ID}``.
:param ncond: number of samples.
def bbcflib::rnaseq::lsqnonneg (   C,
  d,
  x0 = None,
  tol = None,
  itmax_factor = 3 
)
Linear least squares with nonnegativity constraints (NNLS), based on MATLAB's lsqnonneg function.

``(x,resnorm,res) = lsqnonneg(C,d)`` returns

* the vector *x* that minimizes norm(d-Cx) subject to x >= 0
* the norm of residuals *resnorm* = norm(d-Cx)^2
* the residuals *res* = d-Cx

:param x0: Initial point for x.
:param tol: Tolerance to determine what is considered as close enough to zero.
:param itmax_factor: Maximum number of iterations.

:type C: nxm numpy array
:type d: nx1 numpy array
:type x0: mx1 numpy array
:type tol: float
:type itmax_factor: int
:rtype: *x*: numpy array, *resnorm*: float, *res*: numpy array

Reference: Lawson, C.L. and R.J. Hanson, Solving Least-Squares Problems, Prentice-Hall, Chapter 23, p. 161, 1974.
http://diffusion-mri.googlecode.com/svn/trunk/Python/lsqnonneg.py
def bbcflib::rnaseq::positive (   x  ) 
Set to zero all negative components of an array.
def bbcflib::rnaseq::rnaseq_workflow (   ex,
  job,
  bam_files,
  pileup_level = ["exons",
  genes,
  transcripts,
  via = "lsf" 
)
Main function of the workflow.

:rtype: None

:param ex: the bein's execution Id.
:param job: a Job object (or a dictionary of the same form) as returned from HTSStation's frontend.
:param bam_files: a complicated dictionary such as returned by mapseq.get_bam_wig_files.
:param pileup_level: a string or array of strings indicating the features you want to compare.
                     Targets can be 'genes', 'transcripts', or 'exons'.
:param via: 'local' or 'lsf'.
def bbcflib::rnaseq::run_glm (   rpath,
  data_file,
  options = [] 
)
Run *rpath*/negbin.test.R on *data_file*.
def bbcflib::rnaseq::save_results (   ex,
  cols,
  conditions,
  group_ids,
  assembly,
  header = [],
  feature_type = 'features' 
)
Save results in a tab-delimited file, one line per feature, one column per run.

:param ex: bein's execution.
:param cols: list of iterables, each element being a column to write in the output.
:param conditions: list of strings corresponding to descriptions of the different samples.
:param group_ids: dictionary {group name: group ID}.
:param assembly: a GenRep Assembly object.
:param header: list of strings, the column headers of the output file.
:param feature_type: (str) the kind of feature of which you measure the expression.
def bbcflib::rnaseq::transcripts_expression (   exons_data,
  exon_lengths,
  transcript_mapping,
  trans_in_gene,
  exons_in_trans,
  ncond,
  nreads 
)
Get transcript rpkms from exon rpkms.

Returns two dictionaries, one for counts and one for rpkm, of the form ``{transcript ID: score}``.

:param exons_data: list of lists ``[exonsIDs,counts,rpkm,starts,ends,geneIDs,geneNames,strands,chrs]``
:param exon_lengths: dictionary ``{exon ID: length}``
:param transcript_mapping: dictionary ``{transcript ID: (gene ID,start,end,length,strand,chromosome)}``
:param trans_in_gene: dictionary ``{gene ID: [transcript IDs it contains]}``
:param exons_in_trans: dictionary ``{transcript ID: [exon IDs it contains]}``
:param ncond: number of samples
 All Classes Namespaces Functions