Functions

bbcflib::chipseq Namespace Reference

Functions

def macs
def add_macs_results
def camelPeaks
def run_deconv
def workflow_groups

Detailed Description

=======================
Module: bbcflib.chipseq
=======================

This module provides functions to run a basic ChIP-seq analysis from reads mapped on
a reference genome.
The most important steps are the binding of ``macs`` via the 'add_macs_results' function and
the peak deconvolution algorithm with the 'run_deconv' function.

The whole workflow can be run via the function ``workflow_groups`` with appropriate options in 'job.options'.

Below is the script used by the frontend::

from bbcflib import daflims, genrep, frontend, email, common
from bbcflib.mapseq import *
from bbcflib.chipseq import *
M = MiniLIMS( '/path/to/chipseq/minilims' )
ms_limspath = '/path/to/mapseq/minilims'
working_dir = '/path/to/scratch/on/cluster'
hts_key = 'test_key'
assembly_id = 'mm9'
gl = { 'hts_chipseq': {'url': 'http://htsstation.epfl.ch/chipseq/'},
       'hts_mapseq': {'url': 'http://htsstation.epfl.ch/mapseq/'},
       'script_path': '/srv/chipseq/lib' }
htss = frontend.Frontend( url=gl['hts_chipseq']['url'] )
job = htss.job( hts_key )
assembly = genrep.Assembly( assembly=assembly_id )
with execution( M, description=hts_key, remote_working_directory=working_dir ) as ex:
    (ms_files, job) = get_bam_wig_files( ex, job, ms_limspath, gl['hts_mapseq']['url'], gl['script_path'], via=via )
    files = workflow_groups( ex, job, ms_files, assembly, gl['script_path'] )
print ex.id
allfiles = common.get_files( ex.id, M )
print allfiles

Function Documentation

def bbcflib::chipseq::add_macs_results (   ex,
  read_length,
  genome_size,
  bamfile,
  ctrlbam = None,
  name = None,
  poisson_threshold = None,
  alias = None,
  macs_args = None,
  via = 'lsf' 
)
Calls the ``macs`` function on each possible pair
of test and control bam files and adds
the respective outputs to the execution repository.

``macs`` options can be controlled with `macs_args`.
If a dictionary of Poisson thresholds for each sample is given, then the enrichment bounds ('-m' option)
are computed from them otherwise the default is '-m 10,100'.

Returns the set of file prefixes.
def bbcflib::chipseq::camelPeaks (   scores_fwd,
  scores_rev,
  peaks,
  chromosome_name,
  chromosome_length,
  read_extension,
  script_path 
)
Runs the 'camelPeaks.py' wrapper script on the
'scores_fwd', 'scores_rev' and 'peaks'
input with parameters 'chromosome_name' (name of chromosome to process)
and 'read_extension', using functions from 'script_path'/deconv_fcts.R.
Returns a pdf file and several data tracks.
def bbcflib::chipseq::macs (   read_length,
  genome_size,
  bamfile,
  ctrlbam = None,
  args = None 
)
Binding for the ``macs`` peak caller.

takes one (optionally two) bam file(s) and
the 'read_length' and 'genome_size' parameters passed to ``macs``.

Returns the file prefix ('-n' option of ``macs``)
def bbcflib::chipseq::run_deconv (   ex,
  sql,
  peaks,
  chromosomes,
  read_extension,
  script_path,
  via = 'lsf' 
)
Runs the complete deconvolution process for a set of sql files and a bed file,
parallelized over a set of chromosomes (a dictionary with keys 'chromosome_id'
and values a dictionary with chromosome names and lengths).

Returns a dictionary of file outputs with keys file types
('pdf', 'sql' and 'bed') and values the file names.
def bbcflib::chipseq::workflow_groups (   ex,
  job_or_dict,
  mapseq_files,
  assembly,
  script_path = '',
  logfile = None,
  via = 'lsf' 
)
Runs a chipseq workflow over bam files obtained by mapseq. Will optionally run ``macs`` and 'run_deconv'.

:param ex: a 'bein' execution environment to run jobs in,

:param job_or_dict: a 'Frontend' 'job' object, or a dictionary with key 'groups' and 'options' if applicable,

:param mapseq_files: a dictionary of files fomr a 'mapseq' execution, imported via 'mapseq.get_bam_wig_files',

:param assembly: a genrep.Assembly object,

:param script_path: only needed if 'run_deconv' is in the job options, must point to the location of the R scripts.

Defaults ``macs`` parameters (overriden by ``job_or_dict['options']['macs_args']``) are set as follows:

* ``'-bw'``: 200 ('bandwith')

* ``'-m'``: 10,100 ('minimum and maximum enrichments relative to background or control')

The enrichment bounds will be computed from a Poisson threshold *T*, if available, as *(min(30,5*(T+1)),50*(T+1))*.

Returns a tuple of a dictionary with keys *group_id* from the job groups, *macs* and *deconv* if applicable and values file description dictionaries and a dictionary of *group_ids* to *names* used in file descriptions.
 All Classes Namespaces Functions