Public Member Functions
def	__init__
def	set_assembly
def	map_chromosome_names
def	get_links
def	fasta_from_regions
def	statistics
def	fasta_path
def	get_sqlite_url
def	sqlite_path
def	get_features_from_gtf
def	get_gene_mapping
def	get_transcript_mapping
def	get_exon_mapping
def	get_exons_in_trans
def	get_trans_in_gene
def	gene_coordinates
def	annot_track
def	gene_track
def	exon_track
def	transcript_track
def	chrmeta
def	chrnames
Public Attributes
	genrep
	intype
	index_path
	created_at
	updated_at
	source_name
	md5
	chromosomes

Member Function Documentation

def bbcflib::genrep::Assembly::__init__	(	self,
		assembly = `None`,
		genrep = `None`,
		intype = `0`
	)

A representation of a GenRep assembly.
To get an assembly from the repository, call the Assembly
constructor with either the integer assembly ID or the string assembly
name.  This returns an Assembly object::

    a = Assembly(3)
    b = Assembly('mm9')

An Assembly has the following fields:

.. attribute:: id

An integer giving the assembly ID in GenRep.

.. attribute:: name

A string giving the name of the assembly in GenRep.

.. attribute:: index_path

The absolute path to the bowtie index for this assembly.

.. attribute:: chromosomes

A dictionary of chromosomes in the assembly.  The dictionary
keys are tuples of the form (chromosome id, RefSeq locus,
RefSeq version), and the values are dictionaries with the keys
'name' and 'length'.

.. attribute:: bbcf_valid

Boolean.

.. attribute:: updated_at

.. attribute:: created_at

``datetime`` objects.

.. attribute:: nr_assembly_id

.. attribute:: genome_id

.. attribute:: source_id

.. attribute:: intype

All integers. ``intype`` is 0 for genomic data, 1 for exons and 2 for transcripts.

.. attribute:: source_name

.. attribute:: md5

def bbcflib::genrep::Assembly::annot_track	(	self,
		annot_type = `'gene'`,
		chromlist = `None`,
		biotype = `["protein_coding"]`
	)

Return an iterator over all annotations of a given type in the genome.

:param annot_type: (str) one of 'gene','transcript','exon','CDS'.
:param chromlist: (list of str) return only features in the specified chromosomes.
:param biotype: (list of str, or None) return only features with the specified biotype(s).
:rtype: btrack.FeatureStream

def bbcflib::genrep::Assembly::chrmeta ( self )

Return a dictionary of chromosome meta data of the type:
``{'chr1': {'length': 249250621},'chr2': {'length': 135534747},'chr3': {'length': 135006516}}``
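As a rough illustration of the shape of this result, the sketch below derives a chrmeta-style dictionary from a stand-in for the ``chromosomes`` attribute. The sample entries and the helper name ``chrmeta_from_chromosomes`` are hypothetical, and the actual method may build its result differently.

```python
# Stand-in for Assembly.chromosomes: keys are (chromosome id, RefSeq locus,
# RefSeq version) tuples, values hold 'name' and 'length'. Made-up sample data.
chromosomes = {
    (1, 'XX_000001', 1): {'name': 'chr1', 'length': 1000},
    (2, 'XX_000002', 1): {'name': 'chr2', 'length': 800},
}

def chrmeta_from_chromosomes(chromosomes):
    """Map each chromosome name to a dictionary holding its length."""
    return dict((info['name'], {'length': info['length']})
                for info in chromosomes.values())

meta = chrmeta_from_chromosomes(chromosomes)
names = sorted(meta)  # a sorted list of chromosome names
```

Here ``meta`` has the same layout as the dictionary shown above, and ``names`` is comparable to a list of chromosome names.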

def bbcflib::genrep::Assembly::chrnames ( self )

Return a list of chromosome names.

def bbcflib::genrep::Assembly::exon_track	(	self,
		chromlist = `None`,
		biotype = `["protein_coding"]`
	)

Return an iterator over all coding exon annotations in the genome.

def bbcflib::genrep::Assembly::fasta_from_regions	(	self,
		regions,
		out = `None`,
		path_to_ref = `None`,
		chunk = `50000`,
		shuffled = `False`
	)

Get a fasta file with sequences corresponding to the features in the
bed or sqlite file.

Returns the name of the output file and the total size of the extracted sequence.

:param regions: (str or dict or list) bed or sqlite file name, or sequence of features.
    If *regions* is a dictionary {'chr': [[start1,end1],[start2,end2]]}
    or a list [['chr',start1,end1],['chr',start2,end2]],
    will simply iterate through its items instead of loading a track from file.
:param out: (str or dict) output file name. If *out* is a (possibly empty) dictionary,
    will return the filled dictionary.
:param path_to_ref: (str) path to a fasta file containing the whole reference sequence.
:rtype: (str,int)
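The two in-memory forms accepted for *regions* carry the same information. The sketch below converts the dictionary form into the list form and sums the region lengths. The helper names are hypothetical, and the size computation assumes half-open ``[start, end)`` coordinates, which the documentation does not state.

```python
def regions_as_list(regions):
    """Flatten {'chr': [[start, end], ...]} into [['chr', start, end], ...]."""
    if isinstance(regions, dict):
        return [[chrom, start, end]
                for chrom in sorted(regions)
                for start, end in regions[chrom]]
    return [list(r) for r in regions]

def total_size(regions):
    """Sum of region lengths, assuming half-open [start, end) coordinates."""
    return sum(end - start for _, start, end in regions_as_list(regions))

# The same three made-up regions in both accepted forms.
as_dict = {'chr1': [[10, 20], [50, 80]], 'chr2': [[0, 5]]}
as_list = [['chr1', 10, 20], ['chr1', 50, 80], ['chr2', 0, 5]]
```

Either ``as_dict`` or ``as_list`` could then be passed as *regions* instead of a bed or sqlite file name.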

def bbcflib::genrep::Assembly::fasta_path	(	self,
		chromosome = `None`
	)

Return the path to the compressed fasta file, for the whole assembly or for a single chromosome.

def bbcflib::genrep::Assembly::gene_coordinates	(	self,
		id_list
	)

Creates a BED-style stream from a list of gene ids.

def bbcflib::genrep::Assembly::gene_track	(	self,
		chromlist = `None`,
		biotype = `["protein_coding"]`
	)

Return an iterator over all protein-coding gene annotations in the genome.

def bbcflib::genrep::Assembly::get_exon_mapping ( self )

Return a dictionary ``{exon ID: ([transcript IDs],gene ID,start,end,strand,chromosome)}``

def bbcflib::genrep::Assembly::get_exons_in_trans ( self )

Return a dictionary ``{transcript ID: list of exon IDs it contains}``
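The relation between this dictionary and the exon mapping can be sketched by inverting a mapping of the shape returned by ``get_exon_mapping``. The sample data and the helper name ``exons_in_trans`` are made up for illustration; the library may obtain the result by a different route.

```python
# Made-up exon mapping in the documented shape:
# {exon ID: ([transcript IDs], gene ID, start, end, strand, chromosome)}
exon_mapping = {
    'e1': (['t1', 't2'], 'g1', 100, 200, 1, 'chr1'),
    'e2': (['t1'], 'g1', 300, 400, 1, 'chr1'),
    'e3': (['t3'], 'g2', 50, 90, -1, 'chr2'),
}

def exons_in_trans(exon_mapping):
    """Invert the exon mapping into {transcript ID: [exon IDs]}."""
    result = {}
    for exon_id in sorted(exon_mapping):
        for trans_id in exon_mapping[exon_id][0]:
            result.setdefault(trans_id, []).append(exon_id)
    return result

by_transcript = exons_in_trans(exon_mapping)
```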

def bbcflib::genrep::Assembly::get_features_from_gtf	(	self,
		h,
		chr = `None`,
		method = `"dico"`
	)

Return a dictionary *data* of the form
``{key:[[values],[values],...]}`` containing the result of an SQL request whose
parameters are given as a dictionary *h*. Each [values] list corresponds to one row of the SQL result.

:param chr: (str, or list of str) chromosomes on which to perform the request. By default,
every chromosome is searched.

Available keys for *h*, and possible values:

* "keys":       "$,$,..."            (fields to `SELECT` and pass as a key of *data*)
* "values":     "$,$,..."            (fields to `SELECT` and pass as respective values of *data*)
* "conditions": "$:#,$:#,..."        (filter (SQL `WHERE`))
* "uniq":       "whatever"           (SQL `DISTINCT` if specified, no matter what the -string- value is)
* "at_pos":     "12,36,45,1124,..."  (to select only features overlapping this list of positions)

where

* $ holds for any column name in the database
* # holds for any value in the database

Note: giving several field names to "keys" selects unique combinations of these fields.
The corresponding keys of *data* are these fields concatenated with ';'.
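To make the shape of *data* concrete, the sketch below groups a few made-up rows the way the description suggests, joining several key fields with ';'. The function ``group_rows`` and the column names are hypothetical; this is not the library's implementation and no SQL is executed.

```python
def group_rows(rows, key_fields, value_fields):
    """Group row dictionaries into {key: [[values], ...]}.

    Several key fields are joined with ';' to form a single key.
    """
    data = {}
    for row in rows:
        key = ';'.join(str(row[f]) for f in key_fields)
        data.setdefault(key, []).append([row[f] for f in value_fields])
    return data

# Made-up rows standing in for the result of the SQL request.
rows = [
    {'gene_id': 'g1', 'strand': 1, 'start': 100, 'end': 200},
    {'gene_id': 'g1', 'strand': 1, 'start': 300, 'end': 400},
    {'gene_id': 'g2', 'strand': -1, 'start': 50, 'end': 90},
]
# A hypothetical request; the column names are illustrative only.
h = {'keys': 'gene_id,strand', 'values': 'start,end'}
data = group_rows(rows, h['keys'].split(','), h['values'].split(','))
```

With these rows, the two rows of ``g1`` end up under a single combined key, each contributing one list of values.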

def bbcflib::genrep::Assembly::get_gene_mapping ( self )

Return a dictionary ``{geneID: (geneName, start, end, length, strand, chromosome)}``
Note that the gene's length is not the sum of the lengths of its exons.
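The note can be illustrated with a small worked example comparing the span of a gene with the summed length of its exons. It assumes, for illustration only, that the reported length is ``end - start``; the coordinates are made up.

```python
# Made-up gene with three exons; the regions between exons are introns.
gene_start, gene_end = 1000, 5000
exons = [(1000, 1200), (2500, 2800), (4700, 5000)]

# Span of the whole locus, introns included (assumed definition of length).
gene_length = gene_end - gene_start
# Length covered by exons only.
exonic_length = sum(end - start for start, end in exons)
```

Here the span is 4000 while the exons cover only 800 positions.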

def bbcflib::genrep::Assembly::get_links	(	self,
		params
	)

Returns urls to features. Example::

    assembly.get_links({'name':'ENSMUSG00000085692', 'type':'gene'})

returns the dictionary
``{"Ensembl":"http://ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000085692"}``.
If *params* is a string, then it is assumed to be the *name* parameter with ``type=gene``.

def bbcflib::genrep::Assembly::get_sqlite_url ( self )

Return the url of the sqlite file containing gene annotations.

def bbcflib::genrep::Assembly::get_trans_in_gene ( self )

Return a dictionary ``{gene ID: list of transcript IDs it contains}``

def bbcflib::genrep::Assembly::get_transcript_mapping ( self )

Return a dictionary ``{transcript ID: (gene ID,start,end,length,strand,chromosome)}``

def bbcflib::genrep::Assembly::map_chromosome_names	(	self,
		names
	)

Finds the keys in the `chromosomes` dictionary that correspond to the names or ids given in `names`.
Returns a dictionary, such as::

    assembly.map_chromosome_names([3,5,6,47])

    {'3': (2701, u'NC_001135', 4),
    '47': None,
    '5': (2508, u'NC_001137', 2),
    '6': (2580, u'NC_001138', 4)}
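A lookup of this kind can be sketched on a stand-in ``chromosomes`` dictionary. The matching rule used here (compare each request with the chromosome id or the 'name' value) is an assumption, as are the sample data and the helper name ``map_names``; the real method may match names differently.

```python
def map_names(chromosomes, names):
    """Return {str(name): key tuple or None} for each requested name or id."""
    result = {}
    for name in names:
        wanted = str(name)
        result[wanted] = None  # stays None when nothing matches
        for key, info in chromosomes.items():
            if wanted in (str(key[0]), info['name']):
                result[wanted] = key
                break
    return result

# Made-up stand-in for Assembly.chromosomes.
chromosomes = {
    (11, 'XX_000011', 1): {'name': 'chrA', 'length': 1000},
    (12, 'XX_000012', 3): {'name': 'chrB', 'length': 800},
}
mapped = map_names(chromosomes, ['chrA', 12, 'chrZ'])
```

As in the example above, requests are reported under their string form and unmatched requests map to ``None``.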

def bbcflib::genrep::Assembly::set_assembly	(	self,
		assembly
	)

Reset the Assembly attributes to correspond to *assembly*.

:param assembly: integer giving the assembly ID, or a string giving the assembly name.

def bbcflib::genrep::Assembly::sqlite_path ( self )

Return the path to the sqlite file containing gene annotations.

def bbcflib::genrep::Assembly::statistics	(	self,
		output = `None`,
		frequency = `False`,
		matrix_format = `False`
	)

Return (di-)nucleotide counts or frequencies for an assembly, and write them to the file *output* if provided.
Example of result::

    {
        "TT": 13574667,
        "GG": 3344762,
        "CC": 3365555,
        "AA": 13571722,
        "A": 32370285,
        "TA": 6362526,
        "GT": 4841536,
        "AC": 4846697,
        "N": 0,
        "C": 17781115,
        "TC": 6228639,
        "GA": 6231575,
        "CG": 3131283,
        "GC": 3340219,
        "CT": 5079814,
        "AG": 5075950,
        "G": 17758095,
        "TG": 6206098,
        "CA": 6204462,
        "AT": 8875914,
        "T": 32371931
    }

    Total = A + T + G + C

If *matrix_format* is True, *output* is like::

     >Assembly: sacCer2
    1   0.309798640038793   0.308714120881750   0.190593944221299   0.190893294858157
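The relation between counts and frequencies can be sketched from the single-nucleotide counts of the example, using ``Total = A + T + G + C`` as the denominator. That the method normalizes exactly this way when *frequency* is True is an assumption drawn from the ``Total`` line, and the helper name ``nucleotide_frequencies`` is hypothetical.

```python
# Single-nucleotide counts copied from the example result above.
counts = {"A": 32370285, "C": 17781115, "G": 17758095, "T": 32371931, "N": 0}

def nucleotide_frequencies(counts):
    """Divide each single-nucleotide count by Total = A + T + G + C."""
    total = sum(counts[base] for base in "ATGC")
    return dict((base, counts[base] / float(total)) for base in "ATGC")

freqs = nucleotide_frequencies(counts)
```

The four resulting values sum to 1, and ``N`` is left out of the total.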

def bbcflib::genrep::Assembly::transcript_track	(	self,
		chromlist = `None`,
		biotype = `["protein_coding"]`
	)

Return an iterator over all protein-coding transcript annotations in the genome.

The documentation for this class was generated from the following file:

/home/rougemon/Data/pipelines/libv2/bbcflib/bbcflib/genrep.py

bbcflib::genrep::Assembly Class Reference
