
bbcflib::genrep::Assembly Class Reference


Public Member Functions

def __init__
def set_assembly
def map_chromosome_names
def get_links
def fasta_from_regions
def statistics
def fasta_path
def get_sqlite_url
def sqlite_path
def get_features_from_gtf
def get_gene_mapping
def get_transcript_mapping
def get_exon_mapping
def get_exons_in_trans
def get_trans_in_gene
def gene_coordinates
def annot_track
def gene_track
def exon_track
def transcript_track
def chrmeta
def chrnames

Public Attributes

 genrep
 intype
 index_path
 created_at
 updated_at
 source_name
 md5
 chromosomes

Member Function Documentation

def bbcflib::genrep::Assembly::__init__ (   self,
  assembly = None,
  genrep = None,
  intype = 0 
)
A representation of a GenRep assembly.
To get an assembly from the repository, call the Assembly
constructor with either the integer assembly ID or the string assembly
name.  This returns an Assembly object::

    a = Assembly(3)
    b = Assembly('mm9')

An Assembly has the following fields:

.. attribute:: id

An integer giving the assembly ID in GenRep.

.. attribute:: name

A string giving the name of the assembly in GenRep.

.. attribute:: index_path

The absolute path to the bowtie index for this assembly.

.. attribute:: chromosomes

A dictionary of chromosomes in the assembly.  The dictionary
keys are tuples of the form (chromosome id, RefSeq locus,
RefSeq version), and the values are dictionaries with the keys
'name' and 'length'.

.. attribute:: bbcf_valid

Boolean.

.. attribute:: updated_at

.. attribute:: created_at

``datetime`` objects.

.. attribute:: nr_assembly_id

.. attribute:: genome_id

.. attribute:: source_id

.. attribute:: intype

All integers. ``intype`` is 0 for genomic data, 1 for exons and 2 for transcripts.

.. attribute:: source_name

.. attribute:: md5

def bbcflib::genrep::Assembly::annot_track (   self,
  annot_type = 'gene',
  chromlist = None,
  biotype = ["protein_coding"] 
)
Return an iterator over all annotations of a given type in the genome.

:param annot_type: (str) one of 'gene','transcript','exon','CDS'.
:param chromlist: (list of str) return only features in the specified chromosomes.
:param biotype: (list of str, or None) return only features with the specified biotype(s).
:rtype: btrack.FeatureStream
def bbcflib::genrep::Assembly::chrmeta (   self  ) 
Return a dictionary of chromosome metadata of the form:
``{'chr1': {'length': 249250621},'chr2': {'length': 135534747},'chr3': {'length': 135006516}}``
def bbcflib::genrep::Assembly::chrnames (   self  ) 
Return a list of chromosome names.
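As a rough illustration of how these two return values relate to the ``chromosomes`` attribute, here is a minimal sketch that works on a plain dictionary. The helper names ``chrmeta_from`` and ``chrnames_from``, the sample keys, names and lengths are all placeholders invented for this example, and ordering the names by chromosome id is an arbitrary choice, not necessarily what the library does.

```python
# Hypothetical stand-in for Assembly.chromosomes: keys are
# (chromosome id, RefSeq locus, RefSeq version) tuples, values hold
# 'name' and 'length'. All values below are made-up placeholders.
chromosomes = {
    (2, 'XX_000002', 1): {'name': 'chrB', 'length': 2000},
    (1, 'XX_000001', 1): {'name': 'chrA', 'length': 1000},
}

def chrmeta_from(chromosomes):
    """Build {name: {'length': n}} from a chromosomes dictionary."""
    return {v['name']: {'length': v['length']} for v in chromosomes.values()}

def chrnames_from(chromosomes):
    """List chromosome names, ordered by chromosome id (arbitrary choice)."""
    ordered = sorted(chromosomes.items(), key=lambda kv: kv[0][0])
    return [value['name'] for _, value in ordered]

meta = chrmeta_from(chromosomes)
names = chrnames_from(chromosomes)
```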
def bbcflib::genrep::Assembly::exon_track (   self,
  chromlist = None,
  biotype = ["protein_coding"] 
)
Return an iterator over all protein-coding exon annotations in the genome.
def bbcflib::genrep::Assembly::fasta_from_regions (   self,
  regions,
  out = None,
  path_to_ref = None,
  chunk = 50000,
  shuffled = False 
)
Get a fasta file with sequences corresponding to the features in the
bed or sqlite file.

Returns the name of the output file and the total size of the extracted sequence.

:param regions: (str or dict or list) bed or sqlite file name, or sequence of features.
    If *regions* is a dictionary {'chr': [[start1,end1],[start2,end2]]}
    or a list [['chr',start1,end1],['chr',start2,end2]],
    will simply iterate through its items instead of loading a track from file.
:param out: (str or dict) output file name. If *out* is a (possibly empty) dictionary,
    will return the filled dictionary.
:param path_to_ref: (str) path to a fasta file containing the whole reference sequence.
:rtype: (str,int)
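The two in-memory forms of *regions* carry the same information. The sketch below shows one way to walk either form uniformly. ``iter_regions`` and ``total_size`` are hypothetical helpers, not part of bbcflib, and the size calculation assumes half-open ``[start, end)`` coordinates, which the documentation above does not state.

```python
def iter_regions(regions):
    """Yield (chromosome, start, end) from a dict or list of regions."""
    if isinstance(regions, dict):
        # {'chr': [[start1, end1], [start2, end2]]}
        for chrom, intervals in regions.items():
            for start, end in intervals:
                yield chrom, start, end
    else:
        # [['chr', start1, end1], ['chr', start2, end2]]
        for chrom, start, end in regions:
            yield chrom, start, end

def total_size(regions):
    """Sum of region lengths, assuming half-open coordinates."""
    return sum(end - start for _, start, end in iter_regions(regions))

as_dict = {'chr1': [[10, 20], [30, 45]]}
as_list = [['chr1', 10, 20], ['chr1', 30, 45]]
```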
def bbcflib::genrep::Assembly::fasta_path (   self,
  chromosome = None 
)
Return the path to the compressed fasta file, for the whole assembly or for a single chromosome.
def bbcflib::genrep::Assembly::gene_coordinates (   self,
  id_list 
)
Creates a BED-style stream from a list of gene ids.
def bbcflib::genrep::Assembly::gene_track (   self,
  chromlist = None,
  biotype = ["protein_coding"] 
)
Return an iterator over all protein-coding gene annotations in the genome.
def bbcflib::genrep::Assembly::get_exon_mapping (   self  ) 
Return a dictionary ``{exon ID: ([transcript IDs],gene ID,start,end,strand,chromosome)}``
def bbcflib::genrep::Assembly::get_exons_in_trans (   self  ) 
Return a dictionary ``{transcript ID: list of exon IDs it contains}``
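The exon mapping and the transcript-to-exons dictionary describe the same relation from opposite directions. As a sketch only, the function below inverts a dictionary shaped like the result of ``get_exon_mapping``. The name ``invert_exon_mapping`` and the sample IDs and coordinates are invented for illustration, and this is not claimed to be how the library computes its result.

```python
def invert_exon_mapping(exon_mapping):
    """Turn {exon ID: ([transcript IDs], gene ID, ...)} into
    {transcript ID: [exon IDs]}."""
    result = {}
    for exon_id, fields in exon_mapping.items():
        transcript_ids = fields[0]  # first element is the transcript list
        for transcript_id in transcript_ids:
            result.setdefault(transcript_id, []).append(exon_id)
    return result

# Made-up sample data in the documented tuple layout:
# ([transcript IDs], gene ID, start, end, strand, chromosome)
exon_mapping = {
    'e1': (['t1', 't2'], 'g1', 10, 50, 1, 'chr1'),
    'e2': (['t1'], 'g1', 60, 90, 1, 'chr1'),
}
exons_by_transcript = invert_exon_mapping(exon_mapping)
```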
def bbcflib::genrep::Assembly::get_features_from_gtf (   self,
  h,
  chr = None,
  method = "dico" 
)
Return a dictionary *data* of the form
``{key:[[values],[values],...]}`` containing the result of an SQL request whose
parameters are given as a dictionary *h*. Each [values] list corresponds to one row of the SQL result.

:param chr: (str, or list of str) chromosomes on which to perform the request. By default,
every chromosome is searched.

Available keys for *h*, and possible values:

* "keys":       "$,$,..."            (fields to `SELECT` and pass as a key of *data*)
* "values":     "$,$,..."            (fields to `SELECT` and pass as respective values of *data*)
* "conditions": "$:#,$:#,..."        (filter (SQL `WHERE`))
* "uniq":       "whatever"           (SQL `DISTINCT` if specified, no matter what the -string- value is)
* "at_pos":     "12,36,45,1124,..."  (to select only features overlapping this list of positions)

where

* $ stands for any column name in the database
* # stands for any value in the database

Note: giving several field names to "keys" selects unique combinations of these fields.
The corresponding keys of *data* are a concatenation (by ';') of these fields.
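To make the shape of *data* concrete, the following sketch groups a few in-memory rows the way the description above suggests, joining multiple key fields with ';'. It does not touch a database. ``group_rows``, the column names (``gene_id``, ``strand``, ``start``, ``end``) and the rows are assumptions made for this example, and whether the real method converts key fields to strings in exactly this way is not confirmed by the text.

```python
def group_rows(rows, keys, values):
    """Group rows into {key: [[values], [values], ...]}.

    Several key fields are concatenated with ';' to form one key.
    """
    data = {}
    for row in rows:
        key = ';'.join(str(row[k]) for k in keys)
        data.setdefault(key, []).append([row[v] for v in values])
    return data

# Request dictionary in the documented "$,$,..." style (hypothetical columns).
h = {"keys": "gene_id,strand", "values": "start,end"}

# Made-up rows standing in for the SQL result.
rows = [
    {'gene_id': 'g1', 'strand': 1, 'start': 10, 'end': 50},
    {'gene_id': 'g1', 'strand': 1, 'start': 60, 'end': 90},
    {'gene_id': 'g2', 'strand': -1, 'start': 5, 'end': 25},
]

data = group_rows(rows, h["keys"].split(','), h["values"].split(','))
```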
def bbcflib::genrep::Assembly::get_gene_mapping (   self  ) 
Return a dictionary ``{geneID: (geneName, start, end, length, strand, chromosome)}``
Note that the gene's length is not the sum of the lengths of its exons.
def bbcflib::genrep::Assembly::get_links (   self,
  params 
)
Return URLs to features. Example::

    assembly.get_links({'name':'ENSMUSG00000085692', 'type':'gene'})

returns the dictionary
``{"Ensembl":"http://ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000085692"}``.
If *params* is a string, then it is assumed to be the *name* parameter with ``type=gene``.
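The string shortcut can be pictured as a small normalization step. This is a sketch of the documented behaviour only: ``normalize_link_params`` is a hypothetical helper, and the code assumes Python 3 ``str``.

```python
def normalize_link_params(params):
    """Expand a bare name into {'name': ..., 'type': 'gene'}."""
    if isinstance(params, str):
        # A plain string is treated as the *name* parameter with type=gene.
        return {'name': params, 'type': 'gene'}
    return dict(params)

from_string = normalize_link_params('ENSMUSG00000085692')
from_dict = normalize_link_params({'name': 'ENSMUSG00000085692', 'type': 'gene'})
```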
def bbcflib::genrep::Assembly::get_sqlite_url (   self  ) 
Return the url of the sqlite file containing gene annotations.
def bbcflib::genrep::Assembly::get_trans_in_gene (   self  ) 
Return a dictionary ``{gene ID: list of transcript IDs it contains}``
def bbcflib::genrep::Assembly::get_transcript_mapping (   self  ) 
Return a dictionary ``{transcript ID: (gene ID,start,end,length,strand,chromosome)}``
def bbcflib::genrep::Assembly::map_chromosome_names (   self,
  names 
)
Find the keys in the `chromosomes` dictionary that correspond to the names or ids given in `names`.
Returns a dictionary, such as::

    assembly.map_chromosome_names([3,5,6,47])

    {'3': (2701, u'NC_001135', 4),
    '47': None,
    '5': (2508, u'NC_001137', 2),
    '6': (2580, u'NC_001138', 4)}
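A lookup of this kind can be sketched on a plain dictionary as follows. ``map_names`` is a hypothetical helper. The tuple keys are taken from the example above, but the chromosome names '3', '5', '6' and the lengths are assumptions chosen to reproduce that output, and the order of matching (name first, then chromosome id) is a guess rather than the library's documented rule.

```python
def map_names(chromosomes, names):
    """Map each requested name or id to a key of *chromosomes*, or None."""
    by_name = {str(value['name']): key for key, value in chromosomes.items()}
    by_id = {str(key[0]): key for key in chromosomes}
    result = {}
    for name in names:
        wanted = str(name)
        # Try the chromosome name first, then fall back to the numeric id.
        result[wanted] = by_name.get(wanted, by_id.get(wanted))
    return result

# Keys as in the example above; names and lengths are placeholders.
chromosomes = {
    (2701, 'NC_001135', 4): {'name': '3', 'length': 1000},
    (2508, 'NC_001137', 2): {'name': '5', 'length': 1000},
    (2580, 'NC_001138', 4): {'name': '6', 'length': 1000},
}
mapped = map_names(chromosomes, [3, 5, 6, 47])
```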

def bbcflib::genrep::Assembly::set_assembly (   self,
  assembly 
)
Reset the Assembly attributes to correspond to *assembly*.

:param assembly: integer giving the assembly ID, or a string giving the assembly name.
def bbcflib::genrep::Assembly::sqlite_path (   self  ) 
Return the path to the sqlite file containing gene annotations.
def bbcflib::genrep::Assembly::statistics (   self,
  output = None,
  frequency = False,
  matrix_format = False 
)
Return (di-)nucleotide counts or frequencies for an assembly, and write them to the file *output* if provided.
Example of result::

    {
        "TT": 13574667
        "GG": 3344762
        "CC": 3365555
        "AA": 13571722
        "A": 32370285
        "TA": 6362526
        "GT": 4841536
        "AC": 4846697
        "N": 0
        "C": 17781115
        "TC": 6228639
        "GA": 6231575
        "CG": 3131283
        "GC": 3340219
        "CT": 5079814
        "AG": 5075950
        "G": 17758095
        "TG": 6206098
        "CA": 6204462
        "AT": 8875914
        "T": 32371931
    }

    Total = A + T + G + C

If *matrix_format* is True, *output* is like::

     >Assembly: sacCer2
    1   0.309798640038793   0.308714120881750   0.190593944221299   0.190893294858157
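The relation between counts and frequencies can be sketched for the single-nucleotide entries, using ``Total = A + T + G + C`` as stated above. ``nucleotide_frequencies`` is a hypothetical helper. The counts are copied from the example result. How the library normalizes dinucleotide counts, and the column order of the matrix format, are not specified here, so the sketch leaves both out.

```python
def nucleotide_frequencies(counts):
    """Convert single-nucleotide counts to frequencies.

    The total is A + T + G + C; 'N' is excluded, as in the note above.
    """
    total = sum(counts[base] for base in "ATGC")
    return {base: counts[base] / float(total) for base in "ATGC"}

# Single-nucleotide counts taken from the example result above.
counts = {"A": 32370285, "T": 32371931, "G": 17758095, "C": 17781115, "N": 0}
freqs = nucleotide_frequencies(counts)
```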
def bbcflib::genrep::Assembly::transcript_track (   self,
  chromlist = None,
  biotype = ["protein_coding"] 
)
Return an iterator over all protein-coding transcript annotations in the genome.

The documentation for this class was generated from the following file: