Member Function Documentation
def bbcflib::genrep::Assembly::__init__(self, assembly = None, genrep = None, intype = 0)
A representation of a GenRep assembly.
To get an assembly from the repository, call the Assembly
constructor with either the integer assembly ID or the string assembly
name. This returns an Assembly object::
a = Assembly(3)
b = Assembly('mm9')
An Assembly has the following fields:
.. attribute:: id
An integer giving the assembly ID in GenRep.
.. attribute:: name
A string giving the name of the assembly in GenRep.
.. attribute:: index_path
The absolute path to the bowtie index for this assembly.
.. attribute:: chromosomes
A dictionary of chromosomes in the assembly. The dictionary
keys are tuples of the form (chromosome id, RefSeq locus,
RefSeq version), and the values are dictionaries with the keys
'name' and 'length'.
.. attribute:: bbcf_valid
Boolean.
.. attribute:: updated_at
.. attribute:: created_at
``datetime`` objects.
.. attribute:: nr_assembly_id
.. attribute:: genome_id
.. attribute:: source_id
.. attribute:: intype
All integers. ``intype`` is 0 for genomic data, 1 for exons and 2 for transcripts.
.. attribute:: source_name
.. attribute:: md5
def bbcflib::genrep::Assembly::annot_track(self, annot_type = 'gene', chromlist = None, biotype = ["protein_coding"])
Return an iterator over all annotations of a given type in the genome.
:param annot_type: (str) one of 'gene','transcript','exon','CDS'.
:param chromlist: (list of str) return only features in the specified chromosomes.
:param biotype: (list of str, or None) return only features with the specified biotype(s).
:rtype: btrack.FeatureStream
def bbcflib::genrep::Assembly::chrmeta(self)
Return a dictionary of chromosome meta data of the type:
``{'chr1': {'length': 249250621},'chr2': {'length': 135534747},'chr3': {'length': 135006516}}``
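The relationship between the ``chromosomes`` attribute and this return value can be sketched as follows. This is an illustration on a hand-built dictionary, not the library's actual implementation: the tuple keys are placeholders, ``chrmeta_from`` is a hypothetical helper, and the lengths are copied from the example above.

```python
# Stand-in for Assembly.chromosomes. Keys are placeholder tuples of the form
# (chromosome id, RefSeq locus, RefSeq version); they are NOT real GenRep values.
chromosomes = {
    (1, 'PLACEHOLDER_1', 1): {'name': 'chr1', 'length': 249250621},
    (2, 'PLACEHOLDER_2', 1): {'name': 'chr2', 'length': 135534747},
}

def chrmeta_from(chromosomes):
    """Map each chromosome name to a dictionary holding only its length."""
    return {info['name']: {'length': info['length']}
            for info in chromosomes.values()}

meta = chrmeta_from(chromosomes)
```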
def bbcflib::genrep::Assembly::chrnames(self)
Return a list of chromosome names.
def bbcflib::genrep::Assembly::exon_track(self, chromlist = None, biotype = ["protein_coding"])
Return an iterator over all coding exon annotations in the genome.
def bbcflib::genrep::Assembly::fasta_from_regions(self, regions, out = None, path_to_ref = None, chunk = 50000, shuffled = False)
Get a fasta file with sequences corresponding to the features in the
bed or sqlite file.
Returns the name of the output file and the total size of the extracted sequence.
:param regions: (str or dict or list) bed or sqlite file name, or sequence of features.
If *regions* is a dictionary {'chr': [[start1,end1],[start2,end2]]}
or a list [['chr',start1,end1],['chr',start2,end2]],
will simply iterate through its items instead of loading a track from file.
:param out: (str or dict) output file name. If *out* is a (possibly empty) dictionary,
will return the filled dictionary.
:param path_to_ref: (str) path to a fasta file containing the whole reference sequence.
:rtype: (str,int)
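The two in-memory forms accepted for *regions* carry the same information. A minimal sketch of converting the dictionary form into the list form is given below. ``regions_to_list`` is a hypothetical helper written for illustration and is not part of ``bbcflib``. The size computation assumes each region contributes ``end - start`` bases, which is an assumption about the coordinate convention rather than something stated here.

```python
def regions_to_list(regions):
    """Flatten {'chr': [[start, end], ...]} into [['chr', start, end], ...]."""
    if isinstance(regions, dict):
        return [[chrom, start, end]
                for chrom, intervals in regions.items()
                for start, end in intervals]
    # Already a sequence of [chrom, start, end] items: copy each item.
    return [list(item) for item in regions]

as_dict = {'chr1': [[10, 20], [50, 80]], 'chr2': [[5, 15]]}
as_list = regions_to_list(as_dict)

# Assumed convention: a region [start, end] spans end - start bases.
total_size = sum(end - start for _, start, end in as_list)
```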
def bbcflib::genrep::Assembly::fasta_path(self, chromosome = None)
Return the path to the compressed fasta file, for the whole assembly or for a single chromosome.
def bbcflib::genrep::Assembly::gene_coordinates(self, id_list)
Creates a BED-style stream from a list of gene ids.
def bbcflib::genrep::Assembly::gene_track(self, chromlist = None, biotype = ["protein_coding"])
Return an iterator over all protein-coding gene annotations in the genome.
def bbcflib::genrep::Assembly::get_exon_mapping(self)
Return a dictionary ``{exon ID: ([transcript IDs],gene ID,start,end,strand,chromosome)}``
def bbcflib::genrep::Assembly::get_exons_in_trans(self)
Return a dictionary ``{transcript ID: list of exon IDs it contains}``
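Given the layout returned by ``get_exon_mapping``, the same transcript-to-exons relation can be derived by inverting it. The sketch below does this on made-up IDs and coordinates. It is only meant to show how the two dictionaries relate, not how the library computes them, and ``exons_in_transcripts`` is a hypothetical name.

```python
from collections import defaultdict

# Made-up data in the get_exon_mapping layout:
# {exon ID: ([transcript IDs], gene ID, start, end, strand, chromosome)}
exon_mapping = {
    'E1': (['T1', 'T2'], 'G1', 100, 200, 1, 'chr1'),
    'E2': (['T1'], 'G1', 300, 400, 1, 'chr1'),
}

def exons_in_transcripts(exon_mapping):
    """Invert the exon mapping into {transcript ID: [exon IDs]}."""
    result = defaultdict(list)
    for exon_id, record in exon_mapping.items():
        transcript_ids = record[0]
        for transcript_id in transcript_ids:
            result[transcript_id].append(exon_id)
    return dict(result)

exons_by_transcript = exons_in_transcripts(exon_mapping)
```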
def bbcflib::genrep::Assembly::get_features_from_gtf(self, h, chr = None, method = "dico")
Return a dictionary *data* of the form
``{key:[[values],[values],...]}`` containing the result of an SQL request whose
parameters are given as a dictionary *h*. Each [values] list corresponds to one row of the SQL result.
:param chr: (str, or list of str) chromosomes on which to perform the request. By default,
every chromosome is searched.
Available keys for *h*, and possible values:
* "keys": "$,$,..." (fields to `SELECT` and pass as a key of *data*)
* "values": "$,$,..." (fields to `SELECT` and pass as respective values of *data*)
* "conditions": "$:#,$:#,..." (filter (SQL `WHERE`))
* "uniq": "whatever" (SQL `DISTINCT` if specified, no matter what the -string- value is)
* "at_pos": "12,36,45,1124,..." (to select only features overlapping this list of positions)
where
* $ stands for any column name in the database
* # stands for any value in the database
Note: giving several field names to "keys" lets you select unique combinations of these fields.
The corresponding keys of *data* are a concatenation (by ';') of these fields.
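To make the string formats concrete, here is a hedged sketch that assembles a request dictionary *h* from Python lists and dictionaries. ``build_request`` is a hypothetical helper, and the column names ``gene_id``, ``start``, ``end`` and ``type`` are assumptions about the annotation database rather than documented fields.

```python
def build_request(keys, values, conditions=None, uniq=False, at_pos=None):
    """Assemble *h* using the comma- and colon-separated formats listed above."""
    h = {
        "keys": ",".join(keys),      # "$,$,..."
        "values": ",".join(values),  # "$,$,..."
    }
    if conditions:
        # "$:#,$:#,..." (one column:value pair per condition)
        h["conditions"] = ",".join("%s:%s" % (col, val)
                                   for col, val in conditions.items())
    if uniq:
        h["uniq"] = "1"  # the string value itself is irrelevant
    if at_pos:
        h["at_pos"] = ",".join(str(pos) for pos in at_pos)
    return h

# Column names below are assumed for illustration only.
h = build_request(["gene_id"], ["start", "end"],
                  conditions={"type": "exon"}, uniq=True)
# h could then be passed as: assembly.get_features_from_gtf(h, chr="chr1")
```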
def bbcflib::genrep::Assembly::get_gene_mapping(self)
Return a dictionary ``{geneID: (geneName, start, end, length, strand, chromosome)}``
Note that the gene's length is not the sum of the lengths of its exons.
def bbcflib::genrep::Assembly::get_links(self, params)
Returns URLs to features. Example::
assembly.get_links({'name':'ENSMUSG00000085692', 'type':'gene'})
returns the dictionary
``{"Ensembl":"http://ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000085692"}``.
If *params* is a string, then it is assumed to be the *name* parameter with ``type=gene``.
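The string shorthand described above amounts to a small normalization step. A minimal sketch, assuming nothing about the library's internals (``normalize_link_params`` is a hypothetical name):

```python
def normalize_link_params(params):
    """Turn a bare feature name into the {'name': ..., 'type': 'gene'} form."""
    if isinstance(params, str):
        return {'name': params, 'type': 'gene'}
    return dict(params)

from_string = normalize_link_params('ENSMUSG00000085692')
from_dict = normalize_link_params({'name': 'ENSMUSG00000085692', 'type': 'gene'})
same = (from_string == from_dict)
```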
def bbcflib::genrep::Assembly::get_sqlite_url(self)
Return the url of the sqlite file containing gene annotations.
def bbcflib::genrep::Assembly::get_trans_in_gene(self)
Return a dictionary ``{gene ID: list of transcript IDs it contains}``
def bbcflib::genrep::Assembly::get_transcript_mapping(self)
Return a dictionary ``{transcript ID: (gene ID,start,end,length,strand,chromosome)}``
def bbcflib::genrep::Assembly::map_chromosome_names(self, names)
Finds keys in the `chromosomes` dictionary that correspond to the names or ids given as `names`.
Returns a dictionary, such as::
assembly.map_chromosome_names([3,5,6,47])
{'3': (2701, u'NC_001135', 4),
'47': None,
'5': (2508, u'NC_001137', 2),
'6': (2580, u'NC_001138', 4)}
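The lookup can be pictured with the following sketch, which works on a hand-built ``chromosomes`` dictionary and matches each requested name against the 'name' field. Matching on the 'name' field only is an assumption; the real method may also match on ids or query GenRep. The tuple keys are taken from the example above, the chromosome names '3', '5' and '6' are assumed from it, and the lengths are placeholders.

```python
# Hand-built stand-in for Assembly.chromosomes; lengths are placeholders.
chromosomes = {
    (2701, 'NC_001135', 4): {'name': '3', 'length': 0},
    (2508, 'NC_001137', 2): {'name': '5', 'length': 0},
    (2580, 'NC_001138', 4): {'name': '6', 'length': 0},
}

def map_names(chromosomes, names):
    """Map str(name) to the matching chromosomes key, or None if nothing matches."""
    by_name = {info['name']: key for key, info in chromosomes.items()}
    return {str(name): by_name.get(str(name)) for name in names}

mapped = map_names(chromosomes, [3, 5, 6, 47])
```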
def bbcflib::genrep::Assembly::set_assembly(self, assembly)
Reset the Assembly attributes to correspond to *assembly*.
:param assembly: integer giving the assembly ID, or a string giving the assembly name.
def bbcflib::genrep::Assembly::sqlite_path(self)
Return the path to the sqlite file containing gene annotations.
def bbcflib::genrep::Assembly::statistics(self, output = None, frequency = False, matrix_format = False)
Return (di-)nucleotide counts or frequencies for an assembly, and write them to the file *output* if provided.
Example of result::
{
"TT": 13574667
"GG": 3344762
"CC": 3365555
"AA": 13571722
"A": 32370285
"TA": 6362526
"GT": 4841536
"AC": 4846697
"N": 0
"C": 17781115
"TC": 6228639
"GA": 6231575
"CG": 3131283
"GC": 3340219
"CT": 5079814
"AG": 5075950
"G": 17758095
"TG": 6206098
"CA": 6204462
"AT": 8875914
"T": 32371931
}
Total = A + T + G + C
If *matrix_format* is True, *output* is like::
>Assembly: sacCer2
1 0.309798640038793 0.308714120881750 0.190593944221299 0.190893294858157
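The relation between counts and frequencies can be sketched as below, using the single-nucleotide counts from the example result and the stated total A + T + G + C. How dinucleotide frequencies are normalized is not stated here, so the sketch covers single nucleotides only. ``nucleotide_frequencies`` is a hypothetical helper, not the library's code.

```python
# Single-nucleotide counts copied from the example result above.
counts = {"A": 32370285, "T": 32371931, "G": 17758095, "C": 17781115, "N": 0}

def nucleotide_frequencies(counts):
    """Divide each of A, T, G, C by Total = A + T + G + C (N is left out)."""
    total = sum(counts[base] for base in "ATGC")
    return {base: counts[base] / total for base in "ATGC"}

freqs = nucleotide_frequencies(counts)
```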
def bbcflib::genrep::Assembly::transcript_track(self, chromlist = None, biotype = ["protein_coding"])
Return an iterator over all protein-coding transcript annotations in the genome.
The documentation for this class was generated from the following file:
- /home/rougemon/Data/pipelines/libv2/bbcflib/bbcflib/genrep.py