goenrich package
Submodules
goenrich.enrich module
goenrich.enrich.analyze(O, query, background_attribute, **kwargs)
Run an enrichment analysis for the query.
>>> O = goenrich.obo.ontology('db/go-basic.obo')
>>> gene2go = goenrich.read.gene2go('db/gene2go.gz')
>>> values = {k: set(v) for k,v in gene2go.groupby('GO_ID')['GeneID']}
>>> goenrich.enrich.propagate(O, values, 'gene2go')
>>> df = goenrich.enrich.analyze(O, query, ...)
Parameters: - O – ontology graph after the background was set
- query – array-like of ids
- background_attribute – node attribute associated with the background set
Returns: pandas.DataFrame with results
goenrich.enrich.calculate_pvalues(nodes, query, background_attribute, M, min_category_size=3, max_category_size=500, max_category_depth=5, **kwargs)
Calculate p-values for all categories in the graph.
Parameters: - nodes – nodes dictionary from the ontology graph after background was set
- query – set of identifiers for which the p-value is calculated
- background_attribute – node attribute associated with the background set
- M – background size, total number of genes in the data
- min_category_size – categories smaller than this number are ignored
- max_category_size – categories larger than this number are ignored
- max_category_depth – categories deeper than this in the hierarchy (more specific) are ignored
Returns: pvalues, x, n
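The page does not say which statistical test produces the p-values. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for enrichment analysis) and assuming that x is the overlap between query and category and n is the category size, the per-category calculation could look like the following. The function names `hypergeom_sf` and `enrichment_pvalue` are hypothetical and not part of goenrich.

```python
from math import comb


def hypergeom_sf(x, M, n, N):
    """P(X >= x) when drawing N items from M, of which n belong to the category."""
    upper = min(n, N)
    total = comb(M, N)
    # math.comb returns 0 when k > n, so impossible terms drop out naturally.
    hits = sum(comb(n, k) * comb(M - n, N - k) for k in range(x, upper + 1))
    return hits / total


def enrichment_pvalue(category, query, M):
    """Return (pvalue, x, n) for one category; an assumed reading of the return values."""
    x = len(category & query)  # query members annotated with the category
    n = len(category)          # category size in the background
    N = len(query)             # query size
    return hypergeom_sf(x, M, n, N), x, n


p, x, n = enrichment_pvalue({1, 2, 3}, {1, 2, 4, 5}, M=10)
```

Filtering by min_category_size, max_category_size and max_category_depth is left out of this sketch.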
goenrich.enrich.induced_subgraph(O, terms)
Extract a subgraph from O that includes the provided terms and all terms above them in the hierarchy.
>>> df = goenrich.enrich.analyze(O, ...)
>>> G = goenrich.enrich.induced_subgraph(O, df[df.rejected]['terms'])
Parameters: - O – ontology graph
- terms – a list of terms to extract
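To illustrate what "all higher hierarchy" can mean, here is a minimal sketch that collects the given terms together with all of their ancestors. It assumes the ontology is given as a plain dict mapping each term to its direct parents, which is a simplification and not the graph type the library works with; `ancestors_closure` is a hypothetical name.

```python
def ancestors_closure(parents, terms):
    """Return the given terms plus every term reachable by following parent links."""
    keep = set()
    stack = list(terms)
    while stack:
        node = stack.pop()
        if node in keep:
            continue  # already visited through another path
        keep.add(node)
        stack.extend(parents.get(node, ()))
    return keep


# Hypothetical toy hierarchy: c -> b -> a, and d -> a.
parents = {'c': {'b'}, 'b': {'a'}, 'd': {'a'}}
selected = ancestors_closure(parents, ['c'])
```

The induced subgraph would then consist of the selected nodes and the edges between them.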
goenrich.enrich.multiple_testing_correction(ps, alpha=0.05, method='benjamini-hochberg', **kwargs)
Correct p-values for multiple testing and add corrected q-values.
Parameters: - ps – list of p-values
- alpha – significance level (default: 0.05)
- method – multiple testing correction method [bonferroni|benjamini-hochberg]
Returns (q, rej): two lists of q-values and rejected nodes
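For readers unfamiliar with the default method, the following is a minimal pure-Python sketch of the standard Benjamini-Hochberg step-up adjustment. It is not the library's implementation; the name `benjamini_hochberg` and the choice to return rejections as a list of booleans are assumptions made for illustration.

```python
def benjamini_hochberg(ps, alpha=0.05):
    """Return (q, rej): adjusted q-values and a boolean rejection flag per input p-value."""
    m = len(ps)
    order = sorted(range(m), key=lambda i: ps[i])  # indices by ascending p-value
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, ps[i] * m / rank)
        q[i] = running_min
    rej = [qi <= alpha for qi in q]
    return q, rej


q, rej = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A Bonferroni correction would instead multiply every p-value by the number of tests and cap the result at 1.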
goenrich.enrich.propagate(O, values, attribute)
Propagate values through the hierarchy.
>>> O = goenrich.obo.ontology('db/go-basic.obo')
>>> gene2go = goenrich.read.gene2go('db/gene2go.gz')
>>> values = {k: set(v) for k,v in gene2go.groupby('GO_ID')['GeneID']}
>>> goenrich.enrich.propagate(O, values, 'gene2go')
Uses topological sorting of the vertices. Since node degrees are usually low, the running time is almost linear.
Parameters: - O – ontology graph
- values – mapping of nodes to set of ids
- attribute – name of the attribute
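The idea of propagating along a topological order can be sketched with the standard library alone. This example assumes that values flow upward, from specific terms to their ancestors, and that the hierarchy is a plain dict mapping each term to its direct parents; both are assumptions for illustration, and `propagate_sketch` is a hypothetical name, not the library function.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def propagate_sketch(parents, values):
    """Return a dict mapping each term to its own ids plus those of all descendants."""
    # TopologicalSorter expects node -> predecessors; using children as
    # predecessors guarantees that every child is visited before its parents.
    children = {}
    for child, direct_parents in parents.items():
        children.setdefault(child, set())
        for parent in direct_parents:
            children.setdefault(parent, set()).add(child)
    for node in values:
        children.setdefault(node, set())

    result = {node: set(values.get(node, ())) for node in children}
    for node in TopologicalSorter(children).static_order():
        for parent in parents.get(node, ()):
            result[parent] |= result[node]  # push the accumulated set one level up
    return result


# Hypothetical toy hierarchy: two leaves under 'mid', which sits under 'root'.
parents = {'leaf1': {'mid'}, 'leaf2': {'mid'}, 'mid': {'root'}}
values = {'leaf1': {1, 2}, 'leaf2': {2, 3}, 'mid': {4}}
result = propagate_sketch(parents, values)
```

Each edge is handled once, so the cost is one set union per edge plus the sort itself.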
goenrich.export module
goenrich.export.to_frame(nodes, **kwargs)
Export node attributes and key-value pairs to a pd.DataFrame.
>>> to_frame(nodes, term = terms, pvalues = ps, ...)
Parameters: - nodes – list of dictionaries with node attributes
- **kwargs – additional columns
Returns: a pd.DataFrame
goenrich.export.to_graphviz(G, gvfile, graph_label='', **kwargs)
Export the graph of significant findings to a dot file. A PNG can be generated from the command line using Graphviz.
>>> import subprocess
>>> subprocess.call(['dot', '-Tpng', 'filepath.dot', '-o', 'filepath.png'])
Parameters: - G – the graph to be exported
- gvfile – file or filepath
- graph_label – label for the graph; for an empty label pass graph_label=''
goenrich.obo module
goenrich.read module
parsers for different go-annotation formats
goenrich.read.gene2go(filename, experimental=False, tax_id=9606, **kwds)
Read a GO annotation file.
Parameters: - filename – path to the gene2go annotation file
- experimental – use only experimentally validated annotations
- tax_id – filter according to taxon