goenrich package

Submodules

goenrich.enrich module

goenrich.enrich.analyze(O, query, background_attribute, **kwargs)

run enrichment analysis for query

>>> O = goenrich.obo.ontology('db/go-basic.obo')
>>> gene2go = goenrich.read.gene2go('db/gene2go.gz')
>>> values = {k: set(v) for k,v in gene2go.groupby('GO_ID')['GeneID']}
>>> goenrich.enrich.propagate(O, values, 'gene2go')
>>> df = goenrich.enrich.analyze(O, query, ...)
Parameters:
  • O – ontology graph after the background was set
  • query – array-like of ids
  • background_attribute – node attribute associated with the background set
Returns:

pandas.DataFrame with results

goenrich.enrich.calculate_pvalues(nodes, query, background_attribute, M, min_category_size=3, max_category_size=500, max_category_depth=5, **kwargs)

calculate pvalues for all categories in the graph

Parameters:
  • nodes – nodes dictionary from the ontology graph after background was set
  • query – set of identifiers for which the p-values are calculated
  • background_attribute – node attribute associated with the background set
  • M – background size, total number of genes in the data
  • min_category_size – categories smaller than this number are ignored
  • max_category_size – categories larger than this number are ignored
  • max_category_depth – categories deeper than this in the hierarchy (more specific) are ignored
Returns:

pvalues, x, n
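
The source does not name the statistical test. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test, and the parameter M together with the return values x and n matches its usual notation. The following is a minimal standard-library sketch under that assumption. `hypergeom_sf` and `category_pvalues` are hypothetical helpers, not part of goenrich. The depth filter is omitted, and every query id is assumed to belong to the background.

```python
from math import comb

def hypergeom_sf(x, M, n, N):
    """P(X >= x) when drawing N ids from M, of which n belong to the category."""
    upper = min(n, N)
    hits = sum(comb(n, k) * comb(M - n, N - k) for k in range(x, upper + 1))
    return hits / comb(M, N)

def category_pvalues(categories, query, M,
                     min_category_size=3, max_category_size=500):
    """Return {term: (pvalue, x, n)} for categories within the size limits."""
    query = set(query)
    N = len(query)  # assumption: all query ids are in the background
    results = {}
    for term, members in categories.items():
        n = len(members)
        if not (min_category_size <= n <= max_category_size):
            continue  # category too small or too large, ignored
        x = len(query & members)  # overlap between query and category
        results[term] = (hypergeom_sf(x, M, n, N), x, n)
    return results

# Toy data: GO:B has only two members and is skipped by the size filter.
categories = {'GO:A': {1, 2, 3, 4}, 'GO:B': {5, 6}}
results = category_pvalues(categories, query={1, 2, 3, 10}, M=20)
```

In this toy case only `GO:A` is tested, with x = 3 of its n = 4 members present in the query.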

goenrich.enrich.induced_subgraph(O, terms)

Extracts a subgraph from O that includes the provided terms and all of their ancestors higher in the hierarchy

>>> df = goenrich.enrich.analyze(O, ...)
>>> G = goenrich.enrich.induced_subgraph(O, df[df.rejected]['terms'])
Parameters:
  • O – ontology graph
  • terms – a list of terms to extract
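
The idea of keeping the given terms plus everything above them can be sketched without a graph library. This is an illustration only: the ontology is modelled as a plain dict mapping each term to its parents, and `induced_subgraph_sketch` is a hypothetical name, not the goenrich function.

```python
def induced_subgraph_sketch(parents, terms):
    """Return a parent mapping restricted to `terms` and all their ancestors."""
    keep = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in keep:
            continue  # already visited via another path
        keep.add(term)
        stack.extend(parents.get(term, ()))  # climb towards the root
    return {t: [p for p in parents.get(t, ()) if p in keep] for t in keep}

# Toy hierarchy: c -> a -> root, d -> b -> root
parents = {'root': [], 'a': ['root'], 'b': ['root'], 'c': ['a'], 'd': ['b']}
sub = induced_subgraph_sketch(parents, ['c'])
```

Here `sub` contains `c`, `a` and `root`, while the unrelated branch through `b` and `d` is dropped.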
goenrich.enrich.multiple_testing_correction(ps, alpha=0.05, method='benjamini-hochberg', **kwargs)

correct pvalues for multiple testing and add corrected q value

Parameters:
  • ps – list of pvalues
  • alpha – significance level (default: 0.05)
  • method – multiple testing correction method [bonferroni|benjamini-hochberg]
Returns:

(q, rej) – two lists of q-values and rejected nodes
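
For orientation, the two named methods can be written in a few lines of plain Python. This is a textbook sketch, not the goenrich implementation: the function names are hypothetical, and rejecting when q <= alpha (rather than q < alpha) is an assumption.

```python
def bonferroni(ps, alpha=0.05):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(ps)
    qs = [min(1.0, p * m) for p in ps]
    return qs, [q <= alpha for q in qs]

def benjamini_hochberg(ps, alpha=0.05):
    """Step-up FDR adjustment; q-values are returned in the input order."""
    m = len(ps)
    order = sorted(range(m), key=lambda i: ps[i])  # indices by ascending p
    qs = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, ps[i] * m / rank)
        qs[i] = running_min
    return qs, [q <= alpha for q in qs]

ps = [0.01, 0.04, 0.03, 0.5]
q_bh, rej_bh = benjamini_hochberg(ps)
q_bf, rej_bf = bonferroni(ps)
```

With these four p-values, both methods reject only the first hypothesis at alpha = 0.05.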

goenrich.enrich.propagate(O, values, attribute)

Propagate values through the hierarchy

>>> O = goenrich.obo.ontology('db/go-basic.obo')
>>> gene2go = goenrich.read.gene2go('db/gene2go.gz')
>>> values = {k: set(v) for k,v in gene2go.groupby('GO_ID')['GeneID']}
>>> goenrich.enrich.propagate(O, values, 'gene2go')

Uses topological sorting of the vertices. Since node degrees are usually low, performance is almost linear in time.

Parameters:
  • O – ontology graph
  • values – mapping of nodes to set of ids
  • attribute – name of the attribute
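
The topological-sort approach described above can be sketched with the standard library (`graphlib`, Python 3.9+). This is an illustration under stated assumptions, not the goenrich code: the ontology is a dict mapping each term to its parents, values flow from specific terms up to their ancestors, and `propagate_sketch` is a hypothetical name.

```python
from graphlib import TopologicalSorter

def propagate_sketch(parents, values):
    """Merge each term's id set into all of its ancestors."""
    # TopologicalSorter expects node -> predecessors. Using the children of
    # a term as its predecessors yields specific terms before general ones.
    children = {term: set() for term in parents}
    for term, term_parents in parents.items():
        for p in term_parents:
            children.setdefault(p, set()).add(term)
    merged = {term: set(values.get(term, ())) for term in children}
    for term in TopologicalSorter(children).static_order():
        # By the time a term is reached, all of its children have been merged.
        for p in parents.get(term, ()):
            merged[p] |= merged[term]
    return merged

# Toy hierarchy where c has two parents
parents = {'root': [], 'a': ['root'], 'b': ['root'], 'c': ['a', 'b']}
values = {'a': {2}, 'b': {3}, 'c': {1}}
merged = propagate_sketch(parents, values)
```

Each edge is handled once with a single set union, which is where the near-linear behaviour for low-degree graphs comes from.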

goenrich.export module

goenrich.export.to_frame(nodes, **kwargs)

export node attributes and key-value pairs to pd.DataFrame

>>> to_frame(nodes, term = terms, pvalues = ps, ...)
Parameters:
  • nodes – list of dictionaries with node attributes
  • **kwargs – additional columns

Returns:

a pd.DataFrame

goenrich.export.to_graphviz(G, gvfile, graph_label='', **kwargs)

export graph of significant findings to a dot file. A png can be generated from the command line using graphviz

>>> import subprocess
>>> # '>' is not interpreted without a shell, so use dot's -o option
>>> subprocess.call(['dot', '-Tpng', 'filepath.dot', '-o', 'filepath.png'])
Parameters:
  • G – the graph to be exported
  • gvfile – file or filepath
  • graph_label – For an empty label pass graph_label=''.
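
As a rough picture of what a dot file contains, the sketch below writes a tiny directed graph in the DOT language. It is illustrative only: `to_dot` is a hypothetical helper, the child-to-parent edge direction is an assumption, and the real output of to_graphviz is likely to carry more node attributes.

```python
def to_dot(edges, graph_label=''):
    """Return DOT text for a directed graph given (child, parent) edges."""
    lines = ['digraph G {']
    if graph_label:
        lines.append(f'  label="{graph_label}";')
    for child, parent in edges:
        # Quote node names because GO ids contain a colon.
        lines.append(f'  "{child}" -> "{parent}";')
    lines.append('}')
    return '\n'.join(lines)

dot_text = to_dot([('GO:c', 'GO:a'), ('GO:a', 'GO:root')], graph_label='example')
```

Writing `dot_text` to a file gives an input that the dot command can render.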

goenrich.obo module

goenrich.obo.ontology(file)

read ontology from file

Parameters:
  • file – file path or file handle

goenrich.read module

parsers for different go-annotation formats

goenrich.read.gene2go(filename, experimental=False, tax_id=9606, **kwds)

read go-annotation file

Parameters:
  • filename – path of the annotation file
  • experimental – use only experimentally validated annotations
  • tax_id – filter according to taxon
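
The two filters can be pictured with a small standard-library reader. This is a sketch under assumptions, not the goenrich parser: `read_gene2go_rows` is a hypothetical helper, the column names follow the header of the NCBI gene2go file (the real file has further columns), and which evidence codes goenrich treats as experimental is not stated in the source, so the core GO experimental codes are used here.

```python
import csv
import io

# Assumption: the core GO experimental evidence codes.
EXPERIMENTAL_CODES = {'EXP', 'IDA', 'IPI', 'IMP', 'IGI', 'IEP'}

def read_gene2go_rows(handle, tax_id=9606, experimental=False):
    """Return GeneID/GO_ID pairs for one taxon, optionally experimental only."""
    rows = []
    for row in csv.DictReader(handle, delimiter='\t'):
        if int(row['#tax_id']) != tax_id:
            continue  # annotation belongs to another taxon
        if experimental and row['Evidence'] not in EXPERIMENTAL_CODES:
            continue  # e.g. IEA, inferred from electronic annotation
        rows.append({'GeneID': int(row['GeneID']), 'GO_ID': row['GO_ID']})
    return rows

# Made-up sample rows, for illustration only
sample = (
    '#tax_id\tGeneID\tGO_ID\tEvidence\n'
    '9606\t1\tGO:0005576\tIDA\n'
    '9606\t2\tGO:0005576\tIEA\n'
    '10090\t3\tGO:0005576\tIDA\n'
)
all_rows = read_gene2go_rows(io.StringIO(sample))
exp_rows = read_gene2go_rows(io.StringIO(sample), experimental=True)
```

With the defaults, the mouse row (taxon 10090) is dropped. With `experimental=True`, the IEA row is dropped as well.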
goenrich.read.goa(filename, experimental=True, **kwds)

read go-annotation file

Parameters:
  • filename – path of the annotation file
  • experimental – use only experimentally validated annotations
goenrich.read.sgd(filename, experimental=False, **kwds)

read yeast genome database go-annotation file

Parameters:
  • filename – path of the annotation file
  • experimental – use only experimentally validated annotations

Module contents