Gene Enrichment Analysis Utilities (indra_cogex.client.enrichment.utils)

Utility functions for gene enrichment analysis.

Utilities for getting gene sets.

collect_gene_sets(query, *, client, background_gene_ids=None, include_ontology_children=False, cache_file=None)

Collect gene sets based on the given query.

Parameters:
  • query (str) – A Cypher query.

  • client (Neo4jClient) – The Neo4j client.

  • background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

  • include_ontology_children (bool) – If True, extend the gene set associations with associations from child terms, using the INDRA ontology.

  • cache_file (Optional[Path]) – The path to the cache file.

Return type:

Dict[Tuple[str, str], Set[str]]

Returns:

A dictionary whose keys are 2-tuples of the CURIE and name of each queried item and whose values are sets of HGNC gene identifiers (as strings).
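The two optional behaviours described by include_ontology_children and background_gene_ids can be illustrated on toy data. The sketch below is not the package's implementation: the helpers extend_with_children and restrict_to_background, the children mapping, and the EX: identifiers are all hypothetical. It also assumes that the background is applied by intersecting each gene set with it, which is one plausible reading of the parameter description.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Same shape as the documented return type: (CURIE, name) -> set of HGNC IDs.
GeneSets = Dict[Tuple[str, str], Set[str]]


def extend_with_children(gene_sets: GeneSets, children: Dict[str, Set[str]]) -> GeneSets:
    """Add the genes of every child term to its parent term.

    ``children`` is a hypothetical mapping from a parent CURIE to the CURIEs
    of all of its descendant terms (assumed to be precomputed).
    """
    by_curie = {curie: genes for (curie, _name), genes in gene_sets.items()}
    extended: GeneSets = {}
    for (curie, name), genes in gene_sets.items():
        merged = set(genes)  # copy so the input is not mutated
        for child in children.get(curie, set()):
            merged |= by_curie.get(child, set())
        extended[(curie, name)] = merged
    return extended


def restrict_to_background(
    gene_sets: GeneSets, background_gene_ids: Optional[Iterable[str]] = None
) -> GeneSets:
    """Keep only genes in the background; with no background, return the input."""
    if background_gene_ids is None:
        return gene_sets
    background = set(background_gene_ids)
    return {key: genes & background for key, genes in gene_sets.items()}


# Toy data: identifiers are made up for illustration.
toy = {
    ("EX:0000001", "parent term"): {"100"},
    ("EX:0000002", "child term"): {"200", "300"},
}
toy_children = {"EX:0000001": {"EX:0000002"}}

extended = extend_with_children(toy, toy_children)
restricted = restrict_to_background(extended, ["100", "200"])
print(sorted(restricted[("EX:0000001", "parent term")]))  # ['100', '200']
```

Keeping the (CURIE, name) tuple as the key means downstream code can report a readable label without a second lookup.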

get_entity_to_regulators(*, client, background_gene_ids=None, minimum_evidence_count=1, minimum_belief=0.0)

Get a mapping from each entity in the INDRA database to the set of human genes that are causally upstream of it.

Parameters:
  • client (Neo4jClient) – The Neo4j client.

  • background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

  • minimum_evidence_count (Optional[int]) – The minimum number of evidences for a relationship to count as a regulator. Defaults to 1 (i.e., cutoff not applied).

  • minimum_belief (Optional[float]) – The minimum belief for a relationship to count as a regulator. Defaults to 0.0 (i.e., cutoff not applied).

Return type:

Dict[Tuple[str, str], Set[str]]

Returns:

A dictionary whose keys are 2-tuples of the CURIE and name of each entity and whose values are sets of HGNC gene identifiers (as strings).
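To make the effect of minimum_evidence_count and minimum_belief concrete, here is a minimal sketch on made-up records. The Relation record and the group_regulators helper are hypothetical and do not reflect how the package queries Neo4j. The sketch assumes both cutoffs are inclusive lower bounds, which is consistent with the defaults (1 and 0.0) filtering nothing out.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Relation(NamedTuple):
    """Hypothetical record for one gene -> entity causal relation."""

    gene_id: str  # HGNC identifier of the upstream gene
    entity_curie: str  # CURIE of the downstream entity
    entity_name: str
    evidence_count: int
    belief: float


def group_regulators(
    relations: Iterable[Relation],
    minimum_evidence_count: int = 1,
    minimum_belief: float = 0.0,
) -> Dict[Tuple[str, str], Set[str]]:
    """Group upstream genes by entity, dropping relations below either cutoff."""
    result = defaultdict(set)
    for rel in relations:
        if rel.evidence_count < minimum_evidence_count:
            continue  # too few evidences
        if rel.belief < minimum_belief:
            continue  # belief too low
        result[(rel.entity_curie, rel.entity_name)].add(rel.gene_id)
    return dict(result)


# Toy relations: all identifiers and scores are invented.
relations = [
    Relation("11", "EX:A", "entity A", 5, 0.90),
    Relation("22", "EX:A", "entity A", 1, 0.95),
    Relation("33", "EX:A", "entity A", 4, 0.40),
]

strict = group_regulators(relations, minimum_evidence_count=2, minimum_belief=0.5)
print(strict)  # {('EX:A', 'entity A'): {'11'}}
```

With the default arguments all three toy genes are kept, matching the "cutoff not applied" wording above.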

get_entity_to_targets(*, client, background_gene_ids=None, minimum_evidence_count=1, minimum_belief=0.0)

Get a mapping from each entity in the INDRA database to the set of human genes that it regulates.

Parameters:
  • client (Neo4jClient) – The Neo4j client.

  • background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

  • minimum_evidence_count (Optional[int]) – The minimum number of evidences for a relationship to count as a regulator. Defaults to 1 (i.e., cutoff not applied).

  • minimum_belief (Optional[float]) – The minimum belief for a relationship to count as a regulator. Defaults to 0.0 (i.e., cutoff not applied).

Return type:

Dict[Tuple[str, str], Set[str]]

Returns:

A dictionary whose keys are 2-tuples of the CURIE and name of each entity and whose values are sets of HGNC gene identifiers (as strings).

get_go(*, background_gene_ids=None, client)

Get GO gene sets.

Parameters:
  • client (Neo4jClient) – The Neo4j client.

  • background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

Return type:

Dict[Tuple[str, str], Set[str]]

Returns:

A dictionary whose keys are 2-tuples of the CURIE and name of each GO term and whose values are sets of HGNC gene identifiers (as strings).
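A dictionary of this shape is the natural input to an over-representation test. The sketch below shows one way that could look, using a one-sided hypergeometric test built from the standard library. It is an illustration only, not the package's own enrichment code: hypergeom_sf and over_representation are hypothetical names, the data is invented, and the sketch assumes that the query genes and every gene set are subsets of a background of known size.

```python
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """Return P(X >= k) for a hypergeometric variable X.

    X counts how many of ``draws`` genes, sampled without replacement from
    ``population`` genes, fall in a gene set of size ``successes``.
    """
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)


def over_representation(
    query_genes: Iterable[str],
    gene_sets: Dict[Tuple[str, str], Set[str]],
    background_size: int,
) -> List[Tuple[str, str, int, float]]:
    """Score each gene set against the query; rows are sorted by p-value."""
    query = set(query_genes)
    rows = []
    for (curie, name), genes in gene_sets.items():
        overlap = len(query & genes)
        p_value = hypergeom_sf(overlap, background_size, len(genes), len(query))
        rows.append((curie, name, overlap, p_value))
    return sorted(rows, key=lambda row: row[3])


# Toy gene sets keyed by (CURIE, name); identifiers are invented.
toy_sets = {
    ("EX:0000001", "term one"): {"1", "2", "3"},
    ("EX:0000002", "term two"): {"4", "5", "6"},
}
for row in over_representation({"1", "2"}, toy_sets, background_size=10):
    print(row)
```

No multiple-testing correction is applied here; a real analysis over thousands of terms would need one.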

get_phenotype_gene_sets(*, background_gene_ids=None, client)

Get HPO phenotype gene sets.

Parameters:
  • client (Neo4jClient) – The Neo4j client.

  • background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

Return type:

Dict[Tuple[str, str], Set[str]]

Returns:

A dictionary whose keys are 2-tuples of the CURIE and name of each phenotype gene set and whose values are sets of HGNC gene identifiers (as strings).

get_reactome(*, background_gene_ids=None, client)

Get Reactome gene sets.

Parameters:
  • client (Neo4jClient) – The Neo4j client.

  • background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

Return type:

Dict[Tuple[str, str], Set[str]]

Returns:

A dictionary whose keys are 2-tuples of the CURIE and name of each Reactome pathway and whose values are sets of HGNC gene identifiers (as strings).

get_wikipathways(*, background_gene_ids=None, client)

Get WikiPathways gene sets.

Parameters:
  • client (Neo4jClient) – The Neo4j client.

  • background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

Return type:

Dict[Tuple[str, str], Set[str]]

Returns:

A dictionary whose keys are 2-tuples of the CURIE and name of each WikiPathways pathway and whose values are sets of HGNC gene identifiers (as strings).
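Because every getter in this module returns the same Dict[Tuple[str, str], Set[str]] shape, results from several sources can be combined and searched with plain dictionary operations. The following is a small sketch on invented data; merge_gene_sets and sets_containing are hypothetical helpers, not part of the package.

```python
from typing import Dict, List, Set, Tuple

GeneSets = Dict[Tuple[str, str], Set[str]]


def merge_gene_sets(*sources: GeneSets) -> GeneSets:
    """Combine several gene-set dictionaries, taking the union on shared keys."""
    merged: GeneSets = {}
    for source in sources:
        for key, genes in source.items():
            merged.setdefault(key, set()).update(genes)
    return merged


def sets_containing(gene_id: str, gene_sets: GeneSets) -> List[Tuple[str, str]]:
    """List the (CURIE, name) keys of every gene set that includes the gene."""
    return sorted(key for key, genes in gene_sets.items() if gene_id in genes)


# Stand-ins for the output of two different getters; identifiers are invented.
pathways_a = {("EXA:1", "pathway one"): {"10", "20"}}
pathways_b = {("EXB:7", "pathway seven"): {"20", "30"}}

combined = merge_gene_sets(pathways_a, pathways_b)
print(sets_containing("20", combined))
# [('EXA:1', 'pathway one'), ('EXB:7', 'pathway seven')]
```

Since CURIEs carry their source prefix, keys from different resources should not collide in practice, so the union on shared keys mainly guards against passing the same source twice.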