Gene Enrichment Analysis Utilities (indra_cogex.client.enrichment.utils)
Utility functions for gene enrichment analysis.
Utilities for getting gene sets.
- collect_gene_sets(query, *, client, background_gene_ids=None, include_ontology_children=False, cache_file=None, force_cache_refresh=False)
Collect gene sets based on the given query.
- Parameters:
  - query (str) – A Cypher query.
  - client (Neo4jClient) – The Neo4j client.
  - background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.
  - include_ontology_children (bool) – If True, extend the gene set associations with associations from child terms using the INDRA ontology.
  - force_cache_refresh (bool) – If True, the cache will be ignored and the query will be run again. The current results will overwrite any existing cache.
- Return type:
- Returns:
A dictionary whose keys are 2-tuples of the CURIE and name of each queried item and whose values are sets of HGNC gene identifiers (as strings).
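To make the documented return shape concrete, here is a small sketch that uses made-up data and no Neo4j connection. The helper `restrict_to_background` is hypothetical and not part of indra_cogex; it assumes that applying a background means intersecting each gene set with the background identifiers. The CURIEs, names, and HGNC IDs are placeholders.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Shape described above: (CURIE, name) -> set of HGNC IDs as strings.
GeneSets = Dict[Tuple[str, str], Set[str]]

# Toy data only; these are not real terms or genes.
example_gene_sets: GeneSets = {
    ("EX:0001", "example term A"): {"100", "200", "300"},
    ("EX:0002", "example term B"): {"200", "400"},
}


def restrict_to_background(
    gene_sets: GeneSets,
    background_gene_ids: Optional[Iterable[str]] = None,
) -> GeneSets:
    """Hypothetical helper: keep only genes that are in the background.

    Assumption: with no background given, every gene is kept.
    """
    if background_gene_ids is None:
        return {key: set(genes) for key, genes in gene_sets.items()}
    background = set(background_gene_ids)
    return {key: genes & background for key, genes in gene_sets.items()}


restricted = restrict_to_background(example_gene_sets, ["200", "300"])
for (curie, name), genes in sorted(restricted.items()):
    print(curie, name, sorted(genes))
```

Keying on the (CURIE, name) pair keeps a human-readable label next to each identifier without a second lookup table.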
- get_entity_to_regulators(*, client, background_gene_ids=None, minimum_evidence_count=1, minimum_belief=0.0, force_cache_refresh=False)
Get a mapping from each entity in the INDRA database to the set of human genes that are causally upstream of it.
- Parameters:
  - client (Neo4jClient) – The Neo4j client.
  - background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.
  - minimum_evidence_count (Optional[int]) – The minimum number of evidences for a relationship to count it as a regulator. Defaults to 1 (i.e., cutoff not applied).
  - minimum_belief (Optional[float]) – The minimum belief for a relationship to count it as a regulator. Defaults to 0.0 (i.e., cutoff not applied).
  - force_cache_refresh (bool) – If True, the cache will be ignored and the query will be run again. Any existing cache will be overwritten.
- Return type:
- Returns:
A dictionary whose keys are 2-tuples of the CURIE and name of each entity and whose values are sets of HGNC gene identifiers (as strings).
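The two cutoffs can be pictured as filters applied before relationships are grouped by entity. The sketch below runs on toy records rather than the INDRA database. The `Relation` record and `group_genes_by_entity` function are hypothetical, and treating both thresholds as inclusive lower bounds is an assumption, not something the documentation above states.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class Relation(NamedTuple):
    """Hypothetical record for one entity-gene relationship."""

    entity_curie: str
    entity_name: str
    gene_id: str
    evidence_count: int
    belief: float


def group_genes_by_entity(
    relations: Iterable[Relation],
    minimum_evidence_count: Optional[int] = 1,
    minimum_belief: Optional[float] = 0.0,
) -> Dict[Tuple[str, str], Set[str]]:
    """Group gene IDs by (CURIE, name), dropping weak relationships.

    Assumption: a relationship passes when its value is >= the cutoff,
    and a cutoff of None disables that filter.
    """
    grouped: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for rel in relations:
        if minimum_evidence_count is not None and rel.evidence_count < minimum_evidence_count:
            continue
        if minimum_belief is not None and rel.belief < minimum_belief:
            continue
        grouped[(rel.entity_curie, rel.entity_name)].add(rel.gene_id)
    return dict(grouped)


# Toy relationships; identifiers and scores are made up.
relations = [
    Relation("EX:1", "entity one", "100", 3, 0.9),
    Relation("EX:1", "entity one", "200", 1, 0.4),
    Relation("EX:2", "entity two", "100", 2, 0.7),
]

with_defaults = group_genes_by_entity(relations)
with_cutoffs = group_genes_by_entity(relations, minimum_evidence_count=2, minimum_belief=0.8)
print(with_defaults)
print(with_cutoffs)
```

With the defaults every toy relationship passes, which matches the "cutoff not applied" wording; raising either threshold can remove an entity from the result entirely.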
- get_entity_to_targets(*, client, background_gene_ids=None, minimum_evidence_count=1, minimum_belief=0.0, force_cache_refresh=False)
Get a mapping from each entity in the INDRA database to the set of human genes that it regulates.
- Parameters:
  - client (Neo4jClient) – The Neo4j client.
  - background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.
  - minimum_evidence_count (Optional[int]) – The minimum number of evidences for a relationship to count it as a regulator. Defaults to 1 (i.e., cutoff not applied).
  - minimum_belief (Optional[float]) – The minimum belief for a relationship to count it as a regulator. Defaults to 0.0 (i.e., cutoff not applied).
  - force_cache_refresh (bool) – If True, the cache will be ignored and the query will be run again. Any existing cache will be overwritten.
- Return type:
- Returns:
A dictionary whose keys are 2-tuples of the CURIE and name of each entity and whose values are sets of HGNC gene identifiers (as strings).
- get_go(*, background_gene_ids=None, client, force_cache_refresh=False)
Get GO gene sets.
- Parameters:
  - client (Neo4jClient) – The Neo4j client.
  - background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.
  - force_cache_refresh (bool) – If True, the cache will be ignored and the query will be run again. The current results will overwrite any existing cache.
- Return type:
- Returns:
A dictionary whose keys are 2-tuples of the CURIE and name of each GO term and whose values are sets of HGNC gene identifiers (as strings).
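Gene sets of this shape, together with a background, are the usual inputs to an over-representation test. The following sketch computes a one-sided hypergeometric tail probability using only the standard library. It is an illustration under stated assumptions, not the test that indra_cogex itself runs; the function name `overrepresentation_p_value` and all identifiers in the toy data are hypothetical.

```python
from math import comb
from typing import Iterable


def overrepresentation_p_value(
    query_gene_ids: Iterable[str],
    gene_set: Iterable[str],
    background_gene_ids: Iterable[str],
) -> float:
    """P(overlap >= observed) under the hypergeometric distribution.

    Assumption: genes outside the background are ignored on both sides.
    """
    background = set(background_gene_ids)
    query = set(query_gene_ids) & background
    members = set(gene_set) & background

    population = len(background)  # N: size of the background
    successes = len(members)      # K: background genes in the gene set
    draws = len(query)            # n: size of the query list
    overlap = len(query & members)  # k: query genes in the gene set

    # Count the ways to draw at least `overlap` gene-set members.
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(overlap, min(successes, draws) + 1)
    )
    return favorable / comb(population, draws)


# Toy example: 10 background genes, a 4-gene set, a 3-gene query.
background = [str(i) for i in range(1, 11)]
go_term_genes = {"1", "2", "3", "4"}
query_genes = ["1", "2", "5"]

p_value = overrepresentation_p_value(query_genes, go_term_genes, background)
print(round(p_value, 4))
```

The background determines the population size, so passing a narrower `background_gene_ids` when building gene sets changes the resulting p-values even if the overlap stays the same.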
- get_phenotype_gene_sets(*, background_gene_ids=None, force_cache_refresh=False, client)
Get HPO phenotype gene sets.
- Parameters:
  - client (Neo4jClient) – The Neo4j client.
  - background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.
  - force_cache_refresh (bool) – If True, the cache will be ignored and the query will be run again. Any existing cache will be overwritten.
- Return type:
- Returns:
A dictionary whose keys are 2-tuples of the CURIE and name of each phenotype gene set and whose values are sets of HGNC gene identifiers (as strings).
- get_reactome(*, background_gene_ids=None, force_cache_refresh=False, client)
Get Reactome gene sets.
- Parameters:
  - client (Neo4jClient) – The Neo4j client.
  - background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.
  - force_cache_refresh (bool) – If True, the cache will be ignored and the query will be run again.
- Return type:
- Returns:
A dictionary whose keys are 2-tuples of the CURIE and name of each Reactome pathway and whose values are sets of HGNC gene identifiers (as strings).
- get_wikipathways(*, background_gene_ids=None, force_cache_refresh=False, client)
Get WikiPathways gene sets.
- Parameters:
  - client (Neo4jClient) – The Neo4j client.
  - background_gene_ids (Optional[Iterable[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.
  - force_cache_refresh (bool) – If True, the cache will be ignored and the query will be run again. Any existing cache will be overwritten.
- Return type:
- Returns:
A dictionary whose keys are 2-tuples of the CURIE and name of each WikiPathways pathway and whose values are sets of HGNC gene identifiers (as strings).
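The `force_cache_refresh` flag shared by the functions above describes a load-or-recompute pattern: reuse a cached result unless a refresh is forced, in which case the query runs again and the cache is overwritten. The sketch below illustrates that pattern with a JSON file. The function `load_or_compute`, the JSON layout, and the `fake_query` stand-in are all hypothetical; how indra_cogex actually stores its cache is not specified here.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Set, Tuple

GeneSets = Dict[Tuple[str, str], Set[str]]


def load_or_compute(
    cache_file: Path,
    compute: Callable[[], GeneSets],
    force_cache_refresh: bool = False,
) -> GeneSets:
    """Return cached gene sets, or recompute and overwrite the cache."""
    if cache_file.is_file() and not force_cache_refresh:
        with cache_file.open() as fh:
            rows = json.load(fh)
        return {(curie, name): set(genes) for curie, name, genes in rows}

    gene_sets = compute()
    # JSON has no tuple keys or sets, so store rows of [curie, name, genes].
    rows = [[curie, name, sorted(genes)] for (curie, name), genes in gene_sets.items()]
    with cache_file.open("w") as fh:
        json.dump(rows, fh)
    return gene_sets


calls: List[int] = []


def fake_query() -> GeneSets:
    """Stand-in for an expensive database query; records each call."""
    calls.append(1)
    return {("EX:0001", "example pathway"): {"100", "200"}}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "gene_sets.json"
    first = load_or_compute(cache, fake_query)    # no cache yet: runs the query
    second = load_or_compute(cache, fake_query)   # cache hit: query not run
    third = load_or_compute(cache, fake_query, force_cache_refresh=True)  # forced rerun

print(len(calls))
```

Forcing a refresh is mainly useful after the underlying graph has been rebuilt, since a stale cache would otherwise keep returning the old gene sets.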