Discrete Gene Enrichment Analysis (indra_cogex.client.enrichment.discrete)

A collection of over-representation analyses that can be run on lists of genes given as HGNC identifiers.

EXAMPLE_GENE_IDS = ['613', '1116', '1119', '1697', '7067', '2537', '2734', '29517', '8568', '4910', '4931', '4932', '4962', '4983', '18873', '5432', '5433', '5981', '16404', '5985', '18358', '6018', '6019', '6021', '6118', '6120', '6122', '6148', '6374', '6378', '6395', '6727', '14374', '8004', '18669', '8912', '30306', '23785', '9253', '9788', '10498', '10819', '6769', '11120', '11133', '11432', '11584', '18348', '11849', '28948', '11876', '11878', '11985', '20820', '12647', '20593', '12713']

This example list consists of human genes associated with COVID-19, taken from Bgee (https://bgee.org/?page=top_anat#/result/9bbddda9dea22c21edcada56ad552a35cb8e29a7/).

go_ora(client, gene_ids, background_gene_ids=None, **kwargs)

Calculate over-representation on all GO terms.

Parameters:
  • client (Neo4jClient) – Neo4jClient

  • gene_ids (Iterable[str]) – List of HGNC gene identifiers

  • background_gene_ids (Optional[Collection[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

  • **kwargs – Additional keyword arguments to pass to _do_ora

Return type:

DataFrame

Returns:

DataFrame with columns: curie, name, p, q, mlp, mlq
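As a rough illustration of what an over-representation p-value measures, the sketch below scores a single term with a one-sided hypergeometric test. This is an assumption made for illustration: the documentation above does not state which statistical test _do_ora applies, and the helper name ora_p_value and the toy gene sets are hypothetical.

```python
from math import comb


def ora_p_value(query, term_genes, background):
    """One-sided hypergeometric p-value for a single term (illustrative only).

    Hypothetical helper, not part of indra_cogex. Genes outside the
    background are ignored on both sides of the comparison.
    """
    background = set(background)
    query = set(query) & background
    term_genes = set(term_genes) & background

    population = len(background)      # all genes that could have been drawn
    annotated = len(term_genes)       # background genes annotated with the term
    drawn = len(query)                # size of the query gene set
    overlap = len(query & term_genes) # query genes annotated with the term

    # P(X >= overlap): probability of an overlap at least this large by chance.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, min(annotated, drawn) + 1)
    )
    return favourable / comb(population, drawn)


# Toy data: ten background genes, three of them annotated with the term.
toy_background = [str(i) for i in range(1, 11)]
toy_term = ["1", "2", "3"]

p_enriched = ora_p_value(["1", "2", "3"], toy_term, toy_background)
p_unrelated = ora_p_value(["4", "5", "6"], toy_term, toy_background)
print(p_enriched, p_unrelated)
```

Narrowing the background changes the population size in this calculation, which is why passing background_gene_ids can shift the resulting p-values.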

indra_downstream_ora(client, gene_ids, background_gene_ids=None, *, minimum_evidence_count=1, minimum_belief=0.0, **kwargs)

Calculate a p-value for each entity in the INDRA database based on the genes that are causally upstream of it and how they compare to the query gene set.

Parameters:
  • client (Neo4jClient) – Neo4jClient

  • gene_ids (Iterable[str]) – List of HGNC gene identifiers

  • background_gene_ids (Optional[Collection[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

  • minimum_evidence_count (Optional[int]) – Minimum evidence count required for a causal relationship to be considered

  • minimum_belief (Optional[float]) – Minimum belief score required for a causal relationship to be considered

  • **kwargs – Additional keyword arguments to pass to _do_ora

Return type:

DataFrame

Returns:

DataFrame with columns: curie, name, p, q, mlp, mlq
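The effect of minimum_evidence_count and minimum_belief can be pictured as a filter over causal relationships before any statistics are computed. The sketch below is a minimal illustration under stated assumptions: the Relation record, its field names, and filter_relations are hypothetical, the comparison is assumed to be inclusive (>=), and None is assumed to mean "no threshold". The library's actual filtering logic is not shown in this documentation.

```python
from typing import List, NamedTuple, Optional


class Relation(NamedTuple):
    """Hypothetical record for one causal relationship (illustration only)."""
    source: str
    target: str
    evidence_count: int
    belief: float


def filter_relations(
    relations: List[Relation],
    minimum_evidence_count: Optional[int] = 1,
    minimum_belief: Optional[float] = 0.0,
) -> List[Relation]:
    """Keep relations that meet both thresholds (assumed inclusive)."""
    kept = []
    for rel in relations:
        # Assumption: a threshold of None disables that check.
        if minimum_evidence_count is not None and rel.evidence_count < minimum_evidence_count:
            continue
        if minimum_belief is not None and rel.belief < minimum_belief:
            continue
        kept.append(rel)
    return kept


# Toy relations with made-up placeholder identifiers.
toy_relations = [
    Relation("gene_a", "entity_x", evidence_count=1, belief=0.65),
    Relation("gene_b", "entity_x", evidence_count=5, belief=0.95),
    Relation("gene_c", "entity_y", evidence_count=3, belief=0.80),
]

default_kept = filter_relations(toy_relations)
strict_kept = filter_relations(toy_relations, minimum_evidence_count=2, minimum_belief=0.9)
print(len(default_kept), len(strict_kept))
```

With the documented defaults (1 and 0.0) nothing in the toy data is excluded; raising either threshold shrinks the set of relationships that contribute to each entity's gene set.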

indra_upstream_ora(client, gene_ids, background_gene_ids=None, *, minimum_evidence_count=1, minimum_belief=0.0, **kwargs)

Calculate a p-value for each entity in the INDRA database based on the set of genes that it regulates and how they compare to the query gene set.

Parameters:
  • client (Neo4jClient) – Neo4jClient

  • gene_ids (Iterable[str]) – List of HGNC gene identifiers

  • background_gene_ids (Optional[Collection[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

  • minimum_evidence_count (Optional[int]) – Minimum evidence count required for a causal relationship to be considered

  • minimum_belief (Optional[float]) – Minimum belief score required for a causal relationship to be considered

  • **kwargs – Additional keyword arguments to pass to _do_ora

Return type:

DataFrame

Returns:

DataFrame with columns: curie, name, p, q, mlp, mlq

phenotype_ora(gene_ids, background_gene_ids=None, *, client, **kwargs)

Calculate over-representation on all HP phenotypes.

Parameters:
  • gene_ids (Iterable[str]) – List of HGNC gene identifiers

  • background_gene_ids (Optional[Collection[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

  • client (Neo4jClient) – Neo4jClient

  • **kwargs – Additional keyword arguments to pass to _do_ora

Return type:

DataFrame

Returns:

DataFrame with columns: curie, name, p, q, mlp, mlq
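Unlike go_ora, the signature of phenotype_ora takes gene_ids first and client as a keyword-only argument. The sketch below uses stand-in functions that copy only the documented signatures (the stub names and return values are hypothetical, and no database is contacted) to show how the two calling conventions differ.

```python
def go_ora_stub(client, gene_ids, background_gene_ids=None, **kwargs):
    """Stand-in with the documented go_ora signature (does no real work)."""
    return ("go", client, list(gene_ids))


def phenotype_ora_stub(gene_ids, background_gene_ids=None, *, client, **kwargs):
    """Stand-in with the documented phenotype_ora signature (does no real work)."""
    return ("hp", client, list(gene_ids))


client = object()          # placeholder for a Neo4jClient instance
genes = ["613", "1116"]    # first two entries of EXAMPLE_GENE_IDS

# go_ora-style call: client is the first positional argument.
go_result = go_ora_stub(client, genes)

# phenotype_ora-style call: client must be passed by keyword.
hp_result = phenotype_ora_stub(genes, client=client)

# Reusing the go_ora calling convention fails, because the keyword-only
# client argument is never supplied.
try:
    phenotype_ora_stub(client, genes)
except TypeError as exc:
    positional_error = str(exc)

print(positional_error)
```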

reactome_ora(client, gene_ids, background_gene_ids=None, **kwargs)

Calculate over-representation on all Reactome pathways.

Parameters:
  • client (Neo4jClient) – Neo4jClient

  • gene_ids (Iterable[str]) – List of HGNC gene identifiers

  • background_gene_ids (Optional[Collection[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

  • **kwargs – Additional keyword arguments to pass to _do_ora

Return type:

DataFrame

Returns:

DataFrame with columns: curie, name, p, q, mlp, mlq

wikipathways_ora(client, gene_ids, background_gene_ids=None, **kwargs)

Calculate over-representation on all WikiPathways pathways.

Parameters:
  • client (Neo4jClient) – Neo4jClient

  • gene_ids (Iterable[str]) – List of HGNC gene identifiers

  • background_gene_ids (Optional[Collection[str]]) – List of HGNC gene identifiers for the background gene set. If not given, all genes with HGNC IDs are used as the background.

  • **kwargs – Additional keyword arguments to pass to _do_ora

Return type:

DataFrame

Returns:

DataFrame with columns: curie, name, p, q, mlp, mlq
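The column names p, q, mlp, and mlq are not defined in this documentation. One plausible reading, assumed here for illustration only, is that p is the raw p-value, q is a multiple-testing-corrected value, and mlp and mlq are their negative base-10 logarithms. The sketch below derives such columns with a Benjamini-Hochberg correction; both the choice of correction and the helper names are assumptions, not a description of what _do_ora does.

```python
from math import log10


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def derive_columns(p_values):
    """Build rows with assumed p, q, mlp, mlq semantics (p must be > 0)."""
    q_values = benjamini_hochberg(p_values)
    return [
        {"p": p, "q": q, "mlp": -log10(p), "mlq": -log10(q)}
        for p, q in zip(p_values, q_values)
    ]


toy_p = [0.01, 0.04, 0.03, 0.005]
rows = derive_columns(toy_p)
for row in rows:
    print(row)
```

Under this reading, larger mlp and mlq values correspond to stronger evidence of over-representation, which makes them convenient for sorting and plotting.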