Continuous Gene Enrichment Analysis (indra_cogex.client.enrichment.continuous)
A collection of analyses possible on gene lists (of HGNC identifiers) with scores.
For example, these analyses can be applied to the log2 fold change scores from differential gene expression experiments.
Warning
This module requires the optional dependency gseapy. Install with pip install gseapy.
- get_human_scores(path, read_csv_kwargs=None, gene_symbol_column_name=None, score_column_name=None)
Load a differential gene expression file with human measurements.
- Parameters:
path (Union[Path, str, DataFrame]) – Path to the file to read with pandas.read_csv().
read_csv_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to pandas.read_csv().
gene_symbol_column_name (Optional[str]) – The name of the column with gene symbols. If None, the function will try to guess it.
score_column_name (Optional[str]) – The name of the column with scores. If None, the function will try to guess it.
- Return type:
Dict[str, float]
- Returns:
A dictionary of human gene HGNC IDs to scores.
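The loading step described above can be sketched with the standard library alone. This is an illustration, not the module's implementation: the function name load_scores, the candidate column names used for guessing, and the toy symbol-to-HGNC table are all assumptions made for the example.

```python
import csv
import io
from typing import Dict, Optional, Sequence

# Toy lookup table (assumption); a real run would resolve symbols against HGNC.
TOY_SYMBOL_TO_HGNC = {"BRCA1": "1100", "TP53": "11998"}

# Hypothetical column names tried when no column name is given.
SYMBOL_CANDIDATES = ("gene_name", "symbol", "gene")
SCORE_CANDIDATES = ("log2FoldChange", "logFC", "score")


def guess_column(fieldnames: Sequence[str], candidates: Sequence[str]) -> str:
    """Return the first candidate that appears in the header."""
    for name in candidates:
        if name in fieldnames:
            return name
    raise ValueError(f"could not guess a column from {list(fieldnames)}")


def load_scores(
    text: str,
    gene_symbol_column_name: Optional[str] = None,
    score_column_name: Optional[str] = None,
) -> Dict[str, float]:
    """Read CSV text and return a mapping from HGNC ID to score."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    symbol_col = gene_symbol_column_name or guess_column(fieldnames, SYMBOL_CANDIDATES)
    score_col = score_column_name or guess_column(fieldnames, SCORE_CANDIDATES)
    scores: Dict[str, float] = {}
    for row in reader:
        hgnc_id = TOY_SYMBOL_TO_HGNC.get(row[symbol_col])
        if hgnc_id is None:
            continue  # skip symbols that cannot be mapped
        scores[hgnc_id] = float(row[score_col])
    return scores


EXAMPLE_CSV = "gene_name,log2FoldChange\nBRCA1,1.5\nTP53,-2.0\nNOT_A_GENE,0.3\n"
example_scores = load_scores(EXAMPLE_CSV)
```

Rows whose symbol cannot be mapped are dropped in this sketch; whether the library does the same is not stated here.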
- get_mouse_scores(path, read_csv_kwargs=None, gene_symbol_column_name=None, score_column_name=None)
Load a differential gene expression file with mouse measurements.
This function extracts the MGI gene symbols, maps them to MGI identifiers, uses PyOBO to map orthologs to HGNC, then returns the HGNC gene and scores as a dictionary.
- Parameters:
path (Union[Path, str, DataFrame]) – Path to the file to read with pandas.read_csv().
read_csv_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to pandas.read_csv().
gene_symbol_column_name (Optional[str]) – The name of the column with gene symbols. If None, the function will try to guess it.
score_column_name (Optional[str]) – The name of the column with scores. If None, the function will try to guess it.
- Return type:
Dict[str, float]
- Returns:
A dictionary of mapped orthologous human gene HGNC IDs to scores.
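The symbol to model-organism identifier to HGNC ortholog chain described above can be sketched as two dictionary lookups. The helper name map_to_human and both toy tables are assumptions for illustration; the library itself is documented as using PyOBO for the ortholog step, which this sketch does not call.

```python
from typing import Dict

# Toy lookup tables (assumptions standing in for the real MGI and ortholog resources).
TOY_SYMBOL_TO_MGI = {"Trp53": "98834", "Brca1": "104537"}
TOY_MGI_TO_HGNC = {"98834": "11998", "104537": "1100"}


def map_to_human(
    symbol_scores: Dict[str, float],
    symbol_to_id: Dict[str, str],
    ortholog_to_hgnc: Dict[str, str],
) -> Dict[str, float]:
    """Map model-organism gene symbols to HGNC IDs, keeping the scores."""
    human_scores: Dict[str, float] = {}
    for symbol, score in symbol_scores.items():
        model_id = symbol_to_id.get(symbol)
        if model_id is None:
            continue  # symbol has no known identifier
        hgnc_id = ortholog_to_hgnc.get(model_id)
        if hgnc_id is None:
            continue  # gene has no known human ortholog
        human_scores[hgnc_id] = score
    return human_scores


mouse_symbol_scores = {"Trp53": -1.2, "Brca1": 0.8, "Gm12345": 2.0}
human_scores = map_to_human(mouse_symbol_scores, TOY_SYMBOL_TO_MGI, TOY_MGI_TO_HGNC)
```

The same shape applies to rat data by passing RGD-based tables instead of MGI-based ones.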
- get_rat_scores(path, read_csv_kwargs=None, gene_symbol_column_name=None, score_column_name=None)
Load a differential gene expression file with rat measurements.
This function extracts the RGD gene symbols, maps them to RGD identifiers, uses PyOBO to map orthologs to HGNC, then returns the HGNC gene and scores as a dictionary.
- Parameters:
path (Union[Path, str, DataFrame]) – Path to the file to read with pandas.read_csv().
read_csv_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to pandas.read_csv().
gene_symbol_column_name (Optional[str]) – The name of the column with gene symbols. If None, the function will try to guess it.
score_column_name (Optional[str]) – The name of the column with scores. If None, the function will try to guess it.
- Return type:
Dict[str, float]
- Returns:
A dictionary of mapped orthologous human gene HGNC IDs to scores.
- go_gsea(scores, directory=None, *, client, **kwargs)
Run GSEA with gene sets for each Gene Ontology term.
- Parameters:
client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis).
directory (Union[None, Path, str]) – Specify the directory if the results should be saved, including both a dataframe and plots for each gene set.
kwargs – Remaining keyword arguments to pass through to gseapy.prerank().
- Return type:
DataFrame
- Returns:
A pandas dataframe with the GSEA results
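For intuition about what a pre-ranked GSEA computes for each gene set (here, each Gene Ontology term), the following is a minimal sketch of the classic weighted running-sum enrichment score. It is a simplified illustration, not gseapy's implementation: it omits permutation testing and normalization, and the function name enrichment_score and the toy identifiers are assumptions.

```python
from typing import Dict, Set


def enrichment_score(
    scores: Dict[str, float], gene_set: Set[str], weight: float = 1.0
) -> float:
    """Return the maximum deviation from zero of the GSEA running sum."""
    # Rank genes from the highest to the lowest score.
    ranked = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    hits = [gene for gene in ranked if gene in gene_set]
    if not hits or len(hits) == len(ranked):
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(scores[gene]) ** weight for gene in hits)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a zero score")
    miss_step = 1.0 / (len(ranked) - len(hits))
    running = 0.0
    best = 0.0
    for gene in ranked:
        if gene in gene_set:
            # Step up, proportionally to the magnitude of the gene's score.
            running += abs(scores[gene]) ** weight / hit_norm
        else:
            # Step down by a constant amount for genes outside the set.
            running -= miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Illustrative HGNC-style identifiers with made-up scores.
toy_scores = {"1100": 2.5, "1101": 1.8, "11998": 0.2, "6407": -1.1, "3467": -2.0}
top_es = enrichment_score(toy_scores, {"1100", "1101"})
bottom_es = enrichment_score(toy_scores, {"6407", "3467"})
```

A set concentrated at the top of the ranking yields a positive score, and a set concentrated at the bottom yields a negative one.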
- gsea(scores, gene_sets, directory=None, alpha=None, keep_insignificant=True, **kwargs)
Run GSEA on pre-ranked data.
- Parameters:
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis).
gene_sets (Dict[Tuple[str, str], Set[str]]) – A mapping from
directory (Union[None, Path, str]) – Specify the directory if the results should be saved, including both a dataframe and plots for each gene set.
alpha (Optional[float]) – The cutoff for significance. Defaults to 0.05.
keep_insignificant (bool) – If False, removes insignificant results, i.e., those whose p-value is not less than alpha.
kwargs – Remaining keyword arguments to pass through to gseapy.prerank().
- Return type:
DataFrame
- Returns:
A pandas dataframe with the GSEA results
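The interplay of alpha and keep_insignificant can be sketched on plain dictionaries. This is an assumption-laden illustration: the helper filter_results, the "p_value" key, and the strict less-than comparison are chosen for the example and are not taken from the library.

```python
from typing import Dict, List, Optional, Union

DEFAULT_ALPHA = 0.05  # the documented default cutoff

Row = Dict[str, Union[str, float]]


def filter_results(
    rows: List[Row],
    alpha: Optional[float] = None,
    keep_insignificant: bool = True,
) -> List[Row]:
    """Optionally drop rows whose p-value does not pass the cutoff."""
    if alpha is None:
        alpha = DEFAULT_ALPHA
    if keep_insignificant:
        return list(rows)
    # Keep only rows that are significant at the chosen cutoff.
    return [row for row in rows if row["p_value"] < alpha]


toy_results: List[Row] = [
    {"term": "set A", "p_value": 0.01},
    {"term": "set B", "p_value": 0.20},
]
everything = filter_results(toy_results)
significant_only = filter_results(toy_results, keep_insignificant=False)
```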
- indra_downstream_gsea(scores, directory=None, *, client, minimum_evidence_count=None, minimum_belief=None, **kwargs)
Run GSEA for each entry in the INDRA database and the set of human genes that are upstream regulators of it.
- Parameters:
client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis).
directory (Union[None, Path, str]) – Specify the directory if the results should be saved, including both a dataframe and plots for each gene set.
minimum_evidence_count (Optional[int]) – The minimum number of evidences for a relationship to count it as a regulator. Defaults to 1 (i.e., cutoff not applied).
minimum_belief (Optional[float]) – The minimum belief for a relationship to count it as a regulator. Defaults to 0.0 (i.e., cutoff not applied).
kwargs – Remaining keyword arguments to pass through to gseapy.prerank().
- Return type:
DataFrame
- Returns:
A pandas dataframe with the GSEA results
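How the minimum_evidence_count and minimum_belief cutoffs could shape the gene sets can be sketched on in-memory data. Everything here is hypothetical: the Relation record, the build_gene_sets helper, the (identifier, name) key layout, and the toy relations are assumptions for illustration; the library builds its gene sets from the Neo4j database instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class Relation(NamedTuple):
    """A toy regulatory relationship between an entity and a gene."""

    entity: Tuple[str, str]  # assumed (identifier, name) key layout
    gene: str  # HGNC-style gene identifier
    evidence_count: int
    belief: float


def build_gene_sets(
    relations: Iterable[Relation],
    minimum_evidence_count: Optional[int] = None,
    minimum_belief: Optional[float] = None,
) -> Dict[Tuple[str, str], Set[str]]:
    """Group genes by entity, keeping only relations that pass both cutoffs."""
    # The documented defaults (1 and 0.0) leave every relation in place.
    min_evidence = 1 if minimum_evidence_count is None else minimum_evidence_count
    min_belief = 0.0 if minimum_belief is None else minimum_belief
    gene_sets: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for relation in relations:
        if relation.evidence_count >= min_evidence and relation.belief >= min_belief:
            gene_sets[relation.entity].add(relation.gene)
    return dict(gene_sets)


toy_relations = [
    Relation(("X:1", "regulator one"), "1100", 5, 0.95),
    Relation(("X:1", "regulator one"), "11998", 1, 0.40),
    Relation(("X:2", "regulator two"), "6407", 2, 0.70),
]
all_sets = build_gene_sets(toy_relations)
strict_sets = build_gene_sets(toy_relations, minimum_evidence_count=2, minimum_belief=0.9)
```

Raising either cutoff shrinks or removes gene sets, which in turn changes which entities can appear in the GSEA results.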
- indra_upstream_gsea(scores, directory=None, *, client, minimum_evidence_count=None, minimum_belief=None, **kwargs)
Run GSEA for each entry in the INDRA database and the set of human genes that it regulates.
- Parameters:
client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis).
directory (Union[None, Path, str]) – Specify the directory if the results should be saved, including both a dataframe and plots for each gene set.
minimum_evidence_count (Optional[int]) – The minimum number of evidences for a relationship to count it as a regulator. Defaults to 1 (i.e., cutoff not applied).
minimum_belief (Optional[float]) – The minimum belief for a relationship to count it as a regulator. Defaults to 0.0 (i.e., cutoff not applied).
kwargs – Remaining keyword arguments to pass through to gseapy.prerank().
- Return type:
DataFrame
- Returns:
A pandas dataframe with the GSEA results
- phenotype_gsea(scores, directory=None, *, client, **kwargs)
Run GSEA with HPO phenotype gene sets.
- Parameters:
client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis).
directory (Union[None, Path, str]) – Specify the directory if the results should be saved, including both a dataframe and plots for each gene set.
kwargs – Remaining keyword arguments to pass through to gseapy.prerank().
- Return type:
DataFrame
- Returns:
A pandas dataframe with the GSEA results
- reactome_gsea(scores, directory=None, *, client, **kwargs)
Run GSEA with Reactome gene sets.
- Parameters:
client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis).
directory (Union[None, Path, str]) – Specify the directory if the results should be saved, including both a dataframe and plots for each gene set.
kwargs – Remaining keyword arguments to pass through to gseapy.prerank().
- Return type:
DataFrame
- Returns:
A pandas dataframe with the GSEA results
- wikipathways_gsea(scores, directory=None, *, client, **kwargs)
Run GSEA with WikiPathways gene sets.
- Parameters:
client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis).
directory (Union[None, Path, str]) – Specify the directory if the results should be saved, including both a dataframe and plots for each gene set.
kwargs – Remaining keyword arguments to pass through to gseapy.prerank().
- Return type:
DataFrame
- Returns:
A pandas dataframe with the GSEA results