Continuous Gene Enrichment Analysis (`indra_cogex.client.enrichment.continuous`)

A collection of analyses for scored gene lists, where each gene is given as an HGNC identifier.

For example, these analyses can be applied to the log2 fold-change scores from a differential gene expression experiment.
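
As a minimal sketch of the kind of input these analyses expect, the following computes log2 fold changes and stores them under HGNC identifiers. The expression values are invented for illustration, and writing the identifiers as bare numeric strings is an assumption about the expected key format.

```python
import math

# Hypothetical mean expression values (control, treated), keyed by HGNC ID.
# The numbers are made up; only the shape of the data matters here.
expression = {
    "11998": (120.0, 30.0),  # TP53
    "3236": (40.0, 160.0),   # EGFR
    "7553": (75.0, 75.0),    # MYC
}


def log2_fold_change(control, treated):
    """Return log2(treated / control)."""
    return math.log2(treated / control)


# A mapping from HGNC ID to score, matching the Dict[str, float] type
# used for the ``scores`` parameters in this module.
scores = {
    hgnc_id: log2_fold_change(control, treated)
    for hgnc_id, (control, treated) in expression.items()
}
```

Negative scores mark genes that decrease under treatment, positive scores mark genes that increase, and zero means no change.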

Warning

This module requires the optional dependency gseapy. Install with pip install gseapy.

get_human_scores(path, read_csv_kwargs=None, gene_symbol_column_name=None, score_column_name=None)

Load a differential gene expression file with human measurements.

Parameters:

path (Union[Path, str, DataFrame]) – Path to the file to read with pandas.read_csv().
read_csv_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to pandas.read_csv().
gene_symbol_column_name (Optional[str]) – The name of the column with gene symbols. If None, the function tries to guess it.
score_column_name (Optional[str]) – The name of the column with scores. If None, the function tries to guess it.

Return type:

Dict[str, float]

Returns:

A dictionary of human gene HGNC IDs to scores.
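
A hedged usage sketch follows. It writes a tiny tab-separated table and wraps the call to get_human_scores() using the parameters documented above. The file name, column names, and values are made up for illustration, and the import is deferred so that the block can be run without indra_cogex installed.

```python
import csv
import tempfile
from pathlib import Path


def write_example_table(directory):
    """Write a tiny tab-separated table; names and values are hypothetical."""
    path = Path(directory) / "dge.tsv"
    rows = [("TP53", -2.0), ("EGFR", 2.0), ("MYC", 0.0)]
    with path.open("w", newline="") as file:
        writer = csv.writer(file, delimiter="\t")
        writer.writerow(["gene_name", "log2FoldChange"])
        writer.writerows(rows)
    return path


def load_human_scores(path):
    """Load scores, naming both columns explicitly instead of relying on guessing."""
    # Deferred import: requires indra_cogex (and its dependencies) to be installed.
    from indra_cogex.client.enrichment.continuous import get_human_scores

    return get_human_scores(
        path,
        read_csv_kwargs={"sep": "\t"},  # passed through to pandas.read_csv()
        gene_symbol_column_name="gene_name",
        score_column_name="log2FoldChange",
    )


with tempfile.TemporaryDirectory() as tmp:
    example_path = write_example_table(tmp)
    header = example_path.read_text().splitlines()[0].split("\t")
    # scores = load_human_scores(example_path)  # uncomment with indra_cogex installed
```

Passing the column names explicitly avoids depending on the guessing behavior, which is not specified here.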

get_mouse_scores(path, read_csv_kwargs=None, gene_symbol_column_name=None, score_column_name=None)

Load a differential gene expression file with mouse measurements.

This function extracts the MGI gene symbols, maps them to MGI identifiers, uses PyOBO to map orthologs to HGNC, then returns a dictionary from the HGNC gene identifiers to scores.

Parameters:

path (Union[Path, str, DataFrame]) – Path to the file to read with pandas.read_csv().
read_csv_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to pandas.read_csv().
gene_symbol_column_name (Optional[str]) – The name of the column with gene symbols. If None, the function tries to guess it.
score_column_name (Optional[str]) – The name of the column with scores. If None, the function tries to guess it.

Return type:

Dict[str, float]

Returns:

A dictionary of mapped orthologous human gene HGNC IDs to scores.
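
The mapping chain described for this function (symbol, then model organism identifier, then human ortholog) can be sketched with plain dictionaries. This is an illustration of the idea only, not the library's implementation: the function name map_to_human, the gene symbols, and all identifiers below are hypothetical placeholders, and the real function obtains its mappings through PyOBO.

```python
def map_to_human(symbol_scores, symbol_to_id, ortholog_to_hgnc):
    """Re-key scores from model organism gene symbols to human HGNC IDs.

    Genes without an identifier or without a human ortholog are dropped.
    """
    human_scores = {}
    for symbol, score in symbol_scores.items():
        model_id = symbol_to_id.get(symbol)
        hgnc_id = ortholog_to_hgnc.get(model_id)
        if hgnc_id is None:
            continue  # no identifier or no ortholog: skip this gene
        human_scores[hgnc_id] = score
    return human_scores


# Placeholder data, invented for this sketch.
mouse_scores = {"GeneA": 1.5, "GeneB": -0.7, "GeneC": 0.2}
symbol_to_mgi = {"GeneA": "MGI:0000001", "GeneB": "MGI:0000002"}
mgi_to_hgnc = {"MGI:0000001": "101"}

human_scores = map_to_human(mouse_scores, symbol_to_mgi, mgi_to_hgnc)
```

In this sketch, GeneB is dropped because it has no ortholog entry and GeneC because it has no identifier. How the library resolves several model organism genes that map to the same human gene is not stated here; the sketch simply keeps the last one seen.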

get_rat_scores(path, read_csv_kwargs=None, gene_symbol_column_name=None, score_column_name=None)

Load a differential gene expression file with rat measurements.

This function extracts the RGD gene symbols, maps them to RGD identifiers, uses PyOBO to map orthologs to HGNC, then returns a dictionary from the HGNC gene identifiers to scores.

Parameters:

path (Union[Path, str, DataFrame]) – Path to the file to read with pandas.read_csv().
read_csv_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to pandas.read_csv().
gene_symbol_column_name (Optional[str]) – The name of the column with gene symbols. If None, the function tries to guess it.
score_column_name (Optional[str]) – The name of the column with scores. If None, the function tries to guess it.

Return type:

Dict[str, float]

Returns:

A dictionary of mapped orthologous human gene HGNC IDs to scores.

go_gsea(scores, directory=None, *, client, **kwargs)

Run GSEA with gene sets for each Gene Ontology term.

Parameters:

client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis)
directory (Union[None, Path, str]) – The directory in which to save the results, if given. The saved output includes a dataframe and a plot for each gene set.
kwargs – Remaining keyword arguments to pass through to gseapy.prerank()

Return type:

DataFrame

Returns:

A pandas dataframe with the GSEA results
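
A hedged sketch of a call follows. It assumes that Neo4jClient can be imported from indra_cogex.client and constructed without arguments from existing configuration. permutation_num is a gseapy.prerank() parameter, but whether this wrapper already sets it is not stated here, so treat that argument as an assumption. The helper name run_go_gsea is hypothetical.

```python
def run_go_gsea(scores, output_directory=None):
    """Run GSEA against Gene Ontology gene sets and return the results dataframe."""
    # Deferred imports: these require indra_cogex and gseapy to be installed.
    from indra_cogex.client import Neo4jClient
    from indra_cogex.client.enrichment.continuous import go_gsea

    # Assumption: connection details are picked up from existing configuration.
    client = Neo4jClient()

    return go_gsea(
        scores,
        directory=output_directory,  # None means nothing is saved to disk
        client=client,               # keyword-only in the documented signature
        permutation_num=100,         # forwarded to gseapy.prerank() via **kwargs
    )
```

Note that client must be passed by keyword, since it appears after the bare * in the signature.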

gsea(scores, gene_sets, directory=None, alpha=None, keep_insignificant=True, **kwargs)

Run GSEA on pre-ranked data.

Parameters:

scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis)
gene_sets (Dict[Tuple[str, str], Set[str]]) – A mapping from
directory (Union[None, Path, str]) – The directory in which to save the results, if given. The saved output includes a dataframe and a plot for each gene set.
alpha (Optional[float]) – The cutoff for significance. Defaults to 0.05.
keep_insignificant (bool) – If false, removes insignificant results, i.e., those whose p-value is not below alpha.
kwargs – Remaining keyword arguments to pass through to gseapy.prerank()

Return type:

DataFrame

Returns:

A pandas dataframe with the GSEA results
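
To make "GSEA on pre-ranked data" concrete, here is a simplified sketch of the running-sum enrichment score for a single gene set. It is an illustration of the general technique, not gseapy's implementation: it omits permutation testing, normalization, and multiple-testing correction, and the function name enrichment_score and the toy data are hypothetical.

```python
def enrichment_score(scores, gene_set, weight=1.0):
    """Return the signed maximum deviation of the GSEA running sum.

    Genes are ranked by descending score. Walking down the ranking, each gene
    in the set raises the running sum in proportion to |score| ** weight, and
    each gene outside the set lowers it by a constant step.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    in_set = [gene in gene_set for gene, _ in ranked]
    n_misses = in_set.count(False)
    hit_total = sum(
        abs(score) ** weight for (_, score), hit in zip(ranked, in_set) if hit
    )
    if n_misses == 0 or hit_total == 0:
        return 0.0  # degenerate case: the score is undefined, report zero

    running = 0.0
    extreme = 0.0
    for (_, score), hit in zip(ranked, in_set):
        if hit:
            running += abs(score) ** weight / hit_total
        else:
            running -= 1.0 / n_misses
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Toy ranking with made-up gene names and scores.
toy_scores = {"a": 3.0, "b": 2.0, "c": -1.0, "d": -2.0}
top_heavy = enrichment_score(toy_scores, {"a", "b"})
bottom_heavy = enrichment_score(toy_scores, {"c", "d"})
```

A gene set concentrated at the top of the ranking yields a positive score, and one concentrated at the bottom yields a negative score.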

indra_downstream_gsea(scores, directory=None, *, client, minimum_evidence_count=None, minimum_belief=None, **kwargs)

Run GSEA for each entry in the INDRA database and the set of human genes that are upstream regulators of it.

Parameters:

client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis)
directory (Union[None, Path, str]) – The directory in which to save the results, if given. The saved output includes a dataframe and a plot for each gene set.
minimum_evidence_count (Optional[int]) – The minimum number of evidences for a relationship to count it as a regulator. Defaults to 1 (i.e., cutoff not applied).
minimum_belief (Optional[float]) – The minimum belief for a relationship to count it as a regulator. Defaults to 0.0 (i.e., cutoff not applied).
kwargs – Remaining keyword arguments to pass through to gseapy.prerank()

Return type:

DataFrame

Returns:

A pandas dataframe with the GSEA results
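
The effect of minimum_evidence_count and minimum_belief can be sketched as a filter applied while gene sets are assembled. This is a conceptual illustration with made-up relationships, not the library's query logic. In particular, the function name build_gene_sets, the tuple layout, and the use of inclusive comparisons are assumptions.

```python
from collections import defaultdict

# Hypothetical relationships: (gene HGNC ID, entity, evidence count, belief).
relations = [
    ("1", "entity:A", 5, 0.95),
    ("2", "entity:A", 1, 0.60),
    ("3", "entity:B", 3, 0.40),
]


def build_gene_sets(relations, minimum_evidence_count=1, minimum_belief=0.0):
    """Group genes by entity, keeping only relationships that pass both cutoffs."""
    gene_sets = defaultdict(set)
    for gene, entity, evidence_count, belief in relations:
        if evidence_count >= minimum_evidence_count and belief >= minimum_belief:
            gene_sets[entity].add(gene)
    return dict(gene_sets)


unfiltered = build_gene_sets(relations)
filtered = build_gene_sets(relations, minimum_evidence_count=2, minimum_belief=0.5)
```

With the default cutoffs every relationship is kept. With the stricter cutoffs only the first relationship survives, so entity:B no longer has a gene set at all.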

indra_upstream_gsea(scores, directory=None, *, client, minimum_evidence_count=None, minimum_belief=None, **kwargs)

Run GSEA for each entry in the INDRA database and the set of human genes that it regulates.

Parameters:

client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis)
directory (Union[None, Path, str]) – The directory in which to save the results, if given. The saved output includes a dataframe and a plot for each gene set.
minimum_evidence_count (Optional[int]) – The minimum number of evidences for a relationship to count it as a regulator. Defaults to 1 (i.e., cutoff not applied).
minimum_belief (Optional[float]) – The minimum belief for a relationship to count it as a regulator. Defaults to 0.0 (i.e., cutoff not applied).
kwargs – Remaining keyword arguments to pass through to gseapy.prerank()

Return type:

DataFrame

Returns:

A pandas dataframe with the GSEA results
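
Because the upstream and downstream analyses share a signature, they can be run side by side with the same cutoffs. The wrapper below is a hypothetical convenience function, and the threshold values are arbitrary examples rather than recommendations. It assumes a Neo4jClient instance has already been created.

```python
def run_both_directions(scores, client, minimum_evidence_count=2, minimum_belief=0.7):
    """Run the INDRA upstream and downstream GSEA with identical cutoffs."""
    # Deferred import: requires indra_cogex and gseapy to be installed.
    from indra_cogex.client.enrichment.continuous import (
        indra_downstream_gsea,
        indra_upstream_gsea,
    )

    shared = dict(
        client=client,  # keyword-only in both documented signatures
        minimum_evidence_count=minimum_evidence_count,
        minimum_belief=minimum_belief,
    )
    return {
        "upstream": indra_upstream_gsea(scores, **shared),
        "downstream": indra_downstream_gsea(scores, **shared),
    }
```

Each value in the returned dictionary is the dataframe produced by the corresponding function.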

phenotype_gsea(scores, directory=None, *, client, **kwargs)

Run GSEA with HPO phenotype gene sets.

Parameters:

client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis)
directory (Union[None, Path, str]) – The directory in which to save the results, if given. The saved output includes a dataframe and a plot for each gene set.
kwargs – Remaining keyword arguments to pass through to gseapy.prerank()

Return type:

DataFrame

Returns:

A pandas dataframe with the GSEA results

reactome_gsea(scores, directory=None, *, client, **kwargs)

Run GSEA with Reactome gene sets.

Parameters:

client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis)
directory (Union[None, Path, str]) – The directory in which to save the results, if given. The saved output includes a dataframe and a plot for each gene set.
kwargs – Remaining keyword arguments to pass through to gseapy.prerank()

Return type:

DataFrame

Returns:

A pandas dataframe with the GSEA results

wikipathways_gsea(scores, directory=None, *, client, **kwargs)

Run GSEA with WikiPathways gene sets.

Parameters:

client (Neo4jClient) – The Neo4j client.
scores (Dict[str, float]) – A mapping from HGNC gene identifiers to floating point scores (e.g., from a differential gene expression analysis)
directory (Union[None, Path, str]) – The directory in which to save the results, if given. The saved output includes a dataframe and a plot for each gene set.
kwargs – Remaining keyword arguments to pass through to gseapy.prerank()

Return type:

DataFrame

Returns:

A pandas dataframe with the GSEA results
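
Since go_gsea(), phenotype_gsea(), reactome_gsea(), and wikipathways_gsea() share one signature, a single set of scores can be run through all of them in a loop. The sketch below is one possible arrangement, with hypothetical helper names and an assumed one-subdirectory-per-analysis layout. It expects an existing Neo4jClient instance.

```python
from pathlib import Path

# Names of the functions documented on this page that share one signature.
ANALYSES = ("go_gsea", "phenotype_gsea", "reactome_gsea", "wikipathways_gsea")


def output_directory_for(base_directory, analysis_name):
    """Return a per-analysis subdirectory, e.g. ``results/go`` for ``go_gsea``."""
    return Path(base_directory) / analysis_name.replace("_gsea", "")


def run_all(scores, client, base_directory):
    """Run every analysis in ANALYSES and collect the result dataframes by name."""
    # Deferred import: requires indra_cogex and gseapy to be installed.
    import indra_cogex.client.enrichment.continuous as continuous

    results = {}
    for name in ANALYSES:
        directory = output_directory_for(base_directory, name)
        directory.mkdir(parents=True, exist_ok=True)
        analysis = getattr(continuous, name)
        results[name] = analysis(scores, directory=directory, client=client)
    return results
```

Keeping each analysis in its own subdirectory prevents the saved dataframes and plots from overwriting one another.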
