cell_annotator.SampleAnnotator#

class cell_annotator.SampleAnnotator(adata, sample_name, species, tissue, stage='adult', cluster_key='leiden', model=None, max_completion_tokens=None, provider=None, api_key=None, _skip_validation=False)#

Handles cell type annotation for a single sample/batch.

Computes marker genes, queries an LLM for cell type predictions, and manages the annotation results for an individual sample. Typically used as part of a multi-sample workflow orchestrated by CellAnnotator.

Parameters:
  • adata (AnnData)

  • sample_name (str)

  • species (str)

  • tissue (str)

  • stage (str)

  • cluster_key (str)

  • model (str | None)

  • max_completion_tokens (int | None)

  • provider (str | None)

  • api_key (str | None)

  • _skip_validation (bool)
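
The constructor arguments and the methods documented on this page suggest a per-sample workflow along the following lines. This is a minimal sketch, assuming the `cell_annotator` package is installed, that `adata.obs` already holds a `leiden` clustering, and that an API key for the chosen provider is configured. The helper name `annotate_sample`, the example species and tissue, and the value `min_markers=2` are hypothetical.

```python
def annotate_sample(adata, sample_name, species="human", tissue="blood"):
    """Run marker detection and LLM annotation for one sample (illustrative helper)."""
    # Imported lazily so the sketch can be defined without the package installed.
    from cell_annotator import SampleAnnotator

    annotator = SampleAnnotator(
        adata,
        sample_name=sample_name,
        species=species,
        tissue=tissue,
        stage="adult",         # documented default
        cluster_key="leiden",  # documented default
    )
    # Step 1: compute marker genes per cluster
    # (documented to update marker_dfs and marker_genes).
    annotator.get_cluster_markers(method="wilcoxon", max_markers=7)
    # Step 2: request cell type labels from the LLM
    # (documented to update annotation_dict and annotation_df).
    annotator.annotate_clusters(min_markers=2, expected_marker_genes=None)
    return annotator.annotation_df
```

Leaving `provider`, `model`, and `api_key` at `None` relies on whatever defaults the library resolves; pass them explicitly if the environment offers more than one provider.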

Attributes table#

api_keys

Access to API key manager.

Methods table#

annotate_clusters(min_markers, ...[, ...])

Annotate clusters based on marker genes.

check_api_access([provider, model])

Check API access and log warnings if needed.

get_cluster_markers([method, ...])

Get marker genes per cluster.

harmonize_annotations(global_cell_type_list)

Map local cell type names to global cell type names.

list_available_models()

List available models for the current provider.

query_llm(instruction, response_format[, ...])

Query the LLM with a given instruction.

test_query([return_details])

Test if the LLM setup is working correctly.

Attributes#

SampleAnnotator.api_keys#

Access to API key manager.

Methods#

SampleAnnotator.annotate_clusters(min_markers, expected_marker_genes, restrict_to_expected=False)#

Annotate clusters based on marker genes.

Parameters:
  • min_markers (int) – Minimum number of required marker genes per cluster.

  • expected_marker_genes (dict[str, list[str]] | None) – Expected marker genes per cell type.

  • restrict_to_expected (bool (default: False)) – If True, only use expected cell types for annotation.

Return type:

None

Returns:

Updates the following attributes:

  • self.annotation_dict

  • self.annotation_df
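
To make the argument shapes concrete, the sketch below shows an `expected_marker_genes` dictionary and one possible reading of `min_markers` as a per-cluster cutoff. The function `clusters_with_enough_markers`, the gene lists, and the per-cluster marker structure are illustrative assumptions, not the library's internals.

```python
# Expected marker genes per cell type: dict[str, list[str]] (illustrative values).
expected_marker_genes = {
    "T cell": ["CD3D", "CD3E", "TRAC"],
    "B cell": ["CD79A", "MS4A1"],
}


def clusters_with_enough_markers(marker_genes, min_markers):
    """Return the cluster IDs that have at least `min_markers` marker genes."""
    return [
        cluster
        for cluster, genes in marker_genes.items()
        if len(genes) >= min_markers
    ]


# Hypothetical per-cluster markers; the real structure of
# self.marker_genes is an assumption here.
marker_genes = {"0": ["CD3D", "CD3E", "IL7R"], "1": ["MS4A1"], "2": []}
eligible = clusters_with_enough_markers(marker_genes, min_markers=2)
```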

SampleAnnotator.check_api_access(provider=None, model=None)#

Check API access and log warnings if needed.

Return type:

bool

Parameters:
  • provider (str | None)

  • model (str | None)

SampleAnnotator.get_cluster_markers(method='wilcoxon', min_cells_per_cluster=3, min_specificity=0.75, min_auc=0.7, max_markers=7, use_raw=False, use_rapids=False)#

Get marker genes per cluster.

Parameters:
  • method (Optional[Literal['logreg', 't-test', 'wilcoxon', 't-test_overestim_var']] (default: 'wilcoxon')) – Method for marker gene computation. See scanpy.tl.rank_genes_groups for details.

  • min_cells_per_cluster (int (default: 3)) – Include only clusters with at least this many cells.

  • min_specificity (float (default: 0.75)) – Minimum specificity threshold for marker genes.

  • min_auc (float (default: 0.7)) – Minimum AUC threshold for marker genes.

  • max_markers (int (default: 7)) – Maximum number of marker genes per cluster.

  • use_raw (bool (default: False)) – Whether to use raw data for calculations.

  • use_rapids (bool (default: False)) – Whether to use RAPIDS for GPU acceleration.

Return type:

None

Returns:

Updates the following attributes:

  • self.marker_dfs

  • self.marker_genes
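
The interaction of `min_specificity`, `min_auc`, and `max_markers` can be pictured as a filter followed by a cap. The sketch below is a plain-Python illustration under stated assumptions: thresholds are treated as inclusive, candidates are ranked by AUC, and the function `select_markers` and all scores are hypothetical. The library's actual ranking and filtering logic may differ.

```python
def select_markers(candidates, min_specificity=0.75, min_auc=0.7, max_markers=7):
    """Keep genes passing both thresholds, best AUC first, capped at max_markers."""
    passing = [
        c
        for c in candidates
        if c["specificity"] >= min_specificity and c["auc"] >= min_auc
    ]
    # Assumption: rank surviving genes by AUC, highest first.
    passing.sort(key=lambda c: c["auc"], reverse=True)
    return [c["gene"] for c in passing[:max_markers]]


# Made-up scores for one cluster.
candidates = [
    {"gene": "CD3D", "specificity": 0.92, "auc": 0.88},
    {"gene": "ACTB", "specificity": 0.10, "auc": 0.55},
    {"gene": "IL7R", "specificity": 0.80, "auc": 0.74},
    {"gene": "MALAT1", "specificity": 0.78, "auc": 0.60},
]
selected = select_markers(candidates)
```

Here `ACTB` fails both thresholds and `MALAT1` fails only the AUC threshold, so two genes remain.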

SampleAnnotator.harmonize_annotations(global_cell_type_list, unknown_key='Unknown')#

Map local cell type names to global cell type names.

Parameters:
  • global_cell_type_list (list[str]) – List of global cell types.

  • unknown_key (str (default: 'Unknown')) – Key for the unknown category.

Return type:

None

Returns:

Updates the following fields:

  • self.local_cell_type_mapping

  • self.annotation_df["cell_type_harmonized"]
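
The role of `unknown_key` is easiest to see in a toy mapping. The sketch below uses case-insensitive exact matching as a stand-in technique; this page does not state how the library performs the mapping, so the function `harmonize` and its matching rule are assumptions made purely for illustration.

```python
def harmonize(local_names, global_cell_type_list, unknown_key="Unknown"):
    """Map each local name to a global name, or to unknown_key if none matches."""
    # Stand-in rule: case-insensitive exact match after trimming whitespace.
    lookup = {name.casefold(): name for name in global_cell_type_list}
    return {
        local: lookup.get(local.strip().casefold(), unknown_key)
        for local in local_names
    }


mapping = harmonize(["t cell", "B cell ", "Erythroblast"], ["T cell", "B cell"])
```

Local names without a counterpart in `global_cell_type_list` fall back to `unknown_key`, which is how `"Erythroblast"` is handled in this toy example.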

SampleAnnotator.list_available_models()#

List available models for the current provider.

Return type:

list[str]

Returns:

List of available model names.

SampleAnnotator.query_llm(instruction, response_format, other_messages=None)#

Query the LLM with a given instruction.

Parameters:
  • instruction (str) – Instruction to provide to the model.

  • response_format (type[BaseOutput]) – Response format class.

  • other_messages (list | None (default: None)) – Additional messages to provide to the model.

Return type:

BaseOutput

Returns:

Parsed response.

SampleAnnotator.test_query(return_details=False)#

Test if the LLM setup is working correctly.

Performs a simple query to verify that the API key is valid and the model can be accessed successfully.

Parameters:
  • return_details (bool (default: False)) – If True, return a (success, message) tuple with detailed information. If False, return only the boolean success status.

Return type:

bool | tuple[bool, str]

Returns:

  • If return_details=False: True if the test query succeeds, False otherwise.

  • If return_details=True: a (success, message) tuple with detailed status.
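
Because the return type depends on `return_details`, calling code may want to normalize both shapes into one. A minimal sketch, where the helper `normalize_test_result` and the example message are hypothetical and not part of the library:

```python
def normalize_test_result(result):
    """Turn a bool or a (success, message) tuple into a uniform (success, message) pair."""
    if isinstance(result, tuple):
        success, message = result
        return bool(success), str(message)
    # Plain boolean result: no message is available.
    return bool(result), ""


# Stand-ins for annotator.test_query() and annotator.test_query(return_details=True).
simple = normalize_test_result(True)
detailed = normalize_test_result((False, "invalid API key"))
```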