cell_annotator.SampleAnnotator
- class cell_annotator.SampleAnnotator(adata, sample_name, species, tissue, stage='adult', cluster_key='leiden', model=None, max_completion_tokens=None, provider=None, api_key=None, _skip_validation=False)
Handles cell type annotation for a single sample/batch.
Computes marker genes, queries the LLM for cell type predictions, and manages annotation results for an individual sample. Typically used as part of a multi-sample workflow orchestrated by CellAnnotator.
- Parameters:
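adata (AnnData)
sample_name (str)
species (str)
tissue (str)
stage (str, default: 'adult')
cluster_key (str, default: 'leiden')
model (str | None, default: None)
max_completion_tokens (int | None, default: None)
provider (str | None, default: None)
api_key (str | None, default: None)
_skip_validation (bool, default: False)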
Attributes table
api_keys | Access to API key manager.
Methods table
annotate_clusters | Annotate clusters based on marker genes.
check_api_access | Check API access and log warnings if needed.
get_cluster_markers | Get marker genes per cluster.
harmonize_annotations | Map local cell type names to global cell type names.
list_available_models | List available models for the current provider.
query_llm | Query the LLM with a given instruction.
test_query | Test if the LLM setup is working correctly.
Attributes
- SampleAnnotator.api_keys
Access to API key manager.
Methods
- SampleAnnotator.annotate_clusters(min_markers, expected_marker_genes, restrict_to_expected=False)
Annotate clusters based on marker genes.
- Parameters:
min_markers
expected_marker_genes
restrict_to_expected (default: False)
- Returns:
Updates the following attributes:
- self.annotation_dict
- self.annotation_df
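To make "annotate clusters based on marker genes" concrete, here is a minimal sketch of how per-cluster markers might be folded into a single instruction for the model. It is not the library's implementation: the helper `build_instruction` is hypothetical, and it assumes that marker genes are a dict of cluster id to gene list and that clusters with fewer than `min_markers` genes are left out of the query. The real semantics of `min_markers`, `expected_marker_genes`, and `restrict_to_expected` are not documented here.

```python
"""Hypothetical sketch: per-cluster markers -> one LLM instruction.

Assumptions (not confirmed by the cell_annotator docs):
- marker_genes maps cluster id -> list of marker genes
- clusters with fewer than min_markers genes are skipped
build_instruction is a made-up helper name.
"""


def build_instruction(marker_genes, min_markers, species, tissue, stage):
    header = f"Annotate the clusters below (species: {species}, tissue: {tissue}, stage: {stage})."
    lines = [header]
    skipped = []
    for cluster, genes in marker_genes.items():
        if len(genes) < min_markers:
            skipped.append(cluster)  # assumed: too little evidence to query
            continue
        lines.append(f"Cluster {cluster}: {', '.join(genes)}")
    return "\n".join(lines), skipped


instruction, skipped = build_instruction(
    {"0": ["CD3E", "CD8A"], "1": ["MS4A1", "CD79A"], "2": ["ACTB"]},
    min_markers=2,
    species="human",
    tissue="blood",
    stage="adult",
)
print(instruction)
print(skipped)  # ['2']
```

Keeping the skipped clusters in a separate list, rather than silently dropping them, mirrors the idea that every cluster should end up with some label, even if it is an "unknown" placeholder.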
- SampleAnnotator.check_api_access(provider=None, model=None)
Check API access and log warnings if needed.
- SampleAnnotator.get_cluster_markers(method='wilcoxon', min_cells_per_cluster=3, min_specificity=0.75, min_auc=0.7, max_markers=7, use_raw=False, use_rapids=False)
Get marker genes per cluster.
- Parameters:
method (TypeAliasType | None, default: 'wilcoxon') – Method for marker gene computation. See scanpy.tl.rank_genes_groups for details.
min_cells_per_cluster (int, default: 3) – Include only clusters with at least this many cells.
min_specificity (float, default: 0.75) – Minimum specificity threshold for marker genes.
min_auc (float, default: 0.7) – Minimum AUC threshold for marker genes.
max_markers (int, default: 7) – Maximum number of marker genes per cluster.
use_raw (bool, default: False) – Whether to use raw data for calculations.
use_rapids (bool, default: False) – Whether to use RAPIDS for GPU acceleration.
- Returns:
None. Updates the following attributes:
- self.marker_dfs
- self.marker_genes
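The four filtering parameters interact as a chain: small clusters are dropped, each gene must clear both the AUC and the specificity thresholds, and the survivors are capped at `max_markers`. The dependency-free sketch below illustrates that chain under stated assumptions. It is not the library's code: it assumes AUC is the one-vs-rest ROC AUC of a gene's expression (computed Mann-Whitney style) and that "specificity" is the fraction of out-of-cluster cells with zero expression, and it ranks by AUC alone instead of using scanpy's `rank_genes_groups`. The names `auc_one_vs_rest` and `cluster_markers` are hypothetical.

```python
"""Hypothetical sketch of the marker filters behind get_cluster_markers.

Assumptions (not taken from cell_annotator itself):
- AUC = one-vs-rest ROC AUC of a gene, cluster cells vs. all other cells
- specificity = share of out-of-cluster cells with zero expression
"""
from collections import defaultdict


def auc_one_vs_rest(in_vals, out_vals):
    # Mann-Whitney style AUC: probability that a random in-cluster value
    # exceeds a random out-of-cluster value, counting ties as one half.
    wins = 0.0
    for a in in_vals:
        for b in out_vals:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(in_vals) * len(out_vals))


def cluster_markers(expr, labels, genes, min_cells_per_cluster=3,
                    min_specificity=0.75, min_auc=0.7, max_markers=7):
    """expr: one list of gene values per cell; labels: cluster id per cell."""
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)

    markers = {}
    for cluster, cells in by_cluster.items():
        if len(cells) < min_cells_per_cluster:
            continue  # too few cells: cluster is dropped entirely
        rest = [i for i, lab in enumerate(labels) if lab != cluster]
        if not rest:
            markers[cluster] = []  # nothing to compare against
            continue
        scored = []
        for g, gene in enumerate(genes):
            in_vals = [expr[i][g] for i in cells]
            out_vals = [expr[i][g] for i in rest]
            auc = auc_one_vs_rest(in_vals, out_vals)
            # Assumed definition: share of other cells not expressing the gene.
            specificity = sum(v == 0 for v in out_vals) / len(out_vals)
            if auc >= min_auc and specificity >= min_specificity:
                scored.append((auc, gene))
        # Highest AUC first (ties broken alphabetically), capped at max_markers.
        scored.sort(key=lambda t: (-t[0], t[1]))
        markers[cluster] = [gene for _, gene in scored[:max_markers]]
    return markers


genes = ["CD3E", "MS4A1", "ACTB"]
expr = [
    [5, 0, 9], [4, 0, 8], [6, 0, 9],  # cluster "0": T-cell-like
    [0, 7, 9], [0, 5, 8], [0, 6, 9],  # cluster "1": B-cell-like
    [0, 0, 9],                        # cluster "2": a single cell
]
labels = ["0", "0", "0", "1", "1", "1", "2"]
result = cluster_markers(expr, labels, genes)
print(result)  # {'0': ['CD3E'], '1': ['MS4A1']}
```

Note how ACTB has a high AUC in no cluster and, more importantly, zero specificity everywhere, so the specificity filter removes housekeeping genes even when they rank well.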
- SampleAnnotator.harmonize_annotations(global_cell_type_list, unknown_key='Unknown')
Map local cell type names to global cell type names.
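As a rough illustration of the mapping contract (each local name goes to one entry of `global_cell_type_list`, or to `unknown_key` when nothing fits), here is a deterministic stand-in. It is an assumption-laden sketch, not the library's method: the real implementation may ask the LLM to do the matching, whereas this version only does a case- and whitespace-insensitive exact match. The function name `harmonize` is hypothetical.

```python
"""Hypothetical stand-in for harmonize_annotations.

Assumption: the real method may use the LLM to match names; this sketch
uses a case- and whitespace-insensitive exact match instead and maps
anything it cannot match to unknown_key.
"""


def _normalize(name):
    # Lowercase and collapse runs of whitespace.
    return " ".join(name.lower().split())


def harmonize(local_labels, global_cell_type_list, unknown_key="Unknown"):
    lookup = {_normalize(name): name for name in global_cell_type_list}
    return {label: lookup.get(_normalize(label), unknown_key) for label in local_labels}


mapping = harmonize(
    ["t cell", "B  Cell", "weird blob"],
    ["T cell", "B cell", "NK cell"],
)
print(mapping)  # {'t cell': 'T cell', 'B  Cell': 'B cell', 'weird blob': 'Unknown'}
```

An exact-match approach like this is cheap and predictable but misses synonyms such as "CD8+ T lymphocyte" vs. "CD8+ T cell", which is the kind of case where an LLM-based mapping earns its cost.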
- SampleAnnotator.list_available_models()
List available models for the current provider.
- SampleAnnotator.query_llm(instruction, response_format, agent_description=None, other_messages=None)
Query the LLM with a given instruction.
- Parameters:
instruction (str) – Instruction to provide to the model.
response_format (type[BaseOutput]) – Response format class.
agent_description (str | None, default: None) – Optional system prompt override. If None, uses the default cell-annotation prompt from self.prompts.
other_messages (list | None, default: None) – Additional messages to provide to the model.
- Return type:
BaseOutput
- Returns:
Parsed response.
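The `response_format` argument is a class describing the fields the model must return, and the return value is an instance of that class. The sketch below shows the idea with the standard library only. It assumes that `BaseOutput` in cell_annotator is a schema class (most likely a pydantic model); a plain dataclass stands in for it here, and `CellTypeGuess` and `parse_response` are hypothetical names, not part of the library.

```python
"""Sketch of the structured-output idea behind query_llm's response_format.

Assumptions: cell_annotator's BaseOutput is most likely a pydantic model;
a plain dataclass stands in here so the example has no dependencies.
CellTypeGuess and parse_response are hypothetical names.
"""
import json
from dataclasses import dataclass, fields


@dataclass
class CellTypeGuess:
    cell_type: str
    confidence: str


def parse_response(raw_text, response_format):
    # Decode the model's JSON reply and keep only the declared fields,
    # failing loudly if a required field is missing.
    data = json.loads(raw_text)
    names = [f.name for f in fields(response_format)]
    missing = [n for n in names if n not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return response_format(**{n: data[n] for n in names})


reply = '{"cell_type": "CD8+ T cell", "confidence": "high", "note": "extra"}'
guess = parse_response(reply, CellTypeGuess)
print(guess)  # CellTypeGuess(cell_type='CD8+ T cell', confidence='high')
```

Passing a class rather than a free-form prompt lets the caller validate the reply before it reaches downstream code; here the extra `note` key is ignored and a missing field raises instead of producing a half-filled result.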
- SampleAnnotator.test_query(return_details=False)
Test if the LLM setup is working correctly.
Performs a simple structured-output query against the configured model. For OpenRouter slugs whose upstream model does not implement OpenAI's .parse() endpoint, the provider's fallback chain (extra_body json_schema → plain json_object → optional text repair) carries the request, so the same code path works for every provider.
- Parameters:
return_details (bool, default: False) – If True, returns a (success, message) tuple with detailed information. If False, returns only a boolean success status.
- Returns:
If return_details=False: True if the test query succeeds, False otherwise. If return_details=True: a (success, message) tuple with detailed status.
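The fallback chain described above is an ordered list of strategies where the first one that yields parseable output wins. The sketch below models that pattern with stub callables and wraps it in a function that follows the documented `return_details` contract. It is illustrative only: the strategy names echo the docstring, but nothing here talks to OpenAI or OpenRouter, the optional text-repair step is omitted, and `run_with_fallback`, `check_llm_setup`, and `json_schema_unsupported` are hypothetical names.

```python
"""Sketch of a structured-output fallback chain with a test_query-style wrapper.

Assumptions: strategy names mirror the docstring (json_schema via
extra_body, then plain json_object), but each strategy is a stub
callable; nothing contacts a real provider.
"""
import json


def run_with_fallback(strategies):
    """Try each (name, callable) in order; return (name, parsed) of the first success."""
    errors = []
    for name, attempt in strategies:
        try:
            return name, json.loads(attempt())
        except Exception as exc:  # any failure moves on to the next strategy
            errors.append(f"{name}: {exc}")
    raise RuntimeError("; ".join(errors))


def check_llm_setup(strategies, return_details=False):
    # Mirrors the documented contract: bool by default,
    # (success, message) tuple when return_details=True.
    try:
        name, _ = run_with_fallback(strategies)
        success, message = True, f"succeeded via {name}"
    except RuntimeError as exc:
        success, message = False, f"all strategies failed ({exc})"
    return (success, message) if return_details else success


def json_schema_unsupported():
    raise NotImplementedError("upstream model lacks .parse() support")


chain = [
    ("json_schema", json_schema_unsupported),
    ("json_object", lambda: '{"ok": true}'),
]
print(check_llm_setup(chain))                       # True
print(check_llm_setup(chain, return_details=True))  # (True, 'succeeded via json_object')
```

Collecting every strategy's error before giving up means the detailed message from a failed check explains why each fallback was rejected, not just that the last one failed.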