cell_annotator.CellAnnotator
- class cell_annotator.CellAnnotator(adata, species, tissue, stage='adult', cluster_key='leiden', sample_key=None, model=None, max_completion_tokens=None, provider=None, api_key=None)
Main class for annotating cell types across multiple samples.
Orchestrates the annotation workflow by creating SampleAnnotator instances for each sample, coordinating marker gene computation, cell type annotation, and harmonizing results across samples. Supports any LLM provider backend.
- Parameters:
  - adata (AnnData) – AnnData object containing single-cell data.
  - sample_key (str | None (default: None)) – Key in obs indicating sample/batch membership. If None, treats the entire dataset as a single sample.
  - species (str) – Species name (e.g., ‘homo sapiens’, ‘mus musculus’).
  - tissue (str) – Tissue name (e.g., ‘brain’, ‘heart’, ‘lung’).
  - stage (str (default: 'adult')) – Developmental stage (e.g., ‘adult’, ‘embryonic’, ‘fetal’).
  - cluster_key (str (default: 'leiden')) – Key of the cluster column in adata.obs.
  - model (str | None (default: None)) – Model name. If None, uses the default model for the selected or auto-detected provider. Examples: ‘gpt-4o-mini’, ‘gemini-2.5-flash-lite’, ‘claude-haiku-4-5’.
  - max_completion_tokens (int | None (default: None)) – Maximum number of tokens the model is allowed to use for completion.
  - provider (str | None (default: None)) – LLM provider name. If None, auto-detects from the model name or uses the first available provider with a valid API key. See PackageConstants.supported_providers for the list of supported providers.
  - api_key (str | None (default: None)) – Optional API key for the selected provider. If None, uses environment variables. Useful for providing API keys programmatically or using different keys per instance.
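A typical session might look like the sketch below. It uses only the constructor arguments and methods documented on this page. The call order (expected markers, then cluster markers, then annotation), the `"batch"` sample key, and the `run_annotation` helper with its `annotator_cls` argument are assumptions made for illustration, and a valid API key for the chosen provider is assumed to be available in the environment.

```python
def run_annotation(adata, annotator_cls=None):
    """Annotate clusters in `adata` and return the annotator (illustrative helper)."""
    if annotator_cls is None:
        # Imported lazily so the helper can also be exercised with a stand-in class.
        from cell_annotator import CellAnnotator as annotator_cls

    annotator = annotator_cls(
        adata,
        species="homo sapiens",
        tissue="heart",
        stage="adult",
        cluster_key="leiden",
        sample_key="batch",  # assumed column in adata.obs; use None for a single sample
        model=None,  # None falls back to the provider's default model
    )
    # Assumed order: expected markers, then per-cluster markers, then annotation.
    annotator.get_expected_cell_type_markers(n_markers=5)
    annotator.get_cluster_markers(min_specificity=0.75, min_auc=0.7, max_markers=7)
    annotator.annotate_clusters(min_markers=2, key_added="cell_type_predicted")
    return annotator
```

After such a call, the predicted labels would be read from `annotator.adata.obs["cell_type_predicted"]`, the column named by `key_added`.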
Attributes table
| api_keys | Access to API key manager. |
Methods table
| annotate_clusters | Annotate clusters based on marker genes. |
| check_api_access | Check API access and log warnings if needed. |
| get_cluster_markers | Get marker genes per cluster. |
| get_expected_cell_type_markers | Get expected cell types and marker genes. |
| list_available_models | List available models for the current provider. |
| query_llm | Query the LLM with a given instruction. |
| test_query | Test if the LLM setup is working correctly. |
Attributes
- CellAnnotator.api_keys
Access to API key manager.
Methods
- CellAnnotator.annotate_clusters(min_markers=2, restrict_to_expected=False, key_added='cell_type_predicted')
Annotate clusters based on marker genes.
- Parameters:
  - min_markers (int (default: 2)) – Minimum number of marker genes required per cluster.
  - key_added (str (default: 'cell_type_predicted')) – Name of the key in .obs where updated annotations will be written.
  - restrict_to_expected (bool (default: False)) – If True, only use expected cell types for annotation.
- Returns:
Updates the following attributes:
  - self.annotation_df
  - self.adata.obs[key_added]
  - self.annotated
- CellAnnotator.check_api_access(provider=None, model=None)
Check API access and log warnings if needed.
- CellAnnotator.get_cluster_markers(method='wilcoxon', min_specificity=0.75, min_auc=0.7, max_markers=7, use_raw=False, use_rapids=False)
Get marker genes per cluster.
- Parameters:
  - method (TypeAliasType | None (default: 'wilcoxon')) – Method for sc.tl.rank_genes_groups.
  - min_specificity (float (default: 0.75)) – Minimum specificity threshold for marker genes.
  - min_auc (float (default: 0.7)) – Minimum AUC threshold for marker genes.
  - max_markers (int (default: 7)) – Maximum number of marker genes per cluster.
  - use_raw (bool (default: False)) – Whether to use raw data for calculations.
  - use_rapids (bool (default: False)) – Whether to use RAPIDS for GPU acceleration.
- Return type:
- Returns:
Updates the following attributes:
  - self.marker_dfs
  - self.marker_genes
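To make the thresholds concrete, the following pure-Python sketch filters a made-up table of per-gene statistics. It assumes a gene must pass both `min_specificity` and `min_auc`, and that the top `max_markers` survivors are kept in order of AUC. The actual selection logic inside `get_cluster_markers` may differ; the `select_markers` helper and the example values are invented for illustration.

```python
def select_markers(stats, min_specificity=0.75, min_auc=0.7, max_markers=7):
    """Keep genes passing both thresholds, ranked by AUC (illustrative only).

    `stats` maps gene name -> (specificity, auc).
    """
    passing = [
        (gene, auc)
        for gene, (specificity, auc) in stats.items()
        if specificity >= min_specificity and auc >= min_auc
    ]
    # Highest AUC first, then truncate to the requested number of markers.
    passing.sort(key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in passing[:max_markers]]


# Made-up statistics for a single cluster.
cluster_stats = {
    "TNNT2": (0.95, 0.92),
    "MYH7": (0.80, 0.85),
    "ACTB": (0.10, 0.75),  # fails the specificity threshold
    "NPPA": (0.90, 0.60),  # fails the AUC threshold
}
markers = select_markers(cluster_stats, max_markers=2)
```

Under these assumptions, raising `min_specificity` or `min_auc` can leave a cluster with fewer markers than the `min_markers` value required by `annotate_clusters`.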
- CellAnnotator.get_expected_cell_type_markers(n_markers=5, filter_to_var_names=True, provide_var_names=True)
Get expected cell types and marker genes.
- Parameters:
  - n_markers (int (default: 5)) – Number of marker genes per cell type.
  - filter_to_var_names (bool (default: True)) – Whether to filter marker genes to only include those present in adata.var_names.
  - provide_var_names (bool (default: True)) – If True, include the available gene names in the prompt and instruct the model to restrict itself to this set.
- Return type:
- Returns:
Updates the following attributes:
  - self.expected_cell_types
  - self.expected_marker_genes
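The effect of `filter_to_var_names` can be pictured with a small stand-alone sketch. The `filter_markers` helper and the example gene lists are hypothetical and only illustrate the idea of dropping suggested markers that are absent from the dataset; they are not taken from the library.

```python
def filter_markers(expected_markers, var_names):
    """Drop marker genes that are not among the dataset's gene names (illustrative)."""
    available = set(var_names)
    return {
        cell_type: [gene for gene in genes if gene in available]
        for cell_type, genes in expected_markers.items()
    }


# Made-up model suggestions and dataset gene names.
suggested = {
    "cardiomyocyte": ["TNNT2", "MYH7", "GENE_NOT_MEASURED"],
    "fibroblast": ["COL1A1", "DCN"],
}
filtered = filter_markers(suggested, var_names=["TNNT2", "MYH7", "COL1A1"])
```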
- CellAnnotator.list_available_models()
List available models for the current provider.
- CellAnnotator.query_llm(instruction, response_format, agent_description=None, other_messages=None)
Query the LLM with a given instruction.
- Parameters:
  - instruction (str) – Instruction to provide to the model.
  - response_format (type[BaseOutput]) – Response format class.
  - agent_description (str | None (default: None)) – Optional system prompt override. If None, uses the default cell-annotation prompt from self.prompts.
  - other_messages (list | None (default: None)) – Additional messages to provide to the model.
- Return type:
BaseOutput
- Returns:
Parsed response.
- CellAnnotator.test_query(return_details=False)
Test if the LLM setup is working correctly.
Performs a simple structured-output query against the configured model. For OpenRouter slugs whose upstream model does not implement OpenAI’s .parse() endpoint, the provider’s fallback chain (extra_body json_schema → plain json_object → optional text-repair) carries the request, so the same code path works for every provider.
- Parameters:
  - return_details (bool (default: False)) – If True, returns a (success, message) tuple with detailed information. If False, returns only the boolean success status.
- Return type:
- Returns:
If return_details=False: True if the test query succeeds, False otherwise. If return_details=True: a tuple of (success, message) with detailed status.