What is BioContext7
An overview of BioContext7 — the bioinformatics registry aggregator and pipeline generator
Overview
BioContext7 is a bioinformatics registry aggregator and self-healing pipeline generator designed for integration with Claude Code via the Model Context Protocol (MCP).
It addresses a core problem in bioinformatics: discovering the right tools for a workflow and generating correct, executable pipeline code from natural-language descriptions.
Architecture
BioContext7 is composed of four layers:
Registry Layer
Aggregates tool metadata from multiple sources into a unified, searchable index:
- bio.tools — 25,000+ curated bioinformatics tools with EDAM annotations
- BioContainers — Docker and Singularity container images for reproducible execution
- EDAM Ontology — Semantic terms for operations, topics, data types, and formats
- UniProt — Protein sequences, annotations, and ID mapping
- GA4GH Standards — Beacon (variant queries), VRS (variant representation), WES (workflow execution)
- Metabolomics — HMDB, Metabolomics Workbench, MassBank, LIPID MAPS
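To make the idea of a unified, searchable index concrete, here is a minimal in-memory sketch. It is not BioContext7's actual data model: `ToolRecord`, `RegistryIndex`, their fields, and the merge-by-lowercase-name rule are all assumptions made for illustration, and the container tag is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRecord:
    """Hypothetical unified record; field names are illustrative only."""
    name: str
    description: str = ""
    sources: set[str] = field(default_factory=set)
    edam_operations: set[str] = field(default_factory=set)
    containers: list[str] = field(default_factory=list)


class RegistryIndex:
    """Merges per-source entries into one record per lowercase tool name."""

    def __init__(self) -> None:
        self._records: dict[str, ToolRecord] = {}

    def add(self, source: str, name: str, description: str = "",
            edam_operations: tuple[str, ...] = (),
            containers: tuple[str, ...] = ()) -> None:
        record = self._records.setdefault(name.lower(), ToolRecord(name=name))
        record.sources.add(source)
        # Keep the first non-empty description seen for this tool.
        if description and not record.description:
            record.description = description
        record.edam_operations.update(edam_operations)
        for image in containers:
            if image not in record.containers:
                record.containers.append(image)

    def search(self, term: str) -> list[ToolRecord]:
        """Case-insensitive substring match on name, description, and EDAM operations."""
        needle = term.lower()
        hits = [
            r for r in self._records.values()
            if needle in r.name.lower()
            or needle in r.description.lower()
            or any(needle in op.lower() for op in r.edam_operations)
        ]
        return sorted(hits, key=lambda r: r.name.lower())


index = RegistryIndex()
index.add("bio.tools", "BWA", "Short-read aligner", edam_operations=("Read mapping",))
index.add("BioContainers", "bwa", containers=("example/bwa:latest",))  # placeholder image tag
index.add("bio.tools", "SAMtools", "Utilities for SAM/BAM files",
          edam_operations=("Data handling",))

print([r.name for r in index.search("mapping")])
```

In this sketch the two `bwa` entries collapse into a single record that carries both its bio.tools annotation and its container image, which is the kind of cross-source join the registry layer describes.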
Compiler Layer
Translates a language-agnostic intermediate representation (PipelineSpec) into target-specific pipeline code:
| Target | Output |
|---|---|
| Nextflow DSL2 | main.nf + nextflow.config |
| Snakemake | Snakefile + rule files |
| WDL | WDL task and workflow definitions |
| CWL | CWL workflow + tool definitions |
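The sketch below illustrates the general pattern of compiling one intermediate representation into several targets. The `PipelineSpec` and `Step` fields shown here are assumptions, not the project's real schema, and the renderers cover only a single-input, single-output step for two of the four targets. The Nextflow output contains process definitions only, with no `workflow` block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str     # used as the rule/process name
    command: str  # shell template with {input} and {output} placeholders
    input: str
    output: str


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical IR: a named, ordered sequence of steps."""
    name: str
    steps: tuple[Step, ...]


def to_snakemake(spec: PipelineSpec) -> str:
    # Snakemake expands {input}/{output} itself, so the template is emitted as-is.
    # Commands containing double quotes would need escaping; omitted for brevity.
    rules = [
        f"rule {step.name}:\n"
        f'    input: "{step.input}"\n'
        f'    output: "{step.output}"\n'
        f'    shell: "{step.command}"\n'
        for step in spec.steps
    ]
    return "\n".join(rules)


def to_nextflow(spec: PipelineSpec) -> str:
    blocks = []
    for step in spec.steps:
        # Map the template placeholders onto a Nextflow input variable and a literal path.
        script = step.command.format(input="${input_file}", output=step.output)
        blocks.append(
            f"process {step.name} {{\n"
            "    input:\n"
            "    path input_file\n\n"
            "    output:\n"
            f'    path "{step.output}"\n\n'
            "    script:\n"
            '    """\n'
            f"    {script}\n"
            '    """\n'
            "}\n"
        )
    return "\n".join(blocks)


COMPILERS = {"snakemake": to_snakemake, "nextflow": to_nextflow}


def compile_spec(spec: PipelineSpec, target: str) -> str:
    """Pure function of the spec: the same input always yields the same text."""
    return COMPILERS[target](spec)


spec = PipelineSpec(
    name="demo",
    steps=(Step("align", "bwa mem ref.fa {input} > {output}", "reads.fq", "aligned.sam"),),
)
print(compile_spec(spec, "snakemake"))
print(compile_spec(spec, "nextflow"))
```

Keeping the renderers as pure functions over a frozen spec means the same spec always produces the same files, regardless of target.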
Healing Layer
Validates generated pipelines using Language Server Protocol (LSP) integration and automatically fixes errors through iterative correction loops:
1. Generate pipeline code
2. Run LSP validation (syntax, type checking)
3. Collect diagnostics
4. Apply auto-fixes
5. Re-validate until clean or max iterations reached
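The steps above can be sketched as a bounded validate-and-fix loop. This is a minimal illustration under stated assumptions: `heal`, `Diagnostic`, and the toy validator and fixer are hypothetical names, and a real implementation would obtain diagnostics from a language server rather than from a brace counter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Diagnostic:
    line: int
    message: str


Validator = Callable[[str], list[Diagnostic]]
Fixer = Callable[[str, list[Diagnostic]], str]


def heal(code: str, validate: Validator, fix: Fixer,
         max_iterations: int = 3) -> tuple[str, list[Diagnostic]]:
    """Validate, fix, and re-validate until clean or the iteration budget runs out."""
    diagnostics = validate(code)
    for _ in range(max_iterations):
        if not diagnostics:
            break
        code = fix(code, diagnostics)
        diagnostics = validate(code)
    # Any diagnostics still present are returned to the caller, not hidden.
    return code, diagnostics


def toy_validate(code: str) -> list[Diagnostic]:
    """Stand-in for LSP validation: reports one unbalanced opening brace at a time."""
    if code.count("{") > code.count("}"):
        return [Diagnostic(code.count("\n") + 1, "missing closing brace")]
    return []


def toy_fix(code: str, diagnostics: list[Diagnostic]) -> str:
    """Stand-in auto-fix: appends a single closing brace per pass."""
    return code + "\n}"


healed, remaining = heal("workflow {\n    main: {", toy_validate, toy_fix, max_iterations=5)
print(healed)
print(remaining)
```

The iteration cap is what keeps the loop from spinning forever on an error the fixer cannot resolve. In that case the caller receives the last attempt together with the outstanding diagnostics.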
MCP Layer
Exposes BioContext7 capabilities as MCP tools for Claude Code integration:
- search_biotools — Search the bio.tools registry
- get_tool — Get detailed tool information
- suggest_pipeline — Get tool chain suggestions for a workflow
- lookup_edam — Convert natural language to EDAM ontology terms
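For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message and does not contact a server. The `query` argument name is an assumption, since the actual input schema of `search_biotools` is not documented here and should be read from the server's `tools/list` response.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "query" is a hypothetical argument name used for illustration.
request = tool_call_request("search_biotools", {"query": "variant calling"})
print(request)
```

In practice Claude Code constructs these messages itself once the server is registered, so a sketch like this is mainly useful for testing or debugging a server by hand.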
Design Principles
- Deterministic core — Data and specs produce build artifacts without LLM involvement
- Provenance everywhere — Full tracking of inputs, tool versions, and outputs
- Grounded text — Every output references concrete artifacts
- Self-healing — LSP validation loops catch and fix errors automatically
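One way to picture the deterministic-core and provenance principles together is a content-addressed provenance record. This is a sketch under assumptions, not BioContext7's actual format: the record layout, the function names, and the choice of SHA-256 over canonical JSON are all illustrative, as are the sample inputs and version string.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def provenance_record(inputs: dict[str, bytes],
                      tool_versions: dict[str, str],
                      output: bytes) -> dict:
    """Hypothetical record tying an output to its hashed inputs and tool versions."""
    return {
        "inputs": {name: sha256_hex(data) for name, data in inputs.items()},
        "tool_versions": dict(tool_versions),
        "output_sha256": sha256_hex(output),
    }


def record_id(record: dict) -> str:
    """Hash of the canonical JSON form, so key order does not affect the identifier."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))


record = provenance_record(
    inputs={"reads.fq": b"@r1\nACGT\n+\nIIII\n"},  # sample content for illustration
    tool_versions={"bwa": "0.7.17"},               # example version string
    output=b"rule align: ...",
)
print(record_id(record))
```

Because the identifier depends only on content and versions, rebuilding from the same inputs reproduces the same ID, and any change in a tool version yields a different one.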