What is BioContext7

An overview of BioContext7 — the bioinformatics registry aggregator and pipeline generator

Overview

BioContext7 is a bioinformatics registry aggregator and self-healing pipeline generator designed for integration with Claude Code via the Model Context Protocol (MCP).

It addresses a core problem in bioinformatics: finding the right tools for a workflow and turning a natural-language description into correct, executable pipeline code.

Architecture

BioContext7 is composed of four layers:

Registry Layer

Aggregates tool metadata from multiple sources into a unified, searchable index:

bio.tools — 25,000+ curated bioinformatics tools with EDAM annotations
BioContainers — Docker and Singularity container images for reproducible execution
EDAM Ontology — Semantic terms for operations, topics, data types, and formats
UniProt — Protein sequences, annotations, and ID mapping
GA4GH Standards — Beacon (variant queries), VRS (variant representation), WES (workflow execution)
Metabolomics — HMDB, Metabolomics Workbench, MassBank, LIPID MAPS
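As a rough illustration of what "a unified, searchable index" could mean, the sketch below merges records from several sources into one in-memory structure and searches them by substring. It is not BioContext7's actual data model: `ToolRecord`, `RegistryIndex`, their fields, and the sample entries are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRecord:
    """Hypothetical normalized record; the field names are illustrative."""
    name: str
    source: str            # e.g. "bio.tools" or "BioContainers"
    description: str = ""
    edam_terms: tuple = ()


class RegistryIndex:
    """Tiny in-memory index that groups records by lower-cased tool name."""

    def __init__(self):
        self._by_name = {}

    def add(self, record):
        # Records from different sources that share a name end up together.
        self._by_name.setdefault(record.name.lower(), []).append(record)

    def sources_for(self, name):
        """List the sources that contributed a record for this tool name."""
        return sorted(r.source for r in self._by_name.get(name.lower(), []))

    def search(self, query):
        """Return records whose name, description, or EDAM terms contain the query."""
        q = query.lower()
        hits = []
        for records in self._by_name.values():
            for r in records:
                haystack = " ".join((r.name, r.description, *r.edam_terms)).lower()
                if q in haystack:
                    hits.append(r)
        return sorted(hits, key=lambda r: (r.name.lower(), r.source))


# Placeholder entries, not real registry data.
index = RegistryIndex()
index.add(ToolRecord("ExampleAligner", "bio.tools",
                     "Read alignment tool", ("Sequence alignment",)))
index.add(ToolRecord("ExampleAligner", "BioContainers", "Container image"))

for hit in index.search("alignment"):
    print(hit.source, hit.name)
```

Grouping by name is the simplest possible merge key; a real aggregator would likely need stable identifiers to reconcile entries across registries.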

Compiler Layer

Translates a language-agnostic intermediate representation (PipelineSpec) into target-specific pipeline code:

Target	Output
Nextflow DSL2	`main.nf` + `nextflow.config`
Snakemake	`Snakefile` + rule files
WDL	WDL task and workflow definitions
CWL	CWL workflow + tool definitions
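The idea of compiling one intermediate representation into several targets can be sketched as follows. The source does not describe the fields of PipelineSpec, so the `Step` and `PipelineSpec` classes here are assumptions, as are the `to_snakemake` and `to_nextflow` helpers. The emitters produce only a Snakemake rule and a bare Nextflow DSL2 process, not the full set of output files listed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical pipeline step; not the real PipelineSpec schema."""
    name: str
    command: str   # shell command using {input} and {output} placeholders
    input: str
    output: str


@dataclass(frozen=True)
class PipelineSpec:
    name: str
    steps: tuple


def to_snakemake(spec):
    """Emit one Snakemake rule per step, keeping the {input}/{output} wildcards."""
    lines = []
    for s in spec.steps:
        lines += [
            f"rule {s.name}:",
            f'    input: "{s.input}"',
            f'    output: "{s.output}"',
            f'    shell: "{s.command}"',
            "",
        ]
    return "\n".join(lines)


def to_nextflow(spec):
    """Emit one Nextflow DSL2 process per step (no workflow block)."""
    blocks = []
    for s in spec.steps:
        # Nextflow refers to the staged input through a variable, here `infile`.
        cmd = s.command.format(input="${infile}", output=s.output)
        blocks.append(
            f"process {s.name.upper()} {{\n"
            f"    input:\n    path infile\n\n"
            f"    output:\n    path '{s.output}'\n\n"
            f'    script:\n    """\n    {cmd}\n    """\n'
            f"}}\n"
        )
    return "\n".join(blocks)


spec = PipelineSpec(
    name="demo",
    steps=(Step("count_lines", "wc -l {input} > {output}", "reads.txt", "counts.txt"),),
)
print(to_snakemake(spec))
print(to_nextflow(spec))
```

Because both emitters read the same `spec` object, adding a target means adding one function rather than touching the step definitions.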

Healing Layer

Validates generated pipelines using Language Server Protocol (LSP) integration and automatically fixes errors through iterative correction loops:

1. Generate pipeline code
2. Run LSP validation (syntax, type checking)
3. Collect diagnostics
4. Apply auto-fixes
5. Re-validate until clean or max iterations reached
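The loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual healing implementation: `Diagnostic`, `heal`, and `toy_validate` are hypothetical names, and the toy validator stands in for a real language server by flagging one kind of syntax error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical diagnostic; a real LSP diagnostic carries a range and severity."""
    line: int              # 0-based line index
    message: str
    fix: Optional[str]     # replacement line if an auto-fix is known, else None


def heal(code, validate, max_iterations=5):
    """Validate, apply known fixes, and re-validate until clean or out of budget."""
    diagnostics = validate(code)
    for _ in range(max_iterations):
        fixable = [d for d in diagnostics if d.fix is not None]
        if not fixable:
            break  # either clean, or nothing left that can be repaired automatically
        lines = code.splitlines()
        for d in fixable:
            lines[d.line] = d.fix
        code = "\n".join(lines)
        diagnostics = validate(code)
    return code, diagnostics


def toy_validate(code):
    """Stand-in for an LSP server: flags 'rule' headers that lack a trailing colon."""
    found = []
    for i, line in enumerate(code.splitlines()):
        if line.startswith("rule ") and not line.endswith(":"):
            found.append(Diagnostic(i, "expected ':' after rule name", line + ":"))
    return found


broken = 'rule align\n    shell: "echo hi"'
fixed, remaining = heal(broken, toy_validate)
print(fixed)
print("remaining diagnostics:", len(remaining))
```

Returning the leftover diagnostics alongside the code lets the caller tell a clean result apart from one that merely ran out of iterations or fixes.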

MCP Layer

Exposes BioContext7 capabilities as MCP tools for Claude Code integration:

search_biotools — Search the bio.tools registry
get_tool — Get detailed tool information
suggest_pipeline — Get tool chain suggestions for a workflow
lookup_edam — Convert natural language to EDAM ontology terms
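For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `search_biotools`. The envelope follows the MCP convention, but the argument name `query` is an assumption, since the source does not list each tool's input schema, and `tool_call_request` is a hypothetical helper.

```python
import json


def tool_call_request(tool, arguments, request_id=1):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# "query" is an assumed parameter name, used here only for illustration.
request = tool_call_request("search_biotools", {"query": "variant calling"})
print(json.dumps(request, indent=2))
```

In practice Claude Code constructs these requests itself; the sketch only shows what travels over the wire when one of the tools above is called.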

Design Principles

Deterministic core — Data and specs produce build artifacts without LLM involvement
Provenance everywhere — Full tracking of inputs, tool versions, and outputs
Grounded text — Every output references concrete artifacts
Self-healing — LSP validation loops catch and fix errors automatically
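One way to picture the "deterministic core" and "provenance everywhere" principles together is a record whose identifier is derived only from its contents. The sketch below is an assumption about how such tracking might look, not BioContext7's actual format: `provenance_record` and the record layout are hypothetical, and the sample values are placeholders.

```python
import hashlib
import json


def provenance_record(inputs, tool_versions, outputs):
    """Bundle inputs, tool versions, and outputs with a content-derived ID."""
    record = {
        "inputs": inputs,
        "tool_versions": tool_versions,
        "outputs": outputs,
    }
    # Canonical JSON (sorted keys, fixed separators) makes the hash reproducible.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**record, "id": record_id}


# Placeholder values for illustration only.
rec = provenance_record(
    inputs={"reads.txt": "checksum-placeholder"},
    tool_versions={"example_tool": "1.0.0"},
    outputs=["counts.txt"],
)
print(rec["id"])
```

Because the ID depends only on the record's contents, the same inputs and tool versions always yield the same ID, with no LLM or other nondeterministic step involved.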