BioContext7

What is BioContext7

An overview of BioContext7 — the bioinformatics registry aggregator and pipeline generator

Overview

BioContext7 is a bioinformatics registry aggregator and self-healing pipeline generator designed for integration with Claude Code via the Model Context Protocol (MCP).

It addresses a core problem in bioinformatics: finding the right tools for a workflow, then generating correct, executable pipeline code from a natural-language description.

Architecture

BioContext7 is composed of four layers:

Registry Layer

Aggregates tool metadata from multiple sources into a unified, searchable index:

  • bio.tools — 25,000+ curated bioinformatics tools with EDAM annotations
  • BioContainers — Docker and Singularity container images for reproducible execution
  • EDAM Ontology — Semantic terms for operations, topics, data types, and formats
  • UniProt — Protein sequences, annotations, and ID mapping
  • GA4GH Standards — Beacon (variant queries), VRS (variant representation), WES (workflow execution)
  • Metabolomics — HMDB, Metabolomics Workbench, MassBank, LIPID MAPS
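As a rough illustration of what "unified, searchable index" could mean, the sketch below folds entries from two sources into one record per tool and runs a substring search over it. The ToolRecord schema, the merge_records and search helpers, and the sample entries (ExampleAligner, example/aligner:1.0) are all hypothetical placeholders; none of them come from BioContext7 or from the registries listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRecord:
    """One tool in the unified index (hypothetical schema)."""
    name: str
    description: str = ""
    edam_operations: set[str] = field(default_factory=set)
    containers: list[str] = field(default_factory=list)
    sources: set[str] = field(default_factory=set)


def merge_records(index: dict[str, ToolRecord], source: str, entries: list[dict]) -> None:
    """Fold entries from one source into the index, keyed by lowercase tool name."""
    for entry in entries:
        key = entry["name"].lower()
        record = index.setdefault(key, ToolRecord(name=entry["name"]))
        # Keep the first non-empty description; accumulate everything else.
        record.description = record.description or entry.get("description", "")
        record.edam_operations.update(entry.get("edam_operations", []))
        record.containers.extend(entry.get("containers", []))
        record.sources.add(source)


def search(index: dict[str, ToolRecord], term: str) -> list[str]:
    """Return names of tools whose name, description, or operations contain term."""
    term = term.lower()
    return sorted(
        r.name
        for r in index.values()
        if term in r.name.lower()
        or term in r.description.lower()
        or any(term in op.lower() for op in r.edam_operations)
    )


# Made-up sample entries, for illustration only.
index: dict[str, ToolRecord] = {}
merge_records(index, "bio.tools", [
    {"name": "ExampleAligner",
     "description": "Aligns short reads to a reference",
     "edam_operations": ["Sequence alignment"]},
])
merge_records(index, "BioContainers", [
    {"name": "examplealigner", "containers": ["example/aligner:1.0"]},
])

matches = search(index, "alignment")
```

Keying on the lowercase name is the simplest possible join; a real aggregator would more likely match on stable registry identifiers, since names collide and drift across sources.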

Compiler Layer

Translates a language-agnostic intermediate representation (PipelineSpec) into target-specific pipeline code:

Target          Output
Nextflow DSL2   main.nf + nextflow.config
Snakemake       Snakefile + rule files
WDL             WDL task and workflow definitions
CWL             CWL workflow + tool definitions
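To make the idea of a language-agnostic intermediate representation concrete, here is a minimal sketch that models a spec as plain data and renders it to Snakemake rules through a backend table. The fields of PipelineSpec and Step, the to_snakemake renderer, and compile_spec are assumptions made for illustration; the actual PipelineSpec schema is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One pipeline step (hypothetical fields)."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    command: str


@dataclass(frozen=True)
class PipelineSpec:
    """Target-independent description of a pipeline (hypothetical schema)."""
    name: str
    steps: tuple[Step, ...]


def _quoted_list(paths: tuple[str, ...]) -> str:
    return ", ".join(f'"{p}"' for p in paths)


def to_snakemake(spec: PipelineSpec) -> str:
    """Render each step as a Snakemake rule with input, output, and shell."""
    blocks = []
    for step in spec.steps:
        blocks.append(
            f"rule {step.name}:\n"
            f"    input: {_quoted_list(step.inputs)}\n"
            f"    output: {_quoted_list(step.outputs)}\n"
            f'    shell: "{step.command}"\n'
        )
    return "\n".join(blocks)


# Other targets (Nextflow, WDL, CWL) would register their own renderers here.
BACKENDS = {"snakemake": to_snakemake}


def compile_spec(spec: PipelineSpec, target: str) -> str:
    """Dispatch a spec to the renderer for the requested target."""
    try:
        backend = BACKENDS[target]
    except KeyError:
        raise ValueError(f"unsupported target: {target}") from None
    return backend(spec)


spec = PipelineSpec(
    name="demo",
    steps=(
        Step(
            name="sort_bam",
            inputs=("aligned.bam",),
            outputs=("sorted.bam",),
            # {input} and {output} are filled in by Snakemake, not by Python.
            command="samtools sort -o {output} {input}",
        ),
    ),
)
snakefile = compile_spec(spec, "snakemake")
```

The sketch stores the command in Snakemake's placeholder syntax for brevity. A genuinely target-neutral spec would need its own placeholder convention that each backend translates.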

Healing Layer

Validates generated pipelines using Language Server Protocol (LSP) integration and automatically fixes errors through iterative correction loops:

  1. Generate pipeline code
  2. Run LSP validation (syntax, type checking)
  3. Collect diagnostics
  4. Apply auto-fixes
  5. Re-validate until clean or max iterations reached
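The five steps above amount to a bounded fix-and-recheck loop, which can be sketched as follows. The heal function, the Diagnostic type, and the toy validator and fixer are hypothetical stand-ins; in a real setup the diagnostics would come from a language server (LSP servers report them through textDocument/publishDiagnostics), and the fixes would be far less trivial.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Diagnostic:
    """A single validation finding (hypothetical, loosely modelled on LSP)."""
    line: int  # 1-based line number
    message: str


def heal(
    code: str,
    validate: Callable[[str], list[Diagnostic]],
    fix: Callable[[str, list[Diagnostic]], str],
    max_iterations: int = 5,
) -> tuple[str, int, bool]:
    """Validate, fix, and re-validate until clean or the iteration cap is hit.

    Returns (code, fix_rounds_applied, is_clean).
    """
    for rounds in range(max_iterations):
        diagnostics = validate(code)
        if not diagnostics:
            return code, rounds, True
        code = fix(code, diagnostics)
    # Cap reached: report whether the last fix happened to leave the code clean.
    return code, max_iterations, not validate(code)


def toy_validate(code: str) -> list[Diagnostic]:
    """Toy check: every line starting with 'rule ' must end with a colon."""
    return [
        Diagnostic(number, "missing ':' after rule name")
        for number, line in enumerate(code.splitlines(), start=1)
        if line.startswith("rule ") and not line.rstrip().endswith(":")
    ]


def toy_fix(code: str, diagnostics: list[Diagnostic]) -> str:
    """Toy fix: append the missing colon on each flagged line."""
    lines = code.splitlines()
    for diagnostic in diagnostics:
        index = diagnostic.line - 1
        lines[index] = lines[index].rstrip() + ":"
    return "\n".join(lines)


broken = 'rule sort_bam\n    shell: "samtools sort -o {output} {input}"'
fixed, rounds, clean = heal(broken, toy_validate, toy_fix)
```

The iteration cap matters: a fixer that introduces a new error while removing an old one would otherwise loop forever, so the caller gets the last attempt along with a flag saying whether it validated.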

MCP Layer

Exposes BioContext7 capabilities as MCP tools for Claude Code integration:

  • search_biotools — Search the bio.tools registry
  • get_tool — Get detailed tool information
  • suggest_pipeline — Get tool chain suggestions for a workflow
  • lookup_edam — Convert natural language to EDAM ontology terms

Design Principles

  • Deterministic core — Data and specs produce build artifacts without LLM involvement
  • Provenance everywhere — Full tracking of inputs, tool versions, and outputs
  • Grounded text — Every output references concrete artifacts
  • Self-healing — LSP validation loops catch and fix errors automatically
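One way to picture the deterministic-core and provenance principles together is a content hash over a canonical form of the spec, stored alongside tool versions and outputs. The sketch below is only that, a picture: spec_digest, provenance_record, and the record layout are hypothetical and are not taken from BioContext7.

```python
import hashlib
import json


def spec_digest(spec: dict) -> str:
    """Hash a spec via canonical JSON so key order does not change the digest."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def provenance_record(spec: dict, tool_versions: dict[str, str], outputs: list[str]) -> dict:
    """Bundle the spec digest with tool versions and outputs (hypothetical layout)."""
    return {
        "spec_sha256": spec_digest(spec),
        "tool_versions": dict(sorted(tool_versions.items())),
        "outputs": sorted(outputs),
    }


# The same spec written with different key order yields the same digest.
spec_a = {"name": "demo", "steps": ["sort_bam"]}
spec_b = {"steps": ["sort_bam"], "name": "demo"}
same = spec_digest(spec_a) == spec_digest(spec_b)

# Placeholder version string, for illustration only.
record = provenance_record(spec_a, {"example-tool": "0.0.0"}, ["sorted.bam"])
```

Because the digest depends only on the spec's content, two runs from the same spec can be recognised as equivalent without consulting anything non-deterministic.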
