Git-based Viral Taxonomy Management System
Welcome to ICTV-git! This guide will help you get up and running with the git-based viral taxonomy system.
Before you begin, ensure you have Python 3 and Git installed. Then clone the repository:
git clone https://github.com/shandley/ICTV-git.git
cd ICTV-git
We recommend using a virtual environment:
# Create virtual environment
python3 -m venv ictv_api_env
# Activate it
# On macOS/Linux:
source ictv_api_env/bin/activate
# On Windows:
ictv_api_env\Scripts\activate
# Install API dependencies
pip install -r requirements_api.txt
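Before installing anything, you can check that the environment is active. This is a generic standard-library sketch, not part of ICTV-git. It relies on the fact that inside a venv created with `python3 -m venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base Python installation.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # In a venv, sys.prefix is the environment directory and
    # sys.base_prefix is the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run the activate command first.")
```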
Next, download the ICTV Master Species List (MSL) files. You have two options:
Option A: Download All Historical Data (Recommended)
python scripts/download_msl.py
This downloads all MSL files from 2005-2024 (~500MB).
Option B: Download Specific Versions
python scripts/download_msl.py --versions MSL38 MSL37 MSL36
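If you script the download step (for example in a pipeline), a small wrapper can validate version labels before calling the script. This sketch uses only the `--versions` flag shown above; the `MSL<number>` label pattern and the helper names are assumptions for illustration.

```python
import re
import subprocess
import sys
from typing import List

# Assumed label format, matching the examples above (e.g. "MSL38").
MSL_LABEL = re.compile(r"^MSL\d+$")


def build_download_command(versions: List[str]) -> List[str]:
    """Build the argv list for scripts/download_msl.py --versions ..."""
    bad = [v for v in versions if not MSL_LABEL.match(v)]
    if bad:
        raise ValueError(f"Unrecognised MSL labels: {bad}")
    return [sys.executable, "scripts/download_msl.py", "--versions", *versions]


def download(versions: List[str]) -> None:
    # Run from the project root so the relative script path resolves.
    subprocess.run(build_download_command(versions), check=True)
```

Usage: `download(["MSL38", "MSL37", "MSL36"])`. Using `sys.executable` ensures the script runs with the virtual environment's interpreter.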
Build the complete historical git repository with all 18 MSL releases:
python scripts/complete_20_year_conversion.py
This creates a complete git repository at output/ictv_complete_20_year_taxonomy/.
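Because the result is an ordinary git repository, standard git commands work on it. The sketch below lists commits oldest-first with `git log --reverse --format="%h %s"` and picks out those whose subject line names an MSL release. That commit subjects contain labels like `MSL38` is an assumption about how the conversion script writes its history; adjust the pattern if yours differ.

```python
import re
import subprocess
from typing import List, Tuple

REPO = "output/ictv_complete_20_year_taxonomy"
# Assumption: release commits mention their MSL label in the subject line.
MSL_IN_SUBJECT = re.compile(r"\bMSL\d+\b")


def git_log(repo: str = REPO) -> str:
    """Return 'short-hash subject' lines, oldest commit first."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--format=%h %s"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def msl_commits(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (hash, msl_label, subject) for commits whose subject names a release."""
    commits = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        short_hash, _, subject = line.partition(" ")
        match = MSL_IN_SUBJECT.search(subject)
        if match:
            commits.append((short_hash, match.group(0), subject))
    return commits
```

Usage: `for h, label, subject in msl_commits(git_log()): print(h, label, subject)`.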
Start the web interface:
streamlit run scripts/run_taxonomy_browser.py
Then open http://localhost:8501 in your browser to explore the taxonomy.
Start the enhanced API server:
python scripts/run_api_server.py --dev
The API will be available at http://localhost:8000 with auto-generated documentation at http://localhost:8000/docs.
Example API calls:
# Natural language query
curl -X POST http://localhost:8000/ai/query \
-H "Content-Type: application/json" \
-d '{"query": "What happened to Caudovirales in 2019?"}'
# Get species information
curl http://localhost:8000/taxonomy/species/Tobacco%20Mosaic%20Virus
# Search with filters
curl -X POST http://localhost:8000/search/species \
-H "Content-Type: application/json" \
-d '{"query": "coronavirus", "family_filter": "Coronaviridae"}'
# Compare historical releases
curl http://localhost:8000/historical/compare/MSL35/MSL40
# Get timeline summary
curl http://localhost:8000/historical/timeline
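The same calls can be made from Python with only the standard library. This sketch mirrors two of the curl examples above; the endpoint paths come from those examples, while the helper names are hypothetical. Note that species names in the URL path must be percent-encoded, which `urllib.parse.quote` handles.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def species_url(name: str, base: str = BASE_URL) -> str:
    # Spaces and other reserved characters must be percent-encoded in the path.
    return f"{base}/taxonomy/species/{urllib.parse.quote(name)}"


def ai_query_request(question: str, base: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request equivalent to the /ai/query curl example."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/ai/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(req):
    """Send a URL string or Request object and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Example usage (requires the API server started above):
#   print(fetch_json(species_url("Tobacco Mosaic Virus")))
#   print(fetch_json(ai_query_request("What happened to Caudovirales in 2019?")))
```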
Compare any two MSL versions:
python -m src.community_tools.version_comparator output/git_taxonomy \
--version1 MSL37 --version2 MSL38 --output comparison_report.json
This writes the comparison report to comparison_report.json.
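The structure of the JSON report is not described here, so the following sketch makes no assumptions about its schema: it simply describes each top-level entry, which is a quick way to see what the comparator produced before writing anything more specific. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize(data: Any) -> Dict[str, str]:
    """Describe each top-level entry of a parsed JSON document."""
    if not isinstance(data, dict):
        return {"<root>": type(data).__name__}
    summary = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            # Containers: report their type and size rather than contents.
            summary[key] = f"{type(value).__name__} of length {len(value)}"
        else:
            summary[key] = repr(value)
    return summary


def summarize_report(path: str = "comparison_report.json") -> Dict[str, str]:
    return summarize(json.loads(Path(path).read_text(encoding="utf-8")))
```

Usage: `for key, desc in summarize_report().items(): print(f"{key}: {desc}")`.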
If you have research data using an older taxonomy version:
from src.utils.migration_mapper import DatasetMigrator
# Initialize migrator
migrator = DatasetMigrator("output/git_taxonomy")
# Migrate your data
old_data = [
{"virus": "Escherichia phage T4", "family": "Myoviridae"},
# ... more entries
]
new_data = migrator.migrate_dataset(
old_data,
from_version="MSL36",
to_version="MSL38"
)
# View migration report
print(migrator.get_migration_summary())
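Research data usually lives in a spreadsheet rather than a Python literal. Assuming `migrate_dataset` accepts a list of dicts as in the example above, a CSV with `virus` and `family` columns can be loaded with `csv.DictReader`. The file name and helper names below are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List, Sequence

# Same keys as the old_data example above.
REQUIRED_COLUMNS = ("virus", "family")


def parse_records(lines: Iterable[str],
                  required: Sequence[str] = REQUIRED_COLUMNS) -> List[Dict[str, str]]:
    """Turn CSV lines (header first) into a list of row dicts."""
    reader = csv.DictReader(lines)
    fieldnames = reader.fieldnames or []
    missing = [col for col in required if col not in fieldnames]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    return [dict(row) for row in reader]


def load_records(path: str) -> List[Dict[str, str]]:
    # newline="" is what the csv module expects for files it reads.
    with Path(path).open(newline="", encoding="utf-8") as fh:
        return parse_records(fh)
```

Usage: `new_data = migrator.migrate_dataset(load_records("my_viruses.csv"), from_version="MSL36", to_version="MSL38")`.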
Generate proper citations for reproducibility:
from src.community_tools.citation_generator import CitationGenerator
generator = CitationGenerator("output/git_taxonomy")
# Cite a specific species
citation = generator.cite_species(
"Severe acute respiratory syndrome-related coronavirus",
"MSL38",
format="bibtex" # or "standard", "ris", "git"
)
print(citation)
# Check if Caudovirales still exists
from src.community_tools.taxonomy_browser import TaxonomyBrowser
browser = TaxonomyBrowser("output/git_taxonomy")
# Use the browser to track changes
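The TaxonomyBrowser methods are not shown here, so the following is an independent standard-library sketch of the underlying question ("is this taxon still present, and when was it last seen?"). It assumes you have already extracted, for each release, the set of taxon names it contains; the toy data is invented for illustration and is not real ICTV content. Releases are sorted by their numeric part because a plain string sort would put `MSL10` before `MSL9`.

```python
from typing import Dict, List, Optional, Set, Tuple

Releases = Dict[str, Set[str]]  # release label -> taxon names in that release


def msl_number(label: str) -> int:
    """'MSL38' -> 38; assumes labels of the form MSL<number>."""
    if not label.startswith("MSL"):
        raise ValueError(f"Unexpected release label: {label!r}")
    return int(label[3:])


def presence_timeline(taxon: str, releases: Releases) -> List[Tuple[str, bool]]:
    """(release, is_present) pairs in numeric release order."""
    return [(r, taxon in releases[r]) for r in sorted(releases, key=msl_number)]


def last_seen(taxon: str, releases: Releases) -> Optional[str]:
    """Most recent release containing the taxon, or None if it never appears."""
    present = [r for r, ok in presence_timeline(taxon, releases) if ok]
    return present[-1] if present else None


# Toy data for illustration only; not real ICTV content.
toy_releases: Releases = {
    "MSL36": {"OrderA", "OrderB"},
    "MSL37": {"OrderA", "OrderB"},
    "MSL38": {"OrderB"},
}
```

Here `last_seen("OrderA", toy_releases)` returns `"MSL37"`, signalling that the name is absent from the newest release.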
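When reviewers ask you to update to the latest taxonomy, use the DatasetMigrator workflow shown above with to_version set to the newest release you have downloaded.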
# Which families are most stable?
from src.utils.stability_analyzer import StabilityAnalyzer
analyzer = StabilityAnalyzer("output/git_taxonomy")
stability_report = analyzer.analyze_family_stability()
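How StabilityAnalyzer scores stability is not described here. As a deliberately crude stand-in, the sketch below measures the fraction of releases in which each family name appears. It ignores renames, splits, and membership changes, so treat it as an illustration of the idea rather than the analyzer's method. The input shape (release label mapped to a set of names) and the toy data are assumptions.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def presence_fraction(releases: Dict[str, Set[str]]) -> Dict[str, float]:
    """Share of releases in which each name appears (1.0 = every release)."""
    if not releases:
        return {}
    counts = Counter()
    for names in releases.values():
        counts.update(names)  # each name counted at most once per release
    total = len(releases)
    return {name: n / total for name, n in counts.items()}


def rank_by_presence(releases: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Most persistent names first; ties broken alphabetically."""
    return sorted(presence_fraction(releases).items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data for illustration only; not real ICTV content.
toy = {
    "MSL35": {"FamilyA", "FamilyB"},
    "MSL36": {"FamilyA", "FamilyB"},
    "MSL37": {"FamilyA"},
    "MSL38": {"FamilyA", "FamilyC"},
}
```

With this toy data, `rank_by_presence(toy)` ranks FamilyA (present in all four releases) ahead of FamilyB (half) and FamilyC (one quarter).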
Now that you have the basics working, explore:
examples/ directory for Jupyter notebooks
docs/api_usage_examples.md
docs/migration_guide.md
CONTRIBUTING.md to help improve the project
ImportError when running scripts
# Make sure you're in the project root and have installed requirements
cd /path/to/ICTV-git
pip install -r requirements.txt
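To see which imports are failing without running a full script, `importlib.util.find_spec` can check whether top-level modules are importable from the current directory and environment. This is a generic sketch; `streamlit` is used above, and `src` is only importable when you run from the project root, which is a common cause of this error.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    # find_spec returns None for a top-level module that is not installed
    # or not on sys.path (e.g. "src" when run outside the project root).
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Usage: `print(missing_modules(["streamlit", "src"]))`. An empty list means both are importable from where you are.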
“MSL file not found” errors
# Download the data first
python scripts/download_msl.py
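To confirm what has actually been downloaded, a quick directory scan helps. Where the download script saves its files is not stated in this guide, so the `data` directory below is a hypothetical default, as is the assumption that file names contain "MSL"; point it at wherever your files land.

```python
from pathlib import Path
from typing import List


def find_msl_files(directory: str = "data") -> List[Path]:
    """List files whose names contain 'MSL' under directory, searched recursively."""
    # "data" is a hypothetical location; adjust to your download directory.
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*MSL*") if p.is_file())
```

An empty result means the download has not run or saved its files elsewhere.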
Port already in use (API or Browser)
# Change the port
python scripts/run_taxonomy_api.py --port 8001
# or
streamlit run scripts/run_taxonomy_browser.py --server.port 8502
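To pick a port that is actually free, you can try binding a socket to it. This generic standard-library sketch tests the loopback interface only, and another process could still grab the port between the check and the server start, so treat the result as a hint.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # already in use (or otherwise unavailable)
        return True


def first_free_port(start: int, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Scan upward from start and return the first port that binds."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"No free port in {start}-{start + attempts - 1}")
```

Usage: `print(first_free_port(8000))`, then pass the result to `--port` or `--server.port`.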
Coming soon:
Join the discussion:
Happy exploring! 🦠