BenchLoop Documentation

BenchLoop is a Python library for managing, processing, and benchmarking datasets in SQLite databases, designed for AI pipelines, LLM prompt engineering, and dataset curation.

Table of Contents

  1. Introduction
  2. Installation
  3. Project Structure
  4. Quickstart
  5. API Reference
  6. Advanced Usage
  7. Best Practices
  8. Troubleshooting
  9. FAQ

Introduction

BenchLoop helps you manage, process, and benchmark datasets stored in SQLite, and is aimed at AI pipelines, LLM prompt engineering, and dataset curation. It enables you to:

  - Load structured data into SQLite tables from CSV, JSON, or lists of dicts
  - Run LLM prompts row by row, substituting column values, and store the responses
  - Export prompt/response pairs as JSONL datasets for training
  - Benchmark AI responses against ground-truth columns and compute metrics

Installation

Requirements:

  - Python 3 (SQLite support ships with the standard library)
  - The openai package, if you plan to run LLM prompts

Install BenchLoop:

pip install benchloop
pip install openai  # For LLM prompt execution

Project Structure

benchloop/
├── __init__.py
├── loader.py
├── db_manager.py
├── prompt_runner.py
├── dataset_exporter.py
└── benchmarker.py
docs/
└── benchloop.md
main.py

Quickstart

from benchloop.loader import load_table
from benchloop.prompt_runner import execute_prompt_on_table
from benchloop.dataset_exporter import export_training_dataset
from benchloop.benchmarker import benchmark_responses

# 1. Load data
load_table(
    table_name="products",
    data_source=[{"id": 1, "name": "Zapato", "price": "50"}],
    db_path="mydb.sqlite"
)

# 2. Run prompts and store LLM responses
execute_prompt_on_table(
    table_name="products",
    prompt_template="Describe the product {name} that costs {price} dollars.",
    columns_variables=["name", "price"],
    result_mapping={"response": "llm_response"},
    db_path="mydb.sqlite",
    model="gpt-4o",
    api_key="sk-...",
)

# 3. Export dataset for training
export_training_dataset(
    table_name="products",
    prompt_template="Describe the product {name} that costs {price} dollars.",
    response_column="llm_response",
    output_file="dataset.jsonl",
    db_path="mydb.sqlite",
    format="messages"
)

# 4. Benchmark responses
benchmark_responses(
    table_name="products",
    column_ai="llm_response",
    column_ground_truth="ground_truth",
    db_path="mydb.sqlite",
    benchmark_tag=None
)
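
You can then inspect the stored rows with DBManager.filter_rows (documented under API Reference). A minimal sketch, assuming DBManager is importable from benchloop.db_manager, as its file location suggests:

from benchloop.db_manager import DBManager

# Optional: fetch the rows that were loaded and annotated above
rows = DBManager.filter_rows("products", {"id": {">": 0}}, "include", "mydb.sqlite")
for row in rows:
    print(row)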

API Reference

load_table

Purpose: Load structured data into an SQLite table from CSV, JSON, or a list of dicts.
Location: benchloop/loader.py

load_table(table_name: str, data_source: Union[str, List[Dict]], db_path: str)

Features:

  - Accepts a CSV file path, a JSON file path, or an in-memory list of dicts as data_source
  - Writes the rows into table_name in the SQLite database at db_path

Example:

load_table("mytable", "data.csv", "mydb.sqlite")

filter_rows

Purpose: Flexible filtering of rows with support for operators and include/exclude modes.
Location: benchloop/db_manager.py

DBManager.filter_rows(table_name: str, filters: dict, mode: str, db_path: str) -> list

Example:

rows = DBManager.filter_rows("mytable", {"price": {">": 100}}, "include", "mydb.sqlite")
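
The mode argument also accepts "exclude". A minimal sketch, assuming "exclude" returns the rows that do not match the given filters:

rows_excluded = DBManager.filter_rows("mytable", {"price": {">": 100}}, "exclude", "mydb.sqlite")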

execute_prompt_on_table

Purpose: Run a prompt template row by row, substituting column values into it, calling the LLM, and storing each response back in the table.
Location: benchloop/prompt_runner.py

execute_prompt_on_table(
    table_name: str,
    prompt_template: str,
    columns_variables: List[str],
    result_mapping: Dict[str, str],
    db_path: str,
    filters: Optional[Dict] = None,
    limit: Optional[int] = None,
    model: str = "gpt-4o",
    api_key: str = ""
)

Example:

execute_prompt_on_table(
    "products",
    "Describe {name}",
    ["name"],
    {"response": "llm_response"},
    "mydb.sqlite",
    model="gpt-4o",
    api_key="sk-..."
)
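
The optional filters and limit parameters restrict which rows are processed. A sketch, assuming filters accepts the same operator dictionaries as DBManager.filter_rows:

execute_prompt_on_table(
    "products",
    "Describe {name}",
    ["name"],
    {"response": "llm_response"},
    "mydb.sqlite",
    filters={"price": {">": 100}},  # assumed: same operator syntax as filter_rows
    limit=10,                       # process at most 10 matching rows
    model="gpt-4o",
    api_key="sk-..."
)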

export_training_dataset

Purpose: Export a JSONL dataset for training, with prompt/response pairs.
Location: benchloop/dataset_exporter.py

export_training_dataset(
    table_name: str,
    prompt_template: str,
    response_column: str,
    output_file: str,
    db_path: str,
    filters: Optional[Dict] = None,
    format: str = "messages"
)

Example:

export_training_dataset(
    "products",
    "Describe {name}",
    "llm_response",
    "dataset.jsonl",
    "mydb.sqlite"
)
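
The optional filters parameter lets you export only a subset of rows, and format defaults to "messages". A sketch under the same filter-syntax assumption as above; the output path is a placeholder:

export_training_dataset(
    "products",
    "Describe {name}",
    "llm_response",
    "dataset_filtered.jsonl",
    "mydb.sqlite",
    filters={"price": {">": 100}},  # assumed: same operator syntax as filter_rows
    format="messages"
)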

benchmark_responses

Purpose: Compare AI responses vs. ground truth and compute metrics.
Location: benchloop/benchmarker.py

benchmark_responses(
    table_name: str,
    column_ai: str,
    column_ground_truth: str,
    db_path: str,
    benchmark_tag: Optional[str] = None,
    benchmark_column: str = "benchmark",
    similarity_threshold: float = 0.9
) -> Dict

Example:

metrics = benchmark_responses(
    "products",
    "llm_response",
    "ground_truth",
    "mydb.sqlite",
    benchmark_tag="PrecioTest"
)
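
similarity_threshold (default 0.9) is also available; a sketch assuming it sets the minimum similarity for a response to count as matching the ground truth. The exact keys of the returned metrics dict are implementation-specific, so the example simply prints it:

metrics = benchmark_responses(
    "products",
    "llm_response",
    "ground_truth",
    "mydb.sqlite",
    benchmark_tag="StrictRun",      # hypothetical tag name
    similarity_threshold=0.95       # assumed: minimum similarity to count as a match
)
print(metrics)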

Advanced Usage

Best Practices

Troubleshooting

FAQ


BenchLoop is designed to make dataset curation, prompt engineering, and benchmarking fast, reproducible, and robust.
For more examples, see main.py or open an issue on the project repository.