Deprecation Notice
==================

.. warning::

   The ``evaluation`` module is **deprecated** and will be removed in version 2.0.0.

   Please migrate to the ``experiments`` module for new features, better architecture,
   and continued support.

Overview
--------

The ``honeyhive.evaluation`` module has been superseded by ``honeyhive.experiments``, which provides:

- **Improved Architecture**: Decorator-based evaluators instead of class inheritance
- **Backend Aggregation**: Server-side metric aggregation for better performance
- **Enhanced Tracer Integration**: Seamless integration with the multi-instance tracer
- **Better Type Safety**: Pydantic v2 models with full validation
- **Cleaner API**: Simpler, more intuitive function signatures

Deprecation Timeline
--------------------

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Version
     - Status
   * - 0.2.x (Current)
     - ``evaluation`` module works with deprecation warnings
   * - 1.x
     - ``evaluation`` module continues to work with warnings
   * - 2.0.0 (Future)
     - ``evaluation`` module removed, must use ``experiments``

Migration Guide
---------------

Quick Migration Checklist
~~~~~~~~~~~~~~~~~~~~~~~~~

1. Update imports: ``honeyhive.evaluation`` → ``honeyhive.experiments``
2. Replace class-based evaluators with ``@evaluator`` decorator
3. Update ``evaluate()`` function signature
4. Update result handling to use new models

Detailed Migration Steps
~~~~~~~~~~~~~~~~~~~~~~~~

**Step 1: Update Imports**

.. code-block:: python

   # OLD
   from honeyhive.evaluation import evaluate, BaseEvaluator, EvaluationResult
   
   # NEW
   from honeyhive.experiments import evaluate, evaluator, ExperimentResultSummary

**Step 2: Convert Class-Based Evaluators to Decorators**

.. code-block:: python

   # OLD - Class inheritance
   from honeyhive.evaluation import BaseEvaluator
   
   class AccuracyEvaluator(BaseEvaluator):
       def __init__(self, threshold=0.8):
           super().__init__("accuracy")
           self.threshold = threshold
       
       def evaluate(self, inputs, outputs, ground_truth):
           score = calculate_accuracy(outputs, ground_truth)
           return {
               "score": score,
               "passed": score >= self.threshold
           }
   
   # NEW - Decorator-based
   from honeyhive.experiments import evaluator
   
   @evaluator
   def accuracy_evaluator(outputs, inputs, ground_truth):
       """Note: outputs is first parameter in new signature."""
       score = calculate_accuracy(outputs, ground_truth)
       threshold = 0.8  # Can use closures or default args
       return {
           "score": score,
           "passed": score >= threshold
       }

**Step 3: Update evaluate() Function Calls**

.. code-block:: python

   # OLD
   from honeyhive.evaluation import evaluate
   
   result = evaluate(
       inputs=test_inputs,
       outputs=test_outputs,
       evaluators=[AccuracyEvaluator(), F1Evaluator()],
       ground_truth=expected_outputs
   )
   
   # NEW
   from honeyhive.experiments import evaluate
   
   result = evaluate(
       function=my_llm_function,  # Your function to test
       dataset=[
           {"inputs": {...}, "ground_truth": {...}},
           {"inputs": {...}, "ground_truth": {...}},
       ],
       evaluators=[accuracy_evaluator, f1_evaluator],  # Function refs
       api_key="your-key",
       project="your-project",
       name="experiment-v1"
   )

**Step 4: Update Result Handling**

.. code-block:: python

   # OLD
   from honeyhive.evaluation import EvaluationResult
   
   result = evaluate(...)
   
   # Access results (old structure)
   overall_score = result.score
   metrics = result.metrics
   
   # NEW
   from honeyhive.experiments import ExperimentResultSummary
   
   result = evaluate(...)
   
   # Access results (new structure)
   print(f"Run ID: {result.run_id}")
   print(f"Status: {result.status}")
   print(f"Success: {result.success}")
   print(f"Passed: {len(result.passed)}")
   print(f"Failed: {len(result.failed)}")
   
   # Aggregated metrics
   accuracy = result.metrics.get_metric("accuracy_evaluator")
   all_metrics = result.metrics.get_all_metrics()

**Step 5: Update Async Evaluators**

.. code-block:: python

   # OLD - Async class method
   class AsyncEvaluator(BaseEvaluator):
       async def evaluate(self, inputs, outputs, ground_truth):
           result = await external_api_call(outputs)
           return {"score": result.score}
   
   # NEW - @aevaluator decorator
   from honeyhive.experiments import aevaluator
   
   @aevaluator
   async def async_evaluator(outputs, inputs, ground_truth):
       result = await external_api_call(outputs)
       return {"score": result.score}

Common Patterns
~~~~~~~~~~~~~~~

**Pattern 1: Built-in Evaluators**

.. code-block:: python

   # OLD
   from honeyhive.evaluation.evaluators import (
       ExactMatchEvaluator,
       LengthEvaluator,
       SemanticSimilarityEvaluator
   )
   
   evaluators = [
       ExactMatchEvaluator(),
       LengthEvaluator(min_length=10, max_length=100),
       SemanticSimilarityEvaluator()
   ]
   
   # NEW - Implement as decorator-based evaluators
   from honeyhive.experiments import evaluator
   
   @evaluator
   def exact_match(outputs, inputs, ground_truth):
       return {"score": 1.0 if outputs == ground_truth else 0.0}
   
   @evaluator
   def length_check(outputs, inputs, ground_truth):
       length = len(str(outputs))
       in_range = 10 <= length <= 100
       return {"score": 1.0 if in_range else 0.0}
   
   # Use external APIs for factual accuracy
   @aevaluator
   async def factual_accuracy(outputs, inputs, ground_truth):
       result = await fact_check_api(outputs, ground_truth)
       return {"score": result.accuracy}
   
   evaluators = [exact_match, length_check, factual_accuracy]

**Pattern 2: Evaluator with State**

.. code-block:: python

   # OLD
   class StatefulEvaluator(BaseEvaluator):
       def __init__(self, model):
           super().__init__("stateful")
           self.model = model  # Store state
       
       def evaluate(self, inputs, outputs, ground_truth):
           score = self.model.predict(outputs)
           return {"score": score}
   
   # NEW - Use closures or class methods with decorator
   from honeyhive.experiments import evaluator
   
   # Option 1: Closure
   def create_stateful_evaluator(model):
       @evaluator
       def stateful_evaluator(outputs, inputs, ground_truth):
           score = model.predict(outputs)
           return {"score": score}
       return stateful_evaluator
   
   model = load_model()
   my_evaluator = create_stateful_evaluator(model)
   
   # Option 2: Class with __call__, wrapped at instantiation
   class StatefulEvaluator:
       def __init__(self, model):
           self.model = model
       
       def __call__(self, outputs, inputs, ground_truth):
           score = self.model.predict(outputs)
           return {"score": score}
   
   # Wrap the callable instance (decorating __call__ in the class body would
   # wrap a function whose first parameter is self, not outputs)
   my_evaluator = evaluator(StatefulEvaluator(load_model()))

**Pattern 3: Batch Evaluation**

.. code-block:: python

   # OLD
   from honeyhive.evaluation import evaluate_batch
   
   results = evaluate_batch(
       inputs_list=batch_inputs,
       outputs_list=batch_outputs,
       evaluators=[evaluator1, evaluator2],
       max_workers=4
   )
   
   # NEW - Use evaluate() with dataset
   from honeyhive.experiments import evaluate
   
   result = evaluate(
       function=my_function,
       dataset=test_dataset,
       evaluators=[evaluator1, evaluator2],
       max_workers=4,
       api_key="key",
       project="project"
   )

Backward Compatibility Layer
----------------------------

The old ``evaluation`` module still works through a compatibility layer:

.. code-block:: python

   # This still works but emits deprecation warnings
   from honeyhive.evaluation import evaluate
   
   # Internally redirects to honeyhive.experiments
   result = evaluate(...)

**Deprecation Warnings:**

When you use the old module, you'll see warnings like:

.. code-block:: text

   DeprecationWarning: honeyhive.evaluation.evaluate is deprecated.
   Please use honeyhive.experiments.evaluate instead.
   The evaluation module will be removed in version 2.0.0.
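To catch any remaining legacy usage during migration, the standard-library
``warnings`` filters can escalate these warnings to errors in your test runs.
This is plain stdlib usage, not an SDK feature; add ``module="honeyhive"`` to
the filter if you only want to escalate the SDK's own warnings:

.. code-block:: python

   import warnings

   # Escalate DeprecationWarning to an error so remaining uses of the
   # legacy module fail loudly instead of printing a warning
   warnings.filterwarnings("error", category=DeprecationWarning)

   try:
       warnings.warn(
           "honeyhive.evaluation.evaluate is deprecated", DeprecationWarning
       )
   except DeprecationWarning as exc:
       print(f"caught: {exc}")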

Breaking Changes
----------------

**Parameter Order Change**

Evaluator function signature changed:

.. code-block:: python

   # OLD
   def my_evaluator(inputs, outputs, ground_truth):
       pass
   
   # NEW - outputs comes first
   def my_evaluator(outputs, inputs, ground_truth):
       pass
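If many old-style evaluators need porting, a small adapter can reorder
arguments during the transition. The helper below is a hypothetical shim,
not part of the SDK:

.. code-block:: python

   def adapt_legacy(old_fn):
       """Wrap an old-style (inputs, outputs, ground_truth) evaluator so it
       accepts the new (outputs, inputs, ground_truth) argument order."""
       def wrapper(outputs, inputs, ground_truth):
           return old_fn(inputs, outputs, ground_truth)
       return wrapper

   # Legacy evaluator body, unchanged
   def legacy_exact_match(inputs, outputs, ground_truth):
       return {"score": 1.0 if outputs == ground_truth else 0.0}

   exact_match = adapt_legacy(legacy_exact_match)
   print(exact_match("yes", {"q": "2+2?"}, "yes"))  # prints {'score': 1.0}

The adapted function can then be wrapped with the ``@evaluator`` decorator
like any new-style evaluator.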

**evaluate() Signature Change**

The main evaluate function has a completely new signature:

.. code-block:: python

   # OLD
   evaluate(inputs, outputs, evaluators, ground_truth=None)
   
   # NEW
   evaluate(
       function,          # NEW: function to test
       dataset,           # NEW: combined inputs + ground_truth
       evaluators,
       api_key,           # NEW: required
       project,           # NEW: required
       name=None,
       max_workers=1,
       aggregate_function="average",
       verbose=False
   )

**Return Type Change**

.. code-block:: python

   # OLD
   result: EvaluationResult = evaluate(...)
   result.score           # Overall score
   result.metrics         # Dict of metrics
   result.passed          # Bool
   
   # NEW
   result: ExperimentResultSummary = evaluate(...)
   result.run_id          # Unique run ID
   result.status          # ExperimentRunStatus enum
   result.success         # Bool
   result.passed          # List[str] of passed datapoint IDs
   result.failed          # List[str] of failed datapoint IDs
   result.metrics         # AggregatedMetrics object

Support & Help
--------------

**Documentation:**

- :doc:`../experiments/experiments` - Experiments module overview

**Common Issues:**

1. **Import Error**: Make sure you've updated imports to ``honeyhive.experiments``
2. **Parameter Order**: Remember ``outputs`` comes first in new evaluators
3. **Missing api_key/project**: These are now required for ``evaluate()``
4. **Result Structure**: Use new ``ExperimentResultSummary`` structure
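To track down issue #1 across a codebase, a short script (a sketch assuming a
conventional source tree; adjust the root path as needed) can list every file
that still references the deprecated module:

.. code-block:: python

   from pathlib import Path

   def find_legacy_references(root="."):
       """Return Python files under root that still reference the old module."""
       return [
           path
           for path in sorted(Path(root).rglob("*.py"))
           if "honeyhive.evaluation" in path.read_text(encoding="utf-8", errors="ignore")
       ]

   for path in find_legacy_references():
       print(path)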

**Getting Help:**

- GitHub Issues: https://github.com/honeyhive/python-sdk/issues
- Documentation: https://docs.honeyhive.ai
- Community: https://discord.gg/honeyhive

See Also
--------

- :doc:`../experiments/experiments` - New experiments module
- :doc:`../experiments/evaluators` - Decorator-based evaluators

