Metadata-Version: 2.1
Name: hyanova
Version: 1.1.2
Summary: A pure Python implementation of the functional ANOVA algorithm.
Home-page: https://github.com/exiarepairii/hyanova
Author: Su Qiao
Author-email: qiaosu98@outlook.com
License: MIT
Description: # HyANOVA
        
        HyANOVA is a pure Python implementation of the functional ANOVA
        algorithm, which can be used to analyze the importance of
        hyperparameters in machine learning algorithms.
        
        .. _header-n3:
        
        Quick Start
        ===========
        
        To install the package, please use the ``pip`` installation as follows:
        
        .. code:: shell
        
           pip install hyanova
        
        Here is a short example of usage. You can download the
        `data <./examples/iris[GridSearchCV]Model1.csv>`__ from the example
        folder.
        
        .. code:: python
        
           import hyanova
        
           path = './iris[GridSearchCV]Model1.csv'  # grid search results generated by sklearn
           metric = 'mean_test_score'  # metric for model performance
           df, params = hyanova.read_csv(path, metric)
           # You can also load data from a pandas.DataFrame:
           # df, params = hyanova.read_df(df, metric)
           importance = hyanova.analyze(df)
        
        The ``metric`` is the column you choose to evaluate model performance.
        It must appear as a column in the ``.csv`` file or the
        ``pandas.DataFrame`` object. The result will look similar to the
        output below; see the ANOVA section for more details.
        
        .. code:: python
        
           print(importance)
           >>>              u       v_u  F_u(v_u/v_all)
           0           (alpha,)  0.056885        0.892057
           1        (l1_ratio,)  0.002489        0.039030
           2  (alpha, l1_ratio)  0.004394        0.068912
        
        .. _header-n11:
        
        APIs
        ====
        
        .. _header-n12:
        
        Load Data
        ---------
        
        HyANOVA is designed to analyze the grid search results generated by
        sklearn. It provides two ways to load the data.
        
        .. _header-n14:
        
        read_df(df,metric)
        ~~~~~~~~~~~~~~~~~~
        
        You can use ``read_df(df,metric)`` to load data from a
        ``<class 'pandas.core.frame.DataFrame'>`` object.
        
           **Parameters:**
        
           -  **df:**\ <class 'pandas.core.frame.DataFrame'>, the ``DataFrame``
              you want to analyze.
        
           -  **metric:**\ string, the metric you choose.
        
           **Returns:**
        
           -  **result_df:**\ <class 'pandas.core.frame.DataFrame'>, a
              ``DataFrame`` holding the value of every hyperparameter and of
              the chosen metric.
        
           -  **params_list:** list, a ``list`` of all hyperparameter names.
        
        .. _header-n29:
        
        read_csv(path,metric)
        ~~~~~~~~~~~~~~~~~~~~~
        
        Use ``hyanova.read_csv(path,metric)`` to load data from a ``.csv`` file.
        It is equivalent to ``hyanova.read_df(pandas.read_csv(path),metric)``.
        
           **Parameters:**
        
           -  **path:**\ string, path of the ``.csv`` file you want to analyze.
        
           -  **metric:**\ string, the metric you choose.
        
           **Returns:**
        
           -  **result_df:**\ <class 'pandas.core.frame.DataFrame'>, a
              ``DataFrame`` holding the value of every hyperparameter and of
              the chosen metric.
        
           -  **params_list:** list, a ``list`` of all hyperparameter names.
        
        .. _header-n44:
        
        Example
        ~~~~~~~
        
        The `template
        file <https://github.com/exiarepairii/hyanova/tree/master/example/iris[GridSearchCV]Model1.csv>`__
        can be found in the example folder. Here is an example.
        
        .. code:: python
        
           print(df.head())
        
        .. code:: shell
        
           >>> mean_fit_time  std_fit_time  mean_score_time  std_score_time  param_alpha  \
           0       0.003899      0.000194         0.048513        0.007621     0.000977   
           1       0.003401      0.000584         0.042454        0.011295     0.000977   
           2       0.002706      0.000502         0.048544        0.009059     0.000977   
           3       0.003304      0.000531         0.040709        0.003031     0.000977   
           4       0.001801      0.000116         0.000289        0.000014     0.000977   
        
              param_l1_ratio                                     params  \
           0            0.00   {'alpha': 0.0009765625, 'l1_ratio': 0.0}   
           1            0.25  {'alpha': 0.0009765625, 'l1_ratio': 0.25}   
           2            0.50   {'alpha': 0.0009765625, 'l1_ratio': 0.5}   
           3            0.75  {'alpha': 0.0009765625, 'l1_ratio': 0.75}   
           4            1.00   {'alpha': 0.0009765625, 'l1_ratio': 1.0}   
        
              split0_test_score  split1_test_score  split2_test_score  mean_test_score  \
           0           0.828571           0.971429           0.971429         0.923810   
           1           0.885714           0.971429           0.942857         0.933333   
           2           0.885714           1.000000           0.942857         0.942857   
           3           0.885714           0.914286           0.914286         0.904762   
           4           0.885714           1.000000           0.942857         0.942857   
        
              std_test_score  rank_test_score  
           0        0.067344                4  
           1        0.035635                3  
           2        0.046657                1  
           3        0.013469                5  
           4        0.046657                1  
        
        .. code:: python
        
           df, params = hyanova.read_df(df, 'mean_test_score')
           print(df.head())
           >>>  alpha  l1_ratio  mean_test_score
           0  0.000977      0.00         0.923810
           1  0.000977      0.25         0.933333
           2  0.000977      0.50         0.942857
           3  0.000977      0.75         0.904762
           4  0.000977      1.00         0.942857
           print(params)
           >>> ['alpha', 'l1_ratio']
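        
        As a rough illustration of the transformation shown above, the sketch
        below keeps the ``param_*`` columns (with the prefix stripped) and moves
        the metric to the last column. It uses only the standard library, and
        ``select_params`` is a hypothetical helper written for this example; it
        is not part of HyANOVA and may differ from what ``read_df`` does
        internally.
        
        .. code:: python
        
           import csv
           import io
        
           def select_params(rows, metric):
               """Hypothetical helper, not part of hyanova."""
               # sklearn prefixes hyperparameter columns with 'param_'; strip it.
               params = [col[len('param_'):] for col in rows[0]
                         if col.startswith('param_')]
               # Rebuild each row: hyperparameters first, metric last.
               table = [{**{p: row['param_' + p] for p in params},
                         metric: float(row[metric])}
                        for row in rows]
               return table, params
        
           # Two rows in the shape of sklearn's cv_results_ (values from the example above).
           raw = io.StringIO(
               'param_alpha,param_l1_ratio,mean_test_score,rank_test_score\n'
               '0.000977,0.00,0.923810,4\n'
               '0.000977,0.25,0.933333,3\n'
           )
           table, params = select_params(list(csv.DictReader(raw)), 'mean_test_score')
           print(params)
           print(table[0])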
        
        .. _header-n49:
        
        ANOVA
        -----
        
        .. _header-n50:
        
        analyze(df,max_iter=-1)
        ~~~~~~~~~~~~~~~~~~~~~~~
        
        Use ``hyanova.analyze(df,max_iter=-1)`` to do the functional ANOVA
        decomposition.
        
           **Parameters:**
        
           -  **df:**\ <class 'pandas.core.frame.DataFrame'>, the ``DataFrame``
              you want to analyze.
        
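           -  **max_iter:**\ int, limits how many hyperparameter subsets are
              analyzed (see the example below); defaults to -1, which applies
              no limit.
        
           **Returns:**
        
           -  **result_df:**\ <class 'pandas.core.frame.DataFrame'>, a
              ``DataFrame`` listing each hyperparameter subset with its
              variance and importance.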
        
        The ``df`` parameter expects a ``pandas.DataFrame`` object formatted
        like the following table. You can use the loading functions HyANOVA
        provides to produce it easily.
        
        == ======= ======== ===================
        \  alpha   l1_ratio mean_test_score
        == ======= ======== ===================
        0  0.00977 0.00     0.923810
        1  0.00977 0.25     0.933333
        2  0.00977 0.50     0.942857
        3  0.00977 0.75     0.904762
        == ======= ======== ===================
        
        **Note:** The metric (``mean_test_score``) should always be in the last
        column.
        
        .. _header-n91:
        
        Example
        ~~~~~~~
        
        ``hyanova.analyze(df)`` returns a ``DataFrame`` with the hyperparameter
        names (``u``), the variance (``v_u``) and the importance (``F_u``).
        
        .. code:: python
        
           importance = hyanova.analyze(df)
           >>> 100%|██████████████████████████████████| 3/3 [00:00<00:00, 11.32 it/s]
           print(importance)
           >>>              u       v_u  F_u(v_u/v_all)
           0           (alpha,)  0.056885        0.892057
           1        (l1_ratio,)  0.002489        0.039030
           2  (alpha, l1_ratio)  0.004394        0.068912
        
        **Note:** ``F_u`` is the ratio of the variance caused by the
        hyperparameter subset itself (``v_u``) to the variance of all trials
        (``v_all``), so the ``F_u`` values always sum to 1. See the references
        for more details.
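        
        To make the ratio concrete, here is a minimal sketch of the
        decomposition on a small full grid, written with the standard library
        only. The scores are made up for illustration, every grid point is
        weighted equally, and the code follows the textbook definition rather
        than HyANOVA's actual implementation.
        
        .. code:: python
        
           from itertools import product
           from statistics import mean
        
           # Hypothetical scores on a full 2 x 3 grid, keyed by (alpha, l1_ratio).
           scores = {
               (0.1, 0.0): 0.90, (0.1, 0.5): 0.92, (0.1, 1.0): 0.94,
               (1.0, 0.0): 0.70, (1.0, 0.5): 0.75, (1.0, 1.0): 0.71,
           }
           alphas = sorted({a for a, _ in scores})
           ratios = sorted({r for _, r in scores})
        
           f0 = mean(scores.values())  # overall mean score
           # Main effects: marginal mean for each value minus the overall mean.
           f_alpha = {a: mean(scores[a, r] for r in ratios) - f0 for a in alphas}
           f_ratio = {r: mean(scores[a, r] for a in alphas) - f0 for r in ratios}
           # Interaction effect: what the two main effects leave unexplained.
           f_inter = {(a, r): scores[a, r] - f0 - f_alpha[a] - f_ratio[r]
                      for a, r in product(alphas, ratios)}
        
           v_all = mean((s - f0) ** 2 for s in scores.values())
           v_u = {
               ('alpha',): mean(e ** 2 for e in f_alpha.values()),
               ('l1_ratio',): mean(e ** 2 for e in f_ratio.values()),
               ('alpha', 'l1_ratio'): mean(e ** 2 for e in f_inter.values()),
           }
           fractions = {u: v / v_all for u, v in v_u.items()}
           for u, frac in fractions.items():
               print(u, round(frac, 4))
           print('sum:', round(sum(fractions.values()), 6))
        
        On a full grid the three components are orthogonal, which is why their
        variances add up to ``v_all`` and the fractions add up to 1.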
        
        Due to the performance limitations of Python, the functional ANOVA
        becomes very slow when the number of hyperparameters is high (more
        than 5). You can end the analysis early by setting the ``max_iter``
        parameter. In practice the univariate importance is usually all that
        is needed, so setting ``max_iter`` to the number of hyperparameters
        gives a shorter runtime.
        
        .. code:: python
        
           importance = hyanova.analyze(df,max_iter=2)
           >>> 100%|██████████████████████████████████| 2/2 [00:00<00:00, 8.12 it/s]
           print(importance)
           >>>              u       v_u  F_u(v_u/v_all)
           0           (alpha,)  0.056885        0.892057
           1        (l1_ratio,)  0.002489        0.039030
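        
        The row order in the outputs above is consistent with visiting
        hyperparameter subsets by increasing size, which would explain why
        ``max_iter=2`` keeps exactly the two univariate rows. The sketch below
        assumes that ordering; ``subsets`` is a hypothetical helper written for
        this example, not a HyANOVA function. It also shows how quickly the
        number of subsets grows.
        
        .. code:: python
        
           from itertools import chain, combinations, islice
        
           def subsets(params, max_iter=-1):
               """Hypothetical helper: non-empty subsets, smallest first."""
               ordered = chain.from_iterable(
                   combinations(params, size) for size in range(1, len(params) + 1)
               )
               # A negative max_iter means "no limit" in this sketch.
               return list(ordered) if max_iter < 0 else list(islice(ordered, max_iter))
        
           print(subsets(['alpha', 'l1_ratio']))
           print(subsets(['alpha', 'l1_ratio'], max_iter=2))
           # n hyperparameters have 2**n - 1 non-empty subsets.
           print(len(subsets(list('abcde'))), len(subsets(list('abcdefghij'))))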
        
        .. _header-n97:
        
        Example usage
        -------------
        
        You can use sklearn to run a hyperparameter search and then use
        hyanova to analyze the importance of the hyperparameters.
        
        .. code:: python
        
           import numpy as np
           import pandas as pd
           import sklearn.datasets
           from sklearn.model_selection import GridSearchCV
           from sklearn.svm import SVC
           import hyanova
        
           iris = sklearn.datasets.load_iris()
           X = iris.data
           y = iris.target
           model = SVC()
           # Every combination of C and kernel in the grid is evaluated.
           grid = {'C': np.linspace(1e-9, 128, 10000),
                   'kernel': ('rbf', 'linear', 'poly', 'sigmoid')}
           grid_search = GridSearchCV(model, grid)
           result = grid_search.fit(X, y)
           # cv_results_ holds one row per hyperparameter combination.
           df = pd.DataFrame(result.cv_results_)
           metric = 'mean_test_score'
           df, params = hyanova.read_df(df, metric)
           importance = hyanova.analyze(df)
        
        .. _header-n100:
        
        Dependencies
        ============
        
        -  numpy
        
        -  pandas
        
        -  tqdm
        
        .. _header-n108:
        
        Why was HyANOVA created?
        ========================
        
        I am completing my undergraduate thesis. To better understand the
        models used in my article, I looked at many algorithms that can measure
        the importance of hyperparameters. Among them, functional ANOVA seems
        to be the most effective. However, the original author's implementation
        is based on Java and uses Python to call Java files, which I found
        confusing. I wanted a module that is easier to understand and
        implemented entirely in Python to help me with the ANOVA decomposition,
        so I created HyANOVA. I hope it will help you too!
        
        .. _header-n110:
        
        References
        ==========
        
        1. Hutter, F., Hoos, H. & Leyton-Brown, K. (2014). An Efficient
           Approach for Assessing Hyperparameter Importance. Proceedings of the
           31st International Conference on Machine Learning, in PMLR
           32(1):754-762.
        
        2. https://github.com/frank-hutter/fanova
        
Keywords: anova,sklearn,hyperparameter,hyperparameter importance
Platform: any
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
