Metadata-Version: 2.4
Name: etops
Version: 26.2.1
Summary: Einsum tree operations.
Keywords: einsum trees,tensor operations
License-Expression: MIT AND BSD-3-Clause AND BSD-2-Clause
License-File: LICENSE
Project-URL: Homepage, https://github.com/scalable-analyses/einsum_ir
Project-URL: Source, https://github.com/scalable-analyses/einsum_ir
Project-URL: Issues, https://github.com/scalable-analyses/einsum_ir/issues
Requires-Python: >=3.8
Requires-Dist: numpy>=1.20
Description-Content-Type: text/x-rst

etops
=====

The ``etops`` package provides a Python interface for the Tiled Execution IR (TEIR). It enables users to define, configure, optimize, and execute complex tensor contractions and elementwise operations. The package is built on top of the ``einsum_ir`` C++ backend and exposes advanced features such as dimension fusion, dimension splitting, and backend-specific optimizations.

Main Features
-------------
- Abstractions for tensor operations and configuration
- Support for multiple data types (float32, float64)
- Primitive operations: zero, copy, relu, gemm, brgemm, etc.
- Dimension execution strategies: primitive, sequential, shared, space-filling curve (SFC)
- Dimension and stride configuration for advanced memory layouts
- Interface to the built-in contraction optimizer
- Pythonic API with dataclass-based configuration

Installation
------------
Install the package using pip:

.. code-block:: bash

    pip install etops

Unary Examples
--------------
Below are some examples showing how to configure and execute unary tensor operations:

.. code-block:: python

    import etops

    # ---------------------------------------
    # First example:
    #   Matrix transpose using copy primitive
    #   Compares the result with NumPy
    # ---------------------------------------
    # Define a transpose configuration
    top_config = etops.TensorOperationConfig(
        backend    =   "tpp",
        data_type  =   etops.float32,
        prim_first =   etops.prim.none,
        prim_main  =   etops.prim.copy,
        prim_last  =   etops.prim.none,
        dim_types  =   (etops.dim.c,     etops.dim.c    ),
        exec_types =   (etops.exec.prim, etops.exec.prim),
        dim_sizes  =   (3,               4              ),
        strides    = (((4,               1               ),   # in
                       (1,               3               )),) # out
    )

    # Create the TensorOperation instance
    top = etops.TensorOperation(top_config)

    # Create input and output arrays
    import numpy as np
    A = np.random.randn(3,4).astype(np.float32)
    B = np.random.randn(4,3).astype(np.float32)

    top.execute(A, None, B)

    B_np = np.einsum("ij->ji", A)

    # Check correctness
    error_abs = np.max( np.abs(B - B_np) )
    print("Matrix Transpose using copy primitive:")
    print(f"  Max absolute error: {error_abs:.6e}")

    # -------------------------------------------------
    # Second example:
    #   Permutation of a 4D tensor using copy primitive
    #   Compares the result with NumPy
    # -------------------------------------------------
    # Define a permutation configuration
    perm_config = etops.TensorOperationConfig(
        backend     =   "tpp",
        data_type   =   etops.float32,
        prim_first  =   etops.prim.none,
        prim_main   =   etops.prim.copy,
        prim_last   =   etops.prim.none,
        dim_types   =   (etops.dim.c,    etops.dim.c,     etops.dim.c,     etops.dim.c    ),
        exec_types  =   (etops.exec.seq, etops.exec.seq,  etops.exec.prim, etops.exec.prim),
        dim_sizes   =   (2,              4,               3,               5              ),
        strides     = (((3*4*5,          5,               4*5,             1              ),   # in
                        (3,              2*3,             1,               4*2*3          )),) # out
    )

    # Create the TensorOperation instance
    perm_op = etops.TensorOperation(perm_config)

    # Create input and output arrays
    A = np.random.randn(2,3,4,5).astype(np.float32)
    B = np.random.randn(5,4,2,3).astype(np.float32)

    perm_op.execute(A, None, B)

    B_np = np.einsum("abcd->dcab", A)

    # Check correctness
    error_abs = np.max( np.abs(B - B_np) )
    print("4D Tensor Permutation using copy primitive:")
    print(f"  Max absolute error: {error_abs:.6e}")

    # -------------------------------------------------
    # Third example:
    #   Permutation of a 4D tensor using copy primitive
    #   Uses the built-in optimization routine
    #   Compares the result with NumPy
    # -------------------------------------------------
    perm_config = etops.TensorOperationConfig(
        data_type   =   etops.float32,
        prim_first  =   etops.prim.none,
        prim_main   =   etops.prim.copy,
        prim_last   =   etops.prim.none,
        dim_types   =   (etops.dim.c,    etops.dim.c,     etops.dim.c,    etops.dim.c   ),
        exec_types  =   (etops.exec.seq, etops.exec.seq,  etops.exec.seq, etops.exec.seq),
        dim_sizes   =   (2,              4,               3,              5             ),
        strides     = (((3*4*5,          5,               4*5,            1             ),   # in
                        (3,              2*3,             1,              4*2*3         )),) # out
    )

    # Use default optimization config
    optimized_config = etops.optimize(perm_config)

    # Create the TensorOperation instance
    perm_op = etops.TensorOperation(optimized_config)

    # Create input and output arrays
    A = np.random.randn(2,3,4,5).astype(np.float32)
    B = np.random.randn(5,4,2,3).astype(np.float32)

    # Execute the operation
    perm_op.execute(A, None, B)

    B_np = np.einsum("abcd->dcab", A)

    # Check correctness
    error_abs = np.max( np.abs(B - B_np) )
    print("4D Tensor Permutation using optimized copy primitive:")
    print(f"  Max absolute error: {error_abs:.6e}")
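
As the examples show, the stride tuples in the configurations count elements, not bytes. They can be read directly off a NumPy array by dividing the byte strides by the item size. A quick sanity check for the first example's transpose layout (pure NumPy, independent of ``etops``):

.. code-block:: python

    import numpy as np

    # Element strides of a row-major 3x4 input: (4, 1),
    # matching the "in" tuple of the transpose config.
    A = np.zeros((3, 4), dtype=np.float32)
    assert tuple(s // A.itemsize for s in A.strides) == (4, 1)

    # The 4x3 output, indexed by the same (i, j) dimensions as the
    # input, has element strides (1, 3), matching the "out" tuple.
    B = np.zeros((4, 3), dtype=np.float32)
    assert tuple(s // B.itemsize for s in B.T.strides) == (1, 3)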

Binary Examples
---------------
Below are some examples showing how to configure and execute binary tensor operations:

.. code-block:: python

    import etops

    # -----------------------------------------
    # First example:
    #   Column-major GEMM operation
    #   Compares the result with NumPy's einsum
    # -----------------------------------------
    # Define a column-major GEMM configuration
    top_config = etops.TensorOperationConfig(
        backend    =   "tpp",
        data_type  =   etops.float32,
        prim_first =   etops.prim.zero,
        prim_main  =   etops.prim.gemm,
        prim_last  =   etops.prim.none,
        dim_types  =   (etops.dim.m,     etops.dim.n,     etops.dim.k    ),
        exec_types =   (etops.exec.prim, etops.exec.prim, etops.exec.prim),
        dim_sizes  =   (64,              32,              128            ),
        strides    = (((1,               0,               64             ),   # in0
                       (0,               128,             1              ),   # in1
                       (1,               64,              0              )),) # out
    )

    # Create the TensorOperation instance
    top = etops.TensorOperation(top_config)

    # Create input and output arrays
    import numpy as np
    A = np.random.randn(128,64).astype(np.float32)
    B = np.random.randn(32,128).astype(np.float32)
    C = np.random.randn(32, 64).astype(np.float32)

    # Execute the operation
    top.execute(A, B, C)

    C_np = np.einsum("km,nk->nm", A, B)

    # Compute absolute and relative errors
    error_abs = np.max( np.abs(C - C_np) )
    error_rel = np.max( np.abs(C - C_np) / (np.abs(C_np) + 1e-8) )
    print("Column-major GEMM operation:")
    print(f"  Max absolute error: {error_abs:.6e}")
    print(f"  Max relative error: {error_rel:.6e}")
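
    # Note: the reference contraction "km,nk->nm" is just a matrix product
    # with the operands swapped: C[n, m] = sum_k B[n, k] * A[k, m], i.e. B @ A.
    # Quick NumPy check on small illustrative inputs (A_chk, B_chk are
    # throwaway names, not part of the etops API):
    A_chk = np.random.randn(5, 3).astype(np.float32)
    B_chk = np.random.randn(4, 5).astype(np.float32)
    assert np.allclose(np.einsum("km,nk->nm", A_chk, B_chk), B_chk @ A_chk, atol=1e-5)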

    # -----------------------------------------
    # Second example:
    #   Batched GEMM operation
    #   Compares the result with torch's einsum
    # -----------------------------------------
    # Define a batched GEMM configuration
    batched_config =    etops.TensorOperationConfig(
        backend    =    "tpp",
        data_type  =    etops.float32,
        prim_first =    etops.prim.zero,
        prim_main  =    etops.prim.gemm,
        prim_last  =    etops.prim.none,
        dim_types  =   (etops.dim.c,       etops.dim.m,     etops.dim.n,     etops.dim.k    ),
        exec_types =   (etops.exec.shared, etops.exec.prim, etops.exec.prim, etops.exec.prim),
        dim_sizes  =   (48,                64,              32,              128            ),
        strides    = (((128*64,            1,               0,               64             ),   # in0
                       (32*128,            0,               128,             1              ),   # in1
                       (32*64,             1,               64,              0              )),) # out
    )
    # Create the batched TensorOperation instance
    top = etops.TensorOperation(batched_config)

    import torch
    # Create input and output arrays
    A = torch.randn(48, 128, 64, dtype=torch.float32)
    B = torch.randn(48, 32, 128, dtype=torch.float32)
    C = torch.randn(48, 32, 64,  dtype=torch.float32)

    # Execute the operation
    top.execute(A, B, C)

    C_torch = torch.einsum("bkm,bnk->bnm", A, B)

    # Compute absolute and relative errors
    error_abs = torch.max(torch.abs(C - C_torch))
    error_rel = torch.max(torch.abs(C - C_torch) / (torch.abs(C_torch) + 1e-8))

    print("Batched GEMM operation:")
    print(f"  Max absolute error: {error_abs:.6e}")
    print(f"  Max relative error: {error_rel:.6e}")

    # -------------------------------------------
    # Third example:
    #   GEMM operation with row-major first input
    #   packed to column-major
    #   Compares the result with NumPy's einsum
    # -------------------------------------------

    # Define a row-major GEMM configuration with packing
    top_config = etops.TensorOperationConfig(
        backend    =   "tpp",
        data_type  =   etops.float32,
        prim_first =   etops.prim.zero,
        prim_main  =   etops.prim.gemm,
        prim_last  =   etops.prim.none,
        dim_types  =   (etops.dim.m,     etops.dim.n,     etops.dim.k    ),
        exec_types =   (etops.exec.prim, etops.exec.prim, etops.exec.prim),
        dim_sizes  =   (64,              32,              128            ),
        strides    = (((1,               0,               64             ),   # in0
                       (0,               128,             1              ),   # in1
                       (1,               64,              0              )),  # out
                      ((128,             0,               1              ),   # packing in0
                       (0,               0,               0              ),   # packing in1
                       (0,               0,               0              )),) # packing out
    )

    # Create the TensorOperation instance
    top = etops.TensorOperation(top_config)

    # Create input and output arrays
    import numpy as np
    A = np.random.randn(64,128).astype(np.float32)
    B = np.random.randn(32,128).astype(np.float32)
    C = np.random.randn(32, 64).astype(np.float32)

    # Execute the operation
    top.execute(A, B, C)

    A_T = np.transpose(A)
    C_np = np.einsum("km,nk->nm", A_T, B)

    # Compute absolute and relative errors
    error_abs = np.max( np.abs(C - C_np) )
    error_rel = np.max( np.abs(C - C_np) / (np.abs(C_np) + 1e-8) )
    print("GEMM operation with packing:")
    print(f"  Max absolute error: {error_abs:.6e}")
    print(f"  Max relative error: {error_rel:.6e}")
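
    # The second stride group describes the packing: (128, 0, 1) is the
    # row-major (m, k) layout of the first input, which gets packed into the
    # column-major layout (1, 0, 64) used by the main GEMM strides. In NumPy
    # terms the pack is a transpose-copy (illustrative, independent of etops):
    A_rm = np.random.randn(64, 128).astype(np.float32)  # row-major (m, k): strides (128, 1)
    A_cm = np.ascontiguousarray(A_rm.T)                 # packed copy (k, m): strides (64, 1)
    assert tuple(s // A_cm.itemsize for s in A_cm.strides) == (64, 1)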

    # -----------------------------------------------
    # Fourth example:
    #   Batch-reduce GEMM operation with optimization
    #   Compares the result with torch's einsum
    # -----------------------------------------------
    # Define a batch-reduce GEMM configuration
    batched_config =   etops.TensorOperationConfig(
        data_type  =   etops.float32,
        prim_first =   etops.prim.zero,
        prim_main  =   etops.prim.gemm,
        prim_last  =   etops.prim.none,
        dim_types  =   (etops.dim.k,    etops.dim.m,    etops.dim.n,    etops.dim.k   ),
        exec_types =   (etops.exec.seq, etops.exec.seq, etops.exec.seq, etops.exec.seq),
        dim_sizes  =   (48,             64,             32,             128           ),
        strides    = (((128*64,         1,              0,              64            ),   # in0
                       (32*128,         0,              128,            1             ),   # in1
                       (0,              1,              64,             0             )),) # out
    )

    # Optimize the configuration
    optimized_config = etops.optimize(
        batched_config,
        {
            "target_m":            16,
            "target_n":            12,
            "target_k":            64,
            "num_threads":         4,
            "br_gemm_support":     True,
            "packed_gemm_support": True
        }
    )

    # Create the optimized TensorOperation instance
    top = etops.TensorOperation(optimized_config)

    import torch
    # Create input and output arrays
    A = torch.randn(48, 128, 64, dtype=torch.float32)
    B = torch.randn(48, 32, 128, dtype=torch.float32)
    C = torch.randn(    32, 64,  dtype=torch.float32)

    # Execute the operation
    top.execute(A, B, C)

    C_torch = torch.einsum("bkm,bnk->nm", A, B)

    # Compute absolute and relative errors
    error_abs = torch.max(torch.abs(C - C_torch))
    error_rel = torch.max(torch.abs(C - C_torch) / (torch.abs(C_torch) + 1e-8))
    print("Batch-reduce GEMM operation:")
    print(f"  Max absolute error: {error_abs:.6e}")
    print(f"  Max relative error: {error_rel:.6e}")
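
In the batch-reduce GEMM, the batch dimension (the leading ``etops.dim.k`` entry) is itself reduced: the reference ``"bkm,bnk->nm"`` accumulates one GEMM per batch entry into a single output. A NumPy sketch of that reduction on small sizes (independent of ``etops``):

.. code-block:: python

    import numpy as np

    # Accumulate one small GEMM per batch entry into a single (n, m) output.
    batch, m, n, k = 6, 8, 4, 10
    A = np.random.randn(batch, k, m).astype(np.float32)
    B = np.random.randn(batch, n, k).astype(np.float32)

    acc = np.zeros((n, m), dtype=np.float32)
    for b in range(batch):
        acc += B[b] @ A[b]

    assert np.allclose(acc, np.einsum("bkm,bnk->nm", A, B), atol=1e-4)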

See the source code and inline documentation for more advanced usage.
