Architecture Overview

TensorPlay is designed with a strictly decoupled, layered architecture. It consists of four core libraries with clear boundaries and unidirectional dependencies.

The 4 Core Libraries

1. P10 (Tensor Computation Engine)

  • Role: The foundational "Calculation Engine".
  • Design Philosophy:
    • Hardware Abstraction: Uses a Tensor interface and TensorImpl polymorphism to support multiple hardware backends (CPU, CUDA, Custom Edge Chips) without changing the frontend API.
    • Zero Differentiation Logic: Contains no autograd code; it focuses purely on efficient tensor kernels and memory management, serving as the stable bedrock for all other layers.
    • Dispatcher Pattern: Decouples operator definitions from device-specific implementations, allowing for easy integration of libraries like MKL or cuDNN.
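The dispatcher pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration, not P10's actual API: the names `TensorImpl`, `CpuTensorImpl`, `register_kernel`, and `dispatch` are invented here to show how operator definitions stay decoupled from device-specific kernels.

```python
# Hypothetical sketch of a P10-style dispatcher: a registration table maps
# (operator, device) pairs to backend kernels, so the frontend never needs
# to know which hardware implementation it is calling.
from abc import ABC, abstractmethod

class TensorImpl(ABC):
    """Backend-specific storage; the frontend only sees this interface."""
    @abstractmethod
    def device(self) -> str: ...

class CpuTensorImpl(TensorImpl):
    def __init__(self, data):
        self.data = list(data)
    def device(self) -> str:
        return "cpu"

# Dispatch table: (op_name, device) -> kernel function
_KERNELS = {}

def register_kernel(op, device):
    def deco(fn):
        _KERNELS[(op, device)] = fn
        return fn
    return deco

def dispatch(op, a: TensorImpl, b: TensorImpl) -> TensorImpl:
    # Look up the kernel for this op on this backend and run it.
    return _KERNELS[(op, a.device())](a, b)

@register_kernel("add", "cpu")
def add_cpu(a: CpuTensorImpl, b: CpuTensorImpl) -> CpuTensorImpl:
    return CpuTensorImpl(x + y for x, y in zip(a.data, b.data))

x = CpuTensorImpl([1.0, 2.0])
y = CpuTensorImpl([3.0, 4.0])
z = dispatch("add", x, y)  # z.data == [4.0, 6.0]
```

Adding a CUDA backend in this scheme would mean registering `("add", "cuda")` kernels without touching the frontend `dispatch` call, which is the extensibility property the design aims for.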

2. TPX (Autograd Engine)

  • Role: The "Differentiation Layer".
  • Design Philosophy:
    • Decoupled Autograd: Implemented as a lightweight extension layer rather than being baked into the tensor core. It only tracks operations when requires_grad=True.
    • Explicit Graph Building: Designed for educational clarity, allowing users to inspect how the computational graph is constructed and traversed during the backward pass.
    • Pluggable Engine: Can be replaced or extended with different differentiation modes (e.g., higher-order derivatives) without affecting the underlying P10 engine.
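The "only tracks operations when `requires_grad=True`" behavior can be illustrated with a minimal scalar autograd sketch. The `Var` class and its fields are hypothetical stand-ins, not TPX's real types; they only demonstrate the design idea of graph building as an opt-in layer on top of plain computation.

```python
# Hypothetical sketch of TPX-style decoupled autograd: each operation
# records graph edges (parent, local_gradient) only when tracking is on.
class Var:
    def __init__(self, value, requires_grad=False):
        self.value = value
        self.requires_grad = requires_grad
        self.grad = 0.0
        self._parents = []  # edges of the computational graph

    def __mul__(self, other):
        out = Var(self.value * other.value,
                  requires_grad=self.requires_grad or other.requires_grad)
        if out.requires_grad:  # the graph is only built when requested
            # d(a*b)/da = b, d(a*b)/db = a
            out._parents = [(self, other.value), (other, self.value)]
        return out

    def backward(self, grad=1.0):
        # Traverse the recorded graph, applying the chain rule.
        self.grad += grad
        for parent, local in self._parents:
            parent.backward(grad * local)

x = Var(3.0, requires_grad=True)
y = Var(4.0)
z = x * y      # z.value == 12.0
z.backward()   # x.grad == 4.0, by the chain rule
```

When neither operand requires gradients, `_parents` stays empty and the multiplication is a plain float product, which is the "lightweight extension layer" property: pure computation pays no autograd cost.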

3. Stax (Static Graph Accelerator)

  • Role: The "Optimization Layer".
  • Design Philosophy:
    • Optimization-First: Focuses purely on static graph capture, operator fusion, and just-in-time (JIT) compilation to minimize Python overhead.
    • Independent Path: Operates on a separate dependency chain from TPX/NN, making it a modular component that can be added or removed based on performance needs.
    • Compiler Integration: Designed to interface with advanced compiler backends like MLIR or TVM in the future.
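Static graph capture and operator fusion can be sketched as follows. The `Graph`, `Node`, and `fuse_elementwise` names are invented for this example and do not describe Stax's real internals; the toy pass merges an `add` that directly feeds a `mul` into a single fused node, the kind of rewrite that reduces Python and kernel-launch overhead.

```python
# Hypothetical sketch of Stax-style capture: operations are recorded into a
# static graph first, then optimization passes rewrite it before execution.
class Node:
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs

class Graph:
    def __init__(self):
        self.nodes = []

    def record(self, op, *inputs):
        node = Node(op, inputs)
        self.nodes.append(node)
        return node

def fuse_elementwise(graph):
    """Toy fusion pass: collapse an add feeding directly into a mul."""
    fused, i = [], 0
    while i < len(graph.nodes):
        cur = graph.nodes[i]
        nxt = graph.nodes[i + 1] if i + 1 < len(graph.nodes) else None
        if cur.op == "add" and nxt is not None and nxt.op == "mul" \
                and cur in nxt.inputs:
            other = tuple(x for x in nxt.inputs if x is not cur)
            fused.append(Node("fused_add_mul", cur.inputs + other))
            i += 2  # both original nodes are replaced by one fused node
        else:
            fused.append(cur)
            i += 1
    graph.nodes = fused
    return graph

g = Graph()
a = g.record("input")
b = g.record("input")
s = g.record("add", a, b)
p = g.record("mul", s, a)   # computes (a + b) * a
fuse_elementwise(g)
# ops after fusion: ["input", "input", "fused_add_mul"]
```

Because the pass operates only on the recorded graph, it needs nothing from TPX or NN, which is what makes the optimization layer an independent, removable component.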

4. NN (Neural Network Library)

  • Role: The "Business Layer".
  • Design Philosophy:
    • User-Friendly Abstraction: Provides a familiar, PyTorch-compatible interface for high-level components like Linear, Conv2d, and Optimizers.
    • Blueprint Approach: Every layer is designed to be a clear, readable blueprint, demonstrating how complex neural network components are built from basic tensor operations.
    • Pure Dependency: Relies strictly on the public APIs of P10 and TPX, ensuring it remains an optional, non-intrusive layer for high-level modeling.
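The "blueprint" idea can be shown with a toy `Linear` layer. This sketch uses plain Python lists for readability; in TensorPlay the parameters would be P10 tensors with TPX gradient tracking, and the class name and fields here are illustrative only.

```python
# Hypothetical blueprint of an NN-layer Linear module: it owns parameters
# and composes basic operations; the math is y[j] = sum_i x[i]*W[i][j] + b[j].
import random

class Linear:
    def __init__(self, in_features, out_features):
        # Small random weights and zero biases, as a typical initialization.
        self.weight = [[random.gauss(0.0, 0.1) for _ in range(out_features)]
                       for _ in range(in_features)]
        self.bias = [0.0] * out_features

    def __call__(self, x):
        # zip(*self.weight) iterates over columns of W, one per output unit.
        return [sum(xi * wij for xi, wij in zip(x, col)) + bj
                for col, bj in zip(zip(*self.weight), self.bias)]

layer = Linear(3, 2)
out = layer([1.0, 2.0, 3.0])  # a 2-element output vector
```

Because the layer is just parameters plus a forward computation expressed through public tensor operations, it stays an optional convenience on top of P10 and TPX rather than a hard dependency of the core.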

Dependency Graph

```mermaid
graph TD
    NN[NN: Neural Networks] --> TPX[TPX: Autograd]
    NN --> P10[P10: Computation]
    TPX --> P10
    Stax[Stax: Static Graph] --> P10

    style P10 fill:#f9f,stroke:#333,stroke-width:2px
    style TPX fill:#bbf,stroke:#333,stroke-width:1px
    style Stax fill:#bfb,stroke:#333,stroke-width:1px
    style NN fill:#fbb,stroke:#333,stroke-width:1px
```

Why This Architecture?

  1. Decoupling: Each library can be developed, tested, and optimized independently.
  2. Extensibility: Adding a new hardware backend only requires changes to P10. Adding a new differentiation mode only affects TPX.
  3. Performance: Users only pay for what they use. Pure computation tasks don't load the autograd engine or the static graph compiler.
  4. Educational Value: By separating these concerns, TensorPlay makes it easy for developers to understand the internals of a modern deep learning framework.

Released under the Apache 2.0 License.