Architecture Overview

TensorPlay is designed with a strictly decoupled, layered architecture. It consists of four core libraries with clear boundaries and unidirectional dependencies.

The 4 Core Libraries

1. P10 (Tensor Computation Engine)

  • Role: The foundational "Calculation Engine".
  • Design Philosophy:
    • Hardware Abstraction: Uses a Tensor interface and TensorImpl polymorphism to support multiple hardware backends (CPU, CUDA, Custom Edge Chips) without changing the frontend API.
    • Zero Differentiation Logic: Contains no autograd code; it focuses purely on efficient tensor kernels and memory management, serving as the stable bedrock for all other layers.
    • Dispatcher Pattern: Decouples operator definitions from device-specific implementations, allowing for easy integration of libraries like MKL or cuDNN.
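The dispatcher pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration, not P10's actual API: the names `TensorImpl`, `CpuTensorImpl`, `register_kernel`, and `dispatch` are invented here to show how operator definitions stay decoupled from device-specific kernels.

```python
# Hypothetical sketch of a P10-style dispatcher: a registration table maps
# (operator, device) pairs to backend kernels, so the frontend never needs
# to know which hardware implementation it is calling.
from abc import ABC, abstractmethod

class TensorImpl(ABC):
    """Backend-specific storage; the frontend only sees this interface."""
    @abstractmethod
    def device(self) -> str: ...

class CpuTensorImpl(TensorImpl):
    def __init__(self, data):
        self.data = list(data)
    def device(self) -> str:
        return "cpu"

# Dispatch table: (op_name, device) -> kernel function
_KERNELS = {}

def register_kernel(op, device):
    def deco(fn):
        _KERNELS[(op, device)] = fn
        return fn
    return deco

def dispatch(op, a: TensorImpl, b: TensorImpl) -> TensorImpl:
    # Look up the kernel for this op on this backend and run it.
    return _KERNELS[(op, a.device())](a, b)

@register_kernel("add", "cpu")
def add_cpu(a: CpuTensorImpl, b: CpuTensorImpl) -> CpuTensorImpl:
    return CpuTensorImpl(x + y for x, y in zip(a.data, b.data))

x = CpuTensorImpl([1.0, 2.0])
y = CpuTensorImpl([3.0, 4.0])
z = dispatch("add", x, y)  # z.data == [4.0, 6.0]
```

Adding a CUDA backend in this scheme would mean registering `("add", "cuda")` kernels without touching the frontend `dispatch` call, which is the extensibility property the design aims for.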

2. TPX (Autograd Engine)

  • Role: The "Differentiation Layer".
  • Design Philosophy:
    • Decoupled Autograd: Implemented as a lightweight extension layer rather than being baked into the tensor core. It only tracks operations when requires_grad=True.
    • Explicit Graph Building: Designed for educational clarity, allowing users to inspect how the computational graph is constructed and traversed during the backward pass.
    • Pluggable Engine: Can be replaced or extended with different differentiation modes (e.g., higher-order derivatives) without affecting the underlying P10 engine.
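The "only tracks operations when `requires_grad=True`" behavior can be illustrated with a minimal scalar autograd sketch. The `Var` class and its fields are hypothetical stand-ins, not TPX's real types; they only demonstrate the design idea of graph building as an opt-in layer on top of plain computation.

```python
# Hypothetical sketch of TPX-style decoupled autograd: each operation
# records graph edges (parent, local_gradient) only when tracking is on.
class Var:
    def __init__(self, value, requires_grad=False):
        self.value = value
        self.requires_grad = requires_grad
        self.grad = 0.0
        self._parents = []  # edges of the computational graph

    def __mul__(self, other):
        out = Var(self.value * other.value,
                  requires_grad=self.requires_grad or other.requires_grad)
        if out.requires_grad:  # the graph is only built when requested
            # d(a*b)/da = b, d(a*b)/db = a
            out._parents = [(self, other.value), (other, self.value)]
        return out

    def backward(self, grad=1.0):
        # Traverse the recorded graph, applying the chain rule.
        self.grad += grad
        for parent, local in self._parents:
            parent.backward(grad * local)

x = Var(3.0, requires_grad=True)
y = Var(4.0)
z = x * y      # z.value == 12.0
z.backward()   # x.grad == 4.0, by the chain rule
```

When neither operand requires gradients, `_parents` stays empty and the multiplication is a plain float product, which is the "lightweight extension layer" property: pure computation pays no autograd cost.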

3. Stax (Static Graph Accelerator)

  • Role: The "Optimization Layer".
  • Design Philosophy:
    • Optimization-First: Focuses purely on static graph capture, operator fusion, and just-in-time (JIT) compilation to minimize Python overhead.
    • Independent Path: Operates on a separate dependency chain from TPX/NN, making it a modular component that can be added or removed based on performance needs.
    • Compiler Integration: Designed to interface with advanced compiler backends like MLIR or TVM in the future.
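Static graph capture and operator fusion can be sketched as follows. The `Graph`, `Node`, and `fuse_elementwise` names are invented for this example and do not describe Stax's real internals; the toy pass merges an `add` that directly feeds a `mul` into a single fused node, the kind of rewrite that reduces Python and kernel-launch overhead.

```python
# Hypothetical sketch of Stax-style capture: operations are recorded into a
# static graph first, then optimization passes rewrite it before execution.
class Node:
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs

class Graph:
    def __init__(self):
        self.nodes = []

    def record(self, op, *inputs):
        node = Node(op, inputs)
        self.nodes.append(node)
        return node

def fuse_elementwise(graph):
    """Toy fusion pass: collapse an add feeding directly into a mul."""
    fused, i = [], 0
    while i < len(graph.nodes):
        cur = graph.nodes[i]
        nxt = graph.nodes[i + 1] if i + 1 < len(graph.nodes) else None
        if cur.op == "add" and nxt is not None and nxt.op == "mul" \
                and cur in nxt.inputs:
            other = tuple(x for x in nxt.inputs if x is not cur)
            fused.append(Node("fused_add_mul", cur.inputs + other))
            i += 2  # both original nodes are replaced by one fused node
        else:
            fused.append(cur)
            i += 1
    graph.nodes = fused
    return graph

g = Graph()
a = g.record("input")
b = g.record("input")
s = g.record("add", a, b)
p = g.record("mul", s, a)   # computes (a + b) * a
fuse_elementwise(g)
# ops after fusion: ["input", "input", "fused_add_mul"]
```

Because the pass operates only on the recorded graph, it needs nothing from TPX or NN, which is what makes the optimization layer an independent, removable component.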

4. NN (Neural Network Library)

  • Role: The "Business Layer".
  • Design Philosophy:
    • User-Friendly Abstraction: Provides a familiar, PyTorch-compatible interface for high-level components like Linear, Conv2d, and Optimizers.
    • Blueprint Approach: Every layer is designed to be a clear, readable blueprint, demonstrating how complex neural network components are built from basic tensor operations.
    • Pure Dependency: Relies strictly on the public APIs of P10 and TPX, ensuring it remains an optional, non-intrusive layer for high-level modeling.
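The "blueprint" idea can be shown with a toy `Linear` layer. This sketch uses plain Python lists for readability; in TensorPlay the parameters would be P10 tensors with TPX gradient tracking, and the class name and fields here are illustrative only.

```python
# Hypothetical blueprint of an NN-layer Linear module: it owns parameters
# and composes basic operations; the math is y[j] = sum_i x[i]*W[i][j] + b[j].
import random

class Linear:
    def __init__(self, in_features, out_features):
        # Small random weights and zero biases, as a typical initialization.
        self.weight = [[random.gauss(0.0, 0.1) for _ in range(out_features)]
                       for _ in range(in_features)]
        self.bias = [0.0] * out_features

    def __call__(self, x):
        # zip(*self.weight) iterates over columns of W, one per output unit.
        return [sum(xi * wij for xi, wij in zip(x, col)) + bj
                for col, bj in zip(zip(*self.weight), self.bias)]

layer = Linear(3, 2)
out = layer([1.0, 2.0, 3.0])  # a 2-element output vector
```

Because the layer is just parameters plus a forward computation expressed through public tensor operations, it stays an optional convenience on top of P10 and TPX rather than a hard dependency of the core.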

Dependency Graph

```mermaid
graph TD
    NN[NN: Neural Networks] --> TPX[TPX: Autograd]
    NN --> P10[P10: Computation]
    TPX --> P10
    Stax[Stax: Static Graph] --> P10

    style P10 fill:#f9f,stroke:#333,stroke-width:2px
    style TPX fill:#bbf,stroke:#333,stroke-width:1px
    style Stax fill:#bfb,stroke:#333,stroke-width:1px
    style NN fill:#fbb,stroke:#333,stroke-width:1px
```

Why This Architecture?

  1. Decoupling: Each library can be developed, tested, and optimized independently.
  2. Extensibility: Adding a new hardware backend only requires changes to P10. Adding a new differentiation mode only affects TPX.
  3. Performance: Users only pay for what they use. Pure computation tasks don't load the autograd engine or the static graph compiler.
  4. Educational Value: By separating these concerns, TensorPlay makes it easy for developers to understand the internals of a modern deep learning framework.

Released under the Apache 2.0 License.