# Architecture Overview
TensorPlay is designed with a strictly decoupled, layered architecture. It consists of four core libraries with clear boundaries and unidirectional dependencies.
## The 4 Core Libraries
### 1. P10 (Tensor Computation Engine)
- Role: The foundational "Calculation Engine".
- Design Philosophy:
  - Hardware Abstraction: Uses a `Tensor` interface and `TensorImpl` polymorphism to support multiple hardware backends (CPU, CUDA, custom edge chips) without changing the frontend API.
  - Zero Differentiation Logic: Focuses purely on efficient tensor kernels and memory management, serving as the stable bedrock for all other layers.
  - Dispatcher Pattern: Decouples operator definitions from device-specific implementations, allowing easy integration of libraries like MKL or cuDNN.
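The `Tensor`/`TensorImpl` split above can be sketched in a few lines. This is a minimal illustration, not TensorPlay's actual API: the class names follow the text, but `CpuTensorImpl` and its naive `add` kernel are hypothetical stand-ins for real backend code.

```python
from abc import ABC, abstractmethod

class TensorImpl(ABC):
    """Backend-specific storage and kernels (hypothetical sketch)."""
    @abstractmethod
    def add(self, other: "TensorImpl") -> "TensorImpl": ...

class CpuTensorImpl(TensorImpl):
    def __init__(self, data):
        self.data = list(data)
    def add(self, other):
        # A naive elementwise CPU kernel; a real engine would call MKL, etc.
        return CpuTensorImpl(a + b for a, b in zip(self.data, other.data))

class Tensor:
    """Frontend API: stable no matter which backend sits underneath."""
    def __init__(self, impl: TensorImpl):
        self.impl = impl
    def __add__(self, other: "Tensor") -> "Tensor":
        return Tensor(self.impl.add(other.impl))

x = Tensor(CpuTensorImpl([1.0, 2.0]))
y = Tensor(CpuTensorImpl([3.0, 4.0]))
z = x + y
print(z.impl.data)  # [4.0, 6.0]
```

A CUDA or edge-chip backend would add another `TensorImpl` subclass; `Tensor` itself never changes.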
### 2. TPX (Autograd Engine)
- Role: The "Differentiation Layer".
- Design Philosophy:
  - Decoupled Autograd: Implemented as a lightweight extension layer rather than being baked into the tensor core. It only tracks operations when `requires_grad=True`.
  - Explicit Graph Building: Designed for educational clarity, allowing users to inspect how the computational graph is constructed and traversed during the backward pass.
  - Pluggable Engine: Can be replaced or extended with different differentiation modes (e.g., higher-order derivatives) without affecting the underlying P10 engine.
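A scalar sketch of this tracking behavior, with hypothetical names (`Var` is not TPX's real class): the graph is only recorded when a gradient is actually requested, and `backward` walks it in reverse topological order.

```python
class Var:
    """Toy scalar autograd variable (illustrative only)."""
    def __init__(self, value, requires_grad=False):
        self.value = value
        self.requires_grad = requires_grad
        self.grad = 0.0
        self._backward = lambda: None
        self._parents = ()

    def __mul__(self, other):
        out = Var(self.value * other.value,
                  requires_grad=self.requires_grad or other.requires_grad)
        if out.requires_grad:  # graph is only built when gradients are needed
            out._parents = (self, other)
            def _backward():
                self.grad += other.value * out.grad
                other.grad += self.value * out.grad
            out._backward = _backward
        return out

    def backward(self):
        # Reverse topological traversal of the recorded graph
        topo, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                topo.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x = Var(3.0, requires_grad=True)
y = Var(4.0)
z = x * y
z.backward()
print(x.grad)  # 4.0
```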
### 3. Stax (Static Graph Accelerator)
- Role: The "Optimization Layer".
- Design Philosophy:
  - Optimization-First: Focuses purely on static graph capture, operator fusion, and just-in-time (JIT) compilation to minimize Python overhead.
  - Independent Path: Operates on a separate dependency chain from TPX/NN, making it a modular component that can be added or removed based on performance needs.
  - Compiler Integration: Designed to interface with advanced compiler backends like MLIR or TVM in the future.
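Graph capture and operator fusion can be illustrated with a toy tracer. Everything here is a simplified assumption, not Stax's real machinery: a function is traced into a list of symbolic ops, then a `mul` feeding an `add` is fused into a single `muladd` step before execution.

```python
def trace(fn, n_inputs):
    """Record ops symbolically instead of executing eagerly."""
    ops, counter = [], [n_inputs]
    def sym(op, a, b):
        out = counter[0]; counter[0] += 1
        ops.append((op, a, b, out))
        return out
    class S:  # symbolic handle standing in for a tensor
        def __init__(self, i): self.i = i
        def __add__(self, o): return S(sym("add", self.i, o.i))
        def __mul__(self, o): return S(sym("mul", self.i, o.i))
    result = fn(*[S(i) for i in range(n_inputs)])
    return ops, result.i, counter[0]

def fuse(ops):
    """Fuse a mul whose result feeds the next add into one 'muladd' op."""
    fused, i = [], 0
    while i < len(ops):
        if (i + 1 < len(ops) and ops[i][0] == "mul"
                and ops[i + 1][0] == "add" and ops[i + 1][1] == ops[i][3]):
            _, a, b, _ = ops[i]
            _, _, c, out2 = ops[i + 1]
            fused.append(("muladd", (a, b, c), out2))
            i += 2
        else:
            op, a, b, out = ops[i]
            fused.append((op, (a, b), out))
            i += 1
    return fused

def run(fused_ops, out_slot, n_slots, *inputs):
    env = list(inputs) + [None] * (n_slots - len(inputs))
    for op, args, out in fused_ops:
        if op == "add":
            env[out] = env[args[0]] + env[args[1]]
        elif op == "mul":
            env[out] = env[args[0]] * env[args[1]]
        elif op == "muladd":  # one fused step instead of two dispatches
            env[out] = env[args[0]] * env[args[1]] + env[args[2]]
    return env[out_slot]

ops, out_slot, n_slots = trace(lambda x, y, z: x * y + z, 3)
fused = fuse(ops)
print(run(fused, out_slot, n_slots, 2.0, 3.0, 4.0))  # 10.0
```

Because the capture path never touches TPX or NN, it can be dropped entirely when JIT acceleration isn't needed.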
### 4. NN (Neural Network Library)
- Role: The "Business Layer".
- Design Philosophy:
  - User-Friendly Abstraction: Provides a familiar, PyTorch-compatible interface for high-level components like `Linear`, `Conv2d`, and optimizers.
  - Blueprint Approach: Every layer is designed to be a clear, readable blueprint, demonstrating how complex neural network components are built from basic tensor operations.
  - Pure Dependency: Relies strictly on the public APIs of P10 and TPX, ensuring it remains an optional, non-intrusive layer for high-level modeling.
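The "blueprint" idea is that a layer like `Linear` should read as nothing more than the math it performs. The sketch below uses plain Python lists so that math stays fully visible; TensorPlay's actual `Linear` would build on P10 tensors instead.

```python
import random

class Linear:
    """Blueprint linear layer: y = x @ W + b (illustrative, not the real API)."""
    def __init__(self, in_features, out_features):
        # Small random weights; a real layer would use a principled init scheme
        self.W = [[random.uniform(-0.1, 0.1) for _ in range(out_features)]
                  for _ in range(in_features)]
        self.b = [0.0] * out_features

    def __call__(self, x):
        # One matrix-vector product plus a bias add -- nothing more
        return [sum(x[i] * self.W[i][j] for i in range(len(x))) + self.b[j]
                for j in range(len(self.b))]

layer = Linear(3, 2)
out = layer([1.0, 2.0, 3.0])
print(len(out))  # 2
```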
## Dependency Graph
```mermaid
graph TD
    NN[NN: Neural Networks] --> TPX[TPX: Autograd]
    NN --> P10[P10: Computation]
    TPX --> P10
    Stax[Stax: Static Graph] --> P10
    style P10 fill:#f9f,stroke:#333,stroke-width:2px
    style TPX fill:#bbf,stroke:#333,stroke-width:1px
    style Stax fill:#bfb,stroke:#333,stroke-width:1px
    style NN fill:#fbb,stroke:#333,stroke-width:1px
```

## Why This Architecture?
- Decoupling: Each library can be developed, tested, and optimized independently.
- Extensibility: Adding a new hardware backend only requires changes to P10. Adding a new differentiation mode only affects TPX.
- Performance: Users only pay for what they use. Pure computation tasks don't load the autograd engine or the static graph compiler.
- Educational Value: By separating these concerns, TensorPlay makes it easy for developers to understand the internals of a modern deep learning framework.
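The extensibility claim, that a new backend touches only P10, follows from the dispatcher pattern. A hedged sketch of the idea, with a hypothetical registry (`KERNELS`, `register`, `dispatch` are illustrative names, not TensorPlay's real functions):

```python
# Toy dispatcher: the frontend looks up (op, device) at call time,
# so supporting a new device is purely additive.
KERNELS = {}

def register(op, device):
    def deco(fn):
        KERNELS[(op, device)] = fn
        return fn
    return deco

def dispatch(op, device, *args):
    return KERNELS[(op, device)](*args)

@register("add", "cpu")
def add_cpu(a, b):
    return [x + y for x, y in zip(a, b)]

# Adding an edge-chip backend is one new registration; no frontend changes
@register("add", "my_npu")
def add_my_npu(a, b):
    # Pretend this calls a vendor kernel; here it just reuses the CPU path
    return add_cpu(a, b)

print(dispatch("add", "my_npu", [1, 2], [3, 4]))  # [4, 6]
```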
