Sematic is designed to maximize the productivity of ML Engineers and their teams.
Visualizations, metrics, logs, exceptions, infrastructure traces, and more are surfaced in real time in the dashboard UI, making it as efficient as possible to iterate on complex Machine Learning pipelines.
Here's a sample of Sematic's best features.
Sematic enables iterating on pipelines locally for easier development and debugging. Pipelines can run against a local or deployed metadata server. The same code can then run at larger scale on a Kubernetes cluster.
Sematic can orchestrate large-scale pipelines on Kubernetes clusters. Users can specify required resources (GPUs, CPUs, memory, etc.) on a per-function basis and leverage multiple nodes for distributed compute.
The Sematic web Dashboard lets users monitor pipelines, visualize artifacts and metrics, investigate failures, and collaborate with teammates. Pipelines can be replayed from the dashboard and results can be shared easily.
Sematic's lightweight Python SDK makes it extremely easy to convert arbitrary business logic into an orchestrated pipeline. In Sematic, everything is Python-centric: business logic, DAG definition, resource requirements, visualizations, etc.
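Conceptually, a Python-centric SDK like this works by wrapping each function in a decorator that, when called, records a node in a DAG and returns a future instead of executing immediately; the framework then resolves the graph at run time. The following is a minimal, self-contained sketch of that pattern only — the `func`, `Future`, and `resolve` names here are illustrative, not Sematic's actual API or internals:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Future:
    """A node in the DAG: a function plus its (possibly future) arguments."""
    fn: Callable
    args: tuple

    def resolve(self) -> Any:
        # Recursively resolve upstream futures first (the DAG edges),
        # then execute this node's business logic.
        resolved = tuple(
            a.resolve() if isinstance(a, Future) else a for a in self.args
        )
        return self.fn(*resolved)

def func(fn: Callable) -> Callable:
    # Calling a decorated function builds a graph node instead of running it.
    def wrapper(*args):
        return Future(fn, args)
    return wrapper

@func
def add(a: int, b: int) -> int:
    return a + b

@func
def square(x: int) -> int:
    return x * x

# Composing decorated calls defines the DAG; nothing runs yet.
graph = square(add(1, 2))
print(graph.resolve())  # → 9
```

The key property is that plain business logic stays plain: the decorator is the only orchestration-specific code the user writes.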
The Sematic CLI lets users run pipelines locally, submit them to run on their cluster, list resources, and configure Sematic.
Sematic persists and tracks all assets pertaining to pipeline executions: code, configuration, resources used, and the inputs and outputs of every function. Sematic keeps a source of truth for all runs, enabling traceability and reproducibility.
Sematic displays all produced plots, images, configurations, dataframes, etc. in the web dashboard. Visualizations can be customized from the pipeline's Python code and then shared with teammates from the dashboard.
At runtime, Sematic packages pipeline code and its dependencies (user code, Python dependencies, static libraries, hardware drivers, etc.) and ships them to the Kubernetes cluster. This ensures the fastest possible iteration loop to visualize results at scale.
Sematic lets users log timeseries metrics from ongoing jobs and visualize them in real time in the dashboard (e.g. loss curves, learning rates, etc.). This provides greater visibility into workloads for optimization and early stopping.
Sematic surfaces workload container logs directly in the dashboard to accelerate debugging and increase observability.
Each function in a Sematic pipeline can request custom resources (GPUs, CPUs, memory, Ray clusters, etc.). Sematic will dynamically allocate these resources at runtime and run the corresponding workloads.
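The general pattern behind per-function resource requests is to attach a resource specification to each function as metadata, which the orchestrator reads at submission time and translates into a Kubernetes pod spec. A generic sketch of that pattern — the `Resources` and `requires` names below are hypothetical, not Sematic's actual classes:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Resources:
    """Hypothetical per-function resource request (Kubernetes-style units)."""
    cpu: str = "1"
    memory: str = "2Gi"
    gpus: int = 0

def requires(resources: Resources) -> Callable:
    # Attach the request to the function; an orchestrator can read it at
    # submission time and turn it into container resource requests/limits.
    def decorator(fn: Callable) -> Callable:
        fn.resources = resources
        return fn
    return decorator

@requires(Resources(cpu="8", memory="32Gi", gpus=1))
def train_model(data_path: str) -> str:
    # The function body stays plain Python; only the decorator mentions infra.
    return f"model trained on {data_path}"

print(train_model.resources.gpus)  # → 1
```

Keeping the request on the function (rather than on the whole pipeline) is what lets a cheap preprocessing step and an expensive GPU training step share one DAG without over-provisioning either.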
Sematic scales with the underlying Kubernetes cluster on which it is deployed, enabling access to a large variety of VM types, GPUs, and hardware profiles. Workloads can also scale horizontally thanks to Sematic's Ray integration.
Sematic integrates with Ray to let workloads spin Ray clusters up and down at runtime with only a few lines of code. This enables parallelized data processing and distributed training.
Sematic enables caching pipeline steps whose inputs are unchanged between runs. This can greatly accelerate development workflows and debugging sessions, and dramatically reduces resource usage and costs.
Sematic enables fault-tolerant pipelines by catching transient failures and retrying workloads, optimizing resource usage and costs. Never let a network failure crash a workload.
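Retry-on-transient-failure typically means declaring which exception types count as transient and how many attempts to allow; only those exceptions trigger a re-run, and anything else fails fast. A generic sketch of that mechanism — the `retry` decorator below is illustrative, not Sematic's actual API:

```python
import time

def retry(exceptions: tuple, attempts: int = 3, delay: float = 0.0):
    """Re-run a step on declared transient exceptions, up to `attempts` times."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # out of retries: surface the failure
                    time.sleep(delay)  # optional backoff between attempts
        return wrapper
    return decorator

failures = {"count": 0}

@retry(exceptions=(ConnectionError,), attempts=3)
def fetch_dataset() -> str:
    # Fails twice with a transient error, then succeeds on the third attempt.
    if failures["count"] < 2:
        failures["count"] += 1
        raise ConnectionError("transient network blip")
    return "dataset"

print(fetch_dataset())  # → dataset
```

Scoping retries to specific exception types matters: a genuine bug should surface immediately rather than burn three GPU-hours retrying.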
Sematic enables sharing results and collaborating with team members through run-specific notes and tags for better organization of workloads.