Deterministic Kubernetes Autoscaling

Expose the real bottleneck before you scale the wrong service.

ThriveScale is built to move beyond threshold-only autoscaling. It combines service-level QoS, runtime dependency awareness, and low-level kernel evidence to determine whether a microservice itself should scale, or whether the observed delay actually originates in a downstream dependency.

3 core research gaps addressed in the current scope
2s target control interval for near-real-time decisions
eBPF kernel observability without mandatory service mesh dependency
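The 2 s control interval above implies a fixed-cadence decision loop. The sketch below illustrates one way such a loop could keep its tick rate near the target even when a decision step is slow; `run_control_loop`, `evaluate`, and `max_ticks` are hypothetical names for illustration, not ThriveScale's actual API:

```python
import time

CONTROL_INTERVAL_S = 2.0  # target control interval from the current scope

def run_control_loop(evaluate, interval_s=CONTROL_INTERVAL_S, max_ticks=3):
    """Invoke the decision step roughly once per interval.

    `evaluate` stands in for the real decision step; `max_ticks` bounds
    the loop so the sketch terminates.
    """
    decisions = []
    for _ in range(max_ticks):
        start = time.monotonic()
        decisions.append(evaluate())
        # Sleep only for the remainder of the interval, so evaluation
        # time does not stretch the tick cadence past the target.
        elapsed = time.monotonic() - start
        time.sleep(max(0.0, interval_s - elapsed))
    return decisions
```

Subtracting the evaluation time before sleeping is what keeps the cadence near-real-time rather than drifting by the cost of each decision.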

The Problem Space

ThriveScale focuses on microservice autoscaling when latency is user-visible, but the root cause is not obvious from coarse infrastructure signals alone.

Threshold-only scaling can be misleading

CPU spikes, low-level scheduler noise, or short bursts do not always mean a service should scale. Scaling the wrong component wastes replicas and still leaves the user-facing latency problem unresolved.

Dependency delay is easy to misread

A service may violate its SLO because it is waiting on another service, a datastore, or an external dependency. In that case, scaling the root service alone can make the system noisier rather than healthier.
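The local-versus-dependency distinction above can be made concrete with a small latency-attribution check. This is a minimal sketch; `attribute_latency` and its parameter names are illustrative, not ThriveScale's actual interface:

```python
def attribute_latency(total_p90_ms, dependency_wait_ms, slo_ms):
    """Classify an SLO violation as locally caused or dependency-dominated.

    total_p90_ms      -- observed tail latency of the service
    dependency_wait_ms -- portion of that latency spent waiting downstream
    slo_ms            -- the SLO target for the service
    """
    if total_p90_ms <= slo_ms:
        return "healthy"
    local_ms = total_p90_ms - dependency_wait_ms
    # If most of the observed latency is spent waiting on a dependency,
    # scaling this service alone is unlikely to fix the violation.
    if dependency_wait_ms > local_ms:
        return "dependency-dominated"
    return "local"
```

A "dependency-dominated" result is exactly the case where adding replicas to the root service would waste resources while the user-facing latency persists.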

Operators need explainability

If a framework scales automatically, operators need to know why. ThriveScale keeps decision traces, topology context, and bottleneck hints visible so the scaling outcome can be reviewed rather than guessed.

Research Gaps Addressed

The current implementation scope is organized around the three gaps identified in the thesis and reflected in the live system.

Gap 1: External-tool dependence for QoS and dependency visibility

ThriveScale collects kernel-side runtime evidence and service-level truth without requiring a service mesh as a mandatory part of the core decision path.

Gap 2: Difficulty identifying the true bottleneck

The framework combines QoS pressure, throughput context, dependency structure, service handling delay, dependency delay, and run queue evidence to decide whether scaling is locally useful.
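The combination of signals described above can be sketched as a single deterministic predicate. The structure and thresholds below are assumptions for illustration only, not ThriveScale's actual decision logic:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    p90_ms: float              # service-level QoS signal
    slo_ms: float              # SLO target
    handling_ms: float         # time spent inside the service itself
    dependency_wait_ms: float  # time spent waiting on downstream services
    runq_pressure: bool        # kernel-side run queue evidence (e.g. via eBPF)

def should_scale_up(ev: Evidence) -> bool:
    """Scale only when the SLO is violated, the delay is mostly local,
    and kernel evidence confirms CPU contention on this service."""
    slo_violated = ev.p90_ms > ev.slo_ms
    locally_bound = ev.handling_ms >= ev.dependency_wait_ms
    return slo_violated and locally_bound and ev.runq_pressure
```

Requiring all three conditions is what makes scaling "locally useful": any one signal alone (a CPU spike, a latency blip, a noisy run queue) is not enough to trigger action.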

Gap 3: Lack of deterministic and explainable autoscaling

Instead of requiring machine-learning, deep-learning, or reinforcement-learning model training at runtime, ThriveScale uses explicit deterministic decision logic, cooldown control, and stored decision traces that operators can inspect directly.
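Cooldown control and stored decision traces can be sketched together as a small gate that suppresses repeat actions inside a window and records every decision for later review. Class and field names here are hypothetical, not ThriveScale's actual implementation:

```python
import time

class CooldownGate:
    """Block repeat scaling actions inside a cooldown window and keep a
    decision trace that an operator can inspect afterwards."""

    def __init__(self, cooldown_s: float, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable clock for testability
        self.last_action_at = None
        self.trace = []             # stored decision trace

    def request(self, action: str) -> bool:
        now = self.clock()
        in_cooldown = (
            self.last_action_at is not None
            and now - self.last_action_at < self.cooldown_s
        )
        allowed = not in_cooldown
        # Record every request, allowed or not, so the outcome can be
        # reviewed rather than guessed.
        self.trace.append({"t": now, "action": action, "allowed": allowed})
        if allowed:
            self.last_action_at = now
        return allowed
```

Because the gate is plain deterministic code, the same evidence always yields the same decision, and the trace explains each suppression without any trained model.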

What You Can Do In The Dashboard

The live dashboard is the operator surface for monitoring, explanation, control, and support workflows in the current implementation.

Observe Service Health

View P90 latency, SLO target, throughput, service handling latency, dependency delay, external wait, run queue behavior, health state, and bottleneck hints for each service.
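P90 latency is one concrete example from the list above. A minimal nearest-rank percentile sketch is shown below; ThriveScale's actual aggregation may use a different estimator or windowing:

```python
def p90(latencies_ms):
    """P90 over a window of latency samples, nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no samples in window")
    ordered = sorted(latencies_ms)
    # nearest-rank: ceil(0.9 * n), converted to a 0-based index
    rank = -(-len(ordered) * 9 // 10)
    return ordered[rank - 1]
```

Tail percentiles like P90 surface user-visible latency that an average would hide, which is why they sit next to the SLO target in the health view.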

See Runtime Dependencies

Inspect the live dependency map to understand how traffic flows and which components are likely contributing to downstream latency.

Control The System

Start or stop controlled traffic, update SLO settings, scale a deployment to minimum, or set replicas manually when validation or intervention is needed.
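Manual replica changes still have to respect the configured bounds for a service. A trivial sketch of that clamping, with `clamp_replicas` as an illustrative name rather than ThriveScale's actual helper:

```python
def clamp_replicas(requested: int, min_replicas: int, max_replicas: int) -> int:
    """Keep a manual or automatic replica target inside configured bounds."""
    return max(min_replicas, min(requested, max_replicas))
```

Clamping at the boundary means an operator can type any value in the dashboard without accidentally scaling to zero or past the safe maximum.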

Review Evidence

Inspect alerts, decision traces, audit events, and support tickets so the autoscaler remains understandable, reviewable, and operationally usable.

How To Use It

  1. Open the dashboard and confirm the target services and their current health state.
  2. Review or update the SLO and replica bounds for the selected service.
  3. Start controlled traffic when you want to exercise the autoscaler under load.
  4. Watch the dependency map, alerts, and trace log to understand what the controller is seeing.
  5. Use manual controls or the Support Desk when validation, intervention, or assistance is needed.

Why The Dashboard Matters

The dashboard is not only a visual layer. It is how ThriveScale exposes the reasoning behind scaling decisions. Operators can see when latency is local, when a dependency is dominating, and when evidence is too weak for safe action.

Ready to explore the live control plane?

Open the live dashboard to view metrics, dependency relationships, alerts, decision traces, manual controls, and the built-in Support Desk.

Contact Us To Set Up ThriveScale

If you want to install, validate, or tune ThriveScale for your own cluster, start with the built-in support workflow and the deployment guidance already provided with the project.

Recommended Setup Path

1. Use the deployment guide to bring up ThriveScale and the target microservice environment.
2. Open the dashboard and use the Support Desk panel to create a setup or support ticket.
3. Review alerts, traces, and audit events as the system is tuned for your workload.