Karma: Adaptive Video Streaming
via Causal Sequence Modeling


Existing schemes decide via direct observation-to-action mapping.


Karma's core idea: causal sequence modeling

Abstract

Optimal adaptive bitrate (ABR) decisions depend on a comprehensive characterization of state transitions that involve interrelated modalities over time, including environmental observations, returns, and actions. However, state-of-the-art learning-based ABR algorithms rely solely on past observations to decide the next action. This paradigm tends to cause a chain of deviations from the optimal action when unfamiliar observations are encountered, which in turn undermines model generalization.
This paper presents Karma, an ABR algorithm that uses causal sequence modeling to improve generalization by comprehending the interrelated causality among past observations, returns, and actions, and by promptly refining the action when a deviation occurs. Unlike direct observation-to-action mapping, Karma recurrently maintains a multi-dimensional time series of observations, returns, and actions as input and employs causal sequence modeling via a decision transformer to determine the next action. In the input sequence, Karma uses the maximum cumulative future quality of experience (QoE), a.k.a. QoE-to-go, as an extended return signal, which is periodically estimated based on current network conditions and playback status. We evaluate Karma through trace-driven simulations and real-world field tests, demonstrating superior performance compared to existing state-of-the-art ABR algorithms, with an average QoE improvement ranging from 10.8% to 18.7% across diverse network conditions. Furthermore, Karma exhibits strong generalization, delivering leading performance under unseen networks in both simulations and real-world tests.
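To make the QoE-to-go return signal concrete, here is a minimal sketch (not Karma's implementation) of the oracle quantity: the cumulative QoE from each chunk to the end of a finished session. Since this value is unknown while streaming, Karma replaces it at inference time with a learned estimate; the function below is what that estimator is meant to approximate.

```python
def qoe_to_go(per_chunk_qoe):
    """Cumulative future QoE from each chunk onward (oracle version).

    The true QoE-to-go is only known once the streaming session ends,
    so it can be computed exactly only in hindsight, as done here by
    a reverse cumulative sum over per-chunk QoE values.
    """
    togo = [0.0] * len(per_chunk_qoe)
    acc = 0.0
    for t in reversed(range(len(per_chunk_qoe))):
        acc += per_chunk_qoe[t]
        togo[t] = acc
    return togo

# For a finished 4-chunk session, QoE-to-go at chunk 0 is the whole
# session's QoE; at the last chunk it is just that chunk's QoE:
session = qoe_to_go([1.0, 0.5, 0.25, 0.25])  # [2.0, 1.0, 0.5, 0.25]
```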

Training

To train Karma, we must provide a set of extended expert trajectories as training samples, each containing a tuple of observations, the corresponding QoE-to-go, and the optimal action. To this end, we introduce a simulator that faithfully reproduces the video streaming environment, which greatly accelerates the production of extended expert trajectories. Since the true QoE-to-go is unavailable until a streaming session ends, we first train a QoE-to-go estimator on synthetic network traces to recurrently generate the estimated QoE-to-go modality of each extended expert trajectory from current observations. We then run a dynamic programming (DP) algorithm as the ABR method on real traces to generate the observation and corresponding action modalities of each extended expert trajectory. Finally, a causal decision transformer is trained on these extended expert trajectories.
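The DP expert step above can be sketched as follows. This is an illustrative toy, not Karma's actual algorithm: the bitrate ladder, penalty weights, chunk length, and per-chunk download times are made up, and a real run would use recorded traces and the paper's QoE definition. Because DP runs offline, it may assume the full trace (and hence every download time) is known in advance.

```python
from functools import lru_cache

# Hypothetical setup: 4-level bitrate ladder (Mbps), 4 s chunks, and
# DOWNLOAD_TIME[t][a] = seconds to fetch chunk t at bitrate index a,
# known in advance because the DP expert runs offline over a trace.
BITRATES = [0.3, 0.75, 1.2, 2.85]
CHUNK_SEC = 4.0
DOWNLOAD_TIME = [
    [1.0, 2.0, 3.0, 6.0],
    [1.0, 1.5, 2.5, 5.0],
    [0.5, 1.0, 2.0, 4.0],
]
REBUF_PENALTY, SMOOTH_PENALTY = 4.3, 1.0  # illustrative weights

def chunk_qoe(a, prev_a, rebuf):
    """Per-chunk QoE: quality minus rebuffering and smoothness penalties."""
    q = BITRATES[a]
    return q - REBUF_PENALTY * rebuf - SMOOTH_PENALTY * abs(q - BITRATES[prev_a])

@lru_cache(maxsize=None)
def best(t, buf_tenths, prev_a):
    """Max total QoE from chunk t onward, plus the optimal action plan.

    The playback buffer is discretized to 0.1 s so states are hashable
    and the memoized search stays finite.
    """
    if t == len(DOWNLOAD_TIME):
        return 0.0, ()
    best_val, best_plan = float("-inf"), ()
    for a in range(len(BITRATES)):
        buf = buf_tenths / 10.0
        dt = DOWNLOAD_TIME[t][a]
        rebuf = max(0.0, dt - buf)                            # stall time
        new_buf = min(max(buf - dt, 0.0) + CHUNK_SEC, 30.0)   # 30 s cap
        val, plan = best(t + 1, round(new_buf * 10), a)
        val += chunk_qoe(a, prev_a, rebuf)
        if val > best_val:
            best_val, best_plan = val, (a,) + plan
    return best_val, best_plan

# Optimal bitrate sequence from an 8 s start buffer, lowest prior bitrate:
total, actions = best(0, 80, 0)
```

The returned `actions` tuple is what would populate the action modality of an extended expert trajectory, with the simulator replaying it to log the matching observations.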

Inference

To select a bitrate for video chunk t, Karma takes as input a multi-dimensional sequence of observations, estimated QoE-to-go values, and actions for the past K chunks, comprising 3K-1 tokens in total (excluding the token a_t, which is to be predicted). These tokens are combined with their timesteps for positional encoding. Finally, Karma passes the positionally encoded tokens through the trained causal decision transformer to predict the bitrate a_t for video chunk t.
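The token-counting logic above can be sketched as follows. This is an assumption-laden illustration, not Karma's code: the per-timestep ordering (return, observation, action) follows the decision-transformer convention and may differ from Karma's, and each token is paired with a chunk-relative timestep that a real model would map to a learned positional encoding.

```python
def build_input_tokens(observations, qoe_to_go, actions, K):
    """Interleave the three modalities for the past K chunks.

    observations and qoe_to_go each cover K chunks; actions holds only
    the K-1 already-chosen bitrates, since the current action a_t is
    what the causal decision transformer must predict. The result is
    therefore 3K-1 (modality, timestep, value) tokens.
    """
    assert len(observations) == K and len(qoe_to_go) == K
    assert len(actions) == K - 1
    tokens = []
    for i in range(K):
        tokens.append(("qoe_to_go", i, qoe_to_go[i]))
        tokens.append(("observation", i, observations[i]))
        if i < K - 1:  # the slot for a_t is left for the model to fill
            tokens.append(("action", i, actions[i]))
    return tokens

# With K = 4 past chunks, the model sees 3*4 - 1 = 11 tokens, ending
# at the newest observation, from which it predicts a_t:
tokens = build_input_tokens(["o0", "o1", "o2", "o3"],
                            [4.0, 3.0, 2.0, 1.0], [0, 1, 2], 4)
```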

Experiments

Comparing Karma with existing ABR algorithms on FCC network traces.

Comparing Karma with existing ABR algorithms on Norway network traces.

Comparing Karma with existing ABR algorithms on unseen Oboe network traces.

Citation

Acknowledgements

The website template was borrowed from Michaël Gharbi.