Karma: Adaptive Video Streaming
via Causal Sequence Modeling
Existing schemes: direct observation-to-action mapping.
Karma's core idea: causal sequence modeling
Abstract
Optimal adaptive bitrate (ABR) decisions depend on a comprehensive characterization of state
transitions that involve interrelated modalities over time, including environmental observations,
returns, and actions. However, state-of-the-art learning-based ABR algorithms rely solely on past
observations to decide the next action. This paradigm tends to cause a chain of deviations from
the optimal action when unfamiliar observations are encountered, which in turn undermines model
generalization.
This paper presents Karma, an ABR algorithm that uses causal sequence modeling to improve
generalization by capturing the interrelated causality among past observations, returns, and
actions, and by promptly refining actions when deviations occur. Unlike direct
observation-to-action mapping, Karma recurrently maintains a multi-dimensional time series of
observations, returns, and actions as input and employs causal sequence modeling via a decision
transformer to determine the next action. In the input sequence, Karma uses the maximum
cumulative future quality of experience (QoE), a.k.a. QoE-to-go, as an extended return signal,
which is periodically estimated from current network conditions and playback status. We evaluate
Karma through trace-driven simulations and real-world field tests, demonstrating superior
performance over existing state-of-the-art ABR algorithms, with average QoE improvements ranging
from 10.8% to 18.7% across diverse network conditions. Furthermore, Karma exhibits strong
generalization, showing leading performance under unseen networks in both simulations and
real-world tests.
Training
To train Karma, we must provide a set of extended expert trajectories as training samples, each containing tuples of observations, the corresponding QoE-to-go, and the optimal action. To this end, we introduce a simulator that faithfully reproduces the video streaming environment, which greatly accelerates the production of extended expert trajectories. Because the accurate QoE-to-go is unavailable until a video streaming session ends, we first train a QoE-to-go estimator on synthetic network traces to recurrently generate the estimated QoE-to-go modality of each extended expert trajectory from current observations. We then use a dynamic programming (DP) algorithm as the ABR expert on real traces to generate the observation and action modalities of the extended expert trajectories. Finally, a causal decision transformer is trained on these trajectories.
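The ground-truth QoE-to-go for a finished session is simply the cumulative future per-chunk QoE. As a minimal sketch (function names are illustrative, not Karma's actual code), the labeling step and the packing of one extended expert trajectory might look like:

```python
import numpy as np

def qoe_to_go(per_chunk_qoe):
    """Ground-truth QoE-to-go: qoe_to_go[t] = sum of per_chunk_qoe[t:],
    i.e. the QoE remaining from chunk t to the end of the session.
    Only computable retrospectively, once the session has ended."""
    return np.cumsum(np.asarray(per_chunk_qoe, dtype=float)[::-1])[::-1]

def build_trajectory(observations, per_chunk_qoe, actions):
    """Pack one extended expert trajectory as per-chunk
    (observation, QoE-to-go, action) tuples for training."""
    rtg = qoe_to_go(per_chunk_qoe)
    return [(o, g, a) for o, g, a in zip(observations, rtg, actions)]
```

Labels of this form would supervise the QoE-to-go estimator, while the DP expert supplies the paired observations and actions.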
Inference
To select a bitrate for video chunk t, Karma first assembles a multi-dimensional sequence of observations, estimated QoE-to-go values, and actions for the past K chunks as input, which comprises 3K-1 tokens in total (the a_t token to be predicted is not included). These tokens are then combined with their timesteps for positional encoding. Finally, Karma passes the positionally encoded tokens into the trained causal decision transformer to predict the bitrate a_t for video chunk t.
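The 3K-1 token count falls out of interleaving one (QoE-to-go, observation, action) triple per chunk and dropping the final action, which is the prediction target. A minimal sketch of this assembly step (the per-chunk token order follows the usual decision-transformer convention; helper names are illustrative):

```python
def build_input_tokens(observations, qoe_to_go, actions):
    """Interleave (QoE-to-go, observation, action) tokens for the past
    K chunks, omitting the final action a_t, which the model predicts.
    Returns 3K - 1 tokens plus each token's timestep, used for
    positional encoding."""
    K = len(observations)
    assert len(qoe_to_go) == K and len(actions) == K - 1
    tokens, timesteps = [], []
    for t in range(K):
        tokens.append(("qoe_to_go", qoe_to_go[t])); timesteps.append(t)
        tokens.append(("obs", observations[t])); timesteps.append(t)
        if t < K - 1:  # a_t of the current chunk is left for prediction
            tokens.append(("action", actions[t])); timesteps.append(t)
    return tokens, timesteps
```

Because all three tokens of a chunk share one timestep, the positional encoding ties each modality to its chunk rather than to its raw position in the flattened sequence.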
Experiments
Comparing Karma with existing ABR algorithms on FCC network traces.
Comparing Karma with existing ABR algorithms on Norway network traces.
Comparing Karma with existing ABR algorithms on unseen Oboe network traces.
Citation
Acknowledgements
The website template was borrowed from Michaël Gharbi.