Instagram: Designing a Seamless DASH ExoPlayer Streaming Experience: From Chunked Video Processing to Low-Bandwidth Playback [Practical Web+Android]

Shubham Kumar Gupta
11 min read · Mar 3, 2025

In today’s fast-paced mobile world, delivering smooth, adaptive video playback — especially under low-bandwidth conditions — is a real challenge. This project demonstrates an end-to-end solution inspired by Instagram’s reels, where videos start playing immediately with minimal data and progressively improve in quality as more segments download. The implementation spans a `NodeJS` backend for video processing and an Android app built with a clean MVVM architecture, using ExoPlayer and a custom caching manager.

Why this?

My manager and I were discussing how Instagram works. When I scroll, each video plays smoothly from the start, buffers after about 2 seconds, and then resumes from that point, so I can keep scrolling through several videos and every one of them starts smoothly. We concluded that Instagram may be splitting each video into chunks, downloading only the first chunk in low quality, and playing it. Once the user stays on a video, the app starts downloading the next chunks in high quality, and only the chunks it needs. While those high-quality chunks are downloading, the low-quality chunks of the other videos should be de-prioritized.

What We Will Be Building

So I wanted to mimic Instagram and build something of my own that behaves this way, with separate control over video and audio playback, and that also works under low-bandwidth network conditions. For example, why should an entire 15-second video be downloaded for a short if the user may not even watch it fully? Why not send only what the user needs or wants?

I downloaded a few videos from Instagram and split them into chunks of 3 seconds each, in several qualities: low, medium, and high. The part that has already been downloaded can also be played locally without an internet connection; once the app is killed, the downloaded chunks are removed. In short, the project:

  • Splits videos into short chunks (typically 3 seconds each) in multiple quality levels.
  • Extracts audio as a separate stream for synchronized playback.
  • Uses a DASH manifest to dynamically select the best quality based on available bandwidth.
  • Preloads the low-quality first chunk of audio and video for a quick start, and progressively downloads higher-quality segments if the user continues watching.

This approach minimizes wasted data, reduces startup delays, and ensures a responsive user experience on mobile networks.
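
To put a rough number on the data saved, here is a small back-of-the-envelope sketch. It uses the bitrates and duration from the manifest shown later in this article (353 kbps and 505 kbps video, 128 kbps audio, 14.815 seconds); the function names are made up for illustration, and the initialization segments and container overhead are ignored.

```kotlin
// Bitrates in bits per second, taken from the manifest shown later in this article.
val LOW_VIDEO_BPS = 353_000L
val HIGH_VIDEO_BPS = 505_000L
val AUDIO_BPS = 128_000L

/** Approximate payload size in bytes for a stream of [bitsPerSecond] lasting [durationMs]. */
fun estimateBytes(bitsPerSecond: Long, durationMs: Long): Long =
    bitsPerSecond * durationMs / 1000 / 8

/** Bytes needed to start playback: the first 3-second chunk of low-quality video plus audio. */
fun startupBytes(): Long = estimateBytes(LOW_VIDEO_BPS + AUDIO_BPS, 3_000)

/** Bytes needed to download the whole 14.815-second reel at the highest quality. */
fun fullHighQualityBytes(): Long = estimateBytes(HIGH_VIDEO_BPS + AUDIO_BPS, 14_815)
```

With these numbers, starting playback costs about 180 KB, while fetching the whole reel in high quality up front would cost about 1.17 MB, roughly 6.5 times more for a video the user may scroll past.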

Converting a Reel into DASH Segments

Instead of downloading an entire video (for example, a 15-second reel) at once, this solution processes the video file (e.g., reel.mp4) and converts it into various segmented files such as:

Video Segments:

  • segment_low_1.m4s, segment_low_2.m4s, ...
  • segment_medium_1.m4s, segment_medium_2.m4s, ...
  • segment_high_1.m4s, ...

Audio Segments:

  • segment_audio_1.m4s, ...

Initialization Files:

  • segment_init.mp4, segment_audio_init.mp4

Manifest File:

  • manifest.mpd (for DASH)

Backend Video Processing: From Raw Footage to DASH Segments

The server-side component is built with NodeJS and employs FFmpeg for transcoding. The workflow consists of several key steps:

  1. Multiple Quality Versions:
    The video is re-encoded into different quality levels — low, medium, and high — based on the original resolution and bitrate. By varying the bitrate, the system offers a tailored experience depending on the user’s network conditions.
  2. Audio Extraction:
    Audio is extracted into a separate stream, ensuring that it remains in sync with the video regardless of the selected quality. This separation also allows independent buffering strategies for audio and video.
  3. Segmenting the Video:
    With each quality version created, the video is split into smaller chunks (typically 3 seconds long). This segmentation is vital for DASH, as it allows the client to download only the required portions of the video.
  4. DASH Manifest Generation:
    Using MP4Box from the GPAC toolkit, a DASH manifest (MPD file) is generated. This manifest outlines how the video is divided across multiple quality representations and includes a dedicated adaptation set for audio. A simplified example of the manifest generation command is:
MP4Box -dash 3000 -frag 3000 -rap -segment-name segment_ -out "manifest.mpd" \
  "temp/video_low.mp4#video" \
  "temp/video_medium.mp4#video" \
  "temp/video_high.mp4#video" \
  "temp/video_audio.mp4#audio"

    The generated MPD file contains essential information such as segment durations, start numbers, and initialization segments, which guides the client in adaptive streaming. This structure supports multiple quality levels for video and keeps the audio separate yet synchronized, facilitating adaptive bitrate streaming.
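
If the MP4Box call is driven from code rather than typed by hand, the argument list can be assembled from the list of encoded files. The sketch below reproduces the command above; the article's backend is NodeJS, so Kotlin is used here only to keep the examples in one language, and `buildMp4BoxCommand` is a hypothetical helper name.

```kotlin
/**
 * Builds the MP4Box argument list from the command shown above.
 * [segmentMs] is passed to both -dash and -frag, as in the original command.
 */
fun buildMp4BoxCommand(
    videoInputs: List<String>,
    audioInput: String,
    segmentMs: Int = 3000,
    output: String = "manifest.mpd",
): List<String> = buildList {
    add("MP4Box")
    add("-dash"); add(segmentMs.toString())
    add("-frag"); add(segmentMs.toString())
    add("-rap")
    add("-segment-name"); add("segment_")
    add("-out"); add(output)
    // One "#video" input per quality level, then the audio-only input.
    videoInputs.forEach { add("$it#video") }
    add("$audioInput#audio")
}
```

The resulting list can be handed to `ProcessBuilder(command).inheritIO().start().waitFor()`, assuming MP4Box is installed and on the PATH.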

A typical generated DASH manifest (simplified) looks like this, with separate audio and video chunks:

<?xml version="1.0"?>
<!-- MPD file Generated with GPAC version 2.4-revrelease at 2025-03-02T22:47:19.998Z -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" minBufferTime="PT1.500S" type="static" mediaPresentationDuration="PT0H0M14.815S" maxSegmentDuration="PT0H0M3.024S" profiles="urn:mpeg:dash:profile:full:2011">
<ProgramInformation moreInformationURL="https://gpac.io">
<Title>manifest.mpd generated by GPAC</Title>
</ProgramInformation>

<Period duration="PT0H0M14.815S">
<AdaptationSet segmentAlignment="true" maxWidth="360" maxHeight="640" maxFrameRate="30" par="9:16" bitstreamSwitching="true" startWithSAP="1">
<SegmentTemplate media="segment_$Number$.m4s" initialization="segment_init.mp4" timescale="15360" startNumber="1" duration="46080"/>
<Representation id="1" mimeType="video/mp4" codecs="avc1.64001E" width="360" height="640" frameRate="30" sar="1:1" bandwidth="353000">
</Representation>
<Representation id="2" mimeType="video/mp4" codecs="avc1.64001E" width="360" height="640" frameRate="30" sar="1:1" bandwidth="404000">
<SegmentTemplate media="segment__r1_$Number$.m4s" timescale="15360" startNumber="1" duration="46080"/>
</Representation>
<Representation id="3" mimeType="video/mp4" codecs="avc1.64001E" width="360" height="640" frameRate="30" sar="1:1" bandwidth="505000">
<SegmentTemplate media="segment__r2_$Number$.m4s" timescale="15360" startNumber="1" duration="46080"/>
</Representation>
</AdaptationSet>
<AdaptationSet segmentAlignment="true" startWithSAP="1">
<SegmentTemplate media="segment__r3_$Number$.m4s" initialization="segment__r3_init.mp4" timescale="44100" startNumber="1" duration="132300"/>
<Representation id="4" mimeType="audio/mp4" codecs="mp4a.40.2" audioSamplingRate="44100" bandwidth="128000">
<AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"/>
</Representation>
</AdaptationSet>
</Period>
</MPD>
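
The `SegmentTemplate` attributes in this manifest can be checked with a little arithmetic: a segment lasts `duration / timescale` seconds, and each `$Number$` placeholder expands to consecutive integers from `startNumber`. The sketch below works that out for the values above; the `SegmentTemplate` data class and its helpers are illustrative names, not part of any library.

```kotlin
import kotlin.math.ceil

/** The SegmentTemplate attributes used by the manifest above. */
data class SegmentTemplate(
    val media: String,     // e.g. segment_$Number$.m4s
    val timescale: Long,   // ticks per second
    val duration: Long,    // segment length in ticks
    val startNumber: Int = 1,
)

/** Length of one segment in seconds. */
fun SegmentTemplate.segmentSeconds(): Double = duration.toDouble() / timescale

/** Number of segments needed to cover the whole presentation. */
fun SegmentTemplate.segmentCount(presentationSeconds: Double): Int =
    ceil(presentationSeconds / segmentSeconds()).toInt()

/** File names produced by substituting $Number$ in the media template. */
fun SegmentTemplate.segmentNames(presentationSeconds: Double): List<String> =
    List(segmentCount(presentationSeconds)) { i ->
        media.replace("\$Number\$", (startNumber + i).toString())
    }
```

For the video template, 46080 / 15360 gives 3.0 seconds per segment, and the audio template gives 132300 / 44100 = 3.0 seconds as well. A 14.815-second presentation therefore needs five segments per representation, `segment_1.m4s` through `segment_5.m4s` for the lowest video quality, with the last one shorter than the rest.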

The backend uses FFmpeg and MP4Box (from the GPAC toolkit) to re-encode the video into different quality levels and to split the content into 3-second chunks.

Android Integration: ExoPlayer and a Smart Caching Strategy

On the client side, the Android application leverages Media3 ExoPlayer to manage video playback. The design is centered around a clean MVVM architecture, ensuring that the user interface remains responsive and the playback logic is robust.

ExoPlayer Provider

A singleton class configures ExoPlayer with custom buffer settings and track selection parameters that optimize for mobile devices. For instance, the player is configured with:

  • A minimum buffer duration to ensure smooth playback.
  • A maximum buffer duration to avoid excessive memory usage.
  • Custom track selection to favor efficient codecs (e.g., H.264) and cap the resolution to reduce bandwidth consumption.

Enhanced ExoPlayer Configuration

The getPlayer() function in the ExoPlayerProvider is designed to deliver smooth, adaptive playback by tuning buffering parameters, selecting the optimal track, and ensuring compatibility with DASH manifests. Here’s an extended snippet that highlights the key aspects of this setup:

@UnstableApi
@Singleton
class ExoPlayerProvider @Inject constructor(
    @ApplicationContext private val context: Context,
    private val localIpAddress: String // my pc IP Address!
) {
    companion object {
        private const val MAX_BUFFER_MS = 30000 // Maximum buffering time (30 seconds)
        private const val MIN_BUFFER_MS = 2000 // Minimum buffer needed (2 seconds)
        private const val BUFFER_FOR_PLAYBACK_MS = 1000 // Buffer required to start playback (1 second)
    }

    // Create and configure an ExoPlayer instance with advanced settings
    fun getPlayer(): ExoPlayer {
        // Configure load control for optimal buffering
        val loadControl = DefaultLoadControl.Builder()
            .setBufferDurationsMs(
                MIN_BUFFER_MS,
                MAX_BUFFER_MS,
                BUFFER_FOR_PLAYBACK_MS, // buffer needed to start playback
                BUFFER_FOR_PLAYBACK_MS  // buffer needed to resume after a rebuffer
            )
            .setPrioritizeTimeOverSizeThresholds(true)
            .build()

        // Set up track selection parameters to favor performance and efficient streaming.
        // NOTE: RandomTrackSelection picks among the allowed tracks at random;
        // AdaptiveTrackSelection.Factory() selects by measured bandwidth instead.
        val trackSelector = DefaultTrackSelector(context, RandomTrackSelection.Factory(0)).apply {
            parameters = buildUponParameters()
                .setPreferredVideoMimeType(MimeTypes.VIDEO_H264)
                .setAllowVideoMixedMimeTypeAdaptiveness(true)
                .setAllowVideoNonSeamlessAdaptiveness(true)
                .setSelectUndeterminedTextLanguage(true)
                .setMaxVideoSize(720, 1280) // Cap video resolution to reduce bandwidth usage
                .setForceHighestSupportedBitrate(false)
                .setExceedRendererCapabilitiesIfNecessary(true)
                .setTunnelingEnabled(true)
                .build()
        }

        // Build the ExoPlayer instance with the customized load control and track selector
        val player = ExoPlayer.Builder(context)
            .setLoadControl(loadControl)
            .setMediaSourceFactory(DefaultMediaSourceFactory(context))
            .setRenderersFactory(
                DefaultRenderersFactory(context)
                    .setExtensionRendererMode(DefaultRenderersFactory.EXTENSION_RENDERER_MODE_PREFER)
            )
            .setTrackSelector(trackSelector)
            .build()
        player.repeatMode = Player.REPEAT_MODE_OFF

        // Add an analytics listener for detailed logging and debugging
        player.addAnalyticsListener(object : AnalyticsListener {
            override fun onVideoInputFormatChanged(
                eventTime: AnalyticsListener.EventTime,
                format: Format,
                decoderReuseEvaluation: DecoderReuseEvaluation?
            ) {
                Log.d("DASHPlayer", "Video format changed: ${format.sampleMimeType}, ${format.width}x${format.height}")
            }

            override fun onLoadCompleted(
                eventTime: AnalyticsListener.EventTime,
                loadEventInfo: LoadEventInfo,
                mediaLoadData: MediaLoadData
            ) {
                Log.d("DASHPlayer", "Load completed: ${loadEventInfo.uri}, dataType=${mediaLoadData.dataType}")
            }
        })

        // Optional: Additional listener to debug playback state transitions
        player.addListener(object : Player.Listener {
            override fun onPlaybackStateChanged(state: Int) {
                val stateName = when (state) {
                    Player.STATE_IDLE -> "IDLE"
                    Player.STATE_BUFFERING -> "BUFFERING"
                    Player.STATE_READY -> "READY"
                    Player.STATE_ENDED -> "ENDED"
                    else -> "UNKNOWN"
                }
                Log.d("DASHPlayer", "Playback state changed: $stateName")
            }
        })
        return player
    }
}

This extended configuration not only optimizes the buffering and track selection but also adds comprehensive logging for analytics. These enhancements enable developers to fine-tune the playback performance and troubleshoot issues in real-time.

Custom Video Cache Manager

The caching component is critical for ensuring that initial segments are available immediately upon video selection. This manager:

  • Checks if the essential segments (like the initialization and first video/audio chunks) are already cached.
  • Downloads missing segments in the background.
  • Preloads additional chunks to maintain smooth playback even if the network speed fluctuates.
  • Cleans up older segments to prevent cache bloating.

This proactive caching strategy reduces the likelihood of interruptions during playback and contributes to a seamless viewing experience.
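
The first responsibility, checking whether the essential segments are cached, can be sketched as a plain file check. The file names below follow the manifest shown earlier (`segment_1.m4s` is the lowest-bandwidth video representation, `segment__r3_*` is audio); `ESSENTIAL_SEGMENTS` and `missingEssentialSegments` are hypothetical names, and the real cache manager may track this differently.

```kotlin
import java.io.File

/** File names needed before playback can start, following the manifest in this article. */
val ESSENTIAL_SEGMENTS = listOf(
    "manifest.mpd",
    "segment_init.mp4",      // video initialization segment
    "segment_1.m4s",         // first low-quality video chunk
    "segment__r3_init.mp4",  // audio initialization segment
    "segment__r3_1.m4s",     // first audio chunk
)

/** Returns the essential files that are absent or empty in [videoDir]. */
fun missingEssentialSegments(videoDir: File): List<String> =
    ESSENTIAL_SEGMENTS.filter { name ->
        val file = File(videoDir, name)
        // A zero-length file usually means an interrupted download, so treat it as missing.
        !file.isFile || file.length() == 0L
    }
```

An empty result means the video can start from local files immediately; otherwise the returned names are the ones to download first.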

Key Caching Mechanisms

  1. Manifest Downloading:
    The cache manager first downloads the DASH manifest if it isn’t already cached. This snippet demonstrates how the manifest is fetched and saved locally:
private suspend fun downloadManifest(video: VideoResponse): Boolean {
    val videoId = video.id ?: return false
    val manifestUrl = video.dashManifest ?: return false

    Log.d(TAG, "Downloading manifest for video $videoId from $manifestUrl")
    return try {
        withContext(Dispatchers.IO) {
            val request = Request.Builder().url(manifestUrl).build()
            // use {} closes the response even if an exception is thrown below
            okHttpClient.newCall(request).execute().use { response ->
                if (!response.isSuccessful) {
                    Log.e(TAG, "Manifest download failed: ${response.code}")
                    throw IOException("Failed to download manifest: ${response.code}")
                }
                val videoDir = getVideoCacheDir(videoId)
                val manifestFile = File(videoDir, "manifest.mpd")
                // Stream the body straight to disk instead of holding it in memory
                manifestFile.outputStream().use { outputStream ->
                    response.body?.byteStream()?.copyTo(outputStream)
                }
                Log.d(TAG, "Manifest downloaded successfully, size=${manifestFile.length()}")
                true
            }
        }
    } catch (e: Exception) {
        Log.e(TAG, "Error downloading manifest: ${e.message}")
        false
    }
}
  2. Downloading Initial Segments:
    Before playback, essential segments (e.g., segment_init.mp4 and the first video/audio chunks) are downloaded. This proactive approach ensures a smooth playback start, even on low-bandwidth networks.
  3. Preloading Subsequent Segments:
    The cache manager also preloads further segments in the background. This mechanism prevents playback interruption when users continue watching the video.

By managing the caching logic carefully, the VideoCacheManager reduces buffering and adapts to network variability, ensuring that the user always experiences smooth video playback.
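
One simple way to decide what to preload is a sliding window ahead of the playhead. The sketch below is an assumption about how such a window could look, not the project's actual logic; `segmentsToPreload` and the default lookahead of two segments are chosen for illustration.

```kotlin
/**
 * Segment numbers to fetch next: the segment being played plus [lookahead] more,
 * skipping anything already cached and never going past [lastSegment].
 */
fun segmentsToPreload(
    currentSegment: Int,
    lastSegment: Int,
    cached: Set<Int>,
    lookahead: Int = 2,
): List<Int> {
    val end = minOf(currentSegment + lookahead, lastSegment)
    return (currentSegment..end).filter { it !in cached }
}
```

For the five-segment reel above, a player on segment 1 with only that segment cached would request segments 2 and 3, so roughly six more seconds of video are fetched while the user keeps watching.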

Feed Adapter and Smooth Scrolling

The video feed is implemented using a vertical ViewPager2, which emulates the familiar social media reel experience. The adapter is responsible for:

  • Managing multiple ExoPlayer instances for each visible video.
  • Starting playback when a video item becomes visible and pausing it when scrolled out of view.
  • Introducing a slight delay before playback to avoid interruptions during fast scrolling.
  • Coordinating with the caching manager to preload initial segments for each video.

This intelligent coordination ensures that users always see a thumbnail initially, which then transitions smoothly into a playing video once the essential segments are downloaded.
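
The slight delay before playback amounts to a debounce: only the page the user actually stops on should start playing. Here is a minimal sketch of that decision, with time passed in explicitly so it can be tested without a real clock. `PlaybackDebouncer` and the 300 ms default are assumptions for illustration.

```kotlin
/**
 * Decides when a page has been visible long enough to start playback.
 * Timestamps are supplied by the caller, which keeps the logic free of Android APIs.
 */
class PlaybackDebouncer(private val delayMs: Long = 300) {
    private var selectedPage: Int = -1
    private var selectedAtMs: Long = 0

    /** Call whenever the pager lands on a new page. */
    fun onPageSelected(page: Int, nowMs: Long) {
        selectedPage = page
        selectedAtMs = nowMs
    }

    /** Returns the page to play, or null if the user has not stayed on it long enough. */
    fun pageToPlay(nowMs: Long): Int? =
        if (selectedPage >= 0 && nowMs - selectedAtMs >= delayMs) selectedPage else null
}
```

In the adapter, `onPageSelected` would typically be driven from a `ViewPager2.OnPageChangeCallback`, with `pageToPlay` checked after the delay has elapsed. Pages that are flicked past never reach the threshold, so no player is started for them.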

Performance Optimization and User Experience

Smooth Transitions and Reduced Latency

The application architecture minimizes latency through:

  • Preloading Initial Segments:
    Essential video and audio segments are preloaded, ensuring that playback can begin almost instantly.
  • Adaptive Quality Switching:
    As network conditions change, the player can seamlessly switch between representations defined in the manifest.
  • Efficient Buffering:
    Customized ExoPlayer settings optimize buffer size to balance between smooth playback and efficient memory usage.
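
The idea behind adaptive quality switching can be shown with a simplified selector: pick the highest-bandwidth representation that fits within a fraction of the measured throughput. This is a sketch of the general technique, not ExoPlayer's implementation, and the 0.7 safety fraction is an assumed value; audio bandwidth is ignored for brevity.

```kotlin
/** A video representation from the manifest: its id and declared bandwidth in bits per second. */
data class Representation(val id: String, val bandwidthBps: Long)

/**
 * Picks the highest-bandwidth representation that fits within a fraction of the
 * measured throughput, falling back to the lowest one when nothing fits.
 */
fun chooseRepresentation(
    representations: List<Representation>,
    measuredBps: Long,
    safetyFraction: Double = 0.7, // assumed headroom; tune for your network
): Representation {
    require(representations.isNotEmpty()) { "At least one representation is required" }
    val budget = measuredBps * safetyFraction
    val sorted = representations.sortedBy { it.bandwidthBps }
    return sorted.lastOrNull { it.bandwidthBps <= budget } ?: sorted.first()
}
```

With the three video representations from the manifest (353, 404, and 505 kbps), a measured throughput of 600 kbps leaves a budget of about 420 kbps and selects representation 2, while 400 kbps falls back to representation 1.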

Handling Errors and Network Fluctuations

Robust error handling is built into the system:

  • Player Analytics and Logging:
    ExoPlayer's analytics listeners capture detailed playback events, helping developers diagnose issues in real-time.
  • Segment Download Retries:
    The caching manager implements retry logic and exponential backoff to handle network errors, ensuring transient issues do not impact playback.
  • Fallback Mechanisms:
    If local caching fails, the system automatically falls back to remote manifests and segments, maintaining continuous playback.

The listener below logs the available video tracks and classifies playback errors:

player.addListener(object : Player.Listener {
    override fun onTracksChanged(tracks: Tracks) {
        val TAG = "DASHPlayerTrack"
        Log.d(TAG, "Available tracks: ${tracks.groups.size}")
        for (group in tracks.groups) {
            if (group.type == C.TRACK_TYPE_VIDEO) {
                Log.d("DASHPlayer", "Video track group found: ${group.length} tracks")

                for (i in 0 until group.length) {
                    val format = group.getTrackFormat(i)
                    val selected = group.isTrackSelected(i)
                    Log.d(
                        "DASHPlayer", "Video track $i - Selected: $selected, " +
                            "Codec: ${format.codecs}, " +
                            "Resolution: ${format.width}x${format.height}, " +
                            "Bitrate: ${format.bitrate}, " +
                            "Sample MIME: ${format.sampleMimeType}"
                    )
                }
            }
        }
    }

    override fun onPlayerError(error: PlaybackException) {
        Log.e("DASHPlayer", "Player error: ${error.message}", error)
        // Log the stacktrace
        error.printStackTrace()

        val TAG = "DASHPlayer"

        if (error is ExoPlaybackException && error.type == ExoPlaybackException.TYPE_SOURCE) {
            // Source error handling (e.g., 404 Not Found)
            if (error.cause is HttpDataSource.InvalidResponseCodeException) {
                val responseCode =
                    (error.cause as HttpDataSource.InvalidResponseCodeException).responseCode
                val dataSpec =
                    (error.cause as HttpDataSource.InvalidResponseCodeException).dataSpec
                val failedUri = dataSpec.uri

                if (responseCode == 404) {
                    Log.e(TAG, "Source not found (404): $failedUri, original error=$error")
                    // Show a message to the user, retry, etc.
                } else {
                    Log.e(
                        TAG,
                        "Source error (code $responseCode): $failedUri, original error=$error"
                    )
                }
            } else if (error.cause is Loader.UnexpectedLoaderException) {
                Log.e(
                    TAG,
                    "UnexpectedLoaderException: ${(error.cause as Loader.UnexpectedLoaderException).message}"
                )
            } else {
                Log.e(TAG, "TYPE_SOURCE ERROR, original error: ${error.message}")
            }

        } else if (error is ExoPlaybackException) {
            Log.e(TAG, "Other ExoPlaybackException: $error")
        } else {
            Log.e(TAG, "Unknown playback error: $error")
        }

        // Try to get more information about what caused the error
        when (error.errorCode) {
            PlaybackException.ERROR_CODE_IO_NETWORK_CONNECTION_FAILED ->
                Log.e("DASHPlayer", "Network connection failed")

            PlaybackException.ERROR_CODE_IO_NETWORK_CONNECTION_TIMEOUT ->
                Log.e("DASHPlayer", "Network timeout")

            PlaybackException.ERROR_CODE_PARSING_CONTAINER_MALFORMED ->
                Log.e("DASHPlayer", "Malformed container data")

            PlaybackException.ERROR_CODE_PARSING_MANIFEST_MALFORMED ->
                Log.e("DASHPlayer", "Malformed manifest")
        }
    }
})
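
The segment download retries with exponential backoff mentioned above are not shown in this article, so here is a minimal sketch of the technique. The names `backoffDelayMs` and `retryWithBackoff`, the 500 ms base delay, the 8-second cap, and the limit of four attempts are all assumptions for illustration.

```kotlin
import java.io.IOException

/** Delay before retry number [attempt] (0-based): base * 2^attempt, capped at [maxDelayMs]. */
fun backoffDelayMs(attempt: Int, baseDelayMs: Long = 500, maxDelayMs: Long = 8_000): Long =
    minOf(baseDelayMs shl attempt.coerceIn(0, 20), maxDelayMs)

/**
 * Runs [download] up to [maxAttempts] times, waiting longer after each IOException.
 * [sleep] is injectable so tests do not have to wait for real time to pass.
 */
fun <T> retryWithBackoff(
    maxAttempts: Int = 4,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    download: () -> T,
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var lastError: IOException? = null
    for (attempt in 0 until maxAttempts) {
        try {
            return download()
        } catch (e: IOException) {
            lastError = e
            // No point sleeping after the final attempt; just fall through and rethrow.
            if (attempt < maxAttempts - 1) sleep(backoffDelayMs(attempt))
        }
    }
    throw lastError!!
}
```

A segment fetch would be wrapped as `retryWithBackoff { downloadSegment(url) }`, where `downloadSegment` stands in for whatever function performs the HTTP request. Inside a coroutine, a suspending `delay` would be a better fit than `Thread.sleep`.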

Challenges Faced

Implementing adaptive video streaming is a complex process that came with its share of challenges. Here are some of the specific hurdles encountered during development:

  • Partial Playback:
    Initially, only the first chunk of the video was playing, while subsequent chunks were not being fetched or rendered correctly.
  • Audio-Only Playback:
    In one approach, audio chunks played perfectly, but the corresponding video chunks failed to load, leading to an unsynchronized and incomplete playback experience.
  • Video Flickering:
    The video started flickering during playback, which was attributed to low memory issues on certain devices, causing frequent reinitialization of the player.
  • Audio Merging Issues:
    There were instances where multiple audio tracks would merge together, resulting in distorted or overlapping sound.
  • ExoPlayer Exposure:
    The ExoPlayer instance was not being exposed properly in the app lifecycle, leading to unexpected behavior during playback and resource cleanup.
  • Data Persistence Problems:
    Critical data, such as downloaded segments and manifests, was not being saved reliably, causing playback failures when attempting to access local resources.
  • Source Errors:
    ExoPlayer sometimes reported “source not found” or generic source errors, indicating issues with either the URL resolution or the integrity of the DASH manifest.
  • Unmatched Track Issues:
    There were errors related to unmatched track types (e.g., unmatched track of type 1), which complicated the synchronization between audio and video streams.

Each of these challenges required iterative debugging and adjustments to the architecture — from refining the DASH manifest generation to enhancing the caching logic and fine-tuning ExoPlayer settings. Overcoming these issues was essential in achieving a robust and seamless streaming experience.

Conclusion

This comprehensive solution demonstrates how a well-architected system can deliver high-quality, adaptive video streaming even under challenging network conditions. By breaking down videos into manageable chunks, generating a robust DASH manifest, and employing intelligent caching and playback strategies on Android, the project achieves an experience reminiscent of modern social media platforms.

The collaboration between backend processing and Android playback components highlights the importance of a holistic approach — one that considers every aspect from data preparation to user experience. With such a design, developers can create scalable, efficient, and responsive video streaming applications that truly meet the demands of today’s mobile users.

About the Author

I’m just a passionate Android developer who loves finding creative, elegant solutions to complex problems. Feel free to chat on LinkedIn about Android-related stuff and more. Thank you for reading this!
