How Media Players Work and the Science Behind Them

Entertainment has always been a crucial part of our lives. It has been a source of enjoyment and amusement for centuries, and we can safely say it's here to stay. Its evolution in phases is comparable to the evolution of Human Beings (minus the Awkward Phase, of course). It started when the Frenchmen wanted to keep their guests happy by singing their songs and lo, the history of Entertainment has been going on relentlessly ever since.


Now, stepping into the 21st century, we have many forms of Entertainment: Movies, Dramas, Web-Series and what-not. It is a Way of Life, and it is almost completely digitized. It is also highly localized. A Movie in the 1970s could only be watched in a Cinema Hall/Auditorium. But now we carry it in our pockets and can watch it without disturbing our precious comfort zones. All thanks to Media Players.


Media Players are a perfect concoction of Codecs, a UI and an Engine. The word Codec is a blend of the words COder and DECoder.

Fig.1: The architecture of a Media Player

A codec does not necessarily have to be software; it can also be a small piece of hardware. But since we're talking about Media Players, let us stick with software. Codecs compress and decompress the RAW video shot by the camera. Without codecs, a 1 hour 480p video might be around 200GB (the exact figure depends on frame rate and colour depth), but thanks to the compression in the H.264 codec's algorithm, it comes to approximately 200-300MB (value derived from YouTube).
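To get a feel for where a number like that comes from, here is a rough back-of-the-envelope sketch. The resolution (854x480), 30 frames per second and 3 bytes per pixel are assumptions picked for illustration, not values from any particular camera, and the function name is made up for this example.

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Size of uncompressed video: every pixel of every frame is stored in full."""
    return width * height * bytes_per_pixel * fps * seconds

# Assumed parameters: widescreen 480p, 30 fps, 24-bit colour, one hour.
raw = raw_video_bytes(width=854, height=480, fps=30, seconds=3600)

# Midpoint of the 200-300MB compressed figure quoted above.
compressed = 250 * 10**6

print(f"raw size:          {raw / 10**9:.1f} GB")      # 132.8 GB
print(f"compression ratio: {raw / compressed:.0f}:1")  # 531:1
```

Under these assumptions the raw video lands at roughly 130GB, and doubling the frame rate to 60 fps doubles it to about 265GB, so the ballpark above holds. Either way, the codec is shrinking the video by a factor of several hundred.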

The incoming analog video and audio signals are sampled and digitized. The sampling rate is chosen according to the Nyquist Sampling theorem, which says that a signal must be sampled at more than twice its highest frequency to be captured faithfully, and which is derived using the Fourier transform of signals. To know more about the Fourier Transform, see the source listed for Fig.2 below.

Codecs compress and decompress both the video and audio streams, which are packed together in a container, aptly named for 'containing' the video and audio streams. Several audio tracks can be packed in, which might be different audio languages or different audio qualities, such as surround tracks with different audio positioning. The audio streams, too, have their own codecs, each with its own defined bitrates and frame size. Both kinds of stream are pre-processed, i.e. encoded, before being stored in the container, and the codec information is kept as metadata which helps the Media Player recognize the codec and decode the stream for smooth rendering. This processing and viewing is made possible by the renderer of the Media Player.
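The sampling rule is easier to believe once you see it fail. The sketch below, using only the Python standard library, samples a 3 Hz tone at 4 Hz, which is below the 6 Hz the theorem asks for. The frequencies and helper names are made up for this illustration.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sample sin(2*pi*f*t) at the instants t = n / sample_rate."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

def min_sample_rate(max_freq_hz):
    """Nyquist limit: the sampling rate must exceed twice the highest frequency."""
    return 2 * max_freq_hz

# A 3 Hz tone needs more than 6 samples per second. We deliberately use 4.
undersampled = sample_sine(3.0, 4.0, 8)

# At 4 samples per second, an inverted 1 Hz tone produces the same values.
alias = [-s for s in sample_sine(1.0, 4.0, 8)]

indistinguishable = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(undersampled, alias)
)
print(indistinguishable)        # True
print(min_sample_rate(20_000))  # 40000
```

Once sampled too slowly, the 3 Hz tone cannot be told apart from a 1 Hz one, and no decoder can undo that. This is also why CD audio is sampled at 44.1 kHz: it sits comfortably above twice the roughly 20 kHz upper limit of human hearing.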

Fig.2: The Different Frequency plots when separated using Fourier Transform


The Engine consists of a library of all supported codecs, equalizers, file handlers, subtitle support and everything else that is important for playback of a media file. The Media Engine handles all the important metadata and exposes these features so that they can be used and manipulated.
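As a minimal sketch of that idea, the engine can be pictured as a lookup from the codec name found in the container's metadata to a decoder in its library. Everything here is hypothetical and only for illustration: the `Stream` class, the stand-in decoder functions and the `SUPPORTED_CODECS` table are not taken from any real player.

```python
from dataclasses import dataclass

@dataclass
class Stream:
    kind: str           # "video" or "audio"
    codec: str          # codec name as recorded in the container metadata
    language: str = ""  # only meaningful for audio tracks

# Stand-in decoders; a real engine would call into actual codec libraries.
def decode_h264(payload: bytes) -> str:
    return f"decoded {len(payload)} bytes of H.264 video"

def decode_aac(payload: bytes) -> str:
    return f"decoded {len(payload)} bytes of AAC audio"

# The engine's "library of all supported codecs".
SUPPORTED_CODECS = {"h264": decode_h264, "aac": decode_aac}

def pick_decoder(stream: Stream):
    """Use the stream's metadata to find the matching decoder."""
    if stream.codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {stream.codec}")
    return SUPPORTED_CODECS[stream.codec]

def select_audio(streams, language):
    """Pick the audio track for the requested language."""
    for stream in streams:
        if stream.kind == "audio" and stream.language == language:
            return stream
    raise LookupError(f"no audio track for language: {language}")

# A container holding one video track and two audio languages.
container = [
    Stream("video", "h264"),
    Stream("audio", "aac", "en"),
    Stream("audio", "aac", "fr"),
]

french = select_audio(container, "fr")
print(pick_decoder(french)(b"\x00" * 1024))  # decoded 1024 bytes of AAC audio
```

The useful part of this design is that the player never has to guess: the metadata names the codec, and the engine either has a decoder for it or reports that the format is unsupported.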

A UI is built on top of it so the end-user can use the features of the Media Player on a more interactive platform. After all, the Media Player itself needs the bling when you're watching Hotline Bling on your screen. Media Players embedded in websites like YouTube also need to be built on a backend which controls your next video recommendations, ad placement etc. In contrast, Media Players like VLC Media Player and KMPlayer do not really need this backend, as they work offline. For your favourite sites like Netflix, though, the player is just a small part of the package, and that is a whole different story for another day!

Source(s):

Fig.1 : https://blog.streamroot.io/how-modern-video-players-work/

Fig.2 : https://en.wikipedia.org/wiki/Fourier_transform