Building the Videosher VOD Transcoding SaaS Service: A Technical Journey

August 3, 2023


In 2014, we launched the first version of our online video platform, with VOD transcoding as one of its key components: a critical process for transforming source video assets into formats suitable for online streaming. Since then, our service has evolved significantly, catering to the needs of enterprise-level customers. This article highlights the technical challenges we encountered during development and the solutions we implemented to create a scalable, efficient, and user-friendly VOD transcoding service.

To make the service enterprise level, we decided it had to be multi-tenant, highly scalable, support the necessary formats, and be automated as much as possible to minimize the workload for those who manage it. The requirements came from a mix of our own understanding and customer feedback. Interestingly, the product did not evolve in the usual inside-out way, where an internally used product is commercialized, but outside-in: a product already successful on the market gained traction within our own company group as well.

Our main goal was to make the process fully configurable and controllable by the customer from an easy-to-use web portal, with all service components and workflow steps customer friendly and requiring no large initial investment. The value for the customer is usage-based payment and no capacity worries: neither an oversized infrastructure that sits idle most of the time nor insufficient capacity that results in long waiting times.


To provide a seamless experience, we developed fully configurable workflows that apply automatically to all process steps. Logical folders allow customers to categorize and filter their large content libraries efficiently. We also implemented ingest point separation to manage content from different providers. The service prioritizes processes based on importance, ensuring timely delivery of hot content while processing back-catalogs with lower priority.

  • Optimal Resource Utilization: We optimized performance by running multiple processes in parallel on the same infrastructure without compromising efficiency. Additionally, we employed a combination of dedicated and cloud infrastructure to manage transcoding, leveraging the benefits of both.
  • Storage and Network Bottlenecks: Careful planning and utilization of advanced file transfer technologies allowed us to handle large files efficiently, avoiding storage and network bottlenecks.
  • Transcoding Module Flexibility: We developed a modular transcoding process that supports various codecs, formats, and manifests, making it easier to meet diverse customer requirements.
  • Fair Queueing and Process Prioritization: Microservices work asynchronously and receive tasks through a message queue, ensuring a fair allocation of resources and efficient prioritization of processes.
  • Hardware Design: A mix of public cloud resources and dedicated instances with custom configurations allowed us to optimize performance and handle peak workloads effectively.

Challenges in Detail

From the process perspective, the main challenges to think about were:

  • Easy workflow configuration,
  • Easy management of large libraries of content in the web-based portal, categorization, filtering, error reporting,
  • Ingest point separation for content providers who send the files into different workflows,
  • Resource distribution among customers so that the processes run smoothly and without significant delays,
  • Process prioritization to handle the most important tasks first, e.g., hot content delivered in minimum time to viewers with large back-catalogs processed with lower priority,
  • Eliminating process bottlenecks to deliver the videos to viewers in the shortest time possible.

From the technological perspective, the main challenges we faced were:

  • Optimal utilization of available resources – reach maximum performance by running several processes in parallel on the same infrastructure without losing efficiency,
  • Utilization of dedicated vs. cloud infrastructure – as we are a public cloud virtualization provider ourselves, we use the public infrastructure for overflow cases, but it’s more cost-efficient to also utilize a dedicated part of the cloud infrastructure, as transcoding is compute intensive,
  • Storage bottlenecks in terms of network and IOPS – you can easily overload any storage if file operations are not carefully planned,
  • Network capacity for large file transfers,
  • Network stability for ingest from content providers and external uploads to customer’s storage.

How we addressed these challenges

There were a lot of things to decide, and some were redesigned several times until we arrived at the current version of the service. As we are improving it constantly, new solutions will be implemented over time, but this is our experience so far.


We created fully configurable workflows which are configured once and then applied automatically for all process steps.

We decided to allow workflows to be configured through logical folders, which can be nested. A child folder inherits the properties of its parent folder unless they are explicitly defined on the child. Depending on customer needs, separate folders can be used for every content provider or content type.
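The inheritance rule can be illustrated with a minimal sketch. The `Folder` class, the `resolve` method and the property keys (`preset`, `priority`) are hypothetical names chosen for this example, not the actual data model of the service.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Folder:
    """A logical folder; class, field and property names are hypothetical."""
    name: str
    parent: Optional["Folder"] = None
    properties: Dict[str, Any] = field(default_factory=dict)

    def resolve(self, key: str, default: Any = None) -> Any:
        """Return the nearest explicitly defined value, walking up to the root."""
        node: Optional["Folder"] = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]
            node = node.parent
        return default


root = Folder("customer", properties={"preset": "default-hd", "priority": "low"})
provider = Folder("provider-a", parent=root, properties={"priority": "high"})
series = Folder("series", parent=provider)

print(series.resolve("preset"))    # inherited from the root folder
print(series.resolve("priority"))  # overridden at the provider level
```

Resolving at read time, rather than copying values into every child, means a change on a parent folder takes effect for all children that did not override the property.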

Ingest file patterns contain fields like asset ID, asset name, bitrate, language, and channel ID. Default patterns are defined, but they can be overridden with a custom field order and custom delimiters for each folder and for each file type in the folder. This allows files to be grouped by asset, e.g., separate video, audio and subtitle tracks, supplemental metadata files, images, documents and other files.
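As an illustration of how such patterns might be applied, the sketch below splits file names into fields using a per-file-type field order and delimiter, then groups the files by asset ID. The function names, the pattern table and the example file names are assumptions made for this example only.

```python
import os
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def parse_name(filename: str, fields: Sequence[str], delimiter: str = "_") -> Dict[str, str]:
    """Split the file name (without extension) into the configured fields."""
    stem, _ext = os.path.splitext(filename)
    parts = stem.split(delimiter)
    if len(parts) != len(fields):
        raise ValueError(f"{filename!r} does not match pattern {delimiter.join(fields)!r}")
    return dict(zip(fields, parts))


def group_by_asset(
    filenames: Sequence[str],
    patterns: Dict[str, Tuple[List[str], str]],
) -> Dict[str, List[str]]:
    """Group files by asset ID, using the pattern configured for each extension."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        ext = os.path.splitext(name)[1].lower()
        fields, delimiter = patterns[ext]
        groups[parse_name(name, fields, delimiter)["asset_id"]].append(name)
    return dict(groups)


# Hypothetical per-file-type patterns for one folder: (field order, delimiter).
patterns = {
    ".mxf": (["asset_id", "asset_name"], "_"),
    ".wav": (["asset_id", "language"], "_"),
    ".srt": (["asset_id", "language"], "-"),
}
files = ["A100_Pilot.mxf", "A100_eng.wav", "A100-lav.srt", "A200_Finale.mxf"]
assets = group_by_asset(files, patterns)
print(assets)
```

A file that does not match its pattern raises an error here; in a real workflow it would more likely be reported to the user than stop the ingest.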

After the ingest is completed, the process needs to decide when to start the transcoding. This is tricky when the asset has multiple files that can arrive in random order. The best scenario is to have a manifest file that contains a list of all expected files, their sizes, and other information to verify before transcoding starts. Another option is to use container files such as tar. In the worst case, some assumptions have to be made, e.g., that all files arrive within a specified number of hours, so that transcoding can start once that time has elapsed since the first file of the asset was ingested. We decided to implement all these scenarios so that the customer can choose the most suitable one for each workflow.
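The three scenarios can be sketched as a single readiness check. This is a simplified illustration: the mode names, the `AssetState` structure and the 6-hour default are assumptions, and a real check would verify more than file sizes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AssetState:
    """Hypothetical bookkeeping for one asset during ingest."""
    first_ingest_ts: float                     # epoch seconds of the first file's arrival
    received: Dict[str, int]                   # file name -> size in bytes
    manifest: Optional[Dict[str, int]] = None  # expected file name -> size in bytes


def ready_to_transcode(asset: AssetState, mode: str, now: float, wait_hours: float = 6.0) -> bool:
    """Decide whether transcoding may start under the configured scenario."""
    if mode == "manifest":
        # Every file listed in the manifest must be present with the expected size.
        if asset.manifest is None:
            return False
        return all(asset.received.get(name) == size for name, size in asset.manifest.items())
    if mode == "container":
        # A single archive carries the whole asset, so its arrival is enough.
        return any(name.lower().endswith(".tar") for name in asset.received)
    if mode == "timeout":
        # Assume everything has arrived once the window since the first file has elapsed.
        return now - asset.first_ingest_ts >= wait_hours * 3600
    raise ValueError(f"unknown mode: {mode}")


asset = AssetState(
    first_ingest_ts=0.0,
    received={"video.mxf": 1000, "audio_eng.wav": 200},
    manifest={"video.mxf": 1000, "audio_eng.wav": 200, "subs.srt": 5},
)
print(ready_to_transcode(asset, "manifest", now=3600.0))     # subtitle file still missing
print(ready_to_transcode(asset, "timeout", now=7 * 3600.0))  # waiting window has elapsed
```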

Each folder should have a transcoding preset that defines encoding settings with all video and audio renditions, some advanced parameters, advanced processing, and output file packages (see more in the Transcoding section).

After the transcoding is finished, we allow immediate global streaming from our Videosher platform, but we also acknowledge that some customers have their own streaming platforms, including those in internal controlled networks. To deliver the files to such platforms, we implemented upload destinations, several of which can be configured per folder.

We allow customers to store the source and transcoded content for as long as necessary for backup, but we also understand that customers want to keep storage costs down and avoid unnecessary storage consumption. For this, we implemented an automated storage cleanup process which deletes the source files, the transcoded files, or all asset files a defined number of hours after transcoding. We also enabled backup of the content to cheap long-term storage like AWS Glacier or Azure Blob Archive.
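A cleanup rule of this kind can be sketched as a pure selection function. The `StoredFile` type, the scope values and the parameter names are hypothetical; the sketch only decides which files are due for deletion and does not touch storage.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StoredFile:
    path: str
    kind: str  # "source" or "transcoded"


def files_to_delete(
    files: Sequence[StoredFile],
    scope: str,              # "source", "transcoded" or "all" (hypothetical values)
    transcoded_at: float,    # epoch seconds when transcoding finished
    now: float,
    retention_hours: float,
) -> List[str]:
    """Return the paths that are due for deletion under the folder's cleanup rule."""
    if scope not in ("source", "transcoded", "all"):
        raise ValueError(f"unknown scope: {scope}")
    if now - transcoded_at < retention_hours * 3600:
        return []  # retention window has not elapsed yet
    if scope == "all":
        return [f.path for f in files]
    return [f.path for f in files if f.kind == scope]


stored = [
    StoredFile("A100/source/video.mxf", "source"),
    StoredFile("A100/out/1080p.mp4", "transcoded"),
]
print(files_to_delete(stored, "source", transcoded_at=0.0, now=48 * 3600.0, retention_hours=24))
```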

Catalog management

An optimal structure for the asset database, with proper use of indexes and with profiling and optimization of large queries, is the obvious part of the backend design. We use both SQL and NoSQL databases for different kinds of data to get optimum performance, and we employ a lot of data caching at different layers of the technology stack.

The web portal itself is separated from the backend and uses an API to retrieve and update the data. This allows additional optimizations and makes the web portal more lightweight and faster to respond.

There was a lot of design and coding to implement a portal capable of easy and fast management of many thousands of assets for each customer. The portal also has reports to see the transcoding jobs overall and in detail up to every processed and generated file and track.

File operations

There are many different file types used by our customers. While some use relatively small files, e.g., news videos which do not require very high quality, those who run B2C entertainment services quite often operate with high-quality mezzanine files. Earlier those were large 50 Mbps MPEG2 videos; nowadays they are huge Apple ProRes files with bitrates of several hundred Mbps and sizes of hundreds of GB.

Large files are challenging to work with: not only do you need fast storage and networks, you also need optimized processes that keep file transfers to a minimum. In the good old days, files were transferred over FTP (or better, SFTP for security), but with large files FTP doesn’t provide enough speed and better technologies are necessary. As we have our own S3-compatible Ceph storage, we allow direct ingest into the bucket, which gives much better speed than FTP. In some cases, however, specific file transfer technologies like Signiant or Aspera need to be used. Additionally, we implemented error control on ingest to prevent incomplete and faulty files from being processed further down the workflow.

During the transcoding workflow all source files are ingested through the various technologies and stored in our S3 compatible storage. The optimization of the whole path until those are processed and stored is crucial to have no bottlenecks.

After the asset files are stored and the transcoding process starts, the files need to be accessed multiple times to generate all renditions and for some other processing. Reading and transferring a 200 GB file multiple times can significantly slow down the whole process. That’s why we optimized the process to read the files as few times as possible while still generating the renditions of the asset in parallel.
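One common way to read a source once while producing several renditions is to give a single ffmpeg invocation one input and several outputs. The sketch below only builds such a command line; it is an illustration, not a description of our actual pipeline. The codec (`libx264`, which must be present in the ffmpeg build), the rendition values and the file names are assumptions.

```python
import shlex
from typing import List, Sequence, Tuple


def build_ffmpeg_command(source: str, renditions: Sequence[Tuple[int, str, str]]) -> List[str]:
    """Build one ffmpeg command with a single input and one output per rendition.

    Each rendition is (height, video bitrate, output path). Output options in
    ffmpeg apply to the output file that follows them, so each block below
    configures exactly one output.
    """
    cmd = ["ffmpeg", "-hide_banner", "-y", "-i", source]
    for height, bitrate, out_path in renditions:
        cmd += [
            "-map", "0:v:0",               # first video stream of the single input
            "-an",                         # video-only output in this sketch
            "-c:v", "libx264",
            "-vf", f"scale=-2:{height}",   # keep aspect ratio, force an even width
            "-b:v", bitrate,
            out_path,
        ]
    return cmd


# Hypothetical rendition list and file names.
cmd = build_ffmpeg_command(
    "mezzanine.mov",
    [(1080, "6000k", "out_1080.mp4"), (720, "3500k", "out_720.mp4"), (360, "900k", "out_360.mp4")],
)
print(shlex.join(cmd))
```

The command is returned as an argument list so it can be passed to `subprocess.run` without shell quoting issues.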

The transcoded files are typically smaller than the sources, so the subsequent processes are less challenging. Still, quality control is necessary for uploads to customers’ own storage, as various errors can happen and need to be handled gracefully.


Our customers need different codecs and formats, with specific manifests for their packagers. Some have several streaming services, like OTT and IPTV, sometimes even different versions of each with different file formats. The content providers also deliver a huge variety of source file formats, and some formats do not contain all the information necessary for the outputs, so the missing details have to be configured to allow automated processing. We made the transcoding process modular so that modules can be chained one after another and complement each other.

The first step is to create uniform renditions, which we call the Base file package. All video and audio tracks are split into separate files encoded with the selected codec, subtitles are converted to the multi-language TTML format, and a SMIL manifest compatible with our Videosher streaming service is generated. The Base package is enough to stream the videos to the audience immediately after transcoding is finished.

Other packages are created for the needs of the customer’s own streaming services. The Base package files are used as mezzanine files for further media file generation, which is mostly done by re-muxing them into the necessary formats. Additional modules are used for creating some legacy formats, e.g., MPEG2 TS files with embedded DVB subtitle tracks or Microsoft Smooth Streaming format files. At the core of the process, we use our own build of open-source ffmpeg with some additional libraries. In addition, we use other tools for some specific processing tasks.

Additional logic generates renditions and output files based on conditions, so that the same workflow can be used for content that doesn’t necessarily have the same properties, e.g., source bitrate, resolution, or aspect ratio. In general, we made a default configuration which is good for most cases and is used by many customers out of the box. Only some need more sophisticated tweaks to get the result they require.
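A minimal sketch of such condition-based selection is shown below, assuming a rule of "never exceed the source height or bitrate". The ladder values, the `Rendition` type and the rule itself are illustrative assumptions, not our default configuration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Rendition:
    height: int
    video_kbps: int


# Hypothetical ladder, ordered from highest to lowest quality.
LADDER = [
    Rendition(1080, 6000),
    Rendition(720, 3500),
    Rendition(480, 1800),
    Rendition(360, 900),
]


def select_renditions(
    source_height: int, source_kbps: int, ladder: Sequence[Rendition] = LADDER
) -> List[Rendition]:
    """Keep only renditions that neither upscale nor exceed the source bitrate.

    If nothing qualifies, fall back to the lowest rendition so that every
    asset still gets at least one output.
    """
    chosen = [r for r in ladder if r.height <= source_height and r.video_kbps <= source_kbps]
    return chosen or [min(ladder, key=lambda r: r.height)]


print([r.height for r in select_renditions(720, 8000)])   # HD-ready source
print([r.height for r in select_renditions(1080, 2000)])  # low-bitrate source
```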

Process design

All modules are implemented as microservices which are largely independent and work asynchronously. Every module receives its jobs from the MQTT message queue. This approach helps to avoid conflicting processes and makes process prioritization easier.

All processes are designed to provide ‘fair’ queues for all tenants while honoring the configured prioritization. That way no customer experiences significant delays, as there are always resources available for them. Knowing that shorter videos, e.g., news clips or user-generated content, usually need faster publication, there is a separate queue for videos under 5 minutes with faster turnaround times.
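The idea of fair queues with priorities can be sketched as round-robin between tenants combined with a priority heap per tenant. This is an in-memory illustration with assumed names (`FairScheduler`, `lane_for`) and assumed priority values; it does not show the message queue integration of the service.

```python
import heapq
import itertools
from collections import OrderedDict
from typing import Any, List, Optional, Tuple


def lane_for(duration_seconds: float) -> str:
    """Route short videos (under 5 minutes) to a separate, faster queue."""
    return "short" if duration_seconds < 5 * 60 else "standard"


class FairScheduler:
    """Round-robin across tenants; lowest priority number first within a tenant."""

    def __init__(self) -> None:
        self._queues: "OrderedDict[str, List[Tuple[int, int, Any]]]" = OrderedDict()
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per priority

    def submit(self, tenant: str, job: Any, priority: int = 10) -> None:
        heap = self._queues.setdefault(tenant, [])
        heapq.heappush(heap, (priority, next(self._seq), job))

    def next_job(self) -> Optional[Tuple[str, Any]]:
        if not self._queues:
            return None
        tenant, heap = next(iter(self._queues.items()))
        _priority, _seq, job = heapq.heappop(heap)
        if heap:
            self._queues.move_to_end(tenant)  # this tenant waits for the others
        else:
            del self._queues[tenant]
        return tenant, job


scheduler = FairScheduler()
scheduler.submit("A", "a1")
scheduler.submit("A", "a2", priority=1)  # hot content
scheduler.submit("A", "a3")
scheduler.submit("B", "b1")

order = []
while (item := scheduler.next_job()) is not None:
    order.append(item)
print(order)
```

In this sketch, tenant B’s single job is served right after tenant A’s first job even though A submitted three jobs earlier, and A’s hot job overtakes A’s own backlog.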

An average enterprise-level customer usually has several tens of content providers, each delivering content directly to our platform at a different pace. The first task we had to tackle was isolating the ingest points so that every provider has its own and doesn’t see the others. It is also important to allow customers to create those ingest points themselves and discard them when they are no longer needed.

As content providers have different content types, e.g., hot series with new episodes to be published in a matter of hours or huge back-catalogs of tens of thousands of content hours, we created priorities by ingest folders, so that hot content is always ingested and processed fast.

One of the challenges of implementing such a service is setting up an advanced test environment to test how processes run in parallel and how they behave under substantial load during peak consumption. It’s also important to handle errors as soon as they appear and to report them to users.

Hardware design

Initially, we used only our own public cloud resources, which were utilized when necessary. When the service grew and the workload became quite intensive, we estimated that, from a cost perspective, it was reasonable to have some reserved cloud instances which could be custom configured for maximum performance.

The first step was to create pools of instances sharing the same NVMe storage pool for local processing. That allowed us to decrease file operations on the long-term S3-compatible storage by an order of magnitude. The next step was a reserved network connection to storage to speed up file transfers to and from S3 storage.

We ran a series of tests to determine the optimal number of parallel processes on each transcoding instance, given the available CPU resources and other performance parameters. In our setup we decided on 4 parallel processes per instance: with fewer, the resources were underutilized, and with more, the processing times started to increase substantially.
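Capping the number of jobs that run at the same time on one instance can be sketched with a fixed-size worker pool. The limit of 4 is taken from the text; the helper name `run_jobs`, the job names and the sleep that stands in for a transcoding process are assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

MAX_PARALLEL = 4  # per-instance limit reported in the text for our setup


def run_jobs(jobs: Sequence[str], max_parallel: int = MAX_PARALLEL) -> Tuple[List[str], int]:
    """Run all jobs with at most max_parallel in flight; return results and observed peak."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def worker(job: str) -> str:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            time.sleep(0.01)  # placeholder for launching and awaiting a transcoding process
            return f"{job}:done"
        finally:
            with lock:
                running -= 1

    # The pool size is what enforces the cap; the counters only observe it.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(worker, jobs))
    return results, peak


results, peak = run_jobs([f"job-{i}" for i in range(10)])
print(len(results), "jobs finished; peak parallelism:", peak)
```

The right value for the limit depends on the hardware and codec settings, which is why it was determined by measurement rather than fixed in advance.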

We still use public cloud resources as well, for tasks that do not put a 100% load on compute resources, like queue management, the database, and the portal. We can also overflow transcoding tasks to the public cloud when the dedicated instances are full. That is slower, and we offload low-priority tasks first, but it gives us virtually unlimited resources when necessary. In addition, our microservices are designed to be technically infrastructure independent, which allows them to run on almost any private or public cloud infrastructure; the only challenge is maintaining reasonable performance through optimization.


Building the Videosher VOD Transcoding SaaS service was a challenging yet rewarding experience. We focused on enhancing capacity, performance, resilience, stability, and usability continuously. The service has grown to process thousands of content hours daily and serves enterprise-level customers.

Our efforts, which initially started as a component of a platform for smaller content owners, ended up as an enterprise-level service able to process thousands of content hours daily and to serve customers like Telia Play with 3 million subscribers in Northern Europe. Read more about this in a case study here - or watch our demo session at the IBC 2022 Content Everywhere Hub Theatre together with Andreas Backlund from Telia Play here -

When our company was acquired by Tet, which is itself an IPTV and OTT service provider, Tet was using an old Vantage encoder farm that had to be upgraded sooner rather than later. Our colleagues started to test our service, and the tests were successful. The turning point came when the Vantage encoder farm crashed and our colleagues moved all transcoding workflows to our service in a matter of days.