Media Onboarding Service

Service Design

Objective

The objective of a media onboarding service is to provide a streamlined and efficient process for uploading and onboarding media assets into a system. This service should enable users to easily upload media files and associated metadata, such as title, description, genre, and language, and ensure that the files are processed and stored appropriately. The service should also include validation and error handling to ensure that uploaded assets meet the required specifications and are properly formatted. Ultimately, the objective is to provide a reliable and user-friendly way for media assets to be added to a system and made available for use by authorised parties.

The world is witnessing a significant rise in both content creators and content consumers. With internet access getting cheaper and faster, our goal is to serve different types of media content to different sets of customers with the best possible experience. To support this goal, we need to provide a platform where content creators can onboard their media content.

Functional Requirements

User authentication and authorisation: The service should provide a secure way for users to authenticate and authorise their access to the service.
Media upload: The service should allow users to upload media content in various formats like video, audio, images, etc.
Metadata management: The service should provide a way to manage metadata associated with the media content like title, description, author, etc.
Media asset management: The service should provide a way to manage media assets like files, images, etc. This includes storage, retrieval, and deletion of media assets.
Media processing: The service should provide a way to process media assets by converting them to different formats, resizing, cropping, and other necessary processing. Content creators should be able to see the status of the media being processed.
Content moderation: The service should provide a way to moderate user-generated content to ensure that it meets community guidelines and standards.
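
The requirement that content creators can see the status of media being processed could be modelled as a small state machine. The sketch below is illustrative only: the status names and the allowed transitions are assumptions, not something this design specifies.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical onboarding statuses; names and transitions are assumptions for illustration.
enum OnboardingStatus {
    UPLOADED, VALIDATING, PROCESSING, IN_MODERATION, AVAILABLE, REJECTED, FAILED;

    // Returns the statuses that may legally follow this one.
    Set<OnboardingStatus> nextAllowed() {
        switch (this) {
            case UPLOADED:      return EnumSet.of(VALIDATING);
            case VALIDATING:    return EnumSet.of(PROCESSING, REJECTED);
            case PROCESSING:    return EnumSet.of(IN_MODERATION, FAILED);
            case IN_MODERATION: return EnumSet.of(AVAILABLE, REJECTED);
            default:            return EnumSet.noneOf(OnboardingStatus.class); // terminal states
        }
    }

    // True if moving from this status to the target is allowed.
    boolean canTransitionTo(OnboardingStatus target) {
        return nextAllowed().contains(target);
    }
}
```

Keeping the transitions in one place would let the service reject out-of-order status updates before they reach the creator-facing status view.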

Non-Functional Requirements

Scalability: The service must be designed to handle increasing amounts of content and users over time. This includes the ability to horizontally scale by adding more instances of the service and to vertically scale by increasing the resources allocated to each instance.
Reliability: The service must be highly reliable and available to ensure that content can be added to the platform without interruption. This includes the ability to handle failures gracefully, such as by automatically retrying failed operations or failing over to backup systems.
Security: The service must be designed with security in mind, including the ability to authenticate and authorise users, encrypt sensitive data, and protect against attacks such as cross-site scripting (XSS) and SQL injection.
Performance: The service must be designed to deliver content quickly and efficiently, including the ability to cache frequently accessed data and optimise database queries.
Maintainability: The service must be easy to maintain and update over time. This includes the ability to monitor the service for errors and performance issues, as well as the ability to deploy new versions of the service without downtime, with clear documentation, modular architecture, and well-organised code.
Interoperability: The service must be designed to work with other systems and services within the organization, such as the recommendation engine, user management system, and billing system. This includes the ability to exchange data and messages with these systems using standard protocols and APIs.
Usability: The service must be easy to use for content providers and internal users. This includes providing clear and concise documentation, error messages, and user interfaces.
Compatibility: The system should be compatible with different types of media files, different file formats, and different metadata.
Compliance: The system should comply with relevant regulations and standards, such as data protection laws and industry-specific regulations.
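
The reliability requirement mentions automatically retrying failed operations. A minimal sketch of one common way to do this, retry with exponential backoff, is shown below. The class name, method name, and parameters are hypothetical, and a production version would likely add jitter and retry only on errors known to be transient.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only.
final class RetrySketch {

    // Runs the operation up to maxAttempts times, doubling the delay after each failure.
    static <T> T withRetry(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // wait before the next attempt
                    delay *= 2;          // exponential backoff
                }
            }
        }
        throw last; // every attempt failed; surface the last error
    }
}
```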

Service Level Agreement

Availability: The service should be available at least 99.9% of the time, which corresponds to a maximum downtime of 43.2 minutes per 30-day month, excluding scheduled maintenance.
Response time: The service should respond to requests within 500 milliseconds on average, with a maximum response time of 2 seconds for any request.
Throughput: The service should be able to handle a minimum of 100 requests per second during peak usage.
Data security: The service should store and transmit all media content securely, using encryption and secure protocols to prevent unauthorised access.
Scalability: The service should be able to scale up or down based on demand, and handle at least 10,000 concurrent connections without any performance degradation.
Data backup and recovery: The service should have a reliable backup and recovery mechanism in place, to ensure that all data can be restored in the event of any data loss or disaster.
Compliance: The service should comply with all relevant laws and regulations, including data protection and privacy laws.
Support: The service should provide 24/7 technical support, with a maximum response time of 2 hours for any support request.
Monitoring and logging: The service should have a comprehensive monitoring and logging system in place, to track and analyse all requests, errors, and system events.
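
The downtime figure in the availability target follows directly from the percentage. The short sketch below shows the arithmetic; the class and method names are hypothetical. For 99.9% over a 30-day month it yields roughly 43.2 minutes, and for a 31-day month roughly 44.64 minutes.

```java
// Hypothetical helper, for illustration only.
final class DowntimeBudget {

    // Allowed downtime in minutes for an availability target over a period of whole days.
    static double allowedDowntimeMinutes(double availabilityPercent, int days) {
        double totalMinutes = days * 24.0 * 60.0;              // minutes in the period
        double unavailableFraction = (100.0 - availabilityPercent) / 100.0;
        return totalMinutes * unavailableFraction;
    }
}
```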

High Level Design

Components

API Gateway: Serves as the entry point for external requests to the microservice.
Load Balancer: Distributes incoming requests across multiple instances of the microservice to ensure high availability and scalability.
Microservice Instances: Handle incoming requests and perform the necessary business logic, such as media asset ingestion, validation, and storage.
- Media Uploader: This component will expose APIs to ingest the media.
- Media Retrieval: This component will retrieve media metadata and media storage details which will be used by delivery service to deliver the media to end user efficiently.
Media Metadata Storage: Stores the metadata records for media assets. The media files themselves are kept in Raw Media Storage.
Raw Media Storage: Stores media files, providing high availability, scalability, and durability.
Metadata Service: Stores and retrieves metadata related to media assets, such as genre, language, and user information.
Media Ingestion: This component will handle the ingestion of media content uploaded by the content creator. It will verify the content against predefined criteria such as format, size, and duration. If the content meets the criteria, it will be stored in a temporary storage area for processing. If the content fails the criteria, it will be rejected and the content creator will be notified.
Media Processing: This component will receive the uploaded media from the Media Ingestion component and perform operations such as transcoding, compression, and format conversion. It may also include sub-components such as a media encoder, decoder, and streamer. The output of this component will then be stored in the processed media storage.
- Media Transcoding: This component will handle the conversion of media files from one format to another. For example, a video file uploaded in MP4 format may need to be converted to a different format such as AVI or FLV to be compatible with different devices or platforms.
- Media Metadata Extraction: This component will extract metadata from the uploaded media content, such as title, description, and keywords. It will store the metadata in a separate database for faster and more efficient retrieval.
- Media Thumbnailing: This component will generate thumbnail images for the uploaded media files. It takes an uploaded media file and generates a small image that serves as a preview of the content. It also handles storage of the thumbnails, keeping them in a separate location from the original media file, as thumbnail images are typically much smaller in size. Each thumbnail gets a unique identifier that links it to the original media file, and this linking information is stored in the delivery storage.
- Processed Media & Thumbnail storage: This component will store the processed media content and thumbnail in a scalable and fault-tolerant storage system such as AWS S3 or Azure Blob Storage.
Media Analysis: This component will perform analysis on the media content, such as video analysis for detecting objects, faces, and text. It may also perform audio analysis for speech-to-text conversion, language identification, and sentiment analysis.
Notification Service: Sends notifications to users about the status of their media assets, such as when they have been successfully ingested and are available for consumption.
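
The Media Ingestion component checks uploads against predefined criteria such as format, size, and duration. A minimal sketch of such a check is shown below. The allowed formats and the limits are assumptions chosen for illustration, and the class and method names are hypothetical; the actual criteria are not defined in this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical validator, for illustration only.
final class IngestionValidator {

    // Assumed criteria; the real values would come from the service configuration.
    static final Set<String> ALLOWED_FORMATS = Set.of("mp4", "mov", "mp3", "jpg", "png");
    static final long MAX_SIZE_BYTES = 5L * 1024 * 1024 * 1024; // 5 GiB
    static final long MAX_DURATION_SECONDS = 4 * 60 * 60;       // 4 hours

    // Returns the list of rule violations; an empty list means the upload is accepted.
    static List<String> validate(String fileName, long sizeBytes, long durationSeconds) {
        List<String> errors = new ArrayList<>();
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (!ALLOWED_FORMATS.contains(extension)) {
            errors.add("unsupported format: " + extension);
        }
        if (sizeBytes <= 0 || sizeBytes > MAX_SIZE_BYTES) {
            errors.add("invalid size: " + sizeBytes);
        }
        if (durationSeconds > MAX_DURATION_SECONDS) {
            errors.add("duration too long: " + durationSeconds);
        }
        return errors;
    }
}
```

Returning every violation at once, rather than failing on the first, would let the Notification Service tell the content creator everything that needs fixing in a single message.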

Data model

Option 1 - One Table per Entity

Media

Media Metadata

Media Thumbnail

Genre

Language

Option 2 - Single Table for all Entities

We can store all the media-related data in a single DynamoDB table, using a composite primary key consisting of a partition key and a sort key. The partition key can be used to identify a specific media item, while the sort key can be used to store different types of data related to that item.

For example, the table could have a composite primary key of "media_id" as the partition key and "attribute_type" as the sort key. The "attribute_type" could be used to distinguish between different types of attributes, such as metadata, genre, thumbnail, and languages.
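
To make the composite key concrete, the sketch below models the table in memory: the outer map plays the role of the partition key (media_id) and the inner sorted map plays the role of the sort key (attribute_type). It is only an analogy for how items are grouped and looked up, not DynamoDB client code, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory analogy of the single-table layout, for illustration only.
final class SingleTableSketch {

    // media_id (partition key) -> attribute_type (sort key) -> attribute_value
    private final Map<String, TreeMap<String, String>> table = new TreeMap<>();

    // Stores one item; a second put with the same key pair overwrites the first.
    void put(String mediaId, String attributeType, String attributeValue) {
        table.computeIfAbsent(mediaId, k -> new TreeMap<>()).put(attributeType, attributeValue);
    }

    // Similar in spirit to a query on the partition key only: all items of one media.
    Map<String, String> queryByMediaId(String mediaId) {
        return table.getOrDefault(mediaId, new TreeMap<>());
    }

    // Similar in spirit to a lookup by partition key plus an exact sort key.
    String get(String mediaId, String attributeType) {
        return queryByMediaId(mediaId).get(attributeType);
    }
}
```

Note that each (media_id, attribute_type) pair identifies exactly one item, so a media item with several genres would need either a list-valued attribute_value or a more specific sort key.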

Media Metadata Store

aws dynamodb create-table \
    --table-name media_metadata_store \
    --attribute-definitions \
        AttributeName=media_id,AttributeType=S \
        AttributeName=attribute_type,AttributeType=S \
    --key-schema \
        AttributeName=media_id,KeyType=HASH \
        AttributeName=attribute_type,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST

Preferred Data Model - DynamoDB Single Table Design

We prefer Option 2, a single table for all entities, for the following reasons.

With Option 1, fetching all details of a media item requires reads against every table, which consumes more read capacity units.
With Option 2, a single query on the partition key (media_id) returns all details of a media item. The read capacity consumed still depends on the total size of the items returned.
Items for different attribute_type values can vary in size, and DynamoDB charges each write based on the size of the item being written. Keeping frequently updated attributes in their own attribute_type item keeps those items small and reduces write cost.
A single-table design reduces the operational burden.
Exporting data for analytics is easier, if needed.
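
The cost argument about item size can be illustrated with a little arithmetic. DynamoDB sizes a standard write in units of 1 KB, rounded up, so the sketch below estimates write units from item size. The class and method names are hypothetical, and the example item sizes are assumptions.

```java
// Hypothetical helper, for illustration only.
final class WriteCostSketch {

    // Standard (non-transactional) write: one write unit per 1 KB of item size, rounded up.
    static long writeUnits(long itemSizeBytes) {
        return Math.max(1, (itemSizeBytes + 1023) / 1024);
    }

    // Total write units for updating an item of the given size a number of times.
    static long totalWriteUnits(long itemSizeBytes, long updates) {
        return writeUnits(itemSizeBytes) * updates;
    }
}
```

Under these assumptions, updating a 10 KB item 1,000 times costs 10,000 write units, while moving the frequently updated field into its own 200-byte item costs 1,000 write units for the same number of updates.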

Query Patterns

Get all metadata for a particular media: query the table with media_id as the partition key and attribute_type = "metadata" as the sort key condition. This returns the metadata for that media.
Get all genres for a particular media: query the table with media_id as the partition key and attribute_type = "genre" as the sort key condition. This returns the genres for that media.
Get all media for a particular genre: query with attribute_type = "genre" as the partition key and attribute_value set to the desired genre as the sort key. These are not the table's primary key attributes, so this pattern needs a global secondary index (GSI) keyed on attribute_type and attribute_value.
Get all media with a particular language: query the same GSI with attribute_type = "language" as the partition key and attribute_value set to the desired language as the sort key.
Get all media uploaded by a particular user: query with user_id as the partition key and media_id as the sort key. This also needs a GSI, with user_id stored as an attribute on the item.
Get all media uploaded on a particular date: query with upload_date as the partition key and media_id as the sort key. This likewise needs a GSI, with upload_date stored as an attribute on the item.
Get all thumbnails for a particular media: query the table with media_id as the partition key and attribute_type = "thumbnail" as the sort key condition. This returns the thumbnails for that media.
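
The genre and language lookups go from an attribute value to media IDs, which is the reverse of the table's primary key. The sketch below models such a reverse lookup in memory, in the spirit of a secondary index keyed on attribute_type and attribute_value. It is an analogy only, not DynamoDB client code, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory analogy of an index on (attribute_type, attribute_value), for illustration only.
final class AttributeIndexSketch {

    // "attribute_type#attribute_value" -> media_ids carrying that value
    private final Map<String, Set<String>> index = new HashMap<>();

    private static String key(String attributeType, String attributeValue) {
        return attributeType + "#" + attributeValue;
    }

    // Records that the given media carries this attribute value.
    void put(String mediaId, String attributeType, String attributeValue) {
        index.computeIfAbsent(key(attributeType, attributeValue), k -> new TreeSet<>()).add(mediaId);
    }

    // Returns the media IDs for one attribute value, or an empty set if there are none.
    Set<String> findMediaIds(String attributeType, String attributeValue) {
        return index.getOrDefault(key(attributeType, attributeValue), Collections.emptySet());
    }
}
```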

API Endpoints

Option 1: Passing each input to the API as a separate request parameter.

import java.io.File;

public interface MediaUploaderApi {

    /**
     * Uploads a new media file.
     *
     * @param file the media file to upload
     * @param assetUserId the ID of the user who owns the media asset
     * @return the uploaded media
     * @throws MediaUploadException if the upload fails
     */
    Media uploadMediaAsset(File file, String assetUserId) throws MediaUploadException;

    /**
     * Gets a media by ID.
     *
     * @param id the ID of the media to get
     * @return the media with the given ID
     * @throws MediaNotFoundException if the media does not exist
     */
    Media getMediaAssetById(String id) throws MediaNotFoundException;

    /**
     * Deletes a media by ID.
     *
     * @param id the ID of the media to delete
     * @throws MediaNotFoundException if the media does not exist
     */
    void deleteMediaById(String id) throws MediaNotFoundException;
}

Option 2: Passing a single request object to the API, with all inputs as fields of that object.

import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;

public class Media {
    private String id;
    private String title;
    private String description;
    private String type;
    private String filePath;
    private LocalDateTime createdAt;
    private LocalDateTime updatedAt;
    private Map<String, String> metadata;
    private List<Genre> genres;
    private User assetUser;
    private String onboardingStatus;

    // getters and setters
}

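import org.springframework.web.multipart.MultipartFile;

public class MediaUploadRequest {
    private MultipartFile file;
    // String, to match the user IDs used elsewhere in the API
    private String assetUserId;

    // getters and setters
}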

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.server.ResponseStatusException;

@RestController
@RequestMapping("/media")
public class MediaController {

    @Autowired
    private MediaService mediaService;

    // The request carries a MultipartFile, so it is bound from multipart form data
    // rather than from a JSON request body.
    @PostMapping(value = "/upload", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    public ResponseEntity<Media> uploadMedia(@ModelAttribute MediaUploadRequest mediaUploadRequest) {
        Media media = mediaService.uploadMedia(mediaUploadRequest);
        return ResponseEntity.ok(media);
    }

    @GetMapping("/{id}")
    public Media getMediaById(@PathVariable("id") String id) {
        Media media = mediaService.getMediaById(id);
        if (media == null) {
            throw new ResponseStatusException(HttpStatus.NOT_FOUND, "Media with ID " + id + " not found");
        }
        return media;
    }

    @DeleteMapping("/{id}")
    public ResponseEntity<Void> deleteMediaById(@PathVariable("id") String id) {
        mediaService.deleteMediaById(id);
        return ResponseEntity.noContent().build();
    }

    // Paths are relative to the class-level "/media" mapping.
    @GetMapping("/genre/{genre}")
    public ResponseEntity<List<Media>> getMediaByGenre(@PathVariable("genre") String genre) {
        List<Media> medias = mediaService.getMediaByGenre(genre);
        return ResponseEntity.ok(medias);
    }

    @GetMapping("/language/{language}")
    public ResponseEntity<List<Media>> getMediaByLanguage(@PathVariable("language") String language) {
        List<Media> medias = mediaService.getMediaByLanguage(language);
        return ResponseEntity.ok(medias);
    }

    @GetMapping("/user/{userId}")
    public ResponseEntity<List<Media>> getMediaByUser(@PathVariable("userId") String userId) {
        List<Media> medias = mediaService.getMediaByUser(userId);
        return ResponseEntity.ok(medias);
    }
}

Preferred API Design - Option 2

When designing an API, it is generally better to have one request object with 10 fields than to have 10 separate fields in the API request.

Simplicity and ease of use: Having one request object with 10 fields makes the API easier to use and understand. It simplifies the API call and makes it more intuitive for developers.
Extensibility: If you need to add more fields in the future, it is easier to do so with a request object. You can simply add new fields to the request object, without having to change the API endpoint.
Consistency: A request object with multiple fields allows you to maintain consistency across the API. Each request can be validated against the same schema, making it easier to maintain and debug the API.
Versioning: If you need to version your API, having a request object makes it easier to manage versioning. You can add optional fields to the request object for newer versions of the API without breaking existing clients, whereas removing or renaming fields still needs care.
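
The extensibility and consistency points can be sketched in plain Java. In the example below the request type is a simplified, hypothetical stand-in for MediaUploadRequest (a file name instead of a MultipartFile), and the title field represents an optional field added later. The upload method signature does not change when such a field is added, and validation lives in one place.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified request object, for illustration only.
final class UploadRequestSketch {
    final String fileName;
    final String assetUserId;
    final String title; // optional; imagined as added in a later version

    UploadRequestSketch(String fileName, String assetUserId, String title) {
        this.fileName = fileName;
        this.assetUserId = assetUserId;
        this.title = title;
    }

    // Every request is checked against the same rules.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (fileName == null || fileName.isEmpty()) errors.add("fileName is required");
        if (assetUserId == null || assetUserId.isEmpty()) errors.add("assetUserId is required");
        return errors;
    }
}

// Hypothetical uploader whose signature stays stable as the request object grows.
final class UploaderSketch {
    static String upload(UploadRequestSketch request) {
        List<String> errors = request.validate();
        if (!errors.isEmpty()) {
            throw new IllegalArgumentException(String.join("; ", errors));
        }
        return "accepted:" + request.fileName;
    }
}
```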

Entities

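import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBRangeKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;

@DynamoDBTable(tableName = "media_metadata_store")
public class MediaMetadataStore {

    // Partition key: identifies the media item
    @DynamoDBHashKey(attributeName = "media_id")
    private String mediaId;

    // Sort key: distinguishes metadata, genre, thumbnail, and language items
    @DynamoDBRangeKey(attributeName = "attribute_type")
    private String attributeType;

    @DynamoDBAttribute(attributeName = "attribute_value")
    private String attributeValue;

    public MediaMetadataStore(String mediaId, String attributeType, String attributeValue) {
        this.mediaId = mediaId;
        this.attributeType = attributeType;
        this.attributeValue = attributeValue;
    }

    // Default constructor needed by DynamoDB mapper
    public MediaMetadataStore() {}

    // getters and setters
}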

Business Layer

public interface MediaService {
    Media uploadMedia(MediaUploadRequest mediaUploadRequest);
    Media getMediaById(String id);
    void deleteMediaById(String id);
    List<Media> getMediaByGenre(String genre);
    List<Media> getMediaByLanguage(String language);
    List<Media> getMediaByUser(String userId);
}

Repository Layer

public interface MediaMetadataRepository extends CrudRepository<MediaMetadataStore, String> {
    List<MediaMetadataStore> findByMediaIdAndAttributeType(String mediaId, String attributeType);
    List<MediaMetadataStore> findByAttributeTypeAndAttributeValue(String attributeType, String attributeValue);
}

Sequence Diagram

uploadMedia

getMediaById

getMediaByLanguage

getMediaByUser

deleteMediaById

getMediaByLanguage

Development

Initialise the project
