Glossary
Multimedia
- Packet is a formatted unit of data transmitted over a network. In order to send data over a network, it has to be fragmented into packets, whose size is limited by the MTU (Maximum Transmission Unit), typically 1500 bytes when using Ethernet (see the sketch after this list).
- Frame can refer to either a network frame or a media frame, which is a basic data unit used by media coding formats. In particular, one media frame can represent a single image in a video.
- (Media) track is equivalent to a single audio or video stream.
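To make the MTU limit more tangible, here is a rough back-of-the-envelope sketch in Elixir (the frame size and header sizes are illustrative assumptions, not values from this glossary): it estimates how many packets are needed to carry a single encoded video frame when the payload is sent over RTP on top of UDP and IPv4.

```elixir
# Rough estimate of how many packets one encoded video frame needs.
# Assumes an Ethernet MTU of 1500 B and minimal IPv4 (20 B), UDP (8 B)
# and RTP (12 B) headers; the 50 kB frame size is an arbitrary example.
mtu = 1500
payload_per_packet = mtu - 20 - 8 - 12   # 1460 B of media per packet

frame_size = 50_000
packets_needed = ceil(frame_size / payload_per_packet)

IO.puts("#{frame_size} B frame -> #{packets_needed} packets")
# => 50000 B frame -> 35 packets
```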
Web protocols:
- UDP (User Datagram Protocol) is a transport layer protocol using connectionless communication.
- TCP (Transmission Control Protocol) is a transport layer protocol using connection-oriented communication.
- RTP (Real-time Transport Protocol) is an application layer protocol for delivering real-time audio and video over IP networks (the fixed part of the RTP packet header is shown in the sketch after this list). SRTP (Secure RTP) is an extension of RTP that adds security features and is used by WebRTC.
- HTTP (Hypertext Transfer Protocol) is an application layer protocol used by clients to fetch data from servers. It is used by HLS and MPEG-DASH for media streaming.
- (HTTP) Long Polling is a technique in which the server holds a client's request open until new data becomes available, instead of responding immediately. This is more efficient than naive repeated polling by the client until new data is received.
- WebRTC (Web Real-Time Communication) is a free and open-source project providing web browsers and mobile applications with real-time communication (RTC). WebRTC implements three APIs: MediaStream, used for acquiring media from the browser; RTCPeerConnection, handling stable and efficient communication of streaming data between peers; and RTCDataChannel, enabling peer-to-peer exchange of arbitrary data with low latency and high throughput.
- SDP (Session Description Protocol) is used for describing multimedia communication sessions for the purposes of announcement and invitation. It is used in the WebRTC signaling process for describing a session.
- WebSocket is an application layer communication protocol that enables full-duplex communication between client and server in near real time. It is based on TCP and, in contrast to HTTP, allows persistent connections to be created. Today it is supported by most web browsers and web servers.
- ICE (Interactive Connectivity Establishment) is a technique for establishing the most direct connection possible between two computers, used in P2P communication.
- STUN (Session Traversal Utilities for NAT) is a protocol used in interactive communications with hosts hidden behind a NAT. Its goal is to find the public addresses of the peers so that they can communicate with each other directly.
- TURN (Traversal Using Relays around NAT) is a protocol that uses a TURN server to relay data between peers when a direct connection cannot be established. This comes with an overhead, since all the media must be sent through that server.
- DTLS (Datagram Transport Layer Security) is a protocol used for providing security to datagram-based applications. It is based on TLS and guarantees a similar level of security. All WebRTC-related protocols are required to encrypt their communications using DTLS; this includes SCTP, SRTP, and STUN.
- NAT (Network Address Translation) is a technique that allows multiple computers to share one public IP address.
- Container format is a file format that allows multiple data streams to be embedded in a single file, e.g. an MP4 file can contain video, audio, and subtitle streams.
- YUV is a color encoding system that defines one luminance and two chrominance components. By reducing the resolution of the chrominance components it is possible to compress an image with minuscule effect on human perception of the image (see the frame size sketch after this list).
- Encoding is a process of converting media from a raw format to an encoded format. The main purpose is to reduce media size, since the raw format is uncompressed and takes up a lot of space. Examples of encoded formats are MP3 and AAC for audio, and AVC (H.264) and MPEG-4 for video.
- Decoding is a process of converting media from an encoded format back to a raw format, e.g. in order to play it on the end device.
- Encryption is a way of modifying a message so that only authorized parties are able to interpret it.
- Decryption is a process of retrieving data from an encrypted message.
- Muxing (abbr. from multiplexing) is a method of combining multiple streams into a single container, e.g. muxing video and audio into an MP4 container.
- Demuxing (abbr. from demultiplexing) is a method of separating streams from one combined container, e.g. retrieving audio and video from an MP4 file.
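As an illustration of the RTP packet structure mentioned in the RTP entry above, the sketch below uses Elixir's bit syntax to take apart the fixed 12-byte RTP header defined in RFC 3550. The sample packet bytes are made up for the example, and optional CSRC entries and header extensions are ignored.

```elixir
# Parse the fixed 12-byte RTP header (RFC 3550) with Elixir's bit syntax.
# CSRC entries and header extensions that may follow it are ignored here.
defmodule RtpHeaderSketch do
  def parse(<<version::2, _padding::1, _extension::1, _csrc_count::4,
              marker::1, payload_type::7, sequence_number::16,
              timestamp::32, ssrc::32, payload::binary>>) do
    %{
      version: version,
      marker: marker,
      payload_type: payload_type,
      sequence_number: sequence_number,
      timestamp: timestamp,
      ssrc: ssrc,
      payload: payload
    }
  end
end

# Made-up packet: version 2, payload type 96, sequence number 1,
# timestamp 1000, SSRC 0x12345678, followed by the media payload.
packet = <<0x80, 0x60, 0x00, 0x01, 0x00, 0x00, 0x03, 0xE8,
           0x12, 0x34, 0x56, 0x78, "media...">>

RtpHeaderSketch.parse(packet).sequence_number
# => 1
```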
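To make the YUV and encoding entries more concrete, here is a quick size calculation (the resolution and frame rate are example values only): in YUV 4:2:0 the luminance plane keeps full resolution while each chrominance plane keeps a quarter of the pixels, giving 1.5 bytes per pixel instead of 3 for 8-bit RGB, and even then a raw stream remains far larger than typical encoded bitrates.

```elixir
# Size of one raw 1920x1080 frame in 8-bit YUV 4:2:0:
# a full-resolution Y plane plus two chroma planes subsampled 2x2.
width = 1920
height = 1080
pixels = width * height

y_plane = pixels                       # 1 byte per pixel
chroma_planes = 2 * div(pixels, 4)     # U and V at a quarter resolution each

frame_bytes = y_plane + chroma_planes  # 3_110_400 B, about 3 MB per frame
raw_bitrate_mbps = frame_bytes * 8 * 30 / 1_000_000

IO.puts("raw 1080p30 in YUV 4:2:0 is ~#{Float.round(raw_bitrate_mbps, 1)} Mbit/s")
# => raw 1080p30 in YUV 4:2:0 is ~746.5 Mbit/s
# (an encoded stream of the same video is typically a few Mbit/s)
```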
Server’s architecture
- SFU (Selective Forwarding Unit) is a video conferencing architecture that consists of a single server, which receives incoming streams from all participants and forwards each participant's stream to all other conference participants.
- MCU (Multipoint Control Unit) is an architecture consisting of a single server, which receives incoming streams from all participants, mixes the streams, and sends them to each of the participants.
- P2P (Peer-to-Peer) is an architecture in which each participant is directly connected to all other participants, which eliminates the need for an MCU or SFU (see the comparison sketch after this list).
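The difference between these architectures is easiest to see by counting streams per participant. The sketch below is only an illustration; the module and function names are made up for this example.

```elixir
# Streams each participant sends and receives in an n-person conference.
defmodule ConferenceStreams do
  # P2P: every peer sends a copy of its stream to every other peer.
  def p2p(n), do: %{sent: n - 1, received: n - 1}

  # SFU: one upstream to the server, which forwards the other peers' streams.
  def sfu(n), do: %{sent: 1, received: n - 1}

  # MCU: one upstream and a single mixed downstream.
  def mcu(_n), do: %{sent: 1, received: 1}
end

ConferenceStreams.p2p(10)  # each peer sends 9 streams and receives 9
ConferenceStreams.sfu(10)  # each peer sends 1 stream and receives 9
ConferenceStreams.mcu(10)  # each peer sends 1 stream and receives 1
```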
Membrane Framework
- Pad is an input or output of an element or a bin. Output pads of one element are connected to input pads of another element or bin.
- Caps (abbr. from capabilities) define a pad's specification, allowing us to determine whether two elements are compatible with each other.
- Pipeline is a chain of linked elements or bins which together accomplish some media processing task (see the sketch after this list).
- Bin is a container for elements, which allows for creating reusable groups of elements. A bin can contain elements as well as other bins.
- Buffer is a fundamental structure in Membrane used to send data between elements.
- Element is the most basic entity responsible for processing multimedia. Each element is created to solve one problem. Elements can be divided into three categories:
- Source is an element with only output pads, the first element of each pipeline. It is responsible for fetching the data and transmitting it through the output pad.
- Filter is an element with both input and output pads, which is responsible for transforming data.
- Sink is an element with only input pads, the last element of a pipeline. It might be responsible, e.g., for writing the output to a file or playing the incoming media stream.
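To show how a pipeline ties sources, filters, and sinks together through their pads, here is a minimal sketch. It assumes a recent Membrane release with the child/2-based spec API and the membrane_file_plugin, membrane_mp3_mad_plugin, and membrane_portaudio_plugin packages installed; element names, options, and callback signatures have changed between versions, so treat this as an illustration rather than copy-paste code.

```elixir
defmodule MyPipeline do
  use Membrane.Pipeline

  @impl true
  def handle_init(_ctx, _opts) do
    # source (output pad only) -> decoder (input and output pads) -> sink (input pad only)
    spec =
      child(:source, %Membrane.File.Source{location: "sample.mp3"})
      |> child(:decoder, Membrane.MP3.MAD.Decoder)
      |> child(:sink, Membrane.PortAudio.Sink)

    {[spec: spec], %{}}
  end
end

# Starting the pipeline (the exact return shape may differ between versions):
# Membrane.Pipeline.start_link(MyPipeline)
```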
Types of elements:
- Payloader and Depayloader are responsible, respectively, for dividing frames into packets and for assembling packets back into frames.
- Encoder and Decoder are responsible for encoding and decoding.
- Encryptor and Decryptor are responsible for encryption and decryption.
- Muxer and Demuxer are responsible for muxing and demuxing.
- Mixer is responsible for mixing multiple media streams into a single stream. Unlike multiplexing, mixing is an irreversible operation.
- Jitter buffer / Ordering buffer is an element responsible for ordering packets arriving from the network, as their order can be disrupted during transmission due to network unreliability (a toy sketch follows this list).
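A toy illustration of what an ordering buffer does, ignoring sequence number wrap-around, packet loss, and timing, which real jitter buffers must handle (the packet structure here is made up for the example):

```elixir
# Collect out-of-order packets and emit them sorted by sequence number.
received = [
  %{seq: 103, payload: "c"},
  %{seq: 101, payload: "a"},
  %{seq: 102, payload: "b"}
]

received
|> Enum.sort_by(& &1.seq)
|> Enum.map(& &1.payload)
# => ["a", "b", "c"]
```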
Demands mechanism