Multi-media Networks

Andrew Swerdlow

swerdlow@uvic.ca

University of Victoria, Department of Computer Science

 

 

 

Abstract

This paper reviews some of the factors involved in implementing multi-media networks.  We provide an overview of network architecture, communication protocols, and application services.  The intent of this paper is to provide a foundation for developing multi-media systems in a holistic manner that takes all of these factors into consideration.  We also review a working multi-media network and evaluate the current status of the system.

 

Keywords

Multi-media, networks, videoconferencing, rich communication, e-learning

1            Introduction

This paper will examine factors related to implementing network architectures for high-performance media communications.  We will start by defining the term “multi-media network” (MMN).  An MMN is the medium for transporting real-time synchronous audio, video and data between multiple sites.  It should be noted that MMNs can also provide on-demand asynchronous voice and video services; however, we will focus on the former, since real-time services are commonly the more demanding scenario.

 

Over the last ten years, advances in networking technology and data compression have allowed rich media to be transported over multi-purpose networks.  However, there has also been a corresponding increase in demand for improved quality of service (QoS), more features for users, and lower costs for deployment and operations.  This paper will examine some of the current strategies for deploying MMNs and will evaluate a working system.  Our goal is to examine how MMNs are being used in industry and what factors are important to consider during implementation.

 

Some of the emerging applications of multi-media networking include distributed education using high-quality audio/video and application sharing.  MMNs are also starting to be deployed to provide medical services to rural areas using telemedicine.  Telemedicine requires a high level of quality of service on the communication channels between doctors and patients; indeed, the QoS of an MMN used for telemedicine could have an influence on the lives of patients in rural communities.  Another application of MMNs is residential triple play services (TPS), where residential consumers combine voice, video and data services to their homes.  Undoubtedly each application of MMNs will have distinct requirements that will affect its implementation, yet we will examine the intrinsic properties that are common to many of the applications.

 

This paper is laid out in the following manner.  Section 2 provides background information about multi-media network architecture.  Section 3 discusses performance metrics.  Section 4 summarizes some of the multi-media protocols.  Section 5 consists of a case study we performed.  We conclude with Section 6.

 

2            Network Architecture

There are many different approaches that can be taken when designing and deploying MMNs.  Some of the more traditional approaches utilize leased-line solutions, in which an organization rents dedicated circuits between sites.  Leased lines are commonly used as the long-haul transport networks between the distributed sites of an organization.  These transport networks come in many different technology groups such as Asynchronous Transfer Mode (ATM), Frame Relay, Multi-protocol Label Switching (MPLS) and, more recently, Metro Ethernet.  It is common for MMNs to span WANs that use heterogeneous networking equipment, and one of the greatest challenges of deploying MMNs on a global scale is integrating these diverse technologies.  Specific integration techniques are beyond the scope of this paper; instead we will survey some of the common technologies as background.

 

Frame Relay is an older packet-switching technology used to interconnect network nodes across long distances, primarily in long-haul networks.  For the most part, Frame Relay is used to interconnect LANs across a WAN.

 

ATM is another network technology that can be used to implement LAN-to-LAN connectivity; however, ATM can also be implemented within a LAN.  ATM is well suited to MMNs because it provides support for QoS and is connection-oriented [1] [9].  ATM has the concepts of Virtual Paths and Virtual Circuits.  An example is the Switched Virtual Circuit (SVC), which is a temporary connection between nodes on an ATM network.  Having an established connection between nodes provides a good level of QoS for MMNs because it gives a high level of assurance that data will flow from the transmitting node to the receiving node.

 

MPLS is perhaps one of the newer technologies that can be used to provide connectivity between LANs.  The primary benefit of MPLS is that it allows data to be transported across both circuit-switched and packet-switched networks.  MPLS also allows for Traffic Engineering (TE), which is important to MMNs because it allows content to be forwarded across optimal paths.  MPLS additionally allows us to set up Virtual Private LAN Services (VPLS), meaning that multiple distributed LAN segments can be connected so that they appear to function as a single LAN.

 

Metro Ethernet is another newer technology used to connect geographically dispersed LAN segments.  Perhaps the greatest appeal of Metro Ethernet is that most multi-media endpoints use Ethernet as their native transport protocol.  This means that data flows do not need to be converted into other formats to be transmitted between sites, which has the potential to reduce latency and jitter.

 

In general, MMNs that are based on homogeneous network architectures are less complicated and less expensive to deploy.  They also have the potential to be managed more efficiently, since they may require fewer specialized workers to manage the network.  However, even in a homogeneous network environment, reliability and performance are still concerns.

2.1       Quality of Service

Arguably the most important part of an MMN is the implementation of Quality of Service (QoS).  QoS is important to MMNs because the flows carry real-time information that needs guarantees of timely arrival at its destination.  Real-time traffic is highly sensitive to temporal errors.  That is, multi-media information is usually linear, constant and complete in nature, so it is important that the information propagates through the network in a manner that reduces latency, jitter and packet loss [13].  QoS can be implemented in different ways; two common implementations are Layer 2 QoS and Layer 3 QoS, where the layers refer to the OSI model.  Layer 2 is the Data Link Layer and is concerned with MAC-level protocols; Layer 3 is the Network Layer and its routing and addressing technologies.  Layer 2 QoS tries to guarantee access to the physical transport medium, whereas Layer 3 QoS provides access in relation to other traffic.  It could be argued that Layer 2 QoS is more reliable but more expensive, while Layer 3 QoS is easier to manage and cheaper, yet somewhat less reliable.  In practice, the decision to implement either is more likely determined by the existing infrastructure than by a cost-benefit analysis.

 

2.1.1       Multimedia Encoding

Irrespective of which QoS implementation is employed, the ultimate goal is to provide the best results for the application-layer content.  That being said, the applications themselves can also help provide the best quality of service to end users.  How the media content is encoded can affect the perceived quality of the service, so it is important to know what type of traffic will be traversing the MMN.  Understanding the properties of multi-media content will allow us to develop metrics for evaluating the performance and QoS of MMN implementations.  Audio and video commonly start off as analog signals.  The process of encoding involves converting them from analog to digital; the digital signals are then compressed using codecs into a format that can be easily transported.  See Figure 1 for a list of some of the more commonly used audio and video codecs.
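To illustrate the analog-to-digital step, the bitrate of an uncompressed digital signal follows directly from the sampling parameters.  The Python sketch below (the helper name is ours, for illustration) shows why compression with codecs is needed before transport:

```python
def raw_audio_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bitrate before any codec is applied."""
    return sample_rate_hz * bits_per_sample * channels

# Telephony-grade audio: 8 kHz sampling, 8 bits per sample, mono,
# which yields the familiar 64 kbps of G.711.
print(raw_audio_bitrate_bps(8000, 8))        # 64000
# CD-quality stereo: 44.1 kHz, 16 bits, 2 channels.
print(raw_audio_bitrate_bps(44100, 16, 2))   # 1411200
```

Even the modest telephony case consumes 64 kbps before compression, which is why the codecs in Figure 1 matter for transport efficiency.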

 

Figure 1: Some of the commonly used audio and video codecs [8].

 

More often than not we will be examining audio and video used for real-time synchronous communication, and the primary application we will examine is videoconferencing.  Most videoconferencing technologies use encodings based on the MPEG standards, such as H.261, H.263 and, most recently, H.264.  A common encoding principle they all share is inter-frame prediction [6].  MPEG uses the concept of motion vectors to reduce temporal redundancies in video transmission, and it uses three types of image frames: I, P and B [10].  The I frame (Intra-frame) is a full snapshot of an image.  The P frame (Prediction frame) and the B frame (Bi-directional frame) only contain information about changes relative to previous or other frames.  By eliminating temporal redundancies it is possible to compress multi-media information, thereby making the best use of available resources.  Understanding how audio and video are encoded allows us to characterize network traffic patterns, which can help us devise strategies for efficiently achieving QoS.  In the case of MPEG we expect bursts of traffic when I frames are sent and less traffic when P and B frames are sent; similar assumptions can be made for audio.  In Section 3 we describe specific metrics and thresholds that audio and video encodings require to maintain continuity for end users.
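The bursty traffic pattern produced by I, P and B frames can be sketched as follows.  The frame sizes and group-of-pictures pattern here are hypothetical values chosen only to show the shape of the traffic, not measurements of any real encoder:

```python
# Hypothetical per-frame sizes (in KB), chosen only to illustrate
# the burst pattern; real sizes depend on content and encoder settings.
FRAME_SIZE_KB = {"I": 60, "P": 20, "B": 8}

def gop_traffic(pattern="IBBPBBPBBPBB", fps=30):
    """Return (frame_type, bits_on_wire) for one group of pictures."""
    return [(f, FRAME_SIZE_KB[f] * 8 * 1024) for f in pattern]

traffic = gop_traffic()
peak = max(bits for _, bits in traffic)   # the burst when the I frame goes out
avg = sum(bits for _, bits in traffic) / len(traffic)
print(f"peak/average ratio: {peak / avg:.2f}")  # 3.91
```

With these illustrative numbers the I frame is nearly four times the average frame, which is exactly the burstiness a QoS design must absorb.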

2.1.2       Real-time Transport Protocol

The Real-time Transport Protocol (RTP) is a protocol designed for transmitting temporally sensitive data, and is therefore well suited to transmitting encoded audio and video.  It was developed by the IETF [4] and is commonly used in conjunction with UDP to transport audio and video in MMNs.  It is popular in multi-media applications because it delivers packets to the destination promptly rather than imposing explicit delivery guarantees.  An RTP packet contains a header and a payload.  The header contains information such as a sequence number, a timestamp and a payload type [4]; the sequence number can optionally be used by the application to re-sequence packets.
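As a concrete illustration of the header fields mentioned above, the following Python sketch packs and parses the fixed 12-byte RTP header defined in RFC 3550; the sample field values are arbitrary:

```python
import struct

def parse_rtp_header(packet: bytes):
    """Parse the fixed 12-byte RTP header (RFC 3550)."""
    vpxcc, mpt, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": vpxcc >> 6,        # top 2 bits of the first octet
        "payload_type": mpt & 0x7F,   # low 7 bits of the second octet
        "sequence": seq,              # used by receivers to re-order packets
        "timestamp": ts,              # media clock, used to schedule playout
        "ssrc": ssrc,                 # identifies the sending source
    }

# Build a sample header: version 2, payload type 96, seq 7, ts 16000, ssrc 42.
hdr = struct.pack("!BBHII", 0x80, 96, 7, 16000, 42)
print(parse_rtp_header(hdr))
```

The sequence number and timestamp are the two fields the performance metrics in Section 3 rely on.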

 

For the most part, human perception of multi-media information such as video can tolerate some loss of information.  For example, if a frame in a video stream is lost in transmission, it would likely not be noticeable to the viewer.  This means that we do not have to guarantee the delivery of all information as we do with TCP.  It should be noted, however, that there are thresholds for human tolerance of transmission errors; some of the metrics we can use to monitor these thresholds are described in the following sections of this paper.

3            Performance Metrics

3.1       Jitter

Jitter is a metric used to describe variation in the delay of received packets [15].  That is, if packets are transmitted from a node in a continuous manner with even spacing between them, the jitter is the variance in packet spacing when they arrive at the receiving node.

 

Audio and video signals are commonly transported by RTP; since RTP carries a packet sequence number, the application has the potential to re-sequence packets, thus reducing the effects of jitter.  The suggested jitter threshold for voice and video applications is no higher than 30 ms [12].
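The jitter estimate commonly reported for RTP streams is the interarrival jitter of RFC 3550, a running mean deviation of transit-time differences.  A minimal Python sketch follows; the sample timings are illustrative:

```python
def interarrival_jitter(samples):
    """RFC 3550 interarrival jitter: a smoothed mean deviation of
    transit-time differences, given (send_time, arrival_time) pairs
    in milliseconds."""
    jitter = 0.0
    prev_transit = None
    for sent, arrived in samples:
        transit = arrived - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0   # 1/16 smoothing gain per the RFC
        prev_transit = transit
    return jitter

# Packets sent every 20 ms; network transit varies between 10 and 14 ms.
samples = [(0, 10), (20, 34), (40, 50), (60, 72)]
print(f"{interarrival_jitter(samples):.2f} ms")  # 0.58 ms
```

A value of 0.58 ms is comfortably inside the 30 ms threshold cited above; perfectly even spacing would yield exactly zero.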

3.2       Delay

Delay is the time it takes for information to travel from the transmitter to the receiver.  In videoconferencing, this is the time it takes for the sender’s actions and audio to be encoded by the local endpoint, transmitted across the network to a remote endpoint, and perceived by the receiver.  Delay can be introduced at many points in the process; some common sources are the encoding and decoding of audio and video, network congestion, and geographical distance.  In general, the maximum acceptable delay from mouth to ear or camera to eye is approximately 150 ms [12].
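A simple way to reason about the 150 ms target is as a budget summed over the stages of the path.  The component values below are illustrative assumptions, not measurements:

```python
# Hypothetical one-way delay budget for a videoconference path (ms);
# the stage names and values are illustrative assumptions only.
budget_ms = {
    "encode": 40,
    "packetize": 5,
    "network_propagation": 25,
    "queuing": 10,
    "jitter_buffer": 30,
    "decode": 30,
}
total = sum(budget_ms.values())
print(total, "ms,", "OK" if total <= 150 else "over the 150 ms target")
```

Framing delay as a budget makes clear that every millisecond spent in one stage (a deeper jitter buffer, say) must be recovered elsewhere on the path.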

3.3       Packet Loss

Packet loss is a metric based on a sampling of packets, defined as the percentage of packets that did not arrive at the intended destination.  For example, if an endpoint transmitted 100 packets across the network and the receiving endpoint received 95 of those packets, the packet loss would be 5%.  Packets can be lost for two reasons: the most common is buffer overflows under network congestion, and the second, less likely, cause is bit errors [4].  The acceptable threshold for packet loss is approximately 1 percent or less.  Since MPEG video relies on inter-frame prediction, packet loss greater than 1 percent can be particularly prone to introducing artifact distortions: an artifact can persist for a long time, since it might take several seconds before the next I frame is sent.  Boyce and Gaglianello [3] show that 3 percent packet loss can translate into frame error rates as high as 30 percent.  With error rates that high it would be difficult for users to interpret the media content.
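Packet loss over a sample window can be computed directly from the received RTP sequence numbers.  A minimal Python sketch, assuming no sequence wraparound within the window:

```python
def packet_loss_percent(received_seqs):
    """Loss over one sample window, inferred from RTP sequence numbers.
    Assumes no sequence-number wraparound within the window."""
    expected = max(received_seqs) - min(received_seqs) + 1
    lost = expected - len(set(received_seqs))
    return 100.0 * lost / expected

# 100 packets sent (seq 0..99); 5 never arrive.
received = [s for s in range(100) if s not in {11, 42, 43, 77, 90}]
print(packet_loss_percent(received))  # 5.0
```

This reproduces the 5% example from the text; a real monitor would also handle the 16-bit sequence wraparound that RTP uses.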

 

4            IP Multi Media Protocols

 

There are many different ways to deliver multi-media information across networks.  We have reviewed some of the physical connectivity issues related to network architecture as well as some of the application factors, and we have provided some QoS metrics.  The next step is to examine how multi-media endpoints communicate with each other across the network.  Two of the most common multi-media communication protocols are H.323 and SIP.

4.1       H.323

H.323 was developed by the ITU-T, and it encompasses other protocol definitions for IP videoconferencing.  H.323 can be considered the successor to H.320, a protocol for voice and video over ISDN networks.  At the time of writing, H.323 is the most prevalent protocol used for videoconferencing: most appliance-based videoconferencing manufacturers, such as Tandberg, Polycom and VCON, use H.323 as their primary method of providing videoconferencing services to their clients.  H.323 specifies algorithms for encoding audio and video, such as AAC and H.264, and uses RTP to transport audio and video information across IP networks.

4.2       SIP

SIP stands for Session Initiation Protocol and is a newer protocol that has been proposed as a replacement for H.323.  Like H.323, it uses RTP to transmit information across the network; however, it is a less complex protocol, defined by an IETF working group.  SIP is gaining popularity in software-based endpoints as well as VoIP-only appliances, and due to its simplicity and scalability it is predicted that SIP will replace H.323 [11].  Although SIP does seem to have many advantages, appliance-based videoconferencing units are still predominantly H.323.  Glasmann et al. [5] provide a comparison between the different components required and the protocols they execute.

 

5            Case Study

 

Our case study examines an existing MMN; the purpose of this study is to examine some of the key factors related to implementing it.  The system we will examine is the University of British Columbia (UBC) Distributed Medical Program Multi-media Network.  This network is the medium for delivering e-learning medical course material to students distributed across the province of British Columbia.  The UBC distributed medical program is one of the largest medical programs in North America and the first fully distributed medical education program in North America.  Students receive lectures and labs using network-based applications such as videoconferencing, virtual network computing and various web technologies.  This case study examines the architecture and application of the network used to transport videoconference traffic.

 

5.1       Network Topology

The network comprises three main sites: the University of Victoria (UVic), the University of Northern British Columbia (UNBC) and UBC.  Each site has a number of endpoints used to receive and transmit H.323 traffic.  The endpoint hardware is the Tandberg 6000 MXP running firmware F2.5.  Each endpoint has a unique IP address and is managed using the Tandberg Management Suite (TMS).  Each Tandberg endpoint terminates three kinds of local multi-media signals: several composite video signals, multiple stereo audio channels and two DVI sources.  The signals are encoded by the Tandberg endpoints and then transmitted across the network using H.323.  The Tandberg endpoints also receive encoded traffic from other endpoints and decode the signals for display in their native format (see Figure 5).

 

Figure 5: Overview of the Tandberg endpoint input and output

 

 

The media sources are part of an integrated collaborative environment used for teaching lectures and labs in a distributed mode.   Most of the learning scenarios include at least three Tandberg endpoints conferenced together using the Tandberg MCU capabilities. 

 

The network was designed to be an entirely private Layer 2 network [2].  For the most part the network is connected by switches, except for a segment that uses MPLS to tunnel between UNBC and UBC (see Figure 6).  The backbone long-haul transport network is managed by BCNet, Telus and CANARIE.  All Tandberg endpoints share the same VLAN, and traffic from that VLAN has a QoS policy associated with it.

 

Figure 6: Topology of the UBC MMN, which spans the province of British Columbia. Each rectangle is a Tandberg endpoint.

 

5.2       QoS Approach

Implementing QoS on the UBC MMN required a dual approach using both Layer 2 and Layer 3 QoS.  Layer 2 QoS uses an 802.1p CoS value, which is a 3-bit field in the 802.1Q frame header [5].  Layer 2 QoS was implemented on each campus from the endpoint to the demarcation point of the transport network.  When packets reach the demarcation point they are re-marked with a Layer 3 DiffServ code point in the 6-bit DSCP field of the IP header.  The Assured Forwarding code point AF41 (100010) was used to denote priority videoconference traffic.  This approach was developed by networking teams from all the institutions involved with the project and was based on guidelines provided by the equipment manufacturers.  We will discuss some of the implications of this approach to mapping QoS values to the application in the next section.
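The relationship between the AF41 code point quoted above (100010) and the byte actually written into the IP header can be sketched as follows:

```python
def dscp_to_tos_byte(dscp_bits: str) -> int:
    """A DiffServ code point occupies the upper 6 bits of the IP
    ToS/Traffic Class octet; the lower 2 bits (ECN) are left zero."""
    return int(dscp_bits, 2) << 2

AF41 = "100010"   # decimal 34, the code point cited in the text
BE   = "000000"   # best effort
print(hex(dscp_to_tos_byte(AF41)))  # 0x88
print(dscp_to_tos_byte(BE))         # 0
```

So AF41 is DSCP 34, which appears on the wire as ToS byte 0x88; best-effort traffic carries an all-zero byte.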

 

The overall goal of this QoS strategy was to ensure that content was delivered to its destination in a linear, constant and complete manner.  To test the implementation, we collected data from Cisco Service Assurance Agents (SAA) located at each campus site on the MMN and examined data for the month of October 2005.  The results were encouraging, with packet loss no higher than 0.25% (see Figure 7), jitter less than 1 ms (see Figure 8) and latency less than 17 ms (see Figure 9).

 

Figure 7: UBC expanded medical program MMN SAA results for packet loss in October 2005

 

 

 

Figure 8: UBC expanded medical program MMN SAA results for jitter in October 2005

 

 

Figure 9: UBC expanded medical program MMN SAA results for latency in October 2005

 

 

The collected data indicate that all performance indicators fall well below the suggested thresholds.  However, it should be noted that the UBC distributed medical program is still in its initial stages and will continue to grow over the next 5 years as it adds new sites.  This means that more demand will be put on the MMN, which could impact performance.

5.3       Potential Issues

When examining the implementation of the QoS policies on the UBC MMN, we discovered that bandwidth policing was being performed; that is, there was a limit on the amount of traffic between the different institutions.  The limit was set to 30 Mbps per campus at each demarcation point of the transport network.  All traffic under 30 Mbps was marked with the AF41 (100010) DiffServ code point; traffic in excess of 30 Mbps was re-marked with the Best Effort (BE) (000000) DiffServ code point.  Best effort means that the traffic will not receive priority over other traffic on the transport network.
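The policing behaviour described above can be sketched as a single-rate token bucket that re-marks excess traffic.  The rate, bucket depth and packet sizes below are illustrative assumptions, not the actual policer configuration:

```python
AF41, BE = 0x88, 0x00  # ToS byte values for the two code points

def police(packets, rate_bps=30_000_000, bucket_bits=1_500_000):
    """Single-rate token-bucket policer: conforming packets keep AF41,
    excess traffic is re-marked best effort.  `packets` is a list of
    (arrival_time_s, size_bits) tuples in arrival order."""
    tokens, last_t, marked = bucket_bits, 0.0, []
    for t, size in packets:
        # Refill tokens at the contracted rate, capped at the bucket depth.
        tokens = min(bucket_bits, tokens + rate_bps * (t - last_t))
        last_t = t
        if size <= tokens:
            tokens -= size
            marked.append((t, AF41))
        else:
            marked.append((t, BE))  # over the 30 Mbps contract
    return marked

# Three 1 Mbit bursts back to back: the first conforms, the next two
# exceed the remaining tokens and are re-marked BE.
print(police([(0.0, 1_000_000), (0.0, 1_000_000), (0.0, 1_000_000)]))
```

After 100 ms of silence the bucket refills, so a later burst of the same size would conform again; this time-dependence is exactly what makes the marking of adjacent packets diverge.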

 

This has the potential to introduce errors into the traffic flows.  For example, if packet a is sent at time t0 and is in excess of the 30 Mbps bandwidth limitation, it is marked as BE.  Packet a is then followed by packet b at time t1; if packet b is under the 30 Mbps threshold, it is marked AF41 and has priority on the network.  This could result in packet b arriving at the destination before packet a.  To summarize, the implemented bandwidth policing has the potential to send packets out of sequence (see Figure 10).  It should be noted that the Tandberg 6000 MXP will discard packets that arrive out of sequence if they exceed the 100 ms jitter buffer; any packets that arrive within the 100 ms jitter buffer will be re-sequenced using the sequence number in the RTP header.
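A simplified model of the endpoint behaviour described above (re-sequencing within the jitter buffer, discarding late packets) might look like the sketch below; the 20 ms packet spacing and the deadline rule are our simplifying assumptions, not Tandberg’s actual algorithm:

```python
def playout(arrivals, jitter_buffer_ms=100):
    """Re-sequence packets that arrive within the jitter buffer and
    discard ones that arrive too late.  `arrivals` is a list of
    (rtp_seq, arrival_time_ms) tuples in arrival order."""
    played, discarded = [], []
    base_time = None
    for seq, t in arrivals:
        if base_time is None:
            base_time = t
        # A packet is "late" if it arrives after its nominal slot
        # (20 ms spacing assumed) plus the jitter-buffer depth.
        deadline = base_time + seq * 20 + jitter_buffer_ms
        (discarded if t > deadline else played).append(seq)
    return sorted(played), discarded

# seq 2 is overtaken by seq 3 but still inside the 100 ms window,
# so it is re-sequenced; seq 5 arrives far too late and is dropped.
print(playout([(1, 0), (3, 40), (2, 60), (4, 80), (5, 300)]))
```

The model shows why differential marking is dangerous even when no packets are lost: a BE packet delayed past the buffer depth is discarded exactly as if the network had dropped it.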

 

 

 

Figure 10: UBC MMN Queuing strategy

 

Since packets sent to the BE queue are likely to be discarded by the Tandberg endpoint anyway, it would be more efficient on network resources to drop such packets before they reach the BE queue.  That is, if a packet has a high probability of being dropped by the endpoint because it is in the BE queue, the router could drop the packet instead, reducing congestion on the network.  A solution could be to implement a Weighted Random Early Detection (WRED) algorithm, as suggested in [14].
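The WRED idea can be sketched in a few lines: each traffic class gets its own thresholds, and the drop probability ramps linearly between them, so BE packets are shed before AF41 packets as the queue builds.  The threshold values here are illustrative, not a recommended configuration:

```python
import random

def wred_drop(avg_queue, min_th, max_th, max_p):
    """Weighted Random Early Detection: drop probability ramps
    linearly between the min and max queue thresholds."""
    if avg_queue < min_th:
        return False                  # queue healthy: never drop
    if avg_queue >= max_th:
        return True                   # queue saturated: always drop
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    return random.random() < p

# BE gets aggressive thresholds, AF41 lenient ones, so best-effort
# packets are dropped first as the average queue depth grows.
profiles = {"BE": (10, 30, 0.5), "AF41": (30, 60, 0.1)}
queue_depth = 25
print({cls: wred_drop(queue_depth, *th) for cls, th in profiles.items()})
```

At a queue depth of 25, AF41 packets are never dropped while BE packets already face a substantial drop probability, which is the early-shedding behaviour [14] argues for.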

 

However, current usage on the UBC MMN has not yet exceeded 30 Mbps, because the program is not yet at capacity.  Currently there is the potential for 4 videoconferencing sessions to be hosted at UBC, each requiring a maximum of 6144 Kbps, for a total bandwidth of 24 Mbps at the UBC demarcation point to the transport network.  It should be noted that at the time of writing, two new sites have come online at Vancouver General Hospital (VGH); they share the demarcation point with UBC.  This has the potential to increase the bandwidth by 12 Mbps, bringing the UBC maximum bandwidth potential to 36 Mbps, which is in excess of the current limit.  This means that with the current approach to policing bandwidth, there is a strong chance that network integrity will be compromised by errors due to out-of-order packets.
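The bandwidth arithmetic above can be checked directly; the session rate and policing limit are the figures cited in the text:

```python
SESSION_KBPS = 6144          # maximum per-session rate cited above
POLICE_LIMIT_MBPS = 30       # policing limit at the demarcation point

def demark_demand_mbps(sessions):
    """Aggregate demand at the demarcation point for n sessions."""
    return sessions * SESSION_KBPS / 1024  # 6144 Kbps = 6 Mbps per session

print(demark_demand_mbps(4))   # 24.0 -> UBC alone, under the 30 Mbps limit
print(demark_demand_mbps(6))   # 36.0 -> UBC plus 2 VGH sites, over the limit
```

Six concurrent sessions exceed the 30 Mbps contract by 6 Mbps, so under the current policy one full session’s worth of traffic would be re-marked best effort.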

6            Conclusion

This paper has presented a real-world example of a multi-media network and has enumerated some of the issues that must be considered when providing high levels of QoS.  There are indeed many factors that need to be considered when developing MMNs, such as network topologies, technology integration, methods of encoding and performance metrics.  It is our hope that this paper can provide a framework for others developing multi-media networks.  The UBC case study demonstrated that QoS issues must be examined carefully so as not to introduce errors into the design.  Our next steps are to increase the sample size of our case study in the hope that our observations will generalize to a greater population.  It would seem that there is great demand for multi-media application systems in many areas such as health care, education and business, yet the realization of these systems depends on providing well-thought-out networking infrastructure.  We hope that this paper makes a positive contribution toward that goal.

 

References

 

[1]   Alles A, “ATM Internetworking,” Tech. Rep., Cisco Systems Inc, May 1995. http://www.cisco.com

[2]   BCNET AV Network Plan Draft 1.00 September 29, 2004

 

[3]   Boyce J, Gaglianello R. Packet Loss Effects on MPEG Video Sent Over the Public Internet

 

[4]   Busse I, Deffner B, Schulzrinne H. Dynamic QoS control of multimedia applications based on RTP. Computer Communications, 19(1):49–58, Jan. 1996

 

[5]   Glasmann J., W. Kellerer, and H. Muller, Service Architectures in H.323 and SIP: A Comparison, IEEE Communications Society Surveys and Tutorials, 5(2),2003

 

[6]   Gringeri S, Egorov R, Shuaib K, Lewis A, Basch B. Robust compression and transmission of MPEG-4 video. In Proc. ACM Multimedia, 1999

 

[7]   Implementing QoS Solutions for H.323 Video Conferencing over IP http://www.cisco.com/warp/public/105/video-qos.html Accessed on Nov 12 2005

 

[8]   List of codecs http://en.wikipedia.org/wiki/List_of_codecs Accessed Nov 11 2005

 

[9]   Rodrigues R, Grilo A, Santos M, Nunes M. Native ATM Videoconferencing Based on H.323. Proceedings of the II Conference on Telecommunications, ConfTele'99. Sesimbra, Portugal. April 1999

 

[10] Rose O. Statistical Properties of MPEG Video Traffic and Their Impact on Traffic Modeling in ATM Systems. Technical Report 101, Institute of Computer Science, University of Würzburg, Germany, February 1995

 

[11] Schulzrinne H, Rosenberg J. A Comparison of SIP and H.323 for Internet Telephony. In Network and Operating System Support for Digital Audio and Video (NOSSDAV), July 1998

 

[12] Service Provider Quality-of-Service Overview http://www.cisco.com/warp/public/cc/so/neso/sqso/spqos_wp.htm Accessed on Nov 12 2005

 

[13] Shahsavari M, Al-Tunsi A. MPLS Performance Modeling Using Traffic Engineering to Improve QoS Routing on IP Networks, Southeast Conference 2002, Proceedings IEEE pp. 152-157 2002

 

[14] Takeo J, Tasaka S. Application-Level QoS of Web Access and Streaming Services with AF Services on DiffServ. Global Telecommunications Conference, 2003. GLOBECOM '03. IEEE

 

[15] Understanding Jitter in Packet Voice Networks (Cisco IOS Platforms) Mar 30, 2005 http://www.cisco.com/warp/public/788/voice-qos/jitter_packet_voice.html