Multi-media Networks
Andrew Swerdlow
swerdlow@uvic.ca
University of Victoria, Department of Computer Science
Abstract
This paper reviews some of the factors involved in implementing multimedia networks. We provide an overview of network architecture, communication protocols and application services. The intent of this paper is to provide a foundation for developing multimedia systems through a holistic approach that takes all of these factors into consideration. We also review a working multimedia network and evaluate the current status of that system.
Keywords
Multimedia, networks, videoconferencing, rich communication, e-learning
1 Introduction
This paper examines factors related to implementing network architectures for high-performance media communications. We start by defining the term “multimedia network” (MMN). An MMN is the medium for transporting real-time synchronous audio, video and data between multiple sites. MMNs can also provide on-demand asynchronous voice and video services; however, we focus on real-time synchronous communication, since it is commonly the more demanding scenario.
Over the last ten years, advances in networking technology and data compression have allowed rich media to be transported over multipurpose networks. However, there has also been a corresponding increase in demand for improved quality of service (QoS), more features for users, and lower deployment and operating costs. This paper examines some of the current strategies for deploying MMNs and evaluates a working system. Our goal is to examine how MMNs are being used in industry and which factors are important to consider during implementation.
Some of the emerging applications of multimedia networking include distributed education using high-quality audio/video and application sharing. MMNs are also starting to be deployed to provide medical services to rural areas through telemedicine. Telemedicine requires a high level of quality of service on the communication channels between doctors and patients; indeed, the QoS of an MMN used for telemedicine could affect the lives of patients in rural communities. Another application of MMNs is residential triple play services (TPS), in which voice, video and data services are combined and delivered to consumers' homes. Each application of MMNs will undoubtedly have distinct requirements that affect its implementation, yet we will examine the intrinsic properties that many of these applications share.
This paper is laid out as follows. Section 2 provides background information about multimedia network architecture. Section 3 discusses performance metrics. Section 4 summarizes some of the multimedia protocols. Section 5 presents a case study we performed. We conclude in Section 6.
2 Network Architecture
There are many different deployment approaches that can be taken when designing MMNs. Some of the more traditional approaches use leased-line solutions, in which an organization rents dedicated circuits between its sites. Leased lines are commonly used as the long-haul transport networks between the distributed sites of an organization. These transport networks come in many different technologies, such as Asynchronous Transfer Mode (ATM), Frame Relay, Multiprotocol Label Switching (MPLS) and, more recently, Metro Ethernet. It is common for MMNs to span WANs that use heterogeneous networking equipment.
One of the greatest challenges of deploying MMNs on a global scale is integrating these diverse technologies. Specific integration techniques are beyond the scope of this paper; instead, we survey some of the common technologies to provide background.
Frame Relay is an older technology used to interconnect network nodes across long distances. It is a packet-switching technology used primarily for long-haul networks, and for the most part it interconnects LANs across a WAN.
ATM is also a network technology that can be used to implement LAN-to-LAN connectivity; however, ATM can also be deployed within a LAN. ATM is suited to MMNs because it supports QoS and is connection-oriented [1] [9]. ATM has the concepts of Virtual Paths and Virtual Circuits. One example is the Switched Virtual Circuit (SVC), a temporary connection between nodes on an ATM network. Establishing a connection between nodes before data is sent helps an MMN achieve a good level of QoS, because it provides a high level of assurance that the data will flow from the transmitting node to the receiving node.
MPLS is perhaps one of the newer technologies that can be used to provide connectivity between LANs. The primary benefit of MPLS is that it allows data to be transported across both circuit-switched and packet-switched networks. MPLS also supports Traffic Engineering (TE), which is important to MMNs because it allows content to be forwarded across optimal paths. MPLS also allows us to set up Virtual Private LAN Services (VPLS), so that multiple distributed LAN segments can be connected and appear to function as a single LAN.
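To make “optimal paths” more concrete, the sketch below shows constraint-based shortest path first (CSPF), the style of path computation commonly associated with MPLS-TE: links that cannot carry the requested bandwidth are pruned, and a shortest-path search runs over what remains. It is a minimal illustration that assumes a hypothetical topology given as (node, node, cost, available bandwidth) tuples; it is not how any particular router, or the networks discussed in this paper, actually computes paths.

```python
import heapq

def cspf(links, src, dst, required_bw):
    """Constraint-based shortest path: prune links below required_bw, then run Dijkstra.

    links: iterable of (node_a, node_b, cost, available_bw), treated as bidirectional.
    Returns (total_cost, path) or None if no path satisfies the constraint.
    """
    graph = {}
    for a, b, cost, bw in links:
        if bw < required_bw:
            continue  # this link cannot carry the flow, so drop it from the topology
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    # Standard Dijkstra over the pruned topology.
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return None

# Hypothetical topology: a cheap but thin path A-B-D and a longer, fatter path A-C-D.
LINKS = [("A", "B", 1, 10), ("B", "D", 1, 5), ("A", "C", 2, 40), ("C", "D", 2, 40)]
print(cspf(LINKS, "A", "D", required_bw=20))
```

With a hypothetical 20-unit bandwidth requirement, the cheaper A-B-D path is skipped because its links offer only 5 to 10 units, and the longer A-C-D path is chosen instead.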
Metro Ethernet is another newer technology used to connect geographically dispersed LAN segments. Perhaps the greatest appeal of Metro Ethernet is that most multimedia endpoints use Ethernet as their native transport. This means that data flows do not need to be converted into other formats to be transmitted between sites, which has the potential to reduce latency and jitter.
In general, MMNs based on homogeneous network architectures are less complicated and less expensive to deploy. They can also be managed more efficiently, since they may require fewer staff with specialized skills. However, even in a homogeneous network environment, reliability and performance remain concerns.
2.1 Quality of Service
Arguably the most important part of an MMN is its implementation of Quality of Service (QoS). QoS is important to MMNs because the flows carry real-time information, which needs guarantees that it will arrive at its destination in a timely manner. Real-time traffic is highly sensitive to temporal errors: multimedia information is usually linear, constant and complete in nature, so it must propagate through the network in a manner that minimizes latency, jitter and packet loss [13]. QoS can be implemented in different ways; two common implementations are Layer 2 QoS and Layer 3 QoS, where the layers refer to the OSI reference model. Layer 2 is the Data Link Layer, which is concerned with MAC-level protocols. Layer 3 is the Network Layer, which is concerned with routing and addressing. Layer 2 QoS tries to guarantee access to the physical transport medium, whereas Layer 3 QoS provides access relative to other traffic. It could be argued that Layer 2 QoS is more reliable but more expensive, whereas Layer 3 QoS is easier to manage and cheaper, yet somewhat less reliable. In practice, the choice between them is more likely determined by the existing infrastructure than by a cost-benefit analysis.
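One common way a Layer 3 device gives marked traffic access “relative to other traffic” is a priority scheduler. The sketch below is a minimal two-class strict-priority queue, assuming packets have already been classified as priority or best effort; the class name and interface are hypothetical, and real routers usually pair priority queuing with bandwidth limits so that best-effort traffic is not starved.

```python
from collections import deque

class StrictPriorityScheduler:
    """Two-class scheduler: the priority queue is always drained before best effort."""

    def __init__(self):
        self.queues = {"priority": deque(), "best_effort": deque()}

    def enqueue(self, packet, is_priority):
        # Classification is assumed to have happened upstream (e.g. from a DSCP value).
        self.queues["priority" if is_priority else "best_effort"].append(packet)

    def dequeue(self):
        # Serve priority traffic first; fall back to best effort only when it is empty.
        for name in ("priority", "best_effort"):
            if self.queues[name]:
                return self.queues[name].popleft()
        return None

scheduler = StrictPriorityScheduler()
scheduler.enqueue("web page", is_priority=False)
scheduler.enqueue("video frame 1", is_priority=True)
scheduler.enqueue("video frame 2", is_priority=True)
print([scheduler.dequeue() for _ in range(3)])
```

The video packets leave first even though the web packet was queued earlier, which is the essence of relative (Layer 3) prioritization.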
2.1.1 Multimedia Encoding
Irrespective of which QoS implementation is employed, the ultimate goal is to provide the best results to the application-layer content. That being said, applications can also help provide the best quality of service to end users. How the media content is encoded can affect the perceived quality of the service, so it is important to know what type of traffic will be traversing the MMN. Understanding the properties of multimedia content allows us to develop metrics for evaluating the performance and QoS of MMN implementations. Audio and video signals commonly start off as analog signals. Encoding involves converting these signals from analog to digital; the digital signals are then compressed using codecs into a format that can be easily transported. See Figure 1 for a list of some of the more commonly used audio and video codecs.
| Audio | Video |
|---|---|
| AAC (MPEG-2/MPEG-4 Audio) | H.261 (similar to MPEG-1 Part 2) |
| Free Lossless Audio Codec (FLAC) | MPEG-2 Part 2 |
| WavPack | H.263 |
| AC-3 (Dolby Digital, A/52) | MPEG-4 Part 2 (DivX, Xvid) |
| WMA (Windows Media Audio) | MPEG-4 Part 10 (H.264) |
| GSM (voice) | Sorenson (QuickTime video) |
| G.722, G.711 (voice) | WMV (Windows Media Video) |
| MP3 | RealVideo (RealPlayer video) |
| MP2 | MJPEG |

Figure 1: Some of the commonly used audio and video codecs [8].
More often than not, we will be examining audio and video used for real-time synchronous communication, and the primary application we examine is videoconferencing. Most videoconferencing technologies use block-based, motion-compensated encodings closely related to the MPEG standards, such as H.261, H.263 and, most recently, H.264. A common principle they share is combining intra-frame coding of complete images with inter-frame prediction [6]. MPEG uses motion vectors to reduce temporal redundancies in video transmission, and it uses three types of frames: I, P and B [10]. An I frame (intra-coded frame) is a complete, self-contained snapshot of an image. A P frame (predicted frame) contains only changes relative to a previous reference frame, and a B frame (bi-directionally predicted frame) contains only changes relative to both previous and following reference frames. By eliminating temporal redundancies, it is possible to compress multimedia information and thereby make the best use of network resources. Understanding how audio and video are encoded therefore allows us to characterize network traffic patterns, which can help us devise strategies for achieving QoS efficiently. In the case of MPEG, we expect bursts of traffic when I frames are sent and less traffic when P and B frames are sent. Similar assumptions can be made for audio. In Section 3 we describe specific metrics and thresholds that audio and video encodings require to maintain continuity for end users.
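The burstiness described above can be illustrated with a small calculation over one group of pictures (GOP). The sketch assumes the 12-frame pattern IBBPBBPBBPBB, a common MPEG-2 arrangement, together with hypothetical frame sizes of 40, 12 and 4 kB for I, P and B frames; real sizes depend on the codec, the content and the target bitrate.

```python
# Hypothetical per-frame sizes in kilobytes (illustrative only).
FRAME_SIZES_KB = {"I": 40, "P": 12, "B": 4}

def gop_frame_sizes(pattern, sizes=FRAME_SIZES_KB):
    """Return the size of each frame in a GOP, in the order given by the pattern."""
    return [sizes[frame_type] for frame_type in pattern]

frames = gop_frame_sizes("IBBPBBPBBPBB")
mean_kb = sum(frames) / len(frames)    # average frame size over the GOP
peak_to_mean = max(frames) / mean_kb   # how much larger the I-frame burst is

print(frames)
print(f"mean {mean_kb:.1f} kB, peak-to-mean ratio {peak_to_mean:.2f}")
```

With these assumed sizes the I frame is roughly 4.4 times the average frame, so a link provisioned only for the average rate must absorb a burst at the start of every GOP.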
2.1.2 Real-time Transport Protocol
The Real-time Transport Protocol (RTP) is designed for transmitting temporally sensitive data and is therefore well suited to transmitting encoded audio and video. It was developed by the IETF [4] and is commonly used in conjunction with UDP to transport audio and video in MMNs. It is popular for multimedia applications because it delivers packets to the destination without explicit delivery guarantees: RTP itself does not retransmit lost packets, so a missing packet does not hold up the packets behind it. An RTP packet contains a header and a payload. The header contains information such as the sequence number, timestamp and payload type [4]. The application can optionally use the sequence number to re-sequence packets and to detect losses.
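The fixed part of the RTP header is 12 bytes long, and its layout is given in RFC 3550, the current RTP specification. The sketch below decodes it with Python's `struct` module; it ignores any CSRC list or header extension that may follow, and the sample packet is hypothetical (payload type 96 falls in the dynamic range often used for video).

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two single bytes, a 16-bit sequence, 32-bit timestamp and SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Hypothetical packet: version 2, payload type 96, sequence 1234, timestamp 90000.
sample = bytes([0x80, 0x60]) + struct.pack("!HII", 1234, 90000, 0x1234ABCD)
header = parse_rtp_header(sample)
print(header["sequence_number"], header["payload_type"])
```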
For the most part, human perception of multimedia information such as video can tolerate some loss of information. For example, if a frame in a video sample is lost in transmission, it would likely not be noticeable to the viewer. This means that we do not have to guarantee the delivery of all information, as TCP does. It should be noted, however, that human tolerance of transmission errors has thresholds. Some of the metrics we can use to monitor these thresholds are described in the following sections.
3 Performance Metrics
3.1.1 Jitter
Jitter is a metric that describes the variation in the delay of received packets [15]. That is, if a node transmits packets continuously with even spacing between them, jitter is the variation in that spacing when the packets arrive at the receiving node.
Audio and video signals are commonly transported by RTP. Because RTP carries a sequence number and a timestamp in each packet, the receiving application can buffer and re-sequence packets and then play them out at a steady rate, reducing the effects of jitter. The suggested jitter threshold for voice and video applications is no higher than 30 ms [12].
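RFC 3550 defines a standard running estimate of interarrival jitter that RTP receivers report: for each pair of consecutive packets, D is the change in transit time, and the estimate J moves one sixteenth of the way toward |D|. The sketch below applies that formula to send and arrival times in milliseconds (RTP itself works in timestamp units), with hypothetical sample values. Note that this estimator is a smoothed mean deviation rather than a statistical variance.

```python
def interarrival_jitter(send_ms, arrival_ms):
    """Running RFC 3550 jitter estimate J += (|D| - J) / 16, in milliseconds."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D: difference in one-way transit time between packet i and packet i-1.
        d = (arrival_ms[i] - arrival_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter

JITTER_THRESHOLD_MS = 30  # threshold suggested in [12]

sent = [0, 20, 40, 60, 80]          # packets sent every 20 ms
received = [50, 72, 88, 115, 130]   # hypothetical arrival times
j = interarrival_jitter(sent, received)
print(f"jitter estimate {j:.2f} ms, within threshold: {j <= JITTER_THRESHOLD_MS}")
```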
3.1.2 Delay
Delay is the time it takes for information to travel from the transmitter to the receiver. In videoconferencing, it is the time from when the sender's actions and speech are captured and encoded by the local endpoint, through transmission across the network, to when they are decoded at the remote endpoint and perceived by the receiver. Delay can be introduced at many points in this process; common sources include the encoding and decoding of the audio and video, network congestion and geographical distance. In general, the maximum acceptable delay from mouth to ear or camera to eye is approximately 150 ms [12].
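A delay budget makes the 150 ms target concrete by summing the contributions listed above. The sketch assumes light travels through optical fibre at roughly 200 km per millisecond (about two thirds of its speed in a vacuum); the codec, queuing and jitter-buffer figures in the example are hypothetical placeholders, not measurements from this paper.

```python
FIBRE_KM_PER_MS = 200       # approximate propagation speed in optical fibre
MAX_ONE_WAY_DELAY_MS = 150  # mouth-to-ear / camera-to-eye target from [12]

def one_way_delay_ms(encode_ms, decode_ms, queuing_ms, distance_km, jitter_buffer_ms=0.0):
    """Sum the main one-way delay components, all in milliseconds."""
    propagation_ms = distance_km / FIBRE_KM_PER_MS
    return encode_ms + decode_ms + queuing_ms + propagation_ms + jitter_buffer_ms

# Hypothetical call carried over 1000 km of fibre.
total = one_way_delay_ms(encode_ms=40, decode_ms=40, queuing_ms=10,
                         distance_km=1000, jitter_buffer_ms=30)
print(f"{total:.1f} ms, within budget: {total <= MAX_ONE_WAY_DELAY_MS}")
```

In this example the codecs and jitter buffer dominate the budget, while distance contributes only about 5 ms.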
3.1.3 Packet Loss
Packet loss is a metric based on a sample of packets. It is defined as the percentage of packets that did not arrive at the intended destination. For example, if an endpoint transmitted 100 packets across the network and the receiving endpoint received 95 of them, the packet loss would be 5%. Packets are lost for two main reasons: the most common is buffer overflow caused by network congestion, and the second, less likely cause is bit errors [4]. The acceptable threshold for packet loss is approximately 1 percent or less. Because MPEG video encoding relies on inter-frame prediction (P and B frames are decoded with reference to earlier frames), it is particularly susceptible to artifacts when packet loss exceeds 1 percent: an artifact can persist for a long time, since several seconds may pass before the next I frame is sent. Boyce and Gaglianello [3] show that 3 percent packet loss can translate into frame error rates as high as 30 percent. With error rates that high, it would be difficult for users to interpret the media content.
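Because each RTP packet carries a sequence number, a receiver can estimate loss without any cooperation from the sender. The sketch below reproduces the 100-sent, 95-received example above; it assumes the sequence numbers have already been unwrapped past the 16-bit rollover and that the first and last packets of the sample arrived, and the set of missing packets is hypothetical.

```python
LOSS_THRESHOLD_PERCENT = 1.0

def packet_loss_percent(received_seqs):
    """Percentage of packets missing between the lowest and highest sequence number seen."""
    expected = max(received_seqs) - min(received_seqs) + 1
    received = len(set(received_seqs))  # duplicates count only once
    return 100.0 * (expected - received) / expected

lost = {10, 20, 30, 40, 50}                       # hypothetical missing packets
seqs = [s for s in range(100) if s not in lost]   # 95 of 100 packets arrive
loss = packet_loss_percent(seqs)
print(f"{loss:.1f}% loss, acceptable: {loss <= LOSS_THRESHOLD_PERCENT}")
```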
4 IP Multimedia Protocols
There are many different ways to deliver multimedia information across networks. We have reviewed some of the physical connectivity issues related to network architecture, as well as some of the application factors, and we have presented some QoS metrics. The next step is to examine how multimedia endpoints communicate with each other across the network. Two of the most common multimedia communication protocols are H.323 and SIP.
4.1 H.323
H.323 was developed by the ITU-T and is an umbrella recommendation that incorporates other protocol definitions for IP videoconferencing, such as H.225.0 for call signaling and H.245 for control. H.323 could be considered the successor of H.320, a protocol for voice and video over ISDN networks. At the time of writing, H.323 is the most prevalent protocol used for videoconferencing. Most appliance-based videoconferencing manufacturers, such as Tandberg, Polycom and VCON, use H.323 as their primary method of providing videoconferencing services to their clients. H.323 systems support audio and video codecs such as AAC and H.264, and H.323 uses RTP to transport audio and video across IP networks.
4.2 SIP
SIP stands for Session Initiation Protocol. It is a newer protocol that has been proposed as the replacement for H.323. Like H.323, it uses RTP to transmit media across the network. However, it is a less complex protocol, defined by an IETF working group. SIP is gaining popularity in software-based endpoints as well as in voice-only VoIP appliances, and due to its simplicity and scalability it is predicted that SIP will replace H.323 [11]. Although SIP does seem to have many advantages, appliance-based videoconferencing units still predominantly use H.323. Glasmann et al. [5] provide a comparison of the components each protocol requires and the protocols those components execute.
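SIP's relative simplicity shows in its text-based messages. The sketch below assembles a minimal INVITE request containing the headers RFC 3261 requires of a request; the addresses, Call-ID, tag and branch values are hypothetical, and a real INVITE would normally carry an SDP body describing the RTP media streams rather than an empty body.

```python
def build_sip_invite(caller, callee, host_port, call_id, from_tag, branch):
    """Assemble a minimal SIP INVITE (RFC 3261) with no message body."""
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        # RFC 3261 branch parameters begin with the magic cookie "z9hG4bK".
        f"Via: SIP/2.0/UDP {host_port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{host_port}>",
        "Content-Length: 0",
    ]
    # Each header line ends in CRLF, and an empty line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n"

msg = build_sip_invite("alice@example.org", "bob@example.org", "192.0.2.10:5060",
                       "a84b4c76e66710", "1928301774", "z9hG4bK776asdhds")
print(msg)
```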
5 Case Study
Our case study examines an existing MMN; its purpose is to identify some of the key factors related to implementing an MMN. The system we examine is the University of British Columbia (UBC) Distributed Medical Program Multimedia Network. The network is the medium for delivering e-learning medical course material to students distributed across the province of British Columbia. The UBC distributed medical program is one of the largest medical programs in North America and is the first fully distributed medical education program in North America. Students receive lectures and labs through network-based applications such as videoconferencing, virtual network computing and various web technologies. This case study examines the architecture and application of the network used to transport the program's videoconference traffic.
5.1 Network Topology
The network comprises three main sites: the University of Victoria (UVic), the University of Northern British Columbia (UNBC) and UBC. Each site has a number of endpoints used to receive and transmit H.323 traffic. The endpoint hardware is the Tandberg 6000 MXP running firmware F2.5. Each endpoint has a unique IP address and is managed using the Tandberg Management Suite (TMS). Each Tandberg endpoint terminates three kinds of local multimedia signals: several composite video signals, multiple stereo audio channels and two DVI sources. The signals are encoded by the Tandberg endpoints and then transmitted across the network using H.323. The endpoints also receive encoded traffic from other endpoints and decode it for display in its native format (see Figure 5).
Figure 5: Overview of the Tandberg endpoint input and output
The media sources are part of an integrated
collaborative environment used for teaching lectures and labs in a distributed
mode. Most of the learning scenarios include at
least three Tandberg endpoints conferenced together using the Tandberg MCU
capabilities.
The network was designed to be an entirely private Layer 2 network [2]. For the most part, the network is connected by switches, except for a segment that uses MPLS to tunnel between UNBC and UBC (see Figure 6). The long-haul backbone transport network is managed by BCNet, Telus and CANARIE. All Tandberg endpoints share the same VLAN, and traffic from that VLAN has a QoS policy associated with it.
Figure 6: Topology of the UBC MMN, which spans the province of British Columbia. Each rectangle is a Tandberg endpoint.
5.2 QoS Approach
Implementing QoS on the UBC MMN required a two-part approach using both Layer 2 and Layer 3 QoS. Layer 2 QoS uses an 802.1p Class of Service (CoS) value, which is a 3-bit field in the 802.1Q frame header [5]. Layer 2 QoS was implemented on each campus from the endpoint to the demarcation (demarc) point of the transport network. When packets reach the demarc point, they are re-marked with a Layer 3 DiffServ code point (DSCP), carried in a 6-bit field of the IP header. The Assured Forwarding code point AF41 (100010) was used to denote priority videoconference traffic. This approach was developed by networking teams from all the institutions involved in the project and was based on guidelines provided by the equipment manufacturers. We discuss some of the implications of their approach to mapping QoS values to the application in the next section.
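The bit layouts involved can be shown in a few lines. The sketch below packs an 802.1p priority into the 802.1Q Tag Control Information field (3-bit priority, 1-bit drop-eligible indicator, 12-bit VLAN ID) and places a DSCP in the upper six bits of the IP ToS/Traffic Class byte. The paper does not state which CoS value the endpoints used, so CoS 4, VLAN 100 and the re-marking table are assumptions for illustration.

```python
AF41 = 0b100010         # DSCP 34: Assured Forwarding class 4, low drop precedence
BEST_EFFORT = 0b000000  # DSCP 0

def vlan_tci(priority, vlan_id, dei=0):
    """802.1Q Tag Control Information: 3-bit 802.1p priority, 1-bit DEI, 12-bit VLAN ID."""
    return ((priority & 0b111) << 13) | ((dei & 0b1) << 12) | (vlan_id & 0x0FFF)

def tos_byte(dscp, ecn=0):
    """IPv4 ToS / IPv6 Traffic Class byte: DSCP in the upper 6 bits, ECN in the lower 2."""
    return ((dscp & 0b111111) << 2) | (ecn & 0b11)

# Hypothetical demarc re-marking table: CoS 4 (videoconference) maps to AF41.
COS_TO_DSCP = {4: AF41}

def remark_at_demarc(cos):
    """Return the DSCP written at the demarc point for a given 802.1p CoS value."""
    return COS_TO_DSCP.get(cos, BEST_EFFORT)

print(hex(vlan_tci(4, 100)), hex(tos_byte(remark_at_demarc(4))))
```

The printed ToS byte, 0x88, is simply AF41's 100010 shifted left by two bits to make room for the ECN field.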
The overall goal of this QoS strategy was to ensure that content was delivered to its destination in a linear, constant and complete manner. To test the implementation, we collected data from Cisco Service Assurance Agents (SAA) located at each campus site on the MMN. We examined data for the month of October 2005. The results were encouraging: packet loss was no higher than 0.25% (see Figure 7), jitter was less than 1 ms (see Figure 8) and latency was less than 17 ms (see Figure 9).
Figure 7: UBC expanded medical program MMN
SAA results for packet loss in October 2005
Figure 8: UBC expanded medical program MMN
SAA results for jitter in October 2005
Figure 9: UBC expanded medical program MMN
SAA results for latency in October 2005
The collected data indicate that all performance indicators fall well below the suggested thresholds. However, it should be noted that the UBC distributed medical program is still in its initial stages and will continue to grow over the next five years as it adds new sites. This means that more demand will be placed on the MMN, which could impact performance.
5.3 Potential Issues
When examining the implementation of the QoS policies on the UBC MMN, we discovered that bandwidth policing was being performed. That is, there was a limit on the amount of traffic between the different institutions: 30 Mbps per campus at each demarc point of the transport network. All traffic under 30 Mbps was marked with the AF41 (100010) DiffServ code point, and traffic in excess of 30 Mbps was re-marked with the Best Effort (BE) (000000) code point. Best Effort means that the traffic does not receive priority over other traffic on the transport network.
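The policing behaviour described above can be approximated with a single-rate token bucket that re-marks excess traffic rather than dropping it. The sketch uses the 30 Mbps rate from the UBC configuration, but the bucket depth (burst size), the class name and the packet timings are hypothetical, and this is not the configuration syntax or internal algorithm of any particular router.

```python
AF41_DSCP = 34
BEST_EFFORT_DSCP = 0

class TokenBucketMarker:
    """Single-rate policer that re-marks out-of-profile packets to best effort."""

    def __init__(self, rate_bps, burst_bytes):
        self.bytes_per_s = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last_s = 0.0

    def mark(self, now_s, size_bytes):
        # Add tokens for the time elapsed since the last packet, capped at the burst size.
        elapsed = now_s - self.last_s
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.bytes_per_s)
        self.last_s = now_s
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return AF41_DSCP         # in profile: keep the priority marking
        return BEST_EFFORT_DSCP      # out of profile: demote instead of dropping

policer = TokenBucketMarker(rate_bps=30_000_000, burst_bytes=3000)
# Three back-to-back 1500-byte packets, then one more 1 ms later.
marks = [policer.mark(t, 1500) for t in (0.0, 0.0, 0.0, 0.001)]
print(marks)
```

The third packet is demoted to best effort while the fourth, sent a moment later, keeps AF41; this is the situation in which a later packet can overtake an earlier one.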
This has the potential to introduce errors into the traffic flows. For example, suppose packet a is sent at time t0 while traffic exceeds the 30 Mbps limit, so it is marked BE. Packet a is followed by packet b at time t1, when traffic is under the 30 Mbps threshold, so packet b is marked AF41 and has priority on the network. As a result, packet b could arrive at the destination before packet a. To summarize, the implemented bandwidth policing has the potential to deliver packets out of sequence (see Figure 10). It should be noted that the Tandberg 6000 MXP discards out-of-sequence packets that arrive too late for its 100 ms jitter buffer. Any packets that arrive within the 100 ms jitter buffer are re-sequenced using the sequence number in the RTP header.
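The endpoint behaviour described above, re-sequencing within a fixed window and discarding packets that arrive too late, can be modelled with a small reorder buffer. This is a simplified sketch, not Tandberg's implementation: the class is hypothetical, it holds packets for up to 100 ms after arrival, releases them in sequence-number order, ignores the 16-bit sequence rollover, and treats any packet numbered below one already released as too late.

```python
import heapq

class ReorderBuffer:
    """Simplified playout buffer that re-sequences packets by RTP sequence number."""

    def __init__(self, max_wait_ms=100):
        self.max_wait_ms = max_wait_ms
        self.heap = []        # entries: (sequence_number, arrival_ms, payload)
        self.next_seq = None  # lowest sequence number still eligible for playout
        self.discarded = []   # sequence numbers that arrived too late

    def push(self, seq, arrival_ms, payload):
        if self.next_seq is not None and seq < self.next_seq:
            self.discarded.append(seq)  # its playout slot has already passed
            return
        heapq.heappush(self.heap, (seq, arrival_ms, payload))

    def pop_ready(self, now_ms):
        """Release packets in sequence order while the lowest-numbered one has waited max_wait_ms."""
        released = []
        while self.heap and now_ms - self.heap[0][1] >= self.max_wait_ms:
            seq, _, payload = heapq.heappop(self.heap)
            released.append(payload)
            self.next_seq = seq + 1
        return released

buf = ReorderBuffer()
buf.push(2, arrival_ms=0, payload="b")    # packet b overtakes packet a
buf.push(1, arrival_ms=30, payload="a")
print(buf.pop_ready(now_ms=130))          # both released, back in order
buf.push(0, arrival_ms=140, payload="late")
print(buf.discarded)
```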
Figure 10: UBC MMN Queuing strategy
Since packets sent to the BE queue are likely to be dropped by the Tandberg endpoint, it would use network resources more efficiently to drop such packets before they reach the BE queue. That is, if a packet has a high probability of being dropped by the endpoint because it was placed in the BE queue, the router could drop it instead to reduce congestion on the network. One solution could be to implement a Weighted Random Early Detection (WRED) algorithm, as suggested in [14].
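The textbook form of (W)RED computes a drop probability from an exponentially weighted average of the queue depth: nothing is dropped below a minimum threshold, the probability rises linearly up to a maximum at the upper threshold, and everything is dropped beyond it; the router then drops each arriving packet with that probability. “Weighted” refers to giving each traffic class its own thresholds, so best-effort traffic can be dropped earlier and more aggressively than AF41 traffic. The profiles and averaging weight below are hypothetical, and vendor implementations differ in detail.

```python
# Hypothetical per-class profiles: (min_threshold, max_threshold, max_drop_probability),
# with thresholds measured in packets of average queue depth.
WRED_PROFILES = {
    "AF41": (40, 60, 0.10),
    "BE": (10, 30, 0.50),
}

def average_queue(previous_avg, current_depth, weight=0.002):
    """Exponentially weighted moving average of queue depth, as used by RED."""
    return (1 - weight) * previous_avg + weight * current_depth

def drop_probability(traffic_class, avg_depth):
    """Linear RED drop curve between the class's minimum and maximum thresholds."""
    min_th, max_th, max_p = WRED_PROFILES[traffic_class]
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)

for cls in ("AF41", "BE"):
    print(cls, drop_probability(cls, 20))
```

At the same average depth of 20 packets, AF41 traffic is untouched while a quarter of arriving best-effort packets would be dropped early.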
However, usage on the UBC MMN has not yet exceeded 30 Mbps, because the program is not yet at capacity. Currently, up to four videoconferencing sessions can be hosted at UBC, each requiring a maximum of 6144 kbps, for a total of 24 Mbps at the UBC demarc point to the transport network. It should be noted that at the time of writing, two new sites have come online at Vancouver General Hospital (VGH), and they share the demarc point with UBC. This could increase the bandwidth by 12 Mbps, bringing UBC's maximum potential bandwidth to 36 Mbps, which exceeds the current limit. This means that under the current approach to bandwidth policing, network integrity is very likely to be compromised by errors due to out-of-order packets.
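The capacity argument above reduces to simple arithmetic. The sketch follows the paper's convention of treating 6144 kbps as 6 Mbps (1 Mbps = 1024 kbps); with decimal units the totals become 24.576 and 36.864 Mbps, and the conclusion is unchanged.

```python
SESSION_KBPS = 6144       # maximum rate per videoconferencing session
POLICER_LIMIT_MBPS = 30   # per-campus limit at the demarc point

def demand_mbps(sessions, kbps_per_session=SESSION_KBPS):
    """Worst-case aggregate demand, using 1 Mbps = 1024 kbps as in the text."""
    return sessions * kbps_per_session / 1024

current = demand_mbps(4)        # four sessions hosted at UBC
with_vgh = demand_mbps(4 + 2)   # plus the two VGH sites sharing the demarc point
print(current, with_vgh, with_vgh > POLICER_LIMIT_MBPS)
```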
6 Conclusion
This paper has presented a real-world example of a multimedia network and enumerated some of the issues that must be considered when providing high levels of QoS. There are indeed many factors that need to be considered when developing MMNs, such as network topologies, technology integration, methods of encoding and performance metrics. It is our hope that this paper can provide a framework for others developing multimedia networks. Our review of the UBC case study also demonstrated that QoS issues must be examined carefully so as not to introduce errors into the design. Our next step is to increase the sample size of our case study so that our observations can generalize to a broader population. There appears to be great demand for multimedia application systems in many areas, such as health care, education and business. Yet the realization of these systems depends on well-thought-out networking infrastructure, and we hope that this paper makes a positive contribution toward that goal.
References
[1] Alles A. “ATM internetworking,” tech. rep., Cisco Systems Inc., May 1995. http://www.cisco.com
[2] BCNET. AV Network Plan, Draft 1.00, September 29, 2004.
[3] Boyce J, Gaglianello R. Packet Loss Effects on MPEG Video Sent Over the Public Internet.
[4] Busse I, Deffner B, Schulzrinne H. Dynamic QoS control of multimedia applications based on RTP. Computer Communications, 19(1):49–58, Jan. 1996.
[5] Glasmann J, Kellerer W, Muller H. Service Architectures in H.323 and SIP: A Comparison. IEEE Communications Society Surveys and Tutorials, 5(2), 2003.
[6] Gringeri S, Egorov R, Shuaib K, Lewis A, Basch B. Robust compression and transmission of MPEG-4 video. In Proc. ACM Multimedia, 1999.
[7] Implementing QoS Solutions for H.323 Video Conferencing over IP. http://www.cisco.com/warp/public/105/video-qos.html Accessed Nov 12, 2005.
[8] List of codecs. http://en.wikipedia.org/wiki/List_of_codecs Accessed Nov 11, 2005.
[9] Rodrigues R, Grilo A, Santos M, Nunes M. Native ATM Videoconferencing Based on H.323. Proceedings of the II Conference on Telecommunications, ConfTele'99, Sesimbra, Portugal, April 1999.
[10] Rose O. Statistical Properties of MPEG Video Traffic and Their Impact on Traffic Modeling in ATM Systems. Technical Report 101, Institute of Computer Science, University of Würzburg, Germany, February 1995.
[11] Schulzrinne H, Rosenberg J. A comparison of SIP and H.323 for Internet telephony. In Network and Operating System Support for Digital Audio and Video (NOSSDAV), July 1998.
[12] Service Provider Quality-of-Service Overview. http://www.cisco.com/warp/public/cc/so/neso/sqso/spqos_wp.htm Accessed Nov 12, 2005.
[13] Shahsavari M, Al-Tunsi A. MPLS Performance Modeling Using Traffic Engineering to Improve QoS Routing on IP Networks. Southeast Conference 2002, Proceedings IEEE, pp. 152-157, 2002.
[14] Takeo J, Tasaka S. Application-Level QoS of Web Access and Streaming Services with AF Services on DiffServ. Global Telecommunications Conference, 2003 (GLOBECOM '03), IEEE.
[15] Understanding Jitter in Packet Voice Networks (Cisco IOS Platforms), Mar 30, 2005. http://www.cisco.com/warp/public/788/voice-qos/jitter_packet_voice.html