Mouth-to-ear delay

Latency refers to a short period of delay (usually measured in milliseconds) between when an audio signal enters and when it emerges from a system. Potential contributors to latency in an audio system include analog-to-digital conversion, buffering, digital signal processing, transmission time, digital-to-analog conversion and the speed of sound in air.

Latency in broadcast audio

Audio latency can be experienced in broadcast systems where someone is contributing to a live broadcast over a satellite or similar high-delay link, and the person in the main studio has to wait for the contributor at the other end of the link to react to questions. Latency in this context can range from several hundred milliseconds to a few seconds. Dealing with audio latencies this high takes special training in order to make the resulting combined audio output reasonably acceptable to listeners. Wherever practical, live production audio latency should be kept low throughout the production system so that the reactions and interchange of participants remain as natural as possible. A latency of 10 milliseconds or better is the target for audio circuits within professional production structures;[1] local circuits should ideally have a latency of 1 millisecond or better.

Latency in telephone calls

In all systems, latency can be said to consist of three elements: codec delay, playout delay and network delay.

Cellular calls

Latency in telephone calls is sometimes referred to as mouth-to-ear delay; the telecommunications industry also uses the term Quality of Experience (QoE). Voice quality is measured according to the ITU model; the measurable quality of a call degrades rapidly where the mouth-to-ear latency exceeds 200 milliseconds. The mean opinion score (MOS) is also comparable in a near-linear fashion with the ITU's quality scale, defined in standards G.107 (page 800),[2] G.108[3] and G.109[4], with a quality factor R ranging from 0 to 100. A MOS of 4 ('Good') would have an R score of 80 or above; to achieve an R of 100 requires a MOS of about 4.5.
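The relationship between R and MOS can be illustrated with the R-to-MOS conversion commonly quoted from the G.107 E-model. The following is a minimal sketch rather than a full E-model implementation: it does not compute R from delay or other impairments, and the function name r_to_mos is an illustrative choice, not part of any standard API.

```python
def r_to_mos(r: float) -> float:
    """Estimate a mean opinion score from an E-model rating factor R.

    Uses the cubic mapping commonly quoted from ITU-T G.107:
    MOS is clamped to 1.0 for R <= 0 and to 4.5 for R >= 100.
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Print a few points on the curve, including R = 80 ("Good").
    for r in (50, 70, 80, 90, 100):
        print(f"R = {r:3d} -> MOS ~ {r_to_mos(r):.2f}")
```

Under this mapping an R of 80 corresponds to a MOS of roughly 4.0, and the curve flattens out at 4.5 as R approaches 100.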

The ITU and 3GPP group end-user services into classes based on latency sensitivity:[5]

Classes are listed from very sensitive to delay (first) to less sensitive to delay (last), each with its typical services:
  • Conversational Class (3GPP) / Interactive Class (ITU): conversational video/voice, realtime video; realtime data
  • Interactive Class (3GPP) / Responsive Class (ITU): voice messaging; transactional data
  • Streaming Class (3GPP) / Timely Class (ITU): streaming video and voice; non realtime data
  • Background Class (3GPP) / Non Critical Class (ITU): fax; background data

Similarly, the G.114 recommendation regarding mouth-to-ear latency indicates that most users are "very satisfied" as long as latency does not exceed 200 ms, with a corresponding R of 90+. Codec choice also plays an important role; the highest-quality (and highest-bandwidth) codecs like G.711 are usually configured to incur the least encode-decode latency, so on a network with sufficient throughput sub-100 ms latencies can be achieved. G.711 is the encoding method used on nearly all PSTN/POTS networks, at a bitrate of 64 kbit/s, with packets generally limited to a 160-byte payload by encoding 20 ms of audio per frame.[6]
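The 160-byte figure follows directly from the bitrate and the frame duration. A small sketch of the arithmetic, with a hypothetical helper name (payload_bytes) and ignoring any packet headers:

```python
def payload_bytes(bitrate_bps: int, frame_ms: int) -> float:
    """Return the audio payload size in bytes for one frame.

    bits per frame = bitrate (bit/s) * frame duration (s);
    dividing by 8 converts bits to bytes.
    """
    return bitrate_bps * frame_ms / 8000


if __name__ == "__main__":
    # G.711 at 64 kbit/s with 20 ms of audio per frame.
    print(payload_bytes(64_000, 20))
    # Halving the frame duration halves the payload size.
    print(payload_bytes(64_000, 10))
```

Shorter frames reduce the time spent collecting audio before a packet can be sent, at the cost of more packets per second.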

By comparison the AMR narrowband codec, used currently in UMTS networks, is a low bitrate, highly compressed, adaptive bitrate codec achieving rates from 4.75 to 12.2 kbit/s with 'toll quality' (MOS 4.0 or above) from 7.4 kbit/s. 2G networks use the AMR-12.2 codec, equivalent to GSM-EFR. As mobile operators upgrade existing best-effort networks to support concurrent multiple types of service over all-IP networks, services such as Hierarchical Quality of Service (H-QoS) allow for per-user, per-service QoS policies to prioritise time-sensitive protocols like voice calls and other wireless backhaul traffic. Along with more efficient voice codecs, this helps to maintain a sufficient MOS rating whilst the volume of overall traffic on often oversubscribed mobile networks increases with demand.[7][8][9]

Another often-overlooked aspect of mobile latency is the inter-network handoff: when a customer on Network A calls a Network B customer, the call must traverse two separate Radio Access Networks, two core networks and an interlinking Gateway Mobile Switching Centre (GMSC), which performs the physical interconnection between the two providers.[10]

IP calls

On a stable connection with sufficient bandwidth and minimal latency, VoIP systems typically have a minimum of 20 ms inherent latency and target 150 ms as a maximum latency for general consumer use. With end-to-end QoS-managed and assured-rate connections, latency can be reduced to analogue PSTN/POTS levels. Latency is a larger consideration in these systems when an echo is present; therefore, popular VoIP codecs such as G.729 perform complex voice detection and noise suppression.[11]
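As a rough illustration, the three elements of telephone latency named earlier (codec, playout and network delay) can be summed and compared against the 150 ms consumer target. This is only a sketch: the DelayBudget class and the example figures are hypothetical and not taken from any measurement or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayBudget:
    """One-way delay components of a call, in milliseconds."""

    codec_ms: float
    playout_ms: float
    network_ms: float

    @property
    def total_ms(self) -> float:
        # Mouth-to-ear delay modelled as the plain sum of the three elements.
        return self.codec_ms + self.playout_ms + self.network_ms

    def within_target(self, target_ms: float = 150.0) -> bool:
        # 150 ms is the consumer VoIP target quoted in the text above.
        return self.total_ms <= target_ms


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    good = DelayBudget(codec_ms=20, playout_ms=40, network_ms=60)
    poor = DelayBudget(codec_ms=20, playout_ms=60, network_ms=90)
    for budget in (good, poor):
        print(budget.total_ms, budget.within_target())
```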

Latency in computer audio

Latency can be a particular problem in audio platforms on computers. A popular solution is Steinberg's ASIO, which bypasses the operating system's intermediate audio layers and connects audio signals directly to the sound card's hardware. Most professional and semi-professional audio applications use the ASIO driver, allowing users to work with audio in real time.[12]

The RT kernel (real-time kernel)[13] is a modified Linux kernel that alters the standard timer frequency the Linux kernel uses and gives all processes or threads the ability to have real-time priority. This means that a time-critical process such as an audio stream can get priority over another, less critical process such as network activity. This is also configurable per user (for example, the processes of user "tux" could have priority over the processes of user "nobody" or over the processes of several system daemons). On a standard Linux system, this is possible for only one process at a time.

Latency in HDTV audio

Many modern TVs use sophisticated audio processing, which can create a delay between the time when the audio signal is received by the TV and the time when it is heard on the speakers. Since many of these TVs also delay the video signal during processing, the two signals can end up sufficiently synchronized that the viewer notices no offset. However, if the difference between the audio and video delay is significant, the effect can be disconcerting. Some TVs have a "lip sync" setting that allows the audio lag to be adjusted to synchronize with the video, and others may have advanced settings where some of the audio processing steps can be turned off.

Audio lag is also a significant detriment in rhythm games, where precise timing is required to succeed. Most of these games have a lag calibration setting whereby the game adjusts the timing windows by a certain number of milliseconds to compensate. In these cases, the notes of a song are sent to the speakers before the game even receives the required input from the player, in order to maintain the illusion of rhythm. However, games that rely upon "freestyling", such as Rock Band drums or DJ Hero, can still suffer tremendously, as the game cannot predict what the player will hit, and excessive lag will still create a noticeable delay between hitting notes and hearing them play.
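The idea of shifting a timing window by a calibrated lag can be sketched as follows. This is not how any particular game implements it; the function name is_hit, the 50 ms window and the example timings are assumptions made for illustration.

```python
def is_hit(press_time_ms: float, note_time_ms: float,
           audio_lag_ms: float = 0.0, window_ms: float = 50.0) -> bool:
    """Decide whether a button press counts as hitting a note.

    The player hears the note audio_lag_ms late, so the centre of the
    timing window is moved later by the same amount.
    """
    expected_press_ms = note_time_ms + audio_lag_ms
    return abs(press_time_ms - expected_press_ms) <= window_ms


if __name__ == "__main__":
    # A note scheduled at 1000 ms, heard 80 ms late, pressed at 1075 ms.
    print(is_hit(1075, 1000))                   # uncalibrated window
    print(is_hit(1075, 1000, audio_lag_ms=80))  # calibrated window
```

With no calibration the press falls outside the window; with an 80 ms calibration the same press lands 5 ms from the centre and counts as a hit.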

Audio latency in live performance

Professional digital audio equipment has latency associated with two general processes: conversion from one format to another, and digital signal processing (DSP) tasks such as equalization, compression and routing. Analog audio equipment has no appreciable latency.

Digital conversion processes include analog-to-digital converters (ADC), digital-to-analog converters (DAC), and various changes from one digital format to another, such as from AES3, which carries low-voltage electrical signals, to ADAT, an optical transport. Any such process takes a small amount of time to accomplish; typical latencies are in the range of 0.2 to 1.5 milliseconds, depending on sampling rate, bit depth, software design and hardware architecture.[14]

DSP can take several forms; for instance, finite impulse response (FIR) and infinite impulse response (IIR) filters take two different mathematical approaches to the same end and can have different latencies, depending on the lowest audio frequency being processed as well as on software and hardware implementations. Typical latencies range from 0.5 to 10 milliseconds, with some designs having as much as 30 milliseconds.[15]
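To give a sense of where such figures come from, the delay of one common FIR design can be computed directly. The sketch below assumes a linear-phase FIR filter, which the text does not specify; such a filter delays all frequencies by (N - 1) / 2 samples for N taps. IIR filters, whose delay varies with frequency, are not covered, and the function name fir_latency_ms is illustrative.

```python
def fir_latency_ms(num_taps: int, sample_rate_hz: float) -> float:
    """Return the delay of a linear-phase FIR filter in milliseconds.

    A linear-phase FIR filter with N taps delays the signal by
    (N - 1) / 2 samples, regardless of frequency.
    """
    if num_taps < 1:
        raise ValueError("num_taps must be at least 1")
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return (num_taps - 1) * 1000.0 / (2.0 * sample_rate_hz)


if __name__ == "__main__":
    # Longer filters add proportionally more delay at the same sample rate.
    for taps in (49, 481, 2881):
        print(taps, fir_latency_ms(taps, 48_000))
```

At a 48 kHz sampling rate, for example, a 481-tap filter of this kind adds 5 ms of delay.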

Individual digital audio devices can be designed with a fixed overall latency from input to output or they can have a total latency that fluctuates with changes to internal processing architecture. In the latter design, engaging additional functions adds latency.

Latency in digital audio equipment is most noticeable when a singer's voice is transmitted through their microphone, through digital audio mixing, processing and routing paths, and then sent to their own ears via in-ear monitors or headphones. In this case, the singer hears their own voice first through the bones of the head and then, a few milliseconds later, through the digital pathway.

Latency for other musical activity, such as playing a guitar, is less of a concern. Ten milliseconds of latency is not as noticeable to a listener who is not hearing their own voice.[16]

Latency used for delayed loudspeakers

In audio reinforcement for music or speech presentation in large venues, the aim is to deliver sufficient sound volume to the back of the venue without resorting to excessive sound volumes near the front. One way for audio engineers to achieve this is to use additional loudspeakers placed at a distance from the stage but closer to the rear of the audience. Sound travels through air at around 343 metres (1,125 ft) per second, depending on air temperature and humidity. By measuring or estimating the difference in latency between the loudspeakers near the stage and the loudspeakers nearer the audience, the audio engineer can introduce an appropriate delay in the audio signal going to the latter loudspeakers. Because of the Haas effect, approximately 15 milliseconds can be added to the delay time of the loudspeakers nearer the audience, so that the audience's attention stays on the stage rather than on the local loudspeaker. The slightly later sound from delayed loudspeakers simply increases the perceived sound level without negatively affecting localization.
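The delay setting described above can be estimated from the distance between the two sets of loudspeakers. The following sketch assumes a fixed speed of sound of 343 m/s and treats the 15 ms Haas offset as an adjustable parameter; the function name and the example distance are hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate; varies with temperature and humidity


def delay_speaker_ms(distance_m: float, haas_offset_ms: float = 15.0) -> float:
    """Estimate the delay to apply to loudspeakers placed nearer the audience.

    distance_m is the distance from the main loudspeakers near the stage
    to the delayed loudspeakers. The result is the acoustic travel time
    over that distance plus the extra Haas-effect offset.
    """
    if distance_m < 0:
        raise ValueError("distance_m must not be negative")
    travel_ms = distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0
    return travel_ms + haas_offset_ms


if __name__ == "__main__":
    # Delay loudspeakers placed 34.3 m from the main loudspeakers.
    print(f"{delay_speaker_ms(34.3):.1f} ms")
```

For loudspeakers 34.3 m from the stage system, the travel time is about 100 ms, giving a delay setting of roughly 115 ms once the Haas offset is included.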

