SP&T News

Audio applications for IP video systems

The ability to transmit and record two-way, high-fidelity audio alongside video over an IP network is increasingly used in both surveillance and non-security applications. Although it is becoming a powerful tool, audio is often overlooked when IP video systems are designed. This article explores some of the applications for audio and the technology that makes it such a useful addition to IP video projects.

November 23, 2011  By Oliver Vellacott

Two-Way Audio

IP video systems allow full-duplex, two-way digital audio to be transmitted over the IP network, so that people at both ends can speak and listen at the same time. Leading IP video manufacturers build audio input/output capability into their IP cameras and video encoders, allowing a microphone and/or speaker to be mounted on or near the camera or encoder. The audio is typically compressed using AAC (Advanced Audio Coding), a comprehensive audio compression standard that forms part of both MPEG-2 (ISO/IEC 13818-7) and MPEG-4 (ISO/IEC 14496-3). The MPEG-4 family also covers video compression standards such as H.264 (ISO/IEC 14496-10).
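AAC audio travels as a sequence of compressed frames. When the common ADTS (Audio Data Transport Stream) framing is used, each frame starts with a small header describing the stream. The sketch below is illustrative only. It assumes the camera or encoder delivers AAC in ADTS framing (many devices instead carry AAC over RTP without ADTS headers), and the names `parse_adts_header` and `AdtsHeader` are hypothetical. It decodes the profile, sample rate, channel count and frame length, then estimates the bitrate from them.

```python
from dataclasses import dataclass

# Sampling-frequency table indexed by the 4-bit ADTS sampling_frequency_index.
ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                     24000, 22050, 16000, 12000, 11025, 8000, 7350]

SAMPLES_PER_FRAME = 1024  # an AAC-LC frame carries 1024 samples per channel


@dataclass
class AdtsHeader:
    audio_object_type: int  # 2 = AAC-LC
    sample_rate: int        # Hz
    channels: int           # channel_configuration (1 = mono, 2 = stereo)
    frame_length: int       # bytes, including the header itself
    has_crc: bool

    def bitrate_bps(self) -> float:
        """Approximate bitrate if every frame in the stream were this size."""
        return self.frame_length * 8 * self.sample_rate / SAMPLES_PER_FRAME


def parse_adts_header(data: bytes) -> AdtsHeader:
    """Hypothetical helper: decode the fixed 7-byte part of an ADTS header."""
    if len(data) < 7:
        raise ValueError("ADTS header needs at least 7 bytes")
    b = data
    if (b[0] << 4) | (b[1] >> 4) != 0xFFF:
        raise ValueError("missing ADTS syncword 0xFFF")
    protection_absent = b[1] & 0x01
    profile = (b[2] >> 6) & 0x03            # audio object type minus 1
    sf_index = (b[2] >> 2) & 0x0F
    if sf_index >= len(ADTS_SAMPLE_RATES):
        raise ValueError(f"reserved sampling frequency index {sf_index}")
    channels = ((b[2] & 0x01) << 2) | (b[3] >> 6)
    frame_length = ((b[3] & 0x03) << 11) | (b[4] << 3) | (b[5] >> 5)
    return AdtsHeader(
        audio_object_type=profile + 1,
        sample_rate=ADTS_SAMPLE_RATES[sf_index],
        channels=channels,
        frame_length=frame_length,
        has_crc=not protection_absent,
    )


if __name__ == "__main__":
    # Hand-built header: AAC-LC, 48 kHz, mono, 200-byte frame, no CRC.
    header = parse_adts_header(bytes([0xFF, 0xF1, 0x4C, 0x40, 0x19, 0x1F, 0xFC]))
    print(header)
    print(f"~{header.bitrate_bps() / 1000:.0f} kbit/s")
```

For the hand-built header, a 200-byte frame every 1024 samples at 48 kHz works out to roughly 75 kbit/s, a small fraction of a typical video stream.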

This basic two-way audio is often used for intercom applications and public safety help points. The intercom can be audio-only or include video, providing a fully functional video intercom.

The authorities in Georgetown, the capital of Penang Island, Malaysia, installed a number of Emergency Kiosks as part of a new IP video surveillance system. When a member of the public presses the emergency button, two-way communication is opened with one of the control room operators via a hidden microphone and camera in the kiosk. The intercom video from the kiosk is automatically displayed on a video management workstation, and the nearest PTZ camera pans and zooms to the kiosk area. All of this is achieved over the wireless network.

Message Broadcast

Broadcasting pre-recorded messages from speakers mounted near cameras, triggered by a local alarm or event, is a powerful deterrent. IP cameras and video encoders can provide digital I/O to interface with intruder alarm systems, so that an automatic “You are being recorded on CCTV” message is triggered when an intruder is detected. Real-time analytics running at the network edge, in the camera itself, can also act as a trigger. For example, a Virtual Tripwire analytic can be configured to protect a secure area, and anyone crossing the tripwire is warned with an audible message.
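At its core, a virtual tripwire is a line-crossing test on an object's tracked position. The sketch below assumes the analytic already supplies a tracked object's image position on successive frames (tracking itself is out of scope). It then uses the standard orientation-based segment-intersection test to decide whether a step crossed the wire and in which direction. The function names and the `play_warning` placeholder are hypothetical; real cameras implement this inside vendor-specific analytics.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def orientation(a: Point, b: Point, c: Point) -> float:
    """Cross product of (b - a) and (c - a): >0 if c is left of a->b, <0 if right."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def tripwire_crossing(wire_start: Point, wire_end: Point,
                      prev_pos: Point, curr_pos: Point) -> Optional[str]:
    """Return the crossing direction, or None if this step did not cross the wire.

    The movement segment (prev_pos -> curr_pos) crosses the wire when each
    segment's endpoints lie strictly on opposite sides of the other segment's
    line. Touching or collinear cases are ignored for simplicity. Directions
    are relative to the wire as drawn from wire_start to wire_end.
    """
    d1 = orientation(wire_start, wire_end, prev_pos)
    d2 = orientation(wire_start, wire_end, curr_pos)
    d3 = orientation(prev_pos, curr_pos, wire_start)
    d4 = orientation(prev_pos, curr_pos, wire_end)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return "left-to-right" if d1 > 0 else "right-to-left"
    return None


def play_warning(camera_id: str, clip: str) -> None:
    # Placeholder: a real system would ask the camera/encoder to play the clip.
    print(f"[{camera_id}] playing {clip}")


if __name__ == "__main__":
    wire = ((0.0, 0.0), (0.0, 10.0))               # vertical wire along x = 0
    track = [(-3.0, 5.0), (-1.0, 5.0), (1.0, 5.0), (3.0, 5.0)]
    for prev, curr in zip(track, track[1:]):
        if tripwire_crossing(*wire, prev, curr):
            play_warning("cam-07", "you_are_being_recorded.wav")
```

Only the step from (-1, 5) to (1, 5) crosses the wire, so the warning plays once. Returning a direction lets a one-way tripwire ignore people leaving the secure area.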

The more advanced IP video offerings now build in tight integration between access control and surveillance systems over IP networks. This allows an incident such as a forced-door alarm to automatically trigger actions in the IP video system, including pre-recorded message broadcasts.
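One way to picture this integration is as a table of rules that maps incoming events to actions. The sketch below is hypothetical throughout. The event kinds, action functions and clip names are invented for illustration, and each action simply returns a description rather than calling a real camera, NVR or access control API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    kind: str    # e.g. "door_forced" from access control, "intruder_input" from digital I/O
    source: str  # ID of the door controller, camera or encoder input


Action = Callable[[Event], str]


def play_clip(clip: str) -> Action:
    """Build an action that (notionally) plays a pre-recorded clip near the source."""
    return lambda e: f"play {clip} near {e.source}"


def start_recording(e: Event) -> str:
    return f"record cameras covering {e.source}"


def show_on_workstation(e: Event) -> str:
    return f"pop up video for {e.source}"


# Hypothetical rule table: event kind -> ordered list of actions.
RULES: Dict[str, List[Action]] = {
    "door_forced": [start_recording, show_on_workstation, play_clip("door_alarm.wav")],
    "intruder_input": [start_recording, play_clip("you_are_being_recorded.wav")],
}


def handle(event: Event) -> List[str]:
    """Run every action configured for this event kind; unknown kinds do nothing."""
    return [action(event) for action in RULES.get(event.kind, [])]


if __name__ == "__main__":
    for line in handle(Event("door_forced", "door-12")):
        print(line)
```

Keeping the rules as data rather than code means that an extra response, such as a second message clip, is a configuration change rather than a software change.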

Broadcasting pre-recorded messages enables unattended use. However, live speech is also a very powerful feature and has been used to great effect in California. Adding one-way audio to some of the cameras has improved surveillance effectiveness in the City of Redlands, Calif., especially in its public spaces and parks. Dispatch can now speak directly to individuals caught in the act of committing a nuisance crime. In most cases, individuals have stopped what they were doing and left the scene as soon as they heard the audio. This “virtual policing” has freed up more expensive staff to concentrate on issues that require professionally trained officers, and the Redlands police force now handles a greater number of incidents with its existing, smaller staff.

Synchronized Audio

Synchronizing audio with video to eliminate lip-sync delays has become an important prerequisite for a number of applications, including law enforcement interviews. The police must ensure that evidential video clips used in court are not open to interpretation. Any delay between the video and audio could introduce uncertainty during playback, and for the same reason dropped video frames would be unacceptable. IP video vendors that fully support synchronized audio and video and guarantee no dropped frames are increasingly being specified for these law enforcement applications. Some IP video vendors are also developing dual-channel audio capability, which will give the interviewer and interviewee their own microphones to further improve the clarity of the recorded interview.
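How a given system keeps audio and video aligned is vendor-specific. The following is a sketch under stated assumptions. Both streams are carried over RTP (RFC 3550), with video on the conventional 90 kHz RTP clock and audio on a clock equal to its sample rate. Each stream's RTP timestamps are tied to a common wall clock through RTCP sender reports. Under those assumptions, lip-sync error is the difference between the capture times of the audio and video presented together, and dropped frames show up as gaps between consecutive video timestamps. The function names are hypothetical, and the dropped-frame check assumes a constant frame rate.

```python
from typing import Sequence

RTP_WRAP = 2 ** 32  # RTP timestamps are 32-bit and wrap around


def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_ntp_seconds: float,
                     clock_rate: int) -> float:
    """Map an RTP timestamp to wall-clock seconds using an RTCP sender report.

    sr_rtp_ts and sr_ntp_seconds are the RTP and NTP times carried in the same
    sender report; the signed 32-bit difference handles timestamp wrap-around.
    """
    diff = (rtp_ts - sr_rtp_ts) % RTP_WRAP
    if diff >= RTP_WRAP // 2:
        diff -= RTP_WRAP
    return sr_ntp_seconds + diff / clock_rate


def av_skew_ms(audio_wall: float, video_wall: float) -> float:
    """Lip-sync error for audio and video presented together (positive: audio captured later)."""
    return (audio_wall - video_wall) * 1000.0


def count_dropped_frames(frame_ts: Sequence[int], clock_rate: int = 90000,
                         fps: float = 25.0) -> int:
    """Count missing frames from gaps between consecutive video RTP timestamps."""
    interval = clock_rate / fps
    dropped = 0
    for prev, curr in zip(frame_ts, frame_ts[1:]):
        gap = (curr - prev) % RTP_WRAP
        dropped += max(0, round(gap / interval) - 1)
    return dropped


if __name__ == "__main__":
    # Hypothetical sender-report anchors: both streams at wall-clock t = 100.0 s.
    video_t = rtp_to_wallclock(181000, sr_rtp_ts=1000, sr_ntp_seconds=100.0, clock_rate=90000)
    audio_t = rtp_to_wallclock(101480, sr_rtp_ts=5000, sr_ntp_seconds=100.0, clock_rate=48000)
    print(f"skew: {av_skew_ms(audio_t, video_t):.1f} ms")
    print("dropped frames:", count_dropped_frames([0, 3600, 7200, 14400]))
```

In the example, the audio is 10 ms out of step with the video and one 25 fps frame is missing. A recorder following this approach could flag any interview clip whose skew exceeds a chosen tolerance or whose dropped-frame count is non-zero.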

The Hudson County Prosecutor’s Office in New Jersey offers an example of how the latest IP video technology can transform the interview process. The Prosecutor’s Office has a staff of more than 300, including assistant prosecutors, detectives, and clerical and support personnel. Its investigative division is supplemented by officers employed by the County Sheriff and the county’s 12 municipalities. It is the second-largest county prosecutor’s office in the State of New Jersey.

The new IP video system replaced an analogue/VCR system that was proving inadequate for the demanding task of recording law enforcement interviews. Officers could only view video and listen to audio from a small room adjacent to the interview room, and longer interviews had to be interrupted to change tapes. The distributed architecture of the IP video system now allows live and recorded video from any camera to be viewed from anywhere on the network. Redundant, fault-tolerant Networked Video Recorders (NVRs) have replaced the unreliable VCRs and removed the need for tape changes, and the ability of multiple users to view video over the network has also aided the investigative process.

“Following an analysis of competing systems we chose the solution that delivered the highest quality video and the best audio and video synchronization,” explains Sergeant Gerald Dezenzo, Head of Computer Crimes. “It is vitally important when recording interviews that we have no delay between the audio and video, otherwise it could bring into question the validity of the interview.”

The County Prosecutor’s Office currently works out of three buildings and has 25 to 30 users who monitor live and recorded video from 20 cameras installed in four interview rooms, cells and public waiting areas. The Task Force building is home to the Homicide Unit, Special Victims Unit (SVU) and Narcotics Task Force, each of which has its own interview and monitoring room. In addition, the SVU conducts interviews at the Hudson County Child Advocacy Center, which is approximately 2 miles away. All of the buildings are connected via a network, and video from any camera can be viewed from each location.

“Having a fully redundant recording solution for video and audio was an important consideration when choosing the system,” added Dezenzo. “NVRs have been configured to automatically record an interview onto two separate recorders for each room, thereby ensuring no part of the interview is lost in the event of an NVR failure.”

Public Address (PA)

The next logical step for IP video systems is to integrate PA capability into their security management software, offering building owners an additional service without the need to install a separate system, which can generate significant cost savings. The flexibility of this approach has been demonstrated in schools. Here, a pre-recorded class-change bell is broadcast to all speakers at regular intervals throughout the day. Staff can also manually transmit a live message to one or more specific speakers in response to an incident viewed on camera, for example to warn students about their actions. This is made possible by grouping cameras (and their associated speakers) into ‘PA’ groups, so that messages can easily be broadcast to particular parts of a building, all across the IP network.
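The PA grouping can be modelled as two mappings, one from group names to cameras and one from cameras to their associated speakers. In this hypothetical sketch, all group, camera and speaker IDs are invented. A broadcast resolves the set of speakers to address, and a speaker shared by several selected groups is addressed only once. The `bell_times` helper illustrates a simple fixed-interval bell schedule and is likewise an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set

# Hypothetical configuration: each camera has an associated speaker output.
CAMERA_SPEAKER: Dict[str, str] = {
    "cam-hall-1": "spk-hall-1",
    "cam-hall-2": "spk-hall-2",
    "cam-lab-1": "spk-lab-1",
    "cam-yard-1": "spk-yard-1",
}

# Hypothetical PA groups, defined in terms of cameras.
PA_GROUPS: Dict[str, List[str]] = {
    "corridors": ["cam-hall-1", "cam-hall-2"],
    "science-block": ["cam-lab-1"],
    "outdoor": ["cam-yard-1"],
}


def speakers_for(groups: Iterable[str]) -> Set[str]:
    """Resolve PA group names to the set of speakers a message should reach."""
    targets: Set[str] = set()
    for group in groups:
        for camera in PA_GROUPS[group]:
            speaker = CAMERA_SPEAKER.get(camera)
            if speaker is not None:  # cameras without a speaker are skipped
                targets.add(speaker)
    return targets


def all_speakers() -> Set[str]:
    """Targets for a whole-site broadcast such as the class-change bell."""
    return speakers_for(PA_GROUPS)


def bell_times(start: str, end: str, period_minutes: int) -> List[str]:
    """Fixed-interval bell schedule between start and end (HH:MM, inclusive)."""
    t = datetime.strptime(start, "%H:%M")
    stop = datetime.strptime(end, "%H:%M")
    times = []
    while t <= stop:
        times.append(t.strftime("%H:%M"))
        t += timedelta(minutes=period_minutes)
    return times


if __name__ == "__main__":
    print(sorted(speakers_for(["corridors", "science-block"])))
    print(bell_times("09:00", "11:00", 50))
```

Defining groups in terms of cameras rather than speakers mirrors the workflow described above. An operator sees an incident on a camera and addresses that camera's group.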


Latency

Latency is the delay introduced when transmitting information across a network; over a link of a given capacity, it grows with the amount of data being sent. In practice, very little data is needed to transmit high-quality audio across a network, as shown by the many people who stream music over the Internet and their home networks. However, because audio is generally synchronized with the video, which requires considerably more bandwidth, it is the video that determines the delay in the audio. The IP video systems with the best video compression and bandwidth management will therefore be able to deliver audio with little or no delay for most applications, even over long distances.
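A little arithmetic shows why audio adds so little load compared with video. The figures below are assumptions chosen for illustration (48 kHz, 16-bit mono speech, a 64 kbit/s AAC stream, a 4 Mbit/s H.264 stream, a 10 Mbit/s uplink and a 100 kB key frame), not measurements from any particular system. The delay calculation covers only serialization time and ignores encoding, propagation and queuing.

```python
def pcm_bitrate_bps(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio."""
    return sample_rate * bits_per_sample * channels


def serialization_delay_ms(size_bytes: int, link_bps: float) -> float:
    """Time to clock size_bytes onto a link of link_bps (no propagation or queuing)."""
    return size_bytes * 8 / link_bps * 1000


if __name__ == "__main__":
    # Assumed figures for illustration only.
    pcm = pcm_bitrate_bps(48000, 16, 1)   # uncompressed mono speech
    aac = 64_000                          # assumed AAC bitrate
    video = 4_000_000                     # assumed H.264 bitrate
    link = 10_000_000                     # assumed 10 Mbit/s uplink
    print(f"PCM {pcm / 1000:.0f} kbit/s, AAC {aac / 1000:.0f} kbit/s "
          f"({pcm / aac:.0f}:1), video {video / 1000:.0f} kbit/s")
    print(f"audio is {aac / video:.1%} of the video bitrate")
    # A ~200-byte AAC frame versus an assumed 100 kB H.264 key frame.
    print(f"AAC frame: {serialization_delay_ms(200, link):.2f} ms")
    print(f"video key frame: {serialization_delay_ms(100_000, link):.1f} ms")
```

Under these assumptions, compressed audio is about 1.6% of the video bitrate, and an AAC frame takes about 0.16 ms to send against 80 ms for a video key frame. When the audio is held back to stay in step with the video, it therefore inherits the video's delay rather than adding its own.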

Educational Applications

IP video systems are increasingly finding their way into non-security applications, where audio also plays an important role. This is particularly the case in the education and training sector.

A complete IP video solution is at the heart of a medical training system for the prestigious Oxford Brookes University School of Health & Social Care. Based in a simulated 24-bed ward, the innovative system uses specialist cameras to allow tutors and medical students to interact in a live patient care environment. Video management software manages the viewing of live and recorded video and audio from each bed. A tutor can monitor all 24 beds remotely at the same time and hold a two-way conversation with students at the bedside, so the consultant no longer has to peer over the student’s shoulder.

On March 4, 2010, New Zealand’s Minister of Education officially opened Rangitoto College’s upgraded computer network during the school’s assembly. Rangitoto College is a state co-educational high school on the North Shore of Auckland, New Zealand, and is the country’s largest high school, with more than 3,000 students. Because of its size, however, the school cannot accommodate all of its students in the main hall at the same time. An innovative solution was therefore developed: a High Definition (HD) IP camera captured the opening ceremony and web-cast live video with synchronized audio to more than 600 computers across the school’s network, allowing every student to watch the proceedings. It proved so successful that the school now regularly uses the technology to broadcast whole-school assemblies.

Oliver Vellacott is the CEO of IndigoVision, which he founded in 1994. He was previously a product manager with a background in intelligent camera products. Vellacott gained his first degree in Software Engineering from Imperial College, London, and then a PhD in Electrical Engineering from Edinburgh University.
