Originally published by the Audio Engineering Society, 2015
Recordings involving cellular telephones or personal digital assistants ("PDAs") are increasingly the source of evidence in audio forensic examinations, compared to recordings originating with other devices such as hand-held digital recorders. On modern PDA cellular telephones, recordings can be made either directly to the telephone or transmitted as voice mail messages. The current investigation focuses on differences between the two types of recordings in terms of dynamic range and linearity of levels. Such information can be important for characterizing the distance of sound sources relative to the microphone and for understanding the transformation of recorded speech and non-speech sounds.
Recordings involving cellular telephones or personal digital assistants ("PDAs") are increasingly the focus of audio forensic examinations, while recordings originating with other devices such as hand-held digital recorders have decreased as these devices are supplanted by PDAs, much as has happened with consumer-level cameras. On modern PDA cellular telephones, recordings can be made either directly to the telephone or transmitted as voice mail messages to a particular messaging system. The quality of such recordings varies widely between messaging systems and between built-in recording applications; furthermore, the sensitivity of recordings to different sound source levels varies greatly, due to codecs and/or telephonic transmission. An understanding of these transformational properties has important implications for forensic analyses that include speech transcription of "background" voices, gunshots, distance of sound sources, and environmental context analysis (sometimes referred to as "roomprints") [1, 2].
When a recording is made directly to the telephone, the recording function is similar to that of a traditional hand-held digital recorder. For example, the native voice memo recording application on the iPhone 5s writes .m4a files that, while technically written using a lossy codec, are wideband and have a decent dynamic range (sampling rate 44.1 kHz, bitrate 64 kbps). A particular application from a third party may allow alternative means of export (but not necessarily of recording). The use of the iPhone microphone as input to a sound level meter application has also been evaluated in the literature.
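To put the 64 kbps figure in perspective, the short calculation below compares it with the bitrate of uncompressed audio at the same sampling rate. It is only a rough sketch: the 16-bit, single-channel PCM reference is an assumption made for illustration and is not stated in the source, and the helper name `pcm_bitrate_kbps` is hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bitrate of uncompressed PCM audio in kilobits per second.

    The 16-bit mono defaults are an assumption for illustration only.
    """
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# Figures quoted in the text for the iPhone 5s voice memo files.
SAMPLE_RATE_HZ = 44100
ENCODED_KBPS = 64.0

uncompressed_kbps = pcm_bitrate_kbps(SAMPLE_RATE_HZ)
reduction_ratio = uncompressed_kbps / ENCODED_KBPS

print(f"uncompressed reference: {uncompressed_kbps:.1f} kbps")
print(f"approximate data reduction: {reduction_ratio:.1f}:1")
```

Under that assumption the lossy codec discards roughly nine tenths of the data, which is why the files are described as technically lossy even though their bandwidth remains wide.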
When a recording is saved as a voicemail message, either on another telephone, on a corporate voice mail server, or on an emergency ("911") call log recording system, several additional and varied stages of signal processing can be involved. These stages include algorithms designed to optimize the speech signal against background noise, including compression and voice activity detection algorithms. Most importantly, these stages of signal processing alter the spectrum of speech as a function of level.
The current investigation is necessarily limited in scope, focusing primarily on laboratory tests of an iPhone 5s on the AT&T network (GSM 4G) in the San Francisco Bay Area. Analyses of actual gunshots from a different, older-model cellular telephone are provided in a final section (details have been omitted due to privacy concerns). We have approached the effects from the standpoint of a "black box" analysis, in which we provide a detailed description of the input and the resulting output. The analyses focus on differences between types of recordings in terms of dynamic range and linearity of levels. Such information can be important for characterizing the distance of sound sources relative to the microphone and for understanding the impact of recorded speech and non-speech sounds in forensic settings.
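One way to picture a "black box" test of level linearity is sketched below: tones are presented at stepped input levels, the output level is measured, and the slope of output level against input level is fitted. This is a minimal illustration and not the procedure used in the study. The function `toy_channel` is a hypothetical stand-in for the device or transmission path under test (a simple static compressor), and the 1 kHz tone, 8 kHz sampling rate, threshold, and ratio are all assumed values.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to a full-scale amplitude of 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-20))


def tone(peak_db, freq_hz=1000.0, rate_hz=8000, seconds=0.5):
    """Sine tone whose peak amplitude is peak_db relative to full scale."""
    peak = 10.0 ** (peak_db / 20.0)
    count = int(rate_hz * seconds)
    return [peak * math.sin(2.0 * math.pi * freq_hz * i / rate_hz)
            for i in range(count)]


def toy_channel(samples, threshold_db=-30.0, ratio=4.0):
    """HYPOTHETICAL system under test: a static compressor.

    Signals whose RMS level is above threshold_db are attenuated so that
    each extra dB of input yields only 1/ratio dB of output.
    """
    level = rms_db(samples)
    if level <= threshold_db:
        return list(samples)
    target = threshold_db + (level - threshold_db) / ratio
    gain = 10.0 ** ((target - level) / 20.0)
    return [gain * s for s in samples]


def level_slope(peak_levels_db, channel):
    """Least-squares slope of output level versus input level (dB per dB)."""
    xs = [rms_db(tone(p)) for p in peak_levels_db]
    ys = [rms_db(channel(tone(p))) for p in peak_levels_db]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


quiet_slope = level_slope([-60.0, -50.0, -40.0], toy_channel)
loud_slope = level_slope([-20.0, -10.0, 0.0], toy_channel)
print(f"slope for quiet inputs: {quiet_slope:.2f} dB/dB")
print(f"slope for loud inputs:  {loud_slope:.2f} dB/dB")
```

A slope near 1 dB/dB indicates that output level tracks input level linearly; a slope well below 1 indicates compression, which is the kind of level-dependent behavior that would complicate estimates of source distance from recorded level.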
An admitted but not fatal limitation of this study is a lack of certainty regarding the technical aspects of the hardware involved, and how the signal was transformed by individual signal processing elements in the communication chain of the cellular telephone system, including those features caused by linear predictive coding, discontinuous transmission, voice activity detection, and the inclusion of so-called "comfort noise" (see e.g.  for an informative overview). In other words, we observe what occurs to the signal and its implications for forensic analysis, without analyzing the specific causes.
The frequency response characteristics of the voice mail recordings (AMR file structure) indicate a classic narrowband, fixed-rate telephony signal with a cutoff at around 4 kHz, consistent with (but not necessarily indicating) a G.711 μ-law codec. There are three microphones in the iPhone: one at the bottom, one next to the speaker above the screen, and one between the flash and the lens of the camera on the rear of the phone. Investigation indicates that different applications may access different microphones to optimize functionality. For example, when the front-facing camera is used for recording video, the front-facing microphone is active. This microphone is deactivated when the camera is switched to rear-facing, and the rear-facing microphone becomes active. The voice memo application appears to use all three microphones to varying degrees, with the bottom microphone being the main source. When making voice calls, only the bottom microphone actively transmits speech; however, the iPhone features a noise cancellation control system, and it is believed that at least one additional microphone is used for noise cancellation during voice calls when this feature is enabled.
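The cutoff near 4 kHz noted above is what an 8 kHz sampling rate would produce, and μ-law companding is one reason a telephony codec can treat quiet and loud sounds differently. The sketch below illustrates the general idea only. It uses the continuous μ-law formula with μ = 255 followed by a uniform quantizer, which is a simplification and not the exact segmented encoding of G.711; the source does not confirm which codec was actually in the chain, and the function names are hypothetical.

```python
import math

MU = 255.0                      # companding constant used by mu-law telephony
SAMPLE_RATE_HZ = 8000.0         # narrowband telephony sampling rate
NYQUIST_HZ = SAMPLE_RATE_HZ / 2.0   # upper limit of the representable band


def mu_compress(x):
    """Continuous mu-law compression of a sample x in the range [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(value, levels=127):
    """Uniform quantizer with `levels` steps per polarity (simplified)."""
    return round(value * levels) / levels


def companded_roundtrip(x):
    """Compress, quantize, then expand: a simplified mu-law channel."""
    return mu_expand(quantize(mu_compress(x)))


def uniform_roundtrip(x):
    """Quantize directly, with no companding, for comparison."""
    return quantize(x)


quiet_sample = 0.01             # a low-level sample, 40 dB below full scale
err_companded = abs(companded_roundtrip(quiet_sample) - quiet_sample)
err_uniform = abs(uniform_roundtrip(quiet_sample) - quiet_sample)

print(f"band limit at {SAMPLE_RATE_HZ:.0f} Hz sampling: {NYQUIST_HZ:.0f} Hz")
print(f"relative error, companded: {err_companded / quiet_sample:.4f}")
print(f"relative error, uniform:   {err_uniform / quiet_sample:.4f}")
```

For the quiet sample, the companded path reproduces the value far more closely than uniform quantization at the same word length, because companding spends more of its resolution on low levels. The price is a step size that grows with level, one of several mechanisms by which a telephony chain can alter a signal as a function of level.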
Durand R. Begault received a PhD in Computer Audio from UC San Diego in 1988 and an MFA from Mills College in 1981. He has been associated with the Audio Forensic Center and Charles M. Salter Associates, Inc. since 1996 and has worked as an expert witness, legal consultant, and acoustical consultant on a variety of audio-related cases since 1988. Dr. Begault also holds a research position at NASA Ames Research Center and is an adjunct faculty member at McGill University. He is the author of several U.S. patents and is qualified as an expert witness in both Federal and State Courts. Dr. Begault has testified in over 40 cases on a variety of audio technology and forensic audio-related cases.
©Copyright - All Rights Reserved
DO NOT REPRODUCE WITHOUT WRITTEN PERMISSION BY AUTHOR.