
Speech Intelligibility STI

The intelligibility of an emergency announcement in a public area can be crucial for those present. This applies in particular to train stations and airports, congress and shopping centers, stadiums, lecture halls and classrooms, etc.

In order to ensure good speech intelligibility, an objective measurement process is necessary that delivers relevant and reproducible results. Measuring the Speech Transmission Index (STI) is such a process.

How do we measure Speech Intelligibility STI?

By analyzing speech acoustically, we find that human vocal sounds have two characteristics:

they are in the frequency range from approx. 100 Hz to 10 kHz
the intensity of the sounds modulates slowly (between 0.63 and 12.5 Hz)

These modulations of the sound signal generated by a speaking person are an important part of the transmission of information within a language. If they are partially lost during transmission, the intelligibility of the speech suffers.

[Figure: Long-term Speech Spectrum]

An STI measurement examines how well these modulations are preserved and expresses the result as a Modulation Transfer Function (MTF). The MTF is determined separately for each octave band from the modulation ratios “mr”, which indicate how well the modulations were preserved in that band. The measurement involves sending, receiving and analyzing a synthetic speech signal. The test signal is based on pink noise, modulated in seven octave bands with 14 different modulation frequencies, which gives 7 × 14 = 98 combinations.
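The count of 98 combinations can be reproduced with a short sketch. The concrete band and modulation frequencies below are not stated in this article; they are the values commonly listed for IEC 60268-16 (octave bands from 125 Hz to 8 kHz, modulation frequencies in one-third-octave steps from 0.63 Hz to 12.5 Hz) and should be treated as an assumption here.

```python
from itertools import product

# Assumed values (not given in the article): octave-band centre
# frequencies and modulation frequencies as commonly listed for IEC 60268-16.
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]

# Every octave band is paired with every modulation frequency.
combinations = list(product(OCTAVE_BANDS_HZ, MODULATION_FREQS_HZ))

print(len(combinations))  # 98
```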

Examples of the 1 kHz Octave Band signal with some modulations

b1k_no.wav – 1 kHz band with no modulation
b1k_f1m1.wav – 1 kHz band with modulation at 2 Hz, m=1
b1k_f2m1.wav – 1 kHz band with modulation at 10 Hz, m=1
b1k_f1f2.wav – 1 kHz band with modulation at 2 Hz and 10 Hz, m=0.55
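To illustrate what "modulation" means for these signals, the following minimal sketch builds an idealized intensity envelope and estimates its modulation depth m at one modulation frequency. The function names are hypothetical, and the envelope is a pure cosine rather than real band-filtered noise, so this shows the principle only, not the procedure of any particular analyzer.

```python
import math

def intensity_envelope(mod_freq_hz, m, duration_s=1.0, rate_hz=1000):
    """Idealized intensity envelope 1 + m*cos(2*pi*f*t), sampled at rate_hz."""
    n = int(duration_s * rate_hz)
    return [1.0 + m * math.cos(2 * math.pi * mod_freq_hz * k / rate_hz)
            for k in range(n)]

def modulation_depth(envelope, mod_freq_hz, rate_hz=1000):
    """Estimate m at mod_freq_hz: twice the magnitude of the Fourier
    component at that frequency, divided by the summed intensity."""
    re = sum(v * math.cos(2 * math.pi * mod_freq_hz * k / rate_hz)
             for k, v in enumerate(envelope))
    im = sum(v * math.sin(2 * math.pi * mod_freq_hz * k / rate_hz)
             for k, v in enumerate(envelope))
    return 2.0 * math.hypot(re, im) / sum(envelope)

# Fully modulated 2 Hz envelope, as in the m=1 example above.
clean = intensity_envelope(2.0, 1.0)
# Adding steady noise of the same mean intensity fills in the dips.
noisy = [v + 1.0 for v in clean]

m_clean = modulation_depth(clean, 2.0)
m_noisy = modulation_depth(noisy, 2.0)
```

In this idealized case the clean envelope yields m = 1, while steady noise of equal mean intensity halves the modulation depth. This is the kind of loss an STI measurement quantifies.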

Measuring all 98 combinations, known as the full STI method, is quite complex and difficult to implement in a portable device. However, this method is the most detailed way of measuring speech intelligibility and is used whenever alternative approaches fail to deliver reliable results due to adverse environmental conditions.

In practice, the Speech Transmission Index for Public Address (STIPA) measurement method is mostly used. It measures 14 of the combinations and was specially developed for portable devices.

Assuming that there is no loud, impulsive ambient noise and that there are no strong nonlinear distortions, the STIPA method delivers STI measurement results within 15 seconds and with an accuracy comparable to using the full STI method.

If impulsive ambient noise is present during normal operating times, the measurement is typically carried out at a more favorable time, e.g. at night.

At the measurement position, the measurement device determines the frequency response and the extent to which the transmitted modulations have been changed. From this, the STI value is calculated and rated on a standardized “quality rating” scale. A value of STI = 1 stands for perfect intelligibility, while STI = 0 means that the information content has been completely lost.
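The step from modulation ratios to a single index between 0 and 1 can be sketched as follows. Each modulation ratio m is converted to an effective signal-to-noise ratio, limited to ±15 dB, and scaled to a transmission index. This is a simplified illustration: the function names are hypothetical, equal band weights are an assumption, and the band-specific weighting, redundancy, masking and hearing-threshold corrections of IEC 60268-16 are omitted.

```python
import math

def transmission_index(m):
    """Map one modulation ratio m (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)          # avoid log of 0 or division by 0
    snr_eff = 10.0 * math.log10(m / (1.0 - m))  # effective SNR in dB
    snr_eff = max(-15.0, min(15.0, snr_eff))    # limit to the +/-15 dB range
    return (snr_eff + 15.0) / 30.0

def simplified_sti(m_by_band, weights=None):
    """Average the transmission indices per band, then combine the bands.

    m_by_band: one list of modulation ratios per octave band.
    weights:   per-band weights; equal weights are assumed if omitted,
               which is a simplification of the standard.
    """
    band_index = [sum(transmission_index(m) for m in band) / len(band)
                  for band in m_by_band]
    if weights is None:
        weights = [1.0 / len(band_index)] * len(band_index)
    return sum(w * t for w, t in zip(weights, band_index))

# Seven bands with two modulation ratios each, as in a STIPA measurement.
example = simplified_sti([[0.5, 0.5]] * 7)
```

With all modulation ratios at 0.5, the effective signal-to-noise ratio is 0 dB in every band and the simplified index comes out at 0.5.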

STI value	Quality rating acc. to IEC 60268-16
0 ... 0.3	bad
0.3 ... 0.45	poor
0.45 ... 0.6	fair
0.6 ... 0.75	good
0.75 ... 1	excellent
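The table above translates directly into a lookup. In this small sketch the function name is hypothetical, and because the table lists each boundary value in two rows, assigning a boundary value to the better category is an assumption.

```python
from bisect import bisect_right

# Upper limits of the first four categories, taken from the table above.
_LIMITS = [0.3, 0.45, 0.6, 0.75]
_LABELS = ["bad", "poor", "fair", "good", "excellent"]

def quality_rating(sti):
    """Return the quality rating for an STI value between 0 and 1."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    # Assumption: a value exactly on a boundary counts as the better category.
    return _LABELS[bisect_right(_LIMITS, sti)]
```

For example, `quality_rating(0.5)` returns "fair".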

Alternatively, the result can also be displayed on the Common Intelligibility Scale (CIS), which is calculated as: CIS = 1 + log10(STI), where log10 is the base-10 logarithm.
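As a quick illustration of the conversion, assuming the logarithm is taken to base 10 (the function name is hypothetical):

```python
import math

def cis_from_sti(sti):
    """Convert an STI value (0 < STI <= 1) to the Common Intelligibility Scale."""
    if not 0.0 < sti <= 1.0:
        raise ValueError("STI must be greater than 0 and at most 1")
    return 1.0 + math.log10(sti)
```

An STI of 1 maps to a CIS of 1, and an STI of 0.5 maps to a CIS of roughly 0.7.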

Challenges

External, impulsive noises that are present during an STI measurement interfere with the test signal and thus change the result. Therefore, the STI measurement should always take place in the quietest possible environment, i.e. without noise from machines, people, etc.

If the noise is an integral part of the environment, it will change the STI value. It should thus be measured separately and included in the final calculation of the STI result.

How is a STIPA measurement done?

Measure the typical background noise

The background noise level is measured under typical conditions, i.e. in the presence of a crowd. The LAeq is recorded for 30 seconds (or more) and saved in octave resolution. If an unusual, loud noise occurs during this measurement, the measurement must be rejected and repeated.

STI Measurement

The STI measurement itself ideally takes place when the location is empty, e.g. at night.

Note: in certain places - e.g. a smaller train station in the middle of a residential area - it may not be possible to carry out STI measurements at night, as this would disturb the nearby residents. In such cases, the STI measurement takes place during the day, i.e. in the normal operating environment, and no correction of the STI result with a previously-recorded background noise level is necessary.

The STI test signal can be reproduced in two ways:

Electrically, via an audio cable into the existing PA system, e.g. using the MR-PRO signal generator.

(Note: CD or MP3 players are less suitable because their sampling rate can fluctuate or because compression can alter the test signal, which in turn negatively affects the measurement result.)

Acoustically from a dedicated loudspeaker, e.g. the NTi Audio TalkBox, which reproduces the test signal with a calibrated sound level of 60 dB at a distance of 1 m (the normal human speaking level). This solution is used wherever announcements are usually made through a microphone, or in places where the speech signal is not amplified electro-acoustically, e.g. in classrooms.

If a PA system is used for announcements, the next step is to adjust the volume in the public area. The announcement level should be at least 6 dB, and preferably 10-18 dB, above the usual background noise level. Note that if the announcement level is too high (over 80 dB), speech intelligibility will likely decrease.
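These level recommendations can be checked with a few lines of code. The function name and the wording of the messages are hypothetical; the thresholds are the ones given in the paragraph above.

```python
def check_announcement_level(announcement_db, background_db):
    """Return the margin over background noise and any resulting remarks."""
    margin = announcement_db - background_db
    remarks = []
    if margin < 6:
        remarks.append("too quiet: less than 6 dB above background noise")
    elif margin < 10:
        remarks.append("acceptable, but 10-18 dB above background is preferable")
    if announcement_db > 80:
        remarks.append("over 80 dB: intelligibility will likely decrease")
    return margin, remarks

# Announcement at 75 dB over 60 dB background noise: 15 dB margin, no remarks.
margin, remarks = check_announcement_level(75, 60)
```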

Finally, STI values should be taken at several measurement positions, namely wherever people are usually located. The measuring points must be at a reasonable distance from one another in order to obtain a representative result. An individual STIPA measurement takes 15 seconds per position. The measurements are averaged to a single result for the entire room.

Examining the Results

The plausibility of each individual result obtained must be checked. This identifies invalid measurements, e.g. due to impulsive ambient noise. The following errors can occur:

Invalid modulation ratios in the individual octave bands (mr1 or mr2 > 1.3)
Fluctuating level relationships or impulsive conditions during the measurement (detected by comparing the first half of the measurement period with the second)

Note: advanced acoustic analyzers such as the XL2 perform this analysis and display the result automatically.
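The first of these checks is simple to express in code. The sketch below assumes the two modulation ratios per octave band are available as a mapping from band label to an (mr1, mr2) pair; the function name and data layout are hypothetical. The second check is not sketched because the text gives no threshold for it.

```python
MR_LIMIT = 1.3  # limit for a valid modulation ratio, from the list above

def invalid_bands(mr_by_band, limit=MR_LIMIT):
    """Return the labels of all bands where mr1 or mr2 exceeds the limit."""
    return [band for band, (mr1, mr2) in mr_by_band.items()
            if mr1 > limit or mr2 > limit]

# Hypothetical example data for two octave bands.
flagged = invalid_bands({"500 Hz": (0.8, 0.7), "1 kHz": (1.4, 0.9)})
```

Here only the 1 kHz band is flagged, so the measurement would be rejected and repeated.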

Analyzing the Results

The next step is to offset the measured STI results against the spectrum of the usual background noise. There are three methods available for this procedure:

Direct measurement of speech intelligibility STI in a normal operating environment, i.e. in the presence of a crowd (see note in the above section “STI measurement”).
Separately measure the typical ambient noise and include it in the calculation of the STI value.
Manually add a suitable set of predefined ambient noise data values (e.g. according to “Richtlinie des Österreichischen Bundesfeuerwehrverbandes”, TRVB S 1458).

Note: advanced acoustic analyzers such as the XL2 support each of these three methods and automatically calculate and display the result.
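One common way to factor a separately measured noise spectrum into the result is to reduce each modulation ratio by the share of signal intensity in the total intensity of its octave band. The sketch below shows that relationship for a single band; the function name is hypothetical, and whether a given analyzer applies exactly this formula is an assumption.

```python
def noise_corrected_m(m_measured, signal_db, noise_db):
    """Reduce a modulation ratio measured in quiet by the effect of steady noise.

    signal_db: level of the test signal in the octave band
    noise_db:  level of the background noise in the same band
    """
    i_signal = 10.0 ** (signal_db / 10.0)   # level to relative intensity
    i_noise = 10.0 ** (noise_db / 10.0)
    return m_measured * i_signal / (i_signal + i_noise)

# Test signal 10 dB above the noise: m drops by a factor of 10/11.
m_corrected = noise_corrected_m(0.8, signal_db=70.0, noise_db=60.0)
```

When signal and noise levels are equal, the modulation ratio is halved. Once the signal is well above the noise, the correction becomes negligible.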

Averaging

Various standards determine how many times each measurement should be taken. The IEC 60268-16 standard, for example, recommends averaging at least three measured values at each measurement point, when in the presence of background noise. The deviation between any two of these three results must not be greater than 0.03 STI. The German VDE 0833-4 standard, on the other hand, requires a minimum of three measurements only if the first STI value is < 0.63.

Note: advanced acoustic analyzers such as the XL2 can independently perform this calculation and display the results.
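The IEC rule described above, at least three values with no two differing by more than 0.03 STI, can be sketched as follows. The function name is hypothetical, and the small tolerance only guards against floating-point rounding.

```python
def average_sti(values, max_spread=0.03, min_count=3):
    """Average repeated STI readings taken at one measurement point.

    Raises ValueError if there are too few readings or if any two
    readings differ by more than max_spread.
    """
    if len(values) < min_count:
        raise ValueError("at least %d measurements are required" % min_count)
    # The largest difference between any two values is max minus min.
    spread = max(values) - min(values)
    if spread > max_spread + 1e-9:
        raise ValueError("readings differ by more than %.2f STI" % max_spread)
    return sum(values) / len(values)

result = average_sti([0.50, 0.52, 0.51])
```

If the readings are spread too widely, the measurements at that point are repeated instead of averaged.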

Special Consideration

In emergencies, announcers tend to raise their voices and speak louder. This behavior is called the Lombard effect. To cover this situation, the acoustic input of the STI test signal can also be played at a level 10 dB higher. The TalkBox supports this application.

Documentation in accordance with Standards

The last step of a complete speech intelligibility analysis is the creation of a standard-compliant report. This report must take into account the applicable standard, e.g.

AS 1670.4
CEN/TS 54-32:2015
DIN EN 50849:2017
IEC 60268-16
ISO 7240-19:2007
VDE V 0833-4-32:2016
VDE 0828-1:2017-11

Note: the free STI Reporting Tool from NTi Audio covers this requirement by importing the XL2 measurement data and generating a standard-compliant report.

How to improve Speech Intelligibility

Voice Alarm System

Deficiencies in the Voice Alarm System, e.g. distortion, defective components, or incorrectly-wired speakers, can lead to poor speech intelligibility. Identifying these errors requires a suitable signal generator and measuring device for the necessary electrical and acoustic tests. The MR-PRO and the XL2 are ideal for this.

An unfavorable layout of the voice alarm system can also contribute to poor STI values. Too few speakers, for example, can lead to an inhomogeneous sound field with 'holes' in certain areas. To fill these holes, the loudspeakers have to be operated correspondingly louder, which in turn leads to unpleasantly loud sound in other areas. Generally speaking, it is therefore advisable to install more loudspeakers rather than fewer, distributed evenly throughout the room.

Room Acoustics

The acoustic characteristics of the location have an important influence on speech intelligibility. The primary factor here is whether the direct sound to the listener is sufficiently dominant over any sound reflections that may occur. As long as this is the case, no further measures may be necessary. Strong reverberation, however, can impair the intelligibility of the speech. As a countermeasure, we recommend installing sound-absorbing objects such as curtains, carpets, upholstered furniture or special acoustic panels.

Background Noise

If there is a lot of ambient noise, the intelligibility of speech may deteriorate. This can happen if the location is insufficiently shielded from nearby noise sources.

In such cases, it usually helps to install better windows or noise barriers, or to take similar measures that decouple the public area from the external noise source.