from Wm. M. D. Wright
M.I.T. TERM PAPER
AUDIO MARKET NEWS (unpublished)
Date: Sept. 18, 1955
Title: Neurally-modeled Signal Processing
H. E. Edgerton, Dept. of Electrical Engineering
Co-advisor: Dr. G. Lyons, Dept. of Food Technology, Professor of Bio-Physics
To develop a method of signal processing using optical methods modeled after neural methods.
By postulating a set of theoretical black-box models of the human brain, and using an iterative methodology, those models which offer the best fit can be studied in more detail.
I propose using what is known of microwave optics and diffraction as an aid to understanding the possible ways of using optical methods.
I further propose employing optical methods using light of great phase-coherence, perhaps by using a diffraction grating and suitable slit-aperture masks, with a light source of high intensity to enable complex diffraction patterns to be studied. Theoretically, even mono-chromatic images could be created.
If this is possible, optical methods could enable signals to be processed at speeds constrained only by the storage speeds involved.
The models should allow for learning: initially imperfect logic which can be refined by "training" and reverberant reprocessing. It is hoped that, by this process, "creative" neural networks might be enabled using a sort of serendipity. As creativity is a form of pattern-recognition, I also wish, if there is any merit, to expand my project into a thesis.
Wm. Michael D.
Holograms and the brain
The stimulating interview "The challenge of the brain" in the April issue of SCIENCE JOURNAL prompts me to comment on the parallels between the human brain and a holographic system, which have been a continuous source of fascination to me. Theoretical work in this company, in an attempt to devise a unit which would discriminate between sonar return frequency power spectra, led to the consideration of holograms as complex filters. While the ability of a hologram to 'store' a number of pictures is well known - the respective pictures being apparent upon illumination of the hologram with a coherent light source at each of several points pre-determined by the original exposure set-up - it is not generally realized that the process is reversible. Identification of complex inputs is possible by making the hologram produce a virtual source of coherent light at a point corresponding to the location of the real source during the period when the hologram was made.
In theory such a device can be used to
classify complex inputs such as characters, numerals and power spectra.
Inputs from different sources can be combined if necessary to result in
even more complex associations where required. In addition, 'recall'
of inputs could be effected by introducing a coherent light source stimulus
at one of the points where a virtual source was known to exist. Since
the theory of holography applies to waveforms other than light radiation,
such a device would presumably function for these other forms.
As a communications engineer (originally), I could never understand why perfectly good serial format information such as audible sound waves should be broken down into a frequency analysis before being accepted by the brain. Even limitations in bandwidth do not explain the number of analysis components used.
However, as simultaneous presentation
of inputs, rather than serial presentation, is necessary for a holographic
system, this apparent complexity of processing could make sense.
This applies to other inputs
to the brain.
Sleep may be that period, free from outside
stimulus, when the transfer takes place between a 'buffer' memory and the
permanent one, dreams being the 'spill-over' resulting from excitation
of coherent emission source locations. There are many other parallels,
too numerous to detail here, which will be apparent to the person having
knowledge of the two fields.
Perhaps the most fascinating thing about the theory is that the brain is not called upon to be a sort of 'data processor' similar to the modern digital computers. Outside of a limited number of adaptive connections, the bulk of the system can be a simple duplication - although to a fantastic degree - of a somewhat straightforward system element.
WILLIAM M. D. WRIGHT
Vice President of Research and Development,
Hydrospace Developments Ltd.,
Hydrospace Developments Ltd.
September 19, 1967
To the many people who have presented me with masses of refutation, greetings! My schedule is too full to permit me to answer more than a half-dozen of these letters in detail, so I am taking the liberty of covering the important points...
The holographic model fails in isolation. What is necessary is to understand the concept that makes the model "work" - for it is not sufficient just to accept the model; one must also consider the rest of the hypothesis - that of thoughts echoing through the mind, picking up reinforcement as they are propagated - or dying away through lack of that reinforcement. One of the key points is the "trigger" threshold whereby a thought either dies away unborn or is augmented with the passage of time. It is interesting to consider that perhaps the "serial thinkers" have a one-to-one (or limited) relationship between cause and effect, whereas to the parallel thinkers cause and effect are not so obvious (leading to a "richer" field of choices).
One can also speculate, quite usefully as it turns out, as to the differences in the child's upbringing that lead to these disparate personalities. Another clue to the almost self-destructive personality of the extremely creative is the necessary co-existence of the serial and parallel personalities in one person!
As to a possible Model? It occurs to me that we might better start with something simple, such as the perception of sound, and that the obvious way to develop an understanding is to state the obvious problem: that of converting the 20 k cps (more or less) bandwidth to something more manageable. Taking a cue from Gabor and his holograms: as the hologram is the Fourier transform of a complex image, and that hologram is capable of being cut up into pieces, all of which are capable - essentially - of recreating the original image, albeit with reduced detail, and given that much of the auditory information is redundant, the ideal compression mechanism would seem to be based on a Fourier - or simplified Fourier-like - transform (or inverse transform) which could present, to the brain, a parallel "image" which would lend itself to the creation of conjugate foci. These, in turn, could produce other complex conjugate foci; and so on and so on. And I should have added one other clarification which should have been obvious: for "power spectrum", read "log of the power spectrum"!
Is this the quintessence of thought? Perhaps!
I concede that this is an oversimplification, but one has to start somewhere.
William M. D. Wright
Vice-president of Research and Development
Hydrospace Developments Ltd.
Hydrospace Developments Ltd.
October 17th, 1967
Well, I have some new correspondence that I must answer. Again, I'm sending this same letter to all the people who quibbled with my last letter dated Sept. 19th.
The first point I must make is that the hologram need not be confined to the brain or, for that matter, to a flat surface such as I proposed to use in the recognition of sonar returns while at MIT. This was what got me into trouble in the first place, and thank God that Dr. Edgerton was able to 'suggest' that I take a job in Canada until the 'hoo-huraa' died down - and by pulling 'old-boy' strings I was able to contact the (then) Hydro Electric Power Commission of Ontario, where I was assigned to a remote area in Ontario: Whitedog Falls.
There, because of my experience, my work included, among other tasks, scuba diving to inspect the cofferdam. I used the experience gained to start the present company. But I digress.
I made a series of patent disclosures to Rudolf Amann, a patent attorney in Boston (about 1959), in which I disclosed a 'solid crystalline form' of holographic storage, citing the immense amount of information that could be stored (if need be, in a read-only format). I had some thoughts of how this could be made to write to memory as well. Alas, I couldn't locate the needed backing to carry this out.
I also wrote to a scientist at Monash University in Australia about the possibility of using the more sensitive areas of the skin (such as on the back of the hand) to present a low definition 'picture' which would enable a blind person to circumnavigate a room.
Again this was treated with mild derision. I have found out, as Dr. Edgerton told me, that the worst people to vet any new idea are the experts. I remember a story that he used to tell in the Faculty Lounge (where I was his guest) about an expert on vacuums whose expertise grew until he was an expert on nothing! This was his answer to over-specialization. He said that for fifty experts who will explain, in exhaustive detail, why an idea won't work, there will be one who will say, 'let's try it'!
This is why most of the breakthroughs are made by research associates; their boss has the clout to find the funds, and usually takes part of the credit for the result. He insisted, "make a breakthrough in a research project, form a company to locate the backers, and go!"
Now I'll admit that I am not answering some of the letters I have received. But consider the context of your objections. Are you in danger of becoming an 'expert on nothing'?
William M. D. Wright
Vice-president of Research and Development
Hydrospace Developments Ltd.
Predictions for 1977
(This was submitted in November 1976 but, as you can see, it was never used, as it was too gloomy to run).
From my attempts to remain a lighthearted columnist, even though I have lost Watson Labs; once again, I have been asked for my predictions for the coming year. PS: the editor neglected to state “Audio” when he asked me!
Well, here goes! Because I once worked as a student researcher at MIT, a Dr. Lyons got me interested in neurophysics. As one assignment, he had me search for the effects of very low levels of zinc on the soil. First, I discovered that it seems that it is almost added to fertilizers. At the same time he suggested that I look for cancerous-like growths on trees (they were easy to observe). I began with oak burls, a strange knotty growth on oaks. I found that the soil where they grew was severely lacking in zinc.
I have to add that Dr. Lyons was deeply concerned about the overuse of antibiotics, and felt that their overuse would lead to breeding strains of germs that were increasingly resistant to antibiotics. He urged me to investigate the effects of vitamin and mineral deficiencies in areas such as autoimmune diseases like MS and ALS. There was a severe polio outbreak that year. As I had received St. John Ambulance training, I worked as a student orderly in Mass. General on White nine.
It seems that I must mention a story (a serial) I read in the Saturday Evening Post (circa 1948 or ’49) where the plot was based on a wealthy man acquiring vast amounts of land as far away from the coast as possible. I think that his plan was to detonate atomic bombs in arctic areas to raise the oceans' water level, flooding England and New York, for example. He had a map that showed the effect of his plan. It made me think about the consequences of global warming.
I thought that eventually it might lead to shifts in the loading on the earth's crust. In turn there would be an increase in volcanic activity and earthquakes. A terrible concept! I expect that these effects will become evident sometime in 1990 to 1993. It will take at least six years before the consequences will be self-evident!
Still there will always be politicians who will ignore it - and when it is too late will find someone else to blame! I also believe that there will be a centralizing of corporate power. The “bottom line” will override ethics! Nobody will see that at least seventy percent of new jobs will be created by small business! Large corporations will get larger and larger until the lobbyists who are funded by the multinationals will, in effect, elect most governments. I also predict that the large drug companies will spend more on public relations than on R&D. More useless drugs will be pushed through the regulatory bodies and eventually most universities’ R&D will be funded by the very companies that will profit the most. Objectivity will be lost. Peer review will be concentrated on the increasingly conservative, who will be promoted on the basis of the amount of trivia they publish! If this seems cynical, wait for five years.
My forte is to try to connect the seeming aberrations in experiments.
Dr. Edgerton of MIT had a dictum, which served him well. I will attempt to paraphrase it: “Acquire a working knowledge in several fields – because people speak a pseudo-scientific jargon to appear erudite when presenting papers – learn to edit to the bone – look for anomalies – and attempt to change to a common syntax – one of the most important things is to know what questions you have to ask yourself and what to ask the 'experts' – there you will perceive things that the experts have missed because they are too close to the subject – and too prone to spend too much time in attempting to prove someone else wrong rather than continuing with their own work – avoid petty squabbles – don't ever get too distracted – keep on with your own work – and above all else, develop some commercial sense so that you will always have the money to fund yourself!”
There seems to be a trend towards the use of antibiotics (two examples come to mind - dairy farming and, by extension, beef cattle), as well as the use of non-organic pesticides. Some of them act to promote growth by acting as synthetic hormones. Abuse will inevitably follow. We will see an increase in neurological aberrations. It seems possible that someone will attempt to alter the genes, not by selective breeding, where the process can take twenty years (and dangerous results can be stopped), but by R&D-based gene altering. The quickness of this process will make it very attractive to major corporations who see their share prices as the only measure of success. Therefore I think that it is safe to predict that within 15 to 20 years we will see a disaster linked to this technology.
It is not very well known that in the '50s summer agricultural students were hired by the Canadian Government to test a new phosphate-based fertilizer on prairie grain fields. A significant number developed MS. I have attempted to locate, in the list of Government-funded studies, one of the papers I was sent in 1956, as I knew one of the authors (from The University of Manitoba). It is now missing from their archives.
Now for my predictions in audio….
Joe Heusi, who put up some of the funds (as Baron Productions, named after his dachshund) so that TELARC Records were able to press their digital records, had been using Dayton Wright XG-8's for years in a four-channel (Hafler) matrixed system. He asked me to sell Soundstream some of my ultra-low-noise microphone preamps for use in their system. I could not go ahead: although Wright Electroacoustics had done the R&D and I owned the technology, Watson Labs was out of business, and until that was settled my hands were tied. As Soundstream was limited (at the time) to 16 bit coding, I was apprehensive about the loss of some of the low-level sounds which, when they interact acoustically as well as in the brain, inhibit some of the 'processing artifacts' that make music real. Some of these low-level sounds were masked by record noise, but when they were processed in the brain, the listener could mask them and leave the rest as contributory to the realism.
I felt that when the resolution was increased to 20 bits, the recording process would be better. As I felt that a complex Laplace Transform was taking place, some form of bandwidth weighting, such as 'hanning', was at work. Therefore having to use sharp cutoffs as necessary to restrict the passband to an upper limit of 17 kHz was itself introducing strange artifacts into the sound. These were the reason that some of the records sounded 'clean' but 'non-musical'.
I predict that some form of digital recording will be introduced within four-to-ten years, where new technology will allow as much as 24 bit encoding to be used. The results to sound will be great.
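To put rough numbers on the 16, 20 and 24 bit figures mentioned here, the textbook estimate for an ideal quantizer driven by a full-scale sine wave is about 6.02 dB per bit plus 1.76 dB. The sketch below is only that idealized rule of thumb; the function name is made up for illustration, and no real recorder (Soundstream's included) is being modeled.

```python
import math

def ideal_dynamic_range_db(bits: int) -> float:
    """Idealized signal-to-quantization-noise ratio, in dB, of an N-bit
    quantizer for a full-scale sine wave (about 6.02 * N + 1.76 dB)."""
    return 20.0 * math.log10(2.0 ** bits) + 10.0 * math.log10(1.5)

# Compare the word lengths discussed in the text.
for bits in (16, 20, 24):
    print(f"{bits} bits: about {ideal_dynamic_range_db(bits):.1f} dB")
```

Each extra bit buys roughly 6 dB, so going from 16 to 24 bits adds about 48 dB of room below the loudest signal for the low-level sounds in question.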
Until then, some R&D will endeavor to add noise above the 17 kHz cutoff, perhaps by measuring the sound distribution between 6 kHz and 16 kHz, storing and doubling it, and introducing filtered noise above 17 kHz to 'enhance' the digital sound.
Common mistakes when testing Dayton Wright Electrostatic Loudspeakers. Remember that these ESL's are designed to perform properly in the 'far field' (distances over twelve feet). At that distance phase coherence is maximized! Tests in the 'near field' (typically under eighteen to twenty-four inches) are terribly misleading. The differing distance from each set of panels leads to extreme phase cancellation effects and results in an erratic frequency response! Almost no ESL is designed to be used at a two foot distance from the listener's ear! The consequence is that the tweeter level is much too high and the mid-range exhibits erratic frequency response!
So-called 'experts' are often wrong when they go out and acquire expensive test equipment, as they rely too often on 'near field' testing, which is OK for microphone calibration. However, what is critical is how the loudspeaker actually sounds in a listening room. In 1958, we found that there were major discrepancies between measurements and listening tests. This led to an extensive R&D program that didn't correlate until the middle '70s. It wasn't until 1983 that we were finally able to evolve the computerized technique of 'Periodicity Testing'. We took the Power FFT analysis and, by applying the proper 'gating' techniques, stopped or reduced the artifacts that were introduced by the bandwidth 'cut-offs'.
The ultimate result is when any loudspeaker actually sounds like what the program has, in advance, predicted!
We have published some of the criteria.
William M D Wright
THE DAYTON WRIGHT
97 Newkirk Road North
Richmond Hill, Ontario L4C 3G4
THE CEPSTRUM IN LISTENING
Most of us assume that when we listen to music on our Stereo systems, within the limits of our ear's frequency and loudness response, we perceive what we hear. We assume that our ears pick up the sound more or less like microphones, and pass it on to us to listen to as if our heads were some sort of tape recorder. We may suspect that it's a bit more complex than that, but after all "Hearing is believing, isn't it?" Well, not necessarily so, and here is where I have to kick the stuffing out of a lot of the commonly accepted ideas about frequency response, distortion, and phase response! Much of what I will write here is still hypothetical, and while we are responsible for some of this theory, this paper is too small, and its foundations too broad, for me to hope to give credit to all the researchers whose papers I've mined for information.
At this point I don't want to get too involved in the physiology and neurology of the ear and brain, so suffice it to say that over most of the audible range the physical mechanism of the ear acts more like a frequency analyzer than a microphone; the vibration of the eardrum is passed on to a detection device which consists of a long coiled liquid filled tube in which are immersed vibratory detection cilia and their attached nerves. The shortness of the coiled tube together with the speed of sound in the liquid does not reassure us that this apparatus is a pure analyzer, for the tube is just too damn short to handle low notes. These seem to be detected by a combination of standing waves, phase analysis, and, perhaps, actual microphone action.
In order to be sure that I'm understood I must define several concepts, adding comments where I feel they are needed. These terms are not all used in this presentation; where I have included these latter, it is because I feel that they may shed additional light on our subject:
Preliminary Mental Signal Processing:
This is a process that takes place in the brain, where the acoustic signals that we have detected within the inner ear are processed to remove unwanted noise. If you've ever made a tape recording of a dinner party, your efficiency as a human noise filter should have been evident on play-back. It probably sounded as if the guests were smashing the crockery! This processing also encompasses such things as the conversion of the two related-monaural signals from the speakers together with the room ambiance into a 'stereo' image, as well as giving us peripheral information on ambiance, air, transparency etc.
That sound which is perceived AFTER the brain does its preliminary Mental Signal Processing - in other words, what you and I actually think that things sound like.
Masking is any phenomenon that alters the data to which it is applied. We use it in three senses here.
A processing alteration.
This is a form of masking that either leads the listener into thinking he is hearing a more realistic sound (e.g. a synthetic reverberation system) or acts as a mask of unpleasant sounds (e.g. the masking white noise used to cover dining noise in some restaurants).
This is a form of masking that prevents the listener from perceiving all that he should perceive properly (i.e. any sound that either causes his brain to reject part of the sound he hears, or blocks part of the mental Signal Processing). It could be masking that prevents a listener from constructing a stereo image or resolving fine detail in the sound. Another example: instructions given in a noisy environment are liable to be forgotten; too much processing is being devoted to filtering out the noise, and not enough is left to produce the associative 'flags' on the instruction so that it can be remembered.
Algorithm:
A rule or procedure for solving a problem. The basis for problem-solving computer programs.
Periodicity:
A repetitive modulation of the sound which may show up as frequency ripple or phase ripple; sometimes almost imperceptible in phase and frequency plots; sometimes present in such quantities as to make frequency analysis almost useless. It may be due to diffraction or reflection effects, as in a speaker; or it may be due to reflective room acoustic effects such as reverberation. Note that the periodicity may be based on an absolute or on a logarithmic scale.
Fast Fourier Transform:
A Fourier Transform is a mathematical technique (normally performed on a computer, though you could, in theory, do it by hand) used to take a Time-Domain Signal (such as an actual waveform) and Transform it into the Frequency-Domain (so that we can plot its Frequency response). The Fast Fourier Transform is an algorithm that, by doing some bit-swapped data tests and exchanging the positions of the data in the output array, reduces the number of multiplications involved and thus speeds up the otherwise slow process.
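For readers who would like to see the 'bit-swapped' reordering and the saving in multiplications in concrete form, here is a minimal sketch of one common variant, the iterative radix-2 decimation-in-time FFT. It is an illustration only, not necessarily the routine used in our own programs; in this variant the reordering is applied to the data before the butterfly passes rather than to the output array. The function names are made up for the example, and the slow direct transform is included purely as a cross-check.

```python
import cmath
import math

def bit_reverse_permute(data):
    """Return a copy of data with each element moved to its bit-reversed index."""
    n = len(data)
    bits = n.bit_length() - 1
    out = list(data)
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        if j > i:                      # swap each pair only once
            out[i], out[j] = out[j], out[i]
    return out

def fft(signal):
    """Iterative radix-2 FFT; the length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(x) for x in bit_reverse_permute(signal)]
    size = 2
    while size <= n:                   # log2(n) passes of butterflies
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                even = a[start + k]
                odd = a[start + k + half] * w   # one complex multiply per butterfly
                a[start + k] = even + odd
                a[start + k + half] = even - odd
                w *= step
        size *= 2
    return a

def dft(signal):
    """Direct transform with n * n multiplications, used only as a reference."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

# A cosine with exactly 4 cycles in 32 samples should peak in bin 4.
samples = [math.cos(2 * math.pi * 4 * n / 32) for n in range(32)]
spectrum = fft(samples)
strongest = max(range(16), key=lambda k: abs(spectrum[k]))
print("strongest bin:", strongest)
```

For 32 points the direct method needs 1024 complex multiplications, while the butterfly passes need only 80, which is where the speed comes from.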
Cepstrum:
The inverse Fourier Transform of the logarithm of the power spectrum (frequency domain); the Cepstrum is in the Time Domain. When plotted, the vertical axis may be the 'REAL' cepstrum or, if the real and imaginary parts are squared and added, then the square root taken and plotted, the Amplitude Cepstrum. The amplitude at any given position indicates the amount of periodicity, and the time constant of that position indicates how often the periodicity occurs in the Frequency Spectrum that was used as the input data. The horizontal axis of the plot is called Quefrency. The Cepstrum (the name is an anagram of Spectrum, as are all the terms used; for example, Phase is called Saphe) is best remembered as a specialized form of Time-Domain presentation in which periodicities such as reverberation and echoes are much easier to extract. Technically, this is because it is a transform of the log of the power spectrum, and thus effects are additive.
Quefrency:
Quefrency does not represent absolute frequency, but presents information only about frequency spacings or periodicities. For example, an effect which produced a series of reflections having a period of 0.1 mSec could be impossible to extract from the frequency response plot, for the 'side band' activity would be distributed over the entire frequency response; yet this would show up as a line at 0.1 mSec on the cepstral plot.
The fundamental line's amplitude will quantify the amount of periodicity at the 0.1 mSec position, while the distortion of the periodicity would show up as rahmonics (equivalent to harmonics on the frequency plot). Only careful analysis can reveal whether these higher rahmonic lines are distortion or whether they are true additional periodicities (with a potential of rahmonics of their own).
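The 0.1 mSec example can be tried numerically. The sketch below follows the definition given here (inverse transform of the log of the power spectrum) on a deliberately artificial signal: a unit impulse followed by a single half-amplitude reflection 0.1 mSec later. The 100 kHz sample rate, the 256-point length and the reflection strength are assumptions chosen only to make the arithmetic tidy, and a slow direct transform is used so that nothing beyond the standard library is needed.

```python
import cmath
import math

def dft(values, inverse=False):
    """Direct discrete Fourier transform (slow, but self-contained)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out

def real_cepstrum(signal):
    """Inverse transform of the log of the power spectrum."""
    log_power = [math.log(abs(s) ** 2) for s in dft(signal)]
    return [c.real for c in dft(log_power, inverse=True)]

SAMPLE_RATE = 100_000      # Hz, assumed for the example
DELAY = 10                 # samples, i.e. 0.1 mSec at the assumed rate
N = 256

signal = [0.0] * N
signal[0] = 1.0            # direct sound
signal[DELAY] = 0.5        # one reflection at half amplitude (assumed)

cepstrum = real_cepstrum(signal)
half = cepstrum[1:N // 2]  # skip quefrency zero and the mirrored half
peak_index = 1 + max(range(len(half)), key=lambda i: abs(half[i]))
peak_ms = 1000.0 * peak_index / SAMPLE_RATE
print(f"largest cepstral line at {peak_ms:.2f} mSec")
print(f"first rahmonic, at twice that quefrency: {cepstrum[2 * DELAY]:.3f}")
```

In the frequency response the same reflection appears only as a gentle ripple repeating every 10 kHz (the reciprocal of 0.1 mSec), spread over the whole plot; in the cepstrum it collapses to one line, with a smaller rahmonic of opposite sign at 0.2 mSec.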
Logstrum:
From Log Cepstrum. Whereas the Cepstrum is based on an equal frequency interval Spectrum, the Logstrum is based on an equal octave interval spectrum. The Logstrum Lines therefore indicate an octave based periodicity. While the Cepstrum is useful for absolute analysis, the Logstrum is closer in behavior (and misbehavior) to human hearing.
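One way to picture an octave based periodicity is to sample a log-magnitude response at equal octave steps and then transform it. The sketch below is only my reading of the definition, not a published procedure: the response is synthetic (a 3 dB ripple repeating twice per octave), and the 16 points per octave and 8 octave span are assumed values.

```python
import cmath
import math

POINTS_PER_OCTAVE = 16     # assumed sampling density on the octave axis
OCTAVES = 8                # assumed span, e.g. 20 Hz up to 5120 Hz
N = POINTS_PER_OCTAVE * OCTAVES
RIPPLES_PER_OCTAVE = 2     # synthetic ripple: repeats every half octave

# Synthetic log-magnitude response (dB) sampled at equal octave intervals.
log_spectrum = [3.0 * math.cos(2 * math.pi * RIPPLES_PER_OCTAVE * i / POINTS_PER_OCTAVE)
                for i in range(N)]

def inverse_dft_magnitude(values):
    """Magnitude of the inverse discrete Fourier transform (direct method)."""
    n = len(values)
    return [abs(sum(values[k] * cmath.exp(2j * cmath.pi * k * t / n)
                    for k in range(n)) / n)
            for t in range(n)]

logstrum = inverse_dft_magnitude(log_spectrum)
peak = max(range(1, N // 2), key=lambda t: logstrum[t])
period_octaves = N / peak / POINTS_PER_OCTAVE
print(f"logstrum line {peak}: ripple repeats every {period_octaves} octave")
```

On a linear frequency axis the same ripple would be spaced 10 Hz apart near 20 Hz and over 1 kHz apart near 5 kHz, so an equal frequency interval analysis would smear it; on the octave axis it is a single line.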
Depth of Image:
The perceived depth of the stereo image. Some systems with certain types of masking periodicity (which seems to trip up the algorithm used by the brain to 'fuse' the sound from the two speakers) sound 'Flat' physically. They usually have a rather lifeless quality to their sound; even though the left/right location of the instruments is correct there is no sense of depth. In its own way this type of speaker has as many problems as one having high distortion.
The response attributes of sound upon which we make our decisions. Not what we hear, but what we think we hear. What's left after the Preliminary Mental Signal Processing has taken place!
Listening Fatigue:
The more mental signal processing work we have to do to listen, the more tired we get. This is Listening Fatigue. This processing fatigue is most evident when there is an apparent discrepancy between our sensory inputs. At its worst, such as when we are seated in a theater watching a roller-coaster presentation, the discrepancy between what we see and what we feel our bodies are doing can cause nausea. A similar effect is sometimes noted when listening to bad audio systems.
Holographic Processor:
Derived from the terminology of Dennis Gabor's work on holography, a Holographic processor is one in which substantially the whole processor unit is used in every information processing job. Such a system has the property (as does the hologram) that elimination of part of the system merely slows down the processing speed and/or reduces the resolution of the processing. Thus the more jobs that must be processed together in a given time, the lower the resolution of all of them. Interestingly enough, a hologram can be considered as a two-axis Fourier Transform Device.
Scale Distortion:
Distortion of the size of an instrument or group of instruments produced by the speaker system (e.g. a singer with a twelve-foot mouth, or the world's smallest orchestra).
Where the ambiance of reproduction is enhanced, destroyed, or just plain screwed-up by the reproduction system.
Where the instruments seem to exist in a different space from the ambiance; or where the sense of physical space is contrary to the ambiance and reverberation of the instruments. A recording fault not uncommon on popular recordings where an artificial ambiance is added. We must remember that the positioning of a mono-miked instrument between two channels, as done with a Pan-pot on a studio mixer, only involves signal strength changes; phase (time delay) information is not used. While this does give us information that we can use to establish the lateral position of the instrument, the supporting phase information is not present. This causes a less than ideal situation in terms of supporting the mental algorithm used in determination of stereo position and depth.
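The point about the Pan-pot can be shown in a few lines. The sketch below uses a constant-power (sine/cosine) pan law, which is an assumption for illustration; actual mixers use a variety of laws. Whatever the law, the two channels come out as scaled copies of the same samples with no time offset between them.

```python
import math

def pan(sample, position):
    """Split one mono sample between left and right.
    position: 0.0 is hard left, 1.0 is hard right.
    A constant-power law is assumed here for illustration."""
    angle = position * math.pi / 2
    return sample * math.cos(angle), sample * math.sin(angle)

mono = [0.0, 1.0, 0.5, -0.25, 0.8]
POSITION = 0.25                       # a quarter of the way from left to right

panned = [pan(s, POSITION) for s in mono]
left = [l for l, _ in panned]
right = [r for _, r in panned]

# The right channel is the left channel times a fixed gain: a level
# difference only, with no delay (phase) difference between the channels.
gain_ratio = math.tan(POSITION * math.pi / 2)
print("right/left level ratio:", round(gain_ratio, 4))
```

A spaced pair of microphones on a real instrument would also deliver an arrival-time difference between the channels, which is the supporting information the text says is missing.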
A quality which requires a modification of the playback level in order to achieve a perceptual sound level that is satisfactory to the listener. An example would be the playback of sound at a very high level so that distortion in the sound signal could no longer be perceived due to overload of the listener's perceptual capabilities.
The width of the reproduced sound stage as presented in the listening room.
The successful elimination of a central 'hole' in the perceived sound stage. One aim of a good speaker system is to maximize the Sonic Lateralization without losing Stereo Fusion or encountering Scale Distortion.
Perceptual effects which are produced by the presence of sounds of intensity and frequency such that they fool the ear into thinking that other sounds or frequencies are present; due to anomalies in the method of hearing or signal processing.
A technique where the opposite channel's signal is inverted in phase and mixed either electrically or acoustically with the signal to either cancel some degree of crosstalk or augment the separation.
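As a minimal sketch of the idea, assume a simple symmetric model in which each channel picks up a fixed fraction of the other (the 'leak' value and the function names are hypothetical). Mixing in the same fraction of the opposite channel with inverted phase then removes the leaked signal, at the cost of a small loss in level.

```python
def add_crosstalk(left, right, leak):
    """Assumed model: each channel picks up a fraction 'leak' of the other."""
    return left + leak * right, right + leak * left

def cancel_crosstalk(left, right, leak):
    """Mix in the opposite channel, inverted in phase, at the same fraction."""
    return left - leak * right, right - leak * left

LEAK = 0.2                                   # hypothetical crosstalk fraction
source_left, source_right = 1.0, 0.0         # signal in the left channel only

heard_left, heard_right = add_crosstalk(source_left, source_right, LEAK)
fixed_left, fixed_right = cancel_crosstalk(heard_left, heard_right, LEAK)

print("right channel before:", heard_right)  # leaked signal is present
print("right channel after: ", fixed_right)  # leaked signal is cancelled
print("left channel after:  ", fixed_left)   # wanted signal scaled by 1 - LEAK**2
```

Using a larger inverted fraction than the actual leak would go past cancellation and start to augment the separation, which is the second use mentioned in the definition.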
A technique used to make a pair of speakers sound as if they represented a distributed source. This is necessary when two speakers are used to 'trick' the listener into thinking he/she is hearing true multiple source stereo. The technique is based on the measurement of the difference in both phase response and in frequency response between a distributed source and a 25 degree off center axis point source, both measured by an 'ear-shape-masked' microphone. It has been found that these frequency and phase differences, if used as compensation factors that are designed into both channels of a stereo system (either in the electronic signal path or in the speakers), will cause the listener to perceive two speakers as having the same pink noise response as a uniformly distributed source of sound played through a 'flat' system.
The technique has been anticipated by some manufacturers who, based on their own experiments, felt that a slight midrange dip led to a more natural sound.
A technique of data weighting used to reduce aliasing or leakage in an FFT or IFFT.
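A small numerical illustration of data weighting: the sketch below analyses a tone that does not complete a whole number of cycles in the record, first with no weighting and then with a 'hanning' (Hann) weighting, and compares how much of the tone leaks into a bin well away from it. The record length, tone frequency and the bin inspected are arbitrary choices for the example.

```python
import cmath
import math

N = 64
CYCLES = 10.5              # deliberately not a whole number of cycles
FAR_BIN = 25               # a bin well away from the tone, chosen arbitrarily

tone = [math.cos(2 * math.pi * CYCLES * n / N) for n in range(N)]
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def dft_magnitude(values, k):
    """Magnitude of a single bin of the discrete Fourier transform."""
    n = len(values)
    return abs(sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t in range(n)))

unweighted_leak = dft_magnitude(tone, FAR_BIN)
weighted_leak = dft_magnitude([x * w for x, w in zip(tone, hann)], FAR_BIN)

print("leakage without weighting:", unweighted_leak)
print("leakage with hanning:     ", weighted_leak)
```

The weighting tapers the ends of the record toward zero, so the abrupt start and stop no longer spray energy across the spectrum; the price is a somewhat broader main peak.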
The mathematician's name for aliasing.
The production of artifacts by a periodic process where the observer's windows, each seeing only a small portion of the data, produce a waveform totally unlike the true waveform being sampled. Often of interest in modeling because apparently irrational behavior in the system being modeled may be the result of aliasing effects.
The frequency response of our ears varies with the location and size of sound in relation to the ear. Given a reasonably broad spectrum noise, we can usually localize it and judge its size by a combination of apparent spectral distribution (it also helps if the sound source is in motion) and by the ear's apparent sensitivity to phase ripple and convolution effects in the frequency response.
Point sources such as are encountered (more or less) in the two sources we
listen to for Stereo are not perceived as having the same frequency response
as a horizontally distributed sound source. The idea of "Flat Frequency
Response" is not so important as "Perceived Flat Frequency Response" and
in this I refer only to the speaker response, not to that of the room (that's
another matter entirely). This localization effect is psychoacoustically
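The classic numerical case of this effect is a tone sampled too slowly. In the sketch below (the 1000 Hz sample rate and the two tone frequencies are arbitrary choices), a 900 Hz cosine and a 100 Hz cosine produce the same set of samples, so an observer who sees only the samples cannot tell them apart.

```python
import math

SAMPLE_RATE = 1000.0       # Hz, assumed for the example

def sample_tone(freq_hz, count):
    """Sample a unit cosine of the given frequency at SAMPLE_RATE."""
    return [math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(count)]

high = sample_tone(900.0, 50)   # above half the sample rate
low = sample_tone(100.0, 50)    # its alias, at 1000 - 900 Hz

max_difference = max(abs(a - b) for a, b in zip(high, low))
print("largest difference between the two sets of samples:", max_difference)
```

The difference is zero apart from rounding error: the 900 Hz waveform, seen only at the sampling instants, looks like a waveform totally unlike the true one.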
enhanced in the mid-range. It might have been a survival factor when
men were cast in the role of hunters. Then it was vitally important
to locate your prey before it located you; people without this facility
were often eaten which effectively removed their genes from the gene pool!
It has been suggested that the higher mid-frequency balance point of women
might have been a survival trait where it was necessary for the women quickly
to locate the higher pitched voices of children.
This, however, is sheer speculation. By using a soft silicone casting that resembles the human ear (and the head behind it) and mounting a microphone in it, the effect of varying the sound arrival angle can be studied in terms o Frequency and Phase. We find that, for example, the aggregate sound of a row of identical speakers distributed through a visual angle of 50 degrees produces a different frequency and phase response than a single speaker at a angle to the 'heads' center line, of 25 degrees. Reason tells us that an attempt to produce a stereo image from two such single source speakers at a angle of plus and minus 25 degrees to the center line through the 'head' may fall short of reality because the two speaker set-up will result in different frequency and phase response in the listener's hearing from that of a real source distributed between these same points.
In order for a pair of speakers to enable a listener to 'Fake' a satisfactory Stereo image (more on this later), the speaker actually has to be designed to have a frequency and phase response that performs this explicit de-localization effect. In its most simple form this involves the engineering-in of a slight dip in the mid range, both in frequency and phase response, of the loudspeakers. (This is not the same kind of ‘Dip’ often found in and around audio clubs and stores!) Listening tests have apparently demonstrated that, at least initially, the closer the compensation lies to the ideal (which at best is an average) the more 'Natural' the sound seems to be to the listeners.
It might be argued that absolutely flat frequency response is best only for the reproduction of monaural material, and then only when the listener is directly on-axis with the loudspeaker.
Some other factors which are important in sound perception and the elimination of Listening Fatigue seem to have been forgotten by a few loudspeaker designers. For example, it is important that the system's response be balanced about a mid-frequency point which is somewhat higher for women than for men. This refers to the by now often ignored idea that the product of the low and high-frequency half-power points (in Hz) should be about 400,000 for men and 500,000 for women. Otherwise the speaker will sound either bass heavy or shrill.
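As a rough illustration of the product rule just described, the short Python sketch below computes the balance point (the geometric mean of the two half-power frequencies) and the upper half-power point that would pair with a given lower one. The function names and the 40 Hz example are illustrative assumptions, not part of the original work; only the 400,000 and 500,000 figures come from the text.

```python
import math

# Target products of the low and high half-power frequencies (Hz * Hz),
# as quoted in the text.
TARGET_PRODUCT = {"men": 400_000.0, "women": 500_000.0}


def balance_point(product):
    """Geometric mean of the two half-power points, in Hz."""
    return math.sqrt(product)


def matching_high_cutoff(low_hz, product):
    """Upper half-power point that satisfies low * high = product."""
    return product / low_hz


if __name__ == "__main__":
    for listener, product in TARGET_PRODUCT.items():
        print(listener, "balance point:", round(balance_point(product)), "Hz")
    # Hypothetical example: a system that is 3 dB down at 40 Hz.
    print("upper half-power point:", matching_high_cutoff(40.0, TARGET_PRODUCT["men"]), "Hz")
```

Under these figures the balance point falls in the region of 630 to 710 Hz, which is in the same neighborhood as the central frequency of about 650 Hz mentioned in this article.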
The brain appears to perform its processing job balanced about this central frequency, and it is interesting to note that some experiments have demonstrated, with varying degrees of success, that you can tolerate a peak in the low mid-range if a similar peak is placed in the upper mid-range so that the two of them are symmetrically located about the aforementioned balance point. Now, I am NOT advocating the introduction of peaks in the response, just explaining some of their weird effects. Thus, the addition of the corresponding conjunctive peak to the system having the original peak can make it sound better than when it is played with a response having only one of the peaks!
We have speculated as to the reason for the central frequency being where
it is - the only hypothesis that seems to make any sense is that it is
the frequency (about 650 Hz) where a sound source, located to the side
of a listener, will produce the maximum phase shift between the left and
right ears. While larger phase shifts will be produced at higher
frequencies, there will be some possible redundancy as for example, a 300
degree positive shift might be a 60 degree negative shift. At lower
frequencies the usefulness of the phase information may be increasingly
limited by the apparent fact that low frequencies are determined more by
the periodicity of the Logstrum than by actual frequency analysis in the
ear itself. It is interesting that this frequency also seems to be
a half-power point for the Fletcher-Munson curves.
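The phase ambiguity argument above can be sketched numerically. The Python fragment below assumes an interaural time delay of 0.77 ms for a source directly to one side; that value is an assumption chosen for illustration (real heads vary) and is not taken from the text. With it, the delay equals half a period near 650 Hz, and above that frequency the folded phase no longer identifies the delay uniquely.

```python
# Assumed interaural time delay for a source directly to one side (seconds).
# This figure is illustrative only; it is not given in the article.
ASSUMED_ITD_S = 0.77e-3


def fold_deg(angle_deg):
    """Fold any phase angle into the range (-180, 180] degrees."""
    wrapped = angle_deg % 360.0
    return wrapped - 360.0 if wrapped > 180.0 else wrapped


def raw_phase_deg(freq_hz, itd_s=ASSUMED_ITD_S):
    """Unwrapped phase difference between the ears, in degrees."""
    return 360.0 * freq_hz * itd_s


def wrapped_phase_deg(freq_hz, itd_s=ASSUMED_ITD_S):
    """Phase difference after folding, which is all a phase detector can see."""
    return fold_deg(raw_phase_deg(freq_hz, itd_s))


def max_unambiguous_freq_hz(itd_s=ASSUMED_ITD_S):
    """Frequency at which the delay equals half a period (180 degrees)."""
    return 1.0 / (2.0 * itd_s)


if __name__ == "__main__":
    print("300 degrees folds to", fold_deg(300.0))
    print("half-period frequency:", round(max_unambiguous_freq_hz()), "Hz")
    for f in (200.0, 650.0, 1000.0, 2000.0):
        print(f, "Hz ->", round(wrapped_phase_deg(f), 1), "degrees")
```

The first print reproduces the example in the text: a 300 degree positive shift cannot be told apart from a 60 degree negative shift.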
Yet other factors exist which are not so well understood, and few have been formally examined. For example, several years ago we noticed, quite by accident, that the use of a low-distortion sub-woofer and woofer seemed to leave the treble much cleaner; while this might well be part of this conjugate frequency effect I've mentioned, it seems more likely that the FFT Process used to detect low frequency sounds and the 'Leakage' artifacts from that process (where distortion is involved) hamper the detection and/or reprocessing of higher frequencies.
The masking effect was sufficiently prominent that we spent a lot of R&D effort on cleaning up the low-frequency distortion of our speakers, especially near, at, and below their low-frequency resonances. As the compliance controls the action of the cone below and at the resonance (and above the resonance as well, although its effects drop off very rapidly with increasing frequency), it is very important to have a linear suspension system, whether it be mechanical, pneumatic, or, as in the case of speakers, a combination of the two. Thus our use of a high specific heat gas, bagged inside the low frequency sections of our speakers, is beneficial to the perceived high frequency characteristics as well as to the more obvious low frequency distortion. This gas is a MUCH more linear spring than air! A restoring force (as provided by the compliance/suspension) IS necessary in order to center the cone; otherwise the cone would fall out of the speaker.
To understand some of the material we have developed, it is necessary to go into the way we think that the human brain functions. I should mention that I was fortunate in being able to draw on some developmental work on the holographic model of the brain in which I participated in 1964 and 1967 (see Science Journal in, I think, August of that year).
The brain is, so far as our models are concerned, what might be termed a Holographic Co-processor. It is the convolutions in the outer surface of the brain (which substantially increase the area available for nerve terminations from all our senses) which allow us the potentially high degree of sensory resolution that we enjoy. Under this theory of brain action, the sensory information is presented over a large area, and stimuli radiate from that area forming complex patterns and foci within the goop (containing the synaptic junctions) that lies within. Two forms of storage take place: a temporary one (Short-term memory) which involves a complex change in the chemical balance within each synaptic gap (thereby affecting the necessary potential before a signal is passed across that gap), and an even more complex one that translates this new chemical balance into the chain molecule ending (that determines the contents of our less accessible Long-term memory), we think, during sleep. Interestingly enough, stimulation of one of these foci can, by reverse-re-focussing, produce a very strong recall of past events! The memory is NOT stored at that specific point; instead, the point is just a focus caused by the events leading to that memory, and in re-radiating through stimulation it excites a series of foci similar to those produced by the original event.
The complex patterns that are produced within this network of synapses serve to "focus" energy into yet other patterns; excite one part of these stored holograms by a particular sensory input, and you run the risk of evoking the same sort of ghost images that have plagued attempts of researchers to use holograms as complex filters. It is these ghost images that we call associative memory! Needless to say, the more stimuli or file references we can supply, the more sources are involved in producing complex foci, the greater the possibility that we will perceive the resulting aggregate foci, and thus the easier it is to remember things! Most memory aids are based on reinforcing the memory through associations or mental cross referencing! There is some feeling that the cyclic electrical activity we can measure is the sum of the activity in the individual synaptic locations as the chain molecules of long term memory are 'read' to provide some modification of the synaptic responses. When certain thought activity ceases, localized modulation of the waves is reduced and thus the waves are more easily perceptible, just as gentle waves are perceived on the surface of a pond only when the pond itself is otherwise still. An alternative is that the brain waves are the result of the inverse transformation of a single line in a periodicity, which is observable only when the source data plane is at rest. This is not wholly satisfactory as it does not explain the extent of the observed phenomenon.
I have always been fascinated with the amount of information that the brain can derive from a sound. The fact that several speakers which to all intents and purposes measure identically can sound so different simply convinced me that either we were not making the correct measurements or we were not processing our data to the same degree or in the same way as was being done by our brains! And this being the case, we were probably not even aware of many of the really important factors in speaker design.
For example, in 1977 I designed a three way loudspeaker which had sufficient phase accuracy that a quite good square wave could be produced anywhere between the frequencies of 300 Hz and 5000 Hz (where the loss of the high harmonics beyond 25 kHz caused the waveform to degenerate). I was rather disappointed at how little beneficial effect this phase coherency had on the sound. It was a good speaker, but hardly revolutionary even though the distortion was much lower and the frequency response much flatter than most contemporary designs. Thus I find the emphasis of some technical writers not only misplaced but most annoying, for present advertisers' policies notwithstanding, distortion, frequency response, and phase response by themselves are not as important as designers would have you believe. Fortunately for our company's present position, I realized at that time that I must be trying to improve the wrong factors in speaker design; what I should be doing was trying to find out what the correct factors were and what the correct weighting was for those factors. In other words, how important was each of the different factors in designing a loudspeaker? Indeed, did we even know what all these factors were?
Were we placing too much emphasis on the wrong factors and thereby, (for everything costs money) preventing ourselves from investigating and solving more vital (if not clearly understood) problems?
In 1979 we began an investigation into periodicity effects, phase noise, and phase ripple. This moved into the lab stage in 1980 and involved the use of a Z-80 based computer to extract the Cepstrum plot from the frequency response data that we had obtained. This was an arduous process because of the resolution that we needed and the number of data points that had to be entered manually into the computer. Due to the size of the arrays we had to process, the program itself ran very s-l-o-w-l-y! This was a result of the amount of number crunching and the necessity of using a floppy disc as virtual memory. After we had measured a number of loudspeakers we discovered that there were interesting similarities in some aspects of their Cepstral plots! Knowing that Cepstral plots also yield a great deal of information on acoustic decay and the acoustics of a room, it was not too hard a step to deduce that the listener was somehow using a similar mental (although holographic) algorithm to extract periodic information from sound and then using this information to judge size, timbre etc. We felt that this might be one of the missing elements in correlating electroacoustic measurements with the perceived sound of loudspeakers.
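For readers unfamiliar with the Cepstrum, the following minimal Python sketch (using NumPy) shows the general idea under the common definition: the inverse FFT of the log magnitude spectrum. It is not the program described above, which is not reproduced in this article. The test signal, an impulse followed by a single reflection, and all function names are hypothetical; they stand in for a measured response with one periodic ripple in it.

```python
import numpy as np


def real_cepstrum(signal):
    """Inverse FFT of the log magnitude spectrum of a real signal."""
    spectrum = np.fft.rfft(signal)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small floor avoids log(0)
    return np.fft.irfft(log_mag, n=len(signal))


def echo_response(n_samples, delay, gain):
    """Hypothetical test signal: a unit impulse plus one delayed reflection."""
    h = np.zeros(n_samples)
    h[0] = 1.0
    h[delay] = gain
    return h


def strongest_quefrency(cepstrum, min_q=1):
    """Index of the largest line in the first half of the plot, skipping 0."""
    half = len(cepstrum) // 2
    return min_q + int(np.argmax(cepstrum[min_q:half]))


if __name__ == "__main__":
    response = echo_response(n_samples=2048, delay=100, gain=0.5)
    cepstrum = real_cepstrum(response)
    print("strongest quefrency line at sample", strongest_quefrency(cepstrum))
```

The reflection puts a regular ripple into the frequency response, and that ripple should show up as a line at a quefrency equal to the reflection delay, with weaker rahmonics at its multiples.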
Far from satisfied with the results we were obtaining on an equal frequency base, we resorted to a log distribution frequency response, and found that this was far more illuminating. As an analysis tool it was not as satisfactory, but as I noted, the Logstrum behaved and misbehaved more like the statistically based model of human hearing with which we were familiar. The equal-frequency-span Spectrum did not explain the octave phenomenon in human hearing, nor did it offer any wholly satisfactory explanation of the level augmenting artifacts of the moderately higher order harmonics. Indeed, it did not seem to indicate why high order harmonics should be so dissonant. Using Occam's razor to slice away the complexity of the explanations needed by the Cepstrum, we therefore adopted the Logstrum as the basis for our model of human hearing, and altered our programs accordingly.
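The article does not spell out how the Logstrum is computed, so the Python sketch below is only one plausible reading: the log magnitude response is resampled onto a grid that is equally spaced in octaves, and that resampled curve is then transformed. The function names, the Hann window, the 20 Hz to 20 kHz span and the "cycles per octave" axis are all assumptions made for illustration.

```python
import numpy as np


def logstrum(freqs_hz, magnitude, n_lines=1024, f_lo=20.0, f_hi=20_000.0):
    """Sketch: transform of the log magnitude sampled on a log-frequency grid.

    Returns (cycles_per_octave, line_magnitudes).
    """
    log_grid = np.linspace(np.log2(f_lo), np.log2(f_hi), n_lines)
    log_mag = np.log(np.asarray(magnitude, dtype=float) + 1e-12)
    # Resample so that the points are equally spaced in octaves.
    resampled = np.interp(log_grid, np.log2(freqs_hz), log_mag)
    resampled = resampled - resampled.mean()  # drop the constant term
    lines = np.abs(np.fft.rfft(resampled * np.hanning(n_lines)))
    octaves_per_step = log_grid[1] - log_grid[0]
    cycles_per_octave = np.fft.rfftfreq(n_lines, d=octaves_per_step)
    return cycles_per_octave, lines


def rippled_response(freqs_hz, ripples_per_octave=1.0, depth=0.3):
    """Hypothetical response whose ripple repeats once per octave."""
    phase = 2 * np.pi * ripples_per_octave * np.log2(freqs_hz)
    return np.exp(depth * np.cos(phase))


if __name__ == "__main__":
    freqs = np.linspace(20.0, 20_000.0, 20_000)
    cpo, lines = logstrum(freqs, rippled_response(freqs))
    print("strongest line near", cpo[np.argmax(lines)], "cycles per octave")
```

A ripple that repeats every octave is not periodic on a linear frequency axis, so it smears across many lines of an ordinary Cepstrum, but on the log axis it collapses to a single line. That is the sense in which a log-frequency analysis can make octave relationships obvious.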
But still we were dragged down by processor speed. On high resolution transforms we often felt that we had forgotten the question by the time we succeeded in determining the answer. Towards solving this problem, we acquired a more powerful 8 MHz MC68000 based computer. We have spent several man months in writing programs to perform both high resolution Cepstral/Logstral analysis and to extract the significant quefrency (queflency for the Logstrum) lines. The size of the numeric arrays involved (and therefore the resolution of the program) is limited only by the available memory of the computer we use (which could be expanded up to 8 megabytes) and by the not inconsiderable time necessary to perform the FFT and IFFT algorithms. What formerly took several hours is now available in several minutes. A 2048 line Cepstrum can be produced in less than 4 minutes including the data acquisition. The increased amount of sampling needed to obtain the Logarithmic distribution Spectrum adds some 3 minutes to this for the Logstrum. In both cases we are using an FFT based spectrum analyzer under computer control, and a programmable function generator also under computer control. The desired band is divided up into a number of 256 line segments, and the generator produces a train of pulses whose width and repetition rate have been optimized for the highest possible signal-to-noise ratio in the Analyzer within the actual frequency pass band used in the respective segment. The computer compensates for both amplitude and phase ripple in the analyzer's digital filtering, as well as adjusting for pass-band level and curvature. The segments when added together produce a linear analysis encompassing the range of hearing as the basis for the Cepstrum, and an equivalent log distribution of lines as the Spectral basis for the Logstrum.
Some plots of as many as 18,384 lines have been run using 'borrowed' memory boards, but the run time is too reminiscent of the much lower resolution runs made with the Z-80 based computer. We are currently investigating the feasibility of adding an array processor to speed up the actual FFT process from tens of minutes (for an 18 Kilo-line FFT) to tens of milliseconds.
In addition, as an aid to separation of fundamentals from rahmonics in the Cepstrum and the lahmonics in the Logstrum, we are investigating the use of further processing; that is, taking the Cepstrum of the Logstrum in order to obtain the periodicity of the Logstral lines. Some investigation suggests that the systematic discrete re-transformation of the principal Trans-Cepstral lines can be used as a guide to de-composition of the Logstrum itself, considerably assisting the extraction of meaningful periodicities from the Logstrum. But let us return to the problems of speaker design.
Investigation of a series of cabinet designs showed that although the original work by Dr. Olson of RCA on cabinet shapes and their effects on frequency response was illuminating, equalizing out the frequency aberrations he noted did not get rid of the characteristic colorations to the degree that might have been expected. In furtherance of our investigations we built two small speakers, both with the same (matched) drivers and well damped (and identical) enclosures, but one had as its front panel a four by eight sheet of plywood. That one sounded like a large source, which didn't surprise anybody. But when a frame (of frontal dimensions equivalent to that of the smaller speaker) was placed on the larger baffle board, its apparent sonic size as judged in listening tests was almost that (similar enough to be confusing) of the smaller speaker. The frame's discontinuity was introducing the sort of periodicity effects that the ear uses to judge size, and these could be measured! Thus we felt that we now had a valuable clue as to how the listener judged the size of speakers. Even when the height of the frame was reduced to the point where plottable frequency anomalies were virtually eliminated, the ear could still make a size judgment which we felt was based on the still measurable periodicity effect. The use of Cepstrum and Logstrum analysis showed (especially easily in the case of the latter) that so long as the periodicity lines were above the 'grass' on the plot, the ear could hear them.
But we found in our work on periodicity analysis that while the locations and 'harmonic' (rahmonic) relationships of the quefrency lines were important, the locations of the queflency lines were much more interesting. The very misbehavior of the Logstrum that made it less desirable as an analytical tool for communications system analysis made it fascinating for study as a possible model of human hearing behavior. The octave relationship is obvious in the Logstrum but it is the relationship of the lahmonics that most closely resembles the peculiarities of human hearing. When the harmonics have the 2^n relationship to the fundamental, they will be summed in the Logstrum into a single line. Thus the fundamental, 2nd, 4th, 8th etc. harmonics are summed under one queflency line. The simple exception lines such as those resulting from the 3rd and 5th harmonics have a simple mathematical relationship to the main line, but above this point the relationship on the Logstrum becomes much more aperiodic, to a similar degree as the 'dissonance' of the harmonic. This is probably why they are much easier to pick out as distortion components. Thus the experimentation seemed to indicate that when the lines are related by simple ratios, the ear judged the quality of sound that resulted in these lines as pleasant. Other timbres which produce an anharmonic relationship were found to have much more character, but given a sufficiently complex ratio, were also found to be harsh or dissonant. Our theory is that the more simple the relationship between major lines, the easier it is for our brains to correlate the lines in further processing, and as it seems that ease of correlation is, at the very least, an aid to avoiding fatigue, we classify these oddballs as a form of noise as a possible guide to avoidance! Now this is a difficult technique to use, as lahmonics of one signal periodicity may coincide with and thus mask the true 'fundamentals' of other signal periodicities.
This is why a decomposition algorithm is of considerable interest to us at the present time.
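The grouping of harmonics described above follows from simple arithmetic on a log-frequency axis: harmonic n sits log2(n) octaves above the fundamental. The small Python sketch below (function names are illustrative) lists where the first sixteen harmonics fall. Harmonics 1, 2, 4, 8 and 16 land exactly on whole octaves and so share one regular spacing, while the 3rd, 5th and higher odd harmonics land at fractional offsets.

```python
import math


def octave_offset(harmonic):
    """Position of a harmonic above the fundamental, in octaves."""
    return math.log2(harmonic)


def fractional_offset(harmonic):
    """Distance past the nearest lower whole octave (0 <= x < 1)."""
    return octave_offset(harmonic) % 1.0


if __name__ == "__main__":
    for n in range(1, 17):
        tag = "on an octave line" if fractional_offset(n) == 0.0 else ""
        print(f"harmonic {n:2d}: {octave_offset(n):.3f} octaves {tag}")
```

Note that the 3rd, 6th and 12th harmonics share one fractional offset (about 0.585 octave) and the 5th and 10th share another (about 0.322 octave), so each odd harmonic starts its own octave-spaced family. As the harmonic number rises, more of these families crowd between the octave lines, which may be one way of picturing the increasingly aperiodic relationship mentioned above.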
One can speculate that much in the same way as the eye is subject to the enhancements of choiescence in painting (the deliberate use at a sub-detail scale of strongly contrasting colors in order to add inner luminosity to the painting), so is our hearing subject to interpreting certain periodicities as giving a special quality of sonority to a sound, a quality we think of as pleasing to the ear. Through the use of closely although not identically tuned strings, a piano produces complex periodicity effects. In this way the piano gains sonority. We have not yet had the opportunity to examine the resultant periodicity plots to see why this should be, but it might be that it elicits a blurring of the following conjugate foci and thus elicits further generalized memories, much in the same way that some euphoria is produced by intoxicants.
And that once again brings up Co-Processing. I have made the point that the way a system sounds when you're exhausted is probably the way it really sounds. Otherwise you're doing a hell of a lot of signal processing to make that signal acceptable; this occupies processor capability that would be better occupied in the 'end process' of listening, that is, the process of correlating the sound, with its complex timbre, ambiance, directionality, and such, to the musical experience to which we are supposed to be listening! Not only does this overhead cause fatigue, but it reduces your enjoyment.
Listening tests also suggested that the presence of a relatively small amount of phase ripple (which was apparent in the Cepstral plot) resulted in a diminished and confused Lateralization and poor Stereo Fusion. The permissible physical separation of the loudspeakers (before Stereo fusion was lost) was adversely affected by some induced periodicity effects. The slight ripple apparently tripped up the brain's ability to process the signal for stereo; it apparently introduced phase noise at variance with the phase information needed to maintain perceived stereo fusion. One possible explanation is tied in with the way the ear/brain separates sound sources vertically, as well as separating front of head sounds from rear of head sounds. The phase differences here are much more subtle, although the frequency differences are easier to measure. It may well be that the introduction of the phase and frequency noise is enough to create a degree of uncertainty in the resolution of front-rear discrimination, and therefore the image drops out of fusion. We also noticed that systems with huge phase errors as well as large amounts of phase ripple produced, in many rooms, Benign Masking to the point that we called it the 'Airline Terminal Effect'. And, perhaps justifiably, when we found this in a speaker design, we chose to regard it as terminal indeed!
We were also surprised by the degree to which sub-sonics played a part in ambiance. The use of sub-sonic filtering on some string quartet source material definitely produced a sense of Ambiance Disassociation, even though no instrument in the quartet had a range extending within an octave of the cut-off point of the filter. These are clearly not Perceptual Artifacts, although they may be related to the same sort of hearing-signal processing functions as false bass. Because the ear's low-frequency hearing would appear to be done partly as a result of a Fourier Transform Function, we would expect that the use of the harmonics only could cause some of the same type of periodicity patterning as the full content, and this would appear to be so. Thus the mind, if not the body, can be fooled into perceiving a low note that is not there, provided certain of the low note's normal harmonics are present. This effect has been noted in organ music where a Diapason chorus can appear to generate a fundamental below the range of the rank of pipes being employed. However, in the case of the missing ambiance it would appear that there are some low-frequency periodicities associated with or produced by (perhaps through the transient excitation of low frequency resonances) the concert hall that were eliminated by filtering. Although the effect was subtle, we could detect their absence!
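The false bass effect described above can be illustrated with a short Python sketch using NumPy. It builds a tone from the 2nd, 3rd and 4th harmonics of 50 Hz only, with no energy at 50 Hz itself, and then looks for the dominant repetition period. Autocorrelation is used here as a stand-in periodicity detector; it is a substitute chosen for simplicity, not the Cepstral or Logstral processing discussed in this article, and the sample rate, duration and function names are illustrative assumptions.

```python
import numpy as np


def harmonic_tone(f0_hz, harmonics, sample_rate_hz, duration_s):
    """Sum of equal-amplitude cosine harmonics of f0_hz."""
    t = np.arange(int(sample_rate_hz * duration_s)) / sample_rate_hz
    return sum(np.cos(2 * np.pi * f0_hz * h * t) for h in harmonics)


def dominant_period_s(signal, sample_rate_hz, min_period_s, max_period_s):
    """Lag of the strongest autocorrelation peak within the search range."""
    # Zero-padding to twice the length gives a linear (not circular) result.
    spectrum = np.fft.rfft(signal, n=2 * len(signal))
    autocorr = np.fft.irfft(np.abs(spectrum) ** 2)[: len(signal)]
    lo = int(min_period_s * sample_rate_hz)
    hi = int(max_period_s * sample_rate_hz)
    return (lo + int(np.argmax(autocorr[lo:hi]))) / sample_rate_hz


if __name__ == "__main__":
    # Harmonics 2, 3 and 4 of 50 Hz: components at 100, 150 and 200 Hz only.
    tone = harmonic_tone(50.0, (2, 3, 4), sample_rate_hz=8000, duration_s=0.5)
    period = dominant_period_s(tone, 8000, min_period_s=0.005, max_period_s=0.030)
    print("dominant period:", period, "s, i.e.", 1.0 / period, "Hz")
```

Although nothing in the signal sits at 50 Hz, the waveform should still repeat every 20 ms, which is the period of the absent fundamental. Any detector that works on periodicity rather than on the presence of a spectral line would therefore report the low note.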
An argument has been made that any sort of sub-woofer can be used, as the ear can so easily be fooled into accepting the resultant sound as genuine; however, reference has already been made to the adverse effect such a distorted substitution can have on the high frequency perceptual response. This is not a problem in musical sound generation or production, but it IS a problem in reproduction. Again, the presence of dissonant harmonic distortion from a subwoofer probably requires a degree of short term masking to counteract, such that either the processing overhead is too great for good high frequency perception, or some sort of gross discrimination-threshold level shift takes place which unbalances the high frequency perception.
To sum things up, we came to the conclusion that not only could the unwanted presence of false periodicities introduce Adverse Masking, but the elimination of true periodicities could result in a reduction of ambiance. We felt that this loss of stereo-relational information placed an additional signal-processing burden on the brain to the point where Imaging was damaged, Stereo Fusion hampered, and Scale Distortion encountered; in addition, the increased initial signal processing overhead caused a loss of perceived sonic detail and timbre.
Consideration of these effects allowed us to establish a simple criterion for speaker design: design a system that requires too much preliminary signal processing by the brain and it will be tiring, and NO MATTER HOW LITTLE HARMONIC DISTORTION THE SPEAKER SYSTEM MAY HAVE, NO MATTER HOW FLAT ITS FREQUENCY RESPONSE, OR HOW PHASE COHERENT THE SPEAKER SYSTEM MAY BE, the brain will only be able to perceive a fraction of the sonic detail produced by that speaker! And the listener will suffer from listening fatigue!
What we then attempted to do was to design a simple low-distortion system with flat frequency and flat phase response (within the limits of the aforementioned de-localization modifications) that had minimal undesirable periodicity effects. Queflency line bunching and positional analysis from the Logstrum plot were employed to try to rationalize what it was that we felt we perceived during the listening tests.
We found that even if we could not eliminate some of the adverse lines, we could often move them into more 'harmonious' positions or, at worst, mask them with stronger lines that were deliberately introduced. And we also determined that a lot of rule of thumb design generally regarded as gospel by speaker designers just isn't all that important after all! Too often, in the rush for flat amplitude and phase response, some secondary grunch is cleaned up; it is ironic that in many cases it was this secondary grunch that was more important and which should itself have been the direct target of the clean up!
Now I find it interesting that in judging ambiance and 'air', the brain apparently pays a great deal of attention to periodicity effect related phenomena. Often, all that is necessary to add 'air' to a speaker is the use of an off-axis driver which, by adding some room acoustic reverberation effects to the higher frequencies, introduces some Benign Masking. And I also find it interesting that these effects (related to ambiance), when present in the source material, can be masked by other strong periodicities, whether they be scanning rate induced or stepping induced. And this may well be the reason that some people find CD or digital music lacking in ambiance. It may well be there on the recording, but it has been effectively masked by the scanning periodicity, as well as by the amplitude stepping periodicity! And because people seem to place different weightings on ambiance by periodicity and frequency effects, people also have widely different opinions as to how acceptable CD really is. (And this does not excuse the poor mike techniques either!) One could say that the apparent acceptability of the CD sound is a function of how much experience the listener has in listening to real music and therefore, to what degree his Initial Processing algorithm has been developed, as opposed to the degree to which his algorithm has been based on reproduction system listening!
Can this Adverse Masking be removed? A higher scanning rate might help, or perhaps the use of a signal processing technique similar to that used in filtering out unwanted cepstral line relationships might solve some of the problems of the Compact Disc.
The crux of any theory is posed by the questions, "Does it work?" and "Can it be used to design better units?" Well, we can only point to our LCM-1 loudspeakers. Here we found that we had designed a speaker whose sound does not seem to suffer as much as most other speaker systems do when the listener is tired. The weighting insight on the tradeoffs (or Value Analysis) seems to work, as it is not a terribly expensive design to build. The drivers are good, but not exceptional, and while the frequency and phase response are very good, it is possible to refine them further without much apparent beneficial effect. The key concept is that the design of the speaker has been rationalized to minimize the mental signal processing needed to achieve a stereo image as well as a balanced sound. Undesirable periodicity effects have been designed out or, if that was not possible, shifted to produce some benign masking.
The result is a speaker system that has very low listening fatigue, and which allows the listener to perceive a much greater amount of the detail that is present in the source material! In other words, we feel that we have qualitatively pinned down several of the more important criteria in loudspeaker design, quantified them sufficiently to allow them to be manipulated, and then demonstrated the validity of the design technique with the LCM-1 loudspeaker design. Because the speakers are properly de-localized for stereo, and do not produce undesirable periodicity effects that complicate the brain's preprocessing of the signals for stereo, and because masking is absent, the stereo fusion, the depth of the image, and the ambiance are all much more real. The Listener can perceive a greater amount of detail, texture and timbre than with other speakers which are, in many cases, producing more sound detail! There is also less distortion of the original's physical size (Scale Distortion) and there is little sensation of image rescaling when the listening level is varied. Surprisingly, the comment has been made that the speaker seems less sensitive to absolute listening level than equivalent speakers. This may be related to there being more human processor available to handle the adjustment. There does not seem to be the same need for overpoweringly physical sound pressure level.
Now for us comes a long period of separating the rational from the ritual;
and boiling down the accumulated data in the pot of experimental experience
until we have a meaningful and useful technique. This is all part
of the process of objective quantification of otherwise near subjective
(although statistically sound) qualitative evaluation. For even though
the hypothesis seems to fit the known facts and illuminates many hitherto
problematic areas; only when we can express our criteria in the same sort
of quantitative terms as is done with amplitude and phase, or frequency
response and distortion, will we really be able to use these new techniques
as engineering tools!
From Airwaves '83
UP THE AUDITORY CANAL
GUN & CAMERA
by Mike Wright
The Dayton-Wright Group
It doesn't seem like it was over thirty years ago that I decided to build a Williamson power amplifier in order to get a good sounding audio system at a price I could afford. I suppose that was my first mistake, or perhaps it qualifies as two. Thinking that once I had a system that would be the end of it, and thinking that I would be able to afford it was, with the advantages of hindsight, ludicrous to the point of near insanity.
The amplifier when completed really needed a better output transformer, which was imported from England at great cost; and this when installed demonstrated the need for a better cartridge; which in turn ... well, I think you perceive the drift.
Only it didn't stop with purchased components. At MIT we were all poor engineering students with access to surplus parts and perhaps rather more knowledge than was good for us; and so our rooms began to house strange-shaped pieces of apparatus, cobbled together from something or other having no relation to audio whatsoever.
And as would be expected, there was always one more thing that had to be modified. By the time I returned to Canada I had a truckload of stuff that probably struck terror into the heart of Canadian Customs; for it looked like a spill over of props from The Return of the Jedi.
Along the line I discovered live music. I was quite surprised at the time; but after all, I should have realized that the recorded sound had to come from somewhere! It sounded so much better than anything we could achieve with all of our equipment that I was hooked! I guess it was about that time I stopped trying to build stuff that was bigger and better than other designers' stuff and concentrated on trying to make it sound like the real thing. I suspect that this wasn't the best of all moves in the commercial sense, for it has been too often evident that the audio field judges systems by the way other systems sound. I would not be surprised to find that less than 1% of all the people buying stereo systems have been to live concerts. While it is therefore understandable that audiophiles can't agree on what instruments should sound like, that doesn't make it acceptable. It brings to mind the type of joke where the audiophile has bought a second oscilloscope so that he can watch the music in stereo. Superlatives have been so over-used that they too have confused the listener as to what he should be hearing. Violins are not always silky, cellos are not always gutty, and bass drums don't have to move the living room furniture across the floor!
Would it be a bad pun to say that too many recording engineers want the listener to think of stereotyped instrumental sounds? But it's true. The days when a fifteen-piece group could be recorded with three microphones are long since gone. Hell, come to think of it, the days when a fifteen-piece group could be recorded with fifteen microphones are long since gone! If you don't have that many microphones for the drummer alone, you're in trouble! A soprano sax might be great in a band, but I'm not convinced I want to hear it solo seven centimeters from my right ear.
The justification for multi-channel recording used to be that the added flexibility made it easier (and therefore cheaper) for the group to record. The fact that the studio had to increase its hourly rate to cover the cost of all those extra transistors, buttons, boxes, and things somehow got overlooked. As the majority of consoles now look like the bridge of the Battlestar Galactica, one wonders if we haven't lost sight of what it was we were trying to do. I'm not denying the right of any starting musician to have himself phased, flanged, distorted, frequency shifted, and what have you, so long as I'm not forced to listen to him in that sorry state. Has the transistor replaced talent and technique?
Anyhow, I never did get to own my “ideal” system, for I found that the more I listened to complete systems, the more my designs sounded like complete systems; and dammit, they were supposed to sound like the real stuff! Several years ago, therefore, I gave up on the idea of owning or listening to a great system, and resigned myself to using a $35.00 clock radio to hear the news. And while I do listen very carefully to our systems, I have to temper this with live music lest I lose my ear. I suppose it's like the wine taster who never really gets the chance to drink the stuff.
Why is it that systems don't sound real, with enough LEDs to decorate Eaton's Christmas tree, enough transistors to include one with every school lunch in Toronto, enough watts to slow the turbines in Hoover Dam, and more bits on a single CD recording than all the snowflakes that fell in Canada last year? Perhaps this is why, at the end of a tiring day, our sound systems sound lousy!
My theory is that before we perceive what we hear, and enjoy it, we have to process the two mono signals from the speakers, “tune out” the noise, adjust the frequency response, and convince our brains that what we are hearing is pretty good sounding stuff. Having more or less mentally processed this so as to render it a reasonable facsimile of live sound, we are now in a position to perform the mental processes leading to the actual enjoyment. But since we are of limited mental capacity (just ask any manufacturer), the more we have to do of the former, the less capacity is left over to do the latter! And so it is not so much that the music is missing some essential quality as that we simply can't relate to it in the way we relate to the same thing when it's live. At audio shows such as CES, we've noticed that the system sounds worse at the end of the day. Now, common sense should tell us that this isn't really so; it's just that we're all pooped out! But what if that is really the way it sounded all along, and what has happened is that we are just too tired to mentally process it any more? Interesting hypothesis, isn't it?
If a system could be developed that would not suffer from listener slump, wouldn't it leave a lot more mental facility for the enjoyment of the music? And might we not somehow recapture that ineffable quality lost in reproduced music? And this is what audio is all about! Not trying to emulate Colorado Springs' defense center on your coffee table, but listening to music that sounds real and that inspires the same degree of excitement and involvement as a live concert. Can we attain this? Yes, I think we can.
Since we reacquired our company, we have been trying to pin down the things that stop the music from sounding live. What are they? Nothing all that complicated, it would appear. For example, we seem to have found that when the ear hears instruments that exist in a space unrelated sonically to the ambient space around us (and conveyed, it is hoped, through the system from the concert hall or studio as well), the instrument is attention-getting but somehow unreal. It's as if the brain doesn't know quite how to handle the sound in relation to what it thinks of as “live” sounds, so while the instrument stands out, it stands apart from the very context of what it is we are trying to judge or enjoy. As a dramatic tool, this could have some justification, but then so does a cherry on a sundae. Does this mean we could or even would want to live on Maraschino cherries exclusively?
I doubt it. And I doubt that a collection of solo-sound instruments is really going to fool anyone into thinking that they are live, unless perhaps he is playing each instrument through its own speaker in an environment similar to that of the concert hall or night club.
What else is crudding things up? Now I know that there is a raging controversy between the technical people and the artisans as to whether a moving coil cartridge is really better. And it is too easy to say, “If you can't measure it, it doesn't exist” and come in on the side of the engineers. And I know that you are not supposed to be able to hear distortion under 0.3% (or 3% or 0.03%, depending upon which authority you embrace). Why, then, when we make a pre-amplifier with a distortion of under 0.0001% at rated output, does it sound more like the live thing than a pre-preamplifier with a distortion, under identical conditions, of 0.002%? And this is a test we've done several times, with consistent results, without the listeners knowing anything about what they were listening to other than what they heard.
We have noticed that the better the system
sounds when we're tired out, the more enjoyable it is when we're fresh
and the closer it sounds to the idealized concept of “live” music. We have
also noticed that there seems to be a great deal more listener involvement
in the music as well. It is better able to do its emotional thing, as it
were, when the listener is freed up from a lot of pre-processing.
So where does this put me on the subject of measurements?
I tell people that there are really two tests that have to be passed. One is quantitative, the other qualitative. Fancy measurements are useless if they are out of context. And gold plating the chassis for better conductivity is nonsense if it doesn't have to conduct in the first place.
Much as I would like to be able to measure everything in sight, I have found that there is always something else to measure for which measurement standards do not yet exist. Then I have found that differences people can hear can usually be measured, given enough time, equipment, and motivation. But I also recognize the fallacy of thinking that my acoustic memory is worth a damn. It can be tricked too easily. There are several things which can make even an A-B test totally wrong in its results! But just because my acoustic memory is quantitatively screwed up doesn't mean it's always qualitatively foul. Perhaps it's easier to understand if I say that I can't always identify what is wrong, but that I am aware that something is wrong! I don't have to use a distortion analyzer to know that, even if it is necessary to quantify the problem.
Why haven't some of the Miracles of Modern Technology that were supposed to improve the realism of reproduction dazzled us with realism? For MOMT you can insert (perhaps injudiciously) things like four-channel, ambiance generation, comb filter generation, etc. Could we make all of this more comprehensible if we were to divide these wondrous additions to our reproduction unit into two classes? Say, those items which remove unwanted distortions or artifacts without degrading the original signal, and those items which claim to enhance the signal and thus are really covering up missing information or masking false information in the original signal? I don't think that's too complex a distinction, is it?
The first could include equipment having lower distortion, lower noise, flatter phase response (although this will start fistfights), and other genuine improvements. By genuine, I mean improvements that bear some relationship to reality. In other words, I don't classify the extension of a speaker's top end response from 75 kHz to 400 kHz as especially useful.
The other group would include reverberation devices, holographic image generators, comb-filter/phasing processors and even, perhaps, four-channel. I justify this on the basis that it may not present an acceptable input to our ears and brains.
I know this sounds esoteric, but remember that stereo is really two different monaural signals which, unlike stereo vision, are not presented individually to each ear as in binaural earphones but are pre-mixed, as it were, in the listening room. Some unusual mental processing is necessary for us to hear “stereo”. Why doesn't four-channel enhance the realism?
The answer may lie in the added difficulty that, for example, the brain may encounter in processing four channels of information into an acceptable format. There is no denying that under suitable circumstances four-channel sound can be more arresting. But how far from the real thing is four-channel sound? Do we know what added problems it presents? How much extra mental processing is involved? Does that processing have any reasonable correlation to the mind's processing of sounds from around the listener and, therefore, a preconditioned sort of program? Or does it complicate an already overworked system?
The test is, I think, that the systems which emulate most closely the natural presentation of sound to the listener probably require the least “new” or “unnatural” processing to make them psychoacoustically acceptable. We should be able to relate this “natural” presentation to what it is we encounter acoustically in the course of our normal lives. Do we encounter sharp cut-off passbands? How does the brain respond to sudden discontinuities in a passband? What sort of situation in nature would a severe phase shift simulate, or would we ever encounter it? What sort of added resolving burden is caused by such a phase shift?
The science of psychoacoustics is only now starting to investigate some of these problems. But we can generalize to some extent, drawing some tentative conclusions from anomalous situations. Take for example the experience of seeing a roller coaster on the screen. Often it is far more nauseating in the cinema than in real life! Why? Because whenever there is a sensory conflict, the system feels ill. And in this case the visual reference cues are completely at odds with the balance sense of the inner ear. And although the auditory situation in stereo perception is quite a bit more subtle, it is not unreasonable to expect the same sort of conflict!
If we accept that all of the brain's signal processing functions operate on a “shared time, semi-shared volume” system, we can see that any complication of the sound system which requires more initial processing is going to reduce the amount left over for the listener's enjoyment of, and interaction with, the music. And that is why, in spite of all the advances made to date in stereo reproduction, it is the rare system indeed which is so transparent that it virtually leaves the listener alone with the orchestra in the concert hall.
An article in the Airwaves '83 Show
William M. D. Wright, July 12 - Oct 15, 1984
Presented at the Toronto Stereo Show, 1984; also (in the form of a condensation) to the
Audiophile Society in November 1984.
In December 1988, Electronics & Technology Today ran a story written by Bill Marwick
on 'Periodicity and Perception', which follows.
The usual methods of speaker testing yield a mystery, one that you're certain to have come across in the hi fi press: if two similar speakers from different manufacturers have identical distortion, tone burst and frequency response specifications, why does each speaker have a unique sound? Writers struggle to describe these differences, coming up with such terms as strident, veiled, or muddy in an attempt to capture the subtleties of sound.
One term that seems to work is “transparent”. The accepted meaning is that a transparent speaker adds nothing of its own to the sound, producing natural audio that just seems to come out of thin air. In the early years of the hi fi boom, this was usually described as "an orchestra actually playing in your living room", an elusive goal for all but the best of systems. A speaker which is not transparent immediately tells you that you're listening to the music through a machine, and this is true whether or not the speaker does well in the standard tests.
Before the advent of affordable computer-controlled test gear, there were a number of methods used to quantify speaker response, and despite high-tech advances in equipment, they remain the mainstay. The most popular, and one that gives a great deal of information, is the frequency response test. A calibrated microphone is used to measure the output from the speaker as it is swept over the audio frequency range. Unless you have an anechoic chamber which prevents any reflected sound, this test is plagued with the peaks and dips of the room response itself. Some of the ways around this include the averaging of several tests from different directions and the use of rapidly swept frequencies to avoid stimulating room resonances.
Testers soon realized that steady-state frequency response wasn't telling the whole story, and the tone burst test was used in an attempt to measure the speaker's ability to respond quickly without overshoot; the test frequency is switched on and off rapidly, letting through a desired number of cycles. The difficulty comes in trying to interpret the imperfect tone burst which is picked up by the microphone. Sometimes the results have no apparent connection with the perceived sound.
Distortion seems to be an important parameter, measured with the usual notch filter or with a spectrum analyzer that can sum the value of the harmonics, but again the difficulty lies in trying to explain why a speaker with high distortion sounds better than one with impeccable specifications.
Adding to the technical difficulty is the processing of the sound by the
listener, a subjective variable which we'll come to in a later section.
In the mid-1970s, Bell Laboratories published papers on the use of the Fourier transform in sound analysis. Fourier analysis is a mathematical tool used to find the various components that make up a complex waveform; a spectrum analyzer displaying the harmonics of a sound is doing a Fast Fourier Transform (FFT). By doing another transform on the new-found components, you can find the periodicity; such anomalies as reflections or speaker shortcomings show up clearly.
The method of analysis was used to analyze the noise signature from Concorde jet engines; previously, the tests had been affected by sound reflections from the runway, but the periodicity tests allowed engineers to separate pure engine noise from the total sound. The method was later used by Brüel & Kjær in their industrial failure-prediction equipment to separate undesired machinery noise from the total sound, allowing detection of impending faults without the necessity of shutdown.
Here in Canada, the method was adapted to speaker analysis by Mike Wright, developer of the Dayton Wright loudspeakers and Stabilant 22, a liquid semiconductor used as a contact enhancer (see the review in our October issue). Periodicity testing offered the possibility of easy removal of room effects from a response plot, as well as the detection of unwanted reflections from the speaker construction itself.
One year during a large trade show, Mike noticed that his awareness of speaker quality was seriously affected by the noise and associated fatigue of maintaining the display booth. Speakers which had previously sounded fine were becoming a chore to listen to, a phenomenon which was easily interpreted as the brain's reluctance to accept any more input.
However, that night he went to a symphony concert and discovered that the live sound had none of the expected shortcomings. The conclusion he arrived at was that all speakers were introducing small oddities of their own, anomalies that the brain filtered out. This extremely complex filtering allows you to listen to desired sounds in the middle of a noisy party, and lack of it is why tape recordings of that party will later sound incredibly clattery and jumbled, since the required important information (phase relationships, visual cues, etc.) is not present.
The periodicity tests seemed like the best way to analyze speaker output and find whatever faults were occupying so much of the brain's audio processing.
The present test setup in the Richmond Hill plant consists of a soundproof room which is finished inside to represent a typical listening environment and even includes easy chairs. A calibrated AKG microphone picks up the speaker output, which is a swept-frequency pulse train. The signals are processed by a Hewlett Packard spectrum analyzer and can be displayed on its CRT as a standard frequency response plot or as a spectrum of components (using the FFT). It also has a 16-bit output which is captured by an HP 68000-based computer.
The software, which consists of 17,000 lines in HP Basic, can then process the information to plot response, phase, and periodicity (the advantage in the spectrum analyzer lies in its speed - the computer takes much longer to derive the FFT).
When Bell Labs published their ideas on using Fourier analysis, someone whimsically labeled the various parameters using anagrams of familiar terms, and so the periodicity chart, which looks something like a frequency spectrum, becomes a cepstrum. The periodicity is formally defined as the inverse FFT of the log power of the components of the sound, and the cepstrum is a plot of the ripple in a waveform for each time constant of the components causing the ripple.
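The definition just given can be written compactly. This is only a sketch of the usual power-cepstrum form; the symbols are introduced here for illustration and do not appear in the original article: x(t) is assumed to be the microphone signal, the script F is the Fourier transform, and tau is the time axis of the cepstrum.

```latex
c(\tau) = \mathcal{F}^{-1}\left\{ \log \left| \mathcal{F}\{ x(t) \} \right|^{2} \right\}
```

A single reflection adds a regular ripple to the log power spectrum, and the inverse transform turns that ripple into a spike at the delay of the reflection.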
For instance, if the cepstrum reveals a spike with a time constant of,
say, about 2 ms, then some two surfaces in the speaker environment are
causing a reflection, and they'll be about 2 feet apart (taking the velocity
of sound as 1 ft/ms). As to why this information is not revealed
in standard speaker testing: the information is there, but the test format
may not be ideal for displaying it, just as a scope display of a square wave
gives no hint that it's the sum of a long series of odd harmonics.
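The 2 ms example can be tried numerically. What follows is a minimal sketch, not the HP Basic software described in the article: it assumes a noise test signal, a single reflection at half amplitude, a 16 kHz sample rate, and the article's rounded sound velocity of 1 ft/ms. All function and variable names are hypothetical, chosen only for this illustration.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(x):
    """Inverse FFT computed through the conjugate trick."""
    n = len(x)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in x])]


def power_cepstrum(signal):
    """Inverse FFT of the log power spectrum, as the article defines it."""
    spectrum = fft([complex(s) for s in signal])
    # The small constant guards against log(0) for an empty bin.
    log_power = [complex(math.log(abs(v) ** 2 + 1e-12)) for v in spectrum]
    return [v.real for v in ifft(log_power)]


def echo_delay_ms(signal, sample_rate_hz, min_ms=0.5):
    """Return the delay (ms) of the largest cepstral spike beyond min_ms."""
    ceps = power_cepstrum(signal)
    start = int(min_ms * sample_rate_hz / 1000)
    half = len(ceps) // 2  # the second half mirrors the first
    peak = max(range(start, half), key=lambda i: ceps[i])
    return 1000.0 * peak / sample_rate_hz


# Assumed test conditions (not taken from the article).
SAMPLE_RATE_HZ = 16000
NUM_SAMPLES = 2048
DELAY_SAMPLES = 32          # 32 samples at 16 kHz is 2 ms
FEET_PER_MS = 1.0           # the article's rounded velocity of sound

rng = random.Random(0)
direct = [rng.gauss(0.0, 1.0) for _ in range(NUM_SAMPLES)]
# Microphone signal: direct sound plus one reflection at half amplitude.
received = [
    direct[i] + (0.5 * direct[i - DELAY_SAMPLES] if i >= DELAY_SAMPLES else 0.0)
    for i in range(NUM_SAMPLES)
]

delay_ms = echo_delay_ms(received, SAMPLE_RATE_HZ)
path_ft = delay_ms * FEET_PER_MS
print(f"reflection delay: {delay_ms:.2f} ms, about {path_ft:.1f} ft")
```

The reflection is invisible in the raw waveform, yet the cepstrum should show a clear spike at the 2 ms delay, which converts to about 2 ft at 1 ft/ms. The search skips the shortest delays because that region is dominated by the overall shape of the spectrum rather than by reflections.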
The process can be used to detect small reflections in drivers and cabinets. For example, speakers often sound better with the grill cloth removed; it is not just a case of sound absorption by the cloth, since reflections from the frame itself can cause audible effects. The speaker on the cover is being tested with a fiberglass pad to eliminate surface effects; in production this would be replaced by acoustical foam, and the speaker is constructed so that there's no frame protruding past the front surface.
Standard speaker testing in combination with periodicity plots allows rapid analysis of the speaker drivers, enclosure, crossover, and listening environment. The result of investigating and correcting is a speaker that approaches the ideal of transparency, one that never lets you know it's there.
The above discussion on speaker improvements is somewhat oversimplified, as there's a great deal more to speaker analysis than watching a plot and tinkering here and there. The tester may use periodicity to discover some small ripple in the response, but the decision as to whether or not this is affecting the sound depends on the listener, and most listeners are almost unbelievably flexible in their perception of sound. In most cases, they're unaware of how their own mental processing is fooling them.
Mike Wright held a speaker listening test in which listeners came into the room while a set of speakers was playing. Then noises behind the curtain indicated that the speakers had been changed, and the test was repeated. The listeners liked the ‘first’ set, and said that the second set was inferior to it. In fact, the speakers were never touched. What had happened is that the room acoustics dominated the sound environment when the people first entered for the first test. By the time of the second test they were used to the room and began to judge differently. There's also the fact that novelty affects perception; musicians often prefer someone else's instrument - for a while. When the novelty wears off, they're more objective about deciding.
The curtain in the above tests is also used in other testing because visual cues are so important to sound perception, particularly the localizing of a sound. Mike said that additional speakers placed at either side of the listener will cause them to think that the stereo image is much wider, even though the side speakers are not even connected.
Level settings are extremely important during comparison testing of speakers. The usual wisdom is that one dB is the minimum sound level difference we can detect, but the ear is much more sensitive to change in the midrange area; if speakers are tested with a level difference of about 0.5 dB, the higher level gives the vague impression of brighter response. If the speakers are switched using the same amplifier, the more efficient speaker sounds louder and brighter.
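To give a feel for how small these level differences are, the decibel figures can be converted to plain ratios. This is a short sketch using the standard decibel definitions; the function names are hypothetical and chosen only for this illustration.

```python
def db_to_amplitude_ratio(level_db: float) -> float:
    """Sound-pressure (or voltage) ratio for a level difference in dB."""
    return 10 ** (level_db / 20)


def db_to_power_ratio(level_db: float) -> float:
    """Power ratio for a level difference in dB."""
    return 10 ** (level_db / 10)


for level_db in (0.5, 1.0):
    amplitude = db_to_amplitude_ratio(level_db)
    power = db_to_power_ratio(level_db)
    print(f"{level_db} dB: amplitude x{amplitude:.4f}, power x{power:.4f}")
```

A 0.5 dB mismatch is roughly a 6% difference in sound pressure, or about 12% in power, so quite a small gain error between two speakers is enough to tilt a comparison.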
There's also evidence that the right ear perceives high frequencies in
a different manner than the left, a fact which may be due to the partitioning
of the brain.
The speakers under test cannot occupy the same space, so room acoustics will cause a different sound response even if the speakers are identical. If the test is interrupted and the speakers are interchanged, the delay may not let the listener retain accurate impressions of the sound.
To sum up, the A/B test must be done under extremely well-controlled circumstances in order to reveal anything meaningful. Like statistics, A/B tests can be made to prove anything you want.
And how well did this research benefit the Dayton Wright line of speakers? They definitely have a right to the claim of transparency; their sound indicates meticulous care in design, so much so that Stereo/Video Guide of October 1987 rated them as the number-one speakers.
Special thanks to Mike Wright for the
time spent explaining speaker testing.
© 1984 , 1998 Wright Electroacoustics
I discovered this web-site quite by accident. An old audiophile friend called me in excitement to tell me that there was a site where I could find information about my XG-8's. I should tell you that at that time I was only a lowly research assistant as well as being a Scuba diver. I first met Wright at a Boston Sea Rovers Clinic in 1963. I lost touch with him until 1965 when I attended an Ontario Underwater Society Conference at the ROM, after I had graduated and moved to Montreal. I was happy to recognise him again at a Canadian Conference on Cold Water Diving that was held in Montreal. So you can imagine my surprise when I visited a Stereo Show in Montreal and recognised Wright at the Dayton Wright room.
My job moved me to Oakville. There I was able to purchase a used set of speakers in 1972 which he updated with a new power supply in 1974. Therefore I consider myself an old customer of Dayton Wright Associates Ltd.
I have developed a reputation of being very critical about recordings. I never was hesitant about informing anybody when I found something was wrong with the sound! Wright felt the same.
I made it clear that I didn't much care for the top-end of the XG-8's. He managed to locate one of his vacuum tube amplifiers. Then it was heaven; well, in spite of the cost of replacing the four output tubes once a year! Much later the company sent me a set of leaf tweeters to try. What a hell of a difference! Wright told me that he was about to switch over to the leaf tweeters but couldn't get the manufacturer to sell them in lots of less than 500 pieces! He was able to arrange for an LC to pay for them. No go, the manufacturer in Japan increased the minimum order to 2,000 pieces! I told Wright I had encountered this myself! Wright had been working with Motorola to modify their cone type with rear damping to reduce the edge reflections. I think that the company had to accept a compromise as audio nuts seemed intent on using transistor power amplifiers.
Wright tried to tame the weird load impedance of his speakers so that most transistor power amplifiers wouldn't self-destruct under extreme overload.
I was glad when the new backers of the company sent Wright to the Heathrow Audio Show in England. But I was not surprised when, behind his back, they all-but concluded a sale to Leigh Instruments of Waterloo. Some of my friends were doctors who had invested in a Canadian Hovercraft Venture Capital 'scheme' where Lacy was one of the promoters. I then found out that the same Lacy had taken over control of Leigh Instruments from another old friend. When Lacy was involved with Leigh's take-over of Marsland Manufacturing (an old and reputable Waterloo company), I waited to see if some of Marsland's real estate holdings would be liquidated! When it happened, I admit that I was upset. I feared the worst. I attempted to contact Wright through their plant in Richmond Hill. Nobody would give me Wright's telephone number in England, nor was I able to contact him at Heathrow. So I left a message with a guy named MacWilliams asking that Wright call me back. When I persisted in calling the plant several more times, MacWilliams told me in no uncertain terms that it was none of my business, as the sale had already taken place!
I could not help in any way as I had to be away for several months, completing the wind-up of an overseas contract. When I returned in December 1986, I was finally able to contact Wright. He had resigned from Leigh. He told me that Leigh had assigned him an office beside the sheet metal stamping department! It was just a set of partitions. Leigh had dispersed all of the test equipment to other departments. He told me that it took him over a month to locate enough of it so that the quality control department under Dave Reich was able to function!
We went through Leigh's take-over of the company and all the confusion that resulted. It was as I had expected. Leigh had fired all of the older people. Just as in the Hovercraft company, the take-over looked rotten. I was not surprised when my broker informed me that Leigh's stock had been watered down!
But all this is over.
Now I am a retired physicist. I found that retirement did not suit me! Now that I had the time, I attempted to make some sense of what Wright told me. Here is what I remember. I think it is correct. As most of my notes were jotted down after our conversations, I might have made some unjustified assumptions.
In 1977, Wright told me that he finally had enough free time to complete his investigation into the phenomenon of quantum tunneling. This seemed to be a method of transferring electrons in a non-conductive medium; nothing new there. But the problem with grasping this effect is that the mathematics seem to work only if negative energy is somehow created in a micro-gap. If the investigation is carried further, it becomes involved with paradoxical behavior - such as time reversal. No doubt, others have had to stop at this point rather than face the possible ridicule of their peers!
The potential benefits seemed enormous, so Wright continued. By this time, he had worked on a theoretical method of creating complex filters by the employment of complex-plane power FFTs (see his term paper and his letter to Science). He had postulated that a three-dimensional memory 'matrix' could be created. If a method could be devised of selectively altering this 'matrix', it might be able to form a memory that might, in time, be able to 'exhibit a form of adaptive thinking'.
I remember Wright telling me that in 1957 he had already discussed this concept at a Science Fiction gathering in Boston. It was greeted with derision. Some of the essentials had not been formulated in their final form. Remember, he was already on shaky ground and could ill afford to cause a heated debate. He told me that he was still searching for a form of 'liquid crystal' which might be used as a 'matrix'. This had already led him into an involvement with neurophysics.
He returned to Canada in 1964. He sought employment with Huntec, who were then involved with Sonar-based scanning. I believe that he presented his complex-plane FFT filtering concept to Roger Huchings, who was not very impressed. I think that there was a sales person who left to join Spar Aerospace. In the process, Wright's presentation was ignored. Huntec claimed that their scanner already had sufficient resolution. Also, and this seemed to be at the heart of the matter, there were no funds available for more R&D.
He later formed Hydrospace Developments Ltd. They specialized in high reliability saturation diving equipment. By this stage, Wright was considered an expert in cold-water scuba and mixed-gas diving technology. His company won a contract with the Inland Waters Branch of the Canadian Government. This was for the erection of an offshore tower. It had already been built at the department, but now it had to be attached to floats. On launching, the tower was horizontal. Then it had to be towed out of the basin. When it reached the place where it was to be erected (to the southeast of the entrance to Hamilton Harbor), the lower half of the tower would drop to the bottom, where it would be anchored. A set of anchors would be screwed into the sand on the bottom by using a method developed by Wright. Then an anchored series of guy wires, set in a triangular pattern, would be used to erect the oceanographic tower. The basic anchoring system used three screw-in anchors placed orthogonally so that all three met together at their upper ends, where shackles would hold them. Therefore, they could not walk their way out! Before this, NRC used to employ concrete blocks as anchors. Now, concrete under water is not that heavy. In addition, it had a large surface area compared to its weight. It is not at all surprising that every time a storm hit, the anchors would wash away and the towers would topple! As is the case with many innovations, once the requirements are thought through, the solution is almost obvious!
He applied for patents on the then unique anchoring system. He was assured that IF his system worked and the tower didn't fall, the Inland Waters Branch would not reveal the details to any subsequent bidder. Wright then discovered that all the bidders had received copies of the Hydrospace Developments Ltd. winning proposal. Since Hydrospace Developments Ltd. had incurred the R&D expenses, he had assumed that if the Inland Waters Branch kept their word, the company could amortize the R&D over at least two more towers in the five-tower series.
But by this time, Wright had talked to some of the underwater equipment manufacturers based in the New England area. They had a recurring complaint: after they had sold one patented device to the Inland Waters Branch, they had discovered that it was being duplicated en masse. Then the Inland Waters Branch told them that since they were a Canadian Government branch they were exempt from patent laws! Since that time Wright has never trusted the Canadian Government in anything.
Wright was so upset that he vowed that the development of what came to be known as 'Stabilant' wouldn't rely on any government assistance. He was quite vocal about waiting until the patents had been applied for. He stated that he would prefer to wait until his patent agents advised him that there was no possibility of either a US or a Canadian patent not being granted! He has also stated that, in his estimation, the major Canadian Corporations are always given preferential treatment when it comes to support for R&D! He has listed numerous Canadian Corporations whose R&D costs are well above the equivalent R&D costs for Small Corporations. He claims that the amount of paperwork required is so overwhelming that often almost 40% of the funds are spent on the proposal itself. I could appreciate his point that the amount of time can also be a critical factor. The delay would leave small R&D companies well behind the equivalent US companies!
I also discovered that he has been involved with research into autoimmune diseases such as Lupus. He gave me a copy which I persuaded him to include here.
He has stated many times that the secret to effective R&D is the ability to construct a prototype and test it to see what unexpected bugs have to be worked out. It is termed 'constructive iteration'. He has shown me many examples, including the ability to cast a stereo record using an RTV silicone mold and a special epoxy casting. He has also demonstrated the production of nickel-plated stampers by the use of the same technique.
I admit that Wright never seems to be stumped, but he has to be reined-in lest he gallop off in all directions!
I would like to thank Wm. Wright for accepting and including this e-mail as part of the Dayton Wright Ltd. web-site.