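I searched AltaVista for "wav file format" and visited a few of the results...
I've copied the contents of two of those sites below.
In short, the first 36 bytes are the header (in the simple case shown below),
and everything from the next byte to the last byte is the so-called 'DATA Chunk'.
Within that 'DATA Chunk', the first 4 bytes are the ASCII characters 'data',
and the next 4 bytes (bytes 40-43 counted from the start of the file) give the size of the actual 'data'.
Everything after that is data describing the 'loudness of the sound' (the samples).
Way, way down in the text below you'll find the frame order for stereo and for
multichannel sound (stereo, 4 channel, and so on).
I haven't tried it myself yet (heh...), but after reading the text below I think I get it.
Near the beginning there is a part that analyzes real data. Be sure to read it; it should help.
If I don't end up drinking tomorrow (a friend is coming over), I'd like to test this...
If you get it working(?), let me know. I'd be even more grateful if you posted some code.
This was 천강협, the emperor of bluffing, who acts like he knows even when he doesn't.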
WAV File Format Description
http://www.technology.niagarac.on.ca/courses/comp630/WavFileFormat.html
WAV files are probably the simplest of the common formats for storing audio samples. Unlike MPEG and other compressed formats, WAVs store samples "in the raw" where no pre-processing is required other than formatting of the data.
The following information was derived from several sources including some on the internet which no longer exist. Being somewhat of a proprietary Microsoft format there are some elements here which were empirically determined and so some details may remain somewhat sketchy. From what I've heard, the best source for information is the File Formats Handbook by Gunter Born (1995, ITP Boston)
The WAV file itself consists of three "chunks" of information: the RIFF chunk, which identifies the file as a WAV file; the FORMAT chunk, which identifies parameters such as sample rate; and the DATA chunk, which contains the actual data (samples).
Each Chunk breaks down as follows:
RIFF Chunk (12 bytes in length total)
Byte Number
0 - 3 "RIFF" (ASCII Characters)
4 - 7 Total Length Of Package To Follow (Binary, little endian)
8 - 11 "WAVE" (ASCII Characters)
FORMAT Chunk (24 bytes in length total)
Byte Number
0 - 3 "fmt " (ASCII Characters; the fourth character is a space)
4 - 7 Length Of FORMAT Chunk (Binary, always 0x10 for plain PCM)
8 - 9 Format Tag (Always 0x01 = uncompressed PCM)
10 - 11 Channel Numbers (Always 0x01=Mono, 0x02=Stereo)
12 - 15 Sample Rate (Binary, in Hz)
16 - 19 Bytes Per Second
20 - 21 Bytes Per Sample: 1=8 bit Mono, 2=8 bit Stereo or 16 bit Mono, 4=16 bit Stereo
22 - 23 Bits Per Sample
DATA Chunk
Byte Number
0 - 3 "data" (ASCII Characters)
4 - 7 Length Of Data To Follow
8 - end Data (Samples)
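As a rough sketch of the three tables above, the following Python snippet unpacks the 44 bytes that precede the samples in the simplest case (RIFF chunk, 16-byte PCM FORMAT chunk, then the DATA chunk ID and length). The name parse_canonical_header and the dictionary keys are made up for illustration, and a file with extra chunks or a longer FORMAT chunk will not match this fixed layout. The test bytes are the start of the DING.WAV dump examined in this document.

```python
import struct

# Assumed fixed layout (little endian, no padding): 12-byte RIFF chunk,
# 24-byte FORMAT chunk, then the 8-byte "data" ID and length: 44 bytes.
HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")

def parse_canonical_header(raw):
    """Decode the first 44 bytes of a plain PCM WAV file (hypothetical helper)."""
    (riff, riff_len, wave, fmt_id, fmt_len, fmt_tag, channels, sample_rate,
     bytes_per_sec, bytes_per_sample, bits_per_sample,
     data_id, data_len) = HEADER.unpack(raw[:HEADER.size])
    if (riff, wave, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not the simple 44-byte layout")
    return {
        "riff_len": riff_len,
        "fmt_len": fmt_len,
        "fmt_tag": fmt_tag,
        "channels": channels,
        "sample_rate": sample_rate,
        "bytes_per_sec": bytes_per_sec,
        "bytes_per_sample": bytes_per_sample,
        "bits_per_sample": bits_per_sample,
        "data_len": data_len,
    }

# First 44 bytes of DING.WAV, copied from the DEBUG dump in this document.
ding = bytes.fromhex(
    "52 49 46 46 46 2D 00 00 57 41 56 45 66 6D 74 20 "
    "10 00 00 00 01 00 01 00 22 56 00 00 22 56 00 00 "
    "01 00 08 00 64 61 74 61 22 2D 00 00"
)
info = parse_canonical_header(ding)
print(info["channels"], info["sample_rate"], info["bits_per_sample"], info["data_len"])
```

A real reader would normally walk the chunks one by one rather than rely on fixed offsets; the fixed layout is only used here because it mirrors the tables.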
The easiest approach to this file format might be to look at an actual WAV file to see how data is stored. In this case, we examine DING.WAV which is standard with all Windows packages. DING.WAV is an 8-bit, mono, 22.050 kHz WAV file of 11,598 bytes in length. Let's begin by looking at the header of the file (using DEBUG).
246E:0100 52 49 46 46 46 2D 00 00-57 41 56 45 66 6D 74 20 RIFFF-..WAVEfmt
246E:0110 10 00 00 00 01 00 01 00-22 56 00 00 22 56 00 00 ........"V.."V..
246E:0120 01 00 08 00 64 61 74 61-22 2D 00 00 80 80 80 80 ....data"-......
246E:0130 80 80 80 80 80 80 80 80-80 80 80 80 80 80 80 80 ................
246E:0140 80 80 80 80 80 80 80 80-80 80 80 80 80 80 80 80 ................
As expected, the file begins with the ASCII characters "RIFF" identifying it as a WAV file. The next four bytes tell us the length is 0x2D46 bytes (11590 bytes in decimal) which is the length of the entire file minus the 8 bytes for the "RIFF" and length (11598 - 11590 = 8 bytes).
The ASCII characters for "WAVE" and "fmt " follow. Next (line 2 above) we find the value 0x00000010 in the first 4 bytes (length of format chunk: always constant at 0x10). The next four bytes are 0x0001 (Always) and 0x0001 (A mono WAV, one channel used).
Since this is an 8-bit mono WAV, the sample rate and the bytes/second are the same at 0x00005622 or 22,050 in decimal. For a 16-bit stereo WAV the bytes/sec would be 4 times the sample rate. The next 2 bytes show the number of bytes per sample to be 0x0001 (8-bit mono) and the number of bits per sample to be 0x0008.
Finally, the ASCII characters for "data" appear followed by 0x00002D22 (11,554 decimal) which is the number of bytes of data to follow (actual samples). Each sample is a value from 0x00 to 0xFF. In the example above 0x80 would represent "0" or silence on the output since the DAC used to play back samples is a bipolar device (i.e. a value of 0x00 would output a negative voltage and a value of 0xFF would output a positive voltage at the output of the DAC on the sound card).
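The length arithmetic in this walkthrough can be checked with a few lines of Python. This is only a sanity check on the numbers quoted above; the helper names le32 and centered are invented for the example.

```python
def le32(hex_bytes):
    """Interpret four hex bytes from the dump as a little-endian integer."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")

file_size = 11598                 # size of DING.WAV given in the text
riff_len = le32("46 2D 00 00")    # bytes 4-7 of the file
data_len = le32("22 2D 00 00")    # bytes 40-43 of the file

# The RIFF length excludes the 8 bytes of "RIFF" plus the length field itself.
print(riff_len, file_size - riff_len)
# Between the RIFF length field and the first sample lie "WAVE" (4 bytes),
# the FORMAT chunk (24 bytes) and the "data" ID plus length (8 bytes).
print(data_len, riff_len - data_len)

def centered(sample_byte):
    """Shift an unsigned 8-bit sample so that 0x80 (silence) becomes 0."""
    return sample_byte - 0x80
```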
Note that there are extensions to the basic WAV format which may be supported in newer systems -- for example if you look at DING.WAV in Windows '95 you'll see some extra bytes added after the format chunk before the "data" area -- but the basic format remains the same.
As a final example consider the header for the following WAV file recorded at 44,100 samples per second in 16-bit stereo.
246E:0100 52 49 46 46 2C 48 00 00-57 41 56 45 66 6D 74 20 RIFF,H..WAVEfmt
246E:0110 10 00 00 00 01 00 02 00-44 AC 00 00 10 B1 02 00 ........D.......
246E:0120 04 00 10 00 64 61 74 61-00 48 00 00 00 00 00 00 ....data.H......
246E:0130 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
Again we find all the expected structures. Note that the sample rate is 0xAC44 (44,100 as an unsigned int in decimal) and the bytes/second is 4 times that figure since this is a 16-bit WAV (* 2) and is stereo (again * 2). The Channel Numbers field is also found to be 0x02 here and the bits per sample is 0x10 (16 decimal).
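The relationship between the FORMAT chunk fields in both examples can be written out as a small calculation. This is a sketch for uncompressed PCM only, and derived_fields is a name chosen for this example.

```python
def derived_fields(sample_rate, channels, bits_per_sample):
    """Return (bytes per sample frame, bytes per second) for PCM data."""
    bytes_per_point = (bits_per_sample + 7) // 8   # round up to whole bytes
    bytes_per_frame = channels * bytes_per_point   # the "Bytes Per Sample" field
    return bytes_per_frame, sample_rate * bytes_per_frame

# 16-bit stereo at 44,100 Hz (second example) and 8-bit mono DING.WAV.
print(derived_fields(44100, 2, 16))
print(derived_fields(22050, 1, 8))
# The bytes/second field in the second dump is 10 B1 02 00, little endian.
print(int.from_bytes(bytes.fromhex("10 B1 02 00"), "little"))
```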
http://www.wotsit.org
WAVE File Format
WAVE File Format is a file format for storing digital audio (waveform) data. It supports a variety of bit resolutions, sample rates, and channels of audio. This format is very popular on IBM PC (clone) platforms, and is widely used in professional programs that process digital audio waveforms. It takes into account some peculiarities of the Intel CPU such as little endian byte order.
This format uses Microsoft's version of the Electronic Arts Interchange File Format method for storing data in "chunks".
Data Types
A C-like language will be used to describe the data structures in the file. A few extra data types that are not part of standard C, but which will be used in this document, are:
pstring Pascal-style string, a one-byte count followed by that many text bytes. The total number of bytes in this data type should be even. A pad byte can be added to the end of the text to accomplish this. This pad byte is not reflected in the count.
ID A chunk ID (ie, 4 ASCII bytes).
Also note that when you see an array with no size specification (e.g., char ckData[];), this indicates a variable-sized array in our C-like language. This differs from standard C arrays.
Constants
Decimal values are referred to as a string of digits, for example 123, 0, 100 are all decimal numbers. Hexadecimal values are preceded by a 0x - e.g., 0x0A, 0x1, 0x64.
Data Organization
All data is stored in 8-bit bytes, arranged in Intel 80x86 (ie, little endian) format. The bytes of multiple-byte values are stored with the low-order (ie, least significant) bytes first. Data bits are as follows (ie, shown with bit numbers on top):
         7  6  5  4  3  2  1  0
       +-----------------------+
 char: | lsb               msb |
       +-----------------------+
         7  6  5  4  3  2  1  0  15 14 13 12 11 10 9  8
       +-----------------------+-----------------------+
short: | lsb     byte 0        |       byte 1      msb |
       +-----------------------+-----------------------+
         7  6  5  4  3  2  1  0  15 14 13 12 11 10 9  8 23 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24
       +-----------------------+-----------------------+-----------------------+-----------------------+
 long: | lsb     byte 0        |        byte 1         |        byte 2         |       byte 3      msb |
       +-----------------------+-----------------------+-----------------------+-----------------------+
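For readers who want to see the byte order in action, here is a small Python illustration (not part of the original text) that decodes little-endian values in two equivalent ways. The byte values are taken from the header dumps earlier in this document.

```python
import struct

raw_long = bytes([0x44, 0xAC, 0x00, 0x00])   # sample rate field, low byte first

# Manual reconstruction: byte 0 is least significant, byte 3 most significant.
manual = raw_long[0] | (raw_long[1] << 8) | (raw_long[2] << 16) | (raw_long[3] << 24)

# The same value via the standard library.
via_int = int.from_bytes(raw_long, "little")
via_struct = struct.unpack("<I", raw_long)[0]   # "<" selects little endian

# A 16-bit "short" read as a signed value: bytes 00 80 are 0x8000.
lowest_short = struct.unpack("<h", bytes([0x00, 0x80]))[0]

print(manual, via_int, via_struct, lowest_short)
```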
File Structure
A WAVE file is a collection of a number of different types of chunks. There is a required Format ("fmt ") chunk which contains important parameters describing the waveform, such as its sample rate. The Data chunk, which contains the actual waveform data, is also required. All other chunks are optional. Among the other optional chunks are ones which define cue points, list instrument parameters, store application-specific information, etc. All of these chunks are described in detail in the following sections of this document.
All applications that use WAVE must be able to read the 2 required chunks and can choose to selectively ignore the optional chunks. A program that copies a WAVE should copy all of the chunks in the WAVE, even those it chooses not to interpret.
There are no restrictions upon the order of the chunks within a WAVE file, with the exception that the Format chunk must precede the Data chunk. Some inflexibly written programs expect the Format chunk as the first chunk (after the RIFF header) although they shouldn't because the specification doesn't require this.
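Because chunk order is not fixed and unknown chunks must be tolerated, a reader usually walks the file chunk by chunk instead of assuming fixed offsets. The sketch below shows one way to do that in Python. The function name iter_chunks, the helper chunk and the made-up chunk ID "xtra" are assumptions for the example; the pad byte after an odd-sized chunk follows the general RIFF convention of keeping chunks word-aligned, which this excerpt does not spell out.

```python
import io
import struct

def iter_chunks(stream):
    """Yield (chunk_id, data) for each chunk inside a RIFF/WAVE stream."""
    riff, _riff_len, wave = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        head = stream.read(8)
        if len(head) < 8:            # end of file
            break
        ck_id, ck_size = struct.unpack("<4sI", head)
        data = stream.read(ck_size)
        if ck_size % 2:              # skip the pad byte after an odd-sized chunk
            stream.read(1)
        yield ck_id, data

def chunk(ck_id, data):
    """Build one chunk, padding odd-sized data to an even length."""
    pad = b"\x00" if len(data) % 2 else b""
    return ck_id + struct.pack("<I", len(data)) + data + pad

# A tiny in-memory WAVE with an unknown chunk between Format and Data.
fmt_body = struct.pack("<HHIIHH", 1, 1, 8000, 8000, 1, 8)
body = b"WAVE" + chunk(b"fmt ", fmt_body) + chunk(b"xtra", b"abc") + chunk(b"data", b"\x80\x80")
wav = b"RIFF" + struct.pack("<I", len(body)) + body

found = list(iter_chunks(io.BytesIO(wav)))
print([ck_id for ck_id, _ in found])
```

A program that only plays audio would pick out the "fmt " and "data" entries and ignore the rest; a program that copies the file would write every chunk back out.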
Here is a graphical overview of an example, minimal WAVE file. It consists of a single WAVE containing the 2 required chunks, a Format and a Data Chunk.
 __________________________
| RIFF WAVE Chunk          |
|   groupID  = 'RIFF'      |
|   riffType = 'WAVE'      |
|    __________________    |
|   | Format Chunk     |   |
|   |   ckID = 'fmt '  |   |
|   |__________________|   |
|    __________________    |
|   | Sound Data Chunk |   |
|   |   ckID = 'data'  |   |
|   |__________________|   |
|__________________________|
A Bastardized Standard
The WAVE format is sort of a bastardized standard that was concocted by too many "cooks" who didn't properly coordinate the addition of "ingredients" to the "soup". Unlike with the AIFF standard which was mostly designed by a small, coordinated group, the WAVE format has had all manner of much-too-independent, uncoordinated aberrations inflicted upon it. The net result is that there are far too many chunks that may be found in a WAVE file -- many of them duplicating the same information found in other chunks (but in an unnecessarily different way) simply because there have been too many programmers who took too many liberties with unilaterally adding their own additions to the WAVE format without properly coming to a consensus of what everyone else needed (and therefore it encouraged an "every man for himself" attitude toward adding things to this "standard"). One example is the Instrument chunk versus the Sampler chunk. Another example is the Note versus Label chunks in an Associated Data List. I don't even want to get into the totally irresponsible proliferation of compressed formats. (ie, It seems like everyone and his pet Dachshund has come up with some compressed version of storing wave data -- like we need 100 different ways to do that). Furthermore, there are lots of inconsistencies, for example how 8-bit data is unsigned, but 16-bit data is signed.
I've attempted to document only those aspects that you're very likely to encounter in a WAVE file. I suggest that you concentrate upon these and refuse to support the work of programmers who feel the need to deviate from a standard with inconsistent, proprietary, self-serving, unnecessary extensions. Please do your part to rein in half-ass programming.
Sample Points and Sample Frames
A large part of interpreting WAVE files revolves around the two concepts of sample points and sample frames.
A sample point is a value representing a sample of a sound at a given moment in time. For waveforms with greater than 8-bit resolution, each sample point is stored as a linear, 2's-complement value which may be from 9 to 32 bits wide (as determined by the wBitsPerSample field in the Format Chunk, assuming PCM format -- an uncompressed format). For example, each sample point of a 16-bit waveform would be a 16-bit word (ie, two 8-bit bytes) where 32767 (0x7FFF) is the highest value and -32768 (0x8000) is the lowest value. For 8-bit (or less) waveforms, each sample point is a linear, unsigned byte where 255 is the highest value and 0 is the lowest value. Obviously, this signed/unsigned sample point discrepancy between 8-bit and larger resolution waveforms was one of those "oops" scenarios where some Microsoft employee decided to change the sign sometime after 8-bit wave files were common but 16-bit wave files hadn't yet appeared.
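The signed/unsigned split described above is easy to get wrong, so here is a short Python sketch of decoding a buffer of mono sample points. It covers only 8-bit and 16-bit PCM, and decode_samples is a name invented for this example.

```python
import struct

def decode_samples(data, bits_per_sample):
    """Decode raw PCM bytes into a list of integer sample points."""
    if bits_per_sample <= 8:
        # 8-bit (or less) data is unsigned: 0 is the lowest value, 255 the highest.
        return list(data)
    if bits_per_sample <= 16:
        # 9 to 16 bit data is stored as signed little-endian 16-bit words.
        count = len(data) // 2
        return list(struct.unpack("<%dh" % count, data[:count * 2]))
    raise NotImplementedError("only 8-bit and 16-bit PCM in this sketch")

print(decode_samples(b"\x00\x80\xff", 8))
print(decode_samples(bytes.fromhex("ff 7f 00 80 00 00"), 16))
```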
Because most CPUs' read and write operations deal with 8-bit bytes, it was decided that a sample point should be rounded up to a size which is a multiple of 8 when stored in a WAVE. This makes the WAVE easier to read into memory. If your ADC produces a sample point from 1 to 8 bits wide, a sample point should be stored in a WAVE as an 8-bit byte (ie, unsigned char). If your ADC produces a sample point from 9 to 16 bits wide, a sample point should be stored in a WAVE as a 16-bit word (ie, signed short). If your ADC produces a sample point from 17 to 24 bits wide, a sample point should be stored in a WAVE as three bytes. If your ADC produces a sample point from 25 to 32 bits wide, a sample point should be stored in a WAVE as a 32-bit doubleword (ie, signed long). Etc.
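The rounding rule above amounts to "round the bit width up to whole bytes". A minimal Python sketch, with container_bytes and decode_24bit as illustrative names, also shows how a three-byte sample point might be read, since that size has no native C type:

```python
def container_bytes(bits_per_sample):
    """Number of bytes used to store one sample point of the given width."""
    return (bits_per_sample + 7) // 8

def decode_24bit(raw):
    """Read one signed, little-endian, three-byte sample point."""
    return int.from_bytes(raw[:3], "little", signed=True)

print([container_bytes(b) for b in (1, 8, 9, 12, 16, 17, 24, 25, 32)])
print(decode_24bit(bytes([0xFF, 0xFF, 0xFF])), decode_24bit(bytes([0x00, 0x00, 0x80])))
```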
Furthermore, the data bits should be left-justified, with any remaining (ie, pad) bits zeroed. For example, consider the case of a 12-bit sample point. It has 12 bits, so the sample point must be saved as a 16-bit word. Those 12 bits should be left-justified so that they become bits 4 to 15 inclusive, and bits 0 to 3 should be set to zero. Shown below is how a 12-bit sample point with a value of binary 101000010111 is formatted left-justified as a 16-bit word.
 ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| 1   0   1   0   0   0   0   1   0   1   1   1   0   0   0   0 |
|___|___|___|___|___|___|___|___|___|___|___|___|___|___|___|___|
 <---------------------------------------------> <------------->
    12 bit sample point is left justified          rightmost
                                                   4 bits are
                                                   zero padded
But note that, because the WAVE format uses Intel little endian byte order, the LSB is stored first in the wave file as so:
 ___ ___ ___ ___ ___ ___ ___ ___    ___ ___ ___ ___ ___ ___ ___ ___
|   |   |   |   |   |   |   |   |  |   |   |   |   |   |   |   |   |
| 0   1   1   1   0   0   0   0 |  | 1   0   1   0   0   0   0   1 |
|___|___|___|___|___|___|___|___|  |___|___|___|___|___|___|___|___|
 <-------------> <------------->    <----------------------------->
   bits 0 to 3     4 pad bits               bits 4 to 11
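The 12-bit example can be reproduced in a couple of lines of Python. The helper names pack_12bit and unpack_12bit are invented for this sketch, and the sample is treated as a raw bit pattern (no sign handling) to match the diagrams.

```python
def pack_12bit(sample12):
    """Left-justify a 12-bit pattern in a 16-bit word, stored low byte first."""
    word = (sample12 & 0x0FFF) << 4      # bits 4..15 hold the sample, 0..3 are zero
    return word.to_bytes(2, "little")

def unpack_12bit(raw):
    """Recover the 12-bit pattern from its two stored bytes."""
    return int.from_bytes(raw[:2], "little") >> 4

stored = pack_12bit(0b101000010111)
print(stored.hex(), format(unpack_12bit(stored), "012b"))
```

The first stored byte is 0x70 (binary 0111 0000) and the second is 0xA1 (binary 1010 0001), matching the second diagram.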
For multichannel sounds (for example, a stereo waveform), single sample points from each channel are interleaved. For example, assume a stereo (ie, 2 channel) waveform. Instead of storing all of the sample points for the left channel first, and then storing all of the sample points for the right channel next, you "mix" the two channels' sample points together. You would store the first sample point of the left channel. Next, you would store the first sample point of the right channel. Next, you would store the second sample point of the left channel. Next, you would store the second sample point of the right channel, and so on, alternating between storing the next sample point of each channel. This is what is meant by interleaved data; you store the next sample point of each of the channels in turn, so that the sample points that are meant to be "played" (ie, sent to a DAC) simultaneously are stored contiguously.
The sample points that are meant to be "played" (ie, sent to a DAC) simultaneously are collectively called a sample frame. In the example of our stereo waveform, every two sample points makes up another sample frame. This is illustrated below for that stereo example.
  sample      sample              sample
  frame 0     frame 1             frame N
 _____ _____ _____ _____         _____ _____
| ch1 | ch2 | ch1 | ch2 | . . . | ch1 | ch2 |
|_____|_____|_____|_____|       |_____|_____|
 _____
|     | = one sample point
|_____|
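Interleaving and its inverse can be sketched in Python with list slicing. This assumes the sample points have already been decoded into a flat list of integers; interleave and deinterleave are names chosen for the example.

```python
def deinterleave(samples, channels):
    """Split a flat list of interleaved sample points into one list per channel."""
    return [samples[ch::channels] for ch in range(channels)]

def interleave(channel_lists):
    """Merge per-channel lists back into sample frames, channel 1 first."""
    return [point for frame in zip(*channel_lists) for point in frame]

# Three stereo sample frames: (1, -1), (2, -2), (3, -3).
flat = [1, -1, 2, -2, 3, -3]
left, right = deinterleave(flat, 2)
print(left, right, interleave([left, right]) == flat)
```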
For a monophonic waveform, a sample frame is merely a single sample point (ie, there's nothing to interleave). For multichannel waveforms, you should follow the conventions shown below for which order to store channels within the sample frame. (ie, Below, a single sample frame is displayed for each example of a multichannel waveform).
  channels       1         2
             _________ _________
            | left    | right   |
  stereo    |         |         |
            |_________|_________|
                 1         2         3
             _________ _________ _________
            | left    | right   | center  |
  3 channel |         |         |         |
            |_________|_________|_________|
                 1         2         3         4
             _________ _________ _________ _________
            | front   | front   | rear    | rear    |
  quad      | left    | right   | left    | right   |
            |_________|_________|_________|_________|
                 1         2         3         4
             _________ _________ _________ _________
            | left    | center  | right   | surround|
  4 channel |         |         |         |         |
            |_________|_________|_________|_________|
                 1         2         3         4         5         6
             _________ _________ _________ _________ _________ _________
            | left    | left    | center  | right   | right   |surround |
  6 channel | center  |         |         | center  |         |         |
            |_________|_________|_________|_________|_________|_________|
The sample points within a sample frame are packed together; there are no unused bytes between them. Likewise, the sample frames are packed together with no pad bytes.
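Since frames are packed with no padding, the position of any sample point follows from simple arithmetic. The Python sketch below illustrates this for uncompressed PCM; frame_size and sample_offset are hypothetical helper names, and offsets are counted from the first byte of sample data, not from the start of the file.

```python
def frame_size(channels, bits_per_sample):
    """Bytes occupied by one sample frame (all channels at one instant)."""
    return channels * ((bits_per_sample + 7) // 8)

def sample_offset(frame, channel, channels, bits_per_sample):
    """Byte offset of one sample point; frame and channel both count from 0."""
    bytes_per_point = (bits_per_sample + 7) // 8
    return frame * channels * bytes_per_point + channel * bytes_per_point

# 16-bit stereo with 0x4800 bytes of sample data, as in the second header dump.
data_len = 0x4800
print(data_len // frame_size(2, 16))    # number of sample frames
print(sample_offset(10, 1, 2, 16))      # right channel of frame 10
```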
Note that the above discussion outlines the format of data within an uncompressed data chunk. There are some techniques of storing compressed data in a data chunk. Obviously, that data would need to be uncompressed, and then it will adhere to the above layout.