java.lang.Object
   ↳ android.media.AudioFormat
The AudioFormat class is used to access a number of audio format and channel configuration constants. They are used, for instance, in AudioTrack and AudioRecord as valid values for individual parameters of constructors like AudioTrack(int, int, int, int, int, int), where the fourth parameter is one of the AudioFormat.ENCODING_* constants.
The AudioFormat constants are also used in MediaFormat to specify audio related values commonly used in media, such as for KEY_CHANNEL_MASK.
The AudioFormat.Builder class can be used to create instances of the AudioFormat format class. Refer to AudioFormat.Builder for documentation on the mechanics of the configuration and building of such instances. Here we describe the main concepts that the AudioFormat class allows you to convey in each instance: the sample rate, the encoding, and the channel masks.
Closely associated with the AudioFormat is the notion of an audio frame, which is used throughout the documentation to represent the minimum size complete unit of audio data.
Expressed in Hz, the sample rate in an AudioFormat instance expresses the number of audio samples for each channel per second in the content you are playing or recording. It is not the sample rate at which content is rendered or produced. For instance, a sound at a media sample rate of 8000Hz can be played on a device operating at a sample rate of 48000Hz; the platform handles the sample rate conversion automatically, so the sound does not play at 6x speed.
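The relationship between content sample rate, device sample rate, and duration can be sketched in plain Java. This is only an illustration of the arithmetic, not the platform's resampler; the class and method names are hypothetical and not part of the Android API.

```java
/** Sketch: the duration of a clip depends on the content sample rate, not the device rate. */
class SampleRateSketch {
    /** Duration in seconds of frameCount frames at sampleRateHz. */
    static double durationSeconds(long frameCount, int sampleRateHz) {
        return (double) frameCount / sampleRateHz;
    }

    /** Frames needed at deviceRateHz to cover the same duration as the content (rounded). */
    static long framesAfterConversion(long frameCount, int contentRateHz, int deviceRateHz) {
        return Math.round((double) frameCount * deviceRateHz / contentRateHz);
    }

    public static void main(String[] args) {
        long contentFrames = 8000; // one second of 8000 Hz content
        long deviceFrames = framesAfterConversion(contentFrames, 8000, 48000);
        // Both durations are equal: conversion produces more frames, not faster playback.
        System.out.println(durationSeconds(contentFrames, 8000));
        System.out.println(durationSeconds(deviceFrames, 48000));
    }
}
```

The 6x ratio shows up in the frame count after conversion, while the duration is unchanged.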
As of API M, sample rates up to 192kHz are supported for AudioRecord and AudioTrack, with sample rate conversion performed as needed.
To improve efficiency and avoid lossy conversions, it is recommended to match the sample rate for AudioRecord and AudioTrack to the endpoint device sample rate, and limit the sample rate to no more than 48kHz unless there are special device capabilities that warrant a higher rate.
Audio encoding is used to describe the bit representation of audio data, which can be either linear PCM or compressed audio, such as AC3 or DTS.
For linear PCM, the audio encoding describes the sample size, 8 bits, 16 bits, or 32 bits, and the sample representation, integer or float.
ENCODING_PCM_8BIT: The audio sample is an 8 bit unsigned integer in the range [0, 255], with a 128 offset for zero. This is typically stored as a Java byte in a byte array or ByteBuffer. Since the Java byte is signed, be careful with math operations and conversions as the most significant bit is inverted.
ENCODING_PCM_16BIT: The audio sample is a 16 bit signed integer typically stored as a Java short in a short array, but when the short is stored in a ByteBuffer, it is native endian (as compared to the default Java big endian). The short has the full range [-32768, 32767], and is sometimes interpreted as fixed point Q.15 data.
ENCODING_PCM_FLOAT: Introduced in API LOLLIPOP, this encoding specifies that the audio sample is a 32 bit IEEE single precision float. The sample can be manipulated as a Java float in a float array, though within a ByteBuffer it is stored in native endian byte order. The nominal range of ENCODING_PCM_FLOAT audio data is [-1.0, 1.0]. It is implementation dependent whether the positive maximum of 1.0 is included in the interval. Values outside of the nominal range are clamped before sending to the endpoint device. Beware that the handling of NaN is undefined; subnormals may be treated as zero; and infinities are generally clamped just like other values for AudioTrack. Try to avoid infinities because they can easily generate a NaN.
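The per-sample conventions of the three linear PCM encodings above can be sketched as follows. This is a minimal plain-Java illustration, not platform code: the class and method names are hypothetical, and mapping NaN to 0 is this sketch's own choice, since the documentation leaves NaN handling undefined.

```java
/** Sketch of per-sample conversions for the linear PCM encodings. */
class PcmSampleSketch {
    /** 8 bit unsigned PCM stored in a (signed) Java byte -> signed value in [-128, 127]. */
    static int unsigned8ToSigned(byte stored) {
        // Mask to recover the unsigned value in [0, 255], then remove the 128 offset.
        return (stored & 0xFF) - 128;
    }

    /** 16 bit signed PCM interpreted as Q.15 fixed point -> float in [-1.0, 1.0). */
    static float q15ToFloat(short sample) {
        return sample / 32768.0f;
    }

    /** Clamp a float sample to the nominal range [-1.0, 1.0]. */
    static float clampFloat(float sample) {
        if (Float.isNaN(sample)) {
            return 0.0f; // assumption of this sketch only; real handling is undefined
        }
        // Infinities are clamped like any other out-of-range value.
        return Math.max(-1.0f, Math.min(1.0f, sample));
    }
}
```

Masking with `0xFF` before subtracting the offset is what avoids the sign pitfall mentioned for 8 bit samples.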
Consider using ENCODING_PCM_FLOAT for audio capture, processing, and playback. Floats are efficiently manipulated by modern CPUs, have greater precision than 24 bit signed integers, and have greater dynamic range than 32 bit signed integers. AudioRecord as of API M and AudioTrack as of API LOLLIPOP support ENCODING_PCM_FLOAT.
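The encoding descriptions note that 16 bit and float samples held in a ByteBuffer use native byte order, whereas a new Java ByteBuffer defaults to big endian. A minimal sketch of packing and unpacking 16 bit samples with an explicit byte order, using only java.nio (the class and method names are hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch: moving 16 bit PCM samples through a ByteBuffer in native byte order. */
class NativeOrderSketch {
    /** Pack samples into a ByteBuffer that uses the platform's native byte order. */
    static ByteBuffer packShorts(short[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * Short.BYTES);
        buffer.order(ByteOrder.nativeOrder()); // new buffers default to big endian
        for (short s : samples) {
            buffer.putShort(s);
        }
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    /** Read samples back, interpreting the bytes in native order. */
    static short[] unpackShorts(ByteBuffer buffer) {
        // duplicate() does not carry over the byte order, so set it again.
        ByteBuffer view = buffer.duplicate().order(ByteOrder.nativeOrder());
        short[] out = new short[view.remaining() / Short.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = view.getShort();
        }
        return out;
    }
}
```

The same pattern applies to float samples with `putFloat` and `getFloat`.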
For compressed audio, the encoding specifies the method of compression, for example ENCODING_AC3 and ENCODING_DTS. The compressed audio data is typically stored as bytes in a byte array or ByteBuffer. When a compressed audio encoding is specified for an AudioTrack, it creates a direct (non-mixed) track for output to an endpoint (such as HDMI) capable of decoding the compressed audio. For (most) other endpoints, which are not capable of decoding such compressed audio, you will need to decode the data first, typically by creating a MediaCodec. Alternatively, one may use MediaPlayer for playback of compressed audio files or streams.
When compressed audio is sent out through a direct AudioTrack, it need not be written in exact multiples of the audio access unit; this differs from MediaCodec input buffers.
Channel masks are used in AudioTrack and AudioRecord to describe the samples and their arrangement in the audio frame. They are also used in the endpoint (e.g. a USB audio interface, a DAC connected to headphones) to specify allowable configurations of a particular device.
As of API M, there are two types of channel masks: channel position masks and channel index masks.
Channel position masks are the original Android channel masks, and have been in use since API BASE. For input and output, they imply a positional nature: the location of a speaker or a microphone for recording or playback.
channel count | channel position mask |
---|---|
1 | CHANNEL_OUT_MONO |
2 | CHANNEL_OUT_STEREO |
3 | CHANNEL_OUT_STEREO \| CHANNEL_OUT_FRONT_CENTER |
4 | CHANNEL_OUT_QUAD |
5 | CHANNEL_OUT_QUAD \| CHANNEL_OUT_FRONT_CENTER |
6 | CHANNEL_OUT_5POINT1 |
7 | CHANNEL_OUT_5POINT1 \| CHANNEL_OUT_BACK_CENTER |
8 | CHANNEL_OUT_7POINT1_SURROUND |
These masks are an ORed composite of individual channel masks. For example, CHANNEL_OUT_STEREO is composed of CHANNEL_OUT_FRONT_LEFT and CHANNEL_OUT_FRONT_RIGHT.
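Because a position mask is a bitwise OR of individual channel bits, the channel count is the number of set bits. The sketch below illustrates this in plain Java. The constants are redefined locally with bit values assumed to mirror those of android.media.AudioFormat; on Android, use the AudioFormat fields directly. The class and helper names are hypothetical.

```java
/** Sketch: composing and inspecting channel position masks. */
class PositionMaskSketch {
    // Assumed bit values for illustration; prefer the real AudioFormat constants.
    static final int CHANNEL_OUT_FRONT_LEFT = 0x4;
    static final int CHANNEL_OUT_FRONT_RIGHT = 0x8;
    static final int CHANNEL_OUT_FRONT_CENTER = 0x10;
    static final int CHANNEL_OUT_STEREO =
            CHANNEL_OUT_FRONT_LEFT | CHANNEL_OUT_FRONT_RIGHT;

    /** Number of channels described by a position mask (one per set bit). */
    static int channelCount(int positionMask) {
        return Integer.bitCount(positionMask);
    }

    /** True if the mask includes the given individual channel bit. */
    static boolean hasChannel(int positionMask, int channelBit) {
        return (positionMask & channelBit) != 0;
    }
}
```

For instance, `channelCount(CHANNEL_OUT_STEREO | CHANNEL_OUT_FRONT_CENTER)` corresponds to the three channel row of the table.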
Channel index masks were introduced in API M. They allow the selection of a particular channel from the source or sink endpoint by number, i.e. the first channel, the second channel, and so forth. This avoids problems with artificially assigning positions to channels of an endpoint, or figuring out what the ith position bit is within an endpoint's channel position mask.
For example, consider a 4 channel USB device. Using a position mask, and based on the channel count, this would be a CHANNEL_OUT_QUAD device, but really one is only interested in channel 0 through channel 3. The USB device would then have the following individual bit channel masks: CHANNEL_OUT_FRONT_LEFT, CHANNEL_OUT_FRONT_RIGHT, CHANNEL_OUT_BACK_LEFT and CHANNEL_OUT_BACK_RIGHT. But which is channel 0 and which is channel 3?
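For a channel index mask, each channel number is represented as a bit in the mask, from the least significant bit (channel 0) upwards; numerically, this bit value is 1 << channelNumber. A set bit indicates that channel is present in the audio frame, otherwise it is cleared. The order of the bits also corresponds to that channel number's sample order in the audio frame.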
For the 4 channel USB device example above, the device would have a channel index mask 0xF. Suppose we wanted to select only the first and the third channels; this would correspond to a channel index mask 0x5 (the first and third bits set). If an AudioTrack uses this channel index mask, the audio frame would consist of two samples, the first sample of each frame routed to channel 0, and the second sample of each frame routed to channel 2.
The canonical channel index masks by channel count are given by the formula (1 << channelCount) - 1.
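The index mask arithmetic described above can be sketched in plain Java: one helper applies the canonical formula, the other lists the channel numbers selected by a mask in the order their samples appear in the frame. The class and method names are hypothetical, and the canonical helper assumes a channel count below 32.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: working with channel index masks. */
class ChannelIndexMaskSketch {
    /** Canonical index mask for a channel count (valid for counts below 32). */
    static int canonicalMask(int channelCount) {
        return (1 << channelCount) - 1;
    }

    /** Channel numbers selected by the mask, in ascending (sample) order. */
    static List<Integer> selectedChannels(int indexMask) {
        List<Integer> channels = new ArrayList<>();
        for (int channel = 0; channel < Integer.SIZE; channel++) {
            if ((indexMask & (1 << channel)) != 0) {
                channels.add(channel);
            }
        }
        return channels;
    }
}
```

With the mask 0x5, `selectedChannels` yields channels 0 and 2, matching the two samples per frame described for that mask.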
Use cases:
Channel position mask for an endpoint: CHANNEL_OUT_FRONT_LEFT, CHANNEL_OUT_FRONT_CENTER, etc. for HDMI home theater purposes.
Channel position mask for an audio stream: Creating an AudioTrack to output movie content, where 5.1 multichannel output is to be written.
Channel index mask for an audio stream: An AudioRecord may only want the third and fourth audio channels of the endpoint (i.e. the second channel pair), and not care about the position it corresponds to, in which case the channel index mask is 0xC.
Multichannel AudioRecord sessions should use channel index masks.
For linear PCM, an audio frame consists of a set of samples captured at the same time,
whose count and
channel association are given by the channel mask,
and whose sample contents are specified by the encoding.
For example, a stereo 16 bit PCM frame consists of
two 16 bit linear PCM samples, with a frame size of 4 bytes.
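The frame size arithmetic for linear PCM can be sketched in plain Java. The class and method names are hypothetical; on Android the channel count would come from the AudioFormat instance and the sample size from its encoding.

```java
/** Sketch: linear PCM frame size and data rate. */
class PcmFrameSketch {
    /** Bytes in one audio frame: one sample per channel. */
    static int frameSizeBytes(int channelCount, int bytesPerSample) {
        return channelCount * bytesPerSample;
    }

    /** Bytes of PCM data per second at the given sample rate. */
    static long bytesPerSecond(int sampleRateHz, int channelCount, int bytesPerSample) {
        return (long) sampleRateHz * frameSizeBytes(channelCount, bytesPerSample);
    }
}
```

A stereo 16 bit frame gives `frameSizeBytes(2, 2)`, the 4 bytes mentioned above.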
For compressed audio, an audio frame may alternately refer to an access unit of compressed data bytes that is logically grouped together for decoding and bitstream access (e.g. MediaCodec), or a single byte of compressed data (e.g. AudioTrack.getBufferSizeInFrames()), or the linear PCM frame result from decoding the compressed data (e.g. AudioTrack.getPlaybackHeadPosition()), depending on the context where audio frame is used.
Nested Classes | |
---|---|
AudioFormat.Builder | Builder class for AudioFormat objects. |
Public Methods | |
---|---|
int getChannelCount() | Return the channel count. |
int getChannelIndexMask() | Return the channel index mask. |
int getChannelMask() | Return the channel mask. |
int getEncoding() | Return the encoding. |
int getSampleRate() | Return the sample rate. |
String toString() | Returns a string containing a concise, human-readable description of this object. |
Inherited Methods |
---|
From class java.lang.Object |
CHANNEL_CONFIGURATION_DEFAULT: This constant was deprecated in API level 5. Use CHANNEL_OUT_DEFAULT or CHANNEL_IN_DEFAULT instead.
CHANNEL_CONFIGURATION_INVALID: This constant was deprecated in API level 5. Use CHANNEL_INVALID instead.
CHANNEL_CONFIGURATION_MONO: This constant was deprecated in API level 5. Use CHANNEL_OUT_MONO or CHANNEL_IN_MONO instead.
CHANNEL_CONFIGURATION_STEREO: This constant was deprecated in API level 5. Use CHANNEL_OUT_STEREO or CHANNEL_IN_STEREO instead.
CHANNEL_INVALID: Invalid audio channel mask.
CHANNEL_OUT_7POINT1: This constant was deprecated in API level 23. Not the typical 7.1 surround configuration. Use CHANNEL_OUT_7POINT1_SURROUND instead.
CHANNEL_OUT_DEFAULT: Default audio channel mask.
ENCODING_AC3: Audio data format: AC-3 compressed.
ENCODING_DEFAULT: Default audio data format.
ENCODING_DTS: Audio data format: DTS compressed.
ENCODING_DTS_HD: Audio data format: DTS HD compressed.
ENCODING_E_AC3: Audio data format: E-AC-3 compressed.
ENCODING_INVALID: Invalid audio data format.
ENCODING_PCM_16BIT: Audio data format: PCM 16 bit per sample. Guaranteed to be supported by devices.
ENCODING_PCM_8BIT: Audio data format: PCM 8 bit per sample. Not guaranteed to be supported by devices.
ENCODING_PCM_FLOAT: Audio data format: single-precision floating-point per sample.
getChannelCount(): Return the channel count.
getChannelIndexMask(): Return the channel index mask. See the section on channel masks for more information about the difference between index-based masks and position-based masks (as returned by getChannelMask()). Returns one of the values that can be set in setChannelIndexMask(int), or CHANNEL_INVALID if not set or an invalid mask was used.
getChannelMask(): Return the channel mask. See the section on channel masks for more information about the difference between index-based masks (as returned by getChannelIndexMask()) and the position-based mask returned by this function. Returns one of the values that can be set in setChannelMask(int), or CHANNEL_INVALID if not set.
getEncoding(): Return the encoding. See the section on encodings for more information about the different types of supported audio encoding. Returns one of the values that can be set in setEncoding(int), or ENCODING_INVALID if not set.
getSampleRate(): Return the sample rate. Returns one of the values that can be set in setSampleRate(int), or 0 if not set.
toString(): Returns a string containing a concise, human-readable description of this object. Subclasses are encouraged to override this method and provide an implementation that takes into account the object's type and data. The default implementation is equivalent to the following expression:
getClass().getName() + '@' + Integer.toHexString(hashCode())
See Writing a useful toString method if you intend to implement your own toString method.
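The equivalence stated for the default toString implementation can be checked with a small plain-Java sketch. The class and method names are hypothetical; the comparison only holds for objects whose class does not override toString or hashCode, such as a plain java.lang.Object.

```java
/** Sketch: rebuilding the default Object.toString() result by hand. */
class DefaultToStringSketch {
    /** Same expression the documentation gives for the default implementation. */
    static String defaultDescription(Object o) {
        return o.getClass().getName() + '@' + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object plain = new Object();
        // Both lines print the same text for an object that does not override toString().
        System.out.println(plain);
        System.out.println(defaultDescription(plain));
    }
}
```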