Video terminology, and the relationships between keywords, when recording to a card and editing with Kdenlive

Hi, I have done a week's cramming and read all relevant parts of "The Filmmaker's Handbook" and a ton of Wikipedia articles. I am working on a mini manual which explains:

- the history of some of the technical settings you are offered, such as frames per second
- the meaning of words such as "video format", "file format", "wrapper" and "codec" (and standards such as MPEG-4)

I will also put together a "free software workflow" using Kdenlive as my NLE. It will begin by suggesting tools like LibreOffice and Celtx for pre-production, and end with Creative Commons licences and "made with free software" stamps in the distribution section.

While I was reading, I began to see that much of the terminology blurs. For example, the word "codec" is often used to talk about a standard like MPEG, the word "file format" is often used to refer to the wrapper, the phrase "video format" is often used to refer to the file format, and so on. Some of this is caused by the new generation of cameras producing MP4 and H.264 files; some is just for simplicity, or from lack of understanding.

Anyway, I started work on a diagram to try and explain some of this. It's probably very nerdy and won't be to everyone's taste, but I would reeeeally value feedback from other nerdy people :D

I haven't worked on my spelling or the presentation yet; being dyslexic, I am sure there are mistakes - it's just the content I am looking at for now.

Video terminology is a big mess - partly for historical reasons, and partly because of marketing and brand pushing.

Speaking of codecs &co, I would put it like this:

1) MPEG-2, H.264 etc. are STANDARDS (hopefully :-)

2) CODECS are the particular incarnations of software and/or hardware that encode and decode video to/from a given standard.

There can be (and usually are) many different codecs available from various vendors for a single standard. Ideally, any codec for standard X should be able to read streams created by any other codec for standard X.

For example, FFmpeg contains a collection of codecs for a number of different standards. You choose one with the "-vcodec" switch. Type "ffmpeg -codecs" to see what is available.

3) CONTAINERS are more or less file formats (AVI, MOV, ...) into which one or more video/audio/etc. streams can be multiplexed. There are usually several multiplexers (software and hardware) available for each container.

FFmpeg can do the (de)multiplexing into several types of containers; choose one with "-f", or let it guess from the extension of the output file you give.
Type "ffmpeg -formats" to see which are available.

So, what you do is:
-you choose a standard into which to encode your video
-you choose a codec for that standard
-you choose a file format (container) and the required muxer.

For example, you can use the FFmpeg mpeg2 codec to encode to the MPEG-2 standard, and the FFmpeg mpeg-ps muxer to put the resulting stream into an MPEG-PS container file - all with a single ffmpeg command line.
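A single command of that kind could look something like the sketch below. The filenames are placeholders, and the exact codec and muxer names can differ between FFmpeg versions - check "ffmpeg -codecs" and "ffmpeg -formats" on your own build:

```shell
# Encode to the MPEG-2 standard with FFmpeg's "mpeg2video" codec,
# and mux the stream into an MPEG-PS container.
# "-f vob" selects FFmpeg's MPEG-2 program stream muxer;
# plain "-f mpeg" would give the older MPEG-1 program stream.
ffmpeg -i input.avi -vcodec mpeg2video -f vob output.mpg
```

If you leave out "-f", FFmpeg guesses the container from the ".mpg" extension, which often (but not always) picks the muxer you wanted.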

BUT OF COURSE life is not that simple - you cannot arbitrarily combine standards with containers, and some codecs will have problems reading the output of other codecs for the "same standard".

FOSS tools like ffmpeg, vlc, etc are quite forgiving in this respect, so one gets easily relaxed and spoiled when working on Linux!
Commercial stuff (either software or hardware players) tends to be much more picky about what it will eat, and you must be much more cautious when preparing media for the other (non-Linux) sphere to consume.

BTW, I am not sure if the term "composite signal" is still used with digital video - once it meant the composite (luma+chroma) analog signal.

Yeah, I think it's still used, but it's just a digital composite signal - I am going to annotate that what goes in is analogue and what gets recorded is digital. Just working on the text of the article now; most of what you said is how I have written it, except the bit about multiplexers, which I didn't know about at all. Thanks for the help, keep 'em coming!

Hum, dammit, I think you are right. The book is not clear on this point. It almost seems like there is no compression, and that it's just the bits-per-frame/bit-depth stuff which defines how much data gets converted and stored. Does anyone know more about this?

NB - I rang up a professional camera shop and asked about this - there are definitely two types of compression. Will write about what they are later : )

The paragraph you linked just counts how many bits you need for uncompressed video, for a 640x480 format, etc.
(and their use of the word "orthogonal" is a bit weird; they probably just want to say that the time axis is a separate dimension - otherwise, successive frames are usually highly correlated, far from "mathematically" orthogonal)

Except for some in-house links in studios, you will probably never encounter uncompressed video files; links like SDI, DVI and HDMI carry uncompressed video, but only over short distances.
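To get a feel for why uncompressed video stays on short links, here is a quick back-of-the-envelope calculation in shell arithmetic. The figures (640x480, 8 bits per channel, 30 fps) are just example values, matching the paragraph mentioned earlier:

```shell
# Uncompressed 4:4:4 video rate: width * height * 3 bytes/pixel * fps
width=640; height=480; bytes_per_pixel=3; fps=30
bytes_per_sec=$(( width * height * bytes_per_pixel * fps ))
mbits_per_sec=$(( bytes_per_sec * 8 / 1000000 ))
echo "${mbits_per_sec} Mbit/s"   # roughly 221 Mbit/s for mere SD video
```

Compare that with the 25 Mbit/s of DV below, and the need for compression is obvious.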

There are quite a few different video compression standards.
They fall into two main groups:

- those that compress each frame independently ("I-frame only")

- and those that make use of interframe correlation, to achieve higher
compression ratios. Because successive frames are similar, you can save
bits by transmitting only the differences from the previous/next frame,
and encode a full frame only every X frames.
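With FFmpeg you can see the two groups directly: the "-g" option sets the GOP length, i.e. how often a full (I) frame is encoded. Filenames and the choice of mpeg2video are placeholders here, not a recommendation:

```shell
# I-frame only: every frame is a full frame - bigger file, easy seeking
ffmpeg -i input.avi -vcodec mpeg2video -g 1 iframe_only.mpg

# Interframe: a full frame only every 15 frames, the rest are
# encoded as differences - smaller file, harder to cut accurately
ffmpeg -i input.avi -vcodec mpeg2video -g 15 interframe.mpg
```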

The first group is used in most pro camcorders, and consumer DV cameras.
It is "editing friendly", as frame accurate seeking and cuts are easy.

The second group is used in consumer HD camcorders, and for transmission (either over the air broadcasting or network streaming) and storage, to save bandwidth/space.

DV is a very popular standard from the first group; it compresses each frame as a kind of JPEG image and consumes 25 Mbit/s.
There are "pro" relatives of DV: DVCAM uses basically the same 25 Mbit/s codec, while DVCPRO50 and DVCPRO HD raise the bitrate to 50 and 100 Mbit/s.
MJPEG, DNxHD and Apple ProRes are also independent-frame (I-frame only) formats, good for editing.
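As a sanity check on those bitrates, a minimal shell calculation of how much storage an hour of 25 Mbit/s DV eats (video only, ignoring audio and overhead, and using decimal gigabytes):

```shell
mbits_per_sec=25
seconds=3600
# Mbit -> MB (divide by 8), MB -> GB (divide by 1000)
gigabytes=$(( mbits_per_sec * seconds / 8 / 1000 ))
echo "about ${gigabytes} GB per hour"   # about 11 GB per hour
```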

In the second group you have things like MPEG-2, MPEG-4, H.264, etc.
AVCHD (which uses H.264) is found in consumer HD camcorders and in video-capable SLRs.
HDV camcorders use a variant of MPEG-2.

Hum, I was coming to the conclusion that the I-frame/successive-frame compression was separate from the initial data disregard. The camera shop guy also agreed, saying that the compression which makes the end file is the same kind you do on a PC, but that the initial "compression" is more of a complex hardware thing, where colour data is simply thrown away, and it is usually not called compression anyway.

Is this also your understanding? I get the I-frame type compression stuff quite well, but am just trying to clarify how we talk about the data loss before that stage.

Let me try to describe what happens in a video camera....

1. From the sensor you get analog signals proportional to the light. In a three-chip camera, you get three separate signals on three separate wires (R, G, B). In a single-chip camera, you get successive samples (time multiplexed) of the three colors on the same wire - see "Bayer pattern". Therefore, in a single-chip camera, an interpolation step is necessary next, to "spread" the color information over each pixel. This does not necessarily mean that a single-chip camera has inferior color resolution, as the chip can have more pixels than the final image (just as the cams can take stills with higher resolution than video), and step 6 reduces color information anyway, even in a three-chip cam.

2. Some analog processing is done on the three signals, like adjusting the brightness and contrast ("pedestal"* and "gain"*) and some sharpening ("peaking"* or "aperture correction"*).

The sequence of the next three steps can vary, but they will always be there.

3. A so-called "gamma correction" is applied to the signals. You will often be told that this is because CRT displays were nonlinear; however, the fact is that this is a very beneficial transformation, allowing for a better subjective dynamic range, and it would make sense even if there were no CRTs. See Poynton's "Color FAQ" for more info on this. After this correction, the signal names are usually appended with a "prime" mark, like R'G'B'. Gamma is a kind of dynamic range compression; without it, you would need to sample with more bits of precision, so you might also see it as a kind of (lossy) data compression.
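A tiny numeric illustration of the effect, assuming a simple power law with exponent 1/2.2 (only a rough approximation of real camera transfer curves): mid grey, about 18% linear reflectance, gets encoded near the middle of the coding range instead of sitting in the bottom fifth, so the available code values are spread more evenly over what the eye perceives:

```shell
# Gamma-encode linear 18% grey with exponent 1/2.2
awk 'BEGIN { v = 0.18; printf "linear %.2f -> encoded %.2f\n", v, v^(1/2.2) }'
# prints: linear 0.18 -> encoded 0.46
```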

4. The signals are converted into the digital domain (A/D conversion).

5. The signals are converted from R'G'B' to Y'CrCb using a simple matrix multiplication. This is a kind of separation into a black and white image (Y', "Luma") and the color information (Cr and Cb, "Chroma"). This is a trick that was invented in the 1950's to make analog color television backwards compatible, and to save some bandwidth (see point 6 below).
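A minimal sketch of that matrix step, using the BT.601 luma coefficients (one common choice; HD video uses slightly different BT.709 numbers). Values are in the 0..1 range; real 8-bit implementations add offsets and scaling on top of this:

```shell
# R'G'B' -> Y'CbCr for pure red, BT.601 coefficients
awk -v R=1.0 -v G=0.0 -v B=0.0 'BEGIN {
    Y  = 0.299*R + 0.587*G + 0.114*B   # luma: weighted sum of R, G, B
    Cb = 0.564*(B - Y)                 # blue color difference
    Cr = 0.713*(R - Y)                 # red color difference
    printf "Y=%.3f Cb=%.3f Cr=%.3f\n", Y, Cr == Cr ? Cb : Cb, Cr
}'
```

Note that an achromatic input (R'=G'=B') gives Cb=Cr=0, which is exactly the "black and white image plus color information" separation described above.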

6. One half (4:2:2) to three quarters (4:2:0 and 4:1:1) of the Cr and Cb samples are thrown away. One can afford this because the human eye is less sharp at seeing color differences than brightness patterns. This can be considered the first "lossy compression" step in the pipeline; it reduces the data rate to two thirds or one half of the original - but in practice, this subsampled signal is still considered "uncompressed video".
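The "two thirds or one half" figures come straight from counting samples. For each 2x2 block of pixels you always keep 4 luma samples, and the subsampling scheme decides how many of the 8 chroma samples (4 Cb + 4 Cr) survive. A sketch of that bookkeeping:

```shell
# Samples kept per 2x2 pixel block (luma is never subsampled)
luma=4
for scheme in 4:4:4 4:2:2 4:2:0; do
    case $scheme in
        4:4:4) chroma=8 ;;   # full Cb and Cr for every pixel
        4:2:2) chroma=4 ;;   # Cb, Cr at half horizontal resolution
        4:2:0) chroma=2 ;;   # Cb, Cr at half resolution both ways
    esac
    total=$(( luma + chroma ))
    echo "$scheme: $total of 12 samples ($(( total * 100 / 12 ))%)"
done
```

So 4:2:2 keeps 8 of 12 samples (two thirds of the data rate) and 4:2:0 keeps 6 of 12 (one half), matching the figures in step 6.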

In modern consumer cams, all of the above steps can be integrated on the same CMOS "sensor" chip.

7. Now we finally come to the actual video codec (using the DV, HDV, AVCHD... standard), which will further reduce the data rate, up to 10 times or more. The output of this step is what you get into your computer, when you download video from your cam.

These are the major steps that every cam has. Of course, there will be some additional processing done between them, like noise reduction and white point adjustment, but these are secondary and do not change the "big picture" of things.

* like any self-respecting guild, video engineers have their own "latin", to keep the uninitiated at bay!

Thank you - this was very, very helpful - it was almost as I understood it myself, which lets me know I have read the right things, but it's great to see it all in one place. Two points:

Firstly - is step 5 still called a "composite" signal?

Secondly - does step 6 have a name?



Composite signal:
In the analog days, there was no step 4, and steps 5 and 6 were done with analog circuits (weighted sums and lowpass filtering).
Following that, the chroma information was modulated on a special subcarrier, which was then added to the luma signal, thus making a single "composite" signal, that contained both luma and chroma (and synchronization pulses), and could be sent down a single wire (your cable with the yellow RCA jack).

In the digital domain, there is hardly anything resembling that, so I am hesitant to use the term "composite signal" there. Step 5 is just luma/chroma separation. Technically, it is a change of coordinate systems in the 3D color space.

Step six is "chroma subsampling". There are several ways it can be done, depending on how much chroma you discard and where you put the remaining chroma samples (4:2:0 vs 4:1:1, co-sited vs interstitial, etc... :-) Much of the complication here comes from the (historical) need to encode interlaced video.