Pyramid Motion Estimation

Background

- The motion vector (MV) found by motion estimation (ME) at a lower resolution is used as an approximation of the MV predictor (MVp) at the next higher resolution

http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV0405/ZAMPOGLU/Hierarchicalestimation.html
http://www.ece.cmu.edu/~ee899/project/deepak_mid.htm
- Alex is working on implementing this in C as a "Gold Standard" solution
  - see svn://...branches/pyramid

Overview

1) Downscale the image by a factor of 2^A. A typical value is A = 4, for a downscale by 16.

2) Perform a motion search on this image with a block size of, say, 16x16, using a diamond search.

3) A--;

4) Downscale the original image by 2^A. Take the corresponding blocks from the previous search, scale their motion vectors up by a factor of 2, and split each one into 4 new 16x16 blocks at the current downscaling level.

5) Do the motion search again, refining the vectors found so far.

6) If (A != 0) GOTO 3
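Step 4's vector propagation can be sketched in C as below. This is a hypothetical helper (`propagate_mvs` and `mv_t` are illustrative names, not taken from the pyramid branch or from x264), assuming MVs are stored as integer fullpel vectors at each level and block grids are stored row-major.

```c
/* Fullpel motion vector at one pyramid level (hypothetical type). */
typedef struct { int x, y; } mv_t;

/* Each 16x16 block at the coarse level covers a 32x32 area at the next
 * finer level, i.e. a 2x2 group of 16x16 blocks. Every child inherits its
 * parent's vector scaled by 2 as the starting point for the refinement
 * search. Grid sizes are in blocks; if the fine grid has more than twice
 * as many blocks (e.g. odd sizes without padding), the extra blocks reuse
 * the nearest parent in the last column/row. */
static void propagate_mvs(const mv_t *coarse, int coarse_w, int coarse_h,
                          mv_t *fine, int fine_w, int fine_h)
{
    for (int by = 0; by < fine_h; by++) {
        for (int bx = 0; bx < fine_w; bx++) {
            int px = bx / 2, py = by / 2;
            if (px >= coarse_w) px = coarse_w - 1;
            if (py >= coarse_h) py = coarse_h - 1;
            mv_t p = coarse[py * coarse_w + px];
            fine[by * fine_w + bx].x = p.x * 2;
            fine[by * fine_w + bx].y = p.y * 2;
        }
    }
}
```

For example, a 2x1 coarse grid holding (1,-2) and (0,3) becomes a 4x2 fine grid whose left 2x2 quadrant holds (2,-4) and right quadrant holds (0,6).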

The diamond search is the trivial one: a radius-1 diamond, a.k.a. "gradient descent." You could of course use something fancier than a diamond, but it should be sufficient for a pyramid search.
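A minimal plain-C sketch of the radius-1 diamond search, assuming 8-bit planes and SAD as the cost. The names `sad_block`, `diamond_search`, and `max_iter` are hypothetical; inside x264 you would use the built-in SAD function pointers instead of this loop, and padded borders instead of the bounds check.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { int x, y; } mv_t;

/* SAD of a bw x bh block, argument order (src1, stride1, src2, stride2). */
static int sad_block(const uint8_t *a, int stride_a,
                     const uint8_t *b, int stride_b, int bw, int bh)
{
    int sum = 0;
    for (int y = 0; y < bh; y++)
        for (int x = 0; x < bw; x++)
            sum += abs(a[y * stride_a + x] - b[y * stride_b + x]);
    return sum;
}

/* Nonzero if a bw x bh block at (x, y) lies fully inside the plane. */
static int in_bounds(int x, int y, int bw, int bh, int width, int height)
{
    return x >= 0 && y >= 0 && x + bw <= width && y + bh <= height;
}

/* Starting from *mv, test the four neighbours (left, right, up, down),
 * move to the best one if it is strictly cheaper, and stop when no
 * neighbour improves the SAD (a local minimum) or after max_iter steps.
 * cur is the block being matched; ref is the whole reference plane;
 * (bx, by) is the block's position. The starting vector is assumed to
 * keep the block inside the plane. Returns the best SAD found. */
static int diamond_search(const uint8_t *cur, int cur_stride,
                          const uint8_t *ref, int ref_stride,
                          int width, int height, int bx, int by,
                          int bw, int bh, mv_t *mv, int max_iter)
{
    static const int dx[4] = { -1, 1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };
    int best = sad_block(cur, cur_stride,
                         ref + (by + mv->y) * ref_stride + (bx + mv->x),
                         ref_stride, bw, bh);
    for (int iter = 0; iter < max_iter; iter++) {
        mv_t best_mv = *mv;
        for (int i = 0; i < 4; i++) {
            int x = bx + mv->x + dx[i];
            int y = by + mv->y + dy[i];
            if (!in_bounds(x, y, bw, bh, width, height))
                continue;
            int cost = sad_block(cur, cur_stride, ref + y * ref_stride + x,
                                 ref_stride, bw, bh);
            if (cost < best) {
                best = cost;
                best_mv.x = mv->x + dx[i];
                best_mv.y = mv->y + dy[i];
            }
        }
        if (best_mv.x == mv->x && best_mv.y == mv->y)
            break; /* no neighbour improved: local minimum */
        *mv = best_mv;
    }
    return best;
}
```

At each pyramid level, seed `*mv` with the propagated vector and call it once per block. Because it only follows strictly decreasing cost, it can stop in a local minimum; the pyramid is what makes the starting point good enough for that to be acceptable.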

Motivation

- Allegedly the Badaboom encoder uses this solution
- Recommended by Dark_Shikari

Questions

- When downsampling the image, what do you do if its dimensions are not divisible by 16 * 2^A? Pad (same as H.264).
- Would it be better to overlap the downsampled regions?
- When performing the search, should we be moving in 1-pel or 2^A-pel steps? (2^A-pel steps, i.e. one pel at the downscaled level)
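The padding answer can be sketched as follows: each dimension is rounded up to a multiple of 16 * 2^A so that the coarsest level still divides into whole 16x16 blocks. The helper names are hypothetical, and edge replication is an assumed choice of padding content; the notes only say "pad".

```c
#include <stdint.h>
#include <string.h>

/* Round dim up to a multiple of block << levels (16 * 2^A for block = 16). */
static int padded_dim(int dim, int block, int levels)
{
    int m = block << levels;
    return (dim + m - 1) / m * m;
}

/* Copy a w x h plane into a pw x ph buffer (pw >= w, ph >= h, stride pw),
 * replicating the last column and the last row into the padding. */
static void pad_plane(const uint8_t *src, int src_stride, int w, int h,
                      uint8_t *dst, int pw, int ph)
{
    for (int y = 0; y < ph; y++) {
        const uint8_t *s = src + (y < h ? y : h - 1) * src_stride;
        uint8_t *d = dst + y * pw;
        memcpy(d, s, (size_t)w);
        memset(d + w, s[w - 1], (size_t)(pw - w));
    }
}
```

With A = 4, a 1920x1080 frame pads to 2048x1280, which shows how much padding a deep pyramid can require.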

Downsampling

- Currently using averaging.
- Alternatives: http://forums.nvidia.com/lofiversion/index.php?t60853.html
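Downsampling by averaging can be sketched as a 2x2 box filter with rounding; applying it A times gives the 2^A level. The function name is hypothetical and the source dimensions are assumed to be even (e.g. after padding).

```c
#include <stdint.h>

/* Halve a plane in each dimension: each output pixel is the rounded
 * average of a 2x2 group of source pixels. */
static void downscale_2x(const uint8_t *src, int src_stride, int w, int h,
                         uint8_t *dst, int dst_stride)
{
    for (int y = 0; y < h / 2; y++) {
        const uint8_t *r0 = src + (2 * y) * src_stride; /* upper source row */
        const uint8_t *r1 = r0 + src_stride;            /* lower source row */
        for (int x = 0; x < w / 2; x++)
            dst[y * dst_stride + x] =
                (uint8_t)((r0[2 * x] + r0[2 * x + 1] +
                           r1[2 * x] + r1[2 * x + 1] + 2) >> 2);
    }
}
```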

In x264

Where is the downsample method located in x264?

- The x264 method is not exactly suited to what we're doing here, and if we actually committed the C code we would modify it slightly, but it will work
- Look at: x264_frame_init_lowres
- frame_init_lowres_core is an asm function that computes 4 lowres planes (the fullpel plane plus 3 half-pel positions)
  - This is particularly useful if you want to allow subpel at the downsampled levels of the search
  - Notice how it works: it duplicates the last row and column, calls the interpolation, and then expands the border
  - Is there a C version of frame_init_lowres_core?
    - Yes, right below it, but there is no reason to use the C version; just call through the function pointer. That's one of the advantages of trying such code directly in x264: you can use all the built-in functions like SAD, SATD, etc.
    - lowres C: 11500 cycles
    - lowres SSSE3: 902 cycles (that's why we have an asm version)
  - The fact that it gives you subpel planes may be useful as well: you could do one iteration of subpel refinement at each downscale level, which might improve prediction because it gives a more accurate MVp
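To make the "4 lowres planes" idea concrete, here is a simplified C sketch in the spirit of frame_init_lowres_core. It is not x264's code: the function names are hypothetical and the exact filter (an average of two rounded pair averages) is an assumption. Clamped reads stand in for the duplicated last row and column, and the border expansion step is omitted.

```c
#include <stdint.h>

/* Rounded average of two rounded pair averages (assumed 2x2 filter). */
static inline uint8_t avg4(int a, int b, int c, int d)
{
    return (uint8_t)((((a + b + 1) >> 1) + ((c + d + 1) >> 1) + 1) >> 1);
}

/* Read a source pixel, clamping to the last column/row so the loop
 * needs no special case at the right and bottom edges. */
static int px(const uint8_t *src, int stride, int w, int h, int x, int y)
{
    if (x > w - 1) x = w - 1;
    if (y > h - 1) y = h - 1;
    return src[y * stride + x];
}

/* Build four (w/2) x (h/2) planes with stride w/2:
 *   f  : fullpel lowres (2x2 average at (2x, 2y))
 *   hp : shifted right by half a lowres pixel (one full-res pixel)
 *   vp : shifted down by half a lowres pixel
 *   cp : shifted both right and down */
static void lowres_4planes(const uint8_t *src, int stride, int w, int h,
                           uint8_t *f, uint8_t *hp, uint8_t *vp, uint8_t *cp)
{
    int lw = w / 2, lh = h / 2;
    for (int y = 0; y < lh; y++) {
        for (int x = 0; x < lw; x++) {
            int sx = 2 * x, sy = 2 * y;
            /* 3x3 full-res neighbourhood, pRC = row R, column C */
            int p00 = px(src, stride, w, h, sx,     sy);
            int p01 = px(src, stride, w, h, sx + 1, sy);
            int p02 = px(src, stride, w, h, sx + 2, sy);
            int p10 = px(src, stride, w, h, sx,     sy + 1);
            int p11 = px(src, stride, w, h, sx + 1, sy + 1);
            int p12 = px(src, stride, w, h, sx + 2, sy + 1);
            int p20 = px(src, stride, w, h, sx,     sy + 2);
            int p21 = px(src, stride, w, h, sx + 1, sy + 2);
            int p22 = px(src, stride, w, h, sx + 2, sy + 2);
            int o = y * lw + x;
            f[o]  = avg4(p00, p10, p01, p11);
            hp[o] = avg4(p01, p11, p02, p12);
            vp[o] = avg4(p10, p20, p11, p21);
            cp[o] = avg4(p11, p21, p12, p22);
        }
    }
}
```

Each output pixel needs one extra source column and row beyond its own 2x2 group, which is why the last row and column must exist (duplicated or clamped) before the interpolation runs.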

Jan 22 00:29:08 <cancan101> when it add 1 border

Jan 22 00:29:17 <cancan101> what does that do if it was % 16 ==0 before

Jan 22 00:29:28 <Dark_Shikari> which part are you referring to

Jan 22 00:29:29 <cancan101> add 15?

Jan 22 00:29:31 <Dark_Shikari> the duplicating the last row and column?

Jan 22 00:29:34 <cancan101> yea

Jan 22 00:29:35 <Dark_Shikari> or the frame_expand_border?

Jan 22 00:29:38 <cancan101> well, the combo of the two

Jan 22 00:29:42 <Dark_Shikari> the duplicating of the last row and column has nothing to do with anything

Jan 22 00:29:45 <cancan101> if it had been exactly at 16

Jan 22 00:29:48 <Dark_Shikari> it's just to avoid special casing the end

Jan 22 00:29:52 <Dark_Shikari> it is not a row or column of real pixels

Jan 22 00:29:56 <Dark_Shikari> and isn't maintained after calculations are done

Jan 22 00:30:01 <cancan101> ok

Jan 22 00:30:05 <Dark_Shikari> it's just there to avoid having to special-case the last row/column when doing interpolation

Jan 22 00:31:03 <cancan101> h v c?

Jan 22 00:31:11 <Dark_Shikari> those are the 4 hpel positions

Jan 22 00:31:16 <Dark_Shikari> fullpel, and then h v c

Jan 22 00:31:19 <Dark_Shikari> they are as follows

Jan 22 00:31:24 <Dark_Shikari> F H F

Jan 22 00:31:29 <Dark_Shikari> V C V

Jan 22 00:31:31 <Dark_Shikari> F H F

Jan 22 00:31:37 <Dark_Shikari> F is fullpel

Jan 22 00:32:09 <Dark_Shikari> the upper-left 4 of those are the relative positions of the same pixel in each of the 4 planes
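Using the F H / V C layout above, picking the plane for a half-pel position on the lowres grid can be sketched as below. This assumes, hypothetically, that lowres MVs are kept in half-pel units of the lowres grid (x264 itself stores MVs in quarter-pel units); the low bit of each component selects the plane and the remaining bits give the fullpel offset into it.

```c
/* Plane indices follow the layout from the log:  F H
 *                                                V C  */
enum { PLANE_F = 0, PLANE_H = 1, PLANE_V = 2, PLANE_C = 3 };

typedef struct { int plane, dx, dy; } hpel_sel_t;

/* mvx, mvy: half-pel units on the lowres grid (assumed convention). */
static hpel_sel_t select_hpel_plane(int mvx, int mvy)
{
    int fx = mvx & 1, fy = mvy & 1; /* half-pel fractions, 0 or 1 */
    hpel_sel_t s;
    s.plane = (fy << 1) | fx;       /* F=0, H=1, V=2, C=3 */
    s.dx = (mvx - fx) / 2;          /* exact division: floor(mvx / 2) */
    s.dy = (mvy - fy) / 2;
    return s;
}
```

The candidate block is then read from `planes[s.plane]` at `(by + s.dy) * stride + (bx + s.dx)`; for example, a vector of (3, 2) half-pels selects the H plane at fullpel offset (1, 1).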

Jan 27 03:19:24 <cancan101> how do i use it to upsample?

Jan 27 03:19:28 <Dark_Shikari> you don't

Jan 27 03:19:31 <cancan101> or is it another fnc

Jan 27 03:19:38 <Dark_Shikari> remember the lowres function gives you *4 planes*

Jan 27 03:19:42 <cancan101> right

Jan 27 03:19:53 <Dark_Shikari> F H

Jan 27 03:19:54 <Dark_Shikari> V C

Jan 27 03:20:01 <Dark_Shikari> pick the correct plane for the subpel position

Jan 27 03:20:02 <cancan101> so if i call it

Jan 27 03:20:06 <cancan101> on the original frame

Jan 27 03:20:15 <Dark_Shikari> no, that isn't how you do subpel on the original frame

Jan 27 03:20:22 <Dark_Shikari> original frame already has *real* hpel interpolation done

Jan 27 03:20:29 <cancan101> right

Jan 27 03:20:29 <Dark_Shikari> see LOAD_HPELS in analyse.c

Jan 27 03:20:32 <Dark_Shikari> for the pointers to those

Jan 27 03:20:33 <cancan101> ok

Jan 27 03:20:37 <Dark_Shikari> and how to use them

Jan 27 03:20:43 <Dark_Shikari> there are 4 hpel planes

Jan 27 03:20:49 <Dark_Shikari> one is the one you're used to using--the fullpel plane

Jan 27 03:20:52 <Dark_Shikari> the others are HVC accordingly

Jan 27 03:20:52 <cancan101> right

Jan 27 03:21:00 <Dark_Shikari> to do qpel, you use get_ref

Jan 27 03:21:02 <Dark_Shikari> see qpel refine in me.c

Jan 27 03:21:09 <Dark_Shikari> get_ref is given a pointer to a stride and to a local buffer

Jan 27 03:21:32 <Dark_Shikari> IF the mv is hpel, get_ref sets the stride to the frame stride, and returns a pointer to the hpel data

Jan 27 03:21:43 <Dark_Shikari> if the MV is qpel, get_ref will do the interpolation and return the pointer to the buffer you gave it

Jan 27 03:22:07 <Dark_Shikari> of course, you can also not reinvent the wheel and reuse the qpel refine function that already exists

Jan 27 03:22:22 <Dark_Shikari> just set yourself up an x264_me_t with the needed stuff and call it for each MB

Jan 27 03:22:27 <Dark_Shikari> or even better....

Jan 27 03:22:43 <Dark_Shikari> do qpel refine on a per MB later *during real encoding*

Jan 27 03:22:43 <Dark_Shikari> so have your pyramid search just do fullpel for example

Jan 27 03:22:46 <Dark_Shikari> (for the final step that is)

Jan 27 03:22:58 <Dark_Shikari> and then in real encoding, instead of forcing your mvs, force your mvs *but still do qpel refinement on those mvs*

Jan 27 03:23:11 <cancan101> huh?

Jan 27 03:23:15 <Dark_Shikari> whichever fits what you want to do better

Jan 27 03:23:19 <cancan101> what do u mean real encoding

Jan 27 03:23:24 <Dark_Shikari> in the encoding loop

Jan 27 03:23:27 <Dark_Shikari> that comes after your pyramid search

Jan 27 03:26:13 <cancan101> for SAD: h->pixf.fpelcmp[i_pixel](cur, 16, ref, stride )?

Jan 27 03:27:34 <Dark_Shikari> yes

Jan 27 03:27:47 <Dark_Shikari> for you the stride of cur may not be 16

Jan 27 03:27:51 <cancan101> right

Jan 27 03:27:55 <Dark_Shikari> remember, its src1, stride1, src2, stride2

Jan 27 03:27:55 <Dark_Shikari> yeah

Jan 27 03:28:01 <Dark_Shikari> i_pixel is the size of your comparison

Jan 27 03:28:05 <Dark_Shikari> PIXEL_16x16 for 16x16

Jan 27 03:29:12 <cancan101> and src

Jan 27 03:29:20 <cancan101> is to the first pixel in the block?

Jan 27 03:30:29 <Dark_Shikari> yes

Jan 27 03:30:35 <Dark_Shikari> as is ref (for the comparison block)
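A plain-C reference for what that call computes when fpelcmp is a 16x16 SAD, plus the pointer arithmetic for the two block pointers. Both functions are hypothetical stand-ins; in x264 you would call the optimized `h->pixf.fpelcmp[PIXEL_16x16]` pointer instead.

```c
#include <stdint.h>
#include <stdlib.h>

/* 16x16 SAD with the (src1, stride1, src2, stride2) argument order. */
static int sad_16x16(const uint8_t *pix1, int stride1,
                     const uint8_t *pix2, int stride2)
{
    int sum = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[y * stride1 + x] - pix2[y * stride2 + x]);
    return sum;
}

/* Pointer to the first (top-left) pixel of the block at (bx, by),
 * displaced by the fullpel vector (mvx, mvy); use mvx = mvy = 0 for cur. */
static const uint8_t *block_ptr(const uint8_t *plane, int stride,
                                int bx, int by, int mvx, int mvy)
{
    return plane + (by + mvy) * stride + (bx + mvx);
}
```

As noted in the log, the stride of cur is the stride of whatever buffer it lives in, which is not necessarily 16.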

Subpartitions

What is this check doing (in analyse.c, x264_macroblock_analyse): analysis.l0.i_cost8x8 < analysis.l0.me16x16.cost?

- If the P8x8 score is better than the 16x16 score, do sub-8x8 analysis. For 8x8 ME, you just take the MVs from 16x16 and use them as starting points for 8x8.
- We don't care about this, because we're not doing sub-8x8 analysis.