2 files changed, 444 insertions, 12 deletions
diff --git a/unstable/linux-dmabuf/feedback.rst b/unstable/linux-dmabuf/feedback.rst
new file mode 100644
index 0000000..0f6e1b5
--- /dev/null
+++ b/unstable/linux-dmabuf/feedback.rst
@@ -0,0 +1,218 @@
+.. Copyright 2021 Simon Ser
+
+.. contents::
+
+
+linux-dmabuf feedback introduction
+==================================
+
+linux-dmabuf feedback allows compositors and clients to negotiate optimal buffer
+allocation parameters. This document assumes that the compositor uses a
+rendering API such as OpenGL or Vulkan and KMS as the presentation API: even
+though linux-dmabuf feedback isn't restricted to this use-case, it's the most
+common one.
+
+linux-dmabuf feedback introduces the following concepts:
+
+1. A main device. This is the render device that the compositor is using to
+   perform composition. Compositors should always be able to display a buffer
+   submitted by a client, so this device can be used as a fallback in case none
+   of the more optimized code-paths work. Clients should allocate buffers such
+   that they can be imported and textured from the main device.
+
+2. One or more tranches. Each tranche consists of a target device, allocation
+   flags and a set of format/modifier pairs. A tranche can be seen as a set of
+   format/modifier pairs that are compatible with the target device.
+
+   A tranche can have the ``scanout`` flag. It means that the target device is
+   a KMS device, and that buffers allocated with one of the format/modifier
+   pairs in the tranche are eligible for direct scanout.
+
+   Clients should use the tranches in order to allocate buffers with the most
+   appropriate format/modifier and also to avoid allocating in private device
+   memory when cross-device operations are going to happen.
+
+linux-dmabuf feedback implementation notes
+==========================================
+
+This section contains recommendations for client and compositor implementations.
+
+For clients
+-----------
+
+Clients are expected to either pick a fixed DRM format beforehand, or
+perform the following steps repeatedly until they find a suitable format.
+
+Basic clients may only support static buffer allocation on startup. These
+clients should do the following:
+
+1. Send a ``get_default_feedback`` request to get global feedback.
+2. Select the device indicated by ``main_device`` for allocation.
+3. For each tranche:
+
+   1. If ``tranche_target_device`` doesn't match the allocation device, ignore
+      the tranche.
+   2. Accumulate allocation flags from ``tranche_flags``.
+   3. Accumulate format/modifier pairs received via ``tranche_formats`` in a
+      list.
+   4. When the ``tranche_done`` event is received, try to allocate the buffer
+      with the accumulated list of modifiers and allocation flags. If that
+      fails, proceed with the next tranche. If that succeeds, stop the loop.
+
+4. Destroy the feedback object.
+
+Tranches are ordered by preference: the more optimized tranches come first. As
+such, clients should use the first tranche that happens to work.
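+
+The device-filtering part of the steps above could be sketched in C as
+follows. This is a hypothetical helper, not part of the protocol or of any
+library: ``struct tranche`` stands for a tranche the client has finished
+accumulating at ``tranche_done``, and ``device_id`` is assumed to be an
+identifier the client assigned after resolving ``tranche_target_device`` (for
+instance with libdrm's ``drmDevicesEqual``), since raw ``dev_t`` values must
+not be compared directly.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a tranche after its tranche_done event. */
struct tranche {
    int device_id;             /* resolved tranche_target_device (assumed) */
    uint32_t flags;            /* accumulated from tranche_flags */
    const uint64_t *modifiers; /* accumulated from tranche_formats, for the
                                  format the client wants to allocate */
    size_t num_modifiers;
};

/*
 * Returns the index of the next tranche, at or after *cursor, whose target
 * device is the allocation device, and advances *cursor past it. Returns -1
 * once every tranche has been considered. Tranches targeting another device
 * are skipped (step 3.1).
 *
 * Intended use, with try_allocate() a hypothetical allocator hook:
 *
 *     size_t cursor = 0;
 *     long i;
 *     while ((i = next_candidate(t, n, main_id, &cursor)) >= 0)
 *         if (try_allocate(&t[i]))
 *             break;  (tranches are ordered, so the first success wins)
 */
static long next_candidate(const struct tranche *tranches, size_t num_tranches,
                           int alloc_device_id, size_t *cursor)
{
    assert(cursor != NULL);
    while (*cursor < num_tranches) {
        size_t i = (*cursor)++;
        if (tranches[i].device_id == alloc_device_id)
            return (long)i;
    }
    return -1;
}
```
+
+Keeping the cursor outside the helper lets the caller stop at the first
+tranche whose allocation succeeds, which matches the preference ordering.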
+
+Some clients may have already selected the device they want to use beforehand.
+These clients can ignore the ``main_device`` event, and ignore tranches whose
+``tranche_target_device`` doesn't match the selected device. Such clients need
+to be prepared for the ``wp_linux_buffer_params.create`` request to potentially
+fail.
+
+If the client allocates a buffer without specifying explicit modifiers on a
+device different from the one indicated by ``main_device``, then the client
+must force a linear layout.
+
+Some clients might support re-negotiating the buffer format/modifier on the
+fly. These clients should send a ``get_surface_feedback`` request and keep the
+feedback object alive after the initial allocation. Each time a new set of
+feedback parameters is received (ended by the ``done`` event), they should
+perform the same steps as basic clients described above. They should detect
+when the optimal allocation parameters didn't change (same
+format/modifier/flags) to avoid needlessly re-allocating their buffers.
+
+Some clients might additionally support switching the device used for
+allocations on the fly. Such clients should send a ``get_surface_feedback``
+request. For each tranche, select the device indicated by
+``tranche_target_device`` for allocation. Accumulate allocation flags (received
+via ``tranche_flags``) and format/modifier pairs (received via
+``tranche_formats``) as usual. When the ``tranche_done`` event is received, try
+to allocate the buffer with the accumulated list of modifiers and the
+allocation flags. Try to import the resulting buffer by sending a
+``wp_linux_buffer_params.create`` request (this might fail). Repeat with each
+tranche until an allocation and import succeeds. Each time a new set of
+feedback parameters is received, they should perform these steps again. They
+should detect when the optimal allocation parameters didn't change (same
+device/format/modifier/flags) to avoid needlessly re-allocating their buffers.
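+
+Because ``wp_linux_buffer_params.create`` reports its result asynchronously
+with a ``created`` or ``failed`` event, the per-tranche retry loop above could
+be tracked with a small state machine. This is a sketch under that
+assumption; ``struct negotiation``, ``negotiation_start`` and
+``negotiation_result`` are hypothetical names, and the actual allocation and
+protocol requests are left to the caller.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum negotiation_state {
    NEGOTIATION_PENDING,   /* trying the tranche at index current */
    NEGOTIATION_DONE,      /* allocation and import succeeded */
    NEGOTIATION_EXHAUSTED, /* no tranche worked */
};

struct negotiation {
    size_t num_tranches;
    size_t current;
    enum negotiation_state state;
};

/* (Re)starts the loop, e.g. after each feedback done event. */
static void negotiation_start(struct negotiation *n, size_t num_tranches)
{
    assert(n != NULL);
    n->num_tranches = num_tranches;
    n->current = 0;
    n->state = num_tranches > 0 ? NEGOTIATION_PENDING : NEGOTIATION_EXHAUSTED;
}

/*
 * Reports the outcome for the current tranche: ok is false if allocating on
 * its target device failed or if the compositor sent "failed" for the
 * import, and true once "created" has been received.
 */
static void negotiation_result(struct negotiation *n, bool ok)
{
    assert(n != NULL);
    if (n->state != NEGOTIATION_PENDING)
        return;
    if (ok) {
        n->state = NEGOTIATION_DONE;
        return;
    }
    if (++n->current >= n->num_tranches)
        n->state = NEGOTIATION_EXHAUSTED;
}
```
+
+The sketch assumes the non-immediate ``create`` request; with
+``create_immed`` a failed import is not reported through a ``failed`` event.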
+
+For compositors
+---------------
+
+Basic compositors may only support texturing the DMA-BUFs via a rendering API
+such as OpenGL or Vulkan. Such compositors can send a single tranche as a reply
+to both ``get_default_feedback`` and ``get_surface_feedback``. Set the
+``main_device`` to the rendering device. Send the tranche with
+``tranche_target_device`` set to the rendering device and all of the DRM
+format/modifier pairs supported by the rendering API. Do not set the
+``scanout`` flag in the ``tranche_flags`` event.
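+
+As an illustration, the order of ``zwp_linux_dmabuf_feedback_v1`` events such
+a basic compositor could send for one feedback object is sketched below. It
+only records event names; a real compositor would instead call the
+``zwp_linux_dmabuf_feedback_v1_send_*`` functions generated by wayland-scanner.
+Sending ``format_table`` first is an assumption based on ``tranche_formats``
+referring to the last received table.
+
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical: writes into out the names of the events a texture-only
 * compositor sends for one feedback object, in order, and returns how many
 * were written. out must have room for at least 7 entries.
 */
static size_t basic_feedback_events(const char **out, size_t cap)
{
    static const char *const seq[] = {
        "format_table",          /* every pair the renderer can texture */
        "main_device",           /* the rendering device */
        "tranche_target_device", /* the rendering device again */
        "tranche_flags",         /* 0: the scanout flag is not set */
        "tranche_formats",       /* indices of all pairs in the table */
        "tranche_done",          /* end of the single tranche */
        "done",                  /* the whole update is complete */
    };
    size_t n = sizeof(seq) / sizeof(seq[0]);

    assert(cap >= n);
    for (size_t i = 0; i < n; i++)
        out[i] = seq[i];
    return n;
}
```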
+
+Some compositors may support direct scan-out for full-screen surfaces. These
+compositors can re-send the feedback parameters when a surface becomes
+full-screen or leaves full-screen mode if the client has used the
+``get_surface_feedback`` request. The non-full-screen feedback parameters are
+the same as basic compositors described above. The full-screen feedback
+parameters have two tranches: one with the format/modifier pairs supported by
+the KMS plane, with the ``scanout`` flag set in the ``tranche_flags`` event and
+with ``tranche_target_device`` set to the KMS scan-out device; the other with
+the rest of the format/modifier pairs (supported for texturing, but not for
+scan-out), without the ``scanout`` flag set in the ``tranche_flags`` event, and
+with the ``tranche_target_device`` set to the rendering device.
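+
+The split between the two full-screen tranches can be computed as below. This
+is a sketch assuming the compositor already has the pairs its rendering API can
+texture from and the pairs the KMS plane accepts; the helper names are
+hypothetical. Only pairs that are also texturable end up in the scan-out
+tranche, so the compositor always keeps a fallback path.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct format_modifier {
    uint32_t format;   /* DRM fourcc code */
    uint64_t modifier; /* DRM format modifier */
};

static bool pair_in(const struct format_modifier *set, size_t n,
                    struct format_modifier p)
{
    for (size_t i = 0; i < n; i++) {
        if (set[i].format == p.format && set[i].modifier == p.modifier)
            return true;
    }
    return false;
}

/*
 * Hypothetical helper: splits the pairs the renderer can texture from into
 * the scan-out tranche (pairs the KMS plane also accepts, sent with the
 * scanout flag and the KMS device as target) and the texturing tranche (the
 * rest, sent without flags and with the rendering device as target). Pairs
 * only the plane supports are never advertised. Both output arrays need room
 * for num_render entries. Returns the number of scan-out pairs.
 */
static size_t split_fullscreen_tranches(
    const struct format_modifier *render, size_t num_render,
    const struct format_modifier *plane, size_t num_plane,
    struct format_modifier *scanout_out,
    struct format_modifier *texture_out, size_t *num_texture_out)
{
    size_t num_scanout = 0, num_texture = 0;

    assert(num_texture_out != NULL);
    for (size_t i = 0; i < num_render; i++) {
        if (pair_in(plane, num_plane, render[i]))
            scanout_out[num_scanout++] = render[i];
        else
            texture_out[num_texture++] = render[i];
    }
    *num_texture_out = num_texture;
    return num_scanout;
}
```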
+
+Some compositors may support direct scan-out for all surfaces. These
+compositors can send two tranches for surfaces that become candidates for
+direct scan-out, similarly to compositors supporting direct scan-out for
+full-screen surfaces. When a surface stops being a candidate for direct
+scan-out, compositors should re-send the feedback parameters optimized for
+texturing only. The way candidates for direct scan-out are selected is
+compositor policy; a possible implementation is to select as many surfaces as
+there are available hardware planes, starting from surfaces closer to the eye.
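+
+One possible version of the plane-count policy above is sketched below,
+assuming surfaces are kept in a hypothetical array ordered front-to-back
+(index 0 is closest to the eye). Only surfaces whose candidate status flips
+are marked for a feedback re-send, which also avoids sending identical
+parameters twice in a row.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-surface state, stored front-to-back. */
struct surface_state {
    bool scanout_candidate;
    bool feedback_dirty; /* feedback parameters must be re-sent */
};

/* Marks the num_planes topmost surfaces as direct scan-out candidates. */
static void select_scanout_candidates(struct surface_state *surfaces,
                                      size_t num_surfaces, size_t num_planes)
{
    assert(surfaces != NULL || num_surfaces == 0);
    for (size_t i = 0; i < num_surfaces; i++) {
        bool candidate = i < num_planes;
        if (candidate != surfaces[i].scanout_candidate) {
            /* Scan-out tranche gets added or removed for this surface. */
            surfaces[i].scanout_candidate = candidate;
            surfaces[i].feedback_dirty = true;
        }
    }
}
```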
+
+Some compositors may support multiple devices at the same time. If the
+compositor supports rendering with a fixed device and direct scan-out on a
+secondary device, it may send a separate tranche for surfaces displayed on
+the secondary device that are candidates for direct scan-out. The
+``tranche_target_device`` for this tranche will be the secondary device and
+will not match the ``main_device``.
+
+Some compositors may support switching their rendering device at runtime or
+changing their rendering device depending on the surface. When the rendering
+device changes for a surface, such compositors may re-send the feedback
+parameters with a different ``main_device``. However, there is a risk that
+clients don't support switching their device at runtime and continue using the
+previous device. For this reason, compositors should always have a fallback
+rendering device that they initially send as ``main_device``, such that these
+clients use said fallback device.
+
+Compositors should not change the ``main_device`` on-the-fly when explicit
+modifiers are not supported, because there's a risk of importing buffers
+with an implicit non-linear modifier as a linear buffer, resulting in
+misinterpreted buffer contents.
+
+Compositors should not send feedback parameters if they don't have a fallback
+path. For instance, compositors shouldn't send a format/modifier supported for
+direct scan-out but not supported by the rendering API for texturing.
+
+Compositors can decide to use multiple tranches to describe the allocation
+parameters optimized for texturing. For example, if there are formats which
+have a fast texturing path and formats which have a slower texturing path, the
+compositor can decide to expose two separate tranches.
+
+Compositors can decide to use intermediate tranches to describe code-paths
+slower than direct scan-out but faster than texturing. For instance, a
+compositor could insert an intermediate tranche if it's possible to use a
+mem2mem device to convert buffers to be able to use scan-out.
+
+``dev_t`` encoding
+==================
+
+The protocol carries ``dev_t`` values on the wire using arrays. A compositor
+written in C can encode the values as follows:
+
+.. code-block:: c
+
+    /* Filled in beforehand with stat() or fstat() on the DRM node. */
+    struct stat drm_node_stat;
+    struct wl_array dev_array = {
+        .size = sizeof(drm_node_stat.st_rdev),
+        .data = &drm_node_stat.st_rdev,
+    };
+
+A client can decode the values as follows:
+
+.. code-block:: c
+
+    dev_t dev;
+    assert(dev_array->size == sizeof(dev));
+    memcpy(&dev, dev_array->data, sizeof(dev));
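+
+The encoding and decoding above can be exercised as a round trip without a
+compositor. In this sketch ``struct array`` is a stand-in with the same fields
+as ``struct wl_array`` from ``wayland-util.h``, and ``makedev``, ``major`` and
+``minor`` come from ``<sys/sysmacros.h>`` (glibc, musl). DRM nodes use major
+number 226, and render nodes start at minor 128.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/sysmacros.h> /* makedev(), major(), minor() */
#include <sys/types.h>

/* Stand-in mirroring struct wl_array, so the sketch builds without
 * libwayland. */
struct array {
    size_t size;
    size_t alloc;
    void *data;
};

/* Compositor side: wrap a dev_t (e.g. st_rdev) in an array. The array only
 * borrows the pointer, so *dev must outlive it. */
static struct array encode_dev(dev_t *dev)
{
    struct array a = { .size = sizeof(*dev), .alloc = 0, .data = dev };
    return a;
}

/* Client side: copy the bytes back out; memcpy avoids unaligned access. */
static int decode_dev(const struct array *a, dev_t *out)
{
    assert(a != NULL && out != NULL);
    if (a->size != sizeof(*out))
        return -1; /* malformed event */
    memcpy(out, a->data, sizeof(*out));
    return 0;
}
```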
+
+Because two DRM nodes can refer to the same DRM device while having different
+``dev_t`` values, clients should use ``drmDevicesEqual`` to compare two
+devices.
+
+``format_table`` encoding
+=========================
+
+The ``format_table`` event carries a file descriptor containing a list of
+format + modifier pairs. The list is an array of pairs which can be accessed
+with this C structure definition:
+
+.. code-block:: c
+
+    struct dmabuf_format_modifier {
+        uint32_t format;
+        uint32_t pad; /* unused */
+        uint64_t modifier;
+    };
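+
+A client-side sketch of mapping the table and resolving the 16-bit indices
+carried by ``tranche_formats`` follows. It maps the file read-only and
+private, as the protocol requires; the ``struct format_table`` wrapper and the
+helper names are hypothetical, and silently skipping out-of-range indices is
+an assumption (a client could also treat them as a protocol error).
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

struct dmabuf_format_modifier {
    uint32_t format;
    uint32_t pad; /* unused */
    uint64_t modifier;
};
_Static_assert(sizeof(struct dmabuf_format_modifier) == 16,
               "table entries are 16 bytes wide");

struct format_table {
    const struct dmabuf_format_modifier *entries;
    size_t len;  /* number of entries */
    size_t size; /* mapping size in bytes */
};

/* Maps the fd from the format_table event. Returns 0 on success. */
static int format_table_map(struct format_table *t, int fd, uint32_t size)
{
    void *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED)
        return -1;
    t->entries = data;
    t->len = size / sizeof(struct dmabuf_format_modifier);
    t->size = size;
    return 0;
}

/* Resolves a tranche_formats array (indices_size bytes of uint16_t) into
 * out, which needs room for indices_size / 2 entries. Returns the count. */
static size_t format_table_resolve(const struct format_table *t,
                                   const void *indices, size_t indices_size,
                                   struct dmabuf_format_modifier *out)
{
    const uint16_t *idx = indices;
    size_t n = 0;

    assert(t != NULL);
    for (size_t i = 0; i < indices_size / sizeof(uint16_t); i++) {
        if (idx[i] < t->len)
            out[n++] = t->entries[idx[i]];
    }
    return n;
}

static void format_table_unmap(struct format_table *t)
{
    munmap((void *)t->entries, t->size);
}
```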
+
+Integration with other APIs
+===========================
+
+- libdrm: ``drmGetDeviceFromDevId`` returns a ``drmDevice`` from a device ID.
+- EGL: the `EGL_EXT_device_drm_render_node`_ extension may be used to query the
+  DRM device render node used by a given EGL display. When unavailable, the
+  older `EGL_EXT_device_drm`_ extension may be used as a fallback.
+- Vulkan: the `VK_EXT_physical_device_drm`_ extension may be used to query the
+  DRM device used by a given ``VkPhysicalDevice``.
+
+.. _EGL_EXT_device_drm: https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_device_drm.txt
+.. _EGL_EXT_device_drm_render_node: https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_device_drm_render_node.txt
+.. _VK_EXT_physical_device_drm: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_physical_device_drm.html
diff --git a/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml b/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml
index 09cf0bb..14cf242 100644
--- a/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml
+++ b/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml
@@ -24,17 +24,18 @@
     DEALINGS IN THE SOFTWARE.
   </copyright>
 
-  <interface name="zwp_linux_dmabuf_v1" version="3">
+  <interface name="zwp_linux_dmabuf_v1" version="4">
     <description summary="factory for creating dmabuf-based wl_buffers">
       Following the interfaces from:
       https://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt
       https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_image_dma_buf_import_modifiers.txt
       and the Linux DRM sub-system's AddFb2 ioctl.
 
-      This interface offers ways to create generic dmabuf-based
-      wl_buffers. Immediately after a client binds to this interface,
-      the set of supported formats and format modifiers is sent with
-      'format' and 'modifier' events.
+      This interface offers ways to create generic dmabuf-based wl_buffers.
+
+      Clients can use the get_surface_feedback request to get dmabuf feedback
+      for a particular surface. If the client wants to retrieve feedback not
+      tied to a surface, they can use the get_default_feedback request.
 
       The following are required from clients:
 
@@ -123,10 +124,9 @@
         For the definition of the format codes, see the
         zwp_linux_buffer_params_v1::create request.
 
-        Warning: the 'format' event is likely to be deprecated and replaced
-        with the 'modifier' event introduced in zwp_linux_dmabuf_v1
-        version 3, described below. Please refrain from using the information
-        received from this event.
+        Starting with version 4, the format event is deprecated and must not be
+        sent by compositors. Instead, use get_default_feedback or
+        get_surface_feedback.
       </description>
       <arg name="format" type="uint" summary="DRM_FORMAT code"/>
     </event>
@@ -152,6 +152,10 @@
         For the definition of the format and modifier codes, see the
         zwp_linux_buffer_params_v1::create and zwp_linux_buffer_params_v1::add
         requests.
+
+        Starting with version 4, the modifier event is deprecated and must not be
+        sent by compositors. Instead, use get_default_feedback or
+        get_surface_feedback.
       </description>
       <arg name="format" type="uint" summary="DRM_FORMAT code"/>
       <arg name="modifier_hi" type="uint"
@@ -159,9 +163,34 @@
       <arg name="modifier_lo" type="uint"
            summary="low 32 bits of layout modifier"/>
     </event>
+
+    <!-- Version 4 additions -->
+
+    <request name="get_default_feedback" since="4">
+      <description summary="get default feedback">
+        This request creates a new wp_linux_dmabuf_feedback object not bound
+        to a particular surface. This object will deliver feedback about dmabuf
+        parameters to use if the client doesn't support per-surface feedback
+        (see get_surface_feedback).
+      </description>
+      <arg name="id" type="new_id" interface="zwp_linux_dmabuf_feedback_v1"/>
+    </request>
+
+    <request name="get_surface_feedback" since="4">
+      <description summary="get feedback for a surface">
+        This request creates a new wp_linux_dmabuf_feedback object for the
+        specified wl_surface. This object will deliver feedback about dmabuf
+        parameters to use for buffers attached to this surface.
+
+        If the surface is destroyed before the wp_linux_dmabuf_feedback object,
+        the feedback object becomes inert.
+      </description>
+      <arg name="id" type="new_id" interface="zwp_linux_dmabuf_feedback_v1"/>
+      <arg name="surface" type="object" interface="wl_surface"/>
+    </request>
   </interface>
 
-  <interface name="zwp_linux_buffer_params_v1" version="3">
+  <interface name="zwp_linux_buffer_params_v1" version="4">
     <description summary="parameters for creating a dmabuf-based wl_buffer">
       This temporary object is a collection of dmabufs and other
       parameters that together form a single logical buffer. The temporary
@@ -219,8 +248,8 @@
         defined by the DRM fourcc code.
 
         Warning: It should be an error if the format/modifier pair was not
-        advertised with the modifier event. This is not enforced yet because
-        some implementations always accept DRM_FORMAT_MOD_INVALID. Also
+        advertised by zwp_linux_dmabuf_feedback_v1. This is not enforced yet
+        because some implementations always accept DRM_FORMAT_MOD_INVALID. Also
         version 2 of this protocol does not have the modifier event.
 
         This request raises the PLANE_IDX error if plane_idx is too large.
@@ -368,7 +397,192 @@
       <arg name="format" type="uint" summary="DRM_FORMAT code"/>
       <arg name="flags" type="uint" enum="flags" summary="see enum flags"/>
     </request>
+  </interface>
+
+  <interface name="zwp_linux_dmabuf_feedback_v1" version="4">
+    <description summary="dmabuf feedback">
+      This object advertises dmabuf parameters feedback. This includes the
+      preferred devices and the supported formats/modifiers.
+
+      The parameters are sent once when this object is created and whenever they
+      change. The done event is always sent once after all parameters have been
+      sent. When a single parameter changes, all parameters are re-sent by the
+      compositor.
+
+      Compositors can re-send the parameters when the current client buffer
+      allocations are sub-optimal. Compositors should not re-send the
+      parameters if re-allocating the buffers would not result in a more optimal
+      configuration. In particular, compositors should avoid sending the exact
+      same parameters multiple times in a row.
+
+      The tranche_target_device and tranche_formats events are grouped by
+      tranches of preference. For each tranche, a tranche_target_device, one
+      tranche_flags and one or more tranche_formats events are sent, followed
+      by a tranche_done event finishing the list. The tranches are sent in
+      descending order of preference. All formats and modifiers in the same
+      tranche have the same preference.
+
+      To send parameters, the compositor sends one main_device event, tranches
+      (each consisting of one tranche_target_device event, one tranche_flags
+      event, tranche_formats events and then a tranche_done event), then one
+      done event.
+    </description>
+
+    <request name="destroy" type="destructor">
+      <description summary="destroy the feedback object">
+        Using this request a client can tell the server that it is not going to
+        use the wp_linux_dmabuf_feedback object anymore.
+      </description>
+    </request>
+
+    <event name="done">
+      <description summary="all feedback has been sent">
+        This event is sent after all parameters of a wp_linux_dmabuf_feedback
+        object have been sent.
+
+        This allows changes to the wp_linux_dmabuf_feedback parameters to be
+        seen as atomic, even if they happen via multiple events.
+      </description>
+    </event>
+
+    <event name="format_table">
+      <description summary="format and modifier table">
+        This event provides a file descriptor which can be memory-mapped to
+        access the format and modifier table.
+
+        The table contains a tightly packed array of consecutive format +
+        modifier pairs. Each pair is 16 bytes wide. It contains a format as a
+        32-bit unsigned integer, followed by 4 bytes of unused padding, and a
+        modifier as a 64-bit unsigned integer. The native endianness is used.
+
+        The client must map the file descriptor in read-only private mode.
+
+        Compositors are not allowed to mutate the table file contents once this
+        event has been sent. Instead, compositors must create a new, separate
+        table file and re-send feedback parameters. Compositors are allowed to
+        store duplicate format + modifier pairs in the table.
+      </description>
+      <arg name="fd" type="fd" summary="table file descriptor"/>
+      <arg name="size" type="uint" summary="table size, in bytes"/>
+    </event>
+
+    <event name="main_device">
+      <description summary="preferred main device">
+        This event advertises the main device that the server prefers to use
+        when direct scan-out to the target device isn't possible. The
+        advertised main device may be different for each
+        wp_linux_dmabuf_feedback object, and may change over time.
+
+        There is exactly one main device. The compositor must send at least
+        one preference tranche with tranche_target_device equal to main_device.
+
+        Clients need to create buffers that the main device can import and
+        read from, otherwise creating the dmabuf wl_buffer will fail (see the
+        wp_linux_buffer_params.create and create_immed requests for details).
+        The main device will also likely be kept active by the compositor,
+        so clients can use it instead of waking up another device for power
+        savings.
+
+        In general the device is a DRM node. The DRM node type (primary vs.
+        render) is unspecified. Clients must not rely on the compositor sending
+        a particular node type. Clients cannot check two devices for equality
+        by comparing the dev_t value.
+
+        If explicit modifiers are not supported and the client performs buffer
+        allocations on a different device than the main device, then the client
+        must force the buffer to have a linear layout.
+      </description>
+      <arg name="device" type="array" summary="device dev_t value"/>
+    </event>
+
+    <event name="tranche_done">
+      <description summary="a preference tranche has been sent">
+        This event splits tranche_target_device and tranche_formats events into
+        preference tranches. It is sent after a set of tranche_target_device
+        and tranche_formats events; it represents the end of a tranche. The
+        next tranche will have a lower preference.
+      </description>
+    </event>
 
+    <event name="tranche_target_device">
+      <description summary="target device">
+        This event advertises the target device that the server prefers to use
+        for a buffer created given this tranche. The advertised target device
+        may be different for each preference tranche, and may change over time.
+
+        There is exactly one target device per tranche.
+
+        The target device may be a scan-out device, for example if the
+        compositor prefers to directly scan-out a buffer created given this
+        tranche. The target device may be a rendering device, for example if
+        the compositor prefers to texture from said buffer.
+
+        The client can use this hint to allocate the buffer in a way that makes
+        it accessible from the target device, ideally directly. The buffer must
+        still be accessible from the main device, either through direct import
+        or through a potentially more expensive fallback path. If the buffer
+        can't be directly imported from the main device then clients must be
+        prepared for the compositor changing the tranche priority or making
+        wl_buffer creation fail (see the wp_linux_buffer_params.create and
+        create_immed requests for details).
+
+        If the device is a DRM node, the DRM node type (primary vs. render) is
+        unspecified. Clients must not rely on the compositor sending a
+        particular node type. Clients cannot check two devices for equality by
+        comparing the dev_t value.
+
+        This event is tied to a preference tranche, see the tranche_done event.
+      </description>
+      <arg name="device" type="array" summary="device dev_t value"/>
+    </event>
+
+    <event name="tranche_formats">
+      <description summary="supported buffer format modifier">
+        This event advertises the format + modifier combinations that the
+        compositor supports.
+
+        It carries an array of indices, each referring to a format + modifier
+        pair in the last received format table (see the format_table event).
+        Each index is a 16-bit unsigned integer in native endianness.
+
+        For legacy support, DRM_FORMAT_MOD_INVALID is an allowed modifier.
+        It indicates that the server can support the format with an implicit
+        modifier. When a buffer has DRM_FORMAT_MOD_INVALID as its modifier, it
+        is as if no explicit modifier is specified. The effective modifier
+        will be derived from the dmabuf.
+
+        A compositor that sends valid modifiers and DRM_FORMAT_MOD_INVALID for
+        a given format supports both explicit modifiers and implicit modifiers.
+
+        Compositors must not send duplicate format + modifier pairs within the
+        same tranche or across two different tranches with the same target
+        device and flags.
+
+        This event is tied to a preference tranche, see the tranche_done event.
+
+        For the definition of the format and modifier codes, see the
+        wp_linux_buffer_params.create request.
+      </description>
+      <arg name="indices" type="array" summary="array of 16-bit indexes"/>
+    </event>
+
+    <enum name="tranche_flags" bitfield="true">
+      <entry name="scanout" value="1" summary="direct scan-out tranche"/>
+    </enum>
+
+    <event name="tranche_flags">
+      <description summary="tranche flags">
+        This event sets tranche-specific flags.
+
+        The scanout flag is a hint that direct scan-out may be attempted by the
+        compositor on the target device if the client appropriately allocates a
+        buffer. How to allocate a buffer that can be scanned out on the target
+        device is implementation-defined.
+
+        This event is tied to a preference tranche, see the tranche_done event.
+      </description>
+      <arg name="flags" type="uint" enum="tranche_flags" summary="tranche flags"/>
+    </event>
   </interface>
 
 </protocol>