<?xml version="1.0"?>
<oembed><version>1.0</version><provider_name>CROWDWORKS Blog</provider_name><provider_url>http://crowdworks.blog/en/</provider_url><author_name>ahrawriter</author_name><author_url>http://crowdworks.blog/en/author/ahrawriter/</author_url><title>RLHF and DPO Compared - CROWDWORKS Blog</title><type>rich</type><width>600</width><height>338</height><html>&lt;blockquote class="wp-embedded-content" data-secret="cHlKWdSglb"&gt;&lt;a href="http://crowdworks.blog/en/rlhf-and-dpo-compared/"&gt;RLHF and DPO Compared&lt;/a&gt;&lt;/blockquote&gt;&lt;iframe sandbox="allow-scripts" security="restricted" src="http://crowdworks.blog/en/rlhf-and-dpo-compared/embed/#?secret=cHlKWdSglb" width="600" height="338" title="&#x201C;RLHF and DPO Compared&#x201D; &#x2014; CROWDWORKS Blog" data-secret="cHlKWdSglb" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" class="wp-embedded-content"&gt;&lt;/iframe&gt;&lt;script type="text/javascript"&gt;
/* &lt;![CDATA[ */
/*! This file is auto-generated */
!function(d,l){"use strict";l.querySelector&amp;&amp;d.addEventListener&amp;&amp;"undefined"!=typeof URL&amp;&amp;(d.wp=d.wp||{},d.wp.receiveEmbedMessage||(d.wp.receiveEmbedMessage=function(e){var t=e.data;if((t||t.secret||t.message||t.value)&amp;&amp;!/[^a-zA-Z0-9]/.test(t.secret)){for(var s,r,n,a=l.querySelectorAll('iframe[data-secret="'+t.secret+'"]'),o=l.querySelectorAll('blockquote[data-secret="'+t.secret+'"]'),c=new RegExp("^https?:$","i"),i=0;i&lt;o.length;i++)o[i].style.display="none";for(i=0;i&lt;a.length;i++)s=a[i],e.source===s.contentWindow&amp;&amp;(s.removeAttribute("style"),"height"===t.message?(1e3&lt;(r=parseInt(t.value,10))?r=1e3:~~r&lt;200&amp;&amp;(r=200),s.height=r):"link"===t.message&amp;&amp;(r=new URL(s.getAttribute("src")),n=new URL(t.value),c.test(n.protocol))&amp;&amp;n.host===r.host&amp;&amp;l.activeElement===s&amp;&amp;(d.top.location.href=t.value))}},d.addEventListener("message",d.wp.receiveEmbedMessage,!1),l.addEventListener("DOMContentLoaded",function(){for(var e,t,s=l.querySelectorAll("iframe.wp-embedded-content"),r=0;r&lt;s.length;r++)(t=(e=s[r]).getAttribute("data-secret"))||(t=Math.random().toString(36).substring(2,12),e.src+="#?secret="+t,e.setAttribute("data-secret",t)),e.contentWindow.postMessage({message:"ready",secret:t},"*")},!1)))}(window,document);
/* ]]&gt; */
&lt;/script&gt;
</html><thumbnail_url>https://i0.wp.com/crowdworks.blog/wp-content/uploads/2024/01/&#xBE14;&#xB85C;&#xADF8;_&#xD2B8;&#xB79C;&#xB4DC;&#xC378;&#xB124;&#xC77C;03-1.png?fit=600%2C99999</thumbnail_url><thumbnail_width/><thumbnail_height/><description>Introduction Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) are two approaches used to enhance large-scale language models through human guidance. This exploration of the two approaches delves into the distinctive features of RLHF and DPO, providing insights into their applications and mechanisms, as well as [&hellip;]</description></oembed>
