<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Vision on Tyler Zhu</title>
    <link>https://blog.tylerzhu.com/categories/vision/</link>
    <description>Recent content in Vision on Tyler Zhu</description>
    <generator>Hugo -- 0.154.3</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 09 Apr 2026 22:06:29 -0400</lastBuildDate>
    <atom:link href="https://blog.tylerzhu.com/categories/vision/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>LeThoughts on JEPA: The Return of SSL</title>
      <link>https://blog.tylerzhu.com/2026/04/lethoughts-on-jepa-the-return-of-ssl/</link>
      <pubDate>Thu, 09 Apr 2026 22:06:29 -0400</pubDate>
      <guid>https://blog.tylerzhu.com/2026/04/lethoughts-on-jepa-the-return-of-ssl/</guid>
      <description>&lt;p&gt;I used to be very up to date on self-supervised learning, but fell out of it as the field slowly died down in favor of VLMs and the like once SigLIP/DINO/V-JEPA became the dominant paradigms.
This means I haven&amp;rsquo;t read any SSL papers seriously since 2023.&lt;/p&gt;
&lt;p&gt;However, that doesn&amp;rsquo;t mean I&amp;rsquo;ve been living under a rock.
I&amp;rsquo;m still well aware of Yann LeCun&amp;rsquo;s anti-pixel-prediction tirades, and in that time nothing has come out that convinced me we could move away from pixel-level supervision.
It&amp;rsquo;s simply too strong a prior for self-supervision: you get multi-view consistency and true spatial grounding at the slight cost of having to model high-frequency pixel details.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
