Latent Space Podcast 5/8/23 [Summary] - The AI Founder Gene: Being Early, Building Fast, and Believing in Greatness — with Sharif Shameem of Lexica
Ep.11 with Sharif Shameem of Lexica: Dive into the AI founder mindset, uncovering the secrets to pioneering innovation, building game-changing tech, training models, and the intriguing potential of Agents and genomic sequencing.
Original Link: The AI Founder Gene: Being Early, Building Fast, and Believing in Greatness — with Sharif Shameem of Lexica
Summary
Sharif Shameem's Tech Odyssey: From University Dropout to AI Pioneer
In this episode of the Latent Space podcast, Alessio and his co-host Swyx interview Sharif Shameem. Sharif recounts his educational journey, sharing that he dropped out of the University of Maryland to pursue a side project, inspired by advice from an investor and a book he had read. He then worked at MITRE, a federally funded research and development center, where he was involved in computer vision projects.
Sharif later founded VectorDash, inspired by the contrast between the costly GPU requirements of machine learning research and the untapped potential of GPUs already deployed for cryptocurrency mining. The platform evolved into a GPU cloud provider for video games, somewhat ahead of Google's Stadia, though it faced bandwidth challenges.
In 2020, Sharif was intrigued by OpenAI's GPT-3, which led him to develop debuild. His project's potential was amplified when a tweet showcasing it gained traction, and it was later featured in various tech publications.
By 2022, Sharif had transitioned to Lexica, a search engine born out of the need to better navigate the Stable Diffusion Discord, whose built-in search he found lacking. Sharif's ability to identify gaps in current tools and his innovative mindset stand out throughout his journey.
Lexica's Rapid Rise and Technical Backbone
Launch Impact: Within 24 hours of launching Lexica, the platform saw a staggering 51,000 queries, and by the second day that figure had more than doubled to 111,000. Lexica went on to serve over 5 billion images per month, a growth trajectory Sharif hadn't anticipated.
Tech Insights: Lexica's initial search system relied on Postgres full-text search, but the team was later inspired by "Same Energy" to build a semantic image search. The upgraded system was built on CLIP embeddings of images, using FAISS (Facebook's similarity-search library) for efficient k-nearest-neighbor (KNN) search over the embeddings. This semantic search was a hit with users, significantly increasing engagement.
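The core idea behind this kind of semantic search can be sketched in a few lines. The following is a minimal illustration, not Lexica's actual code: it uses random vectors in place of real CLIP embeddings and plain NumPy in place of FAISS, which performs the same nearest-neighbor lookup at much larger scale.

```python
import numpy as np

# Hypothetical sketch of embedding-based image search.
# Random vectors stand in for CLIP image embeddings.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(1000, 512)).astype(np.float32)

# CLIP embeddings are typically compared by cosine similarity,
# so normalize each row to unit length once, up front.
image_embeddings /= np.linalg.norm(image_embeddings, axis=1, keepdims=True)

def knn_search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k images most similar to the query embedding."""
    q = query / np.linalg.norm(query)
    scores = image_embeddings @ q       # cosine similarity against every image
    return np.argsort(-scores)[:k]      # highest similarity first

query = rng.normal(size=512).astype(np.float32)
top = knn_search(query, k=5)
print(top.shape)  # (5,)
```

In production, a library like FAISS replaces the brute-force matrix product with an index (e.g. an inverted-file or graph index) so the search stays fast across tens of millions of embeddings.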
Funding and Focus: Lexica, which started as a side project, caught investor attention after exponential user growth in its initial weeks. The team raised $5 million from Daniel Gross and participated in AI Grant, subsequently making Lexica the main focus of their efforts.
From Search to Generation: Noticing user behavior, Sharif decided to integrate generation features directly within Lexica. This allowed users to not just search but also edit and generate images, enhancing user experience and streamlining their interaction.
Learning from AI Grant: The program emphasized that while there's abundant potential in advanced AI models, there's a dearth of compelling products for the average user. The goal is to prioritize product development that addresses real user needs over simply achieving state-of-the-art benchmarks.
The Data Factor: The success in training high-quality models, especially diffusion models, leans heavily on the quality and quantity of data. For Lexica, aesthetic scoring of images and user rankings play a critical role. Using traffic from their platform, they refine data sets and improve their models' aesthetics.
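The curation loop described above can be illustrated with a toy example. All names and thresholds here are assumptions for illustration; the real pipeline tunes such cutoffs empirically against model quality.

```python
# Hypothetical sketch of data curation for fine-tuning a diffusion model:
# keep only images whose aesthetic score and user-engagement signal
# (a proxy for user rankings derived from site traffic) clear a threshold.

images = [
    {"id": 1, "aesthetic_score": 6.8, "clicks": 240},
    {"id": 2, "aesthetic_score": 4.1, "clicks": 3},
    {"id": 3, "aesthetic_score": 7.5, "clicks": 980},
]

MIN_AESTHETIC = 5.5   # assumed cutoff for the aesthetic-scoring model
MIN_CLICKS = 10       # assumed minimum engagement from platform traffic

training_set = [
    img for img in images
    if img["aesthetic_score"] >= MIN_AESTHETIC and img["clicks"] >= MIN_CLICKS
]
print([img["id"] for img in training_set])  # [1, 3]
```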
Bridging the Gap: Transitioning to AI and Enhancing Web Interactivity for Models
Exploring Machine Learning and Training Models
Swyx interviewed Sharif about his transition from an infrastructure background to training machine learning models.
Sharif discussed the challenges and parallels between learning programming and learning how to train AI models. He stressed the importance of persistence, leveraging online resources, and learning from open-source communities.
Both discussed how AI differs in cost structure: running machine learning experiments can quickly burn through compute credits, unlike typical programming.
Sharif highlighted the efficiency of fine-tuning open-source models, mentioning specific models like FLAN-T5 and GPT-J. He further discussed the potential of newer models and their advancements.
A brief discussion ensued about Dolly, an open instruction-tuned model comparable to Vicuna but licensed for commercial use.
On the Rise of AI Agents
Alessio shifted the conversation towards AI agents and their potential after Sharif's past demo using the GPT-3 API.
Sharif explained a past project that aimed to let agents browse documentation, summarize it, and potentially perform online tasks like creating API keys.
He delved into the challenges of converting web pages into a format digestible for AI and how the AI can act on this information.
Sharif introduced a potential solution using terminal-based browsers that turn graphical elements into ASCII while preserving textual content. He also highlighted the challenges of making an AI interact accurately with elements on a web page and proposed solutions.
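The flattening idea can be sketched with the standard library's HTML parser. This is a minimal illustration of the concept, not Sharif's implementation: interactive elements get numeric ids an agent can refer to (e.g. "click [1]"), much as a terminal browser renders a page as text.

```python
from html.parser import HTMLParser

# Hypothetical sketch: flatten a page into plain text for an agent,
# tagging interactive elements with ids it can act on.
class AgentView(HTMLParser):
    def __init__(self):
        super().__init__()
        self.lines = []
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        # Assign each actionable element a stable numeric handle.
        if tag in ("a", "button", "input"):
            self.counter += 1
            self.lines.append(f"[{self.counter}] <{tag}>")

    def handle_data(self, data):
        # Keep visible text; drop whitespace-only runs.
        text = data.strip()
        if text:
            self.lines.append(text)

html = '<h1>API Keys</h1><p>Manage keys.</p><button>Create key</button>'
view = AgentView()
view.feed(html)
print("\n".join(view.lines))
```

An agent reading this rendering can emit an action like "click [1]" instead of guessing at pixel coordinates, which is the accuracy problem Sharif describes.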
Swyx raised a concern about how models would act when they lack sufficient information, and Sharif suggested training models using sessions of agents browsing the web.
Alessio and Sharif discussed the potential of reshaping the Document Object Model (DOM) to make web pages more accessible and intuitive for AI models to interact with, highlighting the importance of a more annotated and accessible web.
Multimodal Models and the Evolution of Lexica
Multimodal Capabilities of GPT-4:
Sharif highlights the potential of GPT-4's multimodal capabilities, especially for graphically dense web pages. Its dense multimodal input means it could even extract text from PDFs and summarize content from complex web pages.
Compared to CLIP, GPT-4 has both text and vision understanding, offering the best of both worlds.
Ensemble models, like the combination of BLIP and LLaMA, can be useful. An ensemble provides the benefits of multiple models, but there are advantages when different modalities are trained within the same model.
Sharif’s Startup Manual:
Sharif shares his startup advice, emphasizing the importance of launching a minimal version of a product to gather user feedback.
He stresses the significance of not over-engineering, shipping early, and ensuring that a product is loved by users before focusing on growth.
The journey of Lexica's versions (Aperture V1 to V3) highlights the importance of iteration based on user feedback. V1 was less popular, leading to improvements in V2 and V3 based on users' preferences for non-photorealistic images.
Request for AI Startup - LLM Tools:
Sharif expresses interest in developing better tools for language models themselves, such as giving them access to browsers and payment systems.
Unlocking the Potential of Genomics and Future of AI-driven Tools
Unlocking the Potential of Genomics and AI:
This segment kicks off with a discussion about the incredible power of consumer genomics. Tools like 23andMe allow individuals to export their entire genotype data as a text file, which can then be interpreted by platforms such as Promethease. This information provides insights into specific traits and predispositions, from being a night owl to the likelihood of developing certain diseases.
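The export in question is just a tab-separated text file of SNPs, which makes the workflow easy to sketch. The annotation table below is a made-up stand-in for the published-association databases a service like Promethease consults.

```python
# Hypothetical sketch: a 23andMe raw-data export is a tab-separated text
# file of SNPs (rsid, chromosome, position, genotype), with comment lines
# starting with "#". A two-row sample stands in for a real export.
raw_export = """\
# This data file generated by 23andMe
# rsid\tchromosome\tposition\tgenotype
rs4680\t22\t19963748\tAG
rs1801133\t1\t11856378\tTT
"""

snps = {}
for line in raw_export.splitlines():
    if line.startswith("#"):
        continue
    rsid, chrom, pos, genotype = line.split("\t")
    snps[rsid] = {"chromosome": chrom, "position": int(pos), "genotype": genotype}

# Toy annotation table standing in for a real SNP-association database.
annotations = {"rs4680": "COMT; associated with dopamine metabolism"}
for rsid, info in snps.items():
    print(rsid, info["genotype"], "-", annotations.get(rsid, "no annotation"))
```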
The Importance of Novel Ideas in Tech:
Swyx and Sharif emphasize the need for more innovative ideas in the tech space. While many are creating uninspiring B2B SaaS apps, there's a desire to motivate people to think outside the box. Projects like BabyAGI and other GPT-based agents are not necessarily useful in their current form, but they serve to inspire and show what's possible. The discussion moves to video games, highlighting how NPCs (non-player characters) in games like "The Last of Us Part Two" have become remarkably realistic through conditional rules. Imagine the game-changer it would be if each NPC were driven by an AI model like GPT-4.
AI in Everyday Tools:
Sharif mentions ChatGPT as one of his favorite AI products and expresses excitement about a company working on an AI-powered version of VS Code, with a cursor that helps users refine their code simply by describing the desired changes. They also discuss the need for more language-model tools that can perform tasks autonomously, emphasizing the potential for such tools to revolutionize many sectors.
Final Takeaway:
The podcast concludes with a crucial message from Sharif. He stresses that individuals often believe their ideas have been executed before, but in reality, many of these ideas are unique and have not been tried. Sharif's encouragement is for people to bring their unusual and innovative ideas to life, emphasizing that the tech world needs more of such pioneering ideas rather than mere incremental improvements on existing concepts.