Yahav Biran, PhD

Principal Architect at Amazon Web Services

    Yahav Biran is a Principal Architect at AWS, focused on AI workloads at scale. He enjoys contributing to open-source projects and publishing on AWS blogs and in academic journals. He currently contributes to the Kubernetes Helm community, the AWS databases and compute blogs, and the Journal of Systems Engineering. He delivers technical presentations at technology events and works with customers to design their applications in the cloud. He received his Ph.D. in Systems Engineering from Colorado State University.

    All Sessions by Yahav Biran, PhD

    Day 3 04/25/2024
    1:00 pm - 1:30 pm

    Abstracting ARM/x86 CPUs and NVIDIA/Neuron Hardware Accelerator Allocation for Containerized ML App

    MLOps

    The shortage of hardware accelerators delays model training for customers with computationally intensive, parallel workloads. Moreover, applications often lack the flexibility to run on both general-purpose compute and highly available accelerators, which makes training jobs rigid and difficult to resume after unexpected host interruptions. Customers also cannot deploy flexible inference services that trade off cost, availability, latency, and performance, e.g., by defining compute priorities for inference across CPUs and hardware accelerators with different prices and locations. Until today, customers who trained models and offered model inference services had to manually configure compute infrastructure requirements that matched their application; if those resources could not be allocated, the job was delayed. Cube-scheduler adds flexibility for machine learning jobs by automatically detecting and matching a job's specification to a suitable processor and hardware accelerator. It seamlessly invokes ML software packages on optimal resources by abstracting the underlying runtime, such as the Linux and Python environments.
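    The matching idea described above — selecting a processor and accelerator that satisfy a job's requirements rather than hard-coding them — can be sketched in a few lines of Python. This is an illustrative assumption of how such a match might work, not Cube-scheduler's actual implementation; all node names, fields, and the cost-based tie-break are hypothetical.

    ```python
    # Hypothetical sketch: match an ML job's compute requirements to the
    # cheapest available node, across CPU architectures (ARM/x86) and
    # accelerators (e.g., NVIDIA GPU or AWS Neuron). Illustrative only.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        name: str
        cpu_arch: str                # "arm64" or "x86_64"
        accelerator: Optional[str]   # e.g. "nvidia-gpu", "aws-neuron", or None
        hourly_cost: float
        available: bool = True

    @dataclass
    class JobSpec:
        cpu_archs: set     # architectures the container image supports
        accelerators: set  # acceptable accelerators; empty set = CPU-only is fine

    def schedule(job: JobSpec, nodes: list) -> Optional[Node]:
        """Return the cheapest available node that satisfies the job, or None."""
        candidates = [
            n for n in nodes
            if n.available
            and n.cpu_arch in job.cpu_archs
            and (not job.accelerators or n.accelerator in job.accelerators)
        ]
        return min(candidates, key=lambda n: n.hourly_cost, default=None)
    ```

    Because the job declares a *set* of acceptable architectures and accelerators instead of a single fixed instance type, an interrupted training job can resume on whatever matching hardware is currently available, and inference services can prioritize cheaper hardware when several options qualify.
    
    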


    Open Data Science
    One Broadway
    Cambridge, MA 02142
    info@odsc.com
