Going From Unstructured Data to Vector Similarity Search


One of the key concepts in AI modeling is storing and querying vectors. This workshop starts with two examples of unstructured data: images and journal abstracts. Participants will then work this data all the way through to a usable data store with an application on top of it.

We will cover topics such as transformers, embeddings, HuggingFace, choosing a vectorization model, and effective query composition. The Python code needed to do this is quite easy to understand. If you want to work on your own laptop, prerequisites will be published beforehand. If not, we will provide a cloud-hosted environment for all your work. After this workshop you should understand some of the concepts used in these new AI architectures and have a better grasp of the work involved in building “AI” applications.

Software needs/requirements:
• A web browser
• A GitHub account
• The ability to read some simple Python and SQL code

Key learnings/takeaways:
• How to turn unstructured data into knowledge
• What are vectors (also called embeddings)
• Foundational models versus creating your own
• How you can make your own enhancements to foundational models
• How to choose an appropriate vector size for your use case
• How to load vector data into a vector data store
• What is the HNSW index and why you should care
• Queries that are appropriate for your data
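
To make the idea of vector similarity search concrete, here is a minimal sketch in plain Python. It is not the workshop's code: the three-dimensional vectors, the document names, and the helper functions `cosine_similarity` and `nearest` are all made up for illustration. Real embedding models produce vectors with hundreds of dimensions, and a vector data store would typically use an approximate index such as HNSW instead of the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; the values are invented for this example.
documents = {
    "bird migration abstract": [0.9, 0.1, 0.0],
    "trout stream abstract": [0.1, 0.9, 0.1],
    "database indexing abstract": [0.0, 0.1, 0.9],
}


def nearest(query, docs, k=1):
    """Brute-force search: score every stored vector, return the top k names."""
    ranked = sorted(
        docs,
        key=lambda name: cosine_similarity(query, docs[name]),
        reverse=True,
    )
    return ranked[:k]


# A query vector that points mostly in the same direction as the first document.
query_vector = [0.8, 0.2, 0.1]
print(nearest(query_vector, documents, k=2))
# prints ['bird migration abstract', 'trout stream abstract']
```

The scan compares the query against every stored vector, which is fine for three documents but grows linearly with the size of the collection. That cost is the reason approximate indexes such as HNSW matter at scale.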

Benefits of attending the workshop:

• Practical hands-on work with some of the latest AI technology
• A better grasp of the process of using it with your own data
• A better idea of how to integrate these technologies into your:
◦ Data flows
◦ App Architectures
◦ App development

Background Knowledge:

Basic Python, SQL, and Linux CLI familiarity (VERY basic)


Steve is a dad, partner, son, and founder of Tech Raven Consulting. He can teach you about Data Analysis, Java, Python, PostgreSQL, Microservices, Containers, Kubernetes, and some JavaScript. He has deep subject-area expertise in GIS/Spatial, Statistics, and Ecology. Before founding his company, Steve was a Developer Advocate for VMware, Crunchy Data, DigitalGlobe, Red Hat, LinkedIn, deCarta, and ESRI. Steve has a Ph.D. in Ecology and can easily be bribed with offers of bird watching or fly fishing.

Open Data Science
One Broadway
Cambridge, MA 02142
