Vector Databases Explained: What They Are, Why They Matter, and Which One to Pick

Posted on: May 6, 2026

Welcome, Developer 👋

A few months ago I was in a planning call and someone mentioned we might need “a vector database layer” for the search feature we were scoping. I nodded. I had heard the term plenty of times by then. RAG pipelines, semantic search, AI-native apps. I knew enough to not ask a dumb question in the meeting.

Then I went and actually learned what they are.

This post is the breakdown I wish I had before that call. Not a glossary. Not a vendor comparison table. A real explanation of what a vector database is, why it solves something that regular databases cannot, and how to think about picking one for your situation.

What is a vector database?

Start with vectors. A vector is just an ordered list of numbers. A point in space, but the space can have hundreds or thousands of dimensions. Something like [1.3, -0.4, 0.8, 2.1, ...] could describe almost anything.

What makes this useful for AI is the concept of embeddings. An embedding is what you get when a machine learning model converts something unstructured, like a sentence, a paragraph, or an image, into one of those vectors. The interesting part is that the model learns to place semantically similar things close together in that high-dimensional space. Two sentences that mean the same thing end up with vectors that are numerically close, even if they share no words. “My car broke down” and “vehicle failure on the highway” end up near each other. “I love pineapple pizza” ends up somewhere completely different.

When I first looked at this, the concept clicked faster than I expected. The analogy that made it concrete for me was geographical: embeddings are like GPS coordinates, but instead of latitude and longitude you have hundreds of dimensions, each capturing some aspect of meaning. Things that are similar end up close together on the map.

Once you have embeddings, the question becomes: how do you find the most similar ones?

And here is where a regular SQL database runs into a wall.

SQL is built for exact lookups. “Give me the row where user_id = 42” or “give me all orders where status = ‘pending’.” Even range queries are fundamentally about discrete values on a number line. But similarity search is asking a different question entirely: given this vector, find me the 10 vectors that are closest to it in space. There is no index structure in traditional relational databases designed for that. You could compute the distance between your query vector and every row in the table, but at a million rows that becomes unusably slow. At ten million rows it is a non-starter.
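
To see the wall for yourself, here is a minimal brute-force sketch in Python with NumPy, using random stand-in vectors. Every query has to touch every row, which is exactly what stops scaling:

```python
import numpy as np

# Stand-in corpus: 100,000 random 384-dimensional vectors. In a real system
# these would be the embeddings of your documents.
rng = np.random.default_rng(42)
corpus = rng.standard_normal((100_000, 384)).astype(np.float32)
query = rng.standard_normal(384).astype(np.float32)

# Cosine similarity of the query against EVERY row: one comparison per row,
# with no index to narrow things down.
scores = corpus @ query / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query))

# The 10 closest vectors.
top_10 = np.argsort(scores)[-10:][::-1]
print(top_10, scores[top_10])
```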

Vector databases are built from the ground up to answer that question fast.

How it works under the hood

The core algorithm powering most vector databases is called Approximate Nearest Neighbor search, or ANN. The name tells you something important: it is approximate, not exact. And that trade-off is the right one.

Exact nearest neighbor search at scale requires comparing a query vector against every stored vector. ANN algorithms use specialized index structures, the most common being HNSW (Hierarchical Navigable Small World), to organize vectors into a graph that lets the search skip the vast majority of candidates. Think of it like a navigable map rather than a flat list. You enter at a high level, find a rough neighborhood, and zoom in. The result is not guaranteed to be the single closest match in the entire dataset, but it is close enough for real-world use cases, and it is orders of magnitude faster than brute force.

What surprised me was how fast this actually is in practice. Querying hundreds of millions of vectors in under 100 milliseconds is not unusual with a well-tuned setup.

When you run a query, you pass in a vector (usually the embedding of whatever the user typed or uploaded), and the database walks that index to return the top k results with their similarity scores. That is the whole operation. The engineering challenge is making it work at scale without falling apart on memory or latency.
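
Here is roughly what that looks like in code, sketched with hnswlib, one popular open-source HNSW implementation. The data is random stand-in embeddings:

```python
import hnswlib
import numpy as np

dim = 384
data = np.float32(np.random.random((100_000, dim)))  # stand-in embeddings

# Build the HNSW graph index.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=len(data), ef_construction=200, M=16)
index.add_items(data, np.arange(len(data)))

# ef controls the recall/speed trade-off at query time.
index.set_ef(64)

# Walk the graph and return the top-k approximate neighbors with distances.
query = np.float32(np.random.random(dim))
labels, distances = index.knn_query(query, k=10)
print(labels, distances)
```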

Real-world use cases

This is where the concept clicks. If you have been wondering when you would actually need this, here are the cases that made it concrete for me.

Semantic search. Instead of searching by keyword, you search by meaning. A user types “how do I cancel my subscription” and the system finds the most relevant support articles even if none of them contain those exact words. The query gets embedded, the knowledge base is already embedded, and you pull the closest matches. The difference in result quality over keyword search is significant.
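
A minimal sketch of that idea, assuming the sentence-transformers library with all-MiniLM-L6-v2 as a stand-in embedding model. In production the article vectors would already live in the vector database:

```python
from sentence_transformers import SentenceTransformer, util

# all-MiniLM-L6-v2 is one common small embedding model; any model works.
model = SentenceTransformer("all-MiniLM-L6-v2")

articles = [
    "How to end your plan before the next billing cycle",
    "Resetting a forgotten password",
    "Updating the payment card on your account",
]
article_vecs = model.encode(articles, convert_to_tensor=True)

# The query shares no keywords with the best answer, but the vectors are close.
query_vec = model.encode("how do I cancel my subscription", convert_to_tensor=True)
scores = util.cos_sim(query_vec, article_vecs)[0]

best = int(scores.argmax())
print(articles[best], float(scores[best]))
```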

RAG pipelines. Retrieval-Augmented Generation is the pattern behind most enterprise AI assistants right now. You store a knowledge base as embeddings in a vector database, and when a user asks a question you retrieve the most relevant chunks to inject into the LLM’s context window before it answers. The vector database is what makes the retrieval step fast and relevant. Without it, you either stuff everything into the context window (expensive, limited) or search by keyword (low quality).
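
The retrieval step has a simple shape. This sketch uses hypothetical stand-ins of my own naming (embed, vector_db.search, llm), not any particular library's API:

```python
# Hypothetical stand-ins: embed() is your embedding model, vector_db is any of
# the databases below, llm() is your language model client.

def answer(question: str, vector_db, embed, llm, k: int = 5) -> str:
    # 1. Embed the user's question.
    query_vec = embed(question)

    # 2. Retrieve only the k most relevant chunks from the vector database.
    chunks = vector_db.search(query_vec, top_k=k)

    # 3. Inject those chunks into the prompt instead of the whole knowledge base.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```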

Recommendation engines. If a user has interacted with certain items, those items can be averaged into a preference vector. Query the vector database for what is closest to that vector and you have recommendations. Modern recommendation systems are doing something like this under the hood, even when they have custom infrastructure on top.
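
A toy version of the preference vector in NumPy, with made-up item embeddings:

```python
import numpy as np

# Hypothetical 4-dimensional embeddings of items the user has interacted with;
# real item vectors would come from the same model that embedded the catalog.
liked_items = np.array([
    [0.9, 0.1, 0.3, 0.0],
    [0.8, 0.2, 0.4, 0.1],
    [0.7, 0.0, 0.5, 0.2],
])

# Average the liked items into a single preference vector.
preference = liked_items.mean(axis=0)

# Hand that vector to the vector database as the query, e.g.:
#   recommendations = vector_db.search(preference, top_k=10)
print(preference)
```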

Image and media similarity. The same idea applies to images. Embed a product photo and find visually similar products in your catalog. Embed a song’s audio fingerprint and find acoustically similar tracks. The embedding model changes but the database layer is identical.

Anomaly detection. If normal behavior can be represented as a cluster of vectors, you can flag anything that lands far from that cluster. Fraud detection, log analysis, quality control in manufacturing. Anything that can be embedded can be checked for distance from the norm.
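
A minimal distance-from-the-norm check in NumPy, with random vectors standing in for embedded behavior:

```python
import numpy as np

# Hypothetical: embeddings of known-normal behavior (e.g. past transactions).
normal = np.random.default_rng(0).standard_normal((5_000, 64))
centroid = normal.mean(axis=0)

# Use distances within the normal cluster to set a threshold,
# here the 99th percentile.
distances = np.linalg.norm(normal - centroid, axis=1)
threshold = np.percentile(distances, 99)

def is_anomalous(vec: np.ndarray) -> bool:
    # Flag anything that lands far from the cluster of normal vectors.
    return np.linalg.norm(vec - centroid) > threshold
```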

The options: a practical comparison

After going through the docs and spinning up a few of these, here is my honest take on each one.

pgvector is a Postgres extension that adds a vector column type and similarity search operators to your existing database. If you are already on Postgres, this is almost always where I would tell you to start. One database, one connection pool, one backup strategy. With properly tuned HNSW indexes, it handles up to around 5 million vectors without drama. The limitation is that it is only as scalable as your Postgres instance, and the ANN indexing options are more limited than in purpose-built databases. But for most applications, that is not a real constraint. The pragmatic default.
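
A sketch of what a pgvector query looks like from Python, assuming psycopg2 and a hypothetical articles table (schema in the comments):

```python
import psycopg2

# Assumes Postgres with the pgvector extension and a table like:
#   CREATE EXTENSION IF NOT EXISTS vector;
#   CREATE TABLE articles (id serial PRIMARY KEY, content text, embedding vector(384));
#   CREATE INDEX ON articles USING hnsw (embedding vector_cosine_ops);
conn = psycopg2.connect("dbname=app")
cur = conn.cursor()

query_vec = [0.1] * 384  # in practice, the embedding of the user's query
vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"

# <=> is pgvector's cosine-distance operator; smaller means more similar.
cur.execute(
    "SELECT id, content FROM articles ORDER BY embedding <=> %s::vector LIMIT 10",
    (vec_literal,),
)
print(cur.fetchall())
```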

Pinecone is a fully managed SaaS vector database. You do not run any infrastructure. You do not think about indexing configuration, memory allocation, or scaling. You push vectors in and query them out. It scales to billions of vectors and the latency is consistently good. The honest limitation is that it is closed source, it is not cheap at scale, and you are dependent on their infrastructure and pricing. The “just make it work” option, with a line item in your monthly bill to match.
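
A sketch with the Pinecone Python client, assuming an index named "articles" already exists with matching dimensions; the key and names are placeholders:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("articles")

# Push vectors in: (id, values, metadata) tuples.
index.upsert(vectors=[
    ("doc-1", [0.1] * 384, {"category": "billing"}),
])

# Query them out: top-k nearest neighbors with their metadata.
results = index.query(vector=[0.1] * 384, top_k=10, include_metadata=True)
print(results)
```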

Qdrant is open-source, written in Rust, and built specifically for high-performance vector search with strong payload filtering. What I liked about Qdrant was the ability to combine “find vectors similar to this” with “where category = ‘electronics’ and price < 100” in a single query, efficiently. That combination is harder to do cleanly in some of the other options. My pick for self-hosted production workloads where you want real control over the stack. The limitation is that self-hosting anything in production is work, and Qdrant is no exception.
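
Here is that combined query as a sketch with the qdrant-client Python library; the collection and field names are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

client = QdrantClient(url="http://localhost:6333")

# Similarity search and payload filtering in one query: "like this vector,
# but only electronics under 100."
hits = client.search(
    collection_name="products",
    query_vector=[0.1] * 384,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="electronics")),
            FieldCondition(key="price", range=Range(lt=100)),
        ]
    ),
    limit=10,
)
print(hits)
```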

Weaviate is open-source and includes built-in vectorization modules that can call external embedding models on ingest, so you feed it raw text and it handles the embedding step. It also has solid hybrid search, combining vector similarity with BM25 keyword ranking in a single query. I found Weaviate impressive in terms of features. It is heavier to operate than Qdrant and the configuration surface area is larger. The right choice if you need hybrid search without stitching it together yourself, but budget extra time for the setup.
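
A hybrid query sketch using the older v3-style Python client (the current v4 client has a different, collection-based API); the Article class is a placeholder:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# alpha balances the two signals: 0 is pure BM25 keyword, 1 is pure vector.
result = (
    client.query
    .get("Article", ["title", "content"])
    .with_hybrid(query="how do I cancel my subscription", alpha=0.5)
    .with_limit(5)
    .do()
)
print(result)
```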

Chroma was the first one I spun up, mostly because the API is dead simple. In five minutes I had a local instance storing and querying embeddings. It is great for prototyping, for local development, and for understanding how vector databases work before committing to something more serious. It is not a production workhorse yet. Persistence and scalability are still active areas of development. Start here to learn, then migrate when you actually need to scale.
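
The five-minute version, roughly as I ran it, using the chromadb package:

```python
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient(path=...) to keep data
collection = client.create_collection("docs")

# Chroma embeds the documents itself with a default embedding model.
collection.add(
    ids=["a", "b"],
    documents=["My car broke down", "I love pineapple pizza"],
)

results = collection.query(query_texts=["vehicle failure on the highway"], n_results=1)
print(results["documents"])
```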

How to pick

Here is how I would think about it, in plain terms.

Already on Postgres with under 5 million vectors? Start with pgvector. You will save yourself the operational overhead of a separate database and you are unlikely to hit its limits. Need zero infrastructure overhead and have the budget for SaaS? Pinecone is the honest answer. Open-source with self-hosted production needs? Qdrant. You want hybrid search without much custom code? Weaviate is worth the setup cost. Just getting started and want to understand what you are working with before committing? Chroma.

The wrong move is picking Pinecone because it sounds more serious than pgvector, or picking Qdrant because self-hosting sounds more principled. Pick what matches your actual constraints today.

Conclusion

Every developer building AI features is going to run into this eventually. The pattern is always the same: some unstructured data needs to become searchable by meaning rather than exact match, and a vector database is what makes that possible at any practical scale.

The concepts are approachable once you sit with them for a bit. You do not need to have shipped one to production to have an informed opinion on which one fits your situation.

Stay focused, Developer!