Indexing Data - Azure Cognitive Search

Indexing is the process of preparing your data for search by extracting and storing relevant information in a searchable index. With Azure Cognitive Search, you can index a wide range of data types, including structured and unstructured data, images, and documents. In this chapter, we will explore the different ways to index your data, including automatic indexing and manual indexing, and discuss the key indexing concepts and best practices.

Indexing Concepts

Index

An index is a collection of documents that Azure Cognitive Search can query. An index is defined by its schema, which lists the fields to be indexed, their data types, and the attributes that control how each field is used, such as whether it is searchable or filterable.

Document

A document is a single item or record in the index. The source content behind a document can take various forms, including text, images, and files such as PDFs.

Field

A field is a specific piece of information that is extracted from a document and stored in the index. Each field has a data type, such as string (Edm.String), integer (Edm.Int32), or date and time (Edm.DateTimeOffset).
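To make the index, document, and field concepts above concrete, here is a minimal sketch that builds an index schema and a matching document as plain Python dictionaries. The layout follows the JSON body that the Create Index REST operation expects, as far as I understand it. The index name, field names, and the two helper functions are hypothetical and exist only for illustration.

```python
import json

# Hypothetical index definition. The index name and field names are made up;
# the overall shape (name + fields with type and attributes) mirrors the
# JSON body used when creating an index.
index_definition = {
    "name": "hotels-sample",
    "fields": [
        {"name": "hotelId", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "description", "type": "Edm.String", "searchable": True},
        {"name": "rating", "type": "Edm.Int32", "filterable": True, "sortable": True},
        {"name": "lastRenovated", "type": "Edm.DateTimeOffset", "filterable": True},
    ],
}

# A document is one record whose properties correspond to the schema's fields.
document = {
    "hotelId": "1",
    "description": "Quiet hotel close to the old town.",
    "rating": 4,
    "lastRenovated": "2019-05-01T00:00:00Z",
}


def key_field(index):
    """Return the name of the single key field, or raise if there is not exactly one."""
    keys = [f["name"] for f in index["fields"] if f.get("key")]
    if len(keys) != 1:
        raise ValueError("an index needs exactly one key field")
    return keys[0]


def unknown_fields(doc, index):
    """Return document property names that the index schema does not define."""
    known = {f["name"] for f in index["fields"]}
    return sorted(set(doc) - known)


print(json.dumps(index_definition, indent=2))
print("key field:", key_field(index_definition))
print("unknown fields:", unknown_fields(document, index_definition))
```

Checking a document against the schema before sending it is optional, but it catches misspelled property names early, on the client side.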

Analyzer

An analyzer is a component of Azure Cognitive Search that processes text during indexing, breaking it into the terms that are stored in the index and matched against queries. You can choose among built-in analyzers or define a custom analyzer to suit the specific needs of your data.
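The following toy function illustrates the kind of work an analyzer does. It is not the service's implementation, only a simplified sketch that lowercases text, splits it into tokens, and drops a few common words. The function name and the stop-word list are assumptions made for this example. In the service itself, an analyzer is selected per field in the index schema rather than written by hand like this.

```python
import re

# Tiny illustrative stop-word list; real analyzers use language-specific lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def toy_analyze(text):
    """Lowercase the text, split it on non-alphanumeric characters, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(toy_analyze("The Quiet Hotel in the Heart of Lisbon"))
# prints ['quiet', 'hotel', 'heart', 'lisbon']
```

Because the stored terms are lowercase, a query for "HOTEL" can match this text only if the query text is processed in the same way.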

Indexing Modes

There are two main modes of indexing in Azure Cognitive Search: automatic indexing and manual indexing.

Automatic indexing

With automatic indexing, new and changed data is indexed without you triggering each update. This is useful for scenarios where data is added frequently and needs to become searchable soon after it arrives. Automatic indexing can follow a pull model, where Azure Cognitive Search pulls data from a supported data source, typically on a schedule, or a push model, where your application pushes data to Azure Cognitive Search whenever the data changes.
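In the pull model, the component that pulls from the data source is usually an indexer that runs on a schedule. The sketch below builds an indexer definition as a Python dictionary in the shape of the JSON body used to create an indexer, as far as I understand it. The indexer, data source, and index names are hypothetical, and the limits of 5 minutes and 1 day for the schedule interval are my understanding of the documented bounds, so verify them against the current service documentation.

```python
import json


def iso_interval(minutes):
    """Convert whole minutes to an ISO 8601 duration such as PT15M, PT2H, or P1D."""
    days, rem = divmod(minutes, 1440)
    hours, mins = divmod(rem, 60)
    date_part = f"{days}D" if days else ""
    time_part = (f"{hours}H" if hours else "") + (f"{mins}M" if mins else "")
    return "P" + date_part + ("T" + time_part if time_part else "")


def indexer_definition(name, data_source, target_index, every_minutes):
    """Build a scheduled indexer definition; all names passed in are hypothetical."""
    # Assumed bounds: no more often than every 5 minutes, at least once a day.
    if not 5 <= every_minutes <= 1440:
        raise ValueError("interval must be between 5 minutes and 1 day")
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        "schedule": {"interval": iso_interval(every_minutes)},
    }


definition = indexer_definition("hotels-indexer", "hotels-db", "hotels-sample", 120)
print(json.dumps(definition, indent=2))
```

A shorter interval makes new data searchable sooner at the cost of more frequent reads against the data source.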

Manual indexing

With manual indexing, you control the indexing process by triggering it yourself through the REST API or one of the Azure Cognitive Search SDKs. This is useful when you want to decide exactly when data is indexed, or when you need to load a large volume of data in one operation.
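When loading a large volume of data through the REST API or an SDK, documents are normally sent in batches rather than all in one request. The sketch below only prepares the request bodies and does not call the service. The `value` array and the `@search.action` property follow the documented shape of an indexing request as I understand it, the limit of 1000 documents per batch is an assumption to check against the current service limits, and the function name and sample documents are hypothetical.

```python
import json

MAX_BATCH = 1000  # assumed per-request document limit; verify for your service


def build_batches(documents, action="mergeOrUpload", batch_size=MAX_BATCH):
    """Yield request bodies, each holding at most batch_size documents."""
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        # Each document carries the action to perform alongside its fields.
        yield {"value": [{"@search.action": action, **doc} for doc in chunk]}


# Hypothetical sample data: 2500 small documents keyed by hotelId.
docs = [{"hotelId": str(i), "description": f"Hotel number {i}"} for i in range(2500)]
batches = list(build_batches(docs))

print(len(batches), [len(b["value"]) for b in batches])
# prints 3 [1000, 1000, 500]
print(json.dumps(batches[0]["value"][0]))
```

Each yielded body would then be sent as one indexing request. The `mergeOrUpload` action updates a document if its key already exists and inserts it otherwise, which makes repeated manual runs safe to retry.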