Lab 1 - Cognitive Search

Getting started

Before we can start, we need a few resources in the Azure portal: an Azure Cognitive Search service, an Azure Cognitive Services resource, and a storage account.

  1. Sign in to the Azure portal.

  2. Click the plus sign ("+ Create Resource") in the top-left corner.

  3. Use the search bar to find "Azure Cognitive Search" or navigate to the resource through Web > Azure Cognitive Search.

  4. Choose a subscription.

  5. Create a new resource group or choose an existing one (location: westeurope).

  6. Name the service.

  7. Choose a location (westeurope).

  8. Choose a pricing tier (Free).

  9. Click 'Review and Create'.

Azure Cognitive Services

  1. Sign in to the Azure portal.

  2. Click the plus sign ("+ Create Resource") in the top-left corner.

  3. Use the search bar to find "Azure Cognitive Services" or navigate to the resource through Web > Cognitive Services.

  4. Choose a subscription.

  5. Create a new resource group or choose an existing one (location: westeurope).

  6. Name the service.

  7. Choose a location (westeurope).

  8. Choose a pricing tier (Standard S0).

  9. Click 'Review and Create'.

Storage Account

  1. Sign in to the Azure portal.

  2. Click the plus sign ("+ Create Resource") in the top-left corner.

  3. Use the search bar to find "Storage account" or navigate to the resource through Storage > Storage account.

  4. Choose a subscription.

  5. Create a new resource group or choose an existing one (location: westeurope).

  6. Name the storage account.

  7. Choose a location (westeurope).

  8. Click 'Review and Create'.

Load data into the storage account

Create a container

  1. In the Azure portal, go to the storage account that you just created.

  2. Click on Containers.

  3. Click on [+ Container].

  4. Give the container a name (e.g. 'data').

  5. Click Create.

  6. Click on the created container.

  7. Click the [Upload] button.

  8. Choose some local pdf files (or download a small set from here).

  9. Click [Upload].

Create a search index

  1. In the portal, go to the search service that you created earlier.

  2. Click [+ Import Data]

  3. You will be guided through some steps in a wizard.

Connect to your data:

  • Data Source: Azure Blob Storage

  • Data source name: 'blob'

  • Connection String: Choose an existing connection, choose the storage account and container you just made and click [Select]

Click [Next]
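For reference, the data source step above corresponds roughly to a data source definition in the Search REST API. The following is a minimal sketch, not the wizard's exact output: the helper name `build_data_source` is hypothetical, and the connection string is a placeholder you would replace with your own.

```python
import json


def build_data_source(name: str, connection_string: str, container: str) -> dict:
    """Build a blob data source definition (hypothetical helper)."""
    return {
        "name": name,
        "type": "azureblob",  # data source type for Azure Blob Storage
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


# Placeholder values (assumptions); use your own storage account details.
data_source = build_data_source("blob", "<your-storage-connection-string>", "data")
print(json.dumps(data_source, indent=2))
```

The names 'blob' and 'data' match the values suggested in this lab.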

Add Cognitive skills

A skillset is a list of skills that will be executed to enrich the data that is already found in the documents.

Expand 'Attach Cognitive Services' and select the cognitive service you created.

Expand 'Add enrichments':

  • Check 'Enable OCR'

  • Check 'Text Cognitive Skills'

  • Check 'Image Cognitive Skills'

In this step you are adding prebuilt AI skills to your indexing pipeline.

You can change the target language if you want.

Click [Next]
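To make the idea of a skillset more concrete, here is a simplified sketch of what a skillset definition can look like as a REST payload. It is an illustration under assumptions, not the skillset the wizard generates (which contains more skills, such as entity recognition and translation). The helper name `build_skillset`, the skillset name and the key placeholder are hypothetical.

```python
import json


def build_skillset(name: str, cognitive_services_key: str) -> dict:
    """Build a small skillset: OCR, language detection, key phrases (hypothetical helper)."""
    return {
        "name": name,
        "description": "OCR, language detection and key phrase extraction",
        # Attaches the Cognitive Services resource that is billed for the enrichments.
        "cognitiveServices": {
            "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
            "key": cognitive_services_key,
        },
        "skills": [
            {
                # Extracts text from images found in the documents.
                "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
                "context": "/document/normalized_images/*",
                "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
                "outputs": [{"name": "text", "targetName": "text"}],
            },
            {
                # Detects the language of the document text.
                "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "languageCode", "targetName": "language"}],
            },
            {
                # Extracts key phrases, using the language detected above.
                "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                "context": "/document",
                "inputs": [
                    {"name": "text", "source": "/document/content"},
                    {"name": "languageCode", "source": "/document/language"},
                ],
                "outputs": [{"name": "keyPhrases", "targetName": "keyphrases"}],
            },
        ],
    }


skillset = build_skillset("lab-skillset", "<your-cognitive-services-key>")
print(json.dumps(skillset, indent=2))
```

Each skill reads its inputs from a path in the enriched document and writes its outputs back under a target name, so later skills can build on earlier ones.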

Customize Index

In this view, you define what data you want to save in your index and how. An index consists of JSON documents that all have the same structure.

Notice that for every field you have the following options:

  • Retrievable

    • Search API will be able to retrieve this field

  • Filterable

    • Search API will be able to filter on this field

  • Sortable

    • Search API will be able to sort on this field

  • Facetable

    • Search API can generate facets on this field

  • Searchable

    • Search API can search through this field

  • Analyzer

    • Which analyser is used to search through your field. If you want to understand the differences between them, try out the following demo.

  • Suggester

    • Enable this if you want to give your search input box an autosuggest functionality. This will not improve your search results, but only the usability of the frontend that you build.

Leave all the 'Retrievable' checkboxes as they are. Enable 'Filterable' and 'Facetable' on people, organizations, locations, keyphrases and language.

Enable 'Searchable' only on 'translated text' and 'merged content'. The analyzer for 'translated text' needs to be 'Microsoft - {language you chose}'; for 'merged content' it can be 'Microsoft - English'.
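The checkbox choices above map onto field attributes in an index definition. The sketch below shows roughly how they could look as a REST payload. Several things are assumptions: the helper `field`, the index name, the underscore field names (`merged_content`, `translated_text`), the key field, and English as the chosen target language (hence `en.microsoft` on both searchable fields).

```python
import json


def field(name, field_type, *, key=False, searchable=False, filterable=False,
          sortable=False, facetable=False, retrievable=True, analyzer=None):
    """Build one index field definition (hypothetical helper)."""
    definition = {
        "name": name,
        "type": field_type,
        "key": key,
        "searchable": searchable,
        "filterable": filterable,
        "sortable": sortable,
        "facetable": facetable,
        "retrievable": retrievable,
    }
    if analyzer is not None:
        # An analyzer only makes sense on a searchable field.
        definition["analyzer"] = analyzer
    return definition


# Fields that hold lists of extracted values (assumed names).
LIST_FIELDS = ["people", "organizations", "locations", "keyphrases"]

index = {
    "name": "lab-index",  # assumed name
    "fields": [
        field("metadata_storage_path", "Edm.String", key=True),
        *[field(n, "Collection(Edm.String)", filterable=True, facetable=True)
          for n in LIST_FIELDS],
        field("language", "Edm.String", filterable=True, facetable=True),
        field("merged_content", "Edm.String", searchable=True, analyzer="en.microsoft"),
        field("translated_text", "Edm.String", searchable=True, analyzer="en.microsoft"),
    ],
}

print(json.dumps(index, indent=2))
```

Keeping 'Searchable' limited to two fields keeps full-text queries focused on the document text, while the filterable and facetable fields support narrowing and grouping of results.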

Click [Next]

Create an indexer

In an indexer, you fine-tune how the source is read, define when your data needs to be analysed, and specify how errors are handled.

In the advanced options, you can add an extension filter on '.pdf'.

Click [Submit] to start the process.

The process of analysing the pdf files has now started. You can follow the progress by clicking on the 'Indexers' tab.

Click 'Refresh' to update the status.
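The indexer settings described in this section can also be expressed as an indexer definition. This is a hedged sketch of such a payload: the helper `build_indexer` and all resource names are hypothetical, and the error limits shown are just one possible choice.

```python
import json


def build_indexer(name, data_source_name, index_name, skillset_name,
                  extensions=".pdf", max_failed_items=0):
    """Build an indexer definition that only picks up the given extensions (hypothetical helper)."""
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        "skillsetName": skillset_name,
        "parameters": {
            # Error handling: how many failed documents are tolerated before the run fails.
            "maxFailedItems": max_failed_items,
            "maxFailedItemsPerBatch": max_failed_items,
            "configuration": {
                "dataToExtract": "contentAndMetadata",
                # Needed so image-based skills such as OCR receive images.
                "imageAction": "generateNormalizedImages",
                # The extension filter from the advanced options.
                "indexedFileNameExtensions": extensions,
            },
        },
    }


# All names below are assumptions for illustration.
indexer = build_indexer("lab-indexer", "blob", "lab-index", "lab-skillset")
print(json.dumps(indexer, indent=2))
```

A schedule can be added to such a definition if the data needs to be re-analysed periodically; without one, the indexer runs once when it is created.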

Have a look at your data

When the indexing has succeeded, you can start exploring the data by clicking on 'Search Explorer'.

Use the 'query string' input field to explore your data.

Answer the following questions:

  • How many documents are in the index?

  • Return documents that talk about the person 'David Burt' (filter)

  • Return the facets for location linked to the above results
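As a starting point for the questions above, the sketch below assembles query strings that can be pasted into the 'query string' field. It assumes the fields are named `people` and `locations` and were made filterable and facetable in the index step; the helper `explorer_query` is hypothetical and does no URL encoding, since the strings are meant for Search Explorer rather than a raw HTTP request.

```python
def explorer_query(**params: str) -> str:
    """Join parameters into a Search Explorer query string (hypothetical helper)."""
    return "&".join(f"{key}={value}" for key, value in params.items())


# OData filter on a collection field: any person equal to 'David Burt'.
person_filter = "people/any(p: p eq 'David Burt')"

# Names starting with '$' are not valid keyword arguments, so they are passed via dicts.
count_query = explorer_query(search="*", **{"$count": "true"})
filter_query = explorer_query(search="*", **{"$filter": person_filter})
facet_query = explorer_query(search="*", **{"$filter": person_filter}, facet="locations")

print(count_query)
print(filter_query)
print(facet_query)
```

The `$count` parameter adds the total number of matches to the response, `$filter` restricts the results, and `facet` adds value counts for the given field to the filtered result set.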

Some help can be found here and here

If you want to go an extra mile:

  • Try to build a small web application with an input field that can be used to search through your data, and show the results with highlights in a clean way.

  • Build a custom web api skill to manipulate some data (help)
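For the custom web API skill, the core of the work is handling the request and response format: the service sends a batch of records under `values`, and expects one result per `recordId`. The sketch below shows only that payload handling, without a web framework. The function name `run_skill`, the input name `text` and the output name `upperText` are assumptions chosen for illustration.

```python
def run_skill(payload: dict) -> dict:
    """Process a custom skill request and return the response payload (hypothetical helper)."""
    results = []
    for record in payload.get("values", []):
        text = record.get("data", {}).get("text")
        result = {
            "recordId": record["recordId"],  # must match the incoming record
            "data": {},
            "errors": [],
            "warnings": [],
        }
        if isinstance(text, str):
            # The actual "manipulation": here simply upper-casing the text.
            result["data"]["upperText"] = text.upper()
        else:
            result["errors"].append({"message": "Missing or non-string 'text' input."})
        results.append(result)
    return {"values": results}


# Example request with one valid and one invalid record.
request = {
    "values": [
        {"recordId": "1", "data": {"text": "hello"}},
        {"recordId": "2", "data": {}},
    ]
}
response = run_skill(request)
print(response)
```

Wrapping this function in an HTTP endpoint that accepts POST requests with a JSON body is then enough to reference it from a skillset.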

Help can be found here
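For the small web application, the search request can ask the service to return highlighted snippets. The sketch below builds such a request body and turns a response into display lines. It makes no network call. The field names `merged_content` and `metadata_storage_name`, both helper names and the sample response are assumptions for illustration.

```python
def build_search_body(query: str, highlight_field: str = "merged_content") -> dict:
    """Build a search request body that asks for hit highlighting (hypothetical helper)."""
    return {
        "search": query,
        "highlight": highlight_field,
        "highlightPreTag": "<mark>",
        "highlightPostTag": "</mark>",
        "top": 10,
    }


def render_results(response: dict, highlight_field: str = "merged_content") -> list:
    """Turn a search response into 'name: snippet' lines (hypothetical helper)."""
    lines = []
    for doc in response.get("value", []):
        name = doc.get("metadata_storage_name", "(unnamed)")
        snippets = doc.get("@search.highlights", {}).get(highlight_field, [])
        # Fall back to a short notice when a hit has no highlight for this field.
        snippet = " ... ".join(snippets) if snippets else "(no highlight)"
        lines.append(f"{name}: {snippet}")
    return lines


# Made-up response in the shape returned by the search endpoint.
sample_response = {
    "value": [
        {
            "metadata_storage_name": "report.pdf",
            "@search.highlights": {"merged_content": ["about <mark>search</mark> services"]},
        },
        {"metadata_storage_name": "notes.pdf"},
    ]
}

body = build_search_body("search")
for line in render_results(sample_response):
    print(line)
```

In a real application, the body would be sent as a POST request to the index's search endpoint with a query key, and the `<mark>` tags would be rendered by the browser.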
