Nature vs Nurture

Mindbreeze InSpire Server

The Mindbreeze Enterprise Search Appliance

When you first start up a Mindbreeze Search Appliance, it's tempting to let it just figure things out on its own. After all, its 'nature' is pretty smart as computers go - out of the box it can already read the documents it's indexing and both identify and classify the various terms it finds in them. It can understand that 'Brooklyn' or '42 Wallaby Way' is a location, 'August 23' is a date, and 'Wendy Fulton' is a name, and it can associate those specific bits of knowledge with the document. Using its internal Artificial Intelligence, Mindbreeze is capable of taking this further - it can identify terms that are concepts, like 'project' or 'meeting'. Using natural language processing on both the documents and the search queries allows the appliance to respond differently to similar queries - ask Mindbreeze 'When is the meeting with Microsoft?' and it understands that you are trying to find information about dates and times. Ask 'Who is in the meeting with Microsoft?' and it understands that you are looking for people.

The search appliance's nature, untouched and out of the box, already provides organizations with excellent search capability. To take it to the next level and get the full use out of this powerful search appliance, it's a good idea to help it through 'nurturing' - that is, teaching it what's important to your particular organization, just as you would show the ropes to a new employee.

One way to do this is to provide the appliance with lists of information. In a medical organization, the appliance will encounter terms that a medical professional would recognize as diseases, such as 'Fibromyalgia', and others that are drug names, such as 'Fentanyl'. Left to its own devices, the Mindbreeze appliance may well classify these as relevant terms or concepts, and searches on those terms will still produce relevant documents. But the search experience improves considerably, and becomes more configurable, if the appliance is taught that the former is a 'Disease' and the latter is a 'Medication' or 'Drug'. This can be done simply by feeding the appliance lists of relevant drugs and diseases; when it then encounters terms that match those lists, it will classify them correctly. With properly classified terms, some very powerful searches become possible, such as a search for a particular disease that also produces a sub-list of the drugs mentioned alongside it, so a user can see at a glance which drugs are being studied or used to treat that disease.
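To make the idea of list-based classification concrete, here is a minimal Python sketch. It is not Mindbreeze's implementation or API; the term lists, the function names, and the simple word matching are all assumptions chosen for illustration. It tags words that appear in a list and then collects the drugs that appear in the same documents as a given disease.

```python
import re

# Hypothetical term lists; a real deployment would load curated lists instead.
ENTITY_LISTS = {
    "Disease": {"fibromyalgia", "asthma"},
    "Drug": {"fentanyl", "pregabalin"},
}


def tag_terms(text):
    """Return (term, label) pairs for every word found in one of the lists."""
    tags = []
    for word in re.findall(r"[A-Za-z]+", text):
        for label, terms in ENTITY_LISTS.items():
            if word.lower() in terms:
                tags.append((word, label))
    return tags


def drugs_mentioned_with(disease, documents):
    """Collect the drugs that appear in the same documents as a disease."""
    found = set()
    for doc in documents:
        tags = tag_terms(doc)
        diseases = {term.lower() for term, label in tags if label == "Disease"}
        if disease.lower() in diseases:
            found.update(term.lower() for term, label in tags if label == "Drug")
    return sorted(found)
```

Calling drugs_mentioned_with('Fibromyalgia', documents) on a small set of documents returns only the drugs that share a document with that disease, which is the 'sub-list' search described above in miniature.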

Another way to teach the appliance about something important in your data is to tell it what that important data looks like. If, for instance, you are a grocery chain, you've probably got a list of products that your stores carry, and each of those products has a product code. If the product code follows a specific format, such as 'PRD-9583-1435', you can tell the appliance to look for patterns that match that 'PRD-XXXX-XXXX' format and to classify every matching term as a product code. This can be especially powerful when combined with lists of search synonyms applied to the search query. Instead of expecting users to know the product code they are looking for, they can simply search on the name of the product; the synonym list then automatically adds the actual product code to the search, retrieving documents that mention either the name or just the product code of whatever they're looking for.
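As a rough illustration of pattern matching combined with query synonyms, consider the Python sketch below. It does not use Mindbreeze configuration syntax. It assumes that each 'X' in the format stands for a digit, and the product name in the synonym list is made up for the example.

```python
import re

# Assumption: each X in 'PRD-XXXX-XXXX' is a digit.
PRODUCT_CODE = re.compile(r"\bPRD-\d{4}-\d{4}\b")

# Hypothetical synonym list mapping a product name to its product code.
SYNONYMS = {"organic oat milk": "PRD-9583-1435"}


def find_product_codes(text):
    """Return every substring of the text that matches the product code format."""
    return PRODUCT_CODE.findall(text)


def expand_query(query):
    """Return the original query plus its product code, if a synonym is known."""
    terms = [query]
    code = SYNONYMS.get(query.strip().lower())
    if code:
        terms.append(code)
    return terms
```

A search for the product name would then be run against both the name and 'PRD-9583-1435', so documents that list only the code are still found.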

The easiest way to teach your appliance about your data is to simply give it as much data as possible to look at. The more it's able to see, the more it's able to study your organization's data and establish associations between various parts of that data - even when the data comes from entirely separate sources, such as corporate email servers and a product database. The more it's able to read, the more it can understand your data, in its own way.

Taken together, these techniques support one of the most impressive features of the appliance - automatic document classification through search appliance learning. An insurance company might receive thousands of emails per day, sent to a single address listed on its website. The services of several full-time employees might be required just to read each message and then send it to the appropriate department - sales, business inquiries, claims, and so on. The appliance can take over this routing. In something of a reversal, the process begins with 'nurturing' - the appliance is given a set of 'training documents' that show it a representative sample of what it might encounter and what it is supposed to do with each one.

After that, the appliance's own 'nature' kicks in to improve the process, in the form of its internal artificial intelligence. Based on the semantics of each document and what it learned from the training documents, it starts classifying incoming emails as best it can, sending them to their respective departments. Teaching it to identify particular types of data, such as a claim number (the presence of which is a strong indicator that the email should be forwarded to the claims department), can be invaluable both in getting it right from the get-go and in the learning process later.

Much like a new employee, it will not get everything right off the bat. At least a few emails will be sent to the wrong address. A feedback mechanism is vital. When these mistakes happen the appliance must be told that it did the wrong thing, so that it realizes that it made a mistake. As with a human, every mistake is an opportunity to learn the correct behavior, so even as Mindbreeze makes mistakes, it will also learn from those mistakes and continually get better at its task. Unlike a human, there isn't a good way to tell it exactly why it made a mistake - nor is there a need to do so. The search appliance simply needs to know that it made a mistake, and through its own internal analysis over a sufficient number of mistakes, it will figure out on its own what it should be doing instead.
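The train-then-correct loop can be sketched with a toy classifier in Python. This is not how Mindbreeze works internally, which the article does not describe; it is a simple perceptron-style word-weight model, and the class name, department names, and sample emails are all hypothetical. One simplifying assumption: the feedback here names the correct department, which is more information than a bare 'that was wrong' signal.

```python
import re


class MailRouter:
    """Toy mistake-driven (perceptron-style) email classifier."""

    def __init__(self, departments):
        self.departments = list(departments)
        # weights[department][word] is that word's score for that department.
        self.weights = {d: {} for d in self.departments}

    @staticmethod
    def _words(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def predict(self, text):
        """Return the highest-scoring department (ties go to the first listed)."""
        words = self._words(text)
        scores = {
            d: sum(self.weights[d].get(w, 0.0) for w in words)
            for d in self.departments
        }
        return max(self.departments, key=lambda d: scores[d])

    def feedback(self, text, correct):
        """Adjust weights only when the guess was wrong; return True if so."""
        guess = self.predict(text)
        if guess == correct:
            return False
        for w in self._words(text):
            self.weights[correct][w] = self.weights[correct].get(w, 0.0) + 1.0
            self.weights[guess][w] = self.weights[guess].get(w, 0.0) - 1.0
        return True

    def train(self, examples, passes=5):
        """Run the same feedback step over labeled training documents."""
        for _ in range(passes):
            for text, department in examples:
                self.feedback(text, department)
```

Training and later corrections use the same update step in this sketch: nothing explains why a guess was wrong, yet repeated corrections shift the word weights until similar emails are routed correctly.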

The result is a search appliance that gets better and better over time. Depending on the nature of the task, matching or even exceeding human accuracy is possible. Given the search appliance's inherent advantages in speed and cost, accuracy is the last aspect of the task that a human might be required for, thus freeing up employees for other duties.

For more information our MindBreeze Experts can be reached at