Boolean Model of Information Retrieval

This program reads documents, indexes their words into a dictionary, and then lets you search the documents using the Boolean method.

Interpreting Documents

In the first part of main.py, the code uses the GetDocText.py module to read all the available docs and then indexes them. After the text is read, the following tasks are performed on it:

  • splitting the text and tokenizing its words
  • preprocessing tokens (lowercasing + punctuation removal + excluding dirty tokens)
  • indexing tokens into the dictionary (docID + tokenPosition + frequency)
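
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the names preprocess and build_index are hypothetical, the index layout (token -> {docID: [positions]}) is assumed, and "dirty token" is assumed to mean a token with nothing alphanumeric left after punctuation removal.

```python
import string
from collections import defaultdict

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(token):
    """Lowercase a token and strip punctuation; return None for dirty tokens."""
    cleaned = token.lower().translate(_PUNCT_TABLE)
    # Assumption: a token is "dirty" if it is empty or not purely alphanumeric.
    return cleaned if cleaned.isalnum() else None


def build_index(docs):
    """Build token -> {doc_id: [positions]}; frequency is len(positions)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        position = 0  # counts only the tokens that survive preprocessing
        for raw in text.split():
            token = preprocess(raw)
            if token is None:
                continue
            index[token].setdefault(doc_id, []).append(position)
            position += 1
    return index


docs = {1: "That he is here.", 2: "He said that -- twice! That is all."}
index = build_index(docs)
```

Storing the position list per document keeps the frequency implicit (the length of the list) and makes proximity operators possible later.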

Logging activity

On each program run, a full log is saved in ./log under the name [Date + Time]. It contains the following information:

  • available documents
  • dictionary data
  • entered query and results
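
One possible way to produce such a per-run log file is sketched below with Python's standard logging module. The function name open_run_log, the logger name, and the exact timestamp format are assumptions; the README only says the file is named by date and time.

```python
import logging
import os
from datetime import datetime


def open_run_log(log_dir="./log"):
    """Create log_dir if needed and return (logger, path) for this run."""
    os.makedirs(log_dir, exist_ok=True)
    # Assumption: the precise [Date + Time] format is not given in the README.
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    path = os.path.join(log_dir, stamp + ".log")

    logger = logging.getLogger("boolean_ir." + stamp)  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger, path
```

A run would then call open_run_log() once at startup and write the document list, the dictionary data, and each query with its results through logger.info(...).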

Searching for query

The program will ask you to enter your desired query, which can contain up to 3 words and 2 operators.

For example, that WITH he WITH is is an acceptable query.

The allowed operators are:

  • AND
  • OR
  • WITH
  • NEAR #

These operators are placed between the query tokens. Finally, all possible answers (related documents) will be shown.
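
To illustrate how these operators could be evaluated against a positional index, here is a small sketch for two-word queries. Several things are assumed rather than taken from the program: the index layout, the function names, the reading of WITH as "the two words are adjacent", and the reading of NEAR # as "at most # positions apart", with # being a number.

```python
# Toy positional index, token -> {doc_id: [positions]} (assumed layout).
INDEX = {
    "that": {1: [0], 2: [2, 4]},
    "he": {1: [1], 2: [0]},
    "is": {1: [2], 2: [5]},
}


def docs_within(index, left, right, max_gap):
    """Docs where some occurrence of left and right are at most max_gap apart."""
    left_postings = index.get(left, {})
    right_postings = index.get(right, {})
    hits = set()
    for doc_id in left_postings.keys() & right_postings.keys():
        if any(abs(p - q) <= max_gap
               for p in left_postings[doc_id]
               for q in right_postings[doc_id]):
            hits.add(doc_id)
    return hits


def evaluate(index, left, operator, right):
    """Evaluate a two-word query; operator is AND, OR, WITH, or 'NEAR n'."""
    left_docs = set(index.get(left, {}))
    right_docs = set(index.get(right, {}))
    if operator == "AND":
        return left_docs & right_docs
    if operator == "OR":
        return left_docs | right_docs
    if operator == "WITH":
        # Assumption: WITH means the two words sit next to each other.
        return docs_within(index, left, right, 1)
    if operator.startswith("NEAR"):
        # Assumption: 'NEAR n' means at most n positions apart.
        return docs_within(index, left, right, int(operator.split()[1]))
    raise ValueError("unknown operator: " + operator)


print(evaluate(INDEX, "that", "AND", "he"))
print(evaluate(INDEX, "that", "WITH", "he"))
```

AND and OR need only the document sets, while the proximity operators need the stored token positions, which is why the dictionary keeps positions and not just document IDs.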