Week 1 - Revisit project codebase 🛣️


Overview

The first week of the project was a revisiting phase, centered on reading the codebase and understanding it better. I had already run the entire pipeline while writing the original proposal, to understand the current state of the project and where it needs improvement. In this first week, however, I made sure to try out every section of the pipeline on its own rather than running the whole pipeline end to end. From this, I was able to identify the parts of the codebase where my contributions will be applied. After working through the codebase, I now have a clearer picture of where my first contribution will come in: replacing the current semantic similarity search algorithm with a co-occurrence-based algorithm, or whichever algorithm proves most efficient.

Insights

While the roadmap for the project implementation may appear straightforward at first glance, this first week has helped me to:

  • Understand how some parts of the codebase work individually.
  • Understand where to make my contributions next week in order not to break the entire codebase.
  • Raise the question of what training approach we should employ:
    • How the co-occurrence-based algorithm will fit in.
    • How efficient it can be compared to the current semantic similarity approach.
    • How to integrate the algorithm so that it fits in with the ontology predicates.

While researching answers to these questions, I came across a Medium article. It explains in detail how the co-occurrence-based algorithm works and also shows a little of how to implement it in code. Although the code may not match our use case, it gives me insight and inspiration on how to implement the algorithm to fit our project's needs.
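To make the idea concrete for myself, here is a minimal sketch of co-occurrence counting. It is not the code from the article and not the project's implementation; the function names (`build_cooccurrence`, `rank_candidates`), the window size, and the toy sentences are all assumptions made up for illustration. The idea is simply to count how often two tokens appear near each other, then rank candidate terms by that count.

```python
from collections import Counter, defaultdict


def build_cooccurrence(sentences, window=2):
    """Count how often each pair of tokens appears within `window` positions.

    `sentences` is a list of token lists. The window size is an
    illustrative assumption, not a value taken from the project.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, token in enumerate(tokens):
            # Look only at neighbours to the right and record both
            # directions, so the resulting table is symmetric.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                other = tokens[j]
                if other == token:
                    continue
                counts[token][other] += 1
                counts[other][token] += 1
    return counts


def rank_candidates(counts, term, candidates):
    """Order candidates by how often they co-occur with `term`, highest first."""
    return sorted(candidates, key=lambda c: counts[term][c], reverse=True)


# Toy corpus (hypothetical, for illustration only).
sentences = [
    ["who", "wrote", "the", "book"],
    ["the", "author", "wrote", "a", "novel"],
    ["she", "was", "born", "in", "paris"],
]

counts = build_cooccurrence(sentences, window=2)
ranked = rank_candidates(counts, "wrote", ["author", "born", "book"])
print(ranked)
```

In this toy corpus, "born" never appears near "wrote", so it ends up last in the ranking. How such counts would map onto the ontology predicates is still one of the open questions listed above.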

Next week's plan

For the upcoming week, the plan is to dive deeper into the co-occurrence algorithm, work out how it will fit our project's needs, and finally implement it in code.