[R] What existing AI tools or books can I use to dynamically monitor changes on a website?

Good evening, colleagues,

I'm working on a Java project where I parse a website's DOM. My problem is that the page frequently changes its structure, so I often have to adapt the tool I developed.

My working scenario:

Let’s say I have two versions of the same website:

  1. before changing design (DOM);
  2. after changing design (DOM).

I have a Java-based parser that is valid for variant 1). In other words, my project was written against the old version of the DOM.

After some time my code breaks, because the creator of the website changes the DOM (design). That is, I end up with variant 2), where my parser no longer works: while analyzing the structure of the website, it runs into a modified DOM, e.g. the sidebar was changed or elements are added / presented / located differently.
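
A minimal sketch of how the switch from variant 1) to variant 2) could be detected automatically, before the parser fails on it. It compares the list of element tag paths of two snapshots and ignores text and attribute values. The class name `DomFingerprint`, its helper methods, and the sample pages are made up for illustration. It assumes well-formed XHTML so that the JDK's built-in XML parser is enough; a real marketplace page would most likely need a lenient HTML parser instead.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of the original project.
public class DomFingerprint {

    // Collects the tag path of every element in document order, e.g. "html/body/div/h1".
    static List<String> tagPaths(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        List<String> paths = new ArrayList<>();
        walk(doc.getDocumentElement(), "", paths);
        return paths;
    }

    // Depth-first walk that records element nodes only (text and comments are skipped).
    private static void walk(Node node, String prefix, List<String> out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        String path = prefix.isEmpty() ? node.getNodeName() : prefix + "/" + node.getNodeName();
        out.add(path);
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), path, out);
        }
    }

    // True when the two snapshots differ in element structure, regardless of content.
    static boolean structureChanged(String oldPage, String newPage) throws Exception {
        return !tagPaths(oldPage).equals(tagPaths(newPage));
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample snapshots; real input would be the fetched page source.
        String v1 = "<html><body><div><h1>Title</h1><img src=\"a.png\"/></div></body></html>";
        String v1NewContent = "<html><body><div><h1>Other</h1><img src=\"b.png\"/></div></body></html>";
        String v2 = "<html><body><section><h2>Title</h2></section><img src=\"a.png\"/></body></html>";

        System.out.println(structureChanged(v1, v1NewContent)); // false: only content changed
        System.out.println(structureChanged(v1, v2));           // true: layout changed
    }
}
```

A check like this only reports that the layout moved; it does not repair the parser's selectors, which is the part the question is really about.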

So I have to set aside a fixed period of time just to rewrite the parser for the new version of the website (a marketplace, in my case).

This hurts my personal productivity: instead of working on my product and adding functionality, I am forced to waste time because of someone else’s inappropriate actions.

I call these actions “inappropriate” because my parser does not steal anyone’s money. It only collects information and copies the existing framework of the landing page, such as the title and img-url.

As a result, from my personal point of view, it doesn’t make sense AT ALL: why should I have to think about HOW the creator of the website is going to work tomorrow?

Should I sit and wait until he changes his DOM again? This is a rhetorical question.

Of course not, right?

Like any sane and healthy person, I won’t do that, because I respect my time and don’t want it to go down the drain.

Does anyone know any good tools/resources/practices for detecting or adapting to such DOM changes using data science techniques?

If you need more details, please write me back.

Please try to answer my question in your own words, if possible.

Thank you in advance.

submitted by /u/invzbl3