Optimize The Title And Description Extraction Method #29
Reference: owner/RSE#29
Right now, to extract the title and description, the whole HTML document is downloaded and passed around to the various methods. Worse, the extraction itself works by running the WHOLE HTML through regex. That's not great, as the screenshot below shows: when the scanner starts extracting the title and description, the CPU gets quite toasty, and the extraction can take a while depending on the size of the HTML.
So I propose cutting the HTML off at the point where either the title or the description tag ends, so that we only process the HTML we actually need.
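To make the idea concrete, here is a rough sketch in Python (not necessarily the project's language). It assumes the description lives in a `<meta name="description">` tag, and that the cut goes after whichever of the two tags ends last, so both survive. The function name `truncate_after_metadata` and the three patterns are made up for illustration and are not the project's actual code.

```python
import re

# Hypothetical patterns for illustration; the project's real regexes may differ.
TITLE_END = re.compile(r"</title\s*>", re.IGNORECASE)
META_DESCRIPTION = re.compile(
    r"<meta\b[^>]*\bname\s*=\s*[\"']description[\"'][^>]*>", re.IGNORECASE
)
HEAD_END = re.compile(r"</head\s*>", re.IGNORECASE)


def truncate_after_metadata(html: str) -> str:
    """Return a prefix of html that ends right after the last metadata tag found."""
    head_end = HEAD_END.search(html)
    # Restrict the search to <head> when it is closed, so a missing tag
    # does not force the patterns to scan the whole body.
    limit = head_end.end() if head_end else len(html)

    cut = 0
    for pattern in (TITLE_END, META_DESCRIPTION):
        match = pattern.search(html, 0, limit)
        if match:
            # Keep the later end position so both tags stay in the prefix.
            cut = max(cut, match.end())

    # Neither tag found: fall back to the head, or to the full document
    # when there is no closing </head> either.
    return html[:cut] if cut else html[:limit]
```

The existing title and description extraction could then run on the returned prefix instead of the full document. One caveat of this sketch: a `>` inside an attribute value of the meta tag would end the match early.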
More screenshots to showcase the effect.