After deduplication we go from 19 824 352 to just 6 250 regexes, of which 6 057 were valid when parsed by Node.js. That's some duplication! It probably stems from the same form occurring in many places (say, a footer with a subscription form for a mailing list), and it's likely aggravated slightly by the fact that I count multiple occurrences in the same tag.
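For illustration, here is a minimal sketch of how such a deduplicate-then-validate step could look in Node.js. It is not the script actually used for the numbers above: the function name `dedupeAndValidate`, the `flags` parameter, and the sample patterns are all made up for this example, and it assumes the patterns are compiled without flags unless told otherwise.

```javascript
// Hypothetical helper (not the original script): deduplicate pattern
// strings, then split them by whether Node.js can compile them.
function dedupeAndValidate(patterns, flags = "") {
  // A Set keeps one copy of each distinct pattern string.
  const unique = [...new Set(patterns)];
  const valid = [];
  const invalid = [];
  for (const source of unique) {
    try {
      // The RegExp constructor throws a SyntaxError on an invalid pattern.
      new RegExp(source, flags);
      valid.push(source);
    } catch (err) {
      // Anything other than a SyntaxError is unexpected, so rethrow it.
      if (!(err instanceof SyntaxError)) throw err;
      invalid.push(source);
    }
  }
  return { unique, valid, invalid };
}

// Made-up sample input, only to show the shape of the result.
const sample = ["[0-9]+", "[0-9]+", "[a-z", "\\d{5}", "[a-z"];
const result = dedupeAndValidate(sample);
console.log(
  `${sample.length} total, ${result.unique.length} unique, ` +
    `${result.valid.length} valid, ${result.invalid.length} invalid`
);
```

Whether a pattern counts as valid depends on the flags it is compiled with, so passing a different `flags` value (for example `"u"`) may shift a few patterns between the two buckets.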