{"id":232118,"date":"2023-01-20T16:45:00","date_gmt":"2023-01-20T13:45:00","guid":{"rendered":"https:\/\/wordpress.mediadoma.com\/?p=232118"},"modified":"2023-02-08T19:03:22","modified_gmt":"2023-02-08T16:03:22","slug":"peatage-vihased-robotid-nagu-360spider-to-crawel-my-site","status":"publish","type":"post","link":"https:\/\/wordpress.mediadoma.com\/et\/peatage-vihased-robotid-nagu-360spider-to-crawel-my-site\/","title":{"rendered":"Stop Angry Bots Such as 360Spider from Crawling My Site"},"content":{"rendered":"\n<p>My website <a href=\"https:\/\/steakovercooked.com\/\" target=\"_blank\" rel=\"noopener nofollow\" class=\"external external_icon\">steakovercooked.com<\/a> is hosted on a Fasthosts shared hosting server. Recently the site has been suspended several times because of the huge number of requests it received. Most of them came from bots crawling the site. Ewan MacDonald, an IT operations engineer at Fasthosts, emailed me:<\/p>\n<p>Dear justyy<\/p>\n<p>I am not sure exactly what you are doing with your site, but you have been using over 75% of the available Apache processes. This has caused major problems for all the other customers on the web server.<\/p>\n<p>I am currently running a security scan on your site.<\/p>\n<p>Please note that your site contains 85,000 files, amounting to 8.6 GB. Under our terms, all files in your web space must be part of the website, so are all 85,000 files part of the site and accessible through it? 
If not, they will have to be removed.<\/p>\n<p>I will also remove the 2 renamed htdocs folders, unless you object?<\/p>\n<p>If your site causes the same performance problem during the scan, I will take it offline again until you can explain why it is tying up around 200 Apache processes.<\/p>\n<p>Best wishes,<\/p>\n<p>I then checked the <strong>apache2<\/strong> log and found many entries like the following:<\/p>\n<p>[Wed Jul 23 21:40:21 2014] [warn] mod_fcgid: can't apply process slot for \/var\/www\/fcgi\/php54-cgi<br \/>\n[Wed Jul 23 21:40:22 2014] [warn] mod_fcgid: can't apply process slot for \/var\/www\/fcgi\/php54-cgi<br \/>\n[Wed Jul 23 21:40:30 2014] [warn] mod_fcgid: can't apply process slot for \/var\/www\/fcgi\/php54-cgi<br \/>\n[Wed Jul 23 21:40:31 2014] [warn] mod_fcgid: can't apply process slot for \/var\/www\/fcgi\/php54-cgi<br \/>\n[Wed Jul 23 21:40:31 2014] [warn] mod_fcgid: can't apply process slot for \/var\/www\/fcgi\/php54-cgi<br \/>\n[Wed Jul 23 21:40:31 2014] [warn] mod_fcgid: can't apply process slot for \/var\/www\/fcgi\/php54-cgi<\/p>\n<p>Apparently 360Spider was hitting the site quite hard, which obviously affected the other websites on the same shared host, so Fasthosts had to take my site down.<\/p>\n<p>The 360Spider problem occurred again later, so they had to disable my site once more until I had a script in place to block its access, because it was causing problems for the other users on the server.<\/p>\n<p>I am sorry that this causes problems for others on the shared machine, but I think it might be better to block these bots at a higher level (e.g. in the Apache settings). Just imagine: other websites may run into the same problem. I have previously optimized my website to reduce CPU usage by caching pages as static HTML. 
However, because the site has quite a lot of pages (around 5,000 according to Google Webmaster), some spiders may not be smart enough to detect duplicates. Google's spiders are fine, because I can configure parameters and they obey robots.txt. But the angry spiders (e.g. 360, Youdao) do not actually obey crawling rules. The only way to stop them is to blacklist them (which I can certainly do), but other users may run into the same problem.<\/p>\n<h2>robots.txt<\/h2>\n<p>robots.txt is a text file in the website root that tells search engine bots which directories to index and which to skip. However, not all bots follow these &quot;instructions&quot;. Here are the rules I added to tell these bad bots to go away.<\/p>\n<pre><code># root\nUser-agent: *\nCrawl-delay: 1\nDisallow: \/cgi-bin\/\nDisallow: \/tmp\/\n\nUser-agent: 360Spider\nDisallow: \/\n\nUser-agent: YoudaoBot\nDisallow: \/\n\nUser-agent: sogou spider\nDisallow: \/\n\nUser-agent: YisouSpider\nDisallow: \/\n\nUser-agent: LinksCrawler\nDisallow: \/\n\nUser-agent: EasouSpider\nDisallow: \/<\/code><\/pre>\n<h2>.htaccess<\/h2>\n<p>The <strong>.htaccess<\/strong> file is a hidden text file that can be placed in any directory of the website. It is commonly used with Apache's rewrite module <strong>mod_rewrite<\/strong> to make URLs prettier. 
It can also be used to control these bots.<\/p>\n<pre><code>&lt;IfModule mod_rewrite.c&gt;\n    RewriteEngine On\n    RewriteBase \/\n\n    # Always let robots.txt and the error page through\n    RewriteCond %{REQUEST_URI} !^\/robots.txt$\n    RewriteCond %{REQUEST_URI} !^\/error.html$\n\n    # Match any of the unwanted crawlers; the last condition must not carry OR\n    RewriteCond %{HTTP_USER_AGENT} EasouSpider [NC,OR]\n    RewriteCond %{HTTP_USER_AGENT} YisouSpider [NC,OR]\n    RewriteCond %{HTTP_USER_AGENT} \"Sogou web spider\" [NC,OR]\n    RewriteCond %{HTTP_USER_AGENT} 360Spider [NC,OR]\n    RewriteCond %{HTTP_USER_AGENT} LinksCrawler [NC]\n    RewriteRule ^.*$ - [F,L]\n&lt;\/IfModule&gt;\n\n&lt;IfModule mod_setenvif.c&gt;\n    SetEnvIfNoCase User-Agent \"EasouSpider\" bad_bot\n    SetEnvIfNoCase User-Agent \"YisouSpider\" bad_bot\n    SetEnvIfNoCase User-Agent \"LinksCrawler\" bad_bot\n    SetEnvIfNoCase User-Agent \"360Spider\" bad_bot\n    SetEnvIfNoCase User-Agent \"Sogou\" bad_bot\n    Order Allow,Deny\n    Allow from All\n    Deny from env=bad_bot\n&lt;\/IfModule&gt;<\/code><\/pre>\n<h2>PHP code<\/h2>\n<p>As a precaution, I also put the following code into <strong>index.php<\/strong>, which generates the different pages according to the URL parameters. 99% of the website's pages are produced by this index file.<\/p>\n<pre><code>  \/\/ Read the User-Agent header; fall back to an empty string if it is missing.\n  $agent = '';\n  if (isset($_SERVER['HTTP_USER_AGENT']))\n  {\n    $agent = $_SERVER['HTTP_USER_AGENT'];\n  }\n\n  \/\/ Case-insensitive pattern matching the unwanted crawlers.\n  define('BADBOTS','\/(yisouspider|easouspider|yisou|youdaobot|yodao|360|linkscrawler|sogou)\/i');\n\n  \/\/ Stop before any page is generated if the agent matches.\n  if (preg_match(BADBOTS, $agent)) {\n    die();\n  }<\/code><\/pre>\n<p>Basically, the PHP above checks the <strong>HTTP_USER_AGENT<\/strong> string for these bad bots. 
<strong>preg_match<\/strong> uses a regular expression, and the <strong>\/i<\/strong> option makes the comparison case-insensitive.<\/p>\n<p>I have also noticed quite a lot of entries like this in the log file:<br \/>\n119.188.91.121 - - [24\/Jul\/2014:22:39:51 +0100] &quot;GET \/?charset=big5&amp;do=System.Online&amp;lang=ch&amp;page=25&amp;per=10&amp;skin=2011aastap\u00e4ev HTTP\/1.0&quot; 200 3919 &quot; <a href=\"https:\/\/steakovercooked.com\/\" target=\"_blank\" rel=\"noopener nofollow\" class=\"external external_icon\">https:\/\/steakovercooked.com\/<\/a> \u2026 \u2026&quot; &quot;~Mozilla\/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident\/5.0)~&quot;<\/p>\n<p>Judging by the HTTP_USER_AGENT you would normally think this is not a bot, but I believe it is. So these bots are really bad. They can send any USER_AGENT they like (they can change this value), and they usually come from multiple IPs (so it is not easy to identify them all by specific IP ranges).<\/p>\n<p><a href=\"https:\/\/wordpress.mediadoma.com\/wp-content\/uploads\/2022\/01\/post-156542-61e5be9a90257.png\" data-rel=\"lightbox\"><img decoding=\"async\" class=\"SDStudio-light-box-enable SDStudio-editor-tools-md-imp\" src=\"https:\/\/wordpress.mediadoma.com\/wp-content\/uploads\/2022\/01\/post-156542-61e5be9a90257.png\" alt=\"Stop Angry Bots Such as 360Spider from Crawling My Site\"><\/a><\/p>\n<p>The methods above seem to work, because I now find many entries like these in the Apache log:<br \/>\n[Thu Jul 24 23:01:02 2014] [error] [client 61.135.189.186] client denied by server configuration: \/home\/linweb09\/z\/steakovercooked.com-1048918357\/user\/htdocs\/<br \/>\n[Thu Jul 24 23:01:02 2014] [error] [client 61.135.189.186] client denied by server configuration: \/home\/linweb09\/z\/steakovercooked. 
\/error<br \/>\n[Thu Jul 24 23:01:08 2014] [error] [client 61.135.189.186] client denied by server configuration: \/home\/linweb09\/z\/steakovercooked.com-1048918357\/user\/htdocs<\/p>\n<p>And Fasthosts is happy too: &quot;Yes, that looks much better now, so I will close this ticket. Many thanks for your action.&quot;<\/p>\n<p>But this may not be the final solution... Eventually I will also move this site to a <a href=\"https:\/\/wordpress.mediadoma.com\/et\/pilve-vps-on-parem-kui-traditsiooniline-vps-i-hostimine\/\" title=\"VPS\">VPS<\/a>, load-balanced servers, or a dedicated server, so that it is not taken down for a silly reason.<\/p>\n<p>The other day I read the following passage and could not agree more: a web hosting company <strong>MUST NOT<\/strong> do anything to harm the <a href=\"https:\/\/helloacm.com\/how-to-improve-seo-by-noindexing-attachment-and-pagination-in-wordpress\/\" target=\"_blank\" rel=\"noopener nofollow\" class=\"external external_icon\">SEO<\/a> reputation of your websites, let alone take your whole site down without your permission. Fasthosts is simply over the line, and that is why Fasthosts has received so many bad reviews (things like rubbish, crap, stay away for life).<\/p>\n<p><a href=\"https:\/\/wordpress.mediadoma.com\/wp-content\/uploads\/2022\/01\/post-156542-61e5be9c1782e.jpg\" data-rel=\"lightbox\"><img decoding=\"async\" class=\"SDStudio-light-box-enable SDStudio-editor-tools-md-imp\" src=\"https:\/\/wordpress.mediadoma.com\/wp-content\/uploads\/2022\/01\/post-156542-61e5be9c1782e.jpg\" alt=\"Stop Angry Bots Such as 360Spider from Crawling My Site\"><\/a><\/p>\n<p>By the way, I now use <a href=\"https:\/\/helloacm.com\/out\/quickhost\" target=\"_blank\" rel=\"noopener nofollow\" class=\"external external_icon\">QuickHostUK<\/a>, which is simply the best. 
The VPS works just great and I have already migrated a couple of sites.<\/p>\n<p><a href=\"https:\/\/helloacm.com\/out\/quickhost\" target=\"_blank\" rel=\"noopener nofollow\" class=\"external\"><img decoding=\"async\" src=\"https:\/\/wordpress.mediadoma.com\/wp-content\/uploads\/2022\/01\/post-156542-61e5be9c1782e.jpg\" alt=\"Stop Angry Bots Such as 360Spider from Crawling My Site\" \/><\/a><\/p>\n<p><div id=\"PostUnique_PostSource\" style=\"padding-top: 50px\">Source: <a target=\"_blank\" rel=\"noopener nofollow\" href=\"\/\/helloacm.com\" class=\"external external_icon\">helloacm.com<\/a><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Stop Angry Bots Such as 360Spider from Crawling My Site<\/p>\n","protected":false},"author":1,"featured_media":224493,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_wp_rev_ctl_limit":""},"categories":[718,1029,842,863],"tags":[1165],"class_list":["post-232118","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-arendaja","category-ohutus","category-opetused","category-wordpress-4","tag-affiai-et"],"_links":{"self":[{"href":"https:\/\/wordpress.mediadoma.com\/et\/wp-json\/wp\/v2\/posts\/232118","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wordpress.mediadoma.com\/et\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wordpress.mediadoma.com\/et\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wordpress.mediadoma.com\/et\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wordpress.mediadoma.com\/et\/wp-json\/wp\/v2\/comments?post=232118"}],"version-history":[{"count":0,"href":"https:\/\/wordpress.mediadoma.com\/et\/wp-json\/wp\/v2\/posts\/232118\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wordpress.mediadoma.com\/et\/wp-json\/wp\/v2\/media\/224493"}],"wp:attachment":[{"href":"https:\/\
/wordpress.mediadoma.com\/et\/wp-json\/wp\/v2\/media?parent=232118"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wordpress.mediadoma.com\/et\/wp-json\/wp\/v2\/categories?post=232118"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wordpress.mediadoma.com\/et\/wp-json\/wp\/v2\/tags?post=232118"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}