
What is DarkSpider?

DarkSpider is a multithreaded crawler and extractor for regular or onion webpages through the TOR network, written in Python.


Crawling is not illegal, but violating copyright is. It’s always best to double-check a website’s T&C before crawling it. Some websites use a robots.txt file to tell crawlers which pages not to visit. This crawler lets you ignore those rules, but we always recommend respecting robots.txt.

Keep in mind

Extracting and crawling through the TOR network takes some time. That’s normal behaviour; you can find more information here.

What makes it simple?

With a single argument you can read a .onion webpage, or a regular one, through the TOR network. Using pipes, you can pass the output to any other tool you prefer.

$ python darkspider.py -u http://github.com/ | grep 'google-site-verification'
    <meta name="google-site-verification" content="xxxx">
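
The same filtering can be done in Python instead of grep. The following is a minimal sketch, assuming the page HTML arrives on standard input as in the pipe example above. `MetaCollector`, `find_meta` and `main` are hypothetical names for illustration and are not part of DarkSpider.

```python
import io
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the attributes of every <meta> tag in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep them as a dict.
        if tag == "meta":
            self.metas.append(dict(attrs))


def find_meta(html, name):
    """Return the content values of <meta> tags whose name attribute matches."""
    parser = MetaCollector()
    parser.feed(html)
    return [m.get("content") for m in parser.metas if m.get("name") == name]


def main(stream):
    """Print matching content values from an HTML stream such as sys.stdin."""
    for content in find_meta(stream.read(), "google-site-verification"):
        print(content)


# Demo on an in-memory stream; in a real pipeline, call main(sys.stdin) instead.
main(io.StringIO('<meta name="google-site-verification" content="xxxx">'))
```

Saved under a hypothetical name such as meta_filter.py and switched to `main(sys.stdin)`, the script could take the place of grep at the end of the pipe.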

If you want to crawl the links of a webpage, use -c and you will get a folder with all the extracted links. You can also use -d to crawl those links in turn, up to the given depth. There is also the -p argument, which waits the given number of seconds before the next crawl.

$ python darkspider.py -v -u http://github.com/ -c -d 2 -p 2
[ DEBUG ] TOR is ready!
[ DEBUG ] Your IP: XXX.XXX.XXX.XXX :: Tor Connection: True
[ DEBUG ] URL :: http://github.com
[ DEBUG ] Folder created :: github.com
[ INFO  ] Crawler started from http://github.com with 2 depth, 2.0 seconds delay and using 16 Threads. Excluding 'None' links.
[ INFO  ] Step 1 completed :: 87 result(s)
[ INFO  ] Step 2 completed :: 4228 result(s)
[ INFO  ] Network Structure created :: github.com/network_structure.json
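
To make the roles of depth (-d) and pause (-p) concrete, here is a minimal sketch of a depth-limited crawl loop. It is not DarkSpider's implementation: `crawl`, `get_links` and the in-memory `site` dictionary are hypothetical stand-ins for real page fetching through TOR, and the way results are counted per step is an assumption.

```python
import time


def crawl(start, get_links, depth=1, pause=0.0):
    """Collect new links level by level, up to `depth` steps from `start`.

    `get_links` is any callable returning the links found on a page.
    `pause` is the delay in seconds after each page is processed.
    """
    seen = {start}
    frontier = [start]
    steps = []
    for _ in range(depth):
        found = []
        for page in frontier:
            for link in get_links(page):
                if link not in seen:  # skip links already collected
                    seen.add(link)
                    found.append(link)
            time.sleep(pause)
        steps.append(found)
        frontier = found  # the next step crawls only the newly found links
    return steps


# Hypothetical link graph standing in for real webpages.
site = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

steps = crawl("a", lambda page: site.get(page, []), depth=2, pause=0.0)
for number, links in enumerate(steps, start=1):
    print(f"Step {number} :: {len(links)} new link(s)")
```

A larger depth multiplies the number of pages visited at each step, which is why the result count grows so quickly between steps, and why a pause between requests matters.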

The output in this README is trimmed for readability. The verbose output in the log file is much more detailed.

$ python darkspider.py -v -u https://thehiddenwiki.org/ -c -d 1 -p 1 -y 0 -t 32

2022-10-12 01:37:50 | DEBUG | checker.py:88 | TOR is ready!
2022-10-12 01:37:54 | DEBUG | checker.py:129 | Your IP: 185.246.188.60 :: Tor Connection: True
2022-10-12 01:37:54 | DEBUG | checker.py:134 | URL :: https://thehiddenwiki.org/
2022-10-12 01:37:54 | DEBUG | darkspider.py:296 | Folder created :: thehiddenwiki.org
2022-10-12 01:37:54 | INFO  | crawler.py:204 | Crawler started from https://thehiddenwiki.org with 1 depth, 1.0 second delay and using 32 Threads. Excluding 'None' links.
2022-10-12 01:37:56 | DEBUG | crawler.py:227 | https://thehiddenwiki.org :: 200
2022-10-12 01:37:56 | INFO  | crawler.py:248 | Step 1 completed :: 106 result(s)
2022-10-12 01:37:57 | INFO  | darkspider.py:311 | Network Structure created :: thehiddenwiki.org/network_structure.json
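
Log lines in this layout are easy to process with other tools. The following is a minimal sketch, assuming every line follows the pipe-separated `timestamp | LEVEL | file:line | message` pattern seen in the excerpt above. `LogRecord` and `parse_log_line` are hypothetical names and are not part of DarkSpider.

```python
from datetime import datetime
from typing import NamedTuple


class LogRecord(NamedTuple):
    timestamp: datetime
    level: str
    source: str
    line: int
    message: str


def parse_log_line(text):
    """Split one 'timestamp | LEVEL | file:line | message' line into fields."""
    # Split on the first three pipes only, so the message is kept whole.
    stamp, level, location, message = [part.strip() for part in text.split("|", 3)]
    source, _, line = location.rpartition(":")
    return LogRecord(
        timestamp=datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
        level=level,
        source=source,
        line=int(line),
        message=message,
    )


sample = "2022-10-12 01:37:56 | INFO  | crawler.py:248 | Step 1 completed :: 106 result(s)"
record = parse_log_line(sample)
print(record.level, record.source, record.line, record.message)
```

Filtering on `record.level == "INFO"` would then reduce a verbose log to the step summaries.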