
Installation

To install this script, clone the repository:

$ git clone https://github.com/PROxZIMA/DarkSpider.git
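Then switch into the cloned directory; the commands that follow assume you are running from the project root:

$ cd DarkSpider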

Dependencies

You’ll also need to install the Python dependencies. Note that the wxPython wheel index below targets Ubuntu 22.04; pick the URL matching your distribution from extras.wxpython.org:

$ pip install -U -f https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-22.04 wxPython
$ pip install -r requirements.txt
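If you prefer to keep these packages isolated, you can create a virtual environment first and run the two pip commands above inside it. This is an optional step, not something the project requires:

$ python3 -m venv .venv
$ source .venv/bin/activate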

TOR

The TOR service is required (see the Tor Project documentation for other distros and instructions):

Debian/Ubuntu:

$ sudo apt install tor
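Before running the crawler, it is worth checking that the tor daemon is running and that its SOCKS proxy answers on the default port 9050. The systemd service name and the check.torproject.org probe below are standard Tor tooling, not part of this project; the grep simply looks for the success message on the check page:

$ sudo systemctl enable --now tor
$ curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/ | grep -i congratulations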

Arguments

General: Configuration options for the crawler

-h, --help : Show this help message and exit
-g, --gui : Open with the GUI backend
-v, --verbose : Show more information about the progress
-w, --without : Run without using the TOR relay
-n PORT, --port PORT : Port number of the TOR SOCKS proxy (Default: 9050)
-f FOLDER, --folder FOLDER : The root directory that will contain the generated files
-t THREADS, --thread THREADS : How many pages (threads) to visit at the same time (Default: 16)
-l, --log : Log which URLs were visited and their response codes (Default: True)

Extract: Arguments for the Extractor module

-i FILE, --input FILE : Input file with URL(s), one per line
-e, --extract : Extract the page's code to the terminal or to a file (Default: terminal)
-o OUTPUT, --output OUTPUT : Output page(s) to file(s) (for one page)
-y 0|1, --yara 0|1 : Check for keywords and only scrape documents that contain a match; 0 searches the whole HTML object, 1 searches only the text (Default: None)

Crawl: Arguments for the Crawler module

-u URL, --url URL : Seed URL of the webpage to crawl or extract
-c, --crawl : Crawl the website (default output in /links.txt)
-d DEPTH, --depth DEPTH : Set the crawl depth (Default: 1)
-p PAUSE, --pause PAUSE : How long the crawler pauses between page visits (Default: 1 second)
-z REGEX, --exclusion REGEX : Regex path that is ignored while crawling (Default: None)
-x, --external : Exclude external links while crawling a webpage (Default: include all links)

Visualize: Arguments for the Visualize module

-s, --visualize : Visualize the graphs and insights from the crawled data
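As a usage sketch combining the arguments above, the command below crawls a seed URL two levels deep with verbose output. It assumes the entry point is darkspider.py in the repository root, and the .onion address is a placeholder:

$ python darkspider.py -v -u http://example.onion/ -c -d 2 -p 1

To extract a single page to a file instead of crawling:

$ python darkspider.py -v -u http://example.onion/ -e -o output.html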