
As Extractor

The Extractor takes the maximum file name length into consideration and creates sub-directories based on the URL.

For example: http://a.com/b.ext?x=&y=$%z2 -> a.com/b.extxyz2_.html (an a.com folder containing the file b.extxyz2_.html)
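The mapping above can be sketched roughly as follows. This is an illustration only, not DarkSpider's actual code: the function name `url_to_path`, the 255-character limit, and the rule "keep only letters, digits and dots" are assumptions chosen so that the sketch reproduces the single example given above.

```python
import re
from urllib.parse import urlparse

# Assumed limit; many file systems cap a single file name at 255 characters.
MAX_NAME_LEN = 255


def url_to_path(url, max_name_len=MAX_NAME_LEN):
    """Hypothetical helper: map a URL to '<host>/<sanitized name>_.html'."""
    parts = urlparse(url)
    folder = parts.netloc  # e.g. "a.com" becomes the sub-directory
    raw = parts.path.lstrip("/") + parts.query
    # Assumption: drop every character except letters, digits and dots.
    name = re.sub(r"[^A-Za-z0-9.]", "", raw)
    suffix = "_.html"
    # Truncate so that the name plus suffix stays within the limit.
    name = name[: max_name_len - len(suffix)] + suffix
    return f"{folder}/{name}"


print(url_to_path("http://a.com/b.ext?x=&y=$%z2"))
```

Truncating before appending the suffix keeps the `.html` ending intact even for very long URLs.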

  • Extract a single webpage to the terminal:
$ python darkspider.py -u http://github.com/
## Termex :: Extracting http://github.com to terminal
## http://github.com ::
<!DOCTYPE html>
...
</html>
  • Extract to a file (github.html) without using TOR:
$ python darkspider.py -w -u http://github.com -o github.html
## Outex :: Extracting http://github.com to github.com/github.html
  • Extract to the terminal and show only the line containing google-site-verification:
$ python darkspider.py -u http://github.com/ | grep 'google-site-verification'
    <meta name="google-site-verification" content="xxxx">
  • Extract to a file and find only the line containing google-site-verification using YARA:
$ python darkspider.py -v -w -u https://github.com -e -y 0
...

Update res/keyword.yar to search for other keywords. Use -y 0 to search the raw HTML and -y 1 to search the page text only.
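The difference between the two search modes can be illustrated with a small standalone sketch. It does not use YARA and is not how DarkSpider implements the -y option; the helper `contains_keyword` and the sample page are hypothetical, and the text extraction relies only on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text between tags and ignores tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def contains_keyword(html, keyword, text_only):
    """Hypothetical helper: search the raw HTML or only the extracted text."""
    if text_only:
        parser = _TextOnly()
        parser.feed(html)
        parser.close()
        haystack = " ".join(parser.chunks)
    else:
        haystack = html
    return keyword in haystack


page = (
    '<html><head><meta name="google-site-verification" content="xxxx"></head>'
    "<body><p>Hello</p></body></html>"
)
print(contains_keyword(page, "google-site-verification", text_only=False))
print(contains_keyword(page, "google-site-verification", text_only=True))
```

Because the keyword in this sample appears only inside a tag attribute, a raw HTML search finds it while a text-only search does not.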

  • Extract a set of webpages (imported from file) to a folder:
$ python darkspider.py -i links.txt -f links_output
...