extract-fields

Scripts to extract HTML form field information from one or several webpages.

⚠️ This project has been archived

No future updates are planned. Feel free to continue using it, but expect no support.

Usage

Make sure PhantomJS is installed first.

$ phantomjs extract-field-info.js
# extract-field-info.js: extracts general form field info from HTML pages over HTTP.
# Usage: extract-field-info.js <mode> [arguments]
#   MODE 		 ARGUMENTS
#   ---- 		 ---------
#   help
#   fields 		  address
#   shared 		  address1  address2  more
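The usage table above can be read as a small dispatch on the first argument. The following is only a sketch of that reading, not the actual source of extract-field-info.js: the function name `parseMode` is hypothetical, and the argument counts (exactly one address for `fields`, two or more for `shared`, usage text for anything else) are assumptions inferred from the table. It is written for a modern JavaScript runtime such as Node.js rather than for PhantomJS itself.

```javascript
// Hypothetical sketch: NOT the real extract-field-info.js implementation.
// Maps a raw argument list to a mode plus its addresses, following the usage table.
function parseMode(args) {
  const [mode, ...addresses] = args;

  switch (mode) {
    case "fields":
      // Assumption: "fields" takes exactly one address.
      if (addresses.length !== 1) {
        throw new Error("fields expects exactly one address");
      }
      return { mode, addresses };

    case "shared":
      // Assumption: "shared" needs at least two addresses to compare.
      if (addresses.length < 2) {
        throw new Error("shared expects two or more addresses");
      }
      return { mode, addresses };

    default:
      // No mode, "help", or an unknown mode: fall back to showing usage.
      return { mode: "help", addresses: [] };
  }
}
```

Under these assumptions, `parseMode(["fields", "https://example.com/"])` would return the mode together with a single-element address list, while an empty argument list would fall through to `help`, matching the usage output shown when the script is run without arguments.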

Examples

The trailing 2>/dev/null only hides some PhantomJS error output about font performance; it can be left out.

Command line

Extract the field names/values for google.com.

$ phantomjs src/extract-field-info.js fields "https://google.com/" 2>/dev/null

Extract the field names/values for github.com.

$ phantomjs src/extract-field-info.js fields "https://github.com/" 2>/dev/null

While it doesn’t make much sense, let’s extract the shared field names/values for google.com and github.com.

$ phantomjs src/extract-field-info.js shared "https://google.com/" "https://github.com/" 2>/dev/null
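One way to think about the shared mode is as an intersection of the field names found on each page. The sketch below illustrates that idea only. It assumes each page's result is an array of objects with a `name` property, which is a made-up shape and not the script's documented output format. The helper name `sharedFieldNames` and the sample data are hypothetical as well.

```javascript
// Hypothetical sketch: illustrates "shared" as a set intersection of field names.
// The input shape (arrays of { name, value } objects) is an assumption.
function sharedFieldNames(pages) {
  if (pages.length === 0) {
    return [];
  }

  // Reduce each page to the set of its unique field names.
  const [first, ...rest] = pages.map(
    (fields) => new Set(fields.map((field) => field.name))
  );

  // Keep only the names that appear on every other page, in sorted order.
  return [...first]
    .filter((name) => rest.every((names) => names.has(name)))
    .sort();
}

// Made-up sample data, not real output from any website.
const pageA = [{ name: "q", value: "" }, { name: "username", value: "" }];
const pageB = [{ name: "q", value: "" }, { name: "password", value: "" }];

console.log(sharedFieldNames([pageA, pageB]));
```

With this sample data only `q` appears on both pages, so it is the only name returned.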

extract-field-info.sh

Reads the files in the example/html/ folder.

In one terminal, start Jekyll as a web server.

$ cd example/html/
$ jekyll serve --watch

In a second terminal, run the extraction script.

$ cd example/
$ ./extract-field-info.sh "http://localhost:4000"

Look in example/output/ for the result.

extract-field-names.sh

Reads the files in the example/html/ folder.

$ cd example/
$ ./extract-field-names.sh

Look in example/output/html/ for the result.

See also

  • FormFieldInfo, a JavaScript plugin that collects information about the forms in a page; the PhantomJS script uses it.

License

Copyright (c) 2012, 2013, 2014, 2015, Joel Purra. All rights reserved.

When using extract-fields, comply with the MIT License. Please see the LICENSE file for details, and the MIT License on Wikipedia.