User:Inductiveload/HTML processing
Appearance
Useful tools:
- curl
- pup
- jq
HathiTrust
[edit]Extract links from table
[edit]curl https://catalog.hathitrust.org/Record/000505750 | pup '.viewability-table tbody td json{}' | jq '[.[] | {href: .children[0].children[0].href, text: .children[0].children[0].children[1].text, loc: .children[1].text}]'
produces array like:
[
{
"href": "https://hdl.handle.net/2027/mdp.39015080280129",
"text": "v.36 1957",
"loc": "University of Michigan"
},
...
]
JQ
[edit]To TSV
[edit]jq -r '.[] | [.href, .text] | @tsv'