Command Line XML Parsing (Ubuntu/Linux)

To read xml document with terminal we can use xmllint tool.

Lets say we have a file named 1.xml with the below content:

<?xml version="1.0" encoding="utf-8"?>
<item>
  <definition test="abc">v. to provide a place to sleep or live</definition>
  <definition>def2</definition>
  <word>accommodates</word>
</item>

Get elements:

xmllint --xpath "/item/definition" 1.xml

Output: <definition test="abc">v. to provide a place to sleep or live</definition><definition>def2</definition>

Get text of elements:

xmllint --xpath "//item/word/text()" 1.xml

Output: accommodates

Get attributes:

xmllint --xpath "/item/definition/@test" 1.xml

Output:  test="abc"

Select elements by its attribute:

xmllint --xpath "/item/definition[@test='abc']" 1.xml

Output: <definition test="abc">v. to provide a place to sleep or live</definition>


Search element by part of text:

xmllint --xpath "//*[contains(*,'sleep')]" 1.xml

Search element by part of attribute:

xmllint --xpath "//*[contains(@test,'bc')]" 1.xml

Parse HTML URL:

xmllint --html --xpath //div/a  "http://blog.mbirgin.com"

Parse HTML and ignore error messages: 

curl -s http://www.mbirgin.com | xmllint --html  --xpath '(//td[contains(@class,"lastpost") and contains(@class,"windowbg2")])[1]/text()'  2>/dev/null - 

Share Share Editor: editor | Posted: 2014/10/08 | Views: 64646

Comments

18 + 2 =
Home | Search | Contact | Terms
Editor