XPath: abbreviations
Part 1 covered the relationships between nodes (the axes); here we see how to use
the abbreviations available for them.
The body used is the same as before:
<A-TAG class="a-class">A-text
  <B-TAG class="b-class">B-text</B-TAG>
  <B-TAG id="1">B-text
    <C-TAG class="c-class" id="3">C-text</C-TAG>
    <C-TAG class="c-class" id="4">C-text
      <D-TAG>D-text</D-TAG>
    </C-TAG>
  </B-TAG>
  <B-TAG id="2">B-text
    <C-TAG id="5">C-text
      <D-TAG>D-text</D-TAG>
      <D-TAG id="6">D-text
        <E-TAG>E-text</E-TAG>
        <E-TAG>E-text
          <F-TAG>F-text</F-TAG>
        </E-TAG>
      </D-TAG>
    </C-TAG>
  </B-TAG>
  <B-TAG attr="b-attr">B-text
    <C-TAG>C-text
      <D-TAG>D-text</D-TAG>
    </C-TAG>
  </B-TAG>
</A-TAG>
Abbreviations
1. ‘child::’
is implied, so the expressions:
selector.xpath('//c-tag')
selector.xpath('//b-tag/c-tag')
are equivalent to
selector.xpath('//child::c-tag')
selector.xpath('//child::b-tag/child::c-tag')
>>> for s in selector.xpath('//child::b-tag/child::c-tag'): print s
...
<Selector xpath='//child::b-tag/child::c-tag' data=u'<c-tag class="c-class" id="3">C-text</c-'>
<Selector xpath='//child::b-tag/child::c-tag' data=u'<c-tag class="c-class" id="4">C-text\n '>
<Selector xpath='//child::b-tag/child::c-tag' data=u'<c-tag id="5">C-text\n <d-tag>'>
<Selector xpath='//child::b-tag/child::c-tag' data=u'<c-tag>C-text\n <d-tag>D-text<'>
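The equivalence can also be checked outside the Scrapy shell. The following is a minimal sketch, assuming lxml (the parsing library that Scrapy selectors are built on) is installed. BODY is a trimmed-down copy of the body above and ids is a hypothetical helper introduced only for this example.

```python
from lxml import etree

# Trimmed-down copy of the body used in this post (an assumption, to keep the sketch short).
BODY = """
<A-TAG class="a-class">A-text
  <B-TAG id="1">B-text
    <C-TAG class="c-class" id="3">C-text</C-TAG>
    <C-TAG class="c-class" id="4">C-text <D-TAG>D-text</D-TAG></C-TAG>
  </B-TAG>
  <B-TAG id="2">B-text
    <C-TAG id="5">C-text</C-TAG>
  </B-TAG>
</A-TAG>
"""

# The HTML parser lowercases tag names, which is why the XPath uses b-tag and c-tag.
tree = etree.HTML(BODY)


def ids(expression):
    """Hypothetical helper: id attribute of every element matched by the expression."""
    return [element.get("id") for element in tree.xpath(expression)]


abbreviated = ids("//b-tag/c-tag")
explicit = ids("//child::b-tag/child::c-tag")

print(abbreviated)  # ['3', '4', '5']
print(explicit)     # ['3', '4', '5']
```

Both expressions select the same c-tag elements, in document order.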
2. ‘@’
is the same as writing ‘attribute::’
>>> selector.xpath('//child::b-tag[@id=1]')
[<Selector xpath='//child::b-tag[@id=1]' data=u'<b-tag id="1">B-text\n <c-tag clas'>]
>>> selector.xpath('//child::b-tag[attribute::id=1]')
[<Selector xpath='//child::b-tag[attribute::id=1]' data=u'<b-tag id="1">B-text\n <c-tag clas'>]
Mind the paths!
Inside square brackets, @id=1 is a predicate that filters the b-tag elements. After a /
instead, @id is a location step that selects the attribute itself rather than the tag
that carries it, and appending =1 turns the whole expression into a comparison. The result
is then a boolean, which Scrapy represents as ‘1’ for true and ‘0’ for false:
>>> selector.xpath('//child::b-tag/@id=1')
[<Selector xpath='//child::b-tag/@id=1' data=u'1'>]
The difference is clear if we look at the value of data, or
if we use Scrapy's extract() method:
>>> selector.xpath('//child::b-tag[@id=1]').extract()
[u'<b-tag id="1">B-text\n <c-tag class="c-class" id="3">C-text</c-tag><c-tag class="c-class"...tag>']
>>> selector.xpath('//child::b-tag/@id=1').extract()
[u'1']
In the first case we get the b-tag with id=1. In the second case the ‘1’ is not the text
of the id attribute: it is the boolean true, meaning that at least one b-tag has an id
equal to 1. To get the value of the attribute itself, use ‘//b-tag[@id=1]/@id’.
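The three forms (predicate, attribute step, comparison) can be told apart with a short sketch. It assumes lxml is installed and uses a trimmed-down BODY. With plain lxml the comparison comes back as a Python bool, which makes the difference easier to see than in the Scrapy output.

```python
from lxml import etree

# Trimmed-down copy of the body used in this post (an assumption, to keep the sketch short).
BODY = """
<A-TAG class="a-class">A-text
  <B-TAG class="b-class">B-text</B-TAG>
  <B-TAG id="1">B-text</B-TAG>
  <B-TAG id="2">B-text</B-TAG>
</A-TAG>
"""

tree = etree.HTML(BODY)

# Predicate: keeps the b-tag elements whose id equals 1.
by_abbreviation = tree.xpath("//b-tag[@id=1]")
by_axis = tree.xpath("//b-tag[attribute::id=1]")

# Attribute step: selects the id attributes themselves (lxml returns their values).
attribute_values = tree.xpath("//b-tag/@id")

# Comparison: the whole expression is a boolean test, not a selection.
comparison = tree.xpath("//b-tag/@id=1")

print([element.get("id") for element in by_abbreviation])  # ['1']
print([element.get("id") for element in by_axis])          # ['1']
print(attribute_values)                                     # ['1', '2']
print(comparison)                                           # True
```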
3. ‘.’
is short for ‘self::node()’; for example, ‘.//b-tag’ is equivalent to ‘self::node()//child::b-tag’:
>>> selector.xpath('.//b-tag[@id=1]')
[<Selector xpath='.//b-tag[@id=1]' data=u'<b-tag id="1">B-text\n <c-tag clas'>]
>>> selector.xpath('self::node()//child::b-tag[@id=1]')
[<Selector xpath='self::node()//child::b-tag[@id=1]' data=u'<b-tag id="1">B-text\n <c-tag clas'>]
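The leading ‘.’ matters most when the expression is evaluated from a node other than the document root. A minimal sketch, assuming lxml is installed, with a trimmed-down BODY and a hypothetical ids helper:

```python
from lxml import etree

# Trimmed-down copy of the body used in this post (an assumption, to keep the sketch short).
BODY = """
<A-TAG class="a-class">A-text
  <B-TAG id="1">B-text
    <C-TAG class="c-class" id="3">C-text</C-TAG>
    <C-TAG class="c-class" id="4">C-text</C-TAG>
  </B-TAG>
  <B-TAG id="2">B-text
    <C-TAG id="5">C-text</C-TAG>
  </B-TAG>
</A-TAG>
"""

tree = etree.HTML(BODY)
first_b = tree.xpath("//b-tag[@id=1]")[0]


def ids(node, expression):
    """Hypothetical helper: ids of the elements matched when starting from node."""
    return [element.get("id") for element in node.xpath(expression)]


# Both forms start from the context node (first_b) and look only below it.
relative = ids(first_b, ".//c-tag")
explicit = ids(first_b, "self::node()//child::c-tag")

# Without the leading dot the path is absolute and searches the whole document.
absolute = ids(first_b, "//c-tag")

print(relative)  # ['3', '4']
print(explicit)  # ['3', '4']
print(absolute)  # ['3', '4', '5']
```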
4. ‘..’
is short for ‘parent::node()’; for example, ‘//b-tag/..’ is equivalent to ‘//child::b-tag/parent::node()’:
>>> selector.xpath('//b-tag/..')
[<Selector xpath='//b-tag/..' data=u'<a-tag class="a-class">A-text\n <b-tag'>]
>>> selector.xpath('//child::b-tag[@id=1]/parent::node()')
[<Selector xpath='//child::b-tag[@id=1]/parent::node()' data=u'<a-tag class="a-class">A-text\n <b-tag'>]
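Note that the result is a set of nodes: several siblings share one parent, and that parent is returned only once. A minimal sketch, assuming lxml is installed and using a trimmed-down BODY:

```python
from lxml import etree

# Trimmed-down copy of the body used in this post (an assumption, to keep the sketch short).
BODY = """
<A-TAG class="a-class">A-text
  <B-TAG id="1">B-text
    <C-TAG class="c-class" id="3">C-text</C-TAG>
    <C-TAG class="c-class" id="4">C-text</C-TAG>
  </B-TAG>
  <B-TAG id="2">B-text
    <C-TAG id="5">C-text</C-TAG>
  </B-TAG>
</A-TAG>
"""

tree = etree.HTML(BODY)

# Two b-tag elements, one shared parent: a-tag appears once in the result.
parents_of_b_short = [e.tag for e in tree.xpath("//b-tag/..")]
parents_of_b_long = [e.tag for e in tree.xpath("//child::b-tag/parent::node()")]

# Three c-tag elements, two distinct parents.
parents_of_c = [e.get("id") for e in tree.xpath("//c-tag/..")]

print(parents_of_b_short)  # ['a-tag']
print(parents_of_b_long)   # ['a-tag']
print(parents_of_c)        # ['1', '2']
```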
5. ‘//’
is short for ‘/descendant-or-self::node()/’; for example, ‘//b-tag//c-tag’ is equivalent to
‘//child::b-tag/descendant-or-self::node()/child::c-tag’:
>>> for s in selector.xpath('/html/body/a-tag/child::b-tag/descendant-or-self::node()/child::c-tag'): print s
...
<Selector xpath='/html/body/a-tag/child::b-tag/descendant-or-self::node()/child::c-tag' data=u'<c-tag class="c-class" id="3">C-text</c-'>
<Selector xpath='/html/body/a-tag/child::b-tag/descendant-or-self::node()/child::c-tag' data=u'<c-tag class="c-class" id="4">C-text\n '>
<Selector xpath='/html/body/a-tag/child::b-tag/descendant-or-self::node()/child::c-tag' data=u'<c-tag id="5">C-text\n <d-tag>'>
<Selector xpath='/html/body/a-tag/child::b-tag/descendant-or-self::node()/child::c-tag' data=u'<c-tag>C-text\n <d-tag>D-text<'>
>>> for s in selector.xpath('//b-tag//c-tag'): print s
...
<Selector xpath='//b-tag//c-tag' data=u'<c-tag class="c-class" id="3">C-text</c-'>
<Selector xpath='//b-tag//c-tag' data=u'<c-tag class="c-class" id="4">C-text\n '>
<Selector xpath='//b-tag//c-tag' data=u'<c-tag id="5">C-text\n <d-tag>'>
<Selector xpath='//b-tag//c-tag' data=u'<c-tag>C-text\n <d-tag>D-text<'>
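In the body of this post every c-tag is a direct child of a b-tag, so ‘//’ and ‘/’ happen to give the same result for c-tag. The difference shows up with a deeper tag such as d-tag. A minimal sketch, assuming lxml is installed, with a trimmed-down BODY and a hypothetical ids helper:

```python
from lxml import etree

# Trimmed-down copy of the body used in this post (an assumption, to keep the sketch short).
BODY = """
<A-TAG class="a-class">A-text
  <B-TAG id="2">B-text
    <C-TAG id="5">C-text
      <D-TAG>D-text</D-TAG>
      <D-TAG id="6">D-text</D-TAG>
    </C-TAG>
  </B-TAG>
</A-TAG>
"""

tree = etree.HTML(BODY)


def ids(expression):
    """Hypothetical helper: id attribute (or None) of every matched element."""
    return [element.get("id") for element in tree.xpath(expression)]


# Abbreviated form and two progressively more explicit spellings of it.
abbreviated = ids("//b-tag//d-tag")
partly_expanded = ids("//child::b-tag/descendant-or-self::node()/child::d-tag")
fully_expanded = ids(
    "/descendant-or-self::node()/child::b-tag"
    "/descendant-or-self::node()/child::d-tag"
)

# A single slash only looks at direct children, and no d-tag is a child of b-tag.
direct_children = ids("//b-tag/d-tag")

print(abbreviated)      # [None, '6']
print(partly_expanded)  # [None, '6']
print(fully_expanded)   # [None, '6']
print(direct_children)  # []
```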
Useful links:
part 1. xpath: Notes
part 3. xpath: Generic functions
scrapy
xpath syntax