XPath: abbreviations

10 October 2017

We covered node relationships in part 1; now let's see how to use
the abbreviations that XPath makes available.

The body used here is the same as before:

<A-TAG class="a-class">A-text
    <B-TAG class="b-class">B-text</B-TAG>
    <B-TAG id="1">B-text
        <C-TAG class="c-class" id="3">C-text</C-TAG>
        <C-TAG class="c-class" id="4">C-text
            <D-TAG>D-text</D-TAG> 
        </C-TAG>
    </B-TAG>
    <B-TAG id="2">B-text
        <C-TAG id="5">C-text
            <D-TAG>D-text</D-TAG>
            <D-TAG id="6">D-text
                <E-TAG>E-text</E-TAG>
                <E-TAG>E-text
                    <F-TAG>F-text</F-TAG>
                </E-TAG>
            </D-TAG>
        </C-TAG>
    </B-TAG>
    <B-TAG attr="b-attr">B-text
        <C-TAG>C-text
            <D-TAG>D-text</D-TAG>
        </C-TAG>
    </B-TAG>
</A-TAG>

Abbreviations

1. ‘child::’

is implied, so the expressions:

selector.xpath('//c-tag')
selector.xpath('//b-tag/c-tag')

are equivalent to

selector.xpath('//child::c-tag')
selector.xpath('//child::b-tag/child::c-tag')

>>> for s in selector.xpath('//child::b-tag/child::c-tag'): print s
...
<Selector xpath='//child::b-tag/child::c-tag' data=u'<c-tag class="c-class" id="3">C-text</c-'>
<Selector xpath='//child::b-tag/child::c-tag' data=u'<c-tag class="c-class" id="4">C-text\n   '>
<Selector xpath='//child::b-tag/child::c-tag' data=u'<c-tag id="5">C-text\n            <d-tag>'>
<Selector xpath='//child::b-tag/child::c-tag' data=u'<c-tag>C-text\n            <d-tag>D-text<'>

2. ‘@’

is the same as writing ‘attribute::’

>>> selector.xpath('//child::b-tag[@id=1]')
[<Selector xpath='//child::b-tag[@id=1]' data=u'<b-tag id="1">B-text\n        <c-tag clas'>]
>>> selector.xpath('//child::b-tag[attribute::id=1]')
[<Selector xpath='//child::b-tag[attribute::id=1]' data=u'<b-tag id="1">B-text\n        <c-tag clas'>]

Watch out for paths!
If I used / instead of the square brackets, the result would no longer be the tag that
carries the attribute, because the path would now step into the attribute itself:

>>> selector.xpath('//child::b-tag/@id=1')
[<Selector xpath='//child::b-tag/@id=1' data=u'1'>]

The difference is clear if we look at the value of data, or
if we use Scrapy's extract() method:

>>> selector.xpath('//child::b-tag[@id=1]').extract()
[u'<b-tag id="1">B-text\n        <c-tag class="c-class" id="3">C-text</c-tag><c-tag class="c-class"...tag>']
>>> selector.xpath('//child::b-tag/@id=1').extract()
[u'1']

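In the first case we get the b-tag with id=1. In the second case the trailing ‘=1’ turns the
whole path into a comparison, so the result is not the text of the id attribute: it is the
boolean true (at least one b-tag has an id equal to 1), which Scrapy renders as ‘1’.
To select the attribute itself, drop the comparison (‘//b-tag/@id’) or keep the predicate
and add the attribute step (‘//b-tag[@id=1]/@id’).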

3. ‘.’

is the same as ‘self::node()’; for example ‘.//b-tag’ is equivalent to ‘self::node()//child::b-tag’:

>>> selector.xpath('.//b-tag[@id=1]')
[<Selector xpath='.//b-tag[@id=1]' data=u'<b-tag id="1">B-text\n        <c-tag clas'>]
>>> selector.xpath('self::node()//child::b-tag[@id=1]')
[<Selector xpath='self::node()//child::b-tag[@id=1]' data=u'<b-tag id="1">B-text\n        <c-tag clas'>]

4. ‘..’

is the same as ‘parent::node()’; for example ‘//b-tag/..’ is equivalent to ‘//child::b-tag/parent::node()’:

>>> selector.xpath('//b-tag/..')
[<Selector xpath='//b-tag/..' data=u'<a-tag class="a-class">A-text\n    <b-tag'>]
>>> selector.xpath('//child::b-tag[@id=1]/parent::node()')
[<Selector xpath='//child::b-tag[@id=1]/parent::node()' data=u'<a-tag class="a-class">A-text\n    <b-tag'>]

5. ‘//’

is the same as ‘/descendant-or-self::node()/’; for example ‘//b-tag//c-tag’ is equivalent to
‘//child::b-tag/descendant-or-self::node()/child::c-tag’:

>>> for s in selector.xpath('/html/body/a-tag/child::b-tag/descendant-or-self::node()/child::c-tag'): print s
...
<Selector xpath='/html/body/a-tag/child::b-tag/descendant-or-self::node()/child::c-tag' data=u'<c-tag class="c-class" id="3">C-text</c-'>
<Selector xpath='/html/body/a-tag/child::b-tag/descendant-or-self::node()/child::c-tag' data=u'<c-tag class="c-class" id="4">C-text\n   '>
<Selector xpath='/html/body/a-tag/child::b-tag/descendant-or-self::node()/child::c-tag' data=u'<c-tag id="5">C-text\n            <d-tag>'>
<Selector xpath='/html/body/a-tag/child::b-tag/descendant-or-self::node()/child::c-tag' data=u'<c-tag>C-text\n            <d-tag>D-text<'>
>>> for s in selector.xpath('//b-tag//c-tag'): print s
...
<Selector xpath='//b-tag//c-tag' data=u'<c-tag class="c-class" id="3">C-text</c-'>
<Selector xpath='//b-tag//c-tag' data=u'<c-tag class="c-class" id="4">C-text\n   '>
<Selector xpath='//b-tag//c-tag' data=u'<c-tag id="5">C-text\n            <d-tag>'>
<Selector xpath='//b-tag//c-tag' data=u'<c-tag>C-text\n            <d-tag>D-text<'>

Useful links:
Part 1. XPath: Notes
Part 3. XPath: Generic functions
scrapy
xpath syntax
