PowerShell quick and dirty: create a list of CSFD movies

Welcome to another quick & dirty example of how you can use PowerShell. Once a month my friend and colleague @nikdo sends an email about some movies from the CSFD site. The email looks something like this (just a part of it):


He writes the HTML code by hand, one movie after another. With more movies this gets boooring, don't you agree?
Rather than doing it by hand, you can create a PowerShell script. You won't be proud of it, but it will do its job.

function Process-Csfd {
    [CmdletBinding()]
    param($name, $dir)   # $dir is used only by download-csfdImage (see "Saving to disk")

    # Creates a WebClient that pretends to be a browser, so that the site doesn't refuse the request
    function downloader {
        $cli = New-Object Net.WebClient
        $cli.Headers = New-Object Net.WebHeaderCollection
        $cli.Headers.Add('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; cs; rv:1.9.1.7) ' +
            'Gecko/20091221 Firefox/3.5.7 (.NET CLR 3.5.30729)')
        $cli.Headers.Add('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')
        $cli.Headers.Add('Accept-Language', 'cs,en-us;q=0.7,en;q=0.3')
        # No 'Accept-Encoding: gzip,deflate': WebClient does not decompress responses,
        # so DownloadString would return compressed bytes if the server honored it
        $cli.Encoding = [Text.Encoding]::UTF8
        $cli
    }

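    # Downloads the movie page and reads the poster URL from the background attribute of a table
    function get-csfdImageUrl {
        param($url)
        $cli = (downloader)
        Write-Verbose "Downloading $url"
        # convert2xml comes from the previous post about the dictionary (HTML -> XML)
        $x = convert2xml ($cli.DownloadString($url))
        @{ImageUrl = (Select-Xml $x -XPath '//table[@background]').Node.Background}
    }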

    # Searches CSFD and returns the link to the movie whose title matches exactly
    function search-csfdMovie {
        param($what)
        Add-Type -AssemblyName 'System.Web'
        $cli = (downloader)
        $url = "http://www.csfd.cz/hledani-filmu-hercu-reziseru-ve-filmove-databazi/?search=" `
            + [System.Web.HttpUtility]::UrlEncode($what)
        Write-Verbose "Searching csfd, url: $url"
        $x = convert2xml ($cli.DownloadString($url))
        # @() guarantees an array, so .Count is reliable for 0, 1 or more matches
        $link = @(Select-Xml $x -XPath `
            '/html/body/table/tr[2]/td/table/tr/td/table/tr/td/table[2]/tr[2]/td/table/tr/td' |
            select -exp Node |
            select -exp a |
            ? { $_.'#text' -eq $what })
        if ($link.Count -eq 0) {
            throw "Movie '$what' not found"
        }
        if ($link.Count -gt 1) {
            Write-Host "there are more links"
            $link | % { Write-Host $_.OuterXml }
            throw "More than one movie matches '$what'"
        }
        @{Link = 'http://www.csfd.cz' + $link[0].href}
    }
    $where = search-csfdMovie $name
    $imgInfo = get-csfdImageUrl $where.Link
    # Use $name here: $what exists only inside search-csfdMovie
    New-Object PSObject -Property ($where + $imgInfo + @{Name = $name})
}

'Herkules 3D', 'Appaloosa', 'Bílý drak', 'Nerozhodný drak', 'Delfín Filip', `
'Wyatt Earp', 'Tropická bouře', 'Naprosto osvětleno', 'Maratónec' |
    % { Process-Csfd -name $_ -dir 'c:\temp\dir' -Verbose } |
    % -Begin { $r = '<html><head><meta charset="utf-8" /></head><body>' } `
      -Process { $r += '<a href="{0}" style="display:block;float:left;margin:3px;" title="{1}">
          <img src="{2}" style="width:121px;height:180px;border-style:none" /></a>' `
        -f $_.Link, $_.Name, $_.ImageUrl } `
      -End { $r + '</body></html>' } |
    Set-Content c:\temp\csfd.html -Encoding UTF8   # UTF8 keeps the Czech titles intact

It's very straightforward.

  • Function downloader creates a System.Net.WebClient that mimics a browser so that the site doesn't refuse the download.
  • Function search-csfdMovie tries to find a movie and returns a link to it (e.g. http://www.csfd.cz/film/8083-maratonec-marathon-man/). Note that it uses the function convert2xml from the previous post about the dictionary; it just converts HTML to XML.
  • Function get-csfdImageUrl takes the link returned by search-csfdMovie and grabs the URL of the movie's poster (e.g. http://img.csfd.cz/posters/0/8083.jpg).
  • Finally, I pipe the movie names to the Process-Csfd function, format the output as HTML and store it in an HTML file.
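The formatting step can also be pulled out into its own function, which makes it easy to try without touching the network. The sketch below is only an illustration: ConvertTo-CsfdHtml is a hypothetical name, the sample object is built by hand from the example links above, and it assumes PowerShell 3.0 or later ([PSCustomObject], [System.Net.WebUtility]). Unlike the original pipeline, it HTML-encodes the movie name before putting it into the title attribute.

```powershell
# Hypothetical helper, not part of the original script: turns objects with
# Link, Name and ImageUrl properties (what Process-Csfd returns) into the gallery page.
function ConvertTo-CsfdHtml {
    param([Parameter(ValueFromPipeline = $true)] $movie)
    begin {
        $parts = New-Object System.Collections.Generic.List[string]
    }
    process {
        # Encode the name so quotes or ampersands cannot break the title attribute
        $title  = [System.Net.WebUtility]::HtmlEncode($movie.Name)
        $anchor = '<a href="{0}" style="display:block;float:left;margin:3px;" title="{1}">' -f $movie.Link, $title
        $img    = '<img src="{0}" style="width:121px;height:180px;border-style:none" /></a>' -f $movie.ImageUrl
        $parts.Add($anchor + $img)
    }
    end {
        # One string for the whole page, like the -End block of the original pipeline
        '<html><body>' + ($parts -join "`n") + '</body></html>'
    }
}

# Sample object built by hand from the example links in the text
$sample = [PSCustomObject]@{
    Link     = 'http://www.csfd.cz/film/8083-maratonec-marathon-man/'
    Name     = 'Maratónec'
    ImageUrl = 'http://img.csfd.cz/posters/0/8083.jpg'
}
$html = $sample | ConvertTo-CsfdHtml
```

To produce the file, pipe the objects from Process-Csfd into ConvertTo-CsfdHtml and then into Set-Content, as in the original pipeline.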

Can you see the XPath in search-csfdMovie? It is the main reason why this approach is quick & dirty. The page has no HTML classes or ids that could be used to build the XPath, so it has to follow the exact position of the element in the markup. Maybe they just wanted to stop tools like the one we have just created.
When the site changes its markup, the script will fail. But don't worry, it is very easy to repair ;).
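One way to make the lookup less dependent on the page layout is to select links by what they point to rather than by their position. The sketch below assumes that movie links on the search page contain /film/ in their href (as http://www.csfd.cz/film/8083-maratonec-marathon-man/ does); Find-CsfdMovieLink and the sample markup are made up for illustration, and the real page may look different.

```powershell
# Hypothetical helper: finds <a> elements that point to a movie page and whose
# text equals the title, instead of walking a fixed /html/body/table/tr[2]/... path.
function Find-CsfdMovieLink {
    param([xml]$page, [string]$title)
    # The title is compared in PowerShell, not inside the XPath string,
    # so apostrophes in movie names need no XPath escaping
    Select-Xml -Xml $page -XPath "//a[contains(@href, '/film/')]" |
        ForEach-Object { $_.Node } |
        Where-Object { $_.InnerText -eq $title }
}

# Made-up markup standing in for the search page after convert2xml
[xml]$page = @'
<html><body>
  <table><tr><td>
    <a href="/film/8083-maratonec-marathon-man/">Maratónec</a>
    <a href="/film/99999-other-movie/">Other movie</a>
    <a href="/uzivatel/123/">Some user</a>
  </td></tr></table>
</body></html>
'@

# Wrap the call in @() to always get an array, whatever the number of matches
$found = @(Find-CsfdMovieLink $page 'Maratónec')
```

The same Count checks as in search-csfdMovie (zero matches, more than one match) still apply to the result.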

Saving to disk

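If you need to store the poster images in a directory, use the following function in place of get-csfdImageUrl and alter the code accordingly. Define it inside Process-Csfd, because it relies on downloader and on the $dir parameter.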

function download-csfdImage {
    param($url)
    $cli = (downloader)
    Write-Verbose "Downloading $url"
    $x = convert2xml ($cli.DownloadString($url))
    $image = (Select-Xml $x -XPath '//table[@background]').Node.Background
    Write-Verbose "Downloading $image"
    # Join-Path inserts the separator, so 'c:\temp\dir' does not become 'c:\temp\dir8083.jpg'
    $path = Join-Path $dir ([IO.Path]::GetFileName($image))
    $cli.DownloadFile($image, $path)
    # ImageUrl is the poster URL, not the URL of the movie page
    @{Path = $path; ImageUrl = $image}
}
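Two details matter when choosing where to save a poster: WebClient.DownloadFile fails when the target directory does not exist, and the file name has to be joined to the directory with a path separator. The sketch below handles both; Get-CsfdLocalPath is a hypothetical helper, not part of the original post, and it assumes poster URLs shaped like http://img.csfd.cz/posters/0/8083.jpg.

```powershell
# Hypothetical helper: returns the local path for a poster URL and creates the
# target directory first, because WebClient.DownloadFile fails if it is missing.
function Get-CsfdLocalPath {
    param(
        [string]$dir,        # target directory, e.g. 'c:\temp\dir'
        [string]$imageUrl    # poster URL, e.g. 'http://img.csfd.cz/posters/0/8083.jpg'
    )
    if (-not (Test-Path -LiteralPath $dir)) {
        # -Force also creates missing parent directories
        New-Item -ItemType Directory -Path $dir -Force | Out-Null
    }
    # Take the file name from the URL path only, so a query string cannot end up in it
    $fileName = [IO.Path]::GetFileName(([Uri]$imageUrl).AbsolutePath)
    [IO.Path]::Combine($dir, $fileName)
}

# Usage with a temporary directory instead of c:\temp\dir
$target = [IO.Path]::Combine([IO.Path]::GetTempPath(), 'csfd-posters')
$posterPath = Get-CsfdLocalPath $target 'http://img.csfd.cz/posters/0/8083.jpg'
```

In download-csfdImage, the returned path would replace the $path line before calling $cli.DownloadFile.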

And that's all. Keep in mind that for really simple tasks it is much more efficient to do them by hand than to write a script. However, isn't it fun?

Meta: 2010-02-02, Pepa

Tags: PowerShell