NAME
Web::PageMeta - get page open-graph / meta data

SYNOPSIS
use Web::PageMeta;
my $page = Web::PageMeta->new(url => "https://www.apa.at/");
say $page->title;
say $page->image;

Async fetching of previews and images:

use Web::PageMeta;
my @urls = qw(
    https://www.apa.at/
    http://www.diepresse.at/
    https://metacpan.org/
    https://github.com/
);
my @page_views = map { Web::PageMeta->new( url => $_ ) } @urls;
Future->wait_all( map { $_->fetch_image_data_ft } @page_views )->get;
foreach my $pv (@page_views) {
    say 'title> '.$pv->title;
    say 'img_size> '.length($pv->image_data);
}

# alternatively, instead of Future->wait_all()
use Future::Utils qw( fmap_void );
fmap_void(
    sub { return $_[0]->fetch_image_data_ft },
    foreach    => [@page_views],
    concurrent => 3
)->get;

DESCRIPTION
Gets open-graph (and other) web page meta data. Can be used in both
synchronous and async code.

If a download returns any HTTP status code other than 200, an
HTTP::Exception is thrown.
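
Since failed downloads surface as exceptions, callers will usually want
to trap them. The following is a minimal sketch, not part of the module:
the helper names "describe_error" and "title_or_nothing" are
hypothetical, and it assumes the thrown object provides the "code" and
"status_message" methods that HTTP::Exception documents.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use Web::PageMeta;

# Hypothetical helper: turn whatever was thrown into a short message.
# Objects offering code() and status_message() are treated as HTTP
# exceptions; anything else is simply stringified.
sub describe_error {
    my ($err) = @_;
    if ( blessed($err) && $err->can('code') && $err->can('status_message') ) {
        return sprintf( 'HTTP %d %s', $err->code, $err->status_message );
    }
    return "unexpected error: $err";
}

# Hypothetical helper: return the page title, or nothing (after a
# warning) when fetching the page failed.
sub title_or_nothing {
    my ($url) = @_;
    my $title;
    my $ok = eval { $title = Web::PageMeta->new( url => $url )->title; 1 };
    if ( !$ok ) {
        warn describe_error( $@ || 'unknown error' ), "\n";
        return;
    }
    return $title;
}
```

A caller could then write "my $title = title_or_nothing($url)" and
fall back to a placeholder when nothing is returned.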

ACCESSORS
new
Constructor, only "url" is required.

url
HTTP url to fetch data from.

user_agent
User-Agent header to use for http requests. Default is one from Chrome
89.0.4389.90.

extra_headers
HashRef with extra http request headers.

cookie_jar
Accepts an optional HTTP::Cookies compatible object that must provide a
"get_cookies()" method. If set, HTTP cookie headers are sent with each
request.
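
The request-related options can be combined in one constructor call. A
small sketch under assumptions: the User-Agent string and the
Accept-Language header are made-up illustration values, and an empty
HTTP::Cookies jar stands in for whatever cookie store an application
already has.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use Web::PageMeta;

# An empty in-memory jar; a real application would load or fill it first.
my $jar = HTTP::Cookies->new;

my $page = Web::PageMeta->new(
    url           => 'https://metacpan.org/',
    user_agent    => 'MyPreviewBot/1.0',              # illustration value
    extra_headers => { 'Accept-Language' => 'en' },   # illustration value
    cookie_jar    => $jar,
);
```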

title
Returns title of the page.

description
Returns description of the page.

image
Returns image location of the page.

image_data
Returns image binary data of "image" link.

Throws a 404 exception if there is no "image" link.

page_meta
Returns hash ref with all open-graph data.

extra_scraper
Web::Scraper object to fetch the image, title or description from a
location other than the default one.

use Web::Scraper;
use Web::PageMeta;
my $escraper = scraper {
    process_first '.slider .camera_wrap div', 'image' => '@data-src';
};
my $wmeta = Web::PageMeta->new(
    url           => 'https://www.meon.eu/',
    extra_scraper => $escraper,
);

page_body_hdr
Returns an array ref with the page "[$body, $headers]". Can be useful
for post-processing or special/additional data extraction.
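
As an illustration of such post-processing, the sketch below pulls a
canonical link out of the raw body. The helpers "canonical_link" and
"canonical_for_url" are hypothetical, and the regular expression is
deliberately naive (it only matches when "rel" precedes "href"); a real
extraction would better use an HTML parser.

```perl
use strict;
use warnings;
use Web::PageMeta;

# Hypothetical helper: return the href of <link rel="canonical" ...>
# found in an HTML string, or undef when there is none.
sub canonical_link {
    my ($html) = @_;
    return $html =~ m{<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']}i
        ? $1
        : undef;
}

# Hypothetical helper: fetch a page and apply canonical_link() to its body.
sub canonical_for_url {
    my ($url) = @_;
    my ( $body, $headers ) = @{ Web::PageMeta->new( url => $url )->page_body_hdr };
    return canonical_link($body);
}
```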

fetch_page_meta_ft
Returns a Future object for fetching page meta data. See "ASYNC USE".
On done, the "page_meta" hash is returned.

fetch_image_data_ft
Returns a Future object for fetching image data. See "ASYNC USE". On
done, the "image_data" scalar is returned.

fetch_page_body_hdr_ft
Returns a Future object for fetching page content and headers. See
"ASYNC USE". On done, the "page_body_hdr" array ref is returned.

ASYNC USE
To run multiple page meta data or image HTTP requests in parallel, or
to use the module in async programs, call "fetch_page_meta_ft" or
"fetch_image_data_ft", each of which returns a Future object. See
"SYNOPSIS" or t/02_async.t for sample use.
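
The SYNOPSIS covers image data; the same pattern should apply to page
meta data. A minimal sketch, assuming that the future's done value is
the "page_meta" hash ref as described under "fetch_page_meta_ft" and
that failures arrive as the thrown exception. It only lists the keys
found and makes no assumption about which ones exist.

```perl
use strict;
use warnings;
use feature 'say';
use Future;
use Web::PageMeta;

my @pages = map { Web::PageMeta->new( url => $_ ) }
    qw( https://metacpan.org/ https://github.com/ );

# One future per page; on_done/on_fail return the future they are
# called on, so the map collects the futures themselves.
my @futures = map {
    my $page = $_;
    $page->fetch_page_meta_ft
        ->on_done( sub {
            my ($meta) = @_;
            say $page->url, ': ', join( ', ', sort keys %$meta );
        } )
        ->on_fail( sub {
            my ($err) = @_;
            warn $page->url, " failed: $err\n";
        } );
} @pages;

# wait_all completes once every future is done or failed; it does not
# itself fail when an individual request fails.
Future->wait_all(@futures)->get;
```

Using "wait_all" here means one failing page does not abort the others;
"Future->needs_all" would instead fail as soon as any request fails.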

SEE ALSO
<https://ogp.me/>

AUTHOR
Jozef Kutej, "<jkutej at cpan.org>"

LICENSE AND COPYRIGHT
Copyright 2021 jkutej@cpan.org

This program is free software; you can redistribute it and/or modify it
under the terms of either: the GNU General Public License as published
by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.