tscrape_update: sync improvements from sfeed_update - tscrape - twitter scraper
git clone git://git.codemadness.org/tscrape
---
commit 51995d6fc4760fadac68650bb82773b9bf9eae79
parent db47c97bea3370886d011a2c950ead2551cf3fbc
Author: Hiltjo Posthuma <[email protected]>
Date: Fri, 2 Aug 2019 18:33:10 +0200

tscrape_update: sync improvements from sfeed_update

- change order of functions in script and documentation to match the execution
  order.
- improve a comment about the parallel processing behaviour (performance stall).
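
The stall mentioned in the second item can be illustrated with a minimal sketch. Only the `[ $((curjobs % maxjobs)) -eq 0 ] && wait` check is taken from the script; `job`, `run_batched`, the `name:seconds` argument format and the log file are hypothetical names made up for this illustration, and the `sleep` calls stand in for fetching a feed.

```shell
#!/bin/sh
# Hypothetical stand-in for fetching one feed: sleep, then log the name.
job() {
    sleep "$2"
    echo "$1" >> "$log"
}

# run_batched maxjobs name:seconds...
# Start jobs in groups of maxjobs. Before the first job of each group a
# plain `wait` blocks until ALL running jobs have exited, so one slow job
# holds back the next group even if the other slots are already free.
run_batched() {
    maxjobs="$1"
    shift
    curjobs=0
    for spec in "$@"; do
        [ $((curjobs % maxjobs)) -eq 0 ] && wait
        curjobs=$((curjobs + 1))
        job "${spec%%:*}" "${spec#*:}" &
    done
    wait
}

log=$(mktemp)
run_batched 2 slow:1 a:0 b:0 c:1
cat "$log"
rm -f "$log"
```

With two jobs per group, `b` could in principle start as soon as `a` exits, but it is only started after `slow` has finished as well, so the log should read `a`, `slow`, `b`, `c`. The logic needs nothing beyond `wait` from POSIX sh, which appears to be the portability trade-off the comment refers to.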
Diffstat:
M README | 4 ++--
M tscrape_update | 16 ++++++++--------
2 files changed, 10 insertions(+), 10 deletions(-)
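
The execution order that the commit message refers to (merge first, then order) can be sketched on made-up data as follows. The `merge` and `order` bodies follow the diff, with two assumptions: the field separator is a TAB (the README describes the format as TAB-separated), and fields other than the timestamp (1), id (5) and retweetid (8) are filled with placeholder values, since their meaning is not shown here.

```shell
#!/bin/sh
# Assumption: fields are TAB-separated, as the README describes.
tab=$(printf '\t')

# merge(name, oldfile, newfile): unique sort by id (field 5), retweetid (field 8).
merge() {
    sort -t "$tab" -u -k5,5 -k8,8 "$2" "$3" 2>/dev/null
}

# order(name): sort by timestamp (field 1), numeric, descending.
order() {
    sort -t "$tab" -k1rn,1
}

old=$(mktemp)
new=$(mktemp)
# Made-up rows: timestamp, 3 placeholders, id, 2 placeholders, empty retweetid.
printf '100\tx\tx\tx\tid1\tx\tx\t\n' > "$old"
printf '300\tx\tx\tx\tid3\tx\tx\t\n' >> "$old"
printf '300\tx\tx\tx\tid3\tx\tx\t\n' > "$new"
printf '200\tx\tx\tx\tid2\tx\tx\t\n' >> "$new"

# The item with id3 occurs in both files; merge keeps it once, then order
# puts the newest item first.
merge "example" "$old" "$new" | order "example" | cut -f 1,5
rm -f "$old" "$new"
```

Merging first removes the duplicate by id, and ordering afterwards restores the newest-first view, which is why the functions read more naturally in that sequence.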
---
diff --git a/README b/README
@@ -6,8 +6,8 @@ Twitter feed HTML scraper.
It scrapes HTML from stdin and outputs it to a TAB-separated format that can be
easier parsed with various (UNIX) tools. There are formatting programs included
to convert this TAB-separated format to various other formats. There are also
-some programs and scripts included to import and export OPML and to update,
-sort, filter and merge feed items.
+some programs and scripts included to import and export OPML and to fetch,
+filter, merge and order items.
Build and install
diff --git a/tscrape_update b/tscrape_update
@@ -50,23 +50,23 @@ filter() {
cat
}
-# order by timestamp (descending).
-# order(name)
-order() {
- sort -t ' ' -k1rn,1
-}
-
# merge raw files: unique sort by id, retweetid.
# merge(name, oldfile, newfile)
merge() {
sort -t ' ' -u -k5,5 -k8,8 "$2" "$3" 2>/dev/null
}
+# order by timestamp (descending).
+# order(name)
+order() {
+ sort -t ' ' -k1rn,1
+}
+
# fetch and parse feed.
# feed(name, feedurl)
feed() {
- # wait until ${maxjobs} are finished: throughput using this logic is
- # non-optimal, but it is simple and portable.
+ # wait until ${maxjobs} are finished: will stall the queue if an item
+ # is slow, but it is portable.
[ ${signo} -ne 0 ] && return
[ $((curjobs % maxjobs)) -eq 0 ] && wait
[ ${signo} -ne 0 ] && return