Tree, treeFilt 1.0
------------------
--You Must Have Python For This One--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are two trees!
Well yes, I didn't know that and I found out just before
uploading (phew!).
Anyway, the original tree by Steve Baker <[email protected]>
should be better since it's an ELF executable.
This one has a Python source (treeFilt) which you may like to change
(or just read to see how it works), and it can process ls-lR.gz files too!
A lot of very useful options such as depth, total under branches,
files under dirs, subdirs total size etc are here too.
If you want to keep this one and the original, then rename tree (which is
a silly little bash script) to another name and place it with the
executables.
Of course the original tree is the standard here, and I have no intention
to claim authorship!
The two programs differ in the options, the presentation and the
speed. That tree doesn't show all files; instead it concentrates on
directories and directory sizes, and it has the old MS-DOS \-- thing.
Big Deal!
To see what it does, do a 'tree -st /etc'
Gee!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now what?
---------
I see you've just unpacked & untarred.
So now all that remains is to move tree.1 to /usr/man/man1 and 'tree' and
'treeFilt' to /usr/bin or wherever you keep file utilities. And we're ready
to roll. Try a 'tree /etc' for example (I insist).

What does it Do?
----------------
You know a Unix directory tree is a jungle. And sometimes we don't only
care about locating a file; it's also useful to have an 'idea' of how
things are organized. That is the Tree. And tree shows the Tree in a nice way.
The main work is done by treeFilt (this is the Python script).
TreeFilt actually works as a filter: it takes as input a recursive
directory listing (in the form of ls -lR) and outputs the tree-like
structure. I don't know what the result will be if a different kind of
file is piped to treeFilt (an IndexError is very possible), so since it's
pointless to do that, don't complain about bugs if you do!
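
To give an idea of the input side, here is a rough sketch in modern Python
(not the real treeFilt code, which targets Python 1.3). It assumes the usual
'ls -l' layout of nine whitespace-separated fields with the size in the
fifth; the name parse_listing is made up for this example. Lines it does
not understand are simply skipped.

```python
def parse_listing(lines):
    """Group the non-directory entries of an `ls -lR` style listing.

    Returns {directory_path: [(entry_name, size_in_bytes), ...]}.
    """
    dirs = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        fields = line.split(None, 8)
        if len(fields) == 9 and fields[4].isdigit():
            # Entry line: perms, links, owner, group, size, 3 date fields, name.
            if current is not None and not fields[0].startswith("d"):
                dirs[current].append((fields[8], int(fields[4])))
        elif line.endswith(":"):
            # Header line such as "./etc:" opens a new directory.
            current = line[:-1]
            dirs.setdefault(current, [])
    return dirs


SAMPLE = """\
.:
total 8
drwxr-xr-x 2 root root 4096 Jan  1 00:00 a
-rw-r--r-- 1 root root  120 Jan  1 00:00 f.txt

./a:
total 4
-rw-r--r-- 1 root root 30 Jan  1 00:00 g.txt
""".splitlines()

print(parse_listing(SAMPLE))
```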

I waited too Long and I was afraid it would erase everything and I CTRL^Ced?
----------------------------------------------------------------------------
TreeFilt may take some time to process a 'BIG' directory listing. That
happens because all dirs of the listing must FIRST be loaded before they
can be processed. So be patient! (Well, it's not that long actually. Think
how long the ls -lR itself would take anyway.)

Why is it a filter?
-------------------
I made treeFilt a filter (that means it can take the listing only through
redirection or a pipe, e.g. 'cat listing.file | treeFilt'), because this
way we can easily process compressed files, etc.
For instance 'zcat ls-lR.gz | treeFilt' was a good reason to make treeFilt
a filter!
Also, unix people love redirections, pipes and filters!
Now tree is a silly little BASH script; all it actually does is pipe the
output of ls -lAR to treeFilt (and pass the parameters along). Check it out.
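
The point of a filter is that the code only ever sees a stream of lines and
doesn't care who produced it. A small sketch of that idea in modern Python
(count_dir_headers is a made-up helper, not part of treeFilt): the same
function runs on a plain text stream and on a gzip-compressed one, which is
roughly what 'zcat' does for us on the command line.

```python
import gzip
import io


def count_dir_headers(stream):
    """Count the "path:" header lines in an ls -lR style text stream."""
    return sum(1 for line in stream if line.rstrip("\n").endswith(":"))


LISTING = ".:\ntotal 0\n\n./a:\ntotal 0\n"

# The same text, once as-is and once compressed and decompressed on the fly.
plain = io.StringIO(LISTING)
packed = gzip.open(io.BytesIO(gzip.compress(LISTING.encode())), "rt")

print(count_dir_headers(plain), count_dir_headers(packed))
```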

Options & Features?
-------------------
Now to the very interesting part.
TreeFilt understands four options 'f', 's', 't', and 'd'. First of all
let's leave out the 'd' thing and see the rest. For each directory
displayed the 'f' option additionally shows the number of files contained
in it. In a similar way the 's' option shows the total size of files in
each directory. The 't' option shows a grand total of the size of the
whole tree!
The 'd' option specifies how deep treeFilt should go when showing
branches. The depth number is the next command-line argument after
the 'd'. For instance, a valid invocation is
'treeFilt -fst -d 3'.
If depth and 'f' or 's' are given at the same time, the result depends
on 't'. Without 't', treeFilt prints the number of files or their sizes
normally, as if no depth limit was set. With 't', on the other hand,
treeFilt prints the TOTAL size or number of files BELOW each branch
where it stops.
For example, 'tree -fs -d 1 /' will show the size of the files contained
directly in the directory usr next to the +--usr branch, while
'tree -fst -d 1 /' will show the total size of ALL the subdirectories below
the +--usr branch. On my computer, for example, the first shows 168 kB and
2 files in /usr, while the second shows 131341 kB and 5930 files!
That is cool.
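
The difference between the two numbers can be sketched like this (modern
Python, made-up sample figures, and a made-up function name; how treeFilt
really computes its totals may well differ). Each directory has its "own"
count and size, and the 't' style total adds everything below it too.

```python
def subtree_totals(own):
    """own maps path -> (file_count, size) for files directly in that dir.

    Returns path -> (file_count, size) summed over the dir and everything
    below it. Quadratic, but short enough to read at a glance.
    """
    totals = {}
    for path in own:
        prefix = path.rstrip("/") + "/"
        count = size = 0
        for other, (n, s) in own.items():
            if other == path or other.startswith(prefix):
                count += n
                size += s
        totals[path] = (count, size)
    return totals


# Hypothetical figures, in kB.
OWN = {
    ".": (1, 10),
    "./etc": (4, 40),
    "./usr": (2, 168),
    "./usr/bin": (5, 1000),
    "./usr/lib": (3, 500),
}

totals = subtree_totals(OWN)
print(OWN["./usr"], totals["./usr"])
```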
If 'tree' is invoked, then the form must be 'tree [[options] path]'; that
is, the last argument (if there is one) is assumed to be the path to the
directory at the tree's top (that is up to the script and not to treeFilt).
So if one wants to tree the current directory AND use some of the options,
it should be 'tree -fst .'
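
Here is how that rule plays out, as a sketch in modern Python rather than
the actual bash script. Both function names are made up, treating "no
arguments" as the current directory is an assumption, and the option
parsing uses the standard getopt module, which may not be what treeFilt
does internally.

```python
import getopt


def split_tree_args(argv):
    """Apply 'tree [[options] path]': the last argument is taken as the path."""
    if not argv:
        return [], "."  # assumption: no arguments means the current directory
    return argv[:-1], argv[-1]


def parse_options(option_args):
    """Parse treeFilt-style options: -f, -s, -t and -d <depth>."""
    opts, _ = getopt.getopt(option_args, "fstd:")
    settings = {"f": False, "s": False, "t": False, "depth": None}
    for flag, value in opts:
        if flag == "-d":
            settings["depth"] = int(value)
        else:
            settings[flag[1]] = True
    return settings


options, path = split_tree_args(["-fst", "-d", "1", "/"])
print(parse_options(options), path)

# Forgetting the path makes the options themselves look like the path:
print(split_tree_args(["-fst"]))
```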

One strange thing?
------------------
Yeah. I know. If one tries 'ls -lAR /usr/lib | treeFilt -d 2' he/she will
get nothing! That is because ls prints full-path directory names in that
case, so the depth starts at 2. The tree script doesn't have this problem
since it will 'cd' to the desired directory and THEN perform the 'ls -lAR'.
So bash scripts are better!
Take a look at the script (the tree one) to get an idea of what is
going on. And if one wants to write a better version of the script
remember: "you have to cd before you tree"
But seriously folks, I'd rather cd in the script than fix that in
treeFilt (it's a looot more than a simple cd).
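
A little sketch of why the numbers come out that way (modern Python; that
depth is counted in path components is my guess, and neither function
exists in treeFilt). After a 'cd' the listing starts at '.', depth 0; with
a full path the top directory already sits at depth 2. The second helper
shows the kind of rebasing a fix inside treeFilt would need at the very
least.

```python
def path_depth(path):
    """Number of real components in a path; '.' and empty parts don't count."""
    return len([part for part in path.split("/") if part not in ("", ".")])


def relative_depths(paths):
    """Depths measured from the first path, which ls -R prints first."""
    base = path_depth(paths[0])
    return [path_depth(p) - base for p in paths]


print(path_depth("."), path_depth("./X11"))                # after a cd
print(path_depth("/usr/lib"), path_depth("/usr/lib/X11"))  # full paths
print(relative_depths(["/usr/lib", "/usr/lib/X11"]))
```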

Another strange thing in the code?
----------------------------------
In the '#get from the stdin' part, was it, eh?
Well, it seems that the same code is repeated twice with little
differences (hmm, smells like OOP here). It's the usual problem with
C code like this:
    for (;i<10000;i++) if (a==ADD) b[i]+=c; else b[i]-=c;
As we can see, the check a==ADD is done 10000 times while once would be
enough. But then again:
    if (a==ADD) for (;i<10000;i++) b[i]+=c; else for (;i<10000;i++) b[i]-=c;
has very much the _same_ code twice.
Now, as the commands inside the for() loop grow in number, the dilemma of
which implementation to use gets bigger (I have a graph somewhere :)
In our program now, it seems like things could be faster.
(we don't want to be like M$oft and spend CPU for fun)
The way I see it, we should have different 'for' loops depending on the
following : show_sizes?, show_numbers?, count_totals?, is_depth_limited?
So that is (mumble mumble...) 16 different but very _same_ 'for' loops
there to get the fastest results. I guess it's time for the OOP approach.
If any good person has any ideas about that, contact me now!
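
One possible way out, sketched in modern Python with made-up names (this is
an idea, not what treeFilt does): look at the flags once, build a function
that does only the selected work, and then run a single loop that calls it.
Only two of the four flags are shown. Whether this actually beats the
if-inside-the-loop version in Python is something I have not measured.

```python
def make_labeler(show_sizes, show_numbers):
    """Build a per-directory label function; flags are checked only here."""
    parts = []
    if show_numbers:
        parts.append(lambda d: "%d files" % d["files"])
    if show_sizes:
        parts.append(lambda d: "%d kB" % d["kB"])

    def label(d):
        return " ".join([d["name"]] + [part(d) for part in parts])

    return label


# Hypothetical sample data.
DIRS = [
    {"name": "usr", "files": 2, "kB": 168},
    {"name": "etc", "files": 4, "kB": 40},
]

label = make_labeler(show_sizes=True, show_numbers=True)
for d in DIRS:  # one loop, no flag checks inside it
    print(label(d))
```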

What else can I change?
-----------------------
One more thing that can be changed (well, a lot can, but this one is made
to be changed easily) is the distance between the tree's branches.
You may notice that the distance is 3 characters at first. That can be
changed by setting the BRANCHWIDTH variable in the Python script to a
number you prefer.
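
To picture what the width does, here is a sketch in modern Python. Only the
name BRANCHWIDTH comes from the script; branch_line and the exact drawing
are guesses based on the '+--usr' style of output, and the real tree may
draw extra connecting lines.

```python
BRANCHWIDTH = 3  # same default as mentioned above


def branch_line(name, depth, width=BRANCHWIDTH):
    """Draw one branch: indent by `width` per level, then '+' and dashes."""
    return " " * (width * depth) + "+" + "-" * (width - 1) + name


print(branch_line("usr", 0))
print(branch_line("bin", 1))
print(branch_line("bin", 1, width=5))
```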

2Do?
----
Yeah, well, one thing actually. Find a way to avoid showing directories
which are symbolic links, by an additional option. Symbolic link directories
are often big and hold little important info (especially if their target
is also included in the tree). Also, if there are bad links (infinite
recursion) we will have memory problems. So it would be nice to have an
option to avoid listing those.
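
A possible starting point, as a sketch in modern Python (nothing like this
exists in treeFilt yet, and the names are made up): in 'ls -l' output a
symbolic link line starts with 'l' and carries ' -> target' after the name,
so the linked paths can be collected and their headers skipped later.

```python
import re

# An entry line starts with a type character and nine permission characters.
ENTRY = re.compile(r"^[-dlbcps][-rwxsStT]{9}")


def symlink_paths(lines):
    """Return the set of paths that the listing marks as symbolic links."""
    links = set()
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if ENTRY.match(line):
            if line[0] == "l" and " -> " in line:
                name = line.split(None, 8)[-1].split(" -> ", 1)[0]
                if current is not None:
                    links.add(current + "/" + name)
        elif line.endswith(":"):
            current = line[:-1]  # header line such as "./lib:"
    return links


LISTING = """\
.:
total 0
drwxr-xr-x 3 root root 4096 Jan  1 00:00 lib

./lib:
total 0
lrwxrwxrwx 1 root root    8 Jan  1 00:00 X11 -> ../X11R6
-rw-r--r-- 1 root root   10 Jan  1 00:00 readme
""".splitlines()

print(symlink_paths(LISTING))
```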
Ah, yes! And the second thing is to make a binary version for those who
don't have Python (but not for those who have a different CPU :)

About & Copying?
----------------
The program is written in Python (tested in 1.3), and, well, I have to say
that Python is a pretty cool language alright. Check out www.python.org
for more.

Let's say I'll be happy if you find this useful (and use it).
I mean, people like Linus, the guys at X, and all the others who have
written HUGE programs give them away for free. I'm ashamed to even
have a 'COPYING' section here.

It's absolutely free. Anyway.
And you can do whatever you want with it (ahem).

For anything else contact me
at <[email protected]>

Stelios