Thanks Greg!
I was thinking at the data mining algorithm and processing level, and you, Matt, are
talking about the underlying Ptree implementation, right? My mistake.
That is, implementing 2-level Ptrees as binary blobs rather than in ASCII format.
Dr. Wettstein thinks there is a role for that and that's good enough for me.
It might be worth our time, though, to send some emails around discussing what advantages
that "might" bring to the table. Currently, the underlying implementation works pretty
well (pretty fast).
But where are the current bottlenecks (e.g., counting 1's? ANDing/ORing large dense
Ptrees? storing and referencing derived Ptrees?)? Are there reasons to expect that a
binary blob implementation of Ptrees would help in these application-level tasks (faster
processing)?
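To make the question concrete, here is a minimal Python sketch of the two operations named above (counting 1's and ANDing) on an ASCII '0'/'1' string versus a packed binary blob. The function names and the packing convention (most significant bit first, zero-padded to a whole byte) are assumptions for illustration only, not libPTree's actual format or API.

```python
def ascii_to_blob(s: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes (MSB first, zero-padded).

    The packing convention is an assumption made for this sketch.
    """
    if not s:
        return b""
    padded = s + "0" * (-len(s) % 8)
    return int(padded, 2).to_bytes(len(padded) // 8, "big")


def count_ones_ascii(s: str) -> int:
    """Count 1 bits by scanning characters."""
    return s.count("1")


def count_ones_blob(blob: bytes) -> int:
    """Count 1 bits in a packed blob."""
    return bin(int.from_bytes(blob, "big")).count("1")


def and_ascii(x: str, y: str) -> str:
    """Character-by-character AND of two equal-length '0'/'1' strings."""
    return "".join("1" if p == q == "1" else "0" for p, q in zip(x, y))


def and_blob(x: bytes, y: bytes) -> bytes:
    """AND of two equal-length packed blobs, done on whole integers."""
    n = int.from_bytes(x, "big") & int.from_bytes(y, "big")
    return n.to_bytes(len(x), "big")


s = "1011000111"
t = "1110000101"
print(count_ones_ascii(s), count_ones_blob(ascii_to_blob(s)))
print(and_blob(ascii_to_blob(s), ascii_to_blob(t)) == ascii_to_blob(and_ascii(s, t)))
```

Both forms give the same answers; the blob form uses one bit per position instead of one character. Whether that turns into faster processing depends on where the time actually goes, which is the open question here.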
An "off the wall" idea where binary blob format might be a winner: processing Ptrees
using active network devices (the network router or other network device actually
processes Ptrees from disparate sites "on the fly"). We looked at this briefly a long
time ago in the context of ATM network routers. It was very problematic; the main
sticking point was lining up the Ptree inputs correctly so that the result is correct
and meaningful.
Another "off the waller": As binary blobs, maybe one stores the entire 1-level
(uncompressed) Ptree and also the upper-level (level-1) Ptree that goes with it (assuming
1024-bit leaves for the moment). That way the level-1 Ptrees could be processed first
(store them all together, separate from the uncompressed blobs? or keep them in memory?).
Using the Ptree result of the level-1 processing, use offsets to process (retrieve?) just
those 1024-bit leaves that are necessary to finish the processing (some sort of
machine-level "AND/OR in memory with offsets"?). Or, if the level-1 blobs are reorganized
so that all basic Ptree offset=0 1024-bit words are stored contiguously, then all basic
Ptree offset=1024 words, etc., then after the level-1 processing, one retrieval would get
all the operand words needed to process the leaves that still need to be processed.
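A rough Python sketch of the "process level-1 first, then fetch only the needed leaves" idea for an AND. It assumes a level-1 bit is 1 exactly when its 1024-bit leaf contains at least one 1; that predicate, the use of Python integers as bit vectors, and the names build_level1 and and_two_level are all assumptions for illustration, not the existing implementation.

```python
LEAF_BITS = 1024  # leaf width assumed in the message above
LEAF_MASK = (1 << LEAF_BITS) - 1


def build_level1(bits: int, n_leaves: int) -> int:
    """Build the assumed level-1 summary: bit i is 1 iff leaf i has any 1 bit.

    Leaf i occupies bit positions [i * LEAF_BITS, (i + 1) * LEAF_BITS).
    """
    level1 = 0
    for i in range(n_leaves):
        if (bits >> (i * LEAF_BITS)) & LEAF_MASK:
            level1 |= 1 << i
    return level1


def and_two_level(a: int, a1: int, b: int, b1: int, n_leaves: int):
    """AND two uncompressed Ptrees, touching only leaves level-1 cannot rule out.

    Returns (result_bits, number_of_leaves_actually_processed).
    """
    candidates = a1 & b1  # a 0 here means the result leaf is all zeros
    result = 0
    touched = 0
    for i in range(n_leaves):
        if (candidates >> i) & 1:
            offset = i * LEAF_BITS
            leaf = (a >> offset) & (b >> offset) & LEAF_MASK
            result |= leaf << offset
            touched += 1
    return result, touched


# Four leaves: a has 1s in leaves 0 and 2, b has 1s in leaves 2 and 3.
a = (1 << 5) | (1 << (2 * LEAF_BITS + 7)) | (1 << (2 * LEAF_BITS + 9))
b = (1 << (2 * LEAF_BITS + 7)) | (1 << (3 * LEAF_BITS + 1))
a1, b1 = build_level1(a, 4), build_level1(b, 4)
result, touched = and_two_level(a, a1, b, b1, 4)
print(result == (a & b), touched)
```

In this example only leaf 2 survives the level-1 AND, so one leaf out of four is retrieved. An OR would need a different level-1 predicate (for instance, "leaf is all 1s") to allow the same kind of skipping.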
Let's keep in mind that the initial creation of the Ptrees (using, say, a binary blob
underlying implementation format) is a one-time process, so speed there is much, much
less of an issue.
> On Oct 7, 10:07am, "William Perrizo" wrote:
> } Subject: Re: Saturday research meetings
>> > Hello,
> Hi, hope the day is going well for everyone.
>> > My proposal revolves around developing a method that will allow people to
>> > create binary PTree blobs of 2 levels (maybe expandable to N levels).
>>
>> What does the word "binary" mean here? What does "blobs" imply?
>
>> > You can currently only create binary PTrees of 1 level.
>>
>> This seems to imply that by "binary" you mean "containing only 1's and 0's"?
>
> Just a quick comment and something to think about.
>
> The current libPTree library has the ability to read/write single
> level PTrees in ASCII format. This isn't as dense as binary but is way
> more portable and manipulable with standard tools. They also compress
> pretty well with bzip2/gzip etc.
>
> I do think there is a role for both binary and ASCII formats. So the
> challenge would be to implement multi-level formats for both.