Thanks Greg! I was thinking at the data mining algorithm and processing level, and you, Matt, are talking about the underlying Ptree implementation, right? My mistake. That is, implementing 2-level Ptrees as binary blobs rather than in ASCII format. Dr. Wettstein thinks there is a role for that, and that's good enough for me.

It might be worth our time, though, to send some emails around discussing what advantages that "might" bring to the table. Currently, the underlying implementation works pretty well (pretty fast). But where are the current bottlenecks (e.g., counting 1's? ANDing/ORing large dense Ptrees? storing and referencing derived Ptrees?)? Are there reasons to expect that a binary blob implementation of Ptrees would help in these application-level tasks (faster processing)?

An "off the wall" idea where binary blob format might be a winner: processing Ptrees using active network devices (the network router or other network device actually processes Ptrees from disparate sites "on the fly"). We looked at this briefly a long time ago in the context of ATM network routers. It was very problematic: the main sticking point was lining up the Ptree inputs correctly so that the result is correct and meaningful.

Another "off the waller": as binary blobs, maybe one stores the entire 1-level (uncompressed) Ptree and also the upper level (level-1) Ptree that goes with it (assuming 1024-bit leaves for the moment). That way the level-1 Ptrees could be processed first (store them all together, separate from the uncompressed blobs? or keep them in memory?). Then, using the Ptree result of the level-1 processing, use offsets to process (retrieve?) just those 1024-bit leaves that are necessary to finish the processing (some sort of machine-level "AND/OR in memory with offsets"???).
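To make the "process level 1 first, then fetch only the leaves you need" idea concrete, here is a rough sketch in Python (not libPTree code). Everything named here is an assumption for illustration: the three-state level-1 code (PURE0 / PURE1 / MIXED) is one possible meaning for the upper level, and fetch_a / fetch_b are hypothetical stand-ins for an offset-based read of a 1024-bit leaf from a blob.

```python
LEAF_BITS = 1024                  # leaf width assumed in the discussion
FULL = (1 << LEAF_BITS) - 1       # a leaf with every bit set

# Hypothetical level-1 codes, one per leaf (an assumption, not libPTree's).
PURE0, PURE1, MIXED = 0, 1, 2


def summarize(leaves):
    """Build a level-1 summary: one code per 1024-bit leaf."""
    return [PURE0 if leaf == 0 else PURE1 if leaf == FULL else MIXED
            for leaf in leaves]


def and_two_level(sum_a, fetch_a, sum_b, fetch_b):
    """AND two Ptrees, consulting the level-1 summaries first.

    fetch_a / fetch_b take a leaf index and return that leaf; they stand
    in for "retrieve the leaf at this offset". Returns the result leaves
    and the number of leaves that actually had to be fetched.
    """
    result, fetched = [], 0
    for i, (a, b) in enumerate(zip(sum_a, sum_b)):
        if a == PURE0 or b == PURE0:
            result.append(0)                  # decided at level 1
        elif a == PURE1 and b == PURE1:
            result.append(FULL)               # decided at level 1
        elif a == PURE1:
            result.append(fetch_b(i))         # result is just b's leaf
            fetched += 1
        elif b == PURE1:
            result.append(fetch_a(i))         # result is just a's leaf
            fetched += 1
        else:
            result.append(fetch_a(i) & fetch_b(i))
            fetched += 2
    return result, fetched


# Tiny example: four leaves per Ptree, held in plain lists.
a = [0, FULL, 0b1011, FULL]
b = [FULL, FULL, 0b0110, 0b0101]
result, fetched = and_two_level(summarize(a), a.__getitem__,
                                summarize(b), b.__getitem__)
print(fetched, "of", 2 * len(a), "leaves fetched")
```

In this toy case only 3 of the 8 leaves are touched. How much that buys in practice depends on how many leaves are pure, which is exactly the bottleneck question above.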
Or, if the level-1 blobs are reorganized so that all the basic Ptrees' offset=0 1024-bit words are stored contiguously, then all the offset=1024 words, etc., then after the level-1 processing a single retrieval would get all the operand words needed for each leaf that still needs to be processed.

Let's keep in mind that the initial creation of the Ptrees (using, say, a binary blob underlying implementation format) is a one-time process, so speed there is much, much less of an issue.

> On Oct 7, 10:07am, "William Perrizo" wrote:
> } Subject: Re: Saturday research meetings
>> > Hello,
> Hi, hope the day is going well for everyone.
>> > My proposal revolves around developing a method that will allow people to
>> > create binary PTree blobs of 2 levels(Maybe expandable to N levels).
>>
>> What does the word "binary" mean here? What does "blobs" imply?
>
>> > You can currently only create binary PTrees of 1 level.
>>
>> This seems to imply that by "binary" you mean "containing only 1's and 0's)?
>
> Just a quick comment and something to think about.
>
> The current libPTree library has the ability to read/write single
> level PTree's in ASCII format. This isn't as dense as binary but way
> more portable and manipulatable by standard tools. They also compress
> pretty well with bzip2/gzip etc.
>
> I do think there is a role for both binary and ASCII format's. So the
> challenge would be to implement multi-level format's for both.
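For anyone who wants to put numbers on the ASCII-versus-binary density point in the quoted note, a small Python sketch follows. The layouts are assumptions for illustration only (one '0'/'1' character per bit for ASCII, 8 bits packed per byte for binary); neither is claimed to be libPTree's actual file format, and the test data is random rather than a real Ptree.

```python
import bz2
import random


def to_ascii(bits):
    """One '0' or '1' character per bit (assumed ASCII layout)."""
    return "".join("1" if b else "0" for b in bits).encode("ascii")


def to_binary(bits):
    """Pack 8 bits per byte, most significant bit first (assumed layout).

    The last byte is zero-padded if len(bits) is not a multiple of 8.
    """
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def from_binary(blob, nbits):
    """Unpack the first nbits bits of a blob written by to_binary."""
    return [(blob[i // 8] >> (7 - i % 8)) & 1 for i in range(nbits)]


# Made-up data: 10,000 bits, roughly 10% ones.
random.seed(0)
bits = [1 if random.random() < 0.1 else 0 for _ in range(10_000)]

ascii_blob = to_ascii(bits)
binary_blob = to_binary(bits)

print("ASCII bytes: ", len(ascii_blob))
print("binary bytes:", len(binary_blob))
print("ASCII + bz2: ", len(bz2.compress(ascii_blob)))
print("binary + bz2:", len(bz2.compress(binary_blob)))
```

The raw sizes differ by a fixed factor of 8; how close the compressed sizes come to each other depends on the data, so it is worth running this on real Ptrees before drawing conclusions.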