[TRUEZIP-64]
Created: 02/Apr/11 Updated: 07/Apr/11 Resolved: 04/Apr/11
Status: Closed
Project: TrueZIP
Component/s: TrueZIP File*
Affects Version/s: TrueZIP 7.0 RC 1
Fix Version/s: None
Type: Improvement
Priority: Major
Reporter: qforce
Assignee: Christian Schlichtherle
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Description
Hi there,
From looking at the Javadoc of TFile, there seems to be no way to distinguish between the following cases:
Case A: Regular directory inside zip archive, e.g. C:\test.zip\folder.zip, where folder.zip is a regular folder.
Case B: Zip archive inside another zip archive, e.g. C:\test.zip\folder.zip, where folder.zip is a zip archive.
TFile.isDirectory(), TFile.isArchive() and TFile.isFile() return the same values in both cases, and I couldn't find any other methods to distinguish between A and B.
There are actually good reasons why one would want to do that:
Incremental file indexing: In case B, store the last-modified value of the zip archive, and only recurse into the zip archive if the stored last-modified value and the current last-modified value are different. This cannot be done in case A, however, because the last-modified value makes no sense for directories.
Filename filtering: Apply a filename filter while recursing through a file system, but the filter should only match files and zip archives (case B), not directories (case A).
This problem only exists for nested directories/archives, though, because in the case of non-nested files one could create a new java.io.File and call File.isFile() on that to distinguish between C:\folder.zip (regular directory) and C:\folder.zip (zip archive).
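For the non-nested case mentioned above, a minimal sketch using only java.io.File might look like the following. The class name PlainFileKind and its methods are hypothetical and exist only for illustration; the sketch does not touch TrueZIP and does not cover the nested case.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PlainFileKind {

    /** Classifies a non-nested path with java.io.File only (no TrueZIP involved). */
    public static String classify(File f) {
        if (f.isDirectory()) {
            return "directory"; // e.g. a folder that happens to be named folder.zip
        }
        if (f.isFile()) {
            return "file";      // e.g. a zip archive stored as an ordinary file
        }
        return "missing";
    }

    /** Creates a directory and a file in a temp folder, both with a .zip name, and classifies them. */
    public static String[] demo() {
        try {
            Path tmp = Files.createTempDirectory("plain-file-kind");
            Path dir = Files.createDirectory(tmp.resolve("folder.zip"));
            Path file = Files.createFile(tmp.resolve("archive.zip"));
            String[] result = { classify(dir.toFile()), classify(file.toFile()) };
            // Clean up the temporary entries again.
            Files.delete(dir);
            Files.delete(file);
            Files.delete(tmp);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling PlainFileKind.demo() should classify the first path as a directory and the second as a file, even though both names end in ".zip".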
Best regards
Tran Nam Quang
Comments
Comment by Christian Schlichtherle
[ 04/Apr/11 ]
You can use TFile.isEntry() to discriminate between cases A and B like this:
TFile file = new TFile("outer.zip/inner.zip");
if (file.isDirectory()) // true archive file, not a false positive?
    if (file.isArchive()) // is archive file path name, e.g. ends with ".zip"?
        if (file.isEntry()) // is enclosed in another archive file?
            bingo();
Please do not use JIRA for service requests. You can get support by using the mailing lists: http://truezip.java.net/mail-lists.html
Comment by qforce
[ 04/Apr/11 ]
I thought I was supposed to come here after reading your latest blog post, where you linked to this JIRA site. Anyway, sorry for opening this issue :/
Comment by Christian Schlichtherle
[ 04/Apr/11 ]
I am sorry, it's my fault. Yes, I want and need feedback and I cannot expect you or anybody else to differentiate between a service request and a feature request.
Comment by qforce
[ 04/Apr/11 ]
Hope you don't mind continuing this discussion...
As far as I can tell, the solution you suggested doesn't seem to resolve this issue completely. The Javadocs say that TFile.isArchive() only checks the filename, but it seems the zip file contains more accurate information than that. More specifically, the class java.util.zip.ZipEntry has a method isDirectory() that allows distinguishing between files and directories, and this seems more reliable than a filename-only check. I wonder if TrueZIP could somehow incorporate this information in its API?
Anyway, I don't mean to nitpick or anything, I think this is just a minor issue in an otherwise excellent library.
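The ZipEntry.isDirectory() behaviour referred to here can be illustrated with the standard java.util.zip classes alone. The sketch below is an assumption-laden illustration, not TrueZIP code: the class name ZipEntryKindDemo and its helper methods are made up, and the "inner.zip" entry holds placeholder bytes rather than a real archive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryKindDemo {

    /** Builds a small ZIP in memory with a directory entry and a file entry, both named like archives. */
    public static byte[] buildSampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // A trailing slash marks a directory entry (case A).
            zip.putNextEntry(new ZipEntry("folder.zip/"));
            zip.closeEntry();
            // No trailing slash: a regular file entry (case B); the content is a placeholder.
            zip.putNextEntry(new ZipEntry("inner.zip"));
            zip.write(new byte[] {1, 2, 3});
            zip.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Maps each entry name to the value of ZipEntry.isDirectory(), in archive order. */
    public static Map<String, Boolean> directoryFlags(byte[] zipBytes) {
        Map<String, Boolean> flags = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                flags.put(entry.getName(), entry.isDirectory());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return flags;
    }

    public static void main(String[] args) {
        // prints {folder.zip/=true, inner.zip=false}
        System.out.println(directoryFlags(buildSampleZip()));
    }
}
```

Note that ZipEntry.isDirectory() itself is decided by the trailing slash of the entry name, so it tells a directory entry from a file entry but says nothing about whether a file entry really contains a ZIP archive.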
Comment by Christian Schlichtherle
[ 04/Apr/11 ]
You're welcome.
The archive file system does indeed use the ZipEntry information and makes it available by calling TFile.isDirectory() etc. The major difference between TFile.isDirectory() and TFile.isArchive() and TFile.isEntry() is that the latter two methods do not access the file system in any way but instead only represent the result of scanning the path name by means of the TArchiveDetector.
The short example I've presented therefore makes sure that
(1) the file's true state is really a (virtual) directory, e.g. a real directory or an archive file,
(2) the leaf of its path name matches the pattern for an archive file, e.g. "inner.zip", and
(3) any parent directory of its path name matches the pattern for an archive file too, e.g. "outer.zip".
There remains only one special case you could not differentiate using this simple code:
"inner.zip" could actually be a directory entry within "outer.zip" rather than a regular ZIP file.
This is one case of a false positive archive file.
In case you need to differentiate this, you could use the following:
TFile file2 = new TFile(file.getParentFile(), file.getName(), TDefaultArchiveDetector.NULL);
This prevents the leaf name from being scanned for archive file name patterns. Now if file is a regular ZIP file, then file.isFile() is true. If file is a regular directory entry, then file.isDirectory() is true instead.
This may seem a bit odd, but it's a result of the odd use case, i.e. a typical application is not expected to identify false positive archive files, but rather rely on TrueZIP to do this - which it does accurately.
Comment by Christian Schlichtherle
[ 04/Apr/11 ]
Sorry, the paragraph before the last paragraph should read:
This prevents the leaf name from being scanned for archive file name patterns. Now if file is a regular ZIP file, then file2.isFile() is true. If file is a regular directory entry, then file2.isDirectory() is true instead.
Comment by qforce
[ 06/Apr/11 ]
TFile file2 = new TFile(file.getParentFile(), file.getName(), TDefaultArchiveDetector.NULL);
Ah, yes, this was the solution I was looking for.
It would be nice, though, if you could make something like this available as a method on TFile.
Not for me, but for other users who might run into the same problem in the future.
Comment by Christian Schlichtherle
[ 06/Apr/11 ]
You're welcome!
Please consider if you really need to differentiate this case. After all, the purpose of TrueZIP is to hide the fact that you're dealing with an archive file and enable you to treat it like a virtual directory.
Regards,
Christian
Comment by qforce
[ 07/Apr/11 ]
Well, I think that in 95% of the use cases there's no need to distinguish between directories and zip archives. However, with regard to the remaining 5%, I'd like to elaborate on the two use cases that I've already described in my original post:
Case 1) Implementing a filename filter, i.e. a filter that hides all files that match a certain filename pattern. Without the ability to distinguish between directories and zip archives, I'd have no choice but to implement a filename filter that only applies to regular files, but not to zip archives. Sure, I could make the filter behave this way, but it would probably come as a big surprise to the user. And yes, I could perhaps put a big bright-yellow warning in the manual about this oddity, but what user reads the manual anyway?
Case 2) Incremental file scanning/indexing: Theoretically, making directories and zip archives indistinguishable wouldn't make any difference to an indexing algorithm that recursively and incrementally walks through a directory tree in order to find and index modified files. However, there's a practical difference: Performance. It's simply faster to skip zip archives that haven't been modified, which could be detected by looking at the last-modified fields. Of course, this requires being able to distinguish between zip archives and directories, because the last-modified field of a directory is practically meaningless (which is why you can't skip them).
To summarize: The idea "no difference between zip archives and directories" is a beautiful one and works most of the time, but there are some corner cases where it breaks down. In case 1, it all comes down to user expectations: The average user expects a program to treat zip archives as files, not as directories. In case 2, it all comes down to performance.
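The skip decision described in case 2 can be sketched in plain Java as follows. This is only an illustration of the idea under stated assumptions: the class IncrementalScanPolicy and its method are hypothetical, and the caller is assumed to have already determined whether a path is a real zip archive or a directory (for example with the file2 approach shown earlier in this thread).

```java
import java.util.HashMap;
import java.util.Map;

public class IncrementalScanPolicy {

    // Last-modified value recorded for each archive path during the previous visit.
    private final Map<String, Long> lastSeen = new HashMap<>();

    /**
     * Decides whether the indexer needs to descend into the given path.
     *
     * @param path          path of the directory or archive
     * @param isArchiveFile true for a real zip archive, false for a plain directory
     * @param lastModified  current last-modified value of the path
     */
    public boolean shouldRecurse(String path, boolean isArchiveFile, long lastModified) {
        if (!isArchiveFile) {
            // A directory's timestamp says nothing reliable about its descendants,
            // so directories are always visited.
            return true;
        }
        // Record the new value and compare it with the one stored previously.
        Long previous = lastSeen.put(path, lastModified);
        return previous == null || previous.longValue() != lastModified;
    }
}
```

With this policy an unchanged archive is visited on the first scan and skipped on later scans until its last-modified value changes, while directories are never skipped.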
Generated at Wed Feb 10 04:07:30 UTC 2016 using JIRA 6.2.3#6260sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.