Friday 15 April 2011

java - Unzip files created with WinZIP with I18N file names?


People these days create their ZIP archives with WinZIP, which allows internationalized (i.e. non-Latin: Cyrillic, Greek, Chinese, you name it) file names. Unfortunately, trying to unpack such a file causes trouble: Unix unzip creates garbage-named files and directories such as "£ ¤ ¥ ¥ ì". Java and its jar command fail miserably on such archives.

Is there a common way to open such files programmatically, on Unix or in Java?

DotNetZip supports Unicode and arbitrary encodings for file names within zip files, for both reading and writing zip files.

It is a .NET library. To use it on Unix, you need Mono as a prerequisite.

If the zip file was created correctly by WinZip, in other words if it is "compliant", then you do not need to do anything special to specify the encoding when unpacking it. According to the zip spec, two encodings are supported for file names in zip files: UTF-8 and IBM 437. Which of the two is in use is recorded in the zip metadata, and any zip library can detect and use it. DotNetZip detects it automatically when reading a compliant zip, like so:

    // DotNetZip types live in the Ionic.Zip namespace
    using (var zip = ZipFile.Read("thearchive.zip"))
    {
        foreach (var e in zip)
        {
            // e.FileName is the name of the entry
            e.Extract("extraction-directory");
        }
    }
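The same detection can be observed outside .NET. The following is a minimal sketch in Python, not part of the original answer. It relies on the zip spec marking UTF-8 names with bit 11 (0x800) of each entry's general-purpose flags and on the standard `zipfile` module exposing those flags as `ZipInfo.flag_bits`. The helper names `name_encodings` and `build_sample` are made up for illustration.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the entry name is stored as UTF-8


def name_encodings(zip_bytes):
    """Map each entry name to the encoding its metadata declares."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            declared = "UTF-8" if info.flag_bits & UTF8_FLAG else "IBM 437"
            result[info.filename] = declared
    return result


def build_sample():
    """Build a small compliant archive in memory (illustrative data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("plain.txt", b"ascii name")        # written without the flag
        zf.writestr("привет.txt", b"cyrillic name")    # written with the UTF-8 flag
    return buf.getvalue()
```

Calling `name_encodings(build_sample())` reports "IBM 437" for `plain.txt` and "UTF-8" for `привет.txt`, because `zipfile` only sets the flag when a name is not plain ASCII.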

Things get tricky with the encoding used at write time, because there are tools that create "noncompliant" zips. WinRAR is one: it creates a zip whose file names are encoded in the default encoding in use on the computer. In Shanghai it will use CP 950, in Iceland something else, and in Lisbon something else again. The advantage of "noncompliance" here is that Windows Explorer will open such a zip and display the i18n-ized file names properly. In other words, "noncompliance" is often what people want, because Windows does not (yet?) support UTF-8 zip files.

(All of this concerns the encoding used for file names in the zip file's metadata, not the encoding of the files contained in the zip file.)
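To see where the garbage names come from, here is a small Python sketch, not from the original answer. It assumes the archiving tool stored the name as raw bytes in a local code page (CP 950 here) and that the reader falls back to IBM 437 because no UTF-8 flag is set. The function name `misread_as_cp437` is hypothetical.

```python
def misread_as_cp437(name, actual_codepage):
    """Return what a spec-following reader shows for a legacy-encoded name."""
    stored_bytes = name.encode(actual_codepage)  # what the tool wrote
    return stored_bytes.decode("cp437")          # what the reader assumes


garbled = misread_as_cp437("中文.txt", "cp950")
```

ASCII parts of the name, such as the `.txt` extension, survive unchanged, while every non-ASCII character turns into unrelated symbols.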

The zip spec does not allow an arbitrary text encoding to be specified in the zip metadata. In other words, if you use CP 950 when you create a zip, your extraction logic needs to "know" to use CP 950 when it reads the zip metadata: nothing in the zip file carries that information. In addition, of course, the zip library you use programmatically for extraction must support arbitrary encodings. As far as I know, Java's built-in zip library does not. DotNetZip does, like so:

    using (ZipFile zip = ZipFile.Read(zipToExtract, System.Text.Encoding.GetEncoding(950)))
    {
        foreach (ZipEntry e in zip)
        {
            e.Extract(extractDirectory);
        }
    }

DotNetZip is free, and open source.
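For readers on Unix without Mono, the same idea can be sketched in Python. This is an illustration under stated assumptions, not the DotNetZip API. The standard `zipfile` module decodes unflagged names as IBM 437, so the original bytes can be recovered by encoding back to `cp437` and decoding with the code page the archive is known to use. The names `decoded_names` and `make_legacy_sample` are hypothetical, and `make_legacy_sample` exists only to fabricate a noncompliant archive to try the function on.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the entry name is UTF-8


def decoded_names(archive, codepage):
    """Return entry names, decoding unflagged ones with `codepage`.

    `archive` is a path or a seekable binary file object.
    """
    names = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                names.append(info.filename)  # flagged UTF-8, already correct
            else:
                # zipfile decoded the stored bytes as cp437. Undo that, then
                # decode with the code page the archive was created with.
                names.append(info.filename.encode("cp437").decode(codepage))
    return names


def make_legacy_sample(name, codepage):
    """Fixture (hypothetical): a zip whose entry name is raw `codepage` bytes.

    zipfile only writes ASCII or UTF-8 names, so an ASCII placeholder of the
    same byte length is written first and then patched in the output bytes.
    """
    raw = name.encode(codepage)
    placeholder = ("X" * len(raw)).encode("ascii")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(placeholder.decode("ascii"), b"data")
    return io.BytesIO(buf.getvalue().replace(placeholder, raw))
```

As with the C# version, the caller has to supply the code page, because the archive does not record it. Python 3.11 and later also accept a `metadata_encoding` argument to `zipfile.ZipFile` for reading such archives.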

