Python zipfile.BadZipFile: Bad magic number for central directory
I'm trying to write a quick, simple Python script to search the XML inside *.fla (Flash) files. All it does is open each *.fla file in the project with zipfile.ZipFile, go through every file in the archive, and search for a specific term with a regex (dirty and simple). It's not the ideal solution to my problem, but it will do for now. I'm using CS6, and I know that *.fla files from CS5 onwards are basically zip archives with XML (and other files) inside; I have successfully extracted those files with 7zip on Windows. But for some reason, on half the files in my project, zipfile.ZipFile throws the exception 'Bad magic number for central directory' on creation. The call stack looks like this:
Your hex dump shows the start of the file, and the first 4 bytes are indeed a valid local header signature. The problem is that the Python code is complaining about the central directory header, which is near the end of the file.
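To see which of the two signatures is actually at fault, the relevant bytes can be checked by hand. The following is a minimal sketch, not what `zipfile` does internally in full: `inspect_zip_signatures` is a hypothetical helper, it assumes the whole file fits in memory, and it ignores ZIP64 archives and empty archives.

```python
import struct

LOCAL_SIG = b"PK\x03\x04"    # local file header, expected at the start of the file
CENTRAL_SIG = b"PK\x01\x02"  # central directory file header, near the end
EOCD_SIG = b"PK\x05\x06"     # end of central directory record, at the very end


def inspect_zip_signatures(data):
    """Report which ZIP signatures sit where the archive says they should."""
    report = {"local_header_ok": data[:4] == LOCAL_SIG,
              "eocd_found": False,
              "central_dir_ok": False}
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos == -1 or len(data) < eocd_pos + 22:
        return report
    report["eocd_found"] = True
    # Fixed 22-byte record: signature, 4 x uint16, 2 x uint32, uint16.
    (_sig, _disk, _cd_disk, _n_here, _n_total,
     cd_size, cd_offset, _comment_len) = struct.unpack(
        "<4s4H2LH", data[eocd_pos:eocd_pos + 22])
    # Bytes prepended to the archive shift every stored offset by this amount.
    shift = eocd_pos - cd_size - cd_offset
    start = cd_offset + shift
    report["cd_offset"] = cd_offset
    report["prepended_bytes"] = shift
    if start >= 0:
        report["central_dir_ok"] = data[start:start + 4] == CENTRAL_SIG
    return report
```

Calling it on the bytes of a failing file, for example `inspect_zip_signatures(open(path, "rb").read())`, shows whether the local header is fine while the central directory is not, which is the situation described above.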
  File "HOMEDIR/anaconda3/envs/q2019.7_fresh/lib/python3.6/zipfile.py", line 1226, in _RealGetContents
    raise BadZipFile("Bad magic number for central directory")
zipfile.BadZipFile: Bad magic number for central directory
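A script that scans many archives can at least report which files fail instead of stopping at the first traceback. This is a minimal sketch under stated assumptions: `search_archive` is a hypothetical helper, and the `.xml` filter and UTF-8 decoding are guesses about the archive contents, not something the question specifies.

```python
import re
import zipfile


def search_archive(path, pattern):
    """Return (member_name, match_count) pairs, or None if the archive is unreadable."""
    regex = re.compile(pattern)
    hits = []
    try:
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                if not name.lower().endswith(".xml"):
                    continue
                text = archive.read(name).decode("utf-8", errors="replace")
                count = len(regex.findall(text))
                if count:
                    hits.append((name, count))
    except zipfile.BadZipFile as exc:
        # Raised for a bad central directory as well as a bad local header.
        print(f"skipping {path}: {exc}")
        return None
    return hits
```

A `None` result marks the archives worth inspecting by hand or with another tool.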
The "magic number" comes from UNIX-type systems where the first few bytes of a file held a marker indicating the file type. Python puts a similar marker into its pyc files when it creates them. The Python interpreter ensures that this number is correct when loading the file.
Anything that damages this magic number will cause the error. This includes editing the .pyc file or trying to run a .pyc file from a different version of Python (usually later) than your interpreter.
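A quick way to confirm this diagnosis is to compare the first four bytes of the .pyc file with the magic number of the running interpreter. A minimal sketch: `pyc_magic_matches` is a hypothetical helper, while `importlib.util.MAGIC_NUMBER` is the standard-library constant for the current interpreter.

```python
import importlib.util


def pyc_magic_matches(pyc_path):
    """Return True if the .pyc header matches the running interpreter's magic number."""
    with open(pyc_path, "rb") as handle:
        header = handle.read(4)
    return header == importlib.util.MAGIC_NUMBER
```

A file for which this returns False was written by a different Python version or has been damaged.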
A bad magic number error appears whenever the header (the magic number) of the compiled bytecode is corrupted, or when you try to run a .pyc file from a different version of Python (usually later) than your interpreter. There are two solutions to rectify this runtime error:
The "ImportError: bad magic number" error is often reported for the random module on Ubuntu, and it comes from the byte prefix of the .pyc file. On Unix-oriented operating systems, the first few bytes of a file serve as an identifier or marker for the file type; these bytes are the magic number. The same applies to Python's .pyc files. If someone runs those files with a different Python interpreter, or modifies them, the interpreter raises a bad magic number error. This is the root cause of the error.
As an aside, the first word of all my 2.5.1 (r251:54863) pyc files is 62131; for 2.6.1 (r261:67517) it is 62161. The list of all magic numbers can be found in Python/import.c (current as at the time the answer was posted; it may have changed since then).
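The "first word" quoted here is the 16-bit little-endian integer at the start of the four-byte header; the remaining two bytes are a carriage return and a newline. A small sketch of the decoding, where `magic_word` is a hypothetical helper:

```python
import importlib.util
import struct


def magic_word(header):
    """Decode the 16-bit little-endian word at the start of a .pyc header."""
    (word,) = struct.unpack("<H", header[:2])
    return word


# Header bytes built from the Python 2.5.1 value quoted above.
assert magic_word(struct.pack("<H", 62131) + b"\r\n") == 62131

# The word for whichever interpreter runs this sketch.
print(magic_word(importlib.util.MAGIC_NUMBER))
```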
>> I need to save a fairly large set of arrays to disk. I have saved it using
>> numpy.savez, and the resulting file is around 11Gb (yes, I did say fairly
>> large ;D). When I try to load it using numpy.load, the zipfile module
>> complains about
>> BadZipfile: Bad magic number for file header
>>
>> I can't open it with the normal zip utility present on the system, but it
>> could be that it's barfing about files being larger than 2Gb.
>> Is there some file limit for npzs?
>
> Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
> not yet support the ZIP64 format.
>
>> Is there any way I can recover the data (I
>> guess I could try decompressing the file with 7z and extracting the
>> individual npy files?)
>
> Possibly. However, if the normal zip utility isn't working, 7z
> probably won't, either. Worth a try, though.
> I've had similar problems; my solution was to move to HDF5. There are
> two options for accessing and working with HDF files from python: h5py
> and pytables. Both packages have built in numpy support.
>
> Regards,
> Lafras
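On the 4GB limit mentioned in the thread: in current Python 3 releases, `zipfile.ZipFile` accepts an `allowZip64` argument, enabled by default, that lets it write the ZIP64 records large archives need. The sketch below is only a rough illustration. `needs_zip64` and `write_archive` are hypothetical helpers, the size check ignores header overhead, and the thread does not say how `numpy.savez` itself opens the archive.

```python
import zipfile

ZIP32_MAX_BYTES = 0xFFFFFFFF   # largest value a 32-bit size or offset field holds
ZIP32_MAX_ENTRIES = 0xFFFF     # largest value the 16-bit entry count holds


def needs_zip64(member_sizes):
    """Rough check: True if members of these sizes cannot fit a classic ZIP archive."""
    sizes = list(member_sizes)
    return len(sizes) > ZIP32_MAX_ENTRIES or sum(sizes) >= ZIP32_MAX_BYTES


def write_archive(target, members):
    """Write {name: bytes} to target, letting zipfile use ZIP64 records if needed."""
    with zipfile.ZipFile(target, "w", allowZip64=True) as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
```

An 11 GB set of arrays, as in the question, is well past the classic limit: `needs_zip64([11 * 1024**3])` is true.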
    try:
        if key == 'r':
            self._RealGetContents()
        elif key == 'w':
            # set the modified flag so central directory gets written
            # even if no files are added to the archive
            self._didModify = True
        elif key == 'a':
            try:
                # See if file is a zip file
                self._RealGetContents()
                # seek to start of directory and overwrite
                self.fp.seek(self.start_dir, 0)
            except BadZipfile:
                # file is not a zip file, just append
                self.fp.seek(0, 2)

                # set the modified flag so central directory gets written
                # even if no files are added to the archive
                self._didModify = True
        else:
            raise RuntimeError('Mode must be "r", "w" or "a"')
    except:
        fp = self.fp
        self.fp = None
        if not self._filePassed:
            fp.close()
        raise
    def _RealGetContents(self):
        """Read in the table of contents for the ZIP file."""
        fp = self.fp
        try:
            endrec = _EndRecData(fp)
        except IOError:
            raise BadZipfile("File is not a zip file")
        if not endrec:
            raise BadZipfile, "File is not a zip file"
        if self.debug > 1:
            print endrec
        size_cd = endrec[_ECD_SIZE]             # bytes in central directory
        offset_cd = endrec[_ECD_OFFSET]         # offset of central directory
        self._comment = endrec[_ECD_COMMENT]    # archive comment
        if self.debug > 2:
            inferred = concat + offset_cd
            print "given, inferred, offset", offset_cd, inferred, concat
        # self.start_dir: Position of start of central directory
        self.start_dir = offset_cd + concat
        fp.seek(self.start_dir, 0)
        data = fp.read(size_cd)
        fp = cStringIO.StringIO(data)
        total = 0
        while total < size_cd:
            centdir = fp.read(sizeCentralDir)
            if len(centdir) != sizeCentralDir:
                raise BadZipfile("Truncated central directory")
            centdir = struct.unpack(structCentralDir, centdir)
            if centdir[_CD_SIGNATURE] != stringCentralDir:
                raise BadZipfile("Bad magic number for central directory")
            if self.debug > 2:
                print centdir
            filename = fp.read(centdir[_CD_FILENAME_LENGTH])
            # Create ZipInfo instance to store file information
            x = ZipInfo(filename)
            x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
            x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
            x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
            (x.create_version, x.create_system, x.extract_version, x.reserved,
             x.flag_bits, x.compress_type, t, d,
             x.CRC, x.compress_size, x.file_size) = centdir[1:12]
            x.volume, x.internal_attr, x.external_attr = centdir[15:18]
            # Convert date/time code to (year, month, day, hour, min, sec)
            x._raw_time = t
            x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
                            t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )
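The `date_time` tuple at the end of this excerpt unpacks two 16-bit MS-DOS fields: the date holds the years since 1980, the month and the day, and the time holds hours, minutes and seconds divided by two. A standalone sketch of that bit layout follows; `decode_dos_datetime` mirrors the expression in the excerpt, while `encode_dos_datetime` is an inverse written for this illustration rather than copied from the library.

```python
def decode_dos_datetime(d, t):
    """Unpack MS-DOS date and time words into (year, month, day, hour, min, sec)."""
    return ((d >> 9) + 1980, (d >> 5) & 0xF, d & 0x1F,
            t >> 11, (t >> 5) & 0x3F, (t & 0x1F) * 2)


def encode_dos_datetime(year, month, day, hour, minute, second):
    """Pack a timestamp into MS-DOS date and time words (2-second resolution)."""
    d = (year - 1980) << 9 | month << 5 | day
    t = hour << 11 | minute << 5 | second // 2
    return d, t


d, t = encode_dos_datetime(2019, 7, 15, 13, 45, 30)
assert decode_dos_datetime(d, t) == (2019, 7, 15, 13, 45, 30)
```

Because seconds are stored halved, an odd number of seconds does not survive a round trip; it comes back rounded down to the even value.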
    # Skip the file header:
    fheader = zef_file.read(sizeFileHeader)
    if len(fheader) != sizeFileHeader:
        raise BadZipfile("Truncated file header")
    fheader = struct.unpack(structFileHeader, fheader)
    if fheader[_FH_SIGNATURE] != stringFileHeader:
        raise BadZipfile("Bad magic number for file header")
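The local file header checked here has a fixed 30-byte layout, so when only the central directory is damaged the members can sometimes be read by walking the local headers from the start of the file. The following is a rough recovery sketch, not how `zipfile` works: `recover_members` is a hypothetical helper that assumes the archive fits in memory, handles only stored and deflated members, and gives up on ZIP64 entries and on entries whose sizes are kept in a trailing data descriptor.

```python
import struct
import zlib

# signature, version, system, flags, method, time, date, CRC,
# compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<4s2B4HL2L2H")


def recover_members(data):
    """Yield (name, payload) for members readable from local headers alone."""
    pos = 0
    while (data[pos:pos + 4] == b"PK\x03\x04"
           and len(data) >= pos + LOCAL_HEADER.size):
        (_sig, _ver, _sys, flags, method, _t, _d, _crc,
         csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, pos)
        if flags & 0x08:
            break  # sizes live in a data descriptor; this sketch stops here
        name_start = pos + LOCAL_HEADER.size
        body_start = name_start + name_len + extra_len
        name = data[name_start:name_start + name_len].decode("utf-8", errors="replace")
        raw = data[body_start:body_start + csize]
        if method == 0:        # stored
            payload = raw
        elif method == 8:      # deflate, raw stream without zlib header
            payload = zlib.decompress(raw, -15)
        else:
            break              # other compression methods are not handled
        yield name, payload
        pos = body_start + csize
```

The loop ends at the first position that does not start with a local header signature, which in an intact archive is the beginning of the central directory.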
    try:
        if self.mode in ("w", "a") and self._didModify: # write ending records
            pos1 = self.fp.tell()
            for zinfo in self.filelist:         # write central directory
                dt = zinfo.date_time
                dosdate = (dt[0] - 1980)