Moving to filesystem breaks images due to extensions #5

Open
wreiske opened this issue Apr 13, 2019 · 7 comments
Labels
bug Something isn't working

Comments

@wreiske

wreiske commented Apr 13, 2019

Cool tool!
I just ran it to move files out of GridFS to the local file system. Some images worked, some didn't. I spent about half an hour comparing the working and the broken images, both in the rocketchat_uploads collection and on disk.

After uploading a fresh new file, I realized Rocket.Chat stores the file without an extension in the uploads folder. I tried renaming an image I knew was broken from image.png to just image, and it started working!

image

Here's a one-liner to fix it (run it from inside the uploads folder):

find . -type f -name '*.*' | while IFS= read -r f; do mv -- "$f" "${f%.*}"; done
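
For anyone who would rather preview the rename before touching the uploads folder, the same idea can be sketched in Python. This is only a sketch: `strip_extensions` and its `dry_run` flag are names made up here, not part of gridfsmigrate, and unlike the one-liner it skips a file when the extension-less name is already taken.

```python
import os


def strip_extensions(basedir, dry_run=True):
    """Rename 'name.ext' to 'name' for every file under basedir.

    Hypothetical helper (not part of gridfsmigrate). Returns the list of
    (old_path, new_path) pairs that were, or would be, renamed.
    """
    renamed = []
    planned = set()  # targets already claimed during this run
    for root, _dirs, files in os.walk(basedir):
        for name in files:
            stem, ext = os.path.splitext(name)
            if not ext:
                continue  # nothing to strip (splitext leaves dotfiles alone)
            old = os.path.join(root, name)
            new = os.path.join(root, stem)
            if os.path.exists(new) or new in planned:
                print(f"skip {old}: {new} is already taken")
                continue
            if not dry_run:
                os.rename(old, new)
            planned.add(new)
            renamed.append((old, new))
    return renamed
```

Calling it with the default `dry_run=True` only reports what would change; pass `dry_run=False` once the list looks right.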
@wreiske
Author

wreiske commented Apr 13, 2019

Images with .png, .jpg, etc. in the upload name seem to be broken now. Images that didn't have an extension in the name work.

Saw a diff between the broken and the working ones:
image

The following fixed it:

db.rocketchat_uploads.update({}, {$set: {extension: ""}}, {multi: true});
WriteResult({ "nMatched" : 4675, "nUpserted" : 0, "nModified" : 1062 })
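
nModified is lower than nMatched here because the empty filter matches every upload, while only the documents that actually changed are counted as modified. A narrower variant through pymongo might look like the sketch below. `blank_extensions` is a made-up helper name; it assumes `collection` is a pymongo collection and relies only on `update_many` and the `matched_count` / `modified_count` attributes of its result.

```python
def blank_extensions(collection):
    """Set extension to "" only on uploads that still carry a non-empty one.

    Hypothetical helper: `collection` is assumed to be the pymongo
    collection object for rocketchat_uploads.
    """
    result = collection.update_many(
        {"extension": {"$exists": True, "$ne": ""}},  # skip already-blank docs
        {"$set": {"extension": ""}},
    )
    return result.matched_count, result.modified_count
```

With this filter the two counts should normally agree, which makes an unexpected difference easier to spot.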

@arminfelder arminfelder added the bug Something isn't working label Apr 13, 2019
@jcatarino

Is there an estimate when this will be fixed? I'm going to use this script but I will try to wait until this gets sorted. Thank you.

@arminfelder
Owner

@jcatarino, I haven't had time to investigate/reproduce the issue yet, but it should be easy to fix: either we remove the MIME detection from the script, or we set "extension" to "" during the dbupdate.
If it is urgent, you can send me an email to discuss the matter.

@jcatarino

Thanks. I will postpone this migration until I can test and change the script myself, as I'm prioritising other tasks at the moment. If I eventually do it, I will open a PR with it.

@jcatarino

My migration went OK. I commented out these lines:

else:
    fileext = mime.guess_extension(res.content_type)

@wreiske
Author

wreiske commented May 6, 2019

I hacked this together really quick to fix some of the issues we were having after the migration... it may be a bit overkill, but it solved all the cases we were having....

# Method body inside class Migrator. Needs at the top of the file:
#   import os, import magic (python-magic), from mimetypes import MimeTypes
mimet = MimeTypes()
mime = magic.Magic(mime=True)  # create the detector once, not per upload
db = self.getdb()
uploadsCollection = db["rocketchat_uploads"]

uploads = uploadsCollection.find({}, no_cursor_timeout=True)
i = 0
for upload in uploads:
    fileext = ""
    filename = upload['_id']
    path = "/app/uploads/" + upload["_id"]
    split = upload["name"].rsplit('.', 1)
    if len(split) >= 2 and ' ' not in split[1]:
        # Prefer the extension from the original upload name
        print("Got split: %s" % split[1])
        fileext = split[1]
    elif (upload.get("identify") or {}).get("format"):
        # Otherwise use the image format stored in the upload document
        fileext = upload["identify"]["format"]
    elif os.path.isfile(path):
        # Last resort: sniff the MIME type and map it to an extension
        fileext = mimet.guess_extension(mime.from_file(path)) or ""

    # guess_extension() returns a leading dot (".png"); strip it so the
    # file name and the database field use the same form
    fileext = fileext.lstrip(".")
    if fileext == "jfif":
        fileext = "jpg"
    if fileext:
        filename = filename + "." + fileext

    i += 1
    print("%i. Renaming %s to %s (%s)" % (i, upload['_id'], filename, upload['name']))
    uploadsCollection.update_one({"_id": upload["_id"]}, {"$set": {"extension": fileext}})
    if os.path.isfile(path):
        os.rename(path, "/app/uploads/" + filename)

user578 pushed a commit to user578/gridfsmigrate that referenced this issue Jun 27, 2019
@sveinse
Contributor

sveinse commented Mar 8, 2021

I have also experienced problems using this tool to migrate from GridFS to FileSystem. Some of the images simply don't appear because the download URL doesn't work. I think I have worked out the issue.

The entries in the rocketchat_uploads collection have a field named extension. It turns out that the name of the file stored on disk has to match this field, otherwise the file cannot be downloaded. I've forked the repo and created a feature branch where I have fixed it. When I run this, all my pictures and attachments are available after converting. Please see PR #17.
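
To see which uploads are affected before converting anything, the rule can be turned into a read-only check. This is a rough sketch and not the code from the PR: `expected_filename` and `find_missing` are hypothetical names, the `_id` + "." + `extension` naming is an assumption about how the stored name is built, and `uploads` is any iterable of documents from rocketchat_uploads.

```python
import os


def expected_filename(upload):
    """Name assumed on disk: '<_id>.<extension>', or just '<_id>'."""
    ext = upload.get("extension") or ""
    return f"{upload['_id']}.{ext}" if ext else upload["_id"]


def find_missing(uploads, basedir):
    """Return (_id, expected name) for uploads whose file is not on disk."""
    missing = []
    for upload in uploads:
        name = expected_filename(upload)
        if not os.path.isfile(os.path.join(basedir, name)):
            missing.append((upload["_id"], name))
    return missing
```

Nothing is renamed or written here, so it should be safe to run against a live file store.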

Doing this https://github.com/arminfelder/gridfsmigrate/blob/master/migrate.py#L111 does not seem to work, because it might set a suffix other than what the db expects to find.

As a side note, it seems (though I haven't confirmed it) that newer Rocket.Chat versions do not use any file suffix any more and only use the _id field for the filename. My file store has a lot of files both with and without a suffix. To satisfy my desire for a clean structure, I wrote this extra function, which renames all files with a suffix to the new no-suffix filename scheme and blanks the extension field in the database. After running it on my site, it seems to work too.

    # Put inside class Migrator
    def fixFilenames(self, collection, basedir):

        db = self.getdb()
        uploadsCollection = db[collection]

        uploads = uploadsCollection.find({}, no_cursor_timeout=True)
        i = 0
        for upload in uploads:

            fileext = upload.get('extension') or None
            fileid = upload['_id']

            if not upload['complete']:
                continue

            # Get the real filename by looking for the last path element in 'path',
            # if that file cannot be found, then we're using the ID.
            fnames = upload['path'].split('/')
            filename = fnames[-1]
            if not os.path.isfile(os.path.join(basedir, filename)):
                filename = fileid + '.' + fileext if fileext else fileid

            # Ensure the file is present
            fullfilename = os.path.join(basedir, filename)
            if not os.path.isfile(fullfilename):
                print(f"{filename}: No such file")
                #print(upload)
                continue

            # Does the file have a suffix?
            fsplit = filename.split('.')
            if len(fsplit) > 1 and fsplit[-1]:
                suffix = fsplit[-1]
            else:
                suffix = None

            if suffix != fileext:
                print(f"{filename}: Suffix mismatches database (expected {fileext})")
                #print(upload)
                continue

            # Skip files without suffix
            if not suffix:
                continue

            i += 1
            print(f"{i}. Renaming {filename} to {fileid}")

            # Rename file
            os.rename(fullfilename, os.path.join(basedir, fileid))

            # Update database
            uploadsCollection.update_one({"_id": upload["_id"]}, { "$set": { "extension": '' } })

    # Add this to the bottom of the file
    if args.command == "renamefiles":
        obj.fixFilenames("rocketchat_uploads", args.destination)
