Files are already on S3, how to get MediaCloud to recognize them?



  • I am currently using WPEngine and their largefs service, which puts the uploads directory on S3 in a way that is transparent to WordPress.

    What I would like to do now is install MediaCloud without having to migrate all the images to the cloud (they are already there, even though WordPress doesn't know it).

    Is there a way to tell MediaCloud to just use the images already in my S3 bucket? I've seen that you have a new CLI tool to migrate from the HumanMade S3 plugin and I was wondering how exactly that works and if I could maybe modify it to do what I need?
    I would assume MediaCloud would need to go through my whole media library and verify that the file exists in S3 and then write some metadata so it knows to serve that file from S3. Is that similar to what wp mediacloud migrateS3Uploads does?



  • @jasondiff

    The Import from Cloud Storage feature will do what you want. Just make sure to specify "Import Only" so it doesn't try to download anything. It will match up what's on S3 with what's in your Media Library (in normal situations anyway; I'm not sure how it behaves with WPEngine's largefs, tbh).



  • Great! I'll give that a try. Does the importFromCloud CLI command have the page/limit/offset parameters to do it in batches?



  • I'm trying the WP-CLI command on an install with 117,000 images. It runs for about half an hour and then PuTTY gives the error message "Software caused connection abort". I'm trying it from the WP backend now, but I don't think that will fare any better.



  • @jasondiff

    I think PuTTY is closing the connection due to inactivity on your end.

    I'd recommend running the command in a tmux or screen session so if you do get disconnected, it'll still be running when you return.

    https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/

    You could also try nohup, e.g. nohup wp mediacloud importFromCloud

    https://www.maketecheasier.com/nohup-and-uses/



  • @jasondiff

    nohup would actually be easiest I think.

    Once you start the command with nohup, it'll generate a log file called nohup.out. You can then watch the progress via tail:

    tail -f nohup.out
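
The nohup approach above can be sketched as a small helper that also sends output to a named log file instead of nohup.out. This is only a rough sketch: `run_detached`, `DETACHED_PID` and `import.log` are hypothetical names chosen for illustration, and the flags in the usage comment are the ones used elsewhere in this thread.

```shell
# Start a command under nohup in the background, writing stdout and stderr
# to the given log file. The PID of the background job is stored in
# DETACHED_PID (a hypothetical variable name) so it can be checked later.
run_detached() {
    log_file="$1"
    shift
    nohup "$@" > "$log_file" 2>&1 &
    DETACHED_PID=$!
}

# Usage on the server:
#   run_detached import.log wp mediacloud importFromCloud --import-only --skip-thumbnails
#   tail -f import.log
```

Redirecting explicitly means the log ends up in a known place regardless of the directory nohup would otherwise pick for nohup.out.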



  • So I'm running it via nohup but what should I be seeing in the output? nohup.out is empty.
    Actually when I ran wp mediacloud importFromCloud directly from the command line I got no output either. Should it be logging every media file as it goes?



  • @jasondiff

    Yes, but if you have 117,000 images it's going to take time for Media Cloud to assemble the list of files to import.

    You should also be able to see the progress via WordPress admin in the "Import from ..." page.

    What service provider are you using? S3?



  • @jasondiff

    Do you still see the command running via:

    ps aux | grep mediacloud

    ?
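
As an alternative to grepping the process list, here is a minimal sketch that checks a single process by PID. It assumes the PID was recorded when the import was started (for example from `$!`); `is_running` is a hypothetical helper name, not part of Media Cloud or WP-CLI.

```shell
# Return success if a process with the given PID still exists.
# `kill -0` delivers no signal; it only tests whether the PID can be signalled.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Usage, assuming $pid holds the PID saved at launch:
#   if is_running "$pid"; then echo "still running"; else echo "finished or killed"; fi
```

Unlike `ps aux | grep mediacloud`, this never matches the grep process itself, but it only works for a PID you already know.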



  • I do see it running:

    wpe-user      45 17.0  0.8 562760 125940 pts/0   S    02:31   0:10 php /usr/local/bin/wp mediacloud importFromCloud --import-only --skip-thumbnails
    

    I also figured out how to set PuTTY to send keepalives to prevent the terminal from closing. So I'll let it run overnight and see what happens.



  • @jasondiff

    Hopefully WPEngine won't kill it.

    Let me know how it goes.



  • Looks like it was killed
    [1]+ Killed nohup wp mediacloud importFromCloud --import-only --skip-thumbnails

    Gonna have to contact WPEngine support and maybe they can run it.



  • @jasondiff

    Can you use the Storage Browser?



  • I asked WPEngine tech support to run the import command from their end. It's been running for 24 hours now. Does that seem reasonable for 117,000+ images?



  • @jasondiff

    Dunno, I've never run it against 117K images. Even though you passed --skip-thumbnails, Media Cloud still has to sort through all of that, so if it's 117K + thumbnails, I mean that's really 300K entries it has to filter.

    It could take a while.

    I could add "filtering" which would allow you to run batches, but I can't get to that until tomorrow my time (I live in Vietnam, it's 11AM right now).



  • @jasondiff

    Does the Storage Browser load?



  • I can get the storage browser to load on the backend. I haven't tried importing from that screen.

    I think adding filtering would be ideal; then at least I can see how long it takes for, say, 100 files, just so I have a feel for whether it is working. No big rush on this, you are super responsive as it is, so take your time!

