I created a little PHP/SQL application at work, set up in a Docker container. Its purpose is to read, record and display DMARC reports that get emailed to me. The reports are XML files, compressed and attached to the emails.
The first part of the app is a simple file upload form where I feed it the files so it can extract the data. I set up the form to accept .zip and .gz files, as the various organisations creating the reports use one or the other. I also accept plain .xml in case the file has already been unzipped.
When I set up the server-side validation I made an array of accepted mime types to check against. Initially this was:-
Which worked perfectly well for the three accepted types.
Now here is where the confusion begins. I made a copy of the application folder, took it home and rebuilt the same Docker container on my home PC. When I try uploading a file I get an invalid file type error. When I var_dump the upload to see what’s what, the mime types are reported differently, e.g.:-
The mime type for .zip files shows as application/x-zip-compressed instead of application/zip, and the .gz files show as application/x-gzip.
I can of course add these types to my valid types array, but I’m really just curious why the same app running in the same container gives different results. Both PCs are running Docker Desktop on Windows 10. What is the difference?
The mime type is whatever the client says it is; it’s unreliable, just like any other bit of user input. On Windows, the type is whatever is configured in the registry, which might change depending on what software you have installed. Despite there being an official registry of mime types, things don’t always follow it.
IMO, you’d be better off using the Fileinfo extension to try to determine the file type, and only consult the client-provided mime type if the result is inconclusive.
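A minimal sketch of that approach, assuming the upload arrives as a normal $_FILES entry. The function names and the allow-list are hypothetical, and treating application/octet-stream as "inconclusive" is my own assumption about what counts as a non-answer. Depending on the libmagic version in the container, a .gz file may come back as either application/gzip or application/x-gzip, so both are listed.

```php
<?php
// Hypothetical allow-list; adjust to whatever your libmagic build reports.
const ALLOWED_TYPES = [
    'application/zip',
    'application/gzip',
    'application/x-gzip',
    'text/xml',
    'application/xml',
];

/**
 * Prefer the type detected on the server. Fall back to the client's
 * claim only when detection gave no useful answer.
 */
function resolveMimeType(?string $detected, string $clientType): string
{
    $inconclusive = $detected === null
        || $detected === ''
        || $detected === 'application/octet-stream';

    return $inconclusive ? $clientType : $detected;
}

/** Inspect the uploaded temp file with Fileinfo. */
function detectUploadType(array $upload): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $detected = $finfo->file($upload['tmp_name']); // string, or false on failure

    return resolveMimeType(
        $detected === false ? null : $detected,
        $upload['type'] ?? ''
    );
}

function isAllowedUpload(array $upload): bool
{
    return in_array(detectUploadType($upload), ALLOWED_TYPES, true);
}
```

Usage would be something like isAllowedUpload($_FILES['report']), where 'report' stands in for whatever your form field is called.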
I think that was the flaw in my thinking. I wasn’t thinking of it as user input coming from the client; I thought of it as an inherent property of the file itself, being queried by the server side. Yes, this is data submitted by a form from the client, that makes sense now.
Using mime_content_type() brings me back to getting application/zip, which I guess actually is an inherent property of the file itself, being queried by the server side.
Thank you.
To be pedantic, files don’t really have an inherent mime type property stored anywhere. The way something like mime_content_type() works is by inspecting the contents of the file and using a bunch of rules to determine what kind of file it is. Some file types make this easy by including magic bytes; others don’t and might not be detectable.
The important bit is you do that test yourself on the server rather than use the type submitted by the client.
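To illustrate the magic bytes idea, here is a deliberately simplified sketch. It is not how mime_content_type() is implemented (that uses a much larger rule database); it only checks two well-known signatures: zip files normally start with "PK\x03\x04" and gzip files with "\x1f\x8b". The function names are made up for the example. Plain XML has no fixed signature, so it falls through as "unknown" here.

```php
<?php
/**
 * Guess a type from the first few bytes of a file.
 * Returns null when no known signature matches.
 */
function sniffMagicBytes(string $head): ?string
{
    // Zip local file header signature.
    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return 'application/zip';
    }
    // Gzip member header signature.
    if (strncmp($head, "\x1f\x8b", 2) === 0) {
        return 'application/gzip';
    }
    return null; // e.g. XML, which has no magic bytes
}

/** Read only the first four bytes of the file and sniff them. */
function sniffFile(string $path): ?string
{
    $head = file_get_contents($path, false, null, 0, 4);

    return $head === false ? null : sniffMagicBytes($head);
}
```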
This. Never rely on client input. Always make sure whatever they send is correct. You don’t want to end up storing an .exe file (with viruses) because the client claims it’s a PDF.
Yes, I’m all for validating user input. But while I’ve done plenty before with form processing, I’ve not done much with file uploads before, so in this instance I got my wires crossed and thought the type I got from the array was the server querying what the file is. I realise now that it is the client telling the server what it believes the file is, which has proven to be unreliable with two different clients giving different results for the same files.
With this I’m not storing the file, but extracting data from the file and storing the data, then discarding the file. Anything that isn’t what it’s supposed to be will throw an error. As in, what is not a Zip will fail to unzip, and what isn’t XML will fail to parse as XML. But a lesson learnt for any other upload projects.
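Roughly what I mean by "fail to unzip, fail to parse", as a sketch rather than my actual code. The helper names are hypothetical, and this only covers the .gz and XML steps using gzdecode() and simplexml_load_string(). The zip case would be similar, since ZipArchive::open() returns true on success and an error code otherwise.

```php
<?php
/** Decompress gzip data, or return null if it is not valid gzip. */
function gunzipOrNull(string $data): ?string
{
    // gzdecode() emits a warning and returns false on bad input.
    $out = @gzdecode($data);

    return $out === false ? null : $out;
}

/** Parse XML, or return null if the input is not well-formed. */
function parseXmlOrNull(string $xml): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Compare strictly: an empty element is "falsy" but still valid.
    return $doc === false ? null : $doc;
}
```

Anything that comes back null gets rejected and the uploaded file is discarded.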