feat: auto content-type on blob creation#338
Conversation
Codecov Report
@@ Coverage Diff @@
## master #338 +/- ##
============================================
- Coverage 63.27% 63.20% -0.07%
- Complexity 609 619 +10
============================================
Files 32 32
Lines 5113 5123 +10
Branches 487 488 +1
============================================
+ Hits 3235 3238 +3
- Misses 1713 1720 +7
Partials 165 165
Continue to review full report at Codecov.
|
| import java.util.LinkedList; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| import javax.activation.MimetypesFileTypeMap; |
There was a problem hiding this comment.
Use com.google.common.net.MediaType so you don't introduce another dependency, and ping the guava team to get this out of beta
There was a problem hiding this comment.
com.google.common.net.MediaType doesn't provide mapping extension to the meida type. Its parse method recognizes strings like "text/plain".
There was a problem hiding this comment.
What about Files.probeContentType(Path)?
There was a problem hiding this comment.
Both Files.probeContentType(Path) and URLConnection work! I will use it instead. Thanks for the hint.
google-cloud-storage/pom.xml
Outdated
| <groupId>org.threeten</groupId> | ||
| <artifactId>threetenbp</artifactId> | ||
| </dependency> | ||
| <dependency> |
There was a problem hiding this comment.
isn't this included in java 6 and later?
There was a problem hiding this comment.
it works with jdk7,8. This dependency is required for java9+
| return contentType; | ||
| } | ||
| return firstNonNull( | ||
| FILE_NAME_MAP.getContentTypeFor(object.getName().toLowerCase()), |
There was a problem hiding this comment.
toLowerCase always should have a locale
| case "create": | ||
| return storage.create(blobInfo); | ||
| case "writer": | ||
| { |
| String customType = "custom/type"; | ||
| BlobId blobId = BlobId.of(BUCKET, names[0]); | ||
| BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType(customType).build(); | ||
| blob = createBlob(method, blobInfo); |
There was a problem hiding this comment.
don't reuse local variable. There are two separate blob objects here that coincidentally share a variable.
There was a problem hiding this comment.
I had a reason to reuse the variable but after rewriting the test that reason is not actual anymore. Thanks for catching that.
| private void testAutoContentType(String method) throws IOException { | ||
| String[] names = {"file1.txt", "dir with spaces/Pic.Jpg", "no_extension"}; | ||
| String[] types = {"text/plain", "image/jpeg", "application/octet-stream"}; | ||
| Blob blob = null; |
There was a problem hiding this comment.
I think you can push this into the for loop
frankyn
left a comment
There was a problem hiding this comment.
Hi @dmitry-fa, thanks for addressing the request. Have two requests.
| } | ||
| } | ||
|
|
||
| private static String detectContentType(StorageObject object) { |
There was a problem hiding this comment.
This should be an optional feature that's disabled by default.
There was a problem hiding this comment.
I was thinking about making this feature optional, but I didn't find an appropriate place to store this setting. It could be a property of a project/client/bucket/object.
I only found: https://cloud.google.com/storage/docs/metadata#content-type
which states: If the Content-Type is not specified by the uploader and cannot be determined, it is set to application/octet-stream or application/x-www-form-urlencoded, depending on how you uploaded the object.
If I read this correctly, the Storage should attempt to determine the content type if not explicitly set, so this feature should be on by default.
BTW, right now the content type of blobs created with Storage.create() methods is null, but should be application/octet-stream.
There was a problem hiding this comment.
Storage doesn't provide contentType detection in the API.
If contentType is set at creation then application/octet-stream is used.
Please make this optional otherwise it's a behavior change not originally intended on uploads with the Java client.
| } | ||
|
|
||
| private void testAutoContentType(String method) throws IOException { | ||
| String[] names = {"file1.txt", "dir with spaces/Pic.Jpg", "no_extension"}; |
There was a problem hiding this comment.
Is there a way to infer the type based on bytes rather than file extension? That's how Golang does it.
There was a problem hiding this comment.
It's not always possible. In case of resumeable upload a blob is created first and the content is available later:
writer = storage.writer(blobInfo); // creating a blob, detecting content-type, no contect
writer.write(content);
writer.close();
I did an experiment:
I created a file PDF.txt with a pdf content. Then I uploaded it as PDF.jpg with various clients and checked the resulting content type. I got the following:
gsutil: text/plain
NodeJS: image/jpeg
Go: application/pdf
To achieve consistent behavior across all the clients it should be implemented on the server-side at the moment when the upload is finished.
Until this feature is implemented I suggest type detecting based on file extension, as it has been done recently in nodejs (googleapis/nodejs-storage#1190)
| } | ||
| } | ||
|
|
||
| private static String detectContentType(StorageObject object) { |
There was a problem hiding this comment.
Storage doesn't provide contentType detection in the API.
If contentType is set at creation then application/octet-stream is used.
Please make this optional otherwise it's a behavior change not originally intended on uploads with the Java client.
|
For the time being, objects created with Storage.create() method have content-type set to null if not explicitly set, not Is it possible to store custom properties on the project level? |
Whoops I meant, If contentType is not set at creation then application/octet-stream is used by the service. Please introduce this as a new option in the request instead of a client level option for now. |
|
Not sure if it will be convenient to specify 'I want to detect the content type' each time when a new object is created. The content type could be provided explicitly instead. Okay, I'll think about that. If the user specifies explicitly that he wants to detect the content type, he can also select a strategy for that: 'by extension', 'by content' or 'by both'. |
Fixes #47
Proposed fix of the auto detection of the content type based on the blob name, like other clients do (php, nodejs, go).
The class javax.activation.MimetypesFileTypeMap from JDK is used to detect the content type. Using this class with Java 11+ requires the dependency on Java Activation Framework.