#76 Differentiate enclosures based on content type.

Merged
fox merged 1 commits from tkappe/pullreq-enclosure-content-type into fox/master 1 year ago
tkappe commented 1 year ago

Some RSS feeds contain multiple enclosures with the same URL. When the first of these is not recognized as an image, later entries are not added to the database as rows in ttrss_enclosures. This change differentiates enclosures based on their content type, so an entry can have multiple enclosure types with the same URL (but possibly a different content type).

See https://www.japantimes.co.jp/feed/topstories for an example of a feed where the above situation occurs.

Some RSS feeds contain multiple enclosures with the same URL. When the first of these is not recognized as an image, later entries are not added to the database as rows in ttrss_enclosures. This change differentiates enclosures based on their content type, so an entry can have multiple enclosure types with the same URL (but possibly a different content type). See https://www.japantimes.co.jp/feed/topstories for an example of a feed where the above situation occurs.
fox commented 1 year ago
Owner

how and why can one url even have multiple content types? wtf

e: oh i see, it partially lacks content-type although maybe media:thumbnail doesn't require it (?). this could bloat up the database a bit tho.

how and why can one url even have multiple content types? wtf e: oh i see, it partially lacks content-type although maybe media:thumbnail doesn't require it (?). this could bloat up the database a bit tho.
tkappe commented 1 year ago
Poster

Yep, you got that right. Another solution could be to try and distill a content type from the "medium" attribute. I attached a patch that does this.

The problem with merging this patch and not the current pull request, though, is that the database will always contain the mime type of the last enclosure with that URL - even if the code does not detect a mime type for that enclosure.

Yep, you got that right. Another solution could be to try and distill a content type from the "medium" attribute. I attached a patch that does this. The problem with merging this patch and not the current pull request, though, is that the database will always contain the mime type of the last enclosure with that URL - even if the code does not detect a mime type for that enclosure.
fox commented 1 year ago
Owner

going by http://www.rssboard.org/media-rss#media-thumbnails looks like media:thumbnail doesn't require content-type after all (which makes sense, thumbnail must be an image i guess)

maybe better approach would be (also?) assigning some generic image/png content type to all media thumbnails?

currently if the feed only has media:thumbnail without content-type i don't think tt-rss will show it as image properly.


wait, i checked and that's what the parser is doing already (see for example classes/feeditem/atom.php:190). if its not recognized as an image afterwards this seems like a bug somewhere else.

going by http://www.rssboard.org/media-rss#media-thumbnails looks like media:thumbnail doesn't require content-type after all (which makes sense, thumbnail must be an image i guess) maybe better approach would be (also?) assigning some generic image/png content type to all media thumbnails? currently if the feed only has media:thumbnail without content-type i don't think tt-rss will show it as image properly. *** wait, i checked and that's what the parser is doing already (see for example classes/feeditem/atom.php:190). if its not recognized as an image afterwards this seems like a bug somewhere else.
fox commented 1 year ago
Owner

ah, i see. it goes by media:content first which can have bespoke medium="something" attribute instead of content type. aaargh.

i think i'll merge your PR instead of dealing with all this bullshit.

e: well i did both:

54727f9534

regardless of the above, your idea won't hurt i think.

ah, i see. it goes by media:content first which can have bespoke medium="something" attribute instead of content type. aaargh. i think i'll merge your PR instead of dealing with all this bullshit. e: well i did both: https://git.tt-rss.org/fox/tt-rss/commit/54727f9534f8b17b57d87577ed1ec0e8be8be0f3#diff-R117 regardless of the above, your idea won't hurt i think.
This pull request has been merged successfully!
Sign in to join this conversation.
No Label
No Milestone
No assignee
2 Participants
Loading...
Cancel
Save
There is no content yet.