Unsupervised automatic tagging algorithms_ - Stack Overflow

Uploaded by

yida
2024/12/21 09:10 machine learning - Unsupervised automatic tagging algorithms? - Stack Overflow

Unsupervised automatic tagging algorithms?


Asked 11 years, 9 months ago Modified 4 years, 11 months ago Viewed 23k times
Part of NLP Collective

Score: 24

I want to build a web application that lets users upload documents, videos, images, and music,
and then gives them the ability to search them. Think of it as Dropbox + Semantic Search.
When a user uploads a new file, e.g. Document1.docx, how could I automatically generate
tags based on the content of the file? In other words, no user input is needed to determine
what the file is about. Suppose that Document1.docx is a research paper on data
mining. Then when the user searches for data mining, or research paper, or document1, that
file should be returned in the search results, since data mining and research paper will most
likely be auto-generated tags for that document.
1. Which algorithms would you recommend for this problem?
2. Is there a natural language library that could do this for me?
3. Which machine learning techniques should I look into to improve tagging precision?
4. How could I extend this to automatic tagging of videos and images?
Thanks in advance!
algorithm machine-learning nlp tagging

asked Mar 13, 2013 at 4:48 by Sahat Yalkabov

How would you search for a video? Would you supply another video, or would you (more naturally)
input a few words describing it? If the latter, you're going to need some sort of user involvement in
tagging. – daniel gratzer Mar 13, 2013 at 4:54
I am pretty sure you can find a lot of literature by googling it. As far as I know, there is
even a body of research on tagging videos automatically. Given that text
is much easier for machines to interpret than videos or images, I believe you can find what you
want on the web. But keep in mind that there is no perfect algorithm that does
exactly what you expect. – yu239 Mar 13, 2013 at 4:56
@jozefg Two options in my mind right now: a) Either input a few keywords b) Extract audio
channel, analyze it for patterns. If speech, parse speech to text and extract relevant keywords. If
music, pass it to Echospirit for music identification. All other cases will result in no tags.
– Sahat Yalkabov Mar 13, 2013 at 5:16

https://stackoverflow.com/questions/15377290/unsupervised-automatic-tagging-algorithms 1/4
1 In other words, you want to build Google. I commend ambitious projects. – Blacksad Mar 13, 2013
at 17:01
Were you able to make it? – Dev_Man Jul 22 at 13:21

5 Answers Sorted by: Highest score (default)

Score: 21

The most common unsupervised machine learning model for this type of task is Latent
Dirichlet Allocation (LDA). This model automatically infers a collection of topics over a
corpus of documents based on the words in those documents. Running LDA on your set of
documents assigns each word a probability under each topic. When a user searches for a
word, you can then retrieve the documents with the highest probability of being
relevant to that word.
There have been some extensions to images and music as well, see
http://cseweb.ucsd.edu/~dhu/docs/research_exam09.pdf.
LDA has efficient implementations in several languages:
- many implementations from the original researchers
- http://mallet.cs.umass.edu/, written in Java and recommended by others on SO
- PLDA: a fast, parallelized C++ implementation
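
As a rough illustration of the idea (not taken from any of the implementations listed above), here is a minimal pure-Python sketch of LDA fitted by collapsed Gibbs sampling, followed by a word query that ranks documents by an estimate of p(word | document). The toy corpus, the function names fit_lda and rank_documents, and the hyperparameters alpha and beta are all assumptions made for the example. For real data, one of the libraries above is the better choice.

```python
import random


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]       # tokens of doc d in topic k
    topic_word = [dict() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                     # tokens assigned to topic k
    assignments = []

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] = topic_word[k].get(w, 0) + delta
        topic_total[k] += delta

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assignments[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)    # take the token out
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k].get(w, 0) + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                update(d, w, k, +1)                    # put it back under topic k
    return doc_topic, topic_word, topic_total, vocab_size


def rank_documents(word, model, alpha=0.1, beta=0.01):
    """Return document indices sorted by estimated p(word | document), best first."""
    doc_topic, topic_word, topic_total, vocab_size = model
    num_topics = len(topic_total)
    scored = []
    for d, counts in enumerate(doc_topic):
        doc_len = sum(counts)
        # Sum over topics of p(word | topic) * p(topic | document).
        score = sum(
            (topic_word[k].get(word, 0) + beta) / (topic_total[k] + vocab_size * beta)
            * (counts[k] + alpha) / (doc_len + num_topics * alpha)
            for k in range(num_topics)
        )
        scored.append((score, d))
    return [d for _, d in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical toy corpus: two documents about data mining, two about music.
docs = [
    "data mining research paper data mining".split(),
    "data mining algorithm clustering data".split(),
    "guitar music song guitar chord".split(),
    "music song album guitar".split(),
]
model = fit_lda(docs)
for k, words in enumerate(model[1]):
    print("topic", k, sorted(words, key=lambda w: (-words[w], w))[:3])
print("ranking for 'mining':", rank_documents("mining", model))
```

The sampler is randomized, so the topics it finds depend on the seed and on the number of iterations. The most frequent words of each topic are what you would inspect as tag candidates.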
answered Mar 13, 2013 at 4:59 by Andrew Mao
4 As much as I am an LDA supporter, I don't think the "topics" generated by LDA have any value
for producing usable tags, other than for WSI purposes. After generating the topics, an
intermediate step that maps topics to a set of semantic/syntactic annotations is required to make a
knowledge-driven NLP annotation, especially one that previous research has put so much
thought into creating. – alvas Mar 13, 2013 at 22:16
I agree with you, but I think that searching for words that are associated with certain topics can
also retrieve documents with high priors on certain topics that are similar. The OP will have to go
somewhere else for a much more rigorous treatment of this :) – Andrew Mao Mar 14, 2013 at 0:23

Score: 5

These guys propose an alternative to LDA:

Automatic Tag Recommendation Algorithms for Social Recommender Systems
http://research.microsoft.com/pubs/79896/tagging.pdf
I haven't read through the whole paper, but they have two algorithms:
1. Supervised learning version. This isn't that bad. You can use Wikipedia to train the
algorithm.
2. "Prototype" version. I haven't had a chance to go through this, but it is what they
recommend.
UPDATE: I've researched this some more and I've found another approach. Basically, it's a
two-stage approach that's very simple to understand and implement. While too slow for
100,000s of documents, it (probably) has good performance for 1000s of docs (so it's
perfect for tagging a single user's documents). I'm going to try this approach and will
report back on performance/usability.
In the meantime, here's the approach:
1. Use TextRank as per http://qr.ae/36RAP to generate a tag list for a single document,
independent of other documents.
2. Use the algorithm from "Using Machine Learning to Support Continuous Ontology
Development"
(https://www.researchgate.net/publication/221630712_Using_Machine_Learning_to_Support_Continuous_Ontology_Development)
to integrate the tag list (from step 1) into the existing tag list.
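
Step 1 can be sketched in a few lines. What follows is a simplified pure-Python take on TextRank keyword extraction: build a co-occurrence graph over the words of one document, then run a PageRank-style iteration and keep the highest-scoring words. It swaps the part-of-speech filter of the original TextRank paper for a small hand-written stopword list, and the function name, window size, damping factor, and sample text are assumptions for the example. Step 2 (merging into the existing tag list) is not covered here.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "this", "to", "with"}


def textrank_keywords(text, window=3, damping=0.85, iters=50, top_n=5):
    """Return up to top_n words ranked by a TextRank-style score."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected graph: two words are linked if they occur within `window`
    # positions of each other (after stopword removal).
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update: each word shares its score equally among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iters):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


sample = (
    "Data mining is the process of discovering patterns in large data sets. "
    "Data mining research combines machine learning, statistics, and database systems. "
    "This research paper surveys data mining algorithms."
)
keywords = textrank_keywords(sample)
print(keywords)
```

Because each document is scored on its own, this runs independently per upload, which fits the single-user scale mentioned above.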
edited Jul 2, 2018 at 15:14 by Toby; answered Jan 26, 2015 at 15:53 by U Avalos
No update then? The answers here are probably outdated by now; they were written 4 years ago. – borgr Aug
26, 2019 at 14:35

Score: 1

Text documents can be tagged using this keyphrase extraction algorithm/package:
http://www.nzdl.org/Kea/. Currently it supports limited types of documents (agricultural and
medical, I guess), but you can train it according to your requirements.
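
To give a feel for what such a trainable extractor works with: as far as I know, KEA scores candidate phrases using two features, TF×IDF and the position of the phrase's first occurrence in the document, and feeds them to a Naive Bayes classifier trained on documents with known keyphrases. The sketch below only computes those two features in plain Python. It is not KEA's code or API; the function names, the smoothed IDF formula, and the toy texts are assumptions for the example, and KEA's stemming and stopword filtering are left out.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, max_len):
    """Yield (start_index, phrase) for every phrase of 1..max_len tokens."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield i, " ".join(tokens[i : i + n])


def keyphrase_features(document, corpus, max_len=2):
    """Map each candidate phrase to (tf_idf, first_occurrence)."""
    tokens = tokenize(document)
    corpus_phrases = [{p for _, p in ngrams(tokenize(d), max_len)} for d in corpus]

    counts, first_pos = {}, {}
    for i, phrase in ngrams(tokens, max_len):
        counts[phrase] = counts.get(phrase, 0) + 1
        first_pos.setdefault(phrase, i)  # keeps the earliest start index

    features = {}
    for phrase, count in counts.items():
        doc_freq = sum(phrase in phrases for phrases in corpus_phrases)
        tf = count / len(tokens)
        idf = math.log((1 + len(corpus)) / (1 + doc_freq))  # smoothed variant
        # first_occurrence is the fraction of the document before the phrase appears.
        features[phrase] = (tf * idf, first_pos[phrase] / len(tokens))
    return features


# Hypothetical background corpus and document.
background = ["cooking recipes for pasta", "travel tips for europe"]
features = keyphrase_features("data mining finds patterns in data", background)
for phrase, (tf_idf, first) in sorted(features.items(), key=lambda kv: -kv[1][0])[:3]:
    print(phrase, round(tf_idf, 3), first)
```

Phrases that are frequent in the document, rare elsewhere, and appear early tend to be good keyphrase candidates, and training on your own documents is what adapts such a classifier to a new domain.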
I'm not sure how the image/video part would work out, unless you're doing very accurate
object detection (which has its own shortcomings). How are you planning to do it?
answered Jun 24, 2014 at 13:15 by user3675152
If you have a question/need more information from the OP, you should first post a comment to the
question to get more information, then post an answer that you know will be relevant. – eddie_cat
Jun 24, 2014 at 13:32

Score: 1

You want Doc-Tags (https://www.Doc-Tags.com), a commercial product that
automatically generates contextually accurate document tags without supervision. The
built-in reporting functionality makes the product a lightweight document management
system.
For developers wanting to customize their own approach, the source code is available
(very cheap) and the back-end service xAIgent (https://xAIgent.com) is very inexpensive to
use.
answered Jan 15, 2020 at 15:45 by Rod Miller

Score: 0

I posted a blog article today to answer your question:
http://scottge.net/2015/06/30/automatic-image-and-video-tagging/
There are basically two approaches to automatically extracting keywords from images and
videos:
1. Multiple Instance Learning (MIL)
2. Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), and their variants
In the blog article, I list the latest research papers that illustrate these solutions. Some of
them even include a demo site and source code.
Thanks, Scott
answered Jul 1, 2015 at 20:41 by Scott Ge
