I use ledger to categorize my expenses. The categories are all predefined, and I have a large set of past expenses that have already been categorized. I want to run new expenses through a classification algorithm that does a decent job of finding the right categories.
Note that many expenses consist of only a few words, so categorizing them can be challenging. I’d also like to attach a few words to categories by hand, and ideally these should carry a higher weight than the words learned from training. Alternatively, I could hard-code them as the top result and let an unweighted algorithm run its course, taking the top few categories from it as well.
Which algorithm is best suited to this job? Naive Bayes classification with TF-IDF weighting? Are there others worth considering?
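To make the hard-coded alternative concrete, here is a rough sketch of what I have in mind, in plain Python. Everything in it is made up for illustration: the names `KEYWORD_RULES`, `train`, and `top_categories`, the sample expenses, and the category names. The multinomial Naive Bayes with add-one smoothing is only a stand-in for whatever classifier turns out to be the right choice, and it uses raw word counts rather than TF-IDF.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-picked keywords: a match is always ranked first.
KEYWORD_RULES = {
    "uber": "Expenses:Transport",
    "netflix": "Expenses:Entertainment",
}

def tokenize(text):
    """Lowercase the description and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def train(examples):
    """Count words per category from (description, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of expenses
    vocab = set()
    for text, category in examples:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def top_categories(text, model, k=3):
    """Return up to k categories: keyword matches first, then the model's ranking."""
    word_counts, doc_counts, vocab = model
    tokens = tokenize(text)
    total_docs = sum(doc_counts.values())
    scores = {}
    for category, n_docs in doc_counts.items():
        total_words = sum(word_counts[category].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood of the word.
            count = word_counts[category][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[category] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    forced = [KEYWORD_RULES[t] for t in tokens if t in KEYWORD_RULES]
    # Forced categories go first; dict.fromkeys removes duplicates in order.
    return list(dict.fromkeys(forced + ranked))[:k]

# Made-up training data standing in for my already categorized expenses.
examples = [
    ("Safeway groceries", "Expenses:Food"),
    ("Trader Joes groceries", "Expenses:Food"),
    ("Shell gas station", "Expenses:Transport"),
    ("Metro monthly pass", "Expenses:Transport"),
    ("AMC movie tickets", "Expenses:Entertainment"),
]
model = train(examples)
print(top_categories("netflix groceries", model))
```

Here the hand-picked keyword decides the first suggestion and the model only fills the remaining slots. The weighted variant would instead boost the counts of those words during training, which I have not sketched.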