Crafting a Simple "Zero-Shot Classifier" Using an API – Seeking Your Insights!

I'm hoping you fine folks might be able to give me some guidance.

I have a collection of 700 categories, all potential classifications for articles. My current need is to create a system that can dynamically categorize short texts or articles according to these 700 categories.

I've been experimenting with a rudimentary approach using chatGPT to read the categories from a PDF via a plugin. The process is quite straightforward - I input the title and the first two lines of an article, and chatGPT does a fairly decent job of predicting the most fitting category.

The downside? I'm concerned about its scalability and economic viability. The current method might not work so well when we're talking about classifying a significant number of articles.

My question to you, my fellow AI enthusiasts: How would you approach designing a system, via an API, capable of doing this quickly and on a large scale?

I'm particularly curious about how to integrate my method with chatGPT using OpenAI's API. Is there a feature that allows the Language Learning Model (LLM) to retain the list of 700 categories in its memory so that I don't have to pass it every time? I'm aware that the billing structure is token-based, so it would be ideal to submit the categories once (or as few times as possible) and then pose a simple query like:

"Categorize this article based on the categories I previously gave you. Article title: 'Barbie vs Oppenheimer: Which Movie Will Garner Greater Success?'"

Ideally, I'd want this system to be persistently active and capable of processing countless queries over an extended period, say a month or a year.

So, any ideas on how to design such a system? There are undoubtedly numerous routes to take. I'm really just seeking some initial direction so that I can dive deeper into research on my own.

Thanks in advance for any insights you might provide!

submitted by /u/adv4nced
[link] [comments]