New client onboarding is a tedious process. Whether the source is a PDF, an image, an Excel file, or a restaurant website, transcribing menus and uploading them to internal databases is time-consuming and monotonous. This menu automation pipeline takes multimodal menu input and automates the entire process, converting an hours-long task into a 30-second click of a button. Built with agentic AI, the framework is flexible and scales to many different types of input, enabling universal menu transcription.
Key skills:
• Agentic AI (n8n.io)
• Next.js
• Python, Flask
• Docker, Jenkins
As soon as I was given this problem statement, I went online to determine the best tech stack. There were two main steps: extracting text from the multimodal input, and formatting that text into menu items. While researching, I came across agentic frameworks such as AutoGen, LangGraph, and n8n.io. All of them provided the flexibility needed to handle different input types, which is why an agentic framework served as the foundation for this project. Because these tools are relatively new, I had to dig through online forums and documentation to learn their ins and outs.
Once I had a solid understanding of how to build with these tools, I began implementing the agentic framework. The agents were equipped with various tools for text extraction, but some of these tools were not readily available. As a result, I had to build API endpoints for image OCR, PDF extraction, and Excel extraction. Once these endpoints were deployed, the tool was almost ready.
What was still missing? Usability. The backend worked flawlessly, and I could see the menus laid out correctly in JSON format. But to make onboarding truly efficient, the onboarding team needed a UI. My team and I built a simple UI in Next.js that lets them upload a document or website link, receive a nicely formatted online delivery page, and edit any inconsistencies they see in the data. Once they approve the changes and click upload, the new client is registered in the company’s internal databases. This required another set of API integrations. Through further iteration with the onboarding team, this automation pipeline turned a 6-hour process into a 30-second upload.
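The two steps described above (pick an extraction tool for the input type, then format the extracted text into menu items) can be illustrated with a minimal Python sketch. This is not the project's actual implementation: the tool names, the `choose_extractor` and `parse_menu_lines` helpers, the `MenuItem` fields, and the "Name - price" line format are all assumptions made for illustration. In the real pipeline an agent, rather than a fixed rule, does the formatting.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical mapping from file extension to the extraction tool an agent would call.
EXTRACTORS = {
    ".pdf": "pdf_extraction",
    ".png": "image_ocr",
    ".jpg": "image_ocr",
    ".jpeg": "image_ocr",
    ".xlsx": "excel_extraction",
    ".xls": "excel_extraction",
}


def choose_extractor(source: str) -> str:
    """Pick an extraction tool name for an uploaded file or a website URL."""
    if urlparse(source).scheme in ("http", "https"):
        return "website_scrape"
    suffix = Path(source).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"unsupported input type: {suffix or source}")
    return EXTRACTORS[suffix]


@dataclass
class MenuItem:
    # Assumed fields; the source does not specify the real JSON schema.
    name: str
    price: float
    category: str = "Uncategorized"
    description: str = ""


def parse_menu_lines(lines, category="Uncategorized"):
    """Turn 'Name - 9.50' style lines into MenuItem records.

    Lines without a separator, a name, or a numeric price are skipped.
    """
    items = []
    for line in lines:
        name, sep, price = line.rpartition("-")
        if not sep or not name.strip():
            continue
        try:
            value = float(price.strip().lstrip("$"))
        except ValueError:
            continue
        items.append(MenuItem(name.strip(), value, category))
    return items


def to_json(items) -> str:
    """Serialize menu items to the kind of JSON a review UI could display."""
    return json.dumps([asdict(item) for item in items], indent=2)
```

For example, `choose_extractor("menu.pdf")` selects the PDF tool, and `to_json(parse_menu_lines(["Margherita Pizza - $12.50"], category="Mains"))` produces a JSON list that a reviewer could correct before uploading.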