Friday, November 15 • 3:00pm - 3:30pm
Lessons Learnt Building Domain Specific NLP Pipelines


At Indix (acquired by Avalara), our goal was to build the "Google of Products". The product catalog currently holds more than 3 billion products, amassed by crawling over 5,000 retailer and brand websites. Making sense of unstructured text data at this scale required a robust NLP pipeline. The first part of the talk will cover the evolution of the pipeline's architecture, its building blocks, and its algorithms. The building blocks I will cover are language models, word embeddings, and a knowledge graph. The algorithms I will cover are classification, entity extraction, document similarity, and query understanding for the e-commerce domain.

After the acquisition by Avalara, the team was tasked with making sense of unstructured text data in the tax compliance domain, with limited data. The second part of the talk will focus on how we fine-tuned the e-commerce NLP pipeline and transferred our learnings from the e-commerce domain to the tax compliance domain.

Speakers
Rajesh Muppalla

Senior Director of Engineering, Avalara
Sr. Director of Engineering at Avalara, using AI for tax automation. Previously co-founder of Indix (acquired by Avalara) and tech lead on go.cd. Topics: Machine Learning, Data Pipelines, Continuous Delivery, Mentoring.


Friday November 15, 2019 3:00pm - 3:30pm PST
data