During an internship at Leyton, a tax consulting company, I helped improve a pipeline that classifies companies into industries according to the Global Industry Classification Standard (GICS).
The pipeline scraped relevant information from the company's website, summarized the text to fit within the model's maximum token capacity, translated it into English (if it was not already in English), and finally fed it to the fine-tuned model to predict the industries the company operates in.
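The four stages described above can be sketched as a simple chain. This is only an illustration of the stage order, not the code used at Leyton: the class name `ClassificationPipeline`, the stage signatures, and the default `max_tokens=512` are all assumptions, and the stand-in functions below replace the real scraper, summarization model, translation service, and fine-tuned classifier.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClassificationPipeline:
    """Chains the four stages; each stage is injected (hypothetical design)."""
    scrape: Callable[[str], str]            # URL -> raw page text
    summarize: Callable[[str, int], str]    # (text, max_tokens) -> shorter text
    translate: Callable[[str], str]         # text -> English text
    classify: Callable[[str], List[str]]    # English text -> industry labels
    max_tokens: int = 512                   # assumed limit; depends on the model

    def run(self, url: str) -> List[str]:
        text = self.scrape(url)
        text = self.summarize(text, self.max_tokens)
        text = self.translate(text)
        return self.classify(text)


# Stand-in stages for illustration only.
def fake_scrape(url):
    return "Nous fabriquons des logiciels de comptabilité pour les PME."


def truncate_words(text, max_tokens):
    # Crude placeholder for summarization: keep the first max_tokens words.
    return " ".join(text.split()[:max_tokens])


def fake_translate(text):
    return "We build accounting software for small businesses."


def fake_classify(text):
    return ["Information Technology"] if "software" in text else []


pipeline = ClassificationPipeline(fake_scrape, truncate_words, fake_translate, fake_classify)
print(pipeline.run("https://example.com"))
```

Passing the stages in as callables keeps each one replaceable, so a different summarizer or translation backend can be swapped in without touching the rest of the chain.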
Project included:
Web-scraping using Python
Summarization using a HuggingFace model
Translation using Google Translate
Fine-tuning a HuggingFace model on a private labelled dataset.
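
As an illustration of the web-scraping step, here is a minimal sketch of turning a downloaded page into plain text using only Python's standard-library `html.parser`. The names `VisibleTextExtractor` and `html_to_text` are hypothetical, and the original project may well have used a different scraping library; fetching the page itself is left out.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside script/style/noscript."""
    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        words = data.split()
        if self._skip_depth == 0 and words:
            # Collapse runs of whitespace inside each text node.
            self._chunks.append(" ".join(words))

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{color:red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Acme</h1><p>We build  solar panels.</p></body></html>"
)
print(html_to_text(sample))  # Acme We build solar panels.
```

Dropping scripts and styles before summarization keeps markup noise from using up the model's limited token budget.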