I am Abhishek

Name: Abhishek

Email: abhishek.f@northeastern.edu

Phone: (603) 800-3041

Skills
Python 100%
SQL 100%
Java 80%

About me

As a data scientist with a strong software engineering background, I specialize in creating scalable, data-driven solutions across various disciplines including machine learning, cloud computing, and database management. My expertise spans efficient data pipeline design, cloud platforms (Azure, AWS), and real-time monitoring systems using technologies like Kafka and Elasticsearch.
My career highlights include:

  • Leading the development of NLP-based chatbots at DataworksAI, utilizing LLMs within a Retrieval-Augmented Generation (RAG) pipeline.

  • Developing a deep learning model for a multiclass image classification project, optimizing performance through data augmentation.

  • Developing predictive models for classification using both classical and quantum machine learning techniques.

I am motivated by the challenge of converting complex data into actionable insights, consistently creating innovative solutions that deliver concrete value in the field of data science.

Resume

Summary

Seasoned AI/ML professional with 5+ years of experience, combining data science and software engineering expertise. Specializes in NLP-driven chatbots, big data warehousing, ETL automation, cloud-based AI/ML solutions, and technical team leadership. Proven ability to transform advanced AI/ML concepts into practical, high-impact applications across diverse industries.

Education

Master's in Analytics

Major Concentration: Applied Machine Intelligence
2022 - 2024

Northeastern University, Boston, MA

Courses: Data Mining, Data Management & Big Data, Enterprise Analytics, Intermediate Analytics, Predictive Analytics, Fundamentals of AI, Applications of AI, AI System Technologies

GPA: 3.89

Bachelor of Engineering in Computer Science

2013 - 2017

Rajiv Gandhi Technical University (RGPV), Bhopal, India

Courses: Data Structures, Database Management, Design and Analysis of Algorithms, Object-Oriented Programming

Professional Experience

Research Engineer

2024 - Present

DataworksAI, Boston, MA

  • I applied advanced prompt engineering techniques, including few-shot learning, zero-shot learning, and Chain-of-Thought methodologies, to optimize large language model performance, resulting in 87% accuracy across various benchmarks.
  • I contributed to a conversational AI system by implementing user and conversation tracking within a FastAPI backend, using Python with the Neo4j graph database and Redis for data management. By implementing reinforcement learning algorithms trained on user feedback data, I achieved a 17% improvement in contextual response accuracy.
  • I architected and developed a sophisticated Knowledge Graph utilizing LangGraph and Neo4j technologies, successfully integrating and structuring complex U.S. university enrollment datasets to enable advanced query capabilities.
  • For semantic understanding, I trained a custom Hugging Face encoder model using PyTorch to vectorize tabular descriptions and user inputs. The resulting embeddings were stored in MongoDB for efficient retrieval.
  • I spearheaded the deployment of multiple chatbot prototypes based on LlamaIndex and LangChain frameworks on Red Hat OpenShift AI infrastructure. Through rigorous A/B testing methodologies, I demonstrated a 43% enhancement in user satisfaction metrics. The implementation leveraged containerization via Docker, Linux system optimization, and Kubernetes orchestration to ensure scalability and streamlined CI/CD workflows.
  • I collaborated with analytics and AI professionals to identify automation opportunities and optimize LLM applications.

Senior Software Engineer

2018 - 2022

PowerSchool, LLC., Bengaluru, India

  • I spearheaded the construction of a high-performance data warehouse on Databricks that seamlessly integrated student information from multiple sources including Azure Data Lake, RESTful APIs, MongoDB, and AWS S3. This architectural improvement slashed data retrieval times by 30%.
  • By implementing RPA and Airflow automation for ETL processes across various portals, I accelerated project delivery timelines by 35%. The implementation was strengthened through Docker containerization and GitHub Actions CI/CD pipelines for consistent deployment.
  • I engineered sophisticated T-SQL stored procedures for SQL Server to enhance RPA project logging capabilities, resulting in a 40% boost in error identification and resolution efficiency.
  • My implementation of a real-time monitoring infrastructure using Kafka for event streaming and the Elasticsearch-Kibana stack improved automated ETL processes by 15% through enhanced visibility and proactive issue detection.
  • Leveraging AWS SageMaker alongside Python and Scikit-learn, I developed predictive models to forecast monthly CRM/ERP data migration requirements. The resulting insights, visualized through Tableau and Power BI dashboards, improved migration planning efficiency by 20%.
  • I created an advanced machine learning solution using PyTorch, Pillow, and TrOCR that accurately recognizes and validates handwritten English text and mathematical expressions for an EdTech platform. This innovation converted unstructured handwritten content into structured digital text, reducing teachers' workload by 40% when uploading test papers for online assessments.
  • As a technical mentor, I guided more than 20 new team members in problem-solving techniques and ETL pipeline automation within an Agile framework, utilizing JIRA for effective progress tracking and workload management.

Leveraged Technologies: 8

Certifications: 12

Projects: 7

Recognition: 2

Extracurricular Activities

Portfolio

University Explorer: Chatbot

PostgreSQL | LLMs | MongoDB | Streamlit | LangChain

Out of Pattern Detection

Azure | Airflow | Kafka | Elasticsearch | Kibana | Pandas | NumPy | PySpark

Viral Rash Classification

TensorFlow | PyTorch | OpenCV | Pillow | Ultralytics | NumPy | Pandas | Seaborn

Glioma Grade Classification

Scikit-learn | Qiskit | Seaborn | Matplotlib

Message Distribution Analysis

Snowflake | Azure | Power BI | Python

NYC Trip Data Analysis

PySpark | Scikit-learn | Tableau | Python

Boston Housing Dataset

AWS (S3) | Bokeh | Panel | Scikit-learn | Python

Contact

GitHub

Call Us

+1 (603) 800-3041

Email Us

abhishek.f@northeastern.edu

LinkedIn