polymath

I use

Python, R, XML, HTML5, CSS, Ant / NAnt scripting

and MySQL.

Software Tools I use

Apache Spark, RStudio, Jenkins, Hudson, GitHub, ISCSM Version Control, BladeLogic, BMC Tools, HPSM, ClearCase, Delivery methodology tools, MS Office, Dreamweaver

AND
Looking for

Data Analyst Profile

Certifications

• The Data Scientist’s Toolbox by Coursera (License Number: LBCSCRAVH5)

• R Programming by Coursera (License Number: LFZRPVFLWF)

• Scalable Machine Learning by edX

• Introduction to Big Data with Apache Spark by edX

• Barclays Digital Driving Licence by Barclays

• Lean Competency by Barclays (LCS Level 1a)

• Accenture Foundation training program by Accenture





positions of responsibility

Youth360 (Startup)

Technical Head

(Jun ’11 – Oct ’11)

Indian Grand Prix

Volunteer

(Oct ’11 – Nov ’11)

Jaypee MUN

Technical Head

(Jan ’11 – Feb ’12)

here are

some of my projects

• Text Analysis and Entity Resolution

• Click-Through Rate Prediction Pipeline

• Predicting Movie Ratings

• Web Server Log Analysis

• Air Pollution Analyses of 332 locations

• Hospital Quality Analyses

• Most common words in the Complete Works of William Shakespeare

Text Analysis and Entity Resolution

with Apache Spark

• Created this project because the volume of unstructured text in existence is growing dramatically, and Apache Spark is an excellent tool for analyzing this type of data. The task is entity resolution: finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, databases).

• Used powerful and scalable text analysis techniques to perform entity resolution across two datasets of commercial products. Data files for this project are from the metric-learning project.
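
A minimal sketch of the entity-resolution idea in plain Python, not the project's Spark code: each record becomes a bag of words, and records from two sources are matched by cosine similarity. The record IDs, product strings, and function names are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(source_a, source_b):
    """For each record id in source_a, find the most similar record id in source_b."""
    vectors_b = {rid: Counter(tokenize(text)) for rid, text in source_b.items()}
    matches = {}
    for rid_a, text_a in source_a.items():
        vec_a = Counter(tokenize(text_a))
        matches[rid_a] = max(
            vectors_b,
            key=lambda rid_b: cosine_similarity(vec_a, vectors_b[rid_b]),
        )
    return matches


# Hypothetical product records from two sources.
catalog_a = {
    "a1": "Acme wireless optical mouse black",
    "a2": "Zenith 15 inch laptop sleeve",
}
catalog_b = {
    "b1": "Laptop sleeve 15 inch by Zenith",
    "b2": "ACME optical mouse, wireless (black)",
}

matches = best_matches(catalog_a, catalog_b)
print(matches)
```

Raw token counts keep the sketch short; weighting tokens by TF-IDF is a common refinement when frequent words dominate the vectors.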




Click-Through Rate Prediction Pipeline

with Apache Spark

• Created a project that, given a user and the page they are visiting, estimates the probability that they will click on a given ad.

• Worked with the Criteo Labs dataset that was used for a Kaggle competition

• Featurized categorical data using one-hot encoding (OHE): parsed the CTR data and generated OHE features

• Visualized feature frequency, the ROC curve, and a hyperparameter heat map
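
One-hot encoding can be sketched in a few lines of plain Python. This is an illustration under assumed names and toy data, not the pipeline's Spark code: every distinct (feature id, value) pair gets its own column index, and a row is stored sparsely as the list of indices set to 1.

```python
def build_ohe_dict(rows):
    """Map each distinct (feature_id, value) pair to a unique column index."""
    distinct = sorted({pair for row in rows for pair in row})
    return {pair: index for index, pair in enumerate(distinct)}


def one_hot_encode(row, ohe_dict):
    """Return the sorted column indices that are 1 for this row (sparse form).

    Pairs not seen when the dictionary was built are skipped.
    """
    return sorted(ohe_dict[pair] for pair in row if pair in ohe_dict)


# Hypothetical rows: each is a list of (feature_id, categorical_value) pairs.
rows = [
    [(0, "mouse"), (1, "black")],
    [(0, "cat"), (1, "tabby"), (2, "mouse")],
    [(0, "bear"), (1, "black"), (2, "salmon")],
]

ohe_dict = build_ohe_dict(rows)
encoded = [one_hot_encode(row, ohe_dict) for row in rows]
print(len(ohe_dict), encoded)
```

Note that "mouse" under feature 0 and "mouse" under feature 2 receive different columns, because the feature id is part of the key.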

Predicting Movie Ratings

with Apache Spark


• One of the most common uses of big data is to predict what users want. This allows Google to show you relevant ads, Amazon to recommend relevant products, and Netflix to recommend movies that you might like.

• Used a subset of 500,000 ratings from the MovieLens 10M stable benchmark rating dataset. This project is divided into two parts: Basic Recommendations and Collaborative Filtering
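
A rough sketch of what a "basic recommendation" can look like, in plain Python rather than Spark: rank movies by average rating, keeping only movies with a minimum number of ratings. The `min_ratings` threshold, the function name, and the sample ratings are assumptions for illustration; collaborative filtering is not covered here.

```python
from collections import defaultdict


def top_rated(ratings, min_ratings=2, top_n=3):
    """Return up to top_n (movie, average) pairs, highest average first.

    Movies with fewer than min_ratings ratings are excluded so that a
    single enthusiastic rating cannot put a movie at the top.
    """
    totals = defaultdict(lambda: [0.0, 0])  # movie -> [rating sum, rating count]
    for _user, movie, rating in ratings:
        totals[movie][0] += rating
        totals[movie][1] += 1

    averages = [
        (movie, total / count)
        for movie, (total, count) in totals.items()
        if count >= min_ratings
    ]
    # Sort by descending average, then by title to break ties deterministically.
    averages.sort(key=lambda pair: (-pair[1], pair[0]))
    return averages[:top_n]


# Hypothetical (user_id, movie, rating) triples.
ratings = [
    (1, "A", 5.0), (2, "A", 4.0),
    (1, "B", 3.0), (2, "B", 3.0), (3, "B", 3.0),
    (3, "C", 5.0),
]

top = top_rated(ratings)
print(top)
```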

Web Server Log Analysis

with Apache Spark

• Created the content-size RDD and computed the average content size with the reduce() operator

• Counted the response codes using a map-reduce pattern

• Filtered out possibly bad data and converted the RDD to a DataFrame for easier data manipulation and visualization

• Visualized the response codes, then changed the visualization to a pie chart
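
The two aggregation steps above follow a pattern that can be shown without Spark. The sketch below uses Python's `functools.reduce` on an in-memory list; the parsed records are made-up sample data, and the Spark RDD versions of these operations are not reproduced here.

```python
from functools import reduce

# Hypothetical parsed log records: (response_code, content_size_in_bytes).
records = [(200, 1200), (200, 800), (404, 0), (200, 1000), (304, 0), (404, 0)]

# Average content size: sum the sizes with reduce(), then divide by the count.
content_sizes = [size for _code, size in records]
average_size = reduce(lambda a, b: a + b, content_sizes) / len(content_sizes)

# Response-code counts, map-reduce style:
# map each record to a (code, 1) pair, then sum the ones per code.
pairs = [(code, 1) for code, _size in records]


def add_pair(counts, pair):
    """Fold one (code, n) pair into the running per-code totals."""
    code, n = pair
    counts[code] = counts.get(code, 0) + n
    return counts


code_counts = reduce(add_pair, pairs, {})

print(average_size, code_counts)
```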






Air Pollution Analyses of 332 locations

with RStudio

• Used pollution monitoring data for fine particulate matter (PM) air pollution at 332 locations

• Calculated the mean of a pollutant across a list of monitors, and the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than a threshold.
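
The threshold logic can be sketched as follows. The project itself was done in R; this is a plain-Python illustration with invented readings, where `None` stands for a missing value and all names are hypothetical.

```python
import math


def complete_cases(rows):
    """Keep only (sulfate, nitrate) readings where both values are present."""
    return [(s, n) for s, n in rows if s is not None and n is not None]


def pearson(xs, ys):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlations(monitors, threshold=0):
    """Sulfate/nitrate correlation for monitors with more than `threshold` complete cases."""
    result = {}
    for monitor_id, rows in monitors.items():
        cases = complete_cases(rows)
        if len(cases) > threshold:
            sulfate, nitrate = zip(*cases)
            result[monitor_id] = pearson(sulfate, nitrate)
    return result


# Hypothetical readings per monitor: (sulfate, nitrate), None = missing.
monitors = {
    1: [(1.0, 2.0), (2.0, 4.0), (None, 1.0), (3.0, 6.0)],
    2: [(1.0, None), (2.0, 1.0)],
}

result = correlations(monitors, threshold=2)
print(result)
```

Monitor 2 has only one complete case, so it is dropped when the threshold is 2.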

Hospital Quality Analyses

with RStudio

• Plotted the 30-day mortality rates for heart

• Found the best hospital in a state: the function returns the name of the hospital that has the best (i.e., lowest) 30-day mortality

• Ranked hospitals by outcome in a state

Most common words in the Complete Works of William Shakespeare

with Apache Spark

• Created this project because the volume of unstructured text in existence is growing dramatically, and Apache Spark is an excellent tool for analyzing this type of data.

• Analyzed the text by creating a base RDD and pair RDDs, and computed the mean of the counts in the pair RDD
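
The word-count pattern can be illustrated in plain Python on a single line of text. This is a sketch only, not the project's Spark code: words are mapped to (word, 1) pairs, summed per word, and the mean count per unique word is taken. The sample sentence and the tokenizing regex are assumptions.

```python
import re
from collections import Counter

text = "To be, or not to be: that is the question."

# Base collection: lowercase words with punctuation stripped.
words = re.findall(r"[a-z']+", text.lower())

# Pair collection: one (word, 1) pair per occurrence.
pairs = [(word, 1) for word in words]

# Sum the ones per word to get a count for each unique word.
counts = Counter()
for word, n in pairs:
    counts[word] += n

# Mean number of occurrences per unique word.
mean_count = sum(counts.values()) / len(counts)

print(counts.most_common(2), mean_count)
```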

education

• Bachelor of Technology

• CGPA: 7.2/10 (77%)

extracurriculars

• Received Algorithm Badge by HackerRank

• Coding enthusiast. Participant in various contests on HackerRank

• Core Organizing Team Member of JIIT Treasure Hunt


additional information

Process Optimization | Data Analysis Problems | Identification | Machine Learning | Business Systems Analysis | Analyzing Results | Data Filtering


Contact Details

prarit.lamba[at]gmail[dot]com

praritlamba.github.io