In this project, I developed a predictive model using logistic regression to classify breast cancer as malignant or benign based on diagnostic data. The dataset was preprocessed by handling missing values, dropping irrelevant columns, and converting categorical data into numerical form so the model could use it.
Key Steps:
Data Preprocessing: Cleaned the dataset by removing unnecessary columns and handling missing values, then encoded categorical variables in binary (0/1) form, since logistic regression requires numeric inputs.
Exploratory Data Analysis: Used Seaborn and Matplotlib to explore data distributions and relationships, including a correlation-matrix heatmap to identify important features.
Model Training: Trained a logistic regression model on features standardized with StandardScaler, which helps the solver converge. The dataset was split into training and testing sets so that performance could be evaluated on data the model had not seen during training.
Model Evaluation: Achieved an accuracy of 98.2%. Evaluated the model with a confusion matrix, a classification report, and an ROC curve; the area under the ROC curve (AUC) indicated a strong ability to separate malignant from benign cases.
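The preprocessing, training, and evaluation steps above can be sketched as a small scikit-learn script. This is a minimal sketch under stated assumptions, not the project's actual code: it uses scikit-learn's bundled Wisconsin Diagnostic Breast Cancer data (`load_breast_cancer`) as a stand-in for the project's dataset, and the `id` and `empty_col` columns and the `M`/`B` labels are hypothetical, added only so the cleaning step has something to clean. The 80/20 split, `random_state=42`, and `max_iter=1000` are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_raw_frame():
    """Build a raw table resembling a diagnostic CSV (hypothetical layout)."""
    data = load_breast_cancer()
    df = pd.DataFrame(data.data, columns=data.feature_names)
    # In the bundled dataset, target 0 = malignant and 1 = benign.
    df.insert(0, "diagnosis", np.where(data.target == 0, "M", "B"))
    df.insert(0, "id", np.arange(len(df)))  # hypothetical identifier column
    df["empty_col"] = np.nan                 # hypothetical all-missing column
    return df


def preprocess(df):
    """Drop irrelevant and empty columns, handle missing values, encode the label."""
    df = df.drop(columns=["id"])             # identifiers carry no predictive signal
    df = df.dropna(axis=1, how="all")        # drop columns that are entirely missing
    df = df.dropna()                         # drop rows with any remaining gaps
    y = df.pop("diagnosis").map({"M": 1, "B": 0})  # malignant = 1, benign = 0
    return df, y


def train_and_evaluate(X, y, seed=42):
    """Split, scale, fit logistic regression, and score it on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    # Fit the scaler on the training split only, then apply it to both splits,
    # so no statistics from the test set leak into preprocessing.
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)

    model = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
    y_pred = model.predict(X_test_s)
    y_prob = model.predict_proba(X_test_s)[:, 1]  # estimated P(malignant)

    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(
            y_test, y_pred, target_names=["benign", "malignant"]),
        "auc": roc_auc_score(y_test, y_prob),
    }


if __name__ == "__main__":
    X, y = preprocess(make_raw_frame())
    results = train_and_evaluate(X, y)
    print(f"accuracy: {results['accuracy']:.3f}   ROC AUC: {results['auc']:.3f}")
    print(results["confusion_matrix"])
    print(results["report"])
```

Fitting the scaler after the split keeps test-set statistics out of training; wrapping both steps in a scikit-learn `Pipeline` achieves the same thing. The printed scores depend on the split and the stand-in data, so they will not necessarily match the 98.2% accuracy reported for the project.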
Visualization & Insights:
Produced visualizations, including heatmaps and ROC curves, that supported the model evaluation and communicated the findings clearly.
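The correlation heatmap and ROC curve could be generated roughly as follows. This sketch again assumes scikit-learn's bundled `load_breast_cancer` data as a stand-in, and the output file names are hypothetical. To keep dependencies small it draws the heatmap with Matplotlib's `imshow` instead of Seaborn; `seaborn.heatmap(corr)` would be the closer match to the project's tooling.

```python
import matplotlib
matplotlib.use("Agg")  # render to image files; no display needed
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def correlation_heatmap(X, path):
    """Save a heatmap of pairwise Pearson correlations and return the matrix."""
    corr = X.corr()
    labels = list(corr.columns)
    fig, ax = plt.subplots(figsize=(10, 8))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=90, fontsize=6)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels, fontsize=6)
    fig.colorbar(im, ax=ax, label="Pearson correlation")
    ax.set_title("Feature correlation matrix")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return corr


def roc_plot(y_true, y_score, path):
    """Save an ROC curve for the given scores and return its area (AUC)."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    roc_auc = auc(fpr, tpr)
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.plot(fpr, tpr, label=f"Logistic regression (AUC = {roc_auc:.3f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.set_title("ROC curve")
    ax.legend(loc="lower right")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return roc_auc


if __name__ == "__main__":
    data = load_breast_cancer(as_frame=True)
    X = data.data
    y = (data.target == 0).astype(int)  # relabel so that 1 = malignant
    correlation_heatmap(X, "correlation_heatmap.png")

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    # Scaler and classifier in one pipeline; the scaler is fit on X_train only.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    roc_auc = roc_plot(y_test, model.predict_proba(X_test)[:, 1], "roc_curve.png")
    print(f"ROC AUC: {roc_auc:.3f}")
```

The dashed diagonal marks a classifier that guesses at random (AUC = 0.5), so the further the curve bows toward the top-left corner, the better the model separates the two classes.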
This project demonstrates my proficiency in data preprocessing, visualization, model building, and evaluation using Python libraries such as pandas, numpy, scikit-learn, seaborn, and matplotlib, showcasing my ability to deliver actionable insights and accurate predictions in healthcare analytics.