fix: fix duplicate
DiogoRibeiro7 committed Oct 18, 2024
1 parent f862366 commit dc11e6d
Showing 239 changed files with 5,087 additions and 3,442 deletions.
@@ -6,7 +6,9 @@ categories:
- Machine Learning
classes: wide
date: '2019-12-29'
excerpt: Splines are powerful tools for modeling complex, nonlinear relationships in data. In this article, we'll explore what splines are, how they work, and how they are used in data analysis, statistics, and machine learning.
excerpt: Splines are powerful tools for modeling complex, nonlinear relationships
in data. In this article, we'll explore what splines are, how they work, and how
they are used in data analysis, statistics, and machine learning.
header:
image: /assets/images/data_science_19.jpg
og_image: /assets/images/data_science_19.jpg
@@ -16,25 +18,30 @@ header:
twitter_image: /assets/images/data_science_19.jpg
keywords:
- Splines
- Spline Regression
- Nonlinear Models
- Data Smoothing
- Statistical Modeling
- python
- bash
- go
seo_description: Splines are flexible mathematical tools used for smoothing and modeling complex data patterns. Learn what they are, how they work, and their practical applications in regression, data smoothing, and machine learning.
- Spline regression
- Nonlinear models
- Data smoothing
- Statistical modeling
- Python
- Bash
- Go
seo_description: Splines are flexible mathematical tools used for smoothing and modeling
complex data patterns. Learn what they are, how they work, and their practical applications
in regression, data smoothing, and machine learning.
seo_title: What Are Splines? A Deep Dive into Their Uses in Data Analysis
seo_type: article
summary: Splines are flexible mathematical functions used to approximate complex patterns in data. They help smooth data, model non-linear relationships, and fit curves in regression analysis. This article covers the basics of splines, their various types, and their practical applications in statistics, data science, and machine learning.
summary: Splines are flexible mathematical functions used to approximate complex patterns
in data. They help smooth data, model non-linear relationships, and fit curves in
regression analysis. This article covers the basics of splines, their various types,
and their practical applications in statistics, data science, and machine learning.
tags:
- Splines
- Regression
- Data Smoothing
- Nonlinear Models
- python
- bash
- go
- Data smoothing
- Nonlinear models
- Python
- Bash
- Go
title: 'Understanding Splines: What They Are and How They Are Used in Data Analysis'
---

@@ -5,7 +5,9 @@ categories:
- Machine Learning
classes: wide
date: '2019-12-30'
excerpt: AUC-ROC and Gini are popular metrics for evaluating binary classifiers, but they can be misleading on imbalanced datasets. Discover why AUC-PR, with its focus on Precision and Recall, offers a better evaluation for handling rare events.
excerpt: AUC-ROC and Gini are popular metrics for evaluating binary classifiers, but
they can be misleading on imbalanced datasets. Discover why AUC-PR, with its focus
on Precision and Recall, offers a better evaluation for handling rare events.
header:
image: /assets/images/data_science_8.jpg
og_image: /assets/images/data_science_8.jpg
@@ -14,21 +16,28 @@ header:
teaser: /assets/images/data_science_8.jpg
twitter_image: /assets/images/data_science_8.jpg
keywords:
- AUC-PR
- Precision-Recall
- Binary Classifiers
- Imbalanced Data
- Machine Learning Metrics
seo_description: When evaluating binary classifiers on imbalanced datasets, AUC-PR is a more informative metric than AUC-ROC or Gini. Learn why Precision-Recall curves provide a clearer picture of model performance on rare events.
- Auc-pr
- Precision-recall
- Binary classifiers
- Imbalanced data
- Machine learning metrics
seo_description: When evaluating binary classifiers on imbalanced datasets, AUC-PR
is a more informative metric than AUC-ROC or Gini. Learn why Precision-Recall curves
provide a clearer picture of model performance on rare events.
seo_title: 'AUC-PR vs. AUC-ROC: Evaluating Classifiers on Imbalanced Data'
seo_type: article
summary: In this article, we explore why AUC-PR (Area Under Precision-Recall Curve) is a superior metric for evaluating binary classifiers on imbalanced datasets compared to AUC-ROC and Gini. We discuss how class imbalance distorts performance metrics and provide real-world examples of why Precision-Recall curves give a clearer understanding of model performance on rare events.
summary: In this article, we explore why AUC-PR (Area Under Precision-Recall Curve)
is a superior metric for evaluating binary classifiers on imbalanced datasets compared
to AUC-ROC and Gini. We discuss how class imbalance distorts performance metrics
and provide real-world examples of why Precision-Recall curves give a clearer understanding
of model performance on rare events.
tags:
- Binary Classifiers
- Imbalanced Data
- AUC-PR
- Precision-Recall
title: 'Evaluating Binary Classifiers on Imbalanced Datasets: Why AUC-PR Beats AUC-ROC and Gini'
- Binary classifiers
- Imbalanced data
- Auc-pr
- Precision-recall
title: 'Evaluating Binary Classifiers on Imbalanced Datasets: Why AUC-PR Beats AUC-ROC
and Gini'
---

When working with binary classifiers, metrics like **AUC-ROC** and **Gini** have long been the default for evaluating model performance. These metrics offer a quick way to assess how well a model discriminates between two classes, typically a **positive class** (e.g., detecting fraud or predicting defaults) and a **negative class** (e.g., non-fraudulent or non-default cases).
@@ -4,7 +4,8 @@ categories:
- Statistics
classes: wide
date: '2019-12-31'
excerpt: Let's examine why multiple imputation, despite being popular, may not be as robust or interpretable as it's often considered. Is there a better approach?
excerpt: Let's examine why multiple imputation, despite being popular, may not be
as robust or interpretable as it's often considered. Is there a better approach?
header:
image: /assets/images/data_science_20.jpg
og_image: /assets/images/data_science_20.jpg
@@ -13,18 +14,22 @@ header:
teaser: /assets/images/data_science_20.jpg
twitter_image: /assets/images/data_science_20.jpg
keywords:
- multiple imputation
- missing data
- single stochastic imputation
- deterministic sensitivity analysis
seo_description: Exploring the issues with multiple imputation and why single stochastic imputation with deterministic sensitivity analysis is a superior alternative.
- Multiple imputation
- Missing data
- Single stochastic imputation
- Deterministic sensitivity analysis
seo_description: Exploring the issues with multiple imputation and why single stochastic
imputation with deterministic sensitivity analysis is a superior alternative.
seo_title: 'The Case Against Multiple Imputation: An In-depth Look'
seo_type: article
summary: Multiple imputation is widely regarded as the gold standard for handling missing data, but it carries significant conceptual and interpretative challenges. We will explore its weaknesses and propose an alternative using single stochastic imputation and deterministic sensitivity analysis.
summary: Multiple imputation is widely regarded as the gold standard for handling
missing data, but it carries significant conceptual and interpretative challenges.
We will explore its weaknesses and propose an alternative using single stochastic
imputation and deterministic sensitivity analysis.
tags:
- Multiple Imputation
- Missing Data
- Data Imputation
- Multiple imputation
- Missing data
- Data imputation
title: A Deep Dive into Why Multiple Imputation is Indefensible
---

180 changes: 113 additions & 67 deletions _posts/2020-01-01-causality_correlation.md
@@ -4,7 +4,8 @@ categories:
- Statistics
classes: wide
date: '2020-01-01'
excerpt: Understand how causal reasoning helps us move beyond correlation, resolving paradoxes and leading to more accurate insights from data analysis.
excerpt: Understand how causal reasoning helps us move beyond correlation, resolving
paradoxes and leading to more accurate insights from data analysis.
header:
image: /assets/images/data_science_4.jpg
og_image: /assets/images/data_science_1.jpg
@@ -18,10 +19,14 @@ keywords:
- Berkson's paradox
- Correlation
- Data science
seo_description: Explore how causal reasoning, through paradoxes like Simpson's and Berkson's, can help us avoid the common pitfalls of interpreting data solely based on correlation.
seo_description: Explore how causal reasoning, through paradoxes like Simpson's and
Berkson's, can help us avoid the common pitfalls of interpreting data solely based
on correlation.
seo_title: 'Causality Beyond Correlation: Understanding Paradoxes and Causal Graphs'
seo_type: article
summary: An in-depth exploration of the limits of correlation in data interpretation, highlighting Simpson's and Berkson's paradoxes and introducing causal graphs as a tool for uncovering true causal relationships.
summary: An in-depth exploration of the limits of correlation in data interpretation,
highlighting Simpson's and Berkson's paradoxes and introducing causal graphs as
a tool for uncovering true causal relationships.
tags:
- Simpson's paradox
- Berkson's paradox
@@ -36,20 +41,41 @@ In today's data-driven world, we often rely on statistical correlations to make
This article is aimed at anyone who works with data and is interested in gaining a more accurate understanding of how to interpret statistical relationships. Here, we will explore how to uncover **causal relationships** in data, how to resolve confusing situations like **Simpson's Paradox** and **Berkson's Paradox**, and how to use **causal graphs** as a tool for making better decisions. The goal is to demonstrate that by understanding causality, we can avoid the pitfalls of over-relying on correlation and make more informed decisions.

---

## Correlation and Causation: Why the Distinction Matters

In statistics, **correlation** measures the strength of a relationship between two variables. For example, if you observe that ice cream sales increase as temperatures rise, you might conclude that warmer weather causes more ice cream to be sold. This conclusion feels intuitive, but what about cases where the data is less obvious? Imagine a study finds a correlation between shark attacks and ice cream sales. Does one cause the other? Clearly not—but the correlation exists because both are influenced by a common factor: hot weather.

This example underscores the central problem: **correlation does not imply causation**. Just because two variables move together doesn’t mean one causes the other. Correlation can arise for several reasons:

- **Direct causality**: One variable causes the other.
- **Reverse causality**: The relationship runs in the opposite direction.
- **Confounding variables**: A third variable influences both.
- **Coincidence**: The relationship is due to chance.

To understand the true nature of relationships in data, we need to go beyond correlation and ask **why** the variables are related. This is where **causal inference** comes in.
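The shark-attack example above is easy to reproduce with a toy simulation (all numbers here are invented for illustration): two series that never influence each other still correlate strongly once a common cause drives both.

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation, computed from scratch.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(0)

# Hot weather (the confounder) drives both quantities; neither causes the other.
temperature = [random.gauss(25, 5) for _ in range(10_000)]
ice_cream_sales = [2.0 * t + random.gauss(0, 3) for t in temperature]
shark_attacks = [0.5 * t + random.gauss(0, 3) for t in temperature]

r = pearson(ice_cream_sales, shark_attacks)
print(f"correlation(ice cream, shark attacks) = {r:.2f}")  # strongly positive
```

Neither variable appears anywhere in the other's formula, yet the correlation is large because both inherit temperature's variation.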

---

## The Importance of Causal Inference
Expand All @@ -61,23 +87,41 @@ In most real-world scenarios, we rely on **observational data**, which is data c
Fortunately, researchers have developed methods to uncover causal relationships from observational data by combining **statistical reasoning** with a deep understanding of the data's context. This is where **causal graphs** and tools like **Simpson's Paradox** and **Berkson's Paradox** come into play.

---

## Simpson's Paradox: The Danger of Aggregating Data

Simpson's Paradox is a statistical phenomenon in which a trend that appears in different groups of data disappears or reverses when the groups are combined. This paradox occurs because of a **lurking confounder**, a variable that influences both the independent and dependent variables, skewing the relationship between them.

### The Classic Example

Imagine you're analyzing the effectiveness of a new drug across two groups: younger patients and older patients. Within each group, the drug seems to improve health outcomes. However, when you combine the two groups, the overall analysis shows that the drug is **less** effective.

This reversal happens because age, a **confounding variable**, is driving the overall result. If more older patients received the drug and older patients have worse outcomes in general, it can skew the overall data. Thus, the combined analysis gives a misleading result, suggesting the drug is less effective when it actually benefits each group.

### Why Does This Happen?

Simpson’s Paradox occurs because the relationship between variables changes when data is aggregated. In the example above, **age** confounds the relationship between the drug and health outcomes. It’s important to note that combining data from different groups without accounting for confounders can hide the true relationships within each group.

This paradox demonstrates why it’s crucial to understand the **story behind the data**. If we simply relied on the overall correlation, we would draw the wrong conclusion about the drug’s effectiveness.
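A minimal numerical sketch of the reversal, using hypothetical recovery counts in the spirit of the example (the figures are invented, not from any real trial):

```python
# Hypothetical (recovered, total) counts for each arm, split by age group.
data = {
    "young": {"drug": (81, 87),   "no_drug": (234, 270)},
    "old":   {"drug": (192, 263), "no_drug": (55, 80)},
}

def rate(recovered, total):
    return recovered / total

# Within each age group, the drug looks better...
for group, arms in data.items():
    assert rate(*arms["drug"]) > rate(*arms["no_drug"])

# ...but pooling the groups reverses the conclusion, because most
# drug recipients are older patients with worse baseline outcomes.
drug_total = [sum(x) for x in zip(*(arms["drug"] for arms in data.values()))]
ctrl_total = [sum(x) for x in zip(*(arms["no_drug"] for arms in data.values()))]
print(f"pooled drug rate:    {rate(*drug_total):.0%}")  # 78%
print(f"pooled control rate: {rate(*ctrl_total):.0%}")  # 83%
```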

---

## Berkson's Paradox: The Pitfall of Selection Bias
@@ -99,39 +143,41 @@ Berkson's Paradox illustrates the problem of **selection bias**—when we restri
The key takeaway from Berkson’s Paradox is that we need to be careful about **how we select data for analysis**. If we focus only on a specific group without understanding how that group was selected, we can introduce misleading correlations.

---

## Causal Graphs: A Tool for Visualizing Relationships

To avoid falling into the traps of Simpson’s and Berkson’s Paradoxes, it’s helpful to use **causal graphs** to visualize the relationships between variables. These graphs, also known as **Directed Acyclic Graphs (DAGs)**, allow us to represent the causal structure of a system and identify which variables are influencing others.

### What Are Causal Graphs?

A **causal graph** is a diagram that represents variables as **nodes** and the causal relationships between them as **directed edges** (arrows). A directed edge from variable **A** to variable **B** indicates that **A** has a causal influence on **B**.

Causal graphs are powerful because they help us:

1. **Identify confounders**: Variables that influence both the independent and dependent variables.
2. **Clarify causal relationships**: Show which variables are direct causes and which are effects.
3. **Avoid incorrect controls**: Help us decide which variables to control for in statistical analysis.
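As a sketch, such a DAG can be encoded as a plain adjacency mapping, with candidate confounders read off as common ancestors of treatment and outcome (the node names and helper functions here are illustrative, not a standard library API):

```python
# A tiny DAG encoded as {node: set of direct causes (parents)}.
# Hypothetical encoding of the drug-trial example from this article.
dag = {
    "age": set(),
    "drug_use": {"age"},
    "health_outcome": {"age", "drug_use"},
}

def ancestors(dag, node):
    # All direct and indirect causes of `node`, via depth-first search.
    seen = set()
    stack = list(dag[node])
    while stack:
        cause = stack.pop()
        if cause not in seen:
            seen.add(cause)
            stack.extend(dag[cause])
    return seen

def confounders(dag, treatment, outcome):
    # Common causes of both treatment and outcome: candidates to control for.
    return (ancestors(dag, treatment) & ancestors(dag, outcome)) - {treatment}

print(confounders(dag, "drug_use", "health_outcome"))  # {'age'}
```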

### Using Causal Graphs to Resolve Simpson's Paradox

Let’s return to the example of the drug trial. A causal graph for this scenario might look like this:

- **Age** influences both **Drug Use** and **Health Outcome**.
- **Drug Use** directly affects **Health Outcome**.

In this case, **Age** is a **confounder** because it influences both the independent variable (**Drug Use**) and the dependent variable (**Health Outcome**). When we control for **Age**, we remove its confounding effect and can properly assess the impact of the drug on health outcomes.
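One simple way to "control for Age" is direct standardization: compute each arm's recovery rate within each age group, then average those rates weighted by each age group's share of the whole sample. A sketch with invented counts for the drug example:

```python
# Hypothetical (recovered, total) counts per arm, by age group.
counts = {
    "young": {"drug": (81, 87),   "no_drug": (234, 270)},
    "old":   {"drug": (192, 263), "no_drug": (55, 80)},
}

def adjusted_rate(counts, arm):
    # Standardize: weight each age group's rate by that group's share
    # of the combined population (drug + control together).
    sizes = {g: a["drug"][1] + a["no_drug"][1] for g, a in counts.items()}
    total = sum(sizes.values())
    return sum(
        (arms[arm][0] / arms[arm][1]) * sizes[g] / total
        for g, arms in counts.items()
    )

drug = adjusted_rate(counts, "drug")
control = adjusted_rate(counts, "no_drug")
print(f"age-adjusted: drug {drug:.0%} vs control {control:.0%}")  # 83% vs 78%
```

After adjustment the drug comes out ahead, matching the within-group comparisons rather than the misleading pooled one.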

### Using Causal Graphs to Resolve Berkson's Paradox

In the case of celebrities, a causal graph might look like this:

- **Talent** and **Attractiveness** are independent in the general population.
- **Celebrity Status** depends on both **Talent** and **Attractiveness**.

Here, **Celebrity Status** is a **collider**, a variable that is influenced by both **Talent** and **Attractiveness**. When we condition on a collider (i.e., focus only on celebrities), we create a spurious correlation between **Talent** and **Attractiveness**. The key is to recognize that the negative correlation between these variables only exists because we have selected a specific subset of the population (celebrities), not because there is a true relationship between talent and attractiveness.
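The collider effect is easy to demonstrate by simulation (the cutoff and sample size below are arbitrary choices): draw talent and attractiveness independently, keep only the "celebrities" whose sum clears a threshold, and the selected subset shows a clear negative correlation.

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation, computed from scratch.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(1)

# Talent and attractiveness are drawn independently in the population.
talent = [random.gauss(0, 1) for _ in range(50_000)]
attract = [random.gauss(0, 1) for _ in range(50_000)]

# "Celebrity" is a collider: you make the cut when talent + attractiveness
# is high, so among celebrities the two traits trade off against each other.
celebs = [(t, a) for t, a in zip(talent, attract) if t + a > 2.0]

r_all = pearson(talent, attract)
r_celebs = pearson(*zip(*celebs))
print(f"population r = {r_all:+.2f}, celebrity-only r = {r_celebs:+.2f}")
```

In the full population the correlation hovers near zero; conditioning on the collider manufactures a strong negative one out of nothing.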

---

## The Broader Implications of Causality in Data Analysis