Wednesday 26 February, 2025

Today, I discussed the report with my partner, and we agreed to add two questions to address in our analysis:

  • What is the difference in the number of police shootings between males and females? (How many shooting incidents involve each gender?)
  • What is the difference in police shootings across various racial groups? (How many shooting incidents involve each racial group?)

We then came up with the idea of using a stacked bar plot to visualize the answers to these two questions.

Wednesday 19 February, 2025

Today, I discussed my findings with my partner while exploring the relationship between gender and racial groups.

We brainstormed ideas on how to visualize these statistics for our audience. We considered using a heatmap or a bar chart.

We also began writing the report, starting with the title and identifying the key issues we need to address.

Wednesday 12 February, 2025

Today, I conducted some statistical tests to evaluate whether there is a relationship between race and gender in police use of force. I aimed to determine whether certain racial groups have a higher-than-expected number of male or female victims. Such a finding would indicate a racially disproportionate pattern in law enforcement interactions.

To do this, I used a contingency table with two categorical variables: race (rows) and gender (columns). Each cell in the table holds the number of police shooting incidents involving a specific combination of racial group and gender. I then calculated the expected frequency for each cell under the assumption that race and gender are independent, and compared these expected values with the observed frequencies (the cell values).
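The expected-frequency step described above can be sketched as follows. This is only an illustration, not the script we actually ran: the group labels and counts are made up, and the helper name expected_frequencies is hypothetical. It uses the standard rule that, under independence, the expected count of a cell is its row total times its column total divided by the grand total.

```python
# Hypothetical observed counts (rows: race, columns: gender); not real data.
observed = {
    "Group A": {"Male": 90, "Female": 10},
    "Group B": {"Male": 45, "Female": 5},
    "Group C": {"Male": 40, "Female": 10},
}


def expected_frequencies(table):
    """Expected count per cell if the row and column variables are independent."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for col, count in cols.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    # expected = row total * column total / grand total
    return {
        row: {col: row_totals[row] * col_totals[col] / grand_total for col in col_totals}
        for row in table
    }


expected = expected_frequencies(observed)

# Print observed vs. expected side by side to see where they diverge.
for race, cols in observed.items():
    for gender, obs in cols.items():
        exp = expected[race][gender]
        print(f"{race:8s} {gender:6s} observed={obs:3d} expected={exp:6.2f} diff={obs - exp:+.2f}")
```

Cells where the observed count is well above or below the expected count are the ones worth a closer look.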

The results show that there are significant differences in how law enforcement engages with different racial communities, particularly among White, Black, and Hispanic men and women. These results suggest potential disparities in police use of force based on both race and gender.

Wednesday 5 February, 2025

Today, I discussed the number of police-involved shootings by station with my professor. I tried to build a Pareto chart from data grouped by police station name, with the number of incidents each station was involved in. However, unlike my professor, I sorted the data in ascending order (from the lowest to the highest number of shootings). Upon seeing the chart, I realized that this approach was incorrect. For a Pareto chart to work, the data must be sorted in descending order, so that the largest contributors come first.
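As a reminder of the ordering a Pareto chart needs, here is a minimal sketch of the table behind the chart: counts sorted from highest to lowest, plus the running cumulative percentage. The station names and counts are hypothetical, and pareto_table is an illustrative helper, not our actual code.

```python
# Hypothetical station names and incident counts; not taken from the dataset.
incidents_by_station = {
    "Station A": 5,
    "Station B": 40,
    "Station C": 15,
    "Station D": 30,
    "Station E": 10,
}


def pareto_table(counts):
    """Return (name, count, cumulative_percent) rows, largest count first."""
    # Descending order is the key step: reverse=True puts the biggest bars first.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(counts.values())
    rows = []
    running = 0
    for name, count in ordered:
        running += count
        rows.append((name, count, 100 * running / total))
    return rows


pareto_rows = pareto_table(incidents_by_station)
for name, count, cum_pct in pareto_rows:
    print(f"{name:10s} {count:3d} {cum_pct:6.1f}%")
```

The counts would be drawn as bars and the cumulative percentage as a line on top of them.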

After that, my partner and I continued exploring the dataset and performed data munging by removing unnecessary columns such as race_source, latitude, longitude, location_precision, name, and id. We noticed a gradual increase in the number of incidents from 2016 to the present. We may need to investigate which features are associated with this rise over the years.
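A rough sketch of this munging step, using only the standard library: drop the unneeded columns, then count incidents per year. The three sample rows are invented, and the "date" column name and its ISO format (YYYY-MM-DD) are assumptions about the file rather than something recorded in this journal.

```python
import csv
import io
from collections import Counter

# Tiny hypothetical sample standing in for the real file; all values are made up.
SAMPLE_CSV = """id,date,name,race,race_source,latitude,longitude,location_precision
1,2016-03-01,Person A,W,photo,0.0,0.0,address
2,2016-07-15,Person B,B,photo,0.0,0.0,address
3,2017-01-20,Person C,H,photo,0.0,0.0,address
"""

DROP_COLUMNS = {"race_source", "latitude", "longitude", "location_precision", "name", "id"}


def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted keys."""
    return [{key: value for key, value in row.items() if key not in unwanted} for row in rows]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = drop_columns(rows, DROP_COLUMNS)

# Assumes "date" is ISO formatted, so the first four characters are the year.
incidents_per_year = Counter(row["date"][:4] for row in cleaned)
for year in sorted(incidents_per_year):
    print(year, incidents_per_year[year])
```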

Wednesday 29 January, 2025

Today, I discussed the analysis of the fatal police shootings dataset with my teammate.

We identified the following research questions for our analysis:

  1. Which agency IDs are associated with the highest number of fatal shooting incidents?
  2. What are the predominant race, age group, and threat type characteristics within these agency IDs that report the most fatal shooting incidents?

Based on guidance from our professor, we determined that it would be better to list all agencies, ranked by the number of shooting incidents attributed to each agency.

We implemented Python scripts to load the dataset and perform preliminary descriptive analyses. Specifically, we:

  • Extracted key columns, including state, threat_type, race, and agency_ids.
  • Assessed the dataset by determining the number of unique values within each column.
  • Calculated the frequency distribution of values within each column.
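The steps above could look roughly like this. It is a standard-library sketch, not our actual scripts: the four sample rows are invented, and summarize is a hypothetical helper name. Only the column names come from the list above.

```python
import csv
import io
from collections import Counter

# Hypothetical rows with the key columns only; values are made up.
SAMPLE_CSV = """state,threat_type,race,agency_ids
TX,shoot,W,101
TX,threat,B,101
CA,shoot,H,202
CA,shoot,W,303
"""

KEY_COLUMNS = ["state", "threat_type", "race", "agency_ids"]


def summarize(rows, columns):
    """Map each column name to a Counter of the values seen in that column."""
    return {col: Counter(row[col] for row in rows) for col in columns}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows, KEY_COLUMNS)

for col, counts in summary.items():
    # len(counts) is the number of unique values in the column.
    print(f"{col}: {len(counts)} unique values")
    # most_common() lists values from most to least frequent.
    for value, n in counts.most_common():
        print(f"  {value}: {n}")
```

The same counter for agency_ids, read from most to least frequent, gives the agency ranking our professor suggested.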