Model Information

Datasets Used

Number of Years in Dataset

Data Science Techniques Applied

Subject Matter Experts Involved

The Use of Judge Name as a Segment in our Predictive and Causal Models

Based on direct feedback from our subject matter experts, the predictive and causal modelling approaches were run both with and without judge information for comparison purposes.

We chose counties with enough judge information for our modelling purposes and noted that many of the judges' names were misspelled, shortened, or abbreviated. We applied "fuzzy" search logic, which you can find in this notebook, to group these variations of judges' names together.

  • Number of judges used: 3 (Andrews, Quesada, Peters)
  • Link to judge reference: Florida Sentencing Details by Judge
  • Challenges: We believe that OCR (or similar) was used to scan the court documents, as the team encountered several variations of judges' names. Fuzzy logic (Levenshtein distance) was used to collate the judge names, and sanity checks were made to ensure accuracy; a minimal sketch of this grouping step follows.
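
  A minimal sketch of the grouping step, assuming the raw names arrive as a plain Python list. The distance threshold and the sample misspellings are illustrative only; the actual logic lives in the linked notebook.

  ```python
  def levenshtein(a: str, b: str) -> int:
      """Classic dynamic-programming edit distance between two strings."""
      if len(a) < len(b):
          a, b = b, a
      prev = list(range(len(b) + 1))
      for i, ca in enumerate(a, 1):
          curr = [i]
          for j, cb in enumerate(b, 1):
              curr.append(min(prev[j] + 1,                 # deletion
                              curr[j - 1] + 1,             # insertion
                              prev[j - 1] + (ca != cb)))   # substitution
          prev = curr
      return prev[-1]

  def group_names(names, max_dist=2):
      """Map each raw spelling to the first canonical variant within max_dist edits."""
      canonical, mapping = [], {}
      for name in names:
          key = name.strip().upper()
          match = next((c for c in canonical if levenshtein(key, c) <= max_dist), None)
          if match is None:
              canonical.append(key)
              match = key
          mapping[name] = match
      return mapping

  # Illustrative misspellings; the real variants came from the OCR-scanned documents.
  print(group_names(["Quesada", "QUESDA", "Queseda", "Peters", "PETRES", "Andrews"]))
  # {'Quesada': 'QUESADA', 'QUESDA': 'QUESADA', 'Queseda': 'QUESADA',
  #  'Peters': 'PETERS', 'PETRES': 'PETERS', 'Andrews': 'ANDREWS'}
  ```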

  • The Use of Segments for the Predictive and Causal Models

    To narrow the scope for our MVP, we segmented the data by the three circuits and three crime types listed below; a minimal filtering sketch follows the list.

    • Circuits: CIRCUIT 06 - CLEARWATER, CIRCUIT 11 - MIAMI, CIRCUIT 17 - FT. LAUDERDALE
    • Crime Type: Drugs, Robbery and Traffic Control
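
    A minimal filtering sketch for this segmentation, assuming a pandas dataframe loaded from the Florida DOC extract; the file name and the column names "circuit" and "crime_type" are assumptions about the dataset layout.

    ```python
    import pandas as pd

    # Circuits and crime types selected for the MVP segment.
    CIRCUITS = ["CIRCUIT 06 - CLEARWATER", "CIRCUIT 11 - MIAMI",
                "CIRCUIT 17 - FT. LAUDERDALE"]
    CRIME_TYPES = ["Drugs", "Robbery", "Traffic Control"]

    df = pd.read_csv("florida_doc_sentencing.csv")  # hypothetical file name
    mvp = df[df["circuit"].isin(CIRCUITS) & df["crime_type"].isin(CRIME_TYPES)]
    ```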


    Predictive Modelling:

    We leveraged PyCaret to evaluate 19 algorithms and selected the best models by R2 score. Two sets of predictions of sentence length in days were created, with and without judge name as a variable; a sketch of this workflow follows the list below.

  • Algorithms Used: Huber, LightGBM, ridge regression
  • Model Inputs:
    • Age: 20-86 years of age
    • Gender: M/F
    • Race: Black/White
  • Model Output: Predicted sentence length in days
  • Assumptions: No omitted variables
  • Trade-offs: Model chosen with a slight loss in accuracy in exchange for faster training times
  • Challenges: The dataset was smaller for the 2nd- and 3rd-level crime types, causing some anomalous predictions
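
  A minimal sketch of this workflow, assuming the segmented data carries a numeric "sentence_days" target and a "judge_name" column; the file and column names are assumptions, not the project's actual schema.

  ```python
  import pandas as pd
  from pycaret.regression import setup, compare_models

  mvp = pd.read_csv("mvp_segment.csv")  # hypothetical segmented extract

  # Run the comparison twice: once with judge name, once without it.
  for drop_judge in (False, True):
      data = mvp.drop(columns=["judge_name"]) if drop_judge else mvp
      setup(data=data, target="sentence_days", session_id=42)
      best = compare_models(sort="R2")  # rank candidate regressors by R2
      print("without judge:" if drop_judge else "with judge:", best)
  ```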


  • Bias & Disparity Detection Engine (BDDE):

    The Bias and Disparity Detection Engine (BDDE) is a public solution created by a team at IBM for analysis of bias and disparity in federal US sentencing datasets. The BDDE team's goal is to isolate disparity in sentencing at the federal level between Black and White defendants. The BDDE team accomplishes this by comparing the average months of sentencing against a historical profile of actual sentences for similar charges by racial demographics.


    We include the BDDE approach and results here as a complement to DAATE and for comparison purposes only.

  • Approach:
  • Running BDDE: For DAATE, a BDDE inference Docker container was created following the guidelines on the Bias & Disparity Detection Engine GitHub page; a hypothetical sketch of this invocation appears after this list.
    • Circuit Used: CIRCUIT 11 - MIAMI
    • Model Inputs: A CSV was created from the Florida DOC dataset and sent to the BDDE inference service in the Docker container. This data includes:
      • Crime Type: Drug Trafficking
      • Amount: 6,7 (units not specified)
      • Gender: M/F
      • Race: Black/White
    • Model Output: Sentence length in days
    • Assumptions:
      • The dataset used by BDDE is Circuit 11 - Miami
      • We assumed a level of 6,7 for the amount of possession
    • Trade-offs:
      • We are using an existing service, so we have no control over its functionality
    • Challenges:
      • Documentation was lacking
      • Unclear how to combine these results with our predictive model, since BDDE does not take "age" as a model input
      • The number of sentencing days returned was much larger than expected
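
    A hypothetical sketch of the invocation described above. The image name, port, and endpoint are assumptions, not the documented BDDE interface; follow the BDDE GitHub page for the actual container build and call.

    ```python
    import subprocess

    import pandas as pd
    import requests

    # 1. Build the inference CSV from the segmented Florida DOC data
    #    (file and column names are assumptions about the dataset layout).
    df = pd.read_csv("mvp_segment.csv")
    miami = df[df["circuit"] == "CIRCUIT 11 - MIAMI"]
    cols = ["crime_type", "amount", "gender", "race"]
    miami[cols].to_csv("bdde_input.csv", index=False)

    # 2. Start the locally built inference container (image name/port assumed).
    subprocess.run(["docker", "run", "-d", "-p", "5000:5000", "bdde-inference"],
                   check=True)

    # 3. Post the CSV to the (assumed) inference endpoint.
    with open("bdde_input.csv", "rb") as f:
        resp = requests.post("http://localhost:5000/predict", files={"file": f})
    print(resp.json())  # expected: predicted sentence length in days per row
    ```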