Model Information

Datasets Used

Number of Years in Dataset

Data Science Techniques Applied

Subject Matter Experts Involved

The Use of Judge Name as a Segment in our Predictive and Causal Models

Based on direct feedback from our subject matter experts, the predictive and causal modelling approaches were run both with and without judge information for comparison purposes.

We chose counties with enough judge information for our modelling purposes and noted that many of the judges' names were misspelled, shortened, or abbreviated. We applied "fuzzy" search logic, which you can find in this notebook, to group these variations of judges' names together.

  • Number of judges used: 3 (Andrews, Quesada, Peters)
  • Link to judge reference: Florida Sentencing Details by Judge
  • Challenges: We believe that OCR (or similar) was used to scan the court documents, as the team encountered several variations of judges' names. Fuzzy logic (Levenshtein distance) was used to collate the judge names, and sanity checks were made to ensure accuracy; a minimal sketch of this grouping step follows.
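
  A minimal sketch of the grouping step, assuming the raw names arrive as a plain Python list. The distance threshold and the sample misspellings are illustrative only; the actual logic lives in the linked notebook.

  ```python
  def levenshtein(a: str, b: str) -> int:
      """Classic dynamic-programming edit distance between two strings."""
      if len(a) < len(b):
          a, b = b, a
      prev = list(range(len(b) + 1))
      for i, ca in enumerate(a, 1):
          curr = [i]
          for j, cb in enumerate(b, 1):
              curr.append(min(prev[j] + 1,                 # deletion
                              curr[j - 1] + 1,             # insertion
                              prev[j - 1] + (ca != cb)))   # substitution
          prev = curr
      return prev[-1]

  def group_names(names, max_dist=2):
      """Map each raw spelling to the first canonical variant within max_dist edits."""
      canonical, mapping = [], {}
      for name in names:
          key = name.strip().upper()
          match = next((c for c in canonical if levenshtein(key, c) <= max_dist), None)
          if match is None:
              canonical.append(key)
              match = key
          mapping[name] = match
      return mapping

  # Illustrative misspellings; the real variants came from the OCR-scanned documents.
  print(group_names(["Quesada", "QUESDA", "Queseda", "Peters", "PETRES", "Andrews"]))
  # {'Quesada': 'QUESADA', 'QUESDA': 'QUESADA', 'Queseda': 'QUESADA',
  #  'Peters': 'PETERS', 'PETRES': 'PETERS', 'Andrews': 'ANDREWS'}
  ```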

  • The Use of Segments for the Predictive and Causal Models

    To narrow the scope for our MVP, we segmented the data by the three circuits and three crime types listed below; a minimal filtering sketch follows the list.

    • Circuits: CIRCUIT 06 - CLEARWATER, CIRCUIT 11 - MIAMI, CIRCUIT 17 - FT. LAUDERDALE
    • Crime Type: Drugs, Robbery and Traffic Control
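
    A minimal filtering sketch for this segmentation, assuming a pandas dataframe loaded from the Florida DOC extract; the file name and the column names "circuit" and "crime_type" are assumptions about the dataset layout.

    ```python
    import pandas as pd

    # Circuits and crime types selected for the MVP segment.
    CIRCUITS = ["CIRCUIT 06 - CLEARWATER", "CIRCUIT 11 - MIAMI",
                "CIRCUIT 17 - FT. LAUDERDALE"]
    CRIME_TYPES = ["Drugs", "Robbery", "Traffic Control"]

    df = pd.read_csv("florida_doc_sentencing.csv")  # hypothetical file name
    mvp = df[df["circuit"].isin(CIRCUITS) & df["crime_type"].isin(CRIME_TYPES)]
    ```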


    Predictive Modelling:

    We leveraged PyCaret to evaluate 19 algorithms and selected the best models by R2 score. Two sets of predictions of sentence length in days were created, with and without judge name as a variable; a sketch of this workflow follows the list below.

  • Algorithms Used: Huber, LightGBM, ridge regression
  • Model Inputs:
    • Age: 20-86 years of age
    • Gender: M/F
    • Race: Black/White
  • Model Output: Predicted sentence length in days
  • Assumptions: No omitted variables
  • Trade-offs: Model chosen with a slight loss in accuracy in exchange for faster training times
  • Challenges: The dataset was smaller for the 2nd- and 3rd-level crime types, causing some anomalous predictions
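
  A minimal sketch of this workflow, assuming the segmented data carries a numeric "sentence_days" target and a "judge_name" column; the file and column names are assumptions, not the project's actual schema.

  ```python
  import pandas as pd
  from pycaret.regression import setup, compare_models

  mvp = pd.read_csv("mvp_segment.csv")  # hypothetical segmented extract

  # Run the comparison twice: once with judge name, once without it.
  for drop_judge in (False, True):
      data = mvp.drop(columns=["judge_name"]) if drop_judge else mvp
      setup(data=data, target="sentence_days", session_id=42)
      best = compare_models(sort="R2")  # rank candidate regressors by R2
      print("without judge:" if drop_judge else "with judge:", best)
  ```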


  • Bias & Disparity Detection Engine (BDDE):

    The Bias and Disparity Detection Engine (BDDE) is a public solution created by a team at IBM for analysis of bias and disparity in federal US sentencing datasets. The BDDE team's goal is to isolate disparity in sentencing at the federal level between Black and White defendants. The BDDE team accomplishes this by comparing the average months of sentencing against a historical profile of actual sentences for similar charges by racial demographics.


    We include the BDDE approach and results here as a complement to DAATE and for comparison purposes only.

  • Approach:
  • Running BDDE: For DAATE, a BDDE inference Docker container was created following the guidelines on the Bias & Disparity Detection Engine GitHub page; a hypothetical sketch of this invocation appears after this list.
    • Circuit Used: CIRCUIT 11 - MIAMI
    • Model Inputs: A CSV was created from the Florida DOC dataset and sent to the BDDE inference service in the Docker container. This data includes:
      • Crime Type: Drug Trafficking
      • Amount: 6,7 (units not specified)
      • Gender: M/F
      • Race: Black/White
    • Model Output: Sentence length in days
    • Assumptions:
      • The dataset used by BDDE is Circuit 11 - Miami
      • We assumed a level of 6,7 for the amount of possession
    • Trade-offs:
      • We are using an existing service, so we have no control over its functionality
    • Challenges:
      • Documentation was lacking
      • Unclear how to combine these results with our predictive model, since BDDE does not take "age" as a model input
      • The number of sentencing days returned was much larger than expected
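
    A hypothetical sketch of the invocation described above. The image name, port, and endpoint are assumptions, not the documented BDDE interface; follow the BDDE GitHub page for the actual container build and call.

    ```python
    import subprocess

    import pandas as pd
    import requests

    # 1. Build the inference CSV from the segmented Florida DOC data
    #    (file and column names are assumptions about the dataset layout).
    df = pd.read_csv("mvp_segment.csv")
    miami = df[df["circuit"] == "CIRCUIT 11 - MIAMI"]
    cols = ["crime_type", "amount", "gender", "race"]
    miami[cols].to_csv("bdde_input.csv", index=False)

    # 2. Start the locally built inference container (image name/port assumed).
    subprocess.run(["docker", "run", "-d", "-p", "5000:5000", "bdde-inference"],
                   check=True)

    # 3. Post the CSV to the (assumed) inference endpoint.
    with open("bdde_input.csv", "rb") as f:
        resp = requests.post("http://localhost:5000/predict", files={"file": f})
    print(resp.json())  # expected: predicted sentence length in days per row
    ```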