Architecture details

  • Platform: Microsoft Azure
  • Main Components: Azure SQL, Azure Databricks, Azure ML Studio, Python, Visual Studio Code, ElasticSearch, Tableau, and GitHub Pages
  • DATTE Current Version: Most Viable Product (MVP) v.01
  • Last Update date: 13 April, 2022

DAATE MVP (v.01):

The DAATE MVP leverages Microsoft Azure Cloud infrastructure to host the synchronous pipeline and data ingestion and processing. The Florida Department of Corrections (DOC) data from 2004-2016 was used and stored in this infrastructure. This backend is used for processing the DOC data through various modelling techniques and is built for high scalability and modularity. The data and modelling results are accessed by Tableau and served on this site via GitHub pages. The pipeline and process consists of multiple stages:

  • Ingest the Florida Department of Corrections (DOC) data from 2004-2016 to Azure Blob Storage
  • Leverage Databricks to move to Azure SQL DB table
  • Elastic search, Azure ML and Python to perform EDA and create MVP Azure SQL table
  • Perform multi-modelling in Azure ML and Python updating the MPV Azure SQL table
  • Tableau is used to access MVP Azure SQL table to create dashboard
  • GitHub pages are used to serve up the DAATE website


  • DAATE Future state (v.02):

    The team sees many different areas that could be explored beyond the MVP in v.02 including:

  • Process DOC data from 2016-current year
  • Investigate the use of social feeds as data sources
  • Extend to other United States cities
  • Image processing of mug shots to determine if there are other factors that influence sentencing
  • Introduce natural language processing (NLP) of court transcripts
  • Partner with an organization that focuses on equity in the criminal justice system.