Structure:

  1. Start a new project
  2. General Structure
  3. profiles.yml
  4. dbt Project

1. Start a new project

To start a new dbt project, open a terminal.

  1. First, activate the virtualenv that was set up in the previous article. Go to the folder with your virtual environment and run:

    source bin/activate
    
  2. Go to the folder where the dbt project should be saved.

  3. Initialize a new dbt project.

    dbt init my_first_dbt_project
    

    A new folder named “my_first_dbt_project” should now exist.

  4. Let's see which folders and files have been created. The following folders are created:

    • analysis
    • data
    • macros
    • models
    • tests

    Two files are also created:

    • dbt_project.yml
    • README.md

The next section gives a quick overview of how a dbt project is structured.

2. General Structure:

In the dbt universe there are two major .yml files. The first is profiles.yml, which defines all the connection details for the data warehouse. You can also define several target schemas in profiles.yml. The second is dbt_project.yml, which lives in the project folder and specifies how dbt operates inside that specific project. There you can define, among other things, which models are switched on or off. dbt_project.yml also holds a reference to a profile in profiles.yml, and through it to the data warehouse defined there.

3. profiles.yml

profiles.yml holds all the connection details between dbt and the data warehouse. It can contain several profiles for different data warehouses. Each profile can define multiple targets, such as a production environment and a development environment where the analyst can test and deploy new SQL.
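To make the idea of profiles and targets concrete, here is a minimal sketch of a profiles.yml. It assumes a Postgres warehouse, which this article does not specify. The host names, user names, database, schema names and the DBT_PASSWORD environment variable are all hypothetical placeholders. Other warehouses use different connection keys.

```yaml
# profiles.yml (a sketch, assuming the Postgres adapter)
my_first_dbt_project:        # profile name, referenced from dbt_project.yml
  target: dev                # target used when none is given on the command line
  outputs:
    dev:                     # development target (hypothetical values)
      type: postgres
      host: localhost
      port: 5432
      user: analyst
      pass: "{{ env_var('DBT_PASSWORD') }}"   # read the password from the environment
      dbname: analytics
      schema: dbt_dev        # models are built into this schema
      threads: 4
    prod:                    # production target (hypothetical values)
      type: postgres
      host: warehouse.example.com
      port: 5432
      user: dbt_prod
      pass: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: reporting
      threads: 4
```

With a file like this, `dbt run` would build into the dev target by default, and `dbt run --target prod` would switch to the production target.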

4. dbt Project

The project contains all the .sql models, macros, tests and .yml files that define the whole dbt project. dbt also comes with some predefined folders for the models themselves, the tests you should write, and the macros.

dbt_project.yml

This file contains all the configurations for the whole project. They include which folders to use, whether models are disabled, a reference to a profile in profiles.yml, and the like. The dbt_project.yml file in your sample project explains its own structure quite well. Check that the profile tag holds the correct name of your profile. You can also define more advanced features there, such as seed tables (csv files that dbt loads into the warehouse with the `dbt seed` command) or pre/post hooks that are invoked in every run.
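As an illustration, a pared-down project file could look like the sketch below. It assumes a dbt release from roughly the 0.17 to 0.21 range, which matches the folders that `dbt init` created above. In dbt 1.0 and later, `source-paths` was renamed to `model-paths` and `data-paths` to `seed-paths`. The profile name and the `legacy` subfolder are hypothetical.

```yaml
# dbt_project.yml (a sketch; key names vary between dbt versions)
name: my_first_dbt_project
version: "1.0.0"
config-version: 2

# Must match the top-level key of a profile in profiles.yml.
profile: my_first_dbt_project

# Folders dbt reads from, matching the generated project layout.
source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]

models:
  my_first_dbt_project:
    # Hypothetical subfolder models/legacy whose models are switched off.
    legacy:
      +enabled: false
```

If the name under `profile:` does not match any profile, dbt cannot find the connection details, so this is the first thing to check when a run cannot connect.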

Models Folder:

A model is a single SQL file. Each model produces exactly one table or view and therefore contains one final SELECT statement that defines that table or view. A model can, however, be based on multiple other models. For example, you can write several WITH clauses (common table expressions) inside one model and combine them in one final SELECT statement.

The models folder can contain several subfolders to structure your code. For example, you could have folders for the cleaning of data and folders for the reporting layer.
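Subfolders are also a convenient level for configuration. The fragment below is a sketch of the `models:` section of the project configuration file, assuming two hypothetical subfolders named `cleaning` and `reporting` and a project named `my_first_dbt_project`. It builds the cleaning models as views and the reporting models as tables.

```yaml
# Fragment of the models: section (folder names are hypothetical)
models:
  my_first_dbt_project:
    cleaning:
      +materialized: view    # every model under models/cleaning becomes a view
    reporting:
      +materialized: table   # every model under models/reporting becomes a table
```

A setting given on a folder applies to all models below it, so a single line can change how a whole layer is built.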

Analysis Folder:

Like the models folder, the analysis folder holds single SQL files, each with a SELECT statement. Here you can save the analytical queries that you use for your data science or reporting requests.

Testing Folder:

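The tests folder contains your custom SQL test queries. Such a query selects the rows that violate an expectation, and the test passes when the query returns no rows. Simpler column checks are declared in schema.yml files instead (typically the schema.yml files in the models folder).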
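For the schema.yml side, a minimal sketch could look like this. The model name `customers` and the column `customer_id` are hypothetical. `unique` and `not_null` are tests that ship with dbt.

```yaml
# models/schema.yml (a sketch with hypothetical model and column names)
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique      # fails if any customer_id appears more than once
          - not_null    # fails if any customer_id is NULL
```

Running `dbt test` executes these checks against the tables or views that were built for the model.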

Macro Folder:

The macros folder contains pieces of business logic that are used multiple times in the project. Following the DRY principle, isolate such code in one macro and call that macro from the models in the models folder.