# Welcome!

#### Time Series Data Modeling Example for Cassandra

In this scenario, you will:

- Create tables for a time series use case
- Populate tables with sample time series data
- Design and execute CQL queries over time series data

*This scenario is also available on our datastax.com/dev site, where you can find many more resources to help you succeed with Apache Cassandra™.*

# Congratulations!

#### You've completed the scenario!

In this scenario, you explored:

- Schema design for a time series use case
- Sample time series data
- CQL queries over time series data

### Steps

### Time Series Data Modeling Example for Cassandra

#### Create a keyspace

Create the `time_series` keyspace:

```
cqlsh -e "
CREATE KEYSPACE time_series
WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC-Houston': 1 };"
```

#### Create tables

Create tables `sources_by_group`, `metrics`, `series_by_source_high`, `series_by_source_low`, `series_by_metric_high`, `series_by_metric_low`, and `statistics_by_source_metric`:

```
cqlsh -e "
USE time_series;
CREATE TABLE sources_by_group (
group TEXT,
source TEXT,
characteristics MAP<TEXT,TEXT>,
description TEXT STATIC,
PRIMARY KEY ((group), source)
);
CREATE TABLE metrics (
bucket TEXT,
metric TEXT,
unit TEXT,
PRIMARY KEY ((bucket), metric)
);
CREATE TABLE series_by_source_high (
group TEXT,
source TEXT,
timestamp TIMESTAMP,
metric TEXT,
value DECIMAL,
PRIMARY KEY ((group, source), timestamp, metric)
) WITH CLUSTERING ORDER BY (timestamp DESC, metric ASC);
CREATE TABLE series_by_source_low (
group TEXT,
year INT,
source TEXT,
timestamp TIMESTAMP,
metric TEXT,
value DECIMAL,
PRIMARY KEY ((group, year), source, timestamp, metric)
) WITH CLUSTERING ORDER BY (source ASC, timestamp DESC, metric ASC);
CREATE TABLE series_by_metric_high (
group TEXT,
metric TEXT,
timestamp TIMESTAMP,
source TEXT,
value DECIMAL,
PRIMARY KEY ((group, metric), timestamp, source)
) WITH CLUSTERING ORDER BY (timestamp DESC, source ASC);
CREATE TABLE series_by_metric_low (
group TEXT,
year INT,
metric TEXT,
timestamp TIMESTAMP,
source TEXT,
value DECIMAL,
PRIMARY KEY ((group, year, metric), timestamp, source)
) WITH CLUSTERING ORDER BY (timestamp DESC, source ASC);
CREATE TABLE statistics_by_source_metric (
source TEXT,
metric TEXT,
date DATE,
min DECIMAL,
max DECIMAL,
median DECIMAL,
mean DECIMAL,
stdev DECIMAL,
PRIMARY KEY ((source, metric), date)
) WITH CLUSTERING ORDER BY (date DESC);"
```
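
Notice that the low-resolution tables add `year` to the partition key. A low-resolution series can accumulate rows for years, so bucketing by year keeps any single partition from growing without bound. A minimal Python sketch of the idea, using a hypothetical `partition_key_low` helper standing in for application code:

```python
from datetime import datetime, timezone

def partition_key_low(group: str, reading_time: datetime) -> tuple:
    # The (group, year) pair mirrors the partition key of series_by_source_low:
    # bucketing by year bounds the size of each partition.
    return (group, reading_time.year)

key = partition_key_low("House A", datetime(2020, 10, 5, 12, 44, tzinfo=timezone.utc))
```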

#### Populate tables using DSBulk

Load data into table `sources_by_group`:

```
dsbulk load -url sources.csv \
-k time_series \
-t sources_by_group \
-header true \
-logDir /tmp/logs
```

Retrieve rows from table `sources_by_group`:

```
cqlsh -e "
SELECT group, source, description,
characteristics['Model number']
FROM time_series.sources_by_group;"
```

Load data into table `metrics`:

```
dsbulk load -url metrics.csv \
-k time_series \
-t metrics \
-header true \
-logDir /tmp/logs
```

Retrieve rows from table `metrics`:

```
cqlsh -e "SELECT * FROM time_series.metrics;"
```

Load data into tables `series_by_source_high` and `series_by_metric_high`:

```
dsbulk load -url series_high_resolution.csv \
-k time_series \
-t series_by_source_high \
-header true \
-logDir /tmp/logs
dsbulk load -url series_high_resolution.csv \
-k time_series \
-t series_by_metric_high \
-header true \
-logDir /tmp/logs
```
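
The same CSV is loaded into both tables because Cassandra favors denormalization: each query pattern gets its own table, and every reading is written to each of them. A hedged Python sketch of that dual-write idea, with in-memory dictionaries standing in for the two tables:

```python
# Hypothetical in-memory stand-ins for the two denormalized tables.
series_by_source = {}   # keyed by (group, source)
series_by_metric = {}   # keyed by (group, metric)

def record_reading(group, source, metric, ts, value):
    """Write one reading to both query tables, mirroring the dual DSBulk loads."""
    series_by_source.setdefault((group, source), []).append((ts, metric, value))
    series_by_metric.setdefault((group, metric), []).append((ts, source, value))

record_reading("House A", "Refrigerator A1", "temperature", "2020-10-05 12:44:00", 3.5)
```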

Retrieve rows from tables `series_by_source_high` and `series_by_metric_high`:

```
cqlsh -e "SELECT * FROM time_series.series_by_source_high LIMIT 5;"
cqlsh -e "SELECT * FROM time_series.series_by_metric_high LIMIT 5;"
```

Load data into tables `series_by_source_low` and `series_by_metric_low`:

```
dsbulk load -url series_low_resolution.csv \
-k time_series \
-t series_by_source_low \
-header true \
-logDir /tmp/logs
dsbulk load -url series_low_resolution.csv \
-k time_series \
-t series_by_metric_low \
-header true \
-logDir /tmp/logs
```

Retrieve rows from tables `series_by_source_low` and `series_by_metric_low`:

```
cqlsh -e "SELECT * FROM time_series.series_by_source_low LIMIT 5;"
cqlsh -e "SELECT * FROM time_series.series_by_metric_low LIMIT 5;"
```

Load data into table `statistics_by_source_metric`:

```
dsbulk load -url statistics_by_source_metric.csv \
-k time_series \
-t statistics_by_source_metric \
-header true \
-logDir /tmp/logs
```

Retrieve rows from table `statistics_by_source_metric`:

```
cqlsh -e "SELECT * FROM time_series.statistics_by_source_metric LIMIT 5;"
```

#### Start the CQL shell

Start the CQL shell and connect to the `time_series` keyspace:

```
cqlsh -k time_series
```

#### Design query Q1

Find information about all data sources in group `House A`:

## Solution

```
SELECT group, source, description,
characteristics['Model number']
FROM sources_by_group
WHERE group = 'House A';
```

#### Design query Q2

Find information about all metrics stored in bucket `all`; order by metric name (asc):

## Solution

```
SELECT *
FROM time_series.metrics
WHERE bucket = 'all';
```

#### Design query Q3

Retrieve time series with a high resolution of 60 seconds for group `House A`, sources `Refrigerator A1` and `Freezer A1`, and time range [`2020-10-05 12:44:00`, `2020-10-05 12:47:00`]; order by timestamp (desc) and metric (asc):

## Solution

```
SELECT *
FROM time_series.series_by_source_high
WHERE group = 'House A'
AND source IN ('Refrigerator A1','Freezer A1')
AND timestamp >= '2020-10-05 12:44:00'
AND timestamp <= '2020-10-05 12:47:00';
```

#### Design query Q4

Retrieve time series with a low resolution of 60 minutes for group `House A`, sources `Refrigerator A1` and `Freezer A1`, and time range [`2020-10-05 12:00:00`, `2020-10-05 15:00:00`]; order by source (asc), timestamp (desc), and metric (asc):

## Solution

```
SELECT *
FROM time_series.series_by_source_low
WHERE group = 'House A'
AND year = 2020
AND source IN ('Refrigerator A1','Freezer A1')
AND timestamp >= '2020-10-05 12:00:00'
AND timestamp <= '2020-10-05 15:00:00';
```
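
Because `year` is part of the partition key, a single query can only reach one year bucket; a time range spanning multiple years needs one query per year. A hedged Python sketch of computing those buckets on the client side (the `year_buckets` helper is hypothetical):

```python
def year_buckets(start_year: int, end_year: int) -> list:
    # Year partitions a client must query when a time range spans years.
    return list(range(start_year, end_year + 1))

buckets = year_buckets(2019, 2021)  # → [2019, 2020, 2021]
```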

#### Design query Q5

Retrieve time series with a high resolution of 60 seconds for metric `temperature`, group `House A`, and time range [`2020-10-04 23:59:00`, `2020-10-05 00:01:00`]; order by timestamp (desc) and source (asc):

## Solution

```
SELECT *
FROM time_series.series_by_metric_high
WHERE group = 'House A'
AND metric = 'temperature'
AND timestamp >= '2020-10-04 23:59:00'
AND timestamp <= '2020-10-05 00:01:00';
```

#### Design query Q6

Retrieve time series with a low resolution of 60 minutes for metric `temperature`, group `House A`, and time range [`2019-01-01 00:00:00`, `2019-01-01 06:00:00`]; order by timestamp (desc) and source (asc):

## Solution

```
SELECT *
FROM time_series.series_by_metric_low
WHERE group = 'House A'
AND year = 2019
AND metric = 'temperature'
AND timestamp >= '2019-01-01 00:00:00'
AND timestamp <= '2019-01-01 06:00:00';
```

#### Design query Q7

Find daily min, max, median, mean, and standard deviation values for a time series with source `Termostate A2`, metric `humidity`, and date range [`2019-12-25`, `2020-01-07`]; order by date (desc):

## Solution

```
SELECT *
FROM time_series.statistics_by_source_metric
WHERE source = 'Termostate A2'
AND metric = 'humidity'
AND date >= '2019-12-25'
AND date <= '2020-01-07';
```
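
The statistics table holds precomputed daily aggregates because CQL has no built-in median or standard deviation over a partition. A hedged Python sketch of how a batch job might compute one day's row before loading it (the sample readings are made up):

```python
import statistics

# Hypothetical humidity samples collected over one day for one source.
readings = [41.2, 40.8, 42.0, 41.5, 40.9]

# One row for statistics_by_source_metric, computed client-side.
daily = {
    "min": min(readings),
    "max": max(readings),
    "median": statistics.median(readings),
    "mean": statistics.mean(readings),
    "stdev": statistics.stdev(readings),
}
```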