# What Can Daru Do For You?

Data Analysis in RUby

At my job, I started work on a client project that was rather number intensive. We were going to have to perform repetitive calculations over a dataset. A colleague introduced me to the concept of data frames and the Ruby gem Daru.

### What is Daru?

It’s a data analysis tool that lets us build tabular data sets, which we can then manipulate and run calculations on. It’s also a data visualization tool. It’s worth noting that Daru is quite similar in functionality to the Python library pandas.

You can do a lot with Daru, but I’ll just be looking at a very small piece of it here. See the Resources section below for a link to the docs.

### Vectors and DataFrames

#### Vector

A *Vector* is a one-dimensional set of data, like an array. When I was working with Daru, I liked to think of a *Vector* as a single column from a spreadsheet. For example, a 7-day price history of some imaginary product fits naturally in a *Vector*. Note that a *Vector* can be named. Also, note that by default it gets a numeric index.

#### DataFrame

A *DataFrame*, on the other hand, is two-dimensional, like a spreadsheet.
Expanding on the example above, we can include a date column. We instantiate
the *DataFrame* with a hash of arrays. Each key of the hash is a column name
and the array contains that column's data. This is one of several ways to
compose a *DataFrame*. Another way is to use *Vectors* instead of arrays, in
which case the *Vector* indices are lined up with each other.

### So I have a *DataFrame*…Now what?

We can perform analysis on the *DataFrame*, like finding the mean, counts, min,
and max. We can also find the covariance and correlation between *Vectors*.
It’s also possible to perform SQL-like queries against the data. There is also
filtering and sorting…the list goes on and on. See the documentation for the
details. Here, I’ll show a couple of examples that I found useful.

#### Add a rolling mean column

We can add a column to the *DataFrame* that holds a rolling mean of the price
column with a lookback of 7.

#### Do some arithmetic

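What if we want to calculate the price difference with a lookback of 7? One way
to solve it is with the `lag` method, which shifts the values down by a given
number of rows so that each price can be subtracted from the one 7 rows later.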

#### Join *DataFrames*

Let’s say we want to compare the prices of two products. We could join two
*DataFrames* together à la SQL.

### Conclusion

Daru is a powerful data analysis tool whose surface I have barely scratched. Some other things of note:

- Creating *DataFrames* from CSVs or Excel files
- Grouping and aggregating data
- Graceful handling of missing data (nils)
- Pivot tables
- Data visualization

There is much more to this library than what I have shown, so I encourage the reader to explore more. I have provided some links below to get started.