`let strs = splitOn "," "bob,joe,nick"`

Many interesting analysis techniques can be used on a large corpus of **words** to examine the structure of a **sentence** or the contents of a book.

- Base conversion
- Substring search (Boyer–Moore–Horspool, Rabin-Karp)
- Split a string
- Longest common subsequence
- Phonetic code
- Edit distance
- Jaro–Winkler distance
- Scraping text
- Fixing spelling mistakes
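
As a small illustration of the string-splitting recipe, here is a minimal sketch that uses only the Prelude. The helper name `splitOnChar` is hypothetical and handles a single delimiter character; it is not the `splitOn` function used in the teaser line, which takes a delimiter string.

```haskell
-- Split a string on a single delimiter character, using only the Prelude.
-- `splitOnChar` is a hypothetical helper written for illustration.
splitOnChar :: Char -> String -> [String]
splitOnChar delim s =
  case break (== delim) s of
    -- No delimiter left: the remainder is the final chunk.
    (chunk, [])       -> [chunk]
    -- Drop the delimiter and keep splitting the rest.
    (chunk, _ : rest) -> chunk : splitOnChar delim rest
```

For example, `splitOnChar ',' "bob,joe,nick"` evaluates to `["bob","joe","nick"]`.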

`let checksum = md5 file`

To summarize an item as a small, typically fixed-length value, we apply a **hashing function** to it. This chapter covers the following recipes:

- Hashing data
- MD5 and cryptographic checksums
- Using a hash table
- Google's CityHash
- Geohashing
- Bloom filter
- Perceptual hashing
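
To make the idea of a fixed-length summary concrete, here is a minimal sketch of a non-cryptographic string hash in the style of the well-known djb2 scheme. The name `simpleHash` is an assumption made for illustration and is unrelated to the `md5` call in the teaser line; it is not suitable for checksums or security purposes.

```haskell
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32)

-- Fold each character into a 32-bit accumulator (djb2 style: acc * 33 + c).
-- Word32 arithmetic wraps on overflow, so the result always fits in 32 bits
-- regardless of the input length.
simpleHash :: String -> Word32
simpleHash = foldl' step 5381
  where
    step acc c = acc * 33 + fromIntegral (ord c)
```

Whatever the length of the input string, the output is a single `Word32`, which is what makes such values convenient as hash table keys.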

`data Tree a = Node a (Tree a) (Tree a) | Null`

Everything from creating simple **binary trees** to practical applications such as **Huffman trees** is covered in this section.

- Binary tree
- Rose tree
- Depth-first traversal
- Breadth-first traversal
- Height of a tree
- Binary search tree
- AVL tree
- Min-heap
- Huffman tree encoding and decoding
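
The following is a minimal sketch of a binary tree with its height and two traversals. It assumes the `Node`/`Null` teaser declaration is meant as a polymorphic type with one value and two subtrees per node; the function names are illustrative.

```haskell
-- A polymorphic binary tree: each node holds a value and two subtrees.
data Tree a = Node a (Tree a) (Tree a) | Null

-- Height: number of nodes on the longest root-to-leaf path (0 when empty).
height :: Tree a -> Int
height Null = 0
height (Node _ l r) = 1 + max (height l) (height r)

-- Pre-order depth-first traversal: node, then left subtree, then right.
depthFirst :: Tree a -> [a]
depthFirst Null = []
depthFirst (Node v l r) = v : depthFirst l ++ depthFirst r

-- Breadth-first traversal: visit level by level, using a list as a queue.
breadthFirst :: Tree a -> [a]
breadthFirst t = go [t]
  where
    go [] = []
    go (Null : rest) = go rest
    go (Node v l r : rest) = v : go (rest ++ [l, r])

-- A small example tree: 1 has children 2 and 3, and 2 has a left child 4.
example :: Tree Int
example = Node 1 (Node 2 (Node 4 Null Null) Null) (Node 3 Null Null)
```

On `example`, `depthFirst` yields `[1,2,4,3]`, `breadthFirst` yields `[1,2,3,4]`, and `height` is `3`. Appending to a list with `++` keeps the queue simple but is linear per step; a dedicated queue structure would be a better fit for large trees.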

`type Graph = Table [Vertex]`

A graph allows for **representing network data** such as social networks, biological gene relationships, and road topologies. Graphs are very common in data analysis, and this chapter covers some **essential algorithms**.

- List of edges
- Adjacency list
- Topological sort
- Depth-first traversal
- Breadth-first traversal
- Visualizing a graph
- Directed acyclic word graphs
- Hexagonal and square grids
- Maximal cliques
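
As a rough illustration of the adjacency-list representation and depth-first traversal, here is a minimal sketch using plain association lists. The `AdjList` type and the helper names are assumptions made for this example; they are not the `Graph` type from the teaser line.

```haskell
type Vertex = Int

-- An adjacency list: each vertex paired with the vertices it points to.
type AdjList = [(Vertex, [Vertex])]

-- Outgoing neighbors of a vertex (empty if the vertex is not listed).
neighbors :: AdjList -> Vertex -> [Vertex]
neighbors g v = maybe [] id (lookup v g)

-- Depth-first traversal from a start vertex, using an explicit stack
-- and a visited list so that shared or cyclic vertices appear only once.
dfs :: AdjList -> Vertex -> [Vertex]
dfs g start = reverse (go [start] [])
  where
    go [] visited = visited
    go (v : stack) visited
      | v `elem` visited = go stack visited
      | otherwise        = go (neighbors g v ++ stack) (v : visited)

-- A small directed graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4.
sampleGraph :: AdjList
sampleGraph = [(1, [2, 3]), (2, [4]), (3, [4]), (4, [])]
```

Here `dfs sampleGraph 1` visits the vertices in the order `[1,2,4,3]`. The `elem` check on a list is linear, so a set-based visited structure would be preferable for larger graphs.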

`let (b, m) = linearRegression xs ys`

This chapter contains recipes that answer questions about **data deviation from the norm**, **existence of linear and quadratic trends**, and **probabilistic values** of a network.

- Moving average and median
- Linear and quadratic regression
- Covariance matrix
- Pearson correlation coefficient
- Bayesian network
- Playing cards
- Markov chain
- N-grams
- Neural network perceptron
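
To show what a linear trend fit computes, here is a minimal ordinary least-squares sketch for the line y = b + m·x. The function name `linearFit` is hypothetical, and reading the teaser's `(b, m)` as (intercept, slope) is an assumption.

```haskell
-- Ordinary least-squares fit of y = b + m * x.
-- Returns (b, m): intercept first, then slope.
-- Assumes both lists have the same length, contain at least two points,
-- and that the xs are not all identical (otherwise sxx is zero).
linearFit :: [Double] -> [Double] -> (Double, Double)
linearFit xs ys = (b, m)
  where
    n     = fromIntegral (length xs)
    meanX = sum xs / n
    meanY = sum ys / n
    -- Sum of products of deviations from the means.
    sxy   = sum (zipWith (\x y -> (x - meanX) * (y - meanY)) xs ys)
    -- Sum of squared deviations of x from its mean.
    sxx   = sum (map (\x -> (x - meanX) ^ (2 :: Int)) xs)
    m     = sxy / sxx
    b     = meanY - m * meanX
```

For the points (1,3), (2,5), (3,7), (4,9), which lie exactly on y = 1 + 2x, the fit recovers an intercept of 1 and a slope of 2.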

`let clusters = kmeans points`

Computer algorithms are becoming better and better at analyzing large data sets. As machines get faster, so does their ability to detect **interesting patterns** in data.

- K-means clustering
- Hierarchical clustering
- Number of clusters
- Parts of speech
- Training a parts of speech tagger
- Word lexemes clustering
- Visualizing

`a <- rpar task1`

This chapter will cover **parallel** and **concurrent design**. Analyzing massive data sets is a real, practical problem, and this chapter presents techniques for tackling it.

- Benchmarking runtime
- Evaluating in parallel
- Controlling algorithms in sequence
- Forking IO
- Parallelizing pure functions
- Mapping in parallel
- Accessing tuple elements in parallel
- MapReduce

`plot X11 $ Data2D [Color Red] [] pts`

Visualizing data is important in every step of data analysis. An intuitive understanding of the data is always useful, so this chapter covers many ways to **graph data**.

- Plotting a line graph
- Plotting a pie chart
- Plotting a bar graph
- Displaying a scatter plot
- Visualizing a graphical network
- Using D3.js