# Three quick Python nuggets for beginner data scientists

This blog post gives an overview of three core Python features and utilities that you can adopt in your scripts to write more concise, easier-to-understand code:

- List comprehensions
- Slicing
- `collections.Counter`

This post presents several examples that will help you refactor existing Python code using these features.

## List comprehensions

List comprehensions are a great tool when you need to build up a list of data. They are especially useful when the resulting list is produced by a series of operations or derived from another list. Take this example, which calculates the Euclidean distance between two points, given as a list of coordinate pairs `(a, b)`, using a traditional imperative style that accumulates new elements into a result list:

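```python
import math

def get_distance(points):
    # Accumulate the squared difference of each coordinate pair
    diffs_squared_distance = []
    for a, b in points:
        diffs_squared_distance.append(pow(a - b, 2))
    # The distance is the square root of the sum of squares
    return math.sqrt(sum(diffs_squared_distance))
```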

By using list comprehensions you can refactor this code in a more concise and declarative form:

```python
import math

def get_distance(points):
    # Build the list of squared differences in a single expression
    diffs_squared_distance = [pow(a - b, 2) for (a, b) in points]
    return math.sqrt(sum(diffs_squared_distance))
```
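As a quick usage sketch, the snippet below builds the coordinate pairs for two hypothetical 2-D points `p` and `q` with the built-in `zip` and passes them to the refactored `get_distance`. The point values are assumptions chosen so the result is easy to check by hand (a 3-4-5 right triangle).

```python
import math

def get_distance(points):
    # Square the difference of each coordinate pair, then take the root of the sum
    diffs_squared_distance = [pow(a - b, 2) for (a, b) in points]
    return math.sqrt(sum(diffs_squared_distance))

# Two hypothetical 2-D points
p = (0, 0)
q = (3, 4)

# zip pairs up matching coordinates: [(0, 3), (0, 4)]
pairs = list(zip(p, q))
distance = get_distance(pairs)
print(distance)  # 5.0
```

Both versions of `get_distance` return the same value for the same input; only the way the intermediate list is built differs.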

The list comprehension here is `[pow(a - b, 2) for (a, b) in points]`. It contains three parts:

- An input sequence: here, the list `points`.
- A variable representing each element of the input: here, the tuple `(a, b)`.
- An output expression producing the elements of the new list: here, `pow(a - b, 2)`.

In other words, the code above says: for each `(a, b)` in the list of points, produce `pow(a - b, 2)`. The result is itself a list. List comprehensions can also include an optional `if` condition that filters which input elements are used. You can find more information in the Python documentation.
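A minimal sketch of the optional condition, assuming a hypothetical list `readings` in which `None` marks a missing value. The `if` clause filters the input before the output expression is applied.

```python
# Hypothetical sensor readings, with None marking missing values
readings = [3, None, 7, 1, None, 4]

# Condition only: keep the readings that are present
present = [r for r in readings if r is not None]

# Condition and transformation together: square the present readings greater than 2
large_squared = [r * r for r in readings if r is not None and r > 2]

print(present)        # [3, 7, 1, 4]
print(large_squared)  # [9, 49, 16]
```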

## Slicing

Python provides a built-in feature, called slicing, that lets you select a range of elements in a list. It is useful when you are manipulating a dataset whose samples are stored as elements of a list, for example to:

- select values for specific features in a range
- discard unnecessary data

Without slicing, you might write a helper function like this:

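```python
def slice_list(data, start_index, end_index):
    # Copy the elements from start_index up to (but not including) end_index
    sliced_data = []
    for i in range(start_index, end_index):
        sliced_data.append(data[i])
    return sliced_data

data = [10, 20, 30, 40, 50, 60]  # example values
slice_list(data, 1, 4)  # [20, 30, 40]
```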

In Python you can do exactly that with the built-in slicing syntax: `data[1:4]`. As with `range`, the start index is included and the end index is excluded.
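To confirm that the two approaches agree, here is a small check using a hypothetical `data` list. The `slice_list` helper from above is repeated so the snippet runs on its own.

```python
def slice_list(data, start_index, end_index):
    # Hand-written equivalent of data[start_index:end_index]
    sliced_data = []
    for i in range(start_index, end_index):
        sliced_data.append(data[i])
    return sliced_data

data = [10, 20, 30, 40, 50, 60]  # hypothetical sample values

manual = slice_list(data, 1, 4)
sliced = data[1:4]
print(manual)            # [20, 30, 40]
print(sliced)            # [20, 30, 40]
print(manual == sliced)  # True
```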

You can even omit the end index to select all the remaining elements in the list: `data[1:]` is equivalent to `data[1:len(data)]`.

You may also wish to produce a list without the last element. Negative indices count from the end of the list, so you can use -1 as the end index: `data[:-1]` is equivalent to `data[0:len(data) - 1]` (an omitted start index defaults to 0).
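The open-ended and negative-index forms can be checked the same way. The sketch below uses a hypothetical `rows` list whose first and last elements are not samples, a common situation when discarding unnecessary data.

```python
# Hypothetical rows: a header, three samples, and a footer
rows = ['header', 'sample_1', 'sample_2', 'sample_3', 'footer']

samples_and_footer = rows[1:]  # drop the first element
without_footer = rows[:-1]     # drop the last element
samples_only = rows[1:-1]      # drop both ends at once

print(samples_only)  # ['sample_1', 'sample_2', 'sample_3']

# The shorthand forms match their explicit equivalents
print(rows[1:] == rows[1:len(rows)])       # True
print(rows[:-1] == rows[0:len(rows) - 1])  # True

# Slicing returns a new list; the original is unchanged
print(len(rows))  # 5
```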

## `collections.Counter`

Providing summary information about a dataset is a frequent task. For example, you may wish to calculate the frequency of symbols in a dataset because the symbols represent different categories in your data. Traditionally, you might implement this yourself, for example with a `collections.defaultdict`:

```python
import collections

symbols = ['o', 'x', 'o', 'o', 'x', '-', '-', '-']
count = collections.defaultdict(int)  # missing keys start at 0
for s in symbols:
    count[s] += 1
print(count)
```

This prints `defaultdict(<class 'int'>, {'o': 3, 'x': 2, '-': 3})`.

Using the `collections.Counter` utility class, you can simply write `collections.Counter(symbols)`, which returns `Counter({'o': 3, '-': 3, 'x': 2})`. A `Counter` is a subclass of `dict`, so you can use it like any other dictionary.
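A short sketch comparing the two approaches on the same `symbols` list. Both results are converted to plain `dict` objects before comparing, so the check relies only on their keys and counts.

```python
import collections

symbols = ['o', 'x', 'o', 'o', 'x', '-', '-', '-']

# Manual counting, as in the defaultdict example
manual = collections.defaultdict(int)
for s in symbols:
    manual[s] += 1

# The same counts in a single call
count = collections.Counter(symbols)

# Compare as plain dicts: same keys, same counts
print(dict(count) == dict(manual))  # True

# A symbol that never appeared counts as 0 instead of raising KeyError
print(count['?'])  # 0
```

Unlike `defaultdict`, looking up a missing key in a `Counter` does not insert it, so `'?'` is still absent from `count` afterwards.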

The `Counter` class also provides more flexible operations. For example, you may wish to find the two most common elements:

```
count = collections.Counter(symbols)
count.most_common(2)
```

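which produces `[('o', 3), ('-', 3)]`. Elements with equal counts are ordered in the order they were first encountered (Python 3.7 and later).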
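A slightly longer sketch of other `Counter` operations, assuming a hypothetical second batch of observations: `update` adds counts in place, and dividing each count by the total gives relative frequencies.

```python
import collections

symbols = ['o', 'x', 'o', 'o', 'x', '-', '-', '-']
count = collections.Counter(symbols)

# Ties are broken by first appearance, so 'o' comes before '-'
print(count.most_common(2))  # [('o', 3), ('-', 3)]

# Add counts from a hypothetical second batch of observations
count.update(['x', 'x', 'x'])
print(count.most_common(1))  # [('x', 5)]

# Relative frequency of each symbol, built with a dict comprehension
total = sum(count.values())
frequencies = {symbol: n / total for symbol, n in count.items()}
print(total)  # 11
```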

You can find the IPython notebook with the examples here.