- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure
Here is one example.
Suppose there are two data sets and I'd like to do an inner join between these two tables. In plain Python, the logic might look like this:
data1 = [['id1', 'Kevin'], ['id2', 'John'], ['id3', 'Mike']]
data2 = [['id1', 31], ['id2', 28], ['id3', 34]]
join = []
# Nested-loop inner join: keep only the rows whose ids match
for b in data2:
    for a in data1:
        if b[0] == a[0]:
            row = list(b)  # copy the row so data2 is not modified
            row.append(a[1])
            join.append(row)
for j in join:
    print(j)
The output is as follows:
['id1', 31, 'Kevin']
['id2', 28, 'John']
['id3', 34, 'Mike']
With pandas, the code will be much shorter:
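import pandas as pd
# Wrap each list of rows in a DataFrame with named columns
new_data1 = pd.DataFrame(data1, columns=['id', 'name'])
new_data2 = pd.DataFrame(data2, columns=['id', 'age'])
# Inner join on the shared 'id' column
join2 = pd.merge(new_data1, new_data2, on='id', how='inner')
print(join2)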
The output is as follows:
    id   name  age
0  id1  Kevin   31
1  id2   John   28
2  id3   Mike   34
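In the example above every id appears in both tables, so it is hard to see what how='inner' really does. Here is a small sketch with one row that has no match. The 'id4' row and the variable names (names, ages, inner, left) are made up for illustration and are not part of the data above.

```python
import pandas as pd

# Hypothetical data: 'id4' exists only in the names table,
# and 'id3' exists only in the ages table
names = pd.DataFrame([['id1', 'Kevin'], ['id2', 'John'], ['id4', 'Amy']],
                     columns=['id', 'name'])
ages = pd.DataFrame([['id1', 31], ['id2', 28], ['id3', 34]],
                    columns=['id', 'age'])

# Inner join: keeps only the ids found in both tables
inner = pd.merge(names, ages, on='id', how='inner')

# Left join: keeps every row of `names`; a missing age becomes NaN
left = pd.merge(names, ages, on='id', how='left')

print(inner)
print(left)
```

The inner join should return only the id1 and id2 rows, while the left join also keeps id4 with an empty age. Changing the how argument is all it takes to switch between the two, which would need extra bookkeeping in the loop version.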