Reading JSON into a pandas DataFrame when there are thousands of rows to append
I have a text file in which I have cleaned up each line so that it is valid JSON. I can read each line, clean it, and convert it into a pandas DataFrame.
My problem is that I want to combine them all into one DataFrame, but there are more than 200k lines.
I am reading each line in as d = '{"test1":"test2","data":{"key":{"isin":"test3"},"creationTimeStamp":1541491884194,"signal":0,"hPreds":[0,0,0,0],"bidPrice":6.413000,"preferredBidSize":1,"offerPrice":6.415000,"preferredOfferSize":1,"averageTradeSize":1029,"averageTradePrice":0.065252,"changedValues":true,"test4":10,"snapshot":false}}'
Assume I am able to convert each line into a DataFrame. Is there a fast way to append each line to the combined DataFrame? Right now, with more than 200k lines, appending takes hours, while reading the file itself takes less than 5 minutes.
    import pandas as pd

    file = 'fileName.txt'
    with open(file) as f:
        content = f.readlines()
    content = [x.strip() for x in content]

    data = pd.DataFrame()
    count = 0
    for line in content:
        # per-line cleanup before parsing
        line = line.replace('{"string1', '')
        z = line.splitlines()
        z[0] = z[0][:-1]
        z = pd.read_json('[%s]' % ','.join(z))
        # this per-line append is the slow step
        data = data.append(z)
json python-3.x pandas
edited Nov 20 at 16:45
asked Nov 19 at 15:27
Kiann
1 Answer
You can inspect the record (with d as a dict) using a Series:
    pd.Series(d)
    Out[154]:
    averageTradePrice                0.065
    averageTradeSize                   109
    bidPrice                          6.13
    changedValues                     True
    creationTimeStamp             15414994
    Preds                     [0, 0, 0, 0]
    key                    {'epic': 'XXX'}
    dataLevel                           10
    offerPrice                       3.333
    dtype: object
The values of Preds and key are a list and a dict, which is why passing the data to DataFrame raises:
    ValueError: arrays must all be same length
Since you mention JSON, you can use json_normalize:
    from pandas.io.json import json_normalize
    json_normalize(d)
    Out[157]:
              Preds  averageTradePrice  ...  key.epic  offerPrice
    0  [0, 0, 0, 0]              0.065  ...       XXX       3.333
    [1 rows x 9 columns]
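As an illustration that goes slightly beyond the answer text, here is a minimal sketch of the same idea starting from a raw line of text. It assumes pandas 1.0 or later, where the function is also exposed as pd.json_normalize, and it uses a shortened sample line modeled on the one in the question (the exact fields are an assumption). The point it shows is that the string has to be parsed into a dict before it is flattened.

```python
import json

import pandas as pd

# Shortened sample line modeled on the question; at this point it is a str.
line = '{"test1":"test2","data":{"key":{"isin":"test3"},"signal":0,"hPreds":[0,0,0,0],"bidPrice":6.413}}'

# json_normalize works on a dict (or a list of dicts), so parse the text first.
record = json.loads(line)

# Nested dicts become dotted column names; lists stay as single cell values.
flat = pd.json_normalize(record)
print(flat.columns.tolist())
```

The nested "data" and "key" objects end up as columns such as data.key.isin, while the hPreds list is kept as one cell.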
answered Nov 19 at 15:36
W-B
Thanks @W-B. I tried your solution, and while it worked for the test string I provided, it unfortunately didn't work for the (very long) actual text string I pull in.
– Kiann
Nov 19 at 16:22
To give some context: my file f is actually a text file of more than 250k rows, and I am trying to read each line, convert each line into a pandas DataFrame, and then append it. I had some original code using pd.read_json(...), but it no longer works.
– Kiann
Nov 19 at 16:23
@Kiann did you try json_normalize?
– W-B
Nov 19 at 16:34
stackoverflow.com/users/7964527/w-b: yes, I did try json_normalize. Error message: AttributeError: 'str' object has no attribute 'values'
– Kiann
Nov 19 at 16:35
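The AttributeError in the last comment is most likely what older pandas versions raise when json_normalize receives a str instead of a dict, so parsing each line with json.loads first should avoid it. The hours-long runtime comes from calling DataFrame.append once per line: every call copies the whole frame built so far (and the method was removed in pandas 2.0). The sketch below, which is not from the thread, collects the parsed records in a plain list and builds the DataFrame once. The helper name load_json_lines and the two sample lines are hypothetical, and pd.json_normalize assumes pandas 1.0 or later.

```python
import io
import json

import pandas as pd


def load_json_lines(lines):
    """Parse each non-empty line as JSON and build one DataFrame at the end."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        records.append(json.loads(line))  # str -> dict
    # One call flattens every record; no per-row DataFrame.append.
    return pd.json_normalize(records)


# Two in-memory lines stand in for the real file (made-up sample data).
sample = io.StringIO(
    '{"test1":"a","data":{"key":{"isin":"X1"},"signal":0,"bidPrice":6.413}}\n'
    '{"test1":"b","data":{"key":{"isin":"X2"},"signal":1,"bidPrice":6.415}}\n'
)
df = load_json_lines(sample)
print(df.shape)
```

With a real file, the same helper can be fed the open file object, for example `with open('fileName.txt') as f: df = load_json_lines(f)`. Any per-line cleanup such as the replace and trimming steps from the question would go just before the json.loads call.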