Filtering DataFrame values within quotes























I am generating a DataFrame from command-line output with code like this:



df_output_lines = [s.split() for s in os.popen("my command linecode").read().splitlines()]
df_output_lines = list(filter(None, df_output_lines))


and then converting it into a DataFrame:



df=pd.DataFrame(df_output_lines)
df


The data is in the following format:



abc = pd.DataFrame([['time:"08:59:38.000"', 'instance:"(null)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"']])
abc





I want to transform it so that the text before the : becomes the column name and the text within the quotes " " becomes the value, and the same goes for all columns. The expected output has one column per key (time, instance, id), each holding only the unquoted value.



As of now I am doing it the hard way:



abc.rename(columns={0:'time',1:'instance',2:'id'},inplace=True)


and then



abc['time'] = abc['time'].map(lambda x: str(x)[:-1])
abc['time'] = abc['time'].map(lambda x: str(x)[6:])

abc['instance'] = abc['instance'].map(lambda x: str(x)[:-1])
abc['instance'] = abc['instance'].map(lambda x: str(x)[10:])

abc['id'] = abc.id.str.extract(r'(\d+)', expand=True).astype(int)


Any suggestions for a lambda expression or a one-liner to do this?



The raw log output looks like this:



    time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"

time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" channel:"sip:confctl-2@hrpd.vivox.com" type:"media" sampleperiod:"0.000000" incomingpktsreceived:"0" incomingpktsexpected:"0" incomingpktsloss:"0" incomingpktssoutoftime:"0" incomingpktsdiscarded:"0" outgoingpktssent:"0" predictedmos:"3" latencypktssent:"0" latencycount:"0" latencysum:"0.000000" latencymin:"0.000000" latencymax:"0.000000" callid:"2477580077" r_factor:"0.000000"





























  • This looks like you didn't call the correct DataFrame constructor. Did you start with a dictionary or JSON?
    – ALollz
    Nov 19 at 16:15










  • This log is generated on the command line, and I am capturing it in a DataFrame with code.
    – ak333
    Nov 19 at 16:17






  • I'm thinking the same as @ALollz here... what do a few lines of your raw log file look like? Loading it in a different way from the start is likely to be much easier and more reliable...
    – Jon Clements
    Nov 19 at 16:20










  • @JonClements I have edited my question.
    – ak333
    Nov 19 at 16:21










  • @ALollz I have edited my question.
    – ak333
    Nov 19 at 16:21





















python python-3.x pandas dataframe lambda














asked Nov 19 at 16:14 by ak333; edited Nov 19 at 17:53




















3 Answers




















accepted










Feed list of dictionaries to pd.DataFrame



The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:



res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

print(res)

                    id                 instance          time
0  3214039276626790405                   (null)  08:59:38.000
1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000
2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000


It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.
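The one-liner above relies on every cell having the form key:"value". If a cell lacks the :" separator, split returns a single item and dict() raises "dictionary update sequence element #0 has length 1; 2 is required". A minimal sketch of a more defensive variant follows; it assumes that cells not matching the pattern should simply be skipped, and the helper name parse_cell is hypothetical, not part of pandas:

```python
import pandas as pd

# Sample data in the same shape as the question's abc.
abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])


def parse_cell(cell):
    """Return (key, value) for a cell like key:"value", or None if it does not match."""
    key, sep, value = str(cell).partition(':"')
    if not sep or not value.endswith('"'):
        return None  # assumption: malformed cells are ignored rather than reported
    return key, value[:-1]  # drop the closing quote


records = [
    dict(pair for pair in map(parse_cell, row) if pair is not None)
    for row in abc.values
]
res = pd.DataFrame(records)
print(res)
```

Whether skipping malformed cells is appropriate depends on the data; raising an error with the offending cell may be preferable when the log format is expected to be strict.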



























  • It's giving me an error: "dictionary update sequence element #0 has length 1; 2 is required"
    – ak333
    Nov 19 at 16:30










  • @ak333, I'm using your definition of abc and it works fine for me.
    – jpp
    Nov 19 at 16:31










  • I am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
    – ak333
    Nov 19 at 16:32










  • @ak333, That's the problem, use abc as you defined in your question.
    – jpp
    Nov 19 at 16:35










  • I apologize, I got it. @jpp
    – ak333
    Nov 19 at 16:38































Given your example input of:



time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"


This comes from your os.popen command. We filter out blank lines and use shlex.split on each line so that whitespace inside quoted items is preserved (while the quotes themselves are removed), e.g.:



import os
import shlex
import pandas as pd

rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]


This'll give you, for example rows[0] of:



['time:11:22:20.000',
'instance:(null)',
'id:723927731576482920',
'channel:sip:confctl.com',
'type:control',
'elapsedtime:0.000631',
'level:info',
'operation:Init',
'message:Initialize (version 4.9.0002.30618) ... ']


You then partition those on : to separate the identifier from the value and feed that into a pd.DataFrame, eg:



df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)


Giving you a df of:



            channel  elapsedtime                  id               instance  level                                            message  operation          time     type
0   sip:confctl.com     0.000631  723927731576482920                 (null)   info           Initialize (version 4.9.0002.30618) ...        Init  11:22:20.000  control
1   sip:confctl.com     0.067122  723927731576482920  Ops-MacBook-Pro.local   info     Connecting to https://hrpd.www.vivox.com/api2/    Connect  11:22:21.000  control
2  sip:confctl-.com     2.685700  723927731576482920  Ops-MacBook-Pro.local   info      Connected to https://hrpd.www.vivox.com/api2/    Connect  11:22:23.000  control
3  sip:confctl-.com     2.814268  723927731576482920  Ops-MacBook-Pro.local   info                            Logged in .tester_food.      Login  11:22:23.000  control
4  sip:confctl-.com     2.912255  723927731576482920  Ops-MacBook-Pro.local  error  .tester_food. failed to join sip:confctl-2@hrp...       Call  11:22:23.000  control
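The partition(':')[::2] idiom used in this answer matters because values such as the time and the channel contain colons themselves. A small sketch of the difference, using one item as shlex.split would return it (the variable names are illustrative only):

```python
# An item as produced by shlex.split: the quotes are gone,
# but the value still contains colons.
item = 'time:11:22:20.000'

# split(':') breaks the value apart at every colon.
naive = item.split(':')
print(naive)  # ['time', '11', '22', '20.000']

# partition(':') splits only at the first colon and returns
# (head, separator, tail); the [::2] slice keeps head and tail.
pair = item.partition(':')[::2]
print(pair)   # ('time', '11:22:20.000')
```

Because partition always returns three parts, the slice is always a two-element tuple, so dict() does not fail even on an item without a colon; such an item just ends up with an empty string as its value.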


























  • This is exactly what I was looking for, but I have two kinds of log lines. The log I gave you has only a few lines, and there is just one different line in it that is causing me trouble. Can you help me with that if I add it to the question?
    – ak333
    Nov 19 at 17:29






  • @ak333 you can... but make sure adding that doesn't change the meaning of your question and invalidate the answers given please.
    – Jon Clements
    Nov 19 at 17:31










  • I have added that. I am getting a list of lines on the command prompt and storing it using popen. It is just that one long line with the stats that causes the whole trouble, as it is different from all the other lines.
    – ak333
    Nov 19 at 17:34










  • I hope I edited it as you expected. I don't mind getting two DataFrames: one for the line which I added and one for the other lines.
    – ak333
    Nov 19 at 17:40










  • @ak333 does that line actually end in ',? If so, shlex.split won't be happy with it. You can try stripping ', from each line before splitting and see if that works. That might give you one big df, which you can then filter on columns that you know only belong to those entry types.
    – Jon Clements
    Nov 19 at 17:50
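The "one big df, then filter" idea from the comment above could look roughly like the sketch below. It assumes that the type field distinguishes the two kinds of line, as it does in the sample log (control versus media), and it uses two shortened sample lines in place of the real command output:

```python
import shlex

import pandas as pd

# Two lines shortened from the question's log; real input would come
# from the command output instead.
lines = [
    'time:"11:22:20.000" instance:"(null)" type:"control" level:"info" '
    'message:"Initialize (version 4.9.0002.30618) ... "',
    'time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" type:"media" '
    'predictedmos:"3" callid:"2477580077"',
]

rows = [shlex.split(line) for line in lines if line.strip()]
df = pd.DataFrame([dict(col.partition(':')[::2] for col in row) for row in rows])

# Keys missing from a line become NaN. Split on "type" and drop the
# columns that are entirely missing within each group.
frames = {
    kind: group.dropna(axis=1, how='all')
    for kind, group in df.groupby('type')
}
control, media = frames['control'], frames['media']
print(control)
print(media)
```

This yields one DataFrame per line type, each with only the columns that actually occur for that type.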































Although this has already been answered, I would like to add a regex-based approach that achieves the same result:



>>> abc
                  time                            instance                        id
0  time:"08:59:38.000"                   instance:"(null)"  id:"3214039276626790405"
1  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"
2  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"


Just call DataFrame.replace with regex=True:



>>> abc.replace('instance:|id:|time:|"|[()]', '',regex=True)
           time               instance                   id
0  08:59:38.000                   null  3214039276626790405
1  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405
2  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405

OR

# abc.replace('(instance:|id:|time:)|"|[()]', '',regex=True)


regex explanation:





  • 1st alternative instance: matches the characters instance: literally (case sensitive)

  • 2nd alternative id: matches the characters id: literally (case sensitive)

  • 3rd alternative time: matches the characters time: literally (case sensitive)

  • 4th alternative " matches the character " literally

  • 5th alternative [()] matches a single character from the set, that is, either ( or )
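The replace call above hard-codes the three key names. As a possible alternative for the raw log lines, a regex can capture every key:"value" pair without naming the keys in advance. This is only a sketch: it assumes values never contain an embedded double quote, and the names PAIR and raw are illustrative:

```python
import re

import pandas as pd

# Two lines shortened from the question's raw log.
raw = '''time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" level:"info" message:"Connecting to https://hrpd.www.vivox.com/api2/"
time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" level:"info" message:"Logged in .tester_food."'''

# A key (word characters), then :" and everything up to the next quote.
PAIR = re.compile(r'(\w+):"([^"]*)"')

# findall returns (key, value) tuples, which dict() accepts directly.
records = [dict(PAIR.findall(line)) for line in raw.splitlines() if line.strip()]
df = pd.DataFrame(records)
print(df)
```

Colons inside a value (as in the time or the URL) are harmless here, because each match consumes the whole quoted value before the search for the next key begins.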
























































answered Nov 19 at 16:22 by jpp (accepted answer)












    • its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"
      – ak333
      Nov 19 at 16:30










    • @ak333, I'm using your definition of abc and it works fine for me.
      – jpp
      Nov 19 at 16:31










    • i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
      – ak333
      Nov 19 at 16:32










    • @ak333, That's the problem, use abc as you defined in your question.
      – jpp
      Nov 19 at 16:35










    • I apologize..i got it. @jpp
      – ak333
      Nov 19 at 16:38


















    up vote
    0
    down vote













    Given your example input of:



    time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

    time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"


    which comes from your os.popen command, we filter out blank lines and use shlex.split on each line, so that whitespace inside quoted items is preserved (the quotes themselves are removed), e.g.:



    import os
    import shlex
    import pandas as pd

    rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]


    This gives you, for example, rows[0] of:



    ['time:11:22:20.000',
    'instance:(null)',
    'id:723927731576482920',
    'channel:sip:confctl.com',
    'type:control',
    'elapsedtime:0.000631',
    'level:info',
    'operation:Init',
    'message:Initialize (version 4.9.0002.30618) ... ']
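The difference from the plain str.split used in the question can be seen on a single line. A small sketch, using a shortened sample line (the variable names are illustrative):

```python
import shlex

line = 'time:"11:22:20.000" level:"info" message:"Initialize (version 4.9.0002.30618) ... "'

naive = line.split()         # breaks the quoted message at every space
tokens = shlex.split(line)   # keeps quoted text together and drops the quotes

print(len(naive), len(tokens))
print(tokens[-1])
```

With str.split the message is scattered over several tokens, whereas shlex.split returns one token per key.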


    You then partition each item on its first : to separate the identifier from the value (values such as the time or the channel contain colons of their own), and feed the result into a pd.DataFrame, e.g.:



    df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)


    Giving you a df of:



                channel elapsedtime                  id               instance  level                                            message operation          time     type
    0   sip:confctl.com    0.000631  723927731576482920                 (null)   info           Initialize (version 4.9.0002.30618) ...       Init  11:22:20.000  control
    1   sip:confctl.com    0.067122  723927731576482920  Ops-MacBook-Pro.local   info     Connecting to https://hrpd.www.vivox.com/api2/   Connect  11:22:21.000  control
    2  sip:confctl-.com    2.685700  723927731576482920  Ops-MacBook-Pro.local   info      Connected to https://hrpd.www.vivox.com/api2/   Connect  11:22:23.000  control
    3  sip:confctl-.com    2.814268  723927731576482920  Ops-MacBook-Pro.local   info                            Logged in .tester_food.     Login  11:22:23.000  control
    4  sip:confctl-.com    2.912255  723927731576482920  Ops-MacBook-Pro.local  error  .tester_food. failed to join sip:confctl-2@hrp...      Call  11:22:23.000  control
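To try the whole approach without the external command, the lines can be supplied as a literal string. This is a sketch under that assumption: raw stands in for the os.popen output, and the two sample lines are shortened from the input above.

```python
import shlex
import pandas as pd

raw = '''
time:"11:22:20.000" instance:"(null)" channel:"sip:confctl.com" operation:"Init"

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" channel:"sip:confctl.com" operation:"Connect"
'''

# Skip blank lines, then split each line while respecting the quotes.
rows = [shlex.split(line) for line in raw.splitlines() if line.strip()]

# partition(':') splits on the first colon only and returns (head, sep, tail);
# the [::2] slice keeps head and tail, so 'channel:sip:confctl.com' becomes
# ('channel', 'sip:confctl.com') even though the value contains a colon.
records = [dict(col.partition(':')[::2] for col in row) for row in rows]
df = pd.DataFrame(records)
print(df)
```

All values come out as strings; converting columns such as elapsedtime to numbers would be a separate step.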





    • This is exactly what I was looking for, but I have two kinds of logs. The one I gave you has a limited set of lines; there is just one different line in that log which is causing me an issue. Can you help me with that if I edit it into the question?
      – ak333
      Nov 19 at 17:29






    • @ak333 you can... but make sure adding that doesn't change the meaning of your question and invalidate the answers given please.
      – Jon Clements
      Nov 19 at 17:31










    • I have added that. I am getting a list of lines on the command prompt and I am storing it using popen. It is just that one long line with the stats that causes the whole trouble, as it is different from all the other lines.
      – ak333
      Nov 19 at 17:34










    • I hope I edited it as you expected. I don't mind getting two dataframes: one for the line which I added and one for the other lines.
      – ak333
      Nov 19 at 17:40










    • @ak333 does that line actually end in ',? shlex.split won't be happy with that. You can try stripping ', from each line before splitting and see if that works. That might give you one big df, which you can then filter on the columns that you know belong only to those entry types.
      – Jon Clements
      Nov 19 at 17:50
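One way to act on this suggestion is sketched below. The shape of the odd line is not shown in this thread, so the sample lines are purely hypothetical, as are the helper name safe_split and the choice to collect unparseable lines separately. shlex.split raises ValueError when a quote is left unclosed, which is what the try/except relies on.

```python
import shlex

def safe_split(line):
    """Return shlex tokens for a line, or None if its quoting is unbalanced."""
    # rstrip("',") removes any run of trailing ' and , characters,
    # not only the exact two-character suffix.
    cleaned = line.strip().rstrip("',")
    try:
        return shlex.split(cleaned)
    except ValueError:        # e.g. "No closing quotation"
        return None

lines = [
    'time:"11:22:23.000" level:"info"',
    'time:"11:22:24.000" level:"info"\',',   # hypothetical line ending in ',
    'time:"11:22:25.000" level:"inf',        # hypothetical truncated line
]

parsed = [safe_split(l) for l in lines]
good = [p for p in parsed if p is not None]
bad = [l for l, p in zip(lines, parsed) if p is None]
```

The good rows can go through the partition step as before, while bad keeps the raw lines that need separate handling.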















    answered Nov 19 at 17:23









    Jon Clements

    up vote
    0
    down vote













    Although answers have already been posted, I would like to add a regex-based approach that achieves the same result:



    >>> abc
                      time                            instance                        id
    0  time:"08:59:38.000"                   instance:"(null)"  id:"3214039276626790405"
    1  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"
    2  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"


    Just pass regex=True to DataFrame.replace:



    >>> abc.replace('instance:|id:|time:|"|[()]', '',regex=True)
               time               instance                   id
    0  08:59:38.000                   null  3214039276626790405
    1  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405
    2  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405

    OR

    # abc.replace('(instance:|id:|time:)|"|[()]', '',regex=True)


    regex explanation:





    • 1st alternative instance: matches the characters instance: literally (case sensitive)


    • 2nd alternative id: matches the characters id: literally (case sensitive)


    • 3rd alternative time: matches the characters time: literally (case sensitive)


    • 4th alternative " matches the character " literally


    • 5th alternative [()] matches a single character from the set inside the brackets,
      that is, either ( or )
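The pattern above lists every key by name, so it has to grow with the number of columns. A possible generalisation, offered as a sketch rather than as part of the original answer, anchors the pattern instead: it strips any leading word followed by :" and the trailing quote. Unlike the pattern above, it leaves the parentheses in place. The frame below mirrors two rows of the question's data.

```python
import pandas as pd

abc = pd.DataFrame(
    [['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
     ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"']],
    columns=['time', 'instance', 'id'],
)

# ^\w+:"  -> the key, the colon and the opening quote at the start of the cell
# "$      -> the closing quote at the end of the cell
cleaned = abc.replace(r'^\w+:"|"$', '', regex=True)
print(cleaned)
```

Because the pattern is anchored, colons and quotes inside a value are left untouched.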








        edited Nov 19 at 17:31

























        answered Nov 19 at 16:55









        pygo
