Filtering DataFrame values within quotes

I am generating a DataFrame from command-line output with code like this:

import os

import pandas as pd

df_output_lines = [s.split() for s in os.popen("my command linecode").read().splitlines()]

df_output_lines  = list(filter(None, df_output_lines))

and then converting it into a DataFrame:

df=pd.DataFrame(df_output_lines)

df

the data is in the below format :-

abc = pd.DataFrame([['time:"08:59:38.000"', 'instance:"(null)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"']])

abc


I want to transform it so that the text before the : becomes the column name and the text within the quotes " " becomes the value, and the same goes for every column.

For now I am doing it the hard way:

abc.rename(columns={0:'time',1:'instance',2:'id'},inplace=True)

and then

abc['time'] = abc['time'].map(lambda x: str(x)[:-1])

abc['time'] = abc['time'].map(lambda x: str(x)[6:])



abc['instance'] = abc['instance'].map(lambda x: str(x)[:-1])

abc['instance'] = abc['instance'].map(lambda x: str(x)[10:])



abc['id'] = abc.id.str.extract(r'(\d+)', expand=True).astype(int)

Any suggestions for a lambda expression or a one-liner to do this?

My raw log output looks like this:

    time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "



    time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"



    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"



    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."



    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"



 time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" channel:"sip:confctl-2@hrpd.vivox.com" type:"media" sampleperiod:"0.000000" incomingpktsreceived:"0" incomingpktsexpected:"0" incomingpktsloss:"0" incomingpktssoutoftime:"0" incomingpktsdiscarded:"0" outgoingpktssent:"0" predictedmos:"3" latencypktssent:"0" latencycount:"0" latencysum:"0.000000" latencymin:"0.000000" latencymax:"0.000000" callid:"2477580077" r_factor:"0.000000"

edited Nov 19 at 17:53

asked Nov 19 at 16:14

ak333

1508

1

This looks like you didn't call the correct DataFrame constructor. Did you start with a dictionary or json?
– ALollz
Nov 19 at 16:15

This log is generated on the command line and I am capturing it in a DataFrame with code.
– ak333
Nov 19 at 16:17

2

I'm thinking the same as @ALollz here... what do a few lines of your raw log file look like? Loading it in a different way from the start is likely to be much easier and more reliable...
– Jon Clements♦
Nov 19 at 16:20

@JonClements I have edited my question.
– ak333
Nov 19 at 16:21

@ALollz I have edited my question.
– ak333
Nov 19 at 16:21

python python-3.x pandas dataframe lambda


3 Answers

accepted

Feed list of dictionaries to `pd.DataFrame`

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])



print(res)



                    id                 instance          time

0  3214039276626790405                   (null)  08:59:38.000

1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.
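
To make the mechanics concrete, here is a minimal sketch of what the comprehension does to each cell, assuming every cell has the form key:"value". The helper name `parse_cell` is hypothetical (it is not part of pandas). It uses str.partition rather than str.split so that a cell without the :" separator produces a readable error instead of the "dictionary update sequence element" message.

```python
import pandas as pd

abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])

def parse_cell(cell):
    """Split 'key:"value"' into (key, value). Hypothetical helper for illustration."""
    # Drop the closing quote, then cut at the first ':"' separator.
    key, sep, value = cell.rstrip('"').partition(':"')
    if not sep:
        raise ValueError(f'cell {cell!r} is not in key:"value" form')
    return key, value

# Each row becomes one dict; the dict keys become the column names.
res = pd.DataFrame([dict(parse_cell(c) for c in row) for row in abc.values])
print(res)
```

Calling parse_cell on an already cleaned value such as '08:59:38.000' raises the ValueError, which is the same situation that triggered the error reported in the comments below.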

answered Nov 19 at 16:22

jpp

87.1k194999

It's giving me an error: "dictionary update sequence element #0 has length 1; 2 is required"
– ak333
Nov 19 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.
– jpp
Nov 19 at 16:31

I am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– ak333
Nov 19 at 16:32

@ak333, That's the problem, use abc as you defined in your question.
– jpp
Nov 19 at 16:35

I apologize, I got it. @jpp
– ak333
Nov 19 at 16:38


Given your example input of:

time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "



time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"



time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"



time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."



time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"

This comes from your os.popen command. Filter out blank lines and use shlex.split on each line so that whitespace inside quoted items is preserved (the quotes themselves are removed), e.g.:

import os

import shlex

import pandas as pd



rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]

This'll give you, for example rows[0] of:

['time:11:22:20.000',

 'instance:(null)',

 'id:723927731576482920',

 'channel:sip:confctl.com',

 'type:control',

 'elapsedtime:0.000631',

 'level:info',

 'operation:Init',

 'message:Initialize (version 4.9.0002.30618) ... ']

You then partition those on : to separate the identifier from the value and feed that into a pd.DataFrame, eg:

df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)

Giving you a df of:

            channel elapsedtime                  id               instance  level                                            message operation          time     type

0   sip:confctl.com    0.000631  723927731576482920                 (null)   info           Initialize (version 4.9.0002.30618) ...       Init  11:22:20.000  control

1   sip:confctl.com    0.067122  723927731576482920  Ops-MacBook-Pro.local   info     Connecting to https://hrpd.www.vivox.com/api2/   Connect  11:22:21.000  control

2  sip:confctl-.com    2.685700  723927731576482920  Ops-MacBook-Pro.local   info      Connected to https://hrpd.www.vivox.com/api2/   Connect  11:22:23.000  control

3  sip:confctl-.com    2.814268  723927731576482920  Ops-MacBook-Pro.local   info                            Logged in .tester_food.     Login  11:22:23.000  control

4  sip:confctl-.com    2.912255  723927731576482920  Ops-MacBook-Pro.local  error  .tester_food. failed to join sip:confctl-2@hrp...      Call  11:22:23.000  control
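
The two steps above can be tried without running the real command. The sketch below assumes the command output is already available as a string; `raw` holds two lines shortened from the question's log and stands in for os.popen(...).read().

```python
import shlex

import pandas as pd

# Stand-in for os.popen("...").read(); lines are shortened from the question.
raw = '''
time:"11:22:20.000" instance:"(null)" id:"723927731576482920" level:"info" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" level:"error" message:"failed to join error:Access token has invalid signature(403)"
'''

# shlex.split keeps quoted text together as one token and removes the quotes.
rows = [shlex.split(line) for line in raw.splitlines() if line.strip()]

# partition(':') cuts at the first colon only, so colons inside values survive.
df = pd.DataFrame([dict(col.partition(':')[::2] for col in row) for row in rows])
print(df[['time', 'level', 'message']])
```

Because only the first colon is used as the separator, values such as 11:22:20.000 or error:Access token ... keep their own colons.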

answered Nov 19 at 17:23

Jon Clements♦

97.7k19172216

This is exactly what I was looking for, but I have two kinds of log lines. The sample I gave you is limited; there is just one different line in that log which is causing me issues. Can you help me with that if I edit it into the question?
– ak333
Nov 19 at 17:29

1

@ak333 you can... but make sure adding that doesn't change the meaning of your question and invalidate the answers given please.
– Jon Clements♦
Nov 19 at 17:31

I have added that. I am getting a list of lines on the command prompt and I am storing it using popen. Just that one long line, which gives me the stats, is causing all the trouble because it is different from all the other lines.
– ak333
Nov 19 at 17:34

I hope I edited it as you expected. I don't mind getting two DataFrames: one for the line which I added and one for the other lines.
– ak333
Nov 19 at 17:40

@ak333 does that line actually end in ', cos shlex.split won't be happy with that... you can try stripping ', from each line before splitting them and seeing if that works.... that might give you one big df and then you filter out on columns that you know only belong to those entry types etc...
– Jon Clements♦
Nov 19 at 17:50
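
A minimal sketch of the suggestion in the comment above: build one DataFrame from all lines, then split it by line type. The helper name `parse_line` is hypothetical, the two sample lines are shortened from the question, and stripping a trailing ', is only needed if the captured lines really end that way.

```python
import shlex

import pandas as pd

# One "control" line and one "media" line, shortened from the question.
lines = [
    'time:"11:22:23.000" id:"723927731576482920" type:"control" level:"info" operation:"Login"',
    'time:"12:30:41.000" id:"10316899144153251411" type:"media" predictedmos:"3" callid:"2477580077"',
]

def parse_line(line):
    """Turn one log line into a dict. Hypothetical helper for illustration."""
    # Remove surrounding whitespace, then any stray trailing ' or , characters.
    cleaned = line.strip().rstrip("',")
    return dict(tok.partition(':')[::2] for tok in shlex.split(cleaned))

# Keys missing from a line show up as NaN in that row.
df = pd.DataFrame([parse_line(line) for line in lines])

# One frame per line type; columns that are entirely NaN for that type are dropped.
control = df[df['type'] == 'control'].dropna(axis=1, how='all')
media = df[df['type'] == 'media'].dropna(axis=1, how='all')
print(control)
print(media)
```

All values stay strings here. That matters for id: the media line's id, 10316899144153251411, is larger than the int64 maximum, so converting that column to a 64-bit signed integer would not work.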


Although answers have already been given, I would like to add a regex-based approach that achieves the same result:

>>> abc

                  time                            instance                        id

0  time:"08:59:38.000"                   instance:"(null)"  id:"3214039276626790405"

1  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"

2  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"

Just pass regex=True to DataFrame.replace:

>>> abc.replace('instance:|id:|time:|"|[()]', '',regex=True)

           time               instance                   id

0  08:59:38.000                   null  3214039276626790405

1  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405

2  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405



OR   



# abc.replace('(instance:|id:|time:)|"|[()]', '',regex=True)

regex explanation:

1st alternative, instance: matches the characters instance: literally (case sensitive)

2nd alternative, id: matches the characters id: literally (case sensitive)

3rd alternative, time: matches the characters time: literally (case sensitive)

4th alternative, " matches the character " literally

5th alternative, [()] matches a single character from the set, that is either ( or )
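
The pattern above lists each column name by hand. As a hedged variation (not part of the original answer), the sketch below assumes every cell has the form key:"value" and uses a generic pattern instead, taking the column names from the first row. Unlike the pattern above, it leaves the parentheses in the values untouched.

```python
import pandas as pd

abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])

# Column names: the word before the first colon in each cell of the first row.
names = abc.iloc[0].str.extract(r'^(\w+):', expand=False)

# Values: remove the leading key:" prefix and the trailing quote from every cell.
res = abc.replace(r'^\w+:"|"$', '', regex=True)
res.columns = names.tolist()
print(res)
```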

edited Nov 19 at 17:31

answered Nov 19 at 16:55

pygo

1,7391416


3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
1
down vote

accepted

Feed list of dictionaries to `pd.DataFrame`

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])



print(res)



                    id                 instance          time

0  3214039276626790405                   (null)  08:59:38.000

1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.

answered Nov 19 at 16:22

jpp

87.1k194999

its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"
– ak333
Nov 19 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.
– jpp
Nov 19 at 16:31

i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– ak333
Nov 19 at 16:32

@ak333, That's the problem, use abc as you defined in your question.
– jpp
Nov 19 at 16:35

I apologize..i got it. @jpp
– ak333
Nov 19 at 16:38

|
show 1 more comment

up vote
1
down vote

accepted

Feed list of dictionaries to `pd.DataFrame`

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])



print(res)



                    id                 instance          time

0  3214039276626790405                   (null)  08:59:38.000

1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.

answered Nov 19 at 16:22

jpp

87.1k194999

its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"
– ak333
Nov 19 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.
– jpp
Nov 19 at 16:31

i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– ak333
Nov 19 at 16:32

@ak333, That's the problem, use abc as you defined in your question.
– jpp
Nov 19 at 16:35

I apologize..i got it. @jpp
– ak333
Nov 19 at 16:38

|
show 1 more comment

up vote
1
down vote

accepted

Feed list of dictionaries to `pd.DataFrame`

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])



print(res)



                    id                 instance          time

0  3214039276626790405                   (null)  08:59:38.000

1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.

answered Nov 19 at 16:22

jpp

87.1k194999

Feed list of dictionaries to `pd.DataFrame`

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])



print(res)



                    id                 instance          time

0  3214039276626790405                   (null)  08:59:38.000

1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.

answered Nov 19 at 16:22

jpp

87.1k194999

answered Nov 19 at 16:22

jpp

87.1k194999

answered Nov 19 at 16:22

jpp

87.1k194999

answered Nov 19 at 16:22

jpp

87.1k194999

its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"
– ak333
Nov 19 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.
– jpp
Nov 19 at 16:31

i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– ak333
Nov 19 at 16:32

@ak333, That's the problem, use abc as you defined in your question.
– jpp
Nov 19 at 16:35

I apologize..i got it. @jpp
– ak333
Nov 19 at 16:38

|
show 1 more comment

its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"
– ak333
Nov 19 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.
– jpp
Nov 19 at 16:31

i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– ak333
Nov 19 at 16:32

@ak333, That's the problem, use abc as you defined in your question.
– jpp
Nov 19 at 16:35

I apologize..i got it. @jpp
– ak333
Nov 19 at 16:38

its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"
– ak333
Nov 19 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.
– jpp
Nov 19 at 16:31

i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– ak333
Nov 19 at 16:32

@ak333, That's the problem, use abc as you defined in your question.
– jpp
Nov 19 at 16:35

I apologize..i got it. @jpp
– ak333
Nov 19 at 16:38

|
show 1 more comment

up vote
0
down vote

Given your example input of:

time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "



time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"



time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"



time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."



time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"

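You can use shlex.split to split each line on whitespace while keeping quoted values together (it also removes the quotes):

import os
import shlex
import pandas as pd

rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]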

This'll give you, for example rows[0] of:

['time:11:22:20.000',
 'instance:(null)',
 'id:723927731576482920',
 'channel:sip:confctl.com',
 'type:control',
 'elapsedtime:0.000631',
 'level:info',
 'operation:Init',
 'message:Initialize (version 4.9.0002.30618) ... ']

You then partition each item on its first : to separate the identifier from the value (values such as sip:confctl.com keep their own colons) and feed the result into a pd.DataFrame, e.g.:

df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)

Giving you a df of:

            channel elapsedtime                  id               instance  level                                            message operation          time     type
0   sip:confctl.com    0.000631  723927731576482920                 (null)   info           Initialize (version 4.9.0002.30618) ...       Init  11:22:20.000  control
1   sip:confctl.com    0.067122  723927731576482920  Ops-MacBook-Pro.local   info     Connecting to https://hrpd.www.vivox.com/api2/   Connect  11:22:21.000  control
2  sip:confctl-.com    2.685700  723927731576482920  Ops-MacBook-Pro.local   info      Connected to https://hrpd.www.vivox.com/api2/   Connect  11:22:23.000  control
3  sip:confctl-.com    2.814268  723927731576482920  Ops-MacBook-Pro.local   info                            Logged in .tester_food.     Login  11:22:23.000  control
4  sip:confctl-.com    2.912255  723927731576482920  Ops-MacBook-Pro.local  error  .tester_food. failed to join sip:confctl-2@hrp...      Call  11:22:23.000  control
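If you want to try the steps above without running the external command, here is a minimal sketch that feeds two of the sample log lines from a string instead of os.popen. The names raw, rows and df are just for illustration, and the numeric conversion at the end is an optional extra that is not part of the original steps (every parsed value starts out as a string).

```python
import shlex

import pandas as pd

# Two of the sample log lines, stored in a string instead of read from os.popen.
raw = '''
time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"
'''

# shlex.split keeps each quoted value in one piece and drops the quotes.
rows = [shlex.split(line) for line in raw.splitlines() if line.strip()]

# partition(':') splits on the first colon only; [::2] keeps (key, value).
df = pd.DataFrame([dict(col.partition(':')[::2] for col in row) for row in rows])

# Optional: convert a column that is known to be numeric.
df['elapsedtime'] = pd.to_numeric(df['elapsedtime'])

print(df[['time', 'channel', 'elapsedtime', 'operation']])
```

The column order of df depends on your pandas version, so select columns by name rather than by position.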

answered Nov 19 at 17:23

Jon Clements♦


This is exactly what I was looking for, but I have two kinds of logs. The one I gave you has a limited set of lines; there is just one different line in that log which is causing me issues. Can you help me with that if I edit it into the question?
– ak333
Nov 19 at 17:29

@ak333 you can... but make sure adding that doesn't change the meaning of your question and invalidate the answers given please.
– Jon Clements♦
Nov 19 at 17:31

I have added that. I am getting a list of lines on the command prompt and storing it using popen. It is just that one long line with the stats that causes the whole trouble, as it is different from all the other lines.
– ak333
Nov 19 at 17:34

I hope I edited it as you expected. I don't mind getting two dataframes: one for the line which I added and one for the rest of the lines.
– ak333
Nov 19 at 17:40

@ak333 does that line actually end in ',? If so, shlex.split won't be happy with that... you can try stripping ', from each line before splitting and see if that works. That might give you one big df, which you can then filter on the columns that you know only belong to those entry types, etc.
– Jon Clements♦
Nov 19 at 17:50
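The suggestion in the comment above could look roughly like the sketch below. It assumes the odd line only differs by trailing ', characters; the helper name parse_line and the three sample lines are made up for illustration and are not taken from the real log. shlex.split raises ValueError when a quote is never closed, so lines that still cannot be split are collected separately instead of stopping the whole run.

```python
import shlex

import pandas as pd


def parse_line(line):
    """Return a dict of key/value pairs, or None if the line cannot be split."""
    # Strip surrounding whitespace, then any trailing ' or , characters.
    cleaned = line.strip().rstrip("',")
    try:
        tokens = shlex.split(cleaned)
    except ValueError:  # e.g. "No closing quotation"
        return None
    return dict(tok.partition(':')[::2] for tok in tokens)


# Made-up sample lines: a normal one, one ending in ', and one with an
# unterminated quote that shlex cannot handle.
lines = [
    'time:"11:22:23.000" level:"info" operation:"Login" message:"Logged in .tester_food."',
    'time:"11:22:23.000" level:"error" operation:"Call"\',',
    'level:"info" message:"unterminated',
]

parsed = [parse_line(line) for line in lines]
good = [p for p in parsed if p is not None]
bad = [line for line, p in zip(lines, parsed) if p is None]

df = pd.DataFrame(good)
print(df)
print("could not parse:", bad)
```

Columns that only exist for some entry types end up as NaN in the other rows, so you can split the result afterwards, for example with df[df['message'].notna()].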


up vote
0
down vote

Although this has already been answered, I would like to add a regex-based approach to achieve the same:

>>> abc
                  time                            instance                        id
0  time:"08:59:38.000"                   instance:"(null)"  id:"3214039276626790405"
1  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"
2  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"

Just use DataFrame.replace with regex=True:

>>> abc.replace('instance:|id:|time:|"|[()]', '',regex=True)
           time               instance                   id
0  08:59:38.000                   null  3214039276626790405
1  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405
2  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405

OR

# abc.replace('(instance:|id:|time:)|"|[()]', '',regex=True)

regex explanation:

1st alternative, instance: matches the characters instance: literally (case sensitive)

2nd alternative, id: matches the characters id: literally (case sensitive)

3rd alternative, time: matches the characters time: literally (case sensitive)

4th alternative, " matches the character " literally

5th alternative, [()] matches a single character from the set, i.e. either ( or )
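The pattern above hard-codes the three column names and also removes the parentheses, so (null) becomes null. If you would rather not list the names, a variation is to capture the key and the quoted value with Series.str.extract. This is only a sketch and assumes that every cell has the exact form key:"value"; the names pattern, columns and res are made up for illustration.

```python
import pandas as pd

abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])

# key = word characters before the colon, value = everything between the
# first and the last double quote.
pattern = r'^(?P<key>\w+):"(?P<value>.*)"$'

columns = {}
for col in abc.columns:
    parts = abc[col].str.extract(pattern)  # DataFrame with 'key' and 'value'
    # Take the column name from the first row of this column.
    columns[parts['key'].iloc[0]] = parts['value']

res = pd.DataFrame(columns)
print(res)
```

Unlike the replace version, this keeps the parentheses in the instance values, and the values are still strings.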

edited Nov 19 at 17:31

answered Nov 19 at 16:55

pygo
