Problem with attachments' character encoding using gmail gem in ruby/rails


What I am doing:
I am using the gmail gem in a Rails 4 app to get email attachments from a specific account at regular intervals. Here is an extract from the core part (here for simplicity only considering the first email and its first attachment):

require 'gmail'

Gmail.connect(@user_email, @user_password) do |gmail|
    if gmail.logged_in?
        emails = gmail.inbox.emails(:from => @sender_email)
        email = emails[0]
        attachment = email.message.attachments[0]

        # File.open does not expand "~", so expand the path explicitly.
        File.open(File.expand_path("~/temp.csv"), 'w') do |file|
            file.write(
                StringIO.new(attachment.decoded.to_s[2..-2].force_encoding("ISO-8859-15").encode!('UTF-8')).read
            )
        end
    end
end

The encoding of the attached file can vary. The particular one that I am currently having issues with is in Finnish. It contains Finnish characters and a superscripted 3 character.

This is what I expect to get when I run the above code (and what I get when I download the attachment manually through the Gmail user interface):
[Screenshot: expected content of the attachment]

What the problem is:

However, I am getting the following odd results.

From cat temp.csv (looks good to me):
[Screenshot: output of cat temp.csv]

With nano temp.csv (here I have no idea what I am looking at):
[Screenshot: temp.csv opened in nano]

This is what temp.csv looks like opened in Sublime Text (directly via WinSCP). The first line and small parts look OK, but the rest shows Chinese/Japanese characters:
[Screenshot: temp.csv opened in Sublime Text]

This is what temp.csv looks like in Notepad (after download via WinSCP). It looks OK, except that a blank space has been inserted between each character and the line breaks seem to be missing:
[Screenshot: temp.csv opened in Notepad]

What I have tried:

I have tried the following, without success:

.force_encoding(...) with all the different "ISO-8859-x" character sets

putting the force_encoding("ISO-8859-15").encode!('UTF-8') outside the .read (works but doesn't solve the problem)

encoding to UTF-8 without first forcing another encoding, but this leads to Encoding::UndefinedConversionError: "\xC4" from ASCII-8BIT to UTF-8

writing as binary with 'wb' and 'w+b' in the File.open() (which oddly doesn't seem to make a difference to the outcome).

searching stackoverflow and the web for other ideas.

Any ideas would be much appreciated!

edited Nov 18 at 10:41

asked Nov 17 at 19:27

Morten Grum


ruby-on-rails ruby character-encoding gmail


2 Answers

Not beautiful, but it will work for me now.

After re-encoding, I convert the string to a char array, then remove the chars I do not want and then join the remaining array elements to form a string.

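decoded_att = attachment.decoded

# Re-encode the raw bytes from ISO-8859-1 to UTF-8.
data = decoded_att.encode("UTF-8", "ISO-8859-1", invalid: :replace, undef: :replace)

# Drop every NUL, "ÿ" and "þ" character (wherever it occurs in the text).
data_as_array = data.chars
data_as_array = data_as_array.delete_if { |i| i == "\u0000" || i == "ÿ" || i == "þ" }

# Normalise line endings only after the NULs are gone, because
# "\r\u0000\n" would not match "\r\n".
data = data_as_array.join.gsub("\r\n", "\n")

# File.write does not expand "~", so expand the path explicitly.
File.write(File.expand_path("~/temp.csv"), data)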

This will work for me now. However, I have no idea how these characters ended up in the attachment ("ÿ" and "þ" at the start of the document and "\u0000" between all remaining characters).
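A possible explanation, offered as an assumption rather than something confirmed in this thread: "ÿ" and "þ" are what the bytes 0xFF 0xFE look like when read as ISO-8859-1, and 0xFF 0xFE is the byte order mark of UTF-16LE. In UTF-16LE every ASCII character is followed by a 0x00 byte, which would account for the "\u0000" between characters. If that is the case here, the bytes could be decoded directly instead of being filtered. A minimal sketch, where attachment_to_utf8 is a hypothetical helper (not part of the gmail or mail gems) and the ISO-8859-15 fallback is simply the encoding tried in the question:

```ruby
# Hypothetical helper, not part of the gmail or mail gems.
# Converts raw attachment bytes to UTF-8, using a byte order mark (BOM)
# when one is present and an assumed fallback encoding otherwise.
def attachment_to_utf8(raw, fallback: Encoding::ISO_8859_15)
  # Work on a binary copy so the caller's string is left untouched.
  bytes = raw.dup.force_encoding(Encoding::BINARY)

  text =
    if bytes.start_with?("\xFF\xFE".b)
      # UTF-16 little endian: skip the 2-byte BOM, then transcode.
      bytes.byteslice(2..-1).force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
    elsif bytes.start_with?("\xFE\xFF".b)
      # UTF-16 big endian: skip the 2-byte BOM, then transcode.
      bytes.byteslice(2..-1).force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
    elsif bytes.start_with?("\xEF\xBB\xBF".b)
      # Already UTF-8: skip the 3-byte BOM and relabel the bytes.
      bytes.byteslice(3..-1).force_encoding(Encoding::UTF_8)
    else
      # No BOM: assume the fallback single-byte encoding.
      bytes.force_encoding(fallback).encode(Encoding::UTF_8)
    end

  # Normalise Windows line endings.
  text.gsub("\r\n", "\n")
end
```

With this in place, the file could be written with File.write(File.expand_path("~/temp.csv"), attachment_to_utf8(attachment.decoded)). Attachments without a BOM still depend on guessing the right fallback encoding.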

answered Nov 18 at 18:36

Morten Grum


It seems like you need to use attachment.body.decoded instead of attachment.decoded.

answered Nov 18 at 21:11

Dorian


Thanks. It actually seems that attachment.body.decoded and attachment.decoded return the exact same string. I checked both the strings and their arrays of bytes.
– Morten Grum
Nov 19 at 6:28
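A byte-level check like the one described in this comment could look roughly as follows. This is only a sketch: hex_prefix is a hypothetical helper, and the two sample strings are made-up stand-ins for attachment.decoded and attachment.body.decoded.

```ruby
# Hypothetical helper: hex dump of the first few bytes of a string.
def hex_prefix(str, count = 8)
  str.byteslice(0, count).unpack1("H*").scan(/../).join(" ")
end

# Made-up stand-ins for the two decoded strings being compared.
a = "\xFF\xFEN\x00i\x00".b
b = "\xFF\xFEN\x00i\x00".b

# Compare the raw bytes, ignoring whatever encoding label each string has.
puts a.bytes == b.bytes

# Show the leading bytes, which is where a byte order mark would appear.
puts hex_prefix(a)
```

Looking at the leading bytes in hex makes it easier to spot a byte order mark or interleaved 00 bytes than looking at the rendered text.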
