Problem with attachments' character encoding using gmail gem in ruby/rails
What I am doing:
I am using the gmail gem in a Rails 4 app to fetch email attachments from a specific account at regular intervals. Here is an extract of the core part (for simplicity, it only handles the first email and its first attachment):
```ruby
require 'gmail'
require 'stringio'

Gmail.connect(@user_email, @user_password) do |gmail|
  if gmail.logged_in?
    emails = gmail.inbox.emails(:from => @sender_email)
    email = emails[0]
    attachment = email.message.attachments[0]
    # File.open does not expand "~", so expand the path explicitly
    File.open(File.expand_path("~/temp.csv"), 'w') do |file|
      file.write(
        StringIO.new(attachment.decoded.to_s[2..-2].force_encoding("ISO-8859-15").encode!('UTF-8')).read
      )
    end
  end
end
```
The encoding of the attached files can vary. The one I am currently having trouble with is in Finnish: it contains Finnish characters and a superscript three (³).
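Since the encoding of the attachments can vary, one common approach is to look for a byte order mark (BOM) at the start of the raw bytes and only fall back to a fixed guess when there is none. A minimal sketch; the helper names `detect_encoding` and `to_utf8`, the BOM table and the ISO-8859-15 fallback are illustrative assumptions, not part of the gmail gem. Files without a BOM cannot be identified this way, so the fallback remains a guess.

```ruby
# Hypothetical helper table: BOM byte sequences (as binary strings) and their encodings
BOMS = {
  "\xEF\xBB\xBF".b => Encoding::UTF_8,
  "\xFF\xFE".b     => Encoding::UTF_16LE,
  "\xFE\xFF".b     => Encoding::UTF_16BE
}.freeze

# Returns [encoding, BOM length in bytes]. Data without a BOM is assumed
# to be ISO-8859-15 (an assumption; adjust the fallback as needed).
def detect_encoding(raw, fallback: Encoding::ISO_8859_15)
  bytes = raw.b
  BOMS.each do |bom, encoding|
    return [encoding, bom.bytesize] if bytes.start_with?(bom)
  end
  [fallback, 0]
end

# Converts raw attachment bytes to a UTF-8 string without the BOM,
# replacing invalid or unmappable sequences instead of raising
def to_utf8(raw)
  encoding, bom_size = detect_encoding(raw)
  raw.b.byteslice(bom_size..-1)
     .force_encoding(encoding)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end
```

With the objects from the question this would be called as `to_utf8(attachment.decoded)`, assuming `decoded` returns the raw attachment bytes.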
This is what I expect to get when I run the above code (it is what I get when I download the attachment manually through the Gmail web interface): [screenshot]
What the problem is:
However, I am getting the following odd results:
- With `cat temp.csv` (looks good to me): [screenshot]
- With `nano temp.csv` (here I have no idea what I am looking at): [screenshot]
- Opened in Sublime Text (directly via WinSCP): the first line and small parts look OK, but then Chinese/Japanese characters appear: [screenshot]
- Opened in Notepad (after downloading via WinSCP): it looks OK, except that a blank space has been inserted between consecutive characters and the line breaks seem to be missing: [screenshot]
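These symptoms (something invisible between every character, garbled CJK text in an editor) are consistent with a UTF-16LE file being read as a single-byte encoding such as ISO-8859-15. That is an inference from the symptoms, not something confirmed in the question. In UTF-16LE the file starts with the byte order mark FF FE, and every character up to U+00FF is stored as its Latin-1 byte followed by a NUL byte. The snippet below reproduces the effect with a made-up Finnish sample line:

```ruby
# Made-up sample text, stored as UTF-16LE with a byte order mark (BOM)
original = "Määrä m³\r\n"
raw = ("\uFEFF" + original).encode(Encoding::UTF_16LE).b

# After the BOM (FF FE), each character in this sample is its Latin-1 code plus a NUL byte
p raw.bytes.first(6) # => [255, 254, 77, 0, 228, 0]

# Misreading the bytes as ISO-8859-15 keeps the NULs and turns the BOM into "ÿþ";
# the first six characters are "ÿ", "þ", "M", NUL, "ä", NUL
misread = raw.dup.force_encoding(Encoding::ISO_8859_15).encode(Encoding::UTF_8)

# Reading the same bytes as UTF-16LE recovers the text (after dropping the BOM)
fixed = raw.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
p fixed.delete_prefix("\uFEFF") == original # => true
```

If the attachment is indeed UTF-16LE, the `to_s[2..-2]` in the question's code would cut off the two BOM bytes, but the NUL bytes would survive the ISO-8859-15 conversion and end up in `temp.csv`.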
What I have tried:
I have tried the following, without success:
- `.force_encoding(...)` with all the different "ISO-8859-x" character sets
- putting the `force_encoding("ISO-8859-15").encode!('UTF-8')` outside the `.read` (this works, but doesn't solve the problem)
- encoding to UTF-8 without first forcing another encoding, but this leads to `Encoding::UndefinedConversionError: "\xC4" from ASCII-8BIT to UTF-8`
- writing in binary mode with `'wb'` and `'w+b'` in `File.open()` (which oddly doesn't seem to make a difference to the outcome)
- searching Stack Overflow and the web for other ideas
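The `Encoding::UndefinedConversionError` in the list above follows from how the two methods differ: `force_encoding` only relabels the bytes without changing them, while `encode` converts between encodings and needs a meaningful source encoding. As the error message shows, the attachment data is labelled ASCII-8BIT (binary), and a binary string containing bytes above 0x7F cannot be converted to UTF-8 directly. A small illustration with a single byte:

```ruby
raw = "\xC4".b # one byte, labelled ASCII-8BIT (binary) like the attachment data

# encode does not know what 0xC4 means in binary data, so it raises
begin
  raw.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  puts e.message # prints: "\xC4" from ASCII-8BIT to UTF-8
end

# force_encoding only changes the label; the bytes stay the same
latin = raw.dup.force_encoding(Encoding::ISO_8859_15)
p latin.bytes # => [196]

# encode then converts the relabelled bytes into UTF-8 ("Ä" is C3 84)
utf8 = latin.encode(Encoding::UTF_8)
p utf8.bytes # => [195, 132]
```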
Any ideas would be much appreciated!
ruby-on-rails ruby character-encoding gmail
edited Nov 18 at 10:41
asked Nov 17 at 19:27
Morten Grum
2 Answers
It's not beautiful, but it works for me for now.
After re-encoding, I convert the string to an array of characters, remove the characters I do not want, and join the remaining elements back into a string.
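```ruby
decoded_att = attachment.decoded
# Read the raw bytes as ISO-8859-1 and convert them to UTF-8,
# replacing anything invalid or undefined instead of raising
data = decoded_att.encode("UTF-8", "ISO-8859-1", invalid: :replace, undef: :replace)
# Remove the unwanted characters ("\u0000", "ÿ" and "þ")
data = data.chars.reject { |c| c == "\u0000" || c == "ÿ" || c == "þ" }.join
# Normalise line endings after the NULs are gone, so "\r" and "\n" are adjacent
data = data.gsub("\r\n", "\n")
File.write(File.expand_path("~/temp.csv"), data)
```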
This works for me for now. However, I have no idea how these characters ended up in the attachment ("ÿ" and "þ" at the start of the document, and "\u0000" between all the remaining characters).
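Those characters fit a UTF-16LE file read as ISO-8859-1: the bytes FF FE at the start are the UTF-16LE byte order mark, which ISO-8859-1 displays as "ÿþ", and UTF-16LE stores every character up to U+00FF as its Latin-1 byte followed by a 0x00 byte, hence a "\u0000" after each character. If that is what the attachment contains, decoding it as UTF-16LE should make the character filtering unnecessary. A minimal sketch, assuming the attachment really is UTF-16LE (worth confirming from its first two bytes before relying on this); the method name `utf16le_attachment_to_utf8` is hypothetical:

```ruby
# Hypothetical helper: assumes the raw attachment bytes are UTF-16LE,
# optionally starting with a byte order mark (BOM)
def utf16le_attachment_to_utf8(raw)
  text = raw.b.force_encoding(Encoding::UTF_16LE)
  text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  text = text.delete_prefix("\uFEFF") # drop the BOM (shows up as "ÿþ" when misread)
  text.gsub("\r\n", "\n")             # normalise Windows line endings
end
```

With the objects from the question: `File.write(File.expand_path("~/temp.csv"), utf16le_attachment_to_utf8(attachment.decoded))`. Unlike removing every "ÿ" and "þ", this also keeps those letters intact if they legitimately occur in the data.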
answered Nov 18 at 18:36
Morten Grum
It seems like you need to use `attachment.body.decoded` instead of `attachment.decoded`.
answered Nov 18 at 21:11
Dorian
Thanks. It actually seems that `attachment.body.decoded` and `attachment.decoded` return exactly the same string. I checked both the strings and their byte arrays.
– Morten Grum
Nov 19 at 6:28
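Comparing byte arrays, as in the comment above, is a sound way to run this check: Ruby's `String#==` also takes the encoding into account, so two strings with the same non-ASCII bytes but different encoding labels compare as unequal. A small illustration with made-up bytes; a hex dump of the first few bytes is also a quick way to spot a BOM such as FF FE:

```ruby
# Made-up bytes: a UTF-16LE BOM followed by "A"
binary = "\xFF\xFEA\x00".b                               # labelled ASCII-8BIT
latin  = binary.dup.force_encoding(Encoding::ISO_8859_1) # same bytes, other label

p binary == latin             # => false: == also considers the encodings
p binary.bytes == latin.bytes # => true: the byte arrays are identical
p binary.unpack1("H*")        # => "fffe4100", a hex dump of the bytes
```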