Original | Industry News | Editor: 郝浩 | 2013-08-07 12:55:32 | 2548 views
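Summary: How do you extract text from a presentation? Using a Microsoft PowerPoint PPTX presentation as an example, this article shows how to extract text with the Aspose.Slides component.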
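Developers often need to extract text from a presentation. To do so, you have to extract the text from every shape on every slide. Using a Microsoft PowerPoint PPTX presentation as an example, this article shows how to do that with Aspose.Slides. Whether you need the text of a single slide or of every slide in the presentation, the static methods of the SlideUtil class, which is packaged in the Aspose.Slides.Util namespace, can do it.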
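Aspose.Slides for .NET provides the Aspose.Slides.Util namespace, which includes the SlideUtil class. This class exposes several overloaded static methods for extracting text from a whole presentation or from a single slide. To extract text from a slide in a PPTX presentation, use the overloaded static method GetAllTextBoxes exposed by SlideUtil. The method accepts a SlideEx object as its parameter.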
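When executed, the method scans all text on the slide passed as the parameter and returns an array of TextFrameEx objects. This means that any formatting associated with the text is available as well. The following code extracts the text on the first slide, first in C# and then in VB.NET: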
// Instantiate the PresentationEx class that represents a PPTX file
using (PresentationEx pptxPresentation = new PresentationEx("d:\\pptx\\testx.pptx"))
{
    // Get an array of TextFrameEx objects from the first slide
    TextFrameEx[] textFramesSlideOne = SlideUtil.GetAllTextBoxes(pptxPresentation.Slides[0]);
    // Loop through the array of text frames
    for (int i = 0; i < textFramesSlideOne.Length; i++)
    {
        // Loop through the paragraphs in the current text frame
        foreach (ParagraphEx para in textFramesSlideOne[i].Paragraphs)
        {
            // Loop through the portions in the current paragraph
            foreach (PortionEx port in para.Portions)
            {
                // Display the text of the current portion
                Console.WriteLine(port.Text);
                // Display the font height of the text
                Console.WriteLine(port.FontHeight);
                // Display the font name of the text
                Console.WriteLine(port.LatinFont.FontName);
            }
        }
    }
}
' Instantiate the PresentationEx class that represents a PPTX file
Using pptxPresentation As New PresentationEx("d:\pptx\testx.pptx")
    ' Get an array of TextFrameEx objects from the first slide
    Dim textFramesSlideOne() As TextFrameEx = SlideUtil.GetAllTextBoxes(pptxPresentation.Slides(0))
    ' Loop through the array of text frames
    For i As Integer = 0 To textFramesSlideOne.Length - 1
        ' Loop through the paragraphs in the current text frame
        For Each para As ParagraphEx In textFramesSlideOne(i).Paragraphs
            ' Loop through the portions in the current paragraph
            For Each port As PortionEx In para.Portions
                ' Display the text of the current portion
                Console.WriteLine(port.Text)
                ' Display the font height of the text
                Console.WriteLine(port.FontHeight)
                ' Display the font name of the text
                Console.WriteLine(port.LatinFont.FontName)
            Next port
        Next para
    Next i
End Using
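To scan the text of the entire presentation, use the static method GetAllTextFrames exposed by the SlideUtil class. It takes two parameters:
1. A PresentationEx object: the PPTX presentation from which the text is being extracted.
2. A Boolean value: determines whether master slides are included when the presentation is scanned for text.
The method returns an array of TextFrameEx objects, complete with text formatting information. The following code scans the text and formatting information of a presentation, including its master slides, first in C# and then in VB.NET: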
// Instantiate the PresentationEx class that represents a PPTX file
using (PresentationEx pptxPresentation = new PresentationEx("d:\\pptx\\testx.pptx"))
{
    // Get an array of TextFrameEx objects from all slides in the PPTX,
    // including master slides (second argument is true)
    TextFrameEx[] textFramesPPTX = SlideUtil.GetAllTextFrames(pptxPresentation, true);
    // Loop through the array of text frames
    for (int i = 0; i < textFramesPPTX.Length; i++)
    {
        // Loop through the paragraphs in the current text frame
        foreach (ParagraphEx para in textFramesPPTX[i].Paragraphs)
        {
            // Loop through the portions in the current paragraph
            foreach (PortionEx port in para.Portions)
            {
                // Display the text of the current portion
                Console.WriteLine(port.Text);
                // Display the font height of the text
                Console.WriteLine(port.FontHeight);
                // Display the font name of the text
                Console.WriteLine(port.LatinFont.FontName);
            }
        }
    }
}
' Instantiate the PresentationEx class that represents a PPTX file
Using pptxPresentation As New PresentationEx("d:\pptx\testx.pptx")
    ' Get an array of TextFrameEx objects from all slides in the PPTX,
    ' including master slides (second argument is True)
    Dim textFramesPPTX() As TextFrameEx = SlideUtil.GetAllTextFrames(pptxPresentation, True)
    ' Loop through the array of text frames
    For i As Integer = 0 To textFramesPPTX.Length - 1
        ' Loop through the paragraphs in the current text frame
        For Each para As ParagraphEx In textFramesPPTX(i).Paragraphs
            ' Loop through the portions in the current paragraph
            For Each port As PortionEx In para.Portions
                ' Display the text of the current portion
                Console.WriteLine(port.Text)
                ' Display the font height of the text
                Console.WriteLine(port.FontHeight)
                ' Display the font name of the text
                Console.WriteLine(port.LatinFont.FontName)
            Next port
        Next para
    Next i
End Using
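When only the plain text is needed, the same traversal can collect the portions into a single string instead of printing them one by one. The following is a minimal sketch, not a definitive implementation. It assumes the legacy Aspose.Slides for .NET API used in this article, and that PresentationEx, TextFrameEx, ParagraphEx and PortionEx live in the Aspose.Slides.Pptx namespace. The helper name ExtractPlainText and the class name are hypothetical, and the file path is the sample path from the examples above.

```csharp
using System;
using System.Text;
using Aspose.Slides.Pptx;   // assumed namespace of PresentationEx, TextFrameEx, ParagraphEx, PortionEx
using Aspose.Slides.Util;   // SlideUtil, as named in this article

class ExtractPlainTextDemo
{
    // Hypothetical helper: concatenates the text of every portion,
    // writing one paragraph per line.
    static string ExtractPlainText(string path, bool includeMasters)
    {
        StringBuilder text = new StringBuilder();
        using (PresentationEx pres = new PresentationEx(path))
        {
            // Same call as in the article; the Boolean controls master slides
            TextFrameEx[] frames = SlideUtil.GetAllTextFrames(pres, includeMasters);
            foreach (TextFrameEx frame in frames)
            {
                foreach (ParagraphEx para in frame.Paragraphs)
                {
                    foreach (PortionEx port in para.Portions)
                    {
                        text.Append(port.Text);
                    }
                    // End the paragraph with a line break
                    text.AppendLine();
                }
            }
        }
        return text.ToString();
    }

    static void Main()
    {
        // Passing false skips the text found on master slides
        Console.WriteLine(ExtractPlainText("d:\\pptx\\testx.pptx", false));
    }
}
```

Passing false here usually keeps placeholder text from master slides out of the result; pass true if that text matters.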
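The Aspose.Slides.Util.SlideUtil class exposes several overloaded static methods for scanning the text of a presentation or a slide. Formatting information is extracted along with the scanned text. If you need to extract text from presentations or face a similar problem, Aspose.Slides is worth trying.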
Reprinted from: 慧都控件網